Re: [RFC 1/2] target: Add documentation on the target userspace pass-through driver
On Tue, Jul 01, 2014 at 12:11:14PM -0700, Andy Grover wrote:
> Describes the driver and its interface to make it possible for user
> programs to back a LIO-exported LUN.
>
> Signed-off-by: Andy Grover agro...@redhat.com
> ---
>  Documentation/target/tcmu-design.txt | 210 +++
>  1 file changed, 210 insertions(+)
>  create mode 100644 Documentation/target/tcmu-design.txt
>
> diff --git a/Documentation/target/tcmu-design.txt b/Documentation/target/tcmu-design.txt
> new file mode 100644
> index 000..200ff3e
> --- /dev/null
> +++ b/Documentation/target/tcmu-design.txt
> @@ -0,0 +1,210 @@
> +TCM Userspace Design
> +
> +Background:
> +
> +In addition to modularizing the transport protocol used for carrying
> +SCSI commands (fabrics), the Linux kernel target, LIO, also modularizes
> +the actual data storage as well. These are referred to as backstores
> +or storage engines. The target comes with backstores that allow a
> +file, a block device, RAM, or another SCSI device to be used for the
> +local storage needed for the exported SCSI LUN. Like the rest of LIO,
> +these are implemented entirely as kernel code.
> +
> +These backstores cover the most common use cases, but not all. One new
> +use case that other non-kernel target solutions, such as tgt, are able
> +to support is using Gluster's GLFS or Ceph's RBD as a backstore. The
> +target then serves as a translator, allowing initiators to store data
> +in these non-traditional networked storage systems, while still only
> +using standard protocols themselves.
> +
> +If the target is a userspace process, supporting these is easy. tgt,
> +for example, needs only a small adapter module for each, because the
> +modules just use the available userspace libraries for RBD and GLFS.
> +
> +Adding support for these backstores in LIO is considerably more
> +difficult, because LIO is entirely kernel code. Instead of undertaking
> +the significant work to port the GLFS or RBD APIs and protocols to the
> +kernel, another approach is to create a userspace pass-through
> +backstore for LIO, TCMU.
It has to be said that this documentation is terrible. Jumping in
medias res[1] is great for fiction, awful for technical documentation.

I would recommend the Economist Style Guide[2]. They always say
"Barack Obama, President of the United States" the first time he is
mentioned in an article, even though almost everyone knows who Barack
Obama is.

In this case you're leaping straight into: fabrics, LIO, backstores,
target solutions, ... aargh. Explain what you mean by each term and
how it all fits together.

Thanks, Rich.

[1] https://en.wikipedia.org/wiki/In_medias_res
[2] http://www.economist.com/styleguide/introduction

-- 
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 1/2] target: Add documentation on the target userspace pass-through driver
On Sun, Aug 31, 2014 at 12:49:26PM -0700, Andy Grover wrote:
> Thanks for the feedback. I am undoubtedly too close to the details,
> because I thought I *was* explaining things :)

Yeah, sorry it came across as a bit harsh. Benoit did explain it to me
so I understood it in the end (I think!)

> This doc is for people like you -- tech-savvy but unfamiliar with
> this specific area. Would you be so kind as to point out exactly the
> terms this document should explain? Should it explain SCSI and SCSI
> commands? What a SCSI target is? Say "target implementations" rather
> than "target solutions"? Do I need some ASCII art?

So I can only speak for myself here, but I'm pretty familiar with
iSCSI, using it, and some of the internals -- in fact I'm using the
Linux kernel target daily.

> TCM Userspace Design
>
> In addition to modularizing the transport protocol used for carrying
> SCSI commands (fabrics), the Linux kernel target, LIO, also
> modularizes the actual data storage as well. These are referred to
> as backstores or storage engines.

Reading this several times, I now think I get what it's trying to say,
but I think it needs to introduce the terms (as the Economist style
does). Something like this:

  TCM is the new name for LIO, an in-kernel iSCSI target (server).
  Existing TCM targets run in the kernel. TCMU (TCM in Userspace)
  allows userspace programs to be written which act as iSCSI targets.
  This document describes the design.

  The existing kernel provides modules for different SCSI transport
  protocols. TCM also modularizes the data storage. There are existing
  modules for file, block device, RAM or using another SCSI device as
  storage. These are called backstores or storage engines. These
  built-in modules are implemented entirely as kernel code.

And hopefully having defined a bit of background, the rest of the
document just flows nicely:

> These backstores cover the most common use cases, but not all.
> One new use case that other non-kernel target solutions, such as
> tgt, are able to support is using Gluster's GLFS or Ceph's RBD as a
> backstore. The target then serves as a translator, allowing
> initiators to store data in these non-traditional networked storage
> systems, while still only using standard protocols themselves.

Rich.
Re: [PATCH] [SCSI] virtio-scsi: Initialize scatterlist structure.
On Mon, Oct 01, 2012 at 03:13:01PM +0200, Paolo Bonzini wrote:
> Il 20/08/2012 16:05, Paolo Bonzini ha scritto:
> > Il 20/08/2012 16:04, Richard W.M. Jones ha scritto:
> > > From: Richard W.M. Jones rjo...@redhat.com
> > >
> > > The sg struct is used without being initialized.
> > >
> > > https://bugzilla.redhat.com/show_bug.cgi?id=847548
> > >
> > > Signed-off-by: Richard W.M. Jones rjo...@redhat.com
> > > ---
> > >  drivers/scsi/virtio_scsi.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> > > index c7030fb..8a66f83 100644
> > > --- a/drivers/scsi/virtio_scsi.c
> > > +++ b/drivers/scsi/virtio_scsi.c
> > > @@ -219,7 +219,7 @@ static int virtscsi_kick_event(struct virtio_scsi *vscsi,
> > >  	struct scatterlist sg;
> > >  	unsigned long flags;
> > >
> > > -	sg_set_buf(&sg, &event_node->event, sizeof(struct virtio_scsi_event));
> > > +	sg_init_one(&sg, &event_node->event, sizeof(struct virtio_scsi_event));
> > >
> > >  	spin_lock_irqsave(&vscsi->event_vq.vq_lock, flags);
> >
> > Acked-by: Paolo Bonzini pbonz...@redhat.com
>
> Paolo

Ping, is this patch going into 3.7? I was a bit surprised to see this
wasn't upstream since it's vital for using virtio-scsi reliably (as
well as being a one line fix for a blatant bug).

Rich.
[PATCH] [SCSI] virtio-scsi: Initialize scatterlist structure.
From: Richard W.M. Jones rjo...@redhat.com

The sg struct is used without being initialized.

https://bugzilla.redhat.com/show_bug.cgi?id=847548

Signed-off-by: Richard W.M. Jones rjo...@redhat.com
---
 drivers/scsi/virtio_scsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index c7030fb..8a66f83 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -219,7 +219,7 @@ static int virtscsi_kick_event(struct virtio_scsi *vscsi,
 	struct scatterlist sg;
 	unsigned long flags;
 
-	sg_set_buf(&sg, &event_node->event, sizeof(struct virtio_scsi_event));
+	sg_init_one(&sg, &event_node->event, sizeof(struct virtio_scsi_event));
 
 	spin_lock_irqsave(&vscsi->event_vq.vq_lock, flags);
-- 
1.7.10.4
Re: Increased memory usage with scsi-mq
OK this is looking a bit better now.

  With scsi-mq enabled:     175 disks
  virtqueue_size=64:        318 disks *
  virtqueue_size=16:        775 disks *
  With scsi-mq disabled:   1755 disks

  * = new results

I also ran the whole libguestfs test suite with virtqueue_size=16
(with no failures shown). As this tests many different disk I/O
operations, it gives me some confidence that things generally work.

Do you have any other comments about the patches? I'm not sure I know
enough to write an intelligent commit message for the kernel patch.

Rich.

--- kernel patch ---

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 9be211d68b15..d6b4ff634c0d 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -972,6 +972,8 @@ static int virtscsi_probe(struct virtio_device *vdev)
 	if (err)
 		goto virtscsi_init_failed;
 
+	shost->can_queue = virtqueue_get_vring_size(vscsi->req_vqs[0].vq);
+
 	cmd_per_lun = virtscsi_config_get(vdev, cmd_per_lun) ?: 1;
 	shost->cmd_per_lun = min_t(u32, cmd_per_lun, shost->can_queue);
 	shost->max_sectors = virtscsi_config_get(vdev, max_sectors) ?: 0x;

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 5e1b548828e6..2d7509da9f39 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -296,7 +296,6 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 	}
 #endif
 
-	BUG_ON(total_sg > vq->vring.num);
 	BUG_ON(total_sg == 0);
 
 	head = vq->free_head;
@@ -305,8 +304,10 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 	 * buffers, then go indirect. FIXME: tune this threshold */
 	if (vq->indirect && total_sg > 1 && vq->vq.num_free)
 		desc = alloc_indirect(_vq, total_sg, gfp);
-	else
+	else {
 		desc = NULL;
+		WARN_ON_ONCE(total_sg > vq->vring.num && !vq->indirect);
+	}
 
 	if (desc) {
 		/* Use a single buffer which doesn't continue */

--- qemu patch ---

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index eb639442d1..aadd99aad1 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -867,10 +867,10 @@ void virtio_scsi_common_realize(DeviceState *dev,
     s->sense_size = VIRTIO_SCSI_SENSE_DEFAULT_SIZE;
     s->cdb_size = VIRTIO_SCSI_CDB_DEFAULT_SIZE;
 
-    s->ctrl_vq = virtio_add_queue(vdev, VIRTIO_SCSI_VQ_SIZE, ctrl);
-    s->event_vq = virtio_add_queue(vdev, VIRTIO_SCSI_VQ_SIZE, evt);
+    s->ctrl_vq = virtio_add_queue(vdev, s->conf.virtqueue_size, ctrl);
+    s->event_vq = virtio_add_queue(vdev, s->conf.virtqueue_size, evt);
     for (i = 0; i < s->conf.num_queues; i++) {
-        s->cmd_vqs[i] = virtio_add_queue(vdev, VIRTIO_SCSI_VQ_SIZE, cmd);
+        s->cmd_vqs[i] = virtio_add_queue(vdev, s->conf.virtqueue_size, cmd);
     }
 }
 
@@ -917,6 +917,7 @@ static void virtio_scsi_device_unrealize(DeviceState *dev, Error **errp)
 static Property virtio_scsi_properties[] = {
     DEFINE_PROP_UINT32("num_queues", VirtIOSCSI, parent_obj.conf.num_queues, 1),
+    DEFINE_PROP_UINT32("virtqueue_size", VirtIOSCSI, parent_obj.conf.virtqueue_size, 128),
     DEFINE_PROP_UINT32("max_sectors", VirtIOSCSI, parent_obj.conf.max_sectors, 0x),
     DEFINE_PROP_UINT32("cmd_per_lun", VirtIOSCSI, parent_obj.conf.cmd_per_lun,

diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index de6ae5a9f6..e30a92d3e7 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -48,6 +48,7 @@ typedef struct virtio_scsi_config VirtIOSCSIConfig;
 struct VirtIOSCSIConf {
     uint32_t num_queues;
+    uint32_t virtqueue_size;
     uint32_t max_sectors;
     uint32_t cmd_per_lun;
 #ifdef CONFIG_VHOST_SCSI
Re: Increased memory usage with scsi-mq
On Thu, Aug 10, 2017 at 02:53:58PM +0200, Paolo Bonzini wrote:
> can_queue and cmd_per_lun are different. can_queue should be set to
> the value of vq->vring.num where vq is the command virtqueue (the
> first one is okay if there's >1).
>
> If you want to change it, you'll have to do so in QEMU.

Here are a couple more patches I came up with, the first to Linux, the
second to qemu. There are a few problems ...

(1) Setting virtqueue_size to < 128 causes a BUG_ON failure in
virtio_ring.c:virtqueue_add at:

  BUG_ON(total_sg > vq->vring.num);

I initially thought that I should also set cmd_per_lun to the same
value, but that doesn't help. Is there some relationship between
virtqueue_size and other settings?

(2) Can/should the ctrl, event and cmd queue sizes be set to the same
values? Or should ctrl & event be left at 128?

(3) It seems like it might be a problem that virtqueue_size is not
passed through the virtio_scsi_conf struct (which is ABI between the
kernel and qemu AFAICT and so not easily modifiable). However I think
this is not a problem because virtqueue size is stored in the
Virtqueue Descriptor table, which is how the kernel gets the value.
Is that right?

Rich.
--- kernel patch ---

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 9be211d68b15..d6b4ff634c0d 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -972,6 +972,8 @@ static int virtscsi_probe(struct virtio_device *vdev)
 	if (err)
 		goto virtscsi_init_failed;
 
+	shost->can_queue = virtqueue_get_vring_size(vscsi->req_vqs[0].vq);
+
 	cmd_per_lun = virtscsi_config_get(vdev, cmd_per_lun) ?: 1;
 	shost->cmd_per_lun = min_t(u32, cmd_per_lun, shost->can_queue);
 	shost->max_sectors = virtscsi_config_get(vdev, max_sectors) ?: 0x;

--- qemu patch ---

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index eb639442d1..aadd99aad1 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -867,10 +867,10 @@ void virtio_scsi_common_realize(DeviceState *dev,
     s->sense_size = VIRTIO_SCSI_SENSE_DEFAULT_SIZE;
     s->cdb_size = VIRTIO_SCSI_CDB_DEFAULT_SIZE;
 
-    s->ctrl_vq = virtio_add_queue(vdev, VIRTIO_SCSI_VQ_SIZE, ctrl);
-    s->event_vq = virtio_add_queue(vdev, VIRTIO_SCSI_VQ_SIZE, evt);
+    s->ctrl_vq = virtio_add_queue(vdev, s->conf.virtqueue_size, ctrl);
+    s->event_vq = virtio_add_queue(vdev, s->conf.virtqueue_size, evt);
     for (i = 0; i < s->conf.num_queues; i++) {
-        s->cmd_vqs[i] = virtio_add_queue(vdev, VIRTIO_SCSI_VQ_SIZE, cmd);
+        s->cmd_vqs[i] = virtio_add_queue(vdev, s->conf.virtqueue_size, cmd);
     }
 }
 
@@ -917,6 +917,7 @@ static void virtio_scsi_device_unrealize(DeviceState *dev, Error **errp)
 static Property virtio_scsi_properties[] = {
     DEFINE_PROP_UINT32("num_queues", VirtIOSCSI, parent_obj.conf.num_queues, 1),
+    DEFINE_PROP_UINT32("virtqueue_size", VirtIOSCSI, parent_obj.conf.virtqueue_size, 128),
     DEFINE_PROP_UINT32("max_sectors", VirtIOSCSI, parent_obj.conf.max_sectors, 0x),
     DEFINE_PROP_UINT32("cmd_per_lun", VirtIOSCSI, parent_obj.conf.cmd_per_lun,

diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index de6ae5a9f6..e30a92d3e7 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -48,6 +48,7 @@ typedef struct virtio_scsi_config VirtIOSCSIConfig;
 struct VirtIOSCSIConf {
     uint32_t num_queues;
+    uint32_t virtqueue_size;
     uint32_t max_sectors;
     uint32_t cmd_per_lun;
 #ifdef CONFIG_VHOST_SCSI
[PATCH 2/2] virtio: virtio_scsi: Set can_queue to the length of the virtqueue.
Since switching to blk-mq as the default in commit 5c279bd9e406
("scsi: default to scsi-mq"), virtio-scsi LUNs consume about 10x as
much kernel memory.

qemu currently allocates a fixed 128 entry virtqueue. can_queue
currently is set to 1024. But with indirect descriptors, each command
in the queue takes 1 virtqueue entry, so the number of commands which
can be queued is equal to the length of the virtqueue.

Note I intend to send a patch to qemu to allow the virtqueue size to
be configured from the qemu command line.

Thanks Paolo Bonzini, Christoph Hellwig.

Signed-off-by: Richard W.M. Jones <rjo...@redhat.com>
---
 drivers/scsi/virtio_scsi.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 9be211d68b15..d6b4ff634c0d 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -972,6 +972,8 @@ static int virtscsi_probe(struct virtio_device *vdev)
 	if (err)
 		goto virtscsi_init_failed;
 
+	shost->can_queue = virtqueue_get_vring_size(vscsi->req_vqs[0].vq);
+
 	cmd_per_lun = virtscsi_config_get(vdev, cmd_per_lun) ?: 1;
 	shost->cmd_per_lun = min_t(u32, cmd_per_lun, shost->can_queue);
 	shost->max_sectors = virtscsi_config_get(vdev, max_sectors) ?: 0x;
-- 
2.13.1
[PATCH v2 0/2] virtio_scsi: Set can_queue based on size of virtqueue.
v1 was here: https://lkml.org/lkml/2017/8/10/689

v1 -> v2: Remove .can_queue field from the templates.

Rich.
Re: [PATCH 1/2] virtio: Reduce BUG if total_sg > virtqueue size to WARN.
On Fri, Aug 11, 2017 at 12:21:16AM +0300, Michael S. Tsirkin wrote:
> On Thu, Aug 10, 2017 at 05:40:34PM +0100, Richard W.M. Jones wrote:
> > If using indirect descriptors, you can make the total_sg as large
> > as you want.
>
> That would be a spec violation though, even if it happens to
> work on current QEMU.
>
> The spec says:
> 	A driver MUST NOT create a descriptor chain longer than the
> 	Queue Size of the device.
>
> What prompted this patch?
> Do we ever encounter this situation?

This patch is needed because the following (2/2) patch will trigger
that BUG_ON if I set virtqueue_size=64 or any smaller value. The
precise backtrace is below.

Rich.

[4.029510] [ cut here ]
[4.030127] kernel BUG at drivers/virtio/virtio_ring.c:299!
[4.030834] invalid opcode: [#1] SMP
[4.031340] Modules linked in: libcrc32c crc8 crc7 crc4 crc_itu_t virtio_pci virtio_mmio virtio_input virtio_balloon virtio_scsi nd_pmem nd_btt virtio_net virtio_crypto crypto_engine virtio_console virtio_rng virtio_blk virtio_ring virtio nfit crc32_generic crct10dif_pclmul crc32c_intel crc32_pclmul
[4.034606] CPU: 0 PID: 1 Comm: init Not tainted 4.13.0-rc4+ #100
[4.035354] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[4.036770] task: 9a859e243300 task.stack: a731c00cc000
[4.037506] RIP: 0010:virtqueue_add_sgs+0x23d/0x460 [virtio_ring]
[4.038250] RSP: :a731c00cf6e0 EFLAGS: 00010097
[4.038898] RAX: RBX: 0011 RCX: dd0680646c40
[4.039762] RDX: RSI: a731c00cf7b8 RDI: 9a85945c4d68
[4.040634] RBP: a731c00cf748 R08: 9a85945c4a78 R09: 01080020
[4.041508] R10: a731c00cf788 R11: 9a859b3d3120 R12: a731c00cf7d0
[4.042382] R13: a731c00cf7d0 R14: 0001 R15: 9a859b3f8200
[4.043248] FS: () GS:9a859e60() knlGS:
[4.044232] CS: 0010 DS: ES: CR0: 80050033
[4.044942] CR2: 7fcff02e931c CR3: 1d23b000 CR4: 003406f0
[4.045815] DR0: DR1: DR2:
[4.046684] DR3: DR6: fffe0ff0 DR7: 0400
[4.047559] Call Trace:
[4.047876]  virtscsi_add_cmd+0x1c9/0x280 [virtio_scsi]
[4.048528]  virtscsi_kick_cmd+0x38/0x90 [virtio_scsi]
[4.049161]  virtscsi_queuecommand+0x104/0x280 [virtio_scsi]
[4.049875]  virtscsi_queuecommand_single+0x38/0x40 [virtio_scsi]
[4.050628]  scsi_dispatch_cmd+0xf9/0x390
[4.051128]  scsi_queue_rq+0x5e5/0x6f0
[4.051602]  blk_mq_dispatch_rq_list+0x1ff/0x3c0
[4.052175]  blk_mq_sched_dispatch_requests+0x199/0x1f0
[4.052824]  __blk_mq_run_hw_queue+0x11d/0x1b0
[4.053382]  __blk_mq_delay_run_hw_queue+0x8d/0xa0
[4.053972]  blk_mq_run_hw_queue+0x14/0x20
[4.054485]  blk_mq_sched_insert_requests+0x96/0x120
[4.055095]  blk_mq_flush_plug_list+0x19d/0x410
[4.055661]  blk_flush_plug_list+0xf9/0x2b0
[4.056182]  blk_finish_plug+0x2c/0x40
[4.056655]  __do_page_cache_readahead+0x32e/0x400
[4.057261]  filemap_fault+0x2fb/0x890
[4.057731]  ? filemap_fault+0x2fb/0x890
[4.058220]  ? find_held_lock+0x3c/0xb0
[4.058714]  ext4_filemap_fault+0x34/0x50
[4.059212]  __do_fault+0x1e/0x110
[4.059644]  __handle_mm_fault+0x6b2/0x1080
[4.060167]  handle_mm_fault+0x178/0x350
[4.060662]  __do_page_fault+0x26e/0x510
[4.061152]  trace_do_page_fault+0x9d/0x290
[4.061677]  do_async_page_fault+0x51/0xa0
[4.062189]  async_page_fault+0x28/0x30
[4.062667] RIP: 0033:0x7fcff030a24f
[4.063113] RSP: 002b:7ffefc2ad078 EFLAGS: 00010206
[4.063760] RAX: 7fcff02e931c RBX: 7fcff050f660 RCX: 7fcff02e935c
[4.064648] RDX: 0664 RSI: RDI: 7fcff02e931c
[4.065519] RBP: 7ffefc2ad320 R08: 7fcff02e931c R09: 00027000
[4.066392] R10: 7fcff02e9980 R11: 0206 R12: 7ffefc2ad0b0
[4.067263] R13: 7ffefc2ad408 R14: 0002 R15: 87f0
[4.068135] Code: af 01 c7 45 c8 01 00 00 00 45 31 ed e9 9b fe ff ff 31 db 48 83 7d d0 00 0f 85 3f fe ff ff 0f 0b 48 8b 7d b8 e8 e5 fd 22 de eb 8b <0f> 0b 0f 0b 44 89 6d a8 45 89 f5 48 8b 45 a0 44 8d 70 01 48 83
[4.070506] RIP: virtqueue_add_sgs+0x23d/0x460 [virtio_ring] RSP: a731c00cf6e0
[4.071434] ---[ end trace 02532659840e2a64 ]---
Re: [PATCH 1/2] virtio: Reduce BUG if total_sg > virtqueue size to WARN.
On Fri, Aug 11, 2017 at 12:31:44AM +0300, Michael S. Tsirkin wrote:
> Then we probably should fail probe if vq size is too small.

What does this mean?

Rich.
Re: [PATCH 1/2] virtio: Reduce BUG if total_sg > virtqueue size to WARN.
On Fri, Aug 11, 2017 at 12:41:47AM +0300, Michael S. Tsirkin wrote:
> On Thu, Aug 10, 2017 at 10:35:11PM +0100, Richard W.M. Jones wrote:
> > On Fri, Aug 11, 2017 at 12:31:44AM +0300, Michael S. Tsirkin wrote:
> > > Then we probably should fail probe if vq size is too small.
> >
> > What does this mean?
> >
> > Rich.
>
> We must prevent driver from submitting s/g lists > vq size to device.

Fair point. I would have thought (not knowing anything about how this
works in the kernel) that the driver should be able to split up
scatter-gather lists if they are larger than what the driver supports?

Anyway the commit message is derived from what Paolo told me on IRC,
so let's see what he says.

Rich.

> Either tell linux to avoid s/g lists that are too long, or simply
> fail request if this happens, or refuse to attach driver to device.
>
> Later option would look something like this within probe:
>
> 	for (i = VIRTIO_SCSI_VQ_BASE; i < num_vqs; i++)
> 		if (vqs[i]->num < MAX_SG_USED_BY_LINUX)
> 			goto err;
>
> I don't know what's MAX_SG_USED_BY_LINUX though.
[PATCH 1/2] virtio: Reduce BUG if total_sg > virtqueue size to WARN.
If using indirect descriptors, you can make the total_sg as large as
you want. If not, BUG is too serious because the function later
returns -ENOSPC.

Thanks Paolo Bonzini, Christoph Hellwig.

Signed-off-by: Richard W.M. Jones <rjo...@redhat.com>
---
 drivers/virtio/virtio_ring.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 5e1b548828e6..27cbc1eab868 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -296,7 +296,6 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 	}
 #endif
 
-	BUG_ON(total_sg > vq->vring.num);
 	BUG_ON(total_sg == 0);
 
 	head = vq->free_head;
@@ -305,8 +304,10 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 	 * buffers, then go indirect. FIXME: tune this threshold */
 	if (vq->indirect && total_sg > 1 && vq->vq.num_free)
 		desc = alloc_indirect(_vq, total_sg, gfp);
-	else
+	else {
 		desc = NULL;
+		WARN_ON_ONCE(total_sg > vq->vring.num && !vq->indirect);
+	}
 
 	if (desc) {
 		/* Use a single buffer which doesn't continue */
-- 
2.13.1
[PATCH 0/2] virtio_scsi: Set can_queue based on size of virtqueue.
Earlier discussion: https://lkml.org/lkml/2017/8/4/601
"Increased memory usage with scsi-mq"

Downstream bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1478201
[PATCH v2 2/2] virtio: virtio_scsi: Set can_queue to the length of the virtqueue.
Since switching to blk-mq as the default in commit 5c279bd9e406
("scsi: default to scsi-mq"), virtio-scsi LUNs consume about 10x as
much kernel memory.

qemu currently allocates a fixed 128 entry virtqueue. can_queue
currently is set to 1024. But with indirect descriptors, each command
in the queue takes 1 virtqueue entry, so the number of commands which
can be queued is equal to the length of the virtqueue.

Note I intend to send a patch to qemu to allow the virtqueue size to
be configured from the qemu command line.

Thanks Paolo Bonzini, Christoph Hellwig.

Signed-off-by: Richard W.M. Jones <rjo...@redhat.com>
---
 drivers/scsi/virtio_scsi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 9be211d68b15..7c28e8d4955a 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -818,7 +818,6 @@ static struct scsi_host_template virtscsi_host_template_single = {
 	.eh_timed_out = virtscsi_eh_timed_out,
 	.slave_alloc = virtscsi_device_alloc,
 
-	.can_queue = 1024,
 	.dma_boundary = UINT_MAX,
 	.use_clustering = ENABLE_CLUSTERING,
 	.target_alloc = virtscsi_target_alloc,
@@ -839,7 +838,6 @@ static struct scsi_host_template virtscsi_host_template_multi = {
 	.eh_timed_out = virtscsi_eh_timed_out,
 	.slave_alloc = virtscsi_device_alloc,
 
-	.can_queue = 1024,
 	.dma_boundary = UINT_MAX,
 	.use_clustering = ENABLE_CLUSTERING,
 	.target_alloc = virtscsi_target_alloc,
@@ -972,6 +970,8 @@ static int virtscsi_probe(struct virtio_device *vdev)
 	if (err)
 		goto virtscsi_init_failed;
 
+	shost->can_queue = virtqueue_get_vring_size(vscsi->req_vqs[0].vq);
+
 	cmd_per_lun = virtscsi_config_get(vdev, cmd_per_lun) ?: 1;
 	shost->cmd_per_lun = min_t(u32, cmd_per_lun, shost->can_queue);
 	shost->max_sectors = virtscsi_config_get(vdev, max_sectors) ?: 0x;
-- 
2.13.1
[PATCH v2 1/2] virtio: Reduce BUG if total_sg > virtqueue size to WARN.
If using indirect descriptors, you can make the total_sg as large as
you want. If not, BUG is too serious because the function later
returns -ENOSPC.

Thanks Paolo Bonzini, Christoph Hellwig.

Signed-off-by: Richard W.M. Jones <rjo...@redhat.com>
Reviewed-by: Paolo Bonzini <pbonz...@redhat.com>
---
 drivers/virtio/virtio_ring.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 5e1b548828e6..27cbc1eab868 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -296,7 +296,6 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 	}
 #endif
 
-	BUG_ON(total_sg > vq->vring.num);
 	BUG_ON(total_sg == 0);
 
 	head = vq->free_head;
@@ -305,8 +304,10 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 	 * buffers, then go indirect. FIXME: tune this threshold */
 	if (vq->indirect && total_sg > 1 && vq->vq.num_free)
 		desc = alloc_indirect(_vq, total_sg, gfp);
-	else
+	else {
 		desc = NULL;
+		WARN_ON_ONCE(total_sg > vq->vring.num && !vq->indirect);
+	}
 
 	if (desc) {
 		/* Use a single buffer which doesn't continue */
-- 
2.13.1
Re: Increased memory usage with scsi-mq
On Wed, Aug 09, 2017 at 06:50:10PM +0200, Paolo Bonzini wrote:
> On 09/08/2017 18:01, Christoph Hellwig wrote:
> > On Mon, Aug 07, 2017 at 03:07:48PM +0200, Paolo Bonzini wrote:
> >> can_queue should depend on the virtqueue size, which unfortunately can
> >> vary for each virtio-scsi device in theory. The virtqueue size is
> >> retrieved by drivers/virtio prior to calling vring_create_virtqueue, and
> >> in QEMU it is the second argument to virtio_add_queue.
> >
> > Why is that unfortunate? We don't even have to set can_queue in
> > the host template, we can dynamically set it on per-host basis.
>
> Ah, cool, I thought allocations based on can_queue happened already in
> scsi_host_alloc, but they happen at scsi_add_host time.

I think I've decoded all that information into the patch below. I
tested it, and it appears to work: when I set cmd_per_lun on the qemu
command line, I see that the guest can add more disks:

  With scsi-mq enabled:     175 disks
  cmd_per_lun not set:      177 disks *
  cmd_per_lun=16:           776 disks *
  cmd_per_lun=4:           1160 disks *
  With scsi-mq disabled:   1755 disks

  * = new result

From my point of view, this is a good result, but you should be warned
that I don't fully understand what's going on here and I may have made
obvious or not-so-obvious mistakes.

I tested the performance impact and it's not noticeable in the
libguestfs case even with very small cmd_per_lun settings, but
libguestfs is largely serial and so this result won't be applicable to
guests in general.

Also, should the guest kernel validate cmd_per_lun to make sure it's
not too small or large? And if so, what would the limits be?

Rich.

From e923e49846189b2f55f3f02b70a290d4be237ed5 Mon Sep 17 00:00:00 2001
From: "Richard W.M. Jones" <rjo...@redhat.com>
Date: Thu, 10 Aug 2017 12:21:47 +0100
Subject: [PATCH] scsi: virtio_scsi: Set can_queue based on cmd_per_lun passed
 by hypervisor.

Signed-off-by: Richard W.M. Jones <rjo...@redhat.com>
---
 drivers/scsi/virtio_scsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 9be211d68b15..b22591e9b16b 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -973,7 +973,7 @@ static int virtscsi_probe(struct virtio_device *vdev)
 		goto virtscsi_init_failed;
 
 	cmd_per_lun = virtscsi_config_get(vdev, cmd_per_lun) ?: 1;
-	shost->cmd_per_lun = min_t(u32, cmd_per_lun, shost->can_queue);
+	shost->cmd_per_lun = shost->can_queue = cmd_per_lun;
 	shost->max_sectors = virtscsi_config_get(vdev, max_sectors) ?: 0x;
 
 	/* LUNs > 256 are reported with format 1, so they go in the range
-- 
2.13.1
Increased memory usage with scsi-mq
https://bugzilla.redhat.com/show_bug.cgi?id=1478201

We have a libguestfs test which adds 256 virtio-scsi disks to a qemu
virtual machine. The VM has 500 MB of RAM, 1 vCPU and no swap. This
test has been failing for a little while. It runs out of memory during
SCSI enumeration in early boot.

Tonight I bisected the cause to:

  5c279bd9e40624f4ab6e688671026d6005b066fa is the first bad commit
  commit 5c279bd9e40624f4ab6e688671026d6005b066fa
  Author: Christoph Hellwig
  Date:   Fri Jun 16 10:27:55 2017 +0200

      scsi: default to scsi-mq

      Remove the SCSI_MQ_DEFAULT config option and default to the blk-mq
      I/O path now that we had plenty of testing, and have I/O schedulers
      for blk-mq. The module option to disable the blk-mq path is kept
      around for now.

      Signed-off-by: Christoph Hellwig
      Signed-off-by: Martin K. Petersen

  :04 04 57ec7d5d2ba76592a695f533a69f747700c31966 c79f6ecb070acc4fadf6fc05ca9ba32bc9c0c665 M	drivers

I also wrote a small test to see the maximum number of virtio-scsi
disks I could add to the above VM. The results were very surprising
(to me anyhow):

  With scsi-mq enabled:    175 disks
  With scsi-mq disabled:  1755 disks

I don't know why the ratio is almost exactly 10 times.

I read your slides about scsi-mq and it seems like a significant
benefit to large machines, but could the out of the box defaults be
made more friendly for small memory machines?

Thanks, Rich.
Re: Increased memory usage with scsi-mq
On Sat, Aug 05, 2017 at 10:44:36AM +0200, Christoph Hellwig wrote:
> On Fri, Aug 04, 2017 at 10:00:47PM +0100, Richard W.M. Jones wrote:
> > I read your slides about scsi-mq and it seems like a significant
> > benefit to large machines, but could the out of the box defaults be
> > made more friendly for small memory machines?
>
> The default number of queues and queue depth, and thus memory usage,
> is set by the LLDD.
>
> Try to reduce the can_queue value in virtio_scsi and/or make sure
> you use the single queue variant in your VM (which should be tunable
> in qemu).

Thanks, this is interesting. Virtio-scsi seems to have a few settable
parameters that might be related to this:

  DEFINE_PROP_UINT32("num_queues", VirtIOSCSI, parent_obj.conf.num_queues, 1),
  DEFINE_PROP_UINT32("max_sectors", VirtIOSCSI, parent_obj.conf.max_sectors, 0xFFFF),
  DEFINE_PROP_UINT32("cmd_per_lun", VirtIOSCSI, parent_obj.conf.cmd_per_lun, 128),

Unfortunately (assuming I'm setting them right - see below), none of
them has any effect on the number of disks that I can add to the VM.
I am testing them by placing them in the ‘-device virtio-scsi-pci’
parameter, ie. as a property of the controller, not a property of the
LUN, eg:

  -device virtio-scsi-pci,cmd_per_lun=32,id=scsi \
  -drive file=/home/rjones/d/libguestfs/tmp/libguestfshXImTv/scratch.1,cache=unsafe,format=raw,id=hd0,if=none \
  -device scsi-hd,drive=hd0 \

The debugging output is a bit too large to attach to this email, but I
have placed it at the link below. It contains (if you scroll down a
bit) the full qemu command line and full kernel output.

http://oirase.annexia.org/tmp/bz1478201-log.txt

I can add some extra debugging into the kernel if you like. Just
point me to the right place.

Rich.
Re: Increased memory usage with scsi-mq
On Mon, Aug 07, 2017 at 02:11:39PM +0200, Paolo Bonzini wrote:
> You could also add a module parameter to the driver, and set it to 64 on
> the kernel command line (there is an example in
> drivers/scsi/vmw_pvscsi.c of how to do it).

[Proviso: I've not tested the performance of different values, nor do
I have any particular knowledge in this area]

can_queue is documented as:

     * This determines if we will use a non-interrupt driven
     * or an interrupt driven scheme.  It is set to the maximum number
     * of simultaneous commands a given host adapter will accept.

Wouldn't it be better to make it default to k * number of CPUs, for
some small integer k? I looked at the other SCSI drivers and they set
it to all kinds of values: 1, small integers, large integers,
configurable values.

Also I noticed this code in virtio_scsi.c:

  cmd_per_lun = virtscsi_config_get(vdev, cmd_per_lun) ?: 1;
  shost->cmd_per_lun = min_t(u32, cmd_per_lun, shost->can_queue);

but setting cmd_per_lun (as a qemu virtio-scsi-pci parameter) didn't
seem to make any difference to memory usage.

Rich.
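One plausible reading of that code, sketched below (my interpretation, not confirmed in the thread): the clamp only caps cmd_per_lun at can_queue and never lowers can_queue itself, and if the preallocated request/tag state is sized by can_queue per hardware queue, then tuning qemu's cmd_per_lun property alone would leave memory usage unchanged - which matches what was observed.

```python
CAN_QUEUE = 1024  # virtio-scsi scsi_host_template default

def effective_cmd_per_lun(cmd_per_lun_cfg):
    # mirrors: cmd_per_lun = virtscsi_config_get(vdev, cmd_per_lun) ?: 1;
    #          shost->cmd_per_lun = min_t(u32, cmd_per_lun, shost->can_queue);
    return min(cmd_per_lun_cfg or 1, CAN_QUEUE)

def preallocated_requests(nr_hw_queues):
    # assumption: per-host request state is sized by can_queue per hw
    # queue, independent of cmd_per_lun
    return CAN_QUEUE * nr_hw_queues

for cfg in (128, 32, 1):
    # cmd_per_lun moves, but the preallocation term does not
    print(f"cmd_per_lun={cfg}: effective={effective_cmd_per_lun(cfg)}, "
          f"preallocated={preallocated_requests(1)}")
```

If that model is right, only reducing can_queue (or the number of hardware queues) can shrink the preallocation, which is what Christoph's testing patch later in the thread does.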
Re: Increased memory usage with scsi-mq
On Sat, Aug 05, 2017 at 03:39:54PM +0200, Christoph Hellwig wrote:
> For now can you apply this testing patch to the guest kernel?
>
> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> index 9be211d68b15..0cbe2c882e1c 100644
> --- a/drivers/scsi/virtio_scsi.c
> +++ b/drivers/scsi/virtio_scsi.c
> @@ -818,7 +818,7 @@ static struct scsi_host_template virtscsi_host_template_single = {
> 	.eh_timed_out = virtscsi_eh_timed_out,
> 	.slave_alloc = virtscsi_device_alloc,
>
> -	.can_queue = 1024,
> +	.can_queue = 64,
> 	.dma_boundary = UINT_MAX,
> 	.use_clustering = ENABLE_CLUSTERING,
> 	.target_alloc = virtscsi_target_alloc,
> @@ -839,7 +839,7 @@ static struct scsi_host_template virtscsi_host_template_multi = {
> 	.eh_timed_out = virtscsi_eh_timed_out,
> 	.slave_alloc = virtscsi_device_alloc,
>
> -	.can_queue = 1024,
> +	.can_queue = 64,
> 	.dma_boundary = UINT_MAX,
> 	.use_clustering = ENABLE_CLUSTERING,
> 	.target_alloc = virtscsi_target_alloc,
> @@ -983,7 +983,7 @@ static int virtscsi_probe(struct virtio_device *vdev)
> 	shost->max_id = num_targets;
> 	shost->max_channel = 0;
> 	shost->max_cmd_len = VIRTIO_SCSI_CDB_SIZE;
> -	shost->nr_hw_queues = num_queues;
> +	shost->nr_hw_queues = 1;
>
> #ifdef CONFIG_BLK_DEV_INTEGRITY
> 	if (virtio_has_feature(vdev, VIRTIO_SCSI_F_T10_PI)) {

Yes, that's an improvement, although it's still a little way off the
density possible the old way:

    With scsi-mq enabled:   175 disks
  * With this patch:        319 disks *
    With scsi-mq disabled: 1755 disks

Also only the first two hunks are necessary. The kernel behaves
exactly the same way with or without the third hunk (ie. num_queues
must already be 1).

Can I infer from this that qemu needs a way to specify the can_queue
setting to the virtio-scsi driver in the guest kernel?

Rich.