[PATCH 0/2] migrate inflight emulated SCSI request for the scsi disk device
This patchset refines the comments of the previous series:
https://patchew.org/QEMU/cover.1712577715.git.yong.hu...@smartx.com/
The aim is to make the review easier. Please review, thanks.

Yong

When designing the USB mass storage device model, QEMU places the SCSI disk device as the backend of the USB mass storage device. In addition, the USB mass storage driver in the guest OS conforms to the "Universal Serial Bus Mass Storage Class Bulk-Only Transport" specification, which defines the transfers between a USB controller and a USB mass storage device. The following shows the protocol hierarchy:

  +----------------+                            +----------------+
  | CDROM driver   |        scsi command        | CDROM          |
  +----------------+                            +----------------+
  | USB mass       |   USB Mass Storage Class   | USB mass       |
  | storage driver |    Bulk-Only Transport     | storage device |
  +----------------+                            +----------------+
  | USB Controller |        USB Protocol        | USB device     |
  +----------------+                            +----------------+

In the USB protocol layer, at least two USB packets are transferred between the USB controller and the USB device when the guest OS sends a read operation to the USB mass storage device:

1. The CBW packet, which is delivered to the USB device's Bulk-Out endpoint. To emulate the read operation, the USB mass storage device parses the CBW and converts it into a SCSI command, which is executed by the CDROM (represented internally in QEMU as a SCSI disk); the result data of the SCSI command is stored in a buffer.

2. The DATA-IN packet, which is delivered from the USB device's Bulk-In endpoint (fetched directly from the preceding buffer) to the USB controller.

We consider UHCI to be the controller. The two packets mentioned above may be processed by UHCI in two separate frame entries of the Frame List, and are described by two different TDs. Unlike a physical environment, a virtualized environment requires QEMU to make sure that the result data of the CBW is not lost and is delivered to the UHCI controller.
Currently, these types of SCSI requests are not migrated, so QEMU cannot ensure that the result data of the I/O operation is not lost if there are inflight emulated SCSI requests during live migration.

Assume that the USB mass storage device is processing the CBW and storing the result data of the read operation in a buffer when live migration happens and moves the VM to the destination without migrating that result data. After migration, when UHCI at the destination issues a DATA-IN request to the USB mass storage device, a crash happens because the USB mass storage device fetches the result data and gets nothing.

This series addresses exactly this scenario. Theoretically, any device that uses the SCSI disk as a backend would be affected by this issue; in this case, it is the USB CDROM.

To fix it, inflight emulated SCSI requests are migrated during live migration, similar to DMA SCSI requests.

Hyman Huang (2):
  scsi-disk: Introduce the migrate_emulate_scsi_request field
  scsi-disk: Fix crash for VM configured with USB CDROM after live migration

 hw/scsi/scsi-disk.c | 35 ++-
 1 file changed, 34 insertions(+), 1 deletion(-)

--
2.39.3
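The two-phase flow and the failure mode can be illustrated with a small self-contained model. It is a minimal sketch, not QEMU code: ToyMsd, ToyRequest and the toy_* functions are hypothetical. Phase 1 (CBW) buffers the result in the request, phase 2 (DATA-IN) copies it out, and the toy migration always recreates the request on the destination but carries its buffered data only when asked to.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one in-flight emulated SCSI request. */
typedef struct {
    uint32_t tag;      /* CBW tag, e.g. 0x472 */
    uint8_t buf[8];    /* result data produced while handling the CBW */
    size_t len;        /* valid bytes in buf; 0 means the data is gone */
} ToyRequest;

/* Hypothetical model of the USB mass storage device state. */
typedef struct {
    ToyRequest *req;   /* NULL when no request is in flight */
} ToyMsd;

/* Phase 1 (CBW on Bulk-Out): emulate the command and buffer the result. */
static void toy_handle_cbw(ToyMsd *dev, ToyRequest *req, uint32_t tag,
                           const uint8_t *result, size_t len)
{
    if (len > sizeof(req->buf)) {
        len = sizeof(req->buf);
    }
    req->tag = tag;
    memcpy(req->buf, result, len);
    req->len = len;
    dev->req = req;
}

/* Phase 2 (DATA-IN on Bulk-In): hand the buffered result to the host.
 * Returns the byte count, or -1 when the result data is missing. */
static int toy_handle_data_in(const ToyMsd *dev, uint8_t *out, size_t want)
{
    if (dev->req == NULL || dev->req->len < want) {
        return -1;
    }
    memcpy(out, dev->req->buf, want);
    return (int)want;
}

/* Toy live migration: the request itself (its tag) always reaches the
 * destination, but its result data only does when migrate_data is true. */
static void toy_migrate(const ToyMsd *src, ToyMsd *dst, ToyRequest *dst_req,
                        bool migrate_data)
{
    dst->req = NULL;
    if (src->req == NULL) {
        return;
    }
    dst_req->tag = src->req->tag;
    dst_req->len = 0;
    if (migrate_data) {
        memcpy(dst_req->buf, src->req->buf, src->req->len);
        dst_req->len = src->req->len;
    }
    dst->req = dst_req;
}
```

The toy reports the missing data by returning -1; QEMU itself has no such check on this path, and the reported symptom is a copy from a NULL buffer on the destination.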
[PATCH 1/2] scsi-disk: Introduce the migrate_emulate_scsi_request field
To indicate to the destination whether or not emulated SCSI requests are sent, introduce the migrate_emulate_scsi_request field in struct SCSIDiskState. It is intended to preserve migration backward compatibility.

This commit sets the stage for the next one, which addresses the crash of a VM configured with a CDROM during live migration.

Signed-off-by: Hyman Huang
Message-Id: <2da3a08785453478079cfd46d8293ee68d284391.1712577715.git.yong.hu...@smartx.com>
---
 hw/scsi/scsi-disk.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 4bd7af9d0c..0985676f73 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -111,6 +111,7 @@ struct SCSIDiskState {
      * 0x- reserved
      */
     uint16_t rotation_rate;
+    bool migrate_emulate_scsi_request;
 };

 static void scsi_free_request(SCSIRequest *req)
@@ -3133,11 +3134,21 @@ static Property scsi_hd_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };

+static int scsi_disk_pre_save(void *opaque)
+{
+    SCSIDiskState *dev = opaque;
+    dev->migrate_emulate_scsi_request = false;
+
+    return 0;
+}
+
 static const VMStateDescription vmstate_scsi_disk_state = {
     .name = "scsi-disk",
-    .version_id = 1,
+    .version_id = 2,
     .minimum_version_id = 1,
+    .pre_save = scsi_disk_pre_save,
     .fields = (const VMStateField[]) {
+        VMSTATE_BOOL_V(migrate_emulate_scsi_request, SCSIDiskState, 2),
         VMSTATE_SCSI_DEVICE(qdev, SCSIDiskState),
         VMSTATE_BOOL(media_changed, SCSIDiskState),
         VMSTATE_BOOL(media_event, SCSIDiskState),
--
2.39.3
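The effect of VMSTATE_BOOL_V(..., 2) combined with .version_id = 2 and .minimum_version_id = 1 can be sketched with a toy loader. This assumes a made-up byte layout (a version byte followed by the fields), not QEMU's real vmstate wire format, and all names are hypothetical. A field tagged with version 2 is read only from streams of version 2 or later; a stream from a version 1 source simply lacks it, so the destination keeps its initial value (false).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the scsi-disk migration state. */
typedef struct {
    bool migrate_emulate_scsi_request;  /* in the stream from v2 on */
    bool media_changed;                 /* in the stream since v1 */
} ToyDiskState;

enum { TOY_VERSION_ID = 2, TOY_MINIMUM_VERSION_ID = 1 };

/* Save: emit the current version, then every field it defines. */
static size_t toy_save(const ToyDiskState *s, uint8_t *out)
{
    size_t n = 0;
    out[n++] = TOY_VERSION_ID;
    out[n++] = s->migrate_emulate_scsi_request;
    out[n++] = s->media_changed;
    return n;
}

/* Load: the v2-only field is read only from streams of version >= 2; for
 * a v1 stream it is left untouched and keeps the destination's value. */
static int toy_load(ToyDiskState *s, const uint8_t *in)
{
    size_t n = 0;
    int version = in[n++];

    if (version < TOY_MINIMUM_VERSION_ID || version > TOY_VERSION_ID) {
        return -1;
    }
    if (version >= 2) {
        s->migrate_emulate_scsi_request = in[n++] != 0;
    }
    s->media_changed = in[n++] != 0;
    return 0;
}
```

In this patch the source's pre_save hook always writes false, so a destination never expects request payloads yet; the next patch flips that value.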
[PATCH 2/2] scsi-disk: Fix crash for VM configured with USB CDROM after live migration
When designing the USB mass storage device model, QEMU places the SCSI disk device as the backend of the USB mass storage device. In addition, the USB mass storage driver in the guest OS conforms to the "Universal Serial Bus Mass Storage Class Bulk-Only Transport" specification, which defines the transfers between a USB controller and a USB mass storage device. The following shows the protocol hierarchy:

  +----------------+                            +----------------+
  | CDROM driver   |        scsi command        | CDROM          |
  +----------------+                            +----------------+
  | USB mass       |   USB Mass Storage Class   | USB mass       |
  | storage driver |    Bulk-Only Transport     | storage device |
  +----------------+                            +----------------+
  | USB Controller |        USB Protocol        | USB device     |
  +----------------+                            +----------------+

In the USB protocol layer, at least two USB packets are transferred between the USB controller and the USB device when the guest OS sends a read operation to the USB mass storage device:

1. The CBW packet, which is delivered to the USB device's Bulk-Out endpoint. To emulate the read operation, the USB mass storage device parses the CBW and converts it into a SCSI command, which is executed by the CDROM (represented internally in QEMU as a SCSI disk); the result data of the SCSI command is stored in a buffer.

2. The DATA-IN packet, which is delivered from the USB device's Bulk-In endpoint (fetched directly from the preceding buffer) to the USB controller.

We consider UHCI to be the controller. The two packets mentioned above may be processed by UHCI in two separate frame entries of the Frame List, and are described by two different TDs. Unlike a physical environment, a virtualized environment requires QEMU to make sure that the result data of the CBW is not lost and is delivered to the UHCI controller.

Currently, these types of SCSI requests are not migrated, so QEMU cannot ensure that the result data of the I/O operation is not lost if there are inflight emulated SCSI requests during live migration.
Assume that the USB mass storage device is processing the CBW and storing the result data of the read operation in a buffer when live migration happens and moves the VM to the destination without migrating that result data. After migration, when UHCI at the destination issues a DATA-IN request to the USB mass storage device, a crash happens because the USB mass storage device fetches the result data and gets nothing.

This patch addresses exactly this scenario. Theoretically, any device that uses the SCSI disk as a backend would be affected by this issue; in this case, it is the USB CDROM.

To fix it, migrate inflight emulated SCSI requests during live migration, similar to DMA SCSI requests.

Signed-off-by: Hyman Huang
---
 hw/scsi/scsi-disk.c | 24 +++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 0985676f73..d6e9d9e8d4 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -160,6 +160,16 @@ static void scsi_disk_save_request(QEMUFile *f, SCSIRequest *req)
     }
 }

+static void scsi_disk_emulate_save_request(QEMUFile *f, SCSIRequest *req)
+{
+    SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
+    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+
+    if (s->migrate_emulate_scsi_request) {
+        scsi_disk_save_request(f, req);
+    }
+}
+
 static void scsi_disk_load_request(QEMUFile *f, SCSIRequest *req)
 {
     SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
@@ -183,6 +193,16 @@ static void scsi_disk_load_request(QEMUFile *f, SCSIRequest *req)
     qemu_iovec_init_external(&r->qiov, &r->iov, 1);
 }

+static void scsi_disk_emulate_load_request(QEMUFile *f, SCSIRequest *req)
+{
+    SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
+    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+
+    if (s->migrate_emulate_scsi_request) {
+        scsi_disk_load_request(f, req);
+    }
+}
+
 /*
  * scsi_handle_rw_error has two return values. False means that the error
  * must be ignored, true means that the error has been processed and the
@@ -2593,6 +2613,8 @@ static const SCSIReqOps scsi_disk_emulate_reqops = {
     .read_data    = scsi_disk_emulate_read_data,
     .write_data   = scsi_disk_emulate_write_data,
     .get_buf      = scsi_get_buf,
+    .load_request = scsi_disk_emulate_load_request,
+    .save_request = scsi_disk_emulate_save_request,
 };

 static const SCSIReqOps scsi_disk_dma_reqops = {
@@ -3137,7 +3159,7 @@ static Property scsi_hd_properties[] = {
 static int scsi_disk_pre_save(void *opaque)
 {
     SCSIDiskState *dev = opaque;
-    dev->migrate_emulate_scsi_request = false;
+    dev->migrate_emulate_scsi_request = true;

     return 0;
 }
--
2.39.3
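The save and load hooks added here must make the same decision, or the migration stream falls out of step. A minimal sketch of that contract follows, with a hypothetical ToyFile standing in for QEMUFile and a made-up request layout. It also mirrors a detail of the previous patch: migrate_emulate_scsi_request is listed before VMSTATE_SCSI_DEVICE(qdev, SCSIDiskState), which appears to ensure the flag is restored before the in-flight requests whose format depends on it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte stream standing in for a QEMUFile. */
typedef struct {
    uint8_t bytes[64];
    size_t pos;
} ToyFile;

/* Hypothetical in-flight request carrying emulated result data. */
typedef struct {
    uint8_t data[8];
    uint8_t len;
} ToyReq;

static void toy_put(ToyFile *f, uint8_t v) { f->bytes[f->pos++] = v; }
static uint8_t toy_get(ToyFile *f) { return f->bytes[f->pos++]; }

/* Same shape as scsi_disk_emulate_save_request(): write the request
 * payload only when the device-level flag says it is being migrated. */
static void toy_emulate_save(ToyFile *f, bool flag, const ToyReq *r)
{
    if (flag) {
        uint8_t len = r->len <= sizeof(r->data) ? r->len
                                                : (uint8_t)sizeof(r->data);
        toy_put(f, len);
        for (uint8_t i = 0; i < len; i++) {
            toy_put(f, r->data[i]);
        }
    }
}

/* Counterpart of scsi_disk_emulate_load_request(); -1 on a bad length. */
static int toy_emulate_load(ToyFile *f, bool flag, ToyReq *r)
{
    r->len = 0;
    if (!flag) {
        return 0;
    }
    uint8_t len = toy_get(f);
    if (len > sizeof(r->data)) {
        return -1;
    }
    for (uint8_t i = 0; i < len; i++) {
        r->data[i] = toy_get(f);
    }
    r->len = len;
    return 0;
}

/* The flag precedes the request, so the loader knows the request
 * format before it reaches the request. */
static void toy_save_device(ToyFile *f, bool flag, const ToyReq *r)
{
    toy_put(f, flag);
    toy_emulate_save(f, flag, r);
}

static int toy_load_device(ToyFile *f, ToyReq *r)
{
    bool flag = toy_get(f) != 0;
    return toy_emulate_load(f, flag, r);
}
```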
[PATCH RESEND 1/2] scsi-disk: Introduce the migrate_emulate_scsi_request field
To indicate to the destination whether or not emulated SCSI requests are sent, introduce the migrate_emulate_scsi_request field in struct SCSIDiskState. It is intended to preserve migration backward compatibility.

This commit sets the stage for the next one, which addresses the crash of a VM configured with a CDROM during live migration.

Signed-off-by: Hyman Huang
---
 hw/scsi/scsi-disk.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 4bd7af9d0c..0985676f73 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -111,6 +111,7 @@ struct SCSIDiskState {
      * 0x- reserved
      */
     uint16_t rotation_rate;
+    bool migrate_emulate_scsi_request;
 };

 static void scsi_free_request(SCSIRequest *req)
@@ -3133,11 +3134,21 @@ static Property scsi_hd_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };

+static int scsi_disk_pre_save(void *opaque)
+{
+    SCSIDiskState *dev = opaque;
+    dev->migrate_emulate_scsi_request = false;
+
+    return 0;
+}
+
 static const VMStateDescription vmstate_scsi_disk_state = {
     .name = "scsi-disk",
-    .version_id = 1,
+    .version_id = 2,
     .minimum_version_id = 1,
+    .pre_save = scsi_disk_pre_save,
     .fields = (const VMStateField[]) {
+        VMSTATE_BOOL_V(migrate_emulate_scsi_request, SCSIDiskState, 2),
         VMSTATE_SCSI_DEVICE(qdev, SCSIDiskState),
         VMSTATE_BOOL(media_changed, SCSIDiskState),
         VMSTATE_BOOL(media_event, SCSIDiskState),
--
2.39.3
[PATCH RESEND 0/2] Fix crash of VMs configured with the CDROM device
This patchset fixes the crash, on the destination, of VMs configured with the CDROM device during live migration. See the commit message for details.

The previous patchset did not show up at https://patchew.org/QEMU, so I am resending it to ensure the email reaches the inbox. Please review.

Yong

Hyman Huang (2):
  scsi-disk: Introduce the migrate_emulate_scsi_request field
  scsi-disk: Fix crash of VMs configured with the CDROM device

 hw/scsi/scsi-disk.c | 35 ++-
 1 file changed, 34 insertions(+), 1 deletion(-)

--
2.39.3
[PATCH RESEND 2/2] scsi-disk: Fix crash of VMs configured with the CDROM device
When configuring VMs with the CDROM device using the USB bus in libvirt, do as follows:

The destination QEMU process crashed, causing the VM migration to fail; the backtrace reveals the following:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  __memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:312
312             movq    -8(%rsi,%rdx), %rcx
[Current thread is 1 (Thread 0x7f0a9025fc00 (LWP 3286206))]
(gdb) bt
#0  __memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:312
#1  memcpy (__len=8, __src=, __dest=) at /usr/include/bits/string_fortified.h:34
#2  iov_from_buf_full (iov=, iov_cnt=, offset=, buf=0x0, bytes=bytes@entry=8) at ../util/iov.c:33
#3  iov_from_buf (bytes=8, buf=, offset=, iov_cnt=, iov=) at /usr/src/debug/qemu-6-6.2.0-75.7.oe1.smartx.git.40.x86_64/include/qemu/iov.h:49
#4  usb_packet_copy (p=p@entry=0x56066b2fb5a0, ptr=, bytes=bytes@entry=8) at ../hw/usb/core.c:636
#5  usb_msd_copy_data (s=s@entry=0x56066c62c770, p=p@entry=0x56066b2fb5a0) at ../hw/usb/dev-storage.c:186
#6  usb_msd_handle_data (dev=0x56066c62c770, p=0x56066b2fb5a0) at ../hw/usb/dev-storage.c:496
#7  usb_handle_packet (dev=0x56066c62c770, p=p@entry=0x56066b2fb5a0) at ../hw/usb/core.c:455
#8  uhci_handle_td (s=s@entry=0x56066bd5f210, q=0x56066bb7fbd0, q@entry=0x0, qh_addr=qh_addr@entry=902518530, td=td@entry=0x7fffe6e788f0, td_addr=, int_mask=int_mask@entry=0x7fffe6e788e4) at ../hw/usb/hcd-uhci.c:885
#9  uhci_process_frame (s=s@entry=0x56066bd5f210) at ../hw/usb/hcd-uhci.c:1061
#10 uhci_frame_timer (opaque=opaque@entry=0x56066bd5f210) at ../hw/usb/hcd-uhci.c:1159
#11 timerlist_run_timers (timer_list=0x56066af26bd0) at ../util/qemu-timer.c:642
#12 qemu_clock_run_timers (type=QEMU_CLOCK_VIRTUAL) at ../util/qemu-timer.c:656
#13 qemu_clock_run_all_timers () at ../util/qemu-timer.c:738
#14 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:542
#15 qemu_main_loop () at ../softmmu/runstate.c:739
#16 main (argc=, argv=, envp=) at ../softmmu/main.c:52
(gdb) frame 5
(gdb) p ((SCSIDiskReq *)s->req)->iov
$1 = {iov_base = 0x0, iov_len = 0}
(gdb) p/x s->req->tag
$2 = 0x472

In QEMU's implementation of the USB mass storage device, which backs a CDROM attached to a USB bus, the SCSI commands issued to the CDROM are carried as the payload of the USB protocol. In general, the USB controller processes SCSI commands in two stages. The first stage is sending the OUT USB packet that encapsulates the SCSI command; scsi-disk handles it by emulating the SCSI operation. The second stage is receiving the IN USB packet that contains the SCSI operation's output. The SCSI request tag tracks the request throughout the procedure.

Since QEMU does not migrate the in-flight SCSI request, the output of the SCSI command may be lost if live migration is initiated between these two stages.

In our scenario, the SCSI command is GET_EVENT_STATUS_NOTIFICATION. The QEMU log below shows the SCSI command being handled (first stage) on the source:

usb_packet_state_change bus 0, port 2, ep 2, packet 0x559f9ba14b00, state undef -> setup
usb_msd_cmd_submit lun 0, tag 0x472, flags 0x0080, len 10, data-len 8

After migration, the VM crashed as soon as the destination's UHCI controller began processing the remaining portion of the SCSI request (second stage). Here is the QEMU log:

usb_packet_state_change bus 0, port 2, ep 1, packet 0x56066b2fb5a0, state undef -> setup
usb_msd_data_in 8/8 (scsi 8)
shutting down, reason=crashed

To summarize, a SCSI request lost during live migration may cause a VM configured with a CDROM to crash. The simple approach is to migrate the SCSI request that scsi-disk is handling, if one actually exists.
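For reference, the first stage seen in the source-side log (tag 0x472, flags 0x0080, len 10, data-len 8) corresponds to a Bulk-Only Transport CBW carrying a 10-byte GET EVENT STATUS NOTIFICATION CDB. The sketch below assembles such a 31-byte CBW under stated assumptions: the toy_* names are hypothetical, and the Polled bit and the notification class byte in the CDB are example values, not what the guest actually sent.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_CBW_SIGNATURE 0x43425355u  /* bytes 'U' 'S' 'B' 'C' on the wire */
#define TOY_CBW_LEN 31
#define TOY_GESN_OPCODE 0x4A           /* GET EVENT STATUS NOTIFICATION */

static void toy_put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* 10-byte GESN CDB; the allocation length is big-endian in bytes 7-8.
 * The Polled bit and notification class are example values. */
static void toy_build_gesn_cdb(uint8_t cdb[10], uint16_t alloc_len)
{
    memset(cdb, 0, 10);
    cdb[0] = TOY_GESN_OPCODE;
    cdb[1] = 0x01;                       /* Polled */
    cdb[4] = 0x10;                       /* example: media class request */
    cdb[7] = (uint8_t)(alloc_len >> 8);
    cdb[8] = (uint8_t)alloc_len;
}

/* Serialize a Command Block Wrapper (multi-byte fields little-endian).
 * Returns TOY_CBW_LEN, or 0 if the CDB length is out of range. */
static size_t toy_build_cbw(uint8_t out[TOY_CBW_LEN], uint32_t tag,
                            uint32_t data_len, uint8_t flags, uint8_t lun,
                            const uint8_t *cdb, uint8_t cdb_len)
{
    if (cdb_len < 1 || cdb_len > 16) {
        return 0;
    }
    memset(out, 0, TOY_CBW_LEN);
    toy_put_le32(out + 0, TOY_CBW_SIGNATURE);  /* dCBWSignature */
    toy_put_le32(out + 4, tag);                /* dCBWTag */
    toy_put_le32(out + 8, data_len);           /* dCBWDataTransferLength */
    out[12] = flags;                           /* bmCBWFlags: 0x80 = Data-In */
    out[13] = lun & 0x0f;                      /* bCBWLUN */
    out[14] = cdb_len;                         /* bCBWCBLength */
    memcpy(out + 15, cdb, cdb_len);            /* CBWCB */
    return TOY_CBW_LEN;
}
```

Only this CBW was processed on the source before migration; the matching DATA-IN (ep 1) arrived on the destination, where the buffered 8-byte response no longer existed.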
Signed-off-by: Hyman Huang
---
 hw/scsi/scsi-disk.c | 24 +++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 0985676f73..d6e9d9e8d4 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -160,6 +160,16 @@ static void scsi_disk_save_request(QEMUFile *f, SCSIRequest *req)
     }
 }

+static void scsi_disk_emulate_save_request(QEMUFile *f, SCSIRequest *req)
+{
+    SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
+    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+
+    if (s->migrate_emulate_scsi_request) {
+        scsi_disk_save_request(f, req);
+    }
+}
+
 static void scsi_disk_load_request(QEMUFile *f, SCSIRequest *req)
 {
     SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
@@ -183,6 +193,16 @@ static void scsi_disk_load_request(QEMUFile *f, SCSIRequest *req)
     qemu_iovec_init_external(&r->qiov, &r->iov, 1);
 }

+static void scsi_disk_emulate_load_request(QEMUFile *f, SCSIRequest *req)
+{
+    SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
+    SCSIDiskStat
[PATCH 1/2] scsi-disk: Introduce the migrate_emulate_scsi_request field
To indicate to the destination whether or not emulated SCSI requests are sent, introduce the migrate_emulate_scsi_request field in struct SCSIDiskState. It is intended to preserve migration backward compatibility.

This commit sets the stage for the next one, which addresses the crash of a VM configured with a CDROM during live migration.

Signed-off-by: Hyman Huang
---
 hw/scsi/scsi-disk.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 4bd7af9d0c..0985676f73 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -111,6 +111,7 @@ struct SCSIDiskState {
      * 0x- reserved
      */
     uint16_t rotation_rate;
+    bool migrate_emulate_scsi_request;
 };

 static void scsi_free_request(SCSIRequest *req)
@@ -3133,11 +3134,21 @@ static Property scsi_hd_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };

+static int scsi_disk_pre_save(void *opaque)
+{
+    SCSIDiskState *dev = opaque;
+    dev->migrate_emulate_scsi_request = false;
+
+    return 0;
+}
+
 static const VMStateDescription vmstate_scsi_disk_state = {
     .name = "scsi-disk",
-    .version_id = 1,
+    .version_id = 2,
     .minimum_version_id = 1,
+    .pre_save = scsi_disk_pre_save,
     .fields = (const VMStateField[]) {
+        VMSTATE_BOOL_V(migrate_emulate_scsi_request, SCSIDiskState, 2),
         VMSTATE_SCSI_DEVICE(qdev, SCSIDiskState),
         VMSTATE_BOOL(media_changed, SCSIDiskState),
         VMSTATE_BOOL(media_event, SCSIDiskState),
--
2.39.3
[PATCH 2/2] scsi-disk: Fix the migration crash of the CDROM device with USB bus
When configuring VMs with the CDROM device using the USB bus in libvirt, do as follows:

The destination QEMU process crashed, causing the VM migration to fail; the backtrace reveals the following:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  __memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:312
312             movq    -8(%rsi,%rdx), %rcx
[Current thread is 1 (Thread 0x7f0a9025fc00 (LWP 3286206))]
(gdb) bt
#0  __memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:312
#1  memcpy (__len=8, __src=, __dest=) at /usr/include/bits/string_fortified.h:34
#2  iov_from_buf_full (iov=, iov_cnt=, offset=, buf=0x0, bytes=bytes@entry=8) at ../util/iov.c:33
#3  iov_from_buf (bytes=8, buf=, offset=, iov_cnt=, iov=) at /usr/src/debug/qemu-6-6.2.0-75.7.oe1.smartx.git.40.x86_64/include/qemu/iov.h:49
#4  usb_packet_copy (p=p@entry=0x56066b2fb5a0, ptr=, bytes=bytes@entry=8) at ../hw/usb/core.c:636
#5  usb_msd_copy_data (s=s@entry=0x56066c62c770, p=p@entry=0x56066b2fb5a0) at ../hw/usb/dev-storage.c:186
#6  usb_msd_handle_data (dev=0x56066c62c770, p=0x56066b2fb5a0) at ../hw/usb/dev-storage.c:496
#7  usb_handle_packet (dev=0x56066c62c770, p=p@entry=0x56066b2fb5a0) at ../hw/usb/core.c:455
#8  uhci_handle_td (s=s@entry=0x56066bd5f210, q=0x56066bb7fbd0, q@entry=0x0, qh_addr=qh_addr@entry=902518530, td=td@entry=0x7fffe6e788f0, td_addr=, int_mask=int_mask@entry=0x7fffe6e788e4) at ../hw/usb/hcd-uhci.c:885
#9  uhci_process_frame (s=s@entry=0x56066bd5f210) at ../hw/usb/hcd-uhci.c:1061
#10 uhci_frame_timer (opaque=opaque@entry=0x56066bd5f210) at ../hw/usb/hcd-uhci.c:1159
#11 timerlist_run_timers (timer_list=0x56066af26bd0) at ../util/qemu-timer.c:642
#12 qemu_clock_run_timers (type=QEMU_CLOCK_VIRTUAL) at ../util/qemu-timer.c:656
#13 qemu_clock_run_all_timers () at ../util/qemu-timer.c:738
#14 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:542
#15 qemu_main_loop () at ../softmmu/runstate.c:739
#16 main (argc=, argv=, envp=) at ../softmmu/main.c:52
(gdb) frame 5
(gdb) p ((SCSIDiskReq *)s->req)->iov
$1 = {iov_base = 0x0, iov_len = 0}
(gdb) p/x s->req->tag
$2 = 0x472

In QEMU's implementation of the USB mass storage device, which backs a CDROM attached to a USB bus, the SCSI commands issued to the CDROM are carried as the payload of the USB protocol. In general, the USB controller processes SCSI commands in two stages. The first stage is sending the OUT USB packet that encapsulates the SCSI command; scsi-disk handles it by emulating the SCSI operation. The second stage is receiving the IN USB packet that contains the SCSI operation's output. The SCSI request tag tracks the request throughout the procedure.

Since QEMU does not migrate the in-flight SCSI request, the output of the SCSI command may be lost if live migration is initiated between these two stages.

In our scenario, the SCSI command is GET_EVENT_STATUS_NOTIFICATION. The QEMU log below shows the SCSI command being handled (first stage) on the source:

usb_packet_state_change bus 0, port 2, ep 2, packet 0x559f9ba14b00, state undef -> setup
usb_msd_cmd_submit lun 0, tag 0x472, flags 0x0080, len 10, data-len 8

After migration, the VM crashed as soon as the destination's UHCI controller began processing the remaining portion of the SCSI request (second stage). Here is the QEMU log:

usb_packet_state_change bus 0, port 2, ep 1, packet 0x56066b2fb5a0, state undef -> setup
usb_msd_data_in 8/8 (scsi 8)
shutting down, reason=crashed

To summarize, a SCSI request lost during live migration may cause a VM configured with a CDROM to crash. The simple approach is to migrate the SCSI request that scsi-disk is handling, if one actually exists.
Signed-off-by: Hyman Huang
---
 hw/scsi/scsi-disk.c | 24 +++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 0985676f73..d6e9d9e8d4 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -160,6 +160,16 @@ static void scsi_disk_save_request(QEMUFile *f, SCSIRequest *req)
     }
 }

+static void scsi_disk_emulate_save_request(QEMUFile *f, SCSIRequest *req)
+{
+    SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
+    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+
+    if (s->migrate_emulate_scsi_request) {
+        scsi_disk_save_request(f, req);
+    }
+}
+
 static void scsi_disk_load_request(QEMUFile *f, SCSIRequest *req)
 {
     SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
@@ -183,6 +193,16 @@ static void scsi_disk_load_request(QEMUFile *f, SCSIRequest *req)
     qemu_iovec_init_external(&r->qiov, &r->iov, 1);
 }

+static void scsi_disk_emulate_load_request(QEMUFile *f, SCSIRequest *req)
+{
+    SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
+    SCSIDiskStat
[PATCH RFC 0/3] Support GM/T 0018-2012 cryptographic standard
This patchset introduces GM/T 0018-2012 as a crypto backend driver to be used for block encryption. Currently, only the SM4 cipher algorithm is supported.

GM/T 0018-2012 is a cryptographic standard issued by the State Cryptography Administration of China. Visit https://hbba.sacinfo.org.cn and search for GM/T 0018-2012 for a brief introduction.

The objective of the standard is to develop a uniform application interface standard for service-based cryptography devices under the public key cryptographic infrastructure application framework, and to call the cryptography device through this interface to provide basic cryptographic services for the upper layer.

For more information about the contents of the standard, download the specification from:
"https://github.com/guanzhi/GM-Standards/blob/master/GMT密码行标/ GMT 00018-2012 密码设备应用接口规范.pdf"

There are at least two benefits to doing this:

* Performance - using a cryptography device for block encryption offers an opportunity to improve I/O performance once the hardware is certified.
* Secrecy - hardware manufacturers may fortify cryptography equipment with security features, thus increasing the secrecy of block encryption.

The precise way that vendors implement the standard APIs for data encryption on the cryptographic device is independent of the GM/T 0018-2012 specification. Thus, a generic implementation is possible as long as developers meet the following conditions when enabling this functionality:

1. Rename the header file provided by the vendor to gmt-0018-2012.h and copy it to the /usr/include directory.
2. Rename the dynamic library provided by the vendor to libgmt_0018_2012.so and copy it to /usr/lib64, or to any directory the linker searches, before compiling QEMU.
3. Enable the crypto_gmt option when compiling QEMU to make the feature available.

These steps could be standardized if the hardware manufacturer offered a development package for GM/T 0018-2012; unfortunately, that has not happened yet. Consequently, developers who do not work with the vendor to obtain the cryptography device and its library may be unable to test this functionality, because the implementation depends on the cryptography device supplied by the hardware vendor.

We are therefore hesitant to submit this series for merging; as the RFC title suggests, we posted it to gather feedback. Any suggestions and feedback regarding this feature are welcome.

Hyman Huang (3):
  crypto: Introduce GM/T 0018-2012 cryptographic driver
  meson.build: Support GM/T 0018-2012 cryptographic standard
  crypto: Allow GM/T 0018-2012 to support SM4 cipher algorithm

 MAINTAINERS                   |   3 +-
 crypto/block-luks.c           |   4 +-
 crypto/cipher-gmt.c           | 263 ++
 crypto/cipher.c               |   6 +-
 crypto/cipherpriv.h           |   6 +
 crypto/meson.build            |   3 +
 meson.build                   |  30
 meson_options.txt             |   2 +
 scripts/meson-buildoptions.sh |   3 +
 9 files changed, 315 insertions(+), 5 deletions(-)
 create mode 100644 crypto/cipher-gmt.c

--
2.39.3
[PATCH RFC 1/3] crypto: Introduce GM/T 0018-2012 cryptographic driver
GM/T 0018-2012 is a cryptographic standard issued by the State Cryptography Administration of China. For more information about the standard, visit https://hbba.sacinfo.org.cn.

The objective of the standard is to develop a uniform application interface standard for service-based cryptography devices under the public key cryptographic infrastructure application framework, and to call the cryptography device through this interface to provide basic cryptographic services for the upper layer.

For more information about the contents of the standard, download the specification from:
"https://github.com/guanzhi/GM-Standards/blob/master/GMT密码行标/ GMT%200018-2012%20密码设备应用接口规范.pdf"

This patch implements the basic functions of the GM/T 0018-2012 standard. Currently, for block encryption, it supports only the SM4 cipher algorithm.

Signed-off-by: Hyman Huang
---
 MAINTAINERS         |   3 +-
 crypto/cipher-gmt.c | 263
 crypto/cipher.c     |   2 +
 crypto/cipherpriv.h |   6 +
 4 files changed, 273 insertions(+), 1 deletion(-)
 create mode 100644 crypto/cipher-gmt.c

diff --git a/MAINTAINERS b/MAINTAINERS
index a24c2b51b6..822726e9da 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3418,10 +3418,11 @@ F: migration/dirtyrate.c
 F: migration/dirtyrate.h
 F: include/sysemu/dirtyrate.h

-Detached LUKS header
+Detached LUKS header and GM/T 0018-2012 cryptography
 M: Hyman Huang
 S: Maintained
 F: tests/qemu-iotests/tests/luks-detached-header
+F: crypto/cipher-gmt.c

 D-Bus
 M: Marc-André Lureau
diff --git a/crypto/cipher-gmt.c b/crypto/cipher-gmt.c
new file mode 100644
index 00..40e32c114f
--- /dev/null
+++ b/crypto/cipher-gmt.c
@@ -0,0 +1,263 @@
+/*
+ * QEMU GM/T 0018-2012 cryptographic standard support
+ *
+ * Copyright (c) 2024 SmartX Inc
+ *
+ * Authors:
+ *    Hyman Huang
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * (at your option) any later version. See the COPYING file in the
+ * top-level directory.
+ */
+#include <gmt-0018-2012.h>
+
+#include "qemu/osdep.h"
+#include "qemu/thread.h"
+#include "qapi/error.h"
+#include "crypto/cipher.h"
+#include "cipherpriv.h"
+
+#include "qemu/error-report.h"
+
+typedef struct QCryptoGMT QCryptoGMT;
+
+struct QCryptoGMT {
+    QCryptoCipher base;
+
+    SGD_HANDLE session;
+    SGD_HANDLE key;
+    SGD_UINT32 alg;
+    unsigned char iv[16]; /* not used for SM4 algo currently */
+};
+
+typedef struct QCryptoGMTDeviceInfo QCryptoGMTDeviceInfo;
+
+struct QCryptoGMTDeviceInfo {
+    SGD_HANDLE device;
+    struct DeviceInfo_st info;
+    bool opened;
+    gint ref_count;
+};
+/*
+ * It is advised to use numerous sessions with one open device
+ * as opposed to single sessions with several devices.
+ */
+static QCryptoGMTDeviceInfo gmt_device;
+/* Protect the gmt_device */
+static QemuMutex gmt_device_mutex;
+
+static const struct QCryptoCipherDriver qcrypto_cipher_gmt_driver;
+
+static void gmt_device_lock(void)
+{
+    qemu_mutex_lock(&gmt_device_mutex);
+}
+
+static void gmt_device_unlock(void)
+{
+    qemu_mutex_unlock(&gmt_device_mutex);
+}
+
+static void
+__attribute__((__constructor__)) gmt_device_mutex_init(void)
+{
+    qemu_mutex_init(&gmt_device_mutex);
+}
+
+static void
+gmt_device_ref(void)
+{
+    g_assert(gmt_device.device != NULL);
+    g_atomic_int_inc(&gmt_device.ref_count);
+}
+
+static void
+gmt_device_unref(void)
+{
+    g_assert(gmt_device.device != NULL);
+    if (g_atomic_int_dec_and_test(&gmt_device.ref_count)) {
+        SDF_CloseDevice(gmt_device.device);
+        gmt_device.opened = false;
+        gmt_device.device = NULL;
+        memset(&gmt_device.info, 0, sizeof(struct DeviceInfo_st));
+    }
+}
+
+static bool
+qcrypto_gmt_cipher_supports(QCryptoCipherAlgorithm alg,
+                            QCryptoCipherMode mode)
+{
+    switch (alg) {
+    case QCRYPTO_CIPHER_ALG_SM4:
+        break;
+    default:
+        return false;
+    }
+
+    switch (mode) {
+    case QCRYPTO_CIPHER_MODE_ECB:
+        return true;
+    default:
+        return false;
+    }
+}
+
+QCryptoCipher *
+qcrypto_gmt_cipher_ctx_new(QCryptoCipherAlgorithm alg,
+                           QCryptoCipherMode mode,
+                           const uint8_t *key,
+                           size_t nkey,
+                           Error **errp)
+{
+    QCryptoGMT *gmt;
+    int rv;
+
+    if (!qcrypto_gmt_cipher_supports(alg, mode)) {
+        return NULL;
+    }
+
+    gmt = g_new0(QCryptoGMT, 1);
+    if (!gmt) {
+        return NULL;
+    }
+
+    switch (alg) {
+    case QCRYPTO_CIPHER_ALG_SM4:
+        gmt->alg = SGD_SM4_ECB;
+        break;
+    default:
+        return NULL;
+    }
+
+    gmt_device_lock();
+    if (!gmt_device.opened) {
+        rv = SDF_OpenDevice(&gmt_device.device);
+        if (rv != SDR_OK) {
+            info_report("Could not open encryption card device, disabling");
+            goto abort;
+        }
+        gmt_device.opened =
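The comment in cipher-gmt.c recommends many sessions on one open device. A minimal sketch of that lifetime, assuming POSIX threads and hypothetical stand-ins for the vendor's open and close calls (so it runs without any GM/T 0018-2012 library), keeps a single reference count under one mutex: the first user opens the device and the last user closes it. This is an illustrative variant, not the patch's code, which combines a QemuMutex with g_atomic_int_* operations.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the vendor's device open/close calls. */
static int toy_open_calls;
static int toy_close_calls;
static int toy_device_token;

static int toy_open_device(void **handle)
{
    *handle = &toy_device_token;
    toy_open_calls++;
    return 0;  /* 0 plays the role of SDR_OK */
}

static void toy_close_device(void *handle)
{
    (void)handle;
    toy_close_calls++;
}

/* One shared device, reference-counted under a single mutex. */
static struct {
    pthread_mutex_t lock;
    void *device;
    int ref_count;
} toy_shared = { PTHREAD_MUTEX_INITIALIZER, NULL, 0 };

/* Take a reference, opening the device for the first user.
 * Returns the shared handle, or NULL if opening failed. */
static void *toy_device_get(void)
{
    void *dev = NULL;

    pthread_mutex_lock(&toy_shared.lock);
    if (toy_shared.ref_count == 0 &&
        toy_open_device(&toy_shared.device) != 0) {
        toy_shared.device = NULL;
    } else {
        toy_shared.ref_count++;
        dev = toy_shared.device;
    }
    pthread_mutex_unlock(&toy_shared.lock);
    return dev;
}

/* Drop a reference, closing the device when the last user goes away. */
static void toy_device_put(void)
{
    pthread_mutex_lock(&toy_shared.lock);
    if (toy_shared.ref_count > 0 && --toy_shared.ref_count == 0) {
        toy_close_device(toy_shared.device);
        toy_shared.device = NULL;
    }
    pthread_mutex_unlock(&toy_shared.lock);
}
```

Keeping the open, the count and the close under the same lock avoids reasoning about how atomic counter updates interleave with the locked open path.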
[PATCH RFC 3/3] crypto: Allow GM/T 0018-2012 to support SM4 cipher algorithm
Since GM/T 0018-2012 support is probed using the SM4 cipher algorithm, allow the SM4 cipher algorithm to be used for block encryption when GM/T 0018-2012 is enabled.

Signed-off-by: Hyman Huang
---
 crypto/block-luks.c | 4 ++--
 crypto/cipher.c     | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index 3ee928fb5a..f4101fd435 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -95,7 +95,7 @@ qcrypto_block_luks_cipher_size_map_twofish[] = {
     { 0, 0 },
 };

-#ifdef CONFIG_CRYPTO_SM4
+#if defined CONFIG_CRYPTO_SM4 || defined CONFIG_GMT_0018_2012
 static const QCryptoBlockLUKSCipherSizeMap
 qcrypto_block_luks_cipher_size_map_sm4[] = {
     { 16, QCRYPTO_CIPHER_ALG_SM4},
@@ -109,7 +109,7 @@ qcrypto_block_luks_cipher_name_map[] = {
     { "cast5", qcrypto_block_luks_cipher_size_map_cast5 },
     { "serpent", qcrypto_block_luks_cipher_size_map_serpent },
     { "twofish", qcrypto_block_luks_cipher_size_map_twofish },
-#ifdef CONFIG_CRYPTO_SM4
+#if defined CONFIG_CRYPTO_SM4 || defined CONFIG_GMT_0018_2012
     { "sm4", qcrypto_block_luks_cipher_size_map_sm4},
 #endif
 };
diff --git a/crypto/cipher.c b/crypto/cipher.c
index 785f231948..5c2a620dcf 100644
--- a/crypto/cipher.c
+++ b/crypto/cipher.c
@@ -38,7 +38,7 @@ static const size_t alg_key_len[QCRYPTO_CIPHER_ALG__MAX] = {
     [QCRYPTO_CIPHER_ALG_TWOFISH_128] = 16,
     [QCRYPTO_CIPHER_ALG_TWOFISH_192] = 24,
     [QCRYPTO_CIPHER_ALG_TWOFISH_256] = 32,
-#ifdef CONFIG_CRYPTO_SM4
+#if defined CONFIG_CRYPTO_SM4 || defined CONFIG_GMT_0018_2012
     [QCRYPTO_CIPHER_ALG_SM4] = 16,
 #endif
 };
@@ -56,7 +56,7 @@ static const size_t alg_block_len[QCRYPTO_CIPHER_ALG__MAX] = {
     [QCRYPTO_CIPHER_ALG_TWOFISH_128] = 16,
     [QCRYPTO_CIPHER_ALG_TWOFISH_192] = 16,
     [QCRYPTO_CIPHER_ALG_TWOFISH_256] = 16,
-#ifdef CONFIG_CRYPTO_SM4
+#if defined CONFIG_CRYPTO_SM4 || defined CONFIG_GMT_0018_2012
     [QCRYPTO_CIPHER_ALG_SM4] = 16,
 #endif
 };
--
2.39.3
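With these tables, SM4 gets a 16-byte key and a 16-byte block whenever either backend is configured. Since qcrypto_gmt_cipher_supports() accepts only ECB mode, each 16-byte block is transformed independently with the same key, and input that is not a whole number of blocks has to be rejected. The sketch below shows that loop; toy_block_encrypt() is a hypothetical placeholder (an XOR with the key, not SM4 and not secure) standing in for the real block cipher call.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_BLOCK_LEN 16  /* SM4 block length, as in alg_block_len[] */

/* Hypothetical placeholder block transform: XOR with the key.
 * NOT SM4 and not secure; it only marks where a real block cipher
 * (for example a vendor encryption call) would run. */
static void toy_block_encrypt(const uint8_t key[TOY_BLOCK_LEN],
                              const uint8_t *in, uint8_t *out)
{
    for (size_t i = 0; i < TOY_BLOCK_LEN; i++) {
        out[i] = in[i] ^ key[i];
    }
}

/* ECB mode: every block is encrypted independently with the same key.
 * Returns 0 on success, -1 if len is not a whole number of blocks. */
static int toy_ecb_encrypt(const uint8_t key[TOY_BLOCK_LEN],
                           const uint8_t *in, uint8_t *out, size_t len)
{
    if (len % TOY_BLOCK_LEN != 0) {
        return -1;
    }
    for (size_t off = 0; off < len; off += TOY_BLOCK_LEN) {
        toy_block_encrypt(key, in + off, out + off);
    }
    return 0;
}
```

Because blocks are independent, identical plaintext blocks produce identical ciphertext blocks in ECB mode.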
[PATCH RFC 2/3] meson.build: Support GM/T 0018-2012 cryptographic standard
GM/T 0018-2012 is a cryptographic standard issued by the State Cryptography Administration of China. Implementations of the standard can provide symmetric cipher algorithms for block encryption. Currently only the SM4 cipher algorithm can be applied, so detect SM4 support via the GM/T 0018-2012 API and enable the feature if crypto-gmt is given explicitly. This feature is disabled by default.

Signed-off-by: Hyman Huang
---
 crypto/meson.build            |  3 +++
 meson.build                   | 30 ++
 meson_options.txt             |  2 ++
 scripts/meson-buildoptions.sh |  3 +++
 4 files changed, 38 insertions(+)

diff --git a/crypto/meson.build b/crypto/meson.build
index c46f9c22a7..dd49d03780 100644
--- a/crypto/meson.build
+++ b/crypto/meson.build
@@ -46,6 +46,9 @@ endif
 if have_afalg
   crypto_ss.add(if_true: files('afalg.c', 'cipher-afalg.c', 'hash-afalg.c'))
 endif
+if gmt_0018_2012.found()
+  crypto_ss.add(gmt_0018_2012, files('cipher-gmt.c'))
+endif

 system_ss.add(when: gnutls, if_true: files('tls-cipher-suites.c'))

diff --git a/meson.build b/meson.build
index c1dc83e4c0..cd188582b5 100644
--- a/meson.build
+++ b/meson.build
@@ -1693,6 +1693,34 @@ if not gnutls_crypto.found()
   endif
 endif

+if get_option('crypto_gmt').enabled() and get_option('crypto_afalg').enabled()
+  error('Only one of GM/T 0018-2012 & afalg can be enabled')
+endif
+
+gmt_0018_2012 = not_found
+if (not get_option('crypto_gmt').auto() or have_system)
+  gmt_0018_2012 = cc.find_library('gmt_0018_2012', has_headers: ['gmt-0018-2012.h'],
+                                  required: get_option('crypto_gmt'))
+  if gmt_0018_2012.found() and not cc.links('''
+    #include
+    #include
+    int main(void) {
+      unsigned char iv[16] = {0};
+      unsigned char plainData[16] = {0};
+      unsigned char cipherData[16] = {0};
+      unsigned int rlen;
+      SDF_Encrypt(NULL, NULL, SGD_SM4_ECB, iv, plainData, 16, cipherData, &rlen);
+      return 0;
+    }''', dependencies: gmt_0018_2012)
+    gmt_0018_2012 = not_found
+    if get_option('crypto_gmt').enabled()
+      error('could not link gmt_0018_2012')
+    else
+      warning('could not link gmt_0018_2012, disabling')
+    endif
+  endif
+endif
+
 capstone = not_found
 if not get_option('capstone').auto() or have_system or have_user
   capstone = dependency('capstone', version: '>=3.0.5',
@@ -2291,6 +2319,7 @@ config_host_data.set('CONFIG_GNUTLS_CRYPTO', gnutls_crypto.found())
 config_host_data.set('CONFIG_TASN1', tasn1.found())
 config_host_data.set('CONFIG_GCRYPT', gcrypt.found())
 config_host_data.set('CONFIG_NETTLE', nettle.found())
+config_host_data.set('CONFIG_GMT_0018_2012', gmt_0018_2012.found())
 config_host_data.set('CONFIG_CRYPTO_SM4', crypto_sm4.found())
 config_host_data.set('CONFIG_HOGWEED', hogweed.found())
 config_host_data.set('CONFIG_QEMU_PRIVATE_XTS', xts == 'private')
@@ -4333,6 +4362,7 @@ if nettle.found()
 endif
 summary_info += {'SM4 ALG support': crypto_sm4}
 summary_info += {'AF_ALG support':have_afalg}
+summary_info += {'GM/T 0018-2012 support': gmt_0018_2012.found()}
 summary_info += {'rng-none': get_option('rng_none')}
 summary_info += {'Linux keyring': have_keyring}
 summary_info += {'Linux keyutils':keyutils}
diff --git a/meson_options.txt b/meson_options.txt
index 0a99a059ec..4f35d3d62d 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -174,6 +174,8 @@ option('gcrypt', type : 'feature', value : 'auto',
        description: 'libgcrypt cryptography support')
 option('crypto_afalg', type : 'feature', value : 'disabled',
        description: 'Linux AF_ALG crypto backend driver')
+option('crypto_gmt', type : 'feature', value : 'disabled',
+       description: 'GM/T 0018-2012 cryptographic standard driver')
 option('libdaxctl', type : 'feature', value : 'auto',
        description: 'libdaxctl support')
 option('libpmem', type : 'feature', value : 'auto',
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 680fa3f581..e116e7b9ed 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -106,6 +106,7 @@ meson_options_help() {
   printf "%s\n" ' colo-proxy colo-proxy support'
   printf "%s\n" ' coreaudio CoreAudio sound support'
   printf "%s\n" '
crypto-afalgLinux AF_ALG crypto backend driver' + printf "%s\n" ' crypto-gmt GM/T 0018-2012 crypto backend driver' printf "%s\n" ' curlCURL block device driver' printf "%s\n" ' curses curses UI' printf "%s\n" ' dbus-display-display dbus support' @@ -282,6 +283,8 @@ _meson_option_parse() { --disable-coroutine-pool) printf "%s" -Dcoroutine_pool=false ;; --enable-crypto-afalg) printf "%s" -Dcrypto_afalg=enabled ;; --disable-crypto-afalg) printf "%s" -Dcrypto_afalg=disabled ;; +--enable-crypto-gmt) printf "%s&quo
[PATCH v4 1/3] qmp: Switch x-query-virtio-status back to numeric encoding
x-query-virtio-status returns several sets of virtio feature and status flags. It goes back to v7.2.0. In the initial commit 90c066cd682 (qmp: add QMP command x-query-virtio-status), we returned them as numbers, using virtio's well-known binary encoding. The next commit f3034ad71fc (qmp: decode feature & status bits in virtio-status) replaced the numbers by objects. The objects represent bits QEMU knows symbolically, and any unknown bits numerically just like before. Commit 8a8287981d1 (hmp: add virtio commands) added the matching HMP command "info virtio" (and a few more, which aren't relevant here). The symbolic representation uses lists of strings. The string format is undocumented. The strings look like "WELL_KNOWN_SYMBOL: human readable explanation". This symbolic representation is nice for humans. For machines, it can save the trouble of decoding virtio's well-known binary encoding. However, we sometimes want to compare features and status bits without caring for their exact meaning. Say we want to verify the correctness of the virtio negotiation between guest, QEMU, and OVS-DPDK. We can use the QMP command x-query-virtio-status to retrieve vhost-user net device features, and the "ovs-vsctl list interface" command to retrieve interface features. Without commit f3034ad71fc, we could then simply compare the numbers. With this commit, we first have to map from the strings back to the numeric encoding. Revert the decoding for QMP, but keep it for HMP. This makes the QMP command easier to use for use cases where we don't need to decode, like the comparison above. For use cases where we need to decode, we replace parsing undocumented strings by decoding virtio's well-known binary encoding. Incompatible change; acceptable because x-query-virtio-status comes without a stability promise.
Signed-off-by: Hyman Huang Acked-by: Markus Armbruster --- hw/virtio/virtio-hmp-cmds.c | 25 +++-- hw/virtio/virtio-qmp.c | 23 ++--- qapi/virtio.json| 192 3 files changed, 45 insertions(+), 195 deletions(-) diff --git a/hw/virtio/virtio-hmp-cmds.c b/hw/virtio/virtio-hmp-cmds.c index 477c97dea2..721c630ab0 100644 --- a/hw/virtio/virtio-hmp-cmds.c +++ b/hw/virtio/virtio-hmp-cmds.c @@ -6,6 +6,7 @@ */ #include "qemu/osdep.h" +#include "virtio-qmp.h" #include "monitor/hmp.h" #include "monitor/monitor.h" #include "qapi/qapi-commands-virtio.h" @@ -145,13 +146,17 @@ void hmp_virtio_status(Monitor *mon, const QDict *qdict) monitor_printf(mon, " endianness: %s\n", s->device_endian); monitor_printf(mon, " status:\n"); -hmp_virtio_dump_status(mon, s->status); +hmp_virtio_dump_status(mon, +qmp_decode_status(s->status)); monitor_printf(mon, " Guest features:\n"); -hmp_virtio_dump_features(mon, s->guest_features); +hmp_virtio_dump_features(mon, +qmp_decode_features(s->device_id, s->guest_features)); monitor_printf(mon, " Host features:\n"); -hmp_virtio_dump_features(mon, s->host_features); +hmp_virtio_dump_features(mon, +qmp_decode_features(s->device_id, s->host_features)); monitor_printf(mon, " Backend features:\n"); -hmp_virtio_dump_features(mon, s->backend_features); +hmp_virtio_dump_features(mon, +qmp_decode_features(s->device_id, s->backend_features)); if (s->vhost_dev) { monitor_printf(mon, " VHost:\n"); @@ -172,13 +177,17 @@ void hmp_virtio_status(Monitor *mon, const QDict *qdict) monitor_printf(mon, "log_size: %"PRId64"\n", s->vhost_dev->log_size); monitor_printf(mon, "Features:\n"); -hmp_virtio_dump_features(mon, s->vhost_dev->features); +hmp_virtio_dump_features(mon, +qmp_decode_features(s->device_id, s->vhost_dev->features)); monitor_printf(mon, "Acked features:\n"); -hmp_virtio_dump_features(mon, s->vhost_dev->acked_features); +hmp_virtio_dump_features(mon, +qmp_decode_features(s->device_id, s->vhost_dev->acked_features)); monitor_printf(mon, "Backend features:\n"); 
-hmp_virtio_dump_features(mon, s->vhost_dev->backend_features); +hmp_virtio_dump_features(mon, +qmp_decode_features(s->device_id, s->vhost_dev->backend_features)); monitor_printf(mon, "Protocol features:\n"); -hmp_virtio_dump_protocols(mon, s->vhost_dev->protocol_features); +hmp_virtio_dump_protocols(mon, +qmp_decode_protocols(s->vhost_dev->protocol_features)); } qapi_free_VirtioStatus(s); diff --git a/hw/virtio/virtio-qmp.c b/
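With the numeric encoding restored, the OVS-DPDK comparison described in the commit message above reduces to integer operations on two 64-bit bitmaps. A minimal sketch in C, assuming both feature sets have already been parsed into `uint64_t` values; `features_mismatch()` and `report_mismatch()` are hypothetical helpers, not QEMU or OVS APIs:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helpers (not QEMU or OVS APIs). Both inputs are assumed to
 * be 64-bit virtio feature bitmaps already obtained from
 * x-query-virtio-status and from "ovs-vsctl list interface".
 */

/* Bits that are set in exactly one of the two bitmaps. */
static uint64_t features_mismatch(uint64_t qemu_features, uint64_t ovs_features)
{
    return qemu_features ^ ovs_features;
}

/* Print each differing bit number to @out and return how many bits differ. */
static int report_mismatch(uint64_t qemu_features, uint64_t ovs_features,
                           FILE *out)
{
    uint64_t diff = features_mismatch(qemu_features, ovs_features);
    int count = 0;

    assert(out != NULL);
    for (int bit = 0; bit < 64; bit++) {
        if (diff & (UINT64_C(1) << bit)) {
            fprintf(out, "feature bit %d differs\n", bit);
            count++;
        }
    }
    return count;
}
```

No string parsing is involved: an XOR of the two bitmaps is zero exactly when both sides agree, and any set bit identifies a feature to investigate.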
[PATCH v4 2/3] virtio: Declare the decoding functions to static
qmp_decode_protocols(), qmp_decode_status(), and qmp_decode_features() are now only used in virtio-hmp-cmds.c. So move them there, declare them static, and replace the qmp_ prefix with hmp_. Signed-off-by: Hyman Huang --- hw/virtio/meson.build | 4 +- hw/virtio/virtio-hmp-cmds.c | 677 +++- hw/virtio/virtio-qmp.c | 661 --- hw/virtio/virtio-qmp.h | 3 - 4 files changed, 671 insertions(+), 674 deletions(-) diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build index d7f18c96e6..384fbf7e32 100644 --- a/hw/virtio/meson.build +++ b/hw/virtio/meson.build @@ -9,7 +9,7 @@ system_virtio_ss.add(when: 'CONFIG_VHOST_VDPA_DEV', if_true: files('vdpa-dev.c') specific_virtio_ss = ss.source_set() specific_virtio_ss.add(files('virtio.c')) -specific_virtio_ss.add(files('virtio-config-io.c', 'virtio-qmp.c')) +specific_virtio_ss.add(files('virtio-config-io.c', 'virtio-hmp-cmds.c')) if have_vhost system_virtio_ss.add(files('vhost.c')) @@ -87,7 +87,7 @@ specific_virtio_ss.add_all(when: 'CONFIG_VIRTIO_PCI', if_true: virtio_pci_ss) system_ss.add_all(when: 'CONFIG_VIRTIO', if_true: system_virtio_ss) system_ss.add(when: 'CONFIG_VIRTIO', if_false: files('vhost-stub.c')) system_ss.add(when: 'CONFIG_VIRTIO', if_false: files('virtio-stub.c')) -system_ss.add(files('virtio-hmp-cmds.c')) +system_ss.add(files('virtio-qmp.c')) specific_ss.add_all(when: 'CONFIG_VIRTIO', if_true: specific_virtio_ss) system_ss.add(when: 'CONFIG_ACPI', if_true: files('virtio-acpi.c')) diff --git a/hw/virtio/virtio-hmp-cmds.c b/hw/virtio/virtio-hmp-cmds.c index 721c630ab0..f95bad0069 100644 --- a/hw/virtio/virtio-hmp-cmds.c +++ b/hw/virtio/virtio-hmp-cmds.c @@ -11,7 +11,668 @@ #include "monitor/monitor.h" #include "qapi/qapi-commands-virtio.h" #include "qapi/qmp/qdict.h" +#include "hw/virtio/vhost-user.h" +#include "standard-headers/linux/virtio_ids.h" +#include "standard-headers/linux/vhost_types.h" +#include "standard-headers/linux/virtio_blk.h" +#include "standard-headers/linux/virtio_console.h" +#include
"standard-headers/linux/virtio_gpu.h" +#include "standard-headers/linux/virtio_net.h" +#include "standard-headers/linux/virtio_scsi.h" +#include "standard-headers/linux/virtio_i2c.h" +#include "standard-headers/linux/virtio_balloon.h" +#include "standard-headers/linux/virtio_iommu.h" +#include "standard-headers/linux/virtio_mem.h" +#include "standard-headers/linux/virtio_vsock.h" +#include "standard-headers/linux/virtio_gpio.h" + +#include CONFIG_DEVICES + +#define FEATURE_ENTRY(name, desc) (qmp_virtio_feature_map_t) \ +{ .virtio_bit = name, .feature_desc = desc } + +/* Virtio transport features mapping */ +static const qmp_virtio_feature_map_t virtio_transport_map[] = { +/* Virtio device transport features */ +#ifndef VIRTIO_CONFIG_NO_LEGACY +FEATURE_ENTRY(VIRTIO_F_NOTIFY_ON_EMPTY, \ +"VIRTIO_F_NOTIFY_ON_EMPTY: Notify when device runs out of avail. " +"descs. on VQ"), +FEATURE_ENTRY(VIRTIO_F_ANY_LAYOUT, \ +"VIRTIO_F_ANY_LAYOUT: Device accepts arbitrary desc. layouts"), +#endif /* !VIRTIO_CONFIG_NO_LEGACY */ +FEATURE_ENTRY(VIRTIO_F_VERSION_1, \ +"VIRTIO_F_VERSION_1: Device compliant for v1 spec (legacy)"), +FEATURE_ENTRY(VIRTIO_F_IOMMU_PLATFORM, \ +"VIRTIO_F_IOMMU_PLATFORM: Device can be used on IOMMU platform"), +FEATURE_ENTRY(VIRTIO_F_RING_PACKED, \ +"VIRTIO_F_RING_PACKED: Device supports packed VQ layout"), +FEATURE_ENTRY(VIRTIO_F_IN_ORDER, \ +"VIRTIO_F_IN_ORDER: Device uses buffers in same order as made " +"available by driver"), +FEATURE_ENTRY(VIRTIO_F_ORDER_PLATFORM, \ +"VIRTIO_F_ORDER_PLATFORM: Memory accesses ordered by platform"), +FEATURE_ENTRY(VIRTIO_F_SR_IOV, \ +"VIRTIO_F_SR_IOV: Device supports single root I/O virtualization"), +FEATURE_ENTRY(VIRTIO_F_RING_RESET, \ +"VIRTIO_F_RING_RESET: Driver can reset a queue individually"), +/* Virtio ring transport features */ +FEATURE_ENTRY(VIRTIO_RING_F_INDIRECT_DESC, \ +"VIRTIO_RING_F_INDIRECT_DESC: Indirect descriptors supported"), +FEATURE_ENTRY(VIRTIO_RING_F_EVENT_IDX, \ +"VIRTIO_RING_F_EVENT_IDX: Used & 
avail. event fields enabled"), +{ -1, "" } +}; + +/* Vhost-user protocol features mapping */ +static const qmp_virtio_feature_map_t vhost_user_protocol_map[] = { +FEATURE_ENTRY(VHOST_USER_PROTOCOL_F_MQ, \ +"VHOST_USER_PROTOCOL_F_MQ: Multiqueue protocol supported"), +FEATURE_ENTRY(V
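The maps moved into virtio-hmp-cmds.c pair each feature bit number with a description and end with a `{ -1, "" }` sentinel. The sketch below shows how such a table can turn a numeric bitmap into names while keeping the undecoded remainder, which is what HMP still needs now that QMP returns plain numbers. `FeatureName` and `decode_features()` are hypothetical stand-ins rather than QEMU's own definitions; the bit numbers are the ones the virtio specification assigns:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a feature map entry. */
typedef struct {
    int bit;            /* bit number; -1 terminates the table */
    const char *name;
} FeatureName;

/* A subset of virtio transport feature bits (numbers per the virtio spec). */
static const FeatureName transport_map[] = {
    { 24, "VIRTIO_F_NOTIFY_ON_EMPTY" },
    { 27, "VIRTIO_F_ANY_LAYOUT" },
    { 28, "VIRTIO_RING_F_INDIRECT_DESC" },
    { 29, "VIRTIO_RING_F_EVENT_IDX" },
    { 32, "VIRTIO_F_VERSION_1" },
    { 33, "VIRTIO_F_IOMMU_PLATFORM" },
    { 34, "VIRTIO_F_RING_PACKED" },
    { -1, "" }
};

/*
 * Print the name of every known bit set in @bitmap and return the bits
 * the table does not describe (the "unknown" remainder).
 */
static uint64_t decode_features(const FeatureName *map, uint64_t bitmap)
{
    assert(map != NULL);
    for (size_t i = 0; map[i].bit != -1; i++) {
        uint64_t mask = UINT64_C(1) << map[i].bit;

        if (bitmap & mask) {
            printf("  %s\n", map[i].name);
            bitmap &= ~mask;    /* clear the bit once it has been decoded */
        }
    }
    return bitmap;
}
```

Returning the leftover bits keeps the decoder lossless: a caller can print the names and then report whatever remains as an unknown bitmap.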
[PATCH v4 3/3] qapi: Define VhostDeviceProtocols and VirtioDeviceFeatures as plain C types
VhostDeviceProtocols and VirtioDeviceFeatures are only used in virtio-hmp-cmds.c. So define them as plain C types there, and drop them from the QAPI schema. Signed-off-by: Hyman Huang Reviewed-by: Markus Armbruster --- hw/virtio/virtio-hmp-cmds.c | 16 +++ qapi/virtio.json| 39 - 2 files changed, 16 insertions(+), 39 deletions(-) diff --git a/hw/virtio/virtio-hmp-cmds.c b/hw/virtio/virtio-hmp-cmds.c index f95bad0069..045b472228 100644 --- a/hw/virtio/virtio-hmp-cmds.c +++ b/hw/virtio/virtio-hmp-cmds.c @@ -29,6 +29,22 @@ #include CONFIG_DEVICES +typedef struct VhostDeviceProtocols VhostDeviceProtocols; +struct VhostDeviceProtocols { +strList *protocols; +bool has_unknown_protocols; +uint64_t unknown_protocols; +}; + +typedef struct VirtioDeviceFeatures VirtioDeviceFeatures; +struct VirtioDeviceFeatures { +strList *transports; +bool has_dev_features; +strList *dev_features; +bool has_unknown_dev_features; +uint64_t unknown_dev_features; +}; + #define FEATURE_ENTRY(name, desc) (qmp_virtio_feature_map_t) \ { .virtio_bit = name, .feature_desc = desc } diff --git a/qapi/virtio.json b/qapi/virtio.json index 26516fb29c..42dbc87f2f 100644 --- a/qapi/virtio.json +++ b/qapi/virtio.json @@ -300,45 +300,6 @@ 'data': { 'statuses': [ 'str' ], '*unknown-statuses': 'uint8' } } -## -# @VhostDeviceProtocols: -# -# A structure defined to list the vhost user protocol features of a -# Vhost User device -# -# @protocols: List of decoded vhost user protocol features of a vhost -# user device -# -# @unknown-protocols: Vhost user device protocol features bitmap that -# have not been decoded -# -# Since: 7.2 -## -{ 'struct': 'VhostDeviceProtocols', - 'data': { 'protocols': [ 'str' ], -'*unknown-protocols': 'uint64' } } - -## -# @VirtioDeviceFeatures: -# -# The common fields that apply to most Virtio devices. Some devices -# may not have their own device-specific features (e.g. virtio-rng). 
-# -# @transports: List of transport features of the virtio device -# -# @dev-features: List of device-specific features (if the device has -# unique features) -# -# @unknown-dev-features: Virtio device features bitmap that have not -# been decoded -# -# Since: 7.2 -## -{ 'struct': 'VirtioDeviceFeatures', - 'data': { 'transports': [ 'str' ], -'*dev-features': [ 'str' ], -'*unknown-dev-features': 'uint64' } } - ## # @VirtQueueStatus: # -- 2.39.3
[PATCH v4 0/3] Adjust the output of x-query-virtio-status
v4: - Rebase on master - Fix the grammatical mistake in the commit message of [PATCH v3 1/3] - Adjust the linked files in hw/virtio/meson.build, as suggested by Markus Please review, Yong v3: - Rebase on master - Use the refined commit message furnished by Markus for [PATCH v2 1/2] - Drop [PATCH v2 2/2] - Add [PATCH v3 2/3] to declare the decoding functions static - Add [PATCH v3 3/3] to define VhostDeviceProtocols and VirtioDeviceFeatures as plain C types v2: - Change the hmp_virtio_dump_xxx function signatures to implement the bitmap decoding, as suggested by Philippe. This patchset is derived from the series: https://lore.kernel.org/qemu-devel/cover.1699793550.git.yong.hu...@smartx.com/ See the link for more background information. This patchset does the following: 1. Add human-readable output only in HMP. 2. For the HMP output, display the human-readable information and drop the unknown bits in practice. 3. For the QMP output, remove the descriptive strings and display only the bits encoded as numbers. Hyman Huang (3): qmp: Switch x-query-virtio-status back to numeric encoding virtio: Declare the decoding functions to static qapi: Define VhostDeviceProtocols and VirtioDeviceFeatures as plain C types hw/virtio/meson.build | 4 +- hw/virtio/virtio-hmp-cmds.c | 702 +++- hw/virtio/virtio-qmp.c | 684 +-- hw/virtio/virtio-qmp.h | 3 - qapi/virtio.json| 231 +--- 5 files changed, 724 insertions(+), 900 deletions(-) -- 2.39.3
[PATCH] qapi: Craft the BlockdevCreateOptionsLUKS comment
Add a detailed comment for commit 433957bb7f (qapi: Make parameter 'file' optional for BlockdevCreateOptionsLUKS). Signed-off-by: Hyman Huang --- qapi/block-core.json | 20 +++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/qapi/block-core.json b/qapi/block-core.json index ab5a93a966..42b0840d43 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -4973,7 +4973,25 @@ ## # @BlockdevCreateOptionsLUKS: # -# Driver specific image creation options for LUKS. +# Driver specific image creation options for LUKS. Note that +# @file is required if @preallocation is specified and equals +# PREALLOC_MODE_ON. The following three scenarios determine how +# creation logic behaves when @preallocation is either equal to +# PREALLOC_MODE_OFF or is not given: +# +# 1) When only @file is given, format the block device referenced +# by @file according to the LUKS specification and truncate it to @size. +# In this case, @size should reflect the amount of space made +# available to the guest, so the truncated size must take into account +# the space used by the crypto header. +# +# 2) When only @header is given, just format the block device +# referenced by @header according to the LUKS specification. +# +# 3) When both @file and @header are given, the block device +# referenced by @file should be truncated to @size, and the block +# device referenced by @header should be formatted according to the LUKS +# specification. # # @file: Node to create the image format on, mandatory except when #'preallocation' is not requested -- 2.39.3
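The comment above amounts to a small decision table over which nodes are given. A minimal sketch of that table in C, assuming "preallocation requested" means any mode other than off; `LuksCreateCase` and `luks_create_case()` are hypothetical names for illustration, not QEMU code, and treating "neither node given" as invalid is also an assumption:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical classification of the creation scenarios described in the
 * BlockdevCreateOptionsLUKS comment; not QEMU code.
 */
typedef enum {
    LUKS_CREATE_INVALID,         /* nothing to format, or prealloc without @file */
    LUKS_CREATE_FILE_ONLY,       /* format @file, size it for @size plus header */
    LUKS_CREATE_HEADER_ONLY,     /* format @header only */
    LUKS_CREATE_FILE_AND_HEADER, /* truncate @file to @size, format @header */
} LuksCreateCase;

static LuksCreateCase luks_create_case(bool has_file, bool has_header,
                                       bool prealloc_requested)
{
    /* @file is mandatory whenever preallocation is requested. */
    if (prealloc_requested && !has_file) {
        return LUKS_CREATE_INVALID;
    }
    if (has_file && has_header) {
        return LUKS_CREATE_FILE_AND_HEADER;
    }
    if (has_file) {
        return LUKS_CREATE_FILE_ONLY;
    }
    if (has_header) {
        return LUKS_CREATE_HEADER_ONLY;
    }
    /* Assumption: with neither node there is nothing to create. */
    return LUKS_CREATE_INVALID;
}
```

Checking the preallocation constraint first keeps the three documented scenarios as simple presence tests on @file and @header.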
[PATCH] docs/devel: Add introduction to LUKS volume with detached header
Signed-off-by: Hyman Huang --- MAINTAINERS | 1 + docs/devel/luks-detached-header.rst | 182 2 files changed, 183 insertions(+) create mode 100644 docs/devel/luks-detached-header.rst diff --git a/MAINTAINERS b/MAINTAINERS index a24c2b51b6..e8b03032ab 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3422,6 +3422,7 @@ Detached LUKS header M: Hyman Huang S: Maintained F: tests/qemu-iotests/tests/luks-detached-header +F: docs/devel/luks-detached-header.rst D-Bus M: Marc-André Lureau diff --git a/docs/devel/luks-detached-header.rst b/docs/devel/luks-detached-header.rst new file mode 100644 index 00..15e9ccde1d --- /dev/null +++ b/docs/devel/luks-detached-header.rst @@ -0,0 +1,182 @@ + +LUKS volume with detached header + + +Introduction + + +This document gives an overview of the design of a LUKS volume with a +detached header and how to use it. + +Background +== + +The LUKS format has the ability to store the header in a separate volume from +the payload. We could extend the LUKS driver in QEMU to support this use +case. + +Normally a LUKS volume has the following layout: + +:: + + +---+ + | ||| + disk| header | key material | disk payload data | + | ||| + +---+ + +With a detached LUKS header, you need two disks, laid out as: + +:: + + +--+ + disk1 | header | key material | + +--+ + +-+ + disk2 | disk payload data | + +-+ + +There are a variety of benefits to doing this: + + * Secrecy - disk2 cannot be identified as containing a LUKS + volume since there's no header + * Control - if access to disk1 is restricted, then even + if someone has access to disk2 they can't unlock + it.
Might be useful if you have disks on NFS but + want to restrict which host can launch a VM + instance from it, by dynamically providing access + to the header to a designated host + * Flexibility - your application data volume may be a given + size and it is inconvenient to resize it to + add encryption. You can store the LUKS header + separately and use the existing storage + volume for the payload + * Recovery - corruption of a bit in the header may make the + entire payload inaccessible. It might be + convenient to take backups of the header. If + your primary disk header becomes corrupt, you + can still unlock the data by pointing to the + backup detached header + +Architecture + + +Take qcow2 encryption as an example. The architecture of the +LUKS volume with a detached header is shown in the diagram below. + +There are two children of the root node: a file and a header. +Data from the disk payload is stored in the file node. The +LUKS header and key material are located in the header node, +as previously mentioned. + +:: + + +-+ + Root node| foo[luks] | + +-+ + | | + file |header | + | | + +-++--+ + Child node |payload-format[qcow2]||header-format[raw]| + +-++--+ + | | + file | file | + | | + +--+ +-+ + Child node |payload-protocol[file]| |header-protocol[file]| + +--+ +-+ + | | + | | + | | + Host storageHost storage + +Usage + += + +Create a LUKS disk with a detached header using qemu-img + + +Shell commandline:: + +# qemu-img create --object secret,id=sec0,data=abc123 -f luks \ +> -o cipher-alg=aes-256,cipher-mode=xts -o key-secret=sec0 \ +> -o detached-header=true test-header.img +# qemu-img create -f qcow2 test-payload.qcow2 200G +# qemu-img info 'json:{"driver":"luks","file":{"filename": \ +> "test-payload.img"},"header":{"filename
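The two layout diagrams in this document differ mainly in where the payload begins on the disk that holds it. A minimal sketch of that difference, assuming byte addressing and a known size for the attached header plus key material; `payload_disk_offset()` is a hypothetical helper, not part of QEMU's LUKS driver:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: translate a guest-visible byte offset into a byte
 * offset on the disk holding the encrypted payload.
 *
 * @payload_offset: start of the payload when the header is attached,
 *                  i.e. the space taken by the header and key material.
 */
static uint64_t payload_disk_offset(uint64_t guest_offset, bool detached,
                                    uint64_t payload_offset)
{
    if (detached) {
        /* disk2 holds nothing but payload data, so it starts at byte 0. */
        return guest_offset;
    }
    /* Attached: the header and key material sit in front of the data. */
    assert(payload_offset > 0);
    return payload_offset + guest_offset;
}
```

This is also why a detached payload disk carries no recognizable LUKS signature: its first bytes are already encrypted data.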
[PATCH v3 2/3] virtio: Declare the decoding functions to static
qmp_decode_protocols(), qmp_decode_status(), and qmp_decode_features() are now only used in virtio-hmp-cmds.c. So move them there, declare them static, and replace the qmp_ prefix with hmp_. Signed-off-by: Hyman Huang --- hw/virtio/meson.build | 3 +- hw/virtio/virtio-hmp-cmds.c | 677 +++- hw/virtio/virtio-qmp.c | 661 --- hw/virtio/virtio-qmp.h | 3 - 4 files changed, 670 insertions(+), 674 deletions(-) diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build index 47baf00366..6665669480 100644 --- a/hw/virtio/meson.build +++ b/hw/virtio/meson.build @@ -9,7 +9,7 @@ system_virtio_ss.add(when: 'CONFIG_VHOST_VDPA_DEV', if_true: files('vdpa-dev.c') specific_virtio_ss = ss.source_set() specific_virtio_ss.add(files('virtio.c')) -specific_virtio_ss.add(files('virtio-config-io.c', 'virtio-qmp.c')) +specific_virtio_ss.add(files('virtio-config-io.c', 'virtio-qmp.c', 'virtio-hmp-cmds.c')) if have_vhost system_virtio_ss.add(files('vhost.c')) @@ -74,7 +74,6 @@ specific_virtio_ss.add_all(when: 'CONFIG_VIRTIO_PCI', if_true: virtio_pci_ss) system_ss.add_all(when: 'CONFIG_VIRTIO', if_true: system_virtio_ss) system_ss.add(when: 'CONFIG_VIRTIO', if_false: files('vhost-stub.c')) system_ss.add(when: 'CONFIG_VIRTIO', if_false: files('virtio-stub.c')) -system_ss.add(files('virtio-hmp-cmds.c')) specific_ss.add_all(when: 'CONFIG_VIRTIO', if_true: specific_virtio_ss) system_ss.add(when: 'CONFIG_ACPI', if_true: files('virtio-acpi.c')) diff --git a/hw/virtio/virtio-hmp-cmds.c b/hw/virtio/virtio-hmp-cmds.c index 721c630ab0..f95bad0069 100644 --- a/hw/virtio/virtio-hmp-cmds.c +++ b/hw/virtio/virtio-hmp-cmds.c @@ -11,7 +11,668 @@ #include "monitor/monitor.h" #include "qapi/qapi-commands-virtio.h" #include "qapi/qmp/qdict.h" +#include "hw/virtio/vhost-user.h" +#include "standard-headers/linux/virtio_ids.h" +#include "standard-headers/linux/vhost_types.h" +#include "standard-headers/linux/virtio_blk.h" +#include "standard-headers/linux/virtio_console.h" +#include
"standard-headers/linux/virtio_gpu.h" +#include "standard-headers/linux/virtio_net.h" +#include "standard-headers/linux/virtio_scsi.h" +#include "standard-headers/linux/virtio_i2c.h" +#include "standard-headers/linux/virtio_balloon.h" +#include "standard-headers/linux/virtio_iommu.h" +#include "standard-headers/linux/virtio_mem.h" +#include "standard-headers/linux/virtio_vsock.h" +#include "standard-headers/linux/virtio_gpio.h" + +#include CONFIG_DEVICES + +#define FEATURE_ENTRY(name, desc) (qmp_virtio_feature_map_t) \ +{ .virtio_bit = name, .feature_desc = desc } + +/* Virtio transport features mapping */ +static const qmp_virtio_feature_map_t virtio_transport_map[] = { +/* Virtio device transport features */ +#ifndef VIRTIO_CONFIG_NO_LEGACY +FEATURE_ENTRY(VIRTIO_F_NOTIFY_ON_EMPTY, \ +"VIRTIO_F_NOTIFY_ON_EMPTY: Notify when device runs out of avail. " +"descs. on VQ"), +FEATURE_ENTRY(VIRTIO_F_ANY_LAYOUT, \ +"VIRTIO_F_ANY_LAYOUT: Device accepts arbitrary desc. layouts"), +#endif /* !VIRTIO_CONFIG_NO_LEGACY */ +FEATURE_ENTRY(VIRTIO_F_VERSION_1, \ +"VIRTIO_F_VERSION_1: Device compliant for v1 spec (legacy)"), +FEATURE_ENTRY(VIRTIO_F_IOMMU_PLATFORM, \ +"VIRTIO_F_IOMMU_PLATFORM: Device can be used on IOMMU platform"), +FEATURE_ENTRY(VIRTIO_F_RING_PACKED, \ +"VIRTIO_F_RING_PACKED: Device supports packed VQ layout"), +FEATURE_ENTRY(VIRTIO_F_IN_ORDER, \ +"VIRTIO_F_IN_ORDER: Device uses buffers in same order as made " +"available by driver"), +FEATURE_ENTRY(VIRTIO_F_ORDER_PLATFORM, \ +"VIRTIO_F_ORDER_PLATFORM: Memory accesses ordered by platform"), +FEATURE_ENTRY(VIRTIO_F_SR_IOV, \ +"VIRTIO_F_SR_IOV: Device supports single root I/O virtualization"), +FEATURE_ENTRY(VIRTIO_F_RING_RESET, \ +"VIRTIO_F_RING_RESET: Driver can reset a queue individually"), +/* Virtio ring transport features */ +FEATURE_ENTRY(VIRTIO_RING_F_INDIRECT_DESC, \ +"VIRTIO_RING_F_INDIRECT_DESC: Indirect descriptors supported"), +FEATURE_ENTRY(VIRTIO_RING_F_EVENT_IDX, \ +"VIRTIO_RING_F_EVENT_IDX: Used & 
avail. event fields enabled"), +{ -1, "" } +}; + +/* Vhost-user protocol features mapping */ +static const qmp_virtio_feature_map_t vhost_user_protocol_map[] = { +FEATURE_ENTRY(VHOST_USER_PROTOCOL_F_MQ, \ +"VHOST_USER_PROTOCOL_F_MQ: Multiqueue protocol supported"), +FEATURE_ENTRY(VHOST_USER_PROTOCOL_F_LOG_SHMFD, \ +"VHOST_U
[PATCH v3 1/3] qmp: Switch x-query-virtio-status back to numeric encoding
x-query-virtio-status returns several sets of virtio feature and status flags. It goes back to v7.2.0. In the initial commit 90c066cd682 (qmp: add QMP command x-query-virtio-status), we returned them as numbers, using virtio's well-known binary encoding. The next commit f3034ad71fc (qmp: decode feature & status bits in virtio-status) replaced the numbers by objects. The objects represent bits QEMU knows symbolically, and any unknown bits numerically just like before. Commit 8a8287981d1 (hmp: add virtio commands) added the matching HMP command "info virtio" (and a few more, which aren't relevant here). The symbolic representation uses lists of strings. The string format is undocumented. The strings look like "WELL_KNOWN_SYMBOL: human readable explanation". This symbolic representation is nice for humans. For machines, it can save the trouble of decoding virtio's well-known binary encoding. However, we sometimes want to compare features and status bits without caring for their exact meaning. Say we want to verify the correctness of the virtio negotiation between guest, QEMU, and OVS-DPDK. We can use the QMP command x-query-virtio-status to retrieve vhost-user net device features, and the "ovs-vsctl list interface" command to retrieve interface features. Without commit f3034ad71fc, we could then simply compare the numbers. With this commit, we first have to map from the strings back to the numeric encoding. Revert the decoding for QMP, but keep it for HMP. This makes the QMP command easier to use for use cases where we don't need to decode, like the comparison above. For use cases where we need to decode, we replace parsing undocumented strings by decoding virtio's well-known binary encoding. Incompatible change; acceptable because x-query-virtio-status comes without a stability promise.
Signed-off-by: Hyman Huang --- hw/virtio/virtio-hmp-cmds.c | 25 +++-- hw/virtio/virtio-qmp.c | 23 ++--- qapi/virtio.json| 192 3 files changed, 45 insertions(+), 195 deletions(-) diff --git a/hw/virtio/virtio-hmp-cmds.c b/hw/virtio/virtio-hmp-cmds.c index 477c97dea2..721c630ab0 100644 --- a/hw/virtio/virtio-hmp-cmds.c +++ b/hw/virtio/virtio-hmp-cmds.c @@ -6,6 +6,7 @@ */ #include "qemu/osdep.h" +#include "virtio-qmp.h" #include "monitor/hmp.h" #include "monitor/monitor.h" #include "qapi/qapi-commands-virtio.h" @@ -145,13 +146,17 @@ void hmp_virtio_status(Monitor *mon, const QDict *qdict) monitor_printf(mon, " endianness: %s\n", s->device_endian); monitor_printf(mon, " status:\n"); -hmp_virtio_dump_status(mon, s->status); +hmp_virtio_dump_status(mon, +qmp_decode_status(s->status)); monitor_printf(mon, " Guest features:\n"); -hmp_virtio_dump_features(mon, s->guest_features); +hmp_virtio_dump_features(mon, +qmp_decode_features(s->device_id, s->guest_features)); monitor_printf(mon, " Host features:\n"); -hmp_virtio_dump_features(mon, s->host_features); +hmp_virtio_dump_features(mon, +qmp_decode_features(s->device_id, s->host_features)); monitor_printf(mon, " Backend features:\n"); -hmp_virtio_dump_features(mon, s->backend_features); +hmp_virtio_dump_features(mon, +qmp_decode_features(s->device_id, s->backend_features)); if (s->vhost_dev) { monitor_printf(mon, " VHost:\n"); @@ -172,13 +177,17 @@ void hmp_virtio_status(Monitor *mon, const QDict *qdict) monitor_printf(mon, "log_size: %"PRId64"\n", s->vhost_dev->log_size); monitor_printf(mon, "Features:\n"); -hmp_virtio_dump_features(mon, s->vhost_dev->features); +hmp_virtio_dump_features(mon, +qmp_decode_features(s->device_id, s->vhost_dev->features)); monitor_printf(mon, "Acked features:\n"); -hmp_virtio_dump_features(mon, s->vhost_dev->acked_features); +hmp_virtio_dump_features(mon, +qmp_decode_features(s->device_id, s->vhost_dev->acked_features)); monitor_printf(mon, "Backend features:\n"); -hmp_virtio_dump_features(mon, 
s->vhost_dev->backend_features); +hmp_virtio_dump_features(mon, +qmp_decode_features(s->device_id, s->vhost_dev->backend_features)); monitor_printf(mon, "Protocol features:\n"); -hmp_virtio_dump_protocols(mon, s->vhost_dev->protocol_features); +hmp_virtio_dump_protocols(mon, +qmp_decode_protocols(s->vhost_dev->protocol_features)); } qapi_free_VirtioStatus(s); diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c
[PATCH v3 3/3] qapi: Define VhostDeviceProtocols and VirtioDeviceFeatures as plain C types
VhostDeviceProtocols and VirtioDeviceFeatures are only used in virtio-hmp-cmds.c. So define them as plain C types there, and drop them from the QAPI schema. Signed-off-by: Hyman Huang --- hw/virtio/virtio-hmp-cmds.c | 16 +++ qapi/virtio.json| 39 - 2 files changed, 16 insertions(+), 39 deletions(-) diff --git a/hw/virtio/virtio-hmp-cmds.c b/hw/virtio/virtio-hmp-cmds.c index f95bad0069..045b472228 100644 --- a/hw/virtio/virtio-hmp-cmds.c +++ b/hw/virtio/virtio-hmp-cmds.c @@ -29,6 +29,22 @@ #include CONFIG_DEVICES +typedef struct VhostDeviceProtocols VhostDeviceProtocols; +struct VhostDeviceProtocols { +strList *protocols; +bool has_unknown_protocols; +uint64_t unknown_protocols; +}; + +typedef struct VirtioDeviceFeatures VirtioDeviceFeatures; +struct VirtioDeviceFeatures { +strList *transports; +bool has_dev_features; +strList *dev_features; +bool has_unknown_dev_features; +uint64_t unknown_dev_features; +}; + #define FEATURE_ENTRY(name, desc) (qmp_virtio_feature_map_t) \ { .virtio_bit = name, .feature_desc = desc } diff --git a/qapi/virtio.json b/qapi/virtio.json index 26516fb29c..42dbc87f2f 100644 --- a/qapi/virtio.json +++ b/qapi/virtio.json @@ -300,45 +300,6 @@ 'data': { 'statuses': [ 'str' ], '*unknown-statuses': 'uint8' } } -## -# @VhostDeviceProtocols: -# -# A structure defined to list the vhost user protocol features of a -# Vhost User device -# -# @protocols: List of decoded vhost user protocol features of a vhost -# user device -# -# @unknown-protocols: Vhost user device protocol features bitmap that -# have not been decoded -# -# Since: 7.2 -## -{ 'struct': 'VhostDeviceProtocols', - 'data': { 'protocols': [ 'str' ], -'*unknown-protocols': 'uint64' } } - -## -# @VirtioDeviceFeatures: -# -# The common fields that apply to most Virtio devices. Some devices -# may not have their own device-specific features (e.g. virtio-rng). 
-# -# @transports: List of transport features of the virtio device -# -# @dev-features: List of device-specific features (if the device has -# unique features) -# -# @unknown-dev-features: Virtio device features bitmap that have not -# been decoded -# -# Since: 7.2 -## -{ 'struct': 'VirtioDeviceFeatures', - 'data': { 'transports': [ 'str' ], -'*dev-features': [ 'str' ], -'*unknown-dev-features': 'uint64' } } - ## # @VirtQueueStatus: # -- 2.31.1
[PATCH v3 0/3] Adjust the output of x-query-virtio-status
Sorry for the late posting of version 3. The changes are as follows: v3: - Rebase on master - Use the refined commit message furnished by Markus for [PATCH v2 1/2] - Drop [PATCH v2 2/2] - Add [PATCH v3 2/3] to declare the decoding functions static - Add [PATCH v3 3/3] to define VhostDeviceProtocols and VirtioDeviceFeatures as plain C types Since Markus inspired all of the alterations above, we would like to thank him for his contribution to this series. Please review, Yong v2: - Change the hmp_virtio_dump_xxx function signatures to implement the bitmap decoding, as suggested by Philippe. This patchset is derived from the series: https://lore.kernel.org/qemu-devel/cover.1699793550.git.yong.hu...@smartx.com/ See the link for more background information. This patchset does the following: 1. Add human-readable output only in HMP. 2. For the HMP output, display the human-readable information and drop the unknown bits in practice. 3. For the QMP output, remove the descriptive strings and display only the bits encoded as numbers. Hyman Huang (3): qmp: Switch x-query-virtio-status back to numeric encoding virtio: Declare the decoding functions to static qapi: Define VhostDeviceProtocols and VirtioDeviceFeatures as plain C types hw/virtio/meson.build | 3 +- hw/virtio/virtio-hmp-cmds.c | 702 +++- hw/virtio/virtio-qmp.c | 684 +-- hw/virtio/virtio-qmp.h | 3 - qapi/virtio.json| 231 +--- 5 files changed, 723 insertions(+), 900 deletions(-) -- 2.31.1
[PATCH v2 1/2] i386/sev: Sort the error message
Before returning the error number to the caller (done in the next commit), sort out the error codes: 1. report the error number on the ram_block_discard_disable failure path 2. report the error number on the open() syscall failure path 3. report EINVAL when a prerequisite check fails or the command line is invalid Signed-off-by: Hyman Huang --- target/i386/sev.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/target/i386/sev.c b/target/i386/sev.c index 9a71246682..96eff73001 100644 --- a/target/i386/sev.c +++ b/target/i386/sev.c @@ -923,7 +923,7 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp) ret = ram_block_discard_disable(true); if (ret) { error_report("%s: cannot disable RAM discard", __func__); -return -1; +return ret; } sev_guest = sev; @@ -940,6 +940,7 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp) if (host_cbitpos != sev->cbitpos) { error_setg(errp, "%s: cbitpos check failed, host '%d' requested '%d'", __func__, host_cbitpos, sev->cbitpos); +ret = -EINVAL; goto err; } @@ -952,11 +953,12 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp) error_setg(errp, "%s: reduced_phys_bits check failed," " it should be in the range of 1 to 63, requested '%d'", __func__, sev->reduced_phys_bits); +ret = -EINVAL; goto err; } devname = object_property_get_str(OBJECT(sev), "sev-device", NULL); -sev->sev_fd = open(devname, O_RDWR); +ret = sev->sev_fd = open(devname, O_RDWR); if (sev->sev_fd < 0) { error_setg(errp, "%s: Failed to open %s '%s'", __func__, devname, strerror(errno)); @@ -981,6 +983,7 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp) if (!kvm_kernel_irqchip_allowed()) { error_report("%s: SEV-ES guests require in-kernel irqchip support", __func__); +ret = -EINVAL; goto err; } @@ -988,6 +991,7 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp) error_report("%s: guest policy requires SEV-ES, but " "host SEV-ES support unavailable", __func__); +ret = -EINVAL; goto err; } cmd = 
KVM_SEV_ES_INIT; -- 2.39.1
[PATCH v2 0/2] Nitpick at the error message's output
v2: - rebase on master - add a commit that sorts out the error codes so that a meaningful error number is returned on all failure paths Hyman Huang (2): i386/sev: Sort the error message i386/sev: Nitpick at the error message's output target/i386/sev.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) -- 2.39.1
[PATCH v2 2/2] i386/sev: Nitpick at the error message's output
The wrong error message was reported because the return value was discarded on the sev_kvm_init failure path. For instance, when a SEV guest fails to launch because of an inappropriate ioctl, the following message is reported: kvm: sev_kvm_init: failed to initialize ret=-25 fw_error=0 kvm: failed to initialize kvm: Operation not permitted whereas the correct output would be: kvm: sev_kvm_init: failed to initialize ret=-25 fw_error=0 kvm: failed to initialize kvm: Inappropriate ioctl for device Fix this by returning the return value directly on the failure path. Signed-off-by: Hyman Huang Reviewed-by: Daniel P. Berrangé Message-Id: --- target/i386/sev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target/i386/sev.c b/target/i386/sev.c index 96eff73001..3fef8cf163 100644 --- a/target/i386/sev.c +++ b/target/i386/sev.c @@ -1023,7 +1023,7 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp) err: sev_guest = NULL; ram_block_discard_disable(false); -return -1; +return ret; } int -- 2.39.1
[PATCH] i386/sev: Nitpick at the error message's output
The wrong error message was reported because the return value was discarded on the sev_kvm_init failure path. For instance, when a SEV guest fails to launch because of an inappropriate ioctl, the following message is reported: kvm: sev_kvm_init: failed to initialize ret=-25 fw_error=0 kvm: failed to initialize kvm: Operation not permitted whereas the correct output would be: kvm: sev_kvm_init: failed to initialize ret=-25 fw_error=0 kvm: failed to initialize kvm: Inappropriate ioctl for device Fix this by returning the return value directly on the failure path. Signed-off-by: Hyman Huang --- target/i386/sev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target/i386/sev.c b/target/i386/sev.c index 9a71246682..4a69ca457c 100644 --- a/target/i386/sev.c +++ b/target/i386/sev.c @@ -1019,7 +1019,7 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp) err: sev_guest = NULL; ram_block_discard_disable(false); -return -1; +return ret; } int -- 2.39.1
[PATCH v2 1/2] qapi/virtio: Keep feature and status bits in the QMP output
Maintain the feature and status bits in the x-query-virtio-status output and, as usual, add human-readable output only in HMP. Applications may find it useful to compare feature and status information directly. For example, a higher-level application could use the QMP command x-query-virtio-status to retrieve the features of a vhost-user net device, and the "ovs-vsctl list interface" command to retrieve the interface features (as a number), in order to verify that virtio negotiation between the guest, QEMU, and OVS-DPDK is correct. The application could then compare the two feature sets directly, without having to encode the features first. Signed-off-by: Hyman Huang --- hw/virtio/virtio-hmp-cmds.c | 29 -- hw/virtio/virtio-qmp.c | 23 ++--- qapi/virtio.json| 192 3 files changed, 48 insertions(+), 196 deletions(-) diff --git a/hw/virtio/virtio-hmp-cmds.c b/hw/virtio/virtio-hmp-cmds.c index 477c97dea2..4fabba4f9c 100644 --- a/hw/virtio/virtio-hmp-cmds.c +++ b/hw/virtio/virtio-hmp-cmds.c @@ -6,6 +6,7 @@ */ #include "qemu/osdep.h" +#include "virtio-qmp.h" #include "monitor/hmp.h" #include "monitor/monitor.h" #include "qapi/qapi-commands-virtio.h" @@ -13,8 +14,10 @@ static void hmp_virtio_dump_protocols(Monitor *mon, - VhostDeviceProtocols *pcol) + uint64_t bitmap) { +VhostDeviceProtocols *pcol = qmp_decode_protocols(bitmap); + strList *pcol_list = pcol->protocols; while (pcol_list) { monitor_printf(mon, "\t%s", pcol_list->value); @@ -31,8 +34,10 @@ static void hmp_virtio_dump_protocols(Monitor *mon, } static void hmp_virtio_dump_status(Monitor *mon, - VirtioDeviceStatus *status) + uint64_t bitmap) { +VirtioDeviceStatus *status = qmp_decode_status(bitmap); + strList *status_list = status->statuses; while (status_list) { monitor_printf(mon, "\t%s", status_list->value); @@ -49,8 +54,12 @@ static void hmp_virtio_dump_status(Monitor *mon, } static void hmp_virtio_dump_features(Monitor *mon, - VirtioDeviceFeatures *features) + uint16_t device_id, + uint64_t bitmap) { 
+VirtioDeviceFeatures *features = +qmp_decode_features(device_id, bitmap); + strList *transport_list = features->transports; while (transport_list) { monitor_printf(mon, "\t%s", transport_list->value); @@ -147,11 +156,11 @@ void hmp_virtio_status(Monitor *mon, const QDict *qdict) monitor_printf(mon, " status:\n"); hmp_virtio_dump_status(mon, s->status); monitor_printf(mon, " Guest features:\n"); -hmp_virtio_dump_features(mon, s->guest_features); +hmp_virtio_dump_features(mon, s->device_id, s->guest_features); monitor_printf(mon, " Host features:\n"); -hmp_virtio_dump_features(mon, s->host_features); +hmp_virtio_dump_features(mon, s->device_id, s->host_features); monitor_printf(mon, " Backend features:\n"); -hmp_virtio_dump_features(mon, s->backend_features); +hmp_virtio_dump_features(mon, s->device_id, s->backend_features); if (s->vhost_dev) { monitor_printf(mon, " VHost:\n"); @@ -172,11 +181,13 @@ void hmp_virtio_status(Monitor *mon, const QDict *qdict) monitor_printf(mon, "log_size: %"PRId64"\n", s->vhost_dev->log_size); monitor_printf(mon, "Features:\n"); -hmp_virtio_dump_features(mon, s->vhost_dev->features); +hmp_virtio_dump_features(mon, s->device_id, s->vhost_dev->features); monitor_printf(mon, "Acked features:\n"); -hmp_virtio_dump_features(mon, s->vhost_dev->acked_features); +hmp_virtio_dump_features(mon, +s->device_id, s->vhost_dev->acked_features); monitor_printf(mon, "Backend features:\n"); -hmp_virtio_dump_features(mon, s->vhost_dev->backend_features); +hmp_virtio_dump_features(mon, +s->device_id, s->vhost_dev->backend_features); monitor_printf(mon, "Protocol features:\n"); hmp_virtio_dump_protocols(mon, s->vhost_dev->protocol_features); } diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c index 1dd96ed20f..1660c17653 100644 --- a/hw/virtio/virtio-qmp.c +++ b/hw/virtio/virtio-qmp.c @@ -733,12 +733,9 @@ VirtioStatus *qmp_x_query_virtio_status(const char *path, Error **errp) status->name = g_strdup(vdev->name); status->device_id = 
vdev->device_id;
[PATCH v2 2/2] hmp: Drop unknown feature and status bits
Since the QMP command "x-query-virtio-status" outputs the complete feature and status bitmaps, there is no need to also print the unknown bits in the HMP output; drop them. Signed-off-by: Hyman Huang --- hw/virtio/virtio-hmp-cmds.c | 13 - 1 file changed, 13 deletions(-) diff --git a/hw/virtio/virtio-hmp-cmds.c b/hw/virtio/virtio-hmp-cmds.c index 4fabba4f9c..ae27968523 100644 --- a/hw/virtio/virtio-hmp-cmds.c +++ b/hw/virtio/virtio-hmp-cmds.c @@ -27,10 +27,6 @@ static void hmp_virtio_dump_protocols(Monitor *mon, } } monitor_printf(mon, "\n"); -if (pcol->has_unknown_protocols) { -monitor_printf(mon, " unknown-protocols(0x%016"PRIx64")\n", - pcol->unknown_protocols); -} } static void hmp_virtio_dump_status(Monitor *mon, @@ -47,10 +43,6 @@ static void hmp_virtio_dump_status(Monitor *mon, } } monitor_printf(mon, "\n"); -if (status->has_unknown_statuses) { -monitor_printf(mon, " unknown-statuses(0x%016"PRIx32")\n", - status->unknown_statuses); -} } static void hmp_virtio_dump_features(Monitor *mon, @@ -81,11 +73,6 @@ static void hmp_virtio_dump_features(Monitor *mon, } monitor_printf(mon, "\n"); } - -if (features->has_unknown_dev_features) { -monitor_printf(mon, " unknown-features(0x%016"PRIx64")\n", - features->unknown_dev_features); -} } void hmp_virtio_query(Monitor *mon, const QDict *qdict) -- 2.39.1
[PATCH v2 0/2] Adjust the output of x-query-virtio-status
v2: - Change the hmp_virtio_dump_xxx function signatures to implement the bitmap decoding, as suggested by Philippe. Please review, thanks, Yong This patchset is derived from the series: https://lore.kernel.org/qemu-devel/cover.1699793550.git.yong.hu...@smartx.com/ Please follow the link for more background information. The patchset does the following: 1. Adopt the policy of adding human-readable output only in HMP. 2. In the HMP output, display the human-readable information and drop the unknown bits. 3. In the QMP output, remove the descriptive strings and display only the bits, encoded as numbers. Hyman Huang (2): qapi/virtio: Keep feature and status bits in the QMP output hmp: Drop unknown feature and status bits hw/virtio/virtio-hmp-cmds.c | 42 hw/virtio/virtio-qmp.c | 23 ++--- qapi/virtio.json| 192 3 files changed, 48 insertions(+), 209 deletions(-) -- 2.39.1
[PATCH 2/2] hmp: Drop unknown feature and status bits
Since the QMP command "x-query-virtio-status" outputs the complete feature and status bitmaps, there is no need to also print the unknown bits in the HMP output; drop them. Signed-off-by: Hyman Huang --- hw/virtio/virtio-hmp-cmds.c | 13 - 1 file changed, 13 deletions(-) diff --git a/hw/virtio/virtio-hmp-cmds.c b/hw/virtio/virtio-hmp-cmds.c index 721c630ab0..f9a7384604 100644 --- a/hw/virtio/virtio-hmp-cmds.c +++ b/hw/virtio/virtio-hmp-cmds.c @@ -25,10 +25,6 @@ static void hmp_virtio_dump_protocols(Monitor *mon, } } monitor_printf(mon, "\n"); -if (pcol->has_unknown_protocols) { -monitor_printf(mon, " unknown-protocols(0x%016"PRIx64")\n", - pcol->unknown_protocols); -} } static void hmp_virtio_dump_status(Monitor *mon, @@ -43,10 +39,6 @@ static void hmp_virtio_dump_status(Monitor *mon, } } monitor_printf(mon, "\n"); -if (status->has_unknown_statuses) { -monitor_printf(mon, " unknown-statuses(0x%016"PRIx32")\n", - status->unknown_statuses); -} } static void hmp_virtio_dump_features(Monitor *mon, @@ -73,11 +65,6 @@ static void hmp_virtio_dump_features(Monitor *mon, } monitor_printf(mon, "\n"); } - -if (features->has_unknown_dev_features) { -monitor_printf(mon, " unknown-features(0x%016"PRIx64")\n", - features->unknown_dev_features); -} } void hmp_virtio_query(Monitor *mon, const QDict *qdict) -- 2.39.1
[PATCH 0/2] Adjust the output of x-query-virtio-status
This patchset is derived from the series: https://lore.kernel.org/qemu-devel/cover.1699793550.git.yong.hu...@smartx.com/ Please follow the link for more background information. The patchset does the following: 1. Adopt the policy of adding human-readable output only in HMP. 2. In the HMP output, display the human-readable information and drop the unknown bits. 3. In the QMP output, remove the descriptive strings and display only the bits, encoded as numbers. Please review, thanks, Yong Hyman Huang (2): qapi/virtio: Keep feature and status bits in the QMP output hmp: Drop unknown feature and status bits hw/virtio/virtio-hmp-cmds.c | 38 --- hw/virtio/virtio-qmp.c | 23 ++--- qapi/virtio.json| 192 3 files changed, 45 insertions(+), 208 deletions(-) -- 2.39.1
[PATCH 1/2] qapi/virtio: Keep feature and status bits in the QMP output
Maintain the feature and status bits in the x-query-virtio-status output and, as usual, add human-readable output only in HMP. Applications may find it useful to compare feature and status information directly. For example, a higher-level application could use the QMP command x-query-virtio-status to retrieve the features of a vhost-user net device, and the "ovs-vsctl list interface" command to retrieve the interface features (as a number), in order to verify that virtio negotiation between the guest, QEMU, and OVS-DPDK is correct. The application could then compare the two feature sets directly, without having to encode the features first. Signed-off-by: Hyman Huang --- hw/virtio/virtio-hmp-cmds.c | 25 +++-- hw/virtio/virtio-qmp.c | 23 ++--- qapi/virtio.json| 192 3 files changed, 45 insertions(+), 195 deletions(-) diff --git a/hw/virtio/virtio-hmp-cmds.c b/hw/virtio/virtio-hmp-cmds.c index 477c97dea2..721c630ab0 100644 --- a/hw/virtio/virtio-hmp-cmds.c +++ b/hw/virtio/virtio-hmp-cmds.c @@ -6,6 +6,7 @@ */ #include "qemu/osdep.h" +#include "virtio-qmp.h" #include "monitor/hmp.h" #include "monitor/monitor.h" #include "qapi/qapi-commands-virtio.h" @@ -145,13 +146,17 @@ void hmp_virtio_status(Monitor *mon, const QDict *qdict) monitor_printf(mon, " endianness: %s\n", s->device_endian); monitor_printf(mon, " status:\n"); -hmp_virtio_dump_status(mon, s->status); +hmp_virtio_dump_status(mon, +qmp_decode_status(s->status)); monitor_printf(mon, " Guest features:\n"); -hmp_virtio_dump_features(mon, s->guest_features); +hmp_virtio_dump_features(mon, +qmp_decode_features(s->device_id, s->guest_features)); monitor_printf(mon, " Host features:\n"); -hmp_virtio_dump_features(mon, s->host_features); +hmp_virtio_dump_features(mon, +qmp_decode_features(s->device_id, s->host_features)); monitor_printf(mon, " Backend features:\n"); -hmp_virtio_dump_features(mon, s->backend_features); +hmp_virtio_dump_features(mon, +qmp_decode_features(s->device_id, s->backend_features)); if (s->vhost_dev) { 
monitor_printf(mon, " VHost:\n"); @@ -172,13 +177,17 @@ void hmp_virtio_status(Monitor *mon, const QDict *qdict) monitor_printf(mon, "log_size: %"PRId64"\n", s->vhost_dev->log_size); monitor_printf(mon, "Features:\n"); -hmp_virtio_dump_features(mon, s->vhost_dev->features); +hmp_virtio_dump_features(mon, +qmp_decode_features(s->device_id, s->vhost_dev->features)); monitor_printf(mon, "Acked features:\n"); -hmp_virtio_dump_features(mon, s->vhost_dev->acked_features); +hmp_virtio_dump_features(mon, +qmp_decode_features(s->device_id, s->vhost_dev->acked_features)); monitor_printf(mon, "Backend features:\n"); -hmp_virtio_dump_features(mon, s->vhost_dev->backend_features); +hmp_virtio_dump_features(mon, +qmp_decode_features(s->device_id, s->vhost_dev->backend_features)); monitor_printf(mon, "Protocol features:\n"); -hmp_virtio_dump_protocols(mon, s->vhost_dev->protocol_features); +hmp_virtio_dump_protocols(mon, +qmp_decode_protocols(s->vhost_dev->protocol_features)); } qapi_free_VirtioStatus(s); diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c index 1dd96ed20f..1660c17653 100644 --- a/hw/virtio/virtio-qmp.c +++ b/hw/virtio/virtio-qmp.c @@ -733,12 +733,9 @@ VirtioStatus *qmp_x_query_virtio_status(const char *path, Error **errp) status->name = g_strdup(vdev->name); status->device_id = vdev->device_id; status->vhost_started = vdev->vhost_started; -status->guest_features = qmp_decode_features(vdev->device_id, - vdev->guest_features); -status->host_features = qmp_decode_features(vdev->device_id, -vdev->host_features); -status->backend_features = qmp_decode_features(vdev->device_id, - vdev->backend_features); +status->guest_features = vdev->guest_features; +status->host_features = vdev->host_features; +status->backend_features = vdev->backend_features; switch (vdev->device_endian) { case VIRTIO_DEVICE_ENDIAN_LITTLE: @@ -753,7 +750,7 @@ VirtioStatus *qmp_x_query_virtio_status(const char *path, Error **errp) } status->num_vqs = virtio_get_num_queues(vdev); -status->
[PULL 1/1] migration/dirtyrate: Remove an extra parameter
From: Wafer vcpu_dirty_stat_collect() has an unused parameter so remove it. Signed-off-by: Wafer Reviewed-by: Hyman Huang Message-Id: <20231204012230.4123-1-wa...@jaguarmicro.com> --- migration/dirtyrate.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c index 036ac017fc..62d86b8be2 100644 --- a/migration/dirtyrate.c +++ b/migration/dirtyrate.c @@ -129,8 +129,7 @@ static DirtyPageRecord *vcpu_dirty_stat_alloc(VcpuStat *stat) return g_new0(DirtyPageRecord, nvcpu); } -static void vcpu_dirty_stat_collect(VcpuStat *stat, -DirtyPageRecord *records, +static void vcpu_dirty_stat_collect(DirtyPageRecord *records, bool start) { CPUState *cpu; @@ -158,7 +157,7 @@ retry: WITH_QEMU_LOCK_GUARD(_cpu_list_lock) { gen_id = cpu_list_generation_id_get(); records = vcpu_dirty_stat_alloc(stat); -vcpu_dirty_stat_collect(stat, records, true); +vcpu_dirty_stat_collect(records, true); } duration = dirty_stat_wait(calc_time_ms, init_time_ms); @@ -172,7 +171,7 @@ retry: cpu_list_unlock(); goto retry; } -vcpu_dirty_stat_collect(stat, records, false); +vcpu_dirty_stat_collect(records, false); } for (i = 0; i < stat->nvcpu; i++) { -- 2.39.1
[PULL 0/1] Dirty page rate and dirty page limit 20231225 patch
The following changes since commit 191710c221f65b1542f6ea7fa4d30dde6e134fd7: Merge tag 'pull-request-2023-12-20' of https://gitlab.com/thuth/qemu into staging (2023-12-20 09:40:16 -0500) are available in the Git repository at: https://github.com/newfriday/qemu.git tags/dirtylimit-dirtyrate-pull-request-20231225 for you to fetch changes up to 4918712fb1c34ae43361b402642e426be85a789e: migration/dirtyrate: Remove an extra parameter (2023-12-25 18:05:47 +0800) dirtylimit dirtyrate pull request 20231225 Nitpick about an unused parameter Please apply, thanks, Yong Wafer (1): migration/dirtyrate: Remove an extra parameter migration/dirtyrate.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) -- 2.39.1
[PATCH RESEND v3 00/10] Support generic Luks encryption
v3: - Rebase on master - Add a test case for detached LUKS header - Adjust the design to honour preallocation of the payload device - Adjust the design to honour the payload offset from the header, even when detached - Support detached LUKS header creation using qemu-img - Support detached LUKS header querying - Code cleanup Thanks for commenting on this series, please review. Best regards, Yong v2: - Simplify the design by reusing the LUKS driver to implement the generic LUKS encryption; thanks to Daniel for the insightful advice. - Rebase on master. This functionality was motivated by the following to-do item in the crypto documentation: https://wiki.qemu.org/Features/Block/Crypto The last chapter says we should "separate header volume": The LUKS format has ability to store the header in a separate volume from the payload. We should extend the LUKS driver in QEMU to support this use case. By enhancing the LUKS driver, it is possible to support a detached LUKS header and, as a result, achieve generic encryption for any disk format that QEMU supports. Taking qcow2 as an example, generic LUKS encryption is used as follows: 1. add a protocol blockdev node for the data disk $ virsh qemu-monitor-command vm '{"execute":"blockdev-add", > "arguments":{"node-name":"libvirt-1-storage", "driver":"file", > "filename":"/path/to/test_disk.qcow2"}}' 2. add a protocol blockdev node for the LUKS header in the same way $ virsh qemu-monitor-command vm '{"execute":"blockdev-add", > "arguments":{"node-name":"libvirt-2-storage", "driver":"file", > "filename": "/path/to/cipher.gluks" }}' 3. add the secret used to decrypt the cipher stored in the LUKS header above $ virsh qemu-monitor-command vm '{"execute":"object-add", > "arguments":{"qom-type":"secret", "id": > "libvirt-2-storage-secret0", "data":"abc123"}}'
4. 
add the qcow2 format blockdev node $ virsh qemu-monitor-command vm '{"execute":"blockdev-add", > "arguments":{"node-name":"libvirt-1-format", "driver":"qcow2", > "file":"libvirt-1-storage"}}' 5. add the luks format blockdev node, which links the qcow2 disk with the LUKS header through the "header" field $ virsh qemu-monitor-command vm '{"execute":"blockdev-add", > "arguments":{"node-name":"libvirt-2-format", "driver":"luks", > "file":"libvirt-1-format", "header":"libvirt-2-storage", > "key-secret":"libvirt-2-storage-secret0"}}' 6. finally, add the virtio-blk device $ virsh qemu-monitor-command vm '{"execute":"device_add", > "arguments": {"num-queues":"1", "driver":"virtio-blk-pci", > "drive": "libvirt-2-format", "id":"virtio-disk2"}}' Starting a virtual machine (VM) with generic LUKS encryption is similar to hot-plugging: both use the same JSON arguments, except that at startup the "blockdev-add"/"device_add" commands are replaced by the "-blockdev"/"-device" command-line options. Hyman Huang (10): crypto: Introduce option and structure for detached LUKS header crypto: Support generic LUKS encryption qapi: Make parameter 'file' optional for BlockdevCreateOptionsLUKS crypto: Introduce creation option and structure for detached LUKS header crypto: Mark the payload_offset_sector invalid for detached LUKS header block: Support detached LUKS header creation using blockdev-create block: Support detached LUKS header creation using qemu-img crypto: Introduce 'detached-header' field in QCryptoBlockInfoLUKS tests: Add detached LUKS header case MAINTAINERS: Add section "Detached LUKS header" MAINTAINERS | 5 + block.c | 5 +- block/crypto.c| 146 ++-- block/crypto.h| 8 + crypto/block-luks.c | 49 +++- crypto/block.c| 1 + crypto/blockpriv.h| 3 + qapi/block-core.json | 14 +- qapi/crypto.json | 13 +- tests/qemu-iotests/210.out| 4 + tests/qemu-iotests/tests/luks-detached-header | 214 ++ .../tests/luks-detached-header.out| 5 + 12 files changed, 436 insertions(+), 31 deletions(-) create mode
100755 tests/qemu-iotests/tests/luks-detached-header create mode 100644 tests/qemu-iotests/tests/luks-detached-header.out -- 2.39.1
[PATCH RESEND v3 04/10] crypto: Introduce creation option and structure for detached LUKS header
Introduce the 'header' field in BlockdevCreateOptionsLUKS to support detached LUKS header creation. Also introduce header-related fields in QCryptoBlock. Signed-off-by: Hyman Huang --- crypto/blockpriv.h | 3 +++ qapi/block-core.json | 3 +++ qapi/crypto.json | 5 - 3 files changed, 10 insertions(+), 1 deletion(-) diff --git a/crypto/blockpriv.h b/crypto/blockpriv.h index 3c7ccea504..6289aea961 100644 --- a/crypto/blockpriv.h +++ b/crypto/blockpriv.h @@ -42,6 +42,9 @@ struct QCryptoBlock { size_t niv; uint64_t payload_offset; /* In bytes */ uint64_t sector_size; /* In bytes */ + +bool detached_header; /* True if disk has a detached LUKS header */ +uint64_t detached_header_size; /* LUKS header size plus key slot size */ }; struct QCryptoBlockDriver { diff --git a/qapi/block-core.json b/qapi/block-core.json index 9ac256c489..8aec179926 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -4948,6 +4948,8 @@ # @file: Node to create the image format on, mandatory except when #'preallocation' is not requested # +# @header: Detached LUKS header node to format. (since 9.0) +# # @size: Size of the virtual disk in bytes # # @preallocation: Preallocation mode for the new image (since: 4.2) @@ -4958,6 +4960,7 @@ { 'struct': 'BlockdevCreateOptionsLUKS', 'base': 'QCryptoBlockCreateOptionsLUKS', 'data': { '*file':'BlockdevRef', +'*header': 'BlockdevRef', 'size': 'size', '*preallocation': 'PreallocMode' } } diff --git a/qapi/crypto.json b/qapi/crypto.json index fd3d46ebd1..6b4e86cb81 100644 --- a/qapi/crypto.json +++ b/qapi/crypto.json @@ -195,10 +195,13 @@ # decryption key. Mandatory except when probing image for # metadata only. # +# @detached-header: if true, disk has detached LUKS header. +# # Since: 2.6 ## { 'struct': 'QCryptoBlockOptionsLUKS', - 'data': { '*key-secret': 'str' }} + 'data': { '*key-secret': 'str', +'*detached-header': 'bool' }} ## # @QCryptoBlockCreateOptionsLUKS: -- 2.39.1
[PATCH RESEND v3 10/10] MAINTAINERS: Add section "Detached LUKS header"
I have developed an interest in block cryptography and have been working on projects related to this subsystem. Add a section for the detached LUKS header to the MAINTAINERS file; it currently contains only a test case. Signed-off-by: Hyman Huang --- MAINTAINERS | 5 + 1 file changed, 5 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 395f26ba86..f0f7b889a3 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3391,6 +3391,11 @@ F: migration/dirtyrate.c F: migration/dirtyrate.h F: include/sysemu/dirtyrate.h +Detached LUKS header +M: Hyman Huang +S: Maintained +F: tests/qemu-iotests/tests/luks-detached-header + D-Bus M: Marc-André Lureau S: Maintained -- 2.39.1
[PATCH RESEND v3 05/10] crypto: Mark the payload_offset_sector invalid for detached LUKS header
Set payload_offset_sector to a value that is practically never reached, to mark it as invalid and to indicate that read/write operations on the 'file' protocol blockdev node should use an offset of 0. Signed-off-by: Hyman Huang --- crypto/block-luks.c | 41 +++-- 1 file changed, 31 insertions(+), 10 deletions(-) diff --git a/crypto/block-luks.c b/crypto/block-luks.c index fb01ec38bb..48443ffcae 100644 --- a/crypto/block-luks.c +++ b/crypto/block-luks.c @@ -34,6 +34,8 @@ #include "qemu/bitmap.h" +#define INVALID_SECTOR_OFFSET UINT32_MAX + /* * Reference for the LUKS format implemented here is * @@ -136,6 +138,13 @@ struct QCryptoBlockLUKS { }; +static inline uint32_t +qcrypto_block_luks_payload_offset(uint32_t sector) +{ +return sector == INVALID_SECTOR_OFFSET ? 0 : +sector * QCRYPTO_BLOCK_LUKS_SECTOR_SIZE; +} + static int qcrypto_block_luks_cipher_name_lookup(const char *name, QCryptoCipherMode mode, uint32_t key_bytes, @@ -1255,8 +1264,8 @@ qcrypto_block_luks_open(QCryptoBlock *block, } block->sector_size = QCRYPTO_BLOCK_LUKS_SECTOR_SIZE; -block->payload_offset = luks->header.payload_offset_sector * -block->sector_size; +block->payload_offset = +qcrypto_block_luks_payload_offset(luks->header.payload_offset_sector); return 0; @@ -1529,16 +1538,28 @@ qcrypto_block_luks_create(QCryptoBlock *block, slot->stripes = QCRYPTO_BLOCK_LUKS_STRIPES; } -/* The total size of the LUKS headers is the partition header + key - * slot headers, rounded up to the nearest sector, combined with - * the size of each master key material region, also rounded up - * to the nearest sector */ -luks->header.payload_offset_sector = header_sectors + -QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS * split_key_sectors; +if (block->detached_header) { +/* + * Set the payload_offset_sector to a value that is nearly never + * reached in order to mark it as invalid and indicate that 0 should + * be the offset of the read/write operation on the 'file' protocol + * blockdev node.
UINT32_MAX is chosen here. + */ +luks->header.payload_offset_sector = INVALID_SECTOR_OFFSET; +} else { +/* + * The total size of the LUKS headers is the partition header + key + * slot headers, rounded up to the nearest sector, combined with + * the size of each master key material region, also rounded up + * to the nearest sector + */ +luks->header.payload_offset_sector = header_sectors + +QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS * split_key_sectors; +} block->sector_size = QCRYPTO_BLOCK_LUKS_SECTOR_SIZE; -block->payload_offset = luks->header.payload_offset_sector * -block->sector_size; +block->payload_offset = +qcrypto_block_luks_payload_offset(luks->header.payload_offset_sector); /* Reserve header space to match payload offset */ initfunc(block, block->payload_offset, opaque, _err); -- 2.39.1
[PATCH RESEND v3 09/10] tests: Add detached LUKS header case
Signed-off-by: Hyman Huang --- tests/qemu-iotests/tests/luks-detached-header | 214 ++ .../tests/luks-detached-header.out| 5 + 2 files changed, 219 insertions(+) create mode 100755 tests/qemu-iotests/tests/luks-detached-header create mode 100644 tests/qemu-iotests/tests/luks-detached-header.out diff --git a/tests/qemu-iotests/tests/luks-detached-header b/tests/qemu-iotests/tests/luks-detached-header new file mode 100755 index 00..cf305bfa47 --- /dev/null +++ b/tests/qemu-iotests/tests/luks-detached-header @@ -0,0 +1,214 @@ +#!/usr/bin/env python3 +# group: rw auto +# +# Test detached LUKS header +# +# Copyright (C) 2024 SmartX Inc. +# +# Authors: +# Hyman Huang +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. 
+# + +import os +import iotests +from iotests import imgfmt, qemu_img_create, img_info_log, qemu_img_info, QMPTestCase + + +image_size = 128 * 1024 * 1024 + +luks_img = os.path.join(iotests.test_dir, 'luks.img') +detached_header_img1 = os.path.join(iotests.test_dir, 'detached_header.img1') +detached_header_img2 = os.path.join(iotests.test_dir, 'detached_header.img2') +detached_payload_raw_img = os.path.join(iotests.test_dir, 'detached_payload_raw.img') +detached_payload_qcow2_img = os.path.join(iotests.test_dir, 'detached_payload_qcow2.img') + +secret_obj = 'secret,id=sec0,data=foo' +luks_opts = 'key-secret=sec0' + + +class TestDetachedLUKSHeader(QMPTestCase): +def setUp(self) -> None: +self.vm = iotests.VM() +self.vm.add_object(secret_obj) +self.vm.launch() + +# 1. Create the normal LUKS disk with 128M size +self.vm.blockdev_create({ 'driver': 'file', + 'filename': luks_img, + 'size': 0 }) +self.vm.qmp_log('blockdev-add', driver='file', filename=luks_img, + node_name='luks-1-storage') +result = self.vm.blockdev_create({ 'driver': imgfmt, + 'file': 'luks-1-storage', + 'key-secret': 'sec0', + 'size': image_size, + 'iter-time': 10 }) +# None is expected +self.assertEqual(result, None) + +# 2. 
Create the LUKS disk with detached header (raw) + +# Create detached LUKS header +self.vm.blockdev_create({ 'driver': 'file', + 'filename': detached_header_img1, + 'size': 0 }) +self.vm.qmp_log('blockdev-add', driver='file', filename=detached_header_img1, + node_name='luks-2-header-storage') + +# Create detached LUKS raw payload +self.vm.blockdev_create({ 'driver': 'file', + 'filename': detached_payload_raw_img, + 'size': 0 }) +self.vm.qmp_log('blockdev-add', driver='file', + filename=detached_payload_raw_img, + node_name='luks-2-payload-storage') + +# Format LUKS disk with detached header +result = self.vm.blockdev_create({ 'driver': imgfmt, + 'header': 'luks-2-header-storage', + 'file': 'luks-2-payload-storage', + 'key-secret': 'sec0', + 'preallocation': 'full', + 'size': image_size, + 'iter-time': 10 }) +self.assertEqual(result, None) + +self.vm.shutdown() + +# 3. Create the LUKS disk with detached header (qcow2) + +# Create detached LUKS header using qemu-img +res = qemu_img_create('-f', 'luks', '--object', secret_obj, '-o', luks_opts, + '-o', "detached-mode=true", detached_header_img2) +assert res.returncode == 0 + +# Create detached LUKS qcow2 payload +res = qemu_img_create('-f', 'qcow2', detached_payload_qcow2_img, str(image_size)) +assert res.returncode == 0 + +def tearDown(self) -> None: +os.remove(luks_img) +os.remove(detached_header_img1) +os.remov
[PATCH RESEND v3 01/10] crypto: Introduce option and structure for detached LUKS header
Add the "header" option for the LUKS format. This field identifies the
blockdev node that stores a detached LUKS header. In addition, introduce
a header field in struct BlockCrypto.

Signed-off-by: Hyman Huang
Reviewed-by: Daniel P. Berrangé
Message-Id: <5b99f60c7317092a563d7ca3fb4b414197015eb2.1701879996.git.yong.hu...@smartx.com>
---
 block/crypto.c       | 1 +
 qapi/block-core.json | 6 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/crypto.c b/block/crypto.c
index 921933a5e5..f82b13d32b 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -39,6 +39,7 @@ typedef struct BlockCrypto BlockCrypto;
 
 struct BlockCrypto {
     QCryptoBlock *block;
     bool updating_keys;
+    BdrvChild *header; /* Reference to the detached LUKS header */
 };
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index ca390c5700..10be08d08f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3352,11 +3352,15 @@
 #     decryption key (since 2.6). Mandatory except when doing a
 #     metadata-only probe of the image.
 #
+# @header: optional reference to the location of a blockdev
+#     storing a detached LUKS header. (since 9.0)
+#
 # Since: 2.9
 ##
 { 'struct': 'BlockdevOptionsLUKS',
   'base': 'BlockdevOptionsGenericFormat',
-  'data': { '*key-secret': 'str' } }
+  'data': { '*key-secret': 'str',
+            '*header': 'BlockdevRef'} }
 
 ##
 # @BlockdevOptionsGenericCOWFormat:
--
2.39.1
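The new "header" option is typed as BlockdevRef, which QAPI defines as an alternate: either a node-name string that references an existing node, or an inline BlockdevOptions definition. The following is a minimal sketch, not QEMU code, that classifies both shapes and assembles blockdev-add arguments for the "luks" driver. `classify_blockdev_ref` and `luks_open_options` are hypothetical helpers introduced only for illustration.

```python
from typing import Any, Dict, Optional, Tuple, Union

# QAPI's BlockdevRef alternate: a node-name string ("reference")
# or an inline BlockdevOptions dict ("definition").
BlockdevRef = Union[str, Dict[str, Any]]


def classify_blockdev_ref(ref: BlockdevRef) -> Tuple[str, Any]:
    """Return ("reference", node_name) or ("definition", options)."""
    if isinstance(ref, str):
        return ("reference", ref)
    if isinstance(ref, dict):
        # BlockdevOptions is discriminated by 'driver', so it must be present.
        if "driver" not in ref:
            raise ValueError("inline BlockdevRef definition needs a 'driver'")
        return ("definition", ref)
    raise TypeError(f"unsupported BlockdevRef type: {type(ref).__name__}")


def luks_open_options(file_ref: BlockdevRef,
                      header: Optional[BlockdevRef] = None,
                      key_secret: Optional[str] = None) -> Dict[str, Any]:
    """Build blockdev-add arguments for the 'luks' driver (hypothetical helper)."""
    classify_blockdev_ref(file_ref)
    opts: Dict[str, Any] = {"driver": "luks", "file": file_ref}
    if key_secret is not None:
        opts["key-secret"] = key_secret
    if header is not None:
        classify_blockdev_ref(header)  # validate the shape only
        opts["header"] = header
    return opts
```

When "header" is omitted, the result is an ordinary LUKS node whose header lives in "file". This matches the read path in patch 02, which falls back to bs->file when no header child exists.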
[PATCH RESEND v3 08/10] crypto: Introduce 'detached-header' field in QCryptoBlockInfoLUKS
When querying the LUKS disk with the qemu-img tool or other APIs, add
information about whether the LUKS header is detached. Additionally,
update the test case accordingly.

Signed-off-by: Hyman Huang
---
 crypto/block-luks.c        | 2 ++
 qapi/crypto.json           | 3 +++
 tests/qemu-iotests/210.out | 4
 3 files changed, 9 insertions(+)

diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index 474c7aee2e..c5e53b4ee4 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -1266,6 +1266,7 @@ qcrypto_block_luks_open(QCryptoBlock *block,
     block->sector_size = QCRYPTO_BLOCK_LUKS_SECTOR_SIZE;
     block->payload_offset =
         qcrypto_block_luks_payload_offset(luks->header.payload_offset_sector);
+    block->detached_header = (block->payload_offset == 0) ? true : false;
 
     return 0;
 
@@ -1892,6 +1893,7 @@ static int qcrypto_block_luks_get_info(QCryptoBlock *block,
     info->u.luks.master_key_iters = luks->header.master_key_iterations;
     info->u.luks.uuid = g_strndup((const char *)luks->header.uuid,
                                   sizeof(luks->header.uuid));
+    info->u.luks.detached_header = block->detached_header;
 
     for (i = 0; i < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS; i++) {
         slot = g_new0(QCryptoBlockInfoLUKSSlot, 1);
diff --git a/qapi/crypto.json b/qapi/crypto.json
index 8e81aa8454..336c880b5d 100644
--- a/qapi/crypto.json
+++ b/qapi/crypto.json
@@ -317,6 +317,8 @@
 #
 # @hash-alg: the master key hash algorithm
 #
+# @detached-header: whether the LUKS header is detached (Since 9.0)
+#
 # @payload-offset: offset to the payload data in bytes
 #
 # @master-key-iters: number of PBKDF2 iterations for key material
@@ -333,6 +335,7 @@
            'ivgen-alg': 'QCryptoIVGenAlgorithm',
            '*ivgen-hash-alg': 'QCryptoHashAlgorithm',
            'hash-alg': 'QCryptoHashAlgorithm',
+           'detached-header': 'bool',
            'payload-offset': 'int',
            'master-key-iters': 'int',
            'uuid': 'str',
diff --git a/tests/qemu-iotests/210.out b/tests/qemu-iotests/210.out
index 96d9f749dd..94b29b2120 100644
--- a/tests/qemu-iotests/210.out
+++ b/tests/qemu-iotests/210.out
@@ -18,6 +18,7 @@ virtual size: 128 MiB (134217728 bytes)
 encrypted: yes
 Format specific information:
     ivgen alg: plain64
+    detached header: false
     hash alg: sha256
     cipher alg: aes-256
     uuid: ----
@@ -70,6 +71,7 @@ virtual size: 64 MiB (67108864 bytes)
 encrypted: yes
 Format specific information:
     ivgen alg: plain64
+    detached header: false
     hash alg: sha1
     cipher alg: aes-128
     uuid: ----
@@ -125,6 +127,7 @@ virtual size: 0 B (0 bytes)
 encrypted: yes
 Format specific information:
     ivgen alg: plain64
+    detached header: false
     hash alg: sha256
     cipher alg: aes-256
     uuid: ----
@@ -195,6 +198,7 @@ virtual size: 0 B (0 bytes)
 encrypted: yes
 Format specific information:
     ivgen alg: plain64
+    detached header: false
     hash alg: sha256
     cipher alg: aes-256
     uuid: ----
--
2.39.1
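This patch treats a header as detached when its payload offset is zero. The sketch below applies the same rule to raw LUKS1 header bytes. It assumes the LUKS1 on-disk layout, in which the big-endian payload-offset field sits at byte 104 and counts 512-byte sectors. It does not use any QEMU API. `make_test_header` is a hypothetical helper that only builds test input.

```python
import struct

LUKS_MAGIC = b"LUKS\xba\xbe"
LUKS_SECTOR_SIZE = 512     # LUKS1 counts payload-offset in 512-byte sectors
PAYLOAD_OFFSET_POS = 104   # 6 (magic) + 2 (version) + 3 * 32 (cipher name, mode, hash spec)


def luks1_payload_offset(header: bytes) -> int:
    """Return the payload offset in bytes, as read from a LUKS1 header."""
    if len(header) < PAYLOAD_OFFSET_POS + 4 or header[:6] != LUKS_MAGIC:
        raise ValueError("not a LUKS1 header")
    # LUKS1 stores its integer fields big-endian.
    (sectors,) = struct.unpack_from(">I", header, PAYLOAD_OFFSET_POS)
    return sectors * LUKS_SECTOR_SIZE


def is_detached_header(header: bytes) -> bool:
    """Same rule as the patch: payload offset 0 means the header is detached."""
    return luks1_payload_offset(header) == 0


def make_test_header(payload_sectors: int) -> bytes:
    """Build a minimal 592-byte LUKS1 header with only magic, version and offset set."""
    buf = bytearray(592)
    buf[0:6] = LUKS_MAGIC
    struct.pack_into(">H", buf, 6, 1)
    struct.pack_into(">I", buf, PAYLOAD_OFFSET_POS, payload_sectors)
    return bytes(buf)
```

For example, a header whose payload starts at sector 4096 yields an offset of 2097152 bytes and is reported as attached.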
[PATCH RESEND v3 03/10] qapi: Make parameter 'file' optional for BlockdevCreateOptionsLUKS
To support detached LUKS header creation, make the existing 'file' field
in BlockdevCreateOptionsLUKS optional; an optional 'header' field is
added in the next commit.

Signed-off-by: Hyman Huang
---
 block/crypto.c       | 21 ++---
 qapi/block-core.json |  5 +++--
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/block/crypto.c b/block/crypto.c
index 6063879bac..78fbe79c95 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -659,9 +659,9 @@ block_crypto_co_create_luks(BlockdevCreateOptions *create_options, Error **errp)
     assert(create_options->driver == BLOCKDEV_DRIVER_LUKS);
     luks_opts = &create_options->u.luks;
 
-    bs = bdrv_co_open_blockdev_ref(luks_opts->file, errp);
-    if (bs == NULL) {
-        return -EIO;
+    if (luks_opts->file == NULL) {
+        error_setg(errp, "Formatting LUKS disk requires parameter 'file'");
+        return -EINVAL;
     }
 
     create_opts = (QCryptoBlockCreateOptions) {
@@ -673,10 +673,17 @@ block_crypto_co_create_luks(BlockdevCreateOptions *create_options, Error **errp)
         preallocation = luks_opts->preallocation;
     }
 
-    ret = block_crypto_co_create_generic(bs, luks_opts->size, &create_opts,
-                                         preallocation, errp);
-    if (ret < 0) {
-        goto fail;
+    if (luks_opts->file) {
+        bs = bdrv_co_open_blockdev_ref(luks_opts->file, errp);
+        if (bs == NULL) {
+            return -EIO;
+        }
+
+        ret = block_crypto_co_create_generic(bs, luks_opts->size, &create_opts,
+                                             preallocation, errp);
+        if (ret < 0) {
+            goto fail;
+        }
     }
 
     ret = 0;
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 10be08d08f..9ac256c489 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4945,7 +4945,8 @@
 #
 # Driver specific image creation options for LUKS.
 #
-# @file: Node to create the image format on
+# @file: Node to create the image format on, mandatory except when
+#     'preallocation' is not requested
 #
 # @size: Size of the virtual disk in bytes
 #
@@ -4956,7 +4957,7 @@
 ##
 { 'struct': 'BlockdevCreateOptionsLUKS',
   'base': 'QCryptoBlockCreateOptionsLUKS',
-  'data': { 'file': 'BlockdevRef',
+  'data': { '*file':'BlockdevRef',
             'size': 'size',
             '*preallocation': 'PreallocMode' } }
 
--
2.39.1
[PATCH RESEND v3 06/10] block: Support detached LUKS header creation using blockdev-create
A LUKS disk with a detached header consists of a separate LUKS header
and payload. This type of LUKS disk is formatted as follows:

1. add the secret used to lock/unlock the cipher stored in the detached
   LUKS header
$ virsh qemu-monitor-command vm '{"execute":"object-add",
> "arguments":{"qom-type": "secret", "id": "sec0", "data": "foo"}}'

2. create a header image of size 0
$ virsh qemu-monitor-command vm '{"execute":"blockdev-create",
> "arguments":{"job-id":"job0", "options":{"driver":"file",
> "filename":"/path/to/detached_luks_header.img", "size":0 }}}'

3. add a protocol blockdev node for the header
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments": {"driver":"file", "filename":
> "/path/to/detached_luks_header.img", "node-name":
> "detached-luks-header-storage"}}'

4. create a payload image of size 0
$ virsh qemu-monitor-command vm '{"execute":"blockdev-create",
> "arguments":{"job-id":"job1", "options":{"driver":"file",
> "filename":"/path/to/detached_luks_payload_raw.img", "size":0}}}'

5. add a protocol blockdev node for the payload
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments": {"driver":"file", "filename":
> "/path/to/detached_luks_payload_raw.img", "node-name":
> "luks-payload-raw-storage"}}'

6. do the formatting with a size of 128M
$ virsh qemu-monitor-command vm '{"execute":"blockdev-create",
> "arguments":{"job-id":"job2", "options":{"driver":"luks", "header":
> "detached-luks-header-storage", "file":"luks-payload-raw-storage",
> "size":134217728, "preallocation":"full", "key-secret":"sec0" }}}'

Signed-off-by: Hyman Huang
---
 block/crypto.c      | 109
 crypto/block-luks.c |   6 ++-
 crypto/block.c      |   1 +
 3 files changed, 106 insertions(+), 10 deletions(-)

diff --git a/block/crypto.c b/block/crypto.c
index 78fbe79c95..76cc8bda49 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -160,6 +160,48 @@ error:
     return ret;
 }
 
+static int coroutine_fn GRAPH_UNLOCKED
+block_crypto_co_format_luks_payload(BlockdevCreateOptionsLUKS *luks_opts,
+                                    Error **errp)
+{
+    BlockDriverState *bs = NULL;
+    BlockBackend *blk = NULL;
+    Error *local_error = NULL;
+    int ret;
+
+    if (luks_opts->size > INT64_MAX) {
+        return -EFBIG;
+    }
+
+    bs = bdrv_co_open_blockdev_ref(luks_opts->file, errp);
+    if (bs == NULL) {
+        return -EIO;
+    }
+
+    blk = blk_co_new_with_bs(bs, BLK_PERM_WRITE | BLK_PERM_RESIZE,
+                             BLK_PERM_ALL, errp);
+    if (!blk) {
+        ret = -EPERM;
+        goto fail;
+    }
+
+    ret = blk_truncate(blk, luks_opts->size, true,
+                       luks_opts->preallocation, 0, &local_error);
+    if (ret < 0) {
+        if (ret == -EFBIG) {
+            /* Replace the error message with a better one */
+            error_free(local_error);
+            error_setg(errp, "The requested file size is too large");
+        }
+        goto fail;
+    }
+
+    ret = 0;
+
+fail:
+    bdrv_co_unref(bs);
+    return ret;
+}
 
 static QemuOptsList block_crypto_runtime_opts_luks = {
     .name = "crypto",
@@ -651,6 +693,7 @@ static int coroutine_fn GRAPH_UNLOCKED
 block_crypto_co_create_luks(BlockdevCreateOptions *create_options, Error **errp)
 {
     BlockdevCreateOptionsLUKS *luks_opts;
+    BlockDriverState *hdr_bs = NULL;
     BlockDriverState *bs = NULL;
     QCryptoBlockCreateOptions create_opts;
     PreallocMode preallocation = PREALLOC_MODE_OFF;
@@ -659,8 +702,22 @@ block_crypto_co_create_luks(BlockdevCreateOptions *create_options, Error **errp)
     assert(create_options->driver == BLOCKDEV_DRIVER_LUKS);
     luks_opts = &create_options->u.luks;
 
-    if (luks_opts->file == NULL) {
-        error_setg(errp, "Formatting LUKS disk requires parameter 'file'");
+    if (luks_opts->header == NULL && luks_opts->file == NULL) {
+        error_setg(errp, "Either the parameter 'header' or 'file' should "
+                   "be specified");
+        return -EINVAL;
+    }
+
+    if (luks_opts->detached_header && luks_opts->header == NULL) {
+        error_setg(errp, "Formatting a detached LUKS disk requires "
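The six numbered steps above and the new option checks in block_crypto_co_create_luks() can be condensed into a short sketch. It is not QEMU code. It builds the same QMP commands as plain dicts and rejects the option combinations the patch rejects. The second error message is only illustrative, because the patch's own wording is cut off in this archive. The sketch also does no job handling: in practice each blockdev-create starts a job that should conclude, and be dismissed with job-dismiss, before its node is used.

```python
import json
from typing import Any, Dict, List, Optional


def luks_create_options_error(opts: Dict[str, Any]) -> Optional[str]:
    """Mirror the option checks at the top of block_crypto_co_create_luks()."""
    if opts.get("header") is None and opts.get("file") is None:
        return "Either the parameter 'header' or 'file' should be specified"
    if opts.get("detached-header") and opts.get("header") is None:
        # The patch's exact message is truncated in the archive; wording is illustrative.
        return "a detached LUKS disk requires the 'header' parameter"
    return None


def detached_luks_create_sequence(header_path: str, payload_path: str,
                                  secret_id: str, secret: str,
                                  size: int) -> List[Dict[str, Any]]:
    """Return the QMP commands of steps 1-6, in order (no job handling)."""
    luks = {"driver": "luks", "header": "detached-luks-header-storage",
            "file": "luks-payload-raw-storage", "size": size,
            "preallocation": "full", "key-secret": secret_id}
    error = luks_create_options_error(luks)
    if error:
        raise ValueError(error)
    return [
        # 1. secret protecting the key slots in the detached header
        {"execute": "object-add",
         "arguments": {"qom-type": "secret", "id": secret_id, "data": secret}},
        # 2-3. empty header image and its protocol node
        {"execute": "blockdev-create",
         "arguments": {"job-id": "job0", "options": {
             "driver": "file", "filename": header_path, "size": 0}}},
        {"execute": "blockdev-add",
         "arguments": {"driver": "file", "filename": header_path,
                       "node-name": "detached-luks-header-storage"}},
        # 4-5. empty payload image and its protocol node
        {"execute": "blockdev-create",
         "arguments": {"job-id": "job1", "options": {
             "driver": "file", "filename": payload_path, "size": 0}}},
        {"execute": "blockdev-add",
         "arguments": {"driver": "file", "filename": payload_path,
                       "node-name": "luks-payload-raw-storage"}},
        # 6. format the LUKS disk across the header and payload nodes
        {"execute": "blockdev-create",
         "arguments": {"job-id": "job2", "options": luks}},
    ]


if __name__ == "__main__":
    for cmd in detached_luks_create_sequence(
            "/path/to/detached_luks_header.img",
            "/path/to/detached_luks_payload_raw.img",
            "sec0", "foo", 128 * 1024 * 1024):
        print(json.dumps(cmd))
```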
[PATCH RESEND v3 02/10] crypto: Support generic LUKS encryption
By enhancing the LUKS driver to support a detached LUKS header, it
becomes possible to apply generic LUKS encryption to any disk format
that QEMU supports.

Taking qcow2 as an example, generic LUKS encryption is used as follows:

1. add a protocol blockdev node for the data disk
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-1-storage", "driver":"file",
> "filename":"/path/to/test_disk.qcow2"}}'

2. add a protocol blockdev node for the LUKS header in the same way
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-2-storage", "driver":"file",
> "filename": "/path/to/cipher.gluks" }}'

3. add the secret used to decrypt the cipher stored in the LUKS header
$ virsh qemu-monitor-command vm '{"execute":"object-add",
> "arguments":{"qom-type":"secret", "id":
> "libvirt-2-format-secret0", "data":"abc123"}}'

4. add the qcow2-driven blockdev format node
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-1-format", "driver":"qcow2",
> "file":"libvirt-1-storage"}}'

5. add the luks-driven blockdev that links the qcow2 disk with the LUKS
   header by specifying the "header" field
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-2-format", "driver":"luks",
> "file":"libvirt-1-format", "header":"libvirt-2-storage",
> "key-secret":"libvirt-2-format-secret0"}}'

6. finally, add the virtio-blk device
$ virsh qemu-monitor-command vm '{"execute":"device_add",
> "arguments": {"num-queues":"1", "driver":"virtio-blk-pci",
> "drive": "libvirt-2-format", "id":"virtio-disk2"}}'

Starting a virtual machine (VM) with generic LUKS encryption works much
like hot-plugging: the JSON arguments stay the same, but the
"blockdev-add"/"device_add" commands become the "blockdev"/"device"
startup options.

Signed-off-by: Hyman Huang
Message-Id: <910801f303da1601051479d3b7e5c2c6b4e01eb7.1701879996.git.yong.hu...@smartx.com>
---
 block/crypto.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/block/crypto.c b/block/crypto.c
index f82b13d32b..6063879bac 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -64,12 +64,14 @@ static int block_crypto_read_func(QCryptoBlock *block,
                                   Error **errp)
 {
     BlockDriverState *bs = opaque;
+    BlockCrypto *crypto = bs->opaque;
     ssize_t ret;
 
     GLOBAL_STATE_CODE();
     GRAPH_RDLOCK_GUARD_MAINLOOP();
 
-    ret = bdrv_pread(bs->file, offset, buflen, buf, 0);
+    ret = bdrv_pread(crypto->header ? crypto->header : bs->file,
+                     offset, buflen, buf, 0);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Could not read encryption header");
         return ret;
@@ -269,6 +271,7 @@ static int block_crypto_open_generic(QCryptoBlockFormat format,
     QCryptoBlockOpenOptions *open_opts = NULL;
     unsigned int cflags = 0;
     QDict *cryptoopts = NULL;
+    const char *hdr_bdref = qdict_get_try_str(options, "header");
 
     GLOBAL_STATE_CODE();
 
@@ -277,6 +280,15 @@ static int block_crypto_open_generic(QCryptoBlockFormat format,
         return ret;
     }
 
+    if (hdr_bdref) {
+        crypto->header = bdrv_open_child(NULL, options, "header", bs,
+                                         &child_of_bds, BDRV_CHILD_METADATA,
+                                         false, errp);
+        if (!crypto->header) {
+            return -EINVAL;
+        }
+    }
+
     GRAPH_RDLOCK_GUARD_MAINLOOP();
 
     bs->supported_write_flags = BDRV_REQ_FUA &
--
2.39.1
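The commit message notes that starting a VM reuses the hot-plug JSON and only swaps the commands for startup options. The sketch below makes that mapping concrete: it turns a list of QMP commands into an argument list that reuses each command's "arguments" object verbatim. It assumes a QEMU release that accepts JSON values for -object, -blockdev and -device. -blockdev has long accepted JSON; the other two options do only on recent releases. `qmp_to_cli` and `QMP_TO_CLI` are hypothetical names.

```python
import json
from typing import Any, Dict, List

# Hot-plug QMP command -> equivalent startup option (assumed mapping).
QMP_TO_CLI = {
    "object-add": "-object",
    "blockdev-add": "-blockdev",
    "device_add": "-device",
}


def qmp_to_cli(commands: List[Dict[str, Any]]) -> List[str]:
    """Turn hot-plug QMP commands into startup arguments, reusing their JSON."""
    argv: List[str] = []
    for cmd in commands:
        option = QMP_TO_CLI.get(cmd["execute"])
        if option is None:
            raise ValueError(f"no startup equivalent for {cmd['execute']!r}")
        # Same arguments object, serialized compactly as the option's value.
        argv += [option, json.dumps(cmd["arguments"], separators=(",", ":"))]
    return argv
```

Order is preserved, so the secret object and the header and storage nodes still precede the luks node that references them.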
[PATCH RESEND v3 07/10] block: Support detached LUKS header creation using qemu-img
Add the 'detached-mode' option to request the creation of a detached
LUKS header. It is used like this:

$ qemu-img create --object secret,id=sec0,data=abc123 -f luks
> -o cipher-alg=aes-256,cipher-mode=xts -o key-secret=sec0
> -o detached-mode=true header.luks

Signed-off-by: Hyman Huang
---
 block.c          | 5 -
 block/crypto.c   | 9 -
 block/crypto.h   | 8
 qapi/crypto.json | 5 -
 4 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index bfb0861ec6..fa9ce36928 100644
--- a/block.c
+++ b/block.c
@@ -7517,7 +7517,10 @@ void bdrv_img_create(const char *filename, const char *fmt,
         goto out;
     }
 
-    if (size == -1) {
+    /* Parameter 'size' is not needed for detached LUKS header */
+    if (size == -1 &&
+        !(!strcmp(fmt, "luks") &&
+          qemu_opt_get_bool(opts, "detached-mode", false))) {
         error_setg(errp, "Image creation needs a size parameter");
         goto out;
     }
diff --git a/block/crypto.c b/block/crypto.c
index 76cc8bda49..812c3c28f5 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -229,6 +229,7 @@ static QemuOptsList block_crypto_create_opts_luks = {
         BLOCK_CRYPTO_OPT_DEF_LUKS_IVGEN_HASH_ALG(""),
         BLOCK_CRYPTO_OPT_DEF_LUKS_HASH_ALG(""),
         BLOCK_CRYPTO_OPT_DEF_LUKS_ITER_TIME(""),
+        BLOCK_CRYPTO_OPT_DEF_LUKS_DETACHED_MODE(""),
         { /* end of list */ }
     },
 };
@@ -793,6 +794,8 @@ block_crypto_co_create_opts_luks(BlockDriver *drv, const char *filename,
     PreallocMode prealloc;
     char *buf = NULL;
     int64_t size;
+    bool detached_mode =
+        qemu_opt_get_bool(opts, "detached-mode", false);
     int ret;
     Error *local_err = NULL;
 
@@ -832,8 +835,12 @@ block_crypto_co_create_opts_luks(BlockDriver *drv, const char *filename,
         goto fail;
     }
 
+    /* The detached_header defaults to true if detached-mode is specified */
+    create_opts->u.luks.detached_header = detached_mode ? true : false;
+
     /* Create format layer */
-    ret = block_crypto_co_create_generic(bs, size, create_opts, prealloc, errp);
+    ret = block_crypto_co_create_generic(bs, detached_mode ? 0 : size,
+                                         create_opts, prealloc, errp);
     if (ret < 0) {
         goto fail;
     }
diff --git a/block/crypto.h b/block/crypto.h
index 72e792c9af..bceefd45bd 100644
--- a/block/crypto.h
+++ b/block/crypto.h
@@ -41,6 +41,7 @@
 #define BLOCK_CRYPTO_OPT_LUKS_IVGEN_HASH_ALG "ivgen-hash-alg"
 #define BLOCK_CRYPTO_OPT_LUKS_HASH_ALG "hash-alg"
 #define BLOCK_CRYPTO_OPT_LUKS_ITER_TIME "iter-time"
+#define BLOCK_CRYPTO_OPT_LUKS_DETACHED_MODE "detached-mode"
 #define BLOCK_CRYPTO_OPT_LUKS_KEYSLOT "keyslot"
 #define BLOCK_CRYPTO_OPT_LUKS_STATE "state"
 #define BLOCK_CRYPTO_OPT_LUKS_OLD_SECRET "old-secret"
@@ -100,6 +101,13 @@
         .help = "Select new state of affected keyslots (active/inactive)",\
     }
 
+#define BLOCK_CRYPTO_OPT_DEF_LUKS_DETACHED_MODE(prefix)     \
+    {                                                       \
+        .name = prefix BLOCK_CRYPTO_OPT_LUKS_DETACHED_MODE, \
+        .type = QEMU_OPT_BOOL,                              \
+        .help = "Create a detached LUKS header",            \
+    }
+
 #define BLOCK_CRYPTO_OPT_DEF_LUKS_KEYSLOT(prefix)           \
     {                                                       \
         .name = prefix BLOCK_CRYPTO_OPT_LUKS_KEYSLOT,       \
diff --git a/qapi/crypto.json b/qapi/crypto.json
index 6b4e86cb81..8e81aa8454 100644
--- a/qapi/crypto.json
+++ b/qapi/crypto.json
@@ -226,6 +226,8 @@
 # @iter-time: number of milliseconds to spend in PBKDF passphrase
 #     processing. Currently defaults to 2000. (since 2.8)
 #
+# @detached-mode: create a detached LUKS header. (since 9.0)
+#
 # Since: 2.6
 ##
 { 'struct': 'QCryptoBlockCreateOptionsLUKS',
@@ -235,7 +237,8 @@
             '*ivgen-alg': 'QCryptoIVGenAlgorithm',
             '*ivgen-hash-alg': 'QCryptoHashAlgorithm',
             '*hash-alg': 'QCryptoHashAlgorithm',
-            '*iter-time': 'int'}}
+            '*iter-time': 'int',
+            '*detached-mode': 'bool'}}
 
 ##
 # @QCryptoBlockOpenOptions:
--
2.39.1
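The size handling in this patch reduces to two small rules. bdrv_img_create() waives the "needs a size parameter" error only for fmt "luks" with detached-mode=true. block_crypto_co_create_opts_luks() then passes a size of 0 to the format layer for a detached header. The Python sketch below mirrors just those two checks and ignores anything else bdrv_img_create() does, such as deriving the size from a backing file. It also rebuilds the qemu-img command line from the commit message. All three function names are hypothetical.

```python
from typing import List


def needs_size_parameter(fmt: str, size: int, detached_mode: bool) -> bool:
    """True when the check in bdrv_img_create() would report a missing size."""
    # size == -1 means no size was supplied.
    return size == -1 and not (fmt == "luks" and detached_mode)


def create_size(size: int, detached_mode: bool) -> int:
    """Size handed to block_crypto_co_create_generic(): 0 for a detached header."""
    return 0 if detached_mode else size


def detached_header_argv(header_path: str, secret_id: str, secret: str) -> List[str]:
    """qemu-img invocation from the commit message; note there is no size argument."""
    return ["qemu-img", "create",
            "--object", f"secret,id={secret_id},data={secret}",
            "-f", "luks",
            "-o", "cipher-alg=aes-256,cipher-mode=xts",
            "-o", f"key-secret={secret_id}",
            "-o", "detached-mode=true",
            header_path]
```

detached-mode only lifts the size requirement for the luks format. A qcow2 image created with the same option would still need a size.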
[v3 04/10] crypto: Introduce creation option and structure for detached LUKS header
Introduce a 'header' field in BlockdevCreateOptionsLUKS to support
detached LUKS header creation. Also introduce header-related fields in
QCryptoBlock.

Signed-off-by: Hyman Huang
---
 crypto/blockpriv.h   | 3 +++
 qapi/block-core.json | 3 +++
 qapi/crypto.json     | 5 -
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/crypto/blockpriv.h b/crypto/blockpriv.h
index 3c7ccea504..6289aea961 100644
--- a/crypto/blockpriv.h
+++ b/crypto/blockpriv.h
@@ -42,6 +42,9 @@ struct QCryptoBlock {
     size_t niv;
     uint64_t payload_offset; /* In bytes */
     uint64_t sector_size; /* In bytes */
+
+    bool detached_header; /* True if disk has a detached LUKS header */
+    uint64_t detached_header_size; /* LUKS header size plus key slot size */
 };
 
 struct QCryptoBlockDriver {
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 9ac256c489..8aec179926 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4948,6 +4948,8 @@
 # @file: Node to create the image format on, mandatory except when
 #     'preallocation' is not requested
 #
+# @header: Detached LUKS header node to format. (since 9.0)
+#
 # @size: Size of the virtual disk in bytes
 #
 # @preallocation: Preallocation mode for the new image (since: 4.2)
@@ -4958,6 +4960,7 @@
 { 'struct': 'BlockdevCreateOptionsLUKS',
   'base': 'QCryptoBlockCreateOptionsLUKS',
   'data': { '*file':'BlockdevRef',
+            '*header': 'BlockdevRef',
             'size': 'size',
             '*preallocation': 'PreallocMode' } }
 
diff --git a/qapi/crypto.json b/qapi/crypto.json
index fd3d46ebd1..6b4e86cb81 100644
--- a/qapi/crypto.json
+++ b/qapi/crypto.json
@@ -195,10 +195,13 @@
 #     decryption key. Mandatory except when probing image for
 #     metadata only.
 #
+# @detached-header: if true, disk has detached LUKS header.
+#
 # Since: 2.6
 ##
 { 'struct': 'QCryptoBlockOptionsLUKS',
-  'data': { '*key-secret': 'str' }}
+  'data': { '*key-secret': 'str',
+            '*detached-header': 'bool' }}
 
 ##
 # @QCryptoBlockCreateOptionsLUKS:
--
2.39.1
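The new detached_header_size field is documented as "LUKS header size plus key slot size". To give a rough idea of what that quantity covers, the sketch below computes it from the LUKS1 layout. It assumes a 592-byte fixed header, 8 key slots of 4000 anti-forensic stripes each, and 4096-byte alignment of each key-material area. It is an approximation under those assumptions, not QEMU's own computation, which may differ in detail. `luks1_header_plus_keyslots` is a hypothetical name.

```python
LUKS1_PHDR_SIZE = 592       # fixed LUKS1 header: 208 bytes + 8 key slots * 48 bytes
LUKS1_STRIPES = 4000        # anti-forensic stripes per key slot
LUKS1_NUM_KEY_SLOTS = 8
ALIGN = 4096                # assumed alignment of each key-material area


def round_up(n: int, align: int) -> int:
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def luks1_header_plus_keyslots(master_key_len: int) -> int:
    """Bytes for the header plus all key-material areas (approximation)."""
    slot_bytes = round_up(master_key_len * LUKS1_STRIPES, ALIGN)
    return round_up(LUKS1_PHDR_SIZE, ALIGN) + LUKS1_NUM_KEY_SLOTS * slot_bytes
```

With aes-256-xts, whose XTS master key is 64 bytes, this gives 2068480 bytes, just under 2 MiB. A 32-byte key gives 1052672 bytes.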
[v3 09/10] tests: Add detached LUKS header case
Signed-off-by: Hyman Huang
---
 tests/qemu-iotests/tests/luks-detached-header | 214 ++
 .../tests/luks-detached-header.out            |   5 +
 2 files changed, 219 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/luks-detached-header
 create mode 100644 tests/qemu-iotests/tests/luks-detached-header.out

diff --git a/tests/qemu-iotests/tests/luks-detached-header b/tests/qemu-iotests/tests/luks-detached-header
new file mode 100755
index 00..cf305bfa47
--- /dev/null
+++ b/tests/qemu-iotests/tests/luks-detached-header
@@ -0,0 +1,214 @@
+#!/usr/bin/env python3
+# group: rw auto
+#
+# Test detached LUKS header
+#
+# Copyright (C) 2024 SmartX Inc.
+#
+# Authors:
+#     Hyman Huang
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os
+import iotests
+from iotests import imgfmt, qemu_img_create, img_info_log, qemu_img_info, QMPTestCase
+
+
+image_size = 128 * 1024 * 1024
+
+luks_img = os.path.join(iotests.test_dir, 'luks.img')
+detached_header_img1 = os.path.join(iotests.test_dir, 'detached_header.img1')
+detached_header_img2 = os.path.join(iotests.test_dir, 'detached_header.img2')
+detached_payload_raw_img = os.path.join(iotests.test_dir, 'detached_payload_raw.img')
+detached_payload_qcow2_img = os.path.join(iotests.test_dir, 'detached_payload_qcow2.img')
+
+secret_obj = 'secret,id=sec0,data=foo'
+luks_opts = 'key-secret=sec0'
+
+
+class TestDetachedLUKSHeader(QMPTestCase):
+    def setUp(self) -> None:
+        self.vm = iotests.VM()
+        self.vm.add_object(secret_obj)
+        self.vm.launch()
+
+        # 1. Create the normal LUKS disk with 128M size
+        self.vm.blockdev_create({ 'driver': 'file',
+                                  'filename': luks_img,
+                                  'size': 0 })
+        self.vm.qmp_log('blockdev-add', driver='file', filename=luks_img,
+                        node_name='luks-1-storage')
+        result = self.vm.blockdev_create({ 'driver': imgfmt,
+                                           'file': 'luks-1-storage',
+                                           'key-secret': 'sec0',
+                                           'size': image_size,
+                                           'iter-time': 10 })
+        # None is expected
+        self.assertEqual(result, None)
+
+        # 2. Create the LUKS disk with detached header (raw)
+
+        # Create detached LUKS header
+        self.vm.blockdev_create({ 'driver': 'file',
+                                  'filename': detached_header_img1,
+                                  'size': 0 })
+        self.vm.qmp_log('blockdev-add', driver='file', filename=detached_header_img1,
+                        node_name='luks-2-header-storage')
+
+        # Create detached LUKS raw payload
+        self.vm.blockdev_create({ 'driver': 'file',
+                                  'filename': detached_payload_raw_img,
+                                  'size': 0 })
+        self.vm.qmp_log('blockdev-add', driver='file',
+                        filename=detached_payload_raw_img,
+                        node_name='luks-2-payload-storage')
+
+        # Format LUKS disk with detached header
+        result = self.vm.blockdev_create({ 'driver': imgfmt,
+                                           'header': 'luks-2-header-storage',
+                                           'file': 'luks-2-payload-storage',
+                                           'key-secret': 'sec0',
+                                           'preallocation': 'full',
+                                           'size': image_size,
+                                           'iter-time': 10 })
+        self.assertEqual(result, None)
+
+        self.vm.shutdown()
+
+        # 3. Create the LUKS disk with detached header (qcow2)
+
+        # Create detached LUKS header using qemu-img
+        res = qemu_img_create('-f', 'luks', '--object', secret_obj, '-o', luks_opts,
+                              '-o', "detached-mode=true", detached_header_img2)
+        assert res.returncode == 0
+
+        # Create detached LUKS qcow2 payload
+        res = qemu_img_create('-f', 'qcow2', detached_payload_qcow2_img, str(image_size))
+        assert res.returncode == 0
+
+    def tearDown(self) -> None:
+        os.remove(luks_img)
+        os.remove(detached_header_img1)
+        os.remov
[v3 02/10] crypto: Support generic LUKS encryption
By enhancing the LUKS driver, it is possible to enable a detachable LUKS
header and, as a result, achieve generic encryption for any disk format
that QEMU supports. Taking qcow2 as an example, generic LUKS encryption
is used as follows:

1. Add a protocol blockdev node for the data disk:
   $ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
   > "arguments":{"node-name":"libvirt-1-storage", "driver":"file",
   > "filename":"/path/to/test_disk.qcow2"}}'

2. Add a protocol blockdev node for the LUKS header in the same way:
   $ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
   > "arguments":{"node-name":"libvirt-2-storage", "driver":"file",
   > "filename": "/path/to/cipher.gluks" }}'

3. Add the secret that unlocks the LUKS header above:
   $ virsh qemu-monitor-command vm '{"execute":"object-add",
   > "arguments":{"qom-type":"secret", "id":
   > "libvirt-2-storage-secret0", "data":"abc123"}}'

4. Add the qcow2 format blockdev node:
   $ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
   > "arguments":{"node-name":"libvirt-1-format", "driver":"qcow2",
   > "file":"libvirt-1-storage"}}'

5. Add the LUKS format blockdev node, linking the qcow2 disk with the
   LUKS header through the "header" field:
   $ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
   > "arguments":{"node-name":"libvirt-2-format", "driver":"luks",
   > "file":"libvirt-1-format", "header":"libvirt-2-storage",
   > "key-secret":"libvirt-2-storage-secret0"}}'

6. Finally, add the virtio-blk device:
   $ virsh qemu-monitor-command vm '{"execute":"device_add",
   > "arguments": {"num-queues":"1", "driver":"virtio-blk-pci",
   > "drive": "libvirt-2-format", "id":"virtio-disk2"}}'

Starting a VM with generic LUKS encryption works much like hot-plugging:
the same JSON arguments are used. The only difference is that the
"blockdev-add" and "device_add" commands become the "-blockdev" and
"-device" command-line options.
Signed-off-by: Hyman Huang Message-Id: <910801f303da1601051479d3b7e5c2c6b4e01eb7.1701879996.git.yong.hu...@smartx.com> --- block/crypto.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/block/crypto.c b/block/crypto.c index f82b13d32b..6063879bac 100644 --- a/block/crypto.c +++ b/block/crypto.c @@ -64,12 +64,14 @@ static int block_crypto_read_func(QCryptoBlock *block, Error **errp) { BlockDriverState *bs = opaque; +BlockCrypto *crypto = bs->opaque; ssize_t ret; GLOBAL_STATE_CODE(); GRAPH_RDLOCK_GUARD_MAINLOOP(); -ret = bdrv_pread(bs->file, offset, buflen, buf, 0); +ret = bdrv_pread(crypto->header ? crypto->header : bs->file, + offset, buflen, buf, 0); if (ret < 0) { error_setg_errno(errp, -ret, "Could not read encryption header"); return ret; @@ -269,6 +271,7 @@ static int block_crypto_open_generic(QCryptoBlockFormat format, QCryptoBlockOpenOptions *open_opts = NULL; unsigned int cflags = 0; QDict *cryptoopts = NULL; +const char *hdr_bdref = qdict_get_try_str(options, "header"); GLOBAL_STATE_CODE(); @@ -277,6 +280,15 @@ static int block_crypto_open_generic(QCryptoBlockFormat format, return ret; } +if (hdr_bdref) { +crypto->header = bdrv_open_child(NULL, options, "header", bs, + _of_bds, BDRV_CHILD_METADATA, + false, errp); +if (!crypto->header) { +return -EINVAL; +} +} + GRAPH_RDLOCK_GUARD_MAINLOOP(); bs->supported_write_flags = BDRV_REQ_FUA & -- 2.39.1
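The change to the read path can be sketched as follows. This is not QEMU code. `DemoNode` and `DemoCrypto` are hypothetical in-memory stand-ins for `BdrvChild` and `BlockCrypto`, and a bounds-checked `memcpy` replaces `bdrv_pread()`. The sketch shows only how the source is selected: read from the detached header node when the "header" option was given, otherwise from the "file" child.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory node; stands in for a BdrvChild and its image. */
typedef struct DemoNode {
    const unsigned char *data;
    size_t size;
} DemoNode;

/* Hypothetical stand-in for BlockCrypto: header is NULL unless "header" was given. */
typedef struct DemoCrypto {
    const DemoNode *header;
} DemoCrypto;

/*
 * Read header bytes from the same source the patched block_crypto_read_func()
 * picks: the detached header node if present, otherwise the "file" child.
 * Returns the number of bytes read, or -1 if the range is out of bounds.
 */
static long demo_read_header(const DemoCrypto *crypto, const DemoNode *file,
                             size_t offset, unsigned char *buf, size_t buflen)
{
    const DemoNode *src = crypto->header ? crypto->header : file;

    if (offset > src->size || buflen > src->size - offset) {
        return -1;
    }
    memcpy(buf, src->data + offset, buflen);
    return (long)buflen;
}
```

With a detached header, offset 0 of the header node holds the LUKS magic ("LUKS\xba\xbe"). Offset 0 of the data node may instead hold a qcow2 header ("QFI\xfb").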
[v3 05/10] crypto: Mark the payload_offset_sector invalid for detached LUKS header
Set the payload_offset_sector to a value that is nearly never reached in order to mark it as invalid and indicate that 0 should be the offset of the read/write operation on the 'file' protocol blockdev node. Signed-off-by: Hyman Huang --- crypto/block-luks.c | 41 +++-- 1 file changed, 31 insertions(+), 10 deletions(-) diff --git a/crypto/block-luks.c b/crypto/block-luks.c index fb01ec38bb..48443ffcae 100644 --- a/crypto/block-luks.c +++ b/crypto/block-luks.c @@ -34,6 +34,8 @@ #include "qemu/bitmap.h" +#define INVALID_SECTOR_OFFSET UINT32_MAX + /* * Reference for the LUKS format implemented here is * @@ -136,6 +138,13 @@ struct QCryptoBlockLUKS { }; +static inline uint32_t +qcrypto_block_luks_payload_offset(uint32_t sector) +{ +return sector == INVALID_SECTOR_OFFSET ? 0 : +sector * QCRYPTO_BLOCK_LUKS_SECTOR_SIZE; +} + static int qcrypto_block_luks_cipher_name_lookup(const char *name, QCryptoCipherMode mode, uint32_t key_bytes, @@ -1255,8 +1264,8 @@ qcrypto_block_luks_open(QCryptoBlock *block, } block->sector_size = QCRYPTO_BLOCK_LUKS_SECTOR_SIZE; -block->payload_offset = luks->header.payload_offset_sector * -block->sector_size; +block->payload_offset = +qcrypto_block_luks_payload_offset(luks->header.payload_offset_sector); return 0; @@ -1529,16 +1538,28 @@ qcrypto_block_luks_create(QCryptoBlock *block, slot->stripes = QCRYPTO_BLOCK_LUKS_STRIPES; } -/* The total size of the LUKS headers is the partition header + key - * slot headers, rounded up to the nearest sector, combined with - * the size of each master key material region, also rounded up - * to the nearest sector */ -luks->header.payload_offset_sector = header_sectors + -QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS * split_key_sectors; +if (block->detached_header) { +/* + * Set the payload_offset_sector to a value that is nearly never + * reached in order to mark it as invalid and indicate that 0 should + * be the offset of the read/write operation on the 'file' protocol + * blockdev node. 
Here, UINT32_MAX is chosen. + */ +luks->header.payload_offset_sector = INVALID_SECTOR_OFFSET; +} else { +/* + * The total size of the LUKS headers is the partition header + key + * slot headers, rounded up to the nearest sector, combined with + * the size of each master key material region, also rounded up + * to the nearest sector + */ +luks->header.payload_offset_sector = header_sectors + +QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS * split_key_sectors; +} block->sector_size = QCRYPTO_BLOCK_LUKS_SECTOR_SIZE; -block->payload_offset = luks->header.payload_offset_sector * -block->sector_size; +block->payload_offset = +qcrypto_block_luks_payload_offset(luks->header.payload_offset_sector); /* Reserve header space to match payload offset */ initfunc(block, block->payload_offset, opaque, _err); -- 2.39.1
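Here is a sketch of how the sentinel round-trips between creation and open. It assumes `QCRYPTO_BLOCK_LUKS_SECTOR_SIZE` is 512 bytes. The `demo_*` names are hypothetical, and the header and key-slot sector counts in the test are illustrative values, not ones QEMU computes. One difference is deliberate. The helper in the patch returns `uint32_t`, but this sketch returns `uint64_t`, the same type that `qcrypto_block_get_payload_offset()` returns. A 32-bit result cannot represent a byte offset of 4 GiB or more, which corresponds to sector 8388608 and above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match QCRYPTO_BLOCK_LUKS_SECTOR_SIZE (512 bytes). */
#define DEMO_SECTOR_SIZE 512u
/* Sentinel from the patch: "no payload offset, start at byte 0". */
#define DEMO_INVALID_SECTOR_OFFSET UINT32_MAX

/* Creation side: choose the payload sector to record in the LUKS header. */
static uint32_t demo_choose_payload_sector(bool detached,
                                           uint32_t header_sectors,
                                           uint32_t num_key_slots,
                                           uint32_t split_key_sectors)
{
    if (detached) {
        return DEMO_INVALID_SECTOR_OFFSET;
    }
    /* Header sectors plus the key material of every slot. */
    return header_sectors + num_key_slots * split_key_sectors;
}

/* Open side: translate the recorded sector into a byte offset. */
static uint64_t demo_payload_offset(uint32_t sector)
{
    return sector == DEMO_INVALID_SECTOR_OFFSET ?
           0 : (uint64_t)sector * DEMO_SECTOR_SIZE;
}
```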
[v3 07/10] block: Support detached LUKS header creation using qemu-img
Add the 'detached-mode' option to request the creation of a detached LUKS
header. It is used as follows:

$ qemu-img create --object secret,id=sec0,data=abc123 -f luks \
> -o cipher-alg=aes-256,cipher-mode=xts -o key-secret=sec0 \
> -o detached-mode=true header.luks

Signed-off-by: Hyman Huang --- block.c | 5 - block/crypto.c | 9 - block/crypto.h | 8 qapi/crypto.json | 5 - 4 files changed, 24 insertions(+), 3 deletions(-) diff --git a/block.c b/block.c index bfb0861ec6..fa9ce36928 100644 --- a/block.c +++ b/block.c @@ -7517,7 +7517,10 @@ void bdrv_img_create(const char *filename, const char *fmt, goto out; } -if (size == -1) { +/* Parameter 'size' is not needed for detached LUKS header */ +if (size == -1 && +!(!strcmp(fmt, "luks") && + qemu_opt_get_bool(opts, "detached-mode", false))) { error_setg(errp, "Image creation needs a size parameter"); goto out; } diff --git a/block/crypto.c b/block/crypto.c index 76cc8bda49..812c3c28f5 100644 --- a/block/crypto.c +++ b/block/crypto.c @@ -229,6 +229,7 @@ static QemuOptsList block_crypto_create_opts_luks = { BLOCK_CRYPTO_OPT_DEF_LUKS_IVGEN_HASH_ALG(""), BLOCK_CRYPTO_OPT_DEF_LUKS_HASH_ALG(""), BLOCK_CRYPTO_OPT_DEF_LUKS_ITER_TIME(""), +BLOCK_CRYPTO_OPT_DEF_LUKS_DETACHED_MODE(""), { /* end of list */ } }, }; @@ -793,6 +794,8 @@ block_crypto_co_create_opts_luks(BlockDriver *drv, const char *filename, PreallocMode prealloc; char *buf = NULL; int64_t size; +bool detached_mode = +qemu_opt_get_bool(opts, "detached-mode", false); int ret; Error *local_err = NULL; @@ -832,8 +835,12 @@ block_crypto_co_create_opts_luks(BlockDriver *drv, const char *filename, goto fail; } + /* The detached_header defaults to true if detached-mode is specified */ +create_opts->u.luks.detached_header = detached_mode ? true : false; + /* Create format layer */ -ret = block_crypto_co_create_generic(bs, size, create_opts, prealloc, errp); +ret = block_crypto_co_create_generic(bs, detached_mode ?
0 : size, + create_opts, prealloc, errp); if (ret < 0) { goto fail; } diff --git a/block/crypto.h b/block/crypto.h index 72e792c9af..bceefd45bd 100644 --- a/block/crypto.h +++ b/block/crypto.h @@ -41,6 +41,7 @@ #define BLOCK_CRYPTO_OPT_LUKS_IVGEN_HASH_ALG "ivgen-hash-alg" #define BLOCK_CRYPTO_OPT_LUKS_HASH_ALG "hash-alg" #define BLOCK_CRYPTO_OPT_LUKS_ITER_TIME "iter-time" +#define BLOCK_CRYPTO_OPT_LUKS_DETACHED_MODE "detached-mode" #define BLOCK_CRYPTO_OPT_LUKS_KEYSLOT "keyslot" #define BLOCK_CRYPTO_OPT_LUKS_STATE "state" #define BLOCK_CRYPTO_OPT_LUKS_OLD_SECRET "old-secret" @@ -100,6 +101,13 @@ .help = "Select new state of affected keyslots (active/inactive)",\ } +#define BLOCK_CRYPTO_OPT_DEF_LUKS_DETACHED_MODE(prefix) \ +{ \ +.name = prefix BLOCK_CRYPTO_OPT_LUKS_DETACHED_MODE, \ +.type = QEMU_OPT_BOOL,\ +.help = "Create a detached LUKS header", \ +} + #define BLOCK_CRYPTO_OPT_DEF_LUKS_KEYSLOT(prefix) \ { \ .name = prefix BLOCK_CRYPTO_OPT_LUKS_KEYSLOT, \ diff --git a/qapi/crypto.json b/qapi/crypto.json index 6b4e86cb81..8e81aa8454 100644 --- a/qapi/crypto.json +++ b/qapi/crypto.json @@ -226,6 +226,8 @@ # @iter-time: number of milliseconds to spend in PBKDF passphrase # processing. Currently defaults to 2000. (since 2.8) # +# @detached-mode: create a detached LUKS header. (since 9.0) +# # Since: 2.6 ## { 'struct': 'QCryptoBlockCreateOptionsLUKS', @@ -235,7 +237,8 @@ '*ivgen-alg': 'QCryptoIVGenAlgorithm', '*ivgen-hash-alg': 'QCryptoHashAlgorithm', '*hash-alg': 'QCryptoHashAlgorithm', -'*iter-time': 'int'}} +'*iter-time': 'int', +'*detached-mode': 'bool'}} ## # @QCryptoBlockOpenOptions: -- 2.39.1
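The size handling added for `qemu-img create` comes down to two predicates. This sketch uses hypothetical `demo_*` names. It reproduces the condition added to `bdrv_img_create()` and the `detached_mode ? 0 : size` choice in `block_crypto_co_create_opts_luks()`, but it takes plain arguments instead of a `QemuOpts`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * True when bdrv_img_create() should fail with "Image creation needs a size
 * parameter": no size was given (-1) and this is not a detached LUKS header.
 */
static bool demo_missing_size_is_error(int64_t size, const char *fmt,
                                       bool detached_mode)
{
    bool detached_luks = strcmp(fmt, "luks") == 0 && detached_mode;

    return size == -1 && !detached_luks;
}

/* Size handed to the format layer by block_crypto_co_create_opts_luks(). */
static int64_t demo_effective_size(int64_t size, bool detached_mode)
{
    return detached_mode ? 0 : size;
}
```

Only the pair (format "luks", detached-mode=true) is exempt from the size requirement. Setting detached-mode on any other format still produces the error.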
[v3 10/10] MAINTAINERS: Add section "Detached LUKS header"
I have developed an interest in block cryptography and have been working on projects related to this subsystem. Add a section to the MAINTAINERS file for the detached LUKS header; it currently covers only a test case. Signed-off-by: Hyman Huang --- MAINTAINERS | 5 + 1 file changed, 5 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 395f26ba86..f0f7b889a3 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3391,6 +3391,11 @@ F: migration/dirtyrate.c F: migration/dirtyrate.h F: include/sysemu/dirtyrate.h +Detached LUKS header +M: Hyman Huang +S: Maintained +F: tests/qemu-iotests/tests/luks-detached-header + D-Bus M: Marc-André Lureau S: Maintained -- 2.39.1
[v3 00/10] Support generic Luks encryption
v3:
- Rebase on master
- Add a test case for the detached LUKS header
- Adjust the design to honour preallocation of the payload device
- Adjust the design to honour the payload offset from the header,
  even when detached
- Support detached LUKS header creation using qemu-img
- Support detached LUKS header querying
- Code cleanup

Hyman Huang (10):
  crypto: Introduce option and structure for detached LUKS header
  crypto: Support generic LUKS encryption
  qapi: Make parameter 'file' optional for BlockdevCreateOptionsLUKS
  crypto: Introduce creation option and structure for detached LUKS header
  crypto: Mark the payload_offset_sector invalid for detached LUKS header
  block: Support detached LUKS header creation using blockdev-create
  block: Support detached LUKS header creation using qemu-img
  crypto: Introduce 'detached-header' field in QCryptoBlockInfoLUKS
  tests: Add detached LUKS header case
  MAINTAINERS: Add section "Detached LUKS header"

 MAINTAINERS | 5 +
 block.c | 5 +-
 block/crypto.c | 146 ++--
 block/crypto.h | 8 +
 crypto/block-luks.c | 49 +++-
 crypto/block.c | 1 +
 crypto/blockpriv.h | 3 +
 qapi/block-core.json | 14 +-
 qapi/crypto.json | 13 +-
 tests/qemu-iotests/210.out | 4 +
 tests/qemu-iotests/tests/luks-detached-header | 214 ++
 .../tests/luks-detached-header.out | 5 +
 12 files changed, 436 insertions(+), 31 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/luks-detached-header
 create mode 100644 tests/qemu-iotests/tests/luks-detached-header.out

--
2.39.1
[v3 01/10] crypto: Introduce option and structure for detached LUKS header
Add the "header" option for the LUKS format. This field identifies the blockdev node where a detached LUKS header is stored. In addition, introduce a header field in struct BlockCrypto. Signed-off-by: Hyman Huang Reviewed-by: Daniel P. Berrangé Message-Id: <5b99f60c7317092a563d7ca3fb4b414197015eb2.1701879996.git.yong.hu...@smartx.com> --- block/crypto.c | 1 + qapi/block-core.json | 6 +- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/block/crypto.c b/block/crypto.c index 921933a5e5..f82b13d32b 100644 --- a/block/crypto.c +++ b/block/crypto.c @@ -39,6 +39,7 @@ typedef struct BlockCrypto BlockCrypto; struct BlockCrypto { QCryptoBlock *block; bool updating_keys; +BdrvChild *header; /* Reference to the detached LUKS header */ }; diff --git a/qapi/block-core.json b/qapi/block-core.json index ca390c5700..10be08d08f 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -3352,11 +3352,15 @@ # decryption key (since 2.6). Mandatory except when doing a # metadata-only probe of the image. # +# @header: optional reference to the location of a blockdev +# storing a detached LUKS header. (since 9.0) +# # Since: 2.9 ## { 'struct': 'BlockdevOptionsLUKS', 'base': 'BlockdevOptionsGenericFormat', - 'data': { '*key-secret': 'str' } } + 'data': { '*key-secret': 'str', +'*header': 'BlockdevRef'} } ## # @BlockdevOptionsGenericCOWFormat: -- 2.39.1
[PATCH v6] crypto: Introduce SM4 symmetric cipher algorithm
Introduce the SM4 cipher algorithm (OSCCA GB/T 32907-2016). SM4 (GB/T 32907-2016) is a cryptographic standard issued by the Organization of State Commercial Administration of China (OSCCA) as an authorized cryptographic algorithm for use within China. Detect SM4 support and enable the feature silently if it is available. Signed-off-by: Hyman Huang Reviewed-by: Philippe Mathieu-Daudé --- crypto/block-luks.c | 11 crypto/cipher-gcrypt.c.inc | 8 ++ crypto/cipher-nettle.c.inc | 49 + crypto/cipher.c | 6 meson.build | 26 + qapi/crypto.json| 5 +++- tests/unit/test-crypto-cipher.c | 13 + 7 files changed, 117 insertions(+), 1 deletion(-) diff --git a/crypto/block-luks.c b/crypto/block-luks.c index fb01ec38bb..f0813d69b4 100644 --- a/crypto/block-luks.c +++ b/crypto/block-luks.c @@ -95,12 +95,23 @@ qcrypto_block_luks_cipher_size_map_twofish[] = { { 0, 0 }, }; +#ifdef CONFIG_CRYPTO_SM4 +static const QCryptoBlockLUKSCipherSizeMap +qcrypto_block_luks_cipher_size_map_sm4[] = { +{ 16, QCRYPTO_CIPHER_ALG_SM4}, +{ 0, 0 }, +}; +#endif + static const QCryptoBlockLUKSCipherNameMap qcrypto_block_luks_cipher_name_map[] = { { "aes", qcrypto_block_luks_cipher_size_map_aes }, { "cast5", qcrypto_block_luks_cipher_size_map_cast5 }, { "serpent", qcrypto_block_luks_cipher_size_map_serpent }, { "twofish", qcrypto_block_luks_cipher_size_map_twofish }, +#ifdef CONFIG_CRYPTO_SM4 +{ "sm4", qcrypto_block_luks_cipher_size_map_sm4}, +#endif }; QEMU_BUILD_BUG_ON(sizeof(struct QCryptoBlockLUKSKeySlot) != 48); diff --git a/crypto/cipher-gcrypt.c.inc b/crypto/cipher-gcrypt.c.inc index a6a0117717..1377cbaf14 100644 --- a/crypto/cipher-gcrypt.c.inc +++ b/crypto/cipher-gcrypt.c.inc @@ -35,6 +35,9 @@ bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg, case QCRYPTO_CIPHER_ALG_SERPENT_256: case QCRYPTO_CIPHER_ALG_TWOFISH_128: case QCRYPTO_CIPHER_ALG_TWOFISH_256: +#ifdef CONFIG_CRYPTO_SM4 +case QCRYPTO_CIPHER_ALG_SM4: +#endif break; default: return false; @@ -219,6 +222,11 @@ static
QCryptoCipher *qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg, case QCRYPTO_CIPHER_ALG_TWOFISH_256: gcryalg = GCRY_CIPHER_TWOFISH; break; +#ifdef CONFIG_CRYPTO_SM4 +case QCRYPTO_CIPHER_ALG_SM4: +gcryalg = GCRY_CIPHER_SM4; +break; +#endif default: error_setg(errp, "Unsupported cipher algorithm %s", QCryptoCipherAlgorithm_str(alg)); diff --git a/crypto/cipher-nettle.c.inc b/crypto/cipher-nettle.c.inc index 24cc61f87b..42b39e18a2 100644 --- a/crypto/cipher-nettle.c.inc +++ b/crypto/cipher-nettle.c.inc @@ -33,6 +33,9 @@ #ifndef CONFIG_QEMU_PRIVATE_XTS #include #endif +#ifdef CONFIG_CRYPTO_SM4 +#include +#endif static inline bool qcrypto_length_check(size_t len, size_t blocksize, Error **errp) @@ -426,6 +429,30 @@ DEFINE_ECB_CBC_CTR_XTS(qcrypto_nettle_twofish, QCryptoNettleTwofish, TWOFISH_BLOCK_SIZE, twofish_encrypt_native, twofish_decrypt_native) +#ifdef CONFIG_CRYPTO_SM4 +typedef struct QCryptoNettleSm4 { +QCryptoCipher base; +struct sm4_ctx key[2]; +} QCryptoNettleSm4; + +static void sm4_encrypt_native(void *ctx, size_t length, + uint8_t *dst, const uint8_t *src) +{ +struct sm4_ctx *keys = ctx; +sm4_crypt([0], length, dst, src); +} + +static void sm4_decrypt_native(void *ctx, size_t length, + uint8_t *dst, const uint8_t *src) +{ +struct sm4_ctx *keys = ctx; +sm4_crypt([1], length, dst, src); +} + +DEFINE_ECB(qcrypto_nettle_sm4, + QCryptoNettleSm4, SM4_BLOCK_SIZE, + sm4_encrypt_native, sm4_decrypt_native) +#endif bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg, QCryptoCipherMode mode) @@ -443,6 +470,9 @@ bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg, case QCRYPTO_CIPHER_ALG_TWOFISH_128: case QCRYPTO_CIPHER_ALG_TWOFISH_192: case QCRYPTO_CIPHER_ALG_TWOFISH_256: +#ifdef CONFIG_CRYPTO_SM4 +case QCRYPTO_CIPHER_ALG_SM4: +#endif break; default: return false; @@ -701,6 +731,25 @@ static QCryptoCipher *qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg, return >base; } +#ifdef CONFIG_CRYPTO_SM4 +case QCRYPTO_CIPHER_ALG_SM4: +{ +QCryptoNettleSm4 *ctx = 
g_new0(QCryptoNettleSm4, 1); + +switch (mode) { +case QCRYPTO_CIPHER_MODE_ECB: +ctx->base.driver = _nettle_sm4_driver_ecb; +break; +default: +goto bad_cipher_mode; +} + +sm4_set_
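The SM4 entries in the block-luks.c hunk plug into a two-level table: a cipher name maps to a zero-terminated list of (key size, algorithm) pairs. The sketch below models that lookup with hypothetical `Demo*` types and enum values. It ignores the cipher mode and the `CONFIG_CRYPTO_SM4` guard. Beyond the table structure, it assumes only what the hunk shows: "sm4" accepts a single 16-byte key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical algorithm IDs standing in for QCryptoCipherAlgorithm values. */
typedef enum {
    DEMO_ALG_NONE = 0,
    DEMO_ALG_AES_128,
    DEMO_ALG_AES_256,
    DEMO_ALG_SM4,
} DemoAlg;

typedef struct { uint32_t key_bytes; DemoAlg alg; } DemoSizeMap;
typedef struct { const char *name; const DemoSizeMap *sizes; } DemoNameMap;

static const DemoSizeMap demo_sizes_aes[] = {
    { 16, DEMO_ALG_AES_128 },
    { 32, DEMO_ALG_AES_256 },
    { 0, DEMO_ALG_NONE },            /* terminator, like { 0, 0 } in the patch */
};

/* SM4 has a single 128-bit (16-byte) key size. */
static const DemoSizeMap demo_sizes_sm4[] = {
    { 16, DEMO_ALG_SM4 },
    { 0, DEMO_ALG_NONE },
};

static const DemoNameMap demo_names[] = {
    { "aes", demo_sizes_aes },
    { "sm4", demo_sizes_sm4 },
};

/* Return the algorithm for a LUKS cipher name and key size, or DEMO_ALG_NONE. */
static DemoAlg demo_cipher_lookup(const char *name, uint32_t key_bytes)
{
    for (size_t i = 0; i < sizeof(demo_names) / sizeof(demo_names[0]); i++) {
        if (strcmp(demo_names[i].name, name) != 0) {
            continue;
        }
        /* Walk the size list until the zero terminator. */
        for (const DemoSizeMap *m = demo_names[i].sizes; m->key_bytes; m++) {
            if (m->key_bytes == key_bytes) {
                return m->alg;
            }
        }
    }
    return DEMO_ALG_NONE;
}
```

Because each size list is zero-terminated, adding a new cipher only requires a new size list and one extra name entry. This is the shape of the SM4 hunk.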
[PATCH v5] crypto: Introduce SM4 symmetric cipher algorithm
Introduce the SM4 cipher algorithm (OSCCA GB/T 32907-2016). SM4 (GB/T 32907-2016) is a cryptographic standard issued by the Organization of State Commercial Administration of China (OSCCA) as an authorized cryptographic algorithm for use within China. Use the crypto-sm4 meson build option to control the feature explicitly; by default, support is auto-detected. Signed-off-by: Hyman Huang Reviewed-by: Philippe Mathieu-Daudé --- crypto/block-luks.c | 11 crypto/cipher-gcrypt.c.inc | 8 ++ crypto/cipher-nettle.c.inc | 49 + crypto/cipher.c | 6 meson.build | 26 + qapi/crypto.json| 5 +++- tests/unit/test-crypto-cipher.c | 13 + 7 files changed, 117 insertions(+), 1 deletion(-) diff --git a/crypto/block-luks.c b/crypto/block-luks.c index fb01ec38bb..f0813d69b4 100644 --- a/crypto/block-luks.c +++ b/crypto/block-luks.c @@ -95,12 +95,23 @@ qcrypto_block_luks_cipher_size_map_twofish[] = { { 0, 0 }, }; +#ifdef CONFIG_CRYPTO_SM4 +static const QCryptoBlockLUKSCipherSizeMap +qcrypto_block_luks_cipher_size_map_sm4[] = { +{ 16, QCRYPTO_CIPHER_ALG_SM4}, +{ 0, 0 }, +}; +#endif + static const QCryptoBlockLUKSCipherNameMap qcrypto_block_luks_cipher_name_map[] = { { "aes", qcrypto_block_luks_cipher_size_map_aes }, { "cast5", qcrypto_block_luks_cipher_size_map_cast5 }, { "serpent", qcrypto_block_luks_cipher_size_map_serpent }, { "twofish", qcrypto_block_luks_cipher_size_map_twofish }, +#ifdef CONFIG_CRYPTO_SM4 +{ "sm4", qcrypto_block_luks_cipher_size_map_sm4}, +#endif }; QEMU_BUILD_BUG_ON(sizeof(struct QCryptoBlockLUKSKeySlot) != 48); diff --git a/crypto/cipher-gcrypt.c.inc b/crypto/cipher-gcrypt.c.inc index a6a0117717..1377cbaf14 100644 --- a/crypto/cipher-gcrypt.c.inc +++ b/crypto/cipher-gcrypt.c.inc @@ -35,6 +35,9 @@ bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg, case QCRYPTO_CIPHER_ALG_SERPENT_256: case QCRYPTO_CIPHER_ALG_TWOFISH_128: case QCRYPTO_CIPHER_ALG_TWOFISH_256: +#ifdef CONFIG_CRYPTO_SM4 +case QCRYPTO_CIPHER_ALG_SM4: +#endif break; default: return false; @@ -219,6
+222,11 @@ static QCryptoCipher *qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg, case QCRYPTO_CIPHER_ALG_TWOFISH_256: gcryalg = GCRY_CIPHER_TWOFISH; break; +#ifdef CONFIG_CRYPTO_SM4 +case QCRYPTO_CIPHER_ALG_SM4: +gcryalg = GCRY_CIPHER_SM4; +break; +#endif default: error_setg(errp, "Unsupported cipher algorithm %s", QCryptoCipherAlgorithm_str(alg)); diff --git a/crypto/cipher-nettle.c.inc b/crypto/cipher-nettle.c.inc index 24cc61f87b..42b39e18a2 100644 --- a/crypto/cipher-nettle.c.inc +++ b/crypto/cipher-nettle.c.inc @@ -33,6 +33,9 @@ #ifndef CONFIG_QEMU_PRIVATE_XTS #include #endif +#ifdef CONFIG_CRYPTO_SM4 +#include +#endif static inline bool qcrypto_length_check(size_t len, size_t blocksize, Error **errp) @@ -426,6 +429,30 @@ DEFINE_ECB_CBC_CTR_XTS(qcrypto_nettle_twofish, QCryptoNettleTwofish, TWOFISH_BLOCK_SIZE, twofish_encrypt_native, twofish_decrypt_native) +#ifdef CONFIG_CRYPTO_SM4 +typedef struct QCryptoNettleSm4 { +QCryptoCipher base; +struct sm4_ctx key[2]; +} QCryptoNettleSm4; + +static void sm4_encrypt_native(void *ctx, size_t length, + uint8_t *dst, const uint8_t *src) +{ +struct sm4_ctx *keys = ctx; +sm4_crypt([0], length, dst, src); +} + +static void sm4_decrypt_native(void *ctx, size_t length, + uint8_t *dst, const uint8_t *src) +{ +struct sm4_ctx *keys = ctx; +sm4_crypt([1], length, dst, src); +} + +DEFINE_ECB(qcrypto_nettle_sm4, + QCryptoNettleSm4, SM4_BLOCK_SIZE, + sm4_encrypt_native, sm4_decrypt_native) +#endif bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg, QCryptoCipherMode mode) @@ -443,6 +470,9 @@ bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg, case QCRYPTO_CIPHER_ALG_TWOFISH_128: case QCRYPTO_CIPHER_ALG_TWOFISH_192: case QCRYPTO_CIPHER_ALG_TWOFISH_256: +#ifdef CONFIG_CRYPTO_SM4 +case QCRYPTO_CIPHER_ALG_SM4: +#endif break; default: return false; @@ -701,6 +731,25 @@ static QCryptoCipher *qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg, return >base; } +#ifdef CONFIG_CRYPTO_SM4 +case QCRYPTO_CIPHER_ALG_SM4: +{ 
+QCryptoNettleSm4 *ctx = g_new0(QCryptoNettleSm4, 1); + +switch (mode) { +case QCRYPTO_CIPHER_MODE_ECB: +ctx->base.driver = _nettle_sm4_driver_ecb; +break; +default: +goto bad_cipher_mo
[v2 0/4] Support generic Luks encryption
v2:
- Simplify the design by reusing the LUKS driver to implement generic
  LUKS encryption; thanks to Daniel for the insightful advice.
- Rebase on master.

This functionality was motivated by the following to-do list in the QEMU
crypto documentation: https://wiki.qemu.org/Features/Block/Crypto
Its last chapter says we should "separate header volume":

  The LUKS format has ability to store the header in a separate volume
  from the payload. We should extend the LUKS driver in QEMU to support
  this use case.

By enhancing the LUKS driver, it is possible to enable a detachable LUKS
header and, as a result, achieve generic encryption for any disk format
that QEMU supports. Taking qcow2 as an example, generic LUKS encryption
is used as follows:

1. Add a protocol blockdev node for the data disk:
   $ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
   > "arguments":{"node-name":"libvirt-1-storage", "driver":"file",
   > "filename":"/path/to/test_disk.qcow2"}}'

2. Add a protocol blockdev node for the LUKS header in the same way:
   $ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
   > "arguments":{"node-name":"libvirt-2-storage", "driver":"file",
   > "filename": "/path/to/cipher.gluks" }}'

3. Add the secret that unlocks the LUKS header above:
   $ virsh qemu-monitor-command vm '{"execute":"object-add",
   > "arguments":{"qom-type":"secret", "id":
   > "libvirt-2-storage-secret0", "data":"abc123"}}'

4. Add the qcow2 format blockdev node:
   $ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
   > "arguments":{"node-name":"libvirt-1-format", "driver":"qcow2",
   > "file":"libvirt-1-storage"}}'

5. Add the LUKS format blockdev node, linking the qcow2 disk with the
   LUKS header through the "header" field:
   $ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
   > "arguments":{"node-name":"libvirt-2-format", "driver":"luks",
   > "file":"libvirt-1-format", "header":"libvirt-2-storage",
   > "key-secret":"libvirt-2-storage-secret0"}}'

6. Finally, add the virtio-blk device:
   $ virsh qemu-monitor-command vm '{"execute":"device_add",
   > "arguments": {"num-queues":"1", "driver":"virtio-blk-pci",
   > "drive": "libvirt-2-format", "id":"virtio-disk2"}}'

Starting a VM with generic LUKS encryption works much like hot-plugging:
the same JSON arguments are used. The only difference is that the
"blockdev-add" and "device_add" commands become the "-blockdev" and
"-device" command-line options.

Please review, thanks.

Best regards,
Yong

Hyman Huang (4):
  crypto: Introduce option and structure for detached LUKS header
  crypto: Introduce payload offset set function
  crypto: Support generic LUKS encryption
  block: Support detached LUKS header creation for blockdev-create

 block/crypto.c | 47 --
 crypto/block.c | 4
 include/crypto/block.h | 1 +
 qapi/block-core.json | 11 --
 4 files changed, 59 insertions(+), 4 deletions(-)

--
2.39.1
[v2 2/4] crypto: Introduce payload offset set function
Signed-off-by: Hyman Huang --- crypto/block.c | 4 include/crypto/block.h | 1 + 2 files changed, 5 insertions(+) diff --git a/crypto/block.c b/crypto/block.c index 7bb4b74a37..3dcf22a69f 100644 --- a/crypto/block.c +++ b/crypto/block.c @@ -319,6 +319,10 @@ QCryptoHashAlgorithm qcrypto_block_get_kdf_hash(QCryptoBlock *block) return block->kdfhash; } +void qcrypto_block_set_payload_offset(QCryptoBlock *block, uint64_t offset) +{ +block->payload_offset = offset; +} uint64_t qcrypto_block_get_payload_offset(QCryptoBlock *block) { diff --git a/include/crypto/block.h b/include/crypto/block.h index 4f63a37872..b47a90c529 100644 --- a/include/crypto/block.h +++ b/include/crypto/block.h @@ -312,4 +312,5 @@ void qcrypto_block_free(QCryptoBlock *block); G_DEFINE_AUTOPTR_CLEANUP_FUNC(QCryptoBlock, qcrypto_block_free) +void qcrypto_block_set_payload_offset(QCryptoBlock *block, uint64_t offset); #endif /* QCRYPTO_BLOCK_H */ -- 2.39.1
[v2 3/4] crypto: Support generic LUKS encryption
By enhancing the LUKS driver, it is possible to enable a detachable LUKS
header and, as a result, achieve generic encryption for any disk format
that QEMU supports. Taking qcow2 as an example, generic LUKS encryption
is used as follows:

1. Add a protocol blockdev node for the data disk:
   $ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
   > "arguments":{"node-name":"libvirt-1-storage", "driver":"file",
   > "filename":"/path/to/test_disk.qcow2"}}'

2. Add a protocol blockdev node for the LUKS header in the same way:
   $ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
   > "arguments":{"node-name":"libvirt-2-storage", "driver":"file",
   > "filename": "/path/to/cipher.gluks" }}'

3. Add the secret that unlocks the LUKS header above:
   $ virsh qemu-monitor-command vm '{"execute":"object-add",
   > "arguments":{"qom-type":"secret", "id":
   > "libvirt-2-storage-secret0", "data":"abc123"}}'

4. Add the qcow2 format blockdev node:
   $ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
   > "arguments":{"node-name":"libvirt-1-format", "driver":"qcow2",
   > "file":"libvirt-1-storage"}}'

5. Add the LUKS format blockdev node, linking the qcow2 disk with the
   LUKS header through the "header" field:
   $ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
   > "arguments":{"node-name":"libvirt-2-format", "driver":"luks",
   > "file":"libvirt-1-format", "header":"libvirt-2-storage",
   > "key-secret":"libvirt-2-storage-secret0"}}'

6. Finally, add the virtio-blk device:
   $ virsh qemu-monitor-command vm '{"execute":"device_add",
   > "arguments": {"num-queues":"1", "driver":"virtio-blk-pci",
   > "drive": "libvirt-2-format", "id":"virtio-disk2"}}'

Starting a VM with generic LUKS encryption works much like hot-plugging:
the same JSON arguments are used. The only difference is that the
"blockdev-add" and "device_add" commands become the "-blockdev" and
"-device" command-line options.
Signed-off-by: Hyman Huang --- block/crypto.c | 38 +- 1 file changed, 37 insertions(+), 1 deletion(-) diff --git a/block/crypto.c b/block/crypto.c index f82b13d32b..7d70349463 100644 --- a/block/crypto.c +++ b/block/crypto.c @@ -40,6 +40,7 @@ struct BlockCrypto { QCryptoBlock *block; bool updating_keys; BdrvChild *header; /* Reference to the detached LUKS header */ +bool detached_mode; /* If true, LUKS plays a detached header role */ }; @@ -64,12 +65,16 @@ static int block_crypto_read_func(QCryptoBlock *block, Error **errp) { BlockDriverState *bs = opaque; +BlockCrypto *crypto = bs->opaque; ssize_t ret; GLOBAL_STATE_CODE(); GRAPH_RDLOCK_GUARD_MAINLOOP(); -ret = bdrv_pread(bs->file, offset, buflen, buf, 0); +if (crypto->detached_mode) +ret = bdrv_pread(crypto->header, offset, buflen, buf, 0); +else +ret = bdrv_pread(bs->file, offset, buflen, buf, 0); if (ret < 0) { error_setg_errno(errp, -ret, "Could not read encryption header"); return ret; @@ -269,6 +274,8 @@ static int block_crypto_open_generic(QCryptoBlockFormat format, QCryptoBlockOpenOptions *open_opts = NULL; unsigned int cflags = 0; QDict *cryptoopts = NULL; +const char *header_bdref = +qdict_get_try_str(options, "header"); GLOBAL_STATE_CODE(); @@ -277,6 +284,16 @@ static int block_crypto_open_generic(QCryptoBlockFormat format, return ret; } +if (header_bdref) { +crypto->detached_mode = true; +crypto->header = bdrv_open_child(NULL, options, "header", bs, + _of_bds, BDRV_CHILD_METADATA, + false, errp); +if (!crypto->header) { +return -EINVAL; +} +} + GRAPH_RDLOCK_GUARD_MAINLOOP(); bs->supported_write_flags = BDRV_REQ_FUA & @@ -312,6 +329,14 @@ static int block_crypto_open_generic(QCryptoBlockFormat format, goto cleanup; } +if (crypto->detached_mode) { +/* + * Set payload offset to zero as the file bdref has no LUKS + * header under detached mode. 
+ */ +qcrypto_block_set_payload_offset(crypto->block, 0); +} + bs->encrypted = true; ret = 0; @@ -903,6 +928,17 @@ block_crypto_child_perms(BlockDriverState *bs, BdrvChild *c, BlockCrypto *crypto = bs->opaque; +if (role == (role & BDRV_CHILD_METADATA)) { +/* Assign read permission only */ +perm |= BLK_PERM_CONSISTENT_READ; +/* Share all permissions */ +shared |= BLK_PERM_ALL; + +*nperm = perm; +*nshared = shared; +return; +} + bdrv_default_perms(bs, c, role, reopen_queue, perm, shared, nperm, nshared); /* -- 2.39.1
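The permission branch added to `block_crypto_child_perms()` can be modelled as below. The bit values are hypothetical and do not match QEMU's `BLK_PERM_*` or `BDRV_CHILD_*` constants. The non-metadata path is only a placeholder for `bdrv_default_perms()`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical permission bits (not QEMU's BLK_PERM_* values). */
enum {
    DEMO_PERM_CONSISTENT_READ = 1u << 0,
    DEMO_PERM_WRITE           = 1u << 1,
    DEMO_PERM_RESIZE          = 1u << 2,
    DEMO_PERM_ALL             = (1u << 3) - 1,
};

/* Hypothetical child-role bits (not QEMU's BDRV_CHILD_* values). */
enum {
    DEMO_CHILD_DATA     = 1u << 0,
    DEMO_CHILD_METADATA = 1u << 1,
    DEMO_CHILD_PRIMARY  = 1u << 2,
};

/* Same test as the v2 patch: no role bit other than METADATA is set. */
static int demo_is_metadata_only(uint32_t role)
{
    return role == (role & DEMO_CHILD_METADATA);
}

static void demo_child_perms(uint32_t role, uint32_t perm, uint32_t shared,
                             uint32_t *nperm, uint32_t *nshared)
{
    if (demo_is_metadata_only(role)) {
        *nperm = perm | DEMO_PERM_CONSISTENT_READ; /* must read the header */
        *nshared = shared | DEMO_PERM_ALL;         /* other users may do anything */
        return;
    }
    /* Placeholder for the default policy (bdrv_default_perms() in QEMU). */
    *nperm = perm;
    *nshared = shared;
}
```

Note that `role == (role & BDRV_CHILD_METADATA)` is also true for a role of 0. If only a pure metadata child should take this branch, `role == BDRV_CHILD_METADATA` states the intent more directly.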
[v2 4/4] block: Support detached LUKS header creation for blockdev-create
Provide the "detached-mode" option for formatting a detached LUKS header.
For example, to format a LUKS header on a pre-created disk:

1. Add a protocol blockdev node for the LUKS header:
   $ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
   > "arguments":{"node-name":"libvirt-1-storage", "driver":"file",
   > "filename":"/path/to/cipher.gluks" }}'

2. Add the secret that will protect the LUKS header above:
   $ virsh qemu-monitor-command vm '{"execute":"object-add",
   > "arguments":{"qom-type": "secret", "id":
   > "libvirt-1-storage-secret0", "data": "abc123"}}'

3. Format the header node:
   $ virsh qemu-monitor-command vm '{"execute":"blockdev-create",
   > "arguments":{"job-id":"job0", "options":{"driver":"luks",
   > "size":0, "file":"libvirt-1-storage", "detached-mode":true,
   > "cipher-alg":"aes-256",
   > "key-secret":"libvirt-1-storage-secret0"}}}'

Signed-off-by: Hyman Huang --- block/crypto.c | 8 +++- qapi/block-core.json | 5 - 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/block/crypto.c b/block/crypto.c index 7d70349463..e77c49bd0c 100644 --- a/block/crypto.c +++ b/block/crypto.c @@ -667,10 +667,12 @@ block_crypto_co_create_luks(BlockdevCreateOptions *create_options, Error **errp) BlockDriverState *bs = NULL; QCryptoBlockCreateOptions create_opts; PreallocMode preallocation = PREALLOC_MODE_OFF; +int64_t size; int ret; assert(create_options->driver == BLOCKDEV_DRIVER_LUKS); luks_opts = _options->u.luks; +size = luks_opts->size; bs = bdrv_co_open_blockdev_ref(luks_opts->file, errp); if (bs == NULL) { @@ -686,7 +688,11 @@ block_crypto_co_create_luks(BlockdevCreateOptions *create_options, Error **errp) preallocation = luks_opts->preallocation; } -ret = block_crypto_co_create_generic(bs, luks_opts->size, _opts, +if (luks_opts->detached_mode) { +size = 0; +} + +ret = block_crypto_co_create_generic(bs, size, _opts, preallocation, errp); if (ret < 0) { goto fail; diff --git a/qapi/block-core.json b/qapi/block-core.json index
10be08d08f..1e7a7e1b05 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -4952,13 +4952,16 @@ # @preallocation: Preallocation mode for the new image (since: 4.2) # (default: off; allowed values: off, metadata, falloc, full) # +# @detached-mode: create a detached LUKS header. (since 9.0) +# # Since: 2.12 ## { 'struct': 'BlockdevCreateOptionsLUKS', 'base': 'QCryptoBlockCreateOptionsLUKS', 'data': { 'file': 'BlockdevRef', 'size': 'size', -'*preallocation': 'PreallocMode' } } +'*preallocation': 'PreallocMode', +'*detached-mode': 'bool'}} ## # @BlockdevCreateOptionsNfs: -- 2.39.1
[v2 1/4] crypto: Introduce option and structure for detached LUKS header
Add the "header" option for the LUKS format. This field identifies the blockdev that stores a detached LUKS header. In addition, introduce a header field in struct BlockCrypto. Signed-off-by: Hyman Huang --- block/crypto.c | 1 + qapi/block-core.json | 6 +- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/block/crypto.c b/block/crypto.c index 921933a5e5..f82b13d32b 100644 --- a/block/crypto.c +++ b/block/crypto.c @@ -39,6 +39,7 @@ typedef struct BlockCrypto BlockCrypto; struct BlockCrypto { QCryptoBlock *block; bool updating_keys; +BdrvChild *header; /* Reference to the detached LUKS header */ }; diff --git a/qapi/block-core.json b/qapi/block-core.json index ca390c5700..10be08d08f 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -3352,11 +3352,15 @@ # decryption key (since 2.6). Mandatory except when doing a # metadata-only probe of the image. # +# @header: optional reference to the location of a blockdev +# storing a detached LUKS header. (since 9.0) +# # Since: 2.9 ## { 'struct': 'BlockdevOptionsLUKS', 'base': 'BlockdevOptionsGenericFormat', - 'data': { '*key-secret': 'str' } } + 'data': { '*key-secret': 'str', +'*header': 'BlockdevRef'} } ## # @BlockdevOptionsGenericCOWFormat: -- 2.39.1
[RFC 4/8] Gluks: Introduce Gluks options
The Gluks format reuses the Luks options, except for the "size" option, which does not apply to an image that holds only header data. Signed-off-by: Hyman Huang --- block/crypto.c | 4 ++-- block/generic-luks.c | 18 ++ block/generic-luks.h | 3 +++ 3 files changed, 23 insertions(+), 2 deletions(-) diff --git a/block/crypto.c b/block/crypto.c index 6afae1de2e..6f8528dccc 100644 --- a/block/crypto.c +++ b/block/crypto.c @@ -150,7 +150,7 @@ error: } -static QemuOptsList block_crypto_runtime_opts_luks = { +QemuOptsList block_crypto_runtime_opts_luks = { .name = "crypto", .head = QTAILQ_HEAD_INITIALIZER(block_crypto_runtime_opts_luks.head), .desc = { @@ -181,7 +181,7 @@ static QemuOptsList block_crypto_create_opts_luks = { }; -static QemuOptsList block_crypto_amend_opts_luks = { +QemuOptsList block_crypto_amend_opts_luks = { .name = "crypto", .head = QTAILQ_HEAD_INITIALIZER(block_crypto_create_opts_luks.head), .desc = { diff --git a/block/generic-luks.c b/block/generic-luks.c index f23e202991..ebc0365d40 100644 --- a/block/generic-luks.c +++ b/block/generic-luks.c @@ -35,6 +35,21 @@ typedef struct BDRVGLUKSState { uint64_t header_size; /* In bytes */ } BDRVGLUKSState; +static QemuOptsList gluks_create_opts_luks = { +.name = "crypto", +.head = QTAILQ_HEAD_INITIALIZER(gluks_create_opts_luks.head), +.desc = { +BLOCK_CRYPTO_OPT_DEF_LUKS_KEY_SECRET(""), +BLOCK_CRYPTO_OPT_DEF_LUKS_CIPHER_ALG(""), +BLOCK_CRYPTO_OPT_DEF_LUKS_CIPHER_MODE(""), +BLOCK_CRYPTO_OPT_DEF_LUKS_IVGEN_ALG(""), +BLOCK_CRYPTO_OPT_DEF_LUKS_IVGEN_HASH_ALG(""), +BLOCK_CRYPTO_OPT_DEF_LUKS_HASH_ALG(""), +BLOCK_CRYPTO_OPT_DEF_LUKS_ITER_TIME(""), +{ /* end of list */ } +}, +}; + static int gluks_open(BlockDriverState *bs, QDict *options, int flags, Error **errp) { @@ -71,6 +86,9 @@ static BlockDriver bdrv_generic_luks = { .bdrv_co_create_opts= gluks_co_create_opts, .bdrv_child_perm= gluks_child_perms, .bdrv_co_getlength = gluks_co_getlength, + +.create_opts= _create_opts_luks, +.amend_opts = _crypto_amend_opts_luks, }; static void 
block_generic_luks_init(void) diff --git a/block/generic-luks.h b/block/generic-luks.h index 2aae866fa4..f18adf41ea 100644 --- a/block/generic-luks.h +++ b/block/generic-luks.h @@ -23,4 +23,7 @@ #ifndef GENERIC_LUKS_H #define GENERIC_LUKS_H +extern QemuOptsList block_crypto_runtime_opts_luks; +extern QemuOptsList block_crypto_amend_opts_luks; + #endif /* GENERIC_LUKS_H */ -- 2.39.1
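Patch 4/8 builds `gluks_create_opts_luks` from the same option definitions as LUKS but leaves out "size". A minimal sketch of that rule as a standalone allow-list check follows. It assumes the option names are the usual LUKS create keys produced by the `BLOCK_CRYPTO_OPT_DEF_LUKS_*("")` macros; the array and the helper are hypothetical and not part of QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical allow-list mirroring the LUKS create options that
 * gluks_create_opts_luks reuses; "size" is deliberately absent.
 */
static const char *const gluks_create_option_names[] = {
    "key-secret",
    "cipher-alg",
    "cipher-mode",
    "ivgen-alg",
    "ivgen-hash-alg",
    "hash-alg",
    "iter-time",
};

/* Return true if @name is accepted when creating a Gluks header image. */
static bool gluks_option_allowed(const char *name)
{
    size_t n = sizeof(gluks_create_option_names) /
               sizeof(gluks_create_option_names[0]);

    assert(name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(name, gluks_create_option_names[i]) == 0) {
            return true;
        }
    }
    return false;
}
```

With this check, `gluks_option_allowed("cipher-alg")` is true and `gluks_option_allowed("size")` is false. The series itself is more lenient: patch 8/8 reads a nonzero size, reports it with an informational message and ignores it instead of rejecting it.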
[RFC 6/8] crypto: Provide the Luks crypto driver to Gluks
Hooks up the Luks crypto driver for Gluks. Signed-off-by: Hyman Huang --- crypto/block.c | 1 + 1 file changed, 1 insertion(+) diff --git a/crypto/block.c b/crypto/block.c index 3dcf22a69f..7e695c0a04 100644 --- a/crypto/block.c +++ b/crypto/block.c @@ -27,6 +27,7 @@ static const QCryptoBlockDriver *qcrypto_block_drivers[] = { [Q_CRYPTO_BLOCK_FORMAT_QCOW] = _block_driver_qcow, [Q_CRYPTO_BLOCK_FORMAT_LUKS] = _block_driver_luks, +[Q_CRYPTO_BLOCK_FORMAT_GLUKS] = _block_driver_luks, }; -- 2.39.1
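This one-line change works because `qcrypto_block_drivers[]` is indexed by the format enum using designated initializers, so the Gluks slot can simply alias the LUKS driver. The following self-contained sketch shows the same pattern, with hypothetical enum and struct names standing in for `QCryptoBlockFormat` and `QCryptoBlockDriver`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for QCryptoBlockFormat and QCryptoBlockDriver. */
typedef enum {
    FMT_QCOW,
    FMT_LUKS,
    FMT_GLUKS,
    FMT__MAX,
} BlockFormat;

typedef struct {
    const char *name;
} CryptoDriver;

static const CryptoDriver driver_qcow = { "qcow" };
static const CryptoDriver driver_luks = { "luks" };

/*
 * Designated initializers let two formats share one driver object:
 * the Gluks entry points at the same LUKS driver.
 */
static const CryptoDriver *const drivers[FMT__MAX] = {
    [FMT_QCOW]  = &driver_qcow,
    [FMT_LUKS]  = &driver_luks,
    [FMT_GLUKS] = &driver_luks,
};

static const CryptoDriver *lookup_driver(BlockFormat fmt)
{
    assert((unsigned)fmt < FMT__MAX);
    return drivers[fmt];
}
```

Because both slots hold the same pointer, anything Gluks does differently cannot live in the shared crypto driver. In this series, that per-format work is done by the block-layer driver in block/generic-luks.c, which adjusts the payload offset after opening.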
[RFC 5/8] qapi: Introduce Gluks types to qapi
Gluks mostly reuses the Luks types and adds one extra option, "header", which references the Luks header node. Signed-off-by: Hyman Huang --- qapi/block-core.json | 22 +- qapi/crypto.json | 10 +++--- 2 files changed, 28 insertions(+), 4 deletions(-) diff --git a/qapi/block-core.json b/qapi/block-core.json index ca390c5700..e2208f6891 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -3185,12 +3185,14 @@ # # @snapshot-access: Since 7.0 # +# @gluks: Since 9.0 +# # Since: 2.9 ## { 'enum': 'BlockdevDriver', 'data': [ 'blkdebug', 'blklogwrites', 'blkreplay', 'blkverify', 'bochs', 'cloop', 'compress', 'copy-before-write', 'copy-on-read', 'dmg', -'file', 'snapshot-access', 'ftp', 'ftps', 'gluster', +'file', 'snapshot-access', 'ftp', 'ftps', 'gluks', 'gluster', {'name': 'host_cdrom', 'if': 'HAVE_HOST_BLOCK_DEVICE' }, {'name': 'host_device', 'if': 'HAVE_HOST_BLOCK_DEVICE' }, 'http', 'https', @@ -3957,6 +3959,23 @@ '*debug': 'int', '*logfile': 'str' } } +## +# @BlockdevOptionsGLUKS: +# +# Driver specific block device options for GLUKS. +# +# @header: reference to the definition of the luks header node. +# +# @key-secret: the ID of a QCryptoSecret object providing the +# decryption key. +# +# Since: 9.0 +## +{ 'struct': 'BlockdevOptionsGLUKS', + 'base': 'BlockdevOptionsGenericFormat', + 'data': { 'header': 'BlockdevRef', +'key-secret': 'str' } } + ## # @BlockdevOptionsIoUring: # @@ -4680,6 +4699,7 @@ 'file': 'BlockdevOptionsFile', 'ftp':'BlockdevOptionsCurlFtp', 'ftps': 'BlockdevOptionsCurlFtps', + 'gluks': 'BlockdevOptionsGLUKS', 'gluster':'BlockdevOptionsGluster', 'host_cdrom': { 'type': 'BlockdevOptionsFile', 'if': 'HAVE_HOST_BLOCK_DEVICE' }, diff --git a/qapi/crypto.json b/qapi/crypto.json index fd3d46ebd1..9afb242b5b 100644 --- a/qapi/crypto.json +++ b/qapi/crypto.json @@ -154,11 +154,13 @@ # # @luks: LUKS encryption format. Recommended for new images # +# @gluks: generic LUKS encryption format. 
(since 9.0) +# # Since: 2.6 ## { 'enum': 'QCryptoBlockFormat', # 'prefix': 'QCRYPTO_BLOCK_FORMAT', - 'data': ['qcow', 'luks']} + 'data': ['qcow', 'luks', 'gluks']} ## # @QCryptoBlockOptionsBase: @@ -246,7 +248,8 @@ 'base': 'QCryptoBlockOptionsBase', 'discriminator': 'format', 'data': { 'qcow': 'QCryptoBlockOptionsQCow', -'luks': 'QCryptoBlockOptionsLUKS' } } +'luks': 'QCryptoBlockOptionsLUKS', +'gluks': 'QCryptoBlockOptionsLUKS' } } ## # @QCryptoBlockCreateOptions: @@ -260,7 +263,8 @@ 'base': 'QCryptoBlockOptionsBase', 'discriminator': 'format', 'data': { 'qcow': 'QCryptoBlockOptionsQCow', -'luks': 'QCryptoBlockCreateOptionsLUKS' } } +'luks': 'QCryptoBlockCreateOptionsLUKS', +'gluks': 'QCryptoBlockCreateOptionsLUKS' } } ## # @QCryptoBlockInfoBase: -- 2.39.1
[RFC 1/8] crypto: Export util functions and structures
Gluks primarily reuses the Luks driver logic, so export several pre-existing functions and structures from block/crypto.c. Signed-off-by: Hyman Huang --- block/crypto.c | 16 block/crypto.h | 23 +++ 2 files changed, 27 insertions(+), 12 deletions(-) diff --git a/block/crypto.c b/block/crypto.c index 921933a5e5..6afae1de2e 100644 --- a/block/crypto.c +++ b/block/crypto.c @@ -34,14 +34,6 @@ #include "qemu/memalign.h" #include "crypto.h" -typedef struct BlockCrypto BlockCrypto; - -struct BlockCrypto { -QCryptoBlock *block; -bool updating_keys; -}; - - static int block_crypto_probe_generic(QCryptoBlockFormat format, const uint8_t *buf, int buf_size, @@ -321,7 +313,7 @@ static int block_crypto_open_generic(QCryptoBlockFormat format, } -static int coroutine_fn GRAPH_UNLOCKED +int coroutine_fn GRAPH_UNLOCKED block_crypto_co_create_generic(BlockDriverState *bs, int64_t size, QCryptoBlockCreateOptions *opts, PreallocMode prealloc, Error **errp) @@ -385,7 +377,7 @@ block_crypto_co_truncate(BlockDriverState *bs, int64_t offset, bool exact, return bdrv_co_truncate(bs->file, offset, exact, prealloc, 0, errp); } -static void block_crypto_close(BlockDriverState *bs) +void block_crypto_close(BlockDriverState *bs) { BlockCrypto *crypto = bs->opaque; qcrypto_block_free(crypto->block); @@ -404,7 +396,7 @@ static int block_crypto_reopen_prepare(BDRVReopenState *state, */ #define BLOCK_CRYPTO_MAX_IO_SIZE (1024 * 1024) -static int coroutine_fn GRAPH_RDLOCK +int coroutine_fn GRAPH_RDLOCK block_crypto_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes, QEMUIOVector *qiov, BdrvRequestFlags flags) { @@ -466,7 +458,7 @@ block_crypto_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes, } -static int coroutine_fn GRAPH_RDLOCK +int coroutine_fn GRAPH_RDLOCK block_crypto_co_pwritev(BlockDriverState *bs, int64_t offset, int64_t bytes, QEMUIOVector *qiov, BdrvRequestFlags flags) { diff --git a/block/crypto.h b/block/crypto.h index 72e792c9af..06465009f0 100644 --- a/block/crypto.h 
+++ b/block/crypto.h @@ -21,6 +21,8 @@ #ifndef BLOCK_CRYPTO_H #define BLOCK_CRYPTO_H +#include "crypto/block.h" + #define BLOCK_CRYPTO_OPT_DEF_KEY_SECRET(prefix, helpstr)\ { \ .name = prefix BLOCK_CRYPTO_OPT_QCOW_KEY_SECRET,\ @@ -131,4 +133,25 @@ block_crypto_amend_opts_init(QDict *opts, Error **errp); QCryptoBlockOpenOptions * block_crypto_open_opts_init(QDict *opts, Error **errp); +typedef struct BlockCrypto BlockCrypto; + +struct BlockCrypto { +QCryptoBlock *block; +bool updating_keys; +}; + +int coroutine_fn GRAPH_UNLOCKED +block_crypto_co_create_generic(BlockDriverState *bs, int64_t size, + QCryptoBlockCreateOptions *opts, + PreallocMode prealloc, Error **errp); + +int coroutine_fn GRAPH_RDLOCK +block_crypto_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes, + QEMUIOVector *qiov, BdrvRequestFlags flags); + +int coroutine_fn GRAPH_RDLOCK +block_crypto_co_pwritev(BlockDriverState *bs, int64_t offset, int64_t bytes, +QEMUIOVector *qiov, BdrvRequestFlags flags); + +void block_crypto_close(BlockDriverState *bs); #endif /* BLOCK_CRYPTO_H */ -- 2.39.1
[RFC 2/8] crypto: Introduce payload offset set function
Implement the payload offset set function for Gluks. Signed-off-by: Hyman Huang --- crypto/block.c | 4 include/crypto/block.h | 1 + 2 files changed, 5 insertions(+) diff --git a/crypto/block.c b/crypto/block.c index 7bb4b74a37..3dcf22a69f 100644 --- a/crypto/block.c +++ b/crypto/block.c @@ -319,6 +319,10 @@ QCryptoHashAlgorithm qcrypto_block_get_kdf_hash(QCryptoBlock *block) return block->kdfhash; } +void qcrypto_block_set_payload_offset(QCryptoBlock *block, uint64_t offset) +{ +block->payload_offset = offset; +} uint64_t qcrypto_block_get_payload_offset(QCryptoBlock *block) { diff --git a/include/crypto/block.h b/include/crypto/block.h index 4f63a37872..b47a90c529 100644 --- a/include/crypto/block.h +++ b/include/crypto/block.h @@ -312,4 +312,5 @@ void qcrypto_block_free(QCryptoBlock *block); G_DEFINE_AUTOPTR_CLEANUP_FUNC(QCryptoBlock, qcrypto_block_free) +void qcrypto_block_set_payload_offset(QCryptoBlock *block, uint64_t offset); #endif /* QCRYPTO_BLOCK_H */ -- 2.39.1
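In patch 7/8, `gluks_open()` saves the header size reported by `qcrypto_block_get_payload_offset()` and then calls this new setter with 0. In a Gluks setup the encrypted data begins at the start of the separate data node, not after an in-file header. The sketch below models that offset arithmetic using a hypothetical, simplified struct. It assumes, as in the existing LUKS read/write paths, that the data-node offset equals the payload offset plus the guest offset.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for QCryptoBlock: only the field
 * that decides where encrypted data begins in the data node.
 */
typedef struct {
    uint64_t payload_offset;   /* in bytes */
} CryptoBlock;

/* Models qcrypto_block_set_payload_offset() from this patch. */
static void block_set_payload_offset(CryptoBlock *b, uint64_t offset)
{
    b->payload_offset = offset;
}

/* Map a guest-visible byte offset to a byte offset in the data node. */
static uint64_t data_node_offset(const CryptoBlock *b, uint64_t guest_off)
{
    assert(guest_off <= UINT64_MAX - b->payload_offset);
    return b->payload_offset + guest_off;
}
```

Take an attached image with an illustrative 2 MiB header: guest offset 0 maps to byte 2097152 of the file. Once the payload offset is reset to 0, guest offset 4096 maps to byte 4096 of the data node.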
[RFC 3/8] Gluks: Add the basic framework
Gluks would be a built-in format in the QEMU block layer. Signed-off-by: Hyman Huang --- block/generic-luks.c | 81 block/generic-luks.h | 26 ++ block/meson.build| 1 + 3 files changed, 108 insertions(+) create mode 100644 block/generic-luks.c create mode 100644 block/generic-luks.h diff --git a/block/generic-luks.c b/block/generic-luks.c new file mode 100644 index 00..f23e202991 --- /dev/null +++ b/block/generic-luks.c @@ -0,0 +1,81 @@ +/* + * QEMU block driver for the generic luks encryption + * + * Copyright (c) 2024 SmartX Inc + * + * Author: Hyman Huang + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, see <http://www.gnu.org/licenses/>. 
+ * + */ + +#include "qemu/osdep.h" + +#include "block/block_int.h" +#include "block/crypto.h" +#include "crypto/block.h" + +#include "generic-luks.h" + +/* BDRVGLUKSState holds the state of one generic LUKS instance */ +typedef struct BDRVGLUKSState { +BlockCrypto crypto; +BdrvChild *header; /* LUKS header node */ +uint64_t header_size; /* In bytes */ +} BDRVGLUKSState; + +static int gluks_open(BlockDriverState *bs, QDict *options, int flags, + Error **errp) +{ +return 0; +} + +static int coroutine_fn GRAPH_UNLOCKED +gluks_co_create_opts(BlockDriver *drv, const char *filename, + QemuOpts *opts, Error **errp) +{ +return 0; +} + +static void +gluks_child_perms(BlockDriverState *bs, BdrvChild *c, + const BdrvChildRole role, + BlockReopenQueue *reopen_queue, + uint64_t perm, uint64_t shared, + uint64_t *nperm, uint64_t *nshared) +{ + +} + +static int64_t coroutine_fn GRAPH_RDLOCK +gluks_co_getlength(BlockDriverState *bs) +{ +return 0; +} + +static BlockDriver bdrv_generic_luks = { +.format_name= "gluks", +.instance_size = sizeof(BDRVGLUKSState), +.bdrv_open = gluks_open, +.bdrv_co_create_opts= gluks_co_create_opts, +.bdrv_child_perm= gluks_child_perms, +.bdrv_co_getlength = gluks_co_getlength, +}; + +static void block_generic_luks_init(void) +{ +bdrv_register(_generic_luks); +} + +block_init(block_generic_luks_init); diff --git a/block/generic-luks.h b/block/generic-luks.h new file mode 100644 index 00..2aae866fa4 --- /dev/null +++ b/block/generic-luks.h @@ -0,0 +1,26 @@ +/* + * QEMU block driver for the generic luks encryption + * + * Copyright (c) 2024 SmartX Inc + * + * Author: Hyman Huang + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. 
+ * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, see <http://www.gnu.org/licenses/>. + * + */ + +#ifndef GENERIC_LUKS_H +#define GENERIC_LUKS_H + +#endif /* GENERIC_LUKS_H */ diff --git a/block/meson.build b/block/meson.build index 59ff6d380c..74f2da7bed 100644 --- a/block/meson.build +++ b/block/meson.build @@ -39,6 +39,7 @@ block_ss.add(files( 'throttle.c', 'throttle-groups.c', 'write-threshold.c', + 'generic-luks.c', ), zstd, zlib, gnutls) system_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c')) -- 2.39.1
[RFC 8/8] block: Support Gluks format image creation using qemu-img
To create a Gluks header image, use the command as follows: $ qemu-img create --object secret,id=sec0,data=abc123 -f gluks > -o cipher-alg=aes-256,cipher-mode=xts -o key-secret=sec0 > cipher.gluks Signed-off-by: Hyman Huang --- block.c | 5 + block/generic-luks.c | 53 +++- 2 files changed, 57 insertions(+), 1 deletion(-) diff --git a/block.c b/block.c index bfb0861ec6..cc9a517a25 100644 --- a/block.c +++ b/block.c @@ -7517,6 +7517,11 @@ void bdrv_img_create(const char *filename, const char *fmt, goto out; } +if (!strcmp(fmt, "gluks")) { +qemu_opt_set(opts, "size", "0M", _err); +size = 0; +} + if (size == -1) { error_setg(errp, "Image creation needs a size parameter"); goto out; diff --git a/block/generic-luks.c b/block/generic-luks.c index 32cbedc86f..579f01c4b0 100644 --- a/block/generic-luks.c +++ b/block/generic-luks.c @@ -145,7 +145,58 @@ static int coroutine_fn GRAPH_UNLOCKED gluks_co_create_opts(BlockDriver *drv, const char *filename, QemuOpts *opts, Error **errp) { -return 0; +QCryptoBlockCreateOptions *create_opts = NULL; +BlockDriverState *bs = NULL; +QDict *cryptoopts; +int ret; + +if (qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0) != 0) { +info_report("gluks format image need not size parameter, ignore it"); +} + +cryptoopts = qemu_opts_to_qdict_filtered(opts, NULL, + _create_opts_luks, + true); + +qdict_put_str(cryptoopts, "format", +QCryptoBlockFormat_str(Q_CRYPTO_BLOCK_FORMAT_GLUKS)); + +create_opts = block_crypto_create_opts_init(cryptoopts, errp); +if (!create_opts) { +ret = -EINVAL; +goto fail; +} + +/* Create protocol layer */ +ret = bdrv_co_create_file(filename, opts, errp); +if (ret < 0) { +goto fail; +} + +bs = bdrv_co_open(filename, NULL, NULL, + BDRV_O_RDWR | BDRV_O_RESIZE | BDRV_O_PROTOCOL, errp); +if (!bs) { +ret = -EINVAL; +goto fail; +} +/* Create format layer */ +ret = block_crypto_co_create_generic(bs, 0, create_opts, 0, errp); +if (ret < 0) { +goto fail; +} + +ret = 0; +fail: +/* + * If an error occurred, delete 'filename'. 
Even if the file existed + * beforehand, it has been truncated and corrupted in the process. + */ +if (ret) { +bdrv_graph_co_rdlock(); +bdrv_co_delete_file_noerr(bs); +bdrv_graph_co_rdunlock(); +} +return ret; } static void -- 2.39.1
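Patch 8/8 handles the size in two places: `bdrv_img_create()` special-cases the "gluks" format and forces the size to 0, and `gluks_co_create_opts()` ignores any size it is given, because the image holds only header data. The hypothetical helper below, which is not QEMU API, folds both checks into a single decision. As in `bdrv_img_create()`, -1 means "no size given".

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not QEMU API: decide the size to create for a
 * given format.  @requested is -1 when the user gave no size.
 * Returns -1 when a size is required but missing.
 */
static int64_t img_create_size(const char *fmt, int64_t requested)
{
    assert(fmt != NULL);

    if (strcmp(fmt, "gluks") == 0) {
        /* Header-only image: any user-supplied size is ignored. */
        if (requested > 0) {
            fprintf(stderr, "gluks: size parameter ignored\n");
        }
        return 0;
    }
    /*
     * Other formats: pass the value through; -1 tells the caller to
     * report "Image creation needs a size parameter".
     */
    return requested;
}
```

Keeping the rule in one place keeps the two checks consistent. In the patch they are separate, so the generic `bdrv_img_create()` path carries a format-name comparison.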
[RFC 7/8] Gluks: Implement the fundamental block layer driver hooks
Signed-off-by: Hyman Huang --- block/generic-luks.c | 104 ++- 1 file changed, 102 insertions(+), 2 deletions(-) diff --git a/block/generic-luks.c b/block/generic-luks.c index ebc0365d40..32cbedc86f 100644 --- a/block/generic-luks.c +++ b/block/generic-luks.c @@ -23,8 +23,14 @@ #include "qemu/osdep.h" #include "block/block_int.h" +#include "block/block-io.h" #include "block/crypto.h" +#include "block/qdict.h" #include "crypto/block.h" +#include "qapi/error.h" +#include "qemu/error-report.h" +#include "qemu/module.h" +#include "qemu/option.h" #include "generic-luks.h" @@ -50,10 +56,89 @@ static QemuOptsList gluks_create_opts_luks = { }, }; +static int gluks_read_func(QCryptoBlock *block, + size_t offset, + uint8_t *buf, + size_t buflen, + void *opaque, + Error **errp) +{ + +BlockDriverState *bs = opaque; +BDRVGLUKSState *s = bs->opaque; +ssize_t ret; + +GLOBAL_STATE_CODE(); +GRAPH_RDLOCK_GUARD_MAINLOOP(); + +ret = bdrv_pread(s->header, offset, buflen, buf, 0); +if (ret < 0) { +error_setg_errno(errp, -ret, "Could not read generic luks header"); +return ret; +} +return 0; +} + static int gluks_open(BlockDriverState *bs, QDict *options, int flags, Error **errp) { -return 0; +BDRVGLUKSState *s = bs->opaque; +QemuOpts *opts = NULL; +QCryptoBlockOpenOptions *open_opts = NULL; +QDict *cryptoopts = NULL; +unsigned int cflags = 0; +int ret; + +GLOBAL_STATE_CODE(); + +if (!bdrv_open_child(NULL, options, "file", bs, _of_bds, + (BDRV_CHILD_DATA | BDRV_CHILD_PRIMARY), false, errp)) { +return -EINVAL; +} +s->header = bdrv_open_child(NULL, options, "header", bs, +_of_bds, BDRV_CHILD_METADATA, false, +errp); +if (!s->header) { +return -EINVAL; +} + +GRAPH_RDLOCK_GUARD_MAINLOOP(); + +opts = qemu_opts_create(_crypto_runtime_opts_luks, +NULL, 0, _abort); +if (!qemu_opts_absorb_qdict(opts, options, errp)) { +ret = -EINVAL; +goto cleanup; +} + +cryptoopts = qemu_opts_to_qdict(opts, NULL); +qdict_put_str(cryptoopts, "format", +QCryptoBlockFormat_str(Q_CRYPTO_BLOCK_FORMAT_GLUKS)); + 
+open_opts = block_crypto_open_opts_init(cryptoopts, errp); +if (!open_opts) { +goto cleanup; +} + +s->crypto.block = qcrypto_block_open(open_opts, NULL, + gluks_read_func, + bs, + cflags, + 1, + errp); +if (!s->crypto.block) { +ret = -EIO; +goto cleanup; +} + +s->header_size = qcrypto_block_get_payload_offset(s->crypto.block); +qcrypto_block_set_payload_offset(s->crypto.block, 0); + +ret = 0; + cleanup: +qobject_unref(cryptoopts); +qapi_free_QCryptoBlockOpenOptions(open_opts); +return ret; } static int coroutine_fn GRAPH_UNLOCKED @@ -70,13 +155,24 @@ gluks_child_perms(BlockDriverState *bs, BdrvChild *c, uint64_t perm, uint64_t shared, uint64_t *nperm, uint64_t *nshared) { +if (role & BDRV_CHILD_METADATA) { +/* assign read permission only */ +perm |= BLK_PERM_CONSISTENT_READ; +/* share all permissions */ +shared |= BLK_PERM_ALL; +*nperm = perm; +*nshared = shared; +return; +} + +bdrv_default_perms(bs, c, role, reopen_queue, perm, shared, nperm, nshared); } static int64_t coroutine_fn GRAPH_RDLOCK gluks_co_getlength(BlockDriverState *bs) { -return 0; +return bdrv_co_getlength(bs->file->bs); } static BlockDriver bdrv_generic_luks = { @@ -87,8 +183,12 @@ static BlockDriver bdrv_generic_luks = { .bdrv_child_perm= gluks_child_perms, .bdrv_co_getlength = gluks_co_getlength, +.bdrv_close = block_crypto_close, +.bdrv_co_preadv = block_crypto_co_preadv, +.bdrv_co_pwritev= block_crypto_co_pwritev, .create_opts= _create_opts_luks, .amend_opts = _crypto_amend_opts_luks, +.is_format = false, }; static void block_generic_luks_init(void) -- 2.39.1
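In `gluks_child_perms()`, the metadata (header) child has `BLK_PERM_CONSISTENT_READ` added and shares every permission, while other children fall back to `bdrv_default_perms()`. The sketch below isolates only the metadata branch. The flag names and values are hypothetical stand-ins for QEMU's `BLK_PERM_*` and `BDRV_CHILD_*` bits, and non-metadata children are passed through unchanged rather than modelling the default policy.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical permission and role bits modelled on QEMU's BLK_PERM_*
 * and BDRV_CHILD_* flags; the numeric values are illustrative only.
 */
enum {
    PERM_CONSISTENT_READ = 1u << 0,
    PERM_WRITE           = 1u << 1,
    PERM_WRITE_UNCHANGED = 1u << 2,
    PERM_RESIZE          = 1u << 3,
    PERM_ALL             = (1u << 4) - 1,
};

enum {
    ROLE_DATA     = 1u << 0,
    ROLE_METADATA = 1u << 1,
};

/*
 * Mirrors the metadata branch of gluks_child_perms(): the header child
 * additionally requests consistent reads and shares every permission.
 * Other children are returned unchanged in this sketch (the patch calls
 * bdrv_default_perms() for them instead).
 */
static void header_child_perms(uint32_t role, uint64_t perm, uint64_t shared,
                               uint64_t *nperm, uint64_t *nshared)
{
    assert(nperm && nshared);
    if (role & ROLE_METADATA) {
        *nperm = perm | PERM_CONSISTENT_READ;
        *nshared = shared | PERM_ALL;
        return;
    }
    *nperm = perm;
    *nshared = shared;
}
```

In the patch, the header child starts from the permissions passed in by the parent, so a parent requesting write access would also request it on the header node. The `/* assign read permission only */` comment suggests this may not be intended.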
[RFC 0/8] Support generic Luks encryption
This functionality was motivated by the following item on the to-do list in the QEMU crypto documentation:

https://wiki.qemu.org/Features/Block/Crypto

The last chapter, "separate header volume", says:

  The LUKS format has ability to store the header in a separate volume
  from the payload. We should extend the LUKS driver in QEMU to support
  this use case.

As a proof of concept, I've created this patchset, which I've named Gluks: generic Luks. As the name suggests, it offers encryption for any format that QEMU supports, at least in principle.

The design of the Gluks block layer driver is quite simple:

          virtio-blk/vhost-user-blk...(front-end device)
                           ^
                           |
                         Gluks              (format-like disk node)
                        /     \
                     file     header        (blockdev reference)
                      /         \
                   file         file        (protocol node)
                    |             |
                disk data     Luks data

We don't need to create a new disk format in order to use Gluks to encrypt a disk; all we need to do is create a Luks header image, which we refer to as a "Gluks" image because it contains only Luks header data and no user data. The creation command is nearly identical to the one for a Luks image:

$ qemu-img create --object secret,id=sec0,data=abc123 -f gluks -o cipher-alg=aes-256,cipher-mode=xts -o key-secret=sec0 cipher.gluks

Because a Gluks image contains only Luks header data, the "size" option is not accepted when creating it.

To hot-add a raw disk with Gluks encryption:

1. Add a protocol blockdev node for the data disk:
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add", "arguments":{"node-name": "libvirt-1-storage", "driver": "file", "filename": "/path/to/test_disk.raw"}}'

2. Add a protocol blockdev node for the Luks header:
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add", "arguments":{"node-name": "libvirt-2-storage", "driver": "file", "filename": "/path/to/cipher.gluks" }}'

3. Add the secret used to decrypt the cipher stored in the Gluks header:
$ virsh qemu-monitor-command vm '{"execute":"object-add", "arguments":{"qom-type": "secret", "id": "libvirt-2-storage-secret0", "data": "abc123"}}'

4. Add the Gluks-driven blockdev that connects the user disk with the Luks header; QEMU uses the cipher in the Luks header to encrypt and decrypt the disk data:
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add", "arguments":{"node-name": "libvirt-1-format", "driver": "gluks", "file": "libvirt-1-storage", "header": "libvirt-2-storage", "key-secret": "libvirt-2-storage-secret0"}}'

5. Finally, add the device:
$ virsh qemu-monitor-command vm '{"execute":"device_add", "arguments": {"num-queues": "1", "driver": "virtio-blk-pci", "scsi": "off", "drive": "libvirt-1-format", "id": "virtio-disk1"}}'

To hot-remove the raw disk, perform these steps in reverse order.

To hot-add a qcow2 disk with Gluks encryption:

1. Add a protocol blockdev node for the data disk:
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add", "arguments":{"node-name": "libvirt-1-storage", "driver": "file", "filename": "/path/to/test_disk.qcow2"}}'

2. Add a protocol blockdev node for the Luks header, as above (block ref: libvirt-2-storage).

3. Add the secret used to decrypt the cipher stored in the Gluks header, also as above (secret ref: libvirt-2-storage-secret0).

4. Add the qcow2-driven blockdev format node:
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add", "arguments":{"node-name": "libvirt-1-format", "driver": "qcow2", "file": "libvirt-1-storage"}}'

5. Add the Gluks-driven blockdev that connects the qcow2 disk with the Luks header:
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add", "arguments":{"node-name": "libvirt-2-format", "driver": "gluks", "file": "libvirt-1-format", "header": "libvirt-2-storage", "key-secret": "libvirt-2-storage-secret0"}}'

6. Finally, add the device:
$ virsh qemu-monitor-command vm '{"execute":"device_add", "arguments": {"num-queues": "
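Across these steps, node names and secret ids must agree. In particular, the "key-secret" of the gluks node has to name the id created by object-add. The minimal sketch below formats the gluks blockdev-add command from caller-supplied names, so the same variable can feed both commands. The helper is hypothetical and performs no JSON escaping; it simply refuses names containing a double quote or backslash.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reject names that would need JSON escaping; this sketch does none. */
static int needs_escaping(const char *s)
{
    return strpbrk(s, "\"\\") != NULL;
}

/*
 * Hypothetical helper: format the QMP blockdev-add command that creates
 * a gluks node.  @secret must be the id passed to the earlier object-add.
 * Returns the snprintf() result, or -1 if a name needs escaping.
 */
static int format_gluks_blockdev_add(char *buf, size_t len,
                                     const char *node, const char *file,
                                     const char *header, const char *secret)
{
    assert(buf && node && file && header && secret);
    if (needs_escaping(node) || needs_escaping(file) ||
        needs_escaping(header) || needs_escaping(secret)) {
        return -1;
    }
    return snprintf(buf, len,
                    "{\"execute\":\"blockdev-add\",\"arguments\":{"
                    "\"node-name\":\"%s\",\"driver\":\"gluks\","
                    "\"file\":\"%s\",\"header\":\"%s\","
                    "\"key-secret\":\"%s\"}}",
                    node, file, header, secret);
}
```

A return value greater than or equal to `len` means the buffer was too small and the command was truncated, following the usual `snprintf()` contract.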
[PATCH v4] crypto: Introduce SM4 symmetric cipher algorithm
Introduce the SM4 cipher algorithm (OSCCA GB/T 32907-2016). SM4 (GB/T 32907-2016) is a cryptographic standard issued by the Office of State Commercial Cryptography Administration of China (OSCCA) as an authorized cryptographic algorithm for use within China. Use the crypto-sm4 meson build option to explicitly control the feature, which is detected automatically by default. Signed-off-by: Hyman Huang Reviewed-by: Philippe Mathieu-Daudé --- crypto/block-luks.c | 11 crypto/cipher-gcrypt.c.inc | 8 ++ crypto/cipher-nettle.c.inc | 49 + crypto/cipher.c | 6 meson.build | 42 meson_options.txt | 2 ++ qapi/crypto.json| 5 +++- scripts/meson-buildoptions.sh | 3 ++ tests/unit/test-crypto-cipher.c | 13 + 9 files changed, 138 insertions(+), 1 deletion(-) diff --git a/crypto/block-luks.c b/crypto/block-luks.c index fb01ec38bb..f0813d69b4 100644 --- a/crypto/block-luks.c +++ b/crypto/block-luks.c @@ -95,12 +95,23 @@ qcrypto_block_luks_cipher_size_map_twofish[] = { { 0, 0 }, }; +#ifdef CONFIG_CRYPTO_SM4 +static const QCryptoBlockLUKSCipherSizeMap +qcrypto_block_luks_cipher_size_map_sm4[] = { +{ 16, QCRYPTO_CIPHER_ALG_SM4}, +{ 0, 0 }, +}; +#endif + static const QCryptoBlockLUKSCipherNameMap qcrypto_block_luks_cipher_name_map[] = { { "aes", qcrypto_block_luks_cipher_size_map_aes }, { "cast5", qcrypto_block_luks_cipher_size_map_cast5 }, { "serpent", qcrypto_block_luks_cipher_size_map_serpent }, { "twofish", qcrypto_block_luks_cipher_size_map_twofish }, +#ifdef CONFIG_CRYPTO_SM4 +{ "sm4", qcrypto_block_luks_cipher_size_map_sm4}, +#endif }; QEMU_BUILD_BUG_ON(sizeof(struct QCryptoBlockLUKSKeySlot) != 48); diff --git a/crypto/cipher-gcrypt.c.inc b/crypto/cipher-gcrypt.c.inc index a6a0117717..1377cbaf14 100644 --- a/crypto/cipher-gcrypt.c.inc +++ b/crypto/cipher-gcrypt.c.inc @@ -35,6 +35,9 @@ bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg, case QCRYPTO_CIPHER_ALG_SERPENT_256: case QCRYPTO_CIPHER_ALG_TWOFISH_128: case QCRYPTO_CIPHER_ALG_TWOFISH_256: +#ifdef CONFIG_CRYPTO_SM4 +case 
QCRYPTO_CIPHER_ALG_SM4: +#endif break; default: return false; @@ -219,6 +222,11 @@ static QCryptoCipher *qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg, case QCRYPTO_CIPHER_ALG_TWOFISH_256: gcryalg = GCRY_CIPHER_TWOFISH; break; +#ifdef CONFIG_CRYPTO_SM4 +case QCRYPTO_CIPHER_ALG_SM4: +gcryalg = GCRY_CIPHER_SM4; +break; +#endif default: error_setg(errp, "Unsupported cipher algorithm %s", QCryptoCipherAlgorithm_str(alg)); diff --git a/crypto/cipher-nettle.c.inc b/crypto/cipher-nettle.c.inc index 24cc61f87b..42b39e18a2 100644 --- a/crypto/cipher-nettle.c.inc +++ b/crypto/cipher-nettle.c.inc @@ -33,6 +33,9 @@ #ifndef CONFIG_QEMU_PRIVATE_XTS #include #endif +#ifdef CONFIG_CRYPTO_SM4 +#include +#endif static inline bool qcrypto_length_check(size_t len, size_t blocksize, Error **errp) @@ -426,6 +429,30 @@ DEFINE_ECB_CBC_CTR_XTS(qcrypto_nettle_twofish, QCryptoNettleTwofish, TWOFISH_BLOCK_SIZE, twofish_encrypt_native, twofish_decrypt_native) +#ifdef CONFIG_CRYPTO_SM4 +typedef struct QCryptoNettleSm4 { +QCryptoCipher base; +struct sm4_ctx key[2]; +} QCryptoNettleSm4; + +static void sm4_encrypt_native(void *ctx, size_t length, + uint8_t *dst, const uint8_t *src) +{ +struct sm4_ctx *keys = ctx; +sm4_crypt([0], length, dst, src); +} + +static void sm4_decrypt_native(void *ctx, size_t length, + uint8_t *dst, const uint8_t *src) +{ +struct sm4_ctx *keys = ctx; +sm4_crypt([1], length, dst, src); +} + +DEFINE_ECB(qcrypto_nettle_sm4, + QCryptoNettleSm4, SM4_BLOCK_SIZE, + sm4_encrypt_native, sm4_decrypt_native) +#endif bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg, QCryptoCipherMode mode) @@ -443,6 +470,9 @@ bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg, case QCRYPTO_CIPHER_ALG_TWOFISH_128: case QCRYPTO_CIPHER_ALG_TWOFISH_192: case QCRYPTO_CIPHER_ALG_TWOFISH_256: +#ifdef CONFIG_CRYPTO_SM4 +case QCRYPTO_CIPHER_ALG_SM4: +#endif break; default: return false; @@ -701,6 +731,25 @@ static QCryptoCipher *qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg, return >base; 
} +#ifdef CONFIG_CRYPTO_SM4 +case QCRYPTO_CIPHER_ALG_SM4: +{ +QCryptoNettleSm4 *ctx = g_new0(QCryptoNettleSm4, 1); + +switch (mode) { +case QCRYPTO_CIPHER_MODE_ECB: +ctx->base.driver = _nettle_sm4_driver_ecb; +
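The block-luks.c hunk lets a LUKS header name "sm4" by adding a cipher-name entry whose size map contains a single 16-byte (128-bit) key size. The sketch below models that two-level lookup with hypothetical types and algorithm ids. It follows the table shape in the patch but is not the upstream lookup code; the AES table is there only for contrast.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical algorithm ids standing in for QCRYPTO_CIPHER_ALG_*. */
typedef enum {
    ALG_NONE = 0,
    ALG_AES_128,
    ALG_AES_192,
    ALG_AES_256,
    ALG_SM4,
} CipherAlg;

typedef struct { unsigned key_bytes; CipherAlg alg; } SizeMap;
typedef struct { const char *name; const SizeMap *sizes; } NameMap;

static const SizeMap size_map_aes[] = {
    { 16, ALG_AES_128 },
    { 24, ALG_AES_192 },
    { 32, ALG_AES_256 },
    { 0, ALG_NONE },            /* terminator, like { 0, 0 } upstream */
};

/* SM4 defines a single 128-bit (16-byte) key size. */
static const SizeMap size_map_sm4[] = {
    { 16, ALG_SM4 },
    { 0, ALG_NONE },
};

static const NameMap name_map[] = {
    { "aes", size_map_aes },
    { "sm4", size_map_sm4 },
};

/*
 * Resolve a LUKS cipher name plus key length to an algorithm id,
 * or ALG_NONE if the combination is unknown.
 */
static CipherAlg lookup_cipher(const char *name, unsigned key_bytes)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(name_map) / sizeof(name_map[0]); i++) {
        if (strcmp(name_map[i].name, name) != 0) {
            continue;
        }
        for (const SizeMap *m = name_map[i].sizes; m->key_bytes; m++) {
            if (m->key_bytes == key_bytes) {
                return m->alg;
            }
        }
        return ALG_NONE;
    }
    return ALG_NONE;
}
```

Here `lookup_cipher("sm4", 16)` yields `ALG_SM4`, while `lookup_cipher("sm4", 32)` yields `ALG_NONE`, because the SM4 size map lists only a 16-byte key.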
[PATCH v3] crypto: Introduce SM4 symmetric cipher algorithm
Introduce the SM4 cipher algorithm (OSCCA GB/T 32907-2016). SM4 (GB/T 32907-2016) is a cryptographic standard issued by the Office of State Commercial Cryptography Administration of China (OSCCA) as an authorized cryptographic algorithm for use within China. Use the crypto-sm4 meson build option to explicitly control the feature, which is detected automatically by default. Signed-off-by: Hyman Huang --- crypto/block-luks.c | 11 crypto/cipher-gcrypt.c.inc | 8 ++ crypto/cipher-nettle.c.inc | 49 + crypto/cipher.c | 6 meson.build | 42 meson_options.txt | 2 ++ qapi/crypto.json| 5 +++- scripts/meson-buildoptions.sh | 3 ++ tests/unit/test-crypto-cipher.c | 13 + 9 files changed, 138 insertions(+), 1 deletion(-) diff --git a/crypto/block-luks.c b/crypto/block-luks.c index fb01ec38bb..f0813d69b4 100644 --- a/crypto/block-luks.c +++ b/crypto/block-luks.c @@ -95,12 +95,23 @@ qcrypto_block_luks_cipher_size_map_twofish[] = { { 0, 0 }, }; +#ifdef CONFIG_CRYPTO_SM4 +static const QCryptoBlockLUKSCipherSizeMap +qcrypto_block_luks_cipher_size_map_sm4[] = { +{ 16, QCRYPTO_CIPHER_ALG_SM4}, +{ 0, 0 }, +}; +#endif + static const QCryptoBlockLUKSCipherNameMap qcrypto_block_luks_cipher_name_map[] = { { "aes", qcrypto_block_luks_cipher_size_map_aes }, { "cast5", qcrypto_block_luks_cipher_size_map_cast5 }, { "serpent", qcrypto_block_luks_cipher_size_map_serpent }, { "twofish", qcrypto_block_luks_cipher_size_map_twofish }, +#ifdef CONFIG_CRYPTO_SM4 +{ "sm4", qcrypto_block_luks_cipher_size_map_sm4}, +#endif }; QEMU_BUILD_BUG_ON(sizeof(struct QCryptoBlockLUKSKeySlot) != 48); diff --git a/crypto/cipher-gcrypt.c.inc b/crypto/cipher-gcrypt.c.inc index a6a0117717..1377cbaf14 100644 --- a/crypto/cipher-gcrypt.c.inc +++ b/crypto/cipher-gcrypt.c.inc @@ -35,6 +35,9 @@ bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg, case QCRYPTO_CIPHER_ALG_SERPENT_256: case QCRYPTO_CIPHER_ALG_TWOFISH_128: case QCRYPTO_CIPHER_ALG_TWOFISH_256: +#ifdef CONFIG_CRYPTO_SM4 +case QCRYPTO_CIPHER_ALG_SM4: +#endif break; default: 
return false; @@ -219,6 +222,11 @@ static QCryptoCipher *qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg, case QCRYPTO_CIPHER_ALG_TWOFISH_256: gcryalg = GCRY_CIPHER_TWOFISH; break; +#ifdef CONFIG_CRYPTO_SM4 +case QCRYPTO_CIPHER_ALG_SM4: +gcryalg = GCRY_CIPHER_SM4; +break; +#endif default: error_setg(errp, "Unsupported cipher algorithm %s", QCryptoCipherAlgorithm_str(alg)); diff --git a/crypto/cipher-nettle.c.inc b/crypto/cipher-nettle.c.inc index 24cc61f87b..42b39e18a2 100644 --- a/crypto/cipher-nettle.c.inc +++ b/crypto/cipher-nettle.c.inc @@ -33,6 +33,9 @@ #ifndef CONFIG_QEMU_PRIVATE_XTS #include #endif +#ifdef CONFIG_CRYPTO_SM4 +#include +#endif static inline bool qcrypto_length_check(size_t len, size_t blocksize, Error **errp) @@ -426,6 +429,30 @@ DEFINE_ECB_CBC_CTR_XTS(qcrypto_nettle_twofish, QCryptoNettleTwofish, TWOFISH_BLOCK_SIZE, twofish_encrypt_native, twofish_decrypt_native) +#ifdef CONFIG_CRYPTO_SM4 +typedef struct QCryptoNettleSm4 { +QCryptoCipher base; +struct sm4_ctx key[2]; +} QCryptoNettleSm4; + +static void sm4_encrypt_native(void *ctx, size_t length, + uint8_t *dst, const uint8_t *src) +{ +struct sm4_ctx *keys = ctx; +sm4_crypt([0], length, dst, src); +} + +static void sm4_decrypt_native(void *ctx, size_t length, + uint8_t *dst, const uint8_t *src) +{ +struct sm4_ctx *keys = ctx; +sm4_crypt([1], length, dst, src); +} + +DEFINE_ECB(qcrypto_nettle_sm4, + QCryptoNettleSm4, SM4_BLOCK_SIZE, + sm4_encrypt_native, sm4_decrypt_native) +#endif bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg, QCryptoCipherMode mode) @@ -443,6 +470,9 @@ bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg, case QCRYPTO_CIPHER_ALG_TWOFISH_128: case QCRYPTO_CIPHER_ALG_TWOFISH_192: case QCRYPTO_CIPHER_ALG_TWOFISH_256: +#ifdef CONFIG_CRYPTO_SM4 +case QCRYPTO_CIPHER_ALG_SM4: +#endif break; default: return false; @@ -701,6 +731,25 @@ static QCryptoCipher *qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg, return >base; } +#ifdef CONFIG_CRYPTO_SM4 +case 
QCRYPTO_CIPHER_ALG_SM4: +{ +QCryptoNettleSm4 *ctx = g_new0(QCryptoNettleSm4, 1); + +switch (mode) { +case QCRYPTO_CIPHER_MODE_ECB: +ctx->base.driver = &qcrypto_nettle_sm4_driver_ecb; +
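The LUKS hunk in this patch registers the cipher name "sm4" with a single 16-byte key size. The following is a simplified, self-contained sketch of how such a name/key-size table can be resolved. The enum values, struct names, and `lookup_cipher()` are hypothetical stand-ins. They do not reproduce QEMU's actual lookup code in crypto/block-luks.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for QEMU's QCryptoCipherAlgorithm values. */
typedef enum {
    ALG_NONE = 0,
    ALG_AES_128,
    ALG_AES_256,
    ALG_SM4,
} CipherAlg;

/* One row per supported key size; a { 0, 0 } row terminates the list. */
typedef struct {
    size_t key_bytes;
    CipherAlg alg;
} CipherSizeMap;

typedef struct {
    const char *name;
    const CipherSizeMap *sizes;
} CipherNameMap;

static const CipherSizeMap size_map_aes[] = {
    { 16, ALG_AES_128 },
    { 32, ALG_AES_256 },
    { 0, 0 },
};

/* SM4 has a single 128-bit (16-byte) key size. */
static const CipherSizeMap size_map_sm4[] = {
    { 16, ALG_SM4 },
    { 0, 0 },
};

static const CipherNameMap name_map[] = {
    { "aes", size_map_aes },
    { "sm4", size_map_sm4 },
};

/* Resolve a LUKS header cipher name plus key length to an algorithm. */
static CipherAlg lookup_cipher(const char *name, size_t key_bytes)
{
    for (size_t i = 0; i < sizeof(name_map) / sizeof(name_map[0]); i++) {
        if (strcmp(name_map[i].name, name) != 0) {
            continue;
        }
        for (const CipherSizeMap *m = name_map[i].sizes; m->key_bytes; m++) {
            if (m->key_bytes == key_bytes) {
                return m->alg;
            }
        }
        return ALG_NONE; /* known name, unsupported key size */
    }
    return ALG_NONE;     /* unknown cipher name */
}
```

Because the name is matched first and the key size second, a header that says "sm4" with a 32-byte key is rejected rather than silently mapped to another algorithm.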
[PATCH v2] crypto: Introduce SM4 symmetric cipher algorithm
Introduce the SM4 cipher algorithm (OSCCA GB/T 32907-2016).

SM4 (GB/T 32907-2016) is a cryptographic standard issued by the Office of State Commercial Cryptography Administration of China (OSCCA) as an authorized cryptographic algorithm for use within China.

Use the crypto-sm4 meson build option to enable this feature.

Signed-off-by: Hyman Huang
---
crypto/block-luks.c | 11 crypto/cipher-gcrypt.c.inc | 8 ++ crypto/cipher-nettle.c.inc | 49 + crypto/cipher.c | 6 meson.build | 23 meson_options.txt | 2 ++ qapi/crypto.json| 5 +++- scripts/meson-buildoptions.sh | 3 ++ tests/unit/test-crypto-cipher.c | 13 + 9 files changed, 119 insertions(+), 1 deletion(-) diff --git a/crypto/block-luks.c b/crypto/block-luks.c index fb01ec38bb..f0813d69b4 100644 --- a/crypto/block-luks.c +++ b/crypto/block-luks.c @@ -95,12 +95,23 @@ qcrypto_block_luks_cipher_size_map_twofish[] = { { 0, 0 }, }; +#ifdef CONFIG_CRYPTO_SM4 +static const QCryptoBlockLUKSCipherSizeMap +qcrypto_block_luks_cipher_size_map_sm4[] = { +{ 16, QCRYPTO_CIPHER_ALG_SM4}, +{ 0, 0 }, +}; +#endif + static const QCryptoBlockLUKSCipherNameMap qcrypto_block_luks_cipher_name_map[] = { { "aes", qcrypto_block_luks_cipher_size_map_aes }, { "cast5", qcrypto_block_luks_cipher_size_map_cast5 }, { "serpent", qcrypto_block_luks_cipher_size_map_serpent }, { "twofish", qcrypto_block_luks_cipher_size_map_twofish }, +#ifdef CONFIG_CRYPTO_SM4 +{ "sm4", qcrypto_block_luks_cipher_size_map_sm4}, +#endif }; QEMU_BUILD_BUG_ON(sizeof(struct QCryptoBlockLUKSKeySlot) != 48); diff --git a/crypto/cipher-gcrypt.c.inc b/crypto/cipher-gcrypt.c.inc index a6a0117717..1377cbaf14 100644 --- a/crypto/cipher-gcrypt.c.inc +++ b/crypto/cipher-gcrypt.c.inc @@ -35,6 +35,9 @@ bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg, case QCRYPTO_CIPHER_ALG_SERPENT_256: case QCRYPTO_CIPHER_ALG_TWOFISH_128: case QCRYPTO_CIPHER_ALG_TWOFISH_256: +#ifdef CONFIG_CRYPTO_SM4 +case QCRYPTO_CIPHER_ALG_SM4: +#endif break; default: return false; @@ -219,6 +222,11 @@ static 
QCryptoCipher *qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg, case QCRYPTO_CIPHER_ALG_TWOFISH_256: gcryalg = GCRY_CIPHER_TWOFISH; break; +#ifdef CONFIG_CRYPTO_SM4 +case QCRYPTO_CIPHER_ALG_SM4: +gcryalg = GCRY_CIPHER_SM4; +break; +#endif default: error_setg(errp, "Unsupported cipher algorithm %s", QCryptoCipherAlgorithm_str(alg)); diff --git a/crypto/cipher-nettle.c.inc b/crypto/cipher-nettle.c.inc index 24cc61f87b..42b39e18a2 100644 --- a/crypto/cipher-nettle.c.inc +++ b/crypto/cipher-nettle.c.inc @@ -33,6 +33,9 @@ #ifndef CONFIG_QEMU_PRIVATE_XTS #include <nettle/xts.h> #endif +#ifdef CONFIG_CRYPTO_SM4 +#include <nettle/sm4.h> +#endif static inline bool qcrypto_length_check(size_t len, size_t blocksize, Error **errp) @@ -426,6 +429,30 @@ DEFINE_ECB_CBC_CTR_XTS(qcrypto_nettle_twofish, QCryptoNettleTwofish, TWOFISH_BLOCK_SIZE, twofish_encrypt_native, twofish_decrypt_native) +#ifdef CONFIG_CRYPTO_SM4 +typedef struct QCryptoNettleSm4 { +QCryptoCipher base; +struct sm4_ctx key[2]; +} QCryptoNettleSm4; + +static void sm4_encrypt_native(void *ctx, size_t length, + uint8_t *dst, const uint8_t *src) +{ +struct sm4_ctx *keys = ctx; +sm4_crypt(&keys[0], length, dst, src); +} + +static void sm4_decrypt_native(void *ctx, size_t length, + uint8_t *dst, const uint8_t *src) +{ +struct sm4_ctx *keys = ctx; +sm4_crypt(&keys[1], length, dst, src); +} + +DEFINE_ECB(qcrypto_nettle_sm4, + QCryptoNettleSm4, SM4_BLOCK_SIZE, + sm4_encrypt_native, sm4_decrypt_native) +#endif bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg, QCryptoCipherMode mode) @@ -443,6 +470,9 @@ bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg, case QCRYPTO_CIPHER_ALG_TWOFISH_128: case QCRYPTO_CIPHER_ALG_TWOFISH_192: case QCRYPTO_CIPHER_ALG_TWOFISH_256: +#ifdef CONFIG_CRYPTO_SM4 +case QCRYPTO_CIPHER_ALG_SM4: +#endif break; default: return false; @@ -701,6 +731,25 @@ static QCryptoCipher *qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg, 
return &ctx->base; } +#ifdef CONFIG_CRYPTO_SM4 +case QCRYPTO_CIPHER_ALG_SM4: +{ +QCryptoNettleSm4 *ctx = 
g_new0(QCryptoNettleSm4, 1); + +switch (mode) { +case QCRYPTO_CIPHER_MODE_ECB: +ctx->base.driver = &qcrypto_nettle_sm4_driver_ecb; +break; +default: +goto bad_cipher_mo
[PATCH] crypto: Introduce SM4 symmetric cipher algorithm
Introduce the SM4 cipher algorithm (OSCCA GB/T 32907-2016).

SM4 (GB/T 32907-2016) is a cryptographic standard issued by the Office of State Commercial Cryptography Administration of China (OSCCA) as an authorized cryptographic algorithm for use within China.

Signed-off-by: Hyman Huang
---
crypto/block-luks.c | 7 ++ crypto/cipher-gcrypt.c.inc | 4 crypto/cipher-nettle.c.inc | 42 + crypto/cipher.c | 2 ++ qapi/crypto.json| 5 +++- tests/unit/test-crypto-cipher.c | 11 + 6 files changed, 70 insertions(+), 1 deletion(-) diff --git a/crypto/block-luks.c b/crypto/block-luks.c index fb01ec38bb..1cb7f21a05 100644 --- a/crypto/block-luks.c +++ b/crypto/block-luks.c @@ -95,12 +95,19 @@ qcrypto_block_luks_cipher_size_map_twofish[] = { { 0, 0 }, }; +static const QCryptoBlockLUKSCipherSizeMap +qcrypto_block_luks_cipher_size_map_sm4[] = { +{ 16, QCRYPTO_CIPHER_ALG_SM4}, +{ 0, 0 }, +}; + static const QCryptoBlockLUKSCipherNameMap qcrypto_block_luks_cipher_name_map[] = { { "aes", qcrypto_block_luks_cipher_size_map_aes }, { "cast5", qcrypto_block_luks_cipher_size_map_cast5 }, { "serpent", qcrypto_block_luks_cipher_size_map_serpent }, { "twofish", qcrypto_block_luks_cipher_size_map_twofish }, +{ "sm4", qcrypto_block_luks_cipher_size_map_sm4}, }; QEMU_BUILD_BUG_ON(sizeof(struct QCryptoBlockLUKSKeySlot) != 48); diff --git a/crypto/cipher-gcrypt.c.inc b/crypto/cipher-gcrypt.c.inc index a6a0117717..03af50b0c3 100644 --- a/crypto/cipher-gcrypt.c.inc +++ b/crypto/cipher-gcrypt.c.inc @@ -35,6 +35,7 @@ bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg, case QCRYPTO_CIPHER_ALG_SERPENT_256: case QCRYPTO_CIPHER_ALG_TWOFISH_128: case QCRYPTO_CIPHER_ALG_TWOFISH_256: +case QCRYPTO_CIPHER_ALG_SM4: break; default: return false; @@ -219,6 +220,9 @@ static QCryptoCipher *qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg, case QCRYPTO_CIPHER_ALG_TWOFISH_256: gcryalg = GCRY_CIPHER_TWOFISH; break; +case QCRYPTO_CIPHER_ALG_SM4: +gcryalg = GCRY_CIPHER_SM4; +break; default: error_setg(errp, "Unsupported 
cipher algorithm %s", QCryptoCipherAlgorithm_str(alg)); diff --git a/crypto/cipher-nettle.c.inc b/crypto/cipher-nettle.c.inc index 24cc61f87b..cd2ca0c7b5 100644 --- a/crypto/cipher-nettle.c.inc +++ b/crypto/cipher-nettle.c.inc @@ -30,6 +30,7 @@ #include #include #include +#include <nettle/sm4.h> #ifndef CONFIG_QEMU_PRIVATE_XTS #include <nettle/xts.h> #endif @@ -426,6 +427,28 @@ DEFINE_ECB_CBC_CTR_XTS(qcrypto_nettle_twofish, QCryptoNettleTwofish, TWOFISH_BLOCK_SIZE, twofish_encrypt_native, twofish_decrypt_native) +typedef struct QCryptoNettleSm4 { +QCryptoCipher base; +struct sm4_ctx key[2]; +} QCryptoNettleSm4; + +static void sm4_encrypt_native(void *ctx, size_t length, + uint8_t *dst, const uint8_t *src) +{ +struct sm4_ctx *keys = ctx; +sm4_crypt(&keys[0], length, dst, src); +} + +static void sm4_decrypt_native(void *ctx, size_t length, + uint8_t *dst, const uint8_t *src) +{ +struct sm4_ctx *keys = ctx; +sm4_crypt(&keys[1], length, dst, src); +} + +DEFINE_ECB(qcrypto_nettle_sm4, + QCryptoNettleSm4, SM4_BLOCK_SIZE, + sm4_encrypt_native, sm4_decrypt_native) bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg, QCryptoCipherMode mode) @@ -443,6 +466,7 @@ bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg, case QCRYPTO_CIPHER_ALG_TWOFISH_128: case QCRYPTO_CIPHER_ALG_TWOFISH_192: case QCRYPTO_CIPHER_ALG_TWOFISH_256: +case QCRYPTO_CIPHER_ALG_SM4: break; default: return false; @@ -702,6 +726,24 @@ static QCryptoCipher *qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg, return &ctx->base; } +case QCRYPTO_CIPHER_ALG_SM4: +{ +QCryptoNettleSm4 *ctx = g_new0(QCryptoNettleSm4, 1); + +switch (mode) { +case QCRYPTO_CIPHER_MODE_ECB: +ctx->base.driver = &qcrypto_nettle_sm4_driver_ecb; +break; +default: +goto bad_cipher_mode; +} + +sm4_set_encrypt_key(&ctx->key[0], key); +sm4_set_decrypt_key(&ctx->key[1], key); + +return &ctx->base; +} + default: error_setg(errp, "Unsupported cipher algorithm %s", QCryptoCipherAlgorithm_str(alg)); diff --git 
a/crypto/cipher.c b/crypto/cipher.c index 74b09a5b26..048ceaa6a3 100644 --- a/crypto/cipher.c +++ 
b/crypto/cipher.c @@ -38,6 +38,7 @@ static const size_t alg_key_len[QCRYPTO_CIPHER_ALG__MAX] = { [QCRYPTO_CIPHER_ALG_TWOFISH_128] = 16, [QCRYPTO_CIPHER_ALG_TWOFIS
[v2 1/2] qapi/virtio: Add feature and status bits for x-query-virtio-status
This patch makes x-query-virtio-status also report the feature and status bits as raw numbers. Applications may find it helpful to compare numerically encoded status and features directly. For example, a management application that wants to verify that virtio negotiation between the guest, QEMU, and OVS-DPDK is correct can directly compare the features (encoded as a number) in the output of "ovs-vsctl list interface" with the feature-bits fields in the output of the QMP command "x-query-virtio-status", without any further decoding.

This patch also prepares for the next one, which implements a vhost-user test case for the acked features of the vhost-user protocol.

Since the matching HMP command is typically used by humans, it is left unchanged.

Signed-off-by: Hyman Huang
---
hw/virtio/virtio-qmp.c | 8 qapi/virtio.json | 37 + 2 files changed, 45 insertions(+) diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c index 1dd96ed20f..13ba1e926e 100644 --- a/hw/virtio/virtio-qmp.c +++ b/hw/virtio/virtio-qmp.c @@ -733,6 +733,9 @@ VirtioStatus *qmp_x_query_virtio_status(const char *path, Error **errp) status->name = g_strdup(vdev->name); status->device_id = vdev->device_id; status->vhost_started = vdev->vhost_started; +status->guest_features_bits = vdev->guest_features; +status->host_features_bits = vdev->host_features; +status->backend_features_bits = vdev->backend_features; status->guest_features = qmp_decode_features(vdev->device_id, vdev->guest_features); status->host_features = qmp_decode_features(vdev->device_id, @@ -753,6 +756,7 @@ VirtioStatus *qmp_x_query_virtio_status(const char *path, Error **errp) } status->num_vqs = virtio_get_num_queues(vdev); +status->status_bits = vdev->status; status->status = qmp_decode_status(vdev->status); status->isr = vdev->isr; status->queue_sel = vdev->queue_sel; @@ -775,6 +779,10 @@ VirtioStatus *qmp_x_query_virtio_status(const char *path, Error **errp) status->vhost_dev->n_tmp_sections = hdev->n_tmp_sections; 
status->vhost_dev->nvqs = hdev->nvqs; status->vhost_dev->vq_index = hdev->vq_index; +status->vhost_dev->features_bits = hdev->features; +status->vhost_dev->acked_features_bits = hdev->acked_features; +status->vhost_dev->backend_features_bits = hdev->backend_features; +status->vhost_dev->protocol_features_bits = hdev->protocol_features; status->vhost_dev->features = qmp_decode_features(vdev->device_id, hdev->features); status->vhost_dev->acked_features = diff --git a/qapi/virtio.json b/qapi/virtio.json index e6dcee7b83..6f1b5e3710 100644 --- a/qapi/virtio.json +++ b/qapi/virtio.json @@ -79,12 +79,20 @@ # # @vq-index: vhost_dev vq_index # +# @features-bits: vhost_dev features encoded as a number +# # @features: vhost_dev features # +# @acked-features-bits: vhost_dev acked_features encoded as a number +# # @acked-features: vhost_dev acked_features # +# @backend-features-bits: vhost_dev backend_features encoded as a number +# # @backend-features: vhost_dev backend_features # +# @protocol-features-bits: vhost_dev protocol_features encoded as a number +# # @protocol-features: vhost_dev protocol_features # # @max-queues: vhost_dev max_queues @@ -102,9 +110,13 @@ 'n-tmp-sections': 'int', 'nvqs': 'uint32', 'vq-index': 'int', +'features-bits': 'uint64', 'features': 'VirtioDeviceFeatures', +'acked-features-bits': 'uint64', 'acked-features': 'VirtioDeviceFeatures', +'backend-features-bits': 'uint64', 'backend-features': 'VirtioDeviceFeatures', +'protocol-features-bits': 'uint64', 'protocol-features': 'VhostDeviceProtocols', 'max-queues': 'uint64', 'backend-cap': 'uint64', @@ -124,10 +136,16 @@ # # @vhost-started: VirtIODevice vhost_started flag # +# @guest-features-bits: VirtIODevice guest_features encoded as a number +# # @guest-features: VirtIODevice guest_features # +# @host-features-bits: VirtIODevice host_features encoded as a number +# # @host-features: VirtIODevice host_features # +# @backend-features-bits: VirtIODevice backend_features encoded as a number +# # 
@backend-features: VirtIODevice backend_features # # @device-endian: VirtIODevice device_endian @@ -135,6 +153,9 @@ # @num-vqs: VirtIODevice virtqueue count. This is the number of # active virtqueues being used by the VirtIODevice. # +# @status-bits: VirtIODevice configuration status encoded as a number +# (VirtioDeviceStatus) +# # @status: VirtIODevice configuration s
[v2 2/2] vhost-user-test: Add negotiated features check
When a vhost-user network device is restored after an unexpected failure, acked_features can be used as the input for the VHOST_USER_SET_FEATURES command, because QEMU internally backs up the final features as acked_features after the guest acknowledges them during virtio-net driver initialization. The negotiated features check verifies that the features in the vhost slave device and the acked_features in QEMU are identical. By exercising the vhost-user protocol, the test case verifies that the vhost-user network device negotiates features correctly.

Signed-off-by: Hyman Huang
---
tests/qtest/vhost-user-test.c | 100 ++ 1 file changed, 100 insertions(+) diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c index d4e437265f..4f98ee2560 100644 --- a/tests/qtest/vhost-user-test.c +++ b/tests/qtest/vhost-user-test.c @@ -13,6 +13,7 @@ #include "libqtest-single.h" #include "qapi/error.h" #include "qapi/qmp/qdict.h" +#include "qapi/qmp/qlist.h" #include "qemu/config-file.h" #include "qemu/option.h" #include "qemu/range.h" @@ -169,6 +170,7 @@ typedef struct TestServer { int test_flags; int queues; struct vhost_user_ops *vu_ops; +uint64_t features; } TestServer; struct vhost_user_ops { @@ -1020,6 +1022,100 @@ static void test_multiqueue(void *obj, void *arg, QGuestAllocator *alloc) } +static QDict *query_virtio(QTestState *who) +{ +QDict *rsp; + +rsp = qtest_qmp(who, "{ 'execute': 'x-query-virtio'}"); +g_assert(!qdict_haskey(rsp, "error")); +g_assert(qdict_haskey(rsp, "return")); + +return rsp; +} + +static QDict *query_virtio_status(QTestState *who, const char *path) +{ +QDict *rsp; + +rsp = qtest_qmp(who, "{ 'execute': 'x-query-virtio-status', " +"'arguments': { 'path': %s} }", path); + +g_assert(!qdict_haskey(rsp, "error")); +g_assert(qdict_haskey(rsp, "return")); + +return rsp; +} + +static uint64_t get_acked_features(QTestState *who) +{ +QDict *rsp_return, *status, *vhost_info, *dev; +QList *dev_list; +const QListEntry *entry; +const 
char *name; +char *path; +uint64_t acked_features; + +/* query the virtio devices */ +rsp_return = query_virtio(who); +g_assert(rsp_return); + +dev_list = qdict_get_qlist(rsp_return, "return"); +g_assert(dev_list && !qlist_empty(dev_list)); + +/* fetch the first and the sole device */ +entry = qlist_first(dev_list); +g_assert(entry); + +dev = qobject_to(QDict, qlist_entry_obj(entry)); +g_assert(dev); + +name = qdict_get_try_str(dev, "name"); +g_assert_cmpstr(name, ==, "virtio-net"); + +path = g_strdup(qdict_get_try_str(dev, "path")); +g_assert(path); +qobject_unref(rsp_return); +rsp_return = NULL; + +/* fetch the status of the virtio-net device by QOM path */ +rsp_return = query_virtio_status(who, path); +g_assert(rsp_return); + +status = qdict_get_qdict(rsp_return, "return"); +g_assert(status); + +vhost_info = qdict_get_qdict(status, "vhost-dev"); +g_assert(vhost_info); + +acked_features = qdict_get_try_int(vhost_info, "acked-features-bits", 0); + +qobject_unref(rsp_return); +g_free(path); + +return acked_features; +} + +static void acked_features_check(QTestState *qts, TestServer *s) +{ +uint64_t acked_features; + +acked_features = get_acked_features(qts); +g_assert_cmpint(acked_features, ==, s->features); +} + +static void test_acked_features(void *obj, + void *arg, + QGuestAllocator *alloc) +{ +TestServer *server = arg; + +if (!wait_for_fds(server)) { +return; +} + +acked_features_check(global_qtest, server); +} + static uint64_t vu_net_get_features(TestServer *s) { uint64_t features = 0x1ULL << VHOST_F_LOG_ALL | @@ -1040,6 +1136,7 @@ static void vu_net_set_features(TestServer *s, CharBackend *chr, qemu_chr_fe_disconnect(chr); s->test_flags = TEST_FLAGS_BAD; } +s->features = msg->payload.u64; } static void vu_net_get_protocol_features(TestServer *s, CharBackend *chr, @@ -1109,6 +1206,9 @@ static void register_vhost_user_test(void) qos_add_test("vhost-user/multiqueue", "virtio-net", test_multiqueue, &opts); 
+qos_add_test("vhost-user/read_acked_features", 
"virtio-net", + test_acked_features, &opts); } libqos_init(register_vhost_user_test); -- 2.39.1
[v2 0/2] vhost-user-test: Add negotiated features check
Thanks to Markus for the suggestions that led to the changes in v2.

v2:
- rebase on master
- drop the "show-bits" option
- refine the comments

v1:
The patchset "Fix the virtio features negotiation flaw" fixed a vhost-user negotiation flaw:
c9bdc449f9 vhost-user: Fix the virtio features negotiation flaw
bebcac052a vhost-user: Refactor the chr_closed_bh
937b7d96e4 vhost-user: Refactor vhost acked features saving

However, its test case remains unmerged; for details, see:
https://lore.kernel.org/qemu-devel/cover.1667232396.git.huang...@chinatelecom.cn/

Since Michael pointed out that "info virtio" is a sensible way to query the negotiated features, this patchset uses x-query-virtio-status to retrieve the features, instead of exporting netdev capabilities and information as the previous patchset did, to help confirm that the negotiation is correct.

To do that, we first introduce a "show-bits" argument for x-query-virtio-status so that the feature bits can be used directly, and then implement the negotiated features check as a test case. The code is split into two patches.

Please review, thanks,
Yong

Hyman Huang (2):
  qapi/virtio: Add feature and status bits for x-query-virtio-status
  vhost-user-test: Add negotiated features check

 hw/virtio/virtio-qmp.c        |   8 +++
 qapi/virtio.json              |  37 +
 tests/qtest/vhost-user-test.c | 100 ++
 3 files changed, 145 insertions(+)

-- 
2.39.1
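With the new `*-bits` fields, comparing QEMU's view of the features with another component's view reduces to comparing two 64-bit masks. The sketch below assumes both values have already been parsed into integers, for example one from "ovs-vsctl list interface" and one from x-query-virtio-status. `report_feature_diff()` is a hypothetical helper, not part of QEMU or OVS.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Print every feature bit that differs between two 64-bit masks and
 * return how many bits differ (0 means the two sides agree).
 */
static int report_feature_diff(uint64_t backend, uint64_t qemu)
{
    uint64_t diff = backend ^ qemu;   /* set bits mark disagreements */
    int count = 0;

    for (int bit = 0; bit < 64; bit++) {
        if (diff & (UINT64_C(1) << bit)) {
            printf("bit %d: backend=%d qemu=%d\n", bit,
                   (int)((backend >> bit) & 1), (int)((qemu >> bit) & 1));
            count++;
        }
    }
    return count;
}
```

Working on raw masks avoids having to map QEMU's decoded feature names back to bit positions before comparing them.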
[RFC 1/2] qapi/virtio: introduce the "show-bits" argument for x-query-virtio-status
This patch allows the feature and status bits to be displayed in virtio-status. An optional argument, show-bits, is introduced. For example:

  {"execute": "x-query-virtio-status",
   "arguments": {"path": "/machine/peripheral-anon/device[1]/virtio-backend",
                 "show-bits": true}}

Feature and status bits can help applications that need to compare them directly. For instance, when an upper-layer application wants to ensure that virtio negotiation between the guest, QEMU, and OVS-DPDK is correct, it uses the "ovs-vsctl list interface" command to retrieve the interface features (as a number) and the QMP command x-query-virtio-status to retrieve the vhost-user net device features. With "show-bits", the application can compare the two directly, with no need to encode the features returned by the QMP command.

This patch also prepares for the next one, which implements a vhost-user test case for the acked features of the vhost-user protocol.

Since the matching HMP command is typically used by humans, it is left unchanged.
Signed-off-by: Hyman Huang
---
hw/virtio/virtio-hmp-cmds.c | 2 +- hw/virtio/virtio-qmp.c | 21 +++- qapi/virtio.json| 49 ++--- 3 files changed, 67 insertions(+), 5 deletions(-) diff --git a/hw/virtio/virtio-hmp-cmds.c b/hw/virtio/virtio-hmp-cmds.c index 477c97dea2..3774f3d4bf 100644 --- a/hw/virtio/virtio-hmp-cmds.c +++ b/hw/virtio/virtio-hmp-cmds.c @@ -108,7 +108,7 @@ void hmp_virtio_status(Monitor *mon, const QDict *qdict) { Error *err = NULL; const char *path = qdict_get_try_str(qdict, "path"); -VirtioStatus *s = qmp_x_query_virtio_status(path, &err); +VirtioStatus *s = qmp_x_query_virtio_status(path, false, false, &err); if (err != NULL) { hmp_handle_error(mon, err); diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c index 1dd96ed20f..2e92bf28ac 100644 --- a/hw/virtio/virtio-qmp.c +++ b/hw/virtio/virtio-qmp.c @@ -718,10 +718,15 @@ VirtIODevice *qmp_find_virtio_device(const char *path) return VIRTIO_DEVICE(dev); } -VirtioStatus *qmp_x_query_virtio_status(const char *path, Error **errp) +VirtioStatus *qmp_x_query_virtio_status(const char *path, +bool has_show_bits, +bool show_bits, +Error **errp) { VirtIODevice *vdev; VirtioStatus *status; +bool display_bits = +has_show_bits ? 
show_bits : false; vdev = qmp_find_virtio_device(path); if (vdev == NULL) { @@ -733,6 +738,11 @@ VirtioStatus *qmp_x_query_virtio_status(const char *path, Error **errp) status->name = g_strdup(vdev->name); status->device_id = vdev->device_id; status->vhost_started = vdev->vhost_started; +if (display_bits) { +status->guest_features_bits = vdev->guest_features; +status->host_features_bits = vdev->host_features; +status->backend_features_bits = vdev->backend_features; +} status->guest_features = qmp_decode_features(vdev->device_id, vdev->guest_features); status->host_features = qmp_decode_features(vdev->device_id, @@ -753,6 +763,9 @@ VirtioStatus *qmp_x_query_virtio_status(const char *path, Error **errp) } status->num_vqs = virtio_get_num_queues(vdev); +if (display_bits) { +status->status_bits = vdev->status; +} status->status = qmp_decode_status(vdev->status); status->isr = vdev->isr; status->queue_sel = vdev->queue_sel; @@ -775,6 +788,12 @@ VirtioStatus *qmp_x_query_virtio_status(const char *path, Error **errp) status->vhost_dev->n_tmp_sections = hdev->n_tmp_sections; status->vhost_dev->nvqs = hdev->nvqs; status->vhost_dev->vq_index = hdev->vq_index; +if (display_bits) { +status->vhost_dev->features_bits = hdev->features; +status->vhost_dev->acked_features_bits = hdev->acked_features; +status->vhost_dev->backend_features_bits = hdev->backend_features; +status->vhost_dev->protocol_features_bits = hdev->protocol_features; +} status->vhost_dev->features = qmp_decode_features(vdev->device_id, hdev->features); status->vhost_dev->acked_features = diff --git a/qapi/virtio.json b/qapi/virtio.json index e6dcee7b83..608b841a89 100644 --- a/qapi/virtio.json +++ b/qapi/virtio.json @@ -79,12 +79,20 @@ # # @vq-index: vhost_dev vq_index # +# @features-bits: vhost_dev features in decimal format +# # @features: vhost_dev features # +# @acked-features-bits: vhost_dev acked_features in decimal format +# # @acked-features: vhost_dev acked
[RFC 2/2] vhost-user-test: Add negotiated features check
When a vhost-user network device is restored after an unexpected failure, acked_features can be used as the input for the VHOST_USER_SET_FEATURES command, because QEMU internally backs up the final features as acked_features after the guest acknowledges them during virtio-net driver initialization. The negotiated features check verifies that the features in the vhost slave device and the acked_features in QEMU are identical. By exercising the vhost-user protocol, the test case verifies that the vhost-user network device negotiates features correctly.

Signed-off-by: Hyman Huang
---
tests/qtest/vhost-user-test.c | 100 ++ 1 file changed, 100 insertions(+) diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c index d4e437265f..14df89f823 100644 --- a/tests/qtest/vhost-user-test.c +++ b/tests/qtest/vhost-user-test.c @@ -13,6 +13,7 @@ #include "libqtest-single.h" #include "qapi/error.h" #include "qapi/qmp/qdict.h" +#include "qapi/qmp/qlist.h" #include "qemu/config-file.h" #include "qemu/option.h" #include "qemu/range.h" @@ -169,6 +170,7 @@ typedef struct TestServer { int test_flags; int queues; struct vhost_user_ops *vu_ops; +uint64_t features; } TestServer; struct vhost_user_ops { @@ -1020,6 +1022,100 @@ static void test_multiqueue(void *obj, void *arg, QGuestAllocator *alloc) } +static QDict *query_virtio(QTestState *who) +{ +QDict *rsp; + +rsp = qtest_qmp(who, "{ 'execute': 'x-query-virtio'}"); +g_assert(!qdict_haskey(rsp, "error")); +g_assert(qdict_haskey(rsp, "return")); + +return rsp; +} + +static QDict *query_virtio_status(QTestState *who, const char *path) +{ +QDict *rsp; + +rsp = qtest_qmp(who, "{ 'execute': 'x-query-virtio-status', " +"'arguments': { 'path': %s, 'show-bits': true} }", path); + +g_assert(!qdict_haskey(rsp, "error")); +g_assert(qdict_haskey(rsp, "return")); + +return rsp; +} + +static uint64_t get_acked_features(QTestState *who) +{ +QDict *rsp_return, *status, *vhost_info, *dev; +QList *dev_list; +const 
QListEntry *entry; +const char *name; +char *path; +uint64_t acked_features; + +/* query the virtio devices */ +rsp_return = query_virtio(who); +g_assert(rsp_return); + +dev_list = qdict_get_qlist(rsp_return, "return"); +g_assert(dev_list && !qlist_empty(dev_list)); + +/* fetch the first and the sole device */ +entry = qlist_first(dev_list); +g_assert(entry); + +dev = qobject_to(QDict, qlist_entry_obj(entry)); +g_assert(dev); + +name = qdict_get_try_str(dev, "name"); +g_assert_cmpstr(name, ==, "virtio-net"); + +path = g_strdup(qdict_get_try_str(dev, "path")); +g_assert(path); +qobject_unref(rsp_return); +rsp_return = NULL; + +/* fetch the status of the virtio-net device by QOM path */ +rsp_return = query_virtio_status(who, path); +g_assert(rsp_return); + +status = qdict_get_qdict(rsp_return, "return"); +g_assert(status); + +vhost_info = qdict_get_qdict(status, "vhost-dev"); +g_assert(vhost_info); + +acked_features = qdict_get_try_int(vhost_info, "acked-features-bits", 0); + +qobject_unref(rsp_return); +g_free(path); + +return acked_features; +} + +static void acked_features_check(QTestState *qts, TestServer *s) +{ +uint64_t acked_features; + +acked_features = get_acked_features(qts); +g_assert_cmpint(acked_features, ==, s->features); +} + +static void test_acked_features(void *obj, + void *arg, + QGuestAllocator *alloc) +{ +TestServer *server = arg; + +if (!wait_for_fds(server)) { +return; +} + +acked_features_check(global_qtest, server); +} + static uint64_t vu_net_get_features(TestServer *s) { uint64_t features = 0x1ULL << VHOST_F_LOG_ALL | @@ -1040,6 +1136,7 @@ static void vu_net_set_features(TestServer *s, CharBackend *chr, qemu_chr_fe_disconnect(chr); s->test_flags = TEST_FLAGS_BAD; } +s->features = msg->payload.u64; } static void vu_net_get_protocol_features(TestServer *s, CharBackend *chr, @@ -1109,6 +1206,9 @@ static void register_vhost_user_test(void) qos_add_test("vhost-user/multiqueue", "virtio-net", test_multiqueue, &opts); 
+qos_add_test("vhost-user/read_acked_features", + "virtio-net", + test_acked_features, &opts); } libqos_init(register_vhost_user_test); -- 2.39.1
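The mechanism this test relies on can be modelled in a few lines. The sketch below is a toy, single-process model, not QEMU code. The `MockFrontend` and `MockBackend` types and their functions are hypothetical. It assumes only what the commit message states: QEMU saves the features the guest acknowledged as acked_features, and it can replay them through VHOST_USER_SET_FEATURES after the backend restarts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a vhost-user backend that records SET_FEATURES. */
typedef struct {
    uint64_t features;        /* last VHOST_USER_SET_FEATURES payload */
} MockBackend;

/* Hypothetical model of the QEMU side. */
typedef struct {
    uint64_t acked_features;  /* saved after the guest driver acks */
} MockFrontend;

static void backend_set_features(MockBackend *b, uint64_t payload)
{
    b->features = payload;
}

/* Guest acknowledges features: QEMU saves them and forwards them. */
static void guest_ack(MockFrontend *f, MockBackend *b, uint64_t features)
{
    f->acked_features = features;
    backend_set_features(b, features);
}

/* Backend restarts and loses its state; QEMU replays the saved value. */
static void backend_reconnect(MockFrontend *f, MockBackend *b)
{
    b->features = 0;
    backend_set_features(b, f->acked_features);
}

/* The property the qtest checks: both sides agree on the features. */
static int features_consistent(const MockFrontend *f, const MockBackend *b)
{
    return f->acked_features == b->features;
}
```

The qtest checks the same invariant from the outside. The mock server records the SET_FEATURES payload in `s->features`, and the test compares that value with the `acked-features-bits` value QEMU reports over QMP.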
[RFC 0/2] vhost-user-test: Add negotiated features check
The patchset "Fix the virtio features negotiation flaw" fixed a vhost-user negotiation flaw:
c9bdc449f9 vhost-user: Fix the virtio features negotiation flaw
bebcac052a vhost-user: Refactor the chr_closed_bh
937b7d96e4 vhost-user: Refactor vhost acked features saving

However, its test case remains unmerged; for details, see:
https://lore.kernel.org/qemu-devel/cover.1667232396.git.huang...@chinatelecom.cn/

Since Michael pointed out that "info virtio" is a sensible way to query the negotiated features, this patchset uses x-query-virtio-status to retrieve the features, instead of exporting netdev capabilities and information as the previous patchset did, to help confirm that the negotiation is correct.

To do that, we first introduce a "show-bits" argument for x-query-virtio-status so that the feature bits can be used directly, and then implement the negotiated features check as a test case. The code is split into two patches.

Please review, thanks,
Yong

Hyman Huang (2):
  qapi/virtio: introduce the "show-bits" argument for x-query-virtio-status
  vhost-user-test: Add negotiated features check

 hw/virtio/virtio-hmp-cmds.c   |   2 +-
 hw/virtio/virtio-qmp.c        |  21 ++-
 qapi/virtio.json              |  49 -
 tests/qtest/vhost-user-test.c | 100 ++
 4 files changed, 167 insertions(+), 5 deletions(-)

-- 
2.39.1
[v3 6/6] docs/migration: Add the dirty limit section
The dirty limit feature was introduced in the QEMU 8.1 release but is not yet described in the documentation; add a section for it.

Signed-off-by: Hyman Huang
Reviewed-by: Fabiano Rosas
Message-Id: <36194a8a23d937392bf13d9fff8e898030c827a3.1697815117.git.yong.hu...@smartx.com>
---
 docs/devel/migration.rst | 71
 1 file changed, 71 insertions(+)

diff --git a/docs/devel/migration.rst b/docs/devel/migration.rst
index be913630c3..12c35f9bc4 100644
--- a/docs/devel/migration.rst
+++ b/docs/devel/migration.rst
@@ -590,6 +590,77 @@ path.
 Return path - opened by main thread, written by main thread AND postcopy
 thread (protected by rp_mutex)
 
+Dirty limit
+===========
+The dirty limit, short for dirty page rate upper limit, is a new capability
+introduced in the 8.1 QEMU release that uses a new algorithm based on the KVM
+dirty ring to throttle down the guest during live migration.
+
+The algorithm framework is as follows:
+
+::
+
+  ----------------------------------------------------------------------------
+  main   --------------> throttle thread ------------> PREPARE(1) <--------
+  thread  \                                                |              |
+           \                                               |              |
+            \                                              V              |
+             -\                                       CALCULATE(2)        |
+               \                                           |              |
+                \                                          |              |
+                 \                                         V              |
+                  \                                  SET PENALTY(3) -------
+                   -\                                      |
+                     \                                     |
+                      \                                    V
+                       ---> virtual CPU thread ---> ACCEPT PENALTY(4)
+  ----------------------------------------------------------------------------
+
+When the QMP command qmp_set_vcpu_dirty_limit is called for the first time,
+the QEMU main thread starts the throttle thread. The throttle thread, once
+launched, executes the loop, which consists of three steps:
+
+  - PREPARE (1)
+
+    The entire work of PREPARE (1) is preparation for the second stage,
+    CALCULATE (2), as the name implies. It involves preparing the dirty
+    page rate value and the corresponding upper limit of the VM:
+    the dirty page rate is calculated via the KVM dirty ring mechanism,
+    which tells QEMU how many dirty pages a virtual CPU has had since the
+    last KVM_EXIT_DIRTY_RING_FULL exception; the dirty page rate upper
+    limit is specified by the caller, so it is fetched directly.
+
+  - CALCULATE (2)
+
+    Calculate a suitable sleep period for each virtual CPU, which will be
+    used to determine the penalty for the target virtual CPU. The
+    computation must be done carefully in order to reduce the dirty page
+    rate progressively down to the upper limit without oscillation. To
+    achieve this, two strategies are provided: the first is to add or
+    subtract sleep time based on the ratio of the current dirty page rate
+    to the limit, which is used when the current dirty page rate is far
+    from the limit; the second is to add or subtract a fixed time when
+    the current dirty page rate is close to the limit.
+
+  - SET PENALTY (3)
+
+    Set the sleep time for each virtual CPU that should be penalized based
+    on the results of the calculation supplied by step CALCULATE (2).
+
+After completing the three stages above, the throttle thread loops back
+to step PREPARE (1) until the dirty limit is reached.
+
+On the other hand, each virtual CPU thread reads the sleep duration and
+sleeps in the path of the KVM_EXIT_DIRTY_RING_FULL exception handler, that
+is ACCEPT PENALTY (4). Virtual CPUs running processes that write to memory
+exit to this path and get penalized, whereas virtual CPUs running
+read-only processes do not.
+
+In summary, thanks to the KVM dirty ring technology, the dirty limit
+algorithm restricts virtual CPUs as needed to keep their dirty page
+rate within the limit. This leads to steadier read performance during
+live migration and can help improve the responsiveness of large guests.
+
 Postcopy
 ========
 
-- 
2.39.1
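The two CALCULATE strategies described in the dirty limit section can be illustrated with a toy, single-vCPU simulation. This is a sketch under stated assumptions, not QEMU's dirtylimit implementation. The run/sleep model, the 10% "far" threshold, the 1% dead band, and the 50 us fixed step are all invented for illustration. The vCPU is assumed to dirty memory at a constant raw rate while running and to sleep for the penalty on every ring-full exit, so its effective rate is raw * run / (run + sleep).

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative tuning knobs (assumptions, not QEMU's constants). */
#define FAR_RATIO     0.10  /* more than 10% off the limit: ratio-based step */
#define DEAD_BAND     0.01  /* within 1% of the limit: keep the penalty */
#define FIXED_STEP_US 50.0  /* small correction when close to the limit */

/* PREPARE (modelled): effective dirty rate of a vCPU that runs run_us
 * between ring-full exits and then sleeps sleep_us at each exit. */
static double effective_rate(double raw_mbps, double run_us, double sleep_us)
{
    return raw_mbps * run_us / (run_us + sleep_us);
}

/* CALCULATE: derive the next per-exit sleep time for one vCPU. */
static double calc_sleep_us(double cur_mbps, double limit_mbps,
                            double run_us, double sleep_us)
{
    assert(limit_mbps > 0);
    double gap = (cur_mbps - limit_mbps) / limit_mbps;
    if (gap < 0) {
        gap = -gap;
    }

    if (gap > FAR_RATIO) {
        /* Far from the limit: scale the run+sleep period by the rate ratio. */
        sleep_us += (run_us + sleep_us) * (cur_mbps / limit_mbps - 1.0);
    } else if (gap > DEAD_BAND) {
        /* Close to the limit: add or subtract a fixed amount. */
        sleep_us += (cur_mbps > limit_mbps) ? FIXED_STEP_US : -FIXED_STEP_US;
    }
    return sleep_us < 0 ? 0 : sleep_us;
}

/* Throttle loop: PREPARE -> CALCULATE -> SET PENALTY, repeated.
 * Returns the effective rate after the given number of rounds. */
static double simulate(double raw_mbps, double limit_mbps, double run_us,
                       int rounds)
{
    double sleep_us = 0; /* SET PENALTY: value the vCPU reads on each exit */

    for (int i = 0; i < rounds; i++) {
        double cur = effective_rate(raw_mbps, run_us, sleep_us);
        sleep_us = calc_sleep_us(cur, limit_mbps, run_us, sleep_us);
        printf("round %d: rate %.1f MB/s, sleep %.0f us\n", i, cur, sleep_us);
    }
    return effective_rate(raw_mbps, run_us, sleep_us);
}
```

Under this idealized model, the ratio-based branch solves for the sleep time that brings the rate exactly to the limit, so it converges in one round when the rate is far away. The fixed step only nudges the penalty once the rate is already close, and the dead band stops it from flipping back and forth.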
[v3 4/6] tests/migration: Introduce dirty-ring-size option into guestperf
The guestperf tool does not support configuring the dirty ring size. Introduce a dirty-ring-size option (range [1024, 65536]) so that developers can experiment with the dirty-ring and dirty-limit features more easily.

To set the dirty ring size to 4096 during a migration test:

  $ ./tests/migration/guestperf.py --dirty-ring-size 4096 xxx

Signed-off-by: Hyman Huang
Reviewed-by: Fabiano Rosas
Message-Id:
---
tests/migration/guestperf/engine.py | 6 +- tests/migration/guestperf/hardware.py | 8 ++-- tests/migration/guestperf/shell.py| 6 +- 3 files changed, 16 insertions(+), 4 deletions(-) diff --git a/tests/migration/guestperf/engine.py b/tests/migration/guestperf/engine.py index da96ca034a..aabf6de4d9 100644 --- a/tests/migration/guestperf/engine.py +++ b/tests/migration/guestperf/engine.py @@ -325,7 +325,6 @@ def _get_common_args(self, hardware, tunnelled=False): cmdline = "'" + cmdline + "'" argv = [ -"-accel", "kvm", "-cpu", "host", "-kernel", self._kernel, "-initrd", self._initrd, @@ -333,6 +332,11 @@ def _get_common_args(self, hardware, tunnelled=False): "-m", str((hardware._mem * 1024) + 512), "-smp", str(hardware._cpus), ] +if hardware._dirty_ring_size: +argv.extend(["-accel", "kvm,dirty-ring-size=%s" % + hardware._dirty_ring_size]) +else: +argv.extend(["-accel", "kvm"]) argv.extend(self._get_qemu_serial_args()) diff --git a/tests/migration/guestperf/hardware.py b/tests/migration/guestperf/hardware.py index 3145785ffd..f779cc050b 100644 --- a/tests/migration/guestperf/hardware.py +++ b/tests/migration/guestperf/hardware.py @@ -23,7 +23,8 @@ def __init__(self, cpus=1, mem=1, src_cpu_bind=None, src_mem_bind=None, dst_cpu_bind=None, dst_mem_bind=None, prealloc_pages = False, - huge_pages=False, locked_pages=False): + huge_pages=False, locked_pages=False, + dirty_ring_size=0): self._cpus = cpus self._mem = mem # GiB self._src_mem_bind = src_mem_bind # List of NUMA nodes @@ -33,6 +34,7 @@ def __init__(self, cpus=1, mem=1, self._prealloc_pages = prealloc_pages self._huge_pages = huge_pages 
self._locked_pages = locked_pages +self._dirty_ring_size = dirty_ring_size def serialize(self): @@ -46,6 +48,7 @@ def serialize(self): "prealloc_pages": self._prealloc_pages, "huge_pages": self._huge_pages, "locked_pages": self._locked_pages, +"dirty_ring_size": self._dirty_ring_size, } @classmethod @@ -59,4 +62,5 @@ def deserialize(cls, data): data["dst_mem_bind"], data["prealloc_pages"], data["huge_pages"], -data["locked_pages"]) +data["locked_pages"], +data["dirty_ring_size"]) diff --git a/tests/migration/guestperf/shell.py b/tests/migration/guestperf/shell.py index 8a809e3dda..7d6b8cd7cf 100644 --- a/tests/migration/guestperf/shell.py +++ b/tests/migration/guestperf/shell.py @@ -60,6 +60,8 @@ def __init__(self): parser.add_argument("--prealloc-pages", dest="prealloc_pages", default=False) parser.add_argument("--huge-pages", dest="huge_pages", default=False) parser.add_argument("--locked-pages", dest="locked_pages", default=False) +parser.add_argument("--dirty-ring-size", dest="dirty_ring_size", +default=0, type=int) self._parser = parser @@ -89,7 +91,9 @@ def split_map(value): locked_pages=args.locked_pages, huge_pages=args.huge_pages, -prealloc_pages=args.prealloc_pages) +prealloc_pages=args.prealloc_pages, + +dirty_ring_size=args.dirty_ring_size) class Shell(BaseShell): -- 2.39.1
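The patch accepts any integer for `--dirty-ring-size` and leaves range checking to QEMU. The minimal Python sketch below shows how a caller could reject values outside the [1024, 65536] range quoted in the commit message before starting a VM, and how the `-accel` argument follows from the value (0 meaning "dirty ring not used", as in the patch). The helper names `dirty_ring_size_arg` and `accel_args` are hypothetical and are not part of guestperf.

```python
import argparse

# Range quoted in the commit message; 0 keeps the plain "-accel kvm" path.
MIN_RING, MAX_RING = 1024, 65536


def dirty_ring_size_arg(text):
    """argparse type: accept 0 (disabled) or a value in [1024, 65536]."""
    value = int(text)
    if value != 0 and not MIN_RING <= value <= MAX_RING:
        raise argparse.ArgumentTypeError(
            "dirty ring size must be 0 or in [%d, %d]" % (MIN_RING, MAX_RING))
    return value


def accel_args(dirty_ring_size):
    """Mirror the patch: only pass dirty-ring-size when it is non-zero."""
    if dirty_ring_size:
        return ["-accel", "kvm,dirty-ring-size=%s" % dirty_ring_size]
    return ["-accel", "kvm"]


parser = argparse.ArgumentParser()
parser.add_argument("--dirty-ring-size", dest="dirty_ring_size",
                    default=0, type=dirty_ring_size_arg)

if __name__ == "__main__":
    args = parser.parse_args(["--dirty-ring-size", "4096"])
    print(accel_args(args.dirty_ring_size))  # ['-accel', 'kvm,dirty-ring-size=4096']
```

Validating in the argparse `type` callable means a bad value fails at option-parsing time with a usage message, instead of after QEMU has already been launched.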
[v3 5/6] tests/migration: Introduce dirty-limit into guestperf
Currently, guestperf does not cover dirty-limit migration; add support for it. Note that dirty-limit requires 'dirty-ring-size' to be set.

To enable dirty-limit with x-vcpu-dirty-limit-period set to 500ms and vcpu-dirty-limit set to 10MB/s:

$ ./tests/migration/guestperf.py \
    --dirty-ring-size 4096 \
    --dirty-limit --x-vcpu-dirty-limit-period 500 \
    --vcpu-dirty-limit 10 --output output.json

To run the entire standardized set of dirty-limit-enabled comparisons, with unix migration:

$ ./tests/migration/guestperf-batch.py \
    --dirty-ring-size 4096 \
    --dst-host localhost --transport unix \
    --filter compr-dirty-limit* --output outputdir

Signed-off-by: Hyman Huang
Reviewed-by: Fabiano Rosas
Message-Id: <516e7a55dfc6e33d33510be37eb24223de5dc072.1697815117.git.yong.hu...@smartx.com>
--- tests/migration/guestperf/comparison.py | 23 +++ tests/migration/guestperf/engine.py | 17 + tests/migration/guestperf/progress.py | 16 ++-- tests/migration/guestperf/scenario.py | 11 ++- tests/migration/guestperf/shell.py | 18 +- 5 files changed, 81 insertions(+), 4 deletions(-) diff --git a/tests/migration/guestperf/comparison.py b/tests/migration/guestperf/comparison.py index c03b3f6d7e..42cc0372d1 100644 --- a/tests/migration/guestperf/comparison.py +++ b/tests/migration/guestperf/comparison.py @@ -135,4 +135,27 @@ def __init__(self, name, scenarios): Scenario("compr-multifd-channels-64", multifd=True, multifd_channels=64), ]), + +# Looking at effect of dirty-limit with +# varying x_vcpu_dirty_limit_period +Comparison("compr-dirty-limit-period", scenarios = [ +Scenario("compr-dirty-limit-period-500", + dirty_limit=True, x_vcpu_dirty_limit_period=500), +Scenario("compr-dirty-limit-period-800", + dirty_limit=True, x_vcpu_dirty_limit_period=800), +Scenario("compr-dirty-limit-period-1000", + dirty_limit=True, x_vcpu_dirty_limit_period=1000), +]), + + +# Looking at effect of dirty-limit with +# varying vcpu_dirty_limit +Comparison("compr-dirty-limit", scenarios = [ 
+Scenario("compr-dirty-limit-10MB", + dirty_limit=True, vcpu_dirty_limit=10), +Scenario("compr-dirty-limit-20MB", + dirty_limit=True, vcpu_dirty_limit=20), +Scenario("compr-dirty-limit-50MB", + dirty_limit=True, vcpu_dirty_limit=50), +]), ] diff --git a/tests/migration/guestperf/engine.py b/tests/migration/guestperf/engine.py index aabf6de4d9..608d7270f6 100644 --- a/tests/migration/guestperf/engine.py +++ b/tests/migration/guestperf/engine.py @@ -102,6 +102,8 @@ def _migrate_progress(self, vm): info.get("expected-downtime", 0), info.get("setup-time", 0), info.get("cpu-throttle-percentage", 0), +info.get("dirty-limit-throttle-time-per-round", 0), +info.get("dirty-limit-ring-full-time", 0), ) def _migrate(self, hardware, scenario, src, dst, connect_uri): @@ -203,6 +205,21 @@ def _migrate(self, hardware, scenario, src, dst, connect_uri): resp = dst.cmd("migrate-set-parameters", multifd_channels=scenario._multifd_channels) +if scenario._dirty_limit: +if not hardware._dirty_ring_size: +raise Exception("dirty ring size must be configured when " +"testing dirty limit migration") + +resp = src.cmd("migrate-set-capabilities", + capabilities = [ + { "capability": "dirty-limit", + "state": True } + ]) +resp = src.cmd("migrate-set-parameters", +x_vcpu_dirty_limit_period=scenario._x_vcpu_dirty_limit_period) +resp = src.cmd("migrate-set-parameters", + vcpu_dirty_limit=scenario._vcpu_dirty_limit) + resp = src.cmd("migrate", uri=connect_uri) post_copy = False diff --git a/tests/migration/guestperf/progress.py b/tests/migration/guestperf/progress.py index ab1ee57273..d490584217 100644 --- a/tests/migration/guestperf/progress.py +++ b/tests/migration/guestperf/progress.py @@ -81,7 +81,9 @@ def __init__(self, downtime, downtime_expected, setup_time, - throttle_pcent): + throttle_pcent, + dirty_limit_throttle_time_per_round, + dirty_limit_ring_full_time): self._status = status
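For readers who drive QEMU over a raw QMP socket rather than through guestperf, the three `src.cmd(...)` calls added to `engine.py` correspond roughly to the JSON messages built below. This is a sketch under two assumptions: that the Python helper sends keyword names such as `x_vcpu_dirty_limit_period` as the hyphenated QMP names, and that the order (capability first, then the two parameters) is the one that matters. The function name `dirty_limit_commands` is hypothetical.

```python
import json


def dirty_limit_commands(dirty_ring_size, period_ms, limit_mbps):
    """Return the QMP messages the source VM needs for a dirty-limit run.

    Follows the order used in engine.py: enable the capability, then set
    the period and the per-vCPU limit. Refuses to run without a dirty
    ring, matching the check the patch adds.
    """
    if not dirty_ring_size:
        raise ValueError("dirty ring size must be configured when "
                         "testing dirty limit migration")
    return [
        {"execute": "migrate-set-capabilities",
         "arguments": {"capabilities": [
             {"capability": "dirty-limit", "state": True}]}},
        {"execute": "migrate-set-parameters",
         "arguments": {"x-vcpu-dirty-limit-period": period_ms}},
        {"execute": "migrate-set-parameters",
         "arguments": {"vcpu-dirty-limit": limit_mbps}},
    ]


if __name__ == "__main__":
    # Same values as the guestperf.py example: 500 ms period, 10 MB/s limit.
    for cmd in dirty_limit_commands(4096, 500, 10):
        print(json.dumps(cmd))
```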
[v3 0/6] dirtylimit: miscellaneous patches
v3:
- Rebase on master only; no other changes.

v2:
- Rebase on master.
- Fix a typo in the documentation.

v1:
This is a miscellaneous patchset for dirtylimit that contains the following parts:
1. dirtylimit module: fix a race condition and replace usleep() with g_usleep().
2. migration test: add a dirtylimit test case.
3. guestperf for migration: add support for dirtylimit migration.
4. migration docs: add a dirtylimit section.

Please review, thanks.

Regards,

Hyman Huang (6):
  system/dirtylimit: Fix a race situation
  system/dirtylimit: Drop the reduplicative check
  tests: Add migration dirty-limit capability test
  tests/migration: Introduce dirty-ring-size option into guestperf
  tests/migration: Introduce dirty-limit into guestperf
  docs/migration: Add the dirty limit section

 docs/devel/migration.rst                |  71 ++
 system/dirtylimit.c                     |  24 ++--
 tests/migration/guestperf/comparison.py |  23
 tests/migration/guestperf/engine.py     |  23 +++-
 tests/migration/guestperf/hardware.py   |   8 +-
 tests/migration/guestperf/progress.py   |  16 ++-
 tests/migration/guestperf/scenario.py   |  11 +-
 tests/migration/guestperf/shell.py      |  24 +++-
 tests/qtest/migration-test.c            | 164
 9 files changed, 346 insertions(+), 18 deletions(-)

--
2.39.1
[v3 3/6] tests: Add migration dirty-limit capability test
Add a migration test for the dirty-limit capability if the kernel supports the dirty ring.

The dirty-limit migration capability introduces two parameters, x-vcpu-dirty-limit-period and vcpu-dirty-limit, to implement live migration with a dirty limit.

The test case does the following:
1. Start the src and dst VMs and enable the dirty-limit capability.
2. Start migration, then cancel it to check that the dirty limit stops working.
3. Restart the dst VM.
4. Start migration with the dirty-limit capability enabled.
5. Check that migration satisfies the convergence condition during the pre-switchover phase.

Note that this test case involves many passes, so it runs in slow mode only.

Signed-off-by: Hyman Huang
Acked-by: Peter Xu
Reviewed-by: Fabiano Rosas
Message-Id:
--- tests/qtest/migration-test.c | 164 +++ 1 file changed, 164 insertions(+) diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index bc70a14642..0693078b07 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -2968,6 +2968,166 @@ static void test_vcpu_dirty_limit(void) dirtylimit_stop_vm(vm); } +static void migrate_dirty_limit_wait_showup(QTestState *from, +const int64_t period, +const int64_t value) +{ +/* Enable dirty limit capability */ +migrate_set_capability(from, "dirty-limit", true); + +/* Set dirty limit parameters */ +migrate_set_parameter_int(from, "x-vcpu-dirty-limit-period", period); +migrate_set_parameter_int(from, "vcpu-dirty-limit", value); + +/* Make sure migrate can't converge */ +migrate_ensure_non_converge(from); + +/* To check limit rate after precopy */ +migrate_set_capability(from, "pause-before-switchover", true); + +/* Wait for the serial output from the source */ +wait_for_serial("src_serial"); +} + +/* + * This test does: + * source destination + * start vm + * start incoming vm + * migrate + * wait dirty limit to begin + * cancel migrate + * cancellation check + * restart incoming vm + * migrate + * wait dirty limit to begin + * wait pre-switchover 
event + * convergence condition check + * + * And see if dirty limit migration works correctly. + * This test case involves many passes, so it runs in slow mode only. + */ +static void test_migrate_dirty_limit(void) +{ +g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs); +QTestState *from, *to; +int64_t remaining; +uint64_t throttle_us_per_full; +/* + * We want the test to be stable and as fast as possible. + * E.g., with 1Gb/s bandwith migration may pass without dirty limit, + * so we need to decrease a bandwidth. + */ +const int64_t dirtylimit_period = 1000, dirtylimit_value = 50; +const int64_t max_bandwidth = 4; /* ~400Mb/s */ +const int64_t downtime_limit = 250; /* 250ms */ +/* + * We migrate through unix-socket (> 500Mb/s). + * Thus, expected migration speed ~= bandwidth limit (< 500Mb/s). + * So, we can predict expected_threshold + */ +const int64_t expected_threshold = max_bandwidth * downtime_limit / 1000; +int max_try_count = 10; +MigrateCommon args = { +.start = { +.hide_stderr = true, +.use_dirty_ring = true, +}, +.listen_uri = uri, +.connect_uri = uri, +}; + +/* Start src, dst vm */ +if (test_migrate_start(, , args.listen_uri, )) { +return; +} + +/* Prepare for dirty limit migration and wait src vm show up */ +migrate_dirty_limit_wait_showup(from, dirtylimit_period, dirtylimit_value); + +/* Start migrate */ +migrate_qmp(from, uri, "{}"); + +/* Wait for dirty limit throttle begin */ +throttle_us_per_full = 0; +while (throttle_us_per_full == 0) { +throttle_us_per_full = +read_migrate_property_int(from, "dirty-limit-throttle-time-per-round"); +usleep(100); +g_assert_false(got_src_stop); +} + +/* Now cancel migrate and wait for dirty limit throttle switch off */ +migrate_cancel(from); +wait_for_migration_status(from, "cancelled", NULL); + +/* Check if dirty limit throttle switched off, set timeout 1ms */ +do { +throttle_us_per_full = +read_migrate_property_int(from, "dirty-limit-throttle-time-per-round"); +usleep(100); 
+g_assert_false(got_src_stop); +} while (throttle_us_per_full != 0 && --max_try_count); + +/* Assert dirty limit is not in service */ +g_assert_cmpint(throttle_us_per_full, ==, 0); + +args = (MigrateCommon) { +.start = { +.only_target = true, +.use_dirty_ring = true, +}, +
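Two small pieces of the test are easy to misread. The cancellation check is a bounded poll: the do/while loop reads `dirty-limit-throttle-time-per-round` at most `max_try_count` (10) times with a 100 µs sleep after each read, which is where the "timeout 1ms" comment comes from (10 × 100 µs). `expected_threshold` is unit arithmetic: a bandwidth multiplied by a downtime in milliseconds, divided by 1000, gives the amount of data that fits in the downtime. A Python sketch of both follows; `poll_until_zero` is a hypothetical helper, the bandwidth is assumed to be in bytes per second (as QEMU's `max-bandwidth` parameter is), and the 400,000,000 value is only an illustrative input.

```python
import time


def poll_until_zero(read, max_tries=10, interval_s=100e-6):
    """Mirror of the C loop: do { v = read(); sleep; } while (v && --tries).

    Returns the last value read; 0 means the throttle switched off in time.
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    while True:
        value = read()
        time.sleep(interval_s)
        max_tries -= 1
        if value == 0 or max_tries == 0:
            return value


def expected_threshold(max_bandwidth, downtime_limit_ms):
    """Data that can be sent within the downtime limit (integer math)."""
    return max_bandwidth * downtime_limit_ms // 1000


if __name__ == "__main__":
    readings = iter([7, 3, 0])
    print(poll_until_zero(lambda: next(readings)))   # 0
    print(expected_threshold(400_000_000, 250))      # 100000000
```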
[v3 2/6] system/dirtylimit: Drop the reduplicative check
The dirtylimit_query_all() function already checks whether the dirty limit is in service, so drop the redundant check in qmp_query_vcpu_dirty_limit().

Signed-off-by: Hyman Huang
Reviewed-by: Fabiano Rosas
Message-Id: <31384f768279027560ab952ebc2bbff1ddb62531.1697815117.git.yong.hu...@smartx.com>
--- system/dirtylimit.c | 4 1 file changed, 4 deletions(-) diff --git a/system/dirtylimit.c b/system/dirtylimit.c index 3666c4cb7c..495c7a7082 100644 --- a/system/dirtylimit.c +++ b/system/dirtylimit.c @@ -652,10 +652,6 @@ static struct DirtyLimitInfoList *dirtylimit_query_all(void) struct DirtyLimitInfoList *qmp_query_vcpu_dirty_limit(Error **errp) { -if (!dirtylimit_in_service()) { -return NULL; -} - return dirtylimit_query_all(); } -- 2.39.1
[v3 1/6] system/dirtylimit: Fix a race situation
Fix a race condition on the global variable dirtylimit_state. Also, replace usleep() with g_usleep() to make the sleep call more portable across platforms.

Signed-off-by: Hyman Huang
Reviewed-by: Fabiano Rosas
Message-Id:
--- system/dirtylimit.c | 20 ++-- 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/system/dirtylimit.c b/system/dirtylimit.c index fa959d7743..3666c4cb7c 100644 --- a/system/dirtylimit.c +++ b/system/dirtylimit.c @@ -411,12 +411,20 @@ void dirtylimit_set_all(uint64_t quota, void dirtylimit_vcpu_execute(CPUState *cpu) { -if (dirtylimit_in_service() && -dirtylimit_vcpu_get_state(cpu->cpu_index)->enabled && -cpu->throttle_us_per_full) { -trace_dirtylimit_vcpu_execute(cpu->cpu_index, -cpu->throttle_us_per_full); -usleep(cpu->throttle_us_per_full); +if (cpu->throttle_us_per_full) { +dirtylimit_state_lock(); + +if (dirtylimit_in_service() && +dirtylimit_vcpu_get_state(cpu->cpu_index)->enabled) { +dirtylimit_state_unlock(); +trace_dirtylimit_vcpu_execute(cpu->cpu_index, +cpu->throttle_us_per_full); + +g_usleep(cpu->throttle_us_per_full); +return; +} + +dirtylimit_state_unlock(); } }  -- 2.39.1
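The fix follows a common pattern: read the shared state only while holding the lock, then release the lock before the potentially long sleep so that other threads touching `dirtylimit_state` are not blocked for the whole penalty. Below is a rough Python analog using `threading.Lock`. `DirtyLimitState`, `vcpu_execute` and the list-based per-vCPU state are hypothetical stand-ins for the C structures, not QEMU code.

```python
import threading
import time


class DirtyLimitState:
    """Hypothetical stand-in for QEMU's global dirtylimit_state."""

    def __init__(self, ncpus):
        self.lock = threading.Lock()
        self.in_service = True
        self.enabled = [True] * ncpus


def vcpu_execute(state, cpu_index, throttle_us):
    """Sleep for throttle_us if limiting is active; return True if it slept."""
    if not throttle_us:
        return False  # cheap early exit, no lock needed
    with state.lock:
        # Both checks happen under the lock, so the state cannot be
        # torn down between them.
        active = state.in_service and state.enabled[cpu_index]
    # The lock is released here: never hold it across the sleep.
    if active:
        time.sleep(throttle_us / 1_000_000)
        return True
    return False
```

Dropping the lock before sleeping matters because the penalty can be long; holding the lock across it would stall every other thread that needs to inspect or modify the dirty-limit state.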
Re: [v2 4/6] tests/migration: Introduce dirty-ring-size option into guestperf
ping1

On 2023/10/23 10:03, Yong Huang wrote:
ping.

Regarding live migration performance, guestperf can give us a clear picture. IMHO, by adding just a few metrics, it could evolve into a more user-friendly metrics system in the future. We can keep enriching it until then.

On Fri, Oct 20, 2023 at 11:24 PM Hyman Huang wrote:
Dirty ring size configuration is not supported by guestperf tool. Introduce dirty-ring-size (ranges in [1024, 65536]) option so developers can play with dirty-ring and dirty-limit feature easier. To set dirty ring size with 4096 during migration test: $ ./tests/migration/guestperf.py --dirty-ring-size 4096 xxx Signed-off-by: Hyman Huang --- tests/migration/guestperf/engine.py | 6 +- tests/migration/guestperf/hardware.py | 8 ++-- tests/migration/guestperf/shell.py | 6 +- 3 files changed, 16 insertions(+), 4 deletions(-) diff --git a/tests/migration/guestperf/engine.py b/tests/migration/guestperf/engine.py index da96ca034a..aabf6de4d9 100644 --- a/tests/migration/guestperf/engine.py +++ b/tests/migration/guestperf/engine.py @@ -325,7 +325,6 @@ def _get_common_args(self, hardware, tunnelled=False): cmdline = "'" + cmdline + "'" argv = [ - "-accel", "kvm", "-cpu", "host", "-kernel", self._kernel, "-initrd", self._initrd, @@ -333,6 +332,11 @@ def _get_common_args(self, hardware, tunnelled=False): "-m", str((hardware._mem * 1024) + 512), "-smp", str(hardware._cpus), ] + if hardware._dirty_ring_size: + argv.extend(["-accel", "kvm,dirty-ring-size=%s" % + hardware._dirty_ring_size]) + else: + argv.extend(["-accel", "kvm"]) argv.extend(self._get_qemu_serial_args()) diff --git a/tests/migration/guestperf/hardware.py b/tests/migration/guestperf/hardware.py index 3145785ffd..f779cc050b 100644 --- a/tests/migration/guestperf/hardware.py +++ b/tests/migration/guestperf/hardware.py @@ -23,7 +23,8 @@ def __init__(self, cpus=1, mem=1, src_cpu_bind=None, src_mem_bind=None, dst_cpu_bind=None, dst_mem_bind=None, prealloc_pages = False, - 
huge_pages=False, locked_pages=False): + huge_pages=False, locked_pages=False, + dirty_ring_size=0): self._cpus = cpus self._mem = mem # GiB self._src_mem_bind = src_mem_bind # List of NUMA nodes @@ -33,6 +34,7 @@ def __init__(self, cpus=1, mem=1, self._prealloc_pages = prealloc_pages self._huge_pages = huge_pages self._locked_pages = locked_pages + self._dirty_ring_size = dirty_ring_size def serialize(self): @@ -46,6 +48,7 @@ def serialize(self): "prealloc_pages": self._prealloc_pages, "huge_pages": self._huge_pages, "locked_pages": self._locked_pages, + "dirty_ring_size": self._dirty_ring_size, } @classmethod @@ -59,4 +62,5 @@ def deserialize(cls, data): data["dst_mem_bind"], data["prealloc_pages"], data["huge_pages"], - data["locked_pages"]) + data["locked_pages"], + data["dirty_ring_size"]) diff --git a/tests/migration/guestperf/shell.py b/tests/migration/guestperf/shell.py index 8a809e3dda..7d6b8cd7cf 100644 --- a/tests/migration/guestperf/shell.py +++ b/tests/migration/guestperf/shell.py @@ -60,6 +60,8 @@ def __init__(self): parser.add_argument("--prealloc-pages", dest="prealloc_pages", default=False) parser.add_argument("--huge-pages", dest="huge_pages", default=False) parser.add_argument("--locked-pages", dest="locked_pages", default=False) + parser.add_argument("--dirty-ring-size", dest="dirty_ring_size", + default=0, type=int) self._parser = parser @@ -89,7 +91,9 @@ def split_map(value): locked_pages=args.locked_pages, huge_pages=args.huge_pages, - prealloc_pages=args.prealloc_pages) + prealloc_pages=args.prealloc_pages, + + dirty_ring_size=args.dirty_ring_size) class Shell(BaseShell): -- 2.39.1 -- Best regards
[v2 5/6] tests/migration: Introduce dirty-limit into guestperf
Currently, guestperf does not cover dirty-limit migration; add support for it. Note that dirty-limit requires 'dirty-ring-size' to be set.

To enable dirty-limit with x-vcpu-dirty-limit-period set to 500ms and vcpu-dirty-limit set to 10MB/s:

$ ./tests/migration/guestperf.py \
    --dirty-ring-size 4096 \
    --dirty-limit --x-vcpu-dirty-limit-period 500 \
    --vcpu-dirty-limit 10 --output output.json

To run the entire standardized set of dirty-limit-enabled comparisons, with unix migration:

$ ./tests/migration/guestperf-batch.py \
    --dirty-ring-size 4096 \
    --dst-host localhost --transport unix \
    --filter compr-dirty-limit* --output outputdir

Signed-off-by: Hyman Huang
--- tests/migration/guestperf/comparison.py | 23 +++ tests/migration/guestperf/engine.py | 17 + tests/migration/guestperf/progress.py | 16 ++-- tests/migration/guestperf/scenario.py | 11 ++- tests/migration/guestperf/shell.py | 18 +- 5 files changed, 81 insertions(+), 4 deletions(-) diff --git a/tests/migration/guestperf/comparison.py b/tests/migration/guestperf/comparison.py index c03b3f6d7e..42cc0372d1 100644 --- a/tests/migration/guestperf/comparison.py +++ b/tests/migration/guestperf/comparison.py @@ -135,4 +135,27 @@ def __init__(self, name, scenarios): Scenario("compr-multifd-channels-64", multifd=True, multifd_channels=64), ]), + +# Looking at effect of dirty-limit with +# varying x_vcpu_dirty_limit_period +Comparison("compr-dirty-limit-period", scenarios = [ +Scenario("compr-dirty-limit-period-500", + dirty_limit=True, x_vcpu_dirty_limit_period=500), +Scenario("compr-dirty-limit-period-800", + dirty_limit=True, x_vcpu_dirty_limit_period=800), +Scenario("compr-dirty-limit-period-1000", + dirty_limit=True, x_vcpu_dirty_limit_period=1000), +]), + + +# Looking at effect of dirty-limit with +# varying vcpu_dirty_limit +Comparison("compr-dirty-limit", scenarios = [ +Scenario("compr-dirty-limit-10MB", + dirty_limit=True, vcpu_dirty_limit=10), +Scenario("compr-dirty-limit-20MB", + dirty_limit=True, 
vcpu_dirty_limit=20), +Scenario("compr-dirty-limit-50MB", + dirty_limit=True, vcpu_dirty_limit=50), +]), ] diff --git a/tests/migration/guestperf/engine.py b/tests/migration/guestperf/engine.py index aabf6de4d9..608d7270f6 100644 --- a/tests/migration/guestperf/engine.py +++ b/tests/migration/guestperf/engine.py @@ -102,6 +102,8 @@ def _migrate_progress(self, vm): info.get("expected-downtime", 0), info.get("setup-time", 0), info.get("cpu-throttle-percentage", 0), +info.get("dirty-limit-throttle-time-per-round", 0), +info.get("dirty-limit-ring-full-time", 0), ) def _migrate(self, hardware, scenario, src, dst, connect_uri): @@ -203,6 +205,21 @@ def _migrate(self, hardware, scenario, src, dst, connect_uri): resp = dst.cmd("migrate-set-parameters", multifd_channels=scenario._multifd_channels) +if scenario._dirty_limit: +if not hardware._dirty_ring_size: +raise Exception("dirty ring size must be configured when " +"testing dirty limit migration") + +resp = src.cmd("migrate-set-capabilities", + capabilities = [ + { "capability": "dirty-limit", + "state": True } + ]) +resp = src.cmd("migrate-set-parameters", +x_vcpu_dirty_limit_period=scenario._x_vcpu_dirty_limit_period) +resp = src.cmd("migrate-set-parameters", + vcpu_dirty_limit=scenario._vcpu_dirty_limit) + resp = src.cmd("migrate", uri=connect_uri) post_copy = False diff --git a/tests/migration/guestperf/progress.py b/tests/migration/guestperf/progress.py index ab1ee57273..d490584217 100644 --- a/tests/migration/guestperf/progress.py +++ b/tests/migration/guestperf/progress.py @@ -81,7 +81,9 @@ def __init__(self, downtime, downtime_expected, setup_time, - throttle_pcent): + throttle_pcent, + dirty_limit_throttle_time_per_round, + dirty_limit_ring_full_time): self._status = status self._ram = ram @@ -91,6 +93,10 @@ def __init__(self, self._downtime_expected = downtime_expected self._setup_time = setup_t
[v2 0/6] dirtylimit: miscellaneous patches
v2:
- Rebase on master.
- Fix a typo in the documentation.

v1:
This is a miscellaneous patchset for dirtylimit that contains the following parts:
1. dirtylimit module: fix a race condition and replace usleep() with g_usleep().
2. migration test: add a dirtylimit test case.
3. guestperf for migration: add support for dirtylimit migration.
4. migration docs: add a dirtylimit section.

Please review, thanks.

Regards,
Yong

Hyman Huang (6):
  system/dirtylimit: Fix a race situation
  system/dirtylimit: Drop the reduplicative check
  tests: Add migration dirty-limit capability test
  tests/migration: Introduce dirty-ring-size option into guestperf
  tests/migration: Introduce dirty-limit into guestperf
  docs/migration: Add the dirty limit section

 docs/devel/migration.rst                |  71 ++
 system/dirtylimit.c                     |  24 ++--
 tests/migration/guestperf/comparison.py |  23
 tests/migration/guestperf/engine.py     |  23 +++-
 tests/migration/guestperf/hardware.py   |   8 +-
 tests/migration/guestperf/progress.py   |  16 ++-
 tests/migration/guestperf/scenario.py   |  11 +-
 tests/migration/guestperf/shell.py      |  24 +++-
 tests/qtest/migration-test.c            | 164
 9 files changed, 346 insertions(+), 18 deletions(-)

--
2.39.1
[v2 3/6] tests: Add migration dirty-limit capability test
Add a migration test for the dirty-limit capability if the kernel supports the dirty ring.

The dirty-limit migration capability introduces two parameters, x-vcpu-dirty-limit-period and vcpu-dirty-limit, to implement live migration with a dirty limit.

The test case does the following:
1. Start the src and dst VMs and enable the dirty-limit capability.
2. Start migration, then cancel it to check that the dirty limit stops working.
3. Restart the dst VM.
4. Start migration with the dirty-limit capability enabled.
5. Check that migration satisfies the convergence condition during the pre-switchover phase.

Note that this test case involves many passes, so it runs in slow mode only.

Signed-off-by: Hyman Huang
Acked-by: Peter Xu
Reviewed-by: Fabiano Rosas
--- tests/qtest/migration-test.c | 164 +++ 1 file changed, 164 insertions(+) diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index e1c110537b..8f966c4d25 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -2943,6 +2943,166 @@ static void test_vcpu_dirty_limit(void) dirtylimit_stop_vm(vm); } +static void migrate_dirty_limit_wait_showup(QTestState *from, +const int64_t period, +const int64_t value) +{ +/* Enable dirty limit capability */ +migrate_set_capability(from, "dirty-limit", true); + +/* Set dirty limit parameters */ +migrate_set_parameter_int(from, "x-vcpu-dirty-limit-period", period); +migrate_set_parameter_int(from, "vcpu-dirty-limit", value); + +/* Make sure migrate can't converge */ +migrate_ensure_non_converge(from); + +/* To check limit rate after precopy */ +migrate_set_capability(from, "pause-before-switchover", true); + +/* Wait for the serial output from the source */ +wait_for_serial("src_serial"); +} + +/* + * This test does: + * source destination + * start vm + * start incoming vm + * migrate + * wait dirty limit to begin + * cancel migrate + * cancellation check + * restart incoming vm + * migrate + * wait dirty limit to begin + * wait pre-switchover event + * 
convergence condition check + * + * And see if dirty limit migration works correctly. + * This test case involves many passes, so it runs in slow mode only. + */ +static void test_migrate_dirty_limit(void) +{ +g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs); +QTestState *from, *to; +int64_t remaining; +uint64_t throttle_us_per_full; +/* + * We want the test to be stable and as fast as possible. + * E.g., with 1Gb/s bandwith migration may pass without dirty limit, + * so we need to decrease a bandwidth. + */ +const int64_t dirtylimit_period = 1000, dirtylimit_value = 50; +const int64_t max_bandwidth = 4; /* ~400Mb/s */ +const int64_t downtime_limit = 250; /* 250ms */ +/* + * We migrate through unix-socket (> 500Mb/s). + * Thus, expected migration speed ~= bandwidth limit (< 500Mb/s). + * So, we can predict expected_threshold + */ +const int64_t expected_threshold = max_bandwidth * downtime_limit / 1000; +int max_try_count = 10; +MigrateCommon args = { +.start = { +.hide_stderr = true, +.use_dirty_ring = true, +}, +.listen_uri = uri, +.connect_uri = uri, +}; + +/* Start src, dst vm */ +if (test_migrate_start(, , args.listen_uri, )) { +return; +} + +/* Prepare for dirty limit migration and wait src vm show up */ +migrate_dirty_limit_wait_showup(from, dirtylimit_period, dirtylimit_value); + +/* Start migrate */ +migrate_qmp(from, uri, "{}"); + +/* Wait for dirty limit throttle begin */ +throttle_us_per_full = 0; +while (throttle_us_per_full == 0) { +throttle_us_per_full = +read_migrate_property_int(from, "dirty-limit-throttle-time-per-round"); +usleep(100); +g_assert_false(got_src_stop); +} + +/* Now cancel migrate and wait for dirty limit throttle switch off */ +migrate_cancel(from); +wait_for_migration_status(from, "cancelled", NULL); + +/* Check if dirty limit throttle switched off, set timeout 1ms */ +do { +throttle_us_per_full = +read_migrate_property_int(from, "dirty-limit-throttle-time-per-round"); +usleep(100); +g_assert_false(got_src_stop); 
+} while (throttle_us_per_full != 0 && --max_try_count); + +/* Assert dirty limit is not in service */ +g_assert_cmpint(throttle_us_per_full, ==, 0); + +args = (MigrateCommon) { +.start = { +.only_target = true, +.use_dirty_ring = true, +}, +.listen_u
[v2 6/6] docs/migration: Add the dirty limit section
The dirty limit feature was introduced in the QEMU 8.1 release but is not yet covered by the documentation; add a section for it.

Signed-off-by: Hyman Huang
--- docs/devel/migration.rst | 71 1 file changed, 71 insertions(+) diff --git a/docs/devel/migration.rst b/docs/devel/migration.rst index c3e1400c0c..347244af89 100644 --- a/docs/devel/migration.rst +++ b/docs/devel/migration.rst @@ -588,6 +588,77 @@ path. Return path - opened by main thread, written by main thread AND postcopy thread (protected by rp_mutex) +Dirty limit += +The dirty limit, short for dirty page rate upper limit, is a new capability +introduced in the 8.1 QEMU release that uses a new algorithm based on the KVM +dirty ring to throttle down the guest during live migration. + +The algorithm framework is as follows: + +:: + + -- + main --> throttle thread > PREPARE(1) < + thread \| | + \ | | +\ V | + -\CALCULATE(2) | + \ | | +\ | | + \ V | + \SET PENALTY(3) - + -\ | + \ | + \V + -> virtual CPU thread ---> ACCEPT PENALTY(4) + -- + +When the qmp command qmp_set_vcpu_dirty_limit is called for the first time, +the QEMU main thread starts the throttle thread. The throttle thread, once +launched, executes the loop, which consists of three steps: + + - PREPARE (1) + + The entire work of PREPARE (1) is preparation for the second stage, + CALCULATE(2), as the name implies. It involves preparing the dirty + page rate value and the corresponding upper limit of the VM: + The dirty page rate is calculated via the KVM dirty ring mechanism, + which tells QEMU how many dirty pages a virtual CPU has had since the + last KVM_EXIT_DIRTY_RING_FULL exception; The dirty page rate upper + limit is specified by caller, therefore fetch it directly. + + - CALCULATE (2) + + Calculate a suitable sleep period for each virtual CPU, which will be + used to determine the penalty for the target virtual CPU.
The + computation must be done carefully in order to reduce the dirty page + rate progressively down to the upper limit without oscillation. To + achieve this, two strategies are provided: the first is to add or + subtract sleep time based on the ratio of the current dirty page rate + to the limit, which is used when the current dirty page rate is far + from the limit; the second is to add or subtract a fixed time when + the current dirty page rate is close to the limit. + + - SET PENALTY (3) + + Set the sleep time for each virtual CPU that should be penalized based + on the results of the calculation supplied by step CALCULATE (2). + +After completing the three above stages, the throttle thread loops back +to step PREPARE (1) until the dirty limit is reached. + +On the other hand, each virtual CPU thread reads the sleep duration and +sleeps in the path of the KVM_EXIT_DIRTY_RING_FULL exception handler, that +is ACCEPT PENALTY (4). Virtual CPUs tied with writing processes will +obviously exit to the path and get penalized, whereas virtual CPUs involved +with read processes will not. + +In summary, thanks to the KVM dirty ring technology, the dirty limit +algorithm will restrict virtual CPUs as needed to keep their dirty page +rate inside the limit. This leads to more steady reading performance during +live migration and can aid in improving large guest responsiveness. + Postcopy -- 2.39.1
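The CALCULATE (2) step described above combines two strategies: a proportional adjustment when the dirty page rate is far from the limit, and a fixed-size adjustment when it is close. The toy Python model below illustrates that idea only. The 10% "near" band, the 1 ms fixed step, the 990 ms cap and the scaling formula are all assumptions chosen for illustration; QEMU's actual computation in `system/dirtylimit.c` is not reproduced here.

```python
def next_throttle_us(current_us, dirty_rate, limit, *, near_ratio=0.1,
                     step_us=1_000, max_us=990_000):
    """Return the next per-vCPU sleep time in microseconds (toy model).

    dirty_rate and limit share a unit (e.g. MB/s); only their ratio matters.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    error = dirty_rate - limit
    if abs(error) > near_ratio * limit:
        # Far from the limit: move by an amount proportional to the
        # relative distance, so large overshoots are corrected quickly.
        delta = int(max_us * error / max(dirty_rate, limit))
    elif error > 0:
        delta = step_us    # slightly too fast: sleep a little more
    elif error < 0:
        delta = -step_us   # slightly too slow: sleep a little less
    else:
        delta = 0
    # Clamp so the penalty never goes negative or exceeds the cap.
    return min(max(current_us + delta, 0), max_us)


if __name__ == "__main__":
    print(next_throttle_us(0, 100, 50))        # 495000
    print(next_throttle_us(495_000, 52, 50))   # 496000
```

The fixed step near the limit is what damps oscillation in this model: once the rate is within the band, each round nudges the sleep time by a constant amount instead of swinging it by a large proportional correction.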
[v2 2/6] system/dirtylimit: Drop the reduplicative check
The dirtylimit_query_all() function already checks whether the dirty limit is in service, so drop the redundant check in qmp_query_vcpu_dirty_limit().

Signed-off-by: Hyman Huang
Reviewed-by: Fabiano Rosas
--- system/dirtylimit.c | 4 1 file changed, 4 deletions(-) diff --git a/system/dirtylimit.c b/system/dirtylimit.c index 3666c4cb7c..495c7a7082 100644 --- a/system/dirtylimit.c +++ b/system/dirtylimit.c @@ -652,10 +652,6 @@ static struct DirtyLimitInfoList *dirtylimit_query_all(void) struct DirtyLimitInfoList *qmp_query_vcpu_dirty_limit(Error **errp) { -if (!dirtylimit_in_service()) { -return NULL; -} - return dirtylimit_query_all(); } -- 2.39.1
[v2 1/6] system/dirtylimit: Fix a race situation
Fix a race condition on the global variable dirtylimit_state. Also, replace usleep() with g_usleep() to make the sleep call more portable across platforms.

Signed-off-by: Hyman Huang
Reviewed-by: Fabiano Rosas
--- system/dirtylimit.c | 20 ++-- 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/system/dirtylimit.c b/system/dirtylimit.c index fa959d7743..3666c4cb7c 100644 --- a/system/dirtylimit.c +++ b/system/dirtylimit.c @@ -411,12 +411,20 @@ void dirtylimit_set_all(uint64_t quota, void dirtylimit_vcpu_execute(CPUState *cpu) { -if (dirtylimit_in_service() && -dirtylimit_vcpu_get_state(cpu->cpu_index)->enabled && -cpu->throttle_us_per_full) { -trace_dirtylimit_vcpu_execute(cpu->cpu_index, -cpu->throttle_us_per_full); -usleep(cpu->throttle_us_per_full); +if (cpu->throttle_us_per_full) { +dirtylimit_state_lock(); + +if (dirtylimit_in_service() && +dirtylimit_vcpu_get_state(cpu->cpu_index)->enabled) { +dirtylimit_state_unlock(); +trace_dirtylimit_vcpu_execute(cpu->cpu_index, +cpu->throttle_us_per_full); + +g_usleep(cpu->throttle_us_per_full); +return; +} + +dirtylimit_state_unlock(); } }  -- 2.39.1
[v2 4/6] tests/migration: Introduce dirty-ring-size option into guestperf
The guestperf tool does not support configuring the dirty ring size. Introduce a dirty-ring-size option (range [1024, 65536]) so developers can experiment with the dirty-ring and dirty-limit features more easily.

To set the dirty ring size to 4096 during a migration test:

$ ./tests/migration/guestperf.py --dirty-ring-size 4096 xxx

Signed-off-by: Hyman Huang
--- tests/migration/guestperf/engine.py | 6 +- tests/migration/guestperf/hardware.py | 8 ++-- tests/migration/guestperf/shell.py| 6 +- 3 files changed, 16 insertions(+), 4 deletions(-) diff --git a/tests/migration/guestperf/engine.py b/tests/migration/guestperf/engine.py index da96ca034a..aabf6de4d9 100644 --- a/tests/migration/guestperf/engine.py +++ b/tests/migration/guestperf/engine.py @@ -325,7 +325,6 @@ def _get_common_args(self, hardware, tunnelled=False): cmdline = "'" + cmdline + "'" argv = [ -"-accel", "kvm", "-cpu", "host", "-kernel", self._kernel, "-initrd", self._initrd, @@ -333,6 +332,11 @@ def _get_common_args(self, hardware, tunnelled=False): "-m", str((hardware._mem * 1024) + 512), "-smp", str(hardware._cpus), ] +if hardware._dirty_ring_size: +argv.extend(["-accel", "kvm,dirty-ring-size=%s" % + hardware._dirty_ring_size]) +else: +argv.extend(["-accel", "kvm"]) argv.extend(self._get_qemu_serial_args()) diff --git a/tests/migration/guestperf/hardware.py b/tests/migration/guestperf/hardware.py index 3145785ffd..f779cc050b 100644 --- a/tests/migration/guestperf/hardware.py +++ b/tests/migration/guestperf/hardware.py @@ -23,7 +23,8 @@ def __init__(self, cpus=1, mem=1, src_cpu_bind=None, src_mem_bind=None, dst_cpu_bind=None, dst_mem_bind=None, prealloc_pages = False, - huge_pages=False, locked_pages=False): + huge_pages=False, locked_pages=False, + dirty_ring_size=0): self._cpus = cpus self._mem = mem # GiB self._src_mem_bind = src_mem_bind # List of NUMA nodes @@ -33,6 +34,7 @@ def __init__(self, cpus=1, mem=1, self._prealloc_pages = prealloc_pages self._huge_pages = huge_pages self._locked_pages = locked_pages 
+self._dirty_ring_size = dirty_ring_size def serialize(self): @@ -46,6 +48,7 @@ def serialize(self): "prealloc_pages": self._prealloc_pages, "huge_pages": self._huge_pages, "locked_pages": self._locked_pages, +"dirty_ring_size": self._dirty_ring_size, } @classmethod @@ -59,4 +62,5 @@ def deserialize(cls, data): data["dst_mem_bind"], data["prealloc_pages"], data["huge_pages"], -data["locked_pages"]) +data["locked_pages"], +data["dirty_ring_size"]) diff --git a/tests/migration/guestperf/shell.py b/tests/migration/guestperf/shell.py index 8a809e3dda..7d6b8cd7cf 100644 --- a/tests/migration/guestperf/shell.py +++ b/tests/migration/guestperf/shell.py @@ -60,6 +60,8 @@ def __init__(self): parser.add_argument("--prealloc-pages", dest="prealloc_pages", default=False) parser.add_argument("--huge-pages", dest="huge_pages", default=False) parser.add_argument("--locked-pages", dest="locked_pages", default=False) +parser.add_argument("--dirty-ring-size", dest="dirty_ring_size", +default=0, type=int) self._parser = parser @@ -89,7 +91,9 @@ def split_map(value): locked_pages=args.locked_pages, huge_pages=args.huge_pages, -prealloc_pages=args.prealloc_pages) +prealloc_pages=args.prealloc_pages, + +dirty_ring_size=args.dirty_ring_size) class Shell(BaseShell): -- 2.39.1
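The patch forwards --dirty-ring-size straight into `-accel kvm,dirty-ring-size=N` without checking the [1024, 65536] range quoted in the commit message. A minimal sketch, assuming one wanted argparse to reject bad values up front: `dirty_ring_size_arg` and `accel_args` are hypothetical helpers, not part of guestperf, and the power-of-two check reflects QEMU's documented expectation for dirty-ring-size (0 keeps the feature disabled, matching the option's default).

```python
import argparse

# Bounds quoted in the commit message; QEMU also expects a power of two.
DIRTY_RING_MIN = 1024
DIRTY_RING_MAX = 65536


def dirty_ring_size_arg(value):
    """Hypothetical argparse type: accept 0 (disabled) or a power of two
    within [DIRTY_RING_MIN, DIRTY_RING_MAX]."""
    size = int(value)
    if size == 0:
        return size
    is_pow2 = (size & (size - 1)) == 0
    if not (DIRTY_RING_MIN <= size <= DIRTY_RING_MAX and is_pow2):
        raise argparse.ArgumentTypeError(
            "dirty ring size must be 0 or a power of two in [%d, %d]"
            % (DIRTY_RING_MIN, DIRTY_RING_MAX))
    return size


def accel_args(dirty_ring_size):
    """Same branching as the patched _get_common_args(): only pass
    dirty-ring-size to the KVM accelerator when it is non-zero."""
    if dirty_ring_size:
        return ["-accel", "kvm,dirty-ring-size=%s" % dirty_ring_size]
    return ["-accel", "kvm"]


parser = argparse.ArgumentParser()
parser.add_argument("--dirty-ring-size", dest="dirty_ring_size",
                    default=0, type=dirty_ring_size_arg)

args = parser.parse_args(["--dirty-ring-size", "4096"])
print(accel_args(args.dirty_ring_size))
```

Keeping the check in the `type=` callable means an invalid size fails at option parsing rather than when QEMU is launched on the source or destination host.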
[PATCH 5/6] tests/migration: Introduce dirty-limit into guestperf
Currently, guestperf does not cover dirty-limit migration; add support
for it.

Note that dirty-limit requires 'dirty-ring-size' to be set.

To enable dirty-limit, with x-vcpu-dirty-limit-period set to 500 ms and
vcpu-dirty-limit set to 10 MB/s:

$ ./tests/migration/guestperf.py \
    --dirty-ring-size 4096 \
    --dirty-limit --x-vcpu-dirty-limit-period 500 \
    --vcpu-dirty-limit 10 --output output.json

To run the entire standardized set of dirty-limit-enabled comparisons
with unix migration:

$ ./tests/migration/guestperf-batch.py \
    --dirty-ring-size 4096 \
    --dst-host localhost --transport unix \
    --filter compr-dirty-limit* --output outputdir

Signed-off-by: Hyman Huang
---
 tests/migration/guestperf/comparison.py | 23 +++
 tests/migration/guestperf/engine.py     | 17 +
 tests/migration/guestperf/progress.py   | 16 ++--
 tests/migration/guestperf/scenario.py   | 11 ++-
 tests/migration/guestperf/shell.py      | 18 +-
 5 files changed, 81 insertions(+), 4 deletions(-)

diff --git a/tests/migration/guestperf/comparison.py b/tests/migration/guestperf/comparison.py
index c03b3f6d7e..42cc0372d1 100644
--- a/tests/migration/guestperf/comparison.py
+++ b/tests/migration/guestperf/comparison.py
@@ -135,4 +135,27 @@ def __init__(self, name, scenarios):
         Scenario("compr-multifd-channels-64",
                  multifd=True, multifd_channels=64),
     ]),
+
+    # Looking at effect of dirty-limit with
+    # varying x_vcpu_dirty_limit_period
+    Comparison("compr-dirty-limit-period", scenarios = [
+        Scenario("compr-dirty-limit-period-500",
+                 dirty_limit=True, x_vcpu_dirty_limit_period=500),
+        Scenario("compr-dirty-limit-period-800",
+                 dirty_limit=True, x_vcpu_dirty_limit_period=800),
+        Scenario("compr-dirty-limit-period-1000",
+                 dirty_limit=True, x_vcpu_dirty_limit_period=1000),
+    ]),
+
+
+    # Looking at effect of dirty-limit with
+    # varying vcpu_dirty_limit
+    Comparison("compr-dirty-limit", scenarios = [
+        Scenario("compr-dirty-limit-10MB",
+                 dirty_limit=True, vcpu_dirty_limit=10),
+        Scenario("compr-dirty-limit-20MB",
+                 dirty_limit=True, vcpu_dirty_limit=20),
+        Scenario("compr-dirty-limit-50MB",
+                 dirty_limit=True, vcpu_dirty_limit=50),
+    ]),
 ]
diff --git a/tests/migration/guestperf/engine.py b/tests/migration/guestperf/engine.py
index aabf6de4d9..608d7270f6 100644
--- a/tests/migration/guestperf/engine.py
+++ b/tests/migration/guestperf/engine.py
@@ -102,6 +102,8 @@ def _migrate_progress(self, vm):
             info.get("expected-downtime", 0),
             info.get("setup-time", 0),
             info.get("cpu-throttle-percentage", 0),
+            info.get("dirty-limit-throttle-time-per-round", 0),
+            info.get("dirty-limit-ring-full-time", 0),
         )

     def _migrate(self, hardware, scenario, src, dst, connect_uri):
@@ -203,6 +205,21 @@ def _migrate(self, hardware, scenario, src, dst, connect_uri):
             resp = dst.cmd("migrate-set-parameters",
                            multifd_channels=scenario._multifd_channels)

+        if scenario._dirty_limit:
+            if not hardware._dirty_ring_size:
+                raise Exception("dirty ring size must be configured when "
+                                "testing dirty limit migration")
+
+            resp = src.cmd("migrate-set-capabilities",
+                           capabilities = [
+                               { "capability": "dirty-limit",
+                                 "state": True }
+                           ])
+            resp = src.cmd("migrate-set-parameters",
+                x_vcpu_dirty_limit_period=scenario._x_vcpu_dirty_limit_period)
+            resp = src.cmd("migrate-set-parameters",
+                           vcpu_dirty_limit=scenario._vcpu_dirty_limit)
+
         resp = src.cmd("migrate", uri=connect_uri)

         post_copy = False
diff --git a/tests/migration/guestperf/progress.py b/tests/migration/guestperf/progress.py
index ab1ee57273..d490584217 100644
--- a/tests/migration/guestperf/progress.py
+++ b/tests/migration/guestperf/progress.py
@@ -81,7 +81,9 @@ def __init__(self,
                  downtime,
                  downtime_expected,
                  setup_time,
-                 throttle_pcent):
+                 throttle_pcent,
+                 dirty_limit_throttle_time_per_round,
+                 dirty_limit_ring_full_time):

         self._status = status
         self._ram = ram
@@ -91,6 +93,10 @@ def __init__(self,
         self._downtime_expected = downtime_expected
         self._setup_time = setup_t
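With dirty-limit enabled, the engine sends three extra QMP commands to the source before `migrate`. The sketch below, assuming raw QMP JSON is easier to inspect than the `src.cmd(...)` calls, builds the same sequence with a hypothetical `dirty_limit_qmp()` helper and applies the same dirty-ring-size precondition. The keyword arguments in the patch correspond to the dashed QMP parameter names x-vcpu-dirty-limit-period (milliseconds) and vcpu-dirty-limit (MB/s).

```python
import json


def dirty_limit_qmp(dirty_ring_size, period_ms, limit_mbps):
    """Hypothetical helper: return the QMP commands sent to the source VM
    before "migrate" when a scenario enables dirty-limit."""
    if not dirty_ring_size:
        # Same precondition as the patch: dirty-limit relies on the KVM dirty ring.
        raise ValueError("dirty ring size must be configured when "
                         "testing dirty limit migration")
    return [
        # 1. Turn on the dirty-limit migration capability.
        {"execute": "migrate-set-capabilities",
         "arguments": {"capabilities": [
             {"capability": "dirty-limit", "state": True}]}},
        # 2. Throttle period, in milliseconds.
        {"execute": "migrate-set-parameters",
         "arguments": {"x-vcpu-dirty-limit-period": period_ms}},
        # 3. Per-vCPU dirty page rate limit, in MB/s.
        {"execute": "migrate-set-parameters",
         "arguments": {"vcpu-dirty-limit": limit_mbps}},
    ]


# The values used in the commit message example.
for msg in dirty_limit_qmp(4096, period_ms=500, limit_mbps=10):
    print(json.dumps(msg))
```

Failing early when no dirty ring size was given mirrors the engine's own exception, so a misconfigured comparison run stops before any migration starts.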
[PATCH 6/6] docs/migration: Add the dirty limit section
The dirty limit feature was introduced in the QEMU 8.1 release but is not
yet described in the documentation; add a section for it.

Signed-off-by: Hyman Huang
---
 docs/devel/migration.rst | 71
 1 file changed, 71 insertions(+)

diff --git a/docs/devel/migration.rst b/docs/devel/migration.rst
index c3e1400c0c..1cbec22e2a 100644
--- a/docs/devel/migration.rst
+++ b/docs/devel/migration.rst
@@ -588,6 +588,77 @@ path.
   Return path  - opened by main thread, written by main thread AND postcopy
   thread (protected by rp_mutex)

+Dirty limit
+===========
+
+The dirty limit, short for dirty page rate upper limit, is a new capability
+introduced in the 8.1 QEMU release that uses a new algorithm based on the KVM
+dirty ring to throttle down the guest during live migration.
+
+The algorithm framework is as follows:
+
+::
+
+  -------------------------------------------------------------------
+  main    ----> throttle thread ----------------> PREPARE(1) <-------
+  thread  \                                           |             |
+           \                                          V             |
+            \                                   CALCULATE(2)        |
+             \                                        |             |
+              \                                       V             |
+               \                               SET PENALTY(3) -------
+                \                                     |
+                 \                                    V
+                  -> virtual CPU thread ----> ACCEPT PENALTY(4)
+  -------------------------------------------------------------------
+
+When the QMP command qmp_set_vcpu_dirty_limit is called for the first time,
+the QEMU main thread starts the throttle thread. Once launched, the throttle
+thread executes a loop that consists of three steps:
+
+  - PREPARE (1)
+
+    As the name implies, PREPARE (1) gathers what the second stage,
+    CALCULATE (2), needs: the dirty page rate of the VM and the
+    corresponding upper limit. The dirty page rate is calculated via the
+    KVM dirty ring mechanism, which tells QEMU how many pages a virtual CPU
+    has dirtied since the last KVM_EXIT_DIRTY_RING_FULL exit; the dirty
+    page rate upper limit is specified by the caller, so it is fetched
+    directly.
+
+  - CALCULATE (2)
+
+    Calculate a suitable sleep period for each virtual CPU, which will be
+    used to determine the penalty for the target virtual CPU. The
+    computation must be done carefully in order to reduce the dirty page
+    rate progressively down to the upper limit without oscillation. To
+    achieve this, two strategies are provided: the first is to add or
+    subtract sleep time based on the ratio of the current dirty page rate
+    to the limit, which is used when the current dirty page rate is far
+    from the limit; the second is to add or subtract a fixed time when
+    the current dirty page rate is close to the limit.
+
+  - SET PENALTY (3)
+
+    Set the sleep time for each virtual CPU that should be penalized based
+    on the results of the calculation supplied by step CALCULATE (2).
+
+After completing the three stages above, the throttle thread loops back
+to step PREPARE (1) until the dirty limit is reached.
+
+On the other hand, each virtual CPU thread reads the sleep duration and
+sleeps in the path of the KVM_EXIT_DIRTY_RING_FULL exit handler, that
+is, ACCEPT PENALTY (4). Virtual CPUs running write-heavy workloads exit
+through this path and get penalized, whereas virtual CPUs running
+read-mostly workloads do not.
+
+In summary, thanks to the KVM dirty ring technology, the dirty limit
+algorithm restricts virtual CPUs as needed to keep their dirty page
+rate within the limit. This yields steadier read performance during
+live migration and can help improve the responsiveness of large guests.
+
 Postcopy
 ========
-- 
2.39.1
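The PREPARE / CALCULATE / SET PENALTY loop described in the new section can be illustrated with a toy model. This is a simplified sketch, not QEMU's implementation: the tolerance band, fixed step, sleep cap, 4 KiB page size and the assumption that a vCPU's dirty rate scales linearly with its awake time are all hypothetical, as are the names `dirty_rate_mbps`, `next_sleep_us` and `simulate`.

```python
# Toy model of the dirty-limit throttle loop; all constants are hypothetical.
PAGE_SIZE = 4096          # assumed guest page size in bytes
TOLERANCE_MBPS = 25       # assumed width of the "close to the limit" band
FIXED_STEP_US = 100       # assumed fixed adjustment used near the limit
PERIOD_US = 1_000_000     # assumed throttle period (1 s)
MAX_SLEEP_US = 990_000    # assumed cap: never sleep for the whole period


def dirty_rate_mbps(dirty_pages, interval_s, page_size=PAGE_SIZE):
    """PREPARE (1): turn pages harvested from the dirty ring into MB/s."""
    return dirty_pages * page_size / (1024 * 1024) / interval_s


def next_sleep_us(sleep_us, current_mbps, limit_mbps, period_us=PERIOD_US):
    """CALCULATE (2): derive the next per-period sleep time for one vCPU."""
    if current_mbps <= 0:
        return 0  # nothing was dirtied, so there is nothing to penalize
    if abs(current_mbps - limit_mbps) > TOLERANCE_MBPS:
        # Far from the limit: scale the vCPU's run time by limit / current.
        run_us = period_us - sleep_us
        new_sleep = period_us - int(run_us * limit_mbps / current_mbps)
    elif current_mbps > limit_mbps:
        new_sleep = sleep_us + FIXED_STEP_US  # just above: sleep a bit more
    elif current_mbps < limit_mbps:
        new_sleep = sleep_us - FIXED_STEP_US  # just below: sleep a bit less
    else:
        new_sleep = sleep_us
    return max(0, min(new_sleep, MAX_SLEEP_US))


def simulate(base_mbps, limit_mbps, rounds=5, period_us=PERIOD_US):
    """Run the loop against a vCPU that dirties base_mbps while awake."""
    sleep_us = 0
    history = []
    for _ in range(rounds):
        awake = (period_us - sleep_us) / period_us
        pages = int(base_mbps * 1024 * 1024 / PAGE_SIZE * awake)
        current = dirty_rate_mbps(pages, period_us / 1_000_000)  # PREPARE (1)
        history.append(round(current, 1))
        # CALCULATE (2)
        sleep_us = next_sleep_us(sleep_us, current, limit_mbps, period_us)
        # SET PENALTY (3): a real implementation would publish sleep_us here
        # for the vCPU thread to honour in ACCEPT PENALTY (4).
    return history, sleep_us


print(simulate(base_mbps=500, limit_mbps=100))
```

With a vCPU that would dirty 500 MB/s and a 100 MB/s limit, the proportional branch lands on the limit in one step because the toy model is linear; the fixed-step branch only matters when the measured rate hovers near the limit, where a real guest's noise would otherwise make a purely proportional controller oscillate.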