[PATCH 0/2] migrate inflight emulated SCSI request for the scsi disk device

2024-05-24 Thread Hyman Huang
This patchset refines the commit message of the previous series:
https://patchew.org/QEMU/cover.1712577715.git.yong.hu...@smartx.com/

The goal is to make the series easier to review. Please review, thanks.
Yong 

In the USB mass storage device model, QEMU uses a SCSI disk device as
the backend of the USB mass storage device. In addition, the USB mass
storage driver in the guest OS conforms to the "Universal Serial Bus
Mass Storage Class Bulk-Only Transport" specification, which defines
the transfer behavior between a USB controller and a USB mass storage
device. The following shows the protocol hierarchy:

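                    +----------------+
 CDROM driver       |  scsi command  |        CDROM
                    +----------------+

                 +-----------------------+
 USB mass        | USB Mass Storage Class|    USB mass
 storage driver  | Bulk-Only Transport   |    storage device
                 +-----------------------+

                    +----------------+
 USB Controller     |  USB Protocol  |        USB device
                    +----------------+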

In the USB protocol layer, at least two USB packets are transferred
between the USB controller and the USB device when the guest OS sends
a read operation to the USB mass storage device:

1. The CBW (Command Block Wrapper) packet, which is delivered to the
USB device's Bulk-Out endpoint. To emulate a read operation, the USB
mass storage device parses the CBW and converts it to a SCSI command,
which is executed by the CDROM (internally represented as a SCSI disk
in QEMU), and stores the result data of the SCSI command in a buffer.

2. The DATA-IN packet, which is delivered from the USB device's
Bulk-In endpoint (fetched directly from the preceding buffer) to the
USB controller.
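To make the first packet concrete, here is a minimal C sketch that decodes the 31-byte Command Block Wrapper as laid out in the Bulk-Only Transport specification (little-endian fields, "USBC" signature, direction in bit 7 of the flags byte). The names `cbw_view` and `cbw_parse` are hypothetical and are not QEMU's own code; QEMU's handling lives in the USB mass storage model in hw/usb/dev-storage.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sizes and signature from the Bulk-Only Transport specification. */
#define CBW_SIZE      31
#define CBW_SIGNATURE 0x43425355u  /* "USBC" read as little-endian */
#define CBW_FLAG_IN   0x80u        /* bmCBWFlags bit 7: 1 = Data-In */

/* Hypothetical decoded view of a CBW; not a QEMU structure. */
struct cbw_view {
    uint32_t tag;       /* dCBWTag, echoed back in the CSW */
    uint32_t data_len;  /* dCBWDataTransferLength */
    bool     data_in;   /* direction of the data stage */
    uint8_t  lun;       /* bCBWLUN, low 4 bits */
    uint8_t  cb_len;    /* bCBWCBLength, valid range 1..16 */
    uint8_t  cb[16];    /* CBWCB: the SCSI CDB */
};

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns true if buf holds a well-formed CBW and fills *out. */
static bool cbw_parse(const uint8_t *buf, size_t len, struct cbw_view *out)
{
    assert(buf != NULL && out != NULL);
    if (len != CBW_SIZE || le32(buf) != CBW_SIGNATURE) {
        return false;
    }
    out->tag      = le32(buf + 4);
    out->data_len = le32(buf + 8);
    out->data_in  = (buf[12] & CBW_FLAG_IN) != 0;
    out->lun      = buf[13] & 0x0f;
    out->cb_len   = buf[14] & 0x1f;
    if (out->cb_len < 1 || out->cb_len > 16) {
        return false;
    }
    memcpy(out->cb, buf + 15, sizeof(out->cb));
    return true;
}
```

For example, a CBW carrying tag 0x472, flags 0x80, a 10-byte CDB and an 8-byte transfer length decodes to `data_in == true`: the command is emulated when the CBW arrives, and the 8 result bytes stay buffered until the DATA-IN packet collects them.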

Take UHCI as the controller. The two packets mentioned above may be
processed by UHCI in two separate frame entries of the Frame List and
described by two different TDs (Transfer Descriptors). Unlike on
physical hardware, in a virtualized environment QEMU must make sure
that the result data of the CBW is not lost and is delivered to the
UHCI controller.
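Each TD carries a 32-bit token giving the direction (PID), device address, endpoint and length of one transaction, which is why the CBW (OUT) and the DATA-IN (IN) transfers end up in different TDs. A minimal C sketch of decoding that token, assuming the bit layout from Intel's UHCI Design Guide (MaxLen in bits 31:21 encoded as n-1, data toggle in bit 19, endpoint in bits 18:15, device address in bits 14:8, PID in bits 7:0). `uhci_token`, `uhci_token_decode` and `uhci_token_encode` are hypothetical names, not the code in hw/usb/hcd-uhci.c.

```c
#include <assert.h>
#include <stdint.h>

/* Packet IDs stored in the low byte of a UHCI TD token. */
#define UHCI_PID_IN    0x69
#define UHCI_PID_OUT   0xe1
#define UHCI_PID_SETUP 0x2d

/* Hypothetical decoded view of a TD token. */
struct uhci_token {
    uint8_t  pid;       /* bits 7:0 */
    uint8_t  devaddr;   /* bits 14:8 */
    uint8_t  endpoint;  /* bits 18:15 */
    uint8_t  toggle;    /* bit 19 */
    uint16_t max_len;   /* bits 31:21, stored as n-1; 0x7ff means 0 bytes */
};

static struct uhci_token uhci_token_decode(uint32_t token)
{
    struct uhci_token t;

    t.pid      = token & 0xff;
    t.devaddr  = (token >> 8) & 0x7f;
    t.endpoint = (token >> 15) & 0x0f;
    t.toggle   = (token >> 19) & 0x01;
    /* n-1 encoding: adding 1 and masking turns 0x7ff into 0. */
    t.max_len  = ((token >> 21) + 1) & 0x7ff;
    return t;
}

/* Hypothetical helper that builds a token, used here only for testing. */
static uint32_t uhci_token_encode(uint8_t pid, uint8_t devaddr,
                                  uint8_t endpoint, uint16_t len)
{
    assert(devaddr < 128 && endpoint < 16 && len <= 0x500);
    return (uint32_t)pid | (uint32_t)devaddr << 8 |
           (uint32_t)endpoint << 15 |
           (uint32_t)((len - 1) & 0x7ff) << 21;
}
```

Because the OUT transaction carrying the CBW and the later IN transaction are separate TDs, the controller can process them in different frames, leaving a window in which the result exists only as device-side state.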

Currently, these SCSI requests are not migrated, so QEMU cannot
ensure that the result data of the I/O operation is not lost if
emulated SCSI requests are in flight during live migration.

Suppose that the USB mass storage device is processing the CBW and
storing the result data of the read operation in a buffer when live
migration moves the VM to the destination, without migrating that
result data.

After migration, when UHCI at the destination issues a DATA-IN request
to the USB mass storage device, QEMU crashes because the USB mass
storage device tries to fetch the result data and finds nothing.

This is the scenario this patchset addresses.

In theory, any device that uses the SCSI disk as a backend could be
affected by this issue; in the case reported here, it is the USB CDROM.

To fix it, migrate in-flight emulated SCSI requests during live
migration, as is already done for DMA SCSI requests.

Hyman Huang (2):
  scsi-disk: Introduce the migrate_emulate_scsi_request field
  scsi-disk: Fix crash for VM configured with USB CDROM after live
migration

 hw/scsi/scsi-disk.c | 35 ++-
 1 file changed, 34 insertions(+), 1 deletion(-)

-- 
2.39.3




[PATCH 1/2] scsi-disk: Introduce the migrate_emulate_scsi_request field

2024-05-24 Thread Hyman Huang
To tell the destination whether emulated SCSI requests are included
in the migration stream, introduce the migrate_emulate_scsi_request
field in struct SCSIDiskState. This keeps migration backward
compatible.
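The compatibility rule the new field relies on can be modeled in a few lines. A field declared with `VMSTATE_BOOL_V(..., 2)` is only present in streams of version 2 or later, so when a version-1 stream is loaded the field keeps its initial value (false on a freshly created device). The C sketch below is a toy model with hypothetical names (`disk_state`, `load_disk_state`) and one byte per field; it is not QEMU's VMState implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the migrated device state. */
struct disk_state {
    bool migrate_emulate_scsi_request;  /* field added in version 2 */
    bool media_changed;                 /* field present since version 1 */
};

#define DISK_STATE_VERSION     2  /* like .version_id */
#define DISK_STATE_MIN_VERSION 1  /* like .minimum_version_id */

/*
 * Toy loader: 'stream' holds one byte per field, in field order.
 * Returns the number of bytes consumed, or -1 for an unsupported version.
 */
static int load_disk_state(struct disk_state *s, int version,
                           const uint8_t *stream)
{
    int pos = 0;

    if (version < DISK_STATE_MIN_VERSION || version > DISK_STATE_VERSION) {
        return -1;
    }
    /* A freshly created device starts with the flag cleared. */
    s->migrate_emulate_scsi_request = false;
    if (version >= 2) {
        /* Like VMSTATE_BOOL_V(..., 2): only in version >= 2 streams. */
        s->migrate_emulate_scsi_request = stream[pos++] != 0;
    }
    s->media_changed = stream[pos++] != 0;
    return pos;
}
```

The upper version check in the sketch also reflects a consequence of bumping .version_id: a destination that only knows version 1 rejects a version-2 stream.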

This commit sets the stage for the next one, which addresses
the crash of a VM configured with a CDROM during live migration.

Signed-off-by: Hyman Huang 
Message-Id: 
<2da3a08785453478079cfd46d8293ee68d284391.1712577715.git.yong.hu...@smartx.com>
---
 hw/scsi/scsi-disk.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 4bd7af9d0c..0985676f73 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -111,6 +111,7 @@ struct SCSIDiskState {
      * 0x- reserved
      */
     uint16_t rotation_rate;
+    bool migrate_emulate_scsi_request;
 };
 
 static void scsi_free_request(SCSIRequest *req)
@@ -3133,11 +3134,21 @@ static Property scsi_hd_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+static int scsi_disk_pre_save(void *opaque)
+{
+    SCSIDiskState *dev = opaque;
+    dev->migrate_emulate_scsi_request = false;
+
+    return 0;
+}
+
 static const VMStateDescription vmstate_scsi_disk_state = {
     .name = "scsi-disk",
-    .version_id = 1,
+    .version_id = 2,
     .minimum_version_id = 1,
+    .pre_save = scsi_disk_pre_save,
     .fields = (const VMStateField[]) {
+        VMSTATE_BOOL_V(migrate_emulate_scsi_request, SCSIDiskState, 2),
         VMSTATE_SCSI_DEVICE(qdev, SCSIDiskState),
         VMSTATE_BOOL(media_changed, SCSIDiskState),
         VMSTATE_BOOL(media_event, SCSIDiskState),
-- 
2.39.3




[PATCH 2/2] scsi-disk: Fix crash for VM configured with USB CDROM after live migration

2024-05-24 Thread Hyman Huang

In the USB mass storage device model, QEMU uses a SCSI disk device as
the backend of the USB mass storage device. In addition, the USB mass
storage driver in the guest OS conforms to the "Universal Serial Bus
Mass Storage Class Bulk-Only Transport" specification, which defines
the transfer behavior between a USB controller and a USB mass storage
device. The following shows the protocol hierarchy:

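                    +----------------+
 CDROM driver       |  scsi command  |        CDROM
                    +----------------+

                 +-----------------------+
 USB mass        | USB Mass Storage Class|    USB mass
 storage driver  | Bulk-Only Transport   |    storage device
                 +-----------------------+

                    +----------------+
 USB Controller     |  USB Protocol  |        USB device
                    +----------------+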

In the USB protocol layer, at least two USB packets are transferred
between the USB controller and the USB device when the guest OS sends
a read operation to the USB mass storage device:

1. The CBW (Command Block Wrapper) packet, which is delivered to the
USB device's Bulk-Out endpoint. To emulate a read operation, the USB
mass storage device parses the CBW and converts it to a SCSI command,
which is executed by the CDROM (internally represented as a SCSI disk
in QEMU), and stores the result data of the SCSI command in a buffer.

2. The DATA-IN packet, which is delivered from the USB device's
Bulk-In endpoint (fetched directly from the preceding buffer) to the
USB controller.

Take UHCI as the controller. The two packets mentioned above may be
processed by UHCI in two separate frame entries of the Frame List and
described by two different TDs (Transfer Descriptors). Unlike on
physical hardware, in a virtualized environment QEMU must make sure
that the result data of the CBW is not lost and is delivered to the
UHCI controller.

Currently, these SCSI requests are not migrated, so QEMU cannot
ensure that the result data of the I/O operation is not lost if
emulated SCSI requests are in flight during live migration.

Suppose that the USB mass storage device is processing the CBW and
storing the result data of the read operation in a buffer when live
migration moves the VM to the destination, without migrating that
result data.

After migration, when UHCI at the destination issues a DATA-IN request
to the USB mass storage device, QEMU crashes because the USB mass
storage device tries to fetch the result data and finds nothing.

This is the scenario this patch addresses.

In theory, any device that uses the SCSI disk as a backend could be
affected by this issue; in the case reported here, it is the USB CDROM.

To fix it, migrate in-flight emulated SCSI requests during live
migration, as is already done for DMA SCSI requests.
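The patch gates the request payload on the flag at both ends: the source only writes the emulated request when migrate_emulate_scsi_request is set, and the destination only reads it when the flag arrived set. The flag sits ahead of VMSTATE_SCSI_DEVICE in the field list, so it is presumably already loaded when the requests are. A toy C sketch of that handshake, under simplifying assumptions: a byte array stands in for QEMUFile, the flag is written right before a single request, and `emu_req`, `req_save` and `req_load` are hypothetical names rather than the functions in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy migration stream standing in for QEMUFile. */
struct stream {
    uint8_t data[128];
    size_t  pos;
};

static void put_u8(struct stream *s, uint8_t v)
{
    assert(s->pos < sizeof(s->data));
    s->data[s->pos++] = v;
}

static uint8_t get_u8(struct stream *s)
{
    assert(s->pos < sizeof(s->data));
    return s->data[s->pos++];
}

/* Hypothetical in-flight emulated request: result waiting for DATA-IN. */
struct emu_req {
    uint8_t len;
    uint8_t buf[32];
};

/* Source: announce the flag, then send the payload only if it is set. */
static void req_save(struct stream *s, bool flag, const struct emu_req *r)
{
    put_u8(s, flag);
    if (!flag) {
        return;
    }
    put_u8(s, r->len);
    for (uint8_t i = 0; i < r->len; i++) {
        put_u8(s, r->buf[i]);
    }
}

/*
 * Destination: read the payload only if the source announced it.
 * Returns 1 if a request was loaded, 0 if none was sent, -1 if corrupt.
 */
static int req_load(struct stream *s, struct emu_req *r)
{
    r->len = 0;
    if (get_u8(s) == 0) {
        return 0;
    }
    uint8_t len = get_u8(s);
    if (len > sizeof(r->buf)) {
        return -1;
    }
    for (uint8_t i = 0; i < len; i++) {
        r->buf[i] = get_u8(s);
    }
    r->len = len;
    return 1;
}
```

Keeping reader and writer symmetric on the same flag is what lets a destination skip the payload when the source did not send it, instead of misparsing the rest of the stream.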

Signed-off-by: Hyman Huang 
---
 hw/scsi/scsi-disk.c | 24 +++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 0985676f73..d6e9d9e8d4 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -160,6 +160,16 @@ static void scsi_disk_save_request(QEMUFile *f, SCSIRequest *req)
     }
 }
 
+static void scsi_disk_emulate_save_request(QEMUFile *f, SCSIRequest *req)
+{
+    SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
+    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+
+    if (s->migrate_emulate_scsi_request) {
+        scsi_disk_save_request(f, req);
+    }
+}
+
 static void scsi_disk_load_request(QEMUFile *f, SCSIRequest *req)
 {
     SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
@@ -183,6 +193,16 @@ static void scsi_disk_load_request(QEMUFile *f, SCSIRequest *req)
     qemu_iovec_init_external(&r->qiov, &r->iov, 1);
 }
 
+static void scsi_disk_emulate_load_request(QEMUFile *f, SCSIRequest *req)
+{
+    SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
+    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+
+    if (s->migrate_emulate_scsi_request) {
+        scsi_disk_load_request(f, req);
+    }
+}
+
 /*
  * scsi_handle_rw_error has two return values.  False means that the error
  * must be ignored, true means that the error has been processed and the
@@ -2593,6 +2613,8 @@ static const SCSIReqOps scsi_disk_emulate_reqops = {
     .read_data    = scsi_disk_emulate_read_data,
     .write_data   = scsi_disk_emulate_write_data,
     .get_buf      = scsi_get_buf,
+    .load_request = scsi_disk_emulate_load_request,
+    .save_request = scsi_disk_emulate_save_request,
 };
 
 static const SCSIReqOps scsi_disk_dma_reqops = {
@@ -3137,7 +3159,7 @@ static Property scsi_hd_properties[] = {
 static int scsi_disk_pre_save(void *opaque)
 {
     SCSIDiskState *dev = opaque;
-    dev->migrate_emulate_scsi_request = false;
+    dev->migrate_emulate_scsi_request = true;
 
     return 0;
 }
-- 
2.39.3




[PATCH RESEND 1/2] scsi-disk: Introduce the migrate_emulate_scsi_request field

2024-04-08 Thread Hyman Huang
To tell the destination whether emulated SCSI requests are included
in the migration stream, introduce the migrate_emulate_scsi_request
field in struct SCSIDiskState. This keeps migration backward
compatible.

This commit sets the stage for the next one, which addresses
the crash of a VM configured with a CDROM during live migration.

Signed-off-by: Hyman Huang 
---
 hw/scsi/scsi-disk.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 4bd7af9d0c..0985676f73 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -111,6 +111,7 @@ struct SCSIDiskState {
      * 0x- reserved
      */
     uint16_t rotation_rate;
+    bool migrate_emulate_scsi_request;
 };
 
 static void scsi_free_request(SCSIRequest *req)
@@ -3133,11 +3134,21 @@ static Property scsi_hd_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+static int scsi_disk_pre_save(void *opaque)
+{
+    SCSIDiskState *dev = opaque;
+    dev->migrate_emulate_scsi_request = false;
+
+    return 0;
+}
+
 static const VMStateDescription vmstate_scsi_disk_state = {
     .name = "scsi-disk",
-    .version_id = 1,
+    .version_id = 2,
     .minimum_version_id = 1,
+    .pre_save = scsi_disk_pre_save,
     .fields = (const VMStateField[]) {
+        VMSTATE_BOOL_V(migrate_emulate_scsi_request, SCSIDiskState, 2),
         VMSTATE_SCSI_DEVICE(qdev, SCSIDiskState),
         VMSTATE_BOOL(media_changed, SCSIDiskState),
         VMSTATE_BOOL(media_event, SCSIDiskState),
-- 
2.39.3




[PATCH RESEND 0/2] Fix crash of VMs configured with the CDROM device

2024-04-08 Thread Hyman Huang
This patchset fixes the crash of VMs configured with the CDROM device
on the destination during live migration. See the commit message for
details.

The previous patchset did not show up at https://patchew.org/QEMU, so
I am resending it to make sure it reaches the list.

Please review.

Yong

Hyman Huang (2):
  scsi-disk: Introduce the migrate_emulate_scsi_request field
  scsi-disk: Fix crash of VMs configured with the CDROM device

 hw/scsi/scsi-disk.c | 35 ++-
 1 file changed, 34 insertions(+), 1 deletion(-)

-- 
2.39.3




[PATCH RESEND 2/2] scsi-disk: Fix crash of VMs configured with the CDROM device

2024-04-08 Thread Hyman Huang
Configure a VM in Libvirt with a CDROM device on the USB bus as
follows:


  
  
  
  
  



The destination QEMU process crashed, causing the VM migration to
fail; the backtrace shows the following:

Program terminated with signal SIGSEGV, Segmentation fault.
0  __memmove_sse2_unaligned_erms () at 
../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:312
312             movq    -8(%rsi,%rdx), %rcx
[Current thread is 1 (Thread 0x7f0a9025fc00 (LWP 3286206))]
(gdb) bt
0  __memmove_sse2_unaligned_erms () at 
../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:312
1  memcpy (__len=8, __src=, __dest=) at 
/usr/include/bits/string_fortified.h:34
2  iov_from_buf_full (iov=, iov_cnt=, 
offset=, buf=0x0, bytes=bytes@entry=8) at ../util/iov.c:33
3  iov_from_buf (bytes=8, buf=, offset=, 
iov_cnt=, iov=)
   at 
/usr/src/debug/qemu-6-6.2.0-75.7.oe1.smartx.git.40.x86_64/include/qemu/iov.h:49
4  usb_packet_copy (p=p@entry=0x56066b2fb5a0, ptr=, 
bytes=bytes@entry=8) at ../hw/usb/core.c:636
5  usb_msd_copy_data (s=s@entry=0x56066c62c770, p=p@entry=0x56066b2fb5a0) at 
../hw/usb/dev-storage.c:186
6  usb_msd_handle_data (dev=0x56066c62c770, p=0x56066b2fb5a0) at 
../hw/usb/dev-storage.c:496
7  usb_handle_packet (dev=0x56066c62c770, p=p@entry=0x56066b2fb5a0) at 
../hw/usb/core.c:455
8  uhci_handle_td (s=s@entry=0x56066bd5f210, q=0x56066bb7fbd0, q@entry=0x0, 
qh_addr=qh_addr@entry=902518530, td=td@entry=0x7fffe6e788f0, td_addr=,
   int_mask=int_mask@entry=0x7fffe6e788e4) at ../hw/usb/hcd-uhci.c:885
9  uhci_process_frame (s=s@entry=0x56066bd5f210) at ../hw/usb/hcd-uhci.c:1061
10 uhci_frame_timer (opaque=opaque@entry=0x56066bd5f210) at 
../hw/usb/hcd-uhci.c:1159
11 timerlist_run_timers (timer_list=0x56066af26bd0) at ../util/qemu-timer.c:642
12 qemu_clock_run_timers (type=QEMU_CLOCK_VIRTUAL) at ../util/qemu-timer.c:656
13 qemu_clock_run_all_timers () at ../util/qemu-timer.c:738
14 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:542
15 qemu_main_loop () at ../softmmu/runstate.c:739
16 main (argc=, argv=, envp=) at 
../softmmu/main.c:52
(gdb) frame 5
(gdb) p ((SCSIDiskReq *)s->req)->iov
$1 = {iov_base = 0x0, iov_len = 0}
(gdb) p/x s->req->tag
$2 = 0x472

In QEMU's implementation of the USB mass storage device, which is
used to emulate a CDROM on the USB bus, the SCSI commands issued to
the CDROM are wrapped as the payload of the USB protocol.

In general, the USB controller processes a SCSI command in two
phases. The first phase sends the OUT USB packet that encapsulates the
SCSI command; scsi-disk handles it by emulating the SCSI operation.
The second phase receives the IN USB packet containing the output of
the SCSI operation. The SCSI request tag identifies the request across
both phases.
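The two phases can be sketched as follows: phase 1 keeps the emulated result under the request's tag, and phase 2 copies it out when the IN packet arrives. This is a hypothetical C model (`inflight`, `phase1_submit` and `phase2_data_in` are illustrative names, not QEMU functions); in QEMU the copy happens in usb_msd_copy_data and usb_packet_copy, as the backtrace shows. The model reports a missing result as an error, whereas the backtrace shows QEMU copying 8 bytes from buf=0x0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record of the request started by the OUT (CBW) phase. */
struct inflight {
    uint32_t tag;           /* SCSI request tag, e.g. 0x472 */
    const uint8_t *result;  /* emulated result; NULL if never produced */
    size_t len;             /* bytes expected from 'result' */
};

/* Phase 1: emulate the command and keep its result under 'tag'. */
static void phase1_submit(struct inflight *req, uint32_t tag,
                          const uint8_t *result, size_t len)
{
    assert(result != NULL || len == 0);
    req->tag = tag;
    req->result = result;
    req->len = len;
}

/*
 * Phase 2: the IN packet asks for up to 'want' bytes of request 'tag'.
 * Returns the number of bytes copied, or -1 when there is no result to
 * copy (the state a destination is left in if the data is not migrated).
 */
static long phase2_data_in(const struct inflight *req, uint32_t tag,
                           uint8_t *dst, size_t want)
{
    if (req->tag != tag || req->result == NULL) {
        return -1;
    }
    size_t n = want < req->len ? want : req->len;
    memcpy(dst, req->result, n);
    return (long)n;
}
```

The `lost` case in the test mirrors the gdb output: the request with tag 0x472 still exists, 8 bytes are expected, but there is no buffer behind it.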

Since QEMU does not migrate the in-flight SCSI request, the output of
the SCSI command may be lost if live migration is initiated between
these two phases.

In our scenario, the SCSI command is GET_EVENT_STATUS_NOTIFICATION.
The QEMU log below shows how the command is handled on the source
(first phase):

usb_packet_state_change bus 0, port 2, ep 2, packet 0x559f9ba14b00, state undef 
-> setup
usb_msd_cmd_submit lun 0, tag 0x472, flags 0x0080, len 10, data-len 8
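The command in this log can be reconstructed for illustration. A minimal C sketch that builds a GET EVENT STATUS NOTIFICATION CDB, assuming the MMC layout: opcode 0x4A, the Polled bit in byte 1, the notification class request in byte 4 and a big-endian allocation length in bytes 7-8. A media-class request with an 8-byte allocation length (a 4-byte event header plus one 4-byte media event descriptor) is consistent with `len 10, data-len 8` above, but which class the guest actually requested is an assumption; `build_gesn_cdb` and `gesn_alloc_len` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GESN_OPCODE      0x4a  /* GET EVENT STATUS NOTIFICATION */
#define GESN_CDB_LEN     10
#define GESN_POLLED      0x01  /* byte 1, bit 0 */
#define GESN_CLASS_MEDIA 0x10  /* byte 4, bit 4: media event class */

/* Hypothetical helper: fill a 10-byte GESN CDB. */
static void build_gesn_cdb(uint8_t cdb[GESN_CDB_LEN], uint8_t class_mask,
                           uint16_t alloc_len)
{
    memset(cdb, 0, GESN_CDB_LEN);
    cdb[0] = GESN_OPCODE;
    cdb[1] = GESN_POLLED;                 /* request polled operation */
    cdb[4] = class_mask;                  /* event classes of interest */
    cdb[7] = (uint8_t)(alloc_len >> 8);   /* allocation length, big-endian */
    cdb[8] = (uint8_t)(alloc_len & 0xff);
}

/* Read back the allocation length, i.e. the size of the DATA-IN phase. */
static uint16_t gesn_alloc_len(const uint8_t cdb[GESN_CDB_LEN])
{
    return (uint16_t)(cdb[7] << 8 | cdb[8]);
}
```

The allocation length is what ties the two phases together: it tells the device how many result bytes the later DATA-IN packet will fetch.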

After migration, the VM crashed as soon as the destination's UHCI
controller began processing the remaining part of the SCSI request
(second phase). The destination QEMU logged the following:

usb_packet_state_change bus 0, port 2, ep 1, packet 0x56066b2fb5a0, state undef 
-> setup
usb_msd_data_in 8/8 (scsi 8)
shutting down, reason=crashed

In summary, a SCSI request lost during live migration may cause a VM
configured with a CDROM to crash.

The simple approach is to migrate the SCSI request that scsi-disk is
handling, if there is one.

Signed-off-by: Hyman Huang 
---
 hw/scsi/scsi-disk.c | 24 +++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 0985676f73..d6e9d9e8d4 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -160,6 +160,16 @@ static void scsi_disk_save_request(QEMUFile *f, SCSIRequest *req)
     }
 }
 
+static void scsi_disk_emulate_save_request(QEMUFile *f, SCSIRequest *req)
+{
+    SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
+    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+
+    if (s->migrate_emulate_scsi_request) {
+        scsi_disk_save_request(f, req);
+    }
+}
+
 static void scsi_disk_load_request(QEMUFile *f, SCSIRequest *req)
 {
     SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
@@ -183,6 +193,16 @@ static void scsi_disk_load_request(QEMUFile *f, SCSIRequest *req)
     qemu_iovec_init_external(&r->qiov, &r->iov, 1);
 }
 
+static void scsi_disk_emulate_load_request(QEMUFile *f, SCSIRequest *req)
+{
+    SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
+    SCSIDiskStat

[PATCH 1/2] scsi-disk: Introduce the migrate_emulate_scsi_request field

2024-04-03 Thread Hyman Huang
To tell the destination whether emulated SCSI requests are included
in the migration stream, introduce the migrate_emulate_scsi_request
field in struct SCSIDiskState. This keeps migration backward
compatible.

This commit sets the stage for the next one, which addresses
the crash of a VM configured with a CDROM during live migration.

Signed-off-by: Hyman Huang 
---
 hw/scsi/scsi-disk.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 4bd7af9d0c..0985676f73 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -111,6 +111,7 @@ struct SCSIDiskState {
      * 0x- reserved
      */
     uint16_t rotation_rate;
+    bool migrate_emulate_scsi_request;
 };
 
 static void scsi_free_request(SCSIRequest *req)
@@ -3133,11 +3134,21 @@ static Property scsi_hd_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+static int scsi_disk_pre_save(void *opaque)
+{
+    SCSIDiskState *dev = opaque;
+    dev->migrate_emulate_scsi_request = false;
+
+    return 0;
+}
+
 static const VMStateDescription vmstate_scsi_disk_state = {
     .name = "scsi-disk",
-    .version_id = 1,
+    .version_id = 2,
     .minimum_version_id = 1,
+    .pre_save = scsi_disk_pre_save,
     .fields = (const VMStateField[]) {
+        VMSTATE_BOOL_V(migrate_emulate_scsi_request, SCSIDiskState, 2),
         VMSTATE_SCSI_DEVICE(qdev, SCSIDiskState),
         VMSTATE_BOOL(media_changed, SCSIDiskState),
         VMSTATE_BOOL(media_event, SCSIDiskState),
-- 
2.39.3




[PATCH 2/2] scsi-disk: Fix the migration crash of the CDROM device with USB bus

2024-04-03 Thread Hyman Huang
Configure a VM in Libvirt with a CDROM device on the USB bus as
follows:


  
  
  
  
  



The destination QEMU process crashed, causing the VM migration to
fail; the backtrace shows the following:

Program terminated with signal SIGSEGV, Segmentation fault.
0  __memmove_sse2_unaligned_erms () at 
../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:312
312             movq    -8(%rsi,%rdx), %rcx
[Current thread is 1 (Thread 0x7f0a9025fc00 (LWP 3286206))]
(gdb) bt
0  __memmove_sse2_unaligned_erms () at 
../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:312
1  memcpy (__len=8, __src=, __dest=) at 
/usr/include/bits/string_fortified.h:34
2  iov_from_buf_full (iov=, iov_cnt=, 
offset=, buf=0x0, bytes=bytes@entry=8) at ../util/iov.c:33
3  iov_from_buf (bytes=8, buf=, offset=, 
iov_cnt=, iov=)
   at 
/usr/src/debug/qemu-6-6.2.0-75.7.oe1.smartx.git.40.x86_64/include/qemu/iov.h:49
4  usb_packet_copy (p=p@entry=0x56066b2fb5a0, ptr=, 
bytes=bytes@entry=8) at ../hw/usb/core.c:636
5  usb_msd_copy_data (s=s@entry=0x56066c62c770, p=p@entry=0x56066b2fb5a0) at 
../hw/usb/dev-storage.c:186
6  usb_msd_handle_data (dev=0x56066c62c770, p=0x56066b2fb5a0) at 
../hw/usb/dev-storage.c:496
7  usb_handle_packet (dev=0x56066c62c770, p=p@entry=0x56066b2fb5a0) at 
../hw/usb/core.c:455
8  uhci_handle_td (s=s@entry=0x56066bd5f210, q=0x56066bb7fbd0, q@entry=0x0, 
qh_addr=qh_addr@entry=902518530, td=td@entry=0x7fffe6e788f0, td_addr=,
   int_mask=int_mask@entry=0x7fffe6e788e4) at ../hw/usb/hcd-uhci.c:885
9  uhci_process_frame (s=s@entry=0x56066bd5f210) at ../hw/usb/hcd-uhci.c:1061
10 uhci_frame_timer (opaque=opaque@entry=0x56066bd5f210) at 
../hw/usb/hcd-uhci.c:1159
11 timerlist_run_timers (timer_list=0x56066af26bd0) at ../util/qemu-timer.c:642
12 qemu_clock_run_timers (type=QEMU_CLOCK_VIRTUAL) at ../util/qemu-timer.c:656
13 qemu_clock_run_all_timers () at ../util/qemu-timer.c:738
14 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:542
15 qemu_main_loop () at ../softmmu/runstate.c:739
16 main (argc=, argv=, envp=) at 
../softmmu/main.c:52
(gdb) frame 5
(gdb) p ((SCSIDiskReq *)s->req)->iov
$1 = {iov_base = 0x0, iov_len = 0}
(gdb) p/x s->req->tag
$2 = 0x472

In QEMU's implementation of the USB mass storage device, which is
used to emulate a CDROM on the USB bus, the SCSI commands issued to
the CDROM are wrapped as the payload of the USB protocol.

In general, the USB controller processes a SCSI command in two
phases. The first phase sends the OUT USB packet that encapsulates the
SCSI command; scsi-disk handles it by emulating the SCSI operation.
The second phase receives the IN USB packet containing the output of
the SCSI operation. The SCSI request tag identifies the request across
both phases.

Since QEMU does not migrate the in-flight SCSI request, the output of
the SCSI command may be lost if live migration is initiated between
these two phases.

In our scenario, the SCSI command is GET_EVENT_STATUS_NOTIFICATION.
The QEMU log below shows how the command is handled on the source
(first phase):

usb_packet_state_change bus 0, port 2, ep 2, packet 0x559f9ba14b00, state undef 
-> setup
usb_msd_cmd_submit lun 0, tag 0x472, flags 0x0080, len 10, data-len 8

After migration, the VM crashed as soon as the destination's UHCI
controller began processing the remaining part of the SCSI request
(second phase). The destination QEMU logged the following:

usb_packet_state_change bus 0, port 2, ep 1, packet 0x56066b2fb5a0, state undef 
-> setup
usb_msd_data_in 8/8 (scsi 8)
shutting down, reason=crashed

In summary, a SCSI request lost during live migration may cause a VM
configured with a CDROM to crash.

The simple approach is to migrate the SCSI request that scsi-disk is
handling, if there is one.

Signed-off-by: Hyman Huang 
---
 hw/scsi/scsi-disk.c | 24 +++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 0985676f73..d6e9d9e8d4 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -160,6 +160,16 @@ static void scsi_disk_save_request(QEMUFile *f, SCSIRequest *req)
     }
 }
 
+static void scsi_disk_emulate_save_request(QEMUFile *f, SCSIRequest *req)
+{
+    SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
+    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+
+    if (s->migrate_emulate_scsi_request) {
+        scsi_disk_save_request(f, req);
+    }
+}
+
 static void scsi_disk_load_request(QEMUFile *f, SCSIRequest *req)
 {
     SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
@@ -183,6 +193,16 @@ static void scsi_disk_load_request(QEMUFile *f, SCSIRequest *req)
     qemu_iovec_init_external(&r->qiov, &r->iov, 1);
 }
 
+static void scsi_disk_emulate_load_request(QEMUFile *f, SCSIRequest *req)
+{
+    SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
+    SCSIDiskStat

[PATCH RFC 0/3] Support GM/T 0018-2012 cryptographic standard

2024-02-24 Thread Hyman Huang
This patchset introduces GM/T 0018-2012 as a crypto backend driver
for block encryption. Currently, only the SM4 cipher algorithm is
supported.

GM/T 0018-2012 is a cryptographic standard issued by the State
Cryptography Administration of China. Visit https://hbba.sacinfo.org.cn
and search for GM/T 0018-2012 for a brief introduction.

The objective of the standard is to develop a uniform application
interface standard for service-based cryptography devices under the
public key cryptographic infrastructure application framework, and to
call the cryptography device through this interface to provide basic
cryptographic services for the upper layer. For more information about
the contents of the standard, download the specification (titled
"Cryptography Device Application Interface Specification") from:
"https://github.com/guanzhi/GM-Standards/blob/master/GMT密码行标/
GMT 00018-2012 密码设备应用接口规范.pdf"

This brings at least two benefits:
 * Performance - using a cryptography device for block encryption
   offers an opportunity to improve I/O performance once the
   hardware is certified.
 * Secrecy - hardware manufacturers may fortify cryptography
   equipment with security features, thus increasing the secrecy
   of block encryption.

How vendors implement the standard APIs for data encryption on their
cryptographic devices is decoupled from the GM/T 0018-2012
specification. Thus, a generic implementation is possible if
developers enabling this functionality meet the following conditions:

1. Rename the header file provided by the vendor to gmt-0018-2012.h
   and copy it to the /usr/include directory.
2. Rename the dynamic library provided by the vendor to
   gmt_0018_2012.so and copy it to /usr/lib64 or any other directory
   the linker searches, before compiling QEMU.
3. Enable the crypto_gmt option when compiling QEMU to make the
   feature available.

A development package for GM/T 0018-2012 would standardize the above
provisions; unfortunately, the hardware manufacturers have not
provided one. Because the implementation depends on the cryptography
device supplied by the hardware vendor, developers who do not work
with a vendor to obtain the device and the related library may be
unable to test this functionality. As a result, we are hesitant to
submit this series as a regular patch.

That is why, as the title indicates, we are posting this series as
an RFC to gather feedback. Any suggestions and feedback regarding this
feature are welcome.

Hyman Huang (3):
  crypto: Introduce GM/T 0018-2012 cryptographic driver
  meson.build: Support GM/T 0018-2012 cryptographic standard
  crypto: Allow GM/T 0018-2012 to support SM4 cipher algorithm

 MAINTAINERS   |   3 +-
 crypto/block-luks.c   |   4 +-
 crypto/cipher-gmt.c   | 263 ++
 crypto/cipher.c   |   6 +-
 crypto/cipherpriv.h   |   6 +
 crypto/meson.build|   3 +
 meson.build   |  30 
 meson_options.txt |   2 +
 scripts/meson-buildoptions.sh |   3 +
 9 files changed, 315 insertions(+), 5 deletions(-)
 create mode 100644 crypto/cipher-gmt.c

-- 
2.39.3




[PATCH RFC 1/3] crypto: Introduce GM/T 0018-2012 cryptographic driver

2024-02-24 Thread Hyman Huang
GM/T 0018-2012 is a cryptographic standard issued by the State
Cryptography Administration of China. For more information about
the standard, visit https://hbba.sacinfo.org.cn.

The objective of the standard is to develop a uniform application
interface standard for service-based cryptography devices under the
public key cryptographic infrastructure application framework, and to
call the cryptography device through this interface to provide basic
cryptographic services for the upper layer. For more information about
the contents of the standard, download the specification (titled
"Cryptography Device Application Interface Specification") from:
"https://github.com/guanzhi/GM-Standards/blob/master/GMT密码行标/
GMT%200018-2012%20密码设备应用接口规范.pdf"

This patch implements the basic functions of the GM/T 0018-2012
standard. Currently, only the SM4 cipher algorithm is supported for
block encryption.

Signed-off-by: Hyman Huang 
---
 MAINTAINERS |   3 +-
 crypto/cipher-gmt.c | 263 
 crypto/cipher.c |   2 +
 crypto/cipherpriv.h |   6 +
 4 files changed, 273 insertions(+), 1 deletion(-)
 create mode 100644 crypto/cipher-gmt.c

diff --git a/MAINTAINERS b/MAINTAINERS
index a24c2b51b6..822726e9da 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3418,10 +3418,11 @@ F: migration/dirtyrate.c
 F: migration/dirtyrate.h
 F: include/sysemu/dirtyrate.h
 
-Detached LUKS header
+Detached LUKS header and GM/T 0018-2012 cryptography
 M: Hyman Huang 
 S: Maintained
 F: tests/qemu-iotests/tests/luks-detached-header
+F: crypto/cipher-gmt.c
 
 D-Bus
 M: Marc-André Lureau 
diff --git a/crypto/cipher-gmt.c b/crypto/cipher-gmt.c
new file mode 100644
index 00..40e32c114f
--- /dev/null
+++ b/crypto/cipher-gmt.c
@@ -0,0 +1,263 @@
+/*
+ * QEMU GM/T 0018-2012 cryptographic standard support
+ *
+ * Copyright (c) 2024 SmartX Inc
+ *
+ * Authors:
+ *    Hyman Huang 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * (at your option) any later version.  See the COPYING file in the
+ * top-level directory.
+ */
+#include 
+
+#include "qemu/osdep.h"
+#include "qemu/thread.h"
+#include "qapi/error.h"
+#include "crypto/cipher.h"
+#include "cipherpriv.h"
+
+#include "qemu/error-report.h"
+
+typedef struct QCryptoGMT QCryptoGMT;
+
+struct QCryptoGMT {
+    QCryptoCipher base;
+
+    SGD_HANDLE session;
+    SGD_HANDLE key;
+    SGD_UINT32 alg;
+    unsigned char iv[16];  /* not used for SM4 algo currently */
+};
+
+typedef struct QCryptoGMTDeviceInfo QCryptoGMTDeviceInfo;
+
+struct QCryptoGMTDeviceInfo {
+    SGD_HANDLE device;
+    struct DeviceInfo_st info;
+    bool opened;
+    gint ref_count;
+};
+/*
+ * It is advised to use numerous sessions with one open device
+ * as opposed to single sessions with several devices.
+ */
+static QCryptoGMTDeviceInfo gmt_device;
+/* Protect the gmt_device */
+static QemuMutex gmt_device_mutex;
+
+static const struct QCryptoCipherDriver qcrypto_cipher_gmt_driver;
+
+static void gmt_device_lock(void)
+{
+    qemu_mutex_lock(&gmt_device_mutex);
+}
+
+static void gmt_device_unlock(void)
+{
+    qemu_mutex_unlock(&gmt_device_mutex);
+}
+
+static void
+__attribute__((__constructor__)) gmt_device_mutex_init(void)
+{
+    qemu_mutex_init(&gmt_device_mutex);
+}
+
+static void
+gmt_device_ref(void)
+{
+    g_assert(gmt_device.device != NULL);
+    g_atomic_int_inc(&gmt_device.ref_count);
+}
+
+static void
+gmt_device_unref(void)
+{
+    g_assert(gmt_device.device != NULL);
+    if (g_atomic_int_dec_and_test(&gmt_device.ref_count)) {
+        SDF_CloseDevice(gmt_device.device);
+        gmt_device.opened = false;
+        gmt_device.device = NULL;
+        memset(&gmt_device.info, 0, sizeof(struct DeviceInfo_st));
+    }
+}
+
+static bool
+qcrypto_gmt_cipher_supports(QCryptoCipherAlgorithm alg,
+                            QCryptoCipherMode mode)
+{
+    switch (alg) {
+    case QCRYPTO_CIPHER_ALG_SM4:
+        break;
+    default:
+        return false;
+    }
+
+    switch (mode) {
+    case QCRYPTO_CIPHER_MODE_ECB:
+        return true;
+    default:
+        return false;
+    }
+}
+
+QCryptoCipher *
+qcrypto_gmt_cipher_ctx_new(QCryptoCipherAlgorithm alg,
+                           QCryptoCipherMode mode,
+                           const uint8_t *key,
+                           size_t nkey,
+                           Error **errp)
+{
+    QCryptoGMT *gmt;
+    int rv;
+
+    if (!qcrypto_gmt_cipher_supports(alg, mode)) {
+        return NULL;
+    }
+
+    gmt = g_new0(QCryptoGMT, 1);
+    if (!gmt) {
+        return NULL;
+    }
+
+    switch (alg) {
+    case QCRYPTO_CIPHER_ALG_SM4:
+        gmt->alg = SGD_SM4_ECB;
+        break;
+    default:
+        return NULL;
+    }
+
+    gmt_device_lock();
+    if (!gmt_device.opened) {
+        rv = SDF_OpenDevice(&gmt_device.device);
+        if (rv != SDR_OK) {
+            info_report("Could not open encryption card device, disabling");
+            goto abort;
+        }
+        gmt_device.opened = 

[PATCH RFC 3/3] crypto: Allow GM/T 0018-2012 to support SM4 cipher algorithm

2024-02-24 Thread Hyman Huang
Since build-time detection of GM/T 0018-2012 probes the SM4 cipher
algorithm, also enable SM4 for block encryption when
CONFIG_GMT_0018_2012 is defined.

Signed-off-by: Hyman Huang 
---
 crypto/block-luks.c | 4 ++--
 crypto/cipher.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index 3ee928fb5a..f4101fd435 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -95,7 +95,7 @@ qcrypto_block_luks_cipher_size_map_twofish[] = {
     { 0, 0 },
 };
 
-#ifdef CONFIG_CRYPTO_SM4
+#if defined CONFIG_CRYPTO_SM4 || defined CONFIG_GMT_0018_2012
 static const QCryptoBlockLUKSCipherSizeMap
 qcrypto_block_luks_cipher_size_map_sm4[] = {
     { 16, QCRYPTO_CIPHER_ALG_SM4},
@@ -109,7 +109,7 @@ qcrypto_block_luks_cipher_name_map[] = {
     { "cast5", qcrypto_block_luks_cipher_size_map_cast5 },
     { "serpent", qcrypto_block_luks_cipher_size_map_serpent },
     { "twofish", qcrypto_block_luks_cipher_size_map_twofish },
-#ifdef CONFIG_CRYPTO_SM4
+#if defined CONFIG_CRYPTO_SM4 || defined CONFIG_GMT_0018_2012
     { "sm4", qcrypto_block_luks_cipher_size_map_sm4},
 #endif
 };
diff --git a/crypto/cipher.c b/crypto/cipher.c
index 785f231948..5c2a620dcf 100644
--- a/crypto/cipher.c
+++ b/crypto/cipher.c
@@ -38,7 +38,7 @@ static const size_t alg_key_len[QCRYPTO_CIPHER_ALG__MAX] = {
 [QCRYPTO_CIPHER_ALG_TWOFISH_128] = 16,
 [QCRYPTO_CIPHER_ALG_TWOFISH_192] = 24,
 [QCRYPTO_CIPHER_ALG_TWOFISH_256] = 32,
-#ifdef CONFIG_CRYPTO_SM4
+#if defined CONFIG_CRYPTO_SM4 || defined CONFIG_GMT_0018_2012
 [QCRYPTO_CIPHER_ALG_SM4] = 16,
 #endif
 };
@@ -56,7 +56,7 @@ static const size_t alg_block_len[QCRYPTO_CIPHER_ALG__MAX] = {
 [QCRYPTO_CIPHER_ALG_TWOFISH_128] = 16,
 [QCRYPTO_CIPHER_ALG_TWOFISH_192] = 16,
 [QCRYPTO_CIPHER_ALG_TWOFISH_256] = 16,
-#ifdef CONFIG_CRYPTO_SM4
+#if defined CONFIG_CRYPTO_SM4 || defined CONFIG_GMT_0018_2012
 [QCRYPTO_CIPHER_ALG_SM4] = 16,
 #endif
 };
-- 
2.39.3
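For readers unfamiliar with the two-level tables touched above, the following standalone sketch shows how a LUKS cipher name plus key length could be resolved to an algorithm once the "sm4" entry is compiled in. The types, enum values and lookup_cipher() are hypothetical simplifications for illustration only, not the code in crypto/block-luks.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for QCryptoCipherAlgorithm values. */
typedef enum {
    ALG_NONE = 0,
    ALG_AES_128,
    ALG_AES_256,
    ALG_SM4,
} CipherAlg;

/* Hypothetical simplified QCryptoBlockLUKSCipherSizeMap/NameMap. */
typedef struct {
    size_t key_bytes;
    CipherAlg alg;
} CipherSizeMap;

typedef struct {
    const char *name;
    const CipherSizeMap *sizes;
} CipherNameMap;

static const CipherSizeMap size_map_aes[] = {
    { 16, ALG_AES_128 },
    { 32, ALG_AES_256 },
    { 0, ALG_NONE },            /* terminator, as in the tables above */
};

static const CipherSizeMap size_map_sm4[] = {
    { 16, ALG_SM4 },
    { 0, ALG_NONE },
};

static const CipherNameMap name_map[] = {
    { "aes", size_map_aes },
    { "sm4", size_map_sm4 },    /* only present when a backend provides SM4 */
};

/* Map a cipher name and key length to an algorithm; ALG_NONE if unsupported. */
static CipherAlg lookup_cipher(const char *name, size_t key_bytes)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(name_map) / sizeof(name_map[0]); i++) {
        if (strcmp(name_map[i].name, name) != 0) {
            continue;
        }
        /* Walk the size table until the { 0, ALG_NONE } terminator. */
        for (const CipherSizeMap *m = name_map[i].sizes; m->key_bytes; m++) {
            if (m->key_bytes == key_bytes) {
                return m->alg;
            }
        }
        return ALG_NONE;
    }
    return ALG_NONE;
}
```

With this shape, lookup_cipher("sm4", 16) yields ALG_SM4 only when the "sm4" row exists, which is what the widened #if guards control.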




[PATCH RFC 2/3] meson.build: Support GM/T 0018-2012 cryptographic standard

2024-02-24 Thread Hyman Huang
GM/T 0018-2012 is a cryptographic standard issued by the State
Cryptography Administration of China.

An implementation of the standard can provide symmetric cipher
algorithms for block encryption. Currently only the SM4 cipher
algorithm is used, so detect SM4 support via the GM/T 0018-2012
API and enable the feature only if crypto-gmt is given
explicitly. The feature is disabled by default.
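As a reading aid for the meson.build hunk in this patch, here is a rough C model of how the crypto_gmt option is meant to resolve. It assumes meson's usual feature semantics (a 'disabled' feature never searches, 'enabled' turns a missing or unlinkable library into an error, 'auto' degrades to a warning) and uses hypothetical names throughout; it is a sketch, not meson code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a meson 'feature' option value. */
typedef enum { FEATURE_AUTO, FEATURE_ENABLED, FEATURE_DISABLED } Feature;

typedef enum { DEP_NOT_FOUND, DEP_FOUND, DEP_ERROR } DepResult;

/*
 * lib_present: find_library() would locate the library and its header.
 * links:       the SDF_Encrypt(..., SGD_SM4_ECB, ...) test program links.
 */
static DepResult resolve_gmt(Feature gmt, Feature afalg, bool have_system,
                             bool lib_present, bool links)
{
    if (gmt == FEATURE_ENABLED && afalg == FEATURE_ENABLED) {
        return DEP_ERROR;               /* the two backends are exclusive */
    }
    if (gmt == FEATURE_AUTO && !have_system) {
        return DEP_NOT_FOUND;           /* probe skipped entirely */
    }
    if (gmt == FEATURE_DISABLED) {
        return DEP_NOT_FOUND;           /* required: disabled never searches */
    }
    if (!lib_present) {
        return gmt == FEATURE_ENABLED ? DEP_ERROR : DEP_NOT_FOUND;
    }
    if (!links) {
        /* enabled: error(); auto: warning() and fall back to not_found */
        return gmt == FEATURE_ENABLED ? DEP_ERROR : DEP_NOT_FOUND;
    }
    return DEP_FOUND;
}
```

Because meson_options.txt sets the default to 'disabled', only an explicit --enable-crypto-gmt reaches the probing branches in practice.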

Signed-off-by: Hyman Huang 
---
 crypto/meson.build|  3 +++
 meson.build   | 30 ++
 meson_options.txt |  2 ++
 scripts/meson-buildoptions.sh |  3 +++
 4 files changed, 38 insertions(+)

diff --git a/crypto/meson.build b/crypto/meson.build
index c46f9c22a7..dd49d03780 100644
--- a/crypto/meson.build
+++ b/crypto/meson.build
@@ -46,6 +46,9 @@ endif
 if have_afalg
   crypto_ss.add(if_true: files('afalg.c', 'cipher-afalg.c', 'hash-afalg.c'))
 endif
+if gmt_0018_2012.found()
+  crypto_ss.add(gmt_0018_2012, files('cipher-gmt.c'))
+endif
 
 system_ss.add(when: gnutls, if_true: files('tls-cipher-suites.c'))
 
diff --git a/meson.build b/meson.build
index c1dc83e4c0..cd188582b5 100644
--- a/meson.build
+++ b/meson.build
@@ -1693,6 +1693,34 @@ if not gnutls_crypto.found()
   endif
 endif
 
+if get_option('crypto_gmt').enabled() and get_option('crypto_afalg').enabled()
+  error('Only one of GM/T 0018-2012 & afalg can be enabled')
+endif
+
+gmt_0018_2012 = not_found
+if (not get_option('crypto_gmt').auto() or have_system)
+  gmt_0018_2012 = cc.find_library('gmt_0018_2012', has_headers: ['gmt-0018-2012.h'],
+  required: get_option('crypto_gmt'))
+  if gmt_0018_2012.found() and not cc.links('''
+#include 
+#include 
+int main(void) {
+  unsigned char iv[16] = {0};
+  unsigned char plainData[16] = {0};
+  unsigned char cipherData[16] = {0};
+  unsigned int rlen;
+  SDF_Encrypt(NULL, NULL, SGD_SM4_ECB, iv, plainData, 16, cipherData, &rlen);
+  return 0;
+}''', dependencies: gmt_0018_2012)
+gmt_0018_2012 = not_found
+if get_option('crypto_gmt').enabled()
+  error('could not link gmt_0018_2012')
+else
+  warning('could not link gmt_0018_2012, disabling')
+endif
+  endif
+endif
+
 capstone = not_found
 if not get_option('capstone').auto() or have_system or have_user
   capstone = dependency('capstone', version: '>=3.0.5',
@@ -2291,6 +2319,7 @@ config_host_data.set('CONFIG_GNUTLS_CRYPTO', gnutls_crypto.found())
 config_host_data.set('CONFIG_TASN1', tasn1.found())
 config_host_data.set('CONFIG_GCRYPT', gcrypt.found())
 config_host_data.set('CONFIG_NETTLE', nettle.found())
+config_host_data.set('CONFIG_GMT_0018_2012', gmt_0018_2012.found())
 config_host_data.set('CONFIG_CRYPTO_SM4', crypto_sm4.found())
 config_host_data.set('CONFIG_HOGWEED', hogweed.found())
 config_host_data.set('CONFIG_QEMU_PRIVATE_XTS', xts == 'private')
@@ -4333,6 +4362,7 @@ if nettle.found()
 endif
 summary_info += {'SM4 ALG support':   crypto_sm4}
 summary_info += {'AF_ALG support':have_afalg}
+summary_info += {'GM/T 0018-2012 support': gmt_0018_2012.found()}
 summary_info += {'rng-none':  get_option('rng_none')}
 summary_info += {'Linux keyring': have_keyring}
 summary_info += {'Linux keyutils':keyutils}
diff --git a/meson_options.txt b/meson_options.txt
index 0a99a059ec..4f35d3d62d 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -174,6 +174,8 @@ option('gcrypt', type : 'feature', value : 'auto',
description: 'libgcrypt cryptography support')
 option('crypto_afalg', type : 'feature', value : 'disabled',
description: 'Linux AF_ALG crypto backend driver')
+option('crypto_gmt', type : 'feature', value : 'disabled',
+   description: 'GM/T 0018-2012 cryptographic standard driver')
 option('libdaxctl', type : 'feature', value : 'auto',
description: 'libdaxctl support')
 option('libpmem', type : 'feature', value : 'auto',
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 680fa3f581..e116e7b9ed 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -106,6 +106,7 @@ meson_options_help() {
   printf "%s\n" '  colo-proxy  colo-proxy support'
   printf "%s\n" '  coreaudio   CoreAudio sound support'
   printf "%s\n" '  crypto-afalgLinux AF_ALG crypto backend driver'
+  printf "%s\n" '  crypto-gmt  GM/T 0018-2012 crypto backend driver'
   printf "%s\n" '  curlCURL block device driver'
   printf "%s\n" '  curses  curses UI'
   printf "%s\n" '  dbus-display-display dbus support'
@@ -282,6 +283,8 @@ _meson_option_parse() {
 --disable-coroutine-pool) printf "%s" -Dcoroutine_pool=false ;;
 --enable-crypto-afalg) printf "%s" -Dcrypto_afalg=enabled ;;
 --disable-crypto-afalg) printf "%s" -Dcrypto_afalg=disabled ;;
+--enable-crypto-gmt) printf "%s"

[PATCH v4 1/3] qmp: Switch x-query-virtio-status back to numeric encoding

2024-02-21 Thread Hyman Huang
x-query-virtio-status returns several sets of virtio feature and
status flags.  It goes back to v7.2.0.

In the initial commit 90c066cd682 (qmp: add QMP command
x-query-virtio-status), we returned them as numbers, using virtio's
well-known binary encoding.

The next commit f3034ad71fc (qmp: decode feature & status bits in
virtio-status) replaced the numbers by objects.  The objects represent
bits QEMU knows symbolically, and any unknown bits numerically just like
before.

Commit 8a8287981d1 (hmp: add virtio commands) added the matching HMP
command "info virtio" (and a few more, which aren't relevant here).

The symbolic representation uses lists of strings.  The string format is
undocumented.  The strings look like "WELL_KNOWN_SYMBOL: human readable
explanation".

This symbolic representation is nice for humans.  For machines, it can
save them the trouble of decoding virtio's well-known binary encoding.

However, we sometimes want to compare features and status bits without
caring for their exact meaning.  Say we want to verify the correctness
of the virtio negotiation between guest, QEMU, and OVS-DPDK.  We can use
QMP command x-query-virtio-status to retrieve vhost-user net device
features, and the "ovs-vsctl list interface" command to retrieve
interface features.  Without commit f3034ad71fc, we could then simply
compare the numbers.  With this commit, we first have to map from the
strings back to the numeric encoding.

Revert the decoding for QMP, but keep it for HMP.

This makes the QMP command easier to use for use cases where we
don't need to decode, like the comparison above.  For use cases
where we need to decode, we replace parsing undocumented strings by
decoding virtio's well-known binary encoding.

Incompatible change; acceptable because x-query-virtio-status
comes without a stability promise.
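To illustrate the comparison use case, here is a minimal sketch that assumes both feature sets have already been fetched as 64-bit integers (how they are obtained from QMP and from ovs-vsctl is out of scope here). report_feature_mismatch() is a hypothetical helper: with numeric encoding, a single XOR exposes every differing bit, with no string parsing involved.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Print each feature bit that is set on only one side and return the
 * number of mismatching bits.
 */
static int report_feature_mismatch(uint64_t qemu_bits, uint64_t ovs_bits)
{
    uint64_t diff = qemu_bits ^ ovs_bits;   /* 1 wherever the sides disagree */
    int mismatches = 0;

    for (int bit = 0; bit < 64; bit++) {
        if (diff & (UINT64_C(1) << bit)) {
            printf("bit %d: %s only\n", bit,
                   (qemu_bits >> bit) & 1 ? "QEMU" : "OVS");
            mismatches++;
        }
    }
    return mismatches;
}
```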

Signed-off-by: Hyman Huang 
Acked-by: Markus Armbruster 
---
 hw/virtio/virtio-hmp-cmds.c |  25 +++--
 hw/virtio/virtio-qmp.c  |  23 ++---
 qapi/virtio.json| 192 
 3 files changed, 45 insertions(+), 195 deletions(-)

diff --git a/hw/virtio/virtio-hmp-cmds.c b/hw/virtio/virtio-hmp-cmds.c
index 477c97dea2..721c630ab0 100644
--- a/hw/virtio/virtio-hmp-cmds.c
+++ b/hw/virtio/virtio-hmp-cmds.c
@@ -6,6 +6,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "virtio-qmp.h"
 #include "monitor/hmp.h"
 #include "monitor/monitor.h"
 #include "qapi/qapi-commands-virtio.h"
@@ -145,13 +146,17 @@ void hmp_virtio_status(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, "  endianness:  %s\n",
s->device_endian);
 monitor_printf(mon, "  status:\n");
-hmp_virtio_dump_status(mon, s->status);
+hmp_virtio_dump_status(mon,
+qmp_decode_status(s->status));
 monitor_printf(mon, "  Guest features:\n");
-hmp_virtio_dump_features(mon, s->guest_features);
+hmp_virtio_dump_features(mon,
+qmp_decode_features(s->device_id, s->guest_features));
 monitor_printf(mon, "  Host features:\n");
-hmp_virtio_dump_features(mon, s->host_features);
+hmp_virtio_dump_features(mon,
+qmp_decode_features(s->device_id, s->host_features));
 monitor_printf(mon, "  Backend features:\n");
-hmp_virtio_dump_features(mon, s->backend_features);
+hmp_virtio_dump_features(mon,
+qmp_decode_features(s->device_id, s->backend_features));
 
 if (s->vhost_dev) {
 monitor_printf(mon, "  VHost:\n");
@@ -172,13 +177,17 @@ void hmp_virtio_status(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, "log_size:   %"PRId64"\n",
s->vhost_dev->log_size);
 monitor_printf(mon, "Features:\n");
-hmp_virtio_dump_features(mon, s->vhost_dev->features);
+hmp_virtio_dump_features(mon,
+qmp_decode_features(s->device_id, s->vhost_dev->features));
 monitor_printf(mon, "Acked features:\n");
-hmp_virtio_dump_features(mon, s->vhost_dev->acked_features);
+hmp_virtio_dump_features(mon,
+qmp_decode_features(s->device_id, s->vhost_dev->acked_features));
 monitor_printf(mon, "Backend features:\n");
-hmp_virtio_dump_features(mon, s->vhost_dev->backend_features);
+hmp_virtio_dump_features(mon,
+qmp_decode_features(s->device_id, s->vhost_dev->backend_features));
 monitor_printf(mon, "Protocol features:\n");
-hmp_virtio_dump_protocols(mon, s->vhost_dev->protocol_features);
+hmp_virtio_dump_protocols(mon,
+qmp_decode_protocols(s->vhost_dev->protocol_features));
 }
 
 qapi_free_VirtioStatus(s);
diff --git a/hw/virtio/virtio-qmp.c b/

[PATCH v4 2/3] virtio: Declare the decoding functions to static

2024-02-21 Thread Hyman Huang
qmp_decode_protocols(), qmp_decode_status(), and qmp_decode_features()
are now only used in virtio-hmp-cmds.c.  So move them there,
declare them static, and replace the qmp_ prefix with hmp_.

Signed-off-by: Hyman Huang 
---
 hw/virtio/meson.build   |   4 +-
 hw/virtio/virtio-hmp-cmds.c | 677 +++-
 hw/virtio/virtio-qmp.c  | 661 ---
 hw/virtio/virtio-qmp.h  |   3 -
 4 files changed, 671 insertions(+), 674 deletions(-)

diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
index d7f18c96e6..384fbf7e32 100644
--- a/hw/virtio/meson.build
+++ b/hw/virtio/meson.build
@@ -9,7 +9,7 @@ system_virtio_ss.add(when: 'CONFIG_VHOST_VDPA_DEV', if_true: files('vdpa-dev.c')
 
 specific_virtio_ss = ss.source_set()
 specific_virtio_ss.add(files('virtio.c'))
-specific_virtio_ss.add(files('virtio-config-io.c', 'virtio-qmp.c'))
+specific_virtio_ss.add(files('virtio-config-io.c', 'virtio-hmp-cmds.c'))
 
 if have_vhost
   system_virtio_ss.add(files('vhost.c'))
@@ -87,7 +87,7 @@ specific_virtio_ss.add_all(when: 'CONFIG_VIRTIO_PCI', if_true: virtio_pci_ss)
 system_ss.add_all(when: 'CONFIG_VIRTIO', if_true: system_virtio_ss)
 system_ss.add(when: 'CONFIG_VIRTIO', if_false: files('vhost-stub.c'))
 system_ss.add(when: 'CONFIG_VIRTIO', if_false: files('virtio-stub.c'))
-system_ss.add(files('virtio-hmp-cmds.c'))
+system_ss.add(files('virtio-qmp.c'))
 
 specific_ss.add_all(when: 'CONFIG_VIRTIO', if_true: specific_virtio_ss)
 system_ss.add(when: 'CONFIG_ACPI', if_true: files('virtio-acpi.c'))
diff --git a/hw/virtio/virtio-hmp-cmds.c b/hw/virtio/virtio-hmp-cmds.c
index 721c630ab0..f95bad0069 100644
--- a/hw/virtio/virtio-hmp-cmds.c
+++ b/hw/virtio/virtio-hmp-cmds.c
@@ -11,7 +11,668 @@
 #include "monitor/monitor.h"
 #include "qapi/qapi-commands-virtio.h"
 #include "qapi/qmp/qdict.h"
+#include "hw/virtio/vhost-user.h"
 
+#include "standard-headers/linux/virtio_ids.h"
+#include "standard-headers/linux/vhost_types.h"
+#include "standard-headers/linux/virtio_blk.h"
+#include "standard-headers/linux/virtio_console.h"
+#include "standard-headers/linux/virtio_gpu.h"
+#include "standard-headers/linux/virtio_net.h"
+#include "standard-headers/linux/virtio_scsi.h"
+#include "standard-headers/linux/virtio_i2c.h"
+#include "standard-headers/linux/virtio_balloon.h"
+#include "standard-headers/linux/virtio_iommu.h"
+#include "standard-headers/linux/virtio_mem.h"
+#include "standard-headers/linux/virtio_vsock.h"
+#include "standard-headers/linux/virtio_gpio.h"
+
+#include CONFIG_DEVICES
+
+#define FEATURE_ENTRY(name, desc) (qmp_virtio_feature_map_t) \
+{ .virtio_bit = name, .feature_desc = desc }
+
+/* Virtio transport features mapping */
+static const qmp_virtio_feature_map_t virtio_transport_map[] = {
+/* Virtio device transport features */
+#ifndef VIRTIO_CONFIG_NO_LEGACY
+FEATURE_ENTRY(VIRTIO_F_NOTIFY_ON_EMPTY, \
+"VIRTIO_F_NOTIFY_ON_EMPTY: Notify when device runs out of avail. "
+"descs. on VQ"),
+FEATURE_ENTRY(VIRTIO_F_ANY_LAYOUT, \
+"VIRTIO_F_ANY_LAYOUT: Device accepts arbitrary desc. layouts"),
+#endif /* !VIRTIO_CONFIG_NO_LEGACY */
+FEATURE_ENTRY(VIRTIO_F_VERSION_1, \
+"VIRTIO_F_VERSION_1: Device compliant for v1 spec (legacy)"),
+FEATURE_ENTRY(VIRTIO_F_IOMMU_PLATFORM, \
+"VIRTIO_F_IOMMU_PLATFORM: Device can be used on IOMMU platform"),
+FEATURE_ENTRY(VIRTIO_F_RING_PACKED, \
+"VIRTIO_F_RING_PACKED: Device supports packed VQ layout"),
+FEATURE_ENTRY(VIRTIO_F_IN_ORDER, \
+"VIRTIO_F_IN_ORDER: Device uses buffers in same order as made "
+"available by driver"),
+FEATURE_ENTRY(VIRTIO_F_ORDER_PLATFORM, \
+"VIRTIO_F_ORDER_PLATFORM: Memory accesses ordered by platform"),
+FEATURE_ENTRY(VIRTIO_F_SR_IOV, \
+"VIRTIO_F_SR_IOV: Device supports single root I/O virtualization"),
+FEATURE_ENTRY(VIRTIO_F_RING_RESET, \
+"VIRTIO_F_RING_RESET: Driver can reset a queue individually"),
+/* Virtio ring transport features */
+FEATURE_ENTRY(VIRTIO_RING_F_INDIRECT_DESC, \
+"VIRTIO_RING_F_INDIRECT_DESC: Indirect descriptors supported"),
+FEATURE_ENTRY(VIRTIO_RING_F_EVENT_IDX, \
+"VIRTIO_RING_F_EVENT_IDX: Used & avail. event fields enabled"),
+{ -1, "" }
+};
+
+/* Vhost-user protocol features mapping */
+static const qmp_virtio_feature_map_t vhost_user_protocol_map[] = {
+FEATURE_ENTRY(VHOST_USER_PROTOCOL_F_MQ, \
+"VHOST_USER_PROTOCOL_F_MQ: Multiqueue protocol supported"),
+FEATURE_ENTRY(V

[PATCH v4 3/3] qapi: Define VhostDeviceProtocols and VirtioDeviceFeatures as plain C types

2024-02-21 Thread Hyman Huang
VhostDeviceProtocols and VirtioDeviceFeatures are only used in
virtio-hmp-cmds.c.  So define them as plain C types there, and drop
them from the QAPI schema.
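A rough sketch of the kind of decoding that now stays on the HMP side: walk a bit-to-description table terminated by { -1, "" }, collect the names of known bits, and keep any leftover bits as "unknown". FeatureMapEntry, DecodedBits and decode_bits() are hypothetical simplifications (a fixed array instead of a strList), not the functions in virtio-hmp-cmds.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for qmp_virtio_feature_map_t. */
typedef struct {
    int bit;
    const char *desc;
} FeatureMapEntry;

/* Shaped like VhostDeviceProtocols, with a fixed array for the names. */
typedef struct {
    const char *names[64];
    int count;
    bool has_unknown;
    uint64_t unknown;
} DecodedBits;

static void decode_bits(const FeatureMapEntry *map, uint64_t bitmap,
                        DecodedBits *out)
{
    memset(out, 0, sizeof(*out));
    /* The table ends with a { -1, "" } sentinel. */
    for (; map->bit != -1; map++) {
        uint64_t mask = UINT64_C(1) << map->bit;
        if (bitmap & mask) {
            assert(out->count < 64);
            out->names[out->count++] = map->desc;
            bitmap &= ~mask;            /* clear bits we could name */
        }
    }
    out->unknown = bitmap;              /* whatever remains is undecoded */
    out->has_unknown = bitmap != 0;
}
```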

Signed-off-by: Hyman Huang 
Reviewed-by: Markus Armbruster 
---
 hw/virtio/virtio-hmp-cmds.c | 16 +++
 qapi/virtio.json| 39 -
 2 files changed, 16 insertions(+), 39 deletions(-)

diff --git a/hw/virtio/virtio-hmp-cmds.c b/hw/virtio/virtio-hmp-cmds.c
index f95bad0069..045b472228 100644
--- a/hw/virtio/virtio-hmp-cmds.c
+++ b/hw/virtio/virtio-hmp-cmds.c
@@ -29,6 +29,22 @@
 
 #include CONFIG_DEVICES
 
+typedef struct VhostDeviceProtocols VhostDeviceProtocols;
+struct VhostDeviceProtocols {
+strList *protocols;
+bool has_unknown_protocols;
+uint64_t unknown_protocols;
+};
+
+typedef struct VirtioDeviceFeatures VirtioDeviceFeatures;
+struct VirtioDeviceFeatures {
+strList *transports;
+bool has_dev_features;
+strList *dev_features;
+bool has_unknown_dev_features;
+uint64_t unknown_dev_features;
+};
+
 #define FEATURE_ENTRY(name, desc) (qmp_virtio_feature_map_t) \
 { .virtio_bit = name, .feature_desc = desc }
 
diff --git a/qapi/virtio.json b/qapi/virtio.json
index 26516fb29c..42dbc87f2f 100644
--- a/qapi/virtio.json
+++ b/qapi/virtio.json
@@ -300,45 +300,6 @@
   'data': { 'statuses': [ 'str' ],
 '*unknown-statuses': 'uint8' } }
 
-##
-# @VhostDeviceProtocols:
-#
-# A structure defined to list the vhost user protocol features of a
-# Vhost User device
-#
-# @protocols: List of decoded vhost user protocol features of a vhost
-# user device
-#
-# @unknown-protocols: Vhost user device protocol features bitmap that
-# have not been decoded
-#
-# Since: 7.2
-##
-{ 'struct': 'VhostDeviceProtocols',
-  'data': { 'protocols': [ 'str' ],
-'*unknown-protocols': 'uint64' } }
-
-##
-# @VirtioDeviceFeatures:
-#
-# The common fields that apply to most Virtio devices.  Some devices
-# may not have their own device-specific features (e.g. virtio-rng).
-#
-# @transports: List of transport features of the virtio device
-#
-# @dev-features: List of device-specific features (if the device has
-# unique features)
-#
-# @unknown-dev-features: Virtio device features bitmap that have not
-# been decoded
-#
-# Since: 7.2
-##
-{ 'struct': 'VirtioDeviceFeatures',
-  'data': { 'transports': [ 'str' ],
-'*dev-features': [ 'str' ],
-'*unknown-dev-features': 'uint64' } }
-
 ##
 # @VirtQueueStatus:
 #
-- 
2.39.3




[PATCH v4 0/3] Adjust the output of x-query-virtio-status

2024-02-21 Thread Hyman Huang
v4:
- Rebase on master
- Fix the syntax mistake within the commit message of [PATCH v3 1/3]
- Adjust the linking file in hw/virtio/meson.build suggested by Markus

Please review,
Yong

v3:
- Rebase on master
- Use the refined commit message furnished by Markus for [PATCH v2 1/2]
- Drop the [PATCH v2 2/2]
- Add [PATCH v3 2/3] to declare the decoding functions to static
- Add [PATCH v3 3/3] to Define VhostDeviceProtocols and
  VirtioDeviceFeatures as plain C types

v2:
- Change the hmp_virtio_dump_xxx function signatures to implement
  the bitmap decoding, as suggested by Philippe.

This patchset is derived from the series:
https://lore.kernel.org/qemu-devel/cover.1699793550.git.yong.hu...@smartx.com/
Please go to the link to see more background information.

The following points are what we have done in the patchset:
1. Adopt the policy of adding human-readable output only in HMP.
2. For the HMP output, display the human-readable information and
   drop the unknown bits in practice.
3. For the QMP output, remove the descriptive strings and only
   display bits encoded as numbers.

Hyman Huang (3):
  qmp: Switch x-query-virtio-status back to numeric encoding
  virtio: Declare the decoding functions to static
  qapi: Define VhostDeviceProtocols and VirtioDeviceFeatures as plain C
types

 hw/virtio/meson.build   |   4 +-
 hw/virtio/virtio-hmp-cmds.c | 702 +++-
 hw/virtio/virtio-qmp.c  | 684 +--
 hw/virtio/virtio-qmp.h  |   3 -
 qapi/virtio.json| 231 +---
 5 files changed, 724 insertions(+), 900 deletions(-)

-- 
2.39.3




[PATCH] qapi: Craft the BlockdevCreateOptionsLUKS comment

2024-02-20 Thread Hyman Huang
Add a detailed comment describing the behavior introduced by commit
433957bb7f (qapi: Make parameter 'file' optional for
BlockdevCreateOptionsLUKS).

Signed-off-by: Hyman Huang 
---
 qapi/block-core.json | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index ab5a93a966..42b0840d43 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4973,7 +4973,25 @@
 ##
 # @BlockdevCreateOptionsLUKS:
 #
-# Driver specific image creation options for LUKS.
+# Driver specific image creation options for LUKS.  Note that
+# @file is required if @preallocation is specified with a mode
+# other than 'off'.  The following three scenarios determine how
+# the creation logic behaves when @preallocation is 'off' or is
+# not given:
+#
+# 1) When only @file is given, format the block device referenced
+#    by @file according to the LUKS specification and truncate it
+#    to @size.  In this case, @size reflects the amount of space
+#    made available to the guest, so the truncated size must also
+#    account for the space used by the crypto header.
+#
+# 2) When only @header is given, just format the block device
+#    referenced by @header according to the LUKS specification.
+#
+# 3) When both @file and @header are given, truncate the block
+#    device referenced by @file to @size, and format the block
+#    device referenced by @header according to the LUKS
+#    specification.
 #
 # @file: Node to create the image format on, mandatory except when
 #'preallocation' is not requested
-- 
2.39.3
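The three scenarios in the comment above can be summarized as a small decision function. This is a sketch under stated assumptions: it only covers @preallocation 'off' or absent, header_len stands in for the space QEMU computes for the LUKS header and key material, and treating "neither @file nor @header" as invalid is an assumption. LuksCreatePlan and luks_create_plan() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool format_file;      /* write LUKS header + key material into @file */
    bool format_header;    /* write LUKS header + key material into @header */
    bool truncate_file;    /* resize @file */
    uint64_t file_size;    /* target size of @file when truncate_file */
} LuksCreatePlan;

/* Returns false for the (assumed invalid) case where no node is given. */
static bool luks_create_plan(bool has_file, bool has_header, uint64_t size,
                             uint64_t header_len, LuksCreatePlan *plan)
{
    *plan = (LuksCreatePlan){ 0 };
    if (has_file && !has_header) {
        /* 1) attached header: guest-visible @size plus header space */
        plan->format_file = true;
        plan->truncate_file = true;
        plan->file_size = header_len + size;
    } else if (!has_file && has_header) {
        /* 2) header only: nothing to truncate */
        plan->format_header = true;
    } else if (has_file && has_header) {
        /* 3) detached header: the payload gets exactly @size */
        plan->format_header = true;
        plan->truncate_file = true;
        plan->file_size = size;
    } else {
        return false;
    }
    return true;
}
```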




[PATCH] docs/devel: Add introduction to LUKS volume with detached header

2024-02-19 Thread Hyman Huang
Signed-off-by: Hyman Huang 
---
 MAINTAINERS |   1 +
 docs/devel/luks-detached-header.rst | 182 
 2 files changed, 183 insertions(+)
 create mode 100644 docs/devel/luks-detached-header.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index a24c2b51b6..e8b03032ab 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3422,6 +3422,7 @@ Detached LUKS header
 M: Hyman Huang 
 S: Maintained
 F: tests/qemu-iotests/tests/luks-detached-header
+F: docs/devel/luks-detached-header.rst
 
 D-Bus
 M: Marc-André Lureau 
diff --git a/docs/devel/luks-detached-header.rst b/docs/devel/luks-detached-header.rst
new file mode 100644
index 00..15e9ccde1d
--- /dev/null
+++ b/docs/devel/luks-detached-header.rst
@@ -0,0 +1,182 @@
+================================
+LUKS volume with detached header
+================================
+
+Introduction
+============
+
+This document gives an overview of the design of LUKS volume with detached
+header and how to use it.
+
+Background
+==========
+
+The LUKS format has the ability to store the header in a separate volume from
+the payload. We could extend the LUKS driver in QEMU to support this use
+case.
+
+Normally, a LUKS volume has the following layout:
+
+::
+
+          +----------------------------------------------+
+          |         |                |                   |
+  disk    | header  |  key material  | disk payload data |
+          |         |                |                   |
+          +----------------------------------------------+
+
+With a detached LUKS header, you need two disks:
+
+::
+
+          +-----------------------------+
+  disk1   |   header   | key material   |
+          +-----------------------------+
+          +---------------------+
+  disk2   |  disk payload data  |
+          +---------------------+
+
+There are a variety of benefits to doing this:
+
+ * Secrecy - disk2 cannot be identified as containing a LUKS
+   volume, since it has no header
+ * Control - if access to disk1 is restricted, then even someone
+   who has access to disk2 cannot unlock it. This might be useful
+   if you keep disks on NFS but want to restrict which host can
+   launch a VM instance from them, by dynamically providing access
+   to the header only to a designated host
+ * Flexibility - your application data volume may be a given size,
+   and it is inconvenient to resize it to add encryption. You can
+   store the LUKS header separately and use the existing storage
+   volume for the payload
+ * Recovery - corruption of a bit in the header may make the
+   entire payload inaccessible. It might be convenient to take
+   backups of the header. If your primary disk header becomes
+   corrupt, you can still unlock the data by pointing to the
+   backup detached header
+
+Architecture
+============
+
+Take qcow2 encryption as an example. The architecture of the
+LUKS volume with detached header is shown in the diagram below.
+
+There are two children of the root node: a file and a header.
+Data from the disk payload is stored in the file node. The
+LUKS header and key material are located in the header node,
+as previously mentioned.
+
+::
+
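+                      +-----------------------------------------+
+  Root node           |                foo[luks]                |
+                      +-----------------------------------------+
+                            |                             |
+                       file |                      header |
+                            |                             |
+                +------------------------+    +-----------------------+
+  Child node    | payload-format[qcow2]  |    |  header-format[raw]   |
+                +------------------------+    +-----------------------+
+                            |                             |
+                       file |                        file |
+                            |                             |
+                +------------------------+    +-----------------------+
+  Child node    | payload-protocol[file] |    | header-protocol[file] |
+                +------------------------+    +-----------------------+
+                            |                             |
+                            |                             |
+                            |                             |
+                      Host storage                  Host storage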
+
+Usage
+=====
+
+Create a LUKS disk with a detached header using qemu-img
+--------------------------------------------------------
+
+Shell commandline::
+
+# qemu-img create --object secret,id=sec0,data=abc123 -f luks \
+> -o cipher-alg=aes-256,cipher-mode=xts -o key-secret=sec0 \
+> -o detached-header=true test-header.img
+# qemu-img create -f qcow2 test-payload.qcow2 200G
+# qemu-img info 'json:{"driver":"luks","file":{"filename": \
+> "test-payload.img"},"header":{"filename

[PATCH v3 2/3] virtio: Declare the decoding functions to static

2024-02-02 Thread Hyman Huang
qmp_decode_protocols(), qmp_decode_status(), and qmp_decode_features()
are now only used in virtio-hmp-cmds.c.  So move them there,
declare them static, and replace the qmp_ prefix with hmp_.

Signed-off-by: Hyman Huang 
---
 hw/virtio/meson.build   |   3 +-
 hw/virtio/virtio-hmp-cmds.c | 677 +++-
 hw/virtio/virtio-qmp.c  | 661 ---
 hw/virtio/virtio-qmp.h  |   3 -
 4 files changed, 670 insertions(+), 674 deletions(-)

diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
index 47baf00366..6665669480 100644
--- a/hw/virtio/meson.build
+++ b/hw/virtio/meson.build
@@ -9,7 +9,7 @@ system_virtio_ss.add(when: 'CONFIG_VHOST_VDPA_DEV', if_true: files('vdpa-dev.c')
 
 specific_virtio_ss = ss.source_set()
 specific_virtio_ss.add(files('virtio.c'))
-specific_virtio_ss.add(files('virtio-config-io.c', 'virtio-qmp.c'))
+specific_virtio_ss.add(files('virtio-config-io.c', 'virtio-qmp.c', 'virtio-hmp-cmds.c'))
 
 if have_vhost
   system_virtio_ss.add(files('vhost.c'))
@@ -74,7 +74,6 @@ specific_virtio_ss.add_all(when: 'CONFIG_VIRTIO_PCI', if_true: virtio_pci_ss)
 system_ss.add_all(when: 'CONFIG_VIRTIO', if_true: system_virtio_ss)
 system_ss.add(when: 'CONFIG_VIRTIO', if_false: files('vhost-stub.c'))
 system_ss.add(when: 'CONFIG_VIRTIO', if_false: files('virtio-stub.c'))
-system_ss.add(files('virtio-hmp-cmds.c'))
 
 specific_ss.add_all(when: 'CONFIG_VIRTIO', if_true: specific_virtio_ss)
 system_ss.add(when: 'CONFIG_ACPI', if_true: files('virtio-acpi.c'))
diff --git a/hw/virtio/virtio-hmp-cmds.c b/hw/virtio/virtio-hmp-cmds.c
index 721c630ab0..f95bad0069 100644
--- a/hw/virtio/virtio-hmp-cmds.c
+++ b/hw/virtio/virtio-hmp-cmds.c
@@ -11,7 +11,668 @@
 #include "monitor/monitor.h"
 #include "qapi/qapi-commands-virtio.h"
 #include "qapi/qmp/qdict.h"
+#include "hw/virtio/vhost-user.h"
 
+#include "standard-headers/linux/virtio_ids.h"
+#include "standard-headers/linux/vhost_types.h"
+#include "standard-headers/linux/virtio_blk.h"
+#include "standard-headers/linux/virtio_console.h"
+#include "standard-headers/linux/virtio_gpu.h"
+#include "standard-headers/linux/virtio_net.h"
+#include "standard-headers/linux/virtio_scsi.h"
+#include "standard-headers/linux/virtio_i2c.h"
+#include "standard-headers/linux/virtio_balloon.h"
+#include "standard-headers/linux/virtio_iommu.h"
+#include "standard-headers/linux/virtio_mem.h"
+#include "standard-headers/linux/virtio_vsock.h"
+#include "standard-headers/linux/virtio_gpio.h"
+
+#include CONFIG_DEVICES
+
+#define FEATURE_ENTRY(name, desc) (qmp_virtio_feature_map_t) \
+{ .virtio_bit = name, .feature_desc = desc }
+
+/* Virtio transport features mapping */
+static const qmp_virtio_feature_map_t virtio_transport_map[] = {
+/* Virtio device transport features */
+#ifndef VIRTIO_CONFIG_NO_LEGACY
+FEATURE_ENTRY(VIRTIO_F_NOTIFY_ON_EMPTY, \
+"VIRTIO_F_NOTIFY_ON_EMPTY: Notify when device runs out of avail. "
+"descs. on VQ"),
+FEATURE_ENTRY(VIRTIO_F_ANY_LAYOUT, \
+"VIRTIO_F_ANY_LAYOUT: Device accepts arbitrary desc. layouts"),
+#endif /* !VIRTIO_CONFIG_NO_LEGACY */
+FEATURE_ENTRY(VIRTIO_F_VERSION_1, \
+"VIRTIO_F_VERSION_1: Device compliant for v1 spec (legacy)"),
+FEATURE_ENTRY(VIRTIO_F_IOMMU_PLATFORM, \
+"VIRTIO_F_IOMMU_PLATFORM: Device can be used on IOMMU platform"),
+FEATURE_ENTRY(VIRTIO_F_RING_PACKED, \
+"VIRTIO_F_RING_PACKED: Device supports packed VQ layout"),
+FEATURE_ENTRY(VIRTIO_F_IN_ORDER, \
+"VIRTIO_F_IN_ORDER: Device uses buffers in same order as made "
+"available by driver"),
+FEATURE_ENTRY(VIRTIO_F_ORDER_PLATFORM, \
+"VIRTIO_F_ORDER_PLATFORM: Memory accesses ordered by platform"),
+FEATURE_ENTRY(VIRTIO_F_SR_IOV, \
+"VIRTIO_F_SR_IOV: Device supports single root I/O virtualization"),
+FEATURE_ENTRY(VIRTIO_F_RING_RESET, \
+"VIRTIO_F_RING_RESET: Driver can reset a queue individually"),
+/* Virtio ring transport features */
+FEATURE_ENTRY(VIRTIO_RING_F_INDIRECT_DESC, \
+"VIRTIO_RING_F_INDIRECT_DESC: Indirect descriptors supported"),
+FEATURE_ENTRY(VIRTIO_RING_F_EVENT_IDX, \
+"VIRTIO_RING_F_EVENT_IDX: Used & avail. event fields enabled"),
+{ -1, "" }
+};
+
+/* Vhost-user protocol features mapping */
+static const qmp_virtio_feature_map_t vhost_user_protocol_map[] = {
+FEATURE_ENTRY(VHOST_USER_PROTOCOL_F_MQ, \
+"VHOST_USER_PROTOCOL_F_MQ: Multiqueue protocol supported"),
+FEATURE_ENTRY(VHOST_USER_PROTOCOL_F_LOG_SHMFD, \
+"VHOST_U

[PATCH v3 1/3] qmp: Switch x-query-virtio-status back to numeric encoding

2024-02-02 Thread Hyman Huang
x-query-virtio-status returns several sets of virtio feature and
status flags.  It goes back to v7.2.0.

In the initial commit 90c066cd682 (qmp: add QMP command
x-query-virtio-status), we returned them as numbers, using virtio's
well-known binary encoding.

The next commit f3034ad71fc (qmp: decode feature & status bits in
virtio-status) replaced the numbers by objects.  The objects represent
bits QEMU knows symbolically, and any unknown bits numerically just like
before.

Commit 8a8287981d1 (hmp: add virtio commands) added the matching HMP
command "info virtio" (and a few more, which aren't relevant here).

The symbolic representation uses lists of strings.  The string format is
undocumented.  The strings look like "WELL_KNOWN_SYMBOL: human readable
explanation".

This symbolic representation is nice for humans.  For machines, it can
save them the trouble of decoding virtio's well-known binary encoding.

However, we sometimes want to compare features and status bits without
caring for their exact meaning.  Say we want to verify the correctness
of the virtio negotiation between guest, QEMU, and OVS-DPDK.  We can use
QMP command x-query-virtio-status to retrieve vhost-user net device
features, and the "ovs-vsctl list interface" command to retrieve
interface features.  Without commit f3034ad71fc, we could then simply
compare the numbers.  With this commit, we first have to map from the
strings back to the numeric encoding.

Revert the decoding for QMP, but keep it for HMP.

This makes the QMP command easier to use for use cases where we
don't need to decode, like the comparison above.  For use cases
where we need to decode, we replace parsing undocumented strings by
decoding virtio's well-known binary encoding.

Incompatible change; acceptable because x-query-virtio-status
comes without a stability promise.

Signed-off-by: Hyman Huang 
---
 hw/virtio/virtio-hmp-cmds.c |  25 +++--
 hw/virtio/virtio-qmp.c  |  23 ++---
 qapi/virtio.json| 192 
 3 files changed, 45 insertions(+), 195 deletions(-)

diff --git a/hw/virtio/virtio-hmp-cmds.c b/hw/virtio/virtio-hmp-cmds.c
index 477c97dea2..721c630ab0 100644
--- a/hw/virtio/virtio-hmp-cmds.c
+++ b/hw/virtio/virtio-hmp-cmds.c
@@ -6,6 +6,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "virtio-qmp.h"
 #include "monitor/hmp.h"
 #include "monitor/monitor.h"
 #include "qapi/qapi-commands-virtio.h"
@@ -145,13 +146,17 @@ void hmp_virtio_status(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, "  endianness:  %s\n",
s->device_endian);
 monitor_printf(mon, "  status:\n");
-hmp_virtio_dump_status(mon, s->status);
+hmp_virtio_dump_status(mon,
+qmp_decode_status(s->status));
 monitor_printf(mon, "  Guest features:\n");
-hmp_virtio_dump_features(mon, s->guest_features);
+hmp_virtio_dump_features(mon,
+qmp_decode_features(s->device_id, s->guest_features));
 monitor_printf(mon, "  Host features:\n");
-hmp_virtio_dump_features(mon, s->host_features);
+hmp_virtio_dump_features(mon,
+qmp_decode_features(s->device_id, s->host_features));
 monitor_printf(mon, "  Backend features:\n");
-hmp_virtio_dump_features(mon, s->backend_features);
+hmp_virtio_dump_features(mon,
+qmp_decode_features(s->device_id, s->backend_features));
 
 if (s->vhost_dev) {
 monitor_printf(mon, "  VHost:\n");
@@ -172,13 +177,17 @@ void hmp_virtio_status(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, "log_size:   %"PRId64"\n",
s->vhost_dev->log_size);
 monitor_printf(mon, "Features:\n");
-hmp_virtio_dump_features(mon, s->vhost_dev->features);
+hmp_virtio_dump_features(mon,
+qmp_decode_features(s->device_id, s->vhost_dev->features));
 monitor_printf(mon, "Acked features:\n");
-hmp_virtio_dump_features(mon, s->vhost_dev->acked_features);
+hmp_virtio_dump_features(mon,
+qmp_decode_features(s->device_id, s->vhost_dev->acked_features));
 monitor_printf(mon, "Backend features:\n");
-hmp_virtio_dump_features(mon, s->vhost_dev->backend_features);
+hmp_virtio_dump_features(mon,
+qmp_decode_features(s->device_id, s->vhost_dev->backend_features));
 monitor_printf(mon, "Protocol features:\n");
-hmp_virtio_dump_protocols(mon, s->vhost_dev->protocol_features);
+hmp_virtio_dump_protocols(mon,
+qmp_decode_protocols(s->vhost_dev->protocol_features));
 }
 
 qapi_free_VirtioStatus(s);
diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c

[PATCH v3 3/3] qapi: Define VhostDeviceProtocols and VirtioDeviceFeatures as plain C types

2024-02-02 Thread Hyman Huang
VhostDeviceProtocols and VirtioDeviceFeatures are only used in
virtio-hmp-cmds.c.  So define them as plain C types there, and drop
them from the QAPI schema.

Signed-off-by: Hyman Huang 
---
 hw/virtio/virtio-hmp-cmds.c | 16 +++
 qapi/virtio.json| 39 -
 2 files changed, 16 insertions(+), 39 deletions(-)

diff --git a/hw/virtio/virtio-hmp-cmds.c b/hw/virtio/virtio-hmp-cmds.c
index f95bad0069..045b472228 100644
--- a/hw/virtio/virtio-hmp-cmds.c
+++ b/hw/virtio/virtio-hmp-cmds.c
@@ -29,6 +29,22 @@
 
 #include CONFIG_DEVICES
 
+typedef struct VhostDeviceProtocols VhostDeviceProtocols;
+struct VhostDeviceProtocols {
+strList *protocols;
+bool has_unknown_protocols;
+uint64_t unknown_protocols;
+};
+
+typedef struct VirtioDeviceFeatures VirtioDeviceFeatures;
+struct VirtioDeviceFeatures {
+strList *transports;
+bool has_dev_features;
+strList *dev_features;
+bool has_unknown_dev_features;
+uint64_t unknown_dev_features;
+};
+
 #define FEATURE_ENTRY(name, desc) (qmp_virtio_feature_map_t) \
 { .virtio_bit = name, .feature_desc = desc }
 
diff --git a/qapi/virtio.json b/qapi/virtio.json
index 26516fb29c..42dbc87f2f 100644
--- a/qapi/virtio.json
+++ b/qapi/virtio.json
@@ -300,45 +300,6 @@
   'data': { 'statuses': [ 'str' ],
 '*unknown-statuses': 'uint8' } }
 
-##
-# @VhostDeviceProtocols:
-#
-# A structure defined to list the vhost user protocol features of a
-# Vhost User device
-#
-# @protocols: List of decoded vhost user protocol features of a vhost
-# user device
-#
-# @unknown-protocols: Vhost user device protocol features bitmap that
-# have not been decoded
-#
-# Since: 7.2
-##
-{ 'struct': 'VhostDeviceProtocols',
-  'data': { 'protocols': [ 'str' ],
-'*unknown-protocols': 'uint64' } }
-
-##
-# @VirtioDeviceFeatures:
-#
-# The common fields that apply to most Virtio devices.  Some devices
-# may not have their own device-specific features (e.g. virtio-rng).
-#
-# @transports: List of transport features of the virtio device
-#
-# @dev-features: List of device-specific features (if the device has
-# unique features)
-#
-# @unknown-dev-features: Virtio device features bitmap that have not
-# been decoded
-#
-# Since: 7.2
-##
-{ 'struct': 'VirtioDeviceFeatures',
-  'data': { 'transports': [ 'str' ],
-'*dev-features': [ 'str' ],
-'*unknown-dev-features': 'uint64' } }
-
 ##
 # @VirtQueueStatus:
 #
-- 
2.31.1




[PATCH v3 0/3] Adjust the output of x-query-virtio-status

2024-02-02 Thread Hyman Huang
Sorry for the late post of version 3. The modifications are as follows:

v3:
- Rebase on master
- Use the refined commit message furnished by Markus for [PATCH v2 1/2] 
- Drop the [PATCH v2 2/2]
- Add [PATCH v3 2/3] to declare the decoding functions static
- Add [PATCH v3 3/3] to define VhostDeviceProtocols and
  VirtioDeviceFeatures as plain C types

Since Markus inspired all of the alterations above, we would like to
thank him for his contribution to this series.

Please review,
Yong

v2:
- Change the hmp_virtio_dump_xxx function signatures so that they
  perform the bitmap decoding, as suggested by Philippe.

This patchset is derived from the series:
https://lore.kernel.org/qemu-devel/cover.1699793550.git.yong.hu...@smartx.com/
See the link above for more background information.

This patchset does the following:
1. Adopt the policy of adding human-readable output only in HMP.
2. In the HMP output, display the human-readable information and
   drop the unknown bits.
3. In the QMP output, remove the descriptive strings and display
   only the bits, encoded as numbers.
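For illustration, a minimal sketch of the HMP/QMP split described above, assuming a hypothetical four-entry name table (the bit numbers are the virtio specification values for the named features): QMP would simply report the raw 64-bit bitmap, while the HMP side walks the table, formats the bits it recognizes and leaves the undecoded remainder out of the text. `FeatureName` and `hmp_format_features()` are illustrative names, not QEMU functions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical name table; bit numbers follow the virtio spec. */
typedef struct {
    unsigned bit;
    const char *name;
} FeatureName;

static const FeatureName feature_names[] = {
    { 28, "VIRTIO_RING_F_INDIRECT_DESC" },
    { 29, "VIRTIO_RING_F_EVENT_IDX" },
    { 32, "VIRTIO_F_VERSION_1" },
    { 33, "VIRTIO_F_ACCESS_PLATFORM" },
};

/*
 * HMP side (sketch): write the names of the known bits into buf,
 * separated by ", ", and return the bits that were not recognized.
 * The leftover bits are not printed; QMP would expose the raw bitmap.
 * len must be at least 1.
 */
static uint64_t hmp_format_features(uint64_t bitmap, char *buf, size_t len)
{
    size_t i;

    buf[0] = '\0';
    for (i = 0; i < sizeof(feature_names) / sizeof(feature_names[0]); i++) {
        uint64_t mask = UINT64_C(1) << feature_names[i].bit;

        if (bitmap & mask) {
            if (buf[0] != '\0') {
                strncat(buf, ", ", len - strlen(buf) - 1);
            }
            strncat(buf, feature_names[i].name, len - strlen(buf) - 1);
            bitmap &= ~mask;    /* clear the decoded bit */
        }
    }
    return bitmap;              /* undecoded leftover bits */
}
```

Called with bits 29, 32 and 40 set, the sketch produces "VIRTIO_RING_F_EVENT_IDX, VIRTIO_F_VERSION_1" and returns 1ULL << 40 as the leftover, which HMP would simply not print.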

Hyman Huang (3):
  qmp: Switch x-query-virtio-status back to numeric encoding
  virtio: Declare the decoding functions to static
  qapi: Define VhostDeviceProtocols and VirtioDeviceFeatures as plain C
types

 hw/virtio/meson.build   |   3 +-
 hw/virtio/virtio-hmp-cmds.c | 702 +++-
 hw/virtio/virtio-qmp.c  | 684 +--
 hw/virtio/virtio-qmp.h  |   3 -
 qapi/virtio.json| 231 +---
 5 files changed, 723 insertions(+), 900 deletions(-)

-- 
2.31.1




[PATCH v2 1/2] i386/sev: Sort the error message

2024-01-07 Thread Hyman Huang
Before returning the error number to the caller (done in the next
commit), sort out the error numbers on the failure paths:
1. return the error number on the ram_block_discard_disable()
   failure path
2. return the error number on the "open" syscall failure path
3. return -EINVAL when a prerequisite check fails or the command
   line is invalid
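The three rules above follow a common C error-handling shape: every failure path stores a negative errno in ret before jumping to a shared cleanup label, which then returns ret instead of a hard-coded -1. A minimal sketch, with a hypothetical demo_init() standing in for sev_kvm_init(). Note that open(2) itself returns -1 on failure and reports the cause in errno, so this sketch captures -errno rather than the return value of open().

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for an init function with several failure
 * paths. Each path sets ret to a negative errno before "goto err",
 * so the caller can report a precise reason.
 */
static int demo_init(const char *devname, int cbitpos, int host_cbitpos)
{
    int ret;
    int fd;

    if (cbitpos != host_cbitpos) {
        ret = -EINVAL;          /* prerequisite check failed */
        goto err;
    }

    fd = open(devname, O_RDWR);
    if (fd < 0) {
        ret = -errno;           /* open() returns -1; the cause is in errno */
        goto err;
    }

    close(fd);                  /* a real init would keep the fd */
    return 0;

err:
    /* shared cleanup for every failure path would go here */
    return ret;
}
```

Keeping a single err label means cleanup such as ram_block_discard_disable(false) runs exactly once, whichever check failed.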

Signed-off-by: Hyman Huang 
---
 target/i386/sev.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 9a71246682..96eff73001 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -923,7 +923,7 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 ret = ram_block_discard_disable(true);
 if (ret) {
 error_report("%s: cannot disable RAM discard", __func__);
-return -1;
+return ret;
 }
 
 sev_guest = sev;
@@ -940,6 +940,7 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 if (host_cbitpos != sev->cbitpos) {
 error_setg(errp, "%s: cbitpos check failed, host '%d' requested '%d'",
__func__, host_cbitpos, sev->cbitpos);
+ret = -EINVAL;
 goto err;
 }
 
@@ -952,11 +953,12 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 error_setg(errp, "%s: reduced_phys_bits check failed,"
" it should be in the range of 1 to 63, requested '%d'",
__func__, sev->reduced_phys_bits);
+ret = -EINVAL;
 goto err;
 }
 
 devname = object_property_get_str(OBJECT(sev), "sev-device", NULL);
-sev->sev_fd = open(devname, O_RDWR);
+ret = sev->sev_fd = open(devname, O_RDWR);
 if (sev->sev_fd < 0) {
 error_setg(errp, "%s: Failed to open %s '%s'", __func__,
devname, strerror(errno));
@@ -981,6 +983,7 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 if (!kvm_kernel_irqchip_allowed()) {
 error_report("%s: SEV-ES guests require in-kernel irqchip support",
  __func__);
+ret = -EINVAL;
 goto err;
 }
 
@@ -988,6 +991,7 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 error_report("%s: guest policy requires SEV-ES, but "
  "host SEV-ES support unavailable",
  __func__);
+ret = -EINVAL;
 goto err;
 }
 cmd = KVM_SEV_ES_INIT;
-- 
2.39.1




[PATCH v2 0/2] Nitpick at the error message's output

2024-01-07 Thread Hyman Huang
v2:
- rebase on master
- add a commit to sort the error message so that an explanation
  error number can be returned on all failure paths

Hyman Huang (2):
  i386/sev: Sort the error message
  i386/sev: Nitpick at the error message's output

 target/i386/sev.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

-- 
2.39.1




[PATCH v2 2/2] i386/sev: Nitpick at the error message's output

2024-01-07 Thread Hyman Huang
An incorrect error message is reported because the return value
is discarded on the sev_kvm_init() failure path.

For instance, when a SEV guest fails to launch because of an
inappropriate ioctl, the following message is reported:

kvm: sev_kvm_init: failed to initialize ret=-25 fw_error=0
kvm: failed to initialize kvm: Operation not permitted

whereas the accurate output would be:

kvm: sev_kvm_init: failed to initialize ret=-25 fw_error=0
kvm: failed to initialize kvm: Inappropriate ioctl for device

Fix this by returning ret directly on the failure path.
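The caller presumably turns the returned value into text with strerror(-ret), which would explain the two outputs above: a hard-coded -1 reads as EPERM ("Operation not permitted") on Linux, while propagating ret keeps -ENOTTY (errno 25, "Inappropriate ioctl for device"). A minimal sketch of that effect, assuming a hypothetical fake_sev_init() whose ioctl failed with ENOTTY and a hypothetical report() helper; neither is QEMU code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical init whose ioctl failed with ENOTTY (errno 25 on Linux).
 * propagate == false mimics the old "return -1" failure path,
 * propagate == true mimics the fixed "return ret" path.
 */
static int fake_sev_init(bool propagate)
{
    int ret = -ENOTTY;          /* value the failed ioctl path produced */

    return propagate ? ret : -1;
}

/* Caller side (sketch): format the error like the messages above. */
static void report(int ret, char *buf, size_t len)
{
    snprintf(buf, len, "failed to initialize kvm: %s", strerror(-ret));
}
```

With glibc, report(fake_sev_init(false), ...) yields "failed to initialize kvm: Operation not permitted", while report(fake_sev_init(true), ...) yields "failed to initialize kvm: Inappropriate ioctl for device".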

Signed-off-by: Hyman Huang 
Reviewed-by: Daniel P. Berrangé 
Message-Id: 

---
 target/i386/sev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 96eff73001..3fef8cf163 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -1023,7 +1023,7 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 err:
 sev_guest = NULL;
 ram_block_discard_disable(false);
-return -1;
+return ret;
 }
 
 int
-- 
2.39.1




[PATCH] i386/sev: Nitpick at the error message's output

2024-01-05 Thread Hyman Huang
An incorrect error message is reported because the return value
is discarded on the sev_kvm_init() failure path.

For instance, when a SEV guest fails to launch because of an
inappropriate ioctl, the following message is reported:

kvm: sev_kvm_init: failed to initialize ret=-25 fw_error=0
kvm: failed to initialize kvm: Operation not permitted

whereas the accurate output would be:

kvm: sev_kvm_init: failed to initialize ret=-25 fw_error=0
kvm: failed to initialize kvm: Inappropriate ioctl for device

Fix this by returning ret directly on the failure path.

Signed-off-by: Hyman Huang 
---
 target/i386/sev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 9a71246682..4a69ca457c 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -1019,7 +1019,7 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 err:
 sev_guest = NULL;
 ram_block_discard_disable(false);
-return -1;
+return ret;
 }
 
 int
-- 
2.39.1




[PATCH v2 1/2] qapi/virtio: Keep feature and status bits in the QMP output

2024-01-04 Thread Hyman Huang
Maintain the feature and status bits in the x-query-virtio-status
output and, as usual, add human-readable output only in HMP.

Applications may find it useful to compare feature and status
information directly. For example, an upper-layer application could
use the QMP command x-query-virtio-status to retrieve the features of
a vhost-user net device, and the "ovs-vsctl list interface" command
to retrieve the interface features (in numeric form), in order to
verify that virtio negotiation between the guest, QEMU, and OVS-DPDK
is correct. The application could then compare the two feature
bitmaps directly, without first converting feature names into bits.
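With raw numbers in the QMP reply, the comparison described above reduces to bit arithmetic. A minimal sketch, assuming the application has already parsed both values into uint64_t (parsing the QMP reply and the "ovs-vsctl list interface" output is left out); FeaturesDiff, features_diff() and features_match() are hypothetical helpers.

```c
#include <stdint.h>

/* Bits set on one side but not the other. */
typedef struct {
    uint64_t only_in_qemu;      /* e.g. from x-query-virtio-status */
    uint64_t only_in_ovs;       /* e.g. from "ovs-vsctl list interface" */
} FeaturesDiff;

static FeaturesDiff features_diff(uint64_t qemu_features,
                                  uint64_t ovs_features)
{
    FeaturesDiff d;

    d.only_in_qemu = qemu_features & ~ovs_features;
    d.only_in_ovs = ovs_features & ~qemu_features;
    return d;
}

/* Negotiation looks consistent when neither side has extra bits. */
static int features_match(uint64_t qemu_features, uint64_t ovs_features)
{
    return (qemu_features ^ ovs_features) == 0;
}
```

Reporting the two difference masks, rather than a single yes/no, tells the operator which side advertises features the other did not agree to.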

Signed-off-by: Hyman Huang 
---
 hw/virtio/virtio-hmp-cmds.c |  29 --
 hw/virtio/virtio-qmp.c  |  23 ++---
 qapi/virtio.json| 192 
 3 files changed, 48 insertions(+), 196 deletions(-)

diff --git a/hw/virtio/virtio-hmp-cmds.c b/hw/virtio/virtio-hmp-cmds.c
index 477c97dea2..4fabba4f9c 100644
--- a/hw/virtio/virtio-hmp-cmds.c
+++ b/hw/virtio/virtio-hmp-cmds.c
@@ -6,6 +6,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "virtio-qmp.h"
 #include "monitor/hmp.h"
 #include "monitor/monitor.h"
 #include "qapi/qapi-commands-virtio.h"
@@ -13,8 +14,10 @@
 
 
 static void hmp_virtio_dump_protocols(Monitor *mon,
-  VhostDeviceProtocols *pcol)
+  uint64_t bitmap)
 {
+VhostDeviceProtocols *pcol = qmp_decode_protocols(bitmap);
+
 strList *pcol_list = pcol->protocols;
 while (pcol_list) {
 monitor_printf(mon, "\t%s", pcol_list->value);
@@ -31,8 +34,10 @@ static void hmp_virtio_dump_protocols(Monitor *mon,
 }
 
 static void hmp_virtio_dump_status(Monitor *mon,
-   VirtioDeviceStatus *status)
+   uint64_t bitmap)
 {
+VirtioDeviceStatus *status = qmp_decode_status(bitmap);
+
 strList *status_list = status->statuses;
 while (status_list) {
 monitor_printf(mon, "\t%s", status_list->value);
@@ -49,8 +54,12 @@ static void hmp_virtio_dump_status(Monitor *mon,
 }
 
 static void hmp_virtio_dump_features(Monitor *mon,
- VirtioDeviceFeatures *features)
+ uint16_t device_id,
+ uint64_t bitmap)
 {
+VirtioDeviceFeatures *features =
+qmp_decode_features(device_id, bitmap);
+
 strList *transport_list = features->transports;
 while (transport_list) {
 monitor_printf(mon, "\t%s", transport_list->value);
@@ -147,11 +156,11 @@ void hmp_virtio_status(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, "  status:\n");
 hmp_virtio_dump_status(mon, s->status);
 monitor_printf(mon, "  Guest features:\n");
-hmp_virtio_dump_features(mon, s->guest_features);
+hmp_virtio_dump_features(mon, s->device_id, s->guest_features);
 monitor_printf(mon, "  Host features:\n");
-hmp_virtio_dump_features(mon, s->host_features);
+hmp_virtio_dump_features(mon, s->device_id, s->host_features);
 monitor_printf(mon, "  Backend features:\n");
-hmp_virtio_dump_features(mon, s->backend_features);
+hmp_virtio_dump_features(mon, s->device_id, s->backend_features);
 
 if (s->vhost_dev) {
 monitor_printf(mon, "  VHost:\n");
@@ -172,11 +181,13 @@ void hmp_virtio_status(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, "log_size:   %"PRId64"\n",
s->vhost_dev->log_size);
 monitor_printf(mon, "Features:\n");
-hmp_virtio_dump_features(mon, s->vhost_dev->features);
+hmp_virtio_dump_features(mon, s->device_id, s->vhost_dev->features);
 monitor_printf(mon, "Acked features:\n");
-hmp_virtio_dump_features(mon, s->vhost_dev->acked_features);
+hmp_virtio_dump_features(mon,
+s->device_id, s->vhost_dev->acked_features);
 monitor_printf(mon, "Backend features:\n");
-hmp_virtio_dump_features(mon, s->vhost_dev->backend_features);
+hmp_virtio_dump_features(mon,
+s->device_id, s->vhost_dev->backend_features);
 monitor_printf(mon, "Protocol features:\n");
 hmp_virtio_dump_protocols(mon, s->vhost_dev->protocol_features);
 }
diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c
index 1dd96ed20f..1660c17653 100644
--- a/hw/virtio/virtio-qmp.c
+++ b/hw/virtio/virtio-qmp.c
@@ -733,12 +733,9 @@ VirtioStatus *qmp_x_query_virtio_status(const char *path, Error **errp)
 status->name = g_strdup(vdev->name);
 status->device_id = vdev->device_id;

[PATCH v2 2/2] hmp: Drop unknown feature and status bits

2024-01-04 Thread Hyman Huang
The QMP command "x-query-virtio-status" already outputs the
complete feature and status bits, so there is no need to keep the
undecoded bits in the HMP output; drop them.

Signed-off-by: Hyman Huang 
---
 hw/virtio/virtio-hmp-cmds.c | 13 -
 1 file changed, 13 deletions(-)

diff --git a/hw/virtio/virtio-hmp-cmds.c b/hw/virtio/virtio-hmp-cmds.c
index 4fabba4f9c..ae27968523 100644
--- a/hw/virtio/virtio-hmp-cmds.c
+++ b/hw/virtio/virtio-hmp-cmds.c
@@ -27,10 +27,6 @@ static void hmp_virtio_dump_protocols(Monitor *mon,
 }
 }
 monitor_printf(mon, "\n");
-if (pcol->has_unknown_protocols) {
-monitor_printf(mon, "  unknown-protocols(0x%016"PRIx64")\n",
-   pcol->unknown_protocols);
-}
 }
 
 static void hmp_virtio_dump_status(Monitor *mon,
@@ -47,10 +43,6 @@ static void hmp_virtio_dump_status(Monitor *mon,
 }
 }
 monitor_printf(mon, "\n");
-if (status->has_unknown_statuses) {
-monitor_printf(mon, "  unknown-statuses(0x%016"PRIx32")\n",
-   status->unknown_statuses);
-}
 }
 
 static void hmp_virtio_dump_features(Monitor *mon,
@@ -81,11 +73,6 @@ static void hmp_virtio_dump_features(Monitor *mon,
 }
 monitor_printf(mon, "\n");
 }
-
-if (features->has_unknown_dev_features) {
-monitor_printf(mon, "  unknown-features(0x%016"PRIx64")\n",
-   features->unknown_dev_features);
-}
 }
 
 void hmp_virtio_query(Monitor *mon, const QDict *qdict)
-- 
2.39.1




[PATCH v2 0/2] Adjust the output of x-query-virtio-status

2024-01-04 Thread Hyman Huang
v2:
- Change the hmp_virtio_dump_xxx function signatures so that they
  perform the bitmap decoding, as suggested by Philippe.

Please review, thanks,
Yong

This patchset is derived from the series:
https://lore.kernel.org/qemu-devel/cover.1699793550.git.yong.hu...@smartx.com/
See the link above for more background information.

This patchset does the following:
1. Adopt the policy of adding human-readable output only in HMP.
2. In the HMP output, display the human-readable information and
   drop the unknown bits.
3. In the QMP output, remove the descriptive strings and display
   only the bits, encoded as numbers.

Hyman Huang (2):
  qapi/virtio: Keep feature and status bits in the QMP output
  hmp: Drop unknown feature and status bits

 hw/virtio/virtio-hmp-cmds.c |  42 
 hw/virtio/virtio-qmp.c  |  23 ++---
 qapi/virtio.json| 192 
 3 files changed, 48 insertions(+), 209 deletions(-)

-- 
2.39.1




[PATCH 2/2] hmp: Drop unknown feature and status bits

2023-12-28 Thread Hyman Huang
The QMP command "x-query-virtio-status" already outputs the
complete feature and status bits, so there is no need to keep the
undecoded bits in the HMP output; drop them.

Signed-off-by: Hyman Huang 
---
 hw/virtio/virtio-hmp-cmds.c | 13 -
 1 file changed, 13 deletions(-)

diff --git a/hw/virtio/virtio-hmp-cmds.c b/hw/virtio/virtio-hmp-cmds.c
index 721c630ab0..f9a7384604 100644
--- a/hw/virtio/virtio-hmp-cmds.c
+++ b/hw/virtio/virtio-hmp-cmds.c
@@ -25,10 +25,6 @@ static void hmp_virtio_dump_protocols(Monitor *mon,
 }
 }
 monitor_printf(mon, "\n");
-if (pcol->has_unknown_protocols) {
-monitor_printf(mon, "  unknown-protocols(0x%016"PRIx64")\n",
-   pcol->unknown_protocols);
-}
 }
 
 static void hmp_virtio_dump_status(Monitor *mon,
@@ -43,10 +39,6 @@ static void hmp_virtio_dump_status(Monitor *mon,
 }
 }
 monitor_printf(mon, "\n");
-if (status->has_unknown_statuses) {
-monitor_printf(mon, "  unknown-statuses(0x%016"PRIx32")\n",
-   status->unknown_statuses);
-}
 }
 
 static void hmp_virtio_dump_features(Monitor *mon,
@@ -73,11 +65,6 @@ static void hmp_virtio_dump_features(Monitor *mon,
 }
 monitor_printf(mon, "\n");
 }
-
-if (features->has_unknown_dev_features) {
-monitor_printf(mon, "  unknown-features(0x%016"PRIx64")\n",
-   features->unknown_dev_features);
-}
 }
 
 void hmp_virtio_query(Monitor *mon, const QDict *qdict)
-- 
2.39.1




[PATCH 0/2] Adjust the output of x-query-virtio-status

2023-12-28 Thread Hyman Huang
This patchset is derived from the series:
https://lore.kernel.org/qemu-devel/cover.1699793550.git.yong.hu...@smartx.com/
See the link above for more background information.

This patchset does the following:
1. Adopt the policy of adding human-readable output only in HMP.
2. In the HMP output, display the human-readable information and
   drop the unknown bits.
3. In the QMP output, remove the descriptive strings and display
   only the bits, encoded as numbers.

Please review, thanks,
Yong

Hyman Huang (2):
  qapi/virtio: Keep feature and status bits in the QMP output
  hmp: Drop unknown feature and status bits

 hw/virtio/virtio-hmp-cmds.c |  38 ---
 hw/virtio/virtio-qmp.c  |  23 ++---
 qapi/virtio.json| 192 
 3 files changed, 45 insertions(+), 208 deletions(-)

-- 
2.39.1




[PATCH 1/2] qapi/virtio: Keep feature and status bits in the QMP output

2023-12-28 Thread Hyman Huang
Maintain the feature and status bits in the x-query-virtio-status
output and, as usual, add human-readable output only in HMP.

Applications may find it useful to compare feature and status
information directly. For example, an upper-layer application could
use the QMP command x-query-virtio-status to retrieve the features of
a vhost-user net device, and the "ovs-vsctl list interface" command
to retrieve the interface features (in numeric form), in order to
verify that virtio negotiation between the guest, QEMU, and OVS-DPDK
is correct. The application could then compare the two feature
bitmaps directly, without first converting feature names into bits.

Signed-off-by: Hyman Huang 
---
 hw/virtio/virtio-hmp-cmds.c |  25 +++--
 hw/virtio/virtio-qmp.c  |  23 ++---
 qapi/virtio.json| 192 
 3 files changed, 45 insertions(+), 195 deletions(-)

diff --git a/hw/virtio/virtio-hmp-cmds.c b/hw/virtio/virtio-hmp-cmds.c
index 477c97dea2..721c630ab0 100644
--- a/hw/virtio/virtio-hmp-cmds.c
+++ b/hw/virtio/virtio-hmp-cmds.c
@@ -6,6 +6,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "virtio-qmp.h"
 #include "monitor/hmp.h"
 #include "monitor/monitor.h"
 #include "qapi/qapi-commands-virtio.h"
@@ -145,13 +146,17 @@ void hmp_virtio_status(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, "  endianness:  %s\n",
s->device_endian);
 monitor_printf(mon, "  status:\n");
-hmp_virtio_dump_status(mon, s->status);
+hmp_virtio_dump_status(mon,
+qmp_decode_status(s->status));
 monitor_printf(mon, "  Guest features:\n");
-hmp_virtio_dump_features(mon, s->guest_features);
+hmp_virtio_dump_features(mon,
+qmp_decode_features(s->device_id, s->guest_features));
 monitor_printf(mon, "  Host features:\n");
-hmp_virtio_dump_features(mon, s->host_features);
+hmp_virtio_dump_features(mon,
+qmp_decode_features(s->device_id, s->host_features));
 monitor_printf(mon, "  Backend features:\n");
-hmp_virtio_dump_features(mon, s->backend_features);
+hmp_virtio_dump_features(mon,
+qmp_decode_features(s->device_id, s->backend_features));
 
 if (s->vhost_dev) {
 monitor_printf(mon, "  VHost:\n");
@@ -172,13 +177,17 @@ void hmp_virtio_status(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, "log_size:   %"PRId64"\n",
s->vhost_dev->log_size);
 monitor_printf(mon, "Features:\n");
-hmp_virtio_dump_features(mon, s->vhost_dev->features);
+hmp_virtio_dump_features(mon,
+qmp_decode_features(s->device_id, s->vhost_dev->features));
 monitor_printf(mon, "Acked features:\n");
-hmp_virtio_dump_features(mon, s->vhost_dev->acked_features);
+hmp_virtio_dump_features(mon,
+qmp_decode_features(s->device_id, s->vhost_dev->acked_features));
 monitor_printf(mon, "Backend features:\n");
-hmp_virtio_dump_features(mon, s->vhost_dev->backend_features);
+hmp_virtio_dump_features(mon,
+qmp_decode_features(s->device_id, s->vhost_dev->backend_features));
 monitor_printf(mon, "Protocol features:\n");
-hmp_virtio_dump_protocols(mon, s->vhost_dev->protocol_features);
+hmp_virtio_dump_protocols(mon,
+qmp_decode_protocols(s->vhost_dev->protocol_features));
 }
 
 qapi_free_VirtioStatus(s);
diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c
index 1dd96ed20f..1660c17653 100644
--- a/hw/virtio/virtio-qmp.c
+++ b/hw/virtio/virtio-qmp.c
@@ -733,12 +733,9 @@ VirtioStatus *qmp_x_query_virtio_status(const char *path, Error **errp)
 status->name = g_strdup(vdev->name);
 status->device_id = vdev->device_id;
 status->vhost_started = vdev->vhost_started;
-status->guest_features = qmp_decode_features(vdev->device_id,
- vdev->guest_features);
-status->host_features = qmp_decode_features(vdev->device_id,
-vdev->host_features);
-status->backend_features = qmp_decode_features(vdev->device_id,
-   vdev->backend_features);
+status->guest_features = vdev->guest_features;
+status->host_features = vdev->host_features;
+status->backend_features = vdev->backend_features;
 
 switch (vdev->device_endian) {
 case VIRTIO_DEVICE_ENDIAN_LITTLE:
@@ -753,7 +750,7 @@ VirtioStatus *qmp_x_query_virtio_status(const char *path, Error **errp)
 }
 
 status->num_vqs = virtio_get_num_queues(vdev);
-status->

[PULL 1/1] migration/dirtyrate: Remove an extra parameter

2023-12-25 Thread Hyman Huang
From: Wafer 

vcpu_dirty_stat_collect() has an unused parameter, so remove it.

Signed-off-by: Wafer 
Reviewed-by: Hyman Huang 
Message-Id: <20231204012230.4123-1-wa...@jaguarmicro.com>
---
 migration/dirtyrate.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index 036ac017fc..62d86b8be2 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -129,8 +129,7 @@ static DirtyPageRecord *vcpu_dirty_stat_alloc(VcpuStat *stat)
 return g_new0(DirtyPageRecord, nvcpu);
 }
 
-static void vcpu_dirty_stat_collect(VcpuStat *stat,
-DirtyPageRecord *records,
+static void vcpu_dirty_stat_collect(DirtyPageRecord *records,
 bool start)
 {
 CPUState *cpu;
@@ -158,7 +157,7 @@ retry:
 WITH_QEMU_LOCK_GUARD(_cpu_list_lock) {
 gen_id = cpu_list_generation_id_get();
 records = vcpu_dirty_stat_alloc(stat);
-vcpu_dirty_stat_collect(stat, records, true);
+vcpu_dirty_stat_collect(records, true);
 }
 
 duration = dirty_stat_wait(calc_time_ms, init_time_ms);
@@ -172,7 +171,7 @@ retry:
 cpu_list_unlock();
 goto retry;
 }
-vcpu_dirty_stat_collect(stat, records, false);
+vcpu_dirty_stat_collect(records, false);
 }
 
 for (i = 0; i < stat->nvcpu; i++) {
-- 
2.39.1




[PULL 0/1] Dirty page rate and dirty page limit 20231225 patch

2023-12-25 Thread Hyman Huang
The following changes since commit 191710c221f65b1542f6ea7fa4d30dde6e134fd7:

  Merge tag 'pull-request-2023-12-20' of https://gitlab.com/thuth/qemu into staging (2023-12-20 09:40:16 -0500)

are available in the Git repository at:

  https://github.com/newfriday/qemu.git tags/dirtylimit-dirtyrate-pull-request-20231225

for you to fetch changes up to 4918712fb1c34ae43361b402642e426be85a789e:

  migration/dirtyrate: Remove an extra parameter (2023-12-25 18:05:47 +0800)


dirtylimit dirtyrate pull request 20231225

Nitpick about an unused parameter
Please apply, thanks,
Yong


Wafer (1):
  migration/dirtyrate: Remove an extra parameter

 migration/dirtyrate.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

-- 
2.39.1




[PATCH RESEND v3 00/10] Support generic Luks encryption

2023-12-24 Thread Hyman Huang
v3:
- Rebase on master
- Add a test case for detached LUKS header
- Adjust the design to honour preallocation of the payload device
- Adjust the design to honour the payload offset from the header,
  even when detached
- Support detached LUKS header creation using qemu-img
- Support detached LUKS header querying
- Some code cleanup

Thanks for commenting on this series, please review.

Best regards,

Yong

v2:
- Simplify the design by reusing the LUKS driver to implement
  generic LUKS encryption; thanks to Daniel for the insightful
  advice.
- Rebase on master.

This functionality was motivated by the following to-do list in the
QEMU block crypto documentation:
https://wiki.qemu.org/Features/Block/Crypto 

The last chapter says we should "separate header volume": 

The LUKS format has ability to store the header in a separate volume
from the payload. We should extend the LUKS driver in QEMU to support
this use case.

By enhancing the LUKS driver to support a detached LUKS header,
it becomes possible to provide generic encryption for any disk
format that QEMU supports.

Taking qcow2 as an example, generic LUKS encryption is used as
follows:

1. Add a protocol blockdev node for the data disk
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-1-storage", "driver":"file",
> "filename":"/path/to/test_disk.qcow2"}}'

2. Add a protocol blockdev node for the LUKS header in the same way
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-2-storage", "driver":"file",
> "filename": "/path/to/cipher.gluks" }}'

3. Add the secret (passphrase) used to unlock the LUKS header
   above
$ virsh qemu-monitor-command vm '{"execute":"object-add",
> "arguments":{"qom-type":"secret", "id":
> "libvirt-2-storage-secret0", "data":"abc123"}}'

4. Add the blockdev format node that uses the qcow2 driver
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-1-format", "driver":"qcow2",
> "file":"libvirt-1-storage"}}'

5. Add the blockdev that uses the luks driver, linking the qcow2
   disk with the LUKS header through the "header" field
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-2-format", "driver":"luks",
> "file":"libvirt-1-format", "header":"libvirt-2-storage",
> "key-secret":"libvirt-2-storage-secret0"}}'

6. Finally, add the virtio-blk device
$ virsh qemu-monitor-command vm '{"execute":"device_add",
> "arguments": {"num-queues":"1", "driver":"virtio-blk-pci",
> "drive": "libvirt-2-format", "id":"virtio-disk2"}}'

Starting a virtual machine (VM) with generic LUKS encryption is
similar to hot-plugging: the same JSON arguments are used, but the
"blockdev-add"/"device_add" QMP commands are replaced by the
"-blockdev"/"-device" command-line options.

Hyman Huang (10):
  crypto: Introduce option and structure for detached LUKS header
  crypto: Support generic LUKS encryption
  qapi: Make parameter 'file' optional for BlockdevCreateOptionsLUKS
  crypto: Introduce creation option and structure for detached LUKS
header
  crypto: Mark the payload_offset_sector invalid for detached LUKS
header
  block: Support detached LUKS header creation using blockdev-create
  block: Support detached LUKS header creation using qemu-img
  crypto: Introduce 'detached-header' field in QCryptoBlockInfoLUKS
  tests: Add detached LUKS header case
  MAINTAINERS: Add section "Detached LUKS header"

 MAINTAINERS   |   5 +
 block.c   |   5 +-
 block/crypto.c| 146 ++--
 block/crypto.h|   8 +
 crypto/block-luks.c   |  49 +++-
 crypto/block.c|   1 +
 crypto/blockpriv.h|   3 +
 qapi/block-core.json  |  14 +-
 qapi/crypto.json  |  13 +-
 tests/qemu-iotests/210.out|   4 +
 tests/qemu-iotests/tests/luks-detached-header | 214 ++
 .../tests/luks-detached-header.out|   5 +
 12 files changed, 436 insertions(+), 31 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/luks-detached-header
 create mode 100644 tests/qemu-iotests/tests/luks-detached-header.out

-- 
2.39.1




[PATCH RESEND v3 04/10] crypto: Introduce creation option and structure for detached LUKS header

2023-12-24 Thread Hyman Huang
Introduce a 'header' field in BlockdevCreateOptionsLUKS to support
detached LUKS header creation. Also introduce header-related fields
in QCryptoBlock.
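The new detached_header_size field is commented as "LUKS header size plus key slot size". A minimal sketch of that arithmetic, following the rounding described in the qcrypto_block_luks_create() comment of patch 05 (header sectors plus one key material region per key slot, each rounded up to a sector) and assuming LUKS1 defaults: 512-byte sectors, a 592-byte partition header that already contains the 8 key slot headers, and 4000 anti-forensic stripes. The constants and luks_detached_header_size() are hypothetical, not taken from QEMU's headers, and QEMU's real code may apply additional alignment.

```c
#include <stdint.h>

/* Assumed LUKS1 parameters (not taken from QEMU's headers). */
#define SECTOR_SIZE     512u
#define PHDR_SIZE       592u   /* partition header incl. 8 key slot headers */
#define NUM_KEY_SLOTS   8u
#define AF_STRIPES      4000u

static uint64_t sectors_round_up(uint64_t bytes)
{
    return (bytes + SECTOR_SIZE - 1) / SECTOR_SIZE;
}

/*
 * Hypothetical helper: size in bytes of a detached LUKS header, i.e.
 * the header sectors plus one key material region per key slot.
 */
static uint64_t luks_detached_header_size(uint32_t master_key_len)
{
    uint64_t header_sectors = sectors_round_up(PHDR_SIZE);
    uint64_t split_key_sectors =
        sectors_round_up((uint64_t)master_key_len * AF_STRIPES);

    return (header_sectors + NUM_KEY_SLOTS * split_key_sectors) * SECTOR_SIZE;
}
```

For a 32-byte master key this gives 2 header sectors plus 8 regions of 250 sectors, i.e. 2002 sectors (1025024 bytes).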

Signed-off-by: Hyman Huang 
---
 crypto/blockpriv.h   | 3 +++
 qapi/block-core.json | 3 +++
 qapi/crypto.json | 5 -
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/crypto/blockpriv.h b/crypto/blockpriv.h
index 3c7ccea504..6289aea961 100644
--- a/crypto/blockpriv.h
+++ b/crypto/blockpriv.h
@@ -42,6 +42,9 @@ struct QCryptoBlock {
 size_t niv;
 uint64_t payload_offset; /* In bytes */
 uint64_t sector_size; /* In bytes */
+
+bool detached_header; /* True if disk has a detached LUKS header */
+uint64_t detached_header_size; /* LUKS header size plus key slot size */
 };
 
 struct QCryptoBlockDriver {
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 9ac256c489..8aec179926 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4948,6 +4948,8 @@
 # @file: Node to create the image format on, mandatory except when
 #'preallocation' is not requested
 #
+# @header: Detached LUKS header node to format. (since 9.0)
+#
 # @size: Size of the virtual disk in bytes
 #
 # @preallocation: Preallocation mode for the new image (since: 4.2)
@@ -4958,6 +4960,7 @@
 { 'struct': 'BlockdevCreateOptionsLUKS',
   'base': 'QCryptoBlockCreateOptionsLUKS',
   'data': { '*file':'BlockdevRef',
+'*header':  'BlockdevRef',
 'size': 'size',
 '*preallocation':   'PreallocMode' } }
 
diff --git a/qapi/crypto.json b/qapi/crypto.json
index fd3d46ebd1..6b4e86cb81 100644
--- a/qapi/crypto.json
+++ b/qapi/crypto.json
@@ -195,10 +195,13 @@
 # decryption key.  Mandatory except when probing image for
 # metadata only.
 #
+# @detached-header: if true, disk has detached LUKS header.
+#
 # Since: 2.6
 ##
 { 'struct': 'QCryptoBlockOptionsLUKS',
-  'data': { '*key-secret': 'str' }}
+  'data': { '*key-secret': 'str',
+'*detached-header': 'bool' }}
 
 ##
 # @QCryptoBlockCreateOptionsLUKS:
-- 
2.39.1




[PATCH RESEND v3 10/10] MAINTAINERS: Add section "Detached LUKS header"

2023-12-24 Thread Hyman Huang
I have developed an interest in block cryptography and have
been working on projects related to this subsystem.

Add a section to the MAINTAINERS file for the detached LUKS
header; it currently contains only a test case.

Signed-off-by: Hyman Huang 
---
 MAINTAINERS | 5 +
 1 file changed, 5 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 395f26ba86..f0f7b889a3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3391,6 +3391,11 @@ F: migration/dirtyrate.c
 F: migration/dirtyrate.h
 F: include/sysemu/dirtyrate.h
 
+Detached LUKS header
+M: Hyman Huang 
+S: Maintained
+F: tests/qemu-iotests/tests/luks-detached-header
+
 D-Bus
 M: Marc-André Lureau 
 S: Maintained
-- 
2.39.1




[PATCH RESEND v3 05/10] crypto: Mark the payload_offset_sector invalid for detached LUKS header

2023-12-24 Thread Hyman Huang
Set payload_offset_sector to a value that is practically never
reached (UINT32_MAX) to mark it as invalid, indicating that reads
and writes on the 'file' protocol blockdev node should start at
offset 0.
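
Below is a minimal standalone sketch of the sentinel scheme, for illustration only. LUKS_SECTOR_SIZE, LUKS_INVALID_SECTOR_OFFSET, luks_payload_offset_sector() and luks_payload_offset_bytes() are hypothetical stand-ins for QCRYPTO_BLOCK_LUKS_SECTOR_SIZE (assumed here to be 512 bytes), INVALID_SECTOR_OFFSET and qcrypto_block_luks_payload_offset(). There is one deliberate difference. The helper in this patch returns uint32_t, whereas the sketch widens the sector count to uint64_t before multiplying, so a large sector count cannot wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the constants in crypto/block-luks.c. */
#define LUKS_SECTOR_SIZE 512u                 /* assumed sector size */
#define LUKS_INVALID_SECTOR_OFFSET UINT32_MAX /* sentinel: detached header */

/*
 * Value stored in payload_offset_sector at creation time: the sentinel
 * for a detached header, otherwise the sectors used by header + key slots.
 */
static uint32_t luks_payload_offset_sector(bool detached_header,
                                           uint32_t header_and_keys_sectors)
{
    /* A real layout must never collide with the sentinel value. */
    assert(header_and_keys_sectors != LUKS_INVALID_SECTOR_OFFSET);
    return detached_header ? LUKS_INVALID_SECTOR_OFFSET
                           : header_and_keys_sectors;
}

/*
 * Byte offset of the payload within the 'file' node. The sentinel maps
 * to 0 because no header precedes the payload.
 */
static uint64_t luks_payload_offset_bytes(uint32_t sector)
{
    if (sector == LUKS_INVALID_SECTOR_OFFSET) {
        return 0;
    }
    /* Widen before multiplying so large sector counts cannot wrap. */
    return (uint64_t)sector * LUKS_SECTOR_SIZE;
}
```

For example, a 4096-sector header area gives a payload offset of 2097152 bytes (2 MiB) for an attached header. A detached header gives 0.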

Signed-off-by: Hyman Huang 
---
 crypto/block-luks.c | 41 +++--
 1 file changed, 31 insertions(+), 10 deletions(-)

diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index fb01ec38bb..48443ffcae 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -34,6 +34,8 @@
 
 #include "qemu/bitmap.h"
 
+#define INVALID_SECTOR_OFFSET UINT32_MAX
+
 /*
  * Reference for the LUKS format implemented here is
  *
@@ -136,6 +138,13 @@ struct QCryptoBlockLUKS {
 };
 
 
+static inline uint32_t
+qcrypto_block_luks_payload_offset(uint32_t sector)
+{
+    return sector == INVALID_SECTOR_OFFSET ? 0 :
+        sector * QCRYPTO_BLOCK_LUKS_SECTOR_SIZE;
+}
+
 static int qcrypto_block_luks_cipher_name_lookup(const char *name,
                                                  QCryptoCipherMode mode,
                                                  uint32_t key_bytes,
@@ -1255,8 +1264,8 @@ qcrypto_block_luks_open(QCryptoBlock *block,
     }

     block->sector_size = QCRYPTO_BLOCK_LUKS_SECTOR_SIZE;
-    block->payload_offset = luks->header.payload_offset_sector *
-        block->sector_size;
+    block->payload_offset =
+        qcrypto_block_luks_payload_offset(luks->header.payload_offset_sector);

     return 0;
 
@@ -1529,16 +1538,28 @@ qcrypto_block_luks_create(QCryptoBlock *block,
         slot->stripes = QCRYPTO_BLOCK_LUKS_STRIPES;
     }

-    /* The total size of the LUKS headers is the partition header + key
-     * slot headers, rounded up to the nearest sector, combined with
-     * the size of each master key material region, also rounded up
-     * to the nearest sector */
-    luks->header.payload_offset_sector = header_sectors +
-            QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS * split_key_sectors;
+    if (block->detached_header) {
+        /*
+         * Set the payload_offset_sector to a value that is nearly never
+         * reached in order to mark it as invalid and indicate that 0 should
+         * be the offset of the read/write operation on the 'file' protocol
+         * blockdev node. UINT32_MAX is chosen for this purpose.
+         */
+        luks->header.payload_offset_sector = INVALID_SECTOR_OFFSET;
+    } else {
+        /*
+         * The total size of the LUKS headers is the partition header + key
+         * slot headers, rounded up to the nearest sector, combined with
+         * the size of each master key material region, also rounded up
+         * to the nearest sector
+         */
+        luks->header.payload_offset_sector = header_sectors +
+            QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS * split_key_sectors;
+    }

     block->sector_size = QCRYPTO_BLOCK_LUKS_SECTOR_SIZE;
-    block->payload_offset = luks->header.payload_offset_sector *
-        block->sector_size;
+    block->payload_offset =
+        qcrypto_block_luks_payload_offset(luks->header.payload_offset_sector);

     /* Reserve header space to match payload offset */
     initfunc(block, block->payload_offset, opaque, &local_err);
-- 
2.39.1




[PATCH RESEND v3 09/10] tests: Add detached LUKS header case

2023-12-24 Thread Hyman Huang
Signed-off-by: Hyman Huang 
---
 tests/qemu-iotests/tests/luks-detached-header | 214 ++
 .../tests/luks-detached-header.out|   5 +
 2 files changed, 219 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/luks-detached-header
 create mode 100644 tests/qemu-iotests/tests/luks-detached-header.out

diff --git a/tests/qemu-iotests/tests/luks-detached-header b/tests/qemu-iotests/tests/luks-detached-header
new file mode 100755
index 00..cf305bfa47
--- /dev/null
+++ b/tests/qemu-iotests/tests/luks-detached-header
@@ -0,0 +1,214 @@
+#!/usr/bin/env python3
+# group: rw auto
+#
+# Test detached LUKS header
+#
+# Copyright (C) 2024 SmartX Inc.
+#
+# Authors:
+# Hyman Huang 
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os
+import iotests
+from iotests import imgfmt, qemu_img_create, img_info_log, qemu_img_info, QMPTestCase
+
+
+image_size = 128 * 1024 * 1024
+
+luks_img = os.path.join(iotests.test_dir, 'luks.img')
+detached_header_img1 = os.path.join(iotests.test_dir, 'detached_header.img1')
+detached_header_img2 = os.path.join(iotests.test_dir, 'detached_header.img2')
+detached_payload_raw_img = os.path.join(iotests.test_dir, 'detached_payload_raw.img')
+detached_payload_qcow2_img = os.path.join(iotests.test_dir, 'detached_payload_qcow2.img')
+
+secret_obj = 'secret,id=sec0,data=foo'
+luks_opts = 'key-secret=sec0'
+
+
+class TestDetachedLUKSHeader(QMPTestCase):
+    def setUp(self) -> None:
+        self.vm = iotests.VM()
+        self.vm.add_object(secret_obj)
+        self.vm.launch()
+
+        # 1. Create the normal LUKS disk with 128M size
+        self.vm.blockdev_create({ 'driver': 'file',
+                                  'filename': luks_img,
+                                  'size': 0 })
+        self.vm.qmp_log('blockdev-add', driver='file', filename=luks_img,
+                        node_name='luks-1-storage')
+        result = self.vm.blockdev_create({ 'driver': imgfmt,
+                                           'file': 'luks-1-storage',
+                                           'key-secret': 'sec0',
+                                           'size': image_size,
+                                           'iter-time': 10 })
+        # None is expected
+        self.assertEqual(result, None)
+
+        # 2. Create the LUKS disk with detached header (raw)
+
+        # Create detached LUKS header
+        self.vm.blockdev_create({ 'driver': 'file',
+                                  'filename': detached_header_img1,
+                                  'size': 0 })
+        self.vm.qmp_log('blockdev-add', driver='file', filename=detached_header_img1,
+                        node_name='luks-2-header-storage')
+
+        # Create detached LUKS raw payload
+        self.vm.blockdev_create({ 'driver': 'file',
+                                  'filename': detached_payload_raw_img,
+                                  'size': 0 })
+        self.vm.qmp_log('blockdev-add', driver='file',
+                        filename=detached_payload_raw_img,
+                        node_name='luks-2-payload-storage')
+
+        # Format LUKS disk with detached header
+        result = self.vm.blockdev_create({ 'driver': imgfmt,
+                                           'header': 'luks-2-header-storage',
+                                           'file': 'luks-2-payload-storage',
+                                           'key-secret': 'sec0',
+                                           'preallocation': 'full',
+                                           'size': image_size,
+                                           'iter-time': 10 })
+        self.assertEqual(result, None)
+
+        self.vm.shutdown()
+
+        # 3. Create the LUKS disk with detached header (qcow2)
+
+        # Create detached LUKS header using qemu-img
+        res = qemu_img_create('-f', 'luks', '--object', secret_obj, '-o', luks_opts,
+                              '-o', "detached-mode=true", detached_header_img2)
+        assert res.returncode == 0
+
+        # Create detached LUKS qcow2 payload
+        res = qemu_img_create('-f', 'qcow2', detached_payload_qcow2_img, str(image_size))
+        assert res.returncode == 0
+
+    def tearDown(self) -> None:
+        os.remove(luks_img)
+        os.remove(detached_header_img1)
+        os.remov

[PATCH RESEND v3 01/10] crypto: Introduce option and structure for detached LUKS header

2023-12-24 Thread Hyman Huang
Add the "header" option to the LUKS format. This field identifies
the blockdev node in which a detached LUKS header is stored.

In addition, introduce a 'header' field in struct BlockCrypto.

Signed-off-by: Hyman Huang 
Reviewed-by: Daniel P. Berrangé 
Message-Id: <5b99f60c7317092a563d7ca3fb4b414197015eb2.1701879996.git.yong.hu...@smartx.com>
---
 block/crypto.c   | 1 +
 qapi/block-core.json | 6 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/crypto.c b/block/crypto.c
index 921933a5e5..f82b13d32b 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -39,6 +39,7 @@ typedef struct BlockCrypto BlockCrypto;
 struct BlockCrypto {
     QCryptoBlock *block;
     bool updating_keys;
+    BdrvChild *header;  /* Reference to the detached LUKS header */
 };
 
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index ca390c5700..10be08d08f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3352,11 +3352,15 @@
 #     decryption key (since 2.6). Mandatory except when doing a
 #     metadata-only probe of the image.
 #
+# @header: optional reference to the location of a blockdev
+#     storing a detached LUKS header. (since 9.0)
+#
 # Since: 2.9
 ##
 { 'struct': 'BlockdevOptionsLUKS',
   'base': 'BlockdevOptionsGenericFormat',
-  'data': { '*key-secret': 'str' } }
+  'data': { '*key-secret': 'str',
+            '*header': 'BlockdevRef'} }
 
 ##
 # @BlockdevOptionsGenericCOWFormat:
-- 
2.39.1




[PATCH RESEND v3 08/10] crypto: Introduce 'detached-header' field in QCryptoBlockInfoLUKS

2023-12-24 Thread Hyman Huang
When a LUKS disk is queried with the qemu-img tool or other APIs,
report whether its LUKS header is detached.

Additionally, update the expected output of test case 210
accordingly.
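
The detection in qcrypto_block_luks_open() rests on one observation. An attached LUKS header always sits at the start of the image, so a payload offset of 0 bytes can only come from the detached-header sentinel. The sketch below shows that inference and the reported line. It assumes a hypothetical struct luks_info and helper names, which are not QEMU APIs. The output text simply mirrors the "detached header: false" lines added to 210.out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of the fields reported for a LUKS image. */
struct luks_info {
    uint64_t payload_offset; /* in bytes */
    bool detached_header;
};

static struct luks_info luks_info_from_offset(uint64_t payload_offset)
{
    struct luks_info info = {
        .payload_offset = payload_offset,
        /* An attached header precedes the payload, so 0 means detached. */
        .detached_header = payload_offset == 0,
    };
    return info;
}

/* Render the line in the same shape as the expected output in 210.out. */
static int format_detached_line(const struct luks_info *info,
                                char *buf, size_t len)
{
    assert(info != NULL && buf != NULL);
    return snprintf(buf, len, "detached header: %s",
                    info->detached_header ? "true" : "false");
}
```

The patch writes the test as (block->payload_offset == 0) ? true : false. The bare comparison already yields the same bool.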

Signed-off-by: Hyman Huang 
---
 crypto/block-luks.c| 2 ++
 qapi/crypto.json   | 3 +++
 tests/qemu-iotests/210.out | 4 
 3 files changed, 9 insertions(+)

diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index 474c7aee2e..c5e53b4ee4 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -1266,6 +1266,7 @@ qcrypto_block_luks_open(QCryptoBlock *block,
     block->sector_size = QCRYPTO_BLOCK_LUKS_SECTOR_SIZE;
     block->payload_offset =
         qcrypto_block_luks_payload_offset(luks->header.payload_offset_sector);
+    block->detached_header = (block->payload_offset == 0) ? true : false;

     return 0;

@@ -1892,6 +1893,7 @@ static int qcrypto_block_luks_get_info(QCryptoBlock *block,
     info->u.luks.master_key_iters = luks->header.master_key_iterations;
     info->u.luks.uuid = g_strndup((const char *)luks->header.uuid,
                                   sizeof(luks->header.uuid));
+    info->u.luks.detached_header = block->detached_header;

     for (i = 0; i < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS; i++) {
         slot = g_new0(QCryptoBlockInfoLUKSSlot, 1);
diff --git a/qapi/crypto.json b/qapi/crypto.json
index 8e81aa8454..336c880b5d 100644
--- a/qapi/crypto.json
+++ b/qapi/crypto.json
@@ -317,6 +317,8 @@
 #
 # @hash-alg: the master key hash algorithm
 #
+# @detached-header: whether the LUKS header is detached (Since 9.0)
+#
 # @payload-offset: offset to the payload data in bytes
 #
 # @master-key-iters: number of PBKDF2 iterations for key material
@@ -333,6 +335,7 @@
            'ivgen-alg': 'QCryptoIVGenAlgorithm',
            '*ivgen-hash-alg': 'QCryptoHashAlgorithm',
            'hash-alg': 'QCryptoHashAlgorithm',
+           'detached-header': 'bool',
            'payload-offset': 'int',
            'master-key-iters': 'int',
            'uuid': 'str',
diff --git a/tests/qemu-iotests/210.out b/tests/qemu-iotests/210.out
index 96d9f749dd..94b29b2120 100644
--- a/tests/qemu-iotests/210.out
+++ b/tests/qemu-iotests/210.out
@@ -18,6 +18,7 @@ virtual size: 128 MiB (134217728 bytes)
 encrypted: yes
 Format specific information:
     ivgen alg: plain64
+    detached header: false
     hash alg: sha256
     cipher alg: aes-256
     uuid: ----
@@ -70,6 +71,7 @@ virtual size: 64 MiB (67108864 bytes)
 encrypted: yes
 Format specific information:
     ivgen alg: plain64
+    detached header: false
     hash alg: sha1
     cipher alg: aes-128
     uuid: ----
@@ -125,6 +127,7 @@ virtual size: 0 B (0 bytes)
 encrypted: yes
 Format specific information:
     ivgen alg: plain64
+    detached header: false
     hash alg: sha256
     cipher alg: aes-256
     uuid: ----
@@ -195,6 +198,7 @@ virtual size: 0 B (0 bytes)
 encrypted: yes
 Format specific information:
     ivgen alg: plain64
+    detached header: false
     hash alg: sha256
     cipher alg: aes-256
     uuid: ----
-- 
2.39.1




[PATCH RESEND v3 03/10] qapi: Make parameter 'file' optional for BlockdevCreateOptionsLUKS

2023-12-24 Thread Hyman Huang
To support detached LUKS header creation, make the existing 'file'
field in BlockdevCreateOptionsLUKS optional; an extra optional
'header' field is added in the next commit.

Signed-off-by: Hyman Huang 
---
 block/crypto.c   | 21 ++---
 qapi/block-core.json |  5 +++--
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/block/crypto.c b/block/crypto.c
index 6063879bac..78fbe79c95 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -659,9 +659,9 @@ block_crypto_co_create_luks(BlockdevCreateOptions *create_options, Error **errp)
     assert(create_options->driver == BLOCKDEV_DRIVER_LUKS);
     luks_opts = &create_options->u.luks;

-    bs = bdrv_co_open_blockdev_ref(luks_opts->file, errp);
-    if (bs == NULL) {
-        return -EIO;
+    if (luks_opts->file == NULL) {
+        error_setg(errp, "Formatting LUKS disk requires parameter 'file'");
+        return -EINVAL;
     }

     create_opts = (QCryptoBlockCreateOptions) {
@@ -673,10 +673,17 @@ block_crypto_co_create_luks(BlockdevCreateOptions *create_options, Error **errp)
         preallocation = luks_opts->preallocation;
     }

-    ret = block_crypto_co_create_generic(bs, luks_opts->size, &create_opts,
-                                         preallocation, errp);
-    if (ret < 0) {
-        goto fail;
+    if (luks_opts->file) {
+        bs = bdrv_co_open_blockdev_ref(luks_opts->file, errp);
+        if (bs == NULL) {
+            return -EIO;
+        }
+
+        ret = block_crypto_co_create_generic(bs, luks_opts->size, &create_opts,
+                                             preallocation, errp);
+        if (ret < 0) {
+            goto fail;
+        }
     }

     ret = 0;
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 10be08d08f..9ac256c489 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4945,7 +4945,8 @@
 #
 # Driver specific image creation options for LUKS.
 #
-# @file: Node to create the image format on
+# @file: Node to create the image format on, mandatory except when
+#     'preallocation' is not requested
 #
 # @size: Size of the virtual disk in bytes
 #
@@ -4956,7 +4957,7 @@
 ##
 { 'struct': 'BlockdevCreateOptionsLUKS',
   'base': 'QCryptoBlockCreateOptionsLUKS',
-  'data': { 'file':             'BlockdevRef',
+  'data': { '*file':            'BlockdevRef',
             'size':             'size',
             '*preallocation':   'PreallocMode' } }
 
-- 
2.39.1




[PATCH RESEND v3 06/10] block: Support detached LUKS header creation using blockdev-create

2023-12-24 Thread Hyman Huang
A LUKS disk with a detached header consists of a separate LUKS
header and payload. This type of LUKS disk should be formatted
as follows:

1. add the secret used to lock/unlock the cipher stored in the
   detached LUKS header
$ virsh qemu-monitor-command vm '{"execute":"object-add",
> "arguments":{"qom-type": "secret", "id": "sec0", "data": "foo"}}'

2. create a header image of size 0
$ virsh qemu-monitor-command vm '{"execute":"blockdev-create",
> "arguments":{"job-id":"job0", "options":{"driver":"file",
> "filename":"/path/to/detached_luks_header.img", "size":0 }}}'

3. add a protocol blockdev node for the header
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments": {"driver":"file", "filename":
> "/path/to/detached_luks_header.img", "node-name":
> "detached-luks-header-storage"}}'

4. create a payload image of size 0
$ virsh qemu-monitor-command vm '{"execute":"blockdev-create",
> "arguments":{"job-id":"job1", "options":{"driver":"file",
> "filename":"/path/to/detached_luks_payload_raw.img", "size":0}}}'

5. add a protocol blockdev node for the payload
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments": {"driver":"file", "filename":
> "/path/to/detached_luks_payload_raw.img", "node-name":
> "luks-payload-raw-storage"}}'

6. format the disk with a size of 128 MiB
$ virsh qemu-monitor-command vm '{"execute":"blockdev-create",
> "arguments":{"job-id":"job2", "options":{"driver":"luks", "header":
> "detached-luks-header-storage", "file":"luks-payload-raw-storage",
> "size":134217728, "preallocation":"full", "key-secret":"sec0" }}}'
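
The option combinations accepted by this blockdev-create flow can be summarized as a small validation function. This is a sketch under assumptions. struct luks_create_opts and luks_check_create_opts() are hypothetical. The rules come only from the checks visible in this series: either 'header' or 'file' must be given, and a detached header requires 'header'. The second error message uses assumed wording, since the original string is cut off in this archive. The real check in block_crypto_co_create_luks() may enforce more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of the BlockdevCreateOptionsLUKS fields involved. */
struct luks_create_opts {
    const char *header;   /* node name of the detached header, or NULL */
    const char *file;     /* node name of the payload, or NULL */
    bool detached_header;
};

/* Returns NULL if the combination is acceptable, else an error message. */
static const char *luks_check_create_opts(const struct luks_create_opts *o)
{
    assert(o != NULL);
    if (o->header == NULL && o->file == NULL) {
        return "Either the parameter 'header' or 'file' should be specified";
    }
    if (o->detached_header && o->header == NULL) {
        /* Assumed wording: the original message is truncated. */
        return "Formatting a detached LUKS disk requires 'header'";
    }
    return NULL;
}
```

Step 6 above passes both 'header' and 'file', and this check accepts that combination.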

Signed-off-by: Hyman Huang 
---
 block/crypto.c  | 109 
 crypto/block-luks.c |   6 ++-
 crypto/block.c  |   1 +
 3 files changed, 106 insertions(+), 10 deletions(-)

diff --git a/block/crypto.c b/block/crypto.c
index 78fbe79c95..76cc8bda49 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -160,6 +160,48 @@ error:
 return ret;
 }
 
+static int coroutine_fn GRAPH_UNLOCKED
+block_crypto_co_format_luks_payload(BlockdevCreateOptionsLUKS *luks_opts,
+                                    Error **errp)
+{
+    BlockDriverState *bs = NULL;
+    BlockBackend *blk = NULL;
+    Error *local_error = NULL;
+    int ret;
+
+    if (luks_opts->size > INT64_MAX) {
+        return -EFBIG;
+    }
+
+    bs = bdrv_co_open_blockdev_ref(luks_opts->file, errp);
+    if (bs == NULL) {
+        return -EIO;
+    }
+
+    blk = blk_co_new_with_bs(bs, BLK_PERM_WRITE | BLK_PERM_RESIZE,
+                             BLK_PERM_ALL, errp);
+    if (!blk) {
+        ret = -EPERM;
+        goto fail;
+    }
+
+    ret = blk_truncate(blk, luks_opts->size, true,
+                       luks_opts->preallocation, 0, &local_error);
+    if (ret < 0) {
+        if (ret == -EFBIG) {
+            /* Replace the error message with a better one */
+            error_free(local_error);
+            error_setg(errp, "The requested file size is too large");
+        }
+        goto fail;
+    }
+
+    ret = 0;
+
+fail:
+    bdrv_co_unref(bs);
+    return ret;
+}
 
 static QemuOptsList block_crypto_runtime_opts_luks = {
 .name = "crypto",
@@ -651,6 +693,7 @@ static int coroutine_fn GRAPH_UNLOCKED
 block_crypto_co_create_luks(BlockdevCreateOptions *create_options, Error **errp)
 {
     BlockdevCreateOptionsLUKS *luks_opts;
+    BlockDriverState *hdr_bs = NULL;
     BlockDriverState *bs = NULL;
     QCryptoBlockCreateOptions create_opts;
     PreallocMode preallocation = PREALLOC_MODE_OFF;
@@ -659,8 +702,22 @@ block_crypto_co_create_luks(BlockdevCreateOptions *create_options, Error **errp)
     assert(create_options->driver == BLOCKDEV_DRIVER_LUKS);
     luks_opts = &create_options->u.luks;

-    if (luks_opts->file == NULL) {
-        error_setg(errp, "Formatting LUKS disk requires parameter 'file'");
+    if (luks_opts->header == NULL && luks_opts->file == NULL) {
+        error_setg(errp, "Either the parameter 'header' or 'file' should "
+                   "be specified");
+        return -EINVAL;
+    }
+
+    if (luks_opts->detached_header && luks_opts->header == NULL) {
+        error_setg(errp, "Formatting a detached LUKS disk requires "

[PATCH RESEND v3 02/10] crypto: Support generic LUKS encryption

2023-12-24 Thread Hyman Huang
Enhance the LUKS driver to support a detached LUKS header, which
makes generic encryption possible for any disk format that QEMU
supports.

Taking qcow2 as an example, generic LUKS encryption is used as
follows:

1. add a protocol blockdev node for the data disk
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-1-storage", "driver":"file",
> "filename":"/path/to/test_disk.qcow2"}}'

2. add a protocol blockdev node for the LUKS header
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-2-storage", "driver":"file",
> "filename": "/path/to/cipher.gluks" }}'

3. add the secret for decrypting the cipher stored in the LUKS
   header above
$ virsh qemu-monitor-command vm '{"execute":"object-add",
> "arguments":{"qom-type":"secret", "id":
> "libvirt-2-storage-secret0", "data":"abc123"}}'

4. add the qcow2 blockdev format node
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-1-format", "driver":"qcow2",
> "file":"libvirt-1-storage"}}'

5. add the LUKS blockdev format node, linking the qcow2 disk with
   the LUKS header through the "header" field
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-2-format", "driver":"luks",
> "file":"libvirt-1-format", "header":"libvirt-2-storage",
> "key-secret":"libvirt-2-storage-secret0"}}'

6. finally, add the virtio-blk device
$ virsh qemu-monitor-command vm '{"execute":"device_add",
> "arguments": {"num-queues":"1", "driver":"virtio-blk-pci",
> "drive": "libvirt-2-format", "id":"virtio-disk2"}}'
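
In step 5, 'file' points at the qcow2 format node and 'header' points at the raw node holding the LUKS header. The driver therefore has to read its metadata from the header child when one is given. The sketch below shows that selection. It uses hypothetical types (struct child, struct luks_node) in place of BdrvChild and BlockCrypto, and an in-memory buffer in place of bdrv_pread(). It mirrors the crypto->header ? crypto->header : bs->file expression in block_crypto_read_func().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a child node: a name and a byte buffer. */
struct child {
    const char *name;
    const unsigned char *data;
    size_t size;
};

struct luks_node {
    struct child *file;   /* always present: payload or whole image */
    struct child *header; /* optional: detached LUKS header */
};

/* Prefer the detached header child; fall back to the 'file' child. */
static struct child *luks_metadata_source(const struct luks_node *n)
{
    assert(n->file != NULL);
    return n->header ? n->header : n->file;
}

/* Copy header bytes from the selected child; returns bytes read or -1. */
static long luks_read_header(const struct luks_node *n, size_t offset,
                             void *buf, size_t len)
{
    const struct child *c = luks_metadata_source(n);
    if (offset > c->size || len > c->size - offset) {
        return -1; /* out of range */
    }
    memcpy(buf, c->data + offset, len);
    return (long)len;
}
```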

Starting a virtual machine (VM) with generic LUKS encryption works
much like the hot-plug flow above: the JSON arguments stay the same,
but the "blockdev-add"/"device_add" commands become the
"-blockdev"/"-device" command-line options.

Signed-off-by: Hyman Huang 
Message-Id: <910801f303da1601051479d3b7e5c2c6b4e01eb7.1701879996.git.yong.hu...@smartx.com>
---
 block/crypto.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/block/crypto.c b/block/crypto.c
index f82b13d32b..6063879bac 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -64,12 +64,14 @@ static int block_crypto_read_func(QCryptoBlock *block,
                                   Error **errp)
 {
     BlockDriverState *bs = opaque;
+    BlockCrypto *crypto = bs->opaque;
     ssize_t ret;

     GLOBAL_STATE_CODE();
     GRAPH_RDLOCK_GUARD_MAINLOOP();

-    ret = bdrv_pread(bs->file, offset, buflen, buf, 0);
+    ret = bdrv_pread(crypto->header ? crypto->header : bs->file,
+                     offset, buflen, buf, 0);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Could not read encryption header");
         return ret;
@@ -269,6 +271,7 @@ static int block_crypto_open_generic(QCryptoBlockFormat format,
     QCryptoBlockOpenOptions *open_opts = NULL;
     unsigned int cflags = 0;
     QDict *cryptoopts = NULL;
+    const char *hdr_bdref = qdict_get_try_str(options, "header");

     GLOBAL_STATE_CODE();

@@ -277,6 +280,15 @@ static int block_crypto_open_generic(QCryptoBlockFormat format,
         return ret;
     }

+    if (hdr_bdref) {
+        crypto->header = bdrv_open_child(NULL, options, "header", bs,
+                                         &child_of_bds, BDRV_CHILD_METADATA,
+                                         false, errp);
+        if (!crypto->header) {
+            return -EINVAL;
+        }
+    }
+
     GRAPH_RDLOCK_GUARD_MAINLOOP();

     bs->supported_write_flags = BDRV_REQ_FUA &
-- 
2.39.1




[PATCH RESEND v3 07/10] block: Support detached LUKS header creation using qemu-img

2023-12-24 Thread Hyman Huang
Add the 'detached-mode' option to request the creation of a
detached LUKS header. It is used as follows:
$ qemu-img create --object secret,id=sec0,data=abc123 -f luks
> -o cipher-alg=aes-256,cipher-mode=xts -o key-secret=sec0
> -o detached-mode=true header.luks
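
The size rules introduced by this option can be sketched with two hypothetical helpers, which are not QEMU functions. luks_needs_size() mirrors the condition added to bdrv_img_create(), where a size may be omitted only for the "luks" format with detached-mode=true. luks_effective_size() mirrors the detached_mode ? 0 : size argument passed to block_crypto_co_create_generic().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True if creation must fail when no size was given (size == -1). */
static bool luks_needs_size(const char *fmt, bool detached_mode)
{
    assert(fmt != NULL);
    return !(strcmp(fmt, "luks") == 0 && detached_mode);
}

/* Size handed to the format layer: a detached header has no payload. */
static int64_t luks_effective_size(bool detached_mode, int64_t size)
{
    return detached_mode ? 0 : size;
}
```

This is why the command above needs no size argument. By contrast, qemu-img create -f luks without detached-mode=true still fails with "Image creation needs a size parameter".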

Signed-off-by: Hyman Huang 
---
 block.c  | 5 -
 block/crypto.c   | 9 -
 block/crypto.h   | 8 
 qapi/crypto.json | 5 -
 4 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index bfb0861ec6..fa9ce36928 100644
--- a/block.c
+++ b/block.c
@@ -7517,7 +7517,10 @@ void bdrv_img_create(const char *filename, const char *fmt,
         goto out;
     }

-    if (size == -1) {
+    /* Parameter 'size' is not needed for detached LUKS header */
+    if (size == -1 &&
+        !(!strcmp(fmt, "luks") &&
+          qemu_opt_get_bool(opts, "detached-mode", false))) {
         error_setg(errp, "Image creation needs a size parameter");
         goto out;
     }
diff --git a/block/crypto.c b/block/crypto.c
index 76cc8bda49..812c3c28f5 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -229,6 +229,7 @@ static QemuOptsList block_crypto_create_opts_luks = {
         BLOCK_CRYPTO_OPT_DEF_LUKS_IVGEN_HASH_ALG(""),
         BLOCK_CRYPTO_OPT_DEF_LUKS_HASH_ALG(""),
         BLOCK_CRYPTO_OPT_DEF_LUKS_ITER_TIME(""),
+        BLOCK_CRYPTO_OPT_DEF_LUKS_DETACHED_MODE(""),
         { /* end of list */ }
     },
 };
@@ -793,6 +794,8 @@ block_crypto_co_create_opts_luks(BlockDriver *drv, const char *filename,
     PreallocMode prealloc;
     char *buf = NULL;
     int64_t size;
+    bool detached_mode =
+        qemu_opt_get_bool(opts, "detached-mode", false);
     int ret;
     Error *local_err = NULL;

@@ -832,8 +835,12 @@ block_crypto_co_create_opts_luks(BlockDriver *drv, const char *filename,
         goto fail;
     }

+    /* The detached_header defaults to true if detached-mode is specified */
+    create_opts->u.luks.detached_header = detached_mode ? true : false;
+
     /* Create format layer */
-    ret = block_crypto_co_create_generic(bs, size, create_opts, prealloc, errp);
+    ret = block_crypto_co_create_generic(bs, detached_mode ? 0 : size,
+                                         create_opts, prealloc, errp);
     if (ret < 0) {
         goto fail;
     }
diff --git a/block/crypto.h b/block/crypto.h
index 72e792c9af..bceefd45bd 100644
--- a/block/crypto.h
+++ b/block/crypto.h
@@ -41,6 +41,7 @@
 #define BLOCK_CRYPTO_OPT_LUKS_IVGEN_HASH_ALG "ivgen-hash-alg"
 #define BLOCK_CRYPTO_OPT_LUKS_HASH_ALG "hash-alg"
 #define BLOCK_CRYPTO_OPT_LUKS_ITER_TIME "iter-time"
+#define BLOCK_CRYPTO_OPT_LUKS_DETACHED_MODE "detached-mode"
 #define BLOCK_CRYPTO_OPT_LUKS_KEYSLOT "keyslot"
 #define BLOCK_CRYPTO_OPT_LUKS_STATE "state"
 #define BLOCK_CRYPTO_OPT_LUKS_OLD_SECRET "old-secret"
@@ -100,6 +101,13 @@
         .help = "Select new state of affected keyslots (active/inactive)",\
     }

+#define BLOCK_CRYPTO_OPT_DEF_LUKS_DETACHED_MODE(prefix)     \
+    {                                                       \
+        .name = prefix BLOCK_CRYPTO_OPT_LUKS_DETACHED_MODE, \
+        .type = QEMU_OPT_BOOL,                              \
+        .help = "Create a detached LUKS header",            \
+    }
+
 #define BLOCK_CRYPTO_OPT_DEF_LUKS_KEYSLOT(prefix)  \
     {                                              \
         .name = prefix BLOCK_CRYPTO_OPT_LUKS_KEYSLOT,  \
diff --git a/qapi/crypto.json b/qapi/crypto.json
index 6b4e86cb81..8e81aa8454 100644
--- a/qapi/crypto.json
+++ b/qapi/crypto.json
@@ -226,6 +226,8 @@
 # @iter-time: number of milliseconds to spend in PBKDF passphrase
 #     processing.  Currently defaults to 2000. (since 2.8)
 #
+# @detached-mode: create a detached LUKS header. (since 9.0)
+#
 # Since: 2.6
 ##
 { 'struct': 'QCryptoBlockCreateOptionsLUKS',
@@ -235,7 +237,8 @@
             '*ivgen-alg': 'QCryptoIVGenAlgorithm',
             '*ivgen-hash-alg': 'QCryptoHashAlgorithm',
             '*hash-alg': 'QCryptoHashAlgorithm',
-            '*iter-time': 'int'}}
+            '*iter-time': 'int',
+            '*detached-mode': 'bool'}}
 
 ##
 # @QCryptoBlockOpenOptions:
-- 
2.39.1




[v3 04/10] crypto: Introduce creation option and structure for detached LUKS header

2023-12-24 Thread Hyman Huang
Introduce a 'header' field in BlockdevCreateOptionsLUKS to support
detached LUKS header creation. In addition, introduce header-related
fields in QCryptoBlock.
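
The new detached_header_size field is described as the LUKS header size plus the key slot size. The worked sketch below follows the comment in qcrypto_block_luks_create(): the header area is rounded up to a sector, and each key slot's key material region is also rounded up to a sector. The inputs are assumptions for illustration only: 512-byte sectors, 8 key slots (the LUKS1 count), a 4096-byte header area, 4000 anti-forensic stripes and a 32-byte master key. The function names are hypothetical, and QEMU's real computation may apply additional alignment.

```c
#include <assert.h>
#include <stdint.h>

#define LUKS_SECTOR_SIZE 512u  /* assumed, as QCRYPTO_BLOCK_LUKS_SECTOR_SIZE */
#define LUKS_NUM_KEY_SLOTS 8u  /* LUKS1 defines eight key slots */

static uint64_t div_round_up(uint64_t n, uint64_t d)
{
    assert(d != 0);
    return (n + d - 1) / d;
}

/*
 * Sectors occupied by the header area plus the key material of every
 * slot; for an attached header this is also the payload offset.
 */
static uint64_t luks_header_sectors_total(uint64_t header_bytes,
                                          uint64_t master_key_len,
                                          uint64_t stripes)
{
    uint64_t header_sectors = div_round_up(header_bytes, LUKS_SECTOR_SIZE);
    uint64_t split_key_sectors =
        div_round_up(master_key_len * stripes, LUKS_SECTOR_SIZE);
    return header_sectors + LUKS_NUM_KEY_SLOTS * split_key_sectors;
}
```

With these assumed inputs, the header and key material occupy 2008 sectors, that is 1028096 bytes, just under 1 MiB.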

Signed-off-by: Hyman Huang 
---
 crypto/blockpriv.h   | 3 +++
 qapi/block-core.json | 3 +++
 qapi/crypto.json | 5 -
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/crypto/blockpriv.h b/crypto/blockpriv.h
index 3c7ccea504..6289aea961 100644
--- a/crypto/blockpriv.h
+++ b/crypto/blockpriv.h
@@ -42,6 +42,9 @@ struct QCryptoBlock {
     size_t niv;
     uint64_t payload_offset; /* In bytes */
     uint64_t sector_size; /* In bytes */
+
+    bool detached_header; /* True if disk has a detached LUKS header */
+    uint64_t detached_header_size; /* LUKS header size plus key slot size */
 };
 
 struct QCryptoBlockDriver {
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 9ac256c489..8aec179926 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4948,6 +4948,8 @@
 # @file: Node to create the image format on, mandatory except when
 #     'preallocation' is not requested
 #
+# @header: Detached LUKS header node to format. (since 9.0)
+#
 # @size: Size of the virtual disk in bytes
 #
 # @preallocation: Preallocation mode for the new image (since: 4.2)
@@ -4958,6 +4960,7 @@
 { 'struct': 'BlockdevCreateOptionsLUKS',
   'base': 'QCryptoBlockCreateOptionsLUKS',
   'data': { '*file':            'BlockdevRef',
+            '*header':          'BlockdevRef',
             'size':             'size',
             '*preallocation':   'PreallocMode' } }
 
diff --git a/qapi/crypto.json b/qapi/crypto.json
index fd3d46ebd1..6b4e86cb81 100644
--- a/qapi/crypto.json
+++ b/qapi/crypto.json
@@ -195,10 +195,13 @@
 # decryption key.  Mandatory except when probing image for
 # metadata only.
 #
+# @detached-header: if true, disk has detached LUKS header.
+#
 # Since: 2.6
 ##
 { 'struct': 'QCryptoBlockOptionsLUKS',
-  'data': { '*key-secret': 'str' }}
+  'data': { '*key-secret': 'str',
+            '*detached-header': 'bool' }}
 
 ##
 # @QCryptoBlockCreateOptionsLUKS:
-- 
2.39.1




[v3 06/10] block: Support detached LUKS header creation using blockdev-create

2023-12-24 Thread Hyman Huang
The LUKS disk with detached header consists of a separate LUKS
header and payload. This LUKS disk type should be formatted
as follows:

1. add the secret to lock/unlock the cipher stored in the
   detached LUKS header
$ virsh qemu-monitor-command vm '{"execute":"object-add",
> "arguments":{"qom-type": "secret", "id": "sec0", "data": "foo"}}'

2. create a header img with 0 size
$ virsh qemu-monitor-command vm '{"execute":"blockdev-create",
> "arguments":{"job-id":"job0", "options":{"driver":"file",
> "filename":"/path/to/detached_luks_header.img", "size":0 }}}'

3. add protocol blockdev node for header
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments": {"driver":"file", "filename":
> "/path/to/detached_luks_header.img", "node-name":
> "detached-luks-header-storage"}}'

4. create a payload img with 0 size
$ virsh qemu-monitor-command vm '{"execute":"blockdev-create",
> "arguments":{"job-id":"job1", "options":{"driver":"file",
> "filename":"/path/to/detached_luks_payload_raw.img", "size":0}}}'

5. add protocol blockdev node for payload
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments": {"driver":"file", "filename":
> "/path/to/detached_luks_payload_raw.img", "node-name":
> "luks-payload-raw-storage"}}'

6. format the LUKS disk with a 128 MiB payload
$ virsh qemu-monitor-command vm '{"execute":"blockdev-create",
> "arguments":{"job-id":"job2", "options":{"driver":"luks", "header":
> "detached-luks-header-storage", "file":"luks-payload-raw-storage",
> "size":134217728, "preallocation":"full", "key-secret":"sec0" }}}'
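The sequence above passes both "header" and "file" to blockdev-create. The two option checks that the patch adds at the top of block_crypto_co_create_luks() can be summarised by the minimal sketch below. LuksCreateOpts and luks_create_opts_check() are hypothetical names, not QEMU APIs, and the return value of the second check is an assumption because that hunk is cut off in this archive.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant BlockdevCreateOptionsLUKS fields. */
typedef struct {
    const char *header;   /* node that receives the detached header, or NULL */
    const char *file;     /* payload node, or NULL */
    bool detached_header; /* true when creating a detached header */
} LuksCreateOpts;

/*
 * Return 0 if the option combination is acceptable, -EINVAL otherwise.
 * Mirrors the two checks the patch adds (second error code assumed).
 */
static int luks_create_opts_check(const LuksCreateOpts *o)
{
    /* At least one node must be given, or there is nothing to format. */
    if (o->header == NULL && o->file == NULL) {
        return -EINVAL;
    }
    /* A detached header needs a node of its own to be written to. */
    if (o->detached_header && o->header == NULL) {
        return -EINVAL;
    }
    return 0;
}
```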

Signed-off-by: Hyman Huang 
---
 block/crypto.c  | 109 
 crypto/block-luks.c |   6 ++-
 crypto/block.c  |   1 +
 3 files changed, 106 insertions(+), 10 deletions(-)

diff --git a/block/crypto.c b/block/crypto.c
index 78fbe79c95..76cc8bda49 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -160,6 +160,48 @@ error:
 return ret;
 }
 
+static int coroutine_fn GRAPH_UNLOCKED
+block_crypto_co_format_luks_payload(BlockdevCreateOptionsLUKS *luks_opts,
+Error **errp)
+{
+BlockDriverState *bs = NULL;
+BlockBackend *blk = NULL;
+Error *local_error = NULL;
+int ret;
+
+if (luks_opts->size > INT64_MAX) {
+return -EFBIG;
+}
+
+bs = bdrv_co_open_blockdev_ref(luks_opts->file, errp);
+if (bs == NULL) {
+return -EIO;
+}
+
+blk = blk_co_new_with_bs(bs, BLK_PERM_WRITE | BLK_PERM_RESIZE,
+ BLK_PERM_ALL, errp);
+if (!blk) {
+ret = -EPERM;
+goto fail;
+}
+
+ret = blk_truncate(blk, luks_opts->size, true,
+   luks_opts->preallocation, 0, &local_error);
+if (ret < 0) {
+if (ret == -EFBIG) {
+/* Replace the error message with a better one */
+error_free(local_error);
+error_setg(errp, "The requested file size is too large");
+}
+goto fail;
+}
+
+ret = 0;
+
+fail:
+bdrv_co_unref(bs);
+return ret;
+}
 
 static QemuOptsList block_crypto_runtime_opts_luks = {
 .name = "crypto",
@@ -651,6 +693,7 @@ static int coroutine_fn GRAPH_UNLOCKED
 block_crypto_co_create_luks(BlockdevCreateOptions *create_options, Error **errp)
 {
 BlockdevCreateOptionsLUKS *luks_opts;
+BlockDriverState *hdr_bs = NULL;
 BlockDriverState *bs = NULL;
 QCryptoBlockCreateOptions create_opts;
 PreallocMode preallocation = PREALLOC_MODE_OFF;
@@ -659,8 +702,22 @@ block_crypto_co_create_luks(BlockdevCreateOptions *create_options, Error **errp)
 assert(create_options->driver == BLOCKDEV_DRIVER_LUKS);
 luks_opts = &create_options->u.luks;
 
-if (luks_opts->file == NULL) {
-error_setg(errp, "Formatting LUKS disk requires parameter 'file'");
+if (luks_opts->header == NULL && luks_opts->file == NULL) {
+error_setg(errp, "Either the parameter 'header' or 'file' should "
+   "be specified");
+return -EINVAL;
+}
+
+if (luks_opts->detached_header && luks_opts->header == NULL) {
+error_setg(errp, "Formatting a detached LUKS disk requires "

[v3 03/10] qapi: Make parameter 'file' optional for BlockdevCreateOptionsLUKS

2023-12-24 Thread Hyman Huang
To support detached LUKS header creation, make the existing 'file'
field in BlockdevCreateOptionsLUKS optional, while also adding an
extra optional 'header' field in the next commit.

Signed-off-by: Hyman Huang 
---
 block/crypto.c   | 21 ++---
 qapi/block-core.json |  5 +++--
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/block/crypto.c b/block/crypto.c
index 6063879bac..78fbe79c95 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -659,9 +659,9 @@ block_crypto_co_create_luks(BlockdevCreateOptions *create_options, Error **errp)
 assert(create_options->driver == BLOCKDEV_DRIVER_LUKS);
 luks_opts = &create_options->u.luks;
 
-bs = bdrv_co_open_blockdev_ref(luks_opts->file, errp);
-if (bs == NULL) {
-return -EIO;
+if (luks_opts->file == NULL) {
+error_setg(errp, "Formatting LUKS disk requires parameter 'file'");
+return -EINVAL;
 }
 
 create_opts = (QCryptoBlockCreateOptions) {
@@ -673,10 +673,17 @@ block_crypto_co_create_luks(BlockdevCreateOptions *create_options, Error **errp)
 preallocation = luks_opts->preallocation;
 }
 
-ret = block_crypto_co_create_generic(bs, luks_opts->size, &create_opts,
- preallocation, errp);
-if (ret < 0) {
-goto fail;
+if (luks_opts->file) {
+bs = bdrv_co_open_blockdev_ref(luks_opts->file, errp);
+if (bs == NULL) {
+return -EIO;
+}
+
+ret = block_crypto_co_create_generic(bs, luks_opts->size, &create_opts,
+ preallocation, errp);
+if (ret < 0) {
+goto fail;
+}
 }
 
 ret = 0;
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 10be08d08f..9ac256c489 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4945,7 +4945,8 @@
 #
 # Driver specific image creation options for LUKS.
 #
-# @file: Node to create the image format on
+# @file: Node to create the image format on, mandatory except when
+#'preallocation' is not requested
 #
 # @size: Size of the virtual disk in bytes
 #
@@ -4956,7 +4957,7 @@
 ##
 { 'struct': 'BlockdevCreateOptionsLUKS',
   'base': 'QCryptoBlockCreateOptionsLUKS',
-  'data': { 'file': 'BlockdevRef',
+  'data': { '*file':'BlockdevRef',
 'size': 'size',
 '*preallocation':   'PreallocMode' } }
 
-- 
2.39.1




[v3 08/10] crypto: Introduce 'detached-header' field in QCryptoBlockInfoLUKS

2023-12-24 Thread Hyman Huang
When querying the LUKS disk with the qemu-img tool or other APIs,
add information about whether the LUKS header is detached.

Additionally, update the test case with the appropriate
modification.
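In this patch the flag is not stored on disk; qcrypto_block_luks_open() infers it from a zero payload offset, and qcrypto_block_luks_get_info() copies it into the info structure. A minimal sketch of that inference and of the resulting info line follows. The helper names are hypothetical, the offset is assumed to be already converted to bytes, and the leading indentation of the real qemu-img output is not reproduced.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * A detached header is inferred from a zero payload offset (in bytes):
 * the payload then starts at byte 0 of the 'file' node.
 */
static bool luks_header_is_detached(uint64_t payload_offset)
{
    return payload_offset == 0;
}

/* Render the flag like the 210.out lines; returns snprintf()'s result. */
static int luks_format_detached_line(char *buf, size_t len,
                                     uint64_t payload_offset)
{
    return snprintf(buf, len, "detached header: %s",
                    luks_header_is_detached(payload_offset) ? "true" : "false");
}
```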

Signed-off-by: Hyman Huang 
---
 crypto/block-luks.c| 2 ++
 qapi/crypto.json   | 3 +++
 tests/qemu-iotests/210.out | 4 
 3 files changed, 9 insertions(+)

diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index 474c7aee2e..c5e53b4ee4 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -1266,6 +1266,7 @@ qcrypto_block_luks_open(QCryptoBlock *block,
 block->sector_size = QCRYPTO_BLOCK_LUKS_SECTOR_SIZE;
 block->payload_offset =
 qcrypto_block_luks_payload_offset(luks->header.payload_offset_sector);
+block->detached_header = (block->payload_offset == 0) ? true : false;
 
 return 0;
 
@@ -1892,6 +1893,7 @@ static int qcrypto_block_luks_get_info(QCryptoBlock *block,
 info->u.luks.master_key_iters = luks->header.master_key_iterations;
 info->u.luks.uuid = g_strndup((const char *)luks->header.uuid,
   sizeof(luks->header.uuid));
+info->u.luks.detached_header = block->detached_header;
 
 for (i = 0; i < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS; i++) {
 slot = g_new0(QCryptoBlockInfoLUKSSlot, 1);
diff --git a/qapi/crypto.json b/qapi/crypto.json
index 8e81aa8454..336c880b5d 100644
--- a/qapi/crypto.json
+++ b/qapi/crypto.json
@@ -317,6 +317,8 @@
 #
 # @hash-alg: the master key hash algorithm
 #
+# @detached-header: whether the LUKS header is detached (Since 9.0)
+#
 # @payload-offset: offset to the payload data in bytes
 #
 # @master-key-iters: number of PBKDF2 iterations for key material
@@ -333,6 +335,7 @@
'ivgen-alg': 'QCryptoIVGenAlgorithm',
'*ivgen-hash-alg': 'QCryptoHashAlgorithm',
'hash-alg': 'QCryptoHashAlgorithm',
+   'detached-header': 'bool',
'payload-offset': 'int',
'master-key-iters': 'int',
'uuid': 'str',
diff --git a/tests/qemu-iotests/210.out b/tests/qemu-iotests/210.out
index 96d9f749dd..94b29b2120 100644
--- a/tests/qemu-iotests/210.out
+++ b/tests/qemu-iotests/210.out
@@ -18,6 +18,7 @@ virtual size: 128 MiB (134217728 bytes)
 encrypted: yes
 Format specific information:
 ivgen alg: plain64
+detached header: false
 hash alg: sha256
 cipher alg: aes-256
 uuid: ----
@@ -70,6 +71,7 @@ virtual size: 64 MiB (67108864 bytes)
 encrypted: yes
 Format specific information:
 ivgen alg: plain64
+detached header: false
 hash alg: sha1
 cipher alg: aes-128
 uuid: ----
@@ -125,6 +127,7 @@ virtual size: 0 B (0 bytes)
 encrypted: yes
 Format specific information:
 ivgen alg: plain64
+detached header: false
 hash alg: sha256
 cipher alg: aes-256
 uuid: ----
@@ -195,6 +198,7 @@ virtual size: 0 B (0 bytes)
 encrypted: yes
 Format specific information:
 ivgen alg: plain64
+detached header: false
 hash alg: sha256
 cipher alg: aes-256
 uuid: ----
-- 
2.39.1




[v3 02/10] crypto: Support generic LUKS encryption

2023-12-24 Thread Hyman Huang
By enhancing the LUKS driver, it is possible to enable
the detachable LUKS header and, as a result, achieve
general encryption for any disk format that QEMU
supports.

Taking qcow2 as an example, generic LUKS encryption
is used as follows:

1. add a protocol blockdev node of data disk
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-1-storage", "driver":"file",
> "filename":"/path/to/test_disk.qcow2"}}'

2. add a protocol blockdev node for the LUKS header, as in step 1
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-2-storage", "driver":"file",
> "filename": "/path/to/cipher.gluks" }}'

3. add the secret for decrypting the cipher stored in LUKS
   header above
$ virsh qemu-monitor-command vm '{"execute":"object-add",
> "arguments":{"qom-type":"secret", "id":
> "libvirt-2-storage-secret0", "data":"abc123"}}'

4. add the qcow2 format blockdev node
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-1-format", "driver":"qcow2",
> "file":"libvirt-1-storage"}}'

5. add the luks format blockdev node, linking the qcow2 disk with
   the LUKS header via the "header" field
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-2-format", "driver":"luks",
> "file":"libvirt-1-format", "header":"libvirt-2-storage",
> "key-secret":"libvirt-2-storage-secret0"}}'

6. finally, add the virtio-blk device
$ virsh qemu-monitor-command vm '{"execute":"device_add",
> "arguments": {"num-queues":"1", "driver":"virtio-blk-pci",
> "drive": "libvirt-2-format", "id":"virtio-disk2"}}'

Starting a virtual machine (VM) with generic LUKS
encryption works much like hot-plug: the same JSON
arguments are used, except that the "blockdev-add" and
"device_add" commands become the "-blockdev" and
"-device" command-line options.
Signed-off-by: Hyman Huang 
Message-Id: 
<910801f303da1601051479d3b7e5c2c6b4e01eb7.1701879996.git.yong.hu...@smartx.com>
---
 block/crypto.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/block/crypto.c b/block/crypto.c
index f82b13d32b..6063879bac 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -64,12 +64,14 @@ static int block_crypto_read_func(QCryptoBlock *block,
   Error **errp)
 {
 BlockDriverState *bs = opaque;
+BlockCrypto *crypto = bs->opaque;
 ssize_t ret;
 
 GLOBAL_STATE_CODE();
 GRAPH_RDLOCK_GUARD_MAINLOOP();
 
-ret = bdrv_pread(bs->file, offset, buflen, buf, 0);
+ret = bdrv_pread(crypto->header ? crypto->header : bs->file,
+ offset, buflen, buf, 0);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "Could not read encryption header");
 return ret;
@@ -269,6 +271,7 @@ static int block_crypto_open_generic(QCryptoBlockFormat format,
 QCryptoBlockOpenOptions *open_opts = NULL;
 unsigned int cflags = 0;
 QDict *cryptoopts = NULL;
+const char *hdr_bdref = qdict_get_try_str(options, "header");
 
 GLOBAL_STATE_CODE();
 
@@ -277,6 +280,15 @@ static int block_crypto_open_generic(QCryptoBlockFormat format,
 return ret;
 }
 
+if (hdr_bdref) {
+crypto->header = bdrv_open_child(NULL, options, "header", bs,
+ &child_of_bds, BDRV_CHILD_METADATA,
+ false, errp);
+if (!crypto->header) {
+return -EINVAL;
+}
+}
+
 GRAPH_RDLOCK_GUARD_MAINLOOP();
 
 bs->supported_write_flags = BDRV_REQ_FUA &
-- 
2.39.1




[v3 05/10] crypto: Mark the payload_offset_sector invalid for detached LUKS header

2023-12-24 Thread Hyman Huang
Set payload_offset_sector to a value that should never occur in
practice (UINT32_MAX) to mark it as invalid, indicating that read/write
operations on the 'file' protocol blockdev node start at offset 0.
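The sentinel logic can be sketched as follows. The sketch assumes a 512-byte LUKS sector and the 8 key slots of LUKS1, and the header and key-material sector counts in the usage are made-up inputs; luks_payload_offset_sector() and luks_payload_offset_bytes() are hypothetical names. Unlike the patch's helper, which returns a uint32_t and so cannot represent byte offsets of 4 GiB or more, the sketch widens to 64 bits before multiplying.

```c
#include <stdbool.h>
#include <stdint.h>

#define LUKS_SECTOR_SIZE 512u          /* assumed QCRYPTO_BLOCK_LUKS_SECTOR_SIZE */
#define LUKS_NUM_KEY_SLOTS 8u          /* LUKS1 defines 8 key slots */
#define INVALID_SECTOR_OFFSET UINT32_MAX

/* Value written to the header's payload_offset_sector field at creation. */
static uint32_t luks_payload_offset_sector(bool detached,
                                           uint32_t header_sectors,
                                           uint32_t split_key_sectors)
{
    if (detached) {
        return INVALID_SECTOR_OFFSET;  /* sentinel: payload lives elsewhere */
    }
    /* Header plus the key material of every slot, all in sectors. */
    return header_sectors + LUKS_NUM_KEY_SLOTS * split_key_sectors;
}

/* Byte offset of the payload inside the 'file' node. */
static uint64_t luks_payload_offset_bytes(uint32_t sector)
{
    if (sector == INVALID_SECTOR_OFFSET) {
        return 0;  /* detached header: payload starts at byte 0 */
    }
    return (uint64_t)sector * LUKS_SECTOR_SIZE;  /* widen before multiplying */
}
```

For example, with 8 header sectors and 504 sectors of key material per slot, an attached header gives 4040 sectors, i.e. a 2068480-byte payload offset, while a detached one maps to offset 0.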

Signed-off-by: Hyman Huang 
---
 crypto/block-luks.c | 41 +++--
 1 file changed, 31 insertions(+), 10 deletions(-)

diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index fb01ec38bb..48443ffcae 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -34,6 +34,8 @@
 
 #include "qemu/bitmap.h"
 
+#define INVALID_SECTOR_OFFSET UINT32_MAX
+
 /*
  * Reference for the LUKS format implemented here is
  *
@@ -136,6 +138,13 @@ struct QCryptoBlockLUKS {
 };
 
 
+static inline uint32_t
+qcrypto_block_luks_payload_offset(uint32_t sector)
+{
+return sector == INVALID_SECTOR_OFFSET ? 0 :
+sector * QCRYPTO_BLOCK_LUKS_SECTOR_SIZE;
+}
+
 static int qcrypto_block_luks_cipher_name_lookup(const char *name,
  QCryptoCipherMode mode,
  uint32_t key_bytes,
@@ -1255,8 +1264,8 @@ qcrypto_block_luks_open(QCryptoBlock *block,
 }
 
 block->sector_size = QCRYPTO_BLOCK_LUKS_SECTOR_SIZE;
-block->payload_offset = luks->header.payload_offset_sector *
-block->sector_size;
+block->payload_offset =
+qcrypto_block_luks_payload_offset(luks->header.payload_offset_sector);
 
 return 0;
 
@@ -1529,16 +1538,28 @@ qcrypto_block_luks_create(QCryptoBlock *block,
 slot->stripes = QCRYPTO_BLOCK_LUKS_STRIPES;
 }
 
-/* The total size of the LUKS headers is the partition header + key
- * slot headers, rounded up to the nearest sector, combined with
- * the size of each master key material region, also rounded up
- * to the nearest sector */
-luks->header.payload_offset_sector = header_sectors +
-QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS * split_key_sectors;
+if (block->detached_header) {
+/*
+ * Set payload_offset_sector to a value that should never occur in
+ * practice to mark it as invalid, indicating that read/write
+ * operations on the 'file' protocol blockdev node start at offset 0.
+ * UINT32_MAX is chosen here.
+ */
+luks->header.payload_offset_sector = INVALID_SECTOR_OFFSET;
+} else {
+/*
+ * The total size of the LUKS headers is the partition header + key
+ * slot headers, rounded up to the nearest sector, combined with
+ * the size of each master key material region, also rounded up
+ * to the nearest sector
+ */
+luks->header.payload_offset_sector = header_sectors +
+QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS * split_key_sectors;
+}
 
 block->sector_size = QCRYPTO_BLOCK_LUKS_SECTOR_SIZE;
-block->payload_offset = luks->header.payload_offset_sector *
-block->sector_size;
+block->payload_offset =
+qcrypto_block_luks_payload_offset(luks->header.payload_offset_sector);
 
 /* Reserve header space to match payload offset */
 initfunc(block, block->payload_offset, opaque, &local_err);
-- 
2.39.1




[v3 07/10] block: Support detached LUKS header creation using qemu-img

2023-12-24 Thread Hyman Huang
Add the 'detached-mode' option to specify the creation of
a detached LUKS header. This is how it is used:
$ qemu-img create --object secret,id=sec0,data=abc123 -f luks
> -o cipher-alg=aes-256,cipher-mode=xts -o key-secret=sec0
> -o detached-mode=true header.luks

Signed-off-by: Hyman Huang 
---
 block.c  | 5 -
 block/crypto.c   | 9 -
 block/crypto.h   | 8 
 qapi/crypto.json | 5 -
 4 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index bfb0861ec6..fa9ce36928 100644
--- a/block.c
+++ b/block.c
@@ -7517,7 +7517,10 @@ void bdrv_img_create(const char *filename, const char *fmt,
 goto out;
 }
 
-if (size == -1) {
+/* Parameter 'size' is not needed for detached LUKS header */
+if (size == -1 &&
+!(!strcmp(fmt, "luks") &&
+  qemu_opt_get_bool(opts, "detached-mode", false))) {
 error_setg(errp, "Image creation needs a size parameter");
 goto out;
 }
diff --git a/block/crypto.c b/block/crypto.c
index 76cc8bda49..812c3c28f5 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -229,6 +229,7 @@ static QemuOptsList block_crypto_create_opts_luks = {
 BLOCK_CRYPTO_OPT_DEF_LUKS_IVGEN_HASH_ALG(""),
 BLOCK_CRYPTO_OPT_DEF_LUKS_HASH_ALG(""),
 BLOCK_CRYPTO_OPT_DEF_LUKS_ITER_TIME(""),
+BLOCK_CRYPTO_OPT_DEF_LUKS_DETACHED_MODE(""),
 { /* end of list */ }
 },
 };
@@ -793,6 +794,8 @@ block_crypto_co_create_opts_luks(BlockDriver *drv, const char *filename,
 PreallocMode prealloc;
 char *buf = NULL;
 int64_t size;
+bool detached_mode =
+qemu_opt_get_bool(opts, "detached-mode", false);
 int ret;
 Error *local_err = NULL;
 
@@ -832,8 +835,12 @@ block_crypto_co_create_opts_luks(BlockDriver *drv, const char *filename,
 goto fail;
 }
 
+/* detached_header defaults to true when detached-mode is specified */
+create_opts->u.luks.detached_header = detached_mode ? true : false;
+
 /* Create format layer */
-ret = block_crypto_co_create_generic(bs, size, create_opts, prealloc, errp);
+ret = block_crypto_co_create_generic(bs, detached_mode ? 0 : size,
+ create_opts, prealloc, errp);
 if (ret < 0) {
 goto fail;
 }
diff --git a/block/crypto.h b/block/crypto.h
index 72e792c9af..bceefd45bd 100644
--- a/block/crypto.h
+++ b/block/crypto.h
@@ -41,6 +41,7 @@
 #define BLOCK_CRYPTO_OPT_LUKS_IVGEN_HASH_ALG "ivgen-hash-alg"
 #define BLOCK_CRYPTO_OPT_LUKS_HASH_ALG "hash-alg"
 #define BLOCK_CRYPTO_OPT_LUKS_ITER_TIME "iter-time"
+#define BLOCK_CRYPTO_OPT_LUKS_DETACHED_MODE "detached-mode"
 #define BLOCK_CRYPTO_OPT_LUKS_KEYSLOT "keyslot"
 #define BLOCK_CRYPTO_OPT_LUKS_STATE "state"
 #define BLOCK_CRYPTO_OPT_LUKS_OLD_SECRET "old-secret"
@@ -100,6 +101,13 @@
 .help = "Select new state of affected keyslots (active/inactive)",\
 }
 
+#define BLOCK_CRYPTO_OPT_DEF_LUKS_DETACHED_MODE(prefix) \
+{ \
+.name = prefix BLOCK_CRYPTO_OPT_LUKS_DETACHED_MODE, \
+.type = QEMU_OPT_BOOL,\
+.help = "Create a detached LUKS header",  \
+}
+
 #define BLOCK_CRYPTO_OPT_DEF_LUKS_KEYSLOT(prefix)  \
 {  \
 .name = prefix BLOCK_CRYPTO_OPT_LUKS_KEYSLOT,  \
diff --git a/qapi/crypto.json b/qapi/crypto.json
index 6b4e86cb81..8e81aa8454 100644
--- a/qapi/crypto.json
+++ b/qapi/crypto.json
@@ -226,6 +226,8 @@
 # @iter-time: number of milliseconds to spend in PBKDF passphrase
 # processing.  Currently defaults to 2000. (since 2.8)
 #
+# @detached-mode: create a detached LUKS header. (since 9.0)
+#
 # Since: 2.6
 ##
 { 'struct': 'QCryptoBlockCreateOptionsLUKS',
@@ -235,7 +237,8 @@
 '*ivgen-alg': 'QCryptoIVGenAlgorithm',
 '*ivgen-hash-alg': 'QCryptoHashAlgorithm',
 '*hash-alg': 'QCryptoHashAlgorithm',
-'*iter-time': 'int'}}
+'*iter-time': 'int',
+'*detached-mode': 'bool'}}
 
 ##
 # @QCryptoBlockOpenOptions:
-- 
2.39.1




[v3 10/10] MAINTAINERS: Add section "Detached LUKS header"

2023-12-24 Thread Hyman Huang
I have developed an interest in block cryptography and have
been working on projects related to this subsystem.

Add a "Detached LUKS header" section to the MAINTAINERS file;
it currently covers only the test case.

Signed-off-by: Hyman Huang 
---
 MAINTAINERS | 5 +
 1 file changed, 5 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 395f26ba86..f0f7b889a3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3391,6 +3391,11 @@ F: migration/dirtyrate.c
 F: migration/dirtyrate.h
 F: include/sysemu/dirtyrate.h
 
+Detached LUKS header
+M: Hyman Huang 
+S: Maintained
+F: tests/qemu-iotests/tests/luks-detached-header
+
 D-Bus
 M: Marc-André Lureau 
 S: Maintained
-- 
2.39.1




[v3 00/10] Support generic Luks encryption

2023-12-24 Thread Hyman Huang
v3:
- Rebase on master
- Add a test case for detached LUKS header
- Adjust the design to honour preallocation of the payload device
- Adjust the design to honour the payload offset from the header,
  even when detached
- Support detached LUKS header creation using qemu-img
- Support detached LUKS header querying
- Clean up code

Hyman Huang (10):
  crypto: Introduce option and structure for detached LUKS header
  crypto: Support generic LUKS encryption
  qapi: Make parameter 'file' optional for BlockdevCreateOptionsLUKS
  crypto: Introduce creation option and structure for detached LUKS
header
  crypto: Mark the payload_offset_sector invalid for detached LUKS
header
  block: Support detached LUKS header creation using blockdev-create
  block: Support detached LUKS header creation using qemu-img
  crypto: Introduce 'detached-header' field in QCryptoBlockInfoLUKS
  tests: Add detached LUKS header case
  MAINTAINERS: Add section "Detached LUKS header"

 MAINTAINERS   |   5 +
 block.c   |   5 +-
 block/crypto.c| 146 ++--
 block/crypto.h|   8 +
 crypto/block-luks.c   |  49 +++-
 crypto/block.c|   1 +
 crypto/blockpriv.h|   3 +
 qapi/block-core.json  |  14 +-
 qapi/crypto.json  |  13 +-
 tests/qemu-iotests/210.out|   4 +
 tests/qemu-iotests/tests/luks-detached-header | 214 ++
 .../tests/luks-detached-header.out|   5 +
 12 files changed, 436 insertions(+), 31 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/luks-detached-header
 create mode 100644 tests/qemu-iotests/tests/luks-detached-header.out

-- 
2.39.1




[v3 01/10] crypto: Introduce option and structure for detached LUKS header

2023-12-24 Thread Hyman Huang
Add the "header" option for the LUKS format. This field identifies
the blockdev node in which a detached LUKS header is stored.

In addition, introduce a 'header' field in struct BlockCrypto.

Signed-off-by: Hyman Huang 
Reviewed-by: Daniel P. Berrangé 
Message-Id: 
<5b99f60c7317092a563d7ca3fb4b414197015eb2.1701879996.git.yong.hu...@smartx.com>
---
 block/crypto.c   | 1 +
 qapi/block-core.json | 6 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/crypto.c b/block/crypto.c
index 921933a5e5..f82b13d32b 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -39,6 +39,7 @@ typedef struct BlockCrypto BlockCrypto;
 struct BlockCrypto {
 QCryptoBlock *block;
 bool updating_keys;
+BdrvChild *header;  /* Reference to the detached LUKS header */
 };
 
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index ca390c5700..10be08d08f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3352,11 +3352,15 @@
 # decryption key (since 2.6). Mandatory except when doing a
 # metadata-only probe of the image.
 #
+# @header: optional reference to the location of a blockdev
+# storing a detached LUKS header. (since 9.0)
+#
 # Since: 2.9
 ##
 { 'struct': 'BlockdevOptionsLUKS',
   'base': 'BlockdevOptionsGenericFormat',
-  'data': { '*key-secret': 'str' } }
+  'data': { '*key-secret': 'str',
+'*header': 'BlockdevRef'} }
 
 ##
 # @BlockdevOptionsGenericCOWFormat:
-- 
2.39.1




[PATCH v6] crypto: Introduce SM4 symmetric cipher algorithm

2023-12-07 Thread Hyman Huang
Introduce the SM4 cipher algorithm (OSCCA GB/T 32907-2016).

SM4 (GB/T 32907-2016) is a cryptographic standard issued by
China's Office of State Commercial Cryptography Administration
(OSCCA) as an authorized cryptographic algorithm for use within
China.

Detect the SM4 cipher algorithm and enable the feature silently
if it is available.
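On the LUKS side, support amounts to one more entry in the cipher name map: the name "sm4" with a 16-byte key resolves to QCRYPTO_CIPHER_ALG_SM4. The table lookup can be sketched as below. The enum, table and function names are hypothetical, the AES rows are abbreviated, and the cipher-mode handling of the real qcrypto_block_luks_cipher_name_lookup() is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical identifiers standing in for QCryptoCipherAlgorithm values. */
typedef enum { ALG_NONE = 0, ALG_AES_128, ALG_AES_256, ALG_SM4 } CipherAlg;

typedef struct {
    size_t key_bytes;   /* key length in bytes; 0 terminates a table */
    CipherAlg alg;
} CipherSizeMap;

typedef struct {
    const char *name;            /* cipher name as stored in the LUKS header */
    const CipherSizeMap *sizes;  /* key-size table for that name */
} CipherNameMap;

static const CipherSizeMap aes_sizes[] = {
    { 16, ALG_AES_128 }, { 32, ALG_AES_256 }, { 0, ALG_NONE },
};
static const CipherSizeMap sm4_sizes[] = {
    { 16, ALG_SM4 }, { 0, ALG_NONE },   /* SM4 only has a 128-bit key */
};

static const CipherNameMap name_map[] = {
    { "aes", aes_sizes },
    { "sm4", sm4_sizes },
};

/* Resolve a cipher name and key length to an algorithm, or ALG_NONE. */
static CipherAlg cipher_lookup(const char *name, size_t key_bytes)
{
    for (size_t i = 0; i < sizeof(name_map) / sizeof(name_map[0]); i++) {
        if (strcmp(name_map[i].name, name) != 0) {
            continue;
        }
        for (const CipherSizeMap *s = name_map[i].sizes; s->key_bytes; s++) {
            if (s->key_bytes == key_bytes) {
                return s->alg;
            }
        }
    }
    return ALG_NONE;
}
```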

Signed-off-by: Hyman Huang 
Reviewed-by: Philippe Mathieu-Daudé 
---
 crypto/block-luks.c | 11 
 crypto/cipher-gcrypt.c.inc  |  8 ++
 crypto/cipher-nettle.c.inc  | 49 +
 crypto/cipher.c |  6 
 meson.build | 26 +
 qapi/crypto.json|  5 +++-
 tests/unit/test-crypto-cipher.c | 13 +
 7 files changed, 117 insertions(+), 1 deletion(-)

diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index fb01ec38bb..f0813d69b4 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -95,12 +95,23 @@ qcrypto_block_luks_cipher_size_map_twofish[] = {
 { 0, 0 },
 };
 
+#ifdef CONFIG_CRYPTO_SM4
+static const QCryptoBlockLUKSCipherSizeMap
+qcrypto_block_luks_cipher_size_map_sm4[] = {
+{ 16, QCRYPTO_CIPHER_ALG_SM4},
+{ 0, 0 },
+};
+#endif
+
 static const QCryptoBlockLUKSCipherNameMap
 qcrypto_block_luks_cipher_name_map[] = {
 { "aes", qcrypto_block_luks_cipher_size_map_aes },
 { "cast5", qcrypto_block_luks_cipher_size_map_cast5 },
 { "serpent", qcrypto_block_luks_cipher_size_map_serpent },
 { "twofish", qcrypto_block_luks_cipher_size_map_twofish },
+#ifdef CONFIG_CRYPTO_SM4
+{ "sm4", qcrypto_block_luks_cipher_size_map_sm4},
+#endif
 };
 
 QEMU_BUILD_BUG_ON(sizeof(struct QCryptoBlockLUKSKeySlot) != 48);
diff --git a/crypto/cipher-gcrypt.c.inc b/crypto/cipher-gcrypt.c.inc
index a6a0117717..1377cbaf14 100644
--- a/crypto/cipher-gcrypt.c.inc
+++ b/crypto/cipher-gcrypt.c.inc
@@ -35,6 +35,9 @@ bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg,
 case QCRYPTO_CIPHER_ALG_SERPENT_256:
 case QCRYPTO_CIPHER_ALG_TWOFISH_128:
 case QCRYPTO_CIPHER_ALG_TWOFISH_256:
+#ifdef CONFIG_CRYPTO_SM4
+case QCRYPTO_CIPHER_ALG_SM4:
+#endif
 break;
 default:
 return false;
@@ -219,6 +222,11 @@ static QCryptoCipher *qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg,
 case QCRYPTO_CIPHER_ALG_TWOFISH_256:
 gcryalg = GCRY_CIPHER_TWOFISH;
 break;
+#ifdef CONFIG_CRYPTO_SM4
+case QCRYPTO_CIPHER_ALG_SM4:
+gcryalg = GCRY_CIPHER_SM4;
+break;
+#endif
 default:
 error_setg(errp, "Unsupported cipher algorithm %s",
QCryptoCipherAlgorithm_str(alg));
diff --git a/crypto/cipher-nettle.c.inc b/crypto/cipher-nettle.c.inc
index 24cc61f87b..42b39e18a2 100644
--- a/crypto/cipher-nettle.c.inc
+++ b/crypto/cipher-nettle.c.inc
@@ -33,6 +33,9 @@
 #ifndef CONFIG_QEMU_PRIVATE_XTS
 #include 
 #endif
+#ifdef CONFIG_CRYPTO_SM4
+#include 
+#endif
 
 static inline bool qcrypto_length_check(size_t len, size_t blocksize,
 Error **errp)
@@ -426,6 +429,30 @@ DEFINE_ECB_CBC_CTR_XTS(qcrypto_nettle_twofish,
QCryptoNettleTwofish, TWOFISH_BLOCK_SIZE,
twofish_encrypt_native, twofish_decrypt_native)
 
+#ifdef CONFIG_CRYPTO_SM4
+typedef struct QCryptoNettleSm4 {
+QCryptoCipher base;
+struct sm4_ctx key[2];
+} QCryptoNettleSm4;
+
+static void sm4_encrypt_native(void *ctx, size_t length,
+   uint8_t *dst, const uint8_t *src)
+{
+struct sm4_ctx *keys = ctx;
+sm4_crypt(&keys[0], length, dst, src);
+}
+
+static void sm4_decrypt_native(void *ctx, size_t length,
+   uint8_t *dst, const uint8_t *src)
+{
+struct sm4_ctx *keys = ctx;
+sm4_crypt(&keys[1], length, dst, src);
+}
+
+DEFINE_ECB(qcrypto_nettle_sm4,
+   QCryptoNettleSm4, SM4_BLOCK_SIZE,
+   sm4_encrypt_native, sm4_decrypt_native)
+#endif
 
 bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg,
  QCryptoCipherMode mode)
@@ -443,6 +470,9 @@ bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg,
 case QCRYPTO_CIPHER_ALG_TWOFISH_128:
 case QCRYPTO_CIPHER_ALG_TWOFISH_192:
 case QCRYPTO_CIPHER_ALG_TWOFISH_256:
+#ifdef CONFIG_CRYPTO_SM4
+case QCRYPTO_CIPHER_ALG_SM4:
+#endif
 break;
 default:
 return false;
@@ -701,6 +731,25 @@ static QCryptoCipher *qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg,
 
 return &ctx->base;
 }
+#ifdef CONFIG_CRYPTO_SM4
+case QCRYPTO_CIPHER_ALG_SM4:
+{
+QCryptoNettleSm4 *ctx = g_new0(QCryptoNettleSm4, 1);
+
+switch (mode) {
+case QCRYPTO_CIPHER_MODE_ECB:
+ctx->base.driver = &qcrypto_nettle_sm4_driver_ecb;
+break;
+default:
+goto bad_cipher_mode;
+}
+
+sm4_set_

[PATCH v5] crypto: Introduce SM4 symmetric cipher algorithm

2023-12-07 Thread Hyman Huang
Introduce the SM4 cipher algorithm (OSCCA GB/T 32907-2016).

SM4 (GB/T 32907-2016) is a cryptographic standard issued by
China's Office of State Commercial Cryptography Administration
(OSCCA) as an authorized cryptographic algorithm for use within
China.

Use the crypto-sm4 meson build option to explicitly control the
feature, which is auto-detected by default.
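With the nettle backend the patch accepts only ECB mode for SM4. The same sm4_crypt() routine is called with key[0] for encryption and key[1] for decryption, so the two contexts presumably hold the encryption and decryption key schedules. The sketch below shows the general shape of an ECB wrapper around such a native routine: reject lengths that are not a multiple of the 16-byte block, then hand the whole buffer to the primitive. ecb_apply() and toy_invert() are hypothetical, and toy_invert() is a placeholder transform, not SM4.

```c
#include <stddef.h>
#include <stdint.h>

#define SM4_BLOCK_BYTES 16  /* SM4 has a 128-bit block */

/* Same shape as the patch's sm4_encrypt_native()/sm4_decrypt_native(). */
typedef void (*NativeCryptFn)(void *ctx, size_t length,
                              uint8_t *dst, const uint8_t *src);

/* ECB: whole blocks only, then one call covering the full buffer. */
static int ecb_apply(NativeCryptFn fn, void *ctx, const uint8_t *in,
                     uint8_t *out, size_t len, size_t blocksize)
{
    if (blocksize == 0 || len % blocksize != 0) {
        return -1;  /* partial block: refuse, like a length check would */
    }
    fn(ctx, len, out, in);
    return 0;
}

/* Toy primitive for demonstration only: bitwise NOT, NOT a real cipher. */
static void toy_invert(void *ctx, size_t length,
                       uint8_t *dst, const uint8_t *src)
{
    (void)ctx;
    for (size_t i = 0; i < length; i++) {
        dst[i] = (uint8_t)~src[i];
    }
}
```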

Signed-off-by: Hyman Huang 
Reviewed-by: Philippe Mathieu-Daudé 
---
 crypto/block-luks.c | 11 
 crypto/cipher-gcrypt.c.inc  |  8 ++
 crypto/cipher-nettle.c.inc  | 49 +
 crypto/cipher.c |  6 
 meson.build | 26 +
 qapi/crypto.json|  5 +++-
 tests/unit/test-crypto-cipher.c | 13 +
 7 files changed, 117 insertions(+), 1 deletion(-)

diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index fb01ec38bb..f0813d69b4 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -95,12 +95,23 @@ qcrypto_block_luks_cipher_size_map_twofish[] = {
 { 0, 0 },
 };
 
+#ifdef CONFIG_CRYPTO_SM4
+static const QCryptoBlockLUKSCipherSizeMap
+qcrypto_block_luks_cipher_size_map_sm4[] = {
+{ 16, QCRYPTO_CIPHER_ALG_SM4},
+{ 0, 0 },
+};
+#endif
+
 static const QCryptoBlockLUKSCipherNameMap
 qcrypto_block_luks_cipher_name_map[] = {
 { "aes", qcrypto_block_luks_cipher_size_map_aes },
 { "cast5", qcrypto_block_luks_cipher_size_map_cast5 },
 { "serpent", qcrypto_block_luks_cipher_size_map_serpent },
 { "twofish", qcrypto_block_luks_cipher_size_map_twofish },
+#ifdef CONFIG_CRYPTO_SM4
+{ "sm4", qcrypto_block_luks_cipher_size_map_sm4},
+#endif
 };
 
 QEMU_BUILD_BUG_ON(sizeof(struct QCryptoBlockLUKSKeySlot) != 48);
diff --git a/crypto/cipher-gcrypt.c.inc b/crypto/cipher-gcrypt.c.inc
index a6a0117717..1377cbaf14 100644
--- a/crypto/cipher-gcrypt.c.inc
+++ b/crypto/cipher-gcrypt.c.inc
@@ -35,6 +35,9 @@ bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg,
 case QCRYPTO_CIPHER_ALG_SERPENT_256:
 case QCRYPTO_CIPHER_ALG_TWOFISH_128:
 case QCRYPTO_CIPHER_ALG_TWOFISH_256:
+#ifdef CONFIG_CRYPTO_SM4
+case QCRYPTO_CIPHER_ALG_SM4:
+#endif
 break;
 default:
 return false;
@@ -219,6 +222,11 @@ static QCryptoCipher *qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg,
 case QCRYPTO_CIPHER_ALG_TWOFISH_256:
 gcryalg = GCRY_CIPHER_TWOFISH;
 break;
+#ifdef CONFIG_CRYPTO_SM4
+case QCRYPTO_CIPHER_ALG_SM4:
+gcryalg = GCRY_CIPHER_SM4;
+break;
+#endif
 default:
 error_setg(errp, "Unsupported cipher algorithm %s",
QCryptoCipherAlgorithm_str(alg));
diff --git a/crypto/cipher-nettle.c.inc b/crypto/cipher-nettle.c.inc
index 24cc61f87b..42b39e18a2 100644
--- a/crypto/cipher-nettle.c.inc
+++ b/crypto/cipher-nettle.c.inc
@@ -33,6 +33,9 @@
 #ifndef CONFIG_QEMU_PRIVATE_XTS
 #include 
 #endif
+#ifdef CONFIG_CRYPTO_SM4
+#include 
+#endif
 
 static inline bool qcrypto_length_check(size_t len, size_t blocksize,
 Error **errp)
@@ -426,6 +429,30 @@ DEFINE_ECB_CBC_CTR_XTS(qcrypto_nettle_twofish,
QCryptoNettleTwofish, TWOFISH_BLOCK_SIZE,
twofish_encrypt_native, twofish_decrypt_native)
 
+#ifdef CONFIG_CRYPTO_SM4
+typedef struct QCryptoNettleSm4 {
+QCryptoCipher base;
+struct sm4_ctx key[2];
+} QCryptoNettleSm4;
+
+static void sm4_encrypt_native(void *ctx, size_t length,
+   uint8_t *dst, const uint8_t *src)
+{
+struct sm4_ctx *keys = ctx;
+sm4_crypt(&keys[0], length, dst, src);
+}
+
+static void sm4_decrypt_native(void *ctx, size_t length,
+   uint8_t *dst, const uint8_t *src)
+{
+struct sm4_ctx *keys = ctx;
+sm4_crypt(&keys[1], length, dst, src);
+}
+
+DEFINE_ECB(qcrypto_nettle_sm4,
+   QCryptoNettleSm4, SM4_BLOCK_SIZE,
+   sm4_encrypt_native, sm4_decrypt_native)
+#endif
 
 bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg,
  QCryptoCipherMode mode)
@@ -443,6 +470,9 @@ bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg,
 case QCRYPTO_CIPHER_ALG_TWOFISH_128:
 case QCRYPTO_CIPHER_ALG_TWOFISH_192:
 case QCRYPTO_CIPHER_ALG_TWOFISH_256:
+#ifdef CONFIG_CRYPTO_SM4
+case QCRYPTO_CIPHER_ALG_SM4:
+#endif
 break;
 default:
 return false;
@@ -701,6 +731,25 @@ static QCryptoCipher *qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg,
 
 return &ctx->base;
 }
+#ifdef CONFIG_CRYPTO_SM4
+case QCRYPTO_CIPHER_ALG_SM4:
+{
+QCryptoNettleSm4 *ctx = g_new0(QCryptoNettleSm4, 1);
+
+switch (mode) {
+case QCRYPTO_CIPHER_MODE_ECB:
+ctx->base.driver = &qcrypto_nettle_sm4_driver_ecb;
+break;
+default:
+goto bad_cipher_mo

[v2 0/4] Support generic Luks encryption

2023-12-06 Thread Hyman Huang
v2:
- Simplify the design by reusing the LUKS driver to implement
  the generic LUKS encryption; thanks to Daniel for the insightful
  advice.
- Rebase on master.

This functionality was motivated by the following to-do item in
the block crypto feature page:
https://wiki.qemu.org/Features/Block/Crypto

Its last section, "separate header volume", says:

The LUKS format has ability to store the header in a separate volume
from the payload. We should extend the LUKS driver in QEMU to support
this use case.

By enhancing the LUKS driver, it is possible to enable
the detachable LUKS header and, as a result, achieve
general encryption for any disk format that QEMU
supports.

Taking qcow2 as an example, generic LUKS encryption
is used as follows:

1. add a protocol blockdev node of data disk
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-1-storage", "driver":"file",
> "filename":"/path/to/test_disk.qcow2"}}'

2. add a protocol blockdev node for the LUKS header, as in step 1
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-2-storage", "driver":"file",
> "filename": "/path/to/cipher.gluks" }}'

3. add the secret for decrypting the cipher stored in LUKS
   header above
$ virsh qemu-monitor-command vm '{"execute":"object-add",
> "arguments":{"qom-type":"secret", "id":
> "libvirt-2-storage-secret0", "data":"abc123"}}'

4. add the qcow2 format blockdev node
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-1-format", "driver":"qcow2",
> "file":"libvirt-1-storage"}}'

5. Add the luks format blockdev node, which links the qcow2 disk with
   the LUKS header through the "header" field:
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-2-format", "driver":"luks",
> "file":"libvirt-1-format", "header":"libvirt-2-storage",
> "key-secret":"libvirt-2-storage-secret0"}}'

6. Finally, add the virtio-blk device:
$ virsh qemu-monitor-command vm '{"execute":"device_add",
> "arguments": {"num-queues":"1", "driver":"virtio-blk-pci",
> "drive": "libvirt-2-format", "id":"virtio-disk2"}}'

Starting a virtual machine (VM) with generic LUKS encryption works
much like hot-plug: the JSON arguments stay the same, except that the
"blockdev-add"/"device_add" commands become the "-blockdev"/"-device"
command-line options.

Please review, thanks.

Best regards,

Yong

Hyman Huang (4):
  crypto: Introduce option and structure for detached LUKS header
  crypto: Introduce payload offset set function
  crypto: Support generic LUKS encryption
  block: Support detached LUKS header creation for blockdev-create

 block/crypto.c | 47 --
 crypto/block.c |  4 
 include/crypto/block.h |  1 +
 qapi/block-core.json   | 11 --
 4 files changed, 59 insertions(+), 4 deletions(-)

-- 
2.39.1




[v2 2/4] crypto: Introduce payload offset set function

2023-12-06 Thread Hyman Huang
Signed-off-by: Hyman Huang 
---
 crypto/block.c | 4 
 include/crypto/block.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/crypto/block.c b/crypto/block.c
index 7bb4b74a37..3dcf22a69f 100644
--- a/crypto/block.c
+++ b/crypto/block.c
@@ -319,6 +319,10 @@ QCryptoHashAlgorithm qcrypto_block_get_kdf_hash(QCryptoBlock *block)
     return block->kdfhash;
 }
 
+void qcrypto_block_set_payload_offset(QCryptoBlock *block, uint64_t offset)
+{
+    block->payload_offset = offset;
+}
 
 uint64_t qcrypto_block_get_payload_offset(QCryptoBlock *block)
 {
diff --git a/include/crypto/block.h b/include/crypto/block.h
index 4f63a37872..b47a90c529 100644
--- a/include/crypto/block.h
+++ b/include/crypto/block.h
@@ -312,4 +312,5 @@ void qcrypto_block_free(QCryptoBlock *block);
 
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(QCryptoBlock, qcrypto_block_free)
 
+void qcrypto_block_set_payload_offset(QCryptoBlock *block, uint64_t offset);
 #endif /* QCRYPTO_BLOCK_H */
-- 
2.39.1
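The setter exists because the payload offset is derived from the header when it is parsed, and a detached-header user then has to override it. A minimal sketch of that read-then-zero sequence follows, using a hypothetical ToyCryptoBlock in place of QEMU's opaque QCryptoBlock; the RFC's gluks_open() follows the same order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for QCryptoBlock, reduced to one field. */
typedef struct ToyCryptoBlock {
    uint64_t payload_offset;    /* bytes before the encrypted payload */
} ToyCryptoBlock;

static uint64_t toy_get_payload_offset(const ToyCryptoBlock *block)
{
    return block->payload_offset;
}

static void toy_set_payload_offset(ToyCryptoBlock *block, uint64_t offset)
{
    block->payload_offset = offset;
}

/*
 * Switch to detached mode: remember how large the header was (it now
 * lives in a separate node) and make the payload start at offset 0.
 */
static uint64_t toy_detach_header(ToyCryptoBlock *block)
{
    uint64_t header_size = toy_get_payload_offset(block);

    toy_set_payload_offset(block, 0);
    return header_size;
}
```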




[v2 3/4] crypto: Support generic LUKS encryption

2023-12-06 Thread Hyman Huang
By extending the LUKS driver to support a detached LUKS header,
QEMU can provide generic encryption for any disk format it
supports.

Taking qcow2 as an example, generic LUKS encryption is used as
follows:

1. Add a protocol blockdev node for the data disk:
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-1-storage", "driver":"file",
> "filename":"/path/to/test_disk.qcow2"}}'

2. Add a protocol blockdev node for the LUKS header in the same way:
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-2-storage", "driver":"file",
> "filename": "/path/to/cipher.gluks" }}'

3. Add the secret for decrypting the cipher key stored in the
   LUKS header above:
$ virsh qemu-monitor-command vm '{"execute":"object-add",
> "arguments":{"qom-type":"secret", "id":
> "libvirt-2-storage-secret0", "data":"abc123"}}'

4. Add the qcow2 format blockdev node:
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-1-format", "driver":"qcow2",
> "file":"libvirt-1-storage"}}'

5. Add the luks format blockdev node, which links the qcow2 disk with
   the LUKS header through the "header" field:
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-2-format", "driver":"luks",
> "file":"libvirt-1-format", "header":"libvirt-2-storage",
> "key-secret":"libvirt-2-storage-secret0"}}'

6. Finally, add the virtio-blk device:
$ virsh qemu-monitor-command vm '{"execute":"device_add",
> "arguments": {"num-queues":"1", "driver":"virtio-blk-pci",
> "drive": "libvirt-2-format", "id":"virtio-disk2"}}'

Starting a virtual machine (VM) with generic LUKS encryption works
much like hot-plug: the JSON arguments stay the same, except that the
"blockdev-add"/"device_add" commands become the "-blockdev"/"-device"
command-line options.

Signed-off-by: Hyman Huang 
---
 block/crypto.c | 38 +-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/block/crypto.c b/block/crypto.c
index f82b13d32b..7d70349463 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -40,6 +40,7 @@ struct BlockCrypto {
     QCryptoBlock *block;
     bool updating_keys;
     BdrvChild *header;  /* Reference to the detached LUKS header */
+    bool detached_mode; /* If true, LUKS plays a detached header role */
 };
 
 
@@ -64,12 +65,16 @@ static int block_crypto_read_func(QCryptoBlock *block,
                                   Error **errp)
 {
     BlockDriverState *bs = opaque;
+    BlockCrypto *crypto = bs->opaque;
     ssize_t ret;
 
     GLOBAL_STATE_CODE();
     GRAPH_RDLOCK_GUARD_MAINLOOP();
 
-    ret = bdrv_pread(bs->file, offset, buflen, buf, 0);
+    if (crypto->detached_mode)
+        ret = bdrv_pread(crypto->header, offset, buflen, buf, 0);
+    else
+        ret = bdrv_pread(bs->file, offset, buflen, buf, 0);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Could not read encryption header");
         return ret;
@@ -269,6 +274,8 @@ static int block_crypto_open_generic(QCryptoBlockFormat format,
     QCryptoBlockOpenOptions *open_opts = NULL;
     unsigned int cflags = 0;
     QDict *cryptoopts = NULL;
+    const char *header_bdref =
+        qdict_get_try_str(options, "header");
 
     GLOBAL_STATE_CODE();
 
@@ -277,6 +284,16 @@ static int block_crypto_open_generic(QCryptoBlockFormat format,
         return ret;
     }
 
+    if (header_bdref) {
+        crypto->detached_mode = true;
+        crypto->header = bdrv_open_child(NULL, options, "header", bs,
+                                         &child_of_bds, BDRV_CHILD_METADATA,
+                                         false, errp);
+        if (!crypto->header) {
+            return -EINVAL;
+        }
+    }
+
     GRAPH_RDLOCK_GUARD_MAINLOOP();
 
     bs->supported_write_flags = BDRV_REQ_FUA &
@@ -312,6 +329,14 @@ static int block_crypto_open_generic(QCryptoBlockFormat format,
         goto cleanup;
     }
 
+    if (crypto->detached_mode) {
+        /*
+         * Set payload offset to zero as the file bdref has no LUKS
+         * header under detached mode.
+         */
+        qcrypto_block_set_payload_offset(crypto->block, 0);
+    }
+
     bs->encrypted = true;
 
     ret = 0;
@@ -903,6 +928,17 @@ block_crypto_child_perms(BlockDriverState *bs, BdrvChild *c,
 
     BlockCrypto *crypto = bs->opaque;
 
+    if (role == (role & BDRV_CHILD_METADATA)) {
+        /* Assign read permission only */
+        perm |= BLK_PERM_CONSISTENT_READ;
+        /* Share all permissions */
+        shared |= BLK_PERM_ALL;
+
+        *nperm = perm;
+        *nshared = shared;
+        return;
+    }
+
     bdrv_default_perms(bs, c, role, reopen_queue, perm, shared, nperm, nshared);
 
     /*
-- 
2.39.1
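The read-path change boils down to choosing which child supplies the header bytes. Below is a self-contained model of that choice, with in-memory buffers standing in for BdrvChild nodes. ToyNode, ToyCrypto and toy_read_header() are hypothetical names, and the bounds check belongs to the sketch rather than describing bdrv_pread(). The test data uses the LUKS magic bytes ("LUKS" followed by 0xba 0xbe) as a recognizable header prefix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a BdrvChild. */
typedef struct ToyNode {
    const unsigned char *data;
    size_t len;
} ToyNode;

/* Hypothetical reduction of BlockCrypto to the fields used here. */
typedef struct ToyCrypto {
    const ToyNode *file;      /* data node, like bs->file */
    const ToyNode *header;    /* detached header node, or NULL */
    bool detached_mode;
} ToyCrypto;

/*
 * Copy buflen header bytes starting at offset into buf. The header is
 * read from the detached node in detached mode, otherwise from the data
 * node. Returns 0 on success, -1 if the range is not available.
 */
static int toy_read_header(const ToyCrypto *c, size_t offset,
                           unsigned char *buf, size_t buflen)
{
    const ToyNode *src = c->detached_mode ? c->header : c->file;

    if (src == NULL || offset > src->len || buflen > src->len - offset) {
        return -1;
    }
    memcpy(buf, src->data + offset, buflen);
    return 0;
}
```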




[v2 4/4] block: Support detached LUKS header creation for blockdev-create

2023-12-06 Thread Hyman Huang
Add a "detached-mode" option for formatting a detached LUKS
header.

For example, to format a detached LUKS header on a pre-created
disk:

1. Add a protocol blockdev node for the LUKS header:
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
> "arguments":{"node-name":"libvirt-1-storage", "driver":"file",
> "filename":"/path/to/cipher.gluks" }}'

2. Add the secret for encrypting the cipher key stored in the
   LUKS header above:
$ virsh qemu-monitor-command vm '{"execute":"object-add",
> "arguments":{"qom-type": "secret", "id":
> "libvirt-1-storage-secret0", "data": "abc123"}}'

3. Format the disk node:
$ virsh qemu-monitor-command vm '{"execute":"blockdev-create",
> "arguments":{"job-id":"job0", "options":{"driver":"luks",
> "size":0, "file":"libvirt-1-storage", "detached-mode":true,
> "cipher-alg":"aes-256",
> "key-secret":"libvirt-1-storage-secret0"}}}'

Signed-off-by: Hyman Huang 
---
 block/crypto.c   | 8 +++-
 qapi/block-core.json | 5 -
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/block/crypto.c b/block/crypto.c
index 7d70349463..e77c49bd0c 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -667,10 +667,12 @@ block_crypto_co_create_luks(BlockdevCreateOptions *create_options, Error **errp)
     BlockDriverState *bs = NULL;
     QCryptoBlockCreateOptions create_opts;
     PreallocMode preallocation = PREALLOC_MODE_OFF;
+    int64_t size;
     int ret;
 
     assert(create_options->driver == BLOCKDEV_DRIVER_LUKS);
     luks_opts = &create_options->u.luks;
+    size = luks_opts->size;
 
     bs = bdrv_co_open_blockdev_ref(luks_opts->file, errp);
     if (bs == NULL) {
@@ -686,7 +688,11 @@ block_crypto_co_create_luks(BlockdevCreateOptions *create_options, Error **errp)
         preallocation = luks_opts->preallocation;
     }
 
-    ret = block_crypto_co_create_generic(bs, luks_opts->size, &create_opts,
+    if (luks_opts->detached_mode) {
+        size = 0;
+    }
+
+    ret = block_crypto_co_create_generic(bs, size, &create_opts,
                                          preallocation, errp);
     if (ret < 0) {
         goto fail;
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 10be08d08f..1e7a7e1b05 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4952,13 +4952,16 @@
 # @preallocation: Preallocation mode for the new image (since: 4.2)
 #     (default: off; allowed values: off, metadata, falloc, full)
 #
+# @detached-mode: create a detached LUKS header. (since 9.0)
+#
 # Since: 2.12
 ##
 { 'struct': 'BlockdevCreateOptionsLUKS',
   'base': 'QCryptoBlockCreateOptionsLUKS',
   'data': { 'file':             'BlockdevRef',
             'size':             'size',
-            '*preallocation':   'PreallocMode' } }
+            '*preallocation':   'PreallocMode',
+            '*detached-mode':   'bool'}}
 
 ##
 # @BlockdevCreateOptionsNfs:
-- 
2.39.1




[v2 1/4] crypto: Introduce option and structure for detached LUKS header

2023-12-06 Thread Hyman Huang
Add the "header" option to the LUKS format. This field identifies
the blockdev in which a detached LUKS header is stored.

In addition, introduce a header field in struct BlockCrypto.

Signed-off-by: Hyman Huang 
---
 block/crypto.c   | 1 +
 qapi/block-core.json | 6 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/crypto.c b/block/crypto.c
index 921933a5e5..f82b13d32b 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -39,6 +39,7 @@ typedef struct BlockCrypto BlockCrypto;
 struct BlockCrypto {
     QCryptoBlock *block;
     bool updating_keys;
+    BdrvChild *header;  /* Reference to the detached LUKS header */
 };
 
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index ca390c5700..10be08d08f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3352,11 +3352,15 @@
 #     decryption key (since 2.6). Mandatory except when doing a
 #     metadata-only probe of the image.
 #
+# @header: optional reference to the location of a blockdev
+#     storing a detached LUKS header. (since 9.0)
+#
 # Since: 2.9
 ##
 { 'struct': 'BlockdevOptionsLUKS',
   'base': 'BlockdevOptionsGenericFormat',
-  'data': { '*key-secret': 'str' } }
+  'data': { '*key-secret': 'str',
+            '*header': 'BlockdevRef'} }
 
 ##
 # @BlockdevOptionsGenericCOWFormat:
-- 
2.39.1




[RFC 4/8] Gluks: Introduce Gluks options

2023-12-04 Thread Hyman Huang
Like LUKS, the Gluks format reuses the LUKS options, except for
the "size" option.

Signed-off-by: Hyman Huang 
---
 block/crypto.c   |  4 ++--
 block/generic-luks.c | 18 ++
 block/generic-luks.h |  3 +++
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/block/crypto.c b/block/crypto.c
index 6afae1de2e..6f8528dccc 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -150,7 +150,7 @@ error:
 }
 
 
-static QemuOptsList block_crypto_runtime_opts_luks = {
+QemuOptsList block_crypto_runtime_opts_luks = {
     .name = "crypto",
     .head = QTAILQ_HEAD_INITIALIZER(block_crypto_runtime_opts_luks.head),
     .desc = {
@@ -181,7 +181,7 @@ static QemuOptsList block_crypto_create_opts_luks = {
 };
 
 
-static QemuOptsList block_crypto_amend_opts_luks = {
+QemuOptsList block_crypto_amend_opts_luks = {
     .name = "crypto",
     .head = QTAILQ_HEAD_INITIALIZER(block_crypto_create_opts_luks.head),
     .desc = {
diff --git a/block/generic-luks.c b/block/generic-luks.c
index f23e202991..ebc0365d40 100644
--- a/block/generic-luks.c
+++ b/block/generic-luks.c
@@ -35,6 +35,21 @@ typedef struct BDRVGLUKSState {
     uint64_t header_size;   /* In bytes */
 } BDRVGLUKSState;
 
+static QemuOptsList gluks_create_opts_luks = {
+    .name = "crypto",
+    .head = QTAILQ_HEAD_INITIALIZER(gluks_create_opts_luks.head),
+    .desc = {
+        BLOCK_CRYPTO_OPT_DEF_LUKS_KEY_SECRET(""),
+        BLOCK_CRYPTO_OPT_DEF_LUKS_CIPHER_ALG(""),
+        BLOCK_CRYPTO_OPT_DEF_LUKS_CIPHER_MODE(""),
+        BLOCK_CRYPTO_OPT_DEF_LUKS_IVGEN_ALG(""),
+        BLOCK_CRYPTO_OPT_DEF_LUKS_IVGEN_HASH_ALG(""),
+        BLOCK_CRYPTO_OPT_DEF_LUKS_HASH_ALG(""),
+        BLOCK_CRYPTO_OPT_DEF_LUKS_ITER_TIME(""),
+        { /* end of list */ }
+    },
+};
+
 static int gluks_open(BlockDriverState *bs, QDict *options, int flags,
   Error **errp)
 {
@@ -71,6 +86,9 @@ static BlockDriver bdrv_generic_luks = {
     .bdrv_co_create_opts    = gluks_co_create_opts,
     .bdrv_child_perm        = gluks_child_perms,
     .bdrv_co_getlength      = gluks_co_getlength,
+
+    .create_opts            = &gluks_create_opts_luks,
+    .amend_opts             = &block_crypto_amend_opts_luks,
 };
 
 static void block_generic_luks_init(void)
diff --git a/block/generic-luks.h b/block/generic-luks.h
index 2aae866fa4..f18adf41ea 100644
--- a/block/generic-luks.h
+++ b/block/generic-luks.h
@@ -23,4 +23,7 @@
 #ifndef GENERIC_LUKS_H
 #define GENERIC_LUKS_H
 
+extern QemuOptsList block_crypto_runtime_opts_luks;
+extern QemuOptsList block_crypto_amend_opts_luks;
+
 #endif /* GENERIC_LUKS_H */
-- 
2.39.1
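QemuOptsList descriptor tables are sentinel-terminated arrays, which is why the new list ends with { /* end of list */ }. The sketch below models that shape with a hypothetical ToyOptDesc type. The option names assume that the BLOCK_CRYPTO_OPT_DEF_LUKS_* macros expand, with an empty prefix, to the usual LUKS option names. The point is that "size" is absent, which is the intended difference from the LUKS create list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down analogue of QemuOptDesc: just the name. */
typedef struct ToyOptDesc {
    const char *name;
} ToyOptDesc;

/* Same shape as gluks_create_opts_luks: LUKS options, no "size". */
static const ToyOptDesc toy_gluks_create_opts[] = {
    { "key-secret" },
    { "cipher-alg" },
    { "cipher-mode" },
    { "ivgen-alg" },
    { "ivgen-hash-alg" },
    { "hash-alg" },
    { "iter-time" },
    { NULL }                    /* end of list */
};

/* Walk the table up to the sentinel entry, looking for name. */
static const ToyOptDesc *toy_find_opt(const ToyOptDesc *desc,
                                      const char *name)
{
    for (; desc->name != NULL; desc++) {
        if (strcmp(desc->name, name) == 0) {
            return desc;
        }
    }
    return NULL;
}

/* Count the entries before the sentinel. */
static size_t toy_count_opts(const ToyOptDesc *desc)
{
    size_t n = 0;

    while (desc[n].name != NULL) {
        n++;
    }
    return n;
}
```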




[RFC 6/8] crypto: Provide the Luks crypto driver to Gluks

2023-12-04 Thread Hyman Huang
Hooks up the Luks crypto driver for Gluks.

Signed-off-by: Hyman Huang 
---
 crypto/block.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/crypto/block.c b/crypto/block.c
index 3dcf22a69f..7e695c0a04 100644
--- a/crypto/block.c
+++ b/crypto/block.c
@@ -27,6 +27,7 @@
 static const QCryptoBlockDriver *qcrypto_block_drivers[] = {
     [Q_CRYPTO_BLOCK_FORMAT_QCOW] = &qcrypto_block_driver_qcow,
     [Q_CRYPTO_BLOCK_FORMAT_LUKS] = &qcrypto_block_driver_luks,
+    [Q_CRYPTO_BLOCK_FORMAT_GLUKS] = &qcrypto_block_driver_luks,
 };
 
 
-- 
2.39.1
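The driver table is indexed by the format enum through designated initializers, so aliasing Gluks to the LUKS implementation takes a single extra entry. Below is a hypothetical model (ToyBlockFormat and ToyBlockDriver are made-up names). A format value that is never named in the initializer would be left with a NULL slot, so the enum value and the table entry belong together.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for QCryptoBlockFormat and QCryptoBlockDriver. */
typedef enum {
    TOY_FORMAT_QCOW,
    TOY_FORMAT_LUKS,
    TOY_FORMAT_GLUKS,
    TOY_FORMAT__MAX,
} ToyBlockFormat;

typedef struct ToyBlockDriver {
    const char *name;
} ToyBlockDriver;

static const ToyBlockDriver toy_driver_qcow = { "qcow" };
static const ToyBlockDriver toy_driver_luks = { "luks" };

/* Designated initializers: any slot that is not named stays NULL. */
static const ToyBlockDriver *const toy_drivers[TOY_FORMAT__MAX] = {
    [TOY_FORMAT_QCOW]  = &toy_driver_qcow,
    [TOY_FORMAT_LUKS]  = &toy_driver_luks,
    [TOY_FORMAT_GLUKS] = &toy_driver_luks,   /* same driver as LUKS */
};

/* Return the driver for fmt, or NULL if fmt is out of range. */
static const ToyBlockDriver *toy_driver_for(int fmt)
{
    if (fmt < 0 || fmt >= TOY_FORMAT__MAX) {
        return NULL;
    }
    return toy_drivers[fmt];
}
```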




[RFC 5/8] qapi: Introduce Gluks types to qapi

2023-12-04 Thread Hyman Huang
Gluks mostly reuses the LUKS types, adding an extra option,
"header", which references the LUKS header node.

Signed-off-by: Hyman Huang 
---
 qapi/block-core.json | 22 +-
 qapi/crypto.json | 10 +++---
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index ca390c5700..e2208f6891 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3185,12 +3185,14 @@
 #
 # @snapshot-access: Since 7.0
 #
+# @gluks: Since 9.0
+#
 # Since: 2.9
 ##
 { 'enum': 'BlockdevDriver',
   'data': [ 'blkdebug', 'blklogwrites', 'blkreplay', 'blkverify', 'bochs',
             'cloop', 'compress', 'copy-before-write', 'copy-on-read', 'dmg',
-            'file', 'snapshot-access', 'ftp', 'ftps', 'gluster',
+            'file', 'snapshot-access', 'ftp', 'ftps', 'gluks', 'gluster',
             {'name': 'host_cdrom', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
             {'name': 'host_device', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
             'http', 'https',
@@ -3957,6 +3959,23 @@
             '*debug': 'int',
             '*logfile': 'str' } }
 
+##
+# @BlockdevOptionsGLUKS:
+#
+# Driver specific block device options for GLUKS.
+#
+# @header: reference to the definition of the luks header node.
+#
+# @key-secret: the ID of a QCryptoSecret object providing the
+#     decryption key.
+#
+# Since: 9.0
+##
+{ 'struct': 'BlockdevOptionsGLUKS',
+  'base': 'BlockdevOptionsGenericFormat',
+  'data': { 'header': 'BlockdevRef',
+            'key-secret': 'str' } }
+
 ##
 # @BlockdevOptionsIoUring:
 #
@@ -4680,6 +4699,7 @@
       'file':       'BlockdevOptionsFile',
       'ftp':        'BlockdevOptionsCurlFtp',
       'ftps':       'BlockdevOptionsCurlFtps',
+      'gluks':      'BlockdevOptionsGLUKS',
       'gluster':    'BlockdevOptionsGluster',
       'host_cdrom':  { 'type': 'BlockdevOptionsFile',
                        'if': 'HAVE_HOST_BLOCK_DEVICE' },
diff --git a/qapi/crypto.json b/qapi/crypto.json
index fd3d46ebd1..9afb242b5b 100644
--- a/qapi/crypto.json
+++ b/qapi/crypto.json
@@ -154,11 +154,13 @@
 #
 # @luks: LUKS encryption format.  Recommended for new images
 #
+# @gluks: generic LUKS encryption format. (since 9.0)
+#
 # Since: 2.6
 ##
 { 'enum': 'QCryptoBlockFormat',
 #  'prefix': 'QCRYPTO_BLOCK_FORMAT',
-  'data': ['qcow', 'luks']}
+  'data': ['qcow', 'luks', 'gluks']}
 
 ##
 # @QCryptoBlockOptionsBase:
@@ -246,7 +248,8 @@
   'base': 'QCryptoBlockOptionsBase',
   'discriminator': 'format',
   'data': { 'qcow': 'QCryptoBlockOptionsQCow',
-            'luks': 'QCryptoBlockOptionsLUKS' } }
+            'luks': 'QCryptoBlockOptionsLUKS',
+            'gluks': 'QCryptoBlockOptionsLUKS' } }
 
 ##
 # @QCryptoBlockCreateOptions:
@@ -260,7 +263,8 @@
   'base': 'QCryptoBlockOptionsBase',
   'discriminator': 'format',
   'data': { 'qcow': 'QCryptoBlockOptionsQCow',
-            'luks': 'QCryptoBlockCreateOptionsLUKS' } }
+            'luks': 'QCryptoBlockCreateOptionsLUKS',
+            'gluks': 'QCryptoBlockCreateOptionsLUKS' } }
 
 ##
 # @QCryptoBlockInfoBase:
-- 
2.39.1




[RFC 1/8] crypto: Export util functions and structures

2023-12-04 Thread Hyman Huang
Since Gluks primarily reuses the LUKS driver logic, export several
existing functions and structures.

Signed-off-by: Hyman Huang 
---
 block/crypto.c | 16 
 block/crypto.h | 23 +++
 2 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/block/crypto.c b/block/crypto.c
index 921933a5e5..6afae1de2e 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -34,14 +34,6 @@
 #include "qemu/memalign.h"
 #include "crypto.h"
 
-typedef struct BlockCrypto BlockCrypto;
-
-struct BlockCrypto {
-    QCryptoBlock *block;
-    bool updating_keys;
-};
-
-
 static int block_crypto_probe_generic(QCryptoBlockFormat format,
                                       const uint8_t *buf,
                                       int buf_size,
@@ -321,7 +313,7 @@ static int block_crypto_open_generic(QCryptoBlockFormat format,
 }
 
 
-static int coroutine_fn GRAPH_UNLOCKED
+int coroutine_fn GRAPH_UNLOCKED
 block_crypto_co_create_generic(BlockDriverState *bs, int64_t size,
                                QCryptoBlockCreateOptions *opts,
                                PreallocMode prealloc, Error **errp)
@@ -385,7 +377,7 @@ block_crypto_co_truncate(BlockDriverState *bs, int64_t offset, bool exact,
     return bdrv_co_truncate(bs->file, offset, exact, prealloc, 0, errp);
 }
 
-static void block_crypto_close(BlockDriverState *bs)
+void block_crypto_close(BlockDriverState *bs)
 {
     BlockCrypto *crypto = bs->opaque;
     qcrypto_block_free(crypto->block);
@@ -404,7 +396,7 @@ static int block_crypto_reopen_prepare(BDRVReopenState *state,
  */
 #define BLOCK_CRYPTO_MAX_IO_SIZE (1024 * 1024)
 
-static int coroutine_fn GRAPH_RDLOCK
+int coroutine_fn GRAPH_RDLOCK
 block_crypto_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
                        QEMUIOVector *qiov, BdrvRequestFlags flags)
 {
@@ -466,7 +458,7 @@ block_crypto_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
 }
 
 
-static int coroutine_fn GRAPH_RDLOCK
+int coroutine_fn GRAPH_RDLOCK
 block_crypto_co_pwritev(BlockDriverState *bs, int64_t offset, int64_t bytes,
                         QEMUIOVector *qiov, BdrvRequestFlags flags)
 {
diff --git a/block/crypto.h b/block/crypto.h
index 72e792c9af..06465009f0 100644
--- a/block/crypto.h
+++ b/block/crypto.h
@@ -21,6 +21,8 @@
 #ifndef BLOCK_CRYPTO_H
 #define BLOCK_CRYPTO_H
 
+#include "crypto/block.h"
+
 #define BLOCK_CRYPTO_OPT_DEF_KEY_SECRET(prefix, helpstr)                \
     {                                                                   \
         .name = prefix BLOCK_CRYPTO_OPT_QCOW_KEY_SECRET,                \
@@ -131,4 +133,25 @@ block_crypto_amend_opts_init(QDict *opts, Error **errp);
 QCryptoBlockOpenOptions *
 block_crypto_open_opts_init(QDict *opts, Error **errp);
 
+typedef struct BlockCrypto BlockCrypto;
+
+struct BlockCrypto {
+    QCryptoBlock *block;
+    bool updating_keys;
+};
+
+int coroutine_fn GRAPH_UNLOCKED
+block_crypto_co_create_generic(BlockDriverState *bs, int64_t size,
+                               QCryptoBlockCreateOptions *opts,
+                               PreallocMode prealloc, Error **errp);
+
+int coroutine_fn GRAPH_RDLOCK
+block_crypto_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
+                       QEMUIOVector *qiov, BdrvRequestFlags flags);
+
+int coroutine_fn GRAPH_RDLOCK
+block_crypto_co_pwritev(BlockDriverState *bs, int64_t offset, int64_t bytes,
+                        QEMUIOVector *qiov, BdrvRequestFlags flags);
+
+void block_crypto_close(BlockDriverState *bs);
 #endif /* BLOCK_CRYPTO_H */
-- 
2.39.1




[RFC 2/8] crypto: Introduce payload offset set function

2023-12-04 Thread Hyman Huang
Implement the payload offset set function for Gluks.

Signed-off-by: Hyman Huang 
---
 crypto/block.c | 4 
 include/crypto/block.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/crypto/block.c b/crypto/block.c
index 7bb4b74a37..3dcf22a69f 100644
--- a/crypto/block.c
+++ b/crypto/block.c
@@ -319,6 +319,10 @@ QCryptoHashAlgorithm qcrypto_block_get_kdf_hash(QCryptoBlock *block)
     return block->kdfhash;
 }
 
+void qcrypto_block_set_payload_offset(QCryptoBlock *block, uint64_t offset)
+{
+    block->payload_offset = offset;
+}
 
 uint64_t qcrypto_block_get_payload_offset(QCryptoBlock *block)
 {
diff --git a/include/crypto/block.h b/include/crypto/block.h
index 4f63a37872..b47a90c529 100644
--- a/include/crypto/block.h
+++ b/include/crypto/block.h
@@ -312,4 +312,5 @@ void qcrypto_block_free(QCryptoBlock *block);
 
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(QCryptoBlock, qcrypto_block_free)
 
+void qcrypto_block_set_payload_offset(QCryptoBlock *block, uint64_t offset);
 #endif /* QCRYPTO_BLOCK_H */
-- 
2.39.1




[RFC 3/8] Gluks: Add the basic framework

2023-12-04 Thread Hyman Huang
Add the basic framework for Gluks, which is intended to be a
built-in format in the QEMU block layer.

Signed-off-by: Hyman Huang 
---
 block/generic-luks.c | 81 
 block/generic-luks.h | 26 ++
 block/meson.build|  1 +
 3 files changed, 108 insertions(+)
 create mode 100644 block/generic-luks.c
 create mode 100644 block/generic-luks.h

diff --git a/block/generic-luks.c b/block/generic-luks.c
new file mode 100644
index 00..f23e202991
--- /dev/null
+++ b/block/generic-luks.c
@@ -0,0 +1,81 @@
+/*
+ * QEMU block driver for the generic luks encryption
+ *
+ * Copyright (c) 2024 SmartX Inc
+ *
+ * Author: Hyman Huang 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+#include "qemu/osdep.h"
+
+#include "block/block_int.h"
+#include "block/crypto.h"
+#include "crypto/block.h"
+
+#include "generic-luks.h"
+
+/* BDRVGLUKSState holds the state of one generic LUKS instance */
+typedef struct BDRVGLUKSState {
+    BlockCrypto crypto;
+    BdrvChild *header;      /* LUKS header node */
+    uint64_t header_size;   /* In bytes */
+} BDRVGLUKSState;
+
+static int gluks_open(BlockDriverState *bs, QDict *options, int flags,
+                      Error **errp)
+{
+    return 0;
+}
+
+static int coroutine_fn GRAPH_UNLOCKED
+gluks_co_create_opts(BlockDriver *drv, const char *filename,
+                     QemuOpts *opts, Error **errp)
+{
+    return 0;
+}
+
+static void
+gluks_child_perms(BlockDriverState *bs, BdrvChild *c,
+                  const BdrvChildRole role,
+                  BlockReopenQueue *reopen_queue,
+                  uint64_t perm, uint64_t shared,
+                  uint64_t *nperm, uint64_t *nshared)
+{
+
+}
+
+static int64_t coroutine_fn GRAPH_RDLOCK
+gluks_co_getlength(BlockDriverState *bs)
+{
+    return 0;
+}
+
+static BlockDriver bdrv_generic_luks = {
+    .format_name            = "gluks",
+    .instance_size          = sizeof(BDRVGLUKSState),
+    .bdrv_open              = gluks_open,
+    .bdrv_co_create_opts    = gluks_co_create_opts,
+    .bdrv_child_perm        = gluks_child_perms,
+    .bdrv_co_getlength      = gluks_co_getlength,
+};
+
+static void block_generic_luks_init(void)
+{
+    bdrv_register(&bdrv_generic_luks);
+}
+
+block_init(block_generic_luks_init);
diff --git a/block/generic-luks.h b/block/generic-luks.h
new file mode 100644
index 00..2aae866fa4
--- /dev/null
+++ b/block/generic-luks.h
@@ -0,0 +1,26 @@
+/*
+ * QEMU block driver for the generic luks encryption
+ *
+ * Copyright (c) 2024 SmartX Inc
+ *
+ * Author: Hyman Huang 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+#ifndef GENERIC_LUKS_H
+#define GENERIC_LUKS_H
+
+#endif /* GENERIC_LUKS_H */
diff --git a/block/meson.build b/block/meson.build
index 59ff6d380c..74f2da7bed 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -39,6 +39,7 @@ block_ss.add(files(
   'throttle.c',
   'throttle-groups.c',
   'write-threshold.c',
+  'generic-luks.c',
 ), zstd, zlib, gnutls)
 
 system_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c'))
-- 
2.39.1
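The skeleton follows the usual block driver pattern: a static table of callbacks, registered once at startup via block_init() and bdrv_register(), then looked up by format name. The sketch below mimics that flow with hypothetical ToyDriver and toy_register() names and an explicit registration call in place of QEMU's module-init machinery; as in the patch, the stub callbacks just return 0 for now.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced analogue of QEMU's BlockDriver. */
typedef struct ToyDriver {
    const char *format_name;
    int (*open)(void *opaque);
    long long (*getlength)(void *opaque);
} ToyDriver;

#define TOY_MAX_DRIVERS 8

static const ToyDriver *toy_registry[TOY_MAX_DRIVERS];
static size_t toy_registry_len;

/* Plays the role of bdrv_register(): remember the driver. */
static void toy_register(const ToyDriver *drv)
{
    assert(toy_registry_len < TOY_MAX_DRIVERS);
    toy_registry[toy_registry_len++] = drv;
}

/* Look a driver up by format name, as "-f gluks" would. */
static const ToyDriver *toy_find(const char *name)
{
    for (size_t i = 0; i < toy_registry_len; i++) {
        if (strcmp(toy_registry[i]->format_name, name) == 0) {
            return toy_registry[i];
        }
    }
    return NULL;
}

/* Stub callbacks, like the placeholders in this skeleton patch. */
static int toy_gluks_open(void *opaque)
{
    (void)opaque;
    return 0;
}

static long long toy_gluks_getlength(void *opaque)
{
    (void)opaque;
    return 0;
}

static const ToyDriver toy_gluks = {
    .format_name = "gluks",
    .open        = toy_gluks_open,
    .getlength   = toy_gluks_getlength,
};
```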




[RFC 8/8] block: Support Gluks format image creation using qemu-img

2023-12-04 Thread Hyman Huang
To create a Gluks header image, use a command such as:
$ qemu-img create --object secret,id=sec0,data=abc123 -f gluks
> -o cipher-alg=aes-256,cipher-mode=xts -o key-secret=sec0
> cipher.gluks

Signed-off-by: Hyman Huang 
---
 block.c  |  5 +
 block/generic-luks.c | 53 +++-
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index bfb0861ec6..cc9a517a25 100644
--- a/block.c
+++ b/block.c
@@ -7517,6 +7517,11 @@ void bdrv_img_create(const char *filename, const char *fmt,
         goto out;
     }
 
+    if (!strcmp(fmt, "gluks")) {
+        qemu_opt_set(opts, "size", "0M", &local_err);
+        size = 0;
+    }
+
     if (size == -1) {
         error_setg(errp, "Image creation needs a size parameter");
         goto out;
diff --git a/block/generic-luks.c b/block/generic-luks.c
index 32cbedc86f..579f01c4b0 100644
--- a/block/generic-luks.c
+++ b/block/generic-luks.c
@@ -145,7 +145,58 @@ static int coroutine_fn GRAPH_UNLOCKED
 gluks_co_create_opts(BlockDriver *drv, const char *filename,
                      QemuOpts *opts, Error **errp)
 {
-    return 0;
+    QCryptoBlockCreateOptions *create_opts = NULL;
+    BlockDriverState *bs = NULL;
+    QDict *cryptoopts;
+    int ret;
+
+    if (qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0) != 0) {
+        info_report("gluks format image need not size parameter, ignore it");
+    }
+
+    cryptoopts = qemu_opts_to_qdict_filtered(opts, NULL,
+                                             &gluks_create_opts_luks,
+                                             true);
+
+    qdict_put_str(cryptoopts, "format",
+        QCryptoBlockFormat_str(Q_CRYPTO_BLOCK_FORMAT_GLUKS));
+
+    create_opts = block_crypto_create_opts_init(cryptoopts, errp);
+    if (!create_opts) {
+        ret = -EINVAL;
+        goto fail;
+    }
+
+    /* Create protocol layer */
+    ret = bdrv_co_create_file(filename, opts, errp);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    bs = bdrv_co_open(filename, NULL, NULL,
+                      BDRV_O_RDWR | BDRV_O_RESIZE | BDRV_O_PROTOCOL, errp);
+    if (!bs) {
+        ret = -EINVAL;
+        goto fail;
+    }
+    /* Create format layer */
+    ret = block_crypto_co_create_generic(bs, 0, create_opts, 0, errp);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    ret = 0;
+fail:
+    /*
+     * If an error occurred, delete 'filename'. Even if the file existed
+     * beforehand, it has been truncated and corrupted in the process.
+     */
+    if (ret) {
+        bdrv_graph_co_rdlock();
+        bdrv_co_delete_file_noerr(bs);
+        bdrv_graph_co_rdunlock();
+    }
+    return ret;
 }
 
 static void
-- 
2.39.1
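The error path in gluks_co_create_opts() deletes the half-written file so that a failed creation does not leave a corrupt header behind. Here is the same create-or-clean-up pattern reduced to standard C stdio. toy_create_header_image() is a hypothetical name, writing raw bytes stands in for formatting a real LUKS header, and a non-zero requested size is only reported and then ignored, loosely mirroring the info_report() in the patch.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical sketch: write a header-only image to path. Any size the
 * caller asked for is reported and ignored. If writing or closing
 * fails, the partially written file is removed. Returns 0 or -1.
 */
static int toy_create_header_image(const char *path,
                                   const unsigned char *hdr, size_t len,
                                   long long requested_size)
{
    FILE *f;
    int ret = -1;

    if (requested_size != 0) {
        fprintf(stderr, "header-only image: ignoring size %lld\n",
                requested_size);
    }

    f = fopen(path, "wb");
    if (f == NULL) {
        return -1;              /* nothing was created, nothing to clean */
    }
    if (fwrite(hdr, 1, len, f) == len) {
        ret = 0;
    }
    if (fclose(f) != 0) {
        ret = -1;
    }
    if (ret != 0) {
        /* The file exists but is incomplete: remove it. */
        remove(path);
    }
    return ret;
}
```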




[RFC 7/8] Gluks: Implement the fundamental block layer driver hooks

2023-12-04 Thread Hyman Huang
Signed-off-by: Hyman Huang 
---
 block/generic-luks.c | 104 ++-
 1 file changed, 102 insertions(+), 2 deletions(-)

diff --git a/block/generic-luks.c b/block/generic-luks.c
index ebc0365d40..32cbedc86f 100644
--- a/block/generic-luks.c
+++ b/block/generic-luks.c
@@ -23,8 +23,14 @@
 #include "qemu/osdep.h"
 
 #include "block/block_int.h"
+#include "block/block-io.h"
 #include "block/crypto.h"
+#include "block/qdict.h"
 #include "crypto/block.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "qemu/module.h"
+#include "qemu/option.h"
 
 #include "generic-luks.h"
 
@@ -50,10 +56,89 @@ static QemuOptsList gluks_create_opts_luks = {
     },
 };
 
+static int gluks_read_func(QCryptoBlock *block,
+                           size_t offset,
+                           uint8_t *buf,
+                           size_t buflen,
+                           void *opaque,
+                           Error **errp)
+{
+
+    BlockDriverState *bs = opaque;
+    BDRVGLUKSState *s = bs->opaque;
+    ssize_t ret;
+
+    GLOBAL_STATE_CODE();
+    GRAPH_RDLOCK_GUARD_MAINLOOP();
+
+    ret = bdrv_pread(s->header, offset, buflen, buf, 0);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Could not read generic luks header");
+        return ret;
+    }
+    return 0;
+}
+
 static int gluks_open(BlockDriverState *bs, QDict *options, int flags,
                       Error **errp)
 {
-    return 0;
+    BDRVGLUKSState *s = bs->opaque;
+    QemuOpts *opts = NULL;
+    QCryptoBlockOpenOptions *open_opts = NULL;
+    QDict *cryptoopts = NULL;
+    unsigned int cflags = 0;
+    int ret;
+
+    GLOBAL_STATE_CODE();
+
+    if (!bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
+                         (BDRV_CHILD_DATA | BDRV_CHILD_PRIMARY), false, errp)) {
+        return -EINVAL;
+    }
+    s->header = bdrv_open_child(NULL, options, "header", bs,
+                                &child_of_bds, BDRV_CHILD_METADATA, false,
+                                errp);
+    if (!s->header) {
+        return -EINVAL;
+    }
+
+    GRAPH_RDLOCK_GUARD_MAINLOOP();
+
+    opts = qemu_opts_create(&block_crypto_runtime_opts_luks,
+                            NULL, 0, &error_abort);
+    if (!qemu_opts_absorb_qdict(opts, options, errp)) {
+        ret = -EINVAL;
+        goto cleanup;
+    }
+
+    cryptoopts = qemu_opts_to_qdict(opts, NULL);
+    qdict_put_str(cryptoopts, "format",
+        QCryptoBlockFormat_str(Q_CRYPTO_BLOCK_FORMAT_GLUKS));
+
+    open_opts = block_crypto_open_opts_init(cryptoopts, errp);
+    if (!open_opts) {
+        goto cleanup;
+    }
+
+    s->crypto.block = qcrypto_block_open(open_opts, NULL,
+                                         gluks_read_func,
+                                         bs,
+                                         cflags,
+                                         1,
+                                         errp);
+    if (!s->crypto.block) {
+        ret = -EIO;
+        goto cleanup;
+    }
+
+    s->header_size = qcrypto_block_get_payload_offset(s->crypto.block);
+    qcrypto_block_set_payload_offset(s->crypto.block, 0);
+
+    ret = 0;
+ cleanup:
+    qobject_unref(cryptoopts);
+    qapi_free_QCryptoBlockOpenOptions(open_opts);
+    return ret;
 }
 
 static int coroutine_fn GRAPH_UNLOCKED
@@ -70,13 +155,24 @@ gluks_child_perms(BlockDriverState *bs, BdrvChild *c,
                   uint64_t perm, uint64_t shared,
                   uint64_t *nperm, uint64_t *nshared)
 {
+    if (role & BDRV_CHILD_METADATA) {
+        /* assign read permission only */
+        perm |= BLK_PERM_CONSISTENT_READ;
+        /* share all permissions */
+        shared |= BLK_PERM_ALL;
 
+        *nperm = perm;
+        *nshared = shared;
+        return;
+    }
+
+    bdrv_default_perms(bs, c, role, reopen_queue, perm, shared, nperm, nshared);
 }
 
 static int64_t coroutine_fn GRAPH_RDLOCK
 gluks_co_getlength(BlockDriverState *bs)
 {
-    return 0;
+    return bdrv_co_getlength(bs->file->bs);
 }
 
 static BlockDriver bdrv_generic_luks = {
@@ -87,8 +183,12 @@ static BlockDriver bdrv_generic_luks = {
     .bdrv_child_perm        = gluks_child_perms,
     .bdrv_co_getlength      = gluks_co_getlength,
 
+    .bdrv_close             = block_crypto_close,
+    .bdrv_co_preadv         = block_crypto_co_preadv,
+    .bdrv_co_pwritev        = block_crypto_co_pwritev,
     .create_opts            = &gluks_create_opts_luks,
     .amend_opts             = &block_crypto_amend_opts_luks,
+    .is_format              = false,
 };
 
 static void block_generic_luks_init(void)
-- 
2.39.1
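This RFC tests `role & BDRV_CHILD_METADATA`, while the v2 LUKS patch earlier in this thread tests `role == (role & BDRV_CHILD_METADATA)`. The two agree for a header child opened with BDRV_CHILD_METADATA alone but differ for a child whose role combines several bits. The sketch below compares them using hypothetical bit values (TOY_CHILD_*), not QEMU's real BDRV_CHILD_* constants. Presumably the stricter form matters in v2 because the LUKS driver's own "file" child also carries the metadata bit, whereas here Gluks opens its "file" child with BDRV_CHILD_DATA | BDRV_CHILD_PRIMARY only; treat that reading as an assumption to check against block.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical role bits; QEMU's real BDRV_CHILD_* values may differ. */
enum {
    TOY_CHILD_DATA     = 1u << 0,
    TOY_CHILD_METADATA = 1u << 1,
    TOY_CHILD_PRIMARY  = 1u << 4,
};

/* RFC 7/8 style: true for any role that includes the metadata bit. */
static bool toy_has_metadata(unsigned int role)
{
    return (role & TOY_CHILD_METADATA) != 0;
}

/* v2 3/4 style: true only if no bit other than metadata is set. */
static bool toy_metadata_only(unsigned int role)
{
    return role == (role & TOY_CHILD_METADATA);
}
```

Note that the stricter test is also true for a role of 0.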




[RFC 0/8] Support generic Luks encryption

2023-12-04 Thread Hyman Huang
This functionality was motivated by the following to-do item in the
crypto documentation:
https://wiki.qemu.org/Features/Block/Crypto

The last section, "separate header volume", says:

The LUKS format has ability to store the header in a separate volume
from the payload. We should extend the LUKS driver in QEMU to support
this use case.

As a proof of concept, I've created this patchset, which I've named
Gluks ("generic LUKS"). As the name suggests, it provides encryption
for any format that QEMU supports, at least in theory.

As you can see below, the Gluks format block layer driver's design is
quite simple.

 virtio-blk/vhost-user-blk...(front-end device)
              ^
              |
            Gluks   (format-like disk node)
           /     \
        file    header (blockdev reference)
         /         \
      file        file (protocol node)
        |           |
    disk data    Luks data

We don't need to create a new disk format in order to use Gluks to
encrypt the disk; all we need to do is construct a Luks header, which
we will refer to as the "Gluk" because it contains only Luks header
data and no user data. The creation command, for instance, is nearly
identical to that of a Luks image:

$ qemu-img create --object secret,id=sec0,data=abc123 -f gluks \
  -o cipher-alg=aes-256,cipher-mode=xts -o key-secret=sec0 \
  cipher.gluks

Note that the "size" option is not accepted when creating a Gluks
image, because the image contains only the Luks header data.
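Because the header lives on its own node, the data node needs no room
reserved for it: the open path in the patch saves the header's payload
offset in s->header_size and then resets the payload offset to 0, so
guest offsets map 1:1 onto the data node. The helper below is a
hypothetical sketch (not a QEMU function) that only models this offset
arithmetic, assuming a regular Luks image places its payload right
after the embedded header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: translate a guest-visible byte offset into the
 * offset on the node that holds the encrypted payload.
 *   detached_header: true for gluks (header on a separate node)
 *   payload_offset:  payload offset recorded in the Luks header
 */
static uint64_t data_file_offset(bool detached_header,
                                 uint64_t payload_offset,
                                 uint64_t guest_offset)
{
    /* gluks resets the payload offset to 0, so nothing is skipped */
    uint64_t effective = detached_header ? 0 : payload_offset;

    /* guard against wrap-around on absurd inputs */
    assert(guest_offset <= UINT64_MAX - effective);
    return effective + guest_offset;
}
```

For example, with an illustrative 2 MiB header,
data_file_offset(false, 2097152, 4096) returns 2101248, while
data_file_offset(true, 2097152, 4096) returns 4096.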

To hot-add a raw disk with Gluks encryption, follow these steps:

1. add a protocol blockdev node for the data disk
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
  "arguments":{"node-name": "libvirt-1-storage", "driver": "file",
  "filename": "/path/to/test_disk.raw"}}'

2. add a protocol blockdev node for the Luks header
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
  "arguments":{"node-name": "libvirt-2-storage", "driver": "file",
  "filename": "/path/to/cipher.gluks" }}'

3. add the secret for decrypting the cipher stored in the Gluks header
$ virsh qemu-monitor-command vm '{"execute":"object-add",
  "arguments":{"qom-type": "secret", "id":
  "libvirt-2-storage-secret0", "data": "abc123"}}'

4. add the gluks-driven blockdev node that connects the user disk with
   the Luks header; QEMU will use the cipher in the Luks header to
   encrypt/decrypt the disk data
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
  "arguments":{"node-name": "libvirt-1-format", "driver": "gluks", "file":
  "libvirt-1-storage", "header": "libvirt-2-storage", "key-secret":
  "libvirt-2-storage-secret0"}}' 

5. finally, add the device
$ virsh qemu-monitor-command vm '{"execute":"device_add",
  "arguments": {"num-queues": "1", "driver": "virtio-blk-pci", "scsi":
  "off", "drive": "libvirt-1-format", "id": "virtio-disk1"}}'

Reverse these steps to hot-remove the raw disk.
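A sketch of that reverse order, written as a C table of the QMP
commands one might pass to virsh qemu-monitor-command, assuming the
node names and IDs from the steps above. device_del, blockdev-del and
object-del are standard QMP commands; the table and the
gluks_teardown_find() helper are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hot-del steps for the raw-disk example: the hot-add steps in reverse. */
static const char *const gluks_raw_teardown[] = {
    /* undo step 5: detach the guest device */
    "{\"execute\":\"device_del\",\"arguments\":{\"id\":\"virtio-disk1\"}}",
    /* undo step 4: remove the gluks format node */
    "{\"execute\":\"blockdev-del\","
    "\"arguments\":{\"node-name\":\"libvirt-1-format\"}}",
    /* undo step 3: remove the secret object */
    "{\"execute\":\"object-del\","
    "\"arguments\":{\"id\":\"libvirt-2-storage-secret0\"}}",
    /* undo step 2: remove the header protocol node */
    "{\"execute\":\"blockdev-del\","
    "\"arguments\":{\"node-name\":\"libvirt-2-storage\"}}",
    /* undo step 1: remove the data protocol node */
    "{\"execute\":\"blockdev-del\","
    "\"arguments\":{\"node-name\":\"libvirt-1-storage\"}}",
};

#define GLUKS_TEARDOWN_STEPS \
    (sizeof(gluks_raw_teardown) / sizeof(gluks_raw_teardown[0]))

/* Return the index of the first step whose text contains needle, or -1. */
static int gluks_teardown_find(const char *needle)
{
    assert(needle != NULL);
    for (size_t i = 0; i < GLUKS_TEARDOWN_STEPS; i++) {
        if (strstr(gluks_raw_teardown[i], needle) != NULL) {
            return (int)i;
        }
    }
    return -1;
}
```

device_del completes asynchronously, so wait for the DEVICE_DELETED
event before deleting the block nodes; otherwise blockdev-del may fail
because the node is still in use.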

To hot-add a qcow2 disk with Gluks encryption:

1. add a protocol blockdev node for the data disk
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
  "arguments":{"node-name": "libvirt-1-storage", "driver": "file",
  "filename": "/path/to/test_disk.qcow2"}}'

2. add a protocol blockdev node for the Luks header, as above
   block ref: libvirt-2-storage

3. add the secret for decrypting the cipher stored in the Gluks header,
   also as above
   secret ref: libvirt-2-storage-secret0

4. add the qcow2 format blockdev node:
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
  "arguments":{"node-name": "libvirt-1-format", "driver": "qcow2",
  "file": "libvirt-1-storage"}}'

5. add the gluks-driven blockdev node that connects the qcow2 disk with
   the Luks header
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
  "arguments":{"node-name": "libvirt-2-format", "driver": "gluks",
  "file": "libvirt-1-format", "header": "libvirt-2-storage",
  "key-secret": "libvirt-2-storage-secret0"}}'

6. finally, add the device
$ virsh qemu-monitor-command vm '{"execute":"device_add",
  "arguments": {"num-queues": "

[PATCH v4] crypto: Introduce SM4 symmetric cipher algorithm

2023-11-29 Thread Hyman Huang
Introduce the SM4 cipher algorithm (OSCCA GB/T 32907-2016).

SM4 (GB/T 32907-2016) is a cryptographic standard issued by the
Organization of State Commercial Administration of China (OSCCA)
as an authorized cryptographic algorithm for use within China.

Use the crypto-sm4 meson build option to explicitly control the
feature; by default, support is enabled if it is detected.
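The block-luks.c hunk registers SM4 with the LUKS driver through two
tables: a size map from key length in bytes to algorithm, terminated
by a { 0, 0 } sentinel, and a name map from the LUKS cipher name to
that size map. Below is a self-contained sketch of the lookup those
tables feed; the enum, struct and function names are stand-ins rather
than QEMU's definitions, and only a subset of the aes rows is modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in algorithm IDs; QEMU's real enum is QCryptoCipherAlgorithm. */
typedef enum {
    ALG_NONE = 0,
    ALG_AES_128,
    ALG_AES_256,
    ALG_SM4,
} Alg;

typedef struct { unsigned key_bytes; Alg alg; } SizeMap;
typedef struct { const char *name; const SizeMap *sizes; } NameMap;

/* Each size map ends with a { 0, 0 } sentinel. */
static const SizeMap size_map_aes[] = {
    { 16, ALG_AES_128 },
    { 32, ALG_AES_256 },
    { 0, 0 },
};

static const SizeMap size_map_sm4[] = {
    { 16, ALG_SM4 },   /* SM4 uses a 128-bit key only */
    { 0, 0 },
};

static const NameMap name_map[] = {
    { "aes", size_map_aes },
    { "sm4", size_map_sm4 },
};

/* Resolve a LUKS header cipher name plus key size to an algorithm. */
static Alg cipher_name_lookup(const char *name, unsigned key_bytes)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(name_map) / sizeof(name_map[0]); i++) {
        if (strcmp(name_map[i].name, name) != 0) {
            continue;
        }
        /* walk the size map until the zero sentinel */
        for (const SizeMap *s = name_map[i].sizes; s->key_bytes; s++) {
            if (s->key_bytes == key_bytes) {
                return s->alg;
            }
        }
    }
    return ALG_NONE;
}
```

QEMU's real lookup also takes the cipher mode into account (in XTS
mode the key is split in two), which this sketch ignores.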

Signed-off-by: Hyman Huang 
Reviewed-by: Philippe Mathieu-Daudé 
---
 crypto/block-luks.c | 11 
 crypto/cipher-gcrypt.c.inc  |  8 ++
 crypto/cipher-nettle.c.inc  | 49 +
 crypto/cipher.c |  6 
 meson.build | 42 
 meson_options.txt   |  2 ++
 qapi/crypto.json|  5 +++-
 scripts/meson-buildoptions.sh   |  3 ++
 tests/unit/test-crypto-cipher.c | 13 +
 9 files changed, 138 insertions(+), 1 deletion(-)

diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index fb01ec38bb..f0813d69b4 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -95,12 +95,23 @@ qcrypto_block_luks_cipher_size_map_twofish[] = {
 { 0, 0 },
 };
 
+#ifdef CONFIG_CRYPTO_SM4
+static const QCryptoBlockLUKSCipherSizeMap
+qcrypto_block_luks_cipher_size_map_sm4[] = {
+{ 16, QCRYPTO_CIPHER_ALG_SM4},
+{ 0, 0 },
+};
+#endif
+
 static const QCryptoBlockLUKSCipherNameMap
 qcrypto_block_luks_cipher_name_map[] = {
 { "aes", qcrypto_block_luks_cipher_size_map_aes },
 { "cast5", qcrypto_block_luks_cipher_size_map_cast5 },
 { "serpent", qcrypto_block_luks_cipher_size_map_serpent },
 { "twofish", qcrypto_block_luks_cipher_size_map_twofish },
+#ifdef CONFIG_CRYPTO_SM4
+{ "sm4", qcrypto_block_luks_cipher_size_map_sm4},
+#endif
 };
 
 QEMU_BUILD_BUG_ON(sizeof(struct QCryptoBlockLUKSKeySlot) != 48);
diff --git a/crypto/cipher-gcrypt.c.inc b/crypto/cipher-gcrypt.c.inc
index a6a0117717..1377cbaf14 100644
--- a/crypto/cipher-gcrypt.c.inc
+++ b/crypto/cipher-gcrypt.c.inc
@@ -35,6 +35,9 @@ bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg,
 case QCRYPTO_CIPHER_ALG_SERPENT_256:
 case QCRYPTO_CIPHER_ALG_TWOFISH_128:
 case QCRYPTO_CIPHER_ALG_TWOFISH_256:
+#ifdef CONFIG_CRYPTO_SM4
+case QCRYPTO_CIPHER_ALG_SM4:
+#endif
 break;
 default:
 return false;
@@ -219,6 +222,11 @@ static QCryptoCipher *qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg,
 case QCRYPTO_CIPHER_ALG_TWOFISH_256:
 gcryalg = GCRY_CIPHER_TWOFISH;
 break;
+#ifdef CONFIG_CRYPTO_SM4
+case QCRYPTO_CIPHER_ALG_SM4:
+gcryalg = GCRY_CIPHER_SM4;
+break;
+#endif
 default:
 error_setg(errp, "Unsupported cipher algorithm %s",
QCryptoCipherAlgorithm_str(alg));
diff --git a/crypto/cipher-nettle.c.inc b/crypto/cipher-nettle.c.inc
index 24cc61f87b..42b39e18a2 100644
--- a/crypto/cipher-nettle.c.inc
+++ b/crypto/cipher-nettle.c.inc
@@ -33,6 +33,9 @@
 #ifndef CONFIG_QEMU_PRIVATE_XTS
 #include 
 #endif
+#ifdef CONFIG_CRYPTO_SM4
+#include 
+#endif
 
 static inline bool qcrypto_length_check(size_t len, size_t blocksize,
 Error **errp)
@@ -426,6 +429,30 @@ DEFINE_ECB_CBC_CTR_XTS(qcrypto_nettle_twofish,
QCryptoNettleTwofish, TWOFISH_BLOCK_SIZE,
twofish_encrypt_native, twofish_decrypt_native)
 
+#ifdef CONFIG_CRYPTO_SM4
+typedef struct QCryptoNettleSm4 {
+    QCryptoCipher base;
+    struct sm4_ctx key[2];
+} QCryptoNettleSm4;
+
+static void sm4_encrypt_native(void *ctx, size_t length,
+                               uint8_t *dst, const uint8_t *src)
+{
+    struct sm4_ctx *keys = ctx;
+    sm4_crypt(&keys[0], length, dst, src);
+}
+
+static void sm4_decrypt_native(void *ctx, size_t length,
+                               uint8_t *dst, const uint8_t *src)
+{
+    struct sm4_ctx *keys = ctx;
+    sm4_crypt(&keys[1], length, dst, src);
+}
+
+DEFINE_ECB(qcrypto_nettle_sm4,
+           QCryptoNettleSm4, SM4_BLOCK_SIZE,
+           sm4_encrypt_native, sm4_decrypt_native)
+#endif
 
 bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg,
  QCryptoCipherMode mode)
@@ -443,6 +470,9 @@ bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg,
 case QCRYPTO_CIPHER_ALG_TWOFISH_128:
 case QCRYPTO_CIPHER_ALG_TWOFISH_192:
 case QCRYPTO_CIPHER_ALG_TWOFISH_256:
+#ifdef CONFIG_CRYPTO_SM4
+case QCRYPTO_CIPHER_ALG_SM4:
+#endif
 break;
 default:
 return false;
@@ -701,6 +731,25 @@ static QCryptoCipher *qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg,
 
             return &ctx->base;
         }
+#ifdef CONFIG_CRYPTO_SM4
+    case QCRYPTO_CIPHER_ALG_SM4:
+        {
+            QCryptoNettleSm4 *ctx = g_new0(QCryptoNettleSm4, 1);
+
+            switch (mode) {
+            case QCRYPTO_CIPHER_MODE_ECB:
+                ctx->base.driver = &qcrypto_nettle_sm4_driver_ecb;
+

[PATCH v3] crypto: Introduce SM4 symmetric cipher algorithm

2023-11-29 Thread Hyman Huang
Introduce the SM4 cipher algorithm (OSCCA GB/T 32907-2016).

SM4 (GB/T 32907-2016) is a cryptographic standard issued by the
Organization of State Commercial Administration of China (OSCCA)
as an authorized cryptographic algorithm for use within China.

Use the crypto-sm4 meson build option to explicitly control the
feature; by default, support is enabled if it is detected.

Signed-off-by: Hyman Huang 
---
 crypto/block-luks.c | 11 
 crypto/cipher-gcrypt.c.inc  |  8 ++
 crypto/cipher-nettle.c.inc  | 49 +
 crypto/cipher.c |  6 
 meson.build | 42 
 meson_options.txt   |  2 ++
 qapi/crypto.json|  5 +++-
 scripts/meson-buildoptions.sh   |  3 ++
 tests/unit/test-crypto-cipher.c | 13 +
 9 files changed, 138 insertions(+), 1 deletion(-)

diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index fb01ec38bb..f0813d69b4 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -95,12 +95,23 @@ qcrypto_block_luks_cipher_size_map_twofish[] = {
 { 0, 0 },
 };
 
+#ifdef CONFIG_CRYPTO_SM4
+static const QCryptoBlockLUKSCipherSizeMap
+qcrypto_block_luks_cipher_size_map_sm4[] = {
+{ 16, QCRYPTO_CIPHER_ALG_SM4},
+{ 0, 0 },
+};
+#endif
+
 static const QCryptoBlockLUKSCipherNameMap
 qcrypto_block_luks_cipher_name_map[] = {
 { "aes", qcrypto_block_luks_cipher_size_map_aes },
 { "cast5", qcrypto_block_luks_cipher_size_map_cast5 },
 { "serpent", qcrypto_block_luks_cipher_size_map_serpent },
 { "twofish", qcrypto_block_luks_cipher_size_map_twofish },
+#ifdef CONFIG_CRYPTO_SM4
+{ "sm4", qcrypto_block_luks_cipher_size_map_sm4},
+#endif
 };
 
 QEMU_BUILD_BUG_ON(sizeof(struct QCryptoBlockLUKSKeySlot) != 48);
diff --git a/crypto/cipher-gcrypt.c.inc b/crypto/cipher-gcrypt.c.inc
index a6a0117717..1377cbaf14 100644
--- a/crypto/cipher-gcrypt.c.inc
+++ b/crypto/cipher-gcrypt.c.inc
@@ -35,6 +35,9 @@ bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg,
 case QCRYPTO_CIPHER_ALG_SERPENT_256:
 case QCRYPTO_CIPHER_ALG_TWOFISH_128:
 case QCRYPTO_CIPHER_ALG_TWOFISH_256:
+#ifdef CONFIG_CRYPTO_SM4
+case QCRYPTO_CIPHER_ALG_SM4:
+#endif
 break;
 default:
 return false;
@@ -219,6 +222,11 @@ static QCryptoCipher *qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg,
 case QCRYPTO_CIPHER_ALG_TWOFISH_256:
 gcryalg = GCRY_CIPHER_TWOFISH;
 break;
+#ifdef CONFIG_CRYPTO_SM4
+case QCRYPTO_CIPHER_ALG_SM4:
+gcryalg = GCRY_CIPHER_SM4;
+break;
+#endif
 default:
 error_setg(errp, "Unsupported cipher algorithm %s",
QCryptoCipherAlgorithm_str(alg));
diff --git a/crypto/cipher-nettle.c.inc b/crypto/cipher-nettle.c.inc
index 24cc61f87b..42b39e18a2 100644
--- a/crypto/cipher-nettle.c.inc
+++ b/crypto/cipher-nettle.c.inc
@@ -33,6 +33,9 @@
 #ifndef CONFIG_QEMU_PRIVATE_XTS
 #include 
 #endif
+#ifdef CONFIG_CRYPTO_SM4
+#include 
+#endif
 
 static inline bool qcrypto_length_check(size_t len, size_t blocksize,
 Error **errp)
@@ -426,6 +429,30 @@ DEFINE_ECB_CBC_CTR_XTS(qcrypto_nettle_twofish,
QCryptoNettleTwofish, TWOFISH_BLOCK_SIZE,
twofish_encrypt_native, twofish_decrypt_native)
 
+#ifdef CONFIG_CRYPTO_SM4
+typedef struct QCryptoNettleSm4 {
+    QCryptoCipher base;
+    struct sm4_ctx key[2];
+} QCryptoNettleSm4;
+
+static void sm4_encrypt_native(void *ctx, size_t length,
+                               uint8_t *dst, const uint8_t *src)
+{
+    struct sm4_ctx *keys = ctx;
+    sm4_crypt(&keys[0], length, dst, src);
+}
+
+static void sm4_decrypt_native(void *ctx, size_t length,
+                               uint8_t *dst, const uint8_t *src)
+{
+    struct sm4_ctx *keys = ctx;
+    sm4_crypt(&keys[1], length, dst, src);
+}
+
+DEFINE_ECB(qcrypto_nettle_sm4,
+           QCryptoNettleSm4, SM4_BLOCK_SIZE,
+           sm4_encrypt_native, sm4_decrypt_native)
+#endif
 
 bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg,
  QCryptoCipherMode mode)
@@ -443,6 +470,9 @@ bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg,
 case QCRYPTO_CIPHER_ALG_TWOFISH_128:
 case QCRYPTO_CIPHER_ALG_TWOFISH_192:
 case QCRYPTO_CIPHER_ALG_TWOFISH_256:
+#ifdef CONFIG_CRYPTO_SM4
+case QCRYPTO_CIPHER_ALG_SM4:
+#endif
 break;
 default:
 return false;
@@ -701,6 +731,25 @@ static QCryptoCipher *qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg,
 
             return &ctx->base;
         }
+#ifdef CONFIG_CRYPTO_SM4
+    case QCRYPTO_CIPHER_ALG_SM4:
+        {
+            QCryptoNettleSm4 *ctx = g_new0(QCryptoNettleSm4, 1);
+
+            switch (mode) {
+            case QCRYPTO_CIPHER_MODE_ECB:
+                ctx->base.driver = &qcrypto_nettle_sm4_driver_ecb;
+

[PATCH v2] crypto: Introduce SM4 symmetric cipher algorithm

2023-11-28 Thread Hyman Huang
Introduce the SM4 cipher algorithm (OSCCA GB/T 32907-2016).

SM4 (GB/T 32907-2016) is a cryptographic standard issued by the
Organization of State Commercial Administration of China (OSCCA)
as an authorized cryptographic algorithm for use within China.

Use the crypto-sm4 meson build option to enable this feature.

Signed-off-by: Hyman Huang 
---
 crypto/block-luks.c | 11 
 crypto/cipher-gcrypt.c.inc  |  8 ++
 crypto/cipher-nettle.c.inc  | 49 +
 crypto/cipher.c |  6 
 meson.build | 23 
 meson_options.txt   |  2 ++
 qapi/crypto.json|  5 +++-
 scripts/meson-buildoptions.sh   |  3 ++
 tests/unit/test-crypto-cipher.c | 13 +
 9 files changed, 119 insertions(+), 1 deletion(-)

diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index fb01ec38bb..f0813d69b4 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -95,12 +95,23 @@ qcrypto_block_luks_cipher_size_map_twofish[] = {
 { 0, 0 },
 };
 
+#ifdef CONFIG_CRYPTO_SM4
+static const QCryptoBlockLUKSCipherSizeMap
+qcrypto_block_luks_cipher_size_map_sm4[] = {
+{ 16, QCRYPTO_CIPHER_ALG_SM4},
+{ 0, 0 },
+};
+#endif
+
 static const QCryptoBlockLUKSCipherNameMap
 qcrypto_block_luks_cipher_name_map[] = {
 { "aes", qcrypto_block_luks_cipher_size_map_aes },
 { "cast5", qcrypto_block_luks_cipher_size_map_cast5 },
 { "serpent", qcrypto_block_luks_cipher_size_map_serpent },
 { "twofish", qcrypto_block_luks_cipher_size_map_twofish },
+#ifdef CONFIG_CRYPTO_SM4
+{ "sm4", qcrypto_block_luks_cipher_size_map_sm4},
+#endif
 };
 
 QEMU_BUILD_BUG_ON(sizeof(struct QCryptoBlockLUKSKeySlot) != 48);
diff --git a/crypto/cipher-gcrypt.c.inc b/crypto/cipher-gcrypt.c.inc
index a6a0117717..1377cbaf14 100644
--- a/crypto/cipher-gcrypt.c.inc
+++ b/crypto/cipher-gcrypt.c.inc
@@ -35,6 +35,9 @@ bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg,
 case QCRYPTO_CIPHER_ALG_SERPENT_256:
 case QCRYPTO_CIPHER_ALG_TWOFISH_128:
 case QCRYPTO_CIPHER_ALG_TWOFISH_256:
+#ifdef CONFIG_CRYPTO_SM4
+case QCRYPTO_CIPHER_ALG_SM4:
+#endif
 break;
 default:
 return false;
@@ -219,6 +222,11 @@ static QCryptoCipher *qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg,
 case QCRYPTO_CIPHER_ALG_TWOFISH_256:
 gcryalg = GCRY_CIPHER_TWOFISH;
 break;
+#ifdef CONFIG_CRYPTO_SM4
+case QCRYPTO_CIPHER_ALG_SM4:
+gcryalg = GCRY_CIPHER_SM4;
+break;
+#endif
 default:
 error_setg(errp, "Unsupported cipher algorithm %s",
QCryptoCipherAlgorithm_str(alg));
diff --git a/crypto/cipher-nettle.c.inc b/crypto/cipher-nettle.c.inc
index 24cc61f87b..42b39e18a2 100644
--- a/crypto/cipher-nettle.c.inc
+++ b/crypto/cipher-nettle.c.inc
@@ -33,6 +33,9 @@
 #ifndef CONFIG_QEMU_PRIVATE_XTS
 #include 
 #endif
+#ifdef CONFIG_CRYPTO_SM4
+#include 
+#endif
 
 static inline bool qcrypto_length_check(size_t len, size_t blocksize,
 Error **errp)
@@ -426,6 +429,30 @@ DEFINE_ECB_CBC_CTR_XTS(qcrypto_nettle_twofish,
QCryptoNettleTwofish, TWOFISH_BLOCK_SIZE,
twofish_encrypt_native, twofish_decrypt_native)
 
+#ifdef CONFIG_CRYPTO_SM4
+typedef struct QCryptoNettleSm4 {
+    QCryptoCipher base;
+    struct sm4_ctx key[2];
+} QCryptoNettleSm4;
+
+static void sm4_encrypt_native(void *ctx, size_t length,
+                               uint8_t *dst, const uint8_t *src)
+{
+    struct sm4_ctx *keys = ctx;
+    sm4_crypt(&keys[0], length, dst, src);
+}
+
+static void sm4_decrypt_native(void *ctx, size_t length,
+                               uint8_t *dst, const uint8_t *src)
+{
+    struct sm4_ctx *keys = ctx;
+    sm4_crypt(&keys[1], length, dst, src);
+}
+
+DEFINE_ECB(qcrypto_nettle_sm4,
+           QCryptoNettleSm4, SM4_BLOCK_SIZE,
+           sm4_encrypt_native, sm4_decrypt_native)
+#endif
 
 bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg,
  QCryptoCipherMode mode)
@@ -443,6 +470,9 @@ bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg,
 case QCRYPTO_CIPHER_ALG_TWOFISH_128:
 case QCRYPTO_CIPHER_ALG_TWOFISH_192:
 case QCRYPTO_CIPHER_ALG_TWOFISH_256:
+#ifdef CONFIG_CRYPTO_SM4
+case QCRYPTO_CIPHER_ALG_SM4:
+#endif
 break;
 default:
 return false;
@@ -701,6 +731,25 @@ static QCryptoCipher *qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg,
 
             return &ctx->base;
         }
+#ifdef CONFIG_CRYPTO_SM4
+    case QCRYPTO_CIPHER_ALG_SM4:
+        {
+            QCryptoNettleSm4 *ctx = g_new0(QCryptoNettleSm4, 1);
+
+            switch (mode) {
+            case QCRYPTO_CIPHER_MODE_ECB:
+                ctx->base.driver = &qcrypto_nettle_sm4_driver_ecb;
+                break;
+            default:
+goto bad_cipher_mo

[PATCH] crypto: Introduce SM4 symmetric cipher algorithm

2023-11-27 Thread Hyman Huang
Introduce the SM4 cipher algorithm (OSCCA GB/T 32907-2016).

SM4 (GB/T 32907-2016) is a cryptographic standard issued by the
Organization of State Commercial Administration of China (OSCCA)
as an authorized cryptographic algorithm for use within China.

Signed-off-by: Hyman Huang 
---
 crypto/block-luks.c |  7 ++
 crypto/cipher-gcrypt.c.inc  |  4 
 crypto/cipher-nettle.c.inc  | 42 +
 crypto/cipher.c |  2 ++
 qapi/crypto.json|  5 +++-
 tests/unit/test-crypto-cipher.c | 11 +
 6 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index fb01ec38bb..1cb7f21a05 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -95,12 +95,19 @@ qcrypto_block_luks_cipher_size_map_twofish[] = {
 { 0, 0 },
 };
 
+static const QCryptoBlockLUKSCipherSizeMap
+qcrypto_block_luks_cipher_size_map_sm4[] = {
+{ 16, QCRYPTO_CIPHER_ALG_SM4},
+{ 0, 0 },
+};
+
 static const QCryptoBlockLUKSCipherNameMap
 qcrypto_block_luks_cipher_name_map[] = {
 { "aes", qcrypto_block_luks_cipher_size_map_aes },
 { "cast5", qcrypto_block_luks_cipher_size_map_cast5 },
 { "serpent", qcrypto_block_luks_cipher_size_map_serpent },
 { "twofish", qcrypto_block_luks_cipher_size_map_twofish },
+{ "sm4", qcrypto_block_luks_cipher_size_map_sm4},
 };
 
 QEMU_BUILD_BUG_ON(sizeof(struct QCryptoBlockLUKSKeySlot) != 48);
diff --git a/crypto/cipher-gcrypt.c.inc b/crypto/cipher-gcrypt.c.inc
index a6a0117717..03af50b0c3 100644
--- a/crypto/cipher-gcrypt.c.inc
+++ b/crypto/cipher-gcrypt.c.inc
@@ -35,6 +35,7 @@ bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg,
 case QCRYPTO_CIPHER_ALG_SERPENT_256:
 case QCRYPTO_CIPHER_ALG_TWOFISH_128:
 case QCRYPTO_CIPHER_ALG_TWOFISH_256:
+case QCRYPTO_CIPHER_ALG_SM4:
 break;
 default:
 return false;
@@ -219,6 +220,9 @@ static QCryptoCipher *qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg,
 case QCRYPTO_CIPHER_ALG_TWOFISH_256:
 gcryalg = GCRY_CIPHER_TWOFISH;
 break;
+case QCRYPTO_CIPHER_ALG_SM4:
+gcryalg = GCRY_CIPHER_SM4;
+break;
 default:
 error_setg(errp, "Unsupported cipher algorithm %s",
QCryptoCipherAlgorithm_str(alg));
diff --git a/crypto/cipher-nettle.c.inc b/crypto/cipher-nettle.c.inc
index 24cc61f87b..cd2ca0c7b5 100644
--- a/crypto/cipher-nettle.c.inc
+++ b/crypto/cipher-nettle.c.inc
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #ifndef CONFIG_QEMU_PRIVATE_XTS
 #include 
 #endif
@@ -426,6 +427,28 @@ DEFINE_ECB_CBC_CTR_XTS(qcrypto_nettle_twofish,
QCryptoNettleTwofish, TWOFISH_BLOCK_SIZE,
twofish_encrypt_native, twofish_decrypt_native)
 
+typedef struct QCryptoNettleSm4 {
+    QCryptoCipher base;
+    struct sm4_ctx key[2];
+} QCryptoNettleSm4;
+
+static void sm4_encrypt_native(void *ctx, size_t length,
+                               uint8_t *dst, const uint8_t *src)
+{
+    struct sm4_ctx *keys = ctx;
+    sm4_crypt(&keys[0], length, dst, src);
+}
+
+static void sm4_decrypt_native(void *ctx, size_t length,
+                               uint8_t *dst, const uint8_t *src)
+{
+    struct sm4_ctx *keys = ctx;
+    sm4_crypt(&keys[1], length, dst, src);
+}
+
+DEFINE_ECB(qcrypto_nettle_sm4,
+           QCryptoNettleSm4, SM4_BLOCK_SIZE,
+           sm4_encrypt_native, sm4_decrypt_native)
 
 bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg,
  QCryptoCipherMode mode)
@@ -443,6 +466,7 @@ bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg,
 case QCRYPTO_CIPHER_ALG_TWOFISH_128:
 case QCRYPTO_CIPHER_ALG_TWOFISH_192:
 case QCRYPTO_CIPHER_ALG_TWOFISH_256:
+case QCRYPTO_CIPHER_ALG_SM4:
 break;
 default:
 return false;
@@ -702,6 +726,24 @@ static QCryptoCipher *qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg,
             return &ctx->base;
         }
 
+    case QCRYPTO_CIPHER_ALG_SM4:
+        {
+            QCryptoNettleSm4 *ctx = g_new0(QCryptoNettleSm4, 1);
+
+            switch (mode) {
+            case QCRYPTO_CIPHER_MODE_ECB:
+                ctx->base.driver = &qcrypto_nettle_sm4_driver_ecb;
+                break;
+            default:
+                goto bad_cipher_mode;
+            }
+
+            sm4_set_encrypt_key(&ctx->key[0], key);
+            sm4_set_decrypt_key(&ctx->key[1], key);
+
+            return &ctx->base;
+        }
+
 default:
 error_setg(errp, "Unsupported cipher algorithm %s",
QCryptoCipherAlgorithm_str(alg));
diff --git a/crypto/cipher.c b/crypto/cipher.c
index 74b09a5b26..048ceaa6a3 100644
--- a/crypto/cipher.c
+++ b/crypto/cipher.c
@@ -38,6 +38,7 @@ static const size_t alg_key_len[QCRYPTO_CIPHER_ALG__MAX] = {
 [QCRYPTO_CIPHER_ALG_TWOFISH_128] = 16,
 [QCRYPTO_CIPHER_ALG_TWOFIS

[v2 1/2] qapi/virtio: Add feature and status bits for x-query-virtio-status

2023-11-18 Thread Hyman Huang
This patch makes x-query-virtio-status also report the feature and
status bits as plain numbers.

Applications may find it helpful to compare numerically encoded status
and features. For example, an upper-layer application that wants to
verify that virtio negotiation between the guest, QEMU, and OVS-DPDK is
correct can compare the features (encoded as a number) in the output of
"ovs-vsctl list interface" directly against the feature bit fields in
the output of the QMP command "x-query-virtio-status", without any
further encoding.

This patch also prepares for the next one, which implements a
vhost-user test case for the acked features of the vhost-user protocol.

Since the corresponding HMP command is intended for humans, it is left
unchanged.
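A minimal sketch of the direct comparison described above, assuming
both values have already been parsed into uint64_t (for instance
acked-features-bits from x-query-virtio-status and the features number
reported by OVS); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bits that are set in exactly one of the two feature words. */
static uint64_t feature_mismatch(uint64_t qemu_bits, uint64_t backend_bits)
{
    return qemu_bits ^ backend_bits;
}

/* Index of the lowest differing bit, or -1 if the words are equal. */
static int first_mismatched_bit(uint64_t qemu_bits, uint64_t backend_bits)
{
    uint64_t diff = feature_mismatch(qemu_bits, backend_bits);
    int bit = 0;

    if (diff == 0) {
        return -1;
    }
    /* shift until the lowest set bit reaches position 0 */
    while ((diff & 1) == 0) {
        diff >>= 1;
        bit++;
    }
    assert(bit < 64);
    return bit;
}
```

For example, if QEMU acked VIRTIO_F_VERSION_1 (bit 32) but the backend
did not, first_mismatched_bit() returns 32.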

Signed-off-by: Hyman Huang 
---
 hw/virtio/virtio-qmp.c |  8 
 qapi/virtio.json   | 37 +
 2 files changed, 45 insertions(+)

diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c
index 1dd96ed20f..13ba1e926e 100644
--- a/hw/virtio/virtio-qmp.c
+++ b/hw/virtio/virtio-qmp.c
@@ -733,6 +733,9 @@ VirtioStatus *qmp_x_query_virtio_status(const char *path, Error **errp)
 status->name = g_strdup(vdev->name);
 status->device_id = vdev->device_id;
 status->vhost_started = vdev->vhost_started;
+status->guest_features_bits = vdev->guest_features;
+status->host_features_bits = vdev->host_features;
+status->backend_features_bits = vdev->backend_features;
 status->guest_features = qmp_decode_features(vdev->device_id,
  vdev->guest_features);
 status->host_features = qmp_decode_features(vdev->device_id,
@@ -753,6 +756,7 @@ VirtioStatus *qmp_x_query_virtio_status(const char *path, Error **errp)
 }
 
 status->num_vqs = virtio_get_num_queues(vdev);
+status->status_bits = vdev->status;
 status->status = qmp_decode_status(vdev->status);
 status->isr = vdev->isr;
 status->queue_sel = vdev->queue_sel;
@@ -775,6 +779,10 @@ VirtioStatus *qmp_x_query_virtio_status(const char *path, Error **errp)
 status->vhost_dev->n_tmp_sections = hdev->n_tmp_sections;
 status->vhost_dev->nvqs = hdev->nvqs;
 status->vhost_dev->vq_index = hdev->vq_index;
+status->vhost_dev->features_bits = hdev->features;
+status->vhost_dev->acked_features_bits = hdev->acked_features;
+status->vhost_dev->backend_features_bits = hdev->backend_features;
+status->vhost_dev->protocol_features_bits = hdev->protocol_features;
 status->vhost_dev->features =
 qmp_decode_features(vdev->device_id, hdev->features);
 status->vhost_dev->acked_features =
diff --git a/qapi/virtio.json b/qapi/virtio.json
index e6dcee7b83..6f1b5e3710 100644
--- a/qapi/virtio.json
+++ b/qapi/virtio.json
@@ -79,12 +79,20 @@
 #
 # @vq-index: vhost_dev vq_index
 #
+# @features-bits: vhost_dev features encoded as a number
+#
 # @features: vhost_dev features
 #
+# @acked-features-bits: vhost_dev acked_features encoded as a number
+#
 # @acked-features: vhost_dev acked_features
 #
+# @backend-features-bits: vhost_dev backend_features encoded as a number
+#
 # @backend-features: vhost_dev backend_features
 #
+# @protocol-features-bits: vhost_dev protocol_features encoded as a number
+#
 # @protocol-features: vhost_dev protocol_features
 #
 # @max-queues: vhost_dev max_queues
@@ -102,9 +110,13 @@
 'n-tmp-sections': 'int',
 'nvqs': 'uint32',
 'vq-index': 'int',
+'features-bits': 'uint64',
 'features': 'VirtioDeviceFeatures',
+'acked-features-bits': 'uint64',
 'acked-features': 'VirtioDeviceFeatures',
+'backend-features-bits': 'uint64',
 'backend-features': 'VirtioDeviceFeatures',
+'protocol-features-bits': 'uint64',
 'protocol-features': 'VhostDeviceProtocols',
 'max-queues': 'uint64',
 'backend-cap': 'uint64',
@@ -124,10 +136,16 @@
 #
 # @vhost-started: VirtIODevice vhost_started flag
 #
+# @guest-features-bits: VirtIODevice guest_features encoded as a number
+#
 # @guest-features: VirtIODevice guest_features
 #
+# @host-features-bits: VirtIODevice host_features encoded as a number
+#
 # @host-features: VirtIODevice host_features
 #
+# @backend-features-bits: VirtIODevice backend_features encoded as a number
+#
 # @backend-features: VirtIODevice backend_features
 #
 # @device-endian: VirtIODevice device_endian
@@ -135,6 +153,9 @@
 # @num-vqs: VirtIODevice virtqueue count.  This is the number of
 # active virtqueues being used by the VirtIODevice.
 #
+# @status-bits: VirtIODevice configuration status encoded as a number
+# (VirtioDeviceStatus)
+#
 # @status: VirtIODevice configuration s

[v2 2/2] vhost-user-test: Add negotiated features check

2023-11-18 Thread Hyman Huang
When a vhost-user network device is restored after an unexpected
failure, the acked_features can be used as input for the
VHOST_USER_SET_FEATURES command, because QEMU internally backs up the
final features as acked_features after the guest acknowledges the
features during virtio-net driver initialization.

The negotiated features check verifies that the features in the
vhost-user backend device and the acked_features in QEMU are identical.

Using the vhost-user protocol, the test case verifies that the
vhost-user network device negotiates features correctly.
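For reference, a sketch of the VHOST_USER_SET_FEATURES message that
carries these features, assuming the header layout from the vhost-user
specification (32-bit request, flags and size fields, followed by the
payload) and native byte order; encode_set_features() is a hypothetical
helper, not QEMU code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VHOST_USER_SET_FEATURES 2u   /* request code from the spec */
#define VHOST_USER_VERSION      0x1u /* version, flags bits 0-1 */
#define VHOST_USER_HDR_SIZE     12u  /* request + flags + size */

/*
 * Serialize a SET_FEATURES message carrying acked_features into buf,
 * in native byte order. Returns the number of bytes written, or 0 if
 * buf is too small.
 */
static size_t encode_set_features(uint8_t *buf, size_t len,
                                  uint64_t acked_features)
{
    uint32_t request = VHOST_USER_SET_FEATURES;
    uint32_t flags = VHOST_USER_VERSION;
    uint32_t size = sizeof(acked_features);   /* payload size: 8 */
    size_t total = VHOST_USER_HDR_SIZE + sizeof(acked_features);

    assert(buf != NULL);
    if (len < total) {
        return 0;
    }
    /* copy field by field so no padding sneaks in before the u64 */
    memcpy(buf, &request, 4);
    memcpy(buf + 4, &flags, 4);
    memcpy(buf + 8, &size, 4);
    memcpy(buf + VHOST_USER_HDR_SIZE, &acked_features, 8);
    return total;
}
```

Serializing field by field sidesteps the 4 bytes of padding an
unpacked C struct would place before the 64-bit payload; QEMU's own
message structs are declared packed for the same reason.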

Signed-off-by: Hyman Huang 
---
 tests/qtest/vhost-user-test.c | 100 ++
 1 file changed, 100 insertions(+)

diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c
index d4e437265f..4f98ee2560 100644
--- a/tests/qtest/vhost-user-test.c
+++ b/tests/qtest/vhost-user-test.c
@@ -13,6 +13,7 @@
 #include "libqtest-single.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qlist.h"
 #include "qemu/config-file.h"
 #include "qemu/option.h"
 #include "qemu/range.h"
@@ -169,6 +170,7 @@ typedef struct TestServer {
 int test_flags;
 int queues;
 struct vhost_user_ops *vu_ops;
+uint64_t features;
 } TestServer;
 
 struct vhost_user_ops {
@@ -1020,6 +1022,100 @@ static void test_multiqueue(void *obj, void *arg, QGuestAllocator *alloc)
 }
 
 
+static QDict *query_virtio(QTestState *who)
+{
+QDict *rsp;
+
+rsp = qtest_qmp(who, "{ 'execute': 'x-query-virtio'}");
+g_assert(!qdict_haskey(rsp, "error"));
+g_assert(qdict_haskey(rsp, "return"));
+
+return rsp;
+}
+
+static QDict *query_virtio_status(QTestState *who, const char *path)
+{
+QDict *rsp;
+
+rsp = qtest_qmp(who, "{ 'execute': 'x-query-virtio-status', "
+"'arguments': { 'path': %s} }", path);
+
+g_assert(!qdict_haskey(rsp, "error"));
+g_assert(qdict_haskey(rsp, "return"));
+
+return rsp;
+}
+
+static uint64_t get_acked_features(QTestState *who)
+{
+QDict *rsp_return, *status, *vhost_info, *dev;
+QList *dev_list;
+const QListEntry *entry;
+const char *name;
+char *path;
+uint64_t acked_features;
+
+/* query the virtio devices */
+rsp_return = query_virtio(who);
+g_assert(rsp_return);
+
+dev_list = qdict_get_qlist(rsp_return, "return");
+g_assert(dev_list && !qlist_empty(dev_list));
+
+/* fetch the first and the sole device */
+entry = qlist_first(dev_list);
+g_assert(entry);
+
+dev = qobject_to(QDict, qlist_entry_obj(entry));
+g_assert(dev);
+
+name = qdict_get_try_str(dev, "name");
+g_assert_cmpstr(name, ==, "virtio-net");
+
+path = g_strdup(qdict_get_try_str(dev, "path"));
+g_assert(path);
+qobject_unref(rsp_return);
+rsp_return = NULL;
+
+/* fetch the status of the virtio-net device by QOM path */
+rsp_return = query_virtio_status(who, path);
+g_assert(rsp_return);
+
+status = qdict_get_qdict(rsp_return, "return");
+g_assert(status);
+
+vhost_info = qdict_get_qdict(status, "vhost-dev");
+g_assert(vhost_info);
+
+acked_features = qdict_get_try_int(vhost_info, "acked-features-bits", 0);
+
+qobject_unref(rsp_return);
+g_free(path);
+
+return acked_features;
+}
+
+static void acked_features_check(QTestState *qts, TestServer *s)
+{
+    uint64_t acked_features;
+
+    acked_features = get_acked_features(qts);
+    /* feature words are unsigned 64-bit values; compare them as such */
+    g_assert_cmpuint(acked_features, ==, s->features);
+}
+
+static void test_acked_features(void *obj,
+ void *arg,
+ QGuestAllocator *alloc)
+{
+TestServer *server = arg;
+
+if (!wait_for_fds(server)) {
+return;
+}
+
+acked_features_check(global_qtest, server);
+}
+
 static uint64_t vu_net_get_features(TestServer *s)
 {
 uint64_t features = 0x1ULL << VHOST_F_LOG_ALL |
@@ -1040,6 +1136,7 @@ static void vu_net_set_features(TestServer *s, CharBackend *chr,
 qemu_chr_fe_disconnect(chr);
 s->test_flags = TEST_FLAGS_BAD;
 }
+s->features = msg->payload.u64;
 }
 
 static void vu_net_get_protocol_features(TestServer *s, CharBackend *chr,
@@ -1109,6 +1206,9 @@ static void register_vhost_user_test(void)
 qos_add_test("vhost-user/multiqueue",
  "virtio-net",
  test_multiqueue, );
+qos_add_test("vhost-user/read_acked_features",
+ "virtio-net",
+ test_acked_features, );
 }
 libqos_init(register_vhost_user_test);
 
-- 
2.39.1




[v2 0/2] vhost-user-test: Add negotiated features check

2023-11-18 Thread Hyman Huang
Markus made suggestions for the changes in version 2; thanks for
that as well.

v2:
- rebase on master.
- drop the "show-bits" option. 
- refine the comment.


v1:
The patchset "Fix the virtio features negotiation flaw" fixed a
vhost-user negotiation flaw:
c9bdc449f9 vhost-user: Fix the virtio features negotiation flaw
bebcac052a vhost-user: Refactor the chr_closed_bh
937b7d96e4 vhost-user: Refactor vhost acked features saving

However, the test case remains unmerged; for details, see:
https://lore.kernel.org/qemu-devel/cover.1667232396.git.huang...@chinatelecom.cn/

Since Michael pointed out that "info virtio" is the natural place to
query the negotiated features, this patchset uses x-query-virtio-status
to retrieve the features, instead of exporting netdev capabilities and
information as the previous patchset did, to help confirm that the
negotiation is valid.

To do that, we first introduce a "show-bits" argument for
x-query-virtio-status so that the feature bits can be used directly,
and then implement the test case for the negotiated features check.
The code is accordingly divided into two patches.

Please review, thanks,
Yong

Hyman Huang (2):
  qapi/virtio: Add feature and status bits for x-query-virtio-status
  vhost-user-test: Add negotiated features check

 hw/virtio/virtio-qmp.c|   8 +++
 qapi/virtio.json  |  37 +
 tests/qtest/vhost-user-test.c | 100 ++
 3 files changed, 145 insertions(+)

-- 
2.39.1




[RFC 1/2] qapi/virtio: introduce the "show-bits" argument for x-query-virtio-status

2023-11-12 Thread Hyman Huang
This patch makes it possible to display feature and status bits in the output of x-query-virtio-status.

An optional argument, show-bits, is introduced. For example:
{"execute": "x-query-virtio-status",
 "arguments": {"path": "/machine/peripheral-anon/device[1]/virtio-backend",
               "show-bits": true}}

Feature and status bits make it easy for applications to compare
values directly. For instance, when an upper-layer application wants to
verify that virtio negotiation between the guest, QEMU, and OVS-DPDK is
correct, it uses the "ovs-vsctl list interface" command to retrieve the
interface features (as a number) and the QMP command
x-query-virtio-status to retrieve the vhost-user net device features.
With "show-bits" set, the application can compare the two values
directly, without having to encode the features returned by the QMP
command.

This patch also prepares for the next one, which implements a
vhost-user test case for the acked features of the vhost-user protocol.

Note that the matching HMP command is left unchanged, since it is
typically used by humans.

Signed-off-by: Hyman Huang 
---
 hw/virtio/virtio-hmp-cmds.c |  2 +-
 hw/virtio/virtio-qmp.c      | 21 +++-
 qapi/virtio.json            | 49 ++---
 3 files changed, 67 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/virtio-hmp-cmds.c b/hw/virtio/virtio-hmp-cmds.c
index 477c97dea2..3774f3d4bf 100644
--- a/hw/virtio/virtio-hmp-cmds.c
+++ b/hw/virtio/virtio-hmp-cmds.c
@@ -108,7 +108,7 @@ void hmp_virtio_status(Monitor *mon, const QDict *qdict)
 {
     Error *err = NULL;
     const char *path = qdict_get_try_str(qdict, "path");
-    VirtioStatus *s = qmp_x_query_virtio_status(path, );
+    VirtioStatus *s = qmp_x_query_virtio_status(path, false, false, );
 
     if (err != NULL) {
         hmp_handle_error(mon, err);
diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c
index 1dd96ed20f..2e92bf28ac 100644
--- a/hw/virtio/virtio-qmp.c
+++ b/hw/virtio/virtio-qmp.c
@@ -718,10 +718,15 @@ VirtIODevice *qmp_find_virtio_device(const char *path)
     return VIRTIO_DEVICE(dev);
 }
 
-VirtioStatus *qmp_x_query_virtio_status(const char *path, Error **errp)
+VirtioStatus *qmp_x_query_virtio_status(const char *path,
+                                        bool has_show_bits,
+                                        bool show_bits,
+                                        Error **errp)
 {
     VirtIODevice *vdev;
     VirtioStatus *status;
+    bool display_bits =
+        has_show_bits ? show_bits : false;
 
     vdev = qmp_find_virtio_device(path);
     if (vdev == NULL) {
@@ -733,6 +738,11 @@ VirtioStatus *qmp_x_query_virtio_status(const char *path, Error **errp)
     status->name = g_strdup(vdev->name);
     status->device_id = vdev->device_id;
     status->vhost_started = vdev->vhost_started;
+    if (display_bits) {
+        status->guest_features_bits = vdev->guest_features;
+        status->host_features_bits = vdev->host_features;
+        status->backend_features_bits = vdev->backend_features;
+    }
     status->guest_features = qmp_decode_features(vdev->device_id,
                                                  vdev->guest_features);
     status->host_features = qmp_decode_features(vdev->device_id,
@@ -753,6 +763,9 @@ VirtioStatus *qmp_x_query_virtio_status(const char *path, Error **errp)
     }
 
     status->num_vqs = virtio_get_num_queues(vdev);
+    if (display_bits) {
+        status->status_bits = vdev->status;
+    }
     status->status = qmp_decode_status(vdev->status);
     status->isr = vdev->isr;
     status->queue_sel = vdev->queue_sel;
@@ -775,6 +788,12 @@ VirtioStatus *qmp_x_query_virtio_status(const char *path, Error **errp)
         status->vhost_dev->n_tmp_sections = hdev->n_tmp_sections;
         status->vhost_dev->nvqs = hdev->nvqs;
         status->vhost_dev->vq_index = hdev->vq_index;
+        if (display_bits) {
+            status->vhost_dev->features_bits = hdev->features;
+            status->vhost_dev->acked_features_bits = hdev->acked_features;
+            status->vhost_dev->backend_features_bits = hdev->backend_features;
+            status->vhost_dev->protocol_features_bits = hdev->protocol_features;
+        }
         status->vhost_dev->features =
             qmp_decode_features(vdev->device_id, hdev->features);
diff --git a/qapi/virtio.json b/qapi/virtio.json
index e6dcee7b83..608b841a89 100644
--- a/qapi/virtio.json
+++ b/qapi/virtio.json
@@ -79,12 +79,20 @@
 #
 # @vq-index: vhost_dev vq_index
 #
+# @features-bits: vhost_dev features in decimal format
+#
 # @features: vhost_dev features
 #
+# @acked-features-bits: vhost_dev acked_features in decimal format
+#
 # @acked-features: vhost_dev acked

[RFC 2/2] vhost-user-test: Add negotiated features check

2023-11-12 Thread Hyman Huang
When a vhost-user network device is restored after an unexpected
failure, acked_features can be used as the input for the
VHOST_USER_SET_FEATURES command, because QEMU internally saves
the final features as acked_features after the guest acknowledges
them during virtio-net driver initialization.

The negotiated features check verifies whether the features in the
vhost slave device and the acked_features in QEMU are identical.

By driving the vhost-user protocol, the test case seeks to verify
that the vhost-user network device negotiates features correctly.
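For readers less familiar with libqtest, the query flow used by the C helper get_acked_features() in this patch can be sketched in Python as follows. This is an illustration only: `qmp` is assumed to be any callable that takes a command name and an arguments dict and returns the decoded reply, and `fake_qmp()` plus its sample values are hypothetical stand-ins for a running QEMU.

```python
def get_acked_features(qmp):
    """Find the sole virtio-net device, then read
    vhost-dev.acked-features-bits from x-query-virtio-status."""
    reply = qmp("x-query-virtio", None)
    assert "error" not in reply
    devices = reply["return"]
    assert devices, "expected one virtio device"

    dev = devices[0]  # the first and only device
    assert dev["name"] == "virtio-net"

    reply = qmp("x-query-virtio-status",
                {"path": dev["path"], "show-bits": True})
    assert "error" not in reply
    vhost_info = reply["return"]["vhost-dev"]
    # Same default as qdict_get_try_int(..., 0) in the C helper.
    return vhost_info.get("acked-features-bits", 0)


def fake_qmp(command, arguments):
    """Hypothetical QMP responder, for illustration only."""
    if command == "x-query-virtio":
        return {"return": [{
            "name": "virtio-net",
            "path": "/machine/peripheral-anon/device[1]/virtio-backend"}]}
    if command == "x-query-virtio-status" and arguments.get("show-bits"):
        return {"return": {"vhost-dev": {"acked-features-bits": 0x140000000}}}
    return {"error": {"class": "GenericError", "desc": "unexpected command"}}


# Features the vhost-user backend recorded from VHOST_USER_SET_FEATURES
# (hypothetical value); the check passes when both sides agree.
backend_features = 0x140000000
assert get_acked_features(fake_qmp) == backend_features
```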

Signed-off-by: Hyman Huang 
---
 tests/qtest/vhost-user-test.c | 100 ++
 1 file changed, 100 insertions(+)

diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c
index d4e437265f..14df89f823 100644
--- a/tests/qtest/vhost-user-test.c
+++ b/tests/qtest/vhost-user-test.c
@@ -13,6 +13,7 @@
 #include "libqtest-single.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qlist.h"
 #include "qemu/config-file.h"
 #include "qemu/option.h"
 #include "qemu/range.h"
@@ -169,6 +170,7 @@ typedef struct TestServer {
     int test_flags;
     int queues;
     struct vhost_user_ops *vu_ops;
+    uint64_t features;
 } TestServer;
 
 struct vhost_user_ops {
@@ -1020,6 +1022,100 @@ static void test_multiqueue(void *obj, void *arg, QGuestAllocator *alloc)
 }
 
 
+static QDict *query_virtio(QTestState *who)
+{
+    QDict *rsp;
+
+    rsp = qtest_qmp(who, "{ 'execute': 'x-query-virtio'}");
+    g_assert(!qdict_haskey(rsp, "error"));
+    g_assert(qdict_haskey(rsp, "return"));
+
+    return rsp;
+}
+
+static QDict *query_virtio_status(QTestState *who, const char *path)
+{
+    QDict *rsp;
+
+    rsp = qtest_qmp(who, "{ 'execute': 'x-query-virtio-status', "
+                    "'arguments': { 'path': %s, 'show-bits': true} }", path);
+
+    g_assert(!qdict_haskey(rsp, "error"));
+    g_assert(qdict_haskey(rsp, "return"));
+
+    return rsp;
+}
+
+static uint64_t get_acked_features(QTestState *who)
+{
+    QDict *rsp_return, *status, *vhost_info, *dev;
+    QList *dev_list;
+    const QListEntry *entry;
+    const char *name;
+    char *path;
+    uint64_t acked_features;
+
+    /* query the virtio devices */
+    rsp_return = query_virtio(who);
+    g_assert(rsp_return);
+
+    dev_list = qdict_get_qlist(rsp_return, "return");
+    g_assert(dev_list && !qlist_empty(dev_list));
+
+    /* fetch the first and the sole device */
+    entry = qlist_first(dev_list);
+    g_assert(entry);
+
+    dev = qobject_to(QDict, qlist_entry_obj(entry));
+    g_assert(dev);
+
+    name = qdict_get_try_str(dev, "name");
+    g_assert_cmpstr(name, ==, "virtio-net");
+
+    path = g_strdup(qdict_get_try_str(dev, "path"));
+    g_assert(path);
+    qobject_unref(rsp_return);
+    rsp_return = NULL;
+
+    /* fetch the status of the virtio-net device by QOM path */
+    rsp_return = query_virtio_status(who, path);
+    g_assert(rsp_return);
+
+    status = qdict_get_qdict(rsp_return, "return");
+    g_assert(status);
+
+    vhost_info = qdict_get_qdict(status, "vhost-dev");
+    g_assert(vhost_info);
+
+    acked_features = qdict_get_try_int(vhost_info, "acked-features-bits", 0);
+
+    qobject_unref(rsp_return);
+    g_free(path);
+
+    return acked_features;
+}
+
+static void acked_features_check(QTestState *qts, TestServer *s)
+{
+    uint64_t acked_features;
+
+    acked_features = get_acked_features(qts);
+    g_assert_cmpint(acked_features, ==, s->features);
+}
+
+static void test_acked_features(void *obj,
+                                void *arg,
+                                QGuestAllocator *alloc)
+{
+    TestServer *server = arg;
+
+    if (!wait_for_fds(server)) {
+        return;
+    }
+
+    acked_features_check(global_qtest, server);
+}
+
 static uint64_t vu_net_get_features(TestServer *s)
 {
     uint64_t features = 0x1ULL << VHOST_F_LOG_ALL |
@@ -1040,6 +1136,7 @@ static void vu_net_set_features(TestServer *s, CharBackend *chr,
         qemu_chr_fe_disconnect(chr);
         s->test_flags = TEST_FLAGS_BAD;
     }
+    s->features = msg->payload.u64;
 }
 
 static void vu_net_get_protocol_features(TestServer *s, CharBackend *chr,
@@ -1109,6 +1206,9 @@ static void register_vhost_user_test(void)
     qos_add_test("vhost-user/multiqueue",
                  "virtio-net",
                  test_multiqueue, );
+    qos_add_test("vhost-user/read_acked_features",
+                 "virtio-net",
+                 test_acked_features, );
 }
 libqos_init(register_vhost_user_test);
 
-- 
2.39.1




[RFC 0/2] vhost-user-test: Add negotiated features check

2023-11-12 Thread Hyman Huang
The patchset "Fix the virtio features negotiation flaw" fixes a
vhost-user negotiation flaw:
c9bdc449f9 vhost-user: Fix the virtio features negotiation flaw
bebcac052a vhost-user: Refactor the chr_closed_bh
937b7d96e4 vhost-user: Refactor vhost acked features saving

The test case, however, remains unmerged; for details, see:
https://lore.kernel.org/qemu-devel/cover.1667232396.git.huang...@chinatelecom.cn/

Since Michael pointed out that "info virtio" is the right way to query
the negotiated features, this patchset uses x-query-virtio-status to
retrieve the features, instead of exporting netdev capabilities and
information as the previous patchset did, to help confirm that the
negotiation is valid.

To do that, we first introduce a "show-bits" argument for
x-query-virtio-status so that the feature bits can be used
directly, and then implement the test case for the negotiated
features check. As posted, the code is divided into two patches.

Please review, thanks,
Yong

Hyman Huang (2):
  qapi/virtio: introduce the "show-bits" argument for
x-query-virtio-status
  vhost-user-test: Add negotiated features check

 hw/virtio/virtio-hmp-cmds.c   |   2 +-
 hw/virtio/virtio-qmp.c|  21 ++-
 qapi/virtio.json  |  49 -
 tests/qtest/vhost-user-test.c | 100 ++
 4 files changed, 167 insertions(+), 5 deletions(-)

-- 
2.39.1




[v3 6/6] docs/migration: Add the dirty limit section

2023-11-01 Thread Hyman Huang
The dirty limit feature was introduced in the QEMU 8.1 release but
is not yet reflected in the documentation, so add a section for it.

Signed-off-by: Hyman Huang 
Reviewed-by: Fabiano Rosas 
Message-Id: 
<36194a8a23d937392bf13d9fff8e898030c827a3.1697815117.git.yong.hu...@smartx.com>
---
 docs/devel/migration.rst | 71 
 1 file changed, 71 insertions(+)

diff --git a/docs/devel/migration.rst b/docs/devel/migration.rst
index be913630c3..12c35f9bc4 100644
--- a/docs/devel/migration.rst
+++ b/docs/devel/migration.rst
@@ -590,6 +590,77 @@ path.
  Return path  - opened by main thread, written by main thread AND postcopy
  thread (protected by rp_mutex)
 
+Dirty limit
+===========
+The dirty limit, short for dirty page rate upper limit, is a new capability
+introduced in the 8.1 QEMU release that uses a new algorithm based on the KVM
+dirty ring to throttle down the guest during live migration.
+
+The algorithm framework is as follows:
+
+::
+
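+  ----------------------------------------------------------------------------
+  main   --------------> throttle thread ------------> PREPARE(1) <--------
+  thread  \                                                |              |
+           \                                               |              |
+            \                                              V              |
+             -\                                        CALCULATE(2)       |
+               \                                           |              |
+                \                                          |              |
+                 \                                         V              |
+                  \                                    SET PENALTY(3) -------
+                   -\                                      |
+                     \                                     |
+                      \                                    V
+                       -> virtual CPU thread ----> ACCEPT PENALTY(4)
+  ----------------------------------------------------------------------------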
+
+When the QMP command set-vcpu-dirty-limit (qmp_set_vcpu_dirty_limit) is
+called for the first time, the QEMU main thread starts the throttle thread.
+Once launched, the throttle thread runs a loop that consists of three steps:
+
+  - PREPARE (1)
+
+    As the name implies, PREPARE (1) prepares everything the second stage,
+    CALCULATE (2), needs: the current dirty page rate of the VM and the
+    corresponding upper limit. The dirty page rate is calculated via the
+    KVM dirty ring mechanism, which tells QEMU how many pages a virtual CPU
+    has dirtied since the last KVM_EXIT_DIRTY_RING_FULL exit; the dirty
+    page rate upper limit is specified by the caller, so it is fetched
+    directly.
+
+  - CALCULATE (2)
+
+    Calculate a suitable sleep period for each virtual CPU, which is used
+    to determine the penalty for that virtual CPU. The computation must be
+    done carefully in order to reduce the dirty page rate progressively
+    down to the upper limit without oscillation. To achieve this, two
+    strategies are provided: the first adds or subtracts sleep time based
+    on the ratio of the current dirty page rate to the limit, and is used
+    when the current dirty page rate is far from the limit; the second
+    adds or subtracts a fixed amount of time when the current dirty page
+    rate is close to the limit.
+
+  - SET PENALTY (3)
+
+    Set the sleep time for each virtual CPU that should be penalized, based
+    on the results of the calculation supplied by step CALCULATE (2).
+
+After completing the three stages above, the throttle thread loops back
+to step PREPARE (1) until the dirty limit is reached.
+
+Meanwhile, each virtual CPU thread reads its sleep duration and sleeps in
+the KVM_EXIT_DIRTY_RING_FULL exit handler path; this is ACCEPT PENALTY (4).
+Virtual CPUs running write-heavy processes will frequently take that exit
+and get penalized, whereas virtual CPUs running mostly read-only processes
+will not.
+
+In summary, thanks to the KVM dirty ring, the dirty limit algorithm
+restricts virtual CPUs only as needed to keep their dirty page rate within
+the limit. This leads to steadier read performance during live migration
+and can help improve the responsiveness of large guests.
+
 Postcopy
 ========
 
-- 
2.39.1
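To make the CALCULATE (2) step described in the section above more concrete, here is a minimal Python sketch of the two adjustment strategies: a proportional step when the dirty page rate is far from the limit, and a fixed step when it is close. The function name, the 10% "close" band, the 50us fixed step, and the proportional formula are all hypothetical choices for illustration; they are not QEMU's actual values or code.

```python
def adjust_sleep_us(sleep_us, dirty_rate, limit, period_us,
                    near_pct=10, fixed_step_us=50):
    """Return the next per-round sleep time (us) for one virtual CPU.

    dirty_rate and limit use the same unit (e.g. MB/s). The result is
    clamped to [0, period_us].
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    gap = abs(dirty_rate - limit)
    if gap == 0:
        return sleep_us  # already at the limit: keep the current penalty

    if gap * 100 > limit * near_pct:
        # Far from the limit: step proportionally to how far off the rate
        # is, relative to the larger of the two rates.
        step = period_us * gap // max(dirty_rate, limit)
    else:
        # Close to the limit: nudge by a fixed amount to avoid oscillation.
        step = fixed_step_us

    new_sleep = sleep_us + step if dirty_rate > limit else sleep_us - step
    return max(0, min(new_sleep, period_us))
```

For example, `adjust_sleep_us(0, 200, 100, 1000)` returns 500: a vCPU dirtying memory at twice its limit is asked to sleep for half of a 1000us period, and later rounds near the limit only move the sleep time by 50us at a time.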




[v3 4/6] tests/migration: Introduce dirty-ring-size option into guestperf

2023-11-01 Thread Hyman Huang
The guestperf tool does not support configuring the dirty ring size.

Introduce a dirty-ring-size option (range [1024, 65536]) so that
developers can experiment with the dirty-ring and dirty-limit features
more easily.

To set the dirty ring size to 4096 during a migration test:
$ ./tests/migration/guestperf.py --dirty-ring-size 4096 xxx
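The engine.py change in this patch boils down to choosing between two `-accel` argument forms. A small Python sketch of that choice follows; the `accel_args()` helper is hypothetical, and the range and power-of-two checks are additions for illustration (the patch itself does not validate the value; the power-of-two requirement is an assumption based on how QEMU documents the KVM `dirty-ring-size` property).

```python
def accel_args(dirty_ring_size=0):
    """Build the -accel arguments for a guestperf-style QEMU command line."""
    if not dirty_ring_size:
        # No dirty ring requested: plain KVM, as before this patch.
        return ["-accel", "kvm"]
    if not 1024 <= dirty_ring_size <= 65536:
        raise ValueError("dirty-ring-size must be in [1024, 65536]")
    # Assumption: the ring size must also be a power of two.
    if dirty_ring_size & (dirty_ring_size - 1):
        raise ValueError("dirty-ring-size must be a power of two")
    return ["-accel", "kvm,dirty-ring-size=%d" % dirty_ring_size]
```

With `--dirty-ring-size 4096`, this yields `["-accel", "kvm,dirty-ring-size=4096"]`, matching the string engine.py builds.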

Signed-off-by: Hyman Huang 
Reviewed-by: Fabiano Rosas 
Message-Id: 

---
 tests/migration/guestperf/engine.py   | 6 +-
 tests/migration/guestperf/hardware.py | 8 ++--
 tests/migration/guestperf/shell.py    | 6 +-
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/tests/migration/guestperf/engine.py b/tests/migration/guestperf/engine.py
index da96ca034a..aabf6de4d9 100644
--- a/tests/migration/guestperf/engine.py
+++ b/tests/migration/guestperf/engine.py
@@ -325,7 +325,6 @@ def _get_common_args(self, hardware, tunnelled=False):
             cmdline = "'" + cmdline + "'"
 
         argv = [
-            "-accel", "kvm",
             "-cpu", "host",
             "-kernel", self._kernel,
             "-initrd", self._initrd,
@@ -333,6 +332,11 @@ def _get_common_args(self, hardware, tunnelled=False):
             "-m", str((hardware._mem * 1024) + 512),
             "-smp", str(hardware._cpus),
         ]
+        if hardware._dirty_ring_size:
+            argv.extend(["-accel", "kvm,dirty-ring-size=%s" %
+                         hardware._dirty_ring_size])
+        else:
+            argv.extend(["-accel", "kvm"])
 
         argv.extend(self._get_qemu_serial_args())
 
diff --git a/tests/migration/guestperf/hardware.py b/tests/migration/guestperf/hardware.py
index 3145785ffd..f779cc050b 100644
--- a/tests/migration/guestperf/hardware.py
+++ b/tests/migration/guestperf/hardware.py
@@ -23,7 +23,8 @@ def __init__(self, cpus=1, mem=1,
                  src_cpu_bind=None, src_mem_bind=None,
                  dst_cpu_bind=None, dst_mem_bind=None,
                  prealloc_pages = False,
-                 huge_pages=False, locked_pages=False):
+                 huge_pages=False, locked_pages=False,
+                 dirty_ring_size=0):
         self._cpus = cpus
         self._mem = mem # GiB
         self._src_mem_bind = src_mem_bind # List of NUMA nodes
@@ -33,6 +34,7 @@ def __init__(self, cpus=1, mem=1,
         self._prealloc_pages = prealloc_pages
         self._huge_pages = huge_pages
         self._locked_pages = locked_pages
+        self._dirty_ring_size = dirty_ring_size
 
 
     def serialize(self):
@@ -46,6 +48,7 @@ def serialize(self):
             "prealloc_pages": self._prealloc_pages,
             "huge_pages": self._huge_pages,
             "locked_pages": self._locked_pages,
+            "dirty_ring_size": self._dirty_ring_size,
         }
 
     @classmethod
@@ -59,4 +62,5 @@ def deserialize(cls, data):
             data["dst_mem_bind"],
             data["prealloc_pages"],
             data["huge_pages"],
-            data["locked_pages"])
+            data["locked_pages"],
+            data["dirty_ring_size"])
diff --git a/tests/migration/guestperf/shell.py b/tests/migration/guestperf/shell.py
index 8a809e3dda..7d6b8cd7cf 100644
--- a/tests/migration/guestperf/shell.py
+++ b/tests/migration/guestperf/shell.py
@@ -60,6 +60,8 @@ def __init__(self):
         parser.add_argument("--prealloc-pages", dest="prealloc_pages", default=False)
         parser.add_argument("--huge-pages", dest="huge_pages", default=False)
         parser.add_argument("--locked-pages", dest="locked_pages", default=False)
+        parser.add_argument("--dirty-ring-size", dest="dirty_ring_size",
+                            default=0, type=int)
 
         self._parser = parser
 
@@ -89,7 +91,9 @@ def split_map(value):
 
                         locked_pages=args.locked_pages,
                         huge_pages=args.huge_pages,
-                        prealloc_pages=args.prealloc_pages)
+                        prealloc_pages=args.prealloc_pages,
+
+                        dirty_ring_size=args.dirty_ring_size)
 
 class Shell(BaseShell):
-- 
2.39.1




[v3 5/6] tests/migration: Introduce dirty-limit into guestperf

2023-11-01 Thread Hyman Huang
Currently, guestperf does not cover dirty-limit migration;
add support for it.

Note that dirty-limit requires 'dirty-ring-size' to be set.

To enable dirty-limit with x-vcpu-dirty-limit-period set
to 500ms and vcpu-dirty-limit set to 10MB/s:
$ ./tests/migration/guestperf.py \
--dirty-ring-size 4096 \
--dirty-limit --x-vcpu-dirty-limit-period 500 \
--vcpu-dirty-limit 10 --output output.json

To run the entire standardized set of dirty-limit-enabled
comparisons, with unix migration:
$ ./tests/migration/guestperf-batch.py \
--dirty-ring-size 4096 \
--dst-host localhost --transport unix \
--filter "compr-dirty-limit*" --output outputdir
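On the QMP side, the engine.py hunk below sends three commands to the source before starting migration. A Python sketch of the same sequence as raw QMP messages follows; the `dirty_limit_commands()` helper is hypothetical, while the capability and parameter names (`dirty-limit`, `x-vcpu-dirty-limit-period`, `vcpu-dirty-limit`) are the ones used elsewhere in this series.

```python
import json


def dirty_limit_commands(dirty_ring_size, period_ms, limit_mbps):
    """Return the QMP messages that enable dirty-limit migration on the
    source, in the order the engine.py change issues them."""
    if not dirty_ring_size:
        # Same precondition engine.py enforces before sending anything.
        raise ValueError("dirty ring size must be configured when "
                         "testing dirty limit migration")
    return [
        {"execute": "migrate-set-capabilities",
         "arguments": {"capabilities": [
             {"capability": "dirty-limit", "state": True}]}},
        {"execute": "migrate-set-parameters",
         "arguments": {"x-vcpu-dirty-limit-period": period_ms}},
        {"execute": "migrate-set-parameters",
         "arguments": {"vcpu-dirty-limit": limit_mbps}},
    ]


# The first example command line above corresponds to:
for msg in dirty_limit_commands(4096, 500, 10):
    print(json.dumps(msg))
```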

Signed-off-by: Hyman Huang 
Reviewed-by: Fabiano Rosas 
Message-Id: 
<516e7a55dfc6e33d33510be37eb24223de5dc072.1697815117.git.yong.hu...@smartx.com>
---
 tests/migration/guestperf/comparison.py | 23 +++
 tests/migration/guestperf/engine.py     | 17 +
 tests/migration/guestperf/progress.py   | 16 ++--
 tests/migration/guestperf/scenario.py   | 11 ++-
 tests/migration/guestperf/shell.py      | 18 +-
 5 files changed, 81 insertions(+), 4 deletions(-)

diff --git a/tests/migration/guestperf/comparison.py b/tests/migration/guestperf/comparison.py
index c03b3f6d7e..42cc0372d1 100644
--- a/tests/migration/guestperf/comparison.py
+++ b/tests/migration/guestperf/comparison.py
@@ -135,4 +135,27 @@ def __init__(self, name, scenarios):
         Scenario("compr-multifd-channels-64",
                  multifd=True, multifd_channels=64),
     ]),
+
+    # Looking at effect of dirty-limit with
+    # varying x_vcpu_dirty_limit_period
+    Comparison("compr-dirty-limit-period", scenarios = [
+        Scenario("compr-dirty-limit-period-500",
+                 dirty_limit=True, x_vcpu_dirty_limit_period=500),
+        Scenario("compr-dirty-limit-period-800",
+                 dirty_limit=True, x_vcpu_dirty_limit_period=800),
+        Scenario("compr-dirty-limit-period-1000",
+                 dirty_limit=True, x_vcpu_dirty_limit_period=1000),
+    ]),
+
+
+    # Looking at effect of dirty-limit with
+    # varying vcpu_dirty_limit
+    Comparison("compr-dirty-limit", scenarios = [
+        Scenario("compr-dirty-limit-10MB",
+                 dirty_limit=True, vcpu_dirty_limit=10),
+        Scenario("compr-dirty-limit-20MB",
+                 dirty_limit=True, vcpu_dirty_limit=20),
+        Scenario("compr-dirty-limit-50MB",
+                 dirty_limit=True, vcpu_dirty_limit=50),
+    ]),
 ]
diff --git a/tests/migration/guestperf/engine.py b/tests/migration/guestperf/engine.py
index aabf6de4d9..608d7270f6 100644
--- a/tests/migration/guestperf/engine.py
+++ b/tests/migration/guestperf/engine.py
@@ -102,6 +102,8 @@ def _migrate_progress(self, vm):
             info.get("expected-downtime", 0),
             info.get("setup-time", 0),
             info.get("cpu-throttle-percentage", 0),
+            info.get("dirty-limit-throttle-time-per-round", 0),
+            info.get("dirty-limit-ring-full-time", 0),
         )
 
     def _migrate(self, hardware, scenario, src, dst, connect_uri):
@@ -203,6 +205,21 @@ def _migrate(self, hardware, scenario, src, dst, connect_uri):
             resp = dst.cmd("migrate-set-parameters",
                            multifd_channels=scenario._multifd_channels)
 
+        if scenario._dirty_limit:
+            if not hardware._dirty_ring_size:
+                raise Exception("dirty ring size must be configured when "
+                                "testing dirty limit migration")
+
+            resp = src.cmd("migrate-set-capabilities",
+                           capabilities = [
+                               { "capability": "dirty-limit",
+                                 "state": True }
+                           ])
+            resp = src.cmd("migrate-set-parameters",
+                x_vcpu_dirty_limit_period=scenario._x_vcpu_dirty_limit_period)
+            resp = src.cmd("migrate-set-parameters",
+                           vcpu_dirty_limit=scenario._vcpu_dirty_limit)
+
         resp = src.cmd("migrate", uri=connect_uri)
 
         post_copy = False
diff --git a/tests/migration/guestperf/progress.py b/tests/migration/guestperf/progress.py
index ab1ee57273..d490584217 100644
--- a/tests/migration/guestperf/progress.py
+++ b/tests/migration/guestperf/progress.py
@@ -81,7 +81,9 @@ def __init__(self,
                  downtime,
                  downtime_expected,
                  setup_time,
-                 throttle_pcent):
+                 throttle_pcent,
+                 dirty_limit_throttle_time_per_round,
+                 dirty_limit_ring_full_time):
 
         self._status = status
  

[v3 0/6] dirtylimit: miscellaneous patches

2023-11-01 Thread Hyman Huang
v3:
- do nothing but rebase on master

v2:
- rebase on master.
- fix the document typo.

v1:
This is a miscellaneous patchset for dirtylimit that contains
the following parts:

1. dirtylimit module: fix a race condition and
   replace usleep with g_usleep.
2. migration test: add a dirtylimit test case.
3. guestperf for migration: add support for dirtylimit migration.
4. docs for migration: add a dirtylimit section.

Please review, thanks.

Regards,

Hyman Huang (6):
  system/dirtylimit: Fix a race situation
  system/dirtylimit: Drop the reduplicative check
  tests: Add migration dirty-limit capability test
  tests/migration: Introduce dirty-ring-size option into guestperf
  tests/migration: Introduce dirty-limit into guestperf
  docs/migration: Add the dirty limit section

 docs/devel/migration.rst                |  71 ++
 system/dirtylimit.c                     |  24 ++--
 tests/migration/guestperf/comparison.py |  23
 tests/migration/guestperf/engine.py     |  23 +++-
 tests/migration/guestperf/hardware.py   |   8 +-
 tests/migration/guestperf/progress.py   |  16 ++-
 tests/migration/guestperf/scenario.py   |  11 +-
 tests/migration/guestperf/shell.py      |  24 +++-
 tests/qtest/migration-test.c            | 164
 9 files changed, 346 insertions(+), 18 deletions(-)

-- 
2.39.1




[v3 3/6] tests: Add migration dirty-limit capability test

2023-11-01 Thread Hyman Huang
Add a migration test for the dirty-limit capability if the kernel
supports the dirty ring.

The migration dirty-limit capability comes with two parameters,
x-vcpu-dirty-limit-period and vcpu-dirty-limit, which are used to
implement live migration with dirty limit.

The test case does the following:
1. start the src and dst VMs and enable the dirty-limit capability
2. start migration, then cancel it to check that dirty limit
   stops working.
3. restart the dst VM
4. start migration and enable the dirty-limit capability
5. check that migration satisfies the convergence condition
   during the pre-switchover phase.

Note that this test case involves many passes, so it runs
in slow mode only.
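Step 2 above (cancel the migration, then check that dirty limit stops) comes down to a bounded poll of the dirty-limit-throttle-time-per-round value reported by query-migrate. A rough Python sketch of the do/while loop in the test, where the `read_throttle_us` callable is a hypothetical stand-in for read_migrate_property_int():

```python
import time


def wait_throttle_off(read_throttle_us, max_tries=10, interval_s=100e-6):
    """Poll until the throttle time reads 0 or max_tries reads are used.

    With the defaults, 10 reads spaced 100us apart give roughly the 1ms
    budget the C test uses. Returns the last value read, so the caller
    can assert that it is 0.
    """
    tries = max_tries
    while True:
        value = read_throttle_us()
        time.sleep(interval_s)
        tries -= 1
        if value == 0 or tries == 0:
            return value
```

Returning the last reading, rather than a boolean, mirrors the C test, which asserts `throttle_us_per_full == 0` after the loop.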

Signed-off-by: Hyman Huang 
Acked-by: Peter Xu 
Reviewed-by: Fabiano Rosas 
Message-Id: 

---
 tests/qtest/migration-test.c | 164 +++
 1 file changed, 164 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index bc70a14642..0693078b07 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2968,6 +2968,166 @@ static void test_vcpu_dirty_limit(void)
     dirtylimit_stop_vm(vm);
 }
 
+static void migrate_dirty_limit_wait_showup(QTestState *from,
+                                            const int64_t period,
+                                            const int64_t value)
+{
+    /* Enable dirty limit capability */
+    migrate_set_capability(from, "dirty-limit", true);
+
+    /* Set dirty limit parameters */
+    migrate_set_parameter_int(from, "x-vcpu-dirty-limit-period", period);
+    migrate_set_parameter_int(from, "vcpu-dirty-limit", value);
+
+    /* Make sure migrate can't converge */
+    migrate_ensure_non_converge(from);
+
+    /* To check limit rate after precopy */
+    migrate_set_capability(from, "pause-before-switchover", true);
+
+    /* Wait for the serial output from the source */
+    wait_for_serial("src_serial");
+}
+
+/*
+ * This test does:
+ *  source                          destination
+ *  start vm
+ *                                  start incoming vm
+ *  migrate
+ *  wait dirty limit to begin
+ *  cancel migrate
+ *  cancellation check
+ *                                  restart incoming vm
+ *  migrate
+ *  wait dirty limit to begin
+ *  wait pre-switchover event
+ *  convergence condition check
+ *
+ * And see if dirty limit migration works correctly.
+ * This test case involves many passes, so it runs in slow mode only.
+ */
+static void test_migrate_dirty_limit(void)
+{
+    g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
+    QTestState *from, *to;
+    int64_t remaining;
+    uint64_t throttle_us_per_full;
+    /*
+     * We want the test to be stable and as fast as possible.
+     * E.g., with 1Gb/s bandwith migration may pass without dirty limit,
+     * so we need to decrease a bandwidth.
+     */
+    const int64_t dirtylimit_period = 1000, dirtylimit_value = 50;
+    const int64_t max_bandwidth = 4; /* ~400Mb/s */
+    const int64_t downtime_limit = 250; /* 250ms */
+    /*
+     * We migrate through unix-socket (> 500Mb/s).
+     * Thus, expected migration speed ~= bandwidth limit (< 500Mb/s).
+     * So, we can predict expected_threshold
+     */
+    const int64_t expected_threshold = max_bandwidth * downtime_limit / 1000;
+    int max_try_count = 10;
+    MigrateCommon args = {
+        .start = {
+            .hide_stderr = true,
+            .use_dirty_ring = true,
+        },
+        .listen_uri = uri,
+        .connect_uri = uri,
+    };
+
+    /* Start src, dst vm */
+    if (test_migrate_start(, , args.listen_uri, )) {
+        return;
+    }
+
+    /* Prepare for dirty limit migration and wait src vm show up */
+    migrate_dirty_limit_wait_showup(from, dirtylimit_period, dirtylimit_value);
+
+    /* Start migrate */
+    migrate_qmp(from, uri, "{}");
+
+    /* Wait for dirty limit throttle begin */
+    throttle_us_per_full = 0;
+    while (throttle_us_per_full == 0) {
+        throttle_us_per_full =
+            read_migrate_property_int(from, "dirty-limit-throttle-time-per-round");
+        usleep(100);
+        g_assert_false(got_src_stop);
+    }
+
+    /* Now cancel migrate and wait for dirty limit throttle switch off */
+    migrate_cancel(from);
+    wait_for_migration_status(from, "cancelled", NULL);
+
+    /* Check if dirty limit throttle switched off, set timeout 1ms */
+    do {
+        throttle_us_per_full =
+            read_migrate_property_int(from, "dirty-limit-throttle-time-per-round");
+        usleep(100);
+        g_assert_false(got_src_stop);
+    } while (throttle_us_per_full != 0 && --max_try_count);
+
+    /* Assert dirty limit is not in service */
+    g_assert_cmpint(throttle_us_per_full, ==, 0);
+
+    args = (MigrateCommon) {
+        .start = {
+            .only_target = true,
+            .use_dirty_ring = true,
+        },
+

[v3 2/6] system/dirtylimit: Drop the reduplicative check

2023-11-01 Thread Hyman Huang
Checking whether dirty limit is in service is already done by
dirtylimit_query_all(), so drop the redundant check in
qmp_query_vcpu_dirty_limit().

Signed-off-by: Hyman Huang 
Reviewed-by: Fabiano Rosas 
Message-Id: 
<31384f768279027560ab952ebc2bbff1ddb62531.1697815117.git.yong.hu...@smartx.com>
---
 system/dirtylimit.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/system/dirtylimit.c b/system/dirtylimit.c
index 3666c4cb7c..495c7a7082 100644
--- a/system/dirtylimit.c
+++ b/system/dirtylimit.c
@@ -652,10 +652,6 @@ static struct DirtyLimitInfoList *dirtylimit_query_all(void)
 
 struct DirtyLimitInfoList *qmp_query_vcpu_dirty_limit(Error **errp)
 {
-    if (!dirtylimit_in_service()) {
-        return NULL;
-    }
-
     return dirtylimit_query_all();
 }
 
-- 
2.39.1




[v3 1/6] system/dirtylimit: Fix a race situation

2023-11-01 Thread Hyman Huang
Fix a race on the global variable dirtylimit_state.

Also, replace usleep with g_usleep to make the sleep call
more portable across platforms.

Signed-off-by: Hyman Huang 
Reviewed-by: Fabiano Rosas 
Message-Id: 

---
 system/dirtylimit.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/system/dirtylimit.c b/system/dirtylimit.c
index fa959d7743..3666c4cb7c 100644
--- a/system/dirtylimit.c
+++ b/system/dirtylimit.c
@@ -411,12 +411,20 @@ void dirtylimit_set_all(uint64_t quota,
 
 void dirtylimit_vcpu_execute(CPUState *cpu)
 {
-    if (dirtylimit_in_service() &&
-        dirtylimit_vcpu_get_state(cpu->cpu_index)->enabled &&
-        cpu->throttle_us_per_full) {
-        trace_dirtylimit_vcpu_execute(cpu->cpu_index,
-                cpu->throttle_us_per_full);
-        usleep(cpu->throttle_us_per_full);
+    if (cpu->throttle_us_per_full) {
+        dirtylimit_state_lock();
+
+        if (dirtylimit_in_service() &&
+            dirtylimit_vcpu_get_state(cpu->cpu_index)->enabled) {
+            dirtylimit_state_unlock();
+            trace_dirtylimit_vcpu_execute(cpu->cpu_index,
+                    cpu->throttle_us_per_full);
+
+            g_usleep(cpu->throttle_us_per_full);
+            return;
+        }
+
+        dirtylimit_state_unlock();
     }
 }
 
-- 
2.39.1
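The fix above follows a common pattern: read the shared state only while holding the lock, release the lock, and only then sleep, so a sleeping vCPU never blocks other users of dirtylimit_state. A minimal Python sketch of that pattern; `DirtyLimitState` and `vcpu_execute` are hypothetical stand-ins for dirtylimit_state and dirtylimit_vcpu_execute(), not QEMU code.

```python
import threading
import time


class DirtyLimitState:
    """Hypothetical stand-in for QEMU's global dirtylimit_state."""

    def __init__(self, nr_vcpus):
        self.lock = threading.Lock()
        self.in_service = True
        self.enabled = [False] * nr_vcpus


def vcpu_execute(state, cpu_index, throttle_us, sleep=time.sleep):
    """Sleep for throttle_us if this vCPU is being dirty-limited.

    The shared state is read under the lock; the sleep happens after the
    lock is released. Returns True if the vCPU was penalized.
    """
    if not throttle_us:
        return False
    with state.lock:
        penalize = state.in_service and state.enabled[cpu_index]
    if penalize:
        sleep(throttle_us / 1_000_000)  # microseconds -> seconds
    return penalize
```

Injecting `sleep` keeps the sketch testable without real delays.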




Re: [v2 4/6] tests/migration: Introduce dirty-ring-size option into guestperf

2023-10-27 Thread Hyman Huang

ping1

On 2023/10/23 10:03, Yong Huang wrote:

ping.

When it comes to live migration performance, guestperf can give us a
clear answer. IMHO, by adding just a few metrics, it could be developed
into a more user-friendly metrics system in the future.

We can keep enriching it until then.

On Fri, Oct 20, 2023 at 11:24 PM Hyman Huang wrote:


The guestperf tool does not support configuring the dirty ring size.

Introduce a dirty-ring-size option (range [1024, 65536]) so that
developers can experiment with the dirty-ring and dirty-limit features
more easily.

To set the dirty ring size to 4096 during a migration test:
$ ./tests/migration/guestperf.py --dirty-ring-size 4096 xxx

Signed-off-by: Hyman Huang 
---
 tests/migration/guestperf/engine.py   | 6 +-
 tests/migration/guestperf/hardware.py | 8 ++--
 tests/migration/guestperf/shell.py    | 6 +-
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/tests/migration/guestperf/engine.py b/tests/migration/guestperf/engine.py
index da96ca034a..aabf6de4d9 100644
--- a/tests/migration/guestperf/engine.py
+++ b/tests/migration/guestperf/engine.py
@@ -325,7 +325,6 @@ def _get_common_args(self, hardware, tunnelled=False):
             cmdline = "'" + cmdline + "'"

         argv = [
-            "-accel", "kvm",
             "-cpu", "host",
             "-kernel", self._kernel,
             "-initrd", self._initrd,
@@ -333,6 +332,11 @@ def _get_common_args(self, hardware, tunnelled=False):
             "-m", str((hardware._mem * 1024) + 512),
             "-smp", str(hardware._cpus),
         ]
+        if hardware._dirty_ring_size:
+            argv.extend(["-accel", "kvm,dirty-ring-size=%s" %
+                         hardware._dirty_ring_size])
+        else:
+            argv.extend(["-accel", "kvm"])

         argv.extend(self._get_qemu_serial_args())

diff --git a/tests/migration/guestperf/hardware.py b/tests/migration/guestperf/hardware.py
index 3145785ffd..f779cc050b 100644
--- a/tests/migration/guestperf/hardware.py
+++ b/tests/migration/guestperf/hardware.py
@@ -23,7 +23,8 @@ def __init__(self, cpus=1, mem=1,
                  src_cpu_bind=None, src_mem_bind=None,
                  dst_cpu_bind=None, dst_mem_bind=None,
                  prealloc_pages = False,
-                 huge_pages=False, locked_pages=False):
+                 huge_pages=False, locked_pages=False,
+                 dirty_ring_size=0):
         self._cpus = cpus
         self._mem = mem # GiB
         self._src_mem_bind = src_mem_bind # List of NUMA nodes
@@ -33,6 +34,7 @@ def __init__(self, cpus=1, mem=1,
         self._prealloc_pages = prealloc_pages
         self._huge_pages = huge_pages
         self._locked_pages = locked_pages
+        self._dirty_ring_size = dirty_ring_size


     def serialize(self):
@@ -46,6 +48,7 @@ def serialize(self):
             "prealloc_pages": self._prealloc_pages,
             "huge_pages": self._huge_pages,
             "locked_pages": self._locked_pages,
+            "dirty_ring_size": self._dirty_ring_size,
         }

     @classmethod
@@ -59,4 +62,5 @@ def deserialize(cls, data):
             data["dst_mem_bind"],
             data["prealloc_pages"],
             data["huge_pages"],
-            data["locked_pages"])
+            data["locked_pages"],
+            data["dirty_ring_size"])
diff --git a/tests/migration/guestperf/shell.py b/tests/migration/guestperf/shell.py
index 8a809e3dda..7d6b8cd7cf 100644
--- a/tests/migration/guestperf/shell.py
+++ b/tests/migration/guestperf/shell.py
@@ -60,6 +60,8 @@ def __init__(self):
         parser.add_argument("--prealloc-pages", dest="prealloc_pages", default=False)
         parser.add_argument("--huge-pages", dest="huge_pages", default=False)
         parser.add_argument("--locked-pages", dest="locked_pages", default=False)
+        parser.add_argument("--dirty-ring-size", dest="dirty_ring_size",
+                            default=0, type=int)

         self._parser = parser

@@ -89,7 +91,9 @@ def split_map(value):

                         locked_pages=args.locked_pages,
                         huge_pages=args.huge_pages,
-                        prealloc_pages=args.prealloc_pages)
+                        prealloc_pages=args.prealloc_pages,
+
+                        dirty_ring_size=args.dirty_ring_size)


 class Shell(BaseShell):
-- 
2.39.1




--
Best regards




[v2 5/6] tests/migration: Introduce dirty-limit into guestperf

2023-10-20 Thread Hyman Huang
Currently, guestperf does not cover dirty-limit
migration; add support for this feature.

Note that dirty-limit requires 'dirty-ring-size' to be set.

To enable dirty-limit with x-vcpu-dirty-limit-period set
to 500ms and vcpu-dirty-limit set to 10MB/s:
$ ./tests/migration/guestperf.py \
--dirty-ring-size 4096 \
--dirty-limit --x-vcpu-dirty-limit-period 500 \
--vcpu-dirty-limit 10 --output output.json

To run the entire standardized set of dirty-limit-enabled
comparisons, with unix migration:
$ ./tests/migration/guestperf-batch.py \
--dirty-ring-size 4096 \
--dst-host localhost --transport unix \
--filter compr-dirty-limit* --output outputdir

Signed-off-by: Hyman Huang 
---
 tests/migration/guestperf/comparison.py | 23 +++
 tests/migration/guestperf/engine.py | 17 +
 tests/migration/guestperf/progress.py   | 16 ++--
 tests/migration/guestperf/scenario.py   | 11 ++-
 tests/migration/guestperf/shell.py  | 18 +-
 5 files changed, 81 insertions(+), 4 deletions(-)

diff --git a/tests/migration/guestperf/comparison.py b/tests/migration/guestperf/comparison.py
index c03b3f6d7e..42cc0372d1 100644
--- a/tests/migration/guestperf/comparison.py
+++ b/tests/migration/guestperf/comparison.py
@@ -135,4 +135,27 @@ def __init__(self, name, scenarios):
         Scenario("compr-multifd-channels-64",
                  multifd=True, multifd_channels=64),
     ]),
+
+    # Looking at effect of dirty-limit with
+    # varying x_vcpu_dirty_limit_period
+    Comparison("compr-dirty-limit-period", scenarios = [
+        Scenario("compr-dirty-limit-period-500",
+                 dirty_limit=True, x_vcpu_dirty_limit_period=500),
+        Scenario("compr-dirty-limit-period-800",
+                 dirty_limit=True, x_vcpu_dirty_limit_period=800),
+        Scenario("compr-dirty-limit-period-1000",
+                 dirty_limit=True, x_vcpu_dirty_limit_period=1000),
+    ]),
+
+
+    # Looking at effect of dirty-limit with
+    # varying vcpu_dirty_limit
+    Comparison("compr-dirty-limit", scenarios = [
+        Scenario("compr-dirty-limit-10MB",
+                 dirty_limit=True, vcpu_dirty_limit=10),
+        Scenario("compr-dirty-limit-20MB",
+                 dirty_limit=True, vcpu_dirty_limit=20),
+        Scenario("compr-dirty-limit-50MB",
+                 dirty_limit=True, vcpu_dirty_limit=50),
+    ]),
 ]
diff --git a/tests/migration/guestperf/engine.py b/tests/migration/guestperf/engine.py
index aabf6de4d9..608d7270f6 100644
--- a/tests/migration/guestperf/engine.py
+++ b/tests/migration/guestperf/engine.py
@@ -102,6 +102,8 @@ def _migrate_progress(self, vm):
             info.get("expected-downtime", 0),
             info.get("setup-time", 0),
             info.get("cpu-throttle-percentage", 0),
+            info.get("dirty-limit-throttle-time-per-round", 0),
+            info.get("dirty-limit-ring-full-time", 0),
         )
 
     def _migrate(self, hardware, scenario, src, dst, connect_uri):
@@ -203,6 +205,21 @@ def _migrate(self, hardware, scenario, src, dst, connect_uri):
             resp = dst.cmd("migrate-set-parameters",
                            multifd_channels=scenario._multifd_channels)
 
+        if scenario._dirty_limit:
+            if not hardware._dirty_ring_size:
+                raise Exception("dirty ring size must be configured when "
+                                "testing dirty limit migration")
+
+            resp = src.cmd("migrate-set-capabilities",
+                           capabilities = [
+                               { "capability": "dirty-limit",
+                                 "state": True }
+                           ])
+            resp = src.cmd("migrate-set-parameters",
+                x_vcpu_dirty_limit_period=scenario._x_vcpu_dirty_limit_period)
+            resp = src.cmd("migrate-set-parameters",
+                           vcpu_dirty_limit=scenario._vcpu_dirty_limit)
+
         resp = src.cmd("migrate", uri=connect_uri)
 
         post_copy = False
diff --git a/tests/migration/guestperf/progress.py b/tests/migration/guestperf/progress.py
index ab1ee57273..d490584217 100644
--- a/tests/migration/guestperf/progress.py
+++ b/tests/migration/guestperf/progress.py
@@ -81,7 +81,9 @@ def __init__(self,
                  downtime,
                  downtime_expected,
                  setup_time,
-                 throttle_pcent):
+                 throttle_pcent,
+                 dirty_limit_throttle_time_per_round,
+                 dirty_limit_ring_full_time):
 
         self._status = status
         self._ram = ram
@@ -91,6 +93,10 @@ def __init__(self,
         self._downtime_expected = downtime_expected
         self._setup_time = setup_t
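
For reference, the source-side setup that the engine.py hunk performs for a dirty-limit scenario can also be written as raw QMP. The sketch below is a minimal illustration only. It assumes the values from the commit message (dirty ring size 4096, a 500 ms period, a 10 MB/s limit), and it uses the dashed parameter names that the migration test in this series passes (x-vcpu-dirty-limit-period, vcpu-dirty-limit). The helper name dirty_limit_qmp_commands is hypothetical and is not part of guestperf.

```python
import json


def dirty_limit_qmp_commands(dirty_ring_size, period_ms, limit_mb_s):
    """Return the QMP commands a source VM needs for dirty-limit migration.

    Hypothetical helper: it mirrors the check and the commands in the
    engine.py hunk, but emits raw QMP dictionaries instead of calling
    src.cmd().
    """
    # Dirty limit relies on the KVM dirty ring, so refuse to continue
    # without one, as engine.py does.
    if not dirty_ring_size:
        raise ValueError("dirty ring size must be configured when "
                         "testing dirty limit migration")
    return [
        {"execute": "migrate-set-capabilities",
         "arguments": {"capabilities": [
             {"capability": "dirty-limit", "state": True}]}},
        {"execute": "migrate-set-parameters",
         "arguments": {"x-vcpu-dirty-limit-period": period_ms}},
        {"execute": "migrate-set-parameters",
         "arguments": {"vcpu-dirty-limit": limit_mb_s}},
    ]


if __name__ == "__main__":
    # Print one JSON command per line, ready to feed to a QMP socket.
    for cmd in dirty_limit_qmp_commands(4096, 500, 10):
        print(json.dumps(cmd))
```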

[v2 0/6] dirtylimit: miscellaneous patches

2023-10-20 Thread Hyman Huang
v2:
- rebase on master.
- fix the document typo.

v1:
This is a miscellaneous patchset for dirtylimit that contains
the following parts:

1. dirtylimit module: fix a race condition and
   replace usleep() with g_usleep().
2. migration test: add a dirtylimit test case.
3. guestperf for migration: add support for dirtylimit migration.
4. docs for migration: add a dirtylimit section.

Please review, thanks.

Regards,

Yong

Hyman Huang (6):
  system/dirtylimit: Fix a race situation
  system/dirtylimit: Drop the reduplicative check
  tests: Add migration dirty-limit capability test
  tests/migration: Introduce dirty-ring-size option into guestperf
  tests/migration: Introduce dirty-limit into guestperf
  docs/migration: Add the dirty limit section

 docs/devel/migration.rst|  71 ++
 system/dirtylimit.c |  24 ++--
 tests/migration/guestperf/comparison.py |  23 
 tests/migration/guestperf/engine.py |  23 +++-
 tests/migration/guestperf/hardware.py   |   8 +-
 tests/migration/guestperf/progress.py   |  16 ++-
 tests/migration/guestperf/scenario.py   |  11 +-
 tests/migration/guestperf/shell.py  |  24 +++-
 tests/qtest/migration-test.c| 164 
 9 files changed, 346 insertions(+), 18 deletions(-)

-- 
2.39.1




[v2 3/6] tests: Add migration dirty-limit capability test

2023-10-20 Thread Hyman Huang
Add a migration dirty-limit capability test that runs if the kernel
supports dirty ring.

The migration dirty-limit capability, together with two parameters,
x-vcpu-dirty-limit-period and vcpu-dirty-limit, implements live
migration with dirty limit.

The test case does the following:
1. start the src and dst VMs and enable the dirty-limit capability
2. start migration, then cancel it to check that dirty limit
   stops working.
3. restart the dst VM
4. start migration and enable the dirty-limit capability
5. check that migration satisfies the convergence condition
   during the pre-switchover phase.

Note that this test case involves many passes, so it runs
in slow mode only.

Signed-off-by: Hyman Huang 
Acked-by: Peter Xu 
Reviewed-by: Fabiano Rosas 
---
 tests/qtest/migration-test.c | 164 +++
 1 file changed, 164 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index e1c110537b..8f966c4d25 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2943,6 +2943,166 @@ static void test_vcpu_dirty_limit(void)
     dirtylimit_stop_vm(vm);
 }
 
+static void migrate_dirty_limit_wait_showup(QTestState *from,
+                                            const int64_t period,
+                                            const int64_t value)
+{
+    /* Enable dirty limit capability */
+    migrate_set_capability(from, "dirty-limit", true);
+
+    /* Set dirty limit parameters */
+    migrate_set_parameter_int(from, "x-vcpu-dirty-limit-period", period);
+    migrate_set_parameter_int(from, "vcpu-dirty-limit", value);
+
+    /* Make sure migrate can't converge */
+    migrate_ensure_non_converge(from);
+
+    /* To check limit rate after precopy */
+    migrate_set_capability(from, "pause-before-switchover", true);
+
+    /* Wait for the serial output from the source */
+    wait_for_serial("src_serial");
+}
+
+/*
+ * This test does:
+ *  source                          destination
+ *  start vm
+ *                                  start incoming vm
+ *  migrate
+ *  wait dirty limit to begin
+ *  cancel migrate
+ *  cancellation check
+ *                                  restart incoming vm
+ *  migrate
+ *  wait dirty limit to begin
+ *  wait pre-switchover event
+ *  convergence condition check
+ *
+ * And see if dirty limit migration works correctly.
+ * This test case involves many passes, so it runs in slow mode only.
+ */
+static void test_migrate_dirty_limit(void)
+{
+    g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
+    QTestState *from, *to;
+    int64_t remaining;
+    uint64_t throttle_us_per_full;
+    /*
+     * We want the test to be stable and as fast as possible.
+     * E.g., with 1Gb/s bandwidth migration may pass without dirty limit,
+     * so we need to decrease the bandwidth.
+     */
+    const int64_t dirtylimit_period = 1000, dirtylimit_value = 50;
+    const int64_t max_bandwidth = 400000000; /* ~400Mb/s */
+    const int64_t downtime_limit = 250; /* 250ms */
+    /*
+     * We migrate through unix-socket (> 500Mb/s).
+     * Thus, expected migration speed ~= bandwidth limit (< 500Mb/s).
+     * So, we can predict expected_threshold
+     */
+    const int64_t expected_threshold = max_bandwidth * downtime_limit / 1000;
+    int max_try_count = 10;
+    MigrateCommon args = {
+        .start = {
+            .hide_stderr = true,
+            .use_dirty_ring = true,
+        },
+        .listen_uri = uri,
+        .connect_uri = uri,
+    };
+
+    /* Start src, dst vm */
+    if (test_migrate_start(&from, &to, args.listen_uri, &args.start)) {
+        return;
+    }
+
+    /* Prepare for dirty limit migration and wait src vm show up */
+    migrate_dirty_limit_wait_showup(from, dirtylimit_period, dirtylimit_value);
+
+    /* Start migrate */
+    migrate_qmp(from, uri, "{}");
+
+    /* Wait for dirty limit throttle to begin */
+    throttle_us_per_full = 0;
+    while (throttle_us_per_full == 0) {
+        throttle_us_per_full =
+            read_migrate_property_int(from, "dirty-limit-throttle-time-per-round");
+        usleep(100);
+        g_assert_false(got_src_stop);
+    }
+
+    /* Now cancel migrate and wait for dirty limit throttle switch off */
+    migrate_cancel(from);
+    wait_for_migration_status(from, "cancelled", NULL);
+
+    /* Check if dirty limit throttle switched off, set timeout 1ms */
+    do {
+        throttle_us_per_full =
+            read_migrate_property_int(from, "dirty-limit-throttle-time-per-round");
+        usleep(100);
+        g_assert_false(got_src_stop);
+    } while (throttle_us_per_full != 0 && --max_try_count);
+
+    /* Assert dirty limit is not in service */
+    g_assert_cmpint(throttle_us_per_full, ==, 0);
+
+    args = (MigrateCommon) {
+        .start = {
+            .only_target = true,
+            .use_dirty_ring = true,
+        },
+.listen_u
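
The cancellation check in test_migrate_dirty_limit is a bounded poll. It reads dirty-limit-throttle-time-per-round until the value drops to zero or ten attempts (about 1 ms at 100 us each) are used up, then asserts on the last value read. A minimal Python sketch of the same pattern follows. poll_until and its parameters are hypothetical names, and the read_value callback stands in for read_migrate_property_int().

```python
import time


def poll_until(read_value, done, max_tries=10, interval_s=0.0001):
    """Read a value until done(value) holds or max_tries reads are made.

    Returns the last value read so the caller can assert on it, as the
    C test does with g_assert_cmpint() after its do/while loop.
    """
    value = read_value()
    tries = 1
    while not done(value) and tries < max_tries:
        time.sleep(interval_s)  # 100 us, like usleep(100) in the test
        value = read_value()
        tries += 1
    return value
```

With this structure, a throttle that never switches off costs at most max_tries reads before the caller's assertion fails. That is why the C test can bound its wait without a separate timer.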

[v2 6/6] docs/migration: Add the dirty limit section

2023-10-20 Thread Hyman Huang
The dirty limit feature was introduced in the QEMU 8.1
release but is not yet reflected in the documentation;
add a section for it.

Signed-off-by: Hyman Huang 
---
 docs/devel/migration.rst | 71 
 1 file changed, 71 insertions(+)

diff --git a/docs/devel/migration.rst b/docs/devel/migration.rst
index c3e1400c0c..347244af89 100644
--- a/docs/devel/migration.rst
+++ b/docs/devel/migration.rst
@@ -588,6 +588,77 @@ path.
  Return path  - opened by main thread, written by main thread AND postcopy
  thread (protected by rp_mutex)
 
+Dirty limit
+===========
+The dirty limit, short for dirty page rate upper limit, is a new capability
+introduced in the 8.1 QEMU release that uses a new algorithm based on the KVM
+dirty ring to throttle down the guest during live migration.
+
+The algorithm framework is as follows:
+
+::
+
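+  ----------------------------------------------------------------------------
+  main   --> throttle thread --------------------------> PREPARE(1) <--------
+  thread  \                                                   |             |
+           \                                                  |             |
+            \                                                 V             |
+             -\                                         CALCULATE(2)        |
+               \                                              |             |
+                \                                             |             |
+                 \                                            V             |
+                  \                                    SET PENALTY(3) -------
+                   -\                                         |
+                     \                                        |
+                      \                                       V
+                       -> virtual CPU thread -------> ACCEPT PENALTY(4)
+  ----------------------------------------------------------------------------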
+
+When the QMP command set-vcpu-dirty-limit (qmp_set_vcpu_dirty_limit) is
+called for the first time, the QEMU main thread starts the throttle thread.
+Once launched, the throttle thread executes a loop of three steps:
+
+  - PREPARE (1)
+
+    As the name implies, PREPARE (1) prepares the inputs for the second
+    stage, CALCULATE (2): the dirty page rate of the VM and the
+    corresponding upper limit. The dirty page rate is calculated via the
+    KVM dirty ring mechanism, which tells QEMU how many pages a virtual
+    CPU has dirtied since the last KVM_EXIT_DIRTY_RING_FULL exit; the
+    dirty page rate upper limit is specified by the caller, so it is
+    fetched directly.
+
+  - CALCULATE (2)
+
+    Calculate a suitable sleep period for each virtual CPU, which will be
+    used to determine the penalty for the target virtual CPU. The
+    computation must be done carefully in order to reduce the dirty page
+    rate progressively down to the upper limit without oscillation. To
+    achieve this, two strategies are provided: the first is to add or
+    subtract sleep time based on the ratio of the current dirty page rate
+    to the limit, which is used when the current dirty page rate is far
+    from the limit; the second is to add or subtract a fixed time when
+    the current dirty page rate is close to the limit.
+
+  - SET PENALTY (3)
+
+    Set the sleep time for each virtual CPU that should be penalized based
+    on the results of the calculation supplied by step CALCULATE (2).
+
+After completing the three stages above, the throttle thread loops back
+to step PREPARE (1) until the dirty limit is reached.
+
+Meanwhile, each virtual CPU thread reads its sleep duration and sleeps
+in the KVM_EXIT_DIRTY_RING_FULL exit handler path; this is ACCEPT
+PENALTY (4). Virtual CPUs running write-heavy workloads fill their dirty
+rings, take this exit, and get penalized, whereas virtual CPUs running
+read-mostly workloads do not.
+
+In summary, by building on the KVM dirty ring, the dirty limit algorithm
+throttles only the virtual CPUs that need it to keep their dirty page
+rate within the limit. This leads to steadier read performance during
+live migration and can help improve the responsiveness of large guests.
+
 Postcopy
 ========
 
-- 
2.39.1
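
To make the PREPARE / CALCULATE / SET PENALTY loop described in this patch more concrete, here is a minimal Python sketch. It is not QEMU's implementation. The page size, the 10% "close to the limit" band, the fixed 1000 us step, and the proportional formula are all assumptions chosen for illustration. Only the overall shape comes from the documentation text: a ratio-based adjustment far from the limit and a fixed adjustment near it.

```python
PAGE_SIZE = 4096       # assumed guest page size in bytes
NEAR_LIMIT_PCT = 10    # assumed: within 10% of the limit counts as "close"
FIXED_STEP_US = 1000   # assumed fixed adjustment, in microseconds


def dirty_rate_mb_s(dirty_pages, period_ms):
    """PREPARE: convert a per-period dirty page count into MB/s."""
    return dirty_pages * PAGE_SIZE / (1024 * 1024) / (period_ms / 1000)


def next_sleep_us(current_us, rate, limit, period_us):
    """CALCULATE: move one vCPU's sleep time toward the limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    if rate == limit:
        return current_us
    gap_pct = abs(rate - limit) * 100 / limit
    if gap_pct <= NEAR_LIMIT_PCT:
        # Close to the limit: small fixed step to avoid oscillation.
        step = FIXED_STEP_US
    else:
        # Far from the limit: step proportional to the relative gap,
        # capped at one full period.
        step = int(period_us * min(gap_pct, 100) / 100)
    if rate > limit:
        return current_us + step
    return max(0, current_us - step)


def throttle_round(vcpus, limit, period_ms):
    """SET PENALTY: update the sleep time stored for every vCPU."""
    for vcpu in vcpus:
        rate = dirty_rate_mb_s(vcpu["dirty_pages"], period_ms)
        vcpu["sleep_us"] = next_sleep_us(vcpu["sleep_us"], rate, limit,
                                         period_ms * 1000)
```

In this sketch, a vCPU that only reads memory reports zero dirty pages, so its sleep time stays at zero. A vCPU dirtying 100 MB/s against a 10 MB/s limit accumulates a penalty. This matches the read/write distinction described for ACCEPT PENALTY (4).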




[v2 2/6] system/dirtylimit: Drop the reduplicative check

2023-10-20 Thread Hyman Huang
Checking whether dirty limit is in service is already done by the
dirtylimit_query_all function; drop the redundant check in the
qmp_query_vcpu_dirty_limit function.

Signed-off-by: Hyman Huang 
Reviewed-by: Fabiano Rosas 
---
 system/dirtylimit.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/system/dirtylimit.c b/system/dirtylimit.c
index 3666c4cb7c..495c7a7082 100644
--- a/system/dirtylimit.c
+++ b/system/dirtylimit.c
@@ -652,10 +652,6 @@ static struct DirtyLimitInfoList *dirtylimit_query_all(void)
 
 struct DirtyLimitInfoList *qmp_query_vcpu_dirty_limit(Error **errp)
 {
-    if (!dirtylimit_in_service()) {
-        return NULL;
-    }
-
     return dirtylimit_query_all();
 }
 
-- 
2.39.1




[v2 1/6] system/dirtylimit: Fix a race situation

2023-10-20 Thread Hyman Huang
Fix a race on the global variable dirtylimit_state.

Also, replace usleep() with g_usleep() to improve portability
of the sleep call across platforms.

Signed-off-by: Hyman Huang 
Reviewed-by: Fabiano Rosas 
---
 system/dirtylimit.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/system/dirtylimit.c b/system/dirtylimit.c
index fa959d7743..3666c4cb7c 100644
--- a/system/dirtylimit.c
+++ b/system/dirtylimit.c
@@ -411,12 +411,20 @@ void dirtylimit_set_all(uint64_t quota,
 
 void dirtylimit_vcpu_execute(CPUState *cpu)
 {
-    if (dirtylimit_in_service() &&
-        dirtylimit_vcpu_get_state(cpu->cpu_index)->enabled &&
-        cpu->throttle_us_per_full) {
-        trace_dirtylimit_vcpu_execute(cpu->cpu_index,
-                cpu->throttle_us_per_full);
-        usleep(cpu->throttle_us_per_full);
+    if (cpu->throttle_us_per_full) {
+        dirtylimit_state_lock();
+
+        if (dirtylimit_in_service() &&
+            dirtylimit_vcpu_get_state(cpu->cpu_index)->enabled) {
+            dirtylimit_state_unlock();
+            trace_dirtylimit_vcpu_execute(cpu->cpu_index,
+                    cpu->throttle_us_per_full);
+
+            g_usleep(cpu->throttle_us_per_full);
+            return;
+        }
+
+        dirtylimit_state_unlock();
     }
 }
 
-- 
2.39.1
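
The fix follows a common pattern. It tests the cheap per-vCPU value first, takes the lock only to read the shared state consistently, and releases the lock before sleeping. A Python analogue is sketched below. The names (_state, cancel_dirty_limit, vcpu_execute) are hypothetical stand-ins for dirtylimit_state, its teardown, and dirtylimit_vcpu_execute(). The sketch assumes the race is the shared state being torn down concurrently, since the commit message does not describe it in detail.

```python
import threading
import time

_state_lock = threading.Lock()             # stands in for dirtylimit_state_lock()
_state = {"enabled": {0: True, 1: False}}  # None means "not in service"


def cancel_dirty_limit():
    """Hypothetical teardown of the shared state (e.g. a QMP cancel)."""
    global _state
    with _state_lock:
        _state = None


def vcpu_execute(cpu_index, throttle_us):
    """Sleep for throttle_us if throttling applies; return True if it slept."""
    if not throttle_us:
        return False  # fast path, no lock needed
    with _state_lock:
        # Both checks see the same snapshot, so _state cannot vanish
        # between testing "in service" and reading the per-vCPU flag.
        active = _state is not None and _state["enabled"].get(cpu_index, False)
    # The lock is released before sleeping so a long penalty never
    # blocks the throttle thread or a concurrent cancel.
    if active:
        time.sleep(throttle_us / 1_000_000)
    return active
```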




[v2 4/6] tests/migration: Introduce dirty-ring-size option into guestperf

2023-10-20 Thread Hyman Huang
The guestperf tool does not support configuring the dirty ring size.

Introduce a dirty-ring-size option (range [1024, 65536]) so that
developers can experiment with the dirty-ring and dirty-limit
features more easily.

To set the dirty ring size to 4096 during a migration test:
$ ./tests/migration/guestperf.py --dirty-ring-size 4096 xxx

Signed-off-by: Hyman Huang 
---
 tests/migration/guestperf/engine.py   | 6 +-
 tests/migration/guestperf/hardware.py | 8 ++--
 tests/migration/guestperf/shell.py| 6 +-
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/tests/migration/guestperf/engine.py b/tests/migration/guestperf/engine.py
index da96ca034a..aabf6de4d9 100644
--- a/tests/migration/guestperf/engine.py
+++ b/tests/migration/guestperf/engine.py
@@ -325,7 +325,6 @@ def _get_common_args(self, hardware, tunnelled=False):
             cmdline = "'" + cmdline + "'"
 
         argv = [
-            "-accel", "kvm",
             "-cpu", "host",
             "-kernel", self._kernel,
             "-initrd", self._initrd,
@@ -333,6 +332,11 @@ def _get_common_args(self, hardware, tunnelled=False):
             "-m", str((hardware._mem * 1024) + 512),
             "-smp", str(hardware._cpus),
         ]
+        if hardware._dirty_ring_size:
+            argv.extend(["-accel", "kvm,dirty-ring-size=%s" %
+                         hardware._dirty_ring_size])
+        else:
+            argv.extend(["-accel", "kvm"])
 
         argv.extend(self._get_qemu_serial_args())
 
diff --git a/tests/migration/guestperf/hardware.py b/tests/migration/guestperf/hardware.py
index 3145785ffd..f779cc050b 100644
--- a/tests/migration/guestperf/hardware.py
+++ b/tests/migration/guestperf/hardware.py
@@ -23,7 +23,8 @@ def __init__(self, cpus=1, mem=1,
                  src_cpu_bind=None, src_mem_bind=None,
                  dst_cpu_bind=None, dst_mem_bind=None,
                  prealloc_pages = False,
-                 huge_pages=False, locked_pages=False):
+                 huge_pages=False, locked_pages=False,
+                 dirty_ring_size=0):
         self._cpus = cpus
         self._mem = mem # GiB
         self._src_mem_bind = src_mem_bind # List of NUMA nodes
@@ -33,6 +34,7 @@ def __init__(self, cpus=1, mem=1,
         self._prealloc_pages = prealloc_pages
         self._huge_pages = huge_pages
         self._locked_pages = locked_pages
+        self._dirty_ring_size = dirty_ring_size
 
 
     def serialize(self):
@@ -46,6 +48,7 @@ def serialize(self):
             "prealloc_pages": self._prealloc_pages,
             "huge_pages": self._huge_pages,
             "locked_pages": self._locked_pages,
+            "dirty_ring_size": self._dirty_ring_size,
         }
 
     @classmethod
@@ -59,4 +62,5 @@ def deserialize(cls, data):
             data["dst_mem_bind"],
             data["prealloc_pages"],
             data["huge_pages"],
-            data["locked_pages"])
+            data["locked_pages"],
+            data["dirty_ring_size"])
diff --git a/tests/migration/guestperf/shell.py b/tests/migration/guestperf/shell.py
index 8a809e3dda..7d6b8cd7cf 100644
--- a/tests/migration/guestperf/shell.py
+++ b/tests/migration/guestperf/shell.py
@@ -60,6 +60,8 @@ def __init__(self):
         parser.add_argument("--prealloc-pages", dest="prealloc_pages", default=False)
         parser.add_argument("--huge-pages", dest="huge_pages", default=False)
         parser.add_argument("--locked-pages", dest="locked_pages", default=False)
+        parser.add_argument("--dirty-ring-size", dest="dirty_ring_size",
+                            default=0, type=int)
 
         self._parser = parser
 
@@ -89,7 +91,9 @@ def split_map(value):
 
                         locked_pages=args.locked_pages,
                         huge_pages=args.huge_pages,
-                        prealloc_pages=args.prealloc_pages)
+                        prealloc_pages=args.prealloc_pages,
+
+                        dirty_ring_size=args.dirty_ring_size)
 
 
 class Shell(BaseShell):
-- 
2.39.1
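
The engine.py change comes down to choosing the -accel argument from the configured ring size. A standalone sketch follows. kvm_accel_args is a hypothetical helper, and the [1024, 65536] range comes from the commit message. The power-of-two check is an assumption based on QEMU's KVM accelerator rejecting other ring sizes; guestperf itself passes the value through unchecked.

```python
def kvm_accel_args(dirty_ring_size=0):
    """Build the QEMU -accel arguments, enabling the dirty ring if asked."""
    if not dirty_ring_size:
        # 0 (the guestperf default) keeps the plain KVM accelerator.
        return ["-accel", "kvm"]
    if not 1024 <= dirty_ring_size <= 65536:
        raise ValueError("dirty ring size must be within [1024, 65536]")
    # Assumed constraint: reject sizes that are not a power of two.
    if dirty_ring_size & (dirty_ring_size - 1):
        raise ValueError("dirty ring size must be a power of two")
    return ["-accel", "kvm,dirty-ring-size=%d" % dirty_ring_size]
```

For example, kvm_accel_args(4096) yields the same argument pair that engine.py builds for --dirty-ring-size 4096. Validating up front turns a bad value into an immediate error instead of a failed QEMU launch.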




[PATCH 5/6] tests/migration: Introduce dirty-limit into guestperf

2023-10-17 Thread Hyman Huang
Currently, guestperf does not cover dirty-limit
migration; add support for this feature.

Note that dirty-limit requires 'dirty-ring-size' to be set.

To enable dirty-limit with x-vcpu-dirty-limit-period set
to 500ms and vcpu-dirty-limit set to 10MB/s:
$ ./tests/migration/guestperf.py \
--dirty-ring-size 4096 \
--dirty-limit --x-vcpu-dirty-limit-period 500 \
--vcpu-dirty-limit 10 --output output.json

To run the entire standardized set of dirty-limit-enabled
comparisons, with unix migration:
$ ./tests/migration/guestperf-batch.py \
--dirty-ring-size 4096 \
--dst-host localhost --transport unix \
--filter compr-dirty-limit* --output outputdir

Signed-off-by: Hyman Huang 
---
 tests/migration/guestperf/comparison.py | 23 +++
 tests/migration/guestperf/engine.py | 17 +
 tests/migration/guestperf/progress.py   | 16 ++--
 tests/migration/guestperf/scenario.py   | 11 ++-
 tests/migration/guestperf/shell.py  | 18 +-
 5 files changed, 81 insertions(+), 4 deletions(-)

diff --git a/tests/migration/guestperf/comparison.py b/tests/migration/guestperf/comparison.py
index c03b3f6d7e..42cc0372d1 100644
--- a/tests/migration/guestperf/comparison.py
+++ b/tests/migration/guestperf/comparison.py
@@ -135,4 +135,27 @@ def __init__(self, name, scenarios):
         Scenario("compr-multifd-channels-64",
                  multifd=True, multifd_channels=64),
     ]),
+
+    # Looking at effect of dirty-limit with
+    # varying x_vcpu_dirty_limit_period
+    Comparison("compr-dirty-limit-period", scenarios = [
+        Scenario("compr-dirty-limit-period-500",
+                 dirty_limit=True, x_vcpu_dirty_limit_period=500),
+        Scenario("compr-dirty-limit-period-800",
+                 dirty_limit=True, x_vcpu_dirty_limit_period=800),
+        Scenario("compr-dirty-limit-period-1000",
+                 dirty_limit=True, x_vcpu_dirty_limit_period=1000),
+    ]),
+
+
+    # Looking at effect of dirty-limit with
+    # varying vcpu_dirty_limit
+    Comparison("compr-dirty-limit", scenarios = [
+        Scenario("compr-dirty-limit-10MB",
+                 dirty_limit=True, vcpu_dirty_limit=10),
+        Scenario("compr-dirty-limit-20MB",
+                 dirty_limit=True, vcpu_dirty_limit=20),
+        Scenario("compr-dirty-limit-50MB",
+                 dirty_limit=True, vcpu_dirty_limit=50),
+    ]),
 ]
diff --git a/tests/migration/guestperf/engine.py b/tests/migration/guestperf/engine.py
index aabf6de4d9..608d7270f6 100644
--- a/tests/migration/guestperf/engine.py
+++ b/tests/migration/guestperf/engine.py
@@ -102,6 +102,8 @@ def _migrate_progress(self, vm):
             info.get("expected-downtime", 0),
             info.get("setup-time", 0),
             info.get("cpu-throttle-percentage", 0),
+            info.get("dirty-limit-throttle-time-per-round", 0),
+            info.get("dirty-limit-ring-full-time", 0),
         )
 
     def _migrate(self, hardware, scenario, src, dst, connect_uri):
@@ -203,6 +205,21 @@ def _migrate(self, hardware, scenario, src, dst, connect_uri):
             resp = dst.cmd("migrate-set-parameters",
                            multifd_channels=scenario._multifd_channels)
 
+        if scenario._dirty_limit:
+            if not hardware._dirty_ring_size:
+                raise Exception("dirty ring size must be configured when "
+                                "testing dirty limit migration")
+
+            resp = src.cmd("migrate-set-capabilities",
+                           capabilities = [
+                               { "capability": "dirty-limit",
+                                 "state": True }
+                           ])
+            resp = src.cmd("migrate-set-parameters",
+                x_vcpu_dirty_limit_period=scenario._x_vcpu_dirty_limit_period)
+            resp = src.cmd("migrate-set-parameters",
+                           vcpu_dirty_limit=scenario._vcpu_dirty_limit)
+
         resp = src.cmd("migrate", uri=connect_uri)
 
         post_copy = False
diff --git a/tests/migration/guestperf/progress.py b/tests/migration/guestperf/progress.py
index ab1ee57273..d490584217 100644
--- a/tests/migration/guestperf/progress.py
+++ b/tests/migration/guestperf/progress.py
@@ -81,7 +81,9 @@ def __init__(self,
                  downtime,
                  downtime_expected,
                  setup_time,
-                 throttle_pcent):
+                 throttle_pcent,
+                 dirty_limit_throttle_time_per_round,
+                 dirty_limit_ring_full_time):
 
         self._status = status
         self._ram = ram
@@ -91,6 +93,10 @@ def __init__(self,
         self._downtime_expected = downtime_expected
         self._setup_time = setup_t

[PATCH 4/6] tests/migration: Introduce dirty-ring-size option into guestperf

2023-10-17 Thread Hyman Huang
The guestperf tool does not support configuring the dirty ring size.

Introduce a dirty-ring-size option (range [1024, 65536]) so that
developers can experiment with the dirty-ring and dirty-limit
features more easily.

To set the dirty ring size to 4096 during a migration test:
$ ./tests/migration/guestperf.py --dirty-ring-size 4096 xxx

Signed-off-by: Hyman Huang 
---
 tests/migration/guestperf/engine.py   | 6 +-
 tests/migration/guestperf/hardware.py | 8 ++--
 tests/migration/guestperf/shell.py| 6 +-
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/tests/migration/guestperf/engine.py b/tests/migration/guestperf/engine.py
index da96ca034a..aabf6de4d9 100644
--- a/tests/migration/guestperf/engine.py
+++ b/tests/migration/guestperf/engine.py
@@ -325,7 +325,6 @@ def _get_common_args(self, hardware, tunnelled=False):
             cmdline = "'" + cmdline + "'"
 
         argv = [
-            "-accel", "kvm",
             "-cpu", "host",
             "-kernel", self._kernel,
             "-initrd", self._initrd,
@@ -333,6 +332,11 @@ def _get_common_args(self, hardware, tunnelled=False):
             "-m", str((hardware._mem * 1024) + 512),
             "-smp", str(hardware._cpus),
         ]
+        if hardware._dirty_ring_size:
+            argv.extend(["-accel", "kvm,dirty-ring-size=%s" %
+                         hardware._dirty_ring_size])
+        else:
+            argv.extend(["-accel", "kvm"])
 
         argv.extend(self._get_qemu_serial_args())
 
diff --git a/tests/migration/guestperf/hardware.py b/tests/migration/guestperf/hardware.py
index 3145785ffd..f779cc050b 100644
--- a/tests/migration/guestperf/hardware.py
+++ b/tests/migration/guestperf/hardware.py
@@ -23,7 +23,8 @@ def __init__(self, cpus=1, mem=1,
                  src_cpu_bind=None, src_mem_bind=None,
                  dst_cpu_bind=None, dst_mem_bind=None,
                  prealloc_pages = False,
-                 huge_pages=False, locked_pages=False):
+                 huge_pages=False, locked_pages=False,
+                 dirty_ring_size=0):
         self._cpus = cpus
         self._mem = mem # GiB
         self._src_mem_bind = src_mem_bind # List of NUMA nodes
@@ -33,6 +34,7 @@ def __init__(self, cpus=1, mem=1,
         self._prealloc_pages = prealloc_pages
         self._huge_pages = huge_pages
         self._locked_pages = locked_pages
+        self._dirty_ring_size = dirty_ring_size
 
 
     def serialize(self):
@@ -46,6 +48,7 @@ def serialize(self):
             "prealloc_pages": self._prealloc_pages,
             "huge_pages": self._huge_pages,
             "locked_pages": self._locked_pages,
+            "dirty_ring_size": self._dirty_ring_size,
         }
 
     @classmethod
@@ -59,4 +62,5 @@ def deserialize(cls, data):
             data["dst_mem_bind"],
             data["prealloc_pages"],
             data["huge_pages"],
-            data["locked_pages"])
+            data["locked_pages"],
+            data["dirty_ring_size"])
diff --git a/tests/migration/guestperf/shell.py b/tests/migration/guestperf/shell.py
index 8a809e3dda..7d6b8cd7cf 100644
--- a/tests/migration/guestperf/shell.py
+++ b/tests/migration/guestperf/shell.py
@@ -60,6 +60,8 @@ def __init__(self):
         parser.add_argument("--prealloc-pages", dest="prealloc_pages", default=False)
         parser.add_argument("--huge-pages", dest="huge_pages", default=False)
         parser.add_argument("--locked-pages", dest="locked_pages", default=False)
+        parser.add_argument("--dirty-ring-size", dest="dirty_ring_size",
+                            default=0, type=int)
 
         self._parser = parser
 
@@ -89,7 +91,9 @@ def split_map(value):
 
                         locked_pages=args.locked_pages,
                         huge_pages=args.huge_pages,
-                        prealloc_pages=args.prealloc_pages)
+                        prealloc_pages=args.prealloc_pages,
+
+                        dirty_ring_size=args.dirty_ring_size)
 
 
 class Shell(BaseShell):
-- 
2.39.1




[PATCH 6/6] docs/migration: Add the dirty limit section

2023-10-17 Thread Hyman Huang
The dirty limit feature was introduced in the QEMU 8.1
release but is not yet reflected in the documentation;
add a section for it.

Signed-off-by: Hyman Huang 
---
 docs/devel/migration.rst | 71 
 1 file changed, 71 insertions(+)

diff --git a/docs/devel/migration.rst b/docs/devel/migration.rst
index c3e1400c0c..1cbec22e2a 100644
--- a/docs/devel/migration.rst
+++ b/docs/devel/migration.rst
@@ -588,6 +588,77 @@ path.
  Return path  - opened by main thread, written by main thread AND postcopy
  thread (protected by rp_mutex)
 
+Dirty limit
+===========
+The dirty limit, short for dirty page rate upper limit, is a new capability
+introduced in the 8.1 QEMU release that uses a new algorithm based on the KVM
+dirty ring to throttle down the guest during live migration.
+
+The algorithm framework is as follows:
+
+::
+
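+  ----------------------------------------------------------------------------
+  main   --> throttle thread --------------------------> PREPARE(1) <--------
+  thread  \                                                   |             |
+           \                                                  |             |
+            \                                                 V             |
+             -\                                         CALCULATE(2)        |
+               \                                              |             |
+                \                                             |             |
+                 \                                            V             |
+                  \                                    SET PENALTY(3) -------
+                   -\                                         |
+                     \                                        |
+                      \                                       V
+                       -> virtual CPU thread -------> ACCEPT PENALTY(4)
+  ----------------------------------------------------------------------------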
+
+When the QMP command set-vcpu-dirty-limit (qmp_set_vcpu_dirty_limit) is
+called for the first time, the QEMU main thread starts the throttle thread.
+Once launched, the throttle thread executes a loop of three steps:
+
+  - PREPARE (1)
+
+    As the name implies, PREPARE (1) prepares the inputs for the second
+    stage, CALCULATE (2): the dirty page rate of the VM and the
+    corresponding upper limit. The dirty page rate is calculated via the
+    KVM dirty ring mechanism, which tells QEMU how many pages a virtual
+    CPU has dirtied since the last KVM_EXIT_DIRTY_RING_FULL exit; the
+    dirty page rate upper limit is specified by the caller, so it is
+    fetched directly.
+
+  - CALCULATE (2)
+
+    Calculate a suitable sleep period for each virtual CPU, which will be
+    used to determine the penalty for the target virtual CPU. The
+    computation must be done carefully in order to reduce the dirty page
+    rate progressively down to the upper limit without oscillation. To
+    achieve this, two strategies are provided: the first is to add or
+    subtract sleep time based on the ratio of the current dirty page rate
+    to the limit, which is used when the current dirty page rate is far
+    from the limit; the second is to add or subtract a fixed time when
+    the current dirty page rate is close to the limit.
+
+  - SET PENALTY (3)
+
+    Set the sleep time for each virtual CPU that should be penalized based
+    on the results of the calculation supplied by step CALCULATE (2).
+
+After completing the three stages above, the throttle thread loops back
+to step PREPARE (1) until the dirty limit is reached.
+
+Meanwhile, each virtual CPU thread reads its sleep duration and sleeps
+in the KVM_EXIT_DIRTY_RING_FULL exit handler path; this is ACCEPT
+PENALTY (4). Virtual CPUs running write-heavy workloads fill their dirty
+rings, take this exit, and get penalized, whereas virtual CPUs running
+read-mostly workloads do not.
+
+In summary, by building on the KVM dirty ring, the dirty limit algorithm
+throttles only the virtual CPUs that need it to keep their dirty page
+rate within the limit. This leads to steadier read performance during
+live migration and can help improve the responsiveness of large guests.
+
 Postcopy
 ========
 
-- 
2.39.1



