Re: [PATCH RESEND v3 00/10] migration: introduce dirtylimit capability

2023-01-11 Thread Markus Armbruster
My sincere apologies for not replying sooner.

This needs a rebase now.  But let me have a look at it first.




Re: [PATCH RESEND v3 00/10] migration: introduce dirtylimit capability

2023-01-02 Thread Hyman Huang

Ping,

Hi, David, how about the commit about live migration:
[PATCH RESEND v3 08/10] migration: Implement dirty-limit convergence algo.

在 2022/12/4 1:09, huang...@chinatelecom.cn 写道:

From: Hyman Huang(黄勇) 

v3(resend):
- fix the syntax error of the topic.

v3:
This version make some modifications inspired by Peter and Markus
as following:
1. Do the code clean up in [PATCH v2 02/11] suggested by Markus
2. Replace the [PATCH v2 03/11] with a much simpler patch posted by
Peter to fix the following bug:
https://bugzilla.redhat.com/show_bug.cgi?id=2124756
3. Fix the error path of migrate_params_check in [PATCH v2 04/11]
pointed out by Markus. Enrich the commit message to explain why
x-vcpu-dirty-limit-period an unstable parameter.
4. Refactor the dirty-limit convergence algo in [PATCH v2 07/11]
suggested by Peter:
a. apply blk_mig_bulk_active check before enable dirty-limit
b. drop the unhelpful check function before enable dirty-limit
c. change the migration_cancel logic, just cancel dirty-limit
   only if dirty-limit capability turned on.
d. abstract a code clean commit [PATCH v3 07/10] to adjust
   the check order before enable auto-converge
5. Change the name of observing indexes during dirty-limit live
migration to make them more easy-understanding. Use the
maximum throttle time of vpus as "dirty-limit-throttle-time-per-full"
6. Fix some grammatical and spelling errors pointed out by Markus
and enrich the document about the dirty-limit live migration
observing indexes "dirty-limit-ring-full-time"
and "dirty-limit-throttle-time-per-full"
7. Change the default value of x-vcpu-dirty-limit-period to 1000ms,
which is optimal value pointed out in cover letter in that
testing environment.
8. Drop the 2 guestperf test commits [PATCH v2 10/11],
[PATCH v2 11/11] and post them with a standalone series in the
future.

Thanks Peter and Markus sincerely for the passionate, efficient
and careful comments and suggestions.

Please review.

Yong

v2:
This version make a little bit modifications comparing with
version 1 as following:
1. fix the overflow issue reported by Peter Maydell
2. add parameter check for hmp "set_vcpu_dirty_limit" command
3. fix the racing issue between dirty ring reaper thread and
Qemu main thread.
4. add migrate parameter check for x-vcpu-dirty-limit-period
and vcpu-dirty-limit.
5. add the logic to forbid hmp/qmp commands set_vcpu_dirty_limit,
cancel_vcpu_dirty_limit during dirty-limit live migration when
implement dirty-limit convergence algo.
6. add capability check to ensure auto-converge and dirty-limit
are mutually exclusive.
7. pre-check if kvm dirty ring size is configured before setting
dirty-limit migrate parameter

A more comprehensive test was done comparing with version 1.

The following are test environment:
-
a. Host hardware info:

CPU:
Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz

CPU(s):  64
On-line CPU(s) list: 0-63
Thread(s) per core:  2
Core(s) per socket:  16
Socket(s):   2
NUMA node(s):2

NUMA node0 CPU(s):   0-15,32-47
NUMA node1 CPU(s):   16-31,48-63

Memory:
Hynix  503Gi

Interface:
Intel Corporation Ethernet Connection X722 for 1GbE (rev 09)
Speed: 1000Mb/s

b. Host software info:

OS: ctyunos release 2
Kernel: 4.19.90-2102.2.0.0066.ctl2.x86_64
Libvirt baseline version:  libvirt-6.9.0
Qemu baseline version: qemu-5.0

c. vm scale
CPU: 4
Memory: 4G
-

All the supplementary test data shown as follows are basing on
above test environment.

In version 1, we post a test data from unixbench as follows:

$ taskset -c 8-15 ./Run -i 2 -c 8 {unixbench test item}

host cpu: Intel(R) Xeon(R) Platinum 8378A
host interface speed: 1000Mb/s
   |-+++---|
   | UnixBench test item | Normal | Dirtylimit | Auto-converge |
   |-+++---|
   | dhry2reg| 32800  | 32786  | 25292 |
   | whetstone-double| 10326  | 10315  | 9847  |
   | pipe| 15442  | 15271  | 14506 |
   | context1| 7260   | 6235   | 4514  |
   | spawn   | 3663   | 3317   | 3249  |
   | syscall | 4669   | 4667   | 3841  |
   |-+++---|

In version 2, we post a supplementary test data that do not use
taskset and make the scenario more general, see as follows:

$ ./Run

per-vcpu data:
   |-+++---|
   | UnixBench test item | Normal | Dirtylimit | Auto-converge |
   |-+++---|
   | dhry2reg| 2991   

Re: [PATCH RESEND v3 00/10] migration: introduce dirtylimit capability

2022-12-08 Thread Hyman

Ping ?

在 2022/12/4 1:09, huang...@chinatelecom.cn 写道:

From: Hyman Huang(黄勇) 

v3(resend):
- fix the syntax error of the topic.

v3:
This version make some modifications inspired by Peter and Markus
as following:
1. Do the code clean up in [PATCH v2 02/11] suggested by Markus
2. Replace the [PATCH v2 03/11] with a much simpler patch posted by
Peter to fix the following bug:
https://bugzilla.redhat.com/show_bug.cgi?id=2124756
3. Fix the error path of migrate_params_check in [PATCH v2 04/11]
pointed out by Markus. Enrich the commit message to explain why
x-vcpu-dirty-limit-period an unstable parameter.
4. Refactor the dirty-limit convergence algo in [PATCH v2 07/11]
suggested by Peter:
a. apply blk_mig_bulk_active check before enable dirty-limit
b. drop the unhelpful check function before enable dirty-limit
c. change the migration_cancel logic, just cancel dirty-limit
   only if dirty-limit capability turned on.
d. abstract a code clean commit [PATCH v3 07/10] to adjust
   the check order before enable auto-converge
5. Change the name of observing indexes during dirty-limit live
migration to make them more easy-understanding. Use the
maximum throttle time of vpus as "dirty-limit-throttle-time-per-full"
6. Fix some grammatical and spelling errors pointed out by Markus
and enrich the document about the dirty-limit live migration
observing indexes "dirty-limit-ring-full-time"
and "dirty-limit-throttle-time-per-full"
7. Change the default value of x-vcpu-dirty-limit-period to 1000ms,
which is optimal value pointed out in cover letter in that
testing environment.
8. Drop the 2 guestperf test commits [PATCH v2 10/11],
[PATCH v2 11/11] and post them with a standalone series in the
future.

Thanks Peter and Markus sincerely for the passionate, efficient
and careful comments and suggestions.

Please review.

Yong

v2:
This version make a little bit modifications comparing with
version 1 as following:
1. fix the overflow issue reported by Peter Maydell
2. add parameter check for hmp "set_vcpu_dirty_limit" command
3. fix the racing issue between dirty ring reaper thread and
Qemu main thread.
4. add migrate parameter check for x-vcpu-dirty-limit-period
and vcpu-dirty-limit.
5. add the logic to forbid hmp/qmp commands set_vcpu_dirty_limit,
cancel_vcpu_dirty_limit during dirty-limit live migration when
implement dirty-limit convergence algo.
6. add capability check to ensure auto-converge and dirty-limit
are mutually exclusive.
7. pre-check if kvm dirty ring size is configured before setting
dirty-limit migrate parameter

A more comprehensive test was done comparing with version 1.

The following are test environment:
-
a. Host hardware info:

CPU:
Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz

CPU(s):  64
On-line CPU(s) list: 0-63
Thread(s) per core:  2
Core(s) per socket:  16
Socket(s):   2
NUMA node(s):2

NUMA node0 CPU(s):   0-15,32-47
NUMA node1 CPU(s):   16-31,48-63

Memory:
Hynix  503Gi

Interface:
Intel Corporation Ethernet Connection X722 for 1GbE (rev 09)
Speed: 1000Mb/s

b. Host software info:

OS: ctyunos release 2
Kernel: 4.19.90-2102.2.0.0066.ctl2.x86_64
Libvirt baseline version:  libvirt-6.9.0
Qemu baseline version: qemu-5.0

c. vm scale
CPU: 4
Memory: 4G
-

All the supplementary test data shown as follows are basing on
above test environment.

In version 1, we post a test data from unixbench as follows:

$ taskset -c 8-15 ./Run -i 2 -c 8 {unixbench test item}

host cpu: Intel(R) Xeon(R) Platinum 8378A
host interface speed: 1000Mb/s
   |-+++---|
   | UnixBench test item | Normal | Dirtylimit | Auto-converge |
   |-+++---|
   | dhry2reg| 32800  | 32786  | 25292 |
   | whetstone-double| 10326  | 10315  | 9847  |
   | pipe| 15442  | 15271  | 14506 |
   | context1| 7260   | 6235   | 4514  |
   | spawn   | 3663   | 3317   | 3249  |
   | syscall | 4669   | 4667   | 3841  |
   |-+++---|

In version 2, we post a supplementary test data that do not use
taskset and make the scenario more general, see as follows:

$ ./Run

per-vcpu data:
   |-+++---|
   | UnixBench test item | Normal | Dirtylimit | Auto-converge |
   |-+++---|
   | dhry2reg| 2991   | 2902   | 1722  |
   | whetstone-double| 1018   | 1006   | 627   |
   | Execl Throughput| 955   

[PATCH RESEND v3 00/10] migration: introduce dirtylimit capability

2022-12-03 Thread huangy81
From: Hyman Huang(黄勇) 

v3(resend):
- fix the syntax error of the topic.

v3:
This version make some modifications inspired by Peter and Markus
as following:
1. Do the code clean up in [PATCH v2 02/11] suggested by Markus 
2. Replace the [PATCH v2 03/11] with a much simpler patch posted by
   Peter to fix the following bug:
   https://bugzilla.redhat.com/show_bug.cgi?id=2124756
3. Fix the error path of migrate_params_check in [PATCH v2 04/11]
   pointed out by Markus. Enrich the commit message to explain why
   x-vcpu-dirty-limit-period an unstable parameter.
4. Refactor the dirty-limit convergence algo in [PATCH v2 07/11] 
   suggested by Peter:
   a. apply blk_mig_bulk_active check before enable dirty-limit
   b. drop the unhelpful check function before enable dirty-limit
   c. change the migration_cancel logic, just cancel dirty-limit
  only if dirty-limit capability turned on. 
   d. abstract a code clean commit [PATCH v3 07/10] to adjust
  the check order before enable auto-converge 
5. Change the name of observing indexes during dirty-limit live
   migration to make them more easy-understanding. Use the
   maximum throttle time of vpus as "dirty-limit-throttle-time-per-full"
6. Fix some grammatical and spelling errors pointed out by Markus
   and enrich the document about the dirty-limit live migration
   observing indexes "dirty-limit-ring-full-time"
   and "dirty-limit-throttle-time-per-full"
7. Change the default value of x-vcpu-dirty-limit-period to 1000ms,
   which is optimal value pointed out in cover letter in that
   testing environment.
8. Drop the 2 guestperf test commits [PATCH v2 10/11],
   [PATCH v2 11/11] and post them with a standalone series in the
   future.

Thanks Peter and Markus sincerely for the passionate, efficient
and careful comments and suggestions.

Please review.  

Yong

v2: 
This version make a little bit modifications comparing with
version 1 as following:
1. fix the overflow issue reported by Peter Maydell
2. add parameter check for hmp "set_vcpu_dirty_limit" command
3. fix the racing issue between dirty ring reaper thread and
   Qemu main thread.
4. add migrate parameter check for x-vcpu-dirty-limit-period
   and vcpu-dirty-limit.
5. add the logic to forbid hmp/qmp commands set_vcpu_dirty_limit,
   cancel_vcpu_dirty_limit during dirty-limit live migration when
   implement dirty-limit convergence algo.
6. add capability check to ensure auto-converge and dirty-limit
   are mutually exclusive.
7. pre-check if kvm dirty ring size is configured before setting
   dirty-limit migrate parameter 

A more comprehensive test was done comparing with version 1.

The following are test environment:
-
a. Host hardware info:

CPU:
Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz

CPU(s):  64
On-line CPU(s) list: 0-63
Thread(s) per core:  2
Core(s) per socket:  16
Socket(s):   2
NUMA node(s):2

NUMA node0 CPU(s):   0-15,32-47
NUMA node1 CPU(s):   16-31,48-63

Memory:
Hynix  503Gi

Interface:
Intel Corporation Ethernet Connection X722 for 1GbE (rev 09)
Speed: 1000Mb/s

b. Host software info:

OS: ctyunos release 2
Kernel: 4.19.90-2102.2.0.0066.ctl2.x86_64
Libvirt baseline version:  libvirt-6.9.0
Qemu baseline version: qemu-5.0

c. vm scale
CPU: 4
Memory: 4G
-

All the supplementary test data shown as follows are basing on
above test environment.

In version 1, we post a test data from unixbench as follows:

$ taskset -c 8-15 ./Run -i 2 -c 8 {unixbench test item}

host cpu: Intel(R) Xeon(R) Platinum 8378A
host interface speed: 1000Mb/s
  |-+++---|
  | UnixBench test item | Normal | Dirtylimit | Auto-converge |
  |-+++---|
  | dhry2reg| 32800  | 32786  | 25292 |
  | whetstone-double| 10326  | 10315  | 9847  |
  | pipe| 15442  | 15271  | 14506 |
  | context1| 7260   | 6235   | 4514  |
  | spawn   | 3663   | 3317   | 3249  |
  | syscall | 4669   | 4667   | 3841  |
  |-+++---|

In version 2, we post a supplementary test data that do not use
taskset and make the scenario more general, see as follows:

$ ./Run

per-vcpu data:
  |-+++---|
  | UnixBench test item | Normal | Dirtylimit | Auto-converge |
  |-+++---|
  | dhry2reg| 2991   | 2902   | 1722  |
  | whetstone-double| 1018   | 1006   | 627   |
  | Execl Throughput| 955| 320| 660   |
  | File Copy - 1   | 2362   | 805| 1325