RE: [PULL 00/30] Next patches

2022-11-16 Thread Xu, Ling1
Hi, All,
  Very appreciated for your time on reviewing our patch.
  The second CI failure caused by our patch has been addressed. One simple 
way is moving "#endif" in qemu/tests/bench/xbzrle-bench.c from line 46 to line 
450.
We have submitted patch v7 to update this modification. Thanks for your time 
again.

Best Regards,
Ling
  

-Original Message-
From: Stefan Hajnoczi  
Sent: Wednesday, November 16, 2022 2:58 AM
To: Juan Quintela ; Xu, Ling1 ; Zhao, 
Zhou ; Jin, Jun I 
Cc: qemu-devel@nongnu.org; Michael Tokarev ; Marc-André Lureau 
; David Hildenbrand ; Laurent 
Vivier ; Paolo Bonzini ; Daniel P. 
Berrangé ; Peter Xu ; Stefan Hajnoczi 
; Dr. David Alan Gilbert ; Thomas 
Huth ; qemu-bl...@nongnu.org; qemu-triv...@nongnu.org; 
Philippe Mathieu-Daudé ; Fam Zheng 
Subject: Re: [PULL 00/30] Next patches

On Tue, 15 Nov 2022 at 10:40, Juan Quintela  wrote:
>
> The following changes since commit 98f10f0e2613ba1ac2ad3f57a5174014f6dcb03d:
>
>   Merge tag 'pull-target-arm-20221114' of 
> https://git.linaro.org/people/pmaydell/qemu-arm into staging 
> (2022-11-14 13:31:17 -0500)
>
> are available in the Git repository at:
>
>   https://gitlab.com/juan.quintela/qemu.git tags/next-pull-request
>
> for you to fetch changes up to d896a7a40db13fc2d05828c94ddda2747530089c:
>
>   migration: Block migration comment or code is wrong (2022-11-15 
> 10:31:06 +0100)
>
> 
> Migration PULL request (take 2)
>
> Hi
>
> This time properly signed.
>
> [take 1]
> It includes:
> - Leonardo fix for zero_copy flush
> - Fiona fix for return value of readv/writev
> - Peter Xu cleanups
> - Peter Xu preempt patches
> - Patches ready from zero page (me)
> - AVX2 support (ling)
> - fix for slow networking and reordering of first packets (manish)
>
> Please, apply.
>
> 
>
> Fiona Ebner (1):
>   migration/channel-block: fix return value for
> qio_channel_block_{readv,writev}
>
> Juan Quintela (5):
>   multifd: Create page_size fields into both MultiFD{Recv,Send}Params
>   multifd: Create page_count fields into both MultiFD{Recv,Send}Params
>   migration: Export ram_transferred_ram()
>   migration: Export ram_release_page()
>   migration: Block migration comment or code is wrong
>
> Leonardo Bras (1):
>   migration/multifd/zero-copy: Create helper function for flushing
>
> Peter Xu (20):
>   migration: Fix possible infinite loop of ram save process
>   migration: Fix race on qemu_file_shutdown()
>   migration: Disallow postcopy preempt to be used with compress
>   migration: Use non-atomic ops for clear log bitmap
>   migration: Disable multifd explicitly with compression
>   migration: Take bitmap mutex when completing ram migration
>   migration: Add postcopy_preempt_active()
>   migration: Cleanup xbzrle zero page cache update logic
>   migration: Trivial cleanup save_page_header() on same block check
>   migration: Remove RAMState.f references in compression code
>   migration: Yield bitmap_mutex properly when sending/sleeping
>   migration: Use atomic ops properly for page accountings
>   migration: Teach PSS about host page
>   migration: Introduce pss_channel
>   migration: Add pss_init()
>   migration: Make PageSearchStatus part of RAMState
>   migration: Move last_sent_block into PageSearchStatus
>   migration: Send requested page directly in rp-return thread
>   migration: Remove old preempt code around state maintainance
>   migration: Drop rs->f
>
> ling xu (2):
>   Update AVX512 support for xbzrle_encode_buffer
>   Unit test code and benchmark code

This commit causes the following CI failure:

cc -m64 -mcx16 -Ilibauthz.fa.p -I. -I.. -Iqapi -Itrace -Iui/shader
-I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include
-fdiagnostics-color=auto -Wall -Winvalid-pch -Werror -std=gnu11 -O2 -g -isystem 
/builds/qemu-project/qemu/linux-headers -isystem linux-headers -iquote . 
-iquote /builds/qemu-project/qemu -iquote /builds/qemu-project/qemu/include 
-iquote
/builds/qemu-project/qemu/tcg/i386 -pthread -U_FORTIFY_SOURCE
-D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE 
-Wstrict-prototypes -Wredundant-decls -Wundef -Wwrite-strings 
-Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv 
-Wold-style-declaration -Wold-style-definition -Wtype-limits -Wformat-security 
-Wformat-y2k -Winit-self -Wignored-qualifiers -Wempty-body -Wnested-externs 
-Wendif-labels -Wexpansion-to-defined
-Wimplicit-fallthrough=2 -Wno-missing-include-dirs -Wno-shift-negative-value 
-Wno-psabi -fstack-protector-strong -fPIE -MD -MQ 
libauthz.fa.p/authz_simple.c.o -MF libauthz.fa.p/authz_simple.c.o.d -o 
libauthz.fa.p/authz_sim

RE: [PATCH v6 1/2] Update AVX512 support for xbzrle_encode_buffer

2022-10-26 Thread Xu, Ling1
Hi, All,
 This is a "ping" email~. 
 It seems that the newest version of our patch has been ignored. So I 
"ping" this patchset again. 
 All comments and suggestions have been revised and updated in this V6 
version patch, and link for the patch is below:
 
https://lore.kernel.org/qemu-devel/20220826095719.2887535-2-ling1...@intel.com/
 Looking forward to hearing from you!

Best Regards
Ling

-Original Message-
From: Xu, Ling1  
Sent: Friday, August 26, 2022 5:57 PM
To: qemu-devel@nongnu.org
Cc: quint...@redhat.com; dgilb...@redhat.com; Xu, Ling1 ; 
Zhao, Zhou ; Jin, Jun I 
Subject: [PATCH v6 1/2] Update AVX512 support for xbzrle_encode_buffer

This commit updates code of avx512 support for xbzrle_encode_buffer function to 
accelerate xbzrle encoding speed. Runtime check of avx512 support and benchmark 
for this feature are added. Compared with C version of xbzrle_encode_buffer 
function, avx512 version can achieve 50%-70% performance improvement on 
benchmarking. In addition, if dirty data is randomly located in 4K page, the 
avx512 version can achieve almost 140% performance gain.

Signed-off-by: ling xu 
Co-authored-by: Zhou Zhao 
Co-authored-by: Jun Jin 
---
 meson.build|  16 ++
 meson_options.txt  |   2 +
 migration/ram.c|  34 +++--
 migration/xbzrle.c | 124 +
 migration/xbzrle.h |   4 ++
 5 files changed, 177 insertions(+), 3 deletions(-)

diff --git a/meson.build b/meson.build
index 20fddbd707..5d4b82d7f3 100644
--- a/meson.build
+++ b/meson.build
@@ -2264,6 +2264,22 @@ config_host_data.set('CONFIG_AVX512F_OPT', 
get_option('avx512f') \
 int main(int argc, char *argv[]) { return bar(argv[0]); }
   '''), error_message: 'AVX512F not available').allowed())
 
+config_host_data.set('CONFIG_AVX512BW_OPT', get_option('avx512bw') \
+  .require(have_cpuid_h, error_message: 'cpuid.h not available, cannot 
+enable AVX512BW') \
+  .require(cc.links('''
+#pragma GCC push_options
+#pragma GCC target("avx512bw")
+#include 
+#include 
+static int bar(void *a) {
+
+  __m512i *x = a;
+  __m512i res= _mm512_abs_epi8(*x);
+  return res[1];
+}
+int main(int argc, char *argv[]) { return bar(argv[0]); }  '''), 
+ error_message: 'AVX512BW not available').allowed())
+
 have_pvrdma = get_option('pvrdma') \
   .require(rdma.found(), error_message: 'PVRDMA requires OpenFabrics 
libraries') \
   .require(cc.compiles(gnu_source_prefix + '''
diff --git a/meson_options.txt b/meson_options.txt index e58e158396..07194bf680 
100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -104,6 +104,8 @@ option('avx2', type: 'feature', value: 'auto',
description: 'AVX2 optimizations')  option('avx512f', type: 'feature', 
value: 'disabled',
description: 'AVX512F optimizations')
+option('avx512bw', type: 'feature', value: 'auto',
+   description: 'AVX512BW optimizations')
 option('keyring', type: 'feature', value: 'auto',
description: 'Linux keyring support')
 
diff --git a/migration/ram.c b/migration/ram.c index dc1de9ddbc..ff4c15c9c3 
100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -83,6 +83,34 @@
 /* 0x80 is reserved in migration.h start with 0x100 next */
 #define RAM_SAVE_FLAG_COMPRESS_PAGE0x100
 
+int (*xbzrle_encode_buffer_func)(uint8_t *, uint8_t *, int,
+ uint8_t *, int) = xbzrle_encode_buffer; #if 
+defined(CONFIG_AVX512BW_OPT) #include "qemu/cpuid.h"
+static void __attribute__((constructor)) init_cpu_flag(void) {
+unsigned max = __get_cpuid_max(0, NULL);
+int a, b, c, d;
+if (max >= 1) {
+__cpuid(1, a, b, c, d);
+ /* We must check that AVX is not just available, but usable.  */
+if ((c & bit_OSXSAVE) && (c & bit_AVX) && max >= 7) {
+int bv;
+__asm("xgetbv" : "=a"(bv), "=d"(d) : "c"(0));
+__cpuid_count(7, 0, a, b, c, d);
+   /* 0xe6:
+*  XCR0[7:5] = 111b (OPMASK state, upper 256-bit of ZMM0-ZMM15
+*and ZMM16-ZMM31 state are enabled by OS)
+*  XCR0[2:1] = 11b (XMM state and YMM state are enabled by OS)
+*/
+if ((bv & 0xe6) == 0xe6 && (b & bit_AVX512BW)) {
+xbzrle_encode_buffer_func = xbzrle_encode_buffer_avx512;
+}
+}
+}
+}
+#endif
+
 XBZRLECacheStats xbzrle_counters;
 
 /* struct contains XBZRLE cache and a static page @@ -802,9 +830,9 @@ static 
int save_xbzrle_page(RAMState *rs, uint8_t **current_data,
 memcpy(XBZRLE.current_buf, *current_data, TARGET_PAGE_SIZE);
 
 /* XBZRLE encoding (if there is no overflow) */
-encoded_len = xbzrle_encode_buffer(prev_cached_page, XBZRLE.current_buf,
-   TARGET_PAGE_SIZE, XBZRLE.encoded_buf,
-  

RE: [PATCH v6 1/2] Update AVX512 support for xbzrle_encode_buffer

2022-09-19 Thread Xu, Ling1
Hi, All,
 This is a "ping" email~. 
 It seems that my patch has been ignored. So I "ping" this patchset. 
 Link for the patch: 
https://lore.kernel.org/qemu-devel/20220826095719.2887535-2-ling1...@intel.com/

Best Regards
Ling

-Original Message-
From: Xu, Ling1  
Sent: Friday, August 26, 2022 5:57 PM
To: qemu-devel@nongnu.org
Cc: quint...@redhat.com; dgilb...@redhat.com; Xu, Ling1 ; 
Zhao, Zhou ; Jin, Jun I 
Subject: [PATCH v6 1/2] Update AVX512 support for xbzrle_encode_buffer

This commit updates code of avx512 support for xbzrle_encode_buffer function to 
accelerate xbzrle encoding speed. Runtime check of avx512 support and benchmark 
for this feature are added. Compared with C version of xbzrle_encode_buffer 
function, avx512 version can achieve 50%-70% performance improvement on 
benchmarking. In addition, if dirty data is randomly located in 4K page, the 
avx512 version can achieve almost 140% performance gain.

Signed-off-by: ling xu 
Co-authored-by: Zhou Zhao 
Co-authored-by: Jun Jin 
---
 meson.build|  16 ++
 meson_options.txt  |   2 +
 migration/ram.c|  34 +++--
 migration/xbzrle.c | 124 +
 migration/xbzrle.h |   4 ++
 5 files changed, 177 insertions(+), 3 deletions(-)

diff --git a/meson.build b/meson.build
index 20fddbd707..5d4b82d7f3 100644
--- a/meson.build
+++ b/meson.build
@@ -2264,6 +2264,22 @@ config_host_data.set('CONFIG_AVX512F_OPT', 
get_option('avx512f') \
 int main(int argc, char *argv[]) { return bar(argv[0]); }
   '''), error_message: 'AVX512F not available').allowed())
 
+config_host_data.set('CONFIG_AVX512BW_OPT', get_option('avx512bw') \
+  .require(have_cpuid_h, error_message: 'cpuid.h not available, cannot 
+enable AVX512BW') \
+  .require(cc.links('''
+#pragma GCC push_options
+#pragma GCC target("avx512bw")
+#include 
+#include 
+static int bar(void *a) {
+
+  __m512i *x = a;
+  __m512i res= _mm512_abs_epi8(*x);
+  return res[1];
+}
+int main(int argc, char *argv[]) { return bar(argv[0]); }  '''), 
+ error_message: 'AVX512BW not available').allowed())
+
 have_pvrdma = get_option('pvrdma') \
   .require(rdma.found(), error_message: 'PVRDMA requires OpenFabrics 
libraries') \
   .require(cc.compiles(gnu_source_prefix + '''
diff --git a/meson_options.txt b/meson_options.txt index e58e158396..07194bf680 
100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -104,6 +104,8 @@ option('avx2', type: 'feature', value: 'auto',
description: 'AVX2 optimizations')  option('avx512f', type: 'feature', 
value: 'disabled',
description: 'AVX512F optimizations')
+option('avx512bw', type: 'feature', value: 'auto',
+   description: 'AVX512BW optimizations')
 option('keyring', type: 'feature', value: 'auto',
description: 'Linux keyring support')
 
diff --git a/migration/ram.c b/migration/ram.c index dc1de9ddbc..ff4c15c9c3 
100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -83,6 +83,34 @@
 /* 0x80 is reserved in migration.h start with 0x100 next */
 #define RAM_SAVE_FLAG_COMPRESS_PAGE0x100
 
+int (*xbzrle_encode_buffer_func)(uint8_t *, uint8_t *, int,
+ uint8_t *, int) = xbzrle_encode_buffer; #if 
+defined(CONFIG_AVX512BW_OPT) #include "qemu/cpuid.h"
+static void __attribute__((constructor)) init_cpu_flag(void) {
+unsigned max = __get_cpuid_max(0, NULL);
+int a, b, c, d;
+if (max >= 1) {
+__cpuid(1, a, b, c, d);
+ /* We must check that AVX is not just available, but usable.  */
+if ((c & bit_OSXSAVE) && (c & bit_AVX) && max >= 7) {
+int bv;
+__asm("xgetbv" : "=a"(bv), "=d"(d) : "c"(0));
+__cpuid_count(7, 0, a, b, c, d);
+   /* 0xe6:
+*  XCR0[7:5] = 111b (OPMASK state, upper 256-bit of ZMM0-ZMM15
+*and ZMM16-ZMM31 state are enabled by OS)
+*  XCR0[2:1] = 11b (XMM state and YMM state are enabled by OS)
+*/
+if ((bv & 0xe6) == 0xe6 && (b & bit_AVX512BW)) {
+xbzrle_encode_buffer_func = xbzrle_encode_buffer_avx512;
+}
+}
+}
+}
+#endif
+
 XBZRLECacheStats xbzrle_counters;
 
 /* struct contains XBZRLE cache and a static page @@ -802,9 +830,9 @@ static 
int save_xbzrle_page(RAMState *rs, uint8_t **current_data,
 memcpy(XBZRLE.current_buf, *current_data, TARGET_PAGE_SIZE);
 
 /* XBZRLE encoding (if there is no overflow) */
-encoded_len = xbzrle_encode_buffer(prev_cached_page, XBZRLE.current_buf,
-   TARGET_PAGE_SIZE, XBZRLE.encoded_buf,
-   TARGET_PAGE_SIZE);
+encoded_len = xbzrle_encode_buffer_func(prev_cached_page, 
XBZRLE.current_buf,
+TARGET_PAGE_SIZE, 
XBZRLE.encoded_buf,
+

RE: [PATCH v5 1/2] Update AVX512 support for xbzrle_encode_buffer

2022-08-26 Thread Xu, Ling1
Hi, juan, 
  Thanks for your time and suggestions on this patch. We have revised our 
code according to your nice comments. We will submit patch v6 to update these 
modifications.

Best Regards
Ling

-Original Message-
From: Juan Quintela  
Sent: Wednesday, August 24, 2022 4:42 PM
To: Xu, Ling1 
Cc: qemu-devel@nongnu.org; dgilb...@redhat.com; Zhao, Zhou 
; Jin, Jun I 
Subject: Re: [PATCH v5 1/2] Update AVX512 support for xbzrle_encode_buffer

ling xu  wrote:
> This commit updates code of avx512 support for xbzrle_encode_buffer 
> function to accelerate xbzrle encoding speed. We add runtime check of 
> avx512 and add benchmark for this feature. Compared with C version of 
> xbzrle_encode_buffer function, avx512 version can achieve 50%-70% 
> performance improvement on benchmarking. In addition, if dirty data is 
> randomly located in 4K page, the avx512 version can achieve almost 
> 140% performance gain.
>
> Signed-off-by: ling xu 
> Co-authored-by: Zhou Zhao 
> Co-authored-by: Jun Jin 
> ---
>  meson.build|  16 ++
>  meson_options.txt  |   2 +
>  migration/ram.c|  35 ++--
>  migration/xbzrle.c | 130 +
>  migration/xbzrle.h |   4 ++
>  5 files changed, 184 insertions(+), 3 deletions(-)
>
> diff --git a/meson.build b/meson.build index 30a380752c..c9d90a5bff 
> 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -2264,6 +2264,22 @@ config_host_data.set('CONFIG_AVX512F_OPT', 
> get_option('avx512f') \
>  int main(int argc, char *argv[]) { return bar(argv[0]); }
>'''), error_message: 'AVX512F not available').allowed())
>  
> +config_host_data.set('CONFIG_AVX512BW_OPT', get_option('avx512bw') \
> +  .require(have_cpuid_h, error_message: 'cpuid.h not available, 
> +cannot enable AVX512BW') \
> +  .require(cc.links('''
> +#pragma GCC push_options
> +#pragma GCC target("avx512bw")
> +#include 
> +#include 
> +static int bar(void *a) {


> +  __m512i x = *(__m512i *)a;
> +  __m512i res= _mm512_abs_epi8(x);

Cast is as ugly as hell, what about:

  __m512i *x = a;
  __m512i res = _mm512_abs_epi8(*x);

??

> +static void __attribute__((constructor)) init_cpu_flag(void) {
> +unsigned max = __get_cpuid_max(0, NULL);
> +int a, b, c, d;
> +if (max >= 1) {
> +__cpuid(1, a, b, c, d);
> + /* We must check that AVX is not just available, but usable.  */
> +if ((c & bit_OSXSAVE) && (c & bit_AVX) && max >= 7) {
> +int bv;
> +__asm("xgetbv" : "=a"(bv), "=d"(d) : "c"(0));
> +__cpuid_count(7, 0, a, b, c, d);
> +   /* 0xe6:
> +*  XCR0[7:5] = 111b (OPMASK state, upper 256-bit of ZMM0-ZMM15
> +*and ZMM16-ZMM31 state are enabled by OS)
> +*  XCR0[2:1] = 11b (XMM state and YMM state are enabled by OS)
> +*/
> +if ((bv & 0xe6) == 0xe6 && (b & bit_AVX512BW)) {
> +xbzrle_encode_buffer_func = xbzrle_encode_buffer_avx512;
> +}
> +}
> +}
> +return ;

This return line is not needed.

> +}
> +#endif
> +
>  XBZRLECacheStats xbzrle_counters;
>  
>  /* struct contains XBZRLE cache and a static page @@ -802,9 +831,9 @@ 
> static int save_xbzrle_page(RAMState *rs, uint8_t **current_data,
>  memcpy(XBZRLE.current_buf, *current_data, TARGET_PAGE_SIZE);
>  
>  /* XBZRLE encoding (if there is no overflow) */
> -encoded_len = xbzrle_encode_buffer(prev_cached_page, XBZRLE.current_buf,
> -   TARGET_PAGE_SIZE, XBZRLE.encoded_buf,
> -   TARGET_PAGE_SIZE);
> +encoded_len = xbzrle_encode_buffer_func(prev_cached_page, 
> XBZRLE.current_buf,
> +TARGET_PAGE_SIZE, 
> XBZRLE.encoded_buf,
> +TARGET_PAGE_SIZE);
>  
>  /*
>   * Update the cache contents, so that it corresponds to the data 
> diff --git a/migration/xbzrle.c b/migration/xbzrle.c index 
> 1ba482ded9..6da7f79625 100644
> --- a/migration/xbzrle.c
> +++ b/migration/xbzrle.c
> @@ -174,3 +174,133 @@ int xbzrle_decode_buffer(uint8_t *src, int slen, 
> uint8_t *dst, int dlen)
>  
>  return d;
>  }
> +
> +#if defined(CONFIG_AVX512BW_OPT)
> +#pragma GCC push_options
> +#pragma GCC target("avx512bw")
> +#include 
> +int xbzrle_encode_buffer_avx512(uint8_t *old_buf, uint8_t *new_buf, int slen,
> + uint8_t *dst, int dlen) {
> +uint32_t zrun_len = 0, nzrun_len = 0;
> +int d = 0,

RE: [PATCH v3 1/2] Update AVX512 support for xbzrle_encode_buffer function

2022-08-11 Thread Xu, Ling1
Hi, Richard,
  Thanks for your nice comments! Your suggestions are very helpful. We have 
revised code in ram.c according to your comments. As for "unroll residual from 
main loop" problem in algorithm, we will fix this later. Thanks for your time 
and patience~

Best Regards,
Ling

-Original Message-
From: Richard Henderson  
Sent: Wednesday, August 10, 2022 2:25 AM
To: Xu, Ling1 ; quint...@redhat.com
Cc: qemu-devel@nongnu.org; dgilb...@redhat.com; Zhao, Zhou 
; Jin, Jun I 
Subject: Re: [PATCH v3 1/2] Update AVX512 support for xbzrle_encode_buffer 
function

On 8/9/22 00:51, Xu, Ling1 wrote:
> Hi, Juan,
>Thanks for your advice. We have revised our code including: 1) change 
> "IS_CPU_SUPPORT_AVX512BW" to "is_cpu_support_avx512bw" to indicate that 
> variable isn't global variable;

You can remove this variable entirely...

> 2) use a function pointer to simplify code in ram.c;

... because it's redundant with the function pointer.


r~


RE: [PATCH v3 1/2] Update AVX512 support for xbzrle_encode_buffer function

2022-08-09 Thread Xu, Ling1
Hi, Juan, 
  Thanks for your advice. We have revised our code including: 1) change 
"IS_CPU_SUPPORT_AVX512BW" to "is_cpu_support_avx512bw" to indicate that 
variable isn't global variable; 2) use a function pointer to simplify code in 
ram.c; 3) change function name "xbzrle_encode_buffer_512" to 
"xbzrle_encode_buffer_avx512", change variable "res" to "countResidual" for 
better understanding, and replace "unsigned long long" with "uint64_t". 
   We will submit patch v4 to fix all issues mentioned in comments. 

Best Regard,
Ling

-Original Message-
From: Juan Quintela  
Sent: Monday, August 8, 2022 9:12 PM
To: Xu, Ling1 
Cc: qemu-devel@nongnu.org; dgilb...@redhat.com; Zhao, Zhou 
; Jin, Jun I 
Subject: Re: [PATCH v3 1/2] Update AVX512 support for xbzrle_encode_buffer 
function

ling xu  wrote:
> This commit update runtime check of AVX512, and implements avx512 of 
> xbzrle_encode_buffer function to accelerate xbzrle encoding speed.
> Compared with C version of xbzrle_encode_buffer function, avx512 
> version can achieve almost 60%-70% performance improvement on unit 
> test provided by Qemu. In addition, we provide one more unit test 
> called "test_encode_decode_random", in which dirty data are randomly 
> located in 4K page, and this case can achieve almost 140% performance gain.
>
> Signed-off-by: ling xu 
> Co-authored-by: Zhou Zhao 
> Co-authored-by: Jun Jin 
> ---
>  meson.build|  16 
>  meson_options.txt  |   2 +
>  migration/ram.c|  41 ++
>  migration/xbzrle.c | 181 +
>  migration/xbzrle.h |   4 +
>  5 files changed, 244 insertions(+)
>
> diff --git a/meson.build b/meson.build index 294e9a8f32..4222b77e9f 
> 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -2262,6 +2262,22 @@ config_host_data.set('CONFIG_AVX512F_OPT', 
> get_option('avx512f') \
>  int main(int argc, char *argv[]) { return bar(argv[0]); }
>'''), error_message: 'AVX512F not available').allowed())
>  
> +config_host_data.set('CONFIG_AVX512BW_OPT', get_option('avx512bw') \
> +  .require(have_cpuid_h, error_message: 'cpuid.h not available, 
> +cannot enable AVX512BW') \
> +  .require(cc.links('''
> +#pragma GCC push_options
> +#pragma GCC target("avx512bw")
> +#include 
> +#include 
> +static int bar(void *a) {
> +
> +  __m512i x = *(__m512i *)a;
> +  __m512i res= _mm512_abs_epi8(x);
> +  return res[1];
> +}
> +int main(int argc, char *argv[]) { return bar(argv[0]); }  '''), 
> + error_message: 'AVX512BW not available').allowed())
> +
>  have_pvrdma = get_option('pvrdma') \
>.require(rdma.found(), error_message: 'PVRDMA requires OpenFabrics 
> libraries') \
>.require(cc.compiles(gnu_source_prefix + '''
> diff --git a/meson_options.txt b/meson_options.txt index 
> e58e158396..07194bf680 100644
> --- a/meson_options.txt
> +++ b/meson_options.txt
> @@ -104,6 +104,8 @@ option('avx2', type: 'feature', value: 'auto',
> description: 'AVX2 optimizations')  option('avx512f', type: 
> 'feature', value: 'disabled',
> description: 'AVX512F optimizations')
> +option('avx512bw', type: 'feature', value: 'auto',
> +   description: 'AVX512BW optimizations')
>  option('keyring', type: 'feature', value: 'auto',
> description: 'Linux keyring support')
>  

[no clue about meson, it looks ok]

> diff --git a/migration/ram.c b/migration/ram.c index 
> dc1de9ddbc..d9c1ac2f7a 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -83,6 +83,35 @@
>  /* 0x80 is reserved in migration.h start with 0x100 next */
>  #define RAM_SAVE_FLAG_COMPRESS_PAGE0x100
>  
> +#if defined(CONFIG_AVX512BW_OPT)
> +static bool IS_CPU_SUPPORT_AVX512BW;

An all caps global variable?

> +#include "qemu/cpuid.h"
> +static void __attribute__((constructor)) init_cpu_flag(void) {
> +unsigned max = __get_cpuid_max(0, NULL);
> +int a, b, c, d;
> +IS_CPU_SUPPORT_AVX512BW = false;
> +if (max >= 1) {
> +__cpuid(1, a, b, c, d);
> + /* We must check that AVX is not just available, but usable.  */
> +if ((c & bit_OSXSAVE) && (c & bit_AVX) && max >= 7) {
> +int bv;
> +__asm("xgetbv" : "=a"(bv), "=d"(d) : "c"(0));
> +__cpuid_count(7, 0, a, b, c, d);
> +   /* 0xe6:
> +*  XCR0[7:5] = 111b (OPMASK state, upper 256-bit of ZMM0-ZMM15
> +*and ZMM16-ZMM31 state are enabled by OS)
> +*  XCR0[2:1] = 11b (XMM state and YMM state are enabled b

RE: [PATCH v3 0/2] This patch updates runtime check of AVX512

2022-08-08 Thread Xu, Ling1
Hi, Juan, 
 You are right, this v3 and previous v3 are identical except the link to 
previous discussion. The previous [patch v3 0/2] was sent failed as shown in my 
mail, so I resend this patch. Sorry for the ambiguity of resending same patch, 
and thanks for your time ~

Best Regards
Ling

-Original Message-
From: Juan Quintela  
Sent: Monday, August 8, 2022 7:54 PM
To: Xu, Ling1 
Cc: qemu-devel@nongnu.org; dgilb...@redhat.com
Subject: Re: [PATCH v3 0/2] This patch updates runtime check of AVX512

ling xu  wrote:
> This patch updates runtime check of AVX512 and update avx512 support 
> for xbzrle_encode_buffer function to accelerate xbzrle encoding speed.
>
> The runtime check is updated in meson.build and meson_options.txt.
>
> The updated AVX512 algorithm is provided in ram.c, xbzrle.c and 
> xbzrle.h.
>
> The test code is provided in test-xbzrle.c.
>
> Previous discussion is refered below:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg903520.html
>
> ling xu (2):
>   Update AVX512 support for xbzrle_encode_buffer function
>   Test code for AVX512 support for xbzrle_encode_buffer

I think this v3 and previous v3 are identical except for mthe link to the 
previous discussion.

Later, Juan.




RE: [PATCH v3 2/2] Test code for AVX512 support for xbzrle_encode_buffer

2022-08-08 Thread Xu, Ling1
Hi, Thomas,
  Thanks for your reply. This test code can only work on system supporting 
avx512. It's reasonably to add condition check in test code to, agree to your 
suggestion. I'll add condition check in test code later. 

Best Regards
Ling

-Original Message-
From: Thomas Huth  
Sent: Monday, August 8, 2022 4:09 PM
To: Xu, Ling1 ; qemu-devel@nongnu.org
Cc: quint...@redhat.com; dgilb...@redhat.com; Zhao, Zhou ; 
Jin, Jun I 
Subject: Re: [PATCH v3 2/2] Test code for AVX512 support for 
xbzrle_encode_buffer

On 08/08/2022 09.48, ling xu wrote:
> Signed-off-by: ling xu 
> Co-authored-by: Zhou Zhao 
> Co-authored-by: Jun Jin 
> ---
>   tests/unit/test-xbzrle.c | 307 ---
>   1 file changed, 290 insertions(+), 17 deletions(-)
> 
> diff --git a/tests/unit/test-xbzrle.c b/tests/unit/test-xbzrle.c index 
> ef951b6e54..653016826f 100644
> --- a/tests/unit/test-xbzrle.c
> +++ b/tests/unit/test-xbzrle.c
> @@ -38,111 +38,280 @@ static void test_uleb(void)
>   g_assert(val == 0);
>   }
>   
> -static void test_encode_decode_zero(void)
> +static float *test_encode_decode_zero(void)
>   {
>   uint8_t *buffer = g_malloc0(XBZRLE_PAGE_SIZE);
>   uint8_t *compressed = g_malloc0(XBZRLE_PAGE_SIZE);
> +uint8_t *buffer512 = g_malloc0(XBZRLE_PAGE_SIZE);
> +uint8_t *compressed512 = g_malloc0(XBZRLE_PAGE_SIZE);
>   int i = 0;
> -int dlen = 0;
> +int dlen = 0, dlen512 = 0;
>   int diff_len = g_test_rand_int_range(0, XBZRLE_PAGE_SIZE - 
> 1006);
>   
>   for (i = diff_len; i > 0; i--) {
>   buffer[1000 + i] = i;
> +buffer512[1000 + i] = i;
>   }
>   
>   buffer[1000 + diff_len + 3] = 103;
>   buffer[1000 + diff_len + 5] = 105;
>   
> +buffer512[1000 + diff_len + 3] = 103;
> +buffer512[1000 + diff_len + 5] = 105;
> +
>   /* encode zero page */
> +time_t t_start, t_end, t_start512, t_end512;
> +t_start = clock();
>   dlen = xbzrle_encode_buffer(buffer, buffer, XBZRLE_PAGE_SIZE, 
> compressed,
>  XBZRLE_PAGE_SIZE);
> +t_end = clock();
> +float time_val = difftime(t_end, t_start);
>   g_assert(dlen == 0);
>   
> +t_start512 = clock();
> +dlen512 = xbzrle_encode_buffer_512(buffer512, buffer512, 
> XBZRLE_PAGE_SIZE,
> +   compressed512, 
> + XBZRLE_PAGE_SIZE);

Does this also still work on systems without AVX? If I've got patch 1/2 right, 
this function is only defined if CONFIG_AVX512BW_OPT has been set, so using it 
unconditionally here seems to be wrong?

  Thomas