RE: [RFC PATCH 0/5] hw/arm/virt: Introduce cpu topology support

2021-03-10 Thread fangying


-----Original Message-----
From: Andrew Jones [mailto:drjo...@redhat.com]
Sent: 10 March 2021 17:21
To: fangying
Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Introduce cpu topology support


> Hi Ying Fang,
> 
> Do you plan to repost this soon? It'd be great if it got into 6.0.
>
> Thanks,
> drew


Hi Andrew,
Thanks for the reminder.
Yes, I will repost and update this series soon.
It seems 6.0 will hit soft feature freeze on 3.16, so the deadline is close.

On Thu, Feb 25, 2021 at 04:56:22PM +0800, Ying Fang wrote:
> An accurate cpu topology may help improve the cpu scheduler's decision
> making when dealing with multi-core systems, so a cpu topology
> description is helpful to provide the guest with the right view. Dario
> Faggioli's talk in [0] also shows that the virtual topology may have an
> impact on scheduling performance.
> Thus this patch series is posted to introduce cpu topology support for
> the arm platform.
> 
> Both fdt and ACPI are introduced to present the cpu topology. To
> describe the cpu topology via ACPI, a PPTT table is introduced
> according to the processor hierarchy node structure. This series is
> derived from [1]; in [1] we were trying to bring both cpu and cache
> topology support to the arm platform, but there are still some issues
> to solve to support the cache hierarchy. So we split the cpu topology
> part out and send it separately.
> The patch series to support the cache hierarchy will be sent later, since
> Salil Mehta's cpu hotplug feature needs the cpu topology enabled first
> and he is waiting for it to be upstreamed.
> 
> This patch series was initially based on the patches posted by Andrew Jones
> [2].
> I jumped in on it since some OS vendor partners are eager for it.
> Thanks for Andrew's contribution.
> 
> After applying this patch series, launch a guest with virt-6.0 and the cpu
> topology configured with sockets:cores:threads = 2:4:2, and you will get
> the output below from the lscpu command.
> 
> -
> Architecture:        aarch64
> CPU op-mode(s):      64-bit
> Byte Order:          Little Endian
> CPU(s):              16
> On-line CPU(s) list: 0-15
> Thread(s) per core:  2
> Core(s) per socket:  4
> Socket(s):           2
> NUMA node(s):        2
> Vendor ID:           HiSilicon
> Model:               0
> Model name:          Kunpeng-920
> Stepping:            0x1
> BogoMIPS:            200.00
> NUMA node0 CPU(s):   0-7
> NUMA node1 CPU(s):   8-15
> 
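(For reference, a topology like the one above would be requested with an -smp
configuration along these lines; this is only a sketch, and options other than
-machine and -smp are omitted or assumed:

  qemu-system-aarch64 -machine virt-6.0 -enable-kvm -cpu host \
      -smp 16,sockets=2,cores=4,threads=2 ...)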
> [0] 
> https://kvmforum2020.sched.com/event/eE1y/virtual-topology-for-virtual
> -machines-friend-or-foe-dario-faggioli-suse
> [1] 
> https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg02166.html
> [2] 
> https://patchwork.ozlabs.org/project/qemu-devel/cover/20180704124923.3
> 2483-1-drjo...@redhat.com
> 
> Ying Fang (5):
>   device_tree: Add qemu_fdt_add_path
>   hw/arm/virt: Add cpu-map to device tree
>   hw/arm/virt-acpi-build: distinguish possible and present cpus
>   hw/acpi/aml-build: add processor hierarchy node structure
>   hw/arm/virt-acpi-build: add PPTT table
> 
>  hw/acpi/aml-build.c  | 40 ++
>  hw/arm/virt-acpi-build.c | 64 +---
>  hw/arm/virt.c| 40 +-
>  include/hw/acpi/acpi-defs.h  | 13   
> include/hw/acpi/aml-build.h  |  7 
>  include/hw/arm/virt.h|  1 +
>  include/sysemu/device_tree.h |  1 +
>  softmmu/device_tree.c| 45 +++--
>  8 files changed, 204 insertions(+), 7 deletions(-)
> 
> --
> 2.23.0
> 



Re: [PATCH v2 0/4] arm64: Add the cpufreq device to show cpufreq info to guest

2020-02-13 Thread fangying




On 2020/2/13 16:18, Andrew Jones wrote:

On Thu, Feb 13, 2020 at 03:36:26PM +0800, Ying Fang wrote:

On ARM64 platform, cpu frequency is retrieved via ACPI CPPC.
A virtual cpufreq device based on ACPI CPPC is created to
present cpu frequency info to the guest.

The default frequency is set to host cpu nominal frequency,
which is obtained from the host CPPC sysfs. Other performance
data are set to the same value, since we don't support guest
performance scaling here.

Performance counters are also not emulated; they simply
return 1 if read, and the guest should fall back to using the desired
performance value as the current performance.

Guest kernel version above 4.18 is required to make it work.



This is v2 of the series, but I don't see a changelog.


Hi Andrew, please forgive my carelessness, I forgot to add the changelog here.
Actually v2 is only slightly modified, with a few fixes for compilation
warnings and a commit message.


Can you please describe the motivation for this? If I understand
correctly, all of this is just to inform the guest of the host's
CPU0 nominal or max (if nominal isn't present?) frequency. Why
do that? 


Yes, you are right.

The motivation is that there seems to be no formal way other than the CPPC
feature on arm64 to present cpu frequency info to the OS, whereas on the
x86 platform we have the CPUID method for that. Some of our VM customers
and cloud developers want that information, so we came up with the
virtual cpufreq device approach.
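To illustrate where that host-side default comes from, here is a minimal
sketch that reads the nominal frequency from the standard Linux acpi_cppc
sysfs node; the path is the usual kernel one, error handling is minimal, and
this is not the actual QEMU code:

#include <stdio.h>

/* Read the host's nominal CPU frequency from the ACPI CPPC sysfs interface.
 * Returns the value on success (typically in MHz), or -1 on failure. */
static long read_host_nominal_freq(void)
{
    long freq = -1;
    FILE *f = fopen("/sys/devices/system/cpu/cpu0/acpi_cppc/nominal_freq", "r");

    if (f) {
        if (fscanf(f, "%ld", &freq) != 1) {
            freq = -1;
        }
        fclose(f);
    }
    return freq;
}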


What happens if the guest migrates somewhere where the
frequency is different? 


If the guest is migrated to any host where the frequency is different,
the next read of the CPPC-related register may return the new cpufreq info.


If this is for a special use case, then
why not come up with a different channel (guest agent?) to pass
this information?


Maybe some userspace apps need it to do perf tuning, or they
want to know the accurate cpu nominal frequency using tools
like lshw.

We use the CPPC cpufreq approach because we think it is a much more
standard way to do that.


Thanks,
drew


.



Thanks,
Ying




Re: [Qemu-devel] [PATCH] qmp: Fix memory leak in migrate_params_test_apply

2019-09-04 Thread fangying




On 2019/9/4 0:46, Dr. David Alan Gilbert wrote:

* Ying Fang (fangyi...@huawei.com) wrote:

Address Sanitizer shows memory leak in migrate_params_test_apply
migration/migration.c:1253 and the stack is as below:

Direct leak of 45 byte(s) in 9 object(s) allocated from:
 #0 0xbd7fc1db in __interceptor_malloc (/lib64/libasan.so.4+0xd31db)
 #1 0xbd514163 in g_malloc (/lib64/libglib-2.0.so.0+0x57163)
 #2 0xbd52f43b in g_strdup (/lib64/libglib-2.0.so.0+0x7243b)
 #3 0xdfa4d623 in migrate_params_test_apply migration/migration.c:1253
 #4 0xdfa4d623 in qmp_migrate_set_parameters migration/migration.c:1422
 #5 0xdfa963f3 in hmp_migrate_set_parameter monitor/hmp-cmds.c:1867
 #6 0xdfa8afe3 in handle_hmp_command monitor/hmp.c:1082
 #7 0xdf479c57 in qmp_human_monitor_command monitor/misc.c:140
 #8 0xdfadf87b in qmp_marshal_human_monitor_command 
qapi/qapi-commands-misc.c:1024
 #9 0xdfc7797b in do_qmp_dispatch qapi/qmp-dispatch.c:131
 #10 0xdfc7797b in qmp_dispatch qapi/qmp-dispatch.c:174
 #11 0xdfa84fff in monitor_qmp_dispatch monitor/qmp.c:120
 #12 0xdfa85bbf in monitor_qmp_bh_dispatcher monitor/qmp.c:209
 #13 0xdfd2228f in aio_bh_call util/async.c:89
 #14 0xdfd2228f in aio_bh_poll util/async.c:117
 #15 0xdfd29bc3 in aio_dispatch util/aio-posix.c:459
 #16 0xdfd21ff7 in aio_ctx_dispatch util/async.c:260
 #17 0xbd50e2f7 in g_main_context_dispatch 
(/lib64/libglib-2.0.so.0+0x512f7)
 #18 0xdfd278d7 in glib_pollfds_poll util/main-loop.c:218
 #19 0xdfd278d7 in os_host_main_loop_wait util/main-loop.c:241
 #20 0xdfd278d7 in main_loop_wait util/main-loop.c:517
 #21 0xdf67b5e7 in main_loop vl.c:1806
 #22 0xdf15d453 in main vl.c:4488

Cc: zhanghailiang 
Signed-off-by: Ying Fang 
---
  migration/migration.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 8b9f2fe30a..05e44ff7cc 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1250,11 +1250,17 @@ static void 
migrate_params_test_apply(MigrateSetParameters *params,
  
  if (params->has_tls_creds) {

  assert(params->tls_creds->type == QTYPE_QSTRING);
+if (dest->tls_creds) {
+g_free(dest->tls_creds);
+}
  dest->tls_creds = g_strdup(params->tls_creds->u.s);
  }
  
  if (params->has_tls_hostname) {

  assert(params->tls_hostname->type == QTYPE_QSTRING);
+if (dest->tls_hostname) {
+g_free(dest->tls_hostname);
+}
  dest->tls_hostname = g_strdup(params->tls_hostname->u.s);
  }


Thanks for reporting the leak, but I don't think this is the right fix:

In the call chain we have, qmp_migrate_set_parameters calls:

 migrate_params_test_apply(params, &tmp);

tmp is a stack allocated variable  that becomes the 'dest'
we see here.  Then at the top of migrate_params_test_apply
we have:

 *dest = migrate_get_current()->parameters;

so that's probably bad; that's a shallow copy, so dest->tls_authz
points to the same storage as the real current migration parameters.

when the code does:
 if (params->has_tls_creds) {
 assert(params->tls_creds->type == QTYPE_QSTRING);
 dest->tls_creds = g_strdup(params->tls_creds->u.s);
 }


Yes, you are right, this patch will not fix this issue.
'tmp' is just a copy of 'dest' here; it's used to do a parameter sanity check.
We should either free tmp.tls_creds/tmp.tls_hostname after migrate_params_check
or change migrate_params_test_apply.


it's only changing the pointer in the 'tmp', not the main copy,
because if migrate_params_check fails then the parameters are left entirely
unchanged.  So if you do a free on dest->tls_hostname you end up
freeing the real parameter that's still getting used, not the tmp.

So I think we need to:
   a) change migrate_params_test_apply so that it returns a
MigrationParameters * rather than taking &tmp
   b) Make migrate_params_test use QAPI_CLONE to clone instead of doing:
  *dest = migrate_get_current()->parameters;
   c) Then do a qapi_free_MigrateParameters in qmp_migrate_set_parameters
 on both the true and false paths.

Does that make sense?

Dave
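
A minimal sketch of option (b) above, assuming QEMU's qapi/clone-visitor.h
helpers and the generated MigrationParameters type; this is not the actual
fix, just the shape of it:

#include "qapi/clone-visitor.h"
#include "qapi/qapi-types-migration.h"

/* Deep-copy the current parameters instead of doing a shallow struct
 * assignment, so the string members (tls_creds, tls_hostname, ...) are
 * owned by the temporary copy and can be freed or replaced safely. */
static MigrationParameters *migrate_params_test_apply(MigrateSetParameters *params)
{
    MigrationParameters *dest =
        QAPI_CLONE(MigrationParameters, &migrate_get_current()->parameters);

    if (params->has_tls_creds) {
        assert(params->tls_creds->type == QTYPE_QSTRING);
        g_free(dest->tls_creds);
        dest->tls_creds = g_strdup(params->tls_creds->u.s);
    }
    /* ... other parameters ... */

    return dest;   /* caller releases it with qapi_free_MigrationParameters() */
}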


  
--

2.19.1



--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

.






Re: [Qemu-devel] Discussion: vnc: memory leak in zrle_compress_data

2019-09-01 Thread fangying
Nice work, your patch does fix this issue in my test.

I think we should make VncState.zlib a pointer type as well.

Since we are going to use pointers instead of a copy, we must make sure that
there is no race condition on the pointer members between the local vs (vnc
worker thread) and the original vs (main thread).


Thanks.
Ying Fang
From: Li Qiang <liq...@gmail.com>
To: fangying <fangyi...@huawei.com>
Cc: Gerd Hoffmann <kra...@redhat.com>; qemu-devel <qemu-devel@nongnu.org>;
 Daniel P. Berrange <berra...@redhat.com>; zhouyibo <zhouyi...@huawei.com>
Subject: Re: [Qemu-devel] Discussion: vnc: memory leak in zrle_compress_data
Date: 2019-08-31 23:48:10



fangying <fangyi...@huawei.com> wrote on Saturday, Aug 31, 2019, 8:45 AM:
Hi Gerd,

Memory leak is observed in zrle_compress_data when we are doing some
AddressSanitizer tests. The leak stack is as below:

=
==47887==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 29760 byte(s) in 5 object(s) allocated from:
 #0 0xa67ef3c3 in __interceptor_calloc (/lib64/libasan.so.4+0xd33c3)
 #1 0xa65071cb in g_malloc0 (/lib64/libglib-2.0.so.0+0x571cb)
 #2 0xa5e968f7 in deflateInit2_ (/lib64/libz.so.1+0x78f7)
 #3 0xcec58613 in zrle_compress_data ui/vnc-enc-zrle.c:87
 #4 0xcec58613 in zrle_send_framebuffer_update ui/vnc-enc-zrle.c:344
 #5 0xcec34e77 in vnc_send_framebuffer_update ui/vnc.c:919
 #6 0xcec5e023 in vnc_worker_thread_loop ui/vnc-jobs.c:271
 #7 0xcec5e5e7 in vnc_worker_thread ui/vnc-jobs.c:340
 #8 0xcee4d3c3 in qemu_thread_start util/qemu-thread-posix.c:502
 #9 0xa544e8bb in start_thread (/lib64/libpthread.so.0+0x78bb)
 #10 0xa53965cb in thread_start (/lib64/libc.so.6+0xd55cb)

This leak stack can be easily reproduced when a client repeatedly
connects to and disconnects from the vnc server.

To get the leak stack, we can compile qemu with
'--extra-ldflags=-Wl,--build-id -Wl,-z,relro -Wl,-z,now -lasan'
'--extra-cflags=-O0 -g -fno-omit-frame-pointer -fno-stack-protector
-fsanitize=address -fsanitize=leak' using a gcc that supports asan.

It is obvious that there may be a memory leak in the zlib/zrle compression code.
IIUC, *deflateInit2* is called on the local VncState vs->zrle.stream when a
client connects to the vnc server. Then *deflate* is called to do the
encoding. Finally *deflateEnd* is called in vnc_zrle_clear when a connection is
closed.

I have not figured out yet why this memory leak can happen here.
It is noticeable that deflateInit2 is called on the local vs,
while deflateEnd is called on the origin vs.
The origin vs is copied to the local vs in vnc_async_encoding_start, and
back the other way in vnc_async_encoding_end.

Have you got any idea on this issue ?


Hello Ying,

I have posted a patch for this issue:
--> https://lists.gnu.org/archive/html/qemu-devel/2019-08/msg06675.html

Please check whether the patch can address this issue.

Thanks,
Li Qiang




Thanks.
Ying Fang




[Qemu-devel] Discussion: vnc: memory leak in zrle_compress_data

2019-08-30 Thread fangying

Hi Gerd,

Memory leak is observed in zrle_compress_data when we are doing some 
AddressSanitizer tests. The leak stack is as below:


=
==47887==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 29760 byte(s) in 5 object(s) allocated from:
#0 0xa67ef3c3 in __interceptor_calloc (/lib64/libasan.so.4+0xd33c3)
#1 0xa65071cb in g_malloc0 (/lib64/libglib-2.0.so.0+0x571cb)
#2 0xa5e968f7 in deflateInit2_ (/lib64/libz.so.1+0x78f7)
#3 0xcec58613 in zrle_compress_data ui/vnc-enc-zrle.c:87
#4 0xcec58613 in zrle_send_framebuffer_update ui/vnc-enc-zrle.c:344
#5 0xcec34e77 in vnc_send_framebuffer_update ui/vnc.c:919
#6 0xcec5e023 in vnc_worker_thread_loop ui/vnc-jobs.c:271
#7 0xcec5e5e7 in vnc_worker_thread ui/vnc-jobs.c:340
#8 0xcee4d3c3 in qemu_thread_start util/qemu-thread-posix.c:502
#9 0xa544e8bb in start_thread (/lib64/libpthread.so.0+0x78bb)
#10 0xa53965cb in thread_start (/lib64/libc.so.6+0xd55cb)

This leak stack can be easily reproduced when a client repeatedly
connects to and disconnects from the vnc server.


To get the leak stack, we can compile qemu with
'--extra-ldflags=-Wl,--build-id -Wl,-z,relro -Wl,-z,now -lasan'
'--extra-cflags=-O0 -g -fno-omit-frame-pointer -fno-stack-protector
-fsanitize=address -fsanitize=leak' using a gcc that supports asan.


It is obvious that there may be a memory leak in the zlib/zrle compression code.
IIUC, *deflateInit2* is called on the local VncState vs->zrle.stream when a
client connects to the vnc server. Then *deflate* is called to do the
encoding. Finally *deflateEnd* is called in vnc_zrle_clear when a connection is
closed.

I have not figured out yet why this memory leak can happen here.
It is noticeable that deflateInit2 is called on the local vs,
while deflateEnd is called on the origin vs.
The origin vs is copied to the local vs in vnc_async_encoding_start, and
back the other way in vnc_async_encoding_end.
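
For reference, a standalone illustration (not QEMU code) of the zlib invariant
involved here: each successful deflateInit2() allocates internal state that is
only released by a matching deflateEnd() on the same stream, so losing track of
which copy of the stream was initialised leaks that state:

#include <string.h>
#include <zlib.h>

int main(void)
{
    z_stream s;
    memset(&s, 0, sizeof(s));

    /* Allocates internal zlib state. */
    if (deflateInit2(&s, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
                     MAX_WBITS, MAX_MEM_LEVEL, Z_DEFAULT_STRATEGY) != Z_OK) {
        return 1;
    }

    /* ... deflate() calls would go here ... */

    /* Must run on the stream that was initialised; if the initialised stream
     * is overwritten by a stale copy before this point, the state allocated
     * above is never freed. */
    deflateEnd(&s);
    return 0;
}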


Have you got any idea on this issue ?

Thanks.
Ying Fang




Re: [Qemu-devel] [PATCH] qmp: Fix memory leak in migrate_params_test_apply

2019-08-27 Thread fangying




On 2019/8/27 16:38, Li Qiang wrote:



Ying Fang <fangyi...@huawei.com> wrote on Tuesday, Aug 27, 2019, 4:06 PM:


Address Sanitizer shows memory leak in migrate_params_test_apply
migration/migration.c:1253 and the stack is as below:

Direct leak of 45 byte(s) in 9 object(s) allocated from:
     #0 0xbd7fc1db in __interceptor_malloc (/lib64/libasan.so.4+0xd31db)
     #1 0xbd514163 in g_malloc (/lib64/libglib-2.0.so.0+0x57163)
     #2 0xbd52f43b in g_strdup (/lib64/libglib-2.0.so.0+0x7243b)
     #3 0xdfa4d623 in migrate_params_test_apply 
migration/migration.c:1253
     #4 0xdfa4d623 in qmp_migrate_set_parameters 
migration/migration.c:1422
     #5 0xdfa963f3 in hmp_migrate_set_parameter monitor/hmp-cmds.c:1867
     #6 0xdfa8afe3 in handle_hmp_command monitor/hmp.c:1082
     #7 0xdf479c57 in qmp_human_monitor_command monitor/misc.c:140
     #8 0xdfadf87b in qmp_marshal_human_monitor_command
qapi/qapi-commands-misc.c:1024
     #9 0xdfc7797b in do_qmp_dispatch qapi/qmp-dispatch.c:131
     #10 0xdfc7797b in qmp_dispatch qapi/qmp-dispatch.c:174
     #11 0xdfa84fff in monitor_qmp_dispatch monitor/qmp.c:120
     #12 0xdfa85bbf in monitor_qmp_bh_dispatcher monitor/qmp.c:209
     #13 0xdfd2228f in aio_bh_call util/async.c:89
     #14 0xdfd2228f in aio_bh_poll util/async.c:117
     #15 0xdfd29bc3 in aio_dispatch util/aio-posix.c:459
     #16 0xdfd21ff7 in aio_ctx_dispatch util/async.c:260
     #17 0xbd50e2f7 in g_main_context_dispatch
(/lib64/libglib-2.0.so.0+0x512f7)
     #18 0xdfd278d7 in glib_pollfds_poll util/main-loop.c:218
     #19 0xdfd278d7 in os_host_main_loop_wait util/main-loop.c:241
     #20 0xdfd278d7 in main_loop_wait util/main-loop.c:517
     #21 0xdf67b5e7 in main_loop vl.c:1806
     #22 0xdf15d453 in main vl.c:4488

Cc: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Ying Fang <fangyi...@huawei.com>
---
  migration/migration.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 8b9f2fe30a..05e44ff7cc 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1250,11 +1250,17 @@ static void
migrate_params_test_apply(MigrateSetParameters *params,

      if (params->has_tls_creds) {
          assert(params->tls_creds->type == QTYPE_QSTRING);
+        if (dest->tls_creds) {
+            g_free(dest->tls_creds);
+        }


g_free can handle NULL, no need to do the NULL check.


Thanks for the reminder, we can call g_free here directly.
I'll send a v2 later.


Thanks,
Li Qiang

          dest->tls_creds = g_strdup(params->tls_creds->u.s);
      }

      if (params->has_tls_hostname) {
          assert(params->tls_hostname->type == QTYPE_QSTRING);
+        if (dest->tls_hostname) {
+            g_free(dest->tls_hostname);
+        }
          dest->tls_hostname = g_strdup(params->tls_hostname->u.s);
      }

-- 
2.19.1









[Qemu-devel] [Bug 1840865] Re: qemu crashes when doing iotest on virtio-9p filesystem

2019-08-21 Thread fangying
** Description changed:

  Qemu crashes when doing avocado-vt test on virtio-9p filesystem.
- This bug can be reproduced running 
https://github.com/autotest/tp-qemu/blob/master/qemu/tests/9p.py.
+ This bug can be reproduced running 
https://github.com/autotest/tp-qemu/blob/master/qemu/tests/9p.py with the 
latest qemu-4.0.0.
  The crash stack goes like:
  
  Program terminated with signal SIGSEGV, Segmentation fault.
  #0  v9fs_mark_fids_unreclaim (pdu=pdu@entry=0xaaab00046868, 
path=path@entry=0x851e2fa8)
- at hw/9pfs/9p.c:505
+ at hw/9pfs/9p.c:505
  #1  0xe3585acc in v9fs_unlinkat (opaque=0xaaab00046868) at 
hw/9pfs/9p.c:2590
  #2  0xe3811c10 in coroutine_trampoline (i0=, 
i1=)
- at util/coroutine-ucontext.c:116
+ at util/coroutine-ucontext.c:116
  #3  0xa13ddb20 in ?? () from /lib64/libc.so.6
  Backtrace stopped: not enough registers or memory available to unwind further
  
  A segmentation fault is triggered at hw/9pfs/9p.c line 505
  
- for (fidp = s->fid_list; fidp; fidp = fidp->next) {
- if (fidp->path.size != path->size) { # fidp is invalid 
- continue;
- }
+ for (fidp = s->fid_list; fidp; fidp = fidp->next) {
+ if (fidp->path.size != path->size) { # fidp is invalid
+ continue;
+ }
  
  (gdb) p path
  $10 = (V9fsPath *) 0x851e2fa8
  (gdb) p *path
  $11 = {size = 21, data = 0xfed6f420 "./9p_test/p2a1/d0/f1"}
  (gdb) p *fidp
  Cannot access memory at address 0x101010101010101
  (gdb) p *pdu
  $12 = {size = 19, tag = 54, id = 76 'L', cancelled = 0 '\000', complete = 
{entries = {
-   sqh_first = 0x0, sqh_last = 0xaaab00046870}}, s = 0xaaab000454b8, next 
= {
- le_next = 0xaaab000467c0, le_prev = 0xaaab00046f88}, idx = 88}
- (gdb) 
+   sqh_first = 0x0, sqh_last = 0xaaab00046870}}, s = 0xaaab000454b8, next 
= {
+ le_next = 0xaaab000467c0, le_prev = 0xaaab00046f88}, idx = 88}
+ (gdb)
  
  Address Sanitizer shows error and saying that there is a heap-use-after-
  free on *fidp*.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1840865

Title:
  qemu crashes when doing iotest on  virtio-9p filesystem

Status in QEMU:
  New

Bug description:
  Qemu crashes when doing avocado-vt test on virtio-9p filesystem.
  This bug can be reproduced running 
https://github.com/autotest/tp-qemu/blob/master/qemu/tests/9p.py with the 
latest qemu-4.0.0.
  The crash stack goes like:

  Program terminated with signal SIGSEGV, Segmentation fault.
  #0  v9fs_mark_fids_unreclaim (pdu=pdu@entry=0xaaab00046868, 
path=path@entry=0x851e2fa8)
  at hw/9pfs/9p.c:505
  #1  0xe3585acc in v9fs_unlinkat (opaque=0xaaab00046868) at 
hw/9pfs/9p.c:2590
  #2  0xe3811c10 in coroutine_trampoline (i0=, 
i1=)
  at util/coroutine-ucontext.c:116
  #3  0xa13ddb20 in ?? () from /lib64/libc.so.6
  Backtrace stopped: not enough registers or memory available to unwind further

  A segmentation fault is triggered at hw/9pfs/9p.c line 505

  for (fidp = s->fid_list; fidp; fidp = fidp->next) {
  if (fidp->path.size != path->size) { # fidp is invalid
  continue;
  }

  (gdb) p path
  $10 = (V9fsPath *) 0x851e2fa8
  (gdb) p *path
  $11 = {size = 21, data = 0xfed6f420 "./9p_test/p2a1/d0/f1"}
  (gdb) p *fidp
  Cannot access memory at address 0x101010101010101
  (gdb) p *pdu
  $12 = {size = 19, tag = 54, id = 76 'L', cancelled = 0 '\000', complete = 
{entries = {
    sqh_first = 0x0, sqh_last = 0xaaab00046870}}, s = 0xaaab000454b8, next 
= {
  le_next = 0xaaab000467c0, le_prev = 0xaaab00046f88}, idx = 88}
  (gdb)

  Address Sanitizer shows error and saying that there is a heap-use-
  after-free on *fidp*.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1840865/+subscriptions



[Qemu-devel] [Bug 1840865] [NEW] qemu crashes when doing iotest on virtio-9p filesystem

2019-08-20 Thread fangying
Public bug reported:

Qemu crashes when doing avocado-vt test on virtio-9p filesystem.
This bug can be reproduced running 
https://github.com/autotest/tp-qemu/blob/master/qemu/tests/9p.py.
The crash stack goes like:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  v9fs_mark_fids_unreclaim (pdu=pdu@entry=0xaaab00046868, 
path=path@entry=0x851e2fa8)
at hw/9pfs/9p.c:505
#1  0xe3585acc in v9fs_unlinkat (opaque=0xaaab00046868) at 
hw/9pfs/9p.c:2590
#2  0xe3811c10 in coroutine_trampoline (i0=, 
i1=)
at util/coroutine-ucontext.c:116
#3  0xa13ddb20 in ?? () from /lib64/libc.so.6
Backtrace stopped: not enough registers or memory available to unwind further

A segmentation fault is triggered at hw/9pfs/9p.c line 505

for (fidp = s->fid_list; fidp; fidp = fidp->next) {
if (fidp->path.size != path->size) { # fidp is invalid 
continue;
}

(gdb) p path
$10 = (V9fsPath *) 0x851e2fa8
(gdb) p *path
$11 = {size = 21, data = 0xfed6f420 "./9p_test/p2a1/d0/f1"}
(gdb) p *fidp
Cannot access memory at address 0x101010101010101
(gdb) p *pdu
$12 = {size = 19, tag = 54, id = 76 'L', cancelled = 0 '\000', complete = 
{entries = {
  sqh_first = 0x0, sqh_last = 0xaaab00046870}}, s = 0xaaab000454b8, next = {
le_next = 0xaaab000467c0, le_prev = 0xaaab00046f88}, idx = 88}
(gdb) 

Address Sanitizer shows error and saying that there is a heap-use-after-
free on *fidp*.

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1840865

Title:
  qemu crashes when doing iotest on  virtio-9p filesystem

Status in QEMU:
  New

Bug description:
  Qemu crashes when doing avocado-vt test on virtio-9p filesystem.
  This bug can be reproduced running 
https://github.com/autotest/tp-qemu/blob/master/qemu/tests/9p.py.
  The crash stack goes like:

  Program terminated with signal SIGSEGV, Segmentation fault.
  #0  v9fs_mark_fids_unreclaim (pdu=pdu@entry=0xaaab00046868, 
path=path@entry=0x851e2fa8)
  at hw/9pfs/9p.c:505
  #1  0xe3585acc in v9fs_unlinkat (opaque=0xaaab00046868) at 
hw/9pfs/9p.c:2590
  #2  0xe3811c10 in coroutine_trampoline (i0=, 
i1=)
  at util/coroutine-ucontext.c:116
  #3  0xa13ddb20 in ?? () from /lib64/libc.so.6
  Backtrace stopped: not enough registers or memory available to unwind further

  A segmentation fault is triggered at hw/9pfs/9p.c line 505

  for (fidp = s->fid_list; fidp; fidp = fidp->next) {
  if (fidp->path.size != path->size) { # fidp is invalid 
  continue;
  }

  (gdb) p path
  $10 = (V9fsPath *) 0x851e2fa8
  (gdb) p *path
  $11 = {size = 21, data = 0xfed6f420 "./9p_test/p2a1/d0/f1"}
  (gdb) p *fidp
  Cannot access memory at address 0x101010101010101
  (gdb) p *pdu
  $12 = {size = 19, tag = 54, id = 76 'L', cancelled = 0 '\000', complete = 
{entries = {
sqh_first = 0x0, sqh_last = 0xaaab00046870}}, s = 0xaaab000454b8, next 
= {
  le_next = 0xaaab000467c0, le_prev = 0xaaab00046f88}, idx = 88}
  (gdb) 

  Address Sanitizer shows error and saying that there is a heap-use-
  after-free on *fidp*.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1840865/+subscriptions



Re: [Qemu-devel] Discussion: redundant process during hotplug and missed process during unplug

2019-07-20 Thread fangying



Hi Michael,

On Fri, Jul 19, 2019 at 02:35:14AM +, Zhangbo (Oscar) wrote:

Hi All:
I have 2 questions about (un)hotplug on pcie-root-port.
First Question (hotplug failure because of redundant PCI_EXP_LNKSTA_DLLLA bit 
set):
 during VM boot, qemu sets PCI_EXP_LNKSTA_DLLLA according to this process:
 pcie_cap_init() -> pcie_cap_v1_fill(),
 even if there's no pcie device added to the VM.
 I noticed that during hotplug, qemu also sets PCI_EXP_LNKSTA_DLLLA in 
pcie_cap_slot_hotplug_cb().
 It means that the bit PCI_EXP_LNKSTA_DLLLA is set TWICE.
 why set this bit during initializing pcie-root-port? It seems unnecessary.


Makes sense.


 The bad side of this is it causes HOTPLUG FAILURE if we boot the VM and 
hotplug a pcie device at the same time:
In the VM kernel, because this bit is set, it senses a PDC event; the
process is:
 pciehp_probe -> pcie_init -> pcie_init_slot ->
pciehp_queue_pushbutton_work.
 If the 2 PDC events get too close, the VM kernel will wrongly unplug the 
device.
Suggestion for the 1st problem:
Can I remove the PCI_EXP_LNKSTA_DLLLA bit-setting during
pcie_cap_init()?



We raise this question here because we found that if the pcie ext
capability bit PCI_EXP_LNKSTA_DLLLA is set by default, the linux kernel may
try to send a PCI_EXP_HP_EV_PDC event after boot-up. When we do virtio device
hotplug during the processing of the PCI_EXP_HP_EV_PDC event (pciehp_ist
=> pciehp_handle_presence_or_link_change => pciehp_enable_slot),
the device may be accidentally powered down because the power state
detected is ON_STATE.


The kernel sends a PCI_EXP_HP_EV_PDC event when it tries to probe the
pcie-root-port, i.e.:
pciehp_probe => pciehp_check_presence =>
pciehp_card_present_or_link_active => pciehp_check_link_active
pciehp_check_link_active returns true if the PCI_EXP_LNKSTA_DLLLA Cap is
present.


We are going to send a patch to have the PCI_EXP_LNKSTA_DLLLA Cap removed to
fix the problem here.
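
A rough sketch of that direction, using the standard PCIe register symbols and
QEMU-style config-space helpers (the helper and field names are assumptions;
the real patch may look different): only advertise Data Link Layer Link Active
when the slot is actually occupied, rather than unconditionally in
pcie_cap_init().

/* Sketch only: set or clear PCI_EXP_LNKSTA_DLLLA in the link status register
 * depending on whether a device is present in the slot, instead of setting
 * it unconditionally at pcie_cap_init() time. */
static void pcie_sketch_set_link_active(PCIDevice *dev, bool occupied)
{
    uint8_t *exp_cap = dev->config + dev->exp.exp_cap;

    if (occupied) {
        pci_word_test_and_set_mask(exp_cap + PCI_EXP_LNKSTA,
                                   PCI_EXP_LNKSTA_DLLLA);
    } else {
        pci_word_test_and_clear_mask(exp_cap + PCI_EXP_LNKSTA,
                                     PCI_EXP_LNKSTA_DLLLA);
    }
}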

Second Question (time cost too much during pcie device unplug):
 qemu only send ABP event to VM kernel during unpluging pcie devices, VM 
kernel receives the
 ABP event then sleep 5s to expect a PDC event, which causes unpluging 
devices takes too long.
Suggestion to the 2nd problem:
Can I send ABP and *PDC* events to kernet when unplug devices.


I think we should not only set PDC but also try clearing presence bit,
even though the device is actually still there and mapped into guest
memory.
Maybe we should also not send the ABP event at all.

In both cases it's necessary to test with a non-linux guest
(e.g. a couple of versions of windows) to be sure we are not breaking
anything.


Thanks for your opinion; we will try to send the PCI_EXP_HP_EV_PDC event
instead of the PCI_EXP_HP_EV_ABP event at device unplug and do some
Windows guest compatibility tests.

We will reply later.




Re: [Qemu-devel] [RFC] Questions on the I/O performance of emulated host cdrom device

2019-01-21 Thread fangying
Hi,
I do not intend to send a patch now; I used the 'read-only' code snippet to
show that the I/O performance
can increase from 30% to 80% of the host side if we do not call
cdrom_is_inserted here. Obviously it is not
acceptable to simply skip checking the cdrom drive status.

However, I cannot come up with a proper solution right now, but I think there
may be two approaches.
One is a way to check the drive status that is equivalent to the
CDROM_DRIVE_STATUS ioctl but much faster. The other
is to let qemu track the drive status via an event-triggered mechanism.

Regards.
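
For the first approach, here is a minimal standalone sketch (illustrative only,
not QEMU code; the struct, cache interval and helper names are assumptions) of
caching the result of the slow CDROM_DRIVE_STATUS ioctl for a short interval so
the hot I/O path does not pay for it on every request:

#include <stdbool.h>
#include <stdint.h>
#include <time.h>
#include <sys/ioctl.h>
#include <linux/cdrom.h>

typedef struct {
    int fd;                  /* fd of the host cdrom device            */
    bool cached_inserted;    /* last known result of the status ioctl  */
    int64_t last_check_ns;   /* when that result was obtained          */
} CdromStatusCache;

/* Re-issue the slow ioctl at most once per second. */
#define CDROM_STATUS_CACHE_NS (1000LL * 1000 * 1000)

static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

static bool cdrom_is_inserted_cached(CdromStatusCache *s)
{
    int64_t now = now_ns();

    if (now - s->last_check_ns > CDROM_STATUS_CACHE_NS) {
        s->cached_inserted =
            (ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT) == CDS_DISC_OK);
        s->last_check_ns = now;
    }
    return s->cached_inserted;
}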

From: fangying
Sent: 22 January 2019 11:27
To: John Snow; Kevin Wolf
Cc: Zhoujian (jay); dengkai (A); qemu-devel; mreitz
Subject: RE: [Qemu-devel] [RFC] Questions on the I/O performance of emulated host
cdrom device

From: John Snow
To: fangying <fangyi...@huawei.com>; Kevin Wolf <kw...@redhat.com>
Cc: Zhoujian (jay) <jianjay.z...@huawei.com>; dengkai (A) <dengk...@huawei.com>;
 qemu-devel <qemu-devel@nongnu.org>; mreitz <mre...@redhat.com>
Subject: Re: [Qemu-devel] [RFC] Questions on the I/O performance of emulated host
cdrom device
Date: 2019-01-19 06:43:43



On 1/15/19 9:48 PM, Ying Fang wrote:
>
>
> On 2019/1/16 4:15, John Snow wrote:
>>
>>
>> On 1/8/19 10:20 PM, Ying Fang wrote:
>>>
>>>
>>> On 2019/1/8 20:46, Kevin Wolf wrote:
>>>> Am 29.12.2018 um 07:33 hat Ying Fang geschrieben:
>>>>> Hi.
>>>>> Recently one of our customers complained about the I/O performance of QEMU 
>>>>> emulated host cdrom device.
>>>>> I did some investigation on it and there was still some point I could not 
>>>>> figure out. So I had to ask for your help.
>>>>>
>>>>> Here is the application scenario setup by our customer.
>>>>> filename.iso/dev/sr0   /dev/cdrom
>>>>> remote client-->  host(cdemu)  -->  Linux VM
>>>>> (1)A remote client maps an iso file to x86 host machine through network 
>>>>> using tcp.
>>>>> (2)The cdemu daemon then load it as a local virtual cdrom disk drive.
>>>>> (3)A VM is launched with the virtual cdrom disk drive configured.
>>>>> The VM can make use of this virtual cdrom to install an OS in the iso 
>>>>> file.
>>>>>
>>>>> The network bandwidth between the remote client and host is 100Mbps, we test 
>>>>> I/O perf using: dd if=/dev/sr0 of=/dev/null bs=32K count=10.
>>>>> And we have
>>>>> (1) I/O perf of host side /dev/sr0 is 11MB/s;
>>>>> (2) I/O perf of /dev/cdrom inside VM is 3.8MB/s.
>>>>>
>>>>> As we can see, I/O perf of cdrom inside the VM is about 34.5% compared 
>>>>> with host side.
>>>>> FlameGraph is used to find out the bottleneck of this operation and we 
>>>>> find out that too much time is occupied by calling *bdrv_is_inserted*.
>>>>> Then we dig into the code and figure out that the ioctl in 
>>>>> *cdrom_is_inserted* takes too much time, because it triggers 
>>>>> io_schedule_timeout in the kernel.
>>>>> In the code path of emulated cdrom device, each DMA I/O request consists 
>>>>> of several *bdrv_is_inserted*, which degrades the I/O performance by 
>>>>> about 31% in our test.
>>>>> static bool cdrom_is_inserted(BlockDriverState *bs)
>>>>> {
>>>>> BDRVRawState *s = bs->opaque;
>>>>> int ret;
>>>>>
>>>>> ret = ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT);
>>>>> return ret == CDS_DISC_OK;
>>>>> }
>>>>> A flamegraph svg file (cdrom.svg) is attached in this email to show 
>>>>> the code timing profile we've tested.
>>>>>
>>>>> So here is my question.
>>>>> (1) Why do we regularly check the presence of a cdrom disk drive in the 
>>>>> code path?  Can we do it asynchronously?
>>
>> The ATAPI emulator needs to know if it has storage present to carry out
>> the request or to signal an error, so it checks. It might be the case
>> that the VM operator forcibly dismounted network storage without
>> awareness from the guest. (This is basically emulation of the case when
>> a user mechanically forces a CDROM tray open.)
>>
>> I wasn't aware this check was so slow.
> It is fast enough in most case, however it may be slow if the cdrom drive is 
> a virtual drive mapped from remote client.
> This is shown in
> https://raw.githubusercontent.com/fangying/fangying.github.io/master/94E119FA-8BA1-41AB-A26A-FBDC635BCD2C.png

Re: [Qemu-devel] [RFC] Questions on the I/O performance of emulated host cdrom device

2019-01-21 Thread fangying






Ying Fang
M: +86-15925605092
E: fangyi...@huawei.com
2012 Laboratories - Euler Dept 5

From: John Snow
To: fangying <fangyi...@huawei.com>; Kevin Wolf <kw...@redhat.com>
Cc: Zhoujian (jay) <jianjay.z...@huawei.com>; dengkai (A) <dengk...@huawei.com>;
 qemu-devel <qemu-devel@nongnu.org>; mreitz <mre...@redhat.com>
Subject: Re: [Qemu-devel] [RFC] Questions on the I/O performance of emulated host
cdrom device
Date: 2019-01-19 06:43:43



On 1/15/19 9:48 PM, Ying Fang wrote:
>
>
> On 2019/1/16 4:15, John Snow wrote:
>>
>>
>> On 1/8/19 10:20 PM, Ying Fang wrote:
>>>
>>>
>>> On 2019/1/8 20:46, Kevin Wolf wrote:
>>>> Am 29.12.2018 um 07:33 hat Ying Fang geschrieben:
>>>>> Hi.
>>>>> Recently one of our customers complained about the I/O performance of QEMU 
>>>>> emulated host cdrom device.
>>>>> I did some investigation on it and there was still some point I could not 
>>>>> figure out. So I had to ask for your help.
>>>>>
>>>>> Here is the application scenario setup by our customer.
>>>>> filename.iso/dev/sr0   /dev/cdrom
>>>>> remote client-->  host(cdemu)  -->  Linux VM
>>>>> (1)A remote client maps an iso file to x86 host machine through network 
>>>>> using tcp.
>>>>> (2)The cdemu daemon then load it as a local virtual cdrom disk drive.
>>>>> (3)A VM is launched with the virtual cdrom disk drive configured.
>>>>> The VM can make use of this virtual cdrom to install an OS in the iso 
>>>>> file.
>>>>>
>>>>> The network bandwidth between the remote client and host is 100Mbps, we test 
>>>>> I/O perf using: dd if=/dev/sr0 of=/dev/null bs=32K count=10.
>>>>> And we have
>>>>> (1) I/O perf of host side /dev/sr0 is 11MB/s;
>>>>> (2) I/O perf of /dev/cdrom inside VM is 3.8MB/s.
>>>>>
>>>>> As we can see, I/O perf of cdrom inside the VM is about 34.5% compared 
>>>>> with host side.
>>>>> FlameGraph is used to find out the bottleneck of this operation and we 
>>>>> find out that too much time is occupied by calling *bdrv_is_inserted*.
>>>>> Then we dig into the code and figure out that the ioctl in 
>>>>> *cdrom_is_inserted* takes too much time, because it triggers 
>>>>> io_schedule_timeout in the kernel.
>>>>> In the code path of emulated cdrom device, each DMA I/O request consists 
>>>>> of several *bdrv_is_inserted*, which degrades the I/O performance by 
>>>>> about 31% in our test.
>>>>> static bool cdrom_is_inserted(BlockDriverState *bs)
>>>>> {
>>>>> BDRVRawState *s = bs->opaque;
>>>>> int ret;
>>>>>
>>>>> ret = ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT);
>>>>> return ret == CDS_DISC_OK;
>>>>> }
>>>>> A flamegraph svg file (cdrom.svg) is attached in this email to show 
>>>>> the code timing profile we've tested.
>>>>>
>>>>> So here is my question.
>>>>> (1) Why do we regularly check the presence of a cdrom disk drive in the 
>>>>> code path?  Can we do it asynchronously?
>>
>> The ATAPI emulator needs to know if it has storage present to carry out
>> the request or to signal an error, so it checks. It might be the case
>> that the VM operator forcibly dismounted network storage without
>> awareness from the guest. (This is basically emulation of the case when
>> a user mechanically forces a CDROM tray open.)
>>
>> I wasn't aware this check was so slow.
> It is fast enough in most case, however it may be slow if the cdrom drive is 
> a virtual drive mapped from remote client.
> This is shown in 
> https://raw.githubusercontent.com/fangying/fangying.github.io/master/94E119FA-8BA1-41AB-A26A-FBDC635BCD2C.png
> The reason is ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT) in 
> cdrom_is_inserted takes much time in this scenario.
>
>>
>> Maybe we can just cache blk_is_inserted -- I don't remember if it's
>> possible to eject media without awareness from the device model, but if
>> this check winds up being slow in some configurations we can cache it.
> To cache media status is a good idea here, we check blk_is_inserted instead 
> and modify it when media status is changed.

[Qemu-devel] [PATCH v4] vhost: Don't abort when vhost-user connection is lost during migration

2017-12-01 Thread fangying
QEMU will abort when the vhost-user process is restarted during migration
and vhost_log_global_start/stop is called. The reason is that
vhost_dev_set_log returns -1 because the network connection is lost.

To handle this situation, let's cancel the migration by setting the migrate
state to failure and report it to the user.
---
 hw/virtio/vhost.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index ddc42f0..92725f7 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -26,6 +26,8 @@
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
 #include "migration/blocker.h"
+#include "migration/migration.h"
+#include "migration/qemu-file.h"
 #include "sysemu/dma.h"
 
 /* enabled until disconnected backend stabilizes */
@@ -885,7 +887,10 @@ static void vhost_log_global_start(MemoryListener 
*listener)
 
 r = vhost_migration_log(listener, true);
 if (r < 0) {
-abort();
+error_report("Failed to start vhost dirty log");
+if (migrate_get_current()->migration_thread_running) {
+qemu_file_set_error(migrate_get_current()->to_dst_file, -ECHILD);
+}
 }
 }
 
@@ -895,7 +900,10 @@ static void vhost_log_global_stop(MemoryListener *listener)
 
 r = vhost_migration_log(listener, false);
 if (r < 0) {
-abort();
+error_report("Failed to stop vhost dirty log");
+if (migrate_get_current()->migration_thread_running) {
+qemu_file_set_error(migrate_get_current()->to_dst_file, -ECHILD);
+}
 }
 }
 
-- 
1.8.3.1





[Qemu-devel] [PATCH v3] vhost: Cancel migration when vhost-user process restarted during migration

2017-11-27 Thread fangying
QEMU will abort when the vhost-user process is restarted during migration
and vhost_log_global_start/stop is called. The reason is that
vhost_dev_set_log returns -1 because the network connection is temporarily
lost. To handle this situation, let's cancel the migration here.

Signed-off-by: Ying Fang 
---
 hw/virtio/vhost.c | 15 +--
 migration/migration.c |  2 +-
 migration/migration.h |  2 +-
 3 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index ddc42f0..e2ade93 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -26,6 +26,7 @@
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
 #include "migration/blocker.h"
+#include "migration/migration.h"
 #include "sysemu/dma.h"
 
 /* enabled until disconnected backend stabilizes */
@@ -882,20 +883,30 @@ static int vhost_migration_log(MemoryListener *listener, 
int enable)
 static void vhost_log_global_start(MemoryListener *listener)
 {
 int r;
+MigrationState *s = NULL;
 
 r = vhost_migration_log(listener, true);
 if (r < 0) {
-abort();
+error_report("Failed to start vhost dirty log");
+s = migrate_get_current();
+if (s->migration_thread_running) {
+migrate_fd_cancel(s);
+}
 }
 }
 
 static void vhost_log_global_stop(MemoryListener *listener)
 {
 int r;
+MigrationState *s = NULL;
 
 r = vhost_migration_log(listener, false);
 if (r < 0) {
-abort();
+error_report("Failed to stop vhost dirty log");
+s = migrate_get_current();
+if (s->migration_thread_running) {
+migrate_fd_cancel(s);
+}
 }
 }
 
diff --git a/migration/migration.c b/migration/migration.c
index 4de3b55..6d2b7df 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1131,7 +1131,7 @@ void migrate_fd_error(MigrationState *s, const Error 
*error)
 block_cleanup_parameters(s);
 }
 
-static void migrate_fd_cancel(MigrationState *s)
+void migrate_fd_cancel(MigrationState *s)
 {
 int old_state ;
 QEMUFile *f = migrate_get_current()->to_dst_file;
diff --git a/migration/migration.h b/migration/migration.h
index 663415f..f0261e3 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -167,7 +167,7 @@ uint64_t migrate_max_downtime(void);
 
 void migrate_set_error(MigrationState *s, const Error *error);
 void migrate_fd_error(MigrationState *s, const Error *error);
-
+void migrate_fd_cancel(MigrationState *s);
 void migrate_fd_connect(MigrationState *s);
 
 MigrationState *migrate_init(void);
-- 
1.8.3.1





[Qemu-devel] [PATCH v2] vhost: Cancel migration when vhost-user process restarted during migration

2017-11-15 Thread fangying
From: Ying Fang 

QEMU will abort when the vhost-user process is restarted during migration and
vhost_log_global_start/stop is called. The reason is that
vhost_dev_set_log returns -1 because the network connection is temporarily lost.
To handle this situation, let's cancel the migration and report it to the user.

Signed-off-by: Ying Fang 
---
 hw/virtio/vhost.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index ddc42f0..a9d1895 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -27,6 +27,7 @@
 #include "hw/virtio/virtio-access.h"
 #include "migration/blocker.h"
 #include "sysemu/dma.h"
+#include "migration/migration.h"
 
 /* enabled until disconnected backend stabilizes */
 #define _VHOST_DEBUG 1
@@ -882,20 +883,24 @@ static int vhost_migration_log(MemoryListener *listener, 
int enable)
 static void vhost_log_global_start(MemoryListener *listener)
 {
 int r;
+Error *errp = NULL;
 
 r = vhost_migration_log(listener, true);
 if (r < 0) {
-abort();
+error_setg(, "Failed to start vhost migration log");
+migrate_fd_error(migrate_get_current(), errp);
 }
 }
 
 static void vhost_log_global_stop(MemoryListener *listener)
 {
 int r;
+Error *errp = NULL;
 
 r = vhost_migration_log(listener, false);
 if (r < 0) {
-abort();
+error_setg(, "Failed to stop vhost migration log");
+migrate_fd_error(migrate_get_current(), errp);
 }
 }
 
-- 
2.10.0.windows.1





[Qemu-devel] [PATCH] vhost: Cancel migration when vhost-user process restarted during migration

2017-11-14 Thread fangying
From: Ying Fang

QEMU will abort when the vhost-user process is restarted during migration and
vhost_log_global_start/stop is called. The reason is that
vhost_dev_set_log returns -1 because the network connection is temporarily lost.
To handle this situation, let's cancel the migration and report it to the user.

Signed-off-by: Ying Fang 
---
 hw/virtio/vhost.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index ddc42f0..f409b06 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -27,6 +27,7 @@
 #include "hw/virtio/virtio-access.h"
 #include "migration/blocker.h"
 #include "sysemu/dma.h"
+#include "qmp-commands.h"
 
 /* enabled until disconnected backend stabilizes */
 #define _VHOST_DEBUG 1
@@ -882,20 +883,24 @@ static int vhost_migration_log(MemoryListener *listener, 
int enable)
 static void vhost_log_global_start(MemoryListener *listener)
 {
 int r;
+Error *errp = NULL;
 
 r = vhost_migration_log(listener, true);
 if (r < 0) {
-abort();
+error_setg(errp, "Failed to start vhost migration log");
+qmp_migrate_cancel();
 }
 }
 
 static void vhost_log_global_stop(MemoryListener *listener)
 {
 int r;
+Error *errp = NULL;
 
 r = vhost_migration_log(listener, false);
 if (r < 0) {
-abort();
+error_setg(errp, "Failed to stop vhost migration log");
+qmp_migrate_cancel();
 }
 }
 
-- 
2.10.0.windows.1





[Qemu-devel] QEMU abort when network service is restarted during live migration with vhost-user as the network backend

2017-11-13 Thread fangying
Hi all,

We have a vm undergoing migration with vhost-user as the network backend, and we
notice that qemu will abort when openvswitch is restarted while
MEMORY_LISTENER_CALL_GLOBAL(log_global_start, Forward) is called. The
reason is that vhost_dev_set_log returns -1 because
the network connection is temporarily lost due to the restart of the
openvswitch service.

Below is the trace of the call stack.

#0  0x7f868ed971d7 in raise() from /usr/lib64/libc.so.6
#1  0x7f868ed988c8 in abort() from /usr/lib64/libc.so.6
#2  0x004d0d35 in vhost_log_global_start (listener=) at 
/usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost.c:794
#2  0x00486bd2 in memory_global_dirty_log_start at 
/usr/src/debug/qemu-kvm-2.8.1/memory.c:2304
#3  0x00486dcd in ram_save_init_globals at 
/usr/src/debug/qemu-kvm-2.8.1/migration/ram.c:2072
#4  0x0048c185 in ram_save_setup (f=0x25e6ac0, opaque=) 
at /usr/src/debug/qemu-kvm-2.8.1/migration/ram.c:2093
#5  0x004fbee2 in qemu_savevm_state_begin at 
/usr/src/debug/qemu-kvm-2.8.1/migration/savevm.c:956
#6  0x0083d8f8 in migration_thread at migration/migration.c:2198

static void vhost_log_global_start(MemoryListener *listener)
{
int r;

r = vhost_migration_log(listener, true);
if (r < 0) {
abort();   /* branch taken */
}
}

What confuses me is:
1. Do we really need to abort here?
2. All the callbacks in MemoryListener return void, so we cannot
handle the error in any upper function on the call stack.
Can we just cancel the migration here instead of calling abort? Like:

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index ddc42f0..27ae4a2 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -27,6 +27,7 @@
 #include "hw/virtio/virtio-access.h"
 #include "migration/blocker.h"
 #include "sysemu/dma.h"
+#include "qmp-commands.h"

 /* enabled until disconnected backend stabilizes */
 #define _VHOST_DEBUG 1
@@ -885,7 +886,7 @@ static void vhost_log_global_start(MemoryListener *listener)

 r = vhost_migration_log(listener, true);
 if (r < 0) {
-abort();
+qmp_migrate_cancel(NULL);
 }
 }

@@ -895,7 +896,7 @@ static void vhost_log_global_stop(MemoryListener *listener)

 r = vhost_migration_log(listener, false);
 if (r < 0) {
-abort();
+qmp_migrate_cancel(NULL);
 }
 }






Re: [Qemu-devel] kvm bug in __rmap_clear_dirty during live migration

2017-03-13 Thread fangying

Hi, Huang Kai

After weeks of intensive testing, we think the problem is solved and 
this issue can be closed.


On 2017/2/27 15:38, Huang, Kai wrote:



On 2/25/2017 2:44 PM, Herongguang (Stephen) wrote:



On 2017/2/24 23:14, Paolo Bonzini wrote:



On 24/02/2017 16:10, Chris Friesen wrote:

On 02/23/2017 08:23 PM, Herongguang (Stephen) wrote:


On 2017/2/22 22:43, Paolo Bonzini wrote:



Hopefully Gaohuai and Rongguang can help with this too.

Paolo


Yes, we are looking into and testing this.

I think this can result in any memory corruption, if VM1 writes its
PML buffer into VM2’s VMCS (since sched_in/sched_out notifier of VM1
is not registered yet), then VM1 is destroyed (hence its PML buffer
is freed back to kernel), after that, VM2 starts migration, so CPU
logs VM2’s dirty GFNS into a freed memory, results in any memory
corruption.

As its severity, this commit
(http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4e59516a12a6ef6dcb660cb3a3f70c64bd60cfec)



is eligible to back port to kernel stable.


Are we expecting that fix to resolve the original issue, or is it a
separate issue that needs fixing in stable?


It should be the original issue.

Paolo

.


Yes, I agree, though we are still testing.



Hi Stephen,

Sorry for late reply. I was taking the whole week off last week. How's
the test going?

Thanks,
-Kai

.