Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-12-21 Thread Zhenyu Ye
On 2020/12/15 0:33, Stefan Hajnoczi wrote:
> On Tue, Dec 08, 2020 at 08:47:42AM -0500, Glauber Costa wrote:
>> The work we did at the time was in fixing those things in the kernel
>> as much as we could.
>> But the API is just like that...
> 

The best way for us is to replace io_submit with io_uring.

Thanks for the discussion!

Zhenyu





Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-12-14 Thread Stefan Hajnoczi
On Tue, Dec 08, 2020 at 08:47:42AM -0500, Glauber Costa wrote:
> The work we did at the time was in fixing those things in the kernel
> as much as we could.
> But the API is just like that...

Thanks!

Stefan


signature.asc
Description: PGP signature


Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-12-08 Thread Glauber Costa
On Tue, Dec 8, 2020 at 8:11 AM Stefan Hajnoczi  wrote:
>
> On Thu, Oct 22, 2020 at 05:29:16PM +0100, Fam Zheng wrote:
> > On Tue, 2020-10-20 at 09:34 +0800, Zhenyu Ye wrote:
> > > On 2020/10/19 21:25, Paolo Bonzini wrote:
> > > > On 19/10/20 14:40, Zhenyu Ye wrote:
> > > > > The kernel backtrace for io_submit in GUEST is:
> > > > >
> > > > > guest# ./offcputime -K -p `pgrep -nx fio`
> > > > > b'finish_task_switch'
> > > > > b'__schedule'
> > > > > b'schedule'
> > > > > b'io_schedule'
> > > > > b'blk_mq_get_tag'
> > > > > b'blk_mq_get_request'
> > > > > b'blk_mq_make_request'
> > > > > b'generic_make_request'
> > > > > b'submit_bio'
> > > > > b'blkdev_direct_IO'
> > > > > b'generic_file_read_iter'
> > > > > b'aio_read'
> > > > > b'io_submit_one'
> > > > > b'__x64_sys_io_submit'
> > > > > b'do_syscall_64'
> > > > > b'entry_SYSCALL_64_after_hwframe'
> > > > > -fio (1464)
> > > > > 40031912
> > > > >
> > > > > And Linux io_uring can avoid the latency problem.
> >
> > Thanks for the info. What this tells us is basically the inflight
> > requests are high. It's sad that the linux-aio is in practice
> > implemented as a blocking API.

It is.

> >
> > Host side backtrace will be of more help. Can you get that too?
>
> I guess Linux AIO didn't set the BLK_MQ_REQ_NOWAIT flag so the task went
> to sleep when it ran out of blk-mq tags. The easiest solution is to move
> to io_uring. Linux AIO is broken - it's not AIO :).

Agree!
>
> If we know that no other process is writing to the host block device
> then maybe we can determine the blk-mq tags limit (the queue depth) and
> avoid sending more requests. That way QEMU doesn't block, but I don't
> think this approach works when other processes are submitting I/O to the
> same host block device :(.
>
> Fam's original suggestion of invoking io_submit(2) from a worker thread
> is an option, but I'm afraid it will slow down the uncontended case.
>
> I'm CCing Glauber in case he battled this in the past in ScyllaDB.

We have, and a lot. I don't recall seeing this particular lock, but
XFS would block us all the time
if it had to update metadata to submit the operation, lock inodes, etc.

The work we did at the time was in fixing those things in the kernel
as much as we could.
But the API is just like that...

>
> Stefan



Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-12-08 Thread Stefan Hajnoczi
On Thu, Oct 22, 2020 at 05:29:16PM +0100, Fam Zheng wrote:
> On Tue, 2020-10-20 at 09:34 +0800, Zhenyu Ye wrote:
> > On 2020/10/19 21:25, Paolo Bonzini wrote:
> > > On 19/10/20 14:40, Zhenyu Ye wrote:
> > > > The kernel backtrace for io_submit in GUEST is:
> > > > 
> > > > guest# ./offcputime -K -p `pgrep -nx fio`
> > > > b'finish_task_switch'
> > > > b'__schedule'
> > > > b'schedule'
> > > > b'io_schedule'
> > > > b'blk_mq_get_tag'
> > > > b'blk_mq_get_request'
> > > > b'blk_mq_make_request'
> > > > b'generic_make_request'
> > > > b'submit_bio'
> > > > b'blkdev_direct_IO'
> > > > b'generic_file_read_iter'
> > > > b'aio_read'
> > > > b'io_submit_one'
> > > > b'__x64_sys_io_submit'
> > > > b'do_syscall_64'
> > > > b'entry_SYSCALL_64_after_hwframe'
> > > > -fio (1464)
> > > > 40031912
> > > > 
> > > > And Linux io_uring can avoid the latency problem.
> 
> Thanks for the info. What this tells us is basically the inflight
> requests are high. It's sad that the linux-aio is in practice
> implemented as a blocking API.
> 
> Host side backtrace will be of more help. Can you get that too?

I guess Linux AIO didn't set the BLK_MQ_REQ_NOWAIT flag so the task went
to sleep when it ran out of blk-mq tags. The easiest solution is to move
to io_uring. Linux AIO is broken - it's not AIO :).

If we know that no other process is writing to the host block device
then maybe we can determine the blk-mq tags limit (the queue depth) and
avoid sending more requests. That way QEMU doesn't block, but I don't
think this approach works when other processes are submitting I/O to the
same host block device :(.

Fam's original suggestion of invoking io_submit(2) from a worker thread
is an option, but I'm afraid it will slow down the uncontended case.

I'm CCing Glauber in case he battled this in the past in ScyllaDB.

Stefan




Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-10-22 Thread Fam Zheng
On Tue, 2020-10-20 at 09:34 +0800, Zhenyu Ye wrote:
> On 2020/10/19 21:25, Paolo Bonzini wrote:
> > On 19/10/20 14:40, Zhenyu Ye wrote:
> > > The kernel backtrace for io_submit in GUEST is:
> > > 
> > >   guest# ./offcputime -K -p `pgrep -nx fio`
> > >   b'finish_task_switch'
> > >   b'__schedule'
> > >   b'schedule'
> > >   b'io_schedule'
> > >   b'blk_mq_get_tag'
> > >   b'blk_mq_get_request'
> > >   b'blk_mq_make_request'
> > >   b'generic_make_request'
> > >   b'submit_bio'
> > >   b'blkdev_direct_IO'
> > >   b'generic_file_read_iter'
> > >   b'aio_read'
> > >   b'io_submit_one'
> > >   b'__x64_sys_io_submit'
> > >   b'do_syscall_64'
> > >   b'entry_SYSCALL_64_after_hwframe'
> > >   -fio (1464)
> > >   40031912
> > > 
> > > And Linux io_uring can avoid the latency problem.

Thanks for the info. What this tells us is basically that the number of
in-flight requests is high. It's sad that linux-aio is in practice
implemented as a blocking API.

A host-side backtrace would be of more help. Can you get that too?

Fam

> > 
> > What filesystem are you using?
> > 
> 
> On host, the VM image and disk images are based on ext4 filesystem.
> In guest, the '/' uses xfs filesystem, and the disks are raw devices.
> 
> guest# df -hT
> Filesystem  Type  Size  Used Avail Use% Mounted on
> devtmpfsdevtmpfs   16G 0   16G   0% /dev
> tmpfs   tmpfs  16G 0   16G   0% /dev/shm
> tmpfs   tmpfs  16G  976K   16G   1% /run
> /dev/mapper/fedora-root xfs   8.0G  3.2G  4.9G  40% /
> tmpfs   tmpfs  16G 0   16G   0% /tmp
> /dev/sda1   xfs  1014M  181M  834M  18% /boot
> tmpfs   tmpfs 3.2G 0  3.2G   0% /run/user/0
> 
> guest# lsblk
> NAMEMAJ:MIN RM SIZE RO TYPE MOUNTPOINT
> sda   8:00  10G  0 disk
> ├─sda18:10   1G  0 part /boot
> └─sda28:20   9G  0 part
>   ├─fedora-root 253:00   8G  0 lvm  /
>   └─fedora-swap 253:10   1G  0 lvm  [SWAP]
> vda 252:00  10G  0 disk
> vdb 252:16   0  10G  0 disk
> vdc 252:32   0  10G  0 disk
> vdd 252:48   0  10G  0 disk
> 
> Thanks,
> Zhenyu
> 




Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-10-19 Thread Zhenyu Ye
On 2020/10/19 21:25, Paolo Bonzini wrote:
> On 19/10/20 14:40, Zhenyu Ye wrote:
>> The kernel backtrace for io_submit in GUEST is:
>>
>>  guest# ./offcputime -K -p `pgrep -nx fio`
>>  b'finish_task_switch'
>>  b'__schedule'
>>  b'schedule'
>>  b'io_schedule'
>>  b'blk_mq_get_tag'
>>  b'blk_mq_get_request'
>>  b'blk_mq_make_request'
>>  b'generic_make_request'
>>  b'submit_bio'
>>  b'blkdev_direct_IO'
>>  b'generic_file_read_iter'
>>  b'aio_read'
>>  b'io_submit_one'
>>  b'__x64_sys_io_submit'
>>  b'do_syscall_64'
>>  b'entry_SYSCALL_64_after_hwframe'
>>  -fio (1464)
>>  40031912
>>
>> And Linux io_uring can avoid the latency problem.
> 
> What filesystem are you using?
> 

On the host, the VM image and disk images are stored on an ext4 filesystem.
In the guest, '/' is an xfs filesystem and the data disks are raw devices.

guest# df -hT
Filesystem  Type  Size  Used Avail Use% Mounted on
devtmpfsdevtmpfs   16G 0   16G   0% /dev
tmpfs   tmpfs  16G 0   16G   0% /dev/shm
tmpfs   tmpfs  16G  976K   16G   1% /run
/dev/mapper/fedora-root xfs   8.0G  3.2G  4.9G  40% /
tmpfs   tmpfs  16G 0   16G   0% /tmp
/dev/sda1   xfs  1014M  181M  834M  18% /boot
tmpfs   tmpfs 3.2G 0  3.2G   0% /run/user/0

guest# lsblk
NAMEMAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda   8:00  10G  0 disk
├─sda18:10   1G  0 part /boot
└─sda28:20   9G  0 part
  ├─fedora-root 253:00   8G  0 lvm  /
  └─fedora-swap 253:10   1G  0 lvm  [SWAP]
vda 252:00  10G  0 disk
vdb 252:16   0  10G  0 disk
vdc 252:32   0  10G  0 disk
vdd 252:48   0  10G  0 disk

Thanks,
Zhenyu



Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-10-19 Thread Paolo Bonzini
On 19/10/20 14:40, Zhenyu Ye wrote:
> The kernel backtrace for io_submit in GUEST is:
> 
>   guest# ./offcputime -K -p `pgrep -nx fio`
>   b'finish_task_switch'
>   b'__schedule'
>   b'schedule'
>   b'io_schedule'
>   b'blk_mq_get_tag'
>   b'blk_mq_get_request'
>   b'blk_mq_make_request'
>   b'generic_make_request'
>   b'submit_bio'
>   b'blkdev_direct_IO'
>   b'generic_file_read_iter'
>   b'aio_read'
>   b'io_submit_one'
>   b'__x64_sys_io_submit'
>   b'do_syscall_64'
>   b'entry_SYSCALL_64_after_hwframe'
>   -fio (1464)
>   40031912
> 
> And Linux io_uring can avoid the latency problem.

What filesystem are you using?

Paolo




Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-10-19 Thread Zhenyu Ye
Hi Stefan,

On 2020/10/13 18:00, Stefan Hajnoczi wrote:
> 
> Sorry, I lost track of this on-going email thread.
> 
> Thanks for the backtrace. It shows the io_submit call is done while the
> AioContext lock is held. The monitor thread is waiting for the
> IOThread's AioContext lock. vcpus threads can get stuck waiting on the
> big QEMU lock (BQL) that is held by the monitor in the meantime.
> 
> Please collect the kernel backtrace for io_submit so we can understand
> why multi-second io_submit latencies happen.
> 
> I also suggest trying aio=io_uring to check if Linux io_uring avoids the
> latency problem.
> 

The kernel backtrace for io_submit in GUEST is:

guest# ./offcputime -K -p `pgrep -nx fio`
b'finish_task_switch'
b'__schedule'
b'schedule'
b'io_schedule'
b'blk_mq_get_tag'
b'blk_mq_get_request'
b'blk_mq_make_request'
b'generic_make_request'
b'submit_bio'
b'blkdev_direct_IO'
b'generic_file_read_iter'
b'aio_read'
b'io_submit_one'
b'__x64_sys_io_submit'
b'do_syscall_64'
b'entry_SYSCALL_64_after_hwframe'
-fio (1464)
40031912

And Linux io_uring can avoid the latency problem.

Thanks,
Zhenyu



Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-10-13 Thread Stefan Hajnoczi
On Mon, Sep 21, 2020 at 11:14:35AM +, Fam Zheng wrote:
> On 2020-09-19 10:22, Zhenyu Ye wrote:
> > On 2020/9/18 22:06, Fam Zheng wrote:
> > > 
> > > I can see how blocking in a slow io_submit can cause trouble for main
> > > thread. I think one way to fix it (until it's made truly async in new
> > > kernels) is moving the io_submit call to thread pool, and wrapped in a
> > > coroutine, perhaps.
> > >
> > 
> > I'm not sure if any other operation will block the main thread, other
> > than io_submit().
> 
> Then that's a problem with io_submit which should be fixed. Or more
> precisely, that is a long held lock that we should avoid in QEMU's event
> loops.
> 
> > 
> > > I'm not sure qmp timeout is a complete solution because we would still
> > > suffer from a blocked state for a period, in this exact situation before
> > > the timeout.
> > 
> > Anyway, the qmp timeout may be the last measure to prevent the VM
> > soft lockup. 
> 
> Maybe, but I don't think baking such a workaround into the QMP API is a
> good idea. No QMP command should be synchronously long running, so
> having a timeout parameter is just a wrong design.

Sorry, I lost track of this on-going email thread.

Thanks for the backtrace. It shows the io_submit call is done while the
AioContext lock is held. The monitor thread is waiting for the
IOThread's AioContext lock. vcpus threads can get stuck waiting on the
big QEMU lock (BQL) that is held by the monitor in the meantime.

Please collect the kernel backtrace for io_submit so we can understand
why multi-second io_submit latencies happen.

I also suggest trying aio=io_uring to check if Linux io_uring avoids the
latency problem.

Stefan




Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-09-21 Thread Fam Zheng
On 2020-09-19 10:22, Zhenyu Ye wrote:
> On 2020/9/18 22:06, Fam Zheng wrote:
> > 
> > I can see how blocking in a slow io_submit can cause trouble for main
> > thread. I think one way to fix it (until it's made truly async in new
> > kernels) is moving the io_submit call to thread pool, and wrapped in a
> > coroutine, perhaps.
> >
> 
> I'm not sure if any other operation will block the main thread, other
> than io_submit().

Then that's a problem with io_submit, which should be fixed. Or more
precisely, it is a long-held lock that we should avoid in QEMU's event
loops.

> 
> > I'm not sure qmp timeout is a complete solution because we would still
> > suffer from a blocked state for a period, in this exact situation before
> > the timeout.
> 
> Anyway, the qmp timeout may be the last measure to prevent the VM
> soft lockup. 

Maybe, but I don't think baking such a workaround into the QMP API is a
good idea. No QMP command should be synchronously long-running, so
having a timeout parameter is simply the wrong design.

Thanks,

Fam



Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-09-18 Thread Zhenyu Ye
On 2020/9/18 22:06, Fam Zheng wrote:
> 
> I can see how blocking in a slow io_submit can cause trouble for main
> thread. I think one way to fix it (until it's made truly async in new
> kernels) is moving the io_submit call to thread pool, and wrapped in a
> coroutine, perhaps.
>

I'm not sure if any other operation will block the main thread, other
than io_submit().

> I'm not sure qmp timeout is a complete solution because we would still
> suffer from a blocked state for a period, in this exact situation before
> the timeout.

Anyway, the QMP timeout may be the last line of defense against a VM
soft lockup.  Ideally, after your io_submit fix, this mechanism will
never be triggered.

Thanks,
Zhenyu



Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-09-18 Thread Fam Zheng
On 2020-09-18 19:23, Zhenyu Ye wrote:
>   Thread 5 (LWP 4802):
>   #0  0x83086b54 in syscall () at /lib64/libc.so.6
>   #1  0x834598b8 in io_submit () at /lib64/libaio.so.1
>   #2  0xe851e89c in ioq_submit (s=0xfffd3c001bb0) at ../block/linux-aio.c:299
>   #3  0xe851eb50 in laio_io_unplug (bs=0xef0f2340, s=0xfffd3c001bb0) at ../block/linux-aio.c:344
>   #4  0xe8559f1c in raw_aio_unplug (bs=0xef0f2340) at ../block/file-posix.c:2063
>   #5  0xe8538344 in bdrv_io_unplug (bs=0xef0f2340) at ../block/io.c:3135
>   #6  0xe8538360 in bdrv_io_unplug (bs=0xef0eb020) at ../block/io.c:3140
>   #7  0xe8496104 in blk_io_unplug (blk=0xef0e8f20) at ../block/block-backend.c:2147
>   #8  0xe830e1a4 in virtio_blk_handle_vq (s=0xf0374280, vq=0x700fc1d8) at ../hw/block/virtio-blk.c:796
>   #9  0xe82e6b68 in virtio_blk_data_plane_handle_output (vdev=0xf0374280, vq=0x700fc1d8) at ../hw/block/dataplane/virtio-blk.c:165
>   #10 0xe83878fc in virtio_queue_notify_aio_vq (vq=0x700fc1d8) at ../hw/virtio/virtio.c:2325
>   #11 0xe838ab50 in virtio_queue_host_notifier_aio_poll (opaque=0x700fc250) at ../hw/virtio/virtio.c:3545
>   #12 0xe85fab3c in run_poll_handlers_once (ctx=0xef0a87b0, now=77604310618960, timeout=0x73ffdf78) at ../util/aio-posix.c:398
>   #13 0xe85fae5c in run_poll_handlers (ctx=0xef0a87b0, max_ns=4000, timeout=0x73ffdf78) at ../util/aio-posix.c:492
>   #14 0xe85fb078 in try_poll_mode (ctx=0xef0a87b0, timeout=0x73ffdf78) at ../util/aio-posix.c:535
>   #15 0xe85fb180 in aio_poll (ctx=0xef0a87b0, blocking=true) at ../util/aio-posix.c:571
>   #16 0xe8027004 in iothread_run (opaque=0xeee79a00) at ../iothread.c:73
>   #17 0xe85f269c in qemu_thread_start (args=0xef0a8d10) at ../util/qemu-thread-posix.c:521
>   #18 0x831428bc in  () at /lib64/libpthread.so.0
>   #19 0x8308aa1c in  () at /lib64/libc.so.6

I can see how blocking in a slow io_submit can cause trouble for the main
thread. I think one way to fix it (until it's made truly async in new
kernels) is to move the io_submit call to a thread pool, wrapped in a
coroutine, perhaps.

I'm not sure a QMP timeout is a complete solution, because in this exact
situation we would still be blocked for the whole period before the
timeout fires.

Fam



Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-09-18 Thread Zhenyu Ye
Hi Stefan, Fam,

On 2020/9/18 0:01, Fam Zheng wrote:
> On 2020-09-17 16:44, Stefan Hajnoczi wrote:
>> On Thu, Sep 17, 2020 at 03:36:57PM +0800, Zhenyu Ye wrote:
>>> When the hang occurs, the QEMU is blocked at:
>>>
>>> #0  0x95762b64 in ?? () from target:/usr/lib64/libpthread.so.0
>>> #1  0x9575bd88 in pthread_mutex_lock () from target:/usr/lib64/libpthread.so.0
>>> #2  0xbb1f5948 in qemu_mutex_lock_impl (mutex=0xcc8e1860, file=0xbb4e1bd0 "/Images/eillon/CODE/5-opensource/qemu/util/async.c", line=605)
>>> #3  0xbb20acd4 in aio_context_acquire (ctx=0xcc8e1800)
>>> #4  0xbb105e90 in bdrv_query_image_info (bs=0xcc934620, p_info=0xccc41e18, errp=0xca669118)
>>> #5  0xbb105968 in bdrv_block_device_info (blk=0xcdca19f0, bs=0xcc934620, flat=false, errp=0xca6692b8)
>>> #6  0xbb1063dc in bdrv_query_info (blk=0xcdca19f0, p_info=0xcd29c9a8, errp=0xca6692b8)
>>> #7  0xbb106c14 in qmp_query_block (errp=0x0)
>>> #8  0xbacb8e6c in hmp_info_block (mon=0xca6693d0, qdict=0xcd089790)
>>
>> Great, this shows that the main loop thread is stuck waiting for the
>> AioContext lock.
>>
>> Please post backtraces from all QEMU threads ((gdb) thread apply all bt)
>> so we can figure out which thread is holding up the main loop.
> 
> I think that is reflected in the perf backtrace posted by Zheny already:
> 
> And in the host, the information of sys_enter_io_submit() is:
> 
> Samples: 3K of event 'syscalls:sys_enter_io_submit', Event count (approx.): 3150
>    Children   Self   Trace output
>    -   66.70%  66.70%  ctx_id: 0x9c044000, nr: 0x0001, iocbpp: 0x9f7fad28
>0xae7f871c
>0xae8a27c4
>qemu_thread_start
>iothread_run
>aio_poll
>aio_dispatch_ready_handlers
>aio_dispatch_handler
>virtio_queue_host_notifier_aio_read
>virtio_queue_notify_aio_vq
>virtio_blk_data_plane_handle_output
>virtio_blk_handle_vq
>blk_io_unplug
>bdrv_io_unplug
>bdrv_io_unplug
>raw_aio_unplug
>laio_io_unplug
>syscall
> 
> So the iothread is blocked by a slow io_submit holding the AioContext
> lock.
> 
> It would be interesting to know what in kernel is blocking io_submit
> from returning.
> 

Sorry, I only get the backtraces of all QEMU threads, like below:

(gdb) thread apply all bt

Thread 35 (LWP 49700):
#0  0x83148fe4 in pthread_cond_timedwait () at /lib64/libpthread.so.0
#1  0xe85f2080 in qemu_sem_timedwait (sem=0xef12e748, ms=1) at ../util/qemu-thread-posix.c:282
#2  0xe861c600 in worker_thread (opaque=0xef12e6c0) at ../util/thread-pool.c:91
#3  0xe85f269c in qemu_thread_start (args=0xf0204be0) at ../util/qemu-thread-posix.c:521
#4  0x831428bc in  () at /lib64/libpthread.so.0
#5  0x8308aa1c in  () at /lib64/libc.so.6

Thread 34 (LWP 45337):
#0  0x83148fe4 in pthread_cond_timedwait () at /lib64/libpthread.so.0
#1  0xe85f2080 in qemu_sem_timedwait (sem=0xef12e748, ms=1) at ../util/qemu-thread-posix.c:282
#2  0xe861c600 in worker_thread (opaque=0xef12e6c0) at ../util/thread-pool.c:91
#3  0xe85f269c in qemu_thread_start (args=0xfffcc8000b20) at ../util/qemu-thread-posix.c:521
#4  0x831428bc in  () at /lib64/libpthread.so.0
#5  0x8308aa1c in  () at /lib64/libc.so.6

Thread 33 (LWP 45336):
#0  0x83148fe4 in pthread_cond_timedwait () at /lib64/libpthread.so.0
#1  0xe85f2080 in qemu_sem_timedwait (sem=0xef12e748, ms=1) at ../util/qemu-thread-posix.c:282
#2  0xe861c600 in worker_thread (opaque=0xef12e6c0) at ../util/thread-pool.c:91
#3  0xe85f269c in qemu_thread_start (args=0xfffd14000b20) at ../util/qemu-thread-posix.c:521
#4  0x831428bc in  () at /lib64/libpthread.so.0
#5  0x8308aa1c in  () at /lib64/libc.so.6

Thread 32 (LWP 45335):
#0  0x83148fe4 in pthread_cond_timedwait () at /lib64/libpthread.so.0
#1  0xe85f2080 in qemu_sem_timedwait (sem=0xef12e748, ms=1) at ../util/qemu-thread-posix.c:282
#2  0xe861c600 in worker_thread (opaque=0xef12e6c0) at ../util/thread-pool.c:91
#3  0xe85f269c in qemu_thread_start (args=0xefb00f00) at ../util/qemu-thread-posix.c:521
#4  0x831428bc in  () at /lib64/libpthread.so.0
#5  0x8308aa1c in  () at /lib64/libc.so.6

Thread 12 (LWP 4849):
#0  0x83082a0c in 

Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-09-17 Thread Fam Zheng
On 2020-09-17 16:44, Stefan Hajnoczi wrote:
> On Thu, Sep 17, 2020 at 03:36:57PM +0800, Zhenyu Ye wrote:
> > When the hang occurs, the QEMU is blocked at:
> > 
> > #0  0x95762b64 in ?? () from target:/usr/lib64/libpthread.so.0
> > #1  0x9575bd88 in pthread_mutex_lock () from target:/usr/lib64/libpthread.so.0
> > #2  0xbb1f5948 in qemu_mutex_lock_impl (mutex=0xcc8e1860, file=0xbb4e1bd0 "/Images/eillon/CODE/5-opensource/qemu/util/async.c", line=605)
> > #3  0xbb20acd4 in aio_context_acquire (ctx=0xcc8e1800)
> > #4  0xbb105e90 in bdrv_query_image_info (bs=0xcc934620, p_info=0xccc41e18, errp=0xca669118)
> > #5  0xbb105968 in bdrv_block_device_info (blk=0xcdca19f0, bs=0xcc934620, flat=false, errp=0xca6692b8)
> > #6  0xbb1063dc in bdrv_query_info (blk=0xcdca19f0, p_info=0xcd29c9a8, errp=0xca6692b8)
> > #7  0xbb106c14 in qmp_query_block (errp=0x0)
> > #8  0xbacb8e6c in hmp_info_block (mon=0xca6693d0, qdict=0xcd089790)
> 
> Great, this shows that the main loop thread is stuck waiting for the
> AioContext lock.
> 
> Please post backtraces from all QEMU threads ((gdb) thread apply all bt)
> so we can figure out which thread is holding up the main loop.

I think that is reflected in the perf backtrace posted by Zheny already:

And in the host, the information of sys_enter_io_submit() is:

Samples: 3K of event 'syscalls:sys_enter_io_submit', Event count (approx.): 3150
   Children   Self   Trace output
   -   66.70%  66.70%  ctx_id: 0x9c044000, nr: 0x0001, iocbpp: 0x9f7fad28
   0xae7f871c
   0xae8a27c4
   qemu_thread_start
   iothread_run
   aio_poll
   aio_dispatch_ready_handlers
   aio_dispatch_handler
   virtio_queue_host_notifier_aio_read
   virtio_queue_notify_aio_vq
   virtio_blk_data_plane_handle_output
   virtio_blk_handle_vq
   blk_io_unplug
   bdrv_io_unplug
   bdrv_io_unplug
   raw_aio_unplug
   laio_io_unplug
   syscall

So the iothread is blocked by a slow io_submit holding the AioContext
lock.

It would be interesting to know what in kernel is blocking io_submit
from returning.

Fam



Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-09-17 Thread Stefan Hajnoczi
On Thu, Sep 17, 2020 at 03:36:57PM +0800, Zhenyu Ye wrote:
> When the hang occurs, the QEMU is blocked at:
> 
>   #0  0x95762b64 in ?? () from target:/usr/lib64/libpthread.so.0
>   #1  0x9575bd88 in pthread_mutex_lock () from target:/usr/lib64/libpthread.so.0
>   #2  0xbb1f5948 in qemu_mutex_lock_impl (mutex=0xcc8e1860, file=0xbb4e1bd0 "/Images/eillon/CODE/5-opensource/qemu/util/async.c", line=605)
>   #3  0xbb20acd4 in aio_context_acquire (ctx=0xcc8e1800)
>   #4  0xbb105e90 in bdrv_query_image_info (bs=0xcc934620, p_info=0xccc41e18, errp=0xca669118)
>   #5  0xbb105968 in bdrv_block_device_info (blk=0xcdca19f0, bs=0xcc934620, flat=false, errp=0xca6692b8)
>   #6  0xbb1063dc in bdrv_query_info (blk=0xcdca19f0, p_info=0xcd29c9a8, errp=0xca6692b8)
>   #7  0xbb106c14 in qmp_query_block (errp=0x0)
>   #8  0xbacb8e6c in hmp_info_block (mon=0xca6693d0, qdict=0xcd089790)

Great, this shows that the main loop thread is stuck waiting for the
AioContext lock.

Please post backtraces from all QEMU threads ((gdb) thread apply all bt)
so we can figure out which thread is holding up the main loop.

Thanks,
Stefan




Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-09-17 Thread Fam Zheng
On 2020-09-17 15:36, Zhenyu Ye wrote:
> Hi Stefan,
> 
> On 2020/9/14 21:27, Stefan Hajnoczi wrote:
> >>
> >> Theoretically, everything running in an iothread is asynchronous. However,
> >> some 'asynchronous' actions are not entirely non-blocking, such as
> >> io_submit(), which can block when the iodepth is too big and the I/O
> >> pressure is too high.  Doing QMP actions such as 'info block' at this
> >> time may cause a VM soft lockup.  This series can make these QMP actions safer.
> >>
> >> I constructed the scene as follows:
> >> 1. Create a VM with 4 disks, using an iothread.
> >> 2. Add pressure to the CPU on the host.  In my scene, the CPU usage exceeds 95%.
> >> 3. Add pressure to the 4 disks in the VM at the same time.  I used fio with
> >> these parameters:
> >>
> >>    fio -rw=randrw -bs=1M -size=1G -iodepth=512 -ioengine=libaio -numjobs=4
> >>
> >> 4. Do block query actions, for example via virsh:
> >>
> >>    virsh qemu-monitor-command [vm name] --hmp info block
> >>
> >> Then the VM will soft lockup; the calltrace is:
> >>
> >> [  192.311393] watchdog: BUG: soft lockup - CPU#1 stuck for 42s! [kworker/1:1:33]
> > 
> > Hi,
> > Sorry I haven't had time to investigate this myself yet.
> > 
> > Do you also have a QEMU backtrace when the hang occurs?
> > 
> > Let's find out if QEMU is stuck in the io_submit(2) syscall or whether
> > there's an issue in QEMU itself that causes the softlockup (for example,
> > aio_poll() with the global mutex held).
> 
> Sorry for the delay.
> 
> I traced the io_submit() within the guest. The maximum execution time exceeds 16s:
> 
>   guest# perf trace -p `pidof -s fio` --duration 1
>   14.808 (3843.082 ms): io_submit(ctx_id: 281472924459008, nr: 1, iocbpp: 0x19a87160) = 1
>   3861.336 (11470.608 ms): io_submit(ctx_id: 281472924459008, nr: 1, iocbpp: 0x19a87160) = 1
>   15341.998 (479.283 ms): io_submit(ctx_id: 281472924459008, nr: 1, iocbpp: 0x19a87160) = 1
>   15831.380 (3704.997 ms): io_submit(ctx_id: 281472924459008, nr: 1, iocbpp: 0x19a87160) = 1
>   19547.869 (3412.839 ms): io_submit(ctx_id: 281472924459008, nr: 1, iocbpp: 0x19a87160) = 1
>   22966.953 (1080.854 ms): io_submit(ctx_id: 281472924459008, nr: 1, iocbpp: 0x19a87160) = 1
>   24062.689 (6939.813 ms): io_submit(ctx_id: 281472924459008, nr: 1, iocbpp: 0x19a87160) = 1
>   31019.219 (532.147 ms): io_submit(ctx_id: 281472924459008, nr: 1, iocbpp: 0x19a87160) = 1
>   31556.610 (3469.920 ms): io_submit(ctx_id: 281472924459008, nr: 1, iocbpp: 0x19a87160) = 1
>   35038.885 (9007.175 ms): io_submit(ctx_id: 281472924459008, nr: 1, iocbpp: 0x19a87160) = 1
>   44053.578 (16006.405 ms): io_submit(ctx_id: 281472924459008, nr: 1, iocbpp: 0x19a87160) = 1
>   60068.944 (3068.748 ms): io_submit(ctx_id: 281472924459008, nr: 1, iocbpp: 0x19a87160) = 1
>   63138.474 (13012.499 ms): io_getevents(ctx_id: 281472924459008, min_nr: 511, nr: 511, events: 0x19a83150) = 511
>   76191.073 (2872.657 ms): io_submit(ctx_id: 281472924459008, nr: 1, iocbpp: 0x19a87160) = 1
>   ...
> 
> And in the host, the information of sys_enter_io_submit() is:
> 
>   Samples: 3K of event 'syscalls:sys_enter_io_submit', Event count (approx.): 3150
>   Children   Self   Trace output
>   -   66.70%  66.70%  ctx_id: 0x9c044000, nr: 0x0001, iocbpp: 0x9f7fad28
>   0xae7f871c
>   0xae8a27c4
>   qemu_thread_start
>   iothread_run
>   aio_poll
>   aio_dispatch_ready_handlers
>   aio_dispatch_handler
>   virtio_queue_host_notifier_aio_read
>   virtio_queue_notify_aio_vq
>   virtio_blk_data_plane_handle_output
>   virtio_blk_handle_vq
>   blk_io_unplug
>   bdrv_io_unplug
>   bdrv_io_unplug
>   raw_aio_unplug
>   laio_io_unplug
>   syscall
> 
> 
> When the hang occurs, the QEMU is blocked at:
> 
>   #0  0x95762b64 in ?? () from target:/usr/lib64/libpthread.so.0
>   #1  0x9575bd88 in pthread_mutex_lock () from target:/usr/lib64/libpthread.so.0
>   #2  0xbb1f5948 in qemu_mutex_lock_impl (mutex=0xcc8e1860, file=0xbb4e1bd0 "/Images/eillon/CODE/5-opensource/qemu/util/async.c", line=605)
>   #3  0xbb20acd4 in aio_context_acquire (ctx=0xcc8e1800)
>   #4  0xbb105e90 in bdrv_query_image_info (bs=0xcc934620, p_info=0xccc41e18, errp=0xca669118)
>   #5  0xbb105968 in bdrv_block_device_info (blk=0xcdca19f0, bs=0xcc934620, flat=false, errp=0xca6692b8)
>   #6  0xbb1063dc in bdrv_query_info (blk=0xcdca19f0, p_info=0xcd29c9a8, errp=0xca6692b8)
>   #7  0xbb106c14 in qmp_query_block (errp=0x0)
>   

Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-09-17 Thread Zhenyu Ye
Hi Daniel,

On 2020/9/14 22:42, Daniel P. Berrangé wrote:
> On Tue, Aug 11, 2020 at 09:54:08PM +0800, Zhenyu Ye wrote:
>> Hi Kevin,
>>
>> On 2020/8/10 23:38, Kevin Wolf wrote:
>>> Am 10.08.2020 um 16:52 hat Zhenyu Ye geschrieben:
 Before doing QMP actions, we need to lock the qemu_global_mutex,
 so the QMP actions should not take too long.

 Unfortunately, some QMP actions need to acquire an AioContext and
 this may take a long time.  The VM will soft lockup if this time
 is too long.
>>>
>>> Do you have a specific situation in mind where getting the lock of an
>>> AioContext can take a long time? I know that the main thread can
>>> block for considerable time, but QMP commands run in the main thread, so
>>> this patch doesn't change anything for this case. It would be effective
>>> if an iothread blocks, but shouldn't everything running in an iothread
>>> be asynchronous and therefore keep the AioContext lock only for a short
>>> time?
>>>
>>
>> Theoretically, everything running in an iothread is asynchronous. However,
>> some 'asynchronous' actions are not entirely non-blocking, such as
>> io_submit(), which can block when the iodepth is too big and the I/O pressure
>> is too high.  Doing QMP actions such as 'info block' at this time may cause
>> a VM soft lockup.  This series can make these QMP actions safer.
>>
>> I constructed the scene as follow:
>> 1. create a vm with 4 disks, using iothread.
>> 2. add press to the CPU on the host.  In my scene, the CPU usage exceeds 95%.
>> 3. add press to the 4 disks in the vm at the same time.  I used the fio and
>> some parameters are:
>>
>>   fio -rw=randrw -bs=1M -size=1G -iodepth=512 -ioengine=libaio -numjobs=4
>>
>> 4. do block query actions, for example, by virsh:
>>
>>  virsh qemu-monitor-command [vm name] --hmp info block
>>
>> Then the vm will soft lockup, the calltrace is:
> 
> [snip]
> 
>> This problem can be avoided after this series applied.
> 
> At what cost though ?   With this timeout, QEMU is going to start
> reporting bogus failures for various QMP commands when running
> under high load, even if those commands would actually run
> successfully.  This will turn into an error report from libvirt
> which will in turn probably cause an error in the mgmt application
> using libvirt, and in turn could break the user's automation.
> 

I think it's worth reporting an error to avoid a VM soft lockup.
The VM may even crash if kernel.softlockup_panic is configured!

We can increase the timeout value (to just under the VM's soft-lockup
threshold) to avoid unnecessary errors.

Thanks,
Zhenyu



Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-09-17 Thread Zhenyu Ye
Hi Stefan,

On 2020/9/14 21:27, Stefan Hajnoczi wrote:
>>
>> Theoretically, everything running in an iothread is asynchronous. However,
>> some 'asynchronous' actions are not entirely non-blocking, such as
>> io_submit(), which can block when the iodepth is too big and the I/O pressure
>> is too high.  Doing QMP actions such as 'info block' at this time may cause
>> a VM soft lockup.  This series can make these QMP actions safer.
>>
>> I constructed the scene as follow:
>> 1. create a vm with 4 disks, using iothread.
>> 2. add press to the CPU on the host.  In my scene, the CPU usage exceeds 95%.
>> 3. add press to the 4 disks in the vm at the same time.  I used the fio and
>> some parameters are:
>>
>>   fio -rw=randrw -bs=1M -size=1G -iodepth=512 -ioengine=libaio -numjobs=4
>>
>> 4. do block query actions, for example, by virsh:
>>
>>  virsh qemu-monitor-command [vm name] --hmp info block
>>
>> Then the vm will soft lockup, the calltrace is:
>>
>> [  192.311393] watchdog: BUG: soft lockup - CPU#1 stuck for 42s! 
>> [kworker/1:1:33]
> 
> Hi,
> Sorry I haven't had time to investigate this myself yet.
> 
> Do you also have a QEMU backtrace when the hang occurs?
> 
> Let's find out if QEMU is stuck in the io_submit(2) syscall or whether
> there's an issue in QEMU itself that causes the softlockup (for example,
> aio_poll() with the global mutex held).

Sorry for the delay.

I traced io_submit() within the guest. The maximum execution time exceeds
16s:

guest# perf trace -p `pidof -s fio` --duration 1
14.808 (3843.082 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
iocbpp: 0x19a87160  ) = 1
  3861.336 (11470.608 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
iocbpp: 0x19a87160  ) = 1
 15341.998 (479.283 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
iocbpp: 0x19a87160  ) = 1
 15831.380 (3704.997 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
iocbpp: 0x19a87160  ) = 1
 19547.869 (3412.839 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
iocbpp: 0x19a87160  ) = 1
 22966.953 (1080.854 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
iocbpp: 0x19a87160  ) = 1
 24062.689 (6939.813 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
iocbpp: 0x19a87160  ) = 1
 31019.219 (532.147 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
iocbpp: 0x19a87160  ) = 1
 31556.610 (3469.920 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
iocbpp: 0x19a87160  ) = 1
 35038.885 (9007.175 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
iocbpp: 0x19a87160  ) = 1
 44053.578 (16006.405 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
iocbpp: 0x19a87160  ) = 1
 60068.944 (3068.748 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
iocbpp: 0x19a87160  ) = 1
 63138.474 (13012.499 ms): io_getevents(ctx_id: 281472924459008, 
min_nr: 511, nr: 511, events: 0x19a83150) = 511
 76191.073 (2872.657 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
iocbpp: 0x19a87160  ) = 1
...

And on the host, the profile for sys_enter_io_submit() is:

Samples: 3K of event 'syscalls:sys_enter_io_submit', Event count 
(approx.): 3150
Children  Self  Trace output
-   66.70%  66.70%  ctx_id: 0x9c044000, nr: 0x0001, iocbpp: 
0x9f7fad28
0xae7f871c
0xae8a27c4
qemu_thread_start
iothread_run
aio_poll
aio_dispatch_ready_handlers
aio_dispatch_handler
virtio_queue_host_notifier_aio_read
virtio_queue_notify_aio_vq
virtio_blk_data_plane_handle_output
virtio_blk_handle_vq
blk_io_unplug
bdrv_io_unplug
bdrv_io_unplug
raw_aio_unplug
laio_io_unplug
syscall


When the hang occurs, QEMU is blocked at:

#0  0x95762b64 in ?? () from target:/usr/lib64/libpthread.so.0
#1  0x9575bd88 in pthread_mutex_lock () from 
target:/usr/lib64/libpthread.so.0
#2  0xbb1f5948 in qemu_mutex_lock_impl (mutex=0xcc8e1860,
file=0xbb4e1bd0 
"/Images/eillon/CODE/5-opensource/qemu/util/async.c", line=605)
#3  0xbb20acd4 in aio_context_acquire (ctx=0xcc8e1800)
#4  0xbb105e90 in bdrv_query_image_info (bs=0xcc934620,
p_info=0xccc41e18, errp=0xca669118)
#5  0xbb105968 in bdrv_block_device_info (blk=0xcdca19f0, 
bs=0xcc934620,
flat=false, errp=0xca6692b8)
#6  0xbb1063dc in bdrv_query_info (blk=0xcdca19f0, 
p_info=0xcd29c9a8,
errp=0xca6692b8)
#7  0xbb106c14 in qmp_query_block (errp=0x0)
#8  0xbacb8e6c in hmp_info_block (mon=0xca6693d0, 
qdict=0xcd089790)
#9  0xbb0068f0 in handle_hmp_command (mon=0xca6693d0,

Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-09-14 Thread Daniel P . Berrangé
On Tue, Aug 11, 2020 at 09:54:08PM +0800, Zhenyu Ye wrote:
> Hi Kevin,
> 
> On 2020/8/10 23:38, Kevin Wolf wrote:
> > Am 10.08.2020 um 16:52 hat Zhenyu Ye geschrieben:
> >> Before doing qmp actions, we need to lock the qemu_global_mutex,
> >> so the qmp actions should not take too long time.
> >>
> >> Unfortunately, some qmp actions need to acquire aio context and
> >> this may take a long time.  The vm will soft lockup if this time
> >> is too long.
> > 
> > Do you have a specific situation in mind where getting the lock of an
> > AioContext can take a long time? I know that the main thread can
> > block for considerable time, but QMP commands run in the main thread, so
> > this patch doesn't change anything for this case. It would be effective
> > if an iothread blocks, but shouldn't everything running in an iothread
> > be asynchronous and therefore keep the AioContext lock only for a short
> > time?
> > 
> 
> Theoretically, everything running in an iothread is asynchronous. However,
> some 'asynchronous' actions are not non-blocking entirely, such as
> io_submit().  This will block while the iodepth is too big and I/O pressure
> is too high.  If we do some qmp actions, such as 'info block', at this time,
> may cause vm soft lockup.  This series can make these qmp actions safer.
> 
> I constructed the scene as follow:
> 1. create a vm with 4 disks, using iothread.
> 2. add press to the CPU on the host.  In my scene, the CPU usage exceeds 95%.
> 3. add press to the 4 disks in the vm at the same time.  I used the fio and
> some parameters are:
> 
>fio -rw=randrw -bs=1M -size=1G -iodepth=512 -ioengine=libaio -numjobs=4
> 
> 4. do block query actions, for example, by virsh:
> 
>   virsh qemu-monitor-command [vm name] --hmp info block
> 
> Then the vm will soft lockup, the calltrace is:

[snip]

> This problem can be avoided after this series applied.

At what cost though ?   With this timeout, QEMU is going to start
reporting bogus failures for various QMP commands when running
under high load, even if those commands would actually run
successfully.  This will turn into an error report from libvirt
which will in turn probably cause an error in the mgmt application
using libvirt, and in turn could break the user's automation.


Regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|




Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-09-14 Thread Stefan Hajnoczi
On Tue, Aug 11, 2020 at 09:54:08PM +0800, Zhenyu Ye wrote:
> Hi Kevin,
> 
> On 2020/8/10 23:38, Kevin Wolf wrote:
> > Am 10.08.2020 um 16:52 hat Zhenyu Ye geschrieben:
> >> Before doing qmp actions, we need to lock the qemu_global_mutex,
> >> so the qmp actions should not take too long time.
> >>
> >> Unfortunately, some qmp actions need to acquire aio context and
> >> this may take a long time.  The vm will soft lockup if this time
> >> is too long.
> > 
> > Do you have a specific situation in mind where getting the lock of an
> > AioContext can take a long time? I know that the main thread can
> > block for considerable time, but QMP commands run in the main thread, so
> > this patch doesn't change anything for this case. It would be effective
> > if an iothread blocks, but shouldn't everything running in an iothread
> > be asynchronous and therefore keep the AioContext lock only for a short
> > time?
> > 
> 
> Theoretically, everything running in an iothread is asynchronous. However,
> some 'asynchronous' actions are not non-blocking entirely, such as
> io_submit().  This will block while the iodepth is too big and I/O pressure
> is too high.  If we do some qmp actions, such as 'info block', at this time,
> may cause vm soft lockup.  This series can make these qmp actions safer.
> 
> I constructed the scene as follow:
> 1. create a vm with 4 disks, using iothread.
> 2. add press to the CPU on the host.  In my scene, the CPU usage exceeds 95%.
> 3. add press to the 4 disks in the vm at the same time.  I used the fio and
> some parameters are:
> 
>fio -rw=randrw -bs=1M -size=1G -iodepth=512 -ioengine=libaio -numjobs=4
> 
> 4. do block query actions, for example, by virsh:
> 
>   virsh qemu-monitor-command [vm name] --hmp info block
> 
> Then the vm will soft lockup, the calltrace is:
> 
> [  192.311393] watchdog: BUG: soft lockup - CPU#1 stuck for 42s! 
> [kworker/1:1:33]

Hi,
Sorry I haven't had time to investigate this myself yet.

Do you also have a QEMU backtrace when the hang occurs?

Let's find out if QEMU is stuck in the io_submit(2) syscall or whether
there's an issue in QEMU itself that causes the softlockup (for example,
aio_poll() with the global mutex held).

Stefan




Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-08-21 Thread Stefan Hajnoczi
On Tue, Aug 11, 2020 at 09:54:08PM +0800, Zhenyu Ye wrote:
> Hi Kevin,
> 
> On 2020/8/10 23:38, Kevin Wolf wrote:
> > Am 10.08.2020 um 16:52 hat Zhenyu Ye geschrieben:
> >> Before doing qmp actions, we need to lock the qemu_global_mutex,
> >> so the qmp actions should not take too long time.
> >>
> >> Unfortunately, some qmp actions need to acquire aio context and
> >> this may take a long time.  The vm will soft lockup if this time
> >> is too long.
> > 
> > Do you have a specific situation in mind where getting the lock of an
> > AioContext can take a long time? I know that the main thread can
> > block for considerable time, but QMP commands run in the main thread, so
> > this patch doesn't change anything for this case. It would be effective
> > if an iothread blocks, but shouldn't everything running in an iothread
> > be asynchronous and therefore keep the AioContext lock only for a short
> > time?
> > 
> 
> Theoretically, everything running in an iothread is asynchronous. However,
> some 'asynchronous' actions are not non-blocking entirely, such as
> io_submit().  This will block while the iodepth is too big and I/O pressure
> is too high.  If we do some qmp actions, such as 'info block', at this time,
> may cause vm soft lockup.  This series can make these qmp actions safer.
> 
> I constructed the scene as follow:
> 1. create a vm with 4 disks, using iothread.
> 2. add press to the CPU on the host.  In my scene, the CPU usage exceeds 95%.
> 3. add press to the 4 disks in the vm at the same time.  I used the fio and
> some parameters are:
> 
>fio -rw=randrw -bs=1M -size=1G -iodepth=512 -ioengine=libaio -numjobs=4
> 
> 4. do block query actions, for example, by virsh:
> 
>   virsh qemu-monitor-command [vm name] --hmp info block

Great, thanks for describing a reproducer.

Do you also have a QEMU backtrace?

I'm curious whether io_submit(2) is blocking for 40+ seconds or whether
the softlockup is caused by QEMU code waiting to drain in-flight I/O
requests with the global mutex held.

I'll try to reproduce this here and start with:

  host# perf record -a -e syscalls:sys_enter_io_submit \
   -e syscalls:sys_exit_io_submit \
   -e kvm:kvm_exit \
   -e kvm:kvm_entry

This will allow us to check the io_submit() and vmexit->vmenter timings.
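
A sketch of how those tracepoints could be paired afterwards: feed
`perf script` output through a small parser and subtract each pid's enter
timestamp from its exit timestamp. The helper below and the assumed field
layout `<comm> <pid> [cpu] <time>: <event>: ...` are my illustration, not
perf tooling; real `perf script` output can differ between versions.

```c
#include <stdio.h>
#include <string.h>

/* Classify one `perf script` line and pull out the pid and timestamp.
 * Returns 1 for sys_enter_io_submit, 2 for sys_exit_io_submit, and
 * 0 for anything else (or an unparseable line).  Per-call latency is
 * then exit_time - enter_time for the same pid. */
static int parse_io_submit_line(const char *line, int *pid, double *t)
{
    char comm[64], cpu[16], event[128];

    if (sscanf(line, "%63s %d %15s %lf: %127s", comm, pid, cpu, t, event) != 5)
        return 0;
    if (strstr(event, "sys_enter_io_submit"))
        return 1;
    if (strstr(event, "sys_exit_io_submit"))
        return 2;
    return 0;
}
```

A driver would keep a per-pid map of enter timestamps and print the delta
whenever the matching exit arrives.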

I want to understand this better to be sure that the timeout is really
the only solution before merging this patch. My concern is that relying
on timeouts will hide QEMU bugs that could be fixed instead.

> Then the vm will soft lockup, the calltrace is:
> 
> [  192.311393] watchdog: BUG: soft lockup - CPU#1 stuck for 42s! 
> [kworker/1:1:33]
> [  192.314241] Kernel panic - not syncing: softlockup: hung tasks
> [  192.316370] CPU: 1 PID: 33 Comm: kworker/1:1 Kdump: loaded Tainted: G  
>  OEL4.19.36+ #16
> [  192.318765] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
> [  192.325638] Workqueue: events drm_fb_helper_dirty_work
> [  192.327238] Call trace:
> [  192.331528]  dump_backtrace+0x0/0x198
> [  192.332695]  show_stack+0x24/0x30
> [  192.334219]  dump_stack+0xa4/0xcc
> [  192.335578]  panic+0x12c/0x314
> [  192.336565]  watchdog_timer_fn+0x3e4/0x3e8
> [  192.339984]  __hrtimer_run_queues+0x114/0x358
> [  192.341576]  hrtimer_interrupt+0x104/0x2d8
> [  192.343247]  arch_timer_handler_virt+0x38/0x58
> [  192.345074]  handle_percpu_devid_irq+0x90/0x248
> [  192.347238]  generic_handle_irq+0x34/0x50
> [  192.349536]  __handle_domain_irq+0x68/0xc0
> [  192.351193]  gic_handle_irq+0x6c/0x150
> [  192.352639]  el1_irq+0xb8/0x140
> [  192.353855]  vp_notify+0x28/0x38 [virtio_pci]
> [  192.355646]  virtqueue_kick+0x3c/0x78 [virtio_ring]
> [  192.357539]  virtio_gpu_queue_ctrl_buffer_locked+0x180/0x248 [virtio_gpu]
> [  192.359869]  virtio_gpu_queue_ctrl_buffer+0x50/0x78 [virtio_gpu]
> [  192.361456]  virtio_gpu_cmd_resource_flush+0x8c/0xb0 [virtio_gpu]
> [  192.363422]  virtio_gpu_surface_dirty+0x60/0x110 [virtio_gpu]
> [  192.365215]  virtio_gpu_framebuffer_surface_dirty+0x34/0x48 [virtio_gpu]
> [  192.367452]  drm_fb_helper_dirty_work+0x178/0x1c0
> [  192.368912]  process_one_work+0x1b4/0x3f8
> [  192.370192]  worker_thread+0x54/0x470
> [  192.371370]  kthread+0x134/0x138
> [  192.379241]  ret_from_fork+0x10/0x18
> [  192.382688] kernel fault(0x5) notification starting on CPU 1
> [  192.385059] kernel fault(0x5) notification finished on CPU 1
> [  192.387311] SMP: stopping secondary CPUs
> [  192.391024] Kernel Offset: disabled
> [  192.392111] CPU features: 0x0,a1806008
> [  192.393306] Memory Limit: none
> [  192.396701] Starting crashdump kernel...
> [  192.397821] Bye!
> 
> This problem can be avoided after this series applied.
> 
> Thanks,
> Zhenyu
> 




Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-08-12 Thread Zhenyu Ye
Hi Stefan,

On 2020/8/12 21:51, Stefan Hajnoczi wrote:
> On Mon, Aug 10, 2020 at 10:52:44PM +0800, Zhenyu Ye wrote:
>> Before doing qmp actions, we need to lock the qemu_global_mutex,
>> so the qmp actions should not take too long time.
>>
>> Unfortunately, some qmp actions need to acquire aio context and
>> this may take a long time.  The vm will soft lockup if this time
>> is too long.
>>
>> So add a timeout mechanism while doing qmp actions.
> 
> aio_context_acquire_timeout() is a workaround for code that blocks the
> event loop. Ideally there should be no code that blocks the event loop.
> 
> Which cases have you found where the event loop is blocked?
> 

Currently I have only found that io_submit() will block when I/O pressure
is too high; for details, see:


https://lore.kernel.org/qemu-devel/c6d75e49-3e36-6a76-fdc8-cdf09e7c3...@huawei.com/

io_submit() cannot guarantee non-blocking behavior at all times.
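
To make the call being discussed concrete, here is a minimal, Linux-only
sketch of the raw linux-aio syscall sequence (my illustration via syscall(2)
and <linux/aio_abi.h>, not QEMU code). io_submit(2) is itself a synchronous
syscall: when the block layer cannot accept the request immediately (e.g. no
free tags at high iodepth), the submitting thread sleeps inside it, which is
exactly the blocking discussed above.

```c
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Submit one read via linux-aio and wait for its completion.
 * Returns the number of bytes read, or -1 on error. */
int aio_read_once(const char *path, char *buf, size_t len)
{
    aio_context_t ctx = 0;
    struct iocb cb;
    struct iocb *cbs[1] = { &cb };
    struct io_event ev;
    int fd;

    if (syscall(SYS_io_setup, 128, &ctx) < 0)
        return -1;
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    memset(&cb, 0, sizeof(cb));
    cb.aio_lio_opcode = IOCB_CMD_PREAD;
    cb.aio_fildes = fd;
    cb.aio_buf = (__u64)(unsigned long)buf;
    cb.aio_nbytes = len;

    /* "Asynchronous" submission - but this syscall itself may sleep
     * when the queue is full or the device is saturated. */
    if (syscall(SYS_io_submit, ctx, 1, cbs) != 1)
        return -1;
    if (syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL) != 1)
        return -1;

    close(fd);
    syscall(SYS_io_destroy, ctx);
    return (int)ev.res;
}
```

Under light load this completes immediately; the softlockups in this thread
come from the case where the submit path sleeps for seconds.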

> I think they should be investigated and fixed (if possible) before
> introducing an API like aio_context_acquire_timeout().
> 

We cannot ensure that everything running in an iothread is non-blocking,
because some actions appear asynchronous but can still block at times
(such as io_submit()).  Either way, the _timeout() API makes the QMP
commands that need to acquire an AioContext safer.

Thanks,
Zhenyu




Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-08-12 Thread Stefan Hajnoczi
On Mon, Aug 10, 2020 at 10:52:44PM +0800, Zhenyu Ye wrote:
> Before doing qmp actions, we need to lock the qemu_global_mutex,
> so the qmp actions should not take too long time.
> 
> Unfortunately, some qmp actions need to acquire aio context and
> this may take a long time.  The vm will soft lockup if this time
> is too long.
> 
> So add a timeout mechanism while doing qmp actions.

aio_context_acquire_timeout() is a workaround for code that blocks the
event loop. Ideally there should be no code that blocks the event loop.

Which cases have you found where the event loop is blocked?

I think they should be investigated and fixed (if possible) before
introducing an API like aio_context_acquire_timeout().

Stefan




Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-08-11 Thread Zhenyu Ye
Hi Kevin,

On 2020/8/10 23:38, Kevin Wolf wrote:
> Am 10.08.2020 um 16:52 hat Zhenyu Ye geschrieben:
>> Before doing qmp actions, we need to lock the qemu_global_mutex,
>> so the qmp actions should not take too long time.
>>
>> Unfortunately, some qmp actions need to acquire aio context and
>> this may take a long time.  The vm will soft lockup if this time
>> is too long.
> 
> Do you have a specific situation in mind where getting the lock of an
> AioContext can take a long time? I know that the main thread can
> block for considerable time, but QMP commands run in the main thread, so
> this patch doesn't change anything for this case. It would be effective
> if an iothread blocks, but shouldn't everything running in an iothread
> be asynchronous and therefore keep the AioContext lock only for a short
> time?
> 

Theoretically, everything running in an iothread is asynchronous. However,
some 'asynchronous' actions are not entirely non-blocking; for example,
io_submit() will block when the iodepth is too big and I/O pressure is
too high.  If we perform QMP actions, such as 'info block', at such a
time, the VM may soft lockup.  This series makes these QMP actions safer.

I constructed the scenario as follows:
1. Create a VM with 4 disks, using an iothread.
2. Add load to the CPU on the host.  In my scenario, CPU usage exceeds 95%.
3. Add load to the 4 disks in the VM at the same time.  I used fio with
the following parameters:

 fio -rw=randrw -bs=1M -size=1G -iodepth=512 -ioengine=libaio -numjobs=4

4. Run block query actions, for example via virsh:

virsh qemu-monitor-command [vm name] --hmp info block

Then the vm will soft lockup, the calltrace is:

[  192.311393] watchdog: BUG: soft lockup - CPU#1 stuck for 42s! 
[kworker/1:1:33]
[  192.314241] Kernel panic - not syncing: softlockup: hung tasks
[  192.316370] CPU: 1 PID: 33 Comm: kworker/1:1 Kdump: loaded Tainted: G
   OEL4.19.36+ #16
[  192.318765] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[  192.325638] Workqueue: events drm_fb_helper_dirty_work
[  192.327238] Call trace:
[  192.331528]  dump_backtrace+0x0/0x198
[  192.332695]  show_stack+0x24/0x30
[  192.334219]  dump_stack+0xa4/0xcc
[  192.335578]  panic+0x12c/0x314
[  192.336565]  watchdog_timer_fn+0x3e4/0x3e8
[  192.339984]  __hrtimer_run_queues+0x114/0x358
[  192.341576]  hrtimer_interrupt+0x104/0x2d8
[  192.343247]  arch_timer_handler_virt+0x38/0x58
[  192.345074]  handle_percpu_devid_irq+0x90/0x248
[  192.347238]  generic_handle_irq+0x34/0x50
[  192.349536]  __handle_domain_irq+0x68/0xc0
[  192.351193]  gic_handle_irq+0x6c/0x150
[  192.352639]  el1_irq+0xb8/0x140
[  192.353855]  vp_notify+0x28/0x38 [virtio_pci]
[  192.355646]  virtqueue_kick+0x3c/0x78 [virtio_ring]
[  192.357539]  virtio_gpu_queue_ctrl_buffer_locked+0x180/0x248 [virtio_gpu]
[  192.359869]  virtio_gpu_queue_ctrl_buffer+0x50/0x78 [virtio_gpu]
[  192.361456]  virtio_gpu_cmd_resource_flush+0x8c/0xb0 [virtio_gpu]
[  192.363422]  virtio_gpu_surface_dirty+0x60/0x110 [virtio_gpu]
[  192.365215]  virtio_gpu_framebuffer_surface_dirty+0x34/0x48 [virtio_gpu]
[  192.367452]  drm_fb_helper_dirty_work+0x178/0x1c0
[  192.368912]  process_one_work+0x1b4/0x3f8
[  192.370192]  worker_thread+0x54/0x470
[  192.371370]  kthread+0x134/0x138
[  192.379241]  ret_from_fork+0x10/0x18
[  192.382688] kernel fault(0x5) notification starting on CPU 1
[  192.385059] kernel fault(0x5) notification finished on CPU 1
[  192.387311] SMP: stopping secondary CPUs
[  192.391024] Kernel Offset: disabled
[  192.392111] CPU features: 0x0,a1806008
[  192.393306] Memory Limit: none
[  192.396701] Starting crashdump kernel...
[  192.397821] Bye!

This problem can be avoided after this series applied.

Thanks,
Zhenyu




Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-08-10 Thread Kevin Wolf
Am 10.08.2020 um 16:52 hat Zhenyu Ye geschrieben:
> Before doing qmp actions, we need to lock the qemu_global_mutex,
> so the qmp actions should not take too long time.
> 
> Unfortunately, some qmp actions need to acquire aio context and
> this may take a long time.  The vm will soft lockup if this time
> is too long.

Do you have a specific situation in mind where getting the lock of an
AioContext can take a long time? I know that the main thread can
block for considerable time, but QMP commands run in the main thread, so
this patch doesn't change anything for this case. It would be effective
if an iothread blocks, but shouldn't everything running in an iothread
be asynchronous and therefore keep the AioContext lock only for a short
time?

Kevin




[PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-08-10 Thread Zhenyu Ye
Before performing QMP actions, we need to take the qemu_global_mutex,
so QMP actions should not take too long.

Unfortunately, some QMP actions need to acquire an AioContext, and
this may take a long time.  The VM will soft lockup if it takes
too long.

So add a timeout mechanism for QMP actions.

Zhenyu Ye (2):
  util: introduce aio_context_acquire_timeout
  qmp: use aio_context_acquire_timeout replace aio_context_acquire

 block/qapi-sysemu.c |  7 ++-
 block/qapi.c|  6 +-
 blockdev.c  | 35 ++-
 include/block/aio.h |  6 ++
 include/qemu/thread-posix.h |  1 +
 include/qemu/thread.h   |  1 +
 util/async.c| 10 ++
 util/qemu-thread-posix.c|  6 ++
 8 files changed, 65 insertions(+), 7 deletions(-)

-- 
2.22.0.windows.1