Re: VMD consumes 100% cpu after unpausing guest

2018-02-27 Thread Pratik Vyas

* Dave Voutila  [2018-02-27 21:29:25 -0500]:


> I can confirm this patch resolves the issue I reported. I _think_ I'm
> seeing a similar CPU load drop as well, but definitely have
> paused/unpaused the guest multiple times without issues.

Thanks, Dave and Peter, for testing. I will commit this.

I cannot explain a general decrease in CPU load, because these lines are
only on the code path when you unpause or receive a VM.

* Peter Hessler  [2018-02-27 11:16:52 +0100]:


> (btw, should rtc_fireper() receive a similar change?)

rtc_fireper is unrelated to the cause of this. rtc_reschedule_per does
an event_add for rtc_fireper only if required, and from then on
rtc_fireper keeps re-arming itself with event_add.
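The division of labour described above (rtc_reschedule_per arms the periodic timer only when required, and rtc_fireper keeps it going by re-arming itself) can be illustrated with a toy, self-contained model. This is not vmd's code: `toy_timer`, `toy_reschedule_per()`, `toy_fireper()`, `toy_run()` and the simulated clock are hypothetical stand-ins for the libevent timer, rtc_reschedule_per() and rtc_fireper(). It assumes a one-shot timer (as with evtimer_add()) and takes the guest's PIE bit as a plain flag.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: a simulated clock and one one-shot timer. */
struct toy_timer {
	int      armed;
	uint64_t deadline_us;
	uint64_t interval_us;
};

static uint64_t toy_now_us;		/* simulated monotonic clock */
static struct toy_timer toy_per;	/* plays the role of the periodic event */
static unsigned long toy_fires;

/* One-shot arm, like evtimer_add(): fires once, interval_us from now. */
static void
toy_timer_add(struct toy_timer *t, uint64_t interval_us)
{
	t->armed = 1;
	t->interval_us = interval_us;
	t->deadline_us = toy_now_us + interval_us;
}

/* Modeled on rtc_reschedule_per(): arm only if the guest enabled PIE. */
static void
toy_reschedule_per(int pie_enabled, uint64_t interval_us)
{
	if (pie_enabled)
		toy_timer_add(&toy_per, interval_us);
}

/* Modeled on rtc_fireper(): do the work, then re-arm itself. */
static void
toy_fireper(void)
{
	toy_fires++;
	toy_timer_add(&toy_per, toy_per.interval_us);
}

/* Advance the simulated clock to end_us, dispatching the timer when due. */
static unsigned long
toy_run(uint64_t end_us)
{
	toy_fires = 0;
	while (toy_per.armed && toy_per.deadline_us <= end_us) {
		/* A zero interval would never advance the clock: a busy loop. */
		assert(toy_per.interval_us > 0);
		toy_now_us = toy_per.deadline_us;
		toy_per.armed = 0;	/* one-shot: disarmed once dispatched */
		toy_fireper();
	}
	toy_now_us = end_us;
	return toy_fires;
}

/* Reset the simulation between scenarios. */
static void
toy_reset(void)
{
	toy_now_us = 0;
	toy_per.armed = 0;
}
```

For example, `toy_reset(); toy_reschedule_per(1, 976); toy_run(1000000)` returns 1024 (a 976 us interval over one simulated second), while the same run with `pie_enabled` set to 0 returns 0: nothing is armed, so the self-re-arming callback never starts.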

--
Pratik



Re: VMD consumes 100% cpu after unpausing guest

2018-02-27 Thread Dave Voutila
Peter Hessler <phess...@openbsd.org> writes:

> On 2018 Feb 26 (Mon) at 18:52:34 -0800 (-0800), Pratik Vyas wrote:
> :* Dave Voutila <d...@sisu.io> [2018-02-22 23:40:21 -0500]:
> :
> :> > Synopsis:    VMD consumes 100% cpu after unpausing guest
> :> > Category:    amd64
> :> > Environment:
> :>System  : OpenBSD 6.2
> :>Details : OpenBSD 6.2-current (GENERIC.MP) #10: Wed Feb 21 21:26:27 MST 2018
> :>          dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> :> 
> :>Architecture: OpenBSD.amd64
> :>Machine : amd64
> :> 
> :> > Description:
> :> 
> :>Not sure if this is a known issue, but I couldn't find anything
> :> searching the lists.
> :> 
> :> Using an Alpine Linux guest vm, I can successfully pause the guest using
> :> `vmctl pause 1` and some time later resume it using `vmctl unpause 1`.
> :> 
> :> Unpausing works as the guest comes back to life, I can SSH back in, and
> :> it's fine. However, on the host the vmd process representing that guest
> :> sits at 100% CPU utilization with 1 thread constantly queueing onto a
> :> cpu and running. The guest reports normal load so it must be one of the
> :> 2 threads.
> :
> :This should fix it.
> :
> :Use rtc_reschedule_per in mc146818_start instead of re-arming the
> :periodic interrupt without checking if it's enabled in REGB.
> :
> :ok?
> :
> :--
> :Pratik
> :
> :Index: usr.sbin/vmd/mc146818.c
> :===
> :RCS file: /home/pdvyas/cvs/src/usr.sbin/vmd/mc146818.c,v
> :retrieving revision 1.15
> :diff -u -p -a -u -r1.15 mc146818.c
> :--- usr.sbin/vmd/mc146818.c  9 Jul 2017 00:51:40 -   1.15
> :+++ usr.sbin/vmd/mc146818.c  27 Feb 2018 02:47:18 -
> :@@ -354,6 +354,6 @@ mc146818_stop()
> :void
> :mc146818_start()
> :{
> :-evtimer_add(, _tv);
> : evtimer_add(, _tv);
> :+rtc_reschedule_per();
> :}
> :
>
> This helps a lot with the CPU load on a vmd host.  Drops my single guest
> from ~50% CPU to ~9% CPU on the host.

I can confirm this patch resolves the issue I reported. I _think_ I'm
seeing a similar CPU load drop as well, but definitely have
paused/unpaused the guest multiple times without issues.




Re: VMD consumes 100% cpu after unpausing guest

2018-02-27 Thread Peter Hessler
On 2018 Feb 26 (Mon) at 18:52:34 -0800 (-0800), Pratik Vyas wrote:
:* Dave Voutila <d...@sisu.io> [2018-02-22 23:40:21 -0500]:
:
:> > Synopsis:  VMD consumes 100% cpu after unpausing guest
:> > Category:  amd64
:> > Environment:
:>  System  : OpenBSD 6.2
:>  Details : OpenBSD 6.2-current (GENERIC.MP) #10: Wed Feb 21 21:26:27 MST 2018
:>            dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
:> 
:>  Architecture: OpenBSD.amd64
:>  Machine : amd64
:> 
:> > Description:
:> 
:>Not sure if this is a known issue, but I couldn't find anything
:> searching the lists.
:> 
:> Using an Alpine Linux guest vm, I can successfully pause the guest using
:> `vmctl pause 1` and some time later resume it using `vmctl unpause 1`.
:> 
:> Unpausing works as the guest comes back to life, I can SSH back in, and
:> it's fine. However, on the host the vmd process representing that guest
:> sits at 100% CPU utilization with 1 thread constantly queueing onto a
:> cpu and running. The guest reports normal load so it must be one of the
:> 2 threads.
:
:This should fix it.
:
:Use rtc_reschedule_per in mc146818_start instead of re-arming the
:periodic interrupt without checking if it's enabled in REGB.
:
:ok?
:
:--
:Pratik
:
:Index: usr.sbin/vmd/mc146818.c
:===
:RCS file: /home/pdvyas/cvs/src/usr.sbin/vmd/mc146818.c,v
:retrieving revision 1.15
:diff -u -p -a -u -r1.15 mc146818.c
:--- usr.sbin/vmd/mc146818.c  9 Jul 2017 00:51:40 -   1.15
:+++ usr.sbin/vmd/mc146818.c  27 Feb 2018 02:47:18 -
:@@ -354,6 +354,6 @@ mc146818_stop()
:void
:mc146818_start()
:{
:-  evtimer_add(, _tv);
:   evtimer_add(, _tv);
:+  rtc_reschedule_per();
:}
:

This helps a lot with the CPU load on a vmd host.  Drops my single guest
from ~50% CPU to ~9% CPU on the host.

OK

(btw, should rtc_fireper() receive a similar change?)





Re: VMD consumes 100% cpu after unpausing guest

2018-02-26 Thread Pratik Vyas

* Dave Voutila <d...@sisu.io> [2018-02-22 23:40:21 -0500]:


> >Synopsis:      VMD consumes 100% cpu after unpausing guest
> >Category:      amd64
> >Environment:
> System  : OpenBSD 6.2
> Details : OpenBSD 6.2-current (GENERIC.MP) #10: Wed Feb 21 21:26:27 MST 2018
>           dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
> Architecture: OpenBSD.amd64
> Machine : amd64
>
> >Description:
>
> Not sure if this is a known issue, but I couldn't find anything
> searching the lists.
>
> Using an Alpine Linux guest vm, I can successfully pause the guest using
> `vmctl pause 1` and some time later resume it using `vmctl unpause 1`.
>
> Unpausing works as the guest comes back to life, I can SSH back in, and
> it's fine. However, on the host the vmd process representing that guest
> sits at 100% CPU utilization with 1 thread constantly queueing onto a
> cpu and running. The guest reports normal load so it must be one of the
> 2 threads.


This should fix it.

Use rtc_reschedule_per in mc146818_start instead of re-arming the
periodic interrupt without checking if it's enabled in REGB.

ok?

--
Pratik

Index: usr.sbin/vmd/mc146818.c
===
RCS file: /home/pdvyas/cvs/src/usr.sbin/vmd/mc146818.c,v
retrieving revision 1.15
diff -u -p -a -u -r1.15 mc146818.c
--- usr.sbin/vmd/mc146818.c 9 Jul 2017 00:51:40 -   1.15
+++ usr.sbin/vmd/mc146818.c 27 Feb 2018 02:47:18 -
@@ -354,6 +354,6 @@ mc146818_stop()
void
mc146818_start()
{
-   evtimer_add(, _tv);
evtimer_add(, _tv);
+   rtc_reschedule_per();
}
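The check the patch relies on ("is the periodic interrupt enabled in REGB?") can be sketched as two small helpers. This is a hedged sketch based on the MC146818 register layout (REGB bit 6 is PIE; REGA bits 0-3 are the rate select), not vmd's implementation: the `TOY_*` macros and `toy_*` functions are hypothetical, and the interval table assumes the 32.768 kHz time base.

```c
#include <assert.h>
#include <stdint.h>

/* MC146818 register bits (macro names are hypothetical). */
#define TOY_REGA_RS_MASK 0x0f	/* rate select, bits 0-3 */
#define TOY_REGB_PIE     0x40	/* periodic interrupt enable, bit 6 */

/* Should the periodic timer be (re)armed at all? */
static int
toy_should_arm_periodic(uint8_t rega, uint8_t regb)
{
	return (regb & TOY_REGB_PIE) != 0 && (rega & TOY_REGA_RS_MASK) != 0;
}

/*
 * Periodic interrupt interval in microseconds, assuming a 32.768 kHz
 * time base. Callers must check toy_should_arm_periodic() first:
 * RS == 0 means "no periodic interrupt", and a timer armed with a
 * 0 interval would fire continuously.
 */
static uint64_t
toy_period_us(uint8_t rega)
{
	unsigned int rs = rega & TOY_REGA_RS_MASK;

	assert(rs != 0);
	if (rs < 3)
		rs += 7;	/* RS=1,2 give the same rates as RS=8,9 */
	/* period = 2^(rs - 1) / 32768 seconds, truncated to whole us */
	return ((uint64_t)1 << (rs - 1)) * 1000000 / 32768;
}
```

With the common REGA value 0x26 (32.768 kHz base, rate select 6), `toy_period_us(0x26)` yields 976 us (about 1024 Hz). `toy_should_arm_periodic(0x26, 0x02)` is 0 because PIE is clear, so a start routine gated on it would leave the periodic timer alone, which is the behaviour the patch restores.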



Re: VMD consumes 100% cpu after unpausing guest

2018-02-23 Thread Pratik Vyas

* Dave Voutila <d...@sisu.io> [2018-02-22 23:40:21 -0500]:


> >Synopsis:      VMD consumes 100% cpu after unpausing guest
> >Category:      amd64
> >Environment:
> System  : OpenBSD 6.2
> Details : OpenBSD 6.2-current (GENERIC.MP) #10: Wed Feb 21 21:26:27 MST 2018
>           dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
> Architecture: OpenBSD.amd64
> Machine : amd64
>
> >Description:
>
> Not sure if this is a known issue, but I couldn't find anything
> searching the lists.
>
> Using an Alpine Linux guest vm, I can successfully pause the guest using
> `vmctl pause 1` and some time later resume it using `vmctl unpause 1`.
>
> Unpausing works as the guest comes back to life, I can SSH back in, and
> it's fine. However, on the host the vmd process representing that guest
> sits at 100% CPU utilization with 1 thread constantly queueing onto a
> cpu and running. The guest reports normal load so it must be one of the
> 2 threads.


Thanks, Dave, for the report. I can reproduce this after a receive as well.
mc146818_start probably doesn't do the right thing. I will report back
when I find a solution.

--
Pratik



VMD consumes 100% cpu after unpausing guest

2018-02-22 Thread Dave Voutila
>Synopsis:      VMD consumes 100% cpu after unpausing guest
>Category:  amd64
>Environment:
System  : OpenBSD 6.2
Details : OpenBSD 6.2-current (GENERIC.MP) #10: Wed Feb 21 21:26:27 MST 2018
          dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64

>Description:

Not sure if this is a known issue, but I couldn't find anything
searching the lists.

Using an Alpine Linux guest vm, I can successfully pause the guest using
`vmctl pause 1` and some time later resume it using `vmctl unpause 1`. 

Unpausing works as the guest comes back to life, I can SSH back in, and
it's fine. However, on the host the vmd process representing that guest
sits at 100% CPU utilization with 1 thread constantly queueing onto a
cpu and running. The guest reports normal load so it must be one of the
2 threads.

Taking a ktrace of that particular thread, and slimming for sake of
email, it's constantly calling clock_gettime and kevent:

CALL  futex(0x7361d183cd0,0x2,1,0,0)
RET   futex 0
CALL  kevent(5,0,0,0x7361d17c800,64,0x735f272b7c0)
STRU  struct timespec
RET   kevent 0
CALL  clock_gettime(CLOCK_MONOTONIC,0x735f272b860)
STRU  struct timespec
RET   clock_gettime 0
CALL  kevent(5,0,0,0x7361d17c800,64,0x735f272b7c0)
STRU  struct timespec
RET   kevent 0
CALL  clock_gettime(CLOCK_MONOTONIC,0x735f272b860)
STRU  struct timespec
RET   clock_gettime 0
CALL  kevent(5,0,0,0x7361d17c800,64,0x735f272b7c0)
STRU  struct timespec
RET   kevent 0
CALL  clock_gettime(CLOCK_MONOTONIC,0x735f272b860)
STRU  struct timespec
RET   clock_gettime 0
CALL  kevent(5,0,0,0x7361d17c800,64,0x735f272b7c0)
STRU  struct timespec
RET   kevent 0
CALL  clock_gettime(CLOCK_MONOTONIC,0x735f272b860)
STRU  struct timespec
RET   clock_gettime 0
CALL  kevent(5,0,0,0x7361d17c800,64,0x735f272b7c0)
STRU  struct timespec
RET   kevent 0
...etc.
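The trace has the shape of an event loop whose nearest timer is always already due: kevent() is handed a timeout, returns 0 events at once, the loop reads the clock and goes round again. One plausible reading, consistent with the fix discussed in this thread, is that the periodic RTC timer was being re-armed with an interval that was zero or already expired; that is an inference, not something the trace alone proves. The hedged sketch below reproduces the effect on any POSIX system. It uses poll() rather than kevent() so it also runs off OpenBSD, and `loop_passes()` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <time.h>

/* Microseconds elapsed on CLOCK_MONOTONIC since *start. */
static int64_t
elapsed_us(const struct timespec *start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return (int64_t)(now.tv_sec - start->tv_sec) * 1000000 +
	    (now.tv_nsec - start->tv_nsec) / 1000;
}

/*
 * Run an event-loop-shaped loop for window_us, waiting timeout_ms in
 * poll() on each pass (no descriptors, so it is a pure timeout), and
 * return how many passes were made.
 */
static unsigned long
loop_passes(int timeout_ms, int64_t window_us)
{
	struct timespec start;
	unsigned long passes = 0;

	clock_gettime(CLOCK_MONOTONIC, &start);
	while (elapsed_us(&start) < window_us) {
		int rc = poll(NULL, 0, timeout_ms);

		assert(rc == 0);	/* timed out, no events: like "RET kevent 0" */
		(void)rc;
		passes++;
	}
	return passes;
}
```

On a typical machine `loop_passes(0, 10000)` makes thousands of passes in 10 ms, while `loop_passes(1, 10000)` makes at most about 10, because each poll() waits at least its timeout. That gap is the difference between an idle vmd thread and one pinned at 100% CPU.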

VMD reports nothing strange, which I'd expect as the guest vm is
perfectly functional during this period even while that thread
burns up the CPU:

startup
/etc/vm.conf:3: switch "uplink" registered
vm_register: registering vm 1   
/etc/vm.conf:12: vm "alpine" registered (disabled)
vm_priv_brconfig: interface bridge0 description switch1-uplink
vmd_configure: not creating vm alpine (disabled)
config_setconfig: setting config
config_getconfig: retrieving config
config_getconfig: retrieving config
config_getconfig: retrieving config
vm_opentty: vm alpine tty /dev/ttyp5 uid 1000 gid 4 mode 620
vm_register: registering vm 1
vm_priv_ifconfig: interface tap0 description vm1-if0-alpine
vm_priv_ifconfig: switch "uplink" interface bridge0 add tap0
alpine: started vm 1 successfully, tty /dev/ttyp5
loadfile_bios: loaded BIOS image
run_vm: initializing hardware for vm alpine
virtio_init: vm "alpine" vio0 lladdr fe:e1:bb:d1:1b:bd
run_vm: starting vcpu threads for vm alpine
vcpu_reset: resetting vcpu 0 for vm 3
run_vm: waiting on events for VM alpine
i8259_write_datareg: master pic, reset IRQ vector to 0x8
i8259_write_datareg: slave pic, reset IRQ vector to 0x70
vcpu_exit_i8253: channel 0 reset, mode=0, start=65535
virtio_blk_io: device reset
virtio_blk_io: device reset
vcpu_process_com_lcr: set baudrate = 115200
vcpu_process_com_lcr: set baudrate = 115200
i8259_write_datareg: master pic, reset IRQ vector to 0x30
i8259_write_datareg: slave pic, reset IRQ vector to 0x38
vcpu_process_com_lcr: set baudrate = 115200
vcpu_exit_i8253: channel 0 reset, mode=7, start=3977
vcpu_exit_i8253: channel 2 reset, mode=7, start=65535
vcpu_exit_i8253: channel 2 reset, mode=7, start=65535
vcpu_exit_i8253: channel 2 reset, mode=7, start=65535
vcpu_exit_i8253: channel 2 reset, mode=7, start=65535
vcpu_process_com_lcr: set baudrate = 115200
vcpu_process_com_data: guest reading com1 when not ready
vcpu_process_com_data: guest reading com1 when not ready
vcpu_process_com_data: guest reading com1 when not ready
vcpu_process_com_lcr: set baudrate = 115200
virtio_blk_io: device reset
virtio_blk_io: device reset
virtio_net_io: device reset
alpine: paused vm 1 successfully
alpine: unpaused vm 1 successfully.
rtc_update_rega: set non-32KHz timebase not supported
rtc_fire1: RTC clock drift (44s), requesting guest resync
rtc_update_rega: set non-32KHz timebase not supported

>How-To-Repeat:
Pause an actively running linux guest: `vmctl pause 1`
After some time, resume the guest: `vmctl unpause 1`
Observe CPU utilization of matching VMD process.

>Fix:
Unknown. Stopping the guest through either having it halt or 
`vmctl stop ` obviously ends the cpu consumption.

dmesg:
OpenBSD 6.2-current (GENERIC.MP) #10: Wed Feb 21 21:26:27 MST 2018
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 17053851648 (16263MB)