Re: A non-responsive guest problem

2011-08-29 Thread Paul
Hi,
After changing the guest clock source from kvm-clock to tsc, everything
is OK. It is probably a kvm-clock bug. Maybe it has already been fixed
in the latest qemu.
Thanks,
Paul

On Wed, Aug 24, 2011 at 8:47 PM, Paul  wrote:
>
> Hi,
>
> Sometimes this problem happened in one day, but sometimes it was very
> difficult to reproduce it.
> Previously the clock source of the guest was kvm-clock. Now I changed
> it to tsc. The problem didn't occur until now. Is it related to the
> clock source? I  find that there are some bug fixes for kvm-clock
> recently. (e.g.,
> http://www.spinics.net/lists/stable-commits/msg11942.html) Anyway, I
> will update KVM later.
>
> Thanks,
> Paul
>
> On Wed, Aug 24, 2011 at 6:24 PM, Stefan Hajnoczi  wrote:
> >
> > On Wed, Aug 24, 2011 at 10:02 AM, Xiao Guangrong
> >  wrote:
> > > On 08/24/2011 04:40 PM, Paul wrote:
> > >> Hi,
> > >>
> > >> I captured the output of pidstat when the problem was reproduced:
> > >>
> > >> bash-4.1# pidstat -p $PID 8966
> > > >> Linux 2.6.32-71.el6.x86_64 (test)     07/24/11        _x86_64_        (4 CPU)
> > >>
> > >> 16:25:15          PID    %usr %system  %guest    %CPU   CPU  Command
> > >> 16:25:15         8966    0.14   55.04  115.41  170.59     1  qemu-kvm
> > >>
> > >
> > > I have tried to reproduce it, but it was failed. I am using the
> > > current KVM code. I suggest you to test it by the new code if possible.
> >
> > Yes, that's a good idea.  The issue might already be fixed.  But if
> > this is hard to reproduce then perhaps keep the spinning guest around
> > a bit longer so we can poke at it and figure out what is happening.
> >
> > The pidstat output shows us that it's the guest that is spinning, not
> > qemu-kvm userspace.
> >
> > The system time (time spent in host kernel) is also quite high so
> > running kvm_stat might show some interesting KVM events happening.
> >
> > Stefan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
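
For anyone who wants to check or repeat the clock source change described
in this message, here is a minimal sketch in Python. It assumes the usual
Linux sysfs layout under /sys/devices/system/clocksource/clocksource0; the
function names read_clocksource and set_clocksource are made up for this
illustration, and writing the file needs root inside the guest.

```python
from pathlib import Path

# Usual sysfs location of the clocksource controls on a Linux guest
# (an assumption about the guest, not something stated in the thread).
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base=CLOCKSOURCE_DIR):
    """Return (current, available) clock sources read from sysfs."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def set_clocksource(name, base=CLOCKSOURCE_DIR):
    """Switch the running kernel to `name`; refuses names the kernel does not list."""
    _, available = read_clocksource(base)
    if name not in available:
        raise ValueError(f"{name!r} not offered by this kernel: {available}")
    (Path(base) / "current_clocksource").write_text(name + "\n")
```

Calling read_clocksource() inside the guest should report kvm-clock or tsc
as the current source. A change made this way lasts only until the next
reboot; to keep it, the clocksource=tsc kernel boot parameter is the usual
route.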


Re: A non-responsive guest problem

2011-08-24 Thread Paul
Hi,

Sometimes this problem happened within a day, but sometimes it was very
difficult to reproduce.
Previously the clock source of the guest was kvm-clock. Now I have changed
it to tsc, and the problem has not occurred since. Is it related to the
clock source? I found that there have been some bug fixes for kvm-clock
recently (e.g.,
http://www.spinics.net/lists/stable-commits/msg11942.html). Anyway, I
will update KVM later.

Thanks,
Paul

On Wed, Aug 24, 2011 at 6:24 PM, Stefan Hajnoczi  wrote:
>
> On Wed, Aug 24, 2011 at 10:02 AM, Xiao Guangrong
>  wrote:
> > On 08/24/2011 04:40 PM, Paul wrote:
> >> Hi,
> >>
> >> I captured the output of pidstat when the problem was reproduced:
> >>
> >> bash-4.1# pidstat -p $PID 8966
> >> Linux 2.6.32-71.el6.x86_64 (test)     07/24/11        _x86_64_        (4 CPU)
> >>
> >> 16:25:15          PID    %usr %system  %guest    %CPU   CPU  Command
> >> 16:25:15         8966    0.14   55.04  115.41  170.59     1  qemu-kvm
> >>
> >
> > I have tried to reproduce it, but it was failed. I am using the
> > current KVM code. I suggest you to test it by the new code if possible.
>
> Yes, that's a good idea.  The issue might already be fixed.  But if
> this is hard to reproduce then perhaps keep the spinning guest around
> a bit longer so we can poke at it and figure out what is happening.
>
> The pidstat output shows us that it's the guest that is spinning, not
> qemu-kvm userspace.
>
> The system time (time spent in host kernel) is also quite high so
> running kvm_stat might show some interesting KVM events happening.
>
> Stefan


Re: A non-responsive guest problem

2011-08-24 Thread Stefan Hajnoczi
On Wed, Aug 24, 2011 at 10:02 AM, Xiao Guangrong
 wrote:
> On 08/24/2011 04:40 PM, Paul wrote:
>> Hi,
>>
>> I captured the output of pidstat when the problem was reproduced:
>>
>> bash-4.1# pidstat -p $PID 8966
>> Linux 2.6.32-71.el6.x86_64 (test)     07/24/11        _x86_64_        (4 CPU)
>>
>> 16:25:15          PID    %usr %system  %guest    %CPU   CPU  Command
>> 16:25:15         8966    0.14   55.04  115.41  170.59     1  qemu-kvm
>>
>
> I have tried to reproduce it, but it was failed. I am using the
> current KVM code. I suggest you to test it by the new code if possible.

Yes, that's a good idea.  The issue might already be fixed.  But if
this is hard to reproduce then perhaps keep the spinning guest around
a bit longer so we can poke at it and figure out what is happening.

The pidstat output shows us that it's the guest that is spinning, not
qemu-kvm userspace.

The system time (time spent in host kernel) is also quite high so
running kvm_stat might show some interesting KVM events happening.

Stefan
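
As a rough illustration of the kvm_stat idea: to my understanding, kvm_stat
can take its numbers from per-event counters that KVM exposes in debugfs. The
sketch below samples such counters twice and ranks them by rate. It assumes
debugfs is mounted at /sys/kernel/debug and that the counters are plain
numeric files; which counter names exist depends on the host kernel, and the
function names here are hypothetical.

```python
import time
from pathlib import Path

KVM_DEBUGFS = Path("/sys/kernel/debug/kvm")  # assumed debugfs mount point


def snapshot(base=KVM_DEBUGFS):
    """Read every plain numeric counter file under `base` into a dict."""
    counters = {}
    for entry in Path(base).iterdir():
        if entry.is_file():
            try:
                counters[entry.name] = int(entry.read_text().strip())
            except (ValueError, OSError):
                pass  # skip unreadable or non-numeric entries
    return counters


def rates(before, after, seconds):
    """Per-second increase of each counter present in both snapshots."""
    return {
        name: (after[name] - before[name]) / seconds
        for name in after
        if name in before
    }


def top_events(base=KVM_DEBUGFS, seconds=1.0, limit=10):
    """Sample twice, `seconds` apart, and return the busiest counters first."""
    before = snapshot(base)
    time.sleep(seconds)
    after = snapshot(base)
    ranked = sorted(rates(before, after, seconds).items(),
                    key=lambda item: item[1], reverse=True)
    return ranked[:limit]
```

Run as root on the host while the guest is spinning, top_events() would show
which kinds of exits dominate, which is the sort of hint kvm_stat gives.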


Re: A non-responsive guest problem

2011-08-24 Thread Xiao Guangrong
On 08/24/2011 04:40 PM, Paul wrote:
> Hi,
> 
> I captured the output of pidstat when the problem was reproduced:
> 
> bash-4.1# pidstat -p $PID 8966
> Linux 2.6.32-71.el6.x86_64 (test)     07/24/11        _x86_64_        (4 CPU)
> 
> 16:25:15          PID    %usr %system  %guest    %CPU   CPU  Command
> 16:25:15         8966    0.14   55.04  115.41  170.59     1  qemu-kvm
> 

I have tried to reproduce it, but failed. I am using the
current KVM code. I suggest you test with the new code if possible.


Re: A non-responsive guest problem

2011-08-24 Thread Paul
Hi,

I captured the output of pidstat when the problem was reproduced:

bash-4.1# pidstat -p $PID 8966
Linux 2.6.32-71.el6.x86_64 (test)     07/24/11        _x86_64_        (4 CPU)

16:25:15          PID    %usr %system  %guest    %CPU   CPU  Command
16:25:15         8966    0.14   55.04  115.41  170.59     1  qemu-kvm

Thanks,
Paul

On Tue, Aug 23, 2011 at 6:09 PM, Stefan Hajnoczi  wrote:
>
> On Tue, Aug 23, 2011 at 9:10 AM, Paul  wrote:
> > From trace messages, it seemed no interrupts for guest.
> > I also tried sysrq, but it didn't work. I doubt that kvm-qemu entered
> > some infinite loop.
>
> The fact that a fresh VNC connection to the guest works (but the mouse
> doesn't move) means that qemu-kvm itself is not completely locked up.
> The VNC server runs in a qemu-kvm thread.
>
> So this seems to be a problem inside the guest that causes it to
> consume 100% CPU.
>
> One way to confirm this is to run pidstat(1):
> $ pidstat -p $PID 1
> 11:05:51          PID    %usr %system  %guest    %CPU   CPU  Command
> 11:06:05        26994   65.00    0.00   98.00  163.00     1  kvm
>
> The %guest value is the percentage spent executing guest code.  The
> %usr time is the percentage spent executing qemu-kvm userspace code.
> I'm guessing you will see >80% %guest.
>
> In my example I was running while true; do true; done inside the guest :).
>
> Perhaps Avi can suggest kvm_stat or other techniques to discover what
> exactly this guest is doing.
>
> Stefan


Re: A non-responsive guest problem

2011-08-23 Thread Stefan Hajnoczi
On Tue, Aug 23, 2011 at 9:10 AM, Paul  wrote:
> From trace messages, it seemed no interrupts for guest.
> I also tried sysrq, but it didn't work. I doubt that kvm-qemu entered
> some infinite loop.

The fact that a fresh VNC connection to the guest works (but the mouse
doesn't move) means that qemu-kvm itself is not completely locked up.
The VNC server runs in a qemu-kvm thread.

So this seems to be a problem inside the guest that causes it to
consume 100% CPU.

One way to confirm this is to run pidstat(1):
$ pidstat -p $PID 1
11:05:51          PID    %usr %system  %guest    %CPU   CPU  Command
11:06:05        26994   65.00    0.00   98.00  163.00     1  kvm

The %guest value is the percentage spent executing guest code.  The
%usr time is the percentage spent executing qemu-kvm userspace code.
I'm guessing you will see >80% %guest.

In my example I was running while true; do true; done inside the guest :).

Perhaps Avi can suggest kvm_stat or other techniques to discover what
exactly this guest is doing.

Stefan
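
The pidstat reading described above can be sketched in a few lines of Python.
This is only an illustration: it assumes the eight-column layout shown in this
thread (time, PID, %usr, %system, %guest, %CPU, CPU, Command) with 24-hour
timestamps, and the names PidstatSample, parse_pidstat_line and dominant_mode
are invented for the example. Other sysstat versions may print extra columns.

```python
from dataclasses import dataclass


@dataclass
class PidstatSample:
    pid: int
    usr: float
    system: float
    guest: float
    cpu_total: float
    command: str


def parse_pidstat_line(line):
    """Parse one data row of pidstat output in the layout assumed above."""
    fields = line.split()
    # fields: time, PID, %usr, %system, %guest, %CPU, CPU, Command
    return PidstatSample(
        pid=int(fields[1]),
        usr=float(fields[2]),
        system=float(fields[3]),
        guest=float(fields[4]),
        cpu_total=float(fields[5]),
        command=fields[7],
    )


def dominant_mode(sample):
    """Name the mode (guest, usr or system) that accounts for the most CPU time."""
    modes = {"guest": sample.guest, "usr": sample.usr, "system": sample.system}
    return max(modes, key=modes.get)
```

Fed the sample row above (98.00 %guest against 65.00 %usr), dominant_mode
returns "guest", which is the case where the guest itself is spinning rather
than qemu-kvm userspace.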


Re: A non-responsive guest problem

2011-08-23 Thread Paul
Hi,

From the trace messages, it seemed there were no interrupts for the guest.
I also tried SysRq, but it didn't work. I suspect that qemu-kvm entered
some infinite loop.

Thanks,
Paul

On Mon, Aug 22, 2011 at 8:10 PM, Stefan Hajnoczi  wrote:
> On Mon, Aug 22, 2011 at 10:37 AM, Paul  wrote:
>> Hi,
>> I found the clock of guest OS has been changed. For example, today was Aug
>> 22, but I found the time of guest was Mar 22 from the VNC desktop. The clock
>> source of guest was kvm-clock. Was it related to KVM clock bug? How about it
>> if I changed the clock to tsc?
>
> If the guest is using 100% CPU but the kernel is still responsive at
> some level you can use SysRq to gather information:
> http://en.wikipedia.org/wiki/Magic_SysRq_key
>
> Especially Alt+SysRQ+t is interesting because it prints the current
> tasks to the console.
>
> If you are able to get interactive access again then top(1) would be
> interesting.
>
> Stefan
>


Re: A non-responsive guest problem

2011-08-22 Thread Stefan Hajnoczi
On Mon, Aug 22, 2011 at 10:37 AM, Paul  wrote:
> Hi,
> I found the clock of guest OS has been changed. For example, today was Aug
> 22, but I found the time of guest was Mar 22 from the VNC desktop. The clock
> source of guest was kvm-clock. Was it related to KVM clock bug? How about it
> if I changed the clock to tsc?

If the guest is using 100% CPU but the kernel is still responsive at
some level you can use SysRq to gather information:
http://en.wikipedia.org/wiki/Magic_SysRq_key

Alt+SysRq+t is especially interesting because it prints the current
tasks to the console.

If you are able to get interactive access again then top(1) would be
interesting.

Stefan


Re: A non-responsive guest problem

2011-08-22 Thread Paul
Hi,

I found that the clock of the guest OS had changed. For example, today was
Aug 22, but the guest showed Mar 22 on the VNC desktop. The clock source of
the guest was kvm-clock. Is this related to a kvm-clock bug? What if I
change the clock source to tsc?

Thanks,
Paul

On Fri, Aug 19, 2011 at 9:18 PM, Stefan Hajnoczi  wrote:
>
> On Thu, Aug 18, 2011 at 8:42 AM, Paul  wrote:
> > Today I saw the guest OS hung and was no responsive. In the host, I
> > found the guest was running via virsh command. But I couldn't use ssh
> > to connect the guest, and even couldn't ping it. I could use VNC saw
> > the desktop of VNC, but I couldn't move the mouse pointer. In the
> > host, the qemu-kvm process occupied almost 100% CPU.
> >
> > The host was Redhat Enterprise Linux 6 64bit (not SP1).  The CPU was
> > Intel quad-core Q9550S. The guest was SUSE Linux Enterprise Server 11
> > SP1 64bit. The guest had been running for two weeks before it hung.
>
> Any interesting messages in dmesg on the host?
>
> Stefan


Re: A non-responsive guest problem

2011-08-19 Thread Stefan Hajnoczi
On Thu, Aug 18, 2011 at 8:42 AM, Paul  wrote:
> Today I saw the guest OS hung and was no responsive. In the host, I
> found the guest was running via virsh command. But I couldn't use ssh
> to connect the guest, and even couldn't ping it. I could use VNC saw
> the desktop of VNC, but I couldn't move the mouse pointer. In the
> host, the qemu-kvm process occupied almost 100% CPU.
>
> The host was Redhat Enterprise Linux 6 64bit (not SP1).  The CPU was
> Intel quad-core Q9550S. The guest was SUSE Linux Enterprise Server 11
> SP1 64bit. The guest had been running for two weeks before it hung.

Any interesting messages in dmesg on the host?

Stefan


Re: A non-responsive guest problem

2011-08-18 Thread Paul
Hi,

This problem has happened more than twice, but I don't know how to
reproduce it. Sometimes the mouse pointer in the VNC desktop could
still be moved, but sometimes it couldn't.


Thanks,
Paul

On Thu, Aug 18, 2011 at 3:42 PM, Paul  wrote:
>
> Hi,
> Today I saw the guest OS hung and was no responsive. In the host, I
> found the guest was running via virsh command. But I couldn't use ssh
> to connect the guest, and even couldn't ping it. I could use VNC saw
> the desktop of VNC, but I couldn't move the mouse pointer. In the
> host, the qemu-kvm process occupied almost 100% CPU.
>
> The host was Redhat Enterprise Linux 6 64bit (not SP1).  The CPU was
> Intel quad-core Q9550S. The guest was SUSE Linux Enterprise Server 11
> SP1 64bit. The guest had been running for two weeks before it hung.
>
> Here are some KVM trace messages:
>         qemu-kvm-32604 [002] 3252503.178924: kvm_exit: reason ext_irq rip 0x81396eb8
>         qemu-kvm-32604 [002] 3252503.178924: kvm_entry: vcpu 1
>         qemu-kvm-32606 [003] 3252503.179049: kvm_exit: reason ext_irq rip 0x81396ebb
>         qemu-kvm-32606 [003] 3252503.179049: kvm_entry: vcpu 3
>            <...>-32603 [000] 3252503.179673: kvm_exit: reason ext_irq rip 0x810578c6
>            <...>-32603 [000] 3252503.179673: kvm_entry: vcpu 0
>            <...>-32605 [001] 3252503.179797: kvm_exit: reason ext_irq rip 0x81396ebb
>            <...>-32605 [001] 3252503.179798: kvm_entry: vcpu 2
>         qemu-kvm-32604 [002] 3252503.179923: kvm_exit: reason ext_irq rip 0x81396ebb
>         qemu-kvm-32604 [002] 3252503.179923: kvm_entry: vcpu 1
>         qemu-kvm-32606 [003] 3252503.180047: kvm_exit: reason ext_irq rip 0x81396ebb
>         qemu-kvm-32606 [003] 3252503.180048: kvm_entry: vcpu 3
>            <...>-32603 [000] 3252503.180672: kvm_exit: reason ext_irq rip 0x8105788c
>            <...>-32603 [000] 3252503.180672: kvm_entry: vcpu 0
>            <...>-32605 [001] 3252503.180796: kvm_exit: reason ext_irq rip 0x81396ebb
>            <...>-32605 [001] 3252503.180797: kvm_entry: vcpu 2
>         qemu-kvm-32604 [002] 3252503.180921: kvm_exit: reason ext_irq rip 0x81396ebb
>         qemu-kvm-32604 [002] 3252503.180922: kvm_entry: vcpu 1
>         qemu-kvm-32606 [003] 3252503.181046: kvm_exit: reason ext_irq rip 0x81396ebb
>         qemu-kvm-32606 [003] 3252503.181047: kvm_entry: vcpu 3
>            <...>-32603 [000] 3252503.181670: kvm_exit: reason ext_irq rip 0x81057878
>            <...>-32603 [000] 3252503.181671: kvm_entry: vcpu 0
>            <...>-32605 [001] 3252503.181795: kvm_exit: reason ext_irq rip 0x81396ebb
>            <...>-32605 [001] 3252503.181795: kvm_entry: vcpu 2
>         qemu-kvm-32604 [002] 3252503.181920: kvm_exit: reason ext_irq rip 0x81396ebb
>         qemu-kvm-32604 [002] 3252503.181921: kvm_entry: vcpu 1
>         qemu-kvm-32606 [003] 3252503.182045: kvm_exit: reason ext_irq rip 0x81396ebb
>         qemu-kvm-32606 [003] 3252503.182046: kvm_entry: vcpu 3
>
> Was it interrupt storm? Or the guest entered some dead loop? Are
> latest KVM code solve the similar problems?
>
> Thanks,
> Paul


A non-responsive guest problem

2011-08-18 Thread Paul
Hi,
Today I saw that the guest OS had hung and was not responsive. On the
host, virsh showed the guest as running, but I couldn't connect to the
guest with ssh and couldn't even ping it. I could connect with VNC and
see the desktop, but I couldn't move the mouse pointer. On the host,
the qemu-kvm process occupied almost 100% CPU.

The host was Red Hat Enterprise Linux 6 64-bit (not SP1).  The CPU was
an Intel quad-core Q9550S. The guest was SUSE Linux Enterprise Server 11
SP1 64-bit. The guest had been running for two weeks before it hung.

Here are some KVM trace messages:
        qemu-kvm-32604 [002] 3252503.178924: kvm_exit: reason ext_irq rip 0x81396eb8
        qemu-kvm-32604 [002] 3252503.178924: kvm_entry: vcpu 1
        qemu-kvm-32606 [003] 3252503.179049: kvm_exit: reason ext_irq rip 0x81396ebb
        qemu-kvm-32606 [003] 3252503.179049: kvm_entry: vcpu 3
           <...>-32603 [000] 3252503.179673: kvm_exit: reason ext_irq rip 0x810578c6
           <...>-32603 [000] 3252503.179673: kvm_entry: vcpu 0
           <...>-32605 [001] 3252503.179797: kvm_exit: reason ext_irq rip 0x81396ebb
           <...>-32605 [001] 3252503.179798: kvm_entry: vcpu 2
        qemu-kvm-32604 [002] 3252503.179923: kvm_exit: reason ext_irq rip 0x81396ebb
        qemu-kvm-32604 [002] 3252503.179923: kvm_entry: vcpu 1
        qemu-kvm-32606 [003] 3252503.180047: kvm_exit: reason ext_irq rip 0x81396ebb
        qemu-kvm-32606 [003] 3252503.180048: kvm_entry: vcpu 3
           <...>-32603 [000] 3252503.180672: kvm_exit: reason ext_irq rip 0x8105788c
           <...>-32603 [000] 3252503.180672: kvm_entry: vcpu 0
           <...>-32605 [001] 3252503.180796: kvm_exit: reason ext_irq rip 0x81396ebb
           <...>-32605 [001] 3252503.180797: kvm_entry: vcpu 2
        qemu-kvm-32604 [002] 3252503.180921: kvm_exit: reason ext_irq rip 0x81396ebb
        qemu-kvm-32604 [002] 3252503.180922: kvm_entry: vcpu 1
        qemu-kvm-32606 [003] 3252503.181046: kvm_exit: reason ext_irq rip 0x81396ebb
        qemu-kvm-32606 [003] 3252503.181047: kvm_entry: vcpu 3
           <...>-32603 [000] 3252503.181670: kvm_exit: reason ext_irq rip 0x81057878
           <...>-32603 [000] 3252503.181671: kvm_entry: vcpu 0
           <...>-32605 [001] 3252503.181795: kvm_exit: reason ext_irq rip 0x81396ebb
           <...>-32605 [001] 3252503.181795: kvm_entry: vcpu 2
        qemu-kvm-32604 [002] 3252503.181920: kvm_exit: reason ext_irq rip 0x81396ebb
        qemu-kvm-32604 [002] 3252503.181921: kvm_entry: vcpu 1
        qemu-kvm-32606 [003] 3252503.182045: kvm_exit: reason ext_irq rip 0x81396ebb
        qemu-kvm-32606 [003] 3252503.182046: kvm_entry: vcpu 3

Was it an interrupt storm? Or did the guest enter some dead loop? Does
the latest KVM code solve similar problems?

Thanks,
Paul
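
One way to look at the interrupt storm question is to measure how often each
vCPU thread exits. The sketch below is a minimal Python example that pulls the
kvm_exit events out of trace lines like the ones above and computes the gap
between consecutive exits per thread. The regular expression and the function
name exit_intervals are assumptions made for this illustration, not part of
any KVM tool.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   qemu-kvm-32604 [002] 3252503.178924: kvm_exit: reason ext_irq rip 0x81396eb8
EXIT_RE = re.compile(
    r"-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+(?P<ts>\d+\.\d+):"
    r"\s+kvm_exit:\s+reason\s+(?P<reason>\S+)"
)


def exit_intervals(lines, reason="ext_irq"):
    """Map each thread id to the gaps (in ms) between its kvm_exit events."""
    stamps = defaultdict(list)
    for line in lines:
        m = EXIT_RE.search(line)
        if m and m.group("reason") == reason:
            stamps[int(m.group("pid"))].append(float(m.group("ts")))
    return {
        pid: [round((b - a) * 1000.0, 3) for a, b in zip(ts, ts[1:])]
        for pid, ts in stamps.items()
    }
```

In the trace above, thread 32604 exits at .178924, .179923, .180921 and
.181920, so the gaps are about 1 ms, i.e. roughly 1000 ext_irq exits per
second per vCPU. That rate might simply be a periodic timer tick on the host
rather than a storm, but that reading is an assumption and not something the
trace alone proves.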