Re: A non-responsive guest problem
Hi,

After changing the guest clock source from kvm-clock to tsc, everything
is OK. It is probably a kvm-clock bug. Maybe these bugs have already
been fixed in the latest qemu.

Thanks,
Paul

On Wed, Aug 24, 2011 at 8:47 PM, Paul wrote:
>
> Hi,
>
> Sometimes this problem happened within a day, but sometimes it was very
> difficult to reproduce.
> Previously the clock source of the guest was kvm-clock. Now I have
> changed it to tsc. The problem has not occurred since. Is it related to
> the clock source? I see that there have been some bug fixes for
> kvm-clock recently (e.g.,
> http://www.spinics.net/lists/stable-commits/msg11942.html). Anyway, I
> will update KVM later.
>
> Thanks,
> Paul
>
> On Wed, Aug 24, 2011 at 6:24 PM, Stefan Hajnoczi wrote:
> >
> > On Wed, Aug 24, 2011 at 10:02 AM, Xiao Guangrong wrote:
> > > On 08/24/2011 04:40 PM, Paul wrote:
> > >> Hi,
> > >>
> > >> I captured the output of pidstat when the problem was reproduced:
> > >>
> > >> bash-4.1# pidstat -p $PID 8966
> > >> Linux 2.6.32-71.el6.x86_64 (test) 07/24/11 _x86_64_ (4 CPU)
> > >>
> > >> 16:25:15      PID    %usr %system  %guest    %CPU   CPU  Command
> > >> 16:25:15     8966    0.14   55.04  115.41  170.59     1  qemu-kvm
> > >>
> > >
> > > I tried to reproduce it but failed. I am using the current KVM
> > > code. I suggest you test with the new code if possible.
> >
> > Yes, that's a good idea. The issue might already be fixed. But if
> > this is hard to reproduce then perhaps keep the spinning guest around
> > a bit longer so we can poke at it and figure out what is happening.
> >
> > The pidstat output shows us that it's the guest that is spinning, not
> > qemu-kvm userspace.
> >
> > The system time (time spent in host kernel) is also quite high so
> > running kvm_stat might show some interesting KVM events happening.
> >
> > Stefan

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
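The thread does not say how the clock source was switched. On a Linux guest the active and available clock sources are normally visible under sysfs, so a minimal sketch for checking them before and after such a change could look like this. The helper names are illustrative and not from the thread; the sysfs path is the usual Linux location and is taken as an assumption for this SLES guest.

```python
from pathlib import Path

# Usual sysfs location of the clock source controls on Linux (assumed here).
# It is a parameter so the helpers can be pointed at another directory.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, available) clock sources read from sysfs files."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def can_switch_to(target, base=CLOCKSOURCE_DIR):
    """True if the kernel lists `target` (for example "tsc") as available."""
    _, available = read_clocksources(base)
    return target in available
```

Run inside the guest, `read_clocksources()` would be expected to report `kvm-clock` before the change and `tsc` after it. The switch itself is usually made by writing the name to `current_clocksource` as root, or with the `clocksource=` kernel boot parameter.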
Re: A non-responsive guest problem
Hi,

Sometimes this problem happened within a day, but sometimes it was very
difficult to reproduce.
Previously the clock source of the guest was kvm-clock. Now I have
changed it to tsc. The problem has not occurred since. Is it related to
the clock source? I see that there have been some bug fixes for
kvm-clock recently (e.g.,
http://www.spinics.net/lists/stable-commits/msg11942.html). Anyway, I
will update KVM later.

Thanks,
Paul

On Wed, Aug 24, 2011 at 6:24 PM, Stefan Hajnoczi wrote:
>
> On Wed, Aug 24, 2011 at 10:02 AM, Xiao Guangrong wrote:
> > On 08/24/2011 04:40 PM, Paul wrote:
> >> Hi,
> >>
> >> I captured the output of pidstat when the problem was reproduced:
> >>
> >> bash-4.1# pidstat -p $PID 8966
> >> Linux 2.6.32-71.el6.x86_64 (test) 07/24/11 _x86_64_ (4 CPU)
> >>
> >> 16:25:15      PID    %usr %system  %guest    %CPU   CPU  Command
> >> 16:25:15     8966    0.14   55.04  115.41  170.59     1  qemu-kvm
> >>
> >
> > I tried to reproduce it but failed. I am using the current KVM
> > code. I suggest you test with the new code if possible.
>
> Yes, that's a good idea. The issue might already be fixed. But if
> this is hard to reproduce then perhaps keep the spinning guest around
> a bit longer so we can poke at it and figure out what is happening.
>
> The pidstat output shows us that it's the guest that is spinning, not
> qemu-kvm userspace.
>
> The system time (time spent in host kernel) is also quite high so
> running kvm_stat might show some interesting KVM events happening.
>
> Stefan
Re: A non-responsive guest problem
On Wed, Aug 24, 2011 at 10:02 AM, Xiao Guangrong wrote:
> On 08/24/2011 04:40 PM, Paul wrote:
>> Hi,
>>
>> I captured the output of pidstat when the problem was reproduced:
>>
>> bash-4.1# pidstat -p $PID 8966
>> Linux 2.6.32-71.el6.x86_64 (test) 07/24/11 _x86_64_ (4 CPU)
>>
>> 16:25:15      PID    %usr %system  %guest    %CPU   CPU  Command
>> 16:25:15     8966    0.14   55.04  115.41  170.59     1  qemu-kvm
>>
>
> I tried to reproduce it but failed. I am using the current KVM
> code. I suggest you test with the new code if possible.

Yes, that's a good idea. The issue might already be fixed. But if
this is hard to reproduce then perhaps keep the spinning guest around
a bit longer so we can poke at it and figure out what is happening.

The pidstat output shows us that it's the guest that is spinning, not
qemu-kvm userspace.

The system time (time spent in host kernel) is also quite high so
running kvm_stat might show some interesting KVM events happening.

Stefan
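As a rough illustration of what kvm_stat-style output boils down to, the sketch below takes two snapshots of event counters and reports events per second, busiest first. It is not the kvm_stat implementation. The default directory is where older kernels expose KVM counters through debugfs and is an assumption for any given host; the function names are made up for this example.

```python
from pathlib import Path


def read_counters(stat_dir="/sys/kernel/debug/kvm"):
    """Read every integer counter file in stat_dir into a dict.

    The default path is an assumption (debugfs KVM statistics, root only).
    """
    counters = {}
    for entry in Path(stat_dir).iterdir():
        if entry.is_file():
            try:
                counters[entry.name] = int(entry.read_text().strip())
            except (ValueError, OSError):
                continue  # skip non-numeric or unreadable entries
    return counters


def counter_rates(before, after, seconds):
    """Return [(name, events_per_second)] sorted by rate, busiest first."""
    rates = {
        name: (after[name] - before[name]) / seconds
        for name in after
        if name in before
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)
```

Typical use would be to call `read_counters()` twice a few seconds apart on the host while the guest is spinning and pass both snapshots to `counter_rates()`; counters that dominate the list are the events worth looking at first.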
Re: A non-responsive guest problem
On 08/24/2011 04:40 PM, Paul wrote:
> Hi,
>
> I captured the output of pidstat when the problem was reproduced:
>
> bash-4.1# pidstat -p $PID 8966
> Linux 2.6.32-71.el6.x86_64 (test) 07/24/11 _x86_64_ (4 CPU)
>
> 16:25:15      PID    %usr %system  %guest    %CPU   CPU  Command
> 16:25:15     8966    0.14   55.04  115.41  170.59     1  qemu-kvm
>

I tried to reproduce it but failed. I am using the current KVM
code. I suggest you test with the new code if possible.
Re: A non-responsive guest problem
Hi,

I captured the output of pidstat when the problem was reproduced:

bash-4.1# pidstat -p $PID 8966
Linux 2.6.32-71.el6.x86_64 (test) 07/24/11 _x86_64_ (4 CPU)

16:25:15      PID    %usr %system  %guest    %CPU   CPU  Command
16:25:15     8966    0.14   55.04  115.41  170.59     1  qemu-kvm

Thanks,
Paul

On Tue, Aug 23, 2011 at 6:09 PM, Stefan Hajnoczi wrote:
>
> On Tue, Aug 23, 2011 at 9:10 AM, Paul wrote:
> > From the trace messages, it seemed there were no interrupts for the
> > guest. I also tried SysRq, but it didn't work. I suspect that
> > qemu-kvm entered an infinite loop.
>
> The fact that a fresh VNC connection to the guest works (but the mouse
> doesn't move) means that qemu-kvm itself is not completely locked up.
> The VNC server runs in a qemu-kvm thread.
>
> So this seems to be a problem inside the guest that causes it to
> consume 100% CPU.
>
> One way to confirm this is to run pidstat(1):
> $ pidstat -p $PID 1
> 11:05:51      PID    %usr %system  %guest    %CPU   CPU  Command
> 11:06:05    26994   65.00    0.00   98.00  163.00     1  kvm
>
> The %guest value is the percentage spent executing guest code. The
> %usr time is the percentage spent executing qemu-kvm userspace code.
> I'm guessing you will see >80% %guest.
>
> In my example I was running while true; do true; done inside the guest :).
>
> Perhaps Avi can suggest kvm_stat or other techniques to discover what
> exactly this guest is doing.
>
> Stefan
Re: A non-responsive guest problem
On Tue, Aug 23, 2011 at 9:10 AM, Paul wrote:
> From the trace messages, it seemed there were no interrupts for the
> guest. I also tried SysRq, but it didn't work. I suspect that
> qemu-kvm entered an infinite loop.

The fact that a fresh VNC connection to the guest works (but the mouse
doesn't move) means that qemu-kvm itself is not completely locked up.
The VNC server runs in a qemu-kvm thread.

So this seems to be a problem inside the guest that causes it to
consume 100% CPU.

One way to confirm this is to run pidstat(1):
$ pidstat -p $PID 1
11:05:51      PID    %usr %system  %guest    %CPU   CPU  Command
11:06:05    26994   65.00    0.00   98.00  163.00     1  kvm

The %guest value is the percentage spent executing guest code. The
%usr time is the percentage spent executing qemu-kvm userspace code.
I'm guessing you will see >80% %guest.

In my example I was running while true; do true; done inside the guest :).

Perhaps Avi can suggest kvm_stat or other techniques to discover what
exactly this guest is doing.

Stefan
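To make the %usr / %system / %guest reading concrete, here is a small sketch that maps one pidstat sample onto its column names and reports which mode dominates. The function names are illustrative only. The sample row is the one Paul posted in this thread for the hung guest.

```python
def parse_pidstat(header, row):
    """Map pidstat column names to the values in one sample row."""
    return dict(zip(header.split(), row.split()))


def dominant_mode(sample):
    """Return the column (%usr, %system or %guest) with the largest share."""
    shares = {key: float(sample[key]) for key in ("%usr", "%system", "%guest")}
    return max(shares, key=shares.get)


# Header and sample as reported for qemu-kvm process 8966 in this thread.
header = "16:25:15 PID %usr %system %guest %CPU CPU Command"
row = "16:25:15 8966 0.14 55.04 115.41 170.59 1 qemu-kvm"

sample = parse_pidstat(header, row)
print(dominant_mode(sample))  # prints "%guest"
```

For that sample the three columns add up to the reported %CPU (0.14 + 55.04 + 115.41 = 170.59), with roughly two thirds of the time spent in guest code and almost none in qemu-kvm userspace.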
Re: A non-responsive guest problem
Hi,

From the trace messages, it seemed there were no interrupts for the
guest. I also tried SysRq, but it didn't work. I suspect that
qemu-kvm entered an infinite loop.

Thanks,
Paul

On Mon, Aug 22, 2011 at 8:10 PM, Stefan Hajnoczi wrote:
> On Mon, Aug 22, 2011 at 10:37 AM, Paul wrote:
>> Hi,
>> I found that the guest OS clock had changed. For example, today was
>> Aug 22, but the VNC desktop showed the guest date as Mar 22. The guest
>> clock source was kvm-clock. Is this related to a kvm-clock bug? What
>> if I change the clock source to tsc?
>
> If the guest is using 100% CPU but the kernel is still responsive at
> some level you can use SysRq to gather information:
> http://en.wikipedia.org/wiki/Magic_SysRq_key
>
> Especially Alt+SysRq+t is interesting because it prints the current
> tasks to the console.
>
> If you are able to get interactive access again then top(1) would be
> interesting.
>
> Stefan
Re: A non-responsive guest problem
On Mon, Aug 22, 2011 at 10:37 AM, Paul wrote:
> Hi,
> I found that the guest OS clock had changed. For example, today was
> Aug 22, but the VNC desktop showed the guest date as Mar 22. The guest
> clock source was kvm-clock. Is this related to a kvm-clock bug? What
> if I change the clock source to tsc?

If the guest is using 100% CPU but the kernel is still responsive at
some level you can use SysRq to gather information:
http://en.wikipedia.org/wiki/Magic_SysRq_key

Especially Alt+SysRq+t is interesting because it prints the current
tasks to the console.

If you are able to get interactive access again then top(1) would be
interesting.

Stefan
Re: A non-responsive guest problem
Hi,

I found that the guest OS clock had changed. For example, today was
Aug 22, but the VNC desktop showed the guest date as Mar 22. The guest
clock source was kvm-clock. Is this related to a kvm-clock bug? What
if I change the clock source to tsc?

Thanks,
Paul

On Fri, Aug 19, 2011 at 9:18 PM, Stefan Hajnoczi wrote:
>
> On Thu, Aug 18, 2011 at 8:42 AM, Paul wrote:
> > Today I saw that the guest OS had hung and was not responsive. On the
> > host, virsh showed the guest as running, but I couldn't connect to the
> > guest with ssh and couldn't even ping it. I could see the guest desktop
> > over VNC, but I couldn't move the mouse pointer. On the host, the
> > qemu-kvm process occupied almost 100% CPU.
> >
> > The host was Red Hat Enterprise Linux 6 64-bit (not SP1). The CPU was an
> > Intel quad-core Q9550S. The guest was SUSE Linux Enterprise Server 11
> > SP1 64-bit. The guest had been running for two weeks before it hung.
>
> Any interesting messages in dmesg on the host?
>
> Stefan
Re: A non-responsive guest problem
On Thu, Aug 18, 2011 at 8:42 AM, Paul wrote:
> Today I saw that the guest OS had hung and was not responsive. On the
> host, virsh showed the guest as running, but I couldn't connect to the
> guest with ssh and couldn't even ping it. I could see the guest desktop
> over VNC, but I couldn't move the mouse pointer. On the host, the
> qemu-kvm process occupied almost 100% CPU.
>
> The host was Red Hat Enterprise Linux 6 64-bit (not SP1). The CPU was an
> Intel quad-core Q9550S. The guest was SUSE Linux Enterprise Server 11
> SP1 64-bit. The guest had been running for two weeks before it hung.

Any interesting messages in dmesg on the host?

Stefan
Re: A non-responsive guest problem
Hi,

This problem has happened more than twice, but I don't know how to
reproduce it. Sometimes the mouse pointer on the VNC desktop could be
moved, and sometimes it couldn't.

Thanks,
Paul

On Thu, Aug 18, 2011 at 3:42 PM, Paul wrote:
>
> Hi,
> Today I saw that the guest OS had hung and was not responsive. On the
> host, virsh showed the guest as running, but I couldn't connect to the
> guest with ssh and couldn't even ping it. I could see the guest desktop
> over VNC, but I couldn't move the mouse pointer. On the host, the
> qemu-kvm process occupied almost 100% CPU.
>
> The host was Red Hat Enterprise Linux 6 64-bit (not SP1). The CPU was an
> Intel quad-core Q9550S. The guest was SUSE Linux Enterprise Server 11
> SP1 64-bit. The guest had been running for two weeks before it hung.
>
> Here are some KVM trace messages:
> qemu-kvm-32604 [002] 3252503.178924: kvm_exit: reason ext_irq rip 0x81396eb8
> qemu-kvm-32604 [002] 3252503.178924: kvm_entry: vcpu 1
> qemu-kvm-32606 [003] 3252503.179049: kvm_exit: reason ext_irq rip 0x81396ebb
> qemu-kvm-32606 [003] 3252503.179049: kvm_entry: vcpu 3
> <...>-32603 [000] 3252503.179673: kvm_exit: reason ext_irq rip 0x810578c6
> <...>-32603 [000] 3252503.179673: kvm_entry: vcpu 0
> <...>-32605 [001] 3252503.179797: kvm_exit: reason ext_irq rip 0x81396ebb
> <...>-32605 [001] 3252503.179798: kvm_entry: vcpu 2
> qemu-kvm-32604 [002] 3252503.179923: kvm_exit: reason ext_irq rip 0x81396ebb
> qemu-kvm-32604 [002] 3252503.179923: kvm_entry: vcpu 1
> qemu-kvm-32606 [003] 3252503.180047: kvm_exit: reason ext_irq rip 0x81396ebb
> qemu-kvm-32606 [003] 3252503.180048: kvm_entry: vcpu 3
> <...>-32603 [000] 3252503.180672: kvm_exit: reason ext_irq rip 0x8105788c
> <...>-32603 [000] 3252503.180672: kvm_entry: vcpu 0
> <...>-32605 [001] 3252503.180796: kvm_exit: reason ext_irq rip 0x81396ebb
> <...>-32605 [001] 3252503.180797: kvm_entry: vcpu 2
> qemu-kvm-32604 [002] 3252503.180921: kvm_exit: reason ext_irq rip 0x81396ebb
> qemu-kvm-32604 [002] 3252503.180922: kvm_entry: vcpu 1
> qemu-kvm-32606 [003] 3252503.181046: kvm_exit: reason ext_irq rip 0x81396ebb
> qemu-kvm-32606 [003] 3252503.181047: kvm_entry: vcpu 3
> <...>-32603 [000] 3252503.181670: kvm_exit: reason ext_irq rip 0x81057878
> <...>-32603 [000] 3252503.181671: kvm_entry: vcpu 0
> <...>-32605 [001] 3252503.181795: kvm_exit: reason ext_irq rip 0x81396ebb
> <...>-32605 [001] 3252503.181795: kvm_entry: vcpu 2
> qemu-kvm-32604 [002] 3252503.181920: kvm_exit: reason ext_irq rip 0x81396ebb
> qemu-kvm-32604 [002] 3252503.181921: kvm_entry: vcpu 1
> qemu-kvm-32606 [003] 3252503.182045: kvm_exit: reason ext_irq rip 0x81396ebb
> qemu-kvm-32606 [003] 3252503.182046: kvm_entry: vcpu 3
>
> Was it an interrupt storm? Or did the guest enter an infinite loop?
> Does the latest KVM code solve similar problems?
>
> Thanks,
> Paul
A non-responsive guest problem
Hi,

Today I saw that the guest OS had hung and was not responsive. On the
host, virsh showed the guest as running, but I couldn't connect to the
guest with ssh and couldn't even ping it. I could see the guest desktop
over VNC, but I couldn't move the mouse pointer. On the host, the
qemu-kvm process occupied almost 100% CPU.

The host was Red Hat Enterprise Linux 6 64-bit (not SP1). The CPU was an
Intel quad-core Q9550S. The guest was SUSE Linux Enterprise Server 11
SP1 64-bit. The guest had been running for two weeks before it hung.

Here are some KVM trace messages:
qemu-kvm-32604 [002] 3252503.178924: kvm_exit: reason ext_irq rip 0x81396eb8
qemu-kvm-32604 [002] 3252503.178924: kvm_entry: vcpu 1
qemu-kvm-32606 [003] 3252503.179049: kvm_exit: reason ext_irq rip 0x81396ebb
qemu-kvm-32606 [003] 3252503.179049: kvm_entry: vcpu 3
<...>-32603 [000] 3252503.179673: kvm_exit: reason ext_irq rip 0x810578c6
<...>-32603 [000] 3252503.179673: kvm_entry: vcpu 0
<...>-32605 [001] 3252503.179797: kvm_exit: reason ext_irq rip 0x81396ebb
<...>-32605 [001] 3252503.179798: kvm_entry: vcpu 2
qemu-kvm-32604 [002] 3252503.179923: kvm_exit: reason ext_irq rip 0x81396ebb
qemu-kvm-32604 [002] 3252503.179923: kvm_entry: vcpu 1
qemu-kvm-32606 [003] 3252503.180047: kvm_exit: reason ext_irq rip 0x81396ebb
qemu-kvm-32606 [003] 3252503.180048: kvm_entry: vcpu 3
<...>-32603 [000] 3252503.180672: kvm_exit: reason ext_irq rip 0x8105788c
<...>-32603 [000] 3252503.180672: kvm_entry: vcpu 0
<...>-32605 [001] 3252503.180796: kvm_exit: reason ext_irq rip 0x81396ebb
<...>-32605 [001] 3252503.180797: kvm_entry: vcpu 2
qemu-kvm-32604 [002] 3252503.180921: kvm_exit: reason ext_irq rip 0x81396ebb
qemu-kvm-32604 [002] 3252503.180922: kvm_entry: vcpu 1
qemu-kvm-32606 [003] 3252503.181046: kvm_exit: reason ext_irq rip 0x81396ebb
qemu-kvm-32606 [003] 3252503.181047: kvm_entry: vcpu 3
<...>-32603 [000] 3252503.181670: kvm_exit: reason ext_irq rip 0x81057878
<...>-32603 [000] 3252503.181671: kvm_entry: vcpu 0
<...>-32605 [001] 3252503.181795: kvm_exit: reason ext_irq rip 0x81396ebb
<...>-32605 [001] 3252503.181795: kvm_entry: vcpu 2
qemu-kvm-32604 [002] 3252503.181920: kvm_exit: reason ext_irq rip 0x81396ebb
qemu-kvm-32604 [002] 3252503.181921: kvm_entry: vcpu 1
qemu-kvm-32606 [003] 3252503.182045: kvm_exit: reason ext_irq rip 0x81396ebb
qemu-kvm-32606 [003] 3252503.182046: kvm_entry: vcpu 3

Was it an interrupt storm? Or did the guest enter an infinite loop?
Does the latest KVM code solve similar problems?

Thanks,
Paul
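One way to approach the interrupt-storm question is to measure how often each vCPU thread exits. The sketch below parses trace lines of the shape shown in this message and returns the gaps between successive kvm_exit events per thread, in microseconds. The regular expression and function name are assumptions written for this example, not part of any KVM tool, and the sample is only the four kvm_exit lines of thread 32604 copied from the trace.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   qemu-kvm-32604 [002] 3252503.178924: kvm_exit: reason ext_irq rip 0x81396eb8
LINE_RE = re.compile(
    r"^\s*(?P<task>\S+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+"
    r"(?P<sec>\d+)\.(?P<usec>\d{6}):\s+(?P<event>\w+):\s*(?P<args>.*)$"
)


def exit_intervals_us(lines):
    """Return {pid: [microseconds between successive kvm_exit events]}."""
    last_seen = {}
    intervals = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if not match or match["event"] != "kvm_exit":
            continue
        pid = int(match["pid"])
        # Integer microseconds avoid floating-point rounding on large stamps.
        stamp = int(match["sec"]) * 1_000_000 + int(match["usec"])
        if pid in last_seen:
            intervals[pid].append(stamp - last_seen[pid])
        last_seen[pid] = stamp
    return dict(intervals)


# The four kvm_exit events of thread 32604, copied from the trace.
trace = """\
qemu-kvm-32604 [002] 3252503.178924: kvm_exit: reason ext_irq rip 0x81396eb8
qemu-kvm-32604 [002] 3252503.179923: kvm_exit: reason ext_irq rip 0x81396ebb
qemu-kvm-32604 [002] 3252503.180921: kvm_exit: reason ext_irq rip 0x81396ebb
qemu-kvm-32604 [002] 3252503.181920: kvm_exit: reason ext_irq rip 0x81396ebb
""".splitlines()

for pid, gaps in exit_intervals_us(trace).items():
    print(pid, gaps)  # prints: 32604 [999, 998, 999]
```

Gaps of about 1 ms mean roughly 1000 exits per second on that thread, which is the rate a 1000 Hz periodic interrupt would produce rather than what one would usually call a storm. That is only one reading of a very short trace, not a diagnosis.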