Bug#638172: [Xen-devel] Re: Bug#638172: BUG: soft lockup - CPU#0 stuck for 61s! [qemu-dm:3205]

2011-10-18 Thread Ian Campbell
On Fri, 2011-08-26 at 10:28 +0200, Giuseppe Sacco wrote:
 I just installed the new kernel and switched to 64bit hypervisor. I'll
 let you know about any news.

Has everything been OK since you switched?

Ian.

 
 On Fri, 2011-08-26 at 08:25 +0100, Ian Campbell wrote:
 [...]
  I'm in the process of uploading a kernel to
  http://xenbits.xen.org/people/ianc/2.6.32-36~xen0/ which has a bunch of
  patches to the event channel (aka IRQ) subsystem backported. I think the
  kernel flavour you want is there already please could you give it a go
  when you get the chance.
 
 During boot I got this message. Is this related to this bug or to new
 kernel?
 
 [0.004000] [ cut here ]
 [0.004000] WARNING: at 
 /tmp/buildd/linux-2.6-2.6.32/debian/build/source_i386_xen/arch/x86/xen/enlighten.c:726
  perf_events_lapic_init+0x28/0x29()
 [0.004000] Hardware name: MS-7368
 [0.004000] Modules linked in:
 [0.004000] Pid: 0, comm: swapper Not tainted 2.6.32-5-xen-686 #1
 [0.004000] Call Trace:
 [0.004000]  [c1037839] ? warn_slowpath_common+0x5e/0x8a
 [0.004000]  [c103786f] ? warn_slowpath_null+0xa/0xc
 [0.004000]  [c1011db0] ? perf_events_lapic_init+0x28/0x29
 [0.004000]  [c14033dd] ? init_hw_perf_events+0x2dd/0x376
 [0.004000]  [c1403030] ? check_bugs+0x8/0xd8
 [0.004000]  [c13fb808] ? start_kernel+0x309/0x31d
 [0.004000]  [c13fd410] ? xen_start_kernel+0x564/0x56b
 [0.004000]  [c1409045] ? check_nmi_watchdog+0xcd/0x1f2
 [0.004000] ---[ end trace a7919e7f17c0a725 ]---
 
 Thanks,
 Giuseppe
 
 

-- 
Ian Campbell
Current Noise: Anathema - Flying

There's no future in time travel.




-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#638172: [Xen-devel] Re: Bug#638172: BUG: soft lockup - CPU#0 stuck for 61s! [qemu-dm:3205]

2011-10-18 Thread Giuseppe Sacco
On Tue, 2011-10-18 at 11:54 +0100, Ian Campbell wrote:
 On Fri, 2011-08-26 at 10:28 +0200, Giuseppe Sacco wrote:
  I just installed the new kernel and switched to 64bit hypervisor. I'll
  let you know about any news.
 
 Has everything been OK since you switched?

Yes: no crashes since then.

Thanks,
Giuseppe







Bug#638172: [Xen-devel] Re: Bug#638172: BUG: soft lockup - CPU#0 stuck for 61s! [qemu-dm:3205]

2011-08-26 Thread Ian Campbell
Hi Giuseppe,

On Thu, 2011-08-25 at 07:56 +0100, Ian Campbell wrote:
 
  But, I may use a different kernel and let the server goes until crashes:
  no problem in rebooting it for kernel update.
 
 Thanks that would be useful, I'll put something together and let you
 know.

I'm in the process of uploading a kernel to
http://xenbits.xen.org/people/ianc/2.6.32-36~xen0/ which has a bunch of
patches to the event channel (aka IRQ) subsystem backported. I think the
kernel flavour you want is there already; please could you give it a go
when you get the chance.

Thanks,
Ian.

-- 
Ian Campbell


On-line, adj.:
The idea that a human being should always be accessible to a computer.


signature.asc
Description: This is a digitally signed message part


Bug#638172: [Xen-devel] Re: Bug#638172: BUG: soft lockup - CPU#0 stuck for 61s! [qemu-dm:3205]

2011-08-26 Thread Giuseppe Sacco
Hi,
I just installed the new kernel and switched to the 64-bit hypervisor.
I'll let you know if there is any news.

On Fri, 2011-08-26 at 08:25 +0100, Ian Campbell wrote:
[...]
 I'm in the process of uploading a kernel to
 http://xenbits.xen.org/people/ianc/2.6.32-36~xen0/ which has a bunch of
 patches to the event channel (aka IRQ) subsystem backported. I think the
 kernel flavour you want is there already please could you give it a go
 when you get the chance.

During boot I got this message. Is it related to this bug or to the new
kernel?

[0.004000] ------------[ cut here ]------------
[0.004000] WARNING: at 
/tmp/buildd/linux-2.6-2.6.32/debian/build/source_i386_xen/arch/x86/xen/enlighten.c:726
 perf_events_lapic_init+0x28/0x29()
[0.004000] Hardware name: MS-7368
[0.004000] Modules linked in:
[0.004000] Pid: 0, comm: swapper Not tainted 2.6.32-5-xen-686 #1
[0.004000] Call Trace:
[0.004000]  [c1037839] ? warn_slowpath_common+0x5e/0x8a
[0.004000]  [c103786f] ? warn_slowpath_null+0xa/0xc
[0.004000]  [c1011db0] ? perf_events_lapic_init+0x28/0x29
[0.004000]  [c14033dd] ? init_hw_perf_events+0x2dd/0x376
[0.004000]  [c1403030] ? check_bugs+0x8/0xd8
[0.004000]  [c13fb808] ? start_kernel+0x309/0x31d
[0.004000]  [c13fd410] ? xen_start_kernel+0x564/0x56b
[0.004000]  [c1409045] ? check_nmi_watchdog+0xcd/0x1f2
[0.004000] ---[ end trace a7919e7f17c0a725 ]---

Thanks,
Giuseppe







Bug#638172: [Xen-devel] Re: Bug#638172: BUG: soft lockup - CPU#0 stuck for 61s! [qemu-dm:3205]

2011-08-26 Thread Ian Campbell
On Fri, 2011-08-26 at 10:28 +0200, Giuseppe Sacco wrote:
 Hi,
 I just installed the new kernel and switched to 64bit hypervisor. I'll
 let you know about any news.
 
 On Fri, 2011-08-26 at 08:25 +0100, Ian Campbell wrote:
 [...]
  I'm in the process of uploading a kernel to
  http://xenbits.xen.org/people/ianc/2.6.32-36~xen0/ which has a bunch of
  patches to the event channel (aka IRQ) subsystem backported. I think the
  kernel flavour you want is there already please could you give it a go
  when you get the chance.
 
 During boot I got this message. Is this related to this bug or to new
 kernel?

It's a benign (but annoying) warning. I'm angling to get it dropped:
http://marc.info/?l=xen-devel&m=131400621622691

 
 [0.004000] [ cut here ]
 [0.004000] WARNING: at 
 /tmp/buildd/linux-2.6-2.6.32/debian/build/source_i386_xen/arch/x86/xen/enlighten.c:726
  perf_events_lapic_init+0x28/0x29()
 [0.004000] Hardware name: MS-7368
 [0.004000] Modules linked in:
 [0.004000] Pid: 0, comm: swapper Not tainted 2.6.32-5-xen-686 #1
 [0.004000] Call Trace:
 [0.004000]  [c1037839] ? warn_slowpath_common+0x5e/0x8a
 [0.004000]  [c103786f] ? warn_slowpath_null+0xa/0xc
 [0.004000]  [c1011db0] ? perf_events_lapic_init+0x28/0x29
 [0.004000]  [c14033dd] ? init_hw_perf_events+0x2dd/0x376
 [0.004000]  [c1403030] ? check_bugs+0x8/0xd8
 [0.004000]  [c13fb808] ? start_kernel+0x309/0x31d
 [0.004000]  [c13fd410] ? xen_start_kernel+0x564/0x56b
 [0.004000]  [c1409045] ? check_nmi_watchdog+0xcd/0x1f2
 [0.004000] ---[ end trace a7919e7f17c0a725 ]---
 
 Thanks,
 Giuseppe
 
 

-- 
Ian Campbell

May your Tongue stick to the Roof of your Mouth with the Force of a
Thousand Caramels.








Bug#638172: [Xen-devel] Re: Bug#638172: BUG: soft lockup - CPU#0 stuck for 61s! [qemu-dm:3205]

2011-08-25 Thread Giuseppe Sacco
On Wed, 2011-08-24 at 22:24 +0100, Ian Campbell wrote:
[...]
 Giuseppe, are you able to reproduce the issue you are seeing at will? If
 I build a test kernel would you be able to try it? You are using a -686
 kernel right (as opposed to amd64). OOI which hypervisor flavour do you
 use?

Unfortunately not. This server crashes often, but I have no way to
trigger a crash on demand. In the worst case it crashed just after a
reboot; at the other extreme, it ran for a month before crashing.

But I can run a different kernel and let the server run until it
crashes: rebooting it for a kernel update is no problem.

And yes, it is a 32-bit Debian Squeeze system on a 64-bit Athlon CPU.

Bye,
Giuseppe







Bug#638172: [Xen-devel] Re: Bug#638172: BUG: soft lockup - CPU#0 stuck for 61s! [qemu-dm:3205]

2011-08-25 Thread Ian Campbell
On Thu, 2011-08-25 at 08:52 +0200, Giuseppe Sacco wrote:
 On Wed, 2011-08-24 at 22:24 +0100, Ian Campbell wrote:
 [...]
  Giuseppe, are you able to reproduce the issue you are seeing at will? If
  I build a test kernel would you be able to try it? You are using a -686
  kernel right (as opposed to amd64). OOI which hypervisor flavour do you
  use?
 
 Unfortunately not. This server crashes often, but I do not have any
 method to let it crashes. The worst case is when it crashes just after
 reboot, the opposite was after a month from reboot.
 
 But, I may use a different kernel and let the server goes until crashes:
 no problem in rebooting it for kernel update.

Thanks, that would be useful. I'll put something together and let you
know.

 And yes, it is a 32bit debian squeeze system on a 64 bit Athlon cpu.

But are you running the amd64 or 686 flavour of the hypervisor? Both are
available in 32bit Debian. FWIW I would always recommend running the 64
bit hypervisor (even with 32 bit dom0) if you are able to.

Ian.

-- 
Ian Campbell


Once the toothpaste is out of the tube, it's hard to get it back in.
-- H. R. Haldeman




Bug#638172: [Xen-devel] Re: Bug#638172: BUG: soft lockup - CPU#0 stuck for 61s! [qemu-dm:3205]

2011-08-25 Thread Giuseppe Sacco
On Thu, 2011-08-25 at 07:56 +0100, Ian Campbell wrote:
[...]
  And yes, it is a 32bit debian squeeze system on a 64 bit Athlon cpu.
 
 But are you running the amd64 or 686 flavour of the hypervisor? Both are
 available in 32bit Debian. FWIW I would always recommend running the 64
 bit hypervisor (even with 32 bit dom0) if you are able to.

The only hypervisor package installed is xen-hypervisor-4.0-i386. I was
not aware that it is possible to use the amd64 version when running a
32-bit dom0 kernel. If you recommend it, I can switch to the 64-bit
version when I install your patched kernel.

Thanks,
Giuseppe







Bug#638172: [Xen-devel] Re: Bug#638172: BUG: soft lockup - CPU#0 stuck for 61s! [qemu-dm:3205]

2011-08-24 Thread Ian Campbell
On Wed, 2011-08-24 at 21:24 +0100, Konrad Rzeszutek Wilk wrote:
 On Mon, Aug 22, 2011 at 10:00:11AM +0100, Ian Campbell wrote:
  @xen-devel:
  
  Does this look familiar to anyone, this is (I expect, hopefully Giuseppe
  will confirm) from Debian Squeeze which has a Xen 4.0.x with a PVops
  dom0 kernel based on xen.git from last summer (e73f4955a821) with more
  recent upstream longterm kernels (up to and including 2.6.32.41) merged
  in. While it does seem to have the switch from level to edge triggered
  interrupt the Debian kernel doesn't appear to have the switch to fasteoi
  for pirqs (0672fb44a111 plus a few followups) -- could that be related
  to this? (I'm not sure if that was a cleanup or a fix)
 
 It was a fix. We had some interrupts getting wedged - but I don't recall
 the stack exactly.

OK, sounds very much like those fixes are worth a try then. Thanks.

  But there are some follows - like
 e5ac0bda96c495321dbad9b57a4b1a93a5a72e7f
 7e186bdd0098b34c69fb8067c67340ae610ea499

The list of changesets against drivers/xen/events.c which are not in the
Debian kernel which I came up with is below [0]. A small number are
false positives (Debian already got them via the longterm branches) but
most are not.

The majority look like real fixes to me either for this particular issue
or other problems. I would consider them all candidates for inclusion in
a future update of the Debian kernel.

Giuseppe, are you able to reproduce the issue you are seeing at will? If
I build a test kernel, would you be able to try it? You are using a -686
kernel (as opposed to amd64), right? Out of interest, which hypervisor
flavour do you use?

 The interesting about the stack trace is that it looks similiar to:
 
 http://groups.google.com/group/linux.kernel/browse_thread/thread/39a397566cafc979
 
 which has some fixes https://patchwork.kernel.org/patch/1091772/
 but they may not help.

Looks like it is an issue on native too. If it is an issue as far back
as 2.6.32 as well I expect we'll see the fix via the longterm channels
at some point.

Ian.

[0]

652c98bac315a2253628885f05cfd5f30b553ae5 xen: Use IRQF_FORCE_RESUME
f9f09329407e3a11140827ba71d8f9d9ede42823 xen: events: do not unmask event channels on resume
ea2020837ca7dc2c9bcfc477fb4d261cf067db4f xen: do not try to allocate the callback vector again at restore time
acad13511ebe1db666aab5807117d3ac647ea58d xen: events: Remove redundant clear of l2i at end of round-robin loop
0e2ec1fb16f9ca84f91de3d9427a0964d679738a xen: events: Make round-robin scan fairer by snapshotting each l2 word
188449f889c6c30709c7e9e8710b9eff14fd963f xen: events: Clean up round-robin evtchn scan.
1acdebd2d67f71d230f5857c28843e636b7dd92e xen: events: Make last processed event channel a per-cpu variable.
2d9c33e1b47b800e43a1444a65353fcb96e27165 xen: events: Process event channels notifications in round-robin order.
2b1c9503c615f68262ae2e96ee26ee128b486287 xen/events: only unmask irq if enabled
c756a6e7f711308ce85afc7d4c79213cce58a033 xen: allocate irq descriptors on any numa node
b1a003a2aa9ee0d3d69237725c91839f4b6a8559 xen/events: use locked set|clear_bit() for cpu_evtchn_mask
cca68cf2d344eb3c4ff996e99f36cf8f8382bc2b xen/evtchn: clear secondary CPUs' cpu_evtchn_mask[] after restore
c7ff70d2824191af119091d3af8db3bb57b06f77 xen: events: do not unmask event channels on resume
d4283609c7504309b8b93d7582857ff4623105f3 xen: improvements to VIRQ_DEBUG output
7c42097171f2e0beafa16e007a06e464b3014bea xen: correct parameter type for pirq_eoi
97708051c14157e95e25d112c26902f1c6fbb462 xen: ensure that all event channels start off bound to VCPU 0
e05885b24a55db82fbdb5cbc3f31426b976d7fc1 xen: set up IRQ before binding virq to evtchn
f0d4a0552f03b52027fb2c7958a1cbbe210cf418 xen/apic: fix pirq_eoi_gmfn resume
d2ea486300ca6e207ba178a425fbd023b8621bb1 xen/pirq: use fasteoi for MSI too
158d6550716687486000a828c601706b55322ad0 xen/pirq: use eoi as enable
2390c371ecd32d9f06e22871636185382bf70ab7 xen/events: use PHYSDEVOP_pirq_eoi_gmfn to get pirq need-EOI info
cb23e8d58ca35b6f9e10e1ea5682bd61f2442ebd xen/evtchn: correction, pirq hypercall does not unmask
43d8a5030a502074f3c4aafed4d6095ebd76067c xen/evtchn: pirq_eoi does unmask
f4526f9a78ffb3d3fc9f81636c5b0357fc1beccd xen/evtchn: make pirq enable/disable unmask/mask
c6a16a778f86699b339585ba5b9197035d77c40f xen/evtchn: rename retrigger_dynirq -> irq
d0936845a856816af2af48ddf019366be68e96ba xen/evtchn: rename enable/disable_dynirq -> unmask/mask_irq
2789ef00cbe2cdb38deb30ee4085b88befadb1b0 xen: make pirq interrupts use fasteoi
0672fb44a111dfb6386022071725c5b15c9de584 xen/events: change to using fasteoi
9fa90aa72d6af5cc2c2eddf56f9a586035e13ae7 xen: use dynamic_irq_init_keep_chip_data
f55ce8740101c54016544a0d633dc1b6b21244ae Introduce CONFIG_XEN_PVHVM compile option
f61692642a2a2b83a52dd7e64619ba3bb29998af xen/pirq: do EOI properly for pirq events
47cd3eb068a8a0cea124495e525ac16876fa08f6 xen/pci: fix compile error when CONFIG_PCI_XEN disabled
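
For readers who want to triage a list like [0], here is a minimal sketch. It is not the method used in this thread. It assumes each entry has been joined onto one line of the form "sha subject", and the helper names parse_changesets and repeated_subjects are hypothetical. It flags subjects that occur more than once, which is one cheap hint that a change may already have arrived by another route, such as a longterm branch.

```python
from collections import defaultdict


def parse_changesets(text):
    """Split 'sha subject' lines into (sha, subject) pairs, skipping blanks."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        sha, _, subject = line.partition(" ")
        entries.append((sha, subject.strip()))
    return entries


def repeated_subjects(entries):
    """Return {subject: [sha, ...]} for subjects that appear more than once."""
    by_subject = defaultdict(list)
    for sha, subject in entries:
        by_subject[subject].append(sha)
    return {s: shas for s, shas in by_subject.items() if len(shas) > 1}


# Three entries copied from the list above (joined onto single lines).
SAMPLE = """\
f9f09329407e3a11140827ba71d8f9d9ede42823 xen: events: do not unmask event channels on resume
2b1c9503c615f68262ae2e96ee26ee128b486287 xen/events: only unmask irq if enabled
c7ff70d2824191af119091d3af8db3bb57b06f77 xen: events: do not unmask event channels on resume
"""

if __name__ == "__main__":
    for subject, shas in repeated_subjects(parse_changesets(SAMPLE)).items():
        print(subject, shas)
```

A repeated subject only marks an entry for a closer look; whether it is really a false positive still has to be checked against the Debian tree.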

Bug#638172: [Xen-devel] Re: Bug#638172: BUG: soft lockup - CPU#0 stuck for 61s! [qemu-dm:3205]

2011-08-24 Thread Konrad Rzeszutek Wilk
On Mon, Aug 22, 2011 at 10:00:11AM +0100, Ian Campbell wrote:
 @xen-devel:
 
 Does this look familiar to anyone, this is (I expect, hopefully Giuseppe
 will confirm) from Debian Squeeze which has a Xen 4.0.x with a PVops
 dom0 kernel based on xen.git from last summer (e73f4955a821) with more
 recent upstream longterm kernels (up to and including 2.6.32.41) merged
 in. While it does seem to have the switch from level to edge triggered
 interrupt the Debian kernel doesn't appear to have the switch to fasteoi
 for pirqs (0672fb44a111 plus a few followups) -- could that be related
 to this? (I'm not sure if that was a cleanup or a fix)

It was a fix. We had some interrupts getting wedged - but I don't recall
the stack exactly. But there are some follow-ups, such as
e5ac0bda96c495321dbad9b57a4b1a93a5a72e7f
7e186bdd0098b34c69fb8067c67340ae610ea499

 
 Might the tsc unstable message be relevant?

Hm, not sure. I keep on getting those on my guests but life seems to go on.


The interesting thing about the stack trace is that it looks similar to:

http://groups.google.com/group/linux.kernel/browse_thread/thread/39a397566cafc979

which has some fixes https://patchwork.kernel.org/patch/1091772/
but they may not help.






Bug#638172: BUG: soft lockup - CPU#0 stuck for 61s! [qemu-dm:3205]

2011-08-22 Thread Ian Campbell
@xen-devel:

Does this look familiar to anyone? This is (I expect, hopefully Giuseppe
will confirm) from Debian Squeeze, which has Xen 4.0.x with a PVops
dom0 kernel based on xen.git from last summer (e73f4955a821) with more
recent upstream longterm kernels (up to and including 2.6.32.41) merged
in. While it does seem to have the switch from level to edge triggered
interrupts, the Debian kernel doesn't appear to have the switch to fasteoi
for pirqs (0672fb44a111 plus a few followups) -- could that be related
to this? (I'm not sure if that was a cleanup or a fix.)

Might the tsc unstable message be relevant?

@Giuseppe:

Can you confirm the versions of the xen and qemu-dm packages which you
have got installed please.

Also I think it would be useful to see the guest configuration file and
details of the storage (filesystems, SCSI controllers etc) backing the
guest storage which you have got configured.

Full history of this report can be found at
http://bugs.debian.org/638172

Ian.

Can you also provide configuration details 
On Wed, 2011-08-17 at 12:44 +0200, Giuseppe Sacco wrote:
 Package: linux-image-2.6.32-5-xen-686
 Version: 2.6.32-35
 Severity: important
 
 Hi,
 I am experiencing a few outages on a XEN server. Often I have to
 poweroff the server, but last time I found some information in syslog.
 Here it is:
 
 Aug 17 12:35:45 centrum kernel: [ 1424.037532] Clocksource tsc unstable 
 (delta = -103103328 ns)
 Aug 17 12:35:45 centrum kernel: [ 1456.620463] BUG: soft lockup - CPU#0 stuck 
 for 61s! [qemu-dm:3205]
 Aug 17 12:35:45 centrum kernel: [ 1456.620463] Modules linked in: xt_state 
 xt_physdev iptable_filter tun cpufreq_userspace cpufreq_powersave cpufreq_c
 onservative cpufreq_stats dummy bridge stp xen_evtchn xenfs xt_tcpudp 
 iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables 
 x_tab
 les xfs exportfs loop snd_hda_codec_atihdmi snd_hda_intel snd_hda_codec 
 radeon snd_hwdep ttm snd_pcm snd_timer drm_kms_helper drm snd soundcore snd_pa
 ge_alloc i2c_algo_bit shpchp i2c_piix4 pcspkr k8temp pci_hotplug i2c_core 
 evdev button ext3 jbd mbcache dm_mod aacraid 3w_9xxx 3w_ raid10 raid456 
 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 
 raid0 md_mod sata_nv sata_sil sata_via sd_mod crc_t10dif ata_generic ahc
 i pata_atiixp ohci_hcd libata processor ehci_hcd r8169 mii scsi_mod thermal 
 usbcore nls_base thermal_sys acpi_processor [last unloaded: dummy]
 Aug 17 12:35:45 centrum kernel: [ 1456.620463] 
 Aug 17 12:35:45 centrum kernel: [ 1456.620463] Pid: 3205, comm: qemu-dm 
 Tainted: GW  (2.6.32-5-xen-686 #1) MS-7368
 Aug 17 12:35:45 centrum kernel: [ 1456.620463] EIP: 0061:[c1002227] EFLAGS: 
 00200246 CPU: 0
 Aug 17 12:35:45 centrum kernel: [ 1456.620463] EIP is at 
 hypercall_page+0x227/0x1001
 Aug 17 12:35:45 centrum kernel: [ 1456.620463] EAX: 0004 EBX:  
 ECX:  EDX: ec8fa828
 Aug 17 12:35:45 centrum kernel: [ 1456.620463] ESI: ec8fa800 EDI: c24d9600 
 EBP: c27d4800 ESP: e4207d64
 Aug 17 12:35:45 centrum kernel: [ 1456.620463]  DS: 007b ES: 007b FS: 00d8 
 GS: 00e0 SS: 0069
 Aug 17 12:35:45 centrum kernel: [ 1456.620463] CR0: 8005003b CR2: b7712200 
 CR3: 241f CR4: 0660
 Aug 17 12:35:45 centrum kernel: [ 1456.620463] DR0:  DR1:  
 DR2:  DR3: 
 Aug 17 12:35:45 centrum kernel: [ 1456.620463] DR6: 0ff0 DR7: 0400
 Aug 17 12:35:45 centrum kernel: [ 1456.620463] Call Trace:
 Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [c1006034] ? 
 xen_force_evtchn_callback+0xc/0x10
 Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [c1006764] ? 
 check_events+0x8/0xc
 Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [c1006723] ? 
 xen_irq_enable_direct_end+0x0/0x1
 Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [ed93f457] ? 
 scsi_request_fn+0x440/0x47a [scsi_mod]
 Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [c1132541] ? 
 __blk_run_queue+0x2e/0x5a
 Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [c11325f3] ? 
 blk_run_queue+0x18/0x27
 Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [ed93eaca] ? 
 scsi_run_queue+0x281/0x308 [scsi_mod]
 Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [ed93f639] ? 
 scsi_next_command+0x25/0x2f [scsi_mod]
 Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [ed93ffa1] ? 
 scsi_io_completion+0x383/0x3a4 [scsi_mod]
 Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [ed93a723] ? 
 scsi_finish_command+0xaa/0xc2 [scsi_mod]
 Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [c1135bb3] ? 
 blk_done_softirq+0x53/0x5f
 Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [c103c8ea] ? 
 __do_softirq+0xaa/0x156
 Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [c103c9c7] ? 
 do_softirq+0x31/0x3c
 Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [c103caa1] ? 
 irq_exit+0x26/0x58
 Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [c1199be6] ? 
 xen_evtchn_do_upcall+0x22/0x2c
 Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [c1009b3f] ? 
 xen_do_upcall+0x7/0xc
 Aug 17 12:35:45 

Bug#638172: BUG: soft lockup - CPU#0 stuck for 61s! [qemu-dm:3205]

2011-08-22 Thread Giuseppe Sacco
On Mon, 2011-08-22 at 10:00 +0100, Ian Campbell wrote:
[...]
 @Giuseppe:
 
 Can you confirm the versions of the xen and qemu-dm packages which you
 have got installed please.

$ COLUMNS=120 dpkg -l \*linux-image\* \*qemu-\*  | grep ^ii
ii  linux-image-2.6.32-5-xen  2.6.32-35              Linux 2.6.32 for modern PCs, Xen dom0 support
ii  linux-image-xen-686       2.6.32+29              Linux for modern PCs (meta-package), Xen dom0 support
ii  qemu-keymaps              0.12.5+dfsg-3squeeze1  QEMU keyboard maps
ii  qemu-system               0.12.5+dfsg-3squeeze1  QEMU full system emulation binaries
ii  qemu-utils                0.12.5+dfsg-3squeeze1  QEMU utilities
ii  xen-qemu-dm-4.0           4.0.1-2                Xen Qemu Device Model virtual machine hardware emulator

 Also I think it would be useful to see the guest configuration file and
 details of the storage (filesystems, SCSI controllers etc) backing the
 guest storage which you have got configured.

I host two VMs:

1.
kernel = "/usr/lib/xen-default/boot/hvmloader"
builder='hvm'
memory = 1024
shadow_memory = 8
name = piero
#vif = [ 'mac=00:16:3e:6f:81:0a, type=ioemu, bridge=dummy0' ]
vif = [ 'type=ioemu, bridge=dummy0' ]
disk = [ 'phy:mapper/rootvg-piero--disk,hda,w' ]
device_model = '/usr/' + arch_libdir + '/xen-default/bin/qemu-dm'

2.
kernel = "/usr/lib/xen-default/boot/hvmloader"
builder='hvm'
memory = 512
shadow_memory = 8
name = suse
vif = [ 'mac=00:16:3e:6f:81:0a, type=ioemu, bridge=dummy0' ]
disk = [ 'phy:mapper/rootvg-suse32--disk,hda,w' ]
device_model = '/usr/' + arch_libdir + '/xen-default/bin/qemu-dm'
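
The disk lines above can be related to the LVM volumes with a small helper. This is a sketch under assumptions: it treats the spec as three comma-separated fields (backend target, device seen by the guest, access mode), and it relies on the device-mapper convention that a single hyphen separates the volume group from the logical volume while a doubled hyphen stands for a literal hyphen. The function names are hypothetical.

```python
import re


def parse_disk_spec(spec):
    """Split a 'backend:path,guest_dev,mode' disk spec into its fields."""
    target, guest_dev, mode = spec.split(",")
    backend, _, path = target.partition(":")
    return {"backend": backend, "path": path, "guest_dev": guest_dev, "mode": mode}


def split_dm_name(dm_name):
    """Split a device-mapper name into (volume group, logical volume).

    A lone '-' separates VG from LV; '--' encodes a literal '-'.
    """
    vg, lv = re.split(r"(?<!-)-(?!-)", dm_name, maxsplit=1)
    return vg.replace("--", "-"), lv.replace("--", "-")


if __name__ == "__main__":
    spec = parse_disk_spec("phy:mapper/rootvg-piero--disk,hda,w")
    dm_name = spec["path"].split("/")[-1]
    print(spec["guest_dev"], spec["mode"], split_dm_name(dm_name))
```

For the first guest this maps the name rootvg-piero--disk to volume group rootvg and logical volume piero-disk.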

Xen is really version 4:

$ ls -ld /usr/lib/xen-* /etc/alternatives/xen-default
lrwxrwxrwx 1 root root   16  4 gen  2011 /etc/alternatives/xen-default -> /usr/lib/xen-4.0
drwxr-xr-x 5 root root 4096  3 gen  2011 /usr/lib/xen-4.0
drwxr-xr-x 3 root root 4096  9 dic  2010 /usr/lib/xen-common
lrwxrwxrwx 1 root root   29  4 gen  2011 /usr/lib/xen-default -> /etc/alternatives/xen-default

Storage for both machines is on LVM: two volumes in the same volume
group, rootvg. rootvg has one physical volume: a RAID1 md array (md2
in the output below) built from two SATA disks connected to the same
controller.

centrum:~# vgs
  VG     #PV #LV #SN Attr   VSize   VFree
  rootvg   1  13   0 wz--n- 370,35g 121,07g
centrum:~# pvs
  PV         VG     Fmt  Attr PSize   PFree
  /dev/md2   rootvg lvm2 a-   370,35g 121,07g
centrum:~# cat /proc/mdstat 
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md2 : active raid1 sda3[0] sdb3[1]
  388339136 blocks [2/2] [UU]
  
md1 : active raid1 sda2[0] sdb2[1]
  264960 blocks [2/2] [UU]
  
md0 : active raid1 sda1[0] sdb1[1]
  2102464 blocks [2/2] [UU]
  
unused devices: <none>
centrum:~# cat /sys/devices/pci:00/:00:12.0/host4/target4:0:0/4:0:0:0/model
SAMSUNG HD403LJ
centrum:~# cat /sys/devices/pci:00/:00:12.0/host2/target2:0:0/2:0:0:0/model
SAMSUNG HD403LJ
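
A sketch of how the /proc/mdstat output above could be checked programmatically. The function name parse_mdstat and the regular expressions are assumptions; they cover only the simple layout shown here (an array line followed by a "blocks [n/m]" status line), not every variant the kernel can print.

```python
import re

# "md2 : active raid1 sda3[0] sdb3[1]"
ARRAY_RE = re.compile(r"^(md\d+) : (\w+) (\w+) (.+)$")
# "sda3[0]" -> device name and slot number
MEMBER_RE = re.compile(r"(\w+)\[(\d+)\]")
# "[2/2]" -> devices in the array / devices currently active
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]")


def parse_mdstat(text):
    """Return {array: {"state", "level", "members", "healthy"}}."""
    arrays = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = ARRAY_RE.match(line)
        if m:
            name, state, level, members = m.groups()
            current = {
                "state": state,
                "level": level,
                "members": [dev for dev, _slot in MEMBER_RE.findall(members)],
                "healthy": None,  # unknown until the status line is seen
            }
            arrays[name] = current
            continue
        s = STATUS_RE.search(line)
        if s and current is not None:
            current["healthy"] = int(s.group(1)) == int(s.group(2))
    return arrays


SAMPLE = """\
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 sda3[0] sdb3[1]
      388339136 blocks [2/2] [UU]

md1 : active raid1 sda2[0] sdb2[1]
      264960 blocks [2/2] [UU]

md0 : active raid1 sda1[0] sdb1[1]
      2102464 blocks [2/2] [UU]

unused devices: <none>
"""

if __name__ == "__main__":
    for name, info in sorted(parse_mdstat(SAMPLE).items()):
        print(name, info["level"], info["members"], info["healthy"])
```

On the sample, all three arrays report both members active, which matches the [UU] markers in the original output.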

Bye,
Giuseppe







Bug#638172: BUG: soft lockup - CPU#0 stuck for 61s! [qemu-dm:3205]

2011-08-21 Thread Ben Hutchings
On Wed, 2011-08-17 at 14:54 +0200, Giuseppe Sacco wrote:
 Hi Ben,
 
 On Wed, 2011-08-17 at 13:08 +0100, Ben Hutchings wrote:
 [...]
  This indicates there was an earlier WARNING from the kernel; what was
  that?
 
 here it is:
[...]

OK, I don't think that has anything to do with the problem.

Ian, can you make something of the original trace?  Maybe some sort of
deadlock in use of an event channel?  How might the 'Clocksource tsc
unstable' relate to this, if at all?

Ben.





Bug#638172: BUG: soft lockup - CPU#0 stuck for 61s! [qemu-dm:3205]

2011-08-17 Thread Giuseppe Sacco
Package: linux-image-2.6.32-5-xen-686
Version: 2.6.32-35
Severity: important

Hi,
I am experiencing outages on a Xen server. Often I have to power off
the server, but last time I found some information in syslog.
Here it is:

Aug 17 12:35:45 centrum kernel: [ 1424.037532] Clocksource tsc unstable
(delta = -103103328 ns)
Aug 17 12:35:45 centrum kernel: [ 1456.620463] BUG: soft lockup - CPU#0 stuck
for 61s! [qemu-dm:3205]
Aug 17 12:35:45 centrum kernel: [ 1456.620463] Modules linked in: xt_state
xt_physdev iptable_filter tun cpufreq_userspace cpufreq_powersave
cpufreq_conservative cpufreq_stats dummy bridge stp xen_evtchn xenfs xt_tcpudp
iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables
x_tables xfs exportfs loop snd_hda_codec_atihdmi snd_hda_intel snd_hda_codec
radeon snd_hwdep ttm snd_pcm snd_timer drm_kms_helper drm snd soundcore
snd_page_alloc i2c_algo_bit shpchp i2c_piix4 pcspkr k8temp pci_hotplug i2c_core
evdev button ext3 jbd mbcache dm_mod aacraid 3w_9xxx 3w_ raid10 raid456
async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1
raid0 md_mod sata_nv sata_sil sata_via sd_mod crc_t10dif ata_generic ahci
pata_atiixp ohci_hcd libata processor ehci_hcd r8169 mii scsi_mod thermal
usbcore nls_base thermal_sys acpi_processor [last unloaded: dummy]
Aug 17 12:35:45 centrum kernel: [ 1456.620463] 
Aug 17 12:35:45 centrum kernel: [ 1456.620463] Pid: 3205, comm: qemu-dm 
Tainted: GW  (2.6.32-5-xen-686 #1) MS-7368
Aug 17 12:35:45 centrum kernel: [ 1456.620463] EIP: 0061:[c1002227] EFLAGS: 
00200246 CPU: 0
Aug 17 12:35:45 centrum kernel: [ 1456.620463] EIP is at 
hypercall_page+0x227/0x1001
Aug 17 12:35:45 centrum kernel: [ 1456.620463] EAX: 0004 EBX:  ECX: 
 EDX: ec8fa828
Aug 17 12:35:45 centrum kernel: [ 1456.620463] ESI: ec8fa800 EDI: c24d9600 EBP: 
c27d4800 ESP: e4207d64
Aug 17 12:35:45 centrum kernel: [ 1456.620463]  DS: 007b ES: 007b FS: 00d8 GS: 
00e0 SS: 0069
Aug 17 12:35:45 centrum kernel: [ 1456.620463] CR0: 8005003b CR2: b7712200 CR3: 
241f CR4: 0660
Aug 17 12:35:45 centrum kernel: [ 1456.620463] DR0:  DR1:  DR2: 
 DR3: 
Aug 17 12:35:45 centrum kernel: [ 1456.620463] DR6: 0ff0 DR7: 0400
Aug 17 12:35:45 centrum kernel: [ 1456.620463] Call Trace:
Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [c1006034] ? 
xen_force_evtchn_callback+0xc/0x10
Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [c1006764] ? 
check_events+0x8/0xc
Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [c1006723] ? 
xen_irq_enable_direct_end+0x0/0x1
Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [ed93f457] ? 
scsi_request_fn+0x440/0x47a [scsi_mod]
Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [c1132541] ? 
__blk_run_queue+0x2e/0x5a
Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [c11325f3] ? 
blk_run_queue+0x18/0x27
Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [ed93eaca] ? 
scsi_run_queue+0x281/0x308 [scsi_mod]
Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [ed93f639] ? 
scsi_next_command+0x25/0x2f [scsi_mod]
Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [ed93ffa1] ? 
scsi_io_completion+0x383/0x3a4 [scsi_mod]
Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [ed93a723] ? 
scsi_finish_command+0xaa/0xc2 [scsi_mod]
Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [c1135bb3] ? 
blk_done_softirq+0x53/0x5f
Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [c103c8ea] ? 
__do_softirq+0xaa/0x156
Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [c103c9c7] ? 
do_softirq+0x31/0x3c
Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [c103caa1] ? 
irq_exit+0x26/0x58
Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [c1199be6] ? 
xen_evtchn_do_upcall+0x22/0x2c
Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [c1009b3f] ? 
xen_do_upcall+0x7/0xc
Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [c1002407] ? 
hypercall_page+0x407/0x1001
Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [eda10015] ? 
HYPERVISOR_event_channel_op+0x15/0x4c [xen_evtchn]
Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [c10937c6] ? 
__alloc_pages_nodemask+0xf3/0x4d9
Aug 17 12:35:45 centrum kernel: [ 1456.620463] EIP is at 
hypercall_page+0x227/0x1001
Aug 17 12:35:45 centrum kernel: [ 1456.620463] EAX: 0004 EBX:  ECX: 
 EDX: ec8fa828
Aug 17 12:35:45 centrum kernel: [ 1456.620463] ESI: ec8fa800 EDI: c24d9600 EBP: 
c27d4800 ESP: e4207d64
Aug 17 12:35:45 centrum kernel: [ 1456.620463]  DS: 007b ES: 007b FS: 00d8 GS: 
00e0 SS: 0069
Aug 17 12:35:45 centrum kernel: [ 1456.620463] CR0: 8005003b CR2: b7712200 CR3: 
241f CR4: 0660
Aug 17 12:35:45 centrum kernel: [ 1456.620463] DR0:  DR1:  DR2: 
 DR3: 
Aug 17 12:35:45 centrum kernel: [ 1456.620463] DR6: 0ff0 DR7: 0400
Aug 17 12:35:45 centrum kernel: [ 1456.620463] Call Trace:
Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [c1006034] ? 
xen_force_evtchn_callback+0xc/0x10
Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [c1006764] ? 
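
To compare traces like the one above across reports, the frames can be pulled out with a regular expression. This is a minimal sketch: the pattern and the name parse_frames are assumptions. The pattern accepts addresses with or without angle brackets, and it tolerates a line break between the "?" and the symbol, as happens in this archived copy.

```python
import re

# One call-trace frame: "[c1006034] ? symbol+0xOFF/0xSIZE [module]"
# The module suffix is optional; \s+ after "?" tolerates wrapped lines.
FRAME_RE = re.compile(
    r"\[<?(?P<addr>[0-9a-f]{8})>?\]\s+\?\s+"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def parse_frames(log_text):
    """Return a list of (address, symbol, module-or-None) tuples."""
    return [
        (m.group("addr"), m.group("symbol"), m.group("module"))
        for m in FRAME_RE.finditer(log_text)
    ]


# Three frames from the report; the first is wrapped as in the archive.
SAMPLE = """\
Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [c1006034] ?
xen_force_evtchn_callback+0xc/0x10
Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [ed93f457] ? scsi_request_fn+0x440/0x47a [scsi_mod]
Aug 17 12:35:45 centrum kernel: [ 1456.620463]  [c1199be6] ? xen_evtchn_do_upcall+0x22/0x2c
"""

if __name__ == "__main__":
    for addr, symbol, module in parse_frames(SAMPLE):
        print(addr, symbol, module or "(core kernel)")
```

The list of symbols is usually enough to tell whether two reports pass through the same path, here the SCSI completion path running from an event channel upcall.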

Bug#638172: BUG: soft lockup - CPU#0 stuck for 61s! [qemu-dm:3205]

2011-08-17 Thread Ben Hutchings
On Wed, 2011-08-17 at 12:44 +0200, Giuseppe Sacco wrote:
 Package: linux-image-2.6.32-5-xen-686
 Version: 2.6.32-35
 Severity: important
 
 Hi,
 I am experiencing a few outages on a XEN server. Often I have to
 poweroff the server, but last time I found some information in syslog.
 Here it is:
 
 Aug 17 12:35:45 centrum kernel: [ 1424.037532] Clocksource tsc unstable 
 (delta = -103103328 ns)
 Aug 17 12:35:45 centrum kernel: [ 1456.620463] BUG: soft lockup - CPU#0 stuck 
 for 61s! [qemu-dm:3205]
 Aug 17 12:35:45 centrum kernel: [ 1456.620463] Modules linked in: xt_state 
 xt_physdev iptable_filter tun cpufreq_userspace cpufreq_powersave cpufreq_c
 onservative cpufreq_stats dummy bridge stp xen_evtchn xenfs xt_tcpudp 
 iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables 
 x_tab
 les xfs exportfs loop snd_hda_codec_atihdmi snd_hda_intel snd_hda_codec 
 radeon snd_hwdep ttm snd_pcm snd_timer drm_kms_helper drm snd soundcore snd_pa
 ge_alloc i2c_algo_bit shpchp i2c_piix4 pcspkr k8temp pci_hotplug i2c_core 
 evdev button ext3 jbd mbcache dm_mod aacraid 3w_9xxx 3w_ raid10 raid456 
 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 
 raid0 md_mod sata_nv sata_sil sata_via sd_mod crc_t10dif ata_generic ahc
 i pata_atiixp ohci_hcd libata processor ehci_hcd r8169 mii scsi_mod thermal 
 usbcore nls_base thermal_sys acpi_processor [last unloaded: dummy]
 Aug 17 12:35:45 centrum kernel: [ 1456.620463] 
 Aug 17 12:35:45 centrum kernel: [ 1456.620463] Pid: 3205, comm: qemu-dm 
 Tainted: GW  (2.6.32-5-xen-686 #1) MS-7368
[...]

This indicates there was an earlier WARNING from the kernel; what was
that?

Ben.





Bug#638172: BUG: soft lockup - CPU#0 stuck for 61s! [qemu-dm:3205]

2011-08-17 Thread Giuseppe Sacco
Hi Ben,

On Wed, 2011-08-17 at 13:08 +0100, Ben Hutchings wrote:
[...]
 This indicates there was an earlier WARNING from the kernel; what was
 that?

here it is:

Aug 17 12:07:23 centrum kernel: [0.004000] WARNING: at 
/tmp/buildd/linux-2.6-2.6.32/debian/build/source_i386_xen/arch/x86/xen/enlighten.c:726
 perf_events_lapic_init+0x28/0x29()
Aug 17 12:07:23 centrum kernel: [0.004000] Hardware name: MS-7368
Aug 17 12:07:23 centrum kernel: [0.004000] Modules linked in:
Aug 17 12:07:23 centrum kernel: [0.004000] Pid: 0, comm: swapper Not 
tainted 2.6.32-5-xen-686 #1
Aug 17 12:07:23 centrum kernel: [0.004000] Call Trace:
Aug 17 12:07:23 centrum kernel: [0.004000]  [c1037819] ? 
warn_slowpath_common+0x5e/0x8a
Aug 17 12:07:23 centrum kernel: [0.004000]  [c103784f] ? 
warn_slowpath_null+0xa/0xc
Aug 17 12:07:23 centrum kernel: [0.004000]  [c1011d90] ? 
perf_events_lapic_init+0x28/0x29
Aug 17 12:07:23 centrum kernel: [0.004000]  [c14033c5] ? 
init_hw_perf_events+0x2dd/0x376
Aug 17 12:07:23 centrum kernel: [0.004000]  [c1403018] ? 
check_bugs+0x8/0xd8
Aug 17 12:07:23 centrum kernel: [0.004000]  [c13fb808] ? 
start_kernel+0x309/0x31d
Aug 17 12:07:23 centrum kernel: [0.004000]  [c13fd410] ? 
xen_start_kernel+0x564/0x56b
Aug 17 12:07:23 centrum kernel: [0.004000]  [c1409045] ? 
check_nmi_watchdog+0xe5/0x1f2
Aug 17 12:07:23 centrum kernel: [0.004000] ---[ end trace a7919e7f17c0a725 
]---




