Bug#789770: linux-image-3.16.0-4-amd64: Dell R310 server (Xen 4.4 Dom0) periodically crashing after upgrade to Jessie from Wheezy

2015-06-24 Thread Andrew Perry
Package: src:linux
Version: 3.16.7-ckt11-1
Severity: critical
Justification: breaks the whole system

Dear Maintainer,

After upgrading the server, which is a Xen 4.4 Dom0, from wheezy to jessie, it 
crashes every 2-3 days and brings down all its hosted VMs. Syslog shows the 
following dump:

Jun 24 13:24:09 servername kernel: [438520.690952] WARNING: CPU: 0 PID: 0 at 
/build/linux-QZaPpC/linux-3.16.7-ckt11/net/sched/sch_generic.c:264 
dev_watchdog+0x236/0x240()
Jun 24 13:24:09 servername kernel: [438520.690959] NETDEV WATCHDOG: eth0 
(bnx2): transmit queue 7 timed out
Jun 24 13:24:09 servername kernel: [438520.690963] Modules linked in: 
dm_snapshot dm_bufio binfmt_misc xt_physdev xen_netback xen_blkback xen_gntdev 
xen_evtchn xenfs xen_privcmd nfsd auth_rpcgss oid_registry nfs_acl nfs lockd 
fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr 
iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bridge stp llc xt_tcpudp 
xt_recent xt_conntrack iptable_mangle iptable_nat nf_conntrack_ipv4 
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables 
x_tables joydev acpi_power_meter coretemp ipmi_devintf evdev ttm drm_kms_helper 
drm i2c_algo_bit dcdbas iTCO_wdt iTCO_vendor_support i2c_core pcspkr button 
ipmi_si tpm_tis ipmi_msghandler tpm lpc_ich i7core_edac mfd_core edac_core 
processor shpchp thermal_sys loop autofs4 ext4 crc16 mbcache jbd2 dm_mod sd_mod 
crc_t10dif crct10dif_generic sg ses crct10dif_common enclosure sr_mod cdrom 
usb_storage hid_generic usbhid hid ata_generic crc32c_intel ehci_pci ata_piix 
ehci_hcd libata megaraid_sas usbcore usb_common scsi_mod bnx2
Jun 24 13:24:09 servername kernel: [438520.691049] CPU: 0 PID: 0 Comm: 
swapper/0 Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt11-1
Jun 24 13:24:09 servername kernel: [438520.691052] Hardware name: Dell Inc. 
PowerEdge R310/05XKKK, BIOS 1.6.4 03/03/2011
Jun 24 13:24:09 servername kernel: [438520.691055]  0009 
8150b405 88007c803e28 81067797
Jun 24 13:24:09 servername kernel: [438520.691059]  0007 
88007c803e78 0008 
Jun 24 13:24:09 servername kernel: [438520.691062]  880002444000 
810677fc 81777fb8 8830
Jun 24 13:24:09 servername kernel: [438520.691066] Call Trace:
Jun 24 13:24:09 servername kernel: [438520.691069]  IRQ  [8150b405] 
? dump_stack+0x41/0x51
Jun 24 13:24:09 servername kernel: [438520.691081]  [81067797] ? 
warn_slowpath_common+0x77/0x90
Jun 24 13:24:09 servername kernel: [438520.691085]  [810677fc] ? 
warn_slowpath_fmt+0x4c/0x50
Jun 24 13:24:09 servername kernel: [438520.691092]  [8143eb96] ? 
dev_watchdog+0x236/0x240
Jun 24 13:24:09 servername kernel: [438520.691096]  [8143e960] ? 
dev_graft_qdisc+0x70/0x70
Jun 24 13:24:09 servername kernel: [438520.691102]  [81072ae1] ? 
call_timer_fn+0x31/0x100
Jun 24 13:24:09 servername kernel: [438520.691108]  [8143e960] ? 
dev_graft_qdisc+0x70/0x70
Jun 24 13:24:09 servername kernel: [438520.691113]  [81074119] ? 
run_timer_softirq+0x209/0x2f0
Jun 24 13:24:09 servername kernel: [438520.691117]  [8106c641] ? 
__do_softirq+0xf1/0x290
Jun 24 13:24:09 servername kernel: [438520.691122]  [8106ca15] ? 
irq_exit+0x95/0xa0
Jun 24 13:24:09 servername kernel: [438520.691128]  [81358495] ? 
xen_evtchn_do_upcall+0x35/0x50
Jun 24 13:24:09 servername kernel: [438520.691135]  [8151325e] ? 
xen_do_hypervisor_callback+0x1e/0x30
Jun 24 13:24:09 servername kernel: [438520.691137]  EOI  [810013aa] 
? xen_hypercall_sched_op+0xa/0x20
Jun 24 13:24:09 servername kernel: [438520.691145]  [810013aa] ? 
xen_hypercall_sched_op+0xa/0x20
Jun 24 13:24:09 servername kernel: [438520.691150]  [81009e0c] ? 
xen_safe_halt+0xc/0x20
Jun 24 13:24:09 servername kernel: [438520.691154]  [8101c999] ? 
default_idle+0x19/0xb0
Jun 24 13:24:09 servername kernel: [438520.691158]  [810a7ff0] ? 
cpu_startup_entry+0x340/0x400
Jun 24 13:24:09 servername kernel: [438520.691161]  [81903071] ? 
start_kernel+0x492/0x49d
Jun 24 13:24:09 servername kernel: [438520.691163]  [81902a04] ? 
set_init_arg+0x4e/0x4e
Jun 24 13:24:09 servername kernel: [438520.691166]  [81904f64] ? 
xen_start_kernel+0x569/0x573
Jun 24 13:24:09 servername kernel: [438520.691169] ---[ end trace 
05255fd39e925fd5 ]---
Jun 24 13:24:09 servername kernel: [438520.691173] bnx2 :04:00.0 eth0: --- 
start FTQ dump ---
Jun 24 13:24:09 servername kernel: [438520.691206] bnx2 :04:00.0 eth0: 
RV2P_PFTQ_CTL 0001
Jun 24 13:24:09 servername kernel: [438520.691235] bnx2 :04:00.0 eth0: 
RV2P_TFTQ_CTL 0002
Jun 24 13:24:09 servername kernel: [438520.691265] bnx2 :04:00.0 eth0: 
RV2P_MFTQ_CTL 4000
Jun 24 13:24:09 servername kernel: [438520.691294] bnx2 :04:00.0 eth0: 
TBDR_FTQ_CTL 4002
Jun 24 13:24:09 servername kernel: [438520.691324] 

Bug#786936: xen-hypervisor-4.4-amd64: Upgrade dom0 from wheezy to jessie on Dell R610 results in dom0 unaccessible with xen_netback issue

2015-05-26 Thread Andrew Perry
Package: xen-hypervisor-4.4-amd64
Version: 4.4.1-9
Severity: critical
Justification: breaks the whole system

Dear Maintainer,

After upgrading the R610 server from Debian 7 to Debian 8, the dom0 becomes 
unresponsive over ssh after an hour or so, although the domUs remain 
accessible.

Initially we thought it might be a disk space issue on / or /boot, so we 
increased those partition sizes, but it had no effect.

We get the following trace in /var/log/syslog:

May 26 09:18:59 servername kernel: [31526.937788] BUG: unable to handle kernel 
paging request at c90013a4b158
May 26 09:18:59 servername kernel: [31526.937798] IP: [a06802a0] 
xenvif_get_ethtool_stats+0x50/0x80 [xen_netback]
May 26 09:18:59 servername kernel: [31526.937807] PGD b243c067 PUD b243d067 PMD 
8a56c067 PTE 0
May 26 09:18:59 servername kernel: [31526.937813] Oops:  [#1] SMP 
May 26 09:18:59 servername kernel: [31526.937817] Modules linked in: 
dm_snapshot dm_bufio binfmt_misc xt_tcpudp xt_physdev iptable_filter ip_tables 
x_tables xen_netback xen_blkback xen_gntdev xen_evtchn xenfs xen_privcmd nfsd 
auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc ib_iser rdma_cm iw_cm 
ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi bridge stp llc nls_utf8 nls_cp437 vfat fat joydev 
intel_powerclamp coretemp crc32_pclmul ghash_clmulni_intel ttm evdev 
aesni_intel ipmi_devintf iTCO_wdt iTCO_vendor_support aes_x86_64 drm_kms_helper 
acpi_power_meter dcdbas lrw gf128mul glue_helper tpm_tis tpm drm i2c_algo_bit 
ablk_helper processor i2c_core lpc_ich ipmi_si ipmi_msghandler i7core_edac 
thermal_sys cryptd mfd_core button psmouse pcspkr serio_raw shpchp wmi 
edac_core loop autofs4 ext4 crc16 mbcache jbd2 dm_mod hid_generic usbhid hid sg 
sr_mod cdrom ses sd_mod enclosure ata_generic crc32c_intel lpfc crc_t10dif 
crct10dif_generic ehci_pci uhci_hcd crct10dif_pclmul ata_piix ehci_hcd 
scsi_transport_fc libata megaraid_sas scsi_tgt usbcore scsi_mod usb_common 
crct10dif_common bnx2
May 26 09:18:59 servername kernel: [31526.937917] CPU: 0 PID: 1311 Comm: snmpd 
Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt9-3~deb8u1
May 26 09:18:59 servername kernel: [31526.937922] Hardware name: Dell Inc. 
PowerEdge R610/0F0XJ6, BIOS 6.4.0 07/23/2013
May 26 09:18:59 servername kernel: [31526.937927] task: 88008a86a250 ti: 
880002b4c000 task.ti: 880002b4c000
May 26 09:18:59 servername kernel: [31526.937931] RIP: 
e030:[a06802a0]  [a06802a0] 
xenvif_get_ethtool_stats+0x50/0x80 [xen_netback]
May 26 09:18:59 servername kernel: [31526.937939] RSP: e02b:880002b4fd70  
EFLAGS: 00010283
May 26 09:18:59 servername kernel: [31526.937942] RAX: c90013a14f38 RBX: 
0230f940 RCX: 92008ea28c88
May 26 09:18:59 servername kernel: [31526.937946] RDX: 88008ecadc00 RSI: 
c90013a4b190 RDI: 88008da7c000
May 26 09:18:59 servername kernel: [31526.937949] RBP: 880002b4fe10 R08: 
a06827e0 R09: 0006
May 26 09:18:59 servername kernel: [31526.937953] R10: 0010ebb8 R11: 
0246 R12: 0005
May 26 09:18:59 servername kernel: [31526.937957] R13: 88008da7c000 R14: 
a0682640 R15: 88008ecadc00
May 26 09:18:59 servername kernel: [31526.937965] FS:  7f93bcc9e700() 
GS:8800b2a0() knlGS:
May 26 09:18:59 servername kernel: [31526.937969] CS:  e033 DS:  ES:  
CR0: 8005003b
May 26 09:18:59 servername kernel: [31526.937973] CR2: c90013a4b158 CR3: 
899ff000 CR4: 2660
May 26 09:18:59 servername kernel: [31526.937977] Stack:
May 26 09:18:59 servername kernel: [31526.937979]  814225f1 
000400114813 7fff3fff32a8 
May 26 09:18:59 servername kernel: [31526.937985]  880002b4ff18 
001d3fff32a0 880002b4fde0 814039a6
May 26 09:18:59 servername kernel: [31526.937990]  0005001d 
8805 81420455 7fff3fff3280
May 26 09:18:59 servername kernel: [31526.937995] Call Trace:
May 26 09:18:59 servername kernel: [31526.938003]  [814225f1] ? 
dev_ethtool+0x921/0x1ac0
May 26 09:18:59 servername kernel: [31526.938009]  [814039a6] ? 
___sys_recvmsg+0x136/0x2a0
May 26 09:18:59 servername kernel: [31526.938014]  [81420455] ? 
netdev_run_todo+0x55/0x2f0
May 26 09:18:59 servername kernel: [31526.938020]  [8143310f] ? 
dev_ioctl+0x19f/0x590
May 26 09:18:59 servername kernel: [31526.938026]  [8118e148] ? 
kfree+0x118/0x220
May 26 09:18:59 servername kernel: [31526.938033]  [811e330a] ? 
fsnotify_clear_marks_by_inode+0x2a/0x110
May 26 09:18:59 servername kernel: [31526.938038]  [814011fd] ? 
sock_do_ioctl+0x3d/0x50
May 26 09:18:59 servername kernel: [31526.938043]  [81401718] ? 
sock_ioctl+0x1e8/0x2c0
May 26 09:18:59 servername kernel: [31526.938048]  [811ba2ff] ? 
do_vfs_ioctl+0x2cf/0x4b0
May 26 09:18:59 servername 

Bug#697585: We've also experienced this issue.

2013-01-31 Thread Andrew Perry

Just rebooting for the second time now!