Bug#682007: linux-image-3.2.0-0.bpo.2-amd64: NULL pointer dereference in __fscache_read_or_alloc_pages
Ben Hutchings 2012-07-19 16:21:
On Thu, Jul 19, 2012 at 09:03:26AM -0500, Brian Kroth wrote:
Ben Hutchings 2012-07-19 13:32:
> On Thu, 2012-07-19 at 13:37 +0200, Bastian Blank wrote:
>> On Wed, Jul 18, 2012 at 11:16:33AM -0500, Brian Kroth wrote:
>>> ** Tainted: PO (4097)
>>> * Proprietary module has been loaded.
>>> * Out-of-tree module has been loaded.
>>
>>> 21:04:00 kefka [187206.183487] Pid: 20810, comm: MATLAB Tainted: P
>>> O 3.2.0-0.bpo.2-amd64 #1
>>
>> We don't support proprietary stuff. Please remove and try again.
>
> To be clear, Bastian is referring to the proprietary kernel module
> (nvidia).

Ok. The driver is required for some third party engineering software we have to run, but I can rig a spare machine to run some of these other jobs without it for a bit. I'll report back if/when I have a new panic.

I will note though that the driver comes from the debian provided packages (albeit from backports instead of stable).

I realise that, but it's not part of Debian proper and none of us signed up to debug drivers that don't come with source.

Fair enough. I've attached a new set of kernel messages captured from some runs without the nvidia driver loaded, but with the rest of the setup the same.

It doesn't quite seem to be tickling the same code path: this time it's an invalid opcode message instead of a NULL pointer dereference.

I'll let it go for a while more to see if I can get the same style of message to come back. Unfortunately I don't exactly know how to reproduce it.

Thanks,
Brian

Jul 19 15:45:53 kefka [ 4289.632673] [ cut here ]
Jul 19 15:45:53 kefka [ 4289.632711] kernel BUG at /build/buildd-linux_3.2.20-1~bpo60+1-amd64-tQMw4f/linux-3.2.20/fs/buffer.c:3088!
Jul 19 15:45:53 kefka [ 4289.632756] invalid opcode: [#1]
Jul 19 15:45:53 kefka SMP
Jul 19 15:45:53 kefka
Jul 19 15:45:53 kefka [ 4289.632784] CPU 3
Jul 19 15:45:53 kefka
Jul 19 15:45:53 kefka [ 4289.632792] Modules linked in:
Jul 19 15:45:53 kefka acpi_cpufreq
Jul 19 15:45:53 kefka mperf
Jul 19 15:45:53 kefka cpufreq_userspace
Jul 19 15:45:53 kefka cpufreq_powersave
Jul 19 15:45:53 kefka cpufreq_conservative
Jul 19 15:45:53 kefka cpufreq_stats
Jul 19 15:45:53 kefka autofs4
Jul 19 15:45:53 kefka cachefiles
Jul 19 15:45:53 kefka kvm_intel
Jul 19 15:45:53 kefka kvm
Jul 19 15:45:53 kefka binfmt_misc
Jul 19 15:45:53 kefka nfsd
Jul 19 15:45:53 kefka nfs
Jul 19 15:45:53 kefka lockd
Jul 19 15:45:53 kefka fscache
Jul 19 15:45:53 kefka auth_rpcgss
Jul 19 15:45:53 kefka nfs_acl
Jul 19 15:45:53 kefka sunrpc
Jul 19 15:45:53 kefka netconsole
Jul 19 15:45:53 kefka configfs
Jul 19 15:45:53 kefka ext3
Jul 19 15:45:53 kefka jbd
Jul 19 15:45:53 kefka coretemp
Jul 19 15:45:53 kefka ipmi_watchdog
Jul 19 15:45:53 kefka ipmi_devintf
Jul 19 15:45:53 kefka ipmi_si
Jul 19 15:45:53 kefka ipmi_msghandler
Jul 19 15:45:53 kefka fuse
Jul 19 15:45:53 kefka uhci_hcd
Jul 19 15:45:53 kefka ohci_hcd
Jul 19 15:45:53 kefka tpm_infineon
Jul 19 15:45:53 kefka snd_hda_codec_realtek
Jul 19 15:45:53 kefka snd_hda_intel
Jul 19 15:45:53 kefka snd_hda_codec
Jul 19 15:45:53 kefka snd_hwdep
Jul 19 15:45:53 kefka snd_pcm_oss
Jul 19 15:45:53 kefka snd_mixer_oss
Jul 19 15:45:53 kefka snd_pcm
Jul 19 15:45:53 kefka snd_seq_midi
Jul 19 15:45:53 kefka snd_rawmidi
Jul 19 15:45:53 kefka snd_seq_midi_event
Jul 19 15:45:53 kefka snd_seq
Jul 19 15:45:53 kefka snd_timer
Jul 19 15:45:53 kefka snd_seq_device
Jul 19 15:45:53 kefka snd
Jul 19 15:45:53 kefka i2c_i801
Jul 19 15:45:53 kefka tpm_tis
Jul 19 15:45:53 kefka tpm
Jul 19 15:45:53 kefka processor
Jul 19 15:45:53 kefka soundcore
Jul 19 15:45:53 kefka hp_wmi
Jul 19 15:45:53 kefka sparse_keymap
Jul 19 15:45:53 kefka rfkill
Jul 19 15:45:53 kefka tpm_bios
Jul 19 15:45:53 kefka snd_page_alloc
Jul 19 15:45:53 kefka thermal_sys
Jul 19 15:45:53 kefka i2c_core
Jul 19 15:45:53 kefka psmouse
Jul 19 15:45:53 kefka wmi
Jul 19 15:45:53 kefka serio_raw
Jul 19 15:45:53 kefka evdev
Jul 19 15:45:53 kefka joydev
Jul 19 15:45:53 kefka button
Jul 19 15:45:53 kefka ext4
Jul 19 15:45:53 kefka mbcache
Jul 19 15:45:53 kefka jbd2
Jul 19 15:45:53 kefka crc16
Jul 19 15:45:53 kefka dm_mod
Jul 19 15:45:53 kefka raid10
Jul 19 15:45:53 kefka raid456
Jul 19 15:45:53 kefka async_raid6_recov
Jul 19 15:45:53 kefka async_pq
Jul 19 15:45:53 kefka raid6_pq
Jul 19 15:45:53 kefka async_xor
Jul 19 15:45:53 kefka xor
Jul 19 15:45:53 kefka async_memcpy
Jul 19 15:45:53 kefka async_tx
Jul 19 15:45:53 kefka raid1
Jul 19 15:45:53 kefka raid0
Jul 19 15:45:53 kefka multipath
Jul 19 15:45:53 kefka linear
Jul 19 15:45:53 kefka md_mod
Jul 19 15:45:53 kefka hid_microsoft
Jul 19 15:45:53 kefka usbhid
Jul 19 15:45:53 kefka hid
Jul 19 15:45:53 kefka sg
Jul 19 15:45:53 kefka sr_mod
Jul 19 15:45:53 kefka sd_mod
Jul 19 15:45:53 kefka cdrom
Jul 19 15:45:53 kefka crc_t10dif
Jul 19 15:45:53 kefka ahci
Jul 19 15:45:53 kefka libahci
Jul 19 15:45:53 kefka libata
Jul 19 15:45:53 kefka scsi_mod
Jul 19 15:45:53 kefka ehci_hcd
Jul 19 15:45:53 kefka e1000e
Ju
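A note for anyone reading captures like the one above: the log arrives with each kernel message fragment carrying its own syslog prefix, which makes the module list hard to read. Below is a minimal Python sketch of one way to rejoin such fragments. The prefix pattern and the function name `rejoin` are assumptions made for illustration, based only on the lines quoted in this thread; other syslog formats would need a different pattern.

```python
import re

# Syslog prefix as it appears in this capture, e.g. "Jul 19 15:45:53 kefka ".
# ASSUMPTION: the pattern is inferred from the lines quoted in this thread.
PREFIX = re.compile(r"^[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d \S+ ?")

# A kernel timestamp such as "[ 4289.632792]" marks the start of a new message.
KTIME = re.compile(r"^\[\s*\d+\.\d+\]")


def rejoin(lines):
    """Strip the syslog prefix and glue continuation fragments together.

    A fragment that starts with a kernel timestamp begins a new message;
    any other non-empty fragment is appended to the previous message.
    """
    messages = []
    for line in lines:
        body = PREFIX.sub("", line.rstrip("\n"), count=1).strip()
        if not body:
            continue  # prefix-only line: an empty fragment
        if KTIME.match(body) or not messages:
            messages.append(body)
        else:
            messages[-1] += " " + body
    return messages


if __name__ == "__main__":
    sample = [
        "Jul 19 15:45:53 kefka [ 4289.632792] Modules linked in:",
        "Jul 19 15:45:53 kefka acpi_cpufreq",
        "Jul 19 15:45:53 kefka mperf",
    ]
    for message in rejoin(sample):
        print(message)
```

Run over the full capture, this should collapse the "Modules linked in:" fragments into a single line while leaving timestamped messages separate.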
Bug#682007: linux-image-3.2.0-0.bpo.2-amd64: NULL pointer dereference in __fscache_read_or_alloc_pages
On Thu, Jul 19, 2012 at 09:03:26AM -0500, Brian Kroth wrote:
> Ben Hutchings 2012-07-19 13:32:
> >On Thu, 2012-07-19 at 13:37 +0200, Bastian Blank wrote:
> >>On Wed, Jul 18, 2012 at 11:16:33AM -0500, Brian Kroth wrote:
> >>> ** Tainted: PO (4097)
> >>> * Proprietary module has been loaded.
> >>> * Out-of-tree module has been loaded.
> >>
> >>> 21:04:00 kefka [187206.183487] Pid: 20810, comm: MATLAB Tainted: P
> >>> O 3.2.0-0.bpo.2-amd64 #1
> >>
> >>We don't support proprietary stuff. Please remove and try again.
> >
> >To be clear, Bastian is referring to the proprietary kernel module
> >(nvidia).
>
> Ok. The driver is required for some third party engineering
> software we have to run, but I can rig a spare machine to run some
> of these other jobs without it for a bit. I'll report back if/when
> I have a new panic.
>
> I will note though that the driver comes from the debian provided
> packages (albeit from backports instead of stable).

I realise that, but it's not part of Debian proper and none of us signed up to debug drivers that don't come with source.

Ben.

--
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
- Albert Camus

--
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20120719152127.gy1...@decadent.org.uk
Bug#682007: linux-image-3.2.0-0.bpo.2-amd64: NULL pointer dereference in __fscache_read_or_alloc_pages
Ben Hutchings 2012-07-19 13:32:
On Thu, 2012-07-19 at 13:37 +0200, Bastian Blank wrote:
On Wed, Jul 18, 2012 at 11:16:33AM -0500, Brian Kroth wrote:
> ** Tainted: PO (4097)
> * Proprietary module has been loaded.
> * Out-of-tree module has been loaded.

> 21:04:00 kefka [187206.183487] Pid: 20810, comm: MATLAB Tainted: P
> O 3.2.0-0.bpo.2-amd64 #1

We don't support proprietary stuff. Please remove and try again.

To be clear, Bastian is referring to the proprietary kernel module (nvidia).

Ok. The driver is required for some third party engineering software we have to run, but I can rig a spare machine to run some of these other jobs without it for a bit. I'll report back if/when I have a new panic.

I will note though that the driver comes from the debian provided packages (albeit from backports instead of stable).

Thanks,
Brian
Bug#682007: linux-image-3.2.0-0.bpo.2-amd64: NULL pointer dereference in __fscache_read_or_alloc_pages
On Thu, 2012-07-19 at 13:37 +0200, Bastian Blank wrote:
> On Wed, Jul 18, 2012 at 11:16:33AM -0500, Brian Kroth wrote:
> > ** Tainted: PO (4097)
> > * Proprietary module has been loaded.
> > * Out-of-tree module has been loaded.
>
> > 21:04:00 kefka [187206.183487] Pid: 20810, comm: MATLAB Tainted: P
> > O 3.2.0-0.bpo.2-amd64 #1
>
> We don't support proprietary stuff. Please remove and try again.

To be clear, Bastian is referring to the proprietary kernel module (nvidia).

Ben.

--
Ben Hutchings
DNRC Motto: I can please only one person per day.
Today is not your day. Tomorrow isn't looking good either.
Bug#682007: linux-image-3.2.0-0.bpo.2-amd64: NULL pointer dereference in __fscache_read_or_alloc_pages
On Wed, Jul 18, 2012 at 11:16:33AM -0500, Brian Kroth wrote:
> ** Tainted: PO (4097)
> * Proprietary module has been loaded.
> * Out-of-tree module has been loaded.

> 21:04:00 kefka [187206.183487] Pid: 20810, comm: MATLAB Tainted: P
> O 3.2.0-0.bpo.2-amd64 #1

We don't support proprietary stuff. Please remove and try again.

Archive: http://lists.debian.org/20120719113721.ga31...@wavehammer.waldi.eu.org
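For readers unfamiliar with the "Tainted: PO (4097)" line quoted above: the number is a bitmask, and 4097 = 4096 + 1, that is, bit 12 (out-of-tree module, letter O) plus bit 0 (proprietary module, letter P). The following is a minimal Python sketch of that decoding. The function name `decode_taint` and the table are illustrative assumptions; only the two flags that appear in this report are listed, and any other set bit is simply returned as unknown.

```python
# Taint bit positions and letters for the two flags seen in this report.
# ASSUMPTION: the table is deliberately incomplete; the kernel defines more
# taint bits, and any bit not listed here is reported back as "unknown".
TAINT_FLAGS = {
    0: ("P", "Proprietary module has been loaded."),
    12: ("O", "Out-of-tree module has been loaded."),
}


def decode_taint(value):
    """Return (letters, descriptions, unknown_bits) for a taint bitmask."""
    letters, descriptions, unknown = [], [], []
    bit = 0
    while value >> bit:
        if (value >> bit) & 1:
            if bit in TAINT_FLAGS:
                letter, text = TAINT_FLAGS[bit]
                letters.append(letter)
                descriptions.append(text)
            else:
                unknown.append(bit)
        bit += 1
    return "".join(letters), descriptions, unknown


if __name__ == "__main__":
    letters, descriptions, unknown = decode_taint(4097)
    print(letters)
    for text in descriptions:
        print(" *", text)
```

On a running system the same number can be read from /proc/sys/kernel/tainted and passed to the function.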
Bug#682007: linux-image-3.2.0-0.bpo.2-amd64: NULL pointer dereference in __fscache_read_or_alloc_pages
Subject: linux-image-3.2.0-0.bpo.2-amd64: NULL pointer dereference in __fscache_read_or_alloc_pages
Package: src:linux
Version: 3.2.20-1~bpo60+1
Severity: important

** Please type your report below this line ***

I have a number of machines running linux-image-3.2.0-0.bpo.2-amd64 from squeeze-backports that are experiencing a NULL pointer dereference bug in __fscache_read_or_alloc_pages fairly consistently. Out of ~120 machines at least 10 of them seem to experience a panic once a day.

The full details of a typical panic as captured via netconsole are included below.

The relevant setup details are as follows:

Third party applications (eg: matlab) are installed on an NFS server. Clients mount the exports (one fs/export per application) via nfsv4's root exports traversal mechanism (I forget what it's really called off hand). In the options they include "ro" and "fsc" and run cachefilesd (0.10.4 since 0.9-3 had an excessive debug logging bug - #620732) so that the mostly static and read-only application data can be cached locally.

The mount point for the cachefilesd looks like this:
/dev/mapper/vg-fscache /var/cache/fscache ext4 rw,relatime,errors=panic,user_xattr,acl,barrier=1,data=ordered 0 0

The cachefilesd.conf file is also included below in case it matters.

From all of the detailed panic reports I've looked at the bug seems to be triggered on a MATLAB comm, but that might just be that this is our less busy time of the year so there's more condor compute jobs running while the machines are otherwise idle. Since many of those jobs need the third party apps they'll tend to be using the fscache more frequently.

What that also means is that I haven't seen any of these bugs show up referencing data that's on one of our other nfsv3 mounts yet. They also all have fsc turned on. Not sure if that's relevant or just a red herring though.

Let me know if you need any more details.
Thanks,
Brian

-- Package-specific info:
** Version:
Linux version 3.2.0-0.bpo.2-amd64 (Debian 3.2.20-1~bpo60+1) (debian-kernel@lists.debian.org) (gcc version 4.4.5 (Debian 4.4.5-8) ) #1 SMP Fri Jun 29 20:42:29 UTC 2012

** Command line:
BOOT_IMAGE=/vmlinuz-3.2.0-0.bpo.2-amd64 root=/dev/mapper/vg-root ro panic=30 rootdelay=10 quiet

** Tainted: PO (4097)
 * Proprietary module has been loaded.
 * Out-of-tree module has been loaded.

** Kernel log:
[ 28.191814] ACPI: Power Button [PWRB]
[ 28.191852] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input4
[ 28.191874] ACPI: Power Button [PWRF]
[ 28.200722] wmi: Mapper loaded
[ 28.658193] i801_smbus :00:1f.3: PCI INT C -> GSI 18 (level, low) -> IRQ 18
[ 28.888998] tpm_tis 00:0b: 1.2 TPM (device-id 0xB, rev-id 16)
[ 28.949523] input: HP WMI hotkeys as /devices/virtual/input/input5
[ 29.264980] nvidia: module license 'NVIDIA' taints kernel.
[ 29.264983] Disabling lock debugging due to kernel taint
[ 29.346355] nvidia :01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 29.346362] nvidia :01:00.0: setting latency timer to 64
[ 29.346366] vgaarb: device changed decodes: PCI::01:00.0,olddecodes=io+mem,decodes=none:owns=io+mem
[ 29.346425] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 295.59 Wed Jun 6 21:19:40 PDT 2012
[ 29.610240] snd_hda_intel :00:1b.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
[ 29.610287] snd_hda_intel :00:1b.0: irq 48 for MSI/MSI-X
[ 29.610308] snd_hda_intel :00:1b.0: setting latency timer to 64
[ 29.668854] input: HDA Intel PCH Headphone as /devices/pci:00/:00:1b.0/sound/card0/input6
[ 30.548826] EXT4-fs (dm-0): re-mounted. Opts: (null)
[ 30.726930] EXT4-fs (dm-0): re-mounted. Opts: errors=panic
[ 30.864134] scsi_verify_blk_ioctl: 14 callbacks suppressed
[ 30.864136] mdadm: sending ioctl 1261 to a partition!
[ 30.864138] mdadm: sending ioctl 1261 to a partition!
[ 30.871044] mdadm: sending ioctl 1261 to a partition!
[ 30.871055] mdadm: sending ioctl 1261 to a partition!
[ 30.879765] mdadm: sending ioctl 1261 to a partition!
[ 30.879776] mdadm: sending ioctl 1261 to a partition!
[ 30.888659] mdadm: sending ioctl 1261 to a partition!
[ 30.888663] mdadm: sending ioctl 1261 to a partition!
[ 30.889120] mdadm: sending ioctl 800c0910 to a partition!
[ 30.889122] mdadm: sending ioctl 800c0910 to a partition!
[ 30.910763] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[ 30.912552] uhci_hcd: USB Universal Host Controller Interface driver
[ 30.941330] fuse init (API version 7.17)
[ 30.957496] ipmi message handler version 39.2
[ 30.968837] IPMI System Interface driver.
[ 30.968859] ipmi_si: probing via SMBIOS
[ 30.968860] ipmi_si: SMBIOS: mem 0x0 regsize 1 spacing 1 irq 0
[ 30.968862] ipmi_si: Adding SMBIOS-specified kcs state machine
[ 30.968864] ipmi_si: Trying SMBIOS-specified kcs state machine at mem address 0x0, slave address 0x20, irq 0
[ 30.968866] ipmi_si: Could not set up I/O space
[ 3