Bug#682007: NULL pointer dereference in __fscache_read_or_alloc_pages

2012-10-12 Thread Brian Kroth

Brian Paul Kroth bpkr...@gmail.com 2012-10-11 14:06:

Jonathan Nieder jrnie...@gmail.com 2012-10-01 01:25:

<snip/>

Once again very sorry for the delay :(

I forgot to disable DEBUG_INFO and kept filling up my build VM's 
disk during the compile.  Then I realized I had grabbed the 3.7-rc code, 
which these patches don't apply against.  git checkout 
remotes/stable/linux-3.2.y (which results in head 
c74a5e1fe4d0672936c8fb63d7484dfeaa30669c and 3.2.28) seemed to fix 
that.
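
For reference, the two build fixes described above could be scripted roughly as follows. This is only a sketch: the helper name disable_debug_info is hypothetical, it assumes GNU sed and a .config already present in the tree, and the branch name is the one quoted in the message. After editing .config by hand, make oldconfig should be run so that dependent options are recomputed.

```shell
#!/bin/sh
# Hypothetical helper: turn off CONFIG_DEBUG_INFO in a kernel .config so the
# build does not emit debug symbols (the cause of the full build-VM disk).
disable_debug_info() {
    config_file="$1"
    # Rewrite the enabled option into the "is not set" form that kconfig uses.
    # Assumes GNU sed for the in-place (-i) flag.
    sed -i 's/^CONFIG_DEBUG_INFO=y$/# CONFIG_DEBUG_INFO is not set/' "$config_file"
}

# Typical use inside a kernel tree (not executed here):
#   git checkout remotes/stable/linux-3.2.y
#   disable_debug_info .config
#   make oldconfig
```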

<snip/>
Anyways, I just started running that on a machine, so I'll let you 
know if I notice anything there before I think about pushing it any 
further.


Thanks,
Brian


Got another panic using this kernel/set of patches.  The dump is 
attached.


Let me know if you need anything else.

Thanks,
Brian
Oct 12 13:43:01 kefka [14595.129262] FS-Cache: Unsupported event 2 [44/7] in 
state OBJECT_DEAD
Oct 12 13:43:01 kefka [14595.129317] [ cut here ]
Oct 12 13:43:01 kefka [14595.129338] kernel BUG at fs/fscache/object.c:357!
Oct 12 13:43:01 kefka [14595.129358] invalid opcode:  [#1] SMP 
Oct 12 13:43:01 kefka [14595.129390] CPU 1 
Oct 12 13:43:01 kefka [14595.129395] Modules linked in: acpi_cpufreq mperf 
cpufreq_stats cpufreq_userspace cpufreq_powersave cpufreq_conservative 
autofs4 kvm_intel kvm cachefiles binfmt_misc nfsd nfs lockd fscache 
auth_rpcgss nfs_acl sunrpc netconsole configfs ext3 jbd coretemp 
ipmi_watchdog ipmi_devintf ipmi_si ipmi_msghandler fuse uhci_hcd ohci_hcd 
tpm_infineon snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep 
snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi button hp_wmi snd_rawmidi 
snd_seq_midi_event processor sparse_keymap rfkill snd_seq psmouse 
thermal_sys serio_raw joydev evdev tpm_tis tpm i2c_i801 tpm_bios i2c_core 
wmi snd_timer snd_seq_device snd soundcore snd_page_alloc ext4 mbcache jbd2 
crc16 dm_mod raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor 
xor async_memcpy async_tx raid1 raid0 multipath linear md_mod hid_microsoft 
usbhid hid sg sr_mod sd_mod cdrom crc_t10dif ahci libahci libata scsi_mod 
ehci_hcd usbcore e1000e usb_common [last unloaded: microcode]
Oct 12 13:43:01 kefka [14595.130083] 
Oct 12 13:43:01 kefka [14595.130101] Pid: 25732, comm: kworker/u:0 Not tainted 
3.2.28+ #8 Hewlett-Packard HP Compaq 8200 Elite CMT PC/1494
Oct 12 13:43:01 kefka [14595.130149] RIP: 0010:[a0411fe5]  [a0411fe5] 
fscache_object_work_func+0x79c/0x7db [fscache]
Oct 12 13:43:01 kefka [14595.130192] RSP: 0018:88021ed15e20  EFLAGS: 
00010286
Oct 12 13:43:01 kefka [14595.130217] RAX: 004f RBX: 
88021f6406c0 RCX: 

Bug#682007: NULL pointer dereference in __fscache_read_or_alloc_pages

2012-10-01 Thread Jonathan Nieder
Brian Kroth wrote:

> Sorry, the labs went into their dormant period and all of my test cases ran
> away for the rest of the summer (the find cmd didn't seem to trigger the
> __fscache problem), so I hadn't moved any further on this.

> Now that they're back, I'm definitely seeing it again (about 20 different
> machines in two days last week), so I've started the process of hunting down
> a trigger cause again.  I'll let you know if I find something.

Thanks for the update.

The human test cases can work fine for vetting a fix.  I'd also be
interested to hear whether the series I sent was completely borked, so
I'd recommend trying on a test machine for a day or two before putting
such a patched kernel into production, though.

Jonathan


-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20121001082554.GA7957@elie.Belkin



Bug#682007: NULL pointer dereference in __fscache_read_or_alloc_pages

2012-09-30 Thread Brian Kroth

Jonathan Nieder jrnie...@gmail.com 2012-09-29 15:22:

Hi again,

In July, 2012, Brian Kroth wrote:

Jonathan Nieder jrnie...@gmail.com 2012-07-21 12:04:



Please test the attached patches, for example following the instructions
below:

[...]

Anyways, I'll wait on the results of my previous test first to see if we
have a reliable test case from it before moving forward.

At this point the grep -r abc ... test is just hitting the cache over and
over again, so it's not showing a whole lot.

One other thing I'd tried before was something like this run a couple of
times in a row (hmm, I suppose I could try them in parallel too):

find /fsc_mounted_nfs -type f -exec cat {} > /dev/null \;

A couple of them panicked, but with inconsistent messages, so I had left them
out for now.  Perhaps that's worth another shot ...
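
The parallel variant mentioned in passing could look something like the sketch below. It is an illustration only: the helper name read_tree_parallel and the default of four passes are assumptions, and nothing in the thread says whether concurrent reads trigger the panic any more reliably than sequential ones.

```shell
#!/bin/sh
# Hypothetical helper: read every regular file under a directory with
# several identical passes running concurrently, discarding the data.
read_tree_parallel() {
    dir="$1"
    passes="${2:-4}"    # number of concurrent passes; 4 is an arbitrary default
    i=0
    while [ "$i" -lt "$passes" ]; do
        # Each pass is the same find/cat walk as above, run in the background.
        find "$dir" -type f -exec cat {} \; > /dev/null &
        i=$((i + 1))
    done
    wait    # block until every background pass has finished
}

# Example, using the mount point from the message:
#   read_tree_parallel /fsc_mounted_nfs 4
```

Repeated passes over the same files on one client may be satisfied from caches, which is the limitation already noted for the grep test, so a cold cache or a larger tree may be needed for the reads to reach the server.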


So, how did it go?  Did some test case prove reliable?  Any other
new observations?


Sorry, the labs went into their dormant period and all of my test cases 
ran away for the rest of the summer (the find cmd didn't seem to trigger 
the __fscache problem), so I hadn't moved any further on this.


Now that they're back, I'm definitely seeing it again (about 20 
different machines in two days last week), so I've started the process 
of hunting down a trigger cause again.  I'll let you know if I find 
something.


Thanks,
Brian




Bug#682007: NULL pointer dereference in __fscache_read_or_alloc_pages

2012-09-29 Thread Jonathan Nieder
Hi again,

In July, 2012, Brian Kroth wrote:
> Jonathan Nieder jrnie...@gmail.com 2012-07-21 12:04:

>> Please test the attached patches, for example following the instructions
>> below:
[...]
> Anyways, I'll wait on the results of my previous test first to see if we
> have a reliable test case from it before moving forward.

> At this point the grep -r abc ... test is just hitting the cache over and
> over again, so it's not showing a whole lot.

> One other thing I'd tried before was something like this run a couple of
> times in a row (hmm, I suppose I could try them in parallel too):

> find /fsc_mounted_nfs -type f -exec cat {} > /dev/null \;

> A couple of them panicked, but with inconsistent messages, so I had left them
> out for now.  Perhaps that's worth another shot ...

So, how did it go?  Did some test case prove reliable?  Any other
new observations?

Thanks,
Jonathan


-- 
Archive: http://lists.debian.org/2012092902.GA12884@elie.Belkin