I took a look at this crash. Using the debug symbols (ddeb) for the affected
kernel, the oops resolves to the following code:

(gdb) l *(do_writepages+0x16)
0xffffffff81117146 is in do_writepages (/build/buildd/linux-2.6.38/mm/page-writeback.c:1057).
1052    {
1053            int ret;
1054    
1055            if (wbc->nr_to_write <= 0)
1056                    return 0;
1057            if (mapping->a_ops->writepages)
1058                    ret = mapping->a_ops->writepages(mapping, wbc);
1059            else
1060                    ret = generic_writepages(mapping, wbc);
1061            return ret;

So the crash seems to be in the dereference of either mapping->a_ops or
a_ops->writepages. To check that this is indeed the case, let's look up the
offsets of the pointer dereferences in the nearby code and compare them with
the decodecode output:

(gdb) p &((struct address_space *)0)->a_ops
$1 = (const struct address_space_operations **) 0x58
(gdb) p &((struct address_space_operations *)0)->writepages
$2 = (int (**)(struct address_space *, struct writeback_control *)) 0x18
(gdb) p &((struct writeback_control *)0)->nr_to_write
$3 = (long int *) 0x18

Relevant decodecode:

All code
========
   0:   e4 48                   in     $0x48,%al
   2:   c7 05 e9 a3 91 00 00    movl   $0x400,0x91a3e9(%rip)        # 0x91a3f5
   9:   04 00 00 
   c:   48 83 c4 08             add    $0x8,%rsp
  10:   5b                      pop    %rbx
  11:   c9                      leaveq 
  12:   c3                      retq   
  13:   66 90                   xchg   %ax,%ax
  15:   55                      push   %rbp
  16:   48 89 e5                mov    %rsp,%rbp
  19:   66 66 66 66 90          data32 data32 data32 xchg %ax,%ax
  1e:   31 c0                   xor    %eax,%eax
  20:   48 83 7e 18 00          cmpq   $0x0,0x18(%rsi)
  25:   7e 0f                   jle    0x36
  27:   48 8b 47 58             mov    0x58(%rdi),%rax
  2b:*  48 8b 40 18             mov    0x18(%rax),%rax     <-- trapping instruction
  2f:   48 85 c0                test   %rax,%rax
  32:   74 09                   je     0x3d
  34:   ff d0                   callq  *%rax

cmpq   $0x0,0x18(%rsi) == wbc->nr_to_write <= 0 check (%rsi holds wbc)
mov    0x58(%rdi),%rax == mapping->a_ops (%rdi holds mapping)
mov    0x18(%rax),%rax == ...a_ops->writepages

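So the crash indeed happens on the a_ops->writepages dereference: the load
of mapping->a_ops at 0x27 succeeded, but the value it returned is not a valid
pointer, which means a_ops itself holds an invalid value.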

This is strange. The dmesg shows that ext4 is in use, so i_mapping->a_ops
was probably set by ext4. Looking at the ext4 code, I found something
interesting:

- ext4 sets a_ops; in the case above, probably to ext4_da_aops (the
  delayed-allocation operations).
- Using the same debug symbols, we can look up the address of ext4_da_aops
  in the running kernel and of its writepages member; here I get:
(gdb) p &ext4_da_aops
$11 = (const struct address_space_operations *) 0xffffffff81627fc0
(gdb) p/x 0xffffffff81627fc0 + 0x18
$12 = 0xffffffff81627fd8

Now look at the last value, which is the address the a_ops->writepages
dereference would read if a_ops pointed to ext4_da_aops. It is *very similar*
to the address whose dereference triggers the oops:
ffffffff81627fd8  --> &ext4_da_aops.writepages (valid address)
ffffffef81627fd8  --> invalid address which triggers the oops

Note that exactly one bit differs: "ef" instead of "ff" in the middle
(0xff ^ 0xef = 0x10), which is bit 36 of the address.

So either something in the kernel corrupted the a_ops pointer, or the
machine has a hardware problem (maybe bad memory) that flipped the bit.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/766334

Title:
  BUG: unable to handle kernel paging request at ffffffef81627fd8

-- 
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
