Note that the following is not a final statement, just some thoughts
shared with whoever else is looking at this report (and for me to
remember). While I did not find anything that really looked odd in the
xen-netfront code, I saw there was a change to the generic timer code:

commit 1dabbcec2c0a36fe43509d06499b9e512e70a028
  timer: Use hlist for the timer wheel hash buckets

That change was part of 4.2, but if it were the cause I would expect
problems not only on AWS instances. Then again, it might just be that
bare-metal servers with similarly high traffic tend to be upgraded much
less often. Anyway, part of the change above seems to be a swap in the
special meaning of the list pointer values. I am not sure I grasp the
implications yet. Before, with doubly linked lists, the pointer to the
next element seemed to serve as the pending indicator and the pointer to
the previous element was invalidated with a LIST_POISON2 value. Now it
is the other way round. I am referring to the detach_timer function,
which is called from __run_timers via detach_expired_timer.
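
To make the swap of pointer roles concrete, here is a minimal user-space
sketch of the hlist-based variant as described above. It is a simplified
model, not the kernel source: the helper names hlist_del_raw and
model_timer_pending are made up for this example, the timer is reduced to
its hlist_node, and the poison constant is the value visible in the
disassembly further down, assuming a 64-bit build.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-bit target, so the poison value fits in a pointer. */
static_assert(sizeof(void *) == 8, "model assumes a 64-bit target");

/* User-space model of the kernel's hlist types (same field layout). */
struct hlist_node {
    struct hlist_node *next;
    struct hlist_node **pprev;
};

struct hlist_head {
    struct hlist_node *first;
};

/* Poison value as seen in the movabs instruction of the disassembly. */
#define LIST_POISON2 ((struct hlist_node *)(uintptr_t)0xdead000000200200ULL)

/* Insert n at the front of the bucket h. */
static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first = h->first;

    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Hypothetical name; models the unlink step (no checks at all). */
static void hlist_del_raw(struct hlist_node *n)
{
    struct hlist_node *next = n->next;
    struct hlist_node **pprev = n->pprev;

    *pprev = next;
    if (next)
        next->pprev = pprev;
}

/* New scheme: pprev is the pending indicator (before it was next). */
static bool model_timer_pending(const struct hlist_node *entry)
{
    return entry->pprev != NULL;
}

/* New scheme: next is poisoned (before it was prev). */
static void detach_timer(struct hlist_node *entry, bool clear_pending)
{
    hlist_del_raw(entry);
    if (clear_pending)
        entry->pprev = NULL;
    entry->next = LIST_POISON2;
}
```

In this model a detached entry always ends up with next == LIST_POISON2,
and it only stops looking pending when clear_pending was true. Whether
that difference matters for the crash is exactly the open question.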

The crash happens at offset 0x116 in run_timer_softirq (that is 278
decimal). The disassembly of that function around there is:

   0xffffffff810e5c1e <+254>:   mov    %r15,0x8(%rbx)
   0xffffffff810e5c22 <+258>:   nopl   0x0(%rax,%rax,1)
   // Guess this is __hlist_del(struct hlist_node *n)
   // rax = n->next
   0xffffffff810e5c27 <+263>:   mov    (%r15),%rax
   // rdx = n->pprev
   0xffffffff810e5c2a <+266>:   mov    0x8(%r15),%rdx
   0xffffffff810e5c2e <+270>:   test   %rax,%rax
   // *(n->pprev) = n->next
   0xffffffff810e5c31 <+273>:   mov    %rax,(%rdx)
   // if (n->next == NULL) jump
   0xffffffff810e5c34 <+276>:   je     0xffffffff810e5c3a <run_timer_softirq+282>
   // (n->next)->pprev = n->pprev (but n->next is LIST_POISON2 / invalid ptr)
   0xffffffff810e5c36 <+278>:   mov    %rdx,0x8(%rax)
   0xffffffff810e5c3a <+282>:   testb  $0x10,0x2a(%r15)
   // here we seem to be back in the inlined detach_timer, with clear_pending assumed true
   // entry->next = LIST_POISON2 and entry->pprev = NULL
   0xffffffff810e5c3f <+287>:   movabs $0xdead000000200200,%rax
   0xffffffff810e5c49 <+297>:   movq   $0x0,0x8(%r15)
   0xffffffff810e5c51 <+305>:   mov    %rax,(%r15)
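
The annotated steps above can be restated as a small user-space sketch
that predicts which of the two stores in the unlink sequence would go
wrong for a given node state. This only illustrates the reading of the
disassembly, it is not a confirmed root cause. The names del_outcome,
predict_hlist_del and next_store_target are hypothetical helpers for this
example, and a 64-bit layout (pprev at offset 8) is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-bit target, matching the 0x8 offsets in the disassembly. */
static_assert(sizeof(void *) == 8, "model assumes a 64-bit target");

struct hlist_node {
    struct hlist_node *next;
    struct hlist_node **pprev;
};

/* Constant loaded by the movabs instruction at +287. */
#define LIST_POISON2_VALUE 0xdead000000200200ULL

enum del_outcome {
    DEL_OK,              /* both stores hit plausible pointers */
    DEL_BAD_PPREV_STORE, /* "*(n->pprev) = n->next" at +273 would fault */
    DEL_BAD_NEXT_STORE   /* "(n->next)->pprev = n->pprev" at +278 would fault */
};

/* Predict the result of the unlink sequence without touching memory. */
static enum del_outcome predict_hlist_del(const struct hlist_node *n)
{
    if (n->pprev == NULL)
        return DEL_BAD_PPREV_STORE;
    if ((uintptr_t)n->next == (uintptr_t)LIST_POISON2_VALUE)
        return DEL_BAD_NEXT_STORE;
    return DEL_OK;
}

/* Address that the store at +278 writes to for a given n->next value. */
static uint64_t next_store_target(uint64_t next_value)
{
    return next_value + offsetof(struct hlist_node, pprev);
}
```

A node that was already detached without clearing pprev (next poisoned,
pprev still set) is classified as DEL_BAD_NEXT_STORE here, which is the
store at offset 278 where the crash is reported.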

-- 
https://bugs.launchpad.net/bugs/1534345

Title:
  Ubuntu 15.10 Crashing Frequently on EC2 Instances w/ Enhanced
  Networking
