Note that the following is not a final statement, just some thoughts shared with whoever else is looking at this report (and notes for me to remember). While I found nothing that really looked odd in the xen-netfront code, I did see a change to the generic timer code:
commit 1dabbcec2c0a36fe43509d06499b9e512e70a028
timer: Use hlist for the timer wheel hash buckets

That change went into 4.2, but if it were the cause I would expect problems beyond AWS instances. Then again, it might just be that bare-metal servers with similarly high traffic tend to be upgraded much less often. Anyway...

Part of the change above seems to swap the special meaning of the list pointer values. I'm not sure I grasp the implications yet. With the doubly linked lists used before, the pointer to the next element seemed to serve as the pending indicator, and the pointer to the previous element was invalidated with a LIST_POISON2 value. Now it's the other way round: pprev == NULL marks the timer as not pending, and next gets LIST_POISON2. This refers to the detach_timer function, which is called from __run_timers via detach_expired_timer.

The crash happens at offset 0x116 in run_timer_softirq (that's 278 decimal). The disassembly of that function around there is:

    0xffffffff810e5c1e <+254>: mov %r15,0x8(%rbx)
    0xffffffff810e5c22 <+258>: nopl 0x0(%rax,%rax,1)
    // Guess this is __hlist_del(struct hlist_node *n)
    // rax = n->next
    0xffffffff810e5c27 <+263>: mov (%r15),%rax
    // rdx = n->pprev
    0xffffffff810e5c2a <+266>: mov 0x8(%r15),%rdx
    0xffffffff810e5c2e <+270>: test %rax,%rax
    // *(n->pprev) = n->next
    0xffffffff810e5c31 <+273>: mov %rax,(%rdx)
    // if (n->next == NULL) jump
    0xffffffff810e5c34 <+276>: je 0xffffffff810e5c3a <run_timer_softirq+282>
    // (n->next)->pprev = n->pprev (but n->next is LIST_POISON2 / an invalid pointer)
    0xffffffff810e5c36 <+278>: mov %rdx,0x8(%rax)
    0xffffffff810e5c3a <+282>: testb $0x10,0x2a(%r15)
    // here we seem to be back in the inlined detach_timer, with clear_pending assumed true
    // entry->next = LIST_POISON2 and entry->pprev = NULL
    0xffffffff810e5c3f <+287>: movabs $0xdead000000200200,%rax
    0xffffffff810e5c49 <+297>: movq $0x0,0x8(%r15)
    0xffffffff810e5c51 <+305>: mov %rax,(%r15)
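To make the swapped pointer roles easier to follow, here is a minimal user-space C model of the post-4.2 hlist handling as I read it. It is a simplified sketch, not the kernel source: the struct and function names mirror the kernel's, but detach_timer and timer_pending take a bare hlist_node instead of a struct timer_list, and the LIST_POISON2 value is copied from the movabs at <+287> rather than from the kernel's poison.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed poison value, taken from the movabs at <+287> in the disassembly;
 * the real definition lives in the kernel's include/linux/poison.h. */
#define LIST_POISON2 ((struct hlist_node *)(uintptr_t)0xdead000000200200ULL)

struct hlist_node {
    struct hlist_node *next;   /* offset 0x0 on x86_64: (%r15)    */
    struct hlist_node **pprev; /* offset 0x8 on x86_64: 0x8(%r15) */
};

struct hlist_head {
    struct hlist_node *first;
};

/* Insert n at the front of bucket h. */
static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first = h->first;

    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Unlink n; mirrors the instructions at <+263>..<+278>. */
static inline void __hlist_del(struct hlist_node *n)
{
    struct hlist_node *next = n->next;    /* mov (%r15),%rax    */
    struct hlist_node **pprev = n->pprev; /* mov 0x8(%r15),%rdx */

    *pprev = next;                        /* mov %rax,(%rdx)    */
    if (next)                             /* test %rax,%rax; je */
        next->pprev = pprev;              /* mov %rdx,0x8(%rax), the faulting store */
}

/* Post-4.2 semantics: pprev == NULL means "not pending" and next is
 * poisoned. Before 4.2 it was the other way round. */
static inline void detach_timer(struct hlist_node *entry, bool clear_pending)
{
    __hlist_del(entry);
    if (clear_pending)
        entry->pprev = NULL;
    entry->next = LIST_POISON2;
}

/* Pending check through pprev, the equivalent of !hlist_unhashed(). */
static inline bool timer_pending(const struct hlist_node *entry)
{
    return entry->pprev != NULL;
}
```

With clear_pending == true (the detach_expired_timer path) a detached entry reports itself as not pending, and a second __hlist_del on it would already fault at <+273> on the NULL pprev. With clear_pending == false the entry keeps its stale pprev and still looks pending; in this model, a second __hlist_del on such an entry gets past the store at <+273> and only faults at <+278>, when it writes through the poisoned next. Whether anything like that happens on the EC2 instances I can't tell from this alone.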
https://bugs.launchpad.net/bugs/1534345
Title: Ubuntu 15.10 Crashing Frequently on EC2 Instances w/ Enhanced Networking