1369725251. It's a fix in the underlying virtualization platform. Here is the
quote from the ticket.

"The issue is a bug in a performance improvement (10% improved PPS when using
Xen PV "netback/netfront" networking) in the latest build of the
virtualization platform, which has only been released to D2 instances. The
issue is triggered by a race condition deadlock in kernel code that your
workload appears to trigger 5-10% of the time."

On Tue, Jun 2, 2015 at 4:26 PM, Henry Cai <h...@pinterest.com.invalid> wrote:

> Steven,
>
> Do you have the AWS case # (or the Ubuntu bug/case #) when you hit that
> kernel panic issue?
>
> Our company will still be running on AMI image 12.04 for a while, I will
> see whether the fix was also ported onto Ubuntu 12.04
>
> On Tue, Jun 2, 2015 at 2:53 PM, Steven Wu <stevenz...@gmail.com> wrote:
>
> > now I remember we had same kernel panic issue in the first week of D2
> > rolling-out. then AWS fixed it and we haven't seen any issue since. try
> > Ubuntu 14.04 and see if it resolves your remaining kernel/instability
> > issue.
> >
> > On Tue, Jun 2, 2015 at 2:30 PM, Wes Chow <w...@chartbeat.com> wrote:
> >
> >>
> >> Daniel Nelson <daniel.nel...@vungle.com>
> >> June 2, 2015 at 4:39 PM
> >>
> >> On Jun 2, 2015, at 1:22 PM, Steven Wu <stevenz...@gmail.com> wrote:
> >>
> >> can you elaborate what kind of instability you have encountered?
> >>
> >> We have seen the nodes become completely non-responsive. Usually they
> >> get rebooted automatically after 10-20 minutes, but occasionally they
> >> get stuck for days in a state where they cannot be rebooted via the
> >> Amazon APIs.
> >>
> >>
> >> Same here. It was worse right after d2 launch. We had 6 out of 9 servers
> >> die within 10 hours after spinning them up. Amazon rolled out a fix, but
> >> we're still seeing similar issues, though not nearly as bad. The first
> >> fix was for something network related, and apparently sending lots of
> >> data through the instances caused a kernel panic on the host. We have no
> >> information yet about the current issue.
> >>
> >> Wes
> >>
> >> Steven Wu <stevenz...@gmail.com>
> >> June 2, 2015 at 4:22 PM
> >> Wes/Daniel,
> >>
> >> can you elaborate what kind of instability you have encountered?
> >>
> >> we are on Ubuntu 14.04.2 and haven't encountered any issues so far. in
> >> the announcement, they did mention using Ubuntu 14.04 for better disk
> >> throughput. not sure whether 14.04 also addresses any instability issue
> >> you encountered or not.
> >>
> >> Thanks,
> >> Steven
> >>
> >> In order to ensure the best disk throughput performance from your D2
> >> instances on Linux, we recommend that you use the most recent version of
> >> the Amazon Linux AMI, or another Linux AMI with a kernel version of 3.8
> >> or later. The D2 instances provide the best disk performance when you
> >> use a Linux kernel that supports Persistent Grants – an extension to the
> >> Xen block ring protocol that significantly improves disk throughput and
> >> scalability. The following Linux AMIs support this feature:
> >>
> >> - Amazon Linux AMI 2015.03 (HVM)
> >> - Ubuntu Server 14.04 LTS (HVM)
> >> - Red Hat Enterprise Linux 7.1 (HVM)
> >> - SUSE Linux Enterprise Server 12 (HVM)
> >>
> >>
> >> Daniel Nelson <daniel.nel...@vungle.com>
> >> June 2, 2015 at 2:42 PM
> >>
> >> Do you have any workarounds for the d2 issues? We’ve been using them for
> >> our Kafkas too, and ran into the instability. We’re on Ubuntu 12.04 and
> >> plan to try on 14.04 with the latest HWE to see if that helps any.
> >>
> >> Thanks!
> >> Wes Chow <w...@chartbeat.com>
> >> June 2, 2015 at 1:39 PM
> >>
> >> We have run d2 instances with Kafka. They're currently unstable --
> >> Amazon confirmed a host issue with d2 instances that gets tickled by a
> >> Kafka workload yesterday. Otherwise, it seems the d2 instance type is
> >> ideal as it gets an enormous amount of disk throughput and you'll likely
> >> be network bottlenecked.
> >>
> >> Wes
> >>
> >>
> >> Steven Wu <stevenz...@gmail.com>
> >> June 2, 2015 at 1:07 PM
> >> EBS (network attached storage) has got a lot better over the last a few
> >> years. we don't quite trust it for kafka workload.
> >>
> >> At Netflix, we were going with the new d2 instance type (HDD). our
> >> perf/load testing shows it satisfy our workload. SSD is better in
> >> latency curve but pretty comparable in terms of throughput. we can use
> >> the extra space from HDD for longer retention period.
> >>
> >> On Tue, Jun 2, 2015 at 9:37 AM, Henry Cai <h...@pinterest.com.invalid>