Hello! First of all, this is my first post to this user group. If I'm in the wrong place please don't hesitate to point me in a different direction.

Starting around mid-December I've been unable to complete a backup-push. After running for an hour or so, the server stops responding to network requests. The only thing I can do is wait until backup-push finishes, and then I can ssh back in to the server. Once it's back online I find the following problems:

1. dmesg repeats this error:
   *[1107575.808936] xen_netfront: xennet: skb rides the rocket: 19 slots*
2. Wal-e complains about HTTP 500 when pushing files to S3 (sorry, I don't have a copy of this error handy)

My server is configured as follows (let me know if more info is helpful):

- amazon ec2 i2.4xlarge
- ubuntu 14.04 lts
- postgres 9.3
- wal-e 7.3
- database size is ~2.4TB

From what I've been able to find so far, there may be a bug in the xennet driver that is causing the "rides the rocket" error; see here <http://www.brendangregg.com/blog/2014-09-11/perf-kernel-line-tracing.html> and here <https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1317811>. I've tried turning some of the features off with ethtool as suggested in the links, and it seems to have prevented the "rides the rocket" errors, but backup-push still doesn't complete.

I've since used an older backup-push to get another server going for testing, and it too has the same problem.

Has anyone else seen this? If so, were you able to resolve it?

Cheers,
Brian
