Hi all,

I meant to send out an email about this earlier but it slipped my mind thanks to a nasty cold I've been battling. (Being woken up by my phone at 4:10 AM local time couldn't have helped much either, come to think of it...)

Yesterday there was a ~30 minute partial outage caused by a glitch in the Amazon SimpleDB service used by Tarsnap for tracking user account balances. Unfortunately rather than the service failing -- in which case I've set the Tarsnap server to "fail open" and assume that account balances are positive -- it was timing out, resulting in the Tarsnap server code taking too long to respond and the Tarsnap client giving up.

This would have caused any archive extracts and any archive creations started during the outage period to fail. Archive creations started before ~11:10 UTC would be unaffected by this even if they continued into the outage period.

Any operations affected by this outage would have failed with tarsnap printing "tarsnap: Too many network failures". If you did not see this error message, you were not affected. (Unless you don't read tarsnap's output, of course.)

While I have no particular reason to expect this problem to recur, the SimpleDB service seems to have been relegated to a "legacy" status (it has always had a rather wonky design, and Amazon now has Amazon RDS and Amazon DynamoDB, which are better for almost all contexts) so it's not a service I'm entirely comfortable trusting to be reliable any more; in the short term I intend to prevent this sort of outage by making the Tarsnap server code more aggressive about giving up on SimpleDB, and in the longer term I intend to move away from SimpleDB entirely.

-- 
Colin Percival
Security Officer Emeritus, FreeBSD | The power to serve
Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid
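For illustration only, here is a minimal sketch of what "fail open, and give up quickly on a slow backend" could look like. It is not Tarsnap's server code. The function name, the `lookup` callable standing in for the SimpleDB balance query, and the 2 second default deadline are all assumptions made up for the example.

```python
import concurrent.futures

# Hypothetical sketch; every name here is illustrative, not from Tarsnap.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def balance_is_positive(lookup, account_id, timeout_s=2.0):
    """Return True if the account should be allowed to proceed.

    `lookup(account_id)` stands in for the balance query and returns a
    number. An outright failure and a slow response are both treated as
    "fail open": the balance is assumed to be positive.
    """
    future = _POOL.submit(lookup, account_id)
    try:
        # Wait at most timeout_s for an answer from the backend.
        return future.result(timeout=timeout_s) > 0
    except concurrent.futures.TimeoutError:
        # Backend is hanging: give up instead of stalling the client.
        return True
    except Exception:
        # Backend failed outright: fail open.
        return True
```

The point of the sketch is that a timeout is handled the same way as an error, so a backend that hangs cannot hold up the response longer than the deadline. The abandoned query keeps running on its worker thread until it returns on its own; a real server would also want to bound or cancel that work.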
