The following commit has been merged in the openafs-stable-1_6_x branch:

commit 3f29b253bcda761305b5567b3936b38f797e7848
Author: Andrew Deason <adea...@dson.org>
Date:   Sun May 1 11:24:30 2016 -0500

ubik: Don't RECFOUNDDB if can't contact most sites

Currently, the ubik recovery code will always set UBIK_RECFOUNDDB
during recovery, after asking all other sites for their dbversions.
This happens regardless of how many sites we were actually able to
contact, even if we couldn't contact any of them.

This can cause problems when we are unable to contact a majority of
sites with DISK_GetVersion. If we haven't contacted a majority of
sites, we cannot say with confidence that we know what the best db
version available is (which is what UBIK_RECFOUNDDB represents: that
we've found which database is the one we should be using). This can
also result in UBIK_RECHAVEDB being set in a similar situation,
indicating that we have the best db version locally, even though we
never actually asked anyone else what their db version was.

For example, say site A is the sync site going through recovery, and
DISK_GetVersion fails for the only other sites, B and C. Site A will
then set UBIK_RECFOUNDDB, and will claim that site A has the best db
version available (UBIK_RECHAVEDB). This allows site A to process ubik
write transactions (causing the db to be labelled with a new epoch), or
possibly to send the db to the other sites via DISK_SendFile, if they
quickly become available during recovery. Ubik write transactions can
succeed in this situation, because our ContactQuorum_* calls will
succeed if we never try to contact a remote site ('rcode' defaults to
0).

This situation should be rather rare, because normally a majority of
sites must be reachable by site A for site A to be voted the sync site
in the first place. However, it is possible for site A to lose
connectivity to all other sites immediately after sync site election.
It is also possible for site A to proceed far enough in the recovery
process to set UBIK_RECHAVEDB before it loses its sync site status.

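The remark above about 'rcode' defaulting to 0 can be illustrated with
a toy model. This is only a sketch, not the ubik code: the struct, the
function name, and the rule for skipping a site are all hypothetical,
and the real ContactQuorum_* functions may be structured differently.
It shows how a helper that starts from a success code and only records
errors from calls it actually makes can report success without having
reached anyone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a remote site; not an OpenAFS structure. */
struct toy_site {
    int attempt;   /* nonzero if we try to contact this site at all */
    int call_code; /* result of the call if attempted; 0 means success */
};

/*
 * Toy quorum helper (illustrative only). 'nsites' counts remote sites;
 * the local site is counted as one successful vote.
 * Returns 0 on success, otherwise the last error code seen.
 */
int toy_contact_quorum(const struct toy_site *sites, size_t nsites)
{
    int rcode = 0;       /* defaults to success */
    size_t okcalls = 1;  /* the local site */
    size_t i;

    assert(sites != NULL || nsites == 0);

    for (i = 0; i < nsites; i++) {
        if (!sites[i].attempt)
            continue;    /* never contacted: rcode is left untouched */
        if (sites[i].call_code != 0)
            rcode = sites[i].call_code;
        else
            okcalls++;
    }

    if (2 * okcalls > nsites + 1)
        return 0;        /* a strict majority of all sites succeeded */
    return rcode;        /* still 0 if no call was ever attempted */
}
```

With two remote sites that are both skipped, this toy helper returns 0
even though only the local site took part, which is the kind of false
success described above.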
As a result of all of this, if a site with an old database comes
online, there are network connectivity problems between the other
sites, and a ubik write request comes in, it's possible for the "old"
database to overwrite the "new" database. This makes it look as if the
database has "rolled back" to an earlier version.

This should be possible with any ubik database, though how to actually
trigger this bug can change due to different ubik servers setting
different network timeouts. It is probably most likely with the VLDB,
because the VLDB is typically the most frequently written database.

If a VLDB reverts to an earlier version, existing volumes can appear
not to exist in the VLDB, and new volumes can re-use volume IDs from
existing volumes. This can result in rather confusing errors.

To fix this, ensure that we have contacted a majority of sites with
DISK_GetVersion before indicating that we have located the best db
version. If we've contacted a majority of sites, then we are guaranteed
(under ubik assumptions) that we've found the best version, since
previous writes to the database should be guaranteed to hit a majority
of sites (otherwise they wouldn't be successful).

If we cannot reach a majority of sites, we just don't set
UBIK_RECFOUNDDB, and the recovery process restarts. Presumably on the
next iteration we'll be able to contact them, or we'll lose sync site
status if we can't reach the other sites for long enough.

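The majority check described above can be sketched as follows. This is
not the change made to src/ubik/recovery.c: the function names and the
flag value are hypothetical, and the sketch assumes a simple strict
majority that counts the local site and ignores any tie-breaking rules
ubik may apply.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag value, for illustration only. */
#define TOY_RECFOUNDDB 0x4u

/*
 * Returns nonzero when the remote sites that answered, plus the local
 * site, form a strict majority of 'total_sites'. 'total_sites'
 * includes the local site; 'responded' does not.
 */
int toy_have_majority(size_t responded, size_t total_sites)
{
    assert(responded < total_sites);
    return 2 * (responded + 1) > total_sites;
}

/*
 * Update the recovery state after every other site has been asked for
 * its db version. The flag is set only when a majority answered.
 */
unsigned toy_update_state(unsigned state, size_t responded,
                          size_t total_sites)
{
    if (toy_have_majority(responded, total_sites))
        state |= TOY_RECFOUNDDB;
    /* Otherwise the flag stays clear, so recovery runs again later. */
    return state;
}
```

In the three-site example, zero responses from B and C leave the flag
clear under this sketch, while a response from either one is enough to
set it.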
Reviewed-on: https://gerrit.openafs.org/12281
Tested-by: BuildBot <build...@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <ka...@mit.edu>
(cherry picked from commit d3dbdade7e8eaf6da37dd6f1f53d9f1384626071)

Change-Id: I4f4e7255efd3e16e3acfec8f90bf2019cab1fb63
Reviewed-on: https://gerrit.openafs.org/12339
Tested-by: BuildBot <build...@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvit...@sinenomine.net>
Reviewed-by: Michael Meffie <mmef...@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wies...@desy.de>

 src/ubik/recovery.c |   45 +++++++++++++++++++++++++++++----------------
 1 files changed, 29 insertions(+), 16 deletions(-)

-- 
OpenAFS Master Repository
_______________________________________________
OpenAFS-cvs mailing list
OpenAFS-cvs@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-cvs