The following commit has been merged in the openafs-stable-1_6_x branch:
commit 37b70b9c62b2799f7095fa83ab84485eb991cf39
Author: Marcio Barbosa <mbarb...@sinenomine.net>
Date:   Mon Dec 11 19:18:43 2017 -0300

ubik: update epoch as soon as sync-site is elected

The ubik_epochTime represents the time at which the coordinator first
received its coordinator mandate. However, this global is currently
not updated at the moment when a new sync-site is elected. Instead,
ubik_epochTime is only updated at the very end of the first write
transaction, when a new database label is written (in udisk_commit).
This causes at least 2 different issues:

For one, this means that we change ubik_epochTime while a remote
transaction is in progress. If VOTE_Beacon is called after
ubik_epochTime is updated, but before the remote transaction ends, the
remote sites will detect that the transaction id in ubik_currentTrans
is wrong (via urecovery_CheckTid(), since the epoch doesn't match),
and they will abort the transaction. This means the transaction will
fail, and it may cause a loss of quorum until another election is
completed.

Another issue is that ubik_epochTime can be 0 at the beginning of a
write transaction, if this is the first election that this site has
won. Since ubik_epochTime is used to construct transaction ids, this
means that we can have different transactions that originate from
different sites at different times, but they have the same epoch in
their tid. For example, say a write transaction starts with epoch 0,
but the originating site is killed/interrupted before finishing. That
write transaction will linger on remote sites in ubik_currentTrans
with an epoch of 0 (since the originating site will never call
DISK_ReleaseLocks, or DISK_Abort, etc). Normally the sync site will
kill such a lingering transaction via urecovery_CheckTid, but since
the epoch is 0, and the election winner's epoch is also 0, the
transaction looks valid and may never be killed. If that transaction
is holding a lock on the database, this means that the database will
forever remain locked, effectively preventing any access to the db on
that site.

To fix both of these issues, update ubik_epochTime with the current
time as soon as we win the election. This ensures that the epoch is
not updated in the middle of a transaction, and it ensures that all
transactions are created with a unique epoch: the epoch of the
election that we won.

Note that with this commit, we do not ever set ubik_epochTime to the
magic value of '2' during database init. The special '2' epoch only
needs to be set in the database itself, and it is never an actual
epoch that represents a real quorum that went through the election
process. The database will be labelled with a 'real' epoch after the
first write, like normal.

[ka...@mit.edu: comment the locking strategy in ubeacon_Interact()]

Reviewed-on: https://gerrit.openafs.org/12609
Tested-by: BuildBot <build...@rampaginggeek.com>
Reviewed-by: Marcio Brito Barbosa <mbarb...@sinenomine.net>
Reviewed-by: Benjamin Kaduk <ka...@mit.edu>
(cherry picked from commit da704137f4bf766250ca87dbdc5a85c2024cb0a6)

Change-Id: I82e9ec41eb1a2316ecd2b76ef5c89432b2a3c059
Reviewed-on: https://gerrit.openafs.org/12806
Tested-by: BuildBot <build...@rampaginggeek.com>
Reviewed-by: Andrew Deason <adea...@sinenomine.net>
Reviewed-by: Mark Vitale <mvit...@sinenomine.net>
Reviewed-by: Michael Meffie <mmef...@sinenomine.net>
Reviewed-by: Marcio Brito Barbosa <mbarb...@sinenomine.net>
Reviewed-by: Hartmut Reuter <reu...@rzg.mpg.de>
Reviewed-by: Stephan Wiesand <stephan.wies...@desy.de>

 src/ubik/beacon.c   | 16 +++++++++++++---
 src/ubik/disk.c     |  3 +--
 src/ubik/recovery.c |  3 +--
 3 files changed, 15 insertions(+), 7 deletions(-)

-- 
OpenAFS Master Repository
_______________________________________________
OpenAFS-cvs mailing list
OpenAFS-cvs@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-cvs