Hi there,

We have a CI build framework configured so that many machines build concurrently and share a single SCons cache. This cache lives on an Amazon EFS filesystem, mounted over NFS.
In general this has been spectacularly successful, but every once in a while corrupted files start coming out of the cache. Our theory is that the EFS + NFS locking guarantees aren't good enough for the SCons temp name collision detection algorithm. Attached is a patch we are going to try running with to see if it improves things.

In addition to hoping a formalized version of this will be considered for SCons, I'm curious if anyone sees a more likely explanation for the symptoms described above.

--- CacheDir.py 2020-08-19 12:59:25.790302000 -0700
+++ CacheDir.py.uuid 2020-08-19 14:00:29.693749695 -0700
@@ -32,6 +32,7 @@
 import os
 import stat
 import sys
+import uuid
 
 import SCons.Action
 import SCons.Warnings
@@ -100,7 +101,11 @@
 
     cd.CacheDebug('CachePush(%s): pushing to %s\n', t, cachefile)
 
-    tempfile = cachefile+'.tmp'+str(os.getpid())
+    # UUID in case filesystem doesn't support file operations well enough to deal with multiple
+    # machines sharing a cache and attempting to write the same file at the same time (NFS mount of
+    # AWS EFS?).
+    # TODO: Long filename concern on Windows?
+    tempfile = cachefile+'.tmp'+str(os.getpid()) + '_' + str(uuid.uuid1())
     errfmt = "Unable to copy %s to cache. Cache file is %s"
 
     if not fs.isdir(cachedir):

Cheers,
--
Raven Kopelman | Team Lead, Senior Developer
Safe Software Inc.
T 604.501.9985 x 331 | F 604.501.9965
raven.kopel...@safe.com | www.safe.com
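
For reference, here is a minimal standalone sketch of the pattern the patch relies on: write to a temp name that is unique across machines, then rename it into place. A PID is only unique on one host, so two build machines can end up with the same '.tmp<pid>' name. This is not the SCons CachePush code. The function names unique_temp_name and push_to_cache are hypothetical, and uuid4 is used here where the patch uses uuid1. Whether the final rename is atomic on an NFS mount of EFS depends on the server, so treat that part as an assumption.

```python
import os
import shutil
import uuid


def unique_temp_name(cachefile):
    """Return a temp name unique across machines, not just across PIDs.

    Hypothetical helper: the patch above appends uuid.uuid1(); uuid4()
    (purely random) is used here as an illustration.
    """
    return "%s.tmp%d_%s" % (cachefile, os.getpid(), uuid.uuid4())


def push_to_cache(src, cachefile):
    """Copy src to a unique temp file, then rename it to cachefile.

    Assumption: os.replace() atomically swaps the file in on the target
    filesystem. That holds on local POSIX filesystems; on NFS it depends
    on the server.
    """
    temp_path = unique_temp_name(cachefile)
    try:
        shutil.copy2(src, temp_path)
        os.replace(temp_path, cachefile)
    except OSError:
        # Do not leave a partial temp file behind in the shared cache.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

With this shape, two writers racing on the same cache entry each complete their own temp file, and the last rename wins with a whole file rather than an interleaved one.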