On 8/31/20 1:06 PM, Raven Kopelman wrote:
> Hi there,
>
> We have a CI build framework configured such that many machines are
> concurrently building and sharing a scons cache. This cache lives on an
> Amazon EFS filesystem, mounted as NFS.
>
> In general this has been spectacularly successful, but every once in a
> while corrupted files start coming out of the cache. Our theory is that
> the EFS + NFS locking guarantees aren't good enough for the SCons temp
> name collision detection algorithm - attached is a patch we are going to
> try running with to see if it improves things.
>
> In addition to hoping a formalized version of this will be considered
> for SCons, I'm curious if anyone sees a more likely explanation for the
> symptoms described above.
>
> --- CacheDir.py 2020-08-19 12:59:25.790302000 -0700
> +++ CacheDir.py.uuid 2020-08-19 14:00:29.693749695 -0700
> @@ -32,6 +32,7 @@
> import os
> import stat
> import sys
> +import uuid
>
> import SCons.Action
> import SCons.Warnings
> @@ -100,7 +101,11 @@
>
> cd.CacheDebug('CachePush(%s): pushing to %s\n', t, cachefile)
>
> - tempfile = cachefile+'.tmp'+str(os.getpid())
> + # UUID in case filesystem doesn't support file operations well enough to deal with multiple
> + # machines sharing a cache and attempting to write the same file at the same time (NFS mount of
> + # AWS EFS?).
> + # TODO: Long filename concern on Windows?
> + tempfile = cachefile+'.tmp'+str(os.getpid()) + '_' + str(uuid.uuid1())
> errfmt = "Unable to copy %s to cache. Cache file is %s"
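For reference, the pattern the patch relies on is: write the entry under a temporary name no other writer can pick, then rename it into place so readers never see a half-written file. The sketch below illustrates that under stated assumptions. `unique_temp_name` and `push_to_cache` are hypothetical helpers, not SCons' actual `CachePush`, and the sketch uses `uuid.uuid4()` alone rather than the patch's `os.getpid()` + `uuid.uuid1()`. It also assumes that a rename within one directory is atomic, which is exactly the guarantee that may be weaker on an NFS-mounted EFS share.

```python
import os
import shutil
import uuid


def unique_temp_name(cachefile):
    """Return a per-writer temporary name next to cachefile.

    Hypothetical helper, not part of SCons. uuid4() carries 122 random
    bits, so two hosts writing the same cache entry at once will, with
    overwhelming probability, choose different names. A PID alone can
    easily repeat across hosts.
    """
    return cachefile + '.tmp' + uuid.uuid4().hex


def push_to_cache(src, cachefile):
    """Copy src into the cache at cachefile without exposing a partial file.

    Hypothetical sketch of write-to-temp-then-rename. Assumes rename within
    one directory is atomic on the underlying filesystem.
    """
    tempfile = unique_temp_name(cachefile)
    try:
        shutil.copy2(src, tempfile)      # write full contents under a private name
        os.replace(tempfile, cachefile)  # publish in one step, replacing any existing entry
    except OSError:
        # Don't leave a stray temp file behind if the copy or rename failed.
        if os.path.exists(tempfile):
            os.remove(tempfile)
        raise


if __name__ == '__main__':
    import tempfile as tf

    with tf.TemporaryDirectory() as d:
        src = os.path.join(d, 'obj.o')
        with open(src, 'wb') as f:
            f.write(b'built object')
        entry = os.path.join(d, 'cache-entry')
        push_to_cache(src, entry)
        with open(entry, 'rb') as f:
            print(f.read())  # prints b'built object'
```

`uuid4().hex` is 32 characters, slightly shorter than `str(uuid.uuid1())` (36 with hyphens), which helps a little with the patch's Windows path-length TODO. `uuid1()` would also be unique across hosts, since it combines a timestamp with the node's hardware address, but it embeds that address in every temp filename.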
Probably not much reason to keep the getpid(); that's a pretty weak way to generate a "unique" filename when multiple machines are in play, since PIDs are only unique within a single host.

_______________________________________________
Scons-dev mailing list
Scons-dev@scons.org
https://pairlist2.pair.net/mailman/listinfo/scons-dev