We worked this one out - it was pebkac, not slurm :/

------
The most dangerous phrase in the language is, "We've always done it this way."

- Grace Hopper

On 20 June 2016 at 10:37, Lachlan Musicman <data...@gmail.com> wrote:

> Morning!
>
> We have a scenario where I *think* the problem is a write cache issue, but
> I'm not 100% sure.
>
> We have JobB dependent on JobA.
>
> JobA internally (ie, not via --output) writes three small files to
> nfs-shared disk, the first of which is then parsed by JobB - hence the
> dependency (using --dependency afterok: )
>
> The error we are seeing is that JobA is reporting as successful finish,
> JobB is starting and failing because the file doesn't exist. In particular
> we are seeing this when JobA runs on NodeX but JobB runs on NodeY.
>
> The file does exist - it's being created ~0.6 seconds after JobB begins
> executing
>
> JobB's .out file stating "fail" creation time: 00:44:33.353336
>
> JobA's .txt file creation time: 00:44:33.951973
>
> I presume this is related to write cache buffers.
>
> What are the community's ideas re how is this best handled?
>
> Cheers
> L.
>
> ------
> The most dangerous phrase in the language is, "We've always done it this
> way."
>
> - Grace Hopper
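
For anyone who lands on this thread with a similar symptom: regardless of what the cause turned out to be in our case, a dependent job can guard against an input file that shows up slightly late by polling for it briefly before parsing. The following is only a minimal sketch under assumptions, not what we actually ran. The function name `wait_for_file`, the timeout and interval values, and the file name in the usage note are all hypothetical.

```python
import os
import time


def wait_for_file(path, timeout=30.0, interval=0.5):
    """Poll until `path` exists and is non-empty, or `timeout` seconds pass.

    Returns True if the file became visible in time, False otherwise.
    The timeout and interval defaults are illustrative assumptions.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # os.stat raises FileNotFoundError while the file is not visible.
            if os.stat(path).st_size > 0:
                return True
        except FileNotFoundError:
            pass  # not visible on this node yet; keep polling
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

At the top of JobB's script this could be used as `if not wait_for_file("jobA_result.txt"): sys.exit("input from JobA never appeared")`, where `jobA_result.txt` stands in for whatever JobA really writes. A check like this only papers over a short delay such as the ~0.6 seconds described above; it will not help if the file is being written to the wrong place.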