We worked this one out: it was PEBKAC (user error), not Slurm :/

------
The most dangerous phrase in the language is, "We've always done it this
way."

- Grace Hopper

On 20 June 2016 at 10:37, Lachlan Musicman <data...@gmail.com> wrote:

> Morning!
>
> We have a scenario where I *think* the problem is a write cache issue, but
> I'm not 100% sure.
>
> We have JobB dependent on JobA.
>
> JobA internally (i.e., not via --output) writes three small files to an
> NFS-shared disk, the first of which is then parsed by JobB; hence the
> dependency (using --dependency=afterok:<jobid>).
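
For anyone reproducing this setup, here is a minimal Python sketch of submitting such a pair. It assumes hypothetical batch scripts named jobA.sh and jobB.sh, and relies on `sbatch --parsable`, which prints only the job ID (optionally followed by `;cluster_name`). Note that afterok only guarantees that JobA exited with code 0; it says nothing about when JobA's files become visible on another node.

```python
import subprocess


def parse_job_id(sbatch_output: str) -> str:
    """Extract the job ID from `sbatch --parsable` output ("jobid" or "jobid;cluster")."""
    return sbatch_output.strip().split(";", 1)[0]


def dependency_flag(job_id: str) -> str:
    """Build the afterok option: the dependent job starts only if job_id exits with code 0."""
    return f"--dependency=afterok:{job_id}"


def submit(script: str, *extra_args: str) -> str:
    """Submit a batch script with sbatch and return its job ID.

    Raises subprocess.CalledProcessError if sbatch rejects the submission.
    """
    result = subprocess.run(
        ["sbatch", "--parsable", *extra_args, script],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_job_id(result.stdout)
```

Usage would be along the lines of `job_a = submit("jobA.sh")` followed by `submit("jobB.sh", dependency_flag(job_a))`, where both script names are placeholders.
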
>
> The error we are seeing is that JobA reports a successful finish, then
> JobB starts and fails because the file doesn't exist. In particular, we
> see this when JobA runs on NodeX but JobB runs on NodeY.
>
> The file does exist: it is created ~0.6 seconds after JobB begins
> executing.
>
> Creation time of JobB's .out file (which reports "fail"): 00:44:33.353336
>
> Creation time of JobA's .txt file: 00:44:33.951973
>
>
> I presume this is related to write-cache buffering.
>
> What are the community's ideas on how best to handle this?
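
If a genuine visibility delay were the cause (in this thread it turned out not to be), one defensive sketch is to have JobB poll for its input file with a timeout before parsing it, rather than failing on the first check. This is a generic workaround, not a Slurm feature; the function name `wait_for_file` and its default values are assumptions. NFS clients can cache directory lookups and attributes for a short time, so the timeout should be generous compared with the ~0.6 s gap seen here.

```python
import os
import time


def wait_for_file(path: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until `path` exists or `timeout` seconds elapse.

    Hypothetical helper: returns True as soon as the file is visible,
    False if it never appears before the deadline.
    """
    # time.monotonic() is unaffected by wall-clock adjustments.
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, JobB could call `wait_for_file("results.txt", timeout=60)` (placeholder file name) and exit with an error if it returns False, so the failure message distinguishes "file never appeared" from a parsing error.
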
>
> Cheers
> L.
