The home directory is exported to every cluster node over NFS:

/data01/home 192.168.4.0/24(rw,no_root_squash)

Each node has 32 processors. The scripts are:

===========  main script  =============
#!/bin/sh
mkdir Files
for i in {1..1000}
do
  sem -j 30 ./ene_calc $i
done
sem --wait
echo "all done"
==================================

=========== ene_calc ===============
#!/bin/sh
source program_ene_calc.sh
mkdir $1
cd $1
echo $1

CALC_program.py -O ................. -o RESULTS.dat.$1

mv RESULTS.dat.$1 ../Files/
gzip ../Files/RESULTS.dat.$1&

cd ..
rm -rf $1
===================================

I'm running this version: GNU parallel 20180222.
The warning message no longer appears after changing the temporary
directory to a node-local directory.
It seems there is a problem distinguishing the jobs running on different
nodes, even though each job name in the semaphores directory also carries
a node ID extension.
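
A minimal sketch of that kind of workaround, assuming the XDG_CACHE_HOME
route suggested in the quoted reply below and assuming /tmp is local disk
on each node. The directory name is hypothetical and only illustrative:

```shell
#!/bin/sh
# Assumption: /tmp is node-local storage (not on the NFS-shared home).
# The directory name below is made up for illustration.
XDG_CACHE_HOME="/tmp/$(id -un)-cache-$(uname -n)"
export XDG_CACHE_HOME

# Create the directory where GNU parallel is expected to keep its
# semaphores when XDG_CACHE_HOME is set (per the quoted reply).
mkdir -p "$XDG_CACHE_HOME/parallel"

echo "semaphore cache: $XDG_CACHE_HOME/parallel"
```

These lines would go at the top of the main script, before the first sem
call, so that every sem invocation on the node sees the same setting.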

Thanks a lot,
Massimiliano

2018-03-05 1:11 GMT+01:00 Ole Tange <o...@tange.dk>:

> On Wed, Feb 28, 2018 at 12:38 PM, Meli Massimiliano
> <massimiliano.m...@gmail.com> wrote:
>
> > The error message that sometimes blocks the production of the output is:
> >
> > parallel: Warning: Semaphore stuck for 30 seconds. Consider using
> > --semaphoretimeout.
> >
> > I think the problem comes from the hidden directory in the shared
> > home of the cluster:
> >
> > .parallel
> >
> > Is there any way to move this directory to a different location?
>
> The semaphores are in: ~/.parallel/semaphores so you can symlink that
> to somewhere else.
>
> Or you can do:
>
>   export XDG_CACHE_HOME=/somedir/with/write/access
>   mkdir $XDG_CACHE_HOME/parallel
>
> This should create semaphores in $XDG_CACHE_HOME/parallel (but it is
> not tested very well).
>
> I would, however, prefer if we can find the root cause and fix it. But
> if you only see this sometimes it will make it harder. How is the home
> shared? I recall doing a fix for NFS a year ago or so, so if you are
> not running newest version, then try upgrading.
>
>
> /Ole
>
