The home directories are exported to every cluster node via NFS:

/data01/home 192.168.4.0/24(rw,no_root_squash)

Each node has 32 processors. The scripts are:

=========== main script =============
#!/bin/sh
mkdir Files
for i in {1..1000}
do
sem -j 30 ./ene_calc $i
done
sem --wait
echo "all done"
==================================

=========== ene_calc ===============
#!/bin/sh
source program_ene_calc.sh
mkdir $1
cd $1
echo $1
CALC_program.py -O ................. -o RESULTS.dat.$1
mv RESULTS.dat.$1 ../Files/
gzip ../Files/RESULTS.dat.$1&
cd ..
rm -rf $1
===================================

I'm running this version: GNU parallel 20180222.

The warning message no longer appears after changing the temporary
directory to a node-local directory. It seems there is a problem
distinguishing the jobs running on different nodes, even though each
job name in the semaphores directory also carries a node ID extension.

Thanks a lot,
Massimiliano

2018-03-05 1:11 GMT+01:00 Ole Tange <o...@tange.dk>:

> On Wed, Feb 28, 2018 at 12:38 PM, Meli Massimiliano
> <massimiliano.m...@gmail.com> wrote:
>
> > The error message that sometimes blocks the production of the output is:
> >
> > parallel: Warning: Semaphore stuck for 30 seconds. Consider using
> > --semaphoretimeout.
> >
> > I think the problem comes from the hidden directory in the shared
> > home of the cluster:
> >
> > .parallel
> >
> > Is there any way to move this directory to a different location?
>
> The semaphores are in: ~/.parallel/semaphores so you can symlink that
> to somewhere else.
>
> Or you can do:
>
> export XDG_CACHE_HOME=/somedir/with/write/access
> mkdir $XDG_CACHE_HOME/parallel
>
> This should create semaphores in $XDG_CACHE_HOME/parallel (but it is
> not tested very well).
>
> I would, however, prefer if we can find the root cause and fix it. But
> if you only see this sometimes it will make it harder. How is the home
> shared? I recall doing a fix for NFS a year ago or so, so if you are
> not running the newest version, then try upgrading.
>
>
> /Ole
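
For anyone who wants to try the node-local workaround from this thread, here is a minimal sketch. It follows the XDG_CACHE_HOME suggestion quoted above, which is described there as not well tested. The base directory /tmp, the LOCAL_BASE variable and the per-user directory name are assumptions for illustration only; they are not taken from the actual cluster setup.

```sh
#!/bin/sh
# Assumption: /tmp is on a node-local disk (not NFS). Override LOCAL_BASE
# if the node-local scratch area lives somewhere else.
LOCAL_BASE="${LOCAL_BASE:-/tmp}"

# Hypothetical per-user cache directory, so users on the same node
# do not share semaphore files.
XDG_CACHE_HOME="$LOCAL_BASE/$(id -un)-cache"
export XDG_CACHE_HOME

# Per the suggestion in this thread, the semaphores should then be
# created under $XDG_CACHE_HOME/parallel instead of ~/.parallel.
mkdir -p "$XDG_CACHE_HOME/parallel"

echo "semaphore base directory: $XDG_CACHE_HOME/parallel"
```

These lines would go at the top of the main script, before the first sem call. Since the semaphore files are then private to each node, the -j 30 limit would presumably apply per node rather than across the whole cluster.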