Thank you for your replies. Here is what happened. I didn't change either the SLURM config or the node config in any way. The issues went away in the morning, and then came back at around 4pm.
It seems my SLURM jobs slow down after 4pm, and I am not sure why. There might be other jobs running on my nodes at those times; I will investigate. Has anyone else seen similar issues?

On Sat, May 21, 2011 at 6:13 AM, Mark A. Grondona <[email protected]> wrote:

> On Fri, 20 May 2011 18:30:16 -0700, "[email protected]" <[email protected]> wrote:
> > Take a look at your slurmctld and slurmd log files. My _guess_
> > is that the clock on one or more of your nodes is out of sync
> > and that is preventing message authentication from occurring.
> > As I recall Munge credentials have a five minute period of
> > being valid. If any of your nodes have a clock more than that
> > far out of sync, messages will get discarded. Although SLURM
> > does have some recovery mechanisms, long delays like this will
> > occur.
> >
> > Quoting Paul Thirumalai <[email protected]>:
> >
> > > Hi All
> > > So I am trying to launch a script using sbatch, but it just seems to be
> > > taking too long to complete. (Sbatch takes 2-3 seconds to complete)
> > >
> > > The command I am using is
> > > /usr/bin/sbatch --output=/dev/null --error=/dev/null --begin=now
> > > <script_name>
> > >
> > > This command takes around 3.5 seconds to complete. I am not sure why it's
> > > taking so long. Earlier I had changed the config to use select/linear
> > > instead of select/cons_res, and after that all the issues started. I
> > > reverted the config, but to no avail.
>
> An easy first step is sometimes to just run the command with
> multiple -v's and see if there is an obvious "pause" between
> two sections of output. That can help narrow down where time
> is being spent. Sometimes the following perl one-liner is
> useful for timestamping output:
>
>  sbatch -vvvv --output=/dev/null --error=/dev/null --begin=now test.sh 2>&1 \
>   | perl -MTime::HiRes=time -pne 'BEGIN {$s=time} printf "%09.4f",time-$s'
>
> mark
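
One way to check whether the slowdown really tracks time of day is to log how long sbatch takes at regular intervals and compare the entries before and after 4pm. The following is only a minimal sketch in POSIX shell: the time_cmd helper, the log path, and the 10-minute interval are hypothetical choices for illustration, not part of SLURM, and the elapsed time is only accurate to whole seconds.

```shell
#!/bin/sh
# Hypothetical helper: run a command, discard its output, and append one
# log line with the wall-clock time, elapsed whole seconds, and exit status.
# Usage: time_cmd LOGFILE COMMAND [ARGS...]
time_cmd() {
    log=$1
    shift
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    status=$?
    end=$(date +%s)
    printf '%s elapsed=%ss exit=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$((end - start))" "$status" >> "$log"
}

# Example loop (left commented out; test.sh is the placeholder script
# name used earlier in the thread, and the log path is an assumption):
# while true; do
#     time_cmd /tmp/sbatch_timing.log \
#         sbatch --output=/dev/null --error=/dev/null --begin=now test.sh
#     sleep 600
# done
```

Each log line carries its own timestamp, so slow submissions can be lined up against whatever else is running on the nodes at that hour.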
