Syncing slurm.conf on all nodes fixed the issue. Thanks, all, for your help.

On Tue, Feb 22, 2011 at 3:19 PM, Paul Thirumalai <[email protected]> wrote:
> I see the following line in slurmctld.log:
>
> Node w51 appears to have a different slurm.conf than the slurmctld. This
> could cause issues with communication and functionality. Please review both
> files and make sure they are the same. If this is expected ignore, and set
> DebugFlags=NO_CONF_HASH in your slurm.conf.
>
> I will sync the conf files and retry.
>
>
> On Tue, Feb 22, 2011 at 3:15 PM, Danny Auble <[email protected]> wrote:
>
>> Is there anything of interest in the slurmctld log? How about the
>> slurmd log on the node running the job?
>>
>>
>> On 02/22/11 15:14, Paul Thirumalai wrote:
>>
>> But new jobs are still getting stuck in CG state.
>>
>> On Tue, Feb 22, 2011 at 3:13 PM, Paul Thirumalai <
>> [email protected]> wrote:
>>
>>> I am using slurm 2.2. I restarted slurmctld using the -c option and squeue
>>> now does not show any new jobs.
>>>
>>>
>>> On Tue, Feb 22, 2011 at 3:11 PM, Jerry Smith <[email protected]> wrote:
>>>
>>>> Have you tried restarting slurmctld? We had this issue back a few
>>>> revisions, but it seemed to go away with a newer rev (though I couldn't
>>>> tell you which one).
>>>>
>>>> Have you tried setting the state to RESUME for the nodes?
>>>>
>>>> What version of Slurm are you running?
>>>>
>>>> Jerry
>>>>
>>>> Paul Thirumalai wrote:
>>>>
>>>> I am not using an epilog while launching these jobs. The jobs are simple
>>>> Python scripts that run the hostname command and put the output in a file
>>>> that is provided on the command line.
>>>>
>>>> I can see that the file was written. This tells me that the job completed.
>>>> When I log in to the node I don't see the process running. However, squeue
>>>> still tells me that the job is in CG state.
>>>>
>>>> I stopped all the slurm daemons and restarted them, but the state of
>>>> the job is still CG and it shows up in squeue.
>>>>
>>>> On Tue, Feb 22, 2011 at 2:45 PM, Paul Thirumalai <
>>>> [email protected]> wrote:
>>>>
>>>>> Actually I just figured out that all jobs seem to be stuck in
>>>>> COMPLETING state. I am now reading
>>>>> https://computing.llnl.gov/linux/slurm/faq.html#comp
>>>>>
>>>>>
>>>>>
>>>>> I will continue to troubleshoot. If I run into issues, I will repost
>>>>> to this thread.
>>>>>
>>>>
>>>>
>>>
>>
>
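
For anyone who finds this thread later: the fix was making slurm.conf identical on the controller and every node. Below is a minimal sketch of how one might spot a drifted copy, assuming the contents of each node's slurm.conf have already been collected by some other means (scp, a parallel shell, etc.). It compares plain SHA-256 digests of the file text; this is not how slurmctld computes its own configuration hash, and an identical digest here only means the files are byte-for-byte equal. The function names, the node name w50, and the sample configuration text are hypothetical and used only for illustration.

```python
import hashlib
from typing import Dict, List


def config_digest(text: str) -> str:
    """Return the SHA-256 hex digest of a config file's contents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_mismatched_nodes(controller_conf: str, node_confs: Dict[str, str]) -> List[str]:
    """List the nodes whose config text differs from the controller's copy.

    controller_conf: contents of slurm.conf on the slurmctld host.
    node_confs: mapping of node name -> contents of that node's slurm.conf.
    """
    reference = config_digest(controller_conf)
    return sorted(
        node
        for node, text in node_confs.items()
        if config_digest(text) != reference
    )


if __name__ == "__main__":
    # Made-up sample contents; real files would be read from disk.
    controller = "ClusterName=demo\nControlMachine=head\n"
    nodes = {
        "w50": controller,
        "w51": "ClusterName=demo\nControlMachine=oldhead\n",
    }
    print(find_mismatched_nodes(controller, nodes))
```

Any node the function returns is a candidate for re-syncing slurm.conf before restarting the daemons.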
