[slurm-dev] task_affinity bug in 2.5.1 and after..

2013-02-01 Thread Magnus Jonsson
Hi! We are in the process of upgrading to slurm 2.5.2, but I just found a bug in the task_affinity plugin in combination with cgroups. The commit https://github.com/SchedMD/slurm/commit/791322349856e14a3d50aadc4869d40b034a2f37, which solves some Power7-specific problems, breaks task

[slurm-dev] Re: slurmctld crash, no reason

2013-02-01 Thread Barbara Krasovec
I have the same problems, nothing useful in the logs. I'm using slurm 2.4.3 on SL6. Barbara On 02/01/2013 01:28 PM, Mario Kadastik wrote: Hi, I just had slurmctld disappear out of the blue. A user reported that he cannot get job info and upon inspection slurmctld wasn't running. I started

[slurm-dev] Re: slurmctld crash, no reason

2013-02-01 Thread David Bigagli
Hi, upon startup slurmctld changes its working directory to where the log file is. If the log file is: SlurmctldLogFile=/var/tmp/slurm/slurmctld.log the working directory is /var/tmp/slurm. Assuming your slurmctld dumped core for whatever reason, the core file should be there. The directory should be
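
The rule described above (working directory = directory of the log file) can be sketched in Python. This is only an illustration: the helper name `slurmctld_workdir` is hypothetical, and the parsing assumes a simple `Key=Value` line per setting with `#` comments, which is a simplification of the real slurm.conf syntax.

```python
import os


def slurmctld_workdir(conf_text):
    """Return the directory of SlurmctldLogFile from slurm.conf text, or None if unset.

    Hypothetical helper: by the rule in the message above, this is the
    directory where a slurmctld core file would be expected.
    """
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "slurmctldlogfile":
            return os.path.dirname(value.strip())
    return None


if __name__ == "__main__":
    sample = "SlurmctldLogFile=/var/tmp/slurm/slurmctld.log\n"
    print(slurmctld_workdir(sample))
```

With the example path from the message, the function returns /var/tmp/slurm, which is where one would look for the core file.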

[slurm-dev] Re: slurmctld crash, no reason

2013-02-01 Thread Danny Auble
This is the first report of any issue like this. Even if you don't see something useful in the logs, they are usually useful to reproduce or diagnose the problem. If you could send your slurmctld.log and your slurm.conf someone might be able to look through them to see if they could get a

[slurm-dev] Re: slurmctld crash, no reason

2013-02-01 Thread Barbara Krasovec
On 2/1/13 5:57 PM, Marcin Stolarek wrote: 2013/2/1 Barbara Krasovec barba...@arnes.si I have the same problems, nothing useful in the logs. I'm using slurm 2.4.3 on SL6. Barbara On 02/01/2013 01:28 PM,

[slurm-dev] Re: finding nodes name

2013-02-01 Thread Moe Jette
Look at the environment variable SLURMD_NODENAME. Quoting E L nak...@gmail.com: Hello, How can one find the local node name as Slurm knows it? Obviously `hostname -s` will do it 99% of the time, but what if a node has several names and Slurm knows it by a different one? Thanks, Nakee
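
A minimal sketch of that lookup in Python, assuming the script runs somewhere Slurm has exported SLURMD_NODENAME (for example inside a job step). The helper name `slurm_node_name` and the fallback to the short hostname are illustrative assumptions, not part of Slurm.

```python
import os
import socket


def slurm_node_name(environ=None):
    """Return the node name as Slurm knows it, else the short hostname."""
    env = os.environ if environ is None else environ
    name = env.get("SLURMD_NODENAME")
    if name:
        return name
    # Fallback (assumption): rough equivalent of `hostname -s`
    return socket.gethostname().split(".")[0]


if __name__ == "__main__":
    print(slurm_node_name())
```

The fallback only covers the common case where the short hostname and the Slurm node name coincide; when they differ, the environment variable is the value to trust.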

[slurm-dev] Re: finding nodes name

2013-02-01 Thread Danny Auble
I would suggest setting NodeAddr for each node in the slurm.conf. That way you will always know what address it points to. On 02/01/2013 07:06 AM, E L wrote: Hello, How can one find the local node name as Slurm knows it? Obviously `hostname -s` will do it 99% of the time, but what if a

[slurm-dev] Re: task_affinity bug in 2.5.1 and after..

2013-02-01 Thread Moe Jette
It's working as desired for my test systems with cgroups on x86_64. Could you send me system hardware and configuration details? (Sending them directly to me rather than to the list is fine.) Quoting Magnus Jonsson mag...@hpc2n.umu.se: Hi! We are in the process of upgrading to slurm 2.5.2, but I just found

[slurm-dev] Re: slurmctld crash, no reason

2013-02-01 Thread Moe Jette
Just upgrade the slurmdbd, then shut down the cluster, install the new slurm and restart the daemons. No jobs should be lost. Quoting Mario Kadastik mario.kadas...@cern.ch: Hi, upon startup slurmctld changes its working directory to where the log file is. If the log file is: