[slurm-dev] Re: Job stuck in CONFIGURING, node is 'mix~'

2017-09-12 Thread Loris Bennett
Hi Lyn, Unfortunately, rebooting the node makes no difference to the state of the node. The job gets re-queued and the node goes back to 'mix~'. What baffles me is that there is obviously some sort of communication problem between the slurmctld on the admin node and the slurmd on the compute

[slurm-dev] Re: Cores, CPUs, and threads: take 2

2017-09-12 Thread Christopher Samuel
On 13/09/17 10:47, Lachlan Musicman wrote:
> Chris how does this sacrifice performance? If none of my software
> (bioinformatics/perl) is HT, surely I'm sacrificing capacity by leaving
> one thread unused as jobs take an entire core?
A HT is not a core, so if you are running multiple processes

[slurm-dev] Re: Cores, CPUs, and threads: take 2

2017-09-12 Thread Lachlan Musicman
On 13 September 2017 at 10:36, Christopher Samuel wrote:
> On 13/09/17 07:22, Patrick Goetz wrote:
> > All I have to say to this is: um, what?
> My take has always been that ThreadsPerCore is really for HPC workloads
> where you've decided not to disable HT full stop

[slurm-dev] Re: Cores, CPUs, and threads: take 2

2017-09-12 Thread Christopher Samuel
On 13/09/17 07:22, Patrick Goetz wrote:
> All I have to say to this is: um, what?
My take has always been that ThreadsPerCore is really for HPC workloads where you've decided not to disable HT full stop but want to allocate full cores to each task and then let the code have 2 threads per Slurm

[slurm-dev] Re: Cores, CPUs, and threads: take 2

2017-09-12 Thread Patrick Goetz
On 09/12/2017 04:21 AM, Gennaro Oliva wrote: On Mon, Sep 11, 2017 at 04:51:04PM -0600, Lachlan Musicman wrote: "Note also if you are running with more than 1 thread per core and running the select/cons_res plugin you will want to set the SelectTypeParameters variable to something other than

[slurm-dev] Re: Job stuck in CONFIGURING, node is 'mix~'

2017-09-12 Thread Lyn Gerner
Hi Loris, At least with earlier releases, I've not found a way to act directly upon the job. However, if it's possible to down the node, that should requeue (or cancel) the job. Best, Lyn
On Tue, Sep 12, 2017 at 3:40 AM, Loris Bennett wrote:
> Hi,
> I have a

[slurm-dev] On the need for slurm uid/gid consistency

2017-09-12 Thread Phil K
I'm hoping someone can provide an explanation as to why slurm requires uid/gid consistency across nodes, with emphasis on the need for the 'SlurmUser' to be uid/gid-consistent. I know that slurmctld and slurmdbd can run as user `slurm` and that this would be safer than running as root.

[slurm-dev] Job stuck in CONFIGURING, node is 'mix~'

2017-09-12 Thread Loris Bennett
Hi, I have a node which is powered on and to which I have sent a job. The output of sinfo is
PARTITION AVAIL TIMELIMIT  NODES STATE NODELIST
test      up    7-00:00:00 1     mix~  node001
The output of squeue is
JOBID PARTITION NAME USER ST TIME NODES

[slurm-dev] unsubscribe

2017-09-12 Thread Erica Riello
I'd like to unsubscribe. Regards, Erica Riello

[slurm-dev] Re: what should I do to make program does not quit when Prolog exits

2017-09-12 Thread Chris Harwell
I am not totally sure I understand the question, but try srun myprogram.sh instead of 'nohup myprogram.sh &'
On Mon, Sep 11, 2017, 23:43 shengzhao wen wrote:
> Hi all:
> I execute a program at prolog, but when prolog exit, my program also
> exit.
> what should

[slurm-dev] Re: Cores, CPUs, and threads: take 2

2017-09-12 Thread Gennaro Oliva
Hi Lachlan,
On Mon, Sep 11, 2017 at 04:51:04PM -0600, Lachlan Musicman wrote:
> "Note also if you are running with more than 1 thread per core and running
> the select/cons_res plugin you will want to set the SelectTypeParameters
> variable to something other than CR_CPU to avoid unexpected