Hi Lyn,
Unfortunately, rebooting the node makes no difference to the state of
the node. The job gets re-queued and the node goes back to 'mix~'.
What baffles me is that there is obviously some sort of communication
problem between the slurmctld on the admin node and the slurmd on the
compute
On 13/09/17 10:47, Lachlan Musicman wrote:
> Chris how does this sacrifice performance? If none of my software
> (bioinformatics/perl) is HT, surely I'm sacrificing capacity by leaving
> one thread unused as jobs take an entire core?
A HT is not a core, so if you are running multiple processes
On 13 September 2017 at 10:36, Christopher Samuel wrote:
>
> On 13/09/17 07:22, Patrick Goetz wrote:
>
> > All I have to say to this is: um, what?
>
> My take has always been that ThreadsPerCore is really for HPC workloads
> where you've decided not to disable HT full stop
On 13/09/17 07:22, Patrick Goetz wrote:
> All I have to say to this is: um, what?
My take has always been that ThreadsPerCore is really for HPC workloads
where you've decided not to disable HT full stop but want to allocate
full cores to each task and then let the code have 2 threads per Slurm
On 09/12/2017 04:21 AM, Gennaro Oliva wrote:
On Mon, Sep 11, 2017 at 04:51:04PM -0600, Lachlan Musicman wrote:
"Note also if you are running with more than 1 thread per core and running
the select/cons_res plugin you will want to set the SelectTypeParameters
variable to something other than
Hi Loris,
At least with earlier releases, I've not found a way to act directly upon
the job. However, if it's possible to down the node, that should requeue
(or cancel) the job.
Best,
Lyn
On Tue, Sep 12, 2017 at 3:40 AM, Loris Bennett wrote:
>
> Hi,
>
> I have a
I'm hoping someone can provide an explanation as to why slurm requires uid/gid
consistency across nodes, with emphasis on the need for the 'SlurmUser' to be
uid/gid-consistent. I know that slurmctld and slurmdbd can run as user
`slurm` and that this would be safer than running as root.
Hi,
I have a node which is powered on and to which I have sent a job. The
output of sinfo is
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
test         up 7-00:00:00      1   mix~ node001
The output of squeue is
JOBID PARTITION NAME USER ST TIME NODES
I'd like to unsubscribe.
Regards,
--
===
Erica Riello
I am not totally sure I understand the question, but try
srun myprogram.sh
instead of 'nohup myprogram.sh &'
On Mon, Sep 11, 2017, 23:43 shengzhao wen wrote:
> Hi all:
> I run a program from the prolog, but when the prolog exits, my program
> exits too.
> what should
Hi Lachlan,
On Mon, Sep 11, 2017 at 04:51:04PM -0600, Lachlan Musicman wrote:
> "Note also if you are running with more than 1 thread per core and running
> the select/cons_res plugin you will want to set the SelectTypeParameters
> variable to something other than CR_CPU to avoid unexpected