Hi,
Does anyone have an idea about this error?
[root@cluster ~]# sacctmgr -i create cluster Rocks-Cluster
sacctmgr: error: slurmdbd: Sending DbdInit msg: Unable to connect to database
sacctmgr: error: Problem talking to the database: Unable to connect to database
--
Regards,
Mahmood
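One way to narrow this down: sacctmgr talks to slurmdbd, and slurmdbd in turn talks to the MySQL/MariaDB server, so the error could come from either hop. The Python sketch below only checks whether each TCP port accepts a connection. It assumes localhost and the default ports (6819 for slurmdbd's DbdPort, 3306 for MySQL); substitute the DbdHost/DbdPort and StorageHost/StoragePort values from your own slurmdbd.conf. The function name is hypothetical.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers connection refused, timeouts and DNS failures
        return False


if __name__ == "__main__":
    # Assumed defaults; adjust to the values in your slurmdbd.conf.
    checks = {
        "slurmdbd (DbdPort)": ("localhost", 6819),
        "MySQL/MariaDB (StoragePort)": ("localhost", 3306),
    }
    for name, (host, port) in checks.items():
        state = "reachable" if port_is_open(host, port) else "NOT reachable"
        print(f"{name} at {host}:{port}: {state}")
```

A reachable port does not prove that the database credentials in slurmdbd.conf are correct; the slurmdbd log is still the place to look for that.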
All,
I'm collecting some usage metrics for our cluster, and I'd like to look at
utilisation in terms of allocated CPU % by partition, basically the equivalent of
`sinfo -O cpusstate -p partition_name`, but for historic data. What's the best
way to do this?
I've found that running `sacct
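A rough sketch of one approach, assuming you export allocations in parsable form with something like `sacct -a -X -n -P -S <start> -E <end> --format=Partition,AllocCPUS,ElapsedRaw`. The Python below sums allocated CPU-seconds per partition and divides by the partition's capacity over the window. The function names and the per-partition CPU count are hypothetical inputs, not something Slurm hands you directly, and jobs that straddle the start or end of the window are counted in full, so treat the percentage as approximate.

```python
from collections import defaultdict


def cpu_seconds_by_partition(sacct_output: str) -> dict[str, int]:
    """Sum AllocCPUS * ElapsedRaw per partition.

    Expects lines of the form Partition|AllocCPUS|ElapsedRaw, with no header,
    as produced by `sacct -n -P --format=Partition,AllocCPUS,ElapsedRaw`.
    """
    totals: dict[str, int] = defaultdict(int)
    for line in sacct_output.splitlines():
        if not line.strip():
            continue
        partition, cpus, elapsed = line.split("|")
        totals[partition] += int(cpus) * int(elapsed)
    return dict(totals)


def utilisation_percent(cpu_seconds: int, partition_cpus: int, window_seconds: int) -> float:
    """Allocated CPU-seconds as a percentage of partition capacity in the window."""
    return 100.0 * cpu_seconds / (partition_cpus * window_seconds)


if __name__ == "__main__":
    # Hypothetical sample lines, not real accounting data.
    sample = "compute|16|3600\ncompute|8|1800\ngpu|4|7200\n"
    totals = cpu_seconds_by_partition(sample)
    print(totals)  # {'compute': 72000, 'gpu': 28800}
    # Hypothetical capacity: 'compute' has 64 CPUs; the window is one day.
    print(f"{utilisation_percent(totals['compute'], 64, 86400):.2f}%")
```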
> You need to have slurmdbd running. Is it running?
[root@cluster ~]# ps aux | grep slurmdb
root      3406  0.0  0.0 338636  2672 ?        Sl   00:26   0:01 /usr/sbin/slurmdbd
root     17146  0.0  0.0 105308   888 pts/2    S+   13:26   0:00 grep slurmdb
> Is slurm.conf pointing to the right
Hello,
this feature is implemented in "Job Packs" and was presented at SLUG 2016:
https://slurm.schedmd.com/SLUG16/Job_Packs_SUG_2016.pdf
Nevertheless, it does not seem to be in slurm-17.
Regards,
Hendryk
On 26.04.2017 14:12, Malte Thoma wrote:
Hi Kolja,
AFAIK it is not possible with slurm
Dear all,
I'm new to Slurm and clusters in general and I'm currently charged with a task
which, I would imagine, is rather simple but I can't figure it out.
I need to start several processes using --multi-prog but assign different
amounts of resources to each process. (i.e. asymmetric
Hi Kolja,
AFAIK it is not possible with slurm YET :'-(
If you should find out anything we would be VERY interested in an
example ;-).
Regards,
Malte
On 26.04.2017 at 13:35, bjoern.miel...@iwes.fraunhofer.de wrote:
Dear all,
I'm new to Slurm and clusters in general and I'm currently
Hello,
I guess this may be a simple question for someone more experienced with Slurm
scheduling than us. When jobs are queuing in our cluster we find that we get a
lot of these messages in our slurmctld.log
error: Job 25766 priority exceeds 32 bits
I cannot find any mention or discussion of this
Hi David,
Baker D.J. writes:
> Hello,
>
> I guess this may be a simple question for someone more experienced with Slurm
> scheduling than us. When jobs are queuing in our cluster we find that we get
> a lot of these messages in our slurmctld.log
>
> error: Job 25766
Hi Mahmood
> [root@cluster ~]# ps aux | grep slurmdb
> root      3406  0.0  0.0 338636  2672 ?        Sl   00:26   0:01 /usr/sbin/slurmdbd
> root     17146  0.0  0.0 105308   888 pts/2    S+   13:26   0:00 grep slurmdb
That's good. What does its /var/log/slurm/slurmdbd.log say? Any errors?
>
Hi Loris,
Thank you for your reply. The output from "sprio -l" is:
  JOBID     USER   PRIORITY  AGE  FAIRSHARE  JOBSIZE  PARTITION  QOS  NICE  TRES
  25988  mjp1m12 -922337203    2        nan        1       1000    0     0
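For context, the multifactor priority plugin computes a job's priority as a weighted sum of factors that are each normalised to the range 0.0 to 1.0, and the result has to fit in a 32-bit unsigned integer. The simplified Python sketch below leaves out the site and TRES terms, and its weight and factor names are illustrative only. It shows the two failure modes that seem relevant here: weights whose sum can exceed 2^32 - 1, and a NaN factor (like the nan in the FAIRSHARE column of the sprio output), which turns the whole sum into NaN.

```python
import math

UINT32_MAX = 2**32 - 1  # job priority must fit in a 32-bit unsigned integer


def multifactor_priority(weights: dict[str, int], factors: dict[str, float], nice: int = 0) -> float:
    """Simplified multifactor sum: sum(weight * factor) - nice.

    Each factor is assumed to be normalised to [0.0, 1.0]. Site and TRES
    terms of the real formula are left out of this sketch.
    """
    return sum(weight * factors[name] for name, weight in weights.items()) - nice


def check_priority(value: float) -> str:
    """Classify a computed priority value."""
    if math.isnan(value):
        return "nan: a factor is NaN, so the whole sum is NaN"
    if value > UINT32_MAX:
        return "overflow: weighted sum exceeds 32 bits"
    return "ok"


if __name__ == "__main__":
    # Illustrative weights, not taken from any real slurm.conf.
    weights = {"age": 1000, "fairshare": 10000, "jobsize": 1000, "partition": 1000}
    good = {"age": 0.5, "fairshare": 0.25, "jobsize": 0.0, "partition": 1.0}
    bad = dict(good, fairshare=float("nan"))
    print(check_priority(multifactor_priority(weights, good)))  # ok
    print(check_priority(multifactor_priority(weights, bad)))
```

Keeping the sum of all PriorityWeight* values well below 4294967295 avoids the overflow case; a NaN would presumably have to be traced back to the fair-share data itself.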
Also check to see if munge is functioning properly.
On Wed, Apr 26, 2017 at 10:00 AM, Jeff Tan wrote:
> Hi Mahmood
>
> > [root@cluster ~]# ps aux | grep slurmdb
> > root      3406  0.0  0.0 338636  2672 ?        Sl   00:26   0:01 /usr/sbin/slurmdbd
> > root     17146
Hi David,
Baker D.J. writes:
> Hi Loris,
>
> Thank you for your reply. The output from "sprio -l" is:
>
> JOBID  USER  PRIORITY  AGE  FAIRSHARE  JOBSIZE  PARTITION  QOS  NICE  TRES
> 25988 mjp1m12 -922337203
I've noticed that when I run jobs at the lowest priority, some fraction of them (almost
always the second half of a job array) fail immediately. In sacct this is
what I see:
8365358_98    agent_0_v+ om_all_no+    2    TIMEOUT    1:0
8365358_98.+  batch                    2
Is there a time limit set on the queue (rather than the user)?
On 04/26/2017 12:57 PM, Uwe Sauter wrote:
Hi all,
I have a mysterious situation where a user's job is killed after 24h even though he specified "-t 7-00:00:00"
on submission. This happened to several jobs of this user in the last few
MaxTime on the partition / queue is set to 10 weeks (70-00:00:00).
What I forgot to mention is that some months ago, the same account had jobs
running successfully for several days.
But since then there have been at least two updates within 16.05.
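To compare the limits mentioned in this thread, here is a small Python sketch that converts Slurm time strings (the formats accepted by sbatch --time) to seconds and reports the shortest one. The helper names are hypothetical, and treating the effective limit as the minimum is a simplification: depending on the limit, Slurm may reject or hold a job rather than shorten it.

```python
def slurm_time_to_seconds(spec: str) -> float:
    """Convert a Slurm time limit string to seconds.

    Forms handled (as accepted by sbatch --time): "minutes", "minutes:seconds",
    "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
    "days-hours:minutes:seconds". "UNLIMITED"/"INFINITE" become infinity.
    """
    if spec.upper() in ("UNLIMITED", "INFINITE"):
        return float("inf")
    days = 0
    if "-" in spec:
        day_part, spec = spec.split("-", 1)
        days = int(day_part)
        fields = [int(f) for f in spec.split(":")]
        fields += [0] * (3 - len(fields))  # pad to hours, minutes, seconds
        hours, minutes, seconds = fields
    else:
        fields = [int(f) for f in spec.split(":")]
        if len(fields) == 1:
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:
            hours, minutes, seconds = 0, fields[0], fields[1]
        else:
            hours, minutes, seconds = fields
    return float(((days * 24 + hours) * 60 + minutes) * 60 + seconds)


def tightest_limit(**limits: str) -> tuple[str, float]:
    """Return the name and length in seconds of the shortest limit given."""
    name = min(limits, key=lambda key: slurm_time_to_seconds(limits[key]))
    return name, slurm_time_to_seconds(limits[name])
```

With the values from this thread, `tightest_limit(job_request="7-00:00:00", partition_maxtime="70-00:00:00", account_maxwall="7-00:00:00")` returns `("job_request", 604800.0)`. None of these limits comes out at 24h (86400 s), which suggests the 24h cut-off is being applied from somewhere not listed here.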
On 26.04.2017 at 19:06, Andy Riebs wrote:
Hi all,
I have a mysterious situation where a user's job is killed after 24h even though he specified
"-t 7-00:00:00"
on submission. This happened to several jobs of this user in the last few days.
The account he's using has MaxWall set to 7-00:00:00. There is no QoS used.
In
Hi Mahmood
> [root@cluster ~]# sacctmgr -i create cluster Rocks-Cluster
> sacctmgr: error: slurmdbd: Sending DbdInit msg: Unable to connect to database
> sacctmgr: error: Problem talking to the database: Unable to connect to database
You need to narrow that down. If you're using sacctmgr, you
Hello,
Our slurm control process is having some difficulty with node names.
Our computing cluster is set up as a bunch of virtual machines. The names
of each node are mda01, mda02, ... (mda[01-64] in slurmspeak). If we ssh to
the nodes, `hostname` returns the correct name (same with
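When names look right on the nodes but slurmctld still complains, it can help to compare the expanded node list against what each VM actually reports. Slurm's own `scontrol show hostnames mda[01-64]` does this expansion; the Python sketch below is a simplified stand-in (one bracket group only, hypothetical function name) for use in a checking script, and it keeps the zero padding that mda[01-64] implies.

```python
import re


def expand_hostlist(expr: str) -> list[str]:
    """Expand a single-bracket hostlist such as "mda[01-64]" or "node[1,3-4]".

    Only one bracket group of comma-separated numbers or ranges is handled;
    zero padding is taken from the width of each range's start value.
    """
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", expr)
    if match is None:
        return [expr]  # a plain host name without brackets
    prefix, body, suffix = match.groups()
    hosts = []
    for item in body.split(","):
        if "-" in item:
            start, end = item.split("-")
            width = len(start)
            for n in range(int(start), int(end) + 1):
                hosts.append(f"{prefix}{n:0{width}d}{suffix}")
        else:
            hosts.append(f"{prefix}{item}{suffix}")
    return hosts
```

For example, `expand_hostlist("mda[01-64]")` yields 64 names from `mda01` to `mda64`, which can then be checked one by one against the output of `hostname` on each VM.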
Thanks for the reply!
A concrete example is interactive jobs (R, Python, etc.). I want users to request only the
minimum amount of memory they need, but I don't want
their session to generate an error if they try to allocate more memory -
and the free memory is available in the system. The upper
Hello Slurm community,
Our lab has recently begun transitioning from maui/torque to slurm, but we
are having some difficulties getting our configuration correct. In short,
our CUDA tests routinely violate the virtual memory limits assigned by the
scheduler even though the physical memory space
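A limit on virtual memory counts address space a process has reserved, not only the memory it has actually touched, and CUDA programs commonly reserve far more address space than they use. A quick Linux-only Python sketch for seeing that gap on a node is to read VmSize (virtual) and VmRSS (resident) from /proc; the function name is hypothetical.

```python
def vm_usage_kib(pid: str = "self") -> dict[str, int]:
    """Return VmSize (virtual) and VmRSS (resident) in KiB from /proc (Linux only)."""
    usage = {}
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            key, _, value = line.partition(":")
            if key in ("VmSize", "VmRSS"):
                usage[key] = int(value.split()[0])  # /proc reports these in kB
    return usage


if __name__ == "__main__":
    usage = vm_usage_kib()
    print(f"virtual: {usage['VmSize']} KiB, resident: {usage['VmRSS']} KiB")
```

Pointing `pid` at one of the running CUDA tests should show whether it is VmSize, rather than resident memory, that exceeds the limit.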