"
SlurmdLogFile: "/var/log/slurm/slurmd.log"
SlurmdSpoolDir: "/var/spool/slurm/d"
SlurmUser: "{{ slurm_user.name }}"
SrunPortRange: "6-61000"
StateSaveLocation: "/var/spool/slurm/ctld"
t all. Any thoughts?
Dietmar
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe sen
On 06/03/2024 13:49, Gestió Servidors via slurm-users wrote:
And how can I reject the job inside the lua script?
Just use
return slurm.FAILURE
and the job will be refused.
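A minimal job_submit.lua sketch of that pattern (the condition and message are illustrative; this only runs inside slurmctld's job_submit/lua plugin, where the slurm table is provided):

```lua
-- job_submit.lua sketch: refuse jobs that fail a site check.
-- Example condition: reject jobs submitted without a time limit.
function slurm_job_submit(job_desc, part_list, submit_uid)
   if job_desc.time_limit == slurm.NO_VAL then
      slurm.log_user("Please specify a time limit (--time)")
      return slurm.FAILURE  -- job is refused; sbatch reports an error
   end
   return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
   return slurm.SUCCESS
end
```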
end
end
return slurm.SUCCESS
end
However, if I submit a job with a TimeLimit of 5 hours, the Lua script
doesn’t modify the submission and the job remains “pending”…
What am I doing wrong?
Thanks.
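For reference, a sketch of the kind of time-limit clamp being discussed — the 4-hour cap is an assumed example, and note that job_desc.time_limit is expressed in minutes; like any job_submit/lua code, it only runs inside slurmctld at submission time, so already-pending jobs are not touched:

```lua
-- Sketch: cap the requested TimeLimit at an assumed site maximum of 240 minutes
local MAX_MINUTES = 240  -- assumed limit, for illustration
function slurm_job_submit(job_desc, part_list, submit_uid)
   if job_desc.time_limit ~= slurm.NO_VAL and job_desc.time_limit > MAX_MINUTES then
      job_desc.time_limit = MAX_MINUTES
      slurm.log_user("TimeLimit reduced to " .. MAX_MINUTES .. " minutes")
   end
   return slurm.SUCCESS
end
```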
A
had something in mind when they developed MPS, so I guess our pattern
may not be typical (or at least not universal), and in that case the MPS
plugin may well be what you need.
replication, then manually switch slurmdbd to a replication
slave if the master goes down? Do you do something else?
Thanks.
Daniel
to drained. Another possibility is that
slurmctld detects a mismatch between the node and its config: in this
case you'll find the reason in slurmctld.log .
all_nodes* drained 32 2:8:2 6 0
1 (null) batch job complete f
You have to RESUME the node so it starts accepting jobs.
scontrol update nodename=compute-0 state=resume
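In command form, tying the two replies together (node name taken from the example above; these need Slurm client tools and appropriate privileges):

```shell
# Sketch: first find out why the node drained, then return it to service
sinfo -R                                        # lists drained/down nodes with their Reason
scontrol show node compute-0 | grep -i reason   # per-node detail
scontrol update nodename=compute-0 state=resume
```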
penalise everyone who requests
large amounts of memory, whether it is needed or not.
Therefore I would be interested in knowing whether one can take into
account the *requested but unused memory* when calculating usage. Is
this possible?
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Un
array range.
I tried to add "-v" to the sbatch to see if that gives more useful info,
but I couldn't get any more insight. Does anyone have any idea why it's
rejecting my job?
thanks,
Noam
es have 15 different values
(including 0).
scription is misleading.
Noam
that snippet in
job_submit.lua ...
Would you expect that to prevent the job from ever running on
any partition? Currently (and, I think, wrongly) that's exactly what happens.
Thu, Sep 21, 2023 at 3:11 AM Diego Zuccato <diego.zucc...@unibo.it> wrote:
Hello all.
We have one partition (b4) that's reserved for an account while the
others are "free for all".
The problem is that
sbatch --partition=b1,b2,b3,b4,b5 test.sh
fails
cate scheduler logic in
job_submit.lua... :)
ain enough
nodes to satisfy the request. That seems to also apply to the all_partitions
job_submit plugin, making it nearly useless.
We're using Slurm 22.05.6. On 20.11.4 it worked as expected (excluding
partitions that couldn't satisfy the request).
Any hint?
TIA
be a great
problem if the reservation remained...
A reservation should only get deleted when expired, IMO (but I can
understand that there are cases where the current behaviour is desired).
topic. Your expertise and
assistance would greatly help me in successfully completing my project.
Thank you in advance for your time and support.
Best regards,
Maysam
Johannes Gutenberg University of Mainz
Ok, PEBKAC :)
When creating the reservation, I set account=root . Just adding
"account=" to the update fixed both errors.
Sorry for the noise.
Diego
On 04/05/2023 07:51, Diego Zuccato wrote:
Hello all.
I'm trying to define a reservation that only allows users
@slurmctl ~]# getent group res-TEST
res-TEST:*:1180406822:testuser
The group comes from AD via sssd.
What am I missing?
TIA
default partitions? In the best case
in a way that slurm schedules to partition1 on default and only to
partition2 when partition1 can't handle the job right now.
Best regards,
Xaver Stiensmeier
tal runs of jobs and
gather timings. We have yet to see a 100% efficient process, but folks
are improving things all the time.
Brian Andrus
On 2/13/2023 9:56 PM, Diego Zuccato wrote:
I think that's incorrect:
> The concept of hyper-threading is not doubling cores. It is a single
> core that
78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126)
Has anyone faced this or a similar issue and can give me some
directions?
Best wishes
Sebastian
http://www.ph.utexas.edu/~daneel/
That's probably not optimal, but could work. I'd go with brutal
preemption: swapping 90+G can be quite time-consuming.
Diego
On 07/02/2023 14:18, Analabha Roy wrote:
On Tue, 7 Feb 2023, 18:12 Diego Zuccato <diego.zucc...@unibo.it> wrote:
RAM used by a susp
ics/department/physics>
The University of Burdwan <http://www.buruniv.ac.in/>
Golapbag Campus, Barddhaman 713104
West Bengal, India
Emails: dan...@utexas.edu, a...@phys.buruniv.ac.in, hariseldo...@gmail.com
reference to the "default partition" in `JobSubmitPlugins`
and this might be the solution. However, I think this is something so
basic that it probably shouldn't need a plugin so I am unsure.
Can anyone point me towards how setting the default partition is done?
Best regards,
Xaver Stie
On 21/10/2022 19:14, Rohith Mohan wrote:
IIUC this could be the source of your problem:
SelectTypeParameters=CR_CPU_Memory
Maybe try CR_Core_Memory instead: CR_CPU* has no notion of
sockets/cores/threads.
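As a slurm.conf fragment, the suggested change might look like this (the SelectType line is an assumption about the setup; cons_res would work the same way):

```
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
```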
d between controllers, right?
Possibly use NVME-backed (or even better NVDIMM-backed) NFS share. Or
replica-3 Gluster volume with NVDIMMs for the bricks, for the paranoid :)
Diego
gards,
--
Willy Markuske
HPC Systems Engineer
Research Data Services
P: (619) 519-4435
On 26/05/2022 11:48, Diego Zuccato wrote:
Still can't
export TMPDIR=...
from TaskProlog script. Surely missing something important. Maybe
TaskProlog is called as a subshell? In that case it can't alter caller's
env... But IIUC someone made it work, and that confuses me...
Seems I
by the job).
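For what it's worth, TaskProlog does run as a separate process, so a plain `export` there cannot reach the task's environment. What slurmd does instead is parse the script's standard output: any line of the form `export NAME=value` is injected into the task's environment. A minimal sketch (the /scratch path is an assumption; the helper function is only there to keep the echo in one place):

```shell
#!/bin/bash
# TaskProlog sketch: setting TMPDIR directly has no effect on the task;
# PRINTING an "export NAME=value" line tells slurmd to set it for the task.
emit_env() {
    echo "export TMPDIR=/scratch/job_${SLURM_JOB_ID}"
}
emit_env
```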
ne, but I'm sure there must be a better way to do this.
Thanks in advance for the help.
best regards,
Alain
}/usr/mpich-4.0.2
gives an executable that only uses 1 CPU even if sbatch requested 52. :(
Any hint appreciated.
Tks.
duced (on newer versions)?
Can this somehow be avoided by setting a default number of tasks or some
other (partition) parameter? Sorry for asking but I couldn't find
anything in the documentation.
Let me know if you need more information.
Best Regards, Benjamin
.
==
Paul Brunk, system administrator
Georgia Advanced Resource Computing Center
Enterprise IT Svcs, the University of Georgia
On 2/10/22, 6:26 AM, "slurm-users"
wrote:
On Thu, 2022-02-10 at 11:59:58 +0100, Diego Zuccato wrote:
> Hello all.
slurmctld need read access to /home/userA/myjob.sh or does it
receive the job script as a "blob" or as a path? Does it even need to
know userA's GID or will it simply use 'userA' to lookup associations in
dbd?
Tks.
one (been there, done that... :( ).
Tks.
Will be useful soon :)
Are there other monitoring plugins you'd suggest?
On 17/12/2021 11:15, Loris Bennett wrote:
Hi Diego,
Diego Zuccato writes:
Hi Loris.
On 14/12/2021 14:16, Loris Bennett wrote:
spectrum, today, via our Zabbix monitoring, I spotted some jobs with an
unusually high GPU-efficiencies which turned out to be doing
cryptomining :-/
What are you using to collect data for Zabbix?
list,user"
JobID NodeList User
--- -----
791 smp-1 user01
...
Best,
Steffen
impact autodetection (so it "just" requires manual
config) or GPU jobs won't be able to start at all?
you saw. Restarting slurmd on the submit node fixes it. This is the
documented behavior (adding nodes needs slurmd restarted everywhere). Could
this be what you're seeing (as opposed to /etc/hosts vs DNS)?
time could be backfilled till the reservation/maintenance starts. You
can put the reservation anytime in the system but at least or before
" minus ", e.g.
scontrol create reservation= starttime=
duration= user=root flags=maint nodes=ALL
Hope, that helps a little bit,
Carsten
-
isted here:
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#install-prerequisites
/Ole
On 05-11-2021 15:38, Diego Zuccato wrote:
They aren't using modules so it must be something system-wide :(
But not all jobs are impacted. And it seems it's a bit random (doesn't
happen always).
I'm out of ideas, currently :(
On 05/11/2021 13:10, Ole Holm Nielsen wrote:
On 11/5/21 12:47, Diego Zuccato wrote:
Some users
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
MemorySwappiness=0
MaxSwapPercent=0
AllowedSwapSpace=0
Any ideas?
Tks.
tax errors and the most common errors is already
a big help, especially for noobs :)
[OK]: All nodeweights are correct.
What do you mean by this? How can weights be "incorrect"?
If someone is interested ... Surely I am :)
That's why I upgraded the whole cluster at once.
Tks for the help.
ecified").
SLURM 20.11.4.
Tks.
Diego
On 01/10/2021 21:32, Paul Brunk wrote:
Hi:
If you mean "why are the nodes still Drained, now that I fixed the
slurm.conf and restarted (never mind whether the RealMem parameter is
correct)?", try 'scontrol update nodename=str957-bl0-0[1-2]
<--
I also tried lowering RealMemory setting to 6, in case MemSpecLimit
interfered, but the result remains the same.
Any ideas?
TIA!
On 20/09/2021 13:49, Diego Zuccato wrote:
Tks. Checked it: it's on the home filesystem, NFS-shared between the
nodes. Well, actually a bit more involved than that: JobCompLoc points
to /var/spool/jobscompleted.txt but /var/spool/slurm is actually a
symlink to /home/conf/slurm_spool .
root
ory.
The explanation at below is taken from slurm web site:
"The backup controller recovers state information from the
StateSaveLocation directory, which must be readable and writable from
both the primary and backup controllers."
Regards;
Ahmet M.
On 20.09.2021 12:08, Diego Zuccato wrote:
ntly in the process of adding some nodes, but I already did it
other times w/ no issues (actually the second slurmctld node have been
installed to catch the race of a job terminating while the main
slurmctld was shut down).
Anything I should double-check?
Tks.
right now):
RealMemory=257433 AllocMem=0 FreeMem=159610
That's probably due to buffers/caches remaining allocated between jobs.
They're handled by the OS and should be automatically freed when a
program needs memory.
IIRC we increased SlurmdTimeout to 7200 .
On 06/08/2021 13:33, Adrian Sevcenco wrote:
On 8/6/21 1:56 PM, Diego Zuccato wrote:
We had a similar problem some time ago (slow creation of big core
files) and solved it by increasing the Slurm timeouts
oh, i see.. well, in principle i should
:46, Adrian Sevcenco wrote:
On 8/6/21 1:27 PM, Diego Zuccato wrote:
Hi.
Hi!
Might it be due to a timeout (maybe the killed job is creating a core
file, or caused heavy swap usage)?
i will have to search for culprit ..
the problem is why would the node be put in drain for the reason
submit one job with 8 GPUs, it will
be pending because of GPU fragmentation: node A has 2 idle GPUs, node B
6 idle GPUs
Thanks in advance!
of task fails and how
can I disable it? (I use cgroups)
Moreover, how can the killing of a task fail? (this is on slurm 19.05)
Thank you!
Adrian
start". But pestat and slurmtop are
different tools for different uses, no need to duplicate all functionality.
pport is all right with no problems, but slurmctld
still does not start on boot.
Also in the log reported blade01 is the hostname of one of the nodes.
You should probably fix /usr/lib/systemd/system/slurmdbd.service as well.
/Ole
don't
quite see how one could integrate pestat itself directly into Zabbix, as
it is more geared to producing a report, but maybe Ole has ideas :-)
How to use the collected data is one of the big open problems in IT :)
with it yet (for example I still can't understand
how I can exclude some metrics from a host that got 'em added by a
template... When I'll have enough time I'll find a way :) ). Maybe
pestat can be added to the Zabbix metrics...
restarted slurmctld and it keeps seeing all CPUs...
What should I think?
But another problem surfaces: slurmtop seems not to handle so many CPUs
gracefully and throws a lot of errors, but that should be something
manageable...
Tks for the help.
BYtE,
Diego
Il 21/07/2021 11:01, Diego Zuccato
Uff... A bit mangled... Correcting and resending.
On 21/07/2021 08:18, Diego Zuccato wrote:
On 20/07/2021 18:02, mercan wrote:
Hi Ahmet.
Did you check the slurmctld log for a complaint about the host line? If
slurmctld cannot recognize a parameter, it may give up processing the
whole
] _build_node_list: No nodes satisfy JobId=33808
requirements in partition b4
(str957 is the second frontend/login node that I've had to take offline
for an unrelated problem).
in later versions...
Maybe delete Boards=1 SocketsPerBoard=4 and try Sockets=4 instead?
Already tried. Actually, it's been the first try.
The pam_slurm_adopt is very useful :-)
IIUC only if you allow users to connect to the worker nodes. I don't. :)
sik.dtu.dk/niflheim/Slurm_configuration#compute-node-configuration
Tks. Interesting, but I don't see pam_slurm_adopt. Other than that, it
seems very much like what I'm doing.
BYtE,
Diego
On 7/20/21 12:49 PM, Diego Zuccato wrote:
Hello all.
It's been since yesterday that I'm facing this i
ctld after every change in slurm.conf just to be sure.
Any idea?
Tks.
impacting other
users. Even if you just make users "pay" for the resources used by
applying fairshare, the temptation to game the system could be too big.
experienced can refine it.
No... it doesn't work...
-Original Message-
From: Diego Zuccato
Sent: Thursday, June 10, 2021 10:37
To: Slurm User Community List; Gestió Servidors
Subject: Re: [slurm-users] Job requesting two different GPUs on two different
nodes
Il 08/06/2021 15
,
--gres=gpu:GeForceRTX2070:1” because line “#SBATCH --gres=” is for each
node and, then, a line containing two “gres” would request a node with 2
different GPUs. So… is it possible to request 2 different GPUs in 2
different nodes?
Thanks.
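If the goal really is one specific GPU model on each of two nodes, a heterogeneous job is one way to express it. A sketch — the first GPU model name and the application binary are made up for illustration; only the RTX2070 name comes from the thread:

```shell
#!/bin/bash
# Sketch: two het-job components, one node each, each with a different GPU model
#SBATCH --nodes=1 --gres=gpu:GeForceGTX1080:1
#SBATCH hetjob
#SBATCH --nodes=1 --gres=gpu:GeForceRTX2070:1

srun --het-group=0,1 ./my_app   # hypothetical application, spans both components
```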
your job.
Brian Andrus
On 6/1/2021 4:15 AM, Diego Zuccato wrote:
Hello all.
I just found that if an user tries to specify a nodelist (say
including 2 nodes) and --nodes=1, the job gets rejected with
sbatch: error: invalid number of nodes (-N 2-1)
The expected behaviour is that slurm schedules
conflicting info about the issue. Is it version-dependant?
If so, we're currently using 18.08.5-2 (from Debian stable). Should we
expect changes when Debian will ship a newer version? Is it possible to
have the expected behaviour?
Tks.
s page.
Tks.
I upgrade Slurm frequently and have no problems doing so. We're at
20.11.7 now. You should avoid 20.11.{0-2} due to a bug in MPI.
That's really useful info.
).
As Ole said, it's an old version. I'd love to be able to keep up with
the newest releases, but ... :(
ophcpu   81.93%   0.00%   0.00%   15.85%   2.22%   100.00%
ophmem   80.60%   0.00%   0.00%   19.40%   0.00%   100.00%
BYtE,
Diego
On 14/05/2021 08:19, Christopher Samuel wrote:
sreport -t percent -T ALL cluster utilization
"sreport: fatal: No valid TRES given" :(
, but there's a very low-volume mailing list at
ccr-xdmod-l...@listserv.buffalo.edu you could inquire at.
[1] https://github.com/ubccr/xdmod/releases/tag/v9.5.0-rc.4
*From: *Diego Zu
On 12/05/21 13:30, Diego Zuccato wrote:
Anyway, at a first glance, it uses a bit too many technologies for my
taste (php, java, js...) and could be a problem integrating it in a
vhost managed by one of our ISPConfig instances. But I'll try it.
Somehow I'll make it work :)
The more I look
to the bare numbers is definitely a no-no :)
to do some changes (re field width: our usernames are quite long, being
from AD), but first I have to
check if it extracts the info our users want to see :)
s (or at least the data to put in a spreadsheet for
further processing)?
Tks.
propagated (as implied by
PropagateResourceLimits default value of ALL).
And I can confirm that setting it to NONE seems to have solved the
issue: users on the frontend get limited resources, and jobs on the
nodes get the resources they asked for.
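For reference, the setting discussed, as it would appear in slurm.conf:

```
# Do not copy the submit host's ulimits into the job's environment
PropagateResourceLimits=NONE
```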
tried to limit to
1GB soft / 4GB hard the memory users can use on the frontend, the jobs
began to fail at startup even if they requested 200G (that are available
on the worker nodes but not on the frontend)...
Tks.
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma
On 29/03/21 09:35, taleinterve...@sjtu.edu.cn wrote:
> Why can't the loop code get the content of job_desc? And what is the
> correct way to print all its content without manually specifying each key?
I already reported it quite some time ago. Seems pairs() is not working.
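Since job_desc is userdata rather than a plain Lua table, pairs() sees nothing to iterate; the usual workaround is to read a known list of fields by name. A sketch (field list abbreviated; like all job_submit/lua code it only runs inside slurmctld):

```lua
-- Sketch: enumerate job_desc fields explicitly instead of using pairs()
local fields = { "account", "partition", "time_limit", "min_nodes", "std_out" }
function slurm_job_submit(job_desc, part_list, submit_uid)
   for _, k in ipairs(fields) do
      slurm.log_info("job_desc.%s = %s", k, tostring(job_desc[k]))
   end
   return slurm.SUCCESS
end
```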
ition.
So the definition will have to be reversed: set the partition limit to
the max allowed (1h) and limit all users except one in the assoc.
On 29/01/21 08:47, Diego Zuccato wrote:
>> Jobs submitted with sbatch cannot run on multiple partitions. The job
>> will be submitted to the partition where it can start first. (from
>> sbatch reference)
> Did I misunderstand or heterogeneous jobs can workaround this
On 25/01/21 14:46, Durai Arasan wrote:
> Jobs submitted with sbatch cannot run on multiple partitions. The job
> will be submitted to the partition where it can start first. (from
> sbatch reference)
Did I misunderstand or heterogeneous jobs can workaround this limitation?
hose changes.
IIUC, if you don't specify "clean" when loading new config, users
removed from the dump are left active.
Just guessing, tho.
asks' => 1,
'nodes' => 'str957-bl0-17',
'start' => 1605621479,
'user_cpu_usec' => 116130,
'req_cpufreq_max' => 0
}
]
};
Job ID: 9604
On 15/09/20 10:14, Diego Zuccato wrote:
Seems my corrections actually work only for single-node jobs.
In case of multi-node jobs, it only considers the memory used on one
node, hence understimates the real efficiency.
Someone more knowledgeable than me can spot the error?
TIA!
>
obs that are mostly CPU-intensive but needing very
fast IPC.
In our tests, with the time *fixed* at 24h, using HT vs one-process-per-core
led to 1.8x the iterations. In other words there were twice as many
processes running at 90% "clock".
ll run at about twice the speed it achieves when running on a single
thread.
Tested with FPU-intensive code on our cluster.
What thrashes performance is trying to run different processes in the
two threads of a core.
Just my $.02
'.
Are you sure you want to continue? (You have 30 seconds to decide)
(N/y): y
sacctmgr: error: An association name is required to remove usage
Should I iterate all the accounts or is there a better/faster method?
TIA!
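One possible iteration, sketched with sacctmgr (untested against a live database; `-i` skips the confirmation prompt, and RawUsage can only be set to 0):

```shell
# Sketch: reset RawUsage to 0 for every account, one at a time
for acct in $(sacctmgr --noheader --parsable2 list account format=Account); do
    sacctmgr -i modify account where name="$acct" set RawUsage=0
done
```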