? I planned to attach a
file with the last 3 lines, but that's 21MB big so decided against it :)
Mario Kadastik, PhD
Senior researcher
---
Physics is like sex, sure it may have practical reasons, but that's not why
we do it
-- Richard P. Feynman
on send/recv operation
any ideas how to debug what the 256 threads are in fact doing to understand the
underlying cause? As I doubt it's normal that we're exhausting the thread count
on a 5000 jobslot cluster...
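One low-effort first look, sketched here under the assumption of a Linux host with /proc mounted, is to tally the scheduler state of every thread in the daemon. The helper names `state_from_stat` and `tally_thread_states` are made up for this illustration and are not part of Slurm. It only says whether threads are sleeping, running or stuck in uninterruptible I/O, not what they wait on; for actual stack traces something like gdb's `thread apply all bt` would still be needed.

```python
from collections import Counter
from pathlib import Path


def state_from_stat(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/task/<tid>/stat line."""
    # The comm field is wrapped in parentheses and may itself contain spaces
    # or parentheses, so split on the LAST ')' rather than on whitespace.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def tally_thread_states(pid: int, proc_root: str = "/proc") -> Counter:
    """Count the threads of `pid` by state (R, S, D, ...)."""
    tally = Counter()
    for task in Path(proc_root, str(pid), "task").iterdir():
        try:
            tally[state_from_stat((task / "stat").read_text())] += 1
        except (OSError, IndexError):
            # The thread exited between listing and reading, or the line
            # was malformed; skip it.
            continue
    return tally


# Usage (hypothetical pid): print(tally_thread_states(21613))
```

A tally dominated by D (uninterruptible sleep) would point towards blocked I/O, while mostly S would suggest threads waiting on locks or sockets.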
Mario Kadastik, PhD
Senior researcher
to null pointer or some
such, which is really bizarre. And turning off optimizations (as a side effect
to debugging) fixes it.
Mario Kadastik, PhD
Senior researcher
Anyone? This is blocking one of our main nodes from submissions. Any ideas on
what might cause this or how to debug further are welcome.
On 07.05.2014, at 11:42, Mario Kadastik mario.kadas...@cern.ch wrote:
Hi,
yesterday I upgraded Slurm from 2.5.3 to 14.11 pre-1 (i.e. the current git
are welcome as the SL5.7 node is one of the main user nodes where they
create code and submit to cluster so it has to work even though the full rest
of the cluster works fine. The config btw is shared over NFS so it is identical
on all nodes.
Mario Kadastik, PhD
Senior researcher
I have encountered that slurmctld uses more than 20GB of virtual memory.
But the RSS is less than 1GB. I am not sure whether this is OK or there
is some leakage.
On Tue, 2013-06-25 at 11:56 -0700, Mario Kadastik wrote:
The OOM kill:
Jun 25 18:21:32 slurm-1 kernel: [5463683.553994] OOM killed process
as a function
of cores/jobs or their flux?
Thanks,
Mario Kadastik, PhD
Researcher
is probably too large.
Well then I guess this is bad:
[root@slurm-1 ~]# ps -eao pid,user,rss,cmd|grep slurm
21613 slurm 6735956 /usr/sbin/slurmctld
it's already using 6.4GB of RSS...
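For reference, the 6.4GB figure follows from the rss column, which ps reports in KiB. A minimal sketch of the conversion (the helper name `rss_gib` is invented for this example, and the sample line assumes the columns are whitespace-separated):

```python
def rss_gib(ps_line: str) -> float:
    """Convert the RSS column of a `ps -eao pid,user,rss,cmd` line to GiB."""
    # Columns: pid, user, rss (KiB), cmd. cmd may contain spaces,
    # so split at most three times.
    pid, user, rss_kib, cmd = ps_line.split(None, 3)
    return int(rss_kib) / (1024 * 1024)


line = "21613 slurm 6735956 /usr/sbin/slurmctld"
print(f"{rss_gib(line):.1f} GiB")  # prints "6.4 GiB"
```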
Mario Kadastik, PhD
Researcher
3372 /usr/sbin/slurmdbd
And I just ran sreport commands to check and got nice reports back so the
accounting DB is running.
Mario Kadastik, PhD
Researcher
jobs ended hours ago.
Mario Kadastik, PhD
Researcher
or not).
Thanks,
Mario Kadastik, PhD
Researcher
. The user would know if the
filesystems etc. are fine with that, and in our case they mostly are.
Is such a feature already in slurm or not? If yes, can you point me to
documentation.
Thanks,
Mario Kadastik, PhD
Researcher
The thread has branched off somewhat into the RAM requirements. Any useful
comments on #2-#4? I can probably summarize this by running over all
compute nodes with scontrol show host, but that may not be too efficient...
On 25.04.2013, at 17:28, Mario Kadastik mario.kadas...@cern.ch wrote
-zero exit code the job is rescheduled
automatically elsewhere. In the best case scenario this would imply no failed
jobs except for those that were running at the time of failure if they are
impacted. Will see if this works.
Mario Kadastik, PhD
Researcher
to be drained as the jobs started at
about the same time and have about the same length so waiting for a whole node
to free might take a day or so... And this is wasting resources.
On 01.02.2013, at 16:42, Mario Kadastik mario.kadas...@cern.ch wrote:
Hi,
we would like to configure our cluster
slurm
[root@slurm-1 ~]# rpmbuild -ta slurm-2.5.0-rc-mario.tar.bz2
error: line 93: Tag takes single token only: Name:see META file
[root@slurm-1 ~]#
I'm guessing the problem lies between the keyboard and the chair, but just in
case I thought to ask :)
Mario Kadastik, PhD
Researcher
to
swap torque for slurm hoping the commands work :)
Mario Kadastik, PhD
Researcher