Is there something I'm missing, or is this a bug?
All the best,
Chris
--
Christopher Samuel, Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au  Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/  http://twitter.com/vlsci
On 03/03/14 12:01, Christopher Samuel wrote:
> and I can see the expected values when I do:
>
> # sacctmgr show assoc where cluster=barcoo account=foo
>
> but it doesn't appear to have any effect. I've tried restarting
with:
PropagateResourceLimits=NONE
That means we can have different, appropriate, limits on both login
and compute nodes.
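For anyone wanting the same setup, a minimal sketch (the limits shown are illustrative, not our actual values): with propagation off, jobs pick up the compute nodes' own configured limits instead of inheriting the submitting shell's.

```
# slurm.conf -- don't propagate the submitting shell's ulimits to jobs:
PropagateResourceLimits=NONE

# /etc/security/limits.conf on the compute nodes then applies instead,
# e.g. (illustrative values):
#   *   hard   memlock   unlimited
#   *   hard   stack     unlimited
```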
cheers,
Chris
then patch our local showq to
treat those as "blocked" and hide them from other users.
cheers,
Chris
On 04/03/14 21:23, Marcin Stolarek wrote:
> bf_max_job_user=#
We already do this, but it doesn't change the state of the skipped
jobs to reflect the fact that they're not eligible to start.
cheers,
Chris
On 11/03/14 06:20, Andy Riebs wrote:
> Has anyone seen this before? slurm.conf is on an NFS server, so
> it's possible we've got a configuration error there.
We've seen this same problem too, lost a heap of jobs to it. :-(
SLURM_SUBMIT_HOST
SLURM_JOB_NUM_NODES
d our Intel clusters which have all only run 2.6.x.
The University has had some occasional network and DNS issues which
have meant that our slurmdbd has had transient issues talking to our
database server, so I don't know if that could be the trigger for us.
All the best,
Chris
o be sure.. :-)
Secondly will a 14.03 slurmctld happily talk to (drained) 2.6.x
slurmd's running jobs so we can do a rolling upgrade? Or will we need
to drain the entire cluster of running jobs first and then upgrade
them in a single hit?
All the best,
Chris
d we've lost a
heap of jobs from this behaviour (especially on BG/Q).
A setting that let you change that behaviour to mark nodes as draining
rather than down would be very handy.
cheers,
Chris
tely down.
For us Open-MPI catches those sorts of things for us, so I think we'd
rather just have the Slurm not kill jobs unless we (or the user) tells
it to.
cheers,
Chris
is running on the front end is the srun).
cheers,
Chris
users can only query their own jobs and they can do that
from any of our 4 HPC systems.
Take a look at:
http://slurm.schedmd.com/accounting.html
as a starting point.
All the best,
Chris
Thanks Moe!
For instance:
[samuel@barcoo ~]$ sacct -j 1093417 -o jobid,nodelist
       JobID        NodeList
------------ ---------------
     1093417       barcoo010
And showing it's not there by default:
[samuel@barcoo ~]$ sacct -j 1093417 -l | fgrep -i NodeList
[samuel@barcoo ~]$
Hope that helps!
Chris
it's not a great deal but it'd be good to have an idea how
long we could safely run without a slurmdbd to talk to.
thanks!
Chris
ave enough free memory but I assume
> you are not using 32MB of RAM.
32MB should be enough for anyone.. ;-)
All the best!
Chris
s already
> calculated by the scheduler.
Would it make more sense to display that with "scontrol show job"
rather than in squeue?
But yes, anything that gives more insight into what Slurm is trying to
do for backfill is good!
All the best,
Chris
lurmdbd against MySQL 5.6?
All the best!
Chris
or now).
Has anyone tested running 2.6.x slurmctld's against a 14.03.x slurmdbd
for extended periods of time?
cheers!
Chris
ecutes
>
> [root@master ~]# scancel job 14
> scancel: error: Invalid job_id job
>
> says: error: Invalid job_id job
>
> How can I cancel this job?
You can't, as it is no longer running; cancelling it doesn't make sense.
Hope that helps,
Chris
script went into the background (like a daemon would)? Even
then I'm not sure that srun would help you.
We have users running batch scripts all the time on x86-64 without
using srun, it's never been an issue for us.
All the best,
Chris
-2 has a definite advantage, so I'm proposing
# that we turn PMI-2 "off" when under Slurm unless the user
# specifically requests we use it.
Not sure if this has been raised on slurm-dev yet.
All the best,
Chris
Hopefully we can abstract it out into our "sinteractive" script
somehow..
All the best,
Chris
On 22/05/14 04:49, David Bigagli wrote:
> By popular acclaim we keep supporting the old behaviour for
> historical reasons. The fix will be available in 14.03.4. The
> commit is 6aadcf15355dfe.
Thanks David!
jobs using GPUs (enforce using
# job_submit plugin)
Is that what you were after?
cheers,
Chris
According to the manual page, the --ntasks-per-node=1 option for srun
should do what you want.
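As a hedged sketch of what that looks like in a batch script (the node count and program name are made up for illustration):

```
#!/bin/bash
#SBATCH --nodes=4
# Launch exactly one task on each of the allocated nodes:
srun --ntasks-per-node=1 ./per_node_task
```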
cheers,
Chris
etting nnodes to 1
Submitted batch job 1856638
A distributed job (MPI for instance) must have at least
one task on every node for this to make sense.
All the best,
Chris
t should use Engine=InnoDB when
creating MySQL tables, that way you don't need to remember.
All the best,
Chris
r if the desired engine is
# unavailable, enable the NO_ENGINE_SUBSTITUTION SQL mode. If
# the desired engine is unavailable, this setting produces an
# error instead of a warning, and the table is not created or
# altered. See Section 5.1.7, “Server SQL Modes”.
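Following the quoted documentation, a minimal my.cnf fragment would look like this (whether you want these set globally is a site decision):

```
# my.cnf -- default new tables to InnoDB, and produce an error (rather
# than silently substitute another engine) if InnoDB is unavailable:
[mysqld]
default-storage-engine = InnoDB
sql_mode = NO_ENGINE_SUBSTITUTION
```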
cheers!
Chris
this to work.
Caveat: We've never used this, so YMMV.
All the best,
Chris
isting and
> halt if it isn't available.
Wonderful, thanks so much!
es --ntasks=32 (and no node specification) not work for that?
cheers,
Chris
counts
so I'll try and find a time to create a test partition there to check we
see the same.
We're (currently) on 2.6.5, what version are you on?
cheers,
Chris
it difficult for me to check on the status of my jobs.
I have a suspicion that's related to your filesystem, not Slurm.
Certainly we don't see any such issue using Slurm 2.6.x with GPFS and
Panasas filesystems.
All the best,
Chris
On 22/08/14 04:43, Jesse Stroik wrote:
> We recently noticed sporadic performance inconsistencies on one of our
> clusters.
What distro is this? Are you using cgroups?
cheers,
Chris
so if one does go bad we want to go and find out why before we
let it back into the cluster.
All the best,
Chris
slurmdev
Unfortunately it's not clear what you do from there (there is no
unsubscribe link), but if you click "Profile Home" from the top
of that page it takes you to:
http://lists.schedmd.com/cgi-bin/dada/mail.cgi/profile/
and you'll see "Unsubscribe from this list" links the
icate 2 cores to a single job if I can
dedicate each core individually to a job.
What does "scontrol show node" say?
cheers,
Chris
o build SMP systems out of individual nodes but they are
pretty rare. One company that does this is ScaleMP.
All the best,
Chris
On 11/09/14 10:56, Christopher Samuel wrote:
> This is an operating system kernel issue, not a queuing system issue, so
> Slurm, LSF or Torque will all have the same issue.
To clarify, the Linux kernel can support these sorts of systems, but
you'll either need extra supporting
d then copied back at the end of
the job.
Slurm doesn't do that, it defaults to writing the output (and errors) to
a file in the directory a job is running in, so there's no need to copy
it back at the end of the job.
Hope that helps,
Chris
ensure it was triggered.
Hope this helps!
Chris
hows a difference in recorded memory usage between the two.
Might take a while to have something to report though. :-)
All the best,
Chris
ndle this gracefully.
cheers!
Chris
can't submit jobs
to it by accident.
7) Let users back onto the login nodes.
8) Set partitions back to "up" to start jobs going again.
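Step 8 maps onto a scontrol partition state change; the partition name here is made up for illustration:

```
# Re-enable job starts on each partition once the outage is over:
scontrol update PartitionName=main State=UP
```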
Hope this helps folks..
cheers!
Chris
nd groups, neither of which help in this scenario),
but just wondering if I've missed anything obvious.
cheers,
Chris
On 29/09/14 21:33, je...@schedmd.com wrote:
> Slurm does not support this today.
Thanks Moe, we'll see if we can figure another way around it.
cheers!
Chris
On 30/09/14 02:39, je...@schedmd.com wrote:
> About 70 people attended the Slurm User Group Meeting last week in
> Lugano Switzerland.
Thanks so much to you and everyone who organised the meeting and to
everyone who came, it was well worth attending.
All the best,
Chris
On 29/09/14 15:28, Christopher Samuel wrote:
> B) If you update a compute node when there are jobs queued under the
> previous bash then they will fail when they run there (also cannot find
> modules, even though a prologue of ours sets BASH_ENV to force the env
> vars to get
ll)
What do the following commands say?
file /usr/bin/perl
ls -l /usr/bin/perl
getfacl /usr/bin/perl
Do you have SE Linux enabled?
cheers,
Chris
On 03/10/14 03:05, Michael Jennings wrote:
> Is your /var/tmp mounted "noexec" by any chance?
I wondered that but I don't think that'll give the errors that Dennis is
seeing, he's seeing -EACCES for /usr/bin/perl, not for something in
/var/tmp. :-(
e say?
cheers!
Chris
ion row table in another database took
just a few seconds.
All the best,
Chris
th 'nohup'.
cheers,
Chris
r (pretty beefy) DB servers.
> The "foreign constraints" came from another database a colleague of mine
> has installed, we could fix this :-)
:-)
cheers,
Chris
. and that new user submits a job, the
> default qos I've configured is not respected.
Hmm, could you try and mark a partition as UP with scontrol and see if
that helps? It's something we do here on Slurm 2.6 and (I believe)
resolves this for us.
All the best,
Chris
On 16/10/14 16:02, Christopher Samuel wrote:
> No worries, we're going to test out ours in a sandbox as well, so we'll
> be able to compare it to our (pretty beefy) DB servers.
It took around 2 minutes to add all the indexes in our sandbox, that's
with a total of about 6 milli
ch that each cluster includes
local.conf but have local.conf as a symlink to $cluster.conf (and then
exclude local.conf from git/rsync or however else that is
managed/distributed, if it is automated).
We do this trick already with other tools here at VLSCI.
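As a concrete sketch of the layout (the directory, file names, and the one setting shown are all made up for illustration): the shared slurm.conf would end with an `Include local.conf` line, and on each cluster local.conf is a symlink to that cluster's file.

```shell
# Per-cluster config via a symlink (names here are illustrative).
mkdir -p /tmp/slurm-etc-demo
cd /tmp/slurm-etc-demo
echo "MaxJobCount=10000" > barcoo.conf      # cluster-specific settings
ln -sf barcoo.conf local.conf               # local.conf -> barcoo.conf
readlink local.conf                         # prints: barcoo.conf
```

The symlink (excluded from git/rsync) is the only per-host piece, so the rest of the tree can be distributed identically everywhere.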
cheers,
Chris
can't test this.
commit 18d809fabfdf654facc480d114f6f3694b6bdd84
Author: Morris Jette
Date: Wed Aug 21 14:46:52 2013 -0700
Allocate group member buffer size as needed
Replace fixed size buffer with a buffer that can grow as needed.
What version of Slurm are you using?
cheers!
Chris
en-MPI 1.8.x as that can use PMI2 with
Slurm and fixes this scaling issue.
en Slurm decides to clean things up for you at that point.
the list (after SC14) is to migrate to Open-MPI 1.8.4 which
is due out shortly which should address this.
cheers,
Chris
no longer has that in the stdout or stderr files as a
> result of upgrading to 14.03.10.
Yup, this is with 14.03.10.
I've only managed to provoke it with NAMD so far, but I guess we'll hear
from users if they see it with other codes too. :-)
All the best,
Chris
id_500/job_2497190/
cgroup.event_control cgroup.procs freezer.state notify_on_release tasks
> Perhaps we can put it at debug level as before as it may concern users.
If it is just cosmetic it'd be good I think.
All the best,
Chris
teable then it's a bug, not a feature.
All the best,
Chris
des?
As a sysadmin, I'd say any application should honour and use $TMPDIR if defined.
If that isn't set then all you can really rely on is /tmp and /var/tmp.
All the best,
Chris
I hope this is useful to people, have at it please!
All the best,
Chris
None".
SchedulerParameters=bf_window=43200,bf_resolution=600,bf_max_job_user=5,max_job_bf=1,bf_continue,defer
Everything seems to perform well with those settings, slurmctld is at
around 8GB virtual and only ~35MB RSS for instance.
Best of luck!
Chris
accounting as it was experimental and could give incorrect info, but it
seems fine in 14.03.x.
Hope this helps!
All the best,
Chris
(upgraded from 2.6.x
recently). :-(
Best of luck!
Chris
d whilst I could
munge things based on the units output it'd be nice if I could tell it
to just report in KB for everything.
This is 14.03.10 BTW.
Any ideas anyone?
cheers!
Chris
oo0010 0.00M
pendent.
All the best,
Chris
up
* If there's more than one, then pick the high water mark across all the
steps and add that to the batch job.
Does that sound about right?
It doesn't need to be byte-accurate, this is more to give users a good
indication of how much memory their jobs are using.
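That heuristic can be sketched in shell; the MaxRSS figures here are made-up KB values, not real sacct output:

```shell
# Made-up per-step MaxRSS figures (KB): the batch step plus three srun steps.
batch_kb=1024
step_kbs="2048 512 4096"

# High water mark across the srun steps...
peak=0
for kb in $step_kbs; do
    if [ "$kb" -gt "$peak" ]; then peak=$kb; fi
done

# ...added to the batch step's usage, per the heuristic above.
job_kb=$((batch_kb + peak))
echo "$job_kb"    # prints: 5120
```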
All the best,
Chris
nodes in preference
to higher ones unless the lower memory nodes are taken.
All the best,
Chris
tc/cgroup_agents"
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
Hope this helps!
All the best,
Chris
appserver/java-hotspottm-64-bit-server-vm-warning-failed-to-reserve-shared-memory-errno-12
This can also affect Open-MPI over IB too:
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages-user
Best of luck,
Chris
.
All the best & happy new year!
Chris
add_user_space_avail += bb_ptr->size;
}
bb_ptr = bb_ptr->next;
Hope this helps,
Chris
the
rename on systems being upgraded).
You can see more information here:
https://tracker.debian.org/pkg/slurm-llnl
If you're on Wheezy that will give you Slurm 2.3.4 (Jessie will have
Slurm 14.03.9).
Hope this helps!
Chris
Date: Mon Dec 15 15:24:39 2014 -0800
Revert "Commit 38068d21 expanded the reason for unavailable jobs but"
as it may cause core dump in squeue.
This reverts commit 322c783cc437800052827d524e653313d2bed9b6.
On 09/01/15 16:31, gareth.willi...@csiro.au wrote:
> I'm still stewing on this. Does anyone have sacctmgr/remote license
> setup working?
We've never attempted it here, sorry!
go
around it would work against their customers interest.
Yours,
Cynical of Melbourne..
On 17/01/15 21:05, Uwe Sauter wrote:
> Trey: My Slurm installation completely lies on NFSv4.
I've seen weird problems with NFSv4 in RHEL6 in the past, so we just use
NFSv3 now.
Does the problem go away if you drop back to the 6.5 kernel on the
compute nodes?
cheers,
Chris
rnel panics when we boot our 4 racks at
once).
Could I suggest perhaps trying the Beowulf list for this? It might be a
better forum for general Linux distro and kernel problems in HPC:
http://beowulf.org/
Caveat: I run the Beowulf list these days.
cheers,
Chris
batch script just before
calling VASP to capture what is actually getting set, just in case
there's something odd going on..
We had people using VASP last year with Slurm and OpenMPI and they
didn't seem to have any issues.
Best of luck!
Chris
rom inside
the VASP batch job (just before it starts) so we can see what the limits
are please?
thanks,
Chris
t off and consider
using cgroups to contain jobs.
Best of luck!
Chris
lem.
Could you check to see if it's the same for you?
All the best,
Chris
rt?
In my testing here (14.03.11) that worked for all those examples,
*except* for the last one.
All the best,
Chris
oms to what you are seeing.
It also links to the Slurm docs on high throughput computing which may
well be useful for your configuration, you can find them here:
http://www.schedmd.com/slurmdocs/high_throughput.html
Other than that I've no ideas sorry!
Best of luck,
Chris
h isn't going to happen if it's also ro?
cheers!
Chris
que on everything
from RH7.3 (yes, pre RHEL), SLES9, SLES10, RHEL 3, 4 & 5 (we moved
to Slurm when we went to RHEL6).
All the best,
Chris
thing we just install as an RPM (from EPEL) and our xCAT
takes care of ensuring the keys are correct.
All the best,
Chris
On 26/03/15 14:11, Fred Liu wrote:
> Just curious, What is xCAT?
It's a cluster management suite, we've used it on SGI as well as IBM gear:
http://xcat.sourceforge.net/
All the best,
Chris
x27; state waiting on some form
of device I/O.
I know some people have reported strange interactions between Slurm
being on an NFSv4 mount (NFSv3 is fine).
Good luck!
Chris
at created commands that got run with
srun in parallel on our BG/Q system).
cheers,
Chris
t Tools) at SC14. :-)
http://www.ugent.be/hpc/hust14.html
I really liked the look of XALT, but the reliance on people running
applications with srun pretty much nixed it here. :-(
All the best!
Chris