Just as a caveat to other FreeBSD users: I've done very little testing
of SLURM on 10.0 at this point.
I tend to shy away from dot-oh releases of anything, and 10.0 introduces
two major changes: a switch to clang as the base compiler and pkgng as
the default package management system.
That said, I am running 10.0 on a few non-critical systems and have seen
remarkably few problems with it. There are still a few ports having
issues with clang, but the base system seems to be rock solid.
Just the same, I'm sticking with 9.2 for critical production systems at
least until 10.1 is out.
In the meantime, I would appreciate any feedback on the SLURM port under
10.x.
Regards,
JB
On 2/21/14 8:52 AM, Bob Healey wrote:
I've compiled the various bits of data folks have asked me for off
list at http://boyle.che.rpi.edu/~healer/slurm. The biggest group I'm
using for access control has 30 members. Groups are sourced from an
external LDAP server.
System Details:
FreeBSD 10.0
Intel Atom D525
Compiler: Clang/llvm (FreeBSD 10 default compiler)
My queues are working with the old configs that I wanted to migrate
off of, so this is not urgent, and it sounds like I'm the first to try
something odd like this.
Bob Healey
Systems Administrator
Biocomputation and Bioinformatics Constellation
and Molecularium
[email protected]
(518) 276-4407
On 2/20/2014 5:52 PM, Jason Bacon wrote:
Hello Bob,
FYI, the FreeBSD port is very new and the groups feature is untested
as far as I know. If you can provide any additional information,
such as a debugger stack trace from the crash, it might expedite a
solution.
I'm currently working on the srun --pty flag and there are probably a
few other issues I'm not yet aware of. If you discover any other
issues, please report them to this list and I'll put them on the
to-do list.
Regards,
Jason
On 2/20/14 3:08 PM, Bob Healey wrote:
I am slowly migrating my slurmctld's from RHEL 5 to FreeBSD for
greater reliability. My cluster support systems are all FreeBSD,
and not accessible to end users. The RHEL 5 systems are all end
user accessible. I've copied a working slurm.conf file over to
FreeBSD, and slurmctld segfaults at startup if I have
AllowGroups=Something in the partition definition. The group
definitions are stored in LDAP if that makes a difference.
I'm running 2.6.4, as provided by the FreeBSD Ports tree.
Slurm.conf excerpt. If I have any partition except tiger
uncommented, slurmctld segfaults at startup. If I remove the group
restriction from the partitions, it works.
# COMPUTE NODES
NodeName=lion-[1-48] RealMemory=1800 Sockets=2 CoresPerSocket=1 ThreadsPerCore=1 State=unknown
NodeName=tiger-[1-75] Procs=4 RealMemory=7800 Sockets=2 CoresPerSocket=2 ThreadsPerCore=1 State=unknown
NodeName=calvin-[1-8] RealMemory=16000 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 State=unknown
Nodename=neutron-[1-8] RealMemory=16000 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 State=unknown
NodeName=jaguar Procs=48 RealMemory=63000 Sockets=4 CoresPerSocket=12 ThreadsPerCore=1 State=unknown
#PartitionName=tiger Nodes=tiger-[1-75] Default=YES MaxTime=2880 State=DOWN MaxNodes=16
PartitionName=jaguar Nodes=jaguar Default=NO MaxTime=2880 State=DOWN AllowGroups=lion.che_cluster_access
#PartitionName=calvin Nodes=calvin-[1-8] Default=NO MaxTime=2880 State=DOWN AllowGroups=calvin_che_access
#PartitionName=neutron Nodes=neutron-[1-8] Default=NO MaxTime=2880 State=DOWN AllowGroups=neutron_mat_access
#PartitionName=lion Nodes=lion-[1-48] Default=NO MaxTime=2880 State=DOWN AllowGroups=lion.che_cluster_access
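
One way to narrow this down outside of slurmctld might be to check whether the groups named in AllowGroups resolve at all through the system group database (which is where LDAP-backed groups would show up if nsswitch is set up for them). The following is only a rough sketch in Python, not anything taken from SLURM itself. The helper names allowed_groups and lookup_group are made up for illustration, and it assumes each partition definition sits on a single line in the actual slurm.conf. A clean result here would not rule out a bug in the port; it would only suggest that the lookup itself is not the problem.

```python
import grp
import re


def allowed_groups(conf_text):
    """Collect group names from AllowGroups= on uncommented lines.

    Hypothetical helper: assumes one partition definition per line.
    """
    names = []
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and commented-out partition definitions.
        if not line or line.startswith("#"):
            continue
        match = re.search(r"AllowGroups=(\S+)", line, re.IGNORECASE)
        if match:
            # AllowGroups takes a comma-separated list of group names.
            names.extend(g for g in match.group(1).split(",") if g)
    return names


def lookup_group(name):
    """Return the member list for a group, or None if it does not resolve.

    grp.getgrnam() consults the system group database and raises
    KeyError when the group is unknown.
    """
    try:
        entry = grp.getgrnam(name)
    except KeyError:
        return None
    return list(entry.gr_mem)


def report(conf_text):
    """Return one human-readable line per group referenced in conf_text."""
    lines = []
    for name in allowed_groups(conf_text):
        members = lookup_group(name)
        if members is None:
            lines.append("%s: not found" % name)
        else:
            lines.append("%s: %d members" % (name, len(members)))
    return lines
```

To try it, read the slurm.conf file into a string and print each line returned by report(). A group that comes back as "not found", or with far fewer members than expected, would point at the LDAP/nsswitch side rather than at slurmctld.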
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jason W. Bacon
[email protected]
Circumstances don't make a man:
They reveal him.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~