Hello Bob,
FYI, the FreeBSD port is very new and the groups feature is untested as
far as I know. If you can provide any additional information, such as a
debugger stack trace from the crash, it might expedite a solution.
I'm currently working on the srun --pty flag and there are probably a
few other issues I'm not yet aware of. If you discover any other
issues, please report them to this list and I'll put them on the to-do list.
Regards,
Jason
On 2/20/14 3:08 PM, Bob Healey wrote:
I am slowly migrating my slurmctld instances from RHEL 5 to FreeBSD for
greater reliability. My cluster support systems are all FreeBSD and are
not accessible to end users. The RHEL 5 systems are all accessible to
end users. I've copied a working slurm.conf file over to FreeBSD,
and slurmctld segfaults at startup if I have AllowGroups=Something in
the partition definition. The group definitions are stored in LDAP, if
that makes a difference.
I'm running 2.6.4, as provided by the FreeBSD Ports tree.
A slurm.conf excerpt follows. If I have any partition except tiger uncommented,
slurmctld segfaults at startup. If I remove the group restriction
from the partitions, it works.
# COMPUTE NODES
NodeName=lion-[1-48] RealMemory=1800 Sockets=2 CoresPerSocket=1 ThreadsPerCore=1 State=unknown
NodeName=tiger-[1-75] Procs=4 RealMemory=7800 Sockets=2 CoresPerSocket=2 ThreadsPerCore=1 State=unknown
NodeName=calvin-[1-8] RealMemory=16000 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 State=unknown
Nodename=neutron-[1-8] RealMemory=16000 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 State=unknown
NodeName=jaguar Procs=48 RealMemory=63000 Sockets=4 CoresPerSocket=12 ThreadsPerCore=1 State=unknown
#PartitionName=tiger Nodes=tiger-[1-75] Default=YES MaxTime=2880 State=DOWN MaxNodes=16
PartitionName=jaguar Nodes=jaguar Default=NO MaxTime=2880 State=DOWN AllowGroups=lion.che_cluster_access
#PartitionName=calvin Nodes=calvin-[1-8] Default=NO MaxTime=2880 State=DOWN AllowGroups=calvin_che_access
#PartitionName=neutron Nodes=neutron-[1-8] Default=NO MaxTime=2880 State=DOWN AllowGroups=neutron_mat_access
#PartitionName=lion Nodes=lion-[1-48] Default=NO MaxTime=2880 State=DOWN AllowGroups=lion.che_cluster_access
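
One way to narrow this down, independent of slurmctld, might be to check
whether each group named in an AllowGroups setting resolves through the
system group database (which would include LDAP if the name service is
configured for it). The following is only a minimal Python sketch under
stated assumptions: the function names allowed_groups and resolve_groups
are hypothetical and not part of Slurm, each partition definition is
assumed to sit on a single line, and AllowGroups is assumed to hold a
comma-separated list of group names. It says nothing about where
slurmctld itself crashes; it only shows whether the lookups succeed.

```python
import grp


def allowed_groups(conf_text):
    """Collect group names from AllowGroups= on uncommented config lines."""
    groups = []
    for line in conf_text.splitlines():
        # Assumption: everything after '#' on a line is a comment.
        line = line.split("#", 1)[0]
        for token in line.split():
            key, sep, value = token.partition("=")
            # Match the key case-insensitively (e.g. AllowGroups, allowgroups).
            if sep and key.lower() == "allowgroups":
                for name in value.split(","):
                    if name and name not in groups:
                        groups.append(name)
    return groups


def resolve_groups(names, lookup=grp.getgrnam):
    """Map each group name to its GID, or None if the lookup fails."""
    result = {}
    for name in names:
        try:
            result[name] = lookup(name).gr_gid
        except KeyError:
            # grp.getgrnam raises KeyError when the group is not found.
            result[name] = None
    return result
```

To try it, read the configuration file into a string, pass it to
allowed_groups, and pass the resulting list to resolve_groups. A group
that maps to None did not resolve on that host, which would point at the
name service setup rather than at Slurm.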
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jason W. Bacon
[email protected]
Circumstances don't make a man:
They reveal him.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~