I've compiled the various bits of data folks have asked me for off-list at http://boyle.che.rpi.edu/~healer/slurm. The biggest group I'm using for access control has 30 members. Groups are sourced from an external LDAP server.

System Details:
FreeBSD 10.0
Intel Atom D525
Compiler: Clang/llvm (FreeBSD 10 default compiler)

My queues are working with the old configs that I wanted to migrate off of, so this is not urgent, and it sounds like I'm the first to try something odd like this.

Bob Healey
Systems Administrator
Biocomputation and Bioinformatics Constellation
and Molecularium
[email protected]
(518) 276-4407

On 2/20/2014 5:52 PM, Jason Bacon wrote:


Hello Bob,

FYI, the FreeBSD port is very new and the groups feature is untested as far as I know. If you can provide any additional information, such as a debugger stack trace from the crash, it might expedite a solution.

I'm currently working on the srun --pty flag and there are probably a few other issues I'm not yet aware of. If you discover any other issues, please report them to this list and I'll put them on the to-do list.

Regards,

    Jason

On 2/20/14 3:08 PM, Bob Healey wrote:

I am slowly migrating my slurmctld instances from RHEL 5 to FreeBSD for greater reliability. My cluster support systems are all FreeBSD and are not accessible to end users. The RHEL 5 systems are all end-user accessible. I've copied a working slurm.conf file over to FreeBSD, and slurmctld segfaults at startup if I have AllowGroups=Something in the partition definition. The group definitions are stored in LDAP, if that makes a difference.

I'm running 2.6.4, as provided by the FreeBSD Ports tree.


slurm.conf excerpt follows. If I have any partition except tiger uncommented, slurmctld segfaults at startup. If I remove the group restriction from the partitions, it works.
# COMPUTE NODES
NodeName=lion-[1-48] RealMemory=1800 Sockets=2 CoresPerSocket=1 ThreadsPerCore=1 State=unknown
NodeName=tiger-[1-75] Procs=4 RealMemory=7800 Sockets=2 CoresPerSocket=2 ThreadsPerCore=1 State=unknown
NodeName=calvin-[1-8] RealMemory=16000 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 State=unknown
Nodename=neutron-[1-8] RealMemory=16000 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 State=unknown
NodeName=jaguar Procs=48 RealMemory=63000 Sockets=4 CoresPerSocket=12 ThreadsPerCore=1 State=unknown
#PartitionName=tiger Nodes=tiger-[1-75] Default=YES MaxTime=2880 State=DOWN MaxNodes=16
PartitionName=jaguar Nodes=jaguar Default=NO MaxTime=2880 State=DOWN AllowGroups=lion.che_cluster_access
#PartitionName=calvin Nodes=calvin-[1-8] Default=NO MaxTime=2880 State=DOWN AllowGroups=calvin_che_access
#PartitionName=neutron Nodes=neutron-[1-8] Default=NO MaxTime=2880 State=DOWN AllowGroups=neutron_mat_access
#PartitionName=lion Nodes=lion-[1-48] Default=NO MaxTime=2880 State=DOWN AllowGroups=lion.che_cluster_access
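
One way to narrow this down outside of slurmctld is to check whether the groups named in AllowGroups resolve at all through the system's group database (which would include LDAP if nsswitch is set up for it). The following is only a rough sketch, not anything taken from Slurm itself: the helper names partitions_with_groups and unresolved_groups are made up for illustration, and the parsing assumes the simple one-definition-per-line layout of the excerpt above.

```python
import grp


def partitions_with_groups(conf_text):
    """Map each active (uncommented) partition to its AllowGroups list.

    Hypothetical helper: assumes one PartitionName definition per line
    and whitespace-separated Key=Value tokens, as in the excerpt.
    """
    result = {}
    for raw in conf_text.splitlines():
        # Everything after '#' is a comment in slurm.conf.
        line = raw.split("#", 1)[0].strip()
        if not line.lower().startswith("partitionname="):
            continue
        fields = {}
        for token in line.split():
            if "=" in token:
                key, value = token.split("=", 1)
                fields[key.lower()] = value  # keys compared case-insensitively
        groups = fields.get("allowgroups")
        if groups and groups.upper() != "ALL":
            result[fields["partitionname"]] = groups.split(",")
    return result


def unresolved_groups(groups):
    """Return the group names that the system group database cannot resolve."""
    missing = []
    for name in groups:
        try:
            grp.getgrnam(name)  # raises KeyError when the group is unknown
        except KeyError:
            missing.append(name)
    return missing
```

Feeding the contents of slurm.conf to partitions_with_groups and each resulting list to unresolved_groups would show whether a group such as lion.che_cluster_access is visible on the FreeBSD host. If every group resolves, the lookup itself is probably not the problem and a debugger stack trace from slurmctld remains the more useful next step.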


