Just as a caveat to other FreeBSD users: I've done very little testing of SLURM on 10.0 at this point.

I tend to shy away from dot-oh releases of anything, and 10.0 introduces two major changes: a switch to clang as the base compiler and to pkgng as the default package management system.

That said, I am running 10.0 on a few non-critical systems and have seen remarkably few problems with it. There are still a few ports having issues with clang, but the base system seems to be rock solid.

Just the same, I'm sticking with 9.2 for critical production systems at least until 10.1 is out.

In the meantime, I would appreciate any feedback on the SLURM port under 10.x.

Regards,

    JB

On 2/21/14 8:52 AM, Bob Healey wrote:

I've compiled the various bits of data folks have asked me for off list at http://boyle.che.rpi.edu/~healer/slurm. The biggest group I'm using for access control has 30 members. Groups are sourced from an external LDAP server.

System Details:
FreeBSD 10.0
Intel Atom D525
Compiler: Clang/llvm (FreeBSD 10 default compiler)

My queues are working with the old configs that I wanted to migrate off of, so this is not urgent, and it sounds like I'm the first to try something odd like this.

Bob Healey
Systems Administrator
Biocomputation and Bioinformatics Constellation
and Molecularium
[email protected]
(518) 276-4407

On 2/20/2014 5:52 PM, Jason Bacon wrote:


Hello Bob,

FYI, the FreeBSD port is very new and the groups feature is untested as far as I know. If you can provide any additional information, such as a debugger stack trace from the crash, it might expedite a solution.

I'm currently working on the srun --pty flag and there are probably a few other issues I'm not yet aware of. If you discover any other issues, please report them to this list and I'll put them on the to-do list.

Regards,

    Jason

On 2/20/14 3:08 PM, Bob Healey wrote:

I am slowly migrating my slurmctld instances from RHEL 5 to FreeBSD for greater reliability. My cluster support systems are all FreeBSD and are not accessible to end users. The RHEL 5 systems are all end-user accessible. I've copied a working slurm.conf file over to FreeBSD, and slurmctld segfaults at startup if I have AllowGroups=Something in the partition definition. The group definitions are stored in LDAP, if that makes a difference.

I'm running 2.6.4, as provided by the FreeBSD Ports tree.


Slurm.conf excerpt. If I have any partition except tiger uncommented, slurmctld segfaults at startup. If I remove the group restriction from the partitions, it works.
# COMPUTE NODES
NodeName=lion-[1-48] RealMemory=1800 Sockets=2 CoresPerSocket=1 ThreadsPerCore=1 State=unknown
NodeName=tiger-[1-75] Procs=4 RealMemory=7800 Sockets=2 CoresPerSocket=2 ThreadsPerCore=1 State=unknown
NodeName=calvin-[1-8] RealMemory=16000 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 State=unknown
Nodename=neutron-[1-8] RealMemory=16000 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 State=unknown
NodeName=jaguar Procs=48 RealMemory=63000 Sockets=4 CoresPerSocket=12 ThreadsPerCore=1 State=unknown
#PartitionName=tiger Nodes=tiger-[1-75] Default=YES MaxTime=2880 State=DOWN MaxNodes=16
PartitionName=jaguar Nodes=jaguar Default=NO MaxTime=2880 State=DOWN AllowGroups=lion.che_cluster_access
#PartitionName=calvin Nodes=calvin-[1-8] Default=NO MaxTime=2880 State=DOWN AllowGroups=calvin_che_access
#PartitionName=neutron Nodes=neutron-[1-8] Default=NO MaxTime=2880 State=DOWN AllowGroups=neutron_mat_access
#PartitionName=lion Nodes=lion-[1-48] Default=NO MaxTime=2880 State=DOWN AllowGroups=lion.che_cluster_access
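
One way to narrow this down might be to check, outside of slurmctld, whether each AllowGroups name in the active partition lines resolves through the system's group database (and therefore through whatever name service, such as LDAP, the host is configured to use). The following is only a rough Python sketch, not part of SLURM: the function names allowed_groups and lookup are made up for illustration, it assumes Python is installed on the controller host, and it only handles the simple one-line PartitionName syntax used in the excerpt above.

```python
import grp
import re
import sys


def allowed_groups(conf_text):
    """Return group names from AllowGroups= on uncommented PartitionName lines."""
    names = []
    for line in conf_text.splitlines():
        # Drop comments, which also skips commented-out partitions.
        line = line.split("#", 1)[0].strip()
        if not line.lower().startswith("partitionname="):
            continue
        match = re.search(r"\bAllowGroups=(\S+)", line, flags=re.IGNORECASE)
        if match:
            # AllowGroups takes a comma-separated list of group names.
            names.extend(g for g in match.group(1).split(",") if g)
    return names


def lookup(name):
    """Resolve one group via the system group database; None if it is unknown."""
    try:
        entry = grp.getgrnam(name)  # raises KeyError when the group is not found
    except KeyError:
        return None
    return {"name": entry.gr_name, "gid": entry.gr_gid, "members": list(entry.gr_mem)}


# Only runs when a slurm.conf path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as conf:
        for group in allowed_groups(conf.read()):
            info = lookup(group)
            if info is None:
                print(f"{group}: NOT FOUND")
            else:
                print(f"{group}: gid={info['gid']} members={len(info['members'])}")
```

It could be run as, for example, `python3 check_groups.py /path/to/slurm.conf` (the script name is hypothetical). If every group resolves here and slurmctld still crashes, that would suggest the problem lies in the port's group handling rather than in the LDAP setup, though a debugger stack trace would still be the more direct evidence.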





--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  Jason W. Bacon
  [email protected]

  Circumstances don't make a man:
  They reveal him.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
