The patch got things to compile and start without segfaulting. I can't actually test before 3/11 when I have a scheduled outage.

Bob Healey
Systems Administrator
Biocomputation and Bioinformatics Constellation
and Molecularium
[email protected]
(518) 276-4407

On 2/23/2014 7:35 PM, David Bigagli wrote:

This seems to be a bug in the FreeBSD port. I can reproduce it on my 8.4-RELEASE machine. When there are no more entries, getpwent_r() on FreeBSD returns 0, unlike Linux, which returns ENOENT. The loop therefore keeps going and dereferences a NULL pwd_result.

You can try applying this patch to your source tree.

diff --git a/src/slurmctld/groups.c b/src/slurmctld/groups.c
index c3e2cc2..f26c689 100644
--- a/src/slurmctld/groups.c
+++ b/src/slurmctld/groups.c
@@ -169,6 +169,8 @@ extern uid_t *get_group_members(char *group_name)
        while (!getpwent_r(&pw, pw_buffer, PW_BUF_SIZE, &pwd_result)) {
 #endif
 #endif
+               if (pwd_result == NULL)
+                       break;
                if (pwd_result->pw_gid != my_gid)
                        continue;
                if (j+1 >= uid_cnt) {
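
For readers who want to see the pattern outside of slurmctld, here is a minimal standalone sketch of a getpwent_r() loop that stops correctly on both platforms. It is not the Slurm code: the function name count_primary_gid_members and the PW_BUF_LEN size are hypothetical, chosen only for illustration. The loop treats either a nonzero return (ENOENT at the end of the list on Linux) or a NULL result pointer (FreeBSD) as the end of enumeration.

```c
#define _DEFAULT_SOURCE
#define _GNU_SOURCE
#include <errno.h>
#include <pwd.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical buffer size for this sketch; a very large passwd
 * entry would make getpwent_r() fail with ERANGE. */
#define PW_BUF_LEN 16384

/* Hypothetical helper: count passwd entries whose primary group is gid.
 * Returns the count, or -1 if enumeration stopped on a real error. */
long count_primary_gid_members(gid_t gid)
{
    struct passwd pw;
    struct passwd *result = NULL;
    char buf[PW_BUF_LEN];
    long count = 0;
    int rc;

    setpwent();
    for (;;) {
        rc = getpwent_r(&pw, buf, sizeof(buf), &result);
        /* Linux signals the end with a nonzero return (ENOENT);
         * FreeBSD returns 0 and sets result to NULL. Check both. */
        if (rc != 0 || result == NULL)
            break;
        if (result->pw_gid == gid)
            count++;
    }
    endpwent();

    /* ENOENT only means "no more entries", so it is not an error here. */
    if (rc != 0 && rc != ENOENT)
        return -1;
    return count;
}
```

Calling count_primary_gid_members(my_gid) on either system should return without touching a NULL pointer. Checking the result pointer in addition to the return code is the same idea as the two added lines in the patch.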


On 02/21/2014 06:52 AM, Bob Healey wrote:

I've compiled the various bits of data folks have asked me for off list
to http://boyle.che.rpi.edu/~healer/slurm  The biggest group I'm using
for access control has 30 members.  Groups are sourced from an external
LDAP server.

System Details:
FreeBSD 10.0
Intel Atom D525
Compiler: Clang/llvm (FreeBSD 10 default compiler)

My queues are working with the old configs that I wanted to migrate off
of, so this is not urgent, and it sounds like I'm the first to try
something odd like this.

Bob Healey
Systems Administrator
Biocomputation and Bioinformatics Constellation
and Molecularium
[email protected]
(518) 276-4407

On 2/20/2014 5:52 PM, Jason Bacon wrote:


Hello Bob,

FYI, the FreeBSD port is very new and the groups feature is untested
as far as I know.  If you can provide any additional information, such
as a debugger stack trace from the crash, it might expedite a solution.

I'm currently working on the srun --pty flag and there are probably a
few other issues I'm not yet aware of.  If you discover any other
issues, please report them to this list and I'll put them on the to-do
list.

Regards,

    Jason

On 2/20/14 3:08 PM, Bob Healey wrote:

I am slowly migrating my slurmctld's from RHEL 5 to FreeBSD for
greater reliability.  My cluster support systems are all FreeBSD, and
not accessible to end users.  The RHEL 5 systems are all end user
accessible.  I've copied a working slurm.conf file over to FreeBSD,
and slurmctld segfaults at startup if I have AllowGroups=Something in
the partition definition.  The group definitions are stored in LDAP
if that makes a difference.

I'm running 2.6.4, as provided by the FreeBSD Ports tree.


Slurm.conf excerpt.  If I have any partition except tiger
uncommented, slurmctld segfaults at startup.  If I remove the group
restriction from the partitions, it works.
# COMPUTE NODES
NodeName=lion-[1-48] RealMemory=1800 Sockets=2 CoresPerSocket=1 ThreadsPerCore=1 State=unknown
NodeName=tiger-[1-75] Procs=4 RealMemory=7800 Sockets=2 CoresPerSocket=2 ThreadsPerCore=1 State=unknown
NodeName=calvin-[1-8] RealMemory=16000 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 State=unknown
Nodename=neutron-[1-8] RealMemory=16000 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 State=unknown
NodeName=jaguar Procs=48 RealMemory=63000 Sockets=4 CoresPerSocket=12 ThreadsPerCore=1 State=unknown
#PartitionName=tiger Nodes=tiger-[1-75] Default=YES MaxTime=2880 State=DOWN MaxNodes=16
PartitionName=jaguar Nodes=jaguar Default=NO MaxTime=2880 State=DOWN AllowGroups=lion.che_cluster_access
#PartitionName=calvin Nodes=calvin-[1-8] Default=NO MaxTime=2880 State=DOWN AllowGroups=calvin_che_access
#PartitionName=neutron Nodes=neutron-[1-8] Default=NO MaxTime=2880 State=DOWN AllowGroups=neutron_mat_access
#PartitionName=lion Nodes=lion-[1-48] Default=NO MaxTime=2880 State=DOWN AllowGroups=lion.che_cluster_access



