Perhaps PrologSlurmctld, which runs on the controller when a job is
allocated, could be used to set up the account on the nodes first. See
http://www.schedmd.com/slurmdocs/prolog_epilog.html
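
A minimal, untested sketch of that idea, in Python. It assumes that a
plain NSS lookup (getent passwd) on a node is enough to make winbind
create the user's mapping there, that the account running slurmctld has
key-based ssh access to every node, and that SLURM_JOB_USER and
SLURM_JOB_NODELIST are set in the PrologSlurmctld environment (check the
page above for your version). The function names are made up for
illustration.

```python
#!/usr/bin/env python
"""Hypothetical PrologSlurmctld helper: look up the job's user on each node."""
import os
import subprocess


def expand_nodelist(nodelist):
    """Expand a hostlist expression such as 'node[01-02]' using scontrol."""
    out = subprocess.check_output(
        ["scontrol", "show", "hostnames", nodelist], universal_newlines=True
    )
    return out.split()


def build_lookup_commands(user, nodes):
    """Return one ssh command per node that asks the node's NSS for the user."""
    # Assumption: the NSS lookup goes through winbind on the node.
    return [
        ["ssh", "-o", "BatchMode=yes", node, "getent", "passwd", user]
        for node in nodes
    ]


def prime_user(user, nodes, run=subprocess.call):
    """Run the lookups; return the nodes where the user was not resolved."""
    failed = []
    for node, cmd in zip(nodes, build_lookup_commands(user, nodes)):
        if run(cmd) != 0:  # getent exits non-zero when the user is unknown
            failed.append(node)
    return failed


def main(environ):
    user = environ["SLURM_JOB_USER"]
    nodes = expand_nodelist(environ["SLURM_JOB_NODELIST"])
    # A non-zero exit tells Slurm that the prolog failed.
    return 1 if prime_user(user, nodes) else 0


# Only act when started by slurmctld with the job variables present.
if "SLURM_JOB_USER" in os.environ and "SLURM_JOB_NODELIST" in os.environ:
    raise SystemExit(main(os.environ))
```

To try it, point the PrologSlurmctld parameter in slurm.conf at wherever
you install the script. Whether the lookup alone fixes the "User not
found on host" error depends on the winbind setup, so test it on one
node first.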

Quoting Twilight Zoners <[email protected]>:

> Hi all,
>
>
> I've set up and configured Slurm 2.5.0 with 1 Slurm controller and 9
> nodes (10 computers in the cluster in total, all 64-bit RHEL6).
> Things I've set up:
> - using munge
> - all computers are joined to a Windows Active Directory domain, so
> users can use their Active Directory credentials to log in to any
> RHEL6 computer in the cluster.
> - samba + winbind
> - NFS share
>
> I have successfully set all the RHEL computers to use the same UID
> and GID for each user, using the RID backend in the Samba config:
>      idmap uid = 17000000-33554431
>      idmap gid = 17000000-33554431
>      idmap backend = rid:domain=100000-100000000
>
> So when a user logs in to node01, node02, etc., the user always
> gets the same UID and GID, and can therefore run the srun command.
>
>
> However, I am facing a challenge here.
> Every time a new user (one who has never logged in to any computer in
> the cluster) logs in to the controller and runs an srun command, for
> example "srun -n30 -c1 pwd", the user gets this error:
>
>
> srun: error: Task launch for 430.0 failed on node node01: User not  
> found on host
> srun: error: Task launch for 430.0 failed on node node02: User not  
> found on host
>
>
> I believe this is because the user account has not yet been "created"
> on node01 and node02.
>
> My "temporary" solution is to ask the user to log in to all the
> Slurm nodes, which forces each node to create the user's
> profile/account.
>
> This cluster will grow in future and we will add more slurm nodes.
> If I add 100 nodes, I cannot imagine how the users will react when
> they have to log in to 100 additional computers just to create their
> accounts on each node.
>
> I tried to implement NIS, but I think NIS only syncs local Linux
> user accounts. In my case, the domain user accounts are not stored
> as local Linux accounts but in the winbind database
> (/var/lib64/samba/*.tdb).
>
>
> I believe I am not the only one using this approach (Windows AD for
> authentication) with Slurm.
>
> Can anyone give me a clue how to solve this challenge?
>
>
> Thanks in advance....
