Hey all,

   We are trying to set up a Slurm cluster with both CPU and GPU partitions
   for research and education (courses) in a computer science faculty at my
   university. Everything seems to work fine and we have managed to
   accomplish almost everything we needed, except for a few things:

   A. Is it possible to set up global default values for srun/sbatch
   (e.g. number of cores, email, etc.)? If so, how can it be done?
   A2. Is it possible to make some srun/sbatch parameters required (e.g. a
   user cannot run a job via srun unless they specify an email address)? If
   so, how?
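   To illustrate what we mean (the option values here are only placeholders
   we made up), we would like every job to behave as if the user had
   submitted it with something like

      sbatch --cpus-per-task=1 \
             --mail-user=student@cs.example.edu \
             --mail-type=END,FAIL \
             job.sh

   and, for A2, we would like submission to fail when, say, --mail-user is
   missing.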

   B. We have Active Directory (AD) in our faculty, and we would prefer to
   manage users/groups from there. Is that possible? Is there a guide
   available somewhere?

   C. What is the recommended way to handle data files? That is, a user
   wants his data/code files (for example, a data set of pictures for GPU
   deep learning) to be accessible from the nodes allocated to him, and
   wants to get the results back easily without SSHing into those nodes (I
   want to close the nodes to SSH if possible). So far we have investigated
   NFS (low performance compared to files stored locally on the server) and
   Nextcloud (file syncing back and forth). Is there a better way we have
   overlooked?
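   For reference, the NFS setup we tried is the plain one sketched below
   (server name, network, and paths are placeholders):

      # on the storage server, /etc/exports
      /export/data    10.0.0.0/24(rw,sync,no_subtree_check)

      # on each compute node, /etc/fstab
      storage.cs.example.edu:/export/data  /data  nfs  defaults,_netdev  0 0

   so the question is mainly whether there is a faster or more convenient
   alternative for this kind of workload.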

   D. We need to give one specific, known user the ability to run his jobs
   on specific nodes during specific hours, while no other jobs are allowed
   to run concurrently (exclusive use).
   We saw that there are reservations, but a reservation holds the
   resources even if that user never actually uses it. Another idea was to
   create a partition with a higher priority than all the others, put this
   partition in the DOWN state, give only that user the right to submit
   jobs to it, and then put a script in crontab that changes the state of
   the partition during the required time window.
   What do you think? Is there a more elegant way?
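   To make the crontab idea concrete, this is roughly what we have in mind
   (partition name, node names, group, times, and paths are placeholders,
   and the exact parameter names still need checking against the 17.02
   docs):

      # slurm.conf: a high-priority partition only that user's group may
      # use, kept DOWN outside the agreed time window
      PartitionName=priv Nodes=node[01-04] Priority=1000 AllowGroups=priv_grp State=DOWN

      # /etc/cron.d/priv-window on the controller:
      # open the partition at 18:00, close it again at 23:00
      0 18 * * *  root  /usr/bin/scontrol update PartitionName=priv State=UP
      0 23 * * *  root  /usr/bin/scontrol update PartitionName=priv State=DOWN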


   Our most common OS is Ubuntu, and we are using Slurm 17.02.7.

   Thanks in advance for your time and effort, Nadav
