Partitions have an ExclusiveUser setting. It's not exclusive per job as I'd
misremembered, but exclusive per user.
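If you wanted to enforce that, a minimal sketch of the slurm.conf side might
look like the following (partition and node names are hypothetical):
=====
# slurm.conf sketch (hypothetical partition and node names):
# ExclusiveUser=YES allocates whole nodes to a single user at a time,
# so two users' Fluent jobs never share a node.
PartitionName=fluent Nodes=node[001-004] ExclusiveUser=YES State=UP

# Per-job exclusivity is the --exclusive flag at submission time instead:
#   sbatch --exclusive jobscript.sh
=====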
In any case, none of my few Fluent users run graphically on the HPC. They do
their pre- and post-processing on local workstations, copying their .cas.gz and
.dat.gz files to the HPC and running Fluent in non-graphical batch mode.
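End to end, the round trip looks roughly like this (the hostname and batch
script name are made up for illustration; the file names match the example
script below):
=====
# On the workstation: push the case and starting data file to the cluster
scp FFF-1-1.cas.gz FFF-1-1-00000.dat.gz hpc.example.edu:run/
# On the cluster: submit the batch script shown below
sbatch fluent-job.sh
# Later: pull the new data file back for post-processing
scp hpc.example.edu:run/FFF-1-1-03000.dat.gz .
=====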
Bash functions that everyone sources for a Fluent run:
=====
# Write one fully-qualified InfiniBand hostname per line for each node
# allocated to this job; Fluent uses it as its machine file.
function fluent_make_nodelist() {
    > nodelist.${SLURM_JOBID}
    for n in $(scontrol show hostnames "${SLURM_NODELIST}"); do
        echo "${n}.hpcib.tntech.edu" >> nodelist.${SLURM_JOBID}
    done
}

# Load Fluent and pick the interconnect: InfiniBand plus a node list for
# multi-node jobs, shared memory for single-node jobs.
function fluent_setup() {
    module load fluent
    # Calculate the final iteration value; 10# forces base 10 so the
    # zero-padded START isn't parsed as octal
    END=$(printf "%05d" $((10#${START} + ${STEPS})))
    if [ ${SLURM_NNODES} -gt 1 ]; then
        INTERCONNECT=infiniband
        fluent_make_nodelist
        EXTRA_ARGS="-cnf=nodelist.${SLURM_JOBID}"
    else
        INTERCONNECT=shmem
        EXTRA_ARGS=""
    fi
}

# Run Fluent in batch: read the case and starting data file, iterate, and
# write the resulting data file. The heredoc feeds Fluent's text interface.
function fluent_run() {
    # Remove the output file if it already exists
    if [ -f ${JOBNAME}-${END}.dat.gz ]; then
        rm -f ${JOBNAME}-${END}.dat.gz
    fi
    fluent -g ${SOLVER} -t${SLURM_NTASKS} -p${INTERCONNECT} ${EXTRA_ARGS} <<EOD
rc ${JOBNAME}.cas.gz
rd ${JOBNAME}-${START}.dat.gz
solve/it/${STEPS}
wd ${JOBNAME}-${END}.dat.gz
exit
EOD
    rm -f nodelist.${SLURM_JOBID}
}
=====
Typical Slurm batch script:
=====
#!/bin/bash
#SBATCH --nodes=1 --ntasks-per-node=28
#SBATCH --time=1-00:00:00
# Given a case and data file with a common prefix, a hyphen, and a 5-digit
# value for the starting iteration count:
JOBNAME=FFF-1-1
START=00000
# How many additional iterations should be run?
STEPS=3000
# Which solver style to use?
# 2d (2d single precision), 2ddp (2d double precision),
# 3d (3d single precision), 3ddp (3d double precision)
SOLVER=3ddp
# Shouldn't have to edit anything below here. A new data file will be written
# under the name ${JOBNAME}-${START+STEPS}.dat.gz
source /cm/shared/apps/ansys_inc/fluent_functions
fluent_setup
fluent_run
=====
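A nice side effect of the naming convention is that restarting is just a
matter of editing START. A follow-on run from the output above would be
(sketch, using the same hypothetical job name):
=====
# Continue from the previous run's output: reads FFF-1-1-03000.dat.gz,
# runs 3000 more iterations, and writes FFF-1-1-06000.dat.gz
JOBNAME=FFF-1-1
START=03000
STEPS=3000
=====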
> On Sep 20, 2018, at 2:50 AM, Mahmood Naderan <[email protected]> wrote:
>
> Hi Michael,
> Sorry for the late response. Do you mean supplying --exclusive to the
> srun command? Or do I have to do something else for partitions?
> Currently they use
>
> srun -n 1 -c 6 --x11 -A monthly -p CAT --mem=32GB ./fluent.sh
>
> where fluent.sh is
>
> #!/bin/bash
> unset SLURM_GTIDS
> /state/partition1/ansys_inc/v140/fluent/bin/fluent
>
>
> Regards,
> Mahmood
>
>
>
>
> On Sat, Sep 1, 2018 at 7:45 PM Renfro, Michael <[email protected]> wrote:
>>
>> Depending on the scale (what percent are Fluent users, how many nodes you
>> have), you could use exclusive mode on either a per-partition or per-job
>> basis.
>>
>> Here, my (currently few) Fluent users do all their GUI work off the cluster,
>> and just submit batch jobs using the generated case and data files.
>>
>> --
>> Mike Renfro / HPC Systems Administrator, Information Technology Services
>> 931 372-3601 / Tennessee Tech University
>>
>>> On Sep 1, 2018, at 9:53 AM, Mahmood Naderan <[email protected]> wrote:
>>>
>>> Hi,
>>> I have found that when user A is running a Fluent job (some 100%
>>> processes in top) and user B decides to run a Fluent job of his own,
>>> the Fluent console window shows messages that another Fluent process
>>> is running and it cannot set affinity. This is not an error, but I
>>> see that the speed is somewhat low.
>>>
>>> Consider that when a user runs "srun --x11 .... script", where the
>>> script launches some Fluent processes and Slurm puts that job on
>>> compute-0-0, there should be a way for another "script" from another
>>> user to go to compute-0-1 even if compute-0-0 has free cores.
>>>
>>> Is there any way in the Slurm configuration to set such a constraint?
>>> Before Slurm dispatches a job, it should first check whether process X
>>> is already running there.
>>>
>>>
>>> Regards,
>>> Mahmood