Partitions have an ExclusiveUser setting. It's not exclusive per job as I'd
misremembered, but exclusive per user.
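If you wanted to enforce that, a minimal sketch of the slurm.conf side might
look like the following (partition and node names are hypothetical):
=====
# slurm.conf sketch (hypothetical partition and node names):
# ExclusiveUser=YES allocates whole nodes to a single user at a time,
# so two users' Fluent jobs never share a node.
PartitionName=fluent Nodes=node[001-004] ExclusiveUser=YES State=UP

# Per-job exclusivity is the --exclusive flag at submission time instead:
#   sbatch --exclusive jobscript.sh
=====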
In any case, none of my few Fluent users run graphically on the HPC. They do
their pre- and post-processing on local workstations, copying their .cas.gz and
.dat.gz files to the HPC and running Fluent in non-graphical batch mode.
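End to end, the round trip looks roughly like this (the hostname and batch
script name are made up for illustration; the file names match the example
script below):
=====
# On the workstation: push the case and starting data file to the cluster
scp FFF-1-1.cas.gz FFF-1-1-00000.dat.gz hpc.example.edu:run/
# On the cluster: submit the batch script shown below
sbatch fluent-job.sh
# Later: pull the new data file back for post-processing
scp hpc.example.edu:run/FFF-1-1-03000.dat.gz .
=====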
Bash functions that everyone sources for a Fluent run:
=====
# Write one fully-qualified InfiniBand hostname per line for each node
# allocated to this job; Fluent uses it as its machine file.
function fluent_make_nodelist() {
    > nodelist.${SLURM_JOBID}
    for n in $(scontrol show hostnames "${SLURM_NODELIST}"); do
        echo "${n}.hpcib.tntech.edu" >> nodelist.${SLURM_JOBID}
    done
}

# Load Fluent and pick the interconnect: InfiniBand plus a node list for
# multi-node jobs, shared memory for single-node jobs.
function fluent_setup() {
    module load fluent
    # Calculate the final iteration value; 10# forces base 10 so the
    # zero-padded START isn't parsed as octal
    END=$(printf "%05d" $((10#${START} + ${STEPS})))
    if [ ${SLURM_NNODES} -gt 1 ]; then
        INTERCONNECT=infiniband
        fluent_make_nodelist
        EXTRA_ARGS="-cnf=nodelist.${SLURM_JOBID}"
    else
        INTERCONNECT=shmem
        EXTRA_ARGS=""
    fi
}

# Run Fluent in batch: read the case and starting data file, iterate, and
# write the resulting data file. The heredoc feeds Fluent's text interface.
function fluent_run() {
    # Remove the output file if it already exists
    if [ -f ${JOBNAME}-${END}.dat.gz ]; then
        rm -f ${JOBNAME}-${END}.dat.gz
    fi
    fluent -g ${SOLVER} -t${SLURM_NTASKS} -p${INTERCONNECT} ${EXTRA_ARGS} <<EOD
rc ${JOBNAME}.cas.gz
rd ${JOBNAME}-${START}.dat.gz
solve/it/${STEPS}
wd ${JOBNAME}-${END}.dat.gz
exit
EOD
    rm -f nodelist.${SLURM_JOBID}
}
=====
Typical Slurm batch script:
=====
#!/bin/bash
#SBATCH --nodes=1 --ntasks-per-node=28
#SBATCH --time=1-00:00:00
# Given a case and data file with a common prefix, a hyphen, and a 5-digit
# value for the starting iteration count:
JOBNAME=FFF-1-1
START=00000
# How many additional iterations should be run?
STEPS=3000
# Which solver style to use?
# 2d (2d single precision), 2ddp (2d double precision),
# 3d (3d single precision), 3ddp (3d double precision)
SOLVER=3ddp
# Shouldn't have to edit anything below here. A new data file will be written
# under the name ${JOBNAME}-${START+STEPS}.dat.gz
source /cm/shared/apps/ansys_inc/fluent_functions
fluent_setup
fluent_run
=====
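A nice side effect of the naming convention is that restarting is just a
matter of editing START. A follow-on run from the output above would be
(sketch, using the same hypothetical job name):
=====
# Continue from the previous run's output: reads FFF-1-1-03000.dat.gz,
# runs 3000 more iterations, and writes FFF-1-1-06000.dat.gz
JOBNAME=FFF-1-1
START=03000
STEPS=3000
=====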
> On Sep 20, 2018, at 2:50 AM, Mahmood Naderan <[email protected]> wrote:
>
> Hi Michael,
> Sorry for the late response. Do you mean supplying --exclusive to the
> srun command? Or do I have to do something else for partitions?
> Currently they use
>
> srun -n 1 -c 6 --x11 -A monthly -p CAT --mem=32GB ./fluent.sh
>
> where fluent.sh is
>
> #!/bin/bash
> unset SLURM_GTIDS
> /state/partition1/ansys_inc/v140/fluent/bin/fluent
>
>
> Regards,
> Mahmood
>
>
>
>
> On Sat, Sep 1, 2018 at 7:45 PM Renfro, Michael <[email protected]> wrote:
>>
>> Depending on the scale (what percent are Fluent users, how many nodes you
>> have), you could use exclusive mode on either a per-partition or per-job
>> basis.
>>
>> Here, my (currently few) Fluent users do all their GUI work off the cluster,
>> and just submit batch jobs using the generated case and data files.
>>
>> --
>> Mike Renfro / HPC Systems Administrator, Information Technology Services
>> 931 372-3601 / Tennessee Tech University
>>
>>> On Sep 1, 2018, at 9:53 AM, Mahmood Naderan <[email protected]> wrote:
>>>
>>> Hi,
>>> I have found that when user A is running a Fluent job (some 100%
>>> processes in top) and user B decides to run a Fluent job of his own,
>>> the Fluent console window shows messages that another Fluent process
>>> is running and it cannot set affinity. This is not an error, but I
>>> see that the speed is somewhat low.
>>>
>>> Consider that when a user runs "srun --x11 .... script", where the
>>> script launches some Fluent processes and Slurm puts that job on
>>> compute-0-0, there should be a way for another "script" from another
>>> user to go to compute-0-1 even if compute-0-0 has free cores.
>>>
>>> Is there any way in the Slurm configuration to set such a constraint?
>>> Before Slurm dispatches a job, it should first check whether process X
>>> is already running there.
>>>
>>>
>>> Regards,
>>> Mahmood