My issue today is that I can't get interactive jobs to run when they are started
from the login nodes. They run fine when started from the head node, but users
are not allowed to log in there directly.
The problem seems to be that SLURM_SUBMIT_HOST is being set to the login node's
public network hostname instead of its private network hostname. For some reason
this isn't an issue with batch jobs. From the compute nodes' point of view, the
login node's public address is not reachable. I'm not entirely certain that this
variable is the direct culprit, but I get an error message from ORTE suggesting
that MPI is trying to use the external address. I also see lots of logged
martian packets on the head node coming from the login nodes' kernels. Any idea
how best to control this behavior?
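To confirm what address a job actually sees, a minimal diagnostic sketch like the one below could be run inside an interactive job on a compute node. It assumes only that SLURM_SUBMIT_HOST is set in the job environment and that the compute node's normal resolver is what matters; the helper names are hypothetical. It prints the address the submit hostname resolves to and whether that address is in a private range.

```python
import ipaddress
import os
import socket


def is_private_address(ip: str) -> bool:
    """Return True if ip is in a private range (RFC 1918, loopback, link-local, ...)."""
    return ipaddress.ip_address(ip).is_private


def main() -> None:
    # Hostname Slurm recorded for the node the job was submitted from.
    host = os.environ.get("SLURM_SUBMIT_HOST")
    if host is None:
        print("SLURM_SUBMIT_HOST is not set (not running inside a Slurm job?)")
        return
    try:
        # Resolve the name the same way this compute node would.
        ip = socket.gethostbyname(host)
    except socket.gaierror as err:
        print(f"{host}: cannot be resolved from this node ({err})")
        return
    kind = "private" if is_private_address(ip) else "not private"
    print(f"SLURM_SUBMIT_HOST={host} resolves to {ip} ({kind})")


if __name__ == "__main__":
    main()
```

Comparing the output of an interactive job started from a login node with one started from the head node should show whether the two really end up with different addresses.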


I'm really new to SLURM; we have been on Torque/Maui for the past three years.
