Hi All,

I am trying to set up ET on a new cluster (hosted by the Minnesota Supercomputing Institute at U of Minnesota, Twin Cities). I have been using the mdb file bluewaters.ini as a template, replacing the bluewaters specs with my own where necessary. I've attached my mdb file for reference. I am able to build simfactory using the --machine=mesabi flag. I am able to run the testsuite with no failed tests (although several tests are reported as "unrunnable", which I am assuming is ok?). I can submit jobs, and they seem to run to completion just fine.

However, every time I log in, I land on a different login node, which triggers an "unknown machine name" error unless I have previously built simfactory on that particular node. I have tried to mimic bluewaters' aliaspattern line in the hope that it would do the trick, but I must be doing something wrong. What do I need to include in my mdb file to force the system to recognize that all of the login nodes belong to the same machine?

For a bit more background: MSI uses a two-step login process. First you ssh into a login machine. Then you ssh from there into one of the clusters. The machine I eventually reach is named mesabi, and the login hosts are named ln000[1-6].
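
One way to check an aliaspattern outside of simfactory is to try it against candidate hostnames in a few lines of Python. The sketch below assumes that simfactory compares the pattern with the name the node reports using Python's re module; that is an assumption, not something confirmed from the simfactory source. CURRENT_PATTERN is copied from the attached mdb file, while OPTIONAL_DOMAIN_PATTERN and the helper matches() are hypothetical names introduced only for illustration. It shows that the pattern as written requires the .msi.umn.edu suffix, so a node that reports only its short name (for example ln0001) would not match.

```python
import re
import socket

# Pattern copied from the aliaspattern line in the attached mdb file.
CURRENT_PATTERN = r"^ln000[1-6](\.msi\.umn\.edu)$"

# Hypothetical variant: the trailing "?" makes the domain suffix optional,
# so both the short and the fully qualified names are accepted.
OPTIONAL_DOMAIN_PATTERN = r"^ln000[1-6](\.msi\.umn\.edu)?$"


def matches(pattern, hostname):
    """Return True if the regular expression matches the hostname."""
    return re.search(pattern, hostname) is not None


# Names to try: the short and fully qualified forms of a login node,
# the cluster name, and whatever the current node reports about itself.
candidates = [
    "ln0001",
    "ln0001.msi.umn.edu",
    "mesabi",
    socket.gethostname(),
    socket.getfqdn(),
]

for host in candidates:
    print(
        host,
        "current:", matches(CURRENT_PATTERN, host),
        "optional-domain:", matches(OPTIONAL_DOMAIN_PATTERN, host),
    )
```

Running this on each login node shows which form of the name the node reports and whether either pattern would accept it.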

Any help is greatly appreciated.

Thanks,
Eric

--
Eric J West
Assistant Professor
Department of Physics and Astronomy
University of Minnesota Duluth

[mesabi]

# last-tested-on: 2018-02-07
# last-tested-by: Eric West <[email protected]>

# Machine description
nickname        = mesabi
name            = Mesabi
location        = University of Minnesota
description     = HP Linux cluster at MSI
webpage         = https://www.msi.umn.edu/help-documentation
status          = experimental

# Access to this machine
hostname        = mesabi

envsetup        = <<EOT
    source /etc/profile
    module load gcc/7.2.0
    module load ompi/3.0.0
EOT
aliaspattern    = ^ln000[1-6](\.msi\.umn\.edu)$

# Source tree management
sourcebasedir   = /home/ewest/@USER@
optionlist      = mesabi.cfg
submitscript    = mesabi.sub
runscript       = mesabi.run
make            = make -j4

# Simulation management
basedir         = /home/ewest/@USER@/simulations
nodes           = 719 #number of nodes
#max-num-smt     = ??? #max threads per core
#num-smt         = ??? #suggested threads per core
max-num-threads = 24 #max threads per process
num-threads     = 24 #threads per process
ppn             = 24  #cores per node
min-ppn         = 1   #min allowed ppn
memory          = 61920 #memory per node in MB
allocation      = NO_ALLOCATION
queue           = small
maxwalltime     = 96:00:00
submit          = qsub @SCRIPTFILE@
getstatus       = qstat @JOB_ID@
stop            = qdel @JOB_ID@
submitpattern   = (\d+)
statuspattern   = ^@JOB_ID@\D
queuedpattern   = " Q "
runningpattern  = " R "
holdingpattern  = " H "
#scratchbasedir  = ???
stdout          = cat @[email protected]
stderr          = cat @[email protected]
stdout-follow   = tail -n 100 -f @[email protected] @[email protected]
