Comments:
Edison does indeed appear to be retired [1].
Based on the use of hostname in Bushra's job file (below), it looks
like the script is configured for a shared memory (single node)
supercomputer. However, if the supercomputer is instead a distributed
memory (multiple node) system [2], the use of hostname is potentially
problematic, because on a distributed memory system the head node is
typically not a compute node [3].
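A quick way to see the difference (assuming a SLURM cluster; the srun
flags are only an example) is that hostname on the login node and
hostname inside a job print different machine names:

hostname                 # on the login node: prints the login node's name
srun -N 1 -n 1 hostname  # runs on an allocated compute node instead

So if hostname is evaluated on the login node, every .machines entry
would point the calculation at the login node rather than at a
compute node.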
One bad thing that can happen is that calculations on the head node
can break the cluster login for everyone. For example [4]:
"Do NOT use the login nodes for work. If everyone does this, the
login nodes will crash, keeping 700+ HPC users from being able to
login to the cluster."
It depends on local policy, but most clusters I have seen allow the
system administrators to permanently revoke a user's access to the
cluster if a calculation is executed on the head node. For example [5]:
"CHTC staff reserve the right to kill any long-running or problematic
processes on the head nodes and/or disable user accounts that violate
this policy, and users may not be notified of account deactivation."
Instead of hostname, the job file usually needs to obtain the node
list from the queuing system's job scheduler. That can be done with a
script like gen.machines [6] or Machines2W [7], or from an environment
variable whose name depends on the queuing system, for example the
PBS_NODEFILE variable for PBS [8,9]; a sketch is given below.
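A minimal sketch of that approach (assuming a SLURM system, since the
job file below uses #SBATCH directives; scontrol and
SLURM_JOB_NODELIST are standard SLURM, but the node and core counts
are only illustrative):

#!/bin/bash
#SBATCH -N 2
#SBATCH --ntasks-per-node 28
# Build .machines from the scheduler's node list instead of hostname.
rm -f .machines
for host in $(scontrol show hostnames "$SLURM_JOB_NODELIST"); do
    # one k-point parallel entry per allocated node
    echo "1:$host:1" >> .machines
done
echo 'granularity:1' >> .machines
echo 'extrafine:1' >> .machines

On a PBS system the analogous loop would read the node names from the
file named by PBS_NODEFILE instead, e.g.
for host in $(sort -u "$PBS_NODEFILE"); do ... ; done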
[1] https://www.nersc.gov/news-publications/nersc-news/nersc-center-news/2019/edison-supercomputer-to-retire-after-five-years-of-service/
[2] https://www.researchgate.net/figure/Shared-vs-Distributed-memory_fig3_323108484
[3] https://zhanglab.ccmb.med.umich.edu/docs/node9.html
[4] https://hpc.oit.uci.edu/running-jobs
[5] http://chtc.cs.wisc.edu/HPCuseguide.shtml
[6] https://docs.nersc.gov/applications/wien2k/
[7] SRC_mpiutil: http://susi.theochem.tuwien.ac.at/reg_user/unsupported/
[8] Script for "pbs": http://susi.theochem.tuwien.ac.at/reg_user/faq/pbs.html
[9] http://docs.adaptivecomputing.com/torque/4-0-2/Content/topics/commands/qsub.htm
On 11/4/2019 6:37 AM, Dr. K. C. Bhamu wrote:
Dear Bushra,
I hope you are using the same cluster you were using before (NERSC:
cori/edison).
From your job file it seems that you want to submit a job on edison
(28 cores).
Please make sure that edison is still working. My available
information says that edison has now been retired. Please confirm
with the system admin.
I would suggest you submit the job on cori; a sample job file is
available on the NERSC web page.
Anyway, please send the details as Prof. Peter has requested so that
he can help you.
Regards
Bhamu
On Mon, Nov 4, 2019 at 1:14 PM Peter Blaha
<pbl...@theochem.tuwien.ac.at> wrote:
What does "does not work" mean?
We need details.
On 11/3/19 10:48 PM, BUSHRA SABIR wrote:
> Hi experts,
> I am working on a supercomputer with WIEN2K/19.1 and using the
> following job file, but the job file does not work for a parallel
> run of LAPW1. I need help improving this job file.
> #!/bin/bash
> #SBATCH -N 1
> #SBATCH -p RM
> #SBATCH --ntasks-per-node 28
> #SBATCH -t 2:0:00
> # echo commands to stdout
> # set -x
> module load mpi
> module load intel
> export SCRATCH="./"
>
> #rm .machines
> #write .machines file
> echo '#' > .machines
> # example for an MPI parallel lapw0
> #echo 'lapw0:'`hostname`' :'$nproc >> .machines
> # k-point and mpi parallel lapw1/2
>
> # 28 k-point parallel entries, one per core on this node
> for i in $(seq 1 28); do
>     echo '1:'`hostname`':1' >> .machines
> done
>
> echo 'granularity:1' >>.machines
> echo 'extrafine:1' >>.machines
> export SCRATCH=./
> runsp_lapw -p -ec 0.000001 -cc 0.0001 -i 40 -fc 1.0
>
>
> Bushra
--
P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
Email: bl...@theochem.tuwien.ac.at          WIEN2k: http://www.wien2k.at
WWW: http://www.imc.tuwien.ac.at/TC_Blaha
--------------------------------------------------------------------------