Re: [gmx-users] Gromacs 2016.3 orte error while running on cluster

2018-01-17 Thread Rainer Rutka

HI!
WORKS LIKE A CHARM NOW.
Thank you for your hints!
.-)


On 17.01.2018 at 13:04, Rainer Rutka wrote:

HI!
Just a question.

We are trying to start an MPI-parallel job with Gromacs 2016.3 on
our cluster system here in Germany.

Unfortunately we get this error:

An ORTE daemon has unexpectedly failed after launch...

See the attached gromacs-run-error.txt file for details.
Our submit script is attached, too: gromacs-run-pbs.txt

THANKS IN ADVANCE!



--
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
 * High-Performance-Computing (HPC)
 * KIM-Support and -Base-Services
Room: V511
78457 Konstanz, Germany
+49 7531 88-5413

-- 
Gromacs Users mailing list

* Please search the archive at 
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
mail to gmx-users-requ...@gromacs.org.

[gmx-users] Gromacs 2016.3 orte error while running on cluster

2018-01-17 Thread Rainer Rutka

HI!
Just a question.

We are trying to start an MPI-parallel job with Gromacs 2016.3 on
our cluster system here in Germany.

Unfortunately we get this error:

An ORTE daemon has unexpectedly failed after launch...

See the attached gromacs-run-error.txt file for details.
Our submit script is attached, too: gromacs-run-pbs.txt

THANKS IN ADVANCE!

--
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
 * High-Performance-Computing (HPC)
 * KIM-Support and -Base-Services
Room: V511
78457 Konstanz, Germany
+49 7531 88-5413

* hwloc 1.11.2 has encountered an incorrect PCI locality information.
* PCI bus :80 is supposedly close to 2nd NUMA node of 1st package,
* however hwloc believes this is impossible on this architecture.
* Therefore the PCI bus will be moved to 1st NUMA node of 2nd package.
*
* If you feel this fixup is wrong, disable it by setting in your environment
* HWLOC_PCI__80_LOCALCPUS= (empty value), and report the problem
* to the hwloc's user mailing list together with the XML output of lstopo.
*
* You may silence this message by setting HWLOC_HIDE_ERRORS=1 in your environment.
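
If the goal is just to suppress this warning in batch jobs, a minimal sketch (assuming exported variables reach the MPI ranks, and that lstopo is available once the hwloc module/tools are loaded) would be to set the variable the message names before calling mpirun:

# Silence the hwloc PCI-locality fixup warning, as suggested in the message above.
export HWLOC_HIDE_ERRORS=1

# Optionally capture the node topology as XML for a report to the hwloc mailing list.
lstopo --of xml > lstopo-$(hostname).xml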

--
An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).
--
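
If the cause really is interface selection, one possible workaround (not taken from the attachments; the interface name ib0 is only a placeholder for whatever interface exists on all compute nodes) is to tell Open MPI explicitly which network to use for its bootstrap and TCP traffic:

# Restrict Open MPI's out-of-band (OOB) and TCP BTL traffic to one known interface.
# Replace ib0 with an interface present on every node of the allocation.
mpirun --mca oob_tcp_if_include ib0 \
       --mca btl_tcp_if_include ib0 \
       -n 80 gmx_mpi mdrun -s XADpr-1.tpr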
#!/bin/bash
#MSUB -joe
#MSUB -N XADpr-1
#MSUB -l walltime=48:00:00
#MSUB -l nodes=5:ppn=16
export OMP_NUM_THREADS=1

export PATH=/opt/bwhpc/common/mpi/openmpi/2.1.1-gnu-7.1/bin:/opt/bwhpc/common/compiler/gnu/7.1.0/bin:/opt/bwhpc/common/chem/gromacs/2016.3_gnu7.1/bin:/software/all/bin:/usr/lib64/qt-3.3/bin:/opt/moab/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/st/st_st/st_ac128541/.local/bin:/home/st/st_st/st_ac128541/bin
export LD_BIND_NOW=1

cd /pfs/work2/workspace/scratch/st_ac128541-Lipase-0/1EDB

module purge
module load chem/gromacs/2016.3_gnu7.1

if [ 1 == 1 ] ; then
# Start simulation 

gmx_mpi grompp -maxwarn 10 -f md.mdp -c XADpr.gro -p XADpr.top -o XADpr-1.tpr \
    -po XADpr-1.mdp > XADpr-1.grompp.out 2>&1

mpirun -n 80 gmx_mpi mdrun -s XADpr-1.tpr -maxh 47 -npme 20 -cpo XADpr-1.cpt \
    -o XADpr-1.trr -x XADpr-1.xtc -c XADpr-1.gro -e XADpr-1.edr -g XADpr-1.log \
    > XADpr-1.mdrun.out 2>&1

else
# Continue simulation using the checkpoint feature

mpirun -n 80 gmx_mpi mdrun -cpi XADpr-0.cpt -cpo XADpr-1.cpt -s XADpr-1.tpr \
    -maxh 47 -npme 20 -o XADpr-1.trr -x XADpr-1.xtc -c XADpr-1.gro -e XADpr-1.edr \
    -g XADpr-1.log > XADpr-1.mdrun.out 2>&1

fi  
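
As an alternative to hard-coding -n 80, the rank count could be derived from the node file the batch system provides. This is only a sketch, assuming this Moab/Torque setup exports $PBS_NODEFILE (not shown in the attachments):

# Derive the MPI rank count from the hosts actually allocated,
# so nodes=5:ppn=16 and the mpirun call cannot drift apart.
NRANKS=$(wc -l < "$PBS_NODEFILE")
mpirun -n "$NRANKS" gmx_mpi mdrun -s XADpr-1.tpr -maxh 47 -npme 20 \
    -cpo XADpr-1.cpt -o XADpr-1.trr -x XADpr-1.xtc -c XADpr-1.gro \
    -e XADpr-1.edr -g XADpr-1.log > XADpr-1.mdrun.out 2>&1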

[gmx-users] Gromacs 2016.3 FLOATING-POINT EXCEPTION / DIVIDE-BY-ZERO errors

2017-09-05 Thread Rainer Rutka

HI!
My name is Rainer. I am one of the module/software maintainers
for the bwHPC-C5 project: http://www.bwhpc-c5.de/en/index.php

For some time now we have not been able to run Gromacs 2016.x on
our clusters. Here is one of the errors we received from
one of the users:


I have been using Gromacs on BwUniCluster for a few months and performed
molecular dynamics simulations without any problem until the end of
July. Since August, I cannot prepare the input files, and if I prepare
them elsewhere, the simulation crashes even though it is the same system
as the ones that worked perfectly a few days before. It seems that I do
not have this problem with smaller systems containing fewer atoms.


[uc1n997:43275] *** Process received signal ***
[uc1n997:43275] Signal: Floating point exception (8)
[uc1n997:43275] Signal code: Integer divide-by-zero (1)
[uc1n997:43275] Failing at address: 0x42fd74
[uc1n997:43275] [ 0] /usr/lib64/libpthread.so.0(+0xf5e0)[0x2b65e89805e0]
[uc1n997:43275] [ 1] gmx_mpi_d(__svml_idiv4_h9+0x64)[0x42fd74]
[uc1n997:43275] [ 2] 
/pfs/data1/software_uc1/bwhpc/common/chem/gromacs/5.1.2-openmpi-1.8-intel-15.0/bin/../lib64/libgromacs_mpi_d.so.1(count_bonded_distances+0x43c)[0x2b65e6b1f72c]
[uc1n997:43275] [ 3] 
/pfs/data1/software_uc1/bwhpc/common/chem/gromacs/5.1.2-openmpi-1.8-intel-15.0/bin/../lib64/libgromacs_mpi_d.so.1(pme_load_estimate+0x20)[0x2b65e6b1ef70]
[uc1n997:43275] [ 4] 
/pfs/data1/software_uc1/bwhpc/common/chem/gromacs/5.1.2-openmpi-1.8-intel-15.0/bin/../lib64/libgromacs_mpi_d.so.1(gmx_grompp+0x257a)[0x2b65e639e84a]
[uc1n997:43275] [ 5] 
/pfs/data1/software_uc1/bwhpc/common/chem/gromacs/5.1.2-openmpi-1.8-intel-15.0/bin/../lib64/libgromacs_mpi_d.so.1(_ZN3gmx24CommandLineModuleManager3runEiPPc+0x267)[0x2b65e6127e97]

[uc1n997:43275] [ 6] gmx_mpi_d(main+0xbc)[0x40c13c]
[uc1n997:43275] [ 7] 
/usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2b65e8baec05]

[uc1n997:43275] [ 8] gmx_mpi_d[0x40bfb9]
[uc1n997:43275] *** End of error message ***


Gromacs 2016.3 was built this way (excerpt):

[...]
# (3) Load required modules for the build process
module load compiler/intel/16.0
module load mpi/openmpi/2.1-intel-16.0
module load numlib/mkl/11.3.4
module load devel/cmake/3.3.2
[...]
# double precision
cmake -DCMAKE_VERBOSE_MAKEFILE=ON -DGMX_MPI=ON -DGMX_GPU=OFF \
    -DGMX_DOUBLE=ON -DGMX_THREAD_MPI=OFF -DGMX_FFT_LIBRARY=mkl \
    -DMPIEXEC=${MPI_BIN_DIR}/mpirun -DREGRESSIONTEST_DOWNLOAD=OFF \
    -DCMAKE_INSTALL_PREFIX=${TARGET_DIR}

make 2>&1 | tee ${LOG_DIR}/make_double.out
make install 2>&1 | tee ${LOG_DIR}/make-install_double.out
[...]
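
One way to catch a miscompiled binary like the divide-by-zero above before users hit it would be to run the GROMACS regression tests as part of the install procedure. A minimal sketch, assuming the build hosts have internet access (otherwise -DREGRESSIONTEST_PATH can point at a locally downloaded copy):

# Re-configure the existing build directory with the regression tests enabled,
# then rebuild and run the test suite before installing.
cmake -DREGRESSIONTEST_DOWNLOAD=ON .
make -j 8
make check 2>&1 | tee ${LOG_DIR}/make-check_double.out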

ANY HELP IS MUCH APPRECIATED.

Thanks in advance.

.-)

--
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
 * High-Performance-Computing (HPC)
 * KIM-Support and -Base-Services
Room: V511
78457 Konstanz, Germany
+49 7531 88-5413

