Re: [easybuild] Failure in UCX-1.14.1-GCCcore-12.3.0.eb when installing foss-2023a.eb

2024-03-15 Thread Ole Holm Nielsen
/ib_md.c:751:24: error: 'IBV_ACCESS_ON_DEMAND' undeclared (first
use in this function); did you mean 'IBV_EXP_ACCESS_ON_DEMAND'?
   751 |     if (access_flags & IBV_ACCESS_ON_DEMAND) {
       |                        ^~~~
       |                        IBV_EXP_ACCESS_ON_DEMAND
base/ib_md.c: In function 'uct_ib_md_global_odp_init':
base/ib_md.c:1449:54: error: 'IBV_ACCESS_ON_DEMAND' undeclared (first
use in this function); did you mean 'IBV_EXP_ACCESS_ON_DEMAND'?
  1449 |                            UCT_IB_MEM_ACCESS_FLAGS | IBV_ACCESS_ON_DEMAND,
       |                                                      ^~~~
       |                                                      IBV_EXP_ACCESS_ON_DEMAND



Any hints on how to fix it? Is there a bug involving the IBV_ACCESS_ON_DEMAND
flag?



--

*Dr. Joaquim Jornet Somoza*
*Técnico Superior de Cálculo Científico *
Servicios Generales a la Investigación (*SGIker*)
Universidad del País Vasco (*UPV/EHU*)
email: j.jornet.som...@gmail.com
Edificio Joxe Maria Korta (Campus Gipuzkoa)
Av. Tolosa 72, 4a planta
20018 Donostia-San Sebastián,
Gipuzkoa, Spain

/External Collaborator./
Nano-Bio Spectroscopy group
Departamento de Física de Materiales
Universidad del País Vasco (UPV/EHU)
Donostia-San Sebastián, Gipuzkoa, Spain

The Max Planck Institute for the Structure and Dynamics of Matter (MPSD)
Bldg. 99 (CFEL)
Luruper Chaussee 149
22761 Hamburg, Germany


--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,


Re: [easybuild] Build failure of Rust-1.70.0-GCCcore-12.3.0.eb

2023-12-08 Thread Ole Holm Nielsen

On 08-12-2023 20:46, Alan O'Cais wrote:
There's been an update to the easyblock that should fix this for you, 
see https://github.com/easybuilders/easybuild-easyblocks/pull/3038

--use-easyblocks-from-pr 3038


Thanks a lot!  This command does the job:

$ eb Rust-1.70.0-GCCcore-12.3.0.eb --include-easyblocks-from-pr=3038


On 8 December 2023 20:33:34 CET, Ole Holm Nielsen 
 wrote:


I'm trying to build Rust-1.70.0-GCCcore-12.3.0.eb but this fails
because of a download failure. I don't know how to deal with this...

The log file shows that download of an older version 1.69 file
rust-std-1.69.0-x86_64-unknown-linux-gnu.tar.xz is attempted:

(lines deleted)
downloading
https://ci-artifacts.rust-lang.org/rustc-builds/90c541806f23a127002de5b4038be731ba1458ca/rust-dev-1.70.0-x86_64-unknown-linux-gnu.tar.xz
curl: (22) The requested URL returned error: 404

error: failed to download llvm from ci

help: old builds get deleted after a certain time
help: if trying to compile an old commit of rustc, disable
`download-ci-llvm` in config.toml:

[llvm]
download-ci-llvm = false

Build completed unsuccessfully in 0:01:05
(at easybuild/tools/run.py:681 in parse_cmd_output)
== 2023-12-08 20:28:03,381 build_log.py:267 INFO ... (took 1 min 5 secs)
== 2023-12-08 20:28:03,381 filetools.py:2012 INFO Removing lock
/home/modules/software/.locks/_home_modules_software_Rust_1.70.0-GCCcore-12.3.0.lock...
== 2023-12-08 20:28:03,382 filetools.py:383 INFO Path
/home/modules/software/.locks/_home_modules_software_Rust_1.70.0-GCCcore-12.3.0.lock
successfully removed.
== 2023-12-08 20:28:03,382 filetools.py:2016 INFO Lock removed:
/home/modules/software/.locks/_home_modules_software_Rust_1.70.0-GCCcore-12.3.0.lock
== 2023-12-08 20:28:03,382 easyblock.py:4277 WARNING build failed
(first 300 chars): cmd " export
CARGO_HOME=/dev/shm/Rust/1.70.0/GCCcore-12.3.0/cargo && ./x.py build
-j 64 " exited with exit code 1 and output:
downloading
https://static.rust-lang.org/dist/2023-04-20/rust-std-1.69.0-x86_64-unknown-linux-gnu.tar.xz
== 2023-12-08 20:28:03,383 easyblock.py:328 INFO Closing log for
application name Rust version 1.70.0

Can anyone suggest a fix?

Thanks,
Ole


[easybuild] Build failure of Rust-1.70.0-GCCcore-12.3.0.eb

2023-12-08 Thread Ole Holm Nielsen
I'm trying to build Rust-1.70.0-GCCcore-12.3.0.eb but this fails because 
of a download failure.  I don't know how to deal with this...


The log file shows that download of an older version 1.69 file 
rust-std-1.69.0-x86_64-unknown-linux-gnu.tar.xz is attempted:


(lines deleted)
downloading 
https://ci-artifacts.rust-lang.org/rustc-builds/90c541806f23a127002de5b4038be731ba1458ca/rust-dev-1.70.0-x86_64-unknown-linux-gnu.tar.xz
curl: (22) The requested URL returned error: 404


error: failed to download llvm from ci

help: old builds get deleted after a certain time
help: if trying to compile an old commit of rustc, disable 
`download-ci-llvm` in config.toml:


[llvm]
download-ci-llvm = false

Build completed unsuccessfully in 0:01:05
 (at easybuild/tools/run.py:681 in parse_cmd_output)
== 2023-12-08 20:28:03,381 build_log.py:267 INFO ... (took 1 min 5 secs)
== 2023-12-08 20:28:03,381 filetools.py:2012 INFO Removing lock 
/home/modules/software/.locks/_home_modules_software_Rust_1.70.0-GCCcore-12.3.0.lock...
== 2023-12-08 20:28:03,382 filetools.py:383 INFO Path 
/home/modules/software/.locks/_home_modules_software_Rust_1.70.0-GCCcore-12.3.0.lock 
successfully removed.
== 2023-12-08 20:28:03,382 filetools.py:2016 INFO Lock removed: 
/home/modules/software/.locks/_home_modules_software_Rust_1.70.0-GCCcore-12.3.0.lock
== 2023-12-08 20:28:03,382 easyblock.py:4277 WARNING build failed (first 
300 chars): cmd " export 
CARGO_HOME=/dev/shm/Rust/1.70.0/GCCcore-12.3.0/cargo &&   ./x.py build 
-j 64 " exited with exit code 1 and output:
downloading 
https://static.rust-lang.org/dist/2023-04-20/rust-std-1.69.0-x86_64-unknown-linux-gnu.tar.xz
== 2023-12-08 20:28:03,383 easyblock.py:328 INFO Closing log for 
application name Rust version 1.70.0


Can anyone suggest a fix?

Thanks,
Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


[easybuild] EB modules for AMD ROCm?

2023-12-07 Thread Ole Holm Nielsen

Dear Easybuilders,

We're configuring an AMD EPYC 7313 server with two AMD MI210 GPUs. AMD 
provides instructions for installing Release-specific AMDGPU and ROCm 
Repositories on Linux Distributions in the page 
https://rocm.docs.amd.com/en/latest/deploy/linux/os-native/install.html


The AMD instructions allow us to install multiple versions of ROCm. 
However, I'm missing the ability to create software modules which 
conveniently set PATH, LD_LIBRARY_PATH etc.


I looked at the EB software list at 
https://docs.easybuild.io/version-specific/supported-software/#rocm 
but only an ancient version, 4.5.0, is offered.  The latest version at 
present is 5.7.2.  We'd like to have at least version 5.3.3, which is 
installed on the LUMI supercomputer, so that we would be compatible (to 
some extent) with LUMI.


Question: Has anyone created EB module files to build ROCm manually, or as 
a wrapper around AMD's ROCm RPM packages?
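
To make the request concrete, here is a minimal sketch of the kind of module file we have in mind, generated for a given ROCm version. It assumes that the AMD packages install under /opt/rocm-<version> with bin and lib subdirectories, and that Lmod (Lua) module files are used; the function name rocm_modulefile and the ROCM_PATH setting are illustrative assumptions, not something taken from an existing easyconfig.

```python
def rocm_modulefile(version, prefix_template='/opt/rocm-{version}'):
    """Return Lmod (Lua) modulefile text for a system-installed ROCm.

    The install prefix layout is an assumption; adjust prefix_template
    if the packages live somewhere else on your system.
    """
    root = prefix_template.format(version=version)
    lines = [
        'help([[AMD ROCm %s (system packages under %s)]])' % (version, root),
        'local root = "%s"' % root,
        # ROCM_PATH is set here for convenience (assumption).
        'setenv("ROCM_PATH", root)',
        'prepend_path("PATH", pathJoin(root, "bin"))',
        'prepend_path("LD_LIBRARY_PATH", pathJoin(root, "lib"))',
    ]
    return '\n'.join(lines) + '\n'


if __name__ == '__main__':
    # Print a module file for the version installed on LUMI.
    print(rocm_modulefile('5.3.3'))
```

The output would have to be saved by hand under the site's module tree, so an EB-managed solution would still be preferable.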


Thanks a lot,
Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


[easybuild] SOLVED: OpenMPI-4.1.5-GCC-12.3.0.eb Sanity check failed on AMD "Genoa" node

2023-10-10 Thread Ole Holm Nielsen
FYI: I've solved the OpenMPI sanity check error by reinstalling the OS 
with an Omni-Path network fabric adapter mounted in the server. 
Previously I had installed all missing prerequisite RPM packages after 
adding the adapter, but apparently that wasn't enough!


Now the OpenMPI 4.1.5 module builds correctly and passes all its tests.

Best regards,
Ole

On 10/3/23 09:51, Ole Holm Nielsen wrote:
I'm starting EasyBuild up on our new AMD "Genoa" platform with 1 AMD EPYC 
9124 16-Core Processor with 2 threads/core, 384 GB RAM, Omni-Path (OPA) 
fabric, and AlmaLinux 8.8 OS.


I wiped our existing EB modules so as to start with a clean slate.  The 
goal is to build the foss-2023a toolchain as a starting point for further 
modules.


I previously experienced the same error as shown below with 
OpenMPI-4.1.4-GCC-12.2.0.eb, and Kenneth suggested that the lack of 
Infiniband hardware might be the problem.  I had an Omni-Path (OPA fabric) 
adapter lying around, so I installed it in the system and made sure that 
IPoIB is working as expected.


The build of the OpenMPI-4.1.5-GCC-12.3.0.eb unfortunately fails with the 
same "PML cm cannot be selected" error as before:


== 2023-10-03 09:36:16,437 build_log.py:171 ERROR EasyBuild crashed with 
an error (at easybuild/base/exceptions.py:126 in __init__): Sanity check 
failed: sanity check command OMPI_MCA_rmaps_base_oversubscribe=1 mpirun -n 
8 /dev/shm/OpenMPI/4.1.5/GCC-12.3.0/mpi_test_hello_c exited with code 1 
(output: [e000.nifl.fysik.dtu.dk:2392967] PML cm cannot be selected

[e000.nifl.fysik.dtu.dk:2392963] PML cm cannot be selected
)
sanity check command mpirun -n 1 
/dev/shm/OpenMPI/4.1.5/GCC-12.3.0/mpi_test_hello_c exited with code 1 
(output: [e000.nifl.fysik.dtu.dk:2392988] PML cm cannot be selected

)
sanity check command OMPI_MCA_rmaps_base_oversubscribe=1 mpirun -n 8 
/dev/shm/OpenMPI/4.1.5/GCC-12.3.0/mpi_test_hello_mpifh exited with code 1 
(output: [e000.nifl.fysik.dtu.dk:2393008] PML cm cannot be selected

)
sanity check command mpirun -n 1 
/dev/shm/OpenMPI/4.1.5/GCC-12.3.0/mpi_test_hello_mpifh exited with code 1 
(output: [e000.nifl.fysik.dtu.dk:2393029] PML cm cannot be selected

)
sanity check command OMPI_MCA_rmaps_base_oversubscribe=1 mpirun -n 8 
/dev/shm/OpenMPI/4.1.5/GCC-12.3.0/mpi_test_hello_usempi exited with code 1 
(output: [e000.nifl.fysik.dtu.dk:2393042] PML cm cannot be selected

)
sanity check command mpirun -n 1 
/dev/shm/OpenMPI/4.1.5/GCC-12.3.0/mpi_test_hello_usempi exited with code 1 
(output: [e000.nifl.fysik.dtu.dk:2393070] PML cm cannot be selected

) (at easybuild/framework/easyblock.py:3655 in _sanity_check_step)
== 2023-10-03 09:36:16,437 build_log.py:267 INFO ... (took 5 secs)
== 2023-10-03 09:36:16,437 filetools.py:2012 INFO Removing lock 
/home/modules/software/.locks/_home_modules_software_OpenMPI_4.1.5-GCC-12.3.0.lock...
== 2023-10-03 09:36:16,438 filetools.py:383 INFO Path 
/home/modules/software/.locks/_home_modules_software_OpenMPI_4.1.5-GCC-12.3.0.lock successfully removed.
== 2023-10-03 09:36:16,438 filetools.py:2016 INFO Lock removed: 
/home/modules/software/.locks/_home_modules_software_OpenMPI_4.1.5-GCC-12.3.0.lock
== 2023-10-03 09:36:16,438 easyblock.py:4277 WARNING build failed (first 
300 chars): Sanity check failed: sanity check command 
OMPI_MCA_rmaps_base_oversubscribe=1 mpirun -n 8 
/dev/shm/OpenMPI/4.1.5/GCC-12.3.0/mpi_test_hello_c exited with code 1 
(output: node[e000.nifl.fysik.dtu.dk:2392967] PML cm cannot be selected

[e000.nifl.fysik.dtu.dk:2392963] PML cm cannot be selected
)
sanity chec
== 2023-10-03 09:36:16,438 easyblock.py:328 INFO Closing log for 
application name OpenMPI version 4.1.5



Since we have now used the latest GCC 12.3.0 and have installed an OPA 
fabric adapter, the problem would seem to be related to the AMD "Genoa" 
hardware.


Does anyone have suggestions for building OpenMPI successfully on this 
platform?


Thanks,
Ole



--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,
Fysikvej Building 309, DK-2800 Kongens Lyngby, Denmark
E-mail: ole.h.niel...@fysik.dtu.dk
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Mobile: (+45) 5180 1620


[easybuild] OpenMPI-4.1.5-GCC-12.3.0.eb Sanity check failed on AMD "Genoa" node

2023-10-03 Thread Ole Holm Nielsen
I'm starting EasyBuild up on our new AMD "Genoa" platform with 1 AMD EPYC 
9124 16-Core Processor with 2 threads/core, 384 GB RAM, Omni-Path (OPA) 
fabric, and AlmaLinux 8.8 OS.


I wiped our existing EB modules so as to start with a clean slate.  The 
goal is to build the foss-2023a toolchain as a starting point for further 
modules.


I previously experienced the same error as shown below with 
OpenMPI-4.1.4-GCC-12.2.0.eb, and Kenneth suggested that the lack of 
Infiniband hardware might be the problem.  I had an Omni-Path (OPA fabric) 
adapter lying around, so I installed it in the system and made sure that 
IPoIB is working as expected.
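
A quick way to confirm that the kernel has registered the adapter is to list the RDMA devices in sysfs. This is a minimal sketch, assuming a Linux system where such devices appear under /sys/class/infiniband (OPA adapters using the hfi1 driver typically show up there with names like hfi1_0); the function name list_verbs_devices is hypothetical.

```python
import os


def list_verbs_devices(sysfs_dir='/sys/class/infiniband'):
    """Return the sorted device names found in sysfs_dir, or [] if the
    directory does not exist (for example when no fabric driver is loaded)."""
    try:
        return sorted(os.listdir(sysfs_dir))
    except OSError:
        return []


if __name__ == '__main__':
    devices = list_verbs_devices()
    print(devices if devices else 'no InfiniBand/OPA devices visible in sysfs')
```

An empty list only shows that no such device is visible to the kernel; it does not by itself explain the "PML cm cannot be selected" error.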


The build of the OpenMPI-4.1.5-GCC-12.3.0.eb unfortunately fails with the 
same "PML cm cannot be selected" error as before:


== 2023-10-03 09:36:16,437 build_log.py:171 ERROR EasyBuild crashed with 
an error (at easybuild/base/exceptions.py:126 in __init__): Sanity check 
failed: sanity check command OMPI_MCA_rmaps_base_oversubscribe=1 mpirun -n 
8 /dev/shm/OpenMPI/4.1.5/GCC-12.3.0/mpi_test_hello_c exited with code 1 
(output: [e000.nifl.fysik.dtu.dk:2392967] PML cm cannot be selected

[e000.nifl.fysik.dtu.dk:2392963] PML cm cannot be selected
)
sanity check command mpirun -n 1 
/dev/shm/OpenMPI/4.1.5/GCC-12.3.0/mpi_test_hello_c exited with code 1 
(output: [e000.nifl.fysik.dtu.dk:2392988] PML cm cannot be selected

)
sanity check command OMPI_MCA_rmaps_base_oversubscribe=1 mpirun -n 8 
/dev/shm/OpenMPI/4.1.5/GCC-12.3.0/mpi_test_hello_mpifh exited with code 1 
(output: [e000.nifl.fysik.dtu.dk:2393008] PML cm cannot be selected

)
sanity check command mpirun -n 1 
/dev/shm/OpenMPI/4.1.5/GCC-12.3.0/mpi_test_hello_mpifh exited with code 1 
(output: [e000.nifl.fysik.dtu.dk:2393029] PML cm cannot be selected

)
sanity check command OMPI_MCA_rmaps_base_oversubscribe=1 mpirun -n 8 
/dev/shm/OpenMPI/4.1.5/GCC-12.3.0/mpi_test_hello_usempi exited with code 1 
(output: [e000.nifl.fysik.dtu.dk:2393042] PML cm cannot be selected

)
sanity check command mpirun -n 1 
/dev/shm/OpenMPI/4.1.5/GCC-12.3.0/mpi_test_hello_usempi exited with code 1 
(output: [e000.nifl.fysik.dtu.dk:2393070] PML cm cannot be selected

) (at easybuild/framework/easyblock.py:3655 in _sanity_check_step)
== 2023-10-03 09:36:16,437 build_log.py:267 INFO ... (took 5 secs)
== 2023-10-03 09:36:16,437 filetools.py:2012 INFO Removing lock 
/home/modules/software/.locks/_home_modules_software_OpenMPI_4.1.5-GCC-12.3.0.lock...
== 2023-10-03 09:36:16,438 filetools.py:383 INFO Path 
/home/modules/software/.locks/_home_modules_software_OpenMPI_4.1.5-GCC-12.3.0.lock 
successfully removed.
== 2023-10-03 09:36:16,438 filetools.py:2016 INFO Lock removed: 
/home/modules/software/.locks/_home_modules_software_OpenMPI_4.1.5-GCC-12.3.0.lock
== 2023-10-03 09:36:16,438 easyblock.py:4277 WARNING build failed (first 
300 chars): Sanity check failed: sanity check command 
OMPI_MCA_rmaps_base_oversubscribe=1 mpirun -n 8 
/dev/shm/OpenMPI/4.1.5/GCC-12.3.0/mpi_test_hello_c exited with code 1 
(output: node[e000.nifl.fysik.dtu.dk:2392967] PML cm cannot be selected

[e000.nifl.fysik.dtu.dk:2392963] PML cm cannot be selected
)
sanity chec
== 2023-10-03 09:36:16,438 easyblock.py:328 INFO Closing log for 
application name OpenMPI version 4.1.5



Since we have now used the latest GCC 12.3.0 and have installed an OPA 
fabric adapter, the problem would seem to be related to the AMD "Genoa" 
hardware.


Does anyone have suggestions for building OpenMPI successfully on this 
platform?


Thanks,
Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


Re: [easybuild] Module for netcdf4-python with the foss-2023a toolchain?

2023-10-02 Thread Ole Holm Nielsen

Hi Sebastian,

Thanks a lot!  I've built the module from the PR now :-)

Best regards,
Ole

On 10/2/23 14:15, Sebastian Achilles wrote:

Hey Ole,

there is the following PR for netcdf4-python in the foss-2023a toolchain: 
https://github.com/easybuilders/easybuild-easyconfigs/pull/18731


I am looking at the PR now. If the test reports are all successful, I will 
merge it and it will be part of the next EasyBuild release, 4.8.2. If you 
want to install it before the next release, you can do this with `--from-pr`, e.g.:


eb --from-pr 18731

Best,
Sebastian

On 02.10.23 13:51, Ole Holm Nielsen wrote:

We have a user request for h5py/3.9.0-foss-2023a which uses Python 3.11.3.

The user would also like to use netcdf4-python together with h5py, but 
the latest .eb file is netcdf4-python-1.6.3-foss-2022b.eb.


Question: Does anyone have a .eb file for netcdf4-python with the 
foss-2023a toolchain?


We're using EB 4.8.1.

Thanks,
Ole


--
HPC, Cloud, Data Systems and Services
Juelich Supercomputing Centre

phone: +49 2461 61-85996
email: s.achil...@fz-juelich.de




--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,
Fysikvej Building 309, DK-2800 Kongens Lyngby, Denmark
E-mail: ole.h.niel...@fysik.dtu.dk
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Mobile: (+45) 5180 1620


[easybuild] Module for netcdf4-python with the foss-2023a toolchain?

2023-10-02 Thread Ole Holm Nielsen

We have a user request for h5py/3.9.0-foss-2023a which uses Python 3.11.3.

The user would also like to use netcdf4-python together with h5py, but the 
latest .eb file is netcdf4-python-1.6.3-foss-2022b.eb.


Question: Does anyone have a .eb file for netcdf4-python with the 
foss-2023a toolchain?


We're using EB 4.8.1.

Thanks,
Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


Re: [easybuild] Re: OpenBLAS-0.3.21-GCC-12.2.0.eb testing failed on AMD "Genoa" node

2023-09-28 Thread Ole Holm Nielsen

Dear Kenneth,

On 9/28/23 10:49, Kenneth Hoste wrote:
Not seeing the problem with OpenBLAS 0.3.23 is encouraging, that probably 
means a fix is hiding in either OpenBLAS 0.3.22 or 0.3.23 that we may be 
able to backport to 0.3.21.


At first glance I don't see anything obvious in the release notes though 
(see https://github.com/OpenMathLib/OpenBLAS/releases).


Can you try and see if there's a problem with OpenBLAS 0.3.22, by using:

eb --try-software-version 0.3.22 OpenBLAS-0.3.23-GCC-12.3.0.eb

That would help narrow things down (a bit).


That attempt failed:

$ eb --try-software-version 0.3.22 OpenBLAS-0.3.23-GCC-12.3.0.eb
== Temporary log file in case of crash /tmp/eb-ljmkzhs7/easybuild-l1cmgr72.log
== found valid index for 
/home/modules/software/EasyBuild/4.8.1/easybuild/easyconfigs, so using it...
== found valid index for 
/home/modules/software/EasyBuild/4.8.1/easybuild/easyconfigs, so using it...
== processing EasyBuild easyconfig 
/tmp/eb-ljmkzhs7/tweaked_easyconfigs/OpenBLAS-0.3.22-GCC-12.3.0.eb

== building and installing OpenBLAS/0.3.22-GCC-12.3.0...
== fetching files...
== ... (took 6 secs)
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== ... (took 49 secs)
== FAILED: Installation ended unsuccessfully (build directory: 
/dev/shm/OpenBLAS/0.3.22/GCC-12.3.0): build failed (first 300 chars): cmd 
" make -j 32 libs netlib shared  BINARY='64'  CC='gcc'  FC='gfortran' 
MAKE_NB_JOBS='-1'  USE_OPENMP='1'  USE_THREAD='1'  CFLAGS='-O2 
-ftree-vectorize -march=native -fno-math-errno' " exited with exit code 2 
and output:
/home/modules/software/binutils/2.40-GCCcore-12.3.0/bin/ld: warning: 
/tmp/eb (took 56 secs)
== Results of the build can be found in the log file(s) 
/tmp/eb-ljmkzhs7/easybuild-OpenBLAS-0.3.22-20230928.104942.hoMjh.log
ERROR: Build of 
/tmp/eb-ljmkzhs7/tweaked_easyconfigs/OpenBLAS-0.3.22-GCC-12.3.0.eb failed 
(err: 'build failed (first 300 chars): cmd " make -j 32 libs netlib shared 
 BINARY=\'64\'  CC=\'gcc\'  FC=\'gfortran\'  MAKE_NB_JOBS=\'-1\' 
USE_OPENMP=\'1\'  USE_THREAD=\'1\'  CFLAGS=\'-O2 -ftree-vectorize 
-march=native -fno-math-errno\' " exited with exit code 2 and 
output:\n/home/modules/software/binutils/2.40-GCCcore-12.3.0/bin/ld: 
warning: /tmp/eb')


The logfile ends with:

== 2023-09-28 10:50:39,240 filetools.py:2012 INFO Removing lock 
/home/modules/software/.locks/_home_modules_software_OpenBLAS_0.3.22-GCC-12.3.0.lock...
== 2023-09-28 10:50:39,241 filetools.py:383 INFO Path 
/home/modules/software/.locks/_home_modules_software_OpenBLAS_0.3.22-GCC-12.3.0.lock 
successfully removed.
== 2023-09-28 10:50:39,241 filetools.py:2016 INFO Lock removed: 
/home/modules/software/.locks/_home_modules_software_OpenBLAS_0.3.22-GCC-12.3.0.lock
== 2023-09-28 10:50:39,241 easyblock.py:4277 WARNING build failed (first 
300 chars): cmd " make -j 32 libs netlib shared  BINARY='64'  CC='gcc' 
FC='gfortran'  MAKE_NB_JOBS='-1'  USE_OPENMP='1'  USE_THREAD='1' 
CFLAGS='-O2 -ftree-vectorize -march=native -fno-math-errno' " exited with 
exit code 2 and output:

/home/modules/software/binutils/2.40-GCCcore-12.3.0/bin/ld: warning: /tmp/eb
== 2023-09-28 10:50:39,241 easyblock.py:328 INFO Closing log for 
application name OpenBLAS version 0.3.22




Best regards,
Ole


Re: [easybuild] Re: OpenBLAS-0.3.21-GCC-12.2.0.eb testing failed on AMD "Genoa" node

2023-09-28 Thread Ole Holm Nielsen

Dear Kenneth,

On 9/28/23 09:42, Kenneth Hoste wrote:

I suspect the problem is more with OpenBLAS than GCC.

OpenBLAS 0.3.20 probably doesn't detect AMD Genoa (Zen4) correctly yet, 
and doesn't try to use AVX-512 instructions there.


OpenBLAS 0.3.21 detects Genoa and enables AVX-512, but there's a bug in a 
kernel being used.
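
For reference, whether a node advertises AVX-512 at all can be read from the CPU flags. This is a minimal sketch, assuming a Linux x86 system where /proc/cpuinfo has a "flags" line; the helper names cpu_flags and avx512_flags are illustrative. It only reports what the CPU advertises, not which kernels OpenBLAS actually selects.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text.

    Only the first 'flags' line is used; all cores normally report the same set.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(':')
        if sep and key.strip() == 'flags':
            return set(value.split())
    return set()


def avx512_flags(cpuinfo_text):
    """Return the sorted flags whose names start with 'avx512'."""
    return sorted(f for f in cpu_flags(cpuinfo_text) if f.startswith('avx512'))


if __name__ == '__main__':
    try:
        with open('/proc/cpuinfo') as fh:
            found = avx512_flags(fh.read())
        print(found if found else 'no AVX-512 flags reported')
    except OSError:
        print('/proc/cpuinfo is not available on this system')
```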


I would try and see whether you observe any problems with more recent 
OpenBLAS versions, like OpenBLAS-0.3.23-GCC-12.3.0.eb .


That version builds correctly:

$ eb OpenBLAS-0.3.23-GCC-12.3.0.eb -r
(lines deleted)
== processing EasyBuild easyconfig 
/home/modules/software/EasyBuild/4.8.1/easybuild/easyconfigs/o/OpenBLAS/OpenBLAS-0.3.23-GCC-12.3.0.eb

== building and installing OpenBLAS/0.3.23-GCC-12.3.0...
== fetching files...
== ... (took 6 secs)
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== ... (took 57 secs)
== testing...
== ... (took 2 mins 34 secs)
== installing...
== ... (took 2 secs)
== taking care of extensions...
== restore after iterating...
== postprocessing...
== sanity checking...
== cleaning up...
== creating module...
== permissions...
== packaging...
== COMPLETED: Installation ended successfully (took 3 mins 42 secs)
== Results of the build can be found in the log file(s) 
/home/modules/software/OpenBLAS/0.3.23-GCC-12.3.0/easybuild/easybuild-OpenBLAS-0.3.23-20230928.103500.log

== Build succeeded for 22 out of 22

If not, we may be able to trace down the fix and patch OpenBLAS 0.3.21 to 
fix the problem you're seeing...


So is there any hope that foss-2022b.eb with OpenBLAS/0.3.21-GCC-12.2.0 
can be made to work correctly on AMD Genoa nodes?


Thanks,
Ole


On 28/09/2023 09:26, Ole Holm Nielsen wrote:
It's interesting that while attempting to build the foss-2022a toolchain 
instead of foss-2022b, the build of OpenBLAS with GCC 11.3.0 succeeds 
without errors:


== processing EasyBuild easyconfig 
/home/modules/software/EasyBuild/4.8.1/easybuild/easyconfigs/o/OpenBLAS/OpenBLAS-0.3.20-GCC-11.3.0.eb

== building and installing OpenBLAS/0.3.20-GCC-11.3.0...
== fetching files...
== ... (took 4 secs)
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== ... (took 56 secs)
== testing...
== ... (took 2 mins 24 secs)
== installing...
== ... (took 1 secs)
== taking care of extensions...
== restore after iterating...
== postprocessing...
== sanity checking...
== cleaning up...
== creating module...
== permissions...
== packaging...
== COMPLETED: Installation ended successfully (took 3 mins 28 secs)

The only difference here appears to be GCC version 12.2.0 versus 11.3.0!

Any ideas about what's causing this error in the tests?

Perhaps GCC version 12.2.0 tries to use the new AVX-512 instructions in 
AMD Genoa and has a bug?


Thanks,
Ole


On 9/26/23 08:04, Ole Holm Nielsen wrote:
I'm starting EasyBuild up on our new AMD "Genoa" platform with 1 AMD 
EPYC 9124 16-Core Processor with 2 threads/core, 384 GB RAM, and 
AlmaLinux 8.8 OS.


Unfortunately, building the foss-2022b toolchain exits during the 
testing phase of OpenBLAS-0.3.21-GCC-12.2.0.eb as shown below.  Does 
anyone have ideas about what might be wrong?


$ eb foss-2022b.eb -r
(lines deleted)
== processing EasyBuild easyconfig 
/home/modules/software/EasyBuild/4.8.1/easybuild/easyconfigs/o/OpenBLAS/OpenBLAS-0.3.21-GCC-12.2.0.eb

== building and installing OpenBLAS/0.3.21-GCC-12.2.0...
== fetching files...
== ... (took 7 secs)
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== ... (took 53 secs)
== testing...
== ... (took 12 secs)
== FAILED: Installation ended unsuccessfully (build directory: 
/dev/shm/OpenBLAS/0.3.21/GCC-12.2.0): build failed (first 300 chars): 
cmd " make tests  BINARY='64'  CC='gcc'  FC='gfortran' 
MAKE_NB_JOBS='-1' USE_OPENMP='1'  USE_THREAD='1' " exited with exit 
code 2 and output:
/home/modules/software/binutils/2.39-GCCcore-12.2.0/bin/ld: warning: 
/tmp/eb-74m3kzgo/ccy1Gkzg.o: missing .note.GNU-stack section implies 
executable stack

/ (took 1 min 14 secs)
== Results of the build can be found in the log file(s) 
/tmp/eb-74m3kzgo/easybuild-OpenBLAS-0.3.21-20230925.161149.UfDUO.log
ERROR: Build of 
/home/modules/software/EasyBuild/4.8.1/easybuild/easyconfigs/o/OpenBLAS/OpenBLAS-0.3.21-GCC-12.2.0.eb failed (err: 'build failed (first 300 chars): cmd " make tests BINARY=\'64\'  CC=\'gcc\'  FC=\'gfortran\'  MAKE_NB_JOBS=\'-1\' USE_OPENMP=\'1\'  USE_THREAD=\'1\' " exited with exit code 2 and output:\n/home/modules/software/binutils/2.39-GCCcore-12.2.0/bin/ld: warning: /tmp/eb-74m3kzgo/ccy1Gkzg.o: missing .note.GNU-stack section implies executable stack\n/')



The log file shows an error in test_kernel_regress.c:50:

(lines deleted)
./openblas_utest
TEST 1/37 max:smax_zero [OK]
TEST 2/37 max:dmax_positive [OK]
TEST 3/37 ma

Re: [easybuild] OpenMPI-4.1.4-GCC-12.2.0.eb Sanity check failed on AMD "Genoa" node

2023-09-28 Thread Ole Holm Nielsen

Dear Kenneth,

On 9/28/23 10:07, Kenneth Hoste wrote:
Unfortunately, building the foss-2022b toolchain exits during the 
testing phase of OpenMPI-4.1.4-GCC-12.2.0.eb as shown below.  Does 
anyone have ideas about what might be wrong?

...
By default OpenMPI is being configured with "--with-verbs", you should see 
that popping up in the log file (or use "eb --trace" to get some more info 
during the installation).


Thanks, I sort of suspected that IB was somehow being assumed tacitly by 
EB :-)


If you don't have Infiniband, you should add --without-verbs via 
configopts in your OpenMPI easyconfig file (which should prevent the 
OpenMPI easyblock from adding --with-verbs), or use a hook (see for 
example 
https://docs.easybuild.io/hooks/#replace-with-verbs-with-without-verbs-in-openmpi-configure-options; 
that exact example won't work though, so you should instead just hard 
inject --without-verbs into self.cfg['configopts'] in the pre_configure_hook).
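
A minimal sketch of such a hook, assuming a hooks file (here hypothetically named eb_hooks.py) that is passed to EasyBuild with the --hooks option. It appends --without-verbs to configopts for OpenMPI only and is an illustration of the suggestion above, not a tested site configuration.

```python
# Hypothetical hooks file, e.g. eb_hooks.py, used as: eb --hooks eb_hooks.py ...

def pre_configure_hook(self, *args, **kwargs):
    """Inject --without-verbs into configopts when configuring OpenMPI."""
    if self.name == 'OpenMPI':
        configopts = self.cfg['configopts']
        # Append the option only once, in case it is already present.
        if '--without-verbs' not in configopts:
            self.cfg['configopts'] = (configopts + ' --without-verbs').strip()
```

Other software is left untouched because the hook checks self.name first.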


We eventually will use our AMD Genoa EB modules on some nodes to be 
installed next month which will include Mellanox/Nvidia Infiniband.


Question: Would it help if I take an old (like 10 years old) Mellanox IB 
PCIe adapter lying around and mount it in my server?  Or maybe a 
relatively new Omni-Path adapter?


Would that make the OpenMPI EB module happy, and would the module work 
with our future nodes?


Thanks,
Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,


[easybuild] Re: OpenBLAS-0.3.21-GCC-12.2.0.eb testing failed on AMD "Genoa" node

2023-09-28 Thread Ole Holm Nielsen
It's interesting that while attempting to build the foss-2022a toolchain 
instead of foss-2022b, the build of OpenBLAS with GCC 11.3.0 succeeds 
without errors:


== processing EasyBuild easyconfig 
/home/modules/software/EasyBuild/4.8.1/easybuild/easyconfigs/o/OpenBLAS/OpenBLAS-0.3.20-GCC-11.3.0.eb

== building and installing OpenBLAS/0.3.20-GCC-11.3.0...
== fetching files...
== ... (took 4 secs)
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== ... (took 56 secs)
== testing...
== ... (took 2 mins 24 secs)
== installing...
== ... (took 1 secs)
== taking care of extensions...
== restore after iterating...
== postprocessing...
== sanity checking...
== cleaning up...
== creating module...
== permissions...
== packaging...
== COMPLETED: Installation ended successfully (took 3 mins 28 secs)

The only difference here appears to be GCC version 12.2.0 versus 11.3.0!

Any ideas about what's causing this error in the tests?

Perhaps GCC version 12.2.0 tries to use the new AVX-512 instructions in 
AMD Genoa and has a bug?


Thanks,
Ole


On 9/26/23 08:04, Ole Holm Nielsen wrote:
I'm starting EasyBuild up on our new AMD "Genoa" platform with 1 AMD EPYC 
9124 16-Core Processor with 2 threads/core, 384 GB RAM, and AlmaLinux 8.8 OS.


Unfortunately, building the foss-2022b toolchain exits during the testing 
phase of OpenBLAS-0.3.21-GCC-12.2.0.eb as shown below.  Does anyone have 
ideas about what might be wrong?


$ eb foss-2022b.eb -r
(lines deleted)
== processing EasyBuild easyconfig 
/home/modules/software/EasyBuild/4.8.1/easybuild/easyconfigs/o/OpenBLAS/OpenBLAS-0.3.21-GCC-12.2.0.eb

== building and installing OpenBLAS/0.3.21-GCC-12.2.0...
== fetching files...
== ... (took 7 secs)
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== ... (took 53 secs)
== testing...
== ... (took 12 secs)
== FAILED: Installation ended unsuccessfully (build directory: 
/dev/shm/OpenBLAS/0.3.21/GCC-12.2.0): build failed (first 300 chars): cmd 
" make tests  BINARY='64'  CC='gcc'  FC='gfortran'  MAKE_NB_JOBS='-1' 
USE_OPENMP='1'  USE_THREAD='1' " exited with exit code 2 and output:
/home/modules/software/binutils/2.39-GCCcore-12.2.0/bin/ld: warning: 
/tmp/eb-74m3kzgo/ccy1Gkzg.o: missing .note.GNU-stack section implies 
executable stack

/ (took 1 min 14 secs)
== Results of the build can be found in the log file(s) 
/tmp/eb-74m3kzgo/easybuild-OpenBLAS-0.3.21-20230925.161149.UfDUO.log
ERROR: Build of 
/home/modules/software/EasyBuild/4.8.1/easybuild/easyconfigs/o/OpenBLAS/OpenBLAS-0.3.21-GCC-12.2.0.eb failed (err: 'build failed (first 300 chars): cmd " make tests BINARY=\'64\'  CC=\'gcc\'  FC=\'gfortran\'  MAKE_NB_JOBS=\'-1\' USE_OPENMP=\'1\'  USE_THREAD=\'1\' " exited with exit code 2 and output:\n/home/modules/software/binutils/2.39-GCCcore-12.2.0/bin/ld: warning: /tmp/eb-74m3kzgo/ccy1Gkzg.o: missing .note.GNU-stack section implies executable stack\n/')



The log file shows an error in test_kernel_regress.c:50:

(lines deleted)
./openblas_utest
TEST 1/37 max:smax_zero [OK]
TEST 2/37 max:dmax_positive [OK]
TEST 3/37 max:smax_negative [OK]
TEST 4/37 min:smin_zero [OK]
TEST 5/37 min:dmin_positive [OK]
TEST 6/37 min:smin_negative [OK]
TEST 7/37 amax:damax [OK]
TEST 8/37 amax:samax [OK]
TEST 9/37 ismax:negative_step_2 [OK]
TEST 10/37 ismax:positive_step_2 [OK]
TEST 11/37 ismin:negative_step_2 [OK]
TEST 12/37 ismin:positive_step_2 [OK]
TEST 13/37 drotmg:drotmg_D1_big_D2_big_flag_zero [OK]
TEST 14/37 drotmg:rotmg_D1eqD2_X1eqX2 [OK]
TEST 15/37 drotmg:rotmg_issue1452 [OK]
TEST 16/37 drotmg:rotmg [OK]
TEST 17/37 axpy:caxpy_inc_0 [OK]
TEST 18/37 axpy:saxpy_inc_0 [OK]
TEST 19/37 axpy:zaxpy_inc_0 [OK]
TEST 20/37 axpy:daxpy_inc_0 [OK]
TEST 21/37 zdotu:zdotu_offset_1 [OK]
TEST 22/37 zdotu:zdotu_n_1 [OK]
TEST 23/37 dsdot:dsdot_n_1 [OK]
TEST 24/37 swap:cswap_inc_0 [OK]
TEST 25/37 swap:sswap_inc_0 [OK]
TEST 26/37 swap:zswap_inc_0 [OK]
TEST 27/37 swap:dswap_inc_0 [OK]
TEST 28/37 rot:csrot_inc_0 [OK]
TEST 29/37 rot:srot_inc_0 [OK]
TEST 30/37 rot:zdrot_inc_0 [OK]
TEST 31/37 rot:drot_inc_0 [OK]
TEST 32/37 dnrm2:dnrm2_tiny [OK]
TEST 33/37 dnrm2:dnrm2_inf [OK]
TEST 34/37 potrf:smoketest_trivial [OK]
TEST 35/37 potrf:bug_695 [OK]
TEST 36/37 kernel_regress:skx_avx [FAIL]
   ERR: test_kernel_regress.c:50  expected 0.000e+00, got 6.734e+01 (diff 
-6.734e+01, tol 1.000e-10)

TEST 37/37 fork:safety_after_fork_in_parent [OK]
RESULTS: 37 tests (36 ok, 1 failed, 0 skipped) ran in 3 ms
make[1]: *** [Makefile:52: run_test] Error 1
make[1]: Leaving directory 
'/dev/shm/OpenBLAS/0.3.21/GCC-12.2.0/OpenBLAS-0.3.21/utest'

make: *** [Makefile:150: tests] Error 2
  (at easybuild/tools/run.py:681 in parse_cmd_output)
== 2023-09-25 16:13:04,292 build_log.py:267 INFO ... (took 12 secs)
== 2023-09-25 16:13:04,292 filetools.py:2012 INFO Removing lock 

[easybuild] OpenMPI-4.1.4-GCC-12.2.0.eb Sanity check failed on AMD "Genoa" node

2023-09-26 Thread Ole Holm Nielsen
328 INFO Closing log for 
application name OpenMPI version 4.1.4



Note: This node is NOT equipped with Infiniband or Omni-Path, just plain 
Ethernet.


--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


[easybuild] OpenBLAS-0.3.21-GCC-12.2.0.eb testing failed on AMD "Genoa" node

2023-09-26 Thread Ole Holm Nielsen
I'm starting EasyBuild up on our new AMD "Genoa" platform with 1 AMD EPYC 
9124 16-Core Processor with 2 threads/core, 384 GB RAM, and AlmaLinux 8.8 OS.


Unfortunately, building the foss-2022b toolchain exits during the testing 
phase of OpenBLAS-0.3.21-GCC-12.2.0.eb as shown below.  Does anyone have 
ideas about what might be wrong?


$ eb foss-2022b.eb -r
(lines deleted)
== processing EasyBuild easyconfig 
/home/modules/software/EasyBuild/4.8.1/easybuild/easyconfigs/o/OpenBLAS/OpenBLAS-0.3.21-GCC-12.2.0.eb

== building and installing OpenBLAS/0.3.21-GCC-12.2.0...
== fetching files...
== ... (took 7 secs)
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== ... (took 53 secs)
== testing...
== ... (took 12 secs)
== FAILED: Installation ended unsuccessfully (build directory: 
/dev/shm/OpenBLAS/0.3.21/GCC-12.2.0): build failed (first 300 chars): cmd 
" make tests  BINARY='64'  CC='gcc'  FC='gfortran'  MAKE_NB_JOBS='-1' 
USE_OPENMP='1'  USE_THREAD='1' " exited with exit code 2 and output:
/home/modules/software/binutils/2.39-GCCcore-12.2.0/bin/ld: warning: 
/tmp/eb-74m3kzgo/ccy1Gkzg.o: missing .note.GNU-stack section implies 
executable stack

/ (took 1 min 14 secs)
== Results of the build can be found in the log file(s) 
/tmp/eb-74m3kzgo/easybuild-OpenBLAS-0.3.21-20230925.161149.UfDUO.log
ERROR: Build of 
/home/modules/software/EasyBuild/4.8.1/easybuild/easyconfigs/o/OpenBLAS/OpenBLAS-0.3.21-GCC-12.2.0.eb 
failed (err: 'build failed (first 300 chars): cmd " make tests 
BINARY=\'64\'  CC=\'gcc\'  FC=\'gfortran\'  MAKE_NB_JOBS=\'-1\' 
USE_OPENMP=\'1\'  USE_THREAD=\'1\' " exited with exit code 2 and 
output:\n/home/modules/software/binutils/2.39-GCCcore-12.2.0/bin/ld: 
warning: /tmp/eb-74m3kzgo/ccy1Gkzg.o: missing .note.GNU-stack section 
implies executable stack\n/')



The log file shows an error in test_kernel_regress.c:50:

(lines deleted)
./openblas_utest
TEST 1/37 max:smax_zero [OK]
TEST 2/37 max:dmax_positive [OK]
TEST 3/37 max:smax_negative [OK]
TEST 4/37 min:smin_zero [OK]
TEST 5/37 min:dmin_positive [OK]
TEST 6/37 min:smin_negative [OK]
TEST 7/37 amax:damax [OK]
TEST 8/37 amax:samax [OK]
TEST 9/37 ismax:negative_step_2 [OK]
TEST 10/37 ismax:positive_step_2 [OK]
TEST 11/37 ismin:negative_step_2 [OK]
TEST 12/37 ismin:positive_step_2 [OK]
TEST 13/37 drotmg:drotmg_D1_big_D2_big_flag_zero [OK]
TEST 14/37 drotmg:rotmg_D1eqD2_X1eqX2 [OK]
TEST 15/37 drotmg:rotmg_issue1452 [OK]
TEST 16/37 drotmg:rotmg [OK]
TEST 17/37 axpy:caxpy_inc_0 [OK]
TEST 18/37 axpy:saxpy_inc_0 [OK]
TEST 19/37 axpy:zaxpy_inc_0 [OK]
TEST 20/37 axpy:daxpy_inc_0 [OK]
TEST 21/37 zdotu:zdotu_offset_1 [OK]
TEST 22/37 zdotu:zdotu_n_1 [OK]
TEST 23/37 dsdot:dsdot_n_1 [OK]
TEST 24/37 swap:cswap_inc_0 [OK]
TEST 25/37 swap:sswap_inc_0 [OK]
TEST 26/37 swap:zswap_inc_0 [OK]
TEST 27/37 swap:dswap_inc_0 [OK]
TEST 28/37 rot:csrot_inc_0 [OK]
TEST 29/37 rot:srot_inc_0 [OK]
TEST 30/37 rot:zdrot_inc_0 [OK]
TEST 31/37 rot:drot_inc_0 [OK]
TEST 32/37 dnrm2:dnrm2_tiny [OK]
TEST 33/37 dnrm2:dnrm2_inf [OK]
TEST 34/37 potrf:smoketest_trivial [OK]
TEST 35/37 potrf:bug_695 [OK]
TEST 36/37 kernel_regress:skx_avx [FAIL]
  ERR: test_kernel_regress.c:50  expected 0.000e+00, got 6.734e+01 (diff 
-6.734e+01, tol 1.000e-10)

TEST 37/37 fork:safety_after_fork_in_parent [OK]
RESULTS: 37 tests (36 ok, 1 failed, 0 skipped) ran in 3 ms
make[1]: *** [Makefile:52: run_test] Error 1
make[1]: Leaving directory 
'/dev/shm/OpenBLAS/0.3.21/GCC-12.2.0/OpenBLAS-0.3.21/utest'

make: *** [Makefile:150: tests] Error 2
 (at easybuild/tools/run.py:681 in parse_cmd_output)
== 2023-09-25 16:13:04,292 build_log.py:267 INFO ... (took 12 secs)
== 2023-09-25 16:13:04,292 filetools.py:2012 INFO Removing lock 
/home/modules/software/.locks/_home_modules_software_OpenBLAS_0.3.21-GCC-12.2.0.lock...
== 2023-09-25 16:13:04,293 filetools.py:383 INFO Path 
/home/modules/software/.locks/_home_modules_software_OpenBLAS_0.3.21-GCC-12.2.0.lock 
successfully removed.
== 2023-09-25 16:13:04,293 filetools.py:2016 INFO Lock removed: 
/home/modules/software/.locks/_home_modules_software_OpenBLAS_0.3.21-GCC-12.2.0.lock
== 2023-09-25 16:13:04,293 easyblock.py:4277 WARNING build failed (first 
300 chars): cmd " make tests  BINARY='64'  CC='gcc'  FC='gfortran' 
MAKE_NB_JOBS='-1'  USE_OPENMP='1'  USE_THREAD='1' " exited with exit code 
2 and output:
/home/modules/software/binutils/2.39-GCCcore-12.2.0/bin/ld: warning: 
/tmp/eb-74m3kzgo/ccy1Gkzg.o: missing .note.GNU-stack section implies 
executable stack

/
== 2023-09-25 16:13:04,293 easyblock.py:328 INFO Closing log for 
application name OpenBLAS version 0.3.21
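
A long log like the one above can be reduced to just the failing tests with a few lines of Python. This is only a sketch: the function name failed_tests is made up for illustration, and the regular expression assumes the "TEST n/m name [OK|FAIL]" layout seen in this particular openblas_utest output.

```python
import re

# Matches lines such as "TEST 36/37 kernel_regress:skx_avx [FAIL]".
TEST_LINE = re.compile(r"^TEST\s+(\d+)/(\d+)\s+(\S+)\s+\[(OK|FAIL)\]\s*$")


def failed_tests(log_text):
    """Return the names of all tests marked [FAIL] in openblas_utest output."""
    failures = []
    for line in log_text.splitlines():
        match = TEST_LINE.match(line.strip())
        if match and match.group(4) == "FAIL":
            failures.append(match.group(3))
    return failures


# Three lines copied from the log above.
sample = """\
TEST 35/37 potrf:bug_695 [OK]
TEST 36/37 kernel_regress:skx_avx [FAIL]
TEST 37/37 fork:safety_after_fork_in_parent [OK]
"""
print(failed_tests(sample))
```

Lines that do not match the pattern (make output, the ERR detail lines) are simply ignored.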


Thanks,
Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


[easybuild] OpenPMIx PMIx important security issue

2023-09-14 Thread Ole Holm Nielsen

Hi Easybuilders,

The Slurm folks have alerted us to an important security issue in PMIx 
before 4.2.6 and 5.0.1.  See:


https://nvd.nist.gov/vuln/detail/CVE-2023-41915  (CVSS score 8.1 High)
https://github.com/openpmix/openpmix/releases/tag/v4.2.6

The description is:


A security issue was reported by François Diakhate (CEA)
which is addressed in the PMIx v4.2.6 and v5.0.1 releases.
(Older PMIx versions may be vulnerable, but are no longer
supported.)

A filesystem race condition could permit a malicious user
to obtain ownership of an arbitrary file on the filesystem
when parts of the PMIx library are called by a process
running as uid 0. This may happen under the default
configuration of certain workload managers, including Slurm.


It therefore appears that all EB modules of PMIx are vulnerable, if run 
by the root user for some reason!  The most recent EB module is 
PMIx-4.2.4-GCCcore-12.3.0.eb, and all PMIx modules in EB are no longer 
supported!
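
To check a list of installed PMIx versions against the fixed releases, a small comparison along these lines may help. It is a sketch under two assumptions: version strings are purely numeric (no "rc" suffixes), and "before 4.2.6 and 5.0.1" is read as "5.x is compared against 5.0.1, everything older against 4.2.6". The function names are illustrative only.

```python
def parse_version(version):
    """Turn '4.2.4' into (4, 2, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pmix_needs_update(version):
    """True if the version predates the fixed 4.2.6 / 5.0.1 releases.

    Assumption: 5.x releases are compared against 5.0.1 and everything
    older against 4.2.6.
    """
    parsed = parse_version(version)
    if parsed >= (5,):
        return parsed < (5, 0, 1)
    return parsed < (4, 2, 6)


# The most recent EB module mentioned above is PMIx 4.2.4.
print(pmix_needs_update("4.2.4"))
```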


Question 1: If PMIx is used only by normal users, can we be sure that 
the security issue can't be exploited?


Question 2: Is the issue resolved by PR 18755 and 18759?  If so, how do 
we apply this to all of our currently installed PMIx modules?  Can 
anyone give the exact command used to rebuild any given PMIx module 
including the mentioned PRs?


Slurm users: Check if your Slurm has been built with PMIx support by running:
$ srun --mpi=list
If pmix appears in the list, you must rebuild Slurm without PMIx!
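
For checking several clusters, the output of the srun command can be scanned automatically. The following is a rough sketch: the helper name lists_pmix is hypothetical, and the exact output format of "srun --mpi=list" differs between Slurm versions, so the code only looks for any word starting with "pmix".

```python
import re


def lists_pmix(mpi_list_output):
    """True if any word in the `srun --mpi=list` output names a pmix plugin."""
    words = re.findall(r"[A-Za-z0-9_]+", mpi_list_output.lower())
    return any(word.startswith("pmix") for word in words)


# Usage idea (not executed here). Both output streams are merged because
# the list may be written to stderr, depending on the Slurm version:
#   import subprocess
#   result = subprocess.run(["srun", "--mpi=list"], text=True,
#                           stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
#   print(lists_pmix(result.stdout))
```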

Thanks,
Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


Re: [easybuild] EB file for building a recent version of Meep?

2023-05-26 Thread Ole Holm Nielsen

Hi Terje,

Thanks a lot, I have copied out the .eb files I needed.

Best regards,
Ole

On 5/26/23 11:36, Terje Kvernes wrote:

Hi Ole,

Simon and the rest of the Bear Research Software Group have a complete tree of 
easyconfigs under https://github.com/bear-rsg/bear-eb/tree/main/easyconfigs, 
and in there you will find the missing dependencies. You can either clone this 
tree locally and add it to your robot search path, or copy out the 
dependencies you need manually.
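
When copying files out manually, the "Missing dependencies" error can be turned into the list of easyconfig files to look for. This sketch assumes the tree follows the same layout as the Meep link in this thread (easyconfigs/<first letter>/<Name>/<Name>-<version>.eb); whether each file actually exists in the tree is not checked, and both function names are made up for illustration.

```python
import re


def missing_dependencies(error_text):
    """Extract 'name/version' entries from a 'Missing dependencies' error."""
    match = re.search(r"Missing dependencies:\s*(.*?)\s*\(", error_text, re.DOTALL)
    if not match:
        return []
    return [entry.strip() for entry in match.group(1).split(",") if entry.strip()]


def easyconfig_path(dependency, root="easyconfigs"):
    """Map 'Harminv/1.4.1-foss-2021b' to its assumed location in the tree."""
    name, version = dependency.split("/", 1)
    return f"{root}/{name[0].lower()}/{name}/{name}-{version}.eb"


error = (
    "ERROR: Missing dependencies: autograd/1.5-foss-2021b, "
    "libctl/4.5.1-foss-2021b, Harminv/1.4.1-foss-2021b, MPB/1.11.1-foss-2021b, "
    "libGDSII/0.21-foss-2021b (no easyconfig file or existing module found)"
)
for dep in missing_dependencies(error):
    print(easyconfig_path(dep))
```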


On 26 May 2023, at 10:48, Ole Holm Nielsen  wrote:

Hi Simon,

Fantastic, thanks a lot for the quick answer!  However, when I try to build the 
module I encounter some missing prerequisites:

ERROR: Missing dependencies: autograd/1.5-foss-2021b, libctl/4.5.1-foss-2021b, 
Harminv/1.4.1-foss-2021b, MPB/1.11.1-foss-2021b, libGDSII/0.21-foss-2021b (no 
easyconfig file or existing module found)

How can I resolve this issue?

Thanks,
Ole


On 5/26/23 10:39, Simon Branford wrote:

Hi Ole,
Yes, we have an easyconfig for 1.25 - see 
https://github.com/bear-rsg/bear-eb/blob/main/easyconfigs/m/Meep/Meep-1.25.0-foss-2021b.eb
Regards,
Simon
-Original Message-
From: easybuild-requ...@lists.ugent.be  On 
Behalf Of Ole Holm Nielsen
Sent: 26 May 2023 09:29
To: easybuild 
Subject: [easybuild] EB file for building a recent version of Meep?
We have a user request for installing the Meep module.  However, the versions 
of Meep in EasyBuild are very old:
https://docs.easybuild.io/version-specific/supported-software/#meep
The .eb files would build an old version Meep-1.4.3-intel-2020a.eb from an 
obsolete source, or old version Meep-1.6.0-intel-2018a-Python-2.7.14.eb
with Python 2.7 :-(
The current release of Meep is 1.26:
https://github.com/NanoComp/meep/releases and installation instructions are in 
https://meep.readthedocs.io/en/latest/Installation/
Question: Has anyone looked at writing an .eb file for a recent version of Meep 
with modern compilers and Python3?
Thanks,
Ole
--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


Re: [easybuild] EB file for building a recent version of Meep?

2023-05-26 Thread Ole Holm Nielsen

Hi Simon,

Fantastic, thanks a lot for the quick answer!  However, when I try to 
build the module I encounter some missing prerequisites:


ERROR: Missing dependencies: autograd/1.5-foss-2021b, 
libctl/4.5.1-foss-2021b, Harminv/1.4.1-foss-2021b, MPB/1.11.1-foss-2021b, 
libGDSII/0.21-foss-2021b (no easyconfig file or existing module found)


How can I resolve this issue?

Thanks,
Ole


On 5/26/23 10:39, Simon Branford wrote:

Hi Ole,

Yes, we have an easyconfig for 1.25 - see 
https://github.com/bear-rsg/bear-eb/blob/main/easyconfigs/m/Meep/Meep-1.25.0-foss-2021b.eb

Regards,
Simon

-Original Message-
From: easybuild-requ...@lists.ugent.be  On 
Behalf Of Ole Holm Nielsen
Sent: 26 May 2023 09:29
To: easybuild 
Subject: [easybuild] EB file for building a recent version of Meep?



We have a user request for installing the Meep module.  However, the versions 
of Meep in EasyBuild are very old:
https://docs.easybuild.io/version-specific/supported-software/#meep

The .eb files would build an old version Meep-1.4.3-intel-2020a.eb from an 
obsolete source, or old version Meep-1.6.0-intel-2018a-Python-2.7.14.eb
with Python 2.7 :-(

The current release of Meep is 1.26:
https://github.com/NanoComp/meep/releases and installation instructions are in 
https://meep.readthedocs.io/en/latest/Installation/

Question: Has anyone looked at writing an .eb file for a recent version of Meep 
with modern compilers and Python3?

Thanks,
Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark




[easybuild] EB file for building a recent version of Meep?

2023-05-26 Thread Ole Holm Nielsen
We have a user request for installing the Meep module.  However, the 
versions of Meep in EasyBuild are very old:

https://docs.easybuild.io/version-specific/supported-software/#meep

The .eb files would build an old version Meep-1.4.3-intel-2020a.eb from an 
obsolete source, or old version Meep-1.6.0-intel-2018a-Python-2.7.14.eb 
with Python 2.7 :-(


The current release of Meep is 1.26: 
https://github.com/NanoComp/meep/releases and installation instructions 
are in https://meep.readthedocs.io/en/latest/Installation/


Question: Has anyone looked at writing an .eb file for a recent version of 
Meep with modern compilers and Python3?


Thanks,
Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


Re: [easybuild] libxc-5.2.3-GCC-11.3.0.eb: Download incomplete

2023-03-28 Thread Ole Holm Nielsen

Hi Loris,

It would be great if you could make a PR against libxc!  I'm not that 
experienced with Git usage ;-)


Thanks,
Ole

On 3/28/23 13:03, Loris Bennett wrote:

Hi Ole,

Ole Holm Nielsen  writes:


Hi Loris,

I'm being hit by that libxc download problem too!  It's very bad for
the modules that we're trying to build :-(

Should EB use this download URL instead?
https://gitlab.com/libxc/libxc/-/archive/5.2.3/libxc-5.2.3.tar.gz


Yes, I think it probably should, since, as far as I can tell, this is
probably the canonical version.  Have you already got a patched EC you
could use for a pull request, or shall I create one?

Cheers,

Loris


Thanks,
Ole

On 3/28/23 11:41, Loris Bennett wrote:

The EC libxc-5.2.3-GCC-11.3.0.eb downloads the sources from
https://www.tddft.org/programs/libxc
However, the site seems to be having issues such that the download gets
interrupted before it is complete.  There are other places to get the
sources from, i.e.
https://gitlab.com/libxc/libxc
and
https://github.com/ElectronicStructureLibrary/libxc
but the checksums for the version 5.2.3 are different in these two cases
and both different from the checksum in the EC (the Github version is
supposed to be a mirror of the Gitlab version).
What should happen here?  Should I wait and hope that www.tddft.org
stops being so flaky, or should the EC (or at least the next version
thereof) be changed to use Gitlab as the source, since this now seems to
be the place where development occurs?






Re: [easybuild] libxc-5.2.3-GCC-11.3.0.eb: Download incomplete

2023-03-28 Thread Ole Holm Nielsen

Hi Loris,

I'm being hit by that libxc download problem too!  It's very bad for the 
modules that we're trying to build :-(


Should EB use this download URL instead?
https://gitlab.com/libxc/libxc/-/archive/5.2.3/libxc-5.2.3.tar.gz
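
In easyconfig terms, the change might look roughly like the fragment below. This is a sketch, not a tested easyconfig: the resolve() helper only imitates EasyBuild's %(name)s and %(version)s template substitution so that the fragment runs on its own, and the checksum is deliberately left out because it would have to be regenerated for the GitLab tarball.

```python
name = "libxc"
version = "5.2.3"

# Hypothetical replacement for the tddft.org source location.
source_urls = ["https://gitlab.com/libxc/libxc/-/archive/%(version)s/"]
sources = ["%(name)s-%(version)s.tar.gz"]
# checksums = [...]  # must be recomputed for this tarball, not guessed here


def resolve(template):
    """Stand-in for EasyBuild's own template resolution."""
    return template % {"name": name, "version": version}


download_url = resolve(source_urls[0]) + resolve(sources[0])
print(download_url)
# -> https://gitlab.com/libxc/libxc/-/archive/5.2.3/libxc-5.2.3.tar.gz
```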

Thanks,
Ole

On 3/28/23 11:41, Loris Bennett wrote:

The EC libxc-5.2.3-GCC-11.3.0.eb downloads the sources from

   https://www.tddft.org/programs/libxc

However, the site seems to be having issues such that the download gets
interrupted before it is complete.  There are other places to get the
sources from, i.e.

   https://gitlab.com/libxc/libxc

and

   https://github.com/ElectronicStructureLibrary/libxc

but the checksums for the version 5.2.3 are different in these two cases
and both different from the checksum in the EC (the Github version is
supposed to be a mirror of the Gitlab version).

What should happen here?  Should I wait and hope that www.tddft.org
stops being so flaky, or should the EC (or at least the next version
thereof) be changed to use Gitlab as the source, since this now seems to
be the place where development occurs?


Re: [easybuild] Which toolchain to use on AMD EPYC 9004 "Genoa"?

2023-03-14 Thread Ole Holm Nielsen

On 2/15/23 15:30, Jure Pečar wrote:

On Wed, 15 Feb 2023 15:26:26 +0100
Ole Holm Nielsen  wrote:


Is "foss" also the preferred toolchain on AMD Rome and Genoa?


For now, yes.

There's some work going on to create a toolchain around amd compilers but
it's questionable how much you gain as all features these compilers and
libs bring eventually end up in upstream llvm and gcc.

So for Genoa look for upcoming foss toolchain built around gcc 13. For
Rome and Milan existing ones with gcc 11 and 12 are already working fine.


Sorry for my late reply.  The latest GCC in EasyBuild is 12.2.0.  Is there 
any way to build GCC 13 with EasyBuild so that we can test AMD Genoa?


Obviously, a lot of other modules are needed as well, so when can a foss 
toolchain with GCC 13 be hoped for?


Thanks,
Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


[easybuild] Which toolchain to use on AMD EPYC 9004 "Genoa"?

2023-02-15 Thread Ole Holm Nielsen
In an upcoming procurement we plan to use a GPAW EasyBuild module for 
benchmarking.  On 4th Gen Intel Xeon Scalable processors we should 
presumably specify a recent "intel" toolchain because of support for the 
latest instructions.


Question: Which toolchain to use on AMD EPYC 9004 "Genoa"?

There are a number of sites out there with AMD clusters, and I wonder 
which optimal and reliable toolchains people use with AMD Rome and Genoa?


We have often used the "foss" toolchain on Intel Xeon because it provides 
OpenMPI, OpenBLAS, ScaLAPACK, and FFTW modules for the GPAW code.  Is 
"foss" also the preferred toolchain on AMD Rome and Genoa?


Looking at the long toolchain list in 
https://docs.easybuild.io/version-specific/toolchains/ I'm not much wiser 
:-(  None of these toolchains seems to use the AMD AOCC compiler, and I 
wonder why?


Thanks for sharing any advice and experiences!

Best regards,
Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


Re: [easybuild] Force rebuild of module plus all dependencies

2022-12-01 Thread Ole Holm Nielsen

Hi Loris,

On 12/2/22 08:27, Loris Bennett wrote:

How do I force a total rebuild of, say, a foss toolchain for a different
CPU architecture?

Up to now I had a homogeneous cluster with Intel Xeon CPUs, now we have
acquired some nodes with AMD Epyc CPUs for which I need to build
software.

I have modified EASYBUILD_INSTALLPATH to point to directory for the new
architecture and prepended a corresponding directory for the modules to
MODULEPATH.  However, running EasyBuild with the option --force just
rebuilds the package specified, not the dependencies.

What is the correct way to go about this?


Probably there are multiple ways to set up modules for multiple 
architectures :-)  My choice was to create completely different module 
trees for each type of hardware (we have 4 generations of Intel Xeon).  My 
notes are in this Wiki page:

https://wiki.fysik.dtu.dk/Niflheim_system/EasyBuild_modules/#automounting-the-cpu-architecture-dependent-modules-directory
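
One way to picture the "completely different module tree per hardware type" idea is the sketch below. It is not the setup from the Wiki page: the base directory, the architecture labels and the function name are all hypothetical, and it relies on EasyBuild's default of placing modules under <installpath>/modules/all. Replacing MODULEPATH outright (rather than prepending to it) is presumably what keeps modules built for the other CPU type from being treated as already installed, but it also hides any system-wide modules.

```python
import os


def arch_build_env(base_dir, arch, environ=None):
    """Return a copy of the environment pointing EasyBuild at one arch tree."""
    env = dict(os.environ if environ is None else environ)
    prefix = os.path.join(base_dir, arch)
    env["EASYBUILD_INSTALLPATH"] = prefix
    # Only the new tree is visible, so dependencies installed for another
    # CPU type are not found and get rebuilt by "eb --robot".
    env["MODULEPATH"] = os.path.join(prefix, "modules", "all")
    return env


# Hypothetical paths and label, for illustration only.
env = arch_build_env("/home/modules", "epyc", environ={"MODULEPATH": "/old/modules/all"})
print(env["EASYBUILD_INSTALLPATH"], env["MODULEPATH"])
```

The resulting dictionary can be passed as env= to subprocess.run() when launching eb on a node of the matching architecture.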

IHTH /Ole


Re: [easybuild] Build of Perl-5.34.1-GCCcore-11.3.0.eb fails due to updated version of Variable-Magic-0.62.tar.gz

2022-11-23 Thread Ole Holm Nielsen

Hi Ed,

On 11/23/22 13:26, Ed Eyles wrote:

See https://github.com/easybuilders/easybuild-easyconfigs/issues/16621.  That 
issue is for a different Perl version, but it gives a quick-and-dirty download 
source, or you may be able to install from the given PR to fix it properly.


Thanks a lot, I fetched this tarball:
https://cpan.metacpan.org/authors/id/V/VP/VPIT/Variable-Magic-0.62.tar.gz

/Ole


-Original Message-
From: easybuild-requ...@lists.ugent.be  On 
Behalf Of Ole Holm Nielsen
Sent: 23 November 2022 12:17
To: easybuild@lists.ugent.be
Subject: [easybuild] Build of Perl-5.34.1-GCCcore-11.3.0.eb fails due to 
updated version of Variable-Magic-0.62.tar.gz



I'm trying to build Perl-5.34.1-GCCcore-11.3.0.eb, but it fails due to an 
updated version of Variable-Magic where version 0.62 has been replaced by 0.63.  
The 0.62 is nowhere to be found :-(  Only this version is available:
https://www.cpan.org/authors/id/V/VP/VPIT/Variable-Magic-0.63.tar.gz

This snippet in Perl-5.34.1-GCCcore-11.3.0.eb must probably be updated:

  ('Variable::Magic', '0.62', {
      'source_tmpl': 'Variable-Magic-%(version)s.tar.gz',
      'source_urls': ['https://www.cpan.org/authors/id/V/VP/VPIT/'],
      'checksums': ['3f9a18517e33f006a9c2fc4f43f01b54abfe6ff2eae7322424f31069296b615c'],
  }),

Can anyone suggest a workaround such as getting a copy of 0.62?


[easybuild] Build of Perl-5.34.1-GCCcore-11.3.0.eb fails due to updated version of Variable-Magic-0.62.tar.gz

2022-11-23 Thread Ole Holm Nielsen
I'm trying to build Perl-5.34.1-GCCcore-11.3.0.eb, but it fails due to an 
updated version of Variable-Magic where version 0.62 has been replaced by 
0.63.  The 0.62 is nowhere to be found :-(  Only this version is 
available: 
https://www.cpan.org/authors/id/V/VP/VPIT/Variable-Magic-0.63.tar.gz


This snippet in Perl-5.34.1-GCCcore-11.3.0.eb must probably be updated:

('Variable::Magic', '0.62', {
    'source_tmpl': 'Variable-Magic-%(version)s.tar.gz',
    'source_urls': ['https://www.cpan.org/authors/id/V/VP/VPIT/'],
    'checksums': ['3f9a18517e33f006a9c2fc4f43f01b54abfe6ff2eae7322424f31069296b615c'],
}),
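
If a copy of the 0.62 tarball turns up on some other mirror, it can be checked against the SHA256 checksum in the snippet above before it is dropped into the EasyBuild source directory. A minimal sketch using only the Python standard library follows; the function names are illustrative, and the local file name in the usage comment is an assumption.

```python
import hashlib

# Checksum copied from the easyconfig snippet above.
EXPECTED = "3f9a18517e33f006a9c2fc4f43f01b54abfe6ff2eae7322424f31069296b615c"


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_easyconfig(path, expected=EXPECTED):
    """True if the file's SHA256 equals the checksum from the easyconfig."""
    return sha256_of(path) == expected.lower()


# Usage idea (file name is an assumption):
#   print(matches_easyconfig("Variable-Magic-0.62.tar.gz"))
```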

Can anyone suggest a workaround such as getting a copy of 0.62?

Thanks,
Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


Re: [easybuild] MPI jobs fail with intel toolchains after upgrade of EL8 Linux from 8.5 to 8.6

2022-09-15 Thread Ole Holm Nielsen

Hi all,

With the latest update of the RHEL 8.6 kernel 4.18.0-372.26.1.el8_6.x86_64 
the Intel MPI issue has been resolved!  The older Intel MPI versions are 
working again on the new kernel, see 
https://github.com/easybuilders/easybuild-easyconfigs/issues/15651.  Here 
is a copy of my comment:


The EL 8.6 kernel kernel-4.18.0-372.26.1.el8_6.x86_64.rpm is available in 
both AlmaLinux and RockyLinux since last night! RHEL 8.6 was updated a 
couple of days ago.


I've upgraded an EL 8.6 server and the cpulist file now has size>0 as 
expected:


$ uname -r
4.18.0-372.26.1.el8_6.x86_64
$ ls -l /sys/devices/system/node/node0/cpulist
-r--r--r--. 1 root root 28672 Sep 15 07:42 
/sys/devices/system/node/node0/cpulist


and tested all our Intel toolchains on this system:

$ ml intel/2020b
$ mpiexec.hydra --version
Intel(R) MPI Library for Linux* OS, Version 2019 Update 9 Build 20200923 
(id: abd58e492)

Copyright 2003-2020, Intel Corporation.
$ ml purge
$ ml intel/2021b
$ mpiexec.hydra --version
Intel(R) MPI Library for Linux* OS, Version 2021.4 Build 20210831 (id: 
758087adf)

Copyright 2003-2021, Intel Corporation.
$ ml purge
$ ml intel/2022a
$ mpiexec.hydra --version
Intel(R) MPI Library for Linux* OS, Version 2021.6 Build 20220227 (id: 
28877f3f32)

Copyright 2003-2022, Intel Corporation.

As you can see, the Intel MPI is now working correctly again :-)) It was 
OK on EL 8.5, but broken on EL 8.6 until the above listed kernel was released.
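
The cpulist check from the "ls -l" output above can also be scripted, for example to run it on many nodes before re-enabling the Intel toolchains. This is a small sketch that assumes the size reported by stat (the same number "ls -l" prints) is the relevant symptom; the function name is made up.

```python
import os


def has_content(path):
    """True if the file exists and stat reports a size greater than zero."""
    try:
        return os.stat(path).st_size > 0
    except OSError:
        return False


# Path taken from the transcript above.
print(has_content("/sys/devices/system/node/node0/cpulist"))
```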


Best regards,
Ole



On 6/9/22 11:09, Ole Holm Nielsen wrote:

Hi Alan,

Thanks a lot for the feedback!  I've opened a new issue now:
https://github.com/easybuilders/easybuild-easyconfigs/issues/15651

Best regards,
Ole

On 6/9/22 10:52, Alan O'Cais wrote:

Ole,

Can you please copy this over to an issue in 
https://github.com/easybuilders/easybuild-easyconfigs/issues so we can 
keep track of things there? It is also being discussed in Slack but we 
should really have the discussion and progress in a location where 
anyone can find it.


If you don't have a GitHub account, can you give me permission to copy 
over the content of your email to create the issue?


Thanks,

Alan

On Wed, 25 May 2022 at 10:54, Ole Holm Nielsen wrote:


    Hi Easybuilders,

    I'm testing the upgrade of our compute nodes from Almalinux 8.5 to 8.6
    (the RHEL 8 clone similar to Rocky Linux).

    We have found that *all* MPI codes built with any of the Intel toolchains
    intel/2020b or intel/2021b fail after the 8.5 to 8.6 upgrade.  The codes
    fail also on login nodes, so the Slurm queue system is not involved.
    The FOSS toolchains foss/2020b and foss/2021b work perfectly on EL 8.6,
    however.

    My simple test uses the attached trivial MPI Hello World code running on a
    single node:

    $ module load intel/2021b
    $ mpicc mpi_hello_world.c
    $ mpirun ./a.out

    Now the mpirun command enters an infinite loop (running many minutes) and
    we see these processes with "ps":

    /bin/sh /home/modules/software/impi/2021.4.0-intel-compilers-2021.4.0/mpi/2021.4.0/bin/mpirun ./a.out
    mpiexec.hydra ./a.out

    The mpiexec.hydra process doesn't respond to 15/SIGTERM and I have to kill
    it with 9/SIGKILL.  I've tried to enable debugging output with
    export I_MPI_HYDRA_DEBUG=1
    export I_MPI_DEBUG=5
    but nothing gets printed from this.

    Question: Has anyone tried an EL 8.6 Linux with the Intel toolchain and
    mpiexec.hydra?  Can you suggest how I may debug this issue?

    OS information:

    $ cat /etc/redhat-release
    AlmaLinux release 8.6 (Sky Tiger)
    $ uname -r
    4.18.0-372.9.1.el8.x86_64

    Thanks a lot,
    Ole

    --
    Ole Holm Nielsen
    PhD, Senior HPC Officer
    Department of Physics, Technical University of Denmark





--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,
Fysikvej Building 309, DK-2800 Kongens Lyngby, Denmark
E-mail: ole.h.niel...@fysik.dtu.dk
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Mobile: (+45) 5180 1620


Re: [easybuild] MPI jobs fail with intel toolchains after upgrade of EL8 Linux from 8.5 to 8.6

2022-06-09 Thread Ole Holm Nielsen

Hi Alan,

Thanks a lot for the feedback!  I've opened a new issue now:
https://github.com/easybuilders/easybuild-easyconfigs/issues/15651

Best regards,
Ole

On 6/9/22 10:52, Alan O'Cais wrote:

Ole,

Can you please copy this over to an issue in 
https://github.com/easybuilders/easybuild-easyconfigs/issues so we can 
keep track of things there? It is also being discussed in Slack but we 
should really have the discussion and progress in a location where anyone 
can find it.


If you don't have a GitHub account, can you give me permission to copy 
over the content of your email to create the issue?


Thanks,

Alan

On Wed, 25 May 2022 at 10:54, Ole Holm Nielsen wrote:


Hi Easybuilders,

I'm testing the upgrade of our compute nodes from Almalinux 8.5 to 8.6
(the RHEL 8 clone similar to Rocky Linux).

We have found that *all* MPI codes built with any of the Intel toolchains
intel/2020b or intel/2021b fail after the 8.5 to 8.6 upgrade.  The codes
fail also on login nodes, so the Slurm queue system is not involved.
The FOSS toolchains foss/2020b and foss/2021b work perfectly on EL 8.6,
however.

My simple test uses the attached trivial MPI Hello World code running on a
single node:

$ module load intel/2021b
$ mpicc mpi_hello_world.c
$ mpirun ./a.out

Now the mpirun command enters an infinite loop (running many minutes) and
we see these processes with "ps":

/bin/sh /home/modules/software/impi/2021.4.0-intel-compilers-2021.4.0/mpi/2021.4.0/bin/mpirun ./a.out
mpiexec.hydra ./a.out

The mpiexec.hydra process doesn't respond to 15/SIGTERM and I have to kill
it with 9/SIGKILL.  I've tried to enable debugging output with
export I_MPI_HYDRA_DEBUG=1
export I_MPI_DEBUG=5
but nothing gets printed from this.
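
To keep a hanging test like this from blocking a node, the run can be wrapped in a timeout that first sends SIGTERM and then falls back to SIGKILL. The following is a sketch with the Python standard library; the function name and the timeout values are assumptions. Note that it only signals the process it started itself, so a child such as mpiexec.hydra launched by the mpirun wrapper script may still need to be cleaned up separately.

```python
import subprocess


def run_with_timeout(cmd, timeout, grace=5.0):
    """Run cmd; on timeout send SIGTERM, then SIGKILL if that is ignored.

    Returns the exit code (negative signal number if killed by a signal).
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.terminate()  # 15/SIGTERM
        try:
            return proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()  # 9/SIGKILL
            return proc.wait()


# Usage idea (not executed here):
#   status = run_with_timeout(["mpirun", "./a.out"], timeout=60)
```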

Question: Has anyone tried an EL 8.6 Linux with the Intel toolchain and
mpiexec.hydra?  Can you suggest how I may debug this issue?

OS information:

$ cat /etc/redhat-release
AlmaLinux release 8.6 (Sky Tiger)
$ uname -r
4.18.0-372.9.1.el8.x86_64

Thanks a lot,
Ole

-- 
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark



--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,
Fysikvej Building 309, DK-2800 Kongens Lyngby, Denmark
E-mail: ole.h.niel...@fysik.dtu.dk
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Mobile: (+45) 5180 1620


Re: [easybuild] Re: UCX ibv_create_cq and UCP worker errors on nodes with EL8 OS and Omni-Path fabric

2021-12-06 Thread Ole Holm Nielsen

Hi Bart,

Thanks for your recommendations!  We had already tried this:

export OMPI_MCA_osc='^ucx'
export OMPI_MCA_pml='^ucx'

and unfortunately this increased the CPU time of our benchmark code (GPAW) 
by about 30% compared to the same compute node without an Omni-Path 
adapter.  So this doesn't appear to be a viable solution.


We had also tried to rebuild with:

$ eb --filter-deps=UCX OpenMPI-4.0.5-GCC-10.2.0.eb --force

but then the job error log files had some warnings:


--
By default, for Open MPI 4.0 and later, infiniband ports on a device
are not used by default.  The intent is to use UCX for these devices.
You can override this policy by setting the btl_openib_allow_ib MCA parameter
to true.

  Local host:  d063
  Local adapter:   hfi1_0
  Local port:  1

--
--
WARNING: There is at least non-excluded one OpenFabrics device found,
but there are no active ports detected (or Open MPI was unable to use
them).  This is most certainly not what you wanted.  Check your
cables, subnet manager configuration, etc.  The openib BTL will be
ignored for this job.

  Local host: d063
--
[d063.nifl.fysik.dtu.dk:23605] 55 more processes have sent help message 
help-mpi-btl-openib.txt / ib port not selected
[d063.nifl.fysik.dtu.dk:23605] Set MCA parameter "orte_base_help_aggregate" to 
0 to see all help / error messages
[d063.nifl.fysik.dtu.dk:23605] 55 more processes have sent help message 
help-mpi-btl-openib.txt / no active ports found


These warnings did sound rather bad, so we didn't pursue this approach any 
further.


Do you have any other ideas about OMPI_* variables that we could try? 
Since I'm not an MPI expert, complete commands and variables would be 
appreciated :-)


I would like to remind you that we're running AlmaLinux 8.5 with new 
versions of libfabric etc. from the BaseOS.  On CentOS 7.9 we never had 
any problems with Omni-Path adapters.


Thanks,
Ole

On 12/3/21 15:08, Bart Oldeman wrote:

Hi Ole,

we found that UCX isn't very useful nor performant on OmniPath, so if your 
compiled Open MPI isn't used on both InfiniBand and OmniPath you can compile 
OpenMPI using "eb --filter-deps=UCX ...".
Open MPI works well there either using libpsm2 directly (using the "cm" 
pml and "psm2" mtl), or via libfabric (using the same "cm" pml and the 
"ofi" mtl)


We use the same Open MPI binaries on multiple clusters but set this on 
OmniPath:

OMPI_MCA_btl='^openib'
OMPI_MCA_osc='^ucx'
OMPI_MCA_pml='^ucx'
to disable UCX and openib at runtime. If you include UCX in EB's OpenMPI 
it will not compile in "openib" so the first one of those three would not 
be needed.
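
Spelled out as complete settings, the options described above could be collected like this. It is a sketch rather than a recommendation: the three "disable" variables are quoted from this message, while the explicit "cm"/"psm2"/"ofi" selections are an assumption about how the pml and mtl mentioned above would be requested through the usual OMPI_MCA_<parameter> environment variables. The helper name mpi_env is made up.

```python
import os

# Settings quoted above for disabling UCX and openib at run time.
DISABLE_UCX = {
    "OMPI_MCA_btl": "^openib",
    "OMPI_MCA_osc": "^ucx",
    "OMPI_MCA_pml": "^ucx",
}

# Assumption: explicit selection of the "cm" pml with either mtl.
USE_PSM2 = {"OMPI_MCA_pml": "cm", "OMPI_MCA_mtl": "psm2"}
USE_OFI = {"OMPI_MCA_pml": "cm", "OMPI_MCA_mtl": "ofi"}


def mpi_env(*settings, base=None):
    """Merge MCA settings into a copy of the environment; later groups win."""
    env = dict(os.environ if base is None else base)
    for group in settings:
        env.update(group)
    return env


env = mpi_env(DISABLE_UCX, USE_PSM2, base={})
for key in sorted(env):
    print(f"export {key}='{env[key]}'")
```

The same dictionary can be passed as env= to subprocess.run(["mpirun", "-n", "2", "./a.out"], ...) to compare timings between the psm2 and ofi variants.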


Regards,
Bart

On Fri, 3 Dec 2021 at 07:29, Ole Holm Nielsen wrote:


Hi Åke,

On 12/3/21 08:27, Åke Sandgren wrote:
>> On 02-12-2021 14:18, Åke Sandgren wrote:
>>> On 12/2/21 2:06 PM, Ole Holm Nielsen wrote:
>>>> These are updated observations of running OpenMPI codes with an
>>>> Omni-Path network fabric on AlmaLinux 8.5:
>>>>
>>>> Using the foss-2021b toolchain and OpenMPI/4.1.1-GCC-11.2.0 my trivial
>>>> MPI test code works correctly:
>>>>
>>>> $ ml OpenMPI
>>>> $ ml
>>>>
>>>> Currently Loaded Modules:
>>>>     1) GCCcore/11.2.0                      9) hwloc/2.5.0-GCCcore-11.2.0
>>>>     2) zlib/1.2.11-GCCcore-11.2.0         10) OpenSSL/1.1
>>>>     3) binutils/2.37-GCCcore-11.2.0       11) libevent/2.1.12-GCCcore-11.2.0
>>>>     4) GCC/11.2.0                         12) UCX/1.11.2-GCCcore-11.2.0
>>>>     5) numactl/2.0.14-GCCcore-11.2.0      13) libfabric/1.13.2-GCCcore-11.2.0
>>>>     6) XZ/5.2.5-GCCcore-11.2.0            14) PMIx/4.1.0-GCCcore-11.2.0
>>>>     7) libxml2/2.9.10-GCCcore-11.2.0      15) OpenMPI/4.1.1-GCC-11.2.0
>>>>     8) libpciaccess/0.16-GCCcore-11.2.0
>>>>
>>>> $ mpicc mpi_test.c
>>>> $ mpirun -n 2 a.out
>>>>
>>>> (null): There are 2 processes
>>>>
>>>> (null): Rank  1:  d008
>>>>
>>>> (null): Rank  0:  d008
>>>>
>>>>
>>>> I also tried the OpenMPI/4.1.0-GCC-10.2.0 module, but this still gives
>>>

Re: [easybuild] Re: UCX ibv_create_cq and UCP worker errors on nodes with EL8 OS and Omni-Path fabric

2021-12-03 Thread Ole Holm Nielsen

Hi Åke,

On 12/3/21 08:27, Åke Sandgren wrote:

On 02-12-2021 14:18, Åke Sandgren wrote:

On 12/2/21 2:06 PM, Ole Holm Nielsen wrote:

These are updated observations of running OpenMPI codes with an
Omni-Path network fabric on AlmaLinux 8.5::

Using the foss-2021b toolchain and OpenMPI/4.1.1-GCC-11.2.0 my trivial
MPI test code works correctly:

$ ml OpenMPI
$ ml

Currently Loaded Modules:
    1) GCCcore/11.2.0                      9) hwloc/2.5.0-GCCcore-11.2.0
    2) zlib/1.2.11-GCCcore-11.2.0         10) OpenSSL/1.1
    3) binutils/2.37-GCCcore-11.2.0       11) libevent/2.1.12-GCCcore-11.2.0
    4) GCC/11.2.0                         12) UCX/1.11.2-GCCcore-11.2.0
    5) numactl/2.0.14-GCCcore-11.2.0      13) libfabric/1.13.2-GCCcore-11.2.0
    6) XZ/5.2.5-GCCcore-11.2.0            14) PMIx/4.1.0-GCCcore-11.2.0
    7) libxml2/2.9.10-GCCcore-11.2.0      15) OpenMPI/4.1.1-GCC-11.2.0
    8) libpciaccess/0.16-GCCcore-11.2.0

$ mpicc mpi_test.c
$ mpirun -n 2 a.out

(null): There are 2 processes

(null): Rank  1:  d008

(null): Rank  0:  d008


I also tried the OpenMPI/4.1.0-GCC-10.2.0 module, but this still gives
the error messages:

$ ml OpenMPI/4.1.0-GCC-10.2.0
$ ml

Currently Loaded Modules:
    1) GCCcore/10.2.0                      8) libpciaccess/0.16-GCCcore-10.2.0
    2) zlib/1.2.11-GCCcore-10.2.0          9) hwloc/2.2.0-GCCcore-10.2.0
    3) binutils/2.35-GCCcore-10.2.0       10) libevent/2.1.12-GCCcore-10.2.0
    4) GCC/10.2.0                         11) UCX/1.9.0-GCCcore-10.2.0
    5) numactl/2.0.13-GCCcore-10.2.0      12) libfabric/1.11.0-GCCcore-10.2.0
    6) XZ/5.2.5-GCCcore-10.2.0            13) PMIx/3.1.5-GCCcore-10.2.0
    7) libxml2/2.9.10-GCCcore-10.2.0      14) OpenMPI/4.1.0-GCC-10.2.0

$ mpicc mpi_test.c
$ mpirun -n 2 a.out
[1638449983.577933] [d008:910356:0]   ib_iface.c:966  UCX  ERROR ibv_create_cq(cqe=4096) failed: Operation not supported
[1638449983.577827] [d008:910355:0]   ib_iface.c:966  UCX  ERROR ibv_create_cq(cqe=4096) failed: Operation not supported
[d008.nifl.fysik.dtu.dk:910355] pml_ucx.c:273  Error: Failed to create UCP worker
[d008.nifl.fysik.dtu.dk:910356] pml_ucx.c:273  Error: Failed to create UCP worker

(null): There are 2 processes

(null): Rank  0:  d008

(null): Rank  1:  d008

Conclusion: The foss-2021b toolchain with OpenMPI/4.1.1-GCC-11.2.0 seems
to be required on systems with an Omni-Path network fabric on AlmaLinux
8.5.  Perhaps the newer UCX/1.11.2-GCCcore-11.2.0 is really what's
needed, compared to UCX/1.9.0-GCCcore-10.2.0 from foss-2020b.

Does anyone have comments on this?


UCX is the problem here in combination with libfabric I think. Write a
hook that upgrades the version of UCX to 1.11-something if it's <
1.11-ish, or just that specific version if you have older-and-working
versions.


You are right that the nodes with Omni-Path have different libfabric
packages which come from the EL8.5 BaseOS as well as the latest
Cornelis/Intel Omni-Path drivers:

$ rpm -qa | grep libfabric
libfabric-verbs-1.10.0-2.x86_64
libfabric-1.12.1-1.el8.x86_64
libfabric-devel-1.12.1-1.el8.x86_64
libfabric-psm2-1.10.0-2.x86_64

The 1.12 packages are from EL8.5, and 1.10 packages are from Cornelis.
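
To make the version mix above easier to see at a glance, here is a small Python sketch that splits the `rpm -qa` output lines into name, version, release and architecture and groups the package names by version. The function names are made up for this example; the package strings are the four lines listed above.

```python
def parse_nvra(package):
    """Split 'name-version-release.arch' into its four parts."""
    rest, arch = package.rsplit(".", 1)
    name, version, release = rest.rsplit("-", 2)
    return name, version, release, arch


def group_by_version(packages):
    """Map each version string to the list of package names that carry it."""
    groups = {}
    for package in packages:
        name, version, _release, _arch = parse_nvra(package)
        groups.setdefault(version, []).append(name)
    return groups


# The four lines of 'rpm -qa | grep libfabric' output quoted above.
PACKAGES = [
    "libfabric-verbs-1.10.0-2.x86_64",
    "libfabric-1.12.1-1.el8.x86_64",
    "libfabric-devel-1.12.1-1.el8.x86_64",
    "libfabric-psm2-1.10.0-2.x86_64",
]

if __name__ == "__main__":
    for version, names in sorted(group_by_version(PACKAGES).items()):
        print(version, ", ".join(names))
```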

Regarding UCX, I was first using the trusted foss-2020b toolchain which
includes UCX/1.9.0-GCCcore-10.2.0. I guess that we shouldn't mess with
the toolchains?

The foss-2021b toolchain includes the newer UCX 1.11, which seems to
solve this particular problem.

Can we make any best practices recommendations from these observations?


I didn't check properly, but UCX does not depend on libfabric, OpenMPI
does, so I'd write a hook that replaces libfabric < 1.12 with at least
1.12.1.
Sometimes you just have to mess with the toolchains, and this looks like
one of those situations.

Or as a test build your own OpenMPI-4.1.0 or 4.0.5 (that 2020b uses)
with an updated libfabric and check if that fixes the problem. And if it
does, write a hook that replaces libfabric. See the framework/contrib
for examples, I did that for UCX so there is code there to show you how.
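
For readers who have not written a hook before, the following is a minimal sketch of the idea, not the code from framework/contrib. It assumes that dependencies are still plain `(name, version, ...)` tuples when EasyBuild calls `parse_hook`, that the file is passed via `eb --hooks=<file>`, and that a libfabric 1.12.1 easyconfig is available for the toolchain in question. The helper names `version_key` and `bump_dependency` are invented for this example.

```python
def version_key(version):
    """Turn '1.11.0' into (1, 11, 0) for comparison; non-numeric parts are skipped."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def bump_dependency(deps, name, minimum):
    """Return a copy of deps with `name` raised to `minimum` if its version is older."""
    bumped = []
    for dep in deps:
        if dep[0] == name and version_key(dep[1]) < version_key(minimum):
            # Keep any extra tuple elements (versionsuffix, toolchain) untouched.
            dep = (name, minimum) + tuple(dep[2:])
        bumped.append(dep)
    return bumped


def parse_hook(ec, *args, **kwargs):
    """Parse hook: replace libfabric < 1.12.1 in OpenMPI easyconfigs.

    Assumption: ec['dependencies'] holds plain tuples at parse time.
    """
    if ec.name == "OpenMPI":
        ec["dependencies"] = bump_dependency(ec["dependencies"], "libfabric", "1.12.1")
```

The same `bump_dependency` helper could be pointed at UCX instead, e.g. `bump_dependency(deps, "UCX", "1.11.2")`, which is the other variant suggested earlier in this thread.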


I don't feel qualified to mess around with modifying EB toolchains...

The foss-2021b toolchain including OpenMPI/4.1.1-GCC-11.2.0 seems to solve 
the present problem.  Do you think there are any disadvantages with asking 
users to go for foss-2021b?  Of course we may need several modules to be 
upgraded from foss-2020b to foss-2021b.


Another possibility may be the coming driver upgrade from Cornelis 
Networks to support the Omni-Path fabric on EL 8.4 and EL 8.5.  I'm 
definitely going to check this when it becomes available.


Thanks,
Ole


Re: [easybuild] ORCA 5.0.1 fails in testing ("There are not enough slots")

2021-11-15 Thread Ole Holm Nielsen

Hi Alan,

Fantastic, adding `--parallel=12` fixed the issue:

$ eb ORCA-5.0.1-gompi-2021a.eb -r --parallel=12

Thanks a lot,
Ole


On 11/15/21 09:59, Alan O'Cais wrote:
Hmm, this has now come up a few times. OpenMPI does not like 
hyperthreading and only cares about the physical cores. EB is passing the 
number of cores it sees as the number of required slots. Without 
oversubscription the example will not run. Either we allow 
oversubscription, or we figure out a method to quantify the hyperthreading.


There are a few open issues on this, see 
https://github.com/easybuilders/easybuild-easyblocks/pull/2611 
<https://github.com/easybuilders/easybuild-easyblocks/pull/2611> and the 
linked issues.


For an immediate fix, you just need to limit the number of cores used for 
the build, e.g. use the eb option `--parallel=12`
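
To pick a sensible value for `--parallel`, it can help to compare logical and physical core counts on the build host. The sketch below parses text in the format of Linux's /proc/cpuinfo; it is an illustration written for this thread (the function name `count_cores` is hypothetical) and is not how EasyBuild itself counts cores.

```python
def count_cores(cpuinfo_text):
    """Return (logical, physical) core counts from /proc/cpuinfo-style text."""
    logical = 0
    physical = set()
    package = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            package = value
        elif key == "core id":
            # A physical core is identified by its (socket, core) pair;
            # hyperthread siblings share the same pair.
            physical.add((package, value))
    return logical, len(physical)


if __name__ == "__main__":
    with open("/proc/cpuinfo") as handle:
        logical, physical = count_cores(handle.read())
    print("logical cores: %d, physical cores: %d" % (logical, physical))
```

On a host with Hyperthreading enabled the first number is larger than the second, and the second one is the kind of value Open MPI uses for its default slot count.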



On Mon, 15 Nov 2021 at 09:06, Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> wrote:


We use EB 4.5.0 and would like to install this module:

$ eb ORCA-5.0.1-gompi-2021a.eb -r

but it fails with:

== FAILED: Installation ended unsuccessfully (build directory:
/dev/shm/ORCA/5.0.1/gompi-2021a): build failed (first 300 chars): Sanity
check failed: sanity check command $EBROOTORCA/bin/orca
/dev/shm/ORCA/5.0.1/gompi-2021a/eb_test_hf_water.inp | grep -c 'FINAL
SINGLE POINT ENERGY[    ]*-75.95934031' exited with code 1 (output:
--
There are not enough slots (took 1 min 50 secs)
== Results of the build can be found in the log file(s)
/tmp/eb-2QJPW_/easybuild-ORCA-5.0.1-2021.140110.qlMvK.log
ERROR: Build of
/home/modules/software/EasyBuild/4.5.0/easybuild/easyconfigs/o/ORCA/ORCA-5.0.1-gompi-2021a.eb
failed (err: "build failed (first 300 chars): Sanity check failed: sanity
check command $EBROOTORCA/bin/orca
/dev/shm/ORCA/5.0.1/gompi-2021a/eb_test_hf_water.inp | grep -c 'FINAL
SINGLE POINT ENERGY[ \t]*-75.95934031' exited with code 1 (output:
--\nThere are not enough slots")


There are further errors in the logfile:

== 2021-11-11 14:03:00,669 build_log.py:169 ERROR EasyBuild crashed with
an error (at easybuild/base/exceptions.py:124 in __init__): Sanity check
failed: sanity check command $EBROOTORCA/bin/orca
/dev/shm/ORCA/5.0.1/gompi-2021a/eb_test_hf_water.inp | grep -c 'FINAL
SINGLE POINT ENERGY[  ]*-75.95934031' exited with code 1 (output:
--
There are not enough slots available in the system to satisfy the 48
slots that were requested by the application:

    /home/modules/software/ORCA/5.0.1-gompi-2021a/bin/orca_gtoint_mpi

Either request fewer slots for your application, or make more slots
available for use.

A "slot" is the Open MPI term for an allocatable unit where we can
launch a process.  The number of slots available are defined by the
environment in which Open MPI processes are run:

    1. Hostfile, via "slots=N" clauses (N defaults to number of
       processor cores if not provided)
    2. The --host command line parameter, via a ":N" suffix on the
       hostname (N defaults to 1 if not provided)
    3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
    4. If none of a hostfile, the --host command line parameter, or an
       RM is present, Open MPI defaults to the number of processor cores

In all the above cases, if you want Open MPI to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.

Alternatively, you can use the --oversubscribe option to ignore the
number of available slots when deciding the number of processes to
launch.
--
[file orca_tools/qcmsg.cpp, line 458]:
     aborting the run

0
) (at easybuild/framework/easyblock.py:3311 in _sanity_check_step)


Question: Why does the ORCA test request 48 MPI "slots" (MPI tasks, I
suppose) and then fail?

The build host has two Intel(R) Xeon(R) CPU E5-2650 v4 processors, which
gives 48 logical cores (counting Hyperthreading).

The ORCA input file /dev/shm/ORCA/5.0.1/gompi-2021a/eb_test_hf_water.inp
contains:

!HF DEF2-SVP
%PAL NPROCS 48 END
* xyz 0 1
O   0.   0.   0.0626
H  -0.7920   0.  -0.4973
H   0.7920   0.  -0.4973
*
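
The `%PAL NPROCS 48 END` line is where the 48 slots come from. As an illustration only, the following Python sketch rewrites that line in an ORCA input text to a smaller process count. The function name `set_nprocs` is made up for this example; in an EasyBuild run the test input is generated by the easyblock, so the `--parallel` option is the practical way to change it.

```python
import re

# Matches a line such as "%PAL NPROCS 48 END" (case-insensitive).
PAL_RE = re.compile(
    r"^(%PAL[ \t]+NPROCS[ \t]+)\d+([ \t]+END)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def set_nprocs(orca_input, nprocs):
    """Return orca_input with the NPROCS value replaced by nprocs."""
    new_text, count = PAL_RE.subn(
        lambda match: "%s%d%s" % (match.group(1), nprocs, match.group(2)),
        orca_input,
    )
    if count != 1:
        raise ValueError("expected exactly one %%PAL NPROCS line, found %d" % count)
    return new_text


if __name__ == "__main__":
    example = "!HF DEF2-SVP\n%PAL NPROCS 48 END\n* xyz 0 1\n"
    print(set_nprocs(example, 12))
```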

The user's limits would seem to be sufficient:

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) 5000
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 103049

[easybuild] Code crashes with multi-node MPI jobs on AMD Rome nodes running OpenHPC 1.3

2021-06-25 Thread Ole Holm Nielsen
/modules/software/Python/3.8.6-GCCcore-10.2.0/lib/libpython3.8.so.1.0(PyEval_EvalCode+0x1b)[0x2ae7453b]
[sn537:02026] [29] 
/groups/physics/modules/software/Python/3.8.6-GCCcore-10.2.0/lib/libpython3.8.so.1.0(+0x1aa8e5)[0x2ae798e5]

[sn537:02026] *** End of error message ***



--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


Re: [easybuild] MATLAB: ISOs no longer available?

2021-06-08 Thread Ole Holm Nielsen

Hi Loris,

On 6/8/21 2:02 PM, Loris Bennett wrote:

MATLAB-2021a.eb seems to expect to install from ISOs, but what I can
download from MathWorks is just a zip file of binaries and other stuff.
I remember doing the ISO thing in the past, so has this maybe just
changed recently?

Or are the ISOs still available somewhere?


Our university gets MATLAB as ISO images, for example R2021a_Linux.iso.
I'm not involved in this, since another department handles MATLAB licenses.

See also 
https://github.com/easybuilders/easybuild-easyconfigs/tree/main/easybuild/easyconfigs/m/MATLAB


Best regards,
Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,
Fysikvej Building 309, DK-2800 Kongens Lyngby, Denmark
E-mail: ole.h.niel...@fysik.dtu.dk
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Mobile: (+45) 5180 1620


[easybuild] ERROR: No compatible 'python' command found via $PATH (EasyBuild requires Python 2.6+ or 3.5+)

2021-06-04 Thread Ole Holm Nielsen

I just installed EasyBuild v4.4.0 on a server running AlmaLinux 8.4:

$ python3 bootstrap_eb.py $EASYBUILD_PREFIX

Unfortunately, I'm now getting this error about a missing python:

$ eb --help
ERROR: No compatible 'python' command found via $PATH (EasyBuild requires 
Python 2.6+ or 3.5+)


Obviously, Python3 is installed on this el8 system:

$ which python python3
/usr/bin/which: no python in (/home/modules/software/EasyBuild/4.4.0/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/opt/modules/bin)
/usr/bin/python3

$ rpm -qf /usr/bin/python3
python36-3.6.8-2.module_el8.3.0+6191+6b4b10ec.x86_64

Is there a fix for the inability to locate the python command?

Thanks,
Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,
Fysikvej Building 309, DK-2800 Kongens Lyngby, Denmark
E-mail: ole.h.niel...@fysik.dtu.dk
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Mobile: (+45) 5180 1620


Re: [easybuild] ERROR: No compatible 'python' command found via $PATH (EasyBuild requires Python 2.6+ or 3.5+)

2021-06-04 Thread Ole Holm Nielsen

On 6/4/21 2:41 PM, Kenneth Hoste wrote:

On 04/06/2021 14:36, Ole Holm Nielsen wrote:
$ ls -la /home/modules/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/base

total 172
drwxr-xr-x. 3 modules modules   147 Jun  4 07:30 .
drwxr-xr-x. 8 modules modules   137 Jun  3 09:42 ..
-rw-r--r--. 1 modules modules  5174 Jun  2 11:54 exceptions.py
-rw-r--r--. 1 modules modules 34846 Jun  2 11:54 fancylogger.py
-rw-r--r--. 1 modules modules 78284 Jun  2 11:54 generaloption.py
-rw-r--r--. 1 modules modules 22858 Jun  2 11:54 optcomplete.py
drwxr-xr-x. 2 modules modules  4096 Jun  3 09:42 __pycache__
-rw-r--r--. 1 modules modules 11621 Jun  2 11:54 rest.py
-rw-r--r--. 1 modules modules  6494 Jun  2 11:54 testing.py


Somehow you're missing frozendict.py and wrapper.py there...

I have no idea how that happened, but I strongly suspect that something 
cleaned up those files after the bootstrap was done.


frozendict.py and wrapper.py are the files that have been touched least 
recently in that directory, which could have something to do with it 
(automatic cleanup of old files by a cron job?!), but I'm guessing...


I think you guessed right!  This node did have an automatic cleanup job, 
which I've removed now.  I'm sorry for having overlooked this!


I'll reinstall EasyBuild in an empty /home/modules folder and build my 
modules.  Probably everything is going to be fine then :-)
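
A quick way to spot this kind of damage after a cleanup job has run is to compare a directory listing against the files that should be there. The sketch below does that for the `easybuild/base` directory; the expected list is only what this thread mentions (the files in the listing plus the two missing ones), so it is an assumption that the list is complete for EasyBuild v4.4.0.

```python
import os

# Files named in this thread: the directory listing plus the two missing modules.
EXPECTED = [
    "exceptions.py",
    "fancylogger.py",
    "frozendict.py",
    "generaloption.py",
    "optcomplete.py",
    "rest.py",
    "testing.py",
    "wrapper.py",
]


def missing_files(directory, expected=EXPECTED):
    """Return the sorted names from `expected` that are absent in `directory`."""
    present = set(os.listdir(directory))
    return sorted(name for name in expected if name not in present)


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        # Pass the path of the easybuild/base directory as the first argument.
        print(missing_files(sys.argv[1]))
```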


Thanks for your support!

/Ole



On 6/4/21 2:34 PM, Kenneth Hoste wrote:

On 04/06/2021 14:30, Ole Holm Nielsen wrote:

On 6/4/21 2:22 PM, Kenneth Hoste wrote:

Hi Ole,

On 04/06/2021 14:09, Ole Holm Nielsen wrote:

On 6/4/21 1:54 PM, Kenneth Hoste wrote:

Please try running this, which will probably reveal the problem:

    python3 -O -m easybuild.main


$ python3 -O -m easybuild.main
Traceback (most recent call last):
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/modules/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/main.py", line 48, in <module>
    from easybuild.framework.easyblock import build_and_install_one, inject_checksums
  File "/home/modules/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/framework/easyblock.py", line 54, in <module>
    import easybuild.tools.environment as env
  File "/home/modules/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/tools/environment.py", line 36, in <module>
    from easybuild.tools.config import build_option
  File "/home/modules/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/tools/config.py", line 47, in <module>
    from easybuild.base.frozendict import FrozenDictKnownKeys
ModuleNotFoundError: No module named 'easybuild.base.frozendict'


So the conclusion that 'python3' doesn't give access to a working 
EasyBuild installation was correct...


Your EasyBuild installation is basically broken, but the question 
then is how this happened, since the bootstrap procedure includes a 
check to ensure the installation completed correctly...


Do you have a 'base' subdirectory in 
/home/modules/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/, 
and if so, what's in there?


$ ls -la /home/modules/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild


total 44
drwxr-xr-x.  8 modules modules   137 Jun  3 09:42 .
drwxr-xr-x.  4 modules modules   182 Jun  3 09:42 ..
drwxr-xr-x.  3 modules modules   147 Jun  4 07:30 base
drwxr-xr-x. 31 modules modules  4096 Jun  3 09:42 easyblocks
drwxr-xr-x.  4 modules modules   151 Jun  3 09:42 framework
-rw-r--r--.  1 modules modules  1114 Jun  2 11:54 __init__.py
-rw-r--r--.  1 modules modules 24801 Jun  2 11:54 main.py
drwxr-xr-x.  2 modules modules   134 Jun  3 09:42 __pycache__
drwxr-xr-x.  7 modules modules  4096 Jun  4 07:30 toolchains
drwxr-xr-x. 11 modules modules  4096 Jun  4 07:30 tools



And what do you have in 
/home/modules/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/base 
then?





Is this problem reproducible? If you reinstall EasyBuild v4.4.0, does 
the problem stay?


I've done the bootstrap again now.  The bootstrap log file is attached.

I have the same problem:

[modules@d059 ~]$ eb --help
>> Considering '/usr/bin/python3'...
>> '/usr/bin/python3' version: 3.6.8, which matches Python 3 version requirement (>= 3.5)
>> '/usr/bin/python3' is NOT able to import 'easybuild.main', so NOT retaining it
>> Considering 'python'...
>> No 'python' found in $PATH, skipping...
>> Considering 'python3'...
>> 'python3' version: 3.6.8, which matches Python 3 version requirement (>= 3.5)
>> 'python3' is NOT able to import 'easybuild.main', so NOT retaining it
>> Considering 'python2'...
>> No 'python2' found in $PATH, skipping...
ERROR: No co

Re: [easybuild] ERROR: No compatible 'python' command found via $PATH (EasyBuild requires Python 2.6+ or 3.5+)

2021-06-04 Thread Ole Holm Nielsen

On 6/4/21 2:22 PM, Kenneth Hoste wrote:

Hi Ole,

On 04/06/2021 14:09, Ole Holm Nielsen wrote:

On 6/4/21 1:54 PM, Kenneth Hoste wrote:

Please try running this, which will probably reveal the problem:

    python3 -O -m easybuild.main


$ python3 -O -m easybuild.main
Traceback (most recent call last):
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/modules/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/main.py", line 48, in <module>
    from easybuild.framework.easyblock import build_and_install_one, inject_checksums
  File "/home/modules/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/framework/easyblock.py", line 54, in <module>
    import easybuild.tools.environment as env
  File "/home/modules/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/tools/environment.py", line 36, in <module>
    from easybuild.tools.config import build_option
  File "/home/modules/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/tools/config.py", line 47, in <module>
    from easybuild.base.frozendict import FrozenDictKnownKeys
ModuleNotFoundError: No module named 'easybuild.base.frozendict'


So the conclusion that 'python3' doesn't give access to a working 
EasyBuild installation was correct...


Your EasyBuild installation is basically broken, but the question then is 
how this happened, since the bootstrap procedure includes a check to 
ensure the installation completed correctly...


Do you have a 'base' subdirectory in 
/home/modules/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/, 
and if so, what's in there?


$ ls -la /home/modules/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild

total 44
drwxr-xr-x.  8 modules modules   137 Jun  3 09:42 .
drwxr-xr-x.  4 modules modules   182 Jun  3 09:42 ..
drwxr-xr-x.  3 modules modules   147 Jun  4 07:30 base
drwxr-xr-x. 31 modules modules  4096 Jun  3 09:42 easyblocks
drwxr-xr-x.  4 modules modules   151 Jun  3 09:42 framework
-rw-r--r--.  1 modules modules  1114 Jun  2 11:54 __init__.py
-rw-r--r--.  1 modules modules 24801 Jun  2 11:54 main.py
drwxr-xr-x.  2 modules modules   134 Jun  3 09:42 __pycache__
drwxr-xr-x.  7 modules modules  4096 Jun  4 07:30 toolchains
drwxr-xr-x. 11 modules modules  4096 Jun  4 07:30 tools

Is this problem reproducible? If you reinstall EasyBuild v4.4.0, does the 
problem stay?


I've done the bootstrap again now.  The bootstrap log file is attached.

I have the same problem:

[modules@d059 ~]$ eb --help
>> Considering '/usr/bin/python3'...
>> '/usr/bin/python3' version: 3.6.8, which matches Python 3 version requirement (>= 3.5)
>> '/usr/bin/python3' is NOT able to import 'easybuild.main', so NOT retaining it
>> Considering 'python'...
>> No 'python' found in $PATH, skipping...
>> Considering 'python3'...
>> 'python3' version: 3.6.8, which matches Python 3 version requirement (>= 3.5)
>> 'python3' is NOT able to import 'easybuild.main', so NOT retaining it
>> Considering 'python2'...
>> No 'python2' found in $PATH, skipping...
ERROR: No compatible 'python' command found via $PATH (EasyBuild requires Python 2.6+ or 3.5+)




Somehow that's resulting in a non-zero exit code, which makes the 'eb' 
wrapper conclude it can't use the 'python3' command.


You can control which python* command is used to run EasyBuild using 
the $EB_PYTHON environment variable:


  export EB_PYTHON=python3

But that won't make any difference here, it should work already with 
python3?


If you also define $EB_VERBOSE (any value, so "export EB_VERBOSE=1"), 
you'll get a bit more information.


Indeed:

$ export EB_VERBOSE=1
$ eb --help
>> Considering '/usr/bin/python3'...
>> '/usr/bin/python3' version: 3.6.8, which matches Python 3 version requirement (>= 3.5)
>> '/usr/bin/python3' is NOT able to import 'easybuild.main', so NOT retaining it
>> Considering 'python'...
>> No 'python' found in $PATH, skipping...
>> Considering 'python3'...
>> 'python3' version: 3.6.8, which matches Python 3 version requirement (>= 3.5)
>> 'python3' is NOT able to import 'easybuild.main', so NOT retaining it
>> Considering 'python2'...
>> No 'python2' found in $PATH, skipping...
ERROR: No compatible 'python' command found via $PATH (EasyBuild requires Python 2.6+ or 3.5+)


$ python3 -O -m easybuild.main
Traceback (most recent call last):
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/modules/software/EasyBuild/4.4.0/lib/python3.

Re: [easybuild] ERROR: No compatible 'python' command found via $PATH (EasyBuild requires Python 2.6+ or 3.5+)

2021-06-04 Thread Ole Holm Nielsen

On 6/4/21 1:54 PM, Kenneth Hoste wrote:

Please try running this, which will probably reveal the problem:

    python3 -O -m easybuild.main


$ python3 -O -m easybuild.main
Traceback (most recent call last):
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/modules/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/main.py", line 48, in <module>
    from easybuild.framework.easyblock import build_and_install_one, inject_checksums
  File "/home/modules/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/framework/easyblock.py", line 54, in <module>
    import easybuild.tools.environment as env
  File "/home/modules/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/tools/environment.py", line 36, in <module>
    from easybuild.tools.config import build_option
  File "/home/modules/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/tools/config.py", line 47, in <module>
    from easybuild.base.frozendict import FrozenDictKnownKeys
ModuleNotFoundError: No module named 'easybuild.base.frozendict'


Somehow that's resulting in a non-zero exit code, which makes the 'eb' 
wrapper conclude it can't use the 'python3' command.


You can control which python* command is used to run EasyBuild using the 
$EB_PYTHON environment variable:


  export EB_PYTHON=python3

But that won't make any difference here, it should work already with python3?

If you also define $EB_VERBOSE (any value, so "export EB_VERBOSE=1"), 
you'll get a bit more information.
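
The selection logic described here can be sketched in Python as follows. This is a simplified illustration, not the actual `eb` wrapper (which is a shell script and also checks the Python version): it tries `$EB_PYTHON` first when set, then `python`, `python3` and `python2`, and keeps the first command that exists and can import `easybuild.main`. The function names and the injectable `which`/`check` parameters are assumptions made so the sketch can be tried without a real EasyBuild installation.

```python
import os
import shutil
import subprocess


def can_import_easybuild(python_cmd):
    """True if `python_cmd -c 'import easybuild.main'` exits with code 0."""
    result = subprocess.run(
        [python_cmd, "-c", "import easybuild.main"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def pick_python(candidates=("python", "python3", "python2"),
                which=shutil.which, check=can_import_easybuild, environ=None):
    """Return the first usable python command, or None if there is none."""
    environ = os.environ if environ is None else environ
    preferred = environ.get("EB_PYTHON")
    ordered = ([preferred] if preferred else []) + list(candidates)
    for cmd in ordered:
        if which(cmd) is None:
            continue  # command not found in $PATH
        if check(cmd):
            return cmd  # exists and can import easybuild.main
    return None


if __name__ == "__main__":
    print(pick_python())
```

With a broken installation like the one in this thread, `check` fails for every candidate and the result is `None`, which corresponds to the "No compatible 'python' command found" error.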


Indeed:

$ export EB_VERBOSE=1
$ eb --help
>> Considering '/usr/bin/python3'...
>> '/usr/bin/python3' version: 3.6.8, which matches Python 3 version requirement (>= 3.5)
>> '/usr/bin/python3' is NOT able to import 'easybuild.main', so NOT retaining it
>> Considering 'python'...
>> No 'python' found in $PATH, skipping...
>> Considering 'python3'...
>> 'python3' version: 3.6.8, which matches Python 3 version requirement (>= 3.5)
>> 'python3' is NOT able to import 'easybuild.main', so NOT retaining it
>> Considering 'python2'...
>> No 'python2' found in $PATH, skipping...
ERROR: No compatible 'python' command found via $PATH (EasyBuild requires Python 2.6+ or 3.5+)


$ python3 -O -m easybuild.main
Traceback (most recent call last):
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/modules/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/main.py", line 48, in <module>
    from easybuild.framework.easyblock import build_and_install_one, inject_checksums
  File "/home/modules/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/framework/easyblock.py", line 54, in <module>
    import easybuild.tools.environment as env
  File "/home/modules/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/tools/environment.py", line 36, in <module>
    from easybuild.tools.config import build_option
  File "/home/modules/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/tools/config.py", line 47, in <module>
    from easybuild.base.frozendict import FrozenDictKnownKeys
ModuleNotFoundError: No module named 'easybuild.base.frozendict'


On 04/06/2021 13:33, Ole Holm Nielsen wrote:

I just installed EasyBuild v4.4.0 on a server running AlmaLinux 8.4:

$ python3 bootstrap_eb.py $EASYBUILD_PREFIX

Unfortunately, I'm now getting this error about a missing python:

$ eb --help
ERROR: No compatible 'python' command found via $PATH (EasyBuild 
requires Python 2.6+ or 3.5+)


Obviously, Python3 is installed on this el8 system:

$ which python python3
/usr/bin/which: no python in (/home/modules/software/EasyBuild/4.4.0/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/opt/modules/bin)
/usr/bin/python3

$ rpm -qf /usr/bin/python3
python36-3.6.8-2.module_el8.3.0+6191+6b4b10ec.x86_64

Is there a fix for the inability to locate the python command?

Thanks,
Ole



--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,
Fysikvej Building 309, DK-2800 Kongens Lyngby, Denmark
E-mail: ole.h.niel...@fysik.dtu.dk
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Mobile: (+45) 5180 1620


Re: [easybuild] Build failure with PyTorch-1.7.1-fosscuda-2020b.eb

2021-06-04 Thread Ole Holm Nielsen

Hi Alexander,

On 6/4/21 1:46 PM, Alexander Grund wrote:



==
ERROR: test_process_group_as_module_member 
(__main__.C10dProcessGroupSerialization)

--
Traceback (most recent call last):
  File "/tmp/eb-3tUIrb/tmpFJsxiC/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 146, in wrapper
    return func(*args, **kwargs)
  File "distributed/test_jit_c10d.py", line 228, in test_process_group_as_module_member
    self.checkModule(TestModule(), (torch.rand((2, 3)),))
  File "distributed/test_jit_c10d.py", line 216, in __init__
    tcp_store = _create_tcp_store()
  File "distributed/test_jit_c10d.py", line 40, in _create_tcp_store
    return torch.classes.dist_c10d.TCPStore(addr, port, 1, True, timeout_millisecond)
RuntimeError: Address already in use

Please report that issue in the PyTorch GitHub repo. This is something 
they need to fix or at least investigate first.


Thanks for your recommendation!  The issue is:
https://github.com/pytorch/pytorch/issues/59441

/Ole



Re: [easybuild] Build failure with PyTorch-1.7.1-fosscuda-2020b.eb

2021-06-04 Thread Ole Holm Nielsen

Hi Kenneth,

Thanks, but PR 12814 may still have an issue with testing network ports, 
see https://github.com/easybuilders/easybuild-easyconfigs/pull/12814


I get an error about a TCP port being in use in the log file.  On CentOS 
7.9 I always see network ports being in use after I close down Flexlm 
license servers, but after a couple of minutes they're free again.  I 
don't know if there's a configurable TCP port timeout in CentOS/RHEL 7?  I 
think my system uses default values.
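
To check whether a given TCP port is currently free, or to get one that is, a small Python sketch like the following can be used. It is an illustration only and not what the PyTorch test does (the test calls `_create_tcp_store()` with its own address and port); the function names are made up for this example.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the kernel for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def port_is_free(port, host="127.0.0.1"):
    """Return True if binding to (host, port) succeeds right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "Address already in use"
            return False
        return True


if __name__ == "__main__":
    port = find_free_port()
    print("free port:", port, "still free:", port_is_free(port))
```

Note that a port reported as free can be taken by another process a moment later, so this is a diagnostic aid rather than a guarantee.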


Log file:

(lines deleted)
Running distributed/test_jit_c10d ... [2021-06-03 11:45:18.408449]
Executing 
['/home/modules/software/Python/3.8.6-GCCcore-10.2.0/bin/python', 
'distributed/test_jit_c10d.py', '-v'] ... [2021-06-03 11:45:18.408656]

test_frontend_singleton (__main__.C10dFrontendJitTest) ... ok
test_process_group_as_module_member 
(__main__.C10dProcessGroupSerialization) ... ERROR
test_init_process_group_nccl_as_base_process_group_torchbind 
(__main__.ProcessGroupNCCLJitTest) ... ok
test_init_process_group_nccl_torchbind (__main__.ProcessGroupNCCLJitTest) 
... ok
test_process_group_nccl_as_base_process_group_torchbind_alltoall 
(__main__.ProcessGroupNCCLJitTest) ... ok
test_process_group_nccl_serialization (__main__.ProcessGroupNCCLJitTest) 
... ok
test_process_group_nccl_torchbind_alltoall 
(__main__.ProcessGroupNCCLJitTest) ... ok

test_create_file_store (__main__.StoreTest) ... ok
test_create_prefix_store (__main__.StoreTest) ... ok

==
ERROR: test_process_group_as_module_member 
(__main__.C10dProcessGroupSerialization)

--
Traceback (most recent call last):
  File "/tmp/eb-3tUIrb/tmpFJsxiC/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 146, in wrapper
    return func(*args, **kwargs)
  File "distributed/test_jit_c10d.py", line 228, in test_process_group_as_module_member
    self.checkModule(TestModule(), (torch.rand((2, 3)),))
  File "distributed/test_jit_c10d.py", line 216, in __init__
    tcp_store = _create_tcp_store()
  File "distributed/test_jit_c10d.py", line 40, in _create_tcp_store
    return torch.classes.dist_c10d.TCPStore(addr, port, 1, True, timeout_millisecond)
RuntimeError: Address already in use

--
Ran 9 tests in 0.410s

FAILED (errors=1)
Traceback (most recent call last):
  File "run_test.py", line 926, in <module>
    main()
  File "run_test.py", line 905, in main
    raise RuntimeError(err_message)
RuntimeError: distributed/test_jit_c10d failed!
 (at easybuild/tools/run.py:565 in parse_cmd_output)
== 2021-06-03 11:45:23,800 filetools.py:1876 INFO Removing lock 
/home/modules/software/.locks/_home_modules_software_PyTorch_1.8.1-fosscuda-2020b.lock...
== 2021-06-03 11:45:23,802 filetools.py:358 INFO Path 
/home/modules/software/.locks/_home_modules_software_PyTorch_1.8.1-fosscuda-2020b.lock 
successfully removed.
== 2021-06-03 11:45:23,802 filetools.py:1880 INFO Lock removed: 
/home/modules/software/.locks/_home_modules_software_PyTorch_1.8.1-fosscuda-2020b.lock
== 2021-06-03 11:45:23,802 easyblock.py:3643 WARNING build failed (first 
300 chars): cmd "export 
PYTHONPATH=/tmp/eb-3tUIrb/tmpFJsxiC/lib/python3.8/site-packages:$PYTHONPATH 
&&  cd test && PYTHONUNBUFFERED=1 
/home/modules/software/Python/3.8.6-GCCcore-10.2.0/bin/python run_test.py 
--verbose -x distributed/rpc/test_process_group_agent test_quantization " 
exited with exit code 1 and ou
== 2021-06-03 11:45:23,803 easyblock.py:300 INFO Closing log for 
application name PyTorch version 1.8.1


/Ole

On 6/4/21 12:03 PM, Kenneth Hoste wrote:

See https://github.com/easybuilders/easybuild-easyconfigs/pull/12814

It needs a bit more love though (CI tests are failing currently), but you 
can try installing that using "eb --from-pr 12814".



regards,

Kenneth

On 03/06/2021 10:17, Alexander Grund wrote:

There is an open PR in the easyconfigs repo. Check that :)

Am 03.06.21 um 10:16 schrieb Ole Holm Nielsen:

Our users report an error with this PyTorch module:

AssertionError: Torch not compiled with CUDA enabled

Are there any plans to make a PyTorch-1.8.1 module with the fosscuda 
toolchain?


Thanks,
Ole


On 6/3/21 8:54 AM, Kenneth Hoste wrote:

Excellent, that's great to hear, thanks for the update Ole!


regards,

Kenneth

On 03/06/2021 07:42, Ole Holm Nielsen wrote:

Hi Kenneth,

I can confirm that with EasyBuild v4.4.0 the PyTorch 1.8.1 
installation went smoothly and without any problems:


$ eb PyTorch-1.8.1-foss-2020b.eb -r

Best regards,
Ole

On 6/1/21 5:13 PM, Kenneth Hoste wrote:

Hi Ole,

This error doesn't mean anything in particular for me, but perhaps 
it rings a bell for Alexander (in CC).


There are a couple of fixes related to PyTorch that will be included 
in the upcoming EasyBuild v4.4.0

Re: [easybuild] Build failure with PyTorch-1.7.1-fosscuda-2020b.eb

2021-06-03 Thread Ole Holm Nielsen

Our users report an error with this PyTorch module:

AssertionError: Torch not compiled with CUDA enabled

Are there any plans to make a PyTorch-1.8.1 module with the fosscuda 
toolchain?


Thanks,
Ole


On 6/3/21 8:54 AM, Kenneth Hoste wrote:

Excellent, that's great to hear, thanks for the update Ole!


regards,

Kenneth

On 03/06/2021 07:42, Ole Holm Nielsen wrote:

Hi Kenneth,

I can confirm that with EasyBuild v4.4.0 the PyTorch 1.8.1 installation 
went smoothly and without any problems:


$ eb PyTorch-1.8.1-foss-2020b.eb -r

Best regards,
Ole

On 6/1/21 5:13 PM, Kenneth Hoste wrote:

Hi Ole,

This error doesn't mean anything in particular for me, but perhaps it 
rings a bell for Alexander (in CC).


There are a couple of fixes related to PyTorch that will be included in 
the upcoming EasyBuild v4.4.0 release (which will be released tomorrow 
hopefully), so keep an eye out for that...



regards,

Kenneth


On 01/06/2021 09:56, Ole Holm Nielsen wrote:

Dear EasyBuilders,

I'm trying to build PyTorch-1.7.1-fosscuda-2020b.eb on a CentOS 7 
server with some Nvidia GPUs, and the build fails in the tests after 
about 2 hours:


$ eb PyTorch-1.7.1-fosscuda-2020b.eb -r
== Temporary log file in case of crash 
/tmp/eb-zAAAvr/easybuild-TDNRVQ.log
== found valid index for 
/home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs, so using 
it...
== found valid index for 
/home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs, so using 
it...

== resolving dependencies ...
== processing EasyBuild easyconfig 
/home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs/p/PyTorch/PyTorch-1.7.1-fosscuda-2020b.eb 


== building and installing PyTorch/1.7.1-fosscuda-2020b...
== fetching files...
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== testing...
== FAILED: Installation ended unsuccessfully (build directory: 
/dev/shm/PyTorch/1.7.1/fosscuda-2020b): build failed (first 300 
chars): cmd "export 
PYTHONPATH=/tmp/eb-zAAAvr/tmpnh77Vl/lib/python3.8/site-packages:$PYTHONPATH 
&&  cd test && PYTHONUNBUFFERED=1 
/home/modules/software/Python/3.8.6-GCCcore-10.2.0/bin/python 
run_test.py --verbose -x distributed/rpc/test_process_group_agent 
test_quantization " exited with exit code 1 and ou (took 1 hour 59 min 
46 sec)
== Results of the build can be found in the log file(s) 
/tmp/eb-zAAAvr/easybuild-PyTorch-1.7.1-20210601.074610.WfkGf.log
ERROR: Build of 
/home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs/p/PyTorch/PyTorch-1.7.1-fosscuda-2020b.eb 
failed (err: 'build failed (first 300 chars): cmd "export 
PYTHONPATH=/tmp/eb-zAAAvr/tmpnh77Vl/lib/python3.8/site-packages:$PYTHONPATH 
&&  cd test && PYTHONUNBUFFERED=1 
/home/modules/software/Python/3.8.6-GCCcore-10.2.0/bin/python 
run_test.py --verbose -x distributed/rpc/test_process_group_agent 
test_quantization " exited with exit code 1 and ou')



The EB log file shows these 4 errors at the end of the file:

==
ERROR: test_DistributedDataParallel (__main__.TestDistBackendWithFork)
--
Traceback (most recent call last):
   File 
"/tmp/eb-zAAAvr/tmpnh77Vl/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", 
line 267, in wrapper

 self._join_processes(fn)
   File 
"/tmp/eb-zAAAvr/tmpnh77Vl/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", 
line 384, in _join_processes

 self._check_return_codes(elapsed_time)
   File 
"/tmp/eb-zAAAvr/tmpnh77Vl/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", 
line 420, in _check_return_codes

 raise RuntimeError(error)
RuntimeError: Processes 0 1 2 exited with error code 10

==
ERROR: test_DistributedDataParallel_SyncBatchNorm 
(__main__.TestDistBackendWithFork)

--
Traceback (most recent call last):
   File 
"/tmp/eb-zAAAvr/tmpnh77Vl/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", 
line 267, in wrapper

 self._join_processes(fn)
   File 
"/tmp/eb-zAAAvr/tmpnh77Vl/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", 
line 384, in _join_processes

 self._check_return_codes(elapsed_time)
   File 
"/tmp/eb-zAAAvr/tmpnh77Vl/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", 
line 420, in _check_return_codes

 raise RuntimeError(error)
RuntimeError: Processes 0 1 2 exited with error code 10

==
ERROR: 
test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient 
(__main__.TestDistBackendWithFork)

--

Re: [easybuild] Build failure with PyTorch-1.7.1-fosscuda-2020b.eb

2021-06-02 Thread Ole Holm Nielsen

Hi Kenneth,

I can confirm that with EasyBuild v4.4.0 the PyTorch 1.8.1 installation 
went smoothly and without any problems:


$ eb PyTorch-1.8.1-foss-2020b.eb -r

Best regards,
Ole

On 6/1/21 5:13 PM, Kenneth Hoste wrote:

Hi Ole,

This error doesn't mean anything in particular for me, but perhaps it 
rings a bell for Alexander (in CC).


There are a couple of fixes related to PyTorch that will be included in 
the upcoming EasyBuild v4.4.0 release (which will be released tomorrow 
hopefully), so keep an eye out for that...



regards,

Kenneth


On 01/06/2021 09:56, Ole Holm Nielsen wrote:

Dear EasyBuilders,

I'm trying to build PyTorch-1.7.1-fosscuda-2020b.eb on a CentOS 7 server 
with some Nvidia GPUs, and the build fails in the tests after about 2 
hours:


$ eb PyTorch-1.7.1-fosscuda-2020b.eb -r
== Temporary log file in case of crash /tmp/eb-zAAAvr/easybuild-TDNRVQ.log
== found valid index for 
/home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs, so using 
it...
== found valid index for 
/home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs, so using 
it...

== resolving dependencies ...
== processing EasyBuild easyconfig 
/home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs/p/PyTorch/PyTorch-1.7.1-fosscuda-2020b.eb 


== building and installing PyTorch/1.7.1-fosscuda-2020b...
== fetching files...
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== testing...
== FAILED: Installation ended unsuccessfully (build directory: 
/dev/shm/PyTorch/1.7.1/fosscuda-2020b): build failed (first 300 chars): 
cmd "export 
PYTHONPATH=/tmp/eb-zAAAvr/tmpnh77Vl/lib/python3.8/site-packages:$PYTHONPATH 
&&  cd test && PYTHONUNBUFFERED=1 
/home/modules/software/Python/3.8.6-GCCcore-10.2.0/bin/python 
run_test.py --verbose -x distributed/rpc/test_process_group_agent 
test_quantization " exited with exit code 1 and ou (took 1 hour 59 min 
46 sec)
== Results of the build can be found in the log file(s) 
/tmp/eb-zAAAvr/easybuild-PyTorch-1.7.1-20210601.074610.WfkGf.log
ERROR: Build of 
/home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs/p/PyTorch/PyTorch-1.7.1-fosscuda-2020b.eb 
failed (err: 'build failed (first 300 chars): cmd "export 
PYTHONPATH=/tmp/eb-zAAAvr/tmpnh77Vl/lib/python3.8/site-packages:$PYTHONPATH 
&&  cd test && PYTHONUNBUFFERED=1 
/home/modules/software/Python/3.8.6-GCCcore-10.2.0/bin/python 
run_test.py --verbose -x distributed/rpc/test_process_group_agent 
test_quantization " exited with exit code 1 and ou')



The EB log file shows these 4 errors at the end of the file:

==
ERROR: test_DistributedDataParallel (__main__.TestDistBackendWithFork)
--
Traceback (most recent call last):
   File 
"/tmp/eb-zAAAvr/tmpnh77Vl/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", 
line 267, in wrapper

 self._join_processes(fn)
   File 
"/tmp/eb-zAAAvr/tmpnh77Vl/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", 
line 384, in _join_processes

 self._check_return_codes(elapsed_time)
   File 
"/tmp/eb-zAAAvr/tmpnh77Vl/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", 
line 420, in _check_return_codes

 raise RuntimeError(error)
RuntimeError: Processes 0 1 2 exited with error code 10

==
ERROR: test_DistributedDataParallel_SyncBatchNorm 
(__main__.TestDistBackendWithFork)

--
Traceback (most recent call last):
   File 
"/tmp/eb-zAAAvr/tmpnh77Vl/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", 
line 267, in wrapper

 self._join_processes(fn)
   File 
"/tmp/eb-zAAAvr/tmpnh77Vl/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", 
line 384, in _join_processes

 self._check_return_codes(elapsed_time)
   File 
"/tmp/eb-zAAAvr/tmpnh77Vl/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", 
line 420, in _check_return_codes

 raise RuntimeError(error)
RuntimeError: Processes 0 1 2 exited with error code 10

==
ERROR: 
test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient 
(__main__.TestDistBackendWithFork)

--
Traceback (most recent call last):
   File 
"/tmp/eb-zAAAvr/tmpnh77Vl/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", 
line 267, in wrapper

 self._join_processes(fn)
   File 
"/tmp/eb-zAAAvr/tmpnh77Vl/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.p

[easybuild] Build failure with PyTorch-1.7.1-fosscuda-2020b.eb

2021-06-01 Thread Ole Holm Nielsen
vr/tmpnh77Vl/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", 
line 267, in wrapper

self._join_processes(fn)
  File 
"/tmp/eb-zAAAvr/tmpnh77Vl/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", 
line 384, in _join_processes

self._check_return_codes(elapsed_time)
  File 
"/tmp/eb-zAAAvr/tmpnh77Vl/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", 
line 420, in _check_return_codes

raise RuntimeError(error)
RuntimeError: Processes 0 1 2 exited with error code 10

--
Ran 134 tests in 286.115s

FAILED (errors=4, skipped=91)
Traceback (most recent call last):
  File "run_test.py", line 745, in <module>
main()
  File "run_test.py", line 728, in main
raise RuntimeError(err_message)
RuntimeError: distributed/test_distributed_fork failed!
 (at easybuild/tools/run.py:537 in parse_cmd_output)
== 2021-06-01 09:45:57,406 filetools.py:1810 INFO Removing lock 
/home/modules/software/.locks/_home_modules_software_PyTorch_1.7.1-fosscuda-2020b.lock...
== 2021-06-01 09:45:57,407 filetools.py:347 INFO Path 
/home/modules/software/.locks/_home_modules_software_PyTorch_1.7.1-fosscuda-2020b.lock 
successfully removed.
== 2021-06-01 09:45:57,407 filetools.py:1814 INFO Lock removed: 
/home/modules/software/.locks/_home_modules_software_PyTorch_1.7.1-fosscuda-2020b.lock
== 2021-06-01 09:45:57,407 easyblock.py:3414 WARNING build failed (first 
300 chars): cmd "export 
PYTHONPATH=/tmp/eb-zAAAvr/tmpnh77Vl/lib/python3.8/site-packages:$PYTHONPATH 
&&  cd test && PYTHONUNBUFFERED=1 
/home/modules/software/Python/3.8.6-GCCcore-10.2.0/bin/python run_test.py 
--verbose -x distributed/rpc/test_process_group_agent test_quantization " 
exited with exit code 1 and ou
== 2021-06-01 09:45:57,407 easyblock.py:298 INFO Closing log for 
application name PyTorch version 1.7.1
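
For anyone triaging a similar log: the failing tests are the unittest result headers of the form "ERROR: test_name (module.Class)". A minimal sketch for listing them, assuming the log text is read straight from the EasyBuild log file (headers that a mail client has wrapped over two lines, as in this message, would not match):

```python
import re

# Matches unittest result headers such as
# "ERROR: test_DistributedDataParallel (__main__.TestDistBackendWithFork)".
HEADER = re.compile(r"^(ERROR|FAIL): (\S+) \(([\w.]+)\)\s*$")


def failed_tests(log_text):
    """Return (kind, test_name, test_class) for each unittest failure header."""
    found = []
    for line in log_text.splitlines():
        match = HEADER.match(line)
        if match:
            found.append(match.groups())
    return found
```

EasyBuild's own "ERROR: Build of ..." lines are not followed by a parenthesised class name, so the pattern skips them.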



Question: Does anyone know how to fix these errors?

Thanks,
Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,
Fysikvej Building 309, DK-2800 Kongens Lyngby, Denmark
E-mail: ole.h.niel...@fysik.dtu.dk
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Mobile: (+45) 5180 1620


Re: [easybuild] TensorFlow build fails in //tensorflow/core/common_runtime:graph_constructor_test

2021-05-31 Thread Ole Holm Nielsen
For the record:  I managed to build TensorFlow-2.4.1-fosscuda-2020b.eb 
using this PR:

https://github.com/easybuilders/easybuild-easyconfigs/pull/12979

/Ole

On 5/27/21 1:06 PM, Alexander Grund wrote:
Yes: At the very bottom of the log there should be more information about the 
failed tests. For each of those (2) tests there should be some more 
detailed output.


Search for "At least 2 gpu tests failed" and look below.

FYI: Setting EASYBUILD_TMPDIR to a large directory is not required. 
Temporary files are usually small.


Am 27.05.21 um 13:02 schrieb Ole Holm Nielsen:

On 5/27/21 10:46 AM, Alexander Grund wrote:
 > Alexandre: should we look for patterns like "No space left on 
device" in the Bazel output and highlight them better, perhaps with a 
concrete suggestion to use --tmpdir to avoid the usage of /tmp?


We could in general put something into EasyBuild, yes. I started a PR 
with enhanced error parsing which could maybe be used for that.


I've configured some larger temporary file spaces:
EASYBUILD_TMPDIR=/scratch/modules  (800+ GB available)
EASYBUILD_BUILDPATH=/dev/shm   (94 GB size)

and try to build TensorFlow:

$ eb TensorFlow-2.4.1-fosscuda-2020b.eb 
--cuda-compute-capabilities=8.0,8.6 --tmpdir=/scratch/modules
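
Before starting a long build it can help to confirm how much space the temporary and build directories really have. A minimal sketch using the standard library; the two paths are the ones mentioned above, while the 50 GiB threshold is an arbitrary placeholder and not a documented TensorFlow requirement:

```python
import shutil


def free_gib(path):
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def report(paths, minimum_gib):
    """Return {path: (free_gib, ok)}; unreadable paths map to (None, False)."""
    result = {}
    for path in paths:
        try:
            free = free_gib(path)
        except OSError:
            # Path does not exist or cannot be queried.
            result[path] = (None, False)
        else:
            result[path] = (free, free >= minimum_gib)
    return result


if __name__ == "__main__":
    # Paths from the message above; the threshold is a placeholder.
    for path, (free, ok) in report(["/scratch/modules", "/dev/shm"], 50).items():
        shown = "unavailable" if free is None else f"{free:.1f} GiB free"
        print(f"{path}: {shown} ({'ok' if ok else 'check'})")
```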


== installing extension TensorFlow 2.4.1 (28/28)...
== configuring...
== building...
== testing...
== FAILED: Installation ended unsuccessfully (build directory: 
/dev/shm/TensorFlow/2.4.1/fosscuda-2020b): build failed (first 300 
chars): At least 2 gpu tests failed:
//tensorflow/core/common_runtime/gpu:gpu_device_test, 
//tensorflow/core/common_runtime/gpu:gpu_device_unified_memory_test_gpu 
(took 55 min 27 sec)
== Results of the build can be found in the log file(s) 
/scratch/modules/eb-3l5Ptk/easybuild-TensorFlow-2.4.1-20210527.114011.EmOkP.log 

ERROR: Build of 
/home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs/t/TensorFlow/TensorFlow-2.4.1-fosscuda-2020b.eb 
failed (err: 'build failed (first 300 chars): At least 2 gpu tests 
failed:\n//tensorflow/core/common_runtime/gpu:gpu_device_test, 
//tensorflow/core/common_runtime/gpu:gpu_device_unified_memory_test_gpu')


...

Is there anything else I should look for in the logfile (size: 234 MB)?


Re: [easybuild] Build fails for OpenBLAS-0.3.12-GCC-10.2.0.eb on Intel Ice Lake processors

2021-05-31 Thread Ole Holm Nielsen

Hi Simon,

On 5/28/21 2:50 PM, Simon Branford wrote:

OpenBLAS recently added IceLake detection: 
https://github.com/xianyi/OpenBLAS/pull/3233

This has been patched in EasyBuild for OpenBLAS 0.3.12 and 0.3.15: 
https://github.com/easybuilders/easybuild-easyconfigs/pull/12865


Thanks a lot, OpenBLAS 0.3.12 does build correctly with:

$ eb --from-pr=12865

/Ole


-Original Message-
From: easybuild-requ...@lists.ugent.be  On 
Behalf Of ole.h.niel...@fysik.dtu.dk
Sent: 28 May 2021 13:40
To: easybuild@lists.ugent.be
Subject: [easybuild] Build fails for OpenBLAS-0.3.12-GCC-10.2.0.eb on Intel Ice 
Lake processors

Hi,

I'm building our software stack on a new Intel Ice Lake server (Xeon Gold
6342 CPU @ 2.80GHz 24 cores dual-socket) running AlmaLinux 8.4 (RHEL 8.4 
clone).  So this is bleeding-edge CPU and OS :-)

The OpenBLAS-0.3.12-GCC-10.2.0.eb build fails:

$ eb OpenBLAS-0.3.12-GCC-10.2.0.eb
== Temporary log file in case of crash /tmp/eb-2c6j9qyj/easybuild-qhl_uje4.log
== found valid index for
/home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs, so using it...
== found valid index for
/home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs, so using it...
== processing EasyBuild easyconfig
/home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs/o/OpenBLAS/OpenBLAS-0.3.12-GCC-10.2.0.eb
== building and installing OpenBLAS/0.3.12-GCC-10.2.0...
== fetching files...
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== FAILED: Installation ended unsuccessfully (build directory:
/dev/shm/OpenBLAS/0.3.12/GCC-10.2.0): build failed (first 300 chars): cmd " 
make -j 48 libs netlib shared  BINARY='64'  CC='gcc'  FC='gfortran'
MAKE_NB_JOBS='-1'  USE_OPENMP='1'  USE_THREAD='1'  CFLAGS='-O2 -ftree-vectorize 
-march=native -fno-math-errno' " exited with exit code 2 and output:
getarch_2nd.c: In function main:
getarch_2nd.c:14:35: error: SGEMM_DEFAULT_U (took 1 sec)
== Results of the build can be found in the log file(s) 
/tmp/eb-2c6j9qyj/easybuild-OpenBLAS-0.3.12-20210528.142832.FCqTY.log
ERROR: Build of
/home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs/o/OpenBLAS/OpenBLAS-0.3.12-GCC-10.2.0.eb
failed (err: 'build failed (first 300 chars): cmd " make -j 48 libs netlib 
shared  BINARY=\'64\'  CC=\'gcc\'  FC=\'gfortran\'  MAKE_NB_JOBS=\'-1\'
USE_OPENMP=\'1\'  USE_THREAD=\'1\'  CFLAGS=\'-O2 -ftree-vectorize -march=native 
-fno-math-errno\' " exited with exit code 2 and
output:\ngetarch_2nd.c: In function main:\ngetarch_2nd.c:14:35: error:
SGEMM_DEFAULT_U')


The EB log file ends with:

make: *** [Makefile.prebuild:70: getarch_2nd] Error 1
Makefile:154: *** OpenBLAS: Detecting CPU failed. Please set TARGET explicitly, 
e.g. make TARGET=your_cpu_target. Please read README for the detail..  Stop.
   (at easybuild/tools/run.py:537 in parse_cmd_output)
== 2021-05-28 14:28:34,190 filetools.py:1810 INFO Removing lock 
/home/modules/software/.locks/_home_modules_software_OpenBLAS_0.3.12-GCC-10.2.0.lock...
== 2021-05-28 14:28:34,191 filetools.py:347 INFO Path 
/home/modules/software/.locks/_home_modules_software_OpenBLAS_0.3.12-GCC-10.2.0.lock
successfully removed.
== 2021-05-28 14:28:34,191 filetools.py:1814 INFO Lock removed:
/home/modules/software/.locks/_home_modules_software_OpenBLAS_0.3.12-GCC-10.2.0.lock
== 2021-05-28 14:28:34,191 easyblock.py:3414 WARNING build failed (first
300 chars): cmd " make -j 48 libs netlib shared  BINARY='64'  CC='gcc'
FC='gfortran'  MAKE_NB_JOBS='-1'  USE_OPENMP='1'  USE_THREAD='1'
CFLAGS='-O2 -ftree-vectorize -march=native -fno-math-errno' " exited with exit 
code 2 and output:
getarch_2nd.c: In function main:
getarch_2nd.c:14:35: error: SGEMM_DEFAULT_U
== 2021-05-28 14:28:34,192 easyblock.py:298 INFO Closing log for 
application name OpenBLAS version 0.3.12

Question: How can I tell OpenBLAS that we have an Intel Ice Lake CPU?

It seems that OpenBLAS doesn't know about Ice Lake or Cascade Lake :-( 
https://github.com/xianyi/OpenBLAS/blob/develop/TargetList.txt

Thanks,
Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark, Fysikvej Building 309, 
DK-2800 Kongens Lyngby, Denmark
E-mail: ole.h.niel...@fysik.dtu.dk
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Mobile: (+45) 5180 1620



--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,
Fysikvej Building 309, DK-2800 Kongens Lyngby, Denmark
E-mail: ole.h.niel...@fysik.dtu.dk
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Mobile: (+45) 5180 1620


Re: [easybuild] Build fails for OpenBLAS-0.3.12-GCC-10.2.0.eb on Intel Ice Lake processors

2021-05-31 Thread Ole Holm Nielsen
My mistake!  I was playing with the TARGET variable, trying to set the 
Ice Lake architecture.  OpenBLAS-0.3.12-GCC-10.2.0.eb *does* build correctly 
on Ice Lake.


/Ole

On 5/29/21 1:29 PM, Åke Sandgren wrote:

First of all, check the results log:
/tmp/eb-jcym6or4/easybuild-OpenBLAS-0.3.12-20210528.152213.BZayz.log
and figure out what the actual problem is.

On 5/28/21 3:26 PM, Ole Holm Nielsen wrote:

Hi Simon,

On 5/28/21 2:50 PM, Simon Branford wrote:

OpenBLAS recently added IceLake detection:
https://github.com/xianyi/OpenBLAS/pull/3233


Thanks a lot for the info!  It seems that Ice Lake gets detected as
CPUTYPE_SKYLAKEX?


This has been patched in EasyBuild for OpenBLAS 0.3.12 and 0.3.15:
https://github.com/easybuilders/easybuild-easyconfigs/pull/12865


Unfortunately this PR fails on my Ice Lake server (running AlmaLinux 8.4):

$ eb --from-pr=12865
== Temporary log file in case of crash
/tmp/eb-jcym6or4/easybuild-vjctr76k.log
== found valid index for
/home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs, so using
it...
== found valid index for
/home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs, so using
it...
== processing EasyBuild easyconfig
/tmp/eb-jcym6or4/files_pr12865/o/OpenBLAS/OpenBLAS-0.3.12-GCC-10.2.0.eb
== building and installing OpenBLAS/0.3.12-GCC-10.2.0...
== fetching files...
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== FAILED: Installation ended unsuccessfully (build directory:
/dev/shm/OpenBLAS/0.3.12/GCC-10.2.0): build failed (first 300 chars):
cmd " make -j 48 libs netlib shared  BINARY='64'  CC='gcc'
FC='gfortran' MAKE_NB_JOBS='-1'  USE_OPENMP='1'  USE_THREAD='1'
CFLAGS='-O2 -ftree-vectorize -march=native -fno-math-errno' " exited
with exit code 2 and output:
: warning: ISO C99 requires whitespace after the macro name
ge (took 1 sec)
== Results of the build can be found in the log file(s)
/tmp/eb-jcym6or4/easybuild-OpenBLAS-0.3.12-20210528.152213.BZayz.log
ERROR: Build of
/tmp/eb-jcym6or4/files_pr12865/o/OpenBLAS/OpenBLAS-0.3.12-GCC-10.2.0.eb
failed (err: 'build failed (first 300 chars): cmd " make -j 48 libs
netlib shared  BINARY=\'64\'  CC=\'gcc\'  FC=\'gfortran\'
MAKE_NB_JOBS=\'-1\' USE_OPENMP=\'1\'  USE_THREAD=\'1\'  CFLAGS=\'-O2
-ftree-vectorize -march=native -fno-math-errno\' " exited with exit code
2 and output:\n: warning: ISO C99 requires whitespace
after the macro name\nge')


Does anyone have an idea about how to fix this?






[easybuild] Build fails for OpenBLAS-0.3.12-GCC-10.2.0.eb on Intel Ice Lake processors

2021-05-28 Thread Ole Holm Nielsen

Hi,

I'm building our software stack on a new Intel Ice Lake server (Xeon Gold 
6342 CPU @ 2.80GHz 24 cores dual-socket) running AlmaLinux 8.4 (RHEL 8.4 
clone).  So this is bleeding-edge CPU and OS :-)


The OpenBLAS-0.3.12-GCC-10.2.0.eb build fails:

$ eb OpenBLAS-0.3.12-GCC-10.2.0.eb
== Temporary log file in case of crash /tmp/eb-2c6j9qyj/easybuild-qhl_uje4.log
== found valid index for 
/home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs, so using it...
== found valid index for 
/home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs, so using it...
== processing EasyBuild easyconfig 
/home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs/o/OpenBLAS/OpenBLAS-0.3.12-GCC-10.2.0.eb

== building and installing OpenBLAS/0.3.12-GCC-10.2.0...
== fetching files...
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== FAILED: Installation ended unsuccessfully (build directory: 
/dev/shm/OpenBLAS/0.3.12/GCC-10.2.0): build failed (first 300 chars): cmd 
" make -j 48 libs netlib shared  BINARY='64'  CC='gcc'  FC='gfortran' 
MAKE_NB_JOBS='-1'  USE_OPENMP='1'  USE_THREAD='1'  CFLAGS='-O2 
-ftree-vectorize -march=native -fno-math-errno' " exited with exit code 2 
and output:

getarch_2nd.c: In function main:
getarch_2nd.c:14:35: error: SGEMM_DEFAULT_U (took 1 sec)
== Results of the build can be found in the log file(s) 
/tmp/eb-2c6j9qyj/easybuild-OpenBLAS-0.3.12-20210528.142832.FCqTY.log
ERROR: Build of 
/home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs/o/OpenBLAS/OpenBLAS-0.3.12-GCC-10.2.0.eb 
failed (err: 'build failed (first 300 chars): cmd " make -j 48 libs netlib 
shared  BINARY=\'64\'  CC=\'gcc\'  FC=\'gfortran\'  MAKE_NB_JOBS=\'-1\' 
USE_OPENMP=\'1\'  USE_THREAD=\'1\'  CFLAGS=\'-O2 -ftree-vectorize 
-march=native -fno-math-errno\' " exited with exit code 2 and 
output:\ngetarch_2nd.c: In function main:\ngetarch_2nd.c:14:35: error: 
SGEMM_DEFAULT_U')



The EB log file ends with:

make: *** [Makefile.prebuild:70: getarch_2nd] Error 1
Makefile:154: *** OpenBLAS: Detecting CPU failed. Please set TARGET 
explicitly, e.g. make TARGET=your_cpu_target. Please read README for the 
detail..  Stop.

 (at easybuild/tools/run.py:537 in parse_cmd_output)
== 2021-05-28 14:28:34,190 filetools.py:1810 INFO Removing lock 
/home/modules/software/.locks/_home_modules_software_OpenBLAS_0.3.12-GCC-10.2.0.lock...
== 2021-05-28 14:28:34,191 filetools.py:347 INFO Path 
/home/modules/software/.locks/_home_modules_software_OpenBLAS_0.3.12-GCC-10.2.0.lock 
successfully removed.
== 2021-05-28 14:28:34,191 filetools.py:1814 INFO Lock removed: 
/home/modules/software/.locks/_home_modules_software_OpenBLAS_0.3.12-GCC-10.2.0.lock
== 2021-05-28 14:28:34,191 easyblock.py:3414 WARNING build failed (first 
300 chars): cmd " make -j 48 libs netlib shared  BINARY='64'  CC='gcc' 
FC='gfortran'  MAKE_NB_JOBS='-1'  USE_OPENMP='1'  USE_THREAD='1' 
CFLAGS='-O2 -ftree-vectorize -march=native -fno-math-errno' " exited with 
exit code 2 and output:

getarch_2nd.c: In function main:
getarch_2nd.c:14:35: error: SGEMM_DEFAULT_U
== 2021-05-28 14:28:34,192 easyblock.py:298 INFO Closing log for 
application name OpenBLAS version 0.3.12


Question: How can I tell OpenBLAS that we have an Intel Ice Lake CPU?

It seems that OpenBLAS doesn't know about Ice Lake or Cascade Lake :-(
https://github.com/xianyi/OpenBLAS/blob/develop/TargetList.txt
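
When picking an explicit TARGET by hand, it is useful to first see which CPU model and SIMD flags the kernel reports. A minimal sketch, assuming an x86 Linux host where /proc/cpuinfo has "model name" and "flags" fields; it only reports what the kernel sees and does not reproduce OpenBLAS's own detection logic:

```python
def parse_cpuinfo(text):
    """Return (model_name, flags) for the first CPU entry in cpuinfo text."""
    model, flags = None, set()
    for line in text.splitlines():
        key, _sep, value = line.partition(":")
        key = key.strip()
        if key == "model name" and model is None:
            model = value.strip()
        elif key == "flags" and not flags:
            flags = set(value.split())
    return model, flags


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            model, flags = parse_cpuinfo(handle.read())
    except OSError:
        model, flags = None, set()
    print("model:", model)
    # List the AVX-512 feature flags, if any are reported.
    print("avx512 flags:", sorted(f for f in flags if f.startswith("avx512")))
```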

Thanks,
Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,
Fysikvej Building 309, DK-2800 Kongens Lyngby, Denmark
E-mail: ole.h.niel...@fysik.dtu.dk
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Mobile: (+45) 5180 1620


Re: [easybuild] Build fails for OpenBLAS-0.3.12-GCC-10.2.0.eb on Intel Ice Lake processors

2021-05-28 Thread Ole Holm Nielsen

Hi Simon,

On 5/28/21 2:50 PM, Simon Branford wrote:

OpenBLAS recently added IceLake detection: 
https://github.com/xianyi/OpenBLAS/pull/3233


Thanks a lot for the info!  It seems that Ice Lake gets detected as 
CPUTYPE_SKYLAKEX?



This has been patched in EasyBuild for OpenBLAS 0.3.12 and 0.3.15: 
https://github.com/easybuilders/easybuild-easyconfigs/pull/12865


Unfortunately this PR fails on my Ice Lake server (running AlmaLinux 8.4):

$ eb --from-pr=12865
== Temporary log file in case of crash /tmp/eb-jcym6or4/easybuild-vjctr76k.log
== found valid index for 
/home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs, so using it...
== found valid index for 
/home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs, so using it...
== processing EasyBuild easyconfig 
/tmp/eb-jcym6or4/files_pr12865/o/OpenBLAS/OpenBLAS-0.3.12-GCC-10.2.0.eb

== building and installing OpenBLAS/0.3.12-GCC-10.2.0...
== fetching files...
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== FAILED: Installation ended unsuccessfully (build directory: 
/dev/shm/OpenBLAS/0.3.12/GCC-10.2.0): build failed (first 300 chars): cmd 
" make -j 48 libs netlib shared  BINARY='64'  CC='gcc'  FC='gfortran' 
MAKE_NB_JOBS='-1'  USE_OPENMP='1'  USE_THREAD='1'  CFLAGS='-O2 
-ftree-vectorize -march=native -fno-math-errno' " exited with exit code 2 
and output:

: warning: ISO C99 requires whitespace after the macro name
ge (took 1 sec)
== Results of the build can be found in the log file(s) 
/tmp/eb-jcym6or4/easybuild-OpenBLAS-0.3.12-20210528.152213.BZayz.log
ERROR: Build of 
/tmp/eb-jcym6or4/files_pr12865/o/OpenBLAS/OpenBLAS-0.3.12-GCC-10.2.0.eb 
failed (err: 'build failed (first 300 chars): cmd " make -j 48 libs netlib 
shared  BINARY=\'64\'  CC=\'gcc\'  FC=\'gfortran\'  MAKE_NB_JOBS=\'-1\' 
USE_OPENMP=\'1\'  USE_THREAD=\'1\'  CFLAGS=\'-O2 -ftree-vectorize 
-march=native -fno-math-errno\' " exited with exit code 2 and 
output:\n: warning: ISO C99 requires whitespace after the 
macro name\nge')



Does anyone have an idea about how to fix this?

Thanks,
Ole


-Original Message-
From: easybuild-requ...@lists.ugent.be  On 
Behalf Of ole.h.niel...@fysik.dtu.dk
Sent: 28 May 2021 13:40
To: easybuild@lists.ugent.be
Subject: [easybuild] Build fails for OpenBLAS-0.3.12-GCC-10.2.0.eb on Intel Ice 
Lake processors

Hi,

I'm building our software stack on a new Intel Ice Lake server (Xeon Gold
6342 CPU @ 2.80GHz 24 cores dual-socket) running AlmaLinux 8.4 (RHEL 8.4 
clone).  So this is bleeding-edge CPU and OS :-)

The OpenBLAS-0.3.12-GCC-10.2.0.eb build fails:

$ eb OpenBLAS-0.3.12-GCC-10.2.0.eb
== Temporary log file in case of crash /tmp/eb-2c6j9qyj/easybuild-qhl_uje4.log
== found valid index for
/home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs, so using it...
== found valid index for
/home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs, so using it...
== processing EasyBuild easyconfig
/home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs/o/OpenBLAS/OpenBLAS-0.3.12-GCC-10.2.0.eb
== building and installing OpenBLAS/0.3.12-GCC-10.2.0...
== fetching files...
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== FAILED: Installation ended unsuccessfully (build directory:
/dev/shm/OpenBLAS/0.3.12/GCC-10.2.0): build failed (first 300 chars): cmd " 
make -j 48 libs netlib shared  BINARY='64'  CC='gcc'  FC='gfortran'
MAKE_NB_JOBS='-1'  USE_OPENMP='1'  USE_THREAD='1'  CFLAGS='-O2 -ftree-vectorize 
-march=native -fno-math-errno' " exited with exit code 2 and output:
getarch_2nd.c: In function main:
getarch_2nd.c:14:35: error: SGEMM_DEFAULT_U (took 1 sec)
== Results of the build can be found in the log file(s) 
/tmp/eb-2c6j9qyj/easybuild-OpenBLAS-0.3.12-20210528.142832.FCqTY.log
ERROR: Build of
/home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs/o/OpenBLAS/OpenBLAS-0.3.12-GCC-10.2.0.eb
failed (err: 'build failed (first 300 chars): cmd " make -j 48 libs netlib 
shared  BINARY=\'64\'  CC=\'gcc\'  FC=\'gfortran\'  MAKE_NB_JOBS=\'-1\'
USE_OPENMP=\'1\'  USE_THREAD=\'1\'  CFLAGS=\'-O2 -ftree-vectorize -march=native 
-fno-math-errno\' " exited with exit code 2 and
output:\ngetarch_2nd.c: In function main:\ngetarch_2nd.c:14:35: error:
SGEMM_DEFAULT_U')


The EB log file ends with:

make: *** [Makefile.prebuild:70: getarch_2nd] Error 1
Makefile:154: *** OpenBLAS: Detecting CPU failed. Please set TARGET explicitly, 
e.g. make TARGET=your_cpu_target. Please read README for the detail..  Stop.
   (at easybuild/tools/run.py:537 in parse_cmd_output)
== 2021-05-28 14:28:34,190 filetools.py:1810 INFO Removing lock 
/home/modules/software/.locks/_home_modules_software_OpenBLAS_0.3.12-GCC-10.2.0.lock...
== 2021-05-28 14:28:34,191 filetools.py:347 INFO Path 
/home/modules/software/.locks/_home_modules_software_OpenBLAS_0.3.12-GCC-10.2.0.lock
successfully removed.
== 

Re: [easybuild] TensorFlow build fails in //tensorflow/core/common_runtime:graph_constructor_test

2021-05-27 Thread Ole Holm Nielsen

On 5/27/21 1:06 PM, Alexander Grund wrote:
Yes: At the very bottom of the log there should be more information about the 
failed tests. For each of those (2) tests there should be some more 
detailed output.


Search for "At least 2 gpu tests failed" and look below.


This is at the very end of the logfile:

[--] Global test environment tear-down
[==] 19 tests from 2 test suites ran. (2972 ms total)
[  PASSED  ] 18 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] GPUDeviceTest.SingleVirtualDeviceWithInvalidPriority

 1 FAILED TEST

== 2021-05-27 12:35:39,386 build_log.py:169 ERROR EasyBuild crashed with 
an error (at easybuild/base/exceptions.py:124 in __init__): At least 2 gpu 
tests failed:
//tensorflow/core/common_runtime/gpu:gpu_device_test, 
//tensorflow/core/common_runtime/gpu:gpu_device_unified_memory_test_gpu 
(at easybuild/easyblocks/t/tensorflow.py:973 in test_step)
== 2021-05-27 12:35:39,386 filetools.py:1810 INFO Removing lock 
/home/modules/software/.locks/_home_modules_software_TensorFlow_2.4.1-fosscuda-2020b.lock...
== 2021-05-27 12:35:39,387 filetools.py:347 INFO Path 
/home/modules/software/.locks/_home_modules_software_TensorFlow_2.4.1-fosscuda-2020b.lock 
successfully removed.
== 2021-05-27 12:35:39,388 filetools.py:1814 INFO Lock removed: 
/home/modules/software/.locks/_home_modules_software_TensorFlow_2.4.1-fosscuda-2020b.lock
== 2021-05-27 12:35:39,388 easyblock.py:3414 WARNING build failed (first 
300 chars): At least 2 gpu tests failed:
//tensorflow/core/common_runtime/gpu:gpu_device_test, 
//tensorflow/core/common_runtime/gpu:gpu_device_unified_memory_test_gpu
== 2021-05-27 12:35:39,388 easyblock.py:298 INFO Closing log for 
application name TensorFlow version 2.4.1



If you are willing to help by analyzing the logfile, I can gzip it and send you a URL.

Thanks,
Ole


FYI: Setting EASYBUILD_TMPDIR to a large directory is not required. 
Temporary files are usually small.


On 27.05.21 at 13:02, Ole Holm Nielsen wrote:

...

Is there anything else I should look for in the logfile (size: 234 MB)?
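
With a 234 MB log it can help to stream the file and print only the lines that follow the marker Alexander mentions ("At least 2 gpu tests failed"). The following is a minimal sketch, not part of EasyBuild; the function name and the default of 40 context lines are assumptions made for illustration.

```python
from itertools import islice


def lines_after_marker(lines, marker, count=40):
    """Return the first line containing `marker` plus up to `count` lines after it.

    `lines` can be any iterable of strings, e.g. an open file handle,
    so a large log is read line by line rather than loaded into memory.
    """
    iterator = iter(lines)
    for line in iterator:
        if marker in line:
            following = islice(iterator, count)
            return [line.rstrip("\n")] + [item.rstrip("\n") for item in following]
    return []  # marker not found
```

To use it on a real log, open the file (for example with `open(path, errors="replace")`) and pass the handle together with the marker string; the returned list can then be printed line by line.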




Re: [easybuild] TensorFlow build fails in //tensorflow/core/common_runtime:graph_constructor_test

2021-05-27 Thread Ole Holm Nielsen

On 5/27/21 10:46 AM, Alexander Grund wrote:
 > Alexandre: should we look for patterns like "No space left on device" 
in the Bazel output and highlight them better, perhaps with a concrete 
suggestion to use --tmpdir to avoid the usage of /tmp?


We could in general put something into EasyBuild, yes. I started a PR with 
enhanced error parsing which could maybe be used for that.
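
As a rough idea of what such pattern matching could look like, here is a small sketch. It is not the code from the PR mentioned above and not how EasyBuild parses output; the function names and the wording of the hint are made up for illustration.

```python
NO_SPACE = "No space left on device"


def find_disk_full_errors(log_text):
    """Return the (stripped) lines of `log_text` that report a full filesystem."""
    return [line.strip() for line in log_text.splitlines() if NO_SPACE in line]


def disk_full_hint(log_text):
    """Return a short hint if the output contains disk-full errors, else None."""
    hits = find_disk_full_errors(log_text)
    if not hits:
        return None
    return ("%d 'No space left on device' error(s) found; the build or temporary "
            "directory may be full (see --buildpath and --tmpdir)" % len(hits))
```

Note that the linker message in this thread was wrapped over several lines by the mail client; in the raw log the path and the error text would normally sit on one line.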


I've configured some larger temporary file spaces:
EASYBUILD_TMPDIR=/scratch/modules  (800+ GB available)
EASYBUILD_BUILDPATH=/dev/shm   (94 GB size)

and try to build TensorFlow:

$ eb TensorFlow-2.4.1-fosscuda-2020b.eb 
--cuda-compute-capabilities=8.0,8.6 --tmpdir=/scratch/modules


== installing extension TensorFlow 2.4.1 (28/28)...
==  configuring...
==  building...
==  testing...
== FAILED: Installation ended unsuccessfully (build directory: 
/dev/shm/TensorFlow/2.4.1/fosscuda-2020b): build failed (first 300 chars): 
At least 2 gpu tests failed:
//tensorflow/core/common_runtime/gpu:gpu_device_test, 
//tensorflow/core/common_runtime/gpu:gpu_device_unified_memory_test_gpu 
(took 55 min 27 sec)
== Results of the build can be found in the log file(s) 
/scratch/modules/eb-3l5Ptk/easybuild-TensorFlow-2.4.1-20210527.114011.EmOkP.log
ERROR: Build of 
/home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs/t/TensorFlow/TensorFlow-2.4.1-fosscuda-2020b.eb 
failed (err: 'build failed (first 300 chars): At least 2 gpu tests 
failed:\n//tensorflow/core/common_runtime/gpu:gpu_device_test, 
//tensorflow/core/common_runtime/gpu:gpu_device_unified_memory_test_gpu')


In the logfile I see multiple FAILED tests:

$ grep FAILED 
/scratch/modules/eb-3l5Ptk/easybuild-TensorFlow-2.4.1-20210527.114011.EmOkP.log

FAILED: //tensorflow/core/common_runtime/gpu:gpu_device_test (Summary)
[  FAILED  ] GPUDeviceTest.SingleVirtualDeviceWithInvalidPriority (79 ms)
[  FAILED  ] 1 test, listed below:
[  FAILED  ] GPUDeviceTest.SingleVirtualDeviceWithInvalidPriority
 1 FAILED TEST
[  FAILED  ] GPUDeviceTest.SingleVirtualDeviceWithInvalidPriority (323 ms)
[  FAILED  ] 1 test, listed below:
[  FAILED  ] GPUDeviceTest.SingleVirtualDeviceWithInvalidPriority
 1 FAILED TEST
[  FAILED  ] GPUDeviceTest.SingleVirtualDeviceWithInvalidPriority (128 ms)
[  FAILED  ] 1 test, listed below:
[  FAILED  ] GPUDeviceTest.SingleVirtualDeviceWithInvalidPriority
 1 FAILED TEST
FAILED: 
//tensorflow/core/common_runtime/gpu:gpu_device_unified_memory_test_gpu 
(Summary)

[  FAILED  ] GPUDeviceTest.SingleVirtualDeviceWithInvalidPriority (40 ms)
[  FAILED  ] 1 test, listed below:
[  FAILED  ] GPUDeviceTest.SingleVirtualDeviceWithInvalidPriority
 1 FAILED TEST
[  FAILED  ] GPUDeviceTest.SingleVirtualDeviceWithInvalidPriority (158 ms)
[  FAILED  ] 1 test, listed below:
[  FAILED  ] GPUDeviceTest.SingleVirtualDeviceWithInvalidPriority
 1 FAILED TEST
[  FAILED  ] GPUDeviceTest.SingleVirtualDeviceWithInvalidPriority (77 ms)
[  FAILED  ] 1 test, listed below:
[  FAILED  ] GPUDeviceTest.SingleVirtualDeviceWithInvalidPriority
 1 FAILED TEST
//tensorflow/core/common_runtime/gpu:gpu_device_test 
FAILED in 3 out of 3 in 4.8s
//tensorflow/core/common_runtime/gpu:gpu_device_unified_memory_test_gpu 
FAILED in 3 out of 3 in 3.5s
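
All of the individual failures in the grep output above are the same test case, run three times for each of the two Bazel targets. A small sketch for condensing such output into one count per test case follows; the regular expression is an assumption based only on the line format shown above, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches per-run lines such as "[  FAILED  ] Suite.Case (79 ms)";
# the summary lines without a timing are skipped on purpose to avoid double counting.
FAILED_CASE = re.compile(r"\[\s*FAILED\s*\]\s+(\w+\.\w+)\s+\(\d+ ms\)")


def count_failed_cases(lines):
    """Count how often each googletest case is reported as failed."""
    counts = Counter()
    for line in lines:
        match = FAILED_CASE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the lines of the log (or the grep output) gives a mapping from test case name to number of failed runs, which is easier to read than the repeated blocks.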


Re: [easybuild] TensorFlow build fails in //tensorflow/core/common_runtime:graph_constructor_test

2021-05-27 Thread Ole Holm Nielsen

On 5/27/21 10:46 AM, Alexander Grund wrote:
/home/modules/software/binutils/2.35-GCCcore-10.2.0/bin/ld.gold: fatal 
error: 
bazel-out/k8-opt/bin/tensorflow/core/common_runtime/graph_constructor_test: 
No space left on device


What device might that be?  As shown above, I have quite a bit of disk 
space.  Is /tmp being used and getting full?


 > export EASYBUILD_BUILDPATH=/run/user/$UID/eb_build

 > tmpfs   19G   19G   30M 100% /run/user/983

This clearly shows that your buildpath is full, so that is the issue. Try 
using another buildpath. Kenneth is right: we make sure Bazel doesn't use 
/tmp.


I have found out that /run/user/$UID defaults to 10% of the system RAM, 
as defined in /etc/systemd/logind.conf (see man 5 logind.conf). 
This 10% value is 19 GB on my server. It seems prudent to use 
/dev/shm instead:


export EASYBUILD_BUILDPATH=/dev/shm

While building TensorFlow, usage of /dev/shm grows to a gigantic size:

# df -Ph /dev/shm
Filesystem      Size  Used Avail Use% Mounted on
tmpfs            94G   46G   48G  50% /dev/shm
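
Since the build needed tens of GB in the build directory here, it may be worth checking free space before starting. The sketch below uses only the Python standard library; the helper names and any threshold you pass in are assumptions, not EasyBuild functionality.

```python
import shutil


def free_gib(path):
    """Free space on the filesystem that holds `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def pick_build_path(candidates, needed_gib):
    """Return the first candidate with at least `needed_gib` GiB free, else None."""
    for path in candidates:
        try:
            if free_gib(path) >= needed_gib:
                return path
        except OSError:
            continue  # path does not exist or cannot be queried
    return None
```

For example, `pick_build_path(["/dev/shm", "/scratch/modules"], 50)` would return whichever of the two locations from this thread has at least 50 GiB free, and the result could then be exported as EASYBUILD_BUILDPATH.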

Unfortunately, the build still fails and I need to look for the source of 
errors in the logfile:


== installing extension TensorFlow 2.4.1 (28/28)...
==  configuring...
==  building...
==  testing...
== FAILED: Installation ended unsuccessfully (build directory: 
/dev/shm/TensorFlow/2.4.1/fosscuda-2020b): build failed (first 300 chars): 
At least 2 gpu tests failed:
//tensorflow/core/common_runtime/gpu:gpu_device_test, 
//tensorflow/core/common_runtime/gpu:gpu_device_unified_memory_test_gpu 
(took 55 min 27 sec)
== Results of the build can be found in the log file(s) 
/scratch/modules/eb-3l5Ptk/easybuild-TensorFlow-2.4.1-20210527.114011.EmOkP.log
ERROR: Build of 
/home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs/t/TensorFlow/TensorFlow-2.4.1-fosscuda-2020b.eb 
failed (err: 'build failed (first 300 chars): At least 2 gpu tests 
failed:\n//tensorflow/core/common_runtime/gpu:gpu_device_test, 
//tensorflow/core/common_runtime/gpu:gpu_device_unified_memory_test_gpu')



/Ole


Re: [easybuild] TensorFlow build fails in //tensorflow/core/common_runtime:graph_constructor_test

2021-05-27 Thread Ole Holm Nielsen

Hi Loris,

On 5/27/21 10:34 AM, Loris Bennett wrote:

What device might that be?  As shown above, I have quite a bit of disk space.
Is /tmp being used and getting full?


This might be the case.  In the past I ran into this problem and solved
it with the following:

   eb TensorFlow-1.15.0-fosscuda-2019b-Python-3.7.4.eb --robot 
--cuda-compute-capabilities=6.1,7.5 --buildpath=/dev/shm 
--tmpdir=/scratch/eb-build


Yes, I configured that with:

export EASYBUILD_BUILDPATH=/run/user/$UID/eb_build
ulimit -s 2000240
export EASYBUILD_TMPDIR=/scratch/$USER
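
The environment variables above correspond to the command-line options Loris used: EasyBuild derives the variable name from the long option by adding the EASYBUILD_ prefix, upper-casing, and replacing dashes with underscores. A tiny sketch of that naming rule (the function itself is hypothetical, written only to illustrate the convention):

```python
def option_to_env_var(option):
    """Map an eb long option such as '--buildpath' to its environment variable name."""
    name = option.lstrip("-")
    return "EASYBUILD_" + name.upper().replace("-", "_")
```

So `--buildpath=/dev/shm` and `export EASYBUILD_BUILDPATH=/dev/shm` configure the same setting, and likewise for `--tmpdir` and EASYBUILD_TMPDIR.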

Thanks,
Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,
Fysikvej Building 309, DK-2800 Kongens Lyngby, Denmark
E-mail: ole.h.niel...@fysik.dtu.dk
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Mobile: (+45) 5180 1620


Re: [easybuild] TensorFlow build fails in //tensorflow/core/common_runtime:graph_constructor_test

2021-05-27 Thread Ole Holm Nielsen
: @local_execution_config_platform//:platform]
ERROR: 
/run/user/983/eb_build/TensorFlow/2.4.1/fosscuda-2020b/TensorFlow/tensorflow-2.4.1/tensorflow/core/common_runtime/BUILD:2700:11: 
Linking of rule '//tensorflow/core/common_runtime:graph_constructor_test' 
failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command
/home/modules/software/binutils/2.35-GCCcore-10.2.0/bin/ld.gold: fatal 
error: 
bazel-out/k8-opt/bin/tensorflow/core/common_runtime/graph_constructor_test: 
No space left on device
collect2: error: ld returned 1 exit status
FAILED: Build did NOT complete successfully
//tensorflow/core/common_runtime:graph_constructor_test FAILED TO 
BUILD
FAILED: Build did NOT complete successfully
== 2021-05-26 15:30:49,145 run.py:554 WARNING Found 11 errors in command 
output (output: WARNING: Download from 
https://storage.googleapis.com/mirror.tensorflow.org/github.com/llvm/llvm-project/archive/f402e682d0ef5598eeffc9a21a691b03e602ff58.tar.gz 
failed: class 
com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException 
GET returned 404 Not Found
SUBCOMMAND: # //tensorflow/core/platform:error [action 'Linking 
tensorflow/core/platform/liberror.so', configuration: 
f6bc5b6107d950b9fac2186352cdfdfe45c6815016e3edc9f32af940b50d30a6, 
execution platform: @local_execution_config_platform//:platform]
SUBCOMMAND: # //tensorflow/core/platform:error [action 'Compiling 
tensorflow/core/platform/error.cc', configuration: 
f6bc5b6107d950b9fac2186352cdfdfe45c6815016e3edc9f32af940b50d30a6, 
execution platform: @local_execution_config_platform//:platform]

external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc 
-MD -MF bazel-out/k8-opt/bin/tensorflow/core/platform/_objs/error/error.d 
'-frandom-seed=bazel-out/k8-opt/bin/tensorflow/core/platform/_objs/error/error.o' 
-DEIGEN_MPL2_ONLY '-DEIGEN_MAX_ALIGN_BYTES=64' '-DEIGEN_HAS_TYPE_TRAITS=0' 
-D__CLANG_SUPPORT_DYN_ANNOTATION__ -iquote . -iquote bazel-out/k8-opt/bin 
-iquote external/eigen_archive -iquote 
bazel-out/k8-opt/bin/external/eigen_archive -iquote 
external/com_google_absl -iquote 
bazel-out/k8-opt/bin/external/com_google_absl -iquote external/nsync 
-iquote bazel-out/k8-opt/bin/external/nsync -iquote 
external/double_conversion -iquote 
bazel-out/k8-opt/bin/external/double_conversion -iquote 
external/com_google_protobuf -iquote 
bazel-out/k8-opt/bin/external/com_google_protobuf -isystem 
third_party/eigen3/mkl_include -isystem 
bazel-out/k8-opt/bin/third_party/eigen3/mkl_include -isystem 
external/eigen_archive -isystem 
bazel-out/k8-opt/bin/external/eigen_archive -Wno-builtin-macro-redefined 
'-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' 
'-D__TIME__="redacted"' -fPIE -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' 
-fstack-protector -Wall -fno-omit-frame-pointer -no-canonical-prefixes 
-fno-canonical-system-headers -DNDEBUG -g0 -O2 -ffunction-sections 
-fdata-sections -w -DAUTOLOAD_DYNAMIC_KERNELS -O2 -ftree-vectorize 
'-march=native' -fno-math-errno -fPIC -fPIC '-std=c++14' -c 
tensorflow/core/platform/error.cc -o 
bazel-out/k8-opt/bin/tensorflow/core/platform/_objs/error/error.o)
SUBCOMMAND: # //tensorflow/core/platform:error [action 'Linking 
tensorflow/core/platform/liberror.a', configuration: 
f6bc5b6107d950b9fac2186352cdfdfe45c6815016e3edc9f32af940b50d30a6, 
execution platform: @local_execution_c

[easybuild] TensorFlow build fails in //tensorflow/core/common_runtime:graph_constructor_test

2021-05-26 Thread Ole Holm Nielsen

I'm trying to build TensorFlow with EB 4.3.4 but get an error:

$ eb TensorFlow-2.4.1-fosscuda-2020b.eb 
--cuda-compute-capabilities=8.0,8.6 --tmpdir=/scratch/modules


(lines deleted)
== installing extension TensorFlow 2.4.1 (28/28)...
==  configuring...
==  building...
==  testing...
== FAILED: Installation ended unsuccessfully (build directory: 
/run/user/983/eb_build/TensorFlow/2.4.1/fosscuda-2020b): build failed 
(first 300 chars): At least 1 cpu tests failed:

//tensorflow/core/common_runtime:graph_constructor_test (took 43 min 58 sec)
== Results of the build can be found in the log file(s) 
/scratch/modules/eb-KPZu0P/easybuild-TensorFlow-2.4.1-20210526.144651.PuIWy.log
ERROR: Build of 
/home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs/t/TensorFlow/TensorFlow-2.4.1-fosscuda-2020b.eb 
failed (err: 'build failed (first 300 chars): At least 1 cpu tests 
failed:\n//tensorflow/core/common_runtime:graph_constructor_test')



The EB log file reports an error:

//tensorflow/core/common_runtime:graph_constructor_test FAILED TO 
BUILD


and the log file ends with:

Executed 137 out of 814 tests: 137 tests pass, 1 fails to build and 676 
were skipped.

FAILED: Build did NOT complete successfully

== 2021-05-26 15:30:49,719 build_log.py:169 ERROR EasyBuild crashed with 
an error (at easybuild/base/exceptions.py:124 in __init__): At least 1 cpu 
tests failed:
//tensorflow/core/common_runtime:graph_constructor_test (at 
easybuild/easyblocks/t/tensorflow.py:973 in test_step)
== 2021-05-26 15:30:49,719 filetools.py:1810 INFO Removing lock 
/home/modules/software/.locks/_home_modules_software_TensorFlow_2.4.1-fosscuda-2020b.lock...
== 2021-05-26 15:30:49,721 filetools.py:347 INFO Path 
/home/modules/software/.locks/_home_modules_software_TensorFlow_2.4.1-fosscuda-2020b.lock 
successfully removed.
== 2021-05-26 15:30:49,721 filetools.py:1814 INFO Lock removed: 
/home/modules/software/.locks/_home_modules_software_TensorFlow_2.4.1-fosscuda-2020b.lock
== 2021-05-26 15:30:49,721 easyblock.py:3414 WARNING build failed (first 
300 chars): At least 1 cpu tests failed:

//tensorflow/core/common_runtime:graph_constructor_test
== 2021-05-26 15:30:49,721 easyblock.py:298 INFO Closing log for 
application name TensorFlow version 2.4.1



Can anyone suggest a fix for this issue?

Thanks,
Ole



[easybuild] What is the meaning of --cuda-compute-capabilities ?

2021-05-26 Thread Ole Holm Nielsen
I'm trying to build TensorFlow-2.4.1-fosscuda-2020b.eb on a GPU-equipped 
node using the latest EB 4.3.4, and I get this warning:


(lines deleted)

== installing extension TensorFlow 2.4.1 (28/28)...
==  configuring...

WARNING: No CUDA compute capabilities specified, so using TensorFlow default 
(which may not be optimal for your system).
You should use the --cuda-compute-capabilities configuration option or the 
cuda_compute_capabilities easyconfig parameter to specify a list of CUDA 
compute capabilities to compile with.



So I'm trying to find out the meaning and use of 
"--cuda-compute-capabilities", and the documentation at 
https://docs.easybuild.io/en/latest/version-specific/help.html says:



--cuda-compute-capabilities=CUDA-COMPUTE-CAPABILITIES   List of CUDA compute 
capabilities to use when building GPU software; values should be specified as 
digits separated by a dot, for example: 3.5,5.0,7.2 (type comma-separated list)


This makes me none the wiser!  Can anyone tell me what these numeric 
values are supposed to mean, and how I pick the right values for the GPUs 
in my nodes?


Thanks,
Ole




Re: [easybuild] What is the meaning of --cuda-compute-capabilities ?

2021-05-26 Thread Ole Holm Nielsen

On 5/26/21 12:14 PM, Strube, Alexandre wrote:

Those numbers are related to which kind of gpu you have.

There’s a good table here: https://en.wikipedia.org/wiki/CUDA


On 5/26/21 12:11 PM, Josef Dvoracek wrote:
> Hi, there is table at Nvidia site revealing the compute capabilities
> numbers: https://developer.nvidia.com/cuda-gpus.
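
To illustrate how values from those tables end up in the option: each GPU model has a "major.minor" compute capability, and the option takes a comma-separated list of them. The sketch below is only an illustration; the dictionary is a small hand-picked subset of NVIDIA's table and the function name is hypothetical.

```python
import re

# Illustrative subset only; consult the tables linked above for your GPUs.
EXAMPLE_COMPUTE_CAPABILITIES = {
    "Tesla V100": "7.0",
    "Tesla T4": "7.5",
    "A100": "8.0",
    "GeForce RTX 3090": "8.6",
}


def format_compute_capabilities(values):
    """Validate 'major.minor' strings and join them for --cuda-compute-capabilities."""
    for value in values:
        if not re.fullmatch(r"\d+\.\d+", value):
            raise ValueError("not a major.minor compute capability: %r" % value)
    unique = sorted(set(values), key=lambda v: tuple(map(int, v.split("."))))
    return ",".join(unique)
```

A cluster with A100 and RTX 3090 cards would thus be built with `--cuda-compute-capabilities=8.0,8.6`, which is the value used elsewhere in this thread.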

Great, thanks a lot for sharing your insights!

EasyBuild developers: Could you kindly add the above two URLs to the 
documentation at 
https://docs.easybuild.io/en/latest/version-specific/help.html ?


Best regards,
Ole



[easybuild] Build of SciPy-bundle-2020.11-intel-2020b.eb fails on AMD Rome node

2021-04-30 Thread Ole Holm Nielsen
py", 
line 85, in check

E   do(self.a, self.b, tags=self.tags)
E File 
"/tmp/eb-1cYcYH/tmpkHZver/lib/python3.8/site-packages/numpy/linalg/tests/test_linalg.py", 
line 460, in do

E   assert_almost_equal(b, dot_generalized(a, x))
E File 
"/tmp/eb-1cYcYH/tmpkHZver/lib/python3.8/site-packages/numpy/linalg/tests/test_linalg.py", 
line 41, in assert_almost_equal

E   old_assert_almost_equal(a, b, decimal=decimal, **kw)
E File 
"/tmp/eb-1cYcYH/tmpkHZver/lib/python3.8/site-packages/numpy/testing/_private/utils.py", 
line 575, in assert_almost_equal

E   raise AssertionError(_build_err_msg())
E   AssertionError:
E   Arrays are not almost equal to 6 decimals
E    ACTUAL: array([2.+1.j, 1.+2.j], dtype=complex64)
E    DESIRED: array([nan+nanj, nan+nanj], dtype=complex64)
E   AssertionError:
E   Arrays are not almost equal to 2 decimals
E    ACTUAL: nan
E    DESIRED: 12.65
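
Both assertions fail because one side of the comparison is NaN, and NaN can never be within a tolerance of a finite number. The scalar sketch below mimics the "equal to N decimals" criterion roughly as documented for numpy.testing.assert_almost_equal; it is not NumPy's code, and the function name is made up.

```python
import math


def almost_equal(actual, desired, decimal=6):
    """Scalar sketch of an 'equal to `decimal` decimals' check with NaN handling."""
    if math.isnan(actual) or math.isnan(desired):
        # A NaN fails any tolerance test; only NaN versus NaN counts as equal here.
        return math.isnan(actual) and math.isnan(desired)
    return abs(desired - actual) < 1.5 * 10 ** (-decimal)
```

With this sketch, `almost_equal(float("nan"), 12.65, decimal=2)` is False, matching the second failure above, so the question to chase is why the linear algebra routine produced NaN on this node.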

The complete EB log can be provided if anyone cares.

Thanks,
Ole



[easybuild] EB module for AFLOW - Automatic FLOW for Materials Discovery?

2021-03-18 Thread Ole Holm Nielsen

Dear Easybuilders,

I have received a request to provide the software AFLOW - Automatic FLOW 
for Materials Discovery (http://aflow.org) with an installation page at 
http://aflow.org/install-aflow/


Question:  Has anyone been working on an EB module for AFLOW?

Thanks a lot,
Ole



Re: [easybuild] Build fails of OpenMPI/4.0.5-GCC-10.2.0 on AMD EPYC node

2021-03-12 Thread Ole Holm Nielsen

On 12-03-2021 10:35, Ole Holm Nielsen wrote:
Thanks a lot for pointing at the solution.  I have asked the sysadmin if 
he can install the libnl3-devel RPM.  Hopefully that will resolve the 
issue for us.


I can report that after installing the libnl3-devel RPM and rebuilding 
the libfabric module, OpenMPI builds without any problems.
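
One way to confirm that a rebuilt libfabric no longer pulls in both libnl generations is to look at the output of `ldd` on the libfabric shared library. The sketch below only parses text passed to it; the library names it looks for (libnl.so.1 for v1, libnl-3.so.* for v3) are assumptions about typical sonames, and the function name is hypothetical.

```python
import re


def libnl_versions(ldd_output):
    """Return the set of libnl major versions ('1', '3') named in `ldd` output."""
    versions = set()
    for line in ldd_output.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        name = stripped.split(" ", 1)[0]  # library name before "=>"
        if re.match(r"libnl\.so\.1\b", name):
            versions.add("1")
        elif re.match(r"libnl-3\.so\b", name):
            versions.add("3")
    return versions
```

If the result contains both "1" and "3", the library is in the mixed state that the Open MPI configure script refuses to work with.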


Can the libfabric easyconfig be updated to list libnl3-devel as a prerequisite?

Thanks,
Ole


On 12-03-2021 09:30, Kenneth Hoste wrote:

Dear Ole,

Please check 
https://github.com/easybuilders/easybuild-easyconfigs/issues/11939, 
where this issue is also reported.


It seems to be related to (not) having specific OS packages installed 
when libfabric is being installed.


We probably need to make some changes (configure options, or 
registering required OS dependencies) for this, so additional feedback 
on this is welcome (in particular in the GitHub issue).
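
For illustration, registering an OS dependency in an easyconfig could look roughly like the fragment below (easyconfigs use Python syntax, and `osdependencies` is an existing easyconfig parameter). This is a hypothetical sketch, not the change that was eventually made for libfabric; the Debian-style package name is an assumption.

```python
# Hypothetical easyconfig fragment.
# A tuple lists alternatives: any one of the named OS packages satisfies the dependency.
osdependencies = [
    ('libnl3-devel', 'libnl-3-dev'),  # RPM name from this thread; Debian-style name is assumed
]
```

With such an entry, EasyBuild would check for the OS package up front instead of letting the build fail later in a dependent package.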



regards,

Kenneth



Re: [easybuild] Build fails of OpenMPI/4.0.5-GCC-10.2.0 on AMD EPYC node

2021-03-12 Thread Ole Holm Nielsen

Hi Kenneth,

Thanks a lot for pointing at the solution.  I have asked the sysadmin if 
he can install the libnl3-devel RPM.  Hopefully that will resolve the 
issue for us.


Best regards,
Ole



[easybuild] Build fails of OpenMPI/4.0.5-GCC-10.2.0 on AMD EPYC node

2021-03-11 Thread Ole Holm Nielsen

Dear EasyBuilders,

I'm trying to get EasyBuild modules up and running on an external cluster 
with AMD EPYC 7351 and running CentOS 7.6.  With EB 4.3.3 I can't get 
OpenMPI to build :-(  I'm trying to build this module:


$ eb GPAW-21.1.0-foss-2020b-ASE-3.21.1.eb -r
== Temporary log file in case of crash /tmp/eb-b3w2Qf/easybuild-WAbEyF.log
== found valid index for 
/groups/physics/modules/software/EasyBuild/4.3.3/easybuild/easyconfigs, so 
using it...
== found valid index for 
/groups/physics/modules/software/EasyBuild/4.3.3/easybuild/easyconfigs, so 
using it...

== resolving dependencies ...
== processing EasyBuild easyconfig 
/groups/physics/modules/software/EasyBuild/4.3.3/easybuild/easyconfigs/o/OpenMPI/OpenMPI-4.0.5-GCC-10.2.0.eb

== building and installing OpenMPI/4.0.5-GCC-10.2.0...
== fetching files...
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== FAILED: Installation ended unsuccessfully (build directory: 
/groups/physics/modules/build/OpenMPI/4.0.5/GCC-10.2.0): build failed 
(first 300 chars): cmd " ./configure 
--prefix=/groups/physics/modules/software/OpenMPI/4.0.5-GCC-10.2.0 
--build=x86_64-pc-linux-gnu  --host=x86_64-pc-linux-gnu 
--enable-mpirun-prefix-by-default  --enable-shared 
--with-hwloc=/groups/physics/modules/software/hwloc/2.2.0-GCCcore-10.2.0 
--with-libevent=/groups/physics (took 1 min 38 sec)
== Results of the build can be found in the log file(s) 
/tmp/eb-b3w2Qf/easybuild-OpenMPI-4.0.5-20210311.085550.clqtp.log



The logfile ends with these warnings and errors:

--- MCA component btl:usnic (m4 configuration macro)
checking for MCA component btl:usnic compile mode... dso
checking size of void *... (cached) 8
checking for 64 bit Linux... yes
checking --with-ofi value... sanity check ok 
(/groups/physics/modules/software/libfabric/1.11.0-GCCcore-10.2.0)

checking --with-ofi-libdir value... simple ok (unspecified value)
checking looking for OFI libfabric in... 
(/groups/physics/modules/software/libfabric/1.11.0-GCCcore-10.2.0)

checking rdma/fabric.h usability... yes
checking rdma/fabric.h presence... yes
checking for rdma/fabric.h... yes
looking for library in lib
checking for library containing fi_getinfo... -lfabric
checking if libfabric requires libnl v1 or v3... v1 v3
configure: WARNING: Unfortunately, libfabric links to both libnl and libnl-3.
configure: WARNING: This is a configuration that is *known* to cause 
run-time crashes.

configure: WARNING: This is an error in libfabric (not Open MPI).
configure: WARNING: Open MPI will therefore skip using libfabric.
configure: WARNING: OFI libfabric support requested (via --with-ofi or 
--with-libfabric), but not found.

configure: error: Cannot continue.
 (at easybuild/tools/run.py:533 in parse_cmd_output)
== 2021-03-11 08:57:29,620 filetools.py:1785 INFO Removing lock 
/groups/physics/modules/software/.locks/_groups_physics_modules_software_OpenMPI_4.0.5-GCC-10.2.0.lock...
== 2021-03-11 08:57:29,621 filetools.py:341 INFO Path 
/groups/physics/modules/software/.locks/_groups_physics_modules_software_OpenMPI_4.0.5-GCC-10.2.0.lock 
successfully removed.
== 2021-03-11 08:57:29,621 filetools.py:1789 INFO Lock removed: 
/groups/physics/modules/software/.locks/_groups_physics_modules_software_OpenMPI_4.0.5-GCC-10.2.0.lock
== 2021-03-11 08:57:29,621 easyblock.py:3389 WARNING build failed (first 
300 chars): cmd " ./configure 
--prefix=/groups/physics/modules/software/OpenMPI/4.0.5-GCC-10.2.0 
--build=x86_64-pc-linux-gnu  --host=x86_64-pc-linux-gnu 
--enable-mpirun-prefix-by-default  --enable-shared 
--with-hwloc=/groups/physics/modules/software/hwloc/2.2.0-GCCcore-10.2.0 
--with-libevent=/groups/physics
== 2021-03-11 08:57:29,621 easyblock.py:298 INFO Closing log for 
application name OpenMPI version 4.0.5



Can someone tell me what's going on here?  We don't have this problem on 
our own cluster.
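
For anyone who wants to reproduce the configure test by hand: the check boils down to which libnl flavours the libfabric shared library (or anything it depends on) pulls in. Below is a minimal sketch in Python that scans `ldd` output for both flavours. The function name and the regular expressions are my own illustration, not what Open MPI's configure script actually runs.

```python
import re


def libnl_flavours(ldd_output):
    """Return the set of libnl major versions ('1', '3') named in ldd output."""
    flavours = set()
    for line in ldd_output.splitlines():
        # libnl v3 ships as libnl-3.so.* (plus libnl-route-3.so.* and friends);
        # the old v1 library is plain libnl.so.*
        if re.search(r"\blibnl(-[a-z]+)?-3\.so", line):
            flavours.add("3")
        elif re.search(r"\blibnl\.so", line):
            flavours.add("1")
    return flavours


# Typical use (hypothetical path, adjust to the libfabric installation):
#   import subprocess
#   out = subprocess.check_output(["ldd", "/path/to/libfabric.so"], text=True)
#   print(libnl_flavours(out))
```

Since `ldd` also lists transitive dependencies, a result containing both '1' and '3' does not say which library mixes them in; running the same check on each listed dependency narrows that down.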


Thanks,
Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


[easybuild] Looking for a COMSOL easyblock

2020-09-29 Thread Ole Holm Nielsen

Dear Easybuilders,

I have been asked to install a module for the latest COMSOL 5.5, but there 
is no COMSOL .eb file in EB 4.3.  I note that there was a PR concerning 
COMSOL two years ago:

https://github.com/easybuilders/easybuild-easyblocks/pull/1317

Can anyone share a working COMSOL .eb file for COMSOL 5.5 (or older)?

I assume that we should use the latest MATLAB 2020b for installation.  I 
used the MATLAB-2019b.eb and updated the version number.  One trick here 
is that the contents of the MATLAB ISO file must be converted into a 
.tar.gz file, and all files must first be made user-writable by: chmod -R 
u+w .
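
The MATLAB repacking step described above can be sketched in Python as follows. This assumes the ISO contents have already been copied into a directory. The function names and the choice to store paths relative to the top of that directory are my own assumptions, so check what layout the MATLAB easyconfig expects before relying on it.

```python
import os
import stat
import tarfile


def make_user_writable(root):
    """Rough equivalent of 'chmod -R u+w root': add the owner-write bit everywhere."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for path in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            if os.path.islink(path):
                continue  # leave symlinks alone, chmod would follow them
            mode = os.stat(path).st_mode
            os.chmod(path, mode | stat.S_IWUSR)


def pack_tree(root, tarball):
    """Write the contents of root into a gzip-compressed tarball."""
    with tarfile.open(tarball, "w:gz") as tar:
        # arcname="." keeps member paths relative to the top of the copied tree
        tar.add(root, arcname=".")
```

The tarball should be written outside the directory being packed, otherwise the archive would try to include itself.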


We already have COMSOL 5.3 installed, but I don't have a record of how we 
did it back then.


Thanks a lot for sharing your insights,
Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


[easybuild] Tk-8.6.10-GCCcore-9.3.0.eb fails due to download issue

2020-06-22 Thread Ole Holm Nielsen
I wanted to build Tk-8.6.10-GCCcore-9.3.0.eb but this fails during source 
code download:


ERROR: Build of 
/home/modules/software/EasyBuild/4.2.1/easybuild/easyconfigs/t/Tk/Tk-8.6.10-GCCcore-9.3.0.eb 
failed (err: "build failed (first 300 chars): Couldn't find file 
tk8.6.10-src.tar.gz anywhere, and downloading it didn't work either... 
Paths attempted (in order): 
/home/modules/software/EasyBuild/4.2.1/easybuild/easyconfigs/t/Tk/t/Tk/tk8.6.10-src.tar.gz, 
/home/modules/software/EasyBuild/4.2.1/easybuild/easyconfigs/t/Tk/Tk/tk8.6.10-src.tar.gz, 
")


The SourceForge download page https://prdownloads.sourceforge.net/tcl 
seems to be messed up :-(  The version 8.6 is located in 
https://sourceforge.net/projects/tcl/files/Tcl/8.6.10/ but direct download 
from that page doesn't work and one gets into an infinite loop of 
advertisements :-(


Is there some other sane download site which could be configured in the 
EasyBuild .eb files ?


Thanks,
Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


Re: [easybuild] Tk-8.6.10-GCCcore-9.3.0.eb fails due to download issue

2020-06-22 Thread Ole Holm Nielsen

Hi Terje,

Thanks a lot, this alternative download site worked for me.

Best regards,
Ole

On 6/22/20 10:46 AM, Terje Kvernes wrote:

Hi Ole,

You could maybe try the following source?

curl -o /home/modules/software/EasyBuild/4.2.1/easybuild/easyconfigs/t/Tk/t/Tk/tk8.6.10-src.tar.gz \
  https://ftp.osuosl.org/pub/blfs/conglomeration/tk/tk8.6.10-src.tar.gz

It’s not an official source, but it looks correct and you should then be able 
to restart the build.
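
If the mirror proves reliable, it could also be listed in a local copy of the easyconfig rather than downloaded by hand, since EasyBuild tries the entries in `source_urls` in order. The fragment below is only a sketch of that idea in easyconfig (Python) syntax. It is not the content of the shipped Tk-8.6.10-GCCcore-9.3.0.eb, and the literal file name stands in for whatever template the real file uses.

```python
# Sketch of two easyconfig parameters for a local copy of the Tk easyconfig.
source_urls = [
    'https://prdownloads.sourceforge.net/tcl',             # location named in the original post
    'https://ftp.osuosl.org/pub/blfs/conglomeration/tk/',  # unofficial mirror suggested above
]
sources = ['tk8.6.10-src.tar.gz']
```

Any `checksums` entry in the easyconfig should stay as it is, so that a file fetched from the unofficial mirror is still verified against the expected checksum.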


On 22 Jun 2020, at 10:40, Ole Holm Nielsen  wrote:

I wanted to build Tk-8.6.10-GCCcore-9.3.0.eb but this fails during source code 
download:

ERROR: Build of 
/home/modules/software/EasyBuild/4.2.1/easybuild/easyconfigs/t/Tk/Tk-8.6.10-GCCcore-9.3.0.eb
 failed (err: "build failed (first 300 chars): Couldn't find file 
tk8.6.10-src.tar.gz anywhere, and downloading it didn't work either... Paths attempted 
(in order): 
/home/modules/software/EasyBuild/4.2.1/easybuild/easyconfigs/t/Tk/t/Tk/tk8.6.10-src.tar.gz,
 
/home/modules/software/EasyBuild/4.2.1/easybuild/easyconfigs/t/Tk/Tk/tk8.6.10-src.tar.gz, 
")

The SourceForge download page https://prdownloads.sourceforge.net/tcl seems to 
be messed up :-(  The version 8.6 is located in 
https://sourceforge.net/projects/tcl/files/Tcl/8.6.10/ but direct download from 
that page doesn't work and one gets into an infinite loop of advertisements :-(

Is there some other sane download site which could be configured in the 
EasyBuild .eb files ?


Re: [easybuild] Re: EB bootstrap: ImportError: No module named tools.version

2020-04-25 Thread Ole Holm Nielsen

Dear Kenneth,

On 24-04-2020 20:38, Kenneth Hoste wrote:

Can you do a debug bootstrap, by also defining $EASYBUILD_BOOTSTRAP_DEBUG?

   export EASYBUILD_BOOTSTRAP_DEBUG=1


The output is copied below.  I hope you can make some sense of it...

Maybe you have an 'easybuild' folder in the directory where you're 
running the bootstrap from?


No such folder:
$ ls -la
total 72
drwxr-xr-x.  3 ohni camdvip    77 Apr 24 08:43 .
drwxr-xr-x. 10 ohni camdvip  4096 Apr 24 08:16 ..
drwxr-xr-x.  2 ohni camdvip  4096 Apr 24 15:47 benchmarks
-rw-r--r--.  1 ohni camdvip 51564 Apr 24 08:43 bootstrap_eb.py
-rw-r--r--.  1 ohni camdvip 11529 Apr 24 08:16 README.md
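
The same question (is some stray 'easybuild' folder picked up before the real one?) can also be asked of the whole Python search path, not just the current directory. A minimal sketch, with a function name of my own choosing:

```python
import os


def locate_package_dirs(search_path, package="easybuild"):
    """Return every '<entry>/<package>' directory found on search_path, in path order."""
    hits = []
    for entry in search_path:
        # an empty sys.path entry means the current working directory
        candidate = os.path.join(entry or ".", package)
        if os.path.isdir(candidate):
            hits.append(candidate)
    return hits


# Typical use:
#   import sys
#   for directory in locate_package_dirs(sys.path):
#       print(directory)
```

Several hits are not necessarily a problem, because the EasyBuild framework, easyblocks and easyconfigs all install into the 'easybuild' namespace. What matters is a hit in an unexpected place that comes early in the list.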

Thanks,
Ole

Debug output


$ python bootstrap_eb.py $EASYBUILD_PREFIX
[[INFO]] EasyBuild bootstrap script (version 20200203.01, MD5: 
fcb6314d4e0747db9c28a71f8bb2870c)
[[INFO]] Found Python 2.7.5 (default, Aug  7 2019, 00:51:29) ; [GCC 
4.8.5 20150623 (Red Hat 4.8.5-39)]


[[INFO]] Installation prefix /home/niflheim/ohni/modules
[[DEBUG]] Going to use /tmp/tmpgaI94A as temporary directory
[[INFO]] Using modules tool specified by $EASYBUILD_MODULES_TOOL: Lmod
[[DEBUG]] sys.path after cleaning: 
['/home/niflheim/ohni/Git/GPAW-benchmark-2020', 
'/usr/lib64/python27.zip', '/usr/lib64/python2.7', 
'/usr/lib64/python2.7/plat-linux2', '/usr/lib64/python2.7/lib-tk', 
'/usr/lib64/python2.7/lib-old', '/usr/lib64/python2.7/lib-dynload', 
'/usr/lib64/python2.7/site-packages', 
'/usr/lib64/python2.7/site-packages/gtk-2.0', 
'/usr/lib/python2.7/site-packages']

[[DEBUG]] Checking whether suitable setuptools installation is available...
[[DEBUG]] Found setuptools version 0.9.8
[[DEBUG]] Location of setuptools' easy_install module: 
/usr/lib/python2.7/site-packages/setuptools/command/easy_install.pyc
[[DEBUG]] Location of setuptools installation: 
/usr/lib/python2.7/site-packages

[[INFO]] Suitable setuptools installation already found, skipping stage 0...


[[INFO]] +++ STAGE 1: installing EasyBuild in temporary dir with 
easy_install...


[[DEBUG]] $ easy_install --help
[[DEBUG]] Active setuptools installation: 
/usr/lib/python2.7/site-packages/setuptools/__init__.pyc

[[DEBUG]] stdout for 'easy_install --help':

Global options:
  --verbose (-v)  run verbosely (default)
  --quiet (-q)    run quietly (turns verbosity off)
  --dry-run (-n)  don't actually do anything
  --help (-h)     show detailed help message
  --no-user-cfg   ignore pydistutils.cfg in your home directory

Options for 'easy_install' command:
  --prefix                       installation prefix
  --zip-ok (-z)                  install package as a zipfile
  --multi-version (-m)           make apps have to require() a version
  --upgrade (-U)                 force upgrade (searches PyPI for latest
                                 versions)
  --install-dir (-d)             install package to DIR
  --script-dir (-s)              install scripts to DIR
  --exclude-scripts (-x)         Don't install scripts
  --always-copy (-a)             Copy all needed packages to install dir
  --index-url (-i)               base URL of Python Package Index
  --find-links (-f)              additional URL(s) to search for packages
  --delete-conflicting (-D)      no longer needed; don't use this
  --ignore-conflicts-at-my-risk  no longer needed; don't use this
  --build-directory (-b)         download/extract/build in DIR; keep the
                                 results
  --optimize (-O)                also compile with optimization: -O1 for
                                 "python -O", -O2 for "python -OO", and -O0 to
                                 disable [default: -O0]
  --record                       filename in which to record list of installed
                                 files
  --always-unzip (-Z)            don't install as a zipfile, no matter what
  --site-dirs (-S)               list of directories where .pth files work
  --editable (-e)                Install specified packages in editable form
  --no-deps (-N)                 don't install dependencies
  --allow-hosts (-H)             pattern(s) that hostnames must match
  --local-snapshots-ok (-l)      allow building eggs from local checkouts
  --version                      print version information and exit
  --no-find-links                Don't load find-links defined in packages
                                 being installed

usage: bootstrap_eb.py [options] requirement_or_url ...
   or: bootstrap_eb.py --help


[[DEBUG]] stderr for 'easy_install --help':

[[DEBUG]] Preparing for path /tmp/tmpgaI94A/eb_stage1
[[DEBUG]] os.environ['PYTHONPATH'] after reset:
[[DEBUG]] $PATH: 
/tmp/tmpgaI94A/eb_stage1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/dell/srvadmin/bin:/opt/dell/srvadmin/iSM/bin:/home/niflheim/ohni/.local/bin:/home/niflheim/ohni/bin
[[DEBUG]] $PYTHONPATH: 
/tmp/tmpgaI94A/eb_stage1/lib64/python2.7/site-packages:/tmp/tmpgaI94A/eb_stage1/lib/python2.7/site-packages

[[DEBUG]] $EASYBUILD_MODULES_TOOL set to 

[easybuild] Re: EB bootstrap: ImportError: No module named tools.version

2020-04-24 Thread Ole Holm Nielsen

On 24-04-2020 09:15, Ole Holm Nielsen wrote:

Dear EasyBuilders,

I am trying to create a clean EasyBuild setup for a benchmarking 
environment on a CentOS 7.7 server.  In my .bashrc file I have defined 
the clean environment with:


export EASYBUILD_MODULES_TOOL=Lmod
export EASYBUILD_PREFIX=$HOME/modules

I have created the empty folder $HOME/modules/all

Unfortunately the EB bootstrap fails in stage 1:

$ python bootstrap_eb.py $EASYBUILD_PREFIX
[[INFO]] EasyBuild bootstrap script (version 20200203.01, MD5: 
fcb6314d4e0747db9c28a71f8bb2870c)
[[INFO]] Found Python 2.7.5 (default, Aug  7 2019, 00:51:29) ; [GCC 
4.8.5 20150623 (Red Hat 4.8.5-39)]


[[INFO]] Installation prefix /home/niflheim/ohni/modules
[[INFO]] Using modules tool specified by $EASYBUILD_MODULES_TOOL: Lmod
[[INFO]] Suitable setuptools installation already found, skipping stage 
0...



[[INFO]] +++ STAGE 1: installing EasyBuild in temporary dir with 
easy_install...


[[INFO]] running pre-install command 'easy_install --quiet --upgrade 
--prefix=/tmp/tmpvYYVs6/eb_stage1 vsc-install<0.11.4'
[[INFO]] running pre-install command 'easy_install --quiet --upgrade 
--prefix=/tmp/tmpvYYVs6/eb_stage1 vsc-base<2.9.0'
[[INFO]] installing EasyBuild with 'easy_install --quiet --upgrade 
--prefix=/tmp/tmpvYYVs6/eb_stage1 easybuild'


[[INFO]] Note: a 'SyntaxError' may be reported for the 
easybuild/tools/py2vs3/py3.py module.
You can safely ignore this message, it will not affect the functionality 
of the EasyBuild installation.


[[INFO]] running post install command 'easy_install --upgrade 
--prefix=/tmp/tmpvYYVs6/eb_stage1 vsc-base<2.9.0'
[[ERROR]] Stage 1 failed, could not determine EasyBuild version (txt: 
Traceback (most recent call last):

   File "", line 1, in 
ImportError: No module named tools.version
).

This seems to be the same problem as discussed in 
https://github.com/easybuilders/easybuild-framework/issues/2712 which I 
thought was already solved.


The system already has the CentOS 7 package python-setuptools, but not 
python-mock:


$ rpm -q python-setuptools python-mock
python-setuptools-0.9.8-7.el7.noarch
package python-mock is not installed


Correction: python-mock is also installed:

$ rpm -q python-setuptools python2-mock
python-setuptools-0.9.8-7.el7.noarch
python2-mock-1.0.1-10.el7.noarch

/Ole


[easybuild] EB bootstrap: ImportError: No module named tools.version

2020-04-24 Thread Ole Holm Nielsen

Dear EasyBuilders,

I am trying to create a clean EasyBuild setup for a benchmarking 
environment on a CentOS 7.7 server.  In my .bashrc file I have defined 
the clean environment with:


export EASYBUILD_MODULES_TOOL=Lmod
export EASYBUILD_PREFIX=$HOME/modules

I have created the empty folder $HOME/modules/all

Unfortunately the EB bootstrap fails in stage 1:

$ python bootstrap_eb.py $EASYBUILD_PREFIX
[[INFO]] EasyBuild bootstrap script (version 20200203.01, MD5: 
fcb6314d4e0747db9c28a71f8bb2870c)
[[INFO]] Found Python 2.7.5 (default, Aug  7 2019, 00:51:29) ; [GCC 
4.8.5 20150623 (Red Hat 4.8.5-39)]


[[INFO]] Installation prefix /home/niflheim/ohni/modules
[[INFO]] Using modules tool specified by $EASYBUILD_MODULES_TOOL: Lmod
[[INFO]] Suitable setuptools installation already found, skipping stage 0...


[[INFO]] +++ STAGE 1: installing EasyBuild in temporary dir with 
easy_install...


[[INFO]] running pre-install command 'easy_install --quiet --upgrade 
--prefix=/tmp/tmpvYYVs6/eb_stage1 vsc-install<0.11.4'
[[INFO]] running pre-install command 'easy_install --quiet --upgrade 
--prefix=/tmp/tmpvYYVs6/eb_stage1 vsc-base<2.9.0'
[[INFO]] installing EasyBuild with 'easy_install --quiet --upgrade 
--prefix=/tmp/tmpvYYVs6/eb_stage1 easybuild'


[[INFO]] Note: a 'SyntaxError' may be reported for the 
easybuild/tools/py2vs3/py3.py module.
You can safely ignore this message, it will not affect the functionality 
of the EasyBuild installation.


[[INFO]] running post install command 'easy_install --upgrade 
--prefix=/tmp/tmpvYYVs6/eb_stage1 vsc-base<2.9.0'
[[ERROR]] Stage 1 failed, could not determine EasyBuild version (txt: 
Traceback (most recent call last):

  File "", line 1, in 
ImportError: No module named tools.version
).

This seems to be the same problem as discussed in 
https://github.com/easybuilders/easybuild-framework/issues/2712 which I 
thought was already solved.


The system already has the CentOS 7 package python-setuptools, but not 
python-mock:


$ rpm -q python-setuptools python-mock
python-setuptools-0.9.8-7.el7.noarch
package python-mock is not installed

Did the PR https://github.com/easybuilders/easybuild-framework/pull/2717 
not solve the problem after all?


Thanks,
Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


[easybuild] OPAL ERROR (missing Slurm PMI support) when building GPAW module

2020-02-14 Thread Ole Holm Nielsen
I am trying to build the new GPAW module from 
https://github.com/easybuilders/easybuild-easyconfigs/pull/9834:


$ eb --from-pr=9834 GPAW-20.1.0-foss-2019b-Python-3.7.4.eb -r

This works without problems on our own Linux (CentOS 7.7) cluster with 
Slurm 19.05.


However, on a remote cluster running CentOS 7.6 with Slurm 18.08.1 I am 
having difficulties.  I have to run EB as a Slurm interactive task on a 
compute node which I access using Slurm's "srun" command.  Then the above 
build always fails during the testing stage with these errors:


== sanity checking...
== FAILED: Installation ended unsuccessfully (build directory: 
/dev/shm/GPAW/20.1.0/foss-2019b-Python-3.7.4): build failed (first 300 
chars): Sanity check failed: command 
"/groups/others/ohni/skylake/software/Python/3.7.4-GCCcore-8.3.0/bin/python 
-c "import gpaw"" failed; output:

OPAL ERROR: Not initialized in file pmix2x_client.c at line 112

and the log file says:

== 2020-02-14 11:27:00,210 run.py:219 INFO running cmd: 
/groups/others/ohni/skylake/software/Python/3.7.4-GCCcore-8.3.0/bin/python 
-c "import gpaw"
== 2020-02-14 11:27:00,487 extension.py:212 WARNING Sanity check for 
'GPAW' extension failed: command 
"/groups/others/ohni/skylake/software/Python/3.7.4-GCCcore-8.3.0/bin/python 
-c "import gpaw"" failed; output:
[node252.cluster:100651] OPAL ERROR: Not initialized in file 
pmix2x_client.c at line 112

--
The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute. There are several options for building PMI support under
SLURM, depending upon the SLURM version you are using:

  version 16.05 or later: you can use SLURM's PMIx support. This
  requires that you configure and build SLURM --with-pmix.

  Versions earlier than 16.05: you must use either SLURM's PMI-1 or
  PMI-2 support. SLURM builds PMI-1 by default, or you can manually
  install PMI-2. You must then build Open MPI using --with-pmi pointing
  to the SLURM PMI library location.

Please configure as appropriate and try again.
--
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)

This seems strange because the slurm-libpmi RPM is in fact installed on 
the system:


$ rpm -qa | grep slurm
slurm-libpmi-18.08.1-1.el7.x86_64
slurm-example-configs-18.08.1-1.el7.x86_64
slurm-18.08.1-1.el7.x86_64
slurm-pam_slurm-18.08.1-1.el7.x86_64
slurm-slurmd-18.08.1-1.el7.x86_64

$ rpm -ql slurm-libpmi-18.08.1-1.el7.x86_64
/usr/lib64/libpmi.so
/usr/lib64/libpmi.so.0
/usr/lib64/libpmi.so.0.0.0
/usr/lib64/libpmi2.so
/usr/lib64/libpmi2.so.0
/usr/lib64/libpmi2.so.0.0.0

Also, the loaded OpenMPI version appears to be sane enough:

$ which mpirun
~/skylake/software/OpenMPI/3.1.4-GCC-8.3.0/bin/mpirun
$ mpirun --version
mpirun (Open MPI) 3.1.4
$ ompi_info | grep ras
 MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component v3.1.4)
 MCA ras: simulator (MCA v2.1.0, API v2.0.0, Component v3.1.4)


Googling for the OPAL ERROR I found a posting saying that missing the 
CentOS 7 hwloc-devel RPM was the source of the problems: 
https://users.open-mpi.narkive.com/C9HOavWo/ompi-users-fwd-openmpi-3-1-0-on-aarch64


Has anyone else seen this error?  Could a missing hwloc-devel OS 
prerequisite be causing problems?  I have tried to load explicitly the EB 
module hwloc/1.11.12-GCCcore-8.3.0, but that did not help.
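
Grepping `ompi_info` for one framework at a time gets tedious, so here is a minimal Python sketch that groups all reported MCA components by framework. The function name and the regular expression are my own. Which framework to inspect for PMI support is also an assumption on my part (I believe it is `pmix` in Open MPI 3.x) and should be checked against the Open MPI documentation.

```python
import re
from collections import defaultdict


def mca_components(ompi_info_text):
    """Map each MCA framework name to the list of component names in ompi_info output."""
    components = defaultdict(list)
    # Lines look like: "MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component v3.1.4)"
    for match in re.finditer(r"MCA\s+(\w+):\s+(\w+)\s+\(", ompi_info_text):
        framework, component = match.groups()
        components[framework].append(component)
    return dict(components)


# Typical use:
#   import subprocess
#   info = mca_components(subprocess.check_output(["ompi_info"], text=True))
#   print(info.get("pmix", []))
```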


Thanks,
Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


Re: [easybuild] Failure when building GCCcore/8.2.0

2020-01-24 Thread Ole Holm Nielsen

Hi Kenneth,

On 1/24/20 4:52 PM, Kenneth Hoste wrote:
...
lustre/hpc/others/ohni/skylake/build/GCCcore/8.2.0/system-system/gcc-8.2.0/stage3_obj/./mpfr/src/.libs 
-L/lustre/hpc/others/ohni/skylake/build/GCCcore/8.2.0/system-system/gcc-8.2.0/stage3_obj/./mpc/src/.libs 
-lmpc -lmpfr -lgmp -rdynamic -ldl  -L./../zlib -lz
collect2: fatal error: ld terminated with signal 7 [Bus error]



This strongly suggests there was not enough memory available for GCC...


Thanks a lot for sharing your insight!  I have an interactive shell 
controlled by Slurm's srun, so I will have to submit the task with a 
larger memory.  Is it possible to estimate the memory requirement or get 
better error messages?
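
One rough way to get a number for the memory requirement is to run the build step under a small wrapper and read the peak resident set size afterwards. The sketch below uses Python's standard `resource` module. The function name is hypothetical. As far as I understand, the value reported for children is the peak of the single largest process that was waited for, not the sum over a parallel `make`, so it is a lower bound for what the job needs.

```python
import resource
import subprocess
import sys


def peak_child_rss_kib(cmd):
    """Run cmd to completion; return (exit code, largest peak RSS among waited-for children)."""
    returncode = subprocess.call(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is reported in kilobytes on Linux
    return returncode, usage.ru_maxrss


if __name__ == "__main__":
    # Example: python peak_rss.py make -j 2 bootstrap
    code, rss = peak_child_rss_kib(sys.argv[1:] or [sys.executable, "-c", "pass"])
    print("exit code %d, peak RSS of largest child: %d KiB" % (code, rss))
```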


/Ole


[easybuild] Failure when building GCCcore/8.2.0

2020-01-24 Thread Ole Holm Nielsen
I'm making slow progress on building the foss-2019a toolchain (for our 
GPAW module GPAW-19.8.1-foss-2019a-Python-3.7.2.eb) on an Intel Xeon Gold 
6230 CPU (Cascade Lake) running CentOS 7.6.


I fixed the isl issue described in 
https://github.com/easybuilders/easybuild-easyconfigs/issues/9692 before 
building GCCcore-8.2.0.eb.


After a very long time, the build process repeatedly aborts with an error:
  * collect2: fatal error: ld terminated with signal 7 [Bus error]
with this output:

== processing EasyBuild easyconfig 
/groups/others/ohni/skylake/software/EasyBuild/4.1.1/easybuild/easyconfigs/g/GCCcore/GCCcore-8.2.0.eb

== building and installing GCCcore/8.2.0...
== fetching files...
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== FAILED: Installation ended unsuccessfully (build directory: 
/groups/others/ohni/skylake/build/GCCcore/8.2.0/system-system): build 
failed (first 300 chars): cmd " make -j 2  bootstrap " exited with exit 
code 2 and output:

echo stage3 > stage_final
make[1]: Entering directory 
`/lustre/hpc/others/ohni/skylake/build/GCCcore/8.2.0/system-system/gcc-8.2.0/stage3_obj'
make[2]: Entering directory 
`/lustre/hpc/others/ohni/skylake/build/GCCcore/8.2.0/system-syst (took 1 
hour 18 min 33 sec)
== Results of the build can be found in the log file(s) 
/tmp/eb-CLzF4F/easybuild-GCCcore-8.2.0-20200124.145944.vrjZV.log
ERROR: Build of 
/groups/others/ohni/skylake/software/EasyBuild/4.1.1/easybuild/easyconfigs/g/GCCcore/GCCcore-8.2.0.eb 
failed (err: 'build failed (first 300 chars): cmd " make -j 2  bootstrap " 
exited with exit code 2 and output:\necho stage3 > stage_final\nmake[1]: 
Entering directory 
`/lustre/hpc/others/ohni/skylake/build/GCCcore/8.2.0/system-system/gcc-8.2.0/stage3_obj\'\nmake[2]: 
Entering directory 
`/lustre/hpc/others/ohni/skylake/build/GCCcore/8.2.0/system-syst')



The last part of the build log file is:

g++ -std=gnu++98 -no-pie   -g -DIN_GCC -fno-exceptions -fno-rtti
-fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings
-Wcast-qual -Wno-format -Wmissing-format-attribute -Woverloaded-virtual
-pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings
-DHAVE_CONFIG_H -static-libstdc++ -static-libgcc  -o cc1 c/c-lang.o
c-family/stub-objc.o attribs.o c/c-errors.o c/c-decl.o c/c-typeck.o
c/c-convert.o c/c-aux-info.o c/c-objc-common.o c/c-parser.o c/c-fold.o
c/gimple-parser.o c-family/c-common.o c-family/c-cppbuiltin.o
c-family/c-dump.o c-family/c-format.o c-family/c-gimplify.o
c-family/c-indentation.o c-family/c-lex.o c-family/c-omp.o
c-family/c-opts.o c-family/c-pch.o c-family/c-ppoutput.o
c-family/c-pragma.o c-family/c-pretty-print.o c-family/c-semantics.o
c-family/c-ada-spec.o c-family/c-ubsan.o c-family/known-headers.o
c-family/c-attribs.o c-family/c-warn.o c-family/c-spellcheck.o i386-c.o
glibc-c.o \
  cc1-checksum.o libbackend.a main.o libcommon-target.a libcommon.a
../libcpp/libcpp.a ../libdecnumber/libdecnumber.a libcommon.a
../libcpp/libcpp.a   ../libbacktrace/.libs/libbacktrace.a
../libiberty/libiberty.a ../libdecnumber/libdecnumber.a
-L/lustre/hpc/others/ohni/skylake/build/GCCcore/8.2.0/system-system/gcc-8.2.0/stage2_stuff/lib
-lisl
-L/lustre/hpc/others/ohni/skylake/build/GCCcore/8.2.0/system-system/gcc-8.2.0/stage3_obj/./gmp/.libs
-L/lustre/hpc/others/ohni/skylake/build/GCCcore/8.2.0/system-system/gcc-8.2.0/stage3_obj/./mpfr/src/.libs
-L/lustre/hpc/others/ohni/skylake/build/GCCcore/8.2.0/system-system/gcc-8.2.0/stage3_obj/./mpc/src/.libs
-lmpc -lmpfr -lgmp -rdynamic -ldl  -L./../zlib -lz
collect2: fatal error: ld terminated with signal 7 [Bus error]
compilation terminated.
make[3]: *** [cc1] Error 1
make[3]: *** Waiting for unfinished jobs
rm gcc.pod
make[3]: Leaving directory 
`/lustre/hpc/others/ohni/skylake/build/GCCcore/8.2.0/system-system/gcc-8.2.0/stage3_obj/gcc'

make[2]: *** [all-stage1-gcc] Error 2
make[2]: Leaving directory 
`/lustre/hpc/others/ohni/skylake/build/GCCcore/8.2.0/system-system/gcc-8.2.0/stage3_obj'

make[1]: *** [stage1-bubble] Error 2
make[1]: Leaving directory 
`/lustre/hpc/others/ohni/skylake/build/GCCcore/8.2.0/system-system/gcc-8.2.0/stage3_obj'

make: *** [bootstrap] Error 2
 (at easybuild/tools/run.py:529 in parse_cmd_output)
== 2020-01-24 16:18:18,131 easyblock.py:3109 WARNING build failed (first 
300 chars): cmd " make -j 2  bootstrap " exited with exit code 2 and output:

echo stage3 > stage_final
make[1]: Entering directory 
`/lustre/hpc/others/ohni/skylake/build/GCCcore/8.2.0/system-system/gcc-8.2.0/stage3_obj'
make[2]: Entering directory 
`/lustre/hpc/others/ohni/skylake/build/GCCcore/8.2.0/system-syst
== 2020-01-24 16:18:18,131 easyblock.py:295 INFO Closing log for 
application name GCCcore version 8.2.0


Can anyone help me debug this problem?

Thanks a lot,
Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


Re: [easybuild] Checksum error when building GCCcore/8.2.0

2020-01-16 Thread Ole Holm Nielsen

On 16-01-2020 16:49, Kenneth Hoste wrote:

On 16/01/2020 16:28, Lars Viklund wrote:
The file contains a file listing and is the result of 
https://isl.gforge.inria.fr/isl-0.20.tar.bz2 redirecting with a 302 to 
http://isl.gforge.inria.fr .


It seems like the site only offers downloads over plaintext, while 
someone changed the link to HTTPS in the easyconfig recently in 
https://github.com/easybuilders/easybuild-easyconfigs/commit/d93ef853012ef04e373cd19696e045408d8b5a8f 



Ouch. That's hard to catch without forcibly re-downloading every time we 
touch source URLs (which I guess we should have done here).


If someone hits this:

* manually download with wget or curl from 
http://isl.gforge.inria.fr/isl-0.20.tar.bz2


I confirm that manual download from the http site gives the correct 
checksum for isl-0.20.tar.bz2.  Now the building of GCCcore can proceed 
normally.



* move the downloaded isl-0.20.tar.bz2 to the /path/to/sources/g/GCCcore/


The checksum in the easyconfig file is correct, but the auto-download is 
broken...


Follow-up in 
https://github.com/easybuilders/easybuild-easyconfigs/issues/9692 .
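
For completeness, the manual check after downloading can be sketched in Python. The expected value is the checksum quoted in the EasyBuild log for isl-0.20.tar.bz2. The function names are my own, and this is just the generic way to hash a file, not EasyBuild's internal code.

```python
import hashlib

# Checksum quoted in the EasyBuild log for isl-0.20.tar.bz2
EXPECTED_ISL = "b587e083eb65a8b394e833dea1744f21af3f0e413a448c17536b5549ae42a4c2"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected=EXPECTED_ISL):
    """True if the file at path has the expected SHA256 checksum."""
    return sha256_of(path) == expected
```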

Thanks,
Ole


Re: [easybuild] Checksum error when building GCCcore/8.2.0

2020-01-16 Thread Ole Holm Nielsen

Hi Jack,

Thanks for the feedback about checksums!  The current file 
isl-0.20.tar.bz2 which was downloaded by EB actually has a different checksum:


$ ls -l /groups/others/ohni/modules/sources/g/GCCcore/isl-0.20.tar.bz2
-rw-r--r-- 1 ohni others 17121 Jan 16 14:00 
/groups/others/ohni/modules/sources/g/GCCcore/isl-0.20.tar.bz2


$ sha256sum /groups/others/ohni/modules/sources/g/GCCcore/isl-0.20.tar.bz2
c968e1f20d7e48c395a0779b6b07e2723f003b1330c6deeb606e7db96b2cf6a4 
/groups/others/ohni/modules/sources/g/GCCcore/isl-0.20.tar.bz2


The official tar-ball (dated 2018-07-28) checksum has apparently changed:

$ wget https://isl.gforge.inria.fr/isl-0.20.tar.bz2
$ sha256sum isl-0.20.tar.bz2
c968e1f20d7e48c395a0779b6b07e2723f003b1330c6deeb606e7db96b2cf6a4 
isl-0.20.tar.bz2


So I guess the EB file GCCcore-8.2.0.eb needs to have the checksum updated?
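
A size of 17121 bytes is small for a source tarball, so before changing any checksum it may be worth checking that the download is a bzip2 file at all. A minimal sketch in Python, assuming only that a genuine .tar.bz2 starts with the bzip2 magic bytes "BZh" (the function name is hypothetical):

```python
import os


def looks_like_bzip2(path):
    """True if the file starts with the bzip2 magic bytes 'BZh'."""
    with open(path, "rb") as handle:
        return handle.read(3) == b"BZh"


def describe(path):
    """Return a one-line summary: path, size in bytes, and whether it looks like bzip2."""
    kind = "bzip2 data" if looks_like_bzip2(path) else "NOT bzip2 data"
    return "%s: %d bytes, %s" % (path, os.path.getsize(path), kind)
```

A file reported as "NOT bzip2 data" is most likely an error page or a directory listing saved under the tarball's name.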

Thanks,
Ole


On 1/16/20 3:29 PM, Jack Perdue wrote:

Howdy,

The checksum given matches the file I have here:

$ sha256sum /sw/eb/sources/g/GCCcore/isl-0.20.tar.bz2
b587e083eb65a8b394e833dea1744f21af3f0e413a448c17536b5549ae42a4c2 
/sw/eb/sources/g/GCCcore/isl-0.20.tar.bz2


but I downloaded that a year ago:

$ ls -al /sw/eb/sources/g/GCCcore/isl-0.20.tar.bz2
-rw-rw-r-- 1 j-perdue staff 1727820 Jan  2  2019 
/sw/eb/sources/g/GCCcore/isl-0.20.tar.bz2


You might try downloading it again. (???)

Jack Perdue
Lead Systems Administrator
High Performance Research Computing
TAMU Division of Research
j-per...@tamu.edu    http://hprc.tamu.edu
HPRC Helpdesk: h...@hprc.tamu.edu

On 1/16/20 7:21 AM, Ole Holm Nielsen wrote:
I'm working on a remote system (CentOS 7.6) and have installed the 
latest EB 4.1.1.


When building the foss-2019a.eb toolchain I get a checksum error for 
isl-0.20.tar.bz2 when building GCCcore/8.2.0, please see log file lines 
below.


Is there a fix for this error?

Thanks,
Ole

== 2020-01-16 14:14:41,190 easyblock.py:2857 INFO Starting source step
== 2020-01-16 14:14:41,191 easyblock.py:2863 INFO Running method 
checksum_step part of step source
== 2020-01-16 14:14:41,825 easyblock.py:1801 INFO Checksum verification 
for /groups/others/ohni/modules/sources/g/GCCcore/gcc-8.2.0.tar.gz using 
1b0f36be1045ff58cbb9c83743835367b860810f17f0195a4e093458b372020f passed.
== 2020-01-16 14:14:41,839 easyblock.py:1801 INFO Checksum verification 
for /groups/others/ohni/modules/sources/g/GCCcore/gmp-6.1.2.tar.bz2 
using 5275bb04f4863a13516b2f39392ac5e272f5e1bb8057b18aec1c9b79d73d8fb2 
passed.
== 2020-01-16 14:14:41,848 easyblock.py:1801 INFO Checksum verification 
for /groups/others/ohni/modules/sources/g/GCCcore/mpfr-4.0.1.tar.bz2 
using a4d97610ba8579d380b384b225187c250ef88cfe1d5e7226b89519374209b86b 
passed.
== 2020-01-16 14:14:41,852 easyblock.py:1801 INFO Checksum verification 
for /groups/others/ohni/modules/sources/g/GCCcore/mpc-1.1.0.tar.gz using 
6985c538143c1208dcb1ac42cedad6ff52e267b47e5f970183a3e75125b43c2e passed.
== 2020-01-16 14:14:41,951 build_log.py:169 ERROR EasyBuild crashed with 
an error (at easybuild/base/exceptions.py:124 in __init__): Checksum 
verification for 
/groups/others/ohni/modules/sources/g/GCCcore/isl-0.20.tar.bz2 using 
b587e083eb65a8b394e833dea1744f21af3f0e413a448c17536b5549ae42a4c2 failed. 
(at easybuild/framework/easyblock.py:1805 in checksum_step)
== 2020-01-16 14:14:41,952 easyblock.py:3109 WARNING build failed (first 
300 chars): Checksum verification for 
/groups/others/ohni/modules/sources/g/GCCcore/isl-0.20.tar.bz2 using 
b587e083eb65a8b394e833dea1744f21af3f0e413a448c17536b5549ae42a4c2 failed.
== 2020-01-16 14:14:41,952 easyblock.py:295 INFO Closing log for 
application name GCCcore version 8.2.0
== 2020-01-16 14:14:41,952 build_log.py:265 INFO FAILED: Installation 
ended unsuccessfully (build directory: 
/groups/others/ohni/modules/build/GCCcore/8.2.0/system-system): build 
failed (first 300 chars): Checksum verification for 
/groups/others/ohni/modules/sources/g/GCCcore/isl-0.20.tar.bz2 using 
b587e083eb65a8b394e833dea1744f21af3f0e413a448c17536b5549ae42a4c2 failed. 
(took 1 sec)
== 2020-01-16 14:14:41,953 build_log.py:265 INFO Results of the build 
can be found in the log file(s) 
/tmp/eb-iti_1V/easybuild-GCCcore-8.2.0-20200116.141439.jMWZa.log
== 2020-01-16 14:14:41,954 build_log.py:169 ERROR EasyBuild crashed with 
an error (at easybuild/base/exceptions.py:124 in __init__): build failed 
(first 300 chars): Checksum verification for 
/groups/others/ohni/modules/sources/g/GCCcore/isl-0.20.tar.bz2 using 
b587e083eb65a8b394e833dea1744f21af3f0e413a448c17536b5549ae42a4c2 failed. 
(at easybuild/main.py:116 in build_and_install_software)
== 2020-01-16 14:14:41,955 build_log.py:169 ERROR EasyBuild crashed with 
an error (at easybuild/base/exceptions.py:124 in __init__): Build of 
/groups/others/ohni/modules/software/EasyBuild/4.1.1/easybuild/easyconfigs/g/GCCcore/GCCcore-8.2.0.eb 
failed (err: 'build failed (first 300 chars): Checksum verification for 
/groups/others/ohni/modules/sources/g/GCCcore/isl-0.20.tar.bz2 using

[easybuild] Checksum error when building GCCcore/8.2.0

2020-01-16 Thread Ole Holm Nielsen
I'm working on a remote system (CentOS 7.6) and have installed the latest 
EB 4.1.1.


When building the foss-2019a.eb toolchain I get a checksum error for 
isl-0.20.tar.bz2 when building GCCcore/8.2.0, please see log file lines below.


Is there a fix for this error?

Thanks,
Ole

== 2020-01-16 14:14:41,190 easyblock.py:2857 INFO Starting source step
== 2020-01-16 14:14:41,191 easyblock.py:2863 INFO Running method 
checksum_step part of step source
== 2020-01-16 14:14:41,825 easyblock.py:1801 INFO Checksum verification 
for /groups/others/ohni/modules/sources/g/GCCcore/gcc-8.2.0.tar.gz using 
1b0f36be1045ff58cbb9c83743835367b860810f17f0195a4e093458b372020f passed.
== 2020-01-16 14:14:41,839 easyblock.py:1801 INFO Checksum verification 
for /groups/others/ohni/modules/sources/g/GCCcore/gmp-6.1.2.tar.bz2 using 
5275bb04f4863a13516b2f39392ac5e272f5e1bb8057b18aec1c9b79d73d8fb2 passed.
== 2020-01-16 14:14:41,848 easyblock.py:1801 INFO Checksum verification 
for /groups/others/ohni/modules/sources/g/GCCcore/mpfr-4.0.1.tar.bz2 using 
a4d97610ba8579d380b384b225187c250ef88cfe1d5e7226b89519374209b86b passed.
== 2020-01-16 14:14:41,852 easyblock.py:1801 INFO Checksum verification 
for /groups/others/ohni/modules/sources/g/GCCcore/mpc-1.1.0.tar.gz using 
6985c538143c1208dcb1ac42cedad6ff52e267b47e5f970183a3e75125b43c2e passed.
== 2020-01-16 14:14:41,951 build_log.py:169 ERROR EasyBuild crashed with 
an error (at easybuild/base/exceptions.py:124 in __init__): Checksum 
verification for 
/groups/others/ohni/modules/sources/g/GCCcore/isl-0.20.tar.bz2 using 
b587e083eb65a8b394e833dea1744f21af3f0e413a448c17536b5549ae42a4c2 failed. 
(at easybuild/framework/easyblock.py:1805 in checksum_step)
== 2020-01-16 14:14:41,952 easyblock.py:3109 WARNING build failed (first 
300 chars): Checksum verification for 
/groups/others/ohni/modules/sources/g/GCCcore/isl-0.20.tar.bz2 using 
b587e083eb65a8b394e833dea1744f21af3f0e413a448c17536b5549ae42a4c2 failed.
== 2020-01-16 14:14:41,952 easyblock.py:295 INFO Closing log for 
application name GCCcore version 8.2.0
== 2020-01-16 14:14:41,952 build_log.py:265 INFO FAILED: Installation 
ended unsuccessfully (build directory: 
/groups/others/ohni/modules/build/GCCcore/8.2.0/system-system): build 
failed (first 300 chars): Checksum verification for 
/groups/others/ohni/modules/sources/g/GCCcore/isl-0.20.tar.bz2 using 
b587e083eb65a8b394e833dea1744f21af3f0e413a448c17536b5549ae42a4c2 failed. 
(took 1 sec)
== 2020-01-16 14:14:41,953 build_log.py:265 INFO Results of the build can 
be found in the log file(s) 
/tmp/eb-iti_1V/easybuild-GCCcore-8.2.0-20200116.141439.jMWZa.log
== 2020-01-16 14:14:41,954 build_log.py:169 ERROR EasyBuild crashed with 
an error (at easybuild/base/exceptions.py:124 in __init__): build failed 
(first 300 chars): Checksum verification for 
/groups/others/ohni/modules/sources/g/GCCcore/isl-0.20.tar.bz2 using 
b587e083eb65a8b394e833dea1744f21af3f0e413a448c17536b5549ae42a4c2 failed. 
(at easybuild/main.py:116 in build_and_install_software)
== 2020-01-16 14:14:41,955 build_log.py:169 ERROR EasyBuild crashed with 
an error (at easybuild/base/exceptions.py:124 in __init__): Build of 
/groups/others/ohni/modules/software/EasyBuild/4.1.1/easybuild/easyconfigs/g/GCCcore/GCCcore-8.2.0.eb 
failed (err: 'build failed (first 300 chars): Checksum verification for 
/groups/others/ohni/modules/sources/g/GCCcore/isl-0.20.tar.bz2 using 
b587e083eb65a8b394e833dea1744f21af3f0e413a448c17536b5549ae42a4c2 failed.') 
(at easybuild/main.py:148 in build_and_install_software)
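
A minimal Python sketch for re-checking a downloaded source file by hand, assuming the 64-hex-digit values in the log above are SHA-256 digests (the function names are illustrative and not part of EasyBuild):

```python
import hashlib
import sys


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected):
    """Compare the file's digest with the expected hex string (case-insensitive)."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check.py <source file> <expected checksum>
    source, expected = sys.argv[1], sys.argv[2]
    print("computed:", sha256_of(source))
    print("expected:", expected)
    print("match" if checksum_matches(source, expected) else "MISMATCH")
```

If the computed digest of isl-0.20.tar.bz2 differs from the value in the easyconfig, the file in the sources directory is likely incomplete or is a different release of the tarball, and downloading it again is a reasonable first step.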


--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


[easybuild] Lmod upgrade to version 8.2

2019-12-15 Thread Ole Holm Nielsen

Dear Easybuilders,

For those of you who prefer to install Lmod from a public repository, the 
Fedora EPEL now provides a fairly new Lmod version 8.2.7, see

https://bugzilla.redhat.com/show_bug.cgi?id=1777262

You may use Yum to update Lmod from the EPEL repository, or you may find 
the EL7 package here:

https://dl.fedoraproject.org/pub/epel/7/x86_64/Packages/l/Lmod-8.2.7-1.el7.x86_64.rpm

Please note that Lmod releases are made very frequently, see 
https://github.com/TACC/Lmod.  But it's good for us to be tracking the 
Lmod developments regularly.


--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


[easybuild] GULP installation by EasyBuild?

2019-11-08 Thread Ole Holm Nielsen

Dear Easybuilders,

We have a user request to install the GULP materials simulation package 
from http://gulp.curtin.edu.au/gulp/


No GULP module is found on the EasyBuild list of software, so I would like 
to ask if anyone has already created a .eb file for GULP?


Thanks a lot,
Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


Re: [easybuild] EasyBuild in a heterogeneous HPC centre

2019-11-07 Thread Ole Holm Nielsen

Hi Douglas,

For our Linux cluster with 4 generations of hardware, I chose to make 
completely separate module trees for each type.


IMHO, this is the simplest and most easily maintainable solution, 
although it is slightly wasteful of storage space on our central NFS 
server.  It has worked well for several years now.


I documented my approach in a Wiki page:
https://wiki.fysik.dtu.dk/niflheim/EasyBuild_modules#automounting-the-cpu-architecture-dependent-modules-directory
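
Completely separate trees per hardware generation can be pictured with a small Python sketch. The root directory, the architecture labels and the "modules/all" suffix are hypothetical and only illustrate the idea; they are not taken from the Wiki page:

```python
import posixpath


def module_tree(root, arch):
    """Return the module directory for one CPU architecture (hypothetical layout)."""
    return posixpath.join(root, arch, "modules", "all")


def module_use_line(root, arch):
    """Build the 'module use' command that exposes the tree for `arch`."""
    return "module use " + module_tree(root, arch)


# Hypothetical architecture labels, one separate tree per label
for arch in ("sandybridge", "haswell", "broadwell"):
    print(module_use_line("/sw/easybuild", arch))
```

Each node would then run only the line that matches its own CPU type, so no module built for a newer architecture is ever visible on an older one.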

Best regards,
Ole

On 11/7/19 12:57 PM, Douglas Scofield wrote:

We are curious how to maintain architecture-specific EasyBuild trees.  We are 
new to EasyBuild and already have many, many (mostly bioinformatics) tools 
installed in hand-maintained module and software trees.  In our centre, we have 
clusters running Sandy Bridge EP, Haswell EP, and Broadwell EP.  Most of our 
users are on Broadwell, so if we compile with -march=native etc., we compile 
for Broadwell and for Sandy Bridge, which covers it and Haswell.

In our standard module tree we manage architecture automatically, keying off a 
$Cluster variable set by the module system.  We handle architectures as if we 
had modules versioned Tool/Version/Architecture, with the last bit hidden from 
the user.

We have also decided to (mostly) hide our EasyBuild tree from the user, and 
instead provide access to EasyBuild-built tools using what we are calling alias 
modules, which we place in our standard module tree.  An alias module performs 
a 'module use' of the easybuild tree and then loads the appropriate 
EasyBuild-built modules.  The large majority of our users do not care about 
toolchains, etc.  Those that do, we will have docs they can consult for working 
with EasyBuild modules directly.  The very large majority of our installed 
tools do not currently have easyconfigs.

We are guessing that our architecture solution with EasyBuild will end up being 
completely separate EasyBuild trees, accessed using distinct 'module use' 
paths.  The EasyBuild docs point to a 2015 presentation by Pablo Escobar 
describing an automounter solution which we are definitely not going to 
implement, but this suggests completely separate trees as well.

How is this typically handled by other centres?

Thanks in advance,

Douglas
—
Douglas G. Scofield
Evolutionary Biology Centre, Uppsala University
douglas.scofi...@ebc.uu.se
douglasgscofi...@gmail.com









E-mailing Uppsala University means that we will process your personal data. For 
more information on how this is performed, please read here: 
http://www.uu.se/en/about-uu/data-protection-policy



--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,
Fysikvej Building 309, DK-2800 Kongens Lyngby, Denmark
E-mail: ole.h.niel...@fysik.dtu.dk
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Mobile: (+45) 5180 1620


Re: [easybuild] Re: Missing features in ABINIT/8.10.3-intel-2018b

2019-08-28 Thread Ole Holm Nielsen

Hi Yann,

I now have these lines in the .eb file:

# xml support
configopts += '--enable-xml '
configopts += 'CFLAGS_EXTRA="-I/usr/include/libxml2" '
configopts += 'CPPFLAGS_EXTRA="-I/usr/include/libxml2" '
configopts += 'FC_LIBS_EXTRA="-lxml2 -lz -lm -ldl" '

but unfortunately I still get the same error about the missing file:

mpiifort -DHAVE_CONFIG_H -I. -I../..  -I../../src/77_ddb 
-I../../src/77_ddb -I../../src/32_util -I../../src/32_util 
-I../../src/44_abitools -I../../src/44_abitools 
-I../../src/27_toolbox_oop -I../../src/27_toolbox_oop 
-I../../src/44_abitypes_defs -I../../src/44_abitypes_defs 
-I../../src/72_response -I../../src/72_response -I../../src/16_hideleave 
-I../../src/16_hideleave -I../../src/42_parser -I../../src/42_parser 
-I../../src/55_abiutil -I../../src/55_abiutil -I../../src/41_geometry 
-I../../src/41_geometry -I../../src/45_geomoptim 
-I../../src/45_geomoptim -I../../src/12_hide_mpi -I../../src/12_hide_mpi 
-I../../src/10_defs -I../../src/10_defs -I../../src/14_hidewrite 
-I../../src/14_hidewrite -I../../src/57_iovars -I../../src/57_iovars 
-I../../src/28_numeric_noabirule -I../../src/28_numeric_noabirule 
-I../../src/incs -I../../src/incs 
-I/home/modules/software/netCDF/4.6.1-intel-2018b/include 
-I/home/modules/software/netCDF-Fortran/4.4.4-intel-2018b/include 
-I/home/modules/build/ABINIT/8.10.3/intel-2018b/abinit-8.10.3/fallbacks/exports/include 
  -free -module 
/home/modules/build/ABINIT/8.10.3/intel-2018b/abinit-8.10.3/src/mods 
-O2 -xHost -ftz -fp-speculation=safe -fp-model source -fPIC -c -o 
m_spin_hist.o m_spin_hist.F90
mpiifort -DHAVE_CONFIG_H -I. -I../..  -I../../src/77_ddb 
-I../../src/77_ddb -I../../src/32_util -I../../src/32_util 
-I../../src/44_abitools -I../../src/44_abitools 
-I../../src/27_toolbox_oop -I../../src/27_toolbox_oop 
-I../../src/44_abitypes_defs -I../../src/44_abitypes_defs 
-I../../src/72_response -I../../src/72_response -I../../src/16_hideleave 
-I../../src/16_hideleave -I../../src/42_parser -I../../src/42_parser 
-I../../src/55_abiutil -I../../src/55_abiutil -I../../src/41_geometry 
-I../../src/41_geometry -I../../src/45_geomoptim 
-I../../src/45_geomoptim -I../../src/12_hide_mpi -I../../src/12_hide_mpi 
-I../../src/10_defs -I../../src/10_defs -I../../src/14_hidewrite 
-I../../src/14_hidewrite -I../../src/57_iovars -I../../src/57_iovars 
-I../../src/28_numeric_noabirule -I../../src/28_numeric_noabirule 
-I../../src/incs -I../../src/incs 
-I/home/modules/software/netCDF/4.6.1-intel-2018b/include 
-I/home/modules/software/netCDF-Fortran/4.4.4-intel-2018b/include 
-I/home/modules/build/ABINIT/8.10.3/intel-2018b/abinit-8.10.3/fallbacks/exports/include 
  -free -module 
/home/modules/build/ABINIT/8.10.3/intel-2018b/abinit-8.10.3/src/mods 
-O2 -xHost -ftz -fp-speculation=safe -fp-model source -fPIC -c -o 
m_spin_reciprocal.o m_spin_reciprocal.F90
m_multibinit_dataset.F90(2032): remark #8291: Recommended relationship 
between field width 'W' and the number of fractional digits 'D' in this 
edit descriptor is 'W>=D+7'.


if(multibinit_dtset%strcpling==2)write(nunit,'(3x,a9,3es8.2)')'delta_df',multibinit_dtset%delta_df
---^
effpot_xml.c(29): catastrophic error: cannot open source file 
"libxml/parser.h"

  #include <libxml/parser.h>
^

compilation aborted for effpot_xml.c (code 4)

Do you have any other suggestions?

Thanks,
Ole


On 8/28/19 12:42 PM, Yann Pouillon wrote:

On Wed, 28 Aug 2019 12:11:44 +0200
Ole Holm Nielsen wrote:


Unfortunately, the build now crashes with this error in the log file:

[...]

---^
effpot_xml.c(29): catastrophic error: cannot open source file
"libxml/parser.h"
#include <libxml/parser.h>
  ^
compilation aborted for effpot_xml.c (code 4)

The missing source file libxml/parser.h is actually installed on the system:

$ rpm -qf /usr/include/libxml2/libxml/parser.h
libxml2-devel-2.9.1-6.el7_2.3.x86_64

Could you kindly suggest some additional configopts to solve this problem?


I hadn't realized this is ABINIT 8.10.3, where the build system still has
some issues with mixing C and Fortran.

The following configopt might help:

   CPPFLAGS_EXTRA="-I/usr/include/libxml2"

It will be automatically searched for in future versions.


Re: [easybuild] Re: Missing features in ABINIT/8.10.3-intel-2018b

2019-08-28 Thread Ole Holm Nielsen

Hi Yann,

Thanks very much for your insightful advice!  I've modified 
ABINIT-8.10.3-intel-2018b.eb to contain flags pointing to the CentOS 
system libxml2 headers:


dependencies = [
('libxc', '3.0.1'),
('netCDF', '4.6.1'),
('netCDF-Fortran', '4.4.4'),
('HDF5', '1.10.2'),
('Wannier90', '2.0.1.1', '-abinit'),
('AtomPAW', '4.1.0.5'),
]

# ensure mpi and intel toolchain
configopts = '--enable-mpi '

# xml support
configopts += '--enable-xml '
configopts += 'CFLAGS_EXTRA="-I/usr/include/libxml2" '
configopts += 'FC_LIBS_EXTRA="-lxml2 -lz -lm -ldl" '

# openmp support
configopts += '--enable-openmp '
...

The RPM packages were already installed on our CentOS 7.6 system:

$ rpm -q libxml2 libxml2-devel
libxml2-2.9.1-6.el7_2.3.x86_64
libxml2-devel-2.9.1-6.el7_2.3.x86_64

Unfortunately, the build now crashes with this error in the log file:

mpiifort -DHAVE_CONFIG_H -I. -I../..  -I../../src/77_ddb 
-I../../src/77_ddb -I../../src/32_util -I../../src/32_util 
-I../../src/44_abitools -I../../src/44_abitools 
-I../../src/27_toolbox_oop -I../../src/27_toolbox_oop 
-I../../src/44_abitypes_defs -I../../src/44_abitypes_defs 
-I../../src/72_response -I../../src/72_response -I../../src/16_hideleave 
-I../../src/16_hideleave -I../../src/42_parser -I../../src/42_parser 
-I../../src/55_abiutil -I../../src/55_abiutil -I../../src/41_geometry 
-I../../src/41_geometry -I../../src/45_geomoptim 
-I../../src/45_geomoptim -I../../src/12_hide_mpi -I../../src/12_hide_mpi 
-I../../src/10_defs -I../../src/10_defs -I../../src/14_hidewrite 
-I../../src/14_hidewrite -I../../src/57_iovars -I../../src/57_iovars 
-I../../src/28_numeric_noabirule -I../../src/28_numeric_noabirule 
-I../../src/incs -I../../src/incs 
-I/home/modules/software/netCDF/4.6.1-intel-2018b/include 
-I/home/modules/software/netCDF-Fortran/4.4.4-intel-2018b/include 
-I/home/modules/build/ABINIT/8.10.3/intel-2018b/abinit-8.10.3/fallbacks/exports/include 
  -free -module 
/home/modules/build/ABINIT/8.10.3/intel-2018b/abinit-8.10.3/src/mods 
-O2 -xHost -ftz -fp-speculation=safe -fp-model source -fPIC -c -o 
m_spin_reciprocal.o m_spin_reciprocal.F90
m_multibinit_dataset.F90(2032): remark #8291: Recommended relationship 
between field width 'W' and the number of fractional digits 'D' in this 
edit descriptor is 'W>=D+7'.


if(multibinit_dtset%strcpling==2)write(nunit,'(3x,a9,3es8.2)')'delta_df',multibinit_dtset%delta_df
---^
effpot_xml.c(29): catastrophic error: cannot open source file 
"libxml/parser.h"

  #include <libxml/parser.h>
^
compilation aborted for effpot_xml.c (code 4)

The missing source file libxml/parser.h is actually installed on the system:

$ rpm -qf /usr/include/libxml2/libxml/parser.h
libxml2-devel-2.9.1-6.el7_2.3.x86_64
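
Since the header lives at /usr/include/libxml2/libxml/parser.h, an include of "libxml/parser.h" only resolves when /usr/include/libxml2 itself is on the C compiler's include path. A minimal Python sketch of that lookup (the function name and the directory list are assumptions for illustration):

```python
import os


def find_header(header, include_dirs):
    """Return the first include directory that contains `header`, or None."""
    for directory in include_dirs:
        if os.path.isfile(os.path.join(directory, header)):
            return directory
    return None


# Directories as they might be passed with -I (an assumption, adjust as needed)
include_dirs = ["/usr/include", "/usr/include/libxml2"]
print(find_header("libxml/parser.h", include_dirs))
```

With only /usr/include in the list the lookup returns None on a system laid out like the one above, which matches the "cannot open source file" error.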

Could you kindly suggest some additional configopts to solve this problem?

Thanks a lot,
Ole

On 8/27/19 11:33 PM, Yann Pouillon wrote:

Dear Ole,

If you use the --enable-xml option, the configure script of ABINIT assumes
that you have the development files of LibXML2 installed on your system.

I don't know if there is a specific EasyConfig for LibXML2 or if this should
be marked as "system dependency". In any case, EasyBuild should be informed
that the XML header files are required.





[easybuild] Re: Missing features in ABINIT/8.10.3-intel-2018b

2019-08-27 Thread Ole Holm Nielsen

Hi Jean-Michel,

Our user requiring ABINIT with XML has now hit a new error:

--- !ERROR
src_file: m_pspheads.F90
src_line: 226
mpi_rank: 0
message: |
XML norm-conserving pseudopotential has been input, but abinit is 
not compiled with libPSML support. Reconfigure and recompile.

...

Question: How do I build ABINIT with libpsml support?

I noticed that EB includes an old libpsml version:

$ eb -S libpsml
CFGS1=/home/modules/software/EasyBuild/3.9.4/lib/python2.7/site-packages/easybuild_easyconfigs-3.9.4-py2.7.egg/easybuild/easyconfigs/l/libpsml
 * $CFGS1/libpsml-1.1.7-foss-2016b.eb
 * $CFGS1/libpsml-1.1.7-foss-2017a.eb

Maybe such a module could be used?

Thanks,
Ole

On 8/27/19 12:47 PM, Ole Holm Nielsen wrote:

Hi Jean-Michel,

On 8/27/19 11:24 AM, Jean-Michel Beuken wrote:

it seems a problem of dependency...

you need to install  ( for CentOS )

libxml2-2.9.1-6.el7_2.3.x86_64
libxml2-devel-2.9.1-6.el7_2.3.x86_64

and add these two environment variables in the config file

configopts += 'CFLAGS_EXTRA="-I/usr/include/libxml2" '
configopts += 'FC_LIBS_EXTRA="-lxml2 -lz -lm -ldl" '


Actually we already have the libxml2 EB module installed, so I changed 
ABINIT-8.10.3-intel-2018b.eb to contain an extra line for a libxml2 
dependency and added your suggested configopts lines:


...
dependencies = [
     ('libxc', '3.0.1'),
     ('netCDF', '4.6.1'),
     ('netCDF-Fortran', '4.4.4'),
     ('HDF5', '1.10.2'),
     ('Wannier90', '2.0.1.1', '-abinit'),
     ('AtomPAW', '4.1.0.5'),
     ('libxml2', '2.9.8'),
]

# ensure mpi and intel toolchain
configopts = '--enable-mpi '

# xml support
configopts += '--enable-xml '
configopts += 'FC_LIBS_EXTRA="-lxml2 -lz -lm -ldl" '

# openmp support
configopts += '--enable-openmp '
...

Now ABINIT builds successfully!  Do you think it is a good idea to add 
the XML support to the ABINIT .eb file in EasyBuild?


Thanks a lot for your support!

Best regards,
Ole


[easybuild] Re: Missing features in ABINIT/8.10.3-intel-2018b

2019-08-27 Thread Ole Holm Nielsen

Hi Jean-Michel,

On 8/27/19 11:24 AM, Jean-Michel Beuken wrote:

it seems a problem of dependency...

you need to install  ( for CentOS )

libxml2-2.9.1-6.el7_2.3.x86_64
libxml2-devel-2.9.1-6.el7_2.3.x86_64

and add these two environment variables in the config file

configopts += 'CFLAGS_EXTRA="-I/usr/include/libxml2" '
configopts += 'FC_LIBS_EXTRA="-lxml2 -lz -lm -ldl" '


Actually we already have the libxml2 EB module installed, so I changed 
ABINIT-8.10.3-intel-2018b.eb to contain an extra line for a libxml2 
dependency and added your suggested configopts lines:


...
dependencies = [
('libxc', '3.0.1'),
('netCDF', '4.6.1'),
('netCDF-Fortran', '4.4.4'),
('HDF5', '1.10.2'),
('Wannier90', '2.0.1.1', '-abinit'),
('AtomPAW', '4.1.0.5'),
('libxml2', '2.9.8'),
]

# ensure mpi and intel toolchain
configopts = '--enable-mpi '

# xml support
configopts += '--enable-xml '
configopts += 'FC_LIBS_EXTRA="-lxml2 -lz -lm -ldl" '

# openmp support
configopts += '--enable-openmp '
...

Now ABINIT builds successfully!  Do you think it is a good idea to add 
the XML support to the ABINIT .eb file in EasyBuild?


Thanks a lot for your support!

Best regards,
Ole
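
Because an easyconfig is plain Python, the configopts fragments above are simply concatenated into one string, which is why each fragment ends with a space. A minimal sketch of how the resulting string breaks into individual configure arguments, assuming shell-style splitting (the helper name is illustrative):

```python
import shlex

# Fragments as in the easyconfig excerpt above; note the trailing spaces
configopts = '--enable-mpi '
configopts += '--enable-xml '
configopts += 'FC_LIBS_EXTRA="-lxml2 -lz -lm -ldl" '
configopts += '--enable-openmp '


def configure_args(opts):
    """Split an options string into arguments the way a POSIX shell would."""
    return shlex.split(opts)


for arg in configure_args(configopts):
    print(arg)
```

The double quotes keep the four linker flags together as the single value of FC_LIBS_EXTRA; without a trailing space, two neighbouring fragments would merge into one argument.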

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


[easybuild] Re: Missing features in ABINIT/8.10.3-intel-2018b

2019-08-27 Thread Ole Holm Nielsen

Hi Jean-Michel,

Thanks very much for your fine analysis of our user's attempts to 
configure ABINIT (incorrectly, as it turned out)!  I have made a copy of 
the ABINIT-8.10.3-intel-2018b.eb file and added the two configopts which 
you recommend below.


Unfortunately it seems that I am missing something, because the build fails 
due to a missing source file "libxml/parser.h".  Error messages in the log 
file are:


mpiifort -DHAVE_CONFIG_H -I. -I../..  -I../../src/77_ddb 
-I../../src/77_ddb -I../../src/32_util -I../../src/32_util 
-I../../src/44_abitools -I../../src/44_abitools 
-I../../src/27_toolbox_oop -I../../src/27_toolbox_oop 
-I../../src/44_abitypes_defs -I../../src/44_abitypes_defs 
-I../../src/72_response -I../../src/72_response -I../../src/16_hideleave 
-I../../src/16_hideleave -I../../src/42_parser -I../../src/42_parser 
-I../../src/55_abiutil -I../../src/55_abiutil -I../../src/41_geometry 
-I../../src/41_geometry -I../../src/45_geomoptim 
-I../../src/45_geomoptim -I../../src/12_hide_mpi -I../../src/12_hide_mpi 
-I../../src/10_defs -I../../src/10_defs -I../../src/14_hidewrite 
-I../../src/14_hidewrite -I../../src/57_iovars -I../../src/57_iovars 
-I../../src/28_numeric_noabirule -I../../src/28_numeric_noabirule 
-I../../src/incs -I../../src/incs 
-I/home/modules/software/netCDF/4.6.1-intel-2018b/include 
-I/home/modules/software/netCDF-Fortran/4.4.4-intel-2018b/include 
-I/home/modules/build/ABINIT/8.10.3/intel-2018b/abinit-8.10.3/fallbacks/exports/include 
  -free -module 
/home/modules/build/ABINIT/8.10.3/intel-2018b/abinit-8.10.3/src/mods 
-O2 -xHost -ftz -fp-speculation=safe -fp-model source -fPIC -c -o 
m_mathfuncs.o m_mathfuncs.F90
effpot_xml.c(29): catastrophic error: cannot open source file 
"libxml/parser.h"

  #include <libxml/parser.h>
^
compilation aborted for effpot_xml.c (code 4)

Can you help add the missing file to the build somehow?

Thanks a lot,
Ole


On 8/26/19 6:45 PM, Jean-Michel Beuken wrote:

Add blue lines in ABINIT/8.10.3-intel-2018b config

-

# ensure mpi and intel toolchain
configopts = '--enable-mpi '

*# xml support*
*configopts += '--enable-xml '*

*# openmp support*
*configopts += '--enable-openmp '*

# linalg & fft




--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


[easybuild] Missing features in ABINIT/8.10.3-intel-2018b

2019-08-26 Thread Ole Holm Nielsen
Could I ask any ABINIT/EasyBuild experts out there for help in adding 
some features to the ABINIT/8.10.3-intel-2018b module installed with 
EasyBuild?


We have an ABINIT user who wrote to me about some features which he is 
missing, and which were apparently described during the "ABINIT Hands-on 
2019" session named "Installing ABINIT":


But for PAW based datasets only ‘.xml’ formatted pseudopotentials are available. So we need to activate ‘xml’ during configure. 


Listed below please find suggested setups during configure (xml, LAPACK, libxc 
are necessary, mpi and OpenMP are suggested and can work together, I'm not sure 
the compilers, CXX=mpicxx, FC=mpif90, CC=mpicc are suitable for our platform or 
not):

./configure --prefix=/path/to/be/installed \
--enable-xml=yes \
--enable-LAPACK=yes \
--enable-libxc=yes \
--enable-mpi=yes \
--enable-OpenMP=yes \
--enable-FFTW=yes \
--enable-64bit-flags \
--enable-netcdf=yes \
--with-dft-flavor=libxc \
--with-linalg-flavor=netCDF \
--enable-bigdft=yes \
--with-trio-flavor=netcdf \
--with-timer-flavor=abinit \
CXX=mpicxx FC=mpif90 CC=mpicc \


I'm not sure if this request is simple to satisfy, and whether one just 
needs to add something to the file 
$CFGS1/a/ABINIT/ABINIT-8.10.3-intel-2018b.eb.  In any case, I 
need an ABINIT expert to review the above request and suggest possible 
ways in which we can satisfy our user.


Thanks for sharing any insights!

/Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


Re: [easybuild] Unable to install Eb on CentOS 7.6

2019-06-18 Thread Ole Holm Nielsen

On 6/18/19 2:25 PM, Yann Sagon wrote:

I'm trying to install EasyBuild as user in CentOS 7.6.

I tried this procedure:
https://easybuild.readthedocs.io/en/latest/Installation.html#bootstrapping-procedure

which complains about vsc-install.


You should first check the Requirements:
https://easybuild.readthedocs.io/en/latest/Installation.html#requirements
https://easybuild.readthedocs.io/en/latest/Installation.html#required-python-packages

Then it should work just fine :-)

/Ole
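
A minimal Python sketch for checking up front whether required Python packages can be imported. The module names in the example list are assumptions; the authoritative list is the Requirements page linked above:

```python
import importlib.util


def is_importable(name):
    """Return True if the module `name` can be located by the import system."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised for dotted names when the parent package is missing
        return False


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    return [name for name in names if not is_importable(name)]


# Assumed module names for illustration only
print(missing_modules(["setuptools", "vsc.install"]))
```

An empty list means every named module was found for the Python interpreter that runs the check, which should be the same interpreter that will run EasyBuild.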


Re: [easybuild] Openblas(foss) matrix issue

2019-05-29 Thread Ole Holm Nielsen

Hi Pablo,

Thanks for the good news!  For those of us who are not experts, could 
you kindly describe the commands required to update our currently 
installed OpenBLAS modules correctly?  Are there any caveats with 
updating an existing module in a toolchain (foss)?


We currently have these OpenBLAS modules installed:

$ ml av OpenBLAS/

-- /home/modules/modules/all ---
   OpenBLAS/0.2.20-GCC-6.4.0-2.28    OpenBLAS/0.3.5-GCC-8.2.0-2.31.1 (D)
   OpenBLAS/0.3.1-GCC-7.3.0-2.30

So is the update command simply this one?

eb --rebuild OpenBLAS-0.3.5-GCC-8.2.0-2.31.1.eb 
OpenBLAS-0.3.1-GCC-7.3.0-2.30.eb OpenBLAS-0.2.20-GCC-6.4.0-2.28.eb


Is the --force flag also required?

I suppose that we do not have to rebuild the ScaLAPACK modules as well?

Thanks a lot,
Ole
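
The command above pairs each installed module with an easyconfig file of the same name. A minimal Python sketch of that mapping, assuming the module names follow the name/version-toolchain pattern shown in the listing (the helper names are illustrative):

```python
def easyconfig_for(module_name):
    """Map 'Name/version-toolchain' to the easyconfig file name 'Name-version-toolchain.eb'."""
    return module_name.replace("/", "-", 1) + ".eb"


def rebuild_command(module_names):
    """Build the argument list for one 'eb --rebuild' call covering all modules."""
    return ["eb", "--rebuild"] + [easyconfig_for(name) for name in module_names]


modules = [
    "OpenBLAS/0.3.5-GCC-8.2.0-2.31.1",
    "OpenBLAS/0.3.1-GCC-7.3.0-2.30",
    "OpenBLAS/0.2.20-GCC-6.4.0-2.28",
]
print(" ".join(rebuild_command(modules)))
```

The sketch only assembles the command line; whether further options are needed is a separate question.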



On 5/28/19 5:14 PM, Pablo Escobar Lopez wrote:

thank you Carlos! You did a great job figuring out this fix :)

I can confirm that after applying this patch in our cluster the issue 
seems to be solved for us. Now we pass these tests with 
"OpenBLAS/0.3.1-GCC-7.3.0-2.30":

https://github.com/eylenth/Openblas_matrix_issue
https://github.com/xianyi/BLAS-Tester

I also got a confirmation from a colleague in our user support team that 
a problem he was trying to debug with some R code is solved after this 
fix was applied.


I have sent a PR with the fix upstream:
https://github.com/easybuilders/easybuild-easyconfigs/pull/8396

If anyone else tests the workaround, it would be nice if you could report 
on the mailing list or in the pull request on GitHub whether it works 
for you too.


regards,
Pablo


On Tue, May 28, 2019 at 2:32 PM Carlos Fenoy wrote:


Hi,

After fighting a long time with this, we managed to get a solution
that passes both the "Openblas_matrix_issue" and "BLAS_tester" test
suites.

To solve the issue we had to apply a patch and add a new build
parameter (USE_SIMPLE_THREADED_LEVEL3=1) to OpenBLAS to make it work
with multiple openmp threads.

This is what the buildopts line looks like for us:

buildopts = ' USE_SIMPLE_THREADED_LEVEL3=1 BINARY=64 USE_THREAD=1
USE_OPENMP=1 CC="$CC" FC="$F77" DYNAMIC_ARCH=1'

And the patch, we got it from this commit on the OpenBLAS repo:

https://github.com/xianyi/OpenBLAS/commit/b14f44d2adbe1ec8ede0cdf06fb8b09f3c4b6e43
(you can get the patch by adding .patch at the end of the URL)

Regards,
Carlos
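
The two ingredients above can be written out in Python, which is also the syntax of an easyconfig. This is a sketch only: the helper name is illustrative, and the buildopts string is the one quoted above, joined back onto one logical line after the e-mail wrapping:

```python
def patch_url(commit_url):
    """Append '.patch' to a GitHub commit URL to get the commit as a patch file."""
    return commit_url.rstrip("/") + ".patch"


COMMIT = ("https://github.com/xianyi/OpenBLAS/commit/"
          "b14f44d2adbe1ec8ede0cdf06fb8b09f3c4b6e43")

# Adjacent string literals are concatenated by Python into a single string
buildopts = (' USE_SIMPLE_THREADED_LEVEL3=1 BINARY=64 USE_THREAD=1 '
             'USE_OPENMP=1 CC="$CC" FC="$F77" DYNAMIC_ARCH=1')

print(patch_url(COMMIT))
print(buildopts)
```

The downloaded patch file would then still have to be listed in the easyconfig and placed where EasyBuild can find it.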

On Mon, May 27, 2019 at 6:15 PM Pablo Escobar Lopez
<pablo.escobarlo...@unibas.ch> wrote:

Hi,

Did anyone find a working patch or workaround for the matrix
issue when using OpenBLAS-0.3.1?

After many attempts, and no matter what patches, toolchainopts or
buildopts I use (I have tried a few different combinations), I
couldn't pass the tests in
https://github.com/eylenth/Openblas_matrix_issue when using

https://github.com/easybuilders/easybuild-easyconfigs/blob/master/easybuild/easyconfigs/o/OpenBLAS/OpenBLAS-0.3.1-GCC-7.3.0-2.30.eb

Is anyone able to pass the tests using openblas-0.3.1?

I could pass the tests using openblas-0.3.5 but upgrading my
foss/2018b toolchain would be quite messy because I use RPATH.
The less intrusive solution for my users would be to be able to
patch openblas-0.3.1 somehow but I couldn't find a working
solution. Any suggestions?

regards,
Pablo.

P.S. On a related topic: IMHO, unless there is a proper
workaround, I would suggest that we stop providing openblas-0.3.1
with EasyBuild. Right now we are distributing a broken library.

On Tue, May 7, 2019 at 6:34 PM Mikael Öhman <micket...@gmail.com> wrote:

Hi Thomas,

I can also confirm these issues. I tried rebuilding
OpenBLAS+R after the fix in #7180, but I still saw the same
problems.
Very large matrix-matrix multiplications randomly gave the
wrong result. Very large errors. The larger the matrix, the
more frequent the errors.

In the end, I compiled an intel-version (but I had to remove
a few extensions that didn't build) and removed my Foss
version from our installations.

Perhaps it's related to hardware; I saw this happen on
Skylake servers. I haven't had time to check whether
https://github.com/easybuilders/easybuild-easyconfigs/issues/8197
also affects 0.3.1.

Best regards, Mikael


On Tue, May 7, 2019 at 6:12 PM Thomas Eylenbosch
<thomas.eylenbosch@agro.basf-se.com> wrote:

Hello

Some of our end users reported a calculation issue with
matrices when they are working with a foss/2018b module


Re: [easybuild] Building foss-2019a fails in binutils-2.31.1.eb (Skylake node)

2019-02-26 Thread Ole Holm Nielsen

On 2/26/19 11:24 AM, Kenneth Hoste wrote:

On 22/02/2019 14:40, Ole Holm Nielsen wrote:

On 2/22/19 1:24 PM, Lars Viklund wrote:

A system C and C++ compiler is documented as a required dependency of 
EasyBuild if you are to install toolchains with it:

https://easybuild.readthedocs.io/en/latest/Installation.html#required-dependencies

As such, I would argue that it's not worth declaring it up-front in 
osdependencies in all software that needs a system compiler, particularly 
as the package will be named differently on different distros.


Thanks for pointing this out!  I agree with you.

It's all too easy to miss the general EB prerequisites/dependencies 
when installing a new node.  I've added the CentOS 7 specifics to my 
EB Wiki page:
https://wiki.fysik.dtu.dk/niflheim/EasyBuild_modules#easybuild-prerequisites 




One easy thing we could do is make the binutils easyblock check whether 
both 'gcc' and 'g++' are present, and emit a clear warning if they're 
missing?
That would help significantly, since pinpointing the underlying problem 
is clearly not trivial.


Not sure we should make that a hard failure though, as there may be 
situations where not having gcc/g++ is actually fine (e.g. when 'cc' and 
'c++' compilers are available, and can be used to compile binutils).


One other option could be to detect that the build failed because g++ is 
not there (by recognizing the pattern in the configure output or in 
config.log).


Same applies for GCC(core), where you also need a system C++ compiler 
with sufficiently recent versions...


I'm definitely voting for such a check!!  This would have saved me a lot 
of time when binutils refused to build (due to my own error, as it 
turned out).  I don't know the best solution, I'll leave that to the EB 
experts.


Thanks,
Ole
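
The kind of check discussed above could look roughly like the following Python sketch. It is only an illustration of the idea and not the code of the binutils easyblock; the function name and the warning text are made up:

```python
import shutil


def missing_tools(tools=("gcc", "g++")):
    """Return the commands from `tools` that are not found in PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    # A warning rather than a hard failure, since 'cc'/'c++' may still be usable
    print("WARNING: system compiler(s) not found in PATH: " + ", ".join(missing))
else:
    print("gcc and g++ are both available")
```

Printing a warning instead of aborting follows the concern raised above that a missing gcc/g++ is not always fatal.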


Re: [easybuild] Building foss-2019a fails in binutils-2.31.1.eb (Skylake node)

2019-02-22 Thread Ole Holm Nielsen

On 2/22/19 1:24 PM, Lars Viklund wrote:

A system C and C++ compiler is documented as a required dependency of EasyBuild
if you are to install toolchains with it:

   
https://easybuild.readthedocs.io/en/latest/Installation.html#required-dependencies

As such, I would argue that it's not worth declaring it up-front in 
osdependencies
in all software that needs a system compiler, particularly as the package
will be named differently on different distros.


Thanks for pointing this out!  I agree with you.

It's all too easy to miss the general EB prerequisites/dependencies when 
installing a new node.  I've added the CentOS 7 specifics to my EB Wiki 
page:

https://wiki.fysik.dtu.dk/niflheim/EasyBuild_modules#easybuild-prerequisites

/Ole


From: easybuild-requ...@lists.ugent.be  on behalf 
of Ole Holm Nielsen 
Sent: Friday, February 22, 2019 09:11
To: easybuild@lists.ugent.be
Subject: Re: [easybuild] Building foss-2019a fails in binutils-2.31.1.eb 
(Skylake node)

Hi Olivier,

On 2/20/19 10:18 PM, Olivier Mattelaer wrote:

I actually face the same issue.

The actual error message is this one:

configure: error: in
`/usr/local/Software/build/lm3-w091/binutils/2.31.1/dummy-/binutils-2.31.1/gold':
configure: error: C++ preprocessor "/lib/cpp" fails sanity check
See `config.log' for more details
yes
checking whether compiling a cross-assembler... no
checking for size_t... checking locale.h usability... make[1]: ***
[configure-gold] Error 1
make[1]: *** Waiting for unfinished jobs


I guess that doing: "yum install gcc-c++" would solve the issue.
But I have not tested yet. But does it make sense to do that?


Your suggestion fixed the problem with binutils: yum install gcc-c++
Now it builds correctly, and the foss-2019a build process is continuing.

What's the root cause of this issue?  I'm guessing that the EB file
.../EasyBuild/3.8.1/lib/python2.7/site-packages/easybuild_easyconfigs-3.8.1-py2.7.egg/easybuild/easyconfigs/b/binutils/binutils-2.31.1.eb

must include gcc-c++ as an osdependencies package.

If this guess is correct, I could open an issue.


Re: [easybuild] Building foss-2019a fails in binutils-2.31.1.eb (Skylake node)

2019-02-22 Thread Ole Holm Nielsen

Hi Olivier,

On 2/20/19 10:18 PM, Olivier Mattelaer wrote:

I actually face the same issue.

The actual error message is this one:

configure: error: in 
`/usr/local/Software/build/lm3-w091/binutils/2.31.1/dummy-/binutils-2.31.1/gold':

configure: error: C++ preprocessor "/lib/cpp" fails sanity check
See `config.log' for more details
yes
checking whether compiling a cross-assembler... no
checking for size_t... checking locale.h usability... make[1]: *** 
[configure-gold] Error 1

make[1]: *** Waiting for unfinished jobs


I guess that doing: "yum install gcc-c++" would solve the issue.
But I have not tested yet. But does it make sense to do that?


Your suggestion fixed the problem with binutils: yum install gcc-c++
Now it builds correctly, and the foss-2019a build process is continuing.

What's the root cause of this issue?  I'm guessing that the EB file
.../EasyBuild/3.8.1/lib/python2.7/site-packages/easybuild_easyconfigs-3.8.1-py2.7.egg/easybuild/easyconfigs/b/binutils/binutils-2.31.1.eb

must include gcc-c++ as an osdependencies package.

If this guess is correct, I could open an issue.

Thanks,
Ole




Re: [easybuild] Building foss-2019a fails in binutils-2.31.1.eb (Skylake node)

2019-02-22 Thread Ole Holm Nielsen

Hi Olivier,

Looking further for errors in the logfile I see the same as you point out:

== 2019-02-20 14:02:32,930 build_log.py:251 INFO building...
== 2019-02-20 14:02:32,930 easyblock.py:2616 INFO Starting build step
== 2019-02-20 14:02:32,931 easyblock.py:2622 INFO Running method 
build_step part of step build
== 2019-02-20 14:02:32,931 run.py:192 INFO running cmd:  env 
LIBS='-Wl,-rpath=/usr/lib -Wl,-rpath=/usr/lib64'  make -j 80  CFLAGS="-g 
-O2 -fPIC"
== 2019-02-20 14:02:45,434 build_log.py:162 ERROR EasyBuild crashed with 
an error (at ?:124 in __init__): cmd " env LIBS='-Wl,-rpath=/usr/lib 
-Wl,-rpath=/usr/lib64'  make -j 80  CFLAGS="-g -O2 -fPIC" " exited with 
exit code 2 and output:
make[1]: Entering directory 
`/home/modules/build/binutils/2.31.1/dummy-/binutils-2.31.1'

make[1]: Nothing to be done for `all-target'.

...

checking for working alloca.h... checking elf-hints.h usability... 
checking locale.h usability... configure: error: in 
`/home/modules/build/binutils/2.31.1/dummy-/binutils-2.31.1/gold':

configure: error: C++ preprocessor "/lib/cpp" fails sanity check
See `config.log' for more details

...

In the binutils config.log file there are more details:

configure:5044: gcc -o conftest -g -O2 -fPIC conftest.c -Wl,-rpath=/usr/lib -Wl,-rpath=/usr/lib64 >&5

/tmp/eb-dyNH9x/cclfs0Uf.o: In function `main':
/home/modules/build/binutils/2.31.1/dummy-/binutils-2.31.1/gold/conftest.c:40: 
undefined reference to `dlsym'

collect2: error: ld returned 1 exit status
configure:5044: $? = 1
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "gold"
| #define PACKAGE_TARNAME "gold"
| #define PACKAGE_VERSION "0.1"
| #define PACKAGE_STRING "gold 0.1"

Also, the CentOS 7 built-in GCC is used, I don't know if that's intended:
gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC)


Can anyone shed light on this issue?

Thanks,
Ole

On 2/20/19 10:18 PM, Olivier Mattelaer wrote:

Hi Ole, Mikael,

I actually face the same issue.

The actual error message is this one:

configure: error: in 
`/usr/local/Software/build/lm3-w091/binutils/2.31.1/dummy-/binutils-2.31.1/gold':

configure: error: C++ preprocessor "/lib/cpp" fails sanity check
See `config.log' for more details
yes
checking whether compiling a cross-assembler... no
checking for size_t... checking locale.h usability... make[1]: *** 
[configure-gold] Error 1

make[1]: *** Waiting for unfinished jobs


I guess that doing: "yum install gcc-c++" would solve the issue.
But I have not tested yet. But does it make sense to do that?

Olivier





On 20 Feb 2019, at 21:27, Mikael Öhman <micket...@gmail.com> wrote:


Hi Ole,

That still isn't the error; there should be the actual line, doing 
whatever compilation or linking, that failed. They just tend to be 
somewhere in the middle of the huge output from the make command (even 
worse with parallel builds).
You can put the entire log in some pastebin and someone is (probably) 
willing to take a look.
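
As a rough aid for digging the real failure out of a large build log, something like the following sketch can list the first lines that look like errors. The two patterns (compiler/configure style "error:" and make's "*** [target] Error N") are assumptions based on the excerpts in this thread, not an exhaustive list, and the function name is hypothetical.

```python
import re

# Lines such as 'configure: error: ...' or 'make[1]: *** [configure-gold] Error 1'
ERROR_REGEX = re.compile(r'\berror:|\*\*\* \[[^\]]+\] Error \d+')


def find_error_lines(lines, max_hits=5):
    """Return (line_number, text) pairs for the first lines that look like errors."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if ERROR_REGEX.search(line):
            hits.append((number, line.rstrip('\n')))
            if len(hits) >= max_hits:
                break
    return hits
```

Since the function accepts any iterable of lines, an open log file can be passed to it directly. With parallel builds the earliest hit is usually the interesting one, because the later "Error N" lines from make only report that a sub-make failed.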


I've built the entire 2019a toolchains on a Skylake server (with 
AVX512), without any modifications to any configs or blocks (on a 
virtual machine though, but all CPU features were exposed to the VM).


Best regards, Mikael


On Wed, Feb 20, 2019 at 8:58 PM Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> wrote:


Hi Åke,

The make actually fails as shown below.  There are hundreds of configure
and checking lines above the lines I display, and the first make command
fails as shown in the output.

Can you suggest anything else to check?  Are there issues with AVX512 on
Skylake, and if so, how to work around it?

FYI, the foss-2018a and foss-2018b toolchains have been built without
problems on the Skylake node.

Thanks,
Ole


On 20-02-2019 18:01, Åke Sandgren wrote:
> That's not the real problem.
> You have to look through that log and figure out what it really is.
> I.e., where does the make actually fail.
>
> On 2/20/19 4:10 PM, Ole Holm Nielsen wrote:
>> On 2/20/19 2:45 PM, Åke Sandgren wrote:
>>> What is the actual error? Look in
>>> /tmp/eb-dyNH9x/easybuild-binutils-2.31.1-20190220.140138.qdBpn.log
>>
>> It's:
>>
>> $ tail
/tmp/eb-dyNH9x/easybuild-binutils-2.31.1-20190220.140138.qdBpn.log
>> config.status: executing default commands
>> make[1]: Leaving directory
>> `/home/modules/build/binutils/2.31.1/dummy-/binutils-2.31.1'
>> make: *** [all] Error 2
>>   (at easybuild/tools/run.py:501 in parse_cmd_output)
>> == 2019-02-20 14:02:45,435 easyblock.py:2870 WARNING build
failed (first
>> 300 chars): cmd " env LIBS='-Wl,-rpath=/usr/lib
-Wl,-rpath=/usr/lib64'
>> make -j 80  CFLAGS="-g

Re: [easybuild] Building foss-2019a fails in binutils-2.31.1.eb (Skylake node)

2019-02-20 Thread Ole Holm Nielsen

Hi Åke,

The make actually fails as shown below.  There are hundreds of configure 
and checking lines above the lines I display, and the first make command 
fails as shown in the output.


Can you suggest anything else to check?  Are there issues with AVX512 on 
Skylake, and if so, how to work around it?


FYI, the foss-2018a and foss-2018b toolchains have been built without 
problems on the Skylake node.


Thanks,
Ole


On 20-02-2019 18:01, Åke Sandgren wrote:

That's not the real problem.
You have to look through that log and figure out what it really is.
I.e., where does the make actually fail.

On 2/20/19 4:10 PM, Ole Holm Nielsen wrote:

On 2/20/19 2:45 PM, Åke Sandgren wrote:

What is the actual error? Look in
/tmp/eb-dyNH9x/easybuild-binutils-2.31.1-20190220.140138.qdBpn.log


It's:

$ tail /tmp/eb-dyNH9x/easybuild-binutils-2.31.1-20190220.140138.qdBpn.log
config.status: executing default commands
make[1]: Leaving directory
`/home/modules/build/binutils/2.31.1/dummy-/binutils-2.31.1'
make: *** [all] Error 2
  (at easybuild/tools/run.py:501 in parse_cmd_output)
== 2019-02-20 14:02:45,435 easyblock.py:2870 WARNING build failed (first
300 chars): cmd " env LIBS='-Wl,-rpath=/usr/lib -Wl,-rpath=/usr/lib64'
make -j 80  CFLAGS="-g -O2 -fPIC" " exited with exit code 2 and output:
make[1]: Entering directory
`/home/modules/build/binutils/2.31.1/dummy-/binutils-2.31.1'
make[1]: Nothing to be done for `all-target'.
Configuring in ./libiberty
Config
== 2019-02-20 14:02:45,435 easyblock.py:288 INFO Closing log for
application name binutils version 2.31.1





Re: [easybuild] Building foss-2019a fails in binutils-2.31.1.eb (Skylake node)

2019-02-20 Thread Ole Holm Nielsen

On 2/20/19 2:45 PM, Åke Sandgren wrote:

What is the actual error? Look in
/tmp/eb-dyNH9x/easybuild-binutils-2.31.1-20190220.140138.qdBpn.log


It's:

$ tail /tmp/eb-dyNH9x/easybuild-binutils-2.31.1-20190220.140138.qdBpn.log
config.status: executing default commands
make[1]: Leaving directory 
`/home/modules/build/binutils/2.31.1/dummy-/binutils-2.31.1'

make: *** [all] Error 2
 (at easybuild/tools/run.py:501 in parse_cmd_output)
== 2019-02-20 14:02:45,435 easyblock.py:2870 WARNING build failed (first 
300 chars): cmd " env LIBS='-Wl,-rpath=/usr/lib -Wl,-rpath=/usr/lib64' 
make -j 80  CFLAGS="-g -O2 -fPIC" " exited with exit code 2 and output:
make[1]: Entering directory 
`/home/modules/build/binutils/2.31.1/dummy-/binutils-2.31.1'

make[1]: Nothing to be done for `all-target'.
Configuring in ./libiberty
Config
== 2019-02-20 14:02:45,435 easyblock.py:288 INFO Closing log for 
application name binutils version 2.31.1





On 2/20/19 2:18 PM, Ole Holm Nielsen wrote:

I'm trying to build the foss-2019a toolchain with EB 3.8.1 on a new
Intel Skylake node (40 cores + hyperthreading = 80 cores) but it fails
in binutils-2.31.1.eb as shown below.  I've tried to increase some
system limits, but that doesn't seem to help.  My current limits are:

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) 5000
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 3088986
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) 5000
open files                      (-n) 2500
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 4000
cpu time               (seconds, -t) 3
max user processes              (-u) 2000
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Output from the build process:

$ eb foss-2019a.eb -r
== temporary log file in case of crash /tmp/eb-dyNH9x/easybuild-u8cWAg.log
== resolving dependencies ...
== processing EasyBuild easyconfig
/home/modules/software/EasyBuild/3.8.1/lib/python2.7/site-packages/easybuild_easyconfigs-3.8.1-py2.7.egg/easybuild/easyconfigs/b/binutils/binutils-2.31.1.eb

== building and installing binutils/2.31.1...
== fetching files...
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== FAILED: Installation ended unsuccessfully (build directory:
/home/modules/build/binutils/2.31.1/dummy-): build failed (first 300
chars): cmd " env LIBS='-Wl,-rpath=/usr/lib -Wl,-rpath=/usr/lib64'  make
-j 80  CFLAGS="-g -O2 -fPIC" " exited with exit code 2 and output:
make[1]: Entering directory
`/home/modules/build/binutils/2.31.1/dummy-/binutils-2.31.1'
make[1]: Nothing to be done for `all-target'.
Configuring in ./libiberty
Config
== Results of the build can be found in the log file(s)
/tmp/eb-dyNH9x/easybuild-binutils-2.31.1-20190220.140138.qdBpn.log
ERROR: Build of
/home/modules/software/EasyBuild/3.8.1/lib/python2.7/site-packages/easybuild_easyconfigs-3.8.1-py2.7.egg/easybuild/easyconfigs/b/binutils/binutils-2.31.1.eb
failed (err: 'build failed (first 300 chars): cmd " env
LIBS=\'-Wl,-rpath=/usr/lib -Wl,-rpath=/usr/lib64\'  make -j 80
CFLAGS="-g -O2 -fPIC" " exited with exit code 2 and output:\nmake[1]:
Entering directory
`/home/modules/build/binutils/2.31.1/dummy-/binutils-2.31.1\'\nmake[1]:
Nothing to be done for `all-target\'.\nConfiguring in ./libiberty\nConfig')

Can anyone help me fix this issue?

Thanks,
Ole





--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,
Building 307, DK-2800 Kongens Lyngby, Denmark
E-mail: ole.h.niel...@fysik.dtu.dk
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Tel: (+45) 4525 3187 / Mobile (+45) 5180 1620


[easybuild] Re: Building foss-2019a fails in binutils-2.31.1.eb (Skylake node)

2019-02-20 Thread Ole Holm Nielsen
I should add that binutils-2.31.1.eb builds and installs without 
problems on other nodes with an identical CentOS 7.6 setup, but these 
nodes contain older Sandy Bridge and Nehalem CPUs.


/Ole

On 2/20/19 2:18 PM, Ole Holm Nielsen wrote:
I'm trying to build the foss-2019a toolchain with EB 3.8.1 on a new 
Intel Skylake node (40 cores + hyperthreading = 80 cores) but it fails 
in binutils-2.31.1.eb as shown below.  I've tried to increase some 
system limits, but that doesn't seem to help.  My current limits are:


$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) 5000
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 3088986
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) 5000
open files                      (-n) 2500
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 4000
cpu time               (seconds, -t) 3
max user processes              (-u) 2000
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Output from the build process:

$ eb foss-2019a.eb -r
== temporary log file in case of crash /tmp/eb-dyNH9x/easybuild-u8cWAg.log
== resolving dependencies ...
== processing EasyBuild easyconfig 
/home/modules/software/EasyBuild/3.8.1/lib/python2.7/site-packages/easybuild_easyconfigs-3.8.1-py2.7.egg/easybuild/easyconfigs/b/binutils/binutils-2.31.1.eb 


== building and installing binutils/2.31.1...
== fetching files...
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== FAILED: Installation ended unsuccessfully (build directory: 
/home/modules/build/binutils/2.31.1/dummy-): build failed (first 300 
chars): cmd " env LIBS='-Wl,-rpath=/usr/lib -Wl,-rpath=/usr/lib64'  make 
-j 80  CFLAGS="-g -O2 -fPIC" " exited with exit code 2 and output:
make[1]: Entering directory 
`/home/modules/build/binutils/2.31.1/dummy-/binutils-2.31.1'

make[1]: Nothing to be done for `all-target'.
Configuring in ./libiberty
Config
== Results of the build can be found in the log file(s) 
/tmp/eb-dyNH9x/easybuild-binutils-2.31.1-20190220.140138.qdBpn.log
ERROR: Build of 
/home/modules/software/EasyBuild/3.8.1/lib/python2.7/site-packages/easybuild_easyconfigs-3.8.1-py2.7.egg/easybuild/easyconfigs/b/binutils/binutils-2.31.1.eb 
failed (err: 'build failed (first 300 chars): cmd " env 
LIBS=\'-Wl,-rpath=/usr/lib -Wl,-rpath=/usr/lib64\'  make -j 80 
CFLAGS="-g -O2 -fPIC" " exited with exit code 2 and output:\nmake[1]: 
Entering directory 
`/home/modules/build/binutils/2.31.1/dummy-/binutils-2.31.1\'\nmake[1]: 
Nothing to be done for `all-target\'.\nConfiguring in ./libiberty\nConfig')


Can anyone help me fix this issue?

Thanks,
Ole



[easybuild] Building foss-2019a fails in binutils-2.31.1.eb (Skylake node)

2019-02-20 Thread Ole Holm Nielsen
I'm trying to build the foss-2019a toolchain with EB 3.8.1 on a new 
Intel Skylake node (40 cores + hyperthreading = 80 cores) but it fails 
in binutils-2.31.1.eb as shown below.  I've tried to increase some 
system limits, but that doesn't seem to help.  My current limits are:


$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) 5000
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 3088986
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) 5000
open files                      (-n) 2500
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 4000
cpu time               (seconds, -t) 3
max user processes              (-u) 2000
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Output from the build process:

$ eb foss-2019a.eb -r
== temporary log file in case of crash /tmp/eb-dyNH9x/easybuild-u8cWAg.log
== resolving dependencies ...
== processing EasyBuild easyconfig 
/home/modules/software/EasyBuild/3.8.1/lib/python2.7/site-packages/easybuild_easyconfigs-3.8.1-py2.7.egg/easybuild/easyconfigs/b/binutils/binutils-2.31.1.eb

== building and installing binutils/2.31.1...
== fetching files...
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== FAILED: Installation ended unsuccessfully (build directory: 
/home/modules/build/binutils/2.31.1/dummy-): build failed (first 300 
chars): cmd " env LIBS='-Wl,-rpath=/usr/lib -Wl,-rpath=/usr/lib64'  make 
-j 80  CFLAGS="-g -O2 -fPIC" " exited with exit code 2 and output:
make[1]: Entering directory 
`/home/modules/build/binutils/2.31.1/dummy-/binutils-2.31.1'

make[1]: Nothing to be done for `all-target'.
Configuring in ./libiberty
Config
== Results of the build can be found in the log file(s) 
/tmp/eb-dyNH9x/easybuild-binutils-2.31.1-20190220.140138.qdBpn.log
ERROR: Build of 
/home/modules/software/EasyBuild/3.8.1/lib/python2.7/site-packages/easybuild_easyconfigs-3.8.1-py2.7.egg/easybuild/easyconfigs/b/binutils/binutils-2.31.1.eb 
failed (err: 'build failed (first 300 chars): cmd " env 
LIBS=\'-Wl,-rpath=/usr/lib -Wl,-rpath=/usr/lib64\'  make -j 80 
CFLAGS="-g -O2 -fPIC" " exited with exit code 2 and output:\nmake[1]: 
Entering directory 
`/home/modules/build/binutils/2.31.1/dummy-/binutils-2.31.1\'\nmake[1]: 
Nothing to be done for `all-target\'.\nConfiguring in ./libiberty\nConfig')


Can anyone help me fix this issue?

Thanks,
Ole



Re: [easybuild] Lmod 7.8 is now available from the EPEL repository

2019-02-20 Thread Ole Holm Nielsen

On 2/20/19 11:35 AM, Kenneth Hoste wrote:

On 20/02/2019 11:25, Ole Holm Nielsen wrote:

On 2/20/19 11:19 AM, Jure Pečar wrote:

On Wed, 20 Feb 2019 10:14:29 +0100
Ole Holm Nielsen  wrote:


Thanks to the kind Lmod maintainer (Orion Poplawski) of the Fedora/EPEL
repository, the Lmod version in EPEL has been updated today from 6.6.3
to 7.8.16!  Thanks also to the people who added to the Bugzilla request
shown below!


Good news! Any gotchas when moving from 6.6 to 7.8?


Yes, it's great to finally have a modern Lmod via an RPM installation!

I had one minor gotcha, namely an incompatible .modulerc file 
containing "hide-module zlib/1.2.8".  Removing that file solved the 
problem.


Welcome to the future! (well, Lmod 7.0 was released Nov 2016, so...)


There was a substantial barrier in building one's own home-grown RPM 
package for Lmod :-(  I certainly didn't feel knowledgeable enough to 
build my own, as many other sites have done it.


Each Lmod RPM I've seen has been customized to very specific site 
setups.  The people of TACC apparently don't want to provide a public 
RPM, which is perhaps quite understandable.  So thank God for EPEL!


There's a couple of other gotchas: the format/filename of the Lmod 
spider cache file has changed, and the internal format for module 
collections has changed.


See also https://github.com/TACC/Lmod/blob/master/README.md#lmod-70 and 
https://sourceforge.net/p/lmod/mailman/message/35489261/


Thanks for the heads-up!

/Ole


Re: [easybuild] Lmod 7.8 is now available from the EPEL repository

2019-02-20 Thread Ole Holm Nielsen

On 2/20/19 11:19 AM, Jure Pečar wrote:

On Wed, 20 Feb 2019 10:14:29 +0100
Ole Holm Nielsen  wrote:


Thanks to the kind Lmod maintainer (Orion Poplawski) of the Fedora/EPEL
repository, the Lmod version in EPEL has been updated today from 6.6.3
to 7.8.16!  Thanks also to the people who added to the Bugzilla request
shown below!


Good news! Any gotchas when moving from 6.6 to 7.8?


Yes, it's great to finally have a modern Lmod via an RPM installation!

I had one minor gotcha, namely an incompatible .modulerc file containing 
"hide-module zlib/1.2.8".  Removing that file solved the problem.


/Ole


[easybuild] Lmod 7.8 is now available from the EPEL repository

2019-02-20 Thread Ole Holm Nielsen
Thanks to the kind Lmod maintainer (Orion Poplawski) of the Fedora/EPEL 
repository, the Lmod version in EPEL has been updated today from 6.6.3 
to 7.8.16!  Thanks also to the people who added to the Bugzilla request 
shown below!


A modern version of Lmod has been a very long-standing wish from many 
sites running CentOS/RHEL 7.  No longer do sites need to build their own 
locally customized up-to-date version of Lmod, if they are satisfied 
with the EPEL version.


Instructions for installing Lmod from EPEL are in my Wiki:
https://wiki.fysik.dtu.dk/niflheim/EasyBuild_modules#install-lmod

Best regards,
Ole

On 1/23/19 12:17 PM, Loris Bennett wrote:

On Centos 7 I installed EB using the system version of Lmod (6.6.3).  I
have now installed Lmod 7.3 via its easyconfig.  I now have the
directory


We could always hope that the EPEL maintainers would eventually agree to my
request to update Lmod to 7.8 in EPEL, see
https://bugzilla.redhat.com/show_bug.cgi?id=1651546

It might be beneficial if other sites would add arguments to my request in the
Bugzilla, hoping that its priority might be increased.


I have seconded your request.

Cheers,

Loris





Re: [easybuild] Canonical setup for Lmod installed via EB?

2019-01-23 Thread Ole Holm Nielsen
On 1/23/19 12:31 PM, Josef Dvoracek wrote:

On Centos 7 I installed EB using the system version of Lmod (6.6.3).


why not use less ancient lmod from openHPC ( ATM 
"lmod-ohpc-7.7.14-3.1.x86_64" ) ?
I consider openHPC as trusted-enough source of RPMs, especially in HPC 
env...


Where might the associated Yum repository or just the RPMs be found?

Thanks, Ole


Re: [easybuild] Canonical setup for Lmod installed via EB?

2019-01-23 Thread Ole Holm Nielsen

On 1/23/19 10:56 AM, Loris Bennett wrote:

On Centos 7 I installed EB using the system version of Lmod (6.6.3).  I
have now installed Lmod 7.3 via its easyconfig.  I now have the
directory


We could always hope that the EPEL maintainers would eventually agree to 
my request to update Lmod to 7.8 in EPEL, see 
https://bugzilla.redhat.com/show_bug.cgi?id=1651546


It might be beneficial if other sites would add arguments to my request 
in the Bugzilla, hoping that its priority might be increased.


Thanks,
Ole


Re: [easybuild] EB bootstrap fails: Installed distribution setuptools 0.9.8 conflicts with requirement setuptools>=17.1

2019-01-10 Thread Ole Holm Nielsen
For the record:  The problem is solved by installing this CentOS RPM 
package before bootstrapping EasyBuild:


yum install python-mock

/Ole

On 1/10/19 10:41 AM, Kenneth Hoste wrote:

Dear Ole,

Since you've also reported this at 
https://github.com/easybuilders/easybuild-framework/issues/2712, let's 
follow up there to avoid fragmenting the discussion.



regards,

Kenneth

On 10/01/2019 10:00, Ole Holm Nielsen wrote:
I'm setting up EasyBuild on a new Intel Xeon Skylake node which is 
running CentOS 7.6.  This OS comes with python-setuptools 0.9.8:


$ rpm -q python-setuptools
python-setuptools-0.9.8-7.el7.noarch

Unfortunately the EB bootstrap script for some reason is not working, 
even though the minimum version should be 0.6, see
https://easybuild.readthedocs.io/en/latest/Installation.html#required-python-packages 



I get the error shown below:

$ python bootstrap_eb.py $EASYBUILD_PREFIX
[[INFO]] EasyBuild bootstrap script (version 20180531.01, MD5: 
3968c2d88c53f96523486494bca11b4c)
[[INFO]] Found Python 2.7.5 (default, Oct 30 2018, 23:45:53) ; [GCC 
4.8.5 20150623 (Red Hat 4.8.5-36)]


[[INFO]] Installation prefix /home/modules
[[INFO]] Using modules tool specified by $EASYBUILD_MODULES_TOOL: Lmod
[[INFO]] Suitable setuptools installation already found, skipping 
stage 0...



[[INFO]] +++ STAGE 1: installing EasyBuild in temporary dir with 
easy_install...


[[INFO]] installing EasyBuild with 'easy_install --quiet --upgrade 
--prefix=/tmp/tmp5WimMe/eb_stage1 easybuild'
[[ERROR]] Running 'easy_install --quiet --upgrade 
--prefix=/tmp/tmp5WimMe/eb_stage1 easybuild' failed: error: Installed 
distribution setuptools 0.9.8 conflicts with requirement setuptools>=17.1

Traceback (most recent call last):
  File "bootstrap_eb.py", line 360, in run_easy_install
    easy_install.main(args)
  File "/usr/lib/python2.7/site-packages/setuptools/command/easy_install.py", line 1992, in main
    with_ei_usage(lambda:
  File "/usr/lib/python2.7/site-packages/setuptools/command/easy_install.py", line 1979, in with_ei_usage
    return f()
  File "/usr/lib/python2.7/site-packages/setuptools/command/easy_install.py", line 1996, in <lambda>
    distclass=DistributionWithoutHelpCommands, **kw
  File "/usr/lib64/python2.7/distutils/core.py", line 169, in setup
    raise SystemExit, "error: " + str(msg)
SystemExit: error: Installed distribution setuptools 0.9.8 conflicts with requirement setuptools>=17.1


I wonder how the 17.1 requirement comes up?  Perhaps this requirement 
makes sense with Python 3.x?


Can anyone suggest a fix so that I can get started?

Thanks,
Ole



[easybuild] EB bootstrap fails: Installed distribution setuptools 0.9.8 conflicts with requirement setuptools>=17.1

2019-01-10 Thread Ole Holm Nielsen
I'm setting up EasyBuild on a new Intel Xeon Skylake node which is 
running CentOS 7.6.  This OS comes with python-setuptools 0.9.8:


$ rpm -q python-setuptools
python-setuptools-0.9.8-7.el7.noarch

Unfortunately the EB bootstrap script for some reason is not working, 
even though the minimum version should be 0.6, see

https://easybuild.readthedocs.io/en/latest/Installation.html#required-python-packages

I get the error shown below:

$ python bootstrap_eb.py $EASYBUILD_PREFIX
[[INFO]] EasyBuild bootstrap script (version 20180531.01, MD5: 
3968c2d88c53f96523486494bca11b4c)
[[INFO]] Found Python 2.7.5 (default, Oct 30 2018, 23:45:53) ; [GCC 
4.8.5 20150623 (Red Hat 4.8.5-36)]


[[INFO]] Installation prefix /home/modules
[[INFO]] Using modules tool specified by $EASYBUILD_MODULES_TOOL: Lmod
[[INFO]] Suitable setuptools installation already found, skipping stage 0...


[[INFO]] +++ STAGE 1: installing EasyBuild in temporary dir with 
easy_install...


[[INFO]] installing EasyBuild with 'easy_install --quiet --upgrade 
--prefix=/tmp/tmp5WimMe/eb_stage1 easybuild'
[[ERROR]] Running 'easy_install --quiet --upgrade 
--prefix=/tmp/tmp5WimMe/eb_stage1 easybuild' failed: error: Installed 
distribution setuptools 0.9.8 conflicts with requirement setuptools>=17.1

Traceback (most recent call last):
  File "bootstrap_eb.py", line 360, in run_easy_install
    easy_install.main(args)
  File "/usr/lib/python2.7/site-packages/setuptools/command/easy_install.py", line 1992, in main
    with_ei_usage(lambda:
  File "/usr/lib/python2.7/site-packages/setuptools/command/easy_install.py", line 1979, in with_ei_usage
    return f()
  File "/usr/lib/python2.7/site-packages/setuptools/command/easy_install.py", line 1996, in <lambda>
    distclass=DistributionWithoutHelpCommands, **kw
  File "/usr/lib64/python2.7/distutils/core.py", line 169, in setup
    raise SystemExit, "error: " + str(msg)
SystemExit: error: Installed distribution setuptools 0.9.8 conflicts with requirement setuptools>=17.1


I wonder how the 17.1 requirement comes up?  Perhaps this requirement 
makes sense with Python 3.x?


Can anyone suggest a fix so that I can get started?

Thanks,
Ole



Re: [easybuild] The fontconfig/2.12.4-GCCcore-6.4.0 module makes /usr/bin/gedit crash on CentOS 7.6

2018-12-11 Thread Ole Holm Nielsen

Hi Kenneth,

Thanks a lot for your analysis of the problem.  We have upgraded our 
modules (GPAW and ASE) to use the intel-2018b or foss-2018b toolchain 
which loads fontconfig 2.13.  This solves the problem with gedit (and 
emacs) on CentOS 7.6.


Best regards,
Ole

On 12/7/18 2:10 PM, Kenneth Hoste wrote:

Dear Ole,

Based on https://bbs.archlinux.org/viewtopic.php?id=235716, it seems 
like the solution would be to update to fontconfig 2.13.0...


In other words, the gedit build in CentOS 7.6 requires fontconfig 2.13.0, 
which is why it fails when an older fontconfig module is loaded.


Since the problem actually comes from Pango, one solution you can try is 
to also load Pango/1.41.1-foss-2018a (which uses fontconfig 2.12.6).


Bottom line here is that the use of $LD_LIBRARY_PATH is the troublemaker 
here, since that makes gedit pick up older fontconfig libraries...
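
To illustrate the point about $LD_LIBRARY_PATH: a system binary such as gedit only picks up the module-provided fontconfig because the module's lib directory is listed in that variable. A minimal sketch of one possible workaround, assuming all EasyBuild installations live under a single prefix (the `/home/modules/software` default below is taken from paths seen elsewhere in this archive and will differ per site), is to start the system program with those entries filtered out. The helper names are hypothetical.

```python
import os


def strip_prefix_from_path_var(value, prefix):
    """Drop entries located under 'prefix' from a colon-separated path list."""
    prefix = prefix.rstrip('/')
    kept = [entry for entry in value.split(':')
            if entry and entry != prefix and not entry.startswith(prefix + '/')]
    return ':'.join(kept)


def system_env(prefix='/home/modules/software', environ=None):
    """Return a copy of the environment without module-provided library paths."""
    env = dict(os.environ if environ is None else environ)
    if 'LD_LIBRARY_PATH' in env:
        env['LD_LIBRARY_PATH'] = strip_prefix_from_path_var(
            env['LD_LIBRARY_PATH'], prefix)
    return env
```

The result could then be passed as the `env` argument of `subprocess.call(['/usr/bin/gedit'], env=system_env())`. This only sidesteps the symptom for system tools; it does not change what the loaded modules themselves see.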



regards,

Kenneth

On 07/12/2018 11:35, Ole Holm Nielsen wrote:
After we upgraded our login-nodes from CentOS 7.5 to 7.6, the 
/usr/bin/gedit editor suddenly crashes with an error message:


$ gedit
gedit: symbol lookup error: /lib64/libpangoft2-1.0.so.0: undefined 
symbol: FcWeightFromOpenTypeDouble


I've localized the error to the loading of any one of the following 
modules:


fontconfig/2.12.4-GCCcore-6.4.0
fontconfig/2.12.6-GCCcore-6.4.0

However, the latest module fontconfig/2.13.0-GCCcore-7.3.0 doesn't 
cause the error.


How to reproduce:

$ module load fontconfig/2.12.4-GCCcore-6.4.0
$ gedit
gedit: symbol lookup error: /lib64/libpangoft2-1.0.so.0: undefined 
symbol: FcWeightFromOpenTypeDouble


The following modules get loaded:

$ module list

Currently Loaded Modules:
   1) GCCcore/6.4.0   5) libpng/1.6.32-GCCcore-6.4.0
   2) expat/2.2.4-GCCcore-6.4.0   6) freetype/2.8-GCCcore-6.4.0
   3) bzip2/1.0.6-GCCcore-6.4.0   7) fontconfig/2.12.4-GCCcore-6.4.0
   4) zlib/1.2.11-GCCcore-6.4.0

The error goes away after unloading fontconfig/2.12.4-GCCcore-6.4.0.

I've rebuilt the fontconfig/2.12.4-GCCcore-6.4.0 module on the CentOS 
7.6 node, but the error is still the same :-(


Has anyone else encountered this problem, and possibly found a solution?


[easybuild] The fontconfig/2.12.4-GCCcore-6.4.0 module makes /usr/bin/gedit crash on CentOS 7.6

2018-12-07 Thread Ole Holm Nielsen
After we upgraded our login-nodes from CentOS 7.5 to 7.6, the 
/usr/bin/gedit editor suddenly crashes with an error message:


$ gedit
gedit: symbol lookup error: /lib64/libpangoft2-1.0.so.0: undefined 
symbol: FcWeightFromOpenTypeDouble


I've localized the error to the loading of any one of the following modules:

fontconfig/2.12.4-GCCcore-6.4.0
fontconfig/2.12.6-GCCcore-6.4.0

However, the latest module fontconfig/2.13.0-GCCcore-7.3.0 doesn't cause 
the error.


How to reproduce:

$ module load fontconfig/2.12.4-GCCcore-6.4.0
$ gedit
gedit: symbol lookup error: /lib64/libpangoft2-1.0.so.0: undefined 
symbol: FcWeightFromOpenTypeDouble


The following modules get loaded:

$ module list

Currently Loaded Modules:
  1) GCCcore/6.4.0   5) libpng/1.6.32-GCCcore-6.4.0
  2) expat/2.2.4-GCCcore-6.4.0   6) freetype/2.8-GCCcore-6.4.0
  3) bzip2/1.0.6-GCCcore-6.4.0   7) fontconfig/2.12.4-GCCcore-6.4.0
  4) zlib/1.2.11-GCCcore-6.4.0

The error goes away after unloading fontconfig/2.12.4-GCCcore-6.4.0.

I've rebuilt the fontconfig/2.12.4-GCCcore-6.4.0 module on the CentOS 
7.6 node, but the error is still the same :-(


Has anyone else encountered this problem, and possibly found a solution?

Thanks,
Ole



Re: [easybuild] Retrospectively hiding modules / hiding by default

2018-11-26 Thread Ole Holm Nielsen

(Resending because it seems that my reply got trapped by our spam-filter).

On 11/23/2018 10:53 AM, Markus Geimer wrote:
> You don't even need an Lmod hook.  Just put something like
>
> hide-version foo/42.0
>
> in a global rc file evaluated by Lmod (I believe it has to be TCL).
> This can easily be done after the fact.  I'm using this for quite a
> while now and it works like a charm.  See also
>
> https://lmod.readthedocs.io/en/latest/040_FAQ.html?highlight=hide-version


Thanks for suggesting the use of Lmod "hide-version" for hiding 
unwanted/obsolete/system modules from the "module avail" command!  There 
are also some useful hints in 
https://github.com/TACC/Lmod/blob/master/Transition_to_Lmod7.txt.


IMHO, this is the best available solution, and it avoids rebuilding your 
entire module tree from scratch (which may still be what we want to do 
in the long term).


I've written the attached script for conveniently generating a 
~/.modulerc file from a list of module name patterns which you want to 
hide.  On a CentOS 7 system with the Lmod RPM from EPEL, you can copy 
the .modulerc file to the Lmod system default file /usr/share/lmod/etc/rc.
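
The copy step can be sketched as follows. This is only an illustration: the function name install_modulerc is hypothetical, and keeping a backup of the existing rc file is my own precaution rather than part of the procedure described above.

```sh
#!/bin/sh
# Copy a generated modulerc file ($1) to its destination ($2), keeping
# a backup of any file that already exists at the destination.
install_modulerc() {
    src=$1
    dest=$2
    [ -r "$src" ] || { echo "cannot read $src" >&2; return 1; }
    if [ -e "$dest" ]; then
        cp -p "$dest" "$dest.bak" || return 1
    fi
    cp "$src" "$dest"
}

# Example (as root) on a CentOS 7 system with the Lmod RPM from EPEL:
#   install_modulerc ~/.modulerc /usr/share/lmod/etc/rc
# Confirm which system-wide rc file Lmod reads:
#   module --config 2>&1 | grep MODULERCFILE
```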


For the record, I've also documented this procedure and the script in my 
EasyBuild Wiki page at 
https://wiki.fysik.dtu.dk/niflheim/EasyBuild_modules#hiding-modules-with-lmod


/Ole

#!/bin/sh

# Create ~/.modulerc file with module hide-version information
# The hidden modules will not be shown by "module avail".
# List system-wide modulerc file by:
# $ module --config 2>&1 | grep MODULERCFILE
# MODULERCFILE /usr/share/lmod/etc/rc
MODULERC=~/.modulerc

# List available modules
TEMP=/tmp/modulerc.$$
rm -f $TEMP
module --terse --show-hidden avail > $TEMP 2>&1

# Generate a hide list

cat <<EOF > $MODULERC
#%Module
# Documentation of hide-version:
# https://lmod.readthedocs.io/en/latest/040_FAQ.html?highlight=hide-version
# and https://github.com/TACC/Lmod/blob/master/Transition_to_Lmod7.txt
global env
if { [info exists env(LMOD_VERSION_MAJOR)]} {
EOF

# Define patterns for which modules to hide

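# NOTE: the filter between the here-document and the redirection was lost
# in the list archive; the grep/sed pipeline below is a reconstruction.
cat <<EOF | grep -F -f - $TEMP | sed 's/^/    hide-version /' >> $MODULERC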
GCCcore-5.4.0
GCCcore-6.1.0
GCCcore/5.4.0
GCCcore/6.1.0
GCC-5.4.0-2.26
GCC-6.3.0-2.27
foss/2016a
foss-2016a
foss/2016b
foss-2016b
Autoconf
Automake
Autotools
Bison
CMake
LibTIFF
LibUUID
M4
Szip
Tcl/
Tk/
Tkinter
XML-Parser
XZ
bzip2
binutils
cURL
expat
flex
fontconfig
freetype
gettext
gompi-2016b
gompi/2016b
gperf
help2man
hwloc
libevent
libffi
libjpeg-turbo
libpng
libreadline
LibTIFF
libtool
LibUUID
libxml2
ncurses
numactl
pkg-config
tmux
util-linux
zlib
EOF

# Terminating bracket
echo "}" >> $MODULERC

echo File $MODULERC has been created

rm -f $TEMP
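
# The script above turns a list of name patterns into hide-version lines.
# That step can be tried in isolation with the sketch below. The function
# name hide_lines is hypothetical, and skipping lines that end in ":" or
# "/" is an assumption about the layout of the terse "module avail" output.

```sh
#!/bin/sh
# Read module names (one per line) on stdin and print a hide-version
# line for every name containing one of the fixed-string patterns
# listed in the file given as $1.
hide_lines() {
    grep -F -f "$1" | grep -v -e ':$' -e '/$' | sed 's/^/    hide-version /'
}

# Example:
#   printf 'zlib\nGCCcore/5.4.0\n' > patterns.txt
#   module --terse --show-hidden avail 2>&1 | hide_lines patterns.txt
```

# Inspecting this output before writing it into an rc file makes it easy
# to spot patterns that match more modules than intended.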

