Re: [easybuild] EB 3.6.0: Build of iomkl-2018.02.eb fails on Intel Omni-Path systems

2018-04-30 Thread Ole Holm Nielsen

On 04/30/2018 03:24 PM, Kenneth Hoste wrote:

On 30/04/2018 15:12, Ole Holm Nielsen wrote:

On 04/30/2018 02:44 PM, Bart Oldeman wrote:

Hello Ole,

the issue is here:
ld: cannot find -lpciaccess
ld: cannot find -lxml2
to solve this you can either install OS packages (.rpm, .deb, etc.) 
for libpciaccess and libxml2, and perhaps add them as OS dependencies.


Yes, I thought about this, but these packages are already present on 
the system:


# rpm -q libxml2 libpciaccess
libxml2-2.9.1-6.el7_2.3.x86_64
libpciaccess-0.13.4-3.1.el7_4.x86_64


You may need the -devel versions of these...


Good point!  The -devel versions were absent, so I installed the RPMs 
now and will redo the build.


/Ole


Re: [easybuild] building TensorFlow doesn't respect job-cores

2018-04-30 Thread Kenneth Hoste

Dear Yann,

On 30/04/2018 13:31, Yann Sagon wrote:


Dear Kenneth,


2018-04-30 11:32 GMT+02:00 Kenneth Hoste:


Dear Yann,

On 30/04/2018 11:28, Yann Sagon wrote:
> I just noticed that building TF17 with cuda compute 6.0 and 6.1 doesn't 
> respect the job-cores = 12 that I have in my config file. According to 
> htop, it's using ~28 cores


Did you submit the TF build as a job using "eb --job"?
The --job-cores configuration setting only applies to build jobs
submitted using --job.


duhhh! Sorry about that, I'm dumb. Indeed I should use the parallel flag 
in my case.


But to be honest, I don't see why there are two different flags for 
almost the same purpose?


They don't serve exactly the same purpose, but the difference is subtle.

Maybe a good example that combines both helps:

eb --job --job-cores=J --parallel=P ...

This will make EasyBuild submit a job using J cores, to build something 
using (exactly) P cores.
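
To make the distinction concrete, here is a minimal Python sketch that assembles the two kinds of `eb` invocation. The helper name `eb_command` and the easyconfig file name `TensorFlow.eb` are hypothetical placeholders; only the `--job`, `--job-cores` and `--parallel` options come from this thread.

```python
def eb_command(easyconfig, parallel=None, job_cores=None):
    """Assemble an `eb` command line (hypothetical helper, for illustration).

    parallel  -> --parallel=P : number of cores the build itself uses
    job_cores -> --job-cores=J: number of cores requested for the job
                                that is submitted via --job
    """
    cmd = ["eb", easyconfig]
    if job_cores is not None:
        # --job-cores only applies to builds submitted with --job
        cmd += ["--job", "--job-cores=%d" % job_cores]
    if parallel is not None:
        cmd.append("--parallel=%d" % parallel)
    return cmd


# Build directly on the current node, using 12 cores for the build
local_build = eb_command("TensorFlow.eb", parallel=12)

# Submit a job that requests 12 cores and builds with 12 cores
job_build = eb_command("TensorFlow.eb", parallel=12, job_cores=12)

print(" ".join(local_build))
print(" ".join(job_build))
```

Without `--job`, setting job-cores has no effect on how many cores the build uses, which matches what was observed in htop.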


With P

Re: [easybuild] EB 3.6.0: Build of iomkl-2018.02.eb fails on Intel Omni-Path systems

2018-04-30 Thread Bart Oldeman
Hello Ole,

you'd need the "-devel" packages too, e.g.
libpciaccess-devel-0.13.4-3.1.el7_4.x86_64
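
The reason the runtime packages are not enough: for `-lpciaccess` the linker looks for a file named `libpciaccess.so` (or `libpciaccess.a`), and on RPM-based systems the unversioned `.so` symlink is normally shipped in the `-devel` package, while the runtime package only provides the versioned `libpciaccess.so.N`. A minimal Python sketch to check for this; the function name and the default directory list are assumptions and not part of EasyBuild:

```python
import os

# Directories searched here are an assumption; adjust for your distribution.
DEFAULT_LIB_DIRS = ("/usr/lib64", "/usr/lib", "/lib64", "/lib")


def find_link_time_libs(name, lib_dirs=DEFAULT_LIB_DIRS):
    """Return the files that would satisfy `-l<name>` at link time.

    Only lib<name>.so and lib<name>.a count; a versioned lib<name>.so.N
    is enough to run programs but not to link new ones.
    """
    found = []
    for lib_dir in lib_dirs:
        for suffix in (".so", ".a"):
            candidate = os.path.join(lib_dir, "lib" + name + suffix)
            if os.path.exists(candidate):
                found.append(candidate)
    return found


for lib in ("pciaccess", "xml2"):
    hits = find_link_time_libs(lib)
    print(lib, "->", hits if hits else "not found; is the -devel package installed?")
```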

you can install them as modules using EB, perhaps as hidden modules, it's
your choice.

Regards,
Bart
-- 
Dr. Bart E. Oldeman | bart.olde...@mcgill.ca | bart.olde...@calculquebec.ca
Scientific Computing Analyst / Analyste en calcul scientifique
McGill HPC Centre / Centre de Calcul Haute Performance de McGill |
http://www.hpc.mcgill.ca
Calcul Québec | http://www.calculquebec.ca
Compute/Calcul Canada | http://www.computecanada.ca
Tel/Tél: 514-396-8926 | Fax/Télécopieur: 514-396-8934


Re: [easybuild] EB 3.6.0: Build of iomkl-2018.02.eb fails on Intel Omni-Path systems

2018-04-30 Thread Kenneth Hoste

On 30/04/2018 15:12, Ole Holm Nielsen wrote:

On 04/30/2018 02:44 PM, Bart Oldeman wrote:

Hello Ole,

the issue is here:
ld: cannot find -lpciaccess
ld: cannot find -lxml2
to solve this you can either install OS packages (.rpm, .deb, etc.) 
for libpciaccess and libxml2, and perhaps add them as OS dependencies.


Yes, I thought about this, but these packages are already present on the 
system:


# rpm -q libxml2 libpciaccess
libxml2-2.9.1-6.el7_2.3.x86_64
libpciaccess-0.13.4-3.1.el7_4.x86_64


You may need the -devel versions of these...



Alternatively you can put in dependencies for ('libxml2', '2.9.8') and 
('libpciaccess', '0.13.4'), where you would need to install 
libpciaccess using GCCcore instead of an intel or foss toolchain.


So you mean that I may need to install these libraries as modules using EB?


Yes, that's what Bart is suggesting.

I wonder why this hasn't popped up before though...

Do you mind opening an issue on this at 
https://github.com/easybuilders/easybuild-easyconfigs/issues?



regards,

Kenneth



Thanks,
Ole

On Mon, 30 Apr 2018 at 05:31, Ole Holm Nielsen wrote:


    When we build on nodes with Intel Omni-Path software installed, the
    build of "iomkl" fails:

    $ eb iomkl-2018.02.eb -r
    == temporary log file in case of crash
    /tmp/eb-MDUNmr/easybuild-UxI3tr.log
    == resolving dependencies ...
    == processing EasyBuild easyconfig
    /home/modules/software/EasyBuild/3.6.0/lib/python2.7/site-packages/easybuild_easyconfigs-3.6.0-py2.7.egg/easybuild/easyconfigs/o/OpenMPI/OpenMPI-2.1.3-iccifort-2018.2.199-GCC-6.4.0-2.28.eb
    == building and installing
    OpenMPI/2.1.3-iccifort-2018.2.199-GCC-6.4.0-2.28...
    == fetching files...
    == creating build dir, resetting environment...
    == unpacking...
    == patching...
    == preparing...
    == configuring...
    == building...
    == FAILED: Installation ended unsuccessfully (build directory:
    /home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28):
    build failed (first 300 chars): cmd " make -j 48 " exited with exit
    code 2 and output:
    Making all in config
    make[1]: Entering directory
    `/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/config'
    make[1]: Nothing to be done for `all'.
    make[1]: Leaving directory `/home/modules/build/OpenMPI/2.1.3/icc
    == Results of the build can be found in the log file(s)
    /tmp/eb-MDUNmr/easybuild-OpenMPI-2.1.3-20180430.105741.vWNYR.log
    ERROR: Build of
    /home/modules/software/EasyBuild/3.6.0/lib/python2.7/site-packages/easybuild_easyconfigs-3.6.0-py2.7.egg/easybuild/easyconfigs/o/OpenMPI/OpenMPI-2.1.3-iccifort-2018.2.199-GCC-6.4.0-2.28.eb
    failed (err: 'build failed (first 300 chars): cmd " make -j 48 " exited
    with exit code 2 and output:\nMaking all in config\nmake[1]: Entering
    directory
    `/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/config\'\nmake[1]:
    Nothing to be done for `all\'.\nmake[1]: Leaving directory
    `/home/modules/build/OpenMPI/2.1.3/icc')

    The OpenMPI error log file contains near the end:

      CCLD libopen-pal.la
    ld: cannot find -lpciaccess
    ld: cannot find -lxml2
    make[2]: *** [libopen-pal.la] Error 1
    make[2]: Leaving directory
    `/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/opal'
    make[1]: *** [all-recursive] Error 1
    make[1]: Leaving directory
    `/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/opal'
    make: *** [all-recursive] Error 1
     (at easybuild/tools/run.py:501 in parse_cmd_output)
    == 2018-04-30 11:03:17,152 easyblock.py:2702 WARNING build failed
    (first 300 chars): cmd " make -j 48 " exited with exit code 2 and output:
    Making all in config
    make[1]: Entering directory
    `/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/config'
    make[1]: Nothing to be done for `all'.
    make[1]: Leaving directory `/home/modules/build/OpenMPI/2.1.3/icc
    == 2018-04-30 11:03:17,152 easyblock.py:280 INFO Closing log for
    application name OpenMPI version 2.1.3


    Question: Can anyone point to the cause of this error?  Did OpenMPI
    2.1.3 introduce this error?


    Extra information: The issue
    https://github.com/easybuilders/easybuild-easyconfigs/issues/5805 is
    fixed with EB 3.6.0 and OpenMPI 2.1.3.  On our systems with Mellanox
    Infiniband software installed, the line in
    OpenMPI-2.1.3-iccifort-2018.2.199-GCC-6.4.0-2.28.eb *does* fix the
    issue 5805 on this platform:

    configopts += '--without-ucx '  # hard disable UCX, to dance around bug
    (https://github.com/open-mpi/ompi/issues/4345)

    On the Intel Omni-Path system I 

Re: [easybuild] EB 3.6.0: Build of iomkl-2018.02.eb fails on Intel Omni-Path systems

2018-04-30 Thread Bart Oldeman
Hello Ole,

the issue is here:
ld: cannot find -lpciaccess
ld: cannot find -lxml2
to solve this you can either install OS packages (.rpm, .deb, etc.) for
libpciaccess and libxml2, and perhaps add them as OS dependencies.

Alternatively you can put in dependencies for ('libxml2', '2.9.8') and
('libpciaccess', '0.13.4'), where you would need to install libpciaccess
using GCCcore instead of an intel or foss toolchain.
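
The two options could look roughly as follows in the OpenMPI easyconfig (easyconfigs use Python syntax). This is a sketch, not a tested easyconfig: the Debian-style package names and the GCCcore version 6.4.0 (inferred from the `iccifort-2018.2.199-GCC-6.4.0-2.28` toolchain name) are assumptions, and normally only one of the two options would be used.

```python
# Option 1: rely on OS packages. Each tuple lists alternative package names
# (RPM-style name first, Debian-style name second; the latter is an assumption).
osdependencies = [
    ('libxml2-devel', 'libxml2-dev'),
    ('libpciaccess-devel', 'libpciaccess-dev'),
]

# Option 2: provide the libraries as EasyBuild modules instead.
# For libpciaccess, the 4th tuple element pins the dependency to a GCCcore
# toolchain (version assumed) rather than an intel or foss toolchain.
dependencies = [
    ('libxml2', '2.9.8'),
    ('libpciaccess', '0.13.4', '', ('GCCcore', '6.4.0')),
]
```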

Regards,
Bart

On Mon, 30 Apr 2018 at 05:31, Ole Holm Nielsen wrote:

> When we build on nodes with Intel Omni-Path software installed, the
> build of "iomkl" fails:
>
> $ eb iomkl-2018.02.eb -r
> == temporary log file in case of crash /tmp/eb-MDUNmr/easybuild-UxI3tr.log
> == resolving dependencies ...
> == processing EasyBuild easyconfig
>
> /home/modules/software/EasyBuild/3.6.0/lib/python2.7/site-packages/easybuild_easyconfigs-3.6.0-py2.7.egg/easybuild/easyconfigs/o/OpenMPI/OpenMPI-2.1.3-iccifort-2018.2.199-GCC-6.4.0-2.28.eb
> == building and installing
> OpenMPI/2.1.3-iccifort-2018.2.199-GCC-6.4.0-2.28...
> == fetching files...
> == creating build dir, resetting environment...
> == unpacking...
> == patching...
> == preparing...
> == configuring...
> == building...
> == FAILED: Installation ended unsuccessfully (build directory:
> /home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28):
> build failed (first 300 chars): cmd " make -j 48 " exited with exit code
> 2 and output:
> Making all in config
> make[1]: Entering directory
> `/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/config'
> make[1]: Nothing to be done for `all'.
> make[1]: Leaving directory `/home/modules/build/OpenMPI/2.1.3/icc
> == Results of the build can be found in the log file(s)
> /tmp/eb-MDUNmr/easybuild-OpenMPI-2.1.3-20180430.105741.vWNYR.log
> ERROR: Build of
> /home/modules/software/EasyBuild/3.6.0/lib/python2.7/site-packages/easybuild_easyconfigs-3.6.0-py2.7.egg/easybuild/easyconfigs/o/OpenMPI/OpenMPI-2.1.3-iccifort-2018.2.199-GCC-6.4.0-2.28.eb
>
> failed (err: 'build failed (first 300 chars): cmd " make -j 48 " exited
> with exit code 2 and output:\nMaking all in config\nmake[1]: Entering
> directory
> `/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/config\'\nmake[1]:
>
> Nothing to be done for `all\'.\nmake[1]: Leaving directory
> `/home/modules/build/OpenMPI/2.1.3/icc')
>
> The OpenMPI error log file contains near the end:
>
>   CCLD libopen-pal.la
> ld: cannot find -lpciaccess
> ld: cannot find -lxml2
> make[2]: *** [libopen-pal.la] Error 1
> make[2]: Leaving directory
>
> `/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/opal'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory
>
> `/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/opal'
> make: *** [all-recursive] Error 1
>   (at easybuild/tools/run.py:501 in parse_cmd_output)
> == 2018-04-30 11:03:17,152 easyblock.py:2702 WARNING build failed (first
> 300 chars): cmd " make -j 48 " exited with exit code 2 and output:
> Making all in config
> make[1]: Entering directory
> `/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/config'
> make[1]: Nothing to be done for `all'.
> make[1]: Leaving directory `/home/modules/build/OpenMPI/2.1.3/icc
> == 2018-04-30 11:03:17,152 easyblock.py:280 INFO Closing log for
> application name OpenMPI version 2.1.3
>
>
> Question: Can anyone point to the cause of this error?  Did OpenMPI
> 2.1.3 introduce this error?
>
>
> Extra information: The issue
> https://github.com/easybuilders/easybuild-easyconfigs/issues/5805 is
> fixed with EB 3.6.0 and OpenMPI 2.1.3.  On our systems with Mellanox
> Infiniband software installed, the line in
> OpenMPI-2.1.3-iccifort-2018.2.199-GCC-6.4.0-2.28.eb *does* fix the issue
> 5805 on this platform:
>
> configopts += '--without-ucx '  # hard disable UCX, to dance around bug
> (https://github.com/open-mpi/ompi/issues/4345)
>
> On the Intel Omni-Path system I tried to comment out this line, but the
> build still fails with the same error.
>
> /Ole
>
>
> On 03/05/2018 03:44 PM, Åke Sandgren wrote:
> > To clarify, it's a bug in the OpenMPI configure script when dealing with
> > UCX which they haven't fixed.
> >
> > On 03/05/2018 03:36 PM, Balázs Hajgató wrote:
> >> Dear Ole,
> >>
> >> use
> >> configopts += '--without-ucx '
> >>
> >> in the OpenMPI easyconfig
> >>
> >> Sincerely,
> >>
> >> Balazs
> >>
> >
> >>> Thanks for any suggestions!
> >>>
> >>
> >
> --
> Ole Holm Nielsen
> PhD, Senior HPC Officer
> Department of Physics, Technical University of Denmark,
> Building 307, DK-2800 Kongens Lyngby, Denmark
> E-mail: ole.h.niel...@fysik.dtu.dk
> Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
> Tel: (+45) 4525 3187 / Mobile (+45) 5180 1620
> >> On 05/03/2018 15:27, Ole Holm Nielsen wrote:
> >>> Using EB 3.5.2 I'm 

Re: [easybuild] building TensorFlow doesn't respect job-cores

2018-04-30 Thread Yann Sagon
Dear Kenneth,


2018-04-30 11:32 GMT+02:00 Kenneth Hoste :

> Dear Yann,
>
> On 30/04/2018 11:28, Yann Sagon wrote:
> > I just noticed that building TF17 with cuda compute 6.0 and 6.1 doesn't
> > respect the job-cores = 12 that I have in my config file. According to
> > htop, it's using ~28 cores
>
> Did you submit the TF build as a job using "eb --job"?
> The --job-cores configuration setting only applies to build jobs
> submitted using --job.
>

duhhh! Sorry about that, I'm dumb. Indeed I should use the parallel flag in
my case.

But to be honest, I don't see why there are two different flags for almost
the same purpose?


Re: [easybuild] building TensorFlow doesn't respect job-cores

2018-04-30 Thread Kenneth Hoste

Dear Yann,

On 30/04/2018 11:28, Yann Sagon wrote:
I just noticed that building TF17 with cuda compute 6.0 and 6.1 doesn't 
respect the job-cores = 12 that I have in my config file. According to 
htop, it's using ~28 cores


Did you submit the TF build as a job using "eb --job"?
The --job-cores configuration setting only applies to build jobs 
submitted using --job.


If you want to control the number of cores that EasyBuild uses for 
building, you should use "eb --parallel=12" (or set 'parallel' in your 
EasyBuild configuration file).


Also, do make sure you verify your currently active configuration using 
"eb --show-config".



regards,

Kenneth




Best


--
Yann Sagon
Chef d'équipe HPC

Division du système et des technologies de l'information et de la 
communication

Université de Genève | 24 rue Général-Dufour
Tél 022 379 77 37 | Bureau 151

www.unige.ch/stic 



[easybuild] EB 3.6.0: Build of iomkl-2018.02.eb fails on Intel Omni-Path systems

2018-04-30 Thread Ole Holm Nielsen
When we build on nodes with Intel Omni-Path software installed, the 
build of "iomkl" fails:


$ eb iomkl-2018.02.eb -r
== temporary log file in case of crash /tmp/eb-MDUNmr/easybuild-UxI3tr.log
== resolving dependencies ...
== processing EasyBuild easyconfig 
/home/modules/software/EasyBuild/3.6.0/lib/python2.7/site-packages/easybuild_easyconfigs-3.6.0-py2.7.egg/easybuild/easyconfigs/o/OpenMPI/OpenMPI-2.1.3-iccifort-2018.2.199-GCC-6.4.0-2.28.eb
== building and installing 
OpenMPI/2.1.3-iccifort-2018.2.199-GCC-6.4.0-2.28...

== fetching files...
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== FAILED: Installation ended unsuccessfully (build directory: 
/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28): 
build failed (first 300 chars): cmd " make -j 48 " exited with exit code 
2 and output:

Making all in config
make[1]: Entering directory 
`/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/config' 
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/home/modules/build/OpenMPI/2.1.3/icc
== Results of the build can be found in the log file(s) 
/tmp/eb-MDUNmr/easybuild-OpenMPI-2.1.3-20180430.105741.vWNYR.log
ERROR: Build of 
/home/modules/software/EasyBuild/3.6.0/lib/python2.7/site-packages/easybuild_easyconfigs-3.6.0-py2.7.egg/easybuild/easyconfigs/o/OpenMPI/OpenMPI-2.1.3-iccifort-2018.2.199-GCC-6.4.0-2.28.eb 
failed (err: 'build failed (first 300 chars): cmd " make -j 48 " exited 
with exit code 2 and output:\nMaking all in config\nmake[1]: Entering 
directory 
`/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/config\'\nmake[1]: 
Nothing to be done for `all\'.\nmake[1]: Leaving directory 
`/home/modules/build/OpenMPI/2.1.3/icc')


The OpenMPI error log file contains near the end:

  CCLD libopen-pal.la
ld: cannot find -lpciaccess
ld: cannot find -lxml2
make[2]: *** [libopen-pal.la] Error 1
make[2]: Leaving directory 
`/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/opal'

make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory 
`/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/opal'

make: *** [all-recursive] Error 1
 (at easybuild/tools/run.py:501 in parse_cmd_output)
== 2018-04-30 11:03:17,152 easyblock.py:2702 WARNING build failed (first 
300 chars): cmd " make -j 48 " exited with exit code 2 and output:

Making all in config
make[1]: Entering directory 
`/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/config'

make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/home/modules/build/OpenMPI/2.1.3/icc
== 2018-04-30 11:03:17,152 easyblock.py:280 INFO Closing log for 
application name OpenMPI version 2.1.3



Question: Can anyone point to the cause of this error?  Did OpenMPI 
2.1.3 introduce this error?



Extra information: The issue 
https://github.com/easybuilders/easybuild-easyconfigs/issues/5805 is 
fixed with EB 3.6.0 and OpenMPI 2.1.3.  On our systems with Mellanox 
Infiniband software installed, the line in 
OpenMPI-2.1.3-iccifort-2018.2.199-GCC-6.4.0-2.28.eb *does* fix the issue 
5805 on this platform:


configopts += '--without-ucx '  # hard disable UCX, to dance around bug 
(https://github.com/open-mpi/ompi/issues/4345)


On the Intel Omni-Path system I tried to comment out this line, but the 
build still fails with the same error.


/Ole


On 03/05/2018 03:44 PM, Åke Sandgren wrote:

To clarify, it's a bug in the OpenMPI configure script when dealing with
UCX which they haven't fixed.

On 03/05/2018 03:36 PM, Balázs Hajgató wrote:

Dear Ole,

use
configopts += '--without-ucx '

in the OpenMPI easyconfig

Sincerely,

Balazs




Thanks for any suggestions!






--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,
Building 307, DK-2800 Kongens Lyngby, Denmark
E-mail: ole.h.niel...@fysik.dtu.dk
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Tel: (+45) 4525 3187 / Mobile (+45) 5180 1620

On 05/03/2018 15:27, Ole Holm Nielsen wrote:

Using EB 3.5.2 I'm trying to build the latest iomkl:

   eb iomkl-2018a.eb -r

This works like a charm on 2 of our 3 binary architectures, but on our
Sandy Bridge nodes with Mellanox Infiniband the build aborts:

# eb iomkl-2018a.eb -r
== temporary log file in case of crash
/tmp/eb-Dnjmix/easybuild-6WeIK9.log
== resolving dependencies ...
== processing EasyBuild easyconfig
/home/modules/software/EasyBuild/3.5.2/lib/python2.7/site-packages/easybuild_easyconfigs-3.5.2-py2.7.egg/easybuild/easyconfigs/o/OpenMPI/OpenMPI-2.1.2-iccifort-2018.1.163-GCC-6.4.0-2.28.eb

== building and installing
OpenMPI/2.1.2-iccifort-2018.1.163-GCC-6.4.0-2.28...
== fetching files...
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== 

[easybuild] building TensorFlow doesn't respect job-cores

2018-04-30 Thread Yann Sagon
I just noticed that building TF17 with cuda compute 6.0 and 6.1 doesn't
respect the job-cores = 12 that I have in my config file. According to
htop, it's using ~28 cores

Best


-- 
Yann Sagon
Chef d'équipe HPC

Division du système et des technologies de l'information et de la
communication
Université de Genève | 24 rue Général-Dufour
Tél 022 379 77 37 | Bureau 151

www.unige.ch/stic