Re: [easybuild] TensorFlow with GPU support.

2018-01-04 Thread Maxime Boissonneault

On 18-01-04 04:23, Jakob Schiøtz wrote:

Hi,

I made a TensorFlow easyconfig a while ago depending on Python with the foss 
toolchain; and including a variant with GPU support (PR 4904).  The latter has 
not yet been merged, probably because it is annoying to have something that can 
only build on a machine with a GPU (it fails the sanity check otherwise, as 
TensorFlow with GPU support cannot load on a machine without it).

Since I made that PR, two newer releases of TensorFlow have appeared (1.3 and 
1.4).

You're actually missing 1.5, which just came out this morning. It is built 
with AVX support, CUDA 9 and cuDNN 7.


Maxime


Re: [easybuild] TensorFlow with GPU support.

2018-01-04 Thread Kenneth Hoste

Dear Jakob,

On 04/01/2018 10:23, Jakob Schiøtz wrote:

Hi,

I made a TensorFlow easyconfig a while ago depending on Python with the foss 
toolchain; and including a variant with GPU support (PR 4904).  The latter has 
not yet been merged, probably because it is annoying to have something that can 
only build on a machine with a GPU (it fails the sanity check otherwise, as 
TensorFlow with GPU support cannot load on a machine without it).


Not being able to test this on a non-GPU system is a bit unfortunate, 
but that's not the reason it hasn't been merged yet; that's mostly 
due to a lack of time on my side to get back to it...
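
A minimal sketch of how a sanity check could tolerate build hosts without a GPU. It assumes that the import failure is caused by the NVIDIA driver library (libcuda.so.1) being absent, which is an assumption and not something stated in this thread. The function name cuda_driver_available is hypothetical and not part of EasyBuild or TensorFlow:

```python
import ctypes


def cuda_driver_available(libname="libcuda.so.1"):
    """Return True if the CUDA driver library can be loaded on this host."""
    try:
        ctypes.CDLL(libname)
    except OSError:
        # The dynamic loader could not find or open the library, so a
        # GPU-enabled TensorFlow would most likely fail to import here too.
        return False
    return True


if __name__ == "__main__":
    if cuda_driver_available():
        print("CUDA driver found: run the full 'import tensorflow' check")
    else:
        print("no CUDA driver: skip the GPU import check on this host")
```

A check like this only tells you whether the driver library is loadable; it does not prove that a working GPU is present.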



Since I made that PR, two newer releases of TensorFlow have appeared (1.3 and 
1.4).  There are easyconfigs for 1.3 with the Intel toolchain.  I am 
considering making easyconfigs for TensorFlow 1.4 with Python-3.6.3-foss-2017b 
(both with and without GPU support), but first I would like to know if anybody 
else is doing this - it is my impression that somebody who actually knows what 
they are doing may be working on TensorFlow. :-)


I have spent quite a bit of time puzzling together an easyblock that 
supports building TensorFlow from source, see [1].


It already works for non-GPU installations (see [2] for example), but 
it's not entirely finished yet because:


* building from source with CUDA support does not work yet; the build 
fails with strange Bazel errors...


* there are some issues when the TensorFlow easyblock is used together 
with --use-ccache and the Intel compilers: because two compiler wrappers 
are used, they end up calling each other, resulting in a "fork bomb" 
style situation...


I would really like to get it finished and have easyconfigs available 
for TensorFlow 1.4 and newer where we properly build TensorFlow from 
source rather than using the binary wheels...


Are you up for giving it a try, and maybe helping out with the problems 
mentioned above?



regards,

Kenneth


[1] https://github.com/easybuilders/easybuild-easyblocks/pull/1287
[2] https://github.com/easybuilders/easybuild-easyconfigs/pull/5499



Best regards

Jakob

--
Jakob Schiøtz, professor, Ph.D.
Department of Physics
Technical University of Denmark
DK-2800 Kongens Lyngby, Denmark
http://www.fysik.dtu.dk/~schiotz/


RE: [easybuild] Python-2.7.12-iccifort-2016.3.210-GCC-5.4.0-2.26 build segfault

2018-01-04 Thread Vanzo, Davide

Kenneth,

That did the trick.
Thank you!

--
Davide Vanzo, PhD
Application Developer
Adjunct Assistant Professor of Chemical and Biomolecular Engineering
Advanced Computing Center for Research and Education (ACCRE)
Vanderbilt University - Hill Center 201
(615)-875-9137
www.accre.vanderbilt.edu


On 2018-01-02 14:34:58-06:00 easybuild-requ...@lists.ugent.be wrote:

Hi Davide,

On 02/01/2018 21:29, Vanzo, Davide wrote:

Hello EasyBuilders!

I am trying to build Python-2.7.12-iccifort-2016.3.210-GCC-5.4.0-2.26.eb (from 
the intel-2016b easyconfig file I have simply removed the python packages that 
depend on MPI and MKL) on CentOS 7.2.1511 and for some reason it fails with the 
error below. I have never seen this error when I built it under CentOS 6.x.

Does it ring a bell for any of you?

This sounds a lot like the issue we encountered in 
https://github.com/easybuilders/easybuild-easyconfigs/issues/3646,
which basically boils down to this version of the Intel compilers not being 
compatible with CentOS 7.2.

The official (!) fix is to copy libintlc.so.5 from a more recent version of the 
Intel compilers; see also 
https://bugzilla.redhat.com/show_bug.cgi?id=1377895
and 
https://software.intel.com/en-us/articles/intel-compiler-version-16-not-compatible-with-recent-libcso6
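
The copy step described above could be scripted roughly as follows. This is only a sketch: the function name replace_libintlc and both directory arguments are hypothetical, and it assumes the newer library should overwrite the one in the older compiler's library directory (keeping a backup), which the linked articles should be checked against.

```python
import shutil
from pathlib import Path


def replace_libintlc(newer_lib_dir, older_lib_dir, name="libintlc.so.5"):
    """Copy `name` from a newer compiler's lib dir into an older one.

    The existing file in the older directory, if any, is saved next to it
    with an extra '.orig' suffix so the change can be reverted.
    """
    src = Path(newer_lib_dir) / name
    dst = Path(older_lib_dir) / name
    if not src.is_file():
        raise FileNotFoundError(f"{src} not found")
    if dst.exists():
        shutil.copy2(dst, dst.with_name(name + ".orig"))
    shutil.copy2(src, dst)
    return dst


# Hypothetical usage; adjust both paths to the actual installation prefixes:
# replace_libintlc("/path/to/newer/icc/lib/intel64",
#                  "/path/to/icc/2016.3.210/lib/intel64")
```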


regards,

Kenneth

Attached you can find the full log.

Thanks!



Modules/posixmodule.o: In function `posix_tmpnam':
/mnt/ramdisk/Python/2.7.12/iccifort-2016.3.210/Python-2.7.12/./Modules/posixmodule.c:7631: warning: the use of `tmpnam_r' is dangerous, better use `mkstemp'
Modules/posixmodule.o: In function `posix_tempnam':
posixmodule.c:(.text+0x4fba): warning: the use of `tempnam' is dangerous, better use `mkstemp'
icc -L/accre/arch/easybuild/software/Core/icc/2016.3.210/lib/intel64 
-L/accre/arch/easybuild/software/Compiler/GCCcore/5.4.0/bzip2/1.0.6/lib 
-L/accre/arch/easybuild/software/Compiler/GCCcore/5.4.0/zlib/1.2.8/lib 
-L/accre/arch/easybuild/software/Compiler/GCCcore/5.4.0/libreadline/6.3/lib 
-L/accre/arch/easybuild/software/Compiler/GCCcore/5.4.0/ncurses/6.0/lib 
-L/accre/arch/easybuild/software/Compiler/GCCcore/5.4.0/SQLite/3.13.0/lib 
-L/accre/arch/easybuild/software/Compiler/GCCcore/5.4.0/Tk/8.6.5/lib 
-L/accre/arch/easybuild/software/Compiler/GCCcore/5.4.0/GMP/6.1.1/lib 
-L/accre/arch/easybuild/software/Compiler/GCCcore/5.4.0/libffi/3.2.1/lib64 
-L/accre/arch/easybuild/software/Compiler/GCCcore/5.4.0/libffi/3.2.1/lib 
-Xlinker -export-dynamic -o python \
Modules/python.o \
-L. -lpython2.7 -ldl -liomp5 -lpthread -lutil 
/accre/arch/easybuild/software/Compiler/GCCcore/5.4.0/libreadline/6.3/lib/libreadline.a 
/accre/arch/easybuild/software/Compiler/GCCcore/5.4.0/ncurses/6.0/lib/libncurses.a   -lm
LD_LIBRARY_PATH=/mnt/ramdisk/Python/2.7.12/iccifort-2016.3.210/Python-2.7.12:/accre/arch/easybuild/software/Compiler/GCCcore/5.4.0/libffi/3.2.1/lib64:/accre/arch/easybuild/software/Compiler/GCCcore/5.4.0/libffi/3.2.1/lib:/accre/arch/easybuild/software/Compiler/GCCcore/5.4.0/GMP/6.1.1/lib:/accre/arch/easybuild/software/Compiler/GCCcore/5.4.0/Tk/8.6.5/lib:/accre/arch/easybuild/software/Compiler/GCCcore/5.4.0/X11/20160819/lib:/accre/arch/easybuild/software/Compiler/GCCcore/5.4.0/fontconfig/2.12.1/lib:/accre/arch/easybuild/software/Compiler/GCCcore/5.4.0/expat/2.2.0/lib:/accre/arch/easybuild/software/Compiler/GCCcore/5.4.0/freetype/2.6.5/lib:/accre/arch/easybuild/software/Compiler/GCCcore/5.4.0/libpng/1.6.24/lib:/accre/arch/easybuild/software/Compiler/GCCcore/5.4.0/SQLite/3.13.0/lib:/accre/arch/easybuild/software/Compiler/GCCcore/5.4.0/Tcl/8.6.5/lib:/accre/arch/easybuild/software/Compiler/GCCcore/5.4.0/libreadline/6.3/lib:/accre/arch/easybuild/software/Compiler/GCCcore/5.4.0/ncurses/6.0/lib:/accre/arch/easybuild/software/Compiler/GCCcore/\

Re: [easybuild] TensorFlow with GPU support.

2018-01-04 Thread Pablo Escobar Lopez

Hi Jakob,

I installed TensorFlow on my cluster a few days ago by modifying your
easyconfigs.  I have just opened two PRs with the two easyconfigs I installed:

https://github.com/easybuilders/easybuild-easyconfigs/pull/5590
https://github.com/easybuilders/easybuild-easyconfigs/pull/5591

I used cuDNN 6.0 as a dependency instead of cuDNN 7.x because the provided
.whl is linked with 6.0. If you try 7.x you will get a ".so lib not found"
error.
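
One way to see which cuDNN version a binary is linked against is to look for libcudnn entries in the output of a tool such as ldd. The sketch below only parses such text; the helper name cudnn_sonames is hypothetical and the sample input is illustrative, not captured from a real system:

```python
import re


def cudnn_sonames(ldd_output):
    """Return the sorted, de-duplicated libcudnn sonames found in ldd-style text."""
    return sorted(set(re.findall(r"libcudnn\.so\.\d+", ldd_output)))


# Illustrative input only: the kind of line a missing library produces.
sample = (
    "\tlibcudnn.so.6 => not found\n"
    "\tlibc.so.6 => /lib64/libc.so.6\n"
)
print(cudnn_sonames(sample))  # ['libcudnn.so.6']
```

If the soname reported here does not match the major version of the cuDNN module that is loaded, the dynamic loader cannot resolve it, which is consistent with the ".so lib not found" error mentioned above.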

regards,
Pablo.

2018-01-04 10:23 GMT+01:00 Jakob Schiøtz:

> Hi,
>
> I made a TensorFlow easyconfig a while ago depending on Python with the
> foss toolchain; and including a variant with GPU support (PR 4904).  The
> latter has not yet been merged, probably because it is annoying to have
> something that can only build on a machine with a GPU (it fails the sanity
> check otherwise, as TensorFlow with GPU support cannot load on a machine
> without it).
>
> Since I made that PR, two newer releases of TensorFlow have appeared (1.3
> and 1.4).  There are easyconfigs for 1.3 with the Intel toolchain.  I am
> considering making easyconfigs for TensorFlow 1.4 with
> Python-3.6.3-foss-2017b (both with and without GPU support), but first I
> would like to know if anybody else is doing this - it is my impression that
> somebody who actually knows what they are doing may be working on
> TensorFlow. :-)
>
> Best regards
>
> Jakob
>
> --
> Jakob Schiøtz, professor, Ph.D.
> Department of Physics
> Technical University of Denmark
> DK-2800 Kongens Lyngby, Denmark
> http://www.fysik.dtu.dk/~schiotz/
>
>
>
>


-- 
Pablo Escobar López
HPC systems engineer
sciCORE, University of Basel
SIB Swiss Institute of Bioinformatics
http://scicore.unibas.ch


Re: Should we skip foss/2018a (Re: [easybuild] 2018a toolchains)

2018-01-04 Thread Jakob Schiøtz


> On 2 Jan 2018, at 17:58, Mikael Öhman wrote:

   [ … ]

> But the existence of another toolchain is not a problem; sparsely populated 
> toolchain versions are.
> With the combinatorics of at least 2 primary toolchains (foss/intel), 2 major 
> Python versions (3 when some software can be configured without Python), and 
> 2 releases per year, I feel we aren't doing anyone any favours.
> Configs are a collective effort, and we are spreading ourselves thin here.

I would tend to agree with this; it is an extra effort to keep up with all 
these toolchains.  Obviously, toolchains need to be updated, but perhaps doing 
it twice every year is too much.  Is there a need for this?

Unless perhaps the “a” releases are considered kind of public beta versions, 
and the “b” releases are stable ones :-)
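
To put a number on the combinatorics in the quoted message, here is a small sketch that counts the possible variants of a single software package per year. The labels for the Python variants are assumptions chosen for illustration; the counts (2 toolchains, up to 3 Python options, 2 releases per year) are taken from the message:

```python
from itertools import product

toolchains = ["foss", "intel"]
python_variants = ["Python 2", "Python 3", "no Python"]  # labels are illustrative
releases_per_year = ["a", "b"]

# Every combination is a separate easyconfig that someone has to write and test.
variants = list(product(toolchains, python_variants, releases_per_year))
print(len(variants))  # 2 * 3 * 2 = 12
```

This is an upper bound for one package in one year; not every package is built in every combination.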

Best regards

Jakob


--
Jakob Schiøtz, professor, Ph.D.
Department of Physics
Technical University of Denmark
DK-2800 Kongens Lyngby, Denmark
http://www.fysik.dtu.dk/~schiotz/