Re: [easybuild] TensorFlow with GPU support.

2019-03-26 Thread Jakob Schiøtz
Thank you!

I do not think OpenMPI with CUDA support is particularly relevant for us.  I 
will read the docs and try to understand what --minimal-toolchains does.

Thanks for your suggestions

Jakob

> On 26 Mar 2019, at 14:41, Jack Perdue  wrote:
> 
> Howdy Jakob,
> 
> The primary difference between fosscuda and
> foss+CUDA is that fosscuda has an OpenMPI built
> with CUDA support, whereas the latter does not.
> 
> We run with:
> 
> EASYBUILD_MINIMAL_TOOLCHAINS
> 
> which cuts down on the number of things that
> have to be rebuilt here.  For example, for
> 
> TensorFlow/1.10.1-fosscuda-2018b-Python-3.6.6
> 
> we only had to rebuild these packages:
> 
> Python/3.6.6-fosscuda-2018b
> protobuf-python/3.6.0-fosscuda-2018b-Python-3.6.6
> cuDNN/7.1.4.18-fosscuda-2018b
> 
> so you might want to look at the --minimal-toolchains option.
> 
> 
> Jack Perdue
> Lead Systems Administrator
> High Performance Research Computing
> TAMU Division of Research
> j-per...@tamu.edu  http://hprc.tamu.edu
> HPRC Helpdesk: h...@hprc.tamu.edu
> 
> On 3/26/19 8:26 AM, Jakob Schiøtz wrote:
>> Dear EasyBuilders,
>> 
>> I would like to build a TensorFlow module supporting GPUs.  Currently, that 
>> looks to be TensorFlow-1.12.0-fosscuda-2018b-Python-3.6.6.eb, but this 
>> requires building a new toolchain (fosscuda), including rebuilding both 
>> OpenMPI and Python with GPU support.  In addition, any other software that 
>> the user may need alongside TensorFlow will also have to be rebuilt with the 
>> fosscuda toolchain to prevent mixing toolchains.  That seems to be overkill 
>> to me - after all little if anything is gained by rebuilding Python and 
>> stuff with GPU support unless the scripts are actually going to use it.  I 
>> don’t know if numpy will begin offloading computations to the GPU, but 
>> presumably moving data back and forth will be expensive, and TensorFlow will 
>> allocate all the GPU memory anyway.
>> 
>> There used to be a tensorflow variant for the normal toolchain which just 
>> depended on CUDA explicitly.  Is there a reason not to do it this way?  I 
>> guess I could try to make a .eb file inspired by the existing ones doing 
>> just that - or have I overlooked something?
>> 
>> Best regards
>> 
>> Jakob
>> 
>> 
>> --
>> Jakob Schiøtz, professor, Ph.D.
>> Department of Physics
>> Technical University of Denmark
>> DK-2800 Kongens Lyngby, Denmark
>> http://www.fysik.dtu.dk/~schiotz/
>> 
>> 
>> 
> 

--
Jakob Schiøtz, professor, Ph.D.
Department of Physics
Technical University of Denmark
DK-2800 Kongens Lyngby, Denmark
http://www.fysik.dtu.dk/~schiotz/





Re: [easybuild] TensorFlow with GPU support.

2019-03-26 Thread Jack Perdue

Howdy Jakob,

The primary difference between fosscuda and
foss+CUDA is that fosscuda has an OpenMPI built
with CUDA support, whereas the latter does not.

We run with:

EASYBUILD_MINIMAL_TOOLCHAINS

which cuts down on the number of things that
have to be rebuilt here.  For example, for

TensorFlow/1.10.1-fosscuda-2018b-Python-3.6.6

we only had to rebuild these packages:

Python/3.6.6-fosscuda-2018b
protobuf-python/3.6.0-fosscuda-2018b-Python-3.6.6
cuDNN/7.1.4.18-fosscuda-2018b

so you might want to look at the --minimal-toolchains option.
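
As a rough illustration of the idea behind --minimal-toolchains (a simplified
sketch, not EasyBuild's actual resolution code): for each dependency, the most
minimal subtoolchain that offers an easyconfig is preferred over the full
toolchain, so fewer packages need a fosscuda-specific rebuild.  The
subtoolchain order is the usual one for fosscuda; the `available` table and
the `pick_toolchain` helper are hypothetical and only there for illustration.

```python
# Subtoolchains of fosscuda, from most minimal to the full toolchain.
FOSSCUDA_HIERARCHY = ["GCCcore", "GCC", "gcccuda", "gompic", "fosscuda"]

# Hypothetical table: toolchains for which an easyconfig exists per dependency.
available = {
    "zlib": {"GCCcore"},
    "Python": {"fosscuda"},
    "cuDNN": {"fosscuda"},
}


def pick_toolchain(dep, available, hierarchy=FOSSCUDA_HIERARCHY):
    """Return the most minimal toolchain in `hierarchy` that offers `dep`."""
    for toolchain in hierarchy:
        if toolchain in available.get(dep, set()):
            return toolchain
    raise LookupError(f"no easyconfig found for {dep}")


for dep in sorted(available):
    print(dep, "->", pick_toolchain(dep, available))
```

Anything that resolves to a lower level such as GCCcore can be shared with an
existing foss installation; only what resolves to fosscuda itself has to be
built again.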


Jack Perdue
Lead Systems Administrator
High Performance Research Computing
TAMU Division of Research
j-per...@tamu.edu  http://hprc.tamu.edu
HPRC Helpdesk: h...@hprc.tamu.edu

On 3/26/19 8:26 AM, Jakob Schiøtz wrote:

Dear EasyBuilders,

I would like to build a TensorFlow module supporting GPUs.  Currently, that 
looks to be TensorFlow-1.12.0-fosscuda-2018b-Python-3.6.6.eb, but this requires 
building a new toolchain (fosscuda), including rebuilding both OpenMPI and 
Python with GPU support.  In addition, any other software that the user may 
need alongside TensorFlow will also have to be rebuilt with the fosscuda 
toolchain to prevent mixing toolchains.  That seems to be overkill to me - 
after all little if anything is gained by rebuilding Python and stuff with GPU 
support unless the scripts are actually going to use it.  I don’t know if numpy 
will begin offloading computations to the GPU, but presumably moving data back 
and forth will be expensive, and TensorFlow will allocate all the GPU memory 
anyway.

There used to be a tensorflow variant for the normal toolchain which just 
depended on CUDA explicitly.  Is there a reason not to do it this way?  I guess 
I could try to make a .eb file inspired by the existing ones doing just that - 
or have I overlooked something?

Best regards

Jakob


--
Jakob Schiøtz, professor, Ph.D.
Department of Physics
Technical University of Denmark
DK-2800 Kongens Lyngby, Denmark
http://www.fysik.dtu.dk/~schiotz/







Re: [easybuild] TensorFlow with GPU support.

2018-01-08 Thread Kenneth Hoste

On 08/01/2018 21:28, Jakob Schiøtz wrote:

On 8 Jan 2018, at 20:27, Kenneth Hoste  wrote:

On 08/01/2018 15:48, Jakob Schiøtz wrote:

Hi Kenneth,

I have now tested your TensorFlow 1.4.0 eb on our machines with a real-world 
script.  It works, but it runs three times slower than with the prebuilt 
TensorFlow 1.2.1  :-(

The prebuilt version complains that it was built without AVX2 etc., so I do not 
really understand why it is so much slower to use the version compiled from 
source - assuming of course that there is not a factor of three performance loss 
between 1.2.1 and 1.4.0, which seems unlikely.

Wow, that must be wrong somehow...

Is this on the GPU systems?
You're not comparing a GPU-enabled TF 1.2 with a CPU-only TF 1.4 built with EB, 
are you?
If you are, then being only a factor of 3 slower on CPU alone is actually quite 
impressive vs. a GPU-enabled build. ;-)

No, I am comparing not-GPU enabled versions running on a machine without a GPU. 
 So that is not the problem.

I am running a custom script training one of my students’ models.  I agree the 
result is suspicious, and I am rerunning it now (in the queue).

I will try the benchmark you mentioned below as well; and report back - but it 
may be a few days…

By the way, could the difference be due to the compiler (Intel versus foss)?  
That would be an unusually large difference, but my own MD code (ASAP) displays 
almost a factor two difference.


Which is which? Did you install the binary wheel on top of a Python 
built with foss or Intel?


That could certainly matter, but I would be very surprised if it's more 
than 10-20% to be honest.


I saw 10% performance loss for TF 1.4 built with intel/2017b vs 
foss/2017b (on top of Python 3.6.3) on Haswell (so the foss build was 
slightly faster).



regards,

Kenneth



Jakob



How are you benchmarking this exactly?
When I was trying with the script from 
https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks, 
I saw 7x better performance when building TF 1.4.0 from source on Intel Haswell 
(no GPU) compared to a conda install (which is basically the same as using the 
binary wheel).
On a GPU system (NVIDIA K40) with the TF 1.4.0 binary wheel I saw another 8x 
performance increase over the EB-installed-from-source CPU-only TF 1.4.0 
installation.
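
Taken together, and assuming the two measurements are comparable (same
benchmark and settings, on different hardware, which the figures above do not
guarantee), the numbers imply roughly the following relative throughputs.  The
baseline of 1.0 for the binary wheel on CPU is an arbitrary normalization.

```python
# Relative throughput, normalized to the binary wheel on CPU (= 1.0).
wheel_cpu = 1.0
source_cpu = 7 * wheel_cpu   # 7x from building TF 1.4.0 from source (Haswell)
wheel_gpu = 8 * source_cpu   # another 8x on a K40 with the binary wheel

print(f"GPU wheel vs. CPU wheel: about {wheel_gpu / wheel_cpu:.0f}x")
```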

Here's the command I was running (don't forget to change --device when running 
on a GPU system):

python tf_cnn_benchmarks.py --device cpu --batch_size=32 --model=resnet50 
--variable_update=parameter_server --data_format NHWC


regards,

Kenneth


Best regards

Jakob



On 5 Jan 2018, at 13:57, Kenneth Hoste  wrote:

On 04/01/2018 16:37, Jakob Schiøtz wrote:

Dear Kenneth, Pablo and Maxime,

Thanks for your feedback.  Yes, I will try to see if I can build from source, 
but I will focus on the foss toolchain since we use that one for our Python 
here (we do not have the Intel MPI license, and the iomkl toolchain could not 
build Python last time I tried).

I assume the reason for building from source is to ensure consistent library 
versions etc.  If that proves very difficult, could we perhaps in the interim 
have builds (with a -bin suffix?) using the prebuilt wheels?

The main reason for building from source is performance and compatibility with 
the OS.

The binary wheels that are available for TensorFlow are not compatible with 
older OS versions like CentOS 6, as I experienced first-hand when trying to get 
it to work on an older (GPU) system.
Since the compilation from source with CUDA support didn't work yet, I had to 
resort to injecting a newer glibc version in the 'python' binary, which was not 
fun (well...).

For CPU-only installations, you really have no other option than building from 
source, since the binary wheels were not built with AVX2 instructions for 
example, which leads to large performance losses (some quick benchmarking 
showed a 7x increase in performance for TF 1.4 built with foss/2017b over using 
the binary wheel).

For GPU installations, a similar concern arises, although it may be less severe 
there, depending on what CUDA compute capabilities the binary wheels were built 
with (I only tested the wheels on old systems with NVIDIA K20x/K40 GPUs, so 
there I doubt you'll get much performance increase when building from source).

If it turns out to be too difficult or time-consuming to get the build from 
source with CUDA support to work, then we can of course progress with sticking 
to the binary wheel releases for now, I'm not going to oppose that.


regards,

Kenneth


Best regards

Jakob



On 4 Jan 2018, at 15:29, Kenneth Hoste  wrote:

Dear Jakob,

On 04/01/2018 10:23, Jakob Schiøtz wrote:

Hi,

I made a TensorFlow easyconfig a while ago depending on Python with the foss 
toolchain; and including a variant with GPU support (PR 4904).  The latter has 
not yet been merged, probably because it is annoying to have something that can 

Re: [easybuild] TensorFlow with GPU support.

2018-01-08 Thread Jakob Schiøtz


> On 8 Jan 2018, at 20:27, Kenneth Hoste  wrote:
> 
> On 08/01/2018 15:48, Jakob Schiøtz wrote:
>> Hi Kenneth,
>> 
>> I have now tested your TensorFlow 1.4.0 eb on our machines with a real-world 
>> script.  It works, but it runs three times slower than with the prebuilt 
>> TensorFlow 1.2.1  :-(
>> 
>> The prebuilt version complains that it was built without AVX2 etc., so I do 
>> not really understand why it is so much slower to use the version compiled 
>> from source - assuming of course that there is not a factor of three 
>> performance loss between 1.2.1 and 1.4.0, which seems unlikely.
> 
> Wow, that must be wrong somehow...
> 
> Is this on the GPU systems?
> You're not comparing a GPU-enabled TF 1.2 with a CPU-only TF 1.4 built with 
> EB, are you?
> If you are, then being only a factor of 3 slower on CPU alone is actually quite 
> impressive vs. a GPU-enabled build. ;-)

No, I am comparing not-GPU enabled versions running on a machine without a GPU. 
 So that is not the problem.

I am running a custom script training one of my students’ models.  I agree the 
result is suspicious, and I am rerunning it now (in the queue).

I will try the benchmark you mentioned below as well; and report back - but it 
may be a few days…

By the way, could the difference be due to the compiler (Intel versus foss)?  
That would be an unusually large difference, but my own MD code (ASAP) displays 
almost a factor two difference.

Jakob


> 
> How are you benchmarking this exactly?
> When I was trying with the script from 
> https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks,
>  I saw 7x better performance when building TF 1.4.0 from source on Intel 
> Haswell (no GPU) compared to a conda install (which is basically the same as 
> using the binary wheel).
> On a GPU system (NVIDIA K40) with the TF 1.4.0 binary wheel I saw another 8x 
> performance increase over the EB-installed-from-source CPU-only TF 1.4.0 
> installation.
> 
> Here's the command I was running (don't forget to change --device when 
> running on a GPU system):
> 
> python tf_cnn_benchmarks.py --device cpu --batch_size=32 --model=resnet50 
> --variable_update=parameter_server --data_format NHWC
> 
> 
> regards,
> 
> Kenneth
> 
>> 
>> Best regards
>> 
>> Jakob
>> 
>> 
>>> On 5 Jan 2018, at 13:57, Kenneth Hoste  wrote:
>>> 
>>> On 04/01/2018 16:37, Jakob Schiøtz wrote:
>>>> Dear Kenneth, Pablo and Maxime,
>>>> 
>>>> Thanks for your feedback.  Yes, I will try to see if I can build from 
>>>> source, but I will focus on the foss toolchain since we use that one for 
>>>> our Python here (we do not have the Intel MPI license, and the iomkl 
>>>> toolchain could not build Python last time I tried).
>>>> 
>>>> I assume the reason for building from source is to ensure consistent 
>>>> library versions etc.  If that proves very difficult, could we perhaps in 
>>>> the interim have builds (with a -bin suffix?) using the prebuilt wheels?
>>> The main reason for building from source is performance and compatibility 
>>> with the OS.
>>> 
>>> The binary wheels that are available for TensorFlow are not compatible with 
>>> older OS versions like CentOS 6, as I experienced first-hand when trying to 
>>> get it to work on an older (GPU) system.
>>> Since the compilation from source with CUDA support didn't work yet, I had 
>>> to resort to injecting a newer glibc version in the 'python' binary, which 
>>> was not fun (well...).
>>> 
>>> For CPU-only installations, you really have no other option than building 
>>> from source, since the binary wheels were not built with AVX2 instructions 
>>> for example, which leads to large performance losses (some quick 
>>> benchmarking showed a 7x increase in performance for TF 1.4 built with 
>>> foss/2017b over using the binary wheel).
>>> 
>>> For GPU installations, a similar concern arises, although it may be less 
>>> severe there, depending on what CUDA compute capabilities the binary wheels 
>>> were built with (I only tested the wheels on old systems with NVIDIA 
>>> K20x/K40 GPUs, so there I doubt you'll get much performance increase when 
>>> building from source).
>>> 
>>> If it turns out to be too difficult or time-consuming to get the build from 
>>> source with CUDA support to work, then we can of course progress with 
>>> sticking to the binary wheel releases for now, I'm not going to oppose that.
>>> 
>>> 
>>> regards,
>>> 
>>> Kenneth
>>> 
>>>> Best regards
>>>> 
>>>> Jakob
>>>> 
>>>> 
>>>>> On 4 Jan 2018, at 15:29, Kenneth Hoste  wrote:
>>>>> 
>>>>> Dear Jakob,
>>>>> 
>>>>> On 04/01/2018 10:23, Jakob Schiøtz wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> I made a TensorFlow easyconfig a while ago depending on Python with the 
>>>>>> foss toolchain; and including a variant with GPU support (PR 4904).  The 
>>>>>> latter has not yet been merged, probably because it is annoying to have 
>>>>>> something that can only build on a machine with a 

Re: [easybuild] TensorFlow with GPU support.

2018-01-08 Thread Kenneth Hoste

On 08/01/2018 15:48, Jakob Schiøtz wrote:

Hi Kenneth,

I have now tested your TensorFlow 1.4.0 eb on our machines with a real-world 
script.  It works, but it runs three times slower than with the prebuilt 
TensorFlow 1.2.1  :-(

The prebuilt version complains that it was built without AVX2 etc., so I do not 
really understand why it is so much slower to use the version compiled from 
source - assuming of course that there is not a factor of three performance loss 
between 1.2.1 and 1.4.0, which seems unlikely.


Wow, that must be wrong somehow...

Is this on the GPU systems?
You're not comparing a GPU-enabled TF 1.2 with a CPU-only TF 1.4 built 
with EB, are you?
If you are, then being only a factor of 3 slower on CPU alone is actually quite 
impressive vs. a GPU-enabled build. ;-)


How are you benchmarking this exactly?
When I was trying with the script from 
https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks, 
I saw 7x better performance when building TF 1.4.0 from source on Intel 
Haswell (no GPU) compared to a conda install (which is basically the 
same as using the binary wheel).
On a GPU system (NVIDIA K40) with the TF 1.4.0 binary wheel I saw 
another 8x performance increase over the EB-installed-from-source 
CPU-only TF 1.4.0 installation.


Here's the command I was running (don't forget to change --device when 
running on a GPU system):


python tf_cnn_benchmarks.py --device cpu --batch_size=32 
--model=resnet50 --variable_update=parameter_server --data_format NHWC
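
For repeated runs, the command above can be assembled by a small helper so
that only the device changes between CPU and GPU systems.  This is a minimal
sketch: `benchmark_command` is a hypothetical helper, the flags are exactly
those of the command above, and tf_cnn_benchmarks.py is assumed to be in the
current directory.

```python
def benchmark_command(device="cpu", batch_size=32, model="resnet50"):
    """Build the tf_cnn_benchmarks command line for the given device."""
    return [
        "python", "tf_cnn_benchmarks.py",
        "--device", device,
        f"--batch_size={batch_size}",
        f"--model={model}",
        "--variable_update=parameter_server",
        "--data_format", "NHWC",
    ]


# Print the command for a GPU system; nothing is executed here.
print(" ".join(benchmark_command("gpu")))
```

To actually run it, the returned list can be passed to subprocess.run(...,
check=True) from the directory containing the benchmark script.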



regards,

Kenneth



Best regards

Jakob



On 5 Jan 2018, at 13:57, Kenneth Hoste  wrote:

On 04/01/2018 16:37, Jakob Schiøtz wrote:

Dear Kenneth, Pablo and Maxime,

Thanks for your feedback.  Yes, I will try to see if I can build from source, 
but I will focus on the foss toolchain since we use that one for our Python 
here (we do not have the Intel MPI license, and the iomkl toolchain could not 
build Python last time I tried).

I assume the reason for building from source is to ensure consistent library 
versions etc.  If that proves very difficult, could we perhaps in the interim 
have builds (with a -bin suffix?) using the prebuilt wheels?

The main reason for building from source is performance and compatibility with 
the OS.

The binary wheels that are available for TensorFlow are not compatible with 
older OS versions like CentOS 6, as I experienced first-hand when trying to get 
it to work on an older (GPU) system.
Since the compilation from source with CUDA support didn't work yet, I had to 
resort to injecting a newer glibc version in the 'python' binary, which was not 
fun (well...).

For CPU-only installations, you really have no other option than building from 
source, since the binary wheels were not built with AVX2 instructions for 
example, which leads to large performance losses (some quick benchmarking 
showed a 7x increase in performance for TF 1.4 built with foss/2017b over using 
the binary wheel).

For GPU installations, a similar concern arises, although it may be less severe 
there, depending on what CUDA compute capabilities the binary wheels were built 
with (I only tested the wheels on old systems with NVIDIA K20x/K40 GPUs, so 
there I doubt you'll get much performance increase when building from source).

If it turns out to be too difficult or time-consuming to get the build from 
source with CUDA support to work, then we can of course progress with sticking 
to the binary wheel releases for now, I'm not going to oppose that.


regards,

Kenneth


Best regards

Jakob



On 4 Jan 2018, at 15:29, Kenneth Hoste  wrote:

Dear Jakob,

On 04/01/2018 10:23, Jakob Schiøtz wrote:

Hi,

I made a TensorFlow easyconfig a while ago depending on Python with the foss 
toolchain; and including a variant with GPU support (PR 4904).  The latter has 
not yet been merged, probably because it is annoying to have something that can 
only build on a machine with a GPU (it fails the sanity check otherwise, as 
TensorFlow with GPU support cannot load on a machine without it).

Not being able to test this on a non-GPU system is a bit unfortunate, but 
that's not a reason that it hasn't been merged yet, that's mostly due to a lack 
of time from my side to get back to it...


Since I made that PR, two newer releases of TensorFlow have appeared (1.3 and 
1.4).   There are easyconfigs for 1.3 with the Intel tool chain.  I am 
considering making easyconfigs for TensorFlow 1.4 with Python-3.6.3-foss-2017b 
(both with and without GPU support), but first I would like to know if anybody 
else is doing this - it is my impression that somebody who actually knows what 
they are doing may be working on TensorFlow. :-)

I have spent quite a bit of time puzzling together an easyblock that supports 
building TensorFlow from source, see [1].

It already works for non-GPU installations (see [2] for example), but it's not 
entirely finished yet because:

* building from source with 

Re: [easybuild] TensorFlow with GPU support.

2018-01-08 Thread Kenneth Hoste



On 05/01/2018 17:28, Jakob Schiøtz wrote:

Hi again, Kenneth.

It turns out that I was wrong about the lack of internet access from the 
compute nodes.  In principle, there should be nothing stopping me from testing 
building with GPUs next week, except for my lack of knowledge :-)

I see this in the easyblock:

    def extra_options():
        extra_vars = {
            # see https://developer.nvidia.com/cuda-gpus
            'cuda_compute_capabilities': [[], "List of CUDA compute capabilities to build with", CUSTOM],
            'with_mkl_dnn': [True, "Make TensorFlow use Intel MKL-DNN", CUSTOM],
        }

Does that mean that I can call eb with something like this

eb TensorFlow-1.4.0-foss-2017b-Python-3.6.3.eb -r 
--cuda_compute_capabilities=Tesla

or something like that (I will not be able to test it until next week).  Or do 
I need to make a new easyconfig which sets that extra option somehow (and 
depends on CUDA and friends)?


The latter, cuda_compute_capabilities is a custom easyconfig parameter 
for TensorFlow, not a command line option.
Although you can try something like this to avoid having to copy & edit 
an easyconfig file yourself.


eb TensorFlow.eb --try-amend=cuda_compute_capabilities=x,y

Do note that you'll need to add CUDA & cuDNN as dependencies when you 
want to enable GPU support.


The values you provide need to be known CUDA compute capabilities 
though, so something like '3.7' (see 
https://developer.nvidia.com/cuda-gpus).
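
In easyconfig terms, that amounts to something like the fragment below.  It is
a sketch only: the CUDA and cuDNN versions and the exact dependency tuples are
assumptions for illustration and must match what is available on your system;
3.5 is the compute capability of the K20x/K40 and 3.7 that of the K80.

```python
# Fragment of a hypothetical TensorFlow easyconfig with GPU support enabled.
name = 'TensorFlow'
version = '1.4.0'

toolchain = {'name': 'foss', 'version': '2017b'}

dependencies = [
    ('Python', '3.6.3'),
    # Assumed versions; the trailing True selects the system (dummy) toolchain.
    ('CUDA', '8.0.61', '', True),
    ('cuDNN', '6.0', '-CUDA-8.0.61', True),
]

# Custom easyconfig parameter defined by the TensorFlow easyblock;
# values are CUDA compute capabilities given as strings.
cuda_compute_capabilities = ['3.5', '3.7']
```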



regards,

Kenneth



Best regards

Jakob




On 5 Jan 2018, at 16:10, Jakob Schiøtz  wrote:




On 5 Jan 2018, at 15:18, Kenneth Hoste  wrote:

On 05/01/2018 14:13, Jakob Schiøtz wrote:

Hi again,

Yes, I have overlooked that - I just switched my repo to your branch and tried 
to build :-)

Now I get an error when building TensorFlow.  It is a 502 Bad Gateway, 
indicating that some server is down somewhere.  But is it not a problem that 
the build process itself tried to download extra stuff in addition to the 
source files listed in the .eb file?  At least it makes the checksum checking 
moot.

That's indeed a problem, but one that is hard to avoid with TensorFlow, at 
least in a first iteration...

Once we're happy with the current approach, a new target could be to get TensorFlow to 
build "offline".

One step at a time though... ;-)

It could be a showstopper for me, though.  On our cluster, only two nodes have 
GPUs.  With the binary build, I could only install TensorFlow on those, since 
although CUDA and friends are available on all the nodes, you can only load the 
resulting TensorFlow module on a machine with a GPU.  Unfortunately, these two 
nodes are officially compute-nodes, not login-nodes, and that means that they 
are cut off from the Internet.  So no downloading is possible on these. :-(

So I have two questions:

1. What do we expect to gain by building from source instead of installing from 
the wheel?

2. Would it be OK to have a “-bin” variant installing from the binary 
distribution until we get these issues ironed out?

In my second attempt, I managed to build with foss/2017b (obviously the server 
was up again).  I have not really tested it yet (I am only just dabbling in 
TensorFlow and my main application is crashing due to another problem).  Do you 
want me to submit the new .eb file as a PR to your PR?  Or should I just wait 
till your stuff has converged?

/Jakob




regards,

Kenneth

Best regards

Jakob



WARNING: The lower priority option '-c opt' does not override the previous 
value '--compilation_mode=opt'.
WARNING: The lower priority option '-c opt' does not override the previous 
value '--compilation_mode=opt'.
Downloading 
https://github.com/bazelbuild/rules_closure/archive/4af89ef1db659eb41f110df189b67d4cf14073e1.tar.gz
 via codeload.github.com: 40,240 bytes
Downloading 
https://github.com/bazelbuild/rules_closure/archive/4af89ef1db659eb41f110df189b67d4cf14073e1.tar.gz
 via codeload.github.com: 205,436 bytes
Loading package: tensorflow/tools/pip_package
Loading package: @bazel_tools//tools/cpp
Loading package: @local_jdk//
Loading package: @local_config_cc//
Loading complete.  Analyzing...
ERROR: 
/home/niflheim/schiotz/easybuild_experimental/sandybridge/build/TensorFlow/1.4.0/foss-2017b-Python-3.6.3/tensorflow-1.4.0/tensorflow/tools/pip_package/BUILD:139:1:
 error loading package 'tensorflow': Encountered error while reading extension 
file 'protobuf.bzl': no such package '@protobuf_archive//': 
java.io.IOException: Error downloading 
[http://mirror.bazel.build/github.com/google/protobuf/archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz]
 to 
/tmp/eb-GpWEyg/tmpfJrPWS-bazel-build/external/protobuf_archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz:
 GET returned 502 Bad Gateway and referenced by 
'//tensorflow/tools/pip_package:build_pip_package'.
ERROR: 

Re: [easybuild] TensorFlow with GPU support.

2018-01-08 Thread Kenneth Hoste

Hi Jakob,

On 05/01/2018 16:10, Jakob Schiøtz wrote:



On 5 Jan 2018, at 15:18, Kenneth Hoste  wrote:

On 05/01/2018 14:13, Jakob Schiøtz wrote:

Hi again,

Yes, I have overlooked that - I just switched my repo to your branch and tried 
to build :-)

Now I get an error when building TensorFlow.  It is a 502 Bad Gateway, 
indicating that some server is down somewhere.  But is it not a problem that 
the build process itself tried to download extra stuff in addition to the 
source files listed in the .eb file?  At least it makes the checksum checking 
moot.

That's indeed a problem, but one that is hard to avoid with TensorFlow, at 
least in a first iteration...

Once we're happy with the current approach, a new target could be to get TensorFlow to 
build "offline".

One step at a time though... ;-)

It could be a showstopper for me, though.  On our cluster, only two nodes have 
GPUs.  With the binary build, I could only install TensorFlow on those, since 
although CUDA and friends are available on all the nodes, you can only load the 
resulting TensorFlow module on a machine with a GPU.  Unfortunately, these two 
nodes are officially compute-nodes, not login-nodes, and that means that they 
are cut off from the Internet.  So no downloading is possible on these. :-(

So I have two questions:

1. What do we expect to gain by building from source instead of installing from 
the wheel?

2. Would it be OK to have a “-bin” variant installing from the binary 
distribution until we get these issues ironed out?


See my previous e-mail. ;-)

1. better performance (due to targeting correct architecture) + 
compatibility with more OSs (e.g. CentOS 6)


2. yes


In my second attempt, I managed to build with foss/2017b (obviously the server 
was up again).  I have not really tested it yet (I am only just dabbling in 
TensorFlow and my main application is crashing due to another problem).  Do you 
want me to submit the new .eb file as a PR to your PR?  Or should I just wait 
till your stuff has converged?


That should be a separate PR I think (I'm more concerned about 
complicating existing PRs rather than one more PR to deal with).



regards,

Kenneth


/Jakob




regards,

Kenneth

Best regards

Jakob



WARNING: The lower priority option '-c opt' does not override the previous 
value '--compilation_mode=opt'.
WARNING: The lower priority option '-c opt' does not override the previous 
value '--compilation_mode=opt'.
Downloading 
https://github.com/bazelbuild/rules_closure/archive/4af89ef1db659eb41f110df189b67d4cf14073e1.tar.gz
 via codeload.github.com: 40,240 bytes
Downloading 
https://github.com/bazelbuild/rules_closure/archive/4af89ef1db659eb41f110df189b67d4cf14073e1.tar.gz
 via codeload.github.com: 205,436 bytes
Loading package: tensorflow/tools/pip_package
Loading package: @bazel_tools//tools/cpp
Loading package: @local_jdk//
Loading package: @local_config_cc//
Loading complete.  Analyzing...
ERROR: 
/home/niflheim/schiotz/easybuild_experimental/sandybridge/build/TensorFlow/1.4.0/foss-2017b-Python-3.6.3/tensorflow-1.4.0/tensorflow/tools/pip_package/BUILD:139:1:
 error loading package 'tensorflow': Encountered error while reading extension 
file 'protobuf.bzl': no such package '@protobuf_archive//': 
java.io.IOException: Error downloading 
[http://mirror.bazel.build/github.com/google/protobuf/archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz]
 to 
/tmp/eb-GpWEyg/tmpfJrPWS-bazel-build/external/protobuf_archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz:
 GET returned 502 Bad Gateway and referenced by 
'//tensorflow/tools/pip_package:build_pip_package'.
ERROR: 
/home/niflheim/schiotz/easybuild_experimental/sandybridge/build/TensorFlow/1.4.0/foss-2017b-Python-3.6.3/tensorflow-1.4.0/tensorflow/tools/pip_package/BUILD:139:1:
 error loading package 'tensorflow': Encountered error while reading extension 
file 'protobuf.bzl': no such package '@protobuf_archive//': 
java.io.IOException: Error downloading 
[http://mirror.bazel.build/github.com/google/protobuf/archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz]
 to 
/tmp/eb-GpWEyg/tmpfJrPWS-bazel-build/external/protobuf_archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz:
 GET returned 502 Bad Gateway and referenced by 
'//tensorflow/tools/pip_package:build_pip_package'.
ERROR: Analysis of target '//tensorflow/tools/pip_package:build_pip_package' 
failed; build aborted: error loading package 'tensorflow': Encountered error 
while reading extension file 'protobuf.bzl': no such package 
'@protobuf_archive//': java.io.IOException: Error downloading 
[http://mirror.bazel.build/github.com/google/protobuf/archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz]
 to 
/tmp/eb-GpWEyg/tmpfJrPWS-bazel-build/external/protobuf_archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz:
 GET returned 502 Bad Gateway.
Elapsed time: 6.561s
  (at easybuild/tools/run.py:481 in parse_cmd_output)
== 

Re: [easybuild] TensorFlow with GPU support.

2018-01-08 Thread Jakob Schiøtz
Hi Kenneth,

I have now tested your TensorFlow 1.4.0 eb on our machines with a real-world 
script.  It works, but it runs three times slower than with the prebuilt 
TensorFlow 1.2.1  :-(

The prebuilt version complains that it was built without AVX2 etc., so I do not 
really understand why it is so much slower to use the version compiled from 
source - assuming of course that there is not a factor of three performance loss 
between 1.2.1 and 1.4.0, which seems unlikely.

Best regards

Jakob


> On 5 Jan 2018, at 13:57, Kenneth Hoste  wrote:
> 
> On 04/01/2018 16:37, Jakob Schiøtz wrote:
>> Dear Kenneth, Pablo and Maxime,
>> 
>> Thanks for your feedback.  Yes, I will try to see if I can build from 
>> source, but I will focus on the foss toolchain since we use that one for our 
>> Python here (we do not have the Intel MPI license, and the iomkl toolchain 
>> could not build Python last time I tried).
>> 
>> I assume the reason for building from source is to ensure consistent library 
>> versions etc.  If that proves very difficult, could we perhaps in the 
>> interim have builds (with a -bin suffix?) using the prebuilt wheels?
> 
> The main reason for building from source is performance and compatibility 
> with the OS.
> 
> The binary wheels that are available for TensorFlow are not compatible with 
> older OS versions like CentOS 6, as I experienced first-hand when trying to 
> get it to work on an older (GPU) system.
> Since the compilation from source with CUDA support didn't work yet, I had to 
> resort to injecting a newer glibc version in the 'python' binary, which was 
> not fun (well...).
> 
> For CPU-only installations, you really have no other option than building 
> from source, since the binary wheels were not built with AVX2 instructions 
> for example, which leads to large performance losses (some quick benchmarking 
> showed a 7x increase in performance for TF 1.4 built with foss/2017b over 
> using the binary wheel).
> 
> For GPU installations, a similar concern arises, although it may be less 
> severe there, depending on what CUDA compute capabilities the binary wheels 
> were built with (I only tested the wheels on old systems with NVIDIA K20x/K40 
> GPUs, so there I doubt you'll get much performance increase when building 
> from source).
> 
> If it turns out to be too difficult or time-consuming to get the build from 
> source with CUDA support to work, then we can of course progress with 
> sticking to the binary wheel releases for now, I'm not going to oppose that.
> 
> 
> regards,
> 
> Kenneth
> 
>> 
>> Best regards
>> 
>> Jakob
>> 
>> 
>>> On 4 Jan 2018, at 15:29, Kenneth Hoste  wrote:
>>> 
>>> Dear Jakob,
>>> 
>>> On 04/01/2018 10:23, Jakob Schiøtz wrote:
>>>> Hi,
>>>> 
>>>> I made a TensorFlow easyconfig a while ago depending on Python with the 
>>>> foss toolchain; and including a variant with GPU support (PR 4904).  The 
>>>> latter has not yet been merged, probably because it is annoying to have 
>>>> something that can only build on a machine with a GPU (it fails the sanity 
>>>> check otherwise, as TensorFlow with GPU support cannot load on a machine 
>>>> without it).
>>> Not being able to test this on a non-GPU system is a bit unfortunate, but 
>>> that's not a reason that it hasn't been merged yet, that's mostly due to a 
>>> lack of time from my side to get back to it...
>>> 
>>>> Since I made that PR, two newer releases of TensorFlow have appeared (1.3 
>>>> and 1.4).   There are easyconfigs for 1.3 with the Intel toolchain.  I am 
>>>> considering making easyconfigs for TensorFlow 1.4 with 
>>>> Python-3.6.3-foss-2017b (both with and without GPU support), but first I 
>>>> would like to know if anybody else is doing this - it is my impression 
>>>> that somebody who actually knows what they are doing may be working on 
>>>> TensorFlow. :-)
>>> I have spent quite a bit of time puzzling together an easyblock that 
>>> supports building TensorFlow from source, see [1].
>>> 
>>> It already works for non-GPU installations (see [2] for example), but it's 
>>> not entirely finished yet because:
>>> 
>>> * building from source with CUDA support does not work yet, the build fails 
>>> with strange Bazel errors...
>>> 
>>> * there are some issues when the TensorFlow easyblock is used together with 
>>> --use-ccache and the Intel compilers;
>>>   because two compiler wrappers are used, they end up calling each other 
>>> resulting in a "fork bomb" style situation...
>>> 
>>> I would really like to get it finished and have easyconfigs available for 
>>> TensorFlow 1.4 and newer where we properly build TensorFlow from source 
>>> rather than using the binary wheels...
>>> 
>>> Are you up for giving it a try, and maybe helping out with the problems 
>>> mentioned above?
>>> 
>>> 
>>> regards,
>>> 
>>> Kenneth
>>> 
>>> 
>>> [1] https://github.com/easybuilders/easybuild-easyblocks/pull/1287
>>> [2] 

Re: [easybuild] TensorFlow with GPU support.

2018-01-05 Thread Jakob Schiøtz
Hi again, Kenneth.

It turns out that I was wrong about the lack of internet access from the 
compute nodes.  In principle, there should be nothing stopping me from testing 
building with GPUs next week, except for my lack of knowledge :-)

I see this in the easyblock:

def extra_options():
    extra_vars = {
        # see https://developer.nvidia.com/cuda-gpus
        'cuda_compute_capabilities': [[], "List of CUDA compute capabilities to build with", CUSTOM],
        'with_mkl_dnn': [True, "Make TensorFlow use Intel MKL-DNN", CUSTOM],
    }

Does that mean that I can call eb with something like this

eb TensorFlow-1.4.0-foss-2017b-Python-3.6.3.eb -r 
--cuda_compute_capabilities=Tesla 

or something like that (I will not be able to test it until next week).  Or do 
I need to make a new easyconfig which sets that extra option somehow (and 
depends on CUDA and friends)?
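
A minimal sketch of the second alternative, assuming that the options returned by extra_options() are meant to be set as parameters in the easyconfig file rather than passed as eb command line flags (the usual EasyBuild convention for easyblock-specific parameters). The values are CUDA compute capability numbers as listed on the NVIDIA page linked in the easyblock comment, not product names such as "Tesla". The '3.5' value (K20/K40 class GPUs) and both dependency versions are placeholders for illustration and are not taken from the PR:

```python
# Hypothetical easyconfig fragment (easyconfigs use Python syntax).
name = 'TensorFlow'
version = '1.4.0'
versionsuffix = '-Python-3.6.3'

toolchain = {'name': 'foss', 'version': '2017b'}

# Custom parameter defined by the easyblock's extra_options().
# Assumption: numeric compute capabilities, here 3.5 as an example value.
cuda_compute_capabilities = ['3.5']

# Placeholder versions, chosen only to make the fragment complete.
cuda_version = '9.0.176'
cudnn_version = '7.0.5'

dependencies = [
    ('Python', '3.6.3'),
    ('CUDA', cuda_version),
    ('cuDNN', cudnn_version, '-CUDA-%s' % cuda_version),
]
```

With such a file, the eb call itself would stay as in the first line of the command above, without the extra flag.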

Best regards

Jakob



> On 5 Jan 2018, at 16:10, Jakob Schiøtz  wrote:
> 
> 
> 
>> On 5 Jan 2018, at 15:18, Kenneth Hoste  wrote:
>> 
>> On 05/01/2018 14:13, Jakob Schiøtz wrote:
>>> Hi again,
>>> 
>>> Yes, I have overlooked that - I just switched my repo to your branch and 
>>> tried to build :-)
>>> 
>>> Now I get an error when building TensorFlow.  It is a 502 Bad Gateway, 
>>> indicating that some server is down somewhere.  But is it not a problem 
>>> that the build process itself tried to download extra stuff in addition to 
>>> the source files listed in the .eb file?  At least it makes the checksum 
>>> checking moot.
>> 
>> That's indeed a problem, but one that is hard to avoid with TensorFlow, at 
>> least in a first iteration...
>> 
>> Once we're happy with the current approach, a new target could be to get 
>> TensorFlow to build "offline".
>> 
>> One step at a time though... ;-)
> 
> It could be a showstopper for me, though.  On our cluster, only two nodes 
> have GPUs.  With the binary build, I could only install TensorFlow on those, 
> since although CUDA and friends are available on all the nodes, you can only 
> load the resulting TensorFlow module on a machine with a GPU.  Unfortunately, 
> these two nodes are officially compute-nodes, not login-nodes, and that means 
> that they are cut off from the Internet.  So no downloading is possible on 
> these. :-(
> 
> So I have two questions:
> 
> 1. What do we expect to gain by building from source instead of installing 
> from the wheel? 
> 
> 2. Would it be OK to have a “-bin” variant installing from the binary 
> distribution until we get these issues ironed out?
> 
> In my second attempt, I managed to build with foss/2017b (obviously the 
> server was up again).  I have not really tested it yet (I am only just 
> dabbling in TensorFlow and my main application is crashing due to another 
> problem).  Do you want me to submit the new .eb file as a PR to your PR?  Or 
> should I just wait till your stuff has converged?
> 
> /Jakob
> 
> 
>> 
>> 
>> regards,
>> 
>> Kenneth
>>> 
>>> Best regards
>>> 
>>> Jakob
>>> 
>>> 
>>> 
>>> WARNING: The lower priority option '-c opt' does not override the previous 
>>> value '--compilation_mode=opt'.
>>> WARNING: The lower priority option '-c opt' does not override the previous 
>>> value '--compilation_mode=opt'.
>>> Downloading 
>>> https://github.com/bazelbuild/rules_closure/archive/4af89ef1db659eb41f110df189b67d4cf14073e1.tar.gz
>>>  via codeload.github.com: 40,240 bytes
>>> Downloading 
>>> https://github.com/bazelbuild/rules_closure/archive/4af89ef1db659eb41f110df189b67d4cf14073e1.tar.gz
>>>  via codeload.github.com: 205,436 bytes
>>> Loading package: tensorflow/tools/pip_package
>>> Loading package: @bazel_tools//tools/cpp
>>> Loading package: @local_jdk//
>>> Loading package: @local_config_cc//
>>> Loading complete.  Analyzing...
>>> ERROR: 
>>> /home/niflheim/schiotz/easybuild_experimental/sandybridge/build/TensorFlow/1.4.0/foss-2017b-Python-3.6.3/tensorflow-1.4.0/tensorflow/tools/pip_package/BUILD:139:1:
>>>  error loading package 'tensorflow': Encountered error while reading 
>>> extension file 'protobuf.bzl': no such package '@protobuf_archive//': 
>>> java.io.IOException: Error downloading 
>>> [http://mirror.bazel.build/github.com/google/protobuf/archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz]
>>>  to 
>>> /tmp/eb-GpWEyg/tmpfJrPWS-bazel-build/external/protobuf_archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz:
>>>  GET returned 502 Bad Gateway and referenced by 
>>> '//tensorflow/tools/pip_package:build_pip_package'.
>>> ERROR: 
>>> /home/niflheim/schiotz/easybuild_experimental/sandybridge/build/TensorFlow/1.4.0/foss-2017b-Python-3.6.3/tensorflow-1.4.0/tensorflow/tools/pip_package/BUILD:139:1:
>>>  error loading package 'tensorflow': Encountered error while reading 
>>> extension file 'protobuf.bzl': no such package '@protobuf_archive//': 
>>> java.io.IOException: Error downloading 
>>> 

Re: [easybuild] TensorFlow with GPU support.

2018-01-05 Thread Jakob Schiøtz


> On 5 Jan 2018, at 15:18, Kenneth Hoste  wrote:
> 
> On 05/01/2018 14:13, Jakob Schiøtz wrote:
>> Hi again,
>> 
>> Yes, I have overlooked that - I just switched my repo to your branch and 
>> tried to build :-)
>> 
>> Now I get an error when building TensorFlow.  It is a 502 Bad Gateway, 
>> indicating that some server is down somewhere.  But is it not a problem that 
>> the build process itself tried to download extra stuff in addition to the 
>> source files listed in the .eb file?  At least it makes the checksum 
>> checking moot.
> 
> That's indeed a problem, but one that is hard to avoid with TensorFlow, at 
> least in a first iteration...
> 
> Once we're happy with the current approach, a new target could be to get 
> TensorFlow to build "offline".
> 
> One step at a time though... ;-)

It could be a showstopper for me, though.  On our cluster, only two nodes have 
GPUs.  With the binary build, I could only install TensorFlow on those, since 
although CUDA and friends are available on all the nodes, you can only load the 
resulting TensorFlow module on a machine with a GPU.  Unfortunately, these two 
nodes are officially compute-nodes, not login-nodes, and that means that they 
are cut off from the Internet.  So no downloading is possible on these. :-(
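
One way to find out what would have to be fetched in advance on a machine that does have Internet access is to pull the failing URLs out of the Bazel output. A minimal sketch, assuming the "Error downloading [URL]" wording seen in the error log quoted in this thread; failed_downloads is a hypothetical helper, and how to hand pre-fetched archives to Bazel is not covered here:

```python
import re

# Matches the "Error downloading [URL]" fragments in the Bazel output.
# \s+ also spans a line break between the message and the bracketed URL.
DOWNLOAD_ERROR = re.compile(r'Error downloading\s+\[([^\]]+)\]')


def failed_downloads(log_text):
    """Return the distinct URLs that could not be fetched, in first-seen order."""
    urls = []
    for url in DOWNLOAD_ERROR.findall(log_text):
        if url not in urls:
            urls.append(url)
    return urls
```

Bazel repeats the same error several times, so the helper reports each URL only once.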

So I have two questions:

1. What do we expect to gain by building from source instead of installing from 
the wheel? 

2. Would it be OK to have a “-bin” variant installing from the binary 
distribution until we get these issues ironed out?

In my second attempt, I managed to build with foss/2017b (obviously the server 
was up again).  I have not really tested it yet (I am only just dabbling in 
TensorFlow and my main application is crashing due to another problem).  Do you 
want me to submit the new .eb file as a PR to your PR?  Or should I just wait 
till your stuff has converged?

/Jakob


> 
> 
> regards,
> 
> Kenneth
>> 
>> Best regards
>> 
>> Jakob
>> 
>> 
>> 
>> WARNING: The lower priority option '-c opt' does not override the previous 
>> value '--compilation_mode=opt'.
>> WARNING: The lower priority option '-c opt' does not override the previous 
>> value '--compilation_mode=opt'.
>> Downloading 
>> https://github.com/bazelbuild/rules_closure/archive/4af89ef1db659eb41f110df189b67d4cf14073e1.tar.gz
>>  via codeload.github.com: 40,240 bytes
>> Downloading 
>> https://github.com/bazelbuild/rules_closure/archive/4af89ef1db659eb41f110df189b67d4cf14073e1.tar.gz
>>  via codeload.github.com: 205,436 bytes
>> Loading package: tensorflow/tools/pip_package
>> Loading package: @bazel_tools//tools/cpp
>> Loading package: @local_jdk//
>> Loading package: @local_config_cc//
>> Loading complete.  Analyzing...
>> ERROR: 
>> /home/niflheim/schiotz/easybuild_experimental/sandybridge/build/TensorFlow/1.4.0/foss-2017b-Python-3.6.3/tensorflow-1.4.0/tensorflow/tools/pip_package/BUILD:139:1:
>>  error loading package 'tensorflow': Encountered error while reading 
>> extension file 'protobuf.bzl': no such package '@protobuf_archive//': 
>> java.io.IOException: Error downloading 
>> [http://mirror.bazel.build/github.com/google/protobuf/archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz]
>>  to 
>> /tmp/eb-GpWEyg/tmpfJrPWS-bazel-build/external/protobuf_archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz:
>>  GET returned 502 Bad Gateway and referenced by 
>> '//tensorflow/tools/pip_package:build_pip_package'.
>> ERROR: 
>> /home/niflheim/schiotz/easybuild_experimental/sandybridge/build/TensorFlow/1.4.0/foss-2017b-Python-3.6.3/tensorflow-1.4.0/tensorflow/tools/pip_package/BUILD:139:1:
>>  error loading package 'tensorflow': Encountered error while reading 
>> extension file 'protobuf.bzl': no such package '@protobuf_archive//': 
>> java.io.IOException: Error downloading 
>> [http://mirror.bazel.build/github.com/google/protobuf/archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz]
>>  to 
>> /tmp/eb-GpWEyg/tmpfJrPWS-bazel-build/external/protobuf_archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz:
>>  GET returned 502 Bad Gateway and referenced by 
>> '//tensorflow/tools/pip_package:build_pip_package'.
>> ERROR: Analysis of target '//tensorflow/tools/pip_package:build_pip_package' 
>> failed; build aborted: error loading package 'tensorflow': Encountered error 
>> while reading extension file 'protobuf.bzl': no such package 
>> '@protobuf_archive//': java.io.IOException: Error downloading 
>> [http://mirror.bazel.build/github.com/google/protobuf/archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz]
>>  to 
>> /tmp/eb-GpWEyg/tmpfJrPWS-bazel-build/external/protobuf_archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz:
>>  GET returned 502 Bad Gateway.
>> Elapsed time: 6.561s
>>  (at easybuild/tools/run.py:481 in parse_cmd_output)
>> == 2018-01-05 14:07:30,582 easyblock.py:2685 WARNING build failed (first 300 
>> chars): cmd "bazel 

Re: [easybuild] TensorFlow with GPU support.

2018-01-05 Thread Kenneth Hoste

On 05/01/2018 14:13, Jakob Schiøtz wrote:

Hi again,

Yes, I have overlooked that - I just switched my repo to your branch and tried 
to build :-)

Now I get an error when building TensorFlow.  It is a 502 Bad Gateway, 
indicating that some server is down somewhere.  But is it not a problem that 
the build process itself tried to download extra stuff in addition to the 
source files listed in the .eb file?  At least it makes the checksum checking 
moot.


That's indeed a problem, but one that is hard to avoid with TensorFlow, 
at least in a first iteration...


Once we're happy with the current approach, a new target could be to get 
TensorFlow to build "offline".


One step at a time though... ;-)


regards,

Kenneth


Best regards

Jakob



WARNING: The lower priority option '-c opt' does not override the previous 
value '--compilation_mode=opt'.
WARNING: The lower priority option '-c opt' does not override the previous 
value '--compilation_mode=opt'.
Downloading 
https://github.com/bazelbuild/rules_closure/archive/4af89ef1db659eb41f110df189b67d4cf14073e1.tar.gz
 via codeload.github.com: 40,240 bytes
Downloading 
https://github.com/bazelbuild/rules_closure/archive/4af89ef1db659eb41f110df189b67d4cf14073e1.tar.gz
 via codeload.github.com: 205,436 bytes
Loading package: tensorflow/tools/pip_package
Loading package: @bazel_tools//tools/cpp
Loading package: @local_jdk//
Loading package: @local_config_cc//
Loading complete.  Analyzing...
ERROR: 
/home/niflheim/schiotz/easybuild_experimental/sandybridge/build/TensorFlow/1.4.0/foss-2017b-Python-3.6.3/tensorflow-1.4.0/tensorflow/tools/pip_package/BUILD:139:1:
 error loading package 'tensorflow': Encountered error while reading extension 
file 'protobuf.bzl': no such package '@protobuf_archive//': 
java.io.IOException: Error downloading 
[http://mirror.bazel.build/github.com/google/protobuf/archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz]
 to 
/tmp/eb-GpWEyg/tmpfJrPWS-bazel-build/external/protobuf_archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz:
 GET returned 502 Bad Gateway and referenced by 
'//tensorflow/tools/pip_package:build_pip_package'.
ERROR: 
/home/niflheim/schiotz/easybuild_experimental/sandybridge/build/TensorFlow/1.4.0/foss-2017b-Python-3.6.3/tensorflow-1.4.0/tensorflow/tools/pip_package/BUILD:139:1:
 error loading package 'tensorflow': Encountered error while reading extension 
file 'protobuf.bzl': no such package '@protobuf_archive//': 
java.io.IOException: Error downloading 
[http://mirror.bazel.build/github.com/google/protobuf/archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz]
 to 
/tmp/eb-GpWEyg/tmpfJrPWS-bazel-build/external/protobuf_archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz:
 GET returned 502 Bad Gateway and referenced by 
'//tensorflow/tools/pip_package:build_pip_package'.
ERROR: Analysis of target '//tensorflow/tools/pip_package:build_pip_package' 
failed; build aborted: error loading package 'tensorflow': Encountered error 
while reading extension file 'protobuf.bzl': no such package 
'@protobuf_archive//': java.io.IOException: Error downloading 
[http://mirror.bazel.build/github.com/google/protobuf/archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz]
 to 
/tmp/eb-GpWEyg/tmpfJrPWS-bazel-build/external/protobuf_archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz:
 GET returned 502 Bad Gateway.
Elapsed time: 6.561s
  (at easybuild/tools/run.py:481 in parse_cmd_output)
== 2018-01-05 14:07:30,582 easyblock.py:2685 WARNING build failed (first 300 chars): cmd 
"bazel --output_base=/tmp/eb-GpWEyg/tmpfJrPWS-bazel-build build 
--compilation_mode=opt --config=opt --subcommands --verbose_failures  --config=mkl 
//tensorflow/tools/pip_package:build_pip_package" exited with exit code 1 and output:




On 5 Jan 2018, at 13:50, Kenneth Hoste  wrote:

Hi Jakob,

On 05/01/2018 13:19, Jakob Schiøtz wrote:

Hi Kenneth,

Is it possible that you forgot to check in the patches 
TensorFlow-1.4.0_swig-env.patch and TensorFlow-1.4.0_no-enum34.patch in your 
PR?  Attempting to build TensorFlow fails because it cannot find these.

The patch files are available from 
https://github.com/easybuilders/easybuild-easyconfigs/pull/5318 (as mentioned 
in the description of the PR).


regards,

Kenneth

Best regards

Jakob





On 4 Jan 2018, at 16:37, Jakob Schiøtz  wrote:

Dear Kenneth, Pablo and Maxime,

Thanks for your feedback.  Yes, I will try to see if I can build from source, 
but I will focus on the foss toolchain since we use that one for our Python 
here (we do not have the Intel MPI license, and the iomkl toolchain could not 
build Python last time I tried).

I assume the reason for building from source is to ensure consistent library 
versions etc.  If that proves very difficult, could we perhaps in the interim 
have builds (with a -bin suffix?) using the prebuilt wheels?

Best regards

Jakob



On 4 Jan 2018, at 

Re: [easybuild] TensorFlow with GPU support.

2018-01-05 Thread Jakob Schiøtz
Hi again,

Yes, I have overlooked that - I just switched my repo to your branch and tried 
to build :-)

Now I get an error when building TensorFlow.  It is a 502 Bad Gateway, 
indicating that some server is down somewhere.  But is it not a problem that 
the build process itself tried to download extra stuff in addition to the 
source files listed in the .eb file?  At least it makes the checksum checking 
moot.
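
To illustrate what a checksum check on such an extra download would involve: a minimal sketch, assuming the archive has been fetched separately and a trusted SHA-256 value has been recorded for it. The helper names are hypothetical and this is not what the easyblock does:

```python
import hashlib


def sha256_hex(data):
    """Return the hex SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches(data, expected_hex):
    """True if the digest of `data` equals the recorded value (case-insensitive)."""
    return sha256_hex(data) == expected_hex.lower()
```

Anything the build tool downloads on its own bypasses a check like this unless its checksum is recorded somewhere as well.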

Best regards

Jakob



WARNING: The lower priority option '-c opt' does not override the previous 
value '--compilation_mode=opt'.
WARNING: The lower priority option '-c opt' does not override the previous 
value '--compilation_mode=opt'.
Downloading 
https://github.com/bazelbuild/rules_closure/archive/4af89ef1db659eb41f110df189b67d4cf14073e1.tar.gz
 via codeload.github.com: 40,240 bytes
Downloading 
https://github.com/bazelbuild/rules_closure/archive/4af89ef1db659eb41f110df189b67d4cf14073e1.tar.gz
 via codeload.github.com: 205,436 bytes
Loading package: tensorflow/tools/pip_package
Loading package: @bazel_tools//tools/cpp
Loading package: @local_jdk//
Loading package: @local_config_cc//
Loading complete.  Analyzing...
ERROR: 
/home/niflheim/schiotz/easybuild_experimental/sandybridge/build/TensorFlow/1.4.0/foss-2017b-Python-3.6.3/tensorflow-1.4.0/tensorflow/tools/pip_package/BUILD:139:1:
 error loading package 'tensorflow': Encountered error while reading extension 
file 'protobuf.bzl': no such package '@protobuf_archive//': 
java.io.IOException: Error downloading 
[http://mirror.bazel.build/github.com/google/protobuf/archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz]
 to 
/tmp/eb-GpWEyg/tmpfJrPWS-bazel-build/external/protobuf_archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz:
 GET returned 502 Bad Gateway and referenced by 
'//tensorflow/tools/pip_package:build_pip_package'.
ERROR: 
/home/niflheim/schiotz/easybuild_experimental/sandybridge/build/TensorFlow/1.4.0/foss-2017b-Python-3.6.3/tensorflow-1.4.0/tensorflow/tools/pip_package/BUILD:139:1:
 error loading package 'tensorflow': Encountered error while reading extension 
file 'protobuf.bzl': no such package '@protobuf_archive//': 
java.io.IOException: Error downloading 
[http://mirror.bazel.build/github.com/google/protobuf/archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz]
 to 
/tmp/eb-GpWEyg/tmpfJrPWS-bazel-build/external/protobuf_archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz:
 GET returned 502 Bad Gateway and referenced by 
'//tensorflow/tools/pip_package:build_pip_package'.
ERROR: Analysis of target '//tensorflow/tools/pip_package:build_pip_package' 
failed; build aborted: error loading package 'tensorflow': Encountered error 
while reading extension file 'protobuf.bzl': no such package 
'@protobuf_archive//': java.io.IOException: Error downloading 
[http://mirror.bazel.build/github.com/google/protobuf/archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz]
 to 
/tmp/eb-GpWEyg/tmpfJrPWS-bazel-build/external/protobuf_archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz:
 GET returned 502 Bad Gateway.
Elapsed time: 6.561s
 (at easybuild/tools/run.py:481 in parse_cmd_output)
== 2018-01-05 14:07:30,582 easyblock.py:2685 WARNING build failed (first 300 
chars): cmd "bazel --output_base=/tmp/eb-GpWEyg/tmpfJrPWS-bazel-build build 
--compilation_mode=opt --config=opt --subcommands --verbose_failures  
--config=mkl //tensorflow/tools/pip_package:build_pip_package" exited with exit 
code 1 and output:



> On 5 Jan 2018, at 13:50, Kenneth Hoste  wrote:
> 
> Hi Jakob,
> 
> On 05/01/2018 13:19, Jakob Schiøtz wrote:
>> Hi Kenneth,
>> 
>> Is it possible that you forgot to check in the patches 
>> TensorFlow-1.4.0_swig-env.patch and TensorFlow-1.4.0_no-enum34.patch in your 
>> PR?  Attempting to build TensorFlow fails because it cannot find these.
> 
> The patch files are available from 
> https://github.com/easybuilders/easybuild-easyconfigs/pull/5318 (as mentioned 
> in the description of the PR).
> 
> 
> regards,
> 
> Kenneth
>> 
>> Best regards
>> 
>> Jakob
>> 
>> 
>> 
>> 
>>> On 4 Jan 2018, at 16:37, Jakob Schiøtz  wrote:
>>> 
>>> Dear Kenneth, Pablo and Maxime,
>>> 
>>> Thanks for your feedback.  Yes, I will try to see if I can build from 
>>> source, but I will focus on the foss toolchain since we use that one for 
>>> our Python here (we do not have the Intel MPI license, and the iomkl 
>>> toolchain could not build Python last time I tried).
>>> 
>>> I assume the reason for building from source is to ensure consistent 
>>> library versions etc.  If that proves very difficult, could we perhaps in 
>>> the interim have builds (with a -bin suffix?) using the prebuilt wheels?
>>> 
>>> Best regards
>>> 
>>> Jakob
>>> 
>>> 
>>>> On 4 Jan 2018, at 15:29, Kenneth Hoste  wrote:
>>>> 
>>>> Dear Jakob,
>>>> 
>>>> On 04/01/2018 10:23, Jakob Schiøtz wrote:
>>>>> Hi,
>>>>> 
>>>>> I made a TensorFlow 

Re: [easybuild] TensorFlow with GPU support.

2018-01-05 Thread Kenneth Hoste

On 04/01/2018 16:37, Jakob Schiøtz wrote:

Dear Kenneth, Pablo and Maxime,

Thanks for your feedback.  Yes, I will try to see if I can build from source, 
but I will focus on the foss toolchain since we use that one for our Python 
here (we do not have the Intel MPI license, and the iomkl toolchain could not 
build Python last time I tried).

I assume the reason for building from source is to ensure consistent library 
versions etc.  If that proves very difficult, could we perhaps in the interim 
have builds (with a -bin suffix?) using the prebuilt wheels?


The main reason for building from source is performance and 
compatibility with the OS.


The binary wheels that are available for TensorFlow are not compatible 
with older OS versions like CentOS 6, as I experienced first-hand when 
trying to get it to work on an older (GPU) system.
Since the compilation from source with CUDA support didn't work yet, I 
had to resort to injecting a newer glibc version in the 'python' binary, 
which was not fun (well...).
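
Whether a given system is affected can be checked before installing a wheel. A minimal sketch, assuming the wheel's minimum glibc version is known; the '2.17' in the usage note is only an illustrative placeholder, while 2.12 is the glibc version shipped with CentOS 6. The helper names are hypothetical:

```python
import platform


def parse_version(text):
    """Turn a dotted version string such as '2.17' into a tuple of ints."""
    return tuple(int(part) for part in text.split('.'))


def glibc_is_at_least(required, found=None):
    """Return True if the glibc version `found` is >= `required`.

    If `found` is None, platform.libc_ver() is asked about the running
    interpreter; when no glibc is detected the answer is False.
    """
    if found is None:
        lib, found = platform.libc_ver()
        if lib != 'glibc' or not found:
            return False
    return parse_version(found) >= parse_version(required)
```

For example, glibc_is_at_least('2.17', found='2.12') is False, which corresponds to the CentOS 6 situation described above.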


For CPU-only installations, you really have no other option than 
building from source, since the binary wheels were not built with AVX2 
instructions for example, which leads to large performance losses (some 
quick benchmarking showed a 7x increase in performance for TF 1.4 built 
with foss/2017b over using the binary wheel).
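
Whether a build host supports AVX2 at all can be read from /proc/cpuinfo on Linux. A minimal sketch, assuming the x86 "flags" line format; cpu_flags and has_avx2 are hypothetical helper names:

```python
def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(':')
        # Only the "flags : ..." lines carry the feature list.
        if sep and key.strip() == 'flags':
            flags.update(value.split())
    return flags


def has_avx2(cpuinfo_text):
    """True if the given cpuinfo text advertises the avx2 flag."""
    return 'avx2' in cpu_flags(cpuinfo_text)
```

On a Linux node, passing the contents of /proc/cpuinfo to has_avx2 reports whether the CPU advertises AVX2.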


For GPU installations, a similar concern arises, although it may be less 
severe there, depending on what CUDA compute capabilities the binary 
wheels were built with (I only tested the wheels on old systems with 
NVIDIA K20x/K40 GPUs, so there I doubt you'll get much performance 
increase when building from source).


If it turns out to be too difficult or time-consuming to get the build 
from source with CUDA support to work, then we can of course progress 
with sticking to the binary wheel releases for now, I'm not going to 
oppose that.



regards,

Kenneth



Best regards

Jakob



On 4 Jan 2018, at 15:29, Kenneth Hoste  wrote:

Dear Jakob,

On 04/01/2018 10:23, Jakob Schiøtz wrote:

Hi,

I made a TensorFlow easyconfig a while ago depending on Python with the foss 
toolchain; and including a variant with GPU support (PR 4904).  The latter has 
not yet been merged, probably because it is annoying to have something that can 
only build on a machine with a GPU (it fails the sanity check otherwise, as 
TensorFlow with GPU support cannot load on a machine without it).

Not being able to test this on a non-GPU system is a bit unfortunate, but 
that's not a reason that it hasn't been merged yet, that's mostly due to a lack 
of time from my side to get back to it...


Since I made that PR, two newer releases of TensorFlow have appeared (1.3 and 
1.4).   There are easyconfigs for 1.3 with the Intel toolchain.  I am 
considering making easyconfigs for TensorFlow 1.4 with Python-3.6.3-foss-2017b 
(both with and without GPU support), but first I would like to know if anybody 
else is doing this - it is my impression that somebody who actually knows what 
they are doing may be working on TensorFlow. :-)

I have spent quite a bit of time puzzling together an easyblock that supports 
building TensorFlow from source, see [1].

It already works for non-GPU installations (see [2] for example), but it's not 
entirely finished yet because:

* building from source with CUDA support does not work yet, the build fails 
with strange Bazel errors...

* there are some issues when the TensorFlow easyblock is used together with 
--use-ccache and the Intel compilers;
   because two compiler wrappers are used, they end up calling each other resulting in a 
"fork bomb" style situation...

I would really like to get it finished and have easyconfigs available for 
TensorFlow 1.4 and newer where we properly build TensorFlow from source rather 
than using the binary wheels...

Are you up for giving it a try, and maybe helping out with the problems 
mentioned above?


regards,

Kenneth


[1] https://github.com/easybuilders/easybuild-easyblocks/pull/1287
[2] https://github.com/easybuilders/easybuild-easyconfigs/pull/5499


Best regards

Jakob

--
Jakob Schiøtz, professor, Ph.D.
Department of Physics
Technical University of Denmark
DK-2800 Kongens Lyngby, Denmark
http://www.fysik.dtu.dk/~schiotz/











Re: [easybuild] TensorFlow with GPU support.

2018-01-05 Thread Kenneth Hoste

Hi Jakob,

On 05/01/2018 13:19, Jakob Schiøtz wrote:

Hi Kenneth,

Is it possible that you forgot to check in the patches 
TensorFlow-1.4.0_swig-env.patch and TensorFlow-1.4.0_no-enum34.patch in your 
PR?  Attempting to build TensorFlow fails because it cannot find these.


The patch files are available from 
https://github.com/easybuilders/easybuild-easyconfigs/pull/5318 (as 
mentioned in the description of the PR).



regards,

Kenneth


Best regards

Jakob





On 4 Jan 2018, at 16:37, Jakob Schiøtz  wrote:

Dear Kenneth, Pablo and Maxime,

Thanks for your feedback.  Yes, I will try to see if I can build from source, 
but I will focus on the foss toolchain since we use that one for our Python 
here (we do not have the Intel MPI license, and the iomkl toolchain could not 
build Python last time I tried).

I assume the reason for building from source is to ensure consistent library 
versions etc.  If that proves very difficult, could we perhaps in the interim 
have builds (with a -bin suffix?) using the prebuilt wheels?

Best regards

Jakob



On 4 Jan 2018, at 15:29, Kenneth Hoste  wrote:

Dear Jakob,

On 04/01/2018 10:23, Jakob Schiøtz wrote:

Hi,

I made a TensorFlow easyconfig a while ago depending on Python with the foss 
toolchain; and including a variant with GPU support (PR 4904).  The latter has 
not yet been merged, probably because it is annoying to have something that can 
only build on a machine with a GPU (it fails the sanity check otherwise, as 
TensorFlow with GPU support cannot load on a machine without it).

Not being able to test this on a non-GPU system is a bit unfortunate, but 
that's not a reason that it hasn't been merged yet, that's mostly due to a lack 
of time from my side to get back to it...


Since I made that PR, two newer releases of TensorFlow have appeared (1.3 and 
1.4).   There are easyconfigs for 1.3 with the Intel toolchain.  I am 
considering making easyconfigs for TensorFlow 1.4 with Python-3.6.3-foss-2017b 
(both with and without GPU support), but first I would like to know if anybody 
else is doing this - it is my impression that somebody who actually knows what 
they are doing may be working on TensorFlow. :-)

I have spent quite a bit of time puzzling together an easyblock that supports 
building TensorFlow from source, see [1].

It already works for non-GPU installations (see [2] for example), but it's not 
entirely finished yet because:

* building from source with CUDA support does not work yet, the build fails 
with strange Bazel errors...

* there are some issues when the TensorFlow easyblock is used together with 
--use-ccache and the Intel compilers;
  because two compiler wrappers are used, they end up calling each other resulting in a 
"fork bomb" style situation...

I would really like to get it finished and have easyconfigs available for 
TensorFlow 1.4 and newer where we properly build TensorFlow from source rather 
than using the binary wheels...

Are you up for giving it a try, and maybe helping out with the problems 
mentioned above?


regards,

Kenneth


[1] https://github.com/easybuilders/easybuild-easyblocks/pull/1287
[2] https://github.com/easybuilders/easybuild-easyconfigs/pull/5499


Best regards

Jakob

--
Jakob Schiøtz, professor, Ph.D.
Department of Physics
Technical University of Denmark
DK-2800 Kongens Lyngby, Denmark
http://www.fysik.dtu.dk/~schiotz/











Re: [easybuild] TensorFlow with GPU support.

2018-01-05 Thread Jakob Schiøtz
Hi Kenneth,

Is it possible that you forgot to check in the patches 
TensorFlow-1.4.0_swig-env.patch and TensorFlow-1.4.0_no-enum34.patch in your 
PR?  Attempting to build TensorFlow fails because it cannot find these.

Best regards

Jakob




> On 4 Jan 2018, at 16:37, Jakob Schiøtz  wrote:
> 
> Dear Kenneth, Pablo and Maxime,
> 
> Thanks for your feedback.  Yes, I will try to see if I can build from source, 
> but I will focus on the foss toolchain since we use that one for our Python 
> here (we do not have the Intel MPI license, and the iomkl toolchain could not 
> build Python last time I tried).
> 
> I assume the reason for building from source is to ensure consistent library 
> versions etc.  If that proves very difficult, could we perhaps in the interim 
> have builds (with a -bin suffix?) using the prebuilt wheels?
> 
> Best regards
> 
> Jakob
> 
> 
>> On 4 Jan 2018, at 15:29, Kenneth Hoste  wrote:
>> 
>> Dear Jakob,
>> 
>> On 04/01/2018 10:23, Jakob Schiøtz wrote:
>>> Hi,
>>> 
>>> I made a TensorFlow easyconfig a while ago depending on Python with the 
>>> foss toolchain; and including a variant with GPU support (PR 4904).  The 
>>> latter has not yet been merged, probably because it is annoying to have 
>>> something that can only build on a machine with a GPU (it fails the sanity 
>>> check otherwise, as TensorFlow with GPU support cannot load on a machine 
>>> without it).
>> 
>> Not being able to test this on a non-GPU system is a bit unfortunate, but 
>> that's not a reason that it hasn't been merged yet, that's mostly due to a 
>> lack of time from my side to get back to it...
>> 
>>> Since I made that PR, two newer releases of TensorFlow have appeared (1.3 
>>> and 1.4).   There are easyconfigs for 1.3 with the Intel toolchain.  I am 
>>> considering making easyconfigs for TensorFlow 1.4 with 
>>> Python-3.6.3-foss-2017b (both with and without GPU support), but first I 
>>> would like to know if anybody else is doing this - it is my impression that 
>>> somebody who actually knows what they are doing may be working on 
>>> TensorFlow. :-)
>> 
>> I have spent quite a bit of time puzzling together an easyblock that 
>> supports building TensorFlow from source, see [1].
>> 
>> It already works for non-GPU installations (see [2] for example), but it's 
>> not entirely finished yet because:
>> 
>> * building from source with CUDA support does not work yet, the build fails 
>> with strange Bazel errors...
>> 
>> * there are some issues when the TensorFlow easyblock is used together with 
>> --use-ccache and the Intel compilers;
>>  because two compiler wrappers are used, they end up calling each other 
>> resulting in a "fork bomb" style situation...
>> 
>> I would really like to get it finished and have easyconfigs available for 
>> TensorFlow 1.4 and newer where we properly build TensorFlow from source 
>> rather than using the binary wheels...
>> 
>> Are you up for giving it a try, and maybe helping out with the problems 
>> mentioned above?
>> 
>> 
>> regards,
>> 
>> Kenneth
>> 
>> 
>> [1] https://github.com/easybuilders/easybuild-easyblocks/pull/1287
>> [2] https://github.com/easybuilders/easybuild-easyconfigs/pull/5499
>> 
>>> 
>>> Best regards
>>> 
>>> Jakob
>>> 
>>> --
>>> Jakob Schiøtz, professor, Ph.D.
>>> Department of Physics
>>> Technical University of Denmark
>>> DK-2800 Kongens Lyngby, Denmark
>>> http://www.fysik.dtu.dk/~schiotz/
>>> 
>>> 
>>> 
>> 
> 
> --
> Jakob Schiøtz, professor, Ph.D.
> Department of Physics
> Technical University of Denmark
> DK-2800 Kongens Lyngby, Denmark
> http://www.fysik.dtu.dk/~schiotz/
> 
> 
> 

--
Jakob Schiøtz, professor, Ph.D.
Department of Physics
Technical University of Denmark
DK-2800 Kongens Lyngby, Denmark
http://www.fysik.dtu.dk/~schiotz/





Re: [easybuild] TensorFlow with GPU support.

2018-01-04 Thread Maxime Boissonneault

On 18-01-04 04:23, Jakob Schiøtz wrote:

> Hi,
>
> I made a TensorFlow easyconfig a while ago depending on Python with the foss 
> toolchain; and including a variant with GPU support (PR 4904).  The latter has 
> not yet been merged, probably because it is annoying to have something that can 
> only build on a machine with a GPU (it fails the sanity check otherwise, as 
> TensorFlow with GPU support cannot load on a machine without it).
>
> Since I made that PR, two newer releases of TensorFlow have appeared (1.3 and 
> 1.4).

You're actually missing 1.5 which just came out this morning. Built with 
AVX support, Cuda 9 and cuDNN 7.


Maxime


Re: [easybuild] TensorFlow with GPU support.

2018-01-04 Thread Kenneth Hoste

Dear Jakob,

On 04/01/2018 10:23, Jakob Schiøtz wrote:

> Hi,
>
> I made a TensorFlow easyconfig a while ago depending on Python with the foss 
> toolchain; and including a variant with GPU support (PR 4904).  The latter has 
> not yet been merged, probably because it is annoying to have something that can 
> only build on a machine with a GPU (it fails the sanity check otherwise, as 
> TensorFlow with GPU support cannot load on a machine without it).


Not being able to test this on a non-GPU system is a bit unfortunate, 
but that's not a reason that it hasn't been merged yet, that's mostly 
due to a lack of time from my side to get back to it...
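
To illustrate the problem of a GPU build that cannot be imported on a GPU-less build host, here is a minimal Python sketch of probing for a GPU before deciding whether an import check makes sense. This is not how the EasyBuild easyblock handles it; the function names `has_nvidia_gpu` and `pick_sanity_commands` are hypothetical, and treating `/dev/nvidia*` device nodes or an `nvidia-smi` binary on the PATH as proof of a usable GPU is an assumption.

```python
import glob
import shutil


def has_nvidia_gpu(dev_glob="/dev/nvidia[0-9]*", which=shutil.which):
    """Heuristic (assumption): a GPU is present if NVIDIA device nodes
    exist or if nvidia-smi can be found on the PATH."""
    if glob.glob(dev_glob):
        return True
    return which("nvidia-smi") is not None


def pick_sanity_commands(gpu_build, gpu_present):
    """Return the import checks to run (hypothetical helper).

    A GPU-enabled TensorFlow cannot be loaded on a machine without a GPU,
    so the import check is skipped in that one case.
    """
    if gpu_build and not gpu_present:
        return []
    return ['python -c "import tensorflow"']
```

Calling `pick_sanity_commands(True, has_nvidia_gpu())` on a login node without a GPU would return an empty list, which trades a weaker check for the ability to build there.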



> Since I made that PR, two newer releases of TensorFlow have appeared (1.3 and 
> 1.4).   There are easyconfigs for 1.3 with the Intel toolchain.  I am 
> considering making easyconfigs for TensorFlow 1.4 with Python-3.6.3-foss-2017b 
> (both with and without GPU support), but first I would like to know if anybody 
> else is doing this - it is my impression that somebody who actually knows what 
> they are doing may be working on TensorFlow. :-)


I have spent quite a bit of time puzzling together an easyblock that 
supports building TensorFlow from source, see [1].


It already works for non-GPU installations (see [2] for example), but 
it's not entirely finished yet because:


* building from source with CUDA support does not work yet, the build 
fails with strange Bazel errors...


* there are some issues when the TensorFlow easyblock is used together 
with --use-ccache and the Intel compilers;
  because two compiler wrappers are used, they end up calling each 
other resulting in a "fork bomb" style situation...


I would really like to get it finished and have easyconfigs available 
for TensorFlow 1.4 and newer where we properly build TensorFlow from 
source rather than using the binary wheels...


Are you up for giving it a try, and maybe helping out with the problems 
mentioned above?



regards,

Kenneth


[1] https://github.com/easybuilders/easybuild-easyblocks/pull/1287
[2] https://github.com/easybuilders/easybuild-easyconfigs/pull/5499



> Best regards
>
> Jakob
>
> --
> Jakob Schiøtz, professor, Ph.D.
> Department of Physics
> Technical University of Denmark
> DK-2800 Kongens Lyngby, Denmark
> http://www.fysik.dtu.dk/~schiotz/







Re: [easybuild] TensorFlow with GPU support.

2018-01-04 Thread Pablo Escobar Lopez
Hi Jakob,

I installed TensorFlow on my cluster a few days ago by modifying your
easyconfigs.  I have just opened two PRs with the two easyconfigs I installed:

https://github.com/easybuilders/easybuild-easyconfigs/pull/5590
https://github.com/easybuilders/easybuild-easyconfigs/pull/5591

I used cuDNN 6.0 as a dependency instead of cuDNN 7.x because the provided
.whl is linked against 6.0. If you try 7.x you will get a ".so lib not found"
error.
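
A quick way to see which cuDNN version the loader can actually resolve, before importing a prebuilt wheel, is to try opening the library directly. The following is a minimal sketch, not part of the easyconfigs in the PRs above: the helper names are hypothetical, and the sonames `libcudnn.so.6` and `libcudnn.so.7` are assumed to be what cuDNN 6.x and 7.x install on Linux.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can find and open `soname`."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the library is missing from the loader search path
        # (or when one of its own dependencies cannot be resolved).
        return False
    return True


def first_loadable(candidates):
    """Return the first library name in `candidates` that loads, else None."""
    for soname in candidates:
        if can_load(soname):
            return soname
    return None
```

With the cuDNN module loaded, `first_loadable(["libcudnn.so.6", "libcudnn.so.7"])` reports which of the two assumed sonames is visible; if it does not match the version the wheel was linked against, the import is likely to fail with the error described above.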

regards,
Pablo.

2018-01-04 10:23 GMT+01:00 Jakob Schiøtz :

> Hi,
>
> I made a TensorFlow easyconfig a while ago depending on Python with the
> foss toolchain; and including a variant with GPU support (PR 4904).  The
> latter has not yet been merged, probably because it is annoying to have
> something that can only build on a machine with a GPU (it fails the sanity
> check otherwise, as TensorFlow with GPU support cannot load on a machine
> without it).
>
> Since I made that PR, two newer releases of TensorFlow have appeared (1.3
> and 1.4).   There are easyconfigs for 1.3 with the Intel toolchain.  I am
> considering making easyconfigs for TensorFlow 1.4 with
> Python-3.6.3-foss-2017b (both with and without GPU support), but first I
> would like to know if anybody else is doing this - it is my impression that
> somebody who actually knows what they are doing may be working on
> TensorFlow. :-)
>
> Best regards
>
> Jakob
>
> --
> Jakob Schiøtz, professor, Ph.D.
> Department of Physics
> Technical University of Denmark
> DK-2800 Kongens Lyngby, Denmark
> http://www.fysik.dtu.dk/~schiotz/
>
>
>
>


-- 
Pablo Escobar López
HPC systems engineer
sciCORE, University of Basel
SIB Swiss Institute of Bioinformatics
http://scicore.unibas.ch