Re: [VOTE] Remove conflicting OpenMP from CMake builds

2019-06-17 Thread Pedro Larroy
Regarding your paste of ldd, not sure what's your point. This is the
output with the patch from PR applied, the openmp version used is the
one provided by MKL:


Version of the PR that removes openmp from 3rdparty comes from MKL:

linux-vdso.so.1 (0x7ffc4c993000)
libmkldnn.so.0 =>
/home/piotr/mxnet_openmp/build/3rdparty/mkldnn/src/libmkldnn.so.0
(0x7f7874519000)
libopenblas.so.0 => /usr/lib/x86_64-linux-gnu/libopenblas.so.0
(0x7f7872273000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x7f787206b000)
libjemalloc.so.1 => /usr/lib/x86_64-linux-gnu/libjemalloc.so.1
(0x7f7871e35000)
libmklml_intel.so =>
/home/piotr/mxnet_openmp/build/mklml/mklml_lnx_2019.0.5.20190502/lib/libmklml_intel.so
(0x7f786a3a3000)
libiomp5.so =>
/home/piotr/mxnet_openmp/build/mklml/mklml_lnx_2019.0.5.20190502/lib/libiomp5.so
(0x7f7869fae000)



In master, ldd shows that openmp is coming from 3rdparty:

libomp.so =>
/home/piotr/mxnet_master/build/3rdparty/openmp/runtime/src/libomp.so
(0x7f1ea2353000)



Could you please explain your argument on what's the recommended way
and what's the divergence?  As explained in the PR there seems to be
no performance advantage on adding our own version of openmp.

Thanks.

On Mon, Jun 17, 2019 at 12:55 PM Pedro Larroy
 wrote:
>
> Thanks for your answer Chris. I'm not militant in regards to one
> option or another nor belong of any club that will accept me as
> member, but I think we should drive this to conclusion and understand
> why we keep it or if we should remove it and use the platform provided
> version. There are open issues linked on that PR that are causing
> problems, that remain unaddressed it has been said that linking with
> two openmp versions causes undefined behaviour, I had the experience
> of having MXNet hang when using OpenMP running the tests in plain
> ubuntu CPU, I know we are reentering OpenMP initialization when
> creating threads in the engine and getting an assertion, etc, so I
> have serious concerns when running MXNet with OpenMP enabled in
> production.
>
> I think you are the expert in OpenMP and HPC and it would be good to
> have a documented and explainable outcome of one option or another
> either in the mailing list or in the wiki.
>
> I also feel bad for contributors spending time measuring and
> benchmarking without conclusion, I feel they have given up working on
> this issue since the PR keeps getting closed and is not moved forward.
>
> Please help us understand whats the best way forward with this issue
> and not just close the PR without explanations.
>
> Pedro.
>
> On Mon, Jun 17, 2019 at 11:15 AM Chris Olivier  wrote:
> >
> > I am curious why you're being so militant troll about this.  libomp is used
> > in every MKL build (download mxnet-mkl yourself and see).  I don't see any
> > convincing reason to change it and so far as I can tell, no real issue has
> > been proven to be related.  Anyway, I am reluctant to feed trolls any more
> > than this, so I don't really have much else to add.
> >
> > ldd /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so
> > linux-vdso.so.1 (0x7ffc989cf000)
> > libmklml_intel.so =>
> > /usr/local/lib/python3.6/dist-packages/mxnet/libmklml_intel.so
> > (0x7f0afb7c1000)
> >* libiomp5.so =>
> > /usr/local/lib/python3.6/dist-packages/mxnet/libiomp5.so
> > (0x7f0afb3e5000)*
> > librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x7f0afb1dd000)
> > libmkldnn.so.0 =>
> > /usr/local/lib/python3.6/dist-packages/mxnet/libmkldnn.so.0
> > (0x7f0afa7ba000)
> > libgfortran.so.3 =>
> > /usr/local/lib/python3.6/dist-packages/mxnet/libgfortran.so.3
> > (0x7f0afa493000)
> > libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x7f0afa28f000)
> > libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> > (0x7f0af9f06000)
> > libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x7f0af9b68000)
> > libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1
> > (0x7f0af995)
> > libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
> > (0x7f0af9731000)
> > libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7f0af934)
> > /lib64/ld-linux-x86-64.so.2 (0x7f0b073f4000)
> > libquadmath.so.0 =>
> > /usr/local/lib/python3.6/dist-packages/mxnet/libquadmath.so.0
> > (0x7f0af910)
> >
> >
> > On Mon, Jun 17, 2019 at 10:58 AM Pedro Larroy 
> > wrote:
> >
> > > I had read the "Apache Voting Process" guide here:
> > > https://www.apache.org/foundation/voting.html  and I thought code
> > > changes could be discussed on the mailing list in cases where the PR
> > > is stuck or there's no response for a long time, also I understood
> > > that -1's have to be justified.  Could you, or someone more familiar
> > > which the Apache way enlighten on how to move this issue forward in a
> > > constructive way?
> 

Re: [VOTE] Remove conflicting OpenMP from CMake builds

2019-06-17 Thread Pedro Larroy
Thanks for your answer Chris. I'm not militant in regards to one
option or another nor belong of any club that will accept me as
member, but I think we should drive this to conclusion and understand
why we keep it or if we should remove it and use the platform provided
version. There are open issues linked on that PR that are causing
problems, that remain unaddressed it has been said that linking with
two openmp versions causes undefined behaviour, I had the experience
of having MXNet hang when using OpenMP running the tests in plain
ubuntu CPU, I know we are reentering OpenMP initialization when
creating threads in the engine and getting an assertion, etc, so I
have serious concerns when running MXNet with OpenMP enabled in
production.

I think you are the expert in OpenMP and HPC and it would be good to
have a documented and explainable outcome of one option or another
either in the mailing list or in the wiki.

I also feel bad for contributors spending time measuring and
benchmarking without conclusion, I feel they have given up working on
this issue since the PR keeps getting closed and is not moved forward.

Please help us understand whats the best way forward with this issue
and not just close the PR without explanations.

Pedro.

On Mon, Jun 17, 2019 at 11:15 AM Chris Olivier  wrote:
>
> I am curious why you're being so militant troll about this.  libomp is used
> in every MKL build (download mxnet-mkl yourself and see).  I don't see any
> convincing reason to change it and so far as I can tell, no real issue has
> been proven to be related.  Anyway, I am reluctant to feed trolls any more
> than this, so I don't really have much else to add.
>
> ldd /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so
> linux-vdso.so.1 (0x7ffc989cf000)
> libmklml_intel.so =>
> /usr/local/lib/python3.6/dist-packages/mxnet/libmklml_intel.so
> (0x7f0afb7c1000)
>* libiomp5.so =>
> /usr/local/lib/python3.6/dist-packages/mxnet/libiomp5.so
> (0x7f0afb3e5000)*
> librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x7f0afb1dd000)
> libmkldnn.so.0 =>
> /usr/local/lib/python3.6/dist-packages/mxnet/libmkldnn.so.0
> (0x7f0afa7ba000)
> libgfortran.so.3 =>
> /usr/local/lib/python3.6/dist-packages/mxnet/libgfortran.so.3
> (0x7f0afa493000)
> libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x7f0afa28f000)
> libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> (0x7f0af9f06000)
> libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x7f0af9b68000)
> libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1
> (0x7f0af995)
> libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
> (0x7f0af9731000)
> libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7f0af934)
> /lib64/ld-linux-x86-64.so.2 (0x7f0b073f4000)
> libquadmath.so.0 =>
> /usr/local/lib/python3.6/dist-packages/mxnet/libquadmath.so.0
> (0x7f0af910)
>
>
> On Mon, Jun 17, 2019 at 10:58 AM Pedro Larroy 
> wrote:
>
> > I had read the "Apache Voting Process" guide here:
> > https://www.apache.org/foundation/voting.html  and I thought code
> > changes could be discussed on the mailing list in cases where the PR
> > is stuck or there's no response for a long time, also I understood
> > that -1's have to be justified.  Could you, or someone more familiar
> > which the Apache way enlighten on how to move this issue forward in a
> > constructive way?
> >
> > Thanks a lot.
> >
> > Pedro.
> >
> > On Mon, Jun 17, 2019 at 10:46 AM Pedro Larroy
> >  wrote:
> > >
> > > Thanks.
> > >
> > > How do we go on advancing this PR then? all the questions have been
> > > answered, performance numbers provided and more. Until how long can a
> > > veto stand? Also without replies to contributors.
> > >
> > > Pedro.
> > >
> > > On Fri, Jun 14, 2019 at 5:44 PM Sheng Zha  wrote:
> > > >
> > > > This vote is invalid as the original PR has been vetoed by a
> > committer. A vote on dev@ won't help you circumvent a veto.
> > > >
> > > > -sz
> > > >
> > > > On 2019/06/14 23:59:33, Pedro Larroy 
> > wrote:
> > > > > Hi all
> > > > >
> > > > > This is a 5-day vote to act and wrap up an outstanding PR that
> > removes
> > > > > linkage with multiple OpenMP from 3rdparty and uses the system
> > > > > provided one which might resolve a number of difficult to debug
> > issues
> > > > > and possible undefined behaviour.
> > > > >
> > > > > https://github.com/apache/incubator-mxnet/pull/12160
> > > > >
> > > > > See the comments in the thread for more details but in short, linking
> > > > > with multiple openmp versions seems to lead to undefined behaviour,
> > > > > plus it would simplify not having to deal with a custom openmp
> > version
> > > > > and rely on the platform provided one.
> > > > >
> > > > > This is expected to simplify builds and address a number of problems.
> > > > > Seems it doesn't cause any performance degradation, (the Gluon 

Re: [VOTE] Remove conflicting OpenMP from CMake builds

2019-06-17 Thread Chris Olivier
I am curious why you're being so militant troll about this.  libomp is used
in every MKL build (download mxnet-mkl yourself and see).  I don't see any
convincing reason to change it and so far as I can tell, no real issue has
been proven to be related.  Anyway, I am reluctant to feed trolls any more
than this, so I don't really have much else to add.

ldd /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so
linux-vdso.so.1 (0x7ffc989cf000)
libmklml_intel.so =>
/usr/local/lib/python3.6/dist-packages/mxnet/libmklml_intel.so
(0x7f0afb7c1000)
   * libiomp5.so =>
/usr/local/lib/python3.6/dist-packages/mxnet/libiomp5.so
(0x7f0afb3e5000)*
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x7f0afb1dd000)
libmkldnn.so.0 =>
/usr/local/lib/python3.6/dist-packages/mxnet/libmkldnn.so.0
(0x7f0afa7ba000)
libgfortran.so.3 =>
/usr/local/lib/python3.6/dist-packages/mxnet/libgfortran.so.3
(0x7f0afa493000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x7f0afa28f000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6
(0x7f0af9f06000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x7f0af9b68000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1
(0x7f0af995)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
(0x7f0af9731000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7f0af934)
/lib64/ld-linux-x86-64.so.2 (0x7f0b073f4000)
libquadmath.so.0 =>
/usr/local/lib/python3.6/dist-packages/mxnet/libquadmath.so.0
(0x7f0af910)


On Mon, Jun 17, 2019 at 10:58 AM Pedro Larroy 
wrote:

> I had read the "Apache Voting Process" guide here:
> https://www.apache.org/foundation/voting.html  and I thought code
> changes could be discussed on the mailing list in cases where the PR
> is stuck or there's no response for a long time, also I understood
> that -1's have to be justified.  Could you, or someone more familiar
> which the Apache way enlighten on how to move this issue forward in a
> constructive way?
>
> Thanks a lot.
>
> Pedro.
>
> On Mon, Jun 17, 2019 at 10:46 AM Pedro Larroy
>  wrote:
> >
> > Thanks.
> >
> > How do we go on advancing this PR then? all the questions have been
> > answered, performance numbers provided and more. Until how long can a
> > veto stand? Also without replies to contributors.
> >
> > Pedro.
> >
> > On Fri, Jun 14, 2019 at 5:44 PM Sheng Zha  wrote:
> > >
> > > This vote is invalid as the original PR has been vetoed by a
> committer. A vote on dev@ won't help you circumvent a veto.
> > >
> > > -sz
> > >
> > > On 2019/06/14 23:59:33, Pedro Larroy 
> wrote:
> > > > Hi all
> > > >
> > > > This is a 5-day vote to act and wrap up an outstanding PR that
> removes
> > > > linkage with multiple OpenMP from 3rdparty and uses the system
> > > > provided one which might resolve a number of difficult to debug
> issues
> > > > and possible undefined behaviour.
> > > >
> > > > https://github.com/apache/incubator-mxnet/pull/12160
> > > >
> > > > See the comments in the thread for more details but in short, linking
> > > > with multiple openmp versions seems to lead to undefined behaviour,
> > > > plus it would simplify not having to deal with a custom openmp
> version
> > > > and rely on the platform provided one.
> > > >
> > > > This is expected to simplify builds and address a number of problems.
> > > > Seems it doesn't cause any performance degradation, (the Gluon tests
> > > > run almost 4x faster in my 64 core machine).
> > > >
> > > > There has been in depth study of performance implications by
> > > > contributors like Stanislav Tsukrov and Anton Chernov.  All the
> > > > concerns and comments from the reviewers have been addressed and we
> > > > can't keep asking open ended questions to block PRs. Reviewers are
> > > > expected to be proactive and responsive to contributors so we keep
> > > > encouraging active contributors.
> > > >
> > > > please vote to merge this PR accordingly:
> > > >
> > > > +1 = approve
> > > > +0 = no opinion
> > > > -1 = disapprove (provide reason)
> > > >
> > > > If we observe regressions reported by any internal performance
> systems
> > > > or by contributors the PR can be reverted easily. So it's not a one
> > > > way door. But it will be useful to try this in master for a while.
> > > >
> > > > Thank you.
> > > >
> > > > Pedro.
> > > >
>


Re: [VOTE] Remove conflicting OpenMP from CMake builds

2019-06-17 Thread Pedro Larroy
I had read the "Apache Voting Process" guide here:
https://www.apache.org/foundation/voting.html  and I thought code
changes could be discussed on the mailing list in cases where the PR
is stuck or there's no response for a long time, also I understood
that -1's have to be justified.  Could you, or someone more familiar
which the Apache way enlighten on how to move this issue forward in a
constructive way?

Thanks a lot.

Pedro.

On Mon, Jun 17, 2019 at 10:46 AM Pedro Larroy
 wrote:
>
> Thanks.
>
> How do we go on advancing this PR then? all the questions have been
> answered, performance numbers provided and more. Until how long can a
> veto stand? Also without replies to contributors.
>
> Pedro.
>
> On Fri, Jun 14, 2019 at 5:44 PM Sheng Zha  wrote:
> >
> > This vote is invalid as the original PR has been vetoed by a committer. A 
> > vote on dev@ won't help you circumvent a veto.
> >
> > -sz
> >
> > On 2019/06/14 23:59:33, Pedro Larroy  wrote:
> > > Hi all
> > >
> > > This is a 5-day vote to act and wrap up an outstanding PR that removes
> > > linkage with multiple OpenMP from 3rdparty and uses the system
> > > provided one which might resolve a number of difficult to debug issues
> > > and possible undefined behaviour.
> > >
> > > https://github.com/apache/incubator-mxnet/pull/12160
> > >
> > > See the comments in the thread for more details but in short, linking
> > > with multiple openmp versions seems to lead to undefined behaviour,
> > > plus it would simplify not having to deal with a custom openmp version
> > > and rely on the platform provided one.
> > >
> > > This is expected to simplify builds and address a number of problems.
> > > Seems it doesn't cause any performance degradation, (the Gluon tests
> > > run almost 4x faster in my 64 core machine).
> > >
> > > There has been in depth study of performance implications by
> > > contributors like Stanislav Tsukrov and Anton Chernov.  All the
> > > concerns and comments from the reviewers have been addressed and we
> > > can't keep asking open ended questions to block PRs. Reviewers are
> > > expected to be proactive and responsive to contributors so we keep
> > > encouraging active contributors.
> > >
> > > please vote to merge this PR accordingly:
> > >
> > > +1 = approve
> > > +0 = no opinion
> > > -1 = disapprove (provide reason)
> > >
> > > If we observe regressions reported by any internal performance systems
> > > or by contributors the PR can be reverted easily. So it's not a one
> > > way door. But it will be useful to try this in master for a while.
> > >
> > > Thank you.
> > >
> > > Pedro.
> > >


Re: [VOTE] Remove conflicting OpenMP from CMake builds

2019-06-17 Thread Sheng Zha
Vetos stand until and unless withdrawn by their casters [1]. The only path 
forward would be to convince the caster to withdraw the veto.

-sz

[1] https://www.apache.org/foundation/voting.html#Veto

On 2019/06/17 17:46:59, Pedro Larroy  wrote: 
> Thanks.
> 
> How do we go on advancing this PR then? all the questions have been
> answered, performance numbers provided and more. Until how long can a
> veto stand? Also without replies to contributors.
> 
> Pedro.
> 
> On Fri, Jun 14, 2019 at 5:44 PM Sheng Zha  wrote:
> >
> > This vote is invalid as the original PR has been vetoed by a committer. A 
> > vote on dev@ won't help you circumvent a veto.
> >
> > -sz
> >
> > On 2019/06/14 23:59:33, Pedro Larroy  wrote:
> > > Hi all
> > >
> > > This is a 5-day vote to act and wrap up an outstanding PR that removes
> > > linkage with multiple OpenMP from 3rdparty and uses the system
> > > provided one which might resolve a number of difficult to debug issues
> > > and possible undefined behaviour.
> > >
> > > https://github.com/apache/incubator-mxnet/pull/12160
> > >
> > > See the comments in the thread for more details but in short, linking
> > > with multiple openmp versions seems to lead to undefined behaviour,
> > > plus it would simplify not having to deal with a custom openmp version
> > > and rely on the platform provided one.
> > >
> > > This is expected to simplify builds and address a number of problems.
> > > Seems it doesn't cause any performance degradation, (the Gluon tests
> > > run almost 4x faster in my 64 core machine).
> > >
> > > There has been in depth study of performance implications by
> > > contributors like Stanislav Tsukrov and Anton Chernov.  All the
> > > concerns and comments from the reviewers have been addressed and we
> > > can't keep asking open ended questions to block PRs. Reviewers are
> > > expected to be proactive and responsive to contributors so we keep
> > > encouraging active contributors.
> > >
> > > please vote to merge this PR accordingly:
> > >
> > > +1 = approve
> > > +0 = no opinion
> > > -1 = disapprove (provide reason)
> > >
> > > If we observe regressions reported by any internal performance systems
> > > or by contributors the PR can be reverted easily. So it's not a one
> > > way door. But it will be useful to try this in master for a while.
> > >
> > > Thank you.
> > >
> > > Pedro.
> > >
> 


Re: [VOTE] Remove conflicting OpenMP from CMake builds

2019-06-17 Thread Pedro Larroy
Thanks.

How do we go on advancing this PR then? all the questions have been
answered, performance numbers provided and more. Until how long can a
veto stand? Also without replies to contributors.

Pedro.

On Fri, Jun 14, 2019 at 5:44 PM Sheng Zha  wrote:
>
> This vote is invalid as the original PR has been vetoed by a committer. A 
> vote on dev@ won't help you circumvent a veto.
>
> -sz
>
> On 2019/06/14 23:59:33, Pedro Larroy  wrote:
> > Hi all
> >
> > This is a 5-day vote to act and wrap up an outstanding PR that removes
> > linkage with multiple OpenMP from 3rdparty and uses the system
> > provided one which might resolve a number of difficult to debug issues
> > and possible undefined behaviour.
> >
> > https://github.com/apache/incubator-mxnet/pull/12160
> >
> > See the comments in the thread for more details but in short, linking
> > with multiple openmp versions seems to lead to undefined behaviour,
> > plus it would simplify not having to deal with a custom openmp version
> > and rely on the platform provided one.
> >
> > This is expected to simplify builds and address a number of problems.
> > Seems it doesn't cause any performance degradation, (the Gluon tests
> > run almost 4x faster in my 64 core machine).
> >
> > There has been in depth study of performance implications by
> > contributors like Stanislav Tsukrov and Anton Chernov.  All the
> > concerns and comments from the reviewers have been addressed and we
> > can't keep asking open ended questions to block PRs. Reviewers are
> > expected to be proactive and responsive to contributors so we keep
> > encouraging active contributors.
> >
> > please vote to merge this PR accordingly:
> >
> > +1 = approve
> > +0 = no opinion
> > -1 = disapprove (provide reason)
> >
> > If we observe regressions reported by any internal performance systems
> > or by contributors the PR can be reverted easily. So it's not a one
> > way door. But it will be useful to try this in master for a while.
> >
> > Thank you.
> >
> > Pedro.
> >