Re: Performance regression from removing libiomp5.so

2019-12-13 Thread Chris Olivier
the follow/up PR shouldn’t affect directly since the problem was taken care
of with the first PR., it was just something i did in the course that is
probably a good idea in general.

The dual linking probably can be addressed if it is found that the
performance returns. I would suggest that when doing a performance test, be
sure that build doesn’t have both linked in because it may be that the one
it uses happens to be the first one loaded, so the performance might look
the same in that case if a dependency uses is libgomp.


On Thu, Dec 12, 2019 at 9:48 PM Lausen, Leonard 
wrote:

> That error should be fixed by Chris's work at
> https://github.com/apache/incubator-mxnet/pull/17039
>
> It is currently expected that libmxnet.so transitively requires both
> libomp.so
> and libgomp.so. If this is an issue, we need to build OpenBLAS from source
> as
> part of our build scripts, because it introduces the libgomp.so
> requirement.
>
> Are there any other known issues currently?
>
> @Sam, do we have any performance comparison with the CMake build of 1.6 (+
> #
> 17039 fix backported)?
>
> Chris also mentioned he still has a follow-up PR
>
> > I actually also did change (removed from this PR in order to keep it
> > simple) that stops forcing the creation of the engine during static init
> and
> > the fork handlers, if it is desired. This has a couple of effects
> including
> > reducing the overhead of resources and threads created simply by the
> library
> > being loaded, which then, in turn, causes a lot of thread churn (Stop(),
> > Start()) when forking. Even if hey never use mxnet, we spin up a lot of
> stuff
> > unnecessarily. It actually gets lot faster on startp when this is done.
>
> Best regards
> Leonard
>
> On Fri, 2019-12-13 at 13:34 +0800, Tao Lv wrote:
> > Hi Chris,
> >
> > From the licensing standpoint, llvm omp is indeed a choice. But
> previously
> > we noticed that building mxnet with cmake and the llvm omp under 3rdparty
> > folder will cause two runtimes linked to libmxnet.so [1]. Do you think
> > that's still a problem?
> >
> > Also with the current two build systems, linking llvm omp means we need
> > move the binary release process from make to cmake, which i think need
> more
> > discussion in the community. It's not likely we can finish that within
> the
> > schedule of 1.6.0 release.
> >
> > [1]
> >
> https://github.com/apache/incubator-mxnet/issues/11417#issuecomment-555413002
> >
> > On Thu, Dec 12, 2019 at 10:28 PM Chris Olivier 
> > wrote:
> >
> > > Hi Patric,
> > >
> > > The llvm openmp we compile (originally from same Intel source as we all
> > > know) seems to be Apache 2.0 licensed. Could we use that instead from a
> > > licensing standpoint?
> > >
> > > On Wed, Dec 11, 2019 at 10:36 PM Zhao, Patric 
> > > wrote:
> > >
> > > > Thanks, Sam.
> > > >
> > > > The root cause is from different OpenMP library. Intel OpenMP will
> > > provide
> > > > better performance as your data shown.
> > > >
> > > > Regarding release, since the license issue[1], we can't ship Intel
> OpenMP
> > > > in the binary, but the most of performance boost from MKLDNN is still
> > > > available.
> > > > I think it should be acceptable to release 1.6 with MKLDNN  + GNU
> OpenMP
> > > > for suboptimal performance.
> > > >
> > > > To achieve the best performance, user should build from source to
> enable
> > > > more advanced features like Intel MKL, Intel OpenMP, AVX512.
> > > >
> > > > Thanks,
> > > >
> > > > --Patric
> > > >
> > > > [1] https://www.apache.org/legal/resolved.html#category-x
> > > >
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Skalicky, Sam 
> > > > > Sent: Wednesday, December 11, 2019 1:36 PM
> > > > > To: dev@mxnet.incubator.apache.org
> > > > > Cc: Keshavan, Arjuna ; Harish, Nihal
> > > > > 
> > > > > Subject: Performance regression from removing libiomp5.so
> > > > >
> > > > > Hi MXNet community,
> > > > >
> > > > > I would like to bring your attention to the performance regression
> that
> > > > was
> > > > > found [1] between 1.5.1 and 1.6.0 due to removing the libiomp5.so
> > > library
> > > > > due to licensing issues. This change was made since this library
> has a
> > > > category
> > > > > x license [2] that is not compatible with the MXNet Apache
> > > > > license/distribution.
> > > > >
> > > > > We found that using OpenBLAS instead of MKL BLAS caused a
> regression
> > > > > from 1500 samples/sec to 1300 samples/sec a 13.3% regression in
> > > training
> > > > > speed for a resnet18 training benchmark on a C5.18xlarge EC2
> instance
> > > > (with
> > > > > 72 cores). Rebuilding with MKL BLAS showed an increase in
> performance
> > > to
> > > > > 1600 samples/sec in the 1.6.0 branch.
> > > > >
> > > > > Please provide your feedback on the licensing issue (are there any
> > > work-
> > > > > arounds) and the tradeoff in performance (is the benefit worth
> trying
> > > to
> > > > > include back into MXNet builds).
> > > > >
> > > > > Thanks to the efforts of th

Re: Performance regression from removing libiomp5.so

2019-12-12 Thread Lausen, Leonard
That error should be fixed by Chris's work at 
https://github.com/apache/incubator-mxnet/pull/17039

It is currently expected that libmxnet.so transitively requires both libomp.so
and libgomp.so. If this is an issue, we need to build OpenBLAS from source as
part of our build scripts, because it introduces the libgomp.so requirement.

Are there any other known issues currently?

@Sam, do we have any performance comparison with the CMake build of 1.6 (+ #
17039 fix backported)?

Chris also mentioned he still has a follow-up PR 

> I actually also did change (removed from this PR in order to keep it
> simple) that stops forcing the creation of the engine during static init and
> the fork handlers, if it is desired. This has a couple of effects including
> reducing the overhead of resources and threads created simply by the library
> being loaded, which then, in turn, causes a lot of thread churn (Stop(),
> Start()) when forking. Even if hey never use mxnet, we spin up a lot of stuff
> unnecessarily. It actually gets lot faster on startp when this is done.

Best regards
Leonard

On Fri, 2019-12-13 at 13:34 +0800, Tao Lv wrote:
> Hi Chris,
> 
> From the licensing standpoint, llvm omp is indeed a choice. But previously
> we noticed that building mxnet with cmake and the llvm omp under 3rdparty
> folder will cause two runtimes linked to libmxnet.so [1]. Do you think
> that's still a problem?
> 
> Also with the current two build systems, linking llvm omp means we need
> move the binary release process from make to cmake, which i think need more
> discussion in the community. It's not likely we can finish that within the
> schedule of 1.6.0 release.
> 
> [1]
> https://github.com/apache/incubator-mxnet/issues/11417#issuecomment-555413002
> 
> On Thu, Dec 12, 2019 at 10:28 PM Chris Olivier 
> wrote:
> 
> > Hi Patric,
> > 
> > The llvm openmp we compile (originally from same Intel source as we all
> > know) seems to be Apache 2.0 licensed. Could we use that instead from a
> > licensing standpoint?
> > 
> > On Wed, Dec 11, 2019 at 10:36 PM Zhao, Patric 
> > wrote:
> > 
> > > Thanks, Sam.
> > > 
> > > The root cause is from different OpenMP library. Intel OpenMP will
> > provide
> > > better performance as your data shown.
> > > 
> > > Regarding release, since the license issue[1], we can't ship Intel OpenMP
> > > in the binary, but the most of performance boost from MKLDNN is still
> > > available.
> > > I think it should be acceptable to release 1.6 with MKLDNN  + GNU OpenMP
> > > for suboptimal performance.
> > > 
> > > To achieve the best performance, user should build from source to enable
> > > more advanced features like Intel MKL, Intel OpenMP, AVX512.
> > > 
> > > Thanks,
> > > 
> > > --Patric
> > > 
> > > [1] https://www.apache.org/legal/resolved.html#category-x
> > > 
> > > 
> > > 
> > > > -Original Message-
> > > > From: Skalicky, Sam 
> > > > Sent: Wednesday, December 11, 2019 1:36 PM
> > > > To: dev@mxnet.incubator.apache.org
> > > > Cc: Keshavan, Arjuna ; Harish, Nihal
> > > > 
> > > > Subject: Performance regression from removing libiomp5.so
> > > > 
> > > > Hi MXNet community,
> > > > 
> > > > I would like to bring your attention to the performance regression that
> > > was
> > > > found [1] between 1.5.1 and 1.6.0 due to removing the libiomp5.so
> > library
> > > > due to licensing issues. This change was made since this library has a
> > > category
> > > > x license [2] that is not compatible with the MXNet Apache
> > > > license/distribution.
> > > > 
> > > > We found that using OpenBLAS instead of MKL BLAS caused a regression
> > > > from 1500 samples/sec to 1300 samples/sec a 13.3% regression in
> > training
> > > > speed for a resnet18 training benchmark on a C5.18xlarge EC2 instance
> > > (with
> > > > 72 cores). Rebuilding with MKL BLAS showed an increase in performance
> > to
> > > > 1600 samples/sec in the 1.6.0 branch.
> > > > 
> > > > Please provide your feedback on the licensing issue (are there any
> > work-
> > > > arounds) and the tradeoff in performance (is the benefit worth trying
> > to
> > > > include back into MXNet builds).
> > > > 
> > > > Thanks to the efforts of the following folks for working on this issue
> > > (in no
> > > > particular order):
> > > > Patric Zhao
> > > > Amol Lele
> > > > Tao Lv A
> > > > Pedro Larroy
> > > > Nihal Harish
> > > > Chai Bapat
> > > > Arjuna Keshavan
> > > > Rong Zhang
> > > > 
> > > > Thanks!
> > > > Sam
> > > > 
> > > > [1] https://github.com/apache/incubator-mxnet/issues/16891
> > > > [2] https://www.apache.org/legal/resolved.html#category-x


Re: Performance regression from removing libiomp5.so

2019-12-12 Thread Tao Lv
Hi Chris,

>From the licensing standpoint, llvm omp is indeed a choice. But previously
we noticed that building mxnet with cmake and the llvm omp under 3rdparty
folder will cause two runtimes linked to libmxnet.so [1]. Do you think
that's still a problem?

Also with the current two build systems, linking llvm omp means we need
move the binary release process from make to cmake, which i think need more
discussion in the community. It's not likely we can finish that within the
schedule of 1.6.0 release.

[1]
https://github.com/apache/incubator-mxnet/issues/11417#issuecomment-555413002

On Thu, Dec 12, 2019 at 10:28 PM Chris Olivier 
wrote:

> Hi Patric,
>
> The llvm openmp we compile (originally from same Intel source as we all
> know) seems to be Apache 2.0 licensed. Could we use that instead from a
> licensing standpoint?
>
> On Wed, Dec 11, 2019 at 10:36 PM Zhao, Patric 
> wrote:
>
> > Thanks, Sam.
> >
> > The root cause is from different OpenMP library. Intel OpenMP will
> provide
> > better performance as your data shown.
> >
> > Regarding release, since the license issue[1], we can't ship Intel OpenMP
> > in the binary, but the most of performance boost from MKLDNN is still
> > available.
> > I think it should be acceptable to release 1.6 with MKLDNN  + GNU OpenMP
> > for suboptimal performance.
> >
> > To achieve the best performance, user should build from source to enable
> > more advanced features like Intel MKL, Intel OpenMP, AVX512.
> >
> > Thanks,
> >
> > --Patric
> >
> > [1] https://www.apache.org/legal/resolved.html#category-x
> >
> >
> >
> > > -Original Message-
> > > From: Skalicky, Sam 
> > > Sent: Wednesday, December 11, 2019 1:36 PM
> > > To: dev@mxnet.incubator.apache.org
> > > Cc: Keshavan, Arjuna ; Harish, Nihal
> > > 
> > > Subject: Performance regression from removing libiomp5.so
> > >
> > > Hi MXNet community,
> > >
> > > I would like to bring your attention to the performance regression that
> > was
> > > found [1] between 1.5.1 and 1.6.0 due to removing the libiomp5.so
> library
> > > due to licensing issues. This change was made since this library has a
> > category
> > > x license [2] that is not compatible with the MXNet Apache
> > > license/distribution.
> > >
> > > We found that using OpenBLAS instead of MKL BLAS caused a regression
> > > from 1500 samples/sec to 1300 samples/sec a 13.3% regression in
> training
> > > speed for a resnet18 training benchmark on a C5.18xlarge EC2 instance
> > (with
> > > 72 cores). Rebuilding with MKL BLAS showed an increase in performance
> to
> > > 1600 samples/sec in the 1.6.0 branch.
> > >
> > > Please provide your feedback on the licensing issue (are there any
> work-
> > > arounds) and the tradeoff in performance (is the benefit worth trying
> to
> > > include back into MXNet builds).
> > >
> > > Thanks to the efforts of the following folks for working on this issue
> > (in no
> > > particular order):
> > > Patric Zhao
> > > Amol Lele
> > > Tao Lv A
> > > Pedro Larroy
> > > Nihal Harish
> > > Chai Bapat
> > > Arjuna Keshavan
> > > Rong Zhang
> > >
> > > Thanks!
> > > Sam
> > >
> > > [1] https://github.com/apache/incubator-mxnet/issues/16891
> > > [2] https://www.apache.org/legal/resolved.html#category-x
> >
>


Re: Performance regression from removing libiomp5.so

2019-12-12 Thread Chris Olivier
Hi Patric,

The llvm openmp we compile (originally from same Intel source as we all
know) seems to be Apache 2.0 licensed. Could we use that instead from a
licensing standpoint?

On Wed, Dec 11, 2019 at 10:36 PM Zhao, Patric  wrote:

> Thanks, Sam.
>
> The root cause is from different OpenMP library. Intel OpenMP will provide
> better performance as your data shown.
>
> Regarding release, since the license issue[1], we can't ship Intel OpenMP
> in the binary, but the most of performance boost from MKLDNN is still
> available.
> I think it should be acceptable to release 1.6 with MKLDNN  + GNU OpenMP
> for suboptimal performance.
>
> To achieve the best performance, user should build from source to enable
> more advanced features like Intel MKL, Intel OpenMP, AVX512.
>
> Thanks,
>
> --Patric
>
> [1] https://www.apache.org/legal/resolved.html#category-x
>
>
>
> > -Original Message-
> > From: Skalicky, Sam 
> > Sent: Wednesday, December 11, 2019 1:36 PM
> > To: dev@mxnet.incubator.apache.org
> > Cc: Keshavan, Arjuna ; Harish, Nihal
> > 
> > Subject: Performance regression from removing libiomp5.so
> >
> > Hi MXNet community,
> >
> > I would like to bring your attention to the performance regression that
> was
> > found [1] between 1.5.1 and 1.6.0 due to removing the libiomp5.so library
> > due to licensing issues. This change was made since this library has a
> category
> > x license [2] that is not compatible with the MXNet Apache
> > license/distribution.
> >
> > We found that using OpenBLAS instead of MKL BLAS caused a regression
> > from 1500 samples/sec to 1300 samples/sec a 13.3% regression in training
> > speed for a resnet18 training benchmark on a C5.18xlarge EC2 instance
> (with
> > 72 cores). Rebuilding with MKL BLAS showed an increase in performance to
> > 1600 samples/sec in the 1.6.0 branch.
> >
> > Please provide your feedback on the licensing issue (are there any work-
> > arounds) and the tradeoff in performance (is the benefit worth trying to
> > include back into MXNet builds).
> >
> > Thanks to the efforts of the following folks for working on this issue
> (in no
> > particular order):
> > Patric Zhao
> > Amol Lele
> > Tao Lv A
> > Pedro Larroy
> > Nihal Harish
> > Chai Bapat
> > Arjuna Keshavan
> > Rong Zhang
> >
> > Thanks!
> > Sam
> >
> > [1] https://github.com/apache/incubator-mxnet/issues/16891
> > [2] https://www.apache.org/legal/resolved.html#category-x
>


RE: Performance regression from removing libiomp5.so

2019-12-11 Thread Zhao, Patric
Thanks, Sam.

The root cause is from different OpenMP library. Intel OpenMP will provide 
better performance as your data shown.

Regarding release, since the license issue[1], we can't ship Intel OpenMP in 
the binary, but the most of performance boost from MKLDNN is still available.
I think it should be acceptable to release 1.6 with MKLDNN  + GNU OpenMP for 
suboptimal performance. 

To achieve the best performance, user should build from source to enable more 
advanced features like Intel MKL, Intel OpenMP, AVX512.

Thanks,

--Patric

[1] https://www.apache.org/legal/resolved.html#category-x



> -Original Message-
> From: Skalicky, Sam 
> Sent: Wednesday, December 11, 2019 1:36 PM
> To: dev@mxnet.incubator.apache.org
> Cc: Keshavan, Arjuna ; Harish, Nihal
> 
> Subject: Performance regression from removing libiomp5.so
> 
> Hi MXNet community,
> 
> I would like to bring your attention to the performance regression that was
> found [1] between 1.5.1 and 1.6.0 due to removing the libiomp5.so library
> due to licensing issues. This change was made since this library has a 
> category
> x license [2] that is not compatible with the MXNet Apache
> license/distribution.
> 
> We found that using OpenBLAS instead of MKL BLAS caused a regression
> from 1500 samples/sec to 1300 samples/sec a 13.3% regression in training
> speed for a resnet18 training benchmark on a C5.18xlarge EC2 instance (with
> 72 cores). Rebuilding with MKL BLAS showed an increase in performance to
> 1600 samples/sec in the 1.6.0 branch.
> 
> Please provide your feedback on the licensing issue (are there any work-
> arounds) and the tradeoff in performance (is the benefit worth trying to
> include back into MXNet builds).
> 
> Thanks to the efforts of the following folks for working on this issue (in no
> particular order):
> Patric Zhao
> Amol Lele
> Tao Lv A
> Pedro Larroy
> Nihal Harish
> Chai Bapat
> Arjuna Keshavan
> Rong Zhang
> 
> Thanks!
> Sam
> 
> [1] https://github.com/apache/incubator-mxnet/issues/16891
> [2] https://www.apache.org/legal/resolved.html#category-x