Re: MKLDNN performance in CI

2018-11-23 Thread Lv, Tao A
It’s great to hear that it works. Thanks for your effort on this, Marco.

-Tao

Sent from my iPhone

Re: MKLDNN performance in CI

2018-11-23 Thread Marco de Abreu
Great news, it was the debug flag:

[success] 16.54% test_gluon_model_zoo.test_models: 320.0898s
[success] 6.64% test_random.test_shuffle: 128.5430s
[success] 5.67% test_sparse_operator.test_elemwise_binary_ops: 109.6650s
[success] 4.41% test_metric_perf.test_metric_performance: 85.3215s
[success] 4.28% test_random.test_negative_binomial_generator: 82.8046s
[success] 3.91% test_operator.test_pick: 75.7241s
[success] 3.34% test_operator.test_psroipooling: 64.7008s
[success] 3.30% test_random.test_poisson_generator: 63.9218s
[success] 3.24% test_operator.test_broadcast_binary_op: 62.7417s
[success] 2.95% test_random.test_random: 57.0268s
[success] 2.95% test_gluon.test_slice_pooling2d_slice_pooling2d: 57.0118s
[success] 2.03% test_random.test_normal_generator: 39.3641s
[success] 1.86% test_io.test_Cifar10Rec: 36.0722s
[success] 1.76% test_random.test_gamma_generator: 34.0995s
[success] 1.65% test_gluon.test_slice_batchnorm: 31.9859s
[success] 1.63% test_gluon.test_slice_pooling2d: 31.5945s

[...]

Ran 703 tests in 1941.053s

http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13379/runs/1/nodes/155/steps/409/log/?start=0

We are all good and set then. Thanks Tao and Patric for your support!

Best regards,
Marco

Re: MKLDNN performance in CI

2018-11-22 Thread Marco de Abreu
Sure, good idea! https://github.com/apache/incubator-mxnet/pull/13379

-Marco

RE: MKLDNN performance in CI

2018-11-22 Thread Zhao, Patric
Thanks, that should be the most time-consuming part.

@Marco, could you try disabling this env and check the performance again?

Re: MKLDNN performance in CI

2018-11-22 Thread Marco de Abreu
Okay, that makes sense. Thanks for the very detailed explanation!

Don't worry about speed in CI; the debug flag in particular might be the
reason here. I'm happy to leave it enabled, since the job of CI is accuracy
and detailed reporting, not speed :)

I just wanted to give a quick heads up in case this could be an indicator.
I'm looking forward to the 1:1 comparisons, but besides that there's no
further action required. Thanks for your efforts!

Best regards,
Marco

RE: MKLDNN performance in CI

2018-11-22 Thread Lv, Tao A
I think yes, except the cpp test.

RE: MKLDNN performance in CI

2018-11-22 Thread Zhao, Patric
Good point, Tao!
Is this env enabled in all the MKL-DNN CI jobs?

RE: MKLDNN performance in CI

2018-11-22 Thread Zhao, Patric
Happy Thanksgiving, everyone :)

Hi Marco,

Thanks for raising this question. We will look into the details of the CI test
cases, and Shufan will provide the 1:1 op-level performance data.

In general, the CI tests are not performance cases; they cover many different
situations, even corner cases, for quality purposes.
Specifically, some overhead may be introduced when an MKL-DNN op is connected to
a non-MKL-DNN op, because an extra data-format conversion has to happen at the
boundary.
So it is expected behavior that performance drops in these kinds of test cases,
where the computation itself is quite tiny.

But in real workloads these situations are not as frequent, and the overall
performance of MKL-DNN is much better than OpenBLAS.
(data here:
https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking)

In the short term, Shufan will help look into these test cases and figure out a
proper solution to make the CI faster.

In the medium to long term, we are working on implementing MKL-DNN support for
more ops, such as reshape, slice, and split, so that less data-format
conversion will be involved.
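
As a rough illustration of the boundary cost described above (a sketch only,
not a measured benchmark: it assumes an MKL-DNN build of MXNet, and the op
choice, shapes, and iteration count are arbitrary), a tiny convolution feeding
an op without an MKL-DNN kernel, such as slice, forces its output to be
reordered back to the default layout first:

import time
import mxnet as mx

# Tiny workload: an MKL-DNN convolution followed by slice, which (at the time
# of this thread) had no MKL-DNN kernel, so the convolution output has to be
# converted back to the default layout before slice can run.
data   = mx.nd.random.uniform(shape=(1, 8, 16, 16))
weight = mx.nd.random.uniform(shape=(8, 8, 3, 3))
bias   = mx.nd.zeros((8,))

def mixed_chain():
    out = mx.nd.Convolution(data=data, weight=weight, bias=bias,
                            kernel=(3, 3), num_filter=8, pad=(1, 1))
    return mx.nd.slice(out, begin=(0, 0, 0, 0), end=(1, 4, 8, 8))

mixed_chain().wait_to_read()          # warm-up
start = time.time()
for _ in range(1000):
    mixed_chain()
mx.nd.waitall()                       # MXNet ops run asynchronously
print("1000 conv+slice iterations: %.4f s" % (time.time() - start))

Running the same loop on an OpenBLAS build, or with the slice removed, gives
the comparison point; with tensors this small, the conversion overhead
described above is where the difference would show up.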

Feel free to let me know if you have any other concerns.

BR,

--Patric
 


RE: MKLDNN performance in CI

2018-11-22 Thread Lv, Tao A
Thanks for bringing this up, Marco. It's really weird, since most of the tests
listed in "worth noting" are not related to the mkldnn backend.

I can understand that some tests for mkldnn operators may be slower because
MXNET_MKLDNN_DEBUG is enabled in the CI:
https://github.com/apache/incubator-mxnet/blob/master/ci/docker/runtime_functions.sh#L713
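
For anyone who wants to gauge the cost of that flag locally, a minimal sketch
(assuming an MKL-DNN build; the operator, shapes, and repeat count below are
arbitrary placeholders) is to time the same small workload once with the
variable set to 1 and once with it set to 0:

import os
import time

# Run this script twice, e.g. MXNET_MKLDNN_DEBUG=1 python bench.py and
# MXNET_MKLDNN_DEBUG=0 python bench.py, and compare the reported times.
flag = os.environ.setdefault("MXNET_MKLDNN_DEBUG", "0")

import mxnet as mx

data   = mx.nd.random.uniform(shape=(32, 3, 224, 224))
weight = mx.nd.random.uniform(shape=(16, 3, 3, 3))
bias   = mx.nd.zeros((16,))

def conv():
    return mx.nd.Convolution(data=data, weight=weight, bias=bias,
                             kernel=(3, 3), num_filter=16)

conv().wait_to_read()                  # warm-up
start = time.time()
for _ in range(100):
    conv()
mx.nd.waitall()                        # wait for the async executor to drain
print("MXNET_MKLDNN_DEBUG=%s: %.4f s" % (flag, time.time() - start))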
 

-Original Message-
From: Marco de Abreu [mailto:marco.g.ab...@googlemail.com.INVALID] 
Sent: Friday, November 23, 2018 9:22 AM
To: dev@mxnet.incubator.apache.org
Subject: MKLDNN performance in CI

Hello,

I have noticed that our Python tests have been increasing in duration recently.
In order to analyse this further, I created the PR [1], which allows recording
per-test durations. Please note that I did not dive deep into these numbers and
that they have to be taken with a grain of salt, since slaves have varying
resource utilization.

Please have a look at the two following logs:
Python3 CPU MKLDNN:
http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13377/runs/2/nodes/155/steps/409/log/?start=0
Python3 CPU Openblas:
http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13377/runs/2/nodes/152/steps/398/log/?start=0

If you scroll to the end (note that there are multiple test stages and 
summaries being printed in these logs), you will find the following
statements:

Python3 CPU MKLDNN: "Ran 702 tests in 3042.102s"
Python3 CPU Openblas: "Ran 702 tests in 2158.458s"

This shows that the MKLDNN backend is overall about 40% slower than the
Openblas backend. If we go into the details, we can see that some tests are
significantly slower:

Python3 CPU MKLDNN:

>[success] 20.78% test_random.test_shuffle: 630.7165s
>[success] 17.79% test_sparse_operator.test_elemwise_binary_ops: 540.0487s
>[success] 10.91% test_gluon_model_zoo.test_models: 331.1503s
>[success] 2.62% test_operator.test_broadcast_binary_op: 79.4556s
>[success] 2.45% test_operator.test_pick: 74.4041s
>[success] 2.39% test_metric_perf.test_metric_performance: 72.5445s
>[success] 2.38% test_random.test_negative_binomial_generator: 72.1751s
>[success] 1.84% test_operator.test_psroipooling: 55.9432s
>[success] 1.78% test_random.test_poisson_generator: 54.0104s
>[success] 1.72% test_gluon.test_slice_pooling2d_slice_pooling2d: 52.3447s
>[success] 1.60% test_contrib_control_flow.test_cond: 48.6977s
>[success] 1.41% test_random.test_random: 42.8712s
>[success] 1.03% test_operator.test_layer_norm: 31.1242s


Python3 CPU Openblas:
> [success] 26.20% test_gluon_model_zoo.test_models: 563.3366s
> [success] 4.34% test_random.test_shuffle: 93.3157s
> [success] 4.31% test_random.test_negative_binomial_generator: 92.6899s
> [success] 3.78% test_sparse_operator.test_elemwise_binary_ops: 81.2048s
> [success] 3.30% test_operator.test_psroipooling: 70.9090s
> [success] 3.20% test_random.test_poisson_generator: 68.7500s
> [success] 3.10% test_metric_perf.test_metric_performance: 66.6085s
> [success] 2.79% test_operator.test_layer_norm: 59.9566s
> [success] 2.66% test_gluon.test_slice_pooling2d_slice_pooling2d: 57.1887s
> [success] 2.62% test_operator.test_pick: 56.2312s
> [success] 2.60% test_random.test_random: 55.8920s
> [success] 2.19% test_operator.test_broadcast_binary_op: 47.1879s
> [success] 0.96% test_contrib_control_flow.test_cond: 20.6908s

Tests worth noting:
- test_random.test_shuffle: 700% increase - but I don't know how this may be 
related to MKLDNN. Are we doing random number generation in either of those 
backends?
- test_sparse_operator.test_elemwise_binary_ops: 700% increase
- test_gluon_model_zoo.test_models: 40% decrease - that's awesome and to be
expected :)
- test_operator.test_broadcast_binary_op: 80% increase
- test_contrib_control_flow.test_cond: 250% increase
- test_operator.test_layer_norm: 50% decrease - nice!

As I have stated previously, these numbers might not mean anything since the CI 
is not a benchmarking environment (sorry if these are false negatives), but I 
thought it might be worth mentioning so Intel could follow up and dive deeper.

Does anybody here create 1:1 operator comparisons (e.g. running layer_norm in 
the different backends to compare the performance) who could provide us with 
those numbers?
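
A minimal per-operator timing sketch along these lines, run once against an
MKL-DNN build and once against an OpenBLAS build of the same commit, could look
like the following (the operator, shapes, and repeat count are placeholders,
not an agreed benchmark):

import time
import mxnet as mx

def time_op(fn, repeats=500):
    # Time an imperative MXNet op, accounting for asynchronous execution.
    fn().wait_to_read()               # warm-up
    start = time.time()
    for _ in range(repeats):
        fn()
    mx.nd.waitall()
    return time.time() - start

# LayerNorm over the last axis of a (batch, features) input.
x     = mx.nd.random.uniform(shape=(128, 1024))
gamma = mx.nd.ones((1024,))
beta  = mx.nd.zeros((1024,))

t = time_op(lambda: mx.nd.LayerNorm(data=x, gamma=gamma, beta=beta, axis=-1))
print("LayerNorm, 500 runs: %.4f s" % t)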

Best regards,
Marco

[1]: https://github.com/apache/incubator-mxnet/pull/13377
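
As a rough sketch for diffing two of these per-test summaries offline (the log
file paths are placeholders, and the "[success] ...%" line format is assumed to
stay as quoted above; this is not part of the PR tooling):

import re
import sys

# Parse lines like "[success] 2.45% test_operator.test_pick: 74.4041s" from two
# saved console logs and print the per-test duration ratio, largest ratio first.
LINE = re.compile(r"\[success\]\s+[\d.]+%\s+(\S+):\s+([\d.]+)s")

def durations(path):
    with open(path) as f:
        return {m.group(1): float(m.group(2))
                for m in (LINE.search(line) for line in f) if m}

def main(baseline_log, candidate_log):
    base, cand = durations(baseline_log), durations(candidate_log)
    for test in sorted(base.keys() & cand.keys(),
                       key=lambda t: cand[t] / base[t], reverse=True):
        print("%-55s %9.1fs -> %9.1fs  (x%.2f)"
              % (test, base[test], cand[test], cand[test] / base[test]))

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])    # e.g. openblas.log mkldnn.log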