Re: MKLDNN performance in CI
It’s great to hear that works. Thanks for your effort on this, Marco.

-Tao
Re: MKLDNN performance in CI
Great news, it was the debug flag:

[success] 16.54% test_gluon_model_zoo.test_models: 320.0898s
[success] 6.64% test_random.test_shuffle: 128.5430s
[success] 5.67% test_sparse_operator.test_elemwise_binary_ops: 109.6650s
[success] 4.41% test_metric_perf.test_metric_performance: 85.3215s
[success] 4.28% test_random.test_negative_binomial_generator: 82.8046s
[success] 3.91% test_operator.test_pick: 75.7241s
[success] 3.34% test_operator.test_psroipooling: 64.7008s
[success] 3.30% test_random.test_poisson_generator: 63.9218s
[success] 3.24% test_operator.test_broadcast_binary_op: 62.7417s
[success] 2.95% test_random.test_random: 57.0268s
[success] 2.95% test_gluon.test_slice_pooling2d_slice_pooling2d: 57.0118s
[success] 2.03% test_random.test_normal_generator: 39.3641s
[success] 1.86% test_io.test_Cifar10Rec: 36.0722s
[success] 1.76% test_random.test_gamma_generator: 34.0995s
[success] 1.65% test_gluon.test_slice_batchnorm: 31.9859s
[success] 1.63% test_gluon.test_slice_pooling2d: 31.5945s

[...]

Ran 703 tests in 1941.053s

http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13379/runs/1/nodes/155/steps/409/log/?start=0

We are all good and set then. Thanks Tao and Patric for your support!

Best regards,
Marco
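[Editor's note: a throwaway sketch, not from the thread. The `[success] P% name: Ts` lines quoted above are nose duration output; a few lines of Python suffice to pull them into structured form for comparison across runs. The regex and variable names here are the editor's own.]

```python
import re

# Parse nose duration lines of the form
#   [success] 16.54% test_gluon_model_zoo.test_models: 320.0898s
# into (test name, percentage share, seconds) tuples.
LINE = re.compile(r"\[success\]\s+([\d.]+)%\s+(\S+):\s+([\d.]+)s")

log = """\
[success] 16.54% test_gluon_model_zoo.test_models: 320.0898s
[success] 6.64% test_random.test_shuffle: 128.5430s
"""

rows = [(m.group(2), float(m.group(1)), float(m.group(3)))
        for m in LINE.finditer(log)]

for name, share, secs in rows:
    print(f"{name:45s} {share:6.2f}% {secs:10.4f}s")
```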
Re: MKLDNN performance in CI
Sure, good idea! https://github.com/apache/incubator-mxnet/pull/13379

-Marco
RE: MKLDNN performance in CI
Thanks, those should be the most time-consuming parts.

@Marco, could you try to disable this env variable and see the performance again?
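[Editor's note: a minimal sketch of what disabling the env variable could look like locally. MXNET_MKLDNN_DEBUG and ci/docker/runtime_functions.sh are referenced in the thread; the nosetests invocation and path are assumptions.]

```shell
# Sketch only: turn off the MKL-DNN debug checks that
# ci/docker/runtime_functions.sh enables for the MKLDNN CI builds.
export MXNET_MKLDNN_DEBUG=0
echo "MXNET_MKLDNN_DEBUG=$MXNET_MKLDNN_DEBUG"

# Then re-run the Python unit tests from an MXNet checkout, e.g.:
# nosetests --verbose tests/python/unittest
```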
Re: MKLDNN performance in CI
Okay, that makes sense. Thanks for the very detailed explanation!

Don't worry about speed in CI; the debug flag in particular might be the reason here. I'm happy to leave it in, since the task of CI is accuracy and detailed reporting, not speed :) I just wanted to give a quick heads-up in case this could be an indicator.

I'm looking forward to the 1:1 comparisons, but besides that there's no further action required. Thanks for your efforts!

Best regards,
Marco
RE: MKLDNN performance in CI
I think yes, except the cpp test.
RE: MKLDNN performance in CI
Good point, Tao!
Is this env enabled in all MKL-DNN CI?

> -----Original Message-----
> From: Lv, Tao A [mailto:tao.a...@intel.com]
> Sent: Friday, November 23, 2018 9:53 AM
> To: dev@mxnet.incubator.apache.org
> Subject: RE: MKLDNN performance in CI
>
> Thanks for bringing this up, Marco. It's really weird, since most of those
> tests listed in "worth noting" are not related to the mkldnn backend.
>
> I can understand that some tests for mkldnn operators may be slower
> because MXNET_MKLDNN_DEBUG is enabled in the CI:
> https://github.com/apache/incubator-mxnet/blob/master/ci/docker/runtime_functions.sh#L713
>
> -----Original Message-----
> From: Marco de Abreu [mailto:marco.g.ab...@googlemail.com.INVALID]
> Sent: Friday, November 23, 2018 9:22 AM
> To: dev@mxnet.incubator.apache.org
> Subject: MKLDNN performance in CI
>
> Hello,
>
> I have noticed that our Python tests have been increasing in duration
> recently. In order to analyse this further, I created the PR [1] which
> allows us to record test durations. Please note that I did not dive deep
> into these numbers and that they have to be taken with a grain of salt,
> since slaves have varying resource utilizations.
>
> Please have a look at the two following logs:
> Python3 CPU MKLDNN:
> http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13377/runs/2/nodes/155/steps/409/log/?start=0
> Python3 CPU Openblas:
> http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13377/runs/2/nodes/152/steps/398/log/?start=0
>
> If you scroll to the end (note that there are multiple test stages and
> summaries being printed in these logs), you will find the following
> statements:
>
> Python3 CPU MKLDNN: "Ran 702 tests in 3042.102s"
> Python3 CPU Openblas: "Ran 702 tests in 2158.458s"
>
> This shows that the MKLDNN backend is generally about 40% slower than the
> Openblas backend. If we go into the details, we can see that some tests
> are significantly slower:
>
> Python3 CPU MKLDNN:
>
> [success] 20.78% test_random.test_shuffle: 630.7165s
> [success] 17.79% test_sparse_operator.test_elemwise_binary_ops: 540.0487s
> [success] 10.91% test_gluon_model_zoo.test_models: 331.1503s
> [success] 2.62% test_operator.test_broadcast_binary_op: 79.4556s
> [success] 2.45% test_operator.test_pick: 74.4041s
> [success] 2.39% test_metric_perf.test_metric_performance: 72.5445s
> [success] 2.38% test_random.test_negative_binomial_generator: 72.1751s
> [success] 1.84% test_operator.test_psroipooling: 55.9432s
> [success] 1.78% test_random.test_poisson_generator: 54.0104s
> [success] 1.72% test_gluon.test_slice_pooling2d_slice_pooling2d: 52.3447s
> [success] 1.60% test_contrib_control_flow.test_cond: 48.6977s
> [success] 1.41% test_random.test_random: 42.8712s
> [success] 1.03% test_operator.test_layer_norm: 31.1242s
>
> Python3 CPU Openblas:
>
> [success] 26.20% test_gluon_model_zoo.test_models: 563.3366s
> [success] 4.34% test_random.test_shuffle: 93.3157s
> [success] 4.31% test_random.test_negative_binomial_generator: 92.6899s
> [success] 3.78% test_sparse_operator.test_elemwise_binary_ops: 81.2048s
> [success] 3.30% test_operator.test_psroipooling: 70.9090s
> [success] 3.20% test_random.test_poisson_generator: 68.7500s
> [success] 3.10% test_metric_perf.test_metric_performance: 66.6085s
> [success] 2.79% test_operator.test_layer_norm: 59.9566s
> [success] 2.66% test_gluon.test_slice_pooling2d_slice_pooling2d: 57.1887s
> [success] 2.62% test_operator.test_pick: 56.2312s
> [success] 2.60% test_random.test_random: 55.8920s
> [success] 2.19% test_operator.test_broadcast_binary_op: 47.1879s
> [success] 0.96% test_contrib_control_flow.test_cond: 20.6908s
>
> Tests worth noting:
> - test_random.test_shuffle: 700% increase - but I don't know how this may
>   be related to MKLDNN. Are we doing random number generation in either of
>   those backends?
> - test_sparse_operator.test_elemwise_binary_ops: 700% increase
> - test_gluon_model_zoo.test_models: 40% decrease - that's awesome and to
>   be expected :)
> - test_operator.test_broadcast_binary_op: 80% increase
> - test_contrib_control_flow.test_cond: 250% increase
> - test_operator.test_layer_norm: 50% decrease - nice!
>
> As I have stated previously, these numbers might not mean anything since
> the CI is not a benchmarking environment (sorry if these are false
> negatives), but I thought it might be worth mentioning so Intel could
> follow up and dive deeper.
>
> Does anybody here create 1:1 operator comparisons (e.g. running
> layer_norm in the different backends to compare the performance) who
> could provide us with those numbers?
>
> Best regards,
> Marco
>
> [1]: https://github.com/apache/incubator-mxnet/pull/13377
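[Editor's note: a quick sanity check on the "about 40% slower" figure, computed from the two suite totals quoted in the thread; no additional data is introduced.]

```python
# Suite totals quoted from the CI logs in this thread.
mkldnn = 3042.102    # Python3 CPU MKLDNN, seconds
openblas = 2158.458  # Python3 CPU Openblas, seconds

slowdown = (mkldnn - openblas) / openblas
print(f"MKLDNN vs Openblas: {slowdown:.1%} slower")  # ~40.9%
```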
RE: MKLDNN performance in CI
Happy Thanksgiving, everyone :) Hi Marco, Thanks to raising this question. We will look into the details for CI test cases and Shufan will provide the 1:1 OP level performance data. In general, the CI tests is not performant cases which covered the different situations, even corner cases, for the quality purpose. Specifically, some overhead may be introduced if we connect an MKL-DNN OP and non MKL-DNN OP where the extra data conversion will happen. So, it's the expected behaviors that the performance drops in these kinds of test cases in which the computation is quite tiny. But in the real workload, these situations are not such frequency and the total performance of MKL-DNN will be much better than OpenBLAS. (data in here: https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking) For a short time, Shufan will help to look into these test cases and figure out a proper solution to make the CI faster. For a medium and long time, we are working on implementing more MKL-DNN supported Ops, like reshape, slice, split, so that less data format conversion will be involved. Feel free to let me know if you have any other concerns. BR, --Patric > -Original Message- > From: Marco de Abreu [mailto:marco.g.ab...@googlemail.com.INVALID] > Sent: Friday, November 23, 2018 9:22 AM > To: dev@mxnet.incubator.apache.org > Subject: MKLDNN performance in CI > > Hello, > > I have noticed that our Python tests have been increasing in duration > recently. > In order to analyse this further, I created the PR [1] which allows to record > test durations. Please note that I did not dive deep on these numbers and > that they have to be taken with a grain of salt since slaves have varying > resource utilizations. 
> Please have a look at the two following logs:
> Python3 CPU MKLDNN:
> http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13377/runs/2/nodes/155/steps/409/log/?start=0
> Python3 CPU Openblas:
> http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13377/runs/2/nodes/152/steps/398/log/?start=0
>
> If you scroll to the end (note that there are multiple test stages and summaries being printed in these logs), you will find the following statements:
>
> Python3 CPU MKLDNN: "Ran 702 tests in 3042.102s"
> Python3 CPU Openblas: "Ran 702 tests in 2158.458s"
>
> This shows that MKLDNN is generally about 40% slower than the Openblas backend. If we go into the details, we can see that some tests are significantly slower:
>
> Python3 CPU MKLDNN:
>
> [success] 20.78% test_random.test_shuffle: 630.7165s
> [success] 17.79% test_sparse_operator.test_elemwise_binary_ops: 540.0487s
> [success] 10.91% test_gluon_model_zoo.test_models: 331.1503s
> [success] 2.62% test_operator.test_broadcast_binary_op: 79.4556s
> [success] 2.45% test_operator.test_pick: 74.4041s
> [success] 2.39% test_metric_perf.test_metric_performance: 72.5445s
> [success] 2.38% test_random.test_negative_binomial_generator: 72.1751s
> [success] 1.84% test_operator.test_psroipooling: 55.9432s
> [success] 1.78% test_random.test_poisson_generator: 54.0104s
> [success] 1.72% test_gluon.test_slice_pooling2d_slice_pooling2d: 52.3447s
> [success] 1.60% test_contrib_control_flow.test_cond: 48.6977s
> [success] 1.41% test_random.test_random: 42.8712s
> [success] 1.03% test_operator.test_layer_norm: 31.1242s
>
> Python3 CPU Openblas:
>
> [success] 26.20% test_gluon_model_zoo.test_models: 563.3366s
> [success] 4.34% test_random.test_shuffle: 93.3157s
> [success] 4.31% test_random.test_negative_binomial_generator: 92.6899s
> [success] 3.78% test_sparse_operator.test_elemwise_binary_ops: 81.2048s
> [success] 3.30% test_operator.test_psroipooling: 70.9090s
> [success] 3.20% test_random.test_poisson_generator: 68.7500s
> [success] 3.10% test_metric_perf.test_metric_performance: 66.6085s
> [success] 2.79% test_operator.test_layer_norm: 59.9566s
> [success] 2.66% test_gluon.test_slice_pooling2d_slice_pooling2d: 57.1887s
> [success] 2.62% test_operator.test_pick: 56.2312s
> [success] 2.60% test_random.test_random: 55.8920s
> [success] 2.19% test_operator.test_broadcast_binary_op: 47.1879s
> [success] 0.96% test_contrib_control_flow.test_cond: 20.6908s
>
> Tests worth noting:
> - test_random.test_shuffle: 700% increase - but I don't know how this may be related to MKLDNN. Are we doing random number generation in either of those backends?
> - test_sparse_operator.test_elemwise_binary_ops: 700% increase
> - test_gluon_model_zoo.test_models: 40% decrease - that's awesome and to be expected :)
> - test_operator.test_broadcast_binary_op: 80% increase
> - test_contrib_control_flow.test_cond: 250% increase
> - test_operator.test_layer_norm: 50% decrease - nice!
>
> As I have stated [...]
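As a sanity check, the rough percentages in the "worth noting" list can be recomputed from the two timing lists quoted above. A quick sketch over a subset of the timings copied from the logs — note the exact figures come out somewhat different from the rounded ones quoted (e.g. test_shuffle at roughly +576% rather than 700%):

```python
# Per-test wall-clock seconds, copied from the two CI logs quoted above.
timings = {
    #                                            (MKLDNN,   Openblas)
    "test_random.test_shuffle":                      (630.7165,  93.3157),
    "test_sparse_operator.test_elemwise_binary_ops": (540.0487,  81.2048),
    "test_gluon_model_zoo.test_models":              (331.1503, 563.3366),
    "test_operator.test_broadcast_binary_op":        (79.4556,   47.1879),
    "test_contrib_control_flow.test_cond":           (48.6977,   20.6908),
    "test_operator.test_layer_norm":                 (31.1242,   59.9566),
}

# Percentage change of the MKLDNN time relative to the Openblas baseline.
for name, (mkldnn, openblas) in timings.items():
    change = (mkldnn - openblas) / openblas * 100
    print(f"{name}: {change:+.0f}%")
```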
RE: MKLDNN performance in CI
Thanks for bringing this up, Marco. It's really weird, since most of the tests listed in "worth noting" are not related to the mkldnn backend.

I can understand that some tests for mkldnn operators may be slower because MXNET_MKLDNN_DEBUG is enabled in the CI:
https://github.com/apache/incubator-mxnet/blob/master/ci/docker/runtime_functions.sh#L713

> -Original Message-
> From: Marco de Abreu [mailto:marco.g.ab...@googlemail.com.INVALID]
> Sent: Friday, November 23, 2018 9:22 AM
> To: dev@mxnet.incubator.apache.org
> Subject: MKLDNN performance in CI
>
> Hello,
>
> I have noticed that our Python tests have been increasing in duration recently.
>
> [...]
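Since MXNET_MKLDNN_DEBUG is enabled by the CI build script linked above, its cost can be isolated by overriding it before rerunning the suite. A hypothetical sketch — the commented-out test invocation is an assumption, not something specified in this thread:

```shell
# Override the debug flag that ci/docker/runtime_functions.sh turns on,
# then rerun the Python unit tests to see how much of the slowdown it explains.
export MXNET_MKLDNN_DEBUG=0
echo "MXNET_MKLDNN_DEBUG=$MXNET_MKLDNN_DEBUG"

# Hypothetical invocation (runner and path assumed):
# nosetests --verbose tests/python/unittest
```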