Re: MKLDNN performance in CI
Sure, good idea! https://github.com/apache/incubator-mxnet/pull/13379

-Marco

On Fri, Nov 23, 2018 at 3:38 AM Zhao, Patric wrote:
> Thanks, it should be the most time-consuming part.
>
> @Marco, could you try disabling this env and checking the performance again?
RE: MKLDNN performance in CI
Thanks, it should be the most time-consuming part.

@Marco, could you try disabling this env and checking the performance again?

> -----Original Message-----
> From: Lv, Tao A [mailto:tao.a...@intel.com]
> Sent: Friday, November 23, 2018 10:26 AM
> To: dev@mxnet.incubator.apache.org
> Subject: RE: MKLDNN performance in CI
>
> I think yes, except the cpp test.
Re: MKLDNN performance in CI
Okay, that makes sense. Thanks for the very detailed explanation!

Don't worry about the speed in CI; the debug flag in particular might be the reason here. I'm happy to leave it enabled, since the job of CI is accuracy and detailed reporting, not speed :) I just wanted to give a quick heads-up in case this could be an indicator. I'm looking forward to the 1:1 comparisons, but besides that there's no further action required.

Thanks for your efforts!

Best regards,
Marco

On Fri, Nov 23, 2018, 03:25 Lv, Tao A wrote:
> I think yes, except the cpp test.
RE: MKLDNN performance in CI
I think yes, except the cpp test.

-----Original Message-----
From: Zhao, Patric [mailto:patric.z...@intel.com]
Sent: Friday, November 23, 2018 10:06 AM
To: dev@mxnet.incubator.apache.org
Subject: RE: MKLDNN performance in CI

Good point, Tao!
Is this env enabled in all MKL-DNN CI?
RE: MKLDNN performance in CI
Good point, Tao!
Is this env enabled in all MKL-DNN CI?

> -----Original Message-----
> From: Lv, Tao A [mailto:tao.a...@intel.com]
> Sent: Friday, November 23, 2018 9:53 AM
> To: dev@mxnet.incubator.apache.org
> Subject: RE: MKLDNN performance in CI
>
> Thanks for bringing this up, Marco. It's really weird, since most of the tests listed in "worth noting" are not related to the mkldnn backend.
>
> I can understand that some tests for mkldnn operators may be slower because MXNET_MKLDNN_DEBUG is enabled in the CI:
> https://github.com/apache/incubator-mxnet/blob/master/ci/docker/runtime_functions.sh#L713
RE: MKLDNN performance in CI
Happy Thanksgiving, everyone :)

Hi Marco,

Thanks for raising this question. We will look into the details of the CI test cases, and Shufan will provide the 1:1 op-level performance data.

In general, the CI tests are not performance-oriented cases; they cover many different situations, even corner cases, for quality purposes. Specifically, some overhead may be introduced when an MKL-DNN op is connected to a non-MKL-DNN op, because an extra data-format conversion has to happen there. So it is expected behavior that performance drops in these kinds of test cases, where the computation itself is quite tiny. In real workloads these situations are not as frequent, and the total performance of MKL-DNN is much better than OpenBLAS (data here: https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking).

In the short term, Shufan will look into these test cases and figure out a proper solution to make the CI faster. In the medium and long term, we are working on implementing more MKL-DNN-supported ops, like reshape, slice, and split, so that less data-format conversion will be involved.

Feel free to let me know if you have any other concerns.

BR,

--Patric
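The 1:1 op-level comparison mentioned above boils down to a simple timing-harness pattern: time the same operator call per backend build and compare. A minimal sketch using only stdlib `timeit`; the two workloads below are placeholder computations standing in for the real per-backend operator calls (function bodies and counts are assumptions, not actual MXNet measurement code):

```python
import timeit


def bench(fn, number=100, repeat=3):
    """Return the best per-call wall time in seconds for fn()."""
    times = timeit.repeat(fn, number=number, repeat=repeat)
    return min(times) / number


# Placeholder workloads: in practice one would time the same operator
# (e.g. a layer_norm forward pass) once in an MKL-DNN build and once in
# an OpenBLAS build, then compare the two numbers.
baseline = bench(lambda: sum(x * x for x in range(1000)))
candidate = bench(lambda: sum(x * x for x in range(2000)))

ratio = candidate / baseline
print(f"candidate takes {ratio:.2f}x the baseline runtime")
```

Running each backend's binary against the same harness and inputs would give the per-operator numbers Marco asked for, without the noise of a full CI run.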
RE: MKLDNN performance in CI
Thanks for bringing this up, Marco. It's really weird since most of those tests listed in "worth noting" are not related to mkldnn backend. I can understand that some tests for mkldnn operator may be slower because MXNET_MKLDNN_DEBUG is enabled in the CI: https://github.com/apache/incubator-mxnet/blob/master/ci/docker/runtime_functions.sh#L713 -Original Message- From: Marco de Abreu [mailto:marco.g.ab...@googlemail.com.INVALID] Sent: Friday, November 23, 2018 9:22 AM To: dev@mxnet.incubator.apache.org Subject: MKLDNN performance in CI Hello, I have noticed that our Python tests have been increasing in duration recently. In order to analyse this further, I created the PR [1] which allows to record test durations. Please note that I did not dive deep on these numbers and that they have to be taken with a grain of salt since slaves have varying resource utilizations. Please have a look at the two following logs: Python3 CPU MKLDNN: http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13377/runs/2/nodes/155/steps/409/log/?start=0 Python3 CPU Openblas: http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13377/runs/2/nodes/152/steps/398/log/?start=0 If you scroll to the end (note that there are multiple test stages and summaries being printed in these logs), you will find the following statements: Python3 CPU MKLDNN: "Ran 702 tests in 3042.102s" Python3 CPU Openblas: "Ran 702 tests in 2158.458s" This shows that the MKLDNN is generally being about 40% slower than the Openblas backend. 
If we go into the details, we can see that some tests are significantly slower: Python3 CPU MKLDNN: >[success] 20.78% test_random.test_shuffle: 630.7165s [success] 17.79% >test_sparse_operator.test_elemwise_binary_ops: 540.0487s [success] >10.91% test_gluon_model_zoo.test_models: 331.1503s [success] 2.62% >test_operator.test_broadcast_binary_op: 79.4556s [success] 2.45% >test_operator.test_pick: 74.4041s [success] 2.39% >test_metric_perf.test_metric_performance: 72.5445s [success] 2.38% >test_random.test_negative_binomial_generator: 72.1751s [success] 1.84% >test_operator.test_psroipooling: 55.9432s [success] 1.78% >test_random.test_poisson_generator: 54.0104s [success] 1.72% >test_gluon.test_slice_pooling2d_slice_pooling2d: 52.3447s [success] >1.60% test_contrib_control_flow.test_cond: 48.6977s [success] 1.41% >test_random.test_random: 42.8712s [success] 1.03% >test_operator.test_layer_norm: 31.1242s Python3 CPU Openblas: > [success] 26.20% test_gluon_model_zoo.test_models: 563.3366s [success] > 4.34% test_random.test_shuffle: 93.3157s [success] 4.31% > test_random.test_negative_binomial_generator: 92.6899s [success] 3.78% > test_sparse_operator.test_elemwise_binary_ops: 81.2048s [success] > 3.30% test_operator.test_psroipooling: 70.9090s [success] 3.20% > test_random.test_poisson_generator: 68.7500s [success] 3.10% > test_metric_perf.test_metric_performance: 66.6085s [success] 2.79% > test_operator.test_layer_norm: 59.9566s [success] 2.66% > test_gluon.test_slice_pooling2d_slice_pooling2d: 57.1887s [success] > 2.62% test_operator.test_pick: 56.2312s [success] 2.60% > test_random.test_random: 55.8920s [success] 2.19% > test_operator.test_broadcast_binary_op: 47.1879s [success] 0.96% > test_contrib_control_flow.test_cond: 20.6908s Tests worth noting: - test_random.test_shuffle: 700% increase - but I don't know how this may be related to MKLDNN. Are we doing random number generation in either of those backends? 
- test_sparse_operator.test_elemwise_binary_ops: 700% increase
- test_gluon_model_zoo.test_models: 40% decrease - that's awesome and to be expected :)
- test_operator.test_broadcast_binary_op: 80% increase
- test_contrib_control_flow.test_cond: 250% increase
- test_operator.test_layer_norm: 50% decrease - nice!

As I have stated previously, these numbers might not mean anything since the CI is not a benchmarking environment (sorry if these are false negatives), but I thought it might be worth mentioning so Intel could follow up and dive deeper. Does anybody here do 1:1 operator comparisons (e.g. running layer_norm in the different backends to compare the performance) and could provide us with those numbers?

Best regards, Marco

[1]: https://github.com/apache/incubator-mxnet/pull/13377
MKLDNN performance in CI
Hello,

I have noticed that our Python tests have been increasing in duration recently. In order to analyse this further, I created the PR [1] which allows recording test durations. Please note that I did not dive deep into these numbers and that they have to be taken with a grain of salt, since slaves have varying resource utilizations. Please have a look at the following two logs:

Python3 CPU MKLDNN: http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13377/runs/2/nodes/155/steps/409/log/?start=0
Python3 CPU Openblas: http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13377/runs/2/nodes/152/steps/398/log/?start=0

If you scroll to the end (note that there are multiple test stages and summaries being printed in these logs), you will find the following statements:

Python3 CPU MKLDNN: "Ran 702 tests in 3042.102s"
Python3 CPU Openblas: "Ran 702 tests in 2158.458s"

This shows that the MKLDNN backend is generally about 40% slower than the Openblas backend.
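For what it's worth, the comparison can be done mechanically from the two logs. A small stdlib-only sketch: part 1 recomputes the overall slowdown from the two summary lines quoted above, and part 2 parses the nose-timer `[success] …` lines from the logs into a per-test comparison (the helper names and the single sample line per backend are mine, for illustration):

```python
import re

# 1) Overall slowdown from the two nose summary lines quoted above.
mkldnn_total = 3042.102    # "Ran 702 tests in 3042.102s" (Python3 CPU MKLDNN)
openblas_total = 2158.458  # "Ran 702 tests in 2158.458s" (Python3 CPU Openblas)
slowdown = mkldnn_total / openblas_total - 1.0
print(f"MKLDNN suite is {slowdown:.1%} slower overall")  # ~40.9%

# 2) Per-test comparison by parsing nose-timer lines from the two logs,
#    e.g. "[success] 20.78% test_random.test_shuffle: 630.7165s".
LINE_RE = re.compile(r"\[success\]\s+[\d.]+%\s+(\S+):\s+([\d.]+)s")

def parse_timings(log_text):
    """Map test name -> seconds for every [success] line in a log."""
    return {m.group(1): float(m.group(2)) for m in LINE_RE.finditer(log_text)}

def ratios(mkldnn, openblas):
    # ratio > 1.0 means the test got slower under the MKLDNN build
    return {t: mkldnn[t] / openblas[t] for t in mkldnn.keys() & openblas.keys()}

mkldnn = parse_timings("[success] 20.78% test_random.test_shuffle: 630.7165s")
openblas = parse_timings("[success] 4.34% test_random.test_shuffle: 93.3157s")
print(ratios(mkldnn, openblas))  # test_shuffle is roughly 6.8x slower
```

Fed the full log bodies instead of single lines, this produces the per-test ratio table directly.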
If we go into the details, we can see that some tests are significantly slower:

Python3 CPU MKLDNN:
> [success] 20.78% test_random.test_shuffle: 630.7165s
> [success] 17.79% test_sparse_operator.test_elemwise_binary_ops: 540.0487s
> [success] 10.91% test_gluon_model_zoo.test_models: 331.1503s
> [success] 2.62% test_operator.test_broadcast_binary_op: 79.4556s
> [success] 2.45% test_operator.test_pick: 74.4041s
> [success] 2.39% test_metric_perf.test_metric_performance: 72.5445s
> [success] 2.38% test_random.test_negative_binomial_generator: 72.1751s
> [success] 1.84% test_operator.test_psroipooling: 55.9432s
> [success] 1.78% test_random.test_poisson_generator: 54.0104s
> [success] 1.72% test_gluon.test_slice_pooling2d_slice_pooling2d: 52.3447s
> [success] 1.60% test_contrib_control_flow.test_cond: 48.6977s
> [success] 1.41% test_random.test_random: 42.8712s
> [success] 1.03% test_operator.test_layer_norm: 31.1242s

Python3 CPU Openblas:
> [success] 26.20% test_gluon_model_zoo.test_models: 563.3366s
> [success] 4.34% test_random.test_shuffle: 93.3157s
> [success] 4.31% test_random.test_negative_binomial_generator: 92.6899s
> [success] 3.78% test_sparse_operator.test_elemwise_binary_ops: 81.2048s
> [success] 3.30% test_operator.test_psroipooling: 70.9090s
> [success] 3.20% test_random.test_poisson_generator: 68.7500s
> [success] 3.10% test_metric_perf.test_metric_performance: 66.6085s
> [success] 2.79% test_operator.test_layer_norm: 59.9566s
> [success] 2.66% test_gluon.test_slice_pooling2d_slice_pooling2d: 57.1887s
> [success] 2.62% test_operator.test_pick: 56.2312s
> [success] 2.60% test_random.test_random: 55.8920s
> [success] 2.19% test_operator.test_broadcast_binary_op: 47.1879s
> [success] 0.96% test_contrib_control_flow.test_cond: 20.6908s

Tests worth noting:
- test_random.test_shuffle: 700% increase - but I don't know how this may be related to MKLDNN. Are we doing random number generation in either of those backends?
- test_sparse_operator.test_elemwise_binary_ops: 700% increase
- test_gluon_model_zoo.test_models: 40% decrease - that's awesome and to be expected :)
- test_operator.test_broadcast_binary_op: 80% increase
- test_contrib_control_flow.test_cond: 250% increase
- test_operator.test_layer_norm: 50% decrease - nice!

As I have stated previously, these numbers might not mean anything since the CI is not a benchmarking environment (sorry if these are false negatives), but I thought it might be worth mentioning so Intel could follow up and dive deeper. Does anybody here do 1:1 operator comparisons (e.g. running layer_norm in the different backends to compare the performance) and could provide us with those numbers?

Best regards, Marco

[1]: https://github.com/apache/incubator-mxnet/pull/13377
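On the 1:1 operator comparison question: a minimal timing harness is all that's needed, run once under each backend build. A stdlib-only sketch; the commented-out MXNet usage at the bottom is an assumption of how one would plug in a single operator such as layer_norm, not a tested invocation:

```python
import time
import statistics

def bench(fn, warmup=5, runs=50):
    """Median wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):       # warm caches / JIT / thread pools first
        fn()
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(samples)

# With an MKLDNN build and an Openblas build of MXNet installed in two
# separate environments, running the same script under each gives a 1:1
# number for a single operator, e.g. (hypothetical usage, not run here):
#   import mxnet as mx
#   x = mx.nd.random.uniform(shape=(32, 128, 768))
#   gamma, beta = mx.nd.ones((768,)), mx.nd.zeros((768,))
#   print(bench(lambda: mx.nd.LayerNorm(x, gamma, beta).wait_to_read()))

print(bench(lambda: sum(range(10000))))  # sanity check on a pure-Python op
```

The median (rather than mean) makes the numbers less sensitive to the noisy-neighbour effects mentioned above.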
Re: CI impaired
Thanks everybody, I really appreciate it! Today was a good day: there were no incidents and everything appears to be stable. In the meantime I did a deep dive on why we had such a significant performance decrease in our compilation jobs - which then clogged up the queue and resulted in 1000 jobs waiting to be scheduled. The reason was the way we use ccache to speed up our compilation jobs. Usually, this yields a huge performance improvement (CPU openblas, for example, goes from 30 minutes down to ~3min, ARMv7 from 30 minutes down to ~1.5min, etc.). Unfortunately, in this case ccache was our limiting factor. Here's some background on how we operate our cache: we use EFS to have a distributed ccache between all of our unrestricted-prod-slaves. EFS is rated for almost unlimited scalability (being consumed by thousands of instances in parallel [1]) with a theoretical throughput of over 10Gbps. One thing I didn't know when I designed this approach was how throughput is granted. Similar to T2-CPU-Credits, EFS uses BurstCredits to allow you higher throughput (default is 50MiB/s) [2]. Due to the high load, we consumed all of our credits - here's a very interesting graph: [3]. To avoid similar incidents in the future, I have taken the following actions:

1. I switched EFS from burst-mode to provisioned throughput with 300MB/s (in the graph at [3] you can see how our IO immediately increases - and thus our CI gets faster - as soon as I added provisioned throughput).
2. I created internal follow-up tickets to add monitoring and automated actions. First, we should be notified if we are running low on credits, so we can kick off an investigation. Second (nice to have), we could have a lambda function which listens for that event and automatically switches the EFS volume from burst-mode to provisioned throughput during high-load times. The required throughput could be retrieved via CloudWatch and then multiplied by a factor.
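For intuition on how quickly such a credit balance drains: throughput above the baseline is paid for out of the BurstCredit balance, so time-to-exhaustion is just balance divided by the net drain rate. A back-of-the-envelope sketch - the 50 MiB/s baseline is the default mentioned above, but the balance and observed throughput are made-up illustration values, not our actual CloudWatch numbers:

```python
# Hypothetical numbers for illustration only.
baseline_mib_s = 50        # default EFS baseline throughput (see [2])
observed_mib_s = 250       # sustained throughput under heavy CI load (assumed)
balance_tib = 2.0          # remaining burst credit balance (assumed)

drain_mib_s = observed_mib_s - baseline_mib_s   # net credit consumption rate
balance_mib = balance_tib * 1024 * 1024         # TiB -> MiB
hours_left = balance_mib / drain_mib_s / 3600
print(f"credits exhausted in ~{hours_left:.1f} h at this load")  # ~2.9 h
```

The same arithmetic, fed from the CloudWatch BurstCreditBalance metric, is what the proposed alerting ticket would automate.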
EFS allows downgrading the throughput mode 24h after the last change (to reduce capacity once the load is over) and always allows upgrading the provisioned capacity (if the load goes even higher). I've been looking for a pre-made CloudFormation template to facilitate that, but so far I haven't been able to find one. I'm now running additional load tests on our test CI environment to detect other potential bottlenecks. Thanks a lot for your support!

Best regards, Marco

[1]: https://docs.aws.amazon.com/efs/latest/ug/performance.html
[2]: https://docs.aws.amazon.com/efs/latest/ug/performance.html#throughput-modes
[3]: https://i.imgur.com/nboQLOn.png

On Thu, Nov 22, 2018 at 1:40 AM Qing Lan wrote: > Appreciated for your effort and help to make CI a better place! > > Qing > > On 11/21/18, 4:38 PM, "Lin Yuan" wrote: > > Thanks for your efforts, Marco! > > On Wed, Nov 21, 2018 at 4:02 PM Anirudh Subramanian < anirudh2...@gmail.com> > wrote: > > > Thanks for the quick response and mitigation! > > > > On Wed, Nov 21, 2018 at 3:55 PM Marco de Abreu > > wrote: > > > > > Hello, > > > > > > today, CI had some issues and I had to cancel all jobs a few > minutes ago. > > > This was basically caused by the high load that is currently being > put on > > > our CI system due to the pre-release efforts for this Friday. > > > > > > It's really unfortunate that we just had outages of three core > components > > > within the last two days - sorry about that!. To recap, we had the > > > following outages (which are unrelated to the parallel refactor of > the > > > Jenkins pipeline): > > > - (yesterday evening) The Jenkins master ran out of disk space and > thus > > > processed requests at reduced capacity > > > - (this morning) The Jenkins master got updated which broke our > > > autoscalings upscaling capabilities. 
> > > - (new, this evening) Jenkins API was irresponsive: Due to the high > > number > > > of jobs and a bad API design in the Jenkins REST API, the > time-complexity > > > of a simple create or delete request was quadratic which resulted > in all > > > requests timing out (that was the current outage). This resulted > in our > > > auto scaling to be unable to interface with the Jenkins master. > > > > > > I have now made improvements to our REST API calls which reduced > the > > > complexity from O(N^2) to O(1). The reason was an underlying > redirect > > loop > > > in the Jenkins createNode and deleteNode REST API in combination > with > > > unrolling the entire slave and job graph (which got quite huge > during > > > extensive load) upon every single request. Since we had about 150 > > > registered slaves and 1000 jobs in the queue, the duration for a > single > > > REST API call rose to up to 45 seconds (we execute up to a few > hundred > > > queries per auto scaling loop). This lead to our auto scaling > timing
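The quadratic blow-up described in that outage report is easy to see in miniature: if every create/delete request unrolls the full slave-and-job graph, the total work for N requests grows with N times the graph size, while a request that touches only the addressed node stays O(1). A toy cost model (all names are illustrative, not the Jenkins REST API):

```python
# Toy model of the two request strategies from the outage report.
def unrolling_cost(n_calls, graph_size):
    # Each call pays for visiting every registered slave and queued job,
    # which is what the redirect loop + full graph unroll amounted to.
    return sum(graph_size for _ in range(n_calls))

def direct_cost(n_calls):
    # Each call is O(1): one redirect-free request to the node endpoint.
    return n_calls

# Roughly the load at the time: ~150 slaves and ~1000 queued jobs.
slaves_plus_jobs = 150 + 1000
print(unrolling_cost(200, slaves_plus_jobs))  # 230000 units of work
print(direct_cost(200))                       # 200 units of work
```

With a few hundred queries per auto-scaling loop, the first strategy is what pushed single calls to ~45 seconds; the second keeps the loop cheap regardless of queue depth.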
Re: Slack access
Added. On Thu, Nov 22, 2018 at 9:11 AM Vishaal Kapoor wrote: > Switched e-mails. Thanks! > VIshaal >
Re: [Discussion] Remove bundled llvm OpenMP
Thank you for your answer, Chris.

> The whole “mixing omp libraries” is something that occurs in production every day and certainly in everything that uses mkl.

I'm afraid this statement is wrong. Intel MKL-DNN strictly ensures that this mixture is not happening: "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP runtime library to work. As different OpenMP runtimes may not be binary compatible it's important to ensure that only one OpenMP runtime is used throughout the application. Having more than one OpenMP runtime initialized may lead to undefined behavior resulting in incorrect results or crashes." [1]

That is why 2 different MKLML libraries are provided:
lib/libmklml_gnu.so | Intel MKL small library for GNU* OpenMP runtime
lib/libmklml_intel.so | Intel MKL small library for Intel(R) OpenMP runtime

> is the suggestion that libiomp be removed from mkl?

That is certainly not my suggestion.

> have you spoken with intel? have you consulted Intel at all?

Yes, I have asked for comments on the issue.

> “hard to debug random crash”. you’re seeing an assertion which is probably ...

I'm seeing the result of undefined behaviour. And I want to put emphasis on the following statement: regardless of whether there is a particular reason for the assert, it is a result of behaviour that should not happen. There are valid ways to use llvm OpenMP in MXNet, and the current way is not one of them.

> The lack of root-causing the problem and knee-jerk solution here makes me uncomfortable.

I hope that my efforts highlighting the problems reach you and ease your discomfort.

> if you want to see performance differences there’s an environment variable you can set in the mxnet omp tuning code that will print overhead and execution times for the current omp library.

I don't want to see performance differences in the current OpenMP library. I want to remove the current OpenMP library and use the one provided by the compiler. 
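One way to make the "only one OpenMP runtime" requirement checkable is to scan which `*omp*` libraries actually end up loaded into a process. A Linux-only sketch reading `/proc/self/maps` (the function name is mine; on other platforms one would inspect `libmxnet.so` with `ldd` or `otool -L` by hand):

```python
import os

def loaded_openmp_runtimes():
    """Return the OpenMP runtimes mapped into the current process (Linux)."""
    maps = "/proc/self/maps"
    if not os.path.exists(maps):
        return []  # non-Linux: fall back to ldd/otool on the binary instead
    found = set()
    with open(maps) as f:
        for line in f:
            for lib in ("libgomp", "libiomp5", "libomp"):
                if lib in line:
                    found.add(lib)
    return sorted(found)

runtimes = loaded_openmp_runtimes()
if len(runtimes) > 1:
    print("WARNING: multiple OpenMP runtimes loaded:", runtimes)
else:
    print("OpenMP runtimes loaded:", runtimes or "none")
```

Run after `import mxnet`, any result listing more than one runtime is exactly the mixture the MKL-DNN README warns about.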
Best Anton [1] https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265 чт, 22 нояб. 2018 г. в 16:50, Chris Olivier : > Do you not work on CI mostly? My apologies for thinking that was some sort > of team effort between you and a few others that were passionate about CI > keeping the CI system running smoothly. > > You have source code, you have the line the assertion is on. If you can’t > describe what’s going wrong that causes the assertion, then I don’t really > have anything more to add to this conversation beyond what’s below: > > The whole “mixing omp libraries” is something that occurs in production > every day and certainly in everything that uses mkl. It may occasionally > cause problems for some edge cases when there is super-complex linking > strategies and dynamic loading. But this is not one of those edge cases. > Mostly blaming this is a red herring for other thread-related problems and > people switch omp library and the timing of their code changes and they > stop seeing the problem. I’ve spent my entire career doing heavily > multiphreaded c++ development and i’ve seen that a million times. is the > suggestion that libiomp be removed from mkl? have you spoken with intel? > have you consulted Intel at all? > > and what you are seeing isn’t some “hard to debug random crash”. you’re > seeing an assertion which is probably related to omp trying to create a > thread pool after a fork and something was done in the mxnet code to make > that sketchy to do. I’d suggest filing an issue with the llvm openmp just > like you’d file with any other not-well-understood behavior in mxnet. > > The lack of root-causing the problem and knee-jerk solution here makes me > uncomfortable. > > if you want to see performance differences there’s an environment variable > you can set in the mxnet omp tuning code that will print overhead and > execution times for the current omp library. 
> > > > > > > > On Thu, Nov 22, 2018 at 7:12 AM Anton Chernov wrote: > > > Hi Chris, > > > > Thank you for your answer. If you have noticed the initial email comes > from > > me, Anton Chernov (@lebeg on Github) and thus the proposal is not from > any > > 'Ci' team that you've mentioned, but from me personally. > > > > You are writing: > > > > > someone is doing something unhealthy when they fork ... > > > > I'm missing any context to understand what you mean. > > > > > we get a lot of performance gain from OMP ... > > > > There is no data that would prove this statement and therefore it is a > > random guess. > > > > > in many months, no investigation has occurred as to WHY the assertion > is > > failing. > > > > The investigation has concluded that this is happening due to undefined > > behaviour which is, in my opinion, a suffient answer that does not > require > > to go any deeper. > > > > > the pr is vetoed until such a time that the actual root cause of the > > problem is known. > > > > And considering the statements above there is no valid reason to veto the > > PR. > > >
Re: [Discussion] MXNet CMake build - raise minimal required version
Yes, that flag can be removed. The profiler should always be built in. On Thu, Nov 22, 2018 at 7:44 AM Anton Chernov wrote: > You can find relevant information regarding the profiling flag here: > > https://github.com/apache/incubator-mxnet/issues/11563 > > > On Thu, 22 Nov 2018 at 16:06, Chris Olivier : > > > what is meant by: > > > > > > *Profiling* > > The profiler is always on even for production release builds, because > MXNet > > can not be build without it [2]. ? > > > > you mean it is always built or it is turned on (ie recording and saving > > profiling information)? I am not aware of it being turned on by default. > > > > > > profiler has no overhead when built in but not turned on. > > > > > > On Thu, Nov 22, 2018 at 2:35 AM Anton Chernov > wrote: > > > > > Dear MXNet community, > > > > > > I propose to raise the minimal required cmake version that is needed to > > > build MXNet to 3.10 which was tagged on March 16 2018 [1]. > > > > > > The effort of repairing cmake scripts in general is targeting to > > deprecate > > > make and maintain only 1 build system. > > > > > > *Need* > > > > > > The build system is the foundation of every software project. It's > > quality > > > is directly impacting the quality of the project. The MXNet build > system > > is > > > fragile, partially broken and not maintained. > > > > > > Users of MXNet and developers are confused by the fact that 2 build > > systems > > > exist at the same time: make and CMake. > > > > > > The main functional areas which are impacted by the current state of > the > > > cmake files are: > > > > > > *OpenMP* > > > The current CMake files mix OpenMP libraries from different compliers > > which > > > is undefined behaviour. It leads to indeterministic crashes on some > > > platforms. Build and deployment are very hard. No evidence exists that > > > proves that there is any benefit of having llvm OpenMP library as a > > > submodule in MXNet. 
> > > > > > *BLAS and LAPACK* > > > Basic math library usage is mixed up. It is hard and confusing to > > configure > > > and the choosing logic of the most optimal library is not present. MKL > > and > > > OpenBLAS are intermixed in an unpredictable manner. > > > > > > *Profiling* > > > The profiler is always on even for production release builds, because > > MXNet > > > can not be build without it [2]. > > > > > > *CUDA* > > > CUDA is detected by 3 different files in the current cmake scripts and > > the > > > choice of those is based on a obscure logic with involves different > > > versions of cmake and platforms which it's building on > > > > > > * CMakeLists.txt > > > * cmake/FirstClassLangCuda.cmake > > > * 3rdparty/mshadow/cmake/Cuda.cmake > > > > > > > > > *Confusing and misleading cmake user options* > > > For example, USE_CUDA / USE_OLDCMAKECUDA. Some of them will do or not > do > > > what they supposed to based on cmake generator version and version of > > cmake > > > [3]. > > > There are currently more than 30 build parameters for MXNet none of > them > > > documented. Some of them not even located in the main CMakeLists.txt > > file, > > > for example 'BLAS'. > > > > > > > > > *Issues* > > > There is a significant amount of github issues related to cmake or > build > > in > > > general. New tickets are issued frequently. > > > > > > * #8702 (https://github.com/apache/incubator-mxnet/issues/8702) > > > [DISCUSSION] Should we deprecate Makefile and only use CMake? > > > * #5079 (https://github.com/apache/incubator-mxnet/issues/5079) > > troubles > > > building python interface on raspberry pi 3 > > > * #1722 (https://github.com/apache/incubator-mxnet/issues/1722) > > problem: > > > compile mxnet with hdfs > > > * #11549 (https://github.com/apache/incubator-mxnet/issues/11549) Pip > > > package can be much faster (OpenCV version?) 
> > > * #11417 (https://github.com/apache/incubator-mxnet/issues/11417) > > > libomp.so > > > dependency (need REAL fix) > > > * #8532 (https://github.com/apache/incubator-mxnet/issues/8532) > > > mxnet-mkl > > > (v0.12.0) crash when using (conda-installed) numpy with MKL // > > (indirectly) > > > * #11131 (https://github.com/apache/incubator-mxnet/issues/11131) > > > mxnet-cu92 low efficiency // (indirectly) > > > * #10743 (https://github.com/apache/incubator-mxnet/issues/10743) CUDA > > > 9.1.xx failed if not set OLDCMAKECUDA on cmake 3.10.3 with unix > makefile > > or > > > Ninja generator > > > * #10742 (https://github.com/apache/incubator-mxnet/issues/10742) typo > > in > > > cpp-package/CMakeLists.txt > > > * #10737 (https://github.com/apache/incubator-mxnet/issues/10737) > Cmake > > is > > > running again when execute make install > > > * #10543 (https://github.com/apache/incubator-mxnet/issues/10543) > Failed > > > to > > > build from source when set USE_CPP_PACKAGE = 1, fatal error C1083: > unabel > > > to open file: “mxnet-cpp/op.h”: No such file or directory > > > * #10217 (https://github.com/apache/incubator-mxnet/issues/10217) > > Building > > >
Re: [Discussion] MXNet CMake build - raise minimal required version
Thanks Anton for putting this together and your efforts here. I think it's crucial that we maintain and bring the CMake system forward. I have spent a lot of time dealing with CMake issues on different platforms, we really increase developer productivity and platform support by having a streamlined build system. On Thu, Nov 22, 2018 at 4:06 PM Chris Olivier wrote: > > what is meant by: > > > *Profiling* > The profiler is always on even for production release builds, because MXNet > can not be build without it [2]. ? > > you mean it is always built or it is turned on (ie recording and saving > profiling information)? I am not aware of it being turned on by default. > > > profiler has no overhead when built in but not turned on. > > > On Thu, Nov 22, 2018 at 2:35 AM Anton Chernov wrote: > > > Dear MXNet community, > > > > I propose to raise the minimal required cmake version that is needed to > > build MXNet to 3.10 which was tagged on March 16 2018 [1]. > > > > The effort of repairing cmake scripts in general is targeting to deprecate > > make and maintain only 1 build system. > > > > *Need* > > > > The build system is the foundation of every software project. It's quality > > is directly impacting the quality of the project. The MXNet build system is > > fragile, partially broken and not maintained. > > > > Users of MXNet and developers are confused by the fact that 2 build systems > > exist at the same time: make and CMake. > > > > The main functional areas which are impacted by the current state of the > > cmake files are: > > > > *OpenMP* > > The current CMake files mix OpenMP libraries from different compliers which > > is undefined behaviour. It leads to indeterministic crashes on some > > platforms. Build and deployment are very hard. No evidence exists that > > proves that there is any benefit of having llvm OpenMP library as a > > submodule in MXNet. > > > > *BLAS and LAPACK* > > Basic math library usage is mixed up. 
It is hard and confusing to configure > > and the choosing logic of the most optimal library is not present. MKL and > > OpenBLAS are intermixed in an unpredictable manner. > > > > *Profiling* > > The profiler is always on even for production release builds, because MXNet > > can not be build without it [2]. > > > > *CUDA* > > CUDA is detected by 3 different files in the current cmake scripts and the > > choice of those is based on a obscure logic with involves different > > versions of cmake and platforms which it's building on > > > > * CMakeLists.txt > > * cmake/FirstClassLangCuda.cmake > > * 3rdparty/mshadow/cmake/Cuda.cmake > > > > > > *Confusing and misleading cmake user options* > > For example, USE_CUDA / USE_OLDCMAKECUDA. Some of them will do or not do > > what they supposed to based on cmake generator version and version of cmake > > [3]. > > There are currently more than 30 build parameters for MXNet none of them > > documented. Some of them not even located in the main CMakeLists.txt file, > > for example 'BLAS'. > > > > > > *Issues* > > There is a significant amount of github issues related to cmake or build in > > general. New tickets are issued frequently. > > > > * #8702 (https://github.com/apache/incubator-mxnet/issues/8702) > > [DISCUSSION] Should we deprecate Makefile and only use CMake? > > * #5079 (https://github.com/apache/incubator-mxnet/issues/5079) troubles > > building python interface on raspberry pi 3 > > * #1722 (https://github.com/apache/incubator-mxnet/issues/1722) problem: > > compile mxnet with hdfs > > * #11549 (https://github.com/apache/incubator-mxnet/issues/11549) Pip > > package can be much faster (OpenCV version?) 
> > * #11417 (https://github.com/apache/incubator-mxnet/issues/11417) > > libomp.so > > dependency (need REAL fix) > > * #8532 (https://github.com/apache/incubator-mxnet/issues/8532) > > mxnet-mkl > > (v0.12.0) crash when using (conda-installed) numpy with MKL // (indirectly) > > * #11131 (https://github.com/apache/incubator-mxnet/issues/11131) > > mxnet-cu92 low efficiency // (indirectly) > > * #10743 (https://github.com/apache/incubator-mxnet/issues/10743) CUDA > > 9.1.xx failed if not set OLDCMAKECUDA on cmake 3.10.3 with unix makefile or > > Ninja generator > > * #10742 (https://github.com/apache/incubator-mxnet/issues/10742) typo in > > cpp-package/CMakeLists.txt > > * #10737 (https://github.com/apache/incubator-mxnet/issues/10737) Cmake is > > running again when execute make install > > * #10543 (https://github.com/apache/incubator-mxnet/issues/10543) Failed > > to > > build from source when set USE_CPP_PACKAGE = 1, fatal error C1083: unabel > > to open file: “mxnet-cpp/op.h”: No such file or directory > > * #10217 (https://github.com/apache/incubator-mxnet/issues/10217) Building > > with OpenCV causes link errors > > * #10175 (https://github.com/apache/incubator-mxnet/issues/10175) MXNet > > MKLDNN build dependency/flow discussion > > * #10009 (https://github.com/apache/incubator-mxnet/issues/10009) > > [CMAKE][IoT]
Re: [Discussion] Remove bundled llvm OpenMP
Do you not work on CI mostly? My apologies for thinking that was some sort of team effort between you and a few others that were passionate about CI keeping the CI system running smoothly. You have source code, you have the line the assertion is on. If you can’t describe what’s going wrong that causes the assertion, then I don’t really have anything more to add to this conversation beyond what’s below: The whole “mixing omp libraries” is something that occurs in production every day and certainly in everything that uses mkl. It may occasionally cause problems for some edge cases when there are super-complex linking strategies and dynamic loading. But this is not one of those edge cases. Mostly, blaming this is a red herring for other thread-related problems: people switch the omp library, the timing of their code changes, and they stop seeing the problem. I’ve spent my entire career doing heavily multithreaded c++ development and i’ve seen that a million times. is the suggestion that libiomp be removed from mkl? have you spoken with intel? have you consulted Intel at all? and what you are seeing isn’t some “hard to debug random crash”. you’re seeing an assertion which is probably related to omp trying to create a thread pool after a fork and something was done in the mxnet code to make that sketchy to do. I’d suggest filing an issue with the llvm openmp just like you’d file with any other not-well-understood behavior in mxnet. The lack of root-causing the problem and knee-jerk solution here makes me uncomfortable. if you want to see performance differences there’s an environment variable you can set in the mxnet omp tuning code that will print overhead and execution times for the current omp library. On Thu, Nov 22, 2018 at 7:12 AM Anton Chernov wrote: > Hi Chris, > > Thank you for your answer. 
If you have noticed the initial email comes from > me, Anton Chernov (@lebeg on Github) and thus the proposal is not from any > 'Ci' team that you've mentioned, but from me personally. > > You are writing: > > > someone is doing something unhealthy when they fork ... > > I'm missing any context to understand what you mean. > > > we get a lot of performance gain from OMP ... > > There is no data that would prove this statement and therefore it is a > random guess. > > > in many months, no investigation has occurred as to WHY the assertion is > failing. > > The investigation has concluded that this is happening due to undefined > behaviour which is, in my opinion, a suffient answer that does not require > to go any deeper. > > > the pr is vetoed until such a time that the actual root cause of the > problem is known. > > And considering the statements above there is no valid reason to veto the > PR. > > > Best > Anton > > чт, 22 нояб. 2018 г. в 15:38, Chris Olivier : > > > 3x less overhead* > > > > On Thu, Nov 22, 2018 at 6:25 AM Chris Olivier > > wrote: > > > > > someone is doing something unhealthy when they fork, which is causing > an > > > assertion in the openmp library. the same assertion that would fire in > > mkl, > > > which is linked to libiomp5 (exact same omp library). this is new > > behavior > > > and most likely due to an error or suboptimal approach in the forking > > logic > > > in mxnet. > > > > > > in order to circumvent the assert, the Ci team is proposing to remove > the > > > library completely which is equivalent to cutting off your leg to make > > the > > > pain from stubbing your toe go away. > > > > > > we get a lot of performance gain from OMP. is has about a 1/3 less > > > overhead for entering omp regions and also supports omp regions after a > > > fork, which libgomp does not. > > > > > > in many months, no investigation has occurred as to WHY the assertion > is > > > failing. 
> > > > > > the pr is vetoed until such a time that the actual root cause of the > > > problem is known. > > > > > > > > > thanks, > > > > > > -Chris. > > > > > > > > > > > > > > > On Thu, Nov 22, 2018 at 4:36 AM Anton Chernov > > wrote: > > > > > >> Dear MXNet community, > > >> > > >> I would like to drive attention to an important issue that is present > in > > >> the MXNet CMake build: usage of bundled llvm OpenMP library. > > >> > > >> I have opened a PR to remove it: > > >> https://github.com/apache/incubator-mxnet/pull/12160 > > >> > > >> The issue was closed, but I am strong in my oppinion that it's the > right > > >> thing to do. > > >> > > >> *Background* > > >> If you want to use OpenMP pragmas in your code for parallelization you > > >> would supply a special flag to the compiler: > > >> > > >> - Clang / -fopenmp > > >> https://openmp.llvm.org/ > > >> > > >> - GCC / -fopenmp > > >> https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html > > >> > > >> - Intel / [Q]openmp > > >> > > >> > > > https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1 > > >> > > >> - Visual Studio: /openmp (Enable OpenMP 2.0 Support) > > >>
Re: [Discussion] MXNet CMake build - raise minimal required version
You can find relevant information regarding the profiling flag here: https://github.com/apache/incubator-mxnet/issues/11563 чт, 22 нояб. 2018 г. в 16:06, Chris Olivier : > what is meant by: > > > *Profiling* > The profiler is always on even for production release builds, because MXNet > can not be build without it [2]. ? > > you mean it is always built or it is turned on (ie recording and saving > profiling information)? I am not aware of it being turned on by default. > > > profiler has no overhead when built in but not turned on. > > > On Thu, Nov 22, 2018 at 2:35 AM Anton Chernov wrote: > > > Dear MXNet community, > > > > I propose to raise the minimal required cmake version that is needed to > > build MXNet to 3.10 which was tagged on March 16 2018 [1]. > > > > The effort of repairing cmake scripts in general is targeting to > deprecate > > make and maintain only 1 build system. > > > > *Need* > > > > The build system is the foundation of every software project. It's > quality > > is directly impacting the quality of the project. The MXNet build system > is > > fragile, partially broken and not maintained. > > > > Users of MXNet and developers are confused by the fact that 2 build > systems > > exist at the same time: make and CMake. > > > > The main functional areas which are impacted by the current state of the > > cmake files are: > > > > *OpenMP* > > The current CMake files mix OpenMP libraries from different compliers > which > > is undefined behaviour. It leads to indeterministic crashes on some > > platforms. Build and deployment are very hard. No evidence exists that > > proves that there is any benefit of having llvm OpenMP library as a > > submodule in MXNet. > > > > *BLAS and LAPACK* > > Basic math library usage is mixed up. It is hard and confusing to > configure > > and the choosing logic of the most optimal library is not present. MKL > and > > OpenBLAS are intermixed in an unpredictable manner. 
> > > > *Profiling* > > The profiler is always on even for production release builds, because > MXNet > > can not be build without it [2]. > > > > *CUDA* > > CUDA is detected by 3 different files in the current cmake scripts and > the > > choice of those is based on a obscure logic with involves different > > versions of cmake and platforms which it's building on > > > > * CMakeLists.txt > > * cmake/FirstClassLangCuda.cmake > > * 3rdparty/mshadow/cmake/Cuda.cmake > > > > > > *Confusing and misleading cmake user options* > > For example, USE_CUDA / USE_OLDCMAKECUDA. Some of them will do or not do > > what they supposed to based on cmake generator version and version of > cmake > > [3]. > > There are currently more than 30 build parameters for MXNet none of them > > documented. Some of them not even located in the main CMakeLists.txt > file, > > for example 'BLAS'. > > > > > > *Issues* > > There is a significant amount of github issues related to cmake or build > in > > general. New tickets are issued frequently. > > > > * #8702 (https://github.com/apache/incubator-mxnet/issues/8702) > > [DISCUSSION] Should we deprecate Makefile and only use CMake? > > * #5079 (https://github.com/apache/incubator-mxnet/issues/5079) > troubles > > building python interface on raspberry pi 3 > > * #1722 (https://github.com/apache/incubator-mxnet/issues/1722) > problem: > > compile mxnet with hdfs > > * #11549 (https://github.com/apache/incubator-mxnet/issues/11549) Pip > > package can be much faster (OpenCV version?) 
> > * #11417 (https://github.com/apache/incubator-mxnet/issues/11417) > > libomp.so > > dependency (need REAL fix) > > * #8532 (https://github.com/apache/incubator-mxnet/issues/8532) > > mxnet-mkl > > (v0.12.0) crash when using (conda-installed) numpy with MKL // > (indirectly) > > * #11131 (https://github.com/apache/incubator-mxnet/issues/11131) > > mxnet-cu92 low efficiency // (indirectly) > > * #10743 (https://github.com/apache/incubator-mxnet/issues/10743) CUDA > > 9.1.xx failed if not set OLDCMAKECUDA on cmake 3.10.3 with unix makefile > or > > Ninja generator > > * #10742 (https://github.com/apache/incubator-mxnet/issues/10742) typo > in > > cpp-package/CMakeLists.txt > > * #10737 (https://github.com/apache/incubator-mxnet/issues/10737) Cmake > is > > running again when execute make install > > * #10543 (https://github.com/apache/incubator-mxnet/issues/10543) Failed > > to > > build from source when set USE_CPP_PACKAGE = 1, fatal error C1083: unabel > > to open file: “mxnet-cpp/op.h”: No such file or directory > > * #10217 (https://github.com/apache/incubator-mxnet/issues/10217) > Building > > with OpenCV causes link errors > > * #10175 (https://github.com/apache/incubator-mxnet/issues/10175) MXNet > > MKLDNN build dependency/flow discussion > > * #10009 (https://github.com/apache/incubator-mxnet/issues/10009) > > [CMAKE][IoT] Remove pthread from android_arm64 build > > * #9944 (https://github.com/apache/incubator-mxnet/issues/9944) MXNet > > MinGW-w64 build error // (indirectly) > >
Re: [Discussion] Remove bundled llvm OpenMP
The proposal here is not to eliminate the use of OpenMP, but rather to use the compiler's OpenMP implementation instead of a bundled one. I've been bitten by issues with multiple linked OpenMP implementations before in another library, and they were extremely difficult to debug. It seems to me that tackling the issue with the assert is an orthogonal matter altogether. --Alfredo Luque Software Engineer Airbnb Machine Learning Infrastructure On Thu, Nov 22, 2018 at 10:12 AM Anton Chernov wrote: > Hi Chris, > > Thank you for your answer. As you may have noticed, the initial email comes from > me, Anton Chernov (@lebeg on Github), and thus the proposal is not from any > 'CI' team that you've mentioned, but from me personally. > > You are writing: > > > someone is doing something unhealthy when they fork ... > > I'm missing the context to understand what you mean. > > > we get a lot of performance gain from OMP ... > > There is no data that would prove this statement and therefore it is a > random guess. > > > in many months, no investigation has occurred as to WHY the assertion is > failing. > > The investigation has concluded that this is happening due to undefined > behaviour, which is, in my opinion, a sufficient answer that does not require > going any deeper. > > > the pr is vetoed until such a time that the actual root cause of the > problem is known. > > And considering the statements above there is no valid reason to veto the > PR. > > > Best > Anton > > Thu, Nov 22, 2018 at 3:38 PM, Chris Olivier wrote: > > > 3x less overhead* > > > > On Thu, Nov 22, 2018 at 6:25 AM Chris Olivier > > wrote: > > > > > someone is doing something unhealthy when they fork, which is causing > an > > > assertion in the openmp library. the same assertion that would fire in > > mkl, > > > which is linked to libiomp5 (exact same omp library). this is new > > behavior > > > and most likely due to an error or suboptimal approach in the forking > > logic > > > in mxnet. 
> > > > > > in order to circumvent the assert, the CI team is proposing to remove > the > > > library completely, which is equivalent to cutting off your leg to make > > the > > > pain from stubbing your toe go away. > > > > > > we get a lot of performance gain from OMP. it has about 1/3 less > > > overhead for entering omp regions and also supports omp regions after a > > > fork, which libgomp does not. > > > > > > in many months, no investigation has occurred as to WHY the assertion > is > > > failing. > > > > > > the pr is vetoed until such a time that the actual root cause of the > > > problem is known. > > > > > > > > > thanks, > > > > > > -Chris. > > > > > > > > > > > > > > > On Thu, Nov 22, 2018 at 4:36 AM Anton Chernov > > wrote: > > > > > >> Dear MXNet community, > > >> > > >> I would like to draw attention to an important issue that is present > in > > >> the MXNet CMake build: usage of the bundled llvm OpenMP library. > > >> > > >> I have opened a PR to remove it: > > >> https://github.com/apache/incubator-mxnet/pull/12160 > > >> > > >> The issue was closed, but I remain firm in my opinion that it's the > right > > >> thing to do. > > >> > > >> *Background* > > >> If you want to use OpenMP pragmas in your code for parallelization you > > >> would supply a special flag to the compiler: > > >> > > >> - Clang / -fopenmp > > >> https://openmp.llvm.org/ > > >> > > >> - GCC / -fopenmp > > >> https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html > > >> > > >> - Intel / [Q]openmp > > >> > > >> > > > https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1 > > >> > > >> - Visual Studio: /openmp (Enable OpenMP 2.0 Support) > > >> https://msdn.microsoft.com/en-us/library/tt15eb9t.aspx > > >> > > >> Each of the compilers would enable the '#pragma omp' directive during > > >> C/C++ > > >> compilation and arrange for automatic linking of the OpenMP runtime > > >> library > > >> supplied by each compiler separately. 
> > >> > > >> Thus, to use the advantages of an OpenMP implementation one has to > > compile > > >> the code with the corresponding compiler. > > >> > > >> Currently, in MXNet CMake build scripts a bundled version of llvm > OpenMP > > >> is > > >> used ([1] and [2]) to replace the OpenMP library supplied by the > > compiler. > > >> > > >> I will quote here the README from the MKL-DNN (Intel(R) Math Kernel > > >> Library > > >> for Deep Neural Networks): > > >> > > >> "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP > > runtime > > >> library to work. As different OpenMP runtimes may not be binary > > compatible > > >> it's important to ensure that only one OpenMP runtime is used > throughout > > >> the application. Having more than one OpenMP runtime initialized may > > lead > > >> to undefined behavior resulting in incorrect results or crashes." [3] > > >> > > >> And: > > >> > > >> "Using GNU compiler with -fopenmp and -liomp5 options will link the > > >> application with both Intel and GNU OpenMP runtime
Re: [Discussion] Remove bundled llvm OpenMP
Hi Chris, Thank you for your answer. As you may have noticed, the initial email comes from me, Anton Chernov (@lebeg on Github), and thus the proposal is not from any 'CI' team that you've mentioned, but from me personally. You are writing: > someone is doing something unhealthy when they fork ... I'm missing the context to understand what you mean. > we get a lot of performance gain from OMP ... There is no data that would prove this statement and therefore it is a random guess. > in many months, no investigation has occurred as to WHY the assertion is failing. The investigation has concluded that this is happening due to undefined behaviour, which is, in my opinion, a sufficient answer that does not require going any deeper. > the pr is vetoed until such a time that the actual root cause of the problem is known. And considering the statements above there is no valid reason to veto the PR. Best Anton Thu, Nov 22, 2018 at 3:38 PM, Chris Olivier wrote: > 3x less overhead* > > On Thu, Nov 22, 2018 at 6:25 AM Chris Olivier > wrote: > > > someone is doing something unhealthy when they fork, which is causing an > > assertion in the openmp library. the same assertion that would fire in > mkl, > > which is linked to libiomp5 (exact same omp library). this is new > behavior > > and most likely due to an error or suboptimal approach in the forking > logic > > in mxnet. > > > > in order to circumvent the assert, the CI team is proposing to remove the > > library completely, which is equivalent to cutting off your leg to make > the > > pain from stubbing your toe go away. > > > > we get a lot of performance gain from OMP. it has about 1/3 less > > overhead for entering omp regions and also supports omp regions after a > > fork, which libgomp does not. > > > > in many months, no investigation has occurred as to WHY the assertion is > > failing. > > > > the pr is vetoed until such a time that the actual root cause of the > > problem is known. > > > > > > thanks, > > > > -Chris. 
> > > > > > > > > > On Thu, Nov 22, 2018 at 4:36 AM Anton Chernov > wrote: > > > >> Dear MXNet community, > >> > >> I would like to draw attention to an important issue that is present in > >> the MXNet CMake build: usage of the bundled llvm OpenMP library. > >> > >> I have opened a PR to remove it: > >> https://github.com/apache/incubator-mxnet/pull/12160 > >> > >> The issue was closed, but I remain firm in my opinion that it's the right > >> thing to do. > >> > >> *Background* > >> If you want to use OpenMP pragmas in your code for parallelization you > >> would supply a special flag to the compiler: > >> > >> - Clang / -fopenmp > >> https://openmp.llvm.org/ > >> > >> - GCC / -fopenmp > >> https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html > >> > >> - Intel / [Q]openmp > >> > >> > https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1 > >> > >> - Visual Studio: /openmp (Enable OpenMP 2.0 Support) > >> https://msdn.microsoft.com/en-us/library/tt15eb9t.aspx > >> > >> Each of the compilers would enable the '#pragma omp' directive during > >> C/C++ > >> compilation and arrange for automatic linking of the OpenMP runtime > >> library > >> supplied by each compiler separately. > >> > >> Thus, to use the advantages of an OpenMP implementation one has to > compile > >> the code with the corresponding compiler. > >> > >> Currently, in MXNet CMake build scripts a bundled version of llvm OpenMP > >> is > >> used ([1] and [2]) to replace the OpenMP library supplied by the > compiler. > >> > >> I will quote here the README from the MKL-DNN (Intel(R) Math Kernel > >> Library > >> for Deep Neural Networks): > >> > >> "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP > runtime > >> library to work. As different OpenMP runtimes may not be binary > compatible > >> it's important to ensure that only one OpenMP runtime is used throughout > >> the application. 
Having more than one OpenMP runtime initialized may > lead > >> to undefined behavior resulting in incorrect results or crashes." [3] > >> > >> And: > >> > >> "Using GNU compiler with -fopenmp and -liomp5 options will link the > >> application with both Intel and GNU OpenMP runtime libraries. This will > >> lead to undefined behavior of the application." [4] > >> > >> As can be seen from ldd for MXNet: > >> > >> $ ldd build/tests/mxnet_unit_tests | grep omp > >> libomp.so => /.../mxnet/build/3rdparty/openmp/runtime/src/libomp.so > >> (0x7f697bc55000) > >> libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 > >> (0x7f69660cd000) > >> > >> *Performance* > >> > >> The only performance data related to OpenMP in MXNet I was able to find > is > >> here: > >> > >> > https://github.com/apache/incubator-mxnet/issues/9744#issuecomment-367711172 > >> > >> Which in my understanding is testing the impact of different environment > >> variables on the same setup (using the same bundled OpenMP library). > >> > >> The libraries may differ in implementation and
Re: [Discussion] Remove bundled llvm OpenMP
3x less overhead* On Thu, Nov 22, 2018 at 6:25 AM Chris Olivier wrote: > someone is doing something unhealthy when they fork, which is causing an > assertion in the openmp library. the same assertion that would fire in mkl, > which is linked to libiomp5 (exact same omp library). this is new behavior > and most likely due to an error or suboptimal approach in the forking logic > in mxnet. > > in order to circumvent the assert, the CI team is proposing to remove the > library completely, which is equivalent to cutting off your leg to make the > pain from stubbing your toe go away. > > we get a lot of performance gain from OMP. it has about 1/3 less > overhead for entering omp regions and also supports omp regions after a > fork, which libgomp does not. > > in many months, no investigation has occurred as to WHY the assertion is > failing. > > the pr is vetoed until such a time that the actual root cause of the > problem is known. > > > thanks, > > -Chris. > > > > > On Thu, Nov 22, 2018 at 4:36 AM Anton Chernov wrote: > >> Dear MXNet community, >> >> I would like to draw attention to an important issue that is present in >> the MXNet CMake build: usage of the bundled llvm OpenMP library. >> >> I have opened a PR to remove it: >> https://github.com/apache/incubator-mxnet/pull/12160 >> >> The issue was closed, but I remain firm in my opinion that it's the right >> thing to do. 
>> >> *Background* >> If you want to use OpenMP pragmas in your code for parallelization you >> would supply a special flag to the compiler: >> >> - Clang / -fopenmp >> https://openmp.llvm.org/ >> >> - GCC / -fopenmp >> https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html >> >> - Intel / [Q]openmp >> >> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1 >> >> - Visual Studio: /openmp (Enable OpenMP 2.0 Support) >> https://msdn.microsoft.com/en-us/library/tt15eb9t.aspx >> >> Each of the compilers would enable the '#pragma omp' directive during >> C/C++ >> compilation and arrange for automatic linking of the OpenMP runtime >> library >> supplied by each compiler separately. >> >> Thus, to use the advantages of an OpenMP implementation one has to compile >> the code with the corresponding compiler. >> >> Currently, in MXNet CMake build scripts a bundled version of llvm OpenMP >> is >> used ([1] and [2]) to replace the OpenMP library supplied by the compiler. >> >> I will quote here the README from the MKL-DNN (Intel(R) Math Kernel >> Library >> for Deep Neural Networks): >> >> "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP runtime >> library to work. As different OpenMP runtimes may not be binary compatible >> it's important to ensure that only one OpenMP runtime is used throughout >> the application. Having more than one OpenMP runtime initialized may lead >> to undefined behavior resulting in incorrect results or crashes." [3] >> >> And: >> >> "Using GNU compiler with -fopenmp and -liomp5 options will link the >> application with both Intel and GNU OpenMP runtime libraries. This will >> lead to undefined behavior of the application." 
[4] >> >> As can be seen from ldd for MXNet: >> >> $ ldd build/tests/mxnet_unit_tests | grep omp >> libomp.so => /.../mxnet/build/3rdparty/openmp/runtime/src/libomp.so >> (0x7f697bc55000) >> libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 >> (0x7f69660cd000) >> >> *Performance* >> >> The only performance data related to OpenMP in MXNet I was able to find is >> here: >> >> https://github.com/apache/incubator-mxnet/issues/9744#issuecomment-367711172 >> >> Which in my understanding is testing the impact of different environment >> variables on the same setup (using the same bundled OpenMP library). >> >> The libraries may differ in implementation and the Thread Affinity >> Interface [5] may have significant impact on performance. >> >> All compilers support it: >> >> - Clang / KMP_AFFINITY >> >> https://github.com/clang-ykt/openmp/blob/master/runtime/src/kmp_affinity.cpp >> >> - GCC / GOMP_CPU_AFFINITY >> >> https://gcc.gnu.org/onlinedocs/gcc-4.7.1/libgomp/GOMP_005fCPU_005fAFFINITY.html >> >> - Intel / KMP_AFFINITY >> >> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1 >> >> - Visual Studio / SetThreadAffinityMask >> >> https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-setthreadaffinitymask >> >> *Issues* >> >> Failed OpenMP assertion when loading MXNet compiled with DEBUG=1 >> https://github.com/apache/incubator-mxnet/issues/10856 >> >> libomp.so dependency (need REAL fix) >> https://github.com/apache/incubator-mxnet/issues/11417 >> >> mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy with MKL >> https://github.com/apache/incubator-mxnet/issues/8532 >> >> Performance regression when OMP_NUM_THREADS environment variable is not >> set >> https://github.com/apache/incubator-mxnet/issues/9744 >> >> Poor concat CPU performance on CUDA builds >>
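[Editorial note: the ldd check quoted above can be turned into a small script. A minimal sketch, assuming GNU userland; the `count_omp_runtimes` helper name is hypothetical. It counts how many distinct OpenMP runtimes appear in linker output, so a result greater than one indicates the mixed-runtime situation the MKL-DNN README warns about.]

```shell
#!/bin/sh
# Hypothetical helper: count distinct OpenMP runtimes named in linker output.
# Reads the output of `ldd <binary>` on stdin; a count > 1 means that
# several OpenMP runtimes are being mixed in the same process.
count_omp_runtimes() {
    grep -oE 'lib(gomp|omp|iomp5)\.so' | sort -u | wc -l | tr -d ' '
}

# Example with the ldd output quoted in the email above
# (libomp.so + libgomp.so.1 -> two runtimes -> undefined behavior):
printf 'libomp.so => /x/libomp.so\nlibgomp.so.1 => /y/libgomp.so.1\n' \
    | count_omp_runtimes
```

In a real check one would pipe `ldd build/tests/mxnet_unit_tests` into the helper and fail the build if the count exceeds one.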
Re: [Discussion] MXNet CMake build - raise minimal required version
i have not seen any proof that any crashes are due to llvm openmp usage. On Thu, Nov 22, 2018 at 2:35 AM Anton Chernov wrote: > Dear MXNet community, > > I propose to raise the minimal required cmake version that is needed to > build MXNet to 3.10, which was tagged on March 16, 2018 [1]. > > The effort of repairing the cmake scripts in general aims to deprecate > make and maintain only one build system. > > *Need* > > The build system is the foundation of every software project. Its quality > directly impacts the quality of the project. The MXNet build system is > fragile, partially broken and not maintained. > > Users and developers of MXNet are confused by the fact that two build systems > exist at the same time: make and CMake. > > The main functional areas which are impacted by the current state of the > cmake files are: > > *OpenMP* > The current CMake files mix OpenMP libraries from different compilers, which > is undefined behaviour. It leads to nondeterministic crashes on some > platforms. Build and deployment are very hard. No evidence exists that > proves there is any benefit of having the llvm OpenMP library as a > submodule in MXNet. > > *BLAS and LAPACK* > Basic math library usage is mixed up. It is hard and confusing to configure, > and there is no logic for choosing the most optimal library. MKL and > OpenBLAS are intermixed in an unpredictable manner. > > *Profiling* > The profiler is always on even for production release builds, because MXNet > cannot be built without it [2]. > > *CUDA* > CUDA is detected by 3 different files in the current cmake scripts, and the > choice among them is based on an obscure logic which involves different > versions of cmake and the platform it's building on: > > * CMakeLists.txt > * cmake/FirstClassLangCuda.cmake > * 3rdparty/mshadow/cmake/Cuda.cmake > > > *Confusing and misleading cmake user options* > For example, USE_CUDA / USE_OLDCMAKECUDA. 
Some of them will or will not do > what they are supposed to based on the cmake generator and the version of cmake > [3]. > There are currently more than 30 build parameters for MXNet, none of them > documented. Some of them are not even located in the main CMakeLists.txt file, > for example 'BLAS'. > > > *Issues* > There is a significant number of github issues related to cmake or the build in > general. New tickets are filed frequently. > > * #8702 (https://github.com/apache/incubator-mxnet/issues/8702) > [DISCUSSION] Should we deprecate Makefile and only use CMake? > * #5079 (https://github.com/apache/incubator-mxnet/issues/5079) troubles > building python interface on raspberry pi 3 > * #1722 (https://github.com/apache/incubator-mxnet/issues/1722) problem: > compile mxnet with hdfs > * #11549 (https://github.com/apache/incubator-mxnet/issues/11549) Pip > package can be much faster (OpenCV version?) > * #11417 (https://github.com/apache/incubator-mxnet/issues/11417) > libomp.so > dependency (need REAL fix) > * #8532 (https://github.com/apache/incubator-mxnet/issues/8532) > mxnet-mkl > (v0.12.0) crash when using (conda-installed) numpy with MKL // (indirectly) > * #11131 (https://github.com/apache/incubator-mxnet/issues/11131) > mxnet-cu92 low efficiency // (indirectly) > * #10743 (https://github.com/apache/incubator-mxnet/issues/10743) CUDA > 9.1.xx failed if not set OLDCMAKECUDA on cmake 3.10.3 with unix makefile or > Ninja generator > * #10742 (https://github.com/apache/incubator-mxnet/issues/10742) typo in > cpp-package/CMakeLists.txt > * #10737 (https://github.com/apache/incubator-mxnet/issues/10737) Cmake is > running again when execute make install > * #10543 (https://github.com/apache/incubator-mxnet/issues/10543) Failed > to > build from source when set USE_CPP_PACKAGE = 1, fatal error C1083: unabel > to open file: “mxnet-cpp/op.h”: No such file or directory > * #10217 (https://github.com/apache/incubator-mxnet/issues/10217) Building > with OpenCV causes link errors > 
* #10175 (https://github.com/apache/incubator-mxnet/issues/10175) MXNet > MKLDNN build dependency/flow discussion > * #10009 (https://github.com/apache/incubator-mxnet/issues/10009) > [CMAKE][IoT] Remove pthread from android_arm64 build > * #9944 (https://github.com/apache/incubator-mxnet/issues/9944) MXNet > MinGW-w64 build error // (indirectly) > * #9868 (https://github.com/apache/incubator-mxnet/issues/9868) MKL and > CMake > * #9516 (https://github.com/apache/incubator-mxnet/issues/9516) cmake > cuda arch issues > * #9105 (https://github.com/apache/incubator-mxnet/issues/9105) > libmxnet.so load path error > * #9096 (https://github.com/apache/incubator-mxnet/issues/9096) MXNet > built with GPerftools crashes > * #8786 (https://github.com/apache/incubator-mxnet/issues/8786) Link > failure on DEBUG=1 (static member symbol not defined) // (indirectly) > * #8729 (https://github.com/apache/incubator-mxnet/issues/8729) Build > amalgamation using a docker // (indirectly) > * #8667
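[Editorial note: the 3.10 minimum proposed above is usually enforced in two places: `cmake_minimum_required(VERSION 3.10)` at the top of CMakeLists.txt, and a preflight check in CI scripts before configuring. A sketch of the latter in POSIX shell, relying on GNU `sort -V` for version ordering; the `version_ge` helper name is hypothetical and no cmake binary is needed to exercise the comparison logic itself.]

```shell
#!/bin/sh
# Hypothetical preflight check: verify a detected version meets a minimum.
# Uses GNU sort's -V (version sort): the smaller of the two versions
# sorts first, so $1 >= $2 exactly when the minimum of the pair is $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# In CI one would feed in the output of `cmake --version`; here the
# version string is hard-coded for illustration.
if version_ge "3.12.4" "3.10.0"; then
    echo "cmake version ok"
else
    echo "cmake >= 3.10 required" >&2
fi
```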
Re: [Discussion] Remove bundled llvm OpenMP
someone is doing something unhealthy when they fork, which is causing an assertion in the openmp library. the same assertion that would fire in mkl, which is linked to libiomp5 (exact same omp library). this is new behavior and most likely due to an error or suboptimal approach in the forking logic in mxnet. in order to circumvent the assert, the CI team is proposing to remove the library completely, which is equivalent to cutting off your leg to make the pain from stubbing your toe go away. we get a lot of performance gain from OMP. it has about 1/3 less overhead for entering omp regions and also supports omp regions after a fork, which libgomp does not. in many months, no investigation has occurred as to WHY the assertion is failing. the pr is vetoed until such a time that the actual root cause of the problem is known. thanks, -Chris. On Thu, Nov 22, 2018 at 4:36 AM Anton Chernov wrote: > Dear MXNet community, > > I would like to draw attention to an important issue that is present in > the MXNet CMake build: usage of the bundled llvm OpenMP library. > > I have opened a PR to remove it: > https://github.com/apache/incubator-mxnet/pull/12160 > > The issue was closed, but I remain firm in my opinion that it's the right > thing to do. > > *Background* > If you want to use OpenMP pragmas in your code for parallelization you > would supply a special flag to the compiler: > > - Clang / -fopenmp > https://openmp.llvm.org/ > > - GCC / -fopenmp > https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html > > - Intel / [Q]openmp > > https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1 > > - Visual Studio: /openmp (Enable OpenMP 2.0 Support) > https://msdn.microsoft.com/en-us/library/tt15eb9t.aspx > > Each of the compilers would enable the '#pragma omp' directive during C/C++ > compilation and arrange for automatic linking of the OpenMP runtime library > supplied by each compiler separately. 
> > Thus, to use the advantages of an OpenMP implementation one has to compile > the code with the corresponding compiler. > > Currently, in MXNet CMake build scripts a bundled version of llvm OpenMP is > used ([1] and [2]) to replace the OpenMP library supplied by the compiler. > > I will quote here the README from the MKL-DNN (Intel(R) Math Kernel Library > for Deep Neural Networks): > > "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP runtime > library to work. As different OpenMP runtimes may not be binary compatible > it's important to ensure that only one OpenMP runtime is used throughout > the application. Having more than one OpenMP runtime initialized may lead > to undefined behavior resulting in incorrect results or crashes." [3] > > And: > > "Using GNU compiler with -fopenmp and -liomp5 options will link the > application with both Intel and GNU OpenMP runtime libraries. This will > lead to undefined behavior of the application." [4] > > As can be seen from ldd for MXNet: > > $ ldd build/tests/mxnet_unit_tests | grep omp > libomp.so => /.../mxnet/build/3rdparty/openmp/runtime/src/libomp.so > (0x7f697bc55000) > libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 > (0x7f69660cd000) > > *Performance* > > The only performance data related to OpenMP in MXNet I was able to find is > here: > > https://github.com/apache/incubator-mxnet/issues/9744#issuecomment-367711172 > > Which in my understanding is testing the impact of different environment > variables on the same setup (using the same bundled OpenMP library). > > The libraries may differ in implementation and the Thread Affinity > Interface [5] may have significant impact on performance. 
> > All compilers support it: > > - Clang / KMP_AFFINITY > > https://github.com/clang-ykt/openmp/blob/master/runtime/src/kmp_affinity.cpp > > - GCC / GOMP_CPU_AFFINITY > > https://gcc.gnu.org/onlinedocs/gcc-4.7.1/libgomp/GOMP_005fCPU_005fAFFINITY.html > > - Intel / KMP_AFFINITY > > https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1 > > - Visual Studio / SetThreadAffinityMask > > https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-setthreadaffinitymask > > *Issues* > > Failed OpenMP assertion when loading MXNet compiled with DEBUG=1 > https://github.com/apache/incubator-mxnet/issues/10856 > > libomp.so dependency (need REAL fix) > https://github.com/apache/incubator-mxnet/issues/11417 > > mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy with MKL > https://github.com/apache/incubator-mxnet/issues/8532 > > Performance regression when OMP_NUM_THREADS environment variable is not set > https://github.com/apache/incubator-mxnet/issues/9744 > > Poor concat CPU performance on CUDA builds > https://github.com/apache/incubator-mxnet/issues/11905 > > I would appreciate hearing your thoughts. > > > Best > Anton > > [1] > > https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L400-L405 > [2] https://github.com/apache/incubator-mxnet/tree/master/3rdparty
RE: [Discussion] Remove bundled llvm OpenMP
Thanks for the great summary, Anton. I'm curious whether anybody has built mxnet successfully with ICC/ICPC? -Original Message- From: Anton Chernov [mailto:mecher...@gmail.com] Sent: Thursday, November 22, 2018 8:36 PM To: d...@mxnet.apache.org Subject: [Discussion] Remove bundled llvm OpenMP Dear MXNet community, I would like to draw attention to an important issue that is present in the MXNet CMake build: usage of the bundled llvm OpenMP library. I have opened a PR to remove it: https://github.com/apache/incubator-mxnet/pull/12160 The issue was closed, but I remain firm in my opinion that it's the right thing to do. *Background* If you want to use OpenMP pragmas in your code for parallelization you would supply a special flag to the compiler: - Clang / -fopenmp https://openmp.llvm.org/ - GCC / -fopenmp https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html - Intel / [Q]openmp https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1 - Visual Studio: /openmp (Enable OpenMP 2.0 Support) https://msdn.microsoft.com/en-us/library/tt15eb9t.aspx Each of the compilers would enable the '#pragma omp' directive during C/C++ compilation and arrange for automatic linking of the OpenMP runtime library supplied by each compiler separately. Thus, to use the advantages of an OpenMP implementation one has to compile the code with the corresponding compiler. Currently, in MXNet CMake build scripts a bundled version of llvm OpenMP is used ([1] and [2]) to replace the OpenMP library supplied by the compiler. I will quote here the README from the MKL-DNN (Intel(R) Math Kernel Library for Deep Neural Networks): "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP runtime library to work. As different OpenMP runtimes may not be binary compatible it's important to ensure that only one OpenMP runtime is used throughout the application. 
Having more than one OpenMP runtime initialized may lead to undefined behavior resulting in incorrect results or crashes." [3] And: "Using GNU compiler with -fopenmp and -liomp5 options will link the application with both Intel and GNU OpenMP runtime libraries. This will lead to undefined behavior of the application." [4] As can be seen from ldd for MXNet: $ ldd build/tests/mxnet_unit_tests | grep omp libomp.so => /.../mxnet/build/3rdparty/openmp/runtime/src/libomp.so (0x7f697bc55000) libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x7f69660cd000) *Performance* The only performance data related to OpenMP in MXNet I was able to find is here: https://github.com/apache/incubator-mxnet/issues/9744#issuecomment-367711172 Which in my understanding is testing the impact of different environment variables on the same setup (using the same bundled OpenMP library). The libraries may differ in implementation and the Thread Affinity Interface [5] may have significant impact on performance. All compilers support it: - Clang / KMP_AFFINITY https://github.com/clang-ykt/openmp/blob/master/runtime/src/kmp_affinity.cpp - GCC / GOMP_CPU_AFFINITY https://gcc.gnu.org/onlinedocs/gcc-4.7.1/libgomp/GOMP_005fCPU_005fAFFINITY.html - Intel / KMP_AFFINITY https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1 - Visual Studio / SetThreadAffinityMask https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-setthreadaffinitymask *Issues* Failed OpenMP assertion when loading MXNet compiled with DEBUG=1 https://github.com/apache/incubator-mxnet/issues/10856 libomp.so dependency (need REAL fix) https://github.com/apache/incubator-mxnet/issues/11417 mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy with MKL https://github.com/apache/incubator-mxnet/issues/8532 Performance regression when OMP_NUM_THREADS environment variable is not set https://github.com/apache/incubator-mxnet/issues/9744 Poor concat CPU performance on CUDA builds 
https://github.com/apache/incubator-mxnet/issues/11905 I would appreciate hearing your thoughts. Best Anton [1] https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L400-L405 [2] https://github.com/apache/incubator-mxnet/tree/master/3rdparty [3] https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265 [4] https://github.com/intel/mkl-dnn/blame/master/README.md#L278-L280 [5] https://software.intel.com/en-us/node/522691
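[Editorial note: the affinity knobs listed in the email above are plain environment variables, so which of them actually takes effect depends entirely on which OpenMP runtime ends up linked into the binary. A sketch of how the same program would be configured under each runtime; the core ranges and thread count are illustrative only.]

```shell
#!/bin/sh
# Illustrative settings only; each variable is honored only by the
# corresponding runtime, which is why mixing runtimes makes tuning opaque.
export OMP_NUM_THREADS=4                        # standard: all OpenMP runtimes
export GOMP_CPU_AFFINITY="0-3"                  # GNU libgomp: pin to cores 0-3
export KMP_AFFINITY="granularity=fine,compact"  # Intel libiomp5 / LLVM libomp

# The workload (e.g. a python script importing mxnet) would be launched
# from this environment; here we just show what was exported.
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS GOMP_CPU_AFFINITY=$GOMP_CPU_AFFINITY"
```

If the binary links libgomp, `KMP_AFFINITY` is silently ignored, and vice versa; with two runtimes loaded it is not even well defined which one services a given parallel region.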
[Discussion] Remove bundled llvm OpenMP
Dear MXNet community, I would like to draw attention to an important issue that is present in the MXNet CMake build: usage of the bundled llvm OpenMP library. I have opened a PR to remove it: https://github.com/apache/incubator-mxnet/pull/12160 The issue was closed, but I remain firm in my opinion that it's the right thing to do. *Background* If you want to use OpenMP pragmas in your code for parallelization you would supply a special flag to the compiler: - Clang / -fopenmp https://openmp.llvm.org/ - GCC / -fopenmp https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html - Intel / [Q]openmp https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1 - Visual Studio: /openmp (Enable OpenMP 2.0 Support) https://msdn.microsoft.com/en-us/library/tt15eb9t.aspx Each of the compilers would enable the '#pragma omp' directive during C/C++ compilation and arrange for automatic linking of the OpenMP runtime library supplied by each compiler separately. Thus, to use the advantages of an OpenMP implementation one has to compile the code with the corresponding compiler. Currently, in MXNet CMake build scripts a bundled version of llvm OpenMP is used ([1] and [2]) to replace the OpenMP library supplied by the compiler. I will quote here the README from the MKL-DNN (Intel(R) Math Kernel Library for Deep Neural Networks): "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP runtime library to work. As different OpenMP runtimes may not be binary compatible it's important to ensure that only one OpenMP runtime is used throughout the application. Having more than one OpenMP runtime initialized may lead to undefined behavior resulting in incorrect results or crashes." [3] And: "Using GNU compiler with -fopenmp and -liomp5 options will link the application with both Intel and GNU OpenMP runtime libraries. This will lead to undefined behavior of the application." 
[4] As can be seen from ldd for MXNet: $ ldd build/tests/mxnet_unit_tests | grep omp libomp.so => /.../mxnet/build/3rdparty/openmp/runtime/src/libomp.so (0x7f697bc55000) libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x7f69660cd000) *Performance* The only performance data related to OpenMP in MXNet I was able to find is here: https://github.com/apache/incubator-mxnet/issues/9744#issuecomment-367711172 Which in my understanding is testing the impact of different environment variables on the same setup (using the same bundled OpenMP library). The libraries may differ in implementation and the Thread Affinity Interface [5] may have significant impact on performance. All compilers support it: - Clang / KMP_AFFINITY https://github.com/clang-ykt/openmp/blob/master/runtime/src/kmp_affinity.cpp - GCC / GOMP_CPU_AFFINITY https://gcc.gnu.org/onlinedocs/gcc-4.7.1/libgomp/GOMP_005fCPU_005fAFFINITY.html - Intel / KMP_AFFINITY https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1 - Visual Studio / SetThreadAffinityMask https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-setthreadaffinitymask *Issues* Failed OpenMP assertion when loading MXNet compiled with DEBUG=1 https://github.com/apache/incubator-mxnet/issues/10856 libomp.so dependency (need REAL fix) https://github.com/apache/incubator-mxnet/issues/11417 mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy with MKL https://github.com/apache/incubator-mxnet/issues/8532 Performance regression when OMP_NUM_THREADS environment variable is not set https://github.com/apache/incubator-mxnet/issues/9744 Poor concat CPU performance on CUDA builds https://github.com/apache/incubator-mxnet/issues/11905 I would appreciate hearing your thoughts. 
Best,
Anton

[1] https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L400-L405
[2] https://github.com/apache/incubator-mxnet/tree/master/3rdparty
[3] https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
[4] https://github.com/intel/mkl-dnn/blame/master/README.md#L278-L280
[5] https://software.intel.com/en-us/node/522691
[Discussion] MXNet CMake build - raise minimal required version
Dear MXNet community,

I propose to raise the minimal CMake version required to build MXNet to 3.10, which was tagged on March 16, 2018 [1]. The broader effort of repairing the CMake scripts aims to deprecate make and maintain only one build system.

*Need*

The build system is the foundation of every software project; its quality directly impacts the quality of the project. The MXNet build system is fragile, partially broken and not maintained. Users and developers of MXNet are confused by the fact that two build systems exist at the same time: make and CMake. The main functional areas impacted by the current state of the CMake files are:

*OpenMP*

The current CMake files mix OpenMP libraries from different compilers, which is undefined behaviour. It leads to nondeterministic crashes on some platforms and makes build and deployment very hard. No evidence exists of any benefit from having the llvm OpenMP library as a submodule in MXNet.

*BLAS and LAPACK*

Basic math library usage is mixed up. It is hard and confusing to configure, and there is no logic for choosing the most optimal library. MKL and OpenBLAS are intermixed in an unpredictable manner.

*Profiling*

The profiler is always on, even for production release builds, because MXNet cannot be built without it [2].

*CUDA*

CUDA is detected by 3 different files in the current CMake scripts, and the choice among them is based on obscure logic involving the CMake version and the platform being built on:

* CMakeLists.txt
* cmake/FirstClassLangCuda.cmake
* 3rdparty/mshadow/cmake/Cuda.cmake

*Confusing and misleading CMake user options*

For example, USE_CUDA / USE_OLDCMAKECUDA. Some of these options will or will not do what they are supposed to, depending on the CMake generator and the CMake version [3]. There are currently more than 30 build parameters for MXNet, none of them documented. Some are not even located in the main CMakeLists.txt file, for example 'BLAS'.
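For concreteness, the floor-raise itself is a one-line change at the top of CMakeLists.txt (a sketch, not the actual MXNet file). CMake 3.10 already ships first-class CUDA language support (introduced in 3.8), which would allow the three CUDA detection paths listed above to collapse into one:

```cmake
# Sketch: raise the minimum CMake version to 3.10.
cmake_minimum_required(VERSION 3.10)

# With 3.8+, CUDA can be enabled as a first-class language instead of
# being probed through three separate scripts, e.g.:
# project(mxnet C CXX CUDA)
project(mxnet C CXX)
```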
*Issues*

There is a significant number of GitHub issues related to CMake or the build in general, and new tickets are filed frequently:

* #8702 (https://github.com/apache/incubator-mxnet/issues/8702) [DISCUSSION] Should we deprecate Makefile and only use CMake?
* #5079 (https://github.com/apache/incubator-mxnet/issues/5079) troubles building python interface on raspberry pi 3
* #1722 (https://github.com/apache/incubator-mxnet/issues/1722) problem: compile mxnet with hdfs
* #11549 (https://github.com/apache/incubator-mxnet/issues/11549) Pip package can be much faster (OpenCV version?)
* #11417 (https://github.com/apache/incubator-mxnet/issues/11417) libomp.so dependency (need REAL fix)
* #8532 (https://github.com/apache/incubator-mxnet/issues/8532) mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy with MKL // (indirectly)
* #11131 (https://github.com/apache/incubator-mxnet/issues/11131) mxnet-cu92 low efficiency // (indirectly)
* #10743 (https://github.com/apache/incubator-mxnet/issues/10743) CUDA 9.1.xx failed if not set OLDCMAKECUDA on cmake 3.10.3 with unix makefile or Ninja generator
* #10742 (https://github.com/apache/incubator-mxnet/issues/10742) typo in cpp-package/CMakeLists.txt
* #10737 (https://github.com/apache/incubator-mxnet/issues/10737) Cmake is running again when execute make install
* #10543 (https://github.com/apache/incubator-mxnet/issues/10543) Failed to build from source when set USE_CPP_PACKAGE = 1, fatal error C1083: unable to open file: "mxnet-cpp/op.h": No such file or directory
* #10217 (https://github.com/apache/incubator-mxnet/issues/10217) Building with OpenCV causes link errors
* #10175 (https://github.com/apache/incubator-mxnet/issues/10175) MXNet MKLDNN build dependency/flow discussion
* #10009 (https://github.com/apache/incubator-mxnet/issues/10009) [CMAKE][IoT] Remove pthread from android_arm64 build
* #9944 (https://github.com/apache/incubator-mxnet/issues/9944) MXNet MinGW-w64 build error // (indirectly)
* #9868 (https://github.com/apache/incubator-mxnet/issues/9868) MKL and CMake
* #9516 (https://github.com/apache/incubator-mxnet/issues/9516) cmake cuda arch issues
* #9105 (https://github.com/apache/incubator-mxnet/issues/9105) libmxnet.so load path error
* #9096 (https://github.com/apache/incubator-mxnet/issues/9096) MXNet built with GPerftools crashes
* #8786 (https://github.com/apache/incubator-mxnet/issues/8786) Link failure on DEBUG=1 (static member symbol not defined) // (indirectly)
* #8729 (https://github.com/apache/incubator-mxnet/issues/8729) Build amalgamation using a docker // (indirectly)
* #8667 (https://github.com/apache/incubator-mxnet/issues/8667) Compiler/linker error while trying to build from source on Mac OSX Sierra 10.12.6
* #8295 (https://github.com/apache/incubator-mxnet/issues/8295) Building with cmake - error
* #7852 (https://github.com/apache/incubator-mxnet/issues/7852) Trouble installing MXNet on Raspberry Pi 3
* #13303