[jira] [Updated] (MXNET-82) Tutorial for implementing a sparse operator in backend

2018-03-12 Thread Haibin Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/MXNET-82?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibin Lin updated MXNET-82:

Assignee: Haibin Lin
  Status: In Progress  (was: To Do)

> Tutorial for implementing a sparse operator in backend
> --
>
> Key: MXNET-82
> URL: https://issues.apache.org/jira/browse/MXNET-82
> Project: Apache MXNet
>  Issue Type: Task
>Reporter: Haibin Lin
>Assignee: Haibin Lin
>Priority: Minor
>
> Something like 
> [https://mxnet.incubator.apache.org/faq/add_op_in_backend.html] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@mxnet.apache.org
For additional commands, e-mail: issues-h...@mxnet.apache.org



[jira] [Created] (MXNET-82) Tutorial for implementing a sparse operator in backend

2018-03-12 Thread Haibin Lin (JIRA)
Haibin Lin created MXNET-82:
---

 Summary: Tutorial for implementing a sparse operator in backend
 Key: MXNET-82
 URL: https://issues.apache.org/jira/browse/MXNET-82
 Project: Apache MXNet
  Issue Type: Task
Reporter: Haibin Lin


Something like [https://mxnet.incubator.apache.org/faq/add_op_in_backend.html] 






[jira] [Commented] (MXNET-60) MXNET_MKLDNN_DEBUG=1 produces errors

2018-03-12 Thread Chris Olivier (JIRA)

[ 
https://issues.apache.org/jira/browse/MXNET-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16395792#comment-16395792
 ] 

Chris Olivier commented on MXNET-60:


I'm lost – how is this related to the profiler?

> MXNET_MKLDNN_DEBUG=1 produces errors
> 
>
> Key: MXNET-60
> URL: https://issues.apache.org/jira/browse/MXNET-60
> Project: Apache MXNet
>  Issue Type: Bug
>Reporter: Marco de Abreu
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-9995/32/pipeline/483]
> Setting ``MXNET_MKLDNN_DEBUG=1`` as environment variable will produce the 
> following error in tests. This happens across all configurations and seeds. I 
> do not think that this is a test failure.
>  
> {code:java}
> ==
> ERROR: test_gluon_model_zoo.test_models
> --
> Traceback (most recent call last):
> File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in 
> runTest
> self.test(*self.arg)
> File "/work/mxnet/tests/python/unittest/common.py", line 157, in test_new
> orig_test(*args, **kwargs)
> File "/work/mxnet/tests/python/unittest/test_gluon_model_zoo.py", line 50, in 
> test_models
> model(mx.nd.random.uniform(shape=data_shape)).wait_to_read()
> File "/work/mxnet/python/mxnet/ndarray/ndarray.py", line 1650, in wait_to_read
> check_call(_LIB.MXNDArrayWaitToRead(self.handle))
> File "/work/mxnet/python/mxnet/base.py", line 149, in check_call
> raise MXNetError(py_str(_LIB.MXGetLastError()))
> MXNetError: [17:10:12] src/operator/nn/mkldnn/mkldnn_base.cc:395: Check 
> failed: similar
> Stack trace returned 10 entries:
> [bt] (0) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x5b)
>  [0x7f06ccf3745b]
> [bt] (1) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28)
>  [0x7f06ccf38478]
> [bt] (2) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::OpCheck::Run(std::function  (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector std::allocator > const&, std::vector std::allocator > const&, std::vector std::allocator > const&)>, nnvm::NodeAttrs const&, 
> mxnet::OpContext const&, std::vector std::allocator > const&, std::vector std::allocator > const&, std::vector std::allocator > const&)+0x3ca8) [0x7f06ccf54198]
> [bt] (3) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x2a910d9) 
> [0x7f06cf55a0d9]
> [bt] (4) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler (mxnet::RunContext), mxnet::imperative::PushFComputeEx(std::function (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector std::allocator > const&, std::vector std::allocator > const&, std::vector std::allocator > const&)> const&, nnvm::Op const*, 
> nnvm::NodeAttrs const&, mxnet::Context const&, 
> std::vector > 
> const&, std::vector 
> > const&, std::vector 
> const&, std::vector > 
> const&, std::vector > 
> const&, std::vector 
> const&)::
> {lambda(mxnet::RunContext)#1}
> >::_M_invoke(std::_Any_data const&, mxnet::RunContext&&)+0x7c) 
> >[0x7f06cf77608c]
> [bt] (5) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x3148fdb) 
> [0x7f06cfc11fdb]
> [bt] (6) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext,
>  mxnet::engine::OprBlock*)+0xcb5) [0x7f06cfc0b1a5]
> [bt] (7) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler (std::shared_ptr), 
> mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*,
>  bool)::
> {lambda()#1}
> ::operator()() const::
> {lambda(std::shared_ptr)#1}
> >::_M_invoke(std::_Any_data const&, 
> >std::shared_ptr&&)+0xd9) [0x7f06cfc1d309]
> [bt] (8) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::thread::_Impl (std::shared_ptr)> 
> >::_M_run()+0x4a) [0x7f06cfc1c43a]
> [bt] (9) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f06d7ca4c80]
>  >> begin captured stdout << -
> ResNetV1(
> (features): HybridSequential(
> (0): Conv2D(None -> 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), 
> bias=False)
> (1): BatchNorm(fix_gamma=False, 

[jira] [Updated] (MXNET-81) Fix crash with mx.nd.ones

2018-03-12 Thread Anirudh Subramanian (JIRA)

 [ 
https://issues.apache.org/jira/browse/MXNET-81?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Subramanian updated MXNET-81:
-
Component/s: MXNet Engine

> Fix crash with mx.nd.ones
> -
>
> Key: MXNET-81
> URL: https://issues.apache.org/jira/browse/MXNET-81
> Project: Apache MXNet
>  Issue Type: Bug
>  Components: MXNet Engine
>Reporter: Anirudh Subramanian
>Assignee: Anirudh Subramanian
>Priority: Minor
>







[jira] [Assigned] (MXNET-81) Fix crash with mx.nd.ones

2018-03-12 Thread Anirudh Subramanian (JIRA)

 [ 
https://issues.apache.org/jira/browse/MXNET-81?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Subramanian reassigned MXNET-81:


Assignee: Anirudh Subramanian

> Fix crash with mx.nd.ones
> -
>
> Key: MXNET-81
> URL: https://issues.apache.org/jira/browse/MXNET-81
> Project: Apache MXNet
>  Issue Type: Bug
>Reporter: Anirudh Subramanian
>Assignee: Anirudh Subramanian
>Priority: Minor
>







[jira] [Created] (MXNET-80) Fix average pooling kernel size assignment error

2018-03-12 Thread Xingjian Shi (JIRA)
Xingjian Shi created MXNET-80:
-

 Summary: Fix average pooling kernel size assignment error
 Key: MXNET-80
 URL: https://issues.apache.org/jira/browse/MXNET-80
 Project: Apache MXNet
  Issue Type: New Feature
Reporter: Xingjian Shi


When the "global_pool" parameter is set to "True", there should be no need to 
set the kernel parameter in the `Pooling` operator.
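
To illustrate why the kernel is redundant under global pooling, here is a minimal pure-Python sketch. It is not MXNet's `Pooling` implementation; the function name `avg_pool2d` and its signature are hypothetical, and the stride is assumed equal to the kernel for simplicity. With `global_pool=True` the window is simply the whole input, so any user-supplied kernel can be ignored.

```python
def avg_pool2d(image, kernel=None, global_pool=False):
    """Average-pool a 2D list of floats (hypothetical helper, stride == kernel)."""
    height, width = len(image), len(image[0])
    if global_pool:
        # Global pooling: the window covers the whole input,
        # so the kernel is implied by the input shape.
        kernel = (height, width)
    elif kernel is None:
        raise ValueError("kernel is required unless global_pool is True")
    kernel_h, kernel_w = kernel
    output = []
    for top in range(0, height - kernel_h + 1, kernel_h):
        row = []
        for left in range(0, width - kernel_w + 1, kernel_w):
            window = [
                image[r][c]
                for r in range(top, top + kernel_h)
                for c in range(left, left + kernel_w)
            ]
            row.append(sum(window) / len(window))
        output.append(row)
    return output
```

Calling `avg_pool2d(data, global_pool=True)` needs no kernel argument, which is the behaviour this issue asks for in the `Pooling` operator.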






[jira] [Comment Edited] (MXNET-11) Multithreaded Inference

2018-03-12 Thread Chris Olivier (JIRA)

[ 
https://issues.apache.org/jira/browse/MXNET-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16395518#comment-16395518
 ] 

Chris Olivier edited comment on MXNET-11 at 3/12/18 5:05 PM:
-

I suppose the multiple threads would call the C API.  Getting Python to do 
multithreading is a bit tricky, so I wouldn't be concerned with a Python entry 
point at this point.

So above, did you launch many processes, each loading a model, and run 
inference in parallel?

Or did you load a model in many threads and run inference through those?

The main problem right now is that there's no good way to *share* parameters 
between graphs, so that would probably be a large part of the work. There are 
actually several other use cases for this, including Keras integration.

This item relates to that:  https://issues.apache.org/jira/browse/MXNET-28
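
As a rough illustration of the goal (one set of parameters shared by several inference threads, rather than one model copy per thread or process), here is a minimal pure-Python sketch. It does not use the MXNet C API; `SharedModel` and `run_parallel_inference` are hypothetical names, and the "model" is just a dot product over read-only weights.

```python
import threading


class SharedModel:
    """Hypothetical stand-in for a model whose parameters are read-only."""

    def __init__(self, weights, bias):
        self.weights = tuple(weights)  # immutable, so safe to read from many threads
        self.bias = bias

    def predict(self, features):
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias


def run_parallel_inference(model, batches):
    """Run one thread per input, all reading the same model parameters."""
    results = [None] * len(batches)

    def worker(index, features):
        # Each thread writes only to its own slot, so no lock is needed here.
        results[index] = model.predict(features)

    threads = [
        threading.Thread(target=worker, args=(i, batch))
        for i, batch in enumerate(batches)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

The point of the sketch is only that the parameters live in one place; how graphs in the engine would share them is the open work described above.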

 


was (Author: cjolivier01):
I suppose the multiple threads would call the C API.  Python is a bit tricky to 
get it to do multithreading, so I wouldn't be concerned with a python entry 
point at this point.

So above, you launched many processes, loaded a model and ran inference in 
parallel?

Or you loaded a model in many threads and ran inference through those?

The main problem right now is that there's not a good way to *share* parameters 
between graphs, so that would probably be some large bulk of the work. There's 
actually several other use-cases for this, including Keras integration.

 

 

> Multithreaded Inference
> ---
>
> Key: MXNET-11
> URL: https://issues.apache.org/jira/browse/MXNET-11
> Project: Apache MXNet
>  Issue Type: Epic
>  Components: MXNet Engine
>Reporter: Chris Olivier
>Priority: Major
>  Labels: inference
>
> Add the ability to do multithreaded inference without using fork() or using 
> multiple copies of a given model






[jira] [Commented] (MXNET-11) Multithreaded Inference

2018-03-12 Thread Chris Olivier (JIRA)

[ 
https://issues.apache.org/jira/browse/MXNET-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16395518#comment-16395518
 ] 

Chris Olivier commented on MXNET-11:


I suppose the multiple threads would call the C API.  Getting Python to do 
multithreading is a bit tricky, so I wouldn't be concerned with a Python entry 
point at this point.

So above, did you launch many processes, each loading a model, and run 
inference in parallel?

Or did you load a model in many threads and run inference through those?

The main problem right now is that there's no good way to *share* parameters 
between graphs, so that would probably be a large part of the work. There are 
actually several other use cases for this, including Keras integration.

 

 

> Multithreaded Inference
> ---
>
> Key: MXNET-11
> URL: https://issues.apache.org/jira/browse/MXNET-11
> Project: Apache MXNet
>  Issue Type: Epic
>  Components: MXNet Engine
>Reporter: Chris Olivier
>Priority: Major
>  Labels: inference
>
> Add the ability to do multithreaded inference without using fork() or using 
> multiple copies of a given model






[jira] [Commented] (MXNET-11) Multithreaded Inference

2018-03-12 Thread Chris Olivier (JIRA)

[ 
https://issues.apache.org/jira/browse/MXNET-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16395506#comment-16395506
 ] 

Chris Olivier commented on MXNET-11:


oops caps...


On Mon, Mar 12, 2018 at 9:56 AM, Chris Olivier 



> Multithreaded Inference
> ---
>
> Key: MXNET-11
> URL: https://issues.apache.org/jira/browse/MXNET-11
> Project: Apache MXNet
>  Issue Type: Epic
>  Components: MXNet Engine
>Reporter: Chris Olivier
>Priority: Major
>  Labels: inference
>
> Add the ability to do multithreaded inference without using fork() or using 
> multiple copies of a given model






[jira] [Commented] (MXNET-11) Multithreaded Inference

2018-03-12 Thread Chris Olivier (JIRA)

[ 
https://issues.apache.org/jira/browse/MXNET-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16395505#comment-16395505
 ] 

Chris Olivier commented on MXNET-11:


I CAN'T SEEM TO ADD A COMMENT FOR SOME REASON...

On Sun, Mar 11, 2018 at 10:43 PM, Patric Zhao (JIRA) 



> Multithreaded Inference
> ---
>
> Key: MXNET-11
> URL: https://issues.apache.org/jira/browse/MXNET-11
> Project: Apache MXNet
>  Issue Type: Epic
>  Components: MXNet Engine
>Reporter: Chris Olivier
>Priority: Major
>  Labels: inference
>
> Add the ability to do multithreaded inference without using fork() or using 
> multiple copies of a given model






[jira] [Updated] (MXNET-8) Add basic cython build to cmake build system

2018-03-12 Thread Chris Olivier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MXNET-8?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Olivier updated MXNET-8:
--
Status: Done  (was: In Progress)

> Add basic cython build to cmake build system
> 
>
> Key: MXNET-8
> URL: https://issues.apache.org/jira/browse/MXNET-8
> Project: Apache MXNet
>  Issue Type: Improvement
>  Components: CI Build , MXNet Engine
>Reporter: Chris Olivier
>Assignee: Chris Olivier
>Priority: Major
>  Labels: cython, performance
>
> Make Cython code buildable and runnable with CMake builds.
> Will address Makefile or other build types in subsequent ticket(s)






[jira] [Commented] (MXNET-8) Add basic cython build to cmake build system

2018-03-12 Thread Chris Olivier (JIRA)

[ 
https://issues.apache.org/jira/browse/MXNET-8?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16395491#comment-16395491
 ] 

Chris Olivier commented on MXNET-8:
---

CI can't run Python 2 on Windows yet:

 

https://issues.apache.org/jira/browse/MXNET-61

> Add basic cython build to cmake build system
> 
>
> Key: MXNET-8
> URL: https://issues.apache.org/jira/browse/MXNET-8
> Project: Apache MXNet
>  Issue Type: Improvement
>  Components: CI Build , MXNet Engine
>Reporter: Chris Olivier
>Assignee: Chris Olivier
>Priority: Major
>  Labels: cython, performance
>
> Make Cython code buildable and runnable with CMake builds.
> Will address Makefile or other build types in subsequent ticket(s)






[jira] [Created] (MXNET-78) Get unit tests working from normal build and IntelliJ

2018-03-12 Thread Chris Olivier (JIRA)
Chris Olivier created MXNET-78:
--

 Summary: Get unit tests working from normal build and IntelliJ
 Key: MXNET-78
 URL: https://issues.apache.org/jira/browse/MXNET-78
 Project: Apache MXNet
  Issue Type: Task
Reporter: Chris Olivier









[jira] [Created] (MXNET-76) Get build working with IntelliJ

2018-03-12 Thread Chris Olivier (JIRA)
Chris Olivier created MXNET-76:
--

 Summary: Get build working with IntelliJ
 Key: MXNET-76
 URL: https://issues.apache.org/jira/browse/MXNET-76
 Project: Apache MXNet
  Issue Type: Task
Reporter: Chris Olivier









[jira] [Updated] (MXNET-75) Java API

2018-03-12 Thread Chris Olivier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MXNET-75?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Olivier updated MXNET-75:
---
Component/s: Java API

> Java API
> 
>
> Key: MXNET-75
> URL: https://issues.apache.org/jira/browse/MXNET-75
> Project: Apache MXNet
>  Issue Type: Epic
>  Components: Java API
>Reporter: Chris Olivier
>Assignee: Chris Olivier
>Priority: Major
>
> MXNet Java API was started by joern@
> It would be great to continue this feature to production






[jira] [Updated] (MXNET-75) Java API

2018-03-12 Thread Chris Olivier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MXNET-75?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Olivier updated MXNET-75:
---
Epic Name: Java API

> Java API
> 
>
> Key: MXNET-75
> URL: https://issues.apache.org/jira/browse/MXNET-75
> Project: Apache MXNet
>  Issue Type: Epic
>Reporter: Chris Olivier
>Assignee: Chris Olivier
>Priority: Major
>
> MXNet Java API was started by joern@
> It would be great to continue this feature to production






[jira] [Updated] (MXNET-75) Java API

2018-03-12 Thread Chris Olivier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MXNET-75?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Olivier updated MXNET-75:
---
Description: 
MXNet Java API was started by joern@

It would be great to continue this feature to production

> Java API
> 
>
> Key: MXNET-75
> URL: https://issues.apache.org/jira/browse/MXNET-75
> Project: Apache MXNet
>  Issue Type: Epic
>Reporter: Chris Olivier
>Priority: Major
>
> MXNet Java API was started by joern@
> It would be great to continue this feature to production






[jira] [Assigned] (MXNET-75) Java API

2018-03-12 Thread Chris Olivier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MXNET-75?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Olivier reassigned MXNET-75:
--

Assignee: Chris Olivier

> Java API
> 
>
> Key: MXNET-75
> URL: https://issues.apache.org/jira/browse/MXNET-75
> Project: Apache MXNet
>  Issue Type: Epic
>Reporter: Chris Olivier
>Assignee: Chris Olivier
>Priority: Major
>
> MXNet Java API was started by joern@
> It would be great to continue this feature to production






[jira] [Created] (MXNET-75) Java API

2018-03-12 Thread Chris Olivier (JIRA)
Chris Olivier created MXNET-75:
--

 Summary: Java API
 Key: MXNET-75
 URL: https://issues.apache.org/jira/browse/MXNET-75
 Project: Apache MXNet
  Issue Type: Task
Reporter: Chris Olivier









[jira] [Comment Edited] (MXNET-60) MXNET_MKLDNN_DEBUG=1 produces errors

2018-03-12 Thread Patric Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/MXNET-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394900#comment-16394900
 ] 

Patric Zhao edited comment on MXNET-60 at 3/12/18 7:37 AM:
---

[~rahul003] Sorry for the confusion. There were two purposes here:
 # Check whether this open issue still exists. It seems it doesn't.
 # Check whether we can get profiling results for the MKL-DNN ops with the 
latest profiling environment. It seems we can't.

Thanks,

 


was (Author: patric zhao):
[~rahul003]    Sorry for the confusion. Two purposes in here:
 # Check if this open issue can be reproduced. Seems NOT.
 # Check if we can get the profiling results of MKL-DNN OP by latest profiling 
environment. Seems NOT.

Thanks,

 

> MXNET_MKLDNN_DEBUG=1 produces errors
> 
>
> Key: MXNET-60
> URL: https://issues.apache.org/jira/browse/MXNET-60
> Project: Apache MXNet
>  Issue Type: Bug
>Reporter: Marco de Abreu
>Priority: Major
>
> [http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-9995/32/pipeline/483]
> Setting ``MXNET_MKLDNN_DEBUG=1`` as environment variable will produce the 
> following error in tests. This happens across all configurations and seeds. I 
> do not think that this is a test failure.
>  
> {code:java}
> ==
> ERROR: test_gluon_model_zoo.test_models
> --
> Traceback (most recent call last):
> File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in 
> runTest
> self.test(*self.arg)
> File "/work/mxnet/tests/python/unittest/common.py", line 157, in test_new
> orig_test(*args, **kwargs)
> File "/work/mxnet/tests/python/unittest/test_gluon_model_zoo.py", line 50, in 
> test_models
> model(mx.nd.random.uniform(shape=data_shape)).wait_to_read()
> File "/work/mxnet/python/mxnet/ndarray/ndarray.py", line 1650, in wait_to_read
> check_call(_LIB.MXNDArrayWaitToRead(self.handle))
> File "/work/mxnet/python/mxnet/base.py", line 149, in check_call
> raise MXNetError(py_str(_LIB.MXGetLastError()))
> MXNetError: [17:10:12] src/operator/nn/mkldnn/mkldnn_base.cc:395: Check 
> failed: similar
> Stack trace returned 10 entries:
> [bt] (0) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x5b)
>  [0x7f06ccf3745b]
> [bt] (1) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28)
>  [0x7f06ccf38478]
> [bt] (2) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::OpCheck::Run(std::function  (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector std::allocator > const&, std::vector std::allocator > const&, std::vector std::allocator > const&)>, nnvm::NodeAttrs const&, 
> mxnet::OpContext const&, std::vector std::allocator > const&, std::vector std::allocator > const&, std::vector std::allocator > const&)+0x3ca8) [0x7f06ccf54198]
> [bt] (3) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x2a910d9) 
> [0x7f06cf55a0d9]
> [bt] (4) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler (mxnet::RunContext), mxnet::imperative::PushFComputeEx(std::function (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector std::allocator > const&, std::vector std::allocator > const&, std::vector std::allocator > const&)> const&, nnvm::Op const*, 
> nnvm::NodeAttrs const&, mxnet::Context const&, 
> std::vector > 
> const&, std::vector 
> > const&, std::vector 
> const&, std::vector > 
> const&, std::vector > 
> const&, std::vector 
> const&)::
> {lambda(mxnet::RunContext)#1}
> >::_M_invoke(std::_Any_data const&, mxnet::RunContext&&)+0x7c) 
> >[0x7f06cf77608c]
> [bt] (5) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x3148fdb) 
> [0x7f06cfc11fdb]
> [bt] (6) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext,
>  mxnet::engine::OprBlock*)+0xcb5) [0x7f06cfc0b1a5]
> [bt] (7) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler (std::shared_ptr), 
> mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*,
>  bool)::
> {lambda()#1}
> ::operator()() const::
> {lambda(std::shared_ptr)#1}
> >::_M_invoke(std::_Any_data const&, 
> >std::shared_ptr&&)+0xd9) [0x7f06cfc1d309]
> [bt] (8) 
> 

[jira] [Comment Edited] (MXNET-60) MXNET_MKLDNN_DEBUG=1 produces errors

2018-03-12 Thread Patric Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/MXNET-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394900#comment-16394900
 ] 

Patric Zhao edited comment on MXNET-60 at 3/12/18 7:15 AM:
---

[~rahul003] Sorry for the confusion. There were two purposes here:
 # Check whether this open issue can be reproduced. It seems it cannot.
 # Check whether we can get profiling results for the MKL-DNN ops with the 
latest profiling environment. It seems we cannot.

Thanks,

 


was (Author: patric zhao):
[~rahul003] two purposes in here:
 # Check if this open issue can be reproduced. Seems NOT.
 # Check if we can get the profiling results of MKL-DNN OP by latest profiling 
environment. Seems NOT.

Thanks,

 

> MXNET_MKLDNN_DEBUG=1 produces errors
> 
>
> Key: MXNET-60
> URL: https://issues.apache.org/jira/browse/MXNET-60
> Project: Apache MXNet
>  Issue Type: Bug
>Reporter: Marco de Abreu
>Priority: Major
>
> [http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-9995/32/pipeline/483]
> Setting ``MXNET_MKLDNN_DEBUG=1`` as environment variable will produce the 
> following error in tests. This happens across all configurations and seeds. I 
> do not think that this is a test failure.
>  
> {code:java}
> ==
> ERROR: test_gluon_model_zoo.test_models
> --
> Traceback (most recent call last):
> File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in 
> runTest
> self.test(*self.arg)
> File "/work/mxnet/tests/python/unittest/common.py", line 157, in test_new
> orig_test(*args, **kwargs)
> File "/work/mxnet/tests/python/unittest/test_gluon_model_zoo.py", line 50, in 
> test_models
> model(mx.nd.random.uniform(shape=data_shape)).wait_to_read()
> File "/work/mxnet/python/mxnet/ndarray/ndarray.py", line 1650, in wait_to_read
> check_call(_LIB.MXNDArrayWaitToRead(self.handle))
> File "/work/mxnet/python/mxnet/base.py", line 149, in check_call
> raise MXNetError(py_str(_LIB.MXGetLastError()))
> MXNetError: [17:10:12] src/operator/nn/mkldnn/mkldnn_base.cc:395: Check 
> failed: similar
> Stack trace returned 10 entries:
> [bt] (0) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x5b)
>  [0x7f06ccf3745b]
> [bt] (1) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28)
>  [0x7f06ccf38478]
> [bt] (2) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::OpCheck::Run(std::function  (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector std::allocator > const&, std::vector std::allocator > const&, std::vector std::allocator > const&)>, nnvm::NodeAttrs const&, 
> mxnet::OpContext const&, std::vector std::allocator > const&, std::vector std::allocator > const&, std::vector std::allocator > const&)+0x3ca8) [0x7f06ccf54198]
> [bt] (3) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x2a910d9) 
> [0x7f06cf55a0d9]
> [bt] (4) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler (mxnet::RunContext), mxnet::imperative::PushFComputeEx(std::function (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector std::allocator > const&, std::vector std::allocator > const&, std::vector std::allocator > const&)> const&, nnvm::Op const*, 
> nnvm::NodeAttrs const&, mxnet::Context const&, 
> std::vector > 
> const&, std::vector 
> > const&, std::vector 
> const&, std::vector > 
> const&, std::vector > 
> const&, std::vector 
> const&)::
> {lambda(mxnet::RunContext)#1}
> >::_M_invoke(std::_Any_data const&, mxnet::RunContext&&)+0x7c) 
> >[0x7f06cf77608c]
> [bt] (5) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x3148fdb) 
> [0x7f06cfc11fdb]
> [bt] (6) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext,
>  mxnet::engine::OprBlock*)+0xcb5) [0x7f06cfc0b1a5]
> [bt] (7) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler (std::shared_ptr), 
> mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*,
>  bool)::
> {lambda()#1}
> ::operator()() const::
> {lambda(std::shared_ptr)#1}
> >::_M_invoke(std::_Any_data const&, 
> >std::shared_ptr&&)+0xd9) [0x7f06cfc1d309]
> [bt] (8) 
> 

[jira] [Commented] (MXNET-60) MXNET_MKLDNN_DEBUG=1 produces errors

2018-03-12 Thread Patric Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/MXNET-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394900#comment-16394900
 ] 

Patric Zhao commented on MXNET-60:
--

[~rahul003] There were two purposes here:
 # Check whether this open issue can be reproduced. It seems it cannot.
 # Check whether we can get profiling results for the MKL-DNN ops with the 
latest profiling environment. It seems we cannot.

Thanks,

 

> MXNET_MKLDNN_DEBUG=1 produces errors
> 
>
> Key: MXNET-60
> URL: https://issues.apache.org/jira/browse/MXNET-60
> Project: Apache MXNet
>  Issue Type: Bug
>Reporter: Marco de Abreu
>Priority: Major
>
> [http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-9995/32/pipeline/483]
> Setting ``MXNET_MKLDNN_DEBUG=1`` as environment variable will produce the 
> following error in tests. This happens across all configurations and seeds. I 
> do not think that this is a test failure.
>  
> {code:java}
> ==
> ERROR: test_gluon_model_zoo.test_models
> --
> Traceback (most recent call last):
> File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in 
> runTest
> self.test(*self.arg)
> File "/work/mxnet/tests/python/unittest/common.py", line 157, in test_new
> orig_test(*args, **kwargs)
> File "/work/mxnet/tests/python/unittest/test_gluon_model_zoo.py", line 50, in 
> test_models
> model(mx.nd.random.uniform(shape=data_shape)).wait_to_read()
> File "/work/mxnet/python/mxnet/ndarray/ndarray.py", line 1650, in wait_to_read
> check_call(_LIB.MXNDArrayWaitToRead(self.handle))
> File "/work/mxnet/python/mxnet/base.py", line 149, in check_call
> raise MXNetError(py_str(_LIB.MXGetLastError()))
> MXNetError: [17:10:12] src/operator/nn/mkldnn/mkldnn_base.cc:395: Check 
> failed: similar
> Stack trace returned 10 entries:
> [bt] (0) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x5b)
>  [0x7f06ccf3745b]
> [bt] (1) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28)
>  [0x7f06ccf38478]
> [bt] (2) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::OpCheck::Run(std::function  (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector std::allocator > const&, std::vector std::allocator > const&, std::vector std::allocator > const&)>, nnvm::NodeAttrs const&, 
> mxnet::OpContext const&, std::vector std::allocator > const&, std::vector std::allocator > const&, std::vector std::allocator > const&)+0x3ca8) [0x7f06ccf54198]
> [bt] (3) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x2a910d9) 
> [0x7f06cf55a0d9]
> [bt] (4) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler (mxnet::RunContext), mxnet::imperative::PushFComputeEx(std::function (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector std::allocator > const&, std::vector std::allocator > const&, std::vector std::allocator > const&)> const&, nnvm::Op const*, 
> nnvm::NodeAttrs const&, mxnet::Context const&, 
> std::vector > 
> const&, std::vector 
> > const&, std::vector 
> const&, std::vector > 
> const&, std::vector > 
> const&, std::vector 
> const&)::
> {lambda(mxnet::RunContext)#1}
> >::_M_invoke(std::_Any_data const&, mxnet::RunContext&&)+0x7c) 
> >[0x7f06cf77608c]
> [bt] (5) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x3148fdb) 
> [0x7f06cfc11fdb]
> [bt] (6) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext,
>  mxnet::engine::OprBlock*)+0xcb5) [0x7f06cfc0b1a5]
> [bt] (7) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler (std::shared_ptr), 
> mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*,
>  bool)::
> {lambda()#1}
> ::operator()() const::
> {lambda(std::shared_ptr)#1}
> >::_M_invoke(std::_Any_data const&, 
> >std::shared_ptr&&)+0xd9) [0x7f06cfc1d309]
> [bt] (8) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::thread::_Impl (std::shared_ptr)> 
> >::_M_run()+0x4a) [0x7f06cfc1c43a]
> [bt] (9) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f06d7ca4c80]
>  >> begin captured stdout << -
> ResNetV1(
> (features): HybridSequential(
> (0): Conv2D(None -> 64, 

[jira] [Commented] (MXNET-60) MXNET_MKLDNN_DEBUG=1 produces errors

2018-03-12 Thread Rahul Huilgol (JIRA)

[ 
https://issues.apache.org/jira/browse/MXNET-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394893#comment-16394893
 ] 

Rahul Huilgol commented on MXNET-60:


Hi Patric, sorry, I didn't follow what you said. Are you trying to profile the 
code and running into issues?

> MXNET_MKLDNN_DEBUG=1 produces errors
> 
>
> Key: MXNET-60
> URL: https://issues.apache.org/jira/browse/MXNET-60
> Project: Apache MXNet
>  Issue Type: Bug
>Reporter: Marco de Abreu
>Priority: Major
>
> [http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-9995/32/pipeline/483]
> Setting ``MXNET_MKLDNN_DEBUG=1`` as an environment variable produces the 
> following error in tests. This happens across all configurations and seeds. I 
> do not think that this is a test failure.
>  
> {code:java}
> ==
> ERROR: test_gluon_model_zoo.test_models
> --
> Traceback (most recent call last):
> File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in 
> runTest
> self.test(*self.arg)
> File "/work/mxnet/tests/python/unittest/common.py", line 157, in test_new
> orig_test(*args, **kwargs)
> File "/work/mxnet/tests/python/unittest/test_gluon_model_zoo.py", line 50, in 
> test_models
> model(mx.nd.random.uniform(shape=data_shape)).wait_to_read()
> File "/work/mxnet/python/mxnet/ndarray/ndarray.py", line 1650, in wait_to_read
> check_call(_LIB.MXNDArrayWaitToRead(self.handle))
> File "/work/mxnet/python/mxnet/base.py", line 149, in check_call
> raise MXNetError(py_str(_LIB.MXGetLastError()))
> MXNetError: [17:10:12] src/operator/nn/mkldnn/mkldnn_base.cc:395: Check 
> failed: similar
> Stack trace returned 10 entries:
> [bt] (0) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x5b)
>  [0x7f06ccf3745b]
> [bt] (1) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28)
>  [0x7f06ccf38478]
> [bt] (2) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::OpCheck::Run(std::function  (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector std::allocator > const&, std::vector std::allocator > const&, std::vector std::allocator > const&)>, nnvm::NodeAttrs const&, 
> mxnet::OpContext const&, std::vector std::allocator > const&, std::vector std::allocator > const&, std::vector std::allocator > const&)+0x3ca8) [0x7f06ccf54198]
> [bt] (3) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x2a910d9) 
> [0x7f06cf55a0d9]
> [bt] (4) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler (mxnet::RunContext), mxnet::imperative::PushFComputeEx(std::function (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector std::allocator > const&, std::vector std::allocator > const&, std::vector std::allocator > const&)> const&, nnvm::Op const*, 
> nnvm::NodeAttrs const&, mxnet::Context const&, 
> std::vector > 
> const&, std::vector 
> > const&, std::vector 
> const&, std::vector > 
> const&, std::vector > 
> const&, std::vector 
> const&)::
> {lambda(mxnet::RunContext)#1}
> >::_M_invoke(std::_Any_data const&, mxnet::RunContext&&)+0x7c) 
> >[0x7f06cf77608c]
> [bt] (5) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x3148fdb) 
> [0x7f06cfc11fdb]
> [bt] (6) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext,
>  mxnet::engine::OprBlock*)+0xcb5) [0x7f06cfc0b1a5]
> [bt] (7) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler (std::shared_ptr), 
> mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*,
>  bool)::
> {lambda()#1}
> ::operator()() const::
> {lambda(std::shared_ptr)#1}
> >::_M_invoke(std::_Any_data const&, 
> >std::shared_ptr&&)+0xd9) [0x7f06cfc1d309]
> [bt] (8) 
> /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::thread::_Impl (std::shared_ptr)> 
> >::_M_run()+0x4a) [0x7f06cfc1c43a]
> [bt] (9) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f06d7ca4c80]
>  >> begin captured stdout << -
> ResNetV1(
> (features): HybridSequential(
> (0): Conv2D(None -> 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), 
> bias=False)
> (1): BatchNorm(fix_gamma=False, 

[jira] [Comment Edited] (MXNET-60) MXNET_MKLDNN_DEBUG=1 produces errors

2018-03-12 Thread Patric Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/MXNET-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394872#comment-16394872
 ] 

Patric Zhao edited comment on MXNET-60 at 3/12/18 6:59 AM:
---

[~rahul003]  [~cjolivier01] I tested the latest code and saw neither the error 
nor the profiler output.

*Is this the expected output?*  

_commit 94f68fc8fd21611b7f5c148cb0e5d134efe58f87_
 _Author: Rahul Huilgol _
 _Date: Sun Mar 11 04:00:55 2018 -0700_

_Fixes for profiler (#9932)_

MKLDNN: only some code path information is printed.

[14:39:54] src/operator/nn/mkldnn/mkldnn_base.cc:382: test BatchNorm
[14:39:54] src/operator/nn/mkldnn/mkldnn_base.cc:382: test Activation
[14:39:54] src/operator/nn/mkldnn/mkldnn_base.cc:382: test Convolution
[14:39:54] src/operator/nn/mkldnn/mkldnn_base.cc:382: test BatchNorm
[14:39:54] src/operator/nn/mkldnn/mkldnn_base.cc:382: test Activation
[14:39:54] src/operator/nn/mkldnn/mkldnn_base.cc:382: test Pooling
[14:39:54] src/operator/nn/mkldnn/mkldnn_base.cc:382: test FullyConnected
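
For comparing runs, here is a minimal sketch of how the failing test could be 
rerun with the debug flag set. The test path and the nose runner are taken 
from the traceback in the issue description. The helper names 
(mkldnn_debug_env, model_zoo_test_command, run_model_zoo_test) are 
hypothetical, and the sketch assumes the working directory is the root of an 
MXNet checkout built with MKL-DNN.

```python
import os
import subprocess
import sys


def mkldnn_debug_env(base_env=None):
    """Return a copy of the environment with MXNET_MKLDNN_DEBUG=1 set.

    The caller's mapping is copied, never modified in place.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MXNET_MKLDNN_DEBUG"] = "1"
    return env


def model_zoo_test_command(python=sys.executable):
    """Build the nose command for the test named in the traceback.

    Assumption: the path is relative to the root of the MXNet checkout.
    """
    test_id = "tests/python/unittest/test_gluon_model_zoo.py:test_models"
    return [python, "-m", "nose", test_id]


def run_model_zoo_test():
    """Run the test with the debug flag set and return its exit code."""
    return subprocess.call(model_zoo_test_command(), env=mkldnn_debug_env())
```

Calling run_model_zoo_test() from the checkout root should return a non-zero 
exit code if the check in mkldnn_base.cc still fails, and zero otherwise.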




[jira] [Comment Edited] (MXNET-60) MXNET_MKLDNN_DEBUG=1 produces errors

2018-03-12 Thread Patric Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/MXNET-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394872#comment-16394872
 ] 

Patric Zhao edited comment on MXNET-60 at 3/12/18 6:52 AM:
---

[~rahul003]  [~cjolivier01] I tested the latest code and saw neither the error 
nor the profiler output.

_commit 94f68fc8fd21611b7f5c148cb0e5d134efe58f87_
 _Author: Rahul Huilgol _
 _Date: Sun Mar 11 04:00:55 2018 -0700_

_Fixes for profiler (#9932)_

MKLDNN: only some code path information is printed.

[14:39:54] src/operator/nn/mkldnn/mkldnn_base.cc:382: test BatchNorm
[14:39:54] src/operator/nn/mkldnn/mkldnn_base.cc:382: test Activation
[14:39:54] src/operator/nn/mkldnn/mkldnn_base.cc:382: test Convolution
[14:39:54] src/operator/nn/mkldnn/mkldnn_base.cc:382: test BatchNorm
[14:39:54] src/operator/nn/mkldnn/mkldnn_base.cc:382: test Activation
[14:39:54] src/operator/nn/mkldnn/mkldnn_base.cc:382: test Pooling
[14:39:54] src/operator/nn/mkldnn/mkldnn_base.cc:382: test FullyConnected


