[GitHub] jinhuang415 commented on issue #10433: [MXNET-290] MKLDNN support for model quantization

2018-06-13 Thread GitBox
jinhuang415 commented on issue #10433: [MXNET-290] MKLDNN support for model 
quantization
URL: https://github.com/apache/incubator-mxnet/pull/10433#issuecomment-397143625
 
 
   @marcoabreu @zheng-da Thanks for the approval! Regarding how to check the backend context: if we don't agree on exposing this to users as an API, how about keeping the current change? The context alone is not enough, since we need to differentiate MKLDNN from native CPU, but both belong to the CPU context. Another approach at the Python level to separate MKLDNN from native CPU is to check `/proc//maps` and match the libraries mapped into the process: if MKLDNN is built in, libmkldnn.so will be mapped into the process's address space (and the same holds for cudnn/cuda). But this approach only applies to Linux (it relies on the proc file system), so it will not work when running nosetests on macOS or Windows.
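   For illustration, a minimal sketch of that /proc-based check, reading the current process's own map (Linux only; the helper name `lib_is_mapped` is hypothetical, not something in the PR):
   ```
   import os

   def lib_is_mapped(lib_name):
       """Best-effort check whether a shared library (e.g. 'libmkldnn.so')
       is mapped into this process. Linux only; returns False elsewhere."""
       try:
           with open('/proc/%d/maps' % os.getpid()) as maps:
               return any(lib_name in line for line in maps)
       except IOError:  # no procfs on macOS/Windows
           return False

   # e.g. gate MKLDNN-specific tests:
   # if not lib_is_mapped('libmkldnn.so'): skip the MKLDNN cases
   ```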
   Anyway, if you think we are fine to keep the current change for now, how about proceeding to merge the PR? Thanks!




[GitHub] jinhuang415 commented on issue #10433: [MXNET-290] MKLDNN support for model quantization

2018-06-10 Thread GitBox
jinhuang415 commented on issue #10433: [MXNET-290] MKLDNN support for model 
quantization
URL: https://github.com/apache/incubator-mxnet/pull/10433#issuecomment-396109115
 
 
   @marcoabreu @zheng-da A kind reminder: would you help review the latest diff? Thanks.




[GitHub] jinhuang415 commented on issue #10433: [MXNET-290] MKLDNN support for model quantization

2018-06-07 Thread GitBox
jinhuang415 commented on issue #10433: [MXNET-290] MKLDNN support for model 
quantization
URL: https://github.com/apache/incubator-mxnet/pull/10433#issuecomment-395630010
 
 
   @marcoabreu I used my proposed approach 1 to test all cases, skipping and logging the cases that are expected to be skipped. Please help take a look. Thanks.




[GitHub] jinhuang415 commented on issue #10433: [MXNET-290] MKLDNN support for model quantization

2018-06-07 Thread GitBox
jinhuang415 commented on issue #10433: [MXNET-290] MKLDNN support for model 
quantization
URL: https://github.com/apache/incubator-mxnet/pull/10433#issuecomment-395478506
 
 
   @marcoabreu I tried the "try ... except ..." pattern below to test native CPU, MKLDNN, and GPU:
   ```
   for dtype in ['int8', 'uint8']:
       try:
           run_test(..., dtype)
       except Exception:
           pass
   ```
   But for those OPs not supported by native CPU, the test hits a segmentation fault directly; the exception handler cannot catch it, so the test aborts immediately and cannot execute any further cases. I think the MXNet framework doesn't handle unsupported OPs very well when an OP doesn't register FCompute/FComputeEx (for example, native CPU doesn't register FCompute/FComputeEx for quantized conv/pooling/fc). Another case is testing an unsupported dtype on a backend (like int8 on MKLDNN or uint8 on GPU): the C++ parameter check fails and throws a runtime error. Of course we can catch and ignore that exception, but it is not a specific OperatorMissingError exception.
   
   I think it's not trivial to change the MXNet C++ code to throw specific exceptions for the various unsupported cases (we would need to add proper exception throwers for all of them), so how about one of the two approaches below, which make life a little easier:
   
   ```
   run_test(..., dtype)   # add a check at the top of run_test() so we can
                          # see which tests are expected to be skipped; the
                          # rest should be tested and pass
   if context == cpu_mkldnn:
       if dtype == 'int8':
           logging.info('skipping due to unsupported dtype %s', dtype)
           return
   elif context == cpu_native:
       ...


   for dtype in ['int8', 'uint8']:
       run_test(..., dtype)
   ```
   This approach tests all cases and explicitly skips the unsupported ones so users can see them; the cases that are not skipped are expected to run and pass.
   ```
   run_test(..., dtype)   # test case implementation


   # list the supported test cases for each context explicitly;
   # an empty dtype_list means the context doesn't support this quantized OP
   if context == cpu_native:
       dtype_list = []
   elif context == cpu_mkldnn:
       dtype_list = ['uint8']
   elif context == gpu:
       dtype_list = ['int8']

   for dtype in dtype_list:
       run_test(..., dtype)
   ```
   This approach explicitly lists the dtype_list each context supports and only tests the supported scenarios; we don't have backend info in the test case implementation (`run_test()`), so the test case implementation stays clean.
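   As a possible variant of approach 1 (my suggestion, not something already in the PR), raising `unittest.SkipTest` instead of logging and returning would make the skipped cases show up in the nose test report; a minimal sketch, with a hypothetical skip table:
   ```
   import unittest

   # Hypothetical skip table for illustration; the real supported sets
   # would mirror what each backend actually registers.
   UNSUPPORTED_DTYPES = {
       'cpu_native': {'int8', 'uint8'},   # no quantized conv/pooling/fc yet
       'cpu_mkldnn': {'int8'},            # MKLDNN path uses uint8
       'gpu':        {'uint8'},           # GPU path uses int8
   }

   def run_test(context, dtype):
       # nose and unittest both report SkipTest as a skip, not a pass
       if dtype in UNSUPPORTED_DTYPES.get(context, set()):
           raise unittest.SkipTest('dtype %s unsupported on %s' % (dtype, context))
       # ... actual quantized-op test body ...
   ```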
   
   Please let us know your comments.



[GitHub] jinhuang415 commented on issue #10433: [MXNET-290] MKLDNN support for model quantization

2018-06-06 Thread GitBox
jinhuang415 commented on issue #10433: [MXNET-290] MKLDNN support for model 
quantization
URL: https://github.com/apache/incubator-mxnet/pull/10433#issuecomment-395122305
 
 
   @marcoabreu Please let me know if I misunderstood your points, and please also let us know if there is a better approach to handle such differences among the backends for native CPU/MKLDNN/GPU (the dtype difference and the supported quantized OP difference).




[GitHub] jinhuang415 commented on issue #10433: [MXNET-290] MKLDNN support for model quantization

2018-06-06 Thread GitBox
jinhuang415 commented on issue #10433: [MXNET-290] MKLDNN support for model 
quantization
URL: https://github.com/apache/incubator-mxnet/pull/10433#issuecomment-395100685
 
 
   @marcoabreu Sorry, maybe I didn't explain it very clearly. For the current quantization test, besides the dtype difference between MKLDNN (which uses uint8) and GPU (which uses int8), there is another difference between native CPU and MKLDNN/GPU: native CPU doesn't support the quantized OPs yet (only quantize/dequantize/requantize, not quantized conv/pooling/fc), while MKLDNN supports quantized conv/pooling and GPU supports quantized conv/pooling/fc. The test will fail for an unsupported OP (unless we return directly in the unsupported-OP case, but that needs the backend info). So in order to share the same test code among native CPU/MKLDNN/GPU, it looks like we still need to consider the backend type.
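   To make the difference concrete, here is the support matrix described above as a sketch (a summary of this comment; names are illustrative, not the exact registered operator names):
   ```
   # Quantized-op support per backend, per the description above
   # (subject to change as more ops gain quantized implementations).
   SUPPORTED_QUANTIZED_OPS = {
       'cpu_native': {'quantize', 'dequantize', 'requantize'},
       'cpu_mkldnn': {'quantize', 'dequantize', 'requantize',
                      'quantized_conv', 'quantized_pooling'},
       'gpu':        {'quantize', 'dequantize', 'requantize',
                      'quantized_conv', 'quantized_pooling',
                      'quantized_fully_connected'},
   }
   ```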



[GitHub] jinhuang415 commented on issue #10433: [MXNET-290] MKLDNN support for model quantization

2018-05-31 Thread GitBox
jinhuang415 commented on issue #10433: [MXNET-290] MKLDNN support for model 
quantization
URL: https://github.com/apache/incubator-mxnet/pull/10433#issuecomment-393739496
 
 
   @marcoabreu This PR has been approved by reminisce, and @zheng-da will finish his review very soon. Would you help take a look at the PR before it's merged (as you suggested about a month ago :))? Thanks!




[GitHub] jinhuang415 commented on issue #10433: [MXNET-290] MKLDNN support for model quantization

2018-05-30 Thread GitBox
jinhuang415 commented on issue #10433: [MXNET-290] MKLDNN support for model 
quantization
URL: https://github.com/apache/incubator-mxnet/pull/10433#issuecomment-393196924
 
 
   @reminisce @zheng-da We have resolved all the comments; would you help check whether you have further comments on the change?
   @marcoabreu @zheng-da We have seen a lot of Jenkins failures recently after submitting new changes, and most of them happen in the CPP:GPU unit test (see http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-10433/51/pipeline/726/). We tried on our local GPU and everything is fine, and re-triggering Jenkins can make it pass sometimes (it's not stable; sometimes several re-triggers are needed to pass). I don't think our change should impact the CPP:GPU testing. Would you help check whether this is a known issue with the Jenkins system or the MXNet base code? Or is there any way to debug the failure on Jenkins? Thanks. I copied the failure log below for your reference:
   
   ```
   [14:18:36] /work/mxnet/tests/cpp/engine/threaded_engine_test.cc:133: Stopping: NaiveEngine
   [14:18:36] /work/mxnet/tests/cpp/engine/threaded_engine_test.cc:135: Stopped: NaiveEngine Starting...
   [14:18:36] /work/mxnet/tests/cpp/engine/threaded_engine_test.cc:137: Started: NaiveEngine Done...
   [14:18:36] /work/mxnet/tests/cpp/engine/threaded_engine_test.cc:133: Stopping: ThreadedEnginePooled
   terminate called after throwing an instance of 'std::system_error'
     what():  Operation not permitted
   /work/runtime_functions.sh: line 476: 7 Aborted (core dumped) build/tests/mxnet_unit_tests
   build.py: 2018-05-30 14:18:38,174 Running of command in container failed (134): nvidia-docker run --rm -t --shm-size=500m -v /home/jenkins_slave/workspace/ut-cpp-gpu:/work/mxnet -v /home/jenkins_slave/workspace/ut-cpp-gpu/build:/work/build -u 1001:1001 mxnet/build.ubuntu_gpu /work/runtime_functions.sh unittest_ubuntu_gpu_cpp
   build.py: 2018-05-30 14:18:38,175 You can try to get into the container by using the following command: nvidia-docker run --rm -t --shm-size=500m -v /home/jenkins_slave/workspace/ut-cpp-gpu:/work/mxnet -v /home/jenkins_slave/workspace/ut-cpp-gpu/build:/work/build -u 1001:1001 -ti --entrypoint /bin/bash mxnet/build.ubuntu_gpu /work/runtime_functions.sh unittest_ubuntu_gpu_cpp
   into container: False
   Traceback (most recent call last):
     File "ci/build.py", line 307, in 
       sys.exit(main())
     File "ci/build.py", line 243, in main
       container_run(platform, docker_binary, shared_memory_size, command)
     File "ci/build.py", line 154, in container_run
       raise subprocess.CalledProcessError(ret, cmd)
   subprocess.CalledProcessError: Command 'nvidia-docker run --rm -t --shm-size=500m -v /home/jenkins_slave/workspace/ut-cpp-gpu:/work/mxnet -v /home/jenkins_slave/workspace/ut-cpp-gpu/build:/work/build -u 1001:1001 mxnet/build.ubuntu_gpu /work/runtime_functions.sh unittest_ubuntu_gpu_cpp' returned non-zero exit status 134
   script returned exit code 1
   ```



[GitHub] jinhuang415 commented on issue #10433: [MXNET-290] MKLDNN support for model quantization

2018-05-21 Thread GitBox
jinhuang415 commented on issue #10433: [MXNET-290] MKLDNN support for model 
quantization
URL: https://github.com/apache/incubator-mxnet/pull/10433#issuecomment-390582611
 
 
   @zheng-da Currently the int8 performance is not as good as FP32. We are planning to add several enhancements to improve performance; after those enhancements the int8 performance should be 1.2~1.5x of FP32.




[GitHub] jinhuang415 commented on issue #10433: [MXNET-290] MKLDNN support for model quantization

2018-05-17 Thread GitBox
jinhuang415 commented on issue #10433: [MXNET-290] MKLDNN support for model 
quantization
URL: https://github.com/apache/incubator-mxnet/pull/10433#issuecomment-389866702
 
 
   @reminisce @zheng-da We have resolved all the comments; would you check whether you have further comments on the current change? Thanks.




[GitHub] jinhuang415 commented on issue #10433: [MXNET-290] MKLDNN support for model quantization

2018-04-26 Thread GitBox
jinhuang415 commented on issue #10433: [MXNET-290] MKLDNN support for model 
quantization
URL: https://github.com/apache/incubator-mxnet/pull/10433#issuecomment-384533399
 
 
   @zheng-da To correct my earlier statement: the mkldnn_OIhw4i16o4i change in mkldnn_base.cc is needed for int8; without it, int8 hits a runtime error (FP32 doesn't need this format, so it has no problem at runtime).



[GitHub] jinhuang415 commented on issue #10433: [MXNET-290] MKLDNN support for model quantization

2018-04-23 Thread GitBox
jinhuang415 commented on issue #10433: [MXNET-290] MKLDNN support for model 
quantization
URL: https://github.com/apache/incubator-mxnet/pull/10433#issuecomment-383627507
 
 
   @marcoabreu Sure, will do.

