[GitHub] pengzhao-intel commented on issue #8532: mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy with MKL

2018-03-12 Thread GitBox
pengzhao-intel commented on issue #8532: mxnet-mkl (v0.12.0) crash when using 
(conda-installed) numpy with MKL
URL: 
https://github.com/apache/incubator-mxnet/issues/8532#issuecomment-372555255
 
 
   @fhieber could you try the new build of 1.1.0
   https://github.com/apache/incubator-mxnet/releases/tag/1.1.0


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services



[GitHub] sxjscience commented on issue #10084: which optimizer to support sparse update except adam, ftrl, sgd?

2018-03-12 Thread GitBox
sxjscience commented on issue #10084: which optimizer to support sparse update 
except adam, ftrl, sgd?
URL: 
https://github.com/apache/incubator-mxnet/issues/10084#issuecomment-372554612
 
 
   @eric-haibin-lin @ZiyueHuang 




[GitHub] moveforever opened a new issue #10084: which optimizer to support sparse update except adam, ftrl, sgd?

2018-03-12 Thread GitBox
moveforever opened a new issue #10084: which optimizer to support sparse update 
except adam, ftrl, sgd?
URL: https://github.com/apache/incubator-mxnet/issues/10084
 
 
   Note: Providing complete information in the most concise form is the best 
way to get help. This issue template serves as the checklist for essential 
information to most of the technical issues and bug reports. For non-technical 
issues and feature requests, feel free to present the information in what you 
believe is the best form.
   
   For Q & A and discussion, please start a discussion thread at 
https://discuss.mxnet.io 
   
   ## Description
   (Brief description of the problem in no more than 2 sentences.)
   
   ## Environment info (Required)
   
   ```
   What to do:
   1. Download the diagnosis script from 
https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py
   2. Run the script using `python diagnose.py` and paste its output here.
   
   ```
   
   Package used (Python/R/Scala/Julia):
   (I'm using ...)
   
   For Scala user, please provide:
   1. Java version: (`java -version`)
   2. Maven version: (`mvn -version`)
   3. Scala runtime if applicable: (`scala -version`)
   
   For R user, please provide R `sessionInfo()`:
   
   ## Build info (Required if built from source)
   
   Compiler (gcc/clang/mingw/visual studio):
   
   MXNet commit hash:
   (Paste the output of `git rev-parse HEAD` here.)
   
   Build config:
   (Paste the content of config.mk, or the build command.)
   
   ## Error Message:
   (Paste the complete error message, including stack trace.)
   
   ## Minimum reproducible example
   (If you are using your own code, please provide a short script that 
reproduces the error. Otherwise, please provide link to the existing example.)
   
   ## Steps to reproduce
   (Paste the commands you ran that produced the error.)
   
   1.
   2.
   
   ## What have you tried to solve it?
   
   1.
   2.
   




[GitHub] sxjscience commented on issue #10035: Who can help me solve this error??batch_loss.backward() error?

2018-03-12 Thread GitBox
sxjscience commented on issue #10035: Who can help me solve this 
error??batch_loss.backward()  error?
URL: 
https://github.com/apache/incubator-mxnet/issues/10035#issuecomment-372553460
 
 
   @kenjewu May I know the status of this question?




[GitHub] sxjscience commented on a change in pull request #10000: [MXNET-80] Fix average pooling kernel size assignment error

2018-03-12 Thread GitBox
sxjscience commented on a change in pull request #10000: [MXNET-80] Fix average 
pooling kernel size assignment error
URL: https://github.com/apache/incubator-mxnet/pull/10000#discussion_r174021284
 
 

 ##
 File path: src/operator/nn/pooling.cc
 ##
 @@ -54,11 +54,13 @@ static void PoolingParamParser(nnvm::NodeAttrs *attrs) {
     if (param.stride.ndim() == 0) param.stride = Shape3(1, 1, 1);
     if (param.pad.ndim() == 0) param.pad = Shape3(0, 0, 0);
   }
-  CHECK_EQ(param.stride.ndim(), param.kernel.ndim())
-      << "stride and kernel should have the same length";
-  CHECK_EQ(param.pad.ndim(), param.kernel.ndim())
-      << "pad and kernel should have the same length";
-  attrs->parsed = std::move(param);
+  if (param.global_pool == false) {
+    CHECK_EQ(param.stride.ndim(), param.kernel.ndim())
+        << "stride and kernel should have the same length";
+    CHECK_EQ(param.pad.ndim(), param.kernel.ndim())
+        << "pad and kernel should have the same length";
+    attrs->parsed = std::move(param);
 
 Review comment:
   We still need to parse the param




[GitHub] tqchen commented on issue #10083: [TENSOR] Fix DLTensor conversion for int64

2018-03-12 Thread GitBox
tqchen commented on issue #10083: [TENSOR] Fix DLTensor conversion for int64
URL: https://github.com/apache/incubator-mxnet/pull/10083#issuecomment-372550931
 
 
   cc @ZihengJiang 




[GitHub] sxjscience commented on a change in pull request #10000: [MXNET-80] Fix average pooling kernel size assignment error

2018-03-12 Thread GitBox
sxjscience commented on a change in pull request #10000: [MXNET-80] Fix average 
pooling kernel size assignment error
URL: https://github.com/apache/incubator-mxnet/pull/10000#discussion_r174019453
 
 

 ##
 File path: src/operator/nn/pooling-inl.h
 ##
 @@ -56,11 +56,11 @@ struct PoolingParam : public dmlc::Parameter<PoolingParam> {
     DMLC_DECLARE_FIELD(cudnn_off).set_default(false)
     .describe("Turn off cudnn pooling and use MXNet pooling operator. ");
 
-    DMLC_DECLARE_FIELD(kernel)
+    DMLC_DECLARE_FIELD(kernel).set_default(TShape())  // add default value here
     .enforce_nonzero()
     .describe("Pooling kernel size: (y, x) or (d, y, x)");
 
-    DMLC_DECLARE_FIELD(pool_type)
+    DMLC_DECLARE_FIELD(pool_type).set_default(pool_enum::kMaxPooling)  // add default pooling method
     .add_enum("max", pool_enum::kMaxPooling)
     .add_enum("avg", pool_enum::kAvgPooling)
     .add_enum("sum", pool_enum::kSumPooling)
 
 Review comment:
   There's no need to check this if global_pool is turned on.




[GitHub] tqchen opened a new pull request #10083: [TENSOR] Fix DLTensor conversion for int64

2018-03-12 Thread GitBox
tqchen opened a new pull request #10083: [TENSOR] Fix DLTensor conversion for 
int64
URL: https://github.com/apache/incubator-mxnet/pull/10083
 
 
   This is a bugfix PR that fixes the DLTensor conversion for type int64, which 
was not covered by the previous tests. I have updated the test to cover this case.




[GitHub] CoinCheung commented on a change in pull request #10000: [MXNET-80] Fix average pooling kernel size assignment error

2018-03-12 Thread GitBox
CoinCheung commented on a change in pull request #10000: [MXNET-80] Fix average 
pooling kernel size assignment error
URL: https://github.com/apache/incubator-mxnet/pull/10000#discussion_r174018923
 
 

 ##
 File path: src/operator/nn/pooling-inl.h
 ##
 @@ -56,11 +56,11 @@ struct PoolingParam : public dmlc::Parameter<PoolingParam> {
     DMLC_DECLARE_FIELD(cudnn_off).set_default(false)
     .describe("Turn off cudnn pooling and use MXNet pooling operator. ");
 
-    DMLC_DECLARE_FIELD(kernel)
+    DMLC_DECLARE_FIELD(kernel).set_default(TShape())  // add default value here
     .enforce_nonzero()
     .describe("Pooling kernel size: (y, x) or (d, y, x)");
 
-    DMLC_DECLARE_FIELD(pool_type)
+    DMLC_DECLARE_FIELD(pool_type).set_default(pool_enum::kMaxPooling)  // add default pooling method
     .add_enum("max", pool_enum::kMaxPooling)
     .add_enum("avg", pool_enum::kAvgPooling)
     .add_enum("sum", pool_enum::kSumPooling)
 
 Review comment:
   I tried a few times, and it failed at this position:
   
https://github.com/apache/incubator-mxnet/blob/94f68fc8fd21611b7f5c148cb0e5d134efe58f87/src/operator/nn/pooling.cc#L55
   But I do not understand why it requires stride and kernel to have the same 
length.




[GitHub] haojin2 commented on a change in pull request #10078: Support float16 in L2Normalization operator

2018-03-12 Thread GitBox
haojin2 commented on a change in pull request #10078: Support float16 in 
L2Normalization operator
URL: https://github.com/apache/incubator-mxnet/pull/10078#discussion_r174012945
 
 

 ##
 File path: src/operator/l2_normalization.cc
 ##
 @@ -26,13 +26,22 @@
 namespace mxnet {
 namespace op {
 template<>
-Operator* CreateOp<cpu>(L2NormalizationParam param) {
-  return new L2NormalizationOp<cpu>(param);
+Operator* CreateOp<cpu>(L2NormalizationParam param, int dtype) {
 
 Review comment:
   https://github.com/apache/incubator-mxnet/pull/3011/files




[GitHub] cjolivier01 commented on a change in pull request #10078: Support float16 in L2Normalization operator

2018-03-12 Thread GitBox
cjolivier01 commented on a change in pull request #10078: Support float16 in 
L2Normalization operator
URL: https://github.com/apache/incubator-mxnet/pull/10078#discussion_r174012691
 
 

 ##
 File path: src/operator/l2_normalization.cc
 ##
 @@ -26,13 +26,22 @@
 namespace mxnet {
 namespace op {
 template<>
-Operator* CreateOp<cpu>(L2NormalizationParam param) {
-  return new L2NormalizationOp<cpu>(param);
+Operator* CreateOp<cpu>(L2NormalizationParam param, int dtype) {
 
 Review comment:
   is it done this way elsewhere?




[GitHub] haojin2 commented on issue #10078: Support float16 in L2Normalization operator

2018-03-12 Thread GitBox
haojin2 commented on issue #10078: Support float16 in L2Normalization operator
URL: https://github.com/apache/incubator-mxnet/pull/10078#issuecomment-372540444
 
 
   I think this PR should be ready for merge, @rahul003 would you please take a 
look at it to double-check? Thanks!




[GitHub] cjolivier01 commented on issue #9933: [MXNET-23] Adding support to profile kvstore server during distributed training

2018-03-12 Thread GitBox
cjolivier01 commented on issue #9933: [MXNET-23] Adding support to profile 
kvstore server during distributed training
URL: https://github.com/apache/incubator-mxnet/pull/9933#issuecomment-372538789
 
 
   How quickly you learned the profiler stuff is impressive.



[GitHub] cjolivier01 commented on a change in pull request #9933: [MXNET-23] Adding support to profile kvstore server during distributed training

2018-03-12 Thread GitBox
cjolivier01 commented on a change in pull request #9933: [MXNET-23] Adding 
support to profile kvstore server during distributed training
URL: https://github.com/apache/incubator-mxnet/pull/9933#discussion_r174010079
 
 

 ##
 File path: include/mxnet/kvstore.h
 ##
 @@ -361,6 +373,17 @@ class KVStore {
    */
   virtual void SendCommandToServers(int cmd_id, const std::string& cmd_body) { }
 
+  /**
+   * \brief Sends server profiler commands to all server nodes
+   * Only the worker with rank=0 sends the command which will be received by all servers
+   * \param type ProfilerCommand type
+   * \param params parameters for that command in the form of a string
+   */
+  virtual void SetServerProfilerCommand(const KVStoreServerProfilerCommand type,
+                                        const std::string& params) {
+    LOG(FATAL) << "compile with USE_DIST_KVSTORE=1 to use distributed kvstore";
 
 Review comment:
   do you really need to die here? will a warning suffice?




[GitHub] cjolivier01 commented on a change in pull request #9933: [MXNET-23] Adding support to profile kvstore server during distributed training

2018-03-12 Thread GitBox
cjolivier01 commented on a change in pull request #9933: [MXNET-23] Adding 
support to profile kvstore server during distributed training
URL: https://github.com/apache/incubator-mxnet/pull/9933#discussion_r174010410
 
 

 ##
 File path: src/kvstore/kvstore_dist_server.h
 ##
 @@ -32,6 +32,8 @@
 #include 
 #include 
 #include 
+#include "mxnet/c_api.h"
+#include "profiler/profiler.h"
 #include "ps/ps.h"
 
 Review comment:
   <>




[GitHub] cjolivier01 commented on a change in pull request #9933: [MXNET-23] Adding support to profile kvstore server during distributed training

2018-03-12 Thread GitBox
cjolivier01 commented on a change in pull request #9933: [MXNET-23] Adding 
support to profile kvstore server during distributed training
URL: https://github.com/apache/incubator-mxnet/pull/9933#discussion_r174010585
 
 

 ##
 File path: src/kvstore/kvstore_dist_server.h
 ##
 @@ -170,6 +187,33 @@ class KVStoreDistServer {
 app->Response(recved);
   }
 
+  void SetProfilerConfig(std::string params_str) {
+    std::vector<std::string> elems;
+    mxnet::kvstore::split(params_str, ',', std::back_inserter(elems));
+    std::vector<char*> ckeys;
+    std::vector<char*> cvals;
+    ckeys.reserve(elems.size());
+    cvals.reserve(elems.size());
+
+    for (int i=0; i < elems.size(); i++) {
+      std::vector<std::string> parts;
 
 Review comment:
   oh ok missed that my mistake




[GitHub] cjolivier01 commented on a change in pull request #9933: [MXNET-23] Adding support to profile kvstore server during distributed training

2018-03-12 Thread GitBox
cjolivier01 commented on a change in pull request #9933: [MXNET-23] Adding 
support to profile kvstore server during distributed training
URL: https://github.com/apache/incubator-mxnet/pull/9933#discussion_r174009887
 
 

 ##
 File path: example/image-classification/common/fit.py
 ##
 @@ -305,3 +321,8 @@ def fit(args, network, data_loader, **kwargs):
               epoch_end_callback=checkpoint,
               allow_missing=True,
               monitor=monitor)
+
+    if args.profile_server_file:
+        kv.set_server_profiler_state(state='stop')
+    if args.profile_worker_file:
 
 Review comment:
   just curious if you had any special reason to stop the server before the 
worker.




[GitHub] cjolivier01 commented on a change in pull request #9933: [MXNET-23] Adding support to profile kvstore server during distributed training

2018-03-12 Thread GitBox
cjolivier01 commented on a change in pull request #9933: [MXNET-23] Adding 
support to profile kvstore server during distributed training
URL: https://github.com/apache/incubator-mxnet/pull/9933#discussion_r174010221
 
 

 ##
 File path: include/mxnet/kvstore.h
 ##
 @@ -361,6 +373,17 @@ class KVStore {
    */
   virtual void SendCommandToServers(int cmd_id, const std::string& cmd_body) { }
 
+  /**
+   * \brief Sends server profiler commands to all server nodes
+   * Only the worker with rank=0 sends the command which will be received by all servers
+   * \param type ProfilerCommand type
+   * \param params parameters for that command in the form of a string
+   */
+  virtual void SetServerProfilerCommand(const KVStoreServerProfilerCommand type,
+                                        const std::string& params) {
+    LOG(FATAL) << "compile with USE_DIST_KVSTORE=1 to use distributed kvstore";
 
 Review comment:
   or a CHECK may be a caught error, right? is FATAL catchable? python user 
might not know whether kvstore was turned on. i mean, is this something where 
execution just can't continue?




[GitHub] cjolivier01 commented on a change in pull request #9933: [MXNET-23] Adding support to profile kvstore server during distributed training

2018-03-12 Thread GitBox
cjolivier01 commented on a change in pull request #9933: [MXNET-23] Adding 
support to profile kvstore server during distributed training
URL: https://github.com/apache/incubator-mxnet/pull/9933#discussion_r174010394
 
 

 ##
 File path: src/kvstore/kvstore_dist_server.h
 ##
 @@ -32,6 +32,8 @@
 #include 
 #include 
 #include 
+#include "mxnet/c_api.h"
+#include "profiler/profiler.h"
 
 Review comment:
   ../profiler/




[GitHub] cjolivier01 commented on a change in pull request #9933: [MXNET-23] Adding support to profile kvstore server during distributed training

2018-03-12 Thread GitBox
cjolivier01 commented on a change in pull request #9933: [MXNET-23] Adding 
support to profile kvstore server during distributed training
URL: https://github.com/apache/incubator-mxnet/pull/9933#discussion_r174010534
 
 

 ##
 File path: src/kvstore/kvstore_dist_server.h
 ##
 @@ -125,6 +127,8 @@ class KVStoreDistServer {
   }
 
   ~KVStoreDistServer() {
+    profiler::Profiler::Get()->SetState(
+        profiler::Profiler::ProfilerState(profiler::Profiler::kNotRunning));
     delete ps_server_;
 
 Review comment:
   not sure what the context is when this is called. is this an independent 
process? normally profiler is shut down by static shutdown, after engine




[GitHub] cjolivier01 commented on a change in pull request #9933: [MXNET-23] Adding support to profile kvstore server during distributed training

2018-03-12 Thread GitBox
cjolivier01 commented on a change in pull request #9933: [MXNET-23] Adding 
support to profile kvstore server during distributed training
URL: https://github.com/apache/incubator-mxnet/pull/9933#discussion_r174010819
 
 

 ##
 File path: src/kvstore/kvstore_dist_server.h
 ##
 @@ -153,23 +157,82 @@ class KVStoreDistServer {
 
   void CommandHandle(const ps::SimpleData& recved, ps::SimpleApp* app) {
     CommandType recved_type = static_cast<CommandType>(recved.head);
-    if (recved_type == CommandType::kStopServer) {
-      exec_.Stop();
-    } else if (recved_type == CommandType::kSyncMode) {
-      sync_mode_ = true;
-    } else if (recved_type == CommandType::kSetGradientCompression) {
-      gradient_compression_->DecodeParams(recved.body);
-    } else {
-      // this uses value 0 for message id from frontend
-      // let the main thread to execute ctrl, which is necessary for python
-      exec_.Exec([this, recved]() {
-          CHECK(controller_);
-          controller_(recved.head, recved.body);
-        });
+    switch (recved_type) {
+      case CommandType::kStopServer:
+        exec_.Stop();
+        break;
+      case CommandType::kSyncMode:
+        sync_mode_ = true;
+        break;
+      case CommandType::kSetGradientCompression:
+        gradient_compression_->DecodeParams(recved.body);
+        break;
+      case CommandType::kSetProfilerParams:
+        // last char is the type of profiler command
+        ProcessServerProfilerCommands(
+            static_cast<KVStoreServerProfilerCommand>(recved.body.back() - '0'),
+            recved.body);
+        break;
+      case CommandType::kController:
+        // this uses value 0 for message id from frontend
+        // let the main thread to execute ctrl, which is necessary for python
+        exec_.Exec([this, recved]() {
+            CHECK(controller_);
+            controller_(recved.head, recved.body);
+          });
+        break;
     }
     app->Response(recved);
   }
 
+  void ProcessServerProfilerCommands(KVStoreServerProfilerCommand type, const std::string& body) {
+    switch (type) {
+      case KVStoreServerProfilerCommand::kSetConfig:
+        SetProfilerConfig(body.substr(0, body.size() - 1));
+        break;
+      case KVStoreServerProfilerCommand::kState:
+        MXSetProfilerState(static_cast<int>(body.front() - '0'));
+        break;
+      case KVStoreServerProfilerCommand::kPause:
+        MXProfilePause(static_cast<int>(body.front() - '0'));
+        break;
+      case KVStoreServerProfilerCommand::kDump:
+        MXDumpProfile(static_cast<int>(body.front() - '0'));
+        break;
+    }
+  }
+
+  void SetProfilerConfig(std::string params_str) {
+    std::vector<std::string> elems;
+    mxnet::kvstore::split(params_str, ',', std::back_inserter(elems));
+    std::vector<char*> ckeys;
+    std::vector<char*> cvals;
+    ckeys.reserve(elems.size());
+    cvals.reserve(elems.size());
+
+    for (int i=0; i < elems.size(); i++) {
+      std::vector<std::string> parts;
+      mxnet::kvstore::split(elems[i], ':', std::back_inserter(parts));
+      CHECK(!parts[0].empty()) << "ProfilerConfig parameter is empty";
+      CHECK(!parts[1].empty()) << "ProfilerConfig value is empty for parameter " << parts[0];
+      if (parts[0] == "filename") {
+        parts[1] = "rank" + std::to_string(ps::MyRank()) + "_" + parts[1];
+      }
+      char* ckey = new char[parts[0].length() + 1];
 
 Review comment:
   why can't you just pass the c_str() pointers in the array passed to the api 
function so that you don't have to allocate and free?
   




[GitHub] cjolivier01 commented on a change in pull request #9933: [MXNET-23] Adding support to profile kvstore server during distributed training

2018-03-12 Thread GitBox
cjolivier01 commented on a change in pull request #9933: [MXNET-23] Adding 
support to profile kvstore server during distributed training
URL: https://github.com/apache/incubator-mxnet/pull/9933#discussion_r174010353
 
 

 ##
 File path: src/kvstore/kvstore_dist_server.h
 ##
 @@ -32,6 +32,8 @@
 #include 
 #include 
 #include 
+#include "mxnet/c_api.h"
 
 Review comment:
   i think stuff in include/mxnet/xxx would tend to be in <>




[GitHub] feevos commented on issue #2910: Helper function to get GPU device number?

2018-03-12 Thread GitBox
feevos commented on issue #2910: Helper function to get GPU device number?
URL: 
https://github.com/apache/incubator-mxnet/issues/2910#issuecomment-372536394
 
 
   @kenfehling  I get the same error when I run this command on my laptop 
(single GPU) but not on the HPC cluster. Somehow the key 
```'CUDA_VISIBLE_DEVICES'``` is missing from the ```os.environ``` dictionary 
on my laptop.
   
   When I use ```os.environ["CUDA_VISIBLE_DEVICES"]``` on the HPC cluster 
(CSIRO BRACEWELL facility, 4 GPUs per node), I get ```'0,1,2,3'```.




[GitHub] szha commented on issue #9705: Added unittest for benchmarking metric performance

2018-03-12 Thread GitBox
szha commented on issue #9705: Added unittest for benchmarking metric 
performance
URL: https://github.com/apache/incubator-mxnet/pull/9705#issuecomment-372535302
 
 
   One last request: would you put the performance tests in a separate test 
file, such as test_metric_perf.py, so that it's easier to move to nightly later?




[GitHub] rahul003 commented on issue #9933: [MXNET-23] Adding support to profile kvstore server during distributed training

2018-03-12 Thread GitBox
rahul003 commented on issue #9933: [MXNET-23] Adding support to profile kvstore 
server during distributed training
URL: https://github.com/apache/incubator-mxnet/pull/9933#issuecomment-372531644
 
 
   @cjolivier01 (and others), what do you think?




[GitHub] asitstands commented on a change in pull request #10048: [MXNET-68] Random shuffle implementation

2018-03-12 Thread GitBox
asitstands commented on a change in pull request #10048: [MXNET-68] Random 
shuffle implementation
URL: https://github.com/apache/incubator-mxnet/pull/10048#discussion_r173857649
 
 

 ##
 File path: src/operator/random/shuffle_op.cc
 ##
 @@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * Copyright (c) 2018 by Contributors
+ * \file shuffle_op.cc
+ * \brief Operator to shuffle elements of an NDArray
+ */
+#if (__GNUC__ > 4 && !defined(__clang__major__)) || (__clang_major__ > 4 && 
__linux__)
+  #define USE_GNU_PARALLEL_SHUFFLE
+#endif
+
+#include 
+#include 
+#include 
+#include 
+#ifdef USE_GNU_PARALLEL_SHUFFLE
+  #include 
+#endif
+#include "../elemwise_op_common.h"
+
+namespace mxnet {
+namespace op {
+
+namespace {
+
+template<typename DType, typename Rand>
+void Shuffle1D(DType* const out, const index_t size, Rand* const prnd) {
+  #ifdef USE_GNU_PARALLEL_SHUFFLE
+    auto rand_n = [prnd](index_t n) {
+      std::uniform_int_distribution<index_t> dist(0, n - 1);
+      return dist(*prnd);
+    };
+    __gnu_parallel::random_shuffle(out, out + size, rand_n);
+  #else
+    std::shuffle(out, out + size, *prnd);
+  #endif
+}
+
+template<typename DType, typename Rand>
+void ShuffleND(DType* const out, const index_t size, const index_t first_axis_len,
+               Rand* const prnd) {
+  // Fisher-Yates shuffling
+  const index_t stride = size / first_axis_len;
+  auto rand_n = [prnd](index_t n) {
+    std::uniform_int_distribution<index_t> dist(0, n - 1);
+    return dist(*prnd);
+  };
+  CHECK_GT(first_axis_len, 0U);
+  for (index_t i = first_axis_len - 1; i > 0; --i) {
+    const index_t j = rand_n(i + 1);
+    if (i != j) {
+      std::swap_ranges(out + stride * i, out + stride * (i + 1), out + stride * j);
 
 Review comment:
   I guess that the optimization may not be trivial. Anyway, here are some tests 
with a very naive OpenMP parallelization. It simply splits each range to swap 
into multiple pieces and gives each piece to an OpenMP thread. Multiple threads 
benefit arrays with a large number of elements per row when run on two Xeon 
E5-2680 CPUs, but there is no gain on a single i7-7700. For small arrays, 
multiple threads perform very poorly on either CPU. There could be more 
sophisticated optimizations for this kind of memory copy, but I have no idea.
   
   Test with Xeon E5-2680 two CPUs.
   
   ```
   # ./a.out num_rows num_cols num_repeats num_threads
   # measures the running time of the two implementations in microseconds.
   
   > ./a.out 100 1000 10 4
   multi  : 3861601 us
   single : 9080845 us
   
   > ./a.out 100 100 10 4
   multi  : 338396 us
   single : 861971 us
   
   > ./a.out 100 10 10 4
   multi  : 21387 us
   single : 57533 us
   
   > ./a.out 100 1 10 4
   multi  : 6956 us
   single : 4073 us
   
   > ./a.out 100 1000 10 4
   multi  : 5886 us
   single : 597 us
   
   > ./a.out 100 100 10 4
   multi  : 4606 us
   single : 139 us
   ```
   
   Test with i7-7700.
   
   ```
   
   > ./a.out 100 1000 10 4
   multi  : 10015002 us
   single : 9327057 us
   
   > ./a.out 100 100 10 4
   multi  : 969582 us
   single : 918764 us
   
   > ./a.out 100 10 10 4
   multi  : 77717 us
   single : 75001 us
   
   > ./a.out 100 1 10 4
   multi  : 1850 us
   single : 2016 us
   
   > ./a.out 100 1000 10 4
   multi  : 1911 us
   single : 209 us
   
   > ./a.out 100 1000 10 2
   multi  : 9478994 us
   single : 9451969 us
   
   > ./a.out 100 100 10 2
   multi  : 936728 us
   single : 918129 us
   
   > ./a.out 100 10 10 2
   multi  : 75222 us
   single : 75331 us
   
   > ./a.out 100 1 10 2
   multi  : 1885 us
   single : 1953 us
   
   > ./a.out 100 1000 10 2
   multi  : 1425 us
   single : 204 us
   ```
   Here is the code.
   
   ```c++
   #include 
   #include 
   #include 
   #include 
   
   using index_t = unsigned int;
   
   // The current implementation
   template
   void ShuffleND(DType* const out, const index_t size,
  const index_t first_axis_len, Rand* const prnd) {
 const index_t stride = size / first_axis_len;
 auto rand_n = [prnd](index_t n) {
   std::uniform_int_distribution dist(0, n - 1);
   return dist(*prnd);
 };
 for (index_t i = 

[GitHub] asitstands commented on a change in pull request #10048: [MXNET-68] Random shuffle implementation

2018-03-12 Thread GitBox
asitstands commented on a change in pull request #10048: [MXNET-68] Random 
shuffle implementation
URL: https://github.com/apache/incubator-mxnet/pull/10048#discussion_r173857649
 
 

 ##
 File path: src/operator/random/shuffle_op.cc
 ##
 @@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * Copyright (c) 2018 by Contributors
+ * \file shuffle_op.cc
+ * \brief Operator to shuffle elements of an NDArray
+ */
+#if (__GNUC__ > 4 && !defined(__clang__major__)) || (__clang_major__ > 4 && __linux__)
+  #define USE_GNU_PARALLEL_SHUFFLE
+#endif
+
+#include <mxnet/operator_util.h>
+#include <algorithm>
+#include <random>
+#include <vector>
+#ifdef USE_GNU_PARALLEL_SHUFFLE
+  #include <parallel/algorithm>
+#endif
+#include "../elemwise_op_common.h"
+
+namespace mxnet {
+namespace op {
+
+namespace {
+
+template<typename DType, typename Rand>
+void Shuffle1D(DType* const out, const index_t size, Rand* const prnd) {
+  #ifdef USE_GNU_PARALLEL_SHUFFLE
+    auto rand_n = [prnd](index_t n) {
+      std::uniform_int_distribution<index_t> dist(0, n - 1);
+      return dist(*prnd);
+    };
+    __gnu_parallel::random_shuffle(out, out + size, rand_n);
+  #else
+    std::shuffle(out, out + size, *prnd);
+  #endif
+}
+
+template<typename DType, typename Rand>
+void ShuffleND(DType* const out, const index_t size, const index_t first_axis_len,
+               Rand* const prnd) {
+  // Fisher-Yates shuffling
+  const index_t stride = size / first_axis_len;
+  auto rand_n = [prnd](index_t n) {
+    std::uniform_int_distribution<index_t> dist(0, n - 1);
+    return dist(*prnd);
+  };
+  CHECK_GT(first_axis_len, 0U);
+  for (index_t i = first_axis_len - 1; i > 0; --i) {
+    const index_t j = rand_n(i + 1);
+    if (i != j) {
+      std::swap_ranges(out + stride * i, out + stride * (i + 1), out + stride * j);
 
 Review comment:
   I guess the optimization may not be trivial. Anyway, here are some tests 
with a very naive OpenMP parallelization. It simply splits the ranges to 
swap into multiple pieces and gives each piece to an OpenMP thread. Multiple 
threads benefit arrays with a large number of elements per row when run on 
two Xeon E5-2680 CPUs, but there is no gain on a single i7-7700. For small 
arrays, multiple threads perform very poorly on either CPU. There could be 
more sophisticated optimizations for this kind of memory copy, but I have no 
idea.
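   As a reference for the kernel under review, the same first-axis Fisher-Yates shuffle can be sketched in plain Python (an illustration of the algorithm only, not MXNet code):

   ```python
   import random

   def shuffle_first_axis(rows, rng=random):
       # Fisher-Yates over whole rows: iterate from the last row down,
       # swapping row i with a uniformly chosen row j <= i, so every
       # permutation of the first axis is equally likely.
       for i in range(len(rows) - 1, 0, -1):
           j = rng.randrange(i + 1)
           if i != j:
               rows[i], rows[j] = rows[j], rows[i]
       return rows

   data = [[0, 0], [1, 1], [2, 2], [3, 3]]
   shuffle_first_axis(data)
   # rows are permuted intact: none is split, duplicated, or dropped
   ```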
   
   Test with Xeon E5-2680 two CPUs.
   
   ```
   # ./a.out num_rows num_cols num_repeats num_threads
   # measures the running time of the two implementations in microseconds.
   
   > ./a.out 100 1000 10 4
   multi  : 3861601 us
   single : 9080845 us
   
   > ./a.out 100 100 10 4
   multi  : 338396 us
   single : 861971 us
   
   > ./a.out 100 10 10 4
   multi  : 21387 us
   single : 57533 us
   
   > ./a.out 100 1 10 4
   multi  : 6956 us
   single : 4073 us
   
   > ./a.out 100 1000 10 4
   multi  : 5886 us
   single : 597 us
   
   > ./a.out 100 100 10 4
   multi  : 4606 us
   single : 139 us
   ```
   
   Test with i7-7700.
   
   ```
   
   > ./a.out 100 1000 10 4
   multi  : 10015002 us
   single : 9327057 us
   
   > ./a.out 100 100 10 4
   multi  : 969582 us
   single : 918764 us
   
   > ./a.out 100 10 10 4
   multi  : 77717 us
   single : 75001 us
   
   > ./a.out 100 1 10 4
   multi  : 1850 us
   single : 2016 us
   
   > ./a.out 100 1000 10 4
   multi  : 1911 us
   single : 209 us
   
   > ./a.out 100 1000 10 2
   multi  : 9478994 us
   single : 9451969 us
   
   > ./a.out 100 100 10 2
   multi  : 936728 us
   single : 918129 us
   
   > ./a.out 100 10 10 2
   multi  : 75222 us
   single : 75331 us
   
   > ./a.out 100 1 10 2
   multi  : 1885 us
   single : 1953 us
   
   > ./a.out 100 1000 10 2
   multi  : 1425 us
   single : 204 us
   ```
   Here is the code.
   
   ```c++
   #include <algorithm>
   #include <random>
   #include <chrono>
   #include <iostream>
   
   using index_t = unsigned int;
   
   // The current implementation
   template<typename DType, typename Rand>
   void ShuffleND(DType* const out, const index_t size,
                  const index_t first_axis_len, Rand* const prnd) {
     const index_t stride = size / first_axis_len;
     auto rand_n = [prnd](index_t n) {
       std::uniform_int_distribution<index_t> dist(0, n - 1);
       return dist(*prnd);
     };
     for (index_t i = 

[GitHub] cjolivier01 commented on issue #10042: Gluon dataloader crash on speech recognition training

2018-03-12 Thread GitBox
cjolivier01 commented on issue #10042: Gluon dataloader crash on speech 
recognition training
URL: 
https://github.com/apache/incubator-mxnet/issues/10042#issuecomment-372522379
 
 
   ok, will take a look tomorrow
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] sxjscience commented on issue #10042: Gluon dataloader crash on speech recognition training

2018-03-12 Thread GitBox
sxjscience commented on issue #10042: Gluon dataloader crash on speech 
recognition training
URL: 
https://github.com/apache/incubator-mxnet/issues/10042#issuecomment-372521638
 
 
   100%
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] cjolivier01 commented on issue #10042: Gluon dataloader crash on speech recognition training

2018-03-12 Thread GitBox
cjolivier01 commented on issue #10042: Gluon dataloader crash on speech 
recognition training
URL: 
https://github.com/apache/incubator-mxnet/issues/10042#issuecomment-372518547
 
 
   with what frequency does it occur?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services



[GitHub] sxjscience commented on issue #10042: Gluon dataloader crash on speech recognition training

2018-03-12 Thread GitBox
sxjscience commented on issue #10042: Gluon dataloader crash on speech 
recognition training
URL: 
https://github.com/apache/incubator-mxnet/issues/10042#issuecomment-372517195
 
 
   @cjolivier01 @piiswrong After BinarySearch, I can confirm that the problem 
is due to this PR: 
https://github.com/apache/incubator-mxnet/commit/106f97f1881e6bb1a00c56a0ae55200e27297733
   
   ```
   3/4 94d3c06f8511782f405f5dbf4bccf61647a78cf3 Fail
   2/27 7a0509d6f5ee19b0d3530fb2a4cb944e4f743b33 Fail
   2/23 fbbc080d47323dbc23eef4a1453452624cea859b Fail
   2/23 106f97f1881e6bb1a00c56a0ae55200e27297733 Fail
   2/23 b23e0a9f9f28f886ab20e48d0fcabcf0f8db91c4 Succeed
   2/21 ed21873b73fc633fa0a3866236ec5a92057c2056 Succeed
   ```
   
   Also, I compile without setting the `USE_PROFILE` flag.
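   The table above is effectively a manual `git bisect`: a binary search over the ordered commit list for the first failing commit. A generic sketch of that search (abbreviated hashes taken from the table; `is_bad` stands in for rebuilding and running the test at each commit):

   ```python
   def first_bad(commits, is_bad):
       # Assumes commits are ordered oldest -> newest and that once a
       # commit is bad, every later commit is bad too.
       lo, hi = 0, len(commits) - 1
       while lo < hi:
           mid = (lo + hi) // 2
           if is_bad(commits[mid]):
               hi = mid          # first bad commit is at mid or earlier
           else:
               lo = mid + 1      # first bad commit is after mid
       return commits[lo]

   commits = ["ed21873", "b23e0a9", "106f97f", "fbbc080", "7a0509d", "94d3c06"]
   bad = {"106f97f", "fbbc080", "7a0509d", "94d3c06"}
   first_bad(commits, lambda c: c in bad)  # -> "106f97f"
   ```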


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] sxjscience commented on issue #10082: All workloads are pushed to the key queue in multi-processing DataLoader

2018-03-12 Thread GitBox
sxjscience commented on issue #10082: All workloads are pushed to the key queue 
in multi-processing DataLoader
URL: 
https://github.com/apache/incubator-mxnet/issues/10082#issuecomment-372514596
 
 
   @piiswrong @yzhliu 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] sxjscience opened a new issue #10082: All workloads are pushed to the key queue in multi-processing DataLoader

2018-03-12 Thread GitBox
sxjscience opened a new issue #10082: All workloads are pushed to the key queue 
in multi-processing DataLoader
URL: https://github.com/apache/incubator-mxnet/issues/10082
 
 
   In the current implementation of the multi-processing part of the 
DataLoader, all batches in the `batch_sampler` are pushed to the key queue. See 
https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/data/dataloader.py#L218-L219
 . This is inefficient and results in an endless loop if the 
batch_sampler generates an infinite number of samples, e.g.,
   ```python
   class InfiniteSampler(object):
       def __iter__(self):
           while True:
               yield 1
   ```
   I find that PyTorch incrementally pre-fetches new batches 
(http://pytorch.org/docs/master/_modules/torch/utils/data/dataloader.html#DataLoader).
 We'd better change the logic to be similar to PyTorch's.
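   A minimal sketch of the incremental pre-fetching idea, with a plain list standing in for the key queue (a toy model, not the actual DataLoader code):

   ```python
   def infinite_sampler():
       # endless stream of batches, like the InfiniteSampler above
       while True:
           yield 1

   def prefetch(batch_iter, queue, max_outstanding):
       # Top the queue up to a fixed watermark instead of draining
       # the (possibly infinite) iterator all at once.
       while len(queue) < max_outstanding:
           queue.append(next(batch_iter))

   queue = []
   it = infinite_sampler()
   prefetch(it, queue, 4)   # only 4 batches pulled; no endless loop
   queue.pop(0)             # a consumer takes one batch
   prefetch(it, queue, 4)   # producer tops the queue back up by one
   ```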


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] CoinCheung commented on a change in pull request #10000: [MXNET-80] Fix average pooling kernel size assignment error

2018-03-12 Thread GitBox
CoinCheung commented on a change in pull request #10000: [MXNET-80] Fix average 
pooling kernel size assignment error
URL: https://github.com/apache/incubator-mxnet/pull/10000#discussion_r173990770
 
 

 ##
 File path: src/operator/nn/pooling-inl.h
 ##
 @@ -56,11 +56,11 @@ struct PoolingParam : public dmlc::Parameter<PoolingParam> {
 DMLC_DECLARE_FIELD(cudnn_off).set_default(false)
 .describe("Turn off cudnn pooling and use MXNet pooling operator. ");
 
-DMLC_DECLARE_FIELD(kernel)
+DMLC_DECLARE_FIELD(kernel).set_default(TShape())  // add default value here
 .enforce_nonzero()
 .describe("Pooling kernel size: (y, x) or (d, y, x)");
 
-DMLC_DECLARE_FIELD(pool_type)
+DMLC_DECLARE_FIELD(pool_type).set_default(pool_enum::kMaxPooling)  // add 
default pooling method
 .add_enum("max", pool_enum::kMaxPooling)
 .add_enum("avg", pool_enum::kAvgPooling)
 .add_enum("sum", pool_enum::kSumPooling)
 
 Review comment:
   Sorry, maybe I misunderstood what you said; I will give it a try.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] xinyu-intel commented on a change in pull request #9918: [MXNET-74]Update mkldnn to the newest & Add clang build test with mkldnn.

2018-03-12 Thread GitBox
xinyu-intel commented on a change in pull request #9918: [MXNET-74]Update 
mkldnn to the newest & Add clang build test with mkldnn.
URL: https://github.com/apache/incubator-mxnet/pull/9918#discussion_r173990615
 
 

 ##
 File path: Jenkinsfile
 ##
 @@ -175,6 +175,24 @@ try {
 }
   }
 },
+'CPU: Clang 3.9 MKLDNN': {
+  node('mxnetlinux-cpu') {
+ws('workspace/build-cpu-clang39') {
+  init_git()
+  sh "ci/build.py --build --platform ubuntu_cpu 
/work/runtime_functions.sh build_ubuntu_cpu_clang39_mkldnn"
+  pack_lib('mkldnn_cpu', mx_mkldnn_lib)
+}
+  }
+},
+'CPU: Clang 5 MKLDNN': {
+  node('mxnetlinux-cpu') {
+ws('workspace/build-cpu-clang50') {
 
 Review comment:
   Thank you, I am going to fix it:)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] CoinCheung commented on a change in pull request #10000: [MXNET-80] Fix average pooling kernel size assignment error

2018-03-12 Thread GitBox
CoinCheung commented on a change in pull request #10000: [MXNET-80] Fix average 
pooling kernel size assignment error
URL: https://github.com/apache/incubator-mxnet/pull/10000#discussion_r173990460
 
 

 ##
 File path: src/operator/nn/pooling-inl.h
 ##
 @@ -56,11 +56,11 @@ struct PoolingParam : public dmlc::Parameter<PoolingParam> {
 DMLC_DECLARE_FIELD(cudnn_off).set_default(false)
 .describe("Turn off cudnn pooling and use MXNet pooling operator. ");
 
-DMLC_DECLARE_FIELD(kernel)
+DMLC_DECLARE_FIELD(kernel).set_default(TShape())  // add default value here
 .enforce_nonzero()
 .describe("Pooling kernel size: (y, x) or (d, y, x)");
 
-DMLC_DECLARE_FIELD(pool_type)
+DMLC_DECLARE_FIELD(pool_type).set_default(pool_enum::kMaxPooling)  // add 
default pooling method
 .add_enum("max", pool_enum::kMaxPooling)
 .add_enum("avg", pool_enum::kAvgPooling)
 .add_enum("sum", pool_enum::kSumPooling)
 
 Review comment:
   But I see in the original version that the order is: global_pool(false), 
cudnn_off(false), kernel(no default value), pool_type(no default value).
   
   In the original version, "kernel" and "pool_type" do not have default values, 
but they come after "global_pool" and "cudnn_off", which do have default values.
   
   
https://github.com/apache/incubator-mxnet/blob/94f68fc8fd21611b7f5c148cb0e5d134efe58f87/src/operator/nn/pooling-inl.h#L59


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] szha commented on a change in pull request #10025: Language model with Google's billion words dataset

2018-03-12 Thread GitBox
szha commented on a change in pull request #10025: Language model with Google's 
billion words dataset
URL: https://github.com/apache/incubator-mxnet/pull/10025#discussion_r173988730
 
 

 ##
 File path: example/rnn/large_word_lm/data.py
 ##
 @@ -0,0 +1,202 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import mxnet as mx
+import numpy as np
+import codecs, glob, random, logging
+
+class Vocabulary(object):
+    """ A dictionary for words.
+    Adapted from @rafaljozefowicz's implementation.
+    """
+    def __init__(self):
+        self._token_to_id = {}
+        self._token_to_count = {}
 
 Review comment:
   collections.Counter?
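   For context, the suggestion is to replace the manual `_token_to_count` dict with `collections.Counter`, which handles counting and frequency queries directly:

   ```python
   from collections import Counter

   tokens = "the cat sat on the mat the end".split()
   counts = Counter(tokens)          # token -> count, no manual bookkeeping
   top = counts.most_common(1)       # [('the', 3)]
   ```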


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] ShootingSpace commented on issue #10068: rnn.encode_sentences deals with unknown token

2018-03-12 Thread GitBox
ShootingSpace commented on issue #10068: rnn.encode_sentences deals with 
unknown token
URL: 
https://github.com/apache/incubator-mxnet/issues/10068#issuecomment-372504273
 
 
   This is a simple but necessary solution. People are often interested only in 
the k most frequent tokens, i.e. those in the provided dictionary; other tokens 
are treated as unknown and can be replaced with a special mark (e.g. 'UNK') that 
is added to the dictionary.
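   A sketch of that replacement (`encode_with_unk` is a hypothetical helper, not the actual `rnn.encode_sentences` signature):

   ```python
   def encode_with_unk(sentence, vocab, unk='UNK'):
       # Map out-of-vocabulary tokens to a shared 'UNK' id,
       # adding 'UNK' to the dictionary on first use.
       if unk not in vocab:
           vocab[unk] = len(vocab)
       return [vocab.get(tok, vocab[unk]) for tok in sentence]

   vocab = {'hello': 0, 'world': 1}
   ids = encode_with_unk(['hello', 'there'], vocab)
   # 'there' is unknown, so it maps to the newly added 'UNK' id
   ```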


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services



[GitHub] sxjscience commented on issue #10042: Gluon dataloader crash on speech recognition training

2018-03-12 Thread GitBox
sxjscience commented on issue #10042: Gluon dataloader crash on speech 
recognition training
URL: 
https://github.com/apache/incubator-mxnet/issues/10042#issuecomment-372501972
 
 
   @Jerryzcn  found that v1.1.0 does not have this problem.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] cjolivier01 commented on issue #9410: Training with the same parameters and seed gets significantly different results

2018-03-12 Thread GitBox
cjolivier01 commented on issue #9410: Training with the same parameters and 
seed gets significantly different results
URL: 
https://github.com/apache/incubator-mxnet/issues/9410#issuecomment-372493368
 
 
   Ok, I can reproduce with CUDNN enabled...
   
   [Epoch 0] accuracy=0.101000
   [Epoch 0] accuracy=0.119300
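   For anyone reproducing this, the usual first step is to pin every seed; in pure Python the pattern looks like the following (framework-level nondeterminism, e.g. from cuDNN, is not covered by seeding alone):

   ```python
   import random

   def run(seed):
       # stand-in for a training run that consumes random numbers
       random.seed(seed)
       return [random.random() for _ in range(3)]

   same = run(42) == run(42)   # identical draws with the same seed
   diff = run(42) != run(43)   # different seeds diverge
   ```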
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services



[GitHub] cjolivier01 commented on issue #9410: Training with the same parameters and seed gets significantly different results

2018-03-12 Thread GitBox
cjolivier01 commented on issue #9410: Training with the same parameters and 
seed gets significantly different results
URL: 
https://github.com/apache/incubator-mxnet/issues/9410#issuecomment-372490007
 
 
   I didn't have CUDNN enabled. Trying that now...


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] cjolivier01 commented on issue #9410: Training with the same parameters and seed gets significantly different results

2018-03-12 Thread GitBox
cjolivier01 commented on issue #9410: Training with the same parameters and 
seed gets significantly different results
URL: 
https://github.com/apache/incubator-mxnet/issues/9410#issuecomment-372489626
 
 
   Also, is this done with the latest build?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services



[GitHub] fedorzh commented on issue #9410: Training with the same parameters and seed gets significantly different results

2018-03-12 Thread GitBox
fedorzh commented on issue #9410: Training with the same parameters and seed 
gets significantly different results
URL: 
https://github.com/apache/incubator-mxnet/issues/9410#issuecomment-372489419
 
 
   Amazon p2.xlarge instance, cuda 9, cudnn7, no mkl


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] eric-haibin-lin opened a new pull request #10081: [MXNET-82] [WIP] Sparse op tutorial for developers

2018-03-12 Thread GitBox
eric-haibin-lin opened a new pull request #10081: [MXNET-82] [WIP] Sparse op 
tutorial for developers
URL: https://github.com/apache/incubator-mxnet/pull/10081
 
 
   ## Description ##
   (Brief description on what this PR is about)
   
   ## Checklist ##
   ### Essentials ###
   - [ ] Passed code style checking (`make lint`)
   - [ ] Changes are complete (i.e. I finished coding on this PR)
   - [ ] All changes have test coverage:
   - Unit tests are added for small changes to verify correctness (e.g. adding 
a new operator)
   - Nightly tests are added for complicated/long-running ones (e.g. changing 
distributed kvstore)
   - Build tests will be added for build configuration changes (e.g. adding a 
new build option with NCCL)
   - [ ] Code is well-documented: 
   - For user-facing API changes, API doc string has been updated. 
   - For new C++ functions in header files, their functionalities and arguments 
are documented. 
   - For new examples, README.md is added to explain what the example does, 
the source of the dataset, expected performance on the test set, and a reference to 
the original paper if applicable
   - [ ] To my best knowledge, examples are either not affected by this 
change, or have been fixed to be compatible with this change
   
   ### Changes ###
   - [ ] Feature1, tests, (and when applicable, API doc)
   - [ ] Feature2, tests, (and when applicable, API doc)
   
   ## Comments ##
   - If this change is a backward incompatible change, why must this change be 
made.
   - Interesting edge cases to note here
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] KellenSunderland closed pull request #10080: WIP: Test building without explicit MKL disable

2018-03-12 Thread GitBox
KellenSunderland closed pull request #10080: WIP: Test building without 
explicit MKL disable
URL: https://github.com/apache/incubator-mxnet/pull/10080
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/ci/docker/runtime_functions.sh b/ci/docker/runtime_functions.sh
index 89ea44fa1ef..af0a653fc3f 100755
--- a/ci/docker/runtime_functions.sh
+++ b/ci/docker/runtime_functions.sh
@@ -278,8 +278,6 @@ build_ubuntu_gpu_cmake() {
 cmake \
 -DUSE_CUDA=1   \
 -DUSE_CUDNN=1  \
--DUSE_MKLML_MKL=0  \
--DUSE_MKLDNN=0 \
 -DCMAKE_BUILD_TYPE=Release \
 -G Ninja   \
 /work/mxnet


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kenfehling commented on issue #2910: Helper function to get GPU device number?

2018-03-12 Thread GitBox
kenfehling commented on issue #2910: Helper function to get GPU device number?
URL: 
https://github.com/apache/incubator-mxnet/issues/2910#issuecomment-372485992
 
 
   When I try ```os.environ["CUDA_VISIBLE_DEVICES"]``` I get:
   ```
   KeyError: 'CUDA_VISIBLE_DEVICES'
   ```
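   The KeyError just means the variable is unset; a defensive lookup with a default avoids it (plain Python dict/os.environ behavior, not an MXNet helper):

   ```python
   import os

   env = dict(os.environ)
   env.pop("CUDA_VISIBLE_DEVICES", None)   # simulate the unset case

   # .get returns a default instead of raising KeyError
   visible = env.get("CUDA_VISIBLE_DEVICES", "")
   device_ids = [int(d) for d in visible.split(",") if d.strip()]
   # device_ids is empty when the variable is unset
   ```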


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] cjolivier01 commented on issue #9410: Training with the same parameters and seed gets significantly different results

2018-03-12 Thread GitBox
cjolivier01 commented on issue #9410: Training with the same parameters and 
seed gets significantly different results
URL: 
https://github.com/apache/incubator-mxnet/issues/9410#issuecomment-372486000
 
 
   Can you please supply build parameters that you use? ie CUDA, yes? CUDNN? 
MKL? etc.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] cjolivier01 commented on issue #9410: Training with the same parameters and seed gets significantly different results

2018-03-12 Thread GitBox
cjolivier01 commented on issue #9410: Training with the same parameters and 
seed gets significantly different results
URL: 
https://github.com/apache/incubator-mxnet/issues/9410#issuecomment-372485696
 
 
   Modified script per your second code comment (smaller test set):
   ```python
   import mxnet as mx
   from mxnet import nd, gluon, autograd, ndarray
   import numpy as np
   import random
   from mxnet import profiler
   
   # profiler.set_config(profile_symbolic=True, aggregate_stats=True, continuous_dump=True)
   # profiler.set_state('run')
   
   def transform(data, label):
       return [dat.astype(np.float32) for dat in data], [lab.astype(np.float32) for lab in label]
   
   
   train_cifar_gluon = gluon.data.vision.CIFAR10(train=True, transform=transform)
   test_cifar_gluon = gluon.data.vision.CIFAR10(train=False, transform=transform)
   
   def convert_gluon_dataset_to_numpy(data):
       ds = data[:][0][0].shape
       X = np.empty((len(data[:][0]), ds[2], ds[0], ds[1]), dtype=np.float32)
       for i, example in enumerate(data[:][0]):
           X[i, :] = np.rollaxis(example.asnumpy(), 2)
       y = np.array(data[:][1])
       return X, y
   
   X, y = convert_gluon_dataset_to_numpy(train_cifar_gluon)
   X_test, y_test = convert_gluon_dataset_to_numpy(test_cifar_gluon)
   
   
   # In[2]:
   
   
   def predict_scores(net, X_, batch_size, context):
       scores = None
       test_loaded = gluon.data.DataLoader(mx.nd.array(X_), batch_size, shuffle=False)
       for data in test_loaded:
           data = data.as_in_context(context)
           output = net(data).asnumpy()
           if scores is None:
               scores = output
           else:
               scores = np.append(scores, output, axis=0)
       return scores
   
   
   # In[3]:
   
   
   gpu_count = 1
   _ctx_list = [mx.gpu(i) for i in range(gpu_count)]
   _batch_size = 64
   epochs = 1
   _seed = 42
   _optimizer = 'sgd'
   _learning_rate = 0.1
   _xavier_magnitude = 2.
   _momentum = 0.9
   _wd = 0.0001
   _nclasses = 10
   
   
   n_batch = 5000
   # n_batch = _batch_size * 2
   random_selector = np.random.RandomState(0)
   confidences = random_selector.rand(X.shape[0])
   cert_idx = np.argsort(confidences)
   selected_indices = cert_idx[:n_batch]
   selected_indices = np.sort(selected_indices)
   
   # ### Try 1
   
   # In[4]:
   
   
   random.seed(_seed)
   mx.random.seed(_seed)
   np.random.seed(_seed)
   
   
   # In[5]:
   
   
   net = gluon.model_zoo.vision.get_model('resnet34_v2', pretrained=False, classes=_nclasses, ctx=_ctx_list)
   
   loss = gluon.loss.SoftmaxCrossEntropyLoss()
   
   
   # In[6]:
   
   
   net.collect_params().initialize(mx.init.Xavier(magnitude=_xavier_magnitude), ctx=_ctx_list, force_reinit=True)
   
   trainer = gluon.Trainer(net.collect_params(), _optimizer,
                           optimizer_params=dict(learning_rate=_learning_rate, momentum=_momentum, wd=_wd),
                           kvstore='device' if len(_ctx_list) > 0 else 'local')
   
   # train_data = mx.io.NDArrayIter(X, label=y, batch_size=_batch_size)
   train_data = mx.io.NDArrayIter(X[selected_indices, :], label=y[selected_indices], batch_size=_batch_size)
   
   for e in range(epochs):
       train_data.reset()
       for batch in train_data:
           cur_contexts = _ctx_list
           if batch.data[0].shape[0] < len(_ctx_list):
               cur_contexts = cur_contexts[:batch.data[0].shape[0]]
           data = gluon.utils.split_and_load(batch.data[0], ctx_list=cur_contexts, batch_axis=0, even_split=False)
           label = gluon.utils.split_and_load(batch.label[0], ctx_list=cur_contexts, batch_axis=0, even_split=False)
           Ls = []
           with autograd.record():  # Start recording the derivatives
               for x_cur, y_cur in zip(data, label):
                   L = loss(net(x_cur), y_cur)
                   # store the loss and do backward after we have done forward
                   # on all GPUs for better speed on multiple GPUs.
                   Ls.append(L)
           for L in Ls:
               L.backward()
           trainer.step(batch.data[0].shape[0])
   
       scores_test = predict_scores(net, X_test, _batch_size, _ctx_list[0])
       predictions_test = np.argmax(scores_test, axis=1)
       accuracy = np.mean(predictions_test == y_test)
       print('[Epoch %d] accuracy=%f' % (e, accuracy))
   
   
   # ### Try 2
   
   # In[7]:
   # profiler.set_state('stop')
   # print(profiler.dumps(True))
   
   random.seed(_seed)
   mx.random.seed(_seed)
   np.random.seed(_seed)
   
   
   # In[8]:
   
   
   net = gluon.model_zoo.vision.get_model('resnet34_v2', pretrained=False, classes=_nclasses, ctx=_ctx_list)
   
   loss = gluon.loss.SoftmaxCrossEntropyLoss()
   
   
   # In[9]:
   
   
   net.collect_params().initialize(mx.init.Xavier(magnitude=_xavier_magnitude), ctx=_ctx_list, force_reinit=True)
   
   trainer = 

[GitHub] cjolivier01 commented on issue #9410: Training with the same parameters and seed gets significantly different results

2018-03-12 Thread GitBox
cjolivier01 commented on issue #9410: Training with the same parameters and 
seed gets significantly different results
URL: 
https://github.com/apache/incubator-mxnet/issues/9410#issuecomment-372485876
 
 
   I get the following (one GPU):
   [Epoch 0] accuracy=0.10
   [Epoch 0] accuracy=0.10
   




[GitHub] cjolivier01 commented on issue #9410: Training with the same parameters and seed gets significantly different results

2018-03-12 Thread GitBox
cjolivier01 commented on issue #9410: Training with the same parameters and 
seed gets significantly different results
URL: 
https://github.com/apache/incubator-mxnet/issues/9410#issuecomment-372485413
 
 
   If I run the script by itself, without the second training run, I get the 
same result every time:
   [Epoch 0] accuracy=0.10
   




[GitHub] rahul003 commented on issue #9774: mx.io.ImageRecordIter does not respect dtype argument / FP16 performance on Volta

2018-03-12 Thread GitBox
rahul003 commented on issue #9774: mx.io.ImageRecordIter does not respect dtype 
argument / FP16 performance on Volta
URL: 
https://github.com/apache/incubator-mxnet/issues/9774#issuecomment-371654013
 
 
   Unfortunately, neither suggestion improved the speed. Using MXNET_CUDNN_AUTOTUNE_DEFAULT=2 helped in some cases, but we can't say the setting helps consistently. If it picks the fastest algorithm, why would it not help in all cases? I understand cases where it should be the same speed as other algorithms, but sometimes it is slower than setting it to 1. All else should remain the same, right?
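
For reference, a minimal sketch of how the setting discussed above is applied; the variable is read by the backend, so it must be exported (or set in `os.environ`) before mxnet is imported. The value meanings below follow MXNet's environment-variable docs:

```python
import os

# 0 = autotune off, 1 = autotune within a limited workspace,
# 2 = pick the fastest algorithm regardless of workspace size.
os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "2"

# import mxnet as mx  # import only after the variable is set
print(os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"])  # → 2
```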




[GitHub] rahul003 commented on issue #9774: mx.io.ImageRecordIter does not respect dtype argument / FP16 performance on Volta

2018-03-12 Thread GitBox
rahul003 commented on issue #9774: mx.io.ImageRecordIter does not respect dtype 
argument / FP16 performance on Volta
URL: 
https://github.com/apache/incubator-mxnet/issues/9774#issuecomment-372480640
 
 
   Sorry, I was digressing from the topic of the issue. Regarding the iterator issue, we need to document that it returns fp32 data regardless of the dtype argument. Keeping this open until we fix it or document it.




[GitHub] rahul003 opened a new issue #9774: mx.io.ImageRecordIter does not respect dtype argument / FP16 performance on Volta

2018-03-12 Thread GitBox
rahul003 opened a new issue #9774: mx.io.ImageRecordIter does not respect dtype 
argument / FP16 performance on Volta
URL: https://github.com/apache/incubator-mxnet/issues/9774
 
 
   ## Description
   `mx.io.ImageRecordIter` (i.e. `src/io/iter_image_recordio_2.cc`) doesn't respect the `dtype` parameter it takes.
   It is designed to work only with float32 because the class is instantiated with the real_t dtype (in src/io/iter_image_recordio_2.cc). Can we make it handle fp16 too? This is important for fp16 training.
   
   Also, training in fp16 seems slower than fp32 for some models. 
   
   ## Environment info (Required)
   Mxnet 1.0
   Package used: Python
   
   ## Error Message:
   Silently generates fp32 data
   
   ## Minimum reproducible example
   N/A
   
   ## Steps to reproduce
   N/A
   
   ## What have you tried to solve it?
   Can we come up with a better way than to create a new operator passing DType 
as fp16?
   
   @ptrendx 
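
Until the iterator honors `dtype`, one user-side workaround is to cast each batch after the iterator yields it. The cast semantics are illustrated below with a stand-in NumPy batch (an `mx.nd` array's `astype` behaves analogously); the NCHW shape is hypothetical:

```python
import numpy as np

# The iterator silently yields float32; a stopgap is to cast each batch to
# float16 before feeding the network (stand-in NumPy batch shown here).
batch = np.random.uniform(size=(2, 3, 8, 8)).astype(np.float32)  # stand-in NCHW batch
fp16_batch = batch.astype(np.float16)
print(fp16_batch.dtype)  # → float16
```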




[incubator-mxnet] branch nlp_toolkit updated: fix CorpusReader (#10079)

2018-03-12 Thread zhasheng
This is an automated email from the ASF dual-hosted git repository.

zhasheng pushed a commit to branch nlp_toolkit
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git


The following commit(s) were added to refs/heads/nlp_toolkit by this push:
 new 7e59d74  fix CorpusReader (#10079)
7e59d74 is described below

commit 7e59d74de5f600b0f3841e9fa3b2bd8efb9c2378
Author: Xingjian Shi 
AuthorDate: Mon Mar 12 15:02:28 2018 -0700

fix CorpusReader (#10079)

try to accelerate

update

Revert "update"

This reverts commit 7d409b54e0fdb006e6d8be50749a90e24b12715d.

Revert "try to accelerate"

This reverts commit b7d93f9184323909ae59d77445d45588ea76aaa5.
---
 python/mxnet/gluon/data/text/base.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/python/mxnet/gluon/data/text/base.py 
b/python/mxnet/gluon/data/text/base.py
index 6c4cf87..3c02424 100644
--- a/python/mxnet/gluon/data/text/base.py
+++ b/python/mxnet/gluon/data/text/base.py
@@ -20,7 +20,7 @@
 
 """Base classes for text datasets and readers."""
 
-__all__ = ['WordLanguageReader']
+__all__ = ['CorpusReader', 'WordLanguageReader']
 
 import io
 import os
@@ -67,7 +67,7 @@ class CorpusReader(DataReader):
 if self._tokenizer:
 samples = [self._tokenizer(s) for s in samples if s]
 if self._flatten:
-samples = flatten(samples)
+samples = flatten_samples(samples)
 else:
 samples = [s for s in samples if s]
 return SimpleDataset(samples)

-- 
To stop receiving notification emails like this one, please contact
zhash...@apache.org.


[GitHub] szha closed pull request #10079: Fix CorpusReader

2018-03-12 Thread GitBox
szha closed pull request #10079: Fix CorpusReader
URL: https://github.com/apache/incubator-mxnet/pull/10079
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/python/mxnet/gluon/data/text/base.py 
b/python/mxnet/gluon/data/text/base.py
index 6c4cf877892..3c024245023 100644
--- a/python/mxnet/gluon/data/text/base.py
+++ b/python/mxnet/gluon/data/text/base.py
@@ -20,7 +20,7 @@
 
 """Base classes for text datasets and readers."""
 
-__all__ = ['WordLanguageReader']
+__all__ = ['CorpusReader', 'WordLanguageReader']
 
 import io
 import os
@@ -67,7 +67,7 @@ def read(self):
 if self._tokenizer:
 samples = [self._tokenizer(s) for s in samples if s]
 if self._flatten:
-samples = flatten(samples)
+samples = flatten_samples(samples)
 else:
 samples = [s for s in samples if s]
 return SimpleDataset(samples)


 




[GitHub] nswamy commented on issue #10063: Documentation API or file

2018-03-12 Thread GitBox
nswamy commented on issue #10063: Documentation API or file
URL: 
https://github.com/apache/incubator-mxnet/issues/10063#issuecomment-372477419
 
 
   @lutzroeder 
   We manage and answer user questions/how to's on https://discuss.mxnet.io/, 
please post this question there. I will close this issue.




[GitHub] nswamy closed issue #10063: Documentation API or file

2018-03-12 Thread GitBox
nswamy closed issue #10063: Documentation API or file
URL: https://github.com/apache/incubator-mxnet/issues/10063
 
 
   




[GitHub] nswamy commented on issue #10053: Get all labels from mx.io.ImageRecordIter

2018-03-12 Thread GitBox
nswamy commented on issue #10053: Get all labels from mx.io.ImageRecordIter
URL: 
https://github.com/apache/incubator-mxnet/issues/10053#issuecomment-372476272
 
 
   We manage and answer user questions/how to's on https://discuss.mxnet.io/, 
please post this question there. I will close this issue.




[GitHub] nswamy closed issue #10053: Get all labels from mx.io.ImageRecordIter

2018-03-12 Thread GitBox
nswamy closed issue #10053: Get all labels from mx.io.ImageRecordIter
URL: https://github.com/apache/incubator-mxnet/issues/10053
 
 
   




[GitHub] nswamy closed issue #10057: Multiple workers on single CPU training

2018-03-12 Thread GitBox
nswamy closed issue #10057: Multiple workers on single CPU training
URL: https://github.com/apache/incubator-mxnet/issues/10057
 
 
   




[GitHub] nswamy commented on issue #10057: Multiple workers on single CPU training

2018-03-12 Thread GitBox
nswamy commented on issue #10057: Multiple workers on single CPU training
URL: 
https://github.com/apache/incubator-mxnet/issues/10057#issuecomment-372476119
 
 
   We manage and answer user questions/how to's on https://discuss.mxnet.io/, 
please post this question there. I will close this issue.




[GitHub] Jerryzcn commented on issue #10042: Gluon dataloader crash on speech recognition training

2018-03-12 Thread GitBox
Jerryzcn commented on issue #10042: Gluon dataloader crash on speech 
recognition training
URL: 
https://github.com/apache/incubator-mxnet/issues/10042#issuecomment-372471802
 
 
   Here is the code that gets stuck. Changing num_workers to 0 makes it work; however, on Mac with 1.0.0post3 this is not an issue:
   ```
   from mxnet.gluon.data import DataLoader
   
   import random
   
   import mxnet.ndarray as nd
   import numpy as np
   from mxnet import context
   from mxnet.gluon.data.dataset import Dataset
   
   
   class Dummy(Dataset):
       def __init__(self, random_shape):
           self.random_shape = random_shape
   
       def __getitem__(self, idx):
           key = idx
           if self.random_shape:
               out = np.random.uniform(size=(random.randint(1000, 1100), 40))
               labels = np.random.uniform(size=(random.randint(10, 15)))
           else:
               out = np.random.uniform(size=(1000, 40))
               labels = np.random.uniform(size=(10))
           return key, out, labels
   
       def __len__(self):
           return 5
   
       def batchify(self, data):
           """
           Collate data into batch. Use shared memory for stacking.
   
           :param data: a list of array, with layout of 'NTC'.
           :return either x and x's unpadded lengths, or x, x's unpadded lengths,
               y and y's unpadded lengths if labels are not supplied.
           """
   
           # input layout is NTC
           keys, inputs, labels = [item[0] for item in data], [item[1] for item in data], \
                                  [item[2] for item in data]
   
           if len(data) > 1:
               max_data_len = max([seq.shape[0] for seq in inputs])
               max_labels_len = 0 if not labels else max([seq.shape[0] for seq in labels])
           else:
               max_data_len = inputs[0].shape[0]
               max_labels_len = 0 if not labels else labels[0].shape[0]
   
           x_lens = [item.shape[0] for item in inputs]
           y_lens = [item.shape[0] for item in labels]
   
           for i, seq in enumerate(inputs):
               pad_len = max_data_len - seq.shape[0]
               inputs[i] = np.pad(seq, ((0, pad_len), (0, 0)), 'constant', constant_values=0)
               labels[i] = np.pad(labels[i], (0, max_labels_len - labels[i].shape[0]),
                                  'constant', constant_values=-1)
   
           inputs = np.asarray(inputs, dtype=np.float32)
           if labels is not None:
               labels = np.asarray(labels, dtype=np.float32)
           inputs = inputs.transpose((1, 0, 2))
           labels = labels.transpose((1, 0))
   
           return (nd.array(inputs, dtype=inputs.dtype, ctx=context.Context('cpu_shared', 0)),
                   nd.array(x_lens, ctx=context.Context('cpu_shared', 0))) \
               if labels is None else (
                   nd.array(inputs, dtype=inputs.dtype, ctx=context.Context('cpu_shared', 0)),
                   nd.array(x_lens, ctx=context.Context('cpu_shared', 0)),
                   nd.array(labels, dtype=labels.dtype, ctx=context.Context('cpu_shared', 0)),
                   nd.array(y_lens, ctx=context.Context('cpu_shared', 0)))
   
   
   def main():
       data = Dummy(True)
       loader = DataLoader(data, batch_size=40, batchify_fn=data.batchify, num_workers=2)
       for epoch in range(20):
           for i, data in enumerate(loader):
               if i % 10 == 0:
                   print(data)
           print(i)
   
   
   if __name__ == '__main__':
       main()
   ```




[GitHub] KellenSunderland opened a new pull request #10080: WIP: Test building without explicit MKL disable

2018-03-12 Thread GitBox
KellenSunderland opened a new pull request #10080: WIP: Test building without 
explicit MKL disable
URL: https://github.com/apache/incubator-mxnet/pull/10080
 
 
   Testing CI when MKL isn't explicitly disabled.  Temporary PR, will be closed 
in the future without merge.




[GitHub] KellenSunderland commented on a change in pull request #10075: Fix CMake build issue with MKL.

2018-03-12 Thread GitBox
KellenSunderland commented on a change in pull request #10075: Fix CMake build 
issue with MKL.
URL: https://github.com/apache/incubator-mxnet/pull/10075#discussion_r173954308
 
 

 ##
 File path: ci/docker/runtime_functions.sh
 ##
 @@ -278,8 +278,6 @@ build_ubuntu_gpu_cmake() {
 cmake \
 -DUSE_CUDA=1   \
 -DUSE_CUDNN=1  \
--DUSE_MKLML_MKL=0  \
--DUSE_MKLDNN=0 \
 
 Review comment:
   build_ubuntu_gpu_cmake_mkldnn explicitly turns _on_ MKL, which is fine because if you're opting into a feature you know what that feature is. My intent here was to test the default behaviour (what a user is likely to do), i.e. what happens when you don't explicitly change any option other than CUDA.




[GitHub] nswamy commented on issue #10060: Does mxnet.image suppor s3?

2018-03-12 Thread GitBox
nswamy commented on issue #10060: Does mxnet.image suppor s3?
URL: 
https://github.com/apache/incubator-mxnet/issues/10060#issuecomment-372468469
 
 
   We manage and answer user questions/how to's on https://discuss.mxnet.io/, 
please post this question there. I will close this issue.
   
   




[GitHub] nswamy closed issue #10060: Does mxnet.image suppor s3?

2018-03-12 Thread GitBox
nswamy closed issue #10060: Does mxnet.image suppor s3?
URL: https://github.com/apache/incubator-mxnet/issues/10060
 
 
   




[GitHub] nswamy commented on issue #10063: Documentation API or file

2018-03-12 Thread GitBox
nswamy commented on issue #10063: Documentation API or file
URL: 
https://github.com/apache/incubator-mxnet/issues/10063#issuecomment-372468259
 
 
   We don't have it all in one file, but the docs are generated into md files here: 
https://github.com/nswamy/incubator-mxnet/tree/master/docs/api/python




[GitHub] nswamy commented on issue #10066: WarpCTC loss output

2018-03-12 Thread GitBox
nswamy commented on issue #10066: WarpCTC loss output
URL: 
https://github.com/apache/incubator-mxnet/issues/10066#issuecomment-372467431
 
 
   We manage and answer user questions/how to's on https://discuss.mxnet.io/, 
please post this question there. I will close this issue.




[GitHub] nswamy closed issue #10066: WarpCTC loss output

2018-03-12 Thread GitBox
nswamy closed issue #10066: WarpCTC loss output
URL: https://github.com/apache/incubator-mxnet/issues/10066
 
 
   




[GitHub] marcoabreu commented on a change in pull request #10075: Fix CMake build issue with MKL.

2018-03-12 Thread GitBox
marcoabreu commented on a change in pull request #10075: Fix CMake build issue 
with MKL.
URL: https://github.com/apache/incubator-mxnet/pull/10075#discussion_r173946287
 
 

 ##
 File path: ci/docker/runtime_functions.sh
 ##
 @@ -278,8 +278,6 @@ build_ubuntu_gpu_cmake() {
 cmake \
 -DUSE_CUDA=1   \
 -DUSE_CUDNN=1  \
--DUSE_MKLML_MKL=0  \
--DUSE_MKLDNN=0 \
 
 Review comment:
   This case is handled by build_ubuntu_gpu_cmake_mkldnn. The idea here is to 
create a variety of environments and different build configurations. 




[GitHub] nswamy closed issue #10070: Distributed Training (Permission denied)

2018-03-12 Thread GitBox
nswamy closed issue #10070: Distributed Training (Permission denied)
URL: https://github.com/apache/incubator-mxnet/issues/10070
 
 
   




[GitHub] nswamy commented on issue #10070: Distributed Training (Permission denied)

2018-03-12 Thread GitBox
nswamy commented on issue #10070: Distributed Training (Permission denied)
URL: 
https://github.com/apache/incubator-mxnet/issues/10070#issuecomment-372461628
 
 
   We manage and answer user questions/how to's on https://discuss.mxnet.io/, 
please post this question there. I will close this issue.
   




[GitHub] cjolivier01 commented on a change in pull request #10078: Support float16 in L2Normalization operator

2018-03-12 Thread GitBox
cjolivier01 commented on a change in pull request #10078: Support float16 in 
L2Normalization operator
URL: https://github.com/apache/incubator-mxnet/pull/10078#discussion_r173945220
 
 

 ##
 File path: src/operator/l2_normalization.cc
 ##
 @@ -26,13 +26,18 @@
 namespace mxnet {
 namespace op {
 template<>
-Operator* CreateOp<cpu>(L2NormalizationParam param) {
-  return new L2NormalizationOp<cpu>(param);
+Operator* CreateOp<cpu>(L2NormalizationParam param, int dtype) {
+  Operator* op = NULL;
+  MSHADOW_REAL_TYPE_SWITCH(dtype, DType, {
+    op = new L2NormalizationOp<cpu, DType>(param);
+  });
+  return op;
 }
 
 // DO_BIND_DISPATCH comes from static_operator_common.h
-Operator* L2NormalizationProp::CreateOperator(Context ctx) const {
-  DO_BIND_DISPATCH(CreateOp, param_);
+Operator* L2NormalizationProp::CreateOperatorEx(Context ctx, std::vector<TShape> *in_shape,
+                                                std::vector<int> *in_type) const {
+  DO_BIND_DISPATCH(CreateOp, param_, in_type->at(0));
 
 Review comment:
   I am not saying you need to change it, but if that were the case, you wouldn't have to override CreateOperatorEx(), which has nontrivial logic.




[GitHub] KellenSunderland commented on a change in pull request #10075: Fix CMake build issue with MKL.

2018-03-12 Thread GitBox
KellenSunderland commented on a change in pull request #10075: Fix CMake build 
issue with MKL.
URL: https://github.com/apache/incubator-mxnet/pull/10075#discussion_r173944839
 
 

 ##
 File path: ci/docker/runtime_functions.sh
 ##
 @@ -278,8 +278,6 @@ build_ubuntu_gpu_cmake() {
 cmake \
 -DUSE_CUDA=1   \
 -DUSE_CUDNN=1  \
--DUSE_MKLML_MKL=0  \
--DUSE_MKLDNN=0 \
 
 Review comment:
   I see the value in explicitly turning off MKL, but what we want to do here 
is test the default build.  I think CI should reflect how users are likely to 
actually use a project, and I don't think they're likely to explicitly turn off 
all the settings they're not using.  We can certainly do both if you prefer, 
but in my mind it's more important to ensure basic builds work.
   
   If you feel strongly these should stay in I'll amend the commit, after this 
patch it should work in either case.




[GitHub] cjolivier01 commented on a change in pull request #10078: Support float16 in L2Normalization operator

2018-03-12 Thread GitBox
cjolivier01 commented on a change in pull request #10078: Support float16 in 
L2Normalization operator
URL: https://github.com/apache/incubator-mxnet/pull/10078#discussion_r173944782
 
 

 ##
 File path: src/operator/l2_normalization.cc
 ##
 @@ -26,13 +26,18 @@
 namespace mxnet {
 namespace op {
 template<>
-Operator* CreateOp<cpu>(L2NormalizationParam param) {
-  return new L2NormalizationOp<cpu>(param);
+Operator* CreateOp<cpu>(L2NormalizationParam param, int dtype) {
+  Operator* op = NULL;
+  MSHADOW_REAL_TYPE_SWITCH(dtype, DType, {
+    op = new L2NormalizationOp<cpu, DType>(param);
+  });
+  return op;
 }
 
 // DO_BIND_DISPATCH comes from static_operator_common.h
-Operator* L2NormalizationProp::CreateOperator(Context ctx) const {
-  DO_BIND_DISPATCH(CreateOp, param_);
+Operator* L2NormalizationProp::CreateOperatorEx(Context ctx, std::vector<TShape> *in_shape,
+                                                std::vector<int> *in_type) const {
+  DO_BIND_DISPATCH(CreateOp, param_, in_type->at(0));
 
 Review comment:
   Just FYI, usually, DType is determined within the Forward() and Backward() 
functions using the type switch from the actual input blob at runtime.
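   For illustration, the runtime dispatch idea can be sketched in plain Python (a hypothetical analogue of the mshadow type switch, not actual MXNet code): the compute kernel is selected from the actual input's element type at call time, rather than when the operator object is created.

```python
# Hypothetical pure-Python analogue of the runtime type switch described
# above: the kernel is chosen from the input buffer's dtype at call time,
# not at operator-construction time.
import array

def forward(data):
    # Dispatch table keyed by the runtime element type of the input blob.
    kernels = {
        'f': lambda xs: [x * 2.0 for x in xs],  # float32 path
        'd': lambda xs: [x * 2.0 for x in xs],  # float64 path
    }
    return kernels[data.typecode](list(data))

assert forward(array.array('f', [1.0, 2.0])) == [2.0, 4.0]
assert forward(array.array('d', [1.0, 2.0])) == [2.0, 4.0]
```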




[GitHub] nswamy commented on issue #10077: Fine tuning network

2018-03-12 Thread GitBox
nswamy commented on issue #10077: Fine tuning network
URL: 
https://github.com/apache/incubator-mxnet/issues/10077#issuecomment-372460019
 
 
   We manage user questions on https://discuss.mxnet.io/; please post your question there. 
   I am closing this issue.




[GitHub] nswamy closed issue #10077: Fine tuning network

2018-03-12 Thread GitBox
nswamy closed issue #10077: Fine tuning network
URL: https://github.com/apache/incubator-mxnet/issues/10077
 
 
   




[GitHub] haojin2 commented on a change in pull request #10078: Support float16 in L2Normalization operator

2018-03-12 Thread GitBox
haojin2 commented on a change in pull request #10078: Support float16 in 
L2Normalization operator
URL: https://github.com/apache/incubator-mxnet/pull/10078#discussion_r173943169
 
 

 ##
 File path: src/operator/l2_normalization.cc
 ##
 @@ -26,13 +26,18 @@
 namespace mxnet {
 namespace op {
 template<>
-Operator* CreateOp<cpu>(L2NormalizationParam param) {
-  return new L2NormalizationOp<cpu>(param);
+Operator* CreateOp<cpu>(L2NormalizationParam param, int dtype) {
+  Operator* op = NULL;
+  MSHADOW_REAL_TYPE_SWITCH(dtype, DType, {
+    op = new L2NormalizationOp<cpu, DType>(param);
+  });
+  return op;
 }
 
 // DO_BIND_DISPATCH comes from static_operator_common.h
-Operator* L2NormalizationProp::CreateOperator(Context ctx) const {
-  DO_BIND_DISPATCH(CreateOp, param_);
+Operator* L2NormalizationProp::CreateOperatorEx(Context ctx, std::vector<TShape> *in_shape,
+                                                std::vector<int> *in_type) const {
+  DO_BIND_DISPATCH(CreateOp, param_, in_type->at(0));
 
 Review comment:
   I see; I just added calls to InferType and InferShape to the code, and the PR will 
be updated soon.




[GitHub] sxjscience opened a new pull request #10079: Fix CorpusReader

2018-03-12 Thread GitBox
sxjscience opened a new pull request #10079: Fix CorpusReader
URL: https://github.com/apache/incubator-mxnet/pull/10079
 
 
   ## Description ##
   Fix the corpus reader
   @szha 




[GitHub] cjolivier01 commented on a change in pull request #10078: Support float16 in L2Normalization operator

2018-03-12 Thread GitBox
cjolivier01 commented on a change in pull request #10078: Support float16 in 
L2Normalization operator
URL: https://github.com/apache/incubator-mxnet/pull/10078#discussion_r173941304
 
 

 ##
 File path: src/operator/l2_normalization.cc
 ##
 @@ -26,13 +26,18 @@
 namespace mxnet {
 namespace op {
 template<>
-Operator* CreateOp<cpu>(L2NormalizationParam param) {
-  return new L2NormalizationOp<cpu>(param);
+Operator* CreateOp<cpu>(L2NormalizationParam param, int dtype) {
+  Operator* op = NULL;
+  MSHADOW_REAL_TYPE_SWITCH(dtype, DType, {
+    op = new L2NormalizationOp<cpu, DType>(param);
+  });
+  return op;
 }
 
 // DO_BIND_DISPATCH comes from static_operator_common.h
-Operator* L2NormalizationProp::CreateOperator(Context ctx) const {
-  DO_BIND_DISPATCH(CreateOp, param_);
+Operator* L2NormalizationProp::CreateOperatorEx(Context ctx, std::vector<TShape> *in_shape,
+                                                std::vector<int> *in_type) const {
+  DO_BIND_DISPATCH(CreateOp, param_, in_type->at(0));
 
 Review comment:
   Since you're overriding CreateOperatorEx(), what ends up calling InferShape() and 
InferType(), which are normally called by the base class's CreateOperatorEx()?




[GitHub] cjolivier01 commented on a change in pull request #10078: Support float16 in L2Normalization operator

2018-03-12 Thread GitBox
cjolivier01 commented on a change in pull request #10078: Support float16 in 
L2Normalization operator
URL: https://github.com/apache/incubator-mxnet/pull/10078#discussion_r173941034
 
 

 ##
 File path: src/operator/l2_normalization-inl.h
 ##
 @@ -294,7 +321,13 @@ class L2NormalizationProp : public OperatorProperty {
 return {ResourceRequest::kTempSpace};
   }
 
-  Operator* CreateOperator(Context ctx) const override;
+  Operator* CreateOperator(Context ctx) const override {
 
 Review comment:
   Ok, I see it is masked by your override of CreateOperatorEx()
   




[GitHub] haojin2 commented on a change in pull request #10078: Support float16 in L2Normalization operator

2018-03-12 Thread GitBox
haojin2 commented on a change in pull request #10078: Support float16 in 
L2Normalization operator
URL: https://github.com/apache/incubator-mxnet/pull/10078#discussion_r173938378
 
 

 ##
 File path: src/operator/l2_normalization-inl.h
 ##
 @@ -294,7 +321,13 @@ class L2NormalizationProp : public OperatorProperty {
 return {ResourceRequest::kTempSpace};
   }
 
-  Operator* CreateOperator(Context ctx) const override;
+  Operator* CreateOperator(Context ctx) const override {
 
 Review comment:
   Honestly I'm not really sure; a simple grep for "CreateOperator" in src turned up 
only this usage:
   nnvm/legacy_op_util.cc:297:  return 
OpStatePtr::Create(prop.ptr->CreateOperatorEx(ctx, , ),






[GitHub] cjolivier01 commented on a change in pull request #10078: Support float16 in L2Normalization operator

2018-03-12 Thread GitBox
cjolivier01 commented on a change in pull request #10078: Support float16 in 
L2Normalization operator
URL: https://github.com/apache/incubator-mxnet/pull/10078#discussion_r173936443
 
 

 ##
 File path: src/operator/l2_normalization-inl.h
 ##
 @@ -294,7 +321,13 @@ class L2NormalizationProp : public OperatorProperty {
 return {ResourceRequest::kTempSpace};
   }
 
-  Operator* CreateOperator(Context ctx) const override;
+  Operator* CreateOperator(Context ctx) const override {
 
 Review comment:
   Does something still call this?




[GitHub] marcoabreu commented on a change in pull request #9552: [REQUEST FOR REVIEW | DO NOT MERGE] Model Quantization with Calibration

2018-03-12 Thread GitBox
marcoabreu commented on a change in pull request #9552: [REQUEST FOR REVIEW | 
DO NOT MERGE] Model Quantization with Calibration
URL: https://github.com/apache/incubator-mxnet/pull/9552#discussion_r173934712
 
 

 ##
 File path: tests/ci_build/Dockerfile.build_cuda8_cudnn7
 ##
 @@ -0,0 +1,26 @@
+FROM nvidia/cuda:8.0-cudnn7-devel
+# cuda8.0 has to be used because this is the first ubuntu16.04 container
+# which is required due to OpenBLAS being incompatible with ubuntu14.04
+# we use a gpu base container because we are going to test the MKLDNN
+# operator implementation against the GPU implementation
 
 Review comment:
   Please remove this file, we're not using that directory for dockerfiles 
anymore




[GitHub] haojin2 opened a new pull request #10078: Support float16 in L2Normalization operator

2018-03-12 Thread GitBox
haojin2 opened a new pull request #10078: Support float16 in L2Normalization 
operator
URL: https://github.com/apache/incubator-mxnet/pull/10078
 
 
   ## Description ##
   Add support for any datatype for L2Normalization operator.
   
   ## Checklist ##
   ### Essentials ###
   - [x] Passed code style checking (`make lint`)
   - [x] Changes are complete (i.e. I finished coding on this PR)
   - [x] All changes have test coverage:
   - Unit tests are added for small changes to verify correctness (e.g. adding 
a new operator)
   - [x] Code is well-documented: 
   - [x] To the best of my knowledge, examples are either not affected by this 
change, or have been fixed to be compatible with this change
   
   ### Changes ###
   - [x] Change L2Normalization operator from only supporting real_t to 
supporting any datatype
   - [x] Add additional test cases for float16
   




[GitHub] reminisce commented on a change in pull request #9552: [REQUEST FOR REVIEW | DO NOT MERGE] Model Quantization with Calibration

2018-03-12 Thread GitBox
reminisce commented on a change in pull request #9552: [REQUEST FOR REVIEW | DO 
NOT MERGE] Model Quantization with Calibration
URL: https://github.com/apache/incubator-mxnet/pull/9552#discussion_r173931539
 
 

 ##
 File path: ci/docker/runtime_functions.sh
 ##
 @@ -307,6 +307,7 @@ unittest_ubuntu_python2_cpu() {
 export MXNET_STORAGE_FALLBACK_LOG_VERBOSE=0
 nosetests-2.7 --verbose tests/python/unittest
 nosetests-2.7 --verbose tests/python/train
+nosetests-2.7 --verbose tests/python/quantization
 
 Review comment:
   NN operators will not run on G3. There is a context check in the Python test 
function, and it will skip the tests that use a GPU context on G3.
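   The kind of guard described here can be sketched as follows (a minimal hypothetical illustration; the Device object and its capability flag stand in for MXNet's real context and hardware checks):

```python
# Hypothetical sketch of a capability-based test skip, in the spirit of
# the context check described above. Device and supports_int8 are
# stand-ins for MXNet's real context/capability probes.
from collections import namedtuple

Device = namedtuple('Device', ['name', 'device_type', 'supports_int8'])

def run_if_supported(test_fn, device):
    """Run test_fn only when the device can execute quantized NN ops."""
    if device.device_type == 'gpu' and not device.supports_int8:
        return 'skipped'  # e.g. a G3 instance lacking int8 support
    return test_fn(device)

g3 = Device('g3', 'gpu', False)
p3 = Device('p3', 'gpu', True)
assert run_if_supported(lambda d: 'ran', g3) == 'skipped'
assert run_if_supported(lambda d: 'ran', p3) == 'ran'
```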






[GitHub] marcoabreu commented on a change in pull request #9552: [REQUEST FOR REVIEW | DO NOT MERGE] Model Quantization with Calibration

2018-03-12 Thread GitBox
marcoabreu commented on a change in pull request #9552: [REQUEST FOR REVIEW | 
DO NOT MERGE] Model Quantization with Calibration
URL: https://github.com/apache/incubator-mxnet/pull/9552#discussion_r173931082
 
 

 ##
 File path: ci/docker/runtime_functions.sh
 ##
 @@ -307,6 +307,7 @@ unittest_ubuntu_python2_cpu() {
 export MXNET_STORAGE_FALLBACK_LOG_VERBOSE=0
 nosetests-2.7 --verbose tests/python/unittest
 nosetests-2.7 --verbose tests/python/train
+nosetests-2.7 --verbose tests/python/quantization
 
 Review comment:
   So what happens if an NN operator test is being hit on a G3 instance?




[GitHub] marcoabreu commented on a change in pull request #9552: [REQUEST FOR REVIEW | DO NOT MERGE] Model Quantization with Calibration

2018-03-12 Thread GitBox
marcoabreu commented on a change in pull request #9552: [REQUEST FOR REVIEW | 
DO NOT MERGE] Model Quantization with Calibration
URL: https://github.com/apache/incubator-mxnet/pull/9552#discussion_r173930937
 
 

 ##
 File path: ci/docker/runtime_functions.sh
 ##
 @@ -339,6 +341,32 @@ unittest_ubuntu_python3_gpu() {
 nosetests-3.4 --verbose tests/python/gpu
 }
 
+# quantization gpu currently only runs on P3 instances
+# need to separate it from unittest_ubuntu_python2_gpu()
+unittest_ubuntu_python2_quantization_gpu() {
+set -ex
+export PYTHONPATH=./python/ 
+# MXNET_MKLDNN_DEBUG is buggy and produces false positives
+# https://github.com/apache/incubator-mxnet/issues/10026
+#export MXNET_MKLDNN_DEBUG=1  # Ignored if not present
+export MXNET_STORAGE_FALLBACK_LOG_VERBOSE=0
+nosetests-2.7 --verbose tests/python/gpu
 
 Review comment:
   Yes, for now.




[GitHub] mbaijal commented on issue #10058: Adding back comments to index.md that cause nightly test to fail

2018-03-12 Thread GitBox
mbaijal commented on issue #10058: Adding back comments to index.md that cause 
nightly test to fail
URL: https://github.com/apache/incubator-mxnet/pull/10058#issuecomment-372444085
 
 
   @marcoabreu Can you please merge this, since it's causing a nightly test to fail?




[GitHub] reminisce commented on a change in pull request #9552: [REQUEST FOR REVIEW | DO NOT MERGE] Model Quantization with Calibration

2018-03-12 Thread GitBox
reminisce commented on a change in pull request #9552: [REQUEST FOR REVIEW | DO 
NOT MERGE] Model Quantization with Calibration
URL: https://github.com/apache/incubator-mxnet/pull/9552#discussion_r173927217
 
 

 ##
 File path: ci/docker/runtime_functions.sh
 ##
 @@ -339,6 +341,32 @@ unittest_ubuntu_python3_gpu() {
 nosetests-3.4 --verbose tests/python/gpu
 }
 
+# quantization gpu currently only runs on P3 instances
+# need to separate it from unittest_ubuntu_python2_gpu()
+unittest_ubuntu_python2_quantization_gpu() {
+set -ex
+export PYTHONPATH=./python/ 
+# MXNET_MKLDNN_DEBUG is buggy and produces false positives
+# https://github.com/apache/incubator-mxnet/issues/10026
+#export MXNET_MKLDNN_DEBUG=1  # Ignored if not present
+export MXNET_STORAGE_FALLBACK_LOG_VERBOSE=0
+nosetests-2.7 --verbose tests/python/gpu
 
 Review comment:
   Is it intended that we don't run any tests on P3 except quantization?




[GitHub] reminisce commented on a change in pull request #9552: [REQUEST FOR REVIEW | DO NOT MERGE] Model Quantization with Calibration

2018-03-12 Thread GitBox
reminisce commented on a change in pull request #9552: [REQUEST FOR REVIEW | DO 
NOT MERGE] Model Quantization with Calibration
URL: https://github.com/apache/incubator-mxnet/pull/9552#discussion_r173926870
 
 

 ##
 File path: ci/docker/runtime_functions.sh
 ##
 @@ -307,6 +307,7 @@ unittest_ubuntu_python2_cpu() {
 export MXNET_STORAGE_FALLBACK_LOG_VERBOSE=0
 nosetests-2.7 --verbose tests/python/unittest
 nosetests-2.7 --verbose tests/python/train
+nosetests-2.7 --verbose tests/python/quantization
 
 Review comment:
   The tests run here are for basic tensor operators such as quantize, 
dequantize, and requantize, which have both CPU and GPU versions implemented. 
The operators that can only run on P3 are NN operators such as FC, Convolution, 
and Pooling.
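   For context, the tensor-level arithmetic behind quantize/dequantize can be sketched in plain Python (a simplified symmetric int8 scheme; the real operators add calibrated thresholds, saturation modes, and a separate requantize step):

```python
# Simplified symmetric int8 quantize/dequantize, illustrating the kind
# of arithmetic the quantize/dequantize operators perform. Real MXNet
# kernels add calibrated thresholds and a requantize step.

def quantize_int8(values, threshold):
    """Map floats in [-threshold, threshold] to ints in [-127, 127]."""
    scale = 127.0 / threshold
    return [max(-127, min(127, round(v * scale))) for v in values]

def dequantize_int8(qvalues, threshold):
    """Recover approximate floats from int8 values."""
    scale = threshold / 127.0
    return [q * scale for q in qvalues]

data = [-1.0, -0.5, 0.0, 0.25, 1.0]
q = quantize_int8(data, threshold=1.0)
recovered = dequantize_int8(q, threshold=1.0)
# round-trip error is bounded by half a quantization step
assert all(abs(a - b) < 1e-2 for a, b in zip(data, recovered))
```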




[GitHub] szhengac commented on issue #9881: Inconsistent weight decay logics in multiple optimizers

2018-03-12 Thread GitBox
szhengac commented on issue #9881: Inconsistent weight decay logics in multiple 
optimizers
URL: 
https://github.com/apache/incubator-mxnet/issues/9881#issuecomment-372439433
 
 
   Unless explicitly specified otherwise, most optimizers as implemented in packages 
such as TF and Torch merge wd into the gradient before gradient clipping. When the 
proximal operator is used, the wd term is not merged.
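   The two orderings being contrasted can be sketched with a schematic single-parameter SGD step (purely illustrative, not any framework's actual implementation):

```python
# Schematic single-parameter SGD update showing the two weight-decay
# orderings discussed above. Purely illustrative, not framework code.

def clip(g, limit):
    return max(-limit, min(limit, g))

def step_wd_before_clip(w, grad, lr, wd, limit):
    # Common convention: merge wd into the gradient, then clip.
    g = clip(grad + wd * w, limit)
    return w - lr * g

def step_proximal(w, grad, lr, wd, limit):
    # Proximal-style: clip the raw gradient, apply decay separately
    # via the proximal operator of the L2 penalty.
    g = clip(grad, limit)
    return (w - lr * g) / (1.0 + lr * wd)

w0, grad = 1.0, 10.0
a = step_wd_before_clip(w0, grad, lr=0.1, wd=0.1, limit=1.0)
b = step_proximal(w0, grad, lr=0.1, wd=0.1, limit=1.0)
assert a == 0.9                    # clip(10.1, 1) = 1, then 1 - 0.1*1
assert abs(b - 0.891089) < 1e-5    # (1 - 0.1*1) / (1 + 0.01)
```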



