[GitHub] cjolivier01 commented on a change in pull request #9933: Adding support to profile kvstore server during distributed training
cjolivier01 commented on a change in pull request #9933: Adding support to profile kvstore server during distributed training
URL: https://github.com/apache/incubator-mxnet/pull/9933#discussion_r171450142

## File path: src/kvstore/kvstore_dist_server.h ##
@@ -170,6 +187,33 @@ class KVStoreDistServer {
     app->Response(recved);
   }
+  void SetProfilerConfig(std::string params_str) {
+    std::vector elems;
+    mxnet::kvstore::split(params_str, ',', std::back_inserter(elems));
+    std::vector ckeys;
+    std::vector cvals;
+    ckeys.reserve(elems.size());
+    cvals.reserve(elems.size());
+
+    for (int i=0; i < elems.size(); i++) {
+      std::vector parts;
+      mxnet::kvstore::split(elems[i], ':', std::back_inserter(parts));
+      CHECK_NOTNULL(parts[0].c_str());
+      CHECK_NOTNULL(parts[1].c_str());
+      if (parts[0] == "filename") {
+        parts[1] = "rank" + std::to_string(ps::MyRank()) + "_" + parts[1];

Review comment:
c_str() will never return null

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
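Note on the comment above: std::string::c_str() always returns a pointer to a null-terminated array, even for an empty string, so a null check on it cannot fail. A check on the number of parts and on their emptiness is probably closer to what the code intends. The sketch below uses only the standard library; IsValidPair and EmptyStringHasNonNullCStr are hypothetical names, not part of the pull request.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of the pull request): returns true only when
// a "key:value" element was split into exactly two non-empty parts.
bool IsValidPair(const std::vector<std::string>& parts) {
  return parts.size() == 2 && !parts[0].empty() && !parts[1].empty();
}

// c_str() on an empty string still yields a valid pointer to a lone '\0',
// so a null check on it cannot detect a missing key or value.
bool EmptyStringHasNonNullCStr() {
  const std::string empty;
  return empty.c_str() != nullptr && empty.c_str()[0] == '\0';
}
```

A guard of this kind would have to run before parts[0] and parts[1] are read, since indexing past the end of a vector is undefined behavior.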
[GitHub] cjolivier01 commented on a change in pull request #9933: Adding support to profile kvstore server during distributed training
cjolivier01 commented on a change in pull request #9933: Adding support to profile kvstore server during distributed training
URL: https://github.com/apache/incubator-mxnet/pull/9933#discussion_r171450417

## File path: src/kvstore/kvstore_dist_server.h ##
@@ -170,6 +187,33 @@ class KVStoreDistServer {
     app->Response(recved);
   }
+  void SetProfilerConfig(std::string params_str) {
+    std::vector elems;
+    mxnet::kvstore::split(params_str, ',', std::back_inserter(elems));
+    std::vector ckeys;
+    std::vector cvals;
+    ckeys.reserve(elems.size());
+    cvals.reserve(elems.size());
+
+    for (int i=0; i < elems.size(); i++) {
+      std::vector parts;

Review comment:
you never push into the vector, right? then [0] and [1] indexing is invalid addressing
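Note on the comment above: in the longer hunks quoted in this thread, parts is passed to split through std::back_inserter, which appends to the vector on every write. Whether [0] and [1] are valid then depends on how many tokens the input produced. The sketch below assumes split writes each token through the output iterator; its body is not shown in the diff, so Split here is a hypothetical stand-in, and ParsePair is likewise an illustrative name.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for mxnet::kvstore::split, whose body is not shown in
// the diff: writes each delimiter-separated token through an output iterator.
template <typename OutputIt>
void Split(const std::string& s, char delim, OutputIt out) {
  std::istringstream stream(s);
  std::string token;
  while (std::getline(stream, token, delim)) {
    *out++ = token;
  }
}

// Parses "key:value" and reports failure instead of indexing out of range.
bool ParsePair(const std::string& elem, std::string* key, std::string* value) {
  std::vector<std::string> parts;
  // std::back_inserter calls push_back on `parts` for every token written.
  Split(elem, ':', std::back_inserter(parts));
  if (parts.size() != 2) {
    return false;  // no ':' present, or more than two fields
  }
  *key = parts[0];
  *value = parts[1];
  return true;
}
```

With this shape, an element such as "filename" (no colon) is rejected rather than read past the end of the vector.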
[GitHub] cjolivier01 commented on a change in pull request #9933: Adding support to profile kvstore server during distributed training
cjolivier01 commented on a change in pull request #9933: Adding support to profile kvstore server during distributed training
URL: https://github.com/apache/incubator-mxnet/pull/9933#discussion_r171448085

## File path: src/kvstore/kvstore_dist_server.h ##
@@ -170,6 +187,33 @@ class KVStoreDistServer {
     app->Response(recved);
   }
+  void SetProfilerConfig(std::string params_str) {
+    std::vector elems;
+    mxnet::kvstore::split(params_str, ',', std::back_inserter(elems));
+    std::vector ckeys;
+    std::vector cvals;
+    ckeys.reserve(elems.size());
+    cvals.reserve(elems.size());
+
+    for (int i=0; i < elems.size(); i++) {
+      std::vector parts;
+      mxnet::kvstore::split(elems[i], ':', std::back_inserter(parts));
+      CHECK_NOTNULL(parts[0].c_str());
+      CHECK_NOTNULL(parts[1].c_str());
+      if (parts[0] == "filename") {
+        parts[1] = "rank" + std::to_string(ps::MyRank()) + "_" + parts[1];
+      }
+      char * ckey = new char[parts[0].length() + 1];
+      std::sprintf(ckey, "%s", parts[0].c_str());
+      ckeys.push_back(ckey);
+
+      char* cval = new char[parts[1].length() + 1];
+      std::sprintf(cval, "%s", parts[1].c_str());

Review comment:
why sprintf? it's not very efficient compared to a strcpy, which is generally pretty fast
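Note on the comment above: std::sprintf has to parse the format string before it copies anything, whereas std::strcpy copies bytes up to and including the terminator. A minimal sketch of the strcpy variant follows. CopyToCString is a hypothetical helper, and returning std::unique_ptr<char[]> instead of a raw new[] pointer is a substitution made here so the buffer is released automatically; the pull request itself stores raw pointers.

```cpp
#include <cstring>
#include <memory>
#include <string>

// Hypothetical helper (not part of the pull request): returns an owned,
// null-terminated copy of `s`.
std::unique_ptr<char[]> CopyToCString(const std::string& s) {
  // One extra byte for the terminating '\0'.
  std::unique_ptr<char[]> buf(new char[s.length() + 1]);
  // Plain byte copy up to and including the terminator; no format parsing.
  std::strcpy(buf.get(), s.c_str());
  return buf;
}
```

If the pointers must outlive the helper as raw char* values, as in the diff, the caller still needs a matching delete[] for every new[].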
[GitHub] cjolivier01 commented on a change in pull request #9933: Adding support to profile kvstore server during distributed training
cjolivier01 commented on a change in pull request #9933: Adding support to profile kvstore server during distributed training
URL: https://github.com/apache/incubator-mxnet/pull/9933#discussion_r171447567

## File path: src/kvstore/kvstore_dist.h ##
@@ -92,6 +92,14 @@ class KVStoreDist : public KVStoreLocal {
     }
   }
+  void SetServerProfilerCommand(KVStoreServerProfilerCommand type, const std::string params) override {

Review comment:
please pass std::string by const reference
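Note on the comment above: a parameter declared as const std::string is still passed by value, so an lvalue argument is copied on every call; const std::string& binds to the caller's object instead. The sketch below makes the difference observable by comparing addresses. Both function names are hypothetical and exist only for illustration.

```cpp
#include <string>

// By value (top-level const): `params` is a separate copy of the argument,
// so its address differs from the caller's string.
bool AliasesCallerByValue(const std::string params, const std::string* caller) {
  return &params == caller;
}

// By const reference: `params` refers to the caller's string, no copy is made.
bool AliasesCallerByConstRef(const std::string& params, const std::string* caller) {
  return &params == caller;
}
```

Because the function in the diff is marked override, the base-class declaration would need the same parameter type for the change to compile.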
[GitHub] cjolivier01 commented on a change in pull request #9933: Adding support to profile kvstore server during distributed training
cjolivier01 commented on a change in pull request #9933: Adding support to profile kvstore server during distributed training
URL: https://github.com/apache/incubator-mxnet/pull/9933#discussion_r171447968

## File path: src/kvstore/kvstore_dist_server.h ##
@@ -170,6 +187,33 @@ class KVStoreDistServer {
     app->Response(recved);
   }
+  void SetProfilerConfig(std::string params_str) {
+    std::vector elems;
+    mxnet::kvstore::split(params_str, ',', std::back_inserter(elems));
+    std::vector ckeys;
+    std::vector cvals;
+    ckeys.reserve(elems.size());
+    cvals.reserve(elems.size());
+
+    for (int i=0; i < elems.size(); i++) {
+      std::vector parts;
+      mxnet::kvstore::split(elems[i], ':', std::back_inserter(parts));
+      CHECK_NOTNULL(parts[0].c_str());
+      CHECK_NOTNULL(parts[1].c_str());
+      if (parts[0] == "filename") {
+        parts[1] = "rank" + std::to_string(ps::MyRank()) + "_" + parts[1];
+      }
+      char * ckey = new char[parts[0].length() + 1];
+      std::sprintf(ckey, "%s", parts[0].c_str());
+      ckeys.push_back(ckey);
+
+      char* cval = new char[parts[1].length() + 1];

Review comment:
nit: consistent * placement
[GitHub] cjolivier01 commented on a change in pull request #9933: Adding support to profile kvstore server during distributed training
cjolivier01 commented on a change in pull request #9933: Adding support to profile kvstore server during distributed training
URL: https://github.com/apache/incubator-mxnet/pull/9933#discussion_r171447768

## File path: src/kvstore/kvstore_dist_server.h ##
@@ -159,6 +163,19 @@ class KVStoreDistServer {
       sync_mode_ = true;
     } else if (recved_type == CommandType::kSetGradientCompression) {
       gradient_compression_->DecodeParams(recved.body);
+    } else if (recved_type == CommandType::kSetProfilerParams) {
+      // last char is the type of profiler command
+      KVStoreServerProfilerCommand profiler_command_type =
+        static_cast(recved.body.back() - '0');
+      if (profiler_command_type == KVStoreServerProfilerCommand::kSetConfig) {

Review comment:
prefer switch to many ifs
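Note on the comment above: a switch over an enum class keeps the dispatch in one place and lets the compiler warn when an enumerator is left unhandled. The sketch below is illustrative only. DemoProfilerCommand is a hypothetical enum: only kSetConfig appears in the diff, and the other enumerators are assumptions. DecodeCommand mirrors the `recved.body.back() - '0'` decoding and assumes a non-empty body.

```cpp
#include <string>

// Hypothetical enum for illustration; only kSetConfig appears in the diff,
// the remaining enumerators are assumptions.
enum class DemoProfilerCommand { kSetConfig, kState, kPause, kDump };

// Decodes the trailing digit of the command body. `body` must be non-empty.
DemoProfilerCommand DecodeCommand(const std::string& body) {
  return static_cast<DemoProfilerCommand>(body.back() - '0');
}

// One switch instead of a chain of if / else-if comparisons.
const char* Describe(DemoProfilerCommand cmd) {
  switch (cmd) {
    case DemoProfilerCommand::kSetConfig: return "set config";
    case DemoProfilerCommand::kState:     return "set state";
    case DemoProfilerCommand::kPause:     return "pause";
    case DemoProfilerCommand::kDump:      return "dump";
  }
  // Reached only when the trailing digit maps to no known enumerator.
  return "unknown";
}
```

Leaving out a default label is deliberate in this sketch: compilers that implement -Wswitch can then flag a newly added enumerator that the switch does not cover.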
[GitHub] cjolivier01 commented on a change in pull request #9933: Adding support to profile kvstore server during distributed training
cjolivier01 commented on a change in pull request #9933: Adding support to profile kvstore server during distributed training
URL: https://github.com/apache/incubator-mxnet/pull/9933#discussion_r171446913

## File path: include/mxnet/kvstore.h ##
@@ -38,6 +38,11 @@
 #endif // MXNET_USE_DIST_KVSTORE
 namespace mxnet {
+
+enum class KVStoreServerProfilerCommand {

Review comment:
nit: doc comment
[GitHub] cjolivier01 commented on a change in pull request #9933: Adding support to profile kvstore server during distributed training
cjolivier01 commented on a change in pull request #9933: Adding support to profile kvstore server during distributed training
URL: https://github.com/apache/incubator-mxnet/pull/9933#discussion_r171447072

## File path: include/mxnet/c_api.h ##
@@ -2005,6 +2005,16 @@ MXNET_DLL int MXKVStoreSendCommmandToServers(KVStoreHandle handle,
                                              int cmd_id,
                                              const char* cmd_body);
+
+MXNET_DLL int MXKVStoreSetServerProfilerConfig(KVStoreHandle handle,

Review comment:
documentation in comments, please
[GitHub] cjolivier01 commented on a change in pull request #9933: Adding support to profile kvstore server during distributed training
cjolivier01 commented on a change in pull request #9933: Adding support to profile kvstore server during distributed training
URL: https://github.com/apache/incubator-mxnet/pull/9933#discussion_r171446726

## File path: src/kvstore/kvstore_dist_server.h ##
@@ -202,12 +203,12 @@ class KVStoreDistServer {
       if (parts[0] == "filename") {
         parts[1] = "rank" + std::to_string(ps::MyRank()) + "_" + parts[1];
       }
-      char * ckey = new char [parts[0].length()+1];
-      std::strcpy (ckey, parts[0].c_str());
+      char * ckey = new char[parts[0].length() + 1];
+      std::sprintf(ckey, "%s", parts[0].c_str());

Review comment:
why? this would be slower
[GitHub] cjolivier01 commented on a change in pull request #9933: Adding support to profile kvstore server during distributed training
cjolivier01 commented on a change in pull request #9933: Adding support to profile kvstore server during distributed training
URL: https://github.com/apache/incubator-mxnet/pull/9933#discussion_r171446942

## File path: include/mxnet/kvstore.h ##
@@ -361,6 +366,11 @@ class KVStore {
    */
   virtual void SendCommandToServers(int cmd_id, const std::string& cmd_body) { }
+  virtual void SetServerProfilerCommand(KVStoreServerProfilerCommand type,

Review comment:
doc comment