[GitHub] eric-haibin-lin commented on issue #7893: Add barriers in kvstore init
eric-haibin-lin commented on issue #7893: Add barriers in kvstore init URL: https://github.com/apache/incubator-mxnet/pull/7893#issuecomment-335201037 PR moved to #8187 now. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] eric-haibin-lin commented on issue #7893: Add barriers in kvstore init
eric-haibin-lin commented on issue #7893: Add barriers in kvstore init URL: https://github.com/apache/incubator-mxnet/pull/7893#issuecomment-334042834 @solin319 no worries. Happy mid-autumn festival!
[GitHub] eric-haibin-lin commented on issue #7893: Add barriers in kvstore init
eric-haibin-lin commented on issue #7893: Add barriers in kvstore init URL: https://github.com/apache/incubator-mxnet/pull/7893#issuecomment-334014237 The test failure seems unrelated. Would you mind syncing with master again to see if it passes? Regarding fp16, yes, I think we need that fix. The current static_casting approach is a bug. Why is an extra copy required? If you train with fp16, are both weight and gradient in fp16? Or are you just trying to minimize the network traffic? BTW, @rahul003 is working on gradient compression in parallel.
[GitHub] eric-haibin-lin commented on issue #7893: Add barriers in kvstore init
eric-haibin-lin commented on issue #7893: Add barriers in kvstore init URL: https://github.com/apache/incubator-mxnet/pull/7893#issuecomment-333282304 The fix looks good. Could you add a test in `tests/nightly/dist_sync_kvstore.py` that calls `kv.pull` right after `kv.init`? Also, please sync with the master branch. Thanks for the contribution!
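For readers of the archive, the scenario the requested test targets (a pull issued immediately after init) can be illustrated with a toy model. The sketch below is not MXNet code and is not the test from the PR: `ToyKVStore`, `run_workers`, the thread-based workers, and the assumption that only rank 0 writes the initial value are all hypothetical, chosen only to show why a barrier at the end of init makes an immediate pull safe.

```python
import threading


class ToyKVStore:
    """In-memory stand-in for a distributed key-value store (hypothetical, not MXNet's API)."""

    def __init__(self, num_workers):
        self._data = {}
        self._lock = threading.Lock()
        self._barrier = threading.Barrier(num_workers)

    def init(self, rank, key, value):
        # Assumption of this toy model: only rank 0 stores the initial value.
        if rank == 0:
            with self._lock:
                self._data[key] = list(value)
        # Every worker waits here, so no worker leaves init before the value is stored.
        self._barrier.wait()

    def pull(self, key):
        # Raises KeyError if the key has not been initialized yet.
        with self._lock:
            return list(self._data[key])


def run_workers(num_workers=4, key="w0", value=(1.0, 2.0, 3.0)):
    """Each worker calls init and then pull right away; returns the pulled values by rank."""
    store = ToyKVStore(num_workers)
    results = [None] * num_workers

    def worker(rank):
        store.init(rank, key, value)
        results[rank] = store.pull(key)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In this model, removing the `self._barrier.wait()` call would let a non-zero rank reach `pull` before rank 0 has stored the value, which is the kind of race a pull-right-after-init test is meant to catch.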
[GitHub] eric-haibin-lin commented on issue #7893: Add barriers in kvstore init
eric-haibin-lin commented on issue #7893: Add barriers in kvstore init URL: https://github.com/apache/incubator-mxnet/pull/7893#issuecomment-330613450 Are you using multiple machines with GPUs? I don't follow the setup here. Could you post a code snippet so others can reproduce it?