[GitHub] solin319 commented on issue #10366: fix bug in sgd
solin319 commented on issue #10366: fix bug in sgd
URL: https://github.com/apache/incubator-mxnet/pull/10366#issuecomment-383771962

With batch-size=128 and the device kvstore, the performance is almost the same: both are about 110 samples/sec.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards, Apache Git Services
[GitHub] solin319 commented on issue #10366: fix bug in sgd
solin319 commented on issue #10366: fix bug in sgd
URL: https://github.com/apache/incubator-mxnet/pull/10366#issuecomment-383478020

@eric-haibin-lin
gpus=2*k80
network=vgg16
data=imagenet
kv-store=local

The performance is 131 samples/sec when we remove the temp resource. Otherwise, the performance is 117 samples/sec.
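To put the two throughput figures in this comment side by side, the relative gain can be worked out as below. This is only a sketch of the arithmetic: the variable names are illustrative, and the numbers are the two samples/sec values reported above.

```python
# Throughput figures quoted in the comment (samples/sec, 2x K80, vgg16, kv-store=local).
with_temp_resource = 117.0      # temp resource still requested
without_temp_resource = 131.0   # temp resource removed

# Relative speedup from removing the temp resource.
speedup = without_temp_resource / with_temp_resource
gain_percent = (speedup - 1.0) * 100.0

print(f"speedup: {speedup:.3f}x ({gain_percent:.1f}% more samples/sec)")
```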
[GitHub] solin319 commented on issue #10366: fix bug in sgd
solin319 commented on issue #10366: fix bug in sgd
URL: https://github.com/apache/incubator-mxnet/pull/10366#issuecomment-379620820

The results above were obtained in multi-GPU training with kv_store='local'. The same problem occurs with kv_store='device' too. When we train on multiple machines, the problem doesn't exist.

@eric-haibin-lin It's a good idea to update the FResourceRequest interface.
[GitHub] solin319 commented on issue #10366: fix bug in sgd
solin319 commented on issue #10366: fix bug in sgd
URL: https://github.com/apache/incubator-mxnet/pull/10366#issuecomment-378185953

Set MXNET_CPU_TEMP_COPY = 100.
When training resnet-50, sgd_mom_update still can't start directly after the first backward computation.

![default](https://user-images.githubusercontent.com/13029886/38240939-83a60560-3763-11e8-9871-5ac184e39cb7.PNG)
[GitHub] solin319 commented on issue #10366: fix bug in sgd
solin319 commented on issue #10366: fix bug in sgd
URL: https://github.com/apache/incubator-mxnet/pull/10366#issuecomment-378098403

@eric-haibin-lin MXNET_EXEC_NUM_TEMP doesn't work. But making MXNET_CPU_TEMP_COPY and MXNET_GPU_TEMP_COPY larger can solve the overlap problem.
It's difficult to find the description of these two env vars, and I think they are difficult for many users to use.
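For readers who have not used these two variables, a minimal sketch of raising them from Python follows. It assumes the variables should be set before MXNet is imported so the library sees them when it starts up. The helper name `configure_temp_copies` is hypothetical, the CPU value 100 is the one tried elsewhere in this thread, and the GPU value is a placeholder rather than a recommendation.

```python
import os


def configure_temp_copies(cpu_copies, gpu_copies):
    """Set the temp-space env vars discussed above (hypothetical helper).

    Assumption: call this before `import mxnet` so the values are
    visible when the library initializes its resources.
    """
    os.environ["MXNET_CPU_TEMP_COPY"] = str(cpu_copies)
    os.environ["MXNET_GPU_TEMP_COPY"] = str(gpu_copies)


# 100 is the CPU value tried in this thread; 4 for GPU is only a placeholder.
configure_temp_copies(cpu_copies=100, gpu_copies=4)

# import mxnet as mx  # import only after the variables are set
```

The same effect can be had by exporting the variables in the shell before launching the training script.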