[GitHub] jeremiedb commented on issue #9803: R Metrics

2018-03-07 Thread GitBox
jeremiedb commented on issue #9803: R Metrics
URL: https://github.com/apache/incubator-mxnet/pull/9803#issuecomment-371074856
 
 
   Sure, solving the root cause is definitely preferable. Help would, however, be welcome in tracking down the memory leak; I'm lacking the resources to do so. 
   Given that there seems to be little dev effort on the R package right now, I thought it would still be preferable to have an easy option to circumvent the issue that can impair package usability, patchy as it may be. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jeremiedb commented on issue #9803: R Metrics

2018-03-04 Thread GitBox
jeremiedb commented on issue #9803: R Metrics
URL: https://github.com/apache/incubator-mxnet/pull/9803#issuecomment-370259029
 
 
   Added a metric_cpu option, defaulting to TRUE, so that metrics are computed on the CPU while still allowing computation on the GPU when needed. There's now no performance tradeoff. 
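   
   A rough usage sketch (assuming the switch is exposed as a metric_cpu argument of mx.model.buckets; the other variables mirror the benchmark calls further down in this thread and are not defined here):
   
   ```
   library(mxnet)
   
   # metric_cpu = TRUE (the assumed default): predictions are brought back to the
   # CPU before the metric update; metric_cpu = FALSE would keep the metric
   # computation on the training device.
   model <- mx.model.buckets(symbol = convnet,
                             train.data = train.iter,
                             eval.data = eval.iter,
                             num.round = 5, ctx = ctx,
                             metric = mx.metric.accuracy,
                             metric_cpu = TRUE,
                             optimizer = optimizer,
                             initializer = initializer)
   ```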
   
   An option has been added to periodically call gc() in model.rnn. There's clearly a memory leak (both with FeedforwardCreate and model.rnn), but I cannot see how to fix it at the root. This is a pre-existing issue, to be fixed in the future. 
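   
   For illustration only, a minimal sketch of the kind of periodic gc() workaround meant here (the frequency, option value and loop structure are hypothetical, not the actual model.rnn code):
   
   ```
   # Hypothetical sketch of a periodic gc() call inside a training loop; the
   # variable names and frequency are assumptions, not the model.rnn code.
   num_batches <- 1000
   gc_freq <- 50  # run gc() every 50 batches
   
   for (batch_id in seq_len(num_batches)) {
     # ... forward/backward pass and parameter update would happen here ...
     if (batch_id %% gc_freq == 0) {
       gc()  # release unreferenced R handles so the backing NDArray memory can be freed
     }
   }
   ```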


[GitHub] jeremiedb commented on issue #9803: R Metrics

2018-02-17 Thread GitBox
jeremiedb commented on issue #9803: R Metrics
URL: https://github.com/apache/incubator-mxnet/pull/9803#issuecomment-366425707
 
 
   Benchmark performed on AWS K80.
   
   Going to NDArray makes no significant difference in speed on MNIST; a rough sketch of the NDArray-based metric idea is below, followed by the original and NDArray benchmark logs.
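   
   As a sketch only (not this PR's exact code), an accuracy metric built on NDArray ops via mx.metric.custom could look like the following; the generated op names (mx.nd.argmax, mx.nd.broadcast.equal, mx.nd.mean) and the axis convention are assumptions:
   
   ```
   library(mxnet)
   
   # Sketch of an NDArray-based accuracy metric: the argmax, comparison and mean
   # stay on the device holding `pred`, and only the scalar result is copied back
   # into R. Op names and the axis choice are assumptions, not this PR's code.
   accuracy_nd <- mx.metric.custom("accuracy", function(label, pred) {
     pred_label <- mx.nd.argmax(data = pred, axis = 1)    # predicted class per sample
     correct <- mx.nd.broadcast.equal(pred_label, label)  # 1 where prediction matches label
     as.array(mx.nd.mean(correct))                        # scalar accuracy back in R
   })
   ```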
   
   Original: 
   ```
   > source("../NLP/model_rnn_old.R")
   > source("../NLP/metrics_old.R")
   > tic("model training")
   > model <- mx.model.buckets(symbol = convnet,
   +   train.data = train.iter,
   +   eval.data = eval.iter,
   +   num.round = 5, ctx = ctx, verbose = TRUE,
   +   metric = mx.metric.accuracy,
   +   optimizer = optimizer,  initializer = initializer,
   +   batch.end.callback = batch.end.callback,
   +   epoch.end.callback = epoch.end.callback)
   Start training with 1 devices
   Batch [25] Speed: 17512.6201754097 samples/sec Train-accuracy=0.378203125
   [1] Train-accuracy=0.500131967905405
   [1] Validation-accuracy=0.8005859375
   Batch [25] Speed: 19089.099867642 samples/sec Train-accuracy=0.84328125
   [2] Train-accuracy=0.855600717905405
   [2] Validation-accuracy=0.8712890625
   Batch [25] Speed: 19066.4264974117 samples/sec Train-accuracy=0.8941015625
   [3] Train-accuracy=0.898701435810811
   [3] Validation-accuracy=0.8982421875
   Batch [25] Speed: 19019.28342411 samples/sec Train-accuracy=0.9138671875
   [4] Train-accuracy=0.916807432432432
   [4] Validation-accuracy=0.9126953125
   Batch [25] Speed: 19028.9402301531 samples/sec Train-accuracy=0.924453125
   [5] Train-accuracy=0.926889780405405
   [5] Validation-accuracy=0.9203125
   > mx.model.save(model = model, prefix = "models/model_en_fr_conv_4", iteration = 0)
   > toc()
   model training: 11.227 sec elapsed
   ```
   
   With NDArray:
   
   ```
   > source("../NLP/model_rnn_old.R")
   > source("../NLP/metrics.R")
   > tic("model training")
   > model <- mx.model.buckets(symbol = convnet,
   +   train.data = train.iter,
   +   eval.data = eval.iter,
   +   num.round = 5, ctx = ctx, verbose = TRUE,
   +   metric = mx.metric.accuracy,
   +   optimizer = optimizer,  initializer = initializer,
   +   batch.end.callback = batch.end.callback,
   +   epoch.end.callback = epoch.end.callback)
   Start training with 1 devices
   Batch [25] Speed: 19127.6006208524 samples/sec Train-accuracy=0.360625
   [1] Train-accuracy=0.479386613175676
   [1] Validation-accuracy=0.784765625
   Batch [25] Speed: 19106.7837383884 samples/sec Train-accuracy=0.82609375
   [2] Train-accuracy=0.839817356418919
   [2] Validation-accuracy=0.874609375
   Batch [25] Speed: 19091.8084094211 samples/sec Train-accuracy=0.8843359375
   [3] Train-accuracy=0.889490076013513
   [3] Validation-accuracy=0.901953125
   Batch [25] Speed: 19122.3887433588 samples/sec Train-accuracy=0.9060546875
   [4] Train-accuracy=0.909311655405405
   [4] Validation-accuracy=0.91171875
   Batch [25] Speed: 19023.3337071591 samples/sec Train-accuracy=0.9182421875
   [5] Train-accuracy=0.920792863175676
   [5] Validation-accuracy=0.9203125
   > mx.model.save(model = model, prefix = "models/model_en_fr_conv_4", iteration = 0)
   > toc()
   model training: 11.113 sec elapsed
   ```
   
   However, the on-device (GPU) metric calculation brings some overhead in this case, seemingly in the range of 2-5%. 
   
   ```
   > source("../NLP/model_rnn.R")
   > source("../NLP/metrics.R")
   > tic("model training")
   > model <- mx.model.buckets(symbol = convnet,
   +   train.data = train.iter,
   +   eval.data = eval.iter,
   +   num.round = 5, 
   +   ctx = ctx, 
   +   verbose = TRUE,
   +   metric = mx.metric.accuracy,
   +   optimizer = optimizer,  initializer = initializer,
   +   batch.end.callback = batch.end.callback,
   +   epoch.end.callback = epoch.end.callback)
   Start training with 1 devices
   Batch [25] Speed: 18690.2377642633 samples/sec Train-accuracy=0.4166796875
   [1] Train-accuracy=0.517868454391892
   [1] Validation-accuracy=0.7767578125
   Batch [25] Speed: 18716.4802325446 samples/sec Train-accuracy=0.819609375
   [2] Train-accuracy=0.833588471283784
   [2] Validation-accuracy=0.8595703125
   Batch [25] Speed: 18732.0356273382 samples/sec Train-accuracy=0.8775390625
   [3] Train-accuracy=0.883155616554054
   [3] Validation-accuracy=0.671875
   Batch [25] Speed: 18680.6843297833 samples/sec Train-accuracy=0.900234375
   [4] Train-accuracy=0.904244087837838
   [4] Validation-accuracy=0.90546875
   Batch [25] Speed: 18706.2253398244 samples/sec Train-accuracy=0.9145703125
   [5] Tra