[GitHub] mbaijal opened a new pull request #8770: [Merge into v1.0.0 ONLY][Copy of PR #8704] Prep1.0: bump the version number and 0.12.1 updates

2017-11-21 Thread GitBox
mbaijal opened a new pull request #8770: [Merge into v1.0.0 ONLY][Copy of PR 
#8704] Prep1.0: bump the version number and 0.12.1 updates
URL: https://github.com/apache/incubator-mxnet/pull/8770
 
 
   ## Description ##
   Bump the version number to 1.0.0.
   Add the NEWS.md changes from 0.12.1 to the release branch.
   Add the README.md changes from 0.12.1 to the release branch.
   
   ## Checklist ##
   ### Essentials ###
   - [ ] Passed code style checking (`make lint`)
   - [ ] Changes are complete (i.e. I finished coding on this PR)
   - [ ] All changes have test coverage
   - [ ] For user-facing API changes, API doc string has been updated. For new 
C++ functions in header files, their functionalities and arguments are 
well-documented. 
   - [ ] To my best knowledge, examples are either not affected by this change, 
or have been fixed to be compatible with this change
   
   ## Comments ##
   The README.md and NEWS.md have not been updated for 1.0; that will be a separate PR.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] madjam opened a new pull request #8769: Updating ps-lite submodule

2017-11-21 Thread GitBox
madjam opened a new pull request #8769: Updating ps-lite submodule
URL: https://github.com/apache/incubator-mxnet/pull/8769
 
 
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] zhreshold commented on issue #8735: Add cast to Block and Parameter. Implicit dtype casting is removed.

2017-11-21 Thread GitBox
zhreshold commented on issue #8735: Add cast to Block and Parameter. Implicit 
dtype casting is removed.
URL: https://github.com/apache/incubator-mxnet/pull/8735#issuecomment-346256928
 
 
   @piiswrong  So it's gonna be 
   ```
   net = HybridBlock(xxx)
   net.cast('float16')
   ```
   And we cannot instantiate a layer with dtype? 
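   For context, a minimal sketch of the construct-then-cast flow being discussed, 
   assuming the Gluon Block/HybridBlock API around the 1.0 timeframe; the layers and 
   sizes below are illustrative only, not taken from the PR.
   ```python
   import mxnet as mx
   from mxnet.gluon import nn

   # Build the network first, then cast its parameters to float16 afterwards.
   net = nn.HybridSequential()
   with net.name_scope():
       net.add(nn.Dense(128, activation='relu'))
       net.add(nn.Dense(10))
   net.initialize(mx.init.Xavier())
   net.cast('float16')

   # All parameters are float16 after the cast.
   print(set(p.dtype for p in net.collect_params().values()))
   ```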


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] cjolivier01 commented on a change in pull request #8719: Tune without Launch specialization macros

2017-11-21 Thread GitBox
cjolivier01 commented on a change in pull request #8719: Tune without Launch 
specialization macros
URL: https://github.com/apache/incubator-mxnet/pull/8719#discussion_r152475909
 
 

 ##
 File path: src/operator/mxnet_op.h
 ##
 @@ -441,8 +447,80 @@ struct Kernel {
     OP::Map(0, N, args...);
 #endif
   }
+
+  /*!
+   * \brief Launch a tunable OP with explicitly-supplied data type
+   * \tparam DType Data type
+   * \tparam OP type
+   * \tparam Args Varargs type to eventually pass to the OP::Map() function
+   * \param s Stream (usually null for CPU)
+   * \param N Number of iterations
+   * \param args Varargs to eventually pass to the OP::Map() function
+   * \return Always true
+   */
+  template<typename DType, typename ...Args>
+  static MSHADOW_CINLINE
+  typename std::enable_if<std::is_base_of<tunable, OP>::value, bool>::type
+  LaunchWithType(mshadow::Stream<cpu> *s, const int N, Args... args) {
+    LaunchTuned<OP, DType>(s, N, args...);
+    return true;
+  }
+
+  /*!
+   * \brief Launch a tunable OP with implicitly-supplied data type
+   * \tparam DType Data type
+   * \tparam T OP type
+   * \tparam Args Varargs type to eventually pass to the OP::Map() function
+   * \param s Stream (usually null for CPU)
+   * \param N Number of iterations
+   * \param args Varargs to eventually pass to the OP::Map() function
+   * \return Always true
+   */
+  template<typename DType, typename T = OP, typename ...Args>
+  static MSHADOW_CINLINE
+  typename std::enable_if<std::is_base_of<tunable, T>::value, bool>::type
+  Launch(mshadow::Stream<cpu> *s, const int N, DType *dest, Args... args) {
+    return LaunchWithType<DType>(s, N, dest, args...);
 
 Review comment:
   Yeah, I think I just did it this way so you could tell the difference between the calls. Otherwise it's really hard to pick out the `dest` argument and understand what's different. I can change though, no problem.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] piiswrong commented on a change in pull request #8719: Tune without Launch specialization macros

2017-11-21 Thread GitBox
piiswrong commented on a change in pull request #8719: Tune without Launch 
specialization macros
URL: https://github.com/apache/incubator-mxnet/pull/8719#discussion_r152475091
 
 

 ##
 File path: src/operator/mxnet_op.h
 ##
 @@ -441,8 +447,80 @@ struct Kernel {
     OP::Map(0, N, args...);
 #endif
   }
+
+  /*!
+   * \brief Launch a tunable OP with explicitly-supplied data type
+   * \tparam DType Data type
+   * \tparam OP type
+   * \tparam Args Varargs type to eventually pass to the OP::Map() function
+   * \param s Stream (usually null for CPU)
+   * \param N Number of iterations
+   * \param args Varargs to eventually pass to the OP::Map() function
+   * \return Always true
+   */
+  template<typename DType, typename ...Args>
+  static MSHADOW_CINLINE
+  typename std::enable_if<std::is_base_of<tunable, OP>::value, bool>::type
+  LaunchWithType(mshadow::Stream<cpu> *s, const int N, Args... args) {
+    LaunchTuned<OP, DType>(s, N, args...);
+    return true;
+  }
+
+  /*!
+   * \brief Launch a tunable OP with implicitly-supplied data type
+   * \tparam DType Data type
+   * \tparam T OP type
+   * \tparam Args Varargs type to eventually pass to the OP::Map() function
+   * \param s Stream (usually null for CPU)
+   * \param N Number of iterations
+   * \param args Varargs to eventually pass to the OP::Map() function
+   * \return Always true
+   */
+  template<typename DType, typename T = OP, typename ...Args>
+  static MSHADOW_CINLINE
+  typename std::enable_if<std::is_base_of<tunable, T>::value, bool>::type
+  Launch(mshadow::Stream<cpu> *s, const int N, DType *dest, Args... args) {
+    return LaunchWithType<DType>(s, N, dest, args...);
 
 Review comment:
   Can we remove LaunchWithType and call LaunchTuned here directly?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] rahul003 commented on issue #8764: Gradient compression example and raise exception if kvstore type unsupported

2017-11-21 Thread GitBox
rahul003 commented on issue #8764: Gradient compression example and raise 
exception if kvstore type unsupported
URL: https://github.com/apache/incubator-mxnet/pull/8764#issuecomment-346245923
 
 
   Now included in PR https://github.com/apache/incubator-mxnet/pull/8766 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] rahul003 closed pull request #8764: Gradient compression example and raise exception if kvstore type unsupported

2017-11-21 Thread GitBox
rahul003 closed pull request #8764: Gradient compression example and raise 
exception if kvstore type unsupported
URL: https://github.com/apache/incubator-mxnet/pull/8764
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/example/gluon/word_language_model/train.py b/example/gluon/word_language_model/train.py
index 0b504998be..b419277dcf 100644
--- a/example/gluon/word_language_model/train.py
+++ b/example/gluon/word_language_model/train.py
@@ -54,6 +54,11 @@
                     help='report interval')
 parser.add_argument('--save', type=str, default='model.params',
                     help='path to save the final model')
+parser.add_argument('--gctype', type=str, default='none',
+                    help='type of gradient compression to use, \
+                          takes `2bit` or `none` for now.')
+parser.add_argument('--gcthreshold', type=float, default=0.5,
+                    help='threshold for 2bit gradient compression')
 args = parser.parse_args()
 
 
@@ -90,10 +95,13 @@ def batchify(data, batch_size):
 model = model.RNNModel(args.model, ntokens, args.emsize, args.nhid,
                        args.nlayers, args.dropout, args.tied)
 model.collect_params().initialize(mx.init.Xavier(), ctx=context)
+
+compression_params = None if args.gctype == 'none' else {'type': args.gctype, 'threshold': args.gcthreshold}
 trainer = gluon.Trainer(model.collect_params(), 'sgd',
                         {'learning_rate': args.lr,
                          'momentum': 0,
-                         'wd': 0})
+                         'wd': 0},
+                        compression_params=compression_params)
 loss = gluon.loss.SoftmaxCrossEntropyLoss()
 
 ###
diff --git a/python/mxnet/kvstore.py b/python/mxnet/kvstore.py
index d068d06579..a6d3aa519f 100644
--- a/python/mxnet/kvstore.py
+++ b/python/mxnet/kvstore.py
@@ -408,10 +408,13 @@ def set_gradient_compression(self, compression_params):
         Other keys in this dictionary are optional and specific to the type
         of gradient compression.
         """
-        ckeys, cvals = _ctype_dict(compression_params)
-        check_call(_LIB.MXKVStoreSetGradientCompression(self.handle,
-                                                        mx_uint(len(compression_params)),
-                                                        ckeys, cvals))
+        if (self.type == 'device') or ('dist' in self.type):
+            ckeys, cvals = _ctype_dict(compression_params)
+            check_call(_LIB.MXKVStoreSetGradientCompression(self.handle,
+                                                            mx_uint(len(compression_params)),
+                                                            ckeys, cvals))
+        else:
+            raise Exception('Gradient compression is not supported for this type of kvstore')
 
     def set_optimizer(self, optimizer):
         """ Registers an optimizer with the kvstore.


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] rahul003 opened a new pull request #8764: Gradient compression example and raise exception if kvstore type unsupported

2017-11-21 Thread GitBox
rahul003 opened a new pull request #8764: Gradient compression example and 
raise exception if kvstore type unsupported
URL: https://github.com/apache/incubator-mxnet/pull/8764
 
 
   ## Description ##
   Added a Gluon example for gradient compression; also raise an exception if the kvstore type is unsupported.
   
   ## Checklist ##
   ### Essentials ###
   - [x] Passed code style checking (`make lint`)
   - [x] Changes are complete (i.e. I finished coding on this PR)
   - [x] All changes have test coverage
   - [x] For user-facing API changes, API doc string has been updated. For new 
C++ functions in header files, their functionalities and arguments are 
well-documented. 
   - [x] To my best knowledge, examples are either not affected by this change, 
or have been fixed to be compatible with this change
   
   ### Changes ###
   - [x] Added gluon example for gradient compression
   - [x] Raise exception if kvstore type unsupported 
   
   ## Comments ##
   - If this change is a backward incompatible change, why must this change be 
made.
   - Interesting edge cases to note here
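   
   A hedged sketch of the Gluon-side usage this PR describes, based on the `train.py` 
   diff in the closed copy of this PR above; the model and hyperparameters below are 
   placeholders, not the example's actual values.
   ```python
   import mxnet as mx
   from mxnet import gluon
   from mxnet.gluon import nn

   net = nn.Dense(10)
   net.initialize()

   # 2-bit compression with a 0.5 threshold, mirroring the --gctype/--gcthreshold flags.
   compression_params = {'type': '2bit', 'threshold': 0.5}
   trainer = gluon.Trainer(net.collect_params(), 'sgd',
                           {'learning_rate': 0.1},
                           kvstore='device',
                           compression_params=compression_params)
   ```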


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] rahul003 commented on a change in pull request #8766: NDArray Indexing tutorial and Gradient Compression FAQ

2017-11-21 Thread GitBox
rahul003 commented on a change in pull request #8766: NDArray Indexing tutorial 
and Gradient Compression FAQ
URL: https://github.com/apache/incubator-mxnet/pull/8766#discussion_r152470029
 
 

 ##
 File path: python/mxnet/kvstore.py
 ##
 @@ -408,6 +408,8 @@ def set_gradient_compression(self, compression_params):
         Other keys in this dictionary are optional and specific to the type
         of gradient compression.
         """
+        if (self.type() == 'device') or ('dist' in self.type()):
 
 Review comment:
   Please pull this file from my updated branch gc-docs. It fixes the mistake 
here. 
   
   I'm linking the file from that branch here 
https://github.com/rahul003/mxnet/blob/gc-docs/python/mxnet/kvstore.py
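   
   The mistake being referenced is that `KVStore.type` is a property, not a method; the 
   hunk above calls it as a function, and the corrected form (`self.type` without 
   parentheses) appears in the #8764 diff earlier in this thread. A tiny hypothetical 
   illustration:
   ```python
   import mxnet as mx

   kv = mx.kv.create('local')
   print(kv.type)   # 'local' -- property access is correct
   # kv.type()      # TypeError: 'str' object is not callable
   ```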


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] rahul003 commented on a change in pull request #8766: NDArray Indexing tutorial and Gradient Compression FAQ

2017-11-21 Thread GitBox
rahul003 commented on a change in pull request #8766: NDArray Indexing tutorial 
and Gradient Compression FAQ
URL: https://github.com/apache/incubator-mxnet/pull/8766#discussion_r152468678
 
 

 ##
 File path: docs/faq/gradient_compression.md
 ##
 @@ -0,0 +1,107 @@
+# Gradient Compression
+
+Gradient Compression reduces communication bandwidth to make distributed 
training with GPUs more scalable and efficient without significant loss in 
convergence rate or accuracy.
+
+
+## Benefits
+
+**Increased Speed**
+
+For architectures with fully connected components, the gradient compression 
capability is observed to speedup training by about 2x, depending on the size 
of the model and the network bandwidth of the instance. Bigger models see 
larger speedup with gradient compression.
+
+**Minimal Accuracy Loss**
+
+Gradient compression uses the approach of delaying the synchronization of 
weight updates which are small. Although small weight updates might not be sent 
for that batch, this information is not discarded. Once the weight updates for 
this location accumulate to become a larger value, they will be propagated. 
Since there is no information loss, but only delayed updates, it does not lead 
to a significant loss in accuracy or convergence rate. In distributed training 
experiments[1], it is observed a loss of accuracy as low as 1% for this 
technique.
 
 Review comment:
   it is observed a loss of accuracy as low as 1% for this technique
   ->
   the accuracy loss observed due to gradient compression was as low as 1% 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] rahul003 commented on issue #8766: NDArray Indexing tutorial and Gradient Compression FAQ

2017-11-21 Thread GitBox
rahul003 commented on issue #8766: NDArray Indexing tutorial and Gradient 
Compression FAQ
URL: https://github.com/apache/incubator-mxnet/pull/8766#issuecomment-346243505
 
 
   Thanks a lot for your work on the documentation! :)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] rahul003 commented on a change in pull request #8766: NDArray Indexing tutorial and Gradient Compression FAQ

2017-11-21 Thread GitBox
rahul003 commented on a change in pull request #8766: NDArray Indexing tutorial 
and Gradient Compression FAQ
URL: https://github.com/apache/incubator-mxnet/pull/8766#discussion_r152468880
 
 

 ##
 File path: docs/faq/gradient_compression.md
 ##
 @@ -0,0 +1,107 @@
+# Gradient Compression
+
+Gradient Compression reduces communication bandwidth to make distributed 
training with GPUs more scalable and efficient without significant loss in 
convergence rate or accuracy.
+
+
+## Benefits
+
+**Increased Speed**
+
+For architectures with fully connected components, the gradient compression 
capability is observed to speedup training by about 2x, depending on the size 
of the model and the network bandwidth of the instance. Bigger models see 
larger speedup with gradient compression.
+
+**Minimal Accuracy Loss**
+
+Gradient compression uses the approach of delaying the synchronization of 
weight updates which are small. Although small weight updates might not be sent 
for that batch, this information is not discarded. Once the weight updates for 
this location accumulate to become a larger value, they will be propagated. 
Since there is no information loss, but only delayed updates, it does not lead 
to a significant loss in accuracy or convergence rate. In distributed training 
experiments[1], it is observed a loss of accuracy as low as 1% for this 
technique.
+
+
+## When to Use Gradient Compression
+
+When training models whose architectures include large fully connected 
components, it can be helpful to use gradient compression. For larger models, 
the communication cost becomes a major factor. Such models stand to benefit 
greatly with gradient compression.
+
+
+### GPU versus CPU
+
+The greatest benefits from gradient compression are realized when using 
multi-node (single or multi-GPU) distributed training. Training on CPU would 
provide a lower compute density per compute node as compared to the massive 
compute density per compute node on a GPU. Due to this, the required 
communication bandwidth for CPU-based nodes during training is not as high as 
for GPU-based nodes. Hence, the benefits of gradient compression are lower for 
CPU-based nodes as compared to GPU-based nodes.
+
+
+### Network Latency
+
+Benefits of gradient compression can be found when using distributed training 
with network connected nodes. Depending on the network latency between nodes 
and the model's size, these can contribute to slow performance such that 
gradient compression may provide speed improvements.
+
+You may not want to use gradient compression if you have low latency network 
communication.
+
+
+### Model Size
+
+Distributed training involves synchronization of weights after each batch. 
Larger models have much higher communication costs during training, hence such 
models stand to benefit much more from gradient compression.
+When running distributed training with gradient compression, the quantize and 
dequantize operations happen on CPU parallelized with OpenMP. For smaller 
models, when training on GPUs, it helps to set `OMP_NUM_THREADS=1` on each 
node, so that the overhead of launching OMP threads doesn't cause the 
compression and decompression to be slow.
+
+### Model Architecture
+
+The communication bandwidth requirements during training vary across various 
neural network architectures and hence the benefits of gradient compression 
vary accordingly.
+
+In networks which have significant fully connected components, since such 
layers have low compute cost on GPUs, communication becomes a bottleneck 
limiting the speed of distributed training. Gradient compression can help 
reduce the communication cost, and thus speed up training in such cases. We 
have observed speedup of about 2x on large fully connected neural networks. 
Models like AlexNet and VGG have large fully connected components as part of 
the network, hence stand to benefit from gradient compression. Long Short-Term 
Memory architectures require more communication bandwidth, so they also exhibit 
speed improvements with gradient compression.
+
+Architectures like Convolutional Neural Networks on the other hand have a 
higher compute cost, in which case some communication can be parallelized with 
compute. Since communication is not the bottleneck in such networks, gradient 
compression doesn't help much.
+
+
+### Single Node Gradient Compression
+
+When the training is configured to use device to device communication on a 
single node with multiple GPUs, gradient compression can be used to reduce the 
cost communication. This can provide about 20% speedup for large models using 
older generation architectures. However, speed benefits may be negligible on a 
machine with a newer generation architecture where GPUs can communicate at low 
latency.
 
 Review comment:
   cost communication -> cost of communication


This is an automated message from the Apache Git 

[GitHub] rahul003 commented on a change in pull request #8766: NDArray Indexing tutorial and Gradient Compression FAQ

2017-11-21 Thread GitBox
rahul003 commented on a change in pull request #8766: NDArray Indexing tutorial 
and Gradient Compression FAQ
URL: https://github.com/apache/incubator-mxnet/pull/8766#discussion_r152468880
 
 

 ##
 File path: docs/faq/gradient_compression.md
 ##
 @@ -0,0 +1,107 @@
+# Gradient Compression
+
+Gradient Compression reduces communication bandwidth to make distributed 
training with GPUs more scalable and efficient without significant loss in 
convergence rate or accuracy.
+
+
+## Benefits
+
+**Increased Speed**
+
+For architectures with fully connected components, the gradient compression 
capability is observed to speedup training by about 2x, depending on the size 
of the model and the network bandwidth of the instance. Bigger models see 
larger speedup with gradient compression.
+
+**Minimal Accuracy Loss**
+
+Gradient compression uses the approach of delaying the synchronization of 
weight updates which are small. Although small weight updates might not be sent 
for that batch, this information is not discarded. Once the weight updates for 
this location accumulate to become a larger value, they will be propagated. 
Since there is no information loss, but only delayed updates, it does not lead 
to a significant loss in accuracy or convergence rate. In distributed training 
experiments[1], it is observed a loss of accuracy as low as 1% for this 
technique.
+
+
+## When to Use Gradient Compression
+
+When training models whose architectures include large fully connected 
components, it can be helpful to use gradient compression. For larger models, 
the communication cost becomes a major factor. Such models stand to benefit 
greatly with gradient compression.
+
+
+### GPU versus CPU
+
+The greatest benefits from gradient compression are realized when using 
multi-node (single or multi-GPU) distributed training. Training on CPU would 
provide a lower compute density per compute node as compared to the massive 
compute density per compute node on a GPU. Due to this, the required 
communication bandwidth for CPU-based nodes during training is not as high as 
for GPU-based nodes. Hence, the benefits of gradient compression are lower for 
CPU-based nodes as compared to GPU-based nodes.
+
+
+### Network Latency
+
+Benefits of gradient compression can be found when using distributed training 
with network connected nodes. Depending on the network latency between nodes 
and the model's size, these can contribute to slow performance such that 
gradient compression may provide speed improvements.
+
+You may not want to use gradient compression if you have low latency network 
communication.
+
+
+### Model Size
+
+Distributed training involves synchronization of weights after each batch. 
Larger models have much higher communication costs during training, hence such 
models stand to benefit much more from gradient compression.
+When running distributed training with gradient compression, the quantize and 
dequantize operations happen on CPU parallelized with OpenMP. For smaller 
models, when training on GPUs, it helps to set `OMP_NUM_THREADS=1` on each 
node, so that the overhead of launching OMP threads doesn't cause the 
compression and decompression to be slow.
+
+### Model Architecture
+
+The communication bandwidth requirements during training vary across various 
neural network architectures and hence the benefits of gradient compression 
vary accordingly.
+
+In networks which have significant fully connected components, since such 
layers have low compute cost on GPUs, communication becomes a bottleneck 
limiting the speed of distributed training. Gradient compression can help 
reduce the communication cost, and thus speed up training in such cases. We 
have observed speedup of about 2x on large fully connected neural networks. 
Models like AlexNet and VGG have large fully connected components as part of 
the network, hence stand to benefit from gradient compression. Long Short-Term 
Memory architectures require more communication bandwidth, so they also exhibit 
speed improvements with gradient compression.
+
+Architectures like Convolutional Neural Networks on the other hand have a 
higher compute cost, in which case some communication can be parallelized with 
compute. Since communication is not the bottleneck in such networks, gradient 
compression doesn't help much.
+
+
+### Single Node Gradient Compression
+
+When the training is configured to use device to device communication on a 
single node with multiple GPUs, gradient compression can be used to reduce the 
cost communication. This can provide about 20% speedup for large models using 
older generation architectures. However, speed benefits may be negligible on a 
machine with a newer generation architecture where GPUs can communicate at low 
latency.
 
 Review comment:
   cost of communication


This is an automated message from the Apache Git Service.
To respond to the 

[GitHub] rahul003 commented on a change in pull request #8766: NDArray Indexing tutorial and Gradient Compression FAQ

2017-11-21 Thread GitBox
rahul003 commented on a change in pull request #8766: NDArray Indexing tutorial 
and Gradient Compression FAQ
URL: https://github.com/apache/incubator-mxnet/pull/8766#discussion_r152468831
 
 

 ##
 File path: docs/faq/gradient_compression.md
 ##
 @@ -0,0 +1,107 @@
+# Gradient Compression
+
+Gradient Compression reduces communication bandwidth to make distributed 
training with GPUs more scalable and efficient without significant loss in 
convergence rate or accuracy.
+
+
+## Benefits
+
+**Increased Speed**
+
+For architectures with fully connected components, the gradient compression 
capability is observed to speedup training by about 2x, depending on the size 
of the model and the network bandwidth of the instance. Bigger models see 
larger speedup with gradient compression.
+
+**Minimal Accuracy Loss**
+
+Gradient compression uses the approach of delaying the synchronization of 
weight updates which are small. Although small weight updates might not be sent 
for that batch, this information is not discarded. Once the weight updates for 
this location accumulate to become a larger value, they will be propagated. 
Since there is no information loss, but only delayed updates, it does not lead 
to a significant loss in accuracy or convergence rate. In distributed training 
experiments[1], it is observed a loss of accuracy as low as 1% for this 
technique.
+
+
+## When to Use Gradient Compression
+
+When training models whose architectures include large fully connected 
components, it can be helpful to use gradient compression. For larger models, 
the communication cost becomes a major factor. Such models stand to benefit 
greatly with gradient compression.
+
+
+### GPU versus CPU
+
+The greatest benefits from gradient compression are realized when using 
multi-node (single or multi-GPU) distributed training. Training on CPU would 
provide a lower compute density per compute node as compared to the massive 
compute density per compute node on a GPU. Due to this, the required 
communication bandwidth for CPU-based nodes during training is not as high as 
for GPU-based nodes. Hence, the benefits of gradient compression are lower for 
CPU-based nodes as compared to GPU-based nodes.
+
+
+### Network Latency
+
+Benefits of gradient compression can be found when using distributed training 
with network connected nodes. Depending on the network latency between nodes 
and the model's size, these can contribute to slow performance such that 
gradient compression may provide speed improvements.
+
+You may not want to use gradient compression if you have low latency network 
communication.
+
+
+### Model Size
+
+Distributed training involves synchronization of weights after each batch. 
Larger models have much higher communication costs during training, hence such 
models stand to benefit much more from gradient compression.
+When running distributed training with gradient compression, the quantize and 
dequantize operations happen on CPU parallelized with OpenMP. For smaller 
models, when training on GPUs, it helps to set `OMP_NUM_THREADS=1` on each 
node, so that the overhead of launching OMP threads doesn't cause the 
compression and decompression to be slow.
+
+### Model Architecture
+
+The communication bandwidth requirements during training vary across various 
neural network architectures and hence the benefits of gradient compression 
vary accordingly.
+
+In networks which have significant fully connected components, since such 
layers have low compute cost on GPUs, communication becomes a bottleneck 
limiting the speed of distributed training. Gradient compression can help 
reduce the communication cost, and thus speed up training in such cases. We 
have observed speedup of about 2x on large fully connected neural networks. 
Models like AlexNet and VGG have large fully connected components as part of 
the network, hence stand to benefit from gradient compression. Long Short-Term 
Memory architectures require more communication bandwidth, so they also exhibit 
speed improvements with gradient compression.
 
 Review comment:
   nit: should this be
   
   Long Short-Term Memory architectures also * 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] rahul003 commented on a change in pull request #8766: NDArray Indexing tutorial and Gradient Compression FAQ

2017-11-21 Thread GitBox
rahul003 commented on a change in pull request #8766: NDArray Indexing tutorial 
and Gradient Compression FAQ
URL: https://github.com/apache/incubator-mxnet/pull/8766#discussion_r152468856
 
 

 ##
 File path: docs/faq/gradient_compression.md
 ##
 @@ -0,0 +1,107 @@
+# Gradient Compression
+
+Gradient Compression reduces communication bandwidth to make distributed 
training with GPUs more scalable and efficient without significant loss in 
convergence rate or accuracy.
+
+
+## Benefits
+
+**Increased Speed**
+
+For architectures with fully connected components, the gradient compression 
capability is observed to speedup training by about 2x, depending on the size 
of the model and the network bandwidth of the instance. Bigger models see 
larger speedup with gradient compression.
+
+**Minimal Accuracy Loss**
+
+Gradient compression uses the approach of delaying the synchronization of 
weight updates which are small. Although small weight updates might not be sent 
for that batch, this information is not discarded. Once the weight updates for 
this location accumulate to become a larger value, they will be propagated. 
Since there is no information loss, but only delayed updates, it does not lead 
to a significant loss in accuracy or convergence rate. In distributed training 
experiments[1], it is observed a loss of accuracy as low as 1% for this 
technique.
+
+
+## When to Use Gradient Compression
+
+When training models whose architectures include large fully connected 
components, it can be helpful to use gradient compression. For larger models, 
the communication cost becomes a major factor. Such models stand to benefit 
greatly with gradient compression.
+
+
+### GPU versus CPU
+
+The greatest benefits from gradient compression are realized when using 
multi-node (single or multi-GPU) distributed training. Training on CPU would 
provide a lower compute density per compute node as compared to the massive 
compute density per compute node on a GPU. Due to this, the required 
communication bandwidth for CPU-based nodes during training is not as high as 
for GPU-based nodes. Hence, the benefits of gradient compression are lower for 
CPU-based nodes as compared to GPU-based nodes.
+
+
+### Network Latency
+
+Benefits of gradient compression can be found when using distributed training 
with network connected nodes. Depending on the network latency between nodes 
and the model's size, these can contribute to slow performance such that 
gradient compression may provide speed improvements.
+
+You may not want to use gradient compression if you have low latency network 
communication.
+
+
+### Model Size
+
+Distributed training involves synchronization of weights after each batch. 
Larger models have much higher communication costs during training, hence such 
models stand to benefit much more from gradient compression.
+When running distributed training with gradient compression, the quantize and 
dequantize operations happen on CPU parallelized with OpenMP. For smaller 
models, when training on GPUs, it helps to set `OMP_NUM_THREADS=1` on each 
node, so that the overhead of launching OMP threads doesn't cause the 
compression and decompression to be slow.
+
+### Model Architecture
+
+The communication bandwidth requirements during training vary across various 
neural network architectures and hence the benefits of gradient compression 
vary accordingly.
+
+In networks which have significant fully connected components, since such 
layers have low compute cost on GPUs, communication becomes a bottleneck 
limiting the speed of distributed training. Gradient compression can help 
reduce the communication cost, and thus speed up training in such cases. We 
have observed speedup of about 2x on large fully connected neural networks. 
Models like AlexNet and VGG have large fully connected components as part of 
the network, hence stand to benefit from gradient compression. Long Short-Term 
Memory architectures require more communication bandwidth, so they also exhibit 
speed improvements with gradient compression.
+
+Architectures like Convolutional Neural Networks on the other hand have a 
higher compute cost, in which case some communication can be parallelized with 
compute. Since communication is not the bottleneck in such networks, gradient 
compression doesn't help much.
 
 Review comment:
   nit: compute -> computation


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] eric-haibin-lin closed pull request #8765: Update AddVersion.py

2017-11-21 Thread GitBox
eric-haibin-lin closed pull request #8765: Update AddVersion.py
URL: https://github.com/apache/incubator-mxnet/pull/8765
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/docs/build_version_doc/AddVersion.py b/docs/build_version_doc/AddVersion.py
index 2c9ee22bf4..de7670a759 100755
--- a/docs/build_version_doc/AddVersion.py
+++ b/docs/build_version_doc/AddVersion.py
@@ -87,7 +87,7 @@
             pip_pattern = ['', '-cu80', '-cu75', '-cu80mkl', '-cu75mkl', '-mkl']
             if args.current_version == 'master':
                 outstr = outstr.replace('git clone --recursive https://github.com/dmlc/mxnet',
-                                        'git clone --recursive https://github.com/apache/incubator-mxnet.git mxnet')
+                                        'git clone --recursive https://github.com/apache/incubator-mxnet.git')
                 for trail in pip_pattern:
                     outstr = outstr.replace('pip install mxnet%s<' % (trail),
                                             'pip install mxnet%s --pre<' % (trail))
@@ -95,7 +95,7 @@
                                             'pip install mxnet%s --pre\n<' % (trail))
             else:
                 outstr = outstr.replace('git clone --recursive https://github.com/dmlc/mxnet',
-                                        'git clone --recursive https://github.com/apache/incubator-mxnet.git mxnet '
+                                        'git clone --recursive https://github.com/apache/incubator-mxnet.git'
                                         '--branch %s' % (args.current_version))
                 for trail in pip_pattern:
                     outstr = outstr.replace('pip install mxnet%s<' % (trail),


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] rahul003 commented on a change in pull request #8766: NDArray Indexing tutorial and Gradient Compression FAQ

2017-11-21 Thread GitBox
rahul003 commented on a change in pull request #8766: NDArray Indexing tutorial 
and Gradient Compression FAQ
URL: https://github.com/apache/incubator-mxnet/pull/8766#discussion_r152468678
 
 

 ##
 File path: docs/faq/gradient_compression.md
 ##
 @@ -0,0 +1,107 @@
+# Gradient Compression
+
+Gradient Compression reduces communication bandwidth to make distributed 
training with GPUs more scalable and efficient without significant loss in 
convergence rate or accuracy.
+
+
+## Benefits
+
+**Increased Speed**
+
+For architectures with fully connected components, the gradient compression 
capability is observed to speedup training by about 2x, depending on the size 
of the model and the network bandwidth of the instance. Bigger models see 
larger speedup with gradient compression.
+
+**Minimal Accuracy Loss**
+
+Gradient compression uses the approach of delaying the synchronization of 
weight updates which are small. Although small weight updates might not be sent 
for that batch, this information is not discarded. Once the weight updates for 
this location accumulate to become a larger value, they will be propagated. 
Since there is no information loss, but only delayed updates, it does not lead 
to a significant loss in accuracy or convergence rate. In distributed training 
experiments[1], it is observed a loss of accuracy as low as 1% for this 
technique.
 
 Review comment:
   it is observed a loss of accuracy as low as 1% for this technique
   ->
   gradient compression caused an accuracy loss that was as low as 1% 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] eric-haibin-lin commented on a change in pull request #8767: Factorization machine example & sparse example folder re-org

2017-11-21 Thread GitBox
eric-haibin-lin commented on a change in pull request #8767: Factorization 
machine example & sparse example folder re-org
URL: https://github.com/apache/incubator-mxnet/pull/8767#discussion_r152468479
 
 

 ##
 File path: example/sparse/factorization_machine/metric.py
 ##
 @@ -0,0 +1,88 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import mxnet as mx
+import numpy as np
+
+@mx.metric.register
+@mx.metric.alias('log_loss')
+class LogLossMetric(mx.metric.EvalMetric):
 
 Review comment:
   The existing `nll_loss` metric was not used because it was expecting K-D 
output. For FM the output is 1-D for binary classification
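   
   To make the distinction concrete, here is a hedged sketch of a log-loss metric for a 
   single sigmoid output per example, in the spirit of the `metric.py` quoted above; the 
   class name, alias, and numerical details are illustrative, not the PR's actual code.
   ```python
   import numpy as np
   import mxnet as mx

   @mx.metric.register
   @mx.metric.alias('log_loss_sketch')
   class LogLossSketch(mx.metric.EvalMetric):
       """Negative log-likelihood for 1-D binary-classification outputs."""
       def __init__(self, name='log_loss_sketch', eps=1e-12):
           super(LogLossSketch, self).__init__(name)
           self.eps = eps

       def update(self, labels, preds):
           for label, pred in zip(labels, preds):
               y = label.asnumpy().ravel()
               p = np.clip(pred.asnumpy().ravel(), self.eps, 1.0 - self.eps)
               self.sum_metric += float(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)).sum())
               self.num_inst += y.size
   ```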


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] eric-haibin-lin opened a new pull request #8767: Factorization machine example & sparse example folder re-org

2017-11-21 Thread GitBox
eric-haibin-lin opened a new pull request #8767: Factorization machine example 
& sparse example folder re-org
URL: https://github.com/apache/incubator-mxnet/pull/8767
 
 
   ## Description ##
   
   Added an example for factorization machine, and created separate folders for 
the examples
   
   @ZiyueHuang @anirudh2290  pls help review
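   
   For reference, a hedged sketch of the factorization-machine scoring rule such an 
   example implements, written with dense NDArray ops for clarity (the example itself 
   lives under `example/sparse/` and presumably uses sparse storage); all shapes and 
   initializations below are made up.
   ```python
   import mxnet as mx

   batch, num_features, k = 4, 100, 8
   x = mx.nd.random.uniform(shape=(batch, num_features))
   b = 0.0                                                          # global bias
   w = mx.nd.random.normal(scale=0.01, shape=(num_features, 1))     # linear weights
   v = mx.nd.random.normal(scale=0.01, shape=(num_features, k))     # latent factors

   linear = mx.nd.dot(x, w)                                         # first-order term
   xv = mx.nd.dot(x, v)
   x2v2 = mx.nd.dot(x * x, v * v)
   pairwise = 0.5 * mx.nd.sum(xv * xv - x2v2, axis=1, keepdims=True)  # second-order term
   score = linear + pairwise + b                                    # feed into sigmoid + log loss
   print(score.shape)                                               # (4, 1)
   ```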
   
   ## Checklist ##
   ### Essentials ###
   - [ ] Passed code style checking (`make lint`)
   - [ ] Changes are complete (i.e. I finished coding on this PR)
   - [ ] All changes have test coverage
   - [ ] For user-facing API changes, API doc string has been updated. For new 
C++ functions in header files, their functionalities and arguments are 
well-documented. 
   - [ ] To my best knowledge, examples are either not affected by this change, 
or have been fixed to be compatible with this change
   
   ### Changes ###
   - [ ] Feature1, tests, (and when applicable, API doc)
   - [ ] Feature2, tests, (and when applicable, API doc)
   
   ## Comments ##
   - If this change is a backward incompatible change, why must this change be 
made.
   - Interesting edge cases to note here
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kevinthesun commented on issue #8765: Update AddVersion.py

2017-11-21 Thread GitBox
kevinthesun commented on issue #8765: Update AddVersion.py
URL: https://github.com/apache/incubator-mxnet/pull/8765#issuecomment-346238730
 
 
   This can cause a problem on the Linux installation page: 
   ```bash
   git clone --recursive https://github.com/dmlc/mxnet
   cd mxnet
   ```
   There won't be an mxnet folder. I suggest just changing the content on the OSX installation page.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] reminisce commented on a change in pull request #8766: NDArray Indexing tutorial and Gradient Compression FAQ

2017-11-21 Thread GitBox
reminisce commented on a change in pull request #8766: NDArray Indexing 
tutorial and Gradient Compression FAQ
URL: https://github.com/apache/incubator-mxnet/pull/8766#discussion_r152465467
 
 

 ##
 File path: docs/tutorials/basic/ndarray_indexing.md
 ##
 @@ -0,0 +1,375 @@
+
+# NDArray Indexing - Array indexing features
+
+MXNet's advanced indexing features are modeled after [NumPy's implementation 
and 
documentation](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html#combining-advanced-and-basic-indexing).
 You will see direct adaptations of many NumPy indexing features, and these are 
close, if not identical.
+
+`NDArray`s can be indexed using the standard Python `x[obj]` syntax, where _x_ 
is the array and _obj_ the selection.
+
+There are three kinds of indexing available:
+
+1. field access
+1. basic slicing
+1. advanced indexing
+
+In MXNet, we support both basic and advanced indexing following the convention 
of indexing NumPy's `ndarray`.
+
+
+## Basic Slicing and Indexing
+
+Basic slicing extends Python's basic concept of slicing to N dimensions. For a 
quick review:
+
+```
+a[start:end] # items start through end-1
+a[start:]# items start through the rest of the array
+a[:end]  # items from the beginning through end-1
+a[:] # a copy of the whole array
+```
+
+
+```python
+import mxnet as mx
+from mxnet import nd
+```
+
+For some working examples of basic slicing we'll start simple.
+
+
+```python
+x = nd.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int32')
+x[5:]
+```
+
+
+
+
+
+[5 6 7 8 9]
+
+
+
+
+
+```python
+x = nd.array([0, 1, 2, 3])
+print('1D complete array, x=', x)
+s = x[1:3]
+print('slicing the 2nd and 3rd elements, s=', s)
+```
+
+1D complete array, x=
+[ 0.  1.  2.  3.]
+
+slicing the 2nd and 3rd elements, s=
+[ 1.  2.]
+
+
+
+Now let's try slicing the 2nd and 3rd elements of a multi-dimensional array.
+
+
+```python
+x = nd.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
+print('multi-D complete array, x=', x)
+s = x[1:3]
+print('slicing the 2nd and 3rd elements, s=', s)
+```
+
+multi-D complete array, x=
+[[  1.   2.   3.   4.]
+ [  5.   6.   7.   8.]
+ [  9.  10.  11.  12.]]
+
+slicing the 2nd and 3rd elements, s=
+[[  5.   6.   7.   8.]
+ [  9.  10.  11.  12.]]
+
+
+
+Now let's try writing to a specific element. We'll write `9` to element `2` 
using `x[2] = 9.0`, which updates the whole row.
+
+
+```python
+print('original x, x=', x)
+x[2] = 9.0
+print('replaced entire row with x[2] = 9.0, x=', x)
+```
+
+original x, x=
+[[  1.   2.   3.   4.]
+ [  5.   6.   7.   8.]
+ [  9.  10.  11.  12.]]
+
+replaced entire row with x[2] = 9.0, x=
+[[ 1.  2.  3.  4.]
+ [ 5.  6.  7.  8.]
+ [ 9.  9.  9.  9.]]
+
+
+
+We can target specific elements too. Let's replace the number `3` in the first 
row with the number `9` using `x[0,2] = 9.0`.
+
+
+```python
+print('original x, x=', x)
+x[0,2] = 9.0
+print('replaced specific element with x[0,2] = 9.0, x=', x)
+```
+
+original x, x=
+[[ 1.  2.  3.  4.]
+ [ 5.  6.  7.  8.]
+ [ 9.  9.  9.  9.]]
+
+replaced specific element with x[0,2] = 9.0, x=
+[[ 1.  2.  9.  4.]
+ [ 5.  6.  7.  8.]
+ [ 9.  9.  9.  9.]]
+
+
+
+Now let's target even more by selecting a couple of targets at the same time. 
We'll replace the `6` and the `7` with `x[1:2,1:3] = 5.0`.
+
+
+```python
+print('original x, x=', x)
+x[1:2,1:3] = 5.0
+print('replaced range of elements with x[1:2,1:3] = 5.0, x=', x)
+```
+
+original x, x=
+[[ 1.  2.  9.  4.]
+ [ 5.  6.  7.  8.]
+ [ 9.  9.  9.  9.]]
+
+replaced range of elements with x[1:2,1:3] = 5.0, x=
+[[ 1.  2.  9.  4.]
+ [ 5.  5.  5.  8.]
+ [ 9.  9.  9.  9.]]
+
+
+
+## New Indexing Features in v1.0
+
+### Step
+
+The basic slice syntax is `i:j:k` where _i_ is the starting index, _j_ is the 
stopping index, and _k_ is the step (k must be nonzero).
+
+**Note**: Previously, MXNet supported basic slicing and indexing only with 
`step=1`. From release 1.0, arbitrary values of `step` are supported.
+
+
+```python
+x = nd.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int32')
+# Select elements 1 through 7, and use a step of 2
+x[1:7:2]
+```
+
+
+
+
+
+[1 3 5]
+
+
+
+
+## Negative Indices
+Negative _i_ and _j_ are interpreted as _n + i_ and _n + j_ where _n_ is the 
number of elements in the corresponding dimension. Negative _k_ makes stepping 
go towards smaller indices.
+
+
+```python
+x[-2:10]
+```
+
+
+
+
+
+[8 9]
+
+
+
+
+If the number of objects in the selection tuple is less than N , then : is 
assumed for any subsequent dimensions.
+
+
+```python
+x = nd.array([[[1],[2],[3]],
+ [[4],[5],[6]]], dtype='int32')
+x[1:2]
+```
+
+
+
+
+
+[[[4]
+  [5]
+  [6]]]
+
+
+
+
+You may use slicing to set values in the array, but (unlike lists) you can 
never grow the array. 

[GitHub] reminisce commented on a change in pull request #8766: NDArray Indexing tutorial and Gradient Compression FAQ

2017-11-21 Thread GitBox
reminisce commented on a change in pull request #8766: NDArray Indexing 
tutorial and Gradient Compression FAQ
URL: https://github.com/apache/incubator-mxnet/pull/8766#discussion_r152464912
 
 

 ##
 File path: docs/tutorials/basic/ndarray_indexing.md
 ##
 @@ -0,0 +1,375 @@
+
+# NDArray Indexing - Array indexing features
+
+MXNet's advanced indexing features are modeled after [NumPy's implementation 
and 
documentation](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html#combining-advanced-and-basic-indexing).
 You will see direct adaptations of many NumPy indexing features, and these are 
close, if not identical.
+
+`NDArray`s can be indexed using the standard Python `x[obj]` syntax, where _x_ 
is the array and _obj_ the selection.
+
+There are three kinds of indexing available:
+
+1. field access
+1. basic slicing
+1. advanced indexing
+
+In MXNet, we support both basic and advanced indexing following the convention 
of indexing NumPy's `ndarray`.
+
+
+## Basic Slicing and Indexing
+
+Basic slicing extends Python's basic concept of slicing to N dimensions. For a 
quick review:
+
+```
+a[start:end] # items start through end-1
+a[start:]# items start through the rest of the array
+a[:end]  # items from the beginning through end-1
+a[:] # a copy of the whole array
+```
+
+
+```python
+import mxnet as mx
+from mxnet import nd
+```
+
+For some working examples of basic slicing we'll start simple.
+
+
+```python
+x = nd.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int32')
+x[5:]
+```
+
+
+
+
+
+[5 6 7 8 9]
+
+
+
+
+
+```python
+x = nd.array([0, 1, 2, 3])
+print('1D complete array, x=', x)
+s = x[1:3]
+print('slicing the 2nd and 3rd elements, s=', s)
+```
+
+1D complete array, x=
+[ 0.  1.  2.  3.]
+
+slicing the 2nd and 3rd elements, s=
+[ 1.  2.]
+
+
+
+Now let's try slicing the 2nd and 3rd elements of a multi-dimensional array.
+
+
+```python
+x = nd.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
+print('multi-D complete array, x=', x)
+s = x[1:3]
+print('slicing the 2nd and 3rd elements, s=', s)
+```
+
+multi-D complete array, x=
+[[  1.   2.   3.   4.]
+ [  5.   6.   7.   8.]
+ [  9.  10.  11.  12.]]
+
+slicing the 2nd and 3rd elements, s=
+[[  5.   6.   7.   8.]
+ [  9.  10.  11.  12.]]
+
+
+
+Now let's try writing to a specific element. We'll write `9` to element `2` 
using `x[2] = 9.0`, which updates the whole row.
+
+
+```python
+print('original x, x=', x)
+x[2] = 9.0
+print('replaced entire row with x[2] = 9.0, x=', x)
+```
+
+original x, x=
+[[  1.   2.   3.   4.]
+ [  5.   6.   7.   8.]
+ [  9.  10.  11.  12.]]
+
+replaced entire row with x[2] = 9.0, x=
+[[ 1.  2.  3.  4.]
+ [ 5.  6.  7.  8.]
+ [ 9.  9.  9.  9.]]
+
+
+
+We can target specific elements too. Let's replace the number `3` in the first 
row with the number `9` using `x[0,2] = 9.0`.
+
+
+```python
+print('original x, x=', x)
+x[0,2] = 9.0
 
 Review comment:
   Add space.
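   
   Presumably this refers to a space after the comma in the index tuple, e.g.:
   ```python
   from mxnet import nd
   x = nd.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 9, 9, 9]])
   x[0, 2] = 9.0   # rather than x[0,2] = 9.0
   ```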


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] reminisce commented on a change in pull request #8766: NDArray Indexing tutorial and Gradient Compression FAQ

2017-11-21 Thread GitBox
reminisce commented on a change in pull request #8766: NDArray Indexing 
tutorial and Gradient Compression FAQ
URL: https://github.com/apache/incubator-mxnet/pull/8766#discussion_r152465200
 
 

 ##
 File path: docs/tutorials/basic/ndarray_indexing.md
 ##
 @@ -0,0 +1,375 @@
+
+# NDArray Indexing - Array indexing features
+
+MXNet's advanced indexing features are modeled after [NumPy's implementation 
and 
documentation](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html#combining-advanced-and-basic-indexing).
 You will see direct adaptations of many NumPy indexing features, and these are 
close, if not identical.
+
+`NDArray`s can be indexed using the standard Python `x[obj]` syntax, where _x_ 
is the array and _obj_ the selection.
+
+There are three kinds of indexing available:
+
+1. field access
+1. basic slicing
+1. advanced indexing
+
+In MXNet, we support both basic and advanced indexing following the convention 
of indexing NumPy's `ndarray`.
+
+
+## Basic Slicing and Indexing
+
+Basic slicing extends Python's basic concept of slicing to N dimensions. For a 
quick review:
+
+```
+a[start:end] # items start through end-1
+a[start:]# items start through the rest of the array
+a[:end]  # items from the beginning through end-1
+a[:] # a copy of the whole array
+```
+
+
+```python
+import mxnet as mx
+from mxnet import nd
+```
+
+For some working examples of basic slicing we'll start simple.
+
+
+```python
+x = nd.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int32')
+x[5:]
+```
+
+
+
+
+
+[5 6 7 8 9]
+
+
+
+
+
+```python
+x = nd.array([0, 1, 2, 3])
+print('1D complete array, x=', x)
+s = x[1:3]
+print('slicing the 2nd and 3rd elements, s=', s)
+```
+
+1D complete array, x=
+[ 0.  1.  2.  3.]
+
+slicing the 2nd and 3rd elements, s=
+[ 1.  2.]
+
+
+
+Now let's try slicing the 2nd and 3rd elements of a multi-dimensional array.
+
+
+```python
+x = nd.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
+print('multi-D complete array, x=', x)
+s = x[1:3]
+print('slicing the 2nd and 3rd elements, s=', s)
+```
+
+multi-D complete array, x=
+[[  1.   2.   3.   4.]
+ [  5.   6.   7.   8.]
+ [  9.  10.  11.  12.]]
+
+slicing the 2nd and 3rd elements, s=
+[[  5.   6.   7.   8.]
+ [  9.  10.  11.  12.]]
+
+
+
+Now let's try writing to a specific element. We'll write `9` to element `2` 
using `x[2] = 9.0`, which updates the whole row.
+
+
+```python
+print('original x, x=', x)
+x[2] = 9.0
+print('replaced entire row with x[2] = 9.0, x=', x)
+```
+
+original x, x=
+[[  1.   2.   3.   4.]
+ [  5.   6.   7.   8.]
+ [  9.  10.  11.  12.]]
+
+replaced entire row with x[2] = 9.0, x=
+[[ 1.  2.  3.  4.]
+ [ 5.  6.  7.  8.]
+ [ 9.  9.  9.  9.]]
+
+
+
+We can target specific elements too. Let's replace the number `3` in the first 
row with the number `9` using `x[0,2] = 9.0`.
+
+
+```python
+print('original x, x=', x)
+x[0,2] = 9.0
+print('replaced specific element with x[0,2] = 9.0, x=', x)
+```
+
+original x, x=
+[[ 1.  2.  3.  4.]
+ [ 5.  6.  7.  8.]
+ [ 9.  9.  9.  9.]]
+
+replaced specific element with x[0,2] = 9.0, x=
+[[ 1.  2.  9.  4.]
+ [ 5.  6.  7.  8.]
+ [ 9.  9.  9.  9.]]
+
+
+
+Now let's target even more by selecting a couple of targets at the same time. 
We'll replace the `6` and the `7` with `x[1:2,1:3] = 5.0`.
+
+
+```python
+print('original x, x=', x)
+x[1:2,1:3] = 5.0
+print('replaced range of elements with x[1:2,1:3] = 5.0, x=', x)
+```
+
+original x, x=
+[[ 1.  2.  9.  4.]
+ [ 5.  6.  7.  8.]
+ [ 9.  9.  9.  9.]]
+
+replaced range of elements with x[1:2,1:3] = 5.0, x=
+[[ 1.  2.  9.  4.]
+ [ 5.  5.  5.  8.]
+ [ 9.  9.  9.  9.]]
+
+
+
+## New Indexing Features in v1.0
+
+### Step
+
+The basic slice syntax is `i:j:k` where _i_ is the starting index, _j_ is the 
stopping index, and _k_ is the step (k must be nonzero).
+
+**Note**: Previously, MXNet supported basic slicing and indexing only with 
`step=1`. From release 1.0, arbitrary values of `step` are supported.
+
+
+```python
+x = nd.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int32')
+# Select elements 1 through 7, and use a step of 2
+x[1:7:2]
+```
+
+
+
+
+
+[1 3 5]
+
+
+
+
+## Negative Indices
+Negative _i_ and _j_ are interpreted as _n + i_ and _n + j_ where _n_ is the 
number of elements in the corresponding dimension. Negative _k_ makes stepping 
go towards smaller indices.
+
+
+```python
+x[-2:10]
+```
+
+
+
+
+
+[8 9]
+
+
+
+
+If the number of objects in the selection tuple is less than N , then : is 
assumed for any subsequent dimensions.
+
+
+```python
+x = nd.array([[[1],[2],[3]],
+ [[4],[5],[6]]], dtype='int32')
 
 Review comment:
   Align with the previous line.


This is an automated message from the Apache Git 

[GitHub] reminisce commented on a change in pull request #8766: NDArray Indexing tutorial and Gradient Compression FAQ

2017-11-21 Thread GitBox
reminisce commented on a change in pull request #8766: NDArray Indexing 
tutorial and Gradient Compression FAQ
URL: https://github.com/apache/incubator-mxnet/pull/8766#discussion_r152464985
 
 

 ##
 File path: docs/tutorials/basic/ndarray_indexing.md
 ##
 @@ -0,0 +1,375 @@
+
+# NDArray Indexing - Array indexing features
+
+MXNet's advanced indexing features are modeled after [NumPy's implementation 
and 
documentation](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html#combining-advanced-and-basic-indexing).
 You will see direct adaptations of many NumPy indexing features, and these are 
close, if not identical.
+
+`NDArray`s can be indexed using the standard Python `x[obj]` syntax, where _x_ 
is the array and _obj_ the selection.
+
+There are three kinds of indexing available:
+
+1. field access
+1. basic slicing
+1. advanced indexing
+
+In MXNet, we support both basic and advanced indexing following the convention 
of indexing NumPy's `ndarray`.
+
+
+## Basic Slicing and Indexing
+
+Basic slicing extends Python's basic concept of slicing to N dimensions. For a 
quick review:
+
+```
+a[start:end]  # items start through end-1
+a[start:]     # items start through the rest of the array
+a[:end]       # items from the beginning through end-1
+a[:]          # a copy of the whole array
+```
+
+
+```python
+import mxnet as mx
+from mxnet import nd
+```
+
+For some working examples of basic slicing we'll start simple.
+
+
+```python
+x = nd.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int32')
+x[5:]
+```
+
+
+
+
+
+[5 6 7 8 9]
+
+
+
+
+
+```python
+x = nd.array([0, 1, 2, 3])
+print('1D complete array, x=', x)
+s = x[1:3]
+print('slicing the 2nd and 3rd elements, s=', s)
+```
+
+1D complete array, x=
+[ 0.  1.  2.  3.]
+
+slicing the 2nd and 3rd elements, s=
+[ 1.  2.]
+
+
+
+Now let's try slicing the 2nd and 3rd elements of a multi-dimensional array.
+
+
+```python
+x = nd.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
+print('multi-D complete array, x=', x)
+s = x[1:3]
+print('slicing the 2nd and 3rd elements, s=', s)
+```
+
+multi-D complete array, x=
+[[  1.   2.   3.   4.]
+ [  5.   6.   7.   8.]
+ [  9.  10.  11.  12.]]
+
+slicing the 2nd and 3rd elements, s=
+[[  5.   6.   7.   8.]
+ [  9.  10.  11.  12.]]
+
+
+
+Now let's try writing to a specific element. We'll write `9` to element `2` 
using `x[2] = 9.0`, which updates the whole row.
+
+
+```python
+print('original x, x=', x)
+x[2] = 9.0
+print('replaced entire row with x[2] = 9.0, x=', x)
+```
+
+original x, x=
+[[  1.   2.   3.   4.]
+ [  5.   6.   7.   8.]
+ [  9.  10.  11.  12.]]
+
+replaced entire row with x[2] = 9.0, x=
+[[ 1.  2.  3.  4.]
+ [ 5.  6.  7.  8.]
+ [ 9.  9.  9.  9.]]
+
+
+
+We can target specific elements too. Let's replace the number `3` in the first 
row with the number `9` using `x[0,2] = 9.0`.
+
+
+```python
+print('original x, x=', x)
+x[0,2] = 9.0
+print('replaced specific element with x[0,2] = 9.0, x=', x)
+```
+
+original x, x=
+[[ 1.  2.  3.  4.]
+ [ 5.  6.  7.  8.]
+ [ 9.  9.  9.  9.]]
+
+replaced specific element with x[0,2] = 9.0, x=
+[[ 1.  2.  9.  4.]
+ [ 5.  6.  7.  8.]
+ [ 9.  9.  9.  9.]]
+
+
+
+Now let's go a step further and select a couple of targets at the same time. 
We'll replace the `6` and the `7` with `x[1:2,1:3] = 5.0`.
+
+
+```python
+print('original x, x=', x)
+x[1:2,1:3] = 5.0
 
 Review comment:
   Add a space. Please check if there are other places in the tutorial where 
spaces should be added.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] reminisce commented on a change in pull request #8766: NDArray Indexing tutorial and Gradient Compression FAQ

2017-11-21 Thread GitBox
reminisce commented on a change in pull request #8766: NDArray Indexing 
tutorial and Gradient Compression FAQ
URL: https://github.com/apache/incubator-mxnet/pull/8766#discussion_r152465405
 
 

 ##
 File path: docs/tutorials/basic/ndarray_indexing.md
 ##
 @@ -0,0 +1,375 @@
+
+# NDArray Indexing - Array indexing features
+
+MXNet's advanced indexing features are modeled after [NumPy's implementation 
and 
documentation](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html#combining-advanced-and-basic-indexing).
 You will see direct adaptations of many NumPy indexing features, and these are 
close, if not identical.
+
+`NDArray`s can be indexed using the standard Python `x[obj]` syntax, where _x_ 
is the array and _obj_ the selection.
+
+There are three kinds of indexing available:
+
+1. field access
+1. basic slicing
+1. advanced indexing
+
+In MXNet, we support both basic and advanced indexing following the convention 
of indexing NumPy's `ndarray`.
+
+
+## Basic Slicing and Indexing
+
+Basic slicing extends Python's basic concept of slicing to N dimensions. For a 
quick review:
+
+```
+a[start:end]  # items start through end-1
+a[start:]     # items start through the rest of the array
+a[:end]       # items from the beginning through end-1
+a[:]          # a copy of the whole array
+```
+
+
+```python
+import mxnet as mx
+from mxnet import nd
+```
+
+For some working examples of basic slicing we'll start simple.
+
+
+```python
+x = nd.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int32')
+x[5:]
+```
+
+
+
+
+
+[5 6 7 8 9]
+
+
+
+
+
+```python
+x = nd.array([0, 1, 2, 3])
+print('1D complete array, x=', x)
+s = x[1:3]
+print('slicing the 2nd and 3rd elements, s=', s)
+```
+
+1D complete array, x=
+[ 0.  1.  2.  3.]
+
+slicing the 2nd and 3rd elements, s=
+[ 1.  2.]
+
+
+
+Now let's try slicing the 2nd and 3rd elements of a multi-dimensional array.
+
+
+```python
+x = nd.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
+print('multi-D complete array, x=', x)
+s = x[1:3]
+print('slicing the 2nd and 3rd elements, s=', s)
+```
+
+multi-D complete array, x=
+[[  1.   2.   3.   4.]
+ [  5.   6.   7.   8.]
+ [  9.  10.  11.  12.]]
+
+slicing the 2nd and 3rd elements, s=
+[[  5.   6.   7.   8.]
+ [  9.  10.  11.  12.]]
+
+
+
+Now let's try writing to a specific element. We'll write `9` to element `2` 
using `x[2] = 9.0`, which updates the whole row.
+
+
+```python
+print('original x, x=', x)
+x[2] = 9.0
+print('replaced entire row with x[2] = 9.0, x=', x)
+```
+
+original x, x=
+[[  1.   2.   3.   4.]
+ [  5.   6.   7.   8.]
+ [  9.  10.  11.  12.]]
+
+replaced entire row with x[2] = 9.0, x=
+[[ 1.  2.  3.  4.]
+ [ 5.  6.  7.  8.]
+ [ 9.  9.  9.  9.]]
+
+
+
+We can target specific elements too. Let's replace the number `3` in the first 
row with the number `9` using `x[0,2] = 9.0`.
+
+
+```python
+print('original x, x=', x)
+x[0,2] = 9.0
+print('replaced specific element with x[0,2] = 9.0, x=', x)
+```
+
+original x, x=
+[[ 1.  2.  3.  4.]
+ [ 5.  6.  7.  8.]
+ [ 9.  9.  9.  9.]]
+
+replaced specific element with x[0,2] = 9.0, x=
+[[ 1.  2.  9.  4.]
+ [ 5.  6.  7.  8.]
+ [ 9.  9.  9.  9.]]
+
+
+
+Now let's go a step further and select a couple of targets at the same time. 
We'll replace the `6` and the `7` with `x[1:2,1:3] = 5.0`.
+
+
+```python
+print('original x, x=', x)
+x[1:2,1:3] = 5.0
+print('replaced range of elements with x[1:2,1:3] = 5.0, x=', x)
+```
+
+original x, x=
+[[ 1.  2.  9.  4.]
+ [ 5.  6.  7.  8.]
+ [ 9.  9.  9.  9.]]
+
+replaced range of elements with x[1:2,1:3] = 5.0, x=
+[[ 1.  2.  9.  4.]
+ [ 5.  5.  5.  8.]
+ [ 9.  9.  9.  9.]]
+
+
+
+## New Indexing Features in v1.0
+
+### Step
+
+The basic slice syntax is `i:j:k` where _i_ is the starting index, _j_ is the 
stopping index, and _k_ is the step (k must be nonzero).
+
+**Note**: Previously, MXNet supported basic slicing and indexing only with 
`step=1`. From release 1.0, arbitrary values of `step` are supported.
+
+
+```python
+x = nd.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int32')
+# Select elements 1 through 7, and use a step of 2
+x[1:7:2]
+```
+
+
+
+
+
+[1 3 5]
+
+
+
+
+## Negative Indices
+Negative _i_ and _j_ are interpreted as _n + i_ and _n + j_ where _n_ is the 
number of elements in the corresponding dimension. Negative _k_ makes stepping 
go towards smaller indices.
+
+
+```python
+x[-2:10]
+```
+
+
+
+
+
+[8 9]
+
+
+
+
+If the number of objects in the selection tuple is less than _N_, then `:` is 
assumed for any subsequent dimensions.
+
+
+```python
+x = nd.array([[[1],[2],[3]],
+ [[4],[5],[6]]], dtype='int32')
+x[1:2]
+```
+
+
+
+
+
+[[[4]
+  [5]
+  [6]]]
+
+
+
+
+You may use slicing to set values in the array, but (unlike lists) you can 
never grow the array. 

[GitHub] reminisce commented on a change in pull request #8766: NDArray Indexing tutorial and Gradient Compression FAQ

2017-11-21 Thread GitBox
reminisce commented on a change in pull request #8766: NDArray Indexing 
tutorial and Gradient Compression FAQ
URL: https://github.com/apache/incubator-mxnet/pull/8766#discussion_r152465737
 
 

 ##
 File path: docs/tutorials/basic/ndarray_indexing.md
 ##
 @@ -0,0 +1,375 @@
+
+# NDArray Indexing - Array indexing features
+
+MXNet's advanced indexing features are modeled after [NumPy's implementation 
and 
documentation](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html#combining-advanced-and-basic-indexing).
 You will see direct adaptations of many NumPy indexing features, and these are 
close, if not identical.
+
+`NDArray`s can be indexed using the standard Python `x[obj]` syntax, where _x_ 
is the array and _obj_ the selection.
+
+There are three kinds of indexing available:
+
+1. field access
+1. basic slicing
+1. advanced indexing
+
+In MXNet, we support both basic and advanced indexing following the convention 
of indexing NumPy's `ndarray`.
+
+
+## Basic Slicing and Indexing
+
+Basic slicing extends Python's basic concept of slicing to N dimensions. For a 
quick review:
+
+```
+a[start:end]  # items start through end-1
+a[start:]     # items start through the rest of the array
+a[:end]       # items from the beginning through end-1
+a[:]          # a copy of the whole array
+```
+
+
+```python
+import mxnet as mx
+from mxnet import nd
+```
+
+For some working examples of basic slicing we'll start simple.
+
+
+```python
+x = nd.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int32')
+x[5:]
+```
+
+
+
+
+
+[5 6 7 8 9]
+
+
+
+
+
+```python
+x = nd.array([0, 1, 2, 3])
+print('1D complete array, x=', x)
+s = x[1:3]
+print('slicing the 2nd and 3rd elements, s=', s)
+```
+
+1D complete array, x=
+[ 0.  1.  2.  3.]
+
+slicing the 2nd and 3rd elements, s=
+[ 1.  2.]
+
+
+
+Now let's try slicing the 2nd and 3rd elements of a multi-dimensional array.
+
+
+```python
+x = nd.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
+print('multi-D complete array, x=', x)
+s = x[1:3]
+print('slicing the 2nd and 3rd elements, s=', s)
+```
+
+multi-D complete array, x=
+[[  1.   2.   3.   4.]
+ [  5.   6.   7.   8.]
+ [  9.  10.  11.  12.]]
+
+slicing the 2nd and 3rd elements, s=
+[[  5.   6.   7.   8.]
+ [  9.  10.  11.  12.]]
+
+
+
+Now let's try writing to a specific element. We'll write `9` to element `2` 
using `x[2] = 9.0`, which updates the whole row.
+
+
+```python
+print('original x, x=', x)
+x[2] = 9.0
+print('replaced entire row with x[2] = 9.0, x=', x)
+```
+
+original x, x=
+[[  1.   2.   3.   4.]
+ [  5.   6.   7.   8.]
+ [  9.  10.  11.  12.]]
+
+replaced entire row with x[2] = 9.0, x=
+[[ 1.  2.  3.  4.]
+ [ 5.  6.  7.  8.]
+ [ 9.  9.  9.  9.]]
+
+
+
+We can target specific elements too. Let's replace the number `3` in the first 
row with the number `9` using `x[0,2] = 9.0`.
+
+
+```python
+print('original x, x=', x)
+x[0,2] = 9.0
+print('replaced specific element with x[0,2] = 9.0, x=', x)
+```
+
+original x, x=
+[[ 1.  2.  3.  4.]
+ [ 5.  6.  7.  8.]
+ [ 9.  9.  9.  9.]]
+
+replaced specific element with x[0,2] = 9.0, x=
+[[ 1.  2.  9.  4.]
+ [ 5.  6.  7.  8.]
+ [ 9.  9.  9.  9.]]
+
+
+
+Now let's go a step further and select a couple of targets at the same time. 
We'll replace the `6` and the `7` with `x[1:2,1:3] = 5.0`.
+
+
+```python
+print('original x, x=', x)
+x[1:2,1:3] = 5.0
+print('replaced range of elements with x[1:2,1:3] = 5.0, x=', x)
+```
+
+original x, x=
+[[ 1.  2.  9.  4.]
+ [ 5.  6.  7.  8.]
+ [ 9.  9.  9.  9.]]
+
+replaced range of elements with x[1:2,1:3] = 5.0, x=
+[[ 1.  2.  9.  4.]
+ [ 5.  5.  5.  8.]
+ [ 9.  9.  9.  9.]]
+
+
+
+## New Indexing Features in v1.0
+
+### Step
+
+The basic slice syntax is `i:j:k` where _i_ is the starting index, _j_ is the 
stopping index, and _k_ is the step (k must be nonzero).
+
+**Note**: Previously, MXNet supported basic slicing and indexing only with 
`step=1`. From release 1.0, arbitrary values of `step` are supported.
+
+
+```python
+x = nd.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int32')
+# Select elements 1 through 7, and use a step of 2
+x[1:7:2]
+```
+
+
+
+
+
+[1 3 5]
+
+
+
+
+## Negative Indices
+Negative _i_ and _j_ are interpreted as _n + i_ and _n + j_ where _n_ is the 
number of elements in the corresponding dimension. Negative _k_ makes stepping 
go towards smaller indices.
+
+
+```python
+x[-2:10]
+```
+
+
+
+
+
+[8 9]
+
+
+
+
+If the number of objects in the selection tuple is less than _N_, then `:` is 
assumed for any subsequent dimensions.
+
+
+```python
+x = nd.array([[[1],[2],[3]],
+ [[4],[5],[6]]], dtype='int32')
+x[1:2]
+```
+
+
+
+
+
+[[[4]
+  [5]
+  [6]]]
+
+
+
+
+You may use slicing to set values in the array, but (unlike lists) you can 
never grow the array. 

[GitHub] reminisce commented on a change in pull request #8766: NDArray Indexing tutorial and Gradient Compression FAQ

2017-11-21 Thread GitBox
reminisce commented on a change in pull request #8766: NDArray Indexing 
tutorial and Gradient Compression FAQ
URL: https://github.com/apache/incubator-mxnet/pull/8766#discussion_r152465917
 
 

 ##
 File path: docs/tutorials/basic/ndarray_indexing.md
 ##
 @@ -0,0 +1,375 @@
+
+# NDArray Indexing - Array indexing features
+
+MXNet's advanced indexing features are modeled after [NumPy's implementation 
and 
documentation](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html#combining-advanced-and-basic-indexing).
 You will see direct adaptations of many NumPy indexing features, and these are 
close, if not identical.
 
 Review comment:
   Should we also emphasize that some examples and explanations are borrowed 
from NumPy's indexing tutorial directly for illustrating the indexing feature 
of MXNet?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] reminisce commented on a change in pull request #8766: NDArray Indexing tutorial and Gradient Compression FAQ

2017-11-21 Thread GitBox
reminisce commented on a change in pull request #8766: NDArray Indexing 
tutorial and Gradient Compression FAQ
URL: https://github.com/apache/incubator-mxnet/pull/8766#discussion_r152464898
 
 

 ##
 File path: docs/tutorials/basic/ndarray_indexing.md
 ##
 @@ -0,0 +1,375 @@
+
+# NDArray Indexing - Array indexing features
+
+MXNet's advanced indexing features are modeled after [NumPy's implementation 
and 
documentation](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html#combining-advanced-and-basic-indexing).
 You will see direct adaptations of many NumPy indexing features, and these are 
close, if not identical.
+
+`NDArray`s can be indexed using the standard Python `x[obj]` syntax, where _x_ 
is the array and _obj_ the selection.
+
+There are three kinds of indexing available:
+
+1. field access
+1. basic slicing
+1. advanced indexing
+
+In MXNet, we support both basic and advanced indexing following the convention 
of indexing NumPy's `ndarray`.
+
+
+## Basic Slicing and Indexing
+
+Basic slicing extends Python's basic concept of slicing to N dimensions. For a 
quick review:
+
+```
+a[start:end]  # items start through end-1
+a[start:]     # items start through the rest of the array
+a[:end]       # items from the beginning through end-1
+a[:]          # a copy of the whole array
+```
+
+
+```python
+import mxnet as mx
+from mxnet import nd
+```
+
+For some working examples of basic slicing we'll start simple.
+
+
+```python
+x = nd.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int32')
+x[5:]
+```
+
+
+
+
+
+[5 6 7 8 9]
+
+
+
+
+
+```python
+x = nd.array([0, 1, 2, 3])
+print('1D complete array, x=', x)
+s = x[1:3]
+print('slicing the 2nd and 3rd elements, s=', s)
+```
+
+1D complete array, x=
+[ 0.  1.  2.  3.]
+
+slicing the 2nd and 3rd elements, s=
+[ 1.  2.]
+
+
+
+Now let's try slicing the 2nd and 3rd elements of a multi-dimensional array.
+
+
+```python
+x = nd.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
+print('multi-D complete array, x=', x)
+s = x[1:3]
+print('slicing the 2nd and 3rd elements, s=', s)
+```
+
+multi-D complete array, x=
+[[  1.   2.   3.   4.]
+ [  5.   6.   7.   8.]
+ [  9.  10.  11.  12.]]
+
+slicing the 2nd and 3rd elements, s=
+[[  5.   6.   7.   8.]
+ [  9.  10.  11.  12.]]
+
+
+
+Now let's try writing to a specific element. We'll write `9` to element `2` 
using `x[2] = 9.0`, which updates the whole row.
+
+
+```python
+print('original x, x=', x)
+x[2] = 9.0
+print('replaced entire row with x[2] = 9.0, x=', x)
+```
+
+original x, x=
+[[  1.   2.   3.   4.]
+ [  5.   6.   7.   8.]
+ [  9.  10.  11.  12.]]
+
+replaced entire row with x[2] = 9.0, x=
+[[ 1.  2.  3.  4.]
+ [ 5.  6.  7.  8.]
+ [ 9.  9.  9.  9.]]
+
+
+
+We can target specific elements too. Let's replace the number `3` in the first 
row with the number `9` using `x[0,2] = 9.0`.
 
 Review comment:
   Add a space between 0 and 2, i.e. `x[0, 2] = 9.0`, otherwise, IDE such as 
PyCharm would issue a warning of coding style.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] reminisce commented on a change in pull request #8766: NDArray Indexing tutorial and Gradient Compression FAQ

2017-11-21 Thread GitBox
reminisce commented on a change in pull request #8766: NDArray Indexing 
tutorial and Gradient Compression FAQ
URL: https://github.com/apache/incubator-mxnet/pull/8766#discussion_r152464932
 
 

 ##
 File path: docs/tutorials/basic/ndarray_indexing.md
 ##
 @@ -0,0 +1,375 @@
+
+# NDArray Indexing - Array indexing features
+
+MXNet's advanced indexing features are modeled after [NumPy's implementation 
and 
documentation](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html#combining-advanced-and-basic-indexing).
 You will see direct adaptations of many NumPy indexing features, and these are 
close, if not identical.
+
+`NDArray`s can be indexed using the standard Python `x[obj]` syntax, where _x_ 
is the array and _obj_ the selection.
+
+There are three kinds of indexing available:
+
+1. field access
+1. basic slicing
+1. advanced indexing
+
+In MXNet, we support both basic and advanced indexing following the convention 
of indexing NumPy's `ndarray`.
+
+
+## Basic Slicing and Indexing
+
+Basic slicing extends Python's basic concept of slicing to N dimensions. For a 
quick review:
+
+```
+a[start:end]  # items start through end-1
+a[start:]     # items start through the rest of the array
+a[:end]       # items from the beginning through end-1
+a[:]          # a copy of the whole array
+```
+
+
+```python
+import mxnet as mx
+from mxnet import nd
+```
+
+For some working examples of basic slicing we'll start simple.
+
+
+```python
+x = nd.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int32')
+x[5:]
+```
+
+
+
+
+
+[5 6 7 8 9]
+
+
+
+
+
+```python
+x = nd.array([0, 1, 2, 3])
+print('1D complete array, x=', x)
+s = x[1:3]
+print('slicing the 2nd and 3rd elements, s=', s)
+```
+
+1D complete array, x=
+[ 0.  1.  2.  3.]
+
+slicing the 2nd and 3rd elements, s=
+[ 1.  2.]
+
+
+
+Now let's try slicing the 2nd and 3rd elements of a multi-dimensional array.
+
+
+```python
+x = nd.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
+print('multi-D complete array, x=', x)
+s = x[1:3]
+print('slicing the 2nd and 3rd elements, s=', s)
+```
+
+multi-D complete array, x=
+[[  1.   2.   3.   4.]
+ [  5.   6.   7.   8.]
+ [  9.  10.  11.  12.]]
+
+slicing the 2nd and 3rd elements, s=
+[[  5.   6.   7.   8.]
+ [  9.  10.  11.  12.]]
+
+
+
+Now let's try writing to a specific element. We'll write `9` to element `2` 
using `x[2] = 9.0`, which updates the whole row.
+
+
+```python
+print('original x, x=', x)
+x[2] = 9.0
+print('replaced entire row with x[2] = 9.0, x=', x)
+```
+
+original x, x=
+[[  1.   2.   3.   4.]
+ [  5.   6.   7.   8.]
+ [  9.  10.  11.  12.]]
+
+replaced entire row with x[2] = 9.0, x=
+[[ 1.  2.  3.  4.]
+ [ 5.  6.  7.  8.]
+ [ 9.  9.  9.  9.]]
+
+
+
+We can target specific elements too. Let's replace the number `3` in the first 
row with the number `9` using `x[0,2] = 9.0`.
+
+
+```python
+print('original x, x=', x)
+x[0,2] = 9.0
+print('replaced specific element with x[0,2] = 9.0, x=', x)
+```
+
+original x, x=
+[[ 1.  2.  3.  4.]
+ [ 5.  6.  7.  8.]
+ [ 9.  9.  9.  9.]]
+
+replaced specific element with x[0,2] = 9.0, x=
 
 Review comment:
   Add a space.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] reminisce commented on a change in pull request #8766: NDArray Indexing tutorial and Gradient Compression FAQ

2017-11-21 Thread GitBox
reminisce commented on a change in pull request #8766: NDArray Indexing 
tutorial and Gradient Compression FAQ
URL: https://github.com/apache/incubator-mxnet/pull/8766#discussion_r152464946
 
 

 ##
 File path: docs/tutorials/basic/ndarray_indexing.md
 ##
 @@ -0,0 +1,375 @@
+
+# NDArray Indexing - Array indexing features
+
+MXNet's advanced indexing features are modeled after [NumPy's implementation 
and 
documentation](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html#combining-advanced-and-basic-indexing).
 You will see direct adaptations of many NumPy indexing features, and these are 
close, if not identical.
+
+`NDArray`s can be indexed using the standard Python `x[obj]` syntax, where _x_ 
is the array and _obj_ the selection.
+
+There are three kinds of indexing available:
+
+1. field access
+1. basic slicing
+1. advanced indexing
+
+In MXNet, we support both basic and advanced indexing following the convention 
of indexing NumPy's `ndarray`.
+
+
+## Basic Slicing and Indexing
+
+Basic slicing extends Python's basic concept of slicing to N dimensions. For a 
quick review:
+
+```
+a[start:end]  # items start through end-1
+a[start:]     # items start through the rest of the array
+a[:end]       # items from the beginning through end-1
+a[:]          # a copy of the whole array
+```
+
+
+```python
+import mxnet as mx
+from mxnet import nd
+```
+
+For some working examples of basic slicing we'll start simple.
+
+
+```python
+x = nd.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int32')
+x[5:]
+```
+
+
+
+
+
+[5 6 7 8 9]
+
+
+
+
+
+```python
+x = nd.array([0, 1, 2, 3])
+print('1D complete array, x=', x)
+s = x[1:3]
+print('slicing the 2nd and 3rd elements, s=', s)
+```
+
+1D complete array, x=
+[ 0.  1.  2.  3.]
+
+slicing the 2nd and 3rd elements, s=
+[ 1.  2.]
+
+
+
+Now let's try slicing the 2nd and 3rd elements of a multi-dimensional array.
+
+
+```python
+x = nd.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
+print('multi-D complete array, x=', x)
+s = x[1:3]
+print('slicing the 2nd and 3rd elements, s=', s)
+```
+
+multi-D complete array, x=
+[[  1.   2.   3.   4.]
+ [  5.   6.   7.   8.]
+ [  9.  10.  11.  12.]]
+
+slicing the 2nd and 3rd elements, s=
+[[  5.   6.   7.   8.]
+ [  9.  10.  11.  12.]]
+
+
+
+Now let's try writing to a specific element. We'll write `9` to element `2` 
using `x[2] = 9.0`, which updates the whole row.
+
+
+```python
+print('original x, x=', x)
+x[2] = 9.0
+print('replaced entire row with x[2] = 9.0, x=', x)
+```
+
+original x, x=
+[[  1.   2.   3.   4.]
+ [  5.   6.   7.   8.]
+ [  9.  10.  11.  12.]]
+
+replaced entire row with x[2] = 9.0, x=
+[[ 1.  2.  3.  4.]
+ [ 5.  6.  7.  8.]
+ [ 9.  9.  9.  9.]]
+
+
+
+We can target specific elements too. Let's replace the number `3` in the first 
row with the number `9` using `x[0,2] = 9.0`.
+
+
+```python
+print('original x, x=', x)
+x[0,2] = 9.0
+print('replaced specific element with x[0,2] = 9.0, x=', x)
+```
+
+original x, x=
+[[ 1.  2.  3.  4.]
+ [ 5.  6.  7.  8.]
+ [ 9.  9.  9.  9.]]
+
+replaced specific element with x[0,2] = 9.0, x=
+[[ 1.  2.  9.  4.]
+ [ 5.  6.  7.  8.]
+ [ 9.  9.  9.  9.]]
+
+
+
+Now let's go a step further and select a couple of targets at the same time. 
We'll replace the `6` and the `7` with `x[1:2,1:3] = 5.0`.
 
 Review comment:
   Add a space.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] aaronmarkham commented on issue #8762: Gradient compression faq

2017-11-21 Thread GitBox
aaronmarkham commented on issue #8762: Gradient compression faq
URL: https://github.com/apache/incubator-mxnet/pull/8762#issuecomment-346237507
 
 
   Closing in favor of #8766


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] aaronmarkham closed pull request #8762: Gradient compression faq

2017-11-21 Thread GitBox
aaronmarkham closed pull request #8762: Gradient compression faq
URL: https://github.com/apache/incubator-mxnet/pull/8762
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/docs/faq/gradient_compression.md b/docs/faq/gradient_compression.md
new file mode 100644
index 00..e754f84017
--- /dev/null
+++ b/docs/faq/gradient_compression.md
@@ -0,0 +1,98 @@
+# Gradient Compression
+
+Gradient Compression reduces communication bandwidth to make distributed 
training with GPUs more scalable and efficient without significant loss in 
convergence rate or accuracy.
+
+
+## Benefits
+
+**Increased Speed**
+
+For tasks like acoustic modeling in speech recognition (like in Alexa), the 
gradient compression capability is observed to speed up training by about 2 
times, depending on the size of the model and the network bandwidth of the 
instance. Bigger models see larger speedup with gradient compression.
+
+**Minimal Accuracy Loss**
+
+Gradient compression uses the approach of delaying the synchronization of 
weight updates which are small. Although small weight updates might not be sent 
for that batch, this information is not discarded. Once the weight updates for 
this location accumulate to become a larger value, they will be propagated. 
Since there is no information loss, but only delayed updates, it does not lead 
to a significant loss in accuracy or convergence rate. In distributed training 
experiments[1], a loss of accuracy as low as 1% has been observed for this 
technique.
+
+
+## When to Use Gradient Compression
+
+When training models whose architectures include large fully connected 
components, it can be helpful to use gradient compression. For larger models, 
the communication cost becomes a major factor. Such models stand to benefit 
greatly with gradient compression.
+
+
+### GPU versus CPU
+
+The greatest benefits from gradient compression are realized when using GPUs 
for both single-node multi-GPU and multi-node (single or multi-GPU) distributed 
training. Training on CPU would provide a lower compute density per compute 
node as compared to the massive compute density per compute node on a GPU. Due 
to this, the required communication bandwidth for CPU-based nodes during 
training is not as high as for GPU-based nodes. Hence, the benefits of gradient 
compression are lower for CPU-based nodes as compared to GPU-based nodes.
+
+
+### Network Latency
+
+Benefits of gradient compression can be found when using distributed training 
with network-connected nodes. Depending on the network latency between nodes 
and the size of the model, communication can become slow enough that 
gradient compression may provide a noticeable speed improvement.
+
+You may not want to use gradient compression if you have low latency network 
communication.
+
+
+### Model Size
+
+Distributed training involves synchronization of weights after each batch. 
Larger models have much higher communication costs during training, hence such 
models stand to benefit much more from gradient compression.
+When running distributed training with gradient compression, the quantize and 
dequantize operations happen on CPU parallelized with OpenMP. For smaller 
models, when training on GPUs, it helps to set `OMP_NUM_THREADS=1` on each 
node, so that the overhead of launching OMP threads doesn't cause the 
compression and decompression to be slow.
+
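+For instance, a minimal sketch of one way to do this from inside the training
+script, assuming it runs before MXNet (and its OpenMP runtime) is loaded; the
+variable can equally be exported in the shell on each node:
+
+```python
+import os
+# keep OpenMP from spawning a large thread pool for the small
+# quantize/dequantize kernels used by gradient compression
+os.environ['OMP_NUM_THREADS'] = '1'
+import mxnet as mx
+```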
+### Model Architecture
+
+The communication bandwidth requirements during training vary across various 
neural network architectures and hence the benefits of gradient compression 
vary accordingly.
+
+In networks which have significant fully connected components, since such 
layers have low compute cost on GPUs, communication becomes a bottleneck 
limiting the speed of distributed training. Gradient compression can help 
reduce the communication cost, and thus speed up training in such cases. We 
have observed speedup of about 2x on large fully connected neural networks. 
Models like AlexNet and VGG have large fully connected components as part of 
the network, hence stand to benefit from gradient compression. Long Short-Term 
Memory architectures require more communication bandwidth, so they also exhibit 
speed improvements with gradient compression.
+
+Architectures like Convolutional Neural Networks on the other hand have a 
higher compute cost, in which case some communication can be parallelized with 
compute. Since communication is not the bottleneck in such networks, gradient 
compression doesn't help much.
+
+
+### Single Node Gradient Compression
+
+When the training is configured to use device to device communication on a 
single node with multiple GPUs, gradient compression can be used to reduce the 
cost of communication. This can provide about 20% 

[GitHub] rahul003 closed pull request #8764: Gradient compression example and raise exception if kvstore type unsupported

2017-11-21 Thread GitBox
rahul003 closed pull request #8764: Gradient compression example and raise 
exception if kvstore type unsupported
URL: https://github.com/apache/incubator-mxnet/pull/8764
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/example/gluon/word_language_model/train.py 
b/example/gluon/word_language_model/train.py
index 0b504998be..b419277dcf 100644
--- a/example/gluon/word_language_model/train.py
+++ b/example/gluon/word_language_model/train.py
@@ -54,6 +54,11 @@
 help='report interval')
 parser.add_argument('--save', type=str, default='model.params',
 help='path to save the final model')
+parser.add_argument('--gctype', type=str, default='none',
+help='type of gradient compression to use, \
+  takes `2bit` or `none` for now.')
+parser.add_argument('--gcthreshold', type=float, default=0.5,
+help='threshold for 2bit gradient compression')
 args = parser.parse_args()
 
 
@@ -90,10 +95,13 @@ def batchify(data, batch_size):
 model = model.RNNModel(args.model, ntokens, args.emsize, args.nhid,
args.nlayers, args.dropout, args.tied)
 model.collect_params().initialize(mx.init.Xavier(), ctx=context)
+
+compression_params = None if args.gctype == 'none' else {'type': args.gctype, 
'threshold': args.gcthreshold}
 trainer = gluon.Trainer(model.collect_params(), 'sgd',
 {'learning_rate': args.lr,
  'momentum': 0,
- 'wd': 0})
+ 'wd': 0},
+compression_params=compression_params)
 loss = gluon.loss.SoftmaxCrossEntropyLoss()
 
 ###
diff --git a/python/mxnet/kvstore.py b/python/mxnet/kvstore.py
index d068d06579..5d6fe2de0a 100644
--- a/python/mxnet/kvstore.py
+++ b/python/mxnet/kvstore.py
@@ -408,6 +408,8 @@ def set_gradient_compression(self, compression_params):
 Other keys in this dictionary are optional and specific to the type
 of gradient compression.
 """
+if (self.type() == 'device') or ('dist' in self.type()):
+raise Exception('Gradient compression is not supported for this 
type of kvstore')
 ckeys, cvals = _ctype_dict(compression_params)
 check_call(_LIB.MXKVStoreSetGradientCompression(self.handle,
 
mx_uint(len(compression_params)),


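For reference, the user-facing pattern that this diff enables looks roughly like
the sketch below; the network, learning rate, and threshold values are
illustrative placeholders rather than part of the change itself.

```python
import mxnet as mx
from mxnet import gluon

net = gluon.nn.Dense(10)  # any Gluon block; a single layer keeps the sketch short
net.collect_params().initialize(mx.init.Xavier())

# 2bit compression: gradient values below the threshold are accumulated
# locally and only synchronized once they grow past it
compression_params = {'type': '2bit', 'threshold': 0.5}
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.1},
                        compression_params=compression_params)
```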
 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] cjolivier01 commented on issue #6773: Deadlock and crashes during shutdown

2017-11-21 Thread GitBox
cjolivier01 commented on issue #6773: Deadlock and crashes during shutdown
URL: https://github.com/apache/incubator-mxnet/pull/6773#issuecomment-346235525
 
 
   @lialie , can you make a new issue for this, please?  It's different than 
the ones fixed here.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] cjolivier01 closed pull request #8755: Remove spureous std::move, fix warning regarding RVO being prevented

2017-11-21 Thread GitBox
cjolivier01 closed pull request #8755: Remove spureous std::move, fix warning 
regarding RVO being prevented
URL: https://github.com/apache/incubator-mxnet/pull/8755
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/src/operator/operator_tune.h b/src/operator/operator_tune.h
index 4f92c9d3cb..2088d4603d 100644
--- a/src/operator/operator_tune.h
+++ b/src/operator/operator_tune.h
@@ -56,7 +56,7 @@ class OperatorTuneBase {
* \return Tick object representing the current itmestamp
*/
   static MSHADOW_CINLINE Tick Now() {
-return std::move(std::chrono::high_resolution_clock::now());
+return std::chrono::high_resolution_clock::now();
   }
 
   /*!


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-mxnet] branch master updated: Remove spureous std::move, fix warning regarding RVO being prevented (#8755)

2017-11-21 Thread cjolivier01
This is an automated email from the ASF dual-hosted git repository.

cjolivier01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git


The following commit(s) were added to refs/heads/master by this push:
 new abf01b4  Remove spureous std::move, fix warning regarding RVO being 
prevented (#8755)
abf01b4 is described below

commit abf01b4bf23f398b74b42bafb135bea7efe06e37
Author: Pedro Larroy <928489+lar...@users.noreply.github.com>
AuthorDate: Tue Nov 21 19:47:58 2017 -0800

Remove spureous std::move, fix warning regarding RVO being prevented (#8755)
---
 src/operator/operator_tune.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/operator/operator_tune.h b/src/operator/operator_tune.h
index 4f92c9d..2088d46 100644
--- a/src/operator/operator_tune.h
+++ b/src/operator/operator_tune.h
@@ -56,7 +56,7 @@ class OperatorTuneBase {
* \return Tick object representing the current itmestamp
*/
   static MSHADOW_CINLINE Tick Now() {
-return std::move(std::chrono::high_resolution_clock::now());
+return std::chrono::high_resolution_clock::now();
   }
 
   /*!

-- 
To stop receiving notification emails like this one, please contact
['"comm...@mxnet.apache.org" '].


[GitHub] rahul003 commented on a change in pull request #8762: Gradient compression faq

2017-11-21 Thread GitBox
rahul003 commented on a change in pull request #8762: Gradient compression faq
URL: https://github.com/apache/incubator-mxnet/pull/8762#discussion_r152460220
 
 

 ##
 File path: docs/faq/gradient_compression.md
 ##
 @@ -0,0 +1,98 @@
+# Gradient Compression
+
+Gradient Compression reduces communication bandwidth to make distributed 
training with GPUs more scalable and efficient without significant loss in 
convergence rate or accuracy.
+
+
+## Benefits
+
+**Increased Speed**
+
+For tasks like acoustic modeling in speech recognition (like in Alexa), the 
gradient compression capability is observed to speed up training by about 2 
times, depending on the size of the model and the network bandwidth of the 
instance. Bigger models see larger speedup with gradient compression.
+
+**Minimal Accuracy Loss**
+
+Gradient compression uses the approach of delaying the synchronization of 
weight updates which are small. Although small weight updates might not be sent 
for that batch, this information is not discarded. Once the weight updates for 
this location accumulate to become a larger value, they will be propagated. 
Since there is no information loss, but only delayed updates, it does not lead 
to a significant loss in accuracy or convergence rate. In distributed training 
experiments[1], a loss of accuracy as low as 1% has been observed for this 
technique.
+
+
+## When to Use Gradient Compression
+
+When training models whose architectures include large fully connected 
components, it can be helpful to use gradient compression. For larger models, 
the communication cost becomes a major factor. Such models stand to benefit 
greatly with gradient compression.
+
+
+### GPU versus CPU
+
+The greatest benefits from gradient compression are realized when using GPUs 
for both single-node multi-GPU and multi-node (single or multi-GPU) distributed 
training. Training on CPU would provide a lower compute density per compute 
node as compared to the massive compute density per compute node on a GPU. Due 
to this, the required communication bandwidth for CPU-based nodes during 
training is not as high as for GPU-based nodes. Hence, the benefits of gradient 
compression are lower for CPU-based nodes as compared to GPU-based nodes.
+
+
+### Network Latency
+
+Benefits of gradient compression can be found when using distributed training 
with network-connected nodes. Depending on the network latency between nodes 
and the size of the model, communication can become slow enough that 
gradient compression may provide a noticeable speed improvement.
+
+You may not want to use gradient compression if you have low latency network 
communication.
+
+
+### Model Size
+
+Distributed training involves synchronization of weights after each batch. 
Larger models have much higher communication costs during training, hence such 
models stand to benefit much more from gradient compression.
+When running distributed training with gradient compression, the quantize and 
dequantize operations happen on CPU parallelized with OpenMP. For smaller 
models, when training on GPUs, it helps to set `OMP_NUM_THREADS=1` on each 
node, so that the overhead of launching OMP threads doesn't cause the 
compression and decompression to be slow.
+
+### Model Architecture
+
+The communication bandwidth requirements during training vary across various 
neural network architectures and hence the benefits of gradient compression 
vary accordingly.
+
+In networks which have significant fully connected components, since such 
layers have low compute cost on GPUs, communication becomes a bottleneck 
limiting the speed of distributed training. Gradient compression can help 
reduce the communication cost, and thus speed up training in such cases. We 
have observed speedup of about 2x on large fully connected neural networks. 
Models like AlexNet and VGG have large fully connected components as part of 
the network, hence stand to benefit from gradient compression. Long Short-Term 
Memory architectures require more communication bandwidth, so they also exhibit 
speed improvements with gradient compression.
+
+Architectures like Convolutional Neural Networks on the other hand have a 
higher compute cost, in which case some communication can be parallelized with 
compute. Since communication is not the bottleneck in such networks, gradient 
compression doesn't help much.
+
+
+### Single Node Gradient Compression
+
+When the training is configured to use device to device communication on a 
single node with multiple GPUs, gradient compression can be used to reduce the 
cost of communication. This can provide about 20% speedup for large models using 
older generation architectures. However, speed benefits may be negligible on a 
machine with a newer generation architecture where GPUs can communicate at low 
latency.
+
+
+## Deep Neural Networks and Sparse Data
+
+It is well-known that typically the weights of a fully connected DNN (Deep 
Neural Network) are sparsely 

[GitHub] rahul003 commented on issue #8762: Gradient compression faq

2017-11-21 Thread GitBox
rahul003 commented on issue #8762: Gradient compression faq
URL: https://github.com/apache/incubator-mxnet/pull/8762#issuecomment-346233987
 
 
   Please add this to docs/faq/multi_devices.md after the section on 
Synchronize Directory
   
   
   
   
   ### Gradient compression
   
   If your model has fully connected components or recurrent neural networks, 
you may achieve increased training speed using gradient compression with 
potentially slight loss of accuracy. 
   Please see [Gradient 
Compression](https://mxnet.incubator.apache.org/versions/master/faq/gradient_compression.html)
   for more details on how to use it, how it works and when it can be helpful. 
For the above example, gradient compression can be enabled by running

   ```bash
   python ../../tools/launch.py -n 2 --launcher ssh -H hosts python 
train_mnist.py --network lenet \
   --kv-store dist_sync --gc-type 2bit
   ```
   
   Here, `gc-type` has been set to `2bit`, to enable 2bit gradient compression.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] rahul003 commented on a change in pull request #8762: Gradient compression faq

2017-11-21 Thread GitBox
rahul003 commented on a change in pull request #8762: Gradient compression faq
URL: https://github.com/apache/incubator-mxnet/pull/8762#discussion_r152460220
 
 

 ##
 File path: docs/faq/gradient_compression.md
 ##
 @@ -0,0 +1,98 @@
+# Gradient Compression
+
+Gradient Compression reduces communication bandwidth to make distributed 
training with GPUs more scalable and efficient without significant loss in 
convergence rate or accuracy.
+
+
+## Benefits
+
+**Increased Speed**
+
+For tasks like acoustic modeling in speech recognition (like in Alexa), the 
gradient compression capability is observed to speed up training by about 2 
times, depending on the size of the model and the network bandwidth of the 
instance. Bigger models see larger speedup with gradient compression.
+
+**Minimal Accuracy Loss**
+
+Gradient compression uses the approach of delaying the synchronization of 
weight updates which are small. Although small weight updates might not be sent 
for that batch, this information is not discarded. Once the weight updates for 
this location accumulate to become a larger value, they will be propagated. 
Since there is no information loss, but only delayed updates, it does not lead 
to a significant loss in accuracy or convergence rate. In distributed training 
experiments[1], a loss of accuracy as low as 1% has been observed for this 
technique.
+
+
+## When to Use Gradient Compression
+
+When training models whose architectures include large fully connected 
components, it can be helpful to use gradient compression. For larger models, 
the communication cost becomes a major factor. Such models stand to benefit 
greatly with gradient compression.
+
+
+### GPU versus CPU
+
+The greatest benefits from gradient compression are realized when using GPUs 
for both single-node multi-GPU and multi-node (single or multi-GPU) distributed 
training. Training on CPU would provide a lower compute density per compute 
node as compared to the massive compute density per compute node on a GPU. Due 
to this, the required communication bandwidth for CPU-based nodes during 
training is not as high as for GPU-based nodes. Hence, the benefits of gradient 
compression are lower for CPU-based nodes as compared to GPU-based nodes.
+
+
+### Network Latency
+
+Benefits of gradient compression can be found when using distributed training 
with network-connected nodes. Depending on the network latency between nodes 
and the size of the model, communication can become slow enough that 
gradient compression may provide a noticeable speed improvement.
+
+You may not want to use gradient compression if you have low latency network 
communication.
+
+
+### Model Size
+
+Distributed training involves synchronization of weights after each batch. 
Larger models have much higher communication costs during training, hence such 
models stand to benefit much more from gradient compression.
+When running distributed training with gradient compression, the quantize and 
dequantize operations happen on CPU parallelized with OpenMP. For smaller 
models, when training on GPUs, it helps to set `OMP_NUM_THREADS=1` on each 
node, so that the overhead of launching OMP threads doesn't cause the 
compression and decompression to be slow.
+
+### Model Architecture
+
+The communication bandwidth requirements during training vary across various 
neural network architectures and hence the benefits of gradient compression 
vary accordingly.
+
+In networks which have significant fully connected components, since such 
layers have low compute cost on GPUs, communication becomes a bottleneck 
limiting the speed of distributed training. Gradient compression can help 
reduce the communication cost, and thus speed up training in such cases. We 
have observed speedup of about 2x on large fully connected neural networks. 
Models like AlexNet and VGG have large fully connected components as part of 
the network, hence stand to benefit from gradient compression. Long Short-Term 
Memory architectures require more communication bandwidth, so they also exhibit 
speed improvements with gradient compression.
+
+Architectures like Convolutional Neural Networks on the other hand have a 
higher compute cost, in which case some communication can be parallelized with 
compute. Since communication is not the bottleneck in such networks, gradient 
compression doesn't help much.
+
+
+### Single Node Gradient Compression
+
+When the training is configured to use device to device communication on a 
single node with multiple GPUs, gradient compression can be used to reduce the 
cost of communication. This can provide about 20% speedup for large models using 
older generation architectures. However, speed benefits may be negligible on a 
machine with a newer generation architecture where GPUs can communicate at low 
latency.
+
+
+## Deep Neural Networks and Sparse Data
+
+It is well-known that typically the weights of a fully connected DNN (Deep 
Neural Network) are sparsely 

[GitHub] rahul003 commented on a change in pull request #8762: Gradient compression faq

2017-11-21 Thread GitBox
rahul003 commented on a change in pull request #8762: Gradient compression faq
URL: https://github.com/apache/incubator-mxnet/pull/8762#discussion_r152460220
 
 

 ##
 File path: docs/faq/gradient_compression.md
 ##
 @@ -0,0 +1,98 @@
+# Gradient Compression
+
+Gradient Compression reduces communication bandwidth to make distributed 
training with GPUs more scalable and efficient without significant loss in 
convergence rate or accuracy.
+
+
+## Benefits
+
+**Increased Speed**
+
+For tasks like acoustic modeling in speech recognition (like in Alexa), the 
gradient compression capability is observed to speed up training by about 2 
times, depending on the size of the model and the network bandwidth of the 
instance. Bigger models see larger speedup with gradient compression.
+
+**Minimal Accuracy Loss**
+
+Gradient compression uses the approach of delaying the synchronization of 
weight updates which are small. Although small weight updates might not be sent 
for that batch, this information is not discarded. Once the weight updates for 
this location accumulate to become a larger value, they will be propagated. 
Since there is no information loss, but only delayed updates, it does not lead 
to a significant loss in accuracy or convergence rate. In distributed training 
experiments[1], a loss of accuracy as low as 1% has been observed for this 
technique.
+
+
+## When to Use Gradient Compression
+
+When training models whose architectures include large fully connected 
components, it can be helpful to use gradient compression. For larger models, 
the communication cost becomes a major factor. Such models stand to benefit 
greatly with gradient compression.
+
+
+### GPU versus CPU
+
+The greatest benefits from gradient compression are realized when using GPUs 
for both single-node multi-GPU and multi-node (single or multi-GPU) distributed 
training. Training on CPU would provide a lower compute density per compute 
node as compared to the massive compute density per compute node on a GPU. Due 
to this, the required communication bandwidth for CPU-based nodes during 
training is not as high as for GPU-based nodes. Hence, the benefits of gradient 
compression are lower for CPU-based nodes as compared to GPU-based nodes.
+
+
+### Network Latency
+
+Benefits of gradient compression can be found when using distributed training 
with network-connected nodes. Depending on the network latency between nodes 
and the size of the model, communication can become slow enough that 
gradient compression may provide a noticeable speed improvement.
+
+You may not want to use gradient compression if you have low latency network 
communication.
+
+
+### Model Size
+
+Distributed training involves synchronization of weights after each batch. 
Larger models have much higher communication costs during training, hence such 
models stand to benefit much more from gradient compression.
+When running distributed training with gradient compression, the quantize and 
dequantize operations happen on CPU parallelized with OpenMP. For smaller 
models, when training on GPUs, it helps to set `OMP_NUM_THREADS=1` on each 
node, so that the overhead of launching OMP threads doesn't cause the 
compression and decompression to be slow.
+
+### Model Architecture
+
+The communication bandwidth requirements during training vary across various 
neural network architectures and hence the benefits of gradient compression 
vary accordingly.
+
+In networks which have significant fully connected components, since such 
layers have low compute cost on GPUs, communication becomes a bottleneck 
limiting the speed of distributed training. Gradient compression can help 
reduce the communication cost, and thus speed up training in such cases. We 
have observed speedup of about 2x on large fully connected neural networks. 
Models like AlexNet and VGG have large fully connected components as part of 
the network, hence stand to benefit from gradient compression. Long Short-Term 
Memory architectures require more communication bandwidth, so they also exhibit 
speed improvements with gradient compression.
+
+Architectures like Convolutional Neural Networks on the other hand have a 
higher compute cost, in which case some communication can be parallelized with 
compute. Since communication is not the bottleneck in such networks, gradient 
compression doesn't help much.
+
+
+### Single Node Gradient Compression
+
+When the training is configured to use device to device communication on a 
single node with multiple GPUs, gradient compression can be used to reduce the 
cost of communication. This can provide about 20% speedup for large models using 
older generation architectures. However, speed benefits may be negligible on a 
machine with a newer generation architecture where GPUs can communicate at low 
latency.
+
+
+## Deep Neural Networks and Sparse Data
+
+It is well-known that typically the weights of a fully connected DNN (Deep 
Neural Network) are sparsely 

[GitHub] rahul003 commented on a change in pull request #8762: Gradient compression faq

2017-11-21 Thread GitBox
rahul003 commented on a change in pull request #8762: Gradient compression faq
URL: https://github.com/apache/incubator-mxnet/pull/8762#discussion_r152460220
 
 

 ##
 File path: docs/faq/gradient_compression.md
 ##
 @@ -0,0 +1,98 @@
+# Gradient Compression
+
+Gradient Compression reduces communication bandwidth to make distributed 
training with GPUs more scalable and efficient without significant loss in 
convergence rate or accuracy.
+
+
+## Benefits
+
+**Increased Speed**
+
+For tasks like acoustic modeling in speech recognition (like in Alexa), the 
gradient compression capability is observed to speed up training by about 2 
times, depending on the size of the model and the network bandwidth of the 
instance. Bigger models see larger speedup with gradient compression.
+
+**Minimal Accuracy Loss**
+
+Gradient compression uses the approach of delaying the synchronization of 
weight updates which are small. Although small weight updates might not be sent 
for that batch, this information is not discarded. Once the weight updates for 
this location accumulate to become a larger value, they will be propagated. 
Since there is no information loss, but only delayed updates, it does not lead 
to a significant loss in accuracy or convergence rate. In distributed training 
experiments[1], a loss of accuracy as low as 1% has been observed for this 
technique.
+
+
+## When to Use Gradient Compression
+
+When training models whose architectures include large fully connected 
components, it can be helpful to use gradient compression. For larger models, 
the communication cost becomes a major factor. Such models stand to benefit 
greatly with gradient compression.
+
+
+### GPU versus CPU
+
+The greatest benefits from gradient compression are realized when using GPUs 
for both single-node multi-GPU and multi-node (single or multi-GPU) distributed 
training. Training on CPU would provide a lower compute density per compute 
node as compared to the massive compute density per compute node on a GPU. Due 
to this, the required communication bandwidth for CPU-based nodes during 
training is not as high as for GPU-based nodes. Hence, the benefits of gradient 
compression are lower for CPU-based nodes as compared to GPU-based nodes.
+
+
+### Network Latency
+
+Benefits of gradient compression can be found when using distributed training 
with network-connected nodes. Depending on the network latency between nodes 
and the size of the model, communication can become slow enough that 
gradient compression may provide a noticeable speed improvement.
+
+You may not want to use gradient compression if you have low latency network 
communication.
+
+
+### Model Size
+
+Distributed training involves synchronization of weights after each batch. 
Larger models have much higher communication costs during training, hence such 
models stand to benefit much more from gradient compression.
+When running distributed training with gradient compression, the quantize and 
dequantize operations happen on CPU parallelized with OpenMP. For smaller 
models, when training on GPUs, it helps to set `OMP_NUM_THREADS=1` on each 
node, so that the overhead of launching OMP threads doesn't cause the 
compression and decompression to be slow.
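
A sketch of applying this setting from within each worker process (assuming the
variable is set before MXNet, and hence OpenMP, initializes):

```python
# Hedged sketch: pin OpenMP to one thread per worker process so that, for small
# models, quantize/dequantize on the CPU is not dominated by thread-launch
# overhead. Equivalent to `export OMP_NUM_THREADS=1` in the shell.
import os
os.environ['OMP_NUM_THREADS'] = '1'  # must be set before OpenMP is initialized

import mxnet as mx  # import only after setting the environment variable
```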
+
+### Model Architecture
+
+The communication bandwidth required during training varies across neural
network architectures, and hence the benefits of gradient compression vary
accordingly.
+
+In networks with significant fully connected components, those layers have a
low compute cost on GPUs, so communication becomes the bottleneck that limits
the speed of distributed training. Gradient compression helps reduce the
communication cost and thus speeds up training in such cases. We have observed a
speedup of about 2x on large fully connected neural networks. Models like
AlexNet and VGG have large fully connected components, so they stand to benefit
from gradient compression. Long Short-Term Memory architectures also require
substantial communication bandwidth, so they too exhibit speed improvements with
gradient compression.
+
+Architectures like Convolutional Neural Networks, on the other hand, have a
higher compute cost, so some communication can be overlapped with computation.
Since communication is not the bottleneck in such networks, gradient compression
doesn't help much.
+
+
+### Single Node Gradient Compression
+
+When training is configured to use device-to-device communication on a single
node with multiple GPUs, gradient compression can be used to reduce the cost of
communication. This can provide about a 20% speedup for large models on older
generation architectures. However, speed benefits may be negligible on a machine
with a newer generation architecture where GPUs can communicate at low latency.
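
A hedged sketch of this single-node configuration using a gluon Trainer follows;
the network, contexts, and threshold are placeholders, not recommendations.

```python
# Hedged sketch: single-node multi-GPU training with gradient compression via a
# gluon Trainer on the 'device' kvstore. Assumes two GPUs are available.
import mxnet as mx
from mxnet import gluon

ctx = [mx.gpu(0), mx.gpu(1)]
net = gluon.nn.Dense(10)
net.initialize(mx.init.Xavier(), ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1},
                        kvstore='device',
                        compression_params={'type': '2bit', 'threshold': 0.5})
```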
+
+
+## Deep Neural Networks and Sparse Data
+
+It is well known that the weights of a fully connected DNN (Deep Neural
Network) are typically sparsely 

[GitHub] lx75249 commented on issue #8655: How do I download the data file for the examples in incubator-mxnet/cpp-package/example/feature_extract/

2017-11-21 Thread GitBox
lx75249 commented on issue #8655: How do I download the data file for the 
examples in incubator-mxnet/cpp-package/example/feature_extract/
URL: 
https://github.com/apache/incubator-mxnet/issues/8655#issuecomment-346231516
 
 
   see #8746 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] astonzhang commented on a change in pull request #8763: [WIP] Add text apis

2017-11-21 Thread GitBox
astonzhang commented on a change in pull request #8763: [WIP] Add text apis
URL: https://github.com/apache/incubator-mxnet/pull/8763#discussion_r152460202
 
 

 ##
 File path: python/mxnet/text/__init__.py
 ##
 @@ -0,0 +1,23 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# coding: utf-8
+# pylint: disable=wildcard-import
+"""Text utilities."""
+
+from . import text
+from .text import *
 
 Review comment:
   Completely agree. Maybe images.images needs to change to images.utils


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iblis17 commented on issue #8727: jenkins: julia build script

2017-11-21 Thread GitBox
iblis17 commented on issue #8727: jenkins: julia build script
URL: https://github.com/apache/incubator-mxnet/pull/8727#issuecomment-346230508
 
 
   hmm, @piiswrong could you give more detail about why this was closed?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] szha commented on a change in pull request #8763: [WIP] Add text apis

2017-11-21 Thread GitBox
szha commented on a change in pull request #8763: [WIP] Add text apis
URL: https://github.com/apache/incubator-mxnet/pull/8763#discussion_r152459622
 
 

 ##
 File path: python/mxnet/text/__init__.py
 ##
 @@ -0,0 +1,23 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# coding: utf-8
+# pylint: disable=wildcard-import
+"""Text utilities."""
+
+from . import text
+from .text import *
 
 Review comment:
   I think these are utility functions, so a name space like mx.text.utils 
seems to be a good fit.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] astonzhang commented on a change in pull request #8763: [WIP] Add text apis

2017-11-21 Thread GitBox
astonzhang commented on a change in pull request #8763: [WIP] Add text apis
URL: https://github.com/apache/incubator-mxnet/pull/8763#discussion_r152458958
 
 

 ##
 File path: python/mxnet/text/__init__.py
 ##
 @@ -0,0 +1,23 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# coding: utf-8
+# pylint: disable=wildcard-import
+"""Text utilities."""
+
+from . import text
+from .text import *
 
 Review comment:
   I just follow images/images. Which one is better?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] szha commented on a change in pull request #8763: [WIP] Add text apis

2017-11-21 Thread GitBox
szha commented on a change in pull request #8763: [WIP] Add text apis
URL: https://github.com/apache/incubator-mxnet/pull/8763#discussion_r152457841
 
 

 ##
 File path: python/mxnet/text/text.py
 ##
 @@ -0,0 +1,125 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Provide text utilities."""
+from __future__ import absolute_import
+from __future__ import print_function
+
+from collections import Counter
+import logging
+import numpy as np
+import os
+import re
+
+from ..base import numeric_types
+from .. import ndarray as nd
+from .. import io
+
+
+def count_tokens_from_str(tokens, token_delim=" ", seq_delim="\n",
+  to_lower=False):
 
 Review comment:
   consider adding the counter as an optional argument, with the default value 
being an empty counter. this way, the same function can be used to either 
create new counter or update existing counter, which effectively removes the 
assumption of having to store a whole corpus in memory.
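
A hedged sketch of this suggestion, with names mirroring the PR's draft API
(illustrative only, not the final implementation):

```python
# Hedged sketch: accept an existing Counter so the same helper can either
# create a new counter or update one incrementally, so a whole corpus never
# has to sit in memory at once.
import re
from collections import Counter

def count_tokens_from_str(source_str, token_delim=' ', seq_delim='\n',
                          to_lower=False, counter=None):
    # split on either delimiter and drop empty strings
    tokens = [t for t in re.split(token_delim + '|' + seq_delim, source_str) if t]
    if to_lower:
        tokens = [t.lower() for t in tokens]
    counter = Counter() if counter is None else counter
    counter.update(tokens)  # updates the passed-in counter in place as well
    return counter

counts = count_tokens_from_str("hello world\nhello mxnet")
counts = count_tokens_from_str("more text here", counter=counts)  # incremental update
```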


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] szha commented on a change in pull request #8763: [WIP] Add text apis

2017-11-21 Thread GitBox
szha commented on a change in pull request #8763: [WIP] Add text apis
URL: https://github.com/apache/incubator-mxnet/pull/8763#discussion_r152457171
 
 

 ##
 File path: python/mxnet/text/__init__.py
 ##
 @@ -0,0 +1,23 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# coding: utf-8
+# pylint: disable=wildcard-import
+"""Text utilities."""
+
+from . import text
+from .text import *
 
 Review comment:
   utils?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] eric-haibin-lin opened a new pull request #8765: Update AddVersion.py

2017-11-21 Thread GitBox
eric-haibin-lin opened a new pull request #8765: Update AddVersion.py
URL: https://github.com/apache/incubator-mxnet/pull/8765
 
 
   ## Description ##
   fix #8609 
   @kevinthesun @aaronmarkham @mwunderlich
   
   ## Checklist ##
   ### Essentials ###
   - [ ] Passed code style checking (`make lint`)
   - [ ] Changes are complete (i.e. I finished coding on this PR)
   - [ ] All changes have test coverage
   - [ ] For user-facing API changes, API doc string has been updated. For new 
C++ functions in header files, their functionalities and arguments are 
well-documented. 
   - [ ] To my best knowledge, examples are either not affected by this change, 
or have been fixed to be compatible with this change
   
   ### Changes ###
   - [ ] Feature1, tests, (and when applicable, API doc)
   - [ ] Feature2, tests, (and when applicable, API doc)
   
   ## Comments ##
   - If this change is a backward incompatible change, why must this change be 
made.
   - Interesting edge cases to note here
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] rahul003 opened a new pull request #8764: Gradient compression example and raise exception if kvstore type unsupported

2017-11-21 Thread GitBox
rahul003 opened a new pull request #8764: Gradient compression example and 
raise exception if kvstore type unsupported
URL: https://github.com/apache/incubator-mxnet/pull/8764
 
 
   ## Description ##
   Added gluon example for gradient compression and raise exception if kvstore 
type unsupported 
   
   ## Checklist ##
   ### Essentials ###
   - [ ] Passed code style checking (`make lint`)
   - [ ] Changes are complete (i.e. I finished coding on this PR)
   - [ ] All changes have test coverage
   - [ ] For user-facing API changes, API doc string has been updated. For new 
C++ functions in header files, their functionalities and arguments are 
well-documented. 
   - [ ] To my best knowledge, examples are either not affected by this change, 
or have been fixed to be compatible with this change
   
   ### Changes ###
   - [ ] Feature1, tests, (and when applicable, API doc)
   - [ ] Feature2, tests, (and when applicable, API doc)
   
   ## Comments ##
   - If this change is a backward incompatible change, why must this change be 
made.
   - Interesting edge cases to note here


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] rahul003 commented on a change in pull request #8762: Gradient compression faq

2017-11-21 Thread GitBox
rahul003 commented on a change in pull request #8762: Gradient compression faq
URL: https://github.com/apache/incubator-mxnet/pull/8762#discussion_r152454945
 
 

 ##
 File path: docs/faq/gradient_compression.md
 ##
 @@ -0,0 +1,95 @@
+# Gradient Compression
+
+Gradient Compression reduces communication bandwidth to make distributed 
training with GPUs more scalable and efficient without significant loss in 
convergence rate or accuracy.
+
+
+## Benefits
+
+**Increased Speed**
+
+For tasks like acoustic modeling in speech recognition (like in Alexa), the 
gradient compression capability is observed to speed up training by about 2 
times, depending on the size of the model and the network bandwidth of the 
instance. Bigger models see larger speedup with gradient compression.
+
+**Minimal Accuracy Loss**
+
+Gradient compression uses the approach of delaying the synchronization of 
weight updates which are small. Although small weight updates might not be sent 
for that batch, this information is not discarded. Once the weight updates for 
this location accumulate to become a larger value, they will be propagated. 
Since there is no information loss, but only delayed updates, it does not lead 
to a significant loss in accuracy or convergence rate. In distributed training 
experiments[1], a loss of accuracy as low as 1% has been observed for this 
technique.
+
+
+## When to Use Gradient Compression
+
+When training models whose architectures include large fully connected 
components, it can be helpful to use gradient compression. For larger models, 
the communication cost becomes a major factor. Such models stand to benefit 
greatly with gradient compression.
+
+
+### GPU versus CPU
+
+The greatest benefits from gradient compression are realized when using GPUs 
for both single-node multi-GPU and multi-node (single or multi-GPU) distributed 
training. Training on CPU would provide a lower compute density per compute 
node as compared to the massive compute density per compute node on a GPU. Due 
to this, the required communication bandwidth for CPU-based nodes during 
training is not as high as for GPU-based nodes. Hence, the benefits of gradient 
compression are lower for CPU-based nodes as compared to GPU-based nodes.
+
+
+### Scaling
+
+When the training is configured to use device to device communication on a 
single node with multiple GPUs, gradient compression can be used to reduce the 
cost of communication. This can provide about a 20% speedup for large models on 
older generation architectures where GPU communication goes through the CPU. 
However, speed benefits may be negligible on an 8-GPU machine with a newer 
generation architecture where GPUs can communicate without going through the 
CPU first.
+
+
+### Network Latency
+
+Benefits of gradient compression can be found when using distributed training 
with network connected nodes. Depending on the network latency between nodes 
and the model's size, these can contribute to slow performance such that 
gradient compression may provide speed improvements.
+
+You may not want to use gradient compression if you have low latency 
communication. The performance may be negligible when GPUs can communicate at 
low latency in newer architectures.
+
+
+### Model Size
+
+If the model is small, gradient compression can actually decrease speed. More 
examples of this are covered in the Benchmarking section.
 
 Review comment:
   When running distributed training with gradient compression, the quantize 
and dequantize operations happen on CPU parallelized with OpenMP. For smaller 
models, when training on GPUs, it helps to set `OMP_NUM_THREADS=1` on each 
node, so that the overhead of launching OMP threads doesn't cause the 
compression and decompression to be slow. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] rahul003 commented on a change in pull request #8762: Gradient compression faq

2017-11-21 Thread GitBox
rahul003 commented on a change in pull request #8762: Gradient compression faq
URL: https://github.com/apache/incubator-mxnet/pull/8762#discussion_r152454717
 
 

 ##
 File path: docs/faq/gradient_compression.md
 ##
 @@ -0,0 +1,95 @@
+# Gradient Compression
+
+Gradient Compression reduces communication bandwidth to make distributed 
training with GPUs more scalable and efficient without significant loss in 
convergence rate or accuracy.
+
+
+## Benefits
+
+**Increased Speed**
+
+For tasks like acoustic modeling in speech recognition (like in Alexa), the 
gradient compression capability is observed to speed up training by about 2 
times, depending on the size of the model and the network bandwidth of the 
instance. Bigger models see larger speedup with gradient compression.
+
+**Minimal Accuracy Loss**
+
+Gradient compression uses the approach of delaying the synchronization of 
weight updates which are small. Although small weight updates might not be sent 
for that batch, this information is not discarded. Once the weight updates for 
this location accumulate to become a larger value, they will be propagated. 
Since there is no information loss, but only delayed updates, it does not lead 
to a significant loss in accuracy or convergence rate. In distributed training 
experiments[1], a loss of accuracy as low as 1% has been observed for this 
technique.
+
+
+## When to Use Gradient Compression
+
+When training models whose architectures include large fully connected 
components, it can be helpful to use gradient compression. For larger models, 
the communication cost becomes a major factor. Such models stand to benefit 
greatly with gradient compression.
+
+
+### GPU versus CPU
+
+The greatest benefits from gradient compression are realized when using GPUs 
for both single-node multi-GPU and multi-node (single or multi-GPU) distributed 
training. Training on CPU would provide a lower compute density per compute 
node as compared to the massive compute density per compute node on a GPU. Due 
to this, the required communication bandwidth for CPU-based nodes during 
training is not as high as for GPU-based nodes. Hence, the benefits of gradient 
compression are lower for CPU-based nodes as compared to GPU-based nodes.
 
 Review comment:
   -> when using GPUs for distributed training. 
   
   
   Let's not put single-node multi GPU here. We can have a section at the end 
to discuss that case.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] rahul003 commented on a change in pull request #8762: Gradient compression faq

2017-11-21 Thread GitBox
rahul003 commented on a change in pull request #8762: Gradient compression faq
URL: https://github.com/apache/incubator-mxnet/pull/8762#discussion_r152454543
 
 

 ##
 File path: docs/faq/gradient_compression.md
 ##
 @@ -0,0 +1,95 @@
+# Gradient Compression
+
+Gradient Compression reduces communication bandwidth to make distributed 
training with GPUs more scalable and efficient without significant loss in 
convergence rate or accuracy.
+
+
+## Benefits
+
+**Increased Speed**
+
+For tasks like acoustic modeling in speech recognition (like in Alexa), the 
gradient compression capability is observed to speed up training by about 2 
times, depending on the size of the model and the network bandwidth of the 
instance. Bigger models see larger speedup with gradient compression.
 
 Review comment:
   Probably keep it generic here, people wouldn't know the type of models used 
in Alexa. 
   Let's say 'For architectures with fully connected components ...


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] rahul003 commented on a change in pull request #8762: Gradient compression faq

2017-11-21 Thread GitBox
rahul003 commented on a change in pull request #8762: Gradient compression faq
URL: https://github.com/apache/incubator-mxnet/pull/8762#discussion_r152453417
 
 

 ##
 File path: docs/faq/gradient_compression.md
 ##
 @@ -0,0 +1,95 @@
+# Gradient Compression
+
+Gradient Compression reduces communication bandwidth to make distributed 
training with GPUs more scalable and efficient without significant loss in 
convergence rate or accuracy.
+
+
+## Benefits
+
+**Increased Speed**
+
+For tasks like acoustic modeling in speech recognition (as in Alexa), gradient 
compression has been observed to speed up training by about 2 times, depending 
on the size of the model and the network bandwidth of the instance. Bigger 
models see a larger speedup with gradient compression.
+
+**Minimal Accuracy Loss**
+
+Gradient compression delays the synchronization of weight updates that are 
small. Although small weight updates might not be sent for that batch, this 
information is not discarded. Once the weight updates for this location 
accumulate to become a larger value, they will be propagated. Since there is no 
information loss, only delayed updates, it does not lead to a significant loss 
in accuracy or convergence rate. In distributed training experiments [1], a 
loss in accuracy as low as 1% has been observed with this technique.
+
+
+## When to Use Gradient Compression
+
+When training models whose architectures include large fully connected 
components, it can be helpful to use gradient compression. For larger models, 
the communication cost becomes a major factor. Such models stand to benefit 
greatly from gradient compression.
+
+
+### GPU versus CPU
+
+The greatest benefits from gradient compression are realized when using GPUs 
for both single-node multi-GPU and multi-node (single or multi-GPU) distributed 
training. Training on CPUs provides a much lower compute density per node than 
training on GPUs, so CPU-based nodes do not require as much communication 
bandwidth during training. Hence, the benefits of gradient compression are 
lower for CPU-based nodes than for GPU-based nodes.
+
+
+### Scaling
 
 Review comment:
   I think this should go at the end, in a section like 'Can gradient 
compression help training on a single instance too?', because the rest of the 
page would be about distributed training. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] rahul003 commented on a change in pull request #8762: Gradient compression faq

2017-11-21 Thread GitBox
rahul003 commented on a change in pull request #8762: Gradient compression faq
URL: https://github.com/apache/incubator-mxnet/pull/8762#discussion_r152453417
 
 

 ##
 File path: docs/faq/gradient_compression.md
 ##
 @@ -0,0 +1,95 @@
+# Gradient Compression
+
+Gradient Compression reduces communication bandwidth to make distributed 
training with GPUs more scalable and efficient without significant loss in 
convergence rate or accuracy.
+
+
+## Benefits
+
+**Increased Speed**
+
+For tasks like acoustic modeling in speech recognition (as in Alexa), gradient 
compression has been observed to speed up training by about 2 times, depending 
on the size of the model and the network bandwidth of the instance. Bigger 
models see a larger speedup with gradient compression.
+
+**Minimal Accuracy Loss**
+
+Gradient compression delays the synchronization of weight updates that are 
small. Although small weight updates might not be sent for that batch, this 
information is not discarded. Once the weight updates for this location 
accumulate to become a larger value, they will be propagated. Since there is no 
information loss, only delayed updates, it does not lead to a significant loss 
in accuracy or convergence rate. In distributed training experiments [1], a 
loss in accuracy as low as 1% has been observed with this technique.
+
+
+## When to Use Gradient Compression
+
+When training models whose architectures include large fully connected 
components, it can be helpful to use gradient compression. For larger models, 
the communication cost becomes a major factor. Such models stand to benefit 
greatly from gradient compression.
+
+
+### GPU versus CPU
+
+The greatest benefits from gradient compression are realized when using GPUs 
for both single-node multi-GPU and multi-node (single or multi-GPU) distributed 
training. Training on CPUs provides a much lower compute density per node than 
training on GPUs, so CPU-based nodes do not require as much communication 
bandwidth during training. Hence, the benefits of gradient compression are 
lower for CPU-based nodes than for GPU-based nodes.
+
+
+### Scaling
 
 Review comment:
   I think this should go at the end, in a section like 'Can gradient 
compression help training on a single instance too?', because the rest of the 
page would be about distributed training. What do you think?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] astonzhang opened a new pull request #8763: [WIP] Add text apis

2017-11-21 Thread GitBox
astonzhang opened a new pull request #8763: [WIP] Add text apis
URL: https://github.com/apache/incubator-mxnet/pull/8763
 
 
   ## Description ##
   Add text APIs, such as text utils, glossary class, and pre-trained 
embeddings.
   
   ## Checklist ##
   ### Essentials ###
   - [x] Passed code style checking (`make lint`)
   - [ ] Changes are complete (i.e. I finished coding on this PR)
   - [x] All changes have test coverage
   - [x] For user-facing API changes, API doc string has been updated. For new 
C++ functions in header files, their functionalities and arguments are 
well-documented. 
   - [x] To my best knowledge, examples are either not affected by this change, 
or have been fixed to be compatible with this change
   
   ### Changes ###
   - [ ] Text utils, tests, (and when applicable, API doc)
   - [ ] Glossary, tests, (and when applicable, API doc)
   
   ## Comments ##
   - WIP
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] aaronmarkham opened a new pull request #8762: Gradient compression faq

2017-11-21 Thread GitBox
aaronmarkham opened a new pull request #8762: Gradient compression faq
URL: https://github.com/apache/incubator-mxnet/pull/8762
 
 
   ## Description ##
   Summarizes the gradient compression feature.
   
   ## Checklist ##
   ### Essentials ###
   - [x] Changes are complete (i.e. I finished coding on this PR)
   - [x] To my best knowledge, examples are either not affected by this change, 
or have been fixed to be compatible with this change
   
   ### Changes ###
   
   
   ## Comments ##
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] javelinjs commented on issue #8297: [scala] Make accuracy idependant of output size (fix #8226)

2017-11-21 Thread GitBox
javelinjs commented on issue #8297: [scala] Make accuracy idependant of output 
size (fix #8226)
URL: https://github.com/apache/incubator-mxnet/pull/8297#issuecomment-346217712
 
 
   Thanks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] javelinjs closed pull request #8297: [scala] Make accuracy idependant of output size (fix #8226)

2017-11-21 Thread GitBox
javelinjs closed pull request #8297: [scala] Make accuracy idependant of output 
size (fix #8226)
URL: https://github.com/apache/incubator-mxnet/pull/8297
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/scala-package/core/src/main/scala/ml/dmlc/mxnet/EvalMetric.scala 
b/scala-package/core/src/main/scala/ml/dmlc/mxnet/EvalMetric.scala
index 6b993d7665..98a09d2250 100644
--- a/scala-package/core/src/main/scala/ml/dmlc/mxnet/EvalMetric.scala
+++ b/scala-package/core/src/main/scala/ml/dmlc/mxnet/EvalMetric.scala
@@ -26,7 +26,7 @@ import scala.collection.mutable.ArrayBuffer
 abstract class EvalMetric(protected val name: String) {
 
   protected var numInst: Int = 0
-  protected var sumMetric: Float = 0.0f
+  protected var sumMetric: Double = 0.0d
 
   /**
* Update the internal evaluation.
@@ -41,7 +41,7 @@ abstract class EvalMetric(protected val name: String) {
*/
   def reset(): Unit = {
 this.numInst = 0
-this.sumMetric = 0.0f
+this.sumMetric = 0.0d
   }
 
   /**
@@ -50,7 +50,7 @@ abstract class EvalMetric(protected val name: String) {
* value, Value of the evaluation
*/
   def get: (Array[String], Array[Float]) = {
-(Array(this.name), Array(this.sumMetric / this.numInst))
+(Array(this.name), Array((this.sumMetric / this.numInst).toFloat))
   }
 }
 
@@ -111,11 +111,10 @@ class Accuracy extends EvalMetric("accuracy") {
   require(label.shape == predLabel.shape,
 s"label ${label.shape} and prediction ${predLabel.shape}" +
 s"should have the same length.")
-  for ((labelElem, predElem) <- label.toArray zip predLabel.toArray) {
-if (labelElem == predElem) {
-  this.sumMetric += 1
-}
-  }
+
+  this.sumMetric += label.toArray.zip(predLabel.toArray)
+.filter{ case (labelElem: Float, predElem: Float) => labelElem == 
predElem }
+.size
   this.numInst += predLabel.shape(0)
   predLabel.dispose()
 }


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-mxnet] branch master updated: [scala] EvalMetric sumMetric is now a Double instead of a Float (#8297)

2017-11-21 Thread liuyizhi
This is an automated email from the ASF dual-hosted git repository.

liuyizhi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git


The following commit(s) were added to refs/heads/master by this push:
 new 8df20a2  [scala] EvalMetric sumMetric is now a Double instead of a 
Float (#8297)
8df20a2 is described below

commit 8df20a2bd074c4ab55a9b61e0ec04da48bec6426
Author: Benoît Quartier 
AuthorDate: Wed Nov 22 02:44:15 2017 +0100

[scala] EvalMetric sumMetric is now a Double instead of a Float (#8297)

When the difference in magnitude between the accumulated accuracy sum and 1
becomes too big, the accuracy is no longer updated due to the low precision
of float numbers.
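The float32 stall described in the commit message is easy to reproduce in 
Python; the magnitudes below are chosen only for illustration.

```python
# Reproducing the float32 precision stall that motivated the Double accumulator:
# once the running sum is large enough, adding 1 no longer changes it.
import numpy as np

total32 = np.float32(1.0e8)                  # large running count of correct predictions
print(total32 + np.float32(1.0) == total32)  # True: the 32-bit accumulator stops advancing
total64 = np.float64(1.0e8)
print(total64 + 1.0 == total64)              # False: a 64-bit accumulator keeps counting
```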
---
 .../core/src/main/scala/ml/dmlc/mxnet/EvalMetric.scala| 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/scala-package/core/src/main/scala/ml/dmlc/mxnet/EvalMetric.scala 
b/scala-package/core/src/main/scala/ml/dmlc/mxnet/EvalMetric.scala
index 6b993d7..98a09d2 100644
--- a/scala-package/core/src/main/scala/ml/dmlc/mxnet/EvalMetric.scala
+++ b/scala-package/core/src/main/scala/ml/dmlc/mxnet/EvalMetric.scala
@@ -26,7 +26,7 @@ import scala.collection.mutable.ArrayBuffer
 abstract class EvalMetric(protected val name: String) {
 
   protected var numInst: Int = 0
-  protected var sumMetric: Float = 0.0f
+  protected var sumMetric: Double = 0.0d
 
   /**
* Update the internal evaluation.
@@ -41,7 +41,7 @@ abstract class EvalMetric(protected val name: String) {
*/
   def reset(): Unit = {
 this.numInst = 0
-this.sumMetric = 0.0f
+this.sumMetric = 0.0d
   }
 
   /**
@@ -50,7 +50,7 @@ abstract class EvalMetric(protected val name: String) {
* value, Value of the evaluation
*/
   def get: (Array[String], Array[Float]) = {
-(Array(this.name), Array(this.sumMetric / this.numInst))
+(Array(this.name), Array((this.sumMetric / this.numInst).toFloat))
   }
 }
 
@@ -111,11 +111,10 @@ class Accuracy extends EvalMetric("accuracy") {
   require(label.shape == predLabel.shape,
 s"label ${label.shape} and prediction ${predLabel.shape}" +
 s"should have the same length.")
-  for ((labelElem, predElem) <- label.toArray zip predLabel.toArray) {
-if (labelElem == predElem) {
-  this.sumMetric += 1
-}
-  }
+
+  this.sumMetric += label.toArray.zip(predLabel.toArray)
+.filter{ case (labelElem: Float, predElem: Float) => labelElem == 
predElem }
+.size
   this.numInst += predLabel.shape(0)
   predLabel.dispose()
 }

-- 
To stop receiving notification emails like this one, please contact
['"comm...@mxnet.apache.org" '].


[GitHub] javelinjs closed issue #8226: [scala] Accuracy precision is too low

2017-11-21 Thread GitBox
javelinjs closed issue #8226: [scala] Accuracy precision is too low
URL: https://github.com/apache/incubator-mxnet/issues/8226
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] piiswrong opened a new pull request #8761: Refactor image operators

2017-11-21 Thread GitBox
piiswrong opened a new pull request #8761: Refactor image operators
URL: https://github.com/apache/incubator-mxnet/pull/8761
 
 
   ## Description ##
   (Brief description on what this PR is about)
   
   ## Checklist ##
   ### Essentials ###
   - [ ] Passed code style checking (`make lint`)
   - [ ] Changes are complete (i.e. I finished coding on this PR)
   - [ ] All changes have test coverage
   - [ ] For user-facing API changes, API doc string has been updated. For new 
C++ functions in header files, their functionalities and arguments are 
well-documented. 
   - [ ] To my best knowledge, examples are either not affected by this change, 
or have been fixed to be compatible with this change
   
   ### Changes ###
   - [ ] Feature1, tests, (and when applicable, API doc)
   - [ ] Feature2, tests, (and when applicable, API doc)
   
   ## Comments ##
   - If this change is a backward incompatible change, why must this change be 
made.
   - Interesting edge cases to note here
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] piiswrong closed pull request #8761: Refactor image operators

2017-11-21 Thread GitBox
piiswrong closed pull request #8761: Refactor image operators
URL: https://github.com/apache/incubator-mxnet/pull/8761
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/mshadow b/mshadow
index 2d7780c3f2..1e1f633a82 16
--- a/mshadow
+++ b/mshadow
@@ -1 +1 @@
-Subproject commit 2d7780c3f2eefe4453fa419862d1b2089bedb8d5
+Subproject commit 1e1f633a82c1fec5718fd291e2da6149708635f6
diff --git a/python/mxnet/gluon/data/dataset.py 
b/python/mxnet/gluon/data/dataset.py
index 740a2a47c7..9b4d197906 100644
--- a/python/mxnet/gluon/data/dataset.py
+++ b/python/mxnet/gluon/data/dataset.py
@@ -18,7 +18,7 @@
 # coding: utf-8
 # pylint: disable=
 """Dataset container."""
-__all__ = ['Dataset', 'SimpleDataset', 'ArrayDataset', 'LabeledDataset',
+__all__ = ['Dataset', 'SimpleDataset', 'ArrayDataset',
'RecordFileDataset']
 
 import os
diff --git a/python/mxnet/gluon/data/vision/datasets.py 
b/python/mxnet/gluon/data/vision/datasets.py
index 54da152b9f..24f66d6b4a 100644
--- a/python/mxnet/gluon/data/vision/datasets.py
+++ b/python/mxnet/gluon/data/vision/datasets.py
@@ -28,9 +28,9 @@
 import warnings
 import numpy as np
 
-from . import dataset
-from ..utils import download, check_sha1
-from ... import nd, image, recordio
+from .. import dataset
+from ...utils import download, check_sha1
+from .... import nd, image, recordio
 
 apache_repo_url = 'https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/'
 
diff --git a/python/mxnet/gluon/data/vision/transforms.py 
b/python/mxnet/gluon/data/vision/transforms.py
index fa7c0f2cba..e1deef631d 100644
--- a/python/mxnet/gluon/data/vision/transforms.py
+++ b/python/mxnet/gluon/data/vision/transforms.py
@@ -58,7 +58,7 @@ def __init__(self):
 super(ToTensor, self).__init__()
 
 def hybrid_forward(self, F, x):
-return F.cast(x, 'float32').transpose((2, 0, 1))
+return F.image.to_tensor(x)
 
 
 class Normalize(HybridBlock):
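As a quick illustration of the refactored transform (assuming, as the new 
operator name suggests, an H x W x C input converted to a C x H x W float32 
tensor; the exact scaling behaviour of `image.to_tensor` should be checked 
against the operator docs):

```python
# Illustrative use of ToTensor after this refactor: it converts an H x W x C
# uint8 image array into a C x H x W float32 tensor via the image.to_tensor op.
import mxnet as mx
from mxnet.gluon.data.vision import transforms

img = mx.nd.random.uniform(0, 255, shape=(4, 6, 3)).astype('uint8')  # H x W x C
out = transforms.ToTensor()(img)
print(out.shape, out.dtype)  # expected: (3, 4, 6) and float32
```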
diff --git a/src/operator/image/image_aug_op.h 
b/src/operator/image/image_aug_op.h
deleted file mode 100644
index 40315ec85c..00
--- a/src/operator/image/image_aug_op.h
+++ /dev/null
@@ -1,70 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *   http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied.  See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-#ifndef MXNET_OPERATOR_IMAGE_IMAGE_AUG_OP_H_
-#define MXNET_OPERATOR_IMAGE_IMAGE_AUG_OP_H_
-
-#include 
-#include 
-#include 
-#include 
-#include "../mshadow_op.h"
-#include "../elemwise_op_common.h"
-#include "../mxnet_op.h"
-
-namespace mxnet {
-namespace op {
-
-struct NormalizeParam : public dmlc::Parameter<NormalizeParam> {
-  nnvm::Tuple<float> mean, std;
-  DMLC_DECLARE_PARAMETER(NormalizeParam) {
-    DMLC_DECLARE_FIELD(mean).set_default(nnvm::Tuple<float>({0.f}))
-      .describe("");
-    DMLC_DECLARE_FIELD(std).set_default(nnvm::Tuple<float>({1.f}))
-      .describe("");
-  }
-};
-
-
-void NormalizeCompute(const nnvm::NodeAttrs& attrs,
-                      const OpContext& ctx,
-                      const std::vector<TBlob>& inputs,
-                      const std::vector<OpReqType>& req,
-                      const std::vector<TBlob>& outputs) {
-  using namespace mxnet_op;
-  const auto& params = dmlc::get<NormalizeParam>(attrs.parsed);
-  CHECK_NE(req[0], kAddTo);
-  MSHADOW_TYPE_SWITCH(inputs[0].type_flag_, DType, {
-    auto num_channel = inputs[0].shape_[0];
-    auto size = inputs[0].Size(1, inputs[0].ndim());
-    nnvm::Tuple<float> mean(params.mean.begin(), params.mean.end());
-    nnvm::Tuple<float> std(params.std.begin(), params.std.end());
-    DType* src = inputs[0].dptr<DType>();
-    DType* dst = outputs[0].dptr<DType>();
-    for (int i = 0; i < num_channel; ++i) {
-      for (int j = 0; j < size; ++j, ++out, ++src) {
-        *out = (*src - mean[i]) / std[i];
-      }
-    }
-  });
-}
-
-}  // namespace op
-}  // namespace mxnet
-#endif  // MXNET_OPERATOR_IMAGE_IMAGE_AUG_OP_H_
diff --git a/src/operator/image/image_common.h 
b/src/operator/image/image_common.h
deleted file mode 100644
index 3b6b8e3298..00
--- a/src/operator/image/image_common.h
+++ /dev/null
@@ -1,89 +0,0 @@
-/*
-* Licensed to the Apache Software Foundation (ASF) 

[incubator-mxnet] branch vision updated: Refactor image operators (#8761)

2017-11-21 Thread jxie
This is an automated email from the ASF dual-hosted git repository.

jxie pushed a commit to branch vision
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git


The following commit(s) were added to refs/heads/vision by this push:
 new 4765139  Refactor image operators (#8761)
4765139 is described below

commit 476513984e9c7b502f8a00f9af51b62ef6658d8e
Author: Eric Junyuan Xie 
AuthorDate: Tue Nov 21 17:43:21 2017 -0800

Refactor image operators (#8761)

* fix

* fix

* fix

* fix

* refactor

* fix
---
 mshadow   |   2 +-
 src/operator/image/image_aug_op.h |  70 
 src/operator/image/image_common.h |  89 --
 src/operator/image/image_random-inl.h | 314 --
 src/operator/image/image_random.cc|  42 ++---
 5 files changed, 135 insertions(+), 382 deletions(-)

diff --git a/mshadow b/mshadow
index 2d7780c..1e1f633 16
--- a/mshadow
+++ b/mshadow
@@ -1 +1 @@
-Subproject commit 2d7780c3f2eefe4453fa419862d1b2089bedb8d5
+Subproject commit 1e1f633a82c1fec5718fd291e2da6149708635f6
diff --git a/src/operator/image/image_aug_op.h 
b/src/operator/image/image_aug_op.h
deleted file mode 100644
index 40315ec..000
--- a/src/operator/image/image_aug_op.h
+++ /dev/null
@@ -1,70 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *   http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied.  See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-#ifndef MXNET_OPERATOR_IMAGE_IMAGE_AUG_OP_H_
-#define MXNET_OPERATOR_IMAGE_IMAGE_AUG_OP_H_
-
-#include 
-#include 
-#include 
-#include 
-#include "../mshadow_op.h"
-#include "../elemwise_op_common.h"
-#include "../mxnet_op.h"
-
-namespace mxnet {
-namespace op {
-
-struct NormalizeParam : public dmlc::Parameter<NormalizeParam> {
-  nnvm::Tuple<float> mean, std;
-  DMLC_DECLARE_PARAMETER(NormalizeParam) {
-    DMLC_DECLARE_FIELD(mean).set_default(nnvm::Tuple<float>({0.f}))
-      .describe("");
-    DMLC_DECLARE_FIELD(std).set_default(nnvm::Tuple<float>({1.f}))
-      .describe("");
-  }
-};
-
-
-void NormalizeCompute(const nnvm::NodeAttrs& attrs,
-                      const OpContext& ctx,
-                      const std::vector<TBlob>& inputs,
-                      const std::vector<OpReqType>& req,
-                      const std::vector<TBlob>& outputs) {
-  using namespace mxnet_op;
-  const auto& params = dmlc::get<NormalizeParam>(attrs.parsed);
-  CHECK_NE(req[0], kAddTo);
-  MSHADOW_TYPE_SWITCH(inputs[0].type_flag_, DType, {
-    auto num_channel = inputs[0].shape_[0];
-    auto size = inputs[0].Size(1, inputs[0].ndim());
-    nnvm::Tuple<float> mean(params.mean.begin(), params.mean.end());
-    nnvm::Tuple<float> std(params.std.begin(), params.std.end());
-    DType* src = inputs[0].dptr<DType>();
-    DType* dst = outputs[0].dptr<DType>();
-    for (int i = 0; i < num_channel; ++i) {
-      for (int j = 0; j < size; ++j, ++out, ++src) {
-        *out = (*src - mean[i]) / std[i];
-      }
-    }
-  });
-}
-
-}  // namespace op
-}  // namespace mxnet
-#endif  // MXNET_OPERATOR_IMAGE_IMAGE_AUG_OP_H_
diff --git a/src/operator/image/image_common.h 
b/src/operator/image/image_common.h
deleted file mode 100644
index 3b6b8e3..000
--- a/src/operator/image/image_common.h
+++ /dev/null
@@ -1,89 +0,0 @@
-/*
-* Licensed to the Apache Software Foundation (ASF) under one
-* or more contributor license agreements.  See the NOTICE file
-* distributed with this work for additional information
-* regarding copyright ownership.  The ASF licenses this file
-* to you under the Apache License, Version 2.0 (the
-* "License"); you may not use this file except in compliance
-* with the License.  You may obtain a copy of the License at
-*
-*   http://www.apache.org/licenses/LICENSE-2.0
-*
-* Unless required by applicable law or agreed to in writing,
-* software distributed under the License is distributed on an
-* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-* KIND, either express or implied.  See the License for the
-* specific language governing permissions and limitations
-* under the License.
-*/
-
-/*!
-* \file image_common.h
-* \brief
-* \author
-*/
-#ifndef MXNET_OPERATOR_IMAGE_IMAGE_COMMON_H_
-#define MXNET_OPERATOR_IMAGE_IMAGE_COMMON_H_
-
-#include 
-
-namespace mxnet {
-namespace op {
-
-/**
-* @brief convert TBlob to cv::Mat

[GitHub] lyblsgo commented on issue #5863: Language Model Benchmark: can not reproduce the same results as Tensorflow with the same parameters.

2017-11-21 Thread GitBox
lyblsgo commented on issue #5863: Language Model Benchmark: can not reproduce 
the same results as Tensorflow with the same parameters.
URL: 
https://github.com/apache/incubator-mxnet/issues/5863#issuecomment-345934112
 
 
   @sxjscience Can you tell me what the differences are between gluon's 
word_language_model and lstm_bucketing.py? word_language_model's PPL is 70+ and 
lstm_bucketing.py's PPL is 150+ on the PTB data. Thanks


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] piiswrong opened a new pull request #8760: Update index.md

2017-11-21 Thread GitBox
piiswrong opened a new pull request #8760: Update index.md
URL: https://github.com/apache/incubator-mxnet/pull/8760
 
 
   ## Description ##
   (Brief description on what this PR is about)
   
   ## Checklist ##
   ### Essentials ###
   - [ ] Passed code style checking (`make lint`)
   - [ ] Changes are complete (i.e. I finished coding on this PR)
   - [ ] All changes have test coverage
   - [ ] For user-facing API changes, API doc string has been updated. For new 
C++ functions in header files, their functionalities and arguments are 
well-documented. 
   - [ ] To my best knowledge, examples are either not affected by this change, 
or have been fixed to be compatible with this change
   
   ### Changes ###
   - [ ] Feature1, tests, (and when applicable, API doc)
   - [ ] Feature2, tests, (and when applicable, API doc)
   
   ## Comments ##
   - If this change is a backward incompatible change, why must this change be 
made.
   - Interesting edge cases to note here
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-mxnet] branch piiswrong-patch-1-1 created (now 4903f42)

2017-11-21 Thread jxie
This is an automated email from the ASF dual-hosted git repository.

jxie pushed a change to branch piiswrong-patch-1-1
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


  at 4903f42  Update index.md

This branch includes the following new commits:

 new 4903f42  Update index.md

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


-- 
To stop receiving notification emails like this one, please contact
['"comm...@mxnet.apache.org" '].


[incubator-mxnet] 01/01: Update index.md

2017-11-21 Thread jxie
This is an automated email from the ASF dual-hosted git repository.

jxie pushed a commit to branch piiswrong-patch-1-1
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git

commit 4903f42bd2a1a069211b04aa9bd219c82eda7472
Author: Eric Junyuan Xie 
AuthorDate: Tue Nov 21 17:31:11 2017 -0800

Update index.md
---
 docs/tutorials/index.md | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/docs/tutorials/index.md b/docs/tutorials/index.md
index 6429dfb..6432615 100644
--- a/docs/tutorials/index.md
+++ b/docs/tutorials/index.md
@@ -1,14 +1,11 @@
 # Tutorials
 
-These tutorials introduce a few fundamental concepts in deep learning and how 
to implement them in _MXNet_. The _Basics_ section contains tutorials on 
manipulating arrays, building networks, loading/preprocessing data, etc. The 
_Training and Inference_ section talks about implementing Linear Regression, 
training a Handwritten digit classifier using MLP and CNN, running inferences 
using a pre-trained model, and lastly, efficiently training a large scale image 
classifier.
-
-
 ## Gluon
 
 Gluon is the high-level interface for MXNet. It is more intuitive and easier 
to use than the lower level interface.
 Gluon supports dynamic (define-by-run) graphs with JIT-compilation to achieve 
both flexibility and efficiency.
-This is a selected subset of Gluon tutorials. For the comprehensive tutorial 
on Gluon,
-please see [gluon.mxnet.io](http://gluon.mxnet.io).
+
+This is a selected subset of Gluon tutorials that explains basic usage of 
Gluon and fundamental concepts in deep learning. For the comprehensive tutorial 
on Gluon that covers topics from basic statistics and probability theory to 
reinforcement learning and recommender systems, please see 
[gluon.mxnet.io](http://gluon.mxnet.io). 
 
 ### Basics
 
@@ -32,6 +29,8 @@ please see [gluon.mxnet.io](http://gluon.mxnet.io).
 
 ## MXNet
 
+These tutorials introduce a few fundamental concepts in deep learning and how 
to implement them in _MXNet_. The _Basics_ section contains tutorials on 
manipulating arrays, building networks, loading/preprocessing data, etc. The 
_Training and Inference_ section talks about implementing Linear Regression, 
training a Handwritten digit classifier using MLP and CNN, running inferences 
using a pre-trained model, and lastly, efficiently training a large scale image 
classifier.
+
 ### Basics
 
 ```eval_rst

-- 
To stop receiving notification emails like this one, please contact
"comm...@mxnet.apache.org" .


[incubator-mxnet] branch piiswrong-patch-1 created (now c12f687)

2017-11-21 Thread jxie
This is an automated email from the ASF dual-hosted git repository.

jxie pushed a change to branch piiswrong-patch-1
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


  at c12f687  Update index.md

This branch includes the following new commits:

 new c12f687  Update index.md

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


-- 
To stop receiving notification emails like this one, please contact
['"comm...@mxnet.apache.org" '].


[incubator-mxnet] 01/01: Update index.md

2017-11-21 Thread jxie
This is an automated email from the ASF dual-hosted git repository.

jxie pushed a commit to branch piiswrong-patch-1
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git

commit c12f6877d8b280b5e7abce05081c413fcd1462f3
Author: Eric Junyuan Xie 
AuthorDate: Tue Nov 21 17:28:01 2017 -0800

Update index.md
---
 docs/tutorials/index.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/docs/tutorials/index.md b/docs/tutorials/index.md
index 6429dfb..365b370 100644
--- a/docs/tutorials/index.md
+++ b/docs/tutorials/index.md
@@ -1,8 +1,5 @@
 # Tutorials
 
-These tutorials introduce a few fundamental concepts in deep learning and how 
to implement them in _MXNet_. The _Basics_ section contains tutorials on 
manipulating arrays, building networks, loading/preprocessing data, etc. The 
_Training and Inference_ section talks about implementing Linear Regression, 
training a Handwritten digit classifier using MLP and CNN, running inferences 
using a pre-trained model, and lastly, efficiently training a large scale image 
classifier.
-
-
 ## Gluon
 
 Gluon is the high-level interface for MXNet. It is more intuitive and easier 
to use than the lower level interface.
@@ -32,6 +29,8 @@ please see [gluon.mxnet.io](http://gluon.mxnet.io).
 
 ## MXNet
 
+These tutorials introduce a few fundamental concepts in deep learning and how 
to implement them in _MXNet_. The _Basics_ section contains tutorials on 
manipulating arrays, building networks, loading/preprocessing data, etc. The 
_Training and Inference_ section talks about implementing Linear Regression, 
training a Handwritten digit classifier using MLP and CNN, running inferences 
using a pre-trained model, and lastly, efficiently training a large scale image 
classifier.
+
 ### Basics
 
 ```eval_rst

-- 
To stop receiving notification emails like this one, please contact
"comm...@mxnet.apache.org" .


[GitHub] mbaijal commented on a change in pull request #8704: Initial Prep for 1.0: bump up version and add 0.12.1 changes to master

2017-11-21 Thread GitBox
mbaijal commented on a change in pull request #8704: Initial Prep for 1.0: bump 
up version and add 0.12.1 changes to master
URL: https://github.com/apache/incubator-mxnet/pull/8704#discussion_r152447463
 
 

 ##
 File path: docs/build_version_doc/build_all_version.sh
 ##
 @@ -21,7 +21,7 @@
 # Built files are stored in $built
 # Version numbers are stored in $tag_list.
 # Version numbers are ordered from latest to old and final one is master.
-tag_list="0.12.0 0.11.0 master"
+tag_list="1.0.0 0.12.0 0.11.0 master"
 
 Review comment:
   @szha Could you please merge this into the 1.0.0 branch


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] solin319 commented on a change in pull request #8414: Fix the strides in Gluon version of ResNet v1

2017-11-21 Thread GitBox
solin319 commented on a change in pull request #8414: Fix the strides in Gluon 
version of ResNet v1
URL: https://github.com/apache/incubator-mxnet/pull/8414#discussion_r152446834
 
 

 ##
 File path: python/mxnet/gluon/model_zoo/vision/resnet.py
 ##
 @@ -102,10 +102,10 @@ class BottleneckV1(HybridBlock):
 def __init__(self, channels, stride, downsample=False, in_channels=0, 
**kwargs):
 super(BottleneckV1, self).__init__(**kwargs)
 self.body = nn.HybridSequential(prefix='')
-self.body.add(nn.Conv2D(channels//4, kernel_size=1, strides=1))
+self.body.add(nn.Conv2D(channels//4, kernel_size=1, strides=stride))
 
 Review comment:
   1. How about adding 'use_bias=False' to the Conv2D layers? This will make 
ResNet faster. The effect of the bias in a convolution is the same as the beta 
in the following BatchNorm.
   2. Does setting the (2,2) stride in the 1*1 conv versus the 3*3 conv affect 
the final accuracy of ResNet? In the 1*1 conv, a (2,2) stride will lose some 
pixels of the feature map (see the sketch below).
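To illustrate the suggestion, here is a minimal Gluon sketch of a bottleneck 
body with bias-free convolutions and the (2,2) stride applied in the 3x3 
convolution; this is only an illustration of the reviewer's proposal, not the 
final form of the PR.

```python
# Sketch of the reviewer's suggestion: use_bias=False everywhere (the following
# BatchNorm's beta plays the role of the bias) and the stride on the 3x3 conv,
# so the 1x1 projection does not skip feature-map pixels.
from mxnet.gluon import nn

def bottleneck_body(channels, stride):
    body = nn.HybridSequential(prefix='')
    body.add(nn.Conv2D(channels // 4, kernel_size=1, strides=1, use_bias=False))
    body.add(nn.BatchNorm())
    body.add(nn.Activation('relu'))
    body.add(nn.Conv2D(channels // 4, kernel_size=3, strides=stride, padding=1,
                       use_bias=False))
    body.add(nn.BatchNorm())
    body.add(nn.Activation('relu'))
    body.add(nn.Conv2D(channels, kernel_size=1, strides=1, use_bias=False))
    body.add(nn.BatchNorm())
    return body
```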
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] piiswrong closed pull request #8632: a user friendly way to use g2c in module and an example of g2c

2017-11-21 Thread GitBox
piiswrong closed pull request #8632: a user friendly way to use g2c in module 
and an example of g2c
URL: https://github.com/apache/incubator-mxnet/pull/8632
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/example/model-parallel-lstm/README.md 
b/example/model-parallel/lstm/README.md
similarity index 100%
rename from example/model-parallel-lstm/README.md
rename to example/model-parallel/lstm/README.md
diff --git a/example/model-parallel-lstm/get_ptb_data.sh 
b/example/model-parallel/lstm/get_ptb_data.sh
similarity index 100%
rename from example/model-parallel-lstm/get_ptb_data.sh
rename to example/model-parallel/lstm/get_ptb_data.sh
diff --git a/example/model-parallel-lstm/lstm.py 
b/example/model-parallel/lstm/lstm.py
similarity index 100%
rename from example/model-parallel-lstm/lstm.py
rename to example/model-parallel/lstm/lstm.py
diff --git a/example/model-parallel-lstm/lstm_ptb.py 
b/example/model-parallel/lstm/lstm_ptb.py
similarity index 100%
rename from example/model-parallel-lstm/lstm_ptb.py
rename to example/model-parallel/lstm/lstm_ptb.py
diff --git a/example/model-parallel/matrix_factorization/get_data.py 
b/example/model-parallel/matrix_factorization/get_data.py
new file mode 100644
index 00..bb2503a716
--- /dev/null
+++ b/example/model-parallel/matrix_factorization/get_data.py
@@ -0,0 +1,56 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import os
+import mxnet as mx
+
+
+def get_movielens_data(prefix):
+if not os.path.exists("%s.zip" % prefix):
+print("Dataset MovieLens 10M not present. Downloading now ...")
+os.system("wget http://files.grouplens.org/datasets/movielens/%s.zip; 
% prefix)
+os.system("unzip %s.zip" % prefix)
+os.system("cd ml-10M100K; sh split_ratings.sh; cd -;")
+
+def get_movielens_iter(filename, batch_size):
+"""Not particularly fast code to parse the text file and load into 
NDArrays.
+return two data iters, one for train, the other for validation.
+"""
+print("Preparing data iterators for " + filename + " ... ")
+user = []
+item = []
+score = []
+with open(filename, 'r') as f:
+num_samples = 0
+for line in f:
+tks = line.strip().split('::')
+if len(tks) != 4:
+continue
+num_samples += 1
+user.append((tks[0]))
+item.append((tks[1]))
+score.append((tks[2]))
+# convert to ndarrays
+user = mx.nd.array(user, dtype='int32')
+item = mx.nd.array(item)
+score = mx.nd.array(score)
+# prepare data iters
+data_train = {'user':user, 'item':item}
+label_train = {'score':score}
+iter_train = mx.io.NDArrayIter(data=data_train,label=label_train,
+   batch_size=batch_size, shuffle=True)
+return mx.io.PrefetchingIter(iter_train)
diff --git 
a/example/model-parallel/matrix_factorization/matrix_fact_parallel_model.py 
b/example/model-parallel/matrix_factorization/matrix_fact_parallel_model.py
new file mode 100644
index 00..f4004d1a65
--- /dev/null
+++ b/example/model-parallel/matrix_factorization/matrix_fact_parallel_model.py
@@ -0,0 +1,56 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import mxnet as mx
+
+def 

[incubator-mxnet] branch master updated: a user friendly way to use g2c in module and an example of g2c (#8632)

2017-11-21 Thread jxie
This is an automated email from the ASF dual-hosted git repository.

jxie pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git


The following commit(s) were added to refs/heads/master by this push:
 new ec6144f  a user friendly way to use g2c in module and an example of 
g2c (#8632)
ec6144f is described below

commit ec6144f8730620300d457101ab847ff127c9
Author: Ziyue Huang 
AuthorDate: Wed Nov 22 09:15:14 2017 +0800

a user friendly way to use g2c in module and an example of g2c (#8632)

* a user friendly way to use g2c in module

* also support g2c to be list

* update

* update test

* g2c example

* Update matrix_factorization_model_parallel.py

* address comments

* update

* update

* remove fc

* debug g2c

* Revert "debug g2c"

This reverts commit caabdc5c5fa8618d3ed4db2cbad4e807b63c211e.

* update

* move g2c example to another folder

* update

* readme
---
 .../lstm}/README.md|   0
 .../lstm}/get_ptb_data.sh  |   0
 .../lstm}/lstm.py  |   0
 .../lstm}/lstm_ptb.py  |   0
 .../matrix_factorization/get_data.py   |  56 +++
 .../matrix_fact_parallel_model.py  |  56 +++
 .../matrix_factorization_model_parallel.py | 106 +
 .../model-parallel/matrix_factorization/readme.md  |   6 ++
 python/mxnet/module/bucketing_module.py|   3 +-
 python/mxnet/module/executor_group.py  |  37 ++-
 python/mxnet/module/module.py  |   3 +-
 tests/python/unittest/test_module.py   |  61 +++-
 12 files changed, 295 insertions(+), 33 deletions(-)

diff --git a/example/model-parallel-lstm/README.md 
b/example/model-parallel/lstm/README.md
similarity index 100%
rename from example/model-parallel-lstm/README.md
rename to example/model-parallel/lstm/README.md
diff --git a/example/model-parallel-lstm/get_ptb_data.sh 
b/example/model-parallel/lstm/get_ptb_data.sh
similarity index 100%
rename from example/model-parallel-lstm/get_ptb_data.sh
rename to example/model-parallel/lstm/get_ptb_data.sh
diff --git a/example/model-parallel-lstm/lstm.py 
b/example/model-parallel/lstm/lstm.py
similarity index 100%
rename from example/model-parallel-lstm/lstm.py
rename to example/model-parallel/lstm/lstm.py
diff --git a/example/model-parallel-lstm/lstm_ptb.py 
b/example/model-parallel/lstm/lstm_ptb.py
similarity index 100%
rename from example/model-parallel-lstm/lstm_ptb.py
rename to example/model-parallel/lstm/lstm_ptb.py
diff --git a/example/model-parallel/matrix_factorization/get_data.py 
b/example/model-parallel/matrix_factorization/get_data.py
new file mode 100644
index 000..bb2503a
--- /dev/null
+++ b/example/model-parallel/matrix_factorization/get_data.py
@@ -0,0 +1,56 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import os
+import mxnet as mx
+
+
+def get_movielens_data(prefix):
+if not os.path.exists("%s.zip" % prefix):
+print("Dataset MovieLens 10M not present. Downloading now ...")
+os.system("wget http://files.grouplens.org/datasets/movielens/%s.zip; 
% prefix)
+os.system("unzip %s.zip" % prefix)
+os.system("cd ml-10M100K; sh split_ratings.sh; cd -;")
+
+def get_movielens_iter(filename, batch_size):
+"""Not particularly fast code to parse the text file and load into 
NDArrays.
+return two data iters, one for train, the other for validation.
+"""
+print("Preparing data iterators for " + filename + " ... ")
+user = []
+item = []
+score = []
+with open(filename, 'r') as f:
+num_samples = 0
+for line in f:
+tks = line.strip().split('::')
+if len(tks) != 4:
+continue
+num_samples += 1
+user.append((tks[0]))
+item.append((tks[1]))
+score.append((tks[2]))
+# convert to ndarrays
+user = mx.nd.array(user, dtype='int32')
+item = mx.nd.array(item)
+ 

[GitHub] javelinjs opened a new pull request #8759: image flip op

2017-11-21 Thread GitBox
javelinjs opened a new pull request #8759: image flip op
URL: https://github.com/apache/incubator-mxnet/pull/8759
 
 
   https://github.com/apache/incubator-mxnet/issues/8556


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] yajiedesign commented on issue #8752: fix lint with cmake

2017-11-21 Thread GitBox
yajiedesign commented on issue #8752: fix lint with cmake
URL: https://github.com/apache/incubator-mxnet/pull/8752#issuecomment-346212137
 
 
   @cjolivier01 Only cpplint for now. We should add rcpplint, jnilint, and 
pylint if we use only the CMake build system.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] cjolivier01 commented on issue #8758: Explicitly convert float value

2017-11-21 Thread GitBox
cjolivier01 commented on issue #8758: Explicitly convert float value
URL: https://github.com/apache/incubator-mxnet/pull/8758#issuecomment-346211327
 
 
Can you add a unit test based on that issue's code?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] szha commented on issue #8668: Precision error setting NDArray from np.float32 scalar

2017-11-21 Thread GitBox
szha commented on issue #8668: Precision error setting NDArray from np.float32 
scalar
URL: 
https://github.com/apache/incubator-mxnet/issues/8668#issuecomment-346210607
 
 
   A missing `float()` version was the problem. The above PR should fix it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] piiswrong commented on a change in pull request #8757: [ImageIO] Fix image io for opencv3.3

2017-11-21 Thread GitBox
piiswrong commented on a change in pull request #8757: [ImageIO] Fix image io 
for opencv3.3
URL: https://github.com/apache/incubator-mxnet/pull/8757#discussion_r152443898
 
 

 ##
 File path: src/io/image_io.cc
 ##
 @@ -156,7 +156,10 @@ void ImdecodeImpl(int flag, bool to_rgb, void* data, 
size_t size,
   } else {
 dst = cv::Mat(out->shape()[0], out->shape()[1], flag == 0 ? CV_8U : 
CV_8UC3,
 out->data().dptr_);
-#if (CV_MAJOR_VERSION > 2 || (CV_MAJOR_VERSION == 2 && CV_MINOR_VERSION >=4))
+#if (CV_MAJOR_VERSION > 2 && CV_MINOR_VERSION >= 3)
 
 Review comment:
   This condition won't work for OpenCV 4.1;
   you should use `CV_MAJOR_VERSION > 3 || (CV_MAJOR_VERSION == 3 && 
CV_MINOR_VERSION >= 3)`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] piiswrong closed pull request #8744: Added a security best practices doc

2017-11-21 Thread GitBox
piiswrong closed pull request #8744: Added a security best practices doc
URL: https://github.com/apache/incubator-mxnet/pull/8744
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/docs/faq/index.md b/docs/faq/index.md
index 1bfaea4a7f..e29bda0b68 100644
--- a/docs/faq/index.md
+++ b/docs/faq/index.md
@@ -40,6 +40,9 @@ and full working examples, visit the [tutorials 
section](../tutorials/index.md).
 
 * [How to convert MXNet models to Apple CoreML 
format?](https://github.com/apache/incubator-mxnet/tree/master/tools/coreml)
 
+## Security
+* [How to run MXNet securely?](http://mxnet.io/how_to/security.md)
+
 ## Extend and Contribute to MXNet
 
 * [How do I join the MXNet development 
discussion?](http://mxnet.io/community/mxnet_channels.html)
diff --git a/docs/how_to/security.md b/docs/how_to/security.md
new file mode 100644
index 00..6f64a9e608
--- /dev/null
+++ b/docs/how_to/security.md
@@ -0,0 +1,24 @@
+# MXNet Security best practices
+
+MXNet framework has no built-in security protections. It assumes that the 
MXNet entities involved in model training and inferencing (hosting) are fully 
trusted. It also assumes that their communications cannot be eavesdropped or 
tampered with. MXNet consumers shall ensure that the above assumptions are met.
+
+In particular the following threat-vectors exist when training using MXNet:
+
+* When running distributed training using MXNet there is no built-in support 
for authenticating cluster nodes participating in the training job.
+* Data exchange between cluster nodes happens in plain text.
+* Using `kvstore.set_optimizer` one can use a custom optimizer to combine 
gradients. This optimizer code is sent to the server nodes as a pickle file. A 
server does not perform any further validation of the pickle file and simply 
executes the code trusting the sender (worker).
+* Since there is no authentication between nodes, a malicious actor running on 
the same network can launch a Denial of Service (DoS) attack by sending data 
that can overwhelm/crash a scheduler or other server nodes.
+
+It is highly recommended that the following best practices be followed when 
using MXNet:
+
+* Run MXNet with least privilege, i.e. not as root.
+* Run MXNet training jobs inside a secure and isolated environment. If you are 
using a cloud provider like Amazon AWS, running your training job inside a 
[private VPC](https://aws.amazon.com/vpc/) is a good way to accomplish this. 
Additionally, configure your network security settings so as to only allow 
connections that the cluster nodes require.
+* Make sure no unauthorized actors have physical or remote access to the nodes 
participating in MXNet training.
+* During training, one can configure MXNet to periodically save model 
checkpoints. To protect these model checkpoints from unauthorized access, make 
sure the checkpoints are written out to an encrypted storage volume, and have a 
provision to delete checkpoints that are no longer needed.
+* When sharing trained models, or when receiving trained models from other 
parties, ensure that model artifacts are authenticated and integrity protected 
using cryptographic signatures, thus ensuring that the data received comes from 
trusted sources and has not been maliciously (or accidentally) modified in 
transit.
+* By default, mx.random uses a static and fixed seed value. The random 
utilities in MXNet should therefore never be used to implement any type of 
security critical functionality where cryptographically secure pseudorandom 
number generation is required.
+
+# Deployment Considerations
+The following are not MXNet framework specific threats but are applicable to 
Machine Learning models in general.
+
+* When deploying high-value, proprietary models for inference, care should be 
taken to prevent an adversary from stealing the model. The research paper 
[Stealing Machine Learning Models via Prediction APIs] 
(https://arxiv.org/pdf/1609.02943.pdf) outlines experiments performed to show 
how an attacker can use a prediction API to leak the ML model or construct a 
nearly identical replica. A simple way to thwart such an attack is to not 
expose the prediction probabilities to a high degree of precision in the API 
response.


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-mxnet] branch master updated: Added a security best practices doc (#8744)

2017-11-21 Thread jxie
This is an automated email from the ASF dual-hosted git repository.

jxie pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git


The following commit(s) were added to refs/heads/master by this push:
 new eb72d72  Added a security best practices doc (#8744)
eb72d72 is described below

commit eb72d72707eae391d8d4a104e22be97950eee269
Author: Madan Jampani 
AuthorDate: Tue Nov 21 16:52:58 2017 -0800

Added a security best practices doc (#8744)

* Added a security best practices doc

* Minor edit
---
 docs/faq/index.md   |  3 +++
 docs/how_to/security.md | 24 
 2 files changed, 27 insertions(+)

diff --git a/docs/faq/index.md b/docs/faq/index.md
index 1bfaea4..e29bda0 100644
--- a/docs/faq/index.md
+++ b/docs/faq/index.md
@@ -40,6 +40,9 @@ and full working examples, visit the [tutorials 
section](../tutorials/index.md).
 
 * [How to convert MXNet models to Apple CoreML 
format?](https://github.com/apache/incubator-mxnet/tree/master/tools/coreml)
 
+## Security
+* [How to run MXNet securely?](http://mxnet.io/how_to/security.md)
+
 ## Extend and Contribute to MXNet
 
 * [How do I join the MXNet development 
discussion?](http://mxnet.io/community/mxnet_channels.html)
diff --git a/docs/how_to/security.md b/docs/how_to/security.md
new file mode 100644
index 000..6f64a9e
--- /dev/null
+++ b/docs/how_to/security.md
@@ -0,0 +1,24 @@
+# MXNet Security best practices
+
+MXNet framework has no built-in security protections. It assumes that the 
MXNet entities involved in model training and inferencing (hosting) are fully 
trusted. It also assumes that their communications cannot be eavesdropped or 
tampered with. MXNet consumers shall ensure that the above assumptions are met.
+
+In particular the following threat-vectors exist when training using MXNet:
+
+* When running distributed training using MXNet there is no built-in support 
for authenticating cluster nodes participating in the training job.
+* Data exchange between cluster nodes happens in plain text.
+* Using `kvstore.set_optimizer` one can use a custom optimizer to combine 
gradients. This optimizer code is sent to the server nodes as a pickle file. A 
server does not perform any further validation of the pickle file and simply 
executes the code trusting the sender (worker).
+* Since there is no authentication between nodes, a malicious actor running on 
the same network can launch a Denial of Service (DoS) attack by sending data 
that can overwhelm/crash a scheduler or other server nodes.
+
+It is highly recommended that the following best practices be followed when 
using MXNet:
+
+* Run MXNet with least privilege, i.e. not as root.
+* Run MXNet training jobs inside a secure and isolated environment. If you are 
using a cloud provider like Amazon AWS, running your training job inside a 
[private VPC](https://aws.amazon.com/vpc/) is a good way to accomplish this. 
Additionally, configure your network security settings so as to only allow 
connections that the cluster nodes require.
+* Make sure no unauthorized actors have physical or remote access to the nodes 
participating in MXNet training.
+* During training, one can configure MXNet to periodically save model 
checkpoints. To protect these model checkpoints from unauthorized access, make 
sure the checkpoints are written out to an encrypted storage volume, and have a 
provision to delete checkpoints that are no longer needed.
+* When sharing trained models, or when receiving trained models from other 
parties, ensure that model artifacts are authenticated and integrity-protected 
using cryptographic signatures, so that the data received comes from trusted 
sources and has not been maliciously (or accidentally) modified in transit.
+* By default, mx.random uses a static and fixed seed value. The random 
utilities in MXNet should therefore never be used to implement any type of 
security critical functionality where cryptographically secure pseudorandom 
number generation is required.
+
+# Deployment Considerations
+The following are not MXNet framework-specific threats but are applicable to 
machine learning models in general.
+
+* When deploying high-value, proprietary models for inference, care should be 
taken to prevent an adversary from stealing the model. The research paper 
[Stealing Machine Learning Models via Prediction APIs](https://arxiv.org/pdf/1609.02943.pdf) outlines experiments performed to show 
how an attacker can use a prediction API to leak the ML model or construct a 
nearly identical replica. A simple way to thwart such an attack is to not 
expose the prediction probabilities to a high degree of  [...]
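
To make the mx.random point in the doc above concrete, here is a minimal sketch 
(not part of the commit), assuming `mx.nd.random.uniform` is available (MXNet >= 0.12) 
and Python 3.6+ for the standard-library `secrets` module. Seeding makes MXNet's RNG 
fully reproducible, which is exactly why it must not be the source of security-critical 
randomness:

```python
import secrets
import mxnet as mx

# Seeding MXNet's RNG makes every subsequent draw reproducible.
mx.random.seed(42)
a = mx.nd.random.uniform(shape=(3,))
mx.random.seed(42)
b = mx.nd.random.uniform(shape=(3,))
print((a == b).asnumpy().all())   # True: identical sequences under the same seed

# For tokens, keys, or nonces, use a cryptographically secure source instead.
print(secrets.token_hex(16))
```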

-- 
To stop receiving notification emails like this one, please contact
['"comm...@mxnet.apache.org" '].


[GitHub] piiswrong closed pull request #8743: add code signing key

2017-11-21 Thread GitBox
piiswrong closed pull request #8743: add code signing key
URL: https://github.com/apache/incubator-mxnet/pull/8743
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/KEYS b/KEYS
index 98b7b9e3d6..d646bb7c3f 100644
--- a/KEYS
+++ b/KEYS
@@ -304,3 +304,62 @@ 
IjljtjhIMhMLB5rf8BPCZ6og5fKqUF5LOp8DujG2DGa9ZhYWTzOO/UGZP60qGTot
 GZZVNUU0hQYfulYDY5E8fJ4Olzpf5OE=
 =WmLB
 -----END PGP PUBLIC KEY BLOCK-----
+pub   rsa4096 2017-11-21 [SC]
+  331E9A5ED727FADD429B2894F2F1EAB589EBCFB1
+uid   [ultimate] Haibin Lin 
sig 3        F2F1EAB589EBCFB1 2017-11-21  Haibin Lin 
+sub   rsa4096 2017-11-21 [E]
+sig  F2F1EAB589EBCFB1 2017-11-21  Haibin Lin 
+
+-----BEGIN PGP PUBLIC KEY BLOCK-----
+
+mQINBFoTp3YBEACiGa++rsTjQal+33xADuWxzN9L8bTkMu4uFJqYvyNP2z1Q0fcM
+DFjLJcvsc3ODSlkDGlkrtFpYlqkBTFERABU19TcAQ5FYFu1uULUybtHm55h6OKAm
+1qfSRcKvdidDRytf7XAnhK/jvjtY71EQZUz2OtvKj0p93C22JcaJasKjHEF+8Jv0
+1rvV4BsZcY3hl9ORbv+nvBB6PX6zkpfhh0edVl50yzJEM34dtBZ1CTVlcJhIj0yo
+LEZkt+zKEz5C3/D5OgM2DoclUInAvPeIGXvOgoQi9he4YjMppC3fmcA9O+sJ8XFh
+dqNxcI+ddcvg84g4ntC2iJb8OOX75xkkoIsJXhZgwxBbdnwINNY6Eqqyx2lMvGRI
+BLTSxLKsfX/mCmW9mwNrKxfrBIb107ldxwfo+13/Vh45nIlhM0yxfhlukHmYEHp+
+G+T+aD67t0HHZHr27M2x0qTdKkRoI+7xYTUvu+OmObJej48UDhi4GMAjQ61TeLm1
+OyetyMoKpB+Cah1n0O5j6nDPRJBS9OPi361DIZRhlg4IkrbIP5MHs+Zvof8O04xq
+GRfYAqEhT6rP98TidpHVhFEV3CrDLVDJLZ3Vqglj2iyNOjEjF1GJJBaFWUoXhKPs
+WVZMfgpkaXRwng6r6ieRmmt/Ci//JV6ztkwKk7e0OQJBqbwA0A7lqx7j2QARAQAB
+tCVIYWliaW4gTGluIDxsaW5oYWliaW4uZXJpY0BnbWFpbC5jb20+iQJOBBMBCAA4
+FiEEMx6aXtcn+t1CmyiU8vHqtYnrz7EFAloTp3YCGwMFCwkIBwIGFQgJCgsCBBYC
+AwECHgECF4AACgkQ8vHqtYnrz7GFWA//Z6YTxtlZSHFlqkAFFOsDtV3DghSC8zJe
+LRm508fZn53e9a3fUvT9U1sUfW8DI69GRK+IBkvP5hcmMb1U4N3MxzX4YC/13wMY
+3BtUbCIpD8uBJOtuC7fPAH//Ij/4wv4Fp1/3WL6y04+mJIayMyKqmc3nBLD0rVWC
+AHEsPR7tiDDMltrzxMNHIJCDaiClJzKiCrQ4owKBOnY2TU/E64xyk5IwAczz2lCY
+712h6+q2mO7F672Yt6b6pqmugnFqWdqUj9dx1V9x//4y/k0DefF7G/1Lk1lh4Eyo
+aUx3jve/74Y87ICW1AhR2/TvdfWbsAkPyfy98k1SLR/9BulSIXIFeduxaFl7M3D8
+98aB5pqO8tPl2BFUJwh/uywDx0994MjQ8Xvrjmb9WJOAx9OyokivVCvmqJOkBzve
+Fk/4KUHTFTGQCoXbbBlIQTC9hBd8c1S4t0gFGbcjlqTvr/ZnTdpSgbzZ/96/SVRm
+dYOgjjpkrBOZgJPwsmmRQ2MufeZUtmkFSqdIRLGBNTefsMDDCGvyNeR/XCgM5Zfy
+39PX/GHFKgq5Ei2ywEyZOGLCK5MwA12fMExYoedazFFjv6ApGpz+j831A2z/crEo
+bRpVvd+rFzGnCKDq5viUD7cRzIPLVltYCNEayEgWta4KI+00/ayaaT6sM7N7oM32
+r01Wv02FvdG5Ag0EWhOndgEQAPiiTvmo9fZNW/5IxL7kDR6u9FEmEb2EZI+KxzbN
+RYYY0IPsnA8TY9Rzj9D7xV8Vmf2Pd5SUyCtVwLfBKhadLh755NeehNXWIbW802gH
+bvbykL/Zcn98oiLOVfK/Op/6MVpDuGXZ6CpDbQDSn6ne6/CWQnoz1+Wo+wbs1TOy
+AhO6xKa20NtGIZrfZD01dSzRC5DMJD3GK1j6HdVUz5piwiTsGvGRJ3ZLfObdlHGn
+CTMA39Jb8zQ0QtWPsOre0Nz2JQ53awMBaUhan5MeoOYp6ccsgD1BigyxmKb8iIDN
+NM/Iwi0Ib5L4AiGh6fQFf0WF8p74yIn1WgFcWxJXR1ZzvMDDHXqq97SQtbr9FKhu
+xrceh/92Ga4ruAJRCbMtmOTUP4APTeT4csANdgJxtW+I4QAp01BQSl75pB2QDlam
++tqePQDboAGc78Ck6096wML0ZMKDDxXPrI67uppuM02FYuJ41ZQjOytigeoGS88g
+ByZwPcFIT+5XgtNC0BH7U9VIkiap5U00lykzEjcRjrZTtKqHdeFPbSEpv1QfIcLG
+Ra439g9acRHX82sVzhzZk5uu9QKyDN1EpuWoLOaOrICHcMSC7GkVXS8+/7TX0vAN
+vn/51fb+tHJekGfaPhsPuIbSba2kmUy8sSS/6JJHkJ1aEFigAPbwUbZTqNlb4IRm
+FBVBABEBAAGJAjYEGAEIACAWIQQzHppe1yf63UKbKJTy8eq1ievPsQUCWhOndgIb
+DAAKCRDy8eq1ievPsbrpEACQ8HqAvq3NuiM00WyHla7VtghCWVEmRozbYc4dR7u+
+sTQrVgbLfgR5zeSWCMHpEcaN/RS58O/i1Dk0DLHTu3NrarzrkEPlHwIgJQ7orxFD
+YW3Z2Ytk40uKex4ou/8VzvXTpj1u8d/GHgGdvChBmtw5FaMgc8PBi4FnlIS5cAGU
+1ca1RwMX0WpFsp9HgrQLVxgkDs/m7oRSmC5GvPDIpb5S9QFzJKYKTJxSfXXO6hCk
+FGAGHWjVC26a/wSUtZQfb3G9sYZJuKUOwr4tpz1y6Ronc34cZYi1FlKWJuz01w4s
+4PKjFG/wbYSd+QLfftyyVPMLdY+wCwc8O59QqKx5Rj8HQLxIwSL3chhmdAHCmejM
+zKCpkFyLOc6+Wjet6hD6X3EsjIee1AAy22D24EaLJsju9zR/khJFS4K76aQX7dYN
+aB3C7S5HGxvYGSqfnn4eBaEzrSOde7HEcqYpYKxS+jB1c4X4W91NSTsqDd0QJMVF
+35eKfhWj+X6jWIC+48kfzypXdOCnPbto7wrr40yYCHw3XSXj40H5dWSsWEZVmS+s
+Dzz6zy9maHVyXa/rNsL7OjqimtKad65r/wfSFPPIcR1jJfP4GMNHV0TYqxdyDaXg
+iEVpHzOV7gd75fJbOvoNxNZj20Yj5sg8OCwbv8PxLXEcBFs7hhjQMhVRsjpNYzAR
+Iw==
+=rMlc
+-----END PGP PUBLIC KEY BLOCK-----


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-mxnet] branch master updated: add code signing key (#8743)

2017-11-21 Thread jxie
This is an automated email from the ASF dual-hosted git repository.

jxie pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git


The following commit(s) were added to refs/heads/master by this push:
 new 9d48f74  add code signing key (#8743)
9d48f74 is described below

commit 9d48f7410568bf8a2ca926d2d6d88b7f6f80227f
Author: Haibin Lin 
AuthorDate: Tue Nov 21 16:50:35 2017 -0800

add code signing key (#8743)
---
 KEYS | 59 +++
 1 file changed, 59 insertions(+)

diff --git a/KEYS b/KEYS
index 98b7b9e..d646bb7 100644
--- a/KEYS
+++ b/KEYS
@@ -304,3 +304,62 @@ 
IjljtjhIMhMLB5rf8BPCZ6og5fKqUF5LOp8DujG2DGa9ZhYWTzOO/UGZP60qGTot
 GZZVNUU0hQYfulYDY5E8fJ4Olzpf5OE=
 =WmLB
 -----END PGP PUBLIC KEY BLOCK-----
+pub   rsa4096 2017-11-21 [SC]
+  331E9A5ED727FADD429B2894F2F1EAB589EBCFB1
+uid   [ultimate] Haibin Lin 
sig 3        F2F1EAB589EBCFB1 2017-11-21  Haibin Lin 
+sub   rsa4096 2017-11-21 [E]
+sig  F2F1EAB589EBCFB1 2017-11-21  Haibin Lin 
+
+-----BEGIN PGP PUBLIC KEY BLOCK-----
+
+mQINBFoTp3YBEACiGa++rsTjQal+33xADuWxzN9L8bTkMu4uFJqYvyNP2z1Q0fcM
+DFjLJcvsc3ODSlkDGlkrtFpYlqkBTFERABU19TcAQ5FYFu1uULUybtHm55h6OKAm
+1qfSRcKvdidDRytf7XAnhK/jvjtY71EQZUz2OtvKj0p93C22JcaJasKjHEF+8Jv0
+1rvV4BsZcY3hl9ORbv+nvBB6PX6zkpfhh0edVl50yzJEM34dtBZ1CTVlcJhIj0yo
+LEZkt+zKEz5C3/D5OgM2DoclUInAvPeIGXvOgoQi9he4YjMppC3fmcA9O+sJ8XFh
+dqNxcI+ddcvg84g4ntC2iJb8OOX75xkkoIsJXhZgwxBbdnwINNY6Eqqyx2lMvGRI
+BLTSxLKsfX/mCmW9mwNrKxfrBIb107ldxwfo+13/Vh45nIlhM0yxfhlukHmYEHp+
+G+T+aD67t0HHZHr27M2x0qTdKkRoI+7xYTUvu+OmObJej48UDhi4GMAjQ61TeLm1
+OyetyMoKpB+Cah1n0O5j6nDPRJBS9OPi361DIZRhlg4IkrbIP5MHs+Zvof8O04xq
+GRfYAqEhT6rP98TidpHVhFEV3CrDLVDJLZ3Vqglj2iyNOjEjF1GJJBaFWUoXhKPs
+WVZMfgpkaXRwng6r6ieRmmt/Ci//JV6ztkwKk7e0OQJBqbwA0A7lqx7j2QARAQAB
+tCVIYWliaW4gTGluIDxsaW5oYWliaW4uZXJpY0BnbWFpbC5jb20+iQJOBBMBCAA4
+FiEEMx6aXtcn+t1CmyiU8vHqtYnrz7EFAloTp3YCGwMFCwkIBwIGFQgJCgsCBBYC
+AwECHgECF4AACgkQ8vHqtYnrz7GFWA//Z6YTxtlZSHFlqkAFFOsDtV3DghSC8zJe
+LRm508fZn53e9a3fUvT9U1sUfW8DI69GRK+IBkvP5hcmMb1U4N3MxzX4YC/13wMY
+3BtUbCIpD8uBJOtuC7fPAH//Ij/4wv4Fp1/3WL6y04+mJIayMyKqmc3nBLD0rVWC
+AHEsPR7tiDDMltrzxMNHIJCDaiClJzKiCrQ4owKBOnY2TU/E64xyk5IwAczz2lCY
+712h6+q2mO7F672Yt6b6pqmugnFqWdqUj9dx1V9x//4y/k0DefF7G/1Lk1lh4Eyo
+aUx3jve/74Y87ICW1AhR2/TvdfWbsAkPyfy98k1SLR/9BulSIXIFeduxaFl7M3D8
+98aB5pqO8tPl2BFUJwh/uywDx0994MjQ8Xvrjmb9WJOAx9OyokivVCvmqJOkBzve
+Fk/4KUHTFTGQCoXbbBlIQTC9hBd8c1S4t0gFGbcjlqTvr/ZnTdpSgbzZ/96/SVRm
+dYOgjjpkrBOZgJPwsmmRQ2MufeZUtmkFSqdIRLGBNTefsMDDCGvyNeR/XCgM5Zfy
+39PX/GHFKgq5Ei2ywEyZOGLCK5MwA12fMExYoedazFFjv6ApGpz+j831A2z/crEo
+bRpVvd+rFzGnCKDq5viUD7cRzIPLVltYCNEayEgWta4KI+00/ayaaT6sM7N7oM32
+r01Wv02FvdG5Ag0EWhOndgEQAPiiTvmo9fZNW/5IxL7kDR6u9FEmEb2EZI+KxzbN
+RYYY0IPsnA8TY9Rzj9D7xV8Vmf2Pd5SUyCtVwLfBKhadLh755NeehNXWIbW802gH
+bvbykL/Zcn98oiLOVfK/Op/6MVpDuGXZ6CpDbQDSn6ne6/CWQnoz1+Wo+wbs1TOy
+AhO6xKa20NtGIZrfZD01dSzRC5DMJD3GK1j6HdVUz5piwiTsGvGRJ3ZLfObdlHGn
+CTMA39Jb8zQ0QtWPsOre0Nz2JQ53awMBaUhan5MeoOYp6ccsgD1BigyxmKb8iIDN
+NM/Iwi0Ib5L4AiGh6fQFf0WF8p74yIn1WgFcWxJXR1ZzvMDDHXqq97SQtbr9FKhu
+xrceh/92Ga4ruAJRCbMtmOTUP4APTeT4csANdgJxtW+I4QAp01BQSl75pB2QDlam
++tqePQDboAGc78Ck6096wML0ZMKDDxXPrI67uppuM02FYuJ41ZQjOytigeoGS88g
+ByZwPcFIT+5XgtNC0BH7U9VIkiap5U00lykzEjcRjrZTtKqHdeFPbSEpv1QfIcLG
+Ra439g9acRHX82sVzhzZk5uu9QKyDN1EpuWoLOaOrICHcMSC7GkVXS8+/7TX0vAN
+vn/51fb+tHJekGfaPhsPuIbSba2kmUy8sSS/6JJHkJ1aEFigAPbwUbZTqNlb4IRm
+FBVBABEBAAGJAjYEGAEIACAWIQQzHppe1yf63UKbKJTy8eq1ievPsQUCWhOndgIb
+DAAKCRDy8eq1ievPsbrpEACQ8HqAvq3NuiM00WyHla7VtghCWVEmRozbYc4dR7u+
+sTQrVgbLfgR5zeSWCMHpEcaN/RS58O/i1Dk0DLHTu3NrarzrkEPlHwIgJQ7orxFD
+YW3Z2Ytk40uKex4ou/8VzvXTpj1u8d/GHgGdvChBmtw5FaMgc8PBi4FnlIS5cAGU
+1ca1RwMX0WpFsp9HgrQLVxgkDs/m7oRSmC5GvPDIpb5S9QFzJKYKTJxSfXXO6hCk
+FGAGHWjVC26a/wSUtZQfb3G9sYZJuKUOwr4tpz1y6Ronc34cZYi1FlKWJuz01w4s
+4PKjFG/wbYSd+QLfftyyVPMLdY+wCwc8O59QqKx5Rj8HQLxIwSL3chhmdAHCmejM
+zKCpkFyLOc6+Wjet6hD6X3EsjIee1AAy22D24EaLJsju9zR/khJFS4K76aQX7dYN
+aB3C7S5HGxvYGSqfnn4eBaEzrSOde7HEcqYpYKxS+jB1c4X4W91NSTsqDd0QJMVF
+35eKfhWj+X6jWIC+48kfzypXdOCnPbto7wrr40yYCHw3XSXj40H5dWSsWEZVmS+s
+Dzz6zy9maHVyXa/rNsL7OjqimtKad65r/wfSFPPIcR1jJfP4GMNHV0TYqxdyDaXg
+iEVpHzOV7gd75fJbOvoNxNZj20Yj5sg8OCwbv8PxLXEcBFs7hhjQMhVRsjpNYzAR
+Iw==
+=rMlc
+-----END PGP PUBLIC KEY BLOCK-----

-- 
To stop receiving notification emails like this one, please contact
['"comm...@mxnet.apache.org" '].


[GitHub] reminisce opened a new pull request #8758: Explicitly convert float value

2017-11-21 Thread GitBox
reminisce opened a new pull request #8758: Explicitly convert float value
URL: https://github.com/apache/incubator-mxnet/pull/8758
 
 
   ## Description ##
   Fixed the failure reported in this issue.
   https://github.com/apache/incubator-mxnet/issues/8668#issuecomment-346202259
   
   ## Checklist ##
   ### Essentials ###
   - [ ] Passed code style checking (`make lint`)
   - [ ] Changes are complete (i.e. I finished coding on this PR)
   - [ ] All changes have test coverage
   - [ ] For user-facing API changes, API doc string has been updated. For new 
C++ functions in header files, their functionalities and arguments are 
well-documented. 
   - [ ] To my best knowledge, examples are either not affected by this change, 
or have been fixed to be compatible with this change
   
   ### Changes ###
   - [ ] Feature1, tests, (and when applicable, API doc)
   - [ ] Feature2, tests, (and when applicable, API doc)
   
   ## Comments ##
   - If this change is a backward incompatible change, why must this change be 
made.
   - Interesting edge cases to note here
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] szha commented on issue #7443: tf.boolean_mask equivalent in MxNet

2017-11-21 Thread GitBox
szha commented on issue #7443: tf.boolean_mask equivalent in MxNet
URL: 
https://github.com/apache/incubator-mxnet/issues/7443#issuecomment-346204562
 
 
   This issue is closed due to lack of activity in the last 90 days. Feel free 
to ping me to reopen if this is still an active issue. Thanks!
   Also, do please check out our [forum](https://discuss.mxnet.io/) (and 
[Chinese version](https://discuss.gluon.ai/)) for general "how-to" questions.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] szha closed issue #7443: tf.boolean_mask equivalent in MxNet

2017-11-21 Thread GitBox
szha closed issue #7443: tf.boolean_mask equivalent in MxNet
URL: https://github.com/apache/incubator-mxnet/issues/7443
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] cjolivier01 commented on issue #8668: Precision error setting NDArray from np.float32 scalar

2017-11-21 Thread GitBox
cjolivier01 commented on issue #8668: Precision error setting NDArray from 
np.float32 scalar
URL: 
https://github.com/apache/incubator-mxnet/issues/8668#issuecomment-346202259
 
 
   Conversion appears to generate strings of a fixed precision, thus truncating 
the value.
   @szha is investigating now.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] mbaijal commented on issue #8401: Make make lint compatible with python3 (don't call python2 explicitly)

2017-11-21 Thread GitBox
mbaijal commented on issue #8401: Make make lint compatible with python3 (don't 
call python2 explicitly)
URL: https://github.com/apache/incubator-mxnet/pull/8401#issuecomment-346202027
 
 
   A new build got triggered (it says the PR was updated?).
   So now it needs to pass the build again before it can be merged, I think.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] larroy commented on issue #8401: Make make lint compatible with python3 (don't call python2 explicitly)

2017-11-21 Thread GitBox
larroy commented on issue #8401: Make make lint compatible with python3 (don't 
call python2 explicitly)
URL: https://github.com/apache/incubator-mxnet/pull/8401#issuecomment-346201162
 
 
   @mli @piiswrong can we get this merged, please? It has been lingering here 
for a month.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] larroy commented on issue #8737: Use RAII and fix Coverity resource leaks #10371 and others

2017-11-21 Thread GitBox
larroy commented on issue #8737: Use RAII and fix Coverity resource leaks 
#10371 and others
URL: https://github.com/apache/incubator-mxnet/pull/8737#issuecomment-346198211
 
 
   @piiswrong isn't it ok to do it before 1.0? What do you suggest then?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] reminisce commented on issue #8668: Precision error setting NDArray from np.float32 scalar

2017-11-21 Thread GitBox
reminisce commented on issue #8668: Precision error setting NDArray from 
np.float32 scalar
URL: 
https://github.com/apache/incubator-mxnet/issues/8668#issuecomment-346197743
 
 
   When assigning a scalar value to an NDArray, the scalar is passed to the 
backend as a parameter rather than as an `NDArray`. This means the scalar is 
converted to a Python string and then to a `ctypes.c_char_p` in the frontend. 
After tracing the code, it was found that the scalar's last two digits had been 
truncated by the time it reached here: 
https://github.com/apache/incubator-mxnet/blob/master/src/c_api/c_api_ndarray.cc#L140
   
   while it was still good in the frontend before reaching here:
   
https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/_ctypes/ndarray.py#L83
   
   So it looks like a bug in `ctypes` converting floats to strings.
   
   @cjolivier01 is looking into this for more details.
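   
   A simplified sketch of that round trip (scalar to Python str to 
`ctypes.c_char_p` and back); the names below are illustrative rather than the 
actual MXNet internals, but they show where digits can get dropped if the 
string conversion uses limited precision:
   
   ```python
   import ctypes
   import numpy as np

   scalar = np.float32(0.123456789)          # value that needs float32's full precision

   # Frontend side: the scalar becomes text before crossing the C API boundary.
   as_text = str(scalar)
   c_arg = ctypes.c_char_p(as_text.encode('utf-8'))

   # Backend side (simulated): the string is parsed back into a float.
   round_tripped = np.float32(c_arg.value.decode('utf-8'))

   print(repr(scalar), as_text, repr(round_tripped))
   print('lossless round trip:', scalar == round_tripped)
   ```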


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] sxjscience opened a new pull request #8757: [ImageIO] Fix image io for opencv3.3

2017-11-21 Thread GitBox
sxjscience opened a new pull request #8757: [ImageIO] Fix image io for opencv3.3
URL: https://github.com/apache/incubator-mxnet/pull/8757
 
 
   ## Description ##
   Starting from OpenCV 3.3, the `imdecode` function automatically rotates the 
image based on the EXIF orientation flag.
   ```c++
   Mat imdecode( InputArray _buf, int flags )
   {
   CV_TRACE_FUNCTION();
   
   Mat buf = _buf.getMat(), img;
    imdecode_( buf, flags, LOAD_MAT, &img );
   
   /// optionally rotate the data if EXIF' orientation flag says so
   if( !img.empty() && (flags & IMREAD_IGNORE_ORIENTATION) == 0 && flags != 
IMREAD_UNCHANGED )
   {
   ApplyExifOrientation(buf, img);
   }
   
   return img;
   }
   ```
   
   This can be disabled by setting the `cv::IMREAD_IGNORE_ORIENTATION` flag.
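   
   For reference, a user-level sketch of the same behavior through OpenCV's 
Python bindings (this PR itself only changes the C++ backend); it assumes a 
cv2 build that exposes `IMREAD_IGNORE_ORIENTATION` and a hypothetical 
`photo.jpg` carrying an EXIF orientation tag:
   
   ```python
   import cv2
   import numpy as np

   with open('photo.jpg', 'rb') as f:            # hypothetical EXIF-rotated JPEG
       buf = np.frombuffer(f.read(), dtype=np.uint8)

   # Default decode: OpenCV 3.3 may auto-rotate according to the EXIF flag.
   rotated = cv2.imdecode(buf, cv2.IMREAD_COLOR)

   # With the flag set, the pixels come back exactly as stored.
   raw = cv2.imdecode(buf, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION)

   print(rotated.shape, raw.shape)               # shapes differ if a rotation was applied
   ```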
   
   ## Checklist ##
   ### Essentials ###
   - [x] Passed code style checking (`make lint`)
   - [x] Changes are complete (i.e. I finished coding on this PR)
   - [ ] All changes have test coverage
   - [x] For user-facing API changes, API doc string has been updated. For new 
C++ functions in header files, their functionalities and arguments are 
well-documented. 
   - [x] To my best knowledge, examples are either not affected by this change, 
or have been fixed to be compatible with this change
   
   ### Changes ###
   - [ ] add `IMREAD_IGNORE_ORIENTATION` flag when opencv 3.3 is used.
   
   ## Comments ##
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] eric-haibin-lin closed issue #8623: How to set different learning rate to different layers with pretained model which only have symbols json file and params file?

2017-11-21 Thread GitBox
eric-haibin-lin closed issue #8623: How to set different learning rate to 
different layers with pretained model which only have symbols json file and 
params file?
URL: https://github.com/apache/incubator-mxnet/issues/8623
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

