[GitHub] adaaaaaa commented on issue #8671: Discussion and troubleshooting on PyPI (pip) installation

2018-01-29 Thread GitBox
adaaaaaa commented on issue #8671: Discussion and troubleshooting on PyPI (pip) installation
URL: 
https://github.com/apache/incubator-mxnet/issues/8671#issuecomment-361505960
 
 
   ?@?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] shuokay commented on issue #9600: How to implement an efficient DataIter for image segmentation in mxnet?

2018-01-29 Thread GitBox
shuokay commented on issue #9600: How to implement an efficient DataIter for 
image segmentation in mxnet?
URL: 
https://github.com/apache/incubator-mxnet/issues/9600#issuecomment-361492059
 
 
   You can try to preload all data into memory in advance.
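The suggestion above can be sketched framework-free (a hypothetical `PreloadedIter`, not mxnet's `DataIter` API): pay the per-sample decoding/I/O cost once at construction, then serve shuffled batches from memory on every epoch.

```python
import random

class PreloadedIter:
    """Sketch of a data iterator that loads every sample into memory once.

    `load_sample` stands in for the per-sample disk I/O a real iterator does
    (e.g. decoding an image and its segmentation mask); it runs once per
    sample at construction time instead of once per epoch.
    """

    def __init__(self, sample_ids, load_sample, batch_size, shuffle=True):
        self.data = [load_sample(i) for i in sample_ids]  # pay I/O cost once
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __iter__(self):
        order = list(range(len(self.data)))
        if self.shuffle:
            random.shuffle(order)
        # Serve full batches from the in-memory cache; no disk access here.
        for start in range(0, len(order) - self.batch_size + 1, self.batch_size):
            yield [self.data[j] for j in order[start:start + self.batch_size]]

# Usage: "loading" is a stub that returns (image, mask) placeholders.
it = PreloadedIter(range(10), lambda i: (f"img{i}", f"mask{i}"), batch_size=4)
batches = list(it)  # 2 full batches of 4; the last 2 samples are dropped
```

This trades memory for speed, so it only works when the whole data set fits in RAM; for larger segmentation data sets a prefetching/background-loading iterator is the usual alternative.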




[GitHub] marcoabreu commented on issue #9318: simplify import of citation metadata

2018-01-29 Thread GitBox
marcoabreu commented on issue #9318: simplify import of citation metadata
URL: https://github.com/apache/incubator-mxnet/pull/9318#issuecomment-361491226
 
 
   @mli 




[GitHub] marcoabreu commented on issue #9616: Removing a broken tutorial from the nightly tests

2018-01-29 Thread GitBox
marcoabreu commented on issue #9616: Removing a broken tutorial from the 
nightly tests
URL: https://github.com/apache/incubator-mxnet/pull/9616#issuecomment-361490928
 
 
   @eric-haibin-lin 




[GitHub] marcoabreu commented on issue #9601: MXNET not working in R

2018-01-29 Thread GitBox
marcoabreu commented on issue #9601: MXNET not working in R
URL: 
https://github.com/apache/incubator-mxnet/issues/9601#issuecomment-361489975
 
 
   Duplicate of https://github.com/apache/incubator-mxnet/issues/9602




[GitHub] marcoabreu closed issue #9601: MXNET not working in R

2018-01-29 Thread GitBox
marcoabreu closed issue #9601: MXNET not working in R
URL: https://github.com/apache/incubator-mxnet/issues/9601
 
 
   




[incubator-mxnet] branch master updated: Fix skipping error in docstr and API docs (#9626)

2018-01-29 Thread jxie
This is an automated email from the ASF dual-hosted git repository.

jxie pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git


The following commit(s) were added to refs/heads/master by this push:
 new 5e0a0b0  Fix skipping error in docstr and API docs (#9626)
5e0a0b0 is described below

commit 5e0a0b0bd54cdeb92321f958bf964ddc8aca94e9
Author: Aston Zhang <22279212+astonzh...@users.noreply.github.com>
AuthorDate: Mon Jan 29 22:25:39 2018 -0800

Fix skipping error in docstr and API docs (#9626)

* Fix skipping error in docstr

* update
---
 docs/api/python/contrib/text.md        | 28 ++++++++++++++--------------
 python/mxnet/contrib/text/embedding.py |  8 ++++----
 python/mxnet/contrib/text/vocab.py     |  2 +-
 3 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/docs/api/python/contrib/text.md b/docs/api/python/contrib/text.md
index f203a11..8bd67d2 100644
--- a/docs/api/python/contrib/text.md
+++ b/docs/api/python/contrib/text.md
@@ -138,11 +138,11 @@ data set.
 
 The obtained `counter` has key-value pairs whose keys are words and values are word frequencies.
 Suppose that we want to build indices for the 2 most frequent keys in `counter` with the unknown
-token representation '<unk>' and a reserved token '<pad>'.
+token representation 'unk' and a reserved token 'pad'.
 
 ```python
->>> my_vocab = text.vocab.Vocabulary(counter, most_freq_count=2, unknown_token='<unk>', 
-... reserved_tokens=['<pad>'])
+>>> my_vocab = text.vocab.Vocabulary(counter, most_freq_count=2, unknown_token='unk', 
+... reserved_tokens=['pad'])
 
 ```
 
@@ -153,18 +153,18 @@ of any unknown token) and `reserved_tokens`.
 
 ```python
 >>> my_vocab.token_to_idx
-{'<unk>': 0, '<pad>': 1, 'world': 2, 'hello': 3}
+{'unk': 0, 'pad': 1, 'world': 2, 'hello': 3}
 >>> my_vocab.idx_to_token
-['<unk>', '<pad>', 'world', 'hello']
+['unk', 'pad', 'world', 'hello']
 >>> my_vocab.unknown_token
-'<unk>'
+'unk'
 >>> my_vocab.reserved_tokens
-['<pad>']
+['pad']
 >>> len(my_vocab)
 4
 ```
 
-Besides the specified unknown token '<unk>' and reserved_token '<pad>' are indexed, the 2 most
+Besides the specified unknown token 'unk' and reserved_token 'pad' are indexed, the 2 most
 frequent words 'world' and 'hello' are also indexed.
 
 
@@ -259,9 +259,9 @@ We can also access properties such as `token_to_idx` (mapping tokens to indices)
 
 ```python
 >>> my_embedding.token_to_idx
-{'<unk>': 0, 'world': 1, 'hello': 2}
+{'unk': 0, 'world': 1, 'hello': 2}
 >>> my_embedding.idx_to_token
-['<unk>', 'world', 'hello']
+['unk', 'world', 'hello']
 >>> len(my_embedding)
 3
 >>> my_embedding.vec_len
@@ -302,7 +302,7 @@ word embedding file, we do not need to specify any vocabulary.
 
 We can access properties such as `token_to_idx` (mapping tokens to indices), `idx_to_token` (mapping
 indices to tokens), `vec_len` (length of each embedding vector), and `unknown_token` (representation
-of any unknown token, default value is '<unk>').
+of any unknown token, default value is 'unk').
 
 ```python
 >>> my_embedding.token_to_idx['nice']
@@ -312,15 +312,15 @@ of any unknown token, default value is '<unk>').
 >>> my_embedding.vec_len
 300
 >>> my_embedding.unknown_token
-'<unk>'
+'unk'
 
 ```
 
-For every unknown token, if its representation '<unk>' is encountered in the pre-trained token
+For every unknown token, if its representation 'unk' is encountered in the pre-trained token
 embedding file, index 0 of property `idx_to_vec` maps to the pre-trained token embedding vector
 loaded from the file; otherwise, index 0 of property `idx_to_vec` maps to the default token
 embedding vector specified via `init_unknown_vec` (set to nd.zeros here). Since the pre-trained file
-does not have a vector for the token '<unk>', index 0 has to map to an additional token '<unk>' and
+does not have a vector for the token 'unk', index 0 has to map to an additional token 'unk' and
 the number of tokens in the embedding is 111,052.
 
 
diff --git a/python/mxnet/contrib/text/embedding.py b/python/mxnet/contrib/text/embedding.py
index 4fc6aac..961fbb0 100644
--- a/python/mxnet/contrib/text/embedding.py
+++ b/python/mxnet/contrib/text/embedding.py
@@ -646,12 +646,12 @@ class CustomEmbedding(_TokenEmbedding):
 
     This is to load embedding vectors from a user-defined pre-trained text embedding file.
 
-    Denote by '<ed>' the argument `elem_delim`. Denote by <v_ij> the j-th element of the token
-    embedding vector for <token_i>, the expected format of a custom pre-trained token embedding file
+    Denote by '[ed]' the argument `elem_delim`. Denote by [v_ij] the j-th element of the token
+    embedding vector for [token_i], the expected format of a custom pre-trained token embedding file
     is:
 
-    '<token_1><ed><v_11><ed><v_12><ed>...<ed><v_1k>\n<token_2><ed><v_21><ed><v_22><ed>...<ed><v_2k>
-    \n...'
+    '[token_1][ed][v_11][ed][v_12][ed]...[ed][v_1k]\n[token_2][ed][v_21][ed][v_22][ed]...[ed]
+    [v_2k]\n...'
 
     where k is the length of the embedding vector `vec_len`.
 
diff --git a/python/mxnet/contrib/text/vocab.py 

[GitHub] piiswrong closed pull request #9626: Fix skipping error in docstr and API docs

2018-01-29 Thread GitBox
piiswrong closed pull request #9626: Fix skipping error in docstr and API docs
URL: https://github.com/apache/incubator-mxnet/pull/9626
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/docs/api/python/contrib/text.md b/docs/api/python/contrib/text.md
index f203a117ba..8bd67d2b50 100644
--- a/docs/api/python/contrib/text.md
+++ b/docs/api/python/contrib/text.md
@@ -138,11 +138,11 @@ data set.
 
 The obtained `counter` has key-value pairs whose keys are words and values are word frequencies.
 Suppose that we want to build indices for the 2 most frequent keys in `counter` with the unknown
-token representation '<unk>' and a reserved token '<pad>'.
+token representation 'unk' and a reserved token 'pad'.
 
 ```python
->>> my_vocab = text.vocab.Vocabulary(counter, most_freq_count=2, unknown_token='<unk>', 
-... reserved_tokens=['<pad>'])
+>>> my_vocab = text.vocab.Vocabulary(counter, most_freq_count=2, unknown_token='unk', 
+... reserved_tokens=['pad'])
 
 ```
 
@@ -153,18 +153,18 @@ of any unknown token) and `reserved_tokens`.
 
 ```python
 >>> my_vocab.token_to_idx
-{'<unk>': 0, '<pad>': 1, 'world': 2, 'hello': 3}
+{'unk': 0, 'pad': 1, 'world': 2, 'hello': 3}
 >>> my_vocab.idx_to_token
-['<unk>', '<pad>', 'world', 'hello']
+['unk', 'pad', 'world', 'hello']
 >>> my_vocab.unknown_token
-'<unk>'
+'unk'
 >>> my_vocab.reserved_tokens
-['<pad>']
+['pad']
 >>> len(my_vocab)
 4
 ```
 
-Besides the specified unknown token '<unk>' and reserved_token '<pad>' are indexed, the 2 most
+Besides the specified unknown token 'unk' and reserved_token 'pad' are indexed, the 2 most
 frequent words 'world' and 'hello' are also indexed.
 
 
@@ -259,9 +259,9 @@ We can also access properties such as `token_to_idx` (mapping tokens to indices)
 
 ```python
 >>> my_embedding.token_to_idx
-{'<unk>': 0, 'world': 1, 'hello': 2}
+{'unk': 0, 'world': 1, 'hello': 2}
 >>> my_embedding.idx_to_token
-['<unk>', 'world', 'hello']
+['unk', 'world', 'hello']
 >>> len(my_embedding)
 3
 >>> my_embedding.vec_len
@@ -302,7 +302,7 @@ word embedding file, we do not need to specify any vocabulary.
 
 We can access properties such as `token_to_idx` (mapping tokens to indices), `idx_to_token` (mapping
 indices to tokens), `vec_len` (length of each embedding vector), and `unknown_token` (representation
-of any unknown token, default value is '<unk>').
+of any unknown token, default value is 'unk').
 
 ```python
 >>> my_embedding.token_to_idx['nice']
@@ -312,15 +312,15 @@ of any unknown token, default value is '<unk>').
 >>> my_embedding.vec_len
 300
 >>> my_embedding.unknown_token
-'<unk>'
+'unk'
 
 ```
 
-For every unknown token, if its representation '<unk>' is encountered in the pre-trained token
+For every unknown token, if its representation 'unk' is encountered in the pre-trained token
 embedding file, index 0 of property `idx_to_vec` maps to the pre-trained token embedding vector
 loaded from the file; otherwise, index 0 of property `idx_to_vec` maps to the default token
 embedding vector specified via `init_unknown_vec` (set to nd.zeros here). Since the pre-trained file
-does not have a vector for the token '<unk>', index 0 has to map to an additional token '<unk>' and
+does not have a vector for the token 'unk', index 0 has to map to an additional token 'unk' and
 the number of tokens in the embedding is 111,052.
 
 
diff --git a/python/mxnet/contrib/text/embedding.py b/python/mxnet/contrib/text/embedding.py
index 4fc6aacf67..961fbb02a8 100644
--- a/python/mxnet/contrib/text/embedding.py
+++ b/python/mxnet/contrib/text/embedding.py
@@ -646,12 +646,12 @@ class CustomEmbedding(_TokenEmbedding):
 
     This is to load embedding vectors from a user-defined pre-trained text embedding file.
 
-    Denote by '<ed>' the argument `elem_delim`. Denote by <v_ij> the j-th element of the token
-    embedding vector for <token_i>, the expected format of a custom pre-trained token embedding file
+    Denote by '[ed]' the argument `elem_delim`. Denote by [v_ij] the j-th element of the token
+    embedding vector for [token_i], the expected format of a custom pre-trained token embedding file
     is:
 
-    '<token_1><ed><v_11><ed><v_12><ed>...<ed><v_1k>\n<token_2><ed><v_21><ed><v_22><ed>...<ed><v_2k>
-    \n...'
+    '[token_1][ed][v_11][ed][v_12][ed]...[ed][v_1k]\n[token_2][ed][v_21][ed][v_22][ed]...[ed]
+    [v_2k]\n...'
 
     where k is the length of the embedding vector `vec_len`.
 
diff --git a/python/mxnet/contrib/text/vocab.py b/python/mxnet/contrib/text/vocab.py
index 04c3326841..9e44acb101 100644
--- a/python/mxnet/contrib/text/vocab.py
+++ b/python/mxnet/contrib/text/vocab.py
@@ -52,7 +52,7 @@ class Vocabulary(object):
         argument has no effect.
     min_freq : int, default 1
         The minimum frequency required for a token in the keys of `counter` to be indexed.
-    unknown_token : hashable object, default '<unk>'
+    unknown_token : hashable object, default 'unk'
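The indexing rule the patched docs describe — index 0 for the unknown token, reserved tokens next, then the most frequent counter keys — can be sketched in plain Python. This is a hypothetical minimal re-implementation for illustration, not the `mxnet.contrib.text` API:

```python
from collections import Counter

def build_vocab(counter, most_freq_count, unknown_token, reserved_tokens):
    """Index 0 is the unknown token, reserved tokens come next, then the
    `most_freq_count` most frequent keys of `counter`, highest frequency first."""
    idx_to_token = [unknown_token] + list(reserved_tokens)
    # Counter.most_common sorts by frequency, highest first.
    for token, _ in counter.most_common(most_freq_count):
        idx_to_token.append(token)
    token_to_idx = {token: idx for idx, token in enumerate(idx_to_token)}
    return idx_to_token, token_to_idx

# Mirrors the shape of the docs' example: 'world' is more frequent than 'hello',
# and 'nice' falls outside the top 2, so it maps to the unknown token's index.
counter = Counter(['world', 'world', 'world', 'hello', 'hello', 'nice'])
idx_to_token, token_to_idx = build_vocab(counter, 2, '<unk>', ['<pad>'])
# idx_to_token == ['<unk>', '<pad>', 'world', 'hello']
```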
   

[GitHub] nehaljwani commented on issue #9271: Windows: Python prompt gets stuck on exit()

2018-01-29 Thread GitBox
nehaljwani commented on issue #9271: Windows: Python prompt gets stuck on exit()
URL: 
https://github.com/apache/incubator-mxnet/issues/9271#issuecomment-354642541
 
 
   ***EDIT*** This doesn't work for all cases
   
   If I comment out 
   ```python
   atexit.register(_notify_shutdown)
   ```
at 
https://github.com/apache/incubator-mxnet/blob/1.0.0/python/mxnet/base.py#L394 
the problem goes away. Of course, that's just a workaround.
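The effect of that workaround can be reproduced with the standard library alone: `atexit.unregister` (Python 3) removes a registered hook just as commenting out the `atexit.register` call would, without editing the installed package. The `notify_shutdown` function below is a stand-in for mxnet's `_notify_shutdown`, not the real hook:

```python
import subprocess
import sys
import textwrap

# A child interpreter registers a shutdown hook and then unregisters it;
# atexit.unregister has the same effect as commenting out the register call.
script = textwrap.dedent("""
    import atexit

    def notify_shutdown():              # placeholder for the library's hook
        print("shutdown hook ran")

    atexit.register(notify_shutdown)
    atexit.unregister(notify_shutdown)  # the workaround, applied at runtime
    print("exiting cleanly")
""")

result = subprocess.run([sys.executable, "-c", script],
                        capture_output=True, text=True, timeout=30)
```

If the real hook hangs the interpreter on exit, the unregistered run terminates immediately; here the stand-in hook merely prints, so its absence from the output shows it never ran.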




[GitHub] reminisce commented on a change in pull request #9552: [REQUEST FOR REVIEW | DO NOT MERGE] Model Quantization with Calibration

2018-01-29 Thread GitBox
reminisce commented on a change in pull request #9552: [REQUEST FOR REVIEW | DO 
NOT MERGE] Model Quantization with Calibration
URL: https://github.com/apache/incubator-mxnet/pull/9552#discussion_r164638902
 
 

 ##
 File path: python/mxnet/quantization.py
 ##
 @@ -0,0 +1,467 @@
+# Licensed to the Apache Software Foundation (ASF) under one
 
 Review comment:
   Good catch. I will do that.




[GitHub] marcoabreu commented on issue #9560: Expand gpu-kernel-launch synchronous error checking.

2018-01-29 Thread GitBox
marcoabreu commented on issue #9560: Expand gpu-kernel-launch synchronous error 
checking.
URL: https://github.com/apache/incubator-mxnet/pull/9560#issuecomment-361467952
 
 
   Thanks, your approach seems like the perfect way to go! 




[GitHub] DickJC123 commented on issue #9560: Expand gpu-kernel-launch synchronous error checking.

2018-01-29 Thread GitBox
DickJC123 commented on issue #9560: Expand gpu-kernel-launch synchronous error 
checking.
URL: https://github.com/apache/incubator-mxnet/pull/9560#issuecomment-361466348
 
 
   The main python thread is protected by the try-catch created by the 
API_BEGIN() and API_END() macros in src/c_api_common.h.  Any exceptions thrown 
in separate worker threads are not caught by this try-catch, so those 
exceptions cause terminate() to be called which kills the CI.  One thing I 
considered was setting up MXNET_ENGINE_TYPE=NaiveEngine which performs all 
actions on the one python thread.  The problem with this approach is that the 
Engine is a singleton and the environment variable is only checked when the 
first test of the CI calls for an engine. Once again, one needs a separate 
process to get a fresh engine singleton for a particular test.
   
   So I've today finished a bit of polishing on the way to create the separate 
test process.  I've confirmed that the test runs properly under both Win and 
Linux Python3 CI environments, and is appropriately skipped for Python2 
environments, where the needed fork-exec support is missing.
   
   I'm satisfied with the PR as it stands now, and hope it will be considered 
for merging.
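The fork-exec pattern described above can be sketched with the standard library (a hypothetical helper, not the PR's actual code): each test runs in a fresh interpreter, so it reads `MXNET_ENGINE_TYPE` before any engine singleton exists, and a hard crash (simulated here with `os.abort()`, standing in for `terminate()` from a worker thread) fails only the child.

```python
import os
import subprocess
import sys

def run_test_in_subprocess(env_overrides, code):
    """Fork-exec a fresh Python so the test gets its own process state and a
    crash cannot take down the test runner; returns True if the test passed."""
    env = dict(os.environ, **env_overrides)
    proc = subprocess.run([sys.executable, "-c", code],
                          env=env, capture_output=True, text=True, timeout=60)
    return proc.returncode == 0

# A crashing "test" dies with SIGABRT in the child, not in this process...
crashed_ok = run_test_in_subprocess({"MXNET_ENGINE_TYPE": "NaiveEngine"},
                                    "import os; os.abort()")
# ...while a clean test sees the environment variable it was spawned with,
# the way a freshly created engine singleton would at startup.
passed = run_test_in_subprocess(
    {"MXNET_ENGINE_TYPE": "NaiveEngine"},
    "import os; assert os.environ['MXNET_ENGINE_TYPE'] == 'NaiveEngine'")
```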




[GitHub] anirudhacharya commented on issue #9623: A tutorial for Word embeddings using mxnet

2018-01-29 Thread GitBox
anirudhacharya commented on issue #9623: A tutorial for Word embeddings using 
mxnet
URL: 
https://github.com/apache/incubator-mxnet/issues/9623#issuecomment-361464302
 
 
   Is there already a PR in review?




[GitHub] astonzhang commented on issue #9623: A tutorial for Word embeddings using mxnet

2018-01-29 Thread GitBox
astonzhang commented on issue #9623: A tutorial for Word embeddings using mxnet
URL: 
https://github.com/apache/incubator-mxnet/issues/9623#issuecomment-361462638
 
 
   I think @zackchase is doing it.




[GitHub] astonzhang opened a new pull request #9626: Fix skipping error in docstr and API docs

2018-01-29 Thread GitBox
astonzhang opened a new pull request #9626: Fix skipping error in docstr and 
API docs
URL: https://github.com/apache/incubator-mxnet/pull/9626
 
 
   ## Description ##
   Fix skipping error in docstr and API docs
   
   ## Checklist ##
   ### Essentials ###
   - [x] Passed code style checking (`make lint`)
   - [x] Changes are complete (i.e. I finished coding on this PR)
   - [x] All changes have test coverage:
   - Unit tests are added for small changes to verify correctness (e.g. adding 
a new operator)
   - Nightly tests are added for complicated/long-running ones (e.g. changing 
distributed kvstore)
   - Build tests will be added for build configuration changes (e.g. adding a 
new build option with NCCL)
   - [x] Code is well-documented: 
   - For user-facing API changes, API doc string has been updated. 
   - For new C++ functions in header files, their functionalities and arguments 
are documented. 
   - For new examples, README.md is added to explain what the example does, 
the source of the dataset, expected performance on the test set, and a reference 
to the original paper if applicable
   - [x] To the best of my knowledge, examples are either not affected by this 
change, or have been fixed to be compatible with this change
   
   ### Changes ###
   - [x] Fix skipping error in docstr and API docs
   
   ## Comments ##
   NA
   




[GitHub] ZiyueHuang commented on a change in pull request #9625: sparse regression operators

2018-01-29 Thread GitBox
ZiyueHuang commented on a change in pull request #9625: sparse regression 
operators
URL: https://github.com/apache/incubator-mxnet/pull/9625#discussion_r164629738
 
 

 ##
 File path: src/operator/regression_output-inl.h
 ##
 @@ -77,12 +78,41 @@ inline bool RegressionOpShape(const nnvm::NodeAttrs& attrs,
   return true;
 }
 
+template<int label_pos>
+inline bool RegressionInferStorageType(const nnvm::NodeAttrs& attrs,
+                                       const int dev_mask,
+                                       DispatchMode* dispatch_mode,
+                                       std::vector<int>* in_attrs,
+                                       std::vector<int>* out_attrs) {
+  const auto label_stype = in_attrs->at(label_pos);
+  auto& out_stype = out_attrs->at(0);
+  bool dispatched = false;
+  if (!dispatched && label_stype == kDefaultStorage) {
+    dispatched = storage_type_assign(&out_stype, kDefaultStorage,
+                                     dispatch_mode, DispatchMode::kFCompute);
+  }
+
+  if (!dispatched && label_stype == kCSRStorage) {
+    dispatched = storage_type_assign(&out_stype, kDefaultStorage,
+                                     dispatch_mode, DispatchMode::kFComputeEx);
+  }
+
+  if (!dispatched) {
+    dispatched = dispatch_fallback(out_attrs, dispatch_mode);
+  }
+  if (out_attrs->size() > 1) type_assign(&out_attrs->at(1), kDefaultStorage);
 
 Review comment:
   In backward pass, although we don't care about gradients of label, a storage 
type should be assigned to it. 




[GitHub] ZiyueHuang opened a new pull request #9625: sparse regression operators

2018-01-29 Thread GitBox
ZiyueHuang opened a new pull request #9625: sparse regression operators
URL: https://github.com/apache/incubator-mxnet/pull/9625
 
 
   ## Description ##
   Add sparse support for regression operators.
   
   cc @eric-haibin-lin 
   
   ## Checklist ##
   ### Essentials ###
   - [x] Passed code style checking (`make lint`)
   - [x] Changes are complete (i.e. I finished coding on this PR)
   - [x] All changes have test coverage:
   - Unit tests are added for small changes to verify correctness (e.g. adding 
a new operator)
   - Nightly tests are added for complicated/long-running ones (e.g. changing 
distributed kvstore)
   - Build tests will be added for build configuration changes (e.g. adding a 
new build option with NCCL)
   - [x] Code is well-documented: 
   - For user-facing API changes, API doc string has been updated. 
   - For new C++ functions in header files, their functionalities and arguments 
are documented. 
   - For new examples, README.md is added to explain what the example does, 
the source of the dataset, expected performance on the test set, and a reference 
to the original paper if applicable
   - [x] To the best of my knowledge, examples are either not affected by this 
change, or have been fixed to be compatible with this change
   
   ### Changes ###
   - [x] sparse regression ops (label is of csr storage type), unittest
   
   
   ## Comments ##
   - If this change is a backward incompatible change, why must this change be 
made.
   - Interesting edge cases to note here
   




[GitHub] marcoabreu commented on a change in pull request #9617: Remove uneeded gtest dependency, build with verbosely with CMake & ni?

2018-01-29 Thread GitBox
marcoabreu commented on a change in pull request #9617: Remove uneeded gtest 
dependency, build with verbosely with CMake & ni?
URL: https://github.com/apache/incubator-mxnet/pull/9617#discussion_r164627598
 
 

 ##
 File path: Jenkinsfile
 ##
 @@ -270,10 +270,8 @@ try {
 -DUSE_CUDNN=1  \
 -DCMAKE_BUILD_TYPE=Release \
 """
-def flag = """ \
--j\$(nproc)
-"""
-  cmake("build_cuda", defines, flag)
+def flag = "-v"
 
 Review comment:
   Is the number of build threads automatically determined?




[GitHub] marcoabreu commented on issue #9560: Expand gpu-kernel-launch synchronous error checking.

2018-01-29 Thread GitBox
marcoabreu commented on issue #9560: Expand gpu-kernel-launch synchronous error 
checking.
URL: https://github.com/apache/incubator-mxnet/pull/9560#issuecomment-361458481
 
 
   What's the problem if an exception is getting thrown in the current process? 
Would the process terminate without notice and proper error message or why 
exactly do you prefer creating a new process?




[GitHub] Soonhwan-Kwon commented on issue #9156: float64 data backward error using gluon

2018-01-29 Thread GitBox
Soonhwan-Kwon commented on issue #9156: float64 data backward error using gluon
URL: 
https://github.com/apache/incubator-mxnet/issues/9156#issuecomment-361166849
 
 
   Same error occurs when I use float16 and I'm not using gluon.
   "mxnet.base.MXNetError: [05:42:23] include/mxnet/././tensor_blob.h:217: Check failed: mshadow::DataType<DType>::kFlag == type_flag_ TBlob.get_with_shape: data type do not match specified type.Expected: 0 v.s. given 2"
   It also occurs only on backward; forward is fine.




[GitHub] Soonhwan-Kwon commented on issue #9156: float64 data backward error using gluon

2018-01-29 Thread GitBox
Soonhwan-Kwon commented on issue #9156: float64 data backward error using gluon
URL: 
https://github.com/apache/incubator-mxnet/issues/9156#issuecomment-361167158
 
 
   ...
   File "/data/ecg_2018/train.py", line 93, in do_training
     module.forward_backward(data_batch)
   File "/usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/module/base_module.py", line 192, in forward_backward
     self.backward()
   File "/usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/module/bucketing_module.py", line 444, in backward
     self._curr_module.backward(out_grads=out_grads)
   File "/usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/module/module.py", line 627, in backward
     self._exec_group.backward(out_grads=out_grads)
   File "/usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/module/executor_group.py", line 580, in backward
     exec_.backward(out_grads=out_grads_slice)
   File "/usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/executor.py", line 234, in backward
     ctypes.c_int(is_train)))
   File "/usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/base.py", line 146, in check_call
     raise MXNetError(py_str(_LIB.MXGetLastError()))
   mxnet.base.MXNetError: [05:42:23] include/mxnet/././tensor_blob.h:217: Check failed: mshadow::DataType<DType>::kFlag == type_flag_ TBlob.get_with_shape: data type do not match specified type.Expected: 0 v.s. given 2
   
   Stack trace returned 10 entries:
   [bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x5a) [0x7f03ecd9bcda]
   [bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f03ecd9c878]
   [bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/libmxnet.so(mshadow::half::half_t* mxnet::TBlob::dptr() const+0xd7) [0x7f03ecdb74a7]
   [bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/libmxnet.so(mshadow::Tensor mxnet::TBlob::get_with_shape(mshadow::Shape<3> const&, mshadow::Stream*) const+0x56c) [0x7f03ef94f84c]
   [bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/libmxnet.so(mxnet::op::SliceChannelOp::Backward(mxnet::OpContext const&, std::vector const&, std::vector const&, std::vector const&, std::vector const&, std::vector const&, std::vector const&)+0x9a7) [0x7f03f0ab9be7]
   [bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/libmxnet.so(mxnet::op::OperatorState::Backward(mxnet::OpContext const&, std::vector const&, std::vector const&, std::vector const&)+0x767) [0x7f03ef2f18a7]
   [bt] (6) /usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/libmxnet.so(mxnet::exec::StatefulComputeExecutor::Run(mxnet::RunContext, bool)+0x69) [0x7f03ef876429]
   [bt] (7) /usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/libmxnet.so(+0x339a050) [0x7f03ef849050]
   [bt] (8) /usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/libmxnet.so(std::_Function_handler::_M_invoke(std::_Any_data const&, mxnet::RunContext&&, mxnet::engine::CallbackOnComplete&&)+0x61) [0x7f03ef79ef61]
   [bt] (9) /usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/libmxnet.so(mxnet::engine::NaiveEngine::PushAsync(std::function, mxnet::Context, std::vector > const&, std::vector > const&, mxnet::FnProperty, int, char const*)+0x4da) [0x7f03ef7aac8a]




[GitHub] szha closed pull request #9514: Language Modeling Datasets and Sampler

2018-01-29 Thread GitBox
szha closed pull request #9514: Language Modeling Datasets and Sampler
URL: https://github.com/apache/incubator-mxnet/pull/9514
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/example/gluon/word_language_model/data.py 
b/example/gluon/word_language_model/data.py
deleted file mode 100644
index 913963ec20..00
--- a/example/gluon/word_language_model/data.py
+++ /dev/null
@@ -1,66 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-import os
-import numpy as np
-import mxnet as mx
-
-class Dictionary(object):
-    def __init__(self):
-        self.word2idx = {}
-        self.idx2word = []
-
-    def add_word(self, word):
-        if word not in self.word2idx:
-            self.idx2word.append(word)
-            self.word2idx[word] = len(self.idx2word) - 1
-        return self.word2idx[word]
-
-    def __len__(self):
-        return len(self.idx2word)
-
-
-class Corpus(object):
-    def __init__(self, path):
-        self.dictionary = Dictionary()
-        self.train = self.tokenize(path + 'train.txt')
-        self.valid = self.tokenize(path + 'valid.txt')
-        self.test = self.tokenize(path + 'test.txt')
-
-    def tokenize(self, path):
-        """Tokenizes a text file."""
-        assert os.path.exists(path)
-        # Add words to the dictionary
-        with open(path, 'r') as f:
-            tokens = 0
-            for line in f:
-                words = line.split() + ['']
-                tokens += len(words)
-                for word in words:
-                    self.dictionary.add_word(word)
-
-        # Tokenize file content
-        with open(path, 'r') as f:
-            ids = np.zeros((tokens,), dtype='int32')
-            token = 0
-            for line in f:
-                words = line.split() + ['']
-                for word in words:
-                    ids[token] = self.dictionary.word2idx[word]
-                    token += 1
-
-        return mx.nd.array(ids, dtype='int32')
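For readers skimming the removed file: `Dictionary` is a plain word-to-index bijection. A self-contained sketch of the class and its use (the sample sentence is illustrative, not from the corpus):

```python
class Dictionary(object):
    """Word <-> integer-id bijection, as in the removed data.py."""
    def __init__(self):
        self.word2idx = {}
        self.idx2word = []

    def add_word(self, word):
        # Assign the next free id to unseen words; return the word's id.
        if word not in self.word2idx:
            self.idx2word.append(word)
            self.word2idx[word] = len(self.idx2word) - 1
        return self.word2idx[word]

    def __len__(self):
        return len(self.idx2word)

d = Dictionary()
for w in 'the cat sat on the mat'.split():
    d.add_word(w)
print(len(d))                 # 5 unique words
print(d.word2idx['the'])      # 0 -- first word seen
print(d.idx2word[1])          # cat
```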
diff --git a/example/gluon/word_language_model/train.py 
b/example/gluon/word_language_model/train.py
index eb584b822a..001e9f4930 100644
--- a/example/gluon/word_language_model/train.py
+++ b/example/gluon/word_language_model/train.py
@@ -20,12 +20,11 @@
 import math
 import mxnet as mx
 from mxnet import gluon, autograd
+from mxnet.gluon import contrib
 import model
 import data
 
-parser = argparse.ArgumentParser(description='MXNet Autograd PennTreeBank 
RNN/LSTM Language Model')
-parser.add_argument('--data', type=str, default='./data/wikitext-2/wiki.',
-help='location of the data corpus')
+parser = argparse.ArgumentParser(description='MXNet Autograd RNN/LSTM Language 
Model on Wikitext-2.')
 parser.add_argument('--model', type=str, default='lstm',
 help='type of recurrent net (rnn_tanh, rnn_relu, lstm, 
gru)')
 parser.add_argument('--emsize', type=int, default=200,
@@ -72,18 +71,33 @@
 else:
 context = mx.cpu(0)
 
-corpus = data.Corpus(args.data)
-
-def batchify(data, batch_size):
-    """Reshape data into (num_example, batch_size)"""
-    nbatch = data.shape[0] // batch_size
-    data = data[:nbatch * batch_size]
-    data = data.reshape((batch_size, nbatch)).T
-    return data
-
-train_data = batchify(corpus.train, args.batch_size).as_in_context(context)
-val_data = batchify(corpus.valid, args.batch_size).as_in_context(context)
-test_data = batchify(corpus.test, args.batch_size).as_in_context(context)
+train_dataset = contrib.data.text.WikiText2('./data', 'train', 
seq_len=args.bptt)
+vocab = train_dataset.vocabulary
+val_dataset, test_dataset = [contrib.data.text.WikiText2('./data', segment,
+                                                         vocab=vocab,
+                                                         seq_len=args.bptt)
+                             for segment in ['validation', 'test']]
+
+nbatch_train = len(train_dataset) / args.batch_size
+train_data = gluon.data.DataLoader(train_dataset,
+   

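The removed `batchify` helper trims the token stream and reshapes it so each column is a contiguous slice of text. A minimal NumPy sketch of the same reshaping (the token ids are made up):

```python
import numpy as np

def batchify(data, batch_size):
    """Trim the stream, then reshape into (num_batches, batch_size) columns."""
    nbatch = data.shape[0] // batch_size
    data = data[:nbatch * batch_size]          # drop the ragged tail
    return data.reshape((batch_size, nbatch)).T

tokens = np.arange(10)                          # stand-in token ids
batches = batchify(tokens, 2)
print(batches.shape)                            # (5, 2)
print(batches[:, 0].tolist())                   # [0, 1, 2, 3, 4]
```

Each of the 2 columns holds 5 consecutive tokens, which is what BPTT-style iteration over rows relies on.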
[incubator-mxnet] branch master updated: Language Modeling Datasets and Sampler (#9514)

2018-01-29 Thread zhasheng
This is an automated email from the ASF dual-hosted git repository.

zhasheng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git


The following commit(s) were added to refs/heads/master by this push:
 new 8bdc806  Language Modeling Datasets and Sampler (#9514)
8bdc806 is described below

commit 8bdc806caab0b6f6c179ce3b9545ad5d02354960
Author: Sheng Zha 
AuthorDate: Mon Jan 29 18:00:46 2018 -0800

Language Modeling Datasets and Sampler (#9514)

* refactor dataset

* add interval sampler

* wikitext-2/-103

* update word language model

* address comments

* move interval sampler to contrib

* update

* add frequencies property
---
 example/gluon/word_language_model/data.py  |  66 
 example/gluon/word_language_model/train.py |  68 +
 python/mxnet/gluon/contrib/__init__.py |   2 +
 .../mxnet/gluon/contrib/data/__init__.py   |  22 +--
 .../contrib/{__init__.py => data/_constants.py}|   5 +-
 python/mxnet/gluon/contrib/data/sampler.py |  62 
 python/mxnet/gluon/contrib/data/text.py| 170 +
 python/mxnet/gluon/data/dataset.py |  36 +
 python/mxnet/gluon/data/vision/datasets.py |  53 ++-
 tests/python/unittest/test_gluon_contrib.py|  23 +++
 10 files changed, 349 insertions(+), 158 deletions(-)

diff --git a/example/gluon/word_language_model/data.py 
b/example/gluon/word_language_model/data.py
deleted file mode 100644
index 913963e..000
--- a/example/gluon/word_language_model/data.py
+++ /dev/null
@@ -1,66 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-import os
-import numpy as np
-import mxnet as mx
-
-class Dictionary(object):
-    def __init__(self):
-        self.word2idx = {}
-        self.idx2word = []
-
-    def add_word(self, word):
-        if word not in self.word2idx:
-            self.idx2word.append(word)
-            self.word2idx[word] = len(self.idx2word) - 1
-        return self.word2idx[word]
-
-    def __len__(self):
-        return len(self.idx2word)
-
-
-class Corpus(object):
-    def __init__(self, path):
-        self.dictionary = Dictionary()
-        self.train = self.tokenize(path + 'train.txt')
-        self.valid = self.tokenize(path + 'valid.txt')
-        self.test = self.tokenize(path + 'test.txt')
-
-    def tokenize(self, path):
-        """Tokenizes a text file."""
-        assert os.path.exists(path)
-        # Add words to the dictionary
-        with open(path, 'r') as f:
-            tokens = 0
-            for line in f:
-                words = line.split() + ['']
-                tokens += len(words)
-                for word in words:
-                    self.dictionary.add_word(word)
-
-        # Tokenize file content
-        with open(path, 'r') as f:
-            ids = np.zeros((tokens,), dtype='int32')
-            token = 0
-            for line in f:
-                words = line.split() + ['']
-                for word in words:
-                    ids[token] = self.dictionary.word2idx[word]
-                    token += 1
-
-        return mx.nd.array(ids, dtype='int32')
diff --git a/example/gluon/word_language_model/train.py 
b/example/gluon/word_language_model/train.py
index eb584b8..001e9f4 100644
--- a/example/gluon/word_language_model/train.py
+++ b/example/gluon/word_language_model/train.py
@@ -20,12 +20,11 @@ import time
 import math
 import mxnet as mx
 from mxnet import gluon, autograd
+from mxnet.gluon import contrib
 import model
 import data
 
-parser = argparse.ArgumentParser(description='MXNet Autograd PennTreeBank 
RNN/LSTM Language Model')
-parser.add_argument('--data', type=str, default='./data/wikitext-2/wiki.',
-help='location of the data corpus')
+parser = argparse.ArgumentParser(description='MXNet Autograd RNN/LSTM Language 
Model on Wikitext-2.')
 parser.add_argument('--model', type=str, default='lstm',
 help='type of recurrent net (rnn_tanh, rnn_relu, lstm, 
gru)')
 parser.add_argument('--emsize', type=int, default=200,
@@ -72,18 +71,33 @@ if 

[GitHub] szha commented on issue #9514: Language Modeling Datasets and Sampler

2018-01-29 Thread GitBox
szha commented on issue #9514: Language Modeling Datasets and Sampler
URL: https://github.com/apache/incubator-mxnet/pull/9514#issuecomment-361451791
 
 
   Synced offline with @piiswrong; the current design is OK to check in to the 
contrib package.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] yanhn commented on issue #9612: CUDNN_STATUS_SUCCESS (4 vs. 0) cuDNN: CUDNN_STATUS_INTERNAL_ERROR on jetson TX2

2018-01-29 Thread GitBox
yanhn commented on issue #9612: CUDNN_STATUS_SUCCESS (4 vs. 0) cuDNN: 
CUDNN_STATUS_INTERNAL_ERROR on jetson TX2
URL: 
https://github.com/apache/incubator-mxnet/issues/9612#issuecomment-361448597
 
 
   @KellenSunderland @marcoabreu Thank you all.
   I reproduced my error with a newly flashed TX2 (We bought 2 of them.) and 
still got the same error.
   
   1. Follow the instructions to flash JetPack 3.1 onto the TX2 
(http://docs.nvidia.com/jetpack-l4t/index.html#developertools/mobile/jetpack/l4t/3.0/jetpack_l4t_install.htm)
   2. Build mxnet from source for the `Linux, Python, GPU` configuration 
(https://mxnet.incubator.apache.org/install/index.html)
      2.1. I used CUDA V8.0.72 and cuDNN v6.0.21 from JetPack 3.1 instead of 
CUDA 9 and cuDNN 7.
      2.2. I skipped the OpenCV step because I think CV4Tegra may be enough.
   3. Clone the mxnet-ssd repo with `git clone --recursive 
https://github.com/zhreshold/mxnet-ssd.git`
   4. `cd /path/to/mxnet-ssd` and run `python demo.py --gpu 0`
   After quite a long period I got the same error:
   
   nvidia@tegra-ubuntu:~/Workspace/mxnet-ssd$ python demo.py --gpu 0
   Using mxnet as:
   
   Warning: using pre-installed version of mxnet may cause unexpected error...
   (export MXNET_EXAMPLE_SSD_DISABLE_PRE_INSTALLED=1) to prevent loading 
pre-installed mxnet.
   [01:23:02] src/nnvm/legacy_json_util.cc:190: Loading symbol saved by 
previous version v0.10.1. Attempting to upgrade...
   [01:23:02] src/nnvm/legacy_json_util.cc:198: Symbol successfully upgraded!
   [01:28:29] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:107: Running 
performance tests to find the best convolution algorithm, this can take a
while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
   terminate called after throwing an instance of 'dmlc::Error'
 what():  [01:28:42] src/engine/./threaded_engine.h:359: [01:28:42] 
src/operator/nn/./cudnn/cudnn_convolution-inl.h:628: Check failed: e == 
CUDNN_STATUS_SUCCESS (4 vs. 0) cuDNN: CUDNN_STATUS_INTERNAL_ERROR
   
   Stack trace returned 9 entries:
   [bt] (0) 
/home/nvidia/Workspace/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x58)
 [0x7f75778190]
   [bt] (1) 
/home/nvidia/Workspace/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x44)
 [0x7f75778c$c]
   [bt] (2) 
/home/nvidia/Workspace/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::op::CuDNNConvolutionOp::SelectAlgo(mxnet::Context
 const&, std::vector const&, 
std::vector const&, 
cudnnDataType_t, cudnnDataType_t)::{lambda(mxnet::RunContext, 
mxnet::engine::CallbackOnComplete)#1}::operator()(mxnet::RunContext, 
mxnet::engine::CallbackOnComplete) const+0x53c) [0x7f780e4a7c]
   [bt] (3) 
/home/nvidia/Workspace/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler::SelectAlgo(mxnet::Context const&, 
std::vector > const&, 
std::vector const&, 
cudnnDataType_t, cudnnDataType_t)::{lambda(mxnet::RunContext, 
mxnet::engine::CallbackOnComplete)#1}>::_M_invoke(std::_Any_data const&, 
mxnet::RunContext&&, mxnet::engine::CallbackOnComplete&&)+0x2c) [0x7f780e58ec]
   [bt] (4) 
/home/nvidia/Workspace/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext,
 mxnet::engine::OprBlock*)+0x98) [0x7f775c7488]
   [bt] (5) 
/home/nvidia/Workspace/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler), 
mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, 
bool)::{lambda()#3}::operator()() 
const::{lambda(std::shared_ptr)#1}>::_M_invoke(std::_Any_data
 const&, std::shared_ptr&&)+0x144) 
[0x7f775cf73c]
   [bt] (6) 
/home/nvidia/Workspace/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(std::thread::_Impl 
(std::shared_ptr)> >::_M_run()+0x48) 
[0x7f775ca$28]
   [bt] (7) /usr/lib/aarch64-linux-gnu/libstdc++.so.6(+0xb8280) [0x7f55292280]
   [bt] (8) /lib/aarch64-linux-gnu/libpthread.so.0(+0x6fc4) [0x7f867d5fc4]
   
   
   A fatal error occurred in asynchronous engine operation. If you do not know 
what caused this error, you can try set environment variable MXNET_ENGINE_TYPE 
to NaiveEngine and run with debugger (i.e. gdb). This will force all operations 
to be synchronous and backtrace will give you the series of calls that lead to 
this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.
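For reference, the debugging advice in the error message boils down to the following steps (the script name is taken from the report above):

```shell
# Force synchronous execution so gdb's backtrace points at the failing call.
export MXNET_ENGINE_TYPE=NaiveEngine
gdb --args python demo.py --gpu 0
# ... inspect the backtrace at the crash ...
# Restore the default threaded engine afterwards:
unset MXNET_ENGINE_TYPE
```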


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] solin319 opened a new issue #9624: problem when set fix_gamma=True in batchnorm

2018-01-29 Thread GitBox
solin319 opened a new issue #9624: problem when set fix_gamma=True in batchnorm
URL: https://github.com/apache/incubator-mxnet/issues/9624
 
 
   If fix_gamma is true, gamma is set to 1 and its gradient to 0.
   But the value of gamma still changes during the parameter update, so the 
gamma saved in the param file is not 1. This causes problems when converting 
MXNet parameters to other deep-learning platforms.
   The problem is caused by the default weight decay in the SGD optimizer: 
even with a zero gradient, weight decay shrinks gamma on every update.
   To keep gamma fixed at 1 during training, the gamma variable must be 
defined with wd_mult=0.
   
   Can MXNet set the wd of gamma to 0 automatically when fix_gamma=1? 
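The drift described in the report is just the weight-decay term of the SGD update acting on a parameter whose gradient is zero. A small self-contained sketch (learning rate and wd values are illustrative, not MXNet defaults):

```python
def sgd_step(w, grad, lr=0.1, wd=1e-4):
    """Plain SGD update with weight decay: w -= lr * (grad + wd * w)."""
    return w - lr * (grad + wd * w)

gamma = 1.0
for _ in range(1000):
    gamma = sgd_step(gamma, grad=0.0)            # zero gradient, nonzero wd
print(gamma < 1.0)                                # True -- gamma has drifted

gamma_fixed = 1.0
for _ in range(1000):
    gamma_fixed = sgd_step(gamma_fixed, grad=0.0, wd=0.0)  # wd_mult=0 behaviour
print(gamma_fixed)                                # 1.0
```

With wd=0 the update is exactly `w -= lr * grad`, so a zero gradient leaves gamma untouched, which is the behaviour fix_gamma=True intends.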


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] yanhn commented on issue #9612: CUDNN_STATUS_SUCCESS (4 vs. 0) cuDNN: CUDNN_STATUS_INTERNAL_ERROR on jetson TX2

2018-01-29 Thread GitBox
yanhn commented on issue #9612: CUDNN_STATUS_SUCCESS (4 vs. 0) cuDNN: 
CUDNN_STATUS_INTERNAL_ERROR on jetson TX2
URL: 
https://github.com/apache/incubator-mxnet/issues/9612#issuecomment-361446751
 
 
   @KellenSunderland @marcoabreu Thank you all.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] mbaijal commented on issue #9616: Removing a broken tutorial from the nightly tests

2018-01-29 Thread GitBox
mbaijal commented on issue #9616: Removing a broken tutorial from the nightly 
tests
URL: https://github.com/apache/incubator-mxnet/pull/9616#issuecomment-361446276
 
 
   @aaronmarkham 
   I will be migrating all the nightly tests to the new CI setup soon, and as 
part of that process I want to review and optimize the current tests. If you 
still need help, I can work with you sometime this or next week to make the 
latest tutorials part of the nightly tutorial test. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] szha commented on issue #9623: A tutorial for Word embeddings using mxnet

2018-01-29 Thread GitBox
szha commented on issue #9623: A tutorial for Word embeddings using mxnet
URL: 
https://github.com/apache/incubator-mxnet/issues/9623#issuecomment-361439985
 
 
   @astonzhang 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] anirudhacharya opened a new issue #9623: A tutorial for Word embeddings using mxnet

2018-01-29 Thread GitBox
anirudhacharya opened a new issue #9623: A tutorial for Word embeddings using 
mxnet
URL: https://github.com/apache/incubator-mxnet/issues/9623
 
 
   I was trying to get familiar with the mxnet framework and was browsing the 
tutorial section - https://mxnet.apache.org/tutorials/index.html.
   
   I did not come across any illustrations of generating vector 
representations of words, like word2vec models 
(https://en.wikipedia.org/wiki/Word2vec). Would it make sense to create a PR 
with one such tutorial? Is that something that would fit in this code 
repo?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Laurawly commented on issue #8915: NVLink communication pattern updated

2018-01-29 Thread GitBox
Laurawly commented on issue #8915: NVLink communication pattern updated 
URL: https://github.com/apache/incubator-mxnet/pull/8915#issuecomment-361364347
 
 
   @rahul003 should be solved by commit 
https://github.com/apache/incubator-mxnet/pull/8915/commits/683653e869f2b22afde50c66bd3212fcad93775d


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] ajayvohra2005 opened a new issue #9622: Unable to reproduce the published mAP for example/ssd with VGGNET model VOC0712 data

2018-01-29 Thread GitBox
ajayvohra2005 opened a new issue #9622: Unable to reproduce the published mAP 
for example/ssd with VGGNET model VOC0712  data
URL: https://github.com/apache/incubator-mxnet/issues/9622
 
 
   I was not able to reproduce the published mAP for the SSD VGG16_reduced 
model with VOC0712 train-validation and VOC07 test data (300x300 image size) 
in MXNet (expected 0.77, actual 0.73).
   
   I used an effective batch size of 32 on 2 GPUs with a learning rate of 
0.001. The rest of the hyper-parameters were as documented on GitHub. 
   
   Trained on AWS Deep Learning AMI Ubuntu Linux - 2.4_Oct2017 (ami-37bb714d) 
p2.8xlarge
   
   
[ssd_caffe_test.txt](https://github.com/apache/incubator-mxnet/files/1675704/ssd_caffe_test.txt)
   
[ssd_caffe_train.txt](https://github.com/apache/incubator-mxnet/files/1675705/ssd_caffe_train.txt)
   
[ssd_mxnet_trainval.txt](https://github.com/apache/incubator-mxnet/files/1675706/ssd_mxnet_trainval.txt)
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] shankarrajus commented on issue #9217: Installing GPU support on Mac

2018-01-29 Thread GitBox
shankarrajus commented on issue #9217: Installing GPU support on Mac
URL: 
https://github.com/apache/incubator-mxnet/issues/9217#issuecomment-361427979
 
 
   Hi @helloniklas,
   Thank you so much for the detailed instructions. I think it worked. It took 
me around 4 minutes to complete 19 epochs; is that good? (I am new to DL.) 
   
   It said something about "MXNET_CUDNN_AUTOTUNE_DEFAULT = 0"; do I need to 
worry about it?
   
   [05:10:49] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:107: Running 
performance tests to find the best convolution algorithm, this can take a 
while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
   INFO:root:Epoch[0] Batch [100]   Speed: 4560.95 samples/sec  
accuracy=0.876702
   ...
   INFO:root:Epoch[19] Batch [900]  Speed: 5639.22 samples/sec  
accuracy=1.00
   INFO:root:Epoch[19] Train-accuracy=1.00
   INFO:root:Epoch[19] Time cost=10.585
   INFO:root:Epoch[19] Validation-accuracy=0.992138
   
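For what it's worth, that autotune line is informational: cuDNN benchmarks convolution algorithms at startup. If the warm-up delay is a concern, it can be disabled as the message suggests (the training command below is illustrative):

```shell
# Skip cuDNN convolution autotuning (faster startup, possibly slower epochs).
export MXNET_CUDNN_AUTOTUNE_DEFAULT=0
python train_mnist.py --gpus 0   # hypothetical training invocation
```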


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] piiswrong commented on a change in pull request #9552: [REQUEST FOR REVIEW | DO NOT MERGE] Model Quantization with Calibration

2018-01-29 Thread GitBox
piiswrong commented on a change in pull request #9552: [REQUEST FOR REVIEW | DO 
NOT MERGE] Model Quantization with Calibration
URL: https://github.com/apache/incubator-mxnet/pull/9552#discussion_r164594359
 
 

 ##
 File path: python/mxnet/quantization.py
 ##
 @@ -0,0 +1,467 @@
+# Licensed to the Apache Software Foundation (ASF) under one
 
 Review comment:
   Put this in contrib


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] piiswrong closed pull request #9618: fixed links that were missng ndarray folder path

2018-01-29 Thread GitBox
piiswrong closed pull request #9618: fixed links that were missng ndarray 
folder path
URL: https://github.com/apache/incubator-mxnet/pull/9618
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/docs/community/contribute.md b/docs/community/contribute.md
index 5bb790eed7..9c3c3e1870 100644
--- a/docs/community/contribute.md
+++ b/docs/community/contribute.md
@@ -103,7 +103,7 @@ or is conceptual, add it in the C++ documentation. Make 
sure your example works
 by running a Python version of the example.
   * If a concrete and simple language-specific example can further clarify the 
API and the API arguments, add the
 example in language-specific files.
-* Refer to these examples for guidance:- 
[Embedding](http://mxnet.io/api/python/ndarray.html#mxnet.ndarray.Embedding) , 
[ROIPooling](http://mxnet.io/api/python/ndarray.html#mxnet.ndarray.ROIPooling) 
, [Reshape](http://mxnet.io/api/python/ndarray.html#mxnet.ndarray.Reshape).
+* Refer to these examples for guidance:- 
[Embedding](http://mxnet.io/api/python/ndarray/ndarray.html#mxnet.ndarray.Embedding)
 , 
[ROIPooling](http://mxnet.io/api/python/ndarray/ndarray.html#mxnet.ndarray.ROIPooling)
 , 
[Reshape](http://mxnet.io/api/python/ndarray/ndarray.html#mxnet.ndarray.Reshape).
 
 ### Testing and Rendering
 * Make sure not to break any coding standards. Run


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-mxnet] branch master updated: proper flatten in acc (#9619)

2018-01-29 Thread jxie
This is an automated email from the ASF dual-hosted git repository.

jxie pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git


The following commit(s) were added to refs/heads/master by this push:
 new ed823b2  proper flatten in acc (#9619)
ed823b2 is described below

commit ed823b2e187eb859d9475eb651465edf714c6c5f
Author: Sheng Zha 
AuthorDate: Mon Jan 29 14:14:07 2018 -0800

proper flatten in acc (#9619)
---
 python/mxnet/metric.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/mxnet/metric.py b/python/mxnet/metric.py
index f1cdae2..fc2b901 100644
--- a/python/mxnet/metric.py
+++ b/python/mxnet/metric.py
@@ -399,7 +399,7 @@ class Accuracy(EvalMetric):
             if pred_label.context != label.context:
                 pred_label = pred_label.as_in_context(label.context)
 
-            self.sum_metric += (pred_label.flatten() == 
label.flatten()).sum().asscalar()
+            self.sum_metric += (pred_label.reshape((-1,)) == 
label.reshape((-1,))).sum().asscalar()
             self.num_inst += numpy.prod(pred_label.shape)
 
 

-- 
To stop receiving notification emails like this one, please contact
j...@apache.org.

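Why the one-line change matters: comparing arrays whose shapes differ only by a trailing axis silently broadcasts, inflating the match count, while reshaping both sides to 1-D compares element-wise. A NumPy illustration (shapes chosen for demonstration):

```python
import numpy as np

pred_label = np.zeros((3, 1), dtype='int32')   # e.g. an argmax that kept an axis
label = np.array([0, 1, 0], dtype='int32')     # shape (3,)

# (3, 1) == (3,) broadcasts to a (3, 3) comparison matrix -- wrong count.
broadcast_matches = int((pred_label == label).sum())
print(broadcast_matches)   # 6

# Flattening both sides to 1-D compares element-wise, as the fix intends.
correct = int((pred_label.reshape((-1,)) == label.reshape((-1,))).sum())
print(correct)             # 2
```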

[GitHub] piiswrong closed pull request #9619: proper flatten in acc

2018-01-29 Thread GitBox
piiswrong closed pull request #9619: proper flatten in acc
URL: https://github.com/apache/incubator-mxnet/pull/9619
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/python/mxnet/metric.py b/python/mxnet/metric.py
index f1cdae26a2..fc2b9014e8 100644
--- a/python/mxnet/metric.py
+++ b/python/mxnet/metric.py
@@ -399,7 +399,7 @@ def update(self, labels, preds):
             if pred_label.context != label.context:
                 pred_label = pred_label.as_in_context(label.context)
 
-            self.sum_metric += (pred_label.flatten() == 
label.flatten()).sum().asscalar()
+            self.sum_metric += (pred_label.reshape((-1,)) == 
label.reshape((-1,))).sum().asscalar()
             self.num_inst += numpy.prod(pred_label.shape)
 
 


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-mxnet] branch master updated: fixed links that were missng ndarray folder path (#9618)

2018-01-29 Thread jxie
This is an automated email from the ASF dual-hosted git repository.

jxie pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git


The following commit(s) were added to refs/heads/master by this push:
 new 5a5cd64  fixed links that were missng ndarray folder path (#9618)
5a5cd64 is described below

commit 5a5cd64b042e92a67b97a8abf6b2fd7c64fe2c44
Author: thinksanky <31976455+thinksa...@users.noreply.github.com>
AuthorDate: Mon Jan 29 14:13:49 2018 -0800

fixed links that were missng ndarray folder path (#9618)
---
 docs/community/contribute.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/community/contribute.md b/docs/community/contribute.md
index 5bb790e..9c3c3e1 100644
--- a/docs/community/contribute.md
+++ b/docs/community/contribute.md
@@ -103,7 +103,7 @@ or is conceptual, add it in the C++ documentation. Make 
sure your example works
 by running a Python version of the example.
   * If a concrete and simple language-specific example can further clarify the 
API and the API arguments, add the
 example in language-specific files.
-* Refer to these examples for guidance:- 
[Embedding](http://mxnet.io/api/python/ndarray.html#mxnet.ndarray.Embedding) , 
[ROIPooling](http://mxnet.io/api/python/ndarray.html#mxnet.ndarray.ROIPooling) 
, [Reshape](http://mxnet.io/api/python/ndarray.html#mxnet.ndarray.Reshape).
+* Refer to these examples for guidance:- 
[Embedding](http://mxnet.io/api/python/ndarray/ndarray.html#mxnet.ndarray.Embedding)
 , 
[ROIPooling](http://mxnet.io/api/python/ndarray/ndarray.html#mxnet.ndarray.ROIPooling)
 , 
[Reshape](http://mxnet.io/api/python/ndarray/ndarray.html#mxnet.ndarray.Reshape).
 
 ### Testing and Rendering
 * Make sure not to break any coding standards. Run



[GitHub] piiswrong closed pull request #9620: host test dataset for libsvmiter

2018-01-29 Thread GitBox
piiswrong closed pull request #9620: host test dataset for libsvmiter
URL: https://github.com/apache/incubator-mxnet/pull/9620
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/tests/python/unittest/test_io.py b/tests/python/unittest/test_io.py
index c246be4382..e8aba38b82 100644
--- a/tests/python/unittest/test_io.py
+++ b/tests/python/unittest/test_io.py
@@ -191,7 +191,6 @@ def test_NDArrayIter_csr():
 assert_almost_equal(batch.data[0].asnumpy(), expected)
 begin += batch_size
 
-@unittest.skip("test fails intermittently due to external dependency. 
temporarily disabled till it gets fixed. tracked at 
https://github.com/apache/incubator-mxnet/issues/9604")
 def test_LibSVMIter():
 
 def check_libSVMIter_synthetic():
@@ -226,7 +225,7 @@ def check_libSVMIter_news_data():
 news_metadata = {
 'name': 'news20.t',
 'origin_name': 'news20.t.bz2',
-'url': 
"http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/news20.t.bz2",
+'url': 
"https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/news20.t.bz2",
 'feature_dim': 62060,
 'num_classes': 20,
 'num_examples': 3993,


 




[incubator-mxnet] branch master updated: host test dataset for libsvmiter (#9620)

2018-01-29 Thread jxie
This is an automated email from the ASF dual-hosted git repository.

jxie pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git


The following commit(s) were added to refs/heads/master by this push:
 new 16d995a  host test dataset for libsvmiter (#9620)
16d995a is described below

commit 16d995ab1a5192133852b50f5c10346aa7a4d81e
Author: Sheng Zha 
AuthorDate: Mon Jan 29 14:12:57 2018 -0800

host test dataset for libsvmiter (#9620)
---
 tests/python/unittest/test_io.py | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tests/python/unittest/test_io.py b/tests/python/unittest/test_io.py
index c246be4..e8aba38 100644
--- a/tests/python/unittest/test_io.py
+++ b/tests/python/unittest/test_io.py
@@ -191,7 +191,6 @@ def test_NDArrayIter_csr():
 assert_almost_equal(batch.data[0].asnumpy(), expected)
 begin += batch_size
 
-@unittest.skip("test fails intermittently due to external dependency. 
temporarily disabled till it gets fixed. tracked at 
https://github.com/apache/incubator-mxnet/issues/9604")
 def test_LibSVMIter():
 
 def check_libSVMIter_synthetic():
@@ -226,7 +225,7 @@ def test_LibSVMIter():
 news_metadata = {
 'name': 'news20.t',
 'origin_name': 'news20.t.bz2',
-'url': 
"http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/news20.t.bz2",
+'url': 
"https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/news20.t.bz2",
 'feature_dim': 62060,
 'num_classes': 20,
 'num_examples': 3993,



[GitHub] piiswrong closed issue #9604: test_io.test_LibSVMIter fails intermittently

2018-01-29 Thread GitBox
piiswrong closed issue #9604: test_io.test_LibSVMIter fails intermittently
URL: https://github.com/apache/incubator-mxnet/issues/9604
 
 
   




[GitHub] Ishitori opened a new issue #9621: Documentation error: eval_metric vs eval.metric in R mx.mlp

2018-01-29 Thread GitBox
Ishitori opened a new issue #9621: Documentation error: eval_metric vs 
eval.metric in R mx.mlp
URL: https://github.com/apache/incubator-mxnet/issues/9621
 
 
   ## Description
The R client's documentation of mx.mlp uses the wrong name for a parameter. It is 
named "eval_metric" while it should be "eval.metric", as in 
mx.model.FeedForward.create. See 
https://github.com/apache/incubator-mxnet/blob/master/R-package/R/mlp.R
   
   ## Environment info (Required)
   --Python Info--
   Version  : 3.6.2
   Compiler : GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)
   Build: ('default', 'Jul 17 2017 16:44:45')
   Arch : ('64bit', '')
   Pip Info---
   Version  : 9.0.1
   Directory: /usr/local/lib/python3.6/site-packages/pip
   --MXNet Info---
   Version  : 0.11.0
   Directory: /usr/local/lib/python3.6/site-packages/mxnet
   Commit Hash   : 53274b4a2b0d73f3fbdb10cfb5f9ed0c8263fda7
   --System Info--
   Platform : Darwin-16.7.0-x86_64-i386-64bit
   system   : Darwin
   node : dca9048716cf.ant.amazon.com
   release  : 16.7.0
   version  : Darwin Kernel Version 16.7.0: Thu Jan 11 22:59:40 PST 2018; 
root:xnu-3789.73.8~1/RELEASE_X86_64
   --Hardware Info--
   machine  : x86_64
   processor: i386
   b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW 
RDTSCP TSCI'
   b'machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 HLE 
AVX2 BMI2 INVPCID RTM SMAP RDSEED ADX IPT SGX FPU_CSDS MPX CLFSOPT'
   b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE 
MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ 
DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC 
MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
   b'machdep.cpu.brand_string: Intel(R) Core(TM) i7-7660U CPU @ 2.50GHz'
   --Network Test--
   Setting timeout: 10
   Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0030 
sec, LOAD: 0.6859 sec.
   Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0009 sec, LOAD: 
0.1113 sec.
   Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0015 sec, LOAD: 
0.2410 sec.
   Timing for FashionMNIST: 
https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz,
 DNS: 0.0007 sec, LOAD: 0.2651 sec.
   Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0008 sec, LOAD: 
0.2126 sec.
   Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0006 sec, 
LOAD: 0.1419 sec.
sssokolo@dca9048716cf ? /Volumes/Unix/workspace/MxNet/stack-exchange-qa ? ls
   45798125.py  48292162_rmse_version.R  Data_breastcancer.csv 
diagnose.py
   46195917.R   48292162_softmax_version.R  Untitled.ipynb 
notMNIST.npz
   48292162.R   48490010.R   breast-cancer-wisconsin.data
sssokolo@dca9048716cf ? /Volumes/Unix/workspace/MxNet/stack-exchange-qa ? 
python3 diagnose.py
   --Python Info--
   Version  : 3.6.2
   Compiler : GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)
   Build: ('default', 'Jul 17 2017 16:44:45')
   Arch : ('64bit', '')
   Pip Info---
   Version  : 9.0.1
   Directory: /usr/local/lib/python3.6/site-packages/pip
   --MXNet Info---
   Version  : 0.11.0
   Directory: /usr/local/lib/python3.6/site-packages/mxnet
   Commit Hash   : 53274b4a2b0d73f3fbdb10cfb5f9ed0c8263fda7
   --System Info--
   Platform : Darwin-16.7.0-x86_64-i386-64bit
   system   : Darwin
   node : dca9048716cf.ant.amazon.com
   release  : 16.7.0
   version  : Darwin Kernel Version 16.7.0: Thu Jan 11 22:59:40 PST 2018; 
root:xnu-3789.73.8~1/RELEASE_X86_64
   --Hardware Info--
   machine  : x86_64
   processor: i386
   b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW 
RDTSCP TSCI'
   b'machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 HLE 
AVX2 BMI2 INVPCID RTM SMAP RDSEED ADX IPT SGX FPU_CSDS MPX CLFSOPT'
   b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE 
MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ 
DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC 
MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
   b'machdep.cpu.brand_string: Intel(R) Core(TM) i7-7660U CPU @ 2.50GHz'
   --Network Test--
   Setting timeout: 10
   Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0560 
sec, LOAD: 0.7075 sec.
   Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0598 sec, LOAD: 
0.2979 sec.
   Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.1063 sec, LOAD: 

[GitHub] anirudh2290 commented on a change in pull request #9552: [REQUEST FOR REVIEW | DO NOT MERGE] Model Quantization with Calibration

2018-01-29 Thread GitBox
anirudh2290 commented on a change in pull request #9552: [REQUEST FOR REVIEW | 
DO NOT MERGE] Model Quantization with Calibration
URL: https://github.com/apache/incubator-mxnet/pull/9552#discussion_r164566272
 
 

 ##
 File path: python/mxnet/quantization.py
 ##
 @@ -0,0 +1,467 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import absolute_import
+
+try:
+from scipy import stats
+except ImportError:
+stats = None
+
+import numpy as np
+import ctypes
+import logging
+import os
+from .base import _LIB, check_call
+from .base import c_array, c_str, mx_uint, c_str_array
+from .base import NDArrayHandle, SymbolHandle
+from .symbol import Symbol, load
+from . import ndarray as nd
+from .ndarray import NDArray
+from .io import DataIter
+from .context import cpu, Context
+from .module import Module
+
+
+def _quantize_params(qsym, params):
+"""Given a quantized symbol and a dict of params that have not been 
quantized, generate quantized params.
+Currently only supports quantizing the arg_params with names of `weight` 
or `bias`, not aux_params.
+If `qsym` contains symbols that are excluded from being quantized, their 
corresponding params will
+not be quantized, but saved together with quantized params of the symbols 
that have been quantized.
+
+Parameters
+--
+qsym : Symbol
+Quantized symbol from FP32 symbol.
+params : dict of str->NDArray
+"""
+inputs_name = qsym.list_arguments()
+quantized_params = {}
+for name in inputs_name:
+if name.endswith(('weight_quantize', 'bias_quantize')):
+original_name = name[:-len('_quantize')]
+param = params[original_name]
+val, vmin, vmax = nd.contrib.quantize(data=param, 
min_range=nd.min(param),
+  max_range=nd.max(param), 
out_type='int8')
+quantized_params[name] = val
+quantized_params[name+'_min'] = vmin
+quantized_params[name+'_max'] = vmax
+elif name in params:
+quantized_params[name] = params[name]
+return quantized_params
+
+
+def _quantize_symbol(sym, excluded_symbols=None, offline_params=None):
+"""Given a symbol object representing a neural network of data type FP32, 
quantize it into a INT8 network.
+
+Parameters
+--
+sym : Symbol
+FP32 neural network symbol.
+excluded_symbols : list of symbols
+Nodes in the network that users do not want to replace with a symbol 
of INT8 data type.
+offline_params : list of strs
+Names of the parameters that users want to quantize offline. It's 
always recommended to quantize parameters
+offline so that quantizing parameters during the inference can be 
avoided.
+"""
+num_excluded_symbols = 0
+excluded_handles = []
+if excluded_symbols is not None:
+assert isinstance(excluded_symbols, list)
+num_excluded_symbols = len(excluded_symbols)
+for s in excluded_symbols:
+excluded_handles.append(s.handle)
+
+num_offline = 0
+offline = []
+if offline_params is not None:
+num_offline = len(offline_params)
+for k in offline_params:
+offline.append(c_str(k))
+
+out = SymbolHandle()
+check_call(_LIB.MXQuantizeSymbol(sym.handle,
+ ctypes.byref(out),
+ mx_uint(num_excluded_symbols),
+ c_array(SymbolHandle, excluded_handles),
+ mx_uint(num_offline),
+ c_array(ctypes.c_char_p, offline)))
+return Symbol(out)
+
+
+class _LayerOutputCollector(object):
+"""Saves layer output NDArray in a dict with layer names as keys and lists 
of NDArrays as values.
+The collected NDArrays will be used for calculating the optimal thresholds 
for quantization using
+KL divergence."""
+def __init__(self, include_layer=None, logger=None):
+self.nd_dict = {}
+self.include_layer = include_layer
+self.logger = logger
+
+def collect(self, name, ndarray):
+if self.include_layer is not 
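The `_quantize_params` helper quoted above hands each param to `nd.contrib.quantize` together with its min/max range. As a rough NumPy sketch of that style of min/max int8 mapping (an illustration of the idea only; `quantize_minmax` is a hypothetical helper, not MXNet's actual kernel):

```python
import numpy as np

def quantize_minmax(param):
    """Illustrative int8 quantization of a float array from its min/max
    range. Mirrors the shape of nd.contrib.quantize's outputs
    (value, min, max) but is NOT MXNet's exact kernel."""
    vmin, vmax = float(param.min()), float(param.max())
    # Symmetric scale over the larger magnitude so 0.0 maps to 0.
    scale = 127.0 / max(abs(vmin), abs(vmax), 1e-12)
    q = np.clip(np.round(param * scale), -127, 127).astype(np.int8)
    return q, vmin, vmax

w = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
qw, wmin, wmax = quantize_minmax(w)
print(qw.tolist())  # [-127, -64, 0, 64, 127]
```

Storing `vmin`/`vmax` next to the quantized values, as `_quantize_params` does with the `_min`/`_max` entries, is what lets inference later recover the original value range.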

[GitHub] zhreshold commented on a change in pull request #8918: Added in Large-Batch SGD with a warmup, and a LARS startegy. Also add?

2018-01-29 Thread GitBox
zhreshold commented on a change in pull request #8918: Added in Large-Batch SGD 
with a warmup, and a LARS startegy. Also add?
URL: https://github.com/apache/incubator-mxnet/pull/8918#discussion_r164556307
 
 

 ##
 File path: python/mxnet/optimizer.py
 ##
 @@ -645,6 +645,195 @@ def update(self, index, weight, grad, state):
 ftml_update(weight, grad, prev_d, prev_v, prev_z, out=weight,
 lr=lr, wd=wd, **kwargs)
 
+@register
+class LBSGD(Optimizer):
+"""The Large Batch SGD optimizer with momentum and weight decay.
+
+The optimizer updates the weight by::
+
+state = momentum * state + lr * rescale_grad * clip(grad, 
clip_gradient) + wd * weight
+weight = weight - state
+
+For details of the update algorithm see 
:class:`~mxnet.ndarray.lbsgd_update` and
+:class:`~mxnet.ndarray.lbsgd_mom_update`.
+
+This optimizer accepts the following parameters in addition to those 
accepted
+by :class:`.Optimizer`.
+
+Parameters
+--
+momentum : float, optional
+   The momentum value.
+multi_precision: bool, optional
+   Flag to control the internal precision of the optimizer.
+   ``False`` results in using the same precision as the weights (default),
+   ``True`` makes internal 32-bit copy of the weights and applies gradients
+in 32-bit precision even if actual weights used in the model 
have lower precision.
+Turning this on can improve convergence and accuracy when 
training with float16.
+warmup_strategy: string ('linear', 'power2', 'sqrt'. , 'lars'   default : 
'linear')
+warmup_epochs: unsigned, default: 5
+batch_scale:   unsigned, default: 1 (same as batch size*numworkers)
+updates_per_epoch: updates_per_epoch (default: 32, Default might not 
reflect true number batches per epoch. Used for warmup.)
+begin_epoch: unsigned, default 0, starting epoch.
 
 Review comment:
   @ashokei please add more details describing the strategy.
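For context on the warmup strategies named in the docstring above ('linear', 'power2', 'sqrt'), here is a hedged sketch of how such a multiplier could ramp the learning rate toward `batch_scale`; `warmup_multiplier` is hypothetical and not the PR's actual code:

```python
def warmup_multiplier(strategy, nup, warmup_updates, batch_scale):
    """Hypothetical LR multiplier that ramps from 1.0 up to batch_scale
    over the first warmup_updates optimizer updates."""
    if warmup_updates <= 0 or nup >= warmup_updates:
        return batch_scale
    frac = nup / float(warmup_updates)
    if strategy == 'linear':
        return 1.0 + (batch_scale - 1.0) * frac
    if strategy == 'power2':
        return 1.0 + (batch_scale - 1.0) * frac ** 2
    if strategy == 'sqrt':
        return 1.0 + (batch_scale - 1.0) * frac ** 0.5
    return batch_scale  # e.g. 'lars' computes layer-wise rates instead

print(warmup_multiplier('linear', 50, 100, 16.0))  # 8.5
```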




[GitHub] zhreshold commented on a change in pull request #8918: Added in Large-Batch SGD with a warmup, and a LARS startegy. Also add?

2018-01-29 Thread GitBox
zhreshold commented on a change in pull request #8918: Added in Large-Batch SGD 
with a warmup, and a LARS startegy. Also add?
URL: https://github.com/apache/incubator-mxnet/pull/8918#discussion_r164555827
 
 

 ##
 File path: python/mxnet/optimizer.py
 ##
 @@ -645,6 +645,195 @@ def update(self, index, weight, grad, state):
 ftml_update(weight, grad, prev_d, prev_v, prev_z, out=weight,
 lr=lr, wd=wd, **kwargs)
 
+@register
+class LBSGD(Optimizer):
+"""The Large Batch SGD optimizer with momentum and weight decay.
+
+The optimizer updates the weight by::
+
+state = momentum * state + lr * rescale_grad * clip(grad, 
clip_gradient) + wd * weight
+weight = weight - state
+
+For details of the update algorithm see 
:class:`~mxnet.ndarray.lbsgd_update` and
+:class:`~mxnet.ndarray.lbsgd_mom_update`.
+
+This optimizer accepts the following parameters in addition to those 
accepted
+by :class:`.Optimizer`.
+
+Parameters
+--
+momentum : float, optional
+   The momentum value.
+multi_precision: bool, optional
+   Flag to control the internal precision of the optimizer.
+   ``False`` results in using the same precision as the weights (default),
+   ``True`` makes internal 32-bit copy of the weights and applies gradients
+in 32-bit precision even if actual weights used in the model 
have lower precision.
+Turning this on can improve convergence and accuracy when 
training with float16.
+warmup_strategy: string ('linear', 'power2', 'sqrt'. , 'lars'   default : 
'linear')
+warmup_epochs: unsigned, default: 5
+batch_scale:   unsigned, default: 1 (same as batch size*numworkers)
+updates_per_epoch: updates_per_epoch (default: 32, Default might not 
reflect true number batches per epoch. Used for warmup.)
 
 Review comment:
   I guess it requires the epoch number to stop warming up, which does not 
depend on the number of updates. 




[GitHub] zhreshold commented on a change in pull request #8918: Added in Large-Batch SGD with a warmup, and a LARS startegy. Also add?

2018-01-29 Thread GitBox
zhreshold commented on a change in pull request #8918: Added in Large-Batch SGD 
with a warmup, and a LARS startegy. Also add?
URL: https://github.com/apache/incubator-mxnet/pull/8918#discussion_r164555449
 
 

 ##
 File path: python/mxnet/optimizer.py
 ##
 @@ -645,6 +645,195 @@ def update(self, index, weight, grad, state):
 ftml_update(weight, grad, prev_d, prev_v, prev_z, out=weight,
 lr=lr, wd=wd, **kwargs)
 
+@register
+class LBSGD(Optimizer):
+"""The Large Batch SGD optimizer with momentum and weight decay.
+
+The optimizer updates the weight by::
+
+state = momentum * state + lr * rescale_grad * clip(grad, 
clip_gradient) + wd * weight
+weight = weight - state
+
+For details of the update algorithm see 
:class:`~mxnet.ndarray.lbsgd_update` and
+:class:`~mxnet.ndarray.lbsgd_mom_update`.
+
+This optimizer accepts the following parameters in addition to those 
accepted
+by :class:`.Optimizer`.
+
+Parameters
+--
+momentum : float, optional
+   The momentum value.
+multi_precision: bool, optional
+   Flag to control the internal precision of the optimizer.
+   ``False`` results in using the same precision as the weights (default),
+   ``True`` makes internal 32-bit copy of the weights and applies gradients
+in 32-bit precision even if actual weights used in the model 
have lower precision.
+Turning this on can improve convergence and accuracy when 
training with float16.
+warmup_strategy: string ('linear', 'power2', 'sqrt'. , 'lars'   default : 
'linear')
+warmup_epochs: unsigned, default: 5
+batch_scale:   unsigned, default: 1 (same as batch size*numworkers)
+updates_per_epoch: updates_per_epoch (default: 32, Default might not 
reflect true number batches per epoch. Used for warmup.)
+begin_epoch: unsigned, default 0, starting epoch.
+"""
+
+def __init__(self, momentum=0.0, multi_precision=False, 
warmup_strategy='linear',
+ warmup_epochs=5, batch_scale=1, updates_per_epoch=32, 
begin_epoch=0, num_epochs=60,
+ **kwargs):
+super(LBSGD, self).__init__(**kwargs)
+logging.info('Running Large-Batch SGD Algorithm')
+logging.info('(Batch_scale=%f, warmup_epochs=%d, warmup_strategy=%s, 
updates_per_epoch=%d)',
+ batch_scale, warmup_epochs, warmup_strategy, 
updates_per_epoch)
+self.momentum = momentum
+self.multi_precision = multi_precision
+# new user parameters for large batch
+self.warmup_strategy = warmup_strategy
+self.warmup_epochs = warmup_epochs
+self.batch_scale = batch_scale
+self.updates_per_epoch = updates_per_epoch
+self.init_updates = begin_epoch * updates_per_epoch
+self.num_epochs = num_epochs
+# addl internal usage parameters and storage
+self.lbmult = 1
+self.cumgrads = {}
+# for adaptive lr
+self.adaptive = False
+self.admult = 1  # adaptation constant
+
+def create_state(self, index, weight):
 
 Review comment:
   @ashokei As suggested, could you change this to inherit from SGD and override 
`create_state_multi_precision`, `create_state`, `update`, and 
`update_multi_precision` only if necessary? It seems like you are mixing the 
multi_precision part into the normal one.




[GitHub] ashokei commented on a change in pull request #8918: Added in Large-Batch SGD with a warmup, and a LARS startegy. Also add?

2018-01-29 Thread GitBox
ashokei commented on a change in pull request #8918: Added in Large-Batch SGD 
with a warmup, and a LARS startegy. Also add?
URL: https://github.com/apache/incubator-mxnet/pull/8918#discussion_r164551550
 
 

 ##
 File path: python/mxnet/optimizer.py
 ##
 @@ -645,6 +645,195 @@ def update(self, index, weight, grad, state):
 ftml_update(weight, grad, prev_d, prev_v, prev_z, out=weight,
 lr=lr, wd=wd, **kwargs)
 
+@register
+class LBSGD(Optimizer):
+"""The Large Batch SGD optimizer with momentum and weight decay.
+
+The optimizer updates the weight by::
+
+state = momentum * state + lr * rescale_grad * clip(grad, 
clip_gradient) + wd * weight
+weight = weight - state
+
+For details of the update algorithm see 
:class:`~mxnet.ndarray.lbsgd_update` and
 
 Review comment:
   It is the `update` method in the LBSGD class. We can fix that to resolve to the right 
method; the '_' is misleading.
   The paper is here: 
   https://arxiv.org/pdf/1708.03888.pdf
   
   




[GitHub] szha commented on issue #9620: host test dataset for libsvmiter

2018-01-29 Thread GitBox
szha commented on issue #9620: host test dataset for libsvmiter
URL: https://github.com/apache/incubator-mxnet/pull/9620#issuecomment-361370963
 
 
   I didn't find any restriction on redistributing the data, and it's already 
redistributed by many institutes.




[GitHub] szha opened a new pull request #9620: host test dataset for libsvmiter

2018-01-29 Thread GitBox
szha opened a new pull request #9620: host test dataset for libsvmiter
URL: https://github.com/apache/incubator-mxnet/pull/9620
 
 
   ## Description ##
   Fixes #9604 
   
   ## Checklist ##
   ### Essentials ###
   - [x] Passed code style checking (`make lint`)
   - [x] Changes are complete (i.e. I finished coding on this PR)
   - [x] All changes have test coverage:
   - Unit tests are added for small changes to verify correctness (e.g. adding 
a new operator)
   - Nightly tests are added for complicated/long-running ones (e.g. changing 
distributed kvstore)
   - Build tests will be added for build configuration changes (e.g. adding a 
new build option with NCCL)
   - [x] Code is well-documented: 
   - For user-facing API changes, API doc string has been updated. 
   - For new C++ functions in header files, their functionalities and arguments 
are documented. 
   - For new examples, README.md is added to explain what the example does, 
the source of the dataset, expected performance on the test set, and a reference to 
the original paper if applicable
   - [x] To the best of my knowledge, examples are either not affected by this 
change, or have been fixed to be compatible with this change
   
   ### Changes ###
   - [x] Fix flaky test and re-enable




[GitHub] Laurawly commented on issue #8915: NVLink communication pattern updated

2018-01-29 Thread GitBox
Laurawly commented on issue #8915: NVLink communication pattern updated 
URL: https://github.com/apache/incubator-mxnet/pull/8915#issuecomment-361364347
 
 
   @rahul003 should be solved by commit 
https://github.com/apache/incubator-mxnet/pull/8915/commits/d3aeed51b9e26a25c7bfe2b09c57e1fff0ead2a0




[GitHub] sxjscience commented on issue #9613: Transition matrix is not updated during the train in the lstm_crf example

2018-01-29 Thread GitBox
sxjscience commented on issue #9613: Transition matrix is not updated during 
the train in the lstm_crf example
URL: 
https://github.com/apache/incubator-mxnet/issues/9613#issuecomment-361363684
 
 
   OK. Feel free to provide more details.




[GitHub] sxjscience commented on issue #9605: SGLD Optimizer in python cannot work due to wrong argument position

2018-01-29 Thread GitBox
sxjscience commented on issue #9605: SGLD Optimizer in python cannot work due 
to wrong argument position
URL: 
https://github.com/apache/incubator-mxnet/issues/9605#issuecomment-361362788
 
 
   Thanks for reporting this! Would you like to submit a pull request to fix 
the problem? We can change the code to `normal(0, math.sqrt(lr), 
shape=weight.shape, dtype=weight.dtype, ctx=weight.context)`
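The suggested fix passes `math.sqrt(lr)` as the scale (standard deviation) of the injected Gaussian noise. A NumPy sketch of one SGLD step under that convention (`sgld_step` is illustrative, not the MXNet optimizer's implementation):

```python
import math
import numpy as np

def sgld_step(weight, grad, lr, wd=0.0, rng=None):
    """One illustrative SGLD update: plain SGD plus Gaussian noise whose
    standard deviation is sqrt(lr) -- the quantity the suggested
    normal(0, math.sqrt(lr), ...) call draws."""
    if rng is None:
        rng = np.random.default_rng(0)
    noise = rng.normal(loc=0.0, scale=math.sqrt(lr), size=weight.shape)
    return weight - lr * (grad + wd * weight) + noise

w = sgld_step(np.zeros(3), np.ones(3), lr=0.01)
print(w.shape)  # (3,)
```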




[GitHub] thinksanky commented on issue #9548: updated release version to 1.1.0 on the mainpage and re-arranged news?

2018-01-29 Thread GitBox
thinksanky commented on issue #9548: updated release version to 1.1.0 on the 
mainpage and re-arranged news?
URL: https://github.com/apache/incubator-mxnet/pull/9548#issuecomment-361357782
 
 
   Note that the version numbering has changed from 1.0.1 to 1.1.0 for the 
release




[GitHub] piiswrong commented on issue #9583: use nd for accuracy calculation

2018-01-29 Thread GitBox
piiswrong commented on issue #9583: use nd for accuracy calculation
URL: https://github.com/apache/incubator-mxnet/pull/9583#issuecomment-361357076
 
 
   Since this is a performance improvement, please verify that it indeed 
improves performance, at least for common cases.
   
   Please verify this change does not bring back the negative performance 
impact reported by #7995.
   Please also verify that this change does improve performance for the case 
@zackchase reported.





[GitHub] szha opened a new pull request #9619: proper flatten in acc

2018-01-29 Thread GitBox
szha opened a new pull request #9619: proper flatten in acc
URL: https://github.com/apache/incubator-mxnet/pull/9619
 
 
   ## Description ##
   use reshape instead of flatten for flattening in acc metric.
   
   ## Checklist ##
   ### Essentials ###
   - [x] Passed code style checking (`make lint`)
   - [x] Changes are complete (i.e. I finished coding on this PR)
   - [x] All changes have test coverage:
   - Unit tests are added for small changes to verify correctness (e.g. adding 
a new operator)
   - Nightly tests are added for complicated/long-running ones (e.g. changing 
distributed kvstore)
   - Build tests will be added for build configuration changes (e.g. adding a 
new build option with NCCL)
   - [x] Code is well-documented: 
   - For user-facing API changes, API doc string has been updated. 
   - For new C++ functions in header files, their functionalities and arguments 
are documented. 
   - For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
   - [x] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
   
   ### Changes ###
   - [x] use reshape instead of flatten for flattening.
   




[GitHub] szha commented on issue #9583: use nd for accuracy calculation

2018-01-29 Thread GitBox
szha commented on issue #9583: use nd for accuracy calculation
URL: https://github.com/apache/incubator-mxnet/pull/9583#issuecomment-361354908
 
 
   Thanks for the comment. I will switch to using reshape.
   
   Regarding performance, that change was made before the CPU kernel optimization work, so we should now have better performance. Given that, I'm not sure whether it's still worth investing time and code complexity in this. Also, I'm not entirely convinced that the frontend code should do the backend's job of selecting an implementation just for the sake of performance. What do you think?
   
   That said, if there's an immediate performance hit due to switching to ND, I'm open to switching back to numpy. Given that it would be infeasible for me to check all cases, is there any specific observation or reason on your side that requires attention to its performance?




[GitHub] thinksanky opened a new pull request #9618: fixed links that were missing ndarray folder path

2018-01-29 Thread GitBox
thinksanky opened a new pull request #9618: fixed links that were missing ndarray folder path
URL: https://github.com/apache/incubator-mxnet/pull/9618
 
 
   ## Description ##
   Fixed the contribute.md file. Three links in this file were broken because they were missing the ndarray folder in the path.




[GitHub] Garglesoap commented on issue #9399: error: unable to load shared object /mxnet/libs/libmxnet.so

2018-01-29 Thread GitBox
Garglesoap commented on issue #9399:  error: unable to load shared object 
/mxnet/libs/libmxnet.so
URL: 
https://github.com/apache/incubator-mxnet/issues/9399#issuecomment-361353405
 
 
   I think so. The missing files do exist on the system. It might also be a 
difference between the directory structure of the deep learning AMI vs the deep 
learning base AMI. Mxnet/BLAS are installed in a non-default directory under 
/src/  on the DL-AMI, but are in /home/user on the DL-base (the one that works).




[GitHub] piiswrong commented on a change in pull request #8918: Added in Large-Batch SGD with a warmup, and a LARS strategy. Also add…

2018-01-29 Thread GitBox
piiswrong commented on a change in pull request #8918: Added in Large-Batch SGD with a warmup, and a LARS strategy. Also add…
URL: https://github.com/apache/incubator-mxnet/pull/8918#discussion_r164532080
 
 

 ##
 File path: python/mxnet/optimizer.py
 ##
 @@ -645,6 +645,195 @@ def update(self, index, weight, grad, state):
 ftml_update(weight, grad, prev_d, prev_v, prev_z, out=weight,
 lr=lr, wd=wd, **kwargs)
 
+@register
+class LBSGD(Optimizer):
+"""The Large Batch SGD optimizer with momentum and weight decay.
+
+The optimizer updates the weight by::
+
+state = momentum * state + lr * rescale_grad * clip(grad, 
clip_gradient) + wd * weight
+weight = weight - state
+
+For details of the update algorithm see 
:class:`~mxnet.ndarray.lbsgd_update` and
+:class:`~mxnet.ndarray.lbsgd_mom_update`.
+
+This optimizer accepts the following parameters in addition to those 
accepted
+by :class:`.Optimizer`.
+
+Parameters
+--
+momentum : float, optional
+   The momentum value.
+multi_precision: bool, optional
+   Flag to control the internal precision of the optimizer.
+   ``False`` results in using the same precision as the weights (default),
+   ``True`` makes internal 32-bit copy of the weights and applies gradients
+in 32-bit precision even if actual weights used in the model 
have lower precision.`<
+Turning this on can improve convergence and accuracy when 
training with float16.
+warmup_strategy: string ('linear', 'power2', 'sqrt'. , 'lars'   default : 
'linear')
+warmup_epochs: unsigned, default: 5
+batch_scale:   unsigned, default: 1 (same as batch size*numworkers)
+updates_per_epoch: updates_per_epoch (default: 32, Default might not 
reflect true number batches per epoch. Used for warmup.)
+begin_epoch: unsigned, default 0, starting epoch.
 
 Review comment:
   What's starting epoch? What would it do before start epoch?
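For context, the update rule quoted in the docstring above can be sketched as plain Python, following the docstring literally (an illustrative scalar version only; the actual kernel lives in the MXNet backend, and the names here are hypothetical):

```python
def lbsgd_mom_step(weight, grad, state, lr, momentum, wd,
                   rescale_grad=1.0, clip_gradient=None):
    """One update step, exactly as the docstring states it:
    state  = momentum * state + lr * rescale_grad * clip(grad, clip_gradient) + wd * weight
    weight = weight - state
    """
    g = rescale_grad * grad
    if clip_gradient is not None:
        # clip(grad, clip_gradient): bound the gradient to [-c, c]
        g = max(-clip_gradient, min(clip_gradient, g))
    state = momentum * state + lr * g + wd * weight
    return weight - state, state

# lr=0.1 on grad=0.5 with zero initial momentum state and no weight decay:
w, s = lbsgd_mom_step(1.0, 0.5, 0.0, lr=0.1, momentum=0.9, wd=0.0)
print(w, s)  # 0.95 0.05
```

Note the docstring applies `lr` only to the gradient term; whether `wd * weight` should also be scaled by `lr` (as in the regular SGD kernel) is exactly the kind of ambiguity the missing reference would resolve.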




[GitHub] piiswrong commented on a change in pull request #8918: Added in Large-Batch SGD with a warmup, and a LARS strategy. Also add…

2018-01-29 Thread GitBox
piiswrong commented on a change in pull request #8918: Added in Large-Batch SGD with a warmup, and a LARS strategy. Also add…
URL: https://github.com/apache/incubator-mxnet/pull/8918#discussion_r164532005
 
 

 ##
 File path: python/mxnet/optimizer.py
 ##
 @@ -645,6 +645,195 @@ def update(self, index, weight, grad, state):
 ftml_update(weight, grad, prev_d, prev_v, prev_z, out=weight,
 lr=lr, wd=wd, **kwargs)
 
+@register
+class LBSGD(Optimizer):
+"""The Large Batch SGD optimizer with momentum and weight decay.
+
+The optimizer updates the weight by::
+
+state = momentum * state + lr * rescale_grad * clip(grad, 
clip_gradient) + wd * weight
+weight = weight - state
+
+For details of the update algorithm see 
:class:`~mxnet.ndarray.lbsgd_update` and
+:class:`~mxnet.ndarray.lbsgd_mom_update`.
+
+This optimizer accepts the following parameters in addition to those 
accepted
+by :class:`.Optimizer`.
+
+Parameters
+--
+momentum : float, optional
+   The momentum value.
+multi_precision: bool, optional
+   Flag to control the internal precision of the optimizer.
+   ``False`` results in using the same precision as the weights (default),
+   ``True`` makes internal 32-bit copy of the weights and applies gradients
+in 32-bit precision even if actual weights used in the model 
have lower precision.`<
+Turning this on can improve convergence and accuracy when 
training with float16.
+warmup_strategy: string ('linear', 'power2', 'sqrt'. , 'lars'   default : 
'linear')
+warmup_epochs: unsigned, default: 5
+batch_scale:   unsigned, default: 1 (same as batch size*numworkers)
+updates_per_epoch: updates_per_epoch (default: 32, Default might not 
reflect true number batches per epoch. Used for warmup.)
 
 Review comment:
   Why use warmup_epochs and updates_per_epoch? Why not just warmup_updates?
   Why should it have a default value?
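As a concrete illustration of the reviewer's point (all names here are hypothetical), `warmup_epochs * updates_per_epoch` collapses into a single `warmup_updates` count for a linear warmup:

```python
def linear_warmup_lr(base_lr, update, warmup_updates):
    """Ramp the learning rate linearly up to base_lr over the first
    warmup_updates updates, then hold it constant."""
    if update >= warmup_updates:
        return base_lr
    return base_lr * (update + 1) / warmup_updates

# warmup_epochs=5 with updates_per_epoch=32 is the same as warmup_updates=160.
print(linear_warmup_lr(0.1, 79, 5 * 32))   # halfway through warmup
print(linear_warmup_lr(0.1, 500, 5 * 32))  # past warmup: full base_lr
```

Parameterizing directly by update count also sidesteps the problem the docstring itself flags: the default `updates_per_epoch` "might not reflect true number of batches per epoch".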




[GitHub] rahul003 commented on issue #9152: tutorial for distributed training

2018-01-29 Thread GitBox
rahul003 commented on issue #9152: tutorial for distributed training
URL: https://github.com/apache/incubator-mxnet/pull/9152#issuecomment-361351974
 
 
   Fixed




[GitHub] piiswrong commented on a change in pull request #8918: Added in Large-Batch SGD with a warmup, and a LARS strategy. Also add…

2018-01-29 Thread GitBox
piiswrong commented on a change in pull request #8918: Added in Large-Batch SGD with a warmup, and a LARS strategy. Also add…
URL: https://github.com/apache/incubator-mxnet/pull/8918#discussion_r164531304
 
 

 ##
 File path: python/mxnet/optimizer.py
 ##
 @@ -645,6 +645,195 @@ def update(self, index, weight, grad, state):
 ftml_update(weight, grad, prev_d, prev_v, prev_z, out=weight,
 lr=lr, wd=wd, **kwargs)
 
+@register
+class LBSGD(Optimizer):
+"""The Large Batch SGD optimizer with momentum and weight decay.
+
+The optimizer updates the weight by::
+
+state = momentum * state + lr * rescale_grad * clip(grad, 
clip_gradient) + wd * weight
+weight = weight - state
+
+For details of the update algorithm see 
:class:`~mxnet.ndarray.lbsgd_update` and
+:class:`~mxnet.ndarray.lbsgd_mom_update`.
+
+This optimizer accepts the following parameters in addition to those 
accepted
+by :class:`.Optimizer`.
+
+Parameters
+--
+momentum : float, optional
+   The momentum value.
+multi_precision: bool, optional
+   Flag to control the internal precision of the optimizer.
+   ``False`` results in using the same precision as the weights (default),
+   ``True`` makes internal 32-bit copy of the weights and applies gradients
+in 32-bit precision even if actual weights used in the model 
have lower precision.`<
+Turning this on can improve convergence and accuracy when 
training with float16.
+warmup_strategy: string ('linear', 'power2', 'sqrt'. , 'lars'   default : 
'linear')
+warmup_epochs: unsigned, default: 5
+batch_scale:   unsigned, default: 1 (same as batch size*numworkers)
+updates_per_epoch: updates_per_epoch (default: 32, Default might not 
reflect true number batches per epoch. Used for warmup.)
+begin_epoch: unsigned, default 0, starting epoch.
+"""
+
+def __init__(self, momentum=0.0, multi_precision=False, 
warmup_strategy='linear',
+ warmup_epochs=5, batch_scale=1, updates_per_epoch=32, 
begin_epoch=0, num_epochs=60,
+ **kwargs):
+super(LBSGD, self).__init__(**kwargs)
+logging.info('Running Large-Batch SGD Algorithm')
+logging.info('(Batch_scale=%f, warmup_epochs=%d, warmup_strategy=%s, 
updates_per_epoch=%d)',
+ batch_scale, warmup_epochs, warmup_strategy, 
updates_per_epoch)
+self.momentum = momentum
+self.multi_precision = multi_precision
+# new user parameters for large batch
+self.warmup_strategy = warmup_strategy
+self.warmup_epochs = warmup_epochs
+self.batch_scale = batch_scale
+self.updates_per_epoch = updates_per_epoch
+self.init_updates = begin_epoch * updates_per_epoch
+self.num_epochs = num_epochs
+# addl internal usage parameters and storage
+self.lbmult = 1
+self.cumgrads = {}
+# for adaptive lr
+self.adaptive = False
+self.admult = 1  # adaptation constant
+
+def create_state(self, index, weight):
 
 Review comment:
   Is this copied from SGD?
   Why not inherit SGD instead?




[GitHub] piiswrong commented on a change in pull request #8918: Added in Large-Batch SGD with a warmup, and a LARS strategy. Also add…

2018-01-29 Thread GitBox
piiswrong commented on a change in pull request #8918: Added in Large-Batch SGD with a warmup, and a LARS strategy. Also add…
URL: https://github.com/apache/incubator-mxnet/pull/8918#discussion_r164531156
 
 

 ##
 File path: python/mxnet/optimizer.py
 ##
 @@ -645,6 +645,195 @@ def update(self, index, weight, grad, state):
 ftml_update(weight, grad, prev_d, prev_v, prev_z, out=weight,
 lr=lr, wd=wd, **kwargs)
 
+@register
+class LBSGD(Optimizer):
+"""The Large Batch SGD optimizer with momentum and weight decay.
+
+The optimizer updates the weight by::
+
+state = momentum * state + lr * rescale_grad * clip(grad, 
clip_gradient) + wd * weight
+weight = weight - state
+
+For details of the update algorithm see 
:class:`~mxnet.ndarray.lbsgd_update` and
 
 Review comment:
   @ashokei @zhreshold 
   Where is lbsgd_update defined? I don't see it.
   
   Please add a proper reference to the relevant paper.




[GitHub] piiswrong commented on issue #9583: use nd for accuracy calculation

2018-01-29 Thread GitBox
piiswrong commented on issue #9583: use nd for accuracy calculation
URL: https://github.com/apache/incubator-mxnet/pull/9583#issuecomment-361349438
 
 
   This is not as simple as changing numpy array to ndarray. See 
https://github.com/apache/incubator-mxnet/pull/7995
   There are some cases where numpy array is faster. Please check performance 
against all cases.
   
   Also, flatten reshapes to 2 dimensions, which could cause problems when the output is 1-dimensional. Use reshape((-1,)) instead.
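The shape concern can be illustrated with NumPy (a sketch, not the MXNet code itself: `mx.nd.NDArray.flatten` keeps the first axis and collapses the rest into one, which is emulated below with `reshape`):

```python
import numpy as np

pred = np.arange(5)             # a 1-D output, shape (5,)

# flatten-style: keep the first axis, collapse the rest into one.
# For 1-D input this produces shape (5, 1) -- no longer 1-D.
flattened = pred.reshape(pred.shape[0], -1)

# reshape((-1,)): always collapses to 1-D, regardless of input rank.
reshaped = pred.reshape(-1)

print(flattened.shape)  # (5, 1)
print(reshaped.shape)   # (5,)
```

An accuracy metric comparing a `(5, 1)` prediction against `(5,)` labels would silently broadcast to `(5, 5)`, which is why the 1-D form matters here.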




[GitHub] larroy opened a new pull request #9617: Remove unneeded gtest dependency, build verbosely with CMake & ni…

2018-01-29 Thread GitBox
larroy opened a new pull request #9617: Remove unneeded gtest dependency, build verbosely with CMake & ni…
URL: https://github.com/apache/incubator-mxnet/pull/9617
 
 
   …nja.
   




[GitHub] marcoabreu commented on a change in pull request #9609: Enable CPP unit tests in CI

2018-01-29 Thread GitBox
marcoabreu commented on a change in pull request #9609: Enable CPP unit tests 
in CI
URL: https://github.com/apache/incubator-mxnet/pull/9609#discussion_r164503291
 
 

 ##
 File path: Jenkinsfile
 ##
 @@ -500,7 +500,7 @@ try {
   init_git()
   unpack_lib('cpu')
   timeout(time: max_time, unit: 'MINUTES') {
-  sh "${docker_run} cpu ./perl-package/test.sh"
+sh "${docker_run} cpu ./perl-package/test.sh"
 
 Review comment:
   I'd just leave the indenting as-is, considering that this is not severe. We 
can modify it the next time that part is touched.




[GitHub] larroy commented on issue #9609: Enable CPP unit tests in CI

2018-01-29 Thread GitBox
larroy commented on issue #9609: Enable CPP unit tests in CI
URL: https://github.com/apache/incubator-mxnet/pull/9609#issuecomment-361320603
 
 
   LGTM




[GitHub] larroy commented on a change in pull request #9609: Enable CPP unit tests in CI

2018-01-29 Thread GitBox
larroy commented on a change in pull request #9609: Enable CPP unit tests in CI
URL: https://github.com/apache/incubator-mxnet/pull/9609#discussion_r164502475
 
 

 ##
 File path: Jenkinsfile
 ##
 @@ -500,7 +500,7 @@ try {
   init_git()
   unpack_lib('cpu')
   timeout(time: max_time, unit: 'MINUTES') {
-  sh "${docker_run} cpu ./perl-package/test.sh"
+sh "${docker_run} cpu ./perl-package/test.sh"
 
 Review comment:
   @marcoabreu totally agree, and I would like to see more hygienic commits in general. Not sure this case is worth an additional PR just to fix whitespace, given the cost of running all the tests etc. Up to you two. There are always good exceptions to rules, right :-)




[GitHub] KellenSunderland commented on a change in pull request #9609: Enable CPP unit tests in CI

2018-01-29 Thread GitBox
KellenSunderland commented on a change in pull request #9609: Enable CPP unit 
tests in CI
URL: https://github.com/apache/incubator-mxnet/pull/9609#discussion_r164502166
 
 

 ##
 File path: Jenkinsfile
 ##
 @@ -511,7 +511,18 @@ try {
   init_git()
   unpack_lib('gpu')
   timeout(time: max_time, unit: 'MINUTES') {
-  sh "${docker_run} gpu ./perl-package/test.sh"
+sh "${docker_run} gpu ./perl-package/test.sh"
+  }
+}
+  }
+},
+'Cpp: GPU': {
 
 Review comment:
   Commented on this in PR description.




[GitHub] marcoabreu commented on a change in pull request #9609: Enable CPP unit tests in CI

2018-01-29 Thread GitBox
marcoabreu commented on a change in pull request #9609: Enable CPP unit tests 
in CI
URL: https://github.com/apache/incubator-mxnet/pull/9609#discussion_r164500256
 
 

 ##
 File path: Jenkinsfile
 ##
 @@ -500,7 +500,7 @@ try {
   init_git()
   unpack_lib('cpu')
   timeout(time: max_time, unit: 'MINUTES') {
-  sh "${docker_run} cpu ./perl-package/test.sh"
+sh "${docker_run} cpu ./perl-package/test.sh"
 
 Review comment:
   In projects I've worked on, it was common practice not to commit whitespace changes that are not directly related to the code one has modified. The reason is that this can increase the number of merge conflicts and especially messes with the history if you try to use git blame. I personally like to have a clear change history, instead of commits doing one thing and fixing indenting on the fly.




[GitHub] KellenSunderland commented on issue #9609: Enable CPP unit tests in CI

2018-01-29 Thread GitBox
KellenSunderland commented on issue #9609: Enable CPP unit tests in CI
URL: https://github.com/apache/incubator-mxnet/pull/9609#issuecomment-361317714
 
 
   This I agree with. If we're going to be strict about this, then it should be well documented in the contribution guide along with justification.




[GitHub] larroy commented on a change in pull request #9609: Enable CPP unit tests in CI

2018-01-29 Thread GitBox
larroy commented on a change in pull request #9609: Enable CPP unit tests in CI
URL: https://github.com/apache/incubator-mxnet/pull/9609#discussion_r164499878
 
 

 ##
 File path: tests/ci_build/Dockerfile.gpu_mklml
 ##
 @@ -15,4 +15,4 @@ RUN /install/ubuntu_install_scala.sh
 RUN wget --no-check-certificate -O /tmp/mklml.tgz 
https://github.com/01org/mkl-dnn/releases/download/v0.12/mklml_lnx_2018.0.1.20171227.tgz
 RUN tar -zxvf /tmp/mklml.tgz && cp -rf mklml_*/* /usr/local/ && rm -rf mklml_*
 
-ENV LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/lib
+ENV 
LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/lib:/usr/lib/x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/5/
 
 Review comment:
   I would remove this if not needed, it's confusing.






[GitHub] larroy commented on a change in pull request #9609: Enable CPP unit tests in CI

2018-01-29 Thread GitBox
larroy commented on a change in pull request #9609: Enable CPP unit tests in CI
URL: https://github.com/apache/incubator-mxnet/pull/9609#discussion_r164497741
 
 

 ##
 File path: tests/ci_build/Dockerfile.gpu_mklml
 ##
 @@ -15,4 +15,4 @@ RUN /install/ubuntu_install_scala.sh
 RUN wget --no-check-certificate -O /tmp/mklml.tgz 
https://github.com/01org/mkl-dnn/releases/download/v0.12/mklml_lnx_2018.0.1.20171227.tgz
 RUN tar -zxvf /tmp/mklml.tgz && cp -rf mklml_*/* /usr/local/ && rm -rf mklml_*
 
-ENV LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/lib
+ENV 
LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/lib:/usr/lib/x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/5/
 
 Review comment:
   I think most of these are set by default and not needed; could you check the output of:
   
   cat /etc/ld.so.conf.d/*
   
   ?
   
   Why add them if not needed?
   
   ```
   (mxnet_py3) ubuntu@ip-172-31-36-119:~/devel/mxnet$ docker run -ti 
nvidia/cuda:8.0-cudnn5-devel
   root@741913f34dea:/# cat /etc/ld.so.conf.d/*
   /usr/local/cuda-8.0/targets/x86_64-linux/lib
   # libc default configuration
   /usr/local/lib
   /usr/local/nvidia/lib
   /usr/local/nvidia/lib64
   # Multiarch support
   /lib/x86_64-linux-gnu
   /usr/lib/x86_64-linux-gnu
   ```




[GitHub] larroy commented on a change in pull request #9609: Enable CPP unit tests in CI

2018-01-29 Thread GitBox
larroy commented on a change in pull request #9609: Enable CPP unit tests in CI
URL: https://github.com/apache/incubator-mxnet/pull/9609#discussion_r164498531
 
 

 ##
 File path: Jenkinsfile
 ##
 @@ -511,7 +511,18 @@ try {
   init_git()
   unpack_lib('gpu')
   timeout(time: max_time, unit: 'MINUTES') {
-  sh "${docker_run} gpu ./perl-package/test.sh"
+sh "${docker_run} gpu ./perl-package/test.sh"
+  }
+}
+  }
+},
+'Cpp: GPU': {
 
 Review comment:
   Should we run the tests in CPU as well?




[incubator-mxnet] branch master updated: Added in Large-Batch SGD with a warmup, and a LARS strategy. Also add… (#8918)

2018-01-29 Thread zhreshold
This is an automated email from the ASF dual-hosted git repository.

zhreshold pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git


The following commit(s) were added to refs/heads/master by this push:
 new 785690c  Added in Large-Batch SGD with a warmup, and a LARS strategy. Also add… (#8918)
785690c is described below

commit 785690c0569f265b52c88ff3041849fd7c338d70
Author: Ashok Emani 
AuthorDate: Mon Jan 29 09:16:11 2018 -0800

    Added in Large-Batch SGD with a warmup, and a LARS strategy. Also add… (#8918)

    * Added in Large-Batch SGD with a warmup, and a LARS strategy. Also added in a Polynomial Decay learning rate scheduler. Modified the example image fit code to allow these options to be selectable.

* Fix pylint issues

* pylint fixes

* remove duplicate num_update

* remove unused count
---
 example/image-classification/common/fit.py | 138 +++--
 python/mxnet/lr_scheduler.py   |  32 +
 python/mxnet/optimizer.py  | 190 +
 3 files changed, 324 insertions(+), 36 deletions(-)

diff --git a/example/image-classification/common/fit.py 
b/example/image-classification/common/fit.py
index 2b002c7..d9f96d0 100755
--- a/example/image-classification/common/fit.py
+++ b/example/image-classification/common/fit.py
@@ -15,10 +15,14 @@
 # specific language governing permissions and limitations
 # under the License.
 
-import mxnet as mx
+""" example train fit utility """
 import logging
 import os
 import time
+import re
+import math
+import mxnet as mx
+
 
 def _get_lr_scheduler(args, kv):
 if 'lr_factor' not in args or args.lr_factor >= 1:
@@ -27,17 +31,26 @@ def _get_lr_scheduler(args, kv):
 if 'dist' in args.kv_store:
 epoch_size /= kv.num_workers
 begin_epoch = args.load_epoch if args.load_epoch else 0
+if 'pow' in args.lr_step_epochs:
+lr = args.lr
+max_up = args.num_epochs * epoch_size
+pwr = float(re.sub('pow[- ]*', '', args.lr_step_epochs))
+poly_sched = mx.lr_scheduler.PolyScheduler(max_up, lr, pwr)
+return (lr, poly_sched)
 step_epochs = [int(l) for l in args.lr_step_epochs.split(',')]
 lr = args.lr
 for s in step_epochs:
 if begin_epoch >= s:
 lr *= args.lr_factor
 if lr != args.lr:
-logging.info('Adjust learning rate to %e for epoch %d' %(lr, 
begin_epoch))
+logging.info('Adjust learning rate to %e for epoch %d',
+ lr, begin_epoch)
 
-steps = [epoch_size * (x-begin_epoch) for x in step_epochs if 
x-begin_epoch > 0]
+steps = [epoch_size * (x - begin_epoch)
+ for x in step_epochs if x - begin_epoch > 0]
 return (lr, mx.lr_scheduler.MultiFactorScheduler(step=steps, 
factor=args.lr_factor))
 
+
 def _load_model(args, rank=0):
 if 'load_epoch' not in args or args.load_epoch is None:
 return (None, None, None)
@@ -50,6 +63,7 @@ def _load_model(args, rank=0):
 logging.info('Loaded model %s_%04d.params', model_prefix, args.load_epoch)
 return (sym, arg_params, aux_params)
 
+
 def _save_model(args, rank=0):
 if args.model_prefix is None:
 return None
@@ -59,6 +73,7 @@ def _save_model(args, rank=0):
 return mx.callback.do_checkpoint(args.model_prefix if rank == 0 else 
"%s-%d" % (
 args.model_prefix, rank))
 
+
 def add_fit_args(parser):
 """
 parser : argparse.ArgumentParser
@@ -68,7 +83,8 @@ def add_fit_args(parser):
 train.add_argument('--network', type=str,
help='the neural network to use')
 train.add_argument('--num-layers', type=int,
-   help='number of layers in the neural network, required 
by some networks such as resnet')
+   help='number of layers in the neural network, \
+ required by some networks such as resnet')
 train.add_argument('--gpus', type=str,
help='list of gpus to run, e.g. 0 or 0,2,5. empty means 
using cpu')
 train.add_argument('--kv-store', type=str, default='device',
@@ -81,6 +97,8 @@ def add_fit_args(parser):
help='the ratio to reduce lr on each step')
 train.add_argument('--lr-step-epochs', type=str,
help='the epochs to reduce the lr, e.g. 30,60')
+train.add_argument('--initializer', type=str, default='default',
+   help='the initializer type')
 train.add_argument('--optimizer', type=str, default='sgd',
help='the optimizer type')
 train.add_argument('--mom', type=float, default=0.9,
@@ -108,8 +126,16 @@ def add_fit_args(parser):
  takes `2bit` or `none` for now')
 train.add_argument('--gc-threshold', type=float, default=0.5,
help='threshold for 2bit gradient compression')
+# 

[GitHub] larroy commented on issue #9609: Enable CPP unit tests in CI

2018-01-29 Thread GitBox
larroy commented on issue #9609: Enable CPP unit tests in CI
URL: https://github.com/apache/incubator-mxnet/pull/9609#issuecomment-361317419
 
 
   I think in this case it's ok; if you buried the whole commit in whitespace changes, then it would not be hygienic at all.




[GitHub] zhreshold closed pull request #8918: Added in Large-Batch SGD with a warmup, and a LARS strategy. Also add…

2018-01-29 Thread GitBox
zhreshold closed pull request #8918: Added in Large-Batch SGD with a warmup, and a LARS strategy. Also add…
URL: https://github.com/apache/incubator-mxnet/pull/8918
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/example/image-classification/common/fit.py 
b/example/image-classification/common/fit.py
index 2b002c7702..d9f96d0eba 100755
--- a/example/image-classification/common/fit.py
+++ b/example/image-classification/common/fit.py
@@ -15,10 +15,14 @@
 # specific language governing permissions and limitations
 # under the License.
 
-import mxnet as mx
+""" example train fit utility """
 import logging
 import os
 import time
+import re
+import math
+import mxnet as mx
+
 
 def _get_lr_scheduler(args, kv):
 if 'lr_factor' not in args or args.lr_factor >= 1:
@@ -27,17 +31,26 @@ def _get_lr_scheduler(args, kv):
 if 'dist' in args.kv_store:
 epoch_size /= kv.num_workers
 begin_epoch = args.load_epoch if args.load_epoch else 0
+if 'pow' in args.lr_step_epochs:
+lr = args.lr
+max_up = args.num_epochs * epoch_size
+pwr = float(re.sub('pow[- ]*', '', args.lr_step_epochs))
+poly_sched = mx.lr_scheduler.PolyScheduler(max_up, lr, pwr)
+return (lr, poly_sched)
 step_epochs = [int(l) for l in args.lr_step_epochs.split(',')]
 lr = args.lr
 for s in step_epochs:
 if begin_epoch >= s:
 lr *= args.lr_factor
 if lr != args.lr:
-logging.info('Adjust learning rate to %e for epoch %d' %(lr, 
begin_epoch))
+logging.info('Adjust learning rate to %e for epoch %d',
+ lr, begin_epoch)
 
-steps = [epoch_size * (x-begin_epoch) for x in step_epochs if 
x-begin_epoch > 0]
+steps = [epoch_size * (x - begin_epoch)
+ for x in step_epochs if x - begin_epoch > 0]
 return (lr, mx.lr_scheduler.MultiFactorScheduler(step=steps, 
factor=args.lr_factor))
 
+
 def _load_model(args, rank=0):
 if 'load_epoch' not in args or args.load_epoch is None:
 return (None, None, None)
@@ -50,6 +63,7 @@ def _load_model(args, rank=0):
 logging.info('Loaded model %s_%04d.params', model_prefix, args.load_epoch)
 return (sym, arg_params, aux_params)
 
+
 def _save_model(args, rank=0):
 if args.model_prefix is None:
 return None
@@ -59,6 +73,7 @@ def _save_model(args, rank=0):
 return mx.callback.do_checkpoint(args.model_prefix if rank == 0 else 
"%s-%d" % (
 args.model_prefix, rank))
 
+
 def add_fit_args(parser):
 """
 parser : argparse.ArgumentParser
@@ -68,7 +83,8 @@ def add_fit_args(parser):
 train.add_argument('--network', type=str,
help='the neural network to use')
 train.add_argument('--num-layers', type=int,
-   help='number of layers in the neural network, required 
by some networks such as resnet')
+   help='number of layers in the neural network, \
+ required by some networks such as resnet')
 train.add_argument('--gpus', type=str,
help='list of gpus to run, e.g. 0 or 0,2,5. empty means 
using cpu')
 train.add_argument('--kv-store', type=str, default='device',
@@ -81,6 +97,8 @@ def add_fit_args(parser):
help='the ratio to reduce lr on each step')
 train.add_argument('--lr-step-epochs', type=str,
help='the epochs to reduce the lr, e.g. 30,60')
+train.add_argument('--initializer', type=str, default='default',
+   help='the initializer type')
 train.add_argument('--optimizer', type=str, default='sgd',
help='the optimizer type')
 train.add_argument('--mom', type=float, default=0.9,
@@ -108,8 +126,16 @@ def add_fit_args(parser):
  takes `2bit` or `none` for now')
 train.add_argument('--gc-threshold', type=float, default=0.5,
help='threshold for 2bit gradient compression')
+# additional parameters for large batch sgd
+train.add_argument('--macrobatch-size', type=int, default=0,
+   help='distributed effective batch size')
+train.add_argument('--warmup-epochs', type=int, default=5,
+   help='the epochs to ramp-up lr to scaled large-batch 
value')
+train.add_argument('--warmup-strategy', type=str, default='linear',
+   help='the ramping-up strategy for large batch sgd')
 return train
 
+
 def fit(args, network, data_loader, **kwargs):
 """
 train a model
@@ -135,14 +161,13 @@ def fit(args, network, data_loader, **kwargs):
 for i, batch in enumerate(train):
 for j in batch.data:
 j.wait_to_read()
- 
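
The patch above mixes three ideas: parsing a `powN` value out of `--lr-step-epochs` into a polynomial schedule, the existing multi-factor step schedule, and new warm-up arguments for large-batch SGD. Here is a minimal plain-Python sketch of the decay and warm-up math; only the regex parsing and `mx.lr_scheduler.PolyScheduler` appear in the diff itself, so the helper names and the linear warm-up formula are illustrative assumptions:

```python
import re

def parse_poly_power(lr_step_epochs):
    """Mirror the diff's parsing of --lr-step-epochs values like 'pow2'."""
    return float(re.sub('pow[- ]*', '', lr_step_epochs))

def poly_lr(base_lr, num_update, max_update, power):
    """Polynomial decay, roughly what mx.lr_scheduler.PolyScheduler computes."""
    frac = min(num_update, max_update) / float(max_update)
    return base_lr * (1.0 - frac) ** power

def warmup_lr(base_lr, scaled_lr, num_update, warmup_steps):
    """One plausible reading of --warmup-strategy linear: ramp the lr
    from its base value up to the scaled large-batch value."""
    if num_update >= warmup_steps:
        return scaled_lr
    return base_lr + (scaled_lr - base_lr) * num_update / float(warmup_steps)

print(parse_poly_power('pow2'))        # -> 2.0
print(poly_lr(0.1, 0, 1000, 2.0))      # -> 0.1 (no decay at the first update)
print(warmup_lr(0.1, 0.8, 50, 100))    # -> 0.45 (halfway through the ramp)
```

With `--lr-step-epochs pow2`, the rate decays quadratically from the base value to zero over `num_epochs * epoch_size` updates, instead of the comma-separated epoch list used by the multi-factor path.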

[GitHub] KellenSunderland commented on a change in pull request #9609: Enable CPP unit tests in CI

2018-01-29 Thread GitBox
KellenSunderland commented on a change in pull request #9609: Enable CPP unit 
tests in CI
URL: https://github.com/apache/incubator-mxnet/pull/9609#discussion_r164498809
 
 

 ##
 File path: Jenkinsfile
 ##
 @@ -500,7 +500,7 @@ try {
   init_git()
   unpack_lib('cpu')
   timeout(time: max_time, unit: 'MINUTES') {
-  sh "${docker_run} cpu ./perl-package/test.sh"
+sh "${docker_run} cpu ./perl-package/test.sh"
 
 Review comment:
   It's quite common practice to fix whitespace issues on small commits.  Can 
you elaborate on why you think it should be removed?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] marcoabreu commented on a change in pull request #9609: Enable CPP unit tests in CI

2018-01-29 Thread GitBox
marcoabreu commented on a change in pull request #9609: Enable CPP unit tests 
in CI
URL: https://github.com/apache/incubator-mxnet/pull/9609#discussion_r164497953
 
 

 ##
 File path: Jenkinsfile
 ##
 @@ -500,7 +500,7 @@ try {
   init_git()
   unpack_lib('cpu')
   timeout(time: max_time, unit: 'MINUTES') {
-  sh "${docker_run} cpu ./perl-package/test.sh"
+sh "${docker_run} cpu ./perl-package/test.sh"
 
 Review comment:
   Remove whitespace change




[GitHub] aaronmarkham opened a new pull request #9616: Removing a broken tutorial from the nightly tests

2018-01-29 Thread GitBox
aaronmarkham opened a new pull request #9616: Removing a broken tutorial from 
the nightly tests
URL: https://github.com/apache/incubator-mxnet/pull/9616
 
 
   ## Description ##
   The predict image tutorial is broken: 
https://github.com/apache/incubator-mxnet/issues/9532
   It is causing the nightly test build to also fail, so I'm removing it from 
the config until #9532 can be fixed.
   
   
   ## Comments ##
   Since we've added a bunch of tutorials in this last release, we need to 
update this file to cover those tutorials. However, I'm not sure how these 
tests work and if the tutorials have the requisite code blocks to facilitate 
being part of the nightly test suite.
   
   If there's nothing special to be added to the tutorials themselves, then 
kick this back to me for adding the other tutorials to this config.
   




[GitHub] larroy opened a new pull request #9615: [CMake] Fix OSX double compilation and add static library

2018-01-29 Thread GitBox
larroy opened a new pull request #9615: [CMake] Fix OSX double compilation and 
add static library
URL: https://github.com/apache/incubator-mxnet/pull/9615
 
 
  Fixes #9494
  Simplifies the previously incorrect OSX build logic
   




[GitHub] byronyi commented on issue #5826: Does MXNet support RDMA over Converged Ethernet (ROCE)

2018-01-29 Thread GitBox
byronyi commented on issue #5826: Does MXNet support RDMA over Converged 
Ethernet (ROCE)
URL: 
https://github.com/apache/incubator-mxnet/issues/5826#issuecomment-349533758
 
 
  @weijianwen We are primarily targeting RoCEv2 deployment, but in 
principle it should require no modification to support RoCEv1, InfiniBand, and 
iWARP. We do not plan to support GPU/Xeon Phi Direct RDMA at this stage, as it 
seems inter-node communication, i.e. pslite, is largely agnostic to the actual 
worker processors. It might require significant refactoring or re-design on 
MXNet's side as well, and I do hope you MXNet developers could shed some light 
on this direction.
   
  I know little about Omni-Path, but I heard that Intel does support the same 
verbs interface as defined in the RDMA specification (RFC 5040). 




[GitHub] byronyi commented on issue #4766: How to use native Infiniband instead of IPoIB

2018-01-29 Thread GitBox
byronyi commented on issue #4766: How to use native Infiniband instead of IPoIB
URL: 
https://github.com/apache/incubator-mxnet/issues/4766#issuecomment-361303151
 
 
   Hi all, please see https://github.com/dmlc/ps-lite/pull/124 for our PR. Many 
thanks to my colleagues @crazyboycjr and @snowzjx for their design and 
implementation.




[GitHub] byronyi commented on issue #5826: Does MXNet support RDMA over Converged Ethernet (ROCE)

2018-01-29 Thread GitBox
byronyi commented on issue #5826: Does MXNet support RDMA over Converged 
Ethernet (ROCE)
URL: 
https://github.com/apache/incubator-mxnet/issues/5826#issuecomment-361303170
 
 
   Hi all, please see https://github.com/dmlc/ps-lite/pull/124 for our PR. Many 
thanks to my colleagues @crazyboycjr and @snowzjx for their design and 
implementation.




[GitHub] aaronmarkham commented on issue #8339: data iterators tutorial errors

2018-01-29 Thread GitBox
aaronmarkham commented on issue #8339: data iterators tutorial errors
URL: 
https://github.com/apache/incubator-mxnet/issues/8339#issuecomment-361296088
 
 
   Looks like this is fixed!




[GitHub] aaronmarkham closed issue #8339: data iterators tutorial errors

2018-01-29 Thread GitBox
aaronmarkham closed issue #8339: data iterators tutorial errors
URL: https://github.com/apache/incubator-mxnet/issues/8339
 
 
   




[GitHub] dwSun opened a new pull request #9614: MobileNetV2

2018-01-29 Thread GitBox
dwSun opened a new pull request #9614: MobileNetV2
URL: https://github.com/apache/incubator-mxnet/pull/9614
 
 
   ## Description ##
  MobileNetV2 model from the paper "Inverted Residuals and Linear Bottlenecks: 
Mobile Networks for Classification, Detection and Segmentation".
   
   ## Checklist ##
   ### Essentials ###
   - [ Y ] Passed code style checking (`pylint`)
   - [ Y ] Changes are complete (i.e. I finished coding on this PR)
   - [ N ] All changes have test coverage:
   - [ Y ] Code is well-documented: 
  - [ Y ] To the best of my knowledge, examples are either not affected by this 
change, or have been fixed to be compatible with this change
   
   ## Comments ##
  - Not sure this is the correct implementation, but this model does work.
   




[GitHub] kitstar commented on issue #8421: model_converter.py fails to convert GoogLeNet caffe model

2018-01-29 Thread GitBox
kitstar commented on issue #8421: model_converter.py fails to convert GoogLeNet 
caffe model
URL: 
https://github.com/apache/incubator-mxnet/issues/8421#issuecomment-361259684
 
 
   Hi @badstones you can try [MMdnn](https://www.github.com/Microsoft/MMdnn) to 
convert the google pre-trained caffe model.




[GitHub] KellenSunderland closed pull request #9542: WIP: Do not merge, Dockerfile to create Jetson TX1 and TX2 compatible builds.

2018-01-29 Thread GitBox
KellenSunderland closed pull request #9542: WIP: Do not merge, Dockerfile to 
create Jetson TX1 and TX2 compatible builds.
URL: https://github.com/apache/incubator-mxnet/pull/9542
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/Jenkinsfile b/Jenkinsfile
index b7a8f60cb9..6218c626db 100644
--- a/Jenkinsfile
+++ b/Jenkinsfile
@@ -204,6 +204,16 @@ try {
 }
   }
 },
+'asdfads': {
+  node('mxnetlinux-cpu') {
+ws('workspace/jetson') {
+  init_git()
+  sh "make clean"
+  sh "make -C amalgamation/ clean"
+  sh "docker build -f docker_multiarch/Dockerfile.build.jetson .."
+}
+  }
+},
 'CPU: MKLML': {
   node('mxnetlinux-cpu') {
 ws('workspace/build-mklml-cpu') {
diff --git a/docker_multiarch/Dockerfile.build.jetson 
b/docker_multiarch/Dockerfile.build.jetson
new file mode 100644
index 00..93fa53dd01
--- /dev/null
+++ b/docker_multiarch/Dockerfile.build.jetson
@@ -0,0 +1,72 @@
+# -*- mode: dockerfile -*-
+# dockerfile to build libmxnet.so, and a python wheel for the Jetson TX1/TX2
+
+FROM nvidia/cuda:8.0-cudnn5-devel as cudabuilder
+
+FROM dockcross/linux-arm64
+
+ENV ARCH aarch64
+ENV NVCCFLAGS "-m64"
+ENV CUDA_ARCH "-gencode arch=compute_53,code=sm_53 -gencode 
arch=compute_62,code=sm_62"
+ENV BUILD_OPTS "USE_OPENCV=0 USE_BLAS=openblas USE_SSE=0 USE_CUDA=1 
USE_CUDNN=0 ENABLE_CUDA_RTC=0 USE_NCCL=0 USE_CUDA_PATH=/usr/local/cuda/"
+ENV CC /usr/bin/aarch64-linux-gnu-gcc
+ENV CXX /usr/bin/aarch64-linux-gnu-g++
+ENV FC /usr/bin/aarch64-linux-gnu-gfortran-4.9
+ENV HOSTCC gcc
+
+WORKDIR /work
+
+# Build OpenBLAS
+ADD https://api.github.com/repos/xianyi/OpenBLAS/git/refs/heads/master 
/tmp/openblas_version.json
+RUN git clone https://github.com/xianyi/OpenBLAS.git && \
+cd OpenBLAS && \
+make -j$(nproc) TARGET=ARMV8 && \
+make install && \
+ln -s /opt/OpenBLAS/lib/libopenblas.so /usr/lib/libopenblas.so && \
+ln -s /opt/OpenBLAS/lib/libopenblas.a /usr/lib/libopenblas.a && \
+ln -s /opt/OpenBLAS/lib/libopenblas.a /usr/lib/liblapack.a
+
+ENV LD_LIBRARY_PATH $LD_LIBRARY_PATH:/opt/OpenBLAS/lib
+ENV CPLUS_INCLUDE_PATH /opt/OpenBLAS/include
+
+# Setup CUDA build env (including configuring and copying nvcc)
+COPY --from=cudabuilder /usr/local/cuda /usr/local/cuda
+ENV PATH $PATH:/usr/local/cuda/bin
+ENV TARGET_ARCH aarch64
+ENV TARGET_OS linux
+
+# Install ARM dependencies based on Jetpack 3.1
+RUN wget 
http://developer.download.nvidia.com/devzone/devcenter/mobile/jetpack_l4t/013/linux-x64/cuda-repo-l4t-8-0-local_8.0.84-1_arm64.deb
 && \
+wget 
http://developer.download.nvidia.com/devzone/devcenter/mobile/jetpack_l4t/013/linux-x64/libcudnn6_6.0.21-1+cuda8.0_arm64.deb
 && \
+dpkg -i cuda-repo-l4t-8-0-local_8.0.84-1_arm64.deb && \
+dpkg -i libcudnn6_6.0.21-1+cuda8.0_arm64.deb && \
+apt update -y && \
+apt install cuda-cudart-cross-aarch64-8-0 cuda-cublas-cross-aarch64-8-0 \
+cuda-nvml-cross-aarch64-8-0 cuda-nvrtc-cross-aarch64-8-0 
cuda-cufft-cross-aarch64-8-0 \
+cuda-curand-cross-aarch64-8-0 cuda-cusolver-cross-aarch64-8-0 
cuda-cusparse-cross-aarch64-8-0 \
+cuda-misc-headers-cross-aarch64-8-0 cuda-npp-cross-aarch64-8-0 libcudnn6 
-y && \
+cp /usr/local/cuda-8.0/targets/aarch64-linux/lib/*.so 
/usr/local/cuda/lib64/ && \
+cp /usr/local/cuda-8.0/targets/aarch64-linux/lib/stubs/*.so 
/usr/local/cuda/lib64/stubs/ && \
+cp -r /usr/local/cuda-8.0/targets/aarch64-linux/include/ 
/usr/local/cuda/include/ && \
+rm cuda-repo-l4t-8-0-local_8.0.84-1_arm64.deb && rm 
libcudnn6_6.0.21-1+cuda8.0_arm64.deb
+
+# Build MXNet
+ADD incubator-mxnet incubator-mxnet
+
+WORKDIR /work/incubator-mxnet
+
+# Add ARM specific settings
+ADD arm.crosscompile.mk make/config.mk
+
+# Build and link
+RUN make -j$(nproc) $BUILD_OPTS
+
+# Create a binary wheel for easy installation.
+# When using tool.py output will be in the jetson folder.
+# Scp the .whl file to your target device, and install via
+# pip install
+WORKDIR /work/incubator-mxnet/python
+RUN python setup.py  bdist_wheel --universal
+
+# Copy build artifacts to output folder for tool.py script
+RUN mkdir -p /work/build & cp dist/*.whl /work/build && cp ../lib/* /work/build


 




[GitHub] KellenSunderland commented on issue #9542: WIP: Do not merge, Dockerfile to create Jetson TX1 and TX2 compatible builds.

2018-01-29 Thread GitBox
KellenSunderland commented on issue #9542: WIP: Do not merge, Dockerfile to 
create Jetson TX1 and TX2 compatible builds.
URL: https://github.com/apache/incubator-mxnet/pull/9542#issuecomment-361235723
 
 
   Iterating on this one in a private CI for a while.  Will reopen when stable.




[GitHub] calendarbase commented on issue #9399: error: unable to load shared object /mxnet/libs/libmxnet.so

2018-01-29 Thread GitBox
calendarbase commented on issue #9399:  error: unable to load shared object 
/mxnet/libs/libmxnet.so
URL: 
https://github.com/apache/incubator-mxnet/issues/9399#issuecomment-361232562
 
 
   Ok, so this could be a path issue?




[GitHub] szha commented on issue #8471: Saved checkpoint "fcn_xs"

2018-01-29 Thread GitBox
szha commented on issue #8471: Saved checkpoint "fcn_xs"
URL: 
https://github.com/apache/incubator-mxnet/issues/8471#issuecomment-361230264
 
 
   @apache/mxnet-committers: This issue has been inactive for the past 90 days. 
It has no label and needs triage.
   
   For general "how-to" questions, our [user forum](https://discuss.mxnet.io/) 
(and [Chinese version](https://discuss.gluon.ai/)) is a good place to get help.




[GitHub] szha commented on issue #8038: why I use MXPredCreate(), it shows " mxnet_predict-all.cc:23570: Check failed: op != nullptr Operator RegressionOutput is not registered"?

2018-01-29 Thread GitBox
szha commented on issue #8038: why I use MXPredCreate(), it shows  " 
mxnet_predict-all.cc:23570: Check failed: op != nullptr Operator 
RegressionOutput is not registered"?
URL: 
https://github.com/apache/incubator-mxnet/issues/8038#issuecomment-361230268
 
 
   @apache/mxnet-committers: This issue has been inactive for the past 90 days. 
It has no label and needs triage.
   
   For general "how-to" questions, our [user forum](https://discuss.mxnet.io/) 
(and [Chinese version](https://discuss.gluon.ai/)) is a good place to get help.




[GitHub] alexmosc commented on issue #9358: Why does running 1 round of an MXNET model training produce Train-mse=NaN?

2018-01-29 Thread GitBox
alexmosc commented on issue #9358: Why does running 1 round of an MXNET model 
training produce Train-mse=NaN?
URL: 
https://github.com/apache/incubator-mxnet/issues/9358#issuecomment-361199965
 
 
   Any updates on this question?




[GitHub] KellenSunderland commented on issue #9612: CUDNN_STATUS_SUCCESS (4 vs. 0) cuDNN: CUDNN_STATUS_INTERNAL_ERROR on jetson TX2

2018-01-29 Thread GitBox
KellenSunderland commented on issue #9612: CUDNN_STATUS_SUCCESS (4 vs. 0) 
cuDNN: CUDNN_STATUS_INTERNAL_ERROR on jetson TX2
URL: 
https://github.com/apache/incubator-mxnet/issues/9612#issuecomment-361192731
 
 
   Thanks for the report @yanhn and the flag @marcoabreu.  We're running SSD on 
Jetsons quite often so we can probably help out.  I'll try and repro with your 
exact steps and see if I can break it.
   
   FYI @larroy 




[GitHub] zhangchen-qinyinghua opened a new issue #9613: Transition matrix is not updated during the train in the lstm_crf example

2018-01-29 Thread GitBox
zhangchen-qinyinghua opened a new issue #9613: Transition matrix is not updated 
during the train in the lstm_crf example
URL: https://github.com/apache/incubator-mxnet/issues/9613
 
 
  I think there is a bug in the lstm_crf example.
  Please see my review comments there.
   
   https://github.com/apache/incubator-mxnet/pull/7253



