[GitHub] piiswrong closed pull request #7147: cuda support for linalg-functions, restructuring of linalg interfaces
piiswrong closed pull request #7147: cuda support for linalg-functions, restructuring of linalg interfaces URL: https://github.com/apache/incubator-mxnet/pull/7147 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] piiswrong commented on issue #7425: Tensorcore fullyconnected support
piiswrong commented on issue #7425: Tensorcore fullyconnected support URL: https://github.com/apache/incubator-mxnet/pull/7425#issuecomment-322000376 The linalg_gemm PR is merged. Please use that for tensorcore support
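For readers following the linalg_gemm discussion: as I understand its documentation, linalg_gemm follows the standard BLAS GEMM contract, out = alpha * op(A) * op(B) + beta * C, where op() optionally transposes its argument. Below is a minimal pure-Python reference of that contract that can help when checking results on small inputs. The function name `gemm_reference` and the nested-list representation are illustrative assumptions, not MXNet's API.

```python
def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(row) for row in zip(*m)]


def gemm_reference(a, b, c, transpose_a=False, transpose_b=False,
                   alpha=1.0, beta=1.0):
    """Compute alpha * op(a) @ op(b) + beta * c on plain nested lists.

    Illustrative reference only: unoptimized, 2-D, no batching.
    """
    op_a = transpose(a) if transpose_a else a
    op_b = transpose(b) if transpose_b else b
    n, k = len(op_a), len(op_a[0])
    if len(op_b) != k:
        raise ValueError("inner dimensions of op(a) and op(b) do not match")
    m = len(op_b[0])
    if len(c) != n or len(c[0]) != m:
        raise ValueError("c must have shape (rows of op(a), cols of op(b))")
    out = []
    for i in range(n):
        row = []
        for j in range(m):
            acc = sum(op_a[i][p] * op_b[p][j] for p in range(k))
            row.append(alpha * acc + beta * c[i][j])
        out.append(row)
    return out
```

Setting beta=0 turns the accumulate into a plain product, which is the case a FullyConnected forward pass typically needs.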
[GitHub] lxn2 commented on issue #5: Change to 0.11-RC
lxn2 commented on issue #5: Change to 0.11-RC URL: https://github.com/apache/incubator-mxnet-site/pull/5#issuecomment-321989354 Does this not list 0.10 as one of the versions? I wiped it out and pushed your changes, but tags.txt only has .11.
[GitHub] lxn2 commented on issue #5: Change to 0.11-RC
lxn2 commented on issue #5: Change to 0.11-RC URL: https://github.com/apache/incubator-mxnet-site/pull/5#issuecomment-321993040 That's not my understanding... Did Steffen suggest deleting 0.10 anywhere? I thought we wanted to keep all the versions so far, including .10?
[GitHub] asmushetzel commented on issue #7147: cuda support for linalg-functions, restructuring of linalg interfaces
asmushetzel commented on issue #7147: cuda support for linalg-functions, restructuring of linalg interfaces URL: https://github.com/apache/incubator-mxnet/pull/7147#issuecomment-321958146 So this is done from my point of view. Jenkins failed on some nodes in some tests, but that is all due to an unrelated problem with a Gluon test case (I examined all the log files). The relevant la_op tests pass on all nodes. Eric, from my point of view this can be merged now.
[GitHub] goodtogood commented on issue #7154: mx.contrib.sym.ctc_loss slow down extremely on gpu with large alphabet_size,but faster on cpu
goodtogood commented on issue #7154: mx.contrib.sym.ctc_loss slow down extremely on gpu with large alphabet_size,but faster on cpu URL: https://github.com/apache/incubator-mxnet/issues/7154#issuecomment-321994447 Same problem. Would you like to share your code? Thanks!
[GitHub] lxn2 closed pull request #6: Fix example link
lxn2 closed pull request #6: Fix example link URL: https://github.com/apache/incubator-mxnet-site/pull/6
[GitHub] szha opened a new pull request #7442: contrib ctc interface changes for compatibility, and gluon CTC
szha opened a new pull request #7442: contrib ctc interface changes for compatibility, and gluon CTC URL: https://github.com/apache/incubator-mxnet/pull/7442 This change makes the current contrib CTC compatible with the cuDNN 7 CTC interface and adds a CTC loss layer for Gluon.
[GitHub] piiswrong closed pull request #7441: broken link in readme
piiswrong closed pull request #7441: broken link in readme URL: https://github.com/apache/incubator-mxnet/pull/7441
[GitHub] ufukcbicici opened a new issue #7443: tf.boolean_mask equivalent in MxNet
ufukcbicici opened a new issue #7443: tf.boolean_mask equivalent in MxNet URL: https://github.com/apache/incubator-mxnet/issues/7443 In my architecture, I want to pick some of the samples in a minibatch at different stages of computation, based on a binary vector produced elsewhere in the network. In TensorFlow, tf.boolean_mask is generally used for that purpose. How can I achieve a similar effect in MXNet? Is it possible?
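The semantics asked about (keep only the minibatch rows whose mask entry is 1) amount to "turn the mask into row indices, then gather those rows". Below is a framework-agnostic sketch in plain Python; `mask_to_indices` and `boolean_mask` are hypothetical helpers, not MXNet functions. In MXNet the gather step could presumably be done with an indexing operator such as `take`, but the output's first dimension then depends on the data, which a static-shape symbolic graph may not handle directly (an assumption worth verifying).

```python
def mask_to_indices(mask):
    """Return the positions where the binary mask is non-zero."""
    return [i for i, keep in enumerate(mask) if keep]


def boolean_mask(batch, mask):
    """Keep the rows of `batch` whose mask entry is truthy.

    Mirrors tf.boolean_mask's row selection for a 1-D mask over the
    first axis: compute the kept indices, then gather those rows.
    """
    if len(batch) != len(mask):
        raise ValueError("mask length must equal the batch size")
    return [batch[i] for i in mask_to_indices(mask)]
```

Because the number of kept rows varies per batch, a common alternative in static-shape settings is to keep the full batch and multiply per-sample losses by the mask instead; whether that fits depends on the architecture.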
[GitHub] piiswrong commented on a change in pull request #7082: Sparse Tensor: request for reviews
piiswrong commented on a change in pull request #7082: Sparse Tensor: request for reviews URL: https://github.com/apache/incubator-mxnet/pull/7082#discussion_r132833058
## File path: python/mxnet/ndarray/sparse_ndarray.py ##
@@ -0,0 +1,906 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# coding: utf-8
+"""SparseNDArray API of mxnet."""
+from __future__ import absolute_import
+from __future__ import division
+try:
+    from __builtin__ import slice as py_slice
+except ImportError:
+    from builtins import slice as py_slice
+
+import ctypes
+import warnings
+
+import os as _os
+import sys as _sys
+
+# import operator
+import numpy as np
+from ..base import NotSupportedForSparseNDArray
+from ..base import _LIB, numeric_types
+from ..base import c_array, mx_real_t
+from ..base import mx_uint, NDArrayHandle, check_call
+from ..context import Context
+from . import _internal
+from .ndarray import _DTYPE_NP_TO_MX, _DTYPE_MX_TO_NP
+from .ndarray import _STORAGE_TYPE_STR_TO_ID
+from .ndarray import NDArray, _storage_type, _zeros_ndarray, _array
+from . import cast_storage
+from . import slice as nd_slice
+
+# Use different verison of SymbolBase
+# When possible, use cython to speedup part of computation.
+# pylint: disable=unused-import
+try:
+    if int(_os.environ.get("MXNET_ENABLE_CYTHON", True)) == 0:
+        from .._ctypes.ndarray import NDArrayBase, _set_ndarray_class
+    elif _sys.version_info >= (3, 0):
+        from .._cy3.ndarray import NDArrayBase, _set_ndarray_class
Review comment: Why not import these from ndarray.py to avoid the try/except?
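The reviewer's suggestion is to keep the Cython/ctypes selection in one module (ndarray.py) and have sparse_ndarray.py import the result from there, so the try/except lives in a single place. The selection rule visible in the hunk can be sketched as a pure function. `choose_ndarray_backend` and `resolve_backend` are hypothetical names, and both the Python 2 (`_cy2`) branch and the fallback to `_ctypes` on ImportError are assumptions about the part of the file cut off from the hunk.

```python
import sys


def choose_ndarray_backend(environ, version_info=sys.version_info):
    """Return the module path NDArrayBase would be imported from.

    Mirrors the branch order in the diff: MXNET_ENABLE_CYTHON=0 forces
    ctypes; otherwise Python 3 picks the _cy3 extension.
    """
    if int(environ.get("MXNET_ENABLE_CYTHON", True)) == 0:
        return "mxnet._ctypes.ndarray"
    if version_info >= (3, 0):
        return "mxnet._cy3.ndarray"
    return "mxnet._cy2.ndarray"  # assumed Python 2 branch (not in the hunk)


def resolve_backend(environ, cython_available, version_info=sys.version_info):
    """Fall back to ctypes when the compiled extension cannot be imported.

    `cython_available` stands in for whether the Cython module imports
    successfully, i.e. the assumed `except ImportError` path.
    """
    choice = choose_ndarray_backend(environ, version_info)
    if choice != "mxnet._ctypes.ndarray" and not cython_available:
        return "mxnet._ctypes.ndarray"
    return choice
```

If this decision is made once in ndarray.py, sparse_ndarray.py could use a plain `from .ndarray import ...` for the chosen classes, which is how I read the reviewer's point.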
[GitHub] xzqjack opened a new issue #7444: define a new Parametrized symbol layer and how to use (bind, init, set learning rate ) it?
peration is ok. Can anyone show me how to use (bind, init, set learning rate for) a new parameterized layer? Thanks.
[GitHub] punisher-n commented on issue #6023: pip install error: No matching distribution found for mxnet-cu80
punisher-n commented on issue #6023: pip install error: No matching distribution found for mxnet-cu80 URL: https://github.com/apache/incubator-mxnet/issues/6023#issuecomment-321838490 I am having the same issue on my machine: "Could not find a version that satisfies the requirement setup.py (from versions: ) No matching distribution found for setup.py". Please help me with this.
[GitHub] lxn2 opened a new pull request #7429: Add more license files
lxn2 opened a new pull request #7429: Add more license files URL: https://github.com/apache/incubator-mxnet/pull/7429 Added more extension handlers to license_header.py. Made it verbose so we keep track of what we're whitelisting/skipping. Added license headers to files with newly added extension headers.
[GitHub] ArturIndio opened a new issue #7446: format input data using mx.rnn
ArturIndio opened a new issue #7446: format input data using mx.rnn URL: https://github.com/apache/incubator-mxnet/issues/7446 [data_ex_git.zip](https://github.com/apache/incubator-mxnet/files/1220978/data_ex_git.zip)
I've been trying to forecast a time series using the mx.rnn model, but I can't shape the data into the input format expected by mx.rnn. I want to predict a variable Q_t using some past values of itself and of other variables. With mx.model.FeedForward.create this is very easy to define. I can't understand the "labels" input of the mx.rnn model. This is my code:
```
load("data_ex_git.RData") # attached
train <- data[1:dias_train,]
test <- data[(dias_train+1):nrow(data),]
# Neural net fitting
# Scaling data for the NN
maxs <- apply(data, 2, max)
mins <- apply(data, 2, min)
scaled <- as.data.frame(scale(data, center = mins, scale = maxs - mins))
train_ <- scaled[1:dias_train,]
test_ <- scaled[(dias_train+1):nrow(data),]
library(mxnet)
train.x <- data.matrix(train_[,-1])
train.y <- train_[,1]
test.x <- data.matrix(test_[,-1])
test.y <- test_[,1]
X.train <- list(data=t(train.x), label=t(train.y))
X.val <- list(data=t(test.x), label=t(test.y))
batch.size = 5
seq.len = 5
num.hidden = 3
num.embed = 3
num.rnn.layer = 1
num.lstm.layer = 1
num.round = 1
update.period = 1
learning.rate= 0.1
wd=0.1
clip_gradient=1
mx.set.seed(0)
model <- mx.rnn(X.train, NULL, num.rnn.layer=num.rnn.layer, seq.len=seq.len,
                num.hidden=num.hidden, num.embed=num.embed, num.label=5,
                batch.size=batch.size, input.size=5, ctx = mx.cpu(),
                num.round = num.round, update.period = update.period,
                initializer = mx.init.uniform(0.01), dropout = 0,
                optimizer = "sgd", batch.norm = FALSE,
                learning.rate=learning.rate, wd=wd, clip_gradient=clip_gradient)
preds = predict(model,t(test.x))
```
This is my error:
`[19:10:06] d:\program files (x86)\jenkins\workspace\mxnet\mxnet\src\operator\tensor\./matrix_op-inl.h:141: Using target_shape will be deprecated.
[19:10:06] d:\program files (x86)\jenkins\workspace\mxnet\mxnet\src\operator\tensor\./matrix_op-inl.h:141: Using target_shape will be deprecated.
[19:10:06] d:\program files (x86)\jenkins\workspace\mxnet\mxnet\src\operator\tensor\./matrix_op-inl.h:141: Using target_shape will be deprecated.
[19:10:06] D:\Program Files (x86)\Jenkins\workspace\mxnet\mxnet\dmlc-core\include\dmlc/logging.h:308: [19:10:06] D:\Program Files (x86)\Jenkins\workspace\mxnet\mxnet\src\ndarray\ndarray.cc:329: Check failed: from.shape() == to->shape() operands shape mismatchfrom.shape = (5,14) to.shape=(5,5)
Error in exec$update.arg.arrays(arg.arrays, match.name, skip.null) :
[19:10:06] D:\Program Files (x86)\Jenkins\workspace\mxnet\mxnet\src\ndarray\ndarray.cc:329: Check failed: from.shape() == to->shape() operands shape mismatchfrom.shape = (5,14) to.shape=(5,5)`
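The question is how to shape a multivariate series so that each training example is a window of past values paired with a target to predict. Below is a framework-agnostic Python sketch of the usual sliding-window construction: each sample is `seq_len` consecutive rows of all features, and its label is the target column at the step right after the window. The helper `make_windows` and the one-step-ahead target are illustrative assumptions; this does not show what `mx.rnn` itself expects for its `label` argument. Separately, the error reports a (5,14) array being copied into a (5,5) slot, so it may be worth checking that `input.size` and `num.label` match the actual data dimensions (an inference from the message, not verified).

```python
def make_windows(rows, target_col, seq_len):
    """Build (window, label) pairs for one-step-ahead forecasting.

    rows:       feature vectors ordered in time (each a list of floats).
    target_col: index of the column to predict (e.g. Q_t).
    seq_len:    number of past time steps in each input window.
    Window t is rows[t - seq_len:t]; its label is rows[t][target_col].
    """
    if seq_len < 1:
        raise ValueError("seq_len must be at least 1")
    windows, labels = [], []
    for t in range(seq_len, len(rows)):
        windows.append(rows[t - seq_len:t])
        labels.append(rows[t][target_col])
    return windows, labels
```

The windows form an array of shape (num_samples, seq_len, num_features); frameworks differ in whether they expect batch-major or time-major layout, so a transpose may still be needed before training.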
[GitHub] ysh329 commented on issue #7424: train_mnist.py failed: TypeError: __init__() got an unexpected keyword argument 'multi_precision'
ysh329 commented on issue #7424: train_mnist.py failed: TypeError: __init__() got an unexpected keyword argument 'multi_precision' URL: https://github.com/apache/incubator-mxnet/issues/7424#issuecomment-322086687 @ptrendx How stupid of me, I forgot to run `make` and `pip install -e .`. It works on the current master branch. Issue closed. :rofl:
[GitHub] ysh329 closed issue #7424: train_mnist.py failed: TypeError: __init__() got an unexpected keyword argument 'multi_precision'
ysh329 closed issue #7424: train_mnist.py failed: TypeError: __init__() got an unexpected keyword argument 'multi_precision' URL: https://github.com/apache/incubator-mxnet/issues/7424
[GitHub] sbodenstein opened a new issue #7445: Using cuDNN for CTC Loss
sbodenstein opened a new issue #7445: Using cuDNN for CTC Loss URL: https://github.com/apache/incubator-mxnet/issues/7445 @piiswrong, @szha: Now that cuDNN 7 supports CTC loss, perhaps we should discard the current GPU implementation in contrib.ctc_loss (adapted from the WarpCTC implementation) and only use cuDNN for GPU? The main reasons:
1) It requires maintenance effort to ensure the GPU implementation works on new GPU architectures, including careful updating of dependencies (like modern gpu).
2) Users are still reporting memset problems when using the WarpCTC plugin (#6121).
I don't think the maintenance effort is worthwhile if almost every single user training with CUDA will have cuDNN. What are your thoughts?
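For context on what both GPU paths (the WarpCTC-derived kernel and cuDNN 7) compute: CTC loss is the negative log of the total probability of all frame-level alignments that collapse to the target sequence, obtained with a forward (alpha) recursion over the target extended with blanks. Below is a minimal pure-Python reference sketch in plain probability space, with no log-space scaling, so it suits only short inputs. The function name and the default blank index 0 are assumptions, not the MXNet or cuDNN interface.

```python
import math


def ctc_loss(probs, labels, blank=0):
    """Negative log-likelihood of `labels` given per-frame class probabilities.

    probs:  list of T distributions, probs[t][k] = P(class k at frame t).
    labels: target label indices, none of which may equal `blank`.
    Works in plain probability space, so long inputs can underflow.
    """
    # Extended target: blank, l1, blank, l2, ..., lN, blank
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    num_states = len(ext)

    # alpha[s]: total probability of all alignment prefixes ending in state s
    alpha = [0.0] * num_states
    alpha[0] = probs[0][blank]
    if num_states > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new_alpha = [0.0] * num_states
        for s in range(num_states):
            total = alpha[s]                   # stay in the same state
            if s >= 1:
                total += alpha[s - 1]          # advance by one state
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]          # skip a blank between distinct labels
            new_alpha[s] = total * probs[t][ext[s]]
        alpha = new_alpha

    # Valid alignments end on the last label or on the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if num_states > 1 else 0.0)
    if likelihood <= 0.0:
        return math.inf  # no alignment fits, e.g. too few frames
    return -math.log(likelihood)
```

Production implementations run the same recursion in log space (or with per-step rescaling) and add a backward pass to obtain gradients.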
[GitHub] thirdwing closed pull request #7437: [R] vignette update
thirdwing closed pull request #7437: [R] vignette update URL: https://github.com/apache/incubator-mxnet/pull/7437
[GitHub] szha commented on issue #7445: Using cuDNN for CTC Loss
szha commented on issue #7445: Using cuDNN for CTC Loss URL: https://github.com/apache/incubator-mxnet/issues/7445#issuecomment-322064100 Thanks for raising this, @sbodenstein. I'm working on using the cudnn7 implementation of CTC for GPU.
[GitHub] ysh329 commented on a change in pull request #7363: Add tensorboard configure into ./common/fit.py and ./train_mnist.py
ysh329 commented on a change in pull request #7363: Add tensorboard configure into ./common/fit.py and ./train_mnist.py URL: https://github.com/apache/incubator-mxnet/pull/7363#discussion_r132864621
## File path: example/image-classification/train_mnist.py ##
@@ -75,5 +75,13 @@ def get_mnist_iter(args, kv):
     net = import_module('symbols.'+args.network)
     sym = net.get_symbol(**vars(args))
+    # tensorboard logs
+    train_log = 'logs/mnist/train'
+    eval_log = 'logs/mnist/eval'
+    batch_end_callbacks = [mx.contrib.tensorboard.LogMetricsCallback(train_log)]
Review comment: How stupid of me, I forgot to run `make` and `pip install -e .`. It works on the current master branch.
[GitHub] DickJC123 opened a new pull request #7447: Tensorcore fullyconnected support2
DickJC123 opened a new pull request #7447: Tensorcore fullyconnected support2 URL: https://github.com/apache/incubator-mxnet/pull/7447 Consider this an alternative approach to getting TensorCore working with FullyConnected. It is far simpler than my first PR for this functionality. If anything, it is my proof that one can invoke TensorCore algos by manipulating the cuBLAS handle, together with the existing dot function's use of Hgemm and SgemmEx. This PR also shows the kind of per-instance handle manipulation that is necessary: blindly setting the handle globally to enable TensorCore will have the unfortunate side effect of introducing fp16 casts on the inputs of fp32-I/O gemms.
Bottom line: I wouldn't expect you to accept this PR without a discussion. I have begun studying the new linear algebra code with the idea of producing an enable-TensorCore PR for this new approach. I notice the new LA code doesn't support fp16-I/O gemms yet, and the solution there will not fit the mold of the existing function templates. Also, what is the plan for switching MXNet's use of dot() over to the new functions?
[GitHub] formath opened a new pull request #7452: rm not use variables
formath opened a new pull request #7452: rm not use variables URL: https://github.com/apache/incubator-mxnet/pull/7452 @piiswrong
[GitHub] BenLag2906 opened a new issue #7453: Pb in import of mxnet
BenLag2906 opened a new issue #7453: Pb in import of mxnet URL: https://github.com/apache/incubator-mxnet/issues/7453
I have a problem importing mxnet in Python.
## Environment info
Operating System: Windows 10
Compiler: Visual Studio 2015, 64-bit
Package used (Python/R/Scala/Julia): Python
MXNet version: 0.10 or 0.11, installed from source
If you are using python package, please provide Python version and distribution: 2.7.13
## Error Message:
Python 2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 20:53:40) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "mxnet\__init__.py", line 25, in <module>
    from .base import MXNetError
  File "mxnet\base.py", line 86, in <module>
    _LIB = _load_lib()
  File "mxnet\base.py", line 78, in _load_lib
    lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_LOCAL)
  File "C:\Python27\lib\ctypes\__init__.py", line 362, in __init__
    self._handle = _dlopen(self._name, mode)
WindowsError: [Error 126] Le module spécifié est introuvable (The specified module could not be found)
## Minimum reproducible example
if you are using your own code, please provide a short script that reproduces the error.
## Steps to reproduce
or if you are running standard examples, please provide the commands you have run that lead to the error.
1. After compilation: import mxnet
## What have you tried to solve it?
1. Different versions
[GitHub] houkai opened a new issue #7450: reporting bugs: pbegin_ <= pend_. Two thread conflicts.
; } return parser_.ParseNext(*dptr); }, [this]() { parser_.BeforeFirst(); }); } ```
This thread produces data and puts it into a buffer (size 16); the next function consumes data from the buffer.
3. Two threads use the same fs_. The first one reads 2 items (buffer size 2) and then stops, because no consumer reads from its queue, but by then it has already changed fs_. If the first thread has finished reading its 2 items before the second thread starts reading data, it does not affect the second thread. However, if the first thread is still running when the second thread begins, they conflict, and as a result the second thread computes the wrong position in the file (fs_).
## What have you tried to solve it?
1. Close the first thread in dmlc. Because I train on one computer and don't use cache_file, I return the split object early, before it is wrapped in a threaded or cached split. This solved it. InputSplit::Create in io.cc:
```
InputSplit* InputSplit::Create(const char *uri_,
                               unsigned part,
                               unsigned nsplit,
                               const char *type) {
  using namespace std;
  using namespace dmlc::io;
  // allow cachefile in format path#cachefile
  io::URISpec spec(uri_, part, nsplit);
  if (!strcmp(spec.uri.c_str(), "stdin")) {
    return new SingleFileSplit(spec.uri.c_str());
  }
  CHECK(part < nsplit) << "invalid input parameter for InputSplit::Create";
  URI path(spec.uri.c_str());
  InputSplitBase *split = NULL;
  if (!strcmp(type, "text")) {
    split = new LineSplitter(FileSystem::GetInstance(path),
                             spec.uri.c_str(), part, nsplit);
  } else if (!strcmp(type, "recordio")) {
    split = new RecordIOSplitter(FileSystem::GetInstance(path),
                                 spec.uri.c_str(), part, nsplit);
  } else {
    LOG(FATAL) << "unknown input split type " << type;
  }
  // Early return added by the reporter: skips the threaded/cached wrapper below.
  return split;
  /*#if DMLC_ENABLE_STD_THREAD
  if (spec.cache_file.length() == 0) {
    return new ThreadedInputSplit(split);
  } else {
    return new CachedInputSplit(split, spec.cache_file.c_str());
  }
#else
  CHECK(spec.cache_file.length() == 0)
      << "to enable cached file, compile with c++11";
  return split;
#endif*/
}
```
2. I don't know the impact of DMLC_ENABLE_STD_THREAD, so I kept DMLC_ENABLE_STD_THREAD = 1.
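The failure described in this report is a shared-cursor race: two readers use the same stream object (`fs_`), and one reader moves the file position between the other reader's seek and read. A minimal Python sketch of the general remedy, not the dmlc-core code, is to make "seek, then read" a single atomic step under a lock (or to give each reader its own handle). `SharedStreamReader`, `read_records`, `demo`, and the fixed-size record layout are hypothetical.

```python
import io
import threading


class SharedStreamReader:
    """Wraps one seekable stream that several reader threads share.

    read_at() performs seek + read while holding a lock, so no other
    thread can move the shared file position in between.
    """

    def __init__(self, stream):
        self._stream = stream
        self._lock = threading.Lock()

    def read_at(self, offset, size):
        with self._lock:
            self._stream.seek(offset)
            return self._stream.read(size)


def read_records(reader, offsets, size, out):
    """Read fixed-size records at the given offsets into the dict `out`."""
    for off in offsets:
        out[off] = reader.read_at(off, size)


def demo(num_records=100, record_size=4, num_threads=2):
    """Read every record from several threads; return {offset: bytes}."""
    data = b"".join(bytes([i % 256]) * record_size for i in range(num_records))
    reader = SharedStreamReader(io.BytesIO(data))
    results = {}
    offsets = [i * record_size for i in range(num_records)]
    threads = [
        threading.Thread(target=read_records,
                         args=(reader, offsets[k::num_threads], record_size, results))
        for k in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

A lock serializes the readers; opening a separate handle per thread avoids that contention at the cost of more open files.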
[GitHub] chinakook commented on issue #7361: training speed of batch-norm is less than batch-norm-v1
chinakook commented on issue #7361: training speed of batch-norm is less than batch-norm-v1 URL: https://github.com/apache/incubator-mxnet/issues/7361#issuecomment-322138483 Is cudnn_off = False?
[GitHub] aktiger commented on issue #3239: I found my mxnet starts very slow, takes about 5 minutes before the gpus begin to run
aktiger commented on issue #3239: I found my mxnet starts very slow, takes about 5 minutes before the gpus begin to run URL: https://github.com/apache/incubator-mxnet/issues/3239#issuecomment-322136683 @sxjscience Thanks, this solved my problem. The first run is still slow, but after that it runs very fast.
[GitHub] ZiyueHuang commented on issue #7449: Fix a bug in SequentialRNNCell.reset()
ZiyueHuang commented on issue #7449: Fix a bug in SequentialRNNCell.reset() URL: https://github.com/apache/incubator-mxnet/pull/7449#issuecomment-322136929 ping @piiswrong @sxjscience
[GitHub] ZhaoxiaZhang commented on issue #7406: inconsistent accuracy: ImageRecordIter vs ImageIter
ZhaoxiaZhang commented on issue #7406: inconsistent accuracy: ImageRecordIter vs ImageIter URL: https://github.com/apache/incubator-mxnet/issues/7406#issuecomment-322140620 Thanks for that! Let me try it, and I will update with the results later.
[GitHub] piiswrong closed pull request #7304: gluon bce loss
piiswrong closed pull request #7304: gluon bce loss URL: https://github.com/apache/incubator-mxnet/pull/7304
[GitHub] ZhaoxiaZhang commented on issue #7406: inconsistent accuracy: ImageRecordIter vs ImageIter
ZhaoxiaZhang commented on issue #7406: inconsistent accuracy: ImageRecordIter vs ImageIter URL: https://github.com/apache/incubator-mxnet/issues/7406#issuecomment-322109650 @melody-rain Thanks for your comments. I will try ImageIter. BTW, any insight into why ImageRecordIter doesn't work? I'm really confused about this.
[GitHub] ZiyueHuang opened a new pull request #7449: Fix a bug in SequentialRNNCell.reset()
ZiyueHuang opened a new pull request #7449: Fix a bug in SequentialRNNCell.reset() URL: https://github.com/apache/incubator-mxnet/pull/7449 SequentialRNNCell.reset() should invoke reset() on all of its layer cells. This causes an issue in https://github.com/awslabs/sockeye/blob/master/sockeye/decoder.py#L379: the result is wrong if someone uses SequentialRNNCell.__call__() to unroll the network the way BaseRNNCell.unroll() does. SequentialRNNCell.unroll() itself is correct at the moment, because it calls unroll() on every layer cell (and cell.unroll() resets at the beginning).
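The fix described here is a container whose reset() forwards to every child. A minimal Python sketch of that pattern uses hypothetical `Cell`/`SequentialCell` classes (loosely modeled on the step counter RNN cells keep for naming, not MXNet's actual classes). It shows why the fix matters: a child that keeps a step counter across calls only starts from zero again if the container resets it.

```python
class Cell:
    """Toy recurrent cell that numbers its steps, like per-step name prefixes."""

    def __init__(self, name):
        self.name = name
        self.counter = -1

    def reset(self):
        self.counter = -1

    def __call__(self, x):
        self.counter += 1
        return "%s_t%d(%s)" % (self.name, self.counter, x)


class SequentialCell(Cell):
    """Stacks cells; a call feeds the input through each child in order."""

    def __init__(self, cells):
        super().__init__("seq")
        self.cells = list(cells)

    def reset(self):
        # The fix: reset the container AND every child cell.
        super().reset()
        for cell in self.cells:
            cell.reset()

    def __call__(self, x):
        self.counter += 1
        for cell in self.cells:
            x = cell(x)
        return x


def unroll(cell, inputs):
    """Unroll via repeated __call__, resetting first as cell.unroll() does."""
    cell.reset()
    return [cell(x) for x in inputs]
```

With the forwarding reset, two consecutive unrolls produce identical step names; without it, the children would continue counting from t2 on the second unroll.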
[GitHub] piiswrong commented on issue #7319: [RoadMap] Legacy issue resolution before 1.0 release
piiswrong commented on issue #7319: [RoadMap] Legacy issue resolution before 1.0 release URL: https://github.com/apache/incubator-mxnet/issues/7319#issuecomment-322110140 @ptrendx @madjam @bhavinthaker The removed tutorials need to be brought back ASAP!
[GitHub] KeyKy opened a new issue #7448: out of memory when training imagenet with .rec file.
KeyKy opened a new issue #7448: out of memory when training imagenet with .rec file. URL: https://github.com/apache/incubator-mxnet/issues/7448
## Environment info
Operating System: ubuntu 16.04
Compiler: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
Package used (Python/R/Scala/Julia): Python
MXNet commit hash: 8ad3c8a7a98dfa6bd6f5065cf9c3688f2414c3d4
Python version and distribution: Python2.7.12
## Error Message:
Memory usage is 5.7%-7.9% at the beginning of training; after a while (3~4h) it reaches 27%, and eventually the process runs out of memory (I was woken up by the alarm message).
## Steps to reproduce
1. cd examples/image-classification && python train_imagenet.py --network my_net --gpus 0,1,2,3 --num-epochs 100 --lr 0.01 --lr-step-epochs 30,60,80,110 --batch-size 256 --top-k 5 --data-train /data_shared/datasets/ILSVRC2015/rec/train_480_q100.rec --data-val /data_shared/datasets/ILSVRC2015/rec/val_480_q100.rec --rgb-mean 123.68,116.779,103.939 --data-nthreads 4 --model-prefix ./my_net
## What have you tried to solve it?
1. Set prefetch_buffer = 1, but the accuracy of my model dropped to 20%; after setting prefetch_buffer back to 2, 4, or 8, the accuracy is correct again.
2. CPU memory still increases continually with prefetch_buffer = 1.
3. Also found some similar issues: https://github.com/apache/incubator-mxnet/issues/1411 https://github.com/apache/incubator-mxnet/issues/3183 https://github.com/apache/incubator-mxnet/issues/2969 https://github.com/apache/incubator-mxnet/issues/2111 https://github.com/apache/incubator-mxnet/issues/2099
[GitHub] CodingCat commented on issue #7411: [scala-package][spark] fix example script
CodingCat commented on issue #7411: [scala-package][spark] fix example script URL: https://github.com/apache/incubator-mxnet/pull/7411#issuecomment-322097900 @javelinjs @terrytangyuan any comments on this?
[GitHub] dabraude opened a new issue #7457: Minimal C example fails to register operators
dabraude opened a new issue #7457: Minimal C example fails to register operators URL: https://github.com/apache/incubator-mxnet/issues/7457 This is the most minimal example of loading a symbol with the C API, and it doesn't work, unless I'm missing something obvious such as a compiler flag.
## Environment info
Operating System: centos
Compiler: gcc
Package used (Python/R/Scala/Julia): C
MXNet version: 0.9.3
## Error Message:
```
[16:17:17] /share/tools/mxnet/dmlc-core/include/dmlc/./logging.h:300: [16:17:17] src/core/op.cc:55: Check failed: op != nullptr Operator FullyConnected is not registered
Stack trace returned 10 entries:
[bt] (0) /share/tools/mxnet/lib/libmxnet.so(_ZN4nnvm2Op3GetERKSs+0x329) [0x7fcb0323f179]
[bt] (1) /share/tools/mxnet/lib/libmxnet.so(+0xef8268) [0x7fcb03227268]
[bt] (2) /share/tools/mxnet/lib/libmxnet.so(_ZN4dmlc20JSONObjectReadHelper13ReadAllFieldsEPNS_10JSONReaderE+0x100) [0x7fcb0322d680]
[bt] (3) /share/tools/mxnet/lib/libmxnet.so(+0xef70ef) [0x7fcb032260ef]
[bt] (4) /share/tools/mxnet/lib/libmxnet.so(_ZNSt17_Function_handlerIFN4nnvm5GraphES1_EPS2_E9_M_invokeERKSt9_Any_dataS1_+0x11f) [0x7fcb02e8c3ef]
[bt] (5) /share/tools/mxnet/lib/libmxnet.so(_ZN4nnvm11ApplyPassesENS_5GraphERKSt6vectorISsSaISsEE+0x501) [0x7fcb03232b51]
[bt] (6) /share/tools/mxnet/lib/libmxnet.so(_ZN5mxnet18LoadLegacyJSONPassEN4nnvm5GraphE+0x180) [0x7fcb02e851c0]
[bt] (7) /share/tools/mxnet/lib/libmxnet.so(_ZNSt17_Function_handlerIFN4nnvm5GraphES1_EPS2_E9_M_invokeERKSt9_Any_dataS1_+0x11f) [0x7fcb02e8c3ef]
[bt] (8) /share/tools/mxnet/lib/libmxnet.so(_ZN4nnvm11ApplyPassesENS_5GraphERKSt6vectorISsSaISsEE+0x501) [0x7fcb03232b51]
[bt] (9) /share/tools/mxnet/lib/libmxnet.so(_ZN4nnvm9ApplyPassENS_5GraphERKSs+0x8e) [0x7fcb0318006e]
```
## Minimum reproducible example
test.c:
```c
#include "mxnet/c_api.h"

int main(void) {
  const char *symfn = "net_symbol.json";
  SymbolHandle sym;
  MXSymbolCreateFromFile(symfn, &sym);
  return 0;
}
```
## Steps to reproduce
1. Compiled with `gcc -I../include -L. -Wl,--whole-archive -lmxnet -Wl,--no-whole-archive test.c -o testrun`
2. Run `./testrun`.
## What have you tried to solve it?
1. Including every header in the include/mxnet directory.
2. Copying all compiler flags from the Makefile.
[GitHub] wanderingpj opened a new issue #7454: Question about networks for cifar100.
wanderingpj opened a new issue #7454: Question about networks for cifar100. URL: https://github.com/apache/incubator-mxnet/issues/7454 What networks are suitable for cifar100 with relatively high testing accuracy?
[GitHub] ZiyueHuang closed pull request #7449: Fix a bug in SequentialRNNCell.reset()
ZiyueHuang closed pull request #7449: Fix a bug in SequentialRNNCell.reset() URL: https://github.com/apache/incubator-mxnet/pull/7449
[GitHub] saswatac opened a new pull request #7456: make MXDataIter work without indices
saswatac opened a new pull request #7456: make MXDataIter work without indices URL: https://github.com/apache/incubator-mxnet/pull/7456 Indices are optional: custom C++ iterators that provide data batches without indices should also work with MXDataIter.
## Testing
python tests/python/unittest/test_io.py
[10:58:26] src/io/iter_mnist.cc:91: MNISTIter: load 6 images, shuffle=1, shape=(100,784)
[GitHub] leoxiaobin opened a new issue #7455: Distributed training is slow
leoxiaobin opened a new issue #7455: Distributed training is slow URL: https://github.com/apache/incubator-mxnet/issues/7455
## Environment info
Operating System: Ubuntu 16.4
Compiler: gcc 5.4
Package used (Python/R/Scala/Julia): Python
MXNet version: latest code, installed from source
MXNet commit hash (`git rev-parse HEAD`): 1a3faa
Python version and distribution: Python 2.7.13 :: Anaconda custom (64-bit)
I tried to train an image classification model on two servers with InfiniBand cards, but the speed is slow, about the same as using one server. I used the code in example/image-classification. When training on one server, the command is
```
python train_imagenet.py --benchmark 1 --gpus 0,1,2,3,4,5,6,7 --kv-store device --network inception-v3 --batch-size 256 --image-shape 3,299,299
```
and the speed is
```
INFO:root:start with arguments Namespace(batch_size=256, benchmark=1, data_nthreads=4, data_train=None, data_val=None, disp_batches=20, dtype='float32', gpus='0,1,2,3,4,5,6,7', image_shape='3,299,299', kv_store='device', load_epoch=None, lr=0.1, lr_factor=0.1, lr_step_epochs='30,60', max_random_aspect_ratio=0.25, max_random_h=36, max_random_l=50, max_random_rotate_angle=10, max_random_s=50, max_random_scale=1, max_random_shear_ratio=0.1, min_random_scale=1, model_prefix=None, mom=0.9, monitor=0, network='inception-v3', num_classes=1000, num_epochs=80, num_examples=1281167, num_layers=50, optimizer='sgd', pad_size=0, random_crop=1, random_mirror=1, rgb_mean='123.68,116.779,103.939', test_io=0, top_k=0, wd=0.0001)
[22:35:19] src/operator/././cudnn_algoreg-inl.h:112: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[22:35:40] src/kvstore/././comm.h:327: only 24 out of 56 GPU pairs are enabled direct access. It may affect the performance. You can set MXNET_ENABLE_GPU_P2P=0 to turn it off
[22:35:40] src/kvstore/././comm.h:336: .vvv
[22:35:40] src/kvstore/././comm.h:336: v.vv
[22:35:40] src/kvstore/././comm.h:336: vv.v
[22:35:40] src/kvstore/././comm.h:336: vvv.
[22:35:40] src/kvstore/././comm.h:336: .vvv
[22:35:40] src/kvstore/././comm.h:336: v.vv
[22:35:40] src/kvstore/././comm.h:336: vv.v
[22:35:40] src/kvstore/././comm.h:336: vvv.
INFO:root:Epoch[0] Batch [20] Speed: 1065.93 samples/sec accuracy=0.165365
INFO:root:Epoch[0] Batch [40] Speed: 1033.22 samples/sec accuracy=0.989648
INFO:root:Epoch[0] Batch [60] Speed: 1029.90 samples/sec accuracy=1.00
INFO:root:Epoch[0] Batch [80] Speed: 1029.80 samples/sec accuracy=1.00
INFO:root:Epoch[0] Batch [100] Speed: 1028.05 samples/sec accuracy=1.00
INFO:root:Epoch[0] Batch [120] Speed: 1019.75 samples/sec accuracy=1.00
INFO:root:Epoch[0] Batch [140] Speed: 1025.79 samples/sec accuracy=1.00
INFO:root:Epoch[0] Batch [160] Speed: 1027.82 samples/sec accuracy=1.00
INFO:root:Epoch[0] Batch [180] Speed: 1021.11 samples/sec accuracy=1.00
INFO:root:Epoch[0] Batch [200] Speed: 1025.14 samples/sec accuracy=1.00
INFO:root:Epoch[0] Batch [220] Speed: 1017.72 samples/sec accuracy=1.00
INFO:root:Epoch[0] Batch [240] Speed: 1021.09 samples/sec accuracy=1.00
INFO:root:Epoch[0] Batch [260] Speed: 1024.25 samples/sec accuracy=1.00
```
When training on two servers, the command is
```
python ../../tools/launch.py -n 2 --launcher ssh -H hosts python train_imagenet.py --benchmark 1 --gpus 0,1,2,3,4,5,6,7 --kv-store dist_sync --network inception-v3 --num-layers 50 --batch-size 256 --sync-dst-dir /tmp/mxnet --image-shape 3,299,299
```
and the speed is
```
INFO:root:Epoch[0] Batch [20] Speed: 609.31 samples/sec accuracy=0.056920
INFO:root:Epoch[0] Batch [20] Speed: 610.12 samples/sec accuracy=0.050967
INFO:root:Epoch[0] Batch [40] Speed: 608.68 samples/sec accuracy=0.854883
INFO:root:Epoch[0] Batch [40] Speed: 608.19 samples/sec accuracy=0.868164
INFO:root:Epoch[0] Batch [60] Speed: 602.48 samples/sec accuracy=1.00
INFO:root:Epoch[0] Batch [60] Speed: 603.86 samples/sec accuracy=1.00
INFO:root:Epoch[0] Batch [80] Speed: 603.11 samples/sec accuracy=1.00
INFO:root:Epoch[0] Batch [80] Speed: 603.87 samples/sec accuracy=1.00
INFO:root:Epoch[0] Batch [100] Speed: 607.30 samples/sec accuracy=1.00
INFO:root:Epoch[0] Batch [100] Speed: 606.54 samples/sec accuracy=1.00
INFO:root:Epoch[0] Batch [120] Speed: 604.53 samples/sec accuracy=1.00
INFO:root:Epoch[0] Batch [120
```
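The logs above allow a rough scaling estimate. This is a minimal sketch, assuming that with `dist_sync` each worker's Speedometer line reports only that worker's own throughput, so the cluster throughput is the sum of the two workers' readings at the same batch. The numbers are copied from the steady-state part of the logs; `scaling` is a hypothetical helper, not part of MXNet.

```python
from statistics import mean

# Steady-state Speedometer readings copied from the logs above (samples/sec).
# Single server, batches 60-260.
single = [1029.90, 1029.80, 1028.05, 1019.75, 1025.79, 1027.82,
          1021.11, 1025.14, 1017.72, 1021.09, 1024.25]
# Two servers: (worker A, worker B) readings for batches 20-100.
dist_pairs = [(609.31, 610.12), (608.68, 608.19), (602.48, 603.86),
              (603.11, 603.87), (607.30, 606.54)]


def scaling(single, dist_pairs, n_workers=2):
    """Return (aggregate throughput, speedup, efficiency).

    Assumes each worker logs only its own throughput, so the cluster's
    throughput is the sum of the per-worker readings at the same batch.
    """
    aggregate = mean(sum(pair) for pair in dist_pairs)
    speedup = aggregate / mean(single)
    return aggregate, speedup, speedup / n_workers


agg, speedup, eff = scaling(single, dist_pairs)
print(f"aggregate={agg:.0f} samples/sec, speedup={speedup:.2f}x, efficiency={eff:.0%}")
```

Under that assumption, the two servers together process about 1213 samples/sec, roughly 1.18x the single-server throughput, or about 59% scaling efficiency.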
[GitHub] ZiyueHuang commented on issue #7449: Fix a bug in SequentialRNNCell.reset()
ZiyueHuang commented on issue #7449: Fix a bug in SequentialRNNCell.reset() URL: https://github.com/apache/incubator-mxnet/pull/7449#issuecomment-322136929 ping @piiswrong @sxjscience
[GitHub] xzqjack commented on issue #7427: how to set dataiter with multi data?
xzqjack commented on issue #7427: how to set dataiter with multi data? URL: https://github.com/apache/incubator-mxnet/issues/7427#issuecomment-322027400 You can look at the definition of the Module class (python/mxnet/module/module.py) and initialize the module with `mod = mx.mod.Module(..., data_names=('data1', 'data2'), ...)`.
[GitHub] xzqjack commented on issue #7426: mx random seed doesn't work for random_uniform/random_normal on gpu
xzqjack commented on issue #7426: mx random seed doesn't work for random_uniform/random_normal on gpu URL: https://github.com/apache/incubator-mxnet/issues/7426#issuecomment-322027770 The same error happens on my server (Ubuntu 16.04, GPU, MXNet version 0.10.1).
[GitHub] ZhaoxiaZhang commented on issue #7406: inconsistent accuracy: ImageRecordIter vs ImageIter
ZhaoxiaZhang commented on issue #7406: inconsistent accuracy: ImageRecordIter vs ImageIter URL: https://github.com/apache/incubator-mxnet/issues/7406#issuecomment-322030133 Hi, I ran into something that may be similar to your problem. I trained with ImageRecordIter and got very good accuracy. However, when I tested on the same data, the accuracy was much lower. I don't know why. BTW, how did you get the mean R, G, B values?
[GitHub] ZhaoxiaZhang commented on issue #7398: inconsistent results when infering
ZhaoxiaZhang commented on issue #7398: inconsistent results when infering URL: https://github.com/apache/incubator-mxnet/issues/7398#issuecomment-322030662 Hi, I have run into similar problems. Have you figured this out?
[GitHub] szha commented on a change in pull request #7449: Fix a bug in SequentialRNNCell.reset()
szha commented on a change in pull request #7449: Fix a bug in SequentialRNNCell.reset() URL: https://github.com/apache/incubator-mxnet/pull/7449#discussion_r133002287
## File path: python/mxnet/rnn/rnn_cell.py
```
@@ -788,6 +788,12 @@ def unpack_weights(self, args):
     def pack_weights(self, args):
         return _cells_pack_weights(self._cells, args)
+    def reset(self):
+        super(SequentialRNNCell, self).reset()
+        if hasattr(self, '_cells'):
```
Review comment: Alternatively, you can put this in BaseRNNCell so that other container cells (e.g. BidirectionalCell) are fixed at the same time. Would you mind doing the same for python/mxnet/gluon/rnn/rnn_cell.py? The container attribute there is `_children`.
[GitHub] piiswrong closed pull request #7439: Small doc cleanups
piiswrong closed pull request #7439: Small doc cleanups URL: https://github.com/apache/incubator-mxnet/pull/7439
[GitHub] CodingCat opened a new pull request #7440: [scala-package] clean up pom (remove duplicate definition)
CodingCat opened a new pull request #7440: [scala-package] clean up pom (remove duplicate definition) URL: https://github.com/apache/incubator-mxnet/pull/7440
[GitHub] terrytangyuan commented on issue #7440: [scala-package] clean up pom (remove duplicate definition)
terrytangyuan commented on issue #7440: [scala-package] clean up pom (remove duplicate definition) URL: https://github.com/apache/incubator-mxnet/pull/7440#issuecomment-322011136 @javelinjs Sorry, I just realized that I merged it without your approval. Should we change the setting so that a PR can only be merged when ALL requested reviewers have approved?
[GitHub] KeyKy commented on issue #3183: Large Rec File Memory and Speed Issues
KeyKy commented on issue #3183: Large Rec File Memory and Speed Issues URL: https://github.com/apache/incubator-mxnet/issues/3183#issuecomment-322011807 I have the same problem when I train SSD or ImageNet using a .rec file. I checked memory usage in `top`: it is 5.7% at the beginning of training, reaches 27% after a while (3~4h), and finally runs out of memory (I was woken up by the alarm message).
[GitHub] ZiyueHuang commented on a change in pull request #7449: Fix a bug in SequentialRNNCell.reset()
ZiyueHuang commented on a change in pull request #7449: Fix a bug in SequentialRNNCell.reset() URL: https://github.com/apache/incubator-mxnet/pull/7449#discussion_r133008855
## File path: python/mxnet/rnn/rnn_cell.py
```
@@ -788,6 +788,12 @@ def unpack_weights(self, args):
     def pack_weights(self, args):
         return _cells_pack_weights(self._cells, args)
+    def reset(self):
+        super(SequentialRNNCell, self).reset()
+        if hasattr(self, '_cells'):
```
Review comment: SequentialRNNCell's constructor calls BaseRNNCell's constructor before it initializes `self._cells`, and BaseRNNCell's constructor calls `self.reset()`. So on that first call, `hasattr(self, '_cells')` is false. @szha
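The constructor-ordering argument can be checked with a minimal sketch. `Base`, `Unguarded`, `Guarded` and `construct` are hypothetical classes written only to reproduce the call order described in the comment (base constructor calls `self.reset()`, which dispatches to the subclass override before `_cells` is assigned); they are not MXNet code.

```python
class Base:
    """Hypothetical base class whose constructor calls self.reset()."""

    def __init__(self):
        self.reset()  # dispatches to the subclass override

    def reset(self):
        self._counter = -1


class Unguarded(Base):
    def __init__(self, cells):
        super().__init__()  # reset() runs here, before _cells is assigned
        self._cells = list(cells)

    def reset(self):
        super().reset()
        for cell in self._cells:  # AttributeError on the very first call
            cell.reset()


class Guarded(Unguarded):
    def reset(self):
        Base.reset(self)
        if hasattr(self, '_cells'):  # False while the base constructor runs
            for cell in self._cells:
                cell.reset()


def construct(cls):
    """Return 'ok' if cls([Base()]) can be built, else the exception name."""
    try:
        cls([Base()])
        return "ok"
    except AttributeError as exc:
        return type(exc).__name__


print(construct(Unguarded), construct(Guarded))
```

The unguarded version fails during construction, while the guarded one builds and still propagates later `reset()` calls to its children.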
[GitHub] cloudhan commented on issue #3724: asnumpy() of NDArray @cpu halted
cloudhan commented on issue #3724: asnumpy() of NDArray @cpu halted URL: https://github.com/apache/incubator-mxnet/issues/3724#issuecomment-322980863 @maxenceliu nah, I lost the script and forgot how to reproduce...
[GitHub] eric-haibin-lin commented on a change in pull request #7082: Sparse Tensor: request for reviews
eric-haibin-lin commented on a change in pull request #7082: Sparse Tensor: request for reviews URL: https://github.com/apache/incubator-mxnet/pull/7082#discussion_r133617511
## File path: python/mxnet/model.py
```
@@ -113,25 +127,36 @@ def _update_params_on_kvstore(param_arrays, grad_arrays, kvstore, param_names):
         # push gradient, priority is negative index
         kvstore.push(name, grad_list, priority=-index)
         # pull back the weights
-        kvstore.pull(name, arg_list, priority=-index)
+        if _contains_non_default_storage(arg_list):
```
Review comment: Normally, given a key, the ndarrays in arg_list will be either all dense or all sparse, right?
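The reviewer's question can be made concrete with a sketch under one assumption: the diff's `_contains_non_default_storage` is taken to mean "some array in the list is not dense", since its body is not shown in the hunk. `FakeArray`, `contains_non_default_storage` and `has_uniform_storage` are hypothetical; only the storage-type names (`'default'`, `'row_sparse'`, `'csr'`) come from MXNet. If storage is uniform per key, inspecting `arg_list[0]` would give the same answer as scanning the whole list.

```python
from collections import namedtuple

# Hypothetical stand-in for an NDArray; only the storage type is modelled.
FakeArray = namedtuple("FakeArray", ["stype"])


def contains_non_default_storage(arrays):
    """True if any array in the list is not dense (illustrative helper)."""
    return any(a.stype != "default" for a in arrays)


def has_uniform_storage(arrays):
    """True if every array for a key shares one storage type."""
    return len({a.stype for a in arrays}) <= 1


dense = [FakeArray("default"), FakeArray("default")]
sparse = [FakeArray("row_sparse"), FakeArray("row_sparse")]
mixed = [FakeArray("default"), FakeArray("row_sparse")]

for arg_list in (dense, sparse, mixed):
    # With uniform storage, checking arg_list[0] matches scanning the list;
    # only a mixed list can make the two checks disagree.
    first_only = arg_list[0].stype != "default"
    print(has_uniform_storage(arg_list),
          contains_non_default_storage(arg_list) == first_only)
```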
[GitHub] zihaolucky commented on a change in pull request #7503: log epoch number for tensorboard
zihaolucky commented on a change in pull request #7503: log epoch number for tensorboard URL: https://github.com/apache/incubator-mxnet/pull/7503#discussion_r133611060
## File path: python/mxnet/contrib/tensorboard.py
```
@@ -70,4 +70,4 @@ def __call__(self, param):
         for name, value in name_value:
             if self.prefix is not None:
                 name = '%s-%s' % (self.prefix, name)
-            self.summary_writer.add_scalar(name, value)
+            self.summary_writer.add_scalar(name, value, global_step=param.epoch)
```
Review comment: What would happen if we used this in `batch_end_callback`?
[GitHub] zihaolucky commented on issue #7503: log epoch number for tensorboard
zihaolucky commented on issue #7503: log epoch number for tensorboard URL: https://github.com/apache/incubator-mxnet/pull/7503#issuecomment-322955043 I've checked this change. LGTM. Note that after we add `global_step`:
* In `RELATIVE` mode, the training curve still logs the metric every batch, and the curve looks smoother. Same as with the older version, great.
* In `STEP` mode, it shows the metric for that epoch.
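To make the `batch_end_callback` question concrete, here is a minimal sketch of what gets recorded. `Param`, `RecordingWriter` and `log_metric` are hypothetical stand-ins (the real callback receives a `BatchEndParam` carrying `eval_metric` and writes TensorBoard event files); only the naming and `global_step=param.epoch` logic mirror the diff. Every batch within an epoch is logged with the same `global_step`, so on a step-based x-axis the per-batch points share one x value, while a time-based axis such as `RELATIVE` still spreads them out. Exactly how TensorBoard draws repeated steps is not modelled here.

```python
from collections import namedtuple

# Hypothetical stand-in for the callback parameter (the real BatchEndParam
# carries eval_metric rather than a plain value).
Param = namedtuple("Param", ["epoch", "nbatch", "value"])


class RecordingWriter:
    """Hypothetical writer that records points instead of writing events."""

    def __init__(self):
        self.points = []

    def add_scalar(self, tag, value, global_step=None):
        self.points.append((global_step, tag, value))


def log_metric(writer, param, prefix=None):
    # Same naming and global_step logic as the patched __call__ in the diff.
    name = "accuracy"
    if prefix is not None:
        name = "%s-%s" % (prefix, name)
    writer.add_scalar(name, param.value, global_step=param.epoch)


writer = RecordingWriter()
for epoch in range(2):
    for nbatch in range(3):  # used as a batch-end callback: once per batch
        log_metric(writer, Param(epoch, nbatch, 0.1 * nbatch), prefix="train")

print([step for step, _, _ in writer.points])  # [0, 0, 0, 1, 1, 1]
```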
[GitHub] maxenceliu commented on issue #3724: asnumpy() of NDArray @cpu halted
maxenceliu commented on issue #3724: asnumpy() of NDArray @cpu halted URL: https://github.com/apache/incubator-mxnet/issues/3724#issuecomment-322966428 @zihaolucky Hasn't the naive engine been implemented?
[GitHub] zihaolucky commented on issue #3724: asnumpy() of NDArray @cpu halted
zihaolucky commented on issue #3724: asnumpy() of NDArray @cpu halted URL: https://github.com/apache/incubator-mxnet/issues/3724#issuecomment-322957118 @maxenceliu I have been using the Scala package in production for a while. With version 0.7 it works okay. Or maybe you should try the `naive engine`.
[GitHub] StatML opened a new issue #7506: build error: expected ‘}’ at end of input
StatML opened a new issue #7506: build error: expected ‘}’ at end of input URL: https://github.com/apache/incubator-mxnet/issues/7506
## Environment info
Operating System: Ubuntu 16.04.4
Compiler: G++ 5.4
Package used (Python/R/Scala/Julia): Python
MXNet version: installed from source
MXNet commit hash (`git rev-parse HEAD`): 1286809a1fc76c0b808b988084fc0950300f40d4
Python version and distribution: 3.5.2
## Error Message:
```
In file included from src/operator/random/./../operator_common.h:37:0,
                 from src/operator/random/./sample_multinomial_op.h:31,
                 from src/operator/random/sample_multinomial_op.cc:24:
src/operator/random/./../../common/cuda_utils.h:95:43: error: ‘mxnet::common::cuda::CusolverGetErrorString’ declared as an ‘inline’ variable
 inline const char* CusolverGetErrorString(cusolverStatus_t error) {
                                           ^
src/operator/random/./../../common/cuda_utils.h:95:43: error: ‘cusolverStatus_t’ was not declared in this scope
src/operator/random/./../../common/cuda_utils.h:95:67: error: expected ‘,’ or ‘;’ before ‘{’ token
 inline const char* CusolverGetErrorString(cusolverStatus_t error) {
                                                                   ^
src/operator/random/sample_multinomial_op.cc:111:1: error: expected ‘}’ at end of input
 } // namespace mxnet
 ^
src/operator/random/sample_multinomial_op.cc:111:1: error: expected ‘}’ at end of input
src/operator/random/sample_multinomial_op.cc:111:1: error: expected ‘}’ at end of input
Makefile:275: recipe for target 'build/src/operator/random/sample_multinomial_op.o' failed
make: *** [build/src/operator/random/sample_multinomial_op.o] Error 1
```
## Steps to reproduce
1. make -j
## What have you tried to solve it?
N/A
[GitHub] maxenceliu commented on issue #3724: asnumpy() of NDArray @cpu halted
maxenceliu commented on issue #3724: asnumpy() of NDArray @cpu halted URL: https://github.com/apache/incubator-mxnet/issues/3724#issuecomment-322959344 @zihaolucky I'm now using version 1.10 and the problem appears again?!! What do you mean by naive engine?
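The naive engine mentioned in this thread is MXNet's `NaiveEngine`, selected through the `MXNET_ENGINE_TYPE` environment variable; it runs operations synchronously on the calling thread, which disables the threaded engine's parallelism and makes hangs like this one easier to trace. A minimal sketch, assuming the variable is set before `mxnet` is first imported so the engine picks it up:

```python
import os

# Select MXNet's synchronous NaiveEngine instead of the default
# ThreadedEnginePerDevice. This must happen before mxnet is imported.
os.environ["MXNET_ENGINE_TYPE"] = "NaiveEngine"

# import mxnet as mx  # import only after the variable has been set
```

The same setting can be applied from the shell by prefixing the training command with `MXNET_ENGINE_TYPE=NaiveEngine`.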
[GitHub] cloudhan commented on issue #3724: asnumpy() of NDArray @cpu halted
cloudhan commented on issue #3724: asnumpy() of NDArray @cpu halted URL: https://github.com/apache/incubator-mxnet/issues/3724#issuecomment-322962289 @zihaolucky Actually, versions before 0.9 work okay. It seems there has been a deadlock since the NNVM refactor.
[GitHub] maxenceliu commented on issue #3724: asnumpy() of NDArray @cpu halted
maxenceliu commented on issue #3724: asnumpy() of NDArray @cpu halted URL: https://github.com/apache/incubator-mxnet/issues/3724#issuecomment-322965165 @cloudhan Have you tried version 0.10.1? It still blocks for several seconds. It calls `WaitToRead()` when copying data to the CPU. I don't understand whether all the dependencies need to be waited on for this copy.
[GitHub] peiyunh opened a new pull request #7507: Fix description of argument parser
peiyunh opened a new pull request #7507: Fix description of argument parser URL: https://github.com/apache/incubator-mxnet/pull/7507
[GitHub] piiswrong commented on issue #7502: Change git clone to specific tag for installation
piiswrong commented on issue #7502: Change git clone to specific tag for installation URL: https://github.com/apache/incubator-mxnet/pull/7502#issuecomment-322939359 Is the branch name guaranteed to match the version?
[GitHub] hpsoar commented on issue #7491: GPUDeviceStorage is not used? why?
hpsoar commented on issue #7491: GPUDeviceStorage is not used? why? URL: https://github.com/apache/incubator-mxnet/issues/7491#issuecomment-322953488 I didn't compile with CUDA; I'm just reading the code. I see that when CUDA is used, it actually uses GPUPooledStorageManager to do the allocation; on the other hand, CPUDeviceStorage is used nowhere. I'm just curious why CPUDeviceStorage is there but not in use. I'm on the master branch.
[GitHub] CNevd commented on a change in pull request #7082: Sparse Tensor: request for reviews
CNevd commented on a change in pull request #7082: Sparse Tensor: request for reviews URL: https://github.com/apache/incubator-mxnet/pull/7082#discussion_r133614687 ## File path: python/mxnet/model.py ## @@ -113,25 +127,36 @@ def _update_params_on_kvstore(param_arrays, grad_arrays, kvstore, param_names): # push gradient, priority is negative index kvstore.push(name, grad_list, priority=-index) # pull back the weights -kvstore.pull(name, arg_list, priority=-index) +if _contains_non_default_storage(arg_list): Review comment: Can we just skip the row_sparse weights and pull the dense weights normally, or is there a prettier way to do this?
[GitHub] maxenceliu commented on issue #3724: asnumpy() of NDArray @cpu halted
maxenceliu commented on issue #3724: asnumpy() of NDArray @cpu halted URL: https://github.com/apache/incubator-mxnet/issues/3724#issuecomment-322955687 @zihaolucky So, did you eventually give up deploying MXNet on the server and switch to another platform?
[GitHub] maxenceliu commented on issue #7417: Update mxnet in maven timely?
maxenceliu commented on issue #7417: Update mxnet in maven timely? URL: https://github.com/apache/incubator-mxnet/issues/7417#issuecomment-322957351 @javelinjs @szha @piiswrong Thanks for paying attention to this issue. Another question about the Scala version: can the NDArray.toArray method block on the CPU version of MXNet? This method blocks for about 10 seconds each time I run it on the CentOS server, whereas the delay is negligible when I run it on my CPU-only Mac.
[GitHub] ZiyueHuang commented on issue #7449: Fix a bug in SequentialRNNCell.reset()
ZiyueHuang commented on issue #7449: Fix a bug in SequentialRNNCell.reset() URL: https://github.com/apache/incubator-mxnet/pull/7449#issuecomment-322959352 @piiswrong
[GitHub] maxenceliu commented on issue #7417: Update mxnet in maven timely?
maxenceliu commented on issue #7417: Update mxnet in maven timely? URL: https://github.com/apache/incubator-mxnet/issues/7417#issuecomment-322965667 I read the code: `if (this->ctx().dev_mask() == cpu::kDevMask) { this->WaitToRead(); RunContext rctx{this->ctx(), nullptr}; ndarray::Copy<cpu, cpu>(this->data(), , Context::CPU(), Context::CPU(), rctx); }` I guess it blocks here, at `this->WaitToRead()`. I don't know why we have to wait to read; is there some case where we don't have to wait in order to do this copy?
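One way to read the `WaitToRead()` call above: MXNet pushes operations to an asynchronous engine, so before a synchronous CPU copy (as in `asnumpy()` or `toArray`) it has to wait until every previously queued write to that array has finished; otherwise it could copy a stale or half-written buffer. Under that reading, the seconds spent "in the copy" may really be the time the pending computation takes to finish. The toy model below is a minimal Python sketch of that rule only; `ToyVar`, `push_write` and `wait_to_read` are hypothetical names, not MXNet's engine API, and the sketch does not model how MXNet orders or schedules writes.

```python
import threading
import time


class ToyVar:
    """Toy stand-in for an engine-tracked array that counts pending async writes.

    Not MXNet's engine: it only shows why a read must wait for writes queued
    before it. Concurrent writes are not ordered in this toy.
    """

    def __init__(self, value):
        self.value = value
        self._pending_writes = 0
        self._cond = threading.Condition()

    def push_write(self, fn, delay):
        """Queue an asynchronous write that applies fn to the value after `delay` seconds."""
        with self._cond:
            self._pending_writes += 1

        def run():
            time.sleep(delay)  # simulate a long-running operator
            with self._cond:
                self.value = fn(self.value)
                self._pending_writes -= 1
                self._cond.notify_all()

        threading.Thread(target=run).start()

    def wait_to_read(self):
        """Block until every previously queued write has finished."""
        with self._cond:
            self._cond.wait_for(lambda: self._pending_writes == 0)

    def to_list(self):
        # Like a synchronous copy to host memory: without the wait we could
        # return a value that a queued write has not produced yet.
        self.wait_to_read()
        return list(self.value)
```

For example, after `x = ToyVar([0, 0, 0])` and `x.push_write(lambda v: [e + 1 for e in v], delay=0.2)`, calling `x.to_list()` blocks for roughly 0.2 s and then returns `[1, 1, 1]`; the blocking time is the queued work, not the copy itself.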
[GitHub] kevinthesun commented on issue #7502: Change git clone to specific tag for installation
kevinthesun commented on issue #7502: Change git clone to specific tag for installation URL: https://github.com/apache/incubator-mxnet/pull/7502#issuecomment-322940936 Yes.
[GitHub] maxenceliu commented on issue #3724: asnumpy() of NDArray @cpu halted
maxenceliu commented on issue #3724: asnumpy() of NDArray @cpu halted URL: https://github.com/apache/incubator-mxnet/issues/3724#issuecomment-322952953 @zihaolucky Hi, how did you resolve this problem? I have run into the same situation. Each time I call NDArray.toArray in Scala, it blocks for at least 10 seconds when I run it on a CPU server with CentOS. However, on my CPU-only MacBook, the time cost is barely noticeable.
[GitHub] reminisce commented on a change in pull request #7082: Sparse Tensor: request for reviews
reminisce commented on a change in pull request #7082: Sparse Tensor: request for reviews URL: https://github.com/apache/incubator-mxnet/pull/7082#discussion_r133617599 ## File path: include/mxnet/c_api.h ## @@ -321,6 +353,17 @@ MXNET_DLL int MXNDArraySyncCopyToCPU(NDArrayHandle handle, void *data, size_t size); /*! + * \brief Copy src.data() to dst.data() if i = -1, else dst.aux_data(i) if i >= 0 + * This function blocks. Do not use it in performance critical code. + * \param handle_dst handle of a dst ndarray whose data/aux_data has been allocated + * \param handle_src handle of a src ndarray which has default storage type + * \param i dst data blob indicator + */ +MXNET_DLL int MXNDArraySyncCopyFromNDArray(NDArrayHandle handle_dst, + const NDArrayHandle handle_src, + const int i); Review comment: Agree with @eric-haibin-lin. As long as it's documented clearly in the comment, it's fine to use `i`.
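The sentinel convention documented for `i` can be restated as a tiny dispatch function. This is a hypothetical Python sketch (`copy_target` is not an MXNet function) that only mirrors the doc comment: `-1` selects `data()`, a non-negative `i` selects `aux_data(i)`. Rejecting values below -1 is an assumption, since the comment does not say what happens for them.

```python
def copy_target(i):
    """Name the destination blob selected by index i.

    Restates the documented rule: -1 -> data(), i >= 0 -> aux_data(i).
    Rejecting i < -1 is an assumption; the doc comment does not cover it.
    """
    if i == -1:
        return "data"
    if i >= 0:
        return "aux_data(%d)" % i
    raise ValueError("blob index must be -1 or non-negative, got %d" % i)
```

Keeping the sentinel check explicit like this is what makes a single `int` parameter acceptable, as long as the comment documents it.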
[GitHub] DickJC123 opened a new pull request #7505: Changed FullyConnected to use new linalg gemm, plus TensorCore if fp16 I/O.
DickJC123 opened a new pull request #7505: Changed FullyConnected to use new linalg gemm, plus TensorCore if fp16 I/O. URL: https://github.com/apache/incubator-mxnet/pull/7505 GEMMs within the FullyConnected operator switched from using mshadow::dot() to the new linalg_gemm(). After a trial-run, this can be the model for removing all uses of dot() within MXNet. Added a specialization linalg_gemm<gpu, half_t> that includes use of TensorCore algos by default. Users can disable TensorCore on Volta by setting the environment variable MXNET_CUDA_ALLOW_TENSOR_CORE=0.
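A minimal sketch of disabling the TensorCore path from a Python script, assuming the variable only needs to be present in the process environment before MXNet reads it (setting it before `import mxnet` is the conservative choice). The `tensor_core_allowed` helper is hypothetical and simply restates the PR text (on by default, off when the variable is `0`); it is not how MXNet itself parses the value.

```python
import os

# Assumption: set the flag before `import mxnet` so it is visible
# whenever MXNet reads its environment.
os.environ["MXNET_CUDA_ALLOW_TENSOR_CORE"] = "0"


def tensor_core_allowed(env=None):
    """Hypothetical helper: TensorCore is on unless the variable is "0"."""
    env = os.environ if env is None else env
    return env.get("MXNET_CUDA_ALLOW_TENSOR_CORE", "1") != "0"
```

The shell equivalent is the `export MXNET_CUDA_ALLOW_TENSOR_CORE=0` form mentioned in #7425.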
[GitHub] jinfagang opened a new issue #7509: SSD example error
jinfagang opened a new issue #7509: SSD example error URL: https://github.com/apache/incubator-mxnet/issues/7509 Simply running the SSD example gives this error:
```
[01:09:24] src/nnvm/legacy_json_util.cc:185: Warning: loading symbol saved by MXNet version 1001 with lower version of MXNet v1000. May cause undefined behavior. Please update MXNet if you encounter any issue
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
```
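The two numbers in the warning are integer version codes. Assuming MXNet packs versions as `major * 10000 + minor * 100 + patch` (the layout of its `MXNET_VERSION` macro, to the best of my knowledge), the warning says the symbol was saved by 0.10.1 and is being loaded by 0.10.0, which fits the later comment that pip installs 0.10.0. The helper names below are hypothetical.

```python
def decode_mxnet_version(code):
    """Decode an integer version code, assuming major*10000 + minor*100 + patch."""
    major, rest = divmod(code, 10000)
    minor, patch = divmod(rest, 100)
    return "%d.%d.%d" % (major, minor, patch)


def saved_by_newer(saved_code, running_code):
    """True when the symbol file was written by a newer release than the one loading it."""
    return saved_code > running_code
```

Under that assumption, `decode_mxnet_version(1001)` gives `"0.10.1"` and `decode_mxnet_version(1000)` gives `"0.10.0"`.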
[GitHub] CNevd commented on a change in pull request #7082: Sparse Tensor: request for reviews
CNevd commented on a change in pull request #7082: Sparse Tensor: request for reviews URL: https://github.com/apache/incubator-mxnet/pull/7082#discussion_r133667882 ## File path: python/mxnet/model.py ## @@ -113,25 +127,36 @@ def _update_params_on_kvstore(param_arrays, grad_arrays, kvstore, param_names): # push gradient, priority is negative index kvstore.push(name, grad_list, priority=-index) # pull back the weights -kvstore.pull(name, arg_list, priority=-index) +if _contains_non_default_storage(arg_list): Review comment: Consider the case where arg_list has one row_sparse weight and many dense weights: `_contains_non_default_storage(arg_list)` will be true, so the user has to call `kvstore.pull` manually, outside `model.py`, for all the dense weights. Maybe there is some way to split them (row_sparse and dense) so that we only have to pull the row_sparse weights manually?
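The split suggested here could look roughly like this: pull the parameters whose copies are all dense in the usual way, and hand back the names of the non-default-storage ones for the caller to pull explicitly. A minimal sketch, assuming each array exposes a `stype` attribute that is `'default'` for dense storage, and calling `kvstore.pull(name, arg_list, priority=-index)` with the same shape as the removed line in the diff; `pull_dense_params` is a hypothetical helper, not part of the PR.

```python
def pull_dense_params(kvstore, param_names, param_arrays):
    """Pull every parameter whose per-device copies are all dense.

    Returns the names of parameters with any non-default storage type
    (e.g. 'row_sparse'), which the caller is expected to pull explicitly.
    """
    deferred = []
    for index, (name, arg_list) in enumerate(zip(param_names, param_arrays)):
        # Assumption: arrays without a `stype` attribute are treated as dense.
        if any(getattr(arr, "stype", "default") != "default" for arr in arg_list):
            deferred.append(name)
            continue
        # Same call shape as the original pull in _update_params_on_kvstore.
        kvstore.pull(name, arg_list, priority=-index)
    return deferred
```

Keeping `-index` as the priority preserves the original ordering behaviour for the dense parameters, while only the deferred names need manual handling.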
[GitHub] jinfagang commented on issue #7509: SSD example error
jinfagang commented on issue #7509: SSD example error URL: https://github.com/apache/incubator-mxnet/issues/7509#issuecomment-323004047 Well, I know it's a version issue, but I just installed from pip, which gives me 0.10.0. The newest version is 0.11, so why not update pip? I really cannot build from source; the official docs are so out of date and messy.
[GitHub] lxn2 closed pull request #11: Fix git clone links
lxn2 closed pull request #11: Fix git clone links URL: https://github.com/apache/incubator-mxnet-site/pull/11
[GitHub] maxenceliu opened a new issue #7417: Update mxnet in maven timely?
maxenceliu opened a new issue #7417: Update mxnet in maven timely? URL: https://github.com/apache/incubator-mxnet/issues/7417 In the Maven repository, the MXNet version lags well behind. Furthermore, it depends on OpenCV, although our server doesn't need OpenCV.
[GitHub] haimeh commented on issue #6629: Not enough information to get shape
haimeh commented on issue #6629: Not enough information to get shape URL: https://github.com/apache/incubator-mxnet/issues/6629#issuecomment-321613657 Also, here is a mistake one might make, added for reference. Shape inference may not always work with a chained composition such as:
```
discouragedSelfMatches <- mx.symbol.broadcast_mul(lhs = mx.symbol.broadcast_equal(lhs = distances, rhs = mx.symbol.min(distances), rhs = mx.symbol.max(distances))
```
but it does work if the symbols are separated, for instance:
```
selfMatches <- mx.symbol.broadcast_equal(lhs = distances, rhs = mx.symbol.min(distances))
discouragedSelfMatches <- mx.symbol.broadcast_mul(lhs=selfMatches, rhs=mx.symbol.max(distances))
```
[GitHub] piiswrong commented on issue #7393: add depthwise convolution's gpu version optimization
piiswrong commented on issue #7393: add depthwise convolution's gpu version optimization URL: https://github.com/apache/incubator-mxnet/pull/7393#issuecomment-321722267 @crazy-cat Still not compiling on Windows: https://builds.apache.org/blue/organizations/jenkins/incubator-mxnet/detail/PR-7393/2/pipeline
[GitHub] regzhuce commented on issue #7375: Can I set instance weight when training?
regzhuce commented on issue #7375: Can I set instance weight when training? URL: https://github.com/apache/incubator-mxnet/issues/7375#issuecomment-321723588 @thirdwing Thanks. Any proposals for my problem?
[GitHub] ysh329 commented on issue #7424: train_mnist.py failed: TypeError: __init__() got an unexpected keyword argument 'multi_precision'
ysh329 commented on issue #7424: train_mnist.py failed: TypeError: __init__() got an unexpected keyword argument 'multi_precision' URL: https://github.com/apache/incubator-mxnet/issues/7424#issuecomment-321723869 @ptrendx @mli @howard0su After I comment out the `multi_precision` line below, it works. Perhaps the feature is not complete yet?
```Python
optimizer_params = {
    'learning_rate': lr,
    'momentum' : args.mom,
    'wd' : args.wd,
    'lr_scheduler': lr_scheduler,}
    #'multi_precision': True}
```
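In the #7424 traceback the example script comes from a fresh clone while the `mxnet` package is loaded from `/mxnet/python`, which suggests the installed optimizer simply predates `multi_precision`; updating the package is the real fix. As a stopgap, a hypothetical helper like `create_with_fallback` below (not part of MXNet) retries construction without optional keyword arguments that the constructor rejects. It is a minimal sketch that assumes the rejected name appears quoted in the `TypeError` message, as it does in the traceback above.

```python
def create_with_fallback(factory, params, optional_keys=("multi_precision",)):
    """Call factory(**params); if it rejects one of optional_keys with a
    TypeError, drop that key and retry. Any other TypeError is re-raised."""
    params = dict(params)  # do not mutate the caller's dict
    while True:
        try:
            return factory(**params)
        except TypeError as err:
            # Assumption: the offending keyword appears quoted in the message,
            # e.g. "unexpected keyword argument 'multi_precision'".
            rejected = [k for k in optional_keys
                        if k in params and repr(k) in str(err)]
            if not rejected:
                raise
            for key in rejected:
                params.pop(key)
```

With MXNet this might be wired up as `create_with_fallback(functools.partial(mx.optimizer.create, 'sgd'), optimizer_params)`, passing the returned optimizer object to `fit`; that wiring is untested. Each retry removes at least one key, so the loop always terminates.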
[GitHub] DickJC123 opened a new pull request #7425: Tensorcore fullyconnected support
DickJC123 opened a new pull request #7425: Tensorcore fullyconnected support URL: https://github.com/apache/incubator-mxnet/pull/7425 Adds TensorCore algo support to the FullyConnected operator for users with NVIDIA Volta / cuda9 / cudnn7. On by default, this can be disabled through an environment variable as in: export MXNET_CUDA_ALLOW_TENSOR_CORE=0 Applies to float16 I/O instances only.
[GitHub] ysh329 commented on a change in pull request #7363: Add tensorboard configure into ./common/fit.py and ./train_mnist.py
ysh329 commented on a change in pull request #7363: Add tensorboard configure into ./common/fit.py and ./train_mnist.py URL: https://github.com/apache/incubator-mxnet/pull/7363#discussion_r132612116 ## File path: example/image-classification/common/fit.py ## @@ -168,10 +168,16 @@ def fit(args, network, data_loader, **kwargs): # callbacks that run after each batch batch_end_callbacks = [mx.callback.Speedometer(args.batch_size, args.disp_batches)] +eval_end_callbacks = [] if 'batch_end_callback' in kwargs: cbs = kwargs['batch_end_callback'] batch_end_callbacks += cbs if isinstance(cbs, list) else [cbs] Review comment: @zihaolucky I assigned `eval_end_callbacks` the same value as `batch_end_callbacks`, but the validation-set log is still printed once per epoch, not every `args.disp_batches` batches.
```Python
batch_end_callbacks = [mx.callback.Speedometer(args.batch_size, args.disp_batches)]
eval_end_callbacks = [mx.callback.Speedometer(args.batch_size, args.disp_batches)]
```
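A plausible explanation, assuming `fit` follows the usual schedule: `eval_end_callback` is invoked once after the entire validation pass, so a `Speedometer` placed there gets only one call per epoch to log on, whatever `args.disp_batches` is. Per-batch validation logging needs a hook that runs after each evaluation batch; `Module.fit` has an `eval_batch_end_callback` argument for this in the versions I am aware of, but the installed signature is worth checking. The toy loop below (hypothetical, not MXNet code) only makes the call counts visible.

```python
def run_epoch(num_train_batches, num_eval_batches,
              batch_end_callbacks=(), eval_batch_end_callbacks=(),
              eval_end_callbacks=()):
    """Toy loop showing how often each callback list fires in one epoch."""
    for nbatch in range(num_train_batches):
        for cb in batch_end_callbacks:
            cb("train_batch", nbatch)
    for nbatch in range(num_eval_batches):
        for cb in eval_batch_end_callbacks:
            cb("eval_batch", nbatch)
    # Fires once, after the whole validation pass.
    for cb in eval_end_callbacks:
        cb("eval_end", num_eval_batches)
```

With 5 training batches and 3 validation batches, a callback in `eval_end_callbacks` runs exactly once per epoch, while one in `eval_batch_end_callbacks` runs three times.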
[GitHub] EsraaRagaa opened a new issue #7422: Data provided by data_shapes don't match names specified by data_names
EsraaRagaa opened a new issue #7422: Data provided by data_shapes don't match names specified by data_names URL: https://github.com/apache/incubator-mxnet/issues/7422 I am trying to train on my data with MXNet. The data are 20x20 images; I put my data in CSV files in the form: label, pixel1, pixel2, ..., pixel400 0,... 1,... and my labels are 0 or 1. This is the code I used: batch_size = 2 train_iterator = mx.io.NDArrayIter(X_train, Y_train, batch_size=batch_size) validate_iterator = mx.io.NDArrayIter(X_validate, Y_validate, batch_size=batch_size) ##first convolutional layer conv1 = mx.sym.Convolution(data=data, kernel=(3,3), num_filter=6) relu1 = mx.sym.Activation(data=conv1, act_type="relu") pool1 = mx.sym.Pooling(data=relu1, pool_type="max", kernel=(2,2), stride=(2,2)) ##second convolutional layer conv2 = mx.sym.Convolution(data=pool1, kernel=(6,6), num_filter=12) relu2 = mx.sym.Activation(data=conv2, act_type="relu") pool2 = mx.sym.Pooling(data=relu2, pool_type="max", kernel=(2,2), stride=(2,2)) ##first fully connected layer flatten = mx.sym.flatten(data=pool2) fc1 = mx.symbol.FullyConnected(data=flatten, num_hidden=12 ) ##softmax loss lenet = mx.sym.SoftmaxOutput(data=fc1, name='softmax') ##create a trainable module on CPU 0 lenet_model = mx.mod.Module(symbol=lenet, context=mx.cpu()) device = mx.cpu() ##train using parameters ''' model = mx.model.FeedForward.create(lenet_model, X = X_train, y = Y_train, ctx = device, num_epoch = 10) ''' lenet_model.fit(train_iterator, eval_data=validate_iterator, optimizer='sgd', optimizer_params={'learning_rate':0.1}, eval_metric='acc', batch_end_callback = mx.callback.Speedometer(batch_size, 100), num_epoch=10) # the error is: Traceback (most recent call last): File "", line 7, in num_epoch=10) File "C:\Users\...\Anaconda2\lib\site-packages\mxnet-0.10.1-py2.7.egg\mxnet\module\base_module.py", line 459, in fit for_training=True, force_rebind=force_rebind) File "C:\Users\...\Anaconda2\lib\site-packages\mxnet-0.10.1-py2.7.egg\mxnet\module\module.py", line 372, in bind self.data_names, self.label_names, data_shapes, label_shapes) File "C:\Users\...\Anaconda2\lib\site-packages\mxnet-0.10.1-py2.7.egg\mxnet\module\base_module.py", line 70, in _parse_data_desc _check_names_match(data_names, data_shapes, 'data', True) File "C:\Users\...\Anaconda2\lib\site-packages\mxnet-0.10.1-py2.7.egg\mxnet\module\base_module.py", line 62, in _check_names_match raise ValueError(msg) ValueError: Data provided by data_shapes don't match names specified by data_names ([DataDesc[_0_data,(2,),,NCHW], DataDesc[_1_data,(2,),,NCHW], DataDesc[_2_data,(2,),,NCHW], DataDesc[_3_data,(2,),,NCHW], DataDesc[_4_data,(2,),,NCHW], DataDesc[_5_data,(2,),,NCHW], DataDesc[_6_data,(2,),,NCHW], DataDesc[_7_data,(2,),,NCHW], DataDesc[_8_data,(2,),,NCHW], DataDesc[_9_data,(2,),,NCHW], DataDesc[_10_data,(2,),,NCHW], DataDesc[_11_data,(2,),,NCHW], DataDesc[_12_data,(2,),,NCHW], DataDesc[_13_data,(2,),,NCHW], DataDesc[_14_data,(2,),,NCHW], DataDesc[_15_data,(2,),,NCHW], DataDesc[_16_data,(2,),,NCHW], DataDesc[_17_data,(2,),,NCHW], ... ... ... ... , DataDesc[_2773_data,(2,),,NCHW], DataDesc[_2774_data,(2,),,NCHW], DataDesc[_2775_data,(2,),,NCHW], DataDesc[_2776_data,(2,),,NCHW], DataDesc[_2777_data,(2,),,NCHW]] vs. ['data']) # What should the data look like, and what is wrong in the code? I am using Windows 10, Python 2, MXNet 0.10, Anaconda2. Thanks in advance.
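The generated names `_0_data` through `_2777_data` in this error suggest that the iterator received a Python list (or similar sequence) of arrays rather than one array, and created a separate input for each element, while the network has a single input called `data`. A simplified sketch of that naming rule, assuming it matches what `NDArrayIter` does in this version; `name_inputs` is a hypothetical stand-in, not MXNet's function.

```python
from collections import OrderedDict


def name_inputs(data, default_name="data"):
    """Assign input names the way a data iterator might (simplified sketch).

    - dict: keys are used as-is
    - list/tuple: one generated name per element, "_<i>_<default_name>"
    - anything else (e.g. a single array): the default name
    """
    if isinstance(data, dict):
        return OrderedDict(data)
    if isinstance(data, (list, tuple)):
        return OrderedDict(
            ("_%d_%s" % (i, default_name), item) for i, item in enumerate(data)
        )
    return OrderedDict([(default_name, data)])
```

Under this rule, passing a single NumPy array of shape `(N, 1, 20, 20)` instead of a list would produce one input named `data`, matching the symbol's input.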
[GitHub] chinakook commented on issue #7393: add depthwise convolution's gpu version optimization
chinakook commented on issue #7393: add depthwise convolution's gpu version optimization URL: https://github.com/apache/incubator-mxnet/pull/7393#issuecomment-321719808 Nice work, it's a good feature for MobileNet!
[GitHub] ysh329 opened a new issue #7424: train_mnist.py failed: TypeError: __init__() got an unexpected keyword argument 'multi_precision'
ysh329 opened a new issue #7424: train_mnist.py failed: TypeError: __init__() got an unexpected keyword argument 'multi_precision' URL: https://github.com/apache/incubator-mxnet/issues/7424 Release 0.10.0 works. However, the current master branch (`git clone --recursive https://github.com/apache/incubator-mxnet.git`) has the problem shown below (running in the official Docker image; I cloned from github.com and ran train_mnist.py):
```Shell
root@8acd2b8afd12:~/incubator-mxnet-origin/example/image-classification# python train_mnist.py
INFO:root:start with arguments Namespace(add_stn=False, batch_size=64, disp_batches=100, dtype='float32', gpus=None, kv_store='device', load_epoch=None, lr=0.05, lr_factor=0.1, lr_step_epochs='10', model_prefix=None, mom=0.9, monitor=0, network='mlp', num_classes=10, num_epochs=20, num_examples=6, num_layers=None, optimizer='sgd', test_io=0, top_k=0, wd=0.0001)
Traceback (most recent call last):
  File "train_mnist.py", line 91, in <module>
    eval_end_callback = eval_end_callbacks)
  File "/root/incubator-mxnet-origin/example/image-classification/common/fit.py", line 207, in fit
    monitor= monitor)
  File "/mxnet/python/mxnet/module/base_module.py", line 465, in fit
    optimizer_params=optimizer_params)
  File "/mxnet/python/mxnet/module/module.py", line 478, in init_optimizer
    **optimizer_params)
  File "/mxnet/python/mxnet/optimizer.py", line 128, in create_optimizer
    return Optimizer.opt_registry[name.lower()](**kwargs)
  File "/mxnet/python/mxnet/optimizer.py", line 328, in __init__
    super(SGD, self).__init__(**kwargs)
TypeError: __init__() got an unexpected keyword argument 'multi_precision'
```
[GitHub] EsraaRagaa opened a new issue #7420: How to pass CSVIter (images flatten in .csv file) to MXNet ?
EsraaRagaa opened a new issue #7420: How to pass CSVIter (images flatten in .csv file) to MXNet ? URL: https://github.com/apache/incubator-mxnet/issues/7420 # Hello, I am trying to train on my data with MXNet. The data are 20x20 images; I put my data in CSV files in the form: label, pixel1, pixel2, ..., pixel400 0,... 1,... and my labels are 0 or 1. # This is the code I used: - train_data = pd.read_csv("training_set_flatten_rows_mxnet.csv") keys = ['pixel.'+str(i) for i in range(1,402)] X_train = train_data[keys[1:]].get_values() X_train = X_train.reshape((2779,1,20,20)) Y_train = train_data['pixel.1'].get_values().reshape((2779,1)) validate_data = pd.read_csv("validation_set_flatten_rows_mxnet.csv") keys = ['pixel.'+str(i) for i in range(1,402)] X_validate = validate_data[keys[1:]].get_values() X_validate = X_validate.reshape((692,1,20,20)) Y_validate = validate_data['pixel.1'].get_values().reshape((692,1)) train_iterator = mx.io.NDArrayIter(X_train, Y_train, batch_size=batch_size) validate_iterator = mx.io.NDArrayIter(X_validate, Y_validate, batch_size=batch_size) # first convolutional layer conv1 = mx.sym.Convolution(data=data, kernel=(3,3), num_filter=6) relu1 = mx.sym.Activation(data=conv1, act_type="relu") pool1 = mx.sym.Pooling(data=relu1, pool_type="max", kernel=(2,2), stride=(2,2)) # second convolutional layer conv2 = mx.sym.Convolution(data=pool1, kernel=(6,6), num_filter=12) relu2 = mx.sym.Activation(data=conv2, act_type="relu") pool2 = mx.sym.Pooling(data=relu2, pool_type="max", kernel=(2,2), stride=(2,2)) # first fully connected layer flatten = mx.sym.flatten(data=pool2) fc1 = mx.symbol.FullyConnected(data=flatten, num_hidden=12 ) # softmax loss lenet = mx.sym.SoftmaxOutput(data=fc1, name='softmax') # create a trainable module on CPU 0 lenet_model = mx.mod.Module(symbol=lenet, context=mx.cpu()) device = mx.cpu() # train using parameters ''' model = mx.model.FeedForward.create(lenet_model, X = X_train, y = Y_train, ctx = device, num_epoch = 10) '''
lenet_model.fit(train_iterator, eval_data=validate_iterator, optimizer='sgd', optimizer_params={'learning_rate':0.1}, eval_metric='acc', batch_end_callback = mx.callback.Speedometer(batch_size, 100), num_epoch=10) - # The error is: Traceback (most recent call last): File "", line 7, in num_epoch=10) File "C:\Users\...\Anaconda2\lib\site-packages\mxnet-0.10.1-py2.7.egg\mxnet\module\base_module.py", line 459, in fit for_training=True, force_rebind=force_rebind) File "C:\Users\...\Anaconda2\lib\site-packages\mxnet-0.10.1-py2.7.egg\mxnet\module\module.py", line 388, in bind state_names=self._state_names) File "C:\Users\...\Anaconda2\lib\site-packages\mxnet-0.10.1-py2.7.egg\mxnet\module\executor_group.py", line 214, in __init__ self.bind_exec(data_shapes, label_shapes, shared_group) File "C:\Users\...\Anaconda2\lib\site-packages\mxnet-0.10.1-py2.7.egg\mxnet\module\executor_group.py", line 310, in bind_exec shared_group)) File "C:\Users\...\Anaconda2\lib\site-packages\mxnet-0.10.1-py2.7.egg\mxnet\module\executor_group.py", line 582, in _bind_ith_exec shared_buffer=shared_data_arrays, **input_shapes) File "C:\Users\...\Anaconda2\lib\site-packages\mxnet-0.10.1-py2.7.egg\mxnet\symbol.py", line 1375, in simple_bind raise RuntimeError('simple_bind failed') RuntimeError: simple_bind failed # If I comment out this line in model.fit: #batch_end_callback = mx.callback.Speedometer(batch_size, 100), # I get this error: WARNING:root:Already bound, ignoring bind() Traceback (most recent call last): File "", line 7, in num_epoch=10) File "C:\Users\...\Anaconda2\lib\site-packages\mxnet-0.10.1-py2.7.egg\mxnet\module\base_module.py", line 463, in fit allow_missing=allow_missing, force_init=force_init) File "C:\Users\...\Anaconda2\lib\site-packages\mxnet-0.10.1-py2.7.egg\mxnet\module\module.py", line 272, in init_params for name, arr in self._arg_params.items(): AttributeError: 'NoneType' object has no attribute 'items' I guess the error comes from loading the data and passing it to MXNet.
I don't know what this means or what I should do; please help. I am using Windows 10, Python 2, MXNet 0.10, Anaconda2.
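A minimal sketch of turning rows of `label, pixel1, ..., pixel400` into the two arrays a single-input network expects: images in NCHW layout `(N, 1, 20, 20)` and a 1-D label vector. `rows_to_nchw` is a hypothetical helper. Returning labels as 1-D is a deliberate choice: as far as I know that is the label shape `SoftmaxOutput` pairs with, so the `(2779, 1)` reshape in the posted code is one thing worth checking.

```python
import numpy as np


def rows_to_nchw(rows, height=20, width=20):
    """Convert rows of `label, pixel1, ..., pixelN` into (images, labels).

    images: float32 array of shape (num_rows, 1, height, width), i.e. NCHW
    with a single channel; labels: float32 array of shape (num_rows,).
    """
    arr = np.asarray(rows, dtype=np.float32)
    if arr.ndim != 2 or arr.shape[1] != 1 + height * width:
        raise ValueError("expected rows of 1 label + %d pixels" % (height * width))
    labels = arr[:, 0]
    images = arr[:, 1:].reshape(-1, 1, height, width)
    return images, labels
```

Assuming the CSV has a single header row, usage might look like `images, labels = rows_to_nchw(np.loadtxt('training_set_flatten_rows_mxnet.csv', delimiter=',', skiprows=1))`, followed by `mx.io.NDArrayIter(images, labels, batch_size=batch_size)`. The posted snippet also uses `data` without showing its definition; for a network fed by such an iterator it would normally be `data = mx.sym.Variable('data')`.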
[GitHub] rahul003 opened a new pull request #7421: Resolve more compile warnings
rahul003 opened a new pull request #7421: Resolve more compile warnings URL: https://github.com/apache/incubator-mxnet/pull/7421 All of these are sign-compare warnings. The warnings in the first two files showed up when building for GPU, and the last file is a test file that had sign-compare warnings. None of these variables needs to be signed.
[GitHub] larroy opened a new pull request #7416: update submoules with android fixes
larroy opened a new pull request #7416: update submoules with android fixes URL: https://github.com/apache/incubator-mxnet/pull/7416
[GitHub] idealboy commented on issue #5452: When executing "train_mnist.py" with two machines, no response returns
idealboy commented on issue #5452: When executing "train_mnist.py" with two machines, no response returns URL: https://github.com/apache/incubator-mxnet/issues/5452#issuecomment-321508451 It is probably caused by the firewall on the machine; try disabling it.
[GitHub] eric-haibin-lin commented on issue #7414: why rnn train speed is not stable,sometims very slow?
eric-haibin-lin commented on issue #7414: why rnn train speed is not stable,sometims very slow? URL: https://github.com/apache/incubator-mxnet/issues/7414#issuecomment-321729637 Is 30 samples/sec the speed for a batch? Do you have variable sequence lengths?
[GitHub] thirdwing commented on issue #7375: Can I set instance weight when training?
thirdwing commented on issue #7375: Can I set instance weight when training? URL: https://github.com/apache/incubator-mxnet/issues/7375#issuecomment-321731082 Can you give more details on what you mean by "set instance weight"? I am sorry that I don't understand your problem.
[GitHub] jmacglashan commented on issue #7319: [RoadMap] Legacy issue resolution before 1.0 release
jmacglashan commented on issue #7319: [RoadMap] Legacy issue resolution before 1.0 release URL: https://github.com/apache/incubator-mxnet/issues/7319#issuecomment-320315492 Yes, an operator file (or otherwise) to support IDE code completion would be greatly welcomed.