[NOTIFICATION] CI Restart

2020-01-31 Thread Anirudh Subramanian
Hi, We had to restart the master to mitigate an issue related to Jenkins slaves being down. You may have to retrigger some of your in-progress PRs. Apologies for the inconvenience caused. Anirudh

[NOTIFICATION] CI Upgrade

2020-01-30 Thread Anirudh Subramanian
Hi, I had to upgrade the CI to obtain some important security fixes: https://jenkins.io/security/advisory/2020-01-29/ . You may have to retrigger some of your in-progress PRs. Apologies for the inconvenience caused. Anirudh

Re: [apache/incubator-mxnet] [RFC] MXNet Multithreaded Inference Interface (#16431)

2019-12-05 Thread Anirudh Subramanian
Thanks for the thoughtful and valuable comments @arcadiaphy. > I've deployed many models with scala API, and run them in multiple threads. > The whole system has run smoothly in production environment for more than 2 > months. > The backend of inference is graph executor, which is created for e

Re: [apache/incubator-mxnet] [RFC] MXNet Multithreaded Inference Interface (#16431)

2019-10-23 Thread Anirudh Subramanian
@ptrendx I am trying to open a PR by Friday. On the status : the two prereqs issues https://github.com/dmlc/dmlc-core/pull/573 and https://github.com/apache/incubator-mxnet/issues/16434 have been better understood and fixed/worked around. I have made C API and backend changes and currently stil

Re: Join the dev community

2019-10-18 Thread Anirudh Subramanian
Hi Akash, Welcome to the project! https://mxnet.apache.org/community/contribute is a good place to start. Anirudh On Fri, Oct 18, 2019 at 6:37 AM AKASH S M wrote: > Hello, > I'm Akash S M, an undergraduate from Indian Institute of > Technology, Roorkee. I'd like to join the developer

Re: [apache/incubator-mxnet] [RFC] MXNet Multithreaded Inference Interface (#16431)

2019-10-10 Thread Anirudh Subramanian
Thanks @marcoabreu ! > Will the new C-API functions be threadsafe in general? Speak, I can invoke > them at any point in time from any thread without the need of a lock, > sticky-thread or a thread hierarchy? (I'm thinking of the thread-safety being > done on the backend level) The issue I fo

[apache/incubator-mxnet] [RFC] MXNet Multithreaded Inference Interface (#16431)

2019-10-10 Thread Anirudh Subramanian
Thanks to @nswamy for his inputs and design discussions related to this project and @frankfliu for explaining the requirements and the use case from customer perspective. # Problem Statement One of the big un-catered for use cases in MXNet is loading a model and being able to run parallel infe

Re: mxnet ctrl-c

2019-09-23 Thread Anirudh Subramanian
chaining back to the > python through normal signal channels. if i can get it to work i’ll post a > PR. > > On Mon, Sep 23, 2019 at 12:00 PM Anirudh Subramanian < > anirudh2...@gmail.com> > wrote: > > > Currently I don't see any special handling in the code b

Re: mxnet ctrl-c

2019-09-23 Thread Anirudh Subramanian
Currently I don't see any special handling in the code base for this. We have atexit.register which invokes MXInvokeShutdown from Python but that doesn't work for signals. Anirudh On Sun, Sep 22, 2019 at 7:30 PM Chris Olivier wrote: > question: how does gluon handle ctrl-c during a “long” impera

Re: [VOTE] Release Apache MXNet (incubating) 1.5.1.rc0

2019-09-19 Thread Anirudh Subramanian
+1 Built from source with cmake and ran unit tests for gluon and amp. Noticed that test_sync_batchnorm fails on p3.8xlarge (hidden by the CI because it passes on machines with 1 or 2 gpus). I have opened an issue for the same https://github.com/apache/incubator-mxnet/issues/16214 though I think it's no

Re: [DISCUSS] Assigning Issues

2019-09-12 Thread Anirudh Subramanian
+1 On Thu, Sep 12, 2019 at 1:15 PM Zach Kimberg wrote: > We had a discussion a while back about trying to improve the way we handle > issues by assigning them to users who are working on them. However, the > discussion ended because issues could only be assigned to those with write > access (com

Re: [DISCUSS] Remove amalgamation

2019-09-10 Thread Anirudh Subramanian
Hi Pedro, I don't see anything "destructive" with Chris asking for justification for you calling something "hacky". The only email in this thread where I see ad hominems and disrespectful comments is your email. On Sat, Sep 7, 2019, 10:18 PM Pedro Larroy wrote: > Apache mentors should have a lo

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

2019-06-20 Thread Anirudh Subramanian
arly so we can track and solve it > in time rather than block the release during vote time. > > [1] https://travis-ci.org/awslabs/sockeye > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian > > wrote: > > > I was able to reproduce a cr

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

2019-06-20 Thread Anirudh Subramanian
I was able to reproduce a crash with the commit 09202f7f261954383aa387144524d38f83f18d06 but not with the commit a862270beb2d796c1ba311183f7f4a766a18ad6c. Anirudh On Thu, Jun 20, 2019 at 3:53 PM Lai Wei wrote: > Hi Przemyslaw, > > Is there an issue with more details to track the problem? > > >

Re: CUDA / CUDNN support revisited

2019-06-18 Thread Anirudh Subramanian
+1, Agree this should be done for both CUDA and CUDNN versions. At max CUDA Version N and CUDA Version N - 1 should be supported in CI. My question is what happens, when we are at a position, where we are on a CUDA version N and removed support for CUDA version N - 1. Within a small duration Nvidi

Re: Making new operators and AMP lists

2019-05-30 Thread Anirudh Subramanian
> HOWEVER, as I was writing this reply I realized that due to pure luck this is not actually what happens - optimizers could in fact be in the FP32_FUNCS list. That is because, as AMP's assumption is that the model being changed is FP32 model at the start, all weights (and so all the gradients just

Re: Making new operators and AMP lists

2019-05-28 Thread Anirudh Subramanian
The assumption is the AMP requirement is something that has a steep learning curve. Developers may get confused by the name, but the question the developer has to essentially answer is (and this can be added in the error): 1. If the operator can run in FP16 and FP32 modes, put it in FP16_FP32_FUNCS

Re: Making new operators and AMP lists

2019-05-28 Thread Anirudh Subramanian
Hi, I agree with Marco there are some easy wins to be had since many new GPU operators come with FP16 support. I think we can explore the overhead to the developer and try to reduce the feedback time for the developer, so that cost associated with adding support for AMP feature is minimized. Also

Re: Making new operators and AMP lists

2019-05-28 Thread Anirudh Subramanian
Hi all, I had a discussion with Przemyslaw about this offline. There are two options we can pursue to make the developer experience better (since currently they have to wait for CI to complete): 1. Obtain the current lists and check if the length of the combined lists is the same as MXListAllOpNames which
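The first option above can be sketched roughly as follows. This is an illustration under stated assumptions, not the check that was actually implemented: `check_amp_coverage` is a hypothetical helper, the AMP lists are passed in as a plain dict, and `all_op_names` stands for whatever MXListAllOpNames returns. Comparing sets rather than only lengths also reports which operators are missing or listed twice:

```python
from collections import Counter


def check_amp_coverage(amp_lists, all_op_names):
    """Compare the combined AMP lists against the full operator list.

    amp_lists: dict mapping a list name (e.g. "FP32_FUNCS") to operator names.
    all_op_names: iterable of every registered operator name.
    Returns (missing, unknown, duplicated), each a sorted list.
    """
    combined = [op for ops in amp_lists.values() for op in ops]
    counts = Counter(combined)
    registered = set(all_op_names)
    missing = sorted(registered - set(combined))    # registered but in no list
    unknown = sorted(set(combined) - registered)    # listed but not registered
    duplicated = sorted(op for op, n in counts.items() if n > 1)
    return missing, unknown, duplicated
```

A developer could run such a check locally before pushing, for example with `check_amp_coverage({"FP16_FP32_FUNCS": [...], "FP32_FUNCS": [...]}, all_op_names)`, instead of waiting for CI to report the mismatch.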

Re: [DISCUSS] 1.5.0 Release Plan

2019-05-15 Thread Anirudh Subramanian
quests aiming for > 1.5.0 that needs attention. > Please understand we already have around 650 commits in master that need > to be released in time. We understand TensorRT test in CI is failing and > are trying to fix it. Meanwhile please update the tracker if there is any > cha

Re: [Proposal] New operator graph for MXNet

2019-05-15 Thread Anirudh Subramanian
Hi Junru, Overall, I appreciate the points you made about the proposal. Having said that, I would like to remind the Apache Code of Conduct : https://www.apache.org/foundation/policies/conduct. "Be empathetic, welcoming, friendly and patient". I find your tone condescending. Clearly you understa

Re: Requesting slack access

2019-05-08 Thread Anirudh Subramanian
Sent invite! On Wed, May 8, 2019 at 6:43 AM Sem wrote: > Requesting slack access > >

Re: [DISCUSS] 1.5.0 Release Plan

2019-05-08 Thread Anirudh Subramanian
Hi Sheng, I had a discussion with nvidia folks offline today (@ptrendx et al.). I strongly feel that the AMP feature should be included as part of the release: https://github.com/apache/incubator-mxnet/pull/14173 . The PR is aimed for completion for next week but reviews and RFC discussions may t

Re: [VOTE] Release Apache MXNet (incubating) version 1.4.1.rc0

2019-05-03 Thread Anirudh Subramanian
her cmake > works on their side. > > Thanks, > Junru > > > On Fri, May 3, 2019 at 9:43 PM Anirudh Subramanian > wrote: > > > Hi Junru, > > > > I am on v1.4.x , and my dmlc-core commit is this one : > > > > > https://github.com/dmlc/dmlc

Re: [VOTE] Release Apache MXNet (incubating) version 1.4.1.rc0

2019-05-03 Thread Anirudh Subramanian
Also, could you check if you are testing on v1.4.x branch? > > Thanks, > Junru > > > > On Fri, May 3, 2019 at 4:33 PM Anirudh Subramanian > wrote: > > > -1 (binding) > > > > Is the cmake build failing for the 1.4.1 release tag ? Is this a known > &g

Re: [VOTE] Release Apache MXNet (incubating) version 1.4.1.rc0

2019-05-03 Thread Anirudh Subramanian
-1 (binding) Is the cmake build failing for the 1.4.1 release tag ? Is this a known issue ? Did the following: cd build && cmake VERBOSE=1 -DUSE_CUDA=ON -DUSE_CUDNN=ON -DUSE_OPENMP=ON -DCMAKE_BUILD_TYPE=Debug -DUSE_DIST_KVSTORE=0 -DUSE_OPENCV=1 -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda -DCUDNN_ROO

Re: Proposal for Conversion from FP32 to Mixed Precision Models

2019-04-30 Thread Anirudh Subramanian
> mixed precision model we don't talk about training, and when talk about > inference, INT8 quantization is not mentioned~ > > -Original Message- > From: Anirudh Subramanian [mailto:anirudh2...@gmail.com] > Sent: Tuesday, April 30, 2019 8:27 PM > To: dev@mxnet.

Re: Proposal for Conversion from FP32 to Mixed Precision Models

2019-04-30 Thread Anirudh Subramanian
passes. Anirudh On Mon, Apr 29, 2019 at 2:22 PM Anirudh Subramanian wrote: > Hi Zach, > > You raise an interesting point. Thank you for the pointer! > > Incorporating CSE pass comes with its own cost, and the advantage it > brings is to make the ReducePrecision nnvm pass more ligh

Re: Proposal for Conversion from FP32 to Mixed Precision Models

2019-04-29 Thread Anirudh Subramanian
y for other passes that > could create duplicates or to remove duplicate expressions in general. This > tutorial [2] talks about it a bit. > > Zach > > [1] - https://en.wikipedia.org/wiki/Common_subexpression_elimination > [2] - https://blog.regehr.org/archives/1603 > > On Mon,

Re: Proposal for Conversion from FP32 to Mixed Precision Models

2019-04-29 Thread Anirudh Subramanian
ort the lower > > precision the previous one used? > > - what will be saved in the final symbol.json and params file when > > training is finished? > > - more generally, what will be saved when users want to serialize > > their model to disk? > > > > Th

Re: Proposal for Conversion from FP32 to Mixed Precision Models

2019-04-29 Thread Anirudh Subramanian
params file when > training is finished? > - more generally, what will be saved when users want to serialize their > model to disk? > > Thank you, > -tao > > -Original Message- > From: Anirudh Subramanian [mailto:anirudh2...@gmail.com] > Sent: Monday, April 29, 2019

Proposal for Conversion from FP32 to Mixed Precision Models

2019-04-29 Thread Anirudh Subramanian
Hi all, I have created a doc for conversion from FP32 to Mixed Precision Models: https://cwiki.apache.org/confluence/display/MXNET/Conversion+from+FP32+to+Mixed+Precision+Models I look forward to your feedback on the same. Thanks, Anirudh

[Announcement] New Committer - Wang Jiajun

2019-04-16 Thread Anirudh Subramanian
Hi, Please join me to welcome Wang Jiajun (https://github.com/arcadiaphy) as a new committer of Apache (incubating) MXNet! Wang has been solving some tough bugs with respect to memory leaks, process fork handling, dependency engine issues and custom op exception handling. Issue Involvement: http

Re: Implementing zero-dim and zero-size tensors in MXNet and its impact on your codebases

2019-04-11 Thread Anirudh Subramanian
> If there is a use-case where people can not even use our C++ package, > then > > we could have discussions about introducing a user-facing C-API, but > right > > now this approach to interface with our C-API (although I know that > people > > use it) seem a bit like u

Re: Implementing zero-dim and zero-size tensors in MXNet and its impact on your codebases

2019-04-11 Thread Anirudh Subramanian
ce a lot of duplicate > code though. > > On Thu, Apr 11, 2019 at 8:50 AM Anirudh Subramanian > > wrote: > > > I was under the impression that C API does fall under semver. Has this > been > > discussed somewhere before ? Is this also the case for C Predict API ? > &g

Re: Implementing zero-dim and zero-size tensors in MXNet and its impact on your codebases

2019-04-11 Thread Anirudh Subramanian
I was under the impression that C API does fall under semver. Has this been discussed somewhere before ? Is this also the case for C Predict API ? On Thu, Apr 11, 2019, 8:08 AM Marco de Abreu wrote: > In case only changes to the c-api are being made, it doesn't fall under our > semantic versioni

[Announcement] New Committer - Alex Zai

2019-03-31 Thread Anirudh Subramanian
Hi all, Please join me to welcome Alex Zai as a new committer of Apache (incubating) MXNet! Alex has been instrumental in bringing MKLDNN from experimental to making it default on MXNet master. This involved adding Python and C++ unit tests, improving CI coverage for MKLDNN, testing MKLDNN on diff

[Announcement] New Committer - Patric Zhao

2019-03-14 Thread Anirudh Subramanian
Hi all, Please join me to welcome Patric Zhao as a new committer of Apache (incubating) MXNet! Patric has put in great effort around MKLDNN integration into MXNet and has been involved in features like quantization, graph fusion and fused RNN operators for CPU. Dev List activity: https://lists.a

Re: [VOTE] Release Apache MXNet (incubating) version 1.4.0.rc2

2019-02-04 Thread Anirudh Subramanian
-0 Thanks Steffen for your release efforts ! Build from source works with make but fails with cmake for me. cd build && cmake VERBOSE=1 -DUSE_CUDA=ON -DUSE_CUDNN=ON -DUSE_OPENMP=ON -DCMAKE_BUILD_TYPE=Debug -DUSE_DIST_KVSTORE=0 -DUSE_OPENCV=1 -GNinja .. && ninja -v FAILED: : && /usr/bin/c++ -

Re: [Question] UI change policy in MXNet

2018-12-20 Thread Anirudh Subramanian
ct to > backward compatibility, interface changes, testing etc. > > (Lin) This is definitely an informative discussion. It would be better if > we can put this in a more noticeable place for developers. > > > On Thu, Dec 20, 2018 at 1:39 PM Anirudh Subramanian > > wrot

Re: [Question] UI change policy in MXNet

2018-12-20 Thread Anirudh Subramanian
1) Which guideline should we follow when updating the UI in MXNet operators? A) MXNet follows semantic versioning, so breaking changes to operator interfaces can be introduced only in major versions. 2) Who should approve the UI change? A) Contributors who may have worked on the operator and/or ot

Re: v1.4.0 status 11/29

2018-12-03 Thread Anirudh Subramanian
Hi Steffen, I have created a PR to cherry pick the change to v1.4.x branch: https://github.com/apache/incubator-mxnet/pull/13517 Anirudh On Mon, Dec 3, 2018 at 11:29 AM Steffen Rochel wrote: > Thanks Haibin. Anirudh - please add PR for v1.4.x for > https://github.com/apache/incubator-mxnet/pul

Re: Adding AMD CPU to CI

2018-11-29 Thread Anirudh Subramanian
Instruction set extensions support like AVX2, AVX512 etc. can vary between AMD and Intel and there can also be a time lag between when Intel supports it versus when AMD supports it. Also, in the future this setup may be useful in case MXNet supports AMD GPUs and AWS also happens to have support for

Re: Adding AMD CPU to CI

2018-11-29 Thread Anirudh Subramanian
+1 On Thu, Nov 29, 2018 at 2:38 PM Alex Zai wrote: > What are people's thoughts on having AMD machines tested on the CI? AMD > machines are now available on AWS. > > Best, > Alex >

Re: Include MKLDNN into default mxnet pip package

2018-11-27 Thread Anirudh Subramanian
annot find any evidence about patch release. > > -tao > > -Original Message- > From: Anirudh Subramanian [mailto:anirudh2...@gmail.com] > Sent: Tuesday, November 27, 2018 6:16 AM > To: dev@mxnet.incubator.apache.org > Subject: Re: Include MKLDNN into default mxnet pip pa

Re: Include MKLDNN into default mxnet pip package

2018-11-26 Thread Anirudh Subramanian
Hi Tao, I agree with Steffen that we can start with a stable release for MKLDNN for 1.4.0. For your suggestion on using 0.17, can you provide info on what versioning mechanism MKLDNN uses. Once a MKLDNN release is out and there are some regressions found like the LSTM regression, would it be possi

Re: CI impaired

2018-11-21 Thread Anirudh Subramanian
Thanks for the quick response and mitigation! On Wed, Nov 21, 2018 at 3:55 PM Marco de Abreu wrote: > Hello, > > today, CI had some issues and I had to cancel all jobs a few minutes ago. > This was basically caused by the high load that is currently being put on > our CI system due to the pre-re

[ANNOUNCE] Apache MXNet (incubating) 1.2.1 Release

2018-07-20 Thread Anirudh Subramanian
Hello all, The Apache MXNet (incubating) Community announces the availability of Apache MXNet (incubating) 1.2.1! Apache MXNet (incubating) is a deep learning framework designed for both efficiency and flexibility. It allows you to mix symbolic and imperative programming to maximize efficiency a