Re: Profiler Broken?

2020-06-01 Thread Naveen Swamy
FYI, it works with 1.6.0. The output JSON file was too large for Chrome to
handle, and it failed silently. I reduced the number of iterations being
profiled to create a smaller file; anything above 500 MB seems to fail.
Thanks Sheng for the pointer and Anirudh for the help.
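The 500 MB ceiling described above is easy to trip over silently. A small helper like the one below can warn before a trace is handed to chrome://tracing (a sketch based on this thread's observation, not a documented Chrome limit; the threshold constant and function name are assumptions):

```python
import os

# Rough ceiling observed in this thread: chrome://tracing renders a
# blank page, with no error, for profile JSON files above ~500 MB.
CHROME_TRACING_LIMIT_BYTES = 500 * 1024 * 1024

def check_trace_size(path, limit=CHROME_TRACING_LIMIT_BYTES):
    """Return True if the profile file is likely small enough to load."""
    size = os.path.getsize(path)
    if size > limit:
        print(f"{path} is {size / 1e6:.0f} MB; chrome://tracing may show "
              f"a blank page. Profile fewer iterations to shrink it.")
        return False
    return True
```

Reducing the number of profiled iterations, as done above, is the straightforward way to get under the limit.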

On Thu, May 28, 2020 at 1:07 PM Pedro Larroy 
wrote:

> Yes, the profiler seems to be broken / has some concurrency issues. I have
> seen corrupted profile results.
>
> On Thu, May 28, 2020 at 12:30 PM Naveen Swamy  wrote:
>
> > I am attempting to profile one of our models. I used profiler.state to
> > run/stop in code and also used the environment variables to autostart
> > the profiler. It creates a 600 MB JSON file; however, when I view it in
> > Chrome tracing it comes out as a blank screen (loading seems to be fine;
> > I didn't get any errors).
> >
> > Wondering if anyone has tried this recently, or is aware of the
> > profiler being broken?
> >
> > ENVIRON: Ubuntu 18.04
> > MXNet : mxnet-cu101mkl
> > Deep Learning AMI (Ubuntu 18.04) Version 29.0 (ami-043f9aeaf108ebc37)
> >
> > Thanks, Naveen
> >
>


Profiler Broken?

2020-05-28 Thread Naveen Swamy
I am attempting to profile one of our models. I used profiler.state to
run/stop in code and also used the environment variables to autostart the
profiler. It creates a 600 MB JSON file; however, when I view it in Chrome
tracing it comes out as a blank screen (loading seems to be fine; I didn't
get any errors).

Wondering if anyone has tried this recently, or is aware of the profiler being broken?

ENVIRON: Ubuntu 18.04
MXNet : mxnet-cu101mkl
Deep Learning AMI (Ubuntu 18.04) Version 29.0 (ami-043f9aeaf108ebc37)

Thanks, Naveen


Re: Using AMP

2020-05-01 Thread Naveen Swamy
Thanks Przemek, I appreciate your input. Let me apply the loss scale to
the gradient-clipping threshold and run the experiment again.
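The change referred to above follows from Przemek's explanation: under AMP the gradients carry the loss scale, so the clipping threshold must carry it too (or the gradients must be unscaled first via amp.unscale). Below is a minimal pure-Python sketch of the arithmetic; the helper names are illustrative, not the MXNet API:

```python
import math

def global_norm(grads):
    """L2 norm over all gradient values, across all arrays."""
    return math.sqrt(sum(x * x for g in grads for x in g))

def clip_global_norm(grads, max_norm):
    """Scale all gradients so their global norm is at most max_norm."""
    norm = global_norm(grads)
    if norm > max_norm:
        ratio = max_norm / norm
        grads = [[x * ratio for x in g] for g in grads]
    return grads

loss_scale = 131072.0              # AMP loss scale seen in the log above
grads = [[0.3, -0.4], [1.2]]       # "true" gradients
scaled = [[x * loss_scale for x in g] for g in grads]  # what backward() yields

# Clipping the scaled gradients against the plain threshold (clip=0.25)
# shrinks them ~loss_scale times too much, so training stalls.
# Multiplying the threshold by the loss scale is equivalent to unscaling
# the gradients before clipping:
right = clip_global_norm(scaled, 0.25 * loss_scale)
ref = clip_global_norm(grads, 0.25)
assert all(math.isclose(x / loss_scale, y)
           for r, u in zip(right, ref) for x, y in zip(r, u))
```

The assertion at the end checks the equivalence: scaling the threshold and unscaling the gradients give the same clipped result.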

On Fri, May 1, 2020 at 11:20 AM Przemysław Trędak 
wrote:

> Just realized I did not actually link to the issue I mentioned, it is
> https://github.com/apache/incubator-mxnet/issues/17507
>
> On 2020/05/01 18:19:27, Przemysław Trędak wrote:
> > Hi Naveen,
> >
> > The problem that you see with loss is due to the fact that the model
> clips the gradient, which in the case of AMP is scaled by the loss scale.
> In order for it to work you need to apply the same loss scale to the value
> you are using to clip the gradients. This is currently possible in 2 ways,
> either use amp.unscale API to unscale the gradients before clipping, or use
> (currently quite hackily, there is an open issue [1] to expose it properly)
> trainer._amp_loss_scaler.loss_scale to multiply your intended global norm
> of gradients.
> >
> > The topic of gradient clipping with AMP is a common problem people have
> and it should be included in the tutorial. I intend to update the tutorial
> with an example of this together with other changes intended to bring AMP
> out of contrib.
> >
> > Regarding performance - it is quite hard to say what the reason for
> this is without profiling the application; there could be multiple
> different bottlenecks here, other than the actual computation on the GPU.
> >
> > Hope this helps :-)
> > Przemek
> >
> > On 2020/05/01 05:14:39, Naveen Swamy  wrote:
> > > Hello,
> > > I am trying to use AMP on an RNN model; however, I am not seeing higher
> > > throughput with AMP, and the loss seems to have stagnated. I am
> > > wondering if I am missing something.
> > >
> > > Also, has AMP been tested on any RNN models, and are there any
> > > benchmarks? I would appreciate some input here.
> > >
> > > I used the RNN model here [1] and followed the tutorial in [2], the
> output
> > > of the runs are
> > > 
> > > Without AMP:
> > > mxnet-lm$ python train.py --cuda --tied --nhid 1500 --emsize 1500
> --epochs
> > > 60  --dropout 0.65 --model gru --batch_size 128
> > >
> > > [Epoch 3 Batch 200/13] loss 6.47, ppl 648.24, throughput 675.94
> samples/s
> > > [Epoch 3 Batch 400/13] loss 6.30, ppl 543.20, throughput 679.51
> samples/s
> > > [Epoch 3] time cost 90.29s, valid loss 5.97, valid ppl 392.94
> > > test loss 5.89, test ppl 361.69
> > > [Epoch 4 Batch 200/13] loss 6.15, ppl 470.58, throughput 676.46
> samples/s
> > > [Epoch 4 Batch 400/13] loss 6.01, ppl 408.21, throughput 679.51
> samples/s
> > > [Epoch 4] time cost 90.27s, valid loss 5.69, valid ppl 296.89
> > >
> > > test loss 5.63, test ppl 277.58
> > > 
> > > With AMP:
> > >
> > > (gluonnlp) ubuntu@ip-172-30-0-140:~/mxnet-lm$ python train.py --cuda
> --tied
> > > --nhid 1500 --emsize 1500 --epochs 60  --dropout 0.65 --model gru
> > > --batch_size 128 --amp True
> > > Namespace(amp=True, batch_size=128, bptt=35, clip=0.25, cuda=True,
> > > dropout=0.65, emsize=1500, epochs=60, export_model=False,
> gcthreshold=0.5,
> > > gctype='none', hybridize=False, log_interval=200, lr=20, model='gru',
> > > nhid=1500, nlayers=2, save='model.params', static_alloc=False,
> > > static_shape=False, tied=True)
> > > using AMP
> > > INFO:root:Using AMP
> > > [Epoch 3 Batch 200/13] loss 10.43, ppl 34026.18, throughput 685.66
> samples/s
> > > [Epoch 3 Batch 400/13] loss 10.38, ppl 32150.51, throughput 688.99
> samples/s
> > > [Epoch 3] time cost 89.04s, valid loss 10.36, valid ppl 31650.83
> > > test loss 10.36, test ppl 31626.99
> > > INFO:root:AMP: increasing loss scale to 131072.00
> > > [Epoch 4 Batch 200/13] loss 10.42, ppl 33642.12, throughput 686.83
> samples/s
> > > [Epoch 4 Batch 400/13] loss 10.37, ppl 31839.51, throughput 689.55
> samples/s
> > > 
> > >
> > > changes made to the training loop after initializing amp and the
> trainer:
> > >
> > > with autograd.record():
> > >     output, hidden = model(data, hidden)
> > >     # Here L is a vector of size batch_size * bptt
> > >     L = loss(output, target)
> > >     L = L / (args.bptt * args.batch_size)
> > >     with amp.scale_loss(L, trainer) as scaled_loss:
> > >         mx.autograd.backward(scaled_loss)
> > >
> > > 
> > > [1]
> > >
> https://github.com/apache/incubator-mxnet/blob/master/example/gluon/word_language_model/train.py
> > >
> > > [2]
> > >
> https://mxnet.apache.org/api/python/docs/tutorials/performance/backend/amp.html
> > >
> > > Thanks, Naveen
> > >
> >
>


Using AMP

2020-04-30 Thread Naveen Swamy
Hello,
I am trying to use AMP on an RNN model; however, I am not seeing higher
throughput with AMP, and the loss seems to have stagnated. I am
wondering if I am missing something.

Also, has AMP been tested on any RNN models, and are there any
benchmarks? I would appreciate some input here.

I used the RNN model here [1] and followed the tutorial in [2], the output
of the runs are

Without AMP:
mxnet-lm$ python train.py --cuda --tied --nhid 1500 --emsize 1500 --epochs
60  --dropout 0.65 --model gru --batch_size 128

[Epoch 3 Batch 200/13] loss 6.47, ppl 648.24, throughput 675.94 samples/s
[Epoch 3 Batch 400/13] loss 6.30, ppl 543.20, throughput 679.51 samples/s
[Epoch 3] time cost 90.29s, valid loss 5.97, valid ppl 392.94
test loss 5.89, test ppl 361.69
[Epoch 4 Batch 200/13] loss 6.15, ppl 470.58, throughput 676.46 samples/s
[Epoch 4 Batch 400/13] loss 6.01, ppl 408.21, throughput 679.51 samples/s
[Epoch 4] time cost 90.27s, valid loss 5.69, valid ppl 296.89

test loss 5.63, test ppl 277.58

With AMP:

(gluonnlp) ubuntu@ip-172-30-0-140:~/mxnet-lm$ python train.py --cuda --tied
--nhid 1500 --emsize 1500 --epochs 60  --dropout 0.65 --model gru
--batch_size 128 --amp True
Namespace(amp=True, batch_size=128, bptt=35, clip=0.25, cuda=True,
dropout=0.65, emsize=1500, epochs=60, export_model=False, gcthreshold=0.5,
gctype='none', hybridize=False, log_interval=200, lr=20, model='gru',
nhid=1500, nlayers=2, save='model.params', static_alloc=False,
static_shape=False, tied=True)
using AMP
INFO:root:Using AMP
[Epoch 3 Batch 200/13] loss 10.43, ppl 34026.18, throughput 685.66 samples/s
[Epoch 3 Batch 400/13] loss 10.38, ppl 32150.51, throughput 688.99 samples/s
[Epoch 3] time cost 89.04s, valid loss 10.36, valid ppl 31650.83
test loss 10.36, test ppl 31626.99
INFO:root:AMP: increasing loss scale to 131072.00
[Epoch 4 Batch 200/13] loss 10.42, ppl 33642.12, throughput 686.83 samples/s
[Epoch 4 Batch 400/13] loss 10.37, ppl 31839.51, throughput 689.55 samples/s


changes made to the training loop after initializing amp and the trainer:

with autograd.record():
    output, hidden = model(data, hidden)
    # Here L is a vector of size batch_size * bptt
    L = loss(output, target)
    L = L / (args.bptt * args.batch_size)
    with amp.scale_loss(L, trainer) as scaled_loss:
        mx.autograd.backward(scaled_loss)


[1]
https://github.com/apache/incubator-mxnet/blob/master/example/gluon/word_language_model/train.py

[2]
https://mxnet.apache.org/api/python/docs/tutorials/performance/backend/amp.html

Thanks, Naveen


[ANNOUNCEMENT] New PPMC Member Thomas DELTEIL

2020-02-13 Thread Naveen Swamy
Hi all,

Please join me in welcoming Thomas Delteil as a new member of Apache MXNet
(incubating) PPMC!

Thomas has contributed to various areas of Apache MXNet. As an evangelist,
he has created many learning materials for MXNet, engaged with
contributors on PRs and with users on the forum (he is the most liked on
the MXNet user forum, https://discuss.mxnet.io/u?period=all), and he also
helped overhaul the website.

Welcome Thomas!

-Naveen


Re: [DISCUSS] Remove amalgamation

2019-09-12 Thread Naveen Swamy
so the original email suggesting removal was, after all, self-serving :)

let's encourage anyone who wants to maintain the original work, make use
of it, and make it better.

-1 to removing at this point

P.S.: I suggest doing some due diligence before bringing topics up for
discussion.

On Wed, Sep 11, 2019 at 8:10 AM Lv, Tao A  wrote:

> Sorry to chime in.
>
> There is a PR to fix amalgamation. I was pinged several times to merge it
> but I don't think I have enough knowledge to do that. So it would be great
> if someone from this thread can help to review.
>
> https://github.com/apache/incubator-mxnet/pull/15303
>
> thanks,
> -tao
>
> -Original Message-
> From: Marco de Abreu 
> Sent: Wednesday, September 11, 2019 9:38 PM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: [DISCUSS] Remove amalgamation
>
> Is Amalgamation only used on Android though? Are there any other use cases?
>
> -Marco
>
> Pedro Larroy wrote on Wed., 11 Sep. 2019, 11:57:
>
> > Hi Anirudh
> >
> > Appreciate your feedback, and sorry if my email came across that way to
> > you; I think you might be missing some context. I don't think calling
> > something hacky is anything bad, and it isn't supposed to be the topic of
> > the discussion. It was reported as not working by users, hence the
> > original thread. It was a request for opinions from people who might
> > actually have tried to work on MXNet on Android.
> >
> > Thanks.
> >
> > Pedro.
> >
> >
> > On Tuesday, September 10, 2019, Anirudh Subramanian
> >  > >
> > wrote:
> > > Hi Pedro,
> > >
> > > I don't see anything "destructive" with Chris asking for
> > > justification
> > for
> > > you calling something "hacky". The only email in this thread where I
> > > see
> > ad
> > > hominems and disrespectful comments is your email.
> > >
> > > On Sat, Sep 7, 2019, 10:18 PM Pedro Larroy
> > >  > >
> > > wrote:
> > >
> > >> Apache mentors should have a look at these reincident harassment
> > >> and destructive behaviors which demotivate contributions and take
> > >> action. It takes only one bad apple to ruin a community.
> > >>
> > >> The mobile solution that is known to work as of now is cross
> > >> compiling with "ci/build.py -p build.android_armv8" or
> > >> "build.android_armv7". The only advantage of amalgamation is that it
> > >> provides a smaller binary, which we could accomplish with the C
> > >> preprocessor.
> > >>
> > >> My technical contributions speak for themselves, including porting
> > >> MXNet
> > to
> > >> Android and ARM and helping many users run MXNet in Jetson,
> > >> Raspberry Pi and Android amongst many other topics. I have never
> > >> been disrespectful
> > to
> > >> anyone. I'm entitled to my own technical opinions about
> > >> amalgamation or
> > any
> > >> other piece of code whatsoever, that's no personal disrespect to
> > >> anyone
> > and
> > >> perfectly valid. If you are not interested in this project anymore,
> > >> do
> > us
> > >> all a favor and stop trolling and being toxic. If you want my
> > >> respect,
> > step
> > >> up your technical contributions, be positive and encourage others,
> > >> this including commits, I haven't seen for many months, please be
> > >> positive
> > and
> > >> constructive. This scorched-earth attitude is only reflecting bad
> > >> on
> > you.
> > >> I'm certainly not interested in your ad hominems or unasked-for
> > >> technical advice, which, to be honest, shows poor judgment and
> > >> ignorance. Others and I have come up with numbers, graphs, metrics and
> > >> arguments and have been met with dismissal, trolling and
> > >> sea-lioning. I have received your insults via public and private
> > >> channels (such as linkedin) as have others. This is not ok and has
> > >> to stop. If you have something personal against me or against your
> > >> former employer, this is not the right place
> > or
> > >> forum.
> > >>
> > >> On Fri, Sep 6, 2019 at 3:56 PM Chris Olivier
> > >> 
> > >> wrote:
> > >>
> > >> > Hi Pedro,
> > >> >
> > >> > While I was not involved with amalgamation or its development in
> > >> > any
> > way,
> > >> > can you please refrain from referring to the work of others as a
> > "hacky
> > >> > solution"?  This is derogatory slang and the statement was not
> > supported
> > >> > with any justification for such name-calling.  Someone spent a
> > >> > good
> > deal
> > >> of
> > >> > time on this solution at some point in time and I am sure it
> > >> > worked
> > for
> > >> its
> > >> > purpose at that time -- I think it was used in the original
> > >> > javascript
> > >> port
> > >> > as well, actually -- and it is disrespectful to call their
> > >> > efforts "hacky".  Please respect what came before.
> > >> >
> > >> > Thanks for understanding,
> > >> >
> > >> > -Chris
> > >> >
> > >> >
> > >> > On Fri, Sep 6, 2019 at 3:07 PM Pedro Larroy <
> > >> pedro.larroy.li...@gmail.com>
> > >> > wrote:
> > >> >

Re: [DISCUSS] Remove amalgamation

2019-09-06 Thread Naveen Swamy
+1.
I have heard this before elsewhere: if you don't understand the code, give
it a name like "hacky", "does not follow the pattern", "unmaintainable",
etc. All of that may even be true, but it does not help to make clichéd and
disrespectful comments about someone else's contributions.
The code is not going on a ramp walk for a beauty contest. Instead of
subscribing to such software fallacies, it would help the community make a
decision if concrete examples of limitations, drawbacks, and missing
features were given.



On Fri, Sep 6, 2019 at 3:56 PM Chris Olivier  wrote:

> Hi Pedro,
>
> While I was not involved with amalgamation or its development in any way,
> can you please refrain from referring to the work of others as a "hacky
> solution"?  This is derogatory slang and the statement was not supported
> with any justification for such name-calling.  Someone spent a good deal of
> time on this solution at some point in time and I am sure it worked for its
> purpose at that time -- I think it was used in the original javascript port
> as well, actually -- and it is disrespectful to call their efforts
> "hacky".  Please respect what came before.
>
> Thanks for understanding,
>
> -Chris
>
>
> On Fri, Sep 6, 2019 at 3:07 PM Pedro Larroy 
> wrote:
>
> > Hi
> >
> > I would like to propose to remove amalgamation from MXNet and CI, users
> > have reported that they couldn't use it successfully in Android, and
> > instead they were able to use the cross compiled docker build
> successfully.
> >
> > Any reason why we shouldn't remove this hacky solution?
> >
> > Pedro.
> >
>


Re: [Discuss] Experiment Reproducibility in MXNet

2019-09-02 Thread Naveen Swamy
Look at https://mlflow.org/

> On Sep 2, 2019, at 7:02 PM, Chaitanya Bapat  wrote:
> 
> Hello MXNet community,
> 
> Reproducibility of ML experiments carried out by data scientists, analysts
> and experts is the talk of the town.
> 
> While listening to TWiML's latest podcast - Managing Deep Learning
> Experiments with Lukas Biewald [1], he mentions the company Weights and
> Biases [2] [3]
> 
> Brief
> - Reproducibility crisis in ML
> - Let alone the latest research papers, even your own experiments (say from
> 1 month ago) are not reproducible
> - Solution :
> 1. Versioning
> Takes snapshots to store versions - Code, Data, Parameters and
> Hyperparameters
> Versioning or Snapshotting falls in the realm of data management. Notable
> companies - DVC and Pachyderm.
> 
> 2. Visualization
> Builds on top of Tensorboard (TBoard). But solves its shortcomings
> - Targeted for distributed training (unlike TBoard)
> - Visualizes wrt several experiments (not just a single run)
> 
> 3. Collaboration
> Making this cloud based, allows cross-team collaboration.
> 
> *MXNet*
> From MXNet's point of view, we can discuss if it's worthwhile to have this
> (many positives point towards a yes) and if so we can explore following
> options -
> a. Work with W&B to build support for using it with MXNet (currently
> they support TensorFlow (TF) and PyTorch (PT))
> b. Build something in-house along similar lines, which would involve
> significant engineering effort and discussion.
> 
> So I wanted to know what the community thinks about this.
> 
> Thanks,
> Chai
> 
> [1]
> https://twimlai.com/twiml-talk-295-managing-deep-learning-experiments-with-lukas-biewald
> [2] https://www.wandb.com
> [3] https://github.com/wandb
> 
> -- 
> *Chaitanya Prakash Bapat*
> *+1 (973) 953-6299*
> 
> [image: https://www.linkedin.com//in/chaibapat25]
> [image: https://www.facebook.com/chaibapat]
> [image:
> https://twitter.com/ChaiBapchya] [image:
> https://www.linkedin.com//in/chaibapat25]
> 


[ANNOUNCEMENT] New Committer: Przemyslaw Tredak (ptrendx)

2019-05-21 Thread Naveen Swamy
The Project Podling Management Committee (PPMC) for Apache MXNet has
invited Przemyslaw Tredak (ptrendx) based on his contribution to MXNet to
become a committer and we are pleased to announce that he has accepted.

Przemyslaw, thanks a lot for your contribution and continued effort to
support the MXNet community.

Please join me in welcoming Przemyslaw to the project!

Thanks, Naveen
(on behalf of Apache MXNet PPMC)


Re: [Proposal] New operator graph for MXNet

2019-05-15 Thread Naveen Swamy
Being dismissive and condescending is exactly what has been plaguing this
project.

I agree the last paragraph sounds very condescending and very dismissive,
and it breaks many of the codes of conduct listed.

On Wed, May 15, 2019 at 11:31 AM Anirudh Subramanian 
wrote:

> Hi Junru,
>
> Overall, I appreciate the points you made about the proposal.
>
> Having said that, I would like to remind the Apache Code of Conduct :
> https://www.apache.org/foundation/policies/conduct.
> "Be empathetic, welcoming, friendly and patient".
>
> I find your tone condescending. Clearly you understand what he meant from
> the context whether you prefer to call IR in compilers or data-flow in
> distributed systems. You could very well say lets use this terminology to
> have a common understanding instead of saying go learn the basic concepts.
> Before building a cool brand, its important to build a healthy community.
>
> Anirudh
>
>
> On Wed, May 15, 2019 at 12:03 AM Junru Shao 
> wrote:
>
> > Hi Pedro,
> >
> > I really appreciate that a diligent and talented engineer eagerly wants
> > to improve our system, and I am very thankful that you have done so much
> > for our community. However, I do want to mention some points that I
> > believe need to be mentioned.
> >
> > While I agree with Tianqi that every design has its pros and cons, I
> would
> > love to emphasize that a *good taste* of system design is to optimize the
> > bottleneck, enhance expressiveness (and usability), i.e. to do what needs
> > doing, rather than *trivial nits* that are irrelevant to either
> performance
> > or expressiveness. Generally speaking, typed or untyped, shared_ptr or
> > unique_ptr, won't affect the overall performance when it comes to deep
> > learning workload, specially when we have an async scheduler that does
> good
> > latency hiding in MXNet - to me, these are not major issues that are
> worth
> > re-designing our entire system.
> >
> > To benefit users - real-world ML practitioners - the main thing I would
> > love to mention is that dataflow graph-based representation is
> > increasingly incapable of expressing modern neural networks, because of
> > increasingly common structures like arbitrary control flow (w/ continue,
> > break, etc.), recursion, type conjunction and disjunction, etc. These
> > issues will be our priority to address, and they are addressed by Relay.
> >
> > Another minor thing I would love to humbly mention is that, for the sake
> > of our brand, it is our responsibility to be professional about
> > terminology when
> > writing an official proposal on Confluence. As one of the numerous
> > examples, the title of the proposal really shocked me for a while;
> > something like "operators graph" is just so weird. Educate me if I am
> > wrong, but
> > compiler community would prefer the term "intermediate representation",
> and
> > distributed system community would prefer "dataflow graph". If you don't
> > have knowledge in these fields, a better way to communicate efficiently
> > is to first familiarize yourself with the most basic concepts and then
> > discuss. This is a way to save your own valuable time as well.
> >
> > Again, thank you so much for your hard work, and hope that we could work
> > together to win customers in the future :-)
> >
> > Thanks,
> > Junru
> >
> >
> > On Tue, May 14, 2019 at 8:03 PM Tianqi Chen 
> > wrote:
> >
> > > The core part of the proposal is to move the graph to a much more
> > > strongly typed template class.
> > > I think this is mainly a point of engineering taste, and both sides
> have
> > > pros and cons, let me list them before I share my thoughts on this
> issue:
> > >
> > > - Typed fields certainly enjoy more compile-time type checking; on the
> > > other hand, it is hard to expose templates with an explosive number of
> > > possibilities to frontend languages.
> > > - More type-erased fields provide runtime flexibility to store
> > polymorphic
> > > types as well as extensible attributes for graph optimization
> > >   - It is hard to use a virtual class to expose every possible
> attribute
> > > that an operator might have, such as inlining, storage pattern,
> gradient
> > > etc..
> > >   - The nature of supporting a growing set of operator attributes
> > > requires a type-erased attrs field.
> > > - In contrast to your argument (typing is a blocker to features),
> > > type-erased and typed code can both get to the same feature, except
> > > that typed code gets more compile-time errors while type-erased code
> > > gets some of them at runtime.
> > > them in runtime.
> > > - Templatized data structures will likely introduce additional mental
> > > burdens to developers and are not really suitable as a core data
> > > structure
> > >   - Because they imply an explosive number of possible data structures,
> > > while the core data structure should be a single one.
> > >
> > > Now my view (as an MXNet PMC member) on typed vs. type-erased style: If
> > MXNet
> > > 

Re: [Proposal] MXNet operator benchmark library

2019-05-13 Thread Naveen Swamy
Sandeep,

Thanks for initiating work on individual operator performance. However, I
find the proposed approach (i.e., a separate library/framework)
unnecessary, and it increases maintenance overhead for the project.
Also, have you considered alternate approaches to achieve the same goal?

Many of the requirements/motivations you have mentioned should typically be
covered in unit tests (different data types, different dimensions). So,
instead of having to rewrite performance measurement for all operators,
consider writing a @timeit routine (using Python decorators) that can be
called on individual unit tests. Also, even if you call the performance
script from Python, you typically want to measure as close to the kernel as
possible and avoid any other variables.
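The decorator approach suggested above might look like this (a sketch; the @timeit name and the reporting format are illustrative, not an existing MXNet utility, and a real operator test would also need to synchronize, e.g. via wait_to_read(), so async execution doesn't hide the cost):

```python
import functools
import time

def timeit(repeat=5):
    """Decorator: report the average wall-clock time of a test function."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            for _ in range(repeat):
                result = fn(*args, **kwargs)
            avg = (time.perf_counter() - start) / repeat
            print(f"{fn.__name__}: {avg * 1e6:.1f} us/run over {repeat} runs")
            return result
        return wrapper
    return decorator

@timeit(repeat=3)
def test_sum_operator():
    # Stand-in for an operator unit test; the test's own assertions
    # would run unchanged inside the timed body.
    return sum(range(1000))
```

Calling test_sum_operator() runs the existing test logic and prints a timing line, so the same unit tests double as per-operator benchmarks.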

I left some comments on the doc itself.

Happy to discuss further.

-Naveen


On Mon, Apr 29, 2019 at 1:57 PM sandeep krishnamurthy <
sandeep.krishn...@gmail.com> wrote:

> Hello Community,
>
> I am currently working on building a utility/library to help us easily do
> individual operator benchmarking in MXNet. I have documented the proposal
> in
> this cwiki
> <
> https://cwiki.apache.org/confluence/display/MXNET/MXNet+Operator+Benchmarks
> >,
> and staging the current development in this github repository
> . Proposal
> is to get this library under incubator-mxnet/benchmark/
> . Please
> do review and provide your feedback and suggestions.
>
> Thanks to fellow MXNet community members - Lin, Sam, Rohit for providing
> initial ideas and suggestion.
>
> Best,
> Sandeep
>
>
>
>
> --
> Sandeep Krishnamurthy
>


Re: Unable to comment on GitHub issue

2019-05-10 Thread Naveen Swamy
Everything is in place; I don't have any problems on other issues and PRs,
except for a few of them.

Certainly, Apache Infra should override the setting on Apache projects to
avoid abuse of the feature.

> On Fri, May 10, 2019 at 3:42 AM Marco de Abreu  
> wrote:
> Do you have 2 factor authentication enabled? Apache requires committers to
> have it enabled or they automatically revoke all permissions.
> 
> Also, check that https://gitbox.apache.org/setup/ is all green and MXNet is
> listed as repository in the bottom half.
> 
> -Marco
> 
> Sheng Zha wrote on Fri., 10 May 2019, 02:47:
> 
> > Locking a conversation wouldn't limit a committer from commenting. "While
> > a conversation is locked, only people with write access and repository
> > owners and collaborators can add comments." [1]
> >
> > Unless the apache organization has the blocking setting, blocking by a
> > person shouldn't limit one from commenting on issues in mxnet repo either.
> > The organization that owns the repo needs to explicitly block the person to
> > be able to prevent one from commenting on an issue in the repo of that
> > organization. [2]
> >
> > -sz
> >
> > [1] https://help.github.com/en/articles/locking-conversations
> > [2]
> > https://help.github.com/en/articles/blocking-a-user-from-your-personal-account
> >
> > On 2019/05/09 23:33:00, Aaron Markham  wrote:
> > > I just locked one of the issues I created:
> > > https://github.com/apache/incubator-mxnet/issues/14918
> > > Are you sure you don't have the unlock button on the right side?
> > > You should see this:
> > >
> > > aaronmarkham locked as off topic and limited conversation to
> > > collaborators 24 seconds from now
> > >
> > > Then to the right of that:
> > >
> > >  Unlock conversation
> > >  Pin issue
> > >
> > > On Thu, May 9, 2019 at 4:27 PM Naveen Swamy  wrote:
> > > >
> > > > I don't see the option. Another possible explanation is that someone
> > > > blocked me; if that is the case, it goes against the ethos of open
> > > > source. Apache Infra should override that setting for Apache
> > > > projects. Anyway, I created this Jira:
> > > >
> > https://issues.apache.org/jira/plugins/servlet/mobile#issue/INFRA-18356
> > > >
> > > > -Naveen
> > > >
> > > > > On May 9, 2019, at 4:19 PM, Aaron Markham 
> > wrote:
> > > > >
> > > > > A new feature:
> > https://help.github.com/en/articles/locking-conversations
> > > > > So someone must have locked it. I can see the option on the right
> > hand
> > > > > side column, all the way at the bottom. You will probably have the
> > > > > ability to unlock it from there too.
> > > > >
> > > > >> On Thu, May 9, 2019 at 3:42 PM Chaitanya Bapat <
> > chai.ba...@gmail.com> wrote:
> > > > >>
> > > > >> Any specific issues you could give the links to? So I could verify
> > if
> > > > >> that's the case with me.
> > > > >>
> > > > >>> On Thu, 9 May 2019 at 14:44, Naveen Swamy 
> > wrote:
> > > > >>>
> > > > >>> I am unable to comment on certain GitHub issues and see a locked
> > Icon,
> > > > >>> wondering if anyone has experienced this and know why?
> > > > >>>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> *Chaitanya Prakash Bapat*
> > > > >> *+1 (973) 953-6299*
> > > > >>
> > >
> >


Re: Unable to comment on GitHub issue

2019-05-09 Thread Naveen Swamy
I don't see the option. Another possible explanation is that someone
blocked me; if that is the case, it goes against the ethos of open source.
Apache Infra should override that setting for Apache projects. Anyway, I
created this Jira:
https://issues.apache.org/jira/plugins/servlet/mobile#issue/INFRA-18356

-Naveen

> On May 9, 2019, at 4:19 PM, Aaron Markham  wrote:
> 
> A new feature: https://help.github.com/en/articles/locking-conversations
> So someone must have locked it. I can see the option on the right hand
> side column, all the way at the bottom. You will probably have the
> ability to unlock it from there too.
> 
>> On Thu, May 9, 2019 at 3:42 PM Chaitanya Bapat  wrote:
>> 
>> Any specific issues you could give the links to? So I could verify if
>> that's the case with me.
>> 
>>> On Thu, 9 May 2019 at 14:44, Naveen Swamy  wrote:
>>> 
>>> I am unable to comment on certain GitHub issues and see a locked Icon,
>>> wondering if anyone has experienced this and know why?
>>> 
>> 
>> 
>> --
>> *Chaitanya Prakash Bapat*
>> *+1 (973) 953-6299*
>> 


Unable to comment on GitHub issue

2019-05-09 Thread Naveen Swamy
I am unable to comment on certain GitHub issues and see a locked Icon,
wondering if anyone has experienced this and know why?


Re: Requesting slack access

2019-03-30 Thread Naveen Swamy
done. Welcome to Apache MXNet.

On Sat, Mar 30, 2019 at 2:44 PM Luyang Wang  wrote:

>
>


Re: Gluon fit API- Design proposal

2019-03-05 Thread Naveen Swamy
FYI, I have created a branch on the repo to facilitate multiple
collaborators for this feature:
https://github.com/apache/incubator-mxnet/tree/fit-api. They'll create PRs
against this branch, and once the API is feature-complete, I will rebase
and merge it to master to preserve commit history.

On Sun, Feb 10, 2019 at 2:43 PM Hagay Lupesko  wrote:

> Wanted to chime in as well.
> I have reviewed the design shared in the mail offline with Ankit, Lai and
> Naveen (we work in the same team in Amazon).
>
> I think it does a good job of simplifying many low-complexity training use
> cases, which can make MXNet/Gluon even more friendly to so-called "deep
> learning beginners" - so +1 on the proposal!
>
> Hagay
>
> On Fri, Feb 8, 2019 at 10:30 AM Naveen Swamy  wrote:
>
> > Hi Alfredo,
> > Thanks for your comments; I really like all your suggestions. Here are my
> > answers; let me know if they make sense or if you have comments.
> >
> > 1) The fit API targets novice users, covering about 80% of the use
> > cases listed in the document. For advanced users and complex models, we
> > (Naveen, Ankit and Lai) felt it's best to use the existing mechanisms,
> > due to their imperative nature and the greater control they give, so we
> > did not duplicate the save/load functionality of HybridBlock.
> > We'll consider extending that functionality to the Estimator.
> > I have had trouble using the pickle package, which is commonly used for
> > serialization and deserialization; if you have any other suggestions
> > from your experience, please let us know.
> >
> > 2) +1, we’ll add this to our backlog and add it in our next iteration.
> >
> > 3) Can you expand a little more on this, how it helps in a production
> > environment (which this API was not targeted at)?
> > I’ll check the TF Estimator to understand further.
> >
> > Thanks, Naveen
> >
> >
> > On Thu, Feb 7, 2019 at 2:32 PM Alfredo Luque
> >  wrote:
> >
> > > This is great and something we should all be able to benefit from.
> > >
> > > There are just three pieces I’d like to advocate for that I feel are
> > > shortcomings of some competing APIs on other frameworks (eg; TF
> > Estimators)
> > > and I would love to see in this proposal:
> > >
> > > 1) Make serialization/deserialization of these classifiers/regressors
> > easy
> > > or at least ensure the internal members of the wrapper are easy to
> > > save/load. We’ve hacked around this by only allowing hybrid blocks
> which
> > > have easy save/load functionality, but having a simple
> > > “save_model”/“load_model” function as a 1st class citizen of these
> > proposed
> > > APIs will lead to a vastly improved user experience down the road.
> > >
> > > 2) Allowing the fit/predict/predict_proba functions to take in both
> data
> > > loaders and simple numpy arrays and pandas dataframes is a simple
> change
> > > but a huge usability improvement. Power users and library authors will
> > > appreciate being able to use custom data loaders but a large portion of
> > end
> > > users want to just pass an ndarray or data frame and get some results
> > > quickly.
> > >
> > > 3) Allow lazy construction of the model. This is something I feel TF
> > > Estimators do well: by allowing the user to pass a function that
> > constructs
> > > the net (i.e a model_fn that returns the net) rather than the net
> itself
> > it
> > > allows for more control at runtime and usage of these APIs in a
> > production
> > > environment.
> > >
> > > Would love your thoughts on these three changes/additions.
> > >
> > > —Alfredo Luque
> > > Software Engineer
> > > Machine Learning Infrastructure
> > > Airbnb
> > > San Francisco, CA
> > >
> > > On February 7, 2019 at 1:51:17 PM, Ankit Khedia (
> khedia.an...@gmail.com)
> > > wrote:
> > >
> > > Hello dev@,
> > >
> > > Training a model in Gluon requires users to write the training loop,
> this
> > > is useful because of its imperative nature, however repeating the same
> > code
> > > across multiple models can become tedious and repetitive with
> boilerplate
> > > code. The training loop can also be overwhelming to some users new to
> > deep
> > > learning. Users have asked in [1] for a simple Fit API, similar to APIs
> > > available in SKLearn and Keras as a way to simplify model training and
> > > reduce boilerplate code and

Re: [DISCUSS] Process to remove deprecated operators

2019-03-01 Thread Naveen Swamy
It might be a good idea to issue those deprecation messages right away if we
have replacement operators already, so users get enough time to migrate to
the new operators.
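A minimal sketch of what such an early deprecation message could look like in Python (the `deprecated` helper and the operator names below are hypothetical illustrations, not actual MXNet code):

```python
import functools
import warnings

def deprecated(replacement, removal_release):
    """Mark an operator as deprecated, pointing users at its replacement."""
    def decorator(op):
        @functools.wraps(op)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"Operator '{op.__name__}' is deprecated and will be removed "
                f"in release {removal_release}. Please use '{replacement}' "
                f"instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            return op(*args, **kwargs)
        return wrapper
    return decorator

@deprecated(replacement="slice", removal_release="2.0")
def crop(data, begin, end):
    # Stand-in body for a hypothetical legacy operator.
    return data[begin:end]
```

Calling `crop` then emits a `DeprecationWarning` naming the replacement, which gives users the migration window discussed above.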

On Thu, Feb 28, 2019 at 2:45 PM Anirudh Acharya 
wrote:

> Hi Lin,
>
> This is a good idea. Here is an issue -
> https://github.com/apache/incubator-mxnet/issues/9686 that is already
> attempting to collate all the breaking changes that might be necessary for
> v2.0. We could start by adding things to that issue.
>
> I think eventually we will need a separate branch into which these breaking
> changes get introduced, and this branch can later be merged into master
> prior to v2.0 release.
>
> Thanks
> Anirudh
>
>
> On Thu, Feb 28, 2019 at 1:35 PM Wen-Yang Chu 
> wrote:
>
> > Hi,
> >
> > I have raised an issue:
> >
> > mx.nd.Crop does not support FP16 and is deprecated, but there is no direct
> > alternative with central crop.
> > I use this operator to implement U-Net, and I found other people do too on
> > the Internet. It is very inconvenient for me to remove this specific
> > operator without a clear alternative:
> >
> > https://github.com/apache/incubator-mxnet/issues/13750
> >
> > *Is it possible to review deprecated operators to make sure we have
> > equivalent functionality?*
> > Thanks
> >
> > Wen-Yang
> >
> > On Thu, Feb 28, 2019 at 2:07 PM Chaitanya Bapat 
> > wrote:
> >
> > > This sounds good.
> > > Going further, if we can maintain a list of deprecated operators - we
> can
> > > create a "Good for first contribution" issue to improve log messaging
> of
> > > Deprecated operators.
> > > If it makes sense, I can go ahead and create that.
> > >
> > > Hope this helps.
> > >
> > > On Thu, 28 Feb 2019 at 01:54, Lin Yuan  wrote:
> > >
> > > > Agreed. When we deprecate an operator, we should add in the log
> message
> > > > something like "This operator X is deprecated and will be removed in
> the
> > > > next release. Please use operator Y instead."
> > > >
> > > > Lin
> > > >
> > > > On Wed, Feb 27, 2019 at 10:23 PM Junru Shao  >
> > > > wrote:
> > > >
> > > > > Hi Lin,
> > > > >
> > > > > I would love to share some immature ideas about deprecating
> > operators.
> > > > Not
> > > > > only adopting semantic versioning, but also should we provide
> enough
> > > > > informative error message for customers to understand how to
> replace
> > > > > deprecated operators with new ones.
> > > > >
> > > > > Thanks,
> > > > > Junru
> > > > >
> > > > > On Wed, Feb 27, 2019 at 9:30 PM Lin Yuan 
> > wrote:
> > > > >
> > > > > > Sheng,
> > > > > >
> > > > > > Thanks for your quick response.
> > > > > > If that's the case, we will wait till 2.0 release to remove the
> > > > > deprecated
> > > > > > operators from code.
> > > > > >
> > > > > > Best,
> > > > > > Lin
> > > > > >
> > > > > > On Wed, Feb 27, 2019 at 9:06 PM Sheng Zha 
> > > wrote:
> > > > > >
> > > > > > > MXNet follows semantic versioning so we will be able to delete
> > them
> > > > in
> > > > > > the
> > > > > > > next major release.
> > > > > > >
> > > > > > > -sz
> > > > > > >
> > > > > > > On Wed, Feb 27, 2019 at 8:53 PM Lin Yuan 
> > > > wrote:
> > > > > > >
> > > > > > > > Dear Community,
> > > > > > > >
> > > > > > > > In MXNet there are many legacy operators such as this
> > > > > > > > <
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://mxnet.incubator.apache.org/versions/master/api/python/symbol/symbol.html?highlight=convolution_v1#mxnet.symbol.Convolution_v1
> > > > > > > > >
> > > > > > > > that has been marked DEPRECATE for several releases. However,
> > > these
> > > > > > > > operators still exist in our code. This caused a few
> problems:
> > > > > > > >
> > > > > > > > 1) Make the codebase bloated and reduce readability
> > > > > > > > 2) Increase unnecessary maintenance effort
> > > > > > > > 3) Bug prone as some people will look up these legacy code as
> > > > example
> > > > > > > > 4) Cause confusion to end users and make documentation page
> > > lengthy
> > > > > > > >
> > > > > > > > I would like to propose the following process (if there is no
> > > > > existing
> > > > > > > one)
> > > > > > > > to remove deprecated operators from our code base.
> > > > > > > >
> > > > > > > > 1. Document the deprecated operators/environment variables in
> > the
> > > > > > release
> > > > > > > > note as well as man pages.
> > > > > > > > 2. Limit the life cycle of deprecated operators/arguments to
> two
> > > > minor
> > > > > > > > release. For example, if one operator is marked deprecated in
> > 1.4
> > > > > > release,
> > > > > > > > it will be removed in 1.6 release.
> > > > > > > > 3. If there is some concern raised from customers during 1.4
> > and
> > > > 1.5
> > > > > > > > release, we can convert the deprecated operator back to
> current
> > > and
> > > > > it
> > > > > > > will
> > > > > > > > be treated as new operator.
> > > > > > > > 4. PRs that remove deprecated operators should contain
> [Cleanup]
> > > in
> > > > > > title.
> > > > > > > >
> > > > > 

MXNet Meetups

2019-02-25 Thread Naveen Swamy
A user-driven meetup in Copenhagen on April 4th, if any of you are in the
area. Check it out:

a) Introduction to Gluon NLP & CV and ONNX
b) MXNet to ONNX to ML.NET [1]

Thanks Cosmin for your passion for MXNet, and for organizing and leading the meetup
group.

Another Meetup today in Palo Alto,
Deep Learning With MXNET On Video: Ring & Xbox video demos [2]

[1] https://www.meetup.com/meetup-group-bdEUVQHL/events/258352270/
[2] https://www.meetup.com/deep-learning-with-mxnet/events/258901722/ (This
is organized by Marcelo and others.)


Re: Help with the Clojure release jars

2019-02-24 Thread Naveen Swamy
Since we release and vote on source, and these were not a part of the voting,
I think it's OK to do it independently, i.e. proceed with cleanup followed by
a re-release, so it won't impact Clojure users.

> On Feb 24, 2019, at 3:46 PM, Carin Meier  wrote:
> 
> Yes. It looks a bit confusing because I have a copy for the jars both in
> staging and in releases since after I hit release, I created them again
> while my brain was trying to figure out what went wrong.
> 
> My understanding after talking with #asfinfra is that we would have to:
> 1) Create an infra ticket to delete it from repository.apache.org
> 2) Create a ticket with issues.sonatype.org to delete it from central
> 
> We can either
> -  Wait and let them be until the vote concludes, if the vote doesn't pass
> I can make the tickets to delete them.
> - Create the tickets and delete them now.
> 
> Let me know if this is what we want to do and I will kick off the process.
> 
> -Carin
> 
>> On Sun, Feb 24, 2019 at 6:39 PM Naveen Swamy  wrote:
>> 
>> Did you close the repo? In my understanding, the repo(*.Apache.org) is
>> under infra's control, so we probably have to create an infra ticket; last
>> time I wasn't able to delete (luckily it was in staging, so I abandoned and
>> released to staging again.)
>> 
>>> On Feb 24, 2019, at 2:36 PM, Carin Meier  wrote:
>>> 
>>> From the #infra channel, the options to fix seem to be that it can be
>>> deleted from the repository.apache.org, but we'd need to talk to
>> sonatype
>>> about getting it removed.
>>> 
>>> I'm not exactly sure where we are in the voting process for release, so
>>> please let me know how everyone would like to proceed.
>>> 
>>> Sorry again and I'll take steps to make sure that it won't happen again.
>>> 
>>> Best,
>>> Carin
>>> 
>>>> On Sun, Feb 24, 2019 at 5:19 PM Carin Meier 
>> wrote:
>>>> 
>>>> It appears I did accidentally release them :(
>>>> 
>>>> I'm chatting with Infra in the slack room to see if there are any fixes.
>>>> If anyone has any other ideas please let me know.
>>>> 
>>>>> On Sun, Feb 24, 2019 at 4:35 PM Carin Meier 
>> wrote:
>>>>> 
>>>>> I was wondering if someone could help me verify if I accidentally
>>>>> released the Clojure jars.
>>>>> 
>>>>> Background:
>>>>> I was creating and testing the Clojure jars for staging according to my
>>>>> instructions[1].
>>>>> I hit the close button on the repository but I didn't see it updated at
>>>>> the link
>>>>> 
>> https://repository.apache.org/content/repositories/staging/org/apache/mxnet/contrib/clojure/clojure-mxnet-linux-cpu/1.4.0/
>>>>> - so I hit the release button.
>>>>> 
>>>>> Last time I did a release, I remember I had to explicitly hit the
>>>>> "promote" button to get it promoted to maven central at the release
>> time,
>>>>> but now I'm not sure.
>>>>> 
>>>>> So could someone please help me:
>>>>> 1) Figure out if I accidentally released it by mistake
>>>>> - there are 3 jars: The clojure linux-cpu, linux-gpu, and osx-cpu
>>>>> 2) If I did, please tell me how/if I can fix it (and sorry of course)
>>>>> 3) Please let me know any corrections so I that I can update my process
>>>>> instructions and make sure it doesn't happen again.
>>>>> 
>>>>> Thanks,
>>>>> Carin
>>>>> 
>>>> 
>> 


Re: Help with the Clojure release jars

2019-02-24 Thread Naveen Swamy
Did you close the repo? In my understanding, the repo (*.apache.org) is under
infra's control, so we probably have to create an infra ticket. Last time I
wasn't able to delete (luckily it was in staging, so I abandoned and released
to staging again).

> On Feb 24, 2019, at 2:36 PM, Carin Meier  wrote:
> 
> From the #infra channel, the options to fix seem to be that it can be
> deleted from the repository.apache.org, but we'd need to talk to sonatype
> about getting it removed.
> 
> I'm not exactly sure where we are in the voting process for release, so
> please let me know how everyone would like to proceed.
> 
> Sorry again and I'll take steps to make sure that it won't happen again.
> 
> Best,
> Carin
> 
>> On Sun, Feb 24, 2019 at 5:19 PM Carin Meier  wrote:
>> 
>> It appears I did accidentally release them :(
>> 
>> I'm chatting with Infra in the slack room to see if there are any fixes.
>> If anyone has any other ideas please let me know.
>> 
>>> On Sun, Feb 24, 2019 at 4:35 PM Carin Meier  wrote:
>>> 
>>> I was wondering if someone could help me verify if I accidentally
>>> released the Clojure jars.
>>> 
>>> Background:
>>> I was creating and testing the Clojure jars for staging according to my
>>> instructions[1].
>>> I hit the close button on the repository but I didn't see it updated at
>>> the link
>>> https://repository.apache.org/content/repositories/staging/org/apache/mxnet/contrib/clojure/clojure-mxnet-linux-cpu/1.4.0/
>>> - so I hit the release button.
>>> 
>>> Last time I did a release, I remember I had to explicitly hit the
>>> "promote" button to get it promoted to maven central at the release time,
>>> but now I'm not sure.
>>> 
>>> So could someone please help me:
>>> 1) Figure out if I accidentally released it by mistake
>>>  - there are 3 jars: The clojure linux-cpu, linux-gpu, and osx-cpu
>>> 2) If I did, please tell me how/if I can fix it (and sorry of course)
>>> 3) Please let me know any corrections so I that I can update my process
>>> instructions and make sure it doesn't happen again.
>>> 
>>> Thanks,
>>> Carin
>>> 
>> 


Re: Wiki Access for Kedar Bellare

2019-02-24 Thread Naveen Swamy
I added Kedar, and I updated your ID to have admin access.

On Sun, Feb 24, 2019 at 10:43 AM Carin Meier  wrote:

> Can someone please help Kedar get write access to our wiki? I don't have
> that access. He is taking the lead on the next generation of our Clojure
> package NDArray and Symbol APIs [1] and would like to collaborate on some
> design documentation.
>
> Thanks!
> Carin
>
> [1] https://github.com/apache/incubator-mxnet/pull/14195
>


Re: Gluon fit API- Design proposal

2019-02-08 Thread Naveen Swamy
Hi Alfredo,
Thanks for your comments, I really like all your suggestions. Here are my
answers let me know if it makes sense or have comments.

1) The fit API is targeting novice users, covering about 80% of the use
cases listed in the document. For advanced users and complex models, we
(Naveen, Ankit and Lai) felt it's best to use the existing mechanisms, due to
their imperative nature and the greater control they give, so we did not
duplicate the save/load functionality in the HybridBlock.
We’ll consider extending that functionality to Estimator.
I have had trouble using the pickle package, which is commonly used for
serialization and deserialization; if you have any other suggestions from
your experience, please let us know.

2) +1, we’ll add this to our backlog and add it in our next iteration.

3) Can you expand a little more on this and how it helps in a production
environment (which this API was not targeted at)?
I’ll check the TF Estimator to understand further.

Thanks, Naveen
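For readers skimming the thread, here is a rough sketch of the Keras-style shape being proposed. All names are illustrative stand-ins, not the final MXNet signatures, and a toy one-parameter model replaces the Gluon net so the sketch runs without MXNet:

```python
class Estimator:
    """Minimal sketch of a Keras-style fit API wrapping a training loop.

    The real proposal wraps a Gluon net, loss, and Trainer; here a plain
    linear model y = w * x with a manual SGD step stands in, so the sketch
    is self-contained.
    """

    def __init__(self, weight=0.0, lr=0.1):
        self.weight = weight  # single parameter of the toy model
        self.lr = lr

    def fit(self, train_data, epochs=10):
        """train_data: iterable of (x, y) pairs, re-iterable across epochs."""
        for _ in range(epochs):
            for x, y in train_data:
                # squared-error loss: d/dw (w*x - y)^2 = 2 * (w*x - y) * x
                grad = 2.0 * (self.weight * x - y) * x
                self.weight -= self.lr * grad
        return self

    def predict(self, xs):
        return [self.weight * x for x in xs]

# The boilerplate the fit API removes: the nested loop above is written
# once, inside fit(), instead of in every training script.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # samples from y = 2x
est = Estimator().fit(data, epochs=50)
```

The appeal of the design is visible even in this toy: the user supplies data and hyperparameters, and `fit` owns the epoch/batch loop.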


On Thu, Feb 7, 2019 at 2:32 PM Alfredo Luque
 wrote:

> This is great and something we should all be able to benefit from.
>
> There are just three pieces I’d like to advocate for that I feel are
> shortcomings of some competing APIs on other frameworks (eg; TF Estimators)
> and I would love to see in this proposal:
>
> 1) Make serialization/deserialization of these classifiers/regressors easy
> or at least ensure the internal members of the wrapper are easy to
> save/load. We’ve hacked around this by only allowing hybrid blocks which
> have easy save/load functionality, but having a simple
> “save_model”/“load_model” function as a 1st class citizen of these proposed
> APIs will lead to a vastly improved user experience down the road.
>
> 2) Allowing the fit/predict/predict_proba functions to take in both data
> loaders and simple numpy arrays and pandas dataframes is a simple change
> but a huge usability improvement. Power users and library authors will
> appreciate being able to use custom data loaders but a large portion of end
> users want to just pass an ndarray or data frame and get some results
> quickly.
>
> 3) Allow lazy construction of the model. This is something I feel TF
> Estimators do well: by allowing the user to pass a function that constructs
> the net (i.e a model_fn that returns the net) rather than the net itself it
> allows for more control at runtime and usage of these APIs in a production
> environment.
>
> Would love your thoughts on these three changes/additions.
>
> —Alfredo Luque
> Software Engineer
> Machine Learning Infrastructure
> Airbnb
> San Francisco, CA
>
> On February 7, 2019 at 1:51:17 PM, Ankit Khedia (khedia.an...@gmail.com)
> wrote:
>
> Hello dev@,
>
> Training a model in Gluon requires users to write the training loop. This
> is useful because of its imperative nature; however, repeating the same code
> across multiple models can become tedious and repetitive with boilerplate
> code. The training loop can also be overwhelming to some users new to deep
> learning. Users have asked in [1] for a simple Fit API, similar to APIs
> available in SKLearn and Keras as a way to simplify model training and
> reduce boilerplate code and complexity.
>
> So, I along with other contributor Naveen and Lai came up with a fit API
> proposal in [2] that covers 80% of the use-cases for beginners, the fit API
> does not replace the gluon training loops. The API proposal is inspired by
> the Keras fit API. I have discussed and got feedback from a few MXNet
> contributors (Sheng, Mu, Aston, Zhi) close by and I am writing to ask for
> the community’s feedback on the API proposal.
>
>
>
> [1]
> https://discuss.mxnet.io/t/wrapping-gluon-into-scikit-learn-like-api/2112
> [2]
>
> https://cwiki.apache.org/confluence/display/MXNET/Gluon+Fit+API+-+Tech+Design
>
>
> Thanks,
> Ankit
>
>
> —
> Alfredo Luque
> Software Engineer
> Machine Learning Infrastructure
> Airbnb
> San Francisco, CA
>


Re: request for approval to post

2019-01-22 Thread Naveen Swamy
Welcome to Apache MXNet, added permissions to your handle.


On Tue, Jan 22, 2019 at 2:25 PM Gasparakis, Harris <
harris.gaspara...@amd.com> wrote:

> Hello!
>
> Please allow me to introduce myself, I'm currently AMD's lead/responsible
> party for our MXNet port on AMD's ROCM open source platform. I'm also one
> of ROCM's architects, and also lead AMD's port of CNTK (soon to become
> public) and also TF in the earlier days. In the past I architected and led
> the OpenCL implementation of OpenCV.
>
> I'm very excited about participating in the development of mxnet! As I'm
> still getting my feet wet, I was trying to post some comments in the cwiki
> that my team started
>
>
> https://cwiki.apache.org/confluence/display/MXNET/Upstreaming+of+Mxnet+HIP+port
>
> but with no luck.  Presumably I'm missing permissions?
>
> My handle is harris.gasparakis
> And email is harris.gaspara...@amd.com
>
> Thanks! Appreciate your help!
>


Re: Jira board and Confluence page for Julia

2019-01-15 Thread Naveen Swamy
No, one of us had to add your user ID and assign appropriate permissions,
which I did. You have the same permissions as I do; feel free to experiment
with it.

-Naveen

On Mon, Jan 14, 2019 at 6:04 PM iblis  wrote:

> My Confluence wiki account:
> https://cwiki.apache.org/confluence/users/viewuserprofile.action?username=iblis
> And Jira account:
> https://issues.apache.org/jira/secure/ViewProfile.jspa?name=iblis
>
> Both of these accounts predate my apache account.
> (Are there any processes of linking them?)
>
> Iblis Lin
> 林峻頤
>
> On 1/15/19 2:55 AM, Naveen Swamy wrote:
> > Ibis, Confluence needs its own user name/password. Once you have let us
> > know, one of us should be able to grant access.
> >
> > For JIRA, I am not an expert at it, I think it uses Apache Login, i'll
> find
> > you and add appropriate permissions, I was able to create a board for
> > Scala, the board also needs to be able to share with others that part is
> > tricky, I'll try to find and let you know.
> >
> > -Naveen
> >
> > On Mon, Jan 14, 2019 at 9:18 AM Carin Meier 
> wrote:
> >
> >> Iblis,
> >>
> >> Thanks for taking the lead in doing this. Unfortunately, I can't help
> you
> >> with JIRA  - but maybe someone else can.
> >>
> >> I can help you with the wiki. The Clojure package has done something
> >> similar here
> >>
> https://cwiki.apache.org/confluence/display/MXNET/Clojure+Release+Process.
> >>
> >> Once you are logged on you can go here:
> >> https://cwiki.apache.org/confluence/display/MXNET/ and hit "Create
> Page"
> >> on
> >> top. It will create a page in the tree structure it is currently in - so
> >> you can make "Julia Package" - After that you can create sub pages of
> the
> >> Julia package page.
> >>
> >> You should also be able to edit any other page as well. You can also
> "move"
> >> a page to another spot as well. Under the "..." on the right side,
> there is
> >> an option.
> >>
> >> Let me know if you you need any other help,
> >>
> >> Carin
> >>
> >> On Sun, Jan 13, 2019 at 9:03 PM iblis  wrote:
> >>
> >>> Hi,
> >>>
> >>> I want to create some issues for Julia on the Jira board.
> >>> Could we have a category for the Julia package?
> >>> Also, I want to write down something like a release process for Julia;
> >>> maybe the Confluence wiki is the right place.
> >>> How do I create a new page on this wiki?
> >>>
> >>> --
> >>> Iblis Lin
> >>> 林峻頤
> >>>
> >>
> >
>


Re: [DISCUSS] Make MKLDNN as a default on Maven nightly build

2019-01-14 Thread Naveen Swamy
Thanks for bringing this up.
I think it would be less confusing for users if we kept it the same across
language bindings. What needs to be done for Python, Java, Clojure, etc.?
If they are not getting this default, is there a reason?

On Mon, Jan 14, 2019 at 10:56 AM Qing Lan  wrote:

> Hi all,
>
> I would like to raise a discussion on whether to make MKLDNN as a default
> in nightly build (1.5.0-SNAPSHOT) for MXNet Scala/Java binding. Currently
> Scala build with MKLDNN is supported since
> https://github.com/apache/incubator-mxnet/pull/13819 with CI. I do see
> the performance increase when dealing with the inference and it is also
> necessary to get it in nightly for beta-testing in order to make it
> official in 1.5.0.
>
> Thanks,
> Qing
>


Re: Jira board and Confluence page for Julia

2019-01-14 Thread Naveen Swamy
Iblis, Confluence needs its own username/password. Once you have let us
know, one of us should be able to grant access.

For JIRA, I am not an expert at it. I think it uses the Apache login; I'll
find you and add appropriate permissions. I was able to create a board for
Scala. The board also needs to be shareable with others; that part is
tricky, so I'll try to figure it out and let you know.

-Naveen

On Mon, Jan 14, 2019 at 9:18 AM Carin Meier  wrote:

> Iblis,
>
> Thanks for taking the lead in doing this. Unfortunately, I can't help you
> with JIRA  - but maybe someone else can.
>
> I can help you with the wiki. The Clojure package has done something
> similar here
> https://cwiki.apache.org/confluence/display/MXNET/Clojure+Release+Process.
>
> Once you are logged on you can go here:
> https://cwiki.apache.org/confluence/display/MXNET/ and hit "Create Page"
> on
> top. It will create a page in the tree structure it is currently in - so
> you can make "Julia Package" - After that you can create sub pages of the
> Julia package page.
>
> You should also be able to edit any other page as well. You can also "move"
> a page to another spot as well. Under the "..." on the right side, there is
> an option.
>
> Let me know if you you need any other help,
>
> Carin
>
> On Sun, Jan 13, 2019 at 9:03 PM iblis  wrote:
>
> > Hi,
> >
> > I want to create some issues for Julia on the Jira board.
> > Could we have a category for the Julia package?
> > Also, I want to write down something like a release process for Julia;
> > maybe the Confluence wiki is the right place.
> > How do I create a new page on this wiki?
> >
> > --
> > Iblis Lin
> > 林峻頤
> >
>


Re: [VOTE] Release Apache MXNet (incubating) version 1.4.0.rc0

2018-12-21 Thread Naveen Swamy
+1
Verified Signatures
Downloaded source from dist server and Tested Scala Package on Linux CPU/

On Fri, Dec 21, 2018 at 11:21 AM Carin Meier  wrote:

> +1 (binding) - Tested on OSX with Clojure/Scala package
>
> On Thu, Dec 20, 2018 at 12:25 PM Steffen Rochel 
> wrote:
>
> > Dear MXNet community,
> >
> > This is the vote to release Apache MXNet (incubating) version v1.4.0.
> > Voting will start December 20 noon PST  and close on December 27 noon
> PST.
> >
> > Link to release notes:
> >
> >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
> >
> > Link to release candidate:
> > https://github.com/apache/incubator-mxnet/releases/tag/1.4.0.rc0
> >
> > Link to source and signatures on apache dist server:
> > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.4.0.rc0
> > 
> >
> >
> > Please remember to TEST first before voting accordingly:
> > +1 = approve
> > +0 = no opinion
> > -1 = disapprove (provide reason)
> >
> >
> > Best regards,
> > Steffen
> >
>


Re: Malformed package uploaded to Maven central

2018-12-21 Thread Naveen Swamy
I don't think they will disable PUT because some projects might be manually
uploading artifacts.
Here is the ticket i created :
https://issues.apache.org/jira/browse/INFRA-17489

On Fri, Dec 21, 2018 at 10:23 AM Frank Liu  wrote:

> If the release procedure is always push to staging and manually promote
> from staging to release, the nexus2 repo should be configured to forbid
> direct pushes to the release repo.
>
> It currently allows uploading files directly via HTTP PUT (works with curl
> command).
> What Qing executed is a simple mvn deploy:deploy-file task in the maven
> which point to https://repository.apache.org/content/repositories/releases
>
> Thanks,
> Frank
>
> On Fri, Dec 21, 2018 at 9:59 AM Naveen Swamy  wrote:
>
> > Hi Qing,
> >
> > Thanks for bringing this to the attention of the community. I understand
> > it was an unintended consequence of a publishing experiment. I will raise
> > an INFRA ticket to remove this package from the releases repo.
> > Could you please file a GitHub issue or MXNet JIRA and mention the
> commands
> > you executed so we can request INFRA to not let packages be published
> > directly to releases without going through the process of deploying to
> > STAGING and then test/close the package to Releases ?
> >
> > Thanks, Naveen
> >
> > On Thu, Dec 20, 2018 at 5:01 PM Qing Lan  wrote:
> >
> > > Dear Community,
> > >
> > > Recently I tried to improve the Maven automated publish procedure and
> > > tested publishing the package. However, I accidentally used Maven to
> > > upload a package to a closed release branch:
> > >
> >
> https://repository.apache.org/#nexus-search;gav~org.apache.mxnet~~1.5.0~~/
> > > <
> >
> https://repository.apache.org/#nexus-search;gav~org.apache.mxnet~~1.5.0~~
> > >.
> > > However, it seemed that I didn’t have the access to remove this package
> > > since it is controlled by Maven central. In this case, I regretfully
> > > request a PMC/PPMC member to file an Apache Infra ticket to remove this
> > > package from there so it won’t influence the current maven users to
> > > download them. The influence is limited to OSX users who are using
> > official
> > > releases of MXNet Scala/Java packages.
> > >
> > > I apologize for this act and won’t do any more risky experiment until I
> > am
> > > fully aware of the consequence of it.
> > > Qing
> > >
> > >
> >
>
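The direct-PUT exposure discussed in this thread can be illustrated without touching a real repository. Below is a sketch of the kind of raw HTTP PUT that a `mvn deploy:deploy-file` run issues; the repository URL and artifact coordinates are placeholders, and the request is only constructed here, never sent:

```python
from urllib.request import Request

# Placeholder coordinates -- not a real artifact or endpoint to write to.
REPO = "https://repository.apache.org/content/repositories/releases"
PATH = "org/example/demo/1.0/demo-1.0.jar"

def build_put(repo, path, payload):
    """Build the raw HTTP PUT that a direct Maven deploy issues.

    A repo locked down as described above should reject such a request
    (e.g. 400/403) unless it came through the staging/promote workflow.
    """
    req = Request(f"{repo}/{path}", data=payload, method="PUT")
    req.add_header("Content-Type", "application/java-archive")
    return req

req = build_put(REPO, PATH, b"\x00")
```

Probing whether such a request is rejected (rather than accepted anonymously) is one way to verify that the INFRA configuration change took effect.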


Re: Malformed package uploaded to Maven central

2018-12-21 Thread Naveen Swamy
Hi Qing,

Thanks for bringing this to the attention of the community. I understand it
was an unintended consequence of a publishing experiment. I will raise an
INFRA ticket to remove this package from the releases repo.
Could you please file a GitHub issue or MXNet JIRA and mention the commands
you executed, so we can request that INFRA not let packages be published
directly to releases without going through the process of deploying to
STAGING and then testing/closing the package to Releases?

Thanks, Naveen

On Thu, Dec 20, 2018 at 5:01 PM Qing Lan  wrote:

> Dear Community,
>
> Recently I tried to improve the Maven automated publish procedure and
> tested publishing the package. However, I accidentally used Maven to upload a
> package to a closed release branch:
> https://repository.apache.org/#nexus-search;gav~org.apache.mxnet~~1.5.0~~/
> .
> However, it seemed that I didn’t have the access to remove this package
> since it is controlled by Maven central. In this case, I regretfully
> request a PMC/PPMC member to file an Apache Infra ticket to remove this
> package from there so it won’t affect current Maven users who download it.
> The impact is limited to OSX users who are using official
> releases of MXNet Scala/Java packages.
>
> I apologize for this act and won’t do any more risky experiments until I am
> fully aware of their consequences.
> Qing
>
>


Re: [DISCUSS] About the usage of CUDA/CUDNN

2018-12-17 Thread Naveen Swamy
Attempting to answer Qing's question
--
If you can digest the legal terms:
https://docs.nvidia.com/cuda/eula/index.html#distribution-requirements.
It sounds like it's OK ("

   1. Your application must have material additional functionality, beyond
   the included portions of the SDK.")

but I don't fully understand the legal lingo.

@Hen: Could you provide input on this?

Thanks, Naveen

On Mon, Dec 17, 2018 at 3:29 PM Davydenko, Denis <
dzianis.davydze...@gmail.com> wrote:

> Kellen, please see conversation [1] on previously published proposal re:
> maven publishing pipeline. I think your concerns are valid and we should
> look into security aspect of running our CI on a broader scope, not bound
> to just artifact publishing.
>
> I believe right now Qing's question is whether it is OK from a legal
> perspective to download CUDA by literally running wget during one of the
> jobs in the publishing pipeline. The fact that it is not available via a
> simple URL download raises a concern: is it a protective measure against
> downloads by unauthenticated users, or just an inconvenience that has not
> been addressed by NVIDIA yet?
>
> [1]:
> https://lists.apache.org/thread.html/464712f0136fb51916ca9f1b702b99847e108dbdbd0b6a2b73fc91f1@%3Cdev.mxnet.apache.org%3E
>
>
> On 12/17/18, 2:48 PM, "kellen sunderland" 
> wrote:
>
Restricted nodes may provide enough security for some use cases, but in my
opinion they don't provide enough for artifact publishing. An example would
be if there were an exploit available that worked against a Jenkins master.
In this case I think an attacker could still pivot to a secure node (correct
me if I'm wrong).
>
To your second point, it shouldn't be too hard for us to maintain all the
deps for our packages in Dockerfiles which are checked into source and
built on a regular basis.  To publish these artifacts I'd recommend doing
this from a separate, secure environment.  The flow I'd recommend would be
something like: (1) Developers commit PRs with verification that the
artifacts build properly on a continual basis from the CI. (2) In a
separate, secure environment we do the same artifact build generation
again, but this time we publish to various repos as a convenience to our
MXNet users.
>
> On Mon, Dec 17, 2018 at 2:34 PM Qing Lan  wrote:
>
> > Hi Kellen,
> >
> > Firstly, the restricted node is completely isolated from the
> > PR-checking CI
> > system (physically), as explained here:
> >
> https://cwiki.apache.org/confluence/display/MXNET/Restricted+jobs+and+nodes
> > .
> > What you are mentioning is that the public CIs all have trouble if they
> > are publicly accessible. I am not sure how secure the restricted node is.
> > However, the only way I can think of from your end is downloading all
> > deps onto a single machine and running everything there (disconnected from
> > the internet). It would bring us the best security we can have.
> >
> > Thanks,
> > Qing
> >
> > On 12/17/18, 2:06 PM, "kellen sunderland" <
> kellen.sunderl...@gmail.com>
> > wrote:
> >
> > I'm not in favour of publishing artifacts from any Jenkins based
> > systems.
> > There are many ways to bundle artifacts and publish them from an
> > automated
> > system.  Why would we use a CI system like Jenkins for this task?
> > Jenkins
> > frequently has security vulnerabilities and is designed to run
> > arbitrary
> > code from the internet.  It is a real possibility that an
> attacker
> > could
> > pivot from any Jenkins based CI system to infect artifacts which
> would
> > then
> > potentially be pushed to repositories our users would consume.
> I would
> > consider any system using Jenkins as insecure-by-design, and
> encourage
> > us
> > to air-gap any artifact generation (websites, jars, PyPi
> packages)
> > completely from a system like that.
> >
> > An alternative I could see is a simple Dockerfile (no Jenkins)
> that
> > builds
> > all artifacts end-to-end and can be run in an automated account
> well
> > outside our CI account.
> >
> > On Mon, Dec 17, 2018 at 1:53 PM Qing Lan 
> wrote:
> >
> > > Dear community,
> > >
> > > Currently Zach and I are working on the Automated-publish
> pipeline
> > on
> > > Jenkins which is a pipeline used to publish Maven packages and
> pip
> > packages
> > > nightly build. We are trying to use NVIDIA deb which could
> help us
> > to build
> > > different CUDA/CUDNN versions in the publish system. Sheng has
> > provided a
> > > script here:
> https://github.com/apache/incubator-mxnet/pull/13646.
> > This
> > > provide a very concrete and automatic solution from
> downloading to
> >   

Re: Include MKLDNN into default mxnet pip package

2018-12-11 Thread Naveen Swamy
Great effort Alex and also folks from Intel.

+1 to make MKLDNN default.

On Tue, Dec 11, 2018 at 9:10 AM Kumar, Vikas 
wrote:

> +1
>
> On 12/10/18, 8:01 PM, "Zhao, Patric"  wrote:
>
> +1, thanks for the efforts, Alex.
>
>
>
> > -Original Message-
> > From: Alex Zai [mailto:aza...@gmail.com]
> > Sent: Tuesday, December 11, 2018 8:00 AM
> > To: dev@mxnet.incubator.apache.org
> > Subject: Include MKLDNN into default mxnet pip package
> >
> > Continuation from the following thread:
> >
> https://lists.apache.org/thread.html/bcb1bd5046ff51049a0556098e756578f
> > 6fa6564831d77fddb56432f@%3Cdev.mxnet.apache.org%3E
> >
> > I am also +1 for making it on master and testing until 1.5.0. We can
> decide
> > later on (before 1.5.0) to enable mkldnn as default for the nightly
> build (pip
> > install --pre build) to try to get more feedback if needed.
> >
> > - What the story is like when there's no AVX instructions present on
> CPUs.
> > Do we get an illegal instruction error, or does it fall back
> > gracefully?
> > According to this issue (
> > https://github.com/apache/incubator-mxnet/issues/11911), AVX2 is the
> > minimum requirement for pre-build binaries.
> >
> > - Are there any outstanding issues when MKLDNN is enabled?
> > -There is one issue with int8 quantization in mkldnn (will create an issue
> > about it when the team gives me a reproducible code snippet). Additionally, we
> are
> > waiting to merge the PR to build mkldnn statically with mac/linux
> when
> > building from source after MKL is added to the CI.
> >
> >
> > - MKLDNN is a submodule dependency; are we pulling the latest commit or
> > releases? If not, we should move to releases before we make it a default,
> > I agree. We should tag mxnet only to releases from now on. Currently it is
> > tagged to 0.17.1.
> >
> > Please let me know if there are any other outstanding issues; otherwise we
> > are going to make mkldnn / cmake default in the Make/CMakefile.
> >
> > Alex
>
>
>
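As an aside to the AVX question above, a user-side pre-flight check is easy to sketch. The following is illustrative only (the helper names are not part of MXNet); it parses the `flags` line that Linux exposes in /proc/cpuinfo and checks for the AVX2 minimum mentioned in the thread:

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Line looks like: "flags\t\t: fpu vme ... avx avx2 ..."
            return set(line.split(":", 1)[1].split())
    return set()

def supports_prebuilt_mxnet(cpuinfo_text):
    """True if the CPU advertises AVX2, the assumed minimum for the pip wheels."""
    return "avx2" in parse_cpu_flags(cpuinfo_text)

# Usage on Linux (hypothetical):
# with open("/proc/cpuinfo") as f:
#     print(supports_prebuilt_mxnet(f.read()))
```

A check like this could run before importing the wheel, giving a readable error instead of a hard "illegal instruction" crash.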


Re: Scala standard library is included in the mxnet jar

2018-12-04 Thread Naveen Swamy
Pedro,
Not everyone has Scala installed on their system, especially if, for example,
they are Java/Clojure users (and they shouldn't have to). I don't expect them
to download/install Scala libraries. IMO, this approach is correct and lowers
the entry bar for users.
Soon, we are going to include all dependencies of MXNet in the jar and
publish it; this is deliberately done. Hope that answers your concern.

-Naveen

On Tue, Dec 4, 2018 at 10:59 AM Chris Olivier  wrote:

> Pedro,
>
> It would be polite to ask if there is a reason it is included before
> categorically declaring it is wrong.
>
> I am not involved in the scala library and what's included in it, but maybe
> there's a good reason for it. Or maybe there isn't.  Either way, it's best
> to ask first :)
>
> Thanks,
>
> -Chris
>


Re: CI impaired

2018-11-30 Thread Naveen Swamy
Hi Marco/Gavin,

Thanks for the clarification. I was not aware that it had been tested in a
separate test environment (this is what I was suggesting: make the changes
in a more controlled manner). Last time the change was made, many PRs were
left dangling and developers had to re-trigger them; I triggered mine at
least 5 times before it succeeded today.

Appreciate all the hard work to make CI better.

-Naveen

On Fri, Nov 30, 2018 at 8:50 AM Gavin M. Bell 
wrote:

> Hey Folks,
>
> Marco has been running this change in dev, with flying colors, for some
> time. This is not an experiment but a rollout that was announced. We also
> decided to make this change after the release cut to limit the blast radius
> of any critical obligations to the community. Marco is accountable for
> this work and will address any issues that may occur as he has been put
> on-call.  We have, to our best ability, mitigated as much risk as possible
> and now it is time to pull the trigger.  The community will enjoy a bit
> more visibility and clarity into the test process which will be
> advantageous, as well as allowing us to extend our infrastructure in a way
> that affords us more flexibility.
>
> No pending PRs will be impacted.
>
> Thank you for your support as we evolve this system to better serve the
> community.
>
> -Gavin
>
> On Fri, Nov 30, 2018 at 5:23 PM Marco de Abreu
>  wrote:
>
> > Hello Naveen, this is not an experiment. Everything has been tested in
> our
> > test system and is considered working 100%. This is not a test but
> actually
> > the move into production - the merge into master happened a week ago. We
> > now just have to put all PRs into the catalogue, which means that all PRs
> > have to be analyzed with the new pipelines - the only thing that will be
> > noticeable is that the CI is under higher load.
> >
> > The pending PRs will not be impacted. The existing pipeline is still
> > running in parallel and everything will behave as before.
> >
> > -Marco
> >
> > On Fri, Nov 30, 2018 at 4:41 PM Naveen Swamy  wrote:
> >
> > > Marco, run your experiments on a branch - set up, test it well and then
> > > bring it to the master.
> > >
> > > > On Nov 30, 2018, at 6:53 AM, Marco de Abreu <
> > > marco.g.ab...@googlemail.com.INVALID> wrote:
> > > >
> > > > Hello,
> > > >
> > > > I'm now moving forward with #1. I will try to get to #3 as soon as
> > > possible
> > > > to reduce parallel jobs in our CI. You might notice some unfinished
> > > jobs. I
> > > > will let you know as soon as this process has been completed. Until
> > then,
> > > > please bare with me since we have hundreds of jobs to run in order to
> > > > validate all PRs.
> > > >
> > > > Best regards,
> > > > Marco
> > > >
> > > > On Fri, Nov 30, 2018 at 1:36 AM Marco de Abreu <
> > > marco.g.ab...@googlemail.com>
> > > > wrote:
> > > >
> > > >> Hello,
> > > >>
> > > >> since the release branch has now been cut, I would like to move
> > forward
> > > >> with the CI improvements for the master branch. This would include
> the
> > > >> following actions:
> > > >> 1. Re-enable the new Jenkins job
> > > >> 2. Request Apache Infra to move the protected branch check from the
> > main
> > > >> pipeline to our new ones
> > > >> 3. Merge https://github.com/apache/incubator-mxnet/pull/13474 -
> this
> > > >> finalizes the deprecation process
> > > >>
> > > >> If nobody objects, I would like to start with #1 soon. Mentors,
> could
> > > you
> > > >> please assist to create the Apache Infra ticket? I would then take
> it
> > > from
> > > >> there and talk to Infra.
> > > >>
> > > >> Best regards,
> > > >> Marco
> > > >>
> > > >> On Mon, Nov 26, 2018 at 2:47 AM kellen sunderland <
> > > >> kellen.sunderl...@gmail.com> wrote:
> > > >>
> > > >>> Sorry, [1] meant to reference
> > > >>> https://issues.jenkins-ci.org/browse/JENKINS-37984 .
> > > >>>
> > > >>> On Sun, Nov 25, 2018 at 5:41 PM kellen sunderland <
> > > >>> kellen.sunderl...@gmail.com> wrote:
> > > >>>
> > > >>>> Marco and I ran into another urgent issue over the weekend that
> was
>

Re: CI impaired

2018-11-30 Thread Naveen Swamy
There are still pending PRs that need to be merged and cherry-picked to the
branch.

> On Nov 30, 2018, at 6:53 AM, Marco de Abreu 
>  wrote:
> 
> Hello,
> 
> I'm now moving forward with #1. I will try to get to #3 as soon as possible
> to reduce parallel jobs in our CI. You might notice some unfinished jobs. I
> will let you know as soon as this process has been completed. Until then,
> please bear with me since we have hundreds of jobs to run in order to
> validate all PRs.
> 
> Best regards,
> Marco
> 
> On Fri, Nov 30, 2018 at 1:36 AM Marco de Abreu 
> wrote:
> 
>> Hello,
>> 
>> since the release branch has now been cut, I would like to move forward
>> with the CI improvements for the master branch. This would include the
>> following actions:
>> 1. Re-enable the new Jenkins job
>> 2. Request Apache Infra to move the protected branch check from the main
>> pipeline to our new ones
>> 3. Merge https://github.com/apache/incubator-mxnet/pull/13474 - this
>> finalizes the deprecation process
>> 
>> If nobody objects, I would like to start with #1 soon. Mentors, could you
>> please assist to create the Apache Infra ticket? I would then take it from
>> there and talk to Infra.
>> 
>> Best regards,
>> Marco
>> 
>> On Mon, Nov 26, 2018 at 2:47 AM kellen sunderland <
>> kellen.sunderl...@gmail.com> wrote:
>> 
>>> Sorry, [1] meant to reference
>>> https://issues.jenkins-ci.org/browse/JENKINS-37984 .
>>> 
>>> On Sun, Nov 25, 2018 at 5:41 PM kellen sunderland <
>>> kellen.sunderl...@gmail.com> wrote:
>>> 
 Marco and I ran into another urgent issue over the weekend that was
 causing builds to fail.  This issue was unrelated to any feature
 development work, or other CI fixes applied recently, but it did require
 quite a bit of work from Marco (and a little from me) to fix.
 
 We spent enough time on the problem that it caused us to take a step
>>> back
 and consider how we could both fix issues in CI and support the 1.4
>>> release
 with the least impact possible on MXNet devs.  Marco had planned to
>>> make a
 significant change to the CI to fix a long-standing Jenkins error [1],
>>> but
 we feel that most developers would prioritize having a stable build
 environment for the next few weeks over having this fix in place.
 
 To properly introduce a new CI system the intent was to do a gradual
 blue/green roll out of the fix.  To manage this rollout would have taken
 operational effort and double compute load as we run systems in
>>> parallel.
 This risks outages due to scaling limits, and we’d rather make this
>>> change
 during a period of low-developer activity, i.e. shortly after the 1.4
 release.
 
 This means that from now until the 1.4 release, in order to reduce
 complexity MXNet developers should only see a single Jenkins
>>> verification
 check, and a single Travis check.
 
 
>>> 
>> 


Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

2018-11-29 Thread Naveen Swamy
The tests are randomly failing in different stages:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-13105/
This PR has failed 8 times so far

On Thu, Nov 29, 2018 at 3:43 PM Steffen Rochel 
wrote:

> Pedro - ok. Please add PR to v1.4.x branch after merge to master and please
> update tracking page
> <
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack
> >
> .
> Steffen
>
> On Thu, Nov 29, 2018 at 3:00 PM Pedro Larroy  >
> wrote:
>
> > PR is ready from my side and passes the tests, unless somebody raises
> > any concerns it's good to go.
> > On Thu, Nov 29, 2018 at 9:50 PM Steffen Rochel 
> > wrote:
> > >
> > > Pedro - added  to 1.4.0 tracking list
> > > <
> >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack
> > >
> > >
> > > Do you have already ETA?
> > > Steffen
> > >
> > > On Thu, Nov 29, 2018 at 6:13 AM Pedro Larroy <
> > pedro.larroy.li...@gmail.com>
> > > wrote:
> > >
> > > > Hi all.
> > > >
> > > > There are two important issues / fixes that should go in the next
> > > > release in my radar:
> > > >
> > > > 1) https://github.com/apache/incubator-mxnet/pull/13409/files
> > > > There is a bug in shape inference on CPU when not using MKL, also we
> > > > are running activation on CPU via MKL when we compile CUDNN+MKLDNN.
> > > > I'm finishing a fix for these issues in the above PR.
> > > >
> > > > 2) https://github.com/apache/incubator-mxnet/issues/13438
> > > > We are seeing crashes due to unsafe setenv in multithreaded code.
> > > > Setenv / getenv from multiple threads is not safe and is causing
> > > > segfaults. This piece of code (the handlers in pthread_atfork)
> already
> > > > caused a very difficult to diagnose hang in a previous release, where
> > > > a fork inside cudnn would deadlock the engine.
> > > >
> > > > I would remove setenv from 2) as a mitigation, but we would need to
> > > > check for regressions as we could be creating additional threads
> > > > inside the engine.
> > > >
> > > > I would suggest that we address these two major issues before the
> next
> > > > release.
> > > >
> > > > Pedro
> > > >
> > > >
> > > >
> > > > On Sun, Nov 25, 2018 at 11:41 PM Steffen Rochel <
> > steffenroc...@gmail.com>
> > > > wrote:
> > > > >
> > > > > Dear MXNet community,
> > > > >
> > > > > I will be the release manager for the upcoming Apache MXNet 1.4.0
> > > > release.
> > > > > Sergey Kolychev will be co-managing the release and providing help
> > from
> > > > the
> > > > > committers side.
> > > > > A release candidate will be cut on November 29, 2018 and voting
> will
> > > > start
> > > > > December 7, 2018. Release notes have been drafted here [1]. If you
> > have
> > > > any
> > > > > additional features in progress and would like to include it in
> this
> > > > > release, please assure they have been merged by November 27, 2018.
> > > > Release
> > > > > schedule is available here [2].
> > > > >
> > > > > Feel free to add any other comments/suggestions. Please help to
> > review
> > > > and
> > > > > merge outstanding PR's and resolve issues impacting the quality of
> > the
> > > > > 1.4.0 release.
> > > > >
> > > > > Regards,
> > > > >
> > > > > Steffen
> > > > >
> > > > > [1]
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
> > > > >
> > > > > [2]
> > > >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <
> > > > > kellen.sunderl...@gmail.com> wrote:
> > > > >
> > > > > > Spoke too soon[1], looks like others have been adding Turing
> > support as
> > > > > > well (thanks to those helping with this).  I believe there's
> still
> > a
> > > > few
> > > > > > changes we'd have to make to claim support though (mshadow CMake
> > > > changes,
> > > > > > PyPi package creation tweaks).
> > > > > >
> > > > > > 1:
> > > > > >
> > > > > >
> > > >
> >
> https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
> > > > > >
> > > > > > On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <
> > > > > > kellen.sunderl...@gmail.com> wrote:
> > > > > >
> > > > > > > Hey Steffen, I'd like to be able to merge this PR for version
> > 1.4:
> > > > > > > https://github.com/apache/incubator-mxnet/pull/13310 . It
> fixes
> > a
> > > > > > > regression in master which causes incorrect feature vectors to
> be
> > > > output
> > > > > > > when using the TensorRT feature.  (Thanks to Nathalie for
> > helping me
> > > > > > track
> > > > > > > down the root cause of the issue).   I'm currently blocked on a
> > CI
> > > > issue
> > > > > > I
> > > > > > > 

Re: Include MKLDNN into default mxnet pip package

2018-11-21 Thread Naveen Swamy
Tao,

You are right, there are many submodules in 3rdparty. We have to start
somewhere, and I believe this one is a good candidate to start with. The
intent is not to tie MXNet releases to the releases of the submodules, but
to pick only stable releases rather than bleeding-edge commits from the tip
of master; this gives us confidence in the submodules that MXNet users
depend on, especially if we make MKLDNN the default.
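As a rough illustration of the pinning idea above, a nightly sanity check could verify that a submodule checkout sits exactly on a release tag. The path and tooling below are assumptions, not existing MXNet scripts:

```python
import subprocess

def submodule_release_tag(path):
    """Return the exact tag the git checkout at `path` points to, or None.

    `git describe --exact-match --tags` succeeds only when HEAD is exactly
    a tagged commit, which is what "pinned to a release" means here.
    """
    try:
        out = subprocess.run(
            ["git", "-C", path, "describe", "--exact-match", "--tags"],
            capture_output=True, text=True, check=True)
        return out.stdout.strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        # Not a repo, not on a tag, or git unavailable.
        return None

# Example (submodule path is hypothetical):
# tag = submodule_release_tag("3rdparty/mkldnn")
# print(tag or "not on a release tag")
```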

Good to know it is already a known regression. Alex has created issue
https://github.com/apache/incubator-mxnet/issues/13369; please add
details and link the corresponding MKL-DNN issue (I couldn't find one).

-Naveen

On Wed, Nov 21, 2018 at 6:04 PM Lv, Tao A  wrote:

> Here are my answers for the questions from Kellen and Naveen about
> MKL-DNN. It doesn't mean that I'm supportive of making MKL-DNN the default
> here.
>
> @Kellen,
>
> FYI, here is a list for those platforms which are officially supported by
> MKL-DNN.
> https://github.com/intel/mkl-dnn#system-requirements
>
> Most of the computation-intensive kernels in MKL-DNN are JITed, so they
> generate code for the platform at runtime. Non-JIT code in MKL-DNN, like
> other code in MXNet, generates instructions according to the compiler
> options/flags. We can set -DARCH_OPT_FLAGS when building MKL-DNN to avoid
> optimizing for the compiling machine; that's exactly what we are doing for
> the MKL-DNN build in MXNet. Even without MKL-DNN, I noticed
> illegal-instruction issues with MXNet when users import the pip package on
> a lower-end machine which probably only supports SSE.
>
> @Naveen,
>
> The LSTM issue has already been identified as a regression from the recent
> version of MKL-DNN. Hopefully it will be fixed soon with a new update of
> MKL-DNN.
>
> MXNet has many submodule dependencies under the 3rdparty folder. It seems we
> don't require release versions for most of these dependencies. The release
> cycles of MKL-DNN and MXNet are not matched very well. I think it would be
> a risk for an MXNet release if it depends strictly on the release of one
> submodule, let alone on the releases of all submodules.
>
> -tao
>
> -Original Message-
> From: Naveen Swamy [mailto:mnnav...@gmail.com]
> Sent: Thursday, November 22, 2018 9:08 AM
> To: dev@mxnet.incubator.apache.org
> Cc: d...@mxnet.apache.org
> Subject: Re: Include MKLDNN into default mxnet pip package
>
> Hi Alex,
>
> Thanks for promptly running the numbers on AMD and reporting here.
>
> Can you please update the AMD numbers here for posterity
> https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking
> ?
>
> Are there any outstanding issues when MKLDNN is enabled? From my offline
> conversation I am briefly aware of performance issues with LSTM; is there a
> GitHub issue for it?
>
> MKLDNN is a submodule dependency; are we pulling the latest commit or
> releases? If not, we should move to releases before we make it a default.
> Ideally we should use platform-specific distributions (-dev packages); at
> the least we should rely on well-tested releases.
>
>
> Thanks, Naveen
>
> On Wed, Nov 21, 2018 at 4:55 PM Zai, Alexander  >
> wrote:
>
> > AMD benchmarks have been published. We are seeing a x15.8 speedup with
> > Resnet50 (batch size 32) on AWS's new m5a.24xlarge machine. With a
> > smaller network (Mobilenet - batch size 32) the speedup is more
> > significant at x38.7. Let's have a vote to see if the PR to have
> > MKLDNN enabled by default
> > (https://github.com/apache/incubator-mxnet/pull/12591) can be merged
> > before 1.4.0 release.
> >
> > On 10/19/18, 9:17 AM, "Pedro Larroy" 
> > wrote:
> >
> > I did  pip install mxnet-mkl==1.3.1b20181018 on an AMD Ryzen 1950X
> > and unit
> > tests are passing.
> >
> > Is this build using AVX512?  in /proc/cpuinfo I see only "avx" flag.
> > There's no "avx2" like on recent intel cpus.
> >
> > Pedro.
> >
> > On Fri, Oct 19, 2018 at 5:12 PM Hagay Lupesko 
> > wrote:
> >
> > > Awesome collaborative effort across many contributors and
> companies!
> > >
> > > The boost is impressive and for MXNet users to get this boost
> > "out of the
> > > box" is a great benefit and makes MXNet an even better choice.
> > >
> > > Alex - can you clarify whether there are any downsides with regards to
> > > non-AVX-512 architectures, AMD CPUs, etc? Will it gracefully fall back?
> > &

[ANNOUNCEMENT] New Committer: Qing Lan

2018-11-20 Thread Naveen Swamy
The Podling Project Management Committee (PPMC) for Apache MXNet has
invited Qing Lan, based on his contributions to MXNet Scala, to become a
committer, and we are pleased to announce that he has accepted.

Qing, thanks a lot for your contributions and continued effort to support
the MXNet community.

Please join me in welcoming Qing to the project!

Thanks, Naveen
(on behalf of Apache MXNet PPMC)


Merging java-api Branch

2018-11-16 Thread Naveen Swamy
Hi All,

Just wanted to let you know that the work for the MXNet Java API was done on
the 'java-api' branch and contributed by multiple contributors. I merged
the PR [1] and later realized that the squash/merge credited all code
contributions to one contributor. We are going to revert the commit and
merge directly from the java-api branch to master, like we did for the Julia
PR. The code has already been code-reviewed [1].

[1] https://github.com/apache/incubator-mxnet/pull/13162
Thanks, Naveen


Re: [Question] Difference between "Feature" and "Feature request" labels in Github

2018-11-13 Thread Naveen Swamy
Done now - I removed the 'Feature' label. There were 4 issues with that
label, but they also had 'Feature Request'.

On Tue, Nov 13, 2018 at 5:05 PM Anirudh Acharya 
wrote:

> This issue was raised before here -
>
> https://lists.apache.org/thread.html/3e988e6bd82cb2d69ba20c21bf763952ed22a5732e61f6fba1f89ac8@%3Cdev.mxnet.apache.org%3E
>
> We need someone with committer privileges to fix it.
>
>
> Thanks
> Anirudh
>
>
>
> On Tue, Nov 13, 2018 at 4:36 PM Lin Yuan  wrote:
>
> > Dear Community,
> >
> > I often see there are "Feature" and "Feature request" labels in Github
> > issues. May I know the difference? If they are meant to be the same
> thing,
> > can we only keep one of them?
> >
> > Thanks,
> >
> > Lin
> >
>


Re: [Question] Difference between "Feature" and "Feature request" labels in Github

2018-11-13 Thread Naveen Swamy
There were a few more that had 'Feature' as a label but didn't show up in the
filter search; I manually applied 'Feature Request' to them.

On Tue, Nov 13, 2018 at 5:12 PM Naveen Swamy  wrote:

> done now, removed the feature label, there were 4 issues with that label
> but also had Feature Request.
>
> On Tue, Nov 13, 2018 at 5:05 PM Anirudh Acharya 
> wrote:
>
>> This issue was raised before here -
>>
>> https://lists.apache.org/thread.html/3e988e6bd82cb2d69ba20c21bf763952ed22a5732e61f6fba1f89ac8@%3Cdev.mxnet.apache.org%3E
>>
>> We need someone with committer privileges to fix it.
>>
>>
>> Thanks
>> Anirudh
>>
>>
>>
>> On Tue, Nov 13, 2018 at 4:36 PM Lin Yuan  wrote:
>>
>> > Dear Community,
>> >
>> > I often see there are "Feature" and "Feature request" labels in Github
>> > issues. May I know the difference? If they are meant to be the same
>> thing,
>> > can we only keep one of them?
>> >
>> > Thanks,
>> >
>> > Lin
>> >
>>
>


Re: Nightly/Weekly tests for examples

2018-11-13 Thread Naveen Swamy
Aaron, IMO tutorials have a specific purpose: to introduce concepts and
APIs to users. Converting all examples to tutorials would overwhelm users;
we should carefully choose which examples we want to turn into tutorials.

I agree that today the examples are a graveyard of untested code. My
suggestion is to add some testing to an example whenever you touch it - at
the least to check the functionality. These can be run once a week.
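A minimal weekly "does it at least run" check could be sketched as follows. This is a hedged sketch, not existing MXNet CI code; the idea of passing something like an `--epochs 1` argument is an assumption, since real examples would each need their own arguments:

```python
import subprocess
import sys

def smoke_test(script, extra_args=(), timeout=600):
    """Run an example script in a subprocess; pass iff it exits with code 0."""
    result = subprocess.run(
        [sys.executable, script, *extra_args],
        capture_output=True, text=True, timeout=timeout)
    return result.returncode == 0

# A weekly job might iterate over a curated list (paths hypothetical):
# for script in ["example/foo/train.py", "example/bar/train.py"]:
#     assert smoke_test(script, ["--epochs", "1"]), script
```

This only catches crashes and API breakage, not wrong results, but that already covers the "examples fail out of the box" class of regressions discussed in this thread.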


On Tue, Nov 13, 2018 at 6:52 AM Aaron Markham 
wrote:

> I've been actively promoting moving examples to tutorials during reviews.
> That way they fall under the testing umbrella and get added to the website.
>
> Many times there's not really a great distinction as to why something is in
> the examples folder, other than it's like a graveyard of untested sample
> code.
>
> I would suggest a starting strategy: when doing updates on examples, ask
> yourself whether, with just a little more effort, it can be converted to a
> tutorial.
>
> The last thing CI needs is more flaky tutorial tests, so whatever is done
> here should use the more robust approaches that are being discussed.
>
> Cheers,
> Aaron
>
> On Mon, Nov 12, 2018, 16:24 sandeep krishnamurthy <
> sandeep.krishn...@gmail.com wrote:
>
> > Thanks, Ankit for bringing this up. @Anirudh - All the concerns you
> raised
> > are very valid. Here are my thoughts:
> > 1. There were several examples that were crashing or had compiler errors.
> > This is a very bad user experience. All example scripts should be at
> > least runnable!
> > 2. While I agree examples are too diverse (python scripts, notebooks,
> > epochs, print statements etc..) We can always start small, we can start
> > with 5 examples. We can use this to streamline all examples to be python
> > scripts, print statements, with the main function invoker that can take
> > params like epoch, dataset etc.
> > 3. We can start with running weekly tests to avoid too long nightly test
> > pipeline.
> > 4. One possible issue can be with a few examples that depend on a large or
> > controlled dataset. I am not sure yet how to solve this, but we can think
> > about it.
> >
> > Any suggestions?
> > Best,
> > Sandeep
> >
> >
> >
> > On Mon, Nov 12, 2018 at 10:38 AM Anirudh Acharya 
> > wrote:
> >
> > > Hi Ankit,
> > >
> > > I have a few concerns about testing examples. Before writing tests for
> > > examples,
> > >
> > >- you will need to first decide what constitutes a test for an
> > example,
> > >because examples are not API calls, which will have return
> statements
> > > and
> > >the test can just call the API and assert for certain values. Just
> > > testing
> > >if an example is a compilable python script will not add much value
> in
> > > my
> > >opinion.
> > >- And testing for example output and results will require a re-write
> > of
> > >many of the examples, because many of them currently just have print
> > >statements as outputs and does not return any value as such. I am
> not
> > > sure
> > >if it is worth the dev-effort.
> > >- the current set of examples in the mxnet repo are very diverse -
> > some
> > >are written as python notebooks, some are just python scripts with
> > paper
> > >implementations, and some are just illustrations of certain mxnet
> > > features.
> > >I am curious to know how you will write tests for these things.
> > >
> > >
> > > Looking forward to seeing the design of this test bed/framework.
> > >
> > >
> > > Thanks
> > > Anirudh Acharya
> > >
> > > On Fri, Nov 9, 2018 at 2:39 PM Marco de Abreu
> > >  wrote:
> > >
> > > > Hello Ankit,
> > > >
> > > > that's a great idea! Using the tutorial tests as reference is a great
> > > > starting point. If you are interested, please don't hesitate to
> attend
> > > the
> > > > Berlin user group in case you would like to discuss your first
> thoughts
> > > > in-person before drafting a design.
> > > >
> > > > -Marco
> > > >
> > > >
> > > > Am Fr., 9. Nov. 2018, 23:23 hat khedia.an...@gmail.com <
> > > > khedia.an...@gmail.com> geschrieben:
> > > >
> > > > > Hi MXNet community,
> > > > >
> > > > > Recently, a few other contributors and I focused on fixing examples
> > > > > in our repository which were not working out of the box as expected.
> > > > > https://github.com/apache/incubator-mxnet/issues/12800
> > > > > https://github.com/apache/incubator-mxnet/issues/11895
> > > > > https://github.com/apache/incubator-mxnet/pull/13196
> > > > >
> > > > > Some of the examples failed after API changes and remained uncaught
> > > until
> > > > > a user reported the issue. While the community is actively working
> on
> > > > > fixing it, it might re-occur after few days if we don’t have a
> proper
> > > > > mechanism to catch regressions.
> > > > >
> > > > > So, I would like to propose to enable nightly/weekly tests for the
> > > > > examples similar to what we have for tutorials to catch any such
> > > > > regressions. The test could check only 

Re: LabelBot New Design in Production

2018-11-08 Thread Naveen Swamy
Great job! This is very helpful for triaging issues. Users could tag issues
themselves when creating a new issue - maybe we should add that to the
issue template?

On Thu, Nov 8, 2018 at 3:54 PM Harsh Patel 
wrote:

> Hey all,
> The upgraded label bot has been pushed into production. Current
> functionality includes add, remove, and update:
> (i.e. @mxnet-label-bot add ['label']
> @mxnet-label-bot remove ['label']
> @mxnet-label-bot update ['label'])
>
> Users should feel free to leave suggestions and report any potential issues.
> The best forum for this would be here:
> https://github.com/apache/incubator-mxnet/issues/13163
>
> Best,
> -Harsh Patel
>


Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

2018-11-08 Thread Naveen Swamy
Anton, I don't think we need to add the macOS tests for the 1.3.1 branch
since Travis CI is timing out and creating blockers; they also did not exist
for v1.3.0.


On Thu, Nov 8, 2018 at 10:04 AM Anton Chernov  wrote:

> A PR to fix the tests:
>
> Remove test for non existing index copy operator (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13180
>
>
> Best
> Anton
>
> Thu, 8 Nov 2018 at 10:05, Anton Chernov :
>
> > An addition has been made to include MacOS tests for the v1.3.x branch:
> >
> > [MXNET-908] Enable minimal OSX Travis build (v1.3.x)
> > https://github.com/apache/incubator-mxnet/pull/13179
> >
> > It includes following PR's for master:
> >
> > [MXNET-908] Enable minimal OSX Travis build
> > https://github.com/apache/incubator-mxnet/pull/12462
> >
> > [MXNET-908] Enable python tests in Travis
> > https://github.com/apache/incubator-mxnet/pull/12550
> >
> > [MXNET-968] Fix MacOS python tests
> > https://github.com/apache/incubator-mxnet/pull/12590
> >
> >
> > Best
> > Anton
> >
> >
> > Thu, 8 Nov 2018 at 9:38, Anton Chernov :
> >
> >> Thank you everyone for your support and suggestions. All proposed PR's
> >> have been merged. We will tag the release candidate and start the vote
> on
> >> Friday, the 9th of November 2018.
> >>
> >> Unfortunately after the merges the tests started to fail:
> >>
> >> http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/v1.3.x/
> >>
> >> I will look into the failures, but any help as usual is very
> appreciated.
> >>
> >> The nightly tests are fine:
> >> http://jenkins.mxnet-ci.amazon-ml.com/job/NightlyTests/job/v1.3.x/
> >>
> >>
> >> Best
> >> Anton
> >>
> >>
> >>
> >>
> >> ср, 7 нояб. 2018 г. в 17:19, Anton Chernov :
> >>
> >>> Yes, you are right about the versions wording, thanks for
> clarification.
> >>>
> >>> A performance improvement can be considered a bugfix as well. I see no
> >>> big risks in including PR's by Haibin and Lin into the patch release.
> >>>
> >>> @Haibin, if you can reopen the PR's they should be good to go for the
> >>> relase, considering the importance of the improvements.
> >>>
> >>> I propose the following bugfixes for the release as well (already
> >>> created corresponding PR's):
> >>>
> >>> Fixed __setattr__ method of _MXClassPropertyMetaClass (v1.3.x)
> >>> https://github.com/apache/incubator-mxnet/pull/13157
> >>>
> >>> fixed symbols naming in RNNCell, LSTMCell, GRUCell (v1.3.x)
> >>> https://github.com/apache/incubator-mxnet/pull/13158
> >>>
> >>> We will be starting to merge the PR's shortly. If there are no more
> >>> proposals for backporting, I would consider the list as set.
> >>>
> >>> Best
> >>> Anton
> >>>
> >>> ср, 7 нояб. 2018 г. в 17:01, Sheng Zha :
> >>>
>  Hi Anton,
> 
>  I hear your concern about a simultaneous 1.4.0 release and it
> certainly
>  is a valid one.
> 
>  Regarding the release, let’s agree on the language first. According to
>  semver.org, a 1.3.1 release is considered a patch release, which is for
>  backward-compatible bug fixes, while a 1.4.0 release is considered a minor
>  release, which is for backward-compatible new features. A major release
>  would mean 2.0.
> 
>  The three PRs suggested by Haibin and Lin are all introducing new
>  features. If they go into a patch release, it would require an
> exception
>  accepted by the community. Also, if other violations happen, it could be
>  grounds for declining a release during votes.
> 
>  -sz
> 
>  > On Nov 7, 2018, at 2:25 AM, Anton Chernov 
>  wrote:
>  >
>  > [MXNET-1179] Enforce deterministic algorithms in convolution layers
> 
> >>>
>


Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

2018-11-06 Thread Naveen Swamy
Please note that this is a patch release (1.3.1) to address critical bugs! For
everything else, please wait for 1.4.0, which is planned very shortly after 1.3.1.

> On Nov 6, 2018, at 7:17 AM, Anton Chernov  wrote:
> 
> The following PR's have been created so far:
> 
> Infer dtype in SymbolBlock import from input symbol (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13117
> 
> [MXNET-953] Fix oob memory read (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13118
> 
> [MXNET-969] Fix buffer overflow in RNNOp (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13119
> 
> [MXNET-922] Fix memleak in profiler (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13120
> 
> Set correct update on kvstore flag in dist_device_sync mode (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13121
> 
> update mshadow (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13122
> 
> CudnnFind() usage improvements (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13123
> 
> Fix lazy record io when used with dataloader and multi_worker > 0 (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13124
> 
> 
> As stated previously, I would rather be opposed to having the following PR's in
> the patch release:
> 
> Gluon LSTM Projection and Clipping Support (#13055) v1.3.x
> https://github.com/apache/incubator-mxnet/pull/13129
> 
> sample_like operators (#13034) v1.3.x
> https://github.com/apache/incubator-mxnet/pull/13130
> 
> 
> Best
> Anton
> 
> вт, 6 нояб. 2018 г. в 16:06, Anton Chernov :
> 
>> Hi Haibin,
>> 
>> I have a few comments regarding the proposed performance improvement
>> changes.
>> 
>> CUDNN support for LSTM with projection & clipping
>> https://github.com/apache/incubator-mxnet/pull/13056
>> 
>> There is no doubt that this change brings value, but I don't see it as a
>> critical bug fix. I would rather leave it for the next major release.
>> 
>> sample_like operators
>> https://github.com/apache/incubator-mxnet/pull/13034
>> 
>> Even if it's related to performance, this is an addition of functionality
>> and I would also push this to be in the next major release only.
>> 
>> 
>> Best
>> Anton
>> 
>> 
>> вт, 6 нояб. 2018 г. в 15:55, Anton Chernov :
>> 
>>> Hi Patric,
>>> 
>>> This change was listed in the 'PR candidates suggested for consideration
>>> for v1.3.1 patch release' section [1].
>>> 
>>> You are right, I also think that this is not a critical hotfix change
> >>> that should be included in the 1.3.1 patch release.
>>> 
>>> Thus I'm not making any further efforts to bring it in.
>>> 
>>> Best
>>> Anton
>>> 
>>> [1]
>>> https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release#PR_candidates
>>> 
>>> 
>>> вт, 6 нояб. 2018 г. в 1:14, Zhao, Patric :
>>> 
 Hi Anton,
 
 Thanks for looking into the MKL-DNN PR.
 
 As my understanding of cwiki (
 https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release
 ),
 these features will go into 1.4 rather than the 1.3.1 patch release.
 
 Feel free to correct me :)
 
 Thanks,
 
 --Patric
 
> -Original Message-
> From: Anton Chernov [mailto:mecher...@gmail.com]
> Sent: Tuesday, November 6, 2018 3:11 AM
> To: d...@mxnet.apache.org
> Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch
> release
> 
> It seems that there is a problem porting the following changes to the
 v1.3.x
> release branch:
> 
> Implement mkldnn convolution fusion and quantization
> https://github.com/apache/incubator-mxnet/pull/12530
> 
> MKL-DNN Quantization Examples and README
> https://github.com/apache/incubator-mxnet/pull/12808
> 
> The bases are different.
> 
> I would need help from authors of these changes to make a backport PR.
> 
> @ZhennanQin, @xinyu-intel would you be able to assist me and create the
> corresponding PR's?
> 
> Without proper history and domain knowledge I would not be able to create
> them on my own in a reasonable amount of time, I'm afraid.
> 
> Best regards,
> Anton
> 
> пн, 5 нояб. 2018 г. в 19:45, Anton Chernov :
> 
>> 
>> As part of:
>> 
>> Implement mkldnn convolution fusion and quantization
>> https://github.com/apache/incubator-mxnet/pull/12530
>> 
>> I propose to add the examples and documentation PR as well:
>> 
>> MKL-DNN Quantization Examples and README
>> https://github.com/apache/incubator-mxnet/pull/12808
>> 
>> 
>> Best regards,
>> Anton
>> 
>> пн, 5 нояб. 2018 г. в 19:02, Anton Chernov :
>> 
>>> Dear MXNet community,
>>> 
>>> I will be the release manager for the upcoming 1.3.1 patch release.
>>> Naveen will be co-managing the release and providing help from the
>>> committers side.
>>> 
>>> The following dates have been set:
>>> 
>>> 

Apache MXNet Workshop in ODSC

2018-11-04 Thread Naveen Swamy
Hey guys,

I wanted to let you know that Sandeep and I conducted a hands-on workshop at
ODSC in SF on November 1st; I conducted a similar session in London last
month. This session had 30 attendees join us, and the response was positive
and engaging, especially because they could try out MXNet and see a trained
model. I covered some basics of DL and CNNs, and conducted an exercise on
transfer-learning a facial emotion recognition model using a VGG13
pre-trained model from GluonCV. Sandeep covered NLP concepts and an exercise
on training a sentiment analysis model using a GluonNLP pre-trained
embedding/model.

The notebooks are here: https://github.com/TalkAI/apache-mxnet-odsc-2018

Slides are here:
https://www.slideshare.net/apachemxnet/apache-mxnet-odsc-west-2018

I also want to encourage you to participate in spreading the word about
MXNet by conducting workshops and/or presenting how you solve your business
problems using Apache MXNet in breakout sessions.

If you have feedback or questions please reach out to me.

Thanks, Naveen


Re: [VOTE] - Adopt "Become a Committer and PPMC Member" Document

2018-11-01 Thread Naveen Swamy
+1
Thanks everyone for your input and participation. Thanks to Carin for driving 
this.

> On Nov 1, 2018, at 6:07 AM, Carin Meier  wrote:
> 
> Reminder - vote ends tomorrow morning at 6:00 am EST
> 
>> On Mon, Oct 29, 2018 at 6:46 PM Carin Meier  wrote:
>> 
>> This vote is to adopt the document
>> https://cwiki.apache.org/confluence/display/MXNET/Become+an+Apache+MXNet+%28incubating%29+Committer+and+PPMC+Member+Proposal
>> to replace the current document
>> https://cwiki.apache.org/confluence/display/MXNET/Becoming+a+Committer
>> 
>> The dev discussion thread is here
>> https://lists.apache.org/thread.html/e61ffa26af374de7a99c475d406e462a00b26cfc1155e232198dd53e@%3Cdev.mxnet.apache.org%3E
>> 
>> The vote will be a procedural issue vote as defined
>> https://www.apache.org/foundation/voting.html
>> 
>> Votes on procedural issues follow the common format of majority rule
>> unless otherwise stated. That is, if there are more favourable votes than
>> unfavourable ones, the issue is considered to have passed -- regardless of
>> the number of votes in each category. (If the number of votes seems too
>> small to be representative of a community consensus, the issue is typically
>> not pursued. However, see the description of lazy consensus
>>  for a
>> modifying factor.)
>> 
>> The vote will run until Friday Nov 2nd at 6:00 am EST
>> 
>> Thanks,
>> Carin
>> 
>> 


Re: [VOTE] - Adopt "Become a Committer and PPMC Member" Document

2018-10-29 Thread Naveen Swamy
Feel free to change it to "rights" if that is more welcoming and fits better.


> On Oct 29, 2018, at 10:24 PM, Tianqi Chen  wrote:
> 
> Also from https://www.apache.org/foundation/how-it-works.html there is no
> mention of the word "privileges", maybe "right" is a better term.
> 
> I feel there is some wisdom in choosing not to emphasize the entitlements
> being given in the role. After all, the PMC/committership is given by the
> community, and the main job of a PMC member/committer is to use that power to
> serve the community well. And we should choose wisely, as our actions have
> consequences, and the community is watching.
> 
> Tianqi
> 
>> On Mon, Oct 29, 2018 at 10:03 PM Tianqi Chen  wrote:
>> 
>> As far as I recall from what Jim said
>> 
>> "The ASF strives for consensus, and votes and voting are used, primarily,
>> to gauge that. It's not used to divide a community; it's used to UNITE it.
>> Voting is used when collaboration and consensus building *FAILS*. It should
>> be rare."
>> 
>> In this context, we all agree that when a veto vote occurs everyone should
>> respect it and not kick a dead horse.  On the other hand, the
>> PMC/committers should be cautious when using this power, as the community
>> should always encourage reaching consensus via reasonable technical
>> discussion first.
>> 
>> As with all ML models, every guideline can be interpreted in an
>> adversarial fashion, but I hope we can act in good faith to build toward a
>> positive-sum collaboration.
>> 
>> Tianqi
>> 
>> 
>> 
>>>> On Mon, Oct 29, 2018 at 9:01 PM Naveen Swamy  wrote:
>>> 
>>> The committer/PMC privileges are derived from
>>> https://www.apache.org/foundation/how-it-works.html.
>>> 
>>> The term "abuse" is very subjective (in this case) - if an opinion or vote
>>> is against something people prefer, it can be termed abuse. I would expect
>>> those who differ with the vote to take that as feedback; if there are
>>> corrections to be made in the understanding, they should respectfully
>>> clarify that misunderstanding.
>>> 
>>> I agree with Chris; we have seen in the past that discussions have gone on
>>> and on for a long time when there were disagreements, until people gave up.
>>> This leads to frustration and less participation by members - it is also
>>> an ultimate productivity killer. You can see why some of the discuss
>>> threads go quiet and die.
>>> 
>>> I am all for discussion and reaching consensus, but at some point one must
>>> realize it's just kicking a dead horse and turns into an endurance contest
>>> rather than a discussion. We should be careful about the expectations we
>>> set in regard to how we reach consensus.
>>> 
>>> 
>>> On Mon, Oct 29, 2018 at 6:18 PM Chris Olivier 
>>> wrote:
>>> 
>>>> well, if something needs consensus to pass, then saying “you need to
>>> keep
>>>> discussing until consensus is reached” seems like it could be abused by
>>>> someone who was just willing to not accept a verdict and continues to
>>> push,
>>>> right? And if someone were to walk away saying “I don’t want to discuss
>>>> this any further”, which is fair in that situation, then they’re the
>>> “bad
>>>> guy”? While it sounds like a noble pursuit, I just feel like this could
>>> be
>>>> abused.
>>>> 
>>>>> On Mon, Oct 29, 2018 at 5:53 PM Carin Meier 
>>>> wrote:
>>>> 
>>>>> Chris,
>>>>> 
>>>>> Is there a rewording that you would find more acceptable? Again, we
>>> can
>>>>> have more time to edit and revise the document. There is not a time
>>> limit
>>>>> on this. I might have been too hasty to start the vote thinking the
>>>>> discussion was wrapped up.
>>>>> 
>>>>> - Carin
>>>>> 
>>>>> On Mon, Oct 29, 2018 at 8:50 PM Chris Olivier 
>>>>> wrote:
>>>>> 
>>>>>> or another example if something is downvoted, this also implies that
>>>>> after
>>>>>> a vote is over, it’s appropriate to continue pushing the subject
>>>> trying
>>>>> to
>>>>>> just wear everyone down even though the outcome is clear. We’ve seen
>>>> this
>>>>>> before, actually.
>>>>

Re: [VOTE] - Adopt "Become a Committer and PPMC Member" Document

2018-10-29 Thread Naveen Swamy
The committer/PMC privileges are derived from
https://www.apache.org/foundation/how-it-works.html.

The term "abuse" is very subjective (in this case) - if an opinion or vote is
against something people prefer, it can be termed abuse. I would expect
those who differ with the vote to take that as feedback; if there are
corrections to be made in the understanding, they should respectfully clarify
that misunderstanding.

I agree with Chris; we have seen in the past that discussions have gone on
and on for a long time when there were disagreements, until people gave up.
This leads to frustration and less participation by members - it is also
an ultimate productivity killer. You can see why some of the discuss
threads go quiet and die.

I am all for discussion and reaching consensus, but at some point one must
realize it's just kicking a dead horse and turns into an endurance contest
rather than a discussion. We should be careful about the expectations we set
in regard to how we reach consensus.


On Mon, Oct 29, 2018 at 6:18 PM Chris Olivier  wrote:

> well, if something needs consensus to pass, then saying “you need to keep
> discussing until consensus is reached” seems like it could be abused by
> someone who was just willing to not accept a verdict and continues to push,
> right? And if someone were to walk away saying “I don’t want to discuss
> this any further”, which is fair in that situation, then they’re the “bad
> guy”? While it sounds like a noble pursuit, I just feel like this could be
> abused.
>
> On Mon, Oct 29, 2018 at 5:53 PM Carin Meier  wrote:
>
> > Chris,
> >
> > Is there a rewording that you would find more acceptable? Again, we can
> > have more time to edit and revise the document. There is not a time limit
> > on this. I might have been too hasty to start the vote thinking the
> > discussion was wrapped up.
> >
> > - Carin
> >
> > On Mon, Oct 29, 2018 at 8:50 PM Chris Olivier 
> > wrote:
> >
> > > or another example if something is downvoted, this also implies that
> > after
> > > a vote is over, it’s appropriate to continue pushing the subject
> trying
> > to
> > > just wear everyone down even though the outcome is clear. We’ve seen
> this
> > > before, actually.
> > >
> > > On Mon, Oct 29, 2018 at 5:41 PM Chris Olivier 
> > > wrote:
> > >
> > > > -1 “strive to meet consensus”? This seems to imply the consensus is
> the
> > > > natural expected state. So in the case where someone submits that we
> > > should
> > > > start a nuclear war, then our bylaws would state that we should all
> try
> > > to
> > > > agree to start a nuclear war.
> > > >
> > > > On Mon, Oct 29, 2018 at 4:41 PM Tianqi Chen 
> wrote:
> > > >
> > > >> Hi Carin:
> > > >> Sorry for the last minute request, but given the way we write
> down
> > > the
> > > >> PMC, committer privileges, I feel we need to add an additional line:
> > > >>
> > > >>- "PMC/committer should strive to be diplomatic and reach
> consensus
> > > >> with
> > > >> discussion when possible."
> > > >>
> > > >>Since I don't really want us to give an impression of abusing
> veto
> > > >> rights.
> > > >>
> > > >> Thanks!
> > > >> Tianqi
> > > >>
> > > >> On Mon, Oct 29, 2018 at 3:47 PM Carin Meier 
> > > wrote:
> > > >>
> > > >> > This vote is to adopt the document
> > > >> >
> > > >> >
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Become+an+Apache+MXNet+%28incubating%29+Committer+and+PPMC+Member+Proposal
> > > >> > to replace the current document
> > > >> >
> > > https://cwiki.apache.org/confluence/display/MXNET/Becoming+a+Committer
> > > >> >
> > > >> > The dev discussion thread is here
> > > >> >
> > > >> >
> > > >>
> > >
> >
> https://lists.apache.org/thread.html/e61ffa26af374de7a99c475d406e462a00b26cfc1155e232198dd53e@%3Cdev.mxnet.apache.org%3E
> > > >> >
> > > >> > The vote will be a procedural issue vote as defined
> > > >> > https://www.apache.org/foundation/voting.html
> > > >> >
> > > >> > Votes on procedural issues follow the common format of majority
> rule
> > > >> unless
> > > >> > otherwise stated. That is, if there are more favourable votes than
> > > >> > unfavourable ones, the issue is considered to have passed --
> > > regardless
> > > >> of
> > > >> > the number of votes in each category. (If the number of votes
> seems
> > > too
> > > >> > small to be representative of a community consensus, the issue is
> > > >> typically
> > > >> > not pursued. However, see the description of lazy consensus
> > > >> > 
> for a
> > > >> > modifying factor.)
> > > >> >
> > > >> > The vote will run until Friday Nov 2nd at 6:00 am EST
> > > >> >
> > > >> > Thanks,
> > > >> > Carin
> > > >> >
> > > >>
> > > >
> > >
> >
>


Re: [VOTE] - Adopt "Become a Committer and PPMC Member" Document

2018-10-29 Thread Naveen Swamy
Maybe we can adopt this.
https://struts.apache.org/bylaws.html#voting



> On Oct 29, 2018, at 5:52 PM, Carin Meier  wrote:
> 
> Chris,
> 
> Is there a rewording that you would find more acceptable? Again, we can
> have more time to edit and revise the document. There is not a time limit
> on this. I might have been too hasty to start the vote thinking the
> discussion was wrapped up.
> 
> - Carin
> 
>> On Mon, Oct 29, 2018 at 8:50 PM Chris Olivier  wrote:
>> 
>> or another example if something is downvoted, this also implies that after
>> a vote is over, it’s appropriate to continue pushing the subject trying to
>> just wear everyone down even though the outcome is clear. We’ve seen this
>> before, actually.
>> 
>> On Mon, Oct 29, 2018 at 5:41 PM Chris Olivier 
>> wrote:
>> 
>>> -1 “strive to meet consensus”? This seems to imply the consensus is the
>>> natural expected state. So in the case where someone submits that we
>> should
>>> start a nuclear war, then our bylaws would state that we should all try
>> to
>>> agree to start a nuclear war.
>>> 
 On Mon, Oct 29, 2018 at 4:41 PM Tianqi Chen  wrote:
 
 Hi Carin:
Sorry for the last minute request, but given the way we write down
>> the
 PMC, committer privileges, I feel we need to add an additional line:
 
   - "PMC/committer should strive to be diplomatic and reach consensus
 with
 discussion when possible."
 
   Since I don't really want us to give an impression of abusing veto
 rights.
 
 Thanks!
 Tianqi
 
 On Mon, Oct 29, 2018 at 3:47 PM Carin Meier 
>> wrote:
 
> This vote is to adopt the document
> 
> 
 
>> https://cwiki.apache.org/confluence/display/MXNET/Become+an+Apache+MXNet+%28incubating%29+Committer+and+PPMC+Member+Proposal
> to replace the current document
> 
>> https://cwiki.apache.org/confluence/display/MXNET/Becoming+a+Committer
> 
> The dev discussion thread is here
> 
> 
 
>> https://lists.apache.org/thread.html/e61ffa26af374de7a99c475d406e462a00b26cfc1155e232198dd53e@%3Cdev.mxnet.apache.org%3E
> 
> The vote will be a procedural issue vote as defined
> https://www.apache.org/foundation/voting.html
> 
> Votes on procedural issues follow the common format of majority rule
 unless
> otherwise stated. That is, if there are more favourable votes than
> unfavourable ones, the issue is considered to have passed --
>> regardless
 of
> the number of votes in each category. (If the number of votes seems
>> too
> small to be representative of a community consensus, the issue is
 typically
> not pursued. However, see the description of lazy consensus
>  for a
> modifying factor.)
> 
> The vote will run until Friday Nov 2nd at 6:00 am EST
> 
> Thanks,
> Carin
> 
 
>>> 
>> 


Re: [DISCUSS] - Revisions to Committer Criteria

2018-10-28 Thread Naveen Swamy
I added clarifying sections to explicitly call out committers/PMC
privileges. Please review.

Pasting here for convenience
Committer Privileges

   - Committers have write access to the code repository.
   - Committers have an @apache.org email address.
   - Committers can make short-term decisions for the project, approving
   and merging pull requests.
    - A committer's vote is *NOT* considered *binding*; thus the votes you
    cast do not carry a *veto* on issues that require consensus.
    - Committers can request changes on pull requests, but this does not
    constitute a veto; the PMC can agree to approve or reject the requested
    changes.

PMC Privileges

    - The PMC makes the long-term decisions with regard to the project.
    - PMC members have write access to the code repository.
    - PMC members have an @apache.org email address.
    - The PMC has access to the private@ email list.
    - The PMC has the right to vote on community-related decisions; PMC votes
    are *binding*.
    - The PMC has the right to propose active users for committership.
    - The PMC must vote on any formal release of the project's software product.


All, I suggest you review the proposal, and if there are any concerns, please
voice them here before this goes out for voting.


On Sun, Oct 28, 2018 at 8:04 AM Carin Meier  wrote:

> I plan to start a vote on adopting
>
> https://cwiki.apache.org/confluence/display/MXNET/Become+an+Apache+MXNet+%28incubating%29+Committer+and+PPMC+Member+Proposal
> to
> replace our current document
> https://cwiki.apache.org/confluence/display/MXNET/Becoming+a+Committer
> tomorrow
> (Monday).
>
> - Carin
>
> On Thu, Oct 25, 2018 at 8:32 AM Carin Meier  wrote:
>
> > Thanks for publishing the notes and also thanks everyone for providing
> > valuable feedback and discussion.
> >
> > I encourage everyone that has ideas for improvement to the document to
> > feel free to edit and revise. If you need a login to the wiki, please
> just
> > ask.
> >
> > Also, while editing, please keep in mind that the intent is to have a
> vote
> > on adopting the new
> >
> https://cwiki.apache.org/confluence/display/MXNET/Become+an+Apache+MXNet+%28incubating%29+Committer+and+PPMC+Member+Proposal
> > to replace our current document
> > https://cwiki.apache.org/confluence/display/MXNET/Becoming+a+Committer
> > before a vote on separating levels of committer and PPMC as a process.
> So,
> > if possible, adopt wording that would work under either outcome of that
> > vote.
> >
> > On the subject of voting, I was thinking of starting a vote on Friday,
> but
> > will delay that until the active discussions and revisions are complete.
> >
> > Best,
> > Carin
> >
> > On Thu, Oct 25, 2018 at 6:39 AM Pedro Larroy <
> pedro.larroy.li...@gmail.com>
> > wrote:
> >
> >> This is the first hangout that I was able to attend, I liked the format
> >> and
> >> found them valuable. Thanks for organizing and publishing the notes.
> >> Looking forward to the next one.
> >>
> >> Pedro
> >>
> >> On Thu, Oct 25, 2018 at 6:44 AM Steffen Rochel  >
> >> wrote:
> >>
> >> > Carin - please see
> >> >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/MXNET/Hangout+October+24th+2018+8am+and+5pm+PDT
> >> > :
> >> > Discussion about committer proposal:
> >> >
> >> >- Proposal default should be to have separation between committer
> and
> >> >PPMC election
> >> >- Criteria are vague, should we add some example persona?
> >> >- Spell out privileges of committer and PPMC member
> >> >
> >> >
> >> > Note: I update the project proposal to address first bullet.
> >> >
> >> > Steffen
> >> >
> >> >
> >> > On Wed, Oct 24, 2018 at 11:29 AM Carin Meier 
> >> wrote:
> >> >
> >> > > A request to whoever is taking notes at the MXNet Hangouts that are
> >> > > occurring today. Could you please recap feedback from the meeting in
> >> > > regards to document revisions here for everyone? I would like to
> >> attend
> >> > the
> >> > > session later today, but may not due to family obligations.
> >> > >
> >> > > Thanks!
> >> > > Carin
> >> > >
> >> > > On Tue, Oct 23, 2018 at 2:24 PM Steffen Rochel <
> >> steffenroc...@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Carin - I got feedback on my proposal and made changes. I
> >> incorporated
> >> > > > Tianqi's suggestion that we should strive to nominate
> committer/PPMC
> >> > > > candidates from outside one's own organization. It should not be
> >> > > considered
> >> > > > a hard rule, but a recommendation.
> >> > > >
> >> > > > Steffen
> >> > > >
> >> > > > On Mon, Oct 22, 2018 at 2:18 PM Carin Meier  >
> >> > > wrote:
> >> > > >
> >> > > > > Thanks Steffen for helping draft the proposal for Committer and
> >> PPMC
> >> > > > > guidelines.
> >> > > > >
> >> > > > > Please everyone review and provide feedback
> >> > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/MXNET/Become+an+Apache+MXNet+(incubating)+Committer+and+PPMC+Member+Proposal
> >> > > > > .
> >> > > > >
> >> > > > > I plan to start a 

Re: Join mxnet Slack

2018-10-24 Thread Naveen Swamy
Hi Piston,

Welcome to MXNet! Feel free to browse through our docs meant for developers
here
https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+Home

and let us know if you have any questions or feedback

Thanks, Naveen

On Wed, Oct 24, 2018 at 8:54 AM Piston Yang  wrote:

> Hi, I’m a deep user of Gluon for deep learning. Please let me join your
> channel.
>


MXNet Meetup Group in Copenhagen

2018-10-24 Thread Naveen Swamy
One of MXNet's users, Cosmin, has started a Meetup group in Copenhagen. Those
interested and living in the area, please sign up:
https://www.meetup.com/meetup-group-bdEUVQHL/

Thanks, Naveen

Re: [Discussion] PMC and Committer Courtesy: Only Propose Candidate in a Different Organization

2018-10-21 Thread Naveen Swamy
This suggestion looks like it is putting the onus on contributors to
collaborate with contributors outside their org in order to get nominated as
a committer or PMC member of this project.
Every organization has its own business goals; if, on the way to meeting
their objectives, their employees happen to be great contributors to this
project, I would expect PMC members (wearing their Apache hat) to recognize
them and give them a greater role in the project.
I would assume the responsibility of increasing diversity rests solely with
the PMC members; the PMC should look for ways to evangelize the project,
mentor new contributors, nominate them, and make them a part of the project's
journey.
I do agree that we have to increase diversity, and I suggest exploring
different approaches (for example, collaborating with other successful open
source projects to get their members excited about MXNet).

Guideline or not, I cannot agree to this in principle.
-1


On Sun, Oct 21, 2018 at 8:22 PM Tianqi Chen 
wrote:

> >
> >  Many potential committers and
> > PMC won’t interact with the non-Amazonians at all (since there are so
> few),
> > so they’d be relegated to obscurity and hopelessness by default.
> >
>
> If potential contributors do not come from Amazon, then the Amazonian PMC
> can nominate them :)  If the potential contributors do come from Amazon,
> then it is not a bad thing to interact with a bigger part of the community.
> I expect that as more non-Amazonian contributors get nominated, this
> would make the process healthier.
>
> Like neural networks, any guideline can be played in an adversarial fashion
> (e.g. in the case of the gray areas). I think pushing the guideline in good
> faith will understandably make people work together.
>
> After all, this is an Apache project that should go beyond a single
> company.
>
> Tianqi
>
> >
> >
> >
> > On Sun, Oct 21, 2018 at 5:06 PM Steffen Rochel 
> > wrote:
> >
> > > Hi Tianqi -
> > > +1 . I like the idea to grow diversity at the project and encourage
> > > communication beyond people sitting next to each other. I also support
> > the
> > > way you described as guideline, not has a hard rule. I think it is
> > > important we focus on merit and contributions when evaluating nominee
> for
> > > committer and PPMC.
> > >
> > > Carin started a draft document for revised criteria for committer and
> > PPMC
> > > membership
> > > <
> > >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Become+an+Apache+MXNet+%28incubating%29+Committer+and+PPMC+Member+Proposal
> > > >.
> > > I suggest to contribute, provide feedback and suggestion including your
> > > proposal.
> > >
> > > Steffen
> > >
> > > On Sun, Oct 21, 2018 at 10:22 AM Tianqi Chen 
> wrote:
> > >
> > > > Dear MXNet Community:
> > > >
> > > > There has been a great discussion going on in terms of
> > > > PMC/Committer Criteria.  As a community move forward, it is important
> > to
> > > > make the community inclusive to everyone and encourage folks to work
> > > > together.
> > > >
> > > > I want to propose the following courtesy: when a PMC member
> proposes
> > > > a
> > > > committer/PMC member, as a courtesy to the community, she/he should
> only
> > > > propose a person from a different organization (company).
> > > >
> > > > The idea behind that is that the Apache project goes beyond a single
> > > > organization; it is important to recognize others, including those
> > from a
> > > > different organization in the community, and to have your merit
> > > > recognized by others.
> > > >
> > > > Admittedly, this would also give more "power" to the PMC members from
> > > > minority organizations -- which I think is a good thing. This might
> > also
> > > > encourage everyone to work together and talk to folks who are beyond
> > your
> > > > next door
> > > >
> > > > Tianqi
> > > >
> > >
> >
>


Re: [Discussion] Recognise Reviewers, Besides Committers and PMC

2018-10-19 Thread Naveen Swamy
+1
Reviews are great contributions, given that it takes good understanding and
an investment of time to provide insightful feedback.

On Fri, Oct 19, 2018 at 11:51 AM Tianqi Chen  wrote:

> Dear MXNet Community:
>
> There is a great discussion going on in terms of lowering the barrier of
> entries and encourage more contribution to the project.  One of the general
> goals is to encourage a broader pool of contributions. I want to make the
> following proposal:
>
> Besides Committers and PMC, let us also recognize Reviewers in the
> community.  This is a "pseudo role" as there is no such official role in
> Apache. But I want to explore the possibility of recognizing active
> reviewers, for example, by adding a list of names to the contributor list.
> In general, I find it is really helpful to have more code reviews.
> Recognizing good reviewers early enables us to find committer candidates,
> and encourage them to contribute and understand what is the bar of code
> quality that is required to merge the code.
>
> This can provide the community with more evidence when recruiting new
> committers. After all, committership is about write access to the code and
> understanding the consequences of that responsibility -- which can usually
> be found in a high-quality review history.
>
> Please let me know what you think.
>
> Tianqi
>


Re: [Discussion] Separating PMC and Committership

2018-10-18 Thread Naveen Swamy
I suggest we discuss what the revised criteria for committers will be, and
how committers move up to become PMC members, before voting on the
separation. I would like to see if that helps grow the community before
voting on this.

@Chris Olivier  Can you clarify what you mean by
bona fide intentions and other intangibles? I would assume one can still
consider them while voting, as long as you can justify or support them
rather than basing the vote just on how someone feels.

On Thu, Oct 18, 2018 at 8:29 AM Chris Olivier  wrote:

> IMHO it’s not a great idea to develop hard criteria for committer and PMC
> as if it were some sort of checklist. If that were the case, then people
> would tend to be just laser-focused on checking items off the list rather
> than having a bona fide drive to improve the product and the community.  It
> would also make it difficult to consider other intangibles in the decision.
>
>
> On Thu, Oct 18, 2018 at 5:43 AM Carin Meier  wrote:
>
> > Thanks Michael for making the process clearer to me. It helps quite a
> bit.
> >
> > Also thanks to Chris and Steffen for your clarification and input.
> >
> > I think there are two issues that are intermingled in considering this.
> One
> > relates to separating levels of committer and PMC member. The other, as
> > Steffen pointed out, relates to the criteria which we use to consider
> > people for these levels of membership. I would propose that to make it
> > easier to achieve consensus, we consider them each as their own proposal.
> >
> > The proposal of separating levels of committer and PMC member can be
> > considered on the Apache definitions of rights and responsibilities here
> > https://www.apache.org/foundation/how-it-works.html#roles: Since the PMC
> > member has more rights and responsibilities than a committer, I think it
> > implies a stricter criteria, (although it would be unspecified in the
> > proposal).
> >
> > The proposal of redefining our project's criteria in respect to how we
> > consider nomination to those roles could be a separate discussion and
> vote
> > since there are other issues that we might want to tackle such as
> inclusion
> > of non-code contributions and general alignment to the Apache
> definitions.
> >
> > We can of course choose to tackle the proposal of redefining the criteria
> > first or do the separation of levels first since the discussion is
> already
> > in progress.
> >
> > Thoughts?
> >
> > - Carin
> >
> >
> >
> >
> >
> >
> > On Thu, Oct 18, 2018 at 2:04 AM Steffen Rochel 
> > wrote:
> >
> > > Haibin's proposed "For active contributors we first invite them to
> become
> > > our committers. Later on as they make significant contribution, we can
> > > invite them to PMC."
> > > Several people raised the question what defines "active contributors"
> and
> > > "make significant contribution". In my view the discussion has not
> > answered
> > > the questions and it is not clear to me what changes are proposed to
> > > https://cwiki.apache.org/confluence/display/MXNET/Becoming+a+Committer
> .
> > > I'm making the assumption that the proposal is to simplify the path for
> > > becoming a committer to grow the committer community. So far I have not
> > > heard what changes or simplifications are proposed. Without a change I
> > fail
> > > to see the benefit of this proposal to increase the number of
> committers.
> > > I agree that the path from committer to PMC member should be clarified
> as
> > > well and suggest to align with expectations and responsibilities of PMC
> > > members.
> > > I'm also under the assumption that the proposal only applies for future
> > > committers and PMC members, not for existing PMC members and this
> > > assumption should be clarified.
> > >
> > > Steffen
> > >
> > > On Wed, Oct 17, 2018 at 4:29 PM Chris Olivier 
> > > wrote:
> > >
> > > > I believe the assumption has always been that current PMC members
> will
> > > > remain PMC members.
> > > >
> > > > On Wed, Oct 17, 2018 at 3:51 PM Michael Wall 
> wrote:
> > > >
> > > > > I too think separating committers from PMC is a good idea for your
> > > > project
> > > > > given the desire to grow committers and the concerns I have seen
> > trying
> > > > to
> > > > > add new committers.  I saw at least one other mentor, Jim on this
> > > thread
> > > > > too.
> > > > >
> > > > > Is the plan to leave all current PMC members in the PMC?  If that
> is
> > > not
> > > > > the plan, perhaps more discussion is required before moving on.
> > > > >
> > > > > Assuming you feel the discussion is done, someone needs to start a
> > > vote.
> > > > > This would be a procedural change as outlined on
> > > > > https://www.apache.org/foundation/voting.html
> > > > >
> > > > > If I were doing it, I would announce on this thread I am starting a
> > > vote
> > > > on
> > > > > this matter tomorrow or some specified time.  I might even outline
> > what
> > > > the
> > > > > vote will be.  This give people a chance to speak up if they think
> > more
> > > > 

Storing PGP Key for Publishing packages

2018-10-17 Thread Naveen Swamy
I am collaborating with Zach Kimberg and Qing on automating the publishing
of the MXNet-Scala Maven package to the Apache Snapshot repo (either
nightly or weekly); currently the process is very tedious and time
consuming. To publish the package, the artifacts need to be signed with a
committer's key. However, Zach found that Apache seems to strictly advise
against storing PGP keys, so I suggested looking at what Spark does, and
he found that they release to Apache Snapshots as a nightly job, so they
must be storing the credentials on the host.
I am looking for advice from the mentors on how to proceed with this.

One option (not preferable) is to publish to a private repo or an S3
bucket, publishing to the Apache repo only during releases, so the keys
continue to remain under the committers' control.

-- Advice on PGP key storage from the Apache website --


“It is recommended that you create a PGP key for your apache.org address
now (or add that address to an existing key, if you have one). *DO NOT* create
this key on any machine to which multiple users have access and *DO NOT*,
ever, copy your private key to any other shared machine. Release managers
need to take particular care of keys used to sign releases.” (
https://www.apache.org/dev/new-committers-guide.html#set-up-security-and-pgp-keys
)

“Strictly speaking, releases must be *verified* on
hardware owned and controlled by the committer. That means hardware the
committer has physical possession and control of and exclusively full
administrative/superuser access to. That's because only such hardware is
qualified to hold a PGP private key, and the release should be verified on
the machine the private key lives on or on a machine as trusted as that.” (
https://www.apache.org/legal/release-policy.html#release-signing)

 ---


Thanks, Naveen


Re: [LAZY VOTE]: rename dockerfiles s/.build.//

2018-10-17 Thread Naveen Swamy
I agree with Kellen on not renaming the CI docker files (by renaming, I
think it becomes implicit that you can use these for production). I don't
think we should be telling our users to use these bloated docker files;
you could create lean, separate docker files for the production use case
with only the necessary runtime packages.

-1

On Wed, Oct 17, 2018 at 11:48 AM kellen sunderland <
kellen.sunderl...@gmail.com> wrote:

> Hey Pedro, sorry I still don't see a good reason to justify changing the
> filenames.  Renaming them to be less specific isn't going to explain to
> users what the purpose of the files is, and it could cause breakages with
> any system that refers to these files, including external companies' CI
> systems.  If I think of the benefits versus potential errors introduced by
> making the change I see more potential risk than obvious benefits.  I also
> feel that this change will make the difference between the runtime docker
> files and the CI docker files less clear to users, not more clear.  In
> general I think adding a descriptive README.md would serve our purpose
> better here.  Happy to hear what others think.
>
> On Wed, Oct 17, 2018 at 6:45 AM Pedro Larroy  >
> wrote:
>
> > Hi Kellen, thank you for your response.
> >
> > Maybe I didn't explain myself correctly. The purpose of this
> infrastructure
> > is not changed.
> >
> > I'm not planning to use these Dockerfiles as MXNet docker containers for
> > users to run MXNet, that is a separate concern.
> >
> > It is just that some of these Dockerfiles we use in CI to build, test and
> > generate documentation, so they are used as runtime containers as well. Thus
> > I'm just changing the paths for semantic reasons and removing the .build.,
> > which is just noise.
> >
> > As an example I would like to explain that we are about to merge the PR
> > which uses QEMU to run the unit tests, so there's an associated
> Dockerfile
> > which hosts the QEMU runtime environment used to execute the unit tests
> in
> > an ARM emulated machine. Thus it makes little sense that these Dockerfiles
> are
> > called "build".  I don't know if my explanation changes your vote. Either
> > way please let me know. Separating this change in a different PR was
> > suggested by several MXNet contributors during review.
> >
> > Pedro.
> >
> > On Wed, Oct 17, 2018 at 11:21 AM kellen sunderland <
> > kellen.sunderl...@gmail.com> wrote:
> >
> > > -1. (non-binding)
> > >
> > > These Dockerfiles are very bloated and imo only useful for creating a
> > build
> > > environment or running tests.  Just as you wouldn't setup a server for
> a
> > > service and then install 200 packages that may or may not be used for
> the
> > > service I wouldn't recommend using these Dockerfiles at runtime.
> Runtime
> > > Dockerfiles should in my opinion be as lightweight and suited to their
> > task
> > > as possible.
> > >
> > > On Wed, Oct 17, 2018, 1:58 AM Hagay Lupesko  wrote:
> > >
> > > > The PR provides a good explanation of this change and all code
> updates.
> > > > LGTM.
> > > >
> > > > On Tue, Oct 16, 2018 at 8:41 AM Pedro Larroy <
> > > pedro.larroy.li...@gmail.com
> > > > >
> > > > wrote:
> > > >
> > > > > Hi
> > > > >
> > > > > I would like to rename the dockerfiles since they are used as a
> > runtime
> > > > > environment and not only as build as they were initially intended.
> > > > >
> > > > > More info about the change in this PR:
> > > > > https://github.com/apache/incubator-mxnet/pull/12423/files
> > > > >
> > > > >
> > > > > Pedro.
> > > > >
> > > >
> > >
> >
>


Re: Creating branch for Java_API

2018-10-15 Thread Naveen Swamy
Thanks guys!

On Mon, Oct 15, 2018 at 4:01 PM Marco de Abreu
 wrote:

> The job will automatically run as soon as a new commit has been pushed.
>
> -Marco
>
> Anton Chernov  schrieb am Mo., 15. Okt. 2018, 20:13:
>
> > Does this work for you?
> >
> > http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/java-api/
> >
> > This will be the normal build with the main Jenkinsfile. I'm not sure
> about
> > the triggering for new commits though...
> >
> > Best
> > Anton
> >
> >
> > пн, 15 окт. 2018 г. в 19:46, Naveen Swamy :
> >
> > > Thanks Anton, appreciate the offer of help. At the moment we are
> creating
> > > PRs for both Master and Java and this branch is temporary, I would not
> > mind
> > > having a job if it isn't too much work.
> > >
> > > Thanks, Naveen
> > >
> > > On Mon, Oct 15, 2018 at 7:00 AM Anton Chernov 
> > wrote:
> > >
> > > > We could create a special job for testing it, maybe with a tweaked
> > > > Jenkinsfile so you could run only the tests you are interested in.
> What
> > > do
> > > > you think?
> > > >
> > > > Best
> > > > Anton
> > > >
> > > > пт, 12 окт. 2018 г. в 20:24, Naveen Swamy :
> > > >
> > > > > Hi All,
> > > > >
> > > > > Just wanted to inform there that I am going to create a branch on
> > > GitHub
> > > > > for the Java API work that Andrew/Qing and few others are doing.
> This
> > > is
> > > > > only temporary I realize this will not have testing.
> > > > > There seems to be continued disagreement in the approaches we are
> > > > > taking(which is fine), so I are going to create a branch and
> provide
> > > the
> > > > > code to a few interested users(within Amazon) and get concrete
> > > > > feedback from them.
> > > > >
> > > > > Thanks, Naveen
> > > > >
> > > >
> > >
> >
>


Re: Reproducing test failures on CI

2018-10-15 Thread Naveen Swamy
Timur,
Here is a meetup scheduled for 23rd October in London, where Pedro Larroy
will talk about Deep Learning using MXNet!

https://www.meetup.com/Deep-Learning-with-Apache-MXNet-London/events/255280739/


-Naveen

On Mon, Oct 15, 2018 at 11:18 AM Anton Chernov  wrote:

> Sorry, Timur, I've missed that part.
>
> It will be during the regular user group meeting that is conducted in
> Berlin and is streamed via Chime. You can find more information on the
> wiki:
>
>
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28Incubating%29+User+Groups+recurring+meetings
>
> Best
> Anton
>
>
> пн, 15 окт. 2018 г. в 18:45, Timur Shenkao :
>
> > Is it London meeting?
> > Or some other location?
> >
> > On Monday, October 15, 2018, Anton Chernov  wrote:
> >
> > > Dear MXNet community,
> > >
> > > We've noticed that there have been some difficulties setting up
> > > environments and reproducing test results from failed builds on the CI.
> > > We would like to offer some help to the community on that, and are
> > > therefore holding a small live-stream demo session during our User Group
> > > Meeting on the 23rd of October. We will be:
> > >
> > > * Reviewing a failure and making an initial guess on the cause
> > > * Setting up environment
> > > * Reproducing the build step from the CI
> > > * Reproducing a failure step
> > > * Making and submitting a fix back to the community
> > >
> > > Feel free to propose some additional topic for the streaming.
> > >
> > > Best regards
> > > Anton
> > >
> >
>


Re: Creating branch for Java_API

2018-10-15 Thread Naveen Swamy
Thanks Anton, I appreciate the offer of help. At the moment we are creating
PRs against both master and the Java branch, and this branch is temporary;
I would not mind having a job if it isn't too much work.

Thanks, Naveen

On Mon, Oct 15, 2018 at 7:00 AM Anton Chernov  wrote:

> We could create a special job for testing it, maybe with a tweaked
> Jenkinsfile so you could run only the tests you are interested in. What do
> you think?
>
> Best
> Anton
>
> пт, 12 окт. 2018 г. в 20:24, Naveen Swamy :
>
> > Hi All,
> >
> > Just wanted to inform there that I am going to create a branch on GitHub
> > for the Java API work that Andrew/Qing and few others are doing. This is
> > only temporary I realize this will not have testing.
> > There seems to be continued disagreement in the approaches we are
> > taking(which is fine), so I are going to create a branch and provide the
> > code to a few interested users(within Amazon) and get concrete
> > feedback from them.
> >
> > Thanks, Naveen
> >
>


Creating branch for Java_API

2018-10-12 Thread Naveen Swamy
Hi All,

Just wanted to inform you that I am going to create a branch on GitHub
for the Java API work that Andrew, Qing, and a few others are doing. This
is only temporary; I realize it will not have testing.
There seems to be continued disagreement in the approaches we are
taking (which is fine), so I am going to create a branch and provide the
code to a few interested users (within Amazon) to get concrete
feedback from them.

Thanks, Naveen


Re: [Discussion] Separating PMC and Committership

2018-10-10 Thread Naveen Swamy
Thanks for bringing up here. I think this topic and suggestions should be a
little more concrete by clarifying the difference between the role of
committer and PMC member.

Based on my understanding, here are my comments and concerns:

1) I agree we need to bring more committers into the project. How would you
change the existing (but not followed, IMO) committer criteria [1]?

2) Can you expand on what significant contribution means for becoming a PMC
member? I think it's important to recognize people who support the project
in ways other than code contributions, like those building
tutorials/documentation, managing CI, managing the community, etc.

3) There are ~40 PMC members (committers + mentors), out of which I can
count on one hand the number of people who participate on the list (not
questioning that) - this separation might lose the perspectives that new
members bring into the PMC.

4) Building consensus in the PMC - IMHO, the problem when we have
disagreements is whether we (self included) accept each other's feedback
and respectfully disagree with each other - that is something for the PMC
to contemplate. I wonder if keeping people out would solve that problem.

5) like other's mentioned, it's important to call out what does being a
committer and a PMC member mean.

[1] https://cwiki.apache.org/confluence/display/MXNET/Becoming+a+Committer


On Wed, Oct 10, 2018 at 7:29 AM Isabel Drost-Fromm 
wrote:

>
>
> Am 10. Oktober 2018 16:16:47 MESZ schrieb sandeep krishnamurthy <
> sandeep.krishn...@gmail.com>:
> >However, like others suggested, success of this whole effort will be
> >based
> >on defining clear responsibility of PMC, committers and path for the
> >community to be part of committers and PMC.
>
> PMC member and chair are ASF defined roles. Some getting started docs:
>
> http://www.apache.org/foundation/how-it-works.html#pmc
>
> So to make my previous ask more explicit: Before discussing pros and cons
> of splitting roles I think it would make sense for everyone to either share
> their understanding of what those roles are or research the terms and share
> their resulting understanding. From the discussion so far to me it looks
> like this could be a helpful exercise to avoid confusion.
>
> Isabel
>
> --
> Diese Nachricht wurde von meinem Android-Gerät mit K-9 Mail gesendet.
>


Re: Create a Jira board for C/C++ API project

2018-10-05 Thread Naveen Swamy
Thanks for making the effort to bring tasks and feature improvements under
managed boards in JIRA. This will help users/contributors quickly look
through the stories/tasks and contribute to the parts of the project they
are interested in.

We were able to create JIRA boards/filters and share them across the
project earlier. For the last few months I have been unable to share the
boards that I create with the project (there was possibly an upgrade to
JIRA that removed access). Now it looks like it needs a global permission
to create shared boards and filters. [1]

We will need to create a ticket with Apache INFRA to grant access. Can one
of the mentors help create the INFRA ticket?

[1]
https://confluence.atlassian.com/adminjiracloud/managing-global-permissions-776636359.html

Thanks, Naveen

On Fri, Oct 5, 2018 at 11:14 AM Davydenko, Denis <
dzianis.davydze...@gmail.com> wrote:

> Hello, MXNet community,
>
>
>
> As part of the day job of mine and a couple of my teammates, we are working
> on contributing to the C++ and C APIs that MXNet exposes. We would like to
> propose creating a separate board in JIRA in order to make it easier to
> track work around MXNet C/C++ APIs. Very similar to
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=211
> (which we are using for managing development of Scala and Java APIs) but
> bound to show C/C++ sprints and work items. This will also give a better
> exposure to C/C++ API work for users to be aware of where these APIs moving
> as well as will make it easier to manage work on those APIs between
> multiple contributors.
>
>
>
> --
>
> Thanks,
>
> Denis
>
>


Re: CUDNN algorithm selection failure

2018-10-04 Thread Naveen Swamy
Looking at the error raised, you can see that the workspace size (GPU
memory) of 1 GB isn't sufficient. I am wondering if it is due to tests
running in parallel on CI; if so, is it possible to reduce the parallelism?
Error:
"mxnet.base.MXNetError: [05:40:12]
src/operator/nn/./cudnn/cudnn_convolution-inl.h:870: Failed to find any
forward convolution algorithm.  with workspace size of 1073741824 bytes,
please consider reducing batch/model size or increasing the workspace size"

I ran a similar test (test_slice_batchnorm) 5K times and I couldn't
reproduce the issue. I will look into it further to see if there are other
alternatives.
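For intuition on why a fixed 1073741824-byte (1 GB) workspace can be exhausted, here is a rough, illustrative memory estimate, assuming an NCHW float32 input tensor. The shape used is an invented example, and the real cuDNN workspace requirement additionally depends on which convolution algorithm is selected:

```java
public class ConvMemEstimate {
    // Bytes needed for an N x C x H x W tensor with the given element size.
    static long tensorBytes(long n, long c, long h, long w, int bytesPerElem) {
        return n * c * h * w * bytesPerElem;
    }

    public static void main(String[] args) {
        // A single 32 x 64 x 224 x 224 float32 input already costs ~392 MB,
        // before any algorithm-specific cuDNN workspace is counted.
        long bytes = tensorBytes(32, 64, 224, 224, 4);
        System.out.println("input tensor: " + bytes + " bytes (~"
                + (bytes / (1024 * 1024)) + " MB)");
    }
}
```

With a few such tensors (input, output, weights, gradients) live at once, parallel test processes sharing one GPU can plausibly leave less than 1 GB for the workspace.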


On Thu, Oct 4, 2018 at 10:48 AM Piyush Ghai  wrote:

> Another build where test_slice_batchnorm_reshape_batchnorm fails :
>
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-12721/7/pipeline
> <
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-12721/7/pipeline
> >
>
> —
> Piyush
>
> > On Oct 3, 2018, at 9:32 AM, Pedro Larroy 
> wrote:
> >
> > Seems is not the only test:
> >
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-12726/5/pipeline
> >
> > test_slice_batchnorm_reshape_batchnorm is also failing and hasn't been
> > touched for a while. It doesn't look like a problem with the test to me,
> > (not a flaky test). Looks to me that we should find and address the root
> cause
> > instead of disabling the test in this case.
> >
> > Pedro.
> >
> > On Tue, Oct 2, 2018 at 2:39 AM Marco de Abreu
> >  wrote:
> >
> >> I have created an issue at
> >> https://github.com/apache/incubator-mxnet/issues/12715 and a PR to
> disable
> >> the test at https://github.com/apache/incubator-mxnet/pull/12716.
> >>
> >> This test is pretty new and was submitted with a number of other
> >> problematic (and disabled) tests:
> >> https://github.com/apache/incubator-mxnet/issues/11164 It could be
> >> possible
> >> that the test is simply not stable enough. The PR that introduced that
> test
> >> is https://github.com/apache/incubator-mxnet/pull/10921 - it was merged
> >> two
> >> days ago.
> >>
> >> Best regards,
> >> Marco
> >>
> >> On Tue, Oct 2, 2018 at 8:43 AM Pedro Larroy <
> pedro.larroy.li...@gmail.com>
> >> wrote:
> >>
> >>> Thanks for checking Lin. If it happens again we will have to dig
> deeper.
> >> We
> >>> have just one executor in GPU so I wonder what could be the root cause
> of
> >>> this.
> >>>
> >>> On Mon, Oct 1, 2018 at 10:57 PM Lin Yuan  wrote:
> >>>
>  I could not reproduce the error on an EC2 g3x8 instance making it hard
> >> to
>  debug. I also suspect it was due to resource usage limit on ci
> >>> Instance.
> 
>  On Mon, Oct 1, 2018 at 10:40 PM Pedro Larroy <
> >>> pedro.larroy.li...@gmail.com
> >
>  wrote:
> 
> > It doesn't look like flakiness to me at first sight. I think it might
> >>> be
> > related to resource usage / allocation / leak in the worst case.
> >
> > Could be that there was not enough memory GPU memory at the time of
> >>> test
> > execution. But I'm just speculating, hence my original question.
> >
> > Pedro.
> >
> > On Mon, Oct 1, 2018 at 8:16 PM Lin Yuan  wrote:
> >
> >> Hi Pedro,
> >>
> >> I also got this failure in my PR
> >>
> >>
> >
> 
> >>>
> >>
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-11742/27/pipeline
> >>
> >> I was not able to identify the root cause of it from changelist.
> >> Are
>  you
> >> suggesting there is some flakiness in the master branch too?
> >>
> >> Thanks,
> >>
> >> Lin
> >>
> >> On Mon, Oct 1, 2018 at 4:55 PM Pedro Larroy <
> > pedro.larroy.li...@gmail.com>
> >> wrote:
> >>
> >>> Hi
> >>>
> >>> I saw this failure on CI:
> >>>
> >>>
> >>
> >
> 
> >>>
> >>
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/1697/pipeline
> >>>
> >>> Have you seen other cases where we fail to select the best CUDNN
> >> algorithm?
> >>> In which circumstances this could happen, and do you think is a
> >>> good
> > idea
> >>> to have one selected by default as a last resort?
> >>>
> >>>
> >>> Pedro.
> >>>
> >>
> >
> 
> >>>
> >>
>
>


Re: Subscription

2018-10-01 Thread Naveen Swamy
Invited

On Mon, Oct 1, 2018 at 12:39 PM Jim Jagielski  wrote:

> I'd like an invite as well, please :)
>
> > On Sep 29, 2018, at 12:03 PM, Naveen Swamy  wrote:
> >
> > Invite sent. Welcome to Apache MXNet Cosmin :).
> >
> >
> > On Sat, Sep 29, 2018 at 11:38 AM Cosmin Cătălin Sanda <
> > cosmincata...@gmail.com> wrote:
> >
> >> Hi, I would like to subscribe to the ASF mxnet channel.
> >> 
> >> *Cosmin Catalin SANDA*
> >> Data Scientist & Engineer
> >> Phone: +45.27.30.60.35
> >> Web: https://cosminsanda.com
> >>
>
>


Re: Feedback request for new Java API

2018-09-30 Thread Naveen Swamy
 just
> > picked an alternative.
> >
> > We should think a bit long term here. Java is a HUGE market with a
> big user
> > base, especially in the Enterprise segment. So, theoretically
> speaking,
> > even if we duplicate everything and decouple it from scala entirely,
> we
> > might have a good chance that - given a good design - other people
> will
> > step up and help maintaining and extending the API. If the process to
> > change anything in that API is super cryptic, we might block future
> efforts
> > because the entry barrier is too high.
> >
> > I think our python API is the best example. We are finally at a
> state where
> > it's super easy to extend, the usability is good (especially around
> gluon)
> > and it fits into the languages' idioms. Let's try to take that
> experience
> > and apply it to the Java API. This might be more work initially and
> create
> > redundancy, but we have seen that a good API (external as well as
> internal)
> > design increases contributions and attracts more users.
> >
> > I won't take any side here because I'm unfamiliar with the scala
> side of
> > mxnet, just wanted to add an external viewpoint.
> >
> > -Marco
> >
> > YiZhi Liu  schrieb am So., 30. Sep. 2018,
> 05:15:
> >
> >> Yes agreement and disagreement stay at  technical level only:)
> >>
> >> Back to the problem, they are unnecessary but good in terms of,
> >> 1. Still not good for java users to write 3 nulls in a function
> call with 5
> >> or 4 args
> >> 2. Every function call with a “tail” null for arg “out”. I would
> say, makes
> >> it seems not a serious api design to our users
> >> 3. Users have uniformed experience, nothing surprising.
> >>
> >> Given the reasons I listed before, I don’t see things bad for this.
> >>
> >> I agree we can vote. I suggest to have two votes, one for builder,
> one for
> >> separating java and scala objects.
> >>
> >> On Sat, Sep 29, 2018 at 7:43 PM Naveen Swamy 
> wrote:
> >>
> >>> Ah! we agree on something :) lets get more opinions, I am happy to
> go
> >> with
> >>> it.
> >>>
> >>> On Sat, Sep 29, 2018 at 10:40 PM YiZhi Liu 
> wrote:
> >>>
> >>>> Also sometimes people may not be at the same page when talking
> about
> >>> option
> >>>> #2. What I insist is the builder classes for each operator.
> Otherwise I
> >>>> actually more support Naveen’s approach - not to totally separate
> java
> >>> and
> >>>> scala objects.
> >>>>
> >>>> On Sat, Sep 29, 2018 at 7:35 PM YiZhi Liu 
> wrote:
> >>>>
> >>>>> No you haven't answered my question "Since you agree to have 30+
> >>>>> operators have Builder, what prevents from
> >>>>> having all of them have Builder?"
> >>>>> On Sat, Sep 29, 2018 at 7:30 PM Naveen Swamy  >
> >>> wrote:
> >>>>>> I think we have had enough of an debate between the two of us
> and I
> >>>> have
> >>>>>> already listed my reasons, I will stop here and see what others
> say
> >>>>> given
> >>>>>> my reasoning.
> >>>>>>
> >>>>>> -1 to #2)
> >>>>>>
> >>>>>> Also, by lecture I meant to say  "I don't want to list all the
> >>> problems
> >>>>>> with unnecessary complications and talk about how to design
> >> software"
> >>>>>> On Sat, Sep 29, 2018 at 10:15 PM YiZhi Liu  >
> >>>> wrote:
> >>>>>>> And if we find incorrect declaration, we fix it, not simply
> >>> assuming
> >>>>>>> many of them also has problem and we cannot rely on them -
> >>> otherwise
> >>>>>>> the type-safe APIs in Scala also does not make sense.
> >>>>>>> On Sat, Sep 29, 2018 at 7:10 PM YiZhi Liu  >
> >>>> wrote:
> >>>>>>>> It also makes sense to me if we have it under namespace
> >>

Re: Feedback request for new Java API

2018-09-29 Thread Naveen Swamy
Ah! We agree on something :) Let's get more opinions; I am happy to go with
it.

On Sat, Sep 29, 2018 at 10:40 PM YiZhi Liu  wrote:

> Also sometimes people may not be at the same page when talking about option
> #2. What I insist is the builder classes for each operator. Otherwise I
> actually more support Naveen’s approach - not to totally separate java and
> scala objects.
>
> On Sat, Sep 29, 2018 at 7:35 PM YiZhi Liu  wrote:
>
> > No you haven't answered my question "Since you agree to have 30+
> > operators have Builder, what prevents from
> > having all of them have Builder?"
> > On Sat, Sep 29, 2018 at 7:30 PM Naveen Swamy  wrote:
> > >
> > > I think we have had enough of an debate between the two of us and I
> have
> > > already listed my reasons, I will stop here and see what others say
> > given
> > > my reasoning.
> > >
> > > -1 to #2)
> > >
> > > Also, by lecture I meant to say  "I don't want to list all the problems
> > > with unnecessary complications and talk about how to design software"
> > >
> > > On Sat, Sep 29, 2018 at 10:15 PM YiZhi Liu 
> wrote:
> > >
> > > > And if we find incorrect declaration, we fix it, not simply assuming
> > > > many of them also has problem and we cannot rely on them - otherwise
> > > > the type-safe APIs in Scala also does not make sense.
> > > > On Sat, Sep 29, 2018 at 7:10 PM YiZhi Liu 
> wrote:
> > > > >
> > > > > It also makes sense to me if we have it under namespace NDArray,
> not
> > > > > creating new JavaNDArray. But again, uniform experience is
> important.
> > > > >
> > > > > What I responded is your comment "keep scala macros minimum", I
> don't
> > > > > think "scala macro" equals "cryptic code". Even though it does,
> what
> > > > > we need to do is to find an alternative way to do code generation,
> > not
> > > > > making code generation minimum.
> > > > >
> > > > > Since you agree to have 30+ operators have Builder, what prevents
> > from
> > > > > having all of them have Builder?
> > > > > - They're auto-generated, the auto-generation "cryptic" code is
> > anyway
> > > > > there. And "two different paths of code" (though I don't totally
> > > > > agree) is anyway there.
> > > > > - What else? 200+ classes is a very tiny increasing in file size
> > > > > (~3MB) compare to current status. And won't have any performance
> > issue
> > > > > on modern JVM.
> > > > >
> > > > > Just remind, technical discussion is not about who gives who a
> > lecture.
> > > > > On Sat, Sep 29, 2018 at 6:41 PM Naveen Swamy 
> > wrote:
> > > > > >
> > > > > > Well, I am not sure(I don't think) we need Builder for every API
> in
> > > > > > NDArray. For APIs that take long list of parameters, I agree to
> add
> > > > Builder.
> > > > > > Look at the API distribution based on number of arguments here:
> > > > > > https://gist.github.com/nswamy/2dea72e514cc7bfc675f68aef9fe78bb
> > > > > > about 30 APIs have 7 or more arguments. I agree to add Builders for
> > > > > > these APIs, but to the existing Scala APIs rather than separately
> > > > > > for Java only.
> > > > > > APIs sorted by number of arguments is here, take a look :
> > > > > > https://gist.github.com/nswamy/e941cb94658b3960eec40bf00b970ac5
> > > > > >
> > > > > > Many of the arguments, I think, are actually mandatory but are
> > > > > > incorrectly declared optional on the backend; for example, look at
> > > > > > SwapAxis:
> > > > > > "def SwapAxis (data : NDArray, dim1 : Option[Int] = None, dim2 :
> > > > > > Option[Int] = None, out : Option[NDArray] = None) :
> > NDArrayFuncReturn"
> > > > > > Why is dim1 and dim2 Optional, this is an error in the
> declaration
> > on
> > > > the
> > > > > > backend, I think there might be many of these?
> > > > > >
> > > > > > My answers to your other responses are below inline:
> > > > > >
> > > > > > On Sat, Sep 29, 20

Re: Feedback request for new Java API

2018-09-29 Thread Naveen Swamy
IMO, it's unnecessary. This is only my opinion and you are free to disagree.
With more opinions, one or some of us will have to commit to the majority
view.

On Sat, Sep 29, 2018 at 10:35 PM YiZhi Liu  wrote:

> No you haven't answered my question "Since you agree to have 30+
> operators have Builder, what prevents from
> having all of them have Builder?"
> On Sat, Sep 29, 2018 at 7:30 PM Naveen Swamy  wrote:
> >
> > I think we have had enough of an debate between the two of us and I have
> > already listed my reasons, I will stop here and see what others say
> given
> > my reasoning.
> >
> > -1 to #2)
> >
> > Also, by lecture I meant to say  "I don't want to list all the problems
> > with unnecessary complications and talk about how to design software"
> >
> > On Sat, Sep 29, 2018 at 10:15 PM YiZhi Liu  wrote:
> >
> > > And if we find incorrect declaration, we fix it, not simply assuming
> > > many of them also has problem and we cannot rely on them - otherwise
> > > the type-safe APIs in Scala also does not make sense.
> > > On Sat, Sep 29, 2018 at 7:10 PM YiZhi Liu  wrote:
> > > >
> > > > It also makes sense to me if we have it under namespace NDArray, not
> > > > creating new JavaNDArray. But again, uniform experience is important.
> > > >
> > > > What I responded is your comment "keep scala macros minimum", I don't
> > > > think "scala macro" equals "cryptic code". Even though it does, what
> > > > we need to do is to find an alternative way to do code generation,
> not
> > > > making code generation minimum.
> > > >
> > > > Since you agree to have 30+ operators have Builder, what prevents
> from
> > > > having all of them have Builder?
> > > > - They're auto-generated, the auto-generation "cryptic" code is
> anyway
> > > > there. And "two different paths of code" (though I don't totally
> > > > agree) is anyway there.
> > > > - What else? 200+ classes is a very tiny increasing in file size
> > > > (~3MB) compare to current status. And won't have any performance
> issue
> > > > on modern JVM.
> > > >
> > > > Just remind, technical discussion is not about who gives who a
> lecture.
> > > > On Sat, Sep 29, 2018 at 6:41 PM Naveen Swamy 
> wrote:
> > > > >
> > > > > Well, I am not sure(I don't think) we need Builder for every API in
> > > > > NDArray. For APIs that take long list of parameters, I agree to add
> > > Builder.
> > > > > Look at the API distribution based on number of arguments here:
> > > > > https://gist.github.com/nswamy/2dea72e514cc7bfc675f68aef9fe78bb
> > > > > about 30 APIs have 7 or more arguments.. I agree to add Builders
> for
> > > these
> > > > > APIs not separately but to the existing Scala APIs but not
> separately
> > > only
> > > > > for Java.
> > > > > APIs sorted by number of arguments is here, take a look :
> > > > > https://gist.github.com/nswamy/e941cb94658b3960eec40bf00b970ac5
> > > > >
> > > > > Many of the arguments i think are actually mandatory but
> incorrectly
> > > > > declared optional on the backend, for example look at SwapAxis
> > > > > "def SwapAxis (data : NDArray, dim1 : Option[Int] = None, dim2 :
> > > > > Option[Int] = None, out : Option[NDArray] = None) :
> NDArrayFuncReturn"
> > > > > Why is dim1 and dim2 Optional, this is an error in the declaration
> on
> > > the
> > > > > backend, I think there might be many of these?
> > > > >
> > > > > My answers to your other responses are below inline:
> > > > >
> > > > > On Sat, Sep 29, 2018 at 3:37 PM YiZhi Liu 
> wrote:
> > > > >
> > > > > > Some of my comments inline:
> > > > > >
> > > > > > > Why can we not create the builder just for these APIs( which we
> > > > > > discussed), why is it necessary to add 200 Apis
> > > > > > It is about unified user-experience. And we get rid of annoying
> extra
> > > > > > "out=null" in every operator.
> > > > > >
> > > > > > > Are you suggesting to create builder for each and every API?
> > > > > > Only for those are necessary. For NDArray.XXX, yes.
>

Re: Feedback request for new Java API

2018-09-29 Thread Naveen Swamy
I am not sure what makes you think that I am suggesting we should not fix
it. I am pointing out that many of those are incorrectly optional, so don't
take that into consideration when deciding whether we need builders for
all.

On Sat, Sep 29, 2018 at 10:15 PM YiZhi Liu  wrote:

> And if we find incorrect declaration, we fix it, not simply assuming
> many of them also has problem and we cannot rely on them - otherwise
> the type-safe APIs in Scala also does not make sense.
> On Sat, Sep 29, 2018 at 7:10 PM YiZhi Liu  wrote:
> >
> > It also makes sense to me if we have it under namespace NDArray, not
> > creating new JavaNDArray. But again, uniform experience is important.
> >
> > What I responded is your comment "keep scala macros minimum", I don't
> > think "scala macro" equals "cryptic code". Even though it does, what
> > we need to do is to find an alternative way to do code generation, not
> > making code generation minimum.
> >
> > Since you agree to have 30+ operators have Builder, what prevents from
> > having all of them have Builder?
> > - They're auto-generated, the auto-generation "cryptic" code is anyway
> > there. And "two different paths of code" (though I don't totally
> > agree) is anyway there.
> > - What else? 200+ classes is a very tiny increasing in file size
> > (~3MB) compare to current status. And won't have any performance issue
> > on modern JVM.
> >
> > Just remind, technical discussion is not about who gives who a lecture.
> > On Sat, Sep 29, 2018 at 6:41 PM Naveen Swamy  wrote:
> > >
> > > Well, I am not sure(I don't think) we need Builder for every API in
> > > NDArray. For APIs that take long list of parameters, I agree to add
> Builder.
> > > Look at the API distribution based on number of arguments here:
> > > https://gist.github.com/nswamy/2dea72e514cc7bfc675f68aef9fe78bb
> > > about 30 APIs have 7 or more arguments.. I agree to add Builders for
> these
> > > APIs not separately but to the existing Scala APIs but not separately
> only
> > > for Java.
> > > APIs sorted by number of arguments is here, take a look :
> > > https://gist.github.com/nswamy/e941cb94658b3960eec40bf00b970ac5
> > >
> > > Many of the arguments i think are actually mandatory but incorrectly
> > > declared optional on the backend, for example look at SwapAxis
> > > "def SwapAxis (data : NDArray, dim1 : Option[Int] = None, dim2 :
> > > Option[Int] = None, out : Option[NDArray] = None) : NDArrayFuncReturn"
> > > Why is dim1 and dim2 Optional, this is an error in the declaration on
> the
> > > backend, I think there might be many of these?
> > >
> > > My answers to your other responses are below inline:
> > >
> > > On Sat, Sep 29, 2018 at 3:37 PM YiZhi Liu  wrote:
> > >
> > > > Some of my comments inline:
> > > >
> > > > > Why can we not create the builder just for these APIs( which we
> > > > discussed), why is it necessary to add 200 Apis
> > > > It is about unified user-experience. And we get rid of annoying extra
> > > > "out=null" in every operator.
> > > >
> > > > > Are you suggesting to create builder for each and every API?
> > > > Only for those are necessary. For NDArray.XXX, yes.
> > > >
> > > I think this is a ridiculous list of Builders, I think we can keep the
> > > 'out' parameter
> > >
> > > > 1) The NDArray APIs in question are not following functional style of
> > > > programming, in fact they are just static methods defined on an
> > > > NDArray object - so Scala users are not losing much by using null in
> > > > place of None.
> > > > You can create a implicit to maintain backward compatibility
> > > > - I doubt implicit can work in such case from None -> null.
> > > >
> > >
> > > It is just writing getOrElse in your implicit, so it will work.
> > > scala> implicit def optionStringToString(a: Option[String]): String = {
> > >  | a.getOrElse(null)
> > >  | }
> > >
> > > 2) It is adding 220+ APIs(I understand it is generated) for NDArray
> alone
> > > > - As I explained how it can improve user experiences
> > > >
> > > I don't think we need to write builders for 221 APIs we have, may be
> for 30
> > > or so. Uniform experience is good goal but it also has to be practical
> and
> > > make sense.
> > >
>
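
[Editor's note] As background for the default-argument debate above: Java has no
default parameter values and no Scala-style implicits, so the JVM-idiomatic
alternatives are method overloading or a builder. Below is a minimal, hedged
sketch of the overloading approach; `ActivationDemo`, its `activation` method,
and the use of `String` as a stand-in for `NDArray` are all illustrative
assumptions, not the actual MXNet API.

```java
public class ActivationDemo {
    // Hypothetical stand-in for an operator call; not the real MXNet API.
    // Java lacks default parameter values, so an optional argument such as
    // "out" is typically simulated with an overload that delegates to the
    // full form.
    public static String activation(String data, String actType, String out) {
        return "activation(data=" + data + ", act_type=" + actType
                + ", out=" + out + ")";
    }

    // Overload covering the common case where no output array is supplied.
    public static String activation(String data, String actType) {
        return activation(data, actType, null);
    }
}
```

With this shape, Java callers in the common case never see the trailing
`null`, which is one way to address the "extra out=null in every operator"
complaint without generating a builder per operator.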

Re: Feedback request for new Java API

2018-09-29 Thread Naveen Swamy
I think we have had enough of a debate between the two of us, and I have
already listed my reasons. I will stop here and see what others say, given
my reasoning.

-1 to #2)

Also, by lecture I meant to say  "I don't want to list all the problems
with unnecessary complications and talk about how to design software"

On Sat, Sep 29, 2018 at 10:15 PM YiZhi Liu  wrote:

> And if we find incorrect declaration, we fix it, not simply assuming
> many of them also has problem and we cannot rely on them - otherwise
> the type-safe APIs in Scala also does not make sense.
> On Sat, Sep 29, 2018 at 7:10 PM YiZhi Liu  wrote:
> >
> > It also makes sense to me if we have it under namespace NDArray, not
> > creating new JavaNDArray. But again, uniform experience is important.
> >
> > What I responded is your comment "keep scala macros minimum", I don't
> > think "scala macro" equals "cryptic code". Even though it does, what
> > we need to do is to find an alternative way to do code generation, not
> > making code generation minimum.
> >
> > Since you agree to have 30+ operators have Builder, what prevents from
> > having all of them have Builder?
> > - They're auto-generated, the auto-generation "cryptic" code is anyway
> > there. And "two different paths of code" (though I don't totally
> > agree) is anyway there.
> > - What else? 200+ classes is a very tiny increasing in file size
> > (~3MB) compare to current status. And won't have any performance issue
> > on modern JVM.
> >
> > Just remind, technical discussion is not about who gives who a lecture.
> > On Sat, Sep 29, 2018 at 6:41 PM Naveen Swamy  wrote:
> > >
> > > Well, I am not sure(I don't think) we need Builder for every API in
> > > NDArray. For APIs that take long list of parameters, I agree to add
> Builder.
> > > Look at the API distribution based on number of arguments here:
> > > https://gist.github.com/nswamy/2dea72e514cc7bfc675f68aef9fe78bb
> > > about 30 APIs have 7 or more arguments.. I agree to add Builders for
> these
> > > APIs not separately but to the existing Scala APIs but not separately
> only
> > > for Java.
> > > APIs sorted by number of arguments is here, take a look :
> > > https://gist.github.com/nswamy/e941cb94658b3960eec40bf00b970ac5
> > >
> > > Many of the arguments i think are actually mandatory but incorrectly
> > > declared optional on the backend, for example look at SwapAxis
> > > "def SwapAxis (data : NDArray, dim1 : Option[Int] = None, dim2 :
> > > Option[Int] = None, out : Option[NDArray] = None) : NDArrayFuncReturn"
> > > Why is dim1 and dim2 Optional, this is an error in the declaration on
> the
> > > backend, I think there might be many of these?
> > >
> > > My answers to your other responses are below inline:
> > >
> > > On Sat, Sep 29, 2018 at 3:37 PM YiZhi Liu  wrote:
> > >
> > > > Some of my comments inline:
> > > >
> > > > > Why can we not create the builder just for these APIs( which we
> > > > discussed), why is it necessary to add 200 Apis
> > > > It is about unified user-experience. And we get rid of annoying extra
> > > > "out=null" in every operator.
> > > >
> > > > > Are you suggesting to create builder for each and every API?
> > > > Only for those are necessary. For NDArray.XXX, yes.
> > > >
> > > I think this is a ridiculous list of Builders, I think we can keep the
> > > 'out' parameter
> > >
> > > > 1) The NDArray APIs in question are not following functional style of
> > > > programming, in fact they are just static methods defined on an
> > > > NDArray object - so Scala users are not losing much by using null in
> > > > place of None.
> > > > You can create a implicit to maintain backward compatibility
> > > > - I doubt implicit can work in such case from None -> null.
> > > >
> > >
> > > It is just writing getOrElse in your implicit, so it will work.
> > > scala> implicit def optionStringToString(a: Option[String]): String = {
> > >  | a.getOrElse(null)
> > >  | }
> > >
> > > 2) It is adding 220+ APIs(I understand it is generated) for NDArray
> alone
> > > > - As I explained how it can improve user experiences
> > > >
> > > I don't think we need to write builders for 221 APIs we have, may be
> for 30
> > > or so. Uniform experience

Re: [DISCUSS] Use modernized C++11 range loops uniformly throughout the project

2018-09-29 Thread Naveen Swamy
Thanks Kellen & Anton for your detailed explanations and links to the
advantages; I appreciate it.
Changing my vote to *-0*; I suggest showing these as warnings.

On Sat, Sep 29, 2018 at 8:06 PM Anton Chernov  wrote:

> And if you want a more authoritative opinion on that check out what the C++
> core guidelines are saying [1]:
>
> > ES.71: Prefer a range-for-statement to a for-statement when there is a
> choice
> > Reason
> > Readability. Error prevention. Efficiency.
>
> Best regards
> Anton
>
> [1]
>
> https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#Res-for-range
>
>
> сб, 29 сент. 2018 г. в 16:13, Anton Chernov :
>
> > +1
> >
> > Maybe it's not necessary to enforce usage of range-based for, but I would
> > highly encourage to to it due to already named advantages. If code would
> be
> > introduced using the old-style there could be a comment suggesting the
> new
> > way. But why do the manual work and not leave that to the automated tool?
> >
> > And since it's already automated - wouldn't it be better to keep a
> unified
> > modern style?
> >
> > Just to make this a trend - C++ evolves quickly and this will not be only
> > upgrade that would needed to be made. And the easier such upgrades get
> > accepted the easier in general is to upgrade the codebase.
> >
> > Soon the standard will get ranges and concepts and this will change the
> > way C++ applications get written significantly. It is a good habit to be
> > open for changes and keep up with the trends. By using the new
> > possibilities the language can offer you prepare yourself for further
> > changes and are more likely to accept them, evolving your programming
> style.
> >
> > Take a look at a new examples on modern usages (taken from [1]):
> >
> > // since C++17
> > for (auto&& [first,second] : mymap) {
> > // use first and second
> > }
> >
> > // since C++20
> > for (auto& x : foo().items()) { /* .. */ } // undefined behavior if foo()
> > returns by value
> > for (T thing = foo(); auto& x : thing.items()) { /* ... */ } // OK
> >
> > // since C++11
> > struct cow_string { /* ... */ };
> > // a copy-on-write string cow_string str = /* ... */;
> > // for(auto x : str) { /* ... */ } // may cause deep copy
> > for(auto x : std::as_const(str)) { /* ... */ }
> >
> > Regarding performance: it's really easy to prove that generated assembly
> > is not changing at all. There is a really handy tool for that [2]. You
> can
> > check online the assembly for different language constructs and different
> > compilers.
> >
> > Best regards,
> > Anton
> >
> > [1] https://en.cppreference.com/w/cpp/language/range-for
> > [2] https://gcc.godbolt.org
> >
> > сб, 29 сент. 2018 г. в 13:15, kellen sunderland <
> > kellen.sunderl...@gmail.com>:
> >
> >> It's more readable because it's concise and it's consistent for many
> types
> >> you're looping over (i.e. primitive arrays, stl iterators, etc all work
> >> the
> >> same way).  It's also useful because it's consistent with other
> >> programming
> >> languages, making C++ codebases much easier to read for novice and
> >> intermediate developers.  IMO it also leads to better naming in loop
> >> bodies
> >> as the concise style means you're less likely to have important 1 letter
> >> variable names describing loop elements (e.g. no int i =0 or it ...).
> >> More
> >> motivation can be found in the cpp standards proposals for C++11
> >> http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2005/n1868.html and
> >> http://open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3853.htm.
> >>
> >>
> >>
> >> On Sat, Sep 29, 2018 at 6:38 PM Naveen Swamy 
> wrote:
> >>
> >> > Kellen,
> >> >
> >> > Could you please explain why you think range loops are better and how
> it
> >> > improves readability?  this is a relatively new feature, many of them
> >> are
> >> > used to the old syntax, shouldn't we leave it for the developers to
> >> choose
> >> > the one that best suits the need and their familiarity.
> >> > In general I support the notion of standardizing where necessary,
> >> enforcing
> >> > rules on loops seems little bit like micro-managing how you should
> write
> >> > C++ code for MXNet.
> >> >
> >> > -1(open to change based on new information)
>
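
[Editor's note] One argument quoted above is that C++11 range loops are
"consistent with other programming languages", which matters for a
multi-language project. As a generic illustration (not MXNet code), Java's
enhanced for loop has essentially the same shape as C++'s
`for (auto x : xs)`; the `RangeLoopDemo` class and its `sum` method are
made up for this sketch.

```java
import java.util.Arrays;
import java.util.List;

public class RangeLoopDemo {
    // Sums a list using Java's enhanced for loop, the analogue of
    // C++11's `for (auto x : xs)`.
    public static int sum(List<Integer> xs) {
        int total = 0;
        for (int x : xs) {  // no index variable, no iterator bookkeeping
            total += x;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1, 2, 3, 4)));  // prints 10
    }
}
```

As in the C++ case, the range form only applies when the loop visits every
element front to back; index-based, reverse, or partial iteration still uses
the classic `for` statement.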

Re: Which merge option to use on the Import Julia binding PR?

2018-09-29 Thread Naveen Swamy
Yes, AFAIK Apache Infra has disabled the merge option.

If there is a valid reason (and this is one), we could ask our mentors to
help us create an INFRA ticket to temporarily enable this option, and once
we are done merging we can request that it be disabled again.

On Sat, Sep 29, 2018 at 9:44 PM Chiyuan Zhang  wrote:

> There is an option in the repo settings menu to disable or enable
> merge-commit for PR, see a screenshot below (from a different github
> project):
>
> [image: image.png]
>
> My guess is that this is disabled for the reason to avoid creating
> non-linear history for standard PRs (as oppose to technical problem). But
> this is only my guess, it would be great if someone could confirm.
>
> Best,
> Chiyuan
>
> On Sat, Sep 29, 2018 at 3:50 AM Carin Meier  wrote:
>
>> I believe so, but if someone wants to confirm it would be great.
>> Unfortunately, I just came down with a cold/flu so I will be out of
>> communication for a bit
>>
>> On Fri, Sep 28, 2018 at 9:51 PM Marco de Abreu
>>  wrote:
>>
>> > Are we sure that this is due to lacking permissions and not because of
>> some
>> > technical limitation? If we are certain, we can ask out mentors to
>> create a
>> > ticket with Apache Infra to make that switch.
>> >
>> > -Marco
>> >
>> > Carin Meier  schrieb am Sa., 29. Sep. 2018,
>> 01:17:
>> >
>> > > I made a test regular merge commit into a copy of master. It seemed
>> to go
>> > > fine. Here is a listing of what it will look like for everyone.
>> > >
>> > >
>> >
>> https://github.com/apache/incubator-mxnet/commits/test-merge-julia-import
>> > >
>> > > Although, I would be happy to push the merge button. I think the most
>> > > important thing is to get the PR merged, so whatever way is the best
>> to
>> > > make that happen, let's do it.
>> > >
>> > > So - Does the regular merge seem like a good option?
>> > > If so, what is the best way to make that happen?
>> > >
>> > >
>> > > On Fri, Sep 28, 2018 at 6:05 PM Chiyuan Zhang 
>> wrote:
>> > >
>> > > > Agreed with Pedro. Maybe the merge-commit option from the github
>> > > interface
>> > > > was disabled for a reason. But as Pedro said, maybe it is good to
>> > > > temporarily enable it for this PR and merge using that.
>> > > >
>> > > >
>> > > >- It should be technically easier than rebasing due to the
>> > > >git-subtree-import issue we are currently having
>> > > >- It also avoid stacking a huge commit history on *top* of
>> current
>> > > >history
>> > > >- The downside is probably the history of the project is not
>> linear
>> > > >anymore, but I think this is actually what we would like to have
>> for
>> > > > this
>> > > >particular case, because the contents of the main repo and the
>> julia
>> > > > branch
>> > > >actually does not overlap. So it makes sense to have two tails
>> with
>> > > > their
>> > > >own history.
>> > > >
>> > > > Carin: I guess if someone with admin permission on the github could
>> > > > temporarily enable the merge-commit option, then pushing the button
>> on
>> > > the
>> > > > web might simply work.
>> > > >
>> > > > Best,
>> > > > Chiyuan
>> > > >
>> > > > On Fri, Sep 28, 2018 at 2:53 PM Carin Meier 
>> > > wrote:
>> > > >
>> > > > > Pedro - Maybe a merge commit is a better answer in this case. I
>> > > > originally
>> > > > > ruled it out since it wasn't an option in the github web
>> interface,
>> > but
>> > > > > since this looks like it is going to have to be done outside it
>> > because
>> > > > of
>> > > > > the subtrees anyway, it might be a better fit.
>> > > > >
>> > > > > On Fri, Sep 28, 2018 at 5:07 PM Carin Meier > >
>> > > > wrote:
>> > > > >
>> > > > > > We are actually running into troubles with using the subtree and
>> > the
>> > > > > > rebase. Since it looks like this is not going to be a simple,
>> > "click
>> > >

Re: Feedback request for new Java API

2018-09-29 Thread Naveen Swamy
> I think adding 400+ APIs unnecessarily would
> significantly increase build time and bad developer experience
> - I don't think increasing such a bit compilation time is a problem
> compared to bad user experience.

I am not suggesting a bad user experience, but rather taking a practical
approach; having a bad developer experience is not great either.

>
>
> 7) I want to keep the core of the framework to be in Scala - because it
> allows you to write concise code - Yes it has a bit of learning curve, not
> everyone needs to know. I would rather invest in solidifying the Scala APIs
> and add more features in Scala(RNN, Support GluonHybridizedBlock...there is
> quite bit of work ) - do you want to rewrite everything in Scala and Java.
> - I agree with "don't rewrite everything in Scala and Java", IMO
> JavaNDArray is the only one good to have. JShape, JContext, etc. are
> not so necessary.
>
Either you go all Java or make accommodations in the Scala code for these
APIs, so your users know what to expect (a uniform experience across both).

> 8) Also, the discussion is not creating NDArray class for Java, just
> generate certain APIs to cater for Java incompatibility.
> - Yes I agree it's about "generate certain APIs to cater for Java
> incompatibility", though I think NDArray.api.XXX does not meet Java
> users' demands.
>
On Sat, Sep 29, 2018 at 12:05 PM Naveen Swamy  wrote:
> >
> > I know it is about trade-off.  I am suggesting a trade-off , how many
> apis do we have that takes too many parameters ?
> > From what I recall its around 20. Why can we not create the builder just
> for these APIs( which we discussed), why is it necessary to add 200 Apis ?
> > Are you suggesting to create builder for each and every API?
> >
> > I disagree with your opinion that they are not important and would like
> to hear from others.
> >
> > I am curious to see how the #2 looks like compared to #1
> > Andrew/Qing, can you paste the generated Apis that you have for both
> Scala and Java in a gist please.
> >
> > > On Sep 29, 2018, at 2:41 PM, YiZhi Liu  wrote:
> > >
> > > Naveen, software designing is all about tradeoff, every feature we
> > > introduce causes more compiling time, more efforts to maintain, etc.
> > >
> > > The main difference is.
> > >
> > > Option #1: Java users do
> > > NDArray.BatchNorm(data, gamma, beta, null, null, null, null, null,
> > > null, null, null, null, null, null);
> > > (and because every operator has an argument "out", users need to add
> > > an extra "null" to the function call almost every time.)
> > >
> > > Option #2, Java users do
> > > JavaNDArray.BatchNorm(data).setGamma(gamma).setBeta(beta).invoke();
> > >
> > > I don't think any of the reasons you listed is so important as the
> > > benefit above we got from option #2.
> > >> On Sat, Sep 29, 2018 at 8:24 AM Naveen Swamy 
> wrote:
> > >>
> > >> Java APIs are not like Clojure - The current proposal is only to
> build a
> > >> few thin wrappers for Inference.
> > >>
> > >> To better represent the two cases and this discussion in particular,
> here
> > >> is an example API
> > >>
> > >> 1) def Activation (data : org.apache.mxnet.NDArray, act_type :
> String, out
> > >> : Option[NDArray] = None) : org.apache.mxnet.NDArrayFuncReturn
> > >> or
> > >> 2) def Activation (data : org.apache.mxnet.NDArray, act_type :
> String, out
> > >> : NDArray) : org.apache.mxnet.NDArrayFuncReturn
> > >>
> > >> The discussion is should we add(generate) 200+ APIs to make it Java
> > >> compatible, ie., remove the Option class and the None default value
> which
> > >> Java does not understand from Option 1)
> > >>
> > >> my suggestion was to remove the Option class and create a implicit for
> > >> backward compatibility and use null instead of None, Andrew and I
> disagreed
> > >> on this, so I suggested to raise a discussion on dev@ to get more
> opinions
> > >> and one of us will disagree and commit. Thanks for raising it :)
> > >>
> > >> | * def Activation (data : org.apache.mxnet.NDArray, act_type :
> String, out
> > >> : NDArray = null) : org.apache.mxnet.NDArrayFuncReturn |
> > >> --
> > >>
> > >> 1) It is not true that Scala users will lose *default/optional*
> arguments -
> > >> if we followed the above, they will use null or None, though I do not
> like
> > >
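
[Editor's note] For readers following the option #1 vs. option #2 debate
above, here is a minimal, hedged sketch of what the builder style would look
like. `BatchNormParams`, its `Builder`, `setGamma`/`setBeta`/`invoke`, and
the use of `String` as a stand-in for `NDArray` are all illustrative
assumptions, not the generated MXNet classes.

```java
// Illustrative sketch only: hypothetical stand-in for a generated operator
// wrapper, not a real MXNet class.
public class BatchNormParams {
    private final String data;
    private final String gamma;  // null means "not set", as in option #1
    private final String beta;

    private BatchNormParams(Builder b) {
        this.data = b.data;
        this.gamma = b.gamma;
        this.beta = b.beta;
    }

    public String describe() {
        return "data=" + data + ", gamma=" + gamma + ", beta=" + beta;
    }

    public static class Builder {
        private final String data;  // mandatory argument, set in constructor
        private String gamma;       // optional arguments default to null
        private String beta;

        public Builder(String data) { this.data = data; }

        public Builder setGamma(String gamma) { this.gamma = gamma; return this; }

        public Builder setBeta(String beta) { this.beta = beta; return this; }

        public BatchNormParams invoke() { return new BatchNormParams(this); }
    }
}
```

Option #1 corresponds to a flat call such as
`batchNorm(data, gamma, beta, null, null, ...)`, where every unused optional
slot must be filled with `null`; option #2 lets the caller name only the
arguments they care about, at the cost of generating one builder class per
operator.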

Re: Feedback request for new Java API

2018-09-29 Thread Naveen Swamy
I know it is about trade-offs. I am suggesting a trade-off: how many APIs do
we have that take too many parameters? From what I recall, it's around 20.
Why can we not create the builder just for these APIs (which we discussed)?
Why is it necessary to add 200 APIs? Are you suggesting we create a builder
for each and every API?

I disagree with your opinion that they are not important and would like to
hear from others.

I am curious to see what #2 looks like compared to #1.
Andrew/Qing, can you paste the generated APIs that you have for both Scala
and Java in a gist, please.

> On Sep 29, 2018, at 2:41 PM, YiZhi Liu  wrote:
> 
> Naveen, software designing is all about tradeoff, every feature we
> introduce causes more compiling time, more efforts to maintain, etc.
> 
> The main difference is.
> 
> Option #1: Java users do
> NDArray.BatchNorm(data, gamma, beta, null, null, null, null, null,
> null, null, null, null, null, null);
> (and because every operator has an argument "out", users need to add
> an extra "null" to the function call almost every time.)
> 
> Option #2, Java users do
> JavaNDArray.BatchNorm(data).setGamma(gamma).setBeta(beta).invoke();
> 
> I don't think any of the reasons you listed is so important as the
> benefit above we got from option #2.
>> On Sat, Sep 29, 2018 at 8:24 AM Naveen Swamy  wrote:
>> 
>> Java APIs are not like Clojure - The current proposal is only to build a
>> few thin wrappers for Inference.
>> 
>> To better represent the two cases and this discussion in particular, here
>> is an example API
>> 
>> 1) def Activation (data : org.apache.mxnet.NDArray, act_type : String, out
>> : Option[NDArray] = None) : org.apache.mxnet.NDArrayFuncReturn
>> or
>> 2) def Activation (data : org.apache.mxnet.NDArray, act_type : String, out
>> : NDArray) : org.apache.mxnet.NDArrayFuncReturn
>> 
>> The discussion is should we add(generate) 200+ APIs to make it Java
>> compatible, ie., remove the Option class and the None default value which
>> Java does not understand from Option 1)
>> 
>> my suggestion was to remove the Option class and create a implicit for
>> backward compatibility and use null instead of None, Andrew and I disagreed
>> on this, so I suggested to raise a discussion on dev@ to get more opinions
>> and one of us will disagree and commit. Thanks for raising it :)
>> 
>> | * def Activation (data : org.apache.mxnet.NDArray, act_type : String, out
>> : NDArray = null) : org.apache.mxnet.NDArrayFuncReturn |
>> --
>> 
>> 1) It is not true that Scala users will lose *default/optional* arguments -
>> if we followed the above, they will use null or None, though I do not like
>> using nulls, this is a fine compromise.
>> To keep backward compatibility we can create a implicit to convert
>> Option.None to nulls and Option.Some-> Option.get(), so you are not going
>> to break users who might have been using the APIs that were released in
>> 1.3. The current incompatibility is only this w.r.t. NDArrays.
>> 
>> 2) Now about the Scala Macros - they are not simple to read or use, When I
>> and Qing started working on the #Scala Macros to improve the APIs, it took
>> us a good amount of time to get a hang of it. I don't want to add
>> additional code when not necessary.
>> 
>> My suggestion and vote is to modify existing Macro(i.e., #1 from the
>> original email with the necessary clarification above) and make it
>> compatible with Java
>> Here are my reasons
>> 1) The NDArray APIs in question are not following functional style of
>> programming, in fact they are just static methods defined on an NDArray
>> object - so Scala users are not losing much by using null in place of None.
>> You can create a implicit to maintain backward compatibility
>> 2) It is adding 220+ APIs(I understand it is generated) for NDArray alone
>> 3) this is adding another 100s of APIs unnecessarily, we are starting with
>> NDArray but we can't stop there, we will have to do this for Symbol,
>> Executor, Iterators, etc., .
>> 3) I don't want to be fixing bugs and maintaining code in 2 places.
>> 4) I want the cryptic code(# scala macros) to a minimum.
>> 5) increased compilation time & bad developer experience - the time to
>> compile has gone up quite a bit since we added the APIs last release on my
>> 3 year old laptop already.. I think adding 400+ APIs unnecessarily would
>> significantly increase build time and bad developer experience
>> 6) I want to keep the core of the framework to be in Scala - because it
>> allows you to write concise code - Yes it has a bit o

Re: [DISCUSS] Use modernized C++11 range loops uniformly throughout the project

2018-09-29 Thread Naveen Swamy
Kellen,

Could you please explain why you think range loops are better and how they
improve readability? This is a relatively new feature; many developers are
used to the old syntax. Shouldn't we leave it to the developers to choose
the one that best suits the need and their familiarity?
In general I support the notion of standardizing where necessary, but
enforcing rules on loops seems a little like micro-managing how you should
write C++ code for MXNet.

-1(open to change based on new information)



On Fri, Sep 28, 2018 at 5:20 PM Chris Olivier  wrote:

> ok then, my vote is still -1, however, because it’s just adding needless
> friction for developers imho.
>
> On Fri, Sep 28, 2018 at 7:42 AM kellen sunderland <
> kellen.sunderl...@gmail.com> wrote:
>
> > "Range loops aren’t always the most performant way" Do you have an
> example
> > where there's a perf difference?
> >
> > "In addition, sometimes you want the index. Or maybe you want to iterate
> > backwards, or not start from the first, etc. Maybe you want the iterator
> > because you remove it from the list at the bottom of the loop Seems
> > like a rule for the sake of having a rule."
> >
> > I should have been more clear about this point.  If you're using the
> index
> > in the loop, doing reverse iteration, or not iterating from start-to-end
> > this inspection is smart enough to realize it and will not suggest
> > optimizing that type of loop.  The loops that would be changes are _only_
> > the loops which are detected as equivalent to range-loops.  Examples can
> be
> > found here:
> >
> https://clang.llvm.org/extra/clang-tidy/checks/modernize-loop-convert.html
> > or you can look at what's been changed in the ref PR.  I've initially set
> > our confidence level at 'reasonable' but we could also set to 'safe'
> which
> > would further reduce the number of loops the check would apply to.
> >
> > -Kellen
> >
> > On Fri, Sep 28, 2018 at 3:54 PM Chris Olivier 
> > wrote:
> >
> > > -1
> > >
> > > Range loops aren’t always the most performant way. In addition,
> sometimes
> > > you want the index. Or maybe you want to iterate backwards, or not
> start
> > > from the first, etc. Maybe you want the iterator because you remove it
> > from
> > > the list at the bottom of the loop. Seems like a rule for the sake
> of
> > > having a rule.
> > >
> > > On Fri, Sep 28, 2018 at 2:12 AM kellen sunderland <
> > > kellen.sunderl...@gmail.com> wrote:
> > >
> > > > Hello MXNet devs,
> > > >
> > > > I'd like to discuss uniformly adopting C++11 range loops in the MXNet
> > > > project.  The benefits I see are:
> > > >
> > > > *  Improved C++ readability (examples below).
> > > > *  Consistency with other languages.  The range-loops are quite
> similar
> > > to
> > > > loops almost all other programming languages.  Given we're a project
> > that
> > > > supports many languages this language consistency could be positive
> for
> > > our
> > > > community.
> > > > * Consistency within the same project.  Currently different authors
> > have
> > > > different loops styles which hurts codebase readability.
> > > > *  Best available performance.  There are often multiple ways to
> write
> > > > loops in C++ with subtle differences in performance and memory usage
> > > > between loop methods.  Using range-loops ensures we get the best
> > possible
> > > > perf using an intuitive loop pattern.
> > > > *  Slightly lower chance for bugs / OOB accesses when dealing with
> > > indexing
> > > > in an array for example.
> > > >
> > > > If we decide to enable this uniformly throughout the project we can
> > > enable
> > > > this policy with a simple clang-tidy configuration change.  There
> would
> > > be
> > > > no need for reviewers to have to manually provide feedback when
> someone
> > > > uses an older C++ loops style.
> > > >
> > > > -Kellen
> > > >
> > > > Reference PR:  https://github.com/apache/incubator-mxnet/pull/12356/
> > > > Previous clang-tidy discussion on the list:
> > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/b0ae5a9df5dfe0d9074cb2ebe432264db4fa2175b89fa43a5f6e36be@%3Cdev.mxnet.apache.org%3E
> > > >
> > > > -
> > > > Examples:
> > > > for (auto axis_iter = param.axis.begin() ; axis_iter!=
> > param.axis.end();
> > > > ++axis_iter) {
> > > > CHECK_LT(*axis_iter, static_cast<int>(ishape.ndim()));
> > > > stride_[reverse_index] = ishape[*axis_iter];
> > > > ...
> > > > -->
> > > > for (int axis : param.axis) {
> > > > CHECK_LT(axis, static_cast<int>(ishape.ndim()));
> > > > stride_[reverse_index] = ishape[axis];
> > > > ...
> > > > --
> > > > for (size_t i = 0; i < in_array.size(); i++) {
> > > > auto &nd = in_array[i];
> > > > pre_temp_buf_.emplace_back(nd.shape(), nd.ctx(), true,
> nd.dtype());
> > > > }
> > > > -->
> > > > for (auto & nd : in_array) {
> > > > pre_temp_buf_.emplace_back(nd.shape(), nd.ctx(), true,
> nd.dtype());
> > > > }
> > > >
> > >
> >
>
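The cross-language-consistency point above can be seen by comparison with Java, whose enhanced for loop mirrors the C++11 range loop. A small illustrative sketch (the class and method names are made up for the example):

```java
import java.util.Arrays;
import java.util.List;

public class RangeLoopDemo {
    // Enhanced for loop: the Java analog of a C++11 range loop.
    static int product(List<Integer> dims) {
        int p = 1;
        for (int dim : dims) {
            p *= dim;
        }
        return p;
    }

    // Classic index-based loop: the analog of for (size_t i = 0; ...).
    static int productIndexed(List<Integer> dims) {
        int p = 1;
        for (int i = 0; i < dims.size(); i++) {
            p *= dims.get(i);
        }
        return p;
    }

    public static void main(String[] args) {
        List<Integer> shape = Arrays.asList(2, 3, 4);
        // Both styles compute the same result; the range style cannot
        // make an off-by-one indexing mistake.
        System.out.println(product(shape) + " " + productIndexed(shape)); // 24 24
    }
}
```

As in the C++ examples, the range form drops the index bookkeeping entirely, which is exactly the class of loop the clang-tidy check targets.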


Re: Subscription

2018-09-29 Thread Naveen Swamy
Invite sent. Welcome to Apache MXNet Cosmin :).


On Sat, Sep 29, 2018 at 11:38 AM Cosmin Cătălin Sanda <
cosmincata...@gmail.com> wrote:

> Hi, I would like to subscribe to the ASF mxnet channel.
> 
> *Cosmin Catalin SANDA*
> Data Scientist & Engineer
> Phone: +45.27.30.60.35
> Web: https://cosminsanda.com
>


Re: Feedback request for new Java API

2018-09-29 Thread Naveen Swamy
Java APIs are not like Clojure - The current proposal is only to build a
few thin wrappers for Inference.

To better represent the two cases and this discussion in particular, here
is an example API

1) def Activation (data : org.apache.mxnet.NDArray, act_type : String, out
: Option[NDArray] = None) : org.apache.mxnet.NDArrayFuncReturn
or
2) def Activation (data : org.apache.mxnet.NDArray, act_type : String, out
: NDArray) : org.apache.mxnet.NDArrayFuncReturn

The discussion is: should we add (generate) 200+ APIs to make it
Java-compatible, i.e., remove the Option class and the None default value,
which Java does not understand, from option 1)?

my suggestion was to remove the Option class, create an implicit conversion
for backward compatibility, and use null instead of None. Andrew and I
disagreed on this, so I suggested raising a discussion on dev@ to get more
opinions, and one of us will disagree and commit. Thanks for raising it :)

| * def Activation (data : org.apache.mxnet.NDArray, act_type : String, out
: NDArray = null) : org.apache.mxnet.NDArrayFuncReturn |
--
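Since Java has neither default parameter values nor Scala's Option, the usual Java-side shape for signature 2) is an overload pair, with null (or the shorter overload) standing in for None. A hypothetical sketch with stand-in classes, not the real MXNet API:

```java
// Hypothetical stand-in for the real MXNet NDArray class - illustration only.
class NDArray {
    final String name;
    NDArray(String name) { this.name = name; }
}

public class ActivationDemo {
    // Full signature: an explicit output array, where null means
    // "allocate a new output" (the role None plays on the Scala side).
    static String activation(NDArray data, String actType, NDArray out) {
        return (out == null ? "new-array" : out.name) + ":" + actType;
    }

    // Convenience overload emulating the Scala default argument `out = None`.
    static String activation(NDArray data, String actType) {
        return activation(data, actType, null);
    }

    public static void main(String[] args) {
        System.out.println(activation(new NDArray("x"), "relu"));                   // new-array:relu
        System.out.println(activation(new NDArray("x"), "relu", new NDArray("y"))); // y:relu
    }
}
```

This is the kind of call-site Java users end up with under either proposal; the debate below is about whether to generate this Java-friendly shape alongside, or instead of, the Option-based one.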

1) It is not true that Scala users will lose *default/optional* arguments -
if we followed the above, they would use null or None; though I do not like
using nulls, this is a fine compromise.
To keep backward compatibility we can create an implicit to convert
Option.None to null and Option.Some -> Option.get(), so you are not going
to break users who might have been using the APIs that were released in
1.3. The current incompatibility is only this w.r.t. NDArrays.

2) Now about the Scala macros - they are not simple to read or use. When
Qing and I started working on the Scala macros to improve the APIs, it took
us a good amount of time to get the hang of them. I don't want to add
additional code when not necessary.

My suggestion and vote is to modify existing Macro(i.e., #1 from the
original email with the necessary clarification above) and make it
compatible with Java
Here are my reasons
1) The NDArray APIs in question are not following functional style of
programming, in fact they are just static methods defined on an NDArray
object - so Scala users are not losing much by using null in place of None.
You can create an implicit to maintain backward compatibility
2) It is adding 220+ APIs (I understand they are generated) for NDArray alone.
3) This is adding another few hundred APIs unnecessarily; we are starting
with NDArray but we can't stop there - we will have to do this for Symbol,
Executor, Iterators, etc.
4) I don't want to be fixing bugs and maintaining code in two places.
5) I want to keep the cryptic code (Scala macros) to a minimum.
6) Increased compilation time & worse developer experience - the time to
compile has gone up quite a bit since we added the APIs last release on my
3-year-old laptop already. I think adding 400+ APIs unnecessarily would
significantly increase build time and worsen the developer experience.
7) I want to keep the core of the framework in Scala - because it allows you
to write concise code. Yes, it has a bit of a learning curve, but not
everyone needs to know it. I would rather invest in solidifying the Scala
APIs and add more features in Scala (RNN, support for
GluonHybridizedBlock... there is quite a bit of work) - do you want to
rewrite everything in Scala and Java?
8) Also, the discussion is not about creating an NDArray class for Java,
just generating certain APIs to cater for Java incompatibility.

@Andrew: To your response to Qing's comments - you cannot just consider it
as just generating NDArray's APIs and instead I suggest to take a wholistic
view of all the various implications.

@Chris: Yes, Scala has a bit of a learning curve - the goal is not to have
every developer deal with how these APIs are generated; that problem exists
either way with the above proposal. I might agree if we were to move away
completely (with a thorough discussion and valid reasons) and instead use
AspectJ or similar to write these APIs. The discussion here is about using
Scala macros to generate two different types of APIs which are functionally
no different and usability-wise are very similar - look at the example.
Thanks for your input, I will deposit your 0.02$ in our JIRA bank :)

@Carin: It requires more effort to use AspectJ or similar to generate APIs
using reflection or at compile time; here we need to generate at compile
time so Java users have the API signatures in their IDEs.

Thanks, Naveen

P.S: I am traveling and my responses will be delayed.


On Fri, Sep 28, 2018 at 10:25 AM Carin Meier  wrote:

> Sorry bad paste on the gist - here is the good one
> https://gist.github.com/gigasquid/01cd48f563db4739910592dd9ac9db20
>
> On Fri, Sep 28, 2018 at 10:24 AM Carin Meier  wrote:
>
> > +1 on option #2
> >
> > In the case of minimizing the the overhead for code maintenance, I wanted
> > to suggest the option of investigating generating code from the Java
> > Reflection for the Java APIs.  I did a quick gist from Clojure of what
> the
> > generated classes look like from the 

Re: Which merge option to use on the Import Julia binding PR?

2018-09-28 Thread Naveen Swamy
Should we try to first bring this into a branch and then try to merge that branch? 

> On Sep 28, 2018, at 4:40 PM, Pedro Larroy  
> wrote:
> 
> I'm not familiar with the specifics of this contribution, as a general
> approach my understanding is that if the list of commits is big and you
> want to preserve history, usually merging is better so you keep history and
> causality, if you rebase all the commits on top of master you are changing
> the history of these commits which can't be individually reverted as some
> have suggested before. Maybe is because I come from a mercurial background,
> but my initial impression would be either to:
> 1. squash everything and rebase
> 2. or merge without rebasing or squashing.
> 
> Pedro.
> 
>> On Thu, Sep 27, 2018 at 3:10 PM Carin Meier  wrote:
>> 
>> Thanks everyone for the input. I'll try to summarize the feedback from the
>> responses:
>> 
>> Using Squash-Merge is the project standard for very good reasons. However,
>> in the case of this PR to bring in the Julia language from its sibling
>> repo, we want to preserve all the individual commits of the many
>> contributors that have worked over multiple years to make this a great
>> language binding. We will use Rebase-Merge for it.
>> 
>> Chiyuan - thanks for the suggestion of using a tag. I think we can try it
>> initially without it since there are other ways to browse the commit
>> history, like looking at the PRs. But, we can add the tag retroactively if
>> people start having trouble.
>> 
>> If there no objections, I will merge the PR using the above method in my
>> morning (EST).
>> 
>> Thanks everyone! I'm looking forward to having the Julia community join the
>> main repo and increasing our collaboration with them.
>> 
>> Best,
>> Carin
>> 
>>> On Thu, Sep 27, 2018 at 1:37 PM Chiyuan Zhang  wrote:
>>> 
>>> +1 for rebase and merge. As a workaround for the aforementioned issue,
>>> maybe we can create a tag for the commit before the merge, so that in
>> case
>>> people want to browse the recent main-repo commits by skipping this big
>>> chunk of rebased commits, there is a pointer to take his or her hand on.
>>> 
>>> Best,
>>> Chiyuan
>>> 
>>>> On Thu, Sep 27, 2018 at 7:34 AM Jason Dai  wrote:
>>>> 
>>>> +1 to rebase and merge to preserve and track the contributions.
>>>> 
>>>> Thanks,
>>>> -Jason
>>>> 
>>>> On Thu, Sep 27, 2018 at 12:27 PM Aaron Markham <
>>> aaron.s.mark...@gmail.com>
>>>> wrote:
>>>> 
>>>>> +1 to rebase and merge to retain the efforts of all of the
>>> contributors.
>>>> If
>>>>> there's some git maintenance that can trim it down from 700+ commits
>>> then
>>>>> maybe that's a compromise.
>>>>> 
>>>>>> On Wed, Sep 26, 2018, 21:23 Naveen Swamy  wrote:
>>>>>> 
>>>>>> this PR comes from more than 1 individual, if we squash merge we'll
>>> not
>>>>> be
>>>>>> able to attribute the contribution of those individuals.
>>>>>> 
>>>>>> +1 to rebase merge to preserve history
>>>>>> 
>>>>>> On Thu, Sep 27, 2018 at 12:04 AM, Tianqi Chen <
>>>> tqc...@cs.washington.edu>
>>>>>> wrote:
>>>>>> 
>>>>>>> One of the main reason for a rebase merge is that it preserves
>> the
>>>>> commit
>>>>>>> history of the MXNet.jl package contributors, and given that the
>>>>> project
>>>>>>> has been evolved since 2015 and has always been a high-quality
>>>> language
>>>>>>> module for MXNet.
>>>>>>> 
>>>>>>> I think we should take an exception here to preserve the commit
>>>> history
>>>>>> of
>>>>>>> each individual contributors to the Julia binding and welcome
>> them
>>> to
>>>>> the
>>>>>>> community.
>>>>>>> 
>>>>>>> Tianqi
>>>>>>> 
>>>>>>> On Wed, Sep 26, 2018 at 8:55 PM Tianqi Chen <
>>>> tqc...@cs.washington.edu>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> In this particular case, I would suggest rebase and merge.
>>>>>>>> 

Re: Requesting slack access

2018-09-26 Thread Naveen Swamy
Invite sent. Welcome to Apache MXNet. We have a forum for questions related
to MXNet, https://discuss.mxnet.io/, and an in-depth tutorial on getting
started with MXNet, https://gluon.mxnet.io/ - feel free to ask questions.


On Thu, Sep 27, 2018 at 12:04 AM, Rahul Padmanabhan <
rahul.padmanab...@gmail.com> wrote:

> Hello,
>
> I’m Rahul Padmanabhan, a Data Scientist in Montreal. I code in Python (and
> Scala sometimes) and am interested in contributing to the development of
> MXNet. Could you please add me to the MXNet Slack Channel?
>
> Thanks!
> Rahul Padmanabhan


Re: Which merge option to use on the Import Julia binding PR?

2018-09-26 Thread Naveen Swamy
this PR comes from more than one individual; if we squash-merge we'll not be
able to attribute the contributions of those individuals.

+1 to rebase merge to preserve history

On Thu, Sep 27, 2018 at 12:04 AM, Tianqi Chen 
wrote:

> One of the main reason for a rebase merge is that it preserves the commit
> history of the MXNet.jl package contributors, and given that the project
> has been evolved since 2015 and has always been a high-quality language
> module for MXNet.
>
> I think we should take an exception here to preserve the commit history of
> each individual contributors to the Julia binding and welcome them to the
> community.
>
> Tianqi
>
> On Wed, Sep 26, 2018 at 8:55 PM Tianqi Chen 
> wrote:
>
> > In this particular case, I would suggest rebase and merge.
> >
> > The main reasoning is that the commit log of the Julia binding is not
> > simple WIP commits, every commit there has been done through testcases
> and
> > it is important for us to respect the developer of the effort. It is also
> > good to trace back the history of the commits more easily.
> >
> > Tianqi
> >
> >
> > Tianqi
> >
> > On Wed, Sep 26, 2018 at 5:34 PM Carin Meier 
> wrote:
> >
> >> Chiyuan,
> >>
> >> Thanks for the prompt to find some clarity of the pros and cons of
> each. I
> >> think that will help drive us to the right decision. I think some of
> those
> >> reasons are the ones you listed. I will take a stab below at outlining
> >> what
> >> I see. Feel free to chime in if I missed any.
> >>
> >> *Squash and Merge*
> >>   *Pros* - It is the project standard
> >>   - It will provide one commit for the feature and lessen the
> need
> >> for 700+ commits rebased on top of master.
> >>  - It is easier for a user to do git log to browse commits and
> see
> >> what was features were added.
> >>   *Cons* - I don't know how github would handle squashing all those
> commit
> >> messages into one. Will it be too much?
> >> - You lose the granularity of the features individual
> commits
> >>
> >> *Rebase and Merge*
> >>  * Pros *- You don't have a huge commit message with one commit
> >>   -  You do have the granularity of the individual features of
> the
> >> commit
> >>  * Cons *- It is not the project standard
> >>- You have 700+ commits on top of master that might be harder
> >> to
> >> see the ones that went in right before. (like someone browsing commits)
> >>
> >> On Wed, Sep 26, 2018 at 8:12 PM Chiyuan Zhang 
> wrote:
> >>
> >> > Hi Carin,
> >> >
> >> > Can you clarify the pros and cons of the two approaches? Is the main
> >> > concern here about logistics (e.g. preserving the history of the
> >> original
> >> > repo and developments) or technical issue (e.g. using squash might end
> >> up
> >> > with a huge commit message that might be difficult or hard to
> >> handle)?
> >> >
> >> > I think it might not be very likely that someone is going to cherry
> pick
> >> > revert some of the commits. But preserving the commit history is still
> >> > useful in case one need to trace the change or bisect for some
> >> regression
> >> > bugs, etc.
> >> >
> >> > Just to provide some context: the PR actually contains 700+ commits,
> >> and it
> >> > dates back to 2015. The development of the Julia binding started in
> the
> >> > early stage of MXNet. We started with a separate repo due to the
> >> > requirement of the package system of julia.
> >> >
> >> > Best,
> >> > Chiyuan
> >> >
> >> > On Wed, Sep 26, 2018 at 3:41 PM Carin Meier 
> >> wrote:
> >> >
> >> > > The Import Julia binding PR ,(
> >> > > https://github.com/apache/incubator-mxnet/pull/10149), is getting
> >> very
> >> > > close to being merged. Because of the large number of commits there
> >> was a
> >> > > suggestion not to use the usual "Squash and Merge".  The only option
> >> > would
> >> > > be "Rebase and Merge" since merging with a merge commit is not
> enabled
> >> > for
> >> > > the project.
> >> > >
> >> > > *Squash and Merge* - The commits from this branch will be combined
> >> into
> >> > one
> >> > > commit in the base branch (With all the commit messages combined)
> >> > >
> >> > > *Rebase and Merge* - The commits from this branch will be rebased
> and
> >> > added
> >> > > to the base branch
> >> > >
> >> > > The PR is over 250+ commits (Github won't show all of them)
> >> > >
> >> > > Thoughts about how we should handle the merge?
> >> > >
> >> > > Thanks,
> >> > > Carin
> >> > >
> >> >
> >>
> >
>


Re: Remove MKLML as dependency

2018-09-20 Thread Naveen Swamy
if MKLDNN is a replacement for MKL and MKLML (which is my understanding),
maybe you guys should bring the necessary functions into MKLDNN instead of
letting the users go through this nightmare of a setup.

On Thu, Sep 20, 2018 at 6:01 PM, Lv, Tao A  wrote:

> " MKLML does not have a complete blas library and if you don’t link in
> another blas library like open blas, some functions will blow up (ie some
> of the linalg functions)."
> - Is there any GitHub issue for this problem? Maybe we can take a look.
>
> "I was not aware of MKLML still being required with MKLDNN."
> - Just to clarify, MKL-DNN doesn't require MKLML. For performance, MKL-DNN
> requires the GEMM functions which can be provided by both MKL and MKLML.
>
> -Original Message-
> From: Chris Olivier [mailto:cjolivie...@gmail.com]
> Sent: Friday, September 21, 2018 12:07 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: Remove MKLML as dependency
>
> MKLML does not have a complete blas library and if you don’t link in
> another blas library like open blas, some functions will blow up (ie some
> of the linalg functions).
>
> I was not aware of MKLML still being required with MKLDNN. I’ve never
> gotten a definitive answer about this from Da, although I’ve asked a couple
> of times.
>
> What does Da say about all of this?
>
> Unless there’s good reason to the contrary, removing MKLML and requiring
> the larger, strangely licensed standalone MKL for everyone seems a bit
> heavy-handed.
>
> On Thu, Sep 20, 2018 at 7:41 AM Lv, Tao A  wrote:
>
> > Hah, seems it's a little confusing here. I think the "Intel MKL" in
> > the first statement includes both the full MKL and MKLML library. And
> > the "dynamic library" there obviously means the MKLML which is
> > delivered in MKL-DNN repo.
> >
> > MKLML is a subset of full MKL and includes all BLAS functions for both
> > single precision and double precision. From this point of view, I
> > think it can be used as a BLAS library, but cannot be used as full MKL.
> >
> > -tao
> >
> > -Original Message-
> > From: Chris Olivier [mailto:cjolivie...@gmail.com]
> > Sent: Thursday, September 20, 2018 9:36 PM
> > To: dev@mxnet.incubator.apache.org
> > Subject: Re: Remove MKLML as dependency
> >
> > thanks for the info. I am still a little confused — your statement
> > said “MKL” and not “MKLML”, so my question is still the same.  Are
> > GEMMS in MKLML or just MKL? I know MKLML doesn’t have a blas library
> > like the main MKL.
> >
> > On Wed, Sep 19, 2018 at 11:49 PM Lv, Tao A  wrote:
> >
> > > Hi Chris, please kindly check the statements here:
> > > https://github.com/intel/mkl-dnn#installation
> > >
> > > " Intel MKL-DNN can take advantage of optimized matrix-matrix
> > > multiplication (GEMM) function from Intel MKL. The dynamic library
> > > with this functionality is included in the repository. "
> > >
> > > " You can choose to build Intel MKL-DNN without binary dependency.
> > > The resulting version will be fully functional, however performance
> > > of certain convolution shapes and sizes and inner product relying on
> > > SGEMM function may be suboptimal."
> > >
> > > -tao
> > >
> > > -Original Message-
> > > From: Chris Olivier [mailto:cjolivie...@gmail.com]
> > > Sent: Thursday, September 20, 2018 11:20 AM
> > > To: dev@mxnet.incubator.apache.org
> > > Subject: Re: Remove MKLML as dependency
> > >
> > > maybe I missed it, but what does MKLML have that mkldnn doesn’t have
> > > that makes it necessary?
> > >
> > > what’s the motivation for removing it?
> > >
> > > On Tue, Sep 18, 2018 at 11:31 PM Lv, Tao A  wrote:
> > >
> > > > If you just want to test the performance, I think you need link
> > > > MKL for BLAS and MKL-DNN for NN. Also MKL-DNN should link MKL for
> > > > better performance.
> > > >
> > > > Here are some ways for you to install full MKL library if you
> > > > don't have
> > > > one:
> > > > 1. Register and download from intel website:
> > > > https://software.intel.com/en-us/mkl
> > > > 2. Apt-get/yum: currently it need configure Intel’s repositories.
> > > > a. https://software.intel.com/en-us/articles/installing-intel-free-libs-and-python-yum-repo
> > > > b. https://software.intel.com/en-us/articles/installing-intel-free-libs-and-python-apt-repo
> > > > 3. pip install mkl / mkl-devel: ‘mkl’ package has
> > > > the runtime and ‘mkl-devel’ includes everything with the headers
> > > > a. https://software.intel.com/en-us/articles/installing-the-intel-distribution-for-python-and-intel-performance-libraries-with-pip-and
> > > > 4. conda install: also has mkl and mkl-devel
> > > > a. https://anaconda.org/intel/mkl
> > > > b. https://anaconda.org/intel/mkl-devel
> > > >
> > > > If you want to redistribute MKL with 

Re: Some feedback from MXNet Zhihu topic

2018-09-20 Thread Naveen Swamy
Qing,

These are loaded and very specific suggestions. Thank you for bringing them
up here. Since Apache MXNet is popular in China, it would be great if
Mandarin-speaking developers here could bring such feedback and user pain
points to the community's attention.

1. To capture specific API/Example/Tutorial that users have an issue on, Mu
suggested in the past to add thumbs up/down on the website:
https://issues.apache.org/jira/browse/MXNET-972

6. The heavy code base is not because of the code in the MXNet repo; it's
all the sub-modules that are added to the repo. I have had this problem
too: to build MXNet I have to fetch and build the whole world that MXNet
depends on and its dependencies (sub within sub). I think it's time to
revisit and refactor.

For the others, I suggest you work with someone to create actionable JIRAs
(maybe Denis - because he knows JIRA well and creates nice actionable
stories). It would be nice if these stories could contain many
first-good-issue tasks for new contributors to pick up - creating
standalone examples (from existing ones) is a great way for newbies to learn
MXNet and contribute back.

Examples are very important not only for learning quickly but also for
extending/adapting to one's own application. In Scala we (you) have added
tests around examples and actually use them as integration tests - we should
insist on the same for new examples written or old examples that we touch.

In deep learning, what is more critical and could increase rapid adoption is
to have the latest and greatest papers implemented as examples - this is a
call to the community for suggestions and action.

Thanks, Naveen


On Wed, Sep 19, 2018 at 10:39 PM, Aaron Markham 
wrote:

> Thanks for this translation and feedback Qing!
> I've addressed point 3 of the documentation feedback with this PR:
> https://github.com/apache/incubator-mxnet/pull/12604
> I'm not sure how to take the first two points without some explicit URLs
> and examples, so if anyone has those I'd be happy to take a look if there's
> some glitch vs missing or wrong docs.
>
> Also, I would agree that there should be some more simple examples. Often
> times the examples are too complicated and unclear about what is important
> or not. The audience targeting is for deep learning practitioners, not
> "newbies".
>
> And on a related note, I'd really like to pull the Gluon stuff into the API
> section. It's confusing as its own navigation item and orphaned
> information. It could have a navigation entry at the top of the API list
> like "Python: Gluon" or just "Gluon" then list "Python: Module" or just
> "Python". Or running this the other way, the Gluon menu could have API and
> Tutorials and be more fleshed out, though this is not my preference. Either
> way, it needs some attention.
>
> Cheers,
> Aaron
>
> On Wed, Sep 19, 2018 at 11:04 AM Qing Lan  wrote:
>
> > Hi all,
> >
> > There was a trending topic on
> > Zhihu (a famous Chinese Stack Overflow + Quora) recently, asking about the
> > status of MXNet in 2018. Mu replied to the thread and received more than
> > 300 `likes`.
> > However there are a few concerns addressed in the comments of this
> thread,
> > I have done some simple translation from Chinese to English:
> >
> > 1. Documentations! Until now, the online doc still contains:
> > 1. Depreciated but not updated doc
> > 2. Wrong documentation with poor description
> > 3. Document in Alpha stage such as you must install `pip
> > –pre` in order to run.
> >
> > 2. Examples! For Gluon specifically, many examples are still mixing
> > Gluon/MXNet apis. The mixure of mx.sym, mx.nd mx.gluon confused the users
> > of what is the right one to choose in order to get their model to work.
> As
> > an example, Although Gluon made data encapsulation possible, still there
> > are examples using mxn.io.ImageRecordIter with tens of params (feels like
> > gluon examples are simply the copy from old Python examples).
> >
> > 3. Examples again! Comparing to PyTorch, there are a few examples I don't
> > like in Gluon:
> > 1. Available to run however the code structure is still
> > very complicated. Such as example/image-classification/cifar10.py. It
> > seemed like a consecutive code concatenation. In fact, these are just a
> > series of layers mixed with model.fit. It makes user very hard to
> > modify/extend the model.
> > 2. Only available to run with certain settings. If users
> > try to change a little bit in the model, crashes will happen. For
> example,
> > the multi-gpu example in Gluon website, MXNet hide the logic that using
> > batch size to change learning rate in a optimizer. A lot of newbies
> didn't
> > know this fact and they would only find that the model stopped converging
> > when batch size changed.
> > 3. The worst scenario is the model itself just simply
> > didn't work. Maintainers in the MXNet community didn't run the model

Re: multiple installation guides?

2018-09-18 Thread Naveen Swamy
Amol, do you want to post this on discuss.mxnet.io ?

> On Sep 19, 2018, at 4:34 AM, Hagay Lupesko  wrote:
> 
> The /test site seems to be something old that should have been removed a
> long time ago, it lists versions 0.10 and 0.10.14 :)
> Maybe Aaron has an idea what needs to be done to remove it...
> 
>> On Fri, Sep 14, 2018 at 4:55 PM Alex Zai  wrote:
>> 
>> Why do we have two sets of installation guides?
>> 
>> http://mxnet.incubator.apache.org/test/get_started/install.html
>> 
>> https://mxnet.incubator.apache.org/install/index.html?platform=Linux=Python=CPU
>> 
>> The /test domain is also not secure. If this is not suppose to be
>> public we should remove this as it is confusing.
>> 


Re: MXNet Slack Channel

2018-09-15 Thread Naveen Swamy
invited. welcome to MXNet.

On Fri, Sep 14, 2018 at 1:55 PM, Denver McNeney 
wrote:

> Hello,
>
> I’d like to be added to the MXNet Slack channel!
>
> Thanks,
>
> Denver


Re: Requesting slack access

2018-09-13 Thread Naveen Swamy
done, welcome back.

On Thu, Sep 13, 2018 at 12:57 AM, shiwen hu  wrote:

> please add me to slack
>


Re: Requesting slack access

2018-09-12 Thread Naveen Swamy
done.

On Wed, Sep 12, 2018 at 11:09 AM, Chaitanya Bapat 
wrote:

> Hello,
>
> Chaitanya here. Requesting slack access.
> Thanks
>
> --
> *Chaitanya Prakash Bapat*
> *+1 (973) 953-6299*
>
>


Re: Off-Heap Memory Management in MXNet Scala

2018-09-12 Thread Naveen Swamy
Thank you all for your feedback.

@Chris: Yes, One of the Amazon user(Calum Leslie) had contributed the
Dispose Pattern removing the free of native handles in Finalizers and
instead added Log. This was done because calling free in Finalizers was
segfaulting the application at random points and was very hard to reproduce
and debug.
The dispose pattern worked for some cases but made code cumbersome from a
readability aspect, keeping track of all the objects that were
created(imagine slice/reshape instead of writing expressions you are now
creating unnecessary variables and calling dispose on them).
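To illustrate the readability cost described above, here is a hypothetical sketch (the Tensor class and its methods are made up) of how a one-line expression chain turns into named temporaries under a manual dispose pattern:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper over a native handle - illustration only.
class Tensor {
    static final List<Tensor> disposed = new ArrayList<>();
    Tensor slice()   { return new Tensor(); }
    Tensor reshape() { return new Tensor(); }
    void dispose()   { disposed.add(this); }  // stands in for freeing the native handle
}

public class DisposeDemo {
    public static void main(String[] args) {
        Tensor a = new Tensor();
        // Ideally one would write a.slice().reshape() as a single
        // expression, but manual disposal forces a named temporary for
        // each intermediate so its native memory can be released:
        Tensor s = a.slice();
        Tensor r = s.reshape();
        s.dispose();  // easy to forget -> native memory leak
        r.dispose();
        a.dispose();
        System.out.println(Tensor.disposed.size()); // 3
    }
}
```

Every intermediate result needs its own variable and its own dispose call, which is the bookkeeping burden the proposal aims to remove.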
As the first graph in the design shows, despite carefully calling dispose on
most objects there was a constant memory leak, and diagnosing leaks wasn't
straightforward. Note that finalizers run on a separate thread, later than
when the object was found unreachable.

@Timur, thanks for the feedback.
1) No, the goal here is to manage native memory that is created for various
operations. In MXNet-Scala most objects live on the C++ heap and the Scala
objects are wrappers around them; when the MXNet engine runs operations it
expects objects to be accessible on the C++ heap.

2) Agreed, MNIST is not representative; the goal was to understand and show
that the existing code has hard-to-debug memory leaks (even for MNIST). I
was aiming to test my prototype code and see if my changes make a
difference. Yizhi suggested I run tests against the ResNet-50 model, which I
will do as part of my implementation. I think this is a standard benchmark
model that is widely used. Also note that most of the MXNet-Scala use cases
we have seen are for inference.

3) No, we haven't created a branch for Java-API work, please look at this
design and kindly leave your feedback:
https://cwiki.apache.org/confluence/display/MXNET/MXNet+Java+Inference+API

4) Calling System.gc() will be configurable (including not calling GC at
all). One piece of feedback I got from a user is that calling System.gc on
the user's behalf is intrusive, which I think is also the point you are
making.

5) Understood, and I agree; I see calling GC as only a part of the solution
and a configurable option. For GPU use, training, and other memory-intensive
applications, ResourceScope is a very good option.

Another alternative is to create ByteBuffers in Java and map the C++
pointers into the JVM heap by tapping into the native malloc/free; that way
the JVM is aware of all the memory that is allocated and can free it
appropriately whenever the objects become unreachable. I have to note that
this still does not solve the problem of memory accumulating until GC has
kicked in. This approach is also very involved and might not be tenable.
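For comparison, deterministic release on the JVM is often expressed with AutoCloseable and try-with-resources. A minimal hypothetical sketch (not the actual MXNet ResourceScope API), in which the native handle is freed as soon as its scope exits:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper over a native allocation - illustration only.
class NativeHandle implements AutoCloseable {
    static final AtomicInteger liveHandles = new AtomicInteger();
    private boolean closed = false;

    NativeHandle() { liveHandles.incrementAndGet(); }  // stands in for a JNI alloc

    @Override
    public void close() {                              // stands in for a JNI free
        if (!closed) {
            closed = true;
            liveHandles.decrementAndGet();
        }
    }
}

public class ScopeDemo {
    public static void main(String[] args) {
        try (NativeHandle h = new NativeHandle()) {
            // ... use h ...
        } // h.close() runs here, even if the body throws
        System.out.println(NativeHandle.liveHandles.get()); // 0
    }
}
```

The trade-off versus GC-driven approaches is that the release point is explicit and immediate, but the scoping must be written by the user - which is essentially the role ResourceScope plays in the proposal.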

@Marco, thanks for your comments.
1) The JVM kicks off GC when it feels pressure on the JVM heap, not CPU RAM.
Objects on the GPU are not special; they are still off-heap (outside the JVM
heap), so this would work. Look at the graph in the doc that shows running the
GAN example on GPUs.

2) I am not looking to rewrite the memory allocation in MXNet; that will
still be handled by the C++ backend. The goal here is to free native memory
(reduce the shared-pointer count) when JVM objects go out of scope (become
unreachable).


@Carin, yes, hopefully this will alleviate the memory-management headache
for our users.

Hope that makes sense.

Thanks, Naveen


On Wed, Sep 12, 2018 at 6:06 AM, Carin Meier  wrote:

> Naveen,
>
> Thanks for putting together the detailed document and kickstarting this
> effort. It will benefit all the MXNet JVM users and will help solve a
> current pain point for them.
>
> - Carin
>
> On Tue, Sep 11, 2018 at 5:37 PM Naveen Swamy  wrote:
>
> > Hi All,
> >
> > I am working on managing Off-Heap Memory Management and have written a
> > proposal here based on my prototype and research I did.
> >
> > Please review the doc and provide your feedback ?
> >
> > https://cwiki.apache.org/confluence/display/MXNET/JVM+Memory+Management
> >
> > I had offline discussion with a few people I work with and added their
> > feedback to the doc as well.
> >
> > Thanks, Naveen
> >
>


Re: Error Publishing OSX package to Maven

2018-09-11 Thread Naveen Swamy
Qing helped test by excluding the bundle/source jars. It still creates 2 repos
in Staging. I am not sure if the Maven version or some other maven-plugin
difference is causing this issue; there was no issue publishing the
Linux-CPU and Linux-GPU packages from the same code.
For now, I have manually uploaded the artifacts (generated through maven
release prepare/perform) to the Staging repo so we can make it available
for OSX users.

This issue needs to be investigated independently.

-Naveen

On Tue, Sep 11, 2018 at 2:03 PM, Naveen Swamy  wrote:

> that seems reasonable. I also asked this question on d...@maven.apache.org
> to see if there is anything obvious that we missed.
>
> https://lists.apache.org/thread.html/def6e5c6c47ab2f39592a1fe060b6cfd0008d303a2b5c814545d231b@%3Cdev.maven.apache.org%3E
>
> On Tue, Sep 11, 2018 at 11:25 AM, Carin Meier 
> wrote:
>
>> I would suggest trying
>>
>> * Remove bundle/source jar from 1.3.0 and attempt publishing OSX package.
>> and seeing if that solves the problem and allows publishing to staging.
>>
>>  If it does work then it helps to identify the cause.
>> If the effort does not take too long, I would recommend timeboxing the
>> effort, and if it is going to take more effort to investigate, we can go
>> with announcing later for OSX.
>>
>>
>> - Carin
>>
>> On Tue, Sep 11, 2018 at 2:10 PM Naveen Swamy  wrote:
>>
>> > hey all,
>> >
>> > I am working on publishing the 1.3.0 Scala package to Maven and
>> > encountering an error when I am about to close the repo on Nexus. When I
>> > publish the OSX package to Staging, the artifacts get split into 2
>> > repositories, and when I close [1] the repo (to make it available for use)
>> it
>> > fails since the signature files are in a different repo. This is how the
>> > artifacts are getting split
>> >
>> > Repo1: orgapachemxnet-1018
>> > ===
>> > mxnet-full_2.11-osx-x86_64-cpu-bundle.jar.md5
>> > mxnet-full_2.11-osx-x86_64-cpu-bundle.jar.sha1
>> > mxnet-full_2.11-osx-x86_64-cpu-sources.jar.asc
>> > mxnet-full_2.11-osx-x86_64-cpu-sources.jar.md5
>> > mxnet-full_2.11-osx-x86_64-cpu-src.jar.md5
>> > mxnet-full_2.11-osx-x86_64-cpu-src.jar.sha1
>> > mxnet-full_2.11-osx-x86_64-cpu.jar
>> > mxnet-full_2.11-osx-x86_64-cpu.jar.asc
>> > mxnet-full_2.11-osx-x86_64-cpu.pom.md5
>> > mxnet-full_2.11-osx-x86_64-cpu.pom.sha1
>> >
>> >
>> > Repo2: orgapachemxnet-1019
>> > ===
>> > mxnet-full_2.11-osx-x86_64-cpu-bundle.jar
>> > mxnet-full_2.11-osx-x86_64-cpu-bundle.jar.asc
>> > mxnet-full_2.11-osx-x86_64-cpu-sources.jar
>> > mxnet-full_2.11-osx-x86_64-cpu-sources.jar.sha1
>> > mxnet-full_2.11-osx-x86_64-cpu-src.jar
>> > mxnet-full_2.11-osx-x86_64-cpu-src.jar.asc
>> > mxnet-full_2.11-osx-x86_64-cpu-jar.md5
>> > mxnet-full_2.11-osx-x86_64-cpu.jar.sha1
>> > mxnet-full_2.11-osx-x86_64-cpu.pom
>> > mxnet-full_2.11-osx-x86_64-cpu.pom.asc
>> >
>> > This was not an issue with Linux-CPU and Linux-GPU versions for 1.3.0
>> and
>> > suspicion is the new Source and documentation jar files being added as a
>> > part of 1.3.0 for OSX.
>> >
>> > There are 2 options
>> > * Continue to investigate the issue and announce later for OSX (after
>> the
>> > general announcement)
>> > * Remove bundle/source jar from 1.3.0 and attempt publishing OSX
>> package.
>> >
>> > What do you guys suggest?
>> >
>> >
>> > 1) Closing Staged Repo -
>> >
>> > https://central.sonatype.org/pages/releasing-the-deployment.html#locate-and-examine-your-staging-repository
>> > 2) Scala Release Process:
>> >
>> > https://cwiki.apache.org/confluence/display/MXNET/MXNet-Scala+Release+Process
>> >
>>
>
>


Off-Heap Memory Management in MXNet Scala

2018-09-11 Thread Naveen Swamy
Hi All,

I am working on off-heap memory management and have written a
proposal here based on my prototype and the research I did.

Please review the doc and provide your feedback:

https://cwiki.apache.org/confluence/display/MXNET/JVM+Memory+Management

I had offline discussions with a few people I work with and added their
feedback to the doc as well.

Thanks, Naveen


Re: Error Publishing OSX package to Maven

2018-09-11 Thread Naveen Swamy
that seems reasonable. I also asked this question on d...@maven.apache.org
to see if there is anything obvious that we missed.

https://lists.apache.org/thread.html/def6e5c6c47ab2f39592a1fe060b6cfd0008d303a2b5c814545d231b@%3Cdev.maven.apache.org%3E


On Tue, Sep 11, 2018 at 11:25 AM, Carin Meier  wrote:

> I would suggest trying
>
> * Remove bundle/source jar from 1.3.0 and attempt publishing OSX package.
> and seeing if that solves the problem and allows publishing to staging.
>
>  If it does work then it helps to identify the cause.
> If the effort does not take too long, I would recommend timeboxing the
> effort, and if it is going to take more effort to investigate, we can go
> with announcing later for OSX.
>
>
> - Carin
>
> On Tue, Sep 11, 2018 at 2:10 PM Naveen Swamy  wrote:
>
> > hey all,
> >
> > I am working on publishing the 1.3.0 Scala package to Maven and
> > encountering an error when I am about to close the repo on Nexus. When I
> > publish the OSX package to Staging, the artifacts get split into 2
> > repositories, and when I close [1] the repo (to make it available for use)
> it
> > fails since the signature files are in a different repo. This is how the
> > artifacts are getting split
> >
> > Repo1: orgapachemxnet-1018
> > ===
> > mxnet-full_2.11-osx-x86_64-cpu-bundle.jar.md5
> > mxnet-full_2.11-osx-x86_64-cpu-bundle.jar.sha1
> > mxnet-full_2.11-osx-x86_64-cpu-sources.jar.asc
> > mxnet-full_2.11-osx-x86_64-cpu-sources.jar.md5
> > mxnet-full_2.11-osx-x86_64-cpu-src.jar.md5
> > mxnet-full_2.11-osx-x86_64-cpu-src.jar.sha1
> > mxnet-full_2.11-osx-x86_64-cpu.jar
> > mxnet-full_2.11-osx-x86_64-cpu.jar.asc
> > mxnet-full_2.11-osx-x86_64-cpu.pom.md5
> > mxnet-full_2.11-osx-x86_64-cpu.pom.sha1
> >
> >
> > Repo2: orgapachemxnet-1019
> > ===
> > mxnet-full_2.11-osx-x86_64-cpu-bundle.jar
> > mxnet-full_2.11-osx-x86_64-cpu-bundle.jar.asc
> > mxnet-full_2.11-osx-x86_64-cpu-sources.jar
> > mxnet-full_2.11-osx-x86_64-cpu-sources.jar.sha1
> > mxnet-full_2.11-osx-x86_64-cpu-src.jar
> > mxnet-full_2.11-osx-x86_64-cpu-src.jar.asc
> > mxnet-full_2.11-osx-x86_64-cpu-jar.md5
> > mxnet-full_2.11-osx-x86_64-cpu.jar.sha1
> > mxnet-full_2.11-osx-x86_64-cpu.pom
> > mxnet-full_2.11-osx-x86_64-cpu.pom.asc
> >
> > This was not an issue with Linux-CPU and Linux-GPU versions for 1.3.0 and
> > suspicion is the new Source and documentation jar files being added as a
> > part of 1.3.0 for OSX.
> >
> > There are 2 options
> > * Continue to investigate the issue and announce later for OSX (after the
> > general announcement)
> > * Remove bundle/source jar from 1.3.0 and attempt publishing OSX package.
> >
> > What do you guys suggest?
> >
> >
> > 1) Closing Staged Repo -
> >
> > https://central.sonatype.org/pages/releasing-the-deployment.html#locate-and-examine-your-staging-repository
> > 2) Scala Release Process:
> >
> > https://cwiki.apache.org/confluence/display/MXNET/MXNet-Scala+Release+Process
> >
>


Error Publishing OSX package to Maven

2018-09-11 Thread Naveen Swamy
hey all,

I am working on publishing the 1.3.0 Scala package to Maven and
encountering an error when I am about to close the repo on Nexus. When I
publish the OSX package to Staging, the artifacts get split into 2
repositories, and when I close [1] the repo (to make it available for use) it
fails since the signature files are in a different repo. This is how the
artifacts are getting split:

Repo1: orgapachemxnet-1018
===
mxnet-full_2.11-osx-x86_64-cpu-bundle.jar.md5
mxnet-full_2.11-osx-x86_64-cpu-bundle.jar.sha1
mxnet-full_2.11-osx-x86_64-cpu-sources.jar.asc
mxnet-full_2.11-osx-x86_64-cpu-sources.jar.md5
mxnet-full_2.11-osx-x86_64-cpu-src.jar.md5
mxnet-full_2.11-osx-x86_64-cpu-src.jar.sha1
mxnet-full_2.11-osx-x86_64-cpu.jar
mxnet-full_2.11-osx-x86_64-cpu.jar.asc
mxnet-full_2.11-osx-x86_64-cpu.pom.md5
mxnet-full_2.11-osx-x86_64-cpu.pom.sha1


Repo2: orgapachemxnet-1019
===
mxnet-full_2.11-osx-x86_64-cpu-bundle.jar
mxnet-full_2.11-osx-x86_64-cpu-bundle.jar.asc
mxnet-full_2.11-osx-x86_64-cpu-sources.jar
mxnet-full_2.11-osx-x86_64-cpu-sources.jar.sha1
mxnet-full_2.11-osx-x86_64-cpu-src.jar
mxnet-full_2.11-osx-x86_64-cpu-src.jar.asc
mxnet-full_2.11-osx-x86_64-cpu-jar.md5
mxnet-full_2.11-osx-x86_64-cpu.jar.sha1
mxnet-full_2.11-osx-x86_64-cpu.pom
mxnet-full_2.11-osx-x86_64-cpu.pom.asc

This was not an issue with the Linux-CPU and Linux-GPU versions for 1.3.0; the
suspicion is that the new source and documentation jar files, added as a
part of 1.3.0 for OSX, are the cause.

There are 2 options
* Continue to investigate the issue and announce later for OSX (after the
general announcement)
* Remove bundle/source jar from 1.3.0 and attempt publishing OSX package.

What do you guys suggest?


1) Closing Staged Repo -
https://central.sonatype.org/pages/releasing-the-deployment.html#locate-and-examine-your-staging-repository
2) Scala Release Process:
https://cwiki.apache.org/confluence/display/MXNET/MXNet-Scala+Release+Process


Re: [VOTE] Release MXNet version 1.3.0.RC0

2018-09-06 Thread Naveen Swamy
+1


Roshani/Sheng,

Thanks for putting this release together; I was able to test the release
only now. As Kellen indicated, this release does not have enough committer
votes, so I suggest you extend the timeline.

I downloaded the source code from
https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.3.0.rc0/.

I verified the signature of the release and built the Scala package from
this source, I was able to run Scala Unit Tests and Integration tests
successfully.

Also, IMO, the issue that Sandeep thought should be included in the release
is not something I would consider a release blocker, since it has a workaround;
you can add it to the release notes as a link to the GitHub issue with the
workaround.

Other notes (consider adding to retrospective):

On running gpg --verify, I received a message that the signature is good,
from Sheng Zha, along with a warning (gpg: WARNING: This key is not certified
with a trusted signature!). On researching, I found this is fine [1], and the
fingerprint matches Sheng's key here:
https://dist.apache.org/repos/dist/dev/incubator/mxnet/KEYS.

Next time, please send a link to the source and signatures on the Apache dist
server.

I am currently working with Qing to create and test a Maven package for
Scala; please wait and add that to the announcement email.

Next time, please give a day or two after the RC is cut so we can create
packages for the various language bindings (Scala, Clojure, R) (currently
this is manual), so that the packages users actually use get tested during
the RC phase.

During the release, I suggest the release manager communicate
regularly (daily) on dev@ until an announcement is made, so everyone is aware
of the status and can plan their work to accommodate building packages,
testing the RC, etc.

1.
http://www.apache.org/dev/release-signing.html#valid-untrusted-vs-invalid-trusted


Thanks, Naveen



On Wed, Sep 5, 2018 at 10:20 AM, Aaron Markham 
wrote:

> 0 (non-binding) If we have a problem that blocks users, and a solution in
> hand... then we should fix it, but not at the expense of starting the
> release cycle again just for one fix. Users can cherry pick or build from
> master if they want the fix right away, right? I'd change my mind to -1 if
> this wasn't the case, with good reason, and if the user impact was critical
> to adoption or risks abandonment.
>
>
> On Wed, Sep 5, 2018 at 9:57 AM Roshani Nagmote 
> wrote:
>
> > I believe everyone here is working hard to make MXNet a better framework
> > for users. It's completely okay to have different opinions, we can decide
> > together if this issue is a blocker or not after voting time is over.
> >
> > As I mentioned before, voting will end at 7 pm today. So there is still
> > time to test the release. If there are any other issues anyone finds, I
> > will be happy to start the process again and work on RC1. For now, I want
> > to encourage everyone to utilize this time and vote. :)
> >
> > Thanks,
> > Roshani
> >
> > On Tue, Sep 4, 2018 at 10:35 PM sandeep krishnamurthy <
> > sandeep.krishn...@gmail.com> wrote:
> >
> > >1. As an Apache MXNet community member, I raised the concern of
> broken
> > >functionality for the user. I explained and provided the data points
> > on
> > > the
> > >issue, workaround and why I think it is important. If after all
> this,
> > > you
> > >think my vote is biased on my employer just because a user I quoted
> is
> > > from
> > >Amazon, this is more concerning to me on my voting abilities.
> > >2. My -1 no where undermines the huge amount of effort that goes
> > behind
> > >the scene for a release to happen. Great respect and recognition for
> > >everyone involved in all the releases of MXNet in the past and
> this. I
> > >voted on my judgement of what may be good for the users of MXNet.
> > >3. As pointed by Naveen & Chris, -1 are NOT veto. Feel free to
> decide
> > >and progress on the release as we already have >3 +1 in this thread.
> > >
> > >
> > > Best,
> > >
> > > Sandeep
> > >
> > > On Tue, Sep 4, 2018 at 8:29 PM Chris Olivier 
> > > wrote:
> > >
> > > > btw, there are no vetoes on package releases:
> > > >
> > > > VOTES ON PACKAGE RELEASES
> > > > 
> > > >
> > > > Votes on whether a package is ready to be released use majority
> > > > approval, i.e.
> > > > at least three PMC members must vote affirmatively for release, and
> > there
> > > > must be more positive than negative votes. Releases may not be vetoed.
> > > > Generally
> > > > the community will cancel the release vote if anyone identifies
> serious
> > > > problems, but in most cases the ultimate decision, lies with the
> > > individual
> > > > serving as release manager. The specifics of the process may vary
> from
> > > > project to project, but the 'minimum quorum of three +1 votes' rule
> is
> > > > universal.
> > > >
> > > > On Tue, Sep 

Re: [VOTE] Release MXNet version 1.3.0.RC0

2018-09-04 Thread Naveen Swamy
"Releases may not be vetoed"
http://www.apache.org/legal/release-policy.html#release-approval

I haven't tested the release yet, I'll do so tomorrow.

> On Sep 4, 2018, at 7:13 PM, Sheng Zha  wrote:
> 
> Thanks for sharing your opinions, Thomas. Your recognition and respect of
> people's efforts on preparing the release candidate are certainly
> appreciated.
> 
> Now that the vote is set to fail thanks to the veto, there will be plenty
> of opportunities to include those bug fixes, including the one Zhi
> mentioned [1], which was already merged in the master and yet chose not to
> block this release with [2]. I will be happy to work with Roshani to
> prepare another release candidate once ready.
> 
> -sz
> 
> [1]
> https://lists.apache.org/thread.html/f02e952bec22c82cb00a6741390a78f55373311c97464997bb455a6c@%3Cdev.mxnet.apache.org%3E
> [2]
> https://lists.apache.org/thread.html/85d3fcabb3437ba7f1af455cf69aa13eb3afd1ea1d1f6f891e9c339c@%3Cdev.mxnet.apache.org%3E
> 
> On Tue, Sep 4, 2018 at 6:02 PM Thomas DELTEIL 
> wrote:
> 
>> -0
>> (non-binding)
>> 
>> If I may add some nuancing plus a personal data point as one of the users
>> commenting in the bug report in question:
>> 
>> - Performance vs. Basic functionality => I don't think high performance
>> use-cases and basic functionality are two obviously opposed concepts and
>> see no contradiction in Hagay's and Sandeep's statements.
>> Float16 support is feature of MXNet that provides more than twice the
>> performance of Float32 on supported platforms, hence the high performance
>> use-case. The bug is that the basic functionality of reloading a saved
>> float16 models is currently broken.
>> 
>> - This bug vs Other bugs => Contrary the vast majority of the 140 open bugs
>> that are mentioned above, I would put to Sandeep's credit that this one bug
>> has a PR open that provides a fix for it. This would make it a better
>> candidate to get included in this release than a bug that has no fix ready
>> for it.
>> 
>> - Personal datapoint: I recently did some experimentation with float16 [1]
>> and actually coincidentally just published a video on optimizing
>> performance for Gluon. Float16 conversion is one of the most, if not the
>> most effective way to get performance out of MXNet [2]. I believe there is
>> a lot of value in publicizing more its use and hence making sure at least
>> the basic support for normal use-cases is present.
>> 
>> Of course this needs to be balanced with the overhead of preparing a new
>> release candidate once the fixed is reviewed and merged, which seems to be
>> a lengthy and complex process in its own right, and the delay with
>> providing the other features present in 1.3 for users that are not running
>> off the nightly builds.
>> 
>> All the best,
>> 
>> Thomas
>> 
>> [1] https://github.com/ThomasDelteil/PerformanceTricksMXNetGluon
>> [2]
>> 
>> https://www.youtube.com/watch?v=Cqo7FPftNyo=0s=PLkEvNnRk8uVk6U515Pj-jHQUxFC4eDi3m
>> 
>>> Le mar. 4 sept. 2018 à 17:11, Sheng Zha  a écrit :
>>> 
>>> Sandeep,
>>> 
>>> Thanks for explaining your veto. We have open bugs that impacted a lot
>> more
>>> than just 3 customers, just by referring to the number of commenters on
>> the
>>> issue [1].
>>> 
>>> You said that this is for "high performance use cases", which contradicts
>>> Hagay's assessment that this is "basic functionality broken". Given
>> that
>>> this is for advanced use cases of using half-precision training, why is
>> it
>>> so much more important than any other open bug reports, that for this
>>> specific bug fix, we have to delay the access of regular users to the new
>>> MXNet 1.3 release by at least another week?
>>> 
>>> Honestly, I'm concerned that your vote is biased by Amazon involvement,
>>> given that you quoted Amazon Rekognition.
>>> 
>>> -sz
>>> 
>>> [1]
>>> 
>>> 
>> https://github.com/apache/incubator-mxnet/issues?q=is%3Aissue+is%3Aopen+label%3ABug+sort%3Acomments-desc
>>> 
>>> On Tue, Sep 4, 2018 at 4:51 PM sandeep krishnamurthy <
>>> sandeep.krishn...@gmail.com> wrote:
>>> 
 My initial vote of “-0” was due to lack of info from a user who had
>> said,
 he overcame this issue for FP16 model.
 
 
 However, suggested workaround [1] for the issue is not straight forward
>>> and
 generally usable for all users. Also, issue is not simple and isolated
>> to
 be listed in the Release Notes as known issue with a workaround.
 
 
 Changing my vote to: "-1 (binding)" owing to the user impact [3]
 
 
 
 @Sheng:
 
 1. Agreed, bug existed from long time. However, FP16 and such
>>> optimizations
 were added later on. Followed by users [2] using this feature for high
 performance use cases. It is not ok to measure severity of the bug
>> based
>>> on
 its past existence, rather we can see who is impacted now and is it a
>>> small
 subset with a simple workaround or large user impacting issue.
 
 2. Agreed bug was reported 7/21. 

Re: New Java Inference API

2018-09-04 Thread Naveen Swamy
This proposal is missing many of the offline discussions that happened and
the subsequent changes.

@andrewfayres: Please update the wiki (maybe you forgot to publish the
changes).

On Tue, Sep 4, 2018 at 11:11 AM Qing Lan  wrote:

> Hi All,
>
> Here is an update for the Java Inference API design doc on CWIKI:
> https://cwiki.apache.org/confluence/display/MXNET/MXNet+Java+Inference+API.
> Currently, MXNet Java bindings is an extension of MXNet Scala API that
> allow users to use Java to do inference on MXNet. Users will be able to
> import pre-trained MXNet model and do single/batch inference on it.
>
> Please take a look the design document again and feel free to leave any
> thoughts you have.
>
> Thanks,
> Qing
>
> On 5/10/18, 11:08 AM, "Andrew Ayres"  wrote:
>
> Hi Kellen,
>
> Thanks for the feedback. You bring up an interesting idea about the
> dependencies. I'll add that to the list of things to look into.
>
> As for the threading, my current thinking is that we implement a
> dispatcher
> thread like suggested in the Scala threading discussion
>
> https://discuss.mxnet.io/t/fixing-thread-safety-issues-in-scala-library/236
> .
> I would definitely like to hide such complexities from the user.
>
> Andrew
>
>
> On Thu, May 10, 2018 at 3:22 AM, kellen sunderland <
> kellen.sunderl...@gmail.com> wrote:
>
> > Hey Andrew, thanks for the write-up.  I think having a Java binding
> will be
> > very useful for enterprise users.  Doc looks good but two things I'm
> > curious about:
> >
> > How are you planning to handle thread safe inference?   It'll be
> great if
> > you can hide the complexity of dealing with dispatch threading from
> users.
> >
> > The other thing I think a solid Java API could provide is a limited
> number
> > of dependencies.  There's some simple things we can do to make this
> happen
> > (create a statically linked, portable so) but there's also some
> complexity
> > around minimizing dependencies MXNet.  For example we'll likely want
> to
> > release MKL flavoured binaries, we should have a few versions of CUDA
> > supported.  We could try and have one version that has an absolute
> minimum
> > of dependencies (maybe statically linking with openblas).  It might
> be good
> > to document exactly the packages you're planning to release, and
> give some
> > more details about what the dependencies for the packages would be.
> >
> > Many thanks for looking into this, I think it'll be a big
> improvement for
> > many of our users.
> >
> > -Kellen
> >
> > On Thu, May 10, 2018, 12:57 AM Andrew Ayres <
> andrew.f.ay...@gmail.com>
> > wrote:
> >
> > > Hi all,
> > >
> > > There has been a lot of interest expressed in having a Java API
> for doing
> > > inference. The general idea is that after training a model using
> python,
> > > users would like to be able to load the model for inference inside
> their
> > > existing production eco-system.
> > >
> > > We've begun exploring a few options for the implementation at <
> > > https://cwiki.apache.org/confluence/display/MXNET/MXNet+Java+Inference+API
> > > >
> > > and would appreciate any insights/feedback.
> > >
> > > Thanks,
> > > Andrew
> > >
> >
>
>
>


Re: Propose to discontinue supporting Apache MXNet on Windows 7

2018-08-28 Thread Naveen Swamy
+1 to stop supporting Win7

On Tue, Aug 28, 2018 at 3:54 PM Lin Yuan  wrote:

> Dear Community,
>
>
>
> Currently, our MXNet installation guide for Windows does not work for
> Windows 7. e.g. Microsoft Visual Studio 2015 is not supported on Windows 7
> <
> https://visualstudio.microsoft.com/vs/support/vs2015/received-error-specified-program-requires-newer-version-windows/
> >.
> In addition, MSFT ended “Mainstream” support for Windows 7 in 2015 (
> https://support.microsoft.com/en-us/help/13853/windows-lifecycle-fact-sheet
> ).
> Therefore, it is not possible for developers to build MXNet and verify the
> fix on Windows 7 platform. Given that there have been several issues about
> MXNet error on Windows 7 (issue#9271
> , issue #8921
> , issue #11163
> ), it will even
> add
> more burden on developers in the future if we were to continue supporting
> Windows 7.
>
>
>
> I therefore would like to propose that we discontinue the support of MXNet
> on Windows 7 in the next release.
>
>
> Specifically, this means the following required actions:
>
> 1) state the discontinuation of Windows 7 support in the release note
>
> 2) update the MXNet webpage if Windows version is mentioned.
>
> 3) update the open Github issues related to Windows 7
>
>
> Please share your thoughts about this proposal and/or suggest if there is
> any other missing action item from the above.
>
>
> Best Regards,
>
>
> Lin
>


Re: build from source instructions

2018-08-28 Thread Naveen Swamy
The automated script served its purpose when MXNet PyPi package was not
regularly maintained, the script made it simple for users when they had to
build from source to get the latest.

Now that we have a pypi package and build from source is something that the
developers need,  I have to agree that the script has become confusing, I
do not see any point in having automated script anymore. I prefer to have
clear instructions for OSX as well.

On Tue, Aug 28, 2018 at 10:44 AM Bhavin Thaker 
wrote:

> The automated build script on macOS was written with the intention to have
> an automated, easy and quick way to build and install MXNet by any user,
> newbie or advanced. The build script aims to provide repeatability and an
> easy way to test the build instructions.
>
> Without the script, the build instructions had many combinations of
> possibilities which would break for various users and there was no easy way
> to test all the combinations.
>
> I propose that we have both well-written build instructions with
> corresponding automated build script to ensure that the build instructions
> are well-tested.
>
> Please remember that there can be multiple use-cases and user preferences
> to build MXNet.
>
> Bhavin Thaker.
>
> On Tue, Aug 28, 2018 at 10:29 AM Afrooze, Sina  wrote:
>
> > +1 on fully automated scripts being more confusing than helpful. It's
> > difficult to debug any issues when the entire instruction is to run a
> > single script. - Sina
> >
> >
> >
> > On 8/28/18, 9:46 AM, "Lin Yuan"  wrote:
> >
> > Aaron,
> >
> > I agree the installation page is very confusing to me. When I first
> > tried
> > to build MXNet from source on MacOS, I was totally confused about the
> > instruction. Why was it vastly different from building from source on
> > Linux
> > given these two OSes have similar shell commands. I feel the automated
> > script on the macOS platform is more confusing than simplifying.
> >
> > Lin
> >
> > On Mon, Aug 27, 2018 at 9:21 PM Steffen Rochel <
> > steffenroc...@gmail.com>
> > wrote:
> >
> > > Aaron - we should keep instructions how to build from source.
> > Updating and
> > > re-organizing makes sense to me.
> > > Steffen
> > >
> > > On Mon, Aug 27, 2018 at 4:54 PM Aaron Markham <
> > aaron.s.mark...@gmail.com>
> > > wrote:
> > >
> > > > Hello,
> > > > I was looking into the C++ instructions and came across this
> > seemingly
> > > > pretty old page:
> > > > https://mxnet.incubator.apache.org/install/build_from_source
> > > >
> > > > I think it has several inaccuracies as different/updated
> > installation
> > > info
> > > > has been added to different pages.
> > > >
> > > > Should it be deleted?
> > > >
> > > > Or should a specific build from source page be maintained
> > (moving/copying
> > > > info from the other more recently updated pages)?
> > > >
> > > > I'm really thinking that it would be easier to maintain if each
> OS
> > had
> > > its
> > > > own page, Python/pip info had its own page, then bindings had
> > their own
> > > > pages.
> > > >
> > > > Other suggestions?
> > > >
> > > > Cheers,
> > > > Aaron
> > > >
> > >
> >
> >
> >
> >
>


Re: Removing misleading metric log

2018-08-27 Thread Naveen Swamy
Can you not just print the batch range [10-20] along with the epoch? There
might be user scripts that depend on these values (we won't know who is using
them); I don't think we should drop this log.

On Mon, Aug 27, 2018 at 9:38 AM Vandana Kannan  wrote:

> Hi All,
>
> The log self.logger.info('Epoch[%d] Train-%s=%f', epoch, name, val) in
> python/mxnet/module/base_module.py gives the user the impression that the
> metric printed is for the entire epoch, which is misleading (Ref
> https://github.com/apache/incubator-mxnet/pull/10437). This log was
> maintained so that scripts that look for this text, are not broken. But
> this is continuing to cause confusion for users.
>
> https://github.com/apache/incubator-mxnet/pull/12182 removes this
> particular log. The scripts that parse this output within incubator-mxnet
> have been fixed, but there is a risk that external scripts may break due to
> this change.
>
> Are there any known scripts that look for this log? Any other thoughts
> about this PR?
>
> Thanks,
> Vandana
>
>


Re: Release plan - MXNET 1.3

2018-08-13 Thread Naveen Swamy
Is this a major feature? This is a performance regression that Dom is
reporting.

On Mon, Aug 13, 2018 at 11:38 AM, Roshani Nagmote  wrote:

> Thanks for reporting this issue Dom.
> 08/10 (Friday) was the major feature freeze date. We won't be accepting any
> new features now for MXNet 1.3 release.
> RC0 will be cut on 08/17(Friday).
>
> Will be verifying the performance degradation issue mentioned.
>
> Thanks,
> Roshani
>
> On Mon, Aug 13, 2018 at 8:45 AM Divakaruni, Dominic
>  wrote:
>
> > Hi all, We tested resnet50 on MXNet built from master branch on Friday
> and
> > were seeing degraded performance on GPU - about 50% slower compared to
> > these values here https://mxnet.incubator.apache.org/faq/perf.html. FWIW
> > this slowdown was seen for both MXNet as well as the TRT integrated
> MXNet.
> >
> > Something for you all to verify before or after you cut the RC.
> >
> > Thx!
> >
> > On 8/13/18, 4:34 AM, "kellen sunderland" 
> > wrote:
> >
> > Hey Roshani,
> >
> > Has a RC branch already been cut?  If so, a quick heads up that I
> think
> > this commit should probably get into RC0 for 1.3.
> >
> > https://github.com/apache/incubator-mxnet/commit/ee8755a2531b322fec29c9c3d2aa3b8738da41f3
> >
> > It won't cause issues for users, but from a versioning compatibility
> > perspective it's probably better that we remove these functions in
> this
> > release. This way we don't have to worry about major bumps in the
> next
> > release if they're removed.
> >
> > -Kellen
> >
> >
> > On Fri, Aug 10, 2018 at 7:24 PM Roshani Nagmote <
> > roshaninagmo...@gmail.com>
> > wrote:
> >
> > > Thanks Kellen and everyone else for working to get TensorRT PR
> > merged!
> > > @Sina, I will be keeping track of that issue and fixes to get in
> the
> > > release.
> > >
> > > We are starting code freeze for 1.3 release today. A release
> > candidate will
> > > be cut on 08/17.
> > > Feel free to add any other comments/suggestions.
> > >
> > > Thanks,
> > > Roshani
> > >
> > > On Fri, Aug 10, 2018 at 5:39 AM kellen sunderland <
> > > kellen.sunderl...@gmail.com> wrote:
> > >
> > > > All merged and ready to go from my side Roshani (the TensorRT
> PR).
> > > >
> > > > I agree with Sina that issue 12116 looks it's a blocker.  I'll
> try
> > and
> > > > reproduce it locally to get another datapoint.
> > > >
> > > > On Fri, Aug 10, 2018 at 3:15 AM Afrooze, Sina <
> sina@gmail.com>
> > > wrote:
> > > >
> > > > > Hi Roshani - I think this regression issue is a release
> blocker:
> > > > > https://github.com/apache/incubator-mxnet/issues/12116  - Sina
> > > > >
> > > > >
> > > > > On 8/8/18, 12:40 PM, "Roshani Nagmote" <
> > roshaninagmo...@gmail.com>
> > > > wrote:
> > > > >
> > > > > Thanks, Kellen for letting me know.
> > > > >
> > > > > On Wed, Aug 8, 2018 at 12:09 PM kellen sunderland <
> > > > > kellen.sunderl...@gmail.com> wrote:
> > > > >
> > > > > > Hey Roshani, I think it should be ready by Friday.
> > > > > >
> > > > > > On Tue, Aug 7, 2018, 10:20 PM Roshani Nagmote <
> > > > > roshaninagmo...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks Kellen. Yes, we were treating this PR as a
> release
> > > > blocker.
> > > > > Do you
> > > > > > > have any ETA by which it will be completed? Approximate
> > time
> > > will
> > > > > also
> > > > > > > work.
> > > > > > > @zhi, Thanks for bringing this PR into notice. I will
> > keep a
> > > > track
> > > > > of it.
> > > > > > >
> > > > > > > -Roshani
> > > > > > >
> > > > > > > On Tue, Aug 7, 2018 at 11:30 AM Joshua Z. Zhang <
> > > > > cheungc...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I strongly suggest to track this PR
> > > > > > > > https://github.com/apache/incubator-mxnet/pull/11908
> <
> > > > > > > > https://github.com/apache/incubator-mxnet/pull/11908
> >
> > in 1.3
> > > > > release
> > > > > > > > which fixed the usability issue for lower end
> machines
> > that
> > > > > don’t have
> > > > > > as
> > > > > > > > large shared memory space as ec2 instances.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > >
> > > > > > > > - Zhi
> > > > > > > >
> > > > > > > > > On Aug 7, 2018, at 9:05 AM, Roshani Nagmote <
> > > > > > roshaninagmo...@gmail.com
> > > > > > > >
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > Right now, we are delaying MXNet 1.3 release for
> > pending
> > > > > TensorRT PR
> > > > > > (
> > > > > > > > > https://github.com/apache/
> 

Re: Growing number of open PRs and Labelling PRs

2018-08-09 Thread Naveen Swamy
A little more context and clarification: this is an effort to reduce the
number of open PRs. As a first step, I, along with the others listed above,
labelled the PRs to help us and the community chime in and review. We (the
above) will be spending some time reviewing some of the open PRs, and we
also invite contributors/committers to participate.

Going forward, we are looking into using the mxnet-label-bot to help label
the PRs as well.
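As an aside, labelled PRs can also be pulled programmatically through GitHub's search API rather than the web UI. Below is a minimal sketch; the label name matches the ones introduced in this thread, but the helper functions themselves are illustrative, not part of any MXNet or label-bot tooling:

```python
import json
import urllib.parse
import urllib.request

GITHUB_SEARCH = "https://api.github.com/search/issues?q="

def search_url(repo, label):
    """Build a GitHub search-API URL for open PRs in `repo` carrying `label`."""
    query = "repo:{} is:pr is:open label:{}".format(repo, label)
    # Percent-encode the query (spaces, colons, etc.) so the URL is valid.
    return GITHUB_SEARCH + urllib.parse.quote(query)

def open_prs_with_label(repo, label):
    """Fetch the matching open PRs (requires network access)."""
    with urllib.request.urlopen(search_url(repo, label)) as resp:
        return json.load(resp)["items"]

# e.g. open_prs_with_label("apache/incubator-mxnet", "pr-awaiting-review")
# yields a list of issue/PR dicts with keys such as "number" and "title".
```

The same query string works verbatim in the GitHub web UI's search box, so the label scheme serves both interactive and scripted triage.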


On Thu, Aug 9, 2018 at 9:15 AM, sandeep krishnamurthy <
sandeep.krishn...@gmail.com> wrote:

> Hello Community,
>
> Thanks to our committers - Anirudh, Naveen and Haibin, we have now labelled
> all PRs raised in Jul/Aug (~80 PRs).
> PRs are labelled with their current state (pr-awaiting-review,
> pr-awaiting-merge, etc.) and the components/functionality (Gluon, MKLDNN,
> Bugfix, Backend, etc.) addressed in the PR. It should now be easy for you
> to filter PRs by your area of interest or expertise, or by those awaiting
> review.
>
> For example, we have around 50 PRs awaiting reviews -
> https://github.com/apache/incubator-mxnet/pulls?utf8=%
> E2%9C%93=is%3Apr+is%3Aopen+label%3Apr-awaiting-review
>
> Looking forward to all your contributions.
>
> Best,
> Sandeep
>
> On Wed, Aug 8, 2018 at 3:02 PM sandeep krishnamurthy <
> sandeep.krishn...@gmail.com> wrote:
>
> > @Sheng - Thanks for the feedback. I agree the PR template provides that
> > info. But one major drawback is the inability to filter, group, and take
> > actions (review, merge, ping the corresponding contributors). And once we
> > have deployed the labelling bot (hopefully soon), we should be able to
> > remove the dependency on a committer for labelling.
> > Suggestions?
> >
> > @Naveen - Thanks. I will make the change.
> >
> > On Wed, Aug 8, 2018 at 2:55 PM Naveen Swamy  wrote:
> >
> >> I suggest changing pr-ready-to-merge to pr-awaiting-merge; that makes it
> >> easy to pick out all PR-related statuses.
> >>
> >> I also think `(then merge)` is not necessary in pr-awaiting-testing.
> >>
> >> On Wed, Aug 8, 2018 at 2:44 PM, sandeep krishnamurthy <
> >> sandeep.krishn...@gmail.com> wrote:
> >>
> >> > << Sorry sent too early>>
> >> > Hello Community,
> >> >
> >> > Recently, we have been observing a growing number of open PRs {pending
> >> > review, pending updates, ready to merge but waiting, and more}.
> >> >
> >> > A few of us committers (Naveen, Haibin, Anirudh, and me) and
> >> > contributors (Steffen and Hagay) met to discuss how to improve the PR
> >> > review process and allow more people to join it.
> >> >
> >> > To shed some light on numbers:
> >> >
> >> > *(As of 6-Aug-2018)*
> >> >
> >> >- Total open PRs - 113 - Link
> >> ><https://github.com/apache/incubator-mxnet/pulls>
> >> >- Total open PRs with No Reviews - 94 - Link
> >> ><https://github.com/apache/incubator-mxnet/pulls?q=is%
> >> > 3Apr+is%3Aopen+review%3Anone>
> >> >(*Note:* Out of these, 72 PRs have comments. This count is for formal
> >> >reviews only: approve/request changes, etc.)
> >> >
> >> >
> >> >- Changes Requested and awaiting contributors to update - 8 - Link
> >> ><https://github.com/apache/incubator-mxnet/pulls?q=is%
> >> > 3Apr+is%3Aopen+review%3Achanges-requested>
> >> >- Oldest PR - Jan 19, 2018 - PR
> >> ><https://github.com/apache/incubator-mxnet/pull/9496>
> >> >
> >> > One important issue observed is the "*inability to filter PRs based on
> >> > state and component*". One suggested solution is to "*label the PRs*"
> >> > the way we label issues. This will allow community members to filter by
> >> > area of interest and add reviews, and committers to filter by state and
> >> > take the necessary action.
> >> >
> >> > In this direction, I have created following 4 new labels.
> >> >
> >> > Please let us know your suggestions, and this is open for feedback and
> >> > changes.
> >> >
> >> >
> >> > -
> >> > pr-awaiting-review
> >> > <https://github.com/apache/incubator-mxnet/labels/pr-awaiting-review>
> >> > PR is waiting for code review
> >> >  Edit Delete
> >> > - pr-awaiting-response

Re: Suggestions for Design Proposal Template

2018-08-08 Thread Naveen Swamy
Hi Patric,

The design template is some preliminary work that I did and plan to propose
as part of the PR best practices (criteria). I am still working on the PR
document.

Thanks for the feedback on the design template; you make a great point. I
discussed a similar idea with a few contributors nearby (Steffen, Andrea,
Da, Haibin, Kellen) about adding a Shepherd to every medium and large
feature (what makes a feature medium/large is yet to be discussed and agreed
upon), so authors can partner with another contributor to take their PRs to
completion. This will also broaden understanding of the code base.

Your suggestion aligns very closely with what I was thinking; I will add a
section called `Shepherd` to the template. I would suggest that the feedback
happen on the cwiki document (you can double-click a line to leave inline
feedback) or on the dev@ list, and that the author apply the feedback to the
design. I think the decision of who would (or should) shepherd a feature
should also happen on dev@ as part of the design discussion.

Yes, I agree that suggestions/feedback should be timely; whether the window
should be one or two weeks can be discussed and agreed upon here.

And also thank you for writing the design documents.

I will come back shortly with a more concrete set of proposals on dev@;
meanwhile, if others have suggestions, I am happy to take them.

Thanks, Naveen


On Wed, Aug 8, 2018 at 8:04 PM, Zhao, Patric  wrote:

> Hi MXNet owner,
>
> We (Intel engineers) have already written up several design proposals and
> published them on cwiki.
> So I really like this document, and it makes things very clear.
> https://cwiki.apache.org/confluence/display/MXNET/
> Apache+MXNet+Design+Proposal+Template
>
> Furthermore, I suggest adding a section for "feedback from the MXNet
> owners (committers)".
> It would be better to assign the proposal to the committers and write the
> committer's name in the doc.
> The committer owning it should give a clear suggestion/decision about the
> proposal within a time window (maybe two weeks).
>
> I know this will take extra effort from the committers and owners, but it
> can make the whole project more efficient, and we will have a clear goal.
>
>
> Thanks,
>
> --Patric
>
>
>
>
>
>
>
>

