Re: [DISCUSS] 1.5.0 Release Plan

2019-05-31 Thread Haibin Lin
Hi dev@,

Quick update on the GluonNLP issue. Lai and I worked together to test
GluonNLP and MXNet with different configurations, and found that the use of
the GELU operator in fp16 is causing the divergence. It was a very recent
change in GluonNLP, and it can be avoided by reverting that change in
GluonNLP. This no longer blocks the 1.5 release.
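
To make the precision concern concrete, here is a minimal sketch of how a
GELU evaluated with half-precision rounding can differ from a
double-precision reference. It is only an illustration under stated
assumptions: the helper names (to_fp16, gelu, gelu_fp16) are hypothetical,
the erf-based GELU formula is used, and rounding is emulated with the
standard library. It is not the MXNet or GluonNLP operator and does not
claim to reproduce the reported divergence.

```python
import math
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def gelu(x: float) -> float:
    """Reference GELU in double precision: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_fp16(x: float) -> float:
    """Same formula, but every intermediate result is rounded to fp16."""
    x = to_fp16(x)
    scaled = to_fp16(x / to_fp16(math.sqrt(2.0)))
    cdf_term = to_fp16(1.0 + to_fp16(math.erf(scaled)))
    return to_fp16(to_fp16(0.5 * x) * cdf_term)


if __name__ == "__main__":
    for x in (-4.0, -2.0, -0.5, 0.5, 2.0):
        ref = gelu(x)
        half = gelu_fp16(x)
        print(f"x={x:+.1f}  ref={ref:+.6e}  fp16={half:+.6e}  "
              f"abs_err={abs(ref - half):.2e}")
```

In this sketch the negative tail is the interesting part: around x = -4 the
erf term rounds to -1 in half precision, so the fp16 result collapses to
zero while the reference value is small but nonzero.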

Best,
Haibin

On Thu, May 30, 2019 at 11:33 AM Lai Wei wrote:

> Hi dev@,
>
> Quick update on the 1.5.0 release: all previously tracked PRs have been
> merged and CI is back to normal, so please rebase your PRs.
> Again, I would like to encourage downstream projects to test against the
> latest MXNet now to discover bugs and regressions early. I really
> appreciate your help.
>
> We still have 3 new open issues/PRs to track:
> 1. The GluonNLP BERT training issue Haibin mentioned
> 2. https://github.com/apache/incubator-mxnet/pull/15039
> 3. https://github.com/apache/incubator-mxnet/pull/15097
>
> Thanks!
>
> Best Regards
>
> Lai
>
>
> On Tue, May 28, 2019 at 9:32 AM Haibin Lin wrote:
>
> > Hi dev@,
> >
> > I was testing GluonNLP with MXNet master, and found that BERT training
> > crashes a few hours after I launch the job. I can confirm that MXNet pip
> > package 20190412 works fine. I am bisecting changes in MXNet/GluonNLP to
> > check what causes the problem. I'll send an update as soon as I find the
> > root cause, or if I find any workaround.
> >
> > Thanks,
> > Haibin
> >
> > On Thu, May 23, 2019 at 2:12 AM Lin Yuan wrote:
> >
> > > Hi Lai,
> > >
> > > One important PR is currently blocked by a flaky TensorRT test:
> > >
> > > https://github.com/apache/incubator-mxnet/pull/15041
> > >
> > > I have retriggered it several times. If it fails again, I may need the
> > > CI team to help disable this test. It has been reported by multiple
> > > people:
> > > https://github.com/apache/incubator-mxnet/issues/14978
> > >
> > > Thanks,
> > >
> > > Lin
> > >
> > > On Wed, May 22, 2019 at 11:38 PM Zhao, Patric wrote:
> > >
> > > > Thanks, Lai.
> > > >
> > > > With great help from the community, all PRs listed in the roadmap
> > > > are done :)
> > > >
> > > >
> > > > https://github.com/apache/incubator-mxnet/issues/14619#issuecomment-480110642
> > > >
> > > > Status update for the list below:
> > > >
> > > >  - [1] PR#14713 is almost done and is waiting for internal validation
> > > >    results
> > > >  - [2] PR#14893 is merged
> > > >  - [3] PR#15031 is merged
> > > >  - [7] PR#15038 is a new PR to fix the bug in the C++ interface; it
> > > >    will be merged soon after review.
> > > >
> > > > Feel free to let me know if there is anything our team can help with :)
> > > >
> > > > BR,
> > > >
> > > > --Patric
> > > >
> > > > > -----Original Message-----
> > > > > From: Lai Wei [mailto:roywei...@gmail.com]
> > > > > Sent: Thursday, May 23, 2019 6:05 AM
> > > > > To: dev@mxnet.incubator.apache.org
> > > > > Subject: Re: [DISCUSS] 1.5.0 Release Plan
> > > > >
> > > > > Hi @dev,
> > > > >
> > > > > Thanks for working hard on the 1.5 release. Since there have been
> > > > > several release blockers (mostly fixed), we are extending the code
> > > > > freeze to Friday 05/22/2019. Right now we are tracking the following
> > > > > 5 open PRs [1][2][3][4][5] and 1 issue [6]. Please let us know if
> > > > > you need more time.
> > > > >
> > > > > I would like to encourage all downstream projects to test with the
> > > > > latest MXNet to avoid any incompatibility in the coming 1.5.0
> > > > > release. If you have any issues that may block the release, please
> > > > > let us know.
> > > > > Thank you very much.
> > > > >
> > > > > [1] https://github.com/apache/incubator-mxnet/pull/14713
> > > > > [2] https://github.com/apache/incubator-mxnet/pull/14893
> > > > > [3] https://github.com/apache/incubator-mxnet/pull/15031
> > > > > [4] https://github.com/apache/incubator-mxnet/pull/15039
> > > > > [5] https://github.com/apache/incubator-mxnet/pull/15041
> > > > > [6] https://github.com/apache/incubator-mxnet/issues/15034
> > > > >
> > > > >
> > > > > Best Regards
> > > > >
> > > > > Lai
> > > > >
> > > > >
> > > > > On Wed, May 15, 2019 at 9:05 PM Junru Shao <junrushao1...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi folks,
> > > > > >
> > > > > > I may have a release blocker for 1.5.0 concerning the
> > > > > > implementation of the dynamic shape mechanism, which somehow
> > > > > > conflicts with Gluon's deferred initialization [1].
> > > > > >
> > > > > > [1] https://github.com/dmlc/gluon-nlp/issues/706
> > > > > >
> > > > > > On Wed, May 15, 2019 at 12:09 PM Anirudh Subramanian
> > > > > > <anirudh2...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Lai,
> > > > > > >
> > > > > > > From the discussion I had with Nvidia offline, they are aiming
> > > > > > > to push the required changes today.
> > > > > > > Since this is an important feature for the release, if this gets
> > > > > > > delayed and cannot be merged by 05/17/2019, the code 

cwiki permissions change

2019-05-31 Thread Sheng Zha
Hi,

I recently came across the permission settings page and found it slightly
chaotic. Namely, some non-committers were given admin access while some
committers don't have delete rights. Permissions were also not granted
consistently across contributors.

I made the following changes:
- Committers will receive permissions through the group permission of "mxnet".
This includes every bit except the admin bits and delete own.
- PMC members will receive permissions through the group permission of
"mxnet-pmc". This includes every bit except delete own (same as
confluence-administrators).

No manual settings should be required for committers and PMC members after this 
change.

Also, I suggest the following settings when adding contributors who would like
to contribute to the MXNet cwiki.

All Pages:     View √         Delete Own ×
Blog:          Add √          Delete ×
Attachments:   Add √          Delete ×
Comments:      Add √          Delete ×
Restrictions:  Add/Delete √
Mail:          Delete ×
Space:         Export ×       Admin ×

-sz


Re: Does internal quality matters to users?

2019-05-31 Thread Tianqi Chen
Good infrastructure design goes a long way and has a profound impact
on the project itself. That is why we always want to rethink whether the
interface can be done better, and think about the next possible
infrastructure to make things better. Refactoring is certainly part of it.

There are usually two types of refactoring we refer to:
1) Major design changes, in terms of class relations and data structures
(e.g. numpy support, adding compilation to new hardware)
2) Specific choices of API and programming style (more types or a
type-erased program)

(1) affects the long-term support of the project, introduces new features
if necessary, and needs a lot of thought. I believe the general
IR, compilation and numpy support belong to that category.

I would particularly like to talk about (2).
Because there is no single correct answer in software engineering,
different developers may hold different views on a certain problem.
Some of these have to do with developers' taste. A change could
favor a certain aspect of the project, but not necessarily another part.
Refactoring with respect to these sometimes requires a more thoughtful
conversation and a reasonable compromise.

For example, we recently discussed whether to introduce more typing into
the code base, to the extent that the base data structure could be
templatized.
- The pros of this approach:
   - It introduces more typing and compile-time error messages (instead of
runtime checking), which could help developers find problems earlier.
- The cons of this approach:
   - Having a template in the base data structure causes ABI problems (code
generated by DLL A vs. DLL B) and may cause future issues.
   - Templates sometimes confuse developers.
   - For serialization, it is hard to anticipate all kinds of classes, and
it is easier to have one class (any) that handles polymorphism.
   - Because most frontends (Python) are dynamic, it is easier to
interface them with a type-erased API.

As we can see, there are pros and cons to bringing in more typing,
and there is no single answer.
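
The trade-off between static typing and type erasure listed above can be
sketched in a few lines. This is only an illustration in Python, not MXNet
or TVM code: the class names TypedBox and ErasedBox are hypothetical, and
Python's type hints stand in for C++ templates (a static checker such as
mypy plays the role of the compiler).

```python
from typing import Any, Generic, Type, TypeVar

T = TypeVar("T")


class TypedBox(Generic[T]):
    """Statically typed container: a type checker knows the element type."""

    def __init__(self, value: T) -> None:
        self.value = value

    def get(self) -> T:
        return self.value


class ErasedBox:
    """Type-erased container: the element type is only checked at runtime."""

    def __init__(self, value: Any) -> None:
        self._value = value

    def get(self, expected: Type[T]) -> T:
        # The mismatch surfaces here, when the code runs, not when it is checked.
        if not isinstance(self._value, expected):
            raise TypeError(
                f"expected {expected.__name__}, got {type(self._value).__name__}"
            )
        return self._value


if __name__ == "__main__":
    typed: TypedBox[int] = TypedBox(3)
    erased = ErasedBox(3)
    print(typed.get() + 1)
    print(erased.get(int) + 1)
    try:
        erased.get(str)
    except TypeError as err:
        print("runtime check:", err)
```

The typed variant lets tooling reject a misuse before the program runs,
while the erased variant exposes one uniform class that is simpler to
serialize and to call from a dynamic frontend, at the cost of runtime checks.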
One good example of a nice infrastructure design trade-off is DLPack:
https://github.com/dmlc/dlpack/blob/master/include/dlpack/dlpack.h
This is a base data structure adopted by MXNet, PyTorch, Chainer, and many
other frameworks.
It is a type-erased data structure that erases the data type and memory
allocator from the data structure, and it is designed to exchange tensors
(coming from different memory allocators) across DLL boundaries.
As you can see, this is a good example of a type-erased data structure.
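
To show what erasing the data type from a tensor can look like, here is a
minimal sketch of a descriptor that carries raw bytes plus a runtime type
tag and shape. It is loosely inspired by the idea behind DLPack but is not
the DLPack API: ErasedTensor, KIND_INT, KIND_FLOAT and to_list are
hypothetical names, and the memory allocator and device information that the
real header also describes are left out.

```python
import struct
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical runtime type tags (the real header defines its own enum).
KIND_INT = 0
KIND_FLOAT = 2

# Maps (kind, bits) to a struct format character for decoding.
_FORMATS = {
    (KIND_INT, 32): "i",
    (KIND_FLOAT, 32): "f",
    (KIND_FLOAT, 64): "d",
}


@dataclass
class ErasedTensor:
    """One class for every element type: the type lives in fields, not in the class."""

    data: bytes              # raw little-endian buffer, owner-agnostic
    kind: int                # runtime type tag
    bits: int                # element width in bits
    shape: Tuple[int, ...]   # logical shape


def to_list(tensor: ErasedTensor) -> List[float]:
    """Decode the buffer into a flat list by consulting the runtime type tag."""
    count = 1
    for dim in tensor.shape:
        count *= dim
    fmt = "<" + str(count) + _FORMATS[(tensor.kind, tensor.bits)]
    return list(struct.unpack(fmt, tensor.data))


if __name__ == "__main__":
    floats = ErasedTensor(struct.pack("<3f", 1.0, 2.0, 3.0), KIND_FLOAT, 32, (3,))
    ints = ErasedTensor(struct.pack("<4i", 1, 2, 3, 4), KIND_INT, 32, (2, 2))
    print(to_list(floats))
    print(to_list(ints))
```

Because both tensors share one class, a consumer on the other side of a
library boundary needs to agree only on this single layout, not on a family
of templated types.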

When we face this kind of question, it is important to have a good
conversation. Sometimes we have to make tradeoffs rather than bend
everyone else to our will. This is what open source is about.
I would also like to give some examples of conversations and how design
decisions were resolved. They come from the TVM community's recent discussion
about VM design.
I paste the GitHub issue links here directly for the sake of
clarity (note that all the conversations are also mirrored to dev@tvm).
The background is that the community wants to bring in a virtual machine that
can execute dynamic operations more effectively.

- The initial proposal, made by one of the committers, gave a detailed
design based on a stack VM: https://github.com/dmlc/tvm/issues/2810
   - As you can see, there was quite some discussion about whether we
want to use a different design, in this case a register-based
version.
   - The conversation evolved, and while the community members disagreed on
some points, they also agreed with each other on particular tradeoffs.
- After some discussion, the committers brought a tradeoff design that tries
to consolidate the needs of both sides, and this is the final solution that
was adopted: https://github.com/dmlc/tvm/issues/2915
I would like to particularly highlight that: 1) there are
disagreements in the development process; 2) developers work together to
understand each other's needs and then reach consensus on a perhaps better
design.

There are two other conversations between Pedro and myself,
which took place during his contributions.
- https://github.com/dmlc/tvm/pull/3037 In this case, I raised a concern
about API consistency, and Pedro explained why he thought his approach was
the better idea. I agreed and we merged the PR.
- https://github.com/dmlc/tvm/pull/3108 In this other case, there were
technical reasons on both sides for the case of MXNet. We listed the
pros/cons of both sides and had a constructive conversation.
Eventually, I decided not to merge the PR after weighing all the cases.

I believe both are useful conversations, and while Pedro and I disagree
sometimes, we do agree on many other cases. The most crucial part is
having a constructive conversation.
To summarize, I do think refactoring and making things better are certainly
important to make the project 

Re: Does internal quality matters to users?

2019-05-31 Thread Isabel Drost-Fromm



On May 31, 2019 at 14:13:30 CEST, Pedro Larroy wrote:
> I think Martin does a very good job explaining why refactoring,
> reducing developer frustration and internal improvement is a crucial
> productivity multiplier which includes lower cost to ship features,
> fewer bugs and less time spent debugging.

There's one aspect that's special for open source projects: if a project wants
to survive long term, it should make it easy for people to get started working
on the project. In my experience, refactoring and cleanup play an important
role in that. So thanks also for making it easier to recruit new contributors.

Isabel
-- 
This message was sent with K-9 from a mobile device with swipe to type enabled. 
I'm sorry for any embarrassing typos that slipped through.


Does internal quality matters to users?

2019-05-31 Thread Pedro Larroy
Hi folks

I would like to share this interesting article. I had a few conversations
with members of the community about refactors and internal enhancements
that have no apparent impact on user experience and are sometimes called into
question. I think Martin does a very good job explaining why refactoring,
reducing developer frustration and internal improvement is a crucial
productivity multiplier which includes lower cost to ship features, fewer
bugs and less time spent debugging.

https://martinfowler.com/articles/is-quality-worth-cost.html

The points expressed in the article coincide with my experience of more
than 15 years developing industrial software and working in very diverse
codebases.

I think some of you share these views and have made important contributions
in these directions, including CI, build system, testing, tooling,
automation, delivery and binary releases, etc. So I take this opportunity to
recognize these efforts and express a big thank you and my admiration.

I'm interested to read about your opinions.

Happy weekend and TGIF.

Pedro.


Re: Join the MXNet slack channel

2019-05-31 Thread Tao Lv
Hi Yan Zhe,

Invite is sent. You can find the `mxnet` channel in the ASF workspace.

-tao

On Fri, May 31, 2019 at 3:15 PM 严哲 wrote:

> Dear MXNet community,
>
> I would like to join the MXNet Slack channel. Could you please add me?
>
> Thanks & Best Regards,
>
> Yan Zhe

Join the MXNet slack channel

2019-05-31 Thread 严哲
Dear MXNet community,

I would like to join the MXNet Slack channel. Could you please add me?

Thanks & Best Regards,

Yan Zhe