MKLDNN Building Discussion in MXNET

2018-03-20 Thread Zhao, Patric
Hi MXNET developers,

Since MKL-DNN was integrated into MXNet master last month, we have seen
some confusion about how to build MXNet with MKL-DNN and Intel MKL.
Several GitHub issues were opened, and most of them have been fixed, but I
think we still need to define a clear build flow for further development.

We are starting this thread to discuss the best build flow. A slide is attached
to the GitHub issue:
https://github.com/apache/incubator-mxnet/issues/10175

Any suggestions and comments (or PRs) are highly appreciated.

BR,

Thanks,

--Patric



Join Mxnet Slack Channel

2018-03-20 Thread Heli Wang
Hello,

I want to join the Slack channel for my onnx-mxnet contribution. I would
appreciate it if you could send me the invitation link.

Thanks,
Heli


[LAZY VOTE][RESULT] Upgrade CI to CUDA 9.1 with CuDNN 7.0

2018-03-20 Thread Marco de Abreu
Hello,

the results of this vote are as follows:

+1:
Jun
Anirudh
Hao
Marco

0:
Chris

-1:
Naveen (veto recalled as of
https://lists.apache.org/thread.html/242db72a0c96349ef6e0ff1d3b1fe0dc7f7a9082532724c3293666c5@%3Cdev.mxnet.apache.org%3E
)

Under the constraint that we will use CUDA 8 on Windows and CUDA 9.1 on
UNIX slaves and work on integration tests for CUDA 8 in the long term, this
vote counts as PASSED.

The PR for this change is available at
https://github.com/apache/incubator-mxnet/pull/10108. I have developed and
tested the new slaves in our test environment and everything looks
promising so far. The plan is as follows:

   1. Get https://github.com/apache/incubator-mxnet/pull/10108 approved to
   allow self-merge – CI can’t pass until slaves have been upgraded.
   2. Replace all existing slaves with new upgraded slaves.
   3. Retrigger https://github.com/apache/incubator-mxnet/pull/10108 to
   merge necessary changes into master.

IMPORTANT: The migration will happen tomorrow, so please expect some delay
in job execution - the CI website will be unaffected. Ideally, no jobs
should fail - in case they do, please feel free to retrigger them by using
an empty commit. In case of any errors appearing after the upgrade, don't
hesitate to contact me!
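
A minimal sketch of the empty-commit retrigger mentioned above, assuming a local clone with the PR branch checked out. The helper names, the default commit message, and the `origin` remote are illustrative assumptions and not part of the CI setup described here; `git commit --allow-empty` is the standard Git option for creating a commit without file changes.

```python
import subprocess

def empty_commit_command(message="Retrigger CI"):
    """Build the git command for a commit that changes no files."""
    return ["git", "commit", "--allow-empty", "-m", message]

def push_command(remote="origin", ref="HEAD"):
    """Build the git command that pushes the current branch to the remote."""
    return ["git", "push", remote, ref]

def retrigger_ci(repo_dir="."):
    """Create an empty commit and push it so the CI picks up a new revision.

    Assumes repo_dir is a git checkout of the PR branch (hypothetical setup).
    """
    subprocess.run(empty_commit_command(), cwd=repo_dir, check=True)
    subprocess.run(push_command(), cwd=repo_dir, check=True)
```

Calling `retrigger_ci()` from inside the checkout would add one empty commit to the branch; the two command builders are kept separate so the commands can be inspected before anything is run.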

Best regards,
Marco

On Tue, Mar 20, 2018 at 1:39 AM, Naveen Swamy wrote:

> Yes, for short-term.
>
> On Monday, March 19, 2018, Chris Olivier wrote:
>
> > In the short term, Naveen, are you ok with Linux running CUDA 9 and Windows
> > CUDA 8 in order to get CUDA version coverage?
> >
> > On 2018/03/16 21:09:09, Marco de Abreu wrote:
> > > Thanks for your input. How would you propose to proceed in terms of a
> > > timeline in case this vote succeeds? I don't really have time to work on a
> > > nightly setup right now. Would anybody in the community be able to help me
> > > out here or shall we wait with the migration until a nightly setup for CUDA
> > > 8 is up?
> > >
> > > -Marco
> > >
> > > On Fri, Mar 16, 2018 at 9:55 PM, Bhavin Thaker wrote:
> > >
> > > > +1 to the suggestion of testing CUDA8 in a few nightly instances and using
> > > > CUDA9 for most instances in CI.
> > > >
> > > > Bhavin Thaker.
> > > >
> > > > On Fri, Mar 16, 2018 at 12:37 PM Naveen Swamy wrote:
> > > >
> > > > > I think it's best to add support for CUDA 9.0 while retaining existing
> > > > > support for CUDA 8; code might regress when you remove it, which would
> > > > > create more work to add CUDA 8 support back.
> > > > >
> > > > > On Fri, Mar 16, 2018 at 9:29 AM, Marco de Abreu <
> > > > > marco.g.ab...@googlemail.com> wrote:
> > > > >
> > > > > > Yeah, sorry Chris, mixed up the names.
> > > > > >
> > > > > > @Naveen: Would you be fine with doing the switch now and adding integration
> > > > > > tests later or is this a hard constraint for you?
> > > > > >
> > > > > > On Wed, Mar 14, 2018 at 6:39 PM, Chris Olivier <cjolivie...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Isn't the Titan V the Volta and not the Tesla?
> > > > > > >
> > > > > > > On Wed, Mar 14, 2018 at 10:36 AM, Naveen Swamy <mnnav...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Marco,
> > > > > > > > My -1 vote is against dropping support for CUDA 8, not against adding
> > > > > > > > CUDA 9. CUDA 9.0 support for MXNet was added on Oct 30, 2017, and I think
> > > > > > > > that some users might not have switched to CUDA 9.0 yet.
> > > > > > > >
> > > > > > > > Look at the earlier discussion on the same topic
> > > > > > > >
> > > > > > > > https://lists.apache.org/thread.html/27b84e4fc0e0728f2e4ad8b6827d7f996635021a5a4d47b5d3f4dbfb@%3Cdev.mxnet.apache.org%3E
> > > > > > > >
> > > > > > > > On Wed, Mar 14, 2018 at 10:14 AM, Marco de Abreu <
> > > > > > > > marco.g.ab...@googlemail.com> wrote:
> > > > > > > >
> > > > > > > > > Right, the code changes would not be validated against CUDA 8.0 as part
> > > > > > > > > of the PR process.
> > > > > > > > >
> > > > > > > > > I don't have any numbers, but it's pretty unlikely that anybody is still
> > > > > > > > > using CUDA 8.0. According to
> > > > > > > > > https://en.wikipedia.org/wiki/CUDA#GPUs_supported, the devices which are
> > > > > > > > > not supported by CUDA 9 are those of the Fermi architecture, which
> > > > > > > > > was released in April 2010. These GPUs are way too old, so I think we're
> > > > > > > > > safe with not covering them specifically - this does not mean we're
> > > > > > > > > entirely deprecating them.
> > > > > > > > >
> > > > > > > > > One thing to note here is that we're not testing CUDA 9 as of now.
> > > > > > > > > 

[VOTE] [RESULT] Change Scala namespace from dmlc to org.apache

2018-03-20 Thread Chris Olivier
Vote to change Scala namespace to org.apache from dmlc:

This vote passes with the following tally:

+1: 3 votes
-1: 0 votes
-0: 2 votes

*Regarding -0 votes: Invalid votes (both +1 and -1)*

   - *I researched this and I can't find anything that mentions a
   conditional vote (+1 if this, -1 if that).*
   - *They add up to zero, so treating them as -0.*
   - *See voting rules: https://www.apache.org/foundation/voting.html*
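
The tally rule described above can be sketched as follows. This only illustrates the arithmetic: the voter names are placeholders, and counting a conditional vote (both +1 and -1 from the same voter) as -0 is the interpretation stated in this message, not a rule taken from the Apache voting page.

```python
from collections import Counter

def tally(ballots):
    """Count votes; a voter who cast both +1 and -1 is recorded as -0."""
    counts = Counter({"+1": 0, "-1": 0, "-0": 0})
    for cast in ballots.values():
        if "+1" in cast and "-1" in cast:
            counts["-0"] += 1      # conditional vote: the two halves cancel out
        else:
            counts[cast[0]] += 1   # ordinary single vote
    return counts

# Placeholder voters reproducing the tally reported in this message.
ballots = {
    "voter_a": ["+1"],
    "voter_b": ["+1"],
    "voter_c": ["+1"],
    "voter_d": ["+1", "-1"],
    "voter_e": ["+1", "-1"],
}
result = tally(ballots)
print(dict(result))  # {'+1': 3, '-1': 0, '-0': 2}
```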


-Chris


Re: Display the master branch website in default for a period

2018-03-20 Thread Aaron Markham
Embedding images doesn't work, I guess. Here's a link to the chart.
https://drive.google.com/file/d/1GzxDEhF6tx9_bsFc4LuUzgoIaMSiEV5s/view?usp=sharing


On Tue, Mar 20, 2018 at 9:24 AM, Aaron Markham wrote:

> I would like to get a sense of people's feelings with regard to actual
> data on the usage of the site.
> *Users prefer master*
> Despite it defaulting to version 1.1.0, nearly 60% of the page views are
> on master. I think it is pretty clear what the website users want.
> I've implemented a new site building script
> that can easily swap the default version to any tag, and hope to get that
> integrated with the CI process this week. We could go live with master
> being default whenever it is agreed.
>
> *Time-travel website is difficult to maintain, confuses search, and is a
> strange user experience*
> Switching versions on the whole site and maintaining these time-traveling
> website builds is not really worth the effort, and these old sites
> introduce a lot of problems with search and poor information. (+1 to
> Christopher's sentiments on UX).
> I think that we should keep the versions of MXNet limited to the install
> page(s) and the API docs. This is where versions have value and user
> traffic. This accounts for over 50% of all traffic in the legacy versions,
> and 80%+ of this traffic is coming directly from search, primarily Google.
> We can enhance visibility into these data points by enabling the Google
> Analytics Search Console, but in the absence of that, experience suggests that
> if old content is available and linked to, it reinforces itself despite it
> being deprecated or even inaccurate. By stashing versioned content where it
> is accessible, but not highlighted, we'll force the search engines to
> update their links, and over time, any dent to the ranking of search
> results will heal, and the site's primary and current content will appear
> at the top of the results.
>
> *Maintaining tutorials and examples in master is easier, will help search,
> and provide a better user experience*
> There's another 15-25% of legacy version traffic going to tutorials, and
> all of it comes from search. These old tutorials are not maintained and
> while they might theoretically work with the specific API version they're
> coupled with in the build, they are also riddled with broken links, missing
> datasets and Python 3 incompatibilities. IMO, we should flag each tutorial
> in master with the minimum required API version and Python version, and no
> longer support legacy tutorials as a matter of course. If someone wants to
> fix them, then great. They can make the tutorial in master backwards
> compatible, or create a separate tutorial that focuses on the legacy
> version. But it is maintained in master. This shift will force search to
> update, guiding users to working tutorials and fresh content.
>
> In conclusion there are three overlapping proposals here.
> 1. Make master primary.
> 2. Remove time travel from the website. Provide specific instructions on
> installing master, current release, and a subset of legacy versions.
> Provide versioned API docs on the website. People can still download tagged
> releases and build the old site and docs if they wish.
> 3. Maintain tutorials and examples in master.
>
> Cheers,
> Aaron
>
> P.S. Any moves or removal of content will be handled by 301 permanent
> redirects, so we can soften the transition.
>
>
> On Mar 1, 2018 19:55, "Barber, Christopher" wrote:
>
>> I was thinking more along the lines of benchmarks of MXNet vs TensorFlow,
>> PyTorch, and Caffe2. Benchmarks of edge devices would definitely be
>> interesting, but I would also want to see benchmarks of training time and
>> memory use and accuracy on large models. Obviously this would be a
>> non-trivial amount of work, which is why no one else is doing it, but there
>> would be a lot of interest in this. I would also like to see benchmarks
>> of ndarray vs symbol vs gluon.
>>
>> But yes, if you want to drive traffic to the website you should have
>> content that changes frequently. I have to say I find it really strange to
>> have the entire website change when I select a different version from the
>> top tab. The design of the website should be independent of the code
>> version.
>>
>> - Christopher
>>
>> On 3/1/18, 4:33 PM, "Marco de Abreu" wrote:
>>
>> As far as I know, there are plans to make regular benchmarks and generate
>> statistics. We could use that data. My personal task after CI is creating
>> an infrastructure to automatically perform performance and power
>> consumption benchmarks on edge devices (raspberry and Nvidia Jetson). It
>> would definitely be a good idea to share this data with the community
>> (especially considering the impressive performance of 

Re: Display the master branch website in default for a period

2018-03-20 Thread Aaron Markham
I would like to get a sense of people's feelings with regard to actual data
on the usage of the site.
*Users prefer master*
Despite it defaulting to version 1.1.0, nearly 60% of the page views are on
master. I think it is pretty clear what the website users want.
I've implemented a new site building script
that can easily swap the default version to any tag, and hope to get that
integrated with the CI process this week. We could go live with master
being default whenever it is agreed.

*Time-travel website is difficult to maintain, confuses search, and is a
strange user experience*
Switching versions on the whole site and maintaining these time-traveling
website builds is not really worth the effort, and these old sites
introduce a lot of problems with search and poor information. (+1 to
Christopher's sentiments on UX).
I think that we should keep the versions of MXNet limited to the install
page(s) and the API docs. This is where versions have value and user
traffic. This accounts for over 50% of all traffic in the legacy versions,
and 80%+ of this traffic is coming directly from search, primarily Google.
We can enhance visibility into these data points by enabling the Google
Analytics Search Console, but in the absence of that, experience suggests that
if old content is available and linked to, it reinforces itself despite it
being deprecated or even inaccurate. By stashing versioned content where it
is accessible, but not highlighted, we'll force the search engines to
update their links, and over time, any dent to the ranking of search
results will heal, and the site's primary and current content will appear
at the top of the results.

*Maintaining tutorials and examples in master is easier, will help search,
and provide a better user experience*
There's another 15-25% of legacy version traffic going to tutorials, and
all of it comes from search. These old tutorials are not maintained and
while they might theoretically work with the specific API version they're
coupled with in the build, they are also riddled with broken links, missing
datasets and Python 3 incompatibilities. IMO, we should flag each tutorial
in master with the minimum required API version and Python version, and no
longer support legacy tutorials as a matter of course. If someone wants to
fix them, then great. They can make the tutorial in master backwards
compatible, or create a separate tutorial that focuses on the legacy
version. But it is maintained in master. This shift will force search to
update, guiding users to working tutorials and fresh content.
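
As an illustration of flagging each tutorial with a minimum API version and Python version, here is a minimal sketch. The `flags` dictionary, its key names, and the helper names are hypothetical; the proposal above does not specify how the flags would be stored or checked.

```python
import sys

def parse_version(text):
    """Turn '1.1.0' into (1, 1, 0) so versions compare numerically.

    Assumes plain dotted numeric versions (no suffixes such as 'b1').
    """
    return tuple(int(part) for part in text.split("."))

def unmet_requirements(flags, api_version, python_version=None):
    """Return a list of human-readable problems; empty means the tutorial can run."""
    if python_version is None:
        python_version = tuple(sys.version_info[:2])
    problems = []
    if parse_version(api_version) < parse_version(flags["min_api"]):
        problems.append("API %s is older than required %s"
                        % (api_version, flags["min_api"]))
    if tuple(python_version) < parse_version(flags["min_python"]):
        problems.append("Python %s is older than required %s"
                        % (".".join(map(str, python_version)), flags["min_python"]))
    return problems

# Hypothetical flags for one tutorial in master.
flags = {"min_api": "1.1.0", "min_python": "3.5"}
```

A tutorial page or a CI job could call `unmet_requirements(flags, installed_api_version)` and show the returned messages instead of failing halfway through the tutorial.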

In conclusion there are three overlapping proposals here.
1. Make master primary.
2. Remove time travel from the website. Provide specific instructions on
installing master, current release, and a subset of legacy versions.
Provide versioned API docs on the website. People can still download tagged
releases and build the old site and docs if they wish.
3. Maintain tutorials and examples in master.

Cheers,
Aaron

P.S. Any moves or removal of content will be handled by 301 permanent
redirects, so we can soften the transition.
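
A minimal sketch of how such 301 redirects could be computed, in line with the proposals above: versioned install pages and API docs stay where they are, and everything else points to the unversioned (master) page. The `/versions/<tag>/...` URL layout and the `api/` and `install/` prefixes are assumptions made for illustration, not a description of the actual site configuration.

```python
import re
from http import HTTPStatus

# Assumed layout of legacy pages: /versions/<tag>/<rest of path>
VERSIONED = re.compile(r"^/versions/(?P<tag>[^/]+)/(?P<rest>.*)$")

# Sections that keep their versioned copies (assumed path prefixes).
KEEP_VERSIONED = ("api/", "install/")

def redirect_for(path):
    """Return (status, new_path) for a legacy page, or None if no redirect applies."""
    match = VERSIONED.match(path)
    if match is None:
        return None                      # already an unversioned page
    rest = match.group("rest")
    if rest.startswith(KEEP_VERSIONED):
        return None                      # versioned API docs and install pages stay
    return (HTTPStatus.MOVED_PERMANENTLY, "/" + rest)

print(redirect_for("/versions/1.1.0/tutorials/index.html"))
```

For the example path, the function returns status 301 together with `/tutorials/index.html`; a permanent redirect is what tells search engines to replace the old link with the new one.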


On Mar 1, 2018 19:55, "Barber, Christopher" wrote:

> I was thinking more along the lines of benchmarks of MXNet vs TensorFlow,
> PyTorch, and Caffe2. Benchmarks of edge devices would definitely be
> interesting, but I would also want to see benchmarks of training time and
> memory use and accuracy on large models. Obviously this would be a
> non-trivial amount of work, which is why no one else is doing it, but there
> would be a lot of interest in this. I would also like to see benchmarks
> of ndarray vs symbol vs gluon.
>
> But yes, if you want to drive traffic to the website you should have
> content that changes frequently. I have to say I find it really strange to
> have the entire website change when I select a different version from the
> top tab. The design of the website should be independent of the code
> version.
>
> - Christopher
>
> On 3/1/18, 4:33 PM, "Marco de Abreu" wrote:
>
> As far as I know, there are plans to make regular benchmarks and generate
> statistics. We could use that data. My personal task after CI is creating
> an infrastructure to automatically perform performance and power
> consumption benchmarks on edge devices (raspberry and Nvidia Jetson). It
> would definitely be a good idea to share this data with the community
> (especially considering the impressive performance of MXNet).
>
> Aaron is currently gathering requirements for recreating the website build
> and publish process, so input like this is definitely helpful. This could
> basically be summarized as a requirement to make the website contain static
> parts (e.g. APIs and documentation) as well as dynamic parts (e.g. news,
> statistics, recent papers etc).
>
>