Re: [Launch Announcement] Dynamic training with Apache MXNet

2018-11-29 Thread Rahul Huilgol
This is great stuff. Well done! A few questions:

   - Do you plan to maintain this as a separate fork, or merge it back to
   the main repository?
   - Is the number of parameter servers fixed at the start? Or can we add
   more parameter servers?
   - I see that you cannot remove any nodes that you initialized the
   cluster with. Why are these initial nodes treated differently? Is it
   because they hold the parameter servers that update the weights (and
   hold the optimizer states)?


On Thu, Nov 29, 2018 at 4:04 PM Marco de Abreu
 wrote:

> Awesome project! Great job everyone.
>
> Am Do., 29. Nov. 2018, 19:55 hat Kumar, Vikas 
> geschrieben:
>
> > A big thanks to Qi Qiao < https://github.com/mirocody > for making it
> > easy for users to set up a cluster for dynamic training using
> > cloudformation.
> >
> > From: "Kumar, Vikas" 
> > Date: Thursday, November 29, 2018 at 10:26 AM
> > To: "dev@mxnet.incubator.apache.org" 
> > Subject: [Launch Announcement] Dynamic training with Apache MXNet
> >
> > Hello MXNet community,
> >
> > MXNet users can now use Dynamic Training(DT) for Deep learning models
> with
> > Apache MXNet. DT helps reduce training cost and training time by
> > adding elasticity to the distributed training cluster. DT also helps in
> > increasing instance pool utilization. With DT unused instances can be
> used
> > to speed up training and then instances can be removed from training
> > cluster at a later time to be used by some other application.
> > For details, refer to DT blog<
> >
> https://aws.amazon.com/blogs/machine-learning/introducing-dynamic-training-for-deep-learning-with-amazon-ec2/
> > >.
> > Developers should be able to integrate Dynamic training in their existing
> > distributed training code, with introduction of few extra lines of code<
> >
> https://github.com/awslabs/dynamic-training-with-apache-mxnet-on-aws#writing-a-distributed-training-script
> > >.
> >
> > Thank you for all the contributors – Vikas Kumar <
> > https://github.com/Vikas89 >, Haibin Lin <
> > https://github.com/eric-haibin-lin>, Andrea Olgiati <
> > https://github.com/andreaolgiati/ >,
> > Mu Li < https://github.com/mli >, Hagay Lupesko <
> > https://github.com/lupesko>, Markham Aaron <
> > https://github.com/aaronmarkham > , Sergey Sokolov <
> > https://github.com/Ishitori> , Qi Qiao < https://github.com/mirocody >
> >
> > This is an effort towards making training neural networks cheap and fast.
> > We welcome your contributions to the repo -
> > https://github.com/awslabs/dynamic-training-with-apache-mxnet-on-aws .
> We
> > would love to hear feedback and ideas in this direction.
> >
> > Thanks
> > Vikas
> >
>


-- 
Rahul Huilgol


Re: Adding AMD CPU to CI

2018-11-29 Thread Rahul Huilgol
+1
I do think it would be valuable to add an AMD step to our CI. As we
continue to improve performance, we might have to consider instructions
that are faster but specific to a particular hardware architecture. We are
doing a lot of Intel-specific work, so it would be a good sanity check that
we continue to support AMD.


On Thu, Nov 29, 2018 at 4:03 PM kellen sunderland <
kellen.sunderl...@gmail.com> wrote:

> Just looked at the mf16c work and wanted to mention Rahul clearly _was_
> thinking about AMD users in that PR.
>
> On Thu, Nov 29, 2018 at 3:46 PM kellen sunderland <
> kellen.sunderl...@gmail.com> wrote:
>
> > From my perspective we're developing a few features like mf16c and MKLDNN
> > integration specifically for Intel CPUs.  It wouldn't hurt to make sure
> > those changes also run properly on AMD cpus.
> >
> > On Thu, Nov 29, 2018, 3:38 PM Hao Jin  >
> >> I'm a bit confused about why we need extra functionality tests just for
> >> AMD
> >> CPUs, aren't AMD CPUs supporting roughly the same instruction sets as
> the
> >> Intel ones? In the very impossible case that something working on Intel
> >> CPUs being not functioning on AMD CPUs (or vice versa), it would mostly
> >> likely be related to the underlying hardware implementation of the same
> >> ISA, to which we definitely do not have a good solution. So I don't
> think
> >> performing extra tests on functional aspect of the system on AMD CPUs is
> >> adding any values.
> >> Hao
> >>
> >> On Thu, Nov 29, 2018 at 5:50 PM Seth, Manu 
> >> wrote:
> >>
> >> > +1
> >> >
> >> > On 11/29/18, 2:39 PM, "Alex Zai"  wrote:
> >> >
> >> > What are people's thoughts on having AMD machines tested on the
> CI?
> >> AMD
> >> > machines are now available on AWS.
> >> >
> >> > Best,
> >> > Alex
> >> >
> >> >
> >> >
> >>
> >
>


-- 
Rahul Huilgol


Re: MXNet developer setup on Mac with VSCode for develop, test and debug

2018-07-20 Thread Rahul Huilgol
This is great, thanks Sandeep!

Regards,
Rahul
On Fri, Jul 20, 2018, 8:57 AM Lin Yuan  wrote:

> Pedro, I have tried CLion briefly but found it could not resolve C++ tags
> by default and throwing many errors/warning. Did you set any thing extra?
> Besides, debugging in a mixed language (Python and C++) environment, you
> don't have to switch between IDEs in VSCode.
>
> On Thu, Jul 19, 2018 at 9:59 AM Pedro Larroy  >
> wrote:
>
> > Have you guys tried CLion, works like a charm for me. (Requires license).
> >
> > On Wed, Jul 18, 2018 at 10:09 PM Naveen Swamy 
> wrote:
> >
> > > Thanks Sandeep for putting this together, it would make it easy for
> > people
> > > who prefer to IDEs to get started with MXNet easily.
> > >
> > > On Wed, Jul 18, 2018 at 1:04 PM, Lin Yuan  wrote:
> > >
> > > > Hi Aaron,
> > > >
> > > > This doc is for development on Mac. It is not intended for Windows
> > users.
> > > > Maybe we can start a different thread to discuss about MXNet build on
> > > > Windows? I have tried it myself on a GPU instances built on Windows
> > DLAMI
> > > > 10.0. I would love to share with you my setup steps.
> > > >
> > > > Lin
> > > >
> > > > On Wed, Jul 18, 2018 at 11:43 AM Markham, Aaron
> > > > 
> > > > wrote:
> > > >
> > > > > This is tangential, but Lin, I noticed during the RC1 tests you
> said
> > > you
> > > > > tried it out on Windows and it worked for you. I'd like to get
> VS2017
> > > or
> > > > VS
> > > > > Code working, take Sandeep's setup content and possibly your
> Windows
> > > > > experience, and improve the MXNet Windows setup guide. I've tried
> it
> > > and
> > > > > failed. Multiple times. I also tried the MKLDNN instructions and
> > > failed.
> > > > I
> > > > > tried the setup tools batch file and was hit with a lot of
> dependency
> > > > > errors. Some of the problem isn't in the MXNet docs, but in the
> > > > > dependencies' documentation, but I'm left to go figure that out on
> my
> > > > own.
> > > > > Anyway, any help you can provide here would be great. Also, if any
> of
> > > you
> > > > > reading this has a sort of checklist or guide for Windows, I'd love
> > to
> > > > see
> > > > > it.
> > > > >
> > > > > BTW, I'm using Windows 10 with an NVIDIA GeForce GTX 980, and was
> > > trying
> > > > > to use VS2017 Community Edition and MKL. I went to MKL after
> OpenBLAS
> > > > > wasn't installing/building.
> > > > >
> > > > > On 7/18/18, 10:59 AM, "Lin Yuan"  wrote:
> > > > >
> > > > > Thanks for the well-written document! As a new MXNet
> developer, I
> > > > have
> > > > > found it very helpful.
> > > > >
> > > > > Lin
> > > > >
> > > > > On Wed, Jul 18, 2018 at 10:50 AM sandeep krishnamurthy <
> > > > s...@apache.org
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > Hello Community,
> > > > > >
> > > > > >
> > > > > >
> > > > > > As a MXNet contributor, I had issues and took me some time on
> > > > getting
> > > > > > hands-on with MXNet codebase, being able to code, test, DEBUG
> > > > > python/CPP
> > > > > > combination. I have documented the steps for MXNet
> development
> > > > setup
> > > > > using
> > > > > > VSCode on Mac. Document starts from installing all required
> > > > > > tools/packages/IDEs/extensions and then provides steps for
> > > > debugging
> > > > > mix of
> > > > > > Python/CPP code, which is most likely the case for any MXNet
> > > > > developer, all
> > > > > > in single IDE window. By end of this document, anyone should
> be
> > > > able
> > > > > to
> > > > > > walk through the MXNet code, debug and be able to make first
> > code
> > > > > change.
> > > > > >
> > > > > >
> > > > > >
> > > > > > Please feel free to add comments, make changes as necessary.
> > > > > >
> > > > > >
> > > > > >
> > > > > https://cwiki.apache.org/confluence/display/MXNET/
> > > > MXNet+Developer+Setup+on+Mac
> > > > > >
> > > > > > Best,
> > > > > > Sandeep
> > > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] Subscribe dev@ to Github Activities

2018-07-18 Thread Rahul Huilgol
those discussions in the dev list instead.
> > > >
> > > > Indu
> > > >
> > > >
> > > > On Wed, Jul 18, 2018, 5:51 AM Barber, Christopher <
> > > > christopher.bar...@analog.com> wrote:
> > > >
> > > > > Can't people already subscribe to github notifications? I think
> > it
> > > > is safe
> > > > > to assume that developers are already smart enough to figure
> out
> > > how
> > > > to do
> > > > > that if they want. What problem are you really trying to solve
> > > here?
> > > > >
> > > > > On 7/18/18, 4:49 AM, "Chris Olivier" 
> > > wrote:
> > > > >
> > > > > -1.  (changed from -0.9)
> > > > >
> > > > > seems more like a strategy (whether intentional or on
> > accident)
> > > > to
> > > > > *not*
> > > > > have design discussions on dev by flooding it with noise
> and
> > > > then later
> > > > > claim it was discussed, even though you would have to sift
> > > > through
> > > > > thousands of emails to find it.
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Jul 18, 2018 at 12:42 AM Rahul Huilgol <
> > > > rahulhuil...@gmail.com
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > I pulled up some more stats so we can make an informed
> > > > decision.
> > > > > >
> > > > > > Here are some popular Apache projects and the number of
> > > emails
> > > > to
> > > > > their
> > > > > > dev@
> > > > > > list in the last 30 days
> > > > > > Apache Flink: 540 mails
> > > > > > ​Apache Spark: 249 mails
> > > > > > Apache Hive: 481 mails
> > > > > > Apache HBase: 300 mails
> > > > > >
> > > > > > Current dev list for MXNet: 348 mails
> > > > > > Current commits list for MXNet: 5329 mails
> > > > > > Making the proposed dev list for MXNet to be ~5677 mails.
> > > > > >
> > > > > > Sheng, even going by your comments that 1 of of those 4
> > mails
> > > > are
> > > > > relevant
> > > > > > for dev@, that's still a really high number of emails.
> > (130
> > > > email
> > > > > lists
> > > > > > doesn't say anything if we ignore the actual number of
> > emails
> > > > in
> > > > > those
> > > > > > lists, especially when the 131st sends these many mails
> :)
> > ).
> > > > People
> > > > > are
> > > > > > already talking about setting up filters here. Doesn't
> that
> > > > defeat
> > > > > the
> > > > > > purpose by making people filter out the discussion on
> > Github?
> > > > People
> > > > > can
> > > > > > subscribe to commits@ if they find it more convenient to
> > > > follow
> > > > > Github
> > > > > > activity over email rather than Github.com.
> > > > > >
> > > > > > We should strive to maintain dev@ as a place for high
> > > quality
> > > > > discussion.
> > > > > > It's upto the contributors to bring up something to dev@
> > if
> > > > they
> > > > > believe
> > > > > > it
> > > > > > deserves a focused discussion in the community. That
> > > > discussion may
> > > > > be
> > > > > > started by the person who proposes code changes, or a
> > > reviewer
> > > > who
> > > > > believes
> > > > > > that a particular code change warrants further
> discussion.
> > > > > >
> > > > > > Regards,
> > > > > > Rahul
> > > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> >
>



-- 
Rahul Huilgol


Re: [VOTE] Subscribe dev@ to Github Activities

2018-07-18 Thread Rahul Huilgol
I pulled up some more stats so we can make an informed decision.

Here are some popular Apache projects and the number of emails to their dev@
list in the last 30 days
Apache Flink: 540 mails
​Apache Spark: 249 mails
Apache Hive: 481 mails
Apache HBase: 300 mails

Current dev list for MXNet: 348 mails
Current commits list for MXNet: 5329 mails
Making the proposed dev list for MXNet to be ~5677 mails.

Sheng, even going by your comment that 1 out of those 4 mails is relevant
for dev@, that's still a really high number of emails. (130 email lists
doesn't say anything if we ignore the actual number of emails in those
lists, especially when the 131st sends these many mails :) ). People are
already talking about setting up filters here. Doesn't that defeat the
purpose by making people filter out the discussion on Github? People can
subscribe to commits@ if they find it more convenient to follow Github
activity over email rather than Github.com.

We should strive to maintain dev@ as a place for high quality discussion.
It's up to the contributors to bring something up on dev@ if they believe it
deserves a focused discussion in the community. That discussion may be
started by the person who proposes code changes, or a reviewer who believes
that a particular code change warrants further discussion.

Regards,
Rahul


Re: [VOTE] Subscribe dev@ to Github Activities

2018-07-17 Thread Rahul Huilgol
-1

We had such a thing before, and people asked for the mails to be redirected
to a different list, commits@, because of the flood of mails.

https://lists.apache.org/thread.html/8b834e39110381fadb8a0ab59185a8f52b8406247a1f281f7d691392@%3Cdev.mxnet.apache.org%3E

I don't know if people have a sense of the volume of mails this can add
here. Here are the stats from the commits@ email list we have. I'd be
curious to see how many subscribers that list has. Hopefully the people
voting +1 here are subscribed to it :)

2018 June: 4617
2018 July: (half a month) 3106
(Source of the numbers are here
https://lists.apache.org/list.html?comm...@mxnet.apache.org:2018-7)

@Joshua: yes, we need to bring 'valuable' (emphasis mine) discussion to a
centralized place, dev@. But does everything need to be sent to dev@? For
example, consider these recent PRs: why is it necessary for them to be
forwarded to dev@?

fix flaky test test_operator_gpu.test_countsketch:
https://github.com/apache/incubator-mxnet/pull/11780
Update PyPI version number:
https://github.com/apache/incubator-mxnet/pull/11773
Fix file name creation for Windows:
https://github.com/apache/incubator-mxnet/pull/11765
[MXNET-8230] test_operator_gpu.test_rms fails:
https://github.com/apache/incubator-mxnet/pull/11749

If people are forced to set up filters to parse these mails, then we are
*ensuring* people don't get their eyes on valuable discussions on dev@.

Regards,
Rahul

On Tue, Jul 17, 2018 at 12:49 PM, Sheng Zha  wrote:

> FWIW: "from:notificati...@github.com AND to:dev@mxnet.incubator.apache.org
> AND NOT to:me" but I'm sure you get the gist :)
>
>
> Opt-in model applies to individuals rather than the dev list, because the
> dev list is intended as an asynchronous way for new comers to easily follow
> past technical discussions, and is the only place recognized by apache for
> these discussions. Currently, lots of high quality technical discussions
> that are happening on github are lost and not archived here. The procedural
> change in this vote is intended for bridging such gap. Besides, it's more
> likely for new contributors to know how to filter emails than to know how
> to "opt-in".
>
>
> More discussion is welcome in the linked discussion thread.
>
>
> -sz
>
> On Tue, Jul 17, 2018 at 12:37 PM, pracheer gupta <
> pracheer_gu...@hotmail.com
> > wrote:
>
> > FWIW: The filter needs to be more complicated than just "
> > from:notificati...@github.com". After all, if someone mentions me
> > directly in PR thread and/or I subscribe to only a particular PR, those
> > emails will also come from "notificati...@github.com". There are ways
> > around that though.
> >
> >
> > It might be good to mention this filter in some wiki/webpage somewhere;
> > may save some effort for people trying to find the right set of filters.
> It
> > could even be in the welcome email when one subscribes to this
> email-list.
> >
> >
> > Another alternate option: How about choosing an opt-in model rather than
> > an opt-out model? Having another email list and anyone can subscribe to
> it
> > if they wish.
> >
> >
> > Not sure if there is a perfect answer out there for this but in principle
> > I agree that it will be good to have "push notifications" for all
> PRs/issue.
> >
> >
> > -Pracheer
> >
> > 
> > From: Junru Shao 
> > Sent: Tuesday, July 17, 2018 10:58:33 AM
> > To: d...@mxnet.apache.org
> > Subject: Re: [VOTE] Subscribe dev@ to Github Activities
> >
> > +1
> >
> > Both GitHub activities and dev list are places for development. It will
> be
> > great if we could have a all-in-one place for such discussions. I believe
> > Sheng's proposal is a perfect solution.
> >
> > On 2018/07/16 03:32:06, Sheng Zha  wrote:
> > > Hi,
> > >
> > > I'm starting a vote on subscribing dev@ to Github activities. See
> > previous
> > > discussion thread here
> > > <https://lists.apache.org/thread.html/3d883f6a3cbc8e81e810962e0c0fe7
> > bfd01f0b78d3cb44034f566442@%3Cdev.mxnet.apache.org%3E>
> > > .
> > >
> > > The vote lasts for three days and ends on 7/18/2018 at 9pm pst.
> > >
> > > -sz
> > >
> >
>



-- 
Rahul Huilgol


Re: Make cmake default

2018-06-01 Thread Rahul Huilgol
+1

Let's move to CMake. It has much better support, and it's not worth
maintaining two build systems.
If we really wanted to, we could keep a thin Makefile that manages the
installation of CMake and calls CMake internally. CMake seems easy to
install: there is a shell script with binaries for Linux x86_64, Windows,
and Mac, and for other systems it's just a couple of steps.

Regards,
Rahul



On Fri, 1 Jun 2018 at 15:12 Chen HY  wrote:

> building for rpi doesn't mean you should build on a rpi... that takes
> forever.
>
> 2018-06-01 23:06 GMT+01:00 Anirudh :
>
> > +1 to using cmake and deprecating Makefile. I was able to find a previous
> > discussion on this:
> > https://github.com/apache/incubator-mxnet/issues/8702
> >
> > The concerns raised were
> > 1. Building on devices like raspberry pi where cmake is non existent or
> > old.
> > 2. Adding an additional dependency.
> >
> > As mentioned in the thread, if we provide good instructions on how to
> > install cmake/build cmake from source,
> > these concerns will be addressed.
> >
> > Anirudh
> >
> > On Fri, Jun 1, 2018 at 2:58 PM, Alex Zai  wrote:
> >
> > > Just realized that the email lists strips aways all hyperlinks.
> Attached
> > > is a
> > > copy of my previous email with links pasted in.
> > >
> > > What are peoples' thought on requiring cmake when building from source?
> > > Currently we have to maintain two independent build files (CMakeLists
> and
> > > Makefile) which makes it more difficult to develop (each are 600+
> lines).
> > > Also,
> > > our current build system (in Makefile) requires that 3rdparty
> > dependencies
> > > have
> > > binaries present (or a Makefile to generate binaries) in the repo,
> which
> > > is not
> > > always the case.
> > > Generating a makefile with cmake will make our Makefile very simple
> like
> > > PyTorch'sMakefile (20 lines of code -
> > > https://github.com/pytorch/pytorch/blob/master/Makefile). Also, not
> all
> > > 3rdparty
> > > dependencies have binaries or Makefiles. For 3rdparty/mkldnn we end up
> > > calling
> > > cmake
> > >  (https://github.com/apache/incubator-mxnet/blob/master/
> > > prepare_mkldnn.sh#L96)
> > > to generate binaries (this does not violate our 'no cmake dependency'
> as
> > > USE_MKLDNN is OFF by default). If we encounter any library in the
> future
> > > that
> > > requires us to generate artifacts with cmake, it would be better to
> make
> > > the
> > > switch now. Lastly, we already require cmake as a dependency
> forwindows'
> > > developers
> > >  (https://www.dropbox.com/s/9sfnderg58z4j1l/Screenshot%
> > > 202018-06-01%2013.43.08.png?dl=0)
> > > so this would only affect linux / mac developers who do not have cmake
> > > already.
> > > I currently have a pendingPR
> > >  (https://github.com/apache/incubator-mxnet/pull/8/) that depends
> on
> > > this
> > > change. The library does not have a Makefile or binaries present.
> Unlike
> > > mkldnn,
> > > we would want this library included by default so I cannot generate
> > > artifacts
> > > with cmake. The alternative would be to strip out only the relevant
> parts
> > > of the
> > > code we need from the library. I did this in a previous version of myPR
> > >  (https://github.com/apache/incubator-mxnet/compare/
> > > dfdfd1ad15de8bb1b899effb0860a4e834093cfc...
> > a4267eb80488804a7f74ff01f5627c
> > > 47dd46bd78)
> > > but it is incredible messy.
> > > Please let me know your thoughts.
> > > Best,
> > > Alex
> > >
> > >
> > >
> > >
> > >
> > > On Fri, Jun 1, 2018 2:51 PM, Alex Zai aza...@gmail.com  wrote:
> > > What are peoples' thought on requiring cmake when building from source?
> > > Currently we have to maintain two independent build files (CMakeLists
> and
> > > Makefile) which makes it more difficult to develop (each are 600+
> lines).
> > > Also,
> > > our current build system (in Makefile) requires that 3rdparty
> > dependencies
> > > have
> > > binaries present (or a Makefile to generate binaries) in the repo,
> which
> > > is not
> > > always the case.
> > > Generating a makefile with cmake will make our Makefile very simple
> like
> > > PyTorch's Makefile (20 lines of code). Also, not all 3rdparty
> > dependencies
> > > have
> > > binaries or Makefiles. For 3rdparty/mkldnn we end up calling cmake to
> > > generate
> > > binaries (this does not violate our 'no cmake dependency' as USE_MKLDNN
> > is
> > > OFF
> > > by default). If we encounter any library in the future that requires us
> > to
> > > generate artifacts with cmake, it would be better to make the switch
> now.
> > > Lastly, we already require cmake as a dependency for windows'
> > > developers so this
> > > would only affect linux / mac developers who do not have cmake already.
> > > I currently have a pending PR that depends on this change. The library
> > > does not
> > > have a Makefile or binaries present. Unlike mkldnn, we would want this
> > > library
> > > included by default so I cannot generate artifacts with cmake. The
> > > alternative
>

Re: Slack. Subscription

2018-05-24 Thread Rahul Huilgol
Sent an invite!

On Thu, May 24, 2018, 11:49 PM Ivan Serdyuk 
wrote:

> Please subscribe to the Slack channel. Tnx
>


Re: MXNet Protobuf dependency

2018-05-23 Thread Rahul Huilgol
Hi Rajan,

This PR from the Intel folks adds support for MPI-based distributed
training. They also needed proto3 and have updated the current ps-lite
proto file to work with protobuf 3.5. You might want to take a look at it
and align efforts with that approach.

https://github.com/apache/incubator-mxnet/pull/10696

The ps-lite change:
https://github.com/threeleafzerg/ps-lite/compare/a6dda54604a07d1fb21b016ed1e3f4246b08222a...a470d2270d4af4badf4c94eab9559811697332e3#diff-ba121c714260f51ca98d51a080880b6d

Regards,
Rahul

On Wed, 23 May 2018 at 11:06 Singh, Rajan  wrote:

> Hi,
>
> Currently, MXNet has Protobuf ( version 2.5) as one of its dependency. The
> dependency comes from PS-lite<
> https://github.com/dmlc/ps-lite/blob/a6dda54604a07d1fb21b016ed1e3f4246b08222a/make/deps.mk#L11>
> used for distributed training.
> Recently, we have added ONNX support in MXNet(1.2.0) contrib package(
> import ONNX support). This module has a runtime dependency on
> Protobuf(version 3) , needed for ONNX.
> So, if a user tries to do “import onnx”, will get a message:
>
> “To use this module developers need to install ONNX, which requires the
> protobuf compiler to be installed separately. Please follow the
> instructions to install ONNX and its dependencies<
> https://github.com/onnx/onnx#installation>. MXNet currently supports ONNX
> v1.1.1. Once installed, you can go through the tutorials on how to use this
> module.”
>
> User will end up installing protobuf version 3.5.2. Since Protobuf
> backward compatibility is flaky, anything dependent on version < 2.6, will
> probably break. In this case, distributed training might break for the user.
>
> IMO, To resolve this dependency conflict in MXNet, would require an update
> to PS-lite dependency to  Protobuf version 3. Is there a POA to update this
> dependency for PS-lite?
> FYI: We are also working on adding an export module support, will export
> MXNet models to ONNX format, which will also have Protobuf version 3 and
> ONNX as its runtime dependency.
>
> Please let me know, what should be best path moving forward.
>
> Thanks
> Rajan
>
>


Re: CMake issues

2018-05-22 Thread Rahul Huilgol
Maybe we could do that now, after the code for the release has been voted
on. We could maintain a patches branch for each release. This would also
help users who are on a particular release version but hesitate to switch
to the latest release when it's out because of the many changes. I've seen
such users maintain their own patches branches; we could help them and keep
things consistent. We could cut a patch release from that branch if we
accumulate many fixes or important ones.

On Thu, May 17, 2018 at 10:05 PM, Anirudh  wrote:

> Thats an interesting suggestion. I understand the benefits, but this will
> complicate things if we want to do another quick release like the previous
> one (for eg. if there is a license issue because of which the release is
> cancelled).
>
> Anirudh
>
> On Thu, May 17, 2018 at 9:23 PM, Naveen Swamy  wrote:
>
> > why don't we cherry-pick and push it to the branch as well ? next time a
> > release is cut from that branch we'll have everything working.
> >
> > On Thu, May 17, 2018 at 9:21 PM, Anirudh  wrote:
> >
> > > Correction : This fix went into master but not 1.2 release.
> > >
> > > On Thu, May 17, 2018 at 9:21 PM, Anirudh 
> wrote:
> > >
> > > > Hi Xinyu,
> > > >
> > > > Thank you! This fix didnt went into master but not 1.2 release.
> > > >
> > > > Anirudh
> > > >
> > > > On Thu, May 17, 2018 at 5:23 PM, Chen, Xinyu1  >
> > > > wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> Regarding to the following knowing issues listed below RC3:
> > > >>
> > > >>
> > > >>   *   CMake build ignores the USE_MKLDNN flag and doesn't build with
> > > >> MKLDNN support even with -DUSE_MKLDNN=1. To workaround the issue
> > please
> > > >> see: #10801<https://github.com/apache/incubator-mxnet/issues/10801
> >.
> > > >> I think this problem has already been fixed in #10629<
> > > >> https://github.com/apache/incubator-mxnet/pull/10629> and we can
> use
> > > >> CMake to build MXNet with MKLDNN support on WIN32 platform now.
> > > >> Xinyu
> > > >>
> > > >>
> > > >
> > >
> >
>



-- 
Rahul Huilgol


Re: installation page UX

2018-05-22 Thread Rahul Huilgol
Hi Aaron,

Thanks for your work on this! A few points I noticed:

1. It looks like something is wrong with the pip instructions. I see
instructions for the pre-reqs but not the actual pip install.

2. Most of the language bindings seem to redirect to the install page for
that OS, regardless of CPU/GPU, but we still show that selector.

3. Can we remove the 'Validate MXNet...' wording when the instructions are
on a different page? The combination Linux, Scala, CPU, for example.

4. I don't see much advantage in the PyTorch- or Caffe-style installation
webpages. As you mentioned, we have a couple more selectors, but it's
pretty clear and not enough to overwhelm the user. If the problem is the
maintenance of the backend instructions, could we address that by having
different files/folders for the different scenarios and including them from
one file? Not sure if that's feasible.

+1 for hyperlinks to specific instructions.

Regards,
Rahul

On Tue, May 22, 2018 at 11:53 AM, Naveen Swamy  wrote:

> MXNet UI definitely needs more love.
>
> +1 - pytorch style
> +0.5 - caffe2
>
>
> On Tue, May 22, 2018 at 11:48 AM, Markham, Aaron 
> wrote:
>
> > Hi everyone,
> > In addition to the options on the wiki (pros & cons), there's this
> preview.
> > It uses a dropdown next to the install options to make it clearer what
> > versions you can install… then updates the pip commands…
> >
> > http://54.210.6.225/install/index.html#
> >
> > Thoughts?
> >
> >
> > On 5/16/18, 9:31 AM, "Aaron Markham"  wrote:
> >
> > Hi,
> > I've written some notes on the wiki about issues with the
> installation
> > page
> > along with some suggestions. I'd appreciate your feedback here or on
> > the
> > wiki.
> >
> > https://cwiki.apache.org/confluence/display/MXNET/
> Installation+page+UX
> >
> > Cheers,
> > Aaron
> >
> >
> >
>



-- 
Rahul Huilgol


Re: [VOTE] Release Apache MXNet (incubating) version 1.2.0.RC0

2018-04-20 Thread Rahul Huilgol
+1

Compiled from source and verified distributed training with float32 and
float16

Regards,
Rahul

On Thu, Apr 19, 2018 at 4:30 PM, Anirudh Acharya 
wrote:

> +1
>
> Checked the following -
>
>- source compiles
>- the tests for the onnx import API passes.
>
>
> Regards
> Anirudh
>
>
> On Thu, Apr 19, 2018 at 1:11 PM, Meghna Baijal  >
> wrote:
>
> > +1 (non-binding)
> >
> >
> > I Checked the following:
> >
> > 1. Signatures are ok
> >
> > 2. Source compiles
> >
> > 3. mnist test passes
> >
> >
> > Regards,
> >
> > Meghna
> >
> > On Thu, Apr 19, 2018 at 10:12 AM, Anirudh  wrote:
> >
> > > Hi all,
> > >
> > > Given the weekend, I am extending the vote deadline to Sunday evening,
> > > April 22nd 7:40 PM PDT, considering Saturday and Sunday as half days(as
> > > done before).
> > >
> > > Anirudh
> > >
> > > On Wed, Apr 18, 2018 at 7:40 PM, Anirudh 
> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > This is a vote to release Apache MXNet (incubating) version 1.2.0.
> > Voting
> > > > will start now (Wednesday, April 18th) and end at 7:40 PM PDT,
> > Saturday,
> > > > April 21st.
> > > >
> > > > Link to the release notes:
> > > >
> > > > https://cwiki.apache.org/confluence/display/MXNET/
> > > > Apache+MXNet+%28incubating%29+1.2.0+Release+Notes
> > > >
> > > > Link to the release candidate 1.2.0.rc0:
> > > > https://github.com/apache/incubator-mxnet/releases/tag/1.2.0.rc0
> > > >
> > > > View this page, click on “Build from Source”, and use the source code
> > > > obtained from the 1.2.0.rc0 tag:
> > > > https://mxnet.incubator.apache.org/install/index.html
> > > >
> > > > (Note: The README.md points to the 1.2.0 tag and does not work at the
> > > > moment.)
> > > >
> > > > Please remember to TEST first before voting accordingly:
> > > > +1 = approve
> > > > +0 = no opinion
> > > > -1 = disapprove (provide reason)
> > > >
> > > > Thanks,
> > > >
> > > > Anirudh
> > > >
> > >
> >
>



-- 
Rahul Huilgol


Re: MXNet Name Change?

2018-04-11 Thread Rahul Huilgol
Even changing the pronunciation is not an easy thing to do, IMHO. As someone
who has been working on MXNet for the last 8 months, this thread is the
first time I have read that MXNet is supposed to be pronounced 'mix-net'.
We risk losing momentum even if we try to steer the pronunciation
towards something that is not currently popular.

I agree Gluon is a friendlier/cooler name. But since we are trying to
present Gluon as an interface which multiple frameworks can implement, I'm
not sure if we should try to shift the focus away from MXNet and towards
Gluon.

As far as adoption goes, I agree with Christopher that public benchmarks or
blog posts about how MXNet is better or more usable than other frameworks
would be worth spending effort on. Rebranding in a way that doesn't lose the
existing momentum honestly sounds more difficult than that :)

By the way, we had some talk of a new friendlier logo sometime back. What
happened to that?

On Wed, Apr 11, 2018 at 1:43 PM, Aaron Markham 
wrote:

> Changing branding is hard as we already have some momentum under the
> current name. It's not impossible, and if someone has a fantastic idea
> and marketing plan for it, it's worth considering.
>
> Aside from that, updating the pronunciation could be useful if you
> like having those gif vs jif debates, but at least people would be
> talking about it. I personally like the sound of "mix-net" over
> "em-ex-net". Since we're just starting Youtube videos, I think it's a
> great idea to establish the pronunciation right away in those videos.
>
>
>
> On Wed, Apr 11, 2018 at 1:33 PM, Chris Olivier 
> wrote:
> > Just curious why you think it’s a bad idea — you didn’t say?
> >
> > On Wed, Apr 11, 2018 at 12:49 PM Chen HY  wrote:
> >
> >> At least people needs a way to speak it.
> >> Just define its pronunciation as "mix-net" or "m-x-net" and use the
> agreed
> >> one everywhere helps a lot.
> >> Changing name is a bad idea.
> >>
> >> 2018-04-11 20:29 GMT+01:00 Mu Li :
> >>
> >> > Agree that MXNet, the combination of Minerva and CXXNet, which can be
> >> > interpreted as mixed-net, is hard to be pronounced. But rebranding a
> name
> >> > is a very big decision. We need a very carefully designed marketing
> plan
> >> > for it.
> >> >
> >> > A choice is that we can gradually refer MXNet as a backend, and talk
> more
> >> > about the frontend Gluon.
> >> >
> >> > On Wed, Apr 11, 2018 at 12:16 PM, Thomas DELTEIL <
> >> > thomas.delte...@gmail.com>
> >> > wrote:
> >> >
> >> > > FWIW Brainscript is actually a network definition language:
> >> > > https://docs.microsoft.com/en-us/cognitive-toolkit/
> >> > > BrainScript-Network-Builder
> >> > >
> >> > >
> >> > >
> >> > > Thomas
> >> > >
> >> > >
> >> > > 2018-04-11 12:13 GMT-07:00 Chiyuan Zhang :
> >> > >
> >> > > > IIRC CNTK renamed to something like brainscript which does not
> seem
> >> to
> >> > be
> >> > > > very successful publicity campaign?
> >> > > >
> >> > > > Chiyuan
> >> > > >
> >> > > > On Wed, Apr 11, 2018 at 10:18 AM Chris Olivier <
> >> cjolivie...@gmail.com>
> >> > > > wrote:
> >> > > >
> >> > > > > Should we consider renaming MXNet to something more "friendly"?
> >> > > > >
> >> > > > > IMHO, I think this may be related to adoption problems.
> >> > > > >
> >> > > > > MXNet, CMTK -- both seem sort of sterile and hard to use, don't
> >> they?
> >> > > > >
> >> > > > > Tensorflow, PyTorch, Caffe -- sound cool.
> >> > > > >
> >> > > > --
> >> > > > Semt ftom m ipohne
> >> > > >
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> Chen Hanyang 陈涵洋
> >> Software School Fudan University
> >> +86-138-1881-7745
> >>
>



-- 
Rahul Huilgol


Re: [VOTE] Change Scala namespace from dmlc to org.apache

2018-03-12 Thread Rahul Huilgol
+1

We need to change the namespace as soon as possible.

On Mon, Mar 12, 2018 at 3:15 PM, Roshani Nagmote 
wrote:

> +1 to change the namespace
>
> On Mon, Mar 12, 2018 at 3:05 PM, Chris Olivier 
> wrote:
>
> > The assumption is that it would be changed more-or-less immediately.  ie.
> > this is like a voted PR, I guess.
> >
> > On Mon, Mar 12, 2018 at 2:53 PM, Chris Olivier 
> > wrote:
> >
> > > It is about changing the namespace.  As far as I know, the version
> number
> > > of the next release is not defined.
> > > At such point where a release is announced, one could comment, vote
> > > whatever on the chosen version of that release, I suppose.  But that's
> > > beyond the scope of this vote, because the "next release" is not yet
> > > defined.
> > >
> > >
> > >
> > > On Mon, Mar 12, 2018 at 2:48 PM, Marco de Abreu <
> > > marco.g.ab...@googlemail.com> wrote:
> > >
> > >> Just for clarification: Is this vote about changing the namespace with
> > the
> > >> next release?
> > >>
> > >> On Mon, Mar 12, 2018 at 7:16 PM, Naveen Swamy 
> > wrote:
> > >>
> > >> > Chris, Thanks for starting this vote.
> > >> > This is long pending
> > >> >
> > >> > +1 to change org.apache namespace
> > >> >
> > >> > On Mon, Mar 12, 2018 at 10:35 AM, Marco de Abreu <
> > >> > marco.g.ab...@googlemail.com> wrote:
> > >> >
> > >> > > I gave my +1 for the code modification. The -1 was for Nan Zhus
> > >> proposal
> > >> > to
> > >> > > get it into 1.2.
> > >> > >
> > >> > > On Mon, Mar 12, 2018 at 6:18 PM, Chris Olivier <
> > cjolivie...@gmail.com
> > >> >
> > >> > > wrote:
> > >> > >
> > >> > > > If you're tying this to a process issue, then it's no longer a
> > code
> > >> > > > modification technical vote.
> > >> > > >
> > >> > > >
> > >> > > > On Mon, Mar 12, 2018 at 9:56 AM, Marco de Abreu <
> > >> > > > marco.g.ab...@googlemail.com> wrote:
> > >> > > >
> > >> > > > > Right
> > >> > > > >
> > >> > > > > Chris Olivier  schrieb am Mo., 12.
> März
> > >> 2018,
> > >> > > > > 17:38:
> > >> > > > >
> > >> > > > > > Are you saying your vote is contingent upon the outcome of a
> > >> > separate
> > >> > > > > vote?
> > >> > > > > >
> > >> > > > > > On Mon, Mar 12, 2018 at 9:37 AM, Marco de Abreu <
> > >> > > > > > marco.g.ab...@googlemail.com> wrote:
> > >> > > > > >
> > >> > > > > > > +1 for changing the namespace
> > >> > > > > > > -1 for merging this change into master according to the
> > >> current
> > >> > > > policy
> > >> > > > > > >
> > >> > > > > > > Chris Olivier  schrieb am Mo., 12.
> > >> März
> > >> > > 2018,
> > >> > > > > > > 17:34:
> > >> > > > > > >
> > >> > > > > > > > Release versioning is a separate issue or vote.  At
> > release
> > >> > time,
> > >> > > > > > people
> > >> > > > > > > > can "demand" version X or Y.  This vote represents "do
> we
> > >> want
> > >> > to
> > >> > > > > > change
> > >> > > > > > > > the namespace".
> > >> > > > > > > >
> > >> > > > > > > > On Mon, Mar 12, 2018 at 9:30 AM, Nan Zhu <
> > >> > zhunanmcg...@gmail.com
> > >> > > >
> > >> > > > > > wrote:
> > >> > > > > > > >
> > >> > > > > > > > > I think we'd specify it will change in the next
> version
> > >> > (1.2)?
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > On Mon, Mar 12, 2018 at 9:26 AM, Chris Olivier <
> > >> > > > > > cjolivie...@gmail.com>
> > >> > > > > > > > > wrote:
> > >> > > > > > > > >
> > >> > > > > > > > > > This vote is for the code-change of altering the
> Scala
> > >> API
> > >> > > > > > namespace
> > >> > > > > > > > from
> > >> > > > > > > > > > dmlc to org.apache.
> > >> > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > > Vote will conclude on Thursday, 5pm PDT.
> > >> > > > > > > > > >
> > >> > > > > > > > > > Thank you,
> > >> > > > > > > > > >
> > >> > > > > > > > > > -Chris
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>



-- 
Rahul Huilgol


Re: [VOTE] Disconnect all non-C API's from mxnet versioning

2018-03-12 Thread Rahul Huilgol
b, GraphX, etc, same
> > >> > version as the Spark Core, as well as the Scala/Java API. I feel it
> > >> > convenient since every time I check a document, say, MLLib 1.6.0, I
> > >> > can tell it works with Spark Core 1.6.0 and GraphX 1.6.0. And I can
> > >> > expect when I use Python API 1.6.0, it will behave the same.
> > >> >
> > >> > and for +1 votings, do you mean to separate Python/Gluon API
> > versioning
> > >> as
> > >> > well?
> > >> >
> > >> > 2018-03-12 11:18 GMT-07:00 Naveen Swamy :
> > >> > > -1 for different versioning, it not only be maintenance nightmare
> > but
> > >> > also
> > >> > > more importantly confusing to users,
> > >> > >
> > >> > >
> > >> > > On Mon, Mar 12, 2018 at 9:57 AM, Marco de Abreu <
> > >> > > marco.g.ab...@googlemail.com> wrote:
> > >> > >
> > >> > >> According to the discussion in the Scala thread, the release
> cycles
> > >> > would
> > >> > >> stay unchanged and are still part of the mxnet releases.
> > >> > >>
> > >> > >> Nan Zhu  schrieb am Mo., 12. März 2018,
> > >> 17:42:
> > >> > >>
> > >> > >> > how about release cycle?
> > >> > >> >
> > >> > >> >
> > >> > >> > On Mon, Mar 12, 2018 at 9:37 AM, Yuan Tang <
> > terrytangy...@gmail.com
> > >> >
> > >> > >> > wrote:
> > >> > >> >
> > >> > >> > > +1
> > >> > >> > >
> > >> > >> > > On Mon, Mar 12, 2018 at 12:35 PM, Marco de Abreu <
> > >> > >> > > marco.g.ab...@googlemail.com> wrote:
> > >> > >> > >
> > >> > >> > > > +1
> > >> > >> > > >
> > >> > >> > > > Tianqi Chen  schrieb am Mo., 12.
> > März
> > >> > >> 2018,
> > >> > >> > > > 17:33:
> > >> > >> > > >
> > >> > >> > > > > +1
> > >> > >> > > > >
> > >> > >> > > > > On Mon, Mar 12, 2018 at 9:32 AM, Chris Olivier <
> > >> > >> > cjolivie...@apache.org
> > >> > >> > > >
> > >> > >> > > > > wrote:
> > >> > >> > > > >
> > >> > >> > > > > > It has been proposed that all Non-C API's follow
> separate
> > >> > >> > versioning
> > >> > >> > > > from
> > >> > >> > > > > > the main mxnet C API/releases.
> > >> > >> > > > > >
> > >> > >> > > > > > A +1 vote is in *favor of* using a different versioning
> > for
> > >> > all
> > >> > >> > > > > > non-C-API's, with each API (Scala, R, Julia, C++, etc.)
> > >> having
> > >> > >> its
> > >> > >> > > own
> > >> > >> > > > > > version.
> > >> > >> > > > > >
> > >> > >> > > > > > A -1 vote is *against* using a different versioning for
> > all
> > >> > >> > > > non-C-API's,
> > >> > >> > > > > > with all API's (Scala, R, Julia, C++, etc.) sharing the
> > >> mxnet
> > >> > >> > > version.
> > >> > >> > > > > >
> > >> > >> > > > > > This vote will conclude on Monday, March 19, 2018.
> > >> > >> > > > > >
> > >> > >> > > > > > Thanks,
> > >> > >> > > > > > -Chris
> > >> > >> > > > > >
> > >> > >> > > > >
> > >> > >> > > >
> > >> > >> > >
> > >> > >> >
> > >> > >>
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > Yizhi Liu
> > >> > DMLC member
> > >> > Amazon Web Services
> > >> > Vancouver, Canada
> > >> >
> > >>
> >
> >
> >
> > --
> > Yizhi Liu
> > DMLC member
> > Amazon Web Services
> > Vancouver, Canada
> >
>



-- 
Rahul Huilgol


Re: S3 Writes using SIG4 Authentication

2018-03-07 Thread Rahul Huilgol
I was looking at SIG4's documentation for S3 here
<https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-streaming.html>
earlier. The section on Chunked Upload confused me because it said I need
to pass the Content-Length header in the request.

I now realize that I was using the terms `chunked upload` and `multipart
upload` interchangeably. They are actually different things.
Multipart upload works much like the earlier behavior I described: each part
can be sent as a normal PUT request with a given uploadId parameter, and
that request can itself be uploaded in multiple chunks if necessary. Chunked
upload requires the total size of that part beforehand, but multipart upload
itself does not require the total length of the data up front.
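
To make the distinction concrete, here is a rough sketch of that multipart
flow using a boto3 client with hypothetical bucket/key names (just an
illustration, not the dmlc-core code; note that on real S3 every part except
the last must be at least 5 MB):

    import boto3

    s3 = boto3.client("s3")
    bucket, key = "my-bucket", "model/params"  # hypothetical names

    # Pretend these are the buffers the stream flushes one at a time
    # (real parts, except the last, must be at least 5 MB).
    chunks = [b"first part of the data", b"second part of the data"]

    # Start the multipart upload; only bucket and key are needed,
    # not the total object size.
    upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]

    parts = []
    for part_number, chunk in enumerate(chunks, start=1):
        # Each part is an independent, individually signed PUT request.
        resp = s3.upload_part(Bucket=bucket, Key=key, PartNumber=part_number,
                              UploadId=upload_id, Body=chunk)
        parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})

    # S3 only sees the complete object once the upload is finalized.
    s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                                 MultipartUpload={"Parts": parts})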

I've now updated my PR to support writes as well.

Thanks for your help!

Regards,
Rahul

On Wed, Mar 7, 2018 at 9:56 AM, Bhavin Thaker 
wrote:

> Multi-part upload with finalization seems like a good approach for this
> problem.
>
> Bhavin Thaker.
>
> On Wed, Mar 7, 2018 at 7:45 AM Naveen Swamy  wrote:
>
> > Rahul,
> > IMO It is not Ok to write to a local file before streaming, you have to
> > consider security implications such as:
> > 1) will your local file be encrypted(encryption at rest)
> > 2) what happens if the process crashes, you will have to make sure the
> > local file is deleted in failure and process exit scenarios.
> >
> > My understanding is for multi part uploads it uses chunked transfer
> > encoding and for that you do not need to know the total size and only
> know
> > the chunked data size.
> > https://en.wikipedia.org/wiki/Chunked_transfer_encoding
> >
> > See this SO answer:
> >
> > https://stackoverflow.com/questions/8653146/can-i-
> stream-a-file-upload-to-s3-without-a-content-length-header
> >
> > Can you point to the literature that asks to know the total size.
> >
> > -Naveen
> >
> >
> > On Tue, Mar 6, 2018 at 10:34 PM, Rahul Huilgol 
> > wrote:
> >
> > > Hi Chris,
> > >
> > > S3 doesn't support append calls. They promote the use of multipart
> > uploads
> > > to upload large files in parallel, or when network reliability is an
> > issue.
> > > Writing like a stream does not seem to be the purpose of multipart
> > uploads.
> > >
> > > I looked into what the AWS SDK does (in Java). It buffers in memory
> > however
> > > large the file might be, and then uploads. I imagine this involves
> > > reallocating and copying the buffer to the larger buffer. There are few
> > > issues raised regarding this on the sdk repos like this
> > > <https://github.com/aws/aws-sdk-java/issues/474>. But this doesn't
> seem
> > to
> > > be something the SDKs can do anything about. People seem to be writing
> to
> > > temporary files and then uploading.
> > >
> > > Regards,
> > > Rahul
> > >
> > > On Tue, Mar 6, 2018 at 9:04 PM, Chris Olivier 
> > > wrote:
> > >
> > > > it seems strange that s3 would make such a major restriction. there’s
> > > > literally no way to incrementally write a file without knowing the
> size
> > > > beforehand? some sort of separate append calls, maybe?
> > > >
> > > > On Tue, Mar 6, 2018 at 8:53 PM Rahul Huilgol  >
> > > > wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > I have been looking at updating the authentication used by
> > S3FileSystem
> > > > in
> > > > > dmlc-core. Current code uses Signature version 2, which works only
> in
> > > the
> > > > > region us-east-1 now. We need to update the authentication scheme
> to
> > > use
> > > > > Signature version 4 (SIG4).
> > > > >
> > > > > I've submitted a PR <https://github.com/dmlc/dmlc-core/pull/378>
> to
> > > > change
> > > > > this for Reads. But I wanted to seek out thoughts on what to do for
> > > > Writes,
> > > > > as there is a potential problem.
> > > > >
> > > > > *How writes to S3 work currently:*
> > > > > Whenever s3filesystem's stream.write() is called, data is buffered.
> > > When
> > > > > the buffer is full, a request is made to S3. Since this can happen
> > > > multiple
> > > > > times, multipart upload feature is used. An upload id is created
> when
> > > > > stream is initialized. This upload id is 

Re: S3 Writes using SIG4 Authentication

2018-03-06 Thread Rahul Huilgol
Hi Chris,

S3 doesn't support append calls. It promotes multipart uploads for uploading
large files in parallel, or when network reliability is an issue. Writing
like a stream does not seem to be the intended use of multipart uploads.

I looked into what the AWS SDK does (in Java). It buffers the data in
memory, however large the file might be, and then uploads. I imagine this
involves reallocating and copying into a progressively larger buffer. There
are a few issues raised about this on the SDK repos, like this one
<https://github.com/aws/aws-sdk-java/issues/474>. But it doesn't seem to be
something the SDKs can do anything about; people seem to write to temporary
files and then upload.

Regards,
Rahul

On Tue, Mar 6, 2018 at 9:04 PM, Chris Olivier  wrote:

> it seems strange that s3 would make such a major restriction. there’s
> literally no way to incrementally write a file without knowing the size
> beforehand? some sort of separate append calls, maybe?
>
> On Tue, Mar 6, 2018 at 8:53 PM Rahul Huilgol 
> wrote:
>
> > Hi everyone,
> >
> > I have been looking at updating the authentication used by S3FileSystem
> in
> > dmlc-core. Current code uses Signature version 2, which works only in the
> > region us-east-1 now. We need to update the authentication scheme to use
> > Signature version 4 (SIG4).
> >
> > I've submitted a PR <https://github.com/dmlc/dmlc-core/pull/378> to
> change
> > this for Reads. But I wanted to seek out thoughts on what to do for
> Writes,
> > as there is a potential problem.
> >
> > *How writes to S3 work currently:*
> > Whenever s3filesystem's stream.write() is called, data is buffered. When
> > the buffer is full, a request is made to S3. Since this can happen
> multiple
> > times, multipart upload feature is used. An upload id is created when
> > stream is initialized. This upload id is used till the stream is closed.
> > Default buffer size is 64MB.
> >
> > *Problem:*
> > The new SIG4 authentication scheme changes how multipart uploads work.
> Such
> > an upload now requires that we know the total size of data to be sent
> (sum
> > of sizes of all parts) when we create the first request itself. We need
> to
> > pass the total size of payload as part of header. This is not possible
> > given that we don't know all the write calls beforehand. For example, a
> > call to save model's parameters makes 145 calls to the stream's write.
> >
> > *Approach?*
> > Is it okay to buffer it to a local file, and then upload this file to S3
> at
> > the end?
> > What use case do we have for writes to S3 generally? I believe we would
> > want to write params after training or logs. These wouldn't be too large
> or
> > frequent I imagine. What would you suggest?
> >
> > Appreciate your thoughts and suggestions.
> >
> > Thanks,
> > Rahul Huilgol
> >
>



-- 
Rahul Huilgol


S3 Writes using SIG4 Authentication

2018-03-06 Thread Rahul Huilgol
Hi everyone,

I have been looking at updating the authentication used by S3FileSystem in
dmlc-core. The current code uses Signature Version 2, which now works only
in the us-east-1 region. We need to update the authentication scheme to use
Signature Version 4 (SIG4).

I've submitted a PR <https://github.com/dmlc/dmlc-core/pull/378> to change
this for Reads. But I wanted to seek out thoughts on what to do for Writes,
as there is a potential problem.

*How writes to S3 work currently:*
Whenever the S3FileSystem stream's write() is called, the data is buffered.
When the buffer is full, a request is made to S3. Since this can happen
multiple times, the multipart upload feature is used: an upload id is
created when the stream is initialized and reused until the stream is
closed. The default buffer size is 64 MB.
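
Roughly, in simplified Python pseudocode (the upload helpers below are
hypothetical stand-ins for the signed S3 requests; this is not the actual
dmlc-core C++ code):

    # Hypothetical stand-ins for the actual signed S3 requests.
    def start_multipart_upload():
        print("initiate multipart upload")
        return "upload-id-123"

    def upload_part(upload_id, part_number, payload):
        print("PUT part %d (%d bytes) for %s" % (part_number, len(payload), upload_id))

    def complete_multipart_upload(upload_id):
        print("complete multipart upload %s" % upload_id)

    class BufferedS3Stream:
        BUFFER_SIZE = 64 * 1024 * 1024  # default 64 MB buffer

        def __init__(self):
            self.buffer = bytearray()
            self.part_number = 1
            # the upload id is created when the stream is initialized
            self.upload_id = start_multipart_upload()

        def write(self, data):
            self.buffer.extend(data)
            if len(self.buffer) >= self.BUFFER_SIZE:
                self._flush()

        def _flush(self):
            if self.buffer:
                # each full buffer becomes one part of the multipart upload
                upload_part(self.upload_id, self.part_number, bytes(self.buffer))
                self.part_number += 1
                del self.buffer[:]

        def close(self):
            self._flush()  # send whatever is left in the buffer
            complete_multipart_upload(self.upload_id)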

*Problem:*
The new SIG4 authentication scheme changes how multipart uploads work. Such
an upload now requires that we know the total size of the data to be sent
(the sum of the sizes of all parts) when we create the first request itself:
we need to pass the total payload size as part of the header. This is not
possible, given that we don't know all the write calls beforehand. For
example, a call to save a model's parameters makes 145 calls to the stream's
write().

*Approach?*
Is it okay to buffer it to a local file, and then upload this file to S3 at
the end?
What use cases do we have for writes to S3 generally? I believe we would
mainly want to write params after training, or logs. These wouldn't be too
large or frequent, I imagine. What would you suggest?

Appreciate your thoughts and suggestions.

Thanks,
Rahul Huilgol


Re: mxnet Scala Convolution

2017-10-18 Thread Rahul Huilgol
Hi TongKe,

These are operators defined in the C++ backend under src/operator. For
example, Convolution is defined here:
https://github.com/apache/incubator-mxnet/blob/master/src/operator/convolution.cc
The operators are registered using nnvm, which automatically generates the
frontend functions.

This tutorial on how to add a backend operator
<https://github.com/apache/incubator-mxnet/blob/master/docs/how_to/add_op_in_backend.md>
contains information on how to register such operators, which would help
you understand the above file.
An excerpt from there (for quadratic operator) : "If you use python, when
you type import mxnet as mx, two python functions for invoking your backend
implementation are generated on the fly: one is for imperative programming
registered as mxnet.ndarray.quadratic or mx.nd.quadratic for short; the
other one is for symbolic programming registered under module
mxnet.symbol.quadratic or mx.sym.quadratic for short."

I'd think the Scala package works similarly.
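
As a rough illustration with the Python frontend (which I know better; I
assume the generated Scala API mirrors it), the single backend registration
shows up on both the symbolic and the imperative sides:

    import mxnet as mx

    # Symbolic frontend: mx.sym.Convolution is generated from the backend registration.
    data = mx.sym.Variable('data')
    conv = mx.sym.Convolution(data=data, kernel=(3, 3), num_filter=16, name='conv1')

    # Imperative frontend: the same backend operator is exposed as mx.nd.Convolution.
    x = mx.nd.ones((1, 3, 8, 8))
    w = mx.nd.ones((16, 3, 3, 3))
    b = mx.nd.zeros((16,))
    y = mx.nd.Convolution(data=x, weight=w, bias=b, kernel=(3, 3), num_filter=16)
    print(y.shape)  # (1, 16, 6, 6)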

Regards,
Rahul




On Wed, Oct 18, 2017 at 5:06 PM, TongKe Xue  wrote:

> My earlier question was a bit messy.
>
> To rephrase my question:
>
> 1. Scala AlexNet sample code calls Symbol.Convolution:
>
> https://github.com/apache/incubator-mxnet/blob/master/
> scala-package/examples/src/main/scala/ml/dmlc/mxnetexamples/visualization/
> AlexNet.scala#L30
>
> 2. Symbol.scala does not contain the string "Convolution"
>
> https://github.com/apache/incubator-mxnet/blob/master/
> scala-package/core/src/main/scala/ml/dmlc/mxnet/Symbol.scala#L982
>
> Question: where/how is Symbol.Convolution defined?
>
> On Wed, Oct 18, 2017 at 4:10 PM, TongKe Xue  wrote:
> > Hi,
> >
> > I am reading: https://mxnet.incubator.apache.org/api/scala/symbol.html
> >
> > I see Symbol.Variable, Symbol.Convolution
> >
> > When I look at Symbol.scala, I see Symbol.Variable at:
> > https://github.com/apache/incubator-mxnet/blob/master/
> scala-package/core/src/main/scala/ml/dmlc/mxnet/Symbol.scala#L982
> >
> > However, I can't find where Convolution, SoftMax, FullyConnected, ...
> > are defined.
> >
> > Where are these Symbols defined?
> >
> > (I have also tried: grep "Convolution" . -R | grep scala | grep def --
> > but found nothing).
> >
> > Thanks,
> > --TongKe
>



-- 
Rahul Huilgol


Re: CI system seems to be using python3 for python2 builds

2017-09-27 Thread Rahul Huilgol
Hi Gautam,

I see that ‘nosetests’ is the command used to run the python2 tests. It
looks like that’s being mapped to python3. I’ve confirmed this is the case
on my Ubuntu instance; I need to use ‘nosetests-2.7’ to run the tests with
python2. Please check whether this fix works in the build environment
(slave/docker container) as well.
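
If it helps, a throwaway test like the following (just a diagnostic sketch,
not something in the repo) would confirm which interpreter the runner
actually picks up on the slave:

    import sys

    def test_running_under_python2():
        # Fails loudly if plain 'nosetests' is actually backed by Python 3,
        # printing the interpreter version the tests really ran under.
        assert sys.version_info[0] == 2, \
            "nosetests is using Python %d.%d" % sys.version_info[:2]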

The PR you refer to only parallelized tests that were running one after the
other, this command was being used even before that PR.

Regards,
Rahul

On Wed, 27 Sep 2017 at 22:46 Gautam  wrote:

> Hi Ozawa,
>
>   Thanks for follow up.
>   Unfortunately I didn't get time to work on this today.
>
> However I have couple of points to mentions.
> 1. Looks like this backtrace has been present since long time, since this
> was not a test failure or build failure we never got notified about it.
> Here
> <
> https://builds.apache.org/view/Incubator%20Projects/job/incubator-mxnet/job/master/448/consoleFull
> >
> is the recent build log where back trace is present but build succeeded.
>
> 2. I don't think the default version of python on Ubuntu is 3.0, I logged
> into one of the apache slave and the default version of Python is 2.7.6
>
> 3. There has been slight change
>  in Jenkins file
> where
> we tried to parallelize python2 and 3 test run. I am not sure if it
> affects. I can probably scrub the build log and figure out if thats the
> case.
>
>
> Feel free to send the PR, if you have it ready.
>
>
> -Gautam
>
>
> On Wed, Sep 27, 2017 at 9:39 PM, Tsuyoshi Ozawa  wrote:
>
> > Hi Kumar,
> >
> > Thanks for looking into the issue. How is the progress of this problem?
> > Shouldn't we call /usr/bin/env python2 or python2.7 in following
> > source code instead of python since MXNet only supports python2
> > currently?
> > I think default version of python in Ubuntu is now python3, so it can
> > cause the problem.
> > If you have not yet done the work, I can create a PR for that in this
> > weekend.
> >
> > ./python/mxnet/__init__.py:#!/usr/bin/env python
> > ./python/mxnet/log.py:#!/usr/bin/env python
> > ./tests/nightly/dist_lenet.py:#!/usr/bin/env python
> > ./tests/nightly/dist_sync_kvstore.py:#!/usr/bin/env python
> > ./tests/nightly/multi_lenet.py:#!/usr/bin/env python
> > ./tests/nightly/test_kvstore.py:#!/usr/bin/env python
> > ./tools/coreml/mxnet_coreml_converter.py:#!/usr/bin/env python
> > ./tools/ipynb2md.py:#!/usr/bin/env python
> > ./tools/kill-mxnet.py:#!/usr/bin/env python
> > ./tools/launch.py:#!/usr/bin/env python
> > ./tools/parse_log.py:#!/usr/bin/env python
> >
> > On Wed, Sep 27, 2017 at 5:39 PM, Sunderland, Kellen 
> > wrote:
> > > Many thanks Gautam.
> > >
> > > On 9/26/17, 8:37 PM, "Kumar, Gautam"  wrote:
> > >
> > > Hi Kellen,
> > >
> > >This issue has been happening since last 3-4 days along with few
> > other test failure.
> > > I am looking into it.
> > >
> > > -Gautam
> > >
> > > On 9/26/17, 7:45 AM, "Sunderland, Kellen" 
> wrote:
> > >
> > > I’ve been noticing in a few failed builds that the stack trace
> > indicates we’re actually running python 3.4 in the python 2 tests. I know
> > the CI folks are working hard getting everything setup, is this a known
> > issue for the CI team?
> > >
> > > For example: https://builds.apache.org/
> > blue/organizations/jenkins/incubator-mxnet/detail/PR-8026/3/pipeline/281
> > >
> > > Steps Python2: MKLML-CPU
> > >
> > > StackTrace:
> > > Stack trace returned 10 entries:
> > > [bt] (0) /workspace/python/mxnet/../../lib/libmxnet.so(_
> > ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fadb8999aac]
> > > [bt] (1) /workspace/python/mxnet/../../lib/libmxnet.so(_
> > ZN5mxnet7kvstore12KVStoreLocal12GroupKVPairsISt4pairIPNS_
> >
> 7NDArrayES4_EZNS1_19GroupKVPairsPullRspERKSt6vectorIiSaIiEERKS7_IS6_SaIS6_
> >
> EEPS9_PS7_ISD_SaISD_EEEUliRKS6_E_EEvSB_RKS7_IT_SaISN_EESG_PS7_ISP_SaISP_EERKT0_+0x56b)
> > [0x7fadba32c01b]
> > > [bt] (2) /workspace/python/mxnet/../../lib/libmxnet.so(_
> > ZN5mxnet7kvstore12KVStoreLocal17PullRowSparseImplERKSt6vecto
> > rIiSaIiEERKS2_ISt4pairIPNS_7NDArrayES8_ESaISA_EEi+0xa6) [0x7fadba32c856]
> > > [bt] (3)
> /workspace/python/mxnet/../../lib/libmxnet.so(MXKVStorePullRowSparse+0x245)
> > [0x7fadba18f165]
> > > [bt] (4)
> /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c)
> > [0x7fadde26cadc]
> > > [bt] (5) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x1fc)
> > [0x7fadde26c40c]
> > > [bt] (6) /usr/lib/python3.4/lib-dynload/_ctypes.cpython-34m-
> > x86_64-linux-gnu.so(_ctypes_callproc+0x21d) [0x7fadde47e12d]
> > > [bt] (7) /usr/lib/python3.4/lib-dynload/_ctypes.cpython-34m-
> > x86_64-linux-gnu.so(+0xf6a3) [0x7fadde47e6a3]
> > > [bt] (8) /usr/bin/python3(PyEval_EvalFrameEx+0x41d7) [0x48a487]
> > > [bt] (9) /usr/bin/python3() [0x48f2df]
> > >
> 

Re: What's everyone working on?

2017-09-27 Thread Rahul Huilgol
Chao and I are working on compressing gradients to low bit precision (2-bit
for now) to reduce communication costs and hence speed up training,
especially distributed training. The idea is to retain the compression error
as a residual and incorporate it into later iterations, so we don't see much
(or any?) loss in accuracy.
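
Roughly, the error-feedback idea looks like this simplified sketch (plain
numpy, with a hypothetical threshold; the actual kvstore implementation
packs the values into 2-bit words and may differ in details):

    import numpy as np

    def compress_2bit(grad, residual, threshold=0.5):
        # Quantize each element to one of {-threshold, 0, +threshold} and
        # keep the quantization error in `residual` for the next iteration.
        corrected = grad + residual                      # fold in previous error
        quantized = np.zeros_like(corrected)
        quantized[corrected >= threshold] = threshold    # encoded as +1
        quantized[corrected <= -threshold] = -threshold  # encoded as -1
        residual[:] = corrected - quantized              # carry the error forward
        return quantized

    # Toy usage: the residual accumulates what was dropped and is re-applied later.
    grad = np.array([0.9, 0.2, -0.7, 0.1])
    residual = np.zeros_like(grad)
    print(compress_2bit(grad, residual))  # [ 0.5  0.  -0.5  0. ]
    print(residual)                       # [ 0.4  0.2 -0.2  0.1]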

Regards,
Rahul

On Wed, 27 Sep 2017 at 07:20 kellen sunderland 
wrote:

> Pedro and I are focusing on a few use cases involving mobile and IoT device
> development.  At the moment we're trying to run machine translation and
> object detection models on a Jetson TX2 with reasonable performance.  We'll
> probably also look at a few different types of model compression at some
> point as well.  I think we're also happy to chip in with bug fixes where we
> can.
>
> -Kellen
>
> On Wed, Sep 27, 2017 at 3:58 PM, Dom Divakaruni <
> dominic.divakar...@gmail.com> wrote:
>
> > A couple of us are working on sparse support. Bhavin or Haibin, can you
> > fill in more detail?
> >
> > Regards,
> > Dom
> >
> >
> > > On Sep 26, 2017, at 4:35 PM, Nan Zhu  wrote:
> > >
> > > I am essentially doing the same thing as in xgboost-spark
> > >
> > > DF based ML integration, etc.
> > >
> > > Get Outlook for iOS
> > > 
> > > From: Naveen Swamy 
> > > Sent: Tuesday, September 26, 2017 4:20:25 PM
> > > To: dev@mxnet.incubator.apache.org
> > > Subject: Re: What's everyone working on?
> > >
> > > Hi Nan Zhu,
> > >
> > > Thanks for the update. Curious to know what part of mxnet-spark are you
> > > working on?
> > >
> > > I am also evaluating the integration of MXNet with Spark, planning to
> > start
> > > with PySpark and also looking into spark-deep learning-pipelines
> > > .
> > >
> > > Thanks, Naveen
> > >
> > >> On Tue, Sep 26, 2017 at 4:06 PM, Nan Zhu 
> > wrote:
> > >>
> > >> working on mxnet-spark, fixing some limitations in ps-lite(busy
> for
> > >> daily job in these days, should be back next week)
> > >>
> > >>> On Tue, Sep 26, 2017 at 10:13 AM, YiZhi Liu 
> > wrote:
> > >>>
> > >>> Hi Dominic,
> > >>>
> > >>> I'm working on 0.11-snapshot and we will soon have one. While the
> > >>> stable release will be after that we change package name from
> > >>> 'ml.dmlc' to 'org.apache'.
> > >>>
> > >>> 2017-09-27 0:04 GMT+08:00 Dominic Divakaruni <
> > >> dominic.divakar...@gmail.com
> >  :
> >  That's great, YiZhi. Workday uses the Scala package and was looking
> > >> for a
> >  maven distro for v0.11. When do you think you'll have one up?
> > 
> >  On Tue, Sep 26, 2017 at 8:58 AM, YiZhi Liu 
> > >> wrote:
> > 
> > > I'm currently working on maven deploy for scala package.
> > >
> > > 2017-09-26 16:00 GMT+08:00 Zihao Zheng :
> > >> I’m working on standalone TensorBoard, https://github.com/dmlc/
> > > tensorboard , currently we’ve
> > > support several features in original TensorBoard from TensorFlow in
> > >> pure
> > > Python without any DL framework dependency.
> > >>
> > >> Recently I’m trying to bring more features to this standalone
> > >> version,
> > > but seems not very trivial as it depends on TensorFlow. Any advice
> > are
> > > welcomed and looking for help.
> > >>
> > >> Thanks,
> > >> Zihao
> > >>
> > >>> 在 2017年9月26日,下午1:58,sandeep krishnamurthy <
> > >>> sandeep.krishn...@gmail.com>
> > > 写道:
> > >>>
> > >>> I am currently working with Jiajie Chen (https://github.com/
> > >>> jiajiechen/)
> > > on
> > >>> building an automated periodic benchmarking framework to run
> > >> various
> > >>> standard MXNet training jobs with both Symbolic and Gluon
> > >> interface.
> > > This
> > >>> framework will run following standard training jobs on a nightly
> > >> and
> > > weekly
> > >>> basis helping us to track performance improvements or regression
> > >>> early
> > > in
> > >>> the development cycle of MXNet. Both CPU and GPU instances are
> used
> > >>> capturing various metrics like training accuracy, validation
> > >>> accuracy,
> > >>> convergence, memory consumption, speed.
> > >>>
> > >>> To start with, we will be running Resnet50, Resnet152 on CIFAR
> and
> > >>> Synthetic Dataset. And, few more RNN and Bidirectional LSTM
> > >> training
> > > jobs.
> > >>>
> > >>> Thanks,
> > >>> Sandeep
> > >>>
> > >>>
> > >>> On Mon, Sep 25, 2017 at 8:00 PM, Henri Yandell <
> bay...@apache.org>
> > > wrote:
> > >>>
> >  Getting an instance of github.com/amzn/oss-dashboard setup for
> > >>> mxnet.
> > 
> >  Hopefully useful to write custom metric analysis; like: "most
> pull
> > > requests
> >  from non-committer" and "PRs without committer comment".
> > 
> >  Hen
> > 
> >  On Mon, Sep 25, 2017 at 11:24 Seb Kiureghian