Re: [MXNET 2.0 Wishlist] [DISCUSS] Backend choices during runtime

2019-04-12 Thread Tianqi Chen
+1.

While I like Slack personally, I don't think we should treat Slack as a
public archive: "everything that happens (also) happens on dev@".

Tianqi



On Fri, Apr 12, 2019 at 1:19 AM Marco de Abreu 
wrote:

> I'd prefer if we keep discussions on the dev-list instead of slack - feel
> free to open another thread.
>
> -Marco

Re: [MXNET 2.0 Wishlist] [DISCUSS] Backend choices during runtime

2019-04-11 Thread Marco de Abreu
I'd prefer if we keep discussions on the dev-list instead of slack - feel
free to open another thread.

-Marco

On Fri, Apr 12, 2019 at 02:24, Pedro Larroy wrote:

> I will respond in slack, so we don't derail the original thread's
> topic with my points.
>
> Looking forward to your proposal.

Re: [MXNET 2.0 Wishlist] [DISCUSS] Backend choices during runtime

2019-04-11 Thread Pedro Larroy
I will respond in slack, so we don't derail the original thread's
topic with my points.

Looking forward to your proposal.

On Thu, Apr 11, 2019 at 1:00 PM Junru Shao  wrote:
>
> I don't have any ideas about the following issues:
>
> 1) Reducing the abuse of inlined code by moving more logic to implementation
> files and improving scoping, which will also speed up compilation
> 2) Reducing the runtime of some unit tests
> 3) Improving MXNet startup time
>
> Will be super interested to hear about your ideas :-)

Re: [MXNET 2.0 Wishlist] [DISCUSS] Backend choices during runtime

2019-04-11 Thread Junru Shao
I don't have any ideas about the following issues:

1) Reducing the abuse of inlined code by moving more logic to implementation
files and improving scoping, which will also speed up compilation
2) Reducing the runtime of some unit tests
3) Improving MXNet startup time

Will be super interested to hear about your ideas :-)


On Thu, Apr 11, 2019 at 12:52 PM Junru Shao  wrote:

> We have a systematic solution that avoids the ABI headache. I am struggling
> with some errands, and will share our proposal here as soon as I can.
> This will be a very interesting topic to discuss. Let's work hard together
> and make it perfect :-)

Re: [MXNET 2.0 Wishlist] [DISCUSS] Backend choices during runtime

2019-04-11 Thread Junru Shao
We have a systematic solution that avoids the ABI headache. I am struggling
with some errands, and will share our proposal here as soon as I can.
This will be a very interesting topic to discuss. Let's work hard together
and make it perfect :-)

On Thu, Apr 11, 2019 at 12:43 PM Pedro Larroy 
wrote:

> Thanks Marco for raising this issue. I think we can certainly make some
> improvements in modularization and build. At the same time Tianqi's
> point of view is important to consider and on point. I see a high risk
> of overengineering in such an endeavor.
>
> I also see increased complexity, difficulty debugging, C++ ABI
> headaches, API compatibility, crashes inside a binary module, etc.
> which I don't want to deal with as a developer or even as an MXNet
> user. Does somebody have answers to these problems?
>
> If somebody thinks they have a good solution, by all means propose a
> design in the wiki; I think we are all open. Personally I see several
> other lower-hanging fruits which need our attention:
>  * Simplifying our build logic,
>  * CUDA selection in CMake,
>  * Reducing the abuse of inlined code by moving more logic to
> implementation files and improving scoping, which will also speed up
> compilation (some units take more than 5 minutes to build and lots of
> RAM on a top-of-the-line CPU core),
>  * Reducing the runtime of some unit tests,
>  * Improving MXNet startup time,
>  * Thread safety,
> and other improvements in our codebase that would bring immediate
> benefits without the risks of overengineering a plugin system. I
> also question our bandwidth for such an endeavor.
>
> I would say, let's apply the KISS principle, let's make the project
> fast to build, easy to work on, well documented and easy to contribute
> to before building the next Netscape browser. Otherwise we could save
> ourselves this exercise and switch to Rust directly.
>
> Pedro.

Re: [MXNET 2.0 Wishlist] [DISCUSS] Backend choices during runtime

2019-04-11 Thread Pedro Larroy
Thanks Marco for raising this issue. I think we can certainly make some
improvements in modularization and build. At the same time Tianqi's
point of view is important to consider and on point. I see a high risk
of overengineering in such an endeavor.

I also see increased complexity, difficulty debugging, C++ ABI
headaches, API compatibility, crashes inside a binary module, etc.
which I don't want to deal with as a developer or even as an MXNet
user. Does somebody have answers to these problems?

If somebody thinks they have a good solution, by all means propose a
design in the wiki; I think we are all open. Personally I see several
other lower-hanging fruits which need our attention:
 * Simplifying our build logic,
 * CUDA selection in CMake,
 * Reducing the abuse of inlined code by moving more logic to
implementation files and improving scoping, which will also speed up
compilation (some units take more than 5 minutes to build and lots of
RAM on a top-of-the-line CPU core),
 * Reducing the runtime of some unit tests,
 * Improving MXNet startup time,
 * Thread safety,
and other improvements in our codebase that would bring immediate
benefits without the risks of overengineering a plugin system. I
also question our bandwidth for such an endeavor.
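
As a rough illustration of the inlined-code item in the list above, here is a minimal single-file sketch of the header/implementation split. The file names, the `example` namespace and both functions are hypothetical and not taken from the MXNet codebase; the only point is that includers see a declaration, while the body and its helper stay in one translation unit.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// ---- would live in a header, e.g. a hypothetical example_op.h ----
// Declaration only: files that include this do not see the body and do not
// need to be recompiled when the body changes.
namespace example {
double MeanOf(const std::vector<double>& values);
}  // namespace example

// ---- would live in a hypothetical example_op.cc, compiled once ----
namespace example {
namespace {
// Helper with internal linkage: invisible outside this translation unit.
double SumOf(const std::vector<double>& values) {
  assert(!values.empty());  // MeanOf only calls this for non-empty input
  return std::accumulate(values.begin(), values.end(), 0.0);
}
}  // namespace

double MeanOf(const std::vector<double>& values) {
  if (values.empty()) return 0.0;  // arbitrary convention for this sketch
  return SumOf(values) / static_cast<double>(values.size());
}
}  // namespace example
```

Whether this helps in practice depends on how much template code can actually be moved out of headers, which this sketch does not address.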

I would say, let's apply the KISS principle, let's make the project
fast to build, easy to work on, well documented and easy to contribute
to before building the next Netscape browser. Otherwise we could save
ourselves this exercise and switch to Rust directly.

Pedro.



On Mon, Apr 8, 2019 at 9:42 AM Tianqi Chen  wrote:
>
> Just to clarify. I am not questioning the usefulness of the separation.
> Just want to highlight the technical challenges here based on our past
> experiences.
>
> Crossing DLL boundaries in C++ can create quite a lot of problems,
> especially when some of the dependencies use a different version of the
> compiler, follow static packaging, or simply because of the dynamic linking
> differences on Windows. These problems could make this direction less
> appealing compared to focusing effort on other things.
>
> Technically, as a first step, it is possible to make dependency changes
> not touch the global header files, via registration, so that changing a
> certain component won't trigger a global recompile in CMake. This is also a
> required step toward some modularity.
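
A minimal sketch of what such registration could look like, assuming a simple name-to-function registry. `BackendFn`, `BackendRegistry`, `RegisterBackend`, `FindBackend` and the `cpu_double` entry are all hypothetical names for illustration, not MXNet's actual registry API.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical names throughout; this is not MXNet's actual registry.
using BackendFn = std::function<int(int)>;

// A function-local static avoids static-initialization-order problems
// between translation units that register at load time.
std::map<std::string, BackendFn>& BackendRegistry() {
  static std::map<std::string, BackendFn> registry;
  return registry;
}

// Returns true when the name was newly registered, false on a duplicate.
bool RegisterBackend(const std::string& name, BackendFn fn) {
  assert(fn && "backend function must be callable");
  return BackendRegistry().emplace(name, std::move(fn)).second;
}

// Returns nullptr when no backend with that name was registered.
const BackendFn* FindBackend(const std::string& name) {
  auto& registry = BackendRegistry();
  auto it = registry.find(name);
  return it == registry.end() ? nullptr : &it->second;
}

// In a real build this definition would sit in its own .cc file. Adding or
// changing it touches no shared header, so no other file is recompiled.
bool cpu_double_registered =
    RegisterBackend("cpu_double", [](int x) { return 2 * x; });
```

Callers depend only on the lookup function and the name string, so the set of registered components can change without a global rebuild.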
>
> For plugins, solutions that use C ABI can be used for certain plugin
> modules.
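
A minimal sketch of a C ABI plugin boundary, with plugin and host shown in one file for brevity. The struct, the version field and every function name are assumptions made for illustration; neither MXNet nor TVM is claimed to expose this interface.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>

// Hypothetical plugin interface; names are illustrative only.
extern "C" {

// Only C-compatible types cross the boundary: fixed-width integers, raw
// pointers and plain structs. No std::string, no exceptions.
typedef struct ExamplePluginApi {
  uint32_t abi_version;  // lets the host reject incompatible plugins
  const char* name;
  // Returns 0 on success, nonzero on error.
  int (*add_f32)(const float* a, const float* b, float* out, size_t n);
} ExamplePluginApi;

// The single symbol a host would look up after loading the library.
const ExamplePluginApi* ExamplePluginGetApi(void);

// ---- plugin side: the body may use any C++ internally ----
static int ExampleAddF32(const float* a, const float* b, float* out,
                         size_t n) {
  if (a == nullptr || b == nullptr || out == nullptr) return 1;
  for (size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
  return 0;
}

}  // extern "C"

static const ExamplePluginApi kExampleApi = {1u, "example_cpu",
                                             &ExampleAddF32};

extern "C" const ExamplePluginApi* ExamplePluginGetApi(void) {
  return &kExampleApi;
}

// ---- host side ----
constexpr uint32_t kHostAbiVersion = 1;

// True when the plugin advertises the ABI version the host was built for.
bool PluginIsCompatible(const ExamplePluginApi* api) {
  return api != nullptr && api->abi_version == kHostAbiVersion &&
         api->add_f32 != nullptr;
}

// Calls into the plugin; returns -1 without calling on a version mismatch.
int HostAdd(const ExamplePluginApi* api, const float* a, const float* b,
            float* out, size_t n) {
  assert(api != nullptr);  // passing no plugin at all is a programming error
  if (!PluginIsCompatible(api)) return -1;
  return api->add_f32(a, b, out, n);
}
```

In a real plugin the host would obtain `ExamplePluginGetApi` through the platform's dynamic loader rather than by linking it directly; the explicit version field is one common way to detect a mismatch before any call is made.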
>
> Some of the discussion has been tied to what the interface should look
> like. I think we should use different threads for these and put in more
> thought.
>
> Tianqi
>
>
>
> On Sun, Apr 7, 2019 at 4:39 PM kellen sunderland <
> kellen.sunderl...@gmail.com> wrote:
>
> > I think we can make some incremental progress.  My thoughts were along the
> > lines of plugins (thinking about what happens with the VLC project).  At
> > process launch time we could gather some information about our execution
> > environment (either through configuration, or by convention looking at our
> > folder structure and libraries available).  We could then later load the
> > components we need after understanding if we're using a CUDA backend and
> > what operators or subgraph components we would need.  Advantages would be
> > that we would move a lot of the current conditional compile logic to
> > runtime, and automate a lot of it.  It would also make packaging binaries
> > for targeted environments a little easier.  As an example we could compile
> > once, then remove CUDA focused libraries for systems that are going to run
> > on CPUs.
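
The launch-time decision described above can be sketched as a pure function over the set of libraries found on disk. The `LoadPlan` type, `PlanBackend` and the `libexample_ops_*` file names are hypothetical; `libcudart.so` is used only as a stand-in for "the CUDA runtime is present", and the actual folder scan and library loading are left out.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical component names, chosen only for illustration.
struct LoadPlan {
  std::string backend;                  // "cuda" or "cpu"
  std::vector<std::string> components;  // libraries to load later
};

// Decides what to load from the library files found at process launch.
// `available` stands in for the result of scanning the install folder.
LoadPlan PlanBackend(const std::set<std::string>& available) {
  LoadPlan plan;
  plan.backend = "cpu";
  plan.components.push_back("libexample_ops_cpu.so");
  // Choose CUDA only when both the CUDA runtime and the CUDA operator
  // library are present; otherwise stay on the CPU path.
  const bool has_cuda = available.count("libcudart.so") > 0 &&
                        available.count("libexample_ops_cuda.so") > 0;
  if (has_cuda) {
    plan.backend = "cuda";
    plan.components.push_back("libexample_ops_cuda.so");
  }
  assert(!plan.components.empty());  // the CPU operators are always planned
  return plan;
}
```

With this shape, removing the CUDA-focused libraries from a package changes the plan to the CPU path without a rebuild, which is the packaging benefit mentioned above.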
> >
> > On Sun, Apr 7, 2019 at 2:45 PM Tianqi Chen 
> > wrote:
> >
> > > While I personally like the idea, this can be fairly technically
> > > challenging, and I would caution against this idea vs pushing for
> > > good features and just allowing runtime configuration.
> > >
> > > The main problem here is due to the C++ ABI. There is no standard C++ ABI
> > > across compilers, which means resorting to runtime DLLs and dynamic
> > > loading brings all sorts of technical problems, especially when multiple
> > > modules depend on the same third-party dependency (CUDA runtime).
> > > No ready-made solution is available here, especially given the
> > > explosion of backend variants and dependencies in C++.
> > > A partial solution could be achieved through the sole use of a C ABI.
> > > Combining this with code generation can result in some simplifications and
> > > enable some runtime-loadable modules. TVM does this, and perhaps MXNet
> > > could reuse some of that component for operator libraries. Similarly, having
> > > a customizable operator library that is loadable via C ABI might be
> > > possible.
> > >
> > > So to summarize, while I really like the idea of dynamically loadable
> > > modules, my past experience suggests that this will bring a lot of
> > > additional engineering burden and technical debt without significant
> > > benefit. I would suggest starting by 

Re: [MXNET 2.0 Wishlist] [DISCUSS] Backend choices during runtime

2019-04-08 Thread Junru Shao
+1 Thanks Marco for sharing this!

It is great to see people agree with this feature and we actually have been
planning for this for a while. We would love to share this plan as soon as
possible.


On Mon, Apr 8, 2019 at 9:42 AM Tianqi Chen  wrote:

> Just to clarify. I am not questioning the usefulness of the separation.
> Just want to highlight the technical challenges here based on our past
> experiences.
>
> Crossing DLL boundaries in C++ can create quite a lot of problems,
> especially when some of the dependencies use a different version of the
> compiler, follow static packaging, or simply because of the dynamic linking
> differences on Windows. These problems could make this direction less
> appealing compared to focusing effort on other things.
>
> Technically, as a first step, it is possible to make dependency changes
> not touch the global header files, via registration, so that changing a
> certain component won't trigger a global recompile in CMake. This is also a
> required step toward some modularity.
>
> For plugins, solutions that use C ABI can be used for certain plugin
> modules.
>
> Some of the discussion has been tied to what the interface should look
> like. I think we should use different threads for these and put in more
> thought.
>
> Tianqi
>
>
>
> On Sun, Apr 7, 2019 at 4:39 PM kellen sunderland <
> kellen.sunderl...@gmail.com> wrote:
>
> > I think we can make some incremental progress.  My thoughts were along
> the
> > lines of plugins (thinking about what happens with the VLC project).  At
> > process launch time we could gather some information about our execution
> > environment (either through configuration, or by convention looking at
> our
> > folder structure and libraries available).  We could then later load the
> > components we need after understanding if we're using a CUDA backend and
> > what operators or subgraph components we would need.  Advantages would be
> > that we would move a lot of the current conditional compile logic to
> > runtime, and automate a lot of it.  It would also make packaging binaries
> > for targeted environments a little easier.  As an example we could
> compile
> > once, then remove CUDA focused libraries for systems that are going to
> run
> > on CPUs.
> >
> > On Sun, Apr 7, 2019 at 2:45 PM Tianqi Chen 
> > wrote:
> >
> > > While I personally like the idea. This can be something that is fairly
> > > technical challenging and I would caution against this idea vs pushing
> > for
> > > good features and just allow runtime configuration.
> > >
> > > The main problem here is due to the C++ ABI. There is no standard c++
> ABI
> > > across compilers, which means resorting to runtime DLL and dynamic
> > loading
> > > brings all sorts of technical problems, especially when multiple
> modules
> > > depend on the same third dependency(CUDA runtime).
> > > There is no good to go solution can be made here, especially given the
> > > explosion of the backend variants and dependencies in C++.
> > > A partial solution could be achieved, through the sole use of C ABI.
> > > Combing this with code generation can result in some simplifications
> and
> > > enable some runtime loadable module. TVM does this, and perhaps MXNet
> > could
> > > reuse some of that component for operator libraries. Similarly, having
> a
> > > customizable operator library that is loadable via C ABI might be
> > possible.
> > >
> > > So to summarize, while I really like the idea of dynamically loadable
> > > modules. My past experience suggests that this will bring a lot of
> > > additional engineering burden and technical debts without significant
> > > benefit. I would suggest starting by supporting something simple like a
> > > plugin module, before moving toward the general direction.
> > >
> > > Tianqi
> > >
> > > On Sun, Apr 7, 2019 at 1:31 PM kellen sunderland <
> > > kellen.sunderl...@gmail.com> wrote:
> > >
> > > > Strongly support the idea of runtime loadable components in MXNet.
> > > There's
> > > > no reason (other than perhaps engineering effort) we can't have a
> > single
> > > > compilation of MXNet that finds dependencies and chooses execution
> > paths
> > > > intelligently (or based on configuration) at runtime.
> > > >
> > > > On Thu, Apr 4, 2019 at 12:29 PM Marco de Abreu <
> marcoab...@apache.org>
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I'd like to start a discussion about something that I've noticed
> > being
> > > > > troublesome to maintain in the current version: Backend choices
> being
> > > > made
> > > > > at compile time.
> > > > >
> > > > > Right now, the different backends and accelerators (CPU, cuda, mkl,
> > AWS
> > > > > elastic inference, (future) AMD, openblas,TVM, etc) are all
> scattered
> > > > > across the different layers of MXNet. On one hand, we have compile
> > time
> > > > > flags that decide which backends are being compiled into the
> binary,
> > > > while
> > > > > at the same time choices can be made in the 

Re: [MXNET 2.0 Wishlist] [DISCUSS] Backend choices during runtime

2019-04-08 Thread Tianqi Chen
Just to clarify. I am not questioning the usefulness of the separation.
Just want to highlight the technical challenges here based on our past
experiences.

Crossing DLL boundaries in C++ can create quite a lot of problems,
especially when some of the dependencies were built with a different
version of the compiler, use static packaging, or simply because of the
dynamic linking differences on Windows. These problems could make this
direction less appealing compared to focusing effort on other things.

Technically, as a first step, it is possible to make dependency changes
not touch the global header files and go through registration instead, so
that changing a certain component won't trigger a global recompile in
CMake. This is also a required step toward some modularity.

For plugins, solutions that use C ABI can be used for certain plugin
modules.

Some of the discussion has been tied to what the interface should look
like. I think we should use separate threads for these and put in more
thought.

Tianqi




Re: [MXNET 2.0 Wishlist] [DISCUSS] Backend choices during runtime

2019-04-07 Thread kellen sunderland
I think we can make some incremental progress.  My thoughts were along the
lines of plugins (thinking about what happens with the VLC project).  At
process launch time we could gather some information about our execution
environment (either through configuration, or by convention looking at our
folder structure and libraries available).  We could then later load the
components we need after understanding if we're using a CUDA backend and
what operators or subgraph components we would need.  Advantages would be
that we would move a lot of the current conditional compile logic to
runtime, and automate a lot of it.  It would also make packaging binaries
for targeted environments a little easier.  As an example we could compile
once, then remove CUDA focused libraries for systems that are going to run
on CPUs.
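
A minimal sketch of the launch-time probing described above, written in
Python for brevity. The mapping `BACKEND_LIBRARIES`, the library names in
it, and the functions `detect_backends` and `choose_backend` are
assumptions made for illustration, not existing MXNet code. The only real
API used is `ctypes.util.find_library`, which returns `None` when a
shared library cannot be located.

```python
import ctypes.util
from typing import Callable, List, Optional, Sequence

# Hypothetical mapping from backend name to the shared library we probe for.
BACKEND_LIBRARIES = {
    "cuda": "cudart",
    "mkl": "mkl_rt",
    "openblas": "openblas",
}


def detect_backends(
    find: Callable[[str], Optional[str]] = ctypes.util.find_library,
) -> List[str]:
    """Return the backends whose libraries are present; CPU is always there."""
    available = ["cpu"]
    for backend, library in BACKEND_LIBRARIES.items():
        if find(library) is not None:
            available.append(backend)
    return available


def choose_backend(
    available: Sequence[str],
    preferred: Sequence[str] = ("cuda", "mkl", "openblas", "cpu"),
) -> str:
    """Pick the first preferred backend that was detected."""
    for name in preferred:
        if name in available:
            return name
    raise RuntimeError("no usable backend found")
```

Calling `choose_backend(detect_backends())` at startup would then replace
a compile-time flag with a runtime decision; the `find` parameter exists
so the probe can be overridden by configuration or in tests.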


Re: [MXNET 2.0 Wishlist] [DISCUSS] Backend choices during runtime

2019-04-07 Thread Tianqi Chen
While I personally like the idea, this can be something that is fairly
technically challenging, and I would caution against this idea vs pushing
for good features and just allowing runtime configuration.

The main problem here is due to the C++ ABI. There is no standard C++ ABI
across compilers, which means resorting to runtime DLLs and dynamic loading
brings all sorts of technical problems, especially when multiple modules
depend on the same third-party dependency (CUDA runtime).
There is no ready-made solution here, especially given the
explosion of the backend variants and dependencies in C++.
A partial solution could be achieved through the sole use of a C ABI.
Combining this with code generation can result in some simplifications and
enable some runtime loadable modules. TVM does this, and perhaps MXNet could
reuse some of those components for operator libraries. Similarly, having a
customizable operator library that is loadable via C ABI might be possible.

So to summarize, while I really like the idea of dynamically loadable
modules, my past experience suggests that this will bring a lot of
additional engineering burden and technical debt without significant
benefit. I would suggest starting by supporting something simple like a
plugin module, before moving toward the general direction.

Tianqi


Re: [MXNET 2.0 Wishlist] [DISCUSS] Backend choices during runtime

2019-04-07 Thread kellen sunderland
Strongly support the idea of runtime loadable components in MXNet.  There's
no reason (other than perhaps engineering effort) we can't have a single
compilation of MXNet that finds dependencies and chooses execution paths
intelligently (or based on configuration) at runtime.



[MXNET 2.0 Wishlist] [DISCUSS] Backend choices during runtime

2019-04-04 Thread Marco de Abreu
Hello,

I'd like to start a discussion about something that I've noticed being
troublesome to maintain in the current version: Backend choices being made
at compile time.

Right now, the different backends and accelerators (CPU, cuda, mkl, AWS
elastic inference, (future) AMD, openblas, TVM, etc.) are all scattered
across the different layers of MXNet. On one hand, we have compile-time
flags that decide which backends are being compiled into the binary, while
at the same time choices can be made in the frontend during runtime.

At the moment, we have a lot of conditional build logic that picks
different parts. With the addition of MKLML and later MKLDNN the clear
separation of CPU and GPU got kind of broken up. While we have some places
where each backend's code lives, in the end we resort to some files
containing a lot of conditional logic for the different backends (sorry I
can't provide links right now since I'm on mobile). To me this seems like a
residue of the fast development style from the early days (more
preprocessor statements and less object orientation) while also having
organic growth with new accelerators. When I see how much AMD had to hack
to fit in their implementation, it seems like we have to make this part
more developer friendly.

At the moment, every new flavour of MXNet has to be entirely recompiled.
This makes it hard for users to figure out which options to use, while it
makes it harder for us to test since the overhead to test every single
combination of compile parameters would be overwhelming.

I'd propose to have a clear class-hierarchy-based structure for
accelerators, operators and memory management. This structure can then be
implemented by the different backends. To reduce the compile burden, we
would introduce dynamic loading and split the different backends into
modules. These could then be developed, maintained and compiled on their
own and then placed in a "module" folder to be loaded at runtime. Adding a
new accelerator would be a matter of placing the precompiled binary into
the folder. The detailed configuration of that backend would then be done
at runtime - the user shouldn't worry at the point of downloading mxnet
whether they want mkl, MKLDNN, openblas, atlas, TVM, cuda or whatever
else there is. I have an idea how we could help the user choose, but
that's outside the scope of this proposal.
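
A minimal sketch of what such a structure could look like, in Python for
brevity and with Python source files standing in for precompiled
binaries. The `Backend` base class, the `create_backend` entry point and
`load_backends` are all hypothetical names; this is one possible shape
under those assumptions, not a worked-out design.

```python
import importlib.util
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict


class Backend(ABC):
    """Common interface every backend module has to implement."""
    name: str

    @abstractmethod
    def is_available(self) -> bool:
        """Report whether this backend can run on the current machine."""

    @abstractmethod
    def allocate(self, nbytes: int) -> bytearray:
        """Allocate a buffer owned by this backend."""


def load_backends(module_dir) -> Dict[str, Backend]:
    """Load every module in `module_dir` that exposes `create_backend`."""
    backends: Dict[str, Backend] = {}
    for path in sorted(Path(module_dir).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        factory = getattr(module, "create_backend", None)
        if factory is None:
            continue  # not a backend module, ignore it
        # The base class is passed in so the module needs no core import.
        backend = factory(Backend)
        if isinstance(backend, Backend) and backend.is_available():
            backends[backend.name] = backend
    return backends
```

A backend module in this sketch is a file whose `create_backend(base)`
function returns an instance of a subclass of `base`. Dropping such a
file into the folder is all it takes for `load_backends` to pick it up,
and backends that report themselves as unavailable are skipped.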

This would allow us to have a "core" MXNet that takes care of the engine,
scheduling, communication and all the other crucial parts. On the other
hand we could make MXNet less of a monolith and have clear interfaces. This
would also act as a forcing function because the different parts wouldn't
be intermingled but have to follow the common interface.

Of course this comes with the question of what these interfaces would look
like. For operators, I'd like to propose taking inspiration from (or fully
adopting) ONNX. For memory management and other backend-specific things we
could look at the current implementations and find a common ground.

Back when I had a community driven project, we heavily used this modularity
and it brought great benefits - besides the fact that our core was closed
source. It allowed community developers to act entirely independently of
other parts and even allowed them to add their own logic without having to
touch the core. Thinking about companies that implement their own backends
or have specially tweaked operators without wanting to disclose them, this
structure would avoid them having to fork the project and then spend a lot
of effort porting the changes to the latest source release versions.
Instead, they would maintain their module and we as the MXNet community
would only have to maintain these interfaces.

Right now this is a lot of prose and basically a brain dump of my thoughts.
I'd be happy to follow up with details, but first I'd be curious what the
community thinks about this design.

Best regards,
Marco