Re: assimilation of mshadow into the MXNet codebase

2019-04-05 Thread Alfredo Luque
Do you have a link to both of these proposals?

On Fri, Apr 5, 2019 at 20:14 Anirudh Acharya  wrote:

> Hi Pedro,
>
> mshadow is mostly used for tensor arithmetic. There have been discussions
> about including it within mxnet. I think it is a good idea.
>
> As a longer-term solution, using libraries like Eigen to perform linear
> algebra operations was also suggested by anirudh2290@. I think xtensor (
> https://github.com/QuantStack/xtensor ) could also be a candidate here.
>
> -
> Anirudh
>
>
> On Fri, Apr 5, 2019 at 7:03 PM Pedro Larroy 
> wrote:
>
> > Hi
> >
> > Some developers have noticed that working in mshadow is cumbersome as
> > it's a 3rdparty subrepo.
> >
> > Since mshadow is a collection of headers without much independent test
> > coverage or standalone library functionality, other developers and I
> > believe it would be good to assimilate this code into the repository,
> > for ease of contribution and to avoid going through contortions to test
> > PRs that modify mshadow.
> >
> > Would anybody oppose this change?
> >
> > Thanks and have a nice weekend.
> >
> > Pedro.
> >
>


Re: Gluon fit API- Design proposal

2019-02-07 Thread Alfredo Luque
This is great and something we should all be able to benefit from.

There are just three pieces I’d like to advocate for that I feel are
shortcomings of some competing APIs on other frameworks (e.g., TF Estimators)
and that I would love to see in this proposal:

1) Make serialization/deserialization of these classifiers/regressors easy,
or at least ensure the internal members of the wrapper are easy to
save/load. We’ve hacked around this by only allowing hybrid blocks, which
have easy save/load functionality, but having simple
“save_model”/“load_model” functions as first-class citizens of these proposed
APIs will lead to a vastly improved user experience down the road.
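
As a rough sketch of what point 1 could look like (the names `GluonEstimator`, `save_model`, and `load_model` below are hypothetical illustrations, not part of the actual proposal, and real serialization would also persist the network's parameters, e.g. via a HybridBlock export):

```python
import json


class GluonEstimator:
    """Hypothetical wrapper sketch: a fit-API estimator whose internal
    state is trivially serializable. In practice save_model would also
    persist the underlying HybridBlock's symbol and parameters; here we
    only round-trip the hyperparameters to illustrate the API shape."""

    def __init__(self, hidden_units=64, lr=0.001):
        self.hidden_units = hidden_units
        self.lr = lr

    def save_model(self, path):
        # Persist everything needed to reconstruct the estimator.
        with open(path, "w") as f:
            json.dump({"hidden_units": self.hidden_units, "lr": self.lr}, f)

    @classmethod
    def load_model(cls, path):
        # Reconstruct an equivalent estimator from disk.
        with open(path) as f:
            cfg = json.load(f)
        return cls(**cfg)
```

The point is that save/load is a method pair on the wrapper itself, so users never have to reach into its internals.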

2) Allowing the fit/predict/predict_proba functions to take in data
loaders as well as plain NumPy arrays and pandas DataFrames is a simple
change but a huge usability improvement. Power users and library authors
will appreciate being able to use custom data loaders, but a large portion
of end users just want to pass an ndarray or data frame and get some
results quickly.
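
A minimal sketch of how point 2 could work inside fit/predict, assuming a hypothetical adapter (`as_batches` is an illustrative name, not an existing API) that accepts either an array-like or an already-batched loader:

```python
def as_batches(data, batch_size=32):
    """Hypothetical input adapter: accept either an array-like (NumPy
    array, list, tuple) or an iterable of batches (a data loader), and
    always yield batches. Array-likes are sliced into batches; anything
    else is assumed to already be a batch iterator."""
    if hasattr(data, "shape") or isinstance(data, (list, tuple)):
        n = len(data)
        for start in range(0, n, batch_size):
            yield data[start:start + batch_size]
    else:
        # Assume it is already a DataLoader-style batch iterator.
        yield from data
```

With this in place, `fit(ndarray)` and `fit(custom_loader)` go through the same code path, which is the usability win described above.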

3) Allow lazy construction of the model. This is something I feel TF
Estimators do well: by letting the user pass a function that constructs
the net (i.e., a model_fn that returns the net) rather than the net itself,
they allow for more control at runtime and for usage of these APIs in a
production environment.
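
Point 3 can be sketched as follows (the `make_estimator` factory and `LazyEstimator` class are hypothetical names for illustration; the key idea is that the network is built by a callable, deferred until first use):

```python
def make_estimator(model_fn):
    """Hypothetical sketch of lazy model construction: the fit API takes
    a zero-argument callable that builds the network, rather than a
    built network, so construction is deferred until fit() first runs —
    possibly on a different machine or device than where the estimator
    object was created."""

    class LazyEstimator:
        def __init__(self):
            self._net = None  # not built yet

        def fit(self, data):
            if self._net is None:
                # Construction happens here, at runtime.
                self._net = model_fn()
            # ...training loop over `data` would go here...
            return self._net

    return LazyEstimator()
```

This is the TF-Estimator-style `model_fn` pattern: the estimator is cheap to create and serialize, and the (potentially device-bound) net only materializes when training actually starts.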

Would love your thoughts on these three changes/additions.

—Alfredo Luque
Software Engineer
Machine Learning Infrastructure
Airbnb
San Francisco, CA

On February 7, 2019 at 1:51:17 PM, Ankit Khedia (khedia.an...@gmail.com)
wrote:

Hello dev@,

Training a model in Gluon requires users to write the training loop. This
is useful because of its imperative nature; however, repeating the same
code across multiple models can become tedious, with lots of repetitive
boilerplate. The training loop can also be overwhelming to some users new
to deep learning. Users have asked in [1] for a simple fit API, similar to
the APIs available in scikit-learn and Keras, as a way to simplify model
training and reduce boilerplate code and complexity.

So, along with contributors Naveen and Lai, I came up with the fit API
proposal in [2], which covers 80% of the use cases for beginners; the fit
API does not replace the Gluon training loops. The API proposal is inspired
by the Keras fit API. I have discussed it with and gotten feedback from a
few nearby MXNet contributors (Sheng, Mu, Aston, Zhi), and I am writing to
ask for the community’s feedback on the API proposal.



[1]
https://discuss.mxnet.io/t/wrapping-gluon-into-scikit-learn-like-api/2112
[2]
https://cwiki.apache.org/confluence/display/MXNET/Gluon+Fit+API+-+Tech+Design


Thanks,
Ankit


—
Alfredo Luque
Software Engineer
Machine Learning Infrastructure
Airbnb
San Francisco, CA


Re: [Discussion] Remove bundled llvm OpenMP

2018-11-22 Thread Alfredo Luque
The proposal here is not to eliminate the use of OpenMP but rather to use
the compiler's OpenMP implementation rather than a bundled one. I've been
bitten by issues with having multiple linked OpenMP implementations before
in another library and it was extremely difficult to debug.


It seems to me that tackling the failing assert is an orthogonal issue
altogether.

--Alfredo Luque

Software Engineer
Airbnb
Machine Learning Infrastructure

On Thu, Nov 22, 2018 at 10:12 AM Anton Chernov  wrote:

> Hi Chris,
>
> Thank you for your answer. As you may have noticed, the initial email
> comes from me, Anton Chernov (@lebeg on GitHub), so the proposal is not
> from any 'CI' team that you mentioned, but from me personally.
>
> You are writing:
>
> > someone is doing something unhealthy when they fork ...
>
> I'm missing the context to understand what you mean.
>
> > we get a lot of performance gain from OMP ...
>
> There is no data that would prove this statement and therefore it is a
> random guess.
>
> > in many months, no investigation has occurred as to WHY the assertion is
> failing.
>
> The investigation concluded that this is happening due to undefined
> behaviour, which is, in my opinion, a sufficient answer that does not
> require going any deeper.
>
> > the pr is vetoed until such a time that the actual root cause of the
> problem is known.
>
> And considering the statements above there is no valid reason to veto the
> PR.
>
>
> Best
> Anton
>
> On Thu, Nov 22, 2018 at 15:38, Chris Olivier :
>
> > 3x less overhead*
> >
> > On Thu, Nov 22, 2018 at 6:25 AM Chris Olivier 
> > wrote:
> >
> > > someone is doing something unhealthy when they fork, which is causing
> an
> > > assertion in the openmp library. the same assertion that would fire in
> > mkl,
> > > which is linked to libiomp5 (exact same omp library). this is new
> > behavior
> > > and most likely due to an error or suboptimal approach in the forking
> > logic
> > > in mxnet.
> > >
> > > in order to circumvent the assert, the Ci team is proposing to remove
> the
> > > library completely which is equivalent to cutting off your leg to make
> > the
> > > pain from stubbing your toe go away.
> > >
> > > we get a lot of performance gain from OMP. it has about 1/3 less
> > > overhead for entering omp regions and also supports omp regions after a
> > > fork, which libgomp does not.
> > >
> > > in many months, no investigation has occurred as to WHY the assertion
> is
> > > failing.
> > >
> > > the pr is vetoed until such a time that the actual root cause of the
> > > problem is known.
> > >
> > >
> > > thanks,
> > >
> > > -Chris.
> > >
> > >
> > >
> > >
> > > On Thu, Nov 22, 2018 at 4:36 AM Anton Chernov 
> > wrote:
> > >
> > >> Dear MXNet community,
> > >>
> > >> I would like to drive attention to an important issue that is present
> in
> > >> the MXNet CMake build: usage of bundled llvm OpenMP library.
> > >>
> > >> I have opened a PR to remove it:
> > >> https://github.com/apache/incubator-mxnet/pull/12160
> > >>
> > >> The issue was closed, but I am firm in my opinion that it's the
> > >> right thing to do.
> > >>
> > >> *Background*
> > >> If you want to use OpenMP pragmas in your code for parallelization you
> > >> would supply a special flag to the compiler:
> > >>
> > >> - Clang / -fopenmp
> > >> https://openmp.llvm.org/
> > >>
> > >> - GCC / -fopenmp
> > >> https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html
> > >>
> > >> - Intel / [Q]openmp
> > >>
> > >>
> >
> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
> > >>
> > >> - Visual Studio: /openmp (Enable OpenMP 2.0 Support)
> > >> https://msdn.microsoft.com/en-us/library/tt15eb9t.aspx
> > >>
> > >> Each of the compilers would enable the '#pragma omp' directive during
> > >> C/C++
> > >> compilation and arrange for automatic linking of the OpenMP runtime
> > >> library
> > >> supplied by each compiler separately.
> > >>
> > >> Thus, to use the advantages of an OpenMP implementation one has to
> > compile
> > >> the code with the correspon

Re: Include MKLDNN into default mxnet pip package

2018-10-17 Thread Alfredo Luque
This is huge. Thanks for working on this. Is there a similar plan for, e.g.,
TensorRT support being ported into the main cuda-9.x packages?

On October 17, 2018 at 2:10:20 PM, Alex Zai (aza...@gmail.com) wrote:

Hey all,
We have been working hard these past few months to integrate and stabilize
Intel’s MKLDNN deep learning CPU accelerator into MXNet and have made
incredible progress. On CPUs with AVX512 instructions (such as c5.18x) we
have seen performance increases of up to 12x, and on other platforms (Macs,
AVX2) we have seen speedups of 1.5x+. A full list of benchmarks can be found
here (
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650764
and https://github.com/apache/incubator-mxnet/pull/12591).

Currently, using this accelerator requires the developer to either pip
install the mxnet-mkl version of MXNet or to build it themselves from
source. Given that we should try to provide the best performance "out of
the box” with MXNet, we should include this in the default build. The MKLDNN
library is included within the pip package build, so it does not require an
external dependency.

There were concerns that MKLDNN could cause regressions on certain
platforms (as it did with the TensorFlow version a while back), but we
added an env flag (MXNET_MKLDNN_ENABLED) that allows users to turn off this
feature at runtime. Please bring up any other concerns you may have and
your thoughts on including this accelerator in the default build.
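
The runtime kill-switch described above could be read in Python roughly like this (a sketch of the general environment-flag pattern, assuming "enabled unless explicitly set to 0"; the real flag's exact semantics in the MXNet C++ runtime may differ):

```python
import os


def mkldnn_enabled():
    """Sketch of a runtime feature flag: the accelerator ships in the
    default build, but users who hit a regression can opt out by setting
    MXNET_MKLDNN_ENABLED=0 in the environment, without rebuilding or
    reinstalling. Default is enabled."""
    return os.environ.get("MXNET_MKLDNN_ENABLED", "1") != "0"
```

The design choice here is that the escape hatch costs users nothing when things work, while keeping a one-line recovery path when they don't.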

Best,
Alex

—
Alfredo Luque
Software Engineer
Machine Learning Infrastructure
Airbnb
San Francisco, CA


Regressions in NDArrayIter

2018-09-11 Thread Alfredo Luque
Looks like https://github.com/apache/incubator-mxnet/pull/12285 broke a ton
of our test cases iterating over 3D NDArray instances (e.g., MNIST) by
producing an index out of range.

Stacktrace:

.com/airbnb/bighead/python/bighead/ml_frameworks/mxnet/gluon.py", line
434, in transform
for batch in data_iter:
  File "/anaconda3/envs/py36/lib/python3.6/site-packages/mxnet/io/io.py",
line 228, in __next__
return self.next()
  File "/anaconda3/envs/py36/lib/python3.6/site-packages/mxnet/io/io.py",
line 680, in next
label = self.getlabel()
  File "/anaconda3/envs/py36/lib/python3.6/site-packages/mxnet/io/io.py",
line 750, in getlabel
return self._batchify(self.label)
  File "/anaconda3/envs/py36/lib/python3.6/site-packages/mxnet/io/io.py",
line 732, in _batchify
first_data = self._getdata(data_source, start=self.cursor)
  File "/anaconda3/envs/py36/lib/python3.6/site-packages/mxnet/io/io.py",
line 694, in _getdata
end = end if end is not None else data_source[0][1].shape[0]
IndexError: list index out of range

I’ve created an issue at
https://github.com/apache/incubator-mxnet/issues/12526


We’ll be pinning to the previous build until it’s reverted/patched, but let
us know if we can help provide more regression tests here.
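
For context, the failing line boils down to indexing into an empty data_source list. A pure-Python sketch of that logic with a defensive guard (this is an illustration, not MXNet's actual `_getdata` code; `batch_end` and `_Arr` are hypothetical names):

```python
def batch_end(data_source, end=None):
    """Sketch of the logic behind io.py's failing line:
        end = end if end is not None else data_source[0][1].shape[0]
    When data_source is empty (as happens with 3D NDArrays after the
    regression), data_source[0] raises IndexError. Guarding turns that
    into a clear, actionable error instead."""
    if end is not None:
        return end
    if not data_source:
        raise ValueError(
            "data_source is empty; cannot infer batch end "
            "(possible regression in index construction)")
    # Each data_source entry is a (name, array) pair; use the first
    # array's leading dimension as the batch end.
    return data_source[0][1].shape[0]
```

A regression test along these lines (iterate a 3D NDArray with both empty and populated label sources) would have caught this before release.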

—
Alfredo Luque
Software Engineer
Machine Learning Infrastructure
Airbnb
San Francisco, CA


Re: Nightly Builds Not Working for Cu90MKL?

2018-08-31 Thread Alfredo Luque
No worries! I think we’ll stick with this version for the time being, since
we haven’t had issues with it. Building from source is a non-starter since
this is part of an automated Docker build for us and we don’t want all the
build dependencies.

Thanks for looking into this!

On August 31, 2018 at 2:37:20 PM, Anton Chernov (mecher...@gmail.com) wrote:

Thank you for noticing!

We are working on automating the process, but currently it's a manual
effort to publish to PyPi. We are experiencing some problems with the
publishing, but the issue should get resolved soon.

Best
Anton

On Fri, Aug 31, 2018 at 23:29, Alfredo Luque :

> See here:
> https://pypi.org/project/mxnet-cu90mkl/#history
>
> No builds show up since 8/22. From what I can tell, other variants (e.g.,
> mxnet-mkl) are up to date.
>
> On August 31, 2018 at 2:24:30 PM, Anton Chernov (mecher...@gmail.com)
> wrote:
>
> Hi Alfredo!
>
> Could you provide more info on this? Where do you get the information?
>
> Best
> Anton
>
> On Fri, Aug 31, 2018 at 22:49, Alfredo Luque
>  >:
>
> > Just curious why the latest build is 2018-08-22 while the other variants
> > are up to date.
> >
> > Thanks,
> >
> > —
> > Alfredo Luque
> > Software Engineer
> > Machine Learning Infrastructure
> > Airbnb
> > San Francisco, CA
> >
>
> —
> Alfredo Luque
> Software Engineer
> Machine Learning Infrastructure
> Airbnb
> San Francisco, CA
>
—
Alfredo Luque
Software Engineer
Machine Learning Infrastructure
Airbnb
San Francisco, CA


Re: Nightly Builds Not Working for Cu90MKL?

2018-08-31 Thread Alfredo Luque
See here:
https://pypi.org/project/mxnet-cu90mkl/#history

No builds show up since 8/22. From what I can tell, other variants (e.g.,
mxnet-mkl) are up to date.

On August 31, 2018 at 2:24:30 PM, Anton Chernov (mecher...@gmail.com) wrote:

Hi Alfredo!

Could you provide more info on this? Where do you get the information?

Best
Anton

On Fri, Aug 31, 2018 at 22:49, Alfredo Luque :

> Just curious why the latest build is 2018-08-22 while the other variants
> are up to date.
>
> Thanks,
>
> —
> Alfredo Luque
> Software Engineer
> Machine Learning Infrastructure
> Airbnb
> San Francisco, CA
>

—
Alfredo Luque
Software Engineer
Machine Learning Infrastructure
Airbnb
San Francisco, CA


Nightly Builds Not Working for Cu90MKL?

2018-08-31 Thread Alfredo Luque
Just curious why the latest build is 2018-08-22 while the other variants
are up to date.

Thanks,

—
Alfredo Luque
Software Engineer
Machine Learning Infrastructure
Airbnb
San Francisco, CA


Join Slack Channel

2018-07-16 Thread Alfredo Luque
Hi there, I’d like to join the MXNet Slack channel. We’re working on
low-precision quantization at Airbnb and are interested in discussing some
issues we ran into there.

—
Alfredo Luque
Software Engineer
Machine Learning Infrastructure
Airbnb
San Francisco, CA