Re: Cuda 10.2 Wheels

2020-02-06 Thread Alfredo Luque
Looks like it updated since I last posted. Thanks!

On February 6, 2020 at 3:20:34 PM, Pedro Larroy (
pedro.larroy.li...@gmail.com) wrote:

Hi Alfredo.

Isn't "mxnet_cu102mkl-1.6.0
<
https://repo.mxnet.io/dist/mxnet_cu102mkl-1.6.0-py2.py3-none-manylinux1_x86_64.whl>"

what you are looking for? I see it on the second link you posted.
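For reference, a minimal sketch of installing that wheel programmatically
(it is just the equivalent of running the pip command in a shell):

import subprocess, sys

# Install the CUDA 10.2 + MKL wheel directly from repo.mxnet.io.
subprocess.check_call([
    sys.executable, "-m", "pip", "install",
    "https://repo.mxnet.io/dist/mxnet_cu102mkl-1.6.0-py2.py3-none-manylinux1_x86_64.whl",
])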

Pedro

On Tue, Feb 4, 2020 at 3:29 PM Alfredo Luque wrote:

> Hi folks,
>
> Are there any blockers on releasing CUDA 10.2 compatible wheels? Based on
> this readme
> <https://github.com/apache/incubator-mxnet/blob/master/tools/pip/doc/CU102_ADDITIONAL.md>
> the packages should be available on PyPI already, but they don’t appear to
> exist yet.
>
> On the other thread, someone posted this static page
> <https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/index.html> that has
> nightly builds hosted on S3, but it appears CUDA 10.2 wheels aren’t on
> there.
>
> —
> Alfredo Luque
> Software Engineer
> Machine Learning Infrastructure
> Airbnb
> San Francisco, CA
>

—
Alfredo Luque
Software Engineer
Machine Learning Infrastructure
Airbnb
San Francisco, CA


Cuda 10.2 Wheels

2020-02-04 Thread Alfredo Luque
Hi folks,

Are there any blockers on releasing CUDA 10.2 compatible wheels? Based on this
readme
<https://github.com/apache/incubator-mxnet/blob/master/tools/pip/doc/CU102_ADDITIONAL.md>
the packages should be available on PyPI already, but they don’t appear to
exist yet.

On the other thread, someone posted this static page
<https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/index.html> that has
nightly builds hosted on S3, but it appears CUDA 10.2 wheels aren’t on there.

—
Alfredo Luque
Software Engineer
Machine Learning Infrastructure
Airbnb
San Francisco, CA


Re: Proposal to make MKLDNN as default CPU backend

2019-11-18 Thread Alfredo Luque
For AMD CPUs, you’d want to perform validation because MKL-DNN would now be
enabled by default. Historically, other Intel libraries (along with the ICC
compiler) have had performance issues on AMD CPUs. It’s just worth
double-checking to make sure that’s not the case here; perhaps some MKL-DNN
authors can chime in, though. It’s not sufficient to double-check that an
AVX2 package passes tests.

Agreed, in the case that we’re not releasing ARM binaries.

The reproducibility argument is about the results being numerically
reproducible. That is, e.g., if I train a model with some fixed set of data,
some random seed, etc., and then run inference on it, do I get the exact same
floating-point values for the weights and results? Does MXNet already offer
this without MKL-DNN?
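For concreteness, a minimal sketch of the check I have in mind (the toy model
and hyperparameters are made up; the bar is bit-wise equality, not
np.allclose):

import mxnet as mx
import numpy as np
from mxnet import autograd
from mxnet.gluon import Trainer, loss, nn

def train_once(seed=42):
    # Fix every seed we control, then train a tiny model on fixed data.
    mx.random.seed(seed)
    np.random.seed(seed)
    x = mx.nd.random.uniform(shape=(64, 10))
    y = mx.nd.random.uniform(shape=(64, 1))
    net = nn.Dense(1)
    net.initialize()
    trainer = Trainer(net.collect_params(), "sgd", {"learning_rate": 0.1})
    l2 = loss.L2Loss()
    for _ in range(10):
        with autograd.record():
            err = l2(net(x), y)
        err.backward()
        trainer.step(batch_size=64)
    return net.weight.data().asnumpy()

# Two runs with identical seeds and data should produce identical bits.
assert np.array_equal(train_once(), train_once())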

On November 18, 2019 at 6:32:07 PM, Tao Lv (mutou...@gmail.com) wrote:

Regarding the cases listed by Marco:
- AMD CPU
From my architecture knowledge, what works on C4 instances (with AVX2
support) should also work well on m5a, right? I think the mxnet-mkl and
mxnet-cuxxmkl packages have been fully validated on AVX2 machines.
Also, we didn’t perform any validation on AMD CPUs before, so why do we need
to do that this time?

- ARM CPU
I don’t think we’re releasing any convenience binaries for ARM CPUs. This
proposal mainly targets the PyPI packages.

- Windows
Already validated by CI. We’re also releasing mxnet-mkl packages for Windows.

- GPU and MKLDNN enabled
Already validated by CI, and mxnet-cuxxmkl packages have been released for
several versions.

- Fully reproducible results (medical and financial sector requested that,
and we have some flags for cuda)
Not sure I understand this case. We have had the MKL-DNN backend for a
while; its functionality and correctness have been verified by MXNet
users.

-tao

On Tue, Nov 19, 2019 at 4:41 AM Marco de Abreu wrote:

> Sorry, my intent with the "non-standard" phrase was not about general MXNet
> but rather from MKLDNN’s point of view: considering that it’s being
> developed by Intel, I assumed that MKLDNN might consider non-Intel
> use-cases non-standard.
>
> -Marco
>
> Skalicky, Sam  wrote on Mon., Nov 18, 2019, at 21:34:
>
> > Thanks Alfredo, if you can create a GitHub issue with notes/steps, we can
> > add this to the todo list for integrating with the MXNet CI to test on
> > m5a instances too. Then we can start tracking this on a regular basis. It
> > would be great to actually test on ARM instances now that AWS has A1
> > instances too….. I’ll add it to the wish list ;-D
> >
> > Sam
> >
> > > On Nov 18, 2019, at 12:32 PM, Alfredo Luque wrote:
> > >
> > > Happy to run some benchmarks on an AWS m5a instance (Epyc) and first
> > > generation AMD Threadripper Gen 1 if someone has something easy to run
> > > and representative.
> > >
> > > On November 18, 2019 at 12:29:31 PM, Skalicky, Sam (
> > > sska...@amazon.com.invalid) wrote:
> > >
> > > Thanks, good idea Alfredo. Are you able to help test on AMD CPUs? Or
> > > is there someone else in the mxnet dev@ community who can help?
> > >
> > > Sam
> > >
> > >> On Nov 18, 2019, at 12:27 PM, Alfredo Luque wrote:
> > >>
> > >> Verifying that there isn’t a slowdown on AMD CPUs (e.g., Ryzen / Epyc)
> > >> would definitely make sense as a requirement. It seems odd to classify
> > >> that as a “nonstandard” use case.
> > >>
> > >> On November 18, 2019 at 12:20:33 PM, Skalicky, Sam (
> > >> sska...@amazon.com.invalid) wrote:
> > >>
> > >> Thanks Patric & team for your work over the years to make MXNet fast
> > >> with MKLDNN!
> > >>
> > >> I think it would be great to make MKLDNN enabled by default. We will
> > >> need to continue producing variants without MKLDNN for those who don’t
> > >> want it (Marco enumerated some use cases). How do you propose to
> > >> identify the pip wheels with/without MKLDNN? Previously we had:
> > >> mxnet-mkl and mxnet-cu101mkl with MKLDNN. If the plain “mxnet” pip
> > >> wheel now contains MKLDNN, what do you propose we call the build
> > >> without MKLDNN? mxnet-nomkl?
> > >>
> > >> Thanks!
> > >> Sam
> > >>
> > >>> On Nov 18, 2019, at 11:08 AM, Marco de Abreu <marco.g.ab...@gmail.com>
> > >>> wrote:
> > >>>
> > >>> Hi Patric,
> > >>>
> > >>> First of all, thanks a lot to you and yo

Re: Proposal to make MKLDNN as default CPU backend

2019-11-18 Thread Alfredo Luque
Happy to run some benchmarks on an AWS m5a instance (Epyc) and first
generation AMD Threadripper Gen 1 if someone has something easy to run and
representative.
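In case it helps, here is the sort of quick, representative check I have in
mind (a sketch; the model and sizes are arbitrary choices, not an agreed
benchmark):

import time
import mxnet as mx
from mxnet.gluon.model_zoo import vision

# A ResNet-18 forward pass on random data as a stand-in for a real workload.
net = vision.resnet18_v1(pretrained=False)
net.initialize()
x = mx.nd.random.uniform(shape=(32, 3, 224, 224))

net(x).wait_to_read()  # warm-up; also triggers deferred initialization
start = time.time()
for _ in range(20):
    net(x).wait_to_read()  # block until the async forward pass finishes
print("avg forward: %.1f ms" % ((time.time() - start) / 20 * 1000))

Running it once on an MKL-DNN build and once on a vanilla build of the same
version should show whether there is a slowdown.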

On November 18, 2019 at 12:29:31 PM, Skalicky, Sam (
sska...@amazon.com.invalid) wrote:

Thanks, good idea Alfredo. Are you able to help test on AMD CPUs? Or is
there someone else in the mxnet dev@ community who can help?

Sam

> On Nov 18, 2019, at 12:27 PM, Alfredo Luque wrote:
>
> Verifying that there isn’t a slowdown on AMD CPUs (e.g., Ryzen / Epyc) would
> definitely make sense as a requirement. It seems odd to classify that as a
> “nonstandard” use case.
>
> On November 18, 2019 at 12:20:33 PM, Skalicky, Sam (
> sska...@amazon.com.invalid) wrote:
>
> Thanks Patric & team for your work over the years to make MXNet fast with
> MKLDNN!
>
> I think it would be great to make MKLDNN enabled by default. We will need
> to continue producing variants without MKLDNN for those who don’t want it
> (Marco enumerated some use cases). How do you propose to identify the pip
> wheels with/without MKLDNN? Previously we had: mxnet-mkl and mxnet-cu101mkl
> with MKLDNN. If the plain “mxnet” pip wheel now contains MKLDNN, what do you
> propose we call the build without MKLDNN? mxnet-nomkl?
>
> Thanks!
> Sam
>
>> On Nov 18, 2019, at 11:08 AM, Marco de Abreu 
>> wrote:
>>
>> Hi Patric,
>>
>> First of all, thanks a lot to you and your team for all the effort on
>> MXNet and mkldnn!
>>
>> Generally I'm inclined towards your proposal, but I'm thinking about the
>> non-standard use cases:
>> - AMD CPU
>> - ARM CPU
>> - Windows
>> - GPU and MKLDNN enabled
>> - Fully reproducible results (medical and financial sector requested that,
>> and we have some flags for cuda)
>>
>> Is mkldnn fully compatible with these use cases? If not, what would
>> happen? If yes, do we have performance numbers?
>>
>> Best regards,
>> Marco
>>
>> Zhao, Patric  wrote on Mon., Nov 18, 2019, at 14:00:
>>
>>> Hi MXNet community,
>>>
>>> Since the first MKLDNN backend was integrated in release 1.2, the
>>> community has been continuously improving the quality and performance of
>>> the MKLDNN CPU backend. Nowadays, the MKLDNN backend is widely used for
>>> inference, especially INT8 inference, and we have received lots of very
>>> positive feedback from MXNet users.
>>>
>>> Achieved milestones as below:
>>>
>>> - MKLDNN integrated into Apache MXNet in release 1.2, Feb 2018 [1]
>>> - MKLDNN backend as default CPU backend when building from source, Jan
>>> 2019 [2]
>>> - MKLDNN subgraph optimization as default for inference, Jul 2019 [3]
>>> - MKLDNN major version upgrade in release 1.6, Oct 2019 [4]
>>>
>>> To build on this success and strengthen the technical leadership of
>>> Apache MXNet in the industry, I propose to make MKLDNN the default CPU
>>> backend in all binary distributions from the next release.
>>> The new milestone includes:
>>>
>>> - Statically link the MKLDNN library into the binary, avoiding version
>>> mismatches at runtime [5]
>>> - Make the nightly build with MKLDNN the default on master before the 1.7
>>> release
>>> - Binary distribution with MKLDNN as default from the 1.7 release.
>>>
>>> What will be changed:
>>>
>>> - mxnet and mxnet-cuXX binaries will be built with MKLDNN=1
>>> - mxnet-mkl and mxnet-cuXXmkl will not be changed in the minor releases
>>> (1.x) and are planned for removal in the next major release (2.0)
>>>
>>> Suggestions and comments are highly appreciated.
>>>
>>> Thanks,
>>>
>>> --Patric
>>>
>>>
>>> [1] https://github.com/apache/incubator-mxnet/pull/9677
>>> [2] https://lists.apache.org/thread.html/bfeae6ee46374112eb4dff1470c262959101e4bffb19930926963535@%3Cdev.mxnet.apache.org%3E
>>> [3] https://github.com/apache/incubator-mxnet/pull/15518
>>> [4] https://lists.apache.org/thread.html/f46ab920f18795496eafe713e6e9e561c684e06189085cec17b401dc@%3Cdev.mxnet.apache.org%3E
>>> [5] https://github.com/apache/incubator-mxnet/pull/16731
>
> —
> Alfredo Luque
> Software Engineer
> Machine Learning Infrastructure
> Airbnb
> San Francisco, CA

—
Alfredo Luque
Software Engineer
Machine Learning Infrastructure
Airbnb
San Francisco, CA


Re: Proposal to make MKLDNN as default CPU backend

2019-11-18 Thread Alfredo Luque
Verifying that there isn’t a slowdown on AMD CPUs (e.g., Ryzen / Epyc) would
definitely make sense as a requirement. It seems odd to classify that as a
“nonstandard” use case.

On November 18, 2019 at 12:20:33 PM, Skalicky, Sam (
sska...@amazon.com.invalid) wrote:

Thanks Patric & team for your work over the years to make MXNet fast with
MKLDNN!

I think it would be great to make MKLDNN enabled by default. We will need
to continue producing variants without MKLDNN for those who don’t want it
(Marco enumerated some use cases). How do you propose to identify the pip
wheels with/without MKLDNN? Previously we had: mxnet-mkl and mxnet-cu101mkl
with MKLDNN. If the plain “mxnet” pip wheel now contains MKLDNN, what do you
propose we call the build without MKLDNN? mxnet-nomkl?

Thanks!
Sam

> On Nov 18, 2019, at 11:08 AM, Marco de Abreu 
> wrote:
>
> Hi Patric,
>
> First of all, thanks a lot to you and your team for all the effort on
> MXNet and mkldnn!
>
> Generally I'm inclined towards your proposal, but I'm thinking about the
> non-standard use cases:
> - AMD CPU
> - ARM CPU
> - Windows
> - GPU and MKLDNN enabled
> - Fully reproducible results (medical and financial sector requested that,
> and we have some flags for cuda)
>
> Is mkldnn fully compatible with these use cases? If not, what would
> happen?
> If yes, do we have performance numbers?
>
> Best regards,
> Marco
>
> Zhao, Patric  wrote on Mon., Nov 18, 2019, at 14:00:
>
>> Hi MXNet community,
>>
>> Since the first MKLDNN backend was integrated in release 1.2, the
>> community has been continuously improving the quality and performance of
>> the MKLDNN CPU backend. Nowadays, the MKLDNN backend is widely used for
>> inference, especially INT8 inference, and we have received lots of very
>> positive feedback from MXNet users.
>>
>> Achieved milestones as below:
>>
>> - MKLDNN integrated into Apache MXNet in release 1.2, Feb 2018 [1]
>> - MKLDNN backend as default CPU backend when building from source, Jan
>> 2019 [2]
>> - MKLDNN subgraph optimization as default for inference, Jul 2019 [3]
>> - MKLDNN major version upgrade in release 1.6, Oct 2019 [4]
>>
>> To build on this success and strengthen the technical leadership of
>> Apache MXNet in the industry, I propose to make MKLDNN the default CPU
>> backend in all binary distributions from the next release.
>> The new milestone includes:
>>
>> - Statically link the MKLDNN library into the binary, avoiding version
>> mismatches at runtime [5]
>> - Make the nightly build with MKLDNN the default on master before the 1.7
>> release
>> - Binary distribution with MKLDNN as default from the 1.7 release.
>>
>> What will be changed:
>>
>> - mxnet and mxnet-cuXX binaries will be built with MKLDNN=1
>> - mxnet-mkl and mxnet-cuXXmkl will not be changed in the minor releases
>> (1.x) and are planned for removal in the next major release (2.0)
>>
>> Suggestions and comments are highly appreciated.
>>
>> Thanks,
>>
>> --Patric
>>
>>
>> [1] https://github.com/apache/incubator-mxnet/pull/9677
>> [2] https://lists.apache.org/thread.html/bfeae6ee46374112eb4dff1470c262959101e4bffb19930926963535@%3Cdev.mxnet.apache.org%3E
>> [3] https://github.com/apache/incubator-mxnet/pull/15518
>> [4] https://lists.apache.org/thread.html/f46ab920f18795496eafe713e6e9e561c684e06189085cec17b401dc@%3Cdev.mxnet.apache.org%3E
>> [5] https://github.com/apache/incubator-mxnet/pull/16731
>>

—
Alfredo Luque
Software Engineer
Machine Learning Infrastructure
Airbnb
San Francisco, CA


Re: assimilation of mshadow into the MXNet codebase

2019-04-05 Thread Alfredo Luque
Do you have a link to both of these proposals?

On Fri, Apr 5, 2019 at 20:14 Anirudh Acharya  wrote:

> Hi Pedro,
>
> mshadow is mostly used for tensor arithmetic. There have been discussions
> about including it within mxnet. I think it is a good idea.
>
> As a more long-term solution, using libraries like Eigen to perform linear
> algebra operations was also suggested by anirudh2290@. I think xtensor
> (https://github.com/QuantStack/xtensor) can also be a candidate here.
>
> -
> Anirudh
>
>
> On Fri, Apr 5, 2019 at 7:03 PM Pedro Larroy 
> wrote:
>
> > Hi
> >
> > Some developers have noticed that working in mshadow is cumbersome, as
> > it's a 3rdparty subrepo.
> >
> > Since mshadow is a bunch of headers which don't have much independent
> > test or library functionality, other developers and I believe that it
> > would be good to assimilate this code into the repository for ease of
> > contribution and changes, without having to go through contortions to
> > test PRs that modify mshadow.
> >
> > Would anybody oppose this change?
> >
> > Thanks and have a nice weekend.
> >
> > Pedro.
> >
>


Re: Gluon fit API- Design proposal

2019-02-07 Thread Alfredo Luque
This is great and something we should all be able to benefit from.

There are just three pieces I’d like to advocate for, addressing what I feel
are shortcomings of some competing APIs on other frameworks (e.g., TF
Estimators), that I would love to see in this proposal:

1) Make serialization/deserialization of these classifiers/regressors easy,
or at least ensure the internal members of the wrapper are easy to
save/load. We’ve hacked around this by only allowing hybrid blocks, which
have easy save/load functionality, but having a simple
“save_model”/“load_model” function as a first-class citizen of these proposed
APIs will lead to a vastly improved user experience down the road.

2) Allow the fit/predict/predict_proba functions to take in both data
loaders and simple numpy arrays and pandas dataframes. This is a simple
change but a huge usability improvement (see the sketch after this list).
Power users and library authors will appreciate being able to use custom
data loaders, but a large portion of end users want to just pass an ndarray
or data frame and get some results quickly.

3) Allow lazy construction of the model. This is something I feel TF
Estimators do well: by allowing the user to pass a function that constructs
the net (i.e., a model_fn that returns the net) rather than the net itself,
it allows for more control at runtime and usage of these APIs in a
production environment.
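To make point 2 concrete, here is a hedged sketch (the helper name and
signature are hypothetical, not part of the proposal) of how fit() could
normalize its input:

import mxnet as mx
import numpy as np
import pandas as pd
from mxnet.gluon.data import ArrayDataset, DataLoader

def as_data_loader(data, labels, batch_size=32):
    # Pass an existing DataLoader straight through for power users.
    if isinstance(data, DataLoader):
        return data
    # Unwrap pandas objects to plain numpy arrays.
    if isinstance(data, pd.DataFrame):
        data = data.values
    if isinstance(labels, (pd.DataFrame, pd.Series)):
        labels = labels.values
    # Wrap arrays so the training loop only ever sees a DataLoader.
    dataset = ArrayDataset(mx.nd.array(data), mx.nd.array(labels))
    return DataLoader(dataset, batch_size=batch_size, shuffle=True)

# Both call styles would then work:
loader = as_data_loader(np.random.rand(100, 10), np.random.rand(100))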

Would love your thoughts on these three changes/additions.

—Alfredo Luque
Software Engineer
Machine Learning Infrastructure
Airbnb
San Francisco, CA

On February 7, 2019 at 1:51:17 PM, Ankit Khedia (khedia.an...@gmail.com)
wrote:

Hello dev@,

Training a model in Gluon requires users to write the training loop. This is
useful because of its imperative nature; however, repeating the same code
across multiple models can become tedious, with lots of boilerplate. The
training loop can also be overwhelming to some users new to deep learning.
Users have asked in [1] for a simple fit API, similar to the APIs available
in SKLearn and Keras, as a way to simplify model training and reduce
boilerplate code and complexity.

So, along with contributors Naveen and Lai, I came up with a fit API proposal
in [2] that covers 80% of the use cases for beginners; the fit API does not
replace the Gluon training loops. The API proposal is inspired by the Keras
fit API. I have discussed it with and received feedback from a few MXNet
contributors (Sheng, Mu, Aston, Zhi) close by, and I am writing to ask for
the community’s feedback on the API proposal.



[1]
https://discuss.mxnet.io/t/wrapping-gluon-into-scikit-learn-like-api/2112
[2]
https://cwiki.apache.org/confluence/display/MXNET/Gluon+Fit+API+-+Tech+Design


Thanks,
Ankit


—
Alfredo Luque
Software Engineer
Machine Learning Infrastructure
Airbnb
San Francisco, CA


Re: [Discussion] Remove bundled llvm OpenMP

2018-11-22 Thread Alfredo Luque
The proposal here is not to eliminate the use of OpenMP but rather to use
the compiler's OpenMP implementation instead of a bundled one. I've been
bitten before by issues with multiple linked OpenMP implementations in
another library, and it was extremely difficult to debug.


It seems to me that tackling the issue with the assert is an orthogonal
issue altogether.

--Alfredo Luque

Software Engineer
Airbnb
Machine Learning Infrastructure

On Thu, Nov 22, 2018 at 10:12 AM Anton Chernov  wrote:

> Hi Chris,
>
> Thank you for your answer. As you may have noticed, the initial email comes
> from me, Anton Chernov (@lebeg on GitHub), and thus the proposal is not from
> any 'CI' team that you've mentioned, but from me personally.
>
> You are writing:
>
> > someone is doing something unhealthy when they fork ...
>
> I'm missing any context to understand what you mean.
>
> > we get a lot of performance gain from OMP ...
>
> There is no data that would prove this statement and therefore it is a
> random guess.
>
> > in many months, no investigation has occurred as to WHY the assertion is
> > failing.
>
> The investigation has concluded that this is happening due to undefined
> behaviour, which is, in my opinion, a sufficient answer that does not
> require going any deeper.
>
> > the pr is vetoed until such a time that the actual root cause of the
> > problem is known.
>
> And considering the statements above, there is no valid reason to veto the
> PR.
>
>
> Best
> Anton
>
> On Thu, Nov 22, 2018 at 15:38, Chris Olivier wrote:
>
> > 3x less overhead*
> >
> > On Thu, Nov 22, 2018 at 6:25 AM Chris Olivier 
> > wrote:
> >
> > > someone is doing something unhealthy when they fork, which is causing
> > > an assertion in the openmp library. the same assertion would fire in
> > > mkl, which is linked to libiomp5 (the exact same omp library). this is
> > > new behavior and most likely due to an error or suboptimal approach in
> > > the forking logic in mxnet.
> > >
> > > in order to circumvent the assert, the Ci team is proposing to remove
> > > the library completely, which is equivalent to cutting off your leg to
> > > make the pain from stubbing your toe go away.
> > >
> > > we get a lot of performance gain from OMP. it has about 1/3 less
> > > overhead for entering omp regions and also supports omp regions after
> > > a fork, which libgomp does not.
> > >
> > > in many months, no investigation has occurred as to WHY the assertion
> > > is failing.
> > >
> > > the pr is vetoed until such a time that the actual root cause of the
> > > problem is known.
> > >
> > >
> > > thanks,
> > >
> > > -Chris.
> > >
> > >
> > >
> > >
> > > On Thu, Nov 22, 2018 at 4:36 AM Anton Chernov 
> > > wrote:
> > >
> > >> Dear MXNet community,
> > >>
> > >> I would like to draw attention to an important issue that is present
> > >> in the MXNet CMake build: the usage of the bundled llvm OpenMP library.
> > >>
> > >> I have opened a PR to remove it:
> > >> https://github.com/apache/incubator-mxnet/pull/12160
> > >>
> > >> The issue was closed, but I am firm in my opinion that it's the right
> > >> thing to do.
> > >>
> > >> *Background*
> > >> If you want to use OpenMP pragmas in your code for parallelization you
> > >> would supply a special flag to the compiler:
> > >>
> > >> - Clang / -fopenmp
> > >> https://openmp.llvm.org/
> > >>
> > >> - GCC / -fopenmp
> > >> https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html
> > >>
> > >> - Intel / [Q]openmp
> > >> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
> > >>
> > >> - Visual Studio: /openmp (Enable OpenMP 2.0 Support)
> > >> https://msdn.microsoft.com/en-us/library/tt15eb9t.aspx
> > >>
> > >> Each of the compilers would enable the '#pragma omp' directive during
> > >> C/C++ compilation and arrange for automatic linking of the OpenMP
> > >> runtime library supplied by each compiler separately.
> > >>
> > >> Thus, to use the advantages of an OpenMP implementation one has to
> > >> compile

Re: Include MKLDNN into default mxnet pip package

2018-10-17 Thread Alfredo Luque
This is huge. Thanks for working on this. Is there a similar plan for, e.g.,
TensorRT support being ported into the main cuda-9.x packages?

On October 17, 2018 at 2:10:20 PM, Alex Zai (aza...@gmail.com) wrote:

Hey all,
We have been working hard these past few months to integrate and stabilize
Intel’s MKLDNN deep learning CPU accelerator into MXNet and have made
incredible progress. On CPUs with AVX512 instructions (such as c5.18x) we
have seen performance increases of up to 12x, and on other platforms (Macs,
AVX2) we have seen a speedup of 1.5x+. The full list of benchmarks can be
found here
(https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650764
and https://github.com/apache/incubator-mxnet/pull/12591).

Currently, using this accelerator requires the developer to either pip
install the mxnet-mkl version of mxnet or to build it themselves from
source. Given that we should try to provide the best performance “out of
the box” with mxnet, we should include this in the default build. The mkldnn
library is included within the pip package build, so it does not require an
external dependency.

There were concerns that MKLDNN could cause regressions on certain
platforms (as it did with the tensorflow version a while back), but we
added an env flag (MXNET_MKLDNN_ENABLED) that allows users to turn off this
feature at runtime. Please bring up any other concerns you may have and
your thoughts on including this accelerator in the default build.
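For example, a minimal sketch of opting out at runtime (assuming, as with
other MXNET_* flags, that the variable is read when the library loads, so it
must be set before the import):

import os
os.environ["MXNET_MKLDNN_ENABLED"] = "0"  # disable the MKLDNN accelerator

import mxnet as mx
# Ops now fall back to the default CPU implementation.
print(mx.nd.ones((2, 2)) * 2)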

Best,
Alex

—
Alfredo Luque
Software Engineer
Machine Learning Infrastructure
Airbnb
San Francisco, CA


Regressions in NDArrayIter

2018-09-11 Thread Alfredo Luque
Looks like https://github.com/apache/incubator-mxnet/pull/12285 broke a ton
of our test cases iterating over 3D NDArray instances (e.g., MNIST) by
creating an index out of range.

Stacktrace:

.com/airbnb/bighead/python/bighead/ml_frameworks/mxnet/gluon.py", line
434, in transform
for batch in data_iter:
  File "/anaconda3/envs/py36/lib/python3.6/site-packages/mxnet/io/io.py",
line 228, in __next__
return self.next()
  File "/anaconda3/envs/py36/lib/python3.6/site-packages/mxnet/io/io.py",
line 680, in next
label = self.getlabel()
  File "/anaconda3/envs/py36/lib/python3.6/site-packages/mxnet/io/io.py",
line 750, in getlabel
return self._batchify(self.label)
  File "/anaconda3/envs/py36/lib/python3.6/site-packages/mxnet/io/io.py",
line 732, in _batchify
first_data = self._getdata(data_source, start=self.cursor)
  File "/anaconda3/envs/py36/lib/python3.6/site-packages/mxnet/io/io.py",
line 694, in _getdata
end = end if end is not None else data_source[0][1].shape[0]
IndexError: list index out of range
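A hypothetical minimal reproduction (shapes mimic MNIST; the labels are
random):

import mxnet as mx
import numpy as np

# 3D data with no channel axis, as produced by e.g. an MNIST loader.
data = mx.nd.array(np.random.rand(100, 28, 28))
label = mx.nd.array(np.random.randint(0, 10, (100,)))

data_iter = mx.io.NDArrayIter(data=data, label=label, batch_size=32)
for batch in data_iter:  # raises the IndexError above in _getdata
    print(batch.data[0].shape, batch.label[0].shape)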

I’ve created an issue at
https://github.com/apache/incubator-mxnet/issues/12526


We’ll be pinning to the previous build until it’s reverted/patched, but let
us know if we can help provide more regression tests here.

—
Alfredo Luque
Software Engineer
Machine Learning Infrastructure
Airbnb
San Francisco, CA


Re: Nightly Builds Not Working for Cu90MKL?

2018-08-31 Thread Alfredo Luque
No worries! I think we’ll stick with this version for the time being since
we haven’t had issues with it. Building from source is a non-starter, since
this is part of an automated Docker build for us and we don’t want all the
build dependencies.

Thanks for looking into this!

On August 31, 2018 at 2:37:20 PM, Anton Chernov (mecher...@gmail.com) wrote:

Thank you for noticing!

We are working on automating the process, but currently it's a manual
effort to publish to PyPi. We are experiencing some problems with the
publishing, but the issue should get resolved soon.

Best
Anton

On Fri, Aug 31, 2018 at 23:29, Alfredo Luque wrote:

> See here:
> https://pypi.org/project/mxnet-cu90mkl/#history
>
> No builds show up since 8/22. From what I can tell, other variants (e.g.,
> mxnet-mkl) are up to date.
>
> On August 31, 2018 at 2:24:30 PM, Anton Chernov (mecher...@gmail.com)
> wrote:
>
> Hi Alfredo!
>
> Could you provide more info on this? Where do you get the information?
>
> Best
> Anton
>
> On Fri, Aug 31, 2018 at 22:49, Alfredo Luque wrote:
>
> > Just curious why the latest build is 2018-08-22 while the other variants
> > are up to date.
> >
> > Thanks,
> >
> > —
> > Alfredo Luque
> > Software Engineer
> > Machine Learning Infrastructure
> > Airbnb
> > San Francisco, CA
> >
>
> —
> Alfredo Luque
> Software Engineer
> Machine Learning Infrastructure
> Airbnb
> San Francisco, CA
>
—
Alfredo Luque
Software Engineer
Machine Learning Infrastructure
Airbnb
San Francisco, CA


Re: Nightly Builds Not Working for Cu90MKL?

2018-08-31 Thread Alfredo Luque
See here:
https://pypi.org/project/mxnet-cu90mkl/#history

No builds show up since 8/22. From what I can tell, other variants (e.g.,
mxnet-mkl) are up to date.

On August 31, 2018 at 2:24:30 PM, Anton Chernov (mecher...@gmail.com) wrote:

Hi Alfredo!

Could you provide more info on this? Where do you get the information?

Best
Anton

On Fri, Aug 31, 2018 at 22:49, Alfredo Luque wrote:

> Just curious why the latest build is 2018-08-22 while the other variants
> are up to date.
>
> Thanks,
>
> —
> Alfredo Luque
> Software Engineer
> Machine Learning Infrastructure
> Airbnb
> San Francisco, CA
>

—
Alfredo Luque
Software Engineer
Machine Learning Infrastructure
Airbnb
San Francisco, CA


Nightly Builds Not Working for Cu90MKL?

2018-08-31 Thread Alfredo Luque
Just curious why the latest build is 2018-08-22 while the other variants
are up to date.

Thanks,

—
Alfredo Luque
Software Engineer
Machine Learning Infrastructure
Airbnb
San Francisco, CA


Join Slack Channel

2018-07-16 Thread Alfredo Luque
Hi there, I’d like to join the MXNet Slack channel. We’re working on
low-precision quantization at Airbnb and are interested in discussing some
issues we ran into there.

—
Alfredo Luque
Software Engineer
Machine Learning Infrastructure
Airbnb
San Francisco, CA