Re: Profiler Broken?

2020-05-28 Thread Pedro Larroy
Yes the profiler seems to be broken / has some concurrency issues. I have seen corrupted profile results. On Thu, May 28, 2020 at 12:30 PM Naveen Swamy wrote: > I am attempting to profile one of our models, I used the profiler.state to > run/stop in code and also used the environment variables t

Re: Workflow proposal

2020-03-17 Thread Pedro Larroy
Mon, Mar 16, 2020 at 7:41 PM Pedro Larroy > > wrote: > > > The original idea is that the promotion to the other branch is automated > by > > nightly CI, so it shouldn't have those problems that are mentioned, so > > there shouldn't be any manual merging

Re: Workflow proposal

2020-03-16 Thread Pedro Larroy
e sense the release itself > > > serves as a staging pt. > > > A good approach would simply setup the nightly if necessary strive to > fix > > > regressions and make sure the formal release addresses the issues. > > > > > > TQ > > > > >

Workflow proposal

2020-03-11 Thread Pedro Larroy
Hi, I talked to some people about this and they thought it would be a good idea, so I am sharing it here: I would propose to use a staging or "dev" branch on which nightly & performance tests are run periodically, and then this branch is merged to master. The goal of this workflow would be to avoid hav

Re: New AMIs for CI

2020-02-21 Thread Pedro Larroy
CI is back to normal. We haven't updated Windows AMIs due to issues with GPU unit tests. You might need to retrigger your PRs. Thanks for your patience. On Wed, Feb 19, 2020 at 5:54 PM Pedro Larroy wrote: > I reverted the CI rollout due to the following issues: > > https://git

Re: New AMIs for CI

2020-02-19 Thread Pedro Larroy
ue to older cmake being used in vs2017. For updating to vs2019 we would need to update cuda. Pedro. On Tue, Feb 18, 2020 at 5:31 PM Pedro Larroy wrote: > Hi > > Tomorrow I will be updating the CI environment with new AMIs, and > deploying updated autoscaling logic with fixes, expect

New AMIs for CI

2020-02-18 Thread Pedro Larroy
Hi Tomorrow I will be updating the CI environment with new AMIs, and deploying updated autoscaling logic with fixes, expect some disruptions in CI runs. The Linux AMIs will be updated to Ubuntu 18.04 with updated GPU drivers, this won't affect Linux container builds. The new Windows AMI comes wi

Re: Cuda 10.2 Wheels

2020-02-17 Thread Pedro Larroy
the previous links may > have been moved as part of reorganizing the file store namespaces. Please > refer to the latest page. > > > > -sz > > > >> On 2020/02/06 23:21:21, Alfredo Luque > wrote: > >> Looks like it updated since I last posted. Thanks!

Re: Join request for MXNet Swift support

2020-02-10 Thread Pedro Larroy
> -tao > > On Mon, Feb 10, 2020 at 1:10 PM Rahul .invalid> > wrote: > > > Hello, > > > > As per the conversation with [Pedro Larroy](https://twitter.com/plarroy) > > on [Twitter thread]( > https://twitter.com/plarroy/status/1226408543621771264) > &

Re: Cuda 10.2 Wheels

2020-02-06 Thread Pedro Larroy
Hi Alfredo. Isn't "mxnet_cu102mkl-1.6.0 " what you are looking for? I see it on the second link you posted. Pedro On Tue, Feb 4, 2020 at 3:29 PM Alfredo Luque wrote: > Hi folks, > > Are there any blockers on re

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2

2020-02-04 Thread Pedro Larroy
Hi Przemek I'm fine if we add it to the release notes and try to fix it for the next release. Changing my vote to +1 Pedro. On Mon, Feb 3, 2020 at 7:42 PM Pedro Larroy wrote: > > -1 > > Unit tests passed in CPU build. > > I observe crashes related to openmp using cp

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2

2020-02-04 Thread Pedro Larroy
his is a release blocker as Przemek mentioned above. Could we fix this > in > > the next minor release? > > > > Thanks, > > > > Lin > > > > On Tue, Feb 4, 2020 at 10:38 AM Pedro Larroy < > pedro.larroy.li...@gmail.com > > > > > wrote

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2

2020-02-04 Thread Pedro Larroy
ub.com/apache/incubator-mxnet/issues/16891)? > > Przemek > > On 2020/02/04 03:42:30, Pedro Larroy > wrote: > > -1 > > > > Unit tests passed in CPU build. > > > > I observe crashes related to openmp using cpp unit tests: > > > > https://github.co

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2

2020-02-03 Thread Pedro Larroy
-1 Unit tests passed in CPU build. I observe crashes related to openmp using cpp unit tests: https://github.com/apache/incubator-mxnet/issues/17043 Pedro. On Mon, Feb 3, 2020 at 6:44 PM Chaitanya Bapat wrote: > +1 > Successfully built MXNet 1.6.0rc2 on Linux > Tested for OpPerf utility > For

[ANNOUNCE] Python2 is no longer supported after MXNet 1.6 release

2020-02-03 Thread Pedro Larroy
Hi all, As per the merge of https://github.com/apache/incubator-mxnet/pull/15990 and as agreed with the community, we will no longer support Python 2 in upcoming releases of MXNet. Special thanks to Leonard for facilitating this. Pedro.

Re: MXNet 1.6 as last release with Python 2 support?

2020-01-23 Thread Pedro Larroy
This is not good user experience. I have heard of impacts to some users / projects. Thanks. On Tue, Jan 21, 2020 at 10:44 PM Skalicky, Sam wrote: > Also, it has been reported that pip wheel installation with latest pip > version 20.0.1 breaks installation of MXNet pip wheels which have py2.py3

Re: Stop redistributing source code of 3rdparty dependencies to avoid licensing issues

2020-01-19 Thread Pedro Larroy
-1 I think it is brittle to download a piece of source code that needs network connectivity to build. The network is always in flux. Source archives that need to download too many dependencies to build will end up broken over time. I would expect source to build with a reasonable set of well known sy

Re: CD with windows need a special jenkins slave machine like restricted-utility

2020-01-13 Thread Pedro Larroy
Thanks, it's working after updating to a 64 bit compiler. https://github.com/apache/incubator-mxnet/pull/17206 On Mon, Jan 13, 2020 at 4:55 PM Pedro Larroy wrote: > Isn't this something that gets selected through vcvars? > > On Fri, Jan 10, 2020 at 6:46 PM shiwen hu wrote

Re: CD with windows need a special jenkins slave machine like restricted-utility

2020-01-13 Thread Pedro Larroy
Isn't this something that gets selected through vcvars? On Fri, Jan 10, 2020 at 6:46 PM shiwen hu wrote: > use x64 host msvc. cmake -T host=x64 > > Pedro Larroy 于2020年1月10日周五 上午7:28写道: > > > Is there a solution for this error in VS2017? > > > > c:\use

Re: CD with windows need a special jenkins slave machine like restricted-utility

2020-01-09 Thread Pedro Larroy
Is there a solution for this error in VS2017? c:\users\administrator\mxnet\src\operator\mxnet_op.h(943) : fatal error C1002: compiler is out of heap space in pass 2 On Tue, Jan 7, 2020 at 5:11 PM shiwen hu wrote: > > > > I personally encountered the problem that 2015 can't compile in high > >

Re: Stopping nightly releases to Pypi

2020-01-08 Thread Pedro Larroy
rall risk score of our system. > > So in order to make sure that we're well protected, I'd recommend to spend > a bit of time on adapting the Jenkins pipeline to upload to s3 and then use > all the remaining time to actually harden the Jenkins master and make sure > that eve

Re: Stopping nightly releases to Pypi

2020-01-08 Thread Pedro Larroy
to the existing CD pipeline which publishes the > package to the s3 bucket instead of pypi. > > -Marco > > Pedro Larroy schrieb am Mi., 8. Jan. 2020, > 21:55: > > > I understand your point. But you don't provide an alternative, and > building > > binary release

Re: Stopping nightly releases to Pypi

2020-01-08 Thread Pedro Larroy
ine which publishes the > package to the s3 bucket instead of pypi. > > -Marco > > Pedro Larroy schrieb am Mi., 8. Jan. 2020, > 21:55: > > > I understand your point. But you don't provide an alternative, and > building > > binary releases from the CI jenkins as

Re: Stopping nightly releases to Pypi

2020-01-08 Thread Pedro Larroy
r being a release that is > affiliated or endorsed by Apache MXNet. > > We are taking a step back here and it's a pity to see that some people are > still not endorsing the Apache values. This will be my last email regarding > that topic and I will only follow up with actions afte

Re: CD with windows need a special jenkins slave machine like restricted-utility

2020-01-07 Thread Pedro Larroy
I'm putting some efforts on the side to improve the state of this: If you want to help: https://github.com/apache/incubator-mxnet/pull/17206 https://github.com/aiengines/ci/tree/master/windows Which of the cuda versions you listed it needs, I did some work on the side to update VS and cmake to

Re: Stopping nightly releases to Pypi

2020-01-03 Thread Pedro Larroy
> committers > > would also need access to the control plane of the system - to trigger, > > stop and audit builds. We could go down that road, but i think the fewer > > systems, the better - also for the sake of maintainability. > > > > Best regards, > > Mar

Re: Stopping nightly releases to Pypi

2020-01-03 Thread Pedro Larroy
- to trigger, > stop and audit builds. We could go down that road, but i think the fewer > systems, the better - also for the sake of maintainability. > > Best regards, > Marco > > > > Pedro Larroy schrieb am Fr., 3. Jan. 2020, > 20:55: > > > I'm not involved in

Re: Stopping nightly releases to Pypi

2020-01-03 Thread Pedro Larroy
o > with a grace period until 15th of January. Please bring the system into a > state that aligns with Apache values or revert the changes. > > -Marco > > Pedro Larroy schrieb am Fr., 3. Jan. 2020, > 03:33: > > > CD should be separate from CI for security reasons in any

Re: Stopping nightly releases to Pypi

2020-01-02 Thread Pedro Larroy
12-07/dist/mxnet_cu101mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl > > > > You can easily install these pip wheels in your system either by > > downloading them to your machine first and then installing by doing: > > > > pip install /path/to/downloaded/wheel.whl
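The wheel file name quoted in this snippet follows the standard wheel naming scheme: distribution, version, Python tag, ABI tag and platform tag, separated by hyphens. As a minimal sketch, assuming no optional build tag is present, the name can be split as follows. `parse_wheel_name` and `WheelName` are hypothetical helpers written for illustration; they are not part of pip or MXNet.

```python
from typing import List, NamedTuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    python_tags: List[str]
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its five hyphen-separated fields.

    Assumption: the name carries no optional build tag, so exactly five
    fields remain once the ".whl" suffix is removed.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError("unexpected number of fields in: " + filename)
    distribution, version, python_tag, abi_tag, platform_tag = parts
    # A compressed tag such as "py2.py3" lists several interpreters at once.
    return WheelName(distribution, version, python_tag.split("."), abi_tag, platform_tag)


info = parse_wheel_name(
    "mxnet_cu101mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl"
)
print(info)
```

The `py2.py3` tag is why a single nightly file could be installed on both Python 2 and Python 3 interpreters.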

Re: windows ci, Cmake update, diverging scripts

2020-01-02 Thread Pedro Larroy
might want to integrate with games or other commercial packages which need deep learning. Thanks. On Mon, Dec 30, 2019 at 4:19 PM Pedro Larroy wrote: > I have looked into this a bit, and seems the open source version which is > in https://github.com/apache/incubator-mxnet-ci is older than

Re: windows ci, Cmake update, diverging scripts

2019-12-30 Thread Pedro Larroy
ow to query for the latest windows AMI: https://aws.amazon.com/blogs/mt/query-for-the-latest-windows-ami-using-systems-manager-parameter-store/ On Mon, Dec 30, 2019 at 3:12 PM Pedro Larroy wrote: > It's automated but broken as the execution is in failed state. I think we > will need a

Re: windows ci, Cmake update, diverging scripts

2019-12-30 Thread Pedro Larroy
ed automatically. Why is it not done > automatically anymore / why does the documentation claim it happens > automatically but it doesn't? > > On Mon, 2019-12-30 at 12:11 -0800, Pedro Larroy wrote: > > Hi > > > > I was looking at a request from Leonard for updating CMake

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc0

2019-12-30 Thread Pedro Larroy
's beneficial to backport the 3 PRs. > > On Fri, 2019-12-27 at 11:24 -0800, Pedro Larroy wrote: > > Agree with Sheng, I think it would be good to have the nice fixes that > > Leonard has done for 1.6 and not delay them to further releases since > they > > are beneficial to

windows ci, Cmake update, diverging scripts

2019-12-30 Thread Pedro Larroy
Hi, I was looking at a request from Leonard for updating CMake on Windows, and I see that the post-install.py script which sets up the Windows environment in CI has diverged significantly from the incubator-mxnet-ci and the private repository that is used to deploy to production CI. https://github.

Re: [apache/incubator-mxnet] [RFC][mxnet 2.0][item 10.1] MXNet Imperative Op Invocation Overhead (#17097)

2019-12-27 Thread Pedro Larroy
Test On Fri, Dec 27, 2019 at 11:54 AM Pedro Larroy wrote: > Thanks for the explanation. I'm not so concerned about complexity of > dispatching. If I understood you correctly the main benefit that you > explain for the TVM project was not having to change the C API, but still

Re: [apache/incubator-mxnet] [RFC][mxnet 2.0][item 10.1] MXNet Imperative Op Invocation Overhead (#17097)

2019-12-27 Thread Pedro Larroy
LOL, the last one was my comment, not @szha :-D -- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/apache/incubator-mxnet/issues/17097#issuecomment-569358758

Re: [apache/incubator-mxnet] [RFC][mxnet 2.0][item 10.1] MXNet Imperative Op Invocation Overhead (#17097)

2019-12-27 Thread Pedro Larroy
Thanks for the explanation. I'm not so concerned about complexity of dispatching. If I understood you correctly the main benefit that you explain for the TVM project was not having to change the C API, but still you need to do type checking in both ends, or at least on the receiving end of the API,

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc0

2019-12-27 Thread Pedro Larroy
> > >> In case of backporting #17012, also > > >> https://github.com/apache/incubator-mxnet/pull/17098 must be > > backported. > > >> The > > >> updated OpenMP added a new target which is not used by MXNet but > breaks > > the

Re: [apache/incubator-mxnet] [RFC][mxnet 2.0][item 10.1] MXNet Imperative Op Invocation Overhead (#17097)

2019-12-26 Thread Pedro Larroy
Pybind is nice. I used Boost.Python many years ago, which I think it is based on. The problem with this is the hourglass C bindings: you have to go from Python to C++ / Pybind, down to C and to the engine, which seems like a lot of boilerplate. On Mon, Dec 16, 2019 at 10:02 PM reminisce wrote: > MXN

Re: [apache/incubator-mxnet] [RFC][mxnet 2.0][item 10.1] MXNet Imperative Op Invocation Overhead (#17097)

2019-12-26 Thread Pedro Larroy
What's the point of having an API if you type erase it? Then you might as well have a single function API with a type erased callback name to select the function to call. In the end you move the burden away from the API to the callers and inside the API to the dispatchers. For going this route of u

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc0

2019-12-26 Thread Pedro Larroy
https://github.com/apache/incubator-mxnet/pull/17012 should be also ported to the release branch. On Fri, Dec 20, 2019 at 1:39 PM Przemysław Trędak wrote: > That issue is now fixed in master, I am in the process of cherry-picking > the fix to v1.6.x branch. I will prepare the RC1 once that is r

Re: [apache/incubator-mxnet] [RFC] Custom Operator Part 2 (#17006)

2019-12-26 Thread Pedro Larroy
@wkcn could you explain your suggestion? calling gemm back into the framework which gets dispatched to GPU or CPU? -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/apache/incubator-mxnet/issues/17006#issuec

[discuss] add lgtm.com to mxnet

2019-12-18 Thread Pedro Larroy
Shall we add lgtm to mxnet? https://lgtm.com/

The essence of deep learning, autodiff and higher order gradients

2019-12-18 Thread Pedro Larroy
Hi I published the slides I presented at the last MXNet meetup on automatic differentiation and higher order gradients. If you want to get more insights to understand some PRs which have been sent or future directions on this area for 2.0. I also compare implementation across major deep learning f

Re: Please remove conflicting Open MP version from CMake builds

2019-12-08 Thread Pedro Larroy
is flawed, I > will assume your previous veto is void based on Apache Voting rule as it lacks > technical justification and in any case was motivated by the assertion issue, > which I agree with you, is likely not due to gomp / omp interaction. > > Thank you > Leonard > >

Re: Please remove conflicting Open MP version from CMake builds

2019-12-08 Thread Pedro Larroy
eto is void based on Apache Voting rule as it lacks > technical justification and in any case was motivated by the assertion issue, > which I agree with you, is likely not due to gomp / omp interaction. > > Thank you > Leonard > > > On Sat, 2019-12-07 at 15:40 -0800, Pe

Re: Please remove conflicting Open MP version from CMake builds

2019-12-08 Thread Pedro Larroy
You can consider rescinding your veto on removing >> 3rdparty/openmp after reading through the evidence in that issue. If you >> don't >> provide any evidence for why the methodology/conclusion in #14979 is >> flawed, I >> will assume your previous veto is void based on

Re: Please remove conflicting Open MP version from CMake builds

2019-12-07 Thread Pedro Larroy
ow users should tune and configure MXNet when > > > using OMP. > > > > > > As a developer, the safest bet is to use GOMP to be able to debug and > > > develop without issues. As a user of CPU inference / training you want > to > > > run MKL so depends

Re: Please remove conflicting Open MP version from CMake builds

2019-12-06 Thread Pedro Larroy
ct#specific-guidelines > > > On Sat, 2019-11-30 at 02:47 -0800, Pedro Larroy wrote: > > (py3_venv) piotr@34-215-197-42:1:~/mxnet_1.6 (upstream_master)+$ ldd > > build/libmxnet.so| grep -i openmp > > libomp.so => > > /home/piotr/mxnet_1.6/build/3rdparty/openmp

Re: Can upgrade windows CI cmake?

2019-12-06 Thread Pedro Larroy
CMake shipped with ubuntu has issues when compiling with CUDA on GPU instances. I wouldn't recommend anything older than 3.12 for Linux GPU https://github.com/apache/incubator-mxnet/blob/master/ci/docker/install/ubuntu_core.sh#L63 I don't know about windows CMake version but would make sense to

Re: CI Update

2019-12-06 Thread Pedro Larroy
Hi all. CI is back to normal after Jake's commit: https://github.com/apache/incubator-mxnet/pull/16968 please merge from master. If someone could look into the TVM building issues described above would be great. On Tue, Dec 3, 2019 at 11:11 AM Pedro Larroy wrote: > Some PRs were expe

Re: CI Update

2019-12-03 Thread Pedro Larroy
Pedro Larroy wrote: > Also please take note that there's a stage building TVM which is executing > compilation serially and takes a lot of time which impacts CI turnaround > time: > > https://github.com/apache/incubator-mxnet/issues/16962 > > Pedro > > On Tue, Dec

Re: CI Update

2019-12-03 Thread Pedro Larroy
Also please take note that there's a stage building TVM which is executing compilation serially and takes a lot of time which impacts CI turnaround time: https://github.com/apache/incubator-mxnet/issues/16962 Pedro On Tue, Dec 3, 2019 at 9:49 AM Pedro Larroy wrote: > Hi MXNet commu

Re: CI Update

2019-12-03 Thread Pedro Larroy
h tvm when not installing the cuda driver in the container: https://pastebin.com/bQA0W2U4 centos gpu builds and tests seem to run with the updated AMI and changes to the container. Thanks. On Mon, Dec 2, 2019 at 12:11 PM Pedro Larroy wrote: > Small update about CI, which is blocked. &

CI Update

2019-12-02 Thread Pedro Larroy
Small update about CI, which is blocked. There seems to be an NVIDIA driver compatibility problem between the base AMI that is running on GPU instances and the NVIDIA Docker images that we use for building and testing. We are working on providing a fix by updating the base images, as it doesn't seem to be easy

Please remove conflicting Open MP version from CMake builds

2019-11-30 Thread Pedro Larroy
(py3_venv) piotr@34-215-197-42:1:~/mxnet_1.6 (upstream_master)+$ ldd build/libmxnet.so| grep -i openmp libomp.so => /home/piotr/mxnet_1.6/build/3rdparty/openmp/runtime/src/libomp.so (0x7fde0991d000) (py3_venv) piotr@34-215-197-42:0:~/mxnet_1.6 (upstream_master)+$ python ~/deeplearning-b
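The `ldd build/libmxnet.so | grep -i openmp` check in this snippet can be approximated in a script. The sketch below is only an illustration: `grep_openmp` is a hypothetical helper, the first sample line is copied from the output above, and the second (libc) line is an invented placeholder added to show that non-matching lines are skipped.

```python
from typing import List, Tuple


def grep_openmp(ldd_output: str) -> List[Tuple[str, str]]:
    """Return (library, resolved path) pairs for ldd lines mentioning "openmp".

    Mirrors `grep -i openmp`: the match is case-insensitive and applies to
    the whole line, so a path containing ".../openmp/..." also counts.
    """
    matches = []
    for line in ldd_output.splitlines():
        if "openmp" not in line.lower():
            continue
        name, arrow, rest = line.strip().partition(" => ")
        if not arrow:
            continue  # entries without a resolved path are ignored here
        path = rest.split(" (")[0]  # drop the trailing load address
        matches.append((name, path))
    return matches


SAMPLE = (
    "libomp.so => /home/piotr/mxnet_1.6/build/3rdparty/openmp/runtime/src/libomp.so (0x7fde0991d000)\n"
    "libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000000000)\n"  # placeholder line
)
openmp_libs = grep_openmp(SAMPLE)
print(openmp_libs)
```

In practice the input would come from running `ldd` on the built library rather than from a hard-coded string.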

Re: [Discuss] MXNet Python < 3.6 Support Deprecation

2019-11-06 Thread Pedro Larroy
y came to an agreement, we can formalize it with a voting > > thread. Until then, I'd recommend to refrain from any actions or > > user-facing communication regarding this topic. > > > > Best regards, > > Marco > > > > On Tue, Aug 27, 2019 at 1:29

Re: [DISCUSS] CI Access Control

2019-09-27 Thread Pedro Larroy
We will address the shortcomings that Marco outlined by using a pipeline to deploy the CI infrastructure. Which will allow for contributions and easy redeployment and rollback in the case of issues. I would recommend planning a migration towards Drone IO or similar, with an initial prototype to va

Re: Enable Timestamp in CI Logging

2019-09-27 Thread Pedro Larroy
Sheng, you should have admin access to Jenkins as of now. Why wouldn't it be persistent through reboots? Pedro. On Sat, Sep 14, 2019 at 10:07 PM Sheng Zha wrote: > Thank you, Philip. Looks like xgboost is using the same plugin for the > timestamps. > > Unfortunately, I don't have admin access to

[DISCUSS] Remove amalgamation

2019-09-11 Thread Pedro Larroy
cation for > you calling something "hacky". The only email in this thread where I see ad > hominems and disrespectful comments is your email. > > On Sat, Sep 7, 2019, 10:18 PM Pedro Larroy > wrote: > >> Apache mentors should have a look at these reincident harassment

Re: [DISCUSS] Remove amalgamation

2019-09-07 Thread Pedro Larroy
rts > "hacky". Please respect what came before. > > Thanks for understanding, > > -Chris > > > On Fri, Sep 6, 2019 at 3:07 PM Pedro Larroy > wrote: > > > Hi > > > > I would like to propose to remove amalgamation from MXNet and CI, users >

[DISCUSS] Remove amalgamation

2019-09-06 Thread Pedro Larroy
Hi I would like to propose to remove amalgamation from MXNet and CI, users have reported that they couldn't use it successfully in Android, and instead they were able to use the cross compiled docker build successfully. Any reason why we shouldn't remove this hacky solution? Pedro.

Re: [VOTE] Python 2 Removal for MXNet 1.6

2019-09-06 Thread Pedro Larroy
Did this vote pass? Can we remove Python2 support from master? On Tue, Aug 27, 2019 at 2:51 PM Pedro Larroy wrote: > +1 > > On Tue, Aug 27, 2019 at 3:49 AM Leonard Lausen wrote: > >> Due to References: header the prior email was still sorted in the >> discussion thr

Re: new website

2019-09-06 Thread Pedro Larroy
The new website looks great Aaron. Nice work to everyone involved ! On Thu, Aug 29, 2019 at 5:26 PM Aaron Markham wrote: > Hi everyone, > > I'm very excited to share a preview and the pull requests for a new > website and new documentation pipelines. > > The following link is using Apache's new

Re: [DISCUSS] Slim down scope of CI

2019-08-28 Thread Pedro Larroy
ependencies for windows to simplify the build, CI, > and binary distribution? > > > On Wed, Aug 28, 2019, 09:30 Pedro Larroy > wrote: > >> Hi >> >> I would like to propose a discussion to slim down CI by dropping some jobs >> which are of questionable value, othe

[DISCUSS] Slim down scope of CI

2019-08-28 Thread Pedro Larroy
Hi, I would like to propose a discussion to slim down CI by dropping some jobs which are of questionable value and others which have received little community support: - Drop centos: we can't support every distro, if anything we should clearly specify which versions of base libraries are needed and te

Re: [VOTE] Python 2 Removal for MXNet 1.6

2019-08-27 Thread Pedro Larroy
> > > > Thus, let's start a vote on dropping Python 2 for MXNet 1.6. > > It's fine if this vote fails, but we need to get a clear understanding > > how we want to move forward. > > > > For better visibility, I'm removing the In-Reply-To: header, which wa

Re: [Discussion] MXNet 1.5.1 release

2019-08-27 Thread Pedro Larroy
1:19 PM Lin Yuan wrote: > https://github.com/apache/incubator-mxnet/pull/15762 contains some > unrelated changes which is being reverted. Please do not cherry pick it > yet. > > On Mon, Aug 26, 2019 at 4:25 PM Pedro Larroy > > wrote: > > > There's a fix that I

Re: [Discuss] MXNet Python < 3.6 Support Deprecation

2019-08-26 Thread Pedro Larroy
I have sent a PR that removes Python2 from CI, but it was closed. I thought everyone was +1 on this one. This would remove quite a bit of load on CI: https://github.com/apache/incubator-mxnet/pull/15990 If it's not the right time to do this, what steps do we need to take? Pedro. On Mon, Aug 26, 2

Re: [Discussion] MXNet 1.5.1 release

2019-08-26 Thread Pedro Larroy
There's an issue I fixed in master which still seems to produce crashes in 1.5 for some users, as I was notified today. It might be useful to put the fix in 1.5.1: https://github.com/apache/incubator-mxnet/pull/15762 ? Pedro. On Tue, Aug 20, 2019 at 7:49 AM Tao Lv wrote: > Hi dev, > > Her

Re: CI and PRs

2019-08-26 Thread Pedro Larroy
ichever side you're "on" doesn't matter, because both > sides do it). > > -Chris > > > > > > On Fri, Aug 23, 2019 at 3:56 PM Pedro Larroy > > wrote: > > > Thanks for your response Marco, I think you have totally missed my > ori

Re: CI and PRs

2019-08-23 Thread Pedro Larroy
| tail -n 10 6 Zach Kimberg 6 stu1130 7 Jake Lee 8 Aaron Markham 11 Lanking 12 Anton Chernov 13 perdasilva 26 Kellen Sunderland 34 Marco de Abreu 46 Pedro Larroy pllarroy@mac:0: ~/d/mxnet_ci_general [master]> git log --pretty=format:%aN | sort | uniq -c | sort -n 1
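The shell pipeline in this snippet (`git log --pretty=format:%aN | sort | uniq -c | sort -n | tail -n 10`) counts commits per author and keeps the ten largest counts. A rough Python equivalent is sketched below. `top_authors` is a hypothetical helper, the sample author names are invented placeholders, and ties may be ordered differently from `sort -n`.

```python
from collections import Counter
from typing import List, Tuple


def top_authors(log_output: str, n: int = 10) -> List[Tuple[str, int]]:
    """Count one commit per non-empty line and return the n largest counts.

    The result is in ascending order, matching the shape of
    `sort | uniq -c | sort -n | tail -n N`.
    """
    counts = Counter(
        line.strip() for line in log_output.splitlines() if line.strip()
    )
    ranked = sorted(counts.items(), key=lambda item: (item[1], item[0]))
    return ranked[-n:]


# Placeholder input standing in for `git log --pretty=format:%aN` output.
sample_log = "alice\nbob\nalice\ncarol\nalice\nbob\n"
ranking = top_authors(sample_log, n=2)
print(ranking)
```

Feeding it real data would mean capturing the output of the `git log` command in the repository of interest.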

Re: CI and PRs

2019-08-23 Thread Pedro Larroy
". Pedro. On Fri, Aug 16, 2019 at 4:03 PM Pedro Larroy wrote: > Hi Aaron. This is difficult to diagnose, because I don't know what to do > when the hash of the layer in docker doesn't match and decides to rebuild > it. the r script seems not to have changed. I have observed t

Re: CI and PRs

2019-08-16 Thread Pedro Larroy
/work/ > ---> c5e77c38031d > Step 8/15 : COPY install/r.gpg /work/ > ---> d8cdbf015d2b > Step 9/15 : RUN /work/ubuntu_r.sh > ---> Running in c6c90b9e1538 > ++ dirname /work/ubuntu_r.sh > + cd /work > + echo 'deb http://cran.rstudio.com/bin/linux/ubuntu tru

Re: CI and PRs

2019-08-16 Thread Pedro Larroy
Also, I forgot, another workaround is that I added the -R flag to the build logic (build.py) so the container is not rebuilt for manual use. On Fri, Aug 16, 2019 at 11:18 AM Pedro Larroy wrote: > > Hi Aaron. > > As Marco explained, if you are in master the cache usually works, t

Re: CI and PRs

2019-08-16 Thread Pedro Larroy
. > > > > > > I noticed that almost all jobs use an ubuntu setup that is fully > loaded. > > > Without cache, it can take 10 or more minutes to build. So I made a > lite > > > version. Takes only a few minutes instead. > > > > > > In s

Re: MXNet CI repository

2019-08-15 Thread Pedro Larroy
Nice. On Thu, Aug 15, 2019 at 12:47 PM Marco de Abreu wrote: > Repository has been created: https://github.com/apache/incubator-mxnet-ci > > I will fill it soon. > > -Marco > > On Thu, Aug 15, 2019 at 8:43 PM Carin Meier wrote: > > > +1 > > > > On Thu, Aug 15, 2019 at 2:37 PM Chaitanya Bapat >

Re: CI and PRs

2019-08-15 Thread Pedro Larroy
t much more than nfs/efs > sharing and remote ssh commands. All it takes is a little ingenuity and > some imagination. > > On Wed, Aug 14, 2019 at 4:31 PM Pedro Larroy > > wrote: > > > Sounds good in theory. I think there are complex details with regards of > > res

Re: CI and PRs

2019-08-15 Thread Pedro Larroy
> > some imagination. > > > > On Wed, Aug 14, 2019 at 4:31 PM Pedro Larroy < > pedro.larroy.li...@gmail.com > > > > > wrote: > > > > > Sounds good in theory. I think there are complex details with regards > of > > > resource shari

Re: CI and PRs

2019-08-14 Thread Pedro Larroy
o provide code fixes to help the base PR get > > > green. > > > > > > The time costs to maintain such a large CI project obviously needs to > be > > > considered as well. > > > > > > [1] https://github.com/apache/incubator-mxnet/pull/15579 > >

Re: CI and PRs

2019-08-14 Thread Pedro Larroy
always happy to pitch in to provide code fixes to help the base PR get > > green. > > > > The time costs to maintain such a large CI project obviously needs to be > > considered as well. > > > > [1] https://github.com/apache/incubator-mxnet/pull/15579 >

Re: CI and PRs

2019-08-14 Thread Pedro Larroy
e analytics to see how much each of the language bindings > is > > > contributing to overall run time. > > > If we have some metrics on that, maybe we can come up with a guideline > of > > > how much time each should take. Another possibility is leverage the >

Re: CI and PRs

2019-08-14 Thread Pedro Larroy
Meier wrote: > > > Before any binding tests are moved to nightly, I think we need to figure > > out how the community can get proper notifications of failure and success > > on those nightly runs. Otherwise, I think that breakages would go > > unnoticed. > > > > -Carin

Re: CI and PRs

2019-08-14 Thread Pedro Larroy
uns. Otherwise, I think that breakages would go > unnoticed. > > -Carin > > On Tue, Aug 13, 2019 at 7:47 PM Pedro Larroy > > wrote: > > > Hi > > > > Seems we are hitting some problems in CI. I propose the following action > > items to remedy the sit

CI and PRs

2019-08-13 Thread Pedro Larroy
Hi Seems we are hitting some problems in CI. I propose the following action items to remedy the situation and accelerate turn around times in CI, reduce cost, complexity and probability of failure blocking PRs and frustrating developers: * Upgrade Windows visual studio from VS 2015 to VS 2017. Th

Evolving the computational graph

2019-07-23 Thread Pedro Larroy
Hi dev@ I have observed some limitations in MXNet's architecture that would be beneficial to address in future releases. For example, during calculation of higher order gradients it would be necessary to access the graph and shape information from the FGradient function to be able to do some

Re: [Discuss] MXNet Python 2 Support Deprecation

2019-07-18 Thread Pedro Larroy
> On Thu, Jul 18, 2019 at 9:42 PM Yuan Tang wrote: > > > I would suggest supporting Python 3.5+ since the earlier versions have > > reached end-of-life status: > > https://devguide.python.org/devcycle/#end-of-life-branches > > > > On Thu, Jul 18, 2019

Re: [Discuss] MXNet Python 2 Support Deprecation

2019-07-18 Thread Pedro Larroy
+1 This would simplify CI, reduce costs and more. I think a followup question is what would be the minimum Python3 version supported? Depending on that we might be able to use type annotations for example or other features. Pedro. On Thu, Jul 18, 2019 at 12:07 PM Yuan Tang wrote: > > +1 > > On

Re: [DISCUSS] MXNet 1.6.0 Roadmap

2019-07-18 Thread Pedro Larroy
Are we using Jira or some other tool (Trello?) for planning? I think getting more visibility on some of the major ongoing activities would help rally contributions around them. If they link to the design document and group PRs in a single place (I think Jira or Trello can do that) it would help

Re: warnings as errors

2019-07-18 Thread Pedro Larroy
, May 22, 2019 at 1:50 PM Pedro Larroy wrote: > > I was not able to fix the warnings on mshadow type switch with unused > local typedefs, that's one example of warning that I would disable. I > couldn't find a way to solve that one and I think the ramifications of > an unus

Re: [DISCUSS] Make MXNet deploy it's own distribution

2019-07-03 Thread Pedro Larroy
Nice! +1 to this approach, it seems well thought out. Thanks for including Android and linux-arm. Do Android and linux-arm use a different classifier? On Wed, Jul 3, 2019 at 6:46 AM Chris Olivier wrote: > > Will this be another repo under Apache repo? Is tensorflow java package in > a separate repo?

Re: FW: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

2019-06-28 Thread Pedro Larroy
r 1.5.0 version, I used pip install mxnet-mkl==1.5.0b20190619 which > corresponds to commit# ccbbf6b4b76ea536a6583c99497c83b65a20817b which is > behind 1.5.x branch by 4 commits > > > Best, > Manu > > > On 6/27/19, 10:44 AM, "Pedro Larroy" wrote: > > > > I will

Re: OMP

2019-06-28 Thread Pedro Larroy
ce code, it doesn't.. You can't affect the OMP behavior at arbitrary > points in time by setting the "OMP_NUM_THREADS" environment variable. > > > > > On Tue, Jun 25, 2019 at 1:20 PM Pedro Larroy > wrote: > > > Nobody claimed that the original lockup
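The quoted remark about `OMP_NUM_THREADS` is consistent with the OpenMP specification, under which the runtime reads its environment variables at program start and ignores later modifications. A minimal sketch, assuming the aim is simply to fix the variable for one run, is to set it in the environment of a freshly launched process. `run_with_omp_threads` is a hypothetical helper, and the child program here only echoes the variable instead of running MXNet.

```python
import os
import subprocess
import sys


def run_with_omp_threads(num_threads: int, code: str) -> str:
    """Run a Python snippet in a child process with OMP_NUM_THREADS preset.

    The variable is part of the child's environment from its first
    instruction, so any OpenMP runtime loaded there would see it at start-up.
    """
    env = dict(os.environ, OMP_NUM_THREADS=str(num_threads))
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


seen_by_child = run_with_omp_threads(
    4, "import os; print(os.environ['OMP_NUM_THREADS'])"
)
print(seen_by_child)
```

For changing the thread count while a program is already running, the OpenMP API call `omp_set_num_threads` is the usual route rather than the environment variable.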

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

2019-06-27 Thread Pedro Larroy
elease. > > Setup: > 1.5 => Build from source (head of 1.5.rc2 tag), built with MKLDNN > 1.4.1 => PyPi mxnet-mkl==1.4.1 > Machine: C5.18X > No explicit environment variable were set > Operator benchmark code - > https://github.com/apache/incubat

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

2019-06-27 Thread Pedro Larroy
opcnt tsc_deadline_timer aes xsave avx f16c rdrand > > hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase > > tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f rdseed adx > > smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat pku

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

2019-06-26 Thread Pedro Larroy
02 Epoch 3, Validation accuracy=0.293870 Epoch 4, Batch 199, Speed=449.329787 Epoch 4, Duration=138.398325 Epoch 4, Training accuracy=0.270021 Epoch 4, Validation accuracy=0.311498 real 11m45.329s user 426m13.908s sys 16m45.093s On Wed, Jun 26, 2019 at 4:18 PM Pedro Larroy wrote:

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

2019-06-26 Thread Pedro Larroy
at 3:50 PM Pedro Larroy wrote: > > Hi Ciyong, thanks for trying to reproduce: > > I used this one: > https://github.com/awslabs/deeplearning-benchmark/blob/master/dawnbench/cifar10.py > > Could you provide hardware and OS details? > > I will rerun and repost numbers in a

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

2019-06-26 Thread Pedro Larroy
al9m2.155s > user 329m37.092s > sys 16m8.668s > > -Ciyong > > > -Original Message- > From: Pedro Larroy [mailto:pedro.larroy.li...@gmail.com] > Sent: Wednesday, June 26, 2019 6:28 AM > To: dev@mxnet.incubator.apache.org > Cc: d...@mx

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

2019-06-25 Thread Pedro Larroy
perfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clfl

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

2019-06-25 Thread Pedro Larroy
I trained cifar10 on CPU and there seems to be a regression of roughly 7% in training time against 1.4.1: (py3_venv) piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench (master)+$ time python cifar10.py --epochs 5 real 11m30.388s user 417m7.766s sys 16m57.
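Comparing runs like the one above means turning the shell `time` figures into seconds and computing a relative change. The sketch below assumes the `NmS.SSSs` format shown in the snippet. `parse_duration` and `relative_increase` are hypothetical helpers, and the 100 s versus 107 s pair is an invented example of a 7% increase, not a measurement from this thread.

```python
import re

# Matches durations such as "11m30.388s" or "45.2s" as printed by `time`.
_DURATION = re.compile(r"^(?:(\d+)m)?(\d+(?:\.\d+)?)s$")


def parse_duration(text: str) -> float:
    """Convert a `time`-style duration string to seconds."""
    match = _DURATION.match(text.strip())
    if match is None:
        raise ValueError("unrecognised duration: " + text)
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


def relative_increase(baseline_s: float, candidate_s: float) -> float:
    """Fractional slowdown of candidate versus baseline (0.07 means 7% slower)."""
    return (candidate_s - baseline_s) / baseline_s


real_seconds = parse_duration("11m30.388s")  # the "real" figure quoted above
example_slowdown = relative_increase(100.0, 107.0)  # placeholder numbers
print(real_seconds, example_slowdown)
```

Wall-clock (`real`) time is the natural figure to compare for training duration, since `user` time sums CPU time across all threads.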
