Slack channel

2017-06-20 Thread Pedro Larroy
Hi Please add me to the slack channel. -- Pedro Larroy Tovar

Improving and rationalizing unit tests

2017-10-16 Thread Pedro Larroy
Hi Some of the unit tests are extremely costly in terms of memory and compute. As an example in the gluon tests we are loading all the datasets. test_gluon_data.test_datasets Also running huge networks like resnets in test_gluon_model_zoo. This is ridiculously slow, and straight impossible on

Re: Improving and rationalizing unit tests

2017-10-16 Thread Pedro Larroy
> unforseen edge case failures occur in the field and not during testing". > > Is this the motivation? Seems strange to me. > > > On Mon, Oct 16, 2017 at 9:09 AM, Pedro Larroy < > pedro.larroy.li...@gmail.com> > wrote: > > > I think using a properly seede

Re: Improving and rationalizing unit tests

2017-10-16 Thread Pedro Larroy
ch that we can make > the unit tests framework to be the rock-solid foundation for the active > development of Apache MXNet (Incubating). > > Regards, > Bhavin Thaker. > > > On Mon, Oct 16, 2017 at 5:56 AM Pedro Larroy <pedro.larroy.li...@gmail.com > <mailto:p

Build failed in Jenkins: mxnet_incubator_master » armv6 #136

2017-10-16 Thread Pedro Larroy
See -- Started by upstream project "mxnet_incubator_master" build number 136 originally caused by: Started by an SCM change Building in workspace

Jenkins build is back to normal : mxnet_incubator_master » ubuntu-16.04-cuda_8.0_cudnn5 #137

2017-10-16 Thread Pedro Larroy
See

Build failed in Jenkins: mxnet_incubator_master » cmake.ubuntu-17.04 #136

2017-10-16 Thread Pedro Larroy
See -- Started by upstream project "mxnet_incubator_master" build number 136 originally caused by: Started by an SCM change Building in

Build failed in Jenkins: mxnet_incubator_master » android.armv7 #136

2017-10-16 Thread Pedro Larroy
See -- Started by upstream project "mxnet_incubator_master" build number 136 originally caused by: Started by an SCM change Building in workspace

Jenkins build is back to normal : mxnet_incubator_master » armv7 #137

2017-10-16 Thread Pedro Larroy
See

Jenkins build is back to normal : mxnet_incubator_master » armv6 #137

2017-10-16 Thread Pedro Larroy
See

Jenkins build is back to normal : mxnet_incubator_master » ubuntu-17.04 #137

2017-10-16 Thread Pedro Larroy
See

Build failed in Jenkins: mxnet_incubator_master » arm64 #136

2017-10-16 Thread Pedro Larroy
See -- Started by upstream project "mxnet_incubator_master" build number 136 originally caused by: Started by an SCM change Building in workspace

Jenkins build is back to normal : mxnet_incubator_master » arm64 #137

2017-10-16 Thread Pedro Larroy
See

Build failed in Jenkins: mxnet_incubator_master » armv7 #136

2017-10-16 Thread Pedro Larroy
See -- Started by upstream project "mxnet_incubator_master" build number 136 originally caused by: Started by an SCM change Building in workspace

Jenkins build is back to normal : mxnet_incubator_master » android.armv7 #137

2017-10-16 Thread Pedro Larroy
See

Build failed in Jenkins: mxnet_incubator_master » ubuntu-17.04 #136

2017-10-16 Thread Pedro Larroy
See -- Started by upstream project "mxnet_incubator_master" build number 136 originally caused by: Started by an SCM change Building in workspace

Use unique_ptr on Executor creation

2017-11-22 Thread Pedro Larroy
Hi I would like to make a minor change of the cpp package for 1.0 by returning a unique_ptr when creating Engine. This is a C++ idiom that prevents memory leaks and fixes one leak in the examples. As before it requires explicit delete as the ownership of the pointer is not clear from the API
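To illustrate the idiom proposed above, here is a minimal sketch of a factory that returns a std::unique_ptr. ToyExecutor, MakeExecutor and LiveExecutorsAfterScope are hypothetical names invented for this example; they are not the cpp-package API, and the actual change may look different.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an executor type; NOT the real cpp-package class.
class ToyExecutor {
 public:
  explicit ToyExecutor(std::string name) : name_(std::move(name)) { ++live_count(); }
  ~ToyExecutor() { --live_count(); }
  ToyExecutor(const ToyExecutor&) = delete;
  ToyExecutor& operator=(const ToyExecutor&) = delete;

  const std::string& name() const { return name_; }

  // Number of ToyExecutor objects currently alive, used to observe leaks.
  static int& live_count() {
    static int count = 0;
    return count;
  }

 private:
  std::string name_;
};

// Hypothetical factory: the return type states that the caller owns the object.
std::unique_ptr<ToyExecutor> MakeExecutor(const std::string& name) {
  // Written with `new` so it also compiles as C++11; std::make_unique needs C++14.
  return std::unique_ptr<ToyExecutor>(new ToyExecutor(name));
}

// Creates an executor in an inner scope and reports how many are alive afterwards.
int LiveExecutorsAfterScope() {
  {
    std::unique_ptr<ToyExecutor> exec = MakeExecutor("demo");
    (void)exec->name();  // use the object
  }  // exec is destroyed here; no explicit delete is needed
  return ToyExecutor::live_count();
}
```

With a raw-pointer return the caller has to remember to call delete; with this signature, forgetting is no longer possible, and ownership is visible in the API itself.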

Re: Apache MXNet Development Processes: Proposed update

2017-12-18 Thread Pedro Larroy
+1 to all the points mentioned above. What about changes that refactor code to make it more readable and maintainable? My point of view is that this is important to keep the quality of a project and maintain the codebase. Some people think "if it ain't broke, don't fix it". The reality is that

NVIDIA system profiler

2017-12-15 Thread Pedro Larroy
Hi, Kellen and I tried to run the system profiler on MXNet on a Jetson board, but it was impossible due to errors along the lines of "cannot allocate CUPTI buffer" while trying to profile MXNet. Has anybody run into a similar issue, or can somebody from NVIDIA comment on this? Was somebody able to run

Re: [VOTE] A Separate CI System for Apache MXNet (incubating)

2017-11-13 Thread Pedro Larroy
+1 for [1] (A setup separated from Apache Jenkins) On Mon, Nov 13, 2017 at 4:50 AM, sandeep krishnamurthy wrote: > +1 for [1] Jenkins (A setup separated from Apache Jenkins) - with > preferably AWS Code Build integration to reduce the size of infrastructure > we

[RFQ] Deprecate amalgamation

2017-11-20 Thread Pedro Larroy
Hi all, given that we have working builds for ARM, Android, TX2 and the main architectures, and after considering how amalgamation is done, I would like to propose that we deprecate and remove amalgamation. I don't think the cost of maintaining this feature and how it's done justifies the ROI,

Re: 3rdparty packages as submodules

2017-11-20 Thread Pedro Larroy
We could also add gtest, for example. I would like to point out that it is quite cumbersome to get your code tested and ready before sending a PR; this includes installing cpplint, pylint, gtest… Installing gtest and bootstrapping it is not completely trivial. Kind regards. On Mon, Nov

Re: [Important] Please Help make the Apache MXNet (incubating) 1.0 Release Notes Better!

2017-11-20 Thread Pedro Larroy
Thank you Meghna Added notes about ARM & Nvidia Jetson support (beta) to the document. On Mon, Nov 20, 2017 at 2:19 PM, Meghna Baijal wrote: > Apologies. Done. > > On Mon, Nov 20, 2017 at 2:18 PM, Chris Olivier > wrote: > >> No write access :(

Re: [RFQ] Deprecate amalgamation

2017-11-20 Thread Pedro Larroy
hat there is > no dedicated developer on it yet. We can talk about full deprecation after > this > > > Tianqi > > On Mon, Nov 20, 2017 at 2:47 PM, Pedro Larroy <pedro.larroy.li...@gmail.com> > wrote: > >> Hi all >> >> Given that we have working builds

Re: [RFQ] Deprecate amalgamation

2017-11-21 Thread Pedro Larroy
Anybody against removing amalgamation then? emscripten build is already using CMake. On Tue, Nov 21, 2017 at 9:22 AM, Tianqi Chen <tqc...@cs.washington.edu> wrote: > Yes, you can call emscripten from CMake > > Tianqi > > On Mon, Nov 20, 2017 at 5:42 PM, Pedro Larroy <pedr

Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-11-01 Thread Pedro Larroy
+1 That would be great. On Mon, Oct 30, 2017 at 5:35 PM, Hen <bay...@apache.org> wrote: > How about we ask for a new mxnet repo to store all the config in? > > On Fri, Oct 27, 2017 at 05:30 Pedro Larroy <pedro.larroy.li...@gmail.com> > wrote: > >> Just

Re: update build instructions

2017-11-07 Thread Pedro Larroy
unit tests to get started making changes. Your feedback is welcome. And if you can please test the instructions to check they work properly and nothing was missed. Pedro. On Thu, Nov 2, 2017 at 5:07 PM, Pedro Larroy <pedro.larroy.li...@gmail.com> wrote: > Right. > > I tried now

Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-11-01 Thread Pedro Larroy
igh-quality level. > > > 10) Finally, how do we get ownership for code submitted to MXNet? When > something fails in a code segment that only a small set of folks know > about, what is the expected SLA for a response from them? When users deploy > MXNet in production environments, they

Re: [VOTE] Disable Appveyor

2017-12-08 Thread Pedro Larroy
+1 On Fri, Dec 8, 2017 at 1:02 AM, Chris Olivier wrote: > +1 > > On Thu, Dec 7, 2017 at 3:57 PM, kellen sunderland < > kellen.sunderl...@gmail.com> wrote: > >> It's using a fixed binary version of openblas and opencv, so the versions >> there might be different than what

Re: [VOTE] Disable Appveyor

2017-12-08 Thread Pedro Larroy
Can somebody re-enable Travis CI so we can do Mac builds? Currently the unit tests don't build on Mac because Mac is not tested. On Fri, Dec 8, 2017 at 12:02 PM, Pedro Larroy <pedro.larroy.li...@gmail.com> wrote: > +1 > > > > On Fri, Dec 8, 2017 at 1:02 AM, Chris Olivier

Re: Protected master needs to be turned off

2017-12-01 Thread Pedro Larroy
CI catches problems all the time. I don't think many of us can afford to build all the flavors and architectures on our laptops or workstations, so we have to rely on CI to catch all kinds of errors, from compilation errors to bugs and regressions, especially in a project which has so many build

Re: Protected master needs to be turned off

2017-12-04 Thread Pedro Larroy
ractice today is still code >> reviewing. Otherwise, such as a PR is mainly about examples, the CI often >> doesn't help so we just waste machine times. >> >> I think checking the exact code coverage is on the roadmap, but I don't >> know if we have any progress on it. >

Re: [VOTE] Disable Appveyor

2017-12-11 Thread Pedro Larroy
+1 Queued for 16 hours: https://ci.appveyor.com/project/ApacheSoftwareFoundation/incubator-mxnet/build/1.0.4199 https://github.com/apache/incubator-mxnet/pull/9016 On Fri, Dec 8, 2017 at 12:19 PM, Bay, Daniel wrote: > +1 > > On 07.12.17, 23:41, "Indhu"

Re: [ANNOUNCE] Apache MXNet (incubating) 1.0.0 Release

2017-12-05 Thread Pedro Larroy
Congratulations, thanks for the contributions and the great work. On Tue, Dec 5, 2017 at 1:02 AM, Spisak, Joseph wrote: > Congrats > > > On 12/4/17, 4:01 PM, "Chris Olivier" wrote: > > Hello All, > > > > The Apache MXNet (incubating)

Fix slicing for 0.12

2017-10-24 Thread Pedro Larroy
Hi, can we get this PR in for 0.12? https://github.com/apache/incubator-mxnet/pull/8400 It's a critical fix for undefined behaviour, which shows up especially on ARM platforms. -- Pedro.

Re: Fix slicing for 0.12

2017-10-24 Thread Pedro Larroy
the case that they have a critical fix that > should go into 0.12.0.rc1? Hopefully the PR already passed CI or is in > master already. > > On Tue, Oct 24, 2017 at 6:31 AM, Pedro Larroy < > pedro.larroy.li...@gmail.com> > wrote: > > > Hi > > > > Can

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-04 Thread Pedro Larroy
ng all of the flaky tests which would delay > the release by considerable amount of time. > Or is it something else ? > > Anirudh > > > On Fri, May 4, 2018 at 4:49 AM, Pedro Larroy <pedro.larroy.li...@gmail.com > > > wrote: > > > Could you remove the fixe

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-05 Thread Pedro Larroy
/5/thread:137: undefined reference to `pthread_create' collect2: error: ld returned 1 exit status ninja: build stopped: subcommand failed. Can we update dmlc-core on the release branch? this was recently fixed: https://github.com/dmlc/dmlc-core/commit/b744643f386660ddc39467a04e3a98853a7419b9 On

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-05 Thread Pedro Larroy
> > > > > > I agree with Anirudh that the focus of the discussion should be > limited > > > to > > > > the release branch, not the master branch. Anything that breaks on > > master > > > > but works on release branch should not block th

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-04 Thread Pedro Larroy
> > > Look at the dashboard for master build > > http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/master/ > > > > -Naveen > > > > On Thu, May 3, 2018 at 5:11 AM, Pedro Larroy < > pedro.larroy.li...@gmail.com > > > > > wrote: >

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-04 Thread Pedro Larroy
e the tests and the CI. I have seen the PR builds are > >> non-deterministic and you have to retry over and over (wasting resources > >> and time) and hope you get lucky. > >> > >> Look at the dashboard for master build > >> http://jenkins.mxnet-ci.amaz

Re: segmentation fault in master using mkdlnn

2018-05-04 Thread Pedro Larroy
: Command '['docker', 'build', '-f', > 'docker/Dockerfile.build.ubuntu_cpu', '--build-arg', 'USER_ID=1000', > '-t', 'mxnet/build.ubuntu_cpu', 'docker']' returned non-zero exit status 2 > > > On 5/3/18, 8:01 AM, "Pedro Larroy" <pedro.larroy.li...@gmail.com> wrote: > >

Re: MKL with 1.2.0rc3

2018-05-17 Thread Pedro Larroy
It would be great if we could improve the documentation for building with MKL. Whenever I have done this it took quite some effort to get it right. The documentation should also cover the different MKL-related libraries. There are also duplicated scripts: ./3rdparty/mkldnn/scripts/prepare_mkl.bat

Flaky test failures are impacting development

2018-05-22 Thread Pedro Larroy
Hi team. Flaky test failures are impacting PR validation and hindering contributions to MXNet. We should prioritize dealing with these failing tests. See recent failures on master: http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/master/ The biggest offenders right now:

Re: Make scalapkg fails if USE_BLAS is set to openblas/mkl/apple

2018-06-06 Thread Pedro Larroy
t to python passes in 1 of 4 times. The creation of > NDArray's in this case fails though in all cases with similar message that > the stack is corrupted. > > Will update on findings. > > -- Anton > > > 2018-06-05 16:19 GMT+02:00 Pedro Larroy : > > > Could y

Re: home page highlights section updates

2018-06-06 Thread Pedro Larroy
Hi Aaron This is great! what about including social feed like MXNet twitter or medium on the home page as well? I also miss a quick install instruction like other frameworks have in a high visibility place, so there's less friction to try the framework. I think current install instructions are

Re: Good First Issue label

2018-06-06 Thread Pedro Larroy
Nice! I have seen these called "low-hanging fruit". We could link them in the contribution instructions. On Wed, May 30, 2018 at 3:36 AM, sandeep krishnamurthy < sandeep.krishn...@gmail.com> wrote: > Awesome! > > On Tue, May 29, 2018, 6:03 PM Aaron Markham > wrote: > > > Just wanted to bring

Re: Clojure Package

2018-06-06 Thread Pedro Larroy
Hi These Java classes that the document refers to, where are they located? Do we have a Java API atm? The origin of my question is that for android I think we need a Java API. Pedro. On Tue, Jun 5, 2018 at 5:40 PM, Carin Meier wrote: > Thanks everyone. I'll work on getting together a PR with

Re: About Becoming a Committer

2018-06-16 Thread Pedro Larroy
Hi Sebastian. Thank you for your comment. That's why I said "I would propose", because I don't know if it's possible as my experience with Apache is limited to the MXNet project. How do you interpret this part?: "Since the appointed Project Management Committees have the power to create their

Re: About Becoming a Committer

2018-06-16 Thread Pedro Larroy
Great points and feedback. I think everyone here wants the best for the project. We should definitely not shoot down pioneering contributions and be more inclusive with people that are actively contributing to the community with not just code. This should include code, documentation, website

Re: users@mxnet

2018-06-18 Thread Pedro Larroy
I agree with Tianqi, Eric and others. We shouldn't dilute the community with another forum. Disqus is already working and has healthy participation, you can get an email digest if you so desire. Subscribing to a mailing list to get a question answered is quite a heavyweight investment for many

Re: Feature branches for ARM and Android

2018-06-13 Thread Pedro Larroy
Thanks a lot for creating these branches and proposing the idea, for the reasons you listed. We tried during this week to work with these branches with @lebeg for Android and Arm support, for the reasons listed below these branches are not useful for us, so you can delete them. 1. We don't

Re: About Becoming a Committer

2018-06-12 Thread Pedro Larroy
* I personally don't like the idea that committership status is decided on a closed mailing list. This is not the transparency level that I would expect in an open source project. I'm happy to receive feedback from others that might be opposed to my application for committer to know what things could

Re: Pip packages for Raspberry Pi & other ARM architectures

2018-06-13 Thread Pedro Larroy
Thanks, we are in sync and will find a solution next week. On Wed, Jun 13, 2018 at 4:54 PM Marco de Abreu wrote: > I think Sheng manages the publishing to PyPi and we should be fine with > just adding it as another flavour. > > -Marco > > On Wed, Jun 13, 2018 at 6:36 AM Pedro

Re: PR validation and runtime of CI

2018-06-13 Thread Pedro Larroy
> > > > >> > > If a test fails in nightly, the commit would not be reverted since > > >> it's > > >> > > hard to pin a failure to a specific PR. We will have reporting for > > >> > failures > > >> > > on nightly (the

Re: Feature branches for ARM and Android

2018-06-13 Thread Pedro Larroy
ure to merge from apache/master and not larroy/master > if you have conflicts? Not sure why you got these conflicts otherwise. > > All the best, > > Thomas > > 2018-06-12 23:39 GMT-07:00 Pedro Larroy : > > > Thanks a lot for creating these branches and proposing the idea,

test/versions in the website

2018-06-13 Thread Pedro Larroy
Hi We have this url alive, and google is indexing: http://mxnet.incubator.apache.org/test/versions/ Is this url right, or is it pointing some users to the wrong documentation? Pedro

Re: Additional mentor to MXNet - Jim Jagielski

2018-06-19 Thread Pedro Larroy
Welcome Jim. Great to have you in the project. On Mon, Jun 18, 2018 at 10:51 PM Steffen Rochel wrote: > Welcome Jim, appreciating your support. > Steffen > > On Mon, Jun 18, 2018 at 3:14 PM Naveen Swamy wrote: > > > Hi All, > > > > I am excited to announce that we have an additional mentor

Re: Make scalapkg fails if USE_BLAS is set to openblas/mkl/apple

2018-06-05 Thread Pedro Larroy
Could you compile with debug symbols or get a core file? From this output it is not clear why the crash is happening. On Sun, May 27, 2018 at 10:04 AM, Naveen Swamy wrote: > Hi, > I am working to publish MXNet-Scala package to maven and encountering an > issue when trying to build with

Re: Make scalapkg fails if USE_BLAS is set to openblas/mkl/apple

2018-06-06 Thread Pedro Larroy
> > > > > > > > > > What I can say for now that this failure is not deterministic (on > > > RPi's) > > > > > and the library import to python passes in 1 of 4 times. The > creation > > > of > > > > > NDArray's in this case fails though in

Re: Clojure Package

2018-06-06 Thread Pedro Larroy
way to interop on the JVM between languages. > There is no pure Java API at the moment that I am aware of. > > On Wed, Jun 6, 2018 at 5:54 AM, Pedro Larroy > > wrote: > > > Hi > > > > These Java classes that the document refers to, where are they located? > Do

PR validation and runtime of CI

2018-06-06 Thread Pedro Larroy
Hi Team The time to validate a PR is growing, due to our number of supported platforms and increased time spent in testing and running models. We are at approximately 3h for a full successful run. This is compounded with the failure rate of builds due to flaky tests of more than 50% which is a

Re: Merging Clojure PR

2018-07-01 Thread Pedro Larroy
> Thanks everyone for your feedback and efforts with the Clojure > package > > PR. > > > > > > > > I'm delighted to join the MXNet community and work with you all and > am > > > > excited to invite the Clojure community to grow with it :) > > >

Master doesn't build, RAT license check failure

2018-07-01 Thread Pedro Larroy
Hi Master is not building due to RAT license check on the clojure package. Is anyone having a look at this? Pedro.

Re: Adding section on how to develop with MXNet to the website

2018-07-01 Thread Pedro Larroy
other info is provided. Link > > to your info from the contribute page that's under community. > > > > Ping me if you need help. > > > > Sent from VMware Boxer > > > > On Jun 25, 2018 19:17, Pedro Larroy > wrote: > > Hi > > >

Re: Master doesn't build, RAT license check failure

2018-07-01 Thread Pedro Larroy
Marco > > Marco de Abreu schrieb am So., 1. Juli > 2018, > 11:57: > > > I'm on it. > > > > Pedro Larroy schrieb am So., 1. Juli > 2018, > > 09:03: > > > >> Hi > >> > >> Master is not building due to RAT license check on the clojure package. > Is > >> anyone having a look at this? > >> > >> Pedro. > >> > > >

Adding section on how to develop with MXNet to the website

2018-06-25 Thread Pedro Larroy
Hi I want to add a section on how to develop MXNet itself to attract contributors. Would this be acceptable for the website? Is there any recommended workflow for this? Any tools? is it going into docs and `make html` or something else? Thanks. Pedro

Re: Single-Machine Topology-aware Communication

2018-06-25 Thread Pedro Larroy
Nice design document. Where does the default value of 10M for MXNET_KVSTORE_GPUARRAY_BOUND come from? Do you generate a tree for each GPU? Pedro. On Mon, Jun 18, 2018 at 2:30 PM Carl Yang wrote: > Hi, > > Currently, we have two methods for single-machine communication: > parameter server

cleaning up branches on the main repo

2018-07-02 Thread Pedro Larroy
Hi, could we clean up some of the branches on the main repo? The devel-arm and devel-android branches are not needed. Maybe there's something I'm not aware of, but what is the policy for the other branches? Should the main repo contain only release branches so we don't confuse users? Pedro.

Re: Slack access

2018-05-02 Thread Pedro Larroy
Hi Jesse Welcome! Have you seen the "contribute" documentation? https://mxnet.incubator.apache.org/community/contribute.html Should be easy to contribute via github issues, Jira ticket and a PR. Pedro. On Wed, May 2, 2018 at 3:47 AM, jesse brizzi wrote: > Hey

Re: The New Scala API

2018-05-02 Thread Pedro Larroy
Hi, I had a brief look and I have some comments: 1 - Excessive use of Option: Having an optional map is often not necessary. What's the semantic difference between an empty map and passing None? What about a variable argument list of Pairs like: (attr: Pair[String,String]*) it can be then

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC1

2018-05-02 Thread Pedro Larroy
ging this "https://github.com/dmlc/mshadow/pull/319 " > > > > I am curious how are we tracking the sub-modules's PRs which are really > > important for MXNet? > > This PR has been waiting to merge for almost 4 months. > > > > > > > > On Mon, Apr

Re: The New Scala API

2018-05-02 Thread Pedro Larroy
apache/mxnet/SymbolMacro.scala#L147 > > Thanks, > Qing > > On 5/2/18, 3:50 AM, "Pedro Larroy" <pedro.larroy.li...@gmail.com> wrote: > > Hi > > I had a brief look and I have some comments: > > 1 - Excessive use of Option: Having an opt

Re: segmentation fault in master using mkdlnn

2018-05-02 Thread Pedro Larroy
I couldn't reproduce locally with: ci/build.py -p ubuntu_cpu /work/runtime_functions.sh build_ubuntu_cpu_mkldnn && ci/build.py --platform ubuntu_cpu /work/runtime_functions.sh unittest_ubuntu_python2_cpu On Wed, May 2, 2018 at 8:50 PM, Pedro Larroy <pedro.larroy.li...@gmail.com>

segmentation fault in master using mkdlnn

2018-05-02 Thread Pedro Larroy
Hi, it seems master is not running anymore; there's a segmentation fault using MKLDNN-CPU http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/801/pipeline/662 I see my PRs failing with a similar error. Pedro

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC1

2018-04-30 Thread Pedro Larroy
-1 We should merge this and update mshadow before the next release: https://github.com/dmlc/mshadow/pull/319 so we compile cuda for Volta. On Sat, Apr 28, 2018 at 12:53 AM, Steffen Rochel wrote: > Hi Chris - acknowledge that building the docs is not as good as it

Re: segmentation fault in master using mkdlnn

2018-05-03 Thread Pedro Larroy
@Chris it seems Intel Inspector requires purchasing a license, right? Maybe one of us already owns a license and can execute the test that fails intermittently: test_module.py:test_forward_reshape On Thu, May 3, 2018 at 3:49 PM, Pedro Larroy <pedro.larroy.li...@gmail.com> wrote: > It's very

Re: segmentation fault in master using mkdlnn

2018-05-03 Thread Pedro Larroy
ll me How you reproduce the error? > > On 5/3/18, 7:45 AM, "Pedro Larroy" <pedro.larroy.li...@gmail.com> wrote: > > Looks like a problem in mkl's same_shape > > the pointer to mkldnn::memory::desc looks invalid. > > (More stack frames follow...) >

Re: segmentation fault in master using mkdlnn

2018-05-03 Thread Pedro Larroy
>> On Wed, May 2, 2018 at 2:14 PM, Zheng, Da <dzz...@amazon.com> > wrote: > > > >> > There might be a race condition that causes the memory error. > > > >> > It might be caused by this PR: > > > >> > https://github.com/apache/incu

Re: segmentation fault in master using mkdlnn

2018-05-03 Thread Pedro Larroy
C++ unit > tests. Not sure it'll cover all memory errors we want. > > Best, > Da > > On 5/3/18, 6:50 AM, "Pedro Larroy" <pedro.larroy.li...@gmail.com> wrote: > > It's very difficult to reproduce, non-deterministic. We were also > running > w

Re: segmentation fault in master using mkdlnn

2018-05-03 Thread Pedro Larroy
) [0x7f7fefbeffc1] [bt] (9) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0xcb5) [0x7f7ff01b1f65] ok On Thu, May 3, 2018 at 3:57 PM, Pedro Larroy <pedro.larroy.li...@gmail.com> wrote: > @Chris seem

Re: segmentation fault in master using mkdlnn

2018-05-03 Thread Pedro Larroy
PM, Pedro Larroy <pedro.larroy.li...@gmail.com> wrote: > Hi Da > > Reproduction instructions: > > On the host: > > Adjust core pattern: > > $ echo '/tmp/core.%h.%e.%t' > /proc/sys/kernel/core_pattern > > > Use the following patch: > >

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-03 Thread Pedro Larroy
-1 Nondeterministic failures on CI master: https://issues.apache.org/jira/browse/MXNET-396 I was able to reproduce once on a fresh p3 instance with DLAMI, but can't reproduce consistently. On Wed, May 2, 2018 at 9:51 PM, Anirudh wrote: > Hi all, > > As part of RC2 release, we

Re: Commiter access to Jenkins Sevrer

2018-01-06 Thread Pedro Larroy
Agree that committers should have access to Jenkins. I would like to ask for some patience due to the ongoing progress on the CI work and thank Amazon for providing the resources for running the new CI and the great job done by Marco and the infra team. Are there some volunteers in helping

Increase indentation limit from 100 to 120 characters

2018-01-05 Thread Pedro Larroy
Hi, can we please increase the line length limit from 100 to 120 characters? I find 100 too low for current standards and today's monitors. The default CLion line limit is also 120. I'm having to split some long templates and I wish we had a longer line limit. Thanks a lot. Pedro

Reduce 99% of your memory leaks with this simple trick!

2018-01-11 Thread Pedro Larroy
Hi I would like to encourage contributors to use RAII idioms in C++ whenever possible to avoid resource leaks. RAII is an ugly acronym that stands for Resource Acquisition Is Initialization, which basically means that you should almost never use explicit new and delete operators and instead use
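As a small illustration of the RAII idea described above, the sketch below ties a resource to the lifetime of an object, so it is released on every exit path, including when an exception is thrown. AcquireHandle, ReleaseHandle, HandleGuard, DoWork and OpenHandlesAfterWork are hypothetical names made up for this example (standing in for any acquire/release pair such as new/delete or fopen/fclose); they are not MXNet code.

```cpp
#include <stdexcept>

// Hypothetical C-style resource API, standing in for any acquire/release pair.
static int g_open_handles = 0;
int AcquireHandle() { return ++g_open_handles; }
void ReleaseHandle(int /*handle*/) { --g_open_handles; }

// RAII wrapper: acquires in the constructor, releases in the destructor.
class HandleGuard {
 public:
  HandleGuard() : handle_(AcquireHandle()) {}
  ~HandleGuard() { ReleaseHandle(handle_); }
  HandleGuard(const HandleGuard&) = delete;             // exactly one owner
  HandleGuard& operator=(const HandleGuard&) = delete;

  int get() const { return handle_; }

 private:
  int handle_;
};

// Uses a handle and optionally fails; there is no explicit release call anywhere.
void DoWork(bool fail) {
  HandleGuard guard;
  if (fail) {
    throw std::runtime_error("work failed");  // guard's destructor still runs
  }
}

// Runs DoWork and reports how many handles remain open afterwards.
int OpenHandlesAfterWork(bool fail) {
  try {
    DoWork(fail);
  } catch (const std::runtime_error&) {
    // Error handling would go here; cleanup has already happened.
  }
  return g_open_handles;
}
```

For plain heap memory the standard library already provides such wrappers (std::unique_ptr, std::shared_ptr, std::vector), so a hand-written guard like this is mainly useful for resources that come from C APIs.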

please merge PRs that fixes C++ unit tests and CMake build

2018-01-10 Thread Pedro Larroy
Hi Can a committer please unblock the following PRs? We would like to have C++ tests and CMake builds back in working state. This is also creating build failures when you run CLion for development. This one is blocking more PRs: https://github.com/dmlc/mshadow/pull/317

Re: Module maintainers proposal

2018-01-10 Thread Pedro Larroy
Hi all. Great initiative with maintainers, I hope to see more maintainers for different modules. I'm happy to volunteer as well to help. My main concern is that PRs take a long time for code to get merged, even when changes are trivial. I would like to see lower turn around times for simple PRs,

Re: Commiter access to Jenkins Sevrer

2018-01-08 Thread Pedro Larroy
Regarding the proposed permissions, I would like stricter permissions. I think a committer should be able to stop-start-cancel jobs. But I think only admins should be able to create new jobs, otherwise we run the risk of the CI becoming a mess of jobs that nobody owns and maintains. Please let's

Proposal for treating warnings as errors in Linux & Clang builds (-Werror)

2018-01-15 Thread Pedro Larroy
Hi I would like to propose to compile in CI with warnings as errors for increased code quality. This has a dual purpose: 1. Enforce a clean compilation output. Warnings often indicate deficiencies in the code and hide new warnings which can be an indicator of problems. 2. Warnings can surface

Re: Proposal for treating warnings as errors in Linux & Clang builds (-Werror)

2018-01-16 Thread Pedro Larroy
Mon, Jan 15, 2018 at 9:43 AM, Marco de Abreu < > marco.g.ab...@googlemail.com> wrote: > >> +1 >> >> On Mon, Jan 15, 2018 at 6:27 PM, Pedro Larroy < >> pedro.larroy.li...@gmail.com> >> wrote: >> >> > Hi >> > >> > I wo

Re: Proposal for treating warnings as errors in Linux & Clang builds (-Werror)

2018-01-16 Thread Pedro Larroy
would be good if >> > we can >> > filter most warnings during PR-stage and risk that some are getting >> > into >> > the master branch due to a different compiler version. A reduction of >> > (for >> > example) 95%

Re: Please help update/review pending PRs

2018-01-26 Thread Pedro Larroy
I would like to get these merged for the release, is it too late? https://github.com/dmlc/mshadow/pull/322 https://github.com/dmlc/dmlc-core/pull/357 On Mon, Jan 22, 2018 at 9:01 PM, Haibin Lin wrote: > Hi everyone, > > We still have a long list of outstanding PRs on

Re: CI failure due to offline llvm.org

2018-01-12 Thread Pedro Larroy
I think Chris is right, git clean with the right options plus proper initialization of the submodules should not make any difference versus deleting the entire workspace. Right? On Fri, Jan 12, 2018 at 8:56 AM, kellen sunderland wrote: > Doing a few searches I see

Re: [DISCUSSION] Adding labels to PRs

2018-01-15 Thread Pedro Larroy
+1 Agree with Bhavin, Marco and Sheng. I would also like to point out good commit practices such as keeping each individual commit small and on-topic, meaning that if you are changing whitespace and making a one-line fix, it's better practice to separate those commits. Or having two separate commits

Re: Reduce 99% of your memory leaks with this simple trick!

2018-01-15 Thread Pedro Larroy
> exceptions. >> > > when you do LOG(FATAL) or CHECK is caught at the C API boundary, which >> > > translates to return code -1 and an error is thrown on the python >> side. >> > > Throwing exception from another thread is a more tricky

Re: Call for Help for Fixing Flaky Tests

2018-01-15 Thread Pedro Larroy
Agree with Bhavin's arguments 100%. Please don't compromise the stability of CI with flaky tests. Address the root cause of why these tests are failing / not deterministic as per proper engineering standards. Hence, my non-binding vote is: -1 for proposal #1 for re-enabling flaky tests. +1 for

cuda CUDNN auto tune, optimal parameters of cuda kernels

2018-01-24 Thread Pedro Larroy
Hi, we have identified that CUDA cuDNN autotune produces a significant spike in RAM usage when finding the best convolution algorithm. As far as we understand, this happens inside the cuDNN library. But on platforms like the TX1, where we only have 4 GB, this is problematic as the spike is close to 4 GB.

Outstanding PR

2018-02-01 Thread Pedro Larroy
Hi Can some of the admins please merge the following PR? I can't stand this warning on nvcc flooding my compilation anymore: https://github.com/dmlc/mshadow/pull/322 Thank you so much.

Re: JIRA notifications on dev@

2018-02-14 Thread Pedro Larroy
Is there a new alias to subscribe to get the Jira notifications? On Wed, Feb 7, 2018 at 7:10 PM, Marco de Abreu wrote: > Ticket is available at > https://issues.apache.org/jira/plugins/servlet/mobile#issue/INFRA-15997 > > -Marco > > Am 07.02.2018 7:09 nachm. schrieb
