Re: CI impaired

2018-12-02 Thread Hagay Lupesko
Thanks for the update Marco and all the hard work put into the CI! On Sat, Dec 1, 2018 at 1:21 PM Marco de Abreu wrote: > Hello everyone, > > the move has just been completed and the old big pipeline as well as the > according job have been disabled. From now on, you will see the details >

Re: CI impaired

2018-12-01 Thread Marco de Abreu
Hello everyone, the move has just been completed and the old big pipeline as well as the corresponding job have been disabled. From now on, you will see the detailed status messages below your PRs. Some people wanted to make modifications to the Jenkinsfiles recently. In that case, your PR will show

Re: CI impaired

2018-11-30 Thread Marco de Abreu
Thanks Naveen and Gavin! #1 has been completed and every job has finished its processing. #2 is the ticket with infra: https://issues.apache.org/jira/browse/INFRA-17346 I'm now waiting for their response. -Marco On Fri, Nov 30, 2018 at 8:25 PM Naveen Swamy wrote: > Hi Marco/Gavin, > >

Re: CI impaired

2018-11-30 Thread Naveen Swamy
Hi Marco/Gavin, Thanks for the clarification. I was not aware that it had been tested in a separate test environment (this is what I was suggesting, to make the changes in a more controlled manner). Last time the change was made, many PRs were left dangling and developers had to go trigger and I

Re: CI impaired

2018-11-30 Thread Gavin M. Bell
Hey Folks, Marco has been running this change in dev, with flying colors, for some time. This is not an experiment but a rollout that was announced. We also decided to make this change after the release cut to limit the blast radius on any critical obligations to the community. Marco is

Re: CI impaired

2018-11-30 Thread Naveen Swamy
There are still pending PRs that need to be merged and cherry-picked to the branch > On Nov 30, 2018, at 6:53 AM, Marco de Abreu > wrote: > > Hello, > > I'm now moving forward with #1. I will try to get to #3 as soon as possible > to reduce parallel jobs in our CI. You might notice

Re: CI impaired

2018-11-30 Thread Marco de Abreu
Hello, I'm now moving forward with #1. I will try to get to #3 as soon as possible to reduce parallel jobs in our CI. You might notice some unfinished jobs. I will let you know as soon as this process has been completed. Until then, please bear with me since we have hundreds of jobs to run in

Re: CI impaired

2018-11-29 Thread Marco de Abreu
Hello, since the release branch has now been cut, I would like to move forward with the CI improvements for the master branch. This would include the following actions: 1. Re-enable the new Jenkins job 2. Request Apache Infra to move the protected branch check from the main pipeline to our new

Re: CI impaired

2018-11-25 Thread kellen sunderland
Sorry, [1] meant to reference https://issues.jenkins-ci.org/browse/JENKINS-37984 . On Sun, Nov 25, 2018 at 5:41 PM kellen sunderland < kellen.sunderl...@gmail.com> wrote: > Marco and I ran into another urgent issue over the weekend that was > causing builds to fail. This issue was unrelated to

Re: CI impaired

2018-11-25 Thread kellen sunderland
Marco and I ran into another urgent issue over the weekend that was causing builds to fail. This issue was unrelated to any feature development work, or other CI fixes applied recently, but it did require quite a bit of work from Marco (and a little from me) to fix. We spent enough time on the

Re: CI impaired

2018-11-25 Thread Steffen Rochel
Hi Marco - I suggest retriggering PRs, if needed in stages: - pr-awaiting-merge - pr-awaiting-review. That would cover 78 PRs. In any case I would exclude pr-work-in-progress. Steffen On Sat, Nov 24, 2018 at 9:11 PM kellen sunderland < kellen.sunderl...@gmail.com> wrote: > Hey Marco, I'm still

Re: CI impaired

2018-11-24 Thread kellen sunderland
Hey Marco, I'm still having quite a few issues passing PRs. Would you be able to at least test a handful of PRs and make sure they pass/fail tests as you expect? On Sat, Nov 24, 2018, 7:01 PM Marco de Abreu Hello Steffen, > > thank you for bringing up these PRs. > > I had to abort the builds

Re: CI impaired

2018-11-24 Thread Marco de Abreu
Hello Steffen, thank you for bringing up these PRs. I had to abort the builds during the outage which means that the jobs didn't finish and not even the status propagation could have finished (hence they show pending instead of failure or aborted). Recently, we merged a PR that adds utility

Re: CI impaired

2018-11-24 Thread Steffen Rochel
Thanks Marco for the updates and resolving the issues. However, I do see a number of PRs waiting to be merged with inconsistent PR validation status checks. E.g. https://github.com/apache/incubator-mxnet/pull/13041 shows 9 pending checks being queued. However, when you look at the details, either

Re: CI impaired

2018-11-22 Thread Marco de Abreu
Thanks everybody, I really appreciate it! Today was a good day: there were no incidents and everything appears to be stable. In the meantime I did a deep dive on why we had such a significant performance decrease with our compilation jobs - which then clogged up the queue and resulted in 1000

Re: CI impaired

2018-11-21 Thread Qing Lan
I appreciate your effort and help to make CI a better place! Qing On 11/21/18, 4:38 PM, "Lin Yuan" wrote: Thanks for your efforts, Marco! On Wed, Nov 21, 2018 at 4:02 PM Anirudh Subramanian wrote: > Thanks for the quick response and mitigation! > > On

Re: CI impaired

2018-11-21 Thread Marco de Abreu
Hello, today, CI had some issues and I had to cancel all jobs a few minutes ago. This was basically caused by the high load that is currently being put on our CI system due to the pre-release efforts for this Friday. It's really unfortunate that we just had outages of three core components

Re: CI impaired

2018-11-21 Thread Gavin M Bell
Yes, let me add to the kudos, very nice work Marco. "I'm trying real hard to be the shepherd." -Jules Winnfield > On Nov 21, 2018, at 5:04 PM, Sunderland, Kellen > wrote: > > Appreciate the big effort in bringing the CI back so quickly. Thanks Marco. > > On Nov 21, 2018 5:52 AM, Marco de

Re: CI impaired

2018-11-21 Thread Sunderland, Kellen
Appreciate the big effort in bringing the CI back so quickly. Thanks Marco. On Nov 21, 2018 5:52 AM, Marco de Abreu wrote: Thanks Aaron! Just for the record, the new Jenkins jobs were unrelated to that incident. If somebody is interested in the details around the outage: Due to required

Re: CI impaired

2018-11-21 Thread Marco de Abreu
Thanks Aaron! Just for the record, the new Jenkins jobs were unrelated to that incident. If somebody is interested in the details around the outage: Due to required maintenance (disk running full), we had to upgrade our Jenkins master because it was running on Ubuntu 17.04 (for an unknown

Re: CI impaired

2018-11-21 Thread Aaron Markham
Marco, thanks for your hard work on this. I'm super excited about the new Jenkins jobs. This is going to be very helpful and improve sanity for our PRs and ourselves! Cheers, Aaron On Wed, Nov 21, 2018, 05:37 Marco de Abreu Hello, > > the CI is now back up and running. Auto scaling is working

Re: CI impaired

2018-11-21 Thread Marco de Abreu
Hello, the CI is now back up and running. Auto scaling is working as expected and it passed our load tests. Please excuse the inconvenience caused. Best regards, Marco On Wed, Nov 21, 2018 at 5:24 AM Marco de Abreu wrote: > Hello, > > I'd like to let you know that our CI was impaired and