Re: Update on upcoming changes to the MXNet CI: Jenkins
This makes total sense, Aaron. We can probably spend some time on these modifications once we complete the originally mentioned changes.

On 2/13/20, 9:21 AM, "Aaron Markham" wrote:

> +1
>
> These are good action items that should help alleviate part of the CI issues. The following comments are not to take away from your proposal. Move forward, assuming the community agrees.
>
> I'd really like to see particular tests run only when the PR is touching a related part. While this is more effort, it would really make a major difference. Light research shows that projects have been doing this for quite some time, so it wouldn't be a new invention and deep exploration. I realize there are a lot of interdependencies and it would probably not work for everything.
>
> But, what if we start small?
> --> Docs pages (*.rst, *.md, *.html, *.js, *.css): don't trigger most tests, especially GPU and cross-platform tests.
> --> Tutorials that have GPU requirements run their own validation tests, and tutorials that don't have GPU requirement don't get tested on GPUs.
>
> Cheers,
> Aaron
>
> On Wed, Feb 12, 2020 at 10:12 AM Davydenko, Denis wrote:
> >
> > Hello, MXNet dev community,
> >
> > As you all know, the experience with CI infrastructure isn’t ideal in spite of its high cost. For this reason, we’re proposing the following changes to improve stability, reduce cost, and grant more control to contributors. As we work on a refresh of CI, we believe these changes will reduce the pain we all suffer when we try to push a PR through the system.
> >
> > Following is the list of changes:
> > Fix missing status reports between GH and Jenkins
> > Update Jenkins permission groups to re-trigger builds
> > Introduce per-PR CI bot
> >
> > Details:
> >
> > - Fix missing status reports
> > Currently, once a commit is added to a PR, CI is run on that commit. Sometimes, the CI run status is missing from the commit in GitHub despite the run having completed in Jenkins.
> > Example: CI run: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-17376/17/pipeline, commit status in GitHub (missing unix-cpu, unix-gpu and windows-gpu statuses): https://github.com/apache/incubator-mxnet/pull/17376#partial-pull-merging.
> > Problem: There seems to be a bug where some status reports are missing on GitHub. The hypothesis is that there is some issue with GitHub Hooks.
> >
> > - Update Jenkins permission groups to re-trigger builds
> > Problem: Currently, only MXNet Committers and selected people from AWS have the ability to re-trigger CI runs on PRs. This leaves PR Authors waiting for authorized users to re-trigger their PRs for them.
> > Solution: Allow these membership categories to re-trigger PR builds: Jenkins Admins, MXNet Committers, and PR Authors.
> >
> > - Introduce per-PR CI bot
> > Problem: As of today, MXNet CI is automated. It runs every time a commit is pushed onto your GitHub PR. This results in a lot of unnecessary CI runs, in addition to added costs.
> > Solution: Switch to Manual Trigger. Users from authorized groups (1 of the 3 categories mentioned above) can trigger a CI run by adding a simple comment to the PR: “[mxnet-ci] run”.
> >
> > --
> > Thank you,
> >
> > AWS MXNet team
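The manual trigger and the three authorized groups from the proposal could be checked roughly as follows. This is only a sketch in Python: the `Comment` type, the role labels, and the function names are hypothetical and do not come from any existing bot; only the comment text `[mxnet-ci] run` and the three groups are taken from the proposal.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set

# Comment text that triggers a run, as given in the proposal.
TRIGGER_COMMENT = "[mxnet-ci] run"


@dataclass(frozen=True)
class Comment:
    """Hypothetical minimal view of a PR comment."""
    author: str
    body: str


def roles_for(user: str, pr_author: str,
              committers: FrozenSet[str], admins: FrozenSet[str]) -> Set[str]:
    """Return which of the three proposed groups the user belongs to."""
    roles = set()
    if user == pr_author:
        roles.add("pr_author")
    if user in committers:
        roles.add("committer")
    if user in admins:
        roles.add("jenkins_admin")
    return roles


def should_trigger_ci(comment: Comment, pr_author: str,
                      committers: FrozenSet[str], admins: FrozenSet[str]) -> bool:
    """True only for the exact trigger comment from an authorized user."""
    if comment.body.strip() != TRIGGER_COMMENT:
        return False
    return bool(roles_for(comment.author, pr_author, committers, admins))
```

Matching the whole comment, rather than a substring, is a design assumption here: it avoids starting a run when someone merely quotes the command in a discussion.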
Re: Update on upcoming changes to the MXNet CI: Jenkins
+1

These are good action items that should help alleviate part of the CI issues. The following comments are not to take away from your proposal. Move forward, assuming the community agrees.

I'd really like to see particular tests run only when the PR is touching a related part. While this is more effort, it would really make a major difference. Light research shows that projects have been doing this for quite some time, so it wouldn't be a new invention and deep exploration. I realize there are a lot of interdependencies and it would probably not work for everything.

But, what if we start small?
--> Docs pages (*.rst, *.md, *.html, *.js, *.css): don't trigger most tests, especially GPU and cross-platform tests.
--> Tutorials that have GPU requirements run their own validation tests, and tutorials that don't have GPU requirement don't get tested on GPUs.

Cheers,
Aaron
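Aaron's "start small" idea for docs-only changes could look something like the sketch below. The file extensions are the ones listed in his message, and the job names unix-cpu, unix-gpu and windows-gpu appear in the proposal; which jobs a docs-only PR should still run (here just a sanity job) is purely an assumption, as are the function names.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Extensions listed in the message for "docs pages".
DOC_SUFFIXES = {".rst", ".md", ".html", ".js", ".css"}

# Job names mentioned in the thread; the docs-only subset is an assumption.
ALL_JOBS = ["sanity", "unix-cpu", "unix-gpu", "windows-gpu"]
DOCS_ONLY_JOBS = ["sanity"]


def is_docs_only(changed_files: Iterable[str]) -> bool:
    """True if the PR touches at least one file and every file is a docs page."""
    files = list(changed_files)
    return bool(files) and all(
        PurePosixPath(f).suffix.lower() in DOC_SUFFIXES for f in files
    )


def select_jobs(changed_files: Iterable[str]) -> List[str]:
    """Skip GPU and cross-platform jobs when only docs pages changed."""
    return list(DOCS_ONLY_JOBS) if is_docs_only(changed_files) else list(ALL_JOBS)
```

A PR that mixes docs and source files falls back to the full job list, which keeps the rule conservative given the interdependencies mentioned above.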
Re: Update on upcoming changes to the MXNet CI: Jenkins
Can someone educate me on how to re-trigger a single test suite in CI?

On Thu, Feb 13, 2020 at 5:10 AM Lausen, Leonard wrote:

> Hi Denis,
>
> "pipeline" may be the wrong word; "job" may be the correct one. For example, committers can currently access a job page like
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-17521/5/
> press "Login" and then the restart button to retrigger only that job, obtaining
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-17521/6/
>
> This is correctly reported to GitHub, and the status will change from failed to passed, depending on the result of the new job.
>
> Best regards
> Leonard
>
> On Wed, 2020-02-12 at 20:23 +, Davydenko, Denis wrote:
> > This might or might not work, given that a GH PR is marked failed or passed based on the overall CI run status, not just a few builds from it. But it is a good suggestion to try out; we will evaluate whether it can be accomplished. Thanks!
> >
> > On 2/12/20, 11:05 AM, "Lausen, Leonard" wrote:
> >
> > > Thank you Denis for taking up this initiative. With respect to "Introduce per-PR CI bot" and the "[mxnet-ci] run" command: would it make sense to add "retriggering only failed pipelines" to the scope? For example, users could be asked to specify the name of the pipeline, or have "[mxnet-ci] run all" and "[mxnet-ci] run failed".
> > >
> > > In the current state, when retriggering all pipelines, it's likely that one of them will fail. Only by retriggering the failed pipeline alone is there a higher chance of arriving at a state where all pipelines have succeeded.
Re: Update on upcoming changes to the MXNet CI: Jenkins
Hi Denis,

"pipeline" may be the wrong word; "job" may be the correct one. For example, committers can currently access a job page like
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-17521/5/
press "Login" and then the restart button to retrigger only that job, obtaining
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-17521/6/

This is correctly reported to GitHub, and the status will change from failed to passed, depending on the result of the new job.

Best regards
Leonard

On Wed, 2020-02-12 at 20:23 +, Davydenko, Denis wrote:
> This might or might not work, given that a GH PR is marked failed or passed based on the overall CI run status, not just a few builds from it. But it is a good suggestion to try out; we will evaluate whether it can be accomplished. Thanks!
>
> On 2/12/20, 11:05 AM, "Lausen, Leonard" wrote:
>
> > Thank you Denis for taking up this initiative. With respect to "Introduce per-PR CI bot" and the "[mxnet-ci] run" command: would it make sense to add "retriggering only failed pipelines" to the scope? For example, users could be asked to specify the name of the pipeline, or have "[mxnet-ci] run all" and "[mxnet-ci] run failed".
> >
> > In the current state, when retriggering all pipelines, it's likely that one of them will fail. Only by retriggering the failed pipeline alone is there a higher chance of arriving at a state where all pipelines have succeeded.
Re: Update on upcoming changes to the MXNet CI: Jenkins
This might or might not work, given that a GH PR is marked failed or passed based on the overall CI run status, not just a few builds from it. But it is a good suggestion to try out; we will evaluate whether it can be accomplished. Thanks!

On 2/12/20, 11:05 AM, "Lausen, Leonard" wrote:

> Thank you Denis for taking up this initiative. With respect to "Introduce per-PR CI bot" and the "[mxnet-ci] run" command: would it make sense to add "retriggering only failed pipelines" to the scope? For example, users could be asked to specify the name of the pipeline, or have "[mxnet-ci] run all" and "[mxnet-ci] run failed".
>
> In the current state, when retriggering all pipelines, it's likely that one of them will fail. Only by retriggering the failed pipeline alone is there a higher chance of arriving at a state where all pipelines have succeeded.
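The concern about the PR being judged on the overall CI status, rather than on individual builds, can be illustrated with a simplified model in Python. The aggregation rule below (any failure fails the PR, anything still pending keeps it pending) is an assumption made for illustration; it is not confirmed to be how the MXNet Jenkins and GitHub integration computes the result.

```python
from typing import Mapping


def overall_status(job_statuses: Mapping[str, str]) -> str:
    """Collapse per-job statuses into one PR-level status (simplified model)."""
    states = set(job_statuses.values())
    if "failure" in states or "error" in states:
        return "failure"
    if not states or "pending" in states:
        return "pending"
    return "success"


# One failed job fails the whole PR ...
before = {"unix-cpu": "success", "unix-gpu": "failure", "windows-gpu": "success"}
# ... and re-running only that job is enough to change the overall result,
# provided the new per-job status replaces the old one.
after = {**before, "unix-gpu": "success"}
```

Under this model, retriggering a single failed job works as long as each job reports its own status, which is the case Leonard describes for the restart button.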
Re: Update on upcoming changes to the MXNet CI: Jenkins
We intend this bot to be very simplistic initially. But your idea is very interesting and we will consider if we can roll this out as phase 2.

On 2/12/20, 10:57 AM, "Przemysław Trędak" wrote:

> Hi Denis,
>
> Could this bot be smart enough to first do the sanity pipeline (to catch stuff like lint errors etc.) before launching the full thing?
>
> Thanks
> Przemek
Re: Update on upcoming changes to the MXNet CI: Jenkins
Thank you Denis for taking up this initiative. With respect to "Introduce per-PR CI bot" and the "[mxnet-ci] run" command: would it make sense to add "retriggering only failed pipelines" to the scope? For example, users could be asked to specify the name of the pipeline, or have "[mxnet-ci] run all" and "[mxnet-ci] run failed".

In the current state, when retriggering all pipelines, it's likely that one of them will fail. Only by retriggering the failed pipeline alone is there a higher chance of arriving at a state where all pipelines have succeeded.
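The command variants suggested in this message ("[mxnet-ci] run all", "[mxnet-ci] run failed", or naming a pipeline) could be parsed along these lines. This is a hedged sketch in Python: the function names are hypothetical, and treating a bare "[mxnet-ci] run" as "all" is an assumption, since the thread does not settle that.

```python
import re
from typing import List, Mapping, Optional

# "[mxnet-ci] run", optionally followed by one target word.
COMMAND_RE = re.compile(r"^\[mxnet-ci\]\s+run(?:\s+(\S+))?\s*$")


def parse_command(comment: str) -> Optional[str]:
    """Return the run target ("all", "failed", or a job name), or None."""
    match = COMMAND_RE.match(comment.strip())
    if match is None:
        return None
    # Assumption: a bare "[mxnet-ci] run" means "run everything".
    return match.group(1) or "all"


def jobs_to_run(target: str, statuses: Mapping[str, str]) -> List[str]:
    """Map a parsed target onto the jobs known for this PR."""
    if target == "all":
        return sorted(statuses)
    if target == "failed":
        return sorted(job for job, state in statuses.items() if state == "failure")
    # Otherwise treat the target as a job name; unknown names select nothing.
    return [target] if target in statuses else []
```

For example, with statuses `{"unix-cpu": "success", "unix-gpu": "failure"}`, the comment "[mxnet-ci] run failed" would select only unix-gpu, which is the case the message argues for.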
Re: Update on upcoming changes to the MXNet CI: Jenkins
Hi Denis,

Could this bot be smart enough to first do the sanity pipeline (to catch stuff like lint errors etc.) before launching the full thing?

Thanks
Przemek
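The "sanity first" idea could be sketched as a two-stage run in Python, where the expensive jobs start only if the cheap sanity check passes. The `run_staged` function and the job callables are hypothetical stand-ins; a real bot would trigger Jenkins jobs rather than call local functions.

```python
from typing import Callable, List, Sequence, Tuple

# A job is a name plus a callable that returns True on success.
Job = Tuple[str, Callable[[], bool]]


def run_staged(sanity: Callable[[], bool],
               full_jobs: Sequence[Job]) -> List[Tuple[str, bool]]:
    """Run the sanity stage first; launch the full jobs only if it passes."""
    results = [("sanity", sanity())]
    if not results[0][1]:
        # Lint-style failures stop here, so no GPU or cross-platform time is spent.
        return results
    for name, job in full_jobs:
        results.append((name, job()))
    return results
```

The trade-off is latency: a passing PR waits for the sanity stage before the full jobs start, in exchange for not paying for full runs on PRs that fail lint.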