Re: [Proposal] Stabilizing Apache MXNet CI build system
Chris,

The Windows slaves on Apache use EIPs, which makes it easier to replace, reboot, or reconnect these instances. However, for several reasons EIPs cannot be used for the Ubuntu slaves, and several workarounds are being explored. One such solution is to use the AWS CodeBuild plugin with Jenkins:

1. Jenkins has a plugin that integrates with AWS CodeBuild, which can be used to automate slave management.
2. The idea is to configure only the *Ubuntu* slaves using this plugin. This addresses the issue of EIPs and automation on Ubuntu.
3. Other platforms, such as Windows and edge devices, continue to be configured directly through Jenkins without using this plugin. This is OK since the Windows slaves use EIPs anyway.

At this point this is only at the POC stage.

Thanks,
Meghna Baijal

On Thu, Nov 9, 2017 at 12:23 PM, Meghna Baijal wrote:
> Pedro, I created a row for BuildBot in the doc. Do you want to add some pros and cons about it? It would be good to have all this information collected in one place.
>
> Meghna
>
> On Thu, Nov 9, 2017 at 4:40 AM, Larroy, Pedro wrote:
>> Thanks a lot for the document and leading the discussion.
>>
>> Does anybody have experience with a build system other than Jenkins? In the document we mention TeamCity as a possible option, and there’s also the second leading open source CI tool, “Buildbot”, which is not mentioned.
>>
>> I’m not sure we have strong evidence to make an informed decision about using something other than Jenkins. Also, from the document I gather that the negatives of Jenkins are pretty minor compared to the other frameworks.
>>
>> I would be interested to read if somebody has used any other framework in depth and is willing to vote against using Jenkins, so we can all cast an informed vote.
>>
>> I also don’t feel comfortable voting for Jenkins because it is the only one I know.
>>
>> Kind regards.
Re: [Proposal] Stabilizing Apache MXNet CI build system
Pedro, I created a row for BuildBot in the doc. Do you want to add some pros and cons about it? It would be good to have all this information collected in one place.

Meghna

On Thu, Nov 9, 2017 at 4:40 AM, Larroy, Pedro wrote:
> Thanks a lot for the document and leading the discussion.
>
> Does anybody have experience with a build system other than Jenkins? In the document we mention TeamCity as a possible option, and there’s also the second leading open source CI tool, “Buildbot”, which is not mentioned.
>
> I’m not sure we have strong evidence to make an informed decision about using something other than Jenkins. Also, from the document I gather that the negatives of Jenkins are pretty minor compared to the other frameworks.
>
> I would be interested to read if somebody has used any other framework in depth and is willing to vote against using Jenkins, so we can all cast an informed vote.
>
> I also don’t feel comfortable voting for Jenkins because it is the only one I know.
>
> Kind regards.
> --
>
> Pedro
>
> On 08/11/17 23:41, "Meghna Baijal" wrote:
>
>     Thanks for the active discussion on the document for the new CI for MXNet. Now that many of you have reviewed it, do you think I should start a vote on which framework the community wants to move forward with?
>
>     Thanks,
>     Meghna
>
>     On Mon, Nov 6, 2017 at 6:59 PM, Chris Olivier wrote:
>
>     > After a decision is reached, I am willing to add tasks to Apache MXNet JIRA
>     >
>     > On Mon, Nov 6, 2017 at 6:15 AM, Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
>     >
>     > > Thanks for setting up the document guys, looks like a solid basis to start to work on!
>     > >
>     > > Marco, Kellen and I have already added some comments.
>     > >
>     > > Pedro
>     > >
>     > > On Sun, Nov 5, 2017 at 3:43 AM, Meghna Baijal wrote:
>     > > > Kellen, Thank you for your comments in the doc.
Re: [Proposal] Stabilizing Apache MXNet CI build system
Thanks a lot for the document and leading the discussion.

Does anybody have experience with a build system other than Jenkins? In the document we mention TeamCity as a possible option, and there’s also the second leading open source CI tool, “Buildbot”, which is not mentioned.

I’m not sure we have strong evidence to make an informed decision about using something other than Jenkins. Also, from the document I gather that the negatives of Jenkins are pretty minor compared to the other frameworks.

I would be interested to read if somebody has used any other framework in depth and is willing to vote against using Jenkins, so we can all cast an informed vote.

I also don’t feel comfortable voting for Jenkins because it is the only one I know.

Kind regards.
--

Pedro

On 08/11/17 23:41, "Meghna Baijal" wrote:

    Thanks for the active discussion on the document for the new CI for MXNet. Now that many of you have reviewed it, do you think I should start a vote on which framework the community wants to move forward with?

    Thanks,
    Meghna

    On Mon, Nov 6, 2017 at 6:59 PM, Chris Olivier wrote:

    > After a decision is reached, I am willing to add tasks to Apache MXNet JIRA
    >
    > On Mon, Nov 6, 2017 at 6:15 AM, Pedro Larroy wrote:
    >
    > > Thanks for setting up the document guys, looks like a solid basis to start to work on!
    > >
    > > Marco, Kellen and I have already added some comments.
    > >
    > > Pedro
    > >
    > > On Sun, Nov 5, 2017 at 3:43 AM, Meghna Baijal wrote:
    > > > Kellen, Thank you for your comments in the doc.
    > > > Sure Steffen, I will continue to merge everyone’s comments into the doc and work with Pedro to finalize it.
    > > > And then we can vote on the options.
    > > >
    > > > Thanks,
    > > > Meghna Baijal
    > > >
    > > > On Sat, Nov 4, 2017 at 6:34 AM, Steffen Rochel <steffenroc...@gmail.com> wrote:
    > > >
    > > >> Sandeep and Meghna have been working in background collecting input and preparing a doc.
Re: [Proposal] Stabilizing Apache MXNet CI build system
Can you please clarify the AWS CodeBuild/Windows issue? Does the document state there is a workaround? I didn’t fully understand.

On Wed, Nov 8, 2017 at 9:32 PM sandeep krishnamurthy <sandeep.krishn...@gmail.com> wrote:

> Good work Meghna and thanks to community members for participating in the discussion and providing valuable inputs.
> Yes, please share the document again and ask for a vote and broader input.
>
> On Wed, Nov 8, 2017 at 2:43 PM, Chris Olivier wrote:
>
> > +1
> >
> > On Wed, Nov 8, 2017 at 2:40 PM Meghna Baijal wrote:
> >
> > > Thanks for the active discussion on the document for the new CI for MXNet. Now that many of you have reviewed it, do you think I should start a vote on which framework the community wants to move forward with?
> > >
> > > Thanks,
> > > Meghna
> > >
> > > On Mon, Nov 6, 2017 at 6:59 PM, Chris Olivier wrote:
> > >
> > > > After a decision is reached, I am willing to add tasks to Apache MXNet JIRA
> > > >
> > > > On Mon, Nov 6, 2017 at 6:15 AM, Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
> > > >
> > > > > Thanks for setting up the document guys, looks like a solid basis to start to work on!
> > > > >
> > > > > Marco, Kellen and I have already added some comments.
> > > > >
> > > > > Pedro
> > > > >
> > > > > On Sun, Nov 5, 2017 at 3:43 AM, Meghna Baijal wrote:
> > > > > > Kellen, Thank you for your comments in the doc.
> > > > > > Sure Steffen, I will continue to merge everyone’s comments into the doc and work with Pedro to finalize it.
> > > > > > And then we can vote on the options.
Re: [Proposal] Stabilizing Apache MXNet CI build system
Good work Meghna and thanks to community members for participating in the discussion and providing valuable inputs.
Yes, please share the document again and ask for a vote and broader input.

On Wed, Nov 8, 2017 at 2:43 PM, Chris Olivier wrote:

> +1
>
> On Wed, Nov 8, 2017 at 2:40 PM Meghna Baijal wrote:
>
> > Thanks for the active discussion on the document for the new CI for MXNet. Now that many of you have reviewed it, do you think I should start a vote on which framework the community wants to move forward with?
> >
> > Thanks,
> > Meghna
> >
> > On Mon, Nov 6, 2017 at 6:59 PM, Chris Olivier wrote:
> >
> > > After a decision is reached, I am willing to add tasks to Apache MXNet JIRA
> > >
> > > On Mon, Nov 6, 2017 at 6:15 AM, Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
> > >
> > > > Thanks for setting up the document guys, looks like a solid basis to start to work on!
> > > >
> > > > Marco, Kellen and I have already added some comments.
> > > >
> > > > Pedro
> > > >
> > > > On Sun, Nov 5, 2017 at 3:43 AM, Meghna Baijal wrote:
> > > > > Kellen, Thank you for your comments in the doc.
> > > > > Sure Steffen, I will continue to merge everyone’s comments into the doc and work with Pedro to finalize it.
> > > > > And then we can vote on the options.
> > > > >
> > > > > Thanks,
> > > > > Meghna Baijal
> > > > >
> > > > > On Sat, Nov 4, 2017 at 6:34 AM, Steffen Rochel <steffenroc...@gmail.com> wrote:
> > > > >
> > > > >> Sandeep and Meghna have been working in background collecting input and preparing a doc.
Re: [Proposal] Stabilizing Apache MXNet CI build system
+1

On Wed, Nov 8, 2017 at 2:40 PM Meghna Baijal wrote:

> Thanks for the active discussion on the document for the new CI for MXNet. Now that many of you have reviewed it, do you think I should start a vote on which framework the community wants to move forward with?
>
> Thanks,
> Meghna
>
> On Mon, Nov 6, 2017 at 6:59 PM, Chris Olivier wrote:
>
> > After a decision is reached, I am willing to add tasks to Apache MXNet JIRA
> >
> > On Mon, Nov 6, 2017 at 6:15 AM, Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
> >
> > > Thanks for setting up the document guys, looks like a solid basis to start to work on!
> > >
> > > Marco, Kellen and I have already added some comments.
> > >
> > > Pedro
> > >
> > > On Sun, Nov 5, 2017 at 3:43 AM, Meghna Baijal wrote:
> > > > Kellen, Thank you for your comments in the doc.
> > > > Sure Steffen, I will continue to merge everyone’s comments into the doc and work with Pedro to finalize it.
> > > > And then we can vote on the options.
> > > >
> > > > Thanks,
> > > > Meghna Baijal
> > > >
> > > > On Sat, Nov 4, 2017 at 6:34 AM, Steffen Rochel <steffenroc...@gmail.com> wrote:
> > > >
> > > >> Sandeep and Meghna have been working in background collecting input and preparing a doc. I suggest to drive discussion forward and would like to ask everybody to contribute to
> > > >> https://docs.google.com/document/d/17PEasQ2VWrXi2Cf7IGZSWGZMawxDkdlavUDASzUmLjk/edit?usp=sharing
> > > >>
> > > >> Let’s converge on requirements and architecture, so we can move forward with implementation.
> > > >>
> > > >> I would like to suggest for Pedro and Meghna to lead the discussion and help to resolve suggestions.
> > > >>
> > > >> I assume we need a vote once we are converged on a good draft to call it a plan and move forward with implementation.
Re: [Proposal] Stabilizing Apache MXNet CI build system
Thanks for the active discussion on the document for the new CI for MXNet. Now that many of you have reviewed it, do you think I should start a vote on which framework the community wants to move forward with?

Thanks,
Meghna

On Mon, Nov 6, 2017 at 6:59 PM, Chris Olivier wrote:

> After a decision is reached, I am willing to add tasks to Apache MXNet JIRA
>
> On Mon, Nov 6, 2017 at 6:15 AM, Pedro Larroy wrote:
>
> > Thanks for setting up the document guys, looks like a solid basis to start to work on!
> >
> > Marco, Kellen and I have already added some comments.
> >
> > Pedro
> >
> > On Sun, Nov 5, 2017 at 3:43 AM, Meghna Baijal wrote:
> > > Kellen, Thank you for your comments in the doc.
> > > Sure Steffen, I will continue to merge everyone’s comments into the doc and work with Pedro to finalize it.
> > > And then we can vote on the options.
> > >
> > > Thanks,
> > > Meghna Baijal
> > >
> > > On Sat, Nov 4, 2017 at 6:34 AM, Steffen Rochel <steffenroc...@gmail.com> wrote:
> > >
> > >> Sandeep and Meghna have been working in background collecting input and preparing a doc. I suggest to drive discussion forward and would like to ask everybody to contribute to
> > >> https://docs.google.com/document/d/17PEasQ2VWrXi2Cf7IGZSWGZMawxDkdlavUDASzUmLjk/edit?usp=sharing
> > >>
> > >> Let’s converge on requirements and architecture, so we can move forward with implementation.
> > >>
> > >> I would like to suggest for Pedro and Meghna to lead the discussion and help to resolve suggestions.
> > >>
> > >> I assume we need a vote once we are converged on a good draft to call it a plan and move forward with implementation. As we all are unhappy with the current CI situation I would also suggest a phased approach, so we can get back to reliable and efficient basic CI quickly and add advanced capabilities over time.
> > >>
> > >> Steffen
> > >>
> > >> On Wed, Nov 1, 2017 at 1:14 PM kellen sunderland <kellen.sunderl...@gmail.com> wrote:
> > >>
> > >> > Hey Henri, I think that's what a few of us are advocating. Running a set of quick tests as part of the PR process, and then a more detailed regression test suite periodically (say every 4 hours). This fits nicely into a tagging or 2 branch development system. Commits will be tagged (or merged into a stable branch) as soon as they pass the detailed regression testing.
> > >> >
> > >> > On Wed, Nov 1, 2017 at 9:07 PM, Hen wrote:
> > >> >
> > >> > > Random question - can the CI be split such that the Apache CI is doing a basic set of checks on that hardware, and is hooked to a PR, while there is a larger "Is trunk good for release?" test that is running periodically rather than on every PR?
> > >> > >
> > >> > > ie: do we need each PR to be run on varied hardware, or can we have this two tier approach?
> > >> > >
> > >> > > Hen
> > >> > >
> > >> > > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <sandeep.krishn...@gmail.com> wrote:
> > >> > >
> > >> > > > Hello all,
> > >> > > >
> > >> > > > I am hereby opening up a discussion thread on how we can stabilize Apache MXNet CI build system.
> > >> > > >
> > >> > > > Problems:
> > >> > > >
> > >> > > > Recently, we have seen following issues with Apache MXNet CI build systems:
> > >> > > >
> > >> > > >    1. Apache Jenkins master is overloaded and we see issues like - unable to trigger builds, difficult to load and view the blue ocean and other Jenkins build status page.
> > >> > > >    2. We are generating too many request/interaction on Apache Infra team.
> > >> > > >       1. Addition/deletion of new slave: Caused from scaling activity, recycling, troubleshooting or any actions leading to change of slave machines.
> > >> > > >       2. Plugins / other Jenkins Master configurations.
> > >> > > >       3. Experimentation on CI pipelines.
> > >> > > >    3. Harder to debug and resolve issues - Since access to master and slave is not with the same community, it requires Infra and community to dive deep together on all action items.
> > >> > > >
> > >> > > > Possible Solutions:
> > >> > > >
> > >> > > > ==
> > >> > > >
> > >> > > >    1. Can we set up a separate Jenkins CI build system for Apache MXNet outside Apache Infra?
> > >> > > >    2. Can we have a separate Jenkins Master in Apache Infra for MXNet?
> > >> > > >    3. Review design of current setup, refine and fill the gaps.
> > >> > > >
> > >> > > > @ Mentors/Infr
Re: [Proposal] Stabilizing Apache MXNet CI build system
After a decision is reached, I am willing to add tasks to Apache MXNet JIRA.

On Mon, Nov 6, 2017 at 6:15 AM, Pedro Larroy wrote:

> Thanks for setting up the document guys, looks like a solid basis to
> start to work on!
>
> Marco, Kellen and I have already added some comments.
>
> Pedro
Re: [Proposal] Stabilizing Apache MXNet CI build system
Thanks for setting up the document guys, looks like a solid basis to start to work on!

Marco, Kellen and I have already added some comments.

Pedro

On Sun, Nov 5, 2017 at 3:43 AM, Meghna Baijal wrote:

> Kellen, Thank you for your comments in the doc.
> Sure Steffen, I will continue to merge everyone’s comments into the doc and
> work with Pedro to finalize it.
> And then we can vote on the options.
>
> Thanks,
> Meghna Baijal
Re: [Proposal] Stabilizing Apache MXNet CI build system
Kellen, Thank you for your comments in the doc.
Sure Steffen, I will continue to merge everyone’s comments into the doc and work with Pedro to finalize it.
And then we can vote on the options.

Thanks,
Meghna Baijal

On Sat, Nov 4, 2017 at 6:34 AM, Steffen Rochel wrote:

> Sandeep and Meghna have been working in background collecting input and
> preparing a doc. I suggest to drive discussion forward and would like to
> ask everybody to contribute to
> https://docs.google.com/document/d/17PEasQ2VWrXi2Cf7IGZSWGZMawxDkdlavUDASzUmLjk/edit?usp=sharing
>
> Lets converge on requirements and architecture, so we can move forward with
> implementation.
>
> I would like to suggest for Pedro and Meghna to lead the discussion and
> help to resolve suggestions.
>
> I assume we need a vote once we are converged on a good draft to call it a
> plan and move forward with implementation. As we all are unhappy with the
> current CI situation I would also suggest a phased approach, so we can get
> back to reliable and efficient basic CI quickly and add advanced
> capabilities over time.
>
> Steffen
Re: [Proposal] Stabilizing Apache MXNet CI build system
Sandeep and Meghna have been working in background collecting input and preparing a doc. I suggest to drive discussion forward and would like to ask everybody to contribute to
https://docs.google.com/document/d/17PEasQ2VWrXi2Cf7IGZSWGZMawxDkdlavUDASzUmLjk/edit?usp=sharing

Let's converge on requirements and architecture, so we can move forward with implementation.

I would like to suggest for Pedro and Meghna to lead the discussion and help to resolve suggestions.

I assume we need a vote once we are converged on a good draft to call it a plan and move forward with implementation. As we all are unhappy with the current CI situation I would also suggest a phased approach, so we can get back to reliable and efficient basic CI quickly and add advanced capabilities over time.

Steffen

On Wed, Nov 1, 2017 at 1:14 PM kellen sunderland <kellen.sunderl...@gmail.com> wrote:

> Hey Henri, I think that's what a few of us are advocating. Running a set
> of quick tests as part of the PR process, and then a more detailed
> regression test suite periodically (say every 4 hours). This fits nicely
> into a tagging or 2 branch development system. Commits will be tagged (or
> merged into a stable branch) as soon as they pass the detailed regression
> testing.
Re: [Proposal] Stabilizing Apache MXNet CI build system
Hey Henri, I think that's what a few of us are advocating. Running a set of quick tests as part of the PR process, and then a more detailed regression test suite periodically (say every 4 hours). This fits nicely into a tagging or 2 branch development system. Commits will be tagged (or merged into a stable branch) as soon as they pass the detailed regression testing.

On Wed, Nov 1, 2017 at 9:07 PM, Hen wrote:

> Random question - can the CI be split such that the Apache CI is doing a
> basic set of checks on that hardware, and is hooked to a PR, while there is
> a larger "Is trunk good for release?" test that is running periodically
> rather than on every PR?
>
> ie: do we need each PR to be run on varied hardware, or can we have this
> two tier approach?
>
> Hen
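The two-tier scheme discussed in this message (quick checks on every PR, a detailed regression suite run periodically, and a tag or stable branch that advances only when that suite passes) can be sketched roughly as follows. This is only an illustration under stated assumptions: `promote_if_clean`, `run_regression_suite`, `simulate`, and the commit ids are hypothetical names, not part of the MXNet CI setup or of any Jenkins API.

```python
from typing import Callable, List, Optional, Set


def promote_if_clean(
    stable: Optional[str],
    head: str,
    run_regression_suite: Callable[[str], bool],
) -> Optional[str]:
    """Return the commit the stable tag/branch should point at after one run.

    `run_regression_suite` is a hypothetical stand-in for the periodic
    detailed regression job (for example, one run every 4 hours).
    """
    if run_regression_suite(head):
        return head  # head passed: tag it / fast-forward stable to it
    return stable    # head failed: the stable pointer stays where it was


def simulate(heads: List[str], passing: Set[str]) -> List[Optional[str]]:
    """Replay a sequence of periodic runs and record the stable pointer."""
    stable: Optional[str] = None
    history: List[Optional[str]] = []
    for head in heads:
        stable = promote_if_clean(stable, head, lambda c: c in passing)
        history.append(stable)
    return history


if __name__ == "__main__":
    # Hypothetical commit ids; "c2" is the only one that fails regression.
    print(simulate(["c1", "c2", "c3"], passing={"c1", "c3"}))
```

In this sketch a failing head never moves the stable pointer, so anyone who needs a known-good commit can always check out whatever the pointer names, while PR authors only wait for the quick tier.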
Re: [Proposal] Stabilizing Apache MXNet CI build system
Random question - can the CI be split such that the Apache CI is doing a basic set of checks on that hardware, and is hooked to a PR, while there is a larger "Is trunk good for release?" test that is running periodically rather than on every PR?

ie: do we need each PR to be run on varied hardware, or can we have this two tier approach?

Hen

On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <sandeep.krishn...@gmail.com> wrote:

> Hello all,
>
> I am hereby opening up a discussion thread on how we can stabilize Apache
> MXNet CI build system.
>
> Problems:
>
> Recently, we have seen following issues with Apache MXNet CI build systems:
>
>    1. Apache Jenkins master is overloaded and we see issues like - unable
>       to trigger builds, difficult to load and view the blue ocean and other
>       Jenkins build status page.
>    2. We are generating too many request/interaction on Apache Infra team.
>       1. Addition/deletion of new slave: Caused from scaling activity,
>          recycling, troubleshooting or any actions leading to change of slave
>          machines.
>       2. Plugins / other Jenkins Master configurations.
>       3. Experimentation on CI pipelines.
>    3. Harder to debug and resolve issues - Since access to master and slave
>       is not with the same community, it requires Infra and community to dive
>       deep together on all action items.
>
> Possible Solutions:
>
> ==
>
>    1. Can we set up a separate Jenkins CI build system for Apache MXNet
>       outside Apache Infra?
>    2. Can we have a separate Jenkins Master in Apache Infra for MXNet?
>    3. Review design of current setup, refine and fill the gaps.
>
> @ Mentors/Infra team/Community:
>
> ==
>
> Please provide your suggestions on how we can proceed further and work on
> stabilizing the CI build systems for MXNet.
>
> Also, if the community decides on separate Jenkins CI build system, what
> important points should be taken care of apart from the below:
>
>    1. Community being able to access the build page for build statuses.
>    2. Committers being able to login with apache credentials.
>    3. Hook setup from apache/incubator-mxnet repo to Jenkins master.
>
> Irrespective of the solution we come up, I think we should initiate a
> technical design discussion on how to setup the CI build system. Probably 1
> or 2 pager documents with the architecture and review with Infra and
> community members.
>
> ***There were few proposal and discussion on the slack channel, to reach
> wider community members, moving that discussion formally to this list.
>
> My Proposal: Option 1 - Set up separate Jenkins CI build system.
>
> Thanks,
>
> Sandeep
>
> --
> Sandeep Krishnamurthy
Re: [Proposal] Stabilizing Apache MXNet CI build system
Some inline thoughts.

On Wed, Nov 1, 2017 at 9:41 AM, Bhavin Thaker wrote:

> Few comments/suggestions:
>
> 1) Can we have this nice list of todo items on the Apache MXNet wiki page
> to track them better?
>
> 2) Can we have a set of owners for each set of tests and source code
> directory? One of the problems I have observed is that when there is a test
> failure, it is difficult to find an owner who will take the responsibility
> of fixing the test OR identifying the culprit code promptly -- this causes
> the master to continue to fail for many days.

On this one, we're all volunteers and there shouldn't be situations of "Bob's permission is needed to edit this file", or "We're waiting on Alice to do that work". The project as a whole owns this. Agreed that this can cause a tragedy of the commons, but raising the bar on being a committer to someone who has the privilege of 24/7 time on the project is worse.

As an employer of contributors, something you could do internally at Amazon is to identify experts who own (from Amazon's point of view) contributions to that area and they can be the ones you poke on an issue (internally).

> 3) Specifically, we need an owner for the Windows setup -- nobody seems to
> know much about it -- please feel free to correct me if required.

If there's no one in the community who can support it, then a) we should seek someone (help wanted etc) on the lists/website/twitter, and b) if that fails, we should move it to a contrib/deprecated path.

> 4) +1 to have a list of all feature requests on Jira or a similar commonly
> and easily accessible system.
>
> 5) -1 to the branching model -- I was the gatekeeper for the branching
> model at Informix for the database kernel code to be merged to master along
> with my day-job of being a database kernel engineer for around 9 months and
> hence have the opinion that a branching model just shifts the burden from
> one place to another. We don't have a dedicated team to do the branching
> model. If we really need a buildable master everyday, then we could just
> tag every successful build as last_clean_build on master -- use this tag to
> get a clean master at any time. How many Apache projects are doing
> development on separate branches?

Typically I would expect separate branch develop to happen when a project is experimenting with multiple futures. Most projects do have multiple branches (I'd guess typically only 2) to support bugfixes to older versions and new code on newer versions though.

> 6) FYI: Rahul (rahul003@) has fixed various warnings with this PR:
> https://github.com/apache/incubator-mxnet/pull/7109 and has a test added
> that fails for any warning found. We can build on top of his work.
>
> 7) FYI: For the unit-tests problems, Meghna identified that some of the
> unit-test run times have increased significantly in the recent builds. We
> need volunteers to help diagnose the root-cause here:
>
> Unit Test Task       Build #337   Build #500   Build #556
> Python 2: GPU win    25           38           40
> Python 3: GPU Win    15           38           46
> Python2: CPU         25           35           80
> Python3: CPU         14           28           72
> R: CPU               20           34           24
> R: GPU               5            24           24
>
> 8) Ensure that all PRs submitted have corresponding documentation on
> http://mxnet.io for it. It may be fine to have documentation follow the
> code changes as long as there is ownership that this task will be done in a
> timely manner. For example, I have requested the Nvidia team to submit PRs
> to update documentation on http://mxnet.io for the Volta changes to MXNet.

Why not expect documentation as a part of the PR?

> 9) Ensure that mega-PRs have some level of design or architecture
> document(s) shared on the Apache MXNet wiki. The mega-PR must have both
> unit-tests and nightly/integration tests submitted to demonstrate
> high-quality level.

+1. These are the ones that should be having a dev@ discussion.

> 10) Finally, how do we get ownership for code submitted to MXNet? When
> something fails in a code segment that only a small set of folks know
> about, what is the expected SLA for a response from them? When users deploy
> MXNet in production environments, they will expect some form of SLA for
> support and a patch release.

Users can expect what they want. What they get is best effort/good intentions. If they want someone to supply an SLA, then they can pay a vendor who repackages MXNet/builds upon MXNet for that service.

Part of the value of Open Source is that users can always fix the issue themselves, they are not beholden to a third party to fix it for them (and thus need an SLA). For something like OpenOffice there is an obvious issue there, many of its users would need longer to come up to speed to fix the issue and the likely reply; but for MXNet, many of its users do know how to code and don't need to go learn a programming language before starting to look at the bug. This is also wh
Re: [Proposal] Stabilizing Apache MXNet CI build system
To point 7) I did a little bit of measure / profiling of our test runs a week or two ago and came to the same conclusion. I assumed the slow downs were mostly due to new tests which had recently been added. There were quite a few gluon tests for example added, and I think they're fairly resource intensive.

On Wed, Nov 1, 2017 at 6:40 PM, kellen sunderland <kellen.sunderl...@gmail.com> wrote:

> Bhavin: I would add on point 5 that it doesn't always make sense to attach
> ownership for the broken integration test to the PR author. We're planning
> extensive integration tests on a variety of hardware. Some of these test
> failures won't be reproducible by most PR authors and the effort to resolve
> these failures should be delegated to a test owner. Agree with Pedro that
> this would be strictly fast-fwd merging from one branch to another after
> integration tests pass, so it shouldn't require much extra work beyond
> fixing failures.
>
> On Wed, Nov 1, 2017 at 6:35 PM, Pedro Larroy wrote:
>
>> Hi Bhavin
>>
>> Good suggestions.
>>
>> I wanted to respond to your point #5
>>
>> The promotion of integration to master would be done automatically by
>> jenkins once a commit passes the nightly tests. So it should not
>> impose any additional burden on the developers, as there is no manual
>> step involved / human gatekeeper.
>>
>> It would be equivalent to your suggestion with tags. You can do the
>> same with branches, anyway a git branch is just a pointer to some
>> commit, so I think we are talking about the same.
>>
>> Pedro.
Re: [Proposal] Stabilizing Apache MXNet CI build system
Bhavin: I would add on point 5 that it doesn't always make sense to attach ownership for the broken integration test to the PR author. We're planning extensive integration tests on a variety of hardware. Some of these test failures won't be reproducible by most PR authors, and the effort to resolve these failures should be delegated to a test owner. Agree with Pedro that this would be strictly fast-forward merging from one branch to another after integration tests pass, so it shouldn't require much extra work beyond fixing failures.

On Wed, Nov 1, 2017 at 6:35 PM, Pedro Larroy wrote:

> Hi Bhavin
>
> Good suggestions.
>
> I wanted to respond to your point #5
>
> The promotion of integration to master would be done automatically by
> jenkins once a commit passes the nightly tests. So it should not
> impose any additional burden on the developers, as there is no manual
> step involved / human gatekeeper.
>
> It would be equivalent to your suggestion with tags. You can do the
> same with branches, anyway a git branch is just a pointer to some
> commit, so I think we are talking about the same.
>
> Pedro.
>
> On Wed, Nov 1, 2017 at 5:41 PM, Bhavin Thaker wrote:
> > Few comments/suggestions:
> >
> > 1) Can we have this nice list of todo items on the Apache MXNet wiki page
> > to track them better?
> >
> > 2) Can we have a set of owners for each set of tests and source code
> > directory? One of the problems I have observed is that when there is a test
> > failure, it is difficult to find an owner who will take the responsibility
> > of fixing the test OR identifying the culprit code promptly -- this causes
> > the master to continue to fail for many days.
> >
> > 3) Specifically, we need an owner for the Windows setup -- nobody seems to
> > know much about it -- please feel free to correct me if required.
> >
> > 4) +1 to have a list of all feature requests on Jira or a similar commonly
> > and easily accessible system.
> > > > 5) -1 to the branching model -- I was the gatekeeper for the branching > > model at Informix for the database kernel code to be merged to master > along > > with my day-job of being a database kernel engineer for around 9 months > and > > hence have the opinion that a branching model just shifts the burden from > > one place to another. We don't have a dedicated team to do the branching > > model. If we really need a buildable master everyday, then we could just > > tag every successful build as last_clean_build on master -- use this tag > to > > get a clean master at any time. How many Apache projects are doing > > development on separate branches? > > > > 6) FYI: Rahul (rahul003@) has fixed various warnings with this PR: > > https://github.com/apache/incubator-mxnet/pull/7109 and has a test added > > that fails for any warning found. We can build on top of his work. > > > > 7) FYI: For the unit-tests problems, Meghna identified that some of the > > unit-test run times have increased significantly in the recent builds. We > > need volunteers to help diagnose the root-cause here: > > > > Unit Test Task > > > > Build #337 > > > > Build #500 > > > > Build #556 > > > > Python 2: GPU win > > > > 25 > > > > 38 > > > > 40 > > > > Python 3: GPU Win > > > > 15 > > > > 38 > > > > 46 > > > > Python2: CPU > > > > 25 > > > > 35 > > > > 80 > > > > Python3: CPU > > > > 14 > > > > 28 > > > > 72 > > > > R: CPU > > > > 20 > > > > 34 > > > > 24 > > > > R: GPU > > > > 5 > > > > 24 > > > > 24 > > > > > > 8) Ensure that all PRs submitted have corresponding documentation on > > http://mxnet.io for it. It may be fine to have documentation follow the > > code changes as long as there is ownership that this task will be done > in a > > timely manner. For example, I have requested the Nvidia team to submit > PRs > > to update documentation on http://mxnet.io for the Volta changes to > MXNet. 
> > > > > > 9) Ensure that mega-PRs have some level of design or architecture > > document(s) shared on the Apache MXNet wiki. The mega-PR must have both > > unit-tests and nightly/integration tests submitted to demonstrate > > high-quality level. > > > > > > 10) Finally, how do we get ownership for code submitted to MXNet? When > > something fails in a code segment that only a small set of folks know > > about, what is the expected SLA for a response from them? When users > deploy > > MXNet in production environments, they will expect some form of SLA for > > support and a patch release. > > > > > > Regards, > > Bhavin Thaker. > > > > > > > > > > > > > > On Wed, Nov 1, 2017 at 8:20 AM, Pedro Larroy < > pedro.larroy.li...@gmail.com> > > wrote: > > > >> +1 That would be great. > >> > >> On Mon, Oct 30, 2017 at 5:35 PM, Hen wrote: > >> > How about we ask for a new mxnet repo to store all the config in? > >> > > >> > On Fri, Oct 27, 2017 at 05:30 Pedro Larroy < > pedro.larroy.li...@gmail.com > >> > > >> > wrote: > >> > > >> >> Just to provide a high level overview of the ideas and proposals > >> >> comin
Re: [Proposal] Stabilizing Apache MXNet CI build system
Hi Bhavin Good suggestions. I wanted to respond to your point #5 The promotion of integration to master would be done automatically by jenkins once a commit passes the nightly tests. So it should not impose any additional burden on the developers, as there is no manual step involved / human gatekeeper. It would be equivalent to your suggestion with tags. You can do the same with branches, anyway a git branch is just a pointer to some commit, so I think we are talking about the same. Pedro. On Wed, Nov 1, 2017 at 5:41 PM, Bhavin Thaker wrote: > Few comments/suggestions: > > 1) Can we have this nice list of todo items on the Apache MXNet wiki page > to track them better? > > 2) Can we have a set of owners for each set of tests and source code > directory? One of the problems I have observed is that when there is a test > failure, it is difficult to find an owner who will take the responsibility > of fixing the test OR identifying the culprit code promptly -- this causes > the master to continue to fail for many days. > > 3) Specifically, we need an owner for the Windows setup -- nobody seems to > know much about it -- please feel free to correct me if required. > > 4) +1 to have a list of all feature requests on Jira or a similar commonly > and easily accessible system. > > 5) -1 to the branching model -- I was the gatekeeper for the branching > model at Informix for the database kernel code to be merged to master along > with my day-job of being a database kernel engineer for around 9 months and > hence have the opinion that a branching model just shifts the burden from > one place to another. We don't have a dedicated team to do the branching > model. If we really need a buildable master everyday, then we could just > tag every successful build as last_clean_build on master -- use this tag to > get a clean master at any time. How many Apache projects are doing > development on separate branches? 
> > 6) FYI: Rahul (rahul003@) has fixed various warnings with this PR: > https://github.com/apache/incubator-mxnet/pull/7109 and has a test added > that fails for any warning found. We can build on top of his work. > > 7) FYI: For the unit-tests problems, Meghna identified that some of the > unit-test run times have increased significantly in the recent builds. We > need volunteers to help diagnose the root-cause here: > > Unit Test Task > > Build #337 > > Build #500 > > Build #556 > > Python 2: GPU win > > 25 > > 38 > > 40 > > Python 3: GPU Win > > 15 > > 38 > > 46 > > Python2: CPU > > 25 > > 35 > > 80 > > Python3: CPU > > 14 > > 28 > > 72 > > R: CPU > > 20 > > 34 > > 24 > > R: GPU > > 5 > > 24 > > 24 > > > 8) Ensure that all PRs submitted have corresponding documentation on > http://mxnet.io for it. It may be fine to have documentation follow the > code changes as long as there is ownership that this task will be done in a > timely manner. For example, I have requested the Nvidia team to submit PRs > to update documentation on http://mxnet.io for the Volta changes to MXNet. > > > 9) Ensure that mega-PRs have some level of design or architecture > document(s) shared on the Apache MXNet wiki. The mega-PR must have both > unit-tests and nightly/integration tests submitted to demonstrate > high-quality level. > > > 10) Finally, how do we get ownership for code submitted to MXNet? When > something fails in a code segment that only a small set of folks know > about, what is the expected SLA for a response from them? When users deploy > MXNet in production environments, they will expect some form of SLA for > support and a patch release. > > > Regards, > Bhavin Thaker. > > > > > > > On Wed, Nov 1, 2017 at 8:20 AM, Pedro Larroy > wrote: > >> +1 That would be great. >> >> On Mon, Oct 30, 2017 at 5:35 PM, Hen wrote: >> > How about we ask for a new mxnet repo to store all the config in? 
>> > >> > On Fri, Oct 27, 2017 at 05:30 Pedro Larroy > > >> > wrote: >> > >> >> Just to provide a high level overview of the ideas and proposals >> >> coming from different sources for the requirements for testing and >> >> validation of builds: >> >> >> >> * Have terraform files for the testing infrastructure. Infrastructure >> >> as code (IaC). Minus not emulated / nor cloud based, embedded >> >> hardware. ("single command" replication of the testing infrastructure, >> >> no manual steps). >> >> >> >> * CI software based on Jenkins, unless someone thinks there's a better >> >> alternative. >> >> >> >> * Use autoscaling groups and improve staggered build + test steps to >> >> achieve higher parallelism and shorter feedback times. >> >> >> >> * Switch to a branching model based on stable master + integration >> >> branch. PRs are merged into dev/integration which runs extended >> >> nightly tests, which are >> >> then merged into master, preferably in an automated way after >> >> successful extended testing. >> >> Master is always tested, and always buildable. Release branches or >> >> tags in master as usual for releases. >>
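The point made in this reply, that promoting through a branch and promoting through a tag are the same pointer move, can be sketched roughly as below. This is only an illustration under stated assumptions: the helper name `promotion_commands`, the remote name `origin`, and the example commit id are hypothetical, and the function only builds git command lines instead of running them. It is not the project's actual Jenkins job.

```python
"""Sketch: promote a commit that passed the nightly tests, via branch or tag.

Hypothetical helper for illustration only. It builds git command lines so
the two approaches discussed in the thread can be compared side by side.
"""
from typing import List


def promotion_commands(commit: str, nightly_passed: bool,
                       use_tag: bool = False,
                       remote: str = "origin") -> List[List[str]]:
    """Return the git commands for a promotion, or [] if the tests failed."""
    if not nightly_passed:
        # Nothing is promoted: master (or the tag) keeps pointing at the
        # last commit that passed.
        return []
    if use_tag:
        # Tag variant: move last_clean_build to the passing commit.
        return [
            ["git", "tag", "--force", "last_clean_build", commit],
            ["git", "push", "--force", remote, "refs/tags/last_clean_build"],
        ]
    # Branch variant: fast-forward master to the passing commit.
    # --ff-only refuses to create a merge commit, so master only ever
    # moves forward along commits that were already tested.
    return [
        ["git", "checkout", "master"],
        ["git", "merge", "--ff-only", commit],
        ["git", "push", remote, "master"],
    ]


if __name__ == "__main__":
    # "abc1234" is a made-up commit id.
    for cmd in promotion_commands("abc1234", nightly_passed=True):
        print(" ".join(cmd))
```

In both variants no human gatekeeper is involved: a failed nightly run simply yields no commands, and the pointer stays where it was.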
Re: [Proposal] Stabilizing Apache MXNet CI build system
Few comments/suggestions:

1) Can we have this nice list of todo items on the Apache MXNet wiki page
to track them better?

2) Can we have a set of owners for each set of tests and source code
directory? One of the problems I have observed is that when there is a
test failure, it is difficult to find an owner who will take the
responsibility of fixing the test OR identifying the culprit code
promptly -- this causes the master to continue to fail for many days.

3) Specifically, we need an owner for the Windows setup -- nobody seems to
know much about it -- please feel free to correct me if required.

4) +1 to have a list of all feature requests on Jira or a similar commonly
and easily accessible system.

5) -1 to the branching model. For around 9 months at Informix I was the
gatekeeper for the branching model used to merge database kernel code to
master, alongside my day job as a database kernel engineer, and hence have
the opinion that a branching model just shifts the burden from one place
to another. We don't have a dedicated team to do the branching model. If
we really need a buildable master every day, then we could just tag every
successful build as last_clean_build on master -- use this tag to get a
clean master at any time. How many Apache projects are doing development
on separate branches?

6) FYI: Rahul (rahul003@) has fixed various warnings with this PR:
https://github.com/apache/incubator-mxnet/pull/7109 and has added a test
that fails for any warning found. We can build on top of his work.

7) FYI: For the unit-test problems, Meghna identified that some of the
unit-test run times have increased significantly in the recent builds. We
need volunteers to help diagnose the root cause here:

Unit Test Task       Build #337   Build #500   Build #556
Python 2: GPU Win        25           38           40
Python 3: GPU Win        15           38           46
Python 2: CPU            25           35           80
Python 3: CPU            14           28           72
R: CPU                   20           34           24
R: GPU                    5           24           24

8) Ensure that all submitted PRs have corresponding documentation on
http://mxnet.io. It may be fine to have documentation follow the code
changes as long as there is ownership that this task will be done in a
timely manner. For example, I have requested the Nvidia team to submit PRs
to update documentation on http://mxnet.io for the Volta changes to MXNet.

9) Ensure that mega-PRs have some level of design or architecture
document(s) shared on the Apache MXNet wiki. The mega-PR must have both
unit tests and nightly/integration tests submitted to demonstrate a high
level of quality.

10) Finally, how do we get ownership for code submitted to MXNet? When
something fails in a code segment that only a small set of folks know
about, what is the expected SLA for a response from them? When users
deploy MXNet in production environments, they will expect some form of SLA
for support and a patch release.

Regards,
Bhavin Thaker.

On Wed, Nov 1, 2017 at 8:20 AM, Pedro Larroy wrote:

> +1 That would be great.
>
> On Mon, Oct 30, 2017 at 5:35 PM, Hen wrote:
> > How about we ask for a new mxnet repo to store all the config in?
> >
> > On Fri, Oct 27, 2017 at 05:30 Pedro Larroy wrote:
> >
> >> Just to provide a high level overview of the ideas and proposals
> >> coming from different sources for the requirements for testing and
> >> validation of builds:
> >>
> >> * Have terraform files for the testing infrastructure. Infrastructure
> >> as code (IaC). Minus not emulated / nor cloud based, embedded
> >> hardware. ("single command" replication of the testing infrastructure,
> >> no manual steps).
> >>
> >> * CI software based on Jenkins, unless someone thinks there's a better
> >> alternative.
> >>
> >> * Use autoscaling groups and improve staggered build + test steps to
> >> achieve higher parallelism and shorter feedback times.
> >>
> >> * Switch to a branching model based on stable master + integration
> >> branch. PRs are merged into dev/integration which runs extended
> >> nightly tests, which are
> >> then merged into master, preferably in an automated way after
> >> successful extended testing.
> >> Master is always tested, and always buildable. Release branches or
> >> tags in master as usual for releases.
> >>
> >> * Build + test feedback time targeting less than 15 minutes.
> >> (Currently a build in a 16x core takes 7m). This involves lot of
> >> refactoring of tests, move expensive tests / big smoke tests to
> >> nightlies on the integration branch, also tests on IoT devices / power
> >> and performance regressions...
> >>
> >> * Add code coverage and other quality metrics.
> >>
> >> * Eliminate warnings and treat warnings as errors. We have spent time
> >> tracking down "undefined behaviour" bugs that could have been caught
> >> by compiler warnings.
> >>
> >> Is there something I'm missing or additional things that come to your
> >> mind that you would wish to add?
> >>
> >> Pedro.
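To make the growth in the point 7 figures easier to compare when picking what to diagnose first, here is a small sketch that ranks the tasks by how much their run time grew between Build #337 and Build #556. The numbers are copied from the message; the thread does not state their unit, so the code treats them as plain run times. The function names are illustrative only.

```python
# Run times per unit-test task for builds #337, #500 and #556, copied from
# point 7 of the message. The thread does not state the unit.
RUN_TIMES = {
    "Python 2: GPU Win": (25, 38, 40),
    "Python 3: GPU Win": (15, 38, 46),
    "Python 2: CPU": (25, 35, 80),
    "Python 3: CPU": (14, 28, 72),
    "R: CPU": (20, 34, 24),
    "R: GPU": (5, 24, 24),
}


def slowdown(first: float, last: float) -> float:
    """Ratio of the latest run time to the earliest one."""
    return last / first


def rank_by_slowdown(run_times):
    """Return (task, ratio) pairs, largest slowdown first."""
    ratios = {task: slowdown(times[0], times[-1])
              for task, times in run_times.items()}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for task, ratio in rank_by_slowdown(RUN_TIMES):
        print(f"{task:<18} {ratio:.1f}x")
```

On these figures the Python 3 CPU task grew the most (72 / 14, a bit over 5x) and the R CPU task the least (24 / 20 = 1.2x), which may help decide where volunteers should look first.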
Re: [Proposal] Stabilizing Apache MXNet CI build system
+1 That would be great. On Mon, Oct 30, 2017 at 5:35 PM, Hen wrote: > How about we ask for a new mxnet repo to store all the config in? > > On Fri, Oct 27, 2017 at 05:30 Pedro Larroy > wrote: > >> Just to provide a high level overview of the ideas and proposals >> coming from different sources for the requirements for testing and >> validation of builds: >> >> * Have terraform files for the testing infrastructure. Infrastructure >> as code (IaC). Minus not emulated / nor cloud based, embedded >> hardware. ("single command" replication of the testing infrastructure, >> no manual steps). >> >> * CI software based on Jenkins, unless someone thinks there's a better >> alternative. >> >> * Use autoscaling groups and improve staggered build + test steps to >> achieve higher parallelism and shorter feedback times. >> >> * Switch to a branching model based on stable master + integration >> branch. PRs are merged into dev/integration which runs extended >> nightly tests, which are >> then merged into master, preferably in an automated way after >> successful extended testing. >> Master is always tested, and always buildable. Release branches or >> tags in master as usual for releases. >> >> * Build + test feedback time targeting less than 15 minutes. >> (Currently a build in a 16x core takes 7m). This involves lot of >> refactoring of tests, move expensive tests / big smoke tests to >> nightlies on the integration branch, also tests on IoT devices / power >> and performance regressions... >> >> * Add code coverage and other quality metrics. >> >> * Eliminate warnings and treat warnings as errors. We have spent time >> tracking down "undefined behaviour" bugs that could have been caught >> by compiler warnings. >> >> Is there something I'm missing or additional things that come to your >> mind that you would wish to add? >> >> Pedro. >>
Re: [Proposal] Stabilizing Apache MXNet CI build system
How about we ask for a new mxnet repo to store all the config in? On Fri, Oct 27, 2017 at 05:30 Pedro Larroy wrote: > Just to provide a high level overview of the ideas and proposals > coming from different sources for the requirements for testing and > validation of builds: > > * Have terraform files for the testing infrastructure. Infrastructure > as code (IaC). Minus not emulated / nor cloud based, embedded > hardware. ("single command" replication of the testing infrastructure, > no manual steps). > > * CI software based on Jenkins, unless someone thinks there's a better > alternative. > > * Use autoscaling groups and improve staggered build + test steps to > achieve higher parallelism and shorter feedback times. > > * Switch to a branching model based on stable master + integration > branch. PRs are merged into dev/integration which runs extended > nightly tests, which are > then merged into master, preferably in an automated way after > successful extended testing. > Master is always tested, and always buildable. Release branches or > tags in master as usual for releases. > > * Build + test feedback time targeting less than 15 minutes. > (Currently a build in a 16x core takes 7m). This involves lot of > refactoring of tests, move expensive tests / big smoke tests to > nightlies on the integration branch, also tests on IoT devices / power > and performance regressions... > > * Add code coverage and other quality metrics. > > * Eliminate warnings and treat warnings as errors. We have spent time > tracking down "undefined behaviour" bugs that could have been caught > by compiler warnings. > > Is there something I'm missing or additional things that come to your > mind that you would wish to add? > > Pedro. >
Re: [Proposal] Stabilizing Apache MXNet CI build system
+1 On Sat, Oct 28, 2017 at 5:29 AM, Chris Olivier wrote: > IMHO, it would be nice to have Apache JIRA for mxnet where these sort of > feature requests could be entered and publicly tracked and possibly taken > up by whoever has cycles with the JIRA helping to avoid overlapping work. > After the core system works, of course. WDYT? > > On Fri, Oct 27, 2017 at 5:30 AM, Pedro Larroy < > pedro.larroy.li...@gmail.com> > wrote: > > > Just to provide a high level overview of the ideas and proposals > > coming from different sources for the requirements for testing and > > validation of builds: > > > > * Have terraform files for the testing infrastructure. Infrastructure > > as code (IaC). Minus not emulated / nor cloud based, embedded > > hardware. ("single command" replication of the testing infrastructure, > > no manual steps). > > > > * CI software based on Jenkins, unless someone thinks there's a better > > alternative. > > > > * Use autoscaling groups and improve staggered build + test steps to > > achieve higher parallelism and shorter feedback times. > > > > * Switch to a branching model based on stable master + integration > > branch. PRs are merged into dev/integration which runs extended > > nightly tests, which are > > then merged into master, preferably in an automated way after > > successful extended testing. > > Master is always tested, and always buildable. Release branches or > > tags in master as usual for releases. > > > > * Build + test feedback time targeting less than 15 minutes. > > (Currently a build in a 16x core takes 7m). This involves lot of > > refactoring of tests, move expensive tests / big smoke tests to > > nightlies on the integration branch, also tests on IoT devices / power > > and performance regressions... > > > > * Add code coverage and other quality metrics. > > > > * Eliminate warnings and treat warnings as errors. We have spent time > > tracking down "undefined behaviour" bugs that could have been caught > > by compiler warnings. 
> > > > Is there something I'm missing or additional things that come to your > > mind that you would wish to add? > > > > Pedro. > > >
Re: [Proposal] Stabilizing Apache MXNet CI build system
IMHO, it would be nice to have Apache JIRA for mxnet where these sort of feature requests could be entered and publicly tracked and possibly taken up by whoever has cycles with the JIRA helping to avoid overlapping work. After the core system works, of course. WDYT? On Fri, Oct 27, 2017 at 5:30 AM, Pedro Larroy wrote: > Just to provide a high level overview of the ideas and proposals > coming from different sources for the requirements for testing and > validation of builds: > > * Have terraform files for the testing infrastructure. Infrastructure > as code (IaC). Minus not emulated / nor cloud based, embedded > hardware. ("single command" replication of the testing infrastructure, > no manual steps). > > * CI software based on Jenkins, unless someone thinks there's a better > alternative. > > * Use autoscaling groups and improve staggered build + test steps to > achieve higher parallelism and shorter feedback times. > > * Switch to a branching model based on stable master + integration > branch. PRs are merged into dev/integration which runs extended > nightly tests, which are > then merged into master, preferably in an automated way after > successful extended testing. > Master is always tested, and always buildable. Release branches or > tags in master as usual for releases. > > * Build + test feedback time targeting less than 15 minutes. > (Currently a build in a 16x core takes 7m). This involves lot of > refactoring of tests, move expensive tests / big smoke tests to > nightlies on the integration branch, also tests on IoT devices / power > and performance regressions... > > * Add code coverage and other quality metrics. > > * Eliminate warnings and treat warnings as errors. We have spent time > tracking down "undefined behaviour" bugs that could have been caught > by compiler warnings. > > Is there something I'm missing or additional things that come to your > mind that you would wish to add? > > Pedro. >
Re: [Proposal] Stabilizing Apache MXNet CI build system
Just to provide a high level overview of the ideas and proposals coming
from different sources for the requirements for testing and validation
of builds:

* Have terraform files for the testing infrastructure, i.e.
  infrastructure as code (IaC), giving "single command" replication of
  the testing infrastructure with no manual steps. This excludes
  embedded hardware that is neither emulated nor cloud based.

* CI software based on Jenkins, unless someone thinks there's a better
  alternative.

* Use autoscaling groups and improve staggered build + test steps to
  achieve higher parallelism and shorter feedback times.

* Switch to a branching model based on stable master + integration
  branch. PRs are merged into dev/integration, which runs extended
  nightly tests; commits that pass are then merged into master,
  preferably in an automated way. Master is always tested and always
  buildable. Release branches or tags in master as usual for releases.

* Build + test feedback time targeting less than 15 minutes (currently
  a build on a 16-core machine takes 7m). This involves a lot of
  refactoring of tests: moving expensive tests / big smoke tests to
  nightlies on the integration branch, along with tests on IoT devices
  and power and performance regression tests.

* Add code coverage and other quality metrics.

* Eliminate warnings and treat warnings as errors. We have spent time
  tracking down "undefined behaviour" bugs that could have been caught
  by compiler warnings.

Is there something I'm missing, or are there additional things that
come to mind that you would wish to add?

Pedro.
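As a rough illustration of the 15-minute feedback target, the sketch below splits tests between the PR run and the nightly run using a simple greedy rule: the 7-minute build is subtracted from the target, and the cheapest tests are kept in the PR run until the remaining budget is used up. The two constants come from the proposal; the function name, the test names, and their durations are hypothetical, and the sketch assumes tests run one after another.

```python
from typing import Dict, List, Tuple

FEEDBACK_TARGET_MIN = 15.0  # feedback target from the proposal
BUILD_MIN = 7.0             # current build time quoted in the proposal


def split_tests(durations: Dict[str, float],
                budget_min: float = FEEDBACK_TARGET_MIN - BUILD_MIN
                ) -> Tuple[List[str], List[str]]:
    """Greedy split: cheapest tests stay in the PR run until the budget is used.

    Assumes tests run sequentially; everything that does not fit is moved
    to the nightly run on the integration branch.
    """
    pr_suite: List[str] = []
    nightly: List[str] = []
    used = 0.0
    for name, minutes in sorted(durations.items(), key=lambda item: item[1]):
        if used + minutes <= budget_min:
            pr_suite.append(name)
            used += minutes
        else:
            nightly.append(name)
    return pr_suite, nightly


if __name__ == "__main__":
    # Hypothetical test groups and durations in minutes.
    example = {
        "lint": 0.5,
        "unit_cpu": 3.0,
        "unit_gpu": 4.0,
        "iot_device": 12.0,
        "smoke_large": 20.0,
    }
    pr_suite, nightly = split_tests(example)
    print("PR run: ", pr_suite)
    print("Nightly:", nightly)
```

A real split would also have to account for parallel workers and for tests that must run on every PR regardless of cost; the greedy rule is only a starting point for the discussion.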
Re: [Proposal] Stabilizing Apache MXNet CI build system
Thanks for your input, guys. I think we are all on a good track to get this fixed. I'm confident that Meghna and Marco are going to drive this to success.

We are collecting ideas and requirements for the document on how we will revamp the testing infrastructure.

My only question right now is where to store this document to collaborate. I don't seem to have permissions in confluence to edit the wiki:

https://cwiki.apache.org/confluence/display/MXNET/Continuous+Integration

Should we otherwise use a shared google doc, a github wiki, or something else? Please advise.

Pedro.

On Thu, Oct 26, 2017 at 8:14 AM, Meghna Baijal wrote:

> Thanks Sandeep for driving this discussion. I am also in contact with Pedro
> and his team to include their requirements.
> And thank you Sebastian, I will let you know!
>
> Meghna
>
> On Wed, Oct 25, 2017 at 11:05 PM, Sebastian wrote:
>
> > @meghana @pedro let me know if you need someone with a mentor hat to open
> > tickets or send mail to infra, happy to help here.
> >
> > Best,
> > Sebastian
> >
> > On 25.10.2017 23:18, sandeep krishnamurthy wrote:
> >
> >> Thank you, everyone, for the discussion, proposal, and the vote.
> >>
> >> Here majority community members see current CI system for Apache MXNet is
> >> having issues in scaling and diverse test environments. And the common
> >> suggestion is to have a separate CI setup for Apache MXNet.
> >>
> >> Following are the next steps:
> >>
> >> 1. Meghana proposed she would like to take the lead on this and come up
> >> with an initial tech design write up covering requirements, use-cases,
> >> alternate solutions and a proposed solution on how we could set up the CI
> >> system for MXNet.
> >> 2. This tech design will be reviewed in the community and following that,
> >> collaborate with Infra team and mentors to complete setup in the
> >> integration of the new system with Repo and Website and more.
> >>
> >> @Pedro Larry - We should sync up on understanding how we can unify the set
> >> up you have for various devices and the new set up being proposed and
> >> built. Ideally, we should have a unified CI setup for the project
> >> accessible to the community.
> >>
> >> Regards,
> >> Sandeep
> >>
> >> On Mon, Oct 23, 2017 at 7:29 AM, Pedro Larroy <pedro.larroy.li...@gmail.com>
> >> wrote:
> >>
> >>> +1
> >>>
> >>> We (with Kellen and Marco) are already working on a CI system that verifies
> >>> MXNet on devices, so far a work in progress, but at least we are checking
> >>> that the build is sane on Android, different arm flavors and ubuntu, also
> >>> building PRs. So far we are still working on having the unit tests pass on
> >>> some architectures like Jetson TX2 and ARM / Raspberry PI.
> >>>
> >>> http://ci.mxnet.amazon-ml.com/
> >>>
> >>> Agree with Steffen on creating a document with requirements and high level
> >>> architecture. Also I would like to have quicker feedback and as we
> >>> discussed before, saner unit tests. I think there's a big and nontrivial
> >>> amount of effort required here.
> >>>
> >>> Pedro.
> >>>
> >>> On Mon, Oct 23, 2017 at 6:43 AM, Steffen Rochel <steffenroc...@gmail.com>
> >>> wrote:
> >>>
> >>>> +1
> >>>> I support Option 1 - Set up separate Jenkins CI build system. While the
> >>>> Apache service is appropriate for some projects, our experience over the
> >>>> last 6 months has not been meeting the needs of the MXNet (incubating)
> >>>> project. AWS has been and will continue provide resources for such
> >>>> project.
> >>>> Agree we should create a document summarizing the requirements and high
> >>>> level architecture, which should answer the question of Jenkins or
> >>>> alternative.
> >>>>
> >>>> Steffen
> >>>>
> >>>> On Sat, Oct 21, 2017 at 6:51 PM shiwen hu wrote:
> >>>>
> >>>>> +1
> >>>>>
> >>>>> 2017-10-21 9:48 GMT+08:00 Chris Olivier :
> >>>>>
> >>>>>> Ok, just looking for anything that can cut a task out if possible. I do
> >>>>>> support not using Apache Jenkins server anyMore -- it's really not been
> >>>>>> working out for various reasons. But having a person full time is
> >>>>>> something that Steffen would have to address, I imagine.
> >>>>>>
> >>>>>> On Fri, Oct 20, 2017 at 6:03 PM Mu Li wrote:
> >>>>>>
> >>>>>>> I didn't see the clear advantage of CodePipline over pure jenkins,
> >>>>>>> because we don't need to deploy here.
> >>>>>>>
> >>>>>>> On Fri, Oct 20, 2017 at 5:34 PM, Chris Olivier <cjolivie...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> CodePipeline, then. You can point it to Jenkins instances.
> >>>>>>>>
> >>>>>>>> On Fri, Oct 20, 2017 at 4:49 PM Mu Li wrote:
> >>>>>>>>
> >>>>>>>>> AWS CodeBuild is not an option. It doesn't support GPU instances, mac os x,
> >>>>>>>>> and win
Re: [Proposal] Stabilizing Apache MXNet CI build system
Thanks Sandeep for driving this discussion. I am also in contact with
Pedro and his team to include their requirements.

And thank you Sebastian, I will let you know!

Meghna

On Wed, Oct 25, 2017 at 11:05 PM, Sebastian wrote:

> @meghna @pedro let me know if you need someone with a mentor hat to
> open tickets or send mail to infra, happy to help here.
>
> Best,
> Sebastian
>
> On 25.10.2017 23:18, sandeep krishnamurthy wrote:
>
>> Thank you, everyone, for the discussion, proposal, and the vote.
>>
>> Here, the majority of community members see that the current CI
>> system for Apache MXNet has issues with scaling and diverse test
>> environments. And the common suggestion is to have a separate CI
>> setup for Apache MXNet.
>>
>> Following are the next steps:
>>
>> 1. Meghna proposed she would like to take the lead on this and come
>>    up with an initial tech design write up covering requirements,
>>    use-cases, alternate solutions and a proposed solution on how we
>>    could set up the CI system for MXNet.
>> 2. This tech design will be reviewed in the community and following
>>    that, collaborate with Infra team and mentors to complete setup in
>>    the integration of the new system with Repo and Website and more.
>>
>> @Pedro Larroy - We should sync up on understanding how we can unify
>> the set up you have for various devices and the new set up being
>> proposed and built. Ideally, we should have a unified CI setup for
>> the project accessible to the community.
>>
>> Regards,
>> Sandeep
>>
>> On Mon, Oct 23, 2017 at 7:29 AM, Pedro Larroy
>> <pedro.larroy.li...@gmail.com> wrote:
>>
>>> +1
>>>
>>> We (with Kellen and Marco) are already working on a CI system that
>>> verifies MXNet on devices, so far a work in progress, but at least
>>> we are checking that the build is sane on Android, different arm
>>> flavors and ubuntu, also building PRs. So far we are still working
>>> on having the unit tests pass on some architectures like Jetson TX2
>>> and ARM / Raspberry PI.
Re: [Proposal] Stabilizing Apache MXNet CI build system
@meghna @pedro let me know if you need someone with a mentor hat to open
tickets or send mail to infra, happy to help here.

Best,
Sebastian

On 25.10.2017 23:18, sandeep krishnamurthy wrote:

> Thank you, everyone, for the discussion, proposal, and the vote.
>
> Here, the majority of community members see that the current CI
> system for Apache MXNet has issues with scaling and diverse test
> environments. And the common suggestion is to have a separate CI
> setup for Apache MXNet.
>
> Following are the next steps:
>
> 1. Meghna proposed she would like to take the lead on this and come
>    up with an initial tech design write up covering requirements,
>    use-cases, alternate solutions and a proposed solution on how we
>    could set up the CI system for MXNet.
> 2. This tech design will be reviewed in the community and following
>    that, collaborate with Infra team and mentors to complete setup in
>    the integration of the new system with Repo and Website and more.
>
> @Pedro Larroy - We should sync up on understanding how we can unify
> the set up you have for various devices and the new set up being
> proposed and built. Ideally, we should have a unified CI setup for
> the project accessible to the community.
>
> Regards,
> Sandeep
>
> On Mon, Oct 23, 2017 at 7:29 AM, Pedro Larroy wrote:
>
>> +1
>>
>> We (with Kellen and Marco) are already working on a CI system that
>> verifies MXNet on devices, so far a work in progress, but at least
>> we are checking that the build is sane on Android, different arm
>> flavors and ubuntu, also building PRs. So far we are still working
>> on having the unit tests pass on some architectures like Jetson TX2
>> and ARM / Raspberry PI.
>>
>> http://ci.mxnet.amazon-ml.com/
>>
>> Agree with Steffen on creating a document with requirements and high
>> level architecture. Also I would like to have quicker feedback and
>> as we discussed before, saner unit tests. I think there's a big and
>> nontrivial amount of effort required here.
>>
>> Pedro.
Re: [Proposal] Stabilizing Apache MXNet CI build system
Thank you, everyone, for the discussion, proposal, and the vote.

Here, the majority of community members see that the current CI system
for Apache MXNet has issues with scaling and diverse test environments.
And the common suggestion is to have a separate CI setup for Apache
MXNet.

Following are the next steps:

1. Meghna proposed she would like to take the lead on this and come up
   with an initial tech design write up covering requirements,
   use-cases, alternate solutions and a proposed solution on how we
   could set up the CI system for MXNet.
2. This tech design will be reviewed in the community and following
   that, collaborate with Infra team and mentors to complete setup in
   the integration of the new system with Repo and Website and more.

@Pedro Larroy - We should sync up on understanding how we can unify the
set up you have for various devices and the new set up being proposed
and built. Ideally, we should have a unified CI setup for the project
accessible to the community.

Regards,
Sandeep

On Mon, Oct 23, 2017 at 7:29 AM, Pedro Larroy wrote:

> +1
>
> We (with Kellen and Marco) are already working on a CI system that
> verifies MXNet on devices, so far a work in progress, but at least we
> are checking that the build is sane on Android, different arm flavors
> and ubuntu, also building PRs. So far we are still working on having
> the unit tests pass on some architectures like Jetson TX2 and ARM /
> Raspberry PI.
>
> http://ci.mxnet.amazon-ml.com/
>
> Agree with Steffen on creating a document with requirements and high
> level architecture. Also I would like to have quicker feedback and as
> we discussed before, saner unit tests. I think there's a big and
> nontrivial amount of effort required here.
>
> Pedro.
>
> On Mon, Oct 23, 2017 at 6:43 AM, Steffen Rochel wrote:
>
>> +1
>> I support Option 1 - Set up separate Jenkins CI build system.
Re: [Proposal] Stabilizing Apache MXNet CI build system
+1

We (with Kellen and Marco) are already working on a CI system that
verifies MXNet on devices, so far a work in progress, but at least we
are checking that the build is sane on Android, different arm flavors
and ubuntu, also building PRs. So far we are still working on having
the unit tests pass on some architectures like Jetson TX2 and ARM /
Raspberry PI.

http://ci.mxnet.amazon-ml.com/

Agree with Steffen on creating a document with requirements and high
level architecture. Also I would like to have quicker feedback and as we
discussed before, saner unit tests. I think there's a big and nontrivial
amount of effort required here.

Pedro.

On Mon, Oct 23, 2017 at 6:43 AM, Steffen Rochel wrote:

> +1
> I support Option 1 - Set up separate Jenkins CI build system. While
> the Apache service is appropriate for some projects, our experience
> over the last 6 months has not been meeting the needs of the MXNet
> (incubating) project. AWS has been and will continue to provide
> resources for such project.
> Agree we should create a document summarizing the requirements and
> high level architecture, which should answer the question of Jenkins
> or alternative.
>
> Steffen
>
> On Sat, Oct 21, 2017 at 6:51 PM shiwen hu wrote:
>
>> +1
>>
>> 2017-10-21 9:48 GMT+08:00 Chris Olivier:
>>
>>> Ok, just looking for anything that can cut a task out if possible.
>>> I do support not using Apache Jenkins server anymore - it's really
>>> not been working out for various reasons. But having a person full
>>> time is something that Steffen would have to address, I imagine.
>>>
>>> On Fri, Oct 20, 2017 at 6:03 PM Mu Li wrote:
>>>
>>>> I didn't see the clear advantage of CodePipeline over pure
>>>> jenkins, because we don't need to deploy here.
>>>>
>>>> On Fri, Oct 20, 2017 at 5:34 PM, Chris Olivier
>>>> <cjolivie...@gmail.com> wrote:
>>>>
>>>>> CodePipeline, then. You can point it to Jenkins instances.
Re: [Proposal] Stabilizing Apache MXNet CI build system
+1
I support Option 1 - Set up separate Jenkins CI build system. While the
Apache service is appropriate for some projects, our experience over the
last 6 months has not been meeting the needs of the MXNet (incubating)
project. AWS has been and will continue to provide resources for such
project.
Agree we should create a document summarizing the requirements and high
level architecture, which should answer the question of Jenkins or
alternative.

Steffen

On Sat, Oct 21, 2017 at 6:51 PM shiwen hu wrote:

> +1
>
> 2017-10-21 9:48 GMT+08:00 Chris Olivier:
>
>> Ok, just looking for anything that can cut a task out if possible.
>> I do support not using Apache Jenkins server anymore - it's really
>> not been working out for various reasons. But having a person full
>> time is something that Steffen would have to address, I imagine.
>>
>> On Fri, Oct 20, 2017 at 6:03 PM Mu Li wrote:
>>
>>> I didn't see the clear advantage of CodePipeline over pure jenkins,
>>> because we don't need to deploy here.
>>>
>>> On Fri, Oct 20, 2017 at 5:34 PM, Chris Olivier wrote:
>>>
>>>> CodePipeline, then. You can point it to Jenkins instances.
>>>>
>>>> On Fri, Oct 20, 2017 at 4:49 PM Mu Li wrote:
>>>>
>>>>> AWS CodeBuild is not an option. It doesn't support GPU instances,
>>>>> mac os x, and windows. Not to mention the edge devices.
>>>>>
>>>>> On Fri, Oct 20, 2017 at 4:07 PM, Chris Olivier
>>>>> <cjolivie...@gmail.com> wrote:
>>>>>
>>>>>> Why don't we look into fully managed AWS CodeBuild? It maintains
>>>>>> everything. It's also compatible with Jenkins.
Re: [Proposal] Stabilizing Apache MXNet CI build system
+1

2017-10-21 9:48 GMT+08:00 Chris Olivier:

> Ok, just looking for anything that can cut a task out if possible. I
> do support not using Apache Jenkins server anymore - it's really not
> been working out for various reasons. But having a person full time is
> something that Steffen would have to address, I imagine.
>
> On Fri, Oct 20, 2017 at 6:03 PM Mu Li wrote:
>
>> I didn't see the clear advantage of CodePipeline over pure jenkins,
>> because we don't need to deploy here.
>>
>> On Fri, Oct 20, 2017 at 5:34 PM, Chris Olivier wrote:
>>
>>> CodePipeline, then. You can point it to Jenkins instances.
>>>
>>> On Fri, Oct 20, 2017 at 4:49 PM Mu Li wrote:
>>>
>>>> AWS CodeBuild is not an option. It doesn't support GPU instances,
>>>> mac os x, and windows. Not to mention the edge devices.
>>>>
>>>> On Fri, Oct 20, 2017 at 4:07 PM, Chris Olivier
>>>> <cjolivie...@gmail.com> wrote:
>>>>
>>>>> Why don't we look into fully managed AWS CodeBuild? It maintains
>>>>> everything. It's also compatible with Jenkins.
>>>>>
>>>>> On Fri, Oct 20, 2017 at 1:51 PM, Tianqi Chen
>>>>> <tqc...@cs.washington.edu> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> Tianqi
>>>>>>
>>>>>> On Fri, Oct 20, 2017 at 1:39 PM Mu Li wrote:
>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> It seems that the Apache CI is quite overloaded these days, and
>>>>>>> MXNet's CI pipeline is too complex to run there. In addition,
>>>>>>> we may need to add more devices, e.g. macpro and raspberry pi,
>>>>>>> into the server, and more tasks such as pip build. It means a
>>>>>>> lot of requests to the Infra team.
>>>>>>>
>>>>>>> We can reuse our previous Jenkins server at
>>>>>>> http://ci.mxnet.io/. But we probably need a dedicated developer
>>>>>>> to maintain it.
Re: [Proposal] Stabilizing Apache MXNet CI build system
Ok, just looking for anything that can cut a task out if possible. I do
support not using Apache Jenkins server anymore - it's really not been
working out for various reasons. But having a person full time is
something that Steffen would have to address, I imagine.

On Fri, Oct 20, 2017 at 6:03 PM Mu Li wrote:

> I didn't see the clear advantage of CodePipeline over pure jenkins,
> because we don't need to deploy here.
>
> On Fri, Oct 20, 2017 at 5:34 PM, Chris Olivier wrote:
>
>> CodePipeline, then. You can point it to Jenkins instances.
>>
>> On Fri, Oct 20, 2017 at 4:49 PM Mu Li wrote:
>>
>>> AWS CodeBuild is not an option. It doesn't support GPU instances,
>>> mac os x, and windows. Not to mention the edge devices.
>>>
>>> On Fri, Oct 20, 2017 at 4:07 PM, Chris Olivier wrote:
>>>
>>>> Why don't we look into fully managed AWS CodeBuild? It maintains
>>>> everything. It's also compatible with Jenkins.
>>>>
>>>> On Fri, Oct 20, 2017 at 1:51 PM, Tianqi Chen
>>>> <tqc...@cs.washington.edu> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> Tianqi
>>>>>
>>>>> On Fri, Oct 20, 2017 at 1:39 PM Mu Li wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> It seems that the Apache CI is quite overloaded these days, and
>>>>>> MXNet's CI pipeline is too complex to run there. In addition, we
>>>>>> may need to add more devices, e.g. macpro and raspberry pi, into
>>>>>> the server, and more tasks such as pip build. It means a lot of
>>>>>> requests to the Infra team.
>>>>>>
>>>>>> We can reuse our previous Jenkins server at http://ci.mxnet.io/.
>>>>>> But we probably need a dedicated developer to maintain it.
>>>>>>
>>>>>> On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy
>>>>>> <sandeep.krishn...@gmail.com> wrote:
>>>>>>
>>>>>>> Hello all,
>>>>>>>
>>>>>>> I am hereby opening up a discussion thread on how we can
>>>>>>> stabilize Apache MXNet CI build system.
>>>>>>>
>>>>>>> Problems:
>>>>>>>
>>>>>>> Recently, we have seen following issues with Apache MXNet CI
>>>>>>> build systems:
>>>>>>>
>>>>>>>    1. Apache Jenkins master is overloaded and we see issues
>>>>>>>    like - unable to trigger builds, difficult to load and view
>>>>>>>    the blue ocean and other Jenkins build status page.
>>>>>>>    2. We are generating too many requests/interactions for the
>>>>>>>    Apache Infra team.
>>>>>>>       1. Addition/deletion of new slave: Caused by scaling
>>>>>>>       activity, recycling, troubleshooting or any actions
>>>>>>>       leading to change of slave machines.
>>>>>>>       2. Plugins / other Jenkins Master configurations.
>>>>>>>       3. Experimentation on CI pipelines.
>>>>>>>    3. Harder to debug and resolve issues - Since access to
>>>>>>>    master and slave is not with the same community, it requires
>>>>>>>    Infra and community to dive deep together on all action
>>>>>>>    items.
>>>>>>>
>>>>>>> Possible Solutions:
>>>>>>>
>>>>>>> ==
>>>>>>>
>>>>>>>    1. Can we set up a separate Jenkins CI build system for
>>>>>>>    Apache MXNet outside Apache Infra?
>>>>>>>    2. Can we have a separate Jenkins Master in Apache Infra for
>>>>>>>    MXNet?
>>>>>>>    3. Review design of current setup, refine and fill the gaps.
>>>>>>>
>>>>>>> @ Mentors/Infra team/Community:
>>>>>>>
>>>>>>> ==
>>>>>>>
>>>>>>> Please provide your suggestions on how we can proceed further
>>>>>>> and work on stabilizing the CI build systems for MXNet.
>>>>>>>
>>>>>>> Also, if the community decides on separate Jenkins CI build
>>>>>>> system, what important points should be taken care of apart
>>>>>>> from the below:
>>>>>>>
>>>>>>>    1. Community being able to access the build page for build
>>>>>>>    statuses.
>>>>>>>    2. Committers being able to login with apache credentials.
>>>>>>>    3. Hook setup from apache/incubator-mxnet repo to Jenkins
>>>>>>>    master.
>>>>>>>
>>>>>>> Irrespective of the solution we come up with, I think we should
>>>>>>> initiate a technical design discussion on how to setup the CI
>>>>>>> build system. Probably 1 or 2 pager documents with the
>>>>>>> architecture and review with Infra and community members.
>>>>>>>
>>>>>>> ***There were a few proposals and discussions on the slack
>>>>>>> channel, to reach wider community members, moving tha
Re: [Proposal] Stabilizing Apache MXNet CI build system
I didn't see the clear advantage of CodePipeline over pure Jenkins, because we don't need to deploy here.

On Fri, Oct 20, 2017 at 5:34 PM, Chris Olivier wrote:
> CodePipeline, then. You can point it to Jenkins instances.
Re: [Proposal] Stabilizing Apache MXNet CI build system
CodePipeline, then. You can point it to Jenkins instances.

On Fri, Oct 20, 2017 at 4:49 PM Mu Li wrote:
> AWS CodeBuild is not an option. It doesn't support GPU instances, Mac OS X,
> or Windows, not to mention the edge devices.
Re: [Proposal] Stabilizing Apache MXNet CI build system
AWS CodeBuild is not an option. It doesn't support GPU instances, Mac OS X, or Windows, not to mention the edge devices.

On Fri, Oct 20, 2017 at 4:07 PM, Chris Olivier wrote:
> Why don't we look into fully managed AWS CodeBuild? It maintains
> everything. It's also compatible with Jenkins.
Re: [Proposal] Stabilizing Apache MXNet CI build system
I believe that Mu already started that discussion about using the old mxnet.io Jenkins server. I expect that deciding whether to replace would hinge in large part upon what it would be replaced with.

On Fri, Oct 20, 2017 at 4:30 PM, sandeep krishnamurthy <sandeep.krishn...@gmail.com> wrote:
> Chris: If the community decides to go with a separate setup, then there will
> be a tech design discussion, and proposals such as CodeCommit / Jenkins /
> Travis will be covered and discussed.
>
> Thanks,
> Sandeep
Re: [Proposal] Stabilizing Apache MXNet CI build system
Chris: If the community decides to go with a separate setup, then there will be a tech design discussion, and proposals such as CodeCommit / Jenkins / Travis will be covered and discussed.

Thanks,
Sandeep

On Fri, Oct 20, 2017 at 4:22 PM, Seb Kiureghian wrote:
> But the feather can definitely be added once MXNet graduates.
Re: [Proposal] Stabilizing Apache MXNet CI build system
But the feather can definitely be added once MXNet graduates.

On Fri, Oct 20, 2017 at 4:21 PM, Seb Kiureghian wrote:
> The feather can only be used by Top Level Projects.
Re: [Proposal] Stabilizing Apache MXNet CI build system
The feather can only be used by Top Level Projects.

On Fri, Oct 20, 2017 at 4:19 PM, Chris Olivier wrote:
> When the word Apache is in the Hadoop logo (not always), it includes the
> feather and color scheme.
Re: [Proposal] Stabilizing Apache MXNet CI build system
When the word Apache is in the Hadoop logo (not always), it includes the feather and color scheme.

On Fri, Oct 20, 2017 at 4:18 PM, Chris Olivier wrote:
> Thanks.
>
> Is there any way to work the feather into it?
>
> i.e. https://goo.gl/images/BU4dnG
Re: [Proposal] Stabilizing Apache MXNet CI build system
Thanks.

Is there any way to work the feather into it?

i.e. https://goo.gl/images/BU4dnG

On Fri, Oct 20, 2017 at 4:11 PM, Seb Kiureghian wrote:
> https://imgur.com/a/aADkA
Re: [Proposal] Stabilizing Apache MXNet CI build system
https://imgur.com/a/aADkA

On Fri, Oct 20, 2017 at 4:07 PM, Chris Olivier wrote:
> Why don't we look into fully managed AWS CodeBuild? It maintains
> everything. It's also compatible with Jenkins.
Re: [Proposal] Stabilizing Apache MXNet CI build system
Why don't we look into fully managed AWS CodeBuild? It maintains everything. It's also compatible with Jenkins.

On Fri, Oct 20, 2017 at 1:51 PM, Tianqi Chen wrote:
> +1
>
> Tianqi
Re: [Proposal] Stabilizing Apache MXNet CI build system
+1

Tianqi

On Fri, Oct 20, 2017 at 1:39 PM Mu Li wrote:
> +1
>
> It seems that the Apache CI is quite overloaded these days, and MXNet's CI
> pipeline is too complex to run there. In addition, we may need to add more
> devices, e.g. Mac Pro and Raspberry Pi, to the server, and more tasks such
> as the pip build. That means a lot of requests to the Infra team.
>
> We can reuse our previous Jenkins server at http://ci.mxnet.io/, but we
> would probably need a dedicated developer to maintain it.
Re: [Proposal] Stabilizing Apache MXNet CI build system
+1

It seems that the Apache CI is quite overloaded these days, and MXNet's CI pipeline is too complex to run there. In addition, we may need to add more devices, e.g. Mac Pro and Raspberry Pi, to the server, and more tasks such as the pip build. That means a lot of requests to the Infra team.

We can reuse our previous Jenkins server at http://ci.mxnet.io/, but we would probably need a dedicated developer to maintain it.

On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <sandeep.krishn...@gmail.com> wrote:
> Hello all,
>
> I am hereby opening up a discussion thread on how we can stabilize Apache
> MXNet CI build system.
Fwd: [Proposal] Stabilizing Apache MXNet CI build system
Hello all,

I am opening a discussion thread on how we can stabilize the Apache MXNet CI build system.

Problems:

Recently, we have seen the following issues with the Apache MXNet CI build system:

1. The Apache Jenkins master is overloaded, and we see issues such as being unable to trigger builds and difficulty loading and viewing the Blue Ocean and other Jenkins build status pages.
2. We are generating too many requests to, and interactions with, the Apache Infra team:
   1. Addition/deletion of slaves: caused by scaling activity, recycling, troubleshooting, or any other action that changes the slave machines.
   2. Plugins and other Jenkins master configuration.
   3. Experimentation on CI pipelines.
3. Issues are harder to debug and resolve: since access to the master and the slaves is not held by the same community, Infra and the community have to dive deep together on every action item.

Possible Solutions:

1. Can we set up a separate Jenkins CI build system for Apache MXNet outside Apache Infra?
2. Can we have a separate Jenkins master in Apache Infra for MXNet?
3. Review the design of the current setup, refine it, and fill the gaps.

@ Mentors/Infra team/Community:

Please provide your suggestions on how we can proceed and work on stabilizing the CI build system for MXNet.

Also, if the community decides on a separate Jenkins CI build system, what important points should be taken care of apart from the ones below?

1. The community being able to access the build page for build statuses.
2. Committers being able to log in with Apache credentials.
3. Hook setup from the apache/incubator-mxnet repo to the Jenkins master.

Irrespective of the solution we come up with, I think we should initiate a technical design discussion on how to set up the CI build system, probably a one- or two-page document with the architecture, reviewed with Infra and community members.

***There were a few proposals and discussions on the Slack channel; to reach the wider community, I am moving that discussion formally to this list.

My Proposal: Option 1 - Set up a separate Jenkins CI build system.

Thanks,

Sandeep

--
Sandeep Krishnamurthy