Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-11-09 Thread Meghna Baijal
Chris,
The Windows slaves on Apache use EIPs, which makes it easier to
replace, reboot, or reconnect these instances. However, for several
reasons EIPs cannot be used for the Ubuntu slaves.
Several workarounds are being explored for this. One such solution is
to use the AWS CodeBuild plugin with Jenkins:

1. Jenkins has a plugin that integrates with AWS CodeBuild, which can
be used to automate slave management.
2. The idea is to configure only the *Ubuntu* slaves using this plugin.
This addresses the issue of EIPs and automation on Ubuntu.
3. Other platforms, such as Windows and edge devices, continue to be
configured directly through Jenkins without using this plugin. This is OK
since the Windows slaves use EIPs anyway.

At this point, this is only at the POC stage.

Thanks,
Meghna Baijal

On Thu, Nov 9, 2017 at 12:23 PM, Meghna Baijal 
wrote:

> Pedro, I created a row for BuildBot in the doc. Do you want to add some
> pros and cons about it? It would be good to have all this information
> collected in one place.
>
> Meghna
>
> On Thu, Nov 9, 2017 at 4:40 AM, Larroy, Pedro  wrote:
>
>> Thanks a lot for the document and leading the discussion.
>>
>> Does anybody have experience with a build system other than Jenkins? In
>> the document we mention Teamcity as a possible option, and there’s also the
>> second leading open source CI tool “Buildbot” which is not mentioned.
>>
>> I’m not sure if we have strong evidence to have an informed decision
>> about using something other than Jenkins, also from the document I get that
>> the negatives of Jenkins are pretty minor compared to the other frameworks.
>>
>> I would be interested to read if somebody has used any other framework in
>> depth and is willing to vote against using Jenkins so we can all do an
>> informed vote.
>>
>> I don’t feel comfortable voting for Jenkins because is the only one I
>> know as well.
>>
>> Kind regards.
>> --
>>
>> Pedro
>>
>> On 08/11/17 23:41, "Meghna Baijal"  wrote:
>>
>> Thanks for the active discussion on the document for the new CI for
>> MXNet.
>> Now that many of you have reviewed it, do you think I should start a
>> vote
>> on which framework the community wants to move forward with ?
>>
>> Thanks,
>> Meghna
>>
>> On Mon, Nov 6, 2017 at 6:59 PM, Chris Olivier 
>> wrote:
>>
>> > After a decision is reached, i am willing to add tasks to Apache
>> MXNet JIRA
>> >
>> > On Mon, Nov 6, 2017 at 6:15 AM, Pedro Larroy <
>> pedro.larroy.li...@gmail.com
>> > >
>> > wrote:
>> >
>> > > Thanks for setting up the document guys, looks like a solid basis
>> to
>> > > start to work on!
>> > >
>> > > Marco, Kellen and I have already added some comments.
>> > >
>> > > Pedro
>> > >
>> > >
>> > > On Sun, Nov 5, 2017 at 3:43 AM, Meghna Baijal
>> > >  wrote:
>> > > > Kellen, Thank you for your comments in the doc.
>> > > > Sure Steffen, I will continue to merge everyone’s comments into
>> the doc
>> > > and
>> > > > work with Pedro to finalize it.
>> > > > And then we can vote on the options.
>> > > >
>> > > > Thanks,
>> > > > Meghna Baijal
>> > > >
>> > > >
>> > > > On Sat, Nov 4, 2017 at 6:34 AM, Steffen Rochel <
>> > steffenroc...@gmail.com>
>> > > > wrote:
>> > > >
>> > > >> Sandeep and Meghna have been working in background collecting
>> input
>> > and
>> > > >> preparing a doc. I suggest to drive discussion forward and
>> would like
>> > to
>> > > >> ask everybody to contribute to
>> > > >> https://docs.google.com/document/d/17PEasQ2VWrXi2Cf7IGZSWGZM
>> awxDk
>> > > >> dlavUDASzUmLjk/edit?usp=sharing
>> > > >>
>> > > >> Lets converge on requirements and architecture, so we can move
>> forward
>> > > with
>> > > >> implementation.
>> > > >>
>> > > >> I would like to suggest for Pedro  and Meghna to lead the
>> discussion
>> > and
>> > > >> help to resolve suggestions.
>> > > >>
>> > > >> I assume we need a vote once we are converged on a good draft
>> to call
>> > > it a
>> > > >> plan and move forward with implementation. As we all are
>> unhappy with
>> > > the
>> > > >> current CI situation I would also suggest a phased approach,
>> so we can
>> > > get
>> > > >> back to reliable and efficient basic CI quickly and add
>> advanced
>> > > >> capabilities over time.
>> > > >>
>> > > >> Steffen
>> > > >>
>> > > >> On Wed, Nov 1, 2017 at 1:14 PM kellen sunderland <
>> > > >> kellen.sunderl...@gmail.com> wrote:
>> > > >>
>> > > >> > Hey Henri, I think that's what a few of us are advocating.
>> Running
>> > a
>> > > set
>> > > >> > of quick tests as part of the PR process, and then a more
>> detailed
>> > > >> > regression test suite periodically (say every 4 hours). This
>> fits
>> > > nicely
>> > > >> > into a tagging or 2 branch development system.  Commi

Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-11-09 Thread Meghna Baijal
Pedro, I created a row for BuildBot in the doc. Do you want to add some
pros and cons about it? It would be good to have all this information
collected in one place.

Meghna

On Thu, Nov 9, 2017 at 4:40 AM, Larroy, Pedro  wrote:

> Thanks a lot for the document and leading the discussion.
>
> Does anybody have experience with a build system other than Jenkins? In
> the document we mention Teamcity as a possible option, and there’s also the
> second leading open source CI tool “Buildbot” which is not mentioned.
>
> I’m not sure if we have strong evidence to have an informed decision about
> using something other than Jenkins, also from the document I get that the
> negatives of Jenkins are pretty minor compared to the other frameworks.
>
> I would be interested to read if somebody has used any other framework in
> depth and is willing to vote against using Jenkins so we can all do an
> informed vote.
>
> I don’t feel comfortable voting for Jenkins because is the only one I know
> as well.
>
> Kind regards.
> --
>
> Pedro
>
> On 08/11/17 23:41, "Meghna Baijal"  wrote:
>
> Thanks for the active discussion on the document for the new CI for
> MXNet.
> Now that many of you have reviewed it, do you think I should start a
> vote
> on which framework the community wants to move forward with ?
>
> Thanks,
> Meghna
>
> On Mon, Nov 6, 2017 at 6:59 PM, Chris Olivier 
> wrote:
>
> > After a decision is reached, i am willing to add tasks to Apache
> MXNet JIRA
> >
> > On Mon, Nov 6, 2017 at 6:15 AM, Pedro Larroy <
> pedro.larroy.li...@gmail.com
> > >
> > wrote:
> >
> > > Thanks for setting up the document guys, looks like a solid basis
> to
> > > start to work on!
> > >
> > > Marco, Kellen and I have already added some comments.
> > >
> > > Pedro
> > >
> > >
> > > On Sun, Nov 5, 2017 at 3:43 AM, Meghna Baijal
> > >  wrote:
> > > > Kellen, Thank you for your comments in the doc.
> > > > Sure Steffen, I will continue to merge everyone’s comments into
> the doc
> > > and
> > > > work with Pedro to finalize it.
> > > > And then we can vote on the options.
> > > >
> > > > Thanks,
> > > > Meghna Baijal
> > > >
> > > >
> > > > On Sat, Nov 4, 2017 at 6:34 AM, Steffen Rochel <
> > steffenroc...@gmail.com>
> > > > wrote:
> > > >
> > > >> Sandeep and Meghna have been working in background collecting
> input
> > and
> > > >> preparing a doc. I suggest to drive discussion forward and
> would like
> > to
> > > >> ask everybody to contribute to
> > > >> https://docs.google.com/document/d/
> 17PEasQ2VWrXi2Cf7IGZSWGZMawxDk
> > > >> dlavUDASzUmLjk/edit?usp=sharing
> > > >>
> > > >> Lets converge on requirements and architecture, so we can move
> forward
> > > with
> > > >> implementation.
> > > >>
> > > >> I would like to suggest for Pedro  and Meghna to lead the
> discussion
> > and
> > > >> help to resolve suggestions.
> > > >>
> > > >> I assume we need a vote once we are converged on a good draft
> to call
> > > it a
> > > >> plan and move forward with implementation. As we all are
> unhappy with
> > > the
> > > >> current CI situation I would also suggest a phased approach, so
> we can
> > > get
> > > >> back to reliable and efficient basic CI quickly and add advanced
> > > >> capabilities over time.
> > > >>
> > > >> Steffen
> > > >>
> > > >> On Wed, Nov 1, 2017 at 1:14 PM kellen sunderland <
> > > >> kellen.sunderl...@gmail.com> wrote:
> > > >>
> > > >> > Hey Henri, I think that's what a few of us are advocating.
> Running
> > a
> > > set
> > > >> > of quick tests as part of the PR process, and then a more
> detailed
> > > >> > regression test suite periodically (say every 4 hours). This
> fits
> > > nicely
> > > >> > into a tagging or 2 branch development system.  Commits will
> be
> > tagged
> > > >> (or
> > > >> > merged into a stable branch) as soon as they pass the detailed
> > > regression
> > > >> > testing.
> > > >> >
> > > >> > On Wed, Nov 1, 2017 at 9:07 PM, Hen 
> wrote:
> > > >> >
> > > >> > > Random question - can the CI be split such that the Apache
> CI is
> > > doing
> > > >> a
> > > >> > > basic set of checks on that hardware, and is hooked to a
> PR, while
> > > >> there
> > > >> > is
> > > >> > > a larger "Is trunk good for release?" test that is running
> > > periodically
> > > >> > > rather than on every PR?
> > > >> > >
> > > >> > > ie: do we need each PR to be run on varied hardware, or can
> we
> > have
> > > >> this
> > > >> > > two tier approach?
> > > >> > >
> > > >> > > Hen
> > > >> > >
> > > >> > > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
> > > >> > > sandeep.krishn...@gmail.com> wrote:
> > 

Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-11-09 Thread Larroy, Pedro
Thanks a lot for the document and for leading the discussion.

Does anybody have experience with a build system other than Jenkins? In the
document we mention TeamCity as a possible option, and there’s also the second
leading open source CI tool, “Buildbot”, which is not mentioned.

I’m not sure we have strong enough evidence to make an informed decision about
using something other than Jenkins. Also, from the document I gather that the
negatives of Jenkins are pretty minor compared to the other frameworks.

I would be interested to hear whether somebody has used any other framework in
depth and is willing to vote against using Jenkins, so that we can all cast an
informed vote.

I also don’t feel comfortable voting for Jenkins, because it is the only one I
know.

Kind regards.
-- 

Pedro

On 08/11/17 23:41, "Meghna Baijal"  wrote:

Thanks for the active discussion on the document for the new CI for MXNet.
Now that many of you have reviewed it, do you think I should start a vote
on which framework the community wants to move forward with ?

Thanks,
Meghna

On Mon, Nov 6, 2017 at 6:59 PM, Chris Olivier  wrote:

> After a decision is reached, i am willing to add tasks to Apache MXNet 
JIRA
>
> On Mon, Nov 6, 2017 at 6:15 AM, Pedro Larroy  >
> wrote:
>
> > Thanks for setting up the document guys, looks like a solid basis to
> > start to work on!
> >
> > Marco, Kellen and I have already added some comments.
> >
> > Pedro
> >
> >
> > On Sun, Nov 5, 2017 at 3:43 AM, Meghna Baijal
> >  wrote:
> > > Kellen, Thank you for your comments in the doc.
> > > Sure Steffen, I will continue to merge everyone’s comments into the 
doc
> > and
> > > work with Pedro to finalize it.
> > > And then we can vote on the options.
> > >
> > > Thanks,
> > > Meghna Baijal
> > >
> > >
> > > On Sat, Nov 4, 2017 at 6:34 AM, Steffen Rochel <
> steffenroc...@gmail.com>
> > > wrote:
> > >
> > >> Sandeep and Meghna have been working in background collecting input
> and
> > >> preparing a doc. I suggest to drive discussion forward and would like
> to
> > >> ask everybody to contribute to
> > >> https://docs.google.com/document/d/17PEasQ2VWrXi2Cf7IGZSWGZMawxDk
> > >> dlavUDASzUmLjk/edit?usp=sharing
> > >>
> > >> Lets converge on requirements and architecture, so we can move 
forward
> > with
> > >> implementation.
> > >>
> > >> I would like to suggest for Pedro  and Meghna to lead the discussion
> and
> > >> help to resolve suggestions.
> > >>
> > >> I assume we need a vote once we are converged on a good draft to call
> > it a
> > >> plan and move forward with implementation. As we all are unhappy with
> > the
> > >> current CI situation I would also suggest a phased approach, so we 
can
> > get
> > >> back to reliable and efficient basic CI quickly and add advanced
> > >> capabilities over time.
> > >>
> > >> Steffen
> > >>
> > >> On Wed, Nov 1, 2017 at 1:14 PM kellen sunderland <
> > >> kellen.sunderl...@gmail.com> wrote:
> > >>
> > >> > Hey Henri, I think that's what a few of us are advocating.  Running
> a
> > set
> > >> > of quick tests as part of the PR process, and then a more detailed
> > >> > regression test suite periodically (say every 4 hours). This fits
> > nicely
> > >> > into a tagging or 2 branch development system.  Commits will be
> tagged
> > >> (or
> > >> > merged into a stable branch) as soon as they pass the detailed
> > regression
> > >> > testing.
> > >> >
> > >> > On Wed, Nov 1, 2017 at 9:07 PM, Hen  wrote:
> > >> >
> > >> > > Random question - can the CI be split such that the Apache CI is
> > doing
> > >> a
> > >> > > basic set of checks on that hardware, and is hooked to a PR, 
while
> > >> there
> > >> > is
> > >> > > a larger "Is trunk good for release?" test that is running
> > periodically
> > >> > > rather than on every PR?
> > >> > >
> > >> > > ie: do we need each PR to be run on varied hardware, or can we
> have
> > >> this
> > >> > > two tier approach?
> > >> > >
> > >> > > Hen
> > >> > >
> > >> > > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
> > >> > > sandeep.krishn...@gmail.com> wrote:
> > >> > >
> > >> > > > Hello all,
> > >> > > >
> > >> > > > I am hereby opening up a discussion thread on how we can
> stabilize
> > >> > Apache
> > >> > > > MXNet CI build system.
> > >> > > >
> > >> > > > Problems:
> > >> > > >
> > >> > > > 
> > >> > > >
> > >> > > > Recently, we have seen following issues with Apache MXNet CI
> build
> > >> > > systems:
> > >> > > >
> > >> > > >1. Apache Jenkins master is overloaded and we see issues
> like -

Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-11-08 Thread Chris Olivier
Can you please clarify the AWS CodeBuild/Windows issue? Does the document
state that there is a workaround? I didn’t fully understand.


On Wed, Nov 8, 2017 at 9:32 PM sandeep krishnamurthy <
sandeep.krishn...@gmail.com> wrote:

> Good work Meghna and thanks to community members for participating in the
> discussion and providing valuable inputs.
> Yes please share the document again and ask for vote and more broader
> inputs.
>
> On Wed, Nov 8, 2017 at 2:43 PM, Chris Olivier 
> wrote:
>
> > +1
> >
> > On Wed, Nov 8, 2017 at 2:40 PM Meghna Baijal  >
> > wrote:
> >
> > > Thanks for the active discussion on the document for the new CI for
> > MXNet.
> > > Now that many of you have reviewed it, do you think I should start a
> vote
> > > on which framework the community wants to move forward with ?
> > >
> > > Thanks,
> > > Meghna
> > >
> > > On Mon, Nov 6, 2017 at 6:59 PM, Chris Olivier 
> > > wrote:
> > >
> > > > After a decision is reached, i am willing to add tasks to Apache
> MXNet
> > > JIRA
> > > >
> > > > On Mon, Nov 6, 2017 at 6:15 AM, Pedro Larroy <
> > > pedro.larroy.li...@gmail.com
> > > > >
> > > > wrote:
> > > >
> > > > > Thanks for setting up the document guys, looks like a solid basis
> to
> > > > > start to work on!
> > > > >
> > > > > Marco, Kellen and I have already added some comments.
> > > > >
> > > > > Pedro
> > > > >
> > > > >
> > > > > On Sun, Nov 5, 2017 at 3:43 AM, Meghna Baijal
> > > > >  wrote:
> > > > > > Kellen, Thank you for your comments in the doc.
> > > > > > Sure Steffen, I will continue to merge everyone’s comments into
> the
> > > doc
> > > > > and
> > > > > > work with Pedro to finalize it.
> > > > > > And then we can vote on the options.
> > > > > >
> > > > > > Thanks,
> > > > > > Meghna Baijal
> > > > > >
> > > > > >
> > > > > > On Sat, Nov 4, 2017 at 6:34 AM, Steffen Rochel <
> > > > steffenroc...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > >> Sandeep and Meghna have been working in background collecting
> > input
> > > > and
> > > > > >> preparing a doc. I suggest to drive discussion forward and would
> > > like
> > > > to
> > > > > >> ask everybody to contribute to
> > > > > >>
> https://docs.google.com/document/d/17PEasQ2VWrXi2Cf7IGZSWGZMawxDk
> > > > > >> dlavUDASzUmLjk/edit?usp=sharing
> > > > > >>
> > > > > >> Lets converge on requirements and architecture, so we can move
> > > forward
> > > > > with
> > > > > >> implementation.
> > > > > >>
> > > > > >> I would like to suggest for Pedro  and Meghna to lead the
> > discussion
> > > > and
> > > > > >> help to resolve suggestions.
> > > > > >>
> > > > > >> I assume we need a vote once we are converged on a good draft to
> > > call
> > > > > it a
> > > > > >> plan and move forward with implementation. As we all are unhappy
> > > with
> > > > > the
> > > > > >> current CI situation I would also suggest a phased approach, so
> we
> > > can
> > > > > get
> > > > > >> back to reliable and efficient basic CI quickly and add advanced
> > > > > >> capabilities over time.
> > > > > >>
> > > > > >> Steffen
> > > > > >>
> > > > > >> On Wed, Nov 1, 2017 at 1:14 PM kellen sunderland <
> > > > > >> kellen.sunderl...@gmail.com> wrote:
> > > > > >>
> > > > > >> > Hey Henri, I think that's what a few of us are advocating.
> > > Running
> > > > a
> > > > > set
> > > > > >> > of quick tests as part of the PR process, and then a more
> > detailed
> > > > > >> > regression test suite periodically (say every 4 hours). This
> > fits
> > > > > nicely
> > > > > >> > into a tagging or 2 branch development system.  Commits will
> be
> > > > tagged
> > > > > >> (or
> > > > > >> > merged into a stable branch) as soon as they pass the detailed
> > > > > regression
> > > > > >> > testing.
> > > > > >> >
> > > > > >> > On Wed, Nov 1, 2017 at 9:07 PM, Hen 
> wrote:
> > > > > >> >
> > > > > >> > > Random question - can the CI be split such that the Apache
> CI
> > is
> > > > > doing
> > > > > >> a
> > > > > >> > > basic set of checks on that hardware, and is hooked to a PR,
> > > while
> > > > > >> there
> > > > > >> > is
> > > > > >> > > a larger "Is trunk good for release?" test that is running
> > > > > periodically
> > > > > >> > > rather than on every PR?
> > > > > >> > >
> > > > > >> > > ie: do we need each PR to be run on varied hardware, or can
> we
> > > > have
> > > > > >> this
> > > > > >> > > two tier approach?
> > > > > >> > >
> > > > > >> > > Hen
> > > > > >> > >
> > > > > >> > > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
> > > > > >> > > sandeep.krishn...@gmail.com> wrote:
> > > > > >> > >
> > > > > >> > > > Hello all,
> > > > > >> > > >
> > > > > >> > > > I am hereby opening up a discussion thread on how we can
> > > > stabilize
> > > > > >> > Apache
> > > > > >> > > > MXNet CI build system.
> > > > > >> > > >
> > > > > >> > > > Problems:
> > > > > >> > > >
> > > > > >> > > > 
> > > > > >> > > >
> > > > > >> > > > Recently, we have seen following issues with Apache MXNet
> CI
> > > > build
> > > > > >> > > 

Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-11-08 Thread sandeep krishnamurthy
Good work, Meghna, and thanks to the community members for participating in the
discussion and providing valuable input.
Yes, please share the document again and ask for a vote and broader
input.

On Wed, Nov 8, 2017 at 2:43 PM, Chris Olivier  wrote:

> +1
>
> On Wed, Nov 8, 2017 at 2:40 PM Meghna Baijal 
> wrote:
>
> > Thanks for the active discussion on the document for the new CI for
> MXNet.
> > Now that many of you have reviewed it, do you think I should start a vote
> > on which framework the community wants to move forward with ?
> >
> > Thanks,
> > Meghna
> >
> > On Mon, Nov 6, 2017 at 6:59 PM, Chris Olivier 
> > wrote:
> >
> > > After a decision is reached, i am willing to add tasks to Apache MXNet
> > JIRA
> > >
> > > On Mon, Nov 6, 2017 at 6:15 AM, Pedro Larroy <
> > pedro.larroy.li...@gmail.com
> > > >
> > > wrote:
> > >
> > > > Thanks for setting up the document guys, looks like a solid basis to
> > > > start to work on!
> > > >
> > > > Marco, Kellen and I have already added some comments.
> > > >
> > > > Pedro
> > > >
> > > >
> > > > On Sun, Nov 5, 2017 at 3:43 AM, Meghna Baijal
> > > >  wrote:
> > > > > Kellen, Thank you for your comments in the doc.
> > > > > Sure Steffen, I will continue to merge everyone’s comments into the
> > doc
> > > > and
> > > > > work with Pedro to finalize it.
> > > > > And then we can vote on the options.
> > > > >
> > > > > Thanks,
> > > > > Meghna Baijal
> > > > >
> > > > >
> > > > > On Sat, Nov 4, 2017 at 6:34 AM, Steffen Rochel <
> > > steffenroc...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Sandeep and Meghna have been working in background collecting
> input
> > > and
> > > > >> preparing a doc. I suggest to drive discussion forward and would
> > like
> > > to
> > > > >> ask everybody to contribute to
> > > > >> https://docs.google.com/document/d/17PEasQ2VWrXi2Cf7IGZSWGZMawxDk
> > > > >> dlavUDASzUmLjk/edit?usp=sharing
> > > > >>
> > > > >> Lets converge on requirements and architecture, so we can move
> > forward
> > > > with
> > > > >> implementation.
> > > > >>
> > > > >> I would like to suggest for Pedro  and Meghna to lead the
> discussion
> > > and
> > > > >> help to resolve suggestions.
> > > > >>
> > > > >> I assume we need a vote once we are converged on a good draft to
> > call
> > > > it a
> > > > >> plan and move forward with implementation. As we all are unhappy
> > with
> > > > the
> > > > >> current CI situation I would also suggest a phased approach, so we
> > can
> > > > get
> > > > >> back to reliable and efficient basic CI quickly and add advanced
> > > > >> capabilities over time.
> > > > >>
> > > > >> Steffen
> > > > >>
> > > > >> On Wed, Nov 1, 2017 at 1:14 PM kellen sunderland <
> > > > >> kellen.sunderl...@gmail.com> wrote:
> > > > >>
> > > > >> > Hey Henri, I think that's what a few of us are advocating.
> > Running
> > > a
> > > > set
> > > > >> > of quick tests as part of the PR process, and then a more
> detailed
> > > > >> > regression test suite periodically (say every 4 hours). This
> fits
> > > > nicely
> > > > >> > into a tagging or 2 branch development system.  Commits will be
> > > tagged
> > > > >> (or
> > > > >> > merged into a stable branch) as soon as they pass the detailed
> > > > regression
> > > > >> > testing.
> > > > >> >
> > > > >> > On Wed, Nov 1, 2017 at 9:07 PM, Hen  wrote:
> > > > >> >
> > > > >> > > Random question - can the CI be split such that the Apache CI
> is
> > > > doing
> > > > >> a
> > > > >> > > basic set of checks on that hardware, and is hooked to a PR,
> > while
> > > > >> there
> > > > >> > is
> > > > >> > > a larger "Is trunk good for release?" test that is running
> > > > periodically
> > > > >> > > rather than on every PR?
> > > > >> > >
> > > > >> > > ie: do we need each PR to be run on varied hardware, or can we
> > > have
> > > > >> this
> > > > >> > > two tier approach?
> > > > >> > >
> > > > >> > > Hen
> > > > >> > >
> > > > >> > > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
> > > > >> > > sandeep.krishn...@gmail.com> wrote:
> > > > >> > >
> > > > >> > > > Hello all,
> > > > >> > > >
> > > > >> > > > I am hereby opening up a discussion thread on how we can
> > > stabilize
> > > > >> > Apache
> > > > >> > > > MXNet CI build system.
> > > > >> > > >
> > > > >> > > > Problems:
> > > > >> > > >
> > > > >> > > > 
> > > > >> > > >
> > > > >> > > > Recently, we have seen following issues with Apache MXNet CI
> > > build
> > > > >> > > systems:
> > > > >> > > >
> > > > >> > > >1. Apache Jenkins master is overloaded and we see issues
> > > like -
> > > > >> > unable
> > > > >> > > >to trigger builds, difficult to load and view the blue
> > ocean
> > > > and
> > > > >> > other
> > > > >> > > >Jenkins build status page.
> > > > >> > > >2. We are generating too many request/interaction on
> Apache
> > > > Infra
> > > > >> > > team.
> > > > >> > > >   1. Addition/deletion of new slave: Caused from scaling
> > > > >> activity,
> > > > >> > > >   recyc

Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-11-08 Thread Chris Olivier
+1

On Wed, Nov 8, 2017 at 2:40 PM Meghna Baijal 
wrote:

> Thanks for the active discussion on the document for the new CI for MXNet.
> Now that many of you have reviewed it, do you think I should start a vote
> on which framework the community wants to move forward with ?
>
> Thanks,
> Meghna
>
> On Mon, Nov 6, 2017 at 6:59 PM, Chris Olivier 
> wrote:
>
> > After a decision is reached, i am willing to add tasks to Apache MXNet
> JIRA
> >
> > On Mon, Nov 6, 2017 at 6:15 AM, Pedro Larroy <
> pedro.larroy.li...@gmail.com
> > >
> > wrote:
> >
> > > Thanks for setting up the document guys, looks like a solid basis to
> > > start to work on!
> > >
> > > Marco, Kellen and I have already added some comments.
> > >
> > > Pedro
> > >
> > >
> > > On Sun, Nov 5, 2017 at 3:43 AM, Meghna Baijal
> > >  wrote:
> > > > Kellen, Thank you for your comments in the doc.
> > > > Sure Steffen, I will continue to merge everyone’s comments into the
> doc
> > > and
> > > > work with Pedro to finalize it.
> > > > And then we can vote on the options.
> > > >
> > > > Thanks,
> > > > Meghna Baijal
> > > >
> > > >
> > > > On Sat, Nov 4, 2017 at 6:34 AM, Steffen Rochel <
> > steffenroc...@gmail.com>
> > > > wrote:
> > > >
> > > >> Sandeep and Meghna have been working in background collecting input
> > and
> > > >> preparing a doc. I suggest to drive discussion forward and would
> like
> > to
> > > >> ask everybody to contribute to
> > > >> https://docs.google.com/document/d/17PEasQ2VWrXi2Cf7IGZSWGZMawxDk
> > > >> dlavUDASzUmLjk/edit?usp=sharing
> > > >>
> > > >> Lets converge on requirements and architecture, so we can move
> forward
> > > with
> > > >> implementation.
> > > >>
> > > >> I would like to suggest for Pedro  and Meghna to lead the discussion
> > and
> > > >> help to resolve suggestions.
> > > >>
> > > >> I assume we need a vote once we are converged on a good draft to
> call
> > > it a
> > > >> plan and move forward with implementation. As we all are unhappy
> with
> > > the
> > > >> current CI situation I would also suggest a phased approach, so we
> can
> > > get
> > > >> back to reliable and efficient basic CI quickly and add advanced
> > > >> capabilities over time.
> > > >>
> > > >> Steffen
> > > >>
> > > >> On Wed, Nov 1, 2017 at 1:14 PM kellen sunderland <
> > > >> kellen.sunderl...@gmail.com> wrote:
> > > >>
> > > >> > Hey Henri, I think that's what a few of us are advocating.
> Running
> > a
> > > set
> > > >> > of quick tests as part of the PR process, and then a more detailed
> > > >> > regression test suite periodically (say every 4 hours). This fits
> > > nicely
> > > >> > into a tagging or 2 branch development system.  Commits will be
> > tagged
> > > >> (or
> > > >> > merged into a stable branch) as soon as they pass the detailed
> > > regression
> > > >> > testing.
> > > >> >
> > > >> > On Wed, Nov 1, 2017 at 9:07 PM, Hen  wrote:
> > > >> >
> > > >> > > Random question - can the CI be split such that the Apache CI is
> > > doing
> > > >> a
> > > >> > > basic set of checks on that hardware, and is hooked to a PR,
> while
> > > >> there
> > > >> > is
> > > >> > > a larger "Is trunk good for release?" test that is running
> > > periodically
> > > >> > > rather than on every PR?
> > > >> > >
> > > >> > > ie: do we need each PR to be run on varied hardware, or can we
> > have
> > > >> this
> > > >> > > two tier approach?
> > > >> > >
> > > >> > > Hen
> > > >> > >
> > > >> > > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
> > > >> > > sandeep.krishn...@gmail.com> wrote:
> > > >> > >
> > > >> > > > Hello all,
> > > >> > > >
> > > >> > > > I am hereby opening up a discussion thread on how we can
> > stabilize
> > > >> > Apache
> > > >> > > > MXNet CI build system.
> > > >> > > >
> > > >> > > > Problems:
> > > >> > > >
> > > >> > > > 
> > > >> > > >
> > > >> > > > Recently, we have seen following issues with Apache MXNet CI
> > build
> > > >> > > systems:
> > > >> > > >
> > > >> > > >1. Apache Jenkins master is overloaded and we see issues
> > like -
> > > >> > unable
> > > >> > > >to trigger builds, difficult to load and view the blue
> ocean
> > > and
> > > >> > other
> > > >> > > >Jenkins build status page.
> > > >> > > >2. We are generating too many request/interaction on Apache
> > > Infra
> > > >> > > team.
> > > >> > > >   1. Addition/deletion of new slave: Caused from scaling
> > > >> activity,
> > > >> > > >   recycling, troubleshooting or any actions leading to
> > change
> > > of
> > > >> > > slave
> > > >> > > >   machines.
> > > >> > > >   2. Plugins / other Jenkins Master configurations.
> > > >> > > >   3. Experimentation on CI pipelines.
> > > >> > > >3. Harder to debug and resolve issues - Since access to
> > master
> > > and
> > > >> > > slave
> > > >> > > >is not with the same community, it requires Infra and
> > > community to
> > > >> > > dive
> > > >> > > >deep together on all action items.
> > > >> > > >
> > > >> > > > Possible

Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-11-08 Thread Meghna Baijal
Thanks for the active discussion on the document for the new CI for MXNet.
Now that many of you have reviewed it, do you think I should start a vote
on which framework the community wants to move forward with?

Thanks,
Meghna

On Mon, Nov 6, 2017 at 6:59 PM, Chris Olivier  wrote:

> After a decision is reached, i am willing to add tasks to Apache MXNet JIRA
>
> On Mon, Nov 6, 2017 at 6:15 AM, Pedro Larroy  >
> wrote:
>
> > Thanks for setting up the document guys, looks like a solid basis to
> > start to work on!
> >
> > Marco, Kellen and I have already added some comments.
> >
> > Pedro
> >
> >
> > On Sun, Nov 5, 2017 at 3:43 AM, Meghna Baijal
> >  wrote:
> > > Kellen, Thank you for your comments in the doc.
> > > Sure Steffen, I will continue to merge everyone’s comments into the doc
> > and
> > > work with Pedro to finalize it.
> > > And then we can vote on the options.
> > >
> > > Thanks,
> > > Meghna Baijal
> > >
> > >
> > > On Sat, Nov 4, 2017 at 6:34 AM, Steffen Rochel <
> steffenroc...@gmail.com>
> > > wrote:
> > >
> > >> Sandeep and Meghna have been working in background collecting input
> and
> > >> preparing a doc. I suggest to drive discussion forward and would like
> to
> > >> ask everybody to contribute to
> > >> https://docs.google.com/document/d/17PEasQ2VWrXi2Cf7IGZSWGZMawxDk
> > >> dlavUDASzUmLjk/edit?usp=sharing
> > >>
> > >> Lets converge on requirements and architecture, so we can move forward
> > with
> > >> implementation.
> > >>
> > >> I would like to suggest for Pedro  and Meghna to lead the discussion
> and
> > >> help to resolve suggestions.
> > >>
> > >> I assume we need a vote once we are converged on a good draft to call
> > it a
> > >> plan and move forward with implementation. As we all are unhappy with
> > the
> > >> current CI situation I would also suggest a phased approach, so we can
> > get
> > >> back to reliable and efficient basic CI quickly and add advanced
> > >> capabilities over time.
> > >>
> > >> Steffen
> > >>
> > >> On Wed, Nov 1, 2017 at 1:14 PM kellen sunderland <
> > >> kellen.sunderl...@gmail.com> wrote:
> > >>
> > >> > Hey Henri, I think that's what a few of us are advocating.  Running
> a
> > set
> > >> > of quick tests as part of the PR process, and then a more detailed
> > >> > regression test suite periodically (say every 4 hours). This fits
> > nicely
> > >> > into a tagging or 2 branch development system.  Commits will be
> tagged
> > >> (or
> > >> > merged into a stable branch) as soon as they pass the detailed
> > regression
> > >> > testing.
> > >> >
> > >> > On Wed, Nov 1, 2017 at 9:07 PM, Hen  wrote:
> > >> >
> > >> > > Random question - can the CI be split such that the Apache CI is
> > doing
> > >> a
> > >> > > basic set of checks on that hardware, and is hooked to a PR, while
> > >> there
> > >> > is
> > >> > > a larger "Is trunk good for release?" test that is running
> > periodically
> > >> > > rather than on every PR?
> > >> > >
> > >> > > ie: do we need each PR to be run on varied hardware, or can we
> have
> > >> this
> > >> > > two tier approach?
> > >> > >
> > >> > > Hen
> > >> > >
> > >> > > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
> > >> > > sandeep.krishn...@gmail.com> wrote:
> > >> > >
> > >> > > > Hello all,
> > >> > > >
> > >> > > > I am hereby opening up a discussion thread on how we can
> stabilize
> > >> > Apache
> > >> > > > MXNet CI build system.
> > >> > > >
> > >> > > > Problems:
> > >> > > >
> > >> > > > 
> > >> > > >
> > >> > > > Recently, we have seen following issues with Apache MXNet CI
> build
> > >> > > systems:
> > >> > > >
> > >> > > >1. Apache Jenkins master is overloaded and we see issues
> like -
> > >> > unable
> > >> > > >to trigger builds, difficult to load and view the blue ocean
> > and
> > >> > other
> > >> > > >Jenkins build status page.
> > >> > > >2. We are generating too many request/interaction on Apache
> > Infra
> > >> > > team.
> > >> > > >   1. Addition/deletion of new slave: Caused from scaling
> > >> activity,
> > >> > > >   recycling, troubleshooting or any actions leading to
> change
> > of
> > >> > > slave
> > >> > > >   machines.
> > >> > > >   2. Plugins / other Jenkins Master configurations.
> > >> > > >   3. Experimentation on CI pipelines.
> > >> > > >3. Harder to debug and resolve issues - Since access to
> master
> > and
> > >> > > slave
> > >> > > >is not with the same community, it requires Infra and
> > community to
> > >> > > dive
> > >> > > >deep together on all action items.
> > >> > > >
> > >> > > > Possible Solutions:
> > >> > > >
> > >> > > > ==
> > >> > > >
> > >> > > >1. Can we set up a separate Jenkins CI build system for
> Apache
> > >> MXNet
> > >> > > >outside Apache Infra?
> > >> > > >2. Can we have a separate Jenkins Master in Apache Infra for
> > >> MXNet?
> > >> > > >3. Review design of current setup, refine and fill the gaps.
> > >> > > >
> > >> > > > @ Mentors/Infr

Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-11-06 Thread Chris Olivier
After a decision is reached, I am willing to add tasks to the Apache MXNet JIRA.

On Mon, Nov 6, 2017 at 6:15 AM, Pedro Larroy 
wrote:

> Thanks for setting up the document guys, looks like a solid basis to
> start to work on!
>
> Marco, Kellen and I have already added some comments.
>
> Pedro
>
>
> On Sun, Nov 5, 2017 at 3:43 AM, Meghna Baijal
>  wrote:
> > Kellen, Thank you for your comments in the doc.
> > Sure Steffen, I will continue to merge everyone’s comments into the doc
> and
> > work with Pedro to finalize it.
> > And then we can vote on the options.
> >
> > Thanks,
> > Meghna Baijal
> >
> >
> > On Sat, Nov 4, 2017 at 6:34 AM, Steffen Rochel 
> > wrote:
> >
> >> Sandeep and Meghna have been working in background collecting input and
> >> preparing a doc. I suggest to drive discussion forward and would like to
> >> ask everybody to contribute to
> >> https://docs.google.com/document/d/17PEasQ2VWrXi2Cf7IGZSWGZMawxDk
> >> dlavUDASzUmLjk/edit?usp=sharing
> >>
> >> Lets converge on requirements and architecture, so we can move forward
> with
> >> implementation.
> >>
> >> I would like to suggest for Pedro  and Meghna to lead the discussion and
> >> help to resolve suggestions.
> >>
> >> I assume we need a vote once we are converged on a good draft to call
> it a
> >> plan and move forward with implementation. As we all are unhappy with
> the
> >> current CI situation I would also suggest a phased approach, so we can
> get
> >> back to reliable and efficient basic CI quickly and add advanced
> >> capabilities over time.
> >>
> >> Steffen
> >>
> >> On Wed, Nov 1, 2017 at 1:14 PM kellen sunderland <
> >> kellen.sunderl...@gmail.com> wrote:
> >>
> >> > Hey Henri, I think that's what a few of us are advocating.  Running a
> set
> >> > of quick tests as part of the PR process, and then a more detailed
> >> > regression test suite periodically (say every 4 hours). This fits
> nicely
> >> > into a tagging or 2 branch development system.  Commits will be tagged
> >> (or
> >> > merged into a stable branch) as soon as they pass the detailed
> regression
> >> > testing.
> >> >
> >> > On Wed, Nov 1, 2017 at 9:07 PM, Hen  wrote:
> >> >
> >> > > Random question - can the CI be split such that the Apache CI is
> doing
> >> a
> >> > > basic set of checks on that hardware, and is hooked to a PR, while
> >> there
> >> > is
> >> > > a larger "Is trunk good for release?" test that is running
> periodically
> >> > > rather than on every PR?
> >> > >
> >> > > ie: do we need each PR to be run on varied hardware, or can we have
> >> this
> >> > > two tier approach?
> >> > >
> >> > > Hen
> >> > >
> >> > > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
> >> > > sandeep.krishn...@gmail.com> wrote:
> >> > >
> >> > > > Hello all,
> >> > > >
> >> > > > I am hereby opening up a discussion thread on how we can stabilize
> >> > Apache
> >> > > > MXNet CI build system.
> >> > > >
> >> > > > Problems:
> >> > > >
> >> > > > 
> >> > > >
> >> > > > Recently, we have seen following issues with Apache MXNet CI build
> >> > > systems:
> >> > > >
> >> > > >1. Apache Jenkins master is overloaded and we see issues like -
> >> > unable
> >> > > >to trigger builds, difficult to load and view the blue ocean
> and
> >> > other
> >> > > >Jenkins build status page.
> >> > > >2. We are generating too many request/interaction on Apache
> Infra
> >> > > team.
> >> > > >   1. Addition/deletion of new slave: Caused from scaling
> >> activity,
> >> > > >   recycling, troubleshooting or any actions leading to change
> of
> >> > > slave
> >> > > >   machines.
> >> > > >   2. Plugins / other Jenkins Master configurations.
> >> > > >   3. Experimentation on CI pipelines.
> >> > > >3. Harder to debug and resolve issues - Since access to master
> and
> >> > > slave
> >> > > >is not with the same community, it requires Infra and
> community to
> >> > > dive
> >> > > >deep together on all action items.
> >> > > >
> >> > > > Possible Solutions:
> >> > > >
> >> > > > ==
> >> > > >
> >> > > >1. Can we set up a separate Jenkins CI build system for Apache
> >> MXNet
> >> > > >outside Apache Infra?
> >> > > >2. Can we have a separate Jenkins Master in Apache Infra for
> >> MXNet?
> >> > > >3. Review design of current setup, refine and fill the gaps.
> >> > > >
> >> > > > @ Mentors/Infra team/Community:
> >> > > >
> >> > > > ==
> >> > > >
> >> > > > Please provide your suggestions on how we can proceed further and
> >> work
> >> > on
> >> > > > stabilizing the CI build systems for MXNet.
> >> > > >
> >> > > > Also, if the community decides on separate Jenkins CI build
> system,
> >> > what
> >> > > > important points should be taken care of apart from the below:
> >> > > >
> >> > > >1. Community being able to access the build page for build
> >> statuses.
> >> > > >2. Committers being able to login with apache credentials.
> >> > > >3. Hook setup f

Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-11-06 Thread Pedro Larroy
Thanks for setting up the document guys, looks like a solid basis to
start to work on!

Marco, Kellen and I have already added some comments.

Pedro


On Sun, Nov 5, 2017 at 3:43 AM, Meghna Baijal
 wrote:
> Kellen, Thank you for your comments in the doc.
> Sure Steffen, I will continue to merge everyone’s comments into the doc and
> work with Pedro to finalize it.
> And then we can vote on the options.
>
> Thanks,
> Meghna Baijal
>
>
> On Sat, Nov 4, 2017 at 6:34 AM, Steffen Rochel 
> wrote:
>
>> Sandeep and Meghna have been working in background collecting input and
>> preparing a doc. I suggest to drive discussion forward and would like to
>> ask everybody to contribute to
>> https://docs.google.com/document/d/17PEasQ2VWrXi2Cf7IGZSWGZMawxDk
>> dlavUDASzUmLjk/edit?usp=sharing
>>
>> Lets converge on requirements and architecture, so we can move forward with
>> implementation.
>>
>> I would like to suggest for Pedro  and Meghna to lead the discussion and
>> help to resolve suggestions.
>>
>> I assume we need a vote once we are converged on a good draft to call it a
>> plan and move forward with implementation. As we all are unhappy with the
>> current CI situation I would also suggest a phased approach, so we can get
>> back to reliable and efficient basic CI quickly and add advanced
>> capabilities over time.
>>
>> Steffen
>>
>> On Wed, Nov 1, 2017 at 1:14 PM kellen sunderland <
>> kellen.sunderl...@gmail.com> wrote:
>>
>> > Hey Henri, I think that's what a few of us are advocating.  Running a set
>> > of quick tests as part of the PR process, and then a more detailed
>> > regression test suite periodically (say every 4 hours). This fits nicely
>> > into a tagging or 2 branch development system.  Commits will be tagged
>> (or
>> > merged into a stable branch) as soon as they pass the detailed regression
>> > testing.
>> >
>> > On Wed, Nov 1, 2017 at 9:07 PM, Hen  wrote:
>> >
>> > > Random question - can the CI be split such that the Apache CI is doing
>> a
>> > > basic set of checks on that hardware, and is hooked to a PR, while
>> there
>> > is
>> > > a larger "Is trunk good for release?" test that is running periodically
>> > > rather than on every PR?
>> > >
>> > > ie: do we need each PR to be run on varied hardware, or can we have
>> this
>> > > two tier approach?
>> > >
>> > > Hen
>> > >
>> > > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
>> > > sandeep.krishn...@gmail.com> wrote:
>> > >
>> > > > Hello all,
>> > > >
>> > > > I am hereby opening up a discussion thread on how we can stabilize
>> > Apache
>> > > > MXNet CI build system.
>> > > >
>> > > > Problems:
>> > > >
>> > > > 
>> > > >
>> > > > Recently, we have seen following issues with Apache MXNet CI build
>> > > systems:
>> > > >
>> > > >1. Apache Jenkins master is overloaded and we see issues like -
>> > unable
>> > > >to trigger builds, difficult to load and view the blue ocean and
>> > other
>> > > >Jenkins build status page.
>> > > >2. We are generating too many request/interaction on Apache Infra
>> > > team.
>> > > >   1. Addition/deletion of new slave: Caused from scaling
>> activity,
>> > > >   recycling, troubleshooting or any actions leading to change of
>> > > slave
>> > > >   machines.
>> > > >   2. Plugins / other Jenkins Master configurations.
>> > > >   3. Experimentation on CI pipelines.
>> > > >3. Harder to debug and resolve issues - Since access to master and
>> > > slave
>> > > >is not with the same community, it requires Infra and community to
>> > > dive
>> > > >deep together on all action items.
>> > > >
>> > > > Possible Solutions:
>> > > >
>> > > > ==
>> > > >
>> > > >1. Can we set up a separate Jenkins CI build system for Apache
>> MXNet
>> > > >outside Apache Infra?
>> > > >2. Can we have a separate Jenkins Master in Apache Infra for
>> MXNet?
>> > > >3. Review design of current setup, refine and fill the gaps.
>> > > >
>> > > > @ Mentors/Infra team/Community:
>> > > >
>> > > > ==
>> > > >
>> > > > Please provide your suggestions on how we can proceed further and
>> work
>> > on
>> > > > stabilizing the CI build systems for MXNet.
>> > > >
>> > > > Also, if the community decides on separate Jenkins CI build system,
>> > what
>> > > > important points should be taken care of apart from the below:
>> > > >
>> > > >1. Community being able to access the build page for build
>> statuses.
>> > > >2. Committers being able to login with apache credentials.
>> > > >3. Hook setup from apache/incubator-mxnet repo to Jenkins master.
>> > > >
>> > > >
>> > > > Irrespective of the solution we come up, I think we should initiate a
>> > > > technical design discussion on how to setup the CI build system.
>> > > Probably 1
>> > > > or 2 pager documents with the architecture and review with Infra and
>> > > > community members.
>> > > >
>> > > > ***There were few proposal and discussion on the slack channe

Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-11-04 Thread Meghna Baijal
Kellen, Thank you for your comments in the doc.
Sure Steffen, I will continue to merge everyone’s comments into the doc and
work with Pedro to finalize it.
And then we can vote on the options.

Thanks,
Meghna Baijal


On Sat, Nov 4, 2017 at 6:34 AM, Steffen Rochel 
wrote:

> Sandeep and Meghna have been working in background collecting input and
> preparing a doc. I suggest to drive discussion forward and would like to
> ask everybody to contribute to
> https://docs.google.com/document/d/17PEasQ2VWrXi2Cf7IGZSWGZMawxDk
> dlavUDASzUmLjk/edit?usp=sharing
>
> Lets converge on requirements and architecture, so we can move forward with
> implementation.
>
> I would like to suggest for Pedro  and Meghna to lead the discussion and
> help to resolve suggestions.
>
> I assume we need a vote once we are converged on a good draft to call it a
> plan and move forward with implementation. As we all are unhappy with the
> current CI situation I would also suggest a phased approach, so we can get
> back to reliable and efficient basic CI quickly and add advanced
> capabilities over time.
>
> Steffen
>
> On Wed, Nov 1, 2017 at 1:14 PM kellen sunderland <
> kellen.sunderl...@gmail.com> wrote:
>
> > Hey Henri, I think that's what a few of us are advocating.  Running a set
> > of quick tests as part of the PR process, and then a more detailed
> > regression test suite periodically (say every 4 hours). This fits nicely
> > into a tagging or 2 branch development system.  Commits will be tagged
> (or
> > merged into a stable branch) as soon as they pass the detailed regression
> > testing.
> >
> > On Wed, Nov 1, 2017 at 9:07 PM, Hen  wrote:
> >
> > > Random question - can the CI be split such that the Apache CI is doing
> a
> > > basic set of checks on that hardware, and is hooked to a PR, while
> there
> > is
> > > a larger "Is trunk good for release?" test that is running periodically
> > > rather than on every PR?
> > >
> > > ie: do we need each PR to be run on varied hardware, or can we have
> this
> > > two tier approach?
> > >
> > > Hen
> > >
> > > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
> > > sandeep.krishn...@gmail.com> wrote:
> > >
> > > > Hello all,
> > > >
> > > > I am hereby opening up a discussion thread on how we can stabilize
> > Apache
> > > > MXNet CI build system.
> > > >
> > > > Problems:
> > > >
> > > > 
> > > >
> > > > Recently, we have seen following issues with Apache MXNet CI build
> > > systems:
> > > >
> > > >1. Apache Jenkins master is overloaded and we see issues like -
> > unable
> > > >to trigger builds, difficult to load and view the blue ocean and
> > other
> > > >Jenkins build status page.
> > > >2. We are generating too many request/interaction on Apache Infra
> > > team.
> > > >   1. Addition/deletion of new slave: Caused from scaling
> activity,
> > > >   recycling, troubleshooting or any actions leading to change of
> > > slave
> > > >   machines.
> > > >   2. Plugins / other Jenkins Master configurations.
> > > >   3. Experimentation on CI pipelines.
> > > >3. Harder to debug and resolve issues - Since access to master and
> > > slave
> > > >is not with the same community, it requires Infra and community to
> > > dive
> > > >deep together on all action items.
> > > >
> > > > Possible Solutions:
> > > >
> > > > ==
> > > >
> > > >1. Can we set up a separate Jenkins CI build system for Apache
> MXNet
> > > >outside Apache Infra?
> > > >2. Can we have a separate Jenkins Master in Apache Infra for
> MXNet?
> > > >3. Review design of current setup, refine and fill the gaps.
> > > >
> > > > @ Mentors/Infra team/Community:
> > > >
> > > > ==
> > > >
> > > > Please provide your suggestions on how we can proceed further and
> work
> > on
> > > > stabilizing the CI build systems for MXNet.
> > > >
> > > > Also, if the community decides on separate Jenkins CI build system,
> > what
> > > > important points should be taken care of apart from the below:
> > > >
> > > >1. Community being able to access the build page for build
> statuses.
> > > >2. Committers being able to login with apache credentials.
> > > >3. Hook setup from apache/incubator-mxnet repo to Jenkins master.
> > > >
> > > >
> > > > Irrespective of the solution we come up, I think we should initiate a
> > > > technical design discussion on how to setup the CI build system.
> > > Probably 1
> > > > or 2 pager documents with the architecture and review with Infra and
> > > > community members.
> > > >
> > > > ***There were few proposal and discussion on the slack channel, to
> > reach
> > > > wider community members, moving that discussion formally to this
> list.
> > > >
> > > >
> > > > My Proposal: Option 1 - Set up separate Jenkins CI build system.
> > > >
> > > > Thanks,
> > > >
> > > > Sandeep
> > > >
> > > >
> > > >
> > > > --
> > > > Sandeep Krishnamurthy
> > > >
> > >
> >
>


Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-11-04 Thread Steffen Rochel
Sandeep and Meghna have been working in the background, collecting input and
preparing a doc. I suggest we drive the discussion forward and would like to
ask everybody to contribute to
https://docs.google.com/document/d/17PEasQ2VWrXi2Cf7IGZSWGZMawxDkdlavUDASzUmLjk/edit?usp=sharing

Let's converge on requirements and architecture, so we can move forward with
implementation.

I would like to suggest that Pedro and Meghna lead the discussion and
help to resolve suggestions.

I assume we need a vote once we have converged on a good draft, to call it a
plan and move forward with implementation. As we are all unhappy with the
current CI situation, I would also suggest a phased approach, so we can get
back to reliable and efficient basic CI quickly and add advanced
capabilities over time.

Steffen

On Wed, Nov 1, 2017 at 1:14 PM kellen sunderland <
kellen.sunderl...@gmail.com> wrote:

> Hey Henri, I think that's what a few of us are advocating.  Running a set
> of quick tests as part of the PR process, and then a more detailed
> regression test suite periodically (say every 4 hours). This fits nicely
> into a tagging or 2 branch development system.  Commits will be tagged (or
> merged into a stable branch) as soon as they pass the detailed regression
> testing.
>
> On Wed, Nov 1, 2017 at 9:07 PM, Hen  wrote:
>
> > Random question - can the CI be split such that the Apache CI is doing a
> > basic set of checks on that hardware, and is hooked to a PR, while there
> is
> > a larger "Is trunk good for release?" test that is running periodically
> > rather than on every PR?
> >
> > ie: do we need each PR to be run on varied hardware, or can we have this
> > two tier approach?
> >
> > Hen
> >
> > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
> > sandeep.krishn...@gmail.com> wrote:
> >
> > > Hello all,
> > >
> > > I am hereby opening up a discussion thread on how we can stabilize
> Apache
> > > MXNet CI build system.
> > >
> > > Problems:
> > >
> > > 
> > >
> > > Recently, we have seen following issues with Apache MXNet CI build
> > systems:
> > >
> > >1. Apache Jenkins master is overloaded and we see issues like -
> unable
> > >to trigger builds, difficult to load and view the blue ocean and
> other
> > >Jenkins build status page.
> > >2. We are generating too many request/interaction on Apache Infra
> > team.
> > >   1. Addition/deletion of new slave: Caused from scaling activity,
> > >   recycling, troubleshooting or any actions leading to change of
> > slave
> > >   machines.
> > >   2. Plugins / other Jenkins Master configurations.
> > >   3. Experimentation on CI pipelines.
> > >3. Harder to debug and resolve issues - Since access to master and
> > slave
> > >is not with the same community, it requires Infra and community to
> > dive
> > >deep together on all action items.
> > >
> > > Possible Solutions:
> > >
> > > ==
> > >
> > >1. Can we set up a separate Jenkins CI build system for Apache MXNet
> > >outside Apache Infra?
> > >2. Can we have a separate Jenkins Master in Apache Infra for MXNet?
> > >3. Review design of current setup, refine and fill the gaps.
> > >
> > > @ Mentors/Infra team/Community:
> > >
> > > ==
> > >
> > > Please provide your suggestions on how we can proceed further and work
> on
> > > stabilizing the CI build systems for MXNet.
> > >
> > > Also, if the community decides on separate Jenkins CI build system,
> what
> > > important points should be taken care of apart from the below:
> > >
> > >1. Community being able to access the build page for build statuses.
> > >2. Committers being able to login with apache credentials.
> > >3. Hook setup from apache/incubator-mxnet repo to Jenkins master.
> > >
> > >
> > > Irrespective of the solution we come up, I think we should initiate a
> > > technical design discussion on how to setup the CI build system.
> > Probably 1
> > > or 2 pager documents with the architecture and review with Infra and
> > > community members.
> > >
> > > ***There were few proposal and discussion on the slack channel, to
> reach
> > > wider community members, moving that discussion formally to this list.
> > >
> > >
> > > My Proposal: Option 1 - Set up separate Jenkins CI build system.
> > >
> > > Thanks,
> > >
> > > Sandeep
> > >
> > >
> > >
> > > --
> > > Sandeep Krishnamurthy
> > >
> >
>


Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-11-01 Thread kellen sunderland
Hey Henri, I think that's what a few of us are advocating: running a set
of quick tests as part of the PR process, and then a more detailed
regression test suite periodically (say every 4 hours). This fits nicely
into a tagging or two-branch development system. Commits will be tagged (or
merged into a stable branch) as soon as they pass the detailed regression
testing.
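
To make the two-tier idea concrete, here is a rough sketch in Python. It is
only an illustration of the scheduling and promotion logic described above,
not our Jenkins configuration. The suite names, the function names and the
results dictionary are all hypothetical; only the 4-hour interval comes from
the suggestion above.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Hypothetical suite names, used only for this illustration.
QUICK = "quick"
REGRESSION = "regression"

# "Say every 4 hours", as suggested above.
REGRESSION_INTERVAL = timedelta(hours=4)


def suites_for_pr() -> List[str]:
    """A PR build runs only the quick tier."""
    return [QUICK]


def regression_due(last_run: Optional[datetime], now: datetime) -> bool:
    """The detailed tier runs periodically, independent of individual PRs."""
    return last_run is None or now - last_run >= REGRESSION_INTERVAL


def should_promote(results: Dict[str, bool]) -> bool:
    """Tag a commit (or merge it to a stable branch) only if both tiers passed."""
    return results.get(QUICK, False) and results.get(REGRESSION, False)
```

A commit that has only passed the quick tier is not promoted, so the stable
tag or branch always points at something that has been through the detailed
regression run.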

On Wed, Nov 1, 2017 at 9:07 PM, Hen  wrote:

> Random question - can the CI be split such that the Apache CI is doing a
> basic set of checks on that hardware, and is hooked to a PR, while there is
> a larger "Is trunk good for release?" test that is running periodically
> rather than on every PR?
>
> ie: do we need each PR to be run on varied hardware, or can we have this
> two tier approach?
>
> Hen
>
> On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
> sandeep.krishn...@gmail.com> wrote:
>
> > Hello all,
> >
> > I am hereby opening up a discussion thread on how we can stabilize Apache
> > MXNet CI build system.
> >
> > Problems:
> >
> > 
> >
> > Recently, we have seen following issues with Apache MXNet CI build
> systems:
> >
> >1. Apache Jenkins master is overloaded and we see issues like - unable
> >to trigger builds, difficult to load and view the blue ocean and other
> >Jenkins build status page.
> >2. We are generating too many request/interaction on Apache Infra
> team.
> >   1. Addition/deletion of new slave: Caused from scaling activity,
> >   recycling, troubleshooting or any actions leading to change of
> slave
> >   machines.
> >   2. Plugins / other Jenkins Master configurations.
> >   3. Experimentation on CI pipelines.
> >3. Harder to debug and resolve issues - Since access to master and
> slave
> >is not with the same community, it requires Infra and community to
> dive
> >deep together on all action items.
> >
> > Possible Solutions:
> >
> > ==
> >
> >1. Can we set up a separate Jenkins CI build system for Apache MXNet
> >outside Apache Infra?
> >2. Can we have a separate Jenkins Master in Apache Infra for MXNet?
> >3. Review design of current setup, refine and fill the gaps.
> >
> > @ Mentors/Infra team/Community:
> >
> > ==
> >
> > Please provide your suggestions on how we can proceed further and work on
> > stabilizing the CI build systems for MXNet.
> >
> > Also, if the community decides on separate Jenkins CI build system, what
> > important points should be taken care of apart from the below:
> >
> >1. Community being able to access the build page for build statuses.
> >2. Committers being able to login with apache credentials.
> >3. Hook setup from apache/incubator-mxnet repo to Jenkins master.
> >
> >
> > Irrespective of the solution we come up, I think we should initiate a
> > technical design discussion on how to setup the CI build system.
> Probably 1
> > or 2 pager documents with the architecture and review with Infra and
> > community members.
> >
> > ***There were few proposal and discussion on the slack channel, to reach
> > wider community members, moving that discussion formally to this list.
> >
> >
> > My Proposal: Option 1 - Set up separate Jenkins CI build system.
> >
> > Thanks,
> >
> > Sandeep
> >
> >
> >
> > --
> > Sandeep Krishnamurthy
> >
>


Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-11-01 Thread Hen
Random question - can the CI be split such that the Apache CI is doing a
basic set of checks on that hardware, and is hooked to a PR, while there is
a larger "Is trunk good for release?" test that is running periodically
rather than on every PR?

i.e., do we need each PR to be run on varied hardware, or can we have this
two-tier approach?

Hen

On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
sandeep.krishn...@gmail.com> wrote:

> Hello all,
>
> I am hereby opening up a discussion thread on how we can stabilize Apache
> MXNet CI build system.
>
> Problems:
>
> 
>
> Recently, we have seen following issues with Apache MXNet CI build systems:
>
>1. Apache Jenkins master is overloaded and we see issues like - unable
>to trigger builds, difficult to load and view the blue ocean and other
>Jenkins build status page.
>2. We are generating too many request/interaction on Apache Infra team.
>   1. Addition/deletion of new slave: Caused from scaling activity,
>   recycling, troubleshooting or any actions leading to change of slave
>   machines.
>   2. Plugins / other Jenkins Master configurations.
>   3. Experimentation on CI pipelines.
>3. Harder to debug and resolve issues - Since access to master and slave
>is not with the same community, it requires Infra and community to dive
>deep together on all action items.
>
> Possible Solutions:
>
> ==
>
>1. Can we set up a separate Jenkins CI build system for Apache MXNet
>outside Apache Infra?
>2. Can we have a separate Jenkins Master in Apache Infra for MXNet?
>3. Review design of current setup, refine and fill the gaps.
>
> @ Mentors/Infra team/Community:
>
> ==
>
> Please provide your suggestions on how we can proceed further and work on
> stabilizing the CI build systems for MXNet.
>
> Also, if the community decides on separate Jenkins CI build system, what
> important points should be taken care of apart from the below:
>
>1. Community being able to access the build page for build statuses.
>2. Committers being able to login with apache credentials.
>3. Hook setup from apache/incubator-mxnet repo to Jenkins master.
>
>
> Irrespective of the solution we come up, I think we should initiate a
> technical design discussion on how to setup the CI build system. Probably 1
> or 2 pager documents with the architecture and review with Infra and
> community members.
>
> ***There were few proposal and discussion on the slack channel, to reach
> wider community members, moving that discussion formally to this list.
>
>
> My Proposal: Option 1 - Set up separate Jenkins CI build system.
>
> Thanks,
>
> Sandeep
>
>
>
> --
> Sandeep Krishnamurthy
>


Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-11-01 Thread Hen
Some inline thoughts.

On Wed, Nov 1, 2017 at 9:41 AM, Bhavin Thaker 
wrote:

> Few comments/suggestions:
>
> 1) Can  we have this nice list of todo items on the Apache MXNet wiki page
> to track them better?
>
> 2) Can we have a set of owners for each set of tests and source code
> directory? One of the problems I have observed is that when there is a test
> failure, it is difficult to find an owner who will take the responsibility
> of fixing the test OR identifying the culprit code promptly -- this causes
> the master to continue to fail for many days.
>

On this one, we're all volunteers and there shouldn't be situations of
"Bob's permission is needed to edit this file", or "We're waiting on Alice
to do that work". The project as a whole owns this.

Agreed that this can cause a tragedy of the commons, but raising the bar on
being a committer to someone who has the privilege of 24/7 time on the
project is worse.

As an employer of contributors, something you could do internally at Amazon
is to identify experts who own (from Amazon's point of view) contributions
to that area and they can be the ones you poke on an issue (internally).


>
> 3) Specifically, we need an owner for the Windows setup -- nobody seems to
> know much about it -- please feel free to correct me if required.
>

If there's no one in the community who can support it, then a) we should
seek someone (help wanted etc) on the lists/website/twitter, and b) if that
fails, we should move it to a contrib/deprecated path.


>
> 4) +1 to have a list of all feature requests on Jira or a similar commonly
> and easily accessible system.
>
> 5) -1 to the branching model -- I was the gatekeeper for the branching
> model at Informix for the database kernel code to be merged to master along
> with my day-job of being a database kernel engineer for around 9 months and
> hence have the opinion that a branching model just shifts the burden from
> one place to another. We don't have a dedicated team to do the branching
> model. If we really need a buildable master everyday, then we could just
> tag every successful build as last_clean_build on master -- use this tag to
> get a clean master at any time. How many Apache projects are doing
> development on separate branches?
>

Typically I would expect separate branch development to happen when a project
is experimenting with multiple futures. Most projects do have multiple
branches (I'd guess typically only 2) to support bugfixes to older versions
and new code on newer versions though.
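
For reference, the tagging alternative from the quoted point 5 could look
roughly like the sketch below. It assumes a CI job that knows the commit SHA
and whether the build passed. The function names and the dry-run default are
hypothetical; the only thing taken from the quoted mail is the tag name
last_clean_build.

```python
import subprocess
from typing import List

# Tag name suggested in the quoted point 5.
TAG_NAME = "last_clean_build"


def tag_commands(commit_sha: str, remote: str = "origin") -> List[List[str]]:
    """Git commands that move the tag to commit_sha and publish it."""
    return [
        ["git", "tag", "--force", TAG_NAME, commit_sha],
        ["git", "push", "--force", remote, f"refs/tags/{TAG_NAME}"],
    ]


def mark_clean_build(commit_sha: str, build_passed: bool,
                     dry_run: bool = True) -> List[List[str]]:
    """Move the tag only after a successful build; return the commands used."""
    if not build_passed:
        return []
    commands = tag_commands(commit_sha)
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)
    return commands
```

Anyone who needs a buildable master could then check out the tag instead of
the branch head.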


>
> 6) FYI: Rahul (rahul003@) has fixed various warnings with this PR:
> https://github.com/apache/incubator-mxnet/pull/7109 and has a test added
> that fails for any warning found. We can build on top of his work.
>
> 7) FYI: For the unit-tests problems, Meghna identified that some of the
> unit-test run times have increased significantly in the recent builds. We
> need volunteers to help diagnose the root-cause here:
>
> Unit Test Task       Build #337   Build #500   Build #556
> Python 2: GPU Win    25           38           40
> Python 3: GPU Win    15           38           46
> Python 2: CPU        25           35           80
> Python 3: CPU        14           28           72
> R: CPU               20           34           24
> R: GPU               5            24           24
>
>
> 8) Ensure that all PRs submitted have corresponding documentation on
> http://mxnet.io for it.  It may be fine to have documentation follow the
> code changes as long as there is ownership that this task will be done in a
> timely manner.  For example, I have requested the Nvidia team to submit PRs
> to update documentation on http://mxnet.io for the Volta changes to MXNet.
>

Why not expect documentation as a part of the PR?


>
>
> 9) Ensure that mega-PRs have some level of design or architecture
> document(s) shared on the Apache MXNet wiki. The mega-PR must have both
> unit-tests and nightly/integration tests submitted to demonstrate
> high-quality level.
>

+1. These are the ones that should be having a dev@ discussion.


>
>
> 10) Finally, how do we get ownership for code submitted to MXNet? When
> something fails in a code segment that only a small set of folks know
> about, what is the expected SLA for a response from them? When users deploy
> MXNet in production environments, they will expect some form of SLA for
> support and a patch release.
>

Users can expect what they want. What they get is best effort/good
intentions. If they want someone to supply an SLA, then they can pay a
vendor who repackages MXNet/builds upon MXNet for that service.

Part of the value of Open Source is that users can always fix the issue
themselves, they are not beholden to a third party to fix it for them (and
thus need an SLA). For something like OpenOffice there is an obvious issue
there, many of its users would need longer to come up to speed to fix the
issue and the likely reply; but for MXNet, many of its users do know how to
code and don't need to go learn a programming language before starting to
look at the bug. This is also wh

Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-11-01 Thread kellen sunderland
On point 7: I did a little bit of measuring / profiling of our test runs a
week or two ago and came to the same conclusion.  I assumed the slowdowns
were mostly due to new tests which had recently been added.  There were
quite a few gluon tests added, for example, and I think they're fairly
resource intensive.

On Wed, Nov 1, 2017 at 6:40 PM, kellen sunderland <
kellen.sunderl...@gmail.com> wrote:

> Bhavin: I would add on point 5 that it doesn't always make sense to attach
> ownership for the broken integration test to the PR author.  We're planning
> extensive integration tests on a variety of hardware.  Some of these test
> failures won't be reproducible by most PR authors and the effort to resolve
> these failures should be delegated to a test owner.  Agree with Pedro that
> this would be strictly fast-fwd merging from one branch to another after
> integration tests pass, so it shouldn't require much extra work beyond
> fixing failures.
>
> On Wed, Nov 1, 2017 at 6:35 PM, Pedro Larroy  > wrote:
>
>> Hi Bhavin
>>
>> Good suggestions.
>>
>> I wanted to respond to your point #5
>>
>> The promotion of integration to master would be done automatically by
>> jenkins once a commit passes the nightly tests. So it should not
>> impose any additional burden on the developers, as there is no manual
>> step involved / human gatekeeper.
>>
>> It would be equivalent to your suggestion with tags. You can do the
>> same with branches, anyway a git branch is just a pointer to some
>> commit, so I think we are talking about the same.
>>
>> Pedro.
>>
>>
>>
>>
>> On Wed, Nov 1, 2017 at 5:41 PM, Bhavin Thaker 
>> wrote:
>> > Few comments/suggestions:
>> >
>> > 1) Can  we have this nice list of todo items on the Apache MXNet wiki
>> page
>> > to track them better?
>> >
>> > 2) Can we have a set of owners for each set of tests and source code
>> > directory? One of the problems I have observed is that when there is a
>> test
>> > failure, it is difficult to find an owner who will take the
>> responsibility
>> > of fixing the test OR identifying the culprit code promptly -- this
>> causes
>> > the master to continue to fail for many days.
>> >
>> > 3) Specifically, we need an owner for the Windows setup -- nobody seems
>> to
>> > know much about it -- please feel free to correct me if required.
>> >
>> > 4) +1 to have a list of all feature requests on Jira or a similar
>> commonly
>> > and easily accessible system.
>> >
>> > 5) -1 to the branching model -- I was the gatekeeper for the branching
>> > model at Informix for the database kernel code to be merged to master
>> along
>> > with my day-job of being a database kernel engineer for around 9 months
>> and
>> > hence have the opinion that a branching model just shifts the burden
>> from
>> > one place to another. We don't have a dedicated team to do the branching
>> > model. If we really need a buildable master everyday, then we could just
>> > tag every successful build as last_clean_build on master -- use this
>> tag to
>> > get a clean master at any time. How many Apache projects are doing
>> > development on separate branches?
>> >
>> > 6) FYI: Rahul (rahul003@) has fixed various warnings with this PR:
>> > https://github.com/apache/incubator-mxnet/pull/7109 and has a test
>> added
>> > that fails for any warning found. We can build on top of his work.
>> >
>> > 7) FYI: For the unit-tests problems, Meghna identified that some of the
>> > unit-test run times have increased significantly in the recent builds.
>> We
>> > need volunteers to help diagnose the root-cause here:
>> >
>> > Unit Test Task       Build #337   Build #500   Build #556
>> > Python 2: GPU Win        25           38           40
>> > Python 3: GPU Win        15           38           46
>> > Python 2: CPU            25           35           80
>> > Python 3: CPU            14           28           72
>> > R: CPU                   20           34           24
>> > R: GPU                    5           24           24
>> >
>> >
>> > 8) Ensure that all PRs submitted have corresponding documentation on
>> > http://mxnet.io for it.  It may be fine to have documentation follow
>> the
>> > code changes as long as there is ownership that this task will be done
>> in a
>> > timely manner.  For example, I have requested the Nvidia team to submit
>> PRs
>> > to update documentation on http://mxnet.io for the Volta changes to
>> MXNet.
>> >
>> >
>> > 9) Ensure that mega-PRs have some level of design or architecture
>> > document(s) shared on the Apache MXNet wiki. The mega-PR must have both
>> > unit-tests and nightly/integration tests submitted to demonstrate
>> > high-quality level.
>> >
>> >
>> > 10) Finally, how do we get ownership for code submitted to MXNet? When
>> > something fails in a code segment that only a small set of folks know
>> > about, what is the expected SLA for a response from them? When users
>> deploy
>> > MXNet in production environments, they will expect 

Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-11-01 Thread kellen sunderland
Bhavin: I would add on point 5 that it doesn't always make sense to attach
ownership for the broken integration test to the PR author.  We're planning
extensive integration tests on a variety of hardware.  Some of these test
failures won't be reproducible by most PR authors and the effort to resolve
these failures should be delegated to a test owner.  Agree with Pedro that
this would be strictly fast-fwd merging from one branch to another after
integration tests pass, so it shouldn't require much extra work beyond
fixing failures.

On Wed, Nov 1, 2017 at 6:35 PM, Pedro Larroy 
wrote:

> Hi Bhavin
>
> Good suggestions.
>
> I wanted to respond to your point #5
>
> The promotion of integration to master would be done automatically by
> jenkins once a commit passes the nightly tests. So it should not
> impose any additional burden on the developers, as there is no manual
> step involved / human gatekeeper.
>
> It would be equivalent to your suggestion with tags. You can do the
> same with branches, anyway a git branch is just a pointer to some
> commit, so I think we are talking about the same.
>
> Pedro.
>
>
>
>
> On Wed, Nov 1, 2017 at 5:41 PM, Bhavin Thaker 
> wrote:
> > Few comments/suggestions:
> >
> > 1) Can  we have this nice list of todo items on the Apache MXNet wiki
> page
> > to track them better?
> >
> > 2) Can we have a set of owners for each set of tests and source code
> > directory? One of the problems I have observed is that when there is a
> test
> > failure, it is difficult to find an owner who will take the
> responsibility
> > of fixing the test OR identifying the culprit code promptly -- this
> causes
> > the master to continue to fail for many days.
> >
> > 3) Specifically, we need an owner for the Windows setup -- nobody seems
> to
> > know much about it -- please feel free to correct me if required.
> >
> > 4) +1 to have a list of all feature requests on Jira or a similar
> commonly
> > and easily accessible system.
> >
> > 5) -1 to the branching model -- I was the gatekeeper for the branching
> > model at Informix for the database kernel code to be merged to master
> along
> > with my day-job of being a database kernel engineer for around 9 months
> and
> > hence have the opinion that a branching model just shifts the burden from
> > one place to another. We don't have a dedicated team to do the branching
> > model. If we really need a buildable master everyday, then we could just
> > tag every successful build as last_clean_build on master -- use this tag
> to
> > get a clean master at any time. How many Apache projects are doing
> > development on separate branches?
> >
> > 6) FYI: Rahul (rahul003@) has fixed various warnings with this PR:
> > https://github.com/apache/incubator-mxnet/pull/7109 and has a test added
> > that fails for any warning found. We can build on top of his work.
> >
> > 7) FYI: For the unit-tests problems, Meghna identified that some of the
> > unit-test run times have increased significantly in the recent builds. We
> > need volunteers to help diagnose the root-cause here:
> >
> > Unit Test Task       Build #337   Build #500   Build #556
> > Python 2: GPU Win        25           38           40
> > Python 3: GPU Win        15           38           46
> > Python 2: CPU            25           35           80
> > Python 3: CPU            14           28           72
> > R: CPU                   20           34           24
> > R: GPU                    5           24           24
> >
> >
> > 8) Ensure that all PRs submitted have corresponding documentation on
> > http://mxnet.io for it.  It may be fine to have documentation follow the
> > code changes as long as there is ownership that this task will be done
> in a
> > timely manner.  For example, I have requested the Nvidia team to submit
> PRs
> > to update documentation on http://mxnet.io for the Volta changes to
> MXNet.
> >
> >
> > 9) Ensure that mega-PRs have some level of design or architecture
> > document(s) shared on the Apache MXNet wiki. The mega-PR must have both
> > unit-tests and nightly/integration tests submitted to demonstrate
> > high-quality level.
> >
> >
> > 10) Finally, how do we get ownership for code submitted to MXNet? When
> > something fails in a code segment that only a small set of folks know
> > about, what is the expected SLA for a response from them? When users
> deploy
> > MXNet in production environments, they will expect some form of SLA for
> > support and a patch release.
> >
> >
> > Regards,
> > Bhavin Thaker.
> >
> >
> >
> >
> >
> >
> > On Wed, Nov 1, 2017 at 8:20 AM, Pedro Larroy <
> pedro.larroy.li...@gmail.com>
> > wrote:
> >
> >> +1  That would be great.
> >>
> >> On Mon, Oct 30, 2017 at 5:35 PM, Hen  wrote:
> >> > How about we ask for a new mxnet repo to store all the config in?
> >> >
> >> > On Fri, Oct 27, 2017 at 05:30 Pedro Larroy <
> pedro.larroy.li...@gmail.com
> >> >
> >> > wrote:
> >> >
> >> >> Just to provide a high level overview of the ideas and proposals
> >> >> comin

Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-11-01 Thread Pedro Larroy
Hi Bhavin

Good suggestions.

I wanted to respond to your point #5

The promotion of integration to master would be done automatically by
Jenkins once a commit passes the nightly tests. So it should not
impose any additional burden on the developers, as there is no manual
step / human gatekeeper involved.

It would be equivalent to your suggestion with tags. You can do the
same with branches; a git branch is just a pointer to some commit
anyway, so I think we are talking about the same thing.
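
For illustration, here is a minimal sketch in Python of what such an
automated promotion step could look like. It is not an existing Jenkins
job: the function names, the `origin` remote, and the way the nightly
result is passed in are all assumptions made for the example. It moves
master forward with a fast-forward-only merge and also moves the
last_clean_build tag from point #5, to show that both variants just
repoint a name at a tested commit.

```python
import subprocess


def promotion_commands(commit, target="master", tag="last_clean_build"):
    """Return the git commands that promote an already-tested commit.

    The remote name "origin" and the default branch/tag names are
    illustrative assumptions.
    """
    return [
        ["git", "fetch", "origin"],
        ["git", "checkout", target],
        # --ff-only never creates a merge commit: the branch pointer either
        # moves forward to the tested commit or git exits with an error.
        ["git", "merge", "--ff-only", commit],
        # Tag variant: repoint the tag at the last commit known to be clean.
        ["git", "tag", "--force", tag, commit],
        ["git", "push", "origin", target],
        ["git", "push", "--force", "origin", "refs/tags/" + tag],
    ]


def promote(commit, nightly_passed, run=subprocess.check_call):
    """Run the promotion only if the nightly tests passed for `commit`."""
    if not nightly_passed:
        return False
    for cmd in promotion_commands(commit):
        run(cmd)
    return True
```

The `run` parameter exists only so the command list can be inspected
without touching a real repository; a CI job would leave it at its
default.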

Pedro.




On Wed, Nov 1, 2017 at 5:41 PM, Bhavin Thaker  wrote:
> Few comments/suggestions:
>
> 1) Can  we have this nice list of todo items on the Apache MXNet wiki page
> to track them better?
>
> 2) Can we have a set of owners for each set of tests and source code
> directory? One of the problems I have observed is that when there is a test
> failure, it is difficult to find an owner who will take the responsibility
> of fixing the test OR identifying the culprit code promptly -- this causes
> the master to continue to fail for many days.
>
> 3) Specifically, we need an owner for the Windows setup -- nobody seems to
> know much about it -- please feel free to correct me if required.
>
> 4) +1 to have a list of all feature requests on Jira or a similar commonly
> and easily accessible system.
>
> 5) -1 to the branching model -- I was the gatekeeper for the branching
> model at Informix for the database kernel code to be merged to master along
> with my day-job of being a database kernel engineer for around 9 months and
> hence have the opinion that a branching model just shifts the burden from
> one place to another. We don't have a dedicated team to do the branching
> model. If we really need a buildable master everyday, then we could just
> tag every successful build as last_clean_build on master -- use this tag to
> get a clean master at any time. How many Apache projects are doing
> development on separate branches?
>
> 6) FYI: Rahul (rahul003@) has fixed various warnings with this PR:
> https://github.com/apache/incubator-mxnet/pull/7109 and has a test added
> that fails for any warning found. We can build on top of his work.
>
> 7) FYI: For the unit-tests problems, Meghna identified that some of the
> unit-test run times have increased significantly in the recent builds. We
> need volunteers to help diagnose the root-cause here:
>
> Unit Test Task       Build #337   Build #500   Build #556
> Python 2: GPU Win        25           38           40
> Python 3: GPU Win        15           38           46
> Python 2: CPU            25           35           80
> Python 3: CPU            14           28           72
> R: CPU                   20           34           24
> R: GPU                    5           24           24
>
>
> 8) Ensure that all PRs submitted have corresponding documentation on
> http://mxnet.io for it.  It may be fine to have documentation follow the
> code changes as long as there is ownership that this task will be done in a
> timely manner.  For example, I have requested the Nvidia team to submit PRs
> to update documentation on http://mxnet.io for the Volta changes to MXNet.
>
>
> 9) Ensure that mega-PRs have some level of design or architecture
> document(s) shared on the Apache MXNet wiki. The mega-PR must have both
> unit-tests and nightly/integration tests submitted to demonstrate
> high-quality level.
>
>
> 10) Finally, how do we get ownership for code submitted to MXNet? When
> something fails in a code segment that only a small set of folks know
> about, what is the expected SLA for a response from them? When users deploy
> MXNet in production environments, they will expect some form of SLA for
> support and a patch release.
>
>
> Regards,
> Bhavin Thaker.
>
>
>
>
>
>
> On Wed, Nov 1, 2017 at 8:20 AM, Pedro Larroy 
> wrote:
>
>> +1  That would be great.
>>
>> On Mon, Oct 30, 2017 at 5:35 PM, Hen  wrote:
>> > How about we ask for a new mxnet repo to store all the config in?
>> >
>> > On Fri, Oct 27, 2017 at 05:30 Pedro Larroy > >
>> > wrote:
>> >
>> >> Just to provide a high level overview of the ideas and proposals
>> >> coming from different sources for the requirements for testing and
>> >> validation of builds:
>> >>
>> >> * Have terraform files for the testing infrastructure. Infrastructure
>> >> as code (IaC). Minus not emulated / nor cloud based, embedded
>> >> hardware. ("single command" replication of the testing infrastructure,
>> >> no manual steps).
>> >>
>> >> * CI software based on Jenkins, unless someone thinks there's a better
>> >> alternative.
>> >>
>> >> * Use autoscaling groups and improve staggered build + test steps to
>> >> achieve higher parallelism and shorter feedback times.
>> >>
>> >> * Switch to a branching model based on stable master + integration
>> >> branch. PRs are merged into dev/integration which runs extended
>> >> nightly tests, which are
>> >> then merged into master, preferably in an automated way after
>> >> successful extended testing.
>> >> Master is always tested, and always buildable. Release branches or
>> >> tags in master as usual for releases.
>>

Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-11-01 Thread Bhavin Thaker
A few comments/suggestions:

1) Can we have this nice list of todo items on the Apache MXNet wiki page
to track them better?

2) Can we have a set of owners for each set of tests and source code
directory? One of the problems I have observed is that when there is a test
failure, it is difficult to find an owner who will take the responsibility
of fixing the test OR identifying the culprit code promptly -- this causes
the master to continue to fail for many days.

3) Specifically, we need an owner for the Windows setup -- nobody seems to
know much about it -- please feel free to correct me if required.

4) +1 to have a list of all feature requests on Jira or a similar commonly
and easily accessible system.

5) -1 to the branching model -- I was the gatekeeper for the branching
model at Informix, merging database kernel code to master alongside my
day job as a database kernel engineer, for around 9 months. Hence my
opinion that a branching model just shifts the burden from one place to
another. We don't have a dedicated team to run the branching model. If we
really need a buildable master every day, then we could just tag every
successful build as last_clean_build on master -- use this tag to get a
clean master at any time. How many Apache projects are doing development
on separate branches?

6) FYI: Rahul (rahul003@) has fixed various warnings with this PR:
https://github.com/apache/incubator-mxnet/pull/7109 and has a test added
that fails for any warning found. We can build on top of his work.

7) FYI: For the unit-tests problems, Meghna identified that some of the
unit-test run times have increased significantly in the recent builds. We
need volunteers to help diagnose the root-cause here:

Unit Test Task       Build #337   Build #500   Build #556
Python 2: GPU Win        25           38           40
Python 3: GPU Win        15           38           46
Python 2: CPU            25           35           80
Python 3: CPU            14           28           72
R: CPU                   20           34           24
R: GPU                    5           24           24


8) Ensure that all PRs submitted have corresponding documentation on
http://mxnet.io.  It may be fine to have documentation follow the
code changes as long as there is ownership that this task will be done in a
timely manner.  For example, I have requested the Nvidia team to submit PRs
to update documentation on http://mxnet.io for the Volta changes to MXNet.


9) Ensure that mega-PRs have some level of design or architecture
document(s) shared on the Apache MXNet wiki. The mega-PR must have both
unit-tests and nightly/integration tests submitted to demonstrate a high
level of quality.


10) Finally, how do we get ownership for code submitted to MXNet? When
something fails in a code segment that only a small set of folks know
about, what is the expected SLA for a response from them? When users deploy
MXNet in production environments, they will expect some form of SLA for
support and a patch release.
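
To help volunteers prioritise the unit-test slowdowns in point 7, here is
a small Python sketch that ranks the tasks by how much slower build #556
is than build #337. The numbers are copied from the table above; the unit
is not stated there, so only ratios are computed. The names `TIMINGS` and
`slowdowns` are made up for this example.

```python
# Unit-test timings from point 7: (build #337, build #500, build #556).
# The unit is not given in the thread, so only ratios are derived.
TIMINGS = {
    "Python 2: GPU Win": (25, 38, 40),
    "Python 3: GPU Win": (15, 38, 46),
    "Python 2: CPU": (25, 35, 80),
    "Python 3: CPU": (14, 28, 72),
    "R: CPU": (20, 34, 24),
    "R: GPU": (5, 24, 24),
}


def slowdowns(timings):
    """Return (task, latest / first build ratio) pairs, worst first."""
    ratios = [(task, runs[-1] / runs[0]) for task, runs in timings.items()]
    return sorted(ratios, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for task, ratio in slowdowns(TIMINGS):
        print(f"{task:<18} {ratio:.1f}x")
```

By this measure the Python 3 CPU task regressed the most (72 vs 14) and
the R CPU task the least (24 vs 20), which may suggest where to start
looking.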


Regards,
Bhavin Thaker.






On Wed, Nov 1, 2017 at 8:20 AM, Pedro Larroy 
wrote:

> +1  That would be great.
>
> On Mon, Oct 30, 2017 at 5:35 PM, Hen  wrote:
> > How about we ask for a new mxnet repo to store all the config in?
> >
> > On Fri, Oct 27, 2017 at 05:30 Pedro Larroy  >
> > wrote:
> >
> >> Just to provide a high level overview of the ideas and proposals
> >> coming from different sources for the requirements for testing and
> >> validation of builds:
> >>
> >> * Have terraform files for the testing infrastructure. Infrastructure
> >> as code (IaC). Minus not emulated / nor cloud based, embedded
> >> hardware. ("single command" replication of the testing infrastructure,
> >> no manual steps).
> >>
> >> * CI software based on Jenkins, unless someone thinks there's a better
> >> alternative.
> >>
> >> * Use autoscaling groups and improve staggered build + test steps to
> >> achieve higher parallelism and shorter feedback times.
> >>
> >> * Switch to a branching model based on stable master + integration
> >> branch. PRs are merged into dev/integration which runs extended
> >> nightly tests, which are
> >> then merged into master, preferably in an automated way after
> >> successful extended testing.
> >> Master is always tested, and always buildable. Release branches or
> >> tags in master as usual for releases.
> >>
> >> * Build + test feedback time targeting less than 15 minutes.
> >> (Currently a build in a 16x core takes 7m). This involves lot of
> >> refactoring of tests, move expensive tests / big smoke tests to
> >> nightlies on the integration branch, also tests on IoT devices / power
> >> and performance regressions...
> >>
> >> * Add code coverage and other quality metrics.
> >>
> >> * Eliminate warnings and treat warnings as errors. We have spent time
> >> tracking down "undefined behaviour" bugs that could have been caught
> >> by compiler warnings.
> >>
> >> Is there something I'm missing or additional things that come to your
> >> mind that you would wish to add?
> >>
> >> Pedro.
> >>
>


Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-11-01 Thread Pedro Larroy
+1  That would be great.

On Mon, Oct 30, 2017 at 5:35 PM, Hen  wrote:
> How about we ask for a new mxnet repo to store all the config in?
>
> On Fri, Oct 27, 2017 at 05:30 Pedro Larroy 
> wrote:
>
>> Just to provide a high level overview of the ideas and proposals
>> coming from different sources for the requirements for testing and
>> validation of builds:
>>
>> * Have terraform files for the testing infrastructure. Infrastructure
>> as code (IaC). Minus not emulated / nor cloud based, embedded
>> hardware. ("single command" replication of the testing infrastructure,
>> no manual steps).
>>
>> * CI software based on Jenkins, unless someone thinks there's a better
>> alternative.
>>
>> * Use autoscaling groups and improve staggered build + test steps to
>> achieve higher parallelism and shorter feedback times.
>>
>> * Switch to a branching model based on stable master + integration
>> branch. PRs are merged into dev/integration which runs extended
>> nightly tests, which are
>> then merged into master, preferably in an automated way after
>> successful extended testing.
>> Master is always tested, and always buildable. Release branches or
>> tags in master as usual for releases.
>>
>> * Build + test feedback time targeting less than 15 minutes.
>> (Currently a build in a 16x core takes 7m). This involves lot of
>> refactoring of tests, move expensive tests / big smoke tests to
>> nightlies on the integration branch, also tests on IoT devices / power
>> and performance regressions...
>>
>> * Add code coverage and other quality metrics.
>>
>> * Eliminate warnings and treat warnings as errors. We have spent time
>> tracking down "undefined behaviour" bugs that could have been caught
>> by compiler warnings.
>>
>> Is there something I'm missing or additional things that come to your
>> mind that you would wish to add?
>>
>> Pedro.
>>


Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-10-30 Thread Hen
How about we ask for a new mxnet repo to store all the config in?

On Fri, Oct 27, 2017 at 05:30 Pedro Larroy 
wrote:

> Just to provide a high level overview of the ideas and proposals
> coming from different sources for the requirements for testing and
> validation of builds:
>
> * Have terraform files for the testing infrastructure. Infrastructure
> as code (IaC). Minus not emulated / nor cloud based, embedded
> hardware. ("single command" replication of the testing infrastructure,
> no manual steps).
>
> * CI software based on Jenkins, unless someone thinks there's a better
> alternative.
>
> * Use autoscaling groups and improve staggered build + test steps to
> achieve higher parallelism and shorter feedback times.
>
> * Switch to a branching model based on stable master + integration
> branch. PRs are merged into dev/integration which runs extended
> nightly tests, which are
> then merged into master, preferably in an automated way after
> successful extended testing.
> Master is always tested, and always buildable. Release branches or
> tags in master as usual for releases.
>
> * Build + test feedback time targeting less than 15 minutes.
> (Currently a build in a 16x core takes 7m). This involves lot of
> refactoring of tests, move expensive tests / big smoke tests to
> nightlies on the integration branch, also tests on IoT devices / power
> and performance regressions...
>
> * Add code coverage and other quality metrics.
>
> * Eliminate warnings and treat warnings as errors. We have spent time
> tracking down "undefined behaviour" bugs that could have been caught
> by compiler warnings.
>
> Is there something I'm missing or additional things that come to your
> mind that you would wish to add?
>
> Pedro.
>


Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-10-27 Thread Suneel Marthi
+1

On Sat, Oct 28, 2017 at 5:29 AM, Chris Olivier 
wrote:

> IMHO, it would be nice to have Apache JIRA for mxnet where these sort of
> feature requests could be entered and publicly tracked and possibly taken
> up by whoever has cycles with the JIRA helping to avoid overlapping work.
> After the core system works, of course. WDYT?
>
> On Fri, Oct 27, 2017 at 5:30 AM, Pedro Larroy <
> pedro.larroy.li...@gmail.com>
> wrote:
>
> > Just to provide a high level overview of the ideas and proposals
> > coming from different sources for the requirements for testing and
> > validation of builds:
> >
> > * Have terraform files for the testing infrastructure. Infrastructure
> > as code (IaC). Minus not emulated / nor cloud based, embedded
> > hardware. ("single command" replication of the testing infrastructure,
> > no manual steps).
> >
> > * CI software based on Jenkins, unless someone thinks there's a better
> > alternative.
> >
> > * Use autoscaling groups and improve staggered build + test steps to
> > achieve higher parallelism and shorter feedback times.
> >
> > * Switch to a branching model based on stable master + integration
> > branch. PRs are merged into dev/integration which runs extended
> > nightly tests, which are
> > then merged into master, preferably in an automated way after
> > successful extended testing.
> > Master is always tested, and always buildable. Release branches or
> > tags in master as usual for releases.
> >
> > * Build + test feedback time targeting less than 15 minutes.
> > (Currently a build in a 16x core takes 7m). This involves lot of
> > refactoring of tests, move expensive tests / big smoke tests to
> > nightlies on the integration branch, also tests on IoT devices / power
> > and performance regressions...
> >
> > * Add code coverage and other quality metrics.
> >
> > * Eliminate warnings and treat warnings as errors. We have spent time
> > tracking down "undefined behaviour" bugs that could have been caught
> > by compiler warnings.
> >
> > Is there something I'm missing or additional things that come to your
> > mind that you would wish to add?
> >
> > Pedro.
> >
>


Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-10-27 Thread Chris Olivier
IMHO, it would be nice to have an Apache JIRA for mxnet where these sorts
of feature requests could be entered and publicly tracked, and possibly
taken up by whoever has cycles, with the JIRA helping to avoid overlapping
work. After the core system works, of course. WDYT?

On Fri, Oct 27, 2017 at 5:30 AM, Pedro Larroy 
wrote:

> Just to provide a high level overview of the ideas and proposals
> coming from different sources for the requirements for testing and
> validation of builds:
>
> * Have terraform files for the testing infrastructure. Infrastructure
> as code (IaC). Minus not emulated / nor cloud based, embedded
> hardware. ("single command" replication of the testing infrastructure,
> no manual steps).
>
> * CI software based on Jenkins, unless someone thinks there's a better
> alternative.
>
> * Use autoscaling groups and improve staggered build + test steps to
> achieve higher parallelism and shorter feedback times.
>
> * Switch to a branching model based on stable master + integration
> branch. PRs are merged into dev/integration which runs extended
> nightly tests, which are
> then merged into master, preferably in an automated way after
> successful extended testing.
> Master is always tested, and always buildable. Release branches or
> tags in master as usual for releases.
>
> * Build + test feedback time targeting less than 15 minutes.
> (Currently a build in a 16x core takes 7m). This involves lot of
> refactoring of tests, move expensive tests / big smoke tests to
> nightlies on the integration branch, also tests on IoT devices / power
> and performance regressions...
>
> * Add code coverage and other quality metrics.
>
> * Eliminate warnings and treat warnings as errors. We have spent time
> tracking down "undefined behaviour" bugs that could have been caught
> by compiler warnings.
>
> Is there something I'm missing or additional things that come to your
> mind that you would wish to add?
>
> Pedro.
>


Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-10-27 Thread Pedro Larroy
Just to provide a high level overview of the ideas and proposals
coming from different sources for the requirements for testing and
validation of builds:

* Have Terraform files for the testing infrastructure: infrastructure
as code (IaC), excluding embedded hardware that is neither emulated
nor cloud based. ("Single command" replication of the testing
infrastructure, no manual steps.)

* CI software based on Jenkins, unless someone thinks there's a better
alternative.

* Use autoscaling groups and improve staggered build + test steps to
achieve higher parallelism and shorter feedback times.

* Switch to a branching model based on stable master + integration
branch. PRs are merged into dev/integration which runs extended
nightly tests, which are
then merged into master, preferably in an automated way after
successful extended testing.
Master is always tested, and always buildable. Release branches or
tags in master as usual for releases.

* Build + test feedback time targeting less than 15 minutes.
(Currently a build in a 16x core takes 7m). This involves a lot of
refactoring of tests, moving expensive tests / big smoke tests to
nightlies on the integration branch, also tests on IoT devices / power
and performance regressions...

* Add code coverage and other quality metrics.

* Eliminate warnings and treat warnings as errors. We have spent time
tracking down "undefined behaviour" bugs that could have been caught
by compiler warnings.
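
As a rough illustration of the last point, here is a Python sketch of a
CI check that fails when a build log contains compiler warnings. It is
not the test from any existing PR; the regular expression assumes
GCC/Clang style diagnostics ("file:line:col: warning: ..."), and the
function names are invented for the example. On the compiler side the
usual equivalent is a flag such as -Werror.

```python
import re

# Matches GCC/Clang style diagnostics, with or without a column number:
#   src/foo.cc:12:5: warning: unused variable 'x' [-Wunused-variable]
WARNING_RE = re.compile(
    r"^(?P<file>[^:\s]+):(?P<line>\d+):(?:\d+:)? warning: (?P<msg>.*)$"
)


def find_warnings(build_log):
    """Return a list of (file, line, message) for each warning in the log."""
    found = []
    for raw in build_log.splitlines():
        match = WARNING_RE.match(raw.strip())
        if match:
            found.append(
                (match.group("file"), int(match.group("line")), match.group("msg"))
            )
    return found


def exit_code_for(build_log):
    """Return 1 if the log contains any warning, else 0 (for use in CI)."""
    return 1 if find_warnings(build_log) else 0
```

A log-scanning check like this only reports what the compiler already
printed; it does not catch anything the enabled warning flags miss.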

Is there something I'm missing or additional things that come to your
mind that you would wish to add?

Pedro.


Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-10-26 Thread Pedro Larroy
Thanks for your input guys, I think we are all on a good track to get this
fixed. I'm confident that Meghna and Marco are going to drive this to
success. We are collecting ideas and requirements for the document on how
we will revamp the testing infrastructure. My only question right now is
where to store this document to collaborate. I don't seem to have
permissions in confluence to edit the wiki:
https://cwiki.apache.org/confluence/display/MXNET/Continuous+Integration

Should we otherwise use a shared Google Doc, a GitHub wiki, or something
else?

Please advise.

Pedro.

On Thu, Oct 26, 2017 at 8:14 AM, Meghna Baijal 
wrote:

> Thanks Sandeep for driving this discussion. I am also in contact with Pedro
> and his team to include their requirements.
> And thank you Sebastian, I will let you know!
>
> Meghna
>
> On Wed, Oct 25, 2017 at 11:05 PM, Sebastian 
> wrote:
>
> > @meghana @pedro let me know if you need someone with a mentor hat to open
> > tickets or send mail to infra, happy to help here.
> >
> > Best,
> > Sebastian
> >
> >
> > On 25.10.2017 23:18, sandeep krishnamurthy wrote:
> >
> >> Thank you, everyone, for the discussion, proposal, and the vote.
> >>
> >> The majority of community members see the current CI system for Apache
> >> MXNet as having issues with scaling and diverse test environments. The
> >> common suggestion is to have a separate CI setup for Apache MXNet.
> >>
> >> Following are the next steps:
> >>
> >> 1. Meghana proposed she would like to take the lead on this and come up
> >> with an initial tech design write up covering requirements, use-cases,
> >> alternate solutions and a proposed solution on how we could set up the
> CI
> >> system for MXNet.
> >> 2. This tech design will be reviewed in the community and following
> that,
> >> collaborate with Infra team and mentors to complete setup in the
> >> integration of the new system with Repo and Website and more.
> >>
> >> @Pedro Larry - We should sync up on understanding how we can unify the
> set
> >> up you have for various devices and the new set up being proposed and
> >> built. Ideally, we should have a unified CI setup for the project
> >> accessible to the community.
> >>
> >> Regards,
> >> Sandeep
> >>
> >> On Mon, Oct 23, 2017 at 7:29 AM, Pedro Larroy <
> >> pedro.larroy.li...@gmail.com>
> >> wrote:
> >>
> >>> +1
> >>>
> >>> We (with Kellen and Marco) are already working on a CI system that verifies
> >>> MXNet on devices, so far a work in progress, but at least we are checking
> >>> that the build is sane on Android, different arm flavors and ubuntu, also
> >>> building PRs. So far we are still working on having the unit tests pass on
> >>> some architectures like Jetson TX2 and ARM / Raspberry PI.
> >>>
> >>> http://ci.mxnet.amazon-ml.com/
> >>>
> >>> Agree with Steffen on creating a document with requirements and high level
> >>> architecture. Also I would like to have quicker feedback and as we
> >>> discussed before, saner unit tests. I think there's a big and nontrivial
> >>> amount of effort required here.
> >>>
> >>> Pedro.
> >>>

Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-10-25 Thread Meghna Baijal
Thanks Sandeep for driving this discussion. I am also in contact with Pedro
and his team to include their requirements.
And thank you Sebastian, I will let you know!

Meghna

On Wed, Oct 25, 2017 at 11:05 PM, Sebastian  wrote:

> @meghana @pedro let me know if you need someone with a mentor hat to open
> tickets or send mail to infra, happy to help here.
>
> Best,
> Sebastian
>
>
> On 25.10.2017 23:18, sandeep krishnamurthy wrote:
>
>> Thank you, everyone, for the discussion, proposal, and the vote.
>>
>> Here majority community members see current CI system for Apache MXNet is
>> having issues in scaling and diverse test environments. And the common
>> suggestion is to have a separate CI setup for Apache MXNet.
>>
>> Following are the next steps:
>>
>> 1. Meghana proposed she would like to take the lead on this and come up
>> with an initial tech design write up covering requirements, use-cases,
>> alternate solutions and a proposed solution on how we could set up the CI
>> system for MXNet.
>> 2. This tech design will be reviewed in the community and following that,
>> collaborate with Infra team and mentors to complete setup in the
>> integration of the new system with Repo and Website and more.
>>
>> @Pedro Larry - We should sync up on understanding how we can unify the set
>> up you have for various devices and the new set up being proposed and
>> built. Ideally, we should have a unified CI setup for the project
>> accessible to the community.
>>
>> Regards,
>> Sandeep
>>
>> On Mon, Oct 23, 2017 at 7:29 AM, Pedro Larroy <
>> pedro.larroy.li...@gmail.com>
>> wrote:
>>
>>> +1
>>>
>>> We (with Kellen and Marco) are already working on a CI system that verifies
>>> MXNet on devices, so far a work in progress, but at least we are checking
>>> that the build is sane on Android, different arm flavors and ubuntu, also
>>> building PRs. So far we are still working on having the unit tests pass on
>>> some architectures like Jetson TX2 and ARM / Raspberry PI.
>>>
>>> http://ci.mxnet.amazon-ml.com/
>>>
>>> Agree with Steffen on creating a document with requirements and high level
>>> architecture. Also I would like to have quicker feedback and as we
>>> discussed before, saner unit tests. I think there's a big and nontrivial
>>> amount of effort required here.
>>>
>>> Pedro.
>>>

Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-10-25 Thread Sebastian
@meghana @pedro let me know if you need someone with a mentor hat to 
open tickets or send mail to infra, happy to help here.


Best,
Sebastian

On 25.10.2017 23:18, sandeep krishnamurthy wrote:

Thank you, everyone, for the discussion, proposal, and the vote.

Here majority community members see current CI system for Apache MXNet is
having issues in scaling and diverse test environments. And the common
suggestion is to have a separate CI setup for Apache MXNet.

Following are the next steps:

1. Meghana proposed she would like to take the lead on this and come up
with an initial tech design write up covering requirements, use-cases,
alternate solutions and a proposed solution on how we could set up the CI
system for MXNet.
2. This tech design will be reviewed in the community and following that,
collaborate with Infra team and mentors to complete setup in the
integration of the new system with Repo and Website and more.

@Pedro Larry - We should sync up on understanding how we can unify the set
up you have for various devices and the new set up being proposed and
built. Ideally, we should have a unified CI setup for the project
accessible to the community.

Regards,
Sandeep

On Mon, Oct 23, 2017 at 7:29 AM, Pedro Larroy 
wrote:


+1

We (with Kellen and Marco) are already working on a CI system that verifies
MXNet on devices, so far a work in progress, but at least we are checking
that the build is sane on Android, different arm flavors and ubuntu, also
building PRs. So far we are still working on having the unit tests pass on
some architectures like Jetson TX2 and ARM / Raspberry PI.

http://ci.mxnet.amazon-ml.com/

Agree with Steffen on creating a document with requirements and high level
architecture. Also I would like to have quicker feedback and as we
discussed before, saner unit tests. I think there's a big and nontrivial
amount of effort required here.

Pedro.

On Mon, Oct 23, 2017 at 6:43 AM, Steffen Rochel 
wrote:


+1
I support Option 1 - Set up separate Jenkins CI build system. While the
Apache service is appropriate for some projects, our experience over the
last 6 months has not been meeting the needs of the MXNet (incubating)
project. AWS has been and will continue provide resources for such project.

Agree we should create a document summarizing the requirements and high
level architecture, which should answer the question of Jenkins or
alternative.

Steffen

On Sat, Oct 21, 2017 at 6:51 PM shiwen hu  wrote:


+1



Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-10-25 Thread sandeep krishnamurthy
Thank you, everyone, for the discussion, proposal, and the vote.

The majority of community members see the current CI system for Apache
MXNet as having issues with scaling and with diverse test environments. The
common suggestion is to have a separate CI setup for Apache MXNet.

The next steps are as follows:

1. Meghna proposed to take the lead on this and come up with an initial
tech design write-up covering requirements, use cases, alternative
solutions, and a proposed solution for how we could set up the CI system
for MXNet.
2. This tech design will be reviewed by the community. Following that, we
will collaborate with the Infra team and mentors to complete the setup and
the integration of the new system with the repo, the website, and more.

@Pedro Larroy - We should sync up on how we can unify the setup you have
for various devices with the new setup being proposed and built. Ideally,
we should have a unified CI setup for the project that is accessible to the
community.

Regards,
Sandeep

On Mon, Oct 23, 2017 at 7:29 AM, Pedro Larroy 
wrote:

> +1
>
> We (with Kellen and Marco) are already working on a CI system that verifies
> MXNet on devices, so far a work in progress, but at least we are checking
> that the build is sane on Android, different arm flavors and ubuntu, also
> building PRs. So far we are still working on having the unit tests pass on
> some architectures like Jetson TX2 and ARM / Raspberry PI.
>
> http://ci.mxnet.amazon-ml.com/
>
> Agree with Steffen on creating a document with requirements and high level
> architecture. Also I would like to have quicker feedback and as we
> discussed before, saner unit tests. I think there's a big and nontrivial
> amount of effort required here.
>
> Pedro.
>
> On Mon, Oct 23, 2017 at 6:43 AM, Steffen Rochel 
> wrote:
>
> > +1
> > I support Option 1 - Set up separate Jenkins CI build system. While the
> > Apache service is appropriate for some projects, our experience over the
> > last 6 months has not been meeting the needs of the MXNet (incubating)
> > project. AWS has been and will continue provide resources for such
> project.
> > Agree we should create a document summarizing the requirements and high
> > level architecture, which should answer the question of Jenkins or
> > alternative.
> >
> > Steffen
> >
> > On Sat, Oct 21, 2017 at 6:51 PM shiwen hu  wrote:
> >
> > > +1
> > >
> > >
> > > 2017-10-21 9:48 GMT+08:00 Chris Olivier :
> > >
> > > > Ok, just looking for anything that can cut a task out if possible. I
> do
> > > > support not using Apache Jenkins server anyMore — it’s really not
> been
> > > > working out for various reasons.  But having a person full time is
> > > > something that Steffen would have to address, I imagine.
> > > >
> > > > On Fri, Oct 20, 2017 at 6:03 PM Mu Li  wrote:
> > > >
> > > > > I didn't see the clear advantage of CodePipline over pure jenkins,
> > > > because
> > > > > we don't need to deploy here.
> > > > >
> > > > > On Fri, Oct 20, 2017 at 5:34 PM, Chris Olivier <
> > cjolivie...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > CodePipeline, then.  You can point it to Jenkins instances.
> > > > > >
> > > > > >
> > > > > > On Fri, Oct 20, 2017 at 4:49 PM Mu Li 
> wrote:
> > > > > >
> > > > > > > AWS CodeBuild is not an option. It doesn't support GPU
> instances,
> > > mac
> > > > > os
> > > > > > x,
> > > > > > > and windows. Not even mention the edge devices.
> > > > > > >
> > > > > > > On Fri, Oct 20, 2017 at 4:07 PM, Chris Olivier <
> > > > cjolivie...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Why don;t we look into fully managed AWS CodeBuild?  It
> > maintains
> > > > > > > > everything. It's also compatible with Jenkins.
> > > > > > > >
> > > > > > > > On Fri, Oct 20, 2017 at 1:51 PM, Tianqi Chen <
> > > > > tqc...@cs.washington.edu
> > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > +1
> > > > > > > > >
> > > > > > > > > Tianqi
> > > > > > > > > On Fri, Oct 20, 2017 at 1:39 PM Mu Li 
> > > > wrote:
> > > > > > > > >
> > > > > > > > > > +1
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > It seems that the Apache CI is quite overloaded these
> days,
> > > and
> > > > > > > MXNet's
> > > > > > > > > CI
> > > > > > > > > > pipeline is too complex to run there. In addition, we may
> > > need
> > > > to
> > > > > > add
> > > > > > > > > more
> > > > > > > > > > devices, e.g. macpro and rasbperry pi, into the server,
> and
> > > > more
> > > > > > > tasks
> > > > > > > > > such
> > > > > > > > > > as pip build. It means a lot of requests to the Infra
> team.
> > > > > > > > > >
> > > > > > > > > > We can reuse our previous Jenkins server at
> > > > http://ci.mxnet.io/.
> > > > > > But
> > > > > > > > we
> > > > > > > > > > probably need a dedicate developer to maintain it.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
> > > > > > > > > > sandeep.krishn...@gmail.com> wrote:
> > > > 

Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-10-23 Thread Pedro Larroy
+1

Together with Kellen and Marco, I am already working on a CI system that
verifies MXNet on devices. It is still a work in progress, but we are at
least checking that the build is sane on Android, different ARM flavors,
and Ubuntu, and we are also building PRs. We are still working on getting
the unit tests to pass on some architectures, such as Jetson TX2 and
ARM / Raspberry Pi.

http://ci.mxnet.amazon-ml.com/

I agree with Steffen on creating a document with requirements and a
high-level architecture. I would also like to have quicker feedback and, as
we discussed before, saner unit tests. I think a large and nontrivial
amount of effort is required here.

Pedro.

On Mon, Oct 23, 2017 at 6:43 AM, Steffen Rochel 
wrote:

> +1
> I support Option 1 - Set up separate Jenkins CI build system. While the
> Apache service is appropriate for some projects, our experience over the
> last 6 months has not been meeting the needs of the MXNet (incubating)
> project. AWS has been and will continue provide resources for such project.
> Agree we should create a document summarizing the requirements and high
> level architecture, which should answer the question of Jenkins or
> alternative.
>
> Steffen
>
> On Sat, Oct 21, 2017 at 6:51 PM shiwen hu  wrote:
>
> > +1
> >
> >
> > 2017-10-21 9:48 GMT+08:00 Chris Olivier :
> >
> > > Ok, just looking for anything that can cut a task out if possible. I do
> > > support not using Apache Jenkins server anyMore — it’s really not been
> > > working out for various reasons.  But having a person full time is
> > > something that Steffen would have to address, I imagine.
> > >
> > > On Fri, Oct 20, 2017 at 6:03 PM Mu Li  wrote:
> > >
> > > > I didn't see the clear advantage of CodePipline over pure jenkins,
> > > because
> > > > we don't need to deploy here.
> > > >
> > > > On Fri, Oct 20, 2017 at 5:34 PM, Chris Olivier <
> cjolivie...@gmail.com>
> > > > wrote:
> > > >
> > > > > CodePipeline, then.  You can point it to Jenkins instances.
> > > > >
> > > > >
> > > > > On Fri, Oct 20, 2017 at 4:49 PM Mu Li  wrote:
> > > > >
> > > > > > AWS CodeBuild is not an option. It doesn't support GPU instances,
> > mac
> > > > os
> > > > > x,
> > > > > > and windows. Not even mention the edge devices.
> > > > > >
> > > > > > On Fri, Oct 20, 2017 at 4:07 PM, Chris Olivier <
> > > cjolivie...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Why don;t we look into fully managed AWS CodeBuild?  It
> maintains
> > > > > > > everything. It's also compatible with Jenkins.
> > > > > > >
> > > > > > > On Fri, Oct 20, 2017 at 1:51 PM, Tianqi Chen <
> > > > tqc...@cs.washington.edu
> > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > +1
> > > > > > > >
> > > > > > > > Tianqi
> > > > > > > > On Fri, Oct 20, 2017 at 1:39 PM Mu Li 
> > > wrote:
> > > > > > > >
> > > > > > > > > +1
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > It seems that the Apache CI is quite overloaded these days,
> > and
> > > > > > MXNet's
> > > > > > > > CI
> > > > > > > > > pipeline is too complex to run there. In addition, we may
> > need
> > > to
> > > > > add
> > > > > > > > more
> > > > > > > > > devices, e.g. macpro and rasbperry pi, into the server, and
> > > more
> > > > > > tasks
> > > > > > > > such
> > > > > > > > > as pip build. It means a lot of requests to the Infra team.
> > > > > > > > >
> > > > > > > > > We can reuse our previous Jenkins server at
> > > http://ci.mxnet.io/.
> > > > > But
> > > > > > > we
> > > > > > > > > probably need a dedicate developer to maintain it.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
> > > > > > > > > sandeep.krishn...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hello all,
> > > > > > > > > >
> > > > > > > > > > I am hereby opening up a discussion thread on how we can
> > > > > stabilize
> > > > > > > > Apache
> > > > > > > > > > MXNet CI build system.
> > > > > > > > > >
> > > > > > > > > > Problems:
> > > > > > > > > >
> > > > > > > > > > 
> > > > > > > > > >
> > > > > > > > > > Recently, we have seen following issues with Apache MXNet
> > CI
> > > > > build
> > > > > > > > > systems:
> > > > > > > > > >
> > > > > > > > > >1. Apache Jenkins master is overloaded and we see
> issues
> > > > like
> > > > > -
> > > > > > > > unable
> > > > > > > > > >to trigger builds, difficult to load and view the blue
> > > ocean
> > > > > and
> > > > > > > > other
> > > > > > > > > >Jenkins build status page.
> > > > > > > > > >2. We are generating too many request/interaction on
> > > Apache
> > > > > > Infra
> > > > > > > > > team.
> > > > > > > > > >   1. Addition/deletion of new slave: Caused from
> > scaling
> > > > > > > activity,
> > > > > > > > > >   recycling, troubleshooting or any actions leading
> to
> > > > change
> > > > > > of
> > > > > > > > > slave
> > > > > > > > > >   machines.
> > > > > > > > > >   2. Plugins / other Jenkins Master configurations.
> 

Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-10-22 Thread Steffen Rochel
+1
I support Option 1: set up a separate Jenkins CI build system. While the
Apache service is appropriate for some projects, over the last 6 months it
has not met the needs of the MXNet (incubating) project. AWS has been
providing and will continue to provide resources for this project.
I agree we should create a document summarizing the requirements and
high-level architecture, which should answer the question of Jenkins or an
alternative.

Steffen

On Sat, Oct 21, 2017 at 6:51 PM shiwen hu  wrote:

> +1
>
>
> 2017-10-21 9:48 GMT+08:00 Chris Olivier :
>
> > Ok, just looking for anything that can cut a task out if possible. I do
> > support not using Apache Jenkins server anyMore — it’s really not been
> > working out for various reasons.  But having a person full time is
> > something that Steffen would have to address, I imagine.
> >
> > On Fri, Oct 20, 2017 at 6:03 PM Mu Li  wrote:
> >
> > > I didn't see the clear advantage of CodePipline over pure jenkins,
> > because
> > > we don't need to deploy here.
> > >
> > > On Fri, Oct 20, 2017 at 5:34 PM, Chris Olivier 
> > > wrote:
> > >
> > > > CodePipeline, then.  You can point it to Jenkins instances.
> > > >
> > > >
> > > > On Fri, Oct 20, 2017 at 4:49 PM Mu Li  wrote:
> > > >
> > > > > AWS CodeBuild is not an option. It doesn't support GPU instances,
> mac
> > > os
> > > > x,
> > > > > and windows. Not even mention the edge devices.
> > > > >
> > > > > On Fri, Oct 20, 2017 at 4:07 PM, Chris Olivier <
> > cjolivie...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Why don;t we look into fully managed AWS CodeBuild?  It maintains
> > > > > > everything. It's also compatible with Jenkins.
> > > > > >
> > > > > > On Fri, Oct 20, 2017 at 1:51 PM, Tianqi Chen <
> > > tqc...@cs.washington.edu
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > Tianqi
> > > > > > > On Fri, Oct 20, 2017 at 1:39 PM Mu Li 
> > wrote:
> > > > > > >
> > > > > > > > +1
> > > > > > > >
> > > > > > > >
> > > > > > > > It seems that the Apache CI is quite overloaded these days,
> and
> > > > > MXNet's
> > > > > > > CI
> > > > > > > > pipeline is too complex to run there. In addition, we may
> need
> > to
> > > > add
> > > > > > > more
> > > > > > > > devices, e.g. macpro and rasbperry pi, into the server, and
> > more
> > > > > tasks
> > > > > > > such
> > > > > > > > as pip build. It means a lot of requests to the Infra team.
> > > > > > > >
> > > > > > > > We can reuse our previous Jenkins server at
> > http://ci.mxnet.io/.
> > > > But
> > > > > > we
> > > > > > > > probably need a dedicate developer to maintain it.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
> > > > > > > > sandeep.krishn...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hello all,
> > > > > > > > >
> > > > > > > > > I am hereby opening up a discussion thread on how we can
> > > > stabilize
> > > > > > > Apache
> > > > > > > > > MXNet CI build system.
> > > > > > > > >
> > > > > > > > > Problems:
> > > > > > > > >
> > > > > > > > > 
> > > > > > > > >
> > > > > > > > > Recently, we have seen following issues with Apache MXNet
> CI
> > > > build
> > > > > > > > systems:
> > > > > > > > >
> > > > > > > > >1. Apache Jenkins master is overloaded and we see issues
> > > like
> > > > -
> > > > > > > unable
> > > > > > > > >to trigger builds, difficult to load and view the blue
> > ocean
> > > > and
> > > > > > > other
> > > > > > > > >Jenkins build status page.
> > > > > > > > >2. We are generating too many request/interaction on
> > Apache
> > > > > Infra
> > > > > > > > team.
> > > > > > > > >   1. Addition/deletion of new slave: Caused from
> scaling
> > > > > > activity,
> > > > > > > > >   recycling, troubleshooting or any actions leading to
> > > change
> > > > > of
> > > > > > > > slave
> > > > > > > > >   machines.
> > > > > > > > >   2. Plugins / other Jenkins Master configurations.
> > > > > > > > >   3. Experimentation on CI pipelines.
> > > > > > > > >3. Harder to debug and resolve issues - Since access to
> > > master
> > > > > and
> > > > > > > > slave
> > > > > > > > >is not with the same community, it requires Infra and
> > > > community
> > > > > to
> > > > > > > > dive
> > > > > > > > >deep together on all action items.
> > > > > > > > >
> > > > > > > > > Possible Solutions:
> > > > > > > > >
> > > > > > > > > ==
> > > > > > > > >
> > > > > > > > >1. Can we set up a separate Jenkins CI build system for
> > > Apache
> > > > > > MXNet
> > > > > > > > >outside Apache Infra?
> > > > > > > > >2. Can we have a separate Jenkins Master in Apache Infra
> > for
> > > > > > MXNet?
> > > > > > > > >3. Review design of current setup, refine and fill the
> > gaps.
> > > > > > > > >
> > > > > > > > > @ Mentors/Infra team/Community:
> > > > > > > > >
> > > > > > > > > ==
> > > > > > > > >
> > > > > 

Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-10-21 Thread shiwen hu
+1


2017-10-21 9:48 GMT+08:00 Chris Olivier :

> Ok, just looking for anything that can cut a task out if possible. I do
> support not using Apache Jenkins server anyMore — it’s really not been
> working out for various reasons.  But having a person full time is
> something that Steffen would have to address, I imagine.
>
> On Fri, Oct 20, 2017 at 6:03 PM Mu Li  wrote:
>
> > I didn't see the clear advantage of CodePipline over pure jenkins,
> because
> > we don't need to deploy here.
> >
> > On Fri, Oct 20, 2017 at 5:34 PM, Chris Olivier 
> > wrote:
> >
> > > CodePipeline, then.  You can point it to Jenkins instances.
> > >
> > >
> > > On Fri, Oct 20, 2017 at 4:49 PM Mu Li  wrote:
> > >
> > > > AWS CodeBuild is not an option. It doesn't support GPU instances, mac
> > os
> > > x,
> > > > and windows. Not even mention the edge devices.
> > > >
> > > > On Fri, Oct 20, 2017 at 4:07 PM, Chris Olivier <
> cjolivie...@gmail.com>
> > > > wrote:
> > > >
> > > > > Why don;t we look into fully managed AWS CodeBuild?  It maintains
> > > > > everything. It's also compatible with Jenkins.
> > > > >
> > > > > On Fri, Oct 20, 2017 at 1:51 PM, Tianqi Chen <
> > tqc...@cs.washington.edu
> > > >
> > > > > wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > Tianqi
> > > > > > On Fri, Oct 20, 2017 at 1:39 PM Mu Li 
> wrote:
> > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > >
> > > > > > > It seems that the Apache CI is quite overloaded these days, and
> > > > MXNet's
> > > > > > CI
> > > > > > > pipeline is too complex to run there. In addition, we may need
> to
> > > add
> > > > > > more
> > > > > > > devices, e.g. macpro and rasbperry pi, into the server, and
> more
> > > > tasks
> > > > > > such
> > > > > > > as pip build. It means a lot of requests to the Infra team.
> > > > > > >
> > > > > > > We can reuse our previous Jenkins server at
> http://ci.mxnet.io/.
> > > But
> > > > > we
> > > > > > > probably need a dedicate developer to maintain it.
> > > > > > >
> > > > > > >
> > > > > > >

Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-10-20 Thread Chris Olivier
OK, I'm just looking for anything that can cut a task out, if possible. I do
support no longer using the Apache Jenkins server; it has really not been
working out, for various reasons. But having a full-time person is
something that Steffen would have to address, I imagine.

On Fri, Oct 20, 2017 at 6:03 PM Mu Li  wrote:

> I don't see a clear advantage of CodePipeline over plain Jenkins, because
> we don't need to deploy anything here.

Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-10-20 Thread Mu Li
I don't see a clear advantage of CodePipeline over plain Jenkins, because
we don't need to deploy anything here.

On Fri, Oct 20, 2017 at 5:34 PM, Chris Olivier 
wrote:

> CodePipeline, then.  You can point it to Jenkins instances.


Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-10-20 Thread Chris Olivier
CodePipeline, then.  You can point it to Jenkins instances.


On Fri, Oct 20, 2017 at 4:49 PM Mu Li  wrote:

> AWS CodeBuild is not an option. It doesn't support GPU instances, macOS,
> or Windows, not to mention edge devices.


Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-10-20 Thread Mu Li
AWS CodeBuild is not an option. It doesn't support GPU instances, macOS,
or Windows, not to mention edge devices.
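
The coverage argument can be sketched as a simple set difference. This is
only an illustration: the platform labels and the capability sets are
assumptions that restate the claims made in this thread (as of October 2017),
not data taken from AWS or Jenkins documentation.

```python
# Hypothetical platform-coverage check. All names below are illustrative
# assumptions; the capability sets only restate claims from this thread.
REQUIRED = {"linux-cpu", "linux-gpu", "macos", "windows", "edge-device"}

CANDIDATES = {
    # Claimed in the thread: no GPU instances, macOS, Windows, or edge devices.
    "aws-codebuild": {"linux-cpu"},
    # Assumption: a self-managed Jenkins can attach any machine as a slave.
    "self-hosted-jenkins": {
        "linux-cpu", "linux-gpu", "macos", "windows", "edge-device",
    },
}


def missing_platforms(supported, required=REQUIRED):
    """Return the required platforms that a CI option cannot cover, sorted."""
    return sorted(required - supported)


for name, supported in CANDIDATES.items():
    gaps = missing_platforms(supported)
    status = "covers everything" if not gaps else "missing " + ", ".join(gaps)
    print(f"{name}: {status}")
```

Any option with a non-empty gap list would need a second system alongside it
to cover the remaining platforms.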

On Fri, Oct 20, 2017 at 4:07 PM, Chris Olivier 
wrote:

> Why don't we look into fully managed AWS CodeBuild? It maintains
> everything. It's also compatible with Jenkins.
>
> On Fri, Oct 20, 2017 at 1:51 PM, Tianqi Chen 
> wrote:
>
> > +1
> >
> > Tianqi
> > On Fri, Oct 20, 2017 at 1:39 PM Mu Li  wrote:
> >
> > > +1
> > >
> > >
> > > It seems that the Apache CI is quite overloaded these days, and MXNet's
> > > CI pipeline is too complex to run there. In addition, we may need to add
> > > more devices, e.g. Mac Pro and Raspberry Pi, to the server, and more
> > > tasks such as pip build. That means a lot of requests to the Infra team.
> > >
> > > We can reuse our previous Jenkins server at http://ci.mxnet.io/. But we
> > > probably need a dedicated developer to maintain it.
> > >
> > >
> > >
> > > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
> > > sandeep.krishn...@gmail.com> wrote:
> > >
> > > > Hello all,
> > > >
> > > > I am hereby opening up a discussion thread on how we can stabilize the
> > > > Apache MXNet CI build system.
> > > >
> > > > Problems:
> > > >
> > > > Recently, we have seen the following issues with the Apache MXNet CI
> > > > build system:
> > > >
> > > >    1. The Apache Jenkins master is overloaded, and we see issues such as
> > > >       being unable to trigger builds and difficulty loading and viewing
> > > >       the Blue Ocean and other Jenkins build status pages.
> > > >    2. We are generating too many requests to and interactions with the
> > > >       Apache Infra team:
> > > >       1. Addition/deletion of slaves: caused by scaling activity,
> > > >          recycling, troubleshooting, or any action that changes the
> > > >          slave machines.
> > > >       2. Plugins and other Jenkins master configuration.
> > > >       3. Experimentation on CI pipelines.
> > > >    3. Issues are harder to debug and resolve: since access to the master
> > > >       and the slaves is not held by the same community, Infra and the
> > > >       community have to dive deep together on all action items.
> > > >
> > > > Possible Solutions:
> > > >
> > > >    1. Can we set up a separate Jenkins CI build system for Apache MXNet
> > > >       outside Apache Infra?
> > > >    2. Can we have a separate Jenkins master in Apache Infra for MXNet?
> > > >    3. Review the design of the current setup, refine it, and fill the
> > > >       gaps.
> > > >
> > > > @ Mentors/Infra team/Community:
> > > >
> > > > Please provide your suggestions on how we can proceed further and work
> > > > on stabilizing the CI build system for MXNet.
> > > >
> > > > Also, if the community decides on a separate Jenkins CI build system,
> > > > what important points should be taken care of apart from the ones below?
> > > >
> > > >    1. The community being able to access the build page for build
> > > >       statuses.
> > > >    2. Committers being able to log in with Apache credentials.
> > > >    3. Hook setup from the apache/incubator-mxnet repo to the Jenkins
> > > >       master.
> > > >
> > > > Irrespective of the solution we come up with, I think we should initiate
> > > > a technical design discussion on how to set up the CI build system,
> > > > probably a 1- or 2-page document with the architecture, reviewed with
> > > > Infra and community members.
> > > >
> > > > ***There were a few proposals and discussions on the Slack channel; to
> > > > reach wider community members, I am moving that discussion formally to
> > > > this list.
> > > >
> > > > My Proposal: Option 1 - Set up a separate Jenkins CI build system.
> > > >
> > > > Thanks,
> > > >
> > > > Sandeep
> > > >
> > > > --
> > > > Sandeep Krishnamurthy
> > > >
> > >
> >
>


Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-10-20 Thread Chris Olivier
I believe Mu already started that discussion about using the old mxnet.io
Jenkins server. I expect that deciding whether to replace the current setup
would hinge in large part on what it would be replaced with.

On Fri, Oct 20, 2017 at 4:30 PM, sandeep krishnamurthy <
sandeep.krishn...@gmail.com> wrote:

> Chris: If the community decides to go with a separate setup, then there will
> be a tech design discussion, and proposals such as CodeCommit, Jenkins, and
> Travis will be covered and discussed.
>
> Thanks,
> Sandeep

Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-10-20 Thread sandeep krishnamurthy
Chris: If the community decides to go with a separate setup, then there will
be a tech design discussion, and proposals such as CodeCommit, Jenkins, and
Travis will be covered and discussed.

Thanks,
Sandeep

On Fri, Oct 20, 2017 at 4:22 PM, Seb Kiureghian  wrote:

> But the feather can definitely be added once MXNet graduates.

Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-10-20 Thread Seb Kiureghian
But the feather can definitely be added once MXNet graduates.

On Fri, Oct 20, 2017 at 4:21 PM, Seb Kiureghian  wrote:

> The feather can only be used by Top Level Projects.

Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-10-20 Thread Seb Kiureghian
The feather can only be used by Top Level Projects.

On Fri, Oct 20, 2017 at 4:19 PM, Chris Olivier 
wrote:

> When the word Apache is in the Hadoop logo (not always), it includes the
> feather and color scheme.
>
> On Fri, Oct 20, 2017 at 4:18 PM, Chris Olivier 
> wrote:
>
>> Thanks.
>>
>> Is there any way to work the feather into it?
>>
>> i.e.  https://goo.gl/images/BU4dnG
>>
>> On Fri, Oct 20, 2017 at 4:11 PM, Seb Kiureghian 
>> wrote:
>>
>>> https://imgur.com/a/aADkA
>>>
>>> On Fri, Oct 20, 2017 at 4:07 PM, Chris Olivier 
>>> wrote:
>>>
>>> > Why don;t we look into fully managed AWS CodeBuild?  It maintains
>>> > everything. It's also compatible with Jenkins.
>>> >
>>> > On Fri, Oct 20, 2017 at 1:51 PM, Tianqi Chen >> >
>>> > wrote:
>>> >
>>> > > +1
>>> > >
>>> > > Tianqi
>>> > > On Fri, Oct 20, 2017 at 1:39 PM Mu Li  wrote:
>>> > >
>>> > > > +1
>>> > > >
>>> > > >
>>> > > > It seems that the Apache CI is quite overloaded these days, and
>>> MXNet's
>>> > > CI
>>> > > > pipeline is too complex to run there. In addition, we may need to
>>> add
>>> > > more
>>> > > > devices, e.g. macpro and rasbperry pi, into the server, and more
>>> tasks
>>> > > such
>>> > > > as pip build. It means a lot of requests to the Infra team.
>>> > > >
>>> > > > We can reuse our previous Jenkins server at http://ci.mxnet.io/.
>>> But
>>> > we
>>> > > > probably need a dedicate developer to maintain it.
>>> > > >
>>> > > >
>>> > > >
>>> > > > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
>>> > > > sandeep.krishn...@gmail.com> wrote:
>>> > > >
>>> > > > > Hello all,
>>> > > > >
>>> > > > > I am hereby opening up a discussion thread on how we can
>>> stabilize
>>> > > Apache
>>> > > > > MXNet CI build system.
>>> > > > >
>>> > > > > Problems:
>>> > > > >
>>> > > > > 
>>> > > > >
>>> > > > > Recently, we have seen following issues with Apache MXNet CI
>>> build
>>> > > > systems:
>>> > > > >
>>> > > > >1. Apache Jenkins master is overloaded and we see issues like
>>> -
>>> > > unable
>>> > > > >to trigger builds, difficult to load and view the blue ocean
>>> and
>>> > > other
>>> > > > >Jenkins build status page.
>>> > > > >2. We are generating too many request/interaction on Apache
>>> Infra
>>> > > > team.
>>> > > > >   1. Addition/deletion of new slave: Caused from scaling
>>> > activity,
>>> > > > >   recycling, troubleshooting or any actions leading to
>>> change of
>>> > > > slave
>>> > > > >   machines.
>>> > > > >   2. Plugins / other Jenkins Master configurations.
>>> > > > >   3. Experimentation on CI pipelines.
>>> > > > >3. Harder to debug and resolve issues - Since access to
>>> master and
>>> > > > slave
>>> > > > >is not with the same community, it requires Infra and
>>> community to
>>> > > > dive
>>> > > > >deep together on all action items.
>>> > > > >
>>> > > > > Possible Solutions:
>>> > > > >
>>> > > > > ==
>>> > > > >
>>> > > > >1. Can we set up a separate Jenkins CI build system for Apache
>>> > MXNet
>>> > > > >outside Apache Infra?
>>> > > > >2. Can we have a separate Jenkins Master in Apache Infra for
>>> > MXNet?
>>> > > > >3. Review design of current setup, refine and fill the gaps.
>>> > > > >
>>> > > > > @ Mentors/Infra team/Community:
>>> > > > >
>>> > > > > ==
>>> > > > >
>>> > > > > Please provide your suggestions on how we can proceed further and
>>> > work
>>> > > on
>>> > > > > stabilizing the CI build systems for MXNet.
>>> > > > >
>>> > > > > Also, if the community decides on separate Jenkins CI build
>>> system,
>>> > > what
>>> > > > > important points should be taken care of apart from the below:
>>> > > > >
>>> > > > >1. Community being able to access the build page for build
>>> > statuses.
>>> > > > >2. Committers being able to login with apache credentials.
>>> > > > >3. Hook setup from apache/incubator-mxnet repo to Jenkins
>>> master.
>>> > > > >
>>> > > > >
>>> > > > > Irrespective of the solution we come up, I think we should
>>> initiate a
>>> > > > > technical design discussion on how to setup the CI build system.
>>> > > > Probably 1
>>> > > > > or 2 pager documents with the architecture and review with Infra
>>> and
>>> > > > > community members.
>>> > > > >
>>> > > > > ***There were few proposal and discussion on the slack channel,
>>> to
>>> > > reach
>>> > > > > wider community members, moving that discussion formally to this
>>> > list.
>>> > > > >
>>> > > > >
>>> > > > > My Proposal: Option 1 - Set up separate Jenkins CI build system.
>>> > > > >
>>> > > > > Thanks,
>>> > > > >
>>> > > > > Sandeep
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > --
>>> > > > > Sandeep Krishnamurthy
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>


Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-10-20 Thread Chris Olivier
When the word Apache is in the Hadoop logo (not always), it includes the
feather and color scheme.

On Fri, Oct 20, 2017 at 4:18 PM, Chris Olivier 
wrote:

> Thanks.
>
> Is there any way to work the feather into it?
>
> i.e.  https://goo.gl/images/BU4dnG
>
> On Fri, Oct 20, 2017 at 4:11 PM, Seb Kiureghian 
> wrote:
>
>> https://imgur.com/a/aADkA
>>
>> On Fri, Oct 20, 2017 at 4:07 PM, Chris Olivier 
>> wrote:
>>
>> > Why don;t we look into fully managed AWS CodeBuild?  It maintains
>> > everything. It's also compatible with Jenkins.
>> >
>> > On Fri, Oct 20, 2017 at 1:51 PM, Tianqi Chen 
>> > wrote:
>> >
>> > > +1
>> > >
>> > > Tianqi
>> > > On Fri, Oct 20, 2017 at 1:39 PM Mu Li  wrote:
>> > >
>> > > > +1
>> > > >
>> > > >
>> > > > It seems that the Apache CI is quite overloaded these days, and
>> MXNet's
>> > > CI
>> > > > pipeline is too complex to run there. In addition, we may need to
>> add
>> > > more
>> > > > devices, e.g. macpro and rasbperry pi, into the server, and more
>> tasks
>> > > such
>> > > > as pip build. It means a lot of requests to the Infra team.
>> > > >
>> > > > We can reuse our previous Jenkins server at http://ci.mxnet.io/.
>> But
>> > we
>> > > > probably need a dedicate developer to maintain it.
>> > > >
>> > > >
>> > > >
>> > > > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
>> > > > sandeep.krishn...@gmail.com> wrote:
>> > > >
>> > > > > Hello all,
>> > > > >
>> > > > > I am hereby opening up a discussion thread on how we can stabilize
>> > > Apache
>> > > > > MXNet CI build system.
>> > > > >
>> > > > > Problems:
>> > > > >
>> > > > > 
>> > > > >
>> > > > > Recently, we have seen following issues with Apache MXNet CI build
>> > > > systems:
>> > > > >
>> > > > >1. Apache Jenkins master is overloaded and we see issues like -
>> > > unable
>> > > > >to trigger builds, difficult to load and view the blue ocean
>> and
>> > > other
>> > > > >Jenkins build status page.
>> > > > >2. We are generating too many request/interaction on Apache
>> Infra
>> > > > team.
>> > > > >   1. Addition/deletion of new slave: Caused from scaling
>> > activity,
>> > > > >   recycling, troubleshooting or any actions leading to change
>> of
>> > > > slave
>> > > > >   machines.
>> > > > >   2. Plugins / other Jenkins Master configurations.
>> > > > >   3. Experimentation on CI pipelines.
>> > > > >3. Harder to debug and resolve issues - Since access to master
>> and
>> > > > slave
>> > > > >is not with the same community, it requires Infra and
>> community to
>> > > > dive
>> > > > >deep together on all action items.
>> > > > >
>> > > > > Possible Solutions:
>> > > > >
>> > > > > ==
>> > > > >
>> > > > >1. Can we set up a separate Jenkins CI build system for Apache
>> > MXNet
>> > > > >outside Apache Infra?
>> > > > >2. Can we have a separate Jenkins Master in Apache Infra for
>> > MXNet?
>> > > > >3. Review design of current setup, refine and fill the gaps.
>> > > > >
>> > > > > @ Mentors/Infra team/Community:
>> > > > >
>> > > > > ==
>> > > > >
>> > > > > Please provide your suggestions on how we can proceed further and
>> > work
>> > > on
>> > > > > stabilizing the CI build systems for MXNet.
>> > > > >
>> > > > > Also, if the community decides on separate Jenkins CI build
>> system,
>> > > what
>> > > > > important points should be taken care of apart from the below:
>> > > > >
>> > > > >1. Community being able to access the build page for build
>> > statuses.
>> > > > >2. Committers being able to login with apache credentials.
>> > > > >3. Hook setup from apache/incubator-mxnet repo to Jenkins
>> master.
>> > > > >
>> > > > >
>> > > > > Irrespective of the solution we come up, I think we should
>> initiate a
>> > > > > technical design discussion on how to setup the CI build system.
>> > > > Probably 1
>> > > > > or 2 pager documents with the architecture and review with Infra
>> and
>> > > > > community members.
>> > > > >
>> > > > > ***There were few proposal and discussion on the slack channel, to
>> > > reach
>> > > > > wider community members, moving that discussion formally to this
>> > list.
>> > > > >
>> > > > >
>> > > > > My Proposal: Option 1 - Set up separate Jenkins CI build system.
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Sandeep
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Sandeep Krishnamurthy
>> > > > >
>> > > >
>> > >
>> >
>>
>
>


Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-10-20 Thread Chris Olivier
Thanks.

Is there any way to work the feather into it?

i.e.  https://goo.gl/images/BU4dnG

On Fri, Oct 20, 2017 at 4:11 PM, Seb Kiureghian  wrote:

> https://imgur.com/a/aADkA
>
> On Fri, Oct 20, 2017 at 4:07 PM, Chris Olivier 
> wrote:
>
> > Why don;t we look into fully managed AWS CodeBuild?  It maintains
> > everything. It's also compatible with Jenkins.
> >
> > On Fri, Oct 20, 2017 at 1:51 PM, Tianqi Chen 
> > wrote:
> >
> > > +1
> > >
> > > Tianqi
> > > On Fri, Oct 20, 2017 at 1:39 PM Mu Li  wrote:
> > >
> > > > +1
> > > >
> > > >
> > > > It seems that the Apache CI is quite overloaded these days, and
> MXNet's
> > > CI
> > > > pipeline is too complex to run there. In addition, we may need to add
> > > more
> > > > devices, e.g. macpro and rasbperry pi, into the server, and more
> tasks
> > > such
> > > > as pip build. It means a lot of requests to the Infra team.
> > > >
> > > > We can reuse our previous Jenkins server at http://ci.mxnet.io/. But
> > we
> > > > probably need a dedicate developer to maintain it.
> > > >
> > > >
> > > >
> > > > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
> > > > sandeep.krishn...@gmail.com> wrote:
> > > >
> > > > > Hello all,
> > > > >
> > > > > I am hereby opening up a discussion thread on how we can stabilize
> > > Apache
> > > > > MXNet CI build system.
> > > > >
> > > > > Problems:
> > > > >
> > > > > 
> > > > >
> > > > > Recently, we have seen following issues with Apache MXNet CI build
> > > > systems:
> > > > >
> > > > >1. Apache Jenkins master is overloaded and we see issues like -
> > > unable
> > > > >to trigger builds, difficult to load and view the blue ocean and
> > > other
> > > > >Jenkins build status page.
> > > > >2. We are generating too many request/interaction on Apache
> Infra
> > > > team.
> > > > >   1. Addition/deletion of new slave: Caused from scaling
> > activity,
> > > > >   recycling, troubleshooting or any actions leading to change
> of
> > > > slave
> > > > >   machines.
> > > > >   2. Plugins / other Jenkins Master configurations.
> > > > >   3. Experimentation on CI pipelines.
> > > > >3. Harder to debug and resolve issues - Since access to master
> and
> > > > slave
> > > > >is not with the same community, it requires Infra and community
> to
> > > > dive
> > > > >deep together on all action items.
> > > > >
> > > > > Possible Solutions:
> > > > >
> > > > > ==
> > > > >
> > > > >1. Can we set up a separate Jenkins CI build system for Apache
> > MXNet
> > > > >outside Apache Infra?
> > > > >2. Can we have a separate Jenkins Master in Apache Infra for
> > MXNet?
> > > > >3. Review design of current setup, refine and fill the gaps.
> > > > >
> > > > > @ Mentors/Infra team/Community:
> > > > >
> > > > > ==
> > > > >
> > > > > Please provide your suggestions on how we can proceed further and
> > work
> > > on
> > > > > stabilizing the CI build systems for MXNet.
> > > > >
> > > > > Also, if the community decides on separate Jenkins CI build system,
> > > what
> > > > > important points should be taken care of apart from the below:
> > > > >
> > > > >1. Community being able to access the build page for build
> > statuses.
> > > > >2. Committers being able to login with apache credentials.
> > > > >3. Hook setup from apache/incubator-mxnet repo to Jenkins
> master.
> > > > >
> > > > >
> > > > > Irrespective of the solution we come up, I think we should
> initiate a
> > > > > technical design discussion on how to setup the CI build system.
> > > > Probably 1
> > > > > or 2 pager documents with the architecture and review with Infra
> and
> > > > > community members.
> > > > >
> > > > > ***There were few proposal and discussion on the slack channel, to
> > > reach
> > > > > wider community members, moving that discussion formally to this
> > list.
> > > > >
> > > > >
> > > > > My Proposal: Option 1 - Set up separate Jenkins CI build system.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Sandeep
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Sandeep Krishnamurthy
> > > > >
> > > >
> > >
> >
>


Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-10-20 Thread Seb Kiureghian
https://imgur.com/a/aADkA

On Fri, Oct 20, 2017 at 4:07 PM, Chris Olivier 
wrote:

> Why don;t we look into fully managed AWS CodeBuild?  It maintains
> everything. It's also compatible with Jenkins.
>
> On Fri, Oct 20, 2017 at 1:51 PM, Tianqi Chen 
> wrote:
>
> > +1
> >
> > Tianqi
> > On Fri, Oct 20, 2017 at 1:39 PM Mu Li  wrote:
> >
> > > +1
> > >
> > >
> > > It seems that the Apache CI is quite overloaded these days, and MXNet's
> > CI
> > > pipeline is too complex to run there. In addition, we may need to add
> > more
> > > devices, e.g. macpro and rasbperry pi, into the server, and more tasks
> > such
> > > as pip build. It means a lot of requests to the Infra team.
> > >
> > > We can reuse our previous Jenkins server at http://ci.mxnet.io/. But
> we
> > > probably need a dedicate developer to maintain it.
> > >
> > >
> > >
> > > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
> > > sandeep.krishn...@gmail.com> wrote:
> > >
> > > > Hello all,
> > > >
> > > > I am hereby opening up a discussion thread on how we can stabilize
> > Apache
> > > > MXNet CI build system.
> > > >
> > > > Problems:
> > > >
> > > > 
> > > >
> > > > Recently, we have seen following issues with Apache MXNet CI build
> > > systems:
> > > >
> > > >1. Apache Jenkins master is overloaded and we see issues like -
> > unable
> > > >to trigger builds, difficult to load and view the blue ocean and
> > other
> > > >Jenkins build status page.
> > > >2. We are generating too many request/interaction on Apache Infra
> > > team.
> > > >   1. Addition/deletion of new slave: Caused from scaling
> activity,
> > > >   recycling, troubleshooting or any actions leading to change of
> > > slave
> > > >   machines.
> > > >   2. Plugins / other Jenkins Master configurations.
> > > >   3. Experimentation on CI pipelines.
> > > >3. Harder to debug and resolve issues - Since access to master and
> > > slave
> > > >is not with the same community, it requires Infra and community to
> > > dive
> > > >deep together on all action items.
> > > >
> > > > Possible Solutions:
> > > >
> > > > ==
> > > >
> > > >1. Can we set up a separate Jenkins CI build system for Apache
> MXNet
> > > >outside Apache Infra?
> > > >2. Can we have a separate Jenkins Master in Apache Infra for
> MXNet?
> > > >3. Review design of current setup, refine and fill the gaps.
> > > >
> > > > @ Mentors/Infra team/Community:
> > > >
> > > > ==
> > > >
> > > > Please provide your suggestions on how we can proceed further and
> work
> > on
> > > > stabilizing the CI build systems for MXNet.
> > > >
> > > > Also, if the community decides on separate Jenkins CI build system,
> > what
> > > > important points should be taken care of apart from the below:
> > > >
> > > >1. Community being able to access the build page for build
> statuses.
> > > >2. Committers being able to login with apache credentials.
> > > >3. Hook setup from apache/incubator-mxnet repo to Jenkins master.
> > > >
> > > >
> > > > Irrespective of the solution we come up, I think we should initiate a
> > > > technical design discussion on how to setup the CI build system.
> > > Probably 1
> > > > or 2 pager documents with the architecture and review with Infra and
> > > > community members.
> > > >
> > > > ***There were few proposal and discussion on the slack channel, to
> > reach
> > > > wider community members, moving that discussion formally to this
> list.
> > > >
> > > >
> > > > My Proposal: Option 1 - Set up separate Jenkins CI build system.
> > > >
> > > > Thanks,
> > > >
> > > > Sandeep
> > > >
> > > >
> > > >
> > > > --
> > > > Sandeep Krishnamurthy
> > > >
> > >
> >
>


Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-10-20 Thread Chris Olivier
Why don't we look into fully managed AWS CodeBuild? It maintains
everything. It's also compatible with Jenkins.

On Fri, Oct 20, 2017 at 1:51 PM, Tianqi Chen 
wrote:

> +1
>
> Tianqi
> On Fri, Oct 20, 2017 at 1:39 PM Mu Li  wrote:
>
> > +1
> >
> >
> > It seems that the Apache CI is quite overloaded these days, and MXNet's
> CI
> > pipeline is too complex to run there. In addition, we may need to add
> more
> > devices, e.g. macpro and rasbperry pi, into the server, and more tasks
> such
> > as pip build. It means a lot of requests to the Infra team.
> >
> > We can reuse our previous Jenkins server at http://ci.mxnet.io/. But we
> > probably need a dedicate developer to maintain it.
> >
> >
> >
> > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
> > sandeep.krishn...@gmail.com> wrote:
> >
> > > Hello all,
> > >
> > > I am hereby opening up a discussion thread on how we can stabilize
> Apache
> > > MXNet CI build system.
> > >
> > > Problems:
> > >
> > > 
> > >
> > > Recently, we have seen following issues with Apache MXNet CI build
> > systems:
> > >
> > >1. Apache Jenkins master is overloaded and we see issues like -
> unable
> > >to trigger builds, difficult to load and view the blue ocean and
> other
> > >Jenkins build status page.
> > >2. We are generating too many request/interaction on Apache Infra
> > team.
> > >   1. Addition/deletion of new slave: Caused from scaling activity,
> > >   recycling, troubleshooting or any actions leading to change of
> > slave
> > >   machines.
> > >   2. Plugins / other Jenkins Master configurations.
> > >   3. Experimentation on CI pipelines.
> > >3. Harder to debug and resolve issues - Since access to master and
> > slave
> > >is not with the same community, it requires Infra and community to
> > dive
> > >deep together on all action items.
> > >
> > > Possible Solutions:
> > >
> > > ==
> > >
> > >1. Can we set up a separate Jenkins CI build system for Apache MXNet
> > >outside Apache Infra?
> > >2. Can we have a separate Jenkins Master in Apache Infra for MXNet?
> > >3. Review design of current setup, refine and fill the gaps.
> > >
> > > @ Mentors/Infra team/Community:
> > >
> > > ==
> > >
> > > Please provide your suggestions on how we can proceed further and work
> on
> > > stabilizing the CI build systems for MXNet.
> > >
> > > Also, if the community decides on separate Jenkins CI build system,
> what
> > > important points should be taken care of apart from the below:
> > >
> > >1. Community being able to access the build page for build statuses.
> > >2. Committers being able to login with apache credentials.
> > >3. Hook setup from apache/incubator-mxnet repo to Jenkins master.
> > >
> > >
> > > Irrespective of the solution we come up, I think we should initiate a
> > > technical design discussion on how to setup the CI build system.
> > Probably 1
> > > or 2 pager documents with the architecture and review with Infra and
> > > community members.
> > >
> > > ***There were few proposal and discussion on the slack channel, to
> reach
> > > wider community members, moving that discussion formally to this list.
> > >
> > >
> > > My Proposal: Option 1 - Set up separate Jenkins CI build system.
> > >
> > > Thanks,
> > >
> > > Sandeep
> > >
> > >
> > >
> > > --
> > > Sandeep Krishnamurthy
> > >
> >
>


Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-10-20 Thread Tianqi Chen
+1

Tianqi
On Fri, Oct 20, 2017 at 1:39 PM Mu Li  wrote:

> +1
>
>
> It seems that the Apache CI is quite overloaded these days, and MXNet's CI
> pipeline is too complex to run there. In addition, we may need to add more
> devices, e.g. macpro and rasbperry pi, into the server, and more tasks such
> as pip build. It means a lot of requests to the Infra team.
>
> We can reuse our previous Jenkins server at http://ci.mxnet.io/. But we
> probably need a dedicate developer to maintain it.
>
>
>
> On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
> sandeep.krishn...@gmail.com> wrote:
>
> > Hello all,
> >
> > I am hereby opening up a discussion thread on how we can stabilize Apache
> > MXNet CI build system.
> >
> > Problems:
> >
> > 
> >
> > Recently, we have seen following issues with Apache MXNet CI build
> systems:
> >
> >1. Apache Jenkins master is overloaded and we see issues like - unable
> >to trigger builds, difficult to load and view the blue ocean and other
> >Jenkins build status page.
> >2. We are generating too many request/interaction on Apache Infra
> team.
> >   1. Addition/deletion of new slave: Caused from scaling activity,
> >   recycling, troubleshooting or any actions leading to change of
> slave
> >   machines.
> >   2. Plugins / other Jenkins Master configurations.
> >   3. Experimentation on CI pipelines.
> >3. Harder to debug and resolve issues - Since access to master and
> slave
> >is not with the same community, it requires Infra and community to
> dive
> >deep together on all action items.
> >
> > Possible Solutions:
> >
> > ==
> >
> >1. Can we set up a separate Jenkins CI build system for Apache MXNet
> >outside Apache Infra?
> >2. Can we have a separate Jenkins Master in Apache Infra for MXNet?
> >3. Review design of current setup, refine and fill the gaps.
> >
> > @ Mentors/Infra team/Community:
> >
> > ==
> >
> > Please provide your suggestions on how we can proceed further and work on
> > stabilizing the CI build systems for MXNet.
> >
> > Also, if the community decides on separate Jenkins CI build system, what
> > important points should be taken care of apart from the below:
> >
> >1. Community being able to access the build page for build statuses.
> >2. Committers being able to login with apache credentials.
> >3. Hook setup from apache/incubator-mxnet repo to Jenkins master.
> >
> >
> > Irrespective of the solution we come up, I think we should initiate a
> > technical design discussion on how to setup the CI build system.
> Probably 1
> > or 2 pager documents with the architecture and review with Infra and
> > community members.
> >
> > ***There were few proposal and discussion on the slack channel, to reach
> > wider community members, moving that discussion formally to this list.
> >
> >
> > My Proposal: Option 1 - Set up separate Jenkins CI build system.
> >
> > Thanks,
> >
> > Sandeep
> >
> >
> >
> > --
> > Sandeep Krishnamurthy
> >
>


Re: [Proposal] Stabilizing Apache MXNet CI build system

2017-10-20 Thread Mu Li
+1


It seems that the Apache CI is quite overloaded these days, and MXNet's CI
pipeline is too complex to run there. In addition, we may need to add more
devices, e.g. Mac Pro and Raspberry Pi, to the server, and more tasks such
as the pip build. That means a lot of requests to the Infra team.

We can reuse our previous Jenkins server at http://ci.mxnet.io/. But we
would probably need a dedicated developer to maintain it.



On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
sandeep.krishn...@gmail.com> wrote:

> Hello all,
>
> I am hereby opening up a discussion thread on how we can stabilize Apache
> MXNet CI build system.
>
> Problems:
>
> 
>
> Recently, we have seen following issues with Apache MXNet CI build systems:
>
>1. Apache Jenkins master is overloaded and we see issues like - unable
>to trigger builds, difficult to load and view the blue ocean and other
>Jenkins build status page.
>2. We are generating too many request/interaction on Apache Infra team.
>   1. Addition/deletion of new slave: Caused from scaling activity,
>   recycling, troubleshooting or any actions leading to change of slave
>   machines.
>   2. Plugins / other Jenkins Master configurations.
>   3. Experimentation on CI pipelines.
>3. Harder to debug and resolve issues - Since access to master and slave
>is not with the same community, it requires Infra and community to dive
>deep together on all action items.
>
> Possible Solutions:
>
> ==
>
>1. Can we set up a separate Jenkins CI build system for Apache MXNet
>outside Apache Infra?
>2. Can we have a separate Jenkins Master in Apache Infra for MXNet?
>3. Review design of current setup, refine and fill the gaps.
>
> @ Mentors/Infra team/Community:
>
> ==
>
> Please provide your suggestions on how we can proceed further and work on
> stabilizing the CI build systems for MXNet.
>
> Also, if the community decides on separate Jenkins CI build system, what
> important points should be taken care of apart from the below:
>
>1. Community being able to access the build page for build statuses.
>2. Committers being able to login with apache credentials.
>3. Hook setup from apache/incubator-mxnet repo to Jenkins master.
>
>
> Irrespective of the solution we come up, I think we should initiate a
> technical design discussion on how to setup the CI build system. Probably 1
> or 2 pager documents with the architecture and review with Infra and
> community members.
>
> ***There were few proposal and discussion on the slack channel, to reach
> wider community members, moving that discussion formally to this list.
>
>
> My Proposal: Option 1 - Set up separate Jenkins CI build system.
>
> Thanks,
>
> Sandeep
>
>
>
> --
> Sandeep Krishnamurthy
>


Fwd: [Proposal] Stabilizing Apache MXNet CI build system

2017-10-20 Thread sandeep krishnamurthy
Hello all,

I am hereby opening a discussion thread on how we can stabilize the Apache
MXNet CI build system.

Problems:

==

Recently, we have seen the following issues with the Apache MXNet CI build
system:

   1. The Apache Jenkins master is overloaded, and we see issues such as
   being unable to trigger builds and difficulty loading and viewing the
   Blue Ocean and other Jenkins build status pages.
   2. We are generating too many requests to, and interactions with, the
   Apache Infra team:
      1. Addition/deletion of slaves: caused by scaling activity,
      recycling, troubleshooting, or any action that changes the slave
      machines.
      2. Plugins / other Jenkins master configurations.
      3. Experimentation on CI pipelines.
   3. Issues are harder to debug and resolve: since access to the master
   and the slaves is not held by the same community, Infra and the
   community have to dive deep together on every action item.

Possible Solutions:

==

   1. Can we set up a separate Jenkins CI build system for Apache MXNet
   outside Apache Infra?
   2. Can we have a separate Jenkins Master in Apache Infra for MXNet?
   3. Review the design of the current setup, refine it, and fill the gaps.

@ Mentors/Infra team/Community:

==

Please provide your suggestions on how we can proceed and work on
stabilizing the CI build system for MXNet.

Also, if the community decides on a separate Jenkins CI build system, what
important points should be taken care of apart from the ones below?

   1. Community being able to access the build page for build statuses.
   2. Committers being able to log in with Apache credentials.
   3. Hook setup from apache/incubator-mxnet repo to Jenkins master.


Irrespective of the solution we come up with, I think we should initiate a
technical design discussion on how to set up the CI build system, probably
as a one- or two-page document describing the architecture, to be reviewed
with Infra and community members.

***There were a few proposals and discussions on the Slack channel; to reach
the wider community, I am moving that discussion formally to this list.


My Proposal: Option 1 - Set up a separate Jenkins CI build system.

Thanks,

Sandeep



-- 
Sandeep Krishnamurthy