Re: [DISCUSS] Build Times are getting out of hand

2017-02-07 Thread Casey Stella
haha there was some desperation there, I'll admit. ;)

On Tue, Feb 7, 2017 at 3:12 PM, Otto Fowler wrote:

> This PR gets a star just for the commit messages, and it isn’t even
> Friday, Casey

Re: [DISCUSS] Build Times are getting out of hand

2017-02-07 Thread Otto Fowler
This PR gets a star just for the commit messages, and it isn’t even Friday, Casey


On February 7, 2017 at 14:49:22, Casey Stella (ceste...@gmail.com) wrote:

I spent a minute or two looking at how we might use Travis configuration
alone to drop the wall-clock time of the build and put it up for review at
https://github.com/apache/incubator-metron/pull/444

It does 3 things:

- Separates the build, the unit tests and the integration tests
- Parallelizes the unit tests and the build and runs the integration
tests within the Travis container
- Runs the unit tests and integration tests in separate Travis
containers using Travis' build matrix

This ultimately cuts the wall-clock time down to 24 minutes for me on Travis
and should give us some time where we're not constantly bouncing builds to
act on the suggestions here.



Re: [DISCUSS] Build Times are getting out of hand

2017-02-07 Thread Ryan Merriman
Down to 24 minutes?  Nice job.

On Tue, Feb 7, 2017 at 1:49 PM, Casey Stella wrote:

> I spent a minute or two looking at how we might use Travis configuration
> alone to drop the wall-clock time of the build and put it up for review at
> https://github.com/apache/incubator-metron/pull/444
>
> It does 3 things:
>
>    - Separates the build, the unit tests and the integration tests
>    - Parallelizes the unit tests and the build and runs the integration
>    tests within the Travis container
>    - Runs the unit tests and integration tests in separate Travis
>    containers using Travis' build matrix
>
> This ultimately cuts the wall-clock time down to 24 minutes for me on Travis
> and should give us some time where we're not constantly bouncing builds to
> act on the suggestions here.

Re: [DISCUSS] Build Times are getting out of hand

2017-02-07 Thread Casey Stella
I spent a minute or two looking at how we might use Travis configuration
alone to drop the wall-clock time of the build and put it up for review at
https://github.com/apache/incubator-metron/pull/444

It does 3 things:

   - Separates the build, the unit tests and the integration tests
   - Parallelizes the unit tests and the build and runs the integration
   tests within the Travis container
   - Runs the unit tests and integration tests in separate Travis
   containers using Travis' build matrix

This ultimately cuts the wall-clock time down to 24 minutes for me on Travis
and should give us some time where we're not constantly bouncing builds to
act on the suggestions here.


On Tue, Feb 7, 2017 at 1:03 PM, Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> FYI, found this for Docker - https://docs.travis-ci.com/user/docker/

Re: [DISCUSS] Build Times are getting out of hand

2017-02-07 Thread Michael Miklavcic
FYI, found this for Docker - https://docs.travis-ci.com/user/docker/

On Tue, Feb 7, 2017 at 9:09 AM, David Lyle  wrote:

> Absolutely agree. I also think we'd want both once we've done that. Travis
> is good for smoke testing PRs and Commits. Jenkins is good for nightly runs
> of medium duration tests and would be great for automating our distributed
> testing if we found infrastructure to support it. I've seen them used in
> concert to provide a good solution.
>
> But, initially, I'd like to see us get our in-process stuff replaced with
> docker where (if) it makes sense, refactored to run in parallel, the poms
> refactored to handle our dependencies better and our uber jars removed
> where they can be and minimized where they cannot be.
>
> Which, I think, is a long-winded way of saying "I'd like to see us do what
> Casey suggested." :)
>
> -D...

Re: [DISCUSS] Build Times are getting out of hand

2017-02-07 Thread David Lyle
Absolutely agree. I also think we'd want both once we've done that. Travis
is good for smoke testing PRs and Commits. Jenkins is good for nightly runs
of medium duration tests and would be great for automating our distributed
testing if we found infrastructure to support it. I've seen them used in
concert to provide a good solution.

But, initially, I'd like to see us get our in-process stuff replaced with
docker where (if) it makes sense, refactored to run in parallel, the poms
refactored to handle our dependencies better and our uber jars removed
where they can be and minimized where they cannot be.

Which, I think, is a long-winded way of saying "I'd like to see us do what
Casey suggested." :)

-D...


On Tue, Feb 7, 2017 at 10:45 AM, Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> I agree with this. I don't think we should switch to an alternate system
> until we find that we are absolutely incapable of eking out any further
> efficiency from the current setup.
>
> On Tue, Feb 7, 2017 at 8:04 AM, Casey Stella wrote:
>
> > I believe that some people use travis and some people request Jenkins
> > from Apache Infra.  That being said, personally, I think we should take
> > the opportunity to correct the underlying issues.  50 minutes for a
> > build seems excessive to me.
> >
> > On Mon, Feb 6, 2017 at 10:07 PM, Otto Fowler wrote:
> >
> > > Is there an alternative to Travis?  Do other like-sized Apache projects
> > > have these problems?  Do they use Travis?
> > >
> > >
> > > On February 6, 2017 at 17:02:37, Casey Stella (ceste...@gmail.com)
> > > wrote:
> > >
> > > For those with pending/building pull requests, it will come as no
> > > surprise that our build times are increasing at a pace that is
> > > worrisome. In fact, we hit a fundamental limit associated with Travis
> > > over the weekend. We have crept up into the 40+ minute build territory
> > > and Travis seems to error out at around 49 minutes.
> > >
> > > Taking the current build (
> > > https://travis-ci.org/apache/incubator-metron/jobs/198929446), looking
> > > at just job times, we're spending about 19 - 20 minutes (1176.53
> > > seconds) in tests out of 44 minutes and 42 seconds to do the build.
> > > This places the unit tests at around 43% of the build time. I say all
> > > of this to point out that while unit tests are a portion of the build,
> > > they are not even the majority of the build time. We need an approach
> > > that addresses the whole build performance holistically and we need it
> > > soonest.
> > >
> > > To seed the discussion, I will point to a few things that come to mind
> > > that fit into three broad categories:
> > >
> > > *Tests are Slow*
> > >
> > >
> > > - *Tactical*: We have around 13 tests that take more than 30 seconds
> > > and make up 14 minutes of the build. Speeding up those tests may be
> > > worth considering as a tactical approach.
> > > - We are spinning up the same services (e.g. kafka, storm) for multiple
> > > tests; instead, use the docker infrastructure to spin them up once and
> > > then use them throughout the tests.
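
One way to find the tests over the 30 second mark mentioned above is to
read the per-suite timings that Maven Surefire writes to
target/surefire-reports/TEST-*.xml. This is only a sketch: it assumes the
build uses Surefire's default XML reports, and the helper names
(suite_time, slow_suites, load_reports) are invented for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def suite_time(report_xml):
    """Return (suite name, seconds) from one Surefire XML report."""
    root = ET.fromstring(report_xml)
    # Some Surefire versions print a thousands separator; drop it.
    seconds = float(root.get("time", "0").replace(",", ""))
    return root.get("name"), seconds


def slow_suites(reports, threshold_s=30.0):
    """Suites slower than threshold_s, slowest first."""
    timings = [suite_time(report) for report in reports]
    slow = [(name, secs) for name, secs in timings if secs > threshold_s]
    return sorted(slow, key=lambda pair: pair[1], reverse=True)


def load_reports(root_dir="."):
    """Read every Surefire report found under root_dir as bytes."""
    pattern = "**/target/surefire-reports/TEST-*.xml"
    return [path.read_bytes() for path in Path(root_dir).glob(pattern)]


if __name__ == "__main__":
    for name, secs in slow_suites(load_reports()):
        print(f"{secs:8.1f}s  {name}")
```

Run from the top of a checkout after a test run, this lists the suites
worth looking at first.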
> > >
> > >
> > > *Tests aren't parallel*
> > >
> > > Currently we cannot run the build in parallel due to the integration
> > > test infrastructure spinning up its own services that bind to the same
> > > ports. If we correct this, we can run the builds in parallel with mvn -T
> > >
> > > - Correct this by decoupling the infrastructure from the tests and
> > > refactoring the tests to run in parallel.
> > > - Make the integration testing infrastructure bind intelligently to
> > > whatever port is available.
> > > - Move the integration tests to their own project. This will let us run
> > > the build in parallel since an individual project's test will be run
> > > serially.
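
The "bind to whatever port is available" idea above is commonly done by
asking the operating system for port 0, which makes it pick an unused
port. The thread does not say how the in-process Kafka or Storm components
choose their ports, so the following is a general sketch of the technique
rather than a description of the project's test infrastructure;
bind_to_free_port is a hypothetical helper.

```python
import socket


def bind_to_free_port(host="127.0.0.1"):
    """Bind a listening TCP socket to an OS-assigned free port.

    Returns the open socket and the port number. Keeping the socket open
    is what reserves the port; closing it and re-binding later would
    leave a window in which another test could grab the same port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 asks the OS to choose
    sock.listen()
    return sock, sock.getsockname()[1]


if __name__ == "__main__":
    first, first_port = bind_to_free_port()
    second, second_port = bind_to_free_port()
    try:
        # Two services started this way cannot collide on a port.
        print(f"service A on {first_port}, service B on {second_port}")
    finally:
        first.close()
        second.close()
```

Each test would then pass the chosen port to its clients instead of
relying on a fixed, shared port number.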
> > >
> > > *Packaging is Painful*
> > >
> > > We have a sensitive environment in terms of dependencies. As such, we
> > > are careful to shade and relocate dependencies that we want to isolate
> > > from our transitive dependencies. The consequence of this is that we
> > > spend a lot of time in the build shading and relocating maven module
> > > output.
> > >
> > > - Do the hard work to walk our transitive dependencies and ensure that
> > > we are including only one copy of every library by using exclusions
> > > effectively. This will not only bring down build times, it will make
> > > sure we know what we're including.
> > > - Try to devise a strategy where we only shade once at the end. This
> > > could look like some combination of
> > > - standardizing on the lowest common denominator of a troublesome
> > > library
> > > - We shade in dependencies so they can use different versions of
> > > libraries (e.g. metron-common with a modern version of guava) than the
> > > final jars.
> > > - exclusions
> > > - externalizing infrastructure out to not necessitate spinning up
> > 

Re: [DISCUSS] Build Times are getting out of hand

2017-02-07 Thread Michael Miklavcic
I agree with this. I don't think we should switch to an alternate system
until we find that we are absolutely incapable of eking out any further
efficiency from the current setup.

On Tue, Feb 7, 2017 at 8:04 AM, Casey Stella  wrote:

> I believe that some people use travis and some people request Jenkins from
> Apache Infra.  That being said, personally, I think we should take the
> opportunity to correct the underlying issues.  50 minutes for a build seems
> excessive to me.
>
> On Mon, Feb 6, 2017 at 10:07 PM, Otto Fowler wrote:
>
> > Is there an alternative to Travis?  Do other like-sized Apache projects
> > have these problems?  Do they use Travis?
> >
> >
> > On February 6, 2017 at 17:02:37, Casey Stella (ceste...@gmail.com)
> > wrote:
> >
> > For those with pending/building pull requests, it will come as no
> > surprise that our build times are increasing at a pace that is worrisome.
> > In fact, we hit a fundamental limit associated with Travis over the
> > weekend. We have crept up into the 40+ minute build territory and Travis
> > seems to error out at around 49 minutes.
> >
> > Taking the current build (
> > https://travis-ci.org/apache/incubator-metron/jobs/198929446), looking
> > at just job times, we're spending about 19 - 20 minutes (1176.53 seconds)
> > in tests out of 44 minutes and 42 seconds to do the build. This places
> > the unit tests at around 43% of the build time. I say all of this to
> > point out that while unit tests are a portion of the build, they are not
> > even the majority of the build time. We need an approach that addresses
> > the whole build performance holistically and we need it soonest.
> >
> > To seed the discussion, I will point to a few things that come to mind
> > that fit into three broad categories:
> >
> > *Tests are Slow*
> >
> >
> > - *Tactical*: We have around 13 tests that take more than 30 seconds and
> > make up 14 minutes of the build. Considering what we can do to speed
> those
> > tests as a tactical approach may be worth considering
> > - We are spinning up the same services (e.g. kafka, storm) for multiple
> > tests, instead use the docker infrastructure to spin them up once and
> then
> > use them throughout the tests.
> >
> >
> > *Tests aren't parallel*
> >
> > Currently we cannot run the build in parallel due to the integration test
> > infrastructure spinning up its own services that bind to the same ports.
> > If we correct this, we can run the builds in parallel with mvn -T
> >
> > - Correct this by decoupling the infrastructure from the tests and
> > refactoring the tests to run in parallel.
> > - Make the integration testing infrastructure bind intelligently to
> > whatever port is available.
> > - Move the integration tests to their own project. This will let us run
> > the build in parallel since an individual project's test will be run
> > serially.
> >
> > *Packaging is Painful*
> >
> > We have a sensitive environment in terms of dependencies. As such, we are
> > careful to shade and relocate dependencies that we want to isolate from
> > our
> > transitive dependencies. The consequences of this is that we spend a lot
> > of time in the build shading and relocating maven module output.
> >
> > - Do the hard work to walk our transitive dependencies and ensure that
> > we are including only one copy of every library by using exclusions
> > effectively. This will not only bring down build times, it will make sure
> > we know what we're including.
> > - Try to devise a strategy where we only shade once at the end. This
> > could look like some combination of
> > - standardizing on the lowest common denominator of a troublesome
> > library
> > - We shade in dependencies so they can use different versions of
> > libraries (e.g. metron-common with a modern version of guava) than the
> > final jars.
> > - exclusions
> > - externalizing infrastructure out to not necessitate spinning up
> > hadoop components in-process for integration tests (i.e. hbase server
> > conflicts with storm in a few dependencies)
> >
> > *Final Thoughts*
> >
> > If I had three to pick, I'd pick
> >
> > - moving off of the in-memory component infrastructure to docker images
> > - fixing the maven poms to exclude correctly
> > - ensuring the resulting tests are parallelizable
> >
> > I will point out that fixing the maven poms to exclude correctly (i.e. we
> > choose the version of every jar that we depend on transitively) ticks
> > multiple boxes, not just making things faster.
> >
> > What are your thoughts? What did I miss? We need a plan and we need to
> > execute on it soon, otherwise travis is going to keep smacking us hard.
> It
> > may be worth while constructing a tactical plan and then a more strategic
> > plan that we can work toward. I was heartened at how much some of these
> > suggestions dovetail with the discussion around the future of the docker
> > infrastructure.
> >
> > Best,
> >
> > 

Re: [DISCUSS] Build Times are getting out of hand

2017-02-07 Thread JJ Meyer
Mike, unfortunately something changed recently, and I can't run `mvn clean
install -T 2C` locally anymore.

I'd like to echo that I think working on fixing the dependency issue is a
very good idea. We've actually faced issues with this on the REST API PR.
Working to fix this and having a standard way of including/excluding
dependencies will be helpful to all, and to Ryan's point will benefit us
outside of this context.

On Tue, Feb 7, 2017 at 9:36 AM, Ryan Merriman  wrote:

> Debugging integration tests in an IDE uses the same approach with our
> current infrastructure or with docker:  start up the topology with
> LocalRunner.  I've had mixed success with our current infrastructure.  As
> Mike alluded to, some tests work fine (most of the parser topologies and
> enrichment topology) while others fail when run in my IDE but work on the
> command line (ES integration test due to guava issues and Squid topology
> due to some issue with the remove subdomains Stellar function).  Of course
> with Docker infrastructure you will need a test runner to launch topologies
> in LocalRunner.  They are short and simple though and I have one written
> for each topology that I can share when appropriate.
>
> There are some advantages and disadvantages to switching the integration
> tests to use Docker.  The infrastructure we have now works and could be
> adjusted to overcome its primary weaknesses (single classloader and start
> up/shutdown after each test).  With Docker the classloader issue goes away
> for the most part (or is much better than it is now) without any extra
> work.  For spinning services up/down once instead of with each test, we
> will need to adjust our tests to clean up after themselves or (even better)
> namespace all testing objects so that tests don't step on each other.  That
> work would have to be done no matter which infrastructure approach we
> take.  Probably the biggest downside to using Docker is that all
> integration tests will need to be adjusted and we'll likely hit some issues
> that we'll need to resolve.  I was bitten several times by services that
> broadcast their host address (Kafka for example) and I bet we hit more of
> those.  We'll also need to add a few more containers (HDFS for sure) but
> those are easy to create as long as you don't hit the issue I just
> mentioned.
>
> I think all of the suggestions so far are good ideas.  I think it goes
> without saying that we should do one at a time and maybe even reassess
> after we see the impact of each change.  I would vote for doing the
> Maven/shading one first because it is all around beneficial, even outside
> of this context.
>
> On Tue, Feb 7, 2017 at 9:04 AM, Casey Stella  wrote:
>
> > I believe that some people use travis and some people request Jenkins
> from
> > Apache Infra.  That being said, personally, I think we should take the
> > opportunity to correct the underlying issues.  50 minutes for a build
> seems
> > excessive to me.
> >
> > On Mon, Feb 6, 2017 at 10:07 PM, Otto Fowler 
> > wrote:
> >
> > > Is there an alternative to Travis?  Do other like sized apache projects
> > > have these problems?  Do they use travis?
> > >
> > >
> > > On February 6, 2017 at 17:02:37, Casey Stella (ceste...@gmail.com)
> > wrote:
> > >
> > > For those with pending/building pull requests, it will come as no
> > surprise
> > > that our build times are increasing at a pace that is worrisome. In
> fact,
> > > we have hit a fundamental limit associated with Travis over the
> weekend.
> > > We have creeped up into the 40+ minute build territory and travis seems
> > to
> > > error out at around 49 minutes.
> > >
> > > Taking the current build (
> > > https://travis-ci.org/apache/incubator-metron/jobs/198929446), looking
> > at
> > > just job times, we're spending about 19 - 20 minutes (1176.53 seconds)
> in
> > > tests out of 44 minutes and 42 seconds to do the build. This places the
> > > unit tests at around 43% of the build time. I say all of this to point
> > out
> > > that while unit tests are a portion of the build, they are not even the
> > > majority of the build time. We need an approach that addresses the
> whole
> > > build performance holistically and we need it soonest.
> > >
> > > To seed the discussion, I will point to a few things that come to mind
> > > that
> > > fit into three broad categories:
> > >
> > > *Tests are Slow*
> > >
> > >
> > > - *Tactical*: We have around 13 tests that take more than 30 seconds
> and
> > > make up 14 minutes of the build. Considering what we can do to speed
> > those
> > > tests as a tactical approach may be worth considering
> > > - We are spinning up the same services (e.g. kafka, storm) for multiple
> > > tests, instead use the docker infrastructure to spin them up once and
> > then
> > > use them throughout the tests.
> > >
> > >
> > > *Tests aren't parallel*
> > >
> > > Currently we cannot run the build in parallel due 

Re: [DISCUSS] Build Times are getting out of hand

2017-02-07 Thread Ryan Merriman
Debugging integration tests in an IDE uses the same approach with our
current infrastructure or with docker:  start up the topology with
LocalRunner.  I've had mixed success with our current infrastructure.  As
Mike alluded to, some tests work fine (most of the parser topologies and
enrichment topology) while others fail when run in my IDE but work on the
command line (ES integration test due to guava issues and Squid topology
due to some issue with the remove subdomains Stellar function).  Of course
with Docker infrastructure you will need a test runner to launch topologies
in LocalRunner.  They are short and simple though and I have one written
for each topology that I can share when appropriate.

There are some advantages and disadvantages to switching the integration
tests to use Docker.  The infrastructure we have now works and could be
adjusted to overcome its primary weaknesses (single classloader and start
up/shutdown after each test).  With Docker the classloader issue goes away
for the most part (or is much better than it is now) without any extra
work.  For spinning services up/down once instead of with each test, we
will need to adjust our tests to clean up after themselves or (even better)
namespace all testing objects so that tests don't step on each other.  That
work would have to be done no matter which infrastructure approach we
take.  Probably the biggest downside to using Docker is that all
integration tests will need to be adjusted and we'll likely hit some issues
that we'll need to resolve.  I was bitten several times by services that
broadcast their host address (Kafka for example) and I bet we hit more of
those.  We'll also need to add a few more containers (HDFS for sure) but
those are easy to create as long as you don't hit the issue I just
mentioned.
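
To make the namespacing idea concrete, here is a minimal sketch of how a test could derive unique names for the objects it creates (Kafka topics, HBase tables and so on) when services are shared across tests. It is written in Python purely for illustration; the helper name `namespaced` and the `run_id` parameter are hypothetical and are not part of the Metron test infrastructure.

```python
import uuid


def namespaced(name, run_id=None):
    """Return a resource name that is unique to one test run.

    `name` is the logical name (e.g. a topic or table); `run_id` is an
    optional fixed suffix, otherwise a short random one is generated.
    Both names are assumptions made for this sketch.
    """
    suffix = run_id or uuid.uuid4().hex[:8]
    return f"{name}_{suffix}"


# Two tests asking for the "same" table get different physical names,
# so row counts and cleanup in one cannot affect the other.
table_a = namespaced("enrichment")
table_b = namespaced("enrichment")
```

A fixed `run_id` can be passed when a test needs a predictable name, for example to inspect leftovers after a failed run.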

I think all of the suggestions so far are good ideas.  I think it goes
without saying that we should do one at a time and maybe even reassess
after we see the impact of each change.  I would vote for doing the
Maven/shading one first because it is all-around beneficial, even outside
of this context.

On Tue, Feb 7, 2017 at 9:04 AM, Casey Stella  wrote:

> I believe that some people use travis and some people request Jenkins from
> Apache Infra.  That being said, personally, I think we should take the
> opportunity to correct the underlying issues.  50 minutes for a build seems
> excessive to me.
>
> On Mon, Feb 6, 2017 at 10:07 PM, Otto Fowler 
> wrote:
>
> > Is there an alternative to Travis?  Do other like sized apache projects
> > have these problems?  Do they use travis?
> >
> >
> > On February 6, 2017 at 17:02:37, Casey Stella (ceste...@gmail.com)
> wrote:
> >
> > For those with pending/building pull requests, it will come as no
> surprise
> > that our build times are increasing at a pace that is worrisome. In fact,
> > we have hit a fundamental limit associated with Travis over the weekend.
> > We have creeped up into the 40+ minute build territory and travis seems
> to
> > error out at around 49 minutes.
> >
> > Taking the current build (
> > https://travis-ci.org/apache/incubator-metron/jobs/198929446), looking
> at
> > just job times, we're spending about 19 - 20 minutes (1176.53 seconds) in
> > tests out of 44 minutes and 42 seconds to do the build. This places the
> > unit tests at around 43% of the build time. I say all of this to point
> out
> > that while unit tests are a portion of the build, they are not even the
> > majority of the build time. We need an approach that addresses the whole
> > build performance holistically and we need it soonest.
> >
> > To seed the discussion, I will point to a few things that come to mind
> > that
> > fit into three broad categories:
> >
> > *Tests are Slow*
> >
> >
> > - *Tactical*: We have around 13 tests that take more than 30 seconds and
> > make up 14 minutes of the build. Considering what we can do to speed
> those
> > tests as a tactical approach may be worth considering
> > - We are spinning up the same services (e.g. kafka, storm) for multiple
> > tests, instead use the docker infrastructure to spin them up once and
> then
> > use them throughout the tests.
> >
> >
> > *Tests aren't parallel*
> >
> > Currently we cannot run the build in parallel due to the integration test
> > infrastructure spinning up its own services that bind to the same ports.
> > If we correct this, we can run the builds in parallel with mvn -T
> >
> > - Correct this by decoupling the infrastructure from the tests and
> > refactoring the tests to run in parallel.
> > - Make the integration testing infrastructure bind intelligently to
> > whatever port is available.
> > - Move the integration tests to their own project. This will let us run
> > the build in parallel since an individual project's test will be run
> > serially.
> >
> > *Packaging is Painful*
> >
> > We have a sensitive environment in terms of dependencies. As such, we are
> > careful to 

Re: [DISCUSS] Build Times are getting out of hand

2017-02-07 Thread Casey Stella
I believe that some people use travis and some people request Jenkins from
Apache Infra.  That being said, personally, I think we should take the
opportunity to correct the underlying issues.  50 minutes for a build seems
excessive to me.

On Mon, Feb 6, 2017 at 10:07 PM, Otto Fowler 
wrote:

> Is there an alternative to Travis?  Do other like sized apache projects
> have these problems?  Do they use travis?
>
>
> On February 6, 2017 at 17:02:37, Casey Stella (ceste...@gmail.com) wrote:
>
> For those with pending/building pull requests, it will come as no surprise
> that our build times are increasing at a pace that is worrisome. In fact,
> we have hit a fundamental limit associated with Travis over the weekend.
> We have creeped up into the 40+ minute build territory and travis seems to
> error out at around 49 minutes.
>
> Taking the current build (
> https://travis-ci.org/apache/incubator-metron/jobs/198929446), looking at
> just job times, we're spending about 19 - 20 minutes (1176.53 seconds) in
> tests out of 44 minutes and 42 seconds to do the build. This places the
> unit tests at around 43% of the build time. I say all of this to point out
> that while unit tests are a portion of the build, they are not even the
> majority of the build time. We need an approach that addresses the whole
> build performance holistically and we need it soonest.
>
> To seed the discussion, I will point to a few things that come to mind
> that
> fit into three broad categories:
>
> *Tests are Slow*
>
>
> - *Tactical*: We have around 13 tests that take more than 30 seconds and
> make up 14 minutes of the build. Considering what we can do to speed those
> tests as a tactical approach may be worth considering
> - We are spinning up the same services (e.g. kafka, storm) for multiple
> tests, instead use the docker infrastructure to spin them up once and then
> use them throughout the tests.
>
>
> *Tests aren't parallel*
>
> Currently we cannot run the build in parallel due to the integration test
> infrastructure spinning up its own services that bind to the same ports.
> If we correct this, we can run the builds in parallel with mvn -T
>
> - Correct this by decoupling the infrastructure from the tests and
> refactoring the tests to run in parallel.
> - Make the integration testing infrastructure bind intelligently to
> whatever port is available.
> - Move the integration tests to their own project. This will let us run
> the build in parallel since an individual project's test will be run
> serially.
>
> *Packaging is Painful*
>
> We have a sensitive environment in terms of dependencies. As such, we are
> careful to shade and relocate dependencies that we want to isolate from
> our
> transitive dependencies. The consequences of this is that we spend a lot
> of time in the build shading and relocating maven module output.
>
> - Do the hard work to walk our transitive dependencies and ensure that
> we are including only one copy of every library by using exclusions
> effectively. This will not only bring down build times, it will make sure
> we know what we're including.
> - Try to devise a strategy where we only shade once at the end. This
> could look like some combination of
> - standardizing on the lowest common denominator of a troublesome
> library
> - We shade in dependencies so they can use different versions of
> libraries (e.g. metron-common with a modern version of guava) than the
> final jars.
> - exclusions
> - externalizing infrastructure out to not necessitate spinning up
> hadoop components in-process for integration tests (i.e. hbase server
> conflicts with storm in a few dependencies)
>
> *Final Thoughts*
>
> If I had three to pick, I'd pick
>
> - moving off of the in-memory component infrastructure to docker images
> - fixing the maven poms to exclude correctly
> - ensuring the resulting tests are parallelizable
>
> I will point out that fixing the maven poms to exclude correctly (i.e. we
> choose the version of every jar that we depend on transitively) ticks
> multiple boxes, not just making things faster.
>
> What are your thoughts? What did I miss? We need a plan and we need to
> execute on it soon, otherwise travis is going to keep smacking us hard. It
> may be worth while constructing a tactical plan and then a more strategic
> plan that we can work toward. I was heartened at how much some of these
> suggestions dovetail with the discussion around the future of the docker
> infrastructure.
>
> Best,
>
> Casey
>
>


Re: [DISCUSS] Build Times are getting out of hand

2017-02-07 Thread Casey Stella
Mike, I can verify that the integration tests do not run in parallel via
mvn -T 1C clean install

At a minimum the integration test infrastructure will need to hunt for an
open port to bind to rather than assuming one.
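
As a rough illustration of hunting for an open port, the usual trick is to bind to port 0 and let the operating system pick a free one. The sketch below is in Python for brevity and is not the Metron code; `find_free_port` is a hypothetical helper name. The same approach is available in Java via `new ServerSocket(0)` followed by `getLocalPort()`.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        # getsockname() returns (host, port); the port is the one the OS chose.
        return sock.getsockname()[1]


port = find_free_port()
```

One caveat: the port is released when the socket closes, so another process could grab it before the test service binds to it. Having the service itself bind to port 0 and report the result avoids that window where the component supports it.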

On Tue, Feb 7, 2017 at 9:26 AM, Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> I can't recall, did we have a good solution around Docker and remote
> debugging integration tests from the IDE? On the topic of test refactoring
> and running in parallel, I'm all for it. I know JJ had been doing this on
> his local machine at one point, but we'd need to be sure all tests are
> truly independent. E.g. counts on hbase tables would need to be very
> specific or every test should use unique tables. Also, can we spin up
> something like Docker in Travis? How many cores do we get? I'll look into
> that and see what we get.
>
> I'm all for simplifying our dependencies. Shading the jars takes an
> incredible amount of time and has consistently bitten us repeatedly.
> Another bummer about the jar shading has been that the build runs
> differently in IntelliJ than it does from the Maven command line. I don't
> think we'll get away from it entirely, but we may be able to make this
> better as well.
>
> From my most recent local build, these are the biggest offending modules:
> metron-profiler  SUCCESS [05:56 min]
> metron-parsers . SUCCESS [09:38 min]
> metron-data-management . SUCCESS [09:15 min]
> elasticsearch-shaded ... SUCCESS [08:05 min]
>
> I'm going to take a look at Travis and also see what pom dependencies I can
> start excluding.
>
>
> On Mon, Feb 6, 2017 at 3:02 PM, Casey Stella  wrote:
>
> > For those with pending/building pull requests, it will come as no
> surprise
> > that our build times are increasing at a pace that is worrisome.  In
> fact,
> > we have hit a fundamental limit associated with Travis over the weekend.
> > We have creeped up into the 40+ minute build territory and travis seems
> to
> > error out at around 49 minutes.
> >
> > Taking the current build (
> > https://travis-ci.org/apache/incubator-metron/jobs/198929446), looking
> at
> > just job times, we're spending about 19 - 20 minutes (1176.53 seconds) in
> > tests out of 44 minutes and 42 seconds to do the build.  This places the
> > unit tests at around 43% of the build time.  I say all of this to point
> out
> > that while unit tests are a portion of the build, they are not even the
> > majority of the build time.  We need an approach that addresses the whole
> > build performance holistically and we need it soonest.
> >
> > To seed the discussion, I will point to a few things that come to mind
> that
> > fit into three broad categories:
> >
> > *Tests are Slow*
> >
> >
> >- *Tactical*: We have around 13 tests that take more than 30 seconds
> and
> >make up 14 minutes of the build.  Considering what we can do to speed
> > those
> >tests as a tactical approach may be worth considering
> >- We are spinning up the same services (e.g. kafka, storm) for
> multiple
> >tests, instead use the docker infrastructure to spin them up once and
> > then
> >use them throughout the tests.
> >
> >
> > *Tests aren't parallel*
> >
> > Currently we cannot run the build in parallel due to the integration test
> > infrastructure spinning up its own services that bind to the same ports.
> > If we correct this, we can run the builds in parallel with mvn -T
> >
> >- Correct this by decoupling the infrastructure from the tests and
> >refactoring the tests to run in parallel.
> >- Make the integration testing infrastructure bind intelligently to
> >whatever port is available.
> >- Move the integration tests to their own project.  This will let us
> run
> >the build in parallel since an individual project's test will be run
> >serially.
> >
> > *Packaging is Painful*
> >
> > We have a sensitive environment in terms of dependencies.  As such, we
> are
> > careful to shade and relocate dependencies that we want to isolate from
> our
> > transitive dependencies.  The consequences of this is that we spend a lot
> > of time in the build shading and relocating maven module output.
> >
> >- Do the hard work to walk our transitive dependencies and ensure that
> >we are including only one copy of every library by using exclusions
> >effectively.  This will not only bring down build times, it will make
> > sure
> >we know what we're including.
> >- Try to devise a strategy where we only shade once at the end.  This
> >could look like some combination of
> >   - standardizing on the lowest common denominator of a troublesome
> >   library
> >  - We shade in dependencies so they can use different versions of
> >  libraries (e.g. metron-common with a modern version of guava)
> > than the
> >  

Re: [DISCUSS] Build Times are getting out of hand

2017-02-06 Thread Otto Fowler
Is there an alternative to Travis?  Do other like-sized Apache projects
have these problems?  Do they use Travis?


On February 6, 2017 at 17:02:37, Casey Stella (ceste...@gmail.com) wrote:

For those with pending/building pull requests, it will come as no surprise
that our build times are increasing at a pace that is worrisome. In fact,
we have hit a fundamental limit associated with Travis over the weekend.
We have creeped up into the 40+ minute build territory and travis seems to
error out at around 49 minutes.

Taking the current build (
https://travis-ci.org/apache/incubator-metron/jobs/198929446), looking at
just job times, we're spending about 19 - 20 minutes (1176.53 seconds) in
tests out of 44 minutes and 42 seconds to do the build. This places the
unit tests at around 43% of the build time. I say all of this to point out
that while unit tests are a portion of the build, they are not even the
majority of the build time. We need an approach that addresses the whole
build performance holistically and we need it soonest.

To seed the discussion, I will point to a few things that come to mind that
fit into three broad categories:

*Tests are Slow*


- *Tactical*: We have around 13 tests that take more than 30 seconds and
make up 14 minutes of the build. Considering what we can do to speed those
tests as a tactical approach may be worth considering
- We are spinning up the same services (e.g. kafka, storm) for multiple
tests, instead use the docker infrastructure to spin them up once and then
use them throughout the tests.


*Tests aren't parallel*

Currently we cannot run the build in parallel due to the integration test
infrastructure spinning up its own services that bind to the same ports.
If we correct this, we can run the builds in parallel with mvn -T

- Correct this by decoupling the infrastructure from the tests and
refactoring the tests to run in parallel.
- Make the integration testing infrastructure bind intelligently to
whatever port is available.
- Move the integration tests to their own project. This will let us run
the build in parallel since an individual project's test will be run
serially.

*Packaging is Painful*

We have a sensitive environment in terms of dependencies. As such, we are
careful to shade and relocate dependencies that we want to isolate from our
transitive dependencies. The consequences of this is that we spend a lot
of time in the build shading and relocating maven module output.

- Do the hard work to walk our transitive dependencies and ensure that
we are including only one copy of every library by using exclusions
effectively. This will not only bring down build times, it will make sure
we know what we're including.
- Try to devise a strategy where we only shade once at the end. This
could look like some combination of
- standardizing on the lowest common denominator of a troublesome
library
- We shade in dependencies so they can use different versions of
libraries (e.g. metron-common with a modern version of guava) than the
final jars.
- exclusions
- externalizing infrastructure out to not necessitate spinning up
hadoop components in-process for integration tests (i.e. hbase server
conflicts with storm in a few dependencies)

*Final Thoughts*

If I had three to pick, I'd pick

- moving off of the in-memory component infrastructure to docker images
- fixing the maven poms to exclude correctly
- ensuring the resulting tests are parallelizable

I will point out that fixing the maven poms to exclude correctly (i.e. we
choose the version of every jar that we depend on transitively) ticks
multiple boxes, not just making things faster.

What are your thoughts? What did I miss? We need a plan and we need to
execute on it soon, otherwise travis is going to keep smacking us hard. It
may be worth while constructing a tactical plan and then a more strategic
plan that we can work toward. I was heartened at how much some of these
suggestions dovetail with the discussion around the future of the docker
infrastructure.

Best,

Casey


[DISCUSS] Build Times are getting out of hand

2017-02-06 Thread Casey Stella
For those with pending/building pull requests, it will come as no surprise
that our build times are increasing at a pace that is worrisome.  In fact,
we have hit a fundamental limit associated with Travis over the weekend.
We have crept up into the 40+ minute build territory and Travis seems to
error out at around 49 minutes.

Taking the current build (
https://travis-ci.org/apache/incubator-metron/jobs/198929446), looking at
just job times, we're spending about 19 - 20 minutes (1176.53 seconds) in
tests out of 44 minutes and 42 seconds to do the build.  This places the
unit tests at around 43% of the build time.  I say all of this to point out
that while unit tests are a portion of the build, they are not even the
majority of the build time.  We need an approach that addresses the whole
build performance holistically and we need it soonest.
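
For anyone who wants to check the arithmetic, the share quoted above can be reproduced from the two numbers in the job log. This is only a worked calculation in Python using the figures stated in this message; the variable names are made up for the example.

```python
# Figures taken from the Travis job referenced above.
test_seconds = 1176.53            # time spent in tests
build_seconds = 44 * 60 + 42      # 44 minutes 42 seconds of total build

share = test_seconds / build_seconds
non_test_minutes = (build_seconds - test_seconds) / 60

print(f"tests: {share:.1%} of the build")
print(f"everything else: {non_test_minutes:.1f} minutes")
```

The ratio comes out just under 44%, in line with the "around 43%" figure, which leaves roughly 25 minutes of the build that is not test time.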

To seed the discussion, I will point to a few things that come to mind that
fit into three broad categories:

*Tests are Slow*


   - *Tactical*: We have around 13 tests that take more than 30 seconds and
   make up 14 minutes of the build.  Speeding up those tests may be a
   worthwhile tactical approach.
   - We are spinning up the same services (e.g. kafka, storm) for multiple
   tests.  Instead, we could use the docker infrastructure to spin them up
   once and then use them throughout the tests.


*Tests aren't parallel*

Currently we cannot run the build in parallel due to the integration test
infrastructure spinning up its own services that bind to the same ports.
If we correct this, we can run the builds in parallel with mvn -T

   - Correct this by decoupling the infrastructure from the tests and
   refactoring the tests to run in parallel.
   - Make the integration testing infrastructure bind intelligently to
   whatever port is available.
   - Move the integration tests to their own project.  This will let us run
   the build in parallel since an individual project's tests will be run
   serially.

*Packaging is Painful*

We have a sensitive environment in terms of dependencies.  As such, we are
careful to shade and relocate dependencies that we want to isolate from our
transitive dependencies.  The consequence of this is that we spend a lot
of time in the build shading and relocating maven module output.

   - Do the hard work to walk our transitive dependencies and ensure that
   we are including only one copy of every library by using exclusions
   effectively.  This will not only bring down build times, it will make sure
   we know what we're including.
   - Try to devise a strategy where we only shade once at the end.  This
   could look like some combination of
      - standardizing on the lowest common denominator of a troublesome
      library
         - We shade in dependencies so they can use different versions of
         libraries (e.g. metron-common with a modern version of guava) than the
         final jars.
      - exclusions
      - externalizing infrastructure out to not necessitate spinning up
      hadoop components in-process for integration tests (i.e. hbase server
      conflicts with storm in a few dependencies)

*Final Thoughts*

If I had three to pick, I'd pick

   - moving off of the in-memory component infrastructure to docker images
   - fixing the maven poms to exclude correctly
   - ensuring the resulting tests are parallelizable

I will point out that fixing the maven poms to exclude correctly (i.e. we
choose the version of every jar that we depend on transitively) ticks
multiple boxes, not just making things faster.

What are your thoughts?  What did I miss?  We need a plan and we need to
execute on it soon, otherwise Travis is going to keep smacking us hard.  It
may be worthwhile constructing a tactical plan and then a more strategic
plan that we can work toward.  I was heartened at how much some of these
suggestions dovetail with the discussion around the future of the docker
infrastructure.

Best,

Casey