Re: Proposal for Mesos Build Improvements

2017-02-16 Thread Alexander Rojas
Actually, this is a policy I have never been a big fan of. In my experience, 
forward declaring as much as possible in the headers and including only in 
compilation units tends to yield decent improvements in compilation time, 
particularly for files like `mesos.cpp` or `slave.cpp`, which indirectly end up 
including almost every header in the project.
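
As a minimal single-file sketch of the forward-declaration approach (the names `Widget` and `describe` are hypothetical and not taken from the Mesos sources): the "header" part only names the type, and the full definition is needed only where the type is actually used.

```cpp
#include <cassert>
#include <string>
#include <utility>

// --- What would live in a header (say, a hypothetical reporter.hpp) ---
// A forward declaration is enough here: the header only refers to the type
// by reference, so it does not need to include the header that defines it.
class Widget;

std::string describe(const Widget& widget);

// --- What would live in the compilation unit (reporter.cpp) ---
// Only the .cpp file needs the full definition. It is written inline here
// so that the sketch stays in one file; normally it would come from an
// #include of the defining header.
class Widget
{
public:
  explicit Widget(std::string id) : id_(std::move(id)) {}

  const std::string& id() const { return id_; }

private:
  std::string id_;
};

std::string describe(const Widget& widget)
{
  assert(!widget.id().empty());
  return "widget(" + widget.id() + ")";
}
```

Files that include only the header part never have to parse the definition of `Widget`, which is where the compile-time saving would come from.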

Alexander Rojas
alexan...@mesosphere.io




> On 15 Feb 2017, at 20:12, Neil Conway  wrote:
> 
> On Tue, Feb 14, 2017 at 11:28 AM, Jeff Coffler
>  wrote:
>> For efficiency purposes, if a header file is included by 50% or more of the 
>> source files, it should be included in the precompiled header. If a header 
>> is included in fewer than 50% of the source files, then it can be separately 
>> included (and thus would not benefit from precompiled headers). Note that 
>> this is a guideline; even if a header is used by less than 50% of source 
>> files, if it's very large, we still may decide to throw it in the 
>> precompiled header.
> 
> It seems like this would have the effect of creating many false
> dependencies: if file X doesn't currently include header Y but Y is
> included in the precompiled header, the symbols in Y will now be
> visible when X is compiled. It would also mean that X would need to be
> recompiled when Y changes.
> 
> Related: the current policy is that headers and implementation files
> should try to include all of their dependencies, without relying on
> transitive includes. For example, if foo.cpp includes bar.hpp, which
> includes , but foo.cpp also uses , both foo.cpp and
> bar.hpp should "#include ". Adopting precompiled headers would
> mean making an exception to this policy, right?
> 
> I wonder if we should instead use headers like:
> 
> <- mesos_common.h ->
> #include 
> #include 
> #include 
> 
> <- xyz.cpp, which needs headers "b" and "d" ->
> #include "mesos_common.h"
> 
> #include 
> #include 
> 
> That way, the fact that "xyz.cpp" logically depends on  (but not
>  or ) is not obscured (in other words, Mesos should continue to
> compile if 'mesos_common.h' is replaced with an empty file). Does
> anyone know whether the header guard in  _should_ make the repeated
> inclusion of  relatively cheap?
> 
> Neil



Re: Proposal for Mesos Build Improvements

2017-02-15 Thread Neil Conway
On Wed, Feb 15, 2017 at 1:59 PM, Jeff Coffler
 wrote:
> 3. Maintaining the correct includes is nice, but not at the cost of compiler 
> speed.

Personally, I would invert these statements -- but until we know the
cost of the redundant includes, probably not worth debating further.

> 4. I totally disagree about auto-generating the PCH. We should go through the 
> sources and pick what makes sense. Auto-generating implies that we 
> auto-generate all the time (on every build), and I'd rather not scan the 
> sources during a build (with an associated speed hit) just to try and speed 
> up the build.

The problem is that "what makes sense" will change over time.
Auto-generating the PCH certainly doesn't mean it needs to be
generated as part of the build process: a script (or docker container)
to generate "mesos_common.hpp" on-demand would be fine with me, as
long as it is a mechanical process.
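
A rough sketch of what such a mechanical process could look like, assuming the selection rule is simply "included by at least some fraction of the source files" (the 50% guideline from the proposal). The function names are hypothetical, the input is the text of each source file rather than paths on disk, and the "very large header" exception is not modelled.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Returns the header named on an `#include <...>` or `#include "..."` line,
// or an empty string if the line does not start with `#include`.
// (Simplification: forms such as `# include` are not recognized.)
std::string includedHeader(const std::string& line)
{
  const std::string directive = "#include";
  const std::size_t pos = line.find_first_not_of(" \t");
  if (pos == std::string::npos ||
      line.compare(pos, directive.size(), directive) != 0) {
    return "";
  }

  const std::size_t open = line.find_first_of("<\"", pos + directive.size());
  if (open == std::string::npos) {
    return "";
  }

  const char closer = (line[open] == '<') ? '>' : '"';
  const std::size_t close = line.find(closer, open + 1);
  if (close == std::string::npos) {
    return "";
  }

  return line.substr(open + 1, close - open - 1);
}

// Given the text of every source file, returns (in sorted order) the headers
// that are directly included by at least `threshold` (0.0 to 1.0) of them.
std::vector<std::string> commonHeaders(
    const std::vector<std::string>& sources,
    double threshold)
{
  assert(threshold >= 0.0 && threshold <= 1.0);

  std::map<std::string, std::size_t> counts;
  for (const std::string& source : sources) {
    std::set<std::string> seen; // Count each header once per file.
    std::istringstream in(source);
    std::string line;
    while (std::getline(in, line)) {
      const std::string header = includedHeader(line);
      if (!header.empty()) {
        seen.insert(header);
      }
    }
    for (const std::string& header : seen) {
      ++counts[header];
    }
  }

  std::vector<std::string> result;
  for (const auto& entry : counts) {
    if (entry.second >= threshold * sources.size()) {
      result.push_back(entry.first);
    }
  }
  return result;
}
```

The returned list could then be written out as the `#include` lines of "mesos_common.hpp". Only direct includes are counted here; whether transitive includes should count as well is an open design choice.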

Neil


RE: Proposal for Mesos Build Improvements

2017-02-15 Thread Jeff Coffler
I'm planning on prototyping this just to generate numbers. I don't think I need 
permission to do that! But, of course, to incorporate any changes into the code 
base, we need consensus.

I agree that stout optimizations are outside of the scope of this discussion. 
Any stout optimizations are orthogonal to PCH, and thus they need not be linked 
together. Note that stout optimizations may be less "pressing" with PCH, but 
still it's separate. The fact that PCH may help stout just indicates that PCH 
is a good thing, particularly on platforms like Windows (where we get to 
include windows.h, a massive file).

Also, I wanted to clarify a message from Benjamin. I did NOT mean to imply that 
PCH takes 20 seconds to generate. I was simply saying that PCH reads the 
headers ONCE and generates the PCH. As such, I don't believe that "bloat" is an 
issue here. In actuality, generating the PCH takes about as long as reading the 
headers. But you read them once and generate the PCH; you don't read them once 
for each source file. That's the speed-up for PCH; a ton of header processing is done 
once. When I used PCH in the past, it took about 4 seconds to read all my 
headers. That 4 seconds was then subtracted from all the source compilations. 
That is, 4 seconds to generate, then all the compiles were 4 seconds faster.
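
To make that arithmetic concrete, here is a small sketch of the model described above: header processing is paid once for the PCH instead of once per source file. It assumes a purely serial build and ignores any cost of loading the PCH; the function name and the sample numbers in the usage note are illustrative assumptions, not measurements.

```cpp
#include <cassert>

// Estimated serial build times, in milliseconds.
struct BuildEstimate
{
  long withoutPchMs;
  long withPchMs;
};

// numSources: number of source files compiled.
// compileMs:  time to compile one source file without a PCH.
// headerMs:   portion of compileMs spent processing the common headers,
//             assumed to also be the time needed to generate the PCH.
BuildEstimate estimateSerialBuild(long numSources, long compileMs, long headerMs)
{
  assert(numSources >= 0 && headerMs >= 0 && headerMs <= compileMs);

  // Without a PCH, every source file pays for the headers again.
  const long withoutPch = numSources * compileMs;

  // With a PCH, the headers are processed once up front and each compile
  // is shorter by that amount.
  const long withPch = headerMs + numSources * (compileMs - headerMs);

  return BuildEstimate{withoutPch, withPch};
}
```

For example, with a hypothetical 100 source files at 5000 ms each, of which 4000 ms is header processing, `estimateSerialBuild(100, 5000, 4000)` gives 500000 ms without PCH and 104000 ms with it.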

Regarding Andy's points:

1. I agree, we need a benchmarked prototype. Note that I will only benchmark a 
particular directory, I don't intend to benchmark EVERYTHING. One directory 
should give us enough of an idea to see how it works.

2. Maintaining ccache compatibility is a good thing. BUT I don't think it's a 
hard requirement. If PCH on Linux gives us reasonable performance without 
ccache, then I don't see a lot of value in maintaining ccache compatibility. 
Now, that said, I will try to do so (why not?). But I'm not sure if these 
workarounds for ccache will work on Windows; we'll see during the prototyping 
stage.

3. Maintaining the correct includes is nice, but not at the cost of compiler 
speed. I'm not sure if Windows has "multiple include optimizations". I will 
include this in my prototyping. If it does, then I agree it would be very nice 
to maintain this. BUT in practice, it will be hard over time. After all, if you 
include mesos_common.h (either literally or by build system), you may not 
realize that you're missing an include without that. And I don't think it's 
"worth it" to build twice to catch this, once with PCH and once without. That's 
ugly, in my honest opinion.

4. I totally disagree about auto-generating the PCH. We should go through the 
sources and pick what makes sense. Auto-generating implies that we 
auto-generate all the time (on every build), and I'd rather not scan the 
sources during a build (with an associated speed hit) just to try and speed up 
the build.

Let me get some hard numbers under my belt. From that, we can make intelligent 
decisions about where to go.

/Jeff


-Original Message-
From: Andy Schwartzmeyer [mailto:andsc...@microsoft.com.INVALID] 
Sent: Wednesday, February 15, 2017 1:31 PM
To: dev <dev@mesos.apache.org>
Subject: Re: Proposal for Mesos Build Improvements

Hi,

I worked with Jeff on the initial proposal for pre-compiled headers and library 
refactor. I think this thread should focus on the former, potentially 
implementing pre-compiled headers, and have a separate conversation on Jeff's 
original second suggestion of using more libraries inside Mesos.

With that in mind, I think we have some requirements for the pre-compiled 
header implementation.

* First and foremost, we need a benchmarked prototype that proves pre-compiled 
headers provide a considerable speed-up. As the most complex headers are those 
of the header-only Stout library, we should also benchmark improvements from 
making Stout non-header-only, and then prioritize; but this will likely be a 
separate discussion.

* We must maintain ccache compatibility, as the majority of Mesos developers 
already use ccache. It appears the most straightforward way to do this is to 
_not_ `#include common.h`, but to `-include` it; this fits well with the next 
requirement.

* We must maintain correct includes; i.e. Mesos should be compilable without 
the pre-compiled header. Because of multiple-include optimization, this should 
not affect the gains from the use of pre-compiled headers. Again, this fits 
well with the next requirement.

* We should automatically generate the pre-compiled header, as this eliminates 
manual maintenance. Combined with the above two points, this approach should 
actually negate the original code-churn problem. By generating a common header 
to pre-compile, and using `-include`, we will not have to modify existing 
source files. This would both give us ccache compatibility and ensure that the 
correct includes would be maintained (and thus can be refactored independently 
of this work).

Did I miss any p

Re: Proposal for Mesos Build Improvements

2017-02-15 Thread Benjamin Bannier
Hi,

> I wonder if we should instead use headers like:
> 
> <- mesos_common.h ->
> #include 
> #include 
> #include 
> 
> <- xyz.cpp, which needs headers "b" and "d" ->
> #include "mesos_common.h"
> 
> #include 
> #include 
> 
> That way, the fact that "xyz.cpp" logically depends on  (but not
>  or ) is not obscured (in other words, Mesos should continue to
> compile if 'mesos_common.h' is replaced with an empty file).

That’s an interesting angle for a number of reasons. It would allow local 
reasoning about correct includes, and it also appears to help maintain support 
for ccache’d builds,

  https://ccache.samba.org/manual.html#_precompiled_headers

For that one could include project headers such as `mesos_common.h` via a 
command line switch to the compiler invocation, without the need to make any 
changes to source files (possibly an interesting way to create some 
benchmarking POC of this proposal).

Not changing source files for this would be valuable as it would keep build 
setup idiosyncrasies out of the source. If we wouldn’t change files we’d keep 
the possibility to make PCH use opt-in. Right now a ccache build of the Mesos 
source files and tests with warm ccache takes less than 50s on my 8 core 
machine (a substantial fraction of this time is spent in serializing 
(non-parallelizable) linking steps, and I’d bet there is also some ~10s 
overhead from Make stat’ing files and changing directories in there).

Generating precompiled headers would throw in an additional serializing step, and 
even if it really only would take 20s to generate a header as guesstimated by 
Jeff, we would already be approaching a point of diminishing returns on 
platforms with ccache, even if we compiled every source file in no time.

> Does anyone know whether the header guard in  _should_ make the repeated
> inclusion of  relatively cheap?

Not sure how much information gcc or clang would need to serialize from the 
PCH, but there is of course some form of multi-include optimization in both gcc 
and clang, see e.g.,

  https://gcc.gnu.org/onlinedocs/cppinternals/Guard-Macros.html
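
For reference, a single-file sketch of the guard pattern in question. The guard macro and function names are hypothetical, and the two guarded regions stand in for two inclusions of the same header. As far as I understand the GCC documentation linked above, when an entire header is wrapped in such a guard the preprocessor can remember the macro and skip re-reading the file on later inclusions.

```cpp
#include <cassert>

// First "inclusion" of a hypothetical header b.hpp.
#ifndef EXAMPLE_B_HPP
#define EXAMPLE_B_HPP
inline int answerFromB() { return 42; }
#endif // EXAMPLE_B_HPP

// Second "inclusion" of the same header: EXAMPLE_B_HPP is already defined,
// so the preprocessor discards everything up to the matching #endif and the
// compiler never sees this (otherwise conflicting) definition.
#ifndef EXAMPLE_B_HPP
#define EXAMPLE_B_HPP
inline int answerFromB() { return 0; }
#endif // EXAMPLE_B_HPP

// Uses the definition from the first inclusion.
inline int useB()
{
  const int value = answerFromB();
  assert(value == 42);
  return value;
}
```

How cheap the skipped inclusion is in practice, especially in combination with a PCH, is exactly what the benchmarks would need to show.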


Cheers,

Benjamin

Re: Proposal for Mesos Build Improvements

2017-02-15 Thread Vinod Kone
Thanks Jeff for the proposal! Faster builds for Mesos have been a long
awaited feature, so great to see some real traction here.

Regarding benchmarks, would it be possible to have benchmarks (for clean
build and incremental build) with 1) the PCH change only, 2) stout
non-header-only, and 3) 1+2? Not sure how easy it is to prototype these to
get benchmarks, but I think having those numbers would clearly show us
which one we should prioritize.

Thanks,


On Wed, Feb 15, 2017 at 11:34 AM, Neil Conway <neil.con...@gmail.com> wrote:

> Hi Jeff,
>
> Gotcha -- I just wanted to understand the tradeoffs here.
>
> I'd definitely prefer an approach in which we include "" in both
> "mesos_common.hpp" and each individual file that logically depends on
> "". This makes clear the dependencies between modules and also
> makes it easy to disable building with PCH (see also the
> recommendations in [1]). If the only reason to avoid this is the cost of
> the repeated include, it would be important to see benchmarks that
> justify this.
>
> BTW, I think it's important that we script/automate this as far as
> possible, e.g., using a script to decide which headers are included
> often enough to justify being included in the PCH. This should avoid
> the PCH getting out of date, as well as innumerable arguments down the
> road about whether header X warrants being added to the PCH :)
>
> Overall, sounds cool to me! Faster builds would be fantastic.
>
> Neil
>
> [1] http://gamesfromwithin.com/the-care-and-feeding-of-pre-compiled-headers
>
> On Wed, Feb 15, 2017 at 11:26 AM, Jeff Coffler
> <jeff.coff...@microsoft.com.invalid> wrote:
> > Hi Neil,
> >
> > What you're saying is essentially correct. If mesos_common.h includes a
> bunch of, well, "common" stuff, and everybody includes mesos_common.h, then
> those files will, by definition, have at least some number of items that
> they didn't need.
> >
> > Since PCH works on both Windows and Linux, I don't think this is a "bad
> thing". It's a trade-off. Is a (what I believe to be) very significant
> speed-up in compile speed "worth it"? (Obviously, since I submitted the
> proposal, I think so. But this is a very valid point).
> >
> >  Yes, header guards will help, but header guards are not free. I would
> rather not include a really large set of headers (say, windows.h, or stout)
> multiple times, expecting header guards to make them fast. I'd rather just
> include them once, in mesos_common.h. And this would also yield the
> greatest performance enhancement as well.
> >
> > I'm working on getting some hard numbers for a subset of Mesos. Once we
> have some hard comparisons with compiler performance (with and without
> PCH), we can address this much more practically.
> >
> > /Jeff
> >
> >
> > -Original Message-
> > From: Neil Conway [mailto:neil.con...@gmail.com]
> > Sent: Wednesday, February 15, 2017 11:13 AM
> > To: dev <dev@mesos.apache.org>
> > Subject: Re: Proposal for Mesos Build Improvements
> >
> > On Tue, Feb 14, 2017 at 11:28 AM, Jeff Coffler <
> jeff.coff...@microsoft.com.invalid> wrote:
> >> For efficiency purposes, if a header file is included by 50% or more of
> the source files, it should be included in the precompiled header. If a
> header is included in fewer than 50% of the source files, then it can be
> separately included (and thus would not benefit from precompiled headers).
> Note that this is a guideline; even if a header is used by less than 50% of
> source files, if it's very large, we still may decide to throw it in the
> precompiled header.
> >
> > It seems like this would have the effect of creating many false
> > dependencies: if file X doesn't currently include header Y but Y is
> included in the precompiled header, the symbols in Y will now be visible
> when X is compiled. It would also mean that X would need to be recompiled
> when Y changes.
> >
> > Related: the current policy is that headers and implementation files
> should try to include all of their dependencies, without relying on
> transitive includes. For example, if foo.cpp includes bar.hpp, which
> includes , but foo.cpp also uses , both foo.cpp and bar.hpp
> should "#include ". Adopting precompiled headers would mean making
> an exception to this policy, right?
> >
> > I wonder if we should instead use headers like:
> >
> > <- mesos_common.h ->
> > #include 
> > #include 
> > #include 
> >
> > <- xyz.cpp, which needs headers "b" and "d" ->
> > #include "mesos_common.h"
> >
> > #include 
> > #include 
> >
> > That way, the fact that "xyz.cpp" logically depends on  (but not 
> or ) is not obscured (in other words, Mesos should continue to compile
> if 'mesos_common.h' is replaced with an empty file). Does anyone know
> whether the header guard in  _should_ make the repeated inclusion of 
> relatively cheap?
> >
> > Neil
>


RE: Proposal for Mesos Build Improvements

2017-02-15 Thread Jeff Coffler
Hi Neil,

What you're saying is essentially correct. If mesos_common.h includes a bunch 
of, well, "common" stuff, and everybody includes mesos_common.h, then those 
files will, by definition, have at least some number of items that they didn't 
need.

Since PCH works on both Windows and Linux, I don't think this is a "bad thing". 
It's a trade-off. Is a (what I believe to be) very significant speed-up in 
compile speed "worth it"? (Obviously, since I submitted the proposal, I think 
so. But this is a very valid point).

 Yes, header guards will help, but header guards are not free. I would rather 
not include a really large set of headers (say, windows.h, or stout) multiple 
times, expecting header guards to make them fast. I'd rather just include them 
once, in mesos_common.h. And this would also yield the greatest performance 
enhancement as well.

I'm working on getting some hard numbers for a subset of Mesos. Once we have 
some hard comparisons with compiler performance (with and without PCH), we can 
address this much more practically.

/Jeff


-Original Message-
From: Neil Conway [mailto:neil.con...@gmail.com] 
Sent: Wednesday, February 15, 2017 11:13 AM
To: dev <dev@mesos.apache.org>
Subject: Re: Proposal for Mesos Build Improvements

On Tue, Feb 14, 2017 at 11:28 AM, Jeff Coffler 
<jeff.coff...@microsoft.com.invalid> wrote:
> For efficiency purposes, if a header file is included by 50% or more of the 
> source files, it should be included in the precompiled header. If a header is 
> included in fewer than 50% of the source files, then it can be separately 
> included (and thus would not benefit from precompiled headers). Note that 
> this is a guideline; even if a header is used by less than 50% of source 
> files, if it's very large, we still may decide to throw it in the precompiled 
> header.

It seems like this would have the effect of creating many false
dependencies: if file X doesn't currently include header Y but Y is included in 
the precompiled header, the symbols in Y will now be visible when X is 
compiled. It would also mean that X would need to be recompiled when Y changes.

Related: the current policy is that headers and implementation files should try 
to include all of their dependencies, without relying on transitive includes. 
For example, if foo.cpp includes bar.hpp, which includes , but foo.cpp 
also uses , both foo.cpp and bar.hpp should "#include ". 
Adopting precompiled headers would mean making an exception to this policy, 
right?

I wonder if we should instead use headers like:

<- mesos_common.h ->
#include 
#include 
#include 

<- xyz.cpp, which needs headers "b" and "d" ->
#include "mesos_common.h"

#include 
#include 

That way, the fact that "xyz.cpp" logically depends on  (but not  or ) 
is not obscured (in other words, Mesos should continue to compile if 
'mesos_common.h' is replaced with an empty file). Does anyone know whether the 
header guard in  _should_ make the repeated inclusion of  relatively 
cheap?

Neil


Re: Proposal for Mesos Build Improvements

2017-02-15 Thread Alex Clemmer

Yes, that is right, PCHs would probably introduce some additional
dependencies for some object files, and if those PCHs become bloated
over time, then you can expect this to be expressed as diminishing time
savings.

This does imply that maintaining PCHs will require at least some work.


__
Transcribed by my voice-enabled refrigerator, please pardon chilly messages.

On Wed, 15 Feb 2017, Neil Conway wrote:


On Tue, Feb 14, 2017 at 11:28 AM, Jeff Coffler
 wrote:

For efficiency purposes, if a header file is included by 50% or more of the 
source files, it should be included in the precompiled header. If a header is 
included in fewer than 50% of the source files, then it can be separately 
included (and thus would not benefit from precompiled headers). Note that this 
is a guideline; even if a header is used by less than 50% of source files, if 
it's very large, we still may decide to throw it in the precompiled header.


It seems like this would have the effect of creating many false
dependencies: if file X doesn't currently include header Y but Y is
included in the precompiled header, the symbols in Y will now be
visible when X is compiled. It would also mean that X would need to be
recompiled when Y changes.

Related: the current policy is that headers and implementation files
should try to include all of their dependencies, without relying on
transitive includes. For example, if foo.cpp includes bar.hpp, which
includes , but foo.cpp also uses , both foo.cpp and
bar.hpp should "#include ". Adopting precompiled headers would
mean making an exception to this policy, right?

I wonder if we should instead use headers like:

<- mesos_common.h ->
#include 
#include 
#include 

<- xyz.cpp, which needs headers "b" and "d" ->
#include "mesos_common.h"

#include 
#include 

That way, the fact that "xyz.cpp" logically depends on  (but not
 or ) is not obscured (in other words, Mesos should continue to
compile if 'mesos_common.h' is replaced with an empty file). Does
anyone know whether the header guard in  _should_ make the repeated
inclusion of  relatively cheap?

Neil



Re: Proposal for Mesos Build Improvements

2017-02-15 Thread Neil Conway
On Tue, Feb 14, 2017 at 11:28 AM, Jeff Coffler
 wrote:
> For efficiency purposes, if a header file is included by 50% or more of the 
> source files, it should be included in the precompiled header. If a header is 
> included in fewer than 50% of the source files, then it can be separately 
> included (and thus would not benefit from precompiled headers). Note that 
> this is a guideline; even if a header is used by less than 50% of source 
> files, if it's very large, we still may decide to throw it in the precompiled 
> header.

It seems like this would have the effect of creating many false
dependencies: if file X doesn't currently include header Y but Y is
included in the precompiled header, the symbols in Y will now be
visible when X is compiled. It would also mean that X would need to be
recompiled when Y changes.

Related: the current policy is that headers and implementation files
should try to include all of their dependencies, without relying on
transitive includes. For example, if foo.cpp includes bar.hpp, which
includes , but foo.cpp also uses , both foo.cpp and
bar.hpp should "#include ". Adopting precompiled headers would
mean making an exception to this policy, right?

I wonder if we should instead use headers like:

<- mesos_common.h ->
#include 
#include 
#include 

<- xyz.cpp, which needs headers "b" and "d" ->
#include "mesos_common.h"

#include 
#include 

That way, the fact that "xyz.cpp" logically depends on  (but not
 or ) is not obscured (in other words, Mesos should continue to
compile if 'mesos_common.h' is replaced with an empty file). Does
anyone know whether the header guard in  _should_ make the repeated
inclusion of  relatively cheap?

Neil


Re: Proposal for Mesos Build Improvements

2017-02-15 Thread Joris Van Remoortere
>
> However, the non-header-only work won't do anything in a "clean build"
> scenario.

I don't think this is true.

If you look at how many independent .o files we build that scan those
headers each time it should be clear that reducing the complexity of the
header file reduces the compile time.
A good example of heavy .o files are the mesos tests that scan close to all
of stout / libprocess for each test file.

—
*Joris Van Remoortere*
Mesosphere

On Tue, Feb 14, 2017 at 4:49 PM, Jeff Coffler <
jeff.coff...@microsoft.com.invalid> wrote:

> Hi Neil,
>
> This was discussed in the CXX Mesos Slack channel yesterday.
>
> Basically, the two are separate and independent. Regardless of stout work,
> I anticipate that PCH work will dramatically speed up the Windows build
> (and Linux too, although I have less experience in that area). I'm going to
> run some benchmarks on a subset of the code to give a good "before/after"
> idea of the speedup and report to the list.
>
> If stout non-header-only library work is done, this will do a fair amount
> to speed up incremental builds (i.e. you just update implementation of a
> stout method, and only the related C file is rebuilt). However, the
> non-header-only work won't do anything in a "clean build" scenario. And, of
> course, if you change the interface of a stout method, all bets are off and
> you get to rebuild virtually the world.
>
> PCH, on the other hand, will speed up all compiles across the board (using
> stout and not using stout). Now, that said, if a stout change is made
> (assuming still header-only), you will still rebuild everything, but the
> builds will go much faster. That *may* be fast enough to take the sting out
> of significant stout changes, but changing stout will still help the
> incremental build cases regardless.
>
> Hope that clarifies,
>
> /Jeff
>
> -Original Message-
> From: Neil Conway [mailto:neil.con...@gmail.com]
> Sent: Tuesday, February 14, 2017 11:45 AM
> To: dev <dev@mesos.apache.org>
> Subject: Re: Proposal for Mesos Build Improvements
>
> I'm curious to hear more about how using PCH compares with making stout a
> non-header-only library. Is PCH easier to implement, or is it expected to
> offer a more dramatic improvement in compile times? Would making both
> changes eventually make sense?
>
> Neil
>
> On Tue, Feb 14, 2017 at 11:28 AM, Jeff Coffler <jeff.coff...@microsoft.com
> .invalid> wrote:
> > Proposal For Build Improvements
> >
> > The Mesos build process is in dire need of some build infrastructure
> improvements. These improvements will improve speed and ease of work in
> particular components, and dramatically improve overall build time,
> especially in the Windows environment, but likely in the Linux environment
> as well.
> >
> >
> > Background:
> >
> > It is currently recommended to use the ccache project with the Mesos
> build process. This makes the Linux build process more tolerable in terms
> of speed, but unfortunately such software is not available on Windows.
> Ultimately, though, the caching software is covering up two fundamental
> flaws in the overall build process:
> >
> > 1. Lack of use of libraries
> > 2. Lack of precompiled headers
> >
> > By not allowing use of libraries, the overall build process is often
> much longer, particularly when a lot of work is being done in a particular
> component. If work is being done in a particular component, only that
> library need be rebuilt (and then the overall image relinked). Currently,
> since there is no such modularization, all source files must be considered
> at build time. Interestingly enough, there is such modularization in the
> source code layout; that modularization just isn't utilized at the compiler
> level.
> >
> > Precompiled headers exist on both Windows and Linux. For Linux, you can
> refer to https://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html.
> Straight from the GNU CC documentation: "The time the compiler takes to
> process these header files over and over again can account for nearly all
> of the time required to build the project."
> >
> > In my prior use of precompiled headers, each C or C++ file generally
> took about 4 seconds to compile. After switching to precompiled headers,
> the precompiled header creation took about 4 seconds, but each C/C++ file
> now took about 200 mill

RE: Proposal for Mesos Build Improvements

2017-02-14 Thread Jeff Coffler
Hi Neil,

This was discussed in the CXX Mesos Slack channel yesterday.

Basically, the two are separate and independent. Regardless of stout work, I 
anticipate that PCH work will dramatically speed up the Windows build (and 
Linux too, although I have less experience in that area). I'm going to run some 
benchmarks on a subset of the code to give a good "before/after" idea of the 
speedup and report to the list.

If stout non-header-only library work is done, this will do a fair amount to 
speed up incremental builds (i.e. you just update implementation of a stout 
method, and only the related C file is rebuilt). However, the non-header-only 
work won't do anything in a "clean build" scenario. And, of course, if you 
change the interface of a stout method, all bets are off and you get to rebuild 
virtually the world.

PCH, on the other hand, will speed up all compiles across the board (using 
stout and not using stout). Now, that said, if a stout change is made (assuming 
still header-only), you will still rebuild everything, but the builds will go 
much faster. That *may* be fast enough to take the sting out of significant 
stout changes, but changing stout will still help the incremental build cases 
regardless.

Hope that clarifies,

/Jeff

-Original Message-
From: Neil Conway [mailto:neil.con...@gmail.com] 
Sent: Tuesday, February 14, 2017 11:45 AM
To: dev <dev@mesos.apache.org>
Subject: Re: Proposal for Mesos Build Improvements

I'm curious to hear more about how using PCH compares with making stout a 
non-header-only library. Is PCH easier to implement, or is it expected to offer 
a more dramatic improvement in compile times? Would making both changes 
eventually make sense?

Neil

On Tue, Feb 14, 2017 at 11:28 AM, Jeff Coffler 
<jeff.coff...@microsoft.com.invalid> wrote:
> Proposal For Build Improvements
>
> The Mesos build process is in dire need of some build infrastructure 
> improvements. These improvements will improve speed and ease of work in 
> particular components, and dramatically improve overall build time, 
> especially in the Windows environment, but likely in the Linux environment as 
> well.
>
>
> Background:
>
> It is currently recommended to use the ccache project with the Mesos build 
> process. This makes the Linux build process more tolerable in terms of speed, 
> but unfortunately such software is not available on Windows. Ultimately, 
> though, the caching software is covering up two fundamental flaws in the 
> overall build process:
>
> 1. Lack of use of libraries
> 2. Lack of precompiled headers
>
> By not allowing use of libraries, the overall build process is often much 
> longer, particularly when a lot of work is being done in a particular 
> component. If work is being done in a particular component, only that library 
> need be rebuilt (and then the overall image relinked). Currently, since there 
> is no such modularization, all source files must be considered at build time. 
> Interestingly enough, there is such modularization in the source code layout; 
> that modularization just isn't utilized at the compiler level.
>
> Precompiled headers exist on both Windows and Linux. For Linux, you can refer 
> to https://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html. Straight from 
> the GNU CC documentation: "The time the compiler takes to process these 
> header files over and over again can account for nearly all of the time 
> required to build the project."
>
> In my prior use of precompiled headers, each C or C++ file generally took 
> about 4 seconds to compile. After switching to precompiled headers, the 
> precompiled header creation took about 4 seconds, but each C/C++ file now 
> took about 200 milliseconds to compile. The overall build time was thus 
> dramatically reduced.
>
>
> Scope of Changes:
>
> These changes are only being proposed for the CMake system. Going forward, 
> the CMake system is the easiest way to maintain some level of portability 
> between the Linux and Windows platforms.
>
>
> Details for Modularization:
>
> For the modularization, the intent is to simply make each source directory of 
> files, if functionally separate, to be compiled into an archive (.a) file. 
> These archive files will then be linked together to form the actual 
> executables. These changes will primarily be in the CMake system, and should 
> have limited effect on any actual source code.
>
> At a later date, if it makes sense, we can look at building shared library 
> (.so) files. 

Re: Proposal for Mesos Build Improvements

2017-02-14 Thread Neil Conway
I'm curious to hear more about how using PCH compares with making
stout a non-header-only library. Is PCH easier to implement, or is it
expected to offer a more dramatic improvement in compile times? Would
making both changes eventually make sense?

Neil

On Tue, Feb 14, 2017 at 11:28 AM, Jeff Coffler
<jeff.coff...@microsoft.com.invalid> wrote:
> Proposal For Build Improvements
>
> The Mesos build process is in dire need of some build infrastructure 
> improvements. These improvements will improve speed and ease of work in 
> particular components, and dramatically improve overall build time, 
> especially in the Windows environment, but likely in the Linux environment as 
> well.
>
>
> Background:
>
> It is currently recommended to use the ccache project with the Mesos build 
> process. This makes the Linux build process more tolerable in terms of speed, 
> but unfortunately such software is not available on Windows. Ultimately, 
> though, the caching software is covering up two fundamental flaws in the 
> overall build process:
>
> 1. Lack of use of libraries
> 2. Lack of precompiled headers
>
> By not allowing use of libraries, the overall build process is often much 
> longer, particularly when a lot of work is being done in a particular 
> component. If work is being done in a particular component, only that library 
> need be rebuilt (and then the overall image relinked). Currently, since there 
> is no such modularization, all source files must be considered at build time. 
> Interestingly enough, there is such modularization in the source code layout; 
> that modularization just isn't utilized at the compiler level.
>
> Precompiled headers exist on both Windows and Linux. For Linux, you can refer 
> to https://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html. Straight from 
> the GNU CC documentation: "The time the compiler takes to process these 
> header files over and over again can account for nearly all of the time 
> required to build the project."
>
> In my prior use of precompiled headers, each C or C++ file generally took 
> about 4 seconds to compile. After switching to precompiled headers, the 
> precompiled header creation took about 4 seconds, but each C/C++ file now 
> took about 200 milliseconds to compile. The overall build time was thus 
> dramatically reduced.
>
>
> Scope of Changes:
>
> These changes are only being proposed for the CMake system. Going forward, 
> the CMake system is the easiest way to maintain some level of portability 
> between the Linux and Windows platforms.
>
>
> Details for Modularization:
>
> For the modularization, the intent is simply to compile each source directory 
> of files, if functionally separate, into an archive (.a) file. 
> These archive files will then be linked together to form the actual 
> executables. These changes will primarily be in the CMake system, and should 
> have limited effect on any actual source code.
>
> At a later date, if it makes sense, we can look at building shared library 
> (.so) files. However, this only makes the most sense if the code is truly 
> shared between different executable files. If that's not the case, then it 
> likely makes sense just to stick with .a files. Regardless, generation of .so 
> files is out of scope for this change.
>
>
> Details for Precompiled Header Changes:
>
> Precompiled headers will make the use of stout (a very large header-only library) 
> essentially "free" from a compile-time overhead point of view. Basically, 
> precompiled headers will take a list of header files (including very long 
> header files, like "windows.h"), and generate the compiler memory structures 
> for their representation.
>
> During precompiled header generation, these memory structures are flushed to 
> disk. Then, when components are built, the memory structures are reloaded 
> from disk, which is dramatically faster than actually parsing the tens of 
> thousands of lines of header files and building the memory structures.
>
> For precompiled headers to be useful, a relatively "consistent" set of 
> headers must be included by all of the C/C++ files. So, for example, consider 
> the following C file:
>
> #if defined(windows)
> #include 
> #endif
>
> #include 
> #include 
> #include 
>
> < - Remainder of module - >
>
> To make a precompiled header for this module, all of the #include files would 
> be included in a new file, mesos_common.h. The C file would then be changed 
> as follows:
>
> #include "mesos_common.h"
>
> < - Remainder of module - >
>
> Structurally, the code is identical, and need not be built with precompiled 
>

Re: Proposal for Mesos Build Improvements

2017-02-14 Thread Alex Clemmer

Just to add a bit of context, the history of the build-time issue is
tracked in MESOS-1582 [1], most recently in this comment [2].

Speaking personally, I'm excited about _any_ progress in this area,
because (1) the Windows build times are completely unbearable, and (2)
because getting the build times down benefits the whole community.

When it was basically just me working on the Windows code paths, this
issue was tolerable, but now that we have multiple people working
full-time, it is really important to start fixing the issue.

[1] https://issues.apache.org/jira/browse/MESOS-1582
[2]
https://issues.apache.org/jira/browse/MESOS-1582?focusedCommentId=15828645&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15828645


__
Transcribed by my voice-enabled refrigerator, please pardon chilly messages.

On Tue, 14 Feb 2017, Jeff Coffler wrote:


Proposal For Build Improvements

The Mesos build process is in dire need of some build infrastructure 
improvements. These improvements will improve speed and ease of work in 
particular components, and dramatically improve overall build time, especially 
in the Windows environment, but likely in the Linux environment as well.


Background:

It is currently recommended to use the ccache project with the Mesos build 
process. This makes the Linux build process more tolerable in terms of speed, 
but unfortunately such software is not available on Windows. Ultimately, 
though, the caching software is covering up two fundamental flaws in the 
overall build process:

1. Lack of use of libraries
2. Lack of precompiled headers

By not allowing use of libraries, the overall build process is often much 
longer, particularly when a lot of work is being done in a particular 
component. If work is being done in a particular component, only that library 
need be rebuilt (and then the overall image relinked). Currently, since there 
is no such modularization, all source files must be considered at build time. 
Interestingly enough, there is such modularization in the source code layout; 
that modularization just isn't utilized at the compiler level.

Precompiled headers exist on both Windows and Linux. For Linux, you can refer to 
https://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html. Straight from the GNU CC 
documentation: "The time the compiler takes to process these header files over and 
over again can account for nearly all of the time required to build the project."

In my prior use of precompiled headers, each C or C++ file generally took about 
4 seconds to compile. After switching to precompiled headers, the precompiled 
header creation took about 4 seconds, but each C/C++ file now took about 200 
milliseconds to compile. The overall build time was thus dramatically reduced.


Scope of Changes:

These changes are only being proposed for the CMake system. Going forward, the 
CMake system is the easiest way to maintain some level of portability between 
the Linux and Windows platforms.


Details for Modularization:

For the modularization, the intent is simply to compile each source directory 
of files, if functionally separate, into an archive (.a) file. 
These archive files will then be linked together to form the actual 
executables. These changes will primarily be in the CMake system, and should 
have limited effect on any actual source code.

At a later date, if it makes sense, we can look at building shared library 
(.so) files. However, this only makes the most sense if the code is truly 
shared between different executable files. If that's not the case, then it 
likely makes sense just to stick with .a files. Regardless, generation of .so 
files is out of scope for this change.


Details for Precompiled Header Changes:

Precompiled headers will make the use of stout (a very large header-only library) essentially 
"free" from a compile-time overhead point of view. Basically, precompiled headers will 
take a list of header files (including very long header files, like "windows.h"), and 
generate the compiler memory structures for their representation.

During precompiled header generation, these memory structures are flushed to 
disk. Then, when components are built, the memory structures are reloaded from 
disk, which is dramatically faster than actually parsing the tens of thousands 
of lines of header files and building the memory structures.

For precompiled headers to be useful, a relatively "consistent" set of headers 
must be included by all of the C/C++ files. So, for example, consider the following C 
file:

#if defined(windows)
#include 
#endif

#include 
#include 
#include 

< - Remainder of module - >

To make a precompiled header for this module, all of the #include files would 
be included in a new file, mesos_common.h. The C file would then be changed as 
follows:

#include "mesos_common.h"

< - Remainder of module - >

Structurally, the code is identical,

Proposal for Mesos Build Improvements

2017-02-14 Thread Jeff Coffler
Proposal For Build Improvements

The Mesos build process is in dire need of some build infrastructure 
improvements. These improvements will improve speed and ease of work in 
particular components, and dramatically improve overall build time, especially 
in the Windows environment, but likely in the Linux environment as well.


Background:

It is currently recommended to use the ccache project with the Mesos build 
process. This makes the Linux build process more tolerable in terms of speed, 
but unfortunately such software is not available on Windows. Ultimately, 
though, the caching software is covering up two fundamental flaws in the 
overall build process:

1. Lack of use of libraries
2. Lack of precompiled headers

By not allowing use of libraries, the overall build process is often much 
longer, particularly when a lot of work is being done in a particular 
component. If work is being done in a particular component, only that library 
need be rebuilt (and then the overall image relinked). Currently, since there 
is no such modularization, all source files must be considered at build time. 
Interestingly enough, there is such modularization in the source code layout; 
that modularization just isn't utilized at the compiler level.

Precompiled headers exist on both Windows and Linux. For Linux, you can refer 
to https://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html. Straight from 
the GNU CC documentation: "The time the compiler takes to process these header 
files over and over again can account for nearly all of the time required to 
build the project."

In my prior use of precompiled headers, each C or C++ file generally took about 
4 seconds to compile. After switching to precompiled headers, the precompiled 
header creation took about 4 seconds, but each C/C++ file now took about 200 
milliseconds to compile. The overall build time was thus dramatically reduced.
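
To put rough numbers on the figures above, here is a back-of-the-envelope 
sketch in C++. The function names are hypothetical, the 100-file count used 
below is an illustrative assumption rather than a measurement from Mesos, and 
the model assumes a serial build where every file costs about the same.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: estimated serial build time in milliseconds when
// every source file re-parses all of its headers.
constexpr long long buildTimeWithoutPch(std::size_t files, long long perFileMs)
{
  return static_cast<long long>(files) * perFileMs;
}

// Hypothetical helper: estimated serial build time in milliseconds when the
// precompiled header is generated once (pchMs) and each source file then
// compiles in perFileMs.
constexpr long long buildTimeWithPch(
    std::size_t files, long long pchMs, long long perFileMs)
{
  return pchMs + static_cast<long long>(files) * perFileMs;
}

// Assumed example: 100 files at about 4 seconds each, versus a 4 second
// precompiled header plus about 200 milliseconds per file.
static_assert(buildTimeWithoutPch(100, 4000) == 400000, "400 seconds");
static_assert(buildTimeWithPch(100, 4000, 200) == 24000, "24 seconds");
```

Under those assumptions, a 100-file build drops from roughly 400 seconds to 
roughly 24 seconds. The one-time cost of creating the precompiled header is 
recovered after the first couple of source files.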


Scope of Changes:

These changes are only being proposed for the CMake system. Going forward, the 
CMake system is the easiest way to maintain some level of portability between 
the Linux and Windows platforms.


Details for Modularization:

For the modularization, the intent is simply to compile each source directory 
of files, if functionally separate, into an archive (.a) file. 
These archive files will then be linked together to form the actual 
executables. These changes will primarily be in the CMake system, and should 
have limited effect on any actual source code.

At a later date, if it makes sense, we can look at building shared library 
(.so) files. However, this only makes the most sense if the code is truly 
shared between different executable files. If that's not the case, then it 
likely makes sense just to stick with .a files. Regardless, generation of .so 
files is out of scope for this change.


Details for Precompiled Header Changes:

Precompiled headers will make the use of stout (a very large header-only library) 
essentially "free" from a compile-time overhead point of view. Basically, 
precompiled headers will take a list of header files (including very long 
header files, like "windows.h"), and generate the compiler memory structures 
for their representation.

During precompiled header generation, these memory structures are flushed to 
disk. Then, when components are built, the memory structures are reloaded from 
disk, which is dramatically faster than actually parsing the tens of thousands 
of lines of header files and building the memory structures.

For precompiled headers to be useful, a relatively "consistent" set of headers 
must be included by all of the C/C++ files. So, for example, consider the 
following C file:

#if defined(windows)
#include 
#endif

#include 
#include 
#include 

< - Remainder of module - >

To make a precompiled header for this module, all of the #include files would 
be included in a new file, mesos_common.h. The C file would then be changed as 
follows:

#include "mesos_common.h"

< - Remainder of module - >

Structurally, the code is identical, and need not be built with precompiled 
headers. However, use of precompiled headers will make file compilation 
dramatically faster.

Note that other include files can be included after the precompiled header if 
appropriate. For example, the following is valid:

#include "mesos_common.h"
#include 

< - Remainder of module - >

For efficiency purposes, if a header file is included by 50% or more of the 
source files, it should be included in the precompiled header. If a header is 
included in fewer than 50% of the source files, then it can be separately 
included (and thus would not benefit from precompiled headers). Note that this 
is a guideline; even if a header is used by less than 50% of source files, if 
it's very large, we still may decide to throw it in the precompiled header.
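
As a rough illustration of the 50% guideline, the following C++ sketch picks 
candidate headers from per-header include counts. The function name 
selectPchHeaders and its input format are assumptions made for this example; 
nothing like it exists in the Mesos tree, and it does not model the exception 
for very large headers.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper illustrating the 50% guideline.
// includeCounts maps a header name to the number of source files that
// include it; totalSources is the number of source files in the component.
std::vector<std::string> selectPchHeaders(
    const std::map<std::string, std::size_t>& includeCounts,
    std::size_t totalSources)
{
  std::vector<std::string> selected;

  for (const auto& entry : includeCounts) {
    // A header cannot be included by more source files than exist.
    assert(entry.second <= totalSources);

    // "50% or more": compare 2 * count against the total so that no
    // floating point arithmetic is needed.
    if (totalSources > 0 && entry.second * 2 >= totalSources) {
      selected.push_back(entry.first);
    }
  }

  // std::map iterates in key order, so the result is sorted by header name.
  return selected;
}
```

For example, with 10 source files, a header included by 5 of them would be 
selected and one included by 4 would not. A very large header below the 
threshold would still have to be added by hand.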

Note that, for use of precompiled headers, there will be