In this post, I'd like to summarize my first impressions of using Maven.
The product built in my company is a mixed bag of C/C++ code, Java code
and Perl code. It is a classic three-tier app, with the back end written
in C/C++, the middle tier written in Perl and Java, and the GUI written
in Java.
The Java code is built via Ant scripts, which have grown quite intricate
and unwieldy. They also tend to obscure the actual dependencies between
artifacts, with the result that multiple jars contain varying subsets of
the class files, and various random jars are duplicated into several
places in the source tree -- in other words: your classic organically
grown mess. The way the Ant scripts are written also makes it very hard
to get incremental parallel builds to work, since they tend to spill
files all over the tree while invoking each other in scary ways.
The top-level build system is written in a make clone called "cook",
written by a nice guy in Australia, Peter Miller. This clone has a
variety of extras not available in any other build tool, which made it
possible for me to write a system whose philosophy matches Maven's
almost perfectly. See http://www.cg-soft.com/tools/build/ for details.
One thing to note is that "cook" is fairly old software (I think the
first version came out over 20 years ago), which makes the absence of
many of its features in newer systems that much more depressing - more
on that later.
So Maven seems like a perfect fit, especially since our Java code is
fairly vanilla, using standard technologies and very few hacks. Indeed,
I was able to convert about 60% of the Java build to Maven within two
days of reading the "Better Builds with Maven" book. Needless to say, I
was impressed.
I really want to like Maven. I think it has the right ideas, and the way
it deals with deploying and reusing artifacts built elsewhere is a great
model for dealing with third-party software, and even with in-house
software built by separate teams. It effortlessly implements a "wink-in"
scheme of the kind that usually requires a lot of effort to get working.
In particular, I like the fact that developers only need to check out
the subtrees where they are actually making changes, and can still build
the complete product by downloading the artifacts built and deployed by
the continuous build loop.
I am in the process of integrating some of the Maven ideas (mainly the
plugin architecture) into my cook-based build system, and I almost wish
Maven had some support for C/C++ style artifacts - but I realize that
this is a much harder problem than it is for Java artifacts, a testament
to the wisdom of the Java designers.
Nevertheless, there are difficulties and disappointments:
* Incremental builds are not reliable;
* Builds are not reproducible;
* Builds are neither parallelizable nor distributable;
* Reactor builds are "all or nothing";
* Propagation of build parameters is undocumented and unpredictable;
* Release process is bizarre.
I think these are important issues. I'll go into details later, but I
cannot stress enough how important it is to have a reliable build tool
that actually removes workload from developers. Most developers are not
thrilled about having control taken away from them by an "opinionated
tool" like Maven. They will only go along if it provides tangible
benefits. It is therefore -extremely- important not to disappoint them.
I don't think there is disagreement about that - after all, the idea
plugin is a great step in the right direction - but I do fear that
people do not fully appreciate the difficulties created by a build tool
that fails in mysterious, random, hard-to-debug ways. Developers have
strong egos, and will go to great lengths to figure things out by
themselves; they will only come and ask questions when they are
desperate - and then they will blame the tool and the people who brought
it into their lives.
Now, in detail:
_Incremental Builds are not Reliable_
There are two well-known failure modes:
* A source file has been relocated or removed;
* A source file was updated, but with a timestamp older than the
associated derived file(s).
Supporting those two cases is not really that hard. For the first case,
you record a hash signature of the sorted list of all "ingredient" files
used to produce the target file, and consider the target out of date if
that signature changes. For the second case, you record the timestamp
and hash signature of each source file, and consider the target out of
date if either changes. As a side effect, you get free build avoidance:
compare the hash signature of a generated file with the previous
version, and consider subsequent dependents up to date if the signature
did not change. "cook" has been doing this for years and it works great.
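To make the bookkeeping concrete, here is a minimal sketch in Java. This
is my own illustration of the technique, not cook's actual code; the
"previous build" map and the class names are assumptions for the
example:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;
    import java.util.List;
    import java.util.Map;

    // Sketch of signature-based up-to-date checking in the spirit of
    // cook: a target is stale if the set of ingredients changed, or if
    // any ingredient's recorded content signature changed.
    class UpToDateChecker {

        // Hash of the sorted ingredient names: catches relocated,
        // added or removed source files (failure mode 1).
        static String ingredientListSignature(List<Path> ingredients) throws Exception {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            ingredients.stream().map(Path::toString).sorted()
                       .forEach(name -> md.update(name.getBytes()));
            return toHex(md.digest());
        }

        // Content hash of one ingredient: catches edits whose
        // timestamp is older than the derived file (failure mode 2).
        static String contentSignature(Path file) throws Exception {
            return toHex(MessageDigest.getInstance("SHA-1")
                                      .digest(Files.readAllBytes(file)));
        }

        // "previous" holds the signatures recorded by the last build.
        static boolean isOutOfDate(Path target, List<Path> ingredients,
                                   Map<String, String> previous) throws Exception {
            if (!Files.exists(target)) return true;
            if (!ingredientListSignature(ingredients)
                    .equals(previous.get("ingredient-list"))) return true;
            for (Path ing : ingredients) {
                String recorded = previous.get(ing.toString());
                if (recorded == null
                        || !contentSignature(ing).equals(recorded)) return true;
            }
            return false;
        }

        static String toHex(byte[] bytes) {
            StringBuilder sb = new StringBuilder();
            for (byte b : bytes) sb.append(String.format("%02x", b));
            return sb.toString();
        }
    }

A real tool would also record each ingredient's timestamp and skip the
(expensive) hashing whenever the timestamp is unchanged; the sketch
always hashes for brevity.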
The real problem seems to be a lack of awareness of why this is so bad.
The classic shrug, followed by "just say mvn clean install"... Besides
being horrible for continuous builds, this also makes build script
writers very lazy, since all they need to support is the "clean" case.
So, for example, jars will include obsolete .class files unless a clean
build is used. Sites will contain obsolete HTML files unless a clean
build is used...
(Before I discovered Maven, I was about 50% done with integrating the
Java builds into my "cook" system, and one of the little tools I have is
a Perl script that determines the exact names of the class files
generated by a Java source file, so that I pack only the exact files
produced by the current build - a rough equivalent is sketched below.)
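Since the Perl script itself isn't shown here, the following is a rough
Java approximation of the same idea. The directory layout and the
matching rule are assumptions on my part, and the caveat in the comment
is real:

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;

    // Rough approximation: enumerate the class files that a given
    // source file produced, so that only those files get packed.
    class GeneratedClassFiles {
        // classesDir: the package directory under the compiler output
        // tree that corresponds to the source file's package.
        static List<File> forSource(File classesDir, String sourceBaseName) {
            List<File> result = new ArrayList<>();
            File[] entries = classesDir.listFiles();
            if (entries == null) return result;
            for (File f : entries) {
                String n = f.getName();
                // Foo.class plus nested/anonymous classes such as
                // Foo$Bar.class and Foo$1.class.
                if (n.equals(sourceBaseName + ".class")
                        || (n.startsWith(sourceBaseName + "$") && n.endsWith(".class"))) {
                    result.add(f);
                }
            }
            // Caveat: this misses additional package-private top-level
            // classes declared in the same source file; a real tool
            // parses the source (or the compiler's verbose output).
            return result;
        }
    }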
_Builds are not Reproducible_
This should be the holy grail of every release engineer. Arguably, it
isn't really Maven's fault, but rather the fault of the Java designers
for relying on a format that includes timestamps (jars). Run mvn install
once, save the artifacts, run mvn clean install again, and the jars will
look different. It requires trust in the system to accept that both
builds are the same. QA engineers are very unwilling to trust - it's
part of their job description. If you wish to reproduce a build, it
would be great if it came out bitwise identical to the original build.
This is actually not impossible to do, and "cook" has a feature that at
least solves the timestamp issue, assuming that your version control
system is good enough to reproduce a source tree with the exact same
timestamps as the original. The trick is to backdate every generated
file to one second after its newest ingredient file. As a side effect,
it will actually make your jars slightly smaller, since those timestamps
compress better - yeah, big deal :)
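A minimal sketch of that trick in Java - my own illustration of the
technique, not cook's implementation:

    import java.io.File;
    import java.util.List;

    // Give the generated file a deterministic mtime derived from its
    // inputs, so a rebuild from an identical source tree yields
    // identical jars.
    class Backdater {
        static void backdate(File generated, List<File> ingredients) {
            long newest = 0;
            for (File ing : ingredients) {
                newest = Math.max(newest, ing.lastModified());
            }
            // One second past the newest input: still "new enough" for
            // make-style mtime comparisons, but reproducible.
            generated.setLastModified(newest + 1000);
        }
    }

With identical source-tree timestamps and deterministic output
timestamps, two runs of the same build can produce byte-identical jars.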
Another wrench in the works is the fact that Maven itself may change
over time and may cause builds to change. I'm not sure I understand all
the ramifications, but it seems reasonable to ask how to ensure that the
right versions of Maven and its plugins are used when reproducing any
particular build.
_Builds are not Parallel_
I actually don't know this for sure - is javac multithreaded? I know
that the reactor build isn't. Perhaps not such an urgent feature, but
still a pity, since GNU Make and cook have supported parallel builds for
over ten years...
My solution is to use "cook" to invoke "mvn -N compile" and "mvn -N
install" in every directory that has a pom.xml, after extracting the
dependencies from the pom.xml files, and to let "cook" do the equivalent
of the reactor build using cook's own parallelization. Compile and
install are invoked separately, so that cook has a chance to insert JNI
header generation in between, and to pack the C/C++ shared objects built
by cook into the assembly built in the "mvn install" step. A rough
sketch of the driver is below.
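The driver logic, expressed in Java for illustration only (the real
thing is cook recipes, and the module names and dependency map would be
extracted from the pom.xml files rather than hard-coded as they are
here):

    import java.io.File;
    import java.io.IOException;
    import java.util.*;

    // Run "mvn -N compile" / "mvn -N install" per module, in
    // dependency order, with room for native build steps in between.
    class ModuleRunner {
        static void run(Map<String, List<String>> deps) throws IOException, InterruptedException {
            for (String module : topoOrder(deps)) {
                mvn(module, "compile");
                // ... generate JNI headers and build native code here ...
                mvn(module, "install");
            }
        }

        static void mvn(String moduleDir, String goal) throws IOException, InterruptedException {
            Process p = new ProcessBuilder("mvn", "-N", goal)
                    .directory(new File(moduleDir))
                    .inheritIO()
                    .start();
            if (p.waitFor() != 0)
                throw new RuntimeException(goal + " failed in " + moduleDir);
        }

        // Depth-first topological sort: dependencies before dependents.
        static List<String> topoOrder(Map<String, List<String>> deps) {
            List<String> order = new ArrayList<>();
            Set<String> done = new HashSet<>();
            for (String m : deps.keySet()) visit(m, deps, done, order);
            return order;
        }

        static void visit(String m, Map<String, List<String>> deps,
                          Set<String> done, List<String> order) {
            if (!done.add(m)) return;
            for (String d : deps.getOrDefault(m, Collections.emptyList()))
                visit(d, deps, done, order);
            order.add(m);
        }
    }

This also addresses the next issue: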
_Reactor Builds are "all or nothing"_
If I'm in a really large multi-module project, and I need to work on
multiple modules at the same time, I only have two choices:
* I use the reactor build (and wait, and wait....)
* I run mvn in dependency order by hand.
I'd like to have a mode where mvn will do a reactor build for a specific
module and its prerequisites only.
_Propagation of Build Parameters is Unpredictable and Undocumented_
As far as I can tell, there are pom parameters which accumulate
(e.g. dependencies), override (various plugin settings, properties) or
are simply ignored. Then there are parameters that accept a ${...}
expression, and some that don't. The only way to find out is trial and
error (or perhaps reading the source code). I think this is important
enough to merit documentation.
Others have complained before me about the difficulties in actually
figuring out how things are supposed to work. This is a serious
disadvantage of the declarative style: the declarations are by
definition arbitrary and must somehow be documented. As much as I hate
Ant, I have to say that this is much less of a problem there, mainly
because one needs to describe the operations explicitly in Ant anyway,
and the atomic tasks are not so hard to understand.
_Release Process is Bizarre_
I don't envy anyone with the task of defining this. The fact is, as far
as I can tell, there is still no widespread consensus on how a release
process is supposed to work. All I know is that I can't use the release
plugin as it is. I do need to understand better the idea behind snapshot
versioning and how version ranges will work out. I am uncomfortable with
some of the logic there.
I hope that this post gives the Maven developers some insight into the
problems faced by a novice user attempting to use the product. I still
very much want to like it, and I'll struggle to make it work - but it's
not looking great at this moment.
--
cg