In this post, I'd like to summarize my first impressions of using Maven.
The product built in my company is a mixed bag of C/C++ code, Java code
and Perl code. It is a classic three-tier app, with the back end written
in C/C++, the middle tier written in Perl and Java, and the GUI written
in Java.
The Java code is built via Ant scripts, which have grown quite intricate
and unwieldy. They also tend to obscure the actual dependencies between
artifacts, with the result that multiple jars contain varying subsets of
the class files, and various random jars are duplicated into several
places in the source tree -- in other words: your classic organically
grown mess. The way the Ant scripts are written also makes it very hard
to get incremental parallel builds to work, since they tend to spill
files all over the tree while invoking each other in scary ways.
The top-level build system is written in a make clone called "cook",
written by a nice guy in Australia, Peter Miller. This clone has a
variety of extras not available in any other build tool, which made it
possible for me to write a system whose philosophy matches Maven's
almost perfectly. See http://www.cg-soft.com/tools/build/ for details.
One thing to note is that "cook" is fairly old software (I think the
first version came out over 20 years ago), which makes the absence of
many of its features in newer systems that much more depressing - more
on that later.
So Maven seems like a perfect fit, especially since our Java code is
fairly vanilla, using standard technologies and very few hacks. Indeed,
I was able to convert about 60% of the Java build to Maven within two
days of reading the "Better Builds with Maven" book. Needless to say, I
was impressed.
I really want to like Maven. I think it has the right ideas, and the way
it deals with deploying and reusing artifacts built elsewhere is a great
model for dealing with third-party software, and even with in-house
software built by separate teams. It effortlessly implements a "wink-in"
scheme of the kind that usually requires a lot of effort to get working.
In particular, I like the fact that developers only need to check out
the subtrees where they are actually making changes, and can still build
the complete product by downloading the artifacts built and deployed by
the continuous build loop.
I am in the process of integrating some of the Maven ideas (mainly the
plugin architecture) into my cook-based build system, and I almost wish
Maven had some support for C/C++ style artifacts - but I realize that
this is a much harder problem than it is for Java artifacts, a testament
to the wisdom of the Java designers.
Nevertheless, there are difficulties and disappointments:
* Incremental builds are not reliable;
* Builds are not reproducible;
* Builds are neither parallelizable nor distributable;
* Reactor builds are "all or nothing";
* Propagation of build parameters is undocumented and unpredictable;
* Release process is bizarre.
I think these are important issues. I'll go into details later, but I
cannot stress enough how important it is to have a reliable build tool
that actually removes workload from developers. Most developers are not
thrilled about having control taken away from them by an "opinionated
tool" like Maven. They will only go along if it provides tangible
benefits. It is therefore -extremely- important not to disappoint them.
I don't think there is disagreement about that - after all, the idea
plugin is a great step in the right direction - but I do fear that
people do not fully appreciate the difficulties created by a build tool
that fails in mysterious, random, hard-to-debug ways. Developers have
strong egos, and will go to great lengths to figure things out by
themselves; they will only come and ask questions when they are
desperate - and then they will blame the tool and the people who brought
it into their lives.
Now, in detail:
_Incremental Builds are not Reliable_
There are two well-known failure modes:
* A source file has been relocated or removed;
* A source file was updated, but with a timestamp older than the
associated derived file(s).
Supporting those two cases is not really that hard. For the first case,
you record a hash signature of the sorted list of all "ingredient" files
used to produce the target file, and consider the target out of date if
that signature changes. For the second case, you record the timestamp
and hash signature of each source file, and consider the target out of
date if either changes. As a side effect, you get free build avoidance:
compare the hash signature of a generated file with the previous
version, and consider subsequent dependents up to date if the signature
did not change. "cook" has been doing this for years and it works great.
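To make the bookkeeping concrete, here is a minimal sketch in Java. This
is my own illustration of the technique, not cook's actual code; the
"previous build" map and the class names are assumptions for the
example:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;
    import java.util.List;
    import java.util.Map;

    // Sketch of signature-based up-to-date checking in the spirit of
    // cook: a target is stale if the set of ingredients changed, or if
    // any ingredient's recorded content signature changed.
    class UpToDateChecker {

        // Hash of the sorted ingredient names: catches relocated,
        // added or removed source files (failure mode 1).
        static String ingredientListSignature(List<Path> ingredients) throws Exception {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            ingredients.stream().map(Path::toString).sorted()
                       .forEach(name -> md.update(name.getBytes()));
            return toHex(md.digest());
        }

        // Content hash of one ingredient: catches edits whose
        // timestamp is older than the derived file (failure mode 2).
        static String contentSignature(Path file) throws Exception {
            return toHex(MessageDigest.getInstance("SHA-1")
                                      .digest(Files.readAllBytes(file)));
        }

        // "previous" holds the signatures recorded by the last build.
        static boolean isOutOfDate(Path target, List<Path> ingredients,
                                   Map<String, String> previous) throws Exception {
            if (!Files.exists(target)) return true;
            if (!ingredientListSignature(ingredients)
                    .equals(previous.get("ingredient-list"))) return true;
            for (Path ing : ingredients) {
                String recorded = previous.get(ing.toString());
                if (recorded == null
                        || !contentSignature(ing).equals(recorded)) return true;
            }
            return false;
        }

        static String toHex(byte[] bytes) {
            StringBuilder sb = new StringBuilder();
            for (byte b : bytes) sb.append(String.format("%02x", b));
            return sb.toString();
        }
    }

A real tool would also record each ingredient's timestamp and skip the
(expensive) hashing whenever the timestamp is unchanged; the sketch
always hashes for brevity.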
The real problem seems to be a lack of awareness of why this is so bad.
The classic shrug, followed by "just say mvn clean install"... Besides
being horrible for continuous builds, this also makes build script
writers very lazy, since all they need to support is the "clean" case.
So, for example, jars will include obsolete .class files unless a clean
build is used. Sites will contain obsolete HTML files unless a clean
build is used...
(Before I discovered Maven, I was about 50% done with integrating the
Java builds into my "cook" system, and one of the little tools I have is
a Perl script that determines the exact names of the class files
generated by a Java source file, so that I pack only the exact files
produced by the current build - a rough equivalent is sketched below.)
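Since the Perl script itself isn't shown here, the following is a rough
Java approximation of the same idea. The directory layout and the
matching rule are assumptions on my part, and the caveat in the comment
is real:

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;

    // Rough approximation: enumerate the class files that a given
    // source file produced, so that only those files get packed.
    class GeneratedClassFiles {
        // classesDir: the package directory under the compiler output
        // tree that corresponds to the source file's package.
        static List<File> forSource(File classesDir, String sourceBaseName) {
            List<File> result = new ArrayList<>();
            File[] entries = classesDir.listFiles();
            if (entries == null) return result;
            for (File f : entries) {
                String n = f.getName();
                // Foo.class plus nested/anonymous classes such as
                // Foo$Bar.class and Foo$1.class.
                if (n.equals(sourceBaseName + ".class")
                        || (n.startsWith(sourceBaseName + "$") && n.endsWith(".class"))) {
                    result.add(f);
                }
            }
            // Caveat: this misses additional package-private top-level
            // classes declared in the same source file; a real tool
            // parses the source (or the compiler's verbose output).
            return result;
        }
    }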
_Builds are not Reproducible_
This should be the holy grail of every release engineer. Arguably, it
isn't really Maven's fault, but rather the fault of the Java designers
for relying on a format that includes timestamps (jars). Run mvn install
once, save the artifacts, run mvn clean install again, and the jars will
look different. It requires trust in the system to accept that both
builds are the same. QA engineers are very unwilling to trust - it's
part of their job description. If you wish to reproduce a build, it
would be great if it came out bitwise identical to the original build.
This is actually not impossible to do, and "cook" has a feature that at
least solves the timestamp issue, assuming that your version control
system is good enough to reproduce a source tree with the exact same
timestamps as the original. The trick is to backdate every generated
file to one second after its newest ingredient file. As a side effect,
it will actually make your jars slightly smaller, since those timestamps
compress better - yeah, big deal :)
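A minimal sketch of that trick in Java - my own illustration of the
technique, not cook's implementation:

    import java.io.File;
    import java.util.List;

    // Give the generated file a deterministic mtime derived from its
    // inputs, so a rebuild from an identical source tree yields
    // identical jars.
    class Backdater {
        static void backdate(File generated, List<File> ingredients) {
            long newest = 0;
            for (File ing : ingredients) {
                newest = Math.max(newest, ing.lastModified());
            }
            // One second past the newest input: still "new enough" for
            // make-style mtime comparisons, but reproducible.
            generated.setLastModified(newest + 1000);
        }
    }

With identical source-tree timestamps and deterministic output
timestamps, two runs of the same build can produce byte-identical jars.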
Another wrench in the works is the fact that Maven itself may change
over time and may cause builds to change. I'm not sure I understand all
the ramifications, but it seems reasonable to ask how to ensure that the
right versions of Maven and its plugins are used when reproducing any
particular build.
_Builds are not Parallel_
I actually don't know this for sure - is javac multithreaded? I know
that the reactor build isn't. Perhaps not such an urgent feature, but
still a pity, since GNU Make and cook have supported parallel builds for
over ten years...
My solution is to use "cook" to invoke "mvn -N compile" and "mvn -N
install" in every directory that has a pom.xml, after extracting the
dependencies from the pom.xml files, and to let "cook" do the equivalent
of the reactor build using cook's own parallelization. Compile and
install are invoked separately, so that cook has a chance to insert JNI
header generation in between, and to pack the C/C++ shared objects built
by cook into the assembly built in the "mvn install" step. A rough
sketch of the driver is below.
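The driver logic, expressed in Java for illustration only (the real
thing is cook recipes, and the module names and dependency map would be
extracted from the pom.xml files rather than hard-coded as they are
here):

    import java.io.File;
    import java.io.IOException;
    import java.util.*;

    // Run "mvn -N compile" / "mvn -N install" per module, in
    // dependency order, with room for native build steps in between.
    class ModuleRunner {
        static void run(Map<String, List<String>> deps) throws IOException, InterruptedException {
            for (String module : topoOrder(deps)) {
                mvn(module, "compile");
                // ... generate JNI headers and build native code here ...
                mvn(module, "install");
            }
        }

        static void mvn(String moduleDir, String goal) throws IOException, InterruptedException {
            Process p = new ProcessBuilder("mvn", "-N", goal)
                    .directory(new File(moduleDir))
                    .inheritIO()
                    .start();
            if (p.waitFor() != 0)
                throw new RuntimeException(goal + " failed in " + moduleDir);
        }

        // Depth-first topological sort: dependencies before dependents.
        static List<String> topoOrder(Map<String, List<String>> deps) {
            List<String> order = new ArrayList<>();
            Set<String> done = new HashSet<>();
            for (String m : deps.keySet()) visit(m, deps, done, order);
            return order;
        }

        static void visit(String m, Map<String, List<String>> deps,
                          Set<String> done, List<String> order) {
            if (!done.add(m)) return;
            for (String d : deps.getOrDefault(m, Collections.emptyList()))
                visit(d, deps, done, order);
            order.add(m);
        }
    }

This also addresses the next issue: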
_Reactor Builds are "all or nothing"_
If I'm in a really large multi-module project, and I need to work on
multiple modules at the same time, I only have two choices:
* I use the reactor build (and wait, and wait....)
* I run mvn in dependency order by hand.
I'd like to have a mode where mvn will do a reactor build for a specific
module and its prerequisites only.
_Propagation of Build Parameters is Unpredictable and Undocumented_
As far as I can tell, there are pom parameters which accumulate
(e.g. dependencies), override (various plugin settings, properties) or
are simply ignored. Then there are parameters that accept a ${...}
expression, and some that don't. The only way to find out is trial and
error (or perhaps reading the source code). I think this is important
enough to merit documentation.
Others have complained before me about the difficulties in actually
figuring out how things are supposed to work. This is a serious
disadvantage of the declarative style: the declarations are by
definition arbitrary and must somehow be documented. As much as I hate
Ant, I have to say that this is much less of a problem there, mainly
because one needs to describe the operations explicitly in Ant anyway,
and the atomic tasks are not so hard to understand.
_Release Process is Bizarre_
I don't envy anyone with the task of defining this. The fact is, as far
as I can tell, there is still no widespread consensus on how a release
process is supposed to work. All I know is that I can't use the release
plugin as it is. I do need to understand better the idea behind snapshot
versioning and how version ranges will work out. I am uncomfortable with
some of the logic there.
I hope that this post gives the Maven developers some insight into the
problems faced by a novice user attempting to use the product. I still
very much want to like it, and I'll struggle to make it work - but it's
not looking great at this moment.
--
cg