werken 2002/06/10 06:28:53
Added: src/tex ruminations.tex
Log:
Adding the source for my ruminations.
Revision Changes Path
1.1 jakarta-turbine-maven/src/tex/ruminations.tex
Index: ruminations.tex
===================================================================
%%
%%
%% Ruminations of a Mavenite
%%
%%
\documentclass[10pt,letterpaper]{article}
%%
%% Package Imports
%%
\usepackage{alltt}
\usepackage{cite}
\usepackage{plain}
\usepackage{hyperref}
\newenvironment{codelisting}%
{\begin{minipage}{250pt}\small\begin{alltt}}%
{\end{alltt}\end{minipage}}
\begin{document}
\title{Ruminations of a Mavenite\\\small{}Architecture, Design and Implementation}
\author{Bob McWhirter\\\small{}[EMAIL PROTECTED]\\\small{}The Werken
Company\\\small{}http://www.werken.com/}
\maketitle
%%
%% Abstract
%%
\begin{abstract}
Maven, a project management tool created by Jason van Zyl in early
2002 has seen rapid evolution with an increase in both features and
inadequacies. I have worked with Maven for only a few weeks, but is
obvious that in order to survive and thrive, certain changes must
occur. Weather these changes are evolutionary or revolutionary in
terms of the current Maven code-base is still to be decided. Herein,
I address various good and bad points that I have observed while using
Maven on several projects.
\end{abstract}
%% ------------------------------------------------------------
%% Origins and History
%% ------------------------------------------------------------
%% Outside source:
http://nagoya.apache.org/eyebrowse/ReadMsg?[EMAIL PROTECTED]&msgId=267542
\section{Origins, History \& Future}
\subsection{Motivation}
Jason van Zyl, attempting to get a hold on the recently decoupled
Turbine components, decided to start the Maven project. Turbine had
initially been a mostly monolithic project, but over the course of
time various components were shed from the core project into other
projects such as \emph{jakarta-commons}. As such, to develop on
Turbine, a developer must have all dependencies up-to-date and
managing these dependent jars was a hassle. Maven is Jason's attempt
to provide a consistent mechanism for building multi-project
code-bases. Maven was initially announced on February 25 on the
\emph{turbine-dev}
mailing list.
\subsection{The Early Years}
The initial development was performed by Jason van Zyl, while
documentation was produced by Pete Kazmier. The current incarnation
of Maven is basically an add-on to widely-used \emph{jakarta-ant}
Java build-system.
Fairly quickly, Maven developed a good-sized base of developers,
documenters, testers and users. People immediately saw the beauty
behind maven, of isolating logic into an abstract project descriptor
instead of directly coding actions within \emph{ant}. Maven grew
quickly and now supports many J2EE components, documentation-generator
of several types, and advanced inter-project dependency management.
The maven developer community is active on the \emph{\#maven} channel
of Jon Scott Steven's IRC server \emph{irc.whichever.com}.
\subsection{The Future}
The general consensus is that Maven could some day be an \emph{ant}-killer,
obviating the need for ant. While it may be politically incorrect
within the jakarta community, many of the maven developers believe
there are fundamental flaws with the design an implementation of
\emph{ant}, and hold little hope for improvement in \emph{ant-2}.
%% ------------------------------------------------------------
%% Features and Capabilities
%% ------------------------------------------------------------
\section{Features and Capabilities}
\subsection{Network-Enabled}
\subsubsection{Shared Dependency Repositories}
Maven makes extensive use of the network when managing inter-project
dependencies. This is considered one of the most important features of
Maven. When building a project, and dependent libraries may be pulled
from a common network repository and placed into a local cache. Maven
itself manages the classpath issues in order to allow building of the
target project. No longer must a user explicitly download dependencies and
manually install them into the correct locations for building, testing
and deploying.
Maven currently resolves all project dependencies through a local
cache which may be populated from multiple other local or remote
repositories.
The one main issue with regards to the remote repositories is
administration. Currently, the remote repository is simply an HTTP
accessible directory. For example, the main maven repository is
locate at \emph{http://jakarta.apache.org/turbine/jars/}. Some sort
of repository-management scheme needs to be implemented. Currently,
the owner of the directory on the server is responsible for
maintaining the repository using normal file-system commands.
I propose that a repository management tool be created that will
either accept built jars from the responsible projects or provides
some mechanism for the repository itself to build releases of jars.
In the common case, a project could notify the repository, possibly
through email or an HTTP request, that a new version of a project is
available. The repository is then responsible for syncing out the
correct version of the code from a revision-control system and
building an official build.
One issue with the method is that ``official'' builds will not
actually come from the project itself, but rather from the repository
itself.
If each project were allowed to submit versions of their deliverables,
strict security must be in place. Potentially public-key signatures
could be used while submitting deliverables in order to prevent
spoofing and the possibility of introducing malicious code. This
issue is extreme important with the central repository because any
bogus code propagates quickly to many other projects and users. The
downside of using signatures is that each project must establish a
relationship with each repository in order to verify public key
fingerprints, adding to administrative overhead.
As a complete departure, each project could possibly maintain their
own repository, under their own control, and simply register the
repository itself with a central repository-manager, instead of
registering the actual deliverables.
\subsection{Extensible Components}
\subsubsection{Action \& Plug-In Extension}
Maven, at the core, relies on \emph{elemental actions}. Various elemental
actions may be used in concert to produce an \emph{extension}, which
is implemented as a run-time plug-in. Elemental actions consist of
things such as:
\begin{itemize}
\item Compilation of a source tree.
\item Documentation generation of a source tree.
\item Processing a set of templates and data.
\item Compressing a file.
\end{itemize}
In the original version of Maven, based upon \emph{ant}, these element
actions were simply ant's own \verb|Task| objects. Maven simply
built the work already done by the ant team. An ant \verb|Task| is
simply a parameterizable process. The ant \verb|<javac>| task, for
example, takes \verb|srcDir|, \verb|destDir|, \verb|classpath| and
other flags and parameters.
Larger-grain \emph{activities} aggregate several elemental actions and
parameterize them with information from the \emph{project descriptor}.
A project to simply defines the location of the source tree along with
a set of external dependencies, and Maven will determine the necessary
invocation of \verb|<javac>| to perform compilation correctly.
Maven's current reliance upon ant is proving to be one of its biggest
faults. Ant's build-file language was never intended for the type of
programming the Maven developers are using it for. In order to
implement the initialization, callbacks and other features of
Maven, many ant targets are required, drastically slowing down even a
small incremental build. Additionally, the passing of information in
various forms to various targets in various build-files is sometimes
tenuous at best. Ant seems to basically be making the goals of Maven
more difficult to attain than they should be. Several replacement
options are described below.
\subsubsection{Project-Defined Callbacks}
While Maven attempts to always Do The Right Thing, there are times
that a project will require the ability to augment and otherwise
affect the normal execution of a Maven build. Currently, this is
implemented using a callback mechanism. Before and after each Maven
activity, the project is given an opportunity to fire a callback
routine. This is currently also implemented as a project-define ant
target. By implementing project-defined callbacks in the ant build-file
language, Maven is accessible to the large body of ant-knowledgeable
developers.
%% ------------------------------------------------------------
%% Architectural and Design Issues
%% ------------------------------------------------------------
\section{Architectural \& Design Issues}
\subsection{Elemental actions: Ant, Jelly or Otherwise}
Internally, Maven implements the \emph{project descriptor} as a normal
graph of JavaBeans known as the \emph{Project Object Model (POM)}. In
the current implementation, information from the POM is extracted and
made available to the ant process in order to parameterize various
elemental actions. Ant itself is currently the driver of Maven. A
user must explicitly invoke maven capabilities from his own ant
build-file. It is this aspect of Maven being secondary to ant,
instead of being in control the creates hardships for the Maven
developers.
We propose to, at the bare minimum, invert this relationship so that
ant is a slave to Maven. Maven would become a full application,
instead of simply a collection of ant tasks. This would allow each
project to actually exist without a build.xml file, unless required
for implementation of callbacks.
Developers are finding it less-than-friendly to perform the complex
logic necessary for maven in ant's XML language. Many feel that XML
is not an appropriate scripting language for these tasks. Several
participants (including the author) feel that for elemental tasks, a
pure JavaBeans interface may be appropriate. While ant has walked
partially down their path by implementing the \verb|Task| interface,
this interface is not generic and is dependent upon the rest of the
ant components. A developer may not simply use the \verb|Javac| task
independently of the rest of ant.
I therefore propose that Maven uses independent elemental action
JavaBeans that do not rely upon other Maven components, necessarily
(Figure \ref{figure.elemental-action}).
\begin{figure}
\begin{codelisting}
import java.io.File;
public class Javac \{
public Javac() \{
....
\}
public void setSourceDirectory(File sourceDirectory) \{
....
\}
public void setDestinationDirectory(File sourceDirectory) \{
....
\}
public void execute() throws Exception \{
....
\}
\}
\end{codelisting}
\caption{Javac Elemental Action}
\label{figure.elemental-action}
\end{figure}
Notice that this class contains absolutely nothing from the Maven
project itself. It is imminently re-usable in many places without
maintaining complex relationships to other components. At run-time,
Maven would simply use reflection to invoke the \verb|execute()|
method.
This would be the basic building-block for Maven.
A middle mediator layer would be used to extract the necessary
information from the POM to parameterize the elemental actions.
The mediator layer may somehow be implemented using James Strachan's
\emph{Jelly} (Figure \ref{figure.jelly}), or may simply be done using
properties files and XPaths to walk the POM (Figure
\ref{figure.xpath}).
\begin{figure}
\begin{codelisting}
<maven:javac>
<sourceDirectory>\$\{project.sourceDirectory\}</sourceDirectory>
<destinationDirectory>\$\{project.destinationDirectory\}</destinationDirectory>
<classpath>\$\{project.buildTimeDependencies\}</classpath>
....
</maven:javac>
\end{codelisting}
\caption{Jelly Mediator Example}
\label{figure.jelly}
\end{figure}
\begin{figure}
\begin{codelisting}
sourceDirectory = /project/sourceDirectory
destinationDirectory = /project/destinationDirectory
classpath = /project/buildTimeDependencies
....
\end{codelisting}
\caption{XPath Mediator Example}
\label{figure.xpath}
\end{figure}
\subsection{Composition and Inheritance}
By implementing elemental actions simply as generic JavaBeans, all
normal Java constructs for inheritance and composition are available.
I personally suggest that Java is the method for constructing both
elemental actions and the larger-grained activities. Ant's use of
an interpreted XML scripting language is important because every
developer on every project was required to write his own build-file.
Since Maven makes this mostly unneeded, where only the Maven
developers have to create actions, binding ourselves directly to Java
does not seem bad. Through the use of aspect-oriented programming
(AOP) or post-compilation byte-code processing (possibly as class
resolution time), we can add dependency-checking advice before each
invocation of an action's \verb|execute()| method.
There \emph{is} still a need for project-defined callbacks, and this
is an area where I believe we need many options. Maintaining the
familiar ant build-file for project-specific callbacks (a form of
extension) is necessary. Though, I propose that the callback
mechanism include an abstraction layer so that callbacks may be
implemented as Java code, ant targets, jelly scripts, or others.
\subsection{XML, JavaBeans and Testing}
By implementing elemental actions are mere JavaBeans without
dependency upon Maven components, unit-testing will be much easier.
We currently have issues with attempting to test the ant-based Maven
targets. By converting the targets to JavaBeans, this will be
alleviated. If we bind ourselves to an XML scripting language or any
sort as the only method for controlling the elemental activities, the
testing process becomes much more difficult and cross-cutting.
\subsection{Dependencies}
Dependencies between projects is accounted for through the repository
mechanism described above. Dependencies between elemental actions and
larger-grained activities must also be accounted for. Each activity
may also include version specifiers. It may be possible to simply
utilize the normal inter-project dependency repository for the
Maven-extension dependencies also. This could be accomplished by
tagging each project dependency as a build-time or run-time
dependency, possibly. More is discussed below in sections
\ref{section.impl.dependencies} and \ref{section.impl.the.many.classpaths}.
%% ------------------------------------------------------------
%% Implementation Issues
%% ------------------------------------------------------------
\section{Implementation Issues}
\subsection{Dependencies}
\label{section.impl.dependencies}
While the notion of dependencies seems simple enough, there are many
complexities involved, several of which have been directly
experienced by the jakarta-ant project team.
\subsubsection{Maven Dependencies}
Maven itself, as a project, has dependencies. When building Maven, it
is no different than building any other project. The complexity comes
during the execution of Maven, and maintain the correct classpaths.
The classpath issue is discussed below in section
\ref{section.impl.the.many.classpaths}.
\subsubsection{Extension Dependencies}
Since Maven can load, at run-time, various extensions to provide more
capabilities, the dependencies (and possibly conflicts) between these
extensions must also be managed at run-time. Luckily, this is simply
considered a subset of the afore-mentioned maven dependencies when
considering classpaths. Unlike the run-time dependency mechanism, below,
this requires a run-time loading of dependent jars into Maven's own
process-space.
\subsubsection{Build-Time Dependencies}
Build-time dependencies can be considered a sub-set of the extension
dependencies above. There may exist an extension that provides an
interface to \emph{javacc} or \emph{antlr} through Maven. In addition
to the dependency on those elemental activities, those activities
themselves required the javacc or antlr tools to be available within
Maven's process-space.
\subsubsection{Run-Time/Deploy-Time Dependencies}
Run-time/deploy-time dependencies are the simplest dependencies to
manage. These are \emph{not} required to be present within Maven's own
process-space. Instead, the list of them must be provided to
activities and elemental actions executed within Maven. For example,
during compilation, Maven is not required to have the dependent
libraries within its process space, but it must provide to the Java
compiler a list of locations for the dependencies. The compiler uses
this list internally.
\subsection{The Many Classpaths}
\label{section.impl.the.many.classpaths}
\subsubsection{Maven's Classpath}
Maven's own classpath is comprised of the following:
\begin{itemize}
\item \emph{Core Maven Components.} This includes maven.jar and
all other libraries that maven itself requires in order to operate.
\item \emph{Extension Components.} This includes extensions that
bundled into jar files and deployed within Maven itself.
\item \emph{Build-time Components.} This includes libraries that
the extensions themselves rely upon.
\end{itemize}
\subsubsection{Run-Time Dependency Classpath}
The run-time dependency classpath is not actually loaded within
Maven's process-space, but is simply a list of dependencies on the
local drive that is made available to activities and actions.
\subsubsection{Classpath interactions}
By far, the most interesting classpath interaction involves
\emph{junit}. A project may have a source-tree of unit tests written
against junit 3.6. Therefore, junit-3.6.jar is described as a
run-time dependency, as it is required in order to actually execute
the unit-tests. Maven's own \verb|junit| action, though, may be
compiled against junit 3.7. Here we have a definite version mismatch.
Somehow, we must find a method for aligning run-time dependencies with
extension and build-time dependencies. Otherwise, users of Maven are
inherently forced to use the version of junit (or other libraries that
exhibit similar usage idioms) that Maven itself depends upon. This
would be sub-optimal.
\subsubsection{Classloaders}
We should achieve complete isolation, where possible, between Maven's
own classpath and the classpath provided to elemental action. For
example, while maven uses log4j 1.2.3 internally, it should not ever
be presented to an elemental action in the dependency classpath.
There also exists the possibility of version clashes between Maven
extensions, and if possible these should be handled gracefully. A
classloader sandboxing mechanism may be necessarily even with Maven
itself, to possibly isolate each extension component's classpath from
another. Management of the classloader hierarchy will be an
important, if not exciting, task.
\section{Bottom Line}
To sum up, I feel that the current incarnation of Maven is indeed a
wonderful prototype of a beautiful concept. I feel that we should
take a hard look at massively reworking Maven to support further
future development. If we do not, I fear that Maven will be lost in
a quagmire of hacks.
The following things should be well-design and cleanly implemented,
the next time around:
\begin{itemize}
\item \emph{Maven is an Applications.} Maven should be a
full-fledged application, and not simply a collection of
ant tasks. Maven should not be subservient.
\item \emph{Repository Management.} Consider either a server
component for the repository to allow each project to update
its own libraries, or a distributed model, like DNS, where
registration of a repository allows delegation of management
responsibilities.
\item \emph{Full-Fledge Language.} Instead of working primarily
in an interpreted XML-based language such as ant build-files
or Jelly, the primary language for implementing Maven actions
and activities should be normal Java. Avoid anything
Maven-specific, and allow the Maven mediator layer
parameterize the JavaBean actions.
\item \emph{Robust Dependency Management.} Being able to manage
the many times of dependencies between Maven, extensions,
build-time and run-time is the utmost importance.
\item \emph{Correct ClassLoader Sand-boxing.} Hand-in-hand with
robust dependency management is correct handling of
classloaders within Maven, to avoid collisions or incorrect
behaviour.
\end{itemize}
I heartily applaud all of the developers who have spent considerable
time bringing Maven to where it is today. I would like to see the
same amount of progress continue into the future and suggest a
framework that will hopefully make this realizable. If a
re-engineering of Maven does not occur, I feel that in the near
future, we will hit a wall where every added feature creates more
problems than it solves.
%%\bibliography{werken}
%%\bibliographystyle{acm}
\end{document}
--
To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>