[caution, long reply ahead, no major disagreements, just some minor
misunderstandings and _a_lot_ of examples, explanation and rationale.]
Roy,
Thanks for taking the time to contribute to this discussion. I hope
that together we all can build a shared value system that will be the
basis for creating the OpenSolaris development process (which is also,
in a very real sense, reinventing Sun's internal development process).
I think we are very much in agreement on most things, although I seem to
have left you with the impression that Sun's development process is based
on the old "waterfall" model instead of the more recent agile or "extreme"
community based ones. Let me see if I can correct that impression.
Some of the confusion may come from a poor definition of context and
scope. Here's mine:
o In a mature system like Solaris, we tend to have a mix of "create
new" versus "update existing" that is heavily biased towards the
latter. And even those new features tend to have preexisting
architectural constraints. My intent here is to develop a shared
understanding of how those constraints are identified and managed
over time.
o I'm trying to focus on the architecture level of the discussion, which
is a subset of the entire development process. In a very real sense,
it is independent of any specific development and implementation
model. Waterfall, Extreme, "laid back community", or even something
not yet invented - all should fit easily into this architectural
discussion framework. In fact, it is a design goal of Sun's ARC process
to allow those models to change over time on a per-project and/or
per-community level - without requiring major changes to the core
ARChitectural framework.
o The granularity I'm focusing on is measured in "units of architectural
change", and not implementation and design details or even component
releases. That is, I want us to focus on being proactive:
"we want to make an architectural change to <this> part of the system,
what are the architectural constraints we need to be aware of and
what architectural expectations are we setting?"
rather than being reactive:
"we've already effectively committed to <this wad of stuff> for
production branch 2.0, somebody please approve it for us"
Addressing your particular points:
Roy T. Fielding wrote:
How does this differ from "other" FOSS projects? Primarily in the
proactive nature of the "commit to a plan" before starting, and
"requiring that a project be complete" before allowing integration.
Er, kind of, if you completely ignore the presence of stable/unstable
branches in the source code control system and the versioning policies
that govern the release process.
Different levels of detail - I'm looking at individual projects, while
many FOSS projects combine multiple such projects together into a single
development branch (yes, I know I'm over-generalizing and over-simplifying).
If this hypothetical FOSS project created a branch for each fine-grained
project/change, then these two views would be the same.
Maybe it would help if I described how we do this today for Solaris,
internal to Sun, using Teamware and its ability to manage multiple
semi-independent branches:
Definitions:
A workspace is a source tree with its version control metadata.
The source tree contains many related elements working
together to implement a larger component.
In Teamware, workspaces have parent-child relationships, and
the SCM code allows bringovers, putbacks and merges between
arbitrarily related workspaces.
A consolidation is roughly equivalent to a versioned
workspace and the policies governing its evolution.
Example: We have a consolidation called ON that contains the
Solaris OS and Networking code. It exists in many versions,
such as ON5.10 (which went into Solaris 10), ON5.11
(sort of aka Nevada, which will go into Solaris 11),
ON5.10u1 (which will go into an update release of Solaris 10),
etc. In concept, these are very similar to CVS branches.
Each of these consolidations creates bi-weekly child workspaces
for their development builds. Some of those workspaces
make it out in developer snapshots (this is where you get
Solaris <developer> Express releases) or even the Open
Solaris "source dumps" (June 14's one came from build 16
of ON5.11).
Each of these consolidation instances is contained in its own
distinct workspace. There is a tree-structured relationship
between them which I won't try to reproduce here in ASCII-email,
but here is a URL that may help:
http://blogs.sun.com/roller/resources/plocher/SDF-flow.gif
Each of these consolidations is associated with a release type,
from the set of {Major, Minor, Micro}.
<rathole>
Sun has effectively done three Major releases of its OS:
SunOS 3.x (mid 1980's),
SunOS 4.x (late 80's) and
SunOS 5.x (early 90's).
Solaris2 is the marketing branding for the product that contains
ON5.x. Everything in the 5.x family (aka Solaris 2.x) has been
a Minor release, except for Solaris 5.5.1, which was a Micro
release port to PowerPC, and the patch sets applied to the Minor
releases to produce Update releases.
</rathole>
The developer who wishes to make a change to one of these
consolidation instances does the following:
o Decides what to do (e.g., takes ownership of a bug,
submits a proposal for a new feature, etc.)
o creates a child workspace from their target consolidation
o [optionally, meets with the ARC one or more times to discuss
things and to map out potential problem areas]
o prototypes things and otherwise gets to the point of
having an architecture that can be committed to
o gets ARC commitment for that architecture
o designs, implements and tests the project
o resyncs with the parent as needed to stay current with other
changes being incorporated, and
o when complete, requests permission to do a commit/putback,
and, when given, puts back their changes into the parent.
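
To make the workspace flow above concrete, here is a minimal sketch in Python. It is not Teamware and does not use Teamware's commands; the Workspace class, its method names and the file names are all hypothetical, and it ignores file deletions and real merging. It only shows the shape of "create a child, resync with the parent, put back when given permission".

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Workspace:
    """Hypothetical stand-in for a parent/child workspace (not the real tool)."""
    name: str
    parent: Optional["Workspace"] = None
    files: Dict[str, str] = field(default_factory=dict)
    base: Dict[str, str] = field(default_factory=dict)  # parent contents at last sync

    def create_child(self, name: str) -> "Workspace":
        # A child starts out as a copy of the parent's current contents.
        return Workspace(name, parent=self, files=dict(self.files), base=dict(self.files))

    def resync(self) -> None:
        # Bring over whatever changed in the parent since the last sync.
        if self.parent is None:
            raise RuntimeError("no parent to resync with")
        parent_files = self.parent.files
        changed = {p for p in parent_files if parent_files[p] != self.base.get(p)}
        # A conflict is a file changed in the parent and changed differently here.
        conflicts = sorted(
            p for p in changed
            if self.files.get(p) != self.base.get(p) and self.files.get(p) != parent_files[p]
        )
        if conflicts:
            raise RuntimeError("merge needed for: " + ", ".join(conflicts))
        for p in changed:
            self.files[p] = parent_files[p]
        self.base = dict(parent_files)

    def putback(self, approved: bool) -> None:
        # Integrate into the parent, but only with permission and only when current.
        if self.parent is None:
            raise RuntimeError("no parent to put back into")
        if not approved:
            raise PermissionError("putback requires permission")
        if self.base != self.parent.files:
            raise RuntimeError("parent has moved on; resync before putback")
        self.parent.files.update(self.files)
        self.base = dict(self.parent.files)


on = Workspace("ON5.11", files={"ls.c": "v1", "cat.c": "v1"})
mine = on.create_child("my-project")
other = on.create_child("other-project")

other.files["cat.c"] = "v2"
other.putback(approved=True)   # someone else's change lands first

mine.files["ls.c"] = "v2"
mine.resync()                  # picks up the other project's cat.c
mine.putback(approved=True)
```

The point of the sketch is only that each project lives in its own fine-grained branch until it is complete, which is why the "per-project" and "per-branch" views end up looking the same.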
If what you mean to say is that
Solaris follows a waterfall software design process ... collaborative
open source projects that depend on iterative development processes...
I don't mean to say that at all. Part of the confusion is that I am
only looking at architecture and you seem to be seeing architecture+
design+implementation.
Even in XP, you don't usually start with a blue-sky clean slate. When
you go in to play with some code, you have an existing set of test
suites that constrain what you are allowed to change. Some of those
are public in that they describe customer-visible/required behaviors,
and some are private in that they exist only to help the developer
maintain internal consistency. When you refactor, it is possible to
incompatibly change private things because you can get closure on all
the producers and consumers of an interface. Public things, in contrast,
resist such closure because of the asynchronous release nature of
multiple interconnected components. (i.e., you can't always guarantee
that all your consumers can/will simultaneously upgrade at exactly the
same time...)
In this world, as long as you are not changing any existing public test
suites (or adding any new ones), you are working below the radar screen
of architectural review and oversight. However, if your project involves
adding new interfaces for use by other components (i.e., adding public
test suites) or if it proposes to incompatibly change existing ones,
you are adding risk to the systems that build on top of your component.
It is that risk that we need to manage proactively. This is what
I meant when I said:
>> The former is simply good software engineering - know
>> what you want before you start hacking
If you "wanted" to maintain a low risk for dependent components,
yet your development team ran roughshod over their public interfaces,
you wouldn't get the result you wanted. When they delivered something
that you were not expecting, you would need to react, and the cost
of that reaction goes up the further along the development/deploy/sustain
cycle you go. Best to catch such things early. Thus, ARC review.
Not to micromanage their development, but to give them a solid, well
understood beginning for their efforts and a commitment that the
result of their development efforts will be accepted by the community.
Good software design is a process of discovery that is never
"complete".
I agree completely, even as I point out that design and architecture
are not synonyms. In a sense, Architecture says "you can't fall down
in an earthquake and you must have more than one exit route for every
occupied room", while Design gets to figure out how to do it all while
staying within the budget :-)
The latter constraint of "requiring a commit be complete" is
just as true for collaborative open source projects ... a general
rule in Apache is that a change must be complete within the branch
and platform on which the developer is working before they are
allowed to commit even to an unstable branch.
I hate it when my generalizations fall apart :-)
I never meant to say that there were not good examples out there;
of course there are, and we should learn from them.
[... skipping the ARC review description because it was an example
of a business decision, not a technical decision.]
It is technical simply because - at an architecture level, it deals
with expectations of interface stability:
What expectations did we set?
How does this proposed change affect those expectations?
What new expectations are we setting now?
Brian mentioned that there is a relationship between stability and releases,
and that Sun has taxonomies for both. They exist so that we can have this
type of discussion. Without dragging them more into this thread, they
let us say things like "this Stable interface will not change incompatibly
in anything other than a Major release of its consolidation".
You are correct in saying that the decision to make such a major release of the
consolidation *is* a business decision that needs to be made by the community.
If the community says "no" to such a release, (i.e., the core value of backwards
compatibility and stability wins out), then any proposal to make an incompatible
change to a Stable interface is effectively "dead", because there would be no
consolidation in that community that will accept that kind of change.
Many FOSS efforts routinely create a sequence of major release branches,
thus the sequence of <production>, <development>, <production> ....
This is not good or bad in and of itself, but it is an indication that
they are not at the same place on the "values backwards compatibility"
scale as Solaris is.
Alternatively, OpenSolaris could give development autonomy to the
communities, wherein technical development, discussion of alternatives,
getting it to work, and testing can all take place independent of
any ARC review. ARC review isn't needed until the community wishes
to apply the completed work to a stable release branch, at which
point the community product does need to adhere to the particular
interface requirements for that branch. [They are, of course, aware
of those requirements during the whole process, and thus will have
designed and developed for a particular set of branches.]
This sets up an "us -vs- them" situation that, at Sun, has usually
resulted in the failure of the process.
Much better to have the community "be" the ARC process, such that,
as they make the decisions relating to development, they do so by
using the ARC process itself.
I'm going to try to annotate your example above with "ARC Review"
notations to show that they can - and do - work well together:
----------------------
> Alternatively, OpenSolaris could give development autonomy to the
> communities,
Assuming that, in this context, a community maps to a consolidation
level entity and not the much smaller project team level, the
community entertains proposals for changes. Anyone who wishes to
embark on a project that will integrate into the community's codebase
needs to submit a change request proposal. Sometimes those proposals
are in the form of bug reports or RFE's, other times they are in other
forms (management mandates, marketing requirements, ...). The key
is that the proposal be a "plan" and not simply a "half baked idea".
One simple way of telling them apart is:
*I* am going to do this <plan> -vs-
*Someone else* should do <this idea>
As these proposals come in, they are evaluated. Some are below the
architectural radar screen, and so vector immediately to design and
implementation, with peer-, design- and code- reviews as required by
the community. Think "simple bugfixes".
Others have an impact on the architecture of the component. If that
impact is simple, routine and/or obvious (i.e., adding yet another
network card driver), for which there is an existing best practice
or design pattern to follow, it should be easy to get an ARChitectural
OK.
This leaves the things that are complex or risky. Things that may
impact other consolidations or consumers. These proposals may need to
be exposed to other communities so that the change can be coordinated
across them all. These proposals are the ones that take the longest
to go from "idea" to "prototype" to "finally have something that we
are willing to commit to". That "commitment" should be exactly equal to
"ARC review resulting in approval".
> wherein technical development, discussion of alternatives,
> getting it to work, and testing can all take place independent of
> any ARC review.
The discussions along this timeline *are* the ones that should happen
at the ARC level, simply because the ARC is where to find those other
participants. That is, let's define "ARC Review" as "getting the right
people together to make a decision about this particular architectural
change request".
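
The three-way sorting of incoming proposals described above (below the radar, routine, complex or risky) can be sketched as a small Python function. This is only an illustration under my own assumptions: the Proposal fields, the Track names and the exact tests are hypothetical and are not a statement of how the ARC actually records or classifies cases.

```python
from dataclasses import dataclass
from enum import Enum


class Track(Enum):
    DESIGN_AND_CODE_REVIEW = "below the architectural radar"
    FAST_ARC_OK = "routine architectural change"
    FULL_ARC_REVIEW = "complex or risky change"


@dataclass
class Proposal:
    title: str
    touches_public_interfaces: bool             # adds or changes interfaces others consume
    follows_known_pattern: bool = False         # e.g. yet another network card driver
    affects_other_consolidations: bool = False  # needs coordination across communities


def triage(p: Proposal) -> Track:
    # No public interface impact: straight to design and implementation,
    # with whatever peer and code reviews the community requires.
    if not p.touches_public_interfaces:
        return Track.DESIGN_AND_CODE_REVIEW
    # Architectural impact, but simple and covered by an existing best practice.
    if p.follows_known_pattern and not p.affects_other_consolidations:
        return Track.FAST_ARC_OK
    # Everything else gets the right people together for a real discussion.
    return Track.FULL_ARC_REVIEW


bugfix = Proposal("fix off-by-one in a private helper", touches_public_interfaces=False)
driver = Proposal("new NIC driver", touches_public_interfaces=True, follows_known_pattern=True)
risky = Proposal("change a library ABI", touches_public_interfaces=True,
                 affects_other_consolidations=True)

for proposal in (bugfix, driver, risky):
    print(proposal.title, "->", triage(proposal).value)
```

Only the last track is expensive, and it is exactly the one where early exposure to other consumers pays for itself.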
> ARC review isn't needed until the community wishes
> to apply the completed work to a stable release branch, at which
> point the community product does need to adhere to the particular
> interface requirements for that branch.
Waiting 'till the project has reached internal commitment before
exposing it to others means that, in practice, the developing
team will be unwilling or unable to make any substantive changes
that result from such a review. This tends to make such reviews
pointless.
> [They are, of course, aware
> of those requirements during the whole process, and thus will have
> designed and developed for a particular set of branches.]
This may or may not be true; again, in practice at Sun, we find that
most project teams over-optimize for their own short term requirements
while tending to ignore (or even intentionally "break") things that
impact systems integration, compatibility and interoperability with
other consumers. ("We think our customers are OK with having to
recompile on every release...") Certainly, customer focused design
methodologies can be expected to do a better job here IFF they are
effectively able to engage with real customers.
-------------------
Continuing on,
It makes more sense for an OpenSolaris community to simply create
a directory on an unstable branch and iteratively design-develop-test
until complete enough for a release.
We may be stumbling over the operational definition of "community".
Can you give a couple of examples of what you are thinking of?
If these branches are fine grained, we may be in violent agreement.
A discussion should ensue on
what (more stable) branches it should be ported. Finally, the
community votes based on what they want in the branch.
I see this somewhat differently, though with the same end results:
The output of an ARC review is an opinion that says
This <list of interfaces> is new, and has <these stability levels>
This project made incompatible changes to <this other list>,
These lists dictate that the proposed change can only be applied to
a {major, minor, micro (pick one)} release of the consolidation.
The community then maps this change to the set of branches that it
currently has open by looking at the release taxonomy (major, minor,
micro) and determines which of the eligible ones it wants the change
to go into. E.g.,
My proposal makes a compatible Stable addition, but
changes an existing Unstable interface. As such,
it is not eligible for inclusion in a Micro release,
since Micro releases guarantee that no Stable or Unstable
interfaces will break.
My community has a Major (v2.0) and a Minor (v1.4) branch
open. Because we have adopted a "no lost features" requirement,
I am instructed to integrate my change first into the Major
workspace and then into the Minor one. This ensures that
users who get this minor release, and then later upgrade
to the next major one won't see a feature regression for my stuff.
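
The mapping from an ARC opinion to eligible branches can be sketched as follows. This is a simplification under stated assumptions, not the actual taxonomy: it models only two stability levels (Stable and Unstable), treats compatible additions as acceptable in any release type, and uses hypothetical names (Change, minimum_release, integration_order) and made-up branch names. The two rules it does encode come from the discussion above: an incompatible change to a Stable interface needs a Major release, and a Micro release breaks neither Stable nor Unstable interfaces.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List


class Release(IntEnum):
    MICRO = 1
    MINOR = 2
    MAJOR = 3


@dataclass(frozen=True)
class Change:
    interface: str
    stability: str       # "Stable" or "Unstable" (simplified, assumed labels)
    incompatible: bool   # False means a compatible addition or change


def minimum_release(changes: List[Change]) -> Release:
    """Smallest release type that may accept this set of changes."""
    floor = Release.MICRO
    for c in changes:
        if not c.incompatible:
            continue  # assumption: compatible additions never raise the floor
        needed = Release.MAJOR if c.stability == "Stable" else Release.MINOR
        floor = max(floor, needed)
    return floor


def integration_order(changes: List[Change], open_branches: Dict[str, Release]) -> List[str]:
    """Eligible open branches, Major first, so a later upgrade loses no features."""
    floor = minimum_release(changes)
    eligible = [(kind, name) for name, kind in open_branches.items() if kind >= floor]
    return [name for kind, name in sorted(eligible, reverse=True)]


proposal = [
    Change("new_feature_api", "Stable", incompatible=False),
    Change("old_tuning_knob", "Unstable", incompatible=True),
]
branches = {"v2.0": Release.MAJOR, "v1.4": Release.MINOR, "v1.3.1": Release.MICRO}

print(minimum_release(proposal).name)
print(integration_order(proposal, branches))
```

Whether the community actually wants the change in each eligible branch is a separate decision; the sketch only narrows down where it is allowed to go.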
The requirement is simply that, sometime during this extended discussion (from
the initial exposure of the proposal up to the point where it is ready for
integration), the community needs to decide if they really "want" the proposed
change.
Within Sun Engineering, advance review and careful description of
customer benefits are necessary because resources are limited and
need to be allocated. In collaborative open source, resources are
only limited by the number of interested volunteers and the entry
barriers they need to overcome before contributing.
Right. The mechanics of answering the "do we want it" question change.
(although, it never was as simplistic as just "people resources"; the
"want" question is also influenced by customer requirements, the
competition, how much this project overlaps some other project, ...)
In the new OpenSolaris community, it may be more like "you need to
have at least 3 people working on it", or "it got enough votes", or
even "it fit within <these> boundary conditions".
Description
and review of plans is always beneficial, but those descriptions
and plans can and should be adapted as development on the unstable
branch proceeds.
Certainly plans should change as needed. The only question is whether
we should be proactive or reactive in understanding the risks and
constraints associated with them as they change. Or, in other words,
it is all about expectation management. "What did we promise, what
are we doing that impacts those promises?"
The problem with having a "single, wide-open development branch"
for everyone to share is that you can't easily filter out the
incomplete or unwanted stuff once it has polluted the codebase.
Effectively, you get a roach motel - cruft can check in, but it
never gets out.
Design review ...
... is one level below the scope of Architectural review.
With a reasonably sane handle on architectural change, it
is relatively risk-free to delegate all such design and
implementation changes to the people doing the work - at least
as long as they, in turn, agree to re-engage at the "ARC"
level if/when they find that they need to break a previous
ARC commitment in the process of doing their project.
Sun can also choose to release only stable
components within its proprietary releases of Solaris.
If the OpenSolaris core that Solaris is built upon remains stable,
this would be a low-cost proposition. If not, if the Open
Solaris community abandons stability in favor of chaotic change,
then the cost of doing so becomes that of maintaining a full and
permanent fork from the rest of the community, and we are back
where we were 6 months ago. In my mind, this would be a failure.
OpenSolaris must allow projects to proceed independent of the
business requirements of any single contributor, even when one
of those business requirements happens to be interface
compatibility with SunOS 5.0.0.
Not a business requirement, but a technical/architectural one
arrived at by the closure of
All the interfaces in OpenSolaris evolving at the rates
defined by their promised stability levels.
The business decision is that (IMHO) Sun does not intend to open
a consolidation that would be a Major release of ON, and so
is "unwilling" in general to consider incompatible changes to
existing Stable interfaces. (Note that part of Sun's ARC process
is the application of "engineering common sense"; it is possible
to have incompatible changes on an exception basis. Security fixes,
component end-of-life, etc. We don't do stupid things simply to
follow "the process" - or at least I hope we don't!)
That cannot be an OpenSolaris
requirement because it prevents the community from building new
things that may never be part of SunOS.
I'm not sure how you get from point A to point B here.
All the ARC process requires is that you set expectations and then
live up to them. OpenSolaris is starting its life with a bunch of
those expectations "already set"; it is free to set whatever new
expectations it wants on new stuff that goes into the system.
Whether or not those new things make it into the Solaris distro is
orthogonal to this architecture discussion.
You
can never predict when a feature will become so compelling to
new customers that it will justify a SunOS 6,
By definition, new features do not trigger the need for a major
release; it is only incompatible changes to existing components
that might do so. And, when one is paying proactive attention to
such details, it is often easy to architect and design these
changes such that they remain compatible, thus precluding the
need for an incompatible change in the first place.
The only other alternative would be for OpenSolaris to become
a non-collaborative project...
This sounds a bit like "I won't even consider alternatives because
my mind is already made up". I hope I'm just missing some nuance
because of the email medium...
-John Plocher
Keeper of Sun's ARC process and small time process weenie
_______________________________________________
opensolaris-discuss mailing list