You have probably heard by now that there is a discussion going on in
the Hadoop PMC as to whether a number of the subprojects (HBase, Avro,
ZooKeeper, Hive, and Pig) should move out from under the Hadoop
umbrella and become top-level Apache projects (TLPs). This discussion
has picked up recently since the Apache board has clearly communicated
to the Hadoop PMC that it is concerned that Hadoop is acting as an
umbrella project with many disjoint subprojects underneath it. They
are concerned that this gives Apache little insight into the health
and happenings of the subproject communities, which in turn means
Apache cannot properly mentor those communities.
The purpose of this email is to start a discussion within the
ZooKeeper community about this topic. Let me cover first what becoming
TLP would mean for ZooKeeper, and then I'll go into what options I
think we as a community have.
Becoming a TLP would mean that ZooKeeper would itself have a PMC that
would report directly to the Apache board. Who would be on the PMC
would be something we as a community would need to decide. Common
options would be to say all active committers are on the PMC, or all
active committers who have been a committer for at least a year. We
would also need to elect a chair of the PMC. This lucky person would
have no additional power, but would have the additional responsibility
of writing quarterly reports on ZooKeeper's status for Apache board
meetings, as well as coordinating with Apache to get accounts for new
committers, etc. We currently submit these same reports; however, they
are forwarded to the board through the Hadoop PMC Chair. For more
Becoming a TLP would not mean that we are ostracized from the Hadoop
community. We would continue to be invited to Hadoop Summits, HUGs,
I see three ways that we as a community can respond to this:
1) Say yes, we want to be a TLP now.
2) Say yes, we want to be a TLP, but not yet. We feel we need more
time to mature. If we choose this option we need to be able to clearly
articulate how much time we need and what we hope to see change in
3) Say no, we feel the benefits for us staying with Hadoop outweigh
the drawbacks of being a disjoint subproject. If we choose this, we
need to be able to say exactly what those benefits are and why we feel
they will be compromised by leaving the Hadoop project.
There may be other options that I haven't thought of. Please feel free to
suggest any you think of.
Here are the thoughts I've formed so far on the subject:
Benefits of moving to TLP:
a) Here's the board's view as communicated to me:
"we're looking to ensure that proper and effective oversight is
reached, and umbrellas can get in the way of that. If you *also* think
that all of your communities have proper oversight, and that you're
communicating enough about each/all of them to the Board, so that *it*
can provide oversight, then that's just fine. Go do the review and
come back and say, "we're all good. no changes are necessary.""
b) setting our own course - we would have our own PMC and therefore
have more latitude (within the Apache rules of course) in setting
direction. PMC members would be focused on ZooKeeper exclusively.
Serious reservations I personally have with a move to TLP today:
a) I do not think ZooKeeper currently has a sufficiently large and
diverse community to fend for itself as a TLP. Our community is
working hard to establish a critical mass, but given our maturity
level, the complexity of the code, and the stakes involved (ZK is
literally the linchpin of many of our users' computing
infrastructures), it has been hard to attract/promote developers. We
currently have 5 active committers, 4 from one company and 1 from
a separate one (who only recently joined the committer ranks). The
board has stated they are willing to break their own rules here (form
a TLP with less than acceptable diversity); however, I don't believe that
would be prudent from our perspective.
b) Loss of branding and discoverability - "in the land of the cloud
the elephant is king". IMO being associated with Hadoop is a huge win
for us in terms of branding and discoverability. This is similar to
the benefit we get from being an Apache project. People who are serious
about the cloud need to look at Hadoop. In the process they discover
c) "if it ain't broke, don't fix it". I have frequent interactions with
Hadoop PMC/Chair and an Apache board member. We are getting excellent
representation through this process and I don't see how visibility
"up" or support "down" could be improved.
Questions? Thoughts? Rebuttal? Let the discussion begin.