Re: [RESULT] [VOTE] Designating maintainers for some Spark components

2014-11-30 Thread Matei Zaharia
An update on this: After adding the initial maintainer list, we got feedback to 
add more maintainers for some components, so we added four others (Josh Rosen 
for core API, Mark Hamstra for scheduler, Shivaram Venkataraman for MLlib and 
Xiangrui Meng for Python). We also decided to lower the timeout for waiting 
for a maintainer to a week. Hopefully this will provide more options for 
reviewing in these components.

The complete list is available at 
https://cwiki.apache.org/confluence/display/SPARK/Committers.

Matei



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Designating maintainers for some Spark components

2014-11-11 Thread Yu Ishikawa
+1 (binding) 

On Wed, Nov 5, 2014 at 5:33 PM, Matei Zaharia [hidden email] 
wrote: 

 BTW, my own vote is obviously +1 (binding). 
 
 Matei 
 
  On Nov 5, 2014, at 5:31 PM, Matei Zaharia [hidden email] 
 wrote: 
  
  Hi all, 
  
  I wanted to share a discussion we've been having on the PMC list, as well as
 call for an official vote on it on a public list. Basically, as the Spark
 project scales up, we need to define a model to make sure there is still
 great oversight of key components (in particular internal architecture and
 public APIs), and to this end I've proposed implementing a maintainer model
 for some of these components, similar to other large projects.
  
  As background on this, Spark has grown a lot since joining Apache. We've had
 over 80 contributors/month for the past 3 months, which I believe makes us
 the most active project in contributors/month at Apache, as well as over 500
 patches/month. The codebase has also grown significantly, with new libraries
 for SQL, ML, graphs and more.
  
  In this kind of large project, one common way to scale development is to 
 assign maintainers to oversee key components, where each patch to that 
 component needs to get sign-off from at least one of its maintainers. Most 
 existing large projects do this -- at Apache, some large ones with this 
 model are CloudStack (the second-most active project overall), Subversion, 
 and Kafka, and other examples include Linux and Python. This is also 
 by-and-large how Spark operates today -- most components have a de-facto 
 maintainer. 
  
  IMO, adopting this model would have two benefits: 
  
  1) Consistent oversight of design for that component, especially regarding
 architecture and API. This process would ensure that the component's
 maintainers see all proposed changes and consider how they fit together.
  
  2) More structure for new contributors and committers -- in particular, 
 it would be easy to look up who’s responsible for each module and ask them 
 for reviews, etc, rather than having patches slip between the cracks. 
  
  We'd like to start with this in a light-weight manner, where the model only
 applies to certain key components (e.g. scheduler, shuffle) and user-facing
 APIs (MLlib, GraphX, etc). Over time, as the project grows, we can expand it
 if we deem it useful. The specific mechanics would be as follows:
  
  - Some components in Spark will have maintainers assigned to them, where 
 one of the maintainers needs to sign off on each patch to the component. 
  - Each component with maintainers will have at least 2 maintainers. 
  - Maintainers will be assigned from the most active and knowledgeable 
 committers on that component by the PMC. The PMC can vote to add / remove 
 maintainers, and maintained components, through consensus. 
  - Maintainers are expected to be active in responding to patches for their
 components, though they do not need to be the main reviewers for them (e.g.
 they might just sign off on architecture / API). To prevent inactive
 maintainers from blocking the project, if a maintainer isn't responding in a
 reasonable time period (say 2 weeks), other committers can merge the patch,
 and the PMC will want to discuss adding another maintainer.
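The sign-off mechanics above amount to a small decision rule. As a minimal sketch (assumptions: the component map is an illustrative subset of the proposal's list, and the 14-day constant stands in for the proposal's "reasonable time period"):

```python
# Hypothetical sketch of the proposed merge rule; the proposal is prose,
# so all names here are invented for illustration.

MAINTAINERS = {
    # Illustrative subset of the proposed assignments
    "scheduler": {"Matei", "Kay", "Patrick"},
    "mllib": {"Xiangrui", "Matei"},
}

TIMEOUT_DAYS = 14  # the proposal's "reasonable time period (say 2 weeks)"

def can_merge(component, sign_offs, days_waiting):
    """Return True if a patch to `component` may be merged."""
    maintainers = MAINTAINERS.get(component)
    if not maintainers:
        # Unmaintained components keep the normal committer process.
        return True
    if maintainers & set(sign_offs):
        # At least one maintainer signed off.
        return True
    # Maintainers unresponsive past the timeout: other committers can merge
    # (and the PMC would discuss adding another maintainer).
    return days_waiting > TIMEOUT_DAYS

print(can_merge("scheduler", ["Kay"], 0))   # maintainer sign-off -> True
print(can_merge("mllib", ["Patrick"], 3))   # no sign-off yet -> False
print(can_merge("mllib", [], 15))           # timeout elapsed -> True
```

Note that this rule removes no one's -1: it only adds a required approval on top of the usual consensus process.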
  
  If you'd like to see examples for this model, check out the following
 projects:
  - CloudStack:
 https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
  - Subversion:
 https://subversion.apache.org/docs/community-guide/roles.html
  
  Finally, I wanted to list our current proposal for initial components and
 maintainers. It would be good to get feedback on other components we might
 add, but please note that personnel discussions (e.g. I don't think Matei
 should maintain *that* component) should only happen on the private list.
 The initial components were chosen to include all public APIs and the main
 core components, and the maintainers were chosen from the most active
 contributors to those modules.
  
  - Spark core public API: Matei, Patrick, Reynold 
  - Job scheduler: Matei, Kay, Patrick 
  - Shuffle and network: Reynold, Aaron, Matei 
  - Block manager: Reynold, Aaron 
  - YARN: Tom, Andrew Or 
  - Python: Josh, Matei 
  - MLlib: Xiangrui, Matei 
  - SQL: Michael, Reynold 
  - Streaming: TD, Matei 
  - GraphX: Ankur, Joey, Reynold 
  
  I'd like to formally call a [VOTE] on this model, to last 72 hours. The 
 [VOTE] will end on Nov 8, 2014 at 6 PM PST. 
  
  Matei 
 
 



--
Yu Ishikawa

[RESULT] [VOTE] Designating maintainers for some Spark components

2014-11-08 Thread Matei Zaharia
Thanks everyone for voting on this. With all of the PMC votes being for, the 
vote passes, but there were some concerns that I wanted to address for everyone 
who brought them up, as well as in the wording we will use for this policy.

First, like every Apache project, Spark follows the Apache voting process 
(http://www.apache.org/foundation/voting.html), wherein all code changes are 
done by consensus. This means that any PMC member can block a code change on 
technical grounds, and thus that there is consensus when something goes in. 
It's absolutely true that every PMC member is responsible for the whole 
codebase, as Greg said (not least for legal reasons, e.g. making sure it 
complies with licensing rules), and this idea will not change that. To make this 
clear, I will include that in the wording on the project page, to make sure new 
committers and other community members are all aware of it.

What the maintainer model does, instead, is to change the review process, by 
having a required review from some people on some types of code changes 
(assuming those people respond in time). Projects can have their own diverse 
review processes (e.g. some do commit-then-review and others do 
review-then-commit, some point people to specific reviewers, etc). This kind of 
process seems useful to try (and to refine) as the project grows. We will of 
course evaluate how it goes and respond to any problems.

So to summarize,

- Every committer is responsible for, and more than welcome to review and vote 
on, every code change. In fact all community members are welcome to do this, 
and lots are doing it.
- Everyone has the same voting rights on these code changes (namely consensus 
as described at http://www.apache.org/foundation/voting.html)
- Committers will be asked to run patches that make architectural or API 
changes by the maintainers before merging.

In practice, none of this matters too much because we are not exactly a 
hotbed of discord ;), and even in the case of discord, the point of the ASF 
voting process is to create consensus. The goal is just to have a better 
structure for reviewing and minimize the chance of errors.

Here is a tally of the votes:

Binding votes (from PMC): 17 +1, no 0 or -1

Matei Zaharia
Michael Armbrust
Reynold Xin
Patrick Wendell
Andrew Or
Prashant Sharma
Mark Hamstra
Xiangrui Meng
Ankur Dave
Imran Rashid
Jason Dai
Tom Graves
Sean McNamara
Nick Pentreath
Josh Rosen
Kay Ousterhout
Tathagata Das

Non-binding votes: 18 +1, one +0, one -1

+1:
Nan Zhu
Nicholas Chammas
Denny Lee
Cheng Lian
Timothy Chen
Jeremy Freeman
Cheng Hao
Jackylk Likun
Kousuke Saruta
Reza Zadeh
Xuefeng Wu
Witgo
Manoj Babu
Ravindra Pesala
Liquan Pei
Kushal Datta
Davies Liu
Vaquar Khan

+0: Corey Nolet

-1: Greg Stein

I'll send another email when I have a more detailed writeup of this on the 
website.

Matei



Re: [VOTE] Designating maintainers for some Spark components

2014-11-07 Thread Kay Ousterhout
+1 (binding)

I see this as a way to increase transparency and efficiency around a
process that already informally exists, with benefits to both new
contributors and committers.  For new contributors, it makes clear who they
should ping about a pending patch.  For committers, it's a good reference
for who to rope in if they're reviewing a change that touches code they're
unfamiliar with.  I've often found myself in that situation when doing a
review; for me, having this list would be quite helpful.

-Kay

On Thu, Nov 6, 2014 at 10:00 AM, Josh Rosen rosenvi...@gmail.com wrote:

 +1 (binding).

 (our pull request browsing tool is open-source, by the way; contributions
 welcome: https://github.com/databricks/spark-pr-dashboard)
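Dashboards like this typically route a pull request to components from its changed file paths. A rough sketch of that routing idea (the prefix map and helper are hypothetical, not taken from spark-pr-dashboard's actual code):

```python
# Hypothetical path-prefix routing for PRs; the mapping is illustrative.
PATH_COMPONENTS = [
    ("mllib/", "MLlib"),
    ("graphx/", "GraphX"),
    ("python/", "Python"),
    ("sql/", "SQL"),
    ("core/", "Spark Core"),
]

def components_for(changed_files):
    """Return the set of components a PR touches, by file-path prefix."""
    found = set()
    for path in changed_files:
        for prefix, component in PATH_COMPONENTS:
            if path.startswith(prefix):
                found.add(component)
                break
    return found

# A PR touching both Python and GraphX would show up for the maintainers
# of each component it touches.
print(sorted(components_for(["python/pyspark/graphx.py",
                             "graphx/src/main/scala/Graph.scala"])))
# -> ['GraphX', 'Python']
```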

 On Thu, Nov 6, 2014 at 9:28 AM, Nick Pentreath nick.pentre...@gmail.com
 wrote:

  +1 (binding)
 
  —
  Sent from Mailbox
 
  On Thu, Nov 6, 2014 at 6:52 PM, Debasish Das debasish.da...@gmail.com
  wrote:
 
   +1
   The app to track PRs based on component is a great idea...
   On Thu, Nov 6, 2014 at 8:47 AM, Sean McNamara sean.mcnam...@webtrends.com wrote:
   +1
  
   Sean
  

Re: [VOTE] Designating maintainers for some Spark components

2014-11-07 Thread Davies Liu
-1 (not binding, +1 for maintainer, -1 for sign off)

Agree with Greg and Vinod. In the beginning, everything is better (more
efficient, more focused), but after some time, fighting begins.

Code style is the hottest topic to fight over (we have already seen it in
some PRs). If two committers (one of them a maintainer) cannot reach
agreement on code style, then before this process they would ask for comments
from other committers; after this process, the maintainer has the
higher-priority -1, so the maintainer can keep his/her personal preference
and agreement becomes hard to reach. Eventually, different components will
have different code styles (or other divergences).

Right now, maintainers are a kind of first or best contact, the best person
to review a PR in that component. We could announce the list, and then new
contributors could easily find the right person to review.

My 2 cents.

Davies


On Thu, Nov 6, 2014 at 11:43 PM, Vinod Kumar Vavilapalli
vino...@apache.org wrote:
 With the maintainer model, the process is as follows:

 - Any committer could review the patch and merge it, but they would need to 
 forward it to me (or another core API maintainer) to make sure we also 
 approve
 - At any point during this process, I could come in and -1 it, or give 
 feedback
 - In addition, any other committer beyond me is still allowed to -1 this 
 patch

 The only change in this model is that committers are responsible to forward 
 patches in these areas to certain other committers. If every committer had 
 perfect oversight of the project, they could have also seen every patch to 
 their component on their own, but this list ensures that they see it even if 
 they somehow overlooked it.


 Having done the job of playing an informal 'maintainer' of a project myself, 
 this is what I think you really need:

 The so called 'maintainers' do one of the below
  - Actively poll the lists and watch over contributions. And follow what is 
 repeated often around here: Trust but verify.
  - Set up automated mechanisms to send all bug-tracker updates of a specific 
 component to a list that people can subscribe to

 And/or
  - Individual contributors send review requests to unofficial 'maintainers' 
 over dev-lists or through tools. Like many projects do with review boards and 
 other tools.
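For a JIRA-based tracker like Spark's, the "automated mechanisms" suggestion above could be as simple as a saved filter whose results are mailed to a subscribable list. A hedged sketch (the JQL follows standard JIRA query syntax, but the project and component names here are assumptions):

```python
def component_updates_jql(project, component, days=1):
    """Build a JQL query matching recent updates to one component.

    Saving this as a JIRA filter and subscribing a mailing list to it
    would mail out matching updates periodically.
    """
    return (
        f'project = {project} AND component = "{component}" '
        f'AND updated >= -{days}d ORDER BY updated DESC'
    )

print(component_updates_jql("SPARK", "MLlib"))
# -> project = SPARK AND component = "MLlib" AND updated >= -1d ORDER BY updated DESC
```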

 Note that none of the above is a required step. It must not be, that's the 
 point. But once set as a convention, they will all help you address your 
 concerns with project scalability.

 Anything else that you add is bestowing privileges on a select few and 
 forming dictatorships. And contrary to what the proposal claims, this is 
 neither scalable nor conforming to Apache governance rules.

 +Vinod




Re: [VOTE] Designating maintainers for some Spark components

2014-11-07 Thread Tathagata Das
+1 (binding)

I agree with the proposal that it just formalizes what we have been
doing till now, and will increase the efficiency and focus of the
review process.

To address Davies' concern, I agree coding style is often a hot topic
of contention. But that is just an indication that our processes are
not perfect and we have much room to improve (which is what this
proposal is all about). Regarding the specific case of coding style,
we should all get together, discuss, and make our coding style guide
more comprehensive so that such concerns can be dealt with once and
not be a recurring concern. And that guide will override anyone's
personal preference, be it the maintainer's or a new committer's.

TD





Re: [VOTE] Designating maintainers for some Spark components

2014-11-07 Thread Davies Liu
Sorry for my last email; I misunderstood the proposal here. All the
committers still have an equal -1 on all code changes.

Also, as mentioned in the proposal, the sign-off only applies to public APIs
and architecture; things like discussions about code style stay the same.

So I revert my vote to +1. Sorry for this.

Davies





Re: [VOTE] Designating maintainers for some Spark components

2014-11-07 Thread vaquar khan
+1 (binding)




Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Ankur Dave
+1 (binding)

Ankur http://www.ankurdave.com/

On Wed, Nov 5, 2014 at 5:31 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:

 I'd like to formally call a [VOTE] on this model, to last 72 hours. The
 [VOTE] will end on Nov 8, 2014 at 6 PM PST.



Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Kushal Datta
+1 (binding)

For tickets that span multiple components, will they need to be approved by
all of the relevant maintainers? For example, I'm working on the Python
bindings for GraphX, where code is added to both the Python and GraphX modules.

Thanks,
-Kushal.




Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Imran Rashid
+1 overall

Also +1 to Sandy's suggestion of adding build maintainers as well.

On Wed, Nov 5, 2014 at 7:57 PM, Sandy Ryza sandy.r...@cloudera.com wrote:

 This seems like a good idea.

 An area that wasn't listed, but that I think could strongly benefit from
 maintainers, is the build.  Having consistent oversight over Maven, SBT,
 and dependencies would allow us to avoid subtle breakages.

 Component maintainers have come up several times within the Hadoop project,
 and I think one of the main reasons the proposals have been rejected is
 that, structurally, its effect is to slow down development.  As you
 mention, this is somewhat mitigated if being a maintainer leads committers
 to take on more responsibility, but it might be worthwhile to draw up more
 specific ideas on how to combat this. E.g., do obvious changes, doc fixes,
 test fixes, etc. always require a maintainer?

 -Sandy

 On Wed, Nov 5, 2014 at 5:36 PM, Michael Armbrust mich...@databricks.com
 wrote:

  +1 (binding)
 
  On Wed, Nov 5, 2014 at 5:33 PM, Matei Zaharia matei.zaha...@gmail.com
  wrote:
 
   BTW, my own vote is obviously +1 (binding).
  
   Matei
  
On Nov 5, 2014, at 5:31 PM, Matei Zaharia matei.zaha...@gmail.com
   wrote:
   
Hi all,
   
I wanted to share a discussion we've been having on the PMC list, as
   well as call for an official vote on it on a public list. Basically, as
  the
   Spark project scales up, we need to define a model to make sure there
 is
   still great oversight of key components (in particular internal
   architecture and public APIs), and to this end I've proposed
  implementing a
   maintainer model for some of these components, similar to other large
   projects.
   
As background on this, Spark has grown a lot since joining Apache.
  We've
   had over 80 contributors/month for the past 3 months, which I believe
  makes
   us the most active project in contributors/month at Apache, as well as
  over
   500 patches/month. The codebase has also grown significantly, with new
   libraries for SQL, ML, graphs and more.
   
In this kind of large project, one common way to scale development is
  to
   assign maintainers to oversee key components, where each patch to
 that
   component needs to get sign-off from at least one of its maintainers.
  Most
   existing large projects do this -- at Apache, some large ones with this
   model are CloudStack (the second-most active project overall),
  Subversion,
   and Kafka, and other examples include Linux and Python. This is also
   by-and-large how Spark operates today -- most components have a
 de-facto
   maintainer.
   
IMO, adopting this model would have two benefits:
   
1) Consistent oversight of design for that component, especially
   regarding architecture and API. This process would ensure that the
   component's maintainers see all proposed changes and consider them to
 fit
   together in a good way.
   
2) More structure for new contributors and committers -- in
 particular,
   it would be easy to look up who’s responsible for each module and ask
  them
   for reviews, etc, rather than having patches slip between the cracks.
   
 We'd like to start with this in a light-weight manner, where the model
 only
   applies to certain key components (e.g. scheduler, shuffle) and
  user-facing
   APIs (MLlib, GraphX, etc). Over time, as the project grows, we can
 expand
   it if we deem it useful. The specific mechanics would be as follows:
   
- Some components in Spark will have maintainers assigned to them,
  where
   one of the maintainers needs to sign off on each patch to the
 component.
- Each component with maintainers will have at least 2 maintainers.
- Maintainers will be assigned from the most active and knowledgeable
   committers on that component by the PMC. The PMC can vote to add /
 remove
   maintainers, and maintained components, through consensus.
- Maintainers are expected to be active in responding to patches for
   their components, though they do not need to be the main reviewers for
  them
   (e.g. they might just sign off on architecture / API). To prevent
  inactive
   maintainers from blocking the project, if a maintainer isn't responding
  in
   a reasonable time period (say 2 weeks), other committers can merge the
   patch, and the PMC will want to discuss adding another maintainer.
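In code form, the sign-off mechanics above might look like the following
sketch (a hypothetical illustration only -- the component names, the
MAINTAINERS table, and the can_merge helper are made up for the example,
not actual project tooling):

```python
from datetime import datetime, timedelta

# Hypothetical maintainer table; the real list is decided by the PMC.
MAINTAINERS = {
    "scheduler": {"matei", "kay", "patrick"},
    "shuffle": {"reynold", "aaron", "matei"},
}

# "reasonable time period (say 2 weeks)" from the proposal
MAINTAINER_TIMEOUT = timedelta(weeks=2)

def can_merge(component, approvals, opened, now):
    """A patch to a maintained component merges once a maintainer signs
    off, or once the timeout elapses with no maintainer response."""
    maintainers = MAINTAINERS.get(component)
    if maintainers is None:
        return True  # unmaintained components keep the existing process
    if maintainers & set(approvals):
        return True  # at least one maintainer signed off
    # inactive-maintainer escape hatch: other committers may merge
    return now - opened >= MAINTAINER_TIMEOUT

opened = datetime(2014, 11, 1)
print(can_merge("scheduler", {"kay"}, opened, datetime(2014, 11, 2)))   # True
print(can_merge("scheduler", {"josh"}, opened, datetime(2014, 11, 2)))  # False
print(can_merge("scheduler", {"josh"}, opened, datetime(2014, 11, 20))) # True
```

Note how the timeout keeps an unresponsive maintainer from blocking a patch indefinitely, while still routing every patch past a maintainer first.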
   
If you'd like to see examples for this model, check out the following
   projects:
 - CloudStack:
   https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
 - Subversion:
   https://subversion.apache.org/docs/community-guide/roles.html
   
Finally, I wanted to list our current proposal for initial components
   and maintainers. It would be good to get feedback on other components
 we
   might add, 

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Jason Dai
+1 (binding)

On Thu, Nov 6, 2014 at 4:02 PM, Ankur Dave ankurd...@gmail.com wrote:

 +1 (binding)

 Ankur http://www.ankurdave.com/

 On Wed, Nov 5, 2014 at 5:31 PM, Matei Zaharia matei.zaha...@gmail.com
 wrote:

  I'd like to formally call a [VOTE] on this model, to last 72 hours. The
  [VOTE] will end on Nov 8, 2014 at 6 PM PST.
 



Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread RJ Nowling
Matei,

I saw that you're listed as a maintainer for ~6 different subcomponents,
and on over half of those, you're only the 2nd person.  My concern is that
you would be stretched thin and maybe wouldn't be able to act as a backup
up on all of those subcomponents.  Are you planning on adding more
maintainers for each subcomponent?  I think it would be good to have 2
regulars + backups for each.

RJ

On Thu, Nov 6, 2014 at 8:48 AM, Jason Dai jason@gmail.com wrote:

 +1 (binding)

 On Thu, Nov 6, 2014 at 4:02 PM, Ankur Dave ankurd...@gmail.com wrote:

  +1 (binding)
 
  Ankur http://www.ankurdave.com/
 
  On Wed, Nov 5, 2014 at 5:31 PM, Matei Zaharia matei.zaha...@gmail.com
  wrote:
 
   I'd like to formally call a [VOTE] on this model, to last 72 hours. The
   [VOTE] will end on Nov 8, 2014 at 6 PM PST.
  
 




-- 
em rnowl...@gmail.com
c 954.496.2314


Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Tom Graves
+1.
Tom 

 On Wednesday, November 5, 2014 9:21 PM, Matei Zaharia 
matei.zaha...@gmail.com wrote:
   

 BTW, my own vote is obviously +1 (binding).

Matei

 On Nov 5, 2014, at 5:31 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
 
 Hi all,
 
 I wanted to share a discussion we've been having on the PMC list, as well as 
 call for an official vote on it on a public list. Basically, as the Spark 
 project scales up, we need to define a model to make sure there is still 
 great oversight of key components (in particular internal architecture and 
 public APIs), and to this end I've proposed implementing a maintainer model 
 for some of these components, similar to other large projects.
 
 As background on this, Spark has grown a lot since joining Apache. We've had 
 over 80 contributors/month for the past 3 months, which I believe makes us 
 the most active project in contributors/month at Apache, as well as over 500 
 patches/month. The codebase has also grown significantly, with new libraries 
 for SQL, ML, graphs and more.
 
 In this kind of large project, one common way to scale development is to 
 assign maintainers to oversee key components, where each patch to that 
 component needs to get sign-off from at least one of its maintainers. Most 
 existing large projects do this -- at Apache, some large ones with this model 
 are CloudStack (the second-most active project overall), Subversion, and 
 Kafka, and other examples include Linux and Python. This is also by-and-large 
 how Spark operates today -- most components have a de-facto maintainer.
 
 IMO, adopting this model would have two benefits:
 
 1) Consistent oversight of design for that component, especially regarding 
 architecture and API. This process would ensure that the component's 
 maintainers see all proposed changes and consider them to fit together in a 
 good way.
 
 2) More structure for new contributors and committers -- in particular, it 
 would be easy to look up who’s responsible for each module and ask them for 
 reviews, etc, rather than having patches slip between the cracks.
 
  We'd like to start with this in a light-weight manner, where the model only 
 applies to certain key components (e.g. scheduler, shuffle) and user-facing 
 APIs (MLlib, GraphX, etc). Over time, as the project grows, we can expand it 
 if we deem it useful. The specific mechanics would be as follows:
 
 - Some components in Spark will have maintainers assigned to them, where one 
 of the maintainers needs to sign off on each patch to the component.
 - Each component with maintainers will have at least 2 maintainers.
 - Maintainers will be assigned from the most active and knowledgeable 
 committers on that component by the PMC. The PMC can vote to add / remove 
 maintainers, and maintained components, through consensus.
 - Maintainers are expected to be active in responding to patches for their 
 components, though they do not need to be the main reviewers for them (e.g. 
 they might just sign off on architecture / API). To prevent inactive 
 maintainers from blocking the project, if a maintainer isn't responding in a 
 reasonable time period (say 2 weeks), other committers can merge the patch, 
 and the PMC will want to discuss adding another maintainer.
 
 If you'd like to see examples for this model, check out the following 
 projects:
  - CloudStack: https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
  - Subversion: https://subversion.apache.org/docs/community-guide/roles.html
 
 Finally, I wanted to list our current proposal for initial components and 
 maintainers. It would be good to get feedback on other components we might 
 add, but please note that personnel discussions (e.g. I don't think Matei 
 should maintain *that* component) should only happen on the private list. The 
 initial components were chosen to include all public APIs and the main core 
 components, and the maintainers were chosen from the most active contributors 
 to those modules.
 
 - Spark core public API: Matei, Patrick, Reynold
 - Job scheduler: Matei, Kay, Patrick
 - Shuffle and network: Reynold, Aaron, Matei
 - Block manager: Reynold, Aaron
 - YARN: Tom, Andrew Or
 - Python: Josh, Matei
 - MLlib: Xiangrui, Matei
 - SQL: Michael, Reynold
 - Streaming: TD, Matei
 - GraphX: Ankur, Joey, Reynold
 
 I'd like to formally call a [VOTE] on this model, to last 72 hours. The 
 [VOTE] will end on Nov 8, 2014 at 6 PM PST.
 
 Matei


   

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Sean McNamara
+1

Sean

On Nov 5, 2014, at 6:32 PM, Matei Zaharia matei.zaha...@gmail.com wrote:

  [... full proposal snipped; quoted in full earlier in this thread ...]





Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Debasish Das
+1

The app to track PRs based on component is a great idea...

On Thu, Nov 6, 2014 at 8:47 AM, Sean McNamara sean.mcnam...@webtrends.com
wrote:

 +1

 Sean

 On Nov 5, 2014, at 6:32 PM, Matei Zaharia matei.zaha...@gmail.com wrote:

   [... full proposal snipped; quoted in full earlier in this thread ...]






Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Nick Pentreath
+1 (binding)

—
Sent from Mailbox

On Thu, Nov 6, 2014 at 6:52 PM, Debasish Das debasish.da...@gmail.com
wrote:

 +1
 The app to track PRs based on component is a great idea...
 On Thu, Nov 6, 2014 at 8:47 AM, Sean McNamara sean.mcnam...@webtrends.com
 wrote:
 +1

 Sean

 On Nov 5, 2014, at 6:32 PM, Matei Zaharia matei.zaha...@gmail.com wrote:

   [... full proposal snipped; quoted in full earlier in this thread ...]





Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Josh Rosen
+1 (binding).

(our pull request browsing tool is open-source, by the way; contributions
welcome: https://github.com/databricks/spark-pr-dashboard)
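A minimal sketch of how such a tracking tool might bucket open PRs by
component, assuming the bracketed-tag convention used in Spark PR titles
(the group_by_component helper and the sample titles below are illustrative
assumptions, not part of the actual dashboard):

```python
import re
from collections import defaultdict

# Spark PR titles conventionally carry bracketed tags, e.g.
# "[SPARK-4286][MLLIB] ...". Tags made only of letters are treated as
# component names; the JIRA-id tag (digits/hyphen) is skipped.
TAG_RE = re.compile(r"\[([A-Za-z]+)\]")

def group_by_component(pr_titles):
    """Group PR titles by the bracketed component tags they carry."""
    groups = defaultdict(list)
    for title in pr_titles:
        for tag in TAG_RE.findall(title):
            groups[tag.upper()].append(title)
    return dict(groups)

# Sample titles for illustration only
prs = [
    "[SPARK-4286][MLLIB] Add logistic regression example",
    "[SPARK-4301][SQL] Fix predicate pushdown",
    "[SPARK-4310][SQL][STREAMING] Share parsing code",
]
print(group_by_component(prs))
```

A real tool would fetch titles from the GitHub API and map each component bucket to its maintainer list, but the routing logic reduces to a grouping like this.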

On Thu, Nov 6, 2014 at 9:28 AM, Nick Pentreath nick.pentre...@gmail.com
wrote:

 +1 (binding)

 —
 Sent from Mailbox

 On Thu, Nov 6, 2014 at 6:52 PM, Debasish Das debasish.da...@gmail.com
 wrote:

  +1
  The app to track PRs based on component is a great idea...
  On Thu, Nov 6, 2014 at 8:47 AM, Sean McNamara 
 sean.mcnam...@webtrends.com
  wrote:
  +1
 
  Sean
 
  On Nov 5, 2014, at 6:32 PM, Matei Zaharia matei.zaha...@gmail.com
 wrote:
 
    [... full proposal snipped; quoted in full earlier in this thread ...]

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread bc Wong
Hi Matei,

Good call on scaling the project itself. Identifying domain experts in
different areas is a good thing. But I have some questions about the
implementation. Here's my understanding of the proposal:

(1) The PMC votes on a list of components and their maintainers. Changes to
that list requires PMC approval.
(2) No committer shall commit changes to a component without a +1 from a
maintainer of that component.

I see good reasons for #1, to help people navigate the project and identify
expertise. For #2, I'd like to understand what problem it's trying to
solve. Do we have rogue committers committing to areas that they don't know
much about? If that's the case, we should address it directly, instead of
adding new processes.

To point out the obvious, this completely changes what committer means in
Spark. Do we have clear promotion criteria from committer to maintainer? Is
there a max number of maintainers per area? Currently, as committers gain
expertise in new areas, they can start reviewing code in those areas and give
+1. This encourages more contributions and cross-component knowledge sharing.
Under the new proposal, they would have to be promoted to maintainers first.
That reduces our review bandwidth.

Again, if there is a quality issue with code reviews, let's talk to those
committers and help them do better. There are non-process ways to solve the
problem.

So I think we shouldn't require a maintainer +1. I do like the idea of
having explicit maintainers on a volunteer basis. These maintainers should
watch their JIRA and PR traffic, and be very active in design & API
discussions. That leads to better consistency and long-term design choices.

Cheers,
bc

On Wed, Nov 5, 2014 at 5:31 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:

 Hi all,

 I wanted to share a discussion we've been having on the PMC list, as well
 as call for an official vote on it on a public list. Basically, as the
 Spark project scales up, we need to define a model to make sure there is
 still great oversight of key components (in particular internal
 architecture and public APIs), and to this end I've proposed implementing a
 maintainer model for some of these components, similar to other large
 projects.

 As background on this, Spark has grown a lot since joining Apache. We've
 had over 80 contributors/month for the past 3 months, which I believe makes
 us the most active project in contributors/month at Apache, as well as over
 500 patches/month. The codebase has also grown significantly, with new
 libraries for SQL, ML, graphs and more.

 In this kind of large project, one common way to scale development is to
 assign maintainers to oversee key components, where each patch to that
 component needs to get sign-off from at least one of its maintainers. Most
 existing large projects do this -- at Apache, some large ones with this
 model are CloudStack (the second-most active project overall), Subversion,
 and Kafka, and other examples include Linux and Python. This is also
 by-and-large how Spark operates today -- most components have a de-facto
 maintainer.

 IMO, adopting this model would have two benefits:

 1) Consistent oversight of design for that component, especially regarding
 architecture and API. This process would ensure that the component's
 maintainers see all proposed changes and consider them to fit together in a
 good way.

 2) More structure for new contributors and committers -- in particular, it
 would be easy to look up who’s responsible for each module and ask them for
 reviews, etc, rather than having patches slip between the cracks.

 We'd like to start with this in a light-weight manner, where the model only
 applies to certain key components (e.g. scheduler, shuffle) and user-facing
 APIs (MLlib, GraphX, etc). Over time, as the project grows, we can expand
 it if we deem it useful. The specific mechanics would be as follows:

 - Some components in Spark will have maintainers assigned to them, where
 one of the maintainers needs to sign off on each patch to the component.
 - Each component with maintainers will have at least 2 maintainers.
 - Maintainers will be assigned from the most active and knowledgeable
 committers on that component by the PMC. The PMC can vote to add / remove
 maintainers, and maintained components, through consensus.
 - Maintainers are expected to be active in responding to patches for their
 components, though they do not need to be the main reviewers for them (e.g.
 they might just sign off on architecture / API). To prevent inactive
 maintainers from blocking the project, if a maintainer isn't responding in
 a reasonable time period (say 2 weeks), other committers can merge the
 patch, and the PMC will want to discuss adding another maintainer.

 If you'd like to see examples for this model, check out the following
 projects:
 - CloudStack:
 https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
 
 

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Matei Zaharia
Hi BC,

The point is exactly to ensure that the maintainers have looked at each patch 
to that component and consider it to fit consistently into its architecture. 
The issue is not about rogue committers, it's about making sure that changes 
don't accidentally sneak in that we want to roll back, particularly because we 
have frequent releases and we guarantee API stability. This process is meant to 
ensure that whichever committer reviews a patch also forwards it to its 
maintainers.

Note that any committer is able to review patches in any component. The 
maintainer sign-off is just a second requirement for some core components 
(central parts of the system and public APIs). But I expect that most 
maintainers will let others do the bulk of the reviewing and focus only on 
changes to the architecture or API.

Ultimately, the core motivation is that the project has grown to the point 
where it's hard to expect every committer to have full understanding of every 
component. Some committers know a ton about systems but little about machine 
learning, some are algorithmic whizzes but may not realize the implications of 
changing something on the Python API, etc. This is just a way to make sure that 
a domain expert has looked at the areas where it is most likely for something 
to go wrong.

Matei

 On Nov 6, 2014, at 10:53 AM, bc Wong bcwal...@cloudera.com wrote:
 
 Hi Matei,
 
 Good call on scaling the project itself. Identifying domain experts in 
 different areas is a good thing. But I have some questions about the 
 implementation. Here's my understanding of the proposal:
 
 (1) The PMC votes on a list of components and their maintainers. Changes to 
 that list requires PMC approval.
 (2) No committer shall commit changes to a component without a +1 from a 
 maintainer of that component.
 
 I see good reasons for #1, to help people navigate the project and identify 
 expertise. For #2, I'd like to understand what problem it's trying to solve. 
 Do we have rogue committers committing to areas that they don't know much 
 about? If that's the case, we should address it directly, instead of adding 
 new processes.
 
 To point out the obvious, it completely changes what "committer" means in 
 Spark. Do we have clear promotion criteria from "committer" to "maintainer"? 
 Is there a max number of maintainers per area? Currently, as committers gain 
 expertise in new areas, they could start reviewing code in those areas and 
 give +1. This encourages more contributions and cross-component knowledge 
 sharing. Under the new proposal, they now have to be promoted to 
 "maintainers" first. That reduces our review bandwidth.
 
 Again, if there is a quality issue with code reviews, let's talk to those 
 committers and help them do better. There are non-process ways to solve the 
 problem.
 
 So I think we shouldn't require maintainer +1. I do like the idea of having 
 explicit maintainers on a volunteer basis. These maintainers should watch 
 their jira and PR traffic, and be very active in design & API discussions. 
 That leads to better consistency and long-term design choices.
 
 Cheers,
 bc
 
 On Wed, Nov 5, 2014 at 5:31 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
 Hi all,
 
 I wanted to share a discussion we've been having on the PMC list, as well as 
 call for an official vote on it on a public list. Basically, as the Spark 
 project scales up, we need to define a model to make sure there is still 
 great oversight of key components (in particular internal architecture and 
 public APIs), and to this end I've proposed implementing a maintainer model 
 for some of these components, similar to other large projects.
 
 As background on this, Spark has grown a lot since joining Apache. We've had 
 over 80 contributors/month for the past 3 months, which I believe makes us 
 the most active project in contributors/month at Apache, as well as over 500 
 patches/month. The codebase has also grown significantly, with new libraries 
 for SQL, ML, graphs and more.
 
 In this kind of large project, one common way to scale development is to 
 assign maintainers to oversee key components, where each patch to that 
 component needs to get sign-off from at least one of its maintainers. Most 
 existing large projects do this -- at Apache, some large ones with this model 
 are CloudStack (the second-most active project overall), Subversion, and 
 Kafka, and other examples include Linux and Python. This is also by-and-large 
 how Spark operates today -- most components have a de-facto maintainer.
 
 IMO, adopting this model would have two benefits:
 
 1) Consistent oversight of design for that component, especially regarding 
 architecture and API. This process would ensure that the component's 
 maintainers see all proposed changes and consider them to fit together in a 
 good way.
 
 2) More structure for new contributors and committers -- in particular, it 
 would be easy to look up who’s 

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread bc Wong
On Thu, Nov 6, 2014 at 11:25 AM, Matei Zaharia matei.zaha...@gmail.com
wrote:
<snip>

 Ultimately, the core motivation is that the project has grown to the point
 where it's hard to expect every committer to have full understanding of
 every component. Some committers know a ton about systems but little about
 machine learning, some are algorithmic whizzes but may not realize the
 implications of changing something on the Python API, etc. This is just a
 way to make sure that a domain expert has looked at the areas where it is
 most likely for something to go wrong.


Hi Matei,

I understand where you're coming from. My suggestion is to solve this
without adding a new process. In the example above, those algo whizzes
committers should realize that they're touching the Python API, and loop in
some Python maintainers. Those Python maintainers would then respond and
help move the PR along. This is good hygiene and should already be
happening. For example, HDFS committers have commit rights to all of
Hadoop. But none of them would check in YARN code without getting agreement
from the YARN folks.

I think the majority of the effort here will be education and building the
convention. We have to ask committers to watch out for API changes, know
their own limits, and involve the component domain experts. We need that
anyways, which btw also seems to solve the problem. It's not clear what the
new process would add.

It'd be good to know the details, too. What are the exact criteria for a
committer to get promoted to be a maintainer? How often does the PMC
re-evaluate the list of maintainers? Is there an upper bound on the number
of maintainers for a component? Can we have an automatic rule for a
maintainer promotion after X patches or Y lines of code in that area?

Cheers,
bc

On Nov 6, 2014, at 10:53 AM, bc Wong bcwal...@cloudera.com wrote:

 Hi Matei,

 Good call on scaling the project itself. Identifying domain experts in
 different areas is a good thing. But I have some questions about the
 implementation. Here's my understanding of the proposal:

 (1) The PMC votes on a list of components and their maintainers. Changes
 to that list requires PMC approval.
 (2) No committer shall commit changes to a component without a +1 from a
 maintainer of that component.

 I see good reasons for #1, to help people navigate the project and
 identify expertise. For #2, I'd like to understand what problem it's trying
 to solve. Do we have rogue committers committing to areas that they don't
 know much about? If that's the case, we should address it directly, instead
 of adding new processes.

 To point out the obvious, it completely changes what "committer" means in
 Spark. Do we have clear promotion criteria from "committer" to
 "maintainer"? Is there a max number of maintainers per area? Currently, as
 committers gain expertise in new areas, they could start reviewing code in
 those areas and give +1. This encourages more contributions and
 cross-component knowledge sharing. Under the new proposal, they now have to
 be promoted to "maintainers" first. That reduces our review bandwidth.

 Again, if there is a quality issue with code reviews, let's talk to those
 committers and help them do better. There are non-process ways to solve the
 problem.

 So I think we shouldn't require maintainer +1. I do like the idea of
 having explicit maintainers on a volunteer basis. These maintainers should
 watch their jira and PR traffic, and be very active in design & API
 discussions. That leads to better consistency and long-term design choices.

 Cheers,
 bc

 On Wed, Nov 5, 2014 at 5:31 PM, Matei Zaharia matei.zaha...@gmail.com
 wrote:

 Hi all,

 I wanted to share a discussion we've been having on the PMC list, as well
 as call for an official vote on it on a public list. Basically, as the
 Spark project scales up, we need to define a model to make sure there is
 still great oversight of key components (in particular internal
 architecture and public APIs), and to this end I've proposed implementing a
 maintainer model for some of these components, similar to other large
 projects.

 As background on this, Spark has grown a lot since joining Apache. We've
 had over 80 contributors/month for the past 3 months, which I believe makes
 us the most active project in contributors/month at Apache, as well as over
 500 patches/month. The codebase has also grown significantly, with new
 libraries for SQL, ML, graphs and more.

 In this kind of large project, one common way to scale development is to
 assign maintainers to oversee key components, where each patch to that
 component needs to get sign-off from at least one of its maintainers. Most
 existing large projects do this -- at Apache, some large ones with this
 model are CloudStack (the second-most active project overall), Subversion,
 and Kafka, and other examples include Linux and Python. This is also
 by-and-large how Spark operates today -- most components have a de-facto
 maintainer.

 

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Patrick Wendell
I think new committers might or might not be maintainers (it would
depend on the PMC vote). I don't think it would affect what you could
merge, you can merge in any part of the source tree, you just need to
get sign off if you want to touch a public API or make major
architectural changes. Most projects already require code review from
other committers before you commit something, so it's just a version
of that where you have specific people appointed to specific
components for review.

If you look, most large software projects have a maintainer model,
both in Apache and outside of it. Cloudstack is probably the best
example in Apache since they are the second most active project
(roughly) after Spark. They have two levels of maintainers and much
stronger language - in their words: "In general, maintainers only have
commit rights on the module for which they are responsible."

I'd like us to start with something simpler and lightweight as
proposed here. Really the proposal on the table is just to codify the
current de-facto process to make sure we stick by it as we scale. If
we want to add more formality to it or strictness, we can do it later.

- Patrick

On Thu, Nov 6, 2014 at 3:29 PM, Hari Shreedharan
hshreedha...@cloudera.com wrote:
 How would this model work with a new committer who gets voted in? Does it 
 mean that a new committer would be a maintainer for at least one area -- else 
 we could end up having committers who really can't merge anything significant 
 until he becomes a maintainer.


 Thanks,
 Hari

 On Thu, Nov 6, 2014 at 3:00 PM, Matei Zaharia matei.zaha...@gmail.com
 wrote:

 I think you're misunderstanding the idea of process here. The point of 
 process is to make sure something happens automatically, which is useful to 
 ensure a certain level of quality. For example, all our patches go through 
 Jenkins, and nobody will make the mistake of merging them if they fail 
 tests, or RAT checks, or API compatibility checks. The idea is to get the 
 same kind of automation for design on these components. This is a very 
 common process for large software projects, and it's essentially what we had 
 already, but formalizing it will make clear that this is the process we 
 want. It's important to do it early in order to be able to refine the 
 process as the project grows.
 In terms of scope, again, the maintainers are *not* going to be the only 
 reviewers for that component, they are just a second level of sign-off 
 required for architecture and API. Being a maintainer is also not a 
 promotion, it's a responsibility. Since we don't have much experience yet 
 with this model, I didn't propose automatic rules beyond that the PMC can 
 add / remove maintainers -- presumably the PMC is in the best position to 
 know what the project needs. I think automatic rules are exactly the kind of 
 process you're arguing against. The process here is about ensuring 
 certain checks are made for every code change, not about automating 
 personnel and development decisions.
 In any case, I appreciate your input on this, and we're going to evaluate 
 the model to see how it goes. It might be that we decide we don't want it at 
 all. However, from what I've seen of other projects (not Hadoop but projects 
 with an order of magnitude more contributors, like Python or Linux), this is 
 one of the best ways to have consistently great releases with a large 
 contributor base and little room for error. With all due respect to what 
 Hadoop's accomplished, I wouldn't use Hadoop as the best example to strive 
 for; in my experience there I've seen patches reverted because of 
 architectural disagreements, new APIs released and abandoned, and generally 
 an experience that's been painful for users. A lot of the decisions we've 
 made in Spark (e.g. time-based release cycle, built-in libraries, API 
 stability rules, etc) were based on lessons learned there, in an attempt to 
 define a better model.
 Matei
 On Nov 6, 2014, at 2:18 PM, bc Wong bcwal...@cloudera.com wrote:

 On Thu, Nov 6, 2014 at 11:25 AM, Matei Zaharia matei.zaha...@gmail.com wrote:
 snip
 Ultimately, the core motivation is that the project has grown to the point 
 where it's hard to expect every committer to have full understanding of 
 every component. Some committers know a ton about systems but little about 
 machine learning, some are algorithmic whizzes but may not realize the 
 implications of changing something on the Python API, etc. This is just a 
 way to make sure that a domain expert has looked at the areas where it is 
 most likely for something to go wrong.

 Hi Matei,

 I understand where you're coming from. My suggestion is to solve this 
 without adding a new process. In the example above, those algo whizzes 
 committers should realize that they're touching the Python API, and loop in 
 some Python maintainers. Those Python maintainers would then respond and 
 help move the PR along. This is good 

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Hari Shreedharan
In CloudStack, I believe one becomes a maintainer first for a subset of 
modules, before he/she becomes a proven maintainer who has commit rights on 
the entire source tree. 




So would it make sense to go that route, and have committers voted in as 
maintainers for certain parts of the codebase and then eventually become proven 
maintainers (though this might have to be honor-code based, since I don’t think 
git allows per-module commit rights).
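For what it's worth, per-module commit rights can be approximated on the server side even though git has none built in. Here is a hypothetical sketch of the path check a pre-receive hook could apply; the module prefixes and usernames are invented, and a real hook would additionally read "old new ref" lines on stdin and list changed paths with `git diff --name-only old new`:

```python
# Invented module -> maintainers map; illustration only, not a real config.
MODULE_MAINTAINERS = {
    "core/": {"alice", "bob"},
    "python/": {"carol"},
}

def push_allowed(pusher, changed_paths, owners=MODULE_MAINTAINERS):
    """Reject a push that touches a maintained module the pusher does
    not maintain; paths outside any maintained module are open to all."""
    for path in changed_paths:
        for prefix, maintainers in owners.items():
            if path.startswith(prefix) and pusher not in maintainers:
                return False
    return True

print(push_allowed("alice", ["core/Scheduler.scala", "docs/README.md"]))  # True
print(push_allowed("carol", ["core/Scheduler.scala"]))                    # False
print(push_allowed("dave", ["docs/README.md"]))                           # True
```

Of course, as noted above, a convention enforced on the honor system avoids needing any such machinery.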


Thanks,
Hari

On Thu, Nov 6, 2014 at 3:45 PM, Patrick Wendell pwend...@gmail.com
wrote:

 I think new committers might or might not be maintainers (it would
 depend on the PMC vote). I don't think it would affect what you could
 merge, you can merge in any part of the source tree, you just need to
 get sign off if you want to touch a public API or make major
 architectural changes. Most projects already require code review from
 other committers before you commit something, so it's just a version
 of that where you have specific people appointed to specific
 components for review.
 If you look, most large software projects have a maintainer model,
 both in Apache and outside of it. Cloudstack is probably the best
 example in Apache since they are the second most active project
 (roughly) after Spark. They have two levels of maintainers and much
 stronger language - in their words: "In general, maintainers only have
 commit rights on the module for which they are responsible."
 I'd like us to start with something simpler and lightweight as
 proposed here. Really the proposal on the table is just to codify the
 current de-facto process to make sure we stick by it as we scale. If
 we want to add more formality to it or strictness, we can do it later.
 - Patrick
 On Thu, Nov 6, 2014 at 3:29 PM, Hari Shreedharan
 hshreedha...@cloudera.com wrote:
 How would this model work with a new committer who gets voted in? Does it 
 mean that a new committer would be a maintainer for at least one area -- 
 else we could end up having committers who really can't merge anything 
 significant until he becomes a maintainer.


 Thanks,
 Hari

 On Thu, Nov 6, 2014 at 3:00 PM, Matei Zaharia matei.zaha...@gmail.com
 wrote:

 I think you're misunderstanding the idea of process here. The point of 
 process is to make sure something happens automatically, which is useful to 
 ensure a certain level of quality. For example, all our patches go through 
 Jenkins, and nobody will make the mistake of merging them if they fail 
 tests, or RAT checks, or API compatibility checks. The idea is to get the 
 same kind of automation for design on these components. This is a very 
 common process for large software projects, and it's essentially what we 
 had already, but formalizing it will make clear that this is the process we 
 want. It's important to do it early in order to be able to refine the 
 process as the project grows.
 In terms of scope, again, the maintainers are *not* going to be the only 
 reviewers for that component, they are just a second level of sign-off 
 required for architecture and API. Being a maintainer is also not a 
 promotion, it's a responsibility. Since we don't have much experience yet 
 with this model, I didn't propose automatic rules beyond that the PMC can 
 add / remove maintainers -- presumably the PMC is in the best position to 
 know what the project needs. I think automatic rules are exactly the kind 
 of process you're arguing against. The process here is about ensuring 
 certain checks are made for every code change, not about automating 
 personnel and development decisions.
 In any case, I appreciate your input on this, and we're going to evaluate 
 the model to see how it goes. It might be that we decide we don't want it 
 at all. However, from what I've seen of other projects (not Hadoop but 
 projects with an order of magnitude more contributors, like Python or 
 Linux), this is one of the best ways to have consistently great releases 
 with a large contributor base and little room for error. With all due 
 respect to what Hadoop's accomplished, I wouldn't use Hadoop as the best 
 example to strive for; in my experience there I've seen patches reverted 
 because of architectural disagreements, new APIs released and abandoned, 
 and generally an experience that's been painful for users. A lot of the 
 decisions we've made in Spark (e.g. time-based release cycle, built-in 
 libraries, API stability rules, etc) were based on lessons learned there, 
 in an attempt to define a better model.
 Matei
 On Nov 6, 2014, at 2:18 PM, bc Wong bcwal...@cloudera.com wrote:

 On Thu, Nov 6, 2014 at 11:25 AM, Matei Zaharia matei.zaha...@gmail.com wrote:
 snip
 Ultimately, the core motivation is that the project has grown to the point 
 where it's hard to expect every committer to have full understanding of 
 every component. Some committers know a ton about systems but little about 
 machine learning, some are algorithmic whizzes but may 

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Greg Stein
-1 (non-binding)

This is an idea that runs COMPLETELY counter to the Apache Way, and is
to be severely frowned upon. This creates *unequal* ownership of the
codebase.

Each Member of the PMC should have *equal* rights to all areas of the
codebase under their purview. It should not be subjected to others'
ownership except through the standard mechanisms of reviews and,
if/when absolutely necessary, to vetos.

Apache does not want leads, benevolent dictators or assigned
maintainers, no matter how you may dress it up with multiple
maintainers per component. The fact is that this creates an unequal
level of ownership and responsibility. The Board has shut down
projects that attempted or allowed for Leads. Just a few months ago,
there was a problem with somebody calling themself a Lead.

I don't know why you suggest that Apache Subversion does this. We
absolutely do not. Never have. Never will. The Subversion codebase is
owned by all of us, and we all care for every line of it. Some people
know more than others, of course. But any one of us, can change any
part, without being subjected to a maintainer. Of course, we ask
people with more knowledge of the component when we feel
uncomfortable, but we also know when it is safe or not to make a
specific change. And *always*, our fellow committers can review our
work and let us know when we've done something wrong.

Equal ownership reduces fiefdoms, enhances a feeling of community and
project ownership, and creates a more open and inviting project.

So again: -1 on this entire concept. Not good, to be polite.

Regards,
Greg Stein
Director, Vice Chairman
Apache Software Foundation

On Wed, Nov 05, 2014 at 05:31:58PM -0800, Matei Zaharia wrote:
 Hi all,
 
 I wanted to share a discussion we've been having on the PMC list, as well as 
 call for an official vote on it on a public list. Basically, as the Spark 
 project scales up, we need to define a model to make sure there is still 
 great oversight of key components (in particular internal architecture and 
 public APIs), and to this end I've proposed implementing a maintainer model 
 for some of these components, similar to other large projects.
 
 As background on this, Spark has grown a lot since joining Apache. We've had 
 over 80 contributors/month for the past 3 months, which I believe makes us 
 the most active project in contributors/month at Apache, as well as over 500 
 patches/month. The codebase has also grown significantly, with new libraries 
 for SQL, ML, graphs and more.
 
 In this kind of large project, one common way to scale development is to 
 assign maintainers to oversee key components, where each patch to that 
 component needs to get sign-off from at least one of its maintainers. Most 
 existing large projects do this -- at Apache, some large ones with this model 
 are CloudStack (the second-most active project overall), Subversion, and 
 Kafka, and other examples include Linux and Python. This is also by-and-large 
 how Spark operates today -- most components have a de-facto maintainer.
 
 IMO, adopting this model would have two benefits:
 
 1) Consistent oversight of design for that component, especially regarding 
 architecture and API. This process would ensure that the component's 
 maintainers see all proposed changes and consider them to fit together in a 
 good way.
 
 2) More structure for new contributors and committers -- in particular, it 
 would be easy to look up who's responsible for each module and ask them for 
 reviews, etc, rather than having patches slip between the cracks.
 
 We'd like to start with this in a light-weight manner, where the model only 
 applies to certain key components (e.g. scheduler, shuffle) and user-facing 
 APIs (MLlib, GraphX, etc). Over time, as the project grows, we can expand it 
 if we deem it useful. The specific mechanics would be as follows:
 
 - Some components in Spark will have maintainers assigned to them, where one 
 of the maintainers needs to sign off on each patch to the component.
 - Each component with maintainers will have at least 2 maintainers.
 - Maintainers will be assigned from the most active and knowledgeable 
 committers on that component by the PMC. The PMC can vote to add / remove 
 maintainers, and maintained components, through consensus.
 - Maintainers are expected to be active in responding to patches for their 
 components, though they do not need to be the main reviewers for them (e.g. 
 they might just sign off on architecture / API). To prevent inactive 
 maintainers from blocking the project, if a maintainer isn't responding in a 
 reasonable time period (say 2 weeks), other committers can merge the patch, 
 and the PMC will want to discuss adding another maintainer.
 
 If you'd like to see examples for this model, check out the following 
 projects:
 - CloudStack: 
 https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
  
 

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Patrick Wendell
Hey Greg,

Regarding subversion - I think the reference is to partial vs full
committers here:
https://subversion.apache.org/docs/community-guide/roles.html

- Patrick

On Thu, Nov 6, 2014 at 4:18 PM, Greg Stein gst...@gmail.com wrote:
 -1 (non-binding)

 This is an idea that runs COMPLETELY counter to the Apache Way, and is
 to be severely frowned upon. This creates *unequal* ownership of the
 codebase.

 Each Member of the PMC should have *equal* rights to all areas of the
 codebase under their purview. It should not be subjected to others'
 ownership except through the standard mechanisms of reviews and,
 if/when absolutely necessary, to vetos.

 Apache does not want leads, benevolent dictators or assigned
 maintainers, no matter how you may dress it up with multiple
 maintainers per component. The fact is that this creates an unequal
 level of ownership and responsibility. The Board has shut down
 projects that attempted or allowed for Leads. Just a few months ago,
 there was a problem with somebody calling themself a Lead.

 I don't know why you suggest that Apache Subversion does this. We
 absolutely do not. Never have. Never will. The Subversion codebase is
 owned by all of us, and we all care for every line of it. Some people
 know more than others, of course. But any one of us, can change any
 part, without being subjected to a maintainer. Of course, we ask
 people with more knowledge of the component when we feel
 uncomfortable, but we also know when it is safe or not to make a
 specific change. And *always*, our fellow committers can review our
 work and let us know when we've done something wrong.

 Equal ownership reduces fiefdoms, enhances a feeling of community and
 project ownership, and creates a more open and inviting project.

 So again: -1 on this entire concept. Not good, to be polite.

 Regards,
 Greg Stein
 Director, Vice Chairman
 Apache Software Foundation

 On Wed, Nov 05, 2014 at 05:31:58PM -0800, Matei Zaharia wrote:
 Hi all,

 I wanted to share a discussion we've been having on the PMC list, as well as 
 call for an official vote on it on a public list. Basically, as the Spark 
 project scales up, we need to define a model to make sure there is still 
 great oversight of key components (in particular internal architecture and 
 public APIs), and to this end I've proposed implementing a maintainer model 
 for some of these components, similar to other large projects.

 As background on this, Spark has grown a lot since joining Apache. We've had 
 over 80 contributors/month for the past 3 months, which I believe makes us 
 the most active project in contributors/month at Apache, as well as over 500 
 patches/month. The codebase has also grown significantly, with new libraries 
 for SQL, ML, graphs and more.

 In this kind of large project, one common way to scale development is to 
 assign maintainers to oversee key components, where each patch to that 
 component needs to get sign-off from at least one of its maintainers. Most 
 existing large projects do this -- at Apache, some large ones with this 
 model are CloudStack (the second-most active project overall), Subversion, 
 and Kafka, and other examples include Linux and Python. This is also 
 by-and-large how Spark operates today -- most components have a de-facto 
 maintainer.

 IMO, adopting this model would have two benefits:

 1) Consistent oversight of design for that component, especially regarding 
 architecture and API. This process would ensure that the component's 
 maintainers see all proposed changes and consider them to fit together in a 
 good way.

 2) More structure for new contributors and committers -- in particular, it 
 would be easy to look up who's responsible for each module and ask them for 
 reviews, etc, rather than having patches slip between the cracks.

 We'd like to start with this in a light-weight manner, where the model only 
 applies to certain key components (e.g. scheduler, shuffle) and user-facing 
 APIs (MLlib, GraphX, etc). Over time, as the project grows, we can expand it 
 if we deem it useful. The specific mechanics would be as follows:

 - Some components in Spark will have maintainers assigned to them, where one 
 of the maintainers needs to sign off on each patch to the component.
 - Each component with maintainers will have at least 2 maintainers.
 - Maintainers will be assigned from the most active and knowledgeable 
 committers on that component by the PMC. The PMC can vote to add / remove 
 maintainers, and maintained components, through consensus.
 - Maintainers are expected to be active in responding to patches for their 
 components, though they do not need to be the main reviewers for them (e.g. 
 they might just sign off on architecture / API). To prevent inactive 
 maintainers from blocking the project, if a maintainer isn't responding in a 
 reasonable time period (say 2 weeks), other committers can merge the patch, 
 and the PMC will want to discuss adding another 

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Patrick Wendell
In fact, if you look at the subversion committer list, the majority of
people here have commit access only for particular areas of the
project:

http://svn.apache.org/repos/asf/subversion/trunk/COMMITTERS

On Thu, Nov 6, 2014 at 4:26 PM, Patrick Wendell pwend...@gmail.com wrote:
 Hey Greg,

 Regarding subversion - I think the reference is to partial vs full
 committers here:
 https://subversion.apache.org/docs/community-guide/roles.html

 - Patrick

 On Thu, Nov 6, 2014 at 4:18 PM, Greg Stein gst...@gmail.com wrote:
 -1 (non-binding)

 This is an idea that runs COMPLETELY counter to the Apache Way, and is
 to be severely frowned upon. This creates *unequal* ownership of the
 codebase.

 Each Member of the PMC should have *equal* rights to all areas of the
 codebase, regardless of their purview. It should not be subjected to
 others' ownership except through the standard mechanisms of reviews and,
 if/when absolutely necessary, vetoes.

 Apache does not want leads, benevolent dictators or assigned
 maintainers, no matter how you may dress it up with multiple
 maintainers per component. The fact is that this creates an unequal
 level of ownership and responsibility. The Board has shut down
 projects that attempted or allowed for Leads. Just a few months ago,
 there was a problem with somebody calling themselves a Lead.

 I don't know why you suggest that Apache Subversion does this. We
 absolutely do not. Never have. Never will. The Subversion codebase is
 owned by all of us, and we all care for every line of it. Some people
 know more than others, of course. But any one of us, can change any
 part, without being subjected to a maintainer. Of course, we ask
 people with more knowledge of the component when we feel
 uncomfortable, but we also know when it is safe or not to make a
 specific change. And *always*, our fellow committers can review our
 work and let us know when we've done something wrong.

 Equal ownership reduces fiefdoms, enhances a feeling of community and
 project ownership, and creates a more open and inviting project.

 So again: -1 on this entire concept. Not good, to be polite.

 Regards,
 Greg Stein
 Director, Vice Chairman
 Apache Software Foundation


Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Corey Nolet
+1 (non-binding) [for original process proposal]

Greg, the first time I've seen the word "ownership" on this thread is in
your message. The first time the word "lead" has appeared in this thread is
in your message as well. I don't think that was the intent. The PMC and
committers have a responsibility to the community to make sure that
patches are being reviewed and committed. I don't see anywhere in Apache's
recommended bylaws that says establishing responsibility on paper
for specific areas cannot be taken on by different members of the PMC.
What's been proposed looks, to me, to be an empirical process, and it looks
like it has pretty much reached consensus among those able to give binding
votes. I don't see at all that this model establishes any form of ownership
over anything. I also don't see where the process proposal mentions that
nobody other than the persons responsible for a module can review or commit
code.

In fact, I'll go as far as to say that since Apache is a meritocracy, the
people who have been aligned to these responsibilities probably were aligned
based on some sort of merit, correct? Perhaps we could dig in and find out
for sure... I'm still getting familiar with the Spark community myself.




Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Greg Stein
Partial committers are people invited to work on a particular area, and
they do not require sign-off to work on that area. They can get a sign-off
and commit outside that area. That approach doesn't compare to this
proposal.

Full committers are PMC members. As each PMC member is responsible for
*every* line of code, then every PMC member should have complete rights to
every line of code. Creating disparity flies in the face of a PMC member's
responsibility. If I am a Spark PMC member, then I have responsibility for
GraphX code, whether my name is Ankur, Joey, Reynold, or Greg. And
interposing a barrier inhibits my responsibility to ensure GraphX is
designed, maintained, and delivered to the Public.

Cheers,
-g

(and yes, I'm aware of COMMITTERS; I've been changing that file for the
past 12 years :-) )


Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Matei Zaharia
So I don't understand, Greg, are the partial committers committers, or are they 
not? Spark also has a PMC, but our PMC currently consists of all committers (we 
decided not to have a differentiation when we left the incubator). I see the 
Subversion partial committers listed as committers on 
https://people.apache.org/committers-by-project.html#subversion, so I assume 
they are committers. As far as I can see, CloudStack is similar.

Matei


Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Corey Nolet
The PMC [1] is responsible for oversight and does not designate partial or
full committers. There are projects where all committers become PMC members
and others where the PMC is reserved for the committers with the most merit
(and willingness to take on the responsibility of project oversight,
releases, etc.). The community maintains the codebase through committers.
Committers mentor, roll in patches, and spread the project throughout other
communities.

Adding someone's name to a list as a maintainer is not a barrier. With a
community as large as Spark's, and myself not being a committer on this
project, I see it as a welcome opportunity to find a mentor in the areas in
which I'm interested in contributing. We'd expect the list of names to grow
as more volunteers gain more interest, correct? To me, that seems quite
contrary to a barrier.

[1] http://www.apache.org/dev/pmc.html



Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Sandy Ryza
It looks like the difference between the proposed Spark model and the
CloudStack / SVN model is:
* In the former, maintainers / partial committers are a way of centralizing
oversight over particular components among committers
* In the latter, maintainers / partial committers are a way of giving
non-committers some power to make changes

-Sandy


Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Cody Koeninger
My 2 cents:

Spark since pre-Apache days has been the most friendly and welcoming open
source project I've seen, and that's reflected in its success.

It seems pretty obvious to me that, for example, Michael should be looking
at major changes to the SQL codebase.  I trust him to do that in a way
that's technically and socially appropriate.

What Matei is saying makes sense, regardless of whether it gets codified in
a process.



On Thu, Nov 6, 2014 at 7:46 PM, Arun C Murthy a...@hortonworks.com wrote:

 With my ASF Member hat on, I fully agree with Greg.

 As he points out, this is an anti-pattern in the ASF and is severely
 frowned upon.

 We, in Hadoop, had a similar trajectory where we were politely told to
 move away from having sub-project committers (HDFS, MapReduce etc.) to a
 common list of committers. There were some concerns initially, but we have
 successfully managed to work together and build a more healthy community as
 a result of following the advice on the ASF Way.

 I do have sympathy for good oversight etc. as the project grows and
 attracts many contributors - it's essentially the need to have smaller,
 well-knit developer communities. One way to achieve that would be to have
 separate TLPs  (e.g. Spark, MLLIB, GraphX) with separate committer lists
 for each representing the appropriate community. Hadoop went a similar
 route where we had Pig, Hive, HBase etc. as sub-projects initially and then
 split them into TLPs with more focussed communities to the benefit of
 everyone. Maybe you guys want to try this too?

 

 Few more observations:
 # In general, *discussions* on project directions (such as new concept of
 *maintainers*) should happen first on the public lists *before* voting, not
 in the private PMC list.
 # If you choose to go this route in spite of this advice, it seems to me Spark
 would be better off having more maintainers per component (at least 4-5),
 probably with a lot more diversity in terms of affiliations. Not sure if
 that is a concern - do you have good diversity in the proposed list? This
 will ensure that there are no concerns about a dominant employer
 controlling a project.

 

 Hope this helps - we've gone through a similar journey, got through similar
 issues and fully embraced the Apache Way (™), as Greg points out, to our
 benefit.

 thanks,
 Arun


 On Nov 6, 2014, at 4:18 PM, Greg Stein gst...@gmail.com wrote:

  -1 (non-binding)
 
  This is an idea that runs COMPLETELY counter to the Apache Way, and is
  to be severely frowned upon. This creates *unequal* ownership of the
  codebase.

  Each Member of the PMC should have *equal* rights to all areas of the
  codebase under their purview. It should not be subjected to others'
  ownership except through the standard mechanisms of reviews and,
  if/when absolutely necessary, to vetos.

  Apache does not want leads, benevolent dictators or assigned
  maintainers, no matter how you may dress it up with multiple
  maintainers per component. The fact is that this creates an unequal
  level of ownership and responsibility. The Board has shut down
  projects that attempted or allowed for Leads. Just a few months ago,
  there was a problem with somebody calling themselves a Lead.
 
  I don't know why you suggest that Apache Subversion does this. We
  absolutely do not. Never have. Never will. The Subversion codebase is
  owned by all of us, and we all care for every line of it. Some people
  know more than others, of course. But any one of us can change any
  part, without being subjected to a maintainer. Of course, we ask
  people with more knowledge of the component when we feel
  uncomfortable, but we also know when it is safe or not to make a
  specific change. And *always*, our fellow committers can review our
  work and let us know when we've done something wrong.
 
  Equal ownership reduces fiefdoms, enhances a feeling of community and
  project ownership, and creates a more open and inviting project.
 
  So again: -1 on this entire concept. Not good, to be polite.
 
  Regards,
  Greg Stein
  Director, Vice Chairman
  Apache Software Foundation
 
  On Wed, Nov 05, 2014 at 05:31:58PM -0800, Matei Zaharia wrote:
  Hi all,
 
  I wanted to share a discussion we've been having on the PMC list, as
 well as call for an official vote on it on a public list. Basically, as the
 Spark project scales up, we need to define a model to make sure there is
 still great oversight of key components (in particular internal
 architecture and public APIs), and to this end I've proposed implementing a
 maintainer model for some of these components, similar to other large
 projects.
 
  As background on this, Spark has grown a lot since joining Apache.
 We've had over 80 contributors/month for the past 3 months, which I believe
 makes us the most active project in contributors/month at Apache, as well
 as over 500 patches/month. The codebase has also grown significantly, with
 new libraries for SQL, ML, graphs and more.
 
  In 

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Corey Nolet
I'm actually going to change my non-binding vote to +0 for the proposal as-is.

I overlooked some parts of the original proposal that, when reading over
them again, do not sit well with me. "One of the maintainers needs to sign
off on each patch to the component", as Greg has pointed out, does seem to
imply that there are committers with more power than others with regards to
specific components, which does imply ownership.

My thinking would be to rework the wording in some way so as to take out the
accent on ownership. I would maybe focus on things such as:

1) Other committers and contributors being forced to consult with
maintainers of modules before patches can get rolled in.
2) Maintainers being assigned specifically from PMC.
3) Oversight to have more accent on keeping the community happy in a
specific area of interest vice being a consultant for the design of a
specific piece.


Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Greg Stein
[ I'm going to try and pull a couple thread directions into this one, to
avoid explosion :-) ]

On Thu, Nov 6, 2014 at 6:44 PM, Corey Nolet cjno...@gmail.com wrote:

Note: I'm going to use "you" generically; I understand you [Corey] are not
a PMC member, at this time.

+1 (non-binding) [for original process proposal]

 Greg, the first time I've seen the word "ownership" on this thread is in
 your message. The first time the word "lead" has appeared in this thread is
 in your message as well. I don't think that was the intent. The PMC and
 Committers have a


The word "ownership" is there, just under a different term. If you are a PMC
member, and *cannot* alter a line of code without another's consent, then
you don't own that code. Your ownership is subservient to another. You
are not a *peer*, but a second-class citizen at this point.

The term "maintainer" in this context is being used as a word for "lead".
The maintainers are a *gate* for any change. That is not consensus. The
proposal attempts to soften that, and turn it into an oligarchy of several
maintainers. But the simple fact is that you have some with the ability
to set direction, and those who do not. They are called leaders in most
contexts, but however you want to slice it... the dynamic creates people
with unequal commit ability.

But as the PMC member you *are* responsible for it. That is the very basic
definition of being a PMC member. You are responsible for all things
Spark.

responsibility to the community to make sure that their patches are being
 reviewed and committed. I don't see in Apache's recommended bylaws anywhere
 that says establishing responsibility on paper for specific areas cannot be
 taken on by different members of the PMC. What's been proposed looks, to
 me, to be an empirical process and it looks like it has pretty much a
 consensus from the side able to give binding votes. I don't at all think this
 model establishes any form of ownership over anything. I also don't see in
 the process proposal where it mentions that nobody other than the persons
 responsible for a module can review or commit code.


"where each patch to that component needs to get sign-off from at least one
of its maintainers"

That establishes two types of PMC members: those who require sign-off, and
those who don't. Apache is intended to be a group of peers, none more
equal than others.

That said, we *do* recognize various levels of merit. This is where you see
differences between committers, their range of access, and PMC members. But
when you hit the *PMC member* role, then you are talking about a legal
construct established by the Foundation. You move outside of community
norms, and into how the umbrella of the Foundation operates. PMC members
are individually responsible for all of the code under their purview, which
is then at the direction of the Foundation itself. I'll skip that
conversation, and leave it with the simple phrase: as a PMC member, you're
responsible for the whole codebase.

So following from that, anything that *restricts* your ability to work on
that code, is a problem.

In fact, I'll go as far as to say that since Apache is a meritocracy, the
 people who have been aligned to the responsibilities probably were aligned
 based on some sort of merit, correct? Perhaps we could dig in and find out
 for sure... I'm still getting familiar with the Spark community myself.


Once you are a PMC member, then there is no difference in your merit. Merit
ends. You're a PMC member, and that is all there is to it. Just because
Jane commits 1000 times per month, makes her no better than John who
commits 10/month. They are peers on the PMC and have equal rights and
responsibility to the codebase.

Historically, some PMCs have attempted to create variant levels within the
PMC, or create different groups and rights, or different partitions over
the code, and ... again, historically: it has failed. This is why Apache
stresses consensus. The failure modes are crazy and numerous when moving
away from that, into silos.

...
On Thu, Nov 6, 2014 at 6:49 PM, Matei Zaharia matei.zaha...@gmail.com
 wrote:

 So I don't understand, Greg, are the partial committers committers, or are
 they not? Spark also has a PMC, but our PMC currently consists of all
 committers (we decided not to have a differentiation when we left the
 incubator). I see the Subversion partial committers listed as committers
 on https://people.apache.org/committers-by-project.html#subversion, so I
 assume they are committers. As far as I can see, CloudStack is similar.


PMC members are responsible for the code. They provide the oversight,
direction, and management. (they're also responsible for the community, but
that distinction isn't relevant in this contrasting example)

Committers can make changes to the code, with the
acknowledgement/agreement/direction of the PMC.

When these groups are equal, like Spark, then things are pretty simple.

But many communities in Apache define them as disparate. 

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Greg Stein
On Thu, Nov 6, 2014 at 7:28 PM, Sandy Ryza sandy.r...@cloudera.com wrote:

 It looks like the difference between the proposed Spark model and the
 CloudStack / SVN model is:
 * In the former, maintainers / partial committers are a way of
 centralizing oversight over particular components among committers
 * In the latter, maintainers / partial committers are a way of giving
 non-committers some power to make changes


I can't speak for CloudStack, but for Subversion: yes, you're exactly
right, Sandy.

We use the partial committer role as a way to bring in new committers.
"Great idea, go work there, and have fun." Any PMC member can give a
single +1, and that new (partial) committer gets an account/access, and is
off and running. We don't even ask for a PMC vote (though, we almost always
have a brief discussion).

The svnrdump tool was written by a *Git* Google Summer of Code student.
He wanted a quick way to get a Subversion dumpfile from a remote
repository, in order to drop that into Git. We gave him commit access
directly into trunk/svnrdump, and he wrote the tool. Technically, he could
commit anywhere in our tree, but we just asked him not to, without a +1
from a PMC member.

Partial committers are a way to *include* people into the [coding]
community. And hopefully, over time, they grow into something more.

Maintainers are a way (IMO) to *exclude* people from certain commit
activity. (or more precisely: limit/restrict, rather than exclude)

You can see why it concerns me :-)

Cheers,
-g


Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Matei Zaharia
Alright, Greg, I think I understand how Subversion's model is different, which 
is that the PMC members are all full committers. However, I still think that 
the model proposed here is purely organizational (how the PMC and committers 
organize themselves), and in no way changes peoples' ownership or rights. 
Certainly the reason I proposed it was organizational, to make sure patches get 
seen by the right people. I believe that every PMC member still has the same 
responsibility for two reasons:

1) The PMC is actually what selects the maintainers, so basically this 
mechanism is a way for the PMC to make sure certain people review each patch.

2) Code changes are all still made by consensus, where any individual has veto 
power over the code. The maintainer model mentioned here is only meant to make 
sure that the experts in an area get to see each patch *before* it is merged, 
and choose whether to exercise their veto power.

Let me give a simple example, which is a patch to the Spark core public API. 
Say I'm a maintainer in this API. Without the maintainer model, the decision on 
the patch would be made as follows:

- Any committer could review the patch and merge it
- At any point during this process, I (as the main expert on this) could come 
in and -1 it, or give feedback
- In addition, any other committer beyond me is allowed to -1 this patch

With the maintainer model, the process is as follows:

- Any committer could review the patch and merge it, but they would need to 
forward it to me (or another core API maintainer) to make sure we also approve
- At any point during this process, I could come in and -1 it, or give feedback
- In addition, any other committer beyond me is still allowed to -1 this patch

The only change in this model is that committers are responsible to forward 
patches in these areas to certain other committers. If every committer had 
perfect oversight of the project, they could have also seen every patch to 
their component on their own, but this list ensures that they see it even if 
they somehow overlooked it.

It's true that technically this model might gate development in the sense of 
adding some latency, but it doesn't gate it any more than consensus as a 
whole does, where any committer (not even PMC member) can -1 any code change. 
In fact I believe this will speed development by motivating the maintainers to 
be active in reviewing their areas and by reducing the chance that mistakes 
happen that require a revert.

I apologize if this wasn't clear in any way, but I do think it's pretty clear 
in the original wording of the proposal. The sign-off by a maintainer is simply 
an extra step in the merge process, it does *not* mean that other committers 
can't -1 a patch, or that the maintainers get to review all patches, or that 
they somehow have more ownership of the component (since they already had the 
ability to -1). I also wanted to clarify another thing -- it seems there is a 
misunderstanding that only PMC members can be maintainers, but this was not the 
point; the PMC *assigns* maintainers but they can do it out of the whole 
committer pool (and if we move to separating the PMC from the committers, I 
fully expect some non-PMC committers to be made maintainers).

I hope this clarifies where we're coming from, and why we believe that this 
still conforms fully with the spirit of Apache (collaborative, open development 
that anyone can participate in, and meritocracy for project governance). There 
were some comments made about the maintainers being only some kind of list of 
people without a requirement to review stuff, but as you can see it's the 
requirement to review that is the main reason I'm proposing this, to ensure we 
have an automated process for patches to certain components to be seen. If it 
helps we may be able to change the wording to something like "it is every 
committer's responsibility to forward patches for a maintained component to 
that component's maintainer", or something like that, instead of using "sign 
off". If we don't do this, I'd actually be against any measure that lists some 
component maintainers without them having a specific responsibility. Apache 
is not a place for people to gain kudos by having fancier titles given on a 
website, it's a place for building great communities and software.

Matei




Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Greg Stein
[last reply for tonite; let others read; and after the next drink or three,
I shouldn't be replying...]

On Thu, Nov 6, 2014 at 11:38 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:

 Alright, Greg, I think I understand how Subversion's model is different,
 which is that the PMC members are all full committers. However, I still
 think that the model proposed here is purely organizational (how the PMC
 and committers organize themselves), and in no way changes peoples'
 ownership or rights.


That was not my impression, when your proposal said that maintainers need
to provide sign-off.

Okay. Now my next item of feedback starts here:


 Certainly the reason I proposed it was organizational, to make sure
 patches get seen by the right people. I believe that every PMC member still
 has the same responsibility for two reasons:

 1) The PMC is actually what selects the maintainers, so basically this
 mechanism is a way for the PMC to make sure certain people review each
 patch.

 2) Code changes are all still made by consensus, where any individual has
 veto power over the code. The maintainer model mentioned here is only meant
 to make sure that the experts in an area get to see each patch *before*
 it is merged, and choose whether to exercise their veto power.

 Let me give a simple example, which is a patch to the Spark core public
 API. Say I'm a maintainer in this API. Without the maintainer model, the
 decision on the patch would be made as follows:

 - Any committer could review the patch and merge it
 - At any point during this process, I (as the main expert on this) could
 come in and -1 it, or give feedback
 - In addition, any other committer beyond me is allowed to -1 this patch

 With the maintainer model, the process is as follows:

 - Any committer could review the patch and merge it, but they would need
 to forward it to me (or another core API maintainer) to make sure we also
 approve
 - At any point during this process, I could come in and -1 it, or give
 feedback
 - In addition, any other committer beyond me is still allowed to -1 this
 patch

 The only change in this model is that committers are responsible to
 forward patches in these areas to certain other committers. If every
 committer had perfect oversight of the project, they could have also seen
 every patch to their component on their own, but this list ensures that
 they see it even if they somehow overlooked it.

 It's true that technically this model might gate development in the
 sense of adding some latency, but it doesn't gate it any more than
 consensus as a whole does, where any committer (not even PMC member) can -1
 any code change. In fact I believe this will speed development by
 motivating the maintainers to be active in reviewing their areas and by
 reducing the chance that mistakes happen that require a revert.

 I apologize if this wasn't clear in any way, but I do think it's pretty
 clear in the original wording of the proposal. The sign-off by a maintainer
 is simply an extra step in the merge process, it does *not* mean that other
 committers can't -1 a patch, or that the maintainers get to review all
 patches, or that they somehow have more ownership of the component (since
 they already had the ability to -1). I also wanted to clarify another thing
 -- it seems there is a misunderstanding that only PMC members can be
 maintainers, but this was not the point; the PMC *assigns* maintainers but
 they can do it out of the whole committer pool (and if we move to
 separating the PMC from the committers, I fully expect some non-PMC
 committers to be made maintainers).


... and ends here.

All of that text is about a process for applying Vetoes. ... That is just
the wrong focus (IMO).

Back around 2000, in httpd, we ran into vetoes. It was horrible. The
community suffered. We actually had a face-to-face at one point, flying in
people from around the US, gathering a bunch of the httpd committers to
work through some basic problems. The vetoes were flying fast and furious,
and it was just the wrong dynamic. Discussion and consensus had been thrown
aside. Trust was absent. Peer relationships were ruined. (tho thankfully,
our personal relationships never suffered, and that basis helped us pull it
back together)

Contrast that with Subversion. We've had some vetoes, yes. But invariably,
MOST of them would really be considered "woah. -1 on that. let's talk."
Only a few were about somebody laying down the veto hammer. Outside those
few, a -1 was always about opening a discussion to fix a particular commit.

It looks like you are creating a process to apply vetoes. That seems
backwards.

It seems like you want a process to ensure that reviews are performed. IMO,
all committers/PMC members should begin as *trusted*. Why not? You've
already voted them in as committers/PMCers. So trust them. Trust.

And that leads to "trust, but verify". The review process. So how about
creating a workflow that is focused on what needs to be reviewed 

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Reynold Xin
Greg,

Thanks a lot for commenting on this, but I feel we are splitting hairs
here. Matei did mention -1, followed by "or give feedback". The original
process outlined by Matei was exactly about review, rather than fighting.
Nobody wants to spend their energy fighting. Everybody is doing it to
improve the project.


In particular, quoting you in your email

"Be careful here. Responsibility is pretty much a taboo word. All of
Apache is a group of volunteers. People can disappear at any point, which
is why you need multiple (as my fellow Director warned, on your private
list). And multiple people can disappear."

Take a look at this page: http://www.apache.org/dev/pmc.html

"This Project Management Committee Guide outlines the general
***responsibilities*** of PMC members in managing their projects."

Are you suggesting the wording used by the PMC guideline itself is taboo?






Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Vinod Kumar Vavilapalli
 With the maintainer model, the process is as follows:
 
 - Any committer could review the patch and merge it, but they would need to 
 forward it to me (or another core API maintainer) to make sure we also approve
 - At any point during this process, I could come in and -1 it, or give 
 feedback
 - In addition, any other committer beyond me is still allowed to -1 this patch
 
 The only change in this model is that committers are responsible to forward 
 patches in these areas to certain other committers. If every committer had 
 perfect oversight of the project, they could have also seen every patch to 
 their component on their own, but this list ensures that they see it even if 
 they somehow overlooked it.


Having played the role of an informal 'maintainer' of a project myself, here is 
what I think you really need:

The so-called 'maintainers' do one of the below:
 - Actively poll the lists and watch over contributions, following what is 
repeated often around here: trust, but verify.
 - Set up automated mechanisms to send all bug-tracker updates for a specific 
component to a list that people can subscribe to.

And/or
 - Individual contributors send review requests to unofficial 'maintainers' 
over dev-lists or through tooling, as many projects do with review boards and 
similar systems.

Note that none of the above is a required step. It must not be; that's the 
point. But once established as conventions, these practices will all help you 
address your concerns about project scalability.
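The second convention above (routing bug-tracker updates by component) can be sketched as a small filter. This is purely illustrative: the issue-payload shape, the component names, and the list addresses are assumptions for the example, not any real Spark or JIRA integration.

```python
# Hypothetical sketch: route a bug-tracker issue update to per-component
# mailing lists that interested committers can subscribe to.
COMPONENT_LISTS = {
    "MLlib": "mllib-issues@example.org",
    "SQL": "sql-issues@example.org",
    "Scheduler": "scheduler-issues@example.org",
}

def route_update(issue):
    """Return the mailing lists that should receive this issue update.

    `issue` is assumed to be a dict with a "components" key listing the
    components the issue touches; unknown components are simply skipped.
    """
    components = issue.get("components", [])
    return [COMPONENT_LISTS[c] for c in components if c in COMPONENT_LISTS]
```

An update tagged with both a watched and an unwatched component would then go only to the watched component's list, e.g. `route_update({"components": ["MLlib", "Core"]})`.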

Anything else that you add bestows privileges on a select few and forms 
dictatorships. And contrary to what the proposal claims, this is neither 
scalable nor conforming to Apache governance rules.

+Vinod



[VOTE] Designating maintainers for some Spark components

2014-11-05 Thread Matei Zaharia
Hi all,

I wanted to share a discussion we've been having on the PMC list, as well as 
call for an official vote on it on a public list. Basically, as the Spark 
project scales up, we need to define a model to make sure there is still great 
oversight of key components (in particular internal architecture and public 
APIs), and to this end I've proposed implementing a maintainer model for some 
of these components, similar to other large projects.

As background on this, Spark has grown a lot since joining Apache. We've had 
over 80 contributors/month for the past 3 months, which I believe makes us the 
most active project in contributors/month at Apache, as well as over 500 
patches/month. The codebase has also grown significantly, with new libraries 
for SQL, ML, graphs and more.

In this kind of large project, one common way to scale development is to assign 
maintainers to oversee key components, where each patch to that component 
needs to get sign-off from at least one of its maintainers. Most existing large 
projects do this -- at Apache, some large ones with this model are CloudStack 
(the second-most active project overall), Subversion, and Kafka, and other 
examples include Linux and Python. This is also by-and-large how Spark operates 
today -- most components have a de-facto maintainer.

IMO, adopting this model would have two benefits:

1) Consistent oversight of design for that component, especially regarding 
architecture and API. This process would ensure that the component's 
maintainers see all proposed changes and consider them to fit together in a 
good way.

2) More structure for new contributors and committers -- in particular, it 
would be easy to look up who’s responsible for each module and ask them for 
reviews, etc, rather than having patches slip between the cracks.

We'd like to start in a light-weight manner, where the model only applies 
to certain key components (e.g. scheduler, shuffle) and user-facing APIs 
(MLlib, GraphX, etc). Over time, as the project grows, we can expand it if we 
deem it useful. The specific mechanics would be as follows:

- Some components in Spark will have maintainers assigned to them, where one of 
the maintainers needs to sign off on each patch to the component.
- Each component with maintainers will have at least 2 maintainers.
- Maintainers will be assigned from the most active and knowledgeable 
committers on that component by the PMC. The PMC can vote to add / remove 
maintainers, and maintained components, through consensus.
- Maintainers are expected to be active in responding to patches for their 
components, though they do not need to be the main reviewers for them (e.g. 
they might just sign off on architecture / API). To prevent inactive 
maintainers from blocking the project, if a maintainer isn't responding in a 
reasonable time period (say 2 weeks), other committers can merge the patch, and 
the PMC will want to discuss adding another maintainer.
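
The mechanics above amount to a simple merge-eligibility rule, which can be sketched as follows. This is a hedged illustration only: the component names, the `can_merge` helper, and treating the "reasonable time period" as exactly two weeks are assumptions made for the example, not an actual Spark tool or policy text.

```python
from datetime import datetime, timedelta

# Hypothetical component -> maintainers map, drawn from the proposal's
# initial list; names and structure are illustrative assumptions.
MAINTAINERS = {
    "scheduler": {"matei", "kay", "patrick"},
    "mllib": {"xiangrui", "matei"},
}
REVIEW_TIMEOUT = timedelta(weeks=2)  # "reasonable time period (say 2 weeks)"

def can_merge(component, sign_offs, opened_at, now=None):
    """Decide whether a patch may be merged under the proposed model.

    A patch merges if (a) the component has no assigned maintainers,
    (b) at least one maintainer has signed off, or (c) maintainers have
    been unresponsive past the timeout, so other committers may merge.
    """
    now = now or datetime.utcnow()
    maintainers = MAINTAINERS.get(component)
    if not maintainers:
        return True                      # unmaintained component: normal review
    if maintainers & set(sign_offs):
        return True                      # at least one maintainer approved
    return now - opened_at > REVIEW_TIMEOUT  # inactive maintainers don't block
```

Note this only models the sign-off gate; it does not (and should not) model vetoes, since any committer can still -1 a patch at any point.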

If you'd like to see examples for this model, check out the following projects:
- CloudStack: https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
- Subversion: https://subversion.apache.org/docs/community-guide/roles.html

Finally, I wanted to list our current proposal for initial components and 
maintainers. It would be good to get feedback on other components we might add, 
but please note that personnel discussions (e.g. I don't think Matei should 
maintain *that* component) should only happen on the private list. The initial 
components were chosen to include all public APIs and the main core components, 
and the maintainers were chosen from the most active contributors to those 
modules.

- Spark core public API: Matei, Patrick, Reynold
- Job scheduler: Matei, Kay, Patrick
- Shuffle and network: Reynold, Aaron, Matei
- Block manager: Reynold, Aaron
- YARN: Tom, Andrew Or
- Python: Josh, Matei
- MLlib: Xiangrui, Matei
- SQL: Michael, Reynold
- Streaming: TD, Matei
- GraphX: Ankur, Joey, Reynold

I'd like to formally call a [VOTE] on this model, to last 72 hours. The [VOTE] 
will end on Nov 8, 2014 at 6 PM PST.

Matei

Re: [VOTE] Designating maintainers for some Spark components

2014-11-05 Thread Timothy Chen
Hi Matei,

Definitely in favor of moving into this model for exactly the reasons
you mentioned.

From the module list though, the module that I'm mostly involved with
and is not listed is the Mesos integration piece.

I believe we also need a maintainer for Mesos, and I wonder if there
is someone that can be added to that?

Tim




Re: [VOTE] Designating maintainers for some Spark components

2014-11-05 Thread Michael Armbrust
+1 (binding)

On Wed, Nov 5, 2014 at 5:33 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:

 BTW, my own vote is obviously +1 (binding).

 Matei




Re: [VOTE] Designating maintainers for some Spark components

2014-11-05 Thread Reynold Xin
+1 (binding)

We are already doing this implicitly. In my experience, this can create
longer term personal commitment, which usually leads to better design
decisions if somebody knows they would need to look after something for a
while.

On Wed, Nov 5, 2014 at 5:33 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:

 BTW, my own vote is obviously +1 (binding).

 Matei




Re: [VOTE] Designating maintainers for some Spark components

2014-11-05 Thread Nan Zhu
+1, with a question

Will these maintainers do a cleanup of the pending PRs once we start applying 
this model? There are some patches that have been open for a while without 
being merged; some are periodically maintained (rebased, pinged, etc.), while 
the others have just been phased out.

Best,  

--  
Nan Zhu


On Wednesday, November 5, 2014 at 8:33 PM, Matei Zaharia wrote:

 BTW, my own vote is obviously +1 (binding).
  
 Matei
  

Re: [VOTE] Designating maintainers for some Spark components

2014-11-05 Thread Matei Zaharia
Hi Tim,

We can definitely add one for that if the component grows larger or becomes 
harder to maintain. The main reason I didn't propose one is that the Mesos 
integration is actually a lot simpler than YARN at the moment, partly because 
we support several YARN versions that have incompatible APIs. But so far our 
modus operandi has been to ask Mesos contributors to review patches that touch 
it.

We didn't want to add a lot of components at the beginning partly to minimize 
overhead, but we can revisit it later. It would definitely be bad if we break 
Mesos support.

Matei


Re: [VOTE] Designating maintainers for some Spark components

2014-11-05 Thread Sandy Ryza
This seems like a good idea.

An area that wasn't listed, but that I think could strongly benefit from
maintainers, is the build.  Having consistent oversight over Maven, SBT,
and dependencies would allow us to avoid subtle breakages.

Component maintainers have come up several times within the Hadoop project,
and I think one of the main reasons the proposals have been rejected is
that, structurally, its effect is to slow down development.  As you
mention, this is somewhat mitigated if being a maintainer leads committers
to take on more responsibility, but it might be worthwhile to draw up more
specific ideas on how to combat this?  E.g. do obvious changes, doc fixes,
test fixes, etc. always require a maintainer?

-Sandy

On Wed, Nov 5, 2014 at 5:36 PM, Michael Armbrust mich...@databricks.com
wrote:

 +1 (binding)

 On Wed, Nov 5, 2014 at 5:33 PM, Matei Zaharia matei.zaha...@gmail.com
 wrote:

  BTW, my own vote is obviously +1 (binding).
 
  Matei
 

Re: [VOTE] Designating maintainers for some Spark components

2014-11-05 Thread Patrick Wendell
I'm a +1 on this as well, I think it will be a useful model as we
scale the project in the future and recognizes some informal process
we have now.

To respond to Sandy's comment: for changes that fall in between the
component boundaries or are straightforward, my understanding of this
model is you wouldn't need an explicit sign off. I think this is why
unlike some other projects, we wouldn't e.g. lock down permissions to
portions of the source tree. If some obvious fix needs to go in,
people should just merge it.

- Patrick


Re: [VOTE] Designating maintainers for some Spark components

2014-11-05 Thread Andrew Or
+1


Re: [VOTE] Designating maintainers for some Spark components

2014-11-05 Thread Prashant Sharma
+1, Sounds good.

Now I know whom to ping for what, even if I did not follow the whole
history of the project very carefully.

Prashant Sharma



On Thu, Nov 6, 2014 at 7:01 AM, Matei Zaharia matei.zaha...@gmail.com
wrote:

 Hi all,

 I wanted to share a discussion we've been having on the PMC list, as well
 as call for an official vote on it on a public list. Basically, as the
 Spark project scales up, we need to define a model to make sure there is
 still great oversight of key components (in particular internal
 architecture and public APIs), and to this end I've proposed implementing a
 maintainer model for some of these components, similar to other large
 projects.

 As background on this, Spark has grown a lot since joining Apache. We've
 had over 80 contributors/month for the past 3 months, which I believe makes
 us the most active project in contributors/month at Apache, as well as over
 500 patches/month. The codebase has also grown significantly, with new
 libraries for SQL, ML, graphs and more.

 In this kind of large project, one common way to scale development is to
 assign maintainers to oversee key components, where each patch to that
 component needs to get sign-off from at least one of its maintainers. Most
 existing large projects do this -- at Apache, some large ones with this
 model are CloudStack (the second-most active project overall), Subversion,
 and Kafka, and other examples include Linux and Python. This is also
 by-and-large how Spark operates today -- most components have a de-facto
 maintainer.

 IMO, adopting this model would have two benefits:

 1) Consistent oversight of design for that component, especially regarding
 architecture and API. This process would ensure that the component's
 maintainers see all proposed changes and consider them to fit together in a
 good way.

 2) More structure for new contributors and committers -- in particular, it
 would be easy to look up who’s responsible for each module and ask them for
 reviews, etc, rather than having patches slip between the cracks.

 We'd like to start with this in a light-weight manner, where the model only
 applies to certain key components (e.g. scheduler, shuffle) and user-facing
 APIs (MLlib, GraphX, etc). Over time, as the project grows, we can expand
 it if we deem it useful. The specific mechanics would be as follows:

 - Some components in Spark will have maintainers assigned to them, where
 one of the maintainers needs to sign off on each patch to the component.
 - Each component with maintainers will have at least 2 maintainers.
 - Maintainers will be assigned from the most active and knowledgeable
 committers on that component by the PMC. The PMC can vote to add / remove
 maintainers, and maintained components, through consensus.
 - Maintainers are expected to be active in responding to patches for their
 components, though they do not need to be the main reviewers for them (e.g.
 they might just sign off on architecture / API). To prevent inactive
 maintainers from blocking the project, if a maintainer isn't responding in
 a reasonable time period (say 2 weeks), other committers can merge the
 patch, and the PMC will want to discuss adding another maintainer.

 If you'd like to see examples for this model, check out the following
 projects:
 - CloudStack:
 https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
 - Subversion:
 https://subversion.apache.org/docs/community-guide/roles.html

 Finally, I wanted to list our current proposal for initial components and
 maintainers. It would be good to get feedback on other components we might
 add, but please note that personnel discussions (e.g. I don't think Matei
 should maintain *that* component) should only happen on the private list.
 The initial components were chosen to include all public APIs and the main
 core components, and the maintainers were chosen from the most active
 contributors to those modules.

 - Spark core public API: Matei, Patrick, Reynold
 - Job scheduler: Matei, Kay, Patrick
 - Shuffle and network: Reynold, Aaron, Matei
 - Block manager: Reynold, Aaron
 - YARN: Tom, Andrew Or
 - Python: Josh, Matei
 - MLlib: Xiangrui, Matei
 - SQL: Michael, Reynold
 - Streaming: TD, Matei
 - GraphX: Ankur, Joey, Reynold

 I'd like to formally call a [VOTE] on this model, to last 72 hours. The
 [VOTE] will end on Nov 8, 2014 at 6 PM PST.

 Matei


Re: [VOTE] Designating maintainers for some Spark components

2014-11-05 Thread Mark Hamstra
+1 (binding)

On Wed, Nov 5, 2014 at 6:29 PM, Nicholas Chammas nicholas.cham...@gmail.com
 wrote:

 +1 on this proposal.

 On Wed, Nov 5, 2014 at 8:55 PM, Nan Zhu zhunanmcg...@gmail.com wrote:

  Will these maintainers have a cleanup for those pending PRs upon we start
  to apply this model?


 I second Nan's question. I would like to see this initiative drive a
 reduction in the number of stale PRs we have out there. We're approaching
 300 open PRs again.

 Nick



Re: [VOTE] Designating maintainers for some Spark components

2014-11-05 Thread Xiangrui Meng
+1 (binding)


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Designating maintainers for some Spark components

2014-11-05 Thread Wangfei (X)
+1

Sent from my iPhone

 On Nov 5, 2014, at 20:06, Denny Lee denny.g@gmail.com wrote:
 
 +1 great idea.



Re: [VOTE] Designating maintainers for some Spark components

2014-11-05 Thread Cheng Lian
+1 since this is already the de facto model we are using.





Re: [VOTE] Designating maintainers for some Spark components

2014-11-05 Thread Jeremy Freeman
Great idea! +1

— Jeremy

-
jeremyfreeman.net
@thefreemanlab

On Nov 5, 2014, at 11:48 PM, Timothy Chen tnac...@gmail.com wrote:

 Matei that makes sense, +1 (non-binding)
 
 Tim
 
 



RE: [VOTE] Designating maintainers for some Spark components

2014-11-05 Thread Cheng, Hao
+1, that will definitely speed up PR reviewing / merging.





Re: [VOTE] Designating maintainers for some Spark components

2014-11-05 Thread jackylk
+1 Great idea!






Re: [VOTE] Designating maintainers for some Spark components

2014-11-05 Thread Kousuke Saruta

+1, It makes sense!

- Kousuke




Re: [VOTE] Designating maintainers for some Spark components

2014-11-05 Thread Reza Zadeh
+1, sounds good.

On Wed, Nov 5, 2014 at 9:19 PM, Kousuke Saruta saru...@oss.nttdata.co.jp
wrote:

 +1, It makes sense!

 - Kousuke


 (2014/11/05 17:31), Matei Zaharia wrote:

 Hi all,

 I wanted to share a discussion we've been having on the PMC list, as well
 as call for an official vote on it on a public list. Basically, as the
 Spark project scales up, we need to define a model to make sure there is
 still great oversight of key components (in particular internal
 architecture and public APIs), and to this end I've proposed implementing a
 maintainer model for some of these components, similar to other large
 projects.

 As background on this, Spark has grown a lot since joining Apache. We've
 had over 80 contributors/month for the past 3 months, which I believe makes
 us the most active project in contributors/month at Apache, as well as over
 500 patches/month. The codebase has also grown significantly, with new
 libraries for SQL, ML, graphs and more.

 In this kind of large project, one common way to scale development is to
 assign maintainers to oversee key components, where each patch to that
 component needs to get sign-off from at least one of its maintainers. Most
 existing large projects do this -- at Apache, some large ones with this
 model are CloudStack (the second-most active project overall), Subversion,
 and Kafka, and other examples include Linux and Python. This is also
 by-and-large how Spark operates today -- most components have a de-facto
 maintainer.

 IMO, adopting this model would have two benefits:

 1) Consistent oversight of design for that component, especially
 regarding architecture and API. This process would ensure that the
 component's maintainers see all proposed changes and consider them to fit
 together in a good way.

 2) More structure for new contributors and committers -- in particular,
 it would be easy to look up who’s responsible for each module and ask them
 for reviews, etc, rather than having patches slip between the cracks.

 We'd like to start in a light-weight manner, where the model only
 applies to certain key components (e.g. scheduler, shuffle) and user-facing
 APIs (MLlib, GraphX, etc). Over time, as the project grows, we can expand
 it if we deem it useful. The specific mechanics would be as follows:

 - Some components in Spark will have maintainers assigned to them, where
 one of the maintainers needs to sign off on each patch to the component.
 - Each component with maintainers will have at least 2 maintainers.
 - Maintainers will be assigned from the most active and knowledgeable
 committers on that component by the PMC. The PMC can vote to add / remove
 maintainers, and maintained components, through consensus.
 - Maintainers are expected to be active in responding to patches for
 their components, though they do not need to be the main reviewers for them
 (e.g. they might just sign off on architecture / API). To prevent inactive
 maintainers from blocking the project, if a maintainer isn't responding in
 a reasonable time period (say 2 weeks), other committers can merge the
 patch, and the PMC will want to discuss adding another maintainer.

 If you'd like to see examples for this model, check out the following
 projects:
 - CloudStack: https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
 - Subversion: https://subversion.apache.org/docs/community-guide/roles.html

 Finally, I wanted to list our current proposal for initial components and
 maintainers. It would be good to get feedback on other components we might
 add, but please note that personnel discussions (e.g. I don't think Matei
 should maintain *that* component) should only happen on the private list.
 The initial components were chosen to include all public APIs and the main
 core components, and the maintainers were chosen from the most active
 contributors to those modules.

 - Spark core public API: Matei, Patrick, Reynold
 - Job scheduler: Matei, Kay, Patrick
 - Shuffle and network: Reynold, Aaron, Matei
 - Block manager: Reynold, Aaron
 - YARN: Tom, Andrew Or
 - Python: Josh, Matei
 - MLlib: Xiangrui, Matei
 - SQL: Michael, Reynold
 - Streaming: TD, Matei
 - GraphX: Ankur, Joey, Reynold

 I'd like to formally call a [VOTE] on this model, to last 72 hours. The
 [VOTE] will end on Nov 8, 2014 at 6 PM PST.

 Matei







Re: [VOTE] Designating maintainers for some Spark components

2014-11-05 Thread Xuefeng Wu
+1, it brings more focus and more consistency.
 

Yours respectfully, Xuefeng Wu 吴雪峰

 On Nov 6, 2014, at 9:31 AM, Matei Zaharia matei.zaha...@gmail.com wrote:
 
 Hi all,
 
 I wanted to share a discussion we've been having on the PMC list, as well as 
 call for an official vote on it on a public list. Basically, as the Spark 
 project scales up, we need to define a model to make sure there is still 
 great oversight of key components (in particular internal architecture and 
 public APIs), and to this end I've proposed implementing a maintainer model 
 for some of these components, similar to other large projects.
 




Re: [VOTE] Designating maintainers for some Spark components

2014-11-05 Thread Matei Zaharia
Several people asked about having maintainers review the PR queue for their 
modules regularly, and I like that idea. We have a new tool now to help with 
that in https://spark-prs.appspot.com.

In terms of the set of open PRs itself, it is large but note that there are 
also 2800 *closed* PRs, which means we close the majority of PRs (and I don't 
know the exact stats but I'd guess that 90% of those are accepted and merged). 
I think one problem is that with GitHub, people often develop something as a PR 
and have a lot of discussion on there (including whether we even want the 
feature). I recently updated our how to contribute page to encourage opening 
a JIRA and having discussions on the dev list first, but I do think we need to 
be faster with closing ones that we don't have a plan to merge. Note that 
Hadoop, Hive, HBase, etc also have about 300 issues each in the patch 
available state, so this is some kind of universal constant :P.

Matei


 On Nov 5, 2014, at 10:46 PM, Sean Owen so...@cloudera.com wrote:
 
 Naturally, this sounds great. FWIW my only but significant worry about
 Spark is scaling up to meet unprecedented demand in the form of
 questions and contributions. Clarifying responsibility and ownership
 helps more than it hurts by adding process.
 
 This is a related but different topic, but I wonder out loud what this
 can do to help clear the backlog -- ~*1200* open JIRAs and ~300 open
 PRs, most of which have de facto already fallen between some cracks.
 This harms the usefulness of these tools and processes.
 
 I'd love to see this translate into triage / closing of most of it by
 maintainers, and new actions and strategies for increasing
 'throughput' in review and/or helping people make better contributions
 in the first place.
 
 On Thu, Nov 6, 2014 at 1:31 AM, Matei Zaharia matei.zaha...@gmail.com wrote:
 Hi all,
 
 I wanted to share a discussion we've been having on the PMC list, as well as 
 call for an official vote on it on a public list. Basically, as the Spark 
 project scales up, we need to define a model to make sure there is still 
 great oversight of key components (in particular internal architecture and 
 public APIs), and to this end I've proposed implementing a maintainer model 
 for some of these components, similar to other large projects.





Re: [VOTE] Designating maintainers for some Spark components

2014-11-05 Thread Manoj Babu
+1

Cheers!
Manoj.

On Thu, Nov 6, 2014 at 12:51 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:

 Several people asked about having maintainers review the PR queue for
 their modules regularly, and I like that idea. We have a new tool now to
 help with that in https://spark-prs.appspot.com.





Re: [VOTE] Designating maintainers for some Spark components

2014-11-05 Thread Liquan Pei
+1

Liquan

On Wed, Nov 5, 2014 at 11:32 PM, Manoj Babu manoj...@gmail.com wrote:

 +1

 Cheers!
 Manoj.

 On Thu, Nov 6, 2014 at 12:51 PM, Matei Zaharia matei.zaha...@gmail.com
 wrote:

  Several people asked about having maintainers review the PR queue for
  their modules regularly, and I like that idea. We have a new tool now to
  help with that in https://spark-prs.appspot.com.
 



-- 
Liquan Pei
Department of Physics
University of Massachusetts Amherst