Re: Flume Graduation (was Re: June reports in two weeks)

2012-05-24 Thread Ralph Goers

On May 23, 2012, at 10:48 PM, Patrick Hunt wrote:

 On Wed, May 23, 2012 at 10:36 PM, Ralph Goers
 ralph.go...@dslextreme.com wrote:
 
 On May 23, 2012, at 10:15 PM, Benson Margulies wrote:
 
 On Wed, May 23, 2012 at 10:09 PM, Ralph Goers
 ralph.go...@dslextreme.com wrote:
 Right after I read Jukka's email that started this thread and posted my 
 reply, I discovered to my shock that they had started a graduation vote.  
 I am shocked because I have pointed out repeatedly the project's complete 
 lack of diversity.  Virtually all the active PMC members and committers 
 work for the same employer.  I have told them several times that I would 
 actually like to participate in the project but the way the project works 
 is very different than every other project I am involved with at the ASF 
 and the barriers to figuring out what is actually going on are very high. 
 Almost nothing is discussed directly on the dev list - it is all done 
 through Jira issues or the Review tool.  While all the Jira issue updates 
 and reviews are sent to the dev list most of that is just noise.  Feel 
 free to review the dev list archives to see what I am talking about.
 
 I don't follow flume, but I'd propose to soften your objection only
 slightly. I've met other groups of people who like a JIRA centric view
 of the world. I suspect that if they did a bunch of other good things
 called out below, you or others would find the JIRA business
 digestible. Also, on the other hand, I fear that the co-employed
 contributors are collaborating in the hallway, and the lack of the
 context in JIRA or on the list is contributing to the problem.
 
 I have reason to doubt the collaboration in the hallway aspect and I 
 certainly do not doubt everyone's good intent.  I'm not objecting to the 
 collaboration style as an issue preventing graduation. I'm just saying I 
 find it difficult to participate with that style and that simply makes me 
 wonder if that is making it harder to attract new committers.  I fully 
 realize that that issue might just be with me, but the fact remains that 
 there is practically no diversity in the project and I cannot in good 
 conscience recommend graduation for a project in that situation.
 
 
 Hi Ralph, Benson, et al., some background:
 
 Flume is similar to Hadoop and other related projects in that it is
 very jira heavy for development activity. No slouch in terms of
 mailing list traffic either though (1200 last month):
 http://flume.markmail.org/
 
 Also note the extensive new developer type detail that's available
 on the web/wiki:
 https://cwiki.apache.org/confluence/display/FLUME/Index
 
 The team list can provide insight into the diversity issue:
 http://incubator.apache.org/flume/team-list.html My understanding is
 that there are at least 4 separate organizations represented by active
 committers.
 

The team list is incorrect and somewhat misleading.  To my knowledge, people 
listed under at least two of the separate organizations in that list are now 
employed by Cloudera.  Others signed on when the project entered the incubator 
but have never participated.  This all became clear to me during the last 
release vote when, as I recall, I cast the only binding vote that didn't come 
from a Cloudera employee.

Ralph




 Regards,
 
 Patrick
 
 
 
 Needless to say, when the graduation proposal reaches this list, and I'm 
 sure it will, I will strongly urge the IPMC to reject the proposal.
 
 FWIW, I found the post below to be 100% on target.
 
 Ralph
 
 
 
 On May 23, 2012, at 7:31 PM, Marvin Humphrey wrote:
 
 On Wed, May 23, 2012 at 5:36 PM, Patrick Hunt ph...@apache.org wrote:
 Perhaps someone will have some insight on how to gather new
 contributors that hasn't been tried yet?
 
 Jukka's written on this subject multiple times in the past.  Here are two
 gems, one from a while back, the other recent:
 
http://markmail.org/message/o3gbgam4ny2upqte
 
Most of the cases I've been involved so far of podlings in the hoping
some more people come along have had symptoms of the project team not
paying enough attention on making it easy for new contributors to show up
and stick around. Things like complex and undocumented build steps,
missing Getting started or Getting involved guides, lack of quick and
positive feedback to newcomers, etc., are all too common. Fixing even just
some of such things will dramatically increase the odds of new people
showing up.

Those are things that are very easy to overlook when you're working on
your first open source projects (it took me years to learn those lessons),
but we here have a massive amount of collective experience on such things.
That's what we could and IMHO should be sharing with the podlings. That's
what mentoring to me is about and that's where our most precious added
value is. Otherwise incubation just boils down to an indoctrination
period on how to apply and conform to the various Apache 

Re: Flume Graduation (was Re: June reports in two weeks)

2012-05-24 Thread Ralph Goers

On May 23, 2012, at 10:48 PM, Patrick Hunt wrote:

 On Wed, May 23, 2012 at 10:36 PM, Ralph Goers
 ralph.go...@dslextreme.com wrote:
 
 On May 23, 2012, at 10:15 PM, Benson Margulies wrote:
 
 On Wed, May 23, 2012 at 10:09 PM, Ralph Goers
 ralph.go...@dslextreme.com wrote:
 Right after I read Jukka's email that started this thread and posted my 
 reply, I discovered to my shock that they had started a graduation vote.  
 I am shocked because I have pointed out repeatedly the project's complete 
 lack of diversity.  Virtually all the active PMC members and committers 
 work for the same employer.  I have told them several times that I would 
 actually like to participate in the project but the way the project works 
 is very different than every other project I am involved with at the ASF 
 and the barriers to figuring out what is actually going on are very high. 
 Almost nothing is discussed directly on the dev list - it is all done 
 through Jira issues or the Review tool.  While all the Jira issue updates 
 and reviews are sent to the dev list most of that is just noise.  Feel 
 free to review the dev list archives to see what I am talking about.
 
 I don't follow flume, but I'd propose to soften your objection only
 slightly. I've met other groups of people who like a JIRA centric view
 of the world. I suspect that if they did a bunch of other good things
 called out below, you or others would find the JIRA business
 digestible. Also, on the other hand, I fear that the co-employed
 contributors are collaborating in the hallway, and the lack of the
 context in JIRA or on the list is contributing to the problem.
 
 I have reason to doubt the collaboration in the hallway aspect and I 
 certainly do not doubt everyone's good intent.  I'm not objecting to the 
 collaboration style as an issue preventing graduation. I'm just saying I 
 find it difficult to participate with that style and that simply makes me 
 wonder if that is making it harder to attract new committers.  I fully 
 realize that that issue might just be with me, but the fact remains that 
 there is practically no diversity in the project and I cannot in good 
 conscience recommend graduation for a project in that situation.
 
 
 Hi Ralph, Benson, et al., some background:
 
 Flume is similar to Hadoop and other related projects in that it is
 very jira heavy for development activity. No slouch in terms of
 mailing list traffic either though (1200 last month):
 http://flume.markmail.org/

Sorry I didn't include this in my prior post but here you are making my point 
exactly.  I participate in several other Apache projects. Wading through 1200+ 
emails per month that are largely Jira/Review noise makes it very difficult for 
me to find posts that have any value. As a consequence I am largely forced to 
simply delete everything generated by the Review tool and Jira.  And I'm a 
mentor. I just don't see how newcomers are going to find this style welcoming.

Ralph




-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: Flume Graduation (was Re: June reports in two weeks)

2012-05-24 Thread Patrick Hunt
On Wed, May 23, 2012 at 11:18 PM, Ralph Goers
ralph.go...@dslextreme.com wrote:

 On May 23, 2012, at 10:48 PM, Patrick Hunt wrote:

 On Wed, May 23, 2012 at 10:36 PM, Ralph Goers
 ralph.go...@dslextreme.com wrote:

 On May 23, 2012, at 10:15 PM, Benson Margulies wrote:

 On Wed, May 23, 2012 at 10:09 PM, Ralph Goers
 ralph.go...@dslextreme.com wrote:
 Right after I read Jukka's email that started this thread and posted my 
 reply, I discovered to my shock that they had started a graduation vote. 
  I am shocked because I have pointed out repeatedly the project's 
 complete lack of diversity.  Virtually all the active PMC members and 
 committers work for the same employer.  I have told them several times 
 that I would actually like to participate in the project but the way the 
 project works is very different than every other project I am involved 
 with at the ASF and the barriers to figuring out what is actually going on 
 are very high. Almost nothing is discussed directly on the dev list - it 
 is all done through Jira issues or the Review tool.  While all the Jira 
 issue updates and reviews are sent to the dev list most of that is just 
 noise.  Feel free to review the dev list archives to see what I am 
 talking about.

 I don't follow flume, but I'd propose to soften your objection only
 slightly. I've met other groups of people who like a JIRA centric view
 of the world. I suspect that if they did a bunch of other good things
 called out below, you or others would find the JIRA business
 digestible. Also, on the other hand, I fear that the co-employed
 contributors are collaborating in the hallway, and the lack of the
 context in JIRA or on the list is contributing to the problem.

 I have reason to doubt the collaboration in the hallway aspect and I 
 certainly do not doubt everyone's good intent.  I'm not objecting to the 
 collaboration style as an issue preventing graduation. I'm just saying I 
 find it difficult to participate with that style and that simply makes me 
 wonder if that is making it harder to attract new committers.  I fully 
 realize that that issue might just be with me, but the fact remains that 
 there is practically no diversity in the project and I cannot in good 
 conscience recommend graduation for a project in that situation.


 Hi Ralph, Benson, et al., some background:

 Flume is similar to Hadoop and other related projects in that it is
 very jira heavy for development activity. No slouch in terms of
 mailing list traffic either though (1200 last month):
 http://flume.markmail.org/

 Sorry I didn't include this in my prior post but here you are making my point 
 exactly.  I participate in several other Apache projects. Wading through 
 1200+ emails per month that are largely Jira/Review noise makes it very 
 difficult for me to find posts that have any value. As a consequence I am 
 largely forced to simply delete everything generated by the Review tool and 
 Jira.  And I'm a mentor. I just don't see how newcomers are going to find 
 this style welcoming.

There are separate lists; it's just that markmail clubs them all
together. It's also pretty easy to filter...

Patrick




Re: Flume Graduation (was Re: June reports in two weeks)

2012-05-24 Thread Marvin Humphrey
On Wed, May 23, 2012 at 10:36 PM, Ralph Goers
ralph.go...@dslextreme.com wrote:
 I'm just saying I find it difficult to participate with that style and that
 simply makes me wonder if that is making it harder to attract new
 committers.

I suspect it attracts some and drives away others.

I frikkin' hate JIRA notifications.  The emails suck, so newcomers are forced
to learn JIRA's interface before they can participate fully in the dev
conversation.  To me that seems like it raises a barrier to entry -- but then,
there are numerous projects around the ASF who are not hurting for
contributors and who use JIRA for *everything* -- starting with Hadoop and
Lucene.

If you don't want JIRA-centric development, it can be curtailed by sending
notifications to a dedicated issues list instead of the dev list.  However,
I would not necessarily recommend that to a new podling, as I can't tell where
my own biases end and I don't want to start a phpBB^H^H^H^H^HJIRA vs email
flame war.

-- Marvin Humphrey, who in moments of weakness fantasizes about the day when
Infra can no longer keep the massive ASF JIRA instance from toppling over and
all the Java projects whose participation in Infra is limited to complaining
when stuff goes offline come crying.




Re: Flume Graduation (was Re: June reports in two weeks)

2012-05-24 Thread Eric Sammer
I appreciate your position Ralph and I don't want anyone to feel like they
can't contribute. As we've talked about before, we've been quick to nurture
new contributors to committer status successfully in a few cases. It's true
that some of the more active committers are from Cloudera, but it's not to
the exclusion of anyone. Others aren't from Cloudera. Those of us that work
together are also very strict about abiding by the "if it's not on the
mailing list, it didn't happen" rule (where "mailing list" can mean JIRA or
other ASF infrastructure as well).

I'm happy to take your guidance as a mentor, but you also need to
understand that some of the ways the Flume project has elected to operate
are just a matter of taste. They were proposed, discussed, voted on (and
not as a block by Cloudera employees, IIRC - pretty sure I was -0), and put
in place and do not violate the Apache Way (like RTC vs. CTR). They aren't
unheard of and they do not work to the exclusion of contributors (RTC, for
instance, only impacts committers). I think the vote that was started was
only to gauge community opinion as a first step (although I'm not
completely well versed in the graduation process, to be honest).

If there are concrete things we can do to improve diversity, in your
opinion, I am extremely open to hearing them. We already do many of the
(excellent) things listed earlier in the thread. JIRA noise notwithstanding
(again, it's a matter of taste - I use the email frequently as I find
trolling through JIRA slow) I'm definitely open to ideas. Of course, if
Flume simply needs to remain in the incubator until we develop greater
diversity, that's fine too. If we're not ready, we're just not ready.

On Wed, May 23, 2012 at 11:18 PM, Ralph Goers ralph.go...@dslextreme.com wrote:


 On May 23, 2012, at 10:48 PM, Patrick Hunt wrote:

  On Wed, May 23, 2012 at 10:36 PM, Ralph Goers
  ralph.go...@dslextreme.com wrote:
 
  On May 23, 2012, at 10:15 PM, Benson Margulies wrote:
 
  On Wed, May 23, 2012 at 10:09 PM, Ralph Goers
  ralph.go...@dslextreme.com wrote:
  Right after I read Jukka's email that started this thread and
 posted my reply, I discovered to my shock that they had started a
 graduation vote.  I am shocked because I have pointed out repeatedly the
 project's complete lack of diversity.  Virtually all the active PMC members
 and committers work for the same employer.  I have told them several times
 that I would actually like to participate in the project but the way the
 project works is very different than every other project I am involved with
 at the ASF and the barriers to figuring out what is actually going on are very
 high. Almost nothing is discussed directly on the dev list - it is all done
 through Jira issues or the Review tool.  While all the Jira issue updates
 and reviews are sent to the dev list most of that is just noise.  Feel free
 to review the dev list archives to see what I am talking about.
 
  I don't follow flume, but I'd propose to soften your objection only
  slightly. I've met other groups of people who like a JIRA centric view
  of the world. I suspect that if they did a bunch of other good things
  called out below, you or others would find the JIRA business
  digestible. Also, on the other hand, I fear that the co-employed
  contributors are collaborating in the hallway, and the lack of the
  context in JIRA or on the list is contributing to the problem.
 
  I have reason to doubt the collaboration in the hallway aspect and I
 certainly do not doubt everyone's good intent.  I'm not objecting to the
 collaboration style as an issue preventing graduation. I'm just saying I
 find it difficult to participate with that style and that simply makes me
 wonder if that is making it harder to attract new committers.  I fully
 realize that that issue might just be with me, but the fact remains that
 there is practically no diversity in the project and I cannot in good
 conscience recommend graduation for a project in that situation.
 
 
  Hi Ralph, Benson, et al., some background:
 
  Flume is similar to Hadoop and other related projects in that it is
  very jira heavy for development activity. No slouch in terms of
  mailing list traffic either though (1200 last month):
  http://flume.markmail.org/

 Sorry I didn't include this in my prior post but here you are making my
 point exactly.  I participate in several other Apache projects. Wading
 through 1200+ emails per month that are largely Jira/Review noise makes it
 very difficult for me to find posts that have any value. As a consequence I
 am largely forced to simply delete everything generated by the Review tool
 and Jira.  And I'm a mentor. I just don't see how newcomers are going to
 find this style welcoming.

 Ralph








-- 
Eric Sammer
twitter: 

Re: Flume Graduation (was Re: June reports in two weeks)

2012-05-24 Thread Ralph Goers
The ONLY issue I see for Flume to graduate is diversity.  No one will convince 
me that the current makeup constitutes diversity of any kind.  

Perhaps I shouldn't have brought up the mailing list issues as that was only 
meant in the spirit of trying to offer some advice on how more diversity could 
be achieved.  Flume is really the only community I participate in that contains 
Cloudera employees so I do find myself wondering if the way the project is run 
is because that is the way all projects with a large number of Cloudera 
employees are run.  That might make all of those participants comfortable but 
might create a barrier to others.

In any case - I'm not insisting that the way the project is run needs to 
change. I'm simply saying I cannot support graduation with the current makeup 
of the committers and PMC. I don't have a hard and fast ratio - gaining 10 new 
unaffiliated committers who don't do much isn't nearly as good as 2 or 3 who 
are very active.  Ultimately the project needs to figure out how to solve this.


Ralph


On May 23, 2012, at 11:48 PM, Eric Sammer wrote:

 I appreciate your position Ralph and I don't want anyone to feel like they
 can't contribute. As we've talked about before, we've been quick to nurture
 new contributors to committer status successfully in a few cases. It's true
 that some of the more active committers are from Cloudera, but it's not to
 the exclusion of anyone. Others aren't from Cloudera. Those of us that work
 together are also very strict about abiding by the "if it's not on the
 mailing list, it didn't happen" rule (where "mailing list" can mean JIRA or
 other ASF infrastructure as well).
 
 I'm happy to take your guidance as a mentor, but you also need to
 understand that some of the ways the Flume project has elected to operate
 are just a matter of taste. They were proposed, discussed, voted on (and
 not as a block by Cloudera employees, IIRC - pretty sure I was -0), and put
 in place and do not violate the Apache Way (like RTC vs. CTR). They aren't
 unheard of and they do not work to the exclusion of contributors (RTC, for
 instance, only impacts committers). I think the vote that was started was
 only to gauge community opinion as a first step (although I'm not
 completely well versed in the graduation process, to be honest).
 
 If there are concrete things we can do to improve diversity, in your
 opinion, I am extremely open to hearing them. We already do many of the
 (excellent) things listed earlier in the thread. JIRA noise notwithstanding
 (again, it's a matter of taste - I use the email frequently as I find
 trolling through JIRA slow) I'm definitely open to ideas. Of course, if
 Flume simply needs to remain in the incubator until we develop greater
 diversity, that's fine too. If we're not ready, we're just not ready.
 
 On Wed, May 23, 2012 at 11:18 PM, Ralph Goers 
 ralph.go...@dslextreme.com wrote:
 
 
 On May 23, 2012, at 10:48 PM, Patrick Hunt wrote:
 
 On Wed, May 23, 2012 at 10:36 PM, Ralph Goers
 ralph.go...@dslextreme.com wrote:
 
 On May 23, 2012, at 10:15 PM, Benson Margulies wrote:
 
 On Wed, May 23, 2012 at 10:09 PM, Ralph Goers
 ralph.go...@dslextreme.com wrote:
 Right after I read Jukka's email that started this thread and
 posted my reply, I discovered to my shock that they had started a
 graduation vote.  I am shocked because I have pointed out repeatedly the
 project's complete lack of diversity.  Virtually all the active PMC members
 and committers work for the same employer.  I have told them several times
 that I would actually like to participate in the project but the way the
 project works is very different than every other project I am involved with
 at the ASF and the barriers to figuring out what is actually going on are very
 high. Almost nothing is discussed directly on the dev list - it is all done
 through Jira issues or the Review tool.  While all the Jira issue updates
 and reviews are sent to the dev list most of that is just noise.  Feel free
 to review the dev list archives to see what I am talking about.
 
 I don't follow flume, but I'd propose to soften your objection only
 slightly. I've met other groups of people who like a JIRA centric view
 of the world. I suspect that if they did a bunch of other good things
 called out below, you or others would find the JIRA business
 digestible. Also, on the other hand, I fear that the co-employed
 contributors are collaborating in the hallway, and the lack of the
 context in JIRA or on the list is contributing to the problem.
 
 I have reason to doubt the collaboration in the hallway aspect and I
 certainly do not doubt everyone's good intent.  I'm not objecting to the
 collaboration style as an issue preventing graduation. I'm just saying I
 find it difficult to participate with that style and that simply makes me
 wonder if that is making it harder to attract new committers.  I fully
 realize that that issue might just be with me, but the fact remains that
 there is 

Re: [VOTE] Accept Crunch into the Apache Incubator

2012-05-24 Thread Bertrand Delacretaz
 [X ] +1, bring Crunch into Incubator

-Bertrand




Re: [VOTE] Accept Crunch into the Apache Incubator

2012-05-24 Thread Tommaso Teofili
+1

Tommaso

2012/5/23 Josh Wills jwi...@cloudera.com

 I would like to call a vote for accepting Apache Crunch for
 incubation in the Apache Incubator. The full proposal is available
 below.  We ask the Incubator PMC to sponsor it, with phunt as
 Champion, and phunt, tomwhite, and acmurthy volunteering to be
 Mentors.

 Please cast your vote:

 [ ] +1, bring Crunch into Incubator
 [ ] +0, I don't care either way,
 [ ] -1, do not bring Crunch into Incubator, because...

 This vote will be open for 72 hours and only votes from the Incubator
 PMC are binding.

 http://wiki.apache.org/incubator/CrunchProposal

 Proposal text from the wiki:

 --
 = Crunch - Easy, Efficient MapReduce Pipelines in Java and Scala =

 == Abstract ==

 Crunch is a Java library for writing, testing, and running pipelines
 of !MapReduce jobs on Apache Hadoop.

 == Proposal ==

 Crunch is a Java library for writing, testing, and running pipelines
 of !MapReduce jobs on Apache Hadoop. Its main goal is to provide a
 high-level API for writing and testing complex !MapReduce jobs that
 require multiple processing stages.  It has a simple, flexible, and
 extensible data model that makes it ideal for processing data that
 does not naturally fit into a relational structure, such as time
 series and serialized object formats like JSON and Avro. It supports
 running pipelines either as a series of !MapReduce jobs on an Apache
 Hadoop cluster or in memory on a single machine for fast testing and
 debugging.

 == Background ==

 Crunch was initially developed by Cloudera to simplify the process of
 creating sequences of dependent !MapReduce jobs, especially jobs that
 processed non-relational data like time series. Its design was based
 on a paper Google published about a Java library they developed called
 !FlumeJava that was created in order to solve a similar class of
 problems. Crunch was open-sourced by Cloudera on !GitHub as an Apache
 2.0 licensed project in October 2011. During this time Crunch has been
 formally released twice, as versions 0.1.0 (October 2011) and 0.2.0
 (February 2012), with an incremental update to version 0.2.1 (March
 2012).  These releases are also distributed by Cloudera as source and
 binaries from Cloudera's Maven repository.

 == Rationale ==

 Most of the interesting analytical and data processing tasks that are
 run on an Apache Hadoop cluster require a series of !MapReduce jobs to
 be executed in sequence. Developers who are creating these pipelines
 today need to manually assign the sequence of tasks to perform in a
 dependent chain of !MapReduce jobs, even though there are a number of
 well-known patterns for fusing dependent computations together into a
 single !MapReduce stage and for performing common types of joins and
 aggregations. This results in !MapReduce pipelines that are more
 difficult to test, maintain, and extend to support new functionality.

 Furthermore, the type of data that is being stored and processed using
 Apache Hadoop is evolving. Although Hadoop was originally used for
 storing large volumes of structured text in the form of webpages and
 log files, it is now common for Hadoop to store complex, structured
 data formats such as JSON, Apache Avro, and Apache Thrift. These
 formats allow developers to work with serialized objects in
 programming languages like Java, C++, and Python, and allow for new
 types of analysis to be performed on complex data types. Hadoop has
 also been adopted by the scientific research community, who are using
 Hadoop to process time series data, structured binary files in the
 HDF5 format, and large medical and satellite images.

 Crunch addresses these challenges by providing a lightweight and
 extensible Java API for defining the stages of a data processing
 pipeline, which can then be run on an Apache Hadoop cluster as a
 sequence of dependent !MapReduce jobs, or in-memory on a single
 machine to facilitate fast testing and debugging. Crunch relies on a
 small set of primitive abstractions that represent immutable,
 distributed collections of objects. Developers define functions that
 are applied to those objects in order to generate new immutable,
 distributed collections of objects. Crunch also provides a library of
 common !MapReduce patterns for performing efficient joins and
 aggregation operations over these distributed collections that
 developers may integrate into their own pipelines. Crunch also
 provides native support for processing structured binary data formats
 like JSON, Apache Avro, and Apache Thrift, and is designed to be
 extensible to support working with any kind of data format that Java
 supports in its native form.

 == Initial Goals ==

 Crunch is currently in its first major release with a considerable
 number of enhancement requests, tasks, and issues recorded towards its
 future development. The initial goal 

[RESULT] [VOTE] Release Apache Wookie 0.10.0-incubating (General Incubation List)

2012-05-24 Thread Scott Wilson
The 72 hour voting period has passed and the vote is now closed. Thanks to 
everyone who took time to review the release. 

With three IPMC member votes (all 3 of them from mentors) and 3 PPMC votes, 
the vote succeeds.

IPMC Member voting record:

* Ate Douma: +1
* Ross Gardler: +1
* Matt Franklin: +1

* Denotes an IPMC member vote cast on the wookie-dev list.

Thanks,

Scott.

On 21 May 2012, at 16:13, Scott Wilson wrote:

 This is the third incubator release for Apache Wookie, with the artifacts
 being versioned as 0.10.0-incubating.
 
 We are requesting a lazy consensus vote, as we have already received 3
 binding IPMC +1 votes during the release voting on wookie-dev -
 
 Vote thread:
 http://markmail.org/message/2p4veen6n22w7hnb
 
 Result:
 http://markmail.org/message/d2jzbrdgic3od5uj
 
 Svn source tag:
 https://svn.apache.org/repos/asf/incubator/wookie/tags/0.10.0-incubating/
 
 Release notes:
 https://svn.apache.org/repos/asf/incubator/wookie/tags/0.10.0-incubating/RELEASE_NOTES
 
 Release artifacts:
 http://people.apache.org/builds/incubator/wookie/0.10.0-incubating/
 
 Maven artifacts
 https://repository.apache.org/content/repositories/orgapachewookie-094/
 
 PGP release keys:
 https://svn.apache.org/repos/asf/incubator/wookie/KEYS
 
 Lazy consensus, vote open for 72 hours.
 
 [ ] +1  approve
 [ ] +0  no opinion
 [ ] -1  disapprove (and reason why) 
 
 





RE: [VOTE] Accept Crunch into the Apache Incubator

2012-05-24 Thread Franklin, Matthew B.
+1 (binding)

-----Original Message-----
From: Josh Wills [mailto:jwi...@cloudera.com]
Sent: Wednesday, May 23, 2012 2:46 PM
To: general@incubator.apache.org
Subject: [VOTE] Accept Crunch into the Apache Incubator

I would like to call a vote for accepting Apache Crunch for
incubation in the Apache Incubator. The full proposal is available
below.  We ask the Incubator PMC to sponsor it, with phunt as
Champion, and phunt, tomwhite, and acmurthy volunteering to be
Mentors.

Please cast your vote:

[ ] +1, bring Crunch into Incubator
[ ] +0, I don't care either way,
[ ] -1, do not bring Crunch into Incubator, because...

This vote will be open for 72 hours and only votes from the Incubator
PMC are binding.

http://wiki.apache.org/incubator/CrunchProposal

Proposal text from the wiki:
---
---
= Crunch - Easy, Efficient MapReduce Pipelines in Java and Scala =

== Abstract ==

Crunch is a Java library for writing, testing, and running pipelines
of !MapReduce jobs on Apache Hadoop.

== Proposal ==

Crunch is a Java library for writing, testing, and running pipelines
of !MapReduce jobs on Apache Hadoop. Its main goal is to provide a
high-level API for writing and testing complex !MapReduce jobs that
require multiple processing stages.  It has a simple, flexible, and
extensible data model that makes it ideal for processing data that
does not naturally fit into a relational structure, such as time
series and serialized object formats like JSON and Avro. It supports
running pipelines either as a series of !MapReduce jobs on an Apache
Hadoop cluster or in memory on a single machine for fast testing and
debugging.

== Background ==

Crunch was initially developed by Cloudera to simplify the process of
creating sequences of dependent !MapReduce jobs, especially jobs that
processed non-relational data like time series. Its design was based
on a paper Google published about a Java library they developed called
!FlumeJava that was created in order to solve a similar class of
problems. Crunch was open-sourced by Cloudera on !GitHub as an Apache
2.0 licensed project in October 2011. Since then, Crunch has been
formally released twice, as versions 0.1.0 (October 2011) and 0.2.0
(February 2012), with an incremental update to version 0.2.1 (March
2012). These releases are also distributed by Cloudera as source and
binaries from Cloudera's Maven repository.

== Rationale ==

Most of the interesting analytical and data processing tasks that are
run on an Apache Hadoop cluster require a series of !MapReduce jobs to
be executed in sequence. Developers who are creating these pipelines
today need to manually assign the sequence of tasks to perform in a
dependent chain of !MapReduce jobs, even though there are a number of
well-known patterns for fusing dependent computations together into a
single !MapReduce stage and for performing common types of joins and
aggregations. This results in !MapReduce pipelines that are more
difficult to test, maintain, and extend to support new functionality.
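
The fusion idea mentioned above can be illustrated with a small, self-contained Java sketch. It is not Crunch's API: the class name FusionSketch and the methods runStage and runFused are hypothetical, and it assumes Java 8 or later for java.util.function. Two dependent per-record functions are either run as two passes with an intermediate list (standing in for two chained jobs) or composed and run as a single pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; none of these names come from Crunch.
public class FusionSketch {

    // One "stage": a full pass over the input that materializes its output,
    // standing in for a separate job that writes intermediate results.
    public static <A, B> List<B> runStage(List<A> input, Function<A, B> fn) {
        List<B> out = new ArrayList<>();
        for (A record : input) {
            out.add(fn.apply(record));
        }
        return out;
    }

    // Fused version: compose the two dependent functions and make one pass,
    // so no intermediate collection is produced between them.
    public static <A, B, C> List<C> runFused(
            List<A> input, Function<A, B> first, Function<B, C> second) {
        return runStage(input, first.andThen(second));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(" a,1 ", " b,22 ");
        Function<String, String> trim = String::trim;
        Function<String, Integer> length = String::length;

        List<Integer> twoPasses = runStage(runStage(lines, trim), length);
        List<Integer> onePass = runFused(lines, trim, length);

        // Both approaches compute the same result; only the number of passes differs.
        System.out.println(twoPasses.equals(onePass));
    }
}
```

A real planner would also have to decide where a shuffle forces a stage boundary; this sketch only shows the map-side composition.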

Furthermore, the type of data that is being stored and processed using
Apache Hadoop is evolving. Although Hadoop was originally used for
storing large volumes of structured text in the form of webpages and
log files, it is now common for Hadoop to store complex, structured
data formats such as JSON, Apache Avro, and Apache Thrift. These
formats allow developers to work with serialized objects in
programming languages like Java, C++, and Python, and allow for new
types of analysis to be performed on complex data types. Hadoop has
also been adopted by the scientific research community, who are using
Hadoop to process time series data, structured binary files in the
HDF5 format, and large medical and satellite images.

Crunch addresses these challenges by providing a lightweight and
extensible Java API for defining the stages of a data processing
pipeline, which can then be run on an Apache Hadoop cluster as a
sequence of dependent !MapReduce jobs, or in-memory on a single
machine to facilitate fast testing and debugging. Crunch relies on a
small set of primitive abstractions that represent immutable,
distributed collections of objects. Developers define functions that
are applied to those objects in order to generate new immutable,
distributed collections of objects. Crunch also provides a library of
common !MapReduce patterns for performing efficient joins and
aggregation operations over these distributed collections that
developers may integrate into their own pipelines. Crunch also
provides native support for processing structured data formats
like JSON, Apache Avro, and Apache Thrift, and is designed to be
extensible to support working with any kind of data format that Java
supports in its native form.
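
As a rough illustration of the model described above (immutable collections, with functions that produce new immutable collections), here is a minimal in-memory Java sketch. The class ImmutableCollectionSketch and its methods of, apply, and asList are hypothetical names chosen for this example; they are not taken from Crunch, and the sketch assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for an immutable collection of objects; not Crunch's API.
public final class ImmutableCollectionSketch<T> {
    private final List<T> items;

    private ImmutableCollectionSketch(List<T> items) {
        // Defensive copy plus an unmodifiable view, so contents cannot change later.
        this.items = Collections.unmodifiableList(new ArrayList<>(items));
    }

    @SafeVarargs
    public static <T> ImmutableCollectionSketch<T> of(T... values) {
        return new ImmutableCollectionSketch<>(Arrays.asList(values));
    }

    // Applying a function never mutates this collection; it returns a new one.
    public <R> ImmutableCollectionSketch<R> apply(Function<? super T, ? extends R> fn) {
        List<R> out = new ArrayList<>();
        for (T item : items) {
            out.add(fn.apply(item));
        }
        return new ImmutableCollectionSketch<>(out);
    }

    public List<T> asList() {
        return items;
    }

    public static void main(String[] args) {
        ImmutableCollectionSketch<String> words = of("time", "series");
        ImmutableCollectionSketch<Integer> lengths = words.apply(String::length);
        System.out.println(words.asList() + " -> " + lengths.asList());
    }
}
```

Because every step returns a new collection, the same chain of functions can in principle be evaluated in memory for testing or handed to a planner for cluster execution, which is the property the paragraph above relies on.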

== Initial Goals ==

Crunch is currently in its first major release with a considerable
number of enhancement 

Re: [VOTE] Accept Crunch into the Apache Incubator

2012-05-24 Thread Benson Margulies
+1 (binding) ...

And a friendly reminder to the PPMC via their mentors, in response
to that email about limiting the initial committers to a tight group:
they will soon learn that the big challenge of incubation is not
writing a lot of code, it's recruiting new faces. They'll want to
switch from 'just us chickens' to putting out the welcome mat as soon
as possible.

On Thu, May 24, 2012 at 2:29 AM, Bertrand Delacretaz
bdelacre...@apache.org wrote:
 [X ] +1, bring Crunch into Incubator

 -Bertrand






[ANNOUNCE] Apache Wink 1.2.0-incubating release

2012-05-24 Thread Luciano Resende
The Apache Wink team is pleased to announce the release of Apache Wink
1.2.0-incubating.

Apache Wink is a simple yet solid framework for building RESTful Web
services. It comprises a Server module and a Client module for
developing and consuming RESTful Web services.

The Wink Server module is a complete implementation of the JAX-RS v1.1
specification. On top of this implementation, the Wink Server module
provides a set of additional features that were designed to facilitate
the development of RESTful Web services.

The Wink Client module is a Java-based framework that provides
functionality for communicating with RESTful Web services. The
framework is built on top of the JDK HttpURLConnection and adds
essential features that facilitate the development of such client
applications.

For full details about the release and to download the distributions
please go to:

http://incubator.apache.org/wink/downloads.html

Apache Wink welcomes your help. Any contribution, including code,
testing, contributions to the documentation, or bug reporting is
always appreciated. For more information on how to get involved in
Apache Wink visit the website at:

http://incubator.apache.org/wink/

Thank you for your interest in Apache Wink!

The Apache Wink Team.




Re: [VOTE] Accept Crunch into the Apache Incubator

2012-05-24 Thread Mike Percy
[x] +1, bring Crunch into Incubator (non-binding)

Regards,
Mike


On Wednesday, May 23, 2012 at 11:45 AM, Josh Wills wrote:

 I would like to call a vote for accepting Apache Crunch for
 incubation in the Apache Incubator. The full proposal is available
 below. We ask the Incubator PMC to sponsor it, with phunt as
 Champion, and phunt, tomwhite, and acmurthy volunteering to be
 Mentors.

 Please cast your vote:

 [ ] +1, bring Crunch into Incubator
 [ ] +0, I don't care either way,
 [ ] -1, do not bring Crunch into Incubator, because...

 This vote will be open for 72 hours and only votes from the Incubator
 PMC are binding.

 http://wiki.apache.org/incubator/CrunchProposal

 Proposal text from the wiki:
 --

 = Crunch - Easy, Efficient MapReduce Pipelines in Java and Scala =

 == Abstract ==

 Crunch is a Java library for writing, testing, and running pipelines
 of !MapReduce jobs on Apache Hadoop.

 == Proposal ==

 Crunch is a Java library for writing, testing, and running pipelines
 of !MapReduce jobs on Apache Hadoop. Its main goal is to provide a
 high-level API for writing and testing complex !MapReduce jobs that
 require multiple processing stages. It has a simple, flexible, and
 extensible data model that makes it ideal for processing data that
 does not naturally fit into a relational structure, such as time
 series and serialized object formats like JSON and Avro. It supports
 running pipelines either as a series of !MapReduce jobs on an Apache
 Hadoop cluster or in memory on a single machine for fast testing and
 debugging.

 == Background ==

 Crunch was initially developed by Cloudera to simplify the process of
 creating sequences of dependent !MapReduce jobs, especially jobs that
 processed non-relational data like time series. Its design was based
 on a paper Google published about a Java library they developed called
 !FlumeJava that was created in order to solve a similar class of
 problems. Crunch was open-sourced by Cloudera on !GitHub as an Apache
 2.0 licensed project in October 2011. Since then, Crunch has been
 formally released twice, as versions 0.1.0 (October 2011) and 0.2.0
 (February 2012), with an incremental update to version 0.2.1 (March
 2012). These releases are also distributed by Cloudera as source and
 binaries from Cloudera's Maven repository.

 == Rationale ==

 Most of the interesting analytical and data processing tasks that are
 run on an Apache Hadoop cluster require a series of !MapReduce jobs to
 be executed in sequence. Developers who are creating these pipelines
 today need to manually assign the sequence of tasks to perform in a
 dependent chain of !MapReduce jobs, even though there are a number of
 well-known patterns for fusing dependent computations together into a
 single !MapReduce stage and for performing common types of joins and
 aggregations. This results in !MapReduce pipelines that are more
 difficult to test, maintain, and extend to support new functionality.

 Furthermore, the type of data that is being stored and processed using
 Apache Hadoop is evolving. Although Hadoop was originally used for
 storing large volumes of structured text in the form of webpages and
 log files, it is now common for Hadoop to store complex, structured
 data formats such as JSON, Apache Avro, and Apache Thrift. These
 formats allow developers to work with serialized objects in
 programming languages like Java, C++, and Python, and allow for new
 types of analysis to be performed on complex data types. Hadoop has
 also been adopted by the scientific research community, who are using
 Hadoop to process time series data, structured binary files in the
 HDF5 format, and large medical and satellite images.

 Crunch addresses these challenges by providing a lightweight and
 extensible Java API for defining the stages of a data processing
 pipeline, which can then be run on an Apache Hadoop cluster as a
 sequence of dependent !MapReduce jobs, or in-memory on a single
 machine to facilitate fast testing and debugging. Crunch relies on a
 small set of primitive abstractions that represent immutable,
 distributed collections of objects. Developers define functions that
 are applied to those objects in order to generate new immutable,
 distributed collections of objects. Crunch also provides a library of
 common !MapReduce patterns for performing efficient joins and
 aggregation operations over these distributed collections that
 developers may integrate into their own pipelines. Crunch also
 provides native support for processing structured data formats
 like JSON, Apache Avro, and Apache Thrift, and is designed to be
 extensible to support working with any kind of data format that Java
 supports in its native form.

 == Initial Goals ==

 Crunch is currently in its first major release with a considerable
 number of enhancement requests, tasks, 

Re: [VOTE] Accept Crunch into the Apache Incubator

2012-05-24 Thread Arun C Murthy
+1 (binding)

On May 23, 2012, at 11:45 AM, Josh Wills wrote:

 I would like to call a vote for accepting Apache Crunch for
 incubation in the Apache Incubator. The full proposal is available
 below.  We ask the Incubator PMC to sponsor it, with phunt as
 Champion, and phunt, tomwhite, and acmurthy volunteering to be
 Mentors.
 
 Please cast your vote:
 
 [ ] +1, bring Crunch into Incubator
 [ ] +0, I don't care either way,
 [ ] -1, do not bring Crunch into Incubator, because...
 
 This vote will be open for 72 hours and only votes from the Incubator
 PMC are binding.
 
 http://wiki.apache.org/incubator/CrunchProposal
 
 Proposal text from the wiki:
 --
 = Crunch - Easy, Efficient MapReduce Pipelines in Java and Scala =
 
 == Abstract ==
 
 Crunch is a Java library for writing, testing, and running pipelines
 of !MapReduce jobs on Apache Hadoop.
 
 == Proposal ==
 
 Crunch is a Java library for writing, testing, and running pipelines
 of !MapReduce jobs on Apache Hadoop. Its main goal is to provide a
 high-level API for writing and testing complex !MapReduce jobs that
 require multiple processing stages.  It has a simple, flexible, and
 extensible data model that makes it ideal for processing data that
 does not naturally fit into a relational structure, such as time
 series and serialized object formats like JSON and Avro. It supports
 running pipelines either as a series of !MapReduce jobs on an Apache
 Hadoop cluster or in memory on a single machine for fast testing and
 debugging.
 
 == Background ==
 
 Crunch was initially developed by Cloudera to simplify the process of
 creating sequences of dependent !MapReduce jobs, especially jobs that
 processed non-relational data like time series. Its design was based
 on a paper Google published about a Java library they developed called
 !FlumeJava that was created in order to solve a similar class of
 problems. Crunch was open-sourced by Cloudera on !GitHub as an Apache
 2.0 licensed project in October 2011. Since then, Crunch has been
 formally released twice, as versions 0.1.0 (October 2011) and 0.2.0
 (February 2012), with an incremental update to version 0.2.1 (March
 2012). These releases are also distributed by Cloudera as source and
 binaries from Cloudera's Maven repository.
 
 == Rationale ==
 
 Most of the interesting analytical and data processing tasks that are
 run on an Apache Hadoop cluster require a series of !MapReduce jobs to
 be executed in sequence. Developers who are creating these pipelines
 today need to manually assign the sequence of tasks to perform in a
 dependent chain of !MapReduce jobs, even though there are a number of
 well-known patterns for fusing dependent computations together into a
 single !MapReduce stage and for performing common types of joins and
 aggregations. This results in !MapReduce pipelines that are more
 difficult to test, maintain, and extend to support new functionality.
 
 Furthermore, the type of data that is being stored and processed using
 Apache Hadoop is evolving. Although Hadoop was originally used for
 storing large volumes of structured text in the form of webpages and
 log files, it is now common for Hadoop to store complex, structured
 data formats such as JSON, Apache Avro, and Apache Thrift. These
 formats allow developers to work with serialized objects in
 programming languages like Java, C++, and Python, and allow for new
 types of analysis to be performed on complex data types. Hadoop has
 also been adopted by the scientific research community, who are using
 Hadoop to process time series data, structured binary files in the
 HDF5 format, and large medical and satellite images.
 
 Crunch addresses these challenges by providing a lightweight and
 extensible Java API for defining the stages of a data processing
 pipeline, which can then be run on an Apache Hadoop cluster as a
 sequence of dependent !MapReduce jobs, or in-memory on a single
 machine to facilitate fast testing and debugging. Crunch relies on a
 small set of primitive abstractions that represent immutable,
 distributed collections of objects. Developers define functions that
 are applied to those objects in order to generate new immutable,
 distributed collections of objects. Crunch also provides a library of
 common !MapReduce patterns for performing efficient joins and
 aggregation operations over these distributed collections that
 developers may integrate into their own pipelines. Crunch also
 provides native support for processing structured data formats
 like JSON, Apache Avro, and Apache Thrift, and is designed to be
 extensible to support working with any kind of data format that Java
 supports in its native form.
 
 == Initial Goals ==
 
 Crunch is currently in its first major release with a considerable
 number of enhancement requests, tasks, and issues recorded towards its
 future 

Re: Flume Graduation (was Re: June reports in two weeks)

2012-05-24 Thread Eric Sammer
On May 24, 2012, at 12:20 AM, Ralph Goers ralph.go...@dslextreme.com wrote:

 The ONLY issue I see for Flume to graduate is diversity.  No one will 
 convince me that the current makeup constitutes diversity of any kind.

 Perhaps I shouldn't have brought up the mailing list issues as that was only 
 meant in the spirit of trying to offer some advice on how more diversity 
 could be achieved.  Flume is really the only community I participate in that 
 contains Cloudera employees so I do find myself wondering if the way the 
 project is run is because that is the way all projects with a large number of 
 Cloudera employees are run.  That might make all of those participants 
 comfortable but might create a barrier to others.

There are other projects where this is the case, and they are easy to
point to. There's an obvious (to me) implication that this is the cause
of the problem, and that's simply not true. If there are concrete
recommendations of things you feel we can do better, I know the Flume
community is open to those suggestions. To my knowledge, there's no
practice in place within Flume that isn't also in place in some other
ASF TLP.


 In any case - I'm not insisting that the way the project is run needs to 
 change. I'm simply saying I cannot support graduation with the current makeup 
 of the committers and PMC. I don't have a hard and fast ratio - gaining 10 
 new unaffiliated committers who don't do much isn't nearly as good as 2 or 3 
 who are very active.  Ultimately the project needs to figure out how to solve 
 this.

That's fine. So let's have a discussion about actionable tasks. I've
mentioned my thoughts on growing diversity in the past, although
admittedly it was within a response to a similar thread on our private
list. I'll start a thread on our dev list with the same thoughts for
the larger community to comment on. I welcome your contribution to
such a discussion!

Thanks.



 Ralph


 On May 23, 2012, at 11:48 PM, Eric Sammer wrote:

 I appreciate your position Ralph and I don't want anyone to feel like they
 can't contribute. As we've talked about before, we've been quick to nurture
 new contributors to committer status successfully in a few cases. It's true
 that some of the more active committers are from Cloudera, but it's not to
 the exclusion of anyone. Others aren't from Cloudera. Those of us who work
 together are also very strict about abiding by the "if it's not on the
 mailing list, it didn't happen" rule (where "mailing list" can mean JIRA or
 other ASF infrastructure as well).

 I'm happy to take your guidance as a mentor, but you also need to
 understand that some of the ways the Flume project has elected to operate
 are just a matter of taste. They were proposed, discussed, voted on (and
 not as a bloc by Cloudera employees, IIRC - pretty sure I was -0), and put
 in place and do not violate the Apache Way (like RTC vs. CTR). They aren't
 unheard of and they do not work to the exclusion of contributors (RTC, for
 instance, only impacts committers). I think the vote that was started was
 only to gauge community opinion as a first step (although I'm not
 completely well versed in the graduation process, to be honest).

 If there are concrete things we can do to improve diversity, in your
 opinion, I am extremely open to hearing them. We already do many of the
 (excellent) things listed earlier in the thread. JIRA noise notwithstanding
 (again, it's a matter of taste - I use the email frequently as I find
 trolling through JIRA slow) I'm definitely open to ideas. Of course, if
 Flume simply needs to remain in the incubator until we develop greater
 diversity, that's fine too. If we're not ready, we're just not ready.

 On Wed, May 23, 2012 at 11:18 PM, Ralph Goers 
 ralph.go...@dslextreme.com wrote:


 On May 23, 2012, at 10:48 PM, Patrick Hunt wrote:

 On Wed, May 23, 2012 at 10:36 PM, Ralph Goers
 ralph.go...@dslextreme.com wrote:

 On May 23, 2012, at 10:15 PM, Benson Margulies wrote:

 On Wed, May 23, 2012 at 10:09 PM, Ralph Goers
 ralph.go...@dslextreme.com wrote:
 Right after I read Jukka's email that started this thread, I posted my
 reply and discovered to my shock that they had started a graduation vote.
 I am shocked because I have pointed out repeatedly the project's complete
 lack of diversity.  Virtually all the active PMC members and committers
 work for the same employer.  I have told them several times that I would
 actually like to participate in the project but the way the project works
 is very different from every other project I am involved with at the ASF
 and the barriers to figuring out what is actually going on are very high.
 Almost nothing is discussed directly on the dev list - it is all done
 through Jira issues or the Review tool.  While all the Jira issue updates
 and reviews are sent to the dev list most of that is just noise.  Feel free
 to review the dev list archives to see what I am talking about.

 I don't follow flume, but I'd propose to soften your 

Re: Flume Graduation (was Re: June reports in two weeks)

2012-05-24 Thread Arvind Prabhakar
Hi,

On Thu, May 24, 2012 at 12:19 AM, Ralph Goers ralph.go...@dslextreme.com wrote:

 The ONLY issue I see for Flume to graduate is diversity.  No one will
 convince me that the current makeup constitutes diversity of any kind.

 Perhaps I shouldn't have brought up the mailing list issues as that was
 only meant in the spirit of trying to offer some advice on how more
 diversity could be achieved.  Flume is really the only community I
 participate in that contains Cloudera employees so I do find myself
 wondering if the way the project is run is because that is the way all
 projects with a large number of Cloudera employees are run.  That might
 make all of those participants comfortable but might create a barrier to
 others.


Here are the committers who have been active in the past three months:

* Brock Noland (Cloudera)
* Hari Shreedharan  (Cloudera)
* Jarek Jarcec Cecho (AVG Technologies)
* Juhani Connolly   (CyberAgent)
* Mike Percy (Cloudera)
* Mingjie Lai (Trend Micro)
* Prasad Mujumdar (Cloudera)
* Will McQueen (Cloudera)
* Arvind Prabhakar (Cloudera)

There are four companies represented in this list: AVG Technologies,
Cloudera, CyberAgent and Trend Micro. Compared to other projects that have
successfully graduated from Incubator in the past, this meets the diversity
requirements very well.



 In any case - I'm not insisting that the way the project is run needs to
 change. I'm simply saying I cannot support graduation with the current
 makeup of the committers and PMC. I don't have a hard and fast ratio -
 gaining 10 new unaffiliated committers who don't do much isn't nearly as
 good as 2 or 3 who are very active.  Ultimately the project needs to figure
 out how to solve this.


Saying that committers "who don't do much" are not "nearly as good as 2
or 3 who are very active" is an unfair characterization. It is especially
unfair to those who are part of the project and have not been active
lately, for whatever reason, but who played a foundational role in
getting the project to where it is today. I think they are as
important as any other committer who may be very active at the moment.
Merit, once earned, never expires [1].

[1] http://www.apache.org/dev/committers.html#committer-set-term

Arvind



 Ralph


 On May 23, 2012, at 11:48 PM, Eric Sammer wrote:

  I appreciate your position Ralph and I don't want anyone to feel like they
  can't contribute. As we've talked about before, we've been quick to nurture
  new contributors to committer status successfully in a few cases. It's true
  that some of the more active committers are from Cloudera, but it's not to
  the exclusion of anyone. Others aren't from Cloudera. Those of us who work
  together are also very strict about abiding by the "if it's not on the
  mailing list, it didn't happen" rule (where "mailing list" can mean JIRA or
  other ASF infrastructure as well).
 
  I'm happy to take your guidance as a mentor, but you also need to
  understand that some of the ways the Flume project has elected to operate
  are just a matter of taste. They were proposed, discussed, voted on (and
  not as a bloc by Cloudera employees, IIRC - pretty sure I was -0), and put
  in place and do not violate the Apache Way (like RTC vs. CTR). They aren't
  unheard of and they do not work to the exclusion of contributors (RTC, for
  instance, only impacts committers). I think the vote that was started was
  only to gauge community opinion as a first step (although I'm not
  completely well versed in the graduation process, to be honest).
 
  If there are concrete things we can do to improve diversity, in your
  opinion, I am extremely open to hearing them. We already do many of the
  (excellent) things listed earlier in the thread. JIRA noise notwithstanding
  (again, it's a matter of taste - I use the email frequently as I find
  trolling through JIRA slow) I'm definitely open to ideas. Of course, if
  Flume simply needs to remain in the incubator until we develop greater
  diversity, that's fine too. If we're not ready, we're just not ready.
 
  On Wed, May 23, 2012 at 11:18 PM, Ralph Goers ralph.go...@dslextreme.com wrote:
 
 
  On May 23, 2012, at 10:48 PM, Patrick Hunt wrote:
 
  On Wed, May 23, 2012 at 10:36 PM, Ralph Goers
  ralph.go...@dslextreme.com wrote:
 
  On May 23, 2012, at 10:15 PM, Benson Margulies wrote:
 
  On Wed, May 23, 2012 at 10:09 PM, Ralph Goers
  ralph.go...@dslextreme.com wrote:
  Right after I read Jukka's email that started this thread, I posted my
  reply and discovered to my shock that they had started a graduation vote.
  I am shocked because I have pointed out repeatedly the project's complete
  lack of diversity.  Virtually all the active PMC members and committers
  work for the same employer.  I have told them several times that I would
  actually like to participate in the project but the way the project works
  is very different from every other project I am involved with at the ASF
  and the barriers 

Re: Flume Graduation (was Re: June reports in two weeks)

2012-05-24 Thread Ralph Goers

On May 24, 2012, at 10:40 AM, Arvind Prabhakar wrote:

 Hi,
 
 On Thu, May 24, 2012 at 12:19 AM, Ralph Goers ralph.go...@dslextreme.com wrote:
 
 The ONLY issue I see for Flume to graduate is diversity.  No one will
 convince me that the current makeup constitutes diversity of any kind.
 
 Perhaps I shouldn't have brought up the mailing list issues as that was
 only meant in the spirit of trying to offer some advice on how more
 diversity could be achieved.  Flume is really the only community I
 participate in that contains Cloudera employees so I do find myself
 wondering if the way the project is run is because that is the way all
 projects with a large number of Cloudera employees are run.  That might
 make all of those participants comfortable but might create a barrier to
 others.
 
 
 Here are the committers who have been active in the past three months:
 
 * Brock Noland (Cloudera)
 * Hari Shreedharan  (Cloudera)
 * Jarek Jarcec Cecho (AVG Technologies)
 * Juhani Connolly   (CyberAgent)
 * Mike Percy (Cloudera)
 * Mingjie Lai (Trend Micro)
 * Prasad Mujumdar (Cloudera)
 * Will McQueen (Cloudera)
 * Arvind Prabhakar (Cloudera)
 
 There are four companies represented in this list: AVG Technologies,
 Cloudera, CyberAgent and Trend Micro. Compared to other projects that have
 successfully graduated from Incubator in the past, this meets the diversity
 requirements very well.

I was mistaken and the list above is indeed correct.  For some reason I thought 
a couple of them had become Cloudera employees.  

However, none of those three are currently on the PPMC.  When you look at the 
PPMC list you should also include a few more Cloudera people who do participate 
in release votes and PPMC issues. Most, if not all, of the non-Cloudera PMC 
members don't.



 
 
 
 In any case - I'm not insisting that the way the project is run needs to
 change. I'm simply saying I cannot support graduation with the current
 makeup of the committers and PMC. I don't have a hard and fast ratio -
 gaining 10 new unaffiliated committers who don't do much isn't nearly as
 good as 2 or 3 who are very active.  Ultimately the project needs to figure
 out how to solve this.
 
 
 Saying that committers "who don't do much" are not "nearly as good as 2
 or 3 who are very active" is an unfair characterization. It is especially
 unfair to those who are part of the project and have not been active
 lately, for whatever reason, but who played a foundational role in
 getting the project to where it is today. I think they are as
 important as any other committer who may be very active at the moment.
 Merit, once earned, never expires [1].
 
 [1] http://www.apache.org/dev/committers.html#committer-set-term

I think you misunderstood my point or I didn't state it very well.  Diversity 
isn't achieved simply by having bodies.  IOW I am not suggesting offering 
commit rights to people who haven't earned them just to meet some ratio.  
However, I am not suggesting the project has ever even considered doing that.

Ralph 






Re: [VOTE] Accept Crunch into the Apache Incubator

2012-05-24 Thread Doug Cutting

+1

Doug

On 05/23/2012 11:45 AM, Josh Wills wrote:

I would like to call a vote for accepting Apache Crunch for
incubation in the Apache Incubator. The full proposal is available
below.  We ask the Incubator PMC to sponsor it, with phunt as
Champion, and phunt, tomwhite, and acmurthy volunteering to be
Mentors.

Please cast your vote:

[ ] +1, bring Crunch into Incubator
[ ] +0, I don't care either way,
[ ] -1, do not bring Crunch into Incubator, because...

This vote will be open for 72 hours and only votes from the Incubator
PMC are binding.

http://wiki.apache.org/incubator/CrunchProposal

Proposal text from the wiki:
--
= Crunch - Easy, Efficient MapReduce Pipelines in Java and Scala =

== Abstract ==

Crunch is a Java library for writing, testing, and running pipelines
of !MapReduce jobs on Apache Hadoop.

== Proposal ==

Crunch is a Java library for writing, testing, and running pipelines
of !MapReduce jobs on Apache Hadoop. Its main goal is to provide a
high-level API for writing and testing complex !MapReduce jobs that
require multiple processing stages.  It has a simple, flexible, and
extensible data model that makes it ideal for processing data that
does not naturally fit into a relational structure, such as time
series and serialized object formats like JSON and Avro. It supports
running pipelines either as a series of !MapReduce jobs on an Apache
Hadoop cluster or in memory on a single machine for fast testing and
debugging.

== Background ==

Crunch was initially developed by Cloudera to simplify the process of
creating sequences of dependent !MapReduce jobs, especially jobs that
processed non-relational data like time series. Its design was based
on a paper Google published about a Java library they developed called
!FlumeJava that was created in order to solve a similar class of
problems. Crunch was open-sourced by Cloudera on !GitHub as an Apache
2.0 licensed project in October 2011. Since then, Crunch has been
formally released twice, as versions 0.1.0 (October 2011) and 0.2.0
(February 2012), with an incremental update to version 0.2.1 (March
2012). These releases are also distributed by Cloudera as source and
binaries from Cloudera's Maven repository.

== Rationale ==

Most of the interesting analytical and data processing tasks that are
run on an Apache Hadoop cluster require a series of !MapReduce jobs to
be executed in sequence. Developers who are creating these pipelines
today need to manually assign the sequence of tasks to perform in a
dependent chain of !MapReduce jobs, even though there are a number of
well-known patterns for fusing dependent computations together into a
single !MapReduce stage and for performing common types of joins and
aggregations. This results in !MapReduce pipelines that are more
difficult to test, maintain, and extend to support new functionality.

Furthermore, the type of data that is being stored and processed using
Apache Hadoop is evolving. Although Hadoop was originally used for
storing large volumes of structured text in the form of webpages and
log files, it is now common for Hadoop to store complex, structured
data formats such as JSON, Apache Avro, and Apache Thrift. These
formats allow developers to work with serialized objects in
programming languages like Java, C++, and Python, and allow for new
types of analysis to be performed on complex data types. Hadoop has
also been adopted by the scientific research community, who are using
Hadoop to process time series data, structured binary files in the
HDF5 format, and large medical and satellite images.

Crunch addresses these challenges by providing a lightweight and
extensible Java API for defining the stages of a data processing
pipeline, which can then be run on an Apache Hadoop cluster as a
sequence of dependent !MapReduce jobs, or in-memory on a single
machine to facilitate fast testing and debugging. Crunch relies on a
small set of primitive abstractions that represent immutable,
distributed collections of objects. Developers define functions that
are applied to those objects in order to generate new immutable,
distributed collections of objects. Crunch also provides a library of
common !MapReduce patterns for performing efficient joins and
aggregation operations over these distributed collections that
developers may integrate into their own pipelines. Crunch also
provides native support for processing structured data formats
like JSON, Apache Avro, and Apache Thrift, and is designed to be
extensible to support working with any kind of data format that Java
supports in its native form.
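
The collection model described above can be illustrated with a minimal
in-memory sketch. It is written in Python for brevity and is not Crunch's
Java API: the class name InMemoryCollection and the methods parallel_do,
count, and materialize are hypothetical names chosen for this
illustration. It only shows the idea of applying a function to an
immutable collection to produce a new immutable collection, followed by a
simple aggregation.

```python
from collections import Counter
from typing import Callable, Generic, Iterable, Tuple, TypeVar

S = TypeVar("S")
T = TypeVar("T")


class InMemoryCollection(Generic[S]):
    """Hypothetical stand-in for an immutable collection of objects."""

    def __init__(self, items: Iterable[S]) -> None:
        # A tuple cannot be modified after creation.
        self._items: Tuple[S, ...] = tuple(items)

    def parallel_do(self, fn: Callable[[S], Iterable[T]]) -> "InMemoryCollection[T]":
        # Apply fn to every element; each call may emit zero or more outputs.
        # The result is a new collection; self is left unchanged.
        return InMemoryCollection(out for item in self._items for out in fn(item))

    def count(self) -> dict:
        # A common aggregation: occurrences of each distinct element.
        return dict(Counter(self._items))

    def materialize(self) -> Tuple[S, ...]:
        # Expose the elements for inspection (assumed helper, for testing).
        return self._items


# Word count expressed as two stages: split lines into words, then count.
lines = InMemoryCollection(["a b", "b c b"])
words = lines.parallel_do(lambda line: line.split())
word_counts = words.count()
```

In this sketch everything runs on a single machine, which corresponds
only to the in-memory testing mode mentioned above; planning the stages
into !MapReduce jobs is outside its scope.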

== Initial Goals ==

Crunch is currently in its first major release with a considerable
number of enhancement requests, tasks, and issues recorded towards its
future development. The initial goal of this project will be to
continue to build community in the spirit of the Apache 

Re: Flume Graduation (was Re: June reports in two weeks)

2012-05-24 Thread Dave Fisher

On May 24, 2012, at 11:49 AM, Ralph Goers wrote:

 
 On May 24, 2012, at 10:40 AM, Arvind Prabhakar wrote:
 
 Hi,
 
 On Thu, May 24, 2012 at 12:19 AM, Ralph Goers 
 ralph.go...@dslextreme.com wrote:
 
 The ONLY issue I see for Flume to graduate is diversity.  No one will
 convince me that the current makeup constitutes diversity of any kind.
 
 Perhaps I shouldn't have brought up the mailing list issues as that was
 only meant in the spirit of trying to offer some advice on how more
 diversity could be achieved.  Flume is really the only community I
 participate in that contains Cloudera employees so I do find myself
 wondering if the way the project is run is because that is the way all
 projects with a large number of Cloudera employees are run.  That might
 make all of those participants comfortable but might create a barrier to
 others.
 
 
 Here are the committers who have been active in the past three months:
 
 * Brock Noland (Cloudera)
 * Hari Shreedharan  (Cloudera)
 * Jarek Jarcec Cecho (AVG Technologies)
 * Juhani Connolly   (CyberAgent)
 * Mike Percy (Cloudera)
 * Mingjie Lai (Trend Micro)
 * Prasad Mujumdar (Cloudera)
 * Will McQueen (Cloudera)
 * Arvind Prabhakar (Cloudera)
 
 There are four companies represented in this list: AVG Technologies,
 Cloudera, CyberAgent and Trend Micro. Compared to other projects that have
 successfully graduated from Incubator in the past, this meets the diversity
 requirements very well.
 
 I was mistaken and the list above is indeed correct.  For some reason I 
 thought a couple of them had become Cloudera employees.  
 
 However, none of those three are currently on the PPMC.  When you look at the 
 PPMC list you should also include a few more Cloudera people who do 
 participate in release votes and PPMC issues. Most, if not all, of the 
 non-Cloudera PMC members don't.

I started reading some of the Flume website. When you go to the main Wiki page:

https://cwiki.apache.org/confluence/display/FLUME/Index

and click on the Flume Cookbook, the resource is at cloudera.org:

http://archive.cloudera.com/cdh/3/flume/Cookbook/

This page lists flume-...@cloudera.org and is a file with a revision dated 
May 7, 2012.

You can draw your own conclusions, but it looks like podling resources need to 
be migrated to the ASF.

Regards,
Dave

 
 
 
 
 
 
 In any case - I'm not insisting that the way the project is run needs to
 change. I'm simply saying I cannot support graduation with the current
 makeup of the committers and PMC. I don't have a hard and fast ratio -
 gaining 10 new unaffiliated committers who don't do much isn't nearly as
 good as 2 or 3 who are very active.  Ultimately the project needs to figure
 out how to solve this.
 
 
 Stating that some committers who don't do much isn't nearly as good as 2
 or 3 who are very active is an unfair characterization. This is more
 unfair for those who are part of the project but have not been active
 lately due to whatever reasons, but have played a foundational role in
 getting the project to a point where it is today. I think they are as
 important as any other committer who may be very active at the moment.
 Merit once earned, never expires [1].
 
 [1] http://www.apache.org/dev/committers.html#committer-set-term
 
 I think you misunderstood my point or I didn't state it very well.  Diversity 
 isn't achieved simply by having bodies.  IOW I am not suggesting offering 
 commit rights to people who haven't earned it just to meet some ratio.  
 However, I am not suggesting the project has ever even considered doing that.
 
 Ralph 
 
 
 
 -
 To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
 For additional commands, e-mail: general-h...@incubator.apache.org
 





Re: Invitation to join Apache Kafka as a committer

2012-05-24 Thread Joel Koshy
Works now.

Thanks,

Joel

On Wed, May 23, 2012 at 8:08 PM, Kevan Miller kevan.mil...@gmail.com wrote:


 On May 23, 2012, at 10:11 PM, Alan D. Cabrera wrote:

  -kafka-private
  +kafka-dev
  +general
 
  Ahh, account was only created.  According to root:
 
  Only PMC chairs can grant karma.  If needed, please post to the
  general@/dev@/private@ list of your project asking for someone with
  sufficient karma to grant access to 'jjkoshy'.
 
  Sorry about this confusion.  I don't have the necessary karma and am so
  used to other mentors having it that I forgot that the above step needed
  to be done.
 
  Can someone in the IPMC perform the needful?   Thanks!
 
  cc: incubator general

 Done.

 --kevan




Re: Flume Graduation (was Re: June reports in two weeks)

2012-05-24 Thread Tom White
According to Clutch [1] the project has added 8 committers since it
entered incubation. Regarding diversity, committers from over four
organizations are actively involved in Flume development, which is
pretty healthy. There does seem to be a need to have more diversity at
the PPMC level, however, so that's something that could be worked on.

Tom

[1] http://incubator.apache.org/clutch.html

On Thu, May 24, 2012 at 2:06 PM, Dave Fisher dave2w...@comcast.net wrote:

 On May 24, 2012, at 11:49 AM, Ralph Goers wrote:


 On May 24, 2012, at 10:40 AM, Arvind Prabhakar wrote:

 Hi,

 On Thu, May 24, 2012 at 12:19 AM, Ralph Goers 
 ralph.go...@dslextreme.com wrote:

 The ONLY issue I see for Flume to graduate is diversity.  No one will
 convince me that the current makeup constitutes diversity of any kind.

 Perhaps I shouldn't have brought up the mailing list issues as that was
 only meant in the spirit of trying to offer some advice on how more
 diversity could be achieved.  Flume is really the only community I
 participate in that contains Cloudera employees so I do find myself
 wondering if the way the project is run is because that is the way all
 projects with a large number of Cloudera employees are run.  That might
 make all of those participants comfortable but might create a barrier to
 others.


 Here are the committers who have been active in the past three months:

 * Brock Noland (Cloudera)
 * Hari Shreedharan  (Cloudera)
 * Jarek Jarcec Cecho (AVG Technologies)
 * Juhani Connolly   (CyberAgent)
 * Mike Percy (Cloudera)
 * Mingjie Lai (Trend Micro)
 * Prasad Mujumdar (Cloudera)
 * Will McQueen (Cloudera)
 * Arvind Prabhakar (Cloudera)

 There are four companies represented in this list: AVG Technologies,
 Cloudera, CyberAgent and Trend Micro. Compared to other projects that have
 successfully graduated from Incubator in the past, this meets the diversity
 requirements very well.

 I was mistaken and the list above is indeed correct.  For some reason I 
 thought a couple of them had become Cloudera employees.

 However, none of those three are currently on the PPMC.  When you look at 
 the PPMC list you should also include a few more Cloudera people who do 
 participate in release votes and PPMC issues. Most, if not all, of the 
 non-Cloudera PMC members don't.

 I started reading some of the Flume website. When you go to the main Wiki 
 page:

 https://cwiki.apache.org/confluence/display/FLUME/Index

 and click on the Flume Cookbook, the resource is at cloudera.org:

 http://archive.cloudera.com/cdh/3/flume/Cookbook/

 This page lists flume-...@cloudera.org and is a file with a revision dated 
 May 7, 2012.

 You can draw your own conclusions, but it looks like podling resources need to 
 be migrated to the ASF.

 Regards,
 Dave







 In any case - I'm not insisting that the way the project is run needs to
 change. I'm simply saying I cannot support graduation with the current
 makeup of the committers and PMC. I don't have a hard and fast ratio -
 gaining 10 new unaffiliated committers who don't do much isn't nearly as
 good as 2 or 3 who are very active.  Ultimately the project needs to figure
 out how to solve this.


 Stating that some committers who don't do much isn't nearly as good as 2
 or 3 who are very active is an unfair characterization. This is more
 unfair for those who are part of the project but have not been active
 lately due to whatever reasons, but have played a foundational role in
 getting the project to a point where it is today. I think they are as
 important as any other committer who may be very active at the moment.
 Merit once earned, never expires [1].

 [1] http://www.apache.org/dev/committers.html#committer-set-term

 I think you misunderstood my point or I didn't state it very well.  
 Diversity isn't achieved simply by having bodies.  IOW I am not suggesting 
 offering commit rights to people who haven't earned it just to meet some 
 ratio.  However, I am not suggesting the project has ever even considered 
 doing that.

 Ralph







