Re: [DISCUSS] Pulling "Contrib" Modules into Apache

P. Taylor Goetz Wed, 26 Feb 2014 14:26:57 -0800

I purposely left out storm-starter from the discussion to keep things focused, 
and because it’s a different animal. But I also feel it should be pulled in, 
albeit differently. I was thinking something along the lines of an “examples” 
directory, and that all committers would share collective 
ownership/responsibility.


I haven’t thought too much yet about the others (storm-yarn, etc.), but I think 
that warrants a discussion as well.

Personally, I’d be willing to sponsor modules for Cassandra, HDFS, HBase, and 
JMS.

I also contacted the author of storm-kafka-0.8-plus, and he is willing to 
contribute that work and help with maintenance.

Regarding the juju charms issue [1], my intent wasn’t to shoot it down entirely 
(which is why I left it open), but rather make it clear that it’s not a 
priority at this point in time. I’ll admit that it was a bit of a knee-jerk 
reaction to the fact that someone from Canonical essentially spammed a bunch of 
Apache projects with the same request. It also seemed not unlike a request for 
us to maintain .rpm and .deb packages, etc., which is a path I’d be very 
hesitant to go down.

- Taylor

[1] https://issues.apache.org/jira/browse/STORM-240

On Feb 26, 2014, at 4:25 PM, Bobby Evans <[email protected]> wrote:

> I totally agree and I am +1 on bringing these spout/trident pieces in, 
> assuming there are committers to support them.
> 
> I am also curious about how people feel about pulling in other projects like 
> storm-starter, storm-deploy, storm-mesos, and storm-yarn?
> 
> Storm-starter in my opinion seems more like documentation and it would be nice 
> to pull in so that it stays up to date with storm itself, just like the 
> documentation.
> 
> The others are more of ways to run storm in different environments.  They 
> seem like there could be a lot of coupling between them and storm as storm 
> evolves, and they kind of fit with “integrate storm with *Technology X*” 
> except X in this case is a compute environment instead of a data source or 
> store. But then again we also just shot down a request to create juju charms 
> for storm.
> 
> —Bobby
> 
> From: "P. Taylor Goetz" <[email protected]>
> Reply-To: <[email protected]>
> Date: Wednesday, February 26, 2014 at 1:21 PM
> To: <[email protected]>
> Cc: <[email protected]>
> Subject: Re: [DISCUSS] Pulling "Contrib" Modules into Apache
> 
> Thanks for the feedback Bobby.
> 
> To clarify, I’m mainly talking about spout/bolt/trident state implementations 
> that integrate storm with *Technology X*, where *Technology X* is not a 
> fundamental part of storm.
> 
> Examples would be technologies that are part of or related to the Hadoop/Big 
> Data ecosystem and enable the Lambda Architecture, e.g.: Kafka, HDFS, HBase, 
> Cassandra, etc.
> 
> The idea behind having one or more Storm committers act as a “sponsor” is to 
> make sure new additions are done carefully and with good reason. To add a new 
> module, it would require committer/PPMC consensus, and assignment of one or 
> more sponsors. Part of a sponsor’s job would be to ensure that a module is 
> maintained, which would require enough familiarity with the code to support 
> it long term. If a new module was proposed, but no committers were willing to 
> act as a sponsor, it would not be added.
> 
> It would be the Committers’/PPMC’s responsibility to make sure things didn’t get 
> out of hand, and to do something about it if they did.
> 
> Here’s an old Hadoop JIRA thread [1] discussing the addition of Hive as a 
> contrib module, similar to what happened with HBase as Bobby pointed out. 
> Some interesting points are brought up. The difference here is that both 
> HBase and Hive were pretty big codebases relative to Hadoop. With 
> spout/bolt/state implementations I doubt we’d see anything along that scale.
> 
> - Taylor
> 
> [1] https://issues.apache.org/jira/browse/HADOOP-3601
> 
> 
> On Feb 26, 2014, at 12:35 PM, Bobby Evans <[email protected]> wrote:
> 
> I can see a lot of value in having a distribution of storm that comes with 
> batteries included, everything is tested together and you know it works.  But 
> I don’t see much long term developer benefit in building them all together.  
> If there is strong coupling between storm and these external projects so that 
> they break when storm changes then we need to understand the coupling and 
> decide if we want to reduce that coupling by stabilizing APIs, improving 
> version numbering and release process, etc.; or if the functionality is 
> something that should be offered as a base service in storm.
> 
> I can see politically the value of giving these other projects a home in 
> Apache, and making them sub-projects is the simplest route to that.  I’d love 
> to have storm on yarn inside Apache.  I just don’t want to go overboard with 
> it.  There was a time when HBase was a “contrib” module under Hadoop along 
> with a lot of other things, and the Apache board came and told Hadoop to 
> break it up.
> 
> Bringing storm-kafka into storm does not sound like it will solve much from a 
> developer’s perspective, because there is at least as much coupling with 
> kafka as there is with storm.  I can see how it is a huge amount of overhead 
> and pain to set up a new project just for a few hundred lines of code, as 
> such I am in favor of pulling in closely related projects, especially those 
> that are spouts and state implementations. I just want to be sure that we do 
> it carefully, with a good reason, and with enough people who are familiar 
> with the code to support it long term.
> 
> If it starts to look like we are pulling in too many projects perhaps we 
> should look at something more like the bigtop project  
> https://bigtop.apache.org/ which produces a tested distribution of Hadoop 
> with many different sub-projects included in it.
> 
> I am also a bit concerned about these sub-projects becoming second class 
> citizens, where we break something, but because the build is off by default 
> we don’t know it.  I would prefer that they are built and tested by default.  
> If the build and test time starts to take too long, to me that means we need 
> to start wondering if we have too many contrib modules.
> 
> —Bobby
> 
> From: Brian Enochson <[email protected]>
> Reply-To: <[email protected]>
> Date: Tuesday, February 25, 2014 at 9:50 PM
> To: <[email protected]>
> Cc: <[email protected]>
> Subject: Re: [DISCUSS] Pulling "Contrib" Modules into Apache
> 
> hi,
>  I am in agreement with Taylor and believe I understand his intent. An 
> incredible tool/framework/application like Storm is only enhanced and gains 
> value from the number of well maintained and vetted modules that can be used 
> for integration and adding further functionality.
> I am relatively new to the Storm community but have spent quite some time 
> reviewing contributing modules out there, reviewing various duplicates and 
> running into some version incompatibilities. I understand the need to keep 
> Storm itself pure, but do think there needs to be some structure and 
> governance added to the contributing modules. Look at the benefit a tool like 
> npm brings to the node community.
> I like the idea of sponsorship, vetting and a community vote.  I, as I am sure 
> many would be, am willing to offer support and time to working through how to 
> set this up and helping with the implementation if it is decided to pursue 
> some solution.
> I hope these views are taken in the spirit they are made, to make this 
> incredible system even better along with the surrounding eco-system.
> 
> Thanks,
> Brian
> 
> 
> On Tue, Feb 25, 2014 at 9:36 PM, P. Taylor Goetz <[email protected]> wrote:
> Just to be clear (and play a little Devil’s advocate :) ), I’m not suggesting 
> that whatever a “contrib” project/module/subproject might  become, be a 
> clearinghouse for anything Storm-related.
> 
> I see it as something that is well-vetted by the Storm community, subject to 
> PPMC review, vote, etc. Entry would require community review, PPMC review, 
> and in some cases ASF IP clearance/legal review. Anything added would require 
> some level of commitment from the PPMC/committers to provide some level of 
> support.
> 
> In other words, nothing “willy-nilly”.
> 
> One option could be that any module added require (X > 0)  number of 
> committers to volunteer as “sponsor”s for the module, and commit to 
> maintaining it.
> 
> That being said, I don’t see storm-kafka being any different from anything 
> else that provides integration points for Storm.
> 
> -Taylor
> 
> 
> On Feb 25, 2014, at 7:53 PM, Nathan Marz <[email protected]> wrote:
> 
> I'm only +1 for pulling in storm-kafka and updating it. Other projects put 
> these contrib modules in a "contrib" folder and keep them managed as 
> completely separate codebases. As it's not actually a "module" necessary for 
> Storm, there's an argument there for doing it that way rather than via the 
> multi-module route.
> 
> 
> On Tue, Feb 25, 2014 at 4:39 PM, Milinda Pathirage <[email protected]> wrote:
> Hi Taylor,
> 
> I'm +1 for pulling these external libraries into the Apache codebase. This
> will certainly benefit the Storm community. I would also like to contribute to
> this process.
> 
> Thanks
> Milinda
> 
> On Tue, Feb 25, 2014 at 5:28 PM, P. Taylor Goetz <[email protected]> wrote:
> A while back I opened STORM-206 [1] to capture ideas for pulling in
> "contrib" modules to the Apache codebase.
> 
> In the past, we had the storm-contrib github project [2] which subsequently
> got broken up into individual projects hosted on the stormprocessor github
> group [3] and elsewhere.
> 
> The problem with this approach is that in certain cases it led to code rot
> (modules not being updated in step with Storm's API), fragmentation
> (multiple similar modules with the same name), and confusion.
> 
> A good example of this is the storm-kafka module [4], since it is a widely
> used component. Because storm-contrib wasn't being tagged in github, a lot
> of users had trouble reconciling with which versions of storm it was
> compatible. Some users built off specific commit hashes, some forked, and a
> few even pushed custom builds to repositories such as clojars. With kafka
> 0.8 now available, there are two main storm-kafka projects, the original
> (compatible with kafka 0.7) and an updated fork [5] (compatible with kafka
> 0.8).
> 
> My intention is not to find fault in any way, but rather to point out the
> resulting pain, and work toward a better solution.
> 
> I think it would be beneficial to the Storm user community to have certain
> commonly used modules like storm-kafka brought into the Apache Storm
> project. Another benefit worth considering is the licensing/legal oversight
> that the ASF provides, which is important to many users.
> 
> If this is something we want to do, then the big question becomes what sort of
> governance process needs to be established to ensure that such things are
> properly maintained.
> 
> Some random thoughts, questions, etc. that jump to mind include:
> 
> What to call these things: "contrib modules", "connectors", "integration
> modules", etc.?
> Build integration: I imagine they would be a multi-module submodule of the
> main maven build. Probably turned off by default and enabled by a maven
> profile.
> Governance: Have one or more committer volunteers responsible for
> maintenance, merging patches, etc.? Proposal process for pulling new
> modules?
> 
> 
> I look forward to hearing others' opinions.
> 
> - Taylor
> 
> 
> [1] https://issues.apache.org/jira/browse/STORM-206
> [2] https://github.com/nathanmarz/storm-contrib
> [3] https://github.com/stormprocessor
> [4] https://github.com/nathanmarz/storm-contrib/tree/master/storm-kafka
> [5] https://github.com/wurstmeister/storm-kafka-0.8-plus
> 
> 
> 
> --
> Milinda Pathirage
> 
> PhD Student | Research Assistant
> School of Informatics and Computing | Data to Insight Center
> Indiana University
> 
> twitter: milindalakmal
> skype: milinda.pathirage
> blog: http://milinda.pathirage.org/
> 
> 
> 
> --
> Twitter: @nathanmarz
> http://nathanmarz.com/
