I purposely left out storm-starter from the discussion to keep things focused, and because it’s a different animal. But I also feel it should be pulled in, albeit differently. I was thinking something along the lines of an “examples” directory, and that all committers would share collective ownership/responsibility.
I haven’t thought too much yet about the others (storm-yarn, etc.), but I think that warrants a discussion as well.

Personally, I’d be willing to sponsor modules for Cassandra, HDFS, HBase, and JMS. I also contacted the author of storm-kafka-0.8-plus, and he is willing to contribute that work and help with maintenance.

Regarding the juju charms issue [1], my intent wasn’t to shoot it down entirely (which is why I left it open), but rather to make it clear that it’s not a priority at this point in time. I’ll admit that it was a bit of a knee-jerk reaction to the fact that someone from Canonical essentially spammed a bunch of Apache projects with the same request. It also seemed not unlike a request for us to maintain .rpm and .deb packages, etc., which is a path I’d be very hesitant to go down.

- Taylor

[1] https://issues.apache.org/jira/browse/STORM-240

On Feb 26, 2014, at 4:25 PM, Bobby Evans <[email protected]> wrote:

> I totally agree and I am +1 on bringing these spout/trident pieces in,
> assuming there are committers to support them.
>
> I am also curious about how people feel about pulling in other projects like
> storm-starter, storm-deploy, storm-mesos, and storm-yarn?
>
> Storm-starter in my opinion seems more like documentation and it would be nice
> to pull in so that it stays up to date with storm itself, just like the
> documentation.
>
> The others are more ways to run storm in different environments. They
> seem like there could be a lot of coupling between them and storm as storm
> evolves, and they kind of fit with "integrate storm with *Technology X*"
> except X in this case is a compute environment instead of a data source or
> store. But then again we also just shot down a request to create juju charms
> for storm.
>
> - Bobby
>
> From: "P.
Taylor Goetz" <[email protected]>
> Reply-To: <[email protected]>
> Date: Wednesday, February 26, 2014 at 1:21 PM
> To: <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Subject: Re: [DISCUSS] Pulling "Contrib" Modules into Apache
>
> Thanks for the feedback Bobby.
>
> To clarify, I’m mainly talking about spout/bolt/trident state implementations
> that integrate storm with *Technology X*, where *Technology X* is not a
> fundamental part of storm.
>
> Examples would be technologies that are part of or related to the Hadoop/Big
> Data ecosystem and enable the Lambda Architecture, e.g.: Kafka, HDFS, HBase,
> Cassandra, etc.
>
> The idea behind having one or more Storm committers act as a “sponsor” is to
> make sure new additions are done carefully and with good reason. To add a new
> module, it would require committer/PPMC consensus, and assignment of one or
> more sponsors. Part of a sponsor’s job would be to ensure that a module is
> maintained, which would require enough familiarity with the code to support
> it long term. If a new module was proposed, but no committers were willing to
> act as a sponsor, it would not be added.
>
> It would be the Committers’/PPMC’s responsibility to make sure things didn’t get
> out of hand, and to do something about it if they did.
>
> Here’s an old Hadoop JIRA thread [1] discussing the addition of Hive as a
> contrib module, similar to what happened with HBase as Bobby pointed out.
> Some interesting points are brought up. The difference here is that both
> HBase and Hive were pretty big codebases relative to Hadoop. With
> spout/bolt/state implementations I doubt we’d see anything along that scale.
>
> - Taylor
>
> [1] https://issues.apache.org/jira/browse/HADOOP-3601
>
>
> On Feb 26, 2014, at 12:35 PM, Bobby Evans <[email protected]> wrote:
>
> I can see a lot of value in having a distribution of storm that comes with
> batteries included, everything is tested together and you know it works. But
> I don’t see much long term developer benefit in building them all together.
> If there is strong coupling between storm and these external projects so that
> they break when storm changes then we need to understand the coupling and
> decide if we want to reduce that coupling by stabilizing APIs, improving
> version numbering and release process, etc.; or if the functionality is
> something that should be offered as a base service in storm.
>
> I can see politically the value of giving these other projects a home in
> Apache, and making them sub-projects is the simplest route to that. I’d love
> to have storm on yarn inside Apache. I just don’t want to go overboard with
> it. There was a time when HBase was a “contrib” module under Hadoop along
> with a lot of other things, and the Apache board came and told Hadoop to
> break it up.
>
> Bringing storm-kafka into storm does not sound like it will solve much from a
> developer’s perspective, because there is at least as much coupling with
> kafka as there is with storm. I can see how it is a huge amount of overhead
> and pain to set up a new project just for a few hundred lines of code; as
> such I am in favor of pulling in closely related projects, especially those
> that are spouts and state implementations. I just want to be sure that we do
> it carefully, with a good reason, and with enough people who are familiar
> with the code to support it long term.
>
> If it starts to look like we are pulling in too many projects perhaps we
> should look at something more like the bigtop project
> https://bigtop.apache.org/ which produces a tested distribution of Hadoop
> with many different sub-projects included in it.
>
> I am also a bit concerned about these sub-projects becoming second class
> citizens, where we break something, but because the build is off by default
> we don’t know it. I would prefer that they are built and tested by default.
> If the build and test time starts to take too long, to me that means we need
> to start wondering if we have too many contrib modules.
>
> - Bobby
>
> From: Brian Enochson <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Tuesday, February 25, 2014 at 9:50 PM
> To: "[email protected]" <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Subject: Re: [DISCUSS] Pulling "Contrib" Modules into Apache
>
> hi,
> I am in agreement with Taylor and believe I understand his intent. An
> incredible tool/framework/application like Storm is only enhanced and gains
> value from the number of well maintained and vetted modules that can be used
> for integration and adding further functionality.
> I am relatively new to the Storm community but have spent quite some time
> reviewing contributing modules out there, reviewing various duplicates and
> running into some version incompatibilities. I understand the need to keep
> Storm itself pure, but do think there needs to be some structure and
> governance added to the contributing modules.
> Look at the benefit a tool like npm brings to the node community.
> I like the idea of sponsorship, vetting and a community vote. I, as I am sure
> many would be, am willing to offer support and time to working through how to
> set this up and helping with the implementation if it is decided to pursue
> some solution.
> I hope these views are taken in the spirit they are made, to make this
> incredible system even better along with the surrounding eco-system.
>
> Thanks,
> Brian
>
>
> On Tue, Feb 25, 2014 at 9:36 PM, P. Taylor Goetz <[email protected]> wrote:
> Just to be clear (and play a little Devil’s advocate :) ), I’m not suggesting
> that whatever a “contrib” project/module/subproject might become, be a
> clearinghouse for anything Storm-related.
>
> I see it as something that is well-vetted by the Storm community, subject to
> PPMC review, vote, etc. Entry would require community review, PPMC review,
> and in some cases ASF IP clearance/legal review. Anything added would require
> some level of commitment from the PPMC/committers to provide some level of
> support.
>
> In other words, nothing “willy-nilly”.
>
> One option could be that any module added require (X > 0) number of
> committers to volunteer as “sponsor”s for the module, and commit to
> maintaining it.
>
> That being said, I don’t see storm-kafka being any different from anything
> else that provides integration points for Storm.
>
> -Taylor
>
>
> On Feb 25, 2014, at 7:53 PM, Nathan Marz <[email protected]> wrote:
>
> I'm only +1 for pulling in storm-kafka and updating it. Other projects put
> these contrib modules in a "contrib" folder and keep them managed as
> completely separate codebases. As it's not actually a "module" necessary for
> Storm, there's an argument there for doing it that way rather than via the
> multi-module route.
>
>
> On Tue, Feb 25, 2014 at 4:39 PM, Milinda Pathirage <[email protected]> wrote:
> Hi Taylor,
>
> I'm +1 for pulling these external libraries into Apache codebase. This
> will certainly benefit the Storm community. I would also like to contribute to
> this process.
>
> Thanks
> Milinda
>
> On Tue, Feb 25, 2014 at 5:28 PM, P. Taylor Goetz <[email protected]> wrote:
> A while back I opened STORM-206 [1] to capture ideas for pulling in
> "contrib" modules to the Apache codebase.
>
> In the past, we had the storm-contrib github project [2] which subsequently
> got broken up into individual projects hosted on the stormprocessor github
> group [3] and elsewhere.
>
> The problem with this approach is that in certain cases it led to code rot
> (modules not being updated in step with Storm's API), fragmentation
> (multiple similar modules with the same name), and confusion.
>
> A good example of this is the storm-kafka module [4], since it is a widely
> used component. Because storm-contrib wasn't being tagged in github, a lot
> of users had trouble reconciling with which versions of storm it was
> compatible. Some users built off specific commit hashes, some forked, and a
> few even pushed custom builds to repositories such as clojars. With kafka
> 0.8 now available, there are two main storm-kafka projects, the original
> (compatible with kafka 0.7) and an updated fork [5] (compatible with kafka
> 0.8).
>
> My intention is not to find fault in any way, but rather to point out the
> resulting pain, and work toward a better solution.
>
> I think it would be beneficial to the Storm user community to have certain
> commonly used modules like storm-kafka brought into the Apache Storm
> project. Another benefit worth considering is the licensing/legal oversight
> that the ASF provides, which is important to many users.
>
> If this is something we want to do, then the big question becomes what sort
> of governance process needs to be established to ensure that such things are
> properly maintained.
>
> Some random thoughts, questions, etc. that jump to mind include:
>
> What to call these things: "contrib modules", "connectors", "integration
> modules", etc.?
> Build integration: I imagine they would be a multi-module submodule of the
> main maven build. Probably turned off by default and enabled by a maven
> profile.
> Governance: Have one or more committer volunteers responsible for
> maintenance, merging patches, etc.? Proposal process for pulling new
> modules?
>
>
> I look forward to hearing others' opinions.
>
> - Taylor
>
>
> [1] https://issues.apache.org/jira/browse/STORM-206
> [2] https://github.com/nathanmarz/storm-contrib
> [3] https://github.com/stormprocessor
> [4] https://github.com/nathanmarz/storm-contrib/tree/master/storm-kafka
> [5] https://github.com/wurstmeister/storm-kafka-0.8-plus
>
>
>
> --
> Milinda Pathirage
>
> PhD Student | Research Assistant
> School of Informatics and Computing | Data to Insight Center
> Indiana University
>
> twitter: milindalakmal
> skype: milinda.pathirage
> blog: http://milinda.pathirage.org
>
>
>
> --
> Twitter: @nathanmarz
> http://nathanmarz.com
