Re: [VOTE] Apache Syncope 1.0.0-incubating
+1 (binding) apart from the jQuery question it looks fine. You might btw be careful with the samples. If we publish a WAR which contains other libraries, then this might be interpreted as 'distributing' them. In OpenWebBeans, DeltaSpike and MyFaces we don't deploy them any longer in binary form. Just the sources. Users can easily build it themselves. And most of the time they are only interested in the sample source anyway. LieGrue, strub - Original Message - From: Francesco Chicchiriccò ilgro...@apache.org To: general@incubator.apache.org Cc: Sent: Monday, August 6, 2012 5:50 PM Subject: Re: [VOTE] Apache Syncope 1.0.0-incubating On 06/08/2012 16:36, Alexei Fedotov wrote: Hello Francesco, Here are a few things I have found via manual inspection: 1. The jQuery bundle contains several of the following strings: Dual licensed under the MIT or GPL Version 2 licenses. *) source release LICENSE file does not contain MIT license; *) and the file itself does not look like APL licensed; *) and it is a part of the source release. Something should be fixed here, i.e. the files replaced with wget in the build script. 2. ./legal_ext/LICENSE does not have a license for jquery. Does war contain jquery? Hi Alexei, I've taken a look at other ASF projects including JQuery (or similar dual-licensed JS frameworks) and I've opened https://issues.apache.org/jira/browse/SYNCOPE-181 We'll fix this ASAP. Don't think these issues are stoppers. Cool :-) What's your vote on the release, then? Thanks for your review. Regards. On Mon, Aug 6, 2012 at 6:07 PM, Mark Struberg strub...@yahoo.de wrote: Hi Francesco, I can check in the evening. LieGrue, strub - Original Message - From: Francesco Chicchiriccò ilgro...@apache.org To: general@incubator.apache.org Cc: Sent: Monday, August 6, 2012 2:49 PM Subject: Re: [VOTE] Apache Syncope 1.0.0-incubating Hi IPMC members, we are missing a single vote on this release: anyone interested to check? TIA. Regards. 
On 03/08/2012 09:58, Francesco Chicchiriccò wrote: I've created a 1.0.0-incubating release, with the following artifacts up for a vote: SVN source tag (r1367421): https://svn.apache.org/repos/asf/incubator/syncope/tags/syncope-1.0.0-incubating/ List of changes: https://svn.apache.org/repos/asf/incubator/syncope/tags/syncope-1.0.0-incubating/CHANGES Maven staging repo: https://repository.apache.org/content/repositories/orgapachesyncope-100/ Source release (checksums and signatures are available at the same location): https://repository.apache.org/content/repositories/orgapachesyncope-100/org/apache/syncope/syncope-root/1.0.0-incubating/syncope-root-1.0.0-incubating-source-release.zip Staging site: http://incubator.apache.org/syncope/1.0.0-incubating/ PGP release keys (signed using 273DF287): http://www.apache.org/dist/incubator/syncope/KEYS This has been voted through on the syncope-...@incubator.apache.org mailing list [1], and now requires a vote on general@incubator.apache.org Votes already cast (on syncope-dev): +1 (binding) * Francesco Chicchiriccò * Massimiliano Perrone * Marco Di Sabatino Di Diodoro * Emmanuel Lécharny (IPMC member) * Simone Tripodi * Colm O hEigeartaigh (IPMC member) +1 (non binding) * Denis Signoretto Vote will be open for 72 hours. [ ] +1 approve [ ] +0 no opinion [ ] -1 disapprove (and reason why) Best regards. [1] http://syncope-dev.1063484.n5.nabble.com/VOTE-Apache-Syncope-1-0-0-incubating-tp5710173p5710292.html -- Francesco Chicchiriccò ASF Member, Apache Cocoon PMC and Apache Syncope PPMC Member http://people.apache.org/~ilgrosso/ - To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org - To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org
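The manual inspection described in the review above (searching the source release for the jQuery dual-license string, then checking whether the top-level LICENSE file mentions MIT) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the demo tree and its file names are hypothetical, and the only string taken from the thread is the marker itself.

```python
import re
import tempfile
from pathlib import Path

# String quoted in the review; used here as the marker to search for.
MARKER = "Dual licensed under the MIT or GPL Version 2 licenses"


def find_marked_files(root, marker=MARKER):
    """Return the files under root whose text contains marker."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="ignore")
            if marker in text:
                hits.append(path)
    return hits


def license_mentions(root, word="MIT"):
    """True if root/LICENSE exists and contains word as a whole word."""
    license_file = Path(root) / "LICENSE"
    if not license_file.is_file():
        return False
    text = license_file.read_text(encoding="utf-8", errors="ignore")
    return re.search(rf"\b{re.escape(word)}\b", text) is not None


def make_demo_tree(root):
    """Create a tiny, made-up source tree that reproduces the finding."""
    root = Path(root)
    (root / "js").mkdir()
    (root / "js" / "jquery.js").write_text("/* " + MARKER + " */\n", encoding="utf-8")
    (root / "LICENSE").write_text("Apache License, Version 2.0\n", encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_demo_tree(tmp)
        for path in find_marked_files(tmp):
            print("dual-licensed file:", path.relative_to(tmp))
        print("LICENSE mentions MIT:", license_mentions(tmp))
```

Pointing `find_marked_files` at an unpacked source release instead of the demo tree would list the candidate files; whether a hit actually requires a LICENSE change remains a judgment call for the reviewers.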
Re: Wiki write access
Hi,

On Mon, Aug 6, 2012 at 10:29 PM, Tomer Shiran tshi...@maprtech.com wrote:
> I would like to create a Wiki page. Can you please grant me write access? (alias tshiran)

Added to ContributorsGroup.

BR,
Jukka Zitting
Re: [VOTE] Release Apache Bloodhound 0.1.0 (incubating)
On Mon, Aug 6, 2012 at 9:37 PM, Greg Stein gst...@gmail.com wrote:
> On Aug 6, 2012 7:07 PM, Gary Martin gary.mar...@wandisco.com wrote:
>> ...
>> The vote will be open for at least 72 hours and therefore ends after 11pm UTC on Thursday 9th August.
>>
>> [ ] +1 Release this package as Apache Bloodhound 0.1.0
>> [ ] +0 Don't care
>> [ ] -1 Do not release this package (please explain)
>
> Repeating my prior IPMC binding vote: +1 to release

Same. +1 (binding)

-Hyrum
Re: [VOTE] Release Apache Bloodhound 0.1.0 (incubating)
On Tue, Aug 7, 2012 at 12:07 AM, Gary Martin gary.mar...@wandisco.com wrote:
> Hi,
>
> I would like to request the beginning of the vote for the first release of Apache Bloodhound in the incubator following the successful vote by the Bloodhound PPMC. Two of the four +1 PPMC votes were from the IPMC members Greg Stein and Hyrum Wright.
>
> The result of the vote is summarised here:
> http://markmail.org/message/i3g5t2m7gajuoyv6
>
> The artefacts for the release including the source distribution and KEYS can be found here:
> https://dist.apache.org/repos/dist/dev/incubator/bloodhound/
>
> The release itself is created from:
> https://svn.apache.org/repos/asf/incubator/bloodhound/branches/0.1 (r1362530)
>
> Issues identified by Greg and Hyrum to be fixed for the next release are listed here:
> https://issues.apache.org/bloodhound/ticket/153
>
> The vote will be open for at least 72 hours and therefore ends after 11pm UTC on Thursday 9th August.
>
> [ ] +1 Release this package as Apache Bloodhound 0.1.0
> [ ] +0 Don't care
> [ ] -1 Do not release this package (please explain)
>
> Cheers,
> Gary

This looks similar to the Syncope release vote that's also happening right now, in that the source distribution includes things like JQuery but doesn't mention that in the LICENSE file. I'm a bit surprised people are continuing to vote +1 on the Syncope release knowing that, so am I getting this wrong and the JQuery license doesn't need to be included here for some reason?

...ant
[RESULT] [VOTE] S4 0.5.0 Release Candidate 1
Hi,

The vote for this S4 release passed with the following results at the vote deadline:

+1: 7 (5 binding)
-1: 0

Details:
+1 IPMC: acmurthy, phunt
+1 PPMC: kishoreg*, leoneu*, fpj
+1 wider community: Daniel Gomez, Karthik Kambatla

Thanks to all the participants in the voting process! I'll now publish the artifacts, and after the sync delay, update the websites and send announcements.

Matthieu

(* voted on s4-dev list)
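The totals in the result above can be cross-checked against the per-group details with a few lines of arithmetic. The sketch below is only illustrative: the grouping and names are copied from the message, while treating both the IPMC and PPMC groups as binding is an assumption made here so that the count matches the "5 binding" stated in the message; it is not a statement of Incubator voting rules.

```python
# Votes as listed in the result message, grouped the way the message groups them.
VOTES = {
    "IPMC": ["acmurthy", "phunt"],
    "PPMC": ["kishoreg", "leoneu", "fpj"],
    "wider community": ["Daniel Gomez", "Karthik Kambatla"],
}

# Assumption for this sketch: these groups make up the "5 binding" in the message.
BINDING_GROUPS = {"IPMC", "PPMC"}


def tally(votes, binding_groups):
    """Return (total +1 votes, binding +1 votes)."""
    total = sum(len(names) for names in votes.values())
    binding = sum(
        len(names) for group, names in votes.items() if group in binding_groups
    )
    return total, binding


if __name__ == "__main__":
    total, binding = tally(VOTES, BINDING_GROUPS)
    print(f"+1: {total} ({binding} binding)")
```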
Re: [VOTE] Release Apache Bloodhound 0.1.0 (incubating)
On 07.08.2012 13:14, ant elder wrote:
> This looks similar to the Syncope release vote that's also happening right now, in that the source distribution includes things like JQuery but doesn't mention that in the LICENSE file. I'm a bit surprised people are continuing to vote +1 on the Syncope release knowing that, so am I getting this wrong and the JQuery license doesn't need to be included here for some reason?

The NOTICE file explicitly notes external dependencies and their (standard) licenses. Combined with the ticket that mentions adding licenses of said dependencies to LICENSE, IMO, this is good enough for a release candidate.

-- Brane
Re: [VOTE] Release Apache Bloodhound 0.1.0 (incubating)
On 08/07/2012 12:39 PM, Branko Čibej wrote:
> On 07.08.2012 13:14, ant elder wrote:
>> This looks similar to the Syncope release vote that's also happening right now, in that the source distribution includes things like JQuery but doesn't mention that in the LICENSE file. I'm a bit surprised people are continuing to vote +1 on the Syncope release knowing that, so am I getting this wrong and the JQuery license doesn't need to be included here for some reason?
>
> The NOTICE file explicitly notes external dependencies and their (standard) licenses. Combined with the ticket that mentions adding licenses of said dependencies to LICENSE, IMO, this is good enough for a release candidate.
>
> -- Brane

I think it is also worth noting that Greg Stein has already mentioned this - see the first item in https://issues.apache.org/bloodhound/ticket/153 (which also contains the link to Greg's email) - and so this will be attended to in the next release.

Cheers,
Gary
[RESULT] [VOTE] Apache Syncope 1.0.0-incubating
Hi all, even though we would have reached the required number of +1 from IPMC members after the required 72 hours, I still have the feeling that we are missing some consensus here. I will now revert the current release candidate and start again from scratch with another attempt. Regards. On 07/08/2012 11:21, Mark Struberg wrote: +1 (binding) apart from the jQuery question it looks fine. You might btw be careful with the samples. If we publish a WAR which contains other libraries, then this might be interpreted as 'distributing' them. In OpenWebBeans, DeltaSpike and MyFaces we don't deploy them any longer in binary form. Just the sources. Users can easily build it themselves. And most of the time they are only interested in the sample source anyway. LieGrue, strub - Original Message - From: Francesco Chicchiriccò ilgro...@apache.org To: general@incubator.apache.org Cc: Sent: Monday, August 6, 2012 5:50 PM Subject: Re: [VOTE] Apache Syncope 1.0.0-incubating On 06/08/2012 16:36, Alexei Fedotov wrote: Hello Francesco, Here are a few things I have found via manual inspection: 1. The jQuery bundle contains several of the following strings: Dual licensed under the MIT or GPL Version 2 licenses. *) source release LICENSE file does not contain MIT license; *) and the file itself does not look like APL licensed; *) and it is a part of the source release. Something should be fixed here, i.e. the files replaced with wget in the build script. 2. ./legal_ext/LICENSE does not have a license for jquery. Does war contain jquery? Hi Alexei, I've taken a look at other ASF projects including JQuery (or similar dual-licensed JS frameworks) and I've opened https://issues.apache.org/jira/browse/SYNCOPE-181 We'll fix this ASAP. Don't think these issues are stoppers. Cool :-) What's your vote on the release, then? Thanks for your review. Regards. On Mon, Aug 6, 2012 at 6:07 PM, Mark Struberg strub...@yahoo.de wrote: Hi Francesco, I can check in the evening. 
Re: [PROPOSAL] Drill for the Apache Incubator
FYI: I have posted the proposal to the wiki and updated it based on the feedback from Marvin and Jakob:

http://wiki.apache.org/incubator/DrillProposal

On Mon, Aug 6, 2012 at 2:29 PM, Ted Dunning ted.dunn...@gmail.com wrote:
> In fact, a big part of the motivation for proposing incubation before code is ready is exactly to foster the discussions needed to form community.
>
> It is true that many projects that start without the fundamentals face challenges that more mature projects face but that is really just a fact of life with young projects.
>
> My own experience includes a project that also started without an initial code drop. Mahout has gone on to have a vibrant welcoming community that has fostered the donation and development of some very valuable software. I expect Drill will be able to say the same thing before long.
>
> Sent from my iPhone
>
> On Aug 6, 2012, at 2:55 PM, Jakob Homan jgho...@gmail.com wrote:
>> Any reason the design docs can't be put up in place of where the source would normally go?
>>
>> On Mon, Aug 6, 2012 at 11:23 AM, Tomer Shiran tshi...@maprtech.com wrote:
>>> Marvin, thanks for commenting on the proposal!
>>>
>>> The initial committers have been working on the design for several months, and will commit the design once the project is approved, so we do not expect much friction during the design phase. With that said, we certainly do want to engage others early on, and our goal in incubating earlier is to encourage feedback and contributions when it is still easy to change the APIs and extensibility points. This is important because Drill (unlike, say, Google's Dremel) must be really flexible in order to be relevant to a broad user base, allowing multiple data sources, data formats and query languages. While many projects enter incubation with a complete implementation, others don't, and due to the nature of this project we think that in this case it is better to start earlier.
>>>
>>> Thanks,
>>> Tomer
>>>
>>> On Mon, Aug 6, 2012 at 9:25 AM, Marvin Humphrey mar...@rectangular.com wrote:
>>>> On Thu, Aug 2, 2012 at 3:12 PM, Ted Dunning ted.dunn...@gmail.com wrote:
>>>>> Initial Source ==
>>>>> There is no initial source code. All source code will be developed within the Apache Incubator.
>>>>
>>>> Coming in without any source code is going to pose a challenge to this podling.
>>>>
>>>> http://www.apache.org/foundation/how-it-works.html#incubator
>>>>
>>>> The incubator filters projects on the basis of the likeliness of them becoming successful meritocratic communities. The basic requirements for incubation are:
>>>>
>>>> * a working codebase -- over the years and after several failures, the foundation came to understand that without an initial working codebase, it is generally hard to bootstrap a community. This is because merit is not well recognized by developers without a working codebase. Also, the friction that is developed during the initial design stage is likely to fragment the community.
>>>>
>>>> That last line in particular seems like something to watch out for.
>>>>
>>>> Marvin Humphrey
Re: [VOTE] Release Apache Bloodhound 0.1.0 (incubating)
On Tue, Aug 7, 2012 at 7:14 AM, ant elder ant.el...@gmail.com wrote:
> ...
> This looks similar to the Syncope release vote that's also happening right now, in that the source distribution includes things like JQuery but doesn't mention that in the LICENSE file. I'm a bit surprised people are continuing to vote +1 on the Syncope release knowing that, so am I getting this wrong and the JQuery license doesn't need to be included here for some reason?

My feeling on the matter here is that these are incubating projects. We allow things like (L)GPL dependencies in the releases, as long as a PLAN exists to get rid of them. Of course, it must be perfectly clean to graduate. But I believe we have wiggle room while incubating.

As Branko noted, the included projects are mentioned in the NOTICE file, but that isn't quite Right. The 0.2.0 release will get it corrected.

We could have stepped back and rolled another tarball, but I believe it is more important for Bloodhound to get a release out [than to be perfect on 0.1.0], in order to get some traction and some attraction to build a larger community.

The BH folks plan to release every few weeks, so we should see the corrections in a release at the end of the month. (or we could convince Gary to do another in a week or two :-)

Cheers,
-g
RE: [PROPOSAL] Drill for the Apache Incubator
-Original Message-
From: Marvin Humphrey [mailto:mar...@rectangular.com]
Sent: Monday, August 06, 2012 12:25 PM
To: general@incubator.apache.org
Cc: Grant Ingersoll; Isabel Drost
Subject: Re: [PROPOSAL] Drill for the Apache Incubator

> On Thu, Aug 2, 2012 at 3:12 PM, Ted Dunning ted.dunn...@gmail.com wrote:
>> Initial Source ==
>> There is no initial source code. All source code will be developed within the Apache Incubator.
>
> Coming in without any source code is going to pose a challenge to this podling.
>
> http://www.apache.org/foundation/how-it-works.html#incubator
>
> The incubator filters projects on the basis of the likeliness of them becoming successful meritocratic communities. The basic requirements for incubation are:
>
> * a working codebase -- over the years and after several failures, the foundation came to understand that without an initial working codebase, it is generally hard to bootstrap a community. This is because merit is not well recognized by developers without a working codebase. Also, the friction that is developed during the initial design stage is likely to fragment the community.

It seems like there could be flexibility in this requirement, based on a few factors. In this case, a design discussion has been ongoing; but I would also think that any community coming in with enough people who know the Apache way may also not need as much of a solid starting point code wise.

> That last line in particular seems like something to watch out for.
>
> Marvin Humphrey
Re: [VOTE] Release Apache Bloodhound 0.1.0 (incubating)
On Tue, Aug 7, 2012 at 11:38 AM, ant elder ant.el...@gmail.com wrote:
> Gosh I'm pretty sure we _don't_ allow things like (L)GPL dependencies in Incubator releases, we allow them in the source in SVN but I don't recall any releases like that.

I know AOO had interactions with Legal regarding dmake, dictionaries and so on, though I don't recall exactly what went into their release. I would be surprised if any category X dependencies have wound up in an incubating release without Legal's involvement.

Lucy's early incubating releases had two Perl-licensed (Artistic/GPL) dependencies (which were not bundled, but had to be downloaded and installed separately by the consumer). We sought a variance from Legal and got specific approval from the Legal VP for our plan, which involved ditching both of the problematic dependencies prior to graduation:

https://issues.apache.org/jira/browse/LEGAL-86

Are there other examples?

> Anyway that's beside the point, ok so let's have this be a precedent that sets Incubator policy - we now have some wiggle room while incubating to do a release that violates ASF release policy as long as it will be fixed soon in another release and definitely before graduating.

It seems that with regards to this Bloodhound release, the issue is restricted to LICENSE/NOTICE, an area where ASF policies are notoriously unclear and conformance is arguably spotty even among TLPs.

So long as the licenses of all dependencies are being obeyed (e.g. no license headers or mandatory files stripped from source files) and usage is compatible with ASF policy (no category X dependencies, etc), I agree with the judgment call that an incubating release need not be held up simply to move the text of the license from LICENSE to NOTICE or vice versa.

IMO, this is different from releases with category X dependencies, where ASF policies are clear and conformance is very high among TLPs.

I don't see that the Incubator should consider this vote a precedent for overturning arbitrary ASF policy. If we don't like the poor state of ASF policy and conformance on LICENSE/NOTICE then the ASF Membership should work to clarify the policy.

Marvin Humphrey
Re: [VOTE] Release Apache Bloodhound 0.1.0 (incubating)
On Tue, Aug 7, 2012 at 3:44 PM, Marvin Humphrey mar...@rectangular.com wrote:
> On Tue, Aug 7, 2012 at 11:38 AM, ant elder ant.el...@gmail.com wrote:
>> Gosh I'm pretty sure we _don't_ allow things like (L)GPL dependencies in Incubator releases, we allow them in the source in SVN but I don't recall any releases like that.
>
> I know AOO had interactions with Legal regarding dmake, dictionaries and so on, though I don't recall exactly what went into their release. I would be surprised if any category X dependencies have wound up in an incubating release without Legal's involvement.
>
> Lucy's early incubating releases had two Perl-licensed (Artistic/GPL) dependencies (which were not bundled, but had to be downloaded and installed separately by the consumer). We sought a variance from Legal and got specific approval from the Legal VP for our plan, which involved ditching both of the problematic dependencies prior to graduation:
>
> https://issues.apache.org/jira/browse/LEGAL-86
>
> Are there other examples?

The one that I had in mind was Roller. Several of its incubating releases had a hard dependency on Hibernate. They were required to clean it up before graduation, of course. You can look at the archives back in 2006 when it was incubating. In particular, there is one sent to private@incubator that I would refer you to:

http://s.apache.org/c04 [only usable by ASF Members]

>> Anyway that's beside the point, ok so let's have this be a precedent that sets Incubator policy - we now have some wiggle room while incubating to do a release that violates ASF release policy as long as it will be fixed soon in another release and definitely before graduating.
>
> It seems that with regards to this Bloodhound release, the issue is restricted to LICENSE/NOTICE, an area where ASF policies are notoriously unclear and conformance is arguably spotty even among TLPs.

I've given some bad info in the past, but after the last go-round (thanks Marvin), I feel that I've got a better handle on it. And that's the feedback that I've now provided to the BH people.

> So long as the licenses of all dependencies are being obeyed (e.g. no license headers or mandatory files stripped from source files) and usage is compatible with ASF policy (no category X dependencies, etc),

All good here.

> I agree with the judgment call that an incubating release need not be held up simply to move the text of the license from LICENSE to NOTICE or vice versa.
>
> IMO, this is different from releases with category X dependencies, where ASF policies are clear and conformance is very high among TLPs.
>
> I don't see that the Incubator should consider this vote a precedent for overturning arbitrary ASF policy.

For TLPs, I totally agree. For projects that are incubating... they are NOT ASF projects by definition. That is why we've allowed a bit of wiggle.

In any case, Bloodhound isn't requesting any funny deps. Just getting a release out there with some already-known issues. That's why it got my +1, and recommendation to just go with 0.1.0 rather than spinning up a new tarball.

Cheers,
-g
Re: [VOTE] Release Apache Bloodhound 0.1.0 (incubating)
On Tue, Aug 7, 2012 at 2:38 PM, ant elder ant.el...@gmail.com wrote:
> ...
> Gosh I'm pretty sure we _don't_ allow things like (L)GPL dependencies in Incubator releases, we allow them in the source in SVN but I don't recall any releases like that.

As I replied to Marvin, Apache Roller had a hard dependency on Hibernate for some of its incubator releases. Allowing that was okay'd by the IPMC, VP Legal, and the Board :-)

My view is that these are not true ASF projects, so *some* wiggle is allowable, especially with a plan in hand. (now, I still would not advocate for any release that seriously broke the rules; at a minimum, get LICENSE/NOTICE and source file headers in there; work on clarifying your dependencies and their licenses; etc)

> Anyway that's beside the point, ok so let's have this be a precedent that sets Incubator policy - we now have some wiggle room while incubating to do a release that violates ASF release policy as long as it will be fixed soon in another release and definitely before graduating. A policy like that would help a lot with avoiding the numerous respins some podling releases are made to do during voting on general@.

Exactly. We've seen a lot of back/forth which doesn't really help the podling very much.

It's certainly a subjective judgement call. I don't know where to draw the line, nor whether we must draw it. One of those "know it when you see it" things. And we have the judgement of a large body of people here on this list.

Cheers,
-g
[DISCUSS] [VOTE] Release Apache Bloodhound 0.1.0 (incubating)
Hi,

[branching a discuss thread]

On Tue, Aug 7, 2012 at 10:56 PM, Greg Stein gst...@gmail.com wrote:
> As I replied to Marvin, Apache Roller had a hard dependency on Hibernate for some of its incubator releases. Allowing that was okay'd by the IPMC, VP Legal, and the Board :-)
>
> My view is that these are not true ASF projects, so *some* wiggle is allowable, especially with a plan in hand.

Note that even though podlings aren't full Apache projects yet, incubating releases *are* official Apache releases, and should therefore be held to a similar standard. If that standard can't easily be reached, some podlings (like Subversion when it came in) have opted to keep cutting non-Apache releases outside the ASF until those issues have been resolved.

> It's certainly a subjective judgement call. I don't know where to draw the line, nor whether we must draw it. One of those "know it when you see it" things. And we have the judgement of a large body of people here on this list.

Personally I'm fine with things like missing license headers or partially incomplete license metadata (which sounds like it is the case here), as long as those are just omissions that don't fundamentally affect our rights (or those of downstream users) to distribute the releases and as long as there's a commitment to fix such issues in time for the next release. Such minor issues are also fairly common in many TLPs (I've filed a number of related bugs), so it's not even a problem that's limited just to the Incubator.

Larger issues like exceptions to documented licensing policy (like in the examples brought up here) should always be explicitly cleared with legal, etc.

BR,
Jukka Zitting
Re: [PROPOSAL] Drill for the Apache Incubator
On 07/08/2012 21:14, Franklin, Matthew B. wrote:
> -Original Message-
> From: Marvin Humphrey [mailto:mar...@rectangular.com]
> Sent: Monday, August 06, 2012 12:25 PM
> To: general@incubator.apache.org
> Cc: Grant Ingersoll; Isabel Drost
> Subject: Re: [PROPOSAL] Drill for the Apache Incubator
>
>> On Thu, Aug 2, 2012 at 3:12 PM, Ted Dunning ted.dunn...@gmail.com wrote:
>>> Initial Source ==
>>> There is no initial source code. All source code will be developed within the Apache Incubator.
>>
>> Coming in without any source code is going to pose a challenge to this podling.
>>
>> http://www.apache.org/foundation/how-it-works.html#incubator
>>
>> The incubator filters projects on the basis of the likeliness of them becoming successful meritocratic communities. The basic requirements for incubation are:
>>
>> * a working codebase -- over the years and after several failures, the foundation came to understand that without an initial working codebase, it is generally hard to bootstrap a community. This is because merit is not well recognized by developers without a working codebase. Also, the friction that is developed during the initial design stage is likely to fragment the community.
>
> It seems like there could be flexibility in this requirement, based on a few factors. In this case, a design discussion has been ongoing; but I would also think that any community coming in with enough people who know the Apache way may also not need as much of a solid starting point code wise.

+1. Given the credentials and the experience of proposed committers and mentors, and the fact that the initial design is already done, I don't think this is a serious risk. And it's an exciting proposal with a potentially big impact.

--
Best regards,
Andrzej Bialecki
http://www.sigram.com, blog http://www.sigram.com/blog
Information Retrieval, System Integration
Contact: info at sigram dot com
Re: [VOTE] Release Apache Bloodhound 0.1.0 (incubating)
On Tue, Aug 7, 2012 at 9:51 PM, Greg Stein gst...@gmail.com wrote:
> On Tue, Aug 7, 2012 at 3:44 PM, Marvin Humphrey mar...@rectangular.com wrote:
>> On Tue, Aug 7, 2012 at 11:38 AM, ant elder ant.el...@gmail.com wrote:
>>> Gosh I'm pretty sure we _don't_ allow things like (L)GPL dependencies in Incubator releases, we allow them in the source in SVN but I don't recall any releases like that.
>>
>> I know AOO had interactions with Legal regarding dmake, dictionaries and so on, though I don't recall exactly what went into their release. I would be surprised if any category X dependencies have wound up in an incubating release without Legal's involvement.
>>
>> Lucy's early incubating releases had two Perl-licensed (Artistic/GPL) dependencies (which were not bundled, but had to be downloaded and installed separately by the consumer). We sought a variance from Legal and got specific approval from the Legal VP for our plan, which involved ditching both of the problematic dependencies prior to graduation:
>>
>> https://issues.apache.org/jira/browse/LEGAL-86
>>
>> Are there other examples?
>
> The one that I had in mind was Roller. Several of its incubating releases had a hard dependency on Hibernate. They were required to clean it up before graduation, of course. You can look at the archives back in 2006 when it was incubating. In particular, there is one sent to private@incubator that I would refer you to:
>
> http://s.apache.org/c04 [only usable by ASF Members]

Didn't that get subsequently revised by Cliff et al into Incubating projects must not distribute an official product release that includes works covered by an excluded license - http://www.apache.org/legal/3party.html#transition-incubator

...ant
Re: [PROPOSAL] Drill for the Apache Incubator
I concur with Andrzej. Let's see that VOTE Ted! Otis Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm From: Andrzej Bialecki a...@getopt.org To: general@incubator.apache.org Sent: Tuesday, August 7, 2012 5:51 PM Subject: Re: [PROPOSAL] Drill for the Apache Incubator On 07/08/2012 21:14, Franklin, Matthew B. wrote: -Original Message- From: Marvin Humphrey [mailto:mar...@rectangular.com] Sent: Monday, August 06, 2012 12:25 PM To: general@incubator.apache.org Cc: Grant Ingersoll; Isabel Drost Subject: Re: [PROPOSAL] Drill for the Apache Incubator On Thu, Aug 2, 2012 at 3:12 PM, Ted Dunning ted.dunn...@gmail.com wrote: == Initial Source == There is no initial source code. All source code will be developed within the Apache Incubator. Coming in without any source code is going to pose a challenge to this podling. http://www.apache.org/foundation/how-it-works.html#incubator The incubator filters projects on the basis of the likeliness of them becoming successful meritocratic communities. The basic requirements for incubation are: * a working codebase -- over the years and after several failures, the foundation came to understand that without an initial working codebase, it is generally hard to bootstrap a community. This is because merit is not well recognized by developers without a working codebase. Also, the friction that is developed during the initial design stage is likely to fragment the community. It seems like there could be flexibility in this requirement, based on a few factors. In this case, a design discussion has been ongoing; but I would also think that any community coming in with enough people who know the Apache way may also not need as much of a solid starting point code wise. +1. Given the credentials and the experience of proposed committers and mentors, and the fact that the initial design is already done, I don't think this is a serious risk. And it's an exciting proposal with a potentially big impact.
-- Best regards, Andrzej Bialecki http://www.sigram.com, blog http://www.sigram.com/blog ___.,___,___,___,_._. __ [___||.__|__/|__||\/|: Information Retrieval, System Integration ___|||__||..\|..||..|: Contact: info at sigram dot com
Re: Status of Blur?
Hi Otis, Nice! yeah, we're bootstrapping now... join us on blur-dev@i.a.o and blur-user@i.a.o http://incubator.apache.org/projects/blur.html The ticket's in now to get the git repo up too. Thanks, --tim On Tue, Aug 7, 2012 at 8:05 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi, What's the word on Blur? The Proposal went well, VOTE thread got all +1s back on July 20th, but not sure if anything is happening with it now and I'm itching! :) Thanks, Otis Search Analytics - http://sematext.com/search-analytics/index.html Scalable Performance Monitoring - http://sematext.com/spm/index.html
Re: [VOTE] Release Apache Bloodhound 0.1.0 (incubating)
On Tue, Aug 7, 2012 at 5:54 PM, ant elder ant.el...@gmail.com wrote: On Tue, Aug 7, 2012 at 9:51 PM, Greg Stein gst...@gmail.com wrote: ... You can look at the archives back in 2006 when it was incubating. In particular, there is one sent to private@incubator that I would refer you to: http://s.apache.org/c04 [only usable by ASF Members] Didn't that get subsequently revised by Cliff et al into Incubating projects must not distribute an official product release that includes works covered by an excluded license - http://www.apache.org/legal/3party.html#transition-incubator Dunno. That link is for a draft document, and has been replaced by a final/resolved form (see link at top of page). Regardless... Jukka posted recently, and I'd look to his note for current policy. I think his statement puts Incubator policy a little more relaxed than ASF, but likely not as relaxed as I would have posited (in regards to dependencies). Cheers, -g
Amber - A Shepherd's View
Community looks well on the way to graduation. Congratulations on the recent release. There are a few things on the status page that need to be filled in, and processes [1] such as the suitable name search need to be completed prior to a graduation vote at the IPMC level. [1]: http://incubator.apache.org/guides/graduation.html#checklist
[VOTE] Accept Drill into the Apache Incubator
I would like to call a vote for accepting Drill for incubation in the Apache Incubator. The full proposal is available below. Discussion over the last few days has been quite positive.

Please cast your vote:

[ ] +1, bring Drill into Incubator
[ ] +0, I don't care either way,
[ ] -1, do not bring Drill into Incubator, because...

This vote will be open for 72 hours and only votes from the Incubator PMC are binding. The start of the vote is just before 3AM UTC on 8 August so the closing time will be 3AM UTC on 11 August.

Thank you for your consideration!

Ted

http://wiki.apache.org/incubator/DrillProposal

= Drill =

== Abstract ==

Drill is a distributed system for interactive analysis of large-scale datasets, inspired by [[http://research.google.com/pubs/pub36632.html|Google's Dremel]].

== Proposal ==

Drill is a distributed system for interactive analysis of large-scale datasets. Drill is similar to Google's Dremel, with the additional flexibility needed to support a broader range of query languages, data formats and data sources. It is designed to efficiently process nested data. It is a design goal to scale to 10,000 servers or more and to be able to process petabytes of data and trillions of records in seconds.

== Background ==

Many organizations have the need to run data-intensive applications, including batch processing, stream processing and interactive analysis. In recent years open source systems have emerged to address the need for scalable batch processing (Apache Hadoop) and stream processing (Storm, Apache S4). In 2010 Google published a paper called "Dremel: Interactive Analysis of Web-Scale Datasets", describing a scalable system used internally for interactive analysis of nested data. No open source project has successfully replicated the capabilities of Dremel.

== Rationale ==

There is a strong need in the market for low-latency interactive analysis of large-scale datasets, including nested data (e.g., JSON, Avro, Protocol Buffers). This need was identified by Google and addressed internally with a system called Dremel.

In recent years open source systems have emerged to address the need for scalable batch processing (Apache Hadoop) and stream processing (Storm, Apache S4). Apache Hadoop, originally inspired by Google's internal MapReduce system, is used by thousands of organizations processing large-scale datasets. Apache Hadoop is designed to achieve very high throughput, but is not designed to achieve the sub-second latency needed for interactive data analysis and exploration. Drill, inspired by Google's internal Dremel system, is intended to address this need.

It is worth noting that, as explained by Google in the original paper, Dremel complements MapReduce-based computing. Dremel is not intended as a replacement for MapReduce and is often used in conjunction with it to analyze outputs of MapReduce pipelines or rapidly prototype larger computations. Indeed, Dremel and MapReduce are both used by thousands of Google employees.

Like Dremel, Drill supports a nested data model with data encoded in a number of formats such as JSON, Avro or Protocol Buffers. In many organizations nested data is the standard, so supporting a nested data model eliminates the need to normalize the data. With that said, flat data formats, such as CSV files, are naturally supported as a special case of nested data.

The Drill architecture consists of four key components/layers:

* Query languages: This layer is responsible for parsing the user's query and constructing an execution plan. The initial goal is to support the SQL-like language used by Dremel and [[https://developers.google.com/bigquery/docs/query-reference|Google BigQuery]], which we call DrQL. However, Drill is designed to support other languages and programming models, such as the [[http://www.mongodb.org/display/DOCS/Mongo+Query+Language|Mongo Query Language]], [[http://www.cascading.org/|Cascading]] or [[https://github.com/tdunning/Plume|Plume]].
* Low-latency distributed execution engine: This layer is responsible for executing the physical plan. It provides the scalability and fault tolerance needed to efficiently query petabytes of data on 10,000 servers. Drill's execution engine is based on research in distributed execution engines (e.g., Dremel, Dryad, Hyracks, CIEL, Stratosphere) and columnar storage, and can be extended with additional operators and connectors.
* Nested data formats: This layer is responsible for supporting various data formats. The initial goal is to support the column-based format used by Dremel. Drill is designed to support schema-based formats such as Protocol Buffers/Dremel, Avro/AVRO-806/Trevni and CSV, and schema-less formats such as JSON, BSON or YAML. In addition, it is designed to support column-based formats such as Dremel, AVRO-806/Trevni and RCFile, and row-based formats such as Protocol Buffers, Avro, JSON, BSON and CSV. A particular distinction with Drill is that the execution engine is flexible enough
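
The proposal's remark that flat formats such as CSV are "a special case of nested data" can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the helper name `column_paths`, the sample records and the dotted-path convention are hypothetical and are not taken from Drill or Dremel, and the sketch ignores the repetition and definition levels that the Dremel paper uses for its column encoding.

```python
import csv
import io
import json


def column_paths(record, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf of a nested record.

    Hypothetical helper: it only shows that one traversal handles both
    nested and flat records, not how Drill stores columns.
    """
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Nested record: descend and extend the path.
            yield from column_paths(value, path)
        elif isinstance(value, list):
            # Repeated field: every element lands in the same column.
            for item in value:
                if isinstance(item, dict):
                    yield from column_paths(item, path)
                else:
                    yield path, item
        else:
            yield path, value


# A nested record, as it might arrive in JSON.
nested = json.loads('{"user": {"name": "ann", "langs": ["en", "it"]}, "hits": 3}')

# A flat CSV row is simply a record whose fields are all top-level leaves.
flat = next(csv.DictReader(io.StringIO("name,hits\nann,3\n")))

nested_columns = list(column_paths(nested))
flat_columns = list(column_paths(flat))
```

The same function processes both inputs; for the CSV row no path ever gains a prefix, which is the sense in which a flat record is a nested record of depth one.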
Re: [PROPOSAL] Drill for the Apache Incubator
Just sent that out. Thanks for the encouragement! On Tue, Aug 7, 2012 at 6:02 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: I concur with Andrzej. Let's see that VOTE Ted!
Re: [VOTE] Accept Drill into the Apache Incubator
+1 (binding) On Tue, Aug 7, 2012 at 7:41 PM, Ted Dunning ted.dunn...@gmail.com wrote: I would like to call a vote for accepting Drill for incubation in the Apache Incubator.
Re: [PROPOSAL] Drill for the Apache Incubator
Ted, Wasn't clear, can I add myself now? thanks, Arun On Aug 6, 2012, at 9:08 AM, Ted Dunning wrote: Sounds like some good pull. I will call a vote tomorrow. On Mon, Aug 6, 2012 at 9:45 AM, Arun C Murthy a...@hortonworks.com wrote: Agreed, likewise. I'd love to get involved and would like to add myself whenever you are ready. thanks, Arun On Aug 3, 2012, at 10:40 AM, Owen O'Malley wrote: On Thu, Aug 2, 2012 at 3:12 PM, Ted Dunning ted.dunn...@gmail.com wrote: Drill is a distributed system for interactive analysis of large-scale datasets, inspired by Google’s Dremel ( http://research.google.com/pubs/pub36632.html). This sounds really interesting Ted and I would love to help you. Would it be ok to add myself as one of the initial committers? Thanks, Owen -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/ -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/
Re: [PROPOSAL] Drill for the Apache Incubator
On Tue, Aug 7, 2012 at 12:14 PM, Franklin, Matthew B. mfrank...@mitre.org wrote: The incubator filters projects on the basis of the likeliness of them becoming successful meritocratic communities. The basic requirements for incubation are: * a working codebase -- over the years and after several failures, the foundation came to understand that without an initial working codebase, it is generally hard to bootstrap a community. This is because merit is not well recognized by developers without a working codebase. Also, the friction that is developed during the initial design stage is likely to fragment the community. It seems like there could be flexibility in this requirement, based on a few factors. In this case, a design discussion has been ongoing; but I would also think that any community coming in with enough people who know the Apache way may also not need as much of a solid starting point code wise. In the abstract, I'm a little skeptical about your last point. The inclusive, collaborative emphasis of the Apache Way is effective for evolutionary development of an existing code base, but IMO it's less well suited to the revolutionary act of starting a project. Choosing what *not* to do is really important when you start out, and that's not necessarily our strength. In Drill's case, I think the focus problem is mitigated by the fact that the podling will start with design documents and the Dremel whitepaper rather than a blank slate empty repository. In addition, the other classic problem which afflicts podlings which start with no code -- difficulty refreshing the community with no releases -- seems unlikely to manifest. The proposal looks good to me now. :) Marvin Humphrey
Re: [PROPOSAL] Drill for the Apache Incubator
On Tue, Aug 7, 2012 at 10:09 PM, Arun C Murthy a...@hortonworks.com wrote: Wasn't clear, can I add myself now? Didn't the Incubator go back to discouraging open enrollment? Is it OK to be invited in based on merit later, or do you feel that due to the nature of this project, it's essential to be in on the ground floor? Marvin Humphrey
Re: [VOTE] Accept Drill into the Apache Incubator
+1 (non-binding) On Wed, Aug 8, 2012 at 8:11 AM, Ted Dunning ted.dunn...@gmail.com wrote: I would like to call a vote for accepting Drill for incubation in the Apache Incubator.
Re: [VOTE] Accept Drill into the Apache Incubator
+1 (binding) On Aug 7, 2012, at 7:41 PM, Ted Dunning wrote: I would like to call a vote for accepting Drill for incubation in the Apache Incubator.