Re: [VOTE] Accept Drill into the Apache Incubator
+1 (binding) On Wed, Aug 8, 2012 at 8:33 AM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: +1 (binding). Good luck and sounds cool! Cheers, Chris On Aug 7, 2012, at 7:41 PM, Ted Dunning wrote: I would like to call a vote for accepting Drill for incubation in the Apache Incubator. The full proposal is available below. Discussion over the last few days has been quite positive. Please cast your vote: [ ] +1, bring Drill into Incubator [ ] +0, I don't care either way, [ ] -1, do not bring Drill into Incubator, because... This vote will be open for 72 hours and only votes from the Incubator PMC are binding. The start of the vote is just before 3AM UTC on 8 August so the closing time will be 3AM UTC on 11 August. Thank you for your consideration! Ted http://wiki.apache.org/incubator/DrillProposal = Drill = == Abstract == Drill is a distributed system for interactive analysis of large-scale datasets, inspired by [[http://research.google.com/pubs/pub36632.html|Google's Dremel]]. == Proposal == Drill is a distributed system for interactive analysis of large-scale datasets. Drill is similar to Google's Dremel, with the additional flexibility needed to support a broader range of query languages, data formats and data sources. It is designed to efficiently process nested data. It is a design goal to scale to 10,000 servers or more and to be able to process petabytes of data and trillions of records in seconds. == Background == Many organizations have the need to run data-intensive applications, including batch processing, stream processing and interactive analysis. In recent years open source systems have emerged to address the need for scalable batch processing (Apache Hadoop) and stream processing (Storm, Apache S4). In 2010 Google published a paper called Dremel: Interactive Analysis of Web-Scale Datasets, describing a scalable system used internally for interactive analysis of nested data.
No open source project has successfully replicated the capabilities of Dremel. == Rationale == There is a strong need in the market for low-latency interactive analysis of large-scale datasets, including nested data (eg, JSON, Avro, Protocol Buffers). This need was identified by Google and addressed internally with a system called Dremel. In recent years open source systems have emerged to address the need for scalable batch processing (Apache Hadoop) and stream processing (Storm, Apache S4). Apache Hadoop, originally inspired by Google's internal MapReduce system, is used by thousands of organizations processing large-scale datasets. Apache Hadoop is designed to achieve very high throughput, but is not designed to achieve the sub-second latency needed for interactive data analysis and exploration. Drill, inspired by Google's internal Dremel system, is intended to address this need. It is worth noting that, as explained by Google in the original paper, Dremel complements MapReduce-based computing. Dremel is not intended as a replacement for MapReduce and is often used in conjunction with it to analyze outputs of MapReduce pipelines or rapidly prototype larger computations. Indeed, Dremel and MapReduce are both used by thousands of Google employees. Like Dremel, Drill supports a nested data model with data encoded in a number of formats such as JSON, Avro or Protocol Buffers. In many organizations nested data is the standard, so supporting a nested data model eliminates the need to normalize the data. With that said, flat data formats, such as CSV files, are naturally supported as a special case of nested data. The Drill architecture consists of four key components/layers: * Query languages: This layer is responsible for parsing the user's query and constructing an execution plan. The initial goal is to support the SQL-like language used by Dremel and [[https://developers.google.com/bigquery/docs/query-reference|Google BigQuery]], which we call DrQL. 
However, Drill is designed to support other languages and programming models, such as the [[http://www.mongodb.org/display/DOCS/Mongo+Query+Language|Mongo Query Language]], [[http://www.cascading.org/|Cascading]] or [[https://github.com/tdunning/Plume|Plume]]. * Low-latency distributed execution engine: This layer is responsible for executing the physical plan. It provides the scalability and fault tolerance needed to efficiently query petabytes of data on 10,000 servers. Drill's execution engine is based on research in distributed execution engines (eg, Dremel, Dryad, Hyracks, CIEL, Stratosphere) and columnar storage, and can be extended with additional operators and connectors. * Nested data formats: This layer is responsible for supporting various data formats. The initial goal is to support the column-based format used by Dremel. Drill is designed to support
Re: [PROPOSAL] Drill for the Apache Incubator
On Mon, Aug 6, 2012 at 2:23 PM, Ted Dunning ted.dunn...@gmail.com wrote: No reason at all. Sorry. I may have been unclear. I was requesting that the design docs which are being referenced in the proposal: The requirement and design documents are currently stored in MapR Technologies' source code repository. They will be checked in as part of the initial code dump. be made available for review as part of the proposal, much as an initial source code base would be. There is also a reference to a presentation to-be-made available: High-level slides have been published by MapR: TODO Can those be made public? - To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org
Re: [VOTE] Release Apache Bloodhound 0.1.0 (incubating)
On Wed, Aug 8, 2012 at 1:13 AM, Greg Stein gst...@gmail.com wrote: On Tue, Aug 7, 2012 at 5:54 PM, ant elder ant.el...@gmail.com wrote: On Tue, Aug 7, 2012 at 9:51 PM, Greg Stein gst...@gmail.com wrote: ... You can look at the archives back in 2006 when it was incubating. In particular, there is one sent to private@incubator that I would refer you to: http://s.apache.org/c04 [only usable by ASF Members] Didn't that get subsequently revised by Cliff et al into Incubating projects must not distribute an official product release that includes works covered by an excluded license - http://www.apache.org/legal/3party.html#transition-incubator Dunno. That link is for a draft document, and has been replaced by a final/resolved form (see link at top of page). Regardless... Jukka posted recently, and I'd look to his note for current policy. I think his statement puts Incubator policy a little more relaxed than ASF, but likely not as relaxed as I would have posited (in regards to dependencies). The good thing about release votes is that they can't be vetoed so regardless of what policies may or may not be documented whether or not a release vote passes is just down to getting enough people to vote +1. Votes on general@ often stall and require a respin when someone claims something is wrong which puts off others from voting. Something as basic as a dependent license missing from the LICENSE file would be one of those things that in the past would have always demanded a respin, so the change, and it is a change, to allow wiggle room is what i hope people will remember from this. ...ant
Re: [RESULT] [VOTE] S4 0.5.0 Release Candidate 1
Hello,

On 8/7/12 4:30 PM, Richard Frovarp wrote:

On 08/07/2012 06:36 AM, Matthieu Morel wrote: Hi, The vote for this S4 release passed with the following results at the vote deadline: +1: 7 (5 binding) -1: 0 Details: +1 IPMC: acmurthy, phunt +1 PPMC kishoreg*, leoneu*, fpj +1 wider community Daniel Gomez, Karthik Kambatla Thanks to all the participants to the voting process! I'll now publish the artifacts, and after the sync delay, update the websites and send announcements. Matthieu

Best I know, you need three IPMC votes for it to pass.

Thanks for pointing out the missing IPMC vote, Flavio; thanks for the clarification, Richard; and sorry all for my misinterpretation and for the noise on this list.

It seems that mentors typically vote on releases and that their votes count as IPMC votes. Unfortunately, we now have only 2 mentors for S4 (both +1'ed).

At this point, I believe we should ask another IPMC member to vote on the release by sending a specific vote request, even though the vote has expired. Is this correct? Should we set a timeframe for the vote (I don't see one in previous similar requests)?

Thanks, Matthieu
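The counting rule discussed in this thread can be sketched in a few lines of Python. This is only an illustration, not Incubator tooling: the `Vote` type, the `tally` function and the role labels are hypothetical names, and the pass condition (at least three binding +1 votes, and more binding +1 than binding -1) is an assumption based on Richard's remark above and on the usual ASF release-vote rule. Only IPMC votes are treated as binding.

```python
from typing import Dict, List, NamedTuple


class Vote(NamedTuple):
    voter: str
    value: int   # +1, 0 or -1
    role: str    # "IPMC", "PPMC" or "community" (hypothetical labels)


# Assumption: the threshold Richard refers to ("three IPMC votes").
MIN_BINDING_PLUS_ONES = 3


def tally(votes: List[Vote], binding_role: str = "IPMC") -> Dict[str, object]:
    """Count all +1 votes, but decide the outcome on binding votes only."""
    binding = [v for v in votes if v.role == binding_role]
    binding_plus = sum(1 for v in binding if v.value > 0)
    binding_minus = sum(1 for v in binding if v.value < 0)
    return {
        "total_plus": sum(1 for v in votes if v.value > 0),
        "binding_plus": binding_plus,
        "binding_minus": binding_minus,
        "passed": binding_plus >= MIN_BINDING_PLUS_ONES
        and binding_plus > binding_minus,
    }


# The S4 0.5.0 RC1 votes as listed in the message above.
s4_votes = [
    Vote("acmurthy", 1, "IPMC"),
    Vote("phunt", 1, "IPMC"),
    Vote("kishoreg", 1, "PPMC"),
    Vote("leoneu", 1, "PPMC"),
    Vote("fpj", 1, "PPMC"),
    Vote("Daniel Gomez", 1, "community"),
    Vote("Karthik Kambatla", 1, "community"),
]

result = tally(s4_votes)
print(result)
```

With the votes listed above, the sketch counts seven +1 votes but only two binding ones, so the vote does not pass until one more IPMC member votes +1. That matches the correction made in this thread.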
Re: Incubator release task force
Hi, For people interested in working on this, the ongoing Bloodhound release vote has triggered some good discussion that would be great to capture somehow. BR, Jukka Zitting
Re: [VOTE] Accept Drill into the Apache Incubator
On 08/08/2012 04:41, Ted Dunning wrote: I would like to call a vote for accepting Drill for incubation in the Apache Incubator. The full proposal is available below. Discussion over the last few days has been quite positive. Please cast your vote: [ ] +1, bring Drill into Incubator [ ] +0, I don't care either way, [ ] -1, do not bring Drill into Incubator, because... This vote will be open for 72 hours and only votes from the Incubator PMC are binding. The start of the vote is just before 3AM UTC on 8 August so the closing time will be 3AM UTC on 11 August. +1 (binding) - this is an exciting proposal! -- Best regards, Andrzej Bialecki http://www.sigram.com, blog http://www.sigram.com/blog Information Retrieval, System Integration. Contact: info at sigram dot com
Re: Incubator release task force
On Thu, Jul 26, 2012 at 4:10 PM, Jukka Zitting jukka.zitt...@gmail.com wrote: ...I'd like to start fixing this by forming a release task force of a handful of volunteers who are ready to invest an hour or two per week to work on ... b) migrating /dist/incubator to svnpubsub by the end of this year... I'm interested in helping with that but I'd suggest starting from scratch on new docs in svnpubsub, in order to create a minimal set of docs that's understandable and maintainable. We'd keep the current docs around as the old docs and refer to them less and less as the new, smaller ones take shape. I'll discuss that in a separate thread. -Bertrand
[RT] a minimal set of docs for incubator.apache.org
Hi,

Like others, I'm not too happy with the current http://incubator.apache.org/ content.

How about starting a new, minimal set of docs that are more maintainable and understandable?

IMO, the following would be sufficient, with one page per topic:

1. What's the Apache Incubator? (homepage)
2. Lifecycle of a podling, from proposal to graduation, with many links to existing examples (proposals, committer votes, graduation threads, etc.)
3. Release checklist: criteria for approving a release
4. Previously asked questions (a la http://www.apache.org/legal/resolved.html, includes IP clearance info)
6. Glossary of terms (though that might belong to the top-level apache.org site instead)

I'm just considering the narrative info, not the podling status pages or clutch stuff in this refactoring. That status info might move to podlings.incubator.apache.org to better separate it and keep the main site minimal.

I've got some draft content for 2. and 3., that I've been collecting in my mentoring activities.

WDYT?

-Bertrand
Re: [VOTE] Accept Drill into the Apache Incubator
On Wed, Aug 8, 2012 at 4:41 AM, Ted Dunning ted.dunn...@gmail.com wrote: I would like to call a vote for accepting Drill for incubation in the Apache Incubator... +1 -Bertrand
Re: [RT] a minimal set of docs for incubator.apache.org
Commit the content. Otherwise, we're just hand-waving. On Aug 8, 2012 5:29 AM, Bertrand Delacretaz bdelacre...@apache.org wrote: Hi, Like others, I'm not too happy with the current http://incubator.apache.org/ content. How about starting a new, minimal set of docs that are more maintainable and understandable? IMO, the following would be sufficient, with one page per topic: 1. What's the Apache Incubator? (homepage) 2. Lifecycle of a podling, from proposal to graduation, with many links to existing examples (proposals, committer votes, graduation threads, etc.) 3. Release checklist: criteria for approving a release 4. Previously asked questions (a la http://www.apache.org/legal/resolved.html, includes IP clearance info) 6. Glossary of terms (though that might belong to the top-level apache.org site instead) I'm just considering the narrative info, not the podling status pages or clutch stuff in this refactoring. That status info might move to podlings.incubator.apache.org to better separate it and keep the main site minimal. I've got some draft content for 2. and 3., that I've been collecting in my mentoring activities. WDYT? -Bertrand
Re: [RT] a minimal set of docs for incubator.apache.org
Committing somewhere would be good as otherwise I don't know whether I need to suggest the following (I wasn't sure of the best thread for this to go in anyway): May I suggest that, where appropriate, the documentation is backed up with pointers to examples of existing projects that are considered to represent the current best practice on various aspects. Whilst clear documentation is fantastic, there is nothing like good examples for building confidence that one is doing things in the right way. Cheers, Gary On 08/08/2012 10:42 AM, Greg Stein wrote: Commit the content. Otherwise, we're just hand-waving. On Aug 8, 2012 5:29 AM, Bertrand Delacretaz bdelacre...@apache.org wrote: Hi, Like others, I'm not too happy with the current http://incubator.apache.org/ content. How about starting a new, minimal set of docs that are more maintainable and understandable? IMO, the following would be sufficient, with one page per topic: 1. What's the Apache Incubator? (homepage) 2. Lifecycle of a podling, from proposal to graduation, with many links to existing examples (proposals, committer votes, graduation threads, etc.) 3. Release checklist: criteria for approving a release 4. Previously asked questions (a la http://www.apache.org/legal/resolved.html, includes IP clearance info) 6. Glossary of terms (though that might belong to the top-level apache.org site instead) I'm just considering the narrative info, not the podling status pages or clutch stuff in this refactoring. That status info might move to podlings.incubator.apache.org to better separate it and keep the main site minimal. I've got some draft content for 2. and 3., that I've been collecting in my mentoring activities. WDYT? -Bertrand
Re: [VOTE] Accept Drill into the Apache Incubator
On Wed, Aug 8, 2012 at 11:39 AM, Bertrand Delacretaz bdelacre...@apache.org wrote: On Wed, Aug 8, 2012 at 4:41 AM, Ted Dunning ted.dunn...@gmail.com wrote: I would like to call a vote for accepting Drill for incubation in the Apache Incubator... +1 +1 cheers, Torsten
Re: [VOTE] Apache Syncope 1.0.0-incubating
+1 (non-binding) -- With best regards / с наилучшими пожеланиями, Alexei Fedotov / Алексей Федотов, http://dataved.ru/ +7 916 562 8095 On Mon, Aug 6, 2012 at 7:50 PM, Francesco Chicchiriccò ilgro...@apache.org wrote: On 06/08/2012 16:36, Alexei Fedotov wrote: Hello Francesco, Here are few things I have found via manual inspection: 1. Jquery bundle contains several following strings: Dual licensed under the MIT or GPL Version 2 licenses. *) source release LICENSE file does not contain MIT license; *) and the file itself does not look like APL licensed; *) and it is a part of the source release. Something should be fixed here, i.e. the files replaced with wget in the build script. 2. ./legal_ext/LICENSE does not have a license for jquery. Does war contain jquery? Hi Alexei, I've taken a look at other ASF projects including JQuery (or similar dual-licensed JS frameworks) and I've opened https://issues.apache.org/jira/browse/SYNCOPE-181 We'll fix this ASAP. Don't think these issues are stoppers. Cool :-) What's your vote on the release, then? Thanks for your review. Regards. On Mon, Aug 6, 2012 at 6:07 PM, Mark Struberg strub...@yahoo.de wrote: Hi Francesco, I can check in the evening. LieGrue, strub - Original Message - From: Francesco Chicchiriccò ilgro...@apache.org To: general@incubator.apache.org Cc: Sent: Monday, August 6, 2012 2:49 PM Subject: Re: [VOTE] Apache Syncope 1.0.0-incubating Hi IPMC members, we are missing a single vote on this release: anyone interested to check? TIA. Regards. 
On 03/08/2012 09:58, Francesco Chicchiriccò wrote: I've created a 1.0.0-incubating release, with the following artifacts up for a vote: SVN source tag (r1367421): https://svn.apache.org/repos/asf/incubator/syncope/tags/syncope-1.0.0-incubating/ List of changes: https://svn.apache.org/repos/asf/incubator/syncope/tags/syncope-1.0.0-incubating/CHANGES Maven staging repo: https://repository.apache.org/content/repositories/orgapachesyncope-100/ Source release (checksums and signatures are available at the same location): https://repository.apache.org/content/repositories/orgapachesyncope-100/org/apache/syncope/syncope-root/1.0.0-incubating/syncope-root-1.0.0-incubating-source-release.zip Staging site: http://incubator.apache.org/syncope/1.0.0-incubating/ PGP release keys (signed using 273DF287): http://www.apache.org/dist/incubator/syncope/KEYS This has been voted through on the syncope-...@incubator.apache.org mailing list [1], and now requires a vote on general@incubator.apache.org Votes already cast (on syncope-dev): +1 (binding) * Francesco Chicchiriccò * Massimiliano Perrone * Marco Di Sabatino Di Diodoro * Emmanuel Lécharny (IPMC member) * Simone Tripodi * Colm O hEigeartaigh (IPMC member) +1 (non binding) * Denis Signoretto Vote will be open for 72 hours. [ ] +1 approve [ ] +0 no opinion [ ] -1 disapprove (and reason why) Best regards. [1] http://syncope-dev.1063484.n5.nabble.com/VOTE-Apache-Syncope-1-0-0-incubating-tp5710173p5710292.html -- Francesco Chicchiriccò ASF Member, Apache Cocoon PMC and Apache Syncope PPMC Member http://people.apache.org/~ilgrosso/
Re: Incubator release task force
On Wed, Aug 8, 2012 at 5:18 AM, Bertrand Delacretaz bdelacre...@apache.org wrote: On Thu, Jul 26, 2012 at 4:10 PM, Jukka Zitting jukka.zitt...@gmail.com wrote: ...I'd like to start fixing this by forming a release task force of a handful of volunteers who are ready to invest an hour or two per week to work on ... b) migrating /dist/incubator to svnpubsub by the end of this year... I'm interested in helping with that but I'd suggest starting from scratch on new docs in svnpubsub, in order to create a minimal set of docs that's understandable and maintainable. We'd keep the current docs around as the old docs and refer to them less and less as the new, smaller ones take shape. I'll discuss that in a separate thread. I'm in on this. -Bertrand
Re: [VOTE] Accept Drill into the Apache Incubator
On Aug 7, 2012, at 10:41 PM, Ted Dunning wrote: I would like to call a vote for accepting Drill for incubation in the Apache Incubator. The full proposal is available below. Discussion over the last few days has been quite positive. Please cast your vote: [ ] +1, bring Drill into Incubator +1 (binding)
Re: [VOTE] Accept Drill into the Apache Incubator
+1 (binding) On Wed, Aug 8, 2012 at 3:55 PM, Grant Ingersoll gsing...@apache.org wrote: On Aug 7, 2012, at 10:41 PM, Ted Dunning wrote: I would like to call a vote for accepting Drill for incubation in the Apache Incubator. The full proposal is available below. Discussion over the last few days has been quite positive. Please cast your vote: [ ] +1, bring Drill into Incubator +1 (binding) -- Thanks - Mohammad Nour Life is like riding a bicycle. To keep your balance you must keep moving - Albert Einstein
Re: [VOTE] Accept Drill into the Apache Incubator
On Tue, Aug 7, 2012 at 9:41 PM, Ted Dunning ted.dunn...@gmail.com wrote: I would like to call a vote for accepting Drill for incubation in the Apache Incubator. The full proposal is available below. Discussion over the last few days has been quite positive. Please cast your vote: [ ] +1, bring Drill into Incubator [ ] +0, I don't care either way, [ ] -1, do not bring Drill into Incubator, because... +1 Phil
Re: [VOTE] Release Apache Bloodhound 0.1.0 (incubating)
Hello, Let me add one more point on adding dependencies to source releases. In addition to licenses, the dependencies contain copyright statements, e.g. # Copyright (C) 2005 Christopher Lenz cml...@gmx.de. As mentioned here http://www.apache.org/legal/src-headers.html If the source file is submitted with a copyright notice included in it, the copyright owner (or owner's agent) must either: remove such notices, or move them to the NOTICE file associated with each applicable project release, or provide written permission for the ASF to make such removal or relocation of the notices. This issue cannot be fixed by merging licenses into the LICENSE file. -- With best regards / с наилучшими пожеланиями, Alexei Fedotov / Алексей Федотов, http://dataved.ru/ +7 916 562 8095 On Wed, Aug 8, 2012 at 11:26 AM, ant elder ant.el...@gmail.com wrote: On Wed, Aug 8, 2012 at 1:13 AM, Greg Stein gst...@gmail.com wrote: On Tue, Aug 7, 2012 at 5:54 PM, ant elder ant.el...@gmail.com wrote: On Tue, Aug 7, 2012 at 9:51 PM, Greg Stein gst...@gmail.com wrote: ... You can look at the archives back in 2006 when it was incubating. In particular, there is one sent to private@incubator that I would refer you to: http://s.apache.org/c04 [only usable by ASF Members] Didn't that get subsequently revised by Cliff et al into Incubating projects must not distribute an official product release that includes works covered by an excluded license - http://www.apache.org/legal/3party.html#transition-incubator Dunno. That link is for a draft document, and has been replaced by a final/resolved form (see link at top of page). Regardless... Jukka posted recently, and I'd look to his note for current policy. I think his statement puts Incubator policy a little more relaxed than ASF, but likely not as relaxed as I would have posited (in regards to dependencies).
The good thing about release votes is that they can't be vetoed so regardless of what policies may or may not be documented whether or not a release vote passes is just down to getting enough people to vote +1. Votes on general@ often stall and require a respin when someone claims something is wrong which puts off others from voting. Something as basic as a dependent license missing from the LICENSE file would be one of those things that in the past would have always demanded a respin, so the change, and it is a change, to allow wiggle room is what i hope people will remember from this. ...ant
Re: [VOTE] Release Apache Bloodhound 0.1.0 (incubating)
On Wed, Aug 8, 2012 at 3:19 PM, Alexei Fedotov alexei.fedo...@gmail.com wrote: Hello, Let me add one more point on adding dependencies to source releases. In addition to licenses, the dependencies contain copyright statements, e.g. # Copyright (C) 2005 Christopher Lenz cml...@gmx.de. As mentioned here http://www.apache.org/legal/src-headers.html If the source file is submitted with a copyright notice included in it, the copyright owner (or owner's agent) must either: remove such notices, or move them to the NOTICE file associated with each applicable project release, or provide written permission for the ASF to make such removal or relocation of the notices. This issue cannot be fixed by merging licenses into the LICENSE file.

No, this is not what that source headers page is talking about. That page is talking about any copyright statements that may have been in source files when they were contributed to the ASF. Here we are talking about the licenses of any external dependencies that are included in a release, and those licenses should be added to the LICENSE file, as described at: http://www.apache.org/dev/release.html#distributing-code-under-several-licenses

...ant
[VOTE] (missing 1 IPMC +1) S4 0.5.0 Release Candidate 1
Hi,

I had misinterpreted the vote results and prematurely declared the vote as passed, sorry about that... In reality, we still need 1 IPMC +1 vote for the S4 0.5.0 Release Candidate 1.

Current status after last week's vote is:

+1: 7 (5 binding)
-1: 0

Details:
+1 IPMC: acmurthy, phunt
+1 PPMC: kishoreg*, leoneu*, fpj
+1 wider community: Daniel Gomez, Karthik Kambatla
(* voted on the s4-dev list only)

---

This is the first release candidate for Apache S4, version 0.5.0.

It fixes the following issues: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12312322&version=12318653

Note that we are voting upon the source (tag); binaries are provided for convenience.

The vote is open for at least 72 hours with no specific close time.

Source and binary packages in zip format: http://people.apache.org/~mmorel/s4-0.5.0-incubating-release-candidate-1/

The (git) tag to be voted upon: 0.5.0: https://git-wip-us.apache.org/repos/asf?p=incubator-s4.git;a=tag;h=70806aa1ee0b9154d36fd834dc4907cd8d3eb791

S4 KEYS file containing PGP keys we use to sign the release: http://svn.apache.org/repos/asf/incubator/s4/dist/KEYS

Please cast your vote.

[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and reason why)

Thanks! Matthieu
Re: [RT] a minimal set of docs for incubator.apache.org
On Wed, Aug 8, 2012 at 2:29 AM, Bertrand Delacretaz bdelacre...@apache.org wrote: How about starting a new, minimal set of docs that are more maintainable and understandable?

+1 to accepting that the end result of the documentation overhaul may be quite different from what exists now.

-1 to starting from scratch rather than continuing the ongoing evolutionary effort via progressive edits to the existing documents.

IMO, the following would be sufficient, with one page per topic: 1. What's the Apache Incubator? (homepage) 2. Lifecycle of a podling, from proposal to graduation, with many links to existing examples (proposals, committer votes, graduation threads, etc.) 3. Release checklist: criteria for approving a release 4. Previously asked questions (a la http://www.apache.org/legal/resolved.html, includes IP clearance info) 6. Glossary of terms (though that might belong to the top-level apache.org site instead)

I don't believe that this proposed outline will meet your goals for maintainability, because it is not structured to take into account how the Incubator docs evolve. If we adopt this framework unmodified, I predict that over time our docs will gradually decompose and revert to the current state of incoherency. The proposed Previously asked questions page, in particular, is doomed to death-by-bloat.

The Incubator's documentation gets continuously updated by people who are well-meaning but have a limited perspective. If we don't provide outlets for individuals to contribute what they are absolutely convinced is essential material but is likely just their own pet best-practices tip, minimal docs won't stay minimal for long.

In my opinion, we will achieve better results if we adopt a hierarchical model: augment a minimal core with topical satellite pages (which lots of people write to but fewer people read).
This paradigm is superior to minimalism for two reasons:

First, the hierarchical model is sustainable while a purely minimalist approach is toxic to community and incompatible with the Apache Way. Rejecting contributions which do not fit within the tight scope of a minimalist vision is costly -- it is dispiriting for the contributor and exhausts the curator. In contrast, when a curator merely *moves* a contribution to a satellite page, less diplomatic effort is required and all parties are more likely to be more-or-less satisfied with the end result.

Second, under a hierarchical model we are better able to make use of topical contributions because they will be accessible by subject rather than thrown into a catch-all like an FAQ page. While the Java and Maven stuff was buried in the giant pile of releasemanagement.html, no one had ownership of it. Now that release-java.html has been broken out, it has a decent shot at evolving into something coherent and succinct that will serve Java podlings well.

I've got some draft content for 2. and 3., that I've been collecting in my mentoring activities.

From past experience, I know that the quality of your writing is high... we are not exactly lacking draft content, though, you know? :\ It would be great to add your material to the collection of raw material that exists now, but I don't see that it should displace everybody else's hard work.

Can you instead be persuaded to work with us on rewriting and editing down the existing docs? A lot of your draft material is likely to find its way into the final product that way. :)

Marvin Humphrey
RE: [VOTE] Accept Drill into the Apache Incubator
+1 (binding) -Original Message- From: Ted Dunning [mailto:ted.dunn...@gmail.com] Sent: Tuesday, August 07, 2012 10:41 PM To: general@incubator.apache.org Subject: [VOTE] Accept Drill into the Apache Incubator I would like to call a vote for accepting Drill for incubation in the Apache Incubator. The full proposal is available below. Discussion over the last few days has been quite positive. Please cast your vote: [ ] +1, bring Drill into Incubator [ ] +0, I don't care either way, [ ] -1, do not bring Drill into Incubator, because... This vote will be open for 72 hours and only votes from the Incubator PMC are binding. The start of the vote is just before 3AM UTC on 8 August so the closing time will be 3AM UTC on 11 August. Thank you for your consideration! Ted http://wiki.apache.org/incubator/DrillProposal = Drill = == Abstract == Drill is a distributed system for interactive analysis of large-scale datasets, inspired by [[http://research.google.com/pubs/pub36632.html|Google's Dremel]]. == Proposal == Drill is a distributed system for interactive analysis of large-scale datasets. Drill is similar to Google's Dremel, with the additional flexibility needed to support a broader range of query languages, data formats and data sources. It is designed to efficiently process nested data. It is a design goal to scale to 10,000 servers or more and to be able to process petabytes of data and trillions of records in seconds. == Background == Many organizations have the need to run data-intensive applications, including batch processing, stream processing and interactive analysis. In recent years open source systems have emerged to address the need for scalable batch processing (Apache Hadoop) and stream processing (Storm, Apache S4). In 2010 Google published a paper called Dremel: Interactive Analysis of Web-Scale Datasets, describing a scalable system used internally for interactive analysis of nested data.
No open source project has successfully replicated the capabilities of Dremel. == Rationale == There is a strong need in the market for low-latency interactive analysis of large-scale datasets, including nested data (eg, JSON, Avro, Protocol Buffers). This need was identified by Google and addressed internally with a system called Dremel. In recent years open source systems have emerged to address the need for scalable batch processing (Apache Hadoop) and stream processing (Storm, Apache S4). Apache Hadoop, originally inspired by Google's internal MapReduce system, is used by thousands of organizations processing large-scale datasets. Apache Hadoop is designed to achieve very high throughput, but is not designed to achieve the sub-second latency needed for interactive data analysis and exploration. Drill, inspired by Google's internal Dremel system, is intended to address this need. It is worth noting that, as explained by Google in the original paper, Dremel complements MapReduce-based computing. Dremel is not intended as a replacement for MapReduce and is often used in conjunction with it to analyze outputs of MapReduce pipelines or rapidly prototype larger computations. Indeed, Dremel and MapReduce are both used by thousands of Google employees. Like Dremel, Drill supports a nested data model with data encoded in a number of formats such as JSON, Avro or Protocol Buffers. In many organizations nested data is the standard, so supporting a nested data model eliminates the need to normalize the data. With that said, flat data formats, such as CSV files, are naturally supported as a special case of nested data. The Drill architecture consists of four key components/layers: * Query languages: This layer is responsible for parsing the user's query and constructing an execution plan. The initial goal is to support the SQL-like language used by Dremel and [[https://developers.google.com/bigquery/docs/query-reference|Google BigQuery]], which we call DrQL. 
However, Drill is designed to support other languages and programming models, such as the [[http://www.mongodb.org/display/DOCS/Mongo+Query+Language|Mongo Query Language]], [[http://www.cascading.org/|Cascading]] or [[https://github.com/tdunning/Plume|Plume]]. * Low-latency distributed execution engine: This layer is responsible for executing the physical plan. It provides the scalability and fault tolerance needed to efficiently query petabytes of data on 10,000 servers. Drill's execution engine is based on research in distributed execution engines (eg, Dremel, Dryad, Hyracks, CIEL, Stratosphere) and columnar storage, and can be extended with additional operators and connectors. * Nested data formats: This layer is responsible for supporting various data formats. The initial goal is to support the column-based format used by Dremel. Drill is designed to support schema-based formats such as Protocol Buffers/Dremel, Avro/AVRO-806/Trevni and CSV, and schema-less formats such as JSON, BSON or YAML. In addition, it is designed to support
Re: [PROPOSAL] Drill for the Apache Incubator
The consensus in the group of committers listed in the proposal is that we would like to discourage piling on of pre-formation committers and encourage adding committers after formation based on contributions. It is clear that there are gobs of people with the credentials and track record to be potential contributors, but it is also clear that many of these people have huge demands on their time. That leaves doubt about how much contribution they can or should be making to a new project. It is also clear that there are gobs of people that are not already part of Apache who may have time and expertise to contribute. In any case, the vote is already started and will be done before long. Let's go with what we are already voting on without changing it in mid-stream and then adjust later. Progress, not perfection, as they say. On Wed, Aug 8, 2012 at 3:31 AM, Bertrand Delacretaz bdelacre...@apache.org wrote: On Wed, Aug 8, 2012 at 7:20 AM, Marvin Humphrey mar...@rectangular.com wrote: On Tue, Aug 7, 2012 at 10:09 PM, Arun C Murthy a...@hortonworks.com wrote: Wasn't clear, can I add myself now? Didn't the Incubator go back to discouraging open enrollment?... AFAIK, no. What was discussed is that incoming podlings should clearly state their requirements for people that want to be added as initial committers, to keep it fair. -Bertrand
Re: [VOTE] (missing 1 IPMC +1) S4 0.5.0 Release Candidate 1
On 08/08/2012 10:51 AM, Matthieu Morel wrote: Hi, I had misinterpreted the vote results and prematurely declared the vote as passed, sorry about that... In reality, we still need 1 IPMC +1 vote for the S4 0.5.0 Release Candidate 1. Current status after last week's vote is: +1: 7 (5 binding) -1: 0 Details: +1 IPMC: acmurthy, phunt +1 PPMC: kishoreg*, leoneu*, fpj +1 wider community: Daniel Gomez, Karthik Kambatla (* voted on the s4-dev list only) --- This is the first release candidate for Apache S4, version 0.5.0. It fixes the following issues: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12312322&version=12318653 Note that we are voting upon the source (tag); binaries are provided for convenience. The vote is open for at least 72 hours with no specific close time. Source and binary packages in zip format: http://people.apache.org/~mmorel/s4-0.5.0-incubating-release-candidate-1/ The (git) tag to be voted upon: 0.5.0: https://git-wip-us.apache.org/repos/asf?p=incubator-s4.git;a=tag;h=70806aa1ee0b9154d36fd834dc4907cd8d3eb791 S4 KEYS file containing PGP keys we use to sign the release: http://svn.apache.org/repos/asf/incubator/s4/dist/KEYS Please cast your vote. [ ] +1 approve [ ] +0 no opinion [ ] -1 disapprove (and reason why) Thanks! Matthieu +1 Binding Sigs and hashes are good. src file matches what is in the tag (minus the javadoc generation which is fine). All Java files have headers. Disclaimer, Notice, and License all look right to me. A few of the properties files are missing headers. That should probably be fixed in the future. It would be nice to be able to run Apache Creadur in the project to verify licenses in the future.
Re: [VOTE] Accept Drill into the Apache Incubator
+1 (blinding) Otis Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm From: Ted Dunning ted.dunn...@gmail.com To: general@incubator.apache.org Sent: Tuesday, August 7, 2012 10:41 PM Subject: [VOTE] Accept Drill into the Apache Incubator I would like to call a vote for accepting Drill for incubation in the Apache Incubator. The full proposal is available below. Discussion over the last few days has been quite positive. Please cast your vote: [ ] +1, bring Drill into Incubator [ ] +0, I don't care either way, [ ] -1, do not bring Drill into Incubator, because... This vote will be open for 72 hours and only votes from the Incubator PMC are binding. The start of the vote is just before 3AM UTC on 8 August so the closing time will be 3AM UTC on 11 August. Thank you for your consideration! Ted http://wiki.apache.org/incubator/DrillProposal = Drill = == Abstract == Drill is a distributed system for interactive analysis of large-scale datasets, inspired by [[http://research.google.com/pubs/pub36632.html|Google's Dremel]]. == Proposal == Drill is a distributed system for interactive analysis of large-scale datasets. Drill is similar to Google's Dremel, with the additional flexibility needed to support a broader range of query languages, data formats and data sources. It is designed to efficiently process nested data. It is a design goal to scale to 10,000 servers or more and to be able to process petabytes of data and trillions of records in seconds. == Background == Many organizations have the need to run data-intensive applications, including batch processing, stream processing and interactive analysis. In recent years open source systems have emerged to address the need for scalable batch processing (Apache Hadoop) and stream processing (Storm, Apache S4). In 2010 Google published a paper called Dremel: Interactive Analysis of Web-Scale Datasets, describing a scalable system used internally for interactive analysis of nested data. 
Clerezza status (Was: [Incubator Wiki] Update of August2012 by BertrandDelacretaz)
Hi, On Mon, Aug 6, 2012 at 11:26 AM, Apache Wiki wikidi...@apache.org wrote: + As in our last report in May, we believe Clerezza should graduate soon, but + unfortunately that hasn't happened yet. Activity is currently fairly low, + and it looks like Clerezza might remain a small/low activity project, but + the PPMC is functional, has done releases and invited additional committers + so there's no need to stay in the Incubator any longer once a plan to attempt + to grow the community is in place. Do you have an idea what happened around a year ago when dev@ activity dropped from the hundreds it was for a long time to the dozens where it's mostly stayed since then? Alarmingly, the low mark seems to have been last month, when only a single non-automated post was sent to dev@. I recall Clerezza having release trouble due to complex/unreleased dependencies for a long time. Could that have contributed to the loss of momentum? I think it would be useful to somehow capture experience like this, perhaps ultimately for use by ComDev in something like a "How to maintain community momentum?" guide. Anyway, it sounds like the community has a reasonably good idea on how to proceed, so I'm not too worried yet even though Clerezza is already getting pretty close to its three-year mark at the Incubator. Though I'd really love to see Clerezza showing notable improvement or even graduating before that milestone is reached. If the efforts to grow or reactivate the community fail, would it be a good idea to seek to join forces with some related projects like Stanbol, Any23 or UIMA? Or do you feel that there are still enough active people to allow the project to function as a standalone TLP (able to reach 3 PMC votes for releases, etc.)? BR, Jukka Zitting
Re: [PROPOSAL] Drill for the Apache Incubator
+1 -C On Thu, Aug 2, 2012 at 3:12 PM, Ted Dunning ted.dunn...@gmail.com wrote: This is a duplicated attempt at sending this message, please ignore the previous message if it eventually arrives. There appears to be a hangup sending email from my apache email address via gmail. == Abstract == Drill is a distributed system for interactive analysis of large-scale datasets, inspired by Google’s Dremel ( http://research.google.com/pubs/pub36632.html). == Proposal == Drill is a distributed system for interactive analysis of large-scale datasets. Drill is similar to Google’s Dremel, with the additional flexibility needed to support a broader range of query languages, data formats and data sources. It is designed to efficiently process nested data. It is a design goal to scale to 10,000 servers or more and to be able to process petabytes of data and trillions of records in seconds. == Background == Many organizations have the need to run data-intensive applications, including batch processing, stream processing and interactive analysis. In recent years open source systems have emerged to address the need for scalable batch processing (Apache Hadoop) and stream processing (Storm, Apache S4). In 2010 Google published a paper called “Dremel: Interactive Analysis of Web-Scale Datasets,” describing a scalable system used internally for interactive analysis of nested data. No open source project has successfully replicated the capabilities of Dremel. == Rationale == There is a strong need in the market for low-latency interactive analysis of large-scale datasets, including nested data (eg, JSON, Avro, Protocol Buffers). This need was identified by Google and addressed internally with a system called Dremel. In recent years open source systems have emerged to address the need for scalable batch processing (Apache Hadoop) and stream processing (Storm, Apache S4). 
Apache Hadoop, originally inspired by Google’s internal MapReduce system, is used by thousands of organizations processing large-scale datasets. Apache Hadoop is designed to achieve very high throughput, but is not designed to achieve the sub-second latency needed for interactive data analysis and exploration. Drill, inspired by Google’s internal Dremel system, is intended to address this need. It is worth noting that, as explained by Google in the original paper, Dremel complements MapReduce-based computing. Dremel is not intended as a replacement for MapReduce and is often used in conjunction with it to analyze outputs of MapReduce pipelines or rapidly prototype larger computations. Indeed, Dremel and MapReduce are both used by thousands of Google employees. Like Dremel, Drill supports a nested data model with data encoded in a number of formats such as JSON, Avro or Protocol Buffers. In many organizations nested data is the standard, so supporting a nested data model eliminates the need to normalize the data. With that said, flat data formats, such as CSV files, are naturally supported as a special case of nested data. The Drill architecture consists of four key components/layers: * Query languages: This layer is responsible for parsing the user’s query and constructing an execution plan. The initial goal is to support the SQL-like language used by Dremel and Google BigQuery ( https://developers.google.com/bigquery/docs/query-reference), which we call DrQL. However, Drill is designed to support other languages and programming models, such as the Mongo Query Language ( http://www.mongodb.org/display/DOCS/Mongo+Query+Language), Cascading ( http://www.cascading.org/) or Plume (https://github.com/tdunning/Plume). * Low-latency distributed execution engine: This layer is responsible for executing the physical plan. It provides the scalability and fault tolerance needed to efficiently query petabytes of data on 10,000 servers. 
Drill’s execution engine is based on research in distributed execution engines (eg, Dremel, Dryad, Hyracks, CIEL, Stratosphere) and columnar storage, and can be extended with additional operators and connectors. * Nested data formats: This layer is responsible for supporting various data formats. The initial goal is to support the column-based format used by Dremel. Drill is designed to support schema-based formats such as Protocol Buffers/Dremel, Avro/AVRO-806/Trevni and CSV, and schema-less formats such as JSON, BSON or YAML. In addition, it is designed to support column-based formats such as Dremel, AVRO-806/Trevni and RCFile, and row-based formats such as Protocol Buffers, Avro, JSON, BSON and CSV. A particular distinction with Drill is that the execution engine is flexible enough to support column-based processing as well as row-based processing. This is important because column-based processing can be much more efficient when the data is stored in a column-based format, but many large data assets are stored in a row-based format that
Re: [VOTE] Accept Drill into the Apache Incubator
+1 -C (sorry, wrong thread) On Tue, Aug 7, 2012 at 7:41 PM, Ted Dunning ted.dunn...@gmail.com wrote: I would like to call a vote for accepting Drill for incubation in the Apache Incubator. The full proposal is available below. Discussion over the last few days has been quite positive. Please cast your vote: [ ] +1, bring Drill into Incubator [ ] +0, I don't care either way, [ ] -1, do not bring Drill into Incubator, because... This vote will be open for 72 hours and only votes from the Incubator PMC are binding. The start of the vote is just before 3AM UTC on 8 August so the closing time will be 3AM UTC on 11 August. Thank you for your consideration! Ted http://wiki.apache.org/incubator/DrillProposal = Drill = == Abstract == Drill is a distributed system for interactive analysis of large-scale datasets, inspired by [[http://research.google.com/pubs/pub36632.html|Google's Dremel]]. == Proposal == Drill is a distributed system for interactive analysis of large-scale datasets. Drill is similar to Google's Dremel, with the additional flexibility needed to support a broader range of query languages, data formats and data sources. It is designed to efficiently process nested data. It is a design goal to scale to 10,000 servers or more and to be able to process petabytes of data and trillions of records in seconds. == Background == Many organizations have the need to run data-intensive applications, including batch processing, stream processing and interactive analysis. In recent years open source systems have emerged to address the need for scalable batch processing (Apache Hadoop) and stream processing (Storm, Apache S4). In 2010 Google published a paper called Dremel: Interactive Analysis of Web-Scale Datasets, describing a scalable system used internally for interactive analysis of nested data. No open source project has successfully replicated the capabilities of Dremel. 
Re: [VOTE] Apache OpenMeetings Moodle Plugin 1.4 Incubating Release Candidate 1
Hi, On Mon, Aug 6, 2012 at 7:44 PM, seba.wag...@gmail.com seba.wag...@gmail.com wrote: I would like to start a vote about releasing Apache OpenMeetings Moodle Plugin 1.4 Incubating Release Candidate 1 +1 to release (-src.tar.gz MD5 e381dc019e70dde3117bc9021ee2c79e) On Mon, Aug 6, 2012 at 7:41 PM, seba.wag...@gmail.com seba.wag...@gmail.com wrote: However we still need 3 IPMCs to vote. Openmeetings mentors, where are you? BR, Jukka Zitting
Re: Preparing for August report
Hi, On Mon, Aug 6, 2012 at 12:14 PM, Jukka Zitting jukka.zitt...@gmail.com wrote: That leaves only the reviews to be done. Here's the latest TODO list: And an updated one: Benson Margulies - Syncope, Nuvem; Dave Fisher - DeltaSpike; Matt Franklin - Droids; Matt Hogstrom - SIS, Wookie; Mohammad Nour - Airavata; Ross Gardler - Wink. That's quite a few reports still to review and the report deadline is close. Please let me know if you're still on it (or update the wiki page directly), otherwise I'll take over tomorrow to review any remaining reports. BR, Jukka Zitting
Re: [VOTE] Accept Drill into the Apache Incubator
Hi, On Wed, Aug 8, 2012 at 4:41 AM, Ted Dunning ted.dunn...@gmail.com wrote: I would like to call a vote for accepting Drill for incubation in the Apache Incubator. The full proposal is available below. Discussion over the last few days has been quite positive. [x] +1, bring Drill into Incubator BR, Jukka Zitting
DeltaSpike Status - Re: Preparing for August report
Hi - AFAIK, documentation is not a graduation requirement. I was frustrated with the documentation because the lack of it makes the project hard to understand, but that is not a blocker. I think the project should start working on graduation soon. They have made a release. The community is active on the lists. Most of the status page items are checked off, with possibly only podlingnamesearch needed. Regards, Dave On Aug 8, 2012, at 3:36 PM, Jukka Zitting wrote: Hi, On Mon, Aug 6, 2012 at 12:14 PM, Jukka Zitting jukka.zitt...@gmail.com wrote: That leaves only the reviews to be done. Here's the latest TODO list: And an updated one: Benson Margulies - Syncope, Nuvem; Dave Fisher - DeltaSpike; Matt Franklin - Droids; Matt Hogstrom - SIS, Wookie; Mohammad Nour - Airavata; Ross Gardler - Wink. That's quite a few reports still to review and the report deadline is close. Please let me know if you're still on it (or update the wiki page directly), otherwise I'll take over tomorrow to review any remaining reports. BR, Jukka Zitting
RE: Preparing for August report
I reviewed Droids and thought their report adequately represented the project state. -----Original Message----- From: Jukka Zitting [jukka.zitt...@gmail.com] Sent: Wednesday, August 08, 2012 06:37 PM Eastern Standard Time To: general Subject: Re: Preparing for August report Hi, On Mon, Aug 6, 2012 at 12:14 PM, Jukka Zitting jukka.zitt...@gmail.com wrote: That leaves only the reviews to be done. Here's the latest TODO list: And an updated one: Benson Margulies - Syncope, Nuvem; Dave Fisher - DeltaSpike; Matt Franklin - Droids; Matt Hogstrom - SIS, Wookie; Mohammad Nour - Airavata; Ross Gardler - Wink. That's quite a few reports still to review and the report deadline is close. Please let me know if you're still on it (or update the wiki page directly), otherwise I'll take over tomorrow to review any remaining reports. BR, Jukka Zitting
Re: [PROPOSAL] Drill for the Apache Incubator
Oops, apologies - thanks for the reminder. I uploaded the slides as an attachment on the wiki page. Thanks, Tomer On Wed, Aug 8, 2012 at 9:14 PM, Jakob Homan jgho...@gmail.com wrote: So, no response to my request above about the design docs and not-TO-DOne MapR presentation? On Wed, Aug 8, 2012 at 3:25 PM, Chris Douglas cdoug...@apache.org wrote: +1 -C On Thu, Aug 2, 2012 at 3:12 PM, Ted Dunning ted.dunn...@gmail.com wrote: This is a duplicated attempt at sending this message, please ignore the previous message if it eventually arrives. There appears to be a hangup sending email from my apache email address via gmail. == Abstract == Drill is a distributed system for interactive analysis of large-scale datasets, inspired by Google’s Dremel ( http://research.google.com/pubs/pub36632.html). == Proposal == Drill is a distributed system for interactive analysis of large-scale datasets. Drill is similar to Google’s Dremel, with the additional flexibility needed to support a broader range of query languages, data formats and data sources. It is designed to efficiently process nested data. It is a design goal to scale to 10,000 servers or more and to be able to process petabytes of data and trillions of records in seconds. == Background == Many organizations have the need to run data-intensive applications, including batch processing, stream processing and interactive analysis. In recent years open source systems have emerged to address the need for scalable batch processing (Apache Hadoop) and stream processing (Storm, Apache S4). In 2010 Google published a paper called “Dremel: Interactive Analysis of Web-Scale Datasets,” describing a scalable system used internally for interactive analysis of nested data. No open source project has successfully replicated the capabilities of Dremel. == Rationale == There is a strong need in the market for low-latency interactive analysis of large-scale datasets, including nested data (eg, JSON, Avro, Protocol Buffers). 