Re: [VOTE] Accept Drill into the Apache Incubator

2012-08-08 Thread Alex Karasulu
+1 (binding)

On Wed, Aug 8, 2012 at 8:33 AM, Mattmann, Chris A (388J) 
chris.a.mattm...@jpl.nasa.gov wrote:

 +1 (binding). Good luck and sounds cool!

 Cheers,
 Chris

 On Aug 7, 2012, at 7:41 PM, Ted Dunning wrote:

  I would like to call a vote for accepting Drill for incubation in the
  Apache Incubator. The full proposal is available below.  Discussion
  over the last few days has been quite positive.
 
  Please cast your vote:
 
  [ ] +1, bring Drill into Incubator
  [ ] +0, I don't care either way,
  [ ] -1, do not bring Drill into Incubator, because...
 
  This vote will be open for 72 hours and only votes from the Incubator
  PMC are binding.  The start of the vote is just before 3AM UTC on 8
  August so the closing time will be 3AM UTC on 11 August.
 
  Thank you for your consideration!
 
  Ted
 
  http://wiki.apache.org/incubator/DrillProposal
 
  = Drill =
 
  == Abstract ==
  Drill is a distributed system for interactive analysis of large-scale
  datasets, inspired by
  [[http://research.google.com/pubs/pub36632.html|Google's Dremel]].
 
  == Proposal ==
  Drill is a distributed system for interactive analysis of large-scale
  datasets. Drill is similar to Google's Dremel, with the additional
  flexibility needed to support a broader range of query languages, data
  formats and data sources. It is designed to efficiently process nested
  data. It is a design goal to scale to 10,000 servers or more and to be
  able to process petabytes of data and trillions of records in seconds.
 
  == Background ==
  Many organizations have the need to run data-intensive applications,
  including batch processing, stream processing and interactive
  analysis. In recent years open source systems have emerged to address
  the need for scalable batch processing (Apache Hadoop) and stream
  processing (Storm, Apache S4). In 2010 Google published a paper called
  Dremel: Interactive Analysis of Web-Scale Datasets, describing a
  scalable system used internally for interactive analysis of nested
  data. No open source project has successfully replicated the
  capabilities of Dremel.
 
  == Rationale ==
  There is a strong need in the market for low-latency interactive
  analysis of large-scale datasets, including nested data (e.g., JSON,
  Avro, Protocol Buffers). This need was identified by Google and
  addressed internally with a system called Dremel.
 
  In recent years open source systems have emerged to address the need
  for scalable batch processing (Apache Hadoop) and stream processing
  (Storm, Apache S4). Apache Hadoop, originally inspired by Google's
  internal MapReduce system, is used by thousands of organizations
  processing large-scale datasets. Apache Hadoop is designed to achieve
  very high throughput, but is not designed to achieve the sub-second
  latency needed for interactive data analysis and exploration. Drill,
  inspired by Google's internal Dremel system, is intended to address
  this need.
 
  It is worth noting that, as explained by Google in the original paper,
  Dremel complements MapReduce-based computing. Dremel is not intended
  as a replacement for MapReduce and is often used in conjunction with
  it to analyze outputs of MapReduce pipelines or rapidly prototype
  larger computations. Indeed, Dremel and MapReduce are both used by
  thousands of Google employees.
 
  Like Dremel, Drill supports a nested data model with data encoded in a
  number of formats such as JSON, Avro or Protocol Buffers. In many
  organizations nested data is the standard, so supporting a nested data
  model eliminates the need to normalize the data. With that said, flat
  data formats, such as CSV files, are naturally supported as a special
  case of nested data.
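
To illustrate the point that flat data is a special case of nested data, here is a minimal Python sketch. It is not Drill code: the `leaf_paths` helper and the sample records are hypothetical, chosen only to show that one traversal can handle both a nested record and a CSV row.

```python
import csv
import io


def leaf_paths(record, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf of a nested record."""
    if isinstance(record, dict):
        for key, value in record.items():
            child = f"{prefix}.{key}" if prefix else key
            yield from leaf_paths(value, child)
    elif isinstance(record, list):
        # Repeated fields: every element shares the same path.
        for item in record:
            yield from leaf_paths(item, prefix)
    else:
        yield prefix, record


# A nested record, as it might arrive from JSON (hypothetical sample).
nested = {"name": "doc1", "links": {"forward": [20, 40], "backward": [10]}}

# A flat CSV row is just a record whose fields are all leaves.
flat = next(csv.DictReader(io.StringIO("name,size\ndoc2,7\n")))

print(list(leaf_paths(nested)))
print(list(leaf_paths(flat)))
```

The same function walks both inputs; the CSV row simply never triggers the nested or repeated branches.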
 
  The Drill architecture consists of four key components/layers:
  * Query languages: This layer is responsible for parsing the user's
  query and constructing an execution plan.  The initial goal is to
  support the SQL-like language used by Dremel and
  [[https://developers.google.com/bigquery/docs/query-reference|Google
  BigQuery]], which we call DrQL. However, Drill is designed to support
  other languages and programming models, such as the
  [[http://www.mongodb.org/display/DOCS/Mongo+Query+Language|Mongo Query
  Language]], [[http://www.cascading.org/|Cascading]] or
  [[https://github.com/tdunning/Plume|Plume]].
  * Low-latency distributed execution engine: This layer is responsible
  for executing the physical plan. It provides the scalability and fault
  tolerance needed to efficiently query petabytes of data on 10,000
  servers. Drill's execution engine is based on research in distributed
  execution engines (e.g., Dremel, Dryad, Hyracks, CIEL, Stratosphere) and
  columnar storage, and can be extended with additional operators and
  connectors.
  * Nested data formats: This layer is responsible for supporting
  various data formats. The initial goal is to support the column-based
  format used by Dremel. Drill is designed to support 

Re: [PROPOSAL] Drill for the Apache Incubator

2012-08-08 Thread Jakob Homan
On Mon, Aug 6, 2012 at 2:23 PM, Ted Dunning ted.dunn...@gmail.com wrote:
 No reason at all.

Sorry.  I may have been unclear.  I was requesting that the design
docs which are being referenced in the proposal:
The requirement and design documents are currently stored in MapR
Technologies' source code repository. They will be checked in as part
of the initial code dump.
be made available for review as part of the proposal, much as an
initial source code base would be.  There is also a reference to a
presentation to-be-made available:
High-level slides have been published by MapR: TODO

Can those be made public?

-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: [VOTE] Release Apache Bloodhound 0.1.0 (incubating)

2012-08-08 Thread ant elder
On Wed, Aug 8, 2012 at 1:13 AM, Greg Stein gst...@gmail.com wrote:
 On Tue, Aug 7, 2012 at 5:54 PM, ant elder ant.el...@gmail.com wrote:
 On Tue, Aug 7, 2012 at 9:51 PM, Greg Stein gst...@gmail.com wrote:
...
 You can look at the archives back in 2006 when it was incubating. In
 particular, there is one sent to private@incubator that I would refer
 you to:
   http://s.apache.org/c04  [only usable by ASF Members]


 Didn't that get subsequently revised by Cliff et al into Incubating
 projects must not distribute an official product release that includes
 works covered by an excluded license -
 http://www.apache.org/legal/3party.html#transition-incubator

 Dunno. That link is for a draft document, and has been replaced by a
 final/resolved form (see link at top of page).

 Regardless... Jukka posted recently, and I'd look to his note for
 current policy. I think his statement puts Incubator policy a little
 more relaxed than ASF, but likely not as relaxed as I would have
 posited (in regards to dependencies).


The good thing about release votes is that they can't be vetoed so
regardless of what policies may or may not be documented whether or
not a release vote passes is just down to getting enough people to
vote +1. Votes on general@ often stall and require a respin when
someone claims something is wrong which puts off others from voting.
Something as basic as a dependent license missing from the LICENSE
file would be one of those things that in the past would have always
demanded a respin, so the change, and it is a change, to allow wiggle
room is what i hope people will remember from this.

   ...ant




Re: [RESULT] [VOTE] S4 0.5.0 Release Candidate 1

2012-08-08 Thread Matthieu Morel

Hello,

On 8/7/12 4:30 PM, Richard Frovarp wrote:
 On 08/07/2012 06:36 AM, Matthieu Morel wrote:
 Hi,

 The vote for this S4 release passed with the following results at the
 vote deadline:

 +1: 7 (5 binding)
 -1: 0

 Details:

 +1 IPMC:
 acmurthy, phunt

 +1 PPMC
 kishoreg*, leoneu*, fpj

 +1 wider community
 Daniel Gomez, Karthik Kambatla


 Thanks to all the participants in the voting process!

 I'll now publish the artifacts, and after the sync delay, update the
 websites and send announcements.

 Matthieu


 Best I know, you need three IPMC votes for it to pass.

Thanks for outlining the missing IPMC vote Flavio, thanks for the 
clarification Richard, and sorry all for my misinterpretation and for 
the noise on this list.


It seems that typically mentors vote for releases, and that counts as 
IPMC votes, unfortunately we now only have 2 mentors for S4 (both +1'ed).


At this point, I believe that we should ask for another IPMC member to vote for 
the release, by sending a specific vote request, even though the vote 
expired, is this correct? Should we set a timeframe for the vote (I 
don't see that in previous similar requests)?


Thanks,


Matthieu




Re: Incubator release task force

2012-08-08 Thread Jukka Zitting
Hi,

For people interested in working on this, the ongoing Bloodhound
release vote has triggered some good discussion that would be great to
capture somehow.

BR,

Jukka Zitting




Re: [VOTE] Accept Drill into the Apache Incubator

2012-08-08 Thread Andrzej Bialecki

On 08/08/2012 04:41, Ted Dunning wrote:

I would like to call a vote for accepting Drill for incubation in the
Apache Incubator. The full proposal is available below.  Discussion
over the last few days has been quite positive.

Please cast your vote:

[ ] +1, bring Drill into Incubator
[ ] +0, I don't care either way,
[ ] -1, do not bring Drill into Incubator, because...

This vote will be open for 72 hours and only votes from the Incubator
PMC are binding.  The start of the vote is just before 3AM UTC on 8
August so the closing time will be 3AM UTC on 11 August.
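
The closing time quoted above is simply the start plus 72 hours. A minimal Python sketch of that arithmetic, assuming the start is rounded to exactly 03:00 UTC on 8 August 2012 (the message says "just before 3AM"); the names `VOTE_DURATION` and `vote_close` are illustrative only:

```python
from datetime import datetime, timedelta, timezone

# Voting window stated in the call for votes.
VOTE_DURATION = timedelta(hours=72)


def vote_close(start: datetime) -> datetime:
    """Return the earliest time at which the vote may be closed."""
    return start + VOTE_DURATION


# Assumption: start rounded to 03:00 UTC on 8 August 2012.
start = datetime(2012, 8, 8, 3, 0, tzinfo=timezone.utc)
close = vote_close(start)
print(close.isoformat())  # 2012-08-11T03:00:00+00:00
```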


+1 (binding) - this is an exciting proposal!

--
Best regards,
Andrzej Bialecki
http://www.sigram.com, blog http://www.sigram.com/blog
 ___.,___,___,___,_._. __
[___||.__|__/|__||\/|: Information Retrieval, System Integration
___|||__||..\|..||..|: Contact: info at sigram dot com





Re: Incubator release task force

2012-08-08 Thread Bertrand Delacretaz
On Thu, Jul 26, 2012 at 4:10 PM, Jukka Zitting jukka.zitt...@gmail.com wrote:

 ...I'd like to start fixing this by forming a release task force of a
 handful of volunteers who are ready to invest an hour or two per week
 to work on ... b) migrating
 /dist/incubator to svnpubsub by the end of this year...

I'm interested in helping with that but I'd suggest starting from
scratch on new docs in svnpubsub, in order to create a minimal set of
docs that's understandable and maintainable. We'd keep the current
docs around as the old docs and refer to them less and less as the
new, smaller ones take shape. I'll discuss that in a separate thread.

-Bertrand




[RT] a minimal set of docs for incubator.apache.org

2012-08-08 Thread Bertrand Delacretaz
Hi,

Like others, I'm not too happy with the current
http://incubator.apache.org/ content.

How about starting a new, minimal set of docs that are more
maintainable and understandable?

IMO, the following would be sufficient, with one page per topic:

1. What's the Apache Incubator? (homepage)
2. Lifecycle of a podling, from proposal to graduation, with many
links to existing examples (proposals, committer votes, graduation
threads, etc.)
3. Release checklist: criteria for approving a release
4. Previously asked questions (a la
http://www.apache.org/legal/resolved.html, includes IP clearance info)
6. Glossary of terms (though that might belong to the top-level
apache.org site instead)

I'm just considering the narrative info, not the podling status pages
or clutch stuff in this refactoring. That status info might move to
podlings.incubator.apache.org to better separate it and keep the main
site minimal.

I've got some draft content for 2. and 3., that I've been collecting
in my mentoring activities.

WDYT?

-Bertrand




Re: [VOTE] Accept Drill into the Apache Incubator

2012-08-08 Thread Bertrand Delacretaz
On Wed, Aug 8, 2012 at 4:41 AM, Ted Dunning ted.dunn...@gmail.com wrote:
 I would like to call a vote for accepting Drill for incubation in the
 Apache Incubator...

+1

-Bertrand




Re: [RT] a minimal set of docs for incubator.apache.org

2012-08-08 Thread Greg Stein
Commit the content. Otherwise, we're just hand-waving.
On Aug 8, 2012 5:29 AM, Bertrand Delacretaz bdelacre...@apache.org
wrote:

 Hi,

 Like others, I'm not too happy with the current
 http://incubator.apache.org/ content.

 How about starting a new, minimal set of docs that are more
 maintainable and understandable?

 IMO, the following would be sufficient, with one page per topic:

 1. What's the Apache Incubator? (homepage)
 2. Lifecycle of a podling, from proposal to graduation, with many
 links to existing examples (proposals, committer votes, graduation
 threads, etc.)
 3. Release checklist: criteria for approving a release
 4. Previously asked questions (a la
 http://www.apache.org/legal/resolved.html, includes IP clearance info)
 6. Glossary of terms (though that might belong to the top-level
 apache.org site instead)

 I'm just considering the narrative info, not the podling status pages
 or clutch stuff in this refactoring. That status info might move to
 podlings.incubator.apache.org to better separate it and keep the main
 site minimal.

 I've got some draft content for 2. and 3., that I've been collecting
 in my mentoring activities.

 WDYT?

 -Bertrand





Re: [RT] a minimal set of docs for incubator.apache.org

2012-08-08 Thread Gary Martin
Committing somewhere would be good as otherwise I don't know whether I 
need to suggest the following (I wasn't sure of the best thread for this 
to go in anyway):


   May I suggest that, where appropriate, the documentation is backed
   up with pointers to examples of existing projects that are
   considered to represent the current best practice on various
   aspects. Whilst clear documentation is fantastic, there is nothing
   like good examples for building confidence that one is doing things
   in the right way.

Cheers,
Gary


On 08/08/2012 10:42 AM, Greg Stein wrote:

Commit the content. Otherwise, we're just hand-waving.
On Aug 8, 2012 5:29 AM, Bertrand Delacretaz bdelacre...@apache.org
wrote:


Hi,

Like others, I'm not too happy with the current
http://incubator.apache.org/ content.

How about starting a new, minimal set of docs that are more
maintainable and understandable?

IMO, the following would be sufficient, with one page per topic:

1. What's the Apache Incubator? (homepage)
2. Lifecycle of a podling, from proposal to graduation, with many
links to existing examples (proposals, committer votes, graduation
threads, etc.)
3. Release checklist: criteria for approving a release
4. Previously asked questions (a la
http://www.apache.org/legal/resolved.html, includes IP clearance info)
6. Glossary of terms (though that might belong to the top-level
apache.org site instead)

I'm just considering the narrative info, not the podling status pages
or clutch stuff in this refactoring. That status info might move to
podlings.incubator.apache.org to better separate it and keep the main
site minimal.

I've got some draft content for 2. and 3., that I've been collecting
in my mentoring activities.

WDYT?

-Bertrand







Re: [VOTE] Accept Drill into the Apache Incubator

2012-08-08 Thread Torsten Curdt
On Wed, Aug 8, 2012 at 11:39 AM, Bertrand Delacretaz
bdelacre...@apache.org wrote:
 On Wed, Aug 8, 2012 at 4:41 AM, Ted Dunning ted.dunn...@gmail.com wrote:
 I would like to call a vote for accepting Drill for incubation in the
 Apache Incubator...

 +1

+1

cheers,
Torsten




Re: [VOTE] Apache Syncope 1.0.0-incubating

2012-08-08 Thread Alexei Fedotov
+1 (non-binding)
--
With best regards / с наилучшими пожеланиями,
Alexei Fedotov / Алексей Федотов,
http://dataved.ru/
+7 916 562 8095


On Mon, Aug 6, 2012 at 7:50 PM, Francesco Chicchiriccò
ilgro...@apache.org wrote:
 On 06/08/2012 16:36, Alexei Fedotov wrote:
 Hello Francesco,

 Here are a few things I have found via manual inspection:

 1. Jquery bundle contains several following strings: Dual licensed
 under the MIT or GPL Version 2 licenses.
 *) source release LICENSE file does not contain MIT license;
 *) and the file itself does not look like APL licensed;
 *) and it is a part of the source release.

 Something should be fixed here, i.e. the files replaced with wget in
 the build script.

 2. ./legal_ext/LICENSE does not have a license for jquery. Does war
 contain jquery?

 Hi Alexei,
 I've taken a look at other ASF projects including JQuery (or similar
 dual-licensed JS frameworks) and I've opened
 https://issues.apache.org/jira/browse/SYNCOPE-181
 We'll fix this ASAP.

 Don't think these issues are stoppers.

 Cool :-)
 What's your vote on the release, then?

 Thanks for your review.
 Regards.

 On Mon, Aug 6, 2012 at 6:07 PM, Mark Struberg strub...@yahoo.de wrote:
 Hi Francesco, I can check in the evening.

 LieGrue,
 strub



 - Original Message -
 From: Francesco Chicchiriccò ilgro...@apache.org
 To: general@incubator.apache.org
 Cc:
 Sent: Monday, August 6, 2012 2:49 PM
 Subject: Re: [VOTE] Apache Syncope 1.0.0-incubating

 Hi IPMC members,
 we are missing a single vote on this release: anyone interested to check?

 TIA.
 Regards.

 On 03/08/2012 09:58, Francesco Chicchiriccò wrote:
  I've created a 1.0.0-incubating release, with the following artifacts
 up
  for a vote:

  SVN source tag (r1367421):

 https://svn.apache.org/repos/asf/incubator/syncope/tags/syncope-1.0.0-incubating/
  List of changes:

 https://svn.apache.org/repos/asf/incubator/syncope/tags/syncope-1.0.0-incubating/CHANGES
  Maven staging repo:
  https://repository.apache.org/content/repositories/orgapachesyncope-100/

  Source release (checksums and signatures are available at the same
  location):

 https://repository.apache.org/content/repositories/orgapachesyncope-100/org/apache/syncope/syncope-root/1.0.0-incubating/syncope-root-1.0.0-incubating-source-release.zip
  Staging site:
  http://incubator.apache.org/syncope/1.0.0-incubating/

  PGP release keys (signed using 273DF287):
  http://www.apache.org/dist/incubator/syncope/KEYS


  This has been voted through on the syncope-...@incubator.apache.org
  mailing list [1],
  and now requires a vote on general@incubator.apache.org

  Votes already cast (on syncope-dev):

  +1 (binding)
  * Francesco Chicchiriccò
  * Massimiliano Perrone
  * Marco Di Sabatino Di Diodoro
  * Emmanuel Lécharny (IPMC member)
  * Simone Tripodi
  * Colm O hEigeartaigh (IPMC member)

  +1 (non binding)
   * Denis Signoretto


  Vote will be open for 72 hours.

  [ ] +1  approve
  [ ] +0  no opinion
  [ ] -1  disapprove (and reason why)

  Best regards.

  [1]

 http://syncope-dev.1063484.n5.nabble.com/VOTE-Apache-Syncope-1-0-0-incubating-tp5710173p5710292.html

 --
 Francesco Chicchiriccò

 ASF Member, Apache Cocoon PMC and Apache Syncope PPMC Member
 http://people.apache.org/~ilgrosso/







Re: Incubator release task force

2012-08-08 Thread Benson Margulies
On Wed, Aug 8, 2012 at 5:18 AM, Bertrand Delacretaz
bdelacre...@apache.org wrote:
 On Thu, Jul 26, 2012 at 4:10 PM, Jukka Zitting jukka.zitt...@gmail.com 
 wrote:

 ...I'd like to start fixing this by forming a release task force of a
 handful of volunteers who are ready to invest an hour or two per week
 to work on ... b) migrating
 /dist/incubator to svnpubsub by the end of this year...

 I'm interested in helping with that but I'd suggest starting from
 scratch on new docs in svnpubsub, in order to create a minimal set of
 docs that's understandable and maintainable. We'd keep the current
 docs around as the old docs and refer to them less and less as the
 new, smaller ones take shape. I'll discuss that in a separate thread.

I'm in on this.


 -Bertrand






Re: [VOTE] Accept Drill into the Apache Incubator

2012-08-08 Thread Grant Ingersoll

On Aug 7, 2012, at 10:41 PM, Ted Dunning wrote:

 I would like to call a vote for accepting Drill for incubation in the
 Apache Incubator. The full proposal is available below.  Discussion
 over the last few days has been quite positive.
 
 Please cast your vote:
 
 [ ] +1, bring Drill into Incubator

+1 (binding)




Re: [VOTE] Accept Drill into the Apache Incubator

2012-08-08 Thread Mohammad Nour El-Din
+1 (binding)

On Wed, Aug 8, 2012 at 3:55 PM, Grant Ingersoll gsing...@apache.org wrote:


 On Aug 7, 2012, at 10:41 PM, Ted Dunning wrote:

  I would like to call a vote for accepting Drill for incubation in the
  Apache Incubator. The full proposal is available below.  Discussion
  over the last few days has been quite positive.
 
  Please cast your vote:
 
  [ ] +1, bring Drill into Incubator

 +1 (binding)





-- 
Thanks
- Mohammad Nour

Life is like riding a bicycle. To keep your balance you must keep moving
- Albert Einstein


Re: [VOTE] Accept Drill into the Apache Incubator

2012-08-08 Thread Phillip Rhodes
On Tue, Aug 7, 2012 at 9:41 PM, Ted Dunning ted.dunn...@gmail.com wrote:
 I would like to call a vote for accepting Drill for incubation in the
 Apache Incubator. The full proposal is available below.  Discussion
 over the last few days has been quite positive.

 Please cast your vote:

 [ ] +1, bring Drill into Incubator
 [ ] +0, I don't care either way,
 [ ] -1, do not bring Drill into Incubator, because...

+1


Phil




Re: [VOTE] Release Apache Bloodhound 0.1.0 (incubating)

2012-08-08 Thread Alexei Fedotov
Hello,

Let me add one more point on adding dependencies to source releases.
In addition to a license, the dependencies contain copyright statements,
e.g. # Copyright (C)  2005 Christopher Lenz cml...@gmx.de.

As mentioned here http://www.apache.org/legal/src-headers.html
 If the source file is submitted with a copyright notice included in it, the 
 copyright owner (or owner's agent) must either:
 remove such notices, or
 move them to the NOTICE file associated with each applicable project release, 
 or
 provide written permission for the ASF to make such removal or relocation of 
 the notices.

This issue cannot be fixed by merging licenses into LICENSE file.

--
With best regards / с наилучшими пожеланиями,
Alexei Fedotov / Алексей Федотов,
http://dataved.ru/
+7 916 562 8095


On Wed, Aug 8, 2012 at 11:26 AM, ant elder ant.el...@gmail.com wrote:
 On Wed, Aug 8, 2012 at 1:13 AM, Greg Stein gst...@gmail.com wrote:
 On Tue, Aug 7, 2012 at 5:54 PM, ant elder ant.el...@gmail.com wrote:
 On Tue, Aug 7, 2012 at 9:51 PM, Greg Stein gst...@gmail.com wrote:
...
 You can look at the archives back in 2006 when it was incubating. In
 particular, there is one sent to private@incubator that I would refer
 you to:
   http://s.apache.org/c04  [only usable by ASF Members]


 Didn't that get subsequently revised by Cliff et al into Incubating
 projects must not distribute an official product release that includes
 works covered by an excluded license -
 http://www.apache.org/legal/3party.html#transition-incubator

 Dunno. That link is for a draft document, and has been replaced by a
 final/resolved form (see link at top of page).

 Regardless... Jukka posted recently, and I'd look to his note for
 current policy. I think his statement puts Incubator policy a little
 more relaxed than ASF, but likely not as relaxed as I would have
 posited (in regards to dependencies).


 The good thing about release votes is that they can't be vetoed so
 regardless of what policies may or may not be documented whether or
 not a release vote passes is just down to getting enough people to
 vote +1. Votes on general@ often stall and require a respin when
 someone claims something is wrong which puts off others from voting.
 Something as basic as a dependent license missing from the LICENSE
 file would be one of those things that in the past would have always
 demanded a respin, so the change, and it is a change, to allow wiggle
 room is what i hope people will remember from this.

...ant






Re: [VOTE] Release Apache Bloodhound 0.1.0 (incubating)

2012-08-08 Thread ant elder
On Wed, Aug 8, 2012 at 3:19 PM, Alexei Fedotov alexei.fedo...@gmail.com wrote:
 Hello,

 Let me add one more point on adding dependencies to source releases.
 In addition to a license, the dependencies contain copyright statements,
 e.g. # Copyright (C)  2005 Christopher Lenz cml...@gmx.de.

 As mentioned here http://www.apache.org/legal/src-headers.html
 If the source file is submitted with a copyright notice included in it, the 
 copyright owner (or owner's agent) must either:
 remove such notices, or
 move them to the NOTICE file associated with each applicable project 
 release, or
 provide written permission for the ASF to make such removal or relocation of 
 the notices.

 This issue cannot be fixed by merging licenses into LICENSE file.


No, this is not what that source headers page is talking about. That
page is talking about any copyright statements that may have been in
source files when contributed to the ASF, here we are talking about
the licenses of any external dependencies that are included in a
release, and those licenses should be added to the LICENSE file, as
described at: 
http://www.apache.org/dev/release.html#distributing-code-under-several-licenses

   ...ant




[VOTE] (missing 1 IPMC +1) S4 0.5.0 Release Candidate 1

2012-08-08 Thread Matthieu Morel

Hi,

I had misinterpreted the vote results and prematurely declared the vote 
as passed, sorry about that...


In reality, we still need 1 IPMC +1 vote for the S4 0.5.0 Release 
Candidate 1.



Current status after last week's vote is:

+1: 7 (5 binding)
-1: 0

Details:

+1 IPMC:
acmurthy, phunt

+1 PPMC
kishoreg*, leoneu*, fpj

+1 wider community
Daniel Gomez, Karthik Kambatla


(* voted on the s4-dev list only)
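
As a sanity check on the tally above, here is a minimal Python sketch of the counting rule discussed in this thread. It assumes that only IPMC +1 votes are binding on general@ and that at least three are required, with more binding +1 than -1 votes; the `Vote` type and the function names are hypothetical and not part of any ASF tooling.

```python
from dataclasses import dataclass

# Assumption: threshold as stated earlier in this thread.
REQUIRED_BINDING = 3


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int   # +1, 0, or -1
    group: str   # "IPMC", "PPMC", or "community"


def binding_plus_ones(votes):
    """Count +1 votes cast by IPMC members (the only binding ones)."""
    return sum(1 for v in votes if v.group == "IPMC" and v.value == 1)


def release_vote_passes(votes):
    """At least three binding +1 votes, and more binding +1 than -1."""
    minus = sum(1 for v in votes if v.group == "IPMC" and v.value == -1)
    plus = binding_plus_ones(votes)
    return plus >= REQUIRED_BINDING and plus > minus


def missing_binding_votes(votes):
    """How many more binding +1 votes are needed to reach the threshold."""
    return max(0, REQUIRED_BINDING - binding_plus_ones(votes))


# The S4 0.5.0 RC1 tally as listed above.
S4_VOTES = [
    Vote("acmurthy", 1, "IPMC"),
    Vote("phunt", 1, "IPMC"),
    Vote("kishoreg", 1, "PPMC"),
    Vote("leoneu", 1, "PPMC"),
    Vote("fpj", 1, "PPMC"),
    Vote("Daniel Gomez", 1, "community"),
    Vote("Karthik Kambatla", 1, "community"),
]

print(release_vote_passes(S4_VOTES))    # False
print(missing_binding_votes(S4_VOTES))  # 1
```

Counting PPMC votes as binding gives the "5 binding" figure; restricting to IPMC members gives two, which is why one more IPMC +1 is needed.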

---

This is the first release candidate for Apache S4, version 0.5.0

It fixes the following issues:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12312322&version=12318653

Note that we are voting upon the source (tag), binaries are provided for
convenience.

** The vote is open for at least 72 hours with no specific close time.

Source and binary packages in zip format:
http://people.apache.org/~mmorel/s4-0.5.0-incubating-release-candidate-1/

The (git) tag to be voted upon: 0.5.0:
https://git-wip-us.apache.org/repos/asf?p=incubator-s4.git;a=tag;h=70806aa1ee0b9154d36fd834dc4907cd8d3eb791

S4 KEYS file containing PGP keys we use to sign the release:
http://svn.apache.org/repos/asf/incubator/s4/dist/KEYS

Please cast your vote.

[ ] +1  approve
[ ] +0  no opinion
[ ] -1  disapprove (and reason why)


Thanks!

Matthieu






Re: [RT] a minimal set of docs for incubator.apache.org

2012-08-08 Thread Marvin Humphrey
On Wed, Aug 8, 2012 at 2:29 AM, Bertrand Delacretaz
bdelacre...@apache.org wrote:

 How about starting a new, minimal set of docs that are more
 maintainable and understandable?

+1 to accepting that the end result of the documentation overhaul may be quite
different from what exists now.

-1 to starting from scratch rather than continuing the ongoing evolutionary
effort via progressive edits to the existing documents.

 IMO, the following would be sufficient, with one page per topic:

 1. What's the Apache Incubator? (homepage)
 2. Lifecycle of a podling, from proposal to graduation, with many
 links to existing examples (proposals, committer votes, graduation
 threads, etc.)
 3. Release checklist: criteria for approving a release
 4. Previously asked questions (a la
 http://www.apache.org/legal/resolved.html, includes IP clearance info)
 6. Glossary of terms (though that might belong to the top-level
 apache.org site instead)

I don't believe that this proposed outline will meet your goals for
maintainability, because it is not structured to take into account how the
Incubator docs evolve.  If we adopt this framework unmodified, I predict that
over time our docs will gradually decompose and revert to the current state of
incoherency.

The proposed Previously asked questions page, in particular, is doomed to
death-by-bloat.

The Incubator's documentation gets continuously updated by people who are
well-meaning but have a limited perspective.  If we don't provide outlets for
individuals to contribute what they are absolutely convinced is essential
material but is likely just their own pet best-practices tip, minimal docs
won't stay minimal for long.

In my opinion, we will achieve better results if we adopt a hierarchical
model: augment a minimal core with topical satellite pages (which lots of
people write to but fewer people read).  This paradigm is superior to
minimalism for two reasons:

First, the hierarchical model is sustainable while a purely minimalist
approach is toxic to community and incompatible with the Apache Way.
Rejecting contributions which do not fit within the tight scope of a
minimalist vision is costly -- it is dispiriting for the contributor and
exhausts the curator.  In contrast, when a curator merely *moves* a
contribution to a satellite page, less diplomatic effort is required and all
parties are more likely to be more-or-less satisfied with the end result.

Second, under a hierarchical model we are better able to make use of topical
contributions because they will be accessible by subject rather than thrown
into a catch-all like an FAQ page.  While the Java and Maven stuff was buried
in the giant pile of releasemanagement.html, no one had ownership of it.  Now
that release-java.html has been broken out, it has a decent shot at evolving
into something coherent and succinct that will serve Java podlings well.

 I've got some draft content for 2. and 3., that I've been collecting
 in my mentoring activities.

From past experience, I know that the quality of your writing is high...  we
are not exactly lacking draft content, though, you know? :\  It would be great
to add your material to the collection of raw material that exists now, but I
don't see that it should displace everybody else's hard work.

Can you instead be persuaded to work with us on rewriting and editing down the
existing docs?  A lot of your draft material is likely to find its way into
the final product that way. :)

Marvin Humphrey

-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



RE: [VOTE] Accept Drill into the Apache Incubator

2012-08-08 Thread Franklin, Matthew B.
+1 (binding)

-----Original Message-----
From: Ted Dunning [mailto:ted.dunn...@gmail.com]
Sent: Tuesday, August 07, 2012 10:41 PM
To: general@incubator.apache.org
Subject: [VOTE] Accept Drill into the Apache Incubator

I would like to call a vote for accepting Drill for incubation in the
Apache Incubator. The full proposal is available below.  Discussion
over the last few days has been quite positive.

Please cast your vote:

[ ] +1, bring Drill into Incubator
[ ] +0, I don't care either way,
[ ] -1, do not bring Drill into Incubator, because...

This vote will be open for 72 hours and only votes from the Incubator
PMC are binding.  The start of the vote is just before 3AM UTC on 8
August so the closing time will be 3AM UTC on 11 August.

Thank you for your consideration!

Ted

http://wiki.apache.org/incubator/DrillProposal


Re: [PROPOSAL] Drill for the Apache Incubator

2012-08-08 Thread Ted Dunning
The consensus in the group of committers listed in the proposal is
that we would like to discourage piling on of pre-formation committers
and encourage adding committers after formation based on
contributions.  It is clear that there are gobs of people with the
credentials and track record to be potential contributors, but it is
also clear that many of these people have huge demands on their time.
That leaves doubt about how much contribution they can or should be
making to a new project.

It is also clear that there are gobs of people that are not already
part of Apache who may have time and expertise to contribute.

In any case, the vote has already started and will be done before long.
Let's go with what we are already voting on without changing it in
mid-stream and then adjust later.  Progress, not perfection, as they
say.

On Wed, Aug 8, 2012 at 3:31 AM, Bertrand Delacretaz
bdelacre...@apache.org wrote:
 On Wed, Aug 8, 2012 at 7:20 AM, Marvin Humphrey mar...@rectangular.com 
 wrote:
 On Tue, Aug 7, 2012 at 10:09 PM, Arun C Murthy a...@hortonworks.com wrote:
 Wasn't clear, can I add myself now?

 Didn't the Incubator go back to discouraging open enrollment?...

 AFAIK, no. What was discussed is that incoming podlings should clearly
 state their requirements for people that want to be added as initial
 committers, to keep it fair.

 -Bertrand






Re: [VOTE] (missing 1 IPMC +1) S4 0.5.0 Release Candidate 1

2012-08-08 Thread Richard Frovarp

On 08/08/2012 10:51 AM, Matthieu Morel wrote:

Hi,

I had misinterpreted the vote results and prematurely declared the vote
as passed, sorry about that...

In reality, we still need 1 IPMC +1 vote for the S4 0.5.0 Release
Candidate 1.


Current status after last week's vote is:

+1: 7 (5 binding)
-1: 0

Details:

+1 IPMC:
acmurthy, phunt

+1 PPMC
kishoreg*, leoneu*, fpj

+1 wider community
Daniel Gomez, Karthik Kambatla


(* voted on the s4-dev list only)

---

This is the first release candidate for Apache S4, version 0.5.0

It fixes the following issues:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12312322&version=12318653


Note that we are voting upon the source (tag), binaries are provided for
convenience.

** The vote is open for at least 72 hours with no specific close time.

Source and binary packages in zip format:
http://people.apache.org/~mmorel/s4-0.5.0-incubating-release-candidate-1/

The (git) tag to be voted upon: 0.5.0:
https://git-wip-us.apache.org/repos/asf?p=incubator-s4.git;a=tag;h=70806aa1ee0b9154d36fd834dc4907cd8d3eb791


S4 KEYS file containing PGP keys we use to sign the release:
http://svn.apache.org/repos/asf/incubator/s4/dist/KEYS

Please cast your vote.

[ ] +1  approve
[ ] +0  no opinion
[ ] -1  disapprove (and reason why)


Thanks!

Matthieu




+1 Binding

Sigs and hashes are good.
src file matches what is in the tag (minus the javadoc generation which 
is fine).


All Java files have headers.

Disclaimer, Notice, and License all look right to me.

A few of the properties files are missing headers. That should probably 
be fixed in the future. It would be nice to be able to run Apache 
Creadur in the project to verify licenses in the future.
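
As an aside for readers new to release reviews: the hash half of the
"sigs and hashes" check above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not the procedure
actually used for this review. The helper names file_digest and
matches_published are hypothetical, and the sketch does not cover the PGP
signature check against the KEYS file, which needs a tool such as GnuPG.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read fixed-size chunks so large release archives need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path, published_hex, algorithm="md5"):
    """Compare the computed digest with a published one, ignoring case and whitespace."""
    return file_digest(path, algorithm) == published_hex.strip().lower()
```

A reviewer would call matches_published() with the downloaded archive and
the digest string published next to it, and treat a False result as a
reason to withhold a +1 until the mismatch is explained.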






Re: [VOTE] Accept Drill into the Apache Incubator

2012-08-08 Thread Otis Gospodnetic
+1 (blinding)

Otis

Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm 




 From: Ted Dunning ted.dunn...@gmail.com
To: general@incubator.apache.org 
Sent: Tuesday, August 7, 2012 10:41 PM
Subject: [VOTE] Accept Drill into the Apache Incubator
 
I would like to call a vote for accepting Drill for incubation in the
Apache Incubator. The full proposal is available below.  Discussion
over the last few days has been quite positive.

Please cast your vote:

[ ] +1, bring Drill into Incubator
[ ] +0, I don't care either way,
[ ] -1, do not bring Drill into Incubator, because...

This vote will be open for 72 hours and only votes from the Incubator
PMC are binding.  The start of the vote is just before 3AM UTC on 8
August so the closing time will be 3AM UTC on 11 August.

Thank you for your consideration!

Ted

http://wiki.apache.org/incubator/DrillProposal


Clerezza status (Was: [Incubator Wiki] Update of August2012 by BertrandDelacretaz)

2012-08-08 Thread Jukka Zitting
Hi,

On Mon, Aug 6, 2012 at 11:26 AM, Apache Wiki wikidi...@apache.org wrote:
 +   As in our last report in May, we believe Clerezza should graduate soon, but
 +   unfortunately that hasn't happened yet. Activity is currently fairly low,
 +   and it looks like Clerezza might remain a small/low activity project, but
 +   the PPMC is functional, has done releases and invited additional committers
 +   so there's no need to stay in the Incubator any longer once a plan to attempt
 +   to grow the community is in place.

Do you have an idea what happened around a year ago when dev@ activity
dropped from the hundreds it was for a long time to the dozens where
it's mostly stayed since then? Alarmingly the low mark seems to have
been last month when only a single non-automated post was sent to
dev@.

I recall Clerezza having release trouble due to complex/unreleased
dependencies for a long time. Could that have contributed to the loss
of momentum? I think it would be useful to somehow capture experience
like this, perhaps ultimately for use by ComDev in something like a
"How to maintain community momentum?" guide.

Anyway, it sounds like the community has a reasonably good idea on how
to proceed, so I'm not too worried yet even though Clerezza is already
getting pretty close to its three-year mark at the Incubator. Though
I'd really love to see Clerezza showing notable improvement or even
graduating before that milestone is reached.

If the efforts to grow or reactivate the community fail, would it be a
good idea to seek to join forces with some related projects like
Stanbol, Any23 or UIMA? Or do you feel that there are still enough
active people to allow the project to function as a standalone TLP
(able to reach 3 PMC votes for releases, etc.)?

BR,

Jukka Zitting




Re: [PROPOSAL] Drill for the Apache Incubator

2012-08-08 Thread Chris Douglas
+1 -C

On Thu, Aug 2, 2012 at 3:12 PM, Ted Dunning ted.dunn...@gmail.com wrote:
 This is a duplicated attempt at sending this message, please ignore the
 previous message if it eventually arrives.  There appears to be a hangup
 sending email from my apache email address via gmail.

 Abstract
 ========
 Drill is a distributed system for interactive analysis of large-scale
 datasets, inspired by Google’s Dremel (
 http://research.google.com/pubs/pub36632.html).

 Proposal
 ========
 Drill is a distributed system for interactive analysis of large-scale
 datasets. Drill is similar to Google’s Dremel, with the additional
 flexibility needed to support a broader range of query languages, data
 formats and data sources. It is designed to efficiently process nested
 data. It is a design goal to scale to 10,000 servers or more and to be able
 to process petabytes of data and trillions of records in seconds.

 Background
 ==========
 Many organizations have the need to run data-intensive applications,
 including batch processing, stream processing and interactive analysis. In
 recent years open source systems have emerged to address the need for
 scalable batch processing (Apache Hadoop) and stream processing (Storm,
 Apache S4). In 2010 Google published a paper called “Dremel: Interactive
 Analysis of Web-Scale Datasets,” describing a scalable system used
 internally for interactive analysis of nested data. No open source project
 has successfully replicated the capabilities of Dremel.

 Rationale
 =========
 There is a strong need in the market for low-latency interactive analysis
 of large-scale datasets, including nested data (eg, JSON, Avro, Protocol
 Buffers). This need was identified by Google and addressed internally with
 a system called Dremel.

 In recent years open source systems have emerged to address the need for
 scalable batch processing (Apache Hadoop) and stream processing (Storm,
 Apache S4). Apache Hadoop, originally inspired by Google’s internal
 MapReduce system, is used by thousands of organizations processing
 large-scale datasets. Apache Hadoop is designed to achieve very high
 throughput, but is not designed to achieve the sub-second latency needed
 for interactive data analysis and exploration. Drill, inspired by Google’s
 internal Dremel system, is intended to address this need.

 It is worth noting that, as explained by Google in the original paper,
 Dremel complements MapReduce-based computing. Dremel is not intended as a
 replacement for MapReduce and is often used in conjunction with it to
 analyze outputs of MapReduce pipelines or rapidly prototype larger
 computations. Indeed, Dremel and MapReduce are both used by thousands of
 Google employees.

 Like Dremel, Drill supports a nested data model with data encoded in a
 number of formats such as JSON, Avro or Protocol Buffers. In many
 organizations nested data is the standard, so supporting a nested data
 model eliminates the need to normalize the data. With that said, flat data
 formats, such as CSV files, are naturally supported as a special case of
 nested data.

 The Drill architecture consists of four key components/layers:
 * Query languages: This layer is responsible for parsing the user’s query
 and constructing an execution plan.  The initial goal is to support the
 SQL-like language used by Dremel and Google BigQuery (
 https://developers.google.com/bigquery/docs/query-reference), which we call
 DrQL. However, Drill is designed to support other languages and programming
 models, such as the Mongo Query Language (
 http://www.mongodb.org/display/DOCS/Mongo+Query+Language), Cascading (
 http://www.cascading.org/) or Plume (https://github.com/tdunning/Plume).
 * Low-latency distributed execution engine: This layer is responsible for
 executing the physical plan. It provides the scalability and fault
 tolerance needed to efficiently query petabytes of data on 10,000 servers.
 Drill’s execution engine is based on research in distributed execution
 engines (eg, Dremel, Dryad, Hyracks, CIEL, Stratosphere) and columnar
 storage, and can be extended with additional operators and connectors.
 * Nested data formats: This layer is responsible for supporting various
 data formats. The initial goal is to support the column-based format used
 by Dremel. Drill is designed to support schema-based formats such as
 Protocol Buffers/Dremel, Avro/AVRO-806/Trevni and CSV, and schema-less
 formats such as JSON, BSON or YAML. In addition, it is designed to support
 column-based formats such as Dremel, AVRO-806/Trevni and RCFile, and
 row-based formats such as Protocol Buffers, Avro, JSON, BSON and CSV. A
 particular distinction with Drill is that the execution engine is flexible
 enough to support column-based processing as well as row-based processing.
 This is important because column-based processing can be much more
 efficient when the data is stored in a column-based format, but many large
 data assets are stored in a row-based format that 

Re: [VOTE] Accept Drill into the Apache Incubator

2012-08-08 Thread Chris Douglas
+1 -C

(sorry, wrong thread)

On Tue, Aug 7, 2012 at 7:41 PM, Ted Dunning ted.dunn...@gmail.com wrote:
 I would like to call a vote for accepting Drill for incubation in the
 Apache Incubator. The full proposal is available below.  Discussion
 over the last few days has been quite positive.

 Please cast your vote:

 [ ] +1, bring Drill into Incubator
 [ ] +0, I don't care either way,
 [ ] -1, do not bring Drill into Incubator, because...

 This vote will be open for 72 hours and only votes from the Incubator
 PMC are binding.  The start of the vote is just before 3AM UTC on 8
 August so the closing time will be 3AM UTC on 11 August.

 Thank you for your consideration!

 Ted

 http://wiki.apache.org/incubator/DrillProposal


Re: [VOTE] Apache OpenMeetings Moodle Plugin 1.4 Incubating Release Candidate 1

2012-08-08 Thread Jukka Zitting
Hi,

On Mon, Aug 6, 2012 at 7:44 PM, seba.wag...@gmail.com
seba.wag...@gmail.com wrote:
 I would like to start a vote about releasing Apache OpenMeetings Moodle
 Plugin 1.4 Incubating Release Candidate 1

+1 to release (-src.tar.gz MD5 e381dc019e70dde3117bc9021ee2c79e)

On Mon, Aug 6, 2012 at 7:41 PM, seba.wag...@gmail.com
seba.wag...@gmail.com wrote:
 However we still need 3 IPMCs to vote.

OpenMeetings mentors, where are you?

BR,

Jukka Zitting




Re: Preparing for August report

2012-08-08 Thread Jukka Zitting
Hi,

On Mon, Aug 6, 2012 at 12:14 PM, Jukka Zitting jukka.zitt...@gmail.com wrote:
 That leaves only the reviews to be done. Here's the latest TODO list:

And an updated one:

  Benson Margulies - Syncope, Nuvem
  Dave Fisher      - DeltaSpike
  Matt Franklin    - Droids
  Matt Hogstrom    - SIS, Wookie
  Mohammad Nour    - Airavata
  Ross Gardler     - Wink

That's quite a few reports still to review and the report deadline is
close. Please let me know if you're still on it (or update the wiki
page directly), otherwise I'll take over tomorrow to review any
remaining reports.

BR,

Jukka Zitting




Re: [VOTE] Accept Drill into the Apache Incubator

2012-08-08 Thread Jukka Zitting
Hi,

On Wed, Aug 8, 2012 at 4:41 AM, Ted Dunning ted.dunn...@gmail.com wrote:
 I would like to call a vote for accepting Drill for incubation in the
 Apache Incubator. The full proposal is available below.  Discussion
 over the last few days has been quite positive.

  [x] +1, bring Drill into Incubator

BR,

Jukka Zitting




DeltaSpike Status - Re: Preparing for August report

2012-08-08 Thread Dave Fisher
Hi -

AFAIK, documentation is not a graduation requirement. I was frustrated with the
documentation because its scarcity makes it hard to understand the project, but
that is not a blocker.

I think the project should start working on graduation soon.

They have made a release. The community is active on the lists. Most of the 
status page items are checked off with possibly only podlingnamesearch needed.

Regards,
Dave

On Aug 8, 2012, at 3:36 PM, Jukka Zitting wrote:

 Hi,
 
 On Mon, Aug 6, 2012 at 12:14 PM, Jukka Zitting jukka.zitt...@gmail.com 
 wrote:
 That leaves only the reviews to be done. Here's the latest TODO list:
 
 And an updated one:
 
  Benson Margulies - Syncope, Nuvem
  Dave Fisher      - DeltaSpike
  Matt Franklin    - Droids
  Matt Hogstrom    - SIS, Wookie
  Mohammad Nour    - Airavata
  Ross Gardler     - Wink
 
 That's quite a few reports still to review and the report deadline is
 close. Please let me know if you're still on it (or update the wiki
 page directly), otherwise I'll take over tomorrow to review any
 remaining reports.
 
 BR,
 
 Jukka Zitting
 
 





RE: Preparing for August report

2012-08-08 Thread Franklin, Matthew B.
I reviewed Droids and thought their report adequately represented the project
state.






-----Original Message-----
From: Jukka Zitting [mailto:jukka.zitt...@gmail.com]
Sent: Wednesday, August 08, 2012 06:37 PM Eastern Standard Time
To: general
Subject: Re: Preparing for August report


Hi,

On Mon, Aug 6, 2012 at 12:14 PM, Jukka Zitting jukka.zitt...@gmail.com wrote:
 That leaves only the reviews to be done. Here's the latest TODO list:

And an updated one:

  Benson Margulies - Syncope, Nuvem
  Dave Fisher      - DeltaSpike
  Matt Franklin    - Droids
  Matt Hogstrom    - SIS, Wookie
  Mohammad Nour    - Airavata
  Ross Gardler     - Wink

That's quite a few reports still to review and the report deadline is
close. Please let me know if you're still on it (or update the wiki
page directly), otherwise I'll take over tomorrow to review any
remaining reports.

BR,

Jukka Zitting




Re: [PROPOSAL] Drill for the Apache Incubator

2012-08-08 Thread Tomer Shiran
Oops, apologies - thanks for the reminder. I uploaded the slides as an
attachment on the wiki page.

Thanks,
Tomer

On Wed, Aug 8, 2012 at 9:14 PM, Jakob Homan jgho...@gmail.com wrote:

 So, no response to my request above about the design docs and
 not-TO-DOne MapR presentation?

 On Wed, Aug 8, 2012 at 3:25 PM, Chris Douglas cdoug...@apache.org wrote:
  +1 -C
 
  On Thu, Aug 2, 2012 at 3:12 PM, Ted Dunning ted.dunn...@gmail.com wrote:
  This is a duplicated attempt at sending this message, please ignore the
  previous message if it eventually arrives.  There appears to be a hangup
  sending email from my apache email address via gmail.
 
  Abstract
  ========
  Drill is a distributed system for interactive analysis of large-scale
  datasets, inspired by Google’s Dremel (
  http://research.google.com/pubs/pub36632.html).
 
  Proposal
  ========
  Drill is a distributed system for interactive analysis of large-scale
  datasets. Drill is similar to Google’s Dremel, with the additional
  flexibility needed to support a broader range of query languages, data
  formats and data sources. It is designed to efficiently process nested
  data. It is a design goal to scale to 10,000 servers or more and to be able
  to process petabytes of data and trillions of records in seconds.
 
  Background
  ==========
  Many organizations have the need to run data-intensive applications,
  including batch processing, stream processing and interactive analysis. In
  recent years open source systems have emerged to address the need for
  scalable batch processing (Apache Hadoop) and stream processing (Storm,
  Apache S4). In 2010 Google published a paper called “Dremel: Interactive
  Analysis of Web-Scale Datasets,” describing a scalable system used
  internally for interactive analysis of nested data. No open source project
  has successfully replicated the capabilities of Dremel.
 
  Rationale
  =========
  There is a strong need in the market for low-latency interactive analysis
  of large-scale datasets, including nested data (e.g., JSON, Avro, Protocol
  Buffers). This need was identified by Google and addressed internally with
  a system called Dremel.
 
  In recent years open source systems have emerged to address the need for
  scalable batch processing (Apache Hadoop) and stream processing (Storm,
  Apache S4). Apache Hadoop, originally inspired by Google’s internal
  MapReduce system, is used by thousands of organizations processing
  large-scale datasets. Apache Hadoop is designed to achieve very high
  throughput, but is not designed to achieve the sub-second latency needed
  for interactive data analysis and exploration. Drill, inspired by Google’s
  internal Dremel system, is intended to address this need.
 
  It is worth noting that, as explained by Google in the original paper,
  Dremel complements MapReduce-based computing. Dremel is not intended as a
  replacement for MapReduce and is often used in conjunction with it to
  analyze outputs of MapReduce pipelines or rapidly prototype larger
  computations. Indeed, Dremel and MapReduce are both used by thousands of
  Google employees.
 
  Like Dremel, Drill supports a nested data model with data encoded in a
  number of formats such as JSON, Avro or Protocol Buffers. In many
  organizations nested data is the standard, so supporting a nested data
  model eliminates the need to normalize the data. With that said, flat data
  formats, such as CSV files, are naturally supported as a special case of
  nested data.
 
  The Drill architecture consists of four key components/layers:
  * Query languages: This layer is responsible for parsing the user’s query
  and constructing an execution plan.  The initial goal is to support the
  SQL-like language used by Dremel and Google BigQuery (
  https://developers.google.com/bigquery/docs/query-reference), which we call
  DrQL. However, Drill is designed to support other languages and programming
  models, such as the Mongo Query Language (
  http://www.mongodb.org/display/DOCS/Mongo+Query+Language), Cascading (
  http://www.cascading.org/) or Plume (https://github.com/tdunning/Plume).
  * Low-latency distributed execution engine: This layer is responsible for
  executing the physical plan. It provides the scalability and fault
  tolerance needed to efficiently query petabytes of data on 10,000 servers.
  Drill’s execution engine is based on research in distributed execution
  engines (e.g., Dremel, Dryad, Hyracks, CIEL, Stratosphere) and columnar
  storage, and can be extended with additional operators and connectors.
  * Nested data formats: This layer is responsible for supporting various
  data formats. The initial goal is to support the column-based format used
  by Dremel. Drill is designed to support schema-based formats such as
  Protocol Buffers/Dremel, Avro/AVRO-806/Trevni and CSV, and schema-less
  formats such as JSON, BSON or YAML. In addition, it is designed to support
  column-based formats such as