Re: [PROPOSAL] MRQL for the Apache Incubator

2013-03-06 Thread Edward J. Yoon
I think it's time to call for a vote.

On Mon, Mar 4, 2013 at 9:25 PM, Tommaso Teofili
tommaso.teof...@gmail.com wrote:
 Nice proposal indeed, I'd say having 3 mentors is usually better to avoid
 release headaches.
 Regards,
 Tommaso


 2013/3/4 Edward J. Yoon edwardy...@apache.org

 Sure I can. :)

 Of course, we'll welcome more mentors from incubator IPMC if there're
 volunteers.

 On Mon, Mar 4, 2013 at 7:34 PM, Alex Karasulu akaras...@apache.org
 wrote:
  On Mon, Mar 4, 2013 at 12:31 PM, Bertrand Delacretaz 
 bdelacre...@apache.org
  wrote:
 
  On Sat, Mar 2, 2013 at 7:12 AM, Leonidas Fegaras fega...@cse.uta.edu
  wrote:
   == Champion ==
   * Edward J. Yoon edwardyoon AT apache DOT org
   == Nominated Mentors ==
   * Alex Karasulu akarasulu AT apache DOT org
  ...
 
  Is Edward going to stay on as a mentor as well?
 
  Two (active) mentors is the bare minimum IMO.
 
 
  I suspect so but let's hear from Edward himself.
 
  Best Regards,
  -- Alex



 --
 Best Regards, Edward J. Yoon
 @eddieyoon

 -
 To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
 For additional commands, e-mail: general-h...@incubator.apache.org





-- 
Best Regards, Edward J. Yoon
@eddieyoon




Re: [VOTE] Accept Curator into the Incubator

2013-03-06 Thread Ioannis Canellos
+1 non-binding


Re: [VOTE] Accept Curator into the Incubator

2013-03-06 Thread Ioan Eugen Stan
+1 non-binding

-- 
Ioan Eugen Stan




Permission to edit incubator wiki

2013-03-06 Thread Joachim Dreimann
Can someone grant me permissions to edit the incubator wiki?

My wiki id is jdreimann

Thanks!

Joe


Suggested change to the ppmc guide

2013-03-06 Thread Chip Childers
Hi all,

After spamming the private@i.a.o list, and being asked to stop it, I'd
like to suggest the following changes to the PPMC guide:


Index: content/guides/ppmc.xml
===
--- content/guides/ppmc.xml (revision 1453351)
+++ content/guides/ppmc.xml (working copy)
@@ -168,7 +168,8 @@
   [VOTE] Joe Bob as committer. The [VOTE] message should be forwarded
   to the IPMC (a href=mailto:priv...@incubator.apache.org;
   priv...@incubator.apache.org/a) to notify them that the
-  vote is underway/p
+  vote is underway. Do not BCC or CC the IPMC on the VOTE thread.
+  Instead, forward the initial VOTE email./p
 
   pTo be successful the vote requires strongat least three +1 votes
   from PPMC members, including at least one +1
@@ -179,7 +180,8 @@
   a message to the PPMC private alias, and forward it to the IPMC,
   with the subject line of [VOTE][RESULT] Joe Bob as committer.
   The message should include the usual vote tally, indicating which
-  mentor or IPMC member votes cause it to be valid.
+  mentor or IPMC member votes cause it to be valid. Do not
+  BCC or CC the IPMC on the results email.  Instead, forward it.
   /p 
 
   p
@@ -229,8 +231,9 @@
   [VOTE] Joe Bob PPMC membership. The [VOTE] message should be forwarded
   to the IPMC (a href=mailto:priv...@incubator.apache.org;
   priv...@incubator.apache.org/a) to notify them that the
-  vote is underway. If the vote is successful, the proposer should send 
-  a message to the PPMC private alias, with
+  vote is underway. Do not CC or BCC the IPMC on this thread.  Instead,
+  forward the initial VOTE email.  If the vote is successful, the proposer 
+  should send a message to the PPMC private alias, with
   the subject line of [VOTE][RESULT] Joe Bob PPMC membership. The
   message id of the [VOTE][RESULT] message should be preserved for
   the message to the Incubator PMC after Joe Bob accepts. Now, Joe Bob




Re: Suggested change to the ppmc guide

2013-03-06 Thread Daniel Shahaf
I believe the change represents current IPMC consensus but it'd be nice
if the change documented the rationale for the policy as well (at least
in the log message).

Daniel
(last week I ran into a 10-year-old change that didn't have any
justification anywhere)




Re: Suggested change to the ppmc guide

2013-03-06 Thread Bertrand Delacretaz
On Wed, Mar 6, 2013 at 4:00 PM, Chip Childers chip.child...@sungard.com wrote:
 ...After spamming the private@i.a.o list, and being asked to stop it, I'd
 like to suggest the following changes to the PPMC guide...

+1, and +1 to Daniel's comment: you could point to the private@ thread
where this was discussed, by Message-Id

-Bertrand




Re: Suggested change to the ppmc guide

2013-03-06 Thread Chip Childers
On Wed, Mar 06, 2013 at 04:24:20PM +0100, Bertrand Delacretaz wrote:
 On Wed, Mar 6, 2013 at 4:00 PM, Chip Childers chip.child...@sungard.com 
 wrote:
  ...After spamming the private@i.a.o list, and being asked to stop it, I'd
  like to suggest the following changes to the PPMC guide...
 
 +1, and +1 to Daniel's comment: you could point to the private@ thread
 where this was discussed, by Message-Id

Thanks.

Committed.  Please let me know if you believe I didn't provide enough
information in the commit message.




Re: Suggested change to the ppmc guide

2013-03-06 Thread Craig L Russell

Hi Daniel,

On Mar 6, 2013, at 7:09 AM, Daniel Shahaf wrote:

 I believe the change represents current IPMC consensus but it'd be nice
 if the change documented the rationale for the policy as well (at least
 in the log message).

There is no change to the process, policy, or consensus. The only
thing that is different is the emphasis on forwarding instead of cc or
bcc.

The phrase "should be forwarded to the IPMC" is not ambiguous, but
it's apparently easy to overlook.


Craig






Craig L Russell
Architect, Oracle
http://db.apache.org/jdo
408 276-5638 mailto:craig.russ...@oracle.com
P.S. A good JDO? O, Gasp!





Re: Suggested change to the ppmc guide

2013-03-06 Thread Chip Childers
On Wed, Mar 06, 2013 at 08:00:43AM -0800, Craig L Russell wrote:
 Hi Daniel,
 
 On Mar 6, 2013, at 7:09 AM, Daniel Shahaf wrote:
 
 I believe the change represents current IPMC consensus but it'd be nice
 if the change documented the rationale for the policy as well (at least
 in the log message).
 
 There is no change to the process, policy, or consensus. The only
 thing that is different is the emphasis on forwarding instead of cc
 or bcc.
 
 The phrase "should be forwarded to the IPMC" is not ambiguous, but
 it's apparently easy to overlook.

Correct - it wasn't clear enough for my (apparently) thick skull, although it
was specific and accurate. Hopefully my patch will help others in the future.






[VOTE] Accept MRQL into the Incubator

2013-03-06 Thread Leonidas Fegaras

Dear ASF members,
I would like to call for a VOTE for acceptance of MRQL into the Incubator.

The vote will close on Monday March 11, 2013.

[ ] +1 Accept MRQL into the Apache incubator
[ ] +0 Don't care.
[ ] -1 Don't accept MRQL into the incubator because...

Full proposal is pasted below and the corresponding wiki is

http://wiki.apache.org/incubator/MRQLProposal

Only VOTEs from Incubator PMC members are binding,
but all are welcome to express their thoughts.
Sincerely,
Leonidas Fegaras


= Abstract =

MRQL is a query processing and optimization system for large-scale,
distributed data analysis, built on top of Apache Hadoop and Hama.

= Proposal =

MRQL (pronounced ''miracle'') is a query processing and optimization
system for large-scale, distributed data analysis. MRQL (the MapReduce
Query Language) is an SQL-like query language for large-scale data
analysis on a cluster of computers. The MRQL query processing system
can evaluate MRQL queries in two modes: in MapReduce mode on top of
Apache Hadoop or in Bulk Synchronous Parallel (BSP) mode on top of
Apache Hama. The MRQL query language is powerful enough to express
most common data analysis tasks over many forms of raw ''in-situ''
data, such as XML and JSON documents, binary files, and CSV
documents. MRQL is more powerful than other current high-level
MapReduce languages, such as Hive and PigLatin, since it can operate
on more complex data and supports more powerful query constructs, thus
eliminating the need for using explicit MapReduce code. With MRQL,
users will be able to express complex data analysis tasks, such as
PageRank, k-means clustering, matrix factorization, etc., using
SQL-like queries exclusively, while the MRQL query processing system
will be able to compile these queries to efficient Java code.

= Background =

The initial code was developed at the University of Texas at Arlington
(UTA) by a research team led by Leonidas Fegaras. The software was
first released in May 2011. The original goal of this project was to
build a query processing system that translates SQL-like data analysis
queries to efficient workflows of MapReduce jobs. A design goal was to
use HDFS as the physical storage layer, without any indexing, data
partitioning, or data normalization, and to use Hadoop (without
extensions) as the run-time engine. The motivation behind this work
was to build a platform to test new ideas on query processing and
optimization techniques applicable to the MapReduce framework.

A year ago, MRQL was extended to run on Hama. The motivation for this
extension was that Hadoop MapReduce jobs were required to read their
input and write their output on HDFS. This simplifies reliability and
fault tolerance, but it imposes a high overhead on complex MapReduce
workflows and graph algorithms, such as PageRank, which require
repetitive jobs. In addition, Hadoop does not preserve data in memory
across consecutive MapReduce jobs. This restriction requires data to
be re-read at every step, even when the data is constant. BSP, on the other
hand, does not suffer from this restriction, and, under certain
circumstances, allows complex repetitive algorithms to run entirely in
the collective memory of a cluster. Thus, the goal was to be able to
run the same MRQL queries in both modes, MapReduce and BSP, without
modifying the queries: If there are enough resources available, and
low latency and speed are more important than resilience, queries may
run in BSP mode; otherwise, the same queries may run in MapReduce
mode. BSP evaluation was found to be a good choice when fault
tolerance is not critical, data (both input and intermediate) can fit
in the cluster memory, and data processing requires complex/repetitive
steps.

The research results of this ongoing work have already been published
in conferences (WebDB'11, EDBT'12, and DataCloud'12) and the authors
have already received positive feedback from researchers in academia
and industry who were attending these conferences.

= Rationale =

* MRQL will be the first general-purpose, SQL-like query language for
data analysis based on BSP.
Currently, many programmers prefer to code their MapReduce
applications in a higher-level query language, rather than an
algorithmic language. For instance, Pig is used for 60% of Yahoo
MapReduce jobs, while Hive is used for 90% of Facebook MapReduce
jobs. This, we believe, will also be the trend for BSP applications,
because, even though, in principle, the BSP model is very simple to
understand, it is hard to develop, optimize, and maintain non-trivial
BSP applications coded in a general-purpose programming
language. Currently, there is no widely accepted declarative BSP
query language, although there are a few special-purpose BSP systems
for graph analysis, such as Google Pregel and Apache Giraph, for
machine learning, such as BSML, and for scientific data analysis.

* MRQL can capture many complex data analysis algorithms in
declarative form.
Existing MapReduce query 

Re: [VOTE] Accept MRQL into the Incubator

2013-03-06 Thread Mohammad Nour El-Din
+1



Re: [VOTE] Accept MRQL into the Incubator

2013-03-06 Thread Alex Karasulu
+1 (binding)



Re: [PROPOSAL] MRQL for the Apache Incubator

2013-03-06 Thread Mohammad Nour El-Din
I added myself as a mentor. Welcome aboard.






-- 
Thanks
- Mohammad Nour

Life is like riding a bicycle. To keep your balance you must keep moving
- Albert Einstein


Re: [VOTE] Accept MRQL into the Incubator

2013-03-06 Thread Tommaso Teofili
+1

Tommaso



Re: [VOTE] Accept MRQL into the Incubator

2013-03-06 Thread Edward J. Yoon
+1

On Thu, Mar 7, 2013 at 2:11 AM, Tommaso Teofili
tommaso.teof...@gmail.com wrote:
 +1

 Tommaso


 2013/3/6 Alex Karasulu akaras...@apache.org

 +1 (binding)


 On Wed, Mar 6, 2013 at 7:04 PM, Leonidas Fegaras fega...@cse.uta.edu
 wrote:

  Dear ASF members,
  I would like to call for a VOTE for acceptance of MRQL into the
 Incubator.
  The vote will close on Monday March 11, 2013.
 
  [ ] +1 Accept MRQL into the Apache incubator
  [ ] +0 Don't care.
  [ ] -1 Don't accept MRQL into the incubator because...
 
  Full proposal is pasted below and the corresponding wiki is
 
  http://wiki.apache.org/incubator/MRQLProposal
 
  Only VOTEs from Incubator PMC members are binding,
  but all are welcome to express their thoughts.
  Sincerely,
  Leonidas Fegaras
 
 
  = Abstract =
 
  MRQL is a query processing and optimization system for large-scale,
  distributed data analysis, built on top of Apache Hadoop and Hama.
 
  = Proposal =
 
  MRQL (pronounced ''miracle'') is a query processing and optimization
  system for large-scale, distributed data analysis. MRQL (the MapReduce
  Query Language) is an SQL-like query language for large-scale data
  analysis on a cluster of computers. The MRQL query processing system
  can evaluate MRQL queries in two modes: in MapReduce mode on top of
  Apache Hadoop or in Bulk Synchronous Parallel (BSP) mode on top of
  Apache Hama. The MRQL query language is powerful enough to express
  most common data analysis tasks over many forms of raw ''in-situ''
  data, such as XML and JSON documents, binary files, and CSV
  documents. MRQL is more powerful than other current high-level
  MapReduce languages, such as Hive and PigLatin, since it can operate
  on more complex data and supports more powerful query constructs, thus
  eliminating the need for using explicit MapReduce code. With MRQL,
  users will be able to express complex data analysis tasks, such as
  PageRank, k-means clustering, matrix factorization, etc., using
  SQL-like queries exclusively, while the MRQL query processing system
  will be able to compile these queries to efficient Java code.
 
  = Background =
 
  The initial code was developed at the University of Texas at Arlington
  (UTA) by a research team, led by Leonidas Fegaras. The software was
  first released in May 2011. The original goal of this project was to
  build a query processing system that translates SQL-like data analysis
  queries to efficient workflows of MapReduce jobs. A design goal was to
  use HDFS as the physical storage layer, without any indexing, data
  partitioning, or data normalization, and to use Hadoop (without
  extensions) as the run-time engine. The motivation behind this work
  was to build a platform to test new ideas on query processing and
  optimization techniques applicable to the MapReduce framework.
 
  A year ago, MRQL was extended to run on Hama. The motivation for this
  extension was that Hadoop MapReduce jobs were required to read their
  input and write their output on HDFS. This simplifies reliability and
  fault tolerance, but it imposes a high overhead on complex MapReduce
  workflows and graph algorithms, such as PageRank, which require
  repetitive jobs. In addition, Hadoop does not preserve data in memory
  across consecutive MapReduce jobs, so the data must be read again at
  every step, even when it is constant. BSP, on the other
  hand, does not suffer from this restriction, and, under certain
  circumstances, allows complex repetitive algorithms to run entirely in
  the collective memory of a cluster. Thus, the goal was to be able to
  run the same MRQL queries in both modes, MapReduce and BSP, without
  modifying the queries: If there are enough resources available, and
  low latency and speed are more important than resilience, queries may
  run in BSP mode; otherwise, the same queries may run in MapReduce
  mode. BSP evaluation was found to be a good choice when fault
  tolerance is not critical, data (both input and intermediate) can fit
  in the cluster memory, and data processing requires complex/repetitive
  steps.
 
  The research results of this ongoing work have already been published
  in conferences (WebDB'11, EDBT'12, and DataCloud'12) and the authors
  have already received positive feedback from researchers in academia
  and industry who were attending these conferences.
 
  = Rationale =
 
  * MRQL will be the first general-purpose, SQL-like query language for
  data analysis based on BSP.
  Currently, many programmers prefer to code their MapReduce
  applications in a higher-level query language, rather than an
  algorithmic language. For instance, Pig is used for 60% of Yahoo
  MapReduce jobs, while Hive is used for 90% of Facebook MapReduce
  jobs. This, we believe, will also be the trend for BSP applications,
  because, even though, in principle, the BSP model is very simple to
  understand, it is hard to develop, optimize, 

Re: [VOTE] Accept Curator into the Incubator

2013-03-06 Thread Enis Söztutar
+1 (binding)

Disclosure: I am one of the mentors.


On Wed, Mar 6, 2013 at 4:27 AM, Ioan Eugen Stan stan.ieu...@gmail.com wrote:

 +1 non binding

 --
 Ioan Eugen Stan





Re: [VOTE] Accept Curator into the Incubator

2013-03-06 Thread Mahadev Konar
+1 (binding)


thanks
mahadev

On Wed, Mar 6, 2013 at 7:14 PM, Enis Söztutar e...@apache.org wrote:

 +1 (binding)

 Disclosure: I am one of the mentors.


 On Wed, Mar 6, 2013 at 4:27 AM, Ioan Eugen Stan stan.ieu...@gmail.com
 wrote:

  +1 non binding
 
  --
  Ioan Eugen Stan
 
 
 



Re: [VOTE] Accept Provisionr into the Apache Incubator

2013-03-06 Thread Andrei Savu
Thanks to all who voted! With 18 +1s (10 binding) the vote passes.

I'll start the work to get the podling started.

Thanks,
Andrei

On Mon, Mar 4, 2013 at 9:31 PM, Henry Saputra henry.sapu...@gmail.com wrote:

 +1 non-binding

 Good luck


 On Sat, Mar 2, 2013 at 3:35 PM, Andrei Savu as...@apache.org wrote:

  Hi Guys,
 
  I'd like to call a VOTE for acceptance of Provisionr into the Apache
  Incubator.
 
  The vote will close on March 8.
 
  [] +1 Accept Provisionr into the Apache incubator
  [] +0 Don't care.
  [] -1 Don't accept Provisionr into the incubator because...
 
  Full proposal is pasted at the bottom of this email, and the corresponding
  wiki is http://wiki.apache.org/incubator/ProvisionrProposal
 
  Only VOTEs from Incubator PMC members are binding, but all are welcome to
  express their thoughts.
 
  Thanks,
  Andrei Savu
 
  --
  Provisionr Proposal
 
  == Abstract ==
 
  Provisionr is an effort to develop a service that can be used to create and
  manage pools of virtual machines on multiple clouds. Our focus is on
  semi-automated workflows and cloud portability.
 
  == Proposal ==
 
  Provisionr solves the problem of cloud portability by completely hiding the
  APIs and focusing only on building a cluster that matches the same set of
  assumptions on all clouds, such as: running a specific operating system
  (e.g. Ubuntu 12.04 LTS), having the same set of pre-installed packages and
  binaries, sane DNS settings (forward and reverse IP resolution, as needed
  for Hadoop), NTP settings, networking settings, firewall, SSH admin access,
  VPN access, etc.
 
  As a secondary goal, Provisionr should also provide primitives for building
  automatic or semi-automatic workflows for configuring services, workflows
  that assume that all the machines share a common set of characteristics as
  described above.
 
  == Background ==
 
  Creating clusters on cloud infrastructure is non-trivial because careful
  orchestration is required. To make it easy to deploy services we need to
  start from a foundation that matches a common set of assumptions on
  multiple providers.
 
  == Rationale ==
 
  This project started as a rewrite of the core of Apache Whirr but has a
  different target: it is more focused on semi-automated workflows and cloud
  portability.
 
  == Initial Goals ==
 
   * Build a community
   * Provide an excellent user experience for semi-automatic workflows (e.g.
     using Rundeck)
   * Implement a REST service and a Web Console
   * Add support for more providers
 
  == Current Status ==
 
  Provisionr has had four releases on
  [[https://github.com/axemblr/axemblr-provisionr/wiki|GitHub]] and is used
  at Axemblr to deploy on-demand Hadoop clusters and infrastructure for
  testing / QA.
 
  === Meritocracy ===
 
  We plan to invest in supporting a meritocracy. We will discuss the
  requirements in an open forum. Several companies have already expressed
  interest in this project, and we intend to invite additional developers to
  participate. We will encourage and monitor community participation so that
  privileges can be extended to those that contribute.
 
  === Community ===
 
  The community interested in cloud service infrastructure is currently
  spread across many smaller projects, and one of the main goals of this
  project is to build a vibrant community to share best practices and build
  common infrastructure.
 
  === Core developers ===
 
  Core developers are very experienced in the Apache ecosystem. To achieve
  more diversity of developers, we will be eager to recruit developers from
  diverse companies.
 
   * Andrei Savu - asavu at apache dot org  (Apache Whirr PMC)
   * Ioan Eugen Stan - ieugen at apache dot org (Apache James PMC)
   * Alex Ciminian -  alex.ciminian at gmail dot org
 
  === Alignment ===
 
  Provisionr complements Apache Whirr and later on it should provide a robust
  foundation for more advanced functionalities.
 
  == Known Risks ==
 
  === Orphaned products ===
 
  The contributors have significant open source experience and the project is
  being used as part of a commercial product, so the risk of being orphaned
  is relatively low. We plan to mitigate this risk by recruiting additional
  committers.
 
  === Inexperience with Open Source ===
 
  Most of the initial committers have experience working on open source
  projects. Andrei Savu and Ioan Eugen Stan have experience as committers and
  PMC members on other Apache projects.
 
  === Homogeneous Developers ===
 
  We are committed to recruiting additional committers from other companies
  based on their contributions to the project.
 
  === Reliance on Salaried Developers ===
 
  It is expected that Provisionr development will occur on both salaried time
  and on volunteer time, after hours. The majority of initial committers are
  paid by their employer to contribute to this project. However, they are all
  passionate about the project, and we are confident