Re: [PROPOSAL] MRQL for the Apache Incubator
I think it's time to call for a vote.
On Mon, Mar 4, 2013 at 9:25 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote:
Nice proposal indeed, I'd say having 3 mentors is usually better to avoid release headaches. Regards, Tommaso
2013/3/4 Edward J. Yoon edwardy...@apache.org
Sure I can. :) Of course, we'll welcome more mentors from incubator IPMC if there're volunteers.
On Mon, Mar 4, 2013 at 7:34 PM, Alex Karasulu akaras...@apache.org wrote:
On Mon, Mar 4, 2013 at 12:31 PM, Bertrand Delacretaz bdelacre...@apache.org wrote:
On Sat, Mar 2, 2013 at 7:12 AM, Leonidas Fegaras fega...@cse.uta.edu wrote:
== Champion ==
* Edward J. Yoon edwardyoon AT apache DOT org
== Nominated Mentors ==
* Alex Karasulu akarasulu AT apache DOT org
...
Is Edward going to stay on as a mentor as well? Two (active) mentors is the bare minimum IMO.
I suspect so but let's hear from Edward himself.
Best Regards, -- Alex
-- Best Regards, Edward J. Yoon @eddieyoon
Re: [VOTE] Accept Curator into the Incubator
+1 non-binding
Re: [VOTE] Accept Curator into the Incubator
+1 non binding -- Ioan Eugen Stan
Permission to edit incubator wiki
Can someone grant me permissions to edit the incubator wiki? My wiki id is jdreimann Thanks! Joe
Suggested change to the ppmc guide
Hi all,
After spamming the private@i.a.o list, and being asked to stop it, I'd like to suggest the following changes to the PPMC guide:
Index: content/guides/ppmc.xml
===================================================================
--- content/guides/ppmc.xml (revision 1453351)
+++ content/guides/ppmc.xml (working copy)
@@ -168,7 +168,8 @@
 [VOTE] Joe Bob as committer. The [VOTE] message should be
 forwarded to the IPMC (<a href="mailto:priv...@incubator.apache.org">
 priv...@incubator.apache.org</a>) to notify them that the
- vote is underway</p>
+ vote is underway. Do not BCC or CC the IPMC on the VOTE thread.
+ Instead, forward the initial VOTE email.</p>
 <p>To be successful the vote requires <strong>at least three +1
 votes from PPMC members, including at least one +1
@@ -179,7 +180,8 @@
 a message to the PPMC private alias, and forward it to the IPMC, with the
 subject line of [VOTE][RESULT] Joe Bob as committer. The message
 should include the usual vote tally, indicating which
- mentor or IPMC member votes cause it to be valid.
+ mentor or IPMC member votes cause it to be valid. Do not
+ BCC or CC the IPMC on the results email. Instead, forward it.
 </p>
 <p>
@@ -229,8 +231,9 @@
 [VOTE] Joe Bob PPMC membership. The [VOTE] message should be
 forwarded to the IPMC (<a href="mailto:priv...@incubator.apache.org">
 priv...@incubator.apache.org</a>) to notify them that the
- vote is underway. If the vote is successful, the proposer should send
- a message to the PPMC private alias, with
+ vote is underway. Do not CC or BCC the IPMC on this thread. Instead,
+ forward the initial VOTE email. If the vote is successful, the proposer
+ should send a message to the PPMC private alias, with
 the subject line of [VOTE][RESULT] Joe Bob PPMC membership. The message
 id of the [VOTE][RESULT] message should be preserved for the message to the
 Incubator PMC after Joe Bob accepts. Now, Joe Bob
Re: Suggested change to the ppmc guide
I believe the change represents current IPMC consensus but it'd be nice if the change documented the rationale for the policy as well (at least in the log message). Daniel (last week I ran into a 10 years old change that didn't have any justification anywhere)
Re: Suggested change to the ppmc guide
On Wed, Mar 6, 2013 at 4:00 PM, Chip Childers chip.child...@sungard.com wrote:
...After spamming the private@i.a.o list, and being asked to stop it, I'd like to suggest the following changes to the PPMC guide...
+1, and +1 to Daniel's comment, you could point to the private@ thread where this was discussed, by Message-Id
-Bertrand
Re: Suggested change to the ppmc guide
On Wed, Mar 06, 2013 at 04:24:20PM +0100, Bertrand Delacretaz wrote:
On Wed, Mar 6, 2013 at 4:00 PM, Chip Childers chip.child...@sungard.com wrote:
...After spamming the private@i.a.o list, and being asked to stop it, I'd like to suggest the following changes to the PPMC guide...
+1, and +1 to Daniel's comment, you could point to the private@ thread where this was discussed, by Message-Id
Thanks. Committed. Please let me know if you believe I didn't provide enough information in the commit message.
Re: Suggested change to the ppmc guide
Hi Daniel,
On Mar 6, 2013, at 7:09 AM, Daniel Shahaf wrote:
I believe the change represents current IPMC consensus but it'd be nice if the change documented the rationale for the policy as well (at least in the log message).
There is no change to the process, policy, or consensus. The only thing that is different is the emphasis on forwarding instead of cc or bcc. The phrase "should be forwarded to the IPMC" is not ambiguous, but it's apparently easy to overlook.
Craig
Craig L Russell Architect, Oracle http://db.apache.org/jdo
Re: Suggested change to the ppmc guide
On Wed, Mar 06, 2013 at 08:00:43AM -0800, Craig L Russell wrote:
Hi Daniel, On Mar 6, 2013, at 7:09 AM, Daniel Shahaf wrote: I believe the change represents current IPMC consensus but it'd be nice if the change documented the rationale for the policy as well (at least in the log message).
There is no change to the process, policy, or consensus. The only thing that is different is the emphasis on forwarding instead of cc or bcc. The phrase "should be forwarded to the IPMC" is not ambiguous, but it's apparently easy to overlook.
Correct - it wasn't clear enough for my (apparently) thick skull, although it was specific and accurate. Hopefully my patch will help others in the future.
[VOTE] Accept MRQL into the Incubator
Dear ASF members,
I would like to call for a VOTE for acceptance of MRQL into the Incubator. The vote will close on Monday March 11, 2013.
[ ] +1 Accept MRQL into the Apache incubator
[ ] +0 Don't care.
[ ] -1 Don't accept MRQL into the incubator because...
Full proposal is pasted below and the corresponding wiki is http://wiki.apache.org/incubator/MRQLProposal
Only VOTEs from Incubator PMC members are binding, but all are welcome to express their thoughts.
Sincerely, Leonidas Fegaras
= Abstract =
MRQL is a query processing and optimization system for large-scale, distributed data analysis, built on top of Apache Hadoop and Hama.
= Proposal =
MRQL (pronounced ''miracle'') is a query processing and optimization system for large-scale, distributed data analysis. MRQL (the MapReduce Query Language) is an SQL-like query language for large-scale data analysis on a cluster of computers. The MRQL query processing system can evaluate MRQL queries in two modes: in MapReduce mode on top of Apache Hadoop or in Bulk Synchronous Parallel (BSP) mode on top of Apache Hama. The MRQL query language is powerful enough to express most common data analysis tasks over many forms of raw ''in-situ'' data, such as XML and JSON documents, binary files, and CSV documents. MRQL is more powerful than other current high-level MapReduce languages, such as Hive and PigLatin, since it can operate on more complex data and supports more powerful query constructs, thus eliminating the need for using explicit MapReduce code. With MRQL, users will be able to express complex data analysis tasks, such as PageRank, k-means clustering, matrix factorization, etc., using SQL-like queries exclusively, while the MRQL query processing system will be able to compile these queries to efficient Java code.
= Background =
The initial code was developed at the University of Texas at Arlington (UTA) by a research team, led by Leonidas Fegaras. The software was first released in May 2011.
The original goal of this project was to build a query processing system that translates SQL-like data analysis queries to efficient workflows of MapReduce jobs. A design goal was to use HDFS as the physical storage layer, without any indexing, data partitioning, or data normalization, and to use Hadoop (without extensions) as the run-time engine. The motivation behind this work was to build a platform to test new ideas on query processing and optimization techniques applicable to the MapReduce framework.
A year ago, MRQL was extended to run on Hama. The motivation for this extension was that Hadoop MapReduce jobs were required to read their input and write their output on HDFS. This simplifies reliability and fault tolerance, but it imposes a high overhead on complex MapReduce workflows and graph algorithms, such as PageRank, which require repetitive jobs. In addition, Hadoop does not preserve data in memory across consecutive MapReduce jobs. This restriction forces data to be re-read at every step, even when the data is constant. BSP, on the other hand, does not suffer from this restriction, and, under certain circumstances, allows complex repetitive algorithms to run entirely in the collective memory of a cluster. Thus, the goal was to be able to run the same MRQL queries in both modes, MapReduce and BSP, without modifying the queries: if there are enough resources available, and low latency and speed are more important than resilience, queries may run in BSP mode; otherwise, the same queries may run in MapReduce mode. BSP evaluation was found to be a good choice when fault tolerance is not critical, data (both input and intermediate) can fit in the cluster memory, and data processing requires complex/repetitive steps.
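The mode choice described above can be read as a simple decision rule. The following is only an illustrative sketch in Python, not MRQL code: the `Workload` fields, the `choose_mode` function, and the memory comparison are all hypothetical names and assumptions introduced here to restate the three conditions from the proposal (fault tolerance not critical, input and intermediate data fit in cluster memory, processing is repetitive).

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a query workload (not part of MRQL)."""
    input_bytes: int             # size of the input data
    intermediate_bytes: int      # estimated size of intermediate results
    iterative: bool              # True for repetitive jobs such as PageRank
    needs_fault_tolerance: bool  # True when resilience matters more than speed


def choose_mode(workload: Workload, cluster_memory_bytes: int) -> str:
    """Return "BSP" only when all three conditions from the proposal hold."""
    fits_in_memory = (
        workload.input_bytes + workload.intermediate_bytes
        <= cluster_memory_bytes
    )
    if fits_in_memory and workload.iterative and not workload.needs_fault_tolerance:
        return "BSP"
    # Otherwise fall back to MapReduce, which reads and writes through HDFS.
    return "MapReduce"
```

Under these assumptions, an iterative job whose data fits in memory and that tolerates failure would be routed to BSP, while the same query would fall back to MapReduce as soon as any one condition fails.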
The research results of this ongoing work have already been published in conferences (WebDB'11, EDBT'12, and DataCloud'12) and the authors have already received positive feedback from researchers in academia and industry who attended these conferences.
= Rationale =
* MRQL will be the first general-purpose, SQL-like query language for data analysis based on BSP. Currently, many programmers prefer to code their MapReduce applications in a higher-level query language, rather than an algorithmic language. For instance, Pig is used for 60% of Yahoo MapReduce jobs, while Hive is used for 90% of Facebook MapReduce jobs. We believe this will also be the trend for BSP applications: although the BSP model is, in principle, very simple to understand, it is hard to develop, optimize, and maintain non-trivial BSP applications coded in a general-purpose programming language. Currently, there is no widely accepted declarative BSP query language, although there are a few special-purpose BSP systems for graph analysis, such as Google Pregel and Apache Giraph, for machine learning, such as BSML, and for scientific data analysis.
* MRQL can capture many complex data analysis algorithms in declarative form. Existing MapReduce query
Re: [VOTE] Accept MRQL into the Incubator
+1
Re: [VOTE] Accept MRQL into the Incubator
+1 (binding)
Re: [PROPOSAL] MRQL for the Apache Incubator
I added myself as a mentor. Welcome aboard. -- Thanks - Mohammad Nour
Re: [VOTE] Accept MRQL into the Incubator
+1 Tommaso
Re: [VOTE] Accept MRQL into the Incubator
+1 On Thu, Mar 7, 2013 at 2:11 AM, Tommaso Teofili tommaso.teof...@gmail.com wrote: +1 Tommaso 2013/3/6 Alex Karasulu akaras...@apache.org +1 (binding) On Wed, Mar 6, 2013 at 7:04 PM, Leonidas Fegaras fega...@cse.uta.edu wrote: Dear ASF members, I would like to call for a VOTE for acceptance of MRQL into the Incubator. The vote will close on Monday, March 11, 2013. [ ] +1 Accept MRQL into the Apache incubator [ ] +0 Don't care. [ ] -1 Don't accept MRQL into the incubator because... Full proposal is pasted below and the corresponding wiki is http://wiki.apache.org/incubator/MRQLProposal Only VOTEs from Incubator PMC members are binding, but all are welcome to express their thoughts. Sincerely, Leonidas Fegaras = Abstract = MRQL is a query processing and optimization system for large-scale, distributed data analysis, built on top of Apache Hadoop and Hama. = Proposal = MRQL (pronounced ''miracle'') is a query processing and optimization system for large-scale, distributed data analysis. MRQL (the MapReduce Query Language) is an SQL-like query language for large-scale data analysis on a cluster of computers. The MRQL query processing system can evaluate MRQL queries in two modes: in MapReduce mode on top of Apache Hadoop or in Bulk Synchronous Parallel (BSP) mode on top of Apache Hama. The MRQL query language is powerful enough to express most common data analysis tasks over many forms of raw ''in-situ'' data, such as XML and JSON documents, binary files, and CSV documents. MRQL is more powerful than other current high-level MapReduce languages, such as Hive and PigLatin, since it can operate on more complex data and supports more powerful query constructs, thus eliminating the need for using explicit MapReduce code.
With MRQL, users will be able to express complex data analysis tasks, such as PageRank, k-means clustering, matrix factorization, etc., using SQL-like queries exclusively, while the MRQL query processing system will be able to compile these queries to efficient Java code. = Background = The initial code was developed at the University of Texas at Arlington (UTA) by a research team led by Leonidas Fegaras. The software was first released in May 2011. The original goal of this project was to build a query processing system that translates SQL-like data analysis queries to efficient workflows of MapReduce jobs. A design goal was to use HDFS as the physical storage layer, without any indexing, data partitioning, or data normalization, and to use Hadoop (without extensions) as the run-time engine. The motivation behind this work was to build a platform to test new ideas on query processing and optimization techniques applicable to the MapReduce framework. A year ago, MRQL was extended to run on Hama. The motivation for this extension was that Hadoop MapReduce jobs were required to read their input and write their output on HDFS. This simplifies reliability and fault tolerance, but it imposes a high overhead on complex MapReduce workflows and graph algorithms, such as PageRank, which require repetitive jobs. In addition, Hadoop does not preserve data in memory across consecutive MapReduce jobs. This restriction requires reading the data at every step, even when the data is constant. BSP, on the other hand, does not suffer from this restriction and, under certain circumstances, allows complex repetitive algorithms to run entirely in the collective memory of a cluster. Thus, the goal was to be able to run the same MRQL queries in both modes, MapReduce and BSP, without modifying the queries: if there are enough resources available, and low latency and speed are more important than resilience, queries may run in BSP mode; otherwise, the same queries may run in MapReduce mode.
BSP evaluation was found to be a good choice when fault tolerance is not critical, data (both input and intermediate) can fit in the cluster memory, and data processing requires complex/repetitive steps. The research results of this ongoing work have already been published in conferences (WebDB'11, EDBT'12, and DataCloud'12) and the authors have already received positive feedback from researchers in academia and industry who were attending these conferences. = Rationale = * MRQL will be the first general-purpose, SQL-like query language for data analysis based on BSP. Currently, many programmers prefer to code their MapReduce applications in a higher-level query language, rather than an algorithmic language. For instance, Pig is used for 60% of Yahoo MapReduce jobs, while Hive is used for 90% of Facebook MapReduce jobs. This, we believe, will also be the trend for BSP applications, because, even though, in principle, the BSP model is very simple to understand, it is hard to develop, optimize,
Re: [VOTE] Accept Curator into the Incubator
+1 (binding) Disclosure: I am one of the mentors. On Wed, Mar 6, 2013 at 4:27 AM, Ioan Eugen Stan stan.ieu...@gmail.com wrote: +1 non binding -- Ioan Eugen Stan
Re: [VOTE] Accept Curator into the Incubator
+1 (binding) thanks mahadev On Wed, Mar 6, 2013 at 7:14 PM, Enis Söztutar e...@apache.org wrote: +1 (binding) Disclosure: I am one of the mentors. On Wed, Mar 6, 2013 at 4:27 AM, Ioan Eugen Stan stan.ieu...@gmail.com wrote: +1 non binding -- Ioan Eugen Stan
Re: [VOTE] Accept Provisionr into the Apache Incubator
Thanks to all who voted! With 18 +1s (10 binding) the vote passes. I'll start the work to get the podling started. Thanks, Andrei On Mon, Mar 4, 2013 at 9:31 PM, Henry Saputra henry.sapu...@gmail.com wrote: +1 non-binding Good luck On Sat, Mar 2, 2013 at 3:35 PM, Andrei Savu as...@apache.org wrote: Hi Guys, I'd like to call a VOTE for acceptance of Provisionr into the Apache Incubator. The vote will close on March 8. [] +1 Accept Provisionr into the Apache incubator [] +0 Don't care. [] -1 Don't accept Provisionr into the incubator because... Full proposal is pasted at the bottom of this email, and the corresponding wiki is http://wiki.apache.org/incubator/ProvisionrProposal Only VOTEs from Incubator PMC members are binding, but all are welcome to express their thoughts. Thanks, Andrei Savu -- Provisionr Proposal == Abstract == Provisionr is an effort to develop a service that can be used to create and manage pools of virtual machines on multiple clouds. Our focus is on semi-automated workflows and cloud portability. == Proposal == Provisionr solves the problem of cloud portability by completely hiding the APIs and focusing only on building a cluster that matches the same set of assumptions on all clouds, assumptions like: running a specific operating system (e.g. Ubuntu 12.04 LTS), having the same set of pre-installed packages and binaries, sane DNS settings (forward and reverse IP resolution, as needed for Hadoop), NTP settings, networking settings, firewall, SSH admin access, VPN access, etc. As a secondary goal, Provisionr should also provide primitives for building automatic or semi-automatic workflows for configuring services, workflows that assume that all the machines share a common set of characteristics as described above. == Background == Creating clusters on cloud infrastructure is non-trivial because careful orchestration is required.
To make it easy to deploy services we need to start from a foundation that matches a common set of assumptions on multiple providers. == Rationale == This project started as a re-write of the core of Apache Whirr but has a different target, being more focused on semi-automated workflows and cloud portability. == Initial Goals == * Build a community * Provide an excellent user experience for semi-automatic workflows (e.g. using Rundeck) * Implement a REST service and a Web Console * Add support for more providers == Current Status == Provisionr has had four releases on [[ https://github.com/axemblr/axemblr-provisionr/wiki|GitHub]] and is used at Axemblr to deploy Hadoop clusters on demand and to provision infrastructure for testing / QA. === Meritocracy === We plan to invest in supporting a meritocracy. We will discuss the requirements in an open forum. Several companies have already expressed interest in this project, and we intend to invite additional developers to participate. We will encourage and monitor community participation so that privileges can be extended to those that contribute. === Community === The community interested in cloud service infrastructure is currently spread across many smaller projects, and one of the main goals of this project is to build a vibrant community to share best practices and build common infrastructure. === Core developers === Core developers are very experienced in the Apache ecosystem. To achieve more diversity of developers, we will be eager to recruit developers from diverse companies. * Andrei Savu - asavu at apache dot org (Apache Whirr PMC) * Ioan Eugen Stan - ieugen at apache dot org (Apache James PMC) * Alex Ciminian - alex.ciminian at gmail dot org === Alignment === Provisionr complements Apache Whirr and later on it should provide a robust foundation for more advanced functionalities.
== Known Risks == === Orphaned products === The contributors have significant open source experience and the project is being used as part of a commercial product, so the risk of being orphaned is relatively low. We plan to mitigate this risk by recruiting additional committers. === Inexperience with Open Source === Most of the initial committers have experience working on open source projects. Andrei Savu and Ioan Eugen Stan have experience as committers and PMC members on other Apache projects. === Homogenous Developers === We are committed to recruiting additional committers from other companies based on their contributions to the project. === Reliance on Salaried Developers === It is expected that Provisionr development will occur on both salaried time and on volunteer time, after hours. The majority of initial committers are paid by their employer to contribute to this project. However, they are all passionate about the project, and we are confident