Re: [VOTE] Accept Wayang into the Apache Incubator

2020-12-16 Thread Aditya Sharma
+1 (non-binding)

Thanks and regards,
Aditya Sharma


On Fri, 11 Dec 2020 at 22:03, Christofer Dutz 
wrote:

> Hi all,
>
> following up on the [DISCUSS] thread on Wayang (
> https://lists.apache.org/thread.html/r5fc03ae014f44c7c31a509a6db4ac07faedb2e1c6245cd917b744826%40%3Cgeneral.incubator.apache.org%3E)
> I would like to call a VOTE to accept Wayang (aka Rheem) into the Apache
> Incubator.
>
> Please cast your vote:
>
>   [ ] +1, bring Wayang into the Incubator
>   [ ] +0, I don't care either way
>   [ ] -1, do not bring Wayang into the Incubator, because...
>
> The vote will be open for at least 72 hours and only votes from the Incubator
> PMC are binding, but votes from everyone are welcome.
>
> Chris
>
> -
>
> Wayang Proposal (
> https://cwiki.apache.org/confluence/display/INCUBATOR/WayangProposal)
>
> == Abstract ==
>
> Wayang is a cross-platform data processing system that aims at decoupling
> the business logic of data analytics applications from concrete data
> processing platforms, such as Apache Flink or Apache Spark. Hence, it tames
> the complexity that arises from the "Cambrian explosion" of novel data
> processing platforms that we currently witness.
>
> Note that the Wayang project is the Rheem project, but we have renamed the
> project because of trademark issues.
>
> You can find the project web page at: https://rheem-ecosystem.github.io/
>
> = Proposal =
>
> Wayang is a cross-platform system that provides an abstraction over data
> processing platforms to free users from the burdens of (i) performing
> tedious and costly data migration and integration tasks to run their
> applications, and (ii) choosing the right data processing platforms for
> their applications. To achieve this, Wayang: (1) provides an abstraction on
> top of existing data processing platforms that allows users to specify
> their data analytics tasks in the form of a DAG of operators; (2) comes with
> a cross-platform optimizer for automating the selection of
> suitable/efficient platforms; and (3) finally takes care of executing
> the optimized plan, including communication across platforms. In summary,
> Wayang has the following salient features:
>
> - Flexible Data Model - It considers a flexible and simple data model
> based on data quanta. A data quantum is an atomic processing unit in the
> system that can represent a large spectrum of data formats, such as data
> points for a machine learning application, tuples for a database
> application, or RDF triples. Hence, Wayang is able to express a wide range
> of data analytics tasks.
> - Platform independence - It provides a simple interface (currently Java
> and Scala) that is inspired by established programming models, such as that
> of Apache Spark and Apache Flink. Users represent their data analytic tasks
> as a DAG (Wayang plan), where vertices correspond to Wayang operators and
> edges represent data flows (data quanta flowing) among these operators. A
> Wayang operator defines a particular kind of data transformation over an
> input data quantum, ranging from basic functionality (e.g.,
> transformations, filters, joins) to complex, extensible tasks (e.g.,
> PageRank).
> - Cross-platform execution - Besides running a data analytic task on any
> data processing platform, it also comes with an optimizer that can decide
> to execute a single data analytic task using multiple data processing
> platforms. This allows for exploiting the capabilities of different data
> processing platforms to perform complex data analytic tasks more
> efficiently.
> - Self-tuning UDF-based cost model - Its optimizer uses a cost model fully
> based on UDFs. This not only enables Wayang to learn the cost functions of
> newly added data processing platforms, but also allows developers to tune
> the optimizer at will.
> - Extensibility - It treats data processing platforms as plugins to allow
> users (developers) to easily incorporate new data processing platforms into
> the system. This is achieved by exposing the functionalities of data
> processing platforms as operators (execution operators). The same approach
> is followed at the Wayang interface, where users can also extend Wayang
> capabilities, i.e., the operators, easily.
>
> We plan to work on the stability of all these features as well as on
> extending Wayang with more advanced features. Furthermore, Wayang currently
> supports Apache Spark, Standalone Java, GraphChi, and relational databases
> (via JDBC). We plan to incorporate more data processing platforms, such as
> Apache Flink and Apache Hive.
>
> === Background ===
>
> Many organizations and companies collect or produce a large variety of data
> to apply data analytics over them. This is because insights from data
> allow them to rapidly make better decisions. Thus, the pursuit of
> efficient and scalable data analytics as well as the
> one-size-does-not-fit-all philosophy have given rise to a plethora of data
> processing platforms. Examples of these specialized processing platforms
> range from DBMSs to MapReduce-like

Re: [VOTE] Accept Wayang into the Apache Incubator

2020-12-16 Thread Paul King
+1 (binding)





[RESULT] [VOTE] Accept Wayang into the Apache Incubator

2020-12-16 Thread Christofer Dutz
So the vote passes:

8 binding +1 votes (all nominated mentors voted)
3 non-binding +1 votes

No +0 or -1 votes

https://lists.apache.org/thread.html/ra8e804fadca573af0f053df56686c98dbeaf1e8d1cba7228b16a637e%40%3Cgeneral.incubator.apache.org%3E

Binding:
Dave Fisher
Kevin Ratnasekera
Furkan KAMACI
Byung-Gon Chun
Sheng Wu
Lars George
Jean-Baptiste Onofre
Bernd Fondermann

Non-Binding:
Alexander Alten-Lorenz
Daniel B. Widdis
Xiangdong Huang


-----Original Message-----
From: Xiangdong Huang
Sent: Monday, 14 December 2020 07:28
To: general@incubator.apache.org
Subject: Re: [VOTE] Accept Wayang into the Apache Incubator

+1 (non-binding)
---
Xiangdong Huang
School of Software, Tsinghua University


On Sat, Dec 12, 2020 at 8:19 PM, Lars George wrote:

> +1 binding
>
> On Sat, Dec 12, 2020 at 2:24 AM Sheng Wu 
> wrote:
>
> > +1 binding
> >
> > Sheng Wu 吴晟
> > Twitter, wusheng1108
> >
> >
> > On Sat, Dec 12, 2020 at 5:59 AM, Byung-Gon Chun wrote:
> >
> > > +1 (binding)
> > >
> > > -Gon
> > >
> > > On Sat, Dec 12, 2020 at 2:35 AM Furkan KAMACI wrote:
> > >
> > > > Hi,
> > > >
> > > > +1 (binding)
> > > >
> > > > Kind Regards,
> > > > Furkan KAMACI
> > > >
> > > > On 11 Dec 2020 Fri at 20:04 Daniel B. Widdis wrote:
> > > >
> > > > > +1 (non-binding). I'm interested in getting involved in this project!
> > > > >
> > > > > On Fri, Dec 11, 2020 at 8:33 AM Christofer Dutz <
> > > > christofer.d...@c-ware.de
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > following up the [DISCUSS] thread on Wayang (
> > > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/r5fc03ae014f44c7c31a509a6db4ac07f
> aedb2e1c6245cd917b744826%40%3Cgeneral.incubator.apache.org%3E
> > > > > )
> > > > > > I would like to call a VOTE to accept Wayang Aka Rheem into
> > > > > > the
> > > Apache
> > > > > > Incubator.
> > > > > >
> > > > > > Please cast your vote:
> > > > > >
> > > > > >   [ ] +1, bring Wayang into the Incubator
> > > > > >   [ ] +0, I don't care either way
> > > > > >   [ ] -1, do not bring Wayang into the Incubator, because...
> > > > > >
> > > > > > The vote will open at least for 72 hours and only votes from
> > > > > > the
> > > > > Incubator
> > > > > > PMC are binding, but votes from everyone are welcome.
> > > > > >
> > > > > > Chris
> > > > > >
> > > > > > -
> > > > > >
> > > > > > Wayang Proposal (
> > > > > >
> > https://cwiki.apache.org/confluence/display/INCUBATOR/WayangProposal
> > > )
> > > > > >
> > > > > > == Abstract ==
> > > > > >
> > > > > > Wayang is a cross-platform data processing system that aims
> > > > > > at
> > > > decoupling
> > > > > > the business logic of data analytics applications from
> > > > > > concrete
> > data
> > > > > > processing platforms, such as Apache Flink or Apache Spark.
> Hence,
> > it
> > > > > tames
> > > > > > the complexity that arises from the "Cambrian explosion" of
> > > > > > novel
> > > data
> > > > > > processing platforms that we currently witness.
> > > > > >
> > > > > > Note that Wayang project is the Rheem project, but we have
> renamed
> > > the
> > > > > > project because of trademark issues.
> > > > > >
> > > > > > You can find the project web page at:
> > > > https://rheem-ecosystem.github.io/
> > > > > >
> > > > > > = Proposal =
> > > > > >
> > > > > > Wayang is a cross-platform system that provides an
> > > > > > abstraction
> over
> > > > data
> > > > > > processing platforms to free users from the burdens of (i)
> > performing
> > > > > > tedious and costly data migration a

Re: [VOTE] Accept Wayang into the Apache Incubator

2020-12-16 Thread Bernd Fondermann
+1

  Bernd


Re: [VOTE] Accept Wayang into the Apache Incubator

2020-12-13 Thread Xiangdong Huang
+1 (non-binding)
---
Xiangdong Huang
School of Software, Tsinghua University


On Sat, Dec 12, 2020 at 8:19 PM, Lars George wrote:

> +1 binding
>
> On Sat, Dec 12, 2020 at 2:24 AM Sheng Wu 
> wrote:
>
> > +1 binding
> >
> > Sheng Wu 吴晟
> > Twitter, wusheng1108
> >
> >
> > On Sat, Dec 12, 2020 at 5:59 AM, Byung-Gon Chun wrote:
> >
> > > +1 (binding)
> > >
> > > -Gon
> > >
> > > On Sat, Dec 12, 2020 at 2:35 AM Furkan KAMACI 
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > +1 (binding)
> > > >
> > > > Kind Regards,
> > > > Furkan KAMACI
> > > >
> > > > On 11 Dec 2020 Fri at 20:04 Daniel B. Widdis wrote:
> > > >
> > > > > +1 (non-binding). I'm interested in getting involved in this project!
> > > > >
> > > > > On Fri, Dec 11, 2020 at 8:33 AM Christofer Dutz <
> > > > christofer.d...@c-ware.de
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > following up the [DISCUSS] thread on Wayang (
> > > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/r5fc03ae014f44c7c31a509a6db4ac07faedb2e1c6245cd917b744826%40%3Cgeneral.incubator.apache.org%3E
> > > > > )
> > > > > > I would like to call a VOTE to accept Wayang Aka Rheem into the
> > > Apache
> > > > > > Incubator.
> > > > > >
> > > > > > Please cast your vote:
> > > > > >
> > > > > >   [ ] +1, bring Wayang into the Incubator
> > > > > >   [ ] +0, I don't care either way
> > > > > >   [ ] -1, do not bring Wayang into the Incubator, because...
> > > > > >
> > > > > > The vote will open at least for 72 hours and only votes from the
> > > > > Incubator
> > > > > > PMC are binding, but votes from everyone are welcome.
> > > > > >
> > > > > > Chris
> > > > > >
> > > > > > -
> > > > > >
> > > > > > Wayang Proposal (
> > > > > >
> > https://cwiki.apache.org/confluence/display/INCUBATOR/WayangProposal
> > > )
> > > > > >
> > > > > > == Abstract ==
> > > > > >
> > > > > > Wayang is a cross-platform data processing system that aims at
> > > > decoupling
> > > > > > the business logic of data analytics applications from concrete
> > data
> > > > > > processing platforms, such as Apache Flink or Apache Spark.
> Hence,
> > it
> > > > > tames
> > > > > > the complexity that arises from the "Cambrian explosion" of novel
> > > data
> > > > > > processing platforms that we currently witness.
> > > > > >
> > > > > > Note that Wayang project is the Rheem project, but we have
> renamed
> > > the
> > > > > > project because of trademark issues.
> > > > > >
> > > > > > You can find the project web page at:
> > > > https://rheem-ecosystem.github.io/
> > > > > >
> > > > > > = Proposal =
> > > > > >
> > > > > > Wayang is a cross-platform system that provides an abstraction
> over
> > > > data
> > > > > > processing platforms to free users from the burdens of (i)
> > performing
> > > > > > tedious and costly data migration and integration tasks to run
> > their
> > > > > > applications, and (ii) choosing the right data processing
> platforms
> > > for
> > > > > > their applications. To achieve this, Wayang: (1) provides an
> > > > abstraction
> > > > > on
> > > > > > top of existing data processing platforms that allows users to
> > > specify
> > > > > > their data analytics tasks in a form of a DAG of operators; (2)
> > comes
> > > > > with
> > > > > > a cross-platform optimizer for automating the selection of
> > > > > > suitable/efficient platforms; and (3) and finally takes care of
> > > > executing
> > > > > > the optimized plan, including communication across platforms. In
> > > > summary,
> > > > > > Wayang has the following salient features:
> > > > > >
> > > > > > - Flexible Data Model - It considers a flexible and simple data
> > model
> > > > > > based on data quanta. A data quantum is an atomic processing unit
> > in
> > > > the
> > > > > > system, that can represent a large spectrum of data formats, such
> > as
> > > > data
> > > > > > points for a machine learning application, tuples for a database
> > > > > > application, or RDF triples. Hence, Wayang is able to express a
> > wide
> > > > > range
> > > > > > of data analytics tasks.
> > > > > > - Platform independence - It provides a simple interface
> (currently
> > > > Java
> > > > > > and Scala) that is inspired by established programming models,
> such
> > > as
> > > > > that
> > > > > > of Apache Spark and Apache Flink. Users represent their data
> > analytic
> > > > > tasks
> > > > > > as a DAG (Wayang plan), where vertices correspond to Wayang
> > operators
> > > > and
> > > > > > edges represent data flows (data quanta flowing) among these
> > > > operators. A
> > > > > > Wayang operator defines a particular kind of data transformation
> > over
> > > > an
> > > > > > input data quantum, ranging from basic functionality (e.g.,
> > > > > > transformations, filters, joins) to complex, extensible tasks
> > (e.g.,
> > > > > > PageRank).
> > > > > > - Cross-platform execution - Besides running a data analytic task
> > on
> > > > any
> > > > > > data 

Re: [VOTE] Accept Wayang into the Apache Incubator

2020-12-12 Thread Lars George
+1 binding

On Sat, Dec 12, 2020 at 2:24 AM Sheng Wu  wrote:

> +1 binding
>
> Sheng Wu 吴晟
> Twitter, wusheng1108
>
>
> On Sat, Dec 12, 2020 at 5:59 AM, Byung-Gon Chun wrote:
>
> > +1 (binding)
> >
> > -Gon
> >
> > On Sat, Dec 12, 2020 at 2:35 AM Furkan KAMACI 
> > wrote:
> >
> > > Hi,
> > >
> > > +1 (binding)
> > >
> > > Kind Regards,
> > > Furkan KAMACI
> > >
> > > On 11 Dec 2020 Fri at 20:04 Daniel B. Widdis  wrote:
> > >
> > > > +1 (non-binding). I'm interested in getting involved in this project!

Re: [VOTE] Accept Wayang into the Apache Incubator

2020-12-11 Thread Sheng Wu
+1 binding

Sheng Wu 吴晟
Twitter, wusheng1108


On Sat, Dec 12, 2020 at 5:59 AM, Byung-Gon Chun wrote:

> +1 (binding)
>
> -Gon
>
> On Sat, Dec 12, 2020 at 2:35 AM Furkan KAMACI 
> wrote:
>
> > Hi,
> >
> > +1 (binding)
> >
> > Kind Regards,
> > Furkan KAMACI
> >
> > On 11 Dec 2020 Fri at 20:04 Daniel B. Widdis  wrote:
> >
> > > +1 (non-binding).  I'm interested in getting involved in this project!

Re: [VOTE] Accept Wayang into the Apache Incubator

2020-12-11 Thread Byung-Gon Chun
+1 (binding)

-Gon

On Sat, Dec 12, 2020 at 2:35 AM Furkan KAMACI 
wrote:

> Hi,
>
> +1 (binding)
>
> Kind Regards,
> Furkan KAMACI
>
> On 11 Dec 2020 Fri at 20:04 Daniel B. Widdis  wrote:
>
> > +1 (non-binding).  I'm interested in getting involved in this project!
> >
> > On Fri, Dec 11, 2020 at 8:33 AM Christofer Dutz <christofer.d...@c-ware.de>
> > wrote:
> >
> > > Hi all,
> > >
> > > following up the [DISCUSS] thread on Wayang (
> > > https://lists.apache.org/thread.html/r5fc03ae014f44c7c31a509a6db4ac07faedb2e1c6245cd917b744826%40%3Cgeneral.incubator.apache.org%3E)
> > > I would like to call a VOTE to accept Wayang Aka Rheem into the Apache
> > > Incubator.
> > >
> > > Please cast your vote:
> > >
> > >   [ ] +1, bring Wayang into the Incubator
> > >   [ ] +0, I don't care either way
> > >   [ ] -1, do not bring Wayang into the Incubator, because...
> > >
> > > The vote will be open for at least 72 hours and only votes from the Incubator
> > > PMC are binding, but votes from everyone are welcome.
> > >
> > > Chris
> > >
> > > -
> > >
> > > Wayang Proposal (
> > > https://cwiki.apache.org/confluence/display/INCUBATOR/WayangProposal)
> > >
> > > == Abstract ==
> > >
> > > Wayang is a cross-platform data processing system that aims at decoupling
> > > the business logic of data analytics applications from concrete data
> > > processing platforms, such as Apache Flink or Apache Spark. Hence, it tames
> > > the complexity that arises from the "Cambrian explosion" of novel data
> > > processing platforms that we currently witness.
> > >
> > > Note that the Wayang project is the Rheem project, but we have renamed the
> > > project because of trademark issues.
> > >
> > > You can find the project web page at: https://rheem-ecosystem.github.io/
> > >
> > > = Proposal =
> > >
> > > Wayang is a cross-platform system that provides an abstraction over data
> > > processing platforms to free users from the burdens of (i) performing
> > > tedious and costly data migration and integration tasks to run their
> > > applications, and (ii) choosing the right data processing platforms for
> > > their applications. To achieve this, Wayang: (1) provides an abstraction on
> > > top of existing data processing platforms that allows users to specify
> > > their data analytics tasks in a form of a DAG of operators; (2) comes with
> > > a cross-platform optimizer for automating the selection of
> > > suitable/efficient platforms; and (3) finally takes care of executing
> > > the optimized plan, including communication across platforms. In summary,
> > > Wayang has the following salient features:
> > >
> > > - Flexible Data Model - It considers a flexible and simple data model
> > > based on data quanta. A data quantum is an atomic processing unit in the
> > > system, that can represent a large spectrum of data formats, such as data
> > > points for a machine learning application, tuples for a database
> > > application, or RDF triples. Hence, Wayang is able to express a wide range
> > > of data analytics tasks.
> > > - Platform independence - It provides a simple interface (currently Java
> > > and Scala) that is inspired by established programming models, such as that
> > > of Apache Spark and Apache Flink. Users represent their data analytic tasks
> > > as a DAG (Wayang plan), where vertices correspond to Wayang operators and
> > > edges represent data flows (data quanta flowing) among these operators. A
> > > Wayang operator defines a particular kind of data transformation over an
> > > input data quantum, ranging from basic functionality (e.g.,
> > > transformations, filters, joins) to complex, extensible tasks (e.g.,
> > > PageRank).
> > > - Cross-platform execution - Besides running a data analytic task on any
> > > data processing platform, it also comes with an optimizer that can decide
> > > to execute a single data analytic task using multiple data processing
> > > platforms. This allows for exploiting the capabilities of different data
> > > processing platforms to perform complex data analytic tasks more
> > > efficiently.
> > > - Self-tuning UDF-based cost model - Its optimizer uses a cost model fully
> > > based on UDFs. This not only enables Wayang to learn the cost functions of
> > > newly added data processing platforms, but also allows developers to tune
> > > the optimizer at will.
> > > - Extensibility - It treats data processing platforms as plugins to allow
> > > users (developers) to easily incorporate new data processing platforms into
> > > the system. This is achieved by exposing the functionalities of data
> > > processing platforms as operators (execution operators). The same approach
> > > is followed at the Wayang interface, where users can also extend Wayang
> > > capabilities, i.e., the operators, easily.
> > >
> > > We plan to work on the stability of all these features as well as
> > > extending Wayang 

Re: [VOTE] Accept Wayang into the Apache Incubator

2020-12-11 Thread Furkan KAMACI
Hi,

+1 (binding)

Kind Regards,
Furkan KAMACI

On 11 Dec 2020 Fri at 20:04 Daniel B. Widdis  wrote:

> +1 (non-binding).  I'm interested in getting involved in this project!
>
> On Fri, Dec 11, 2020 at 8:33 AM Christofer Dutz
> wrote:
>
> > Hi all,
> >
> > following up the [DISCUSS] thread on Wayang (
> > https://lists.apache.org/thread.html/r5fc03ae014f44c7c31a509a6db4ac07faedb2e1c6245cd917b744826%40%3Cgeneral.incubator.apache.org%3E)
> > I would like to call a VOTE to accept Wayang Aka Rheem into the Apache
> > Incubator.
> >
> > Please cast your vote:
> >
> >   [ ] +1, bring Wayang into the Incubator
> >   [ ] +0, I don't care either way
> >   [ ] -1, do not bring Wayang into the Incubator, because...
> >
> > The vote will be open for at least 72 hours and only votes from the Incubator
> > PMC are binding, but votes from everyone are welcome.
> >
> > Chris
> >
> > -
> >
> > Wayang Proposal (
> > https://cwiki.apache.org/confluence/display/INCUBATOR/WayangProposal)
> >
> > == Abstract ==
> >
> > Wayang is a cross-platform data processing system that aims at decoupling
> > the business logic of data analytics applications from concrete data
> > processing platforms, such as Apache Flink or Apache Spark. Hence, it tames
> > the complexity that arises from the "Cambrian explosion" of novel data
> > processing platforms that we currently witness.
> >
> > Note that the Wayang project is the Rheem project, but we have renamed the
> > project because of trademark issues.
> >
> > You can find the project web page at: https://rheem-ecosystem.github.io/
> >
> > = Proposal =
> >
> > Wayang is a cross-platform system that provides an abstraction over data
> > processing platforms to free users from the burdens of (i) performing
> > tedious and costly data migration and integration tasks to run their
> > applications, and (ii) choosing the right data processing platforms for
> > their applications. To achieve this, Wayang: (1) provides an abstraction on
> > top of existing data processing platforms that allows users to specify
> > their data analytics tasks in a form of a DAG of operators; (2) comes with
> > a cross-platform optimizer for automating the selection of
> > suitable/efficient platforms; and (3) finally takes care of executing
> > the optimized plan, including communication across platforms. In summary,
> > Wayang has the following salient features:
> >
> > - Flexible Data Model - It considers a flexible and simple data model
> > based on data quanta. A data quantum is an atomic processing unit in the
> > system, that can represent a large spectrum of data formats, such as data
> > points for a machine learning application, tuples for a database
> > application, or RDF triples. Hence, Wayang is able to express a wide range
> > of data analytics tasks.
> > - Platform independence - It provides a simple interface (currently Java
> > and Scala) that is inspired by established programming models, such as that
> > of Apache Spark and Apache Flink. Users represent their data analytic tasks
> > as a DAG (Wayang plan), where vertices correspond to Wayang operators and
> > edges represent data flows (data quanta flowing) among these operators. A
> > Wayang operator defines a particular kind of data transformation over an
> > input data quantum, ranging from basic functionality (e.g.,
> > transformations, filters, joins) to complex, extensible tasks (e.g.,
> > PageRank).
> > - Cross-platform execution - Besides running a data analytic task on any
> > data processing platform, it also comes with an optimizer that can decide
> > to execute a single data analytic task using multiple data processing
> > platforms. This allows for exploiting the capabilities of different data
> > processing platforms to perform complex data analytic tasks more
> > efficiently.
> > - Self-tuning UDF-based cost model - Its optimizer uses a cost model fully
> > based on UDFs. This not only enables Wayang to learn the cost functions of
> > newly added data processing platforms, but also allows developers to tune
> > the optimizer at will.
> > - Extensibility - It treats data processing platforms as plugins to allow
> > users (developers) to easily incorporate new data processing platforms into
> > the system. This is achieved by exposing the functionalities of data
> > processing platforms as operators (execution operators). The same approach
> > is followed at the Wayang interface, where users can also extend Wayang
> > capabilities, i.e., the operators, easily.
> >
> > We plan to work on the stability of all these features as well as
> > extending Wayang with more advanced features. Furthermore, Wayang currently
> > supports Apache Spark, Standalone Java, GraphChi, relational databases (via
> > JDBC). We plan to incorporate more data processing platforms, such as
> > Apache Flink and Apache Hive.
> >
> > === Background ===
> >
> > Many organizations and companies collect or produce a large variety of data
> 

Re: [VOTE] Accept Wayang into the Apache Incubator

2020-12-11 Thread Daniel B. Widdis
+1 (non-binding).  I'm interested in getting involved in this project!

On Fri, Dec 11, 2020 at 8:33 AM Christofer Dutz 
wrote:

> Hi all,
>
> following up the [DISCUSS] thread on Wayang (
> https://lists.apache.org/thread.html/r5fc03ae014f44c7c31a509a6db4ac07faedb2e1c6245cd917b744826%40%3Cgeneral.incubator.apache.org%3E)
> I would like to call a VOTE to accept Wayang Aka Rheem into the Apache
> Incubator.
>
> Please cast your vote:
>
>   [ ] +1, bring Wayang into the Incubator
>   [ ] +0, I don't care either way
>   [ ] -1, do not bring Wayang into the Incubator, because...
>
> The vote will be open for at least 72 hours and only votes from the Incubator
> PMC are binding, but votes from everyone are welcome.
>
> Chris
>
> -
>
> Wayang Proposal (
> https://cwiki.apache.org/confluence/display/INCUBATOR/WayangProposal)
>
> == Abstract ==
>
> Wayang is a cross-platform data processing system that aims at decoupling
> the business logic of data analytics applications from concrete data
> processing platforms, such as Apache Flink or Apache Spark. Hence, it tames
> the complexity that arises from the "Cambrian explosion" of novel data
> processing platforms that we currently witness.
>
> Note that the Wayang project is the Rheem project, but we have renamed the
> project because of trademark issues.
>
> You can find the project web page at: https://rheem-ecosystem.github.io/
>
> = Proposal =
>
> Wayang is a cross-platform system that provides an abstraction over data
> processing platforms to free users from the burdens of (i) performing
> tedious and costly data migration and integration tasks to run their
> applications, and (ii) choosing the right data processing platforms for
> their applications. To achieve this, Wayang: (1) provides an abstraction on
> top of existing data processing platforms that allows users to specify
> their data analytics tasks in a form of a DAG of operators; (2) comes with
> a cross-platform optimizer for automating the selection of
> suitable/efficient platforms; and (3) finally takes care of executing
> the optimized plan, including communication across platforms. In summary,
> Wayang has the following salient features:
>
> - Flexible Data Model - It considers a flexible and simple data model
> based on data quanta. A data quantum is an atomic processing unit in the
> system, that can represent a large spectrum of data formats, such as data
> points for a machine learning application, tuples for a database
> application, or RDF triples. Hence, Wayang is able to express a wide range
> of data analytics tasks.
> - Platform independence - It provides a simple interface (currently Java
> and Scala) that is inspired by established programming models, such as that
> of Apache Spark and Apache Flink. Users represent their data analytic tasks
> as a DAG (Wayang plan), where vertices correspond to Wayang operators and
> edges represent data flows (data quanta flowing) among these operators. A
> Wayang operator defines a particular kind of data transformation over an
> input data quantum, ranging from basic functionality (e.g.,
> transformations, filters, joins) to complex, extensible tasks (e.g.,
> PageRank).
> - Cross-platform execution - Besides running a data analytic task on any
> data processing platform, it also comes with an optimizer that can decide
> to execute a single data analytic task using multiple data processing
> platforms. This allows for exploiting the capabilities of different data
> processing platforms to perform complex data analytic tasks more
> efficiently.
> - Self-tuning UDF-based cost model - Its optimizer uses a cost model fully
> based on UDFs. This not only enables Wayang to learn the cost functions of
> newly added data processing platforms, but also allows developers to tune
> the optimizer at will.
> - Extensibility - It treats data processing platforms as plugins to allow
> users (developers) to easily incorporate new data processing platforms into
> the system. This is achieved by exposing the functionalities of data
> processing platforms as operators (execution operators). The same approach
> is followed at the Wayang interface, where users can also extend Wayang
> capabilities, i.e., the operators, easily.
>
> We plan to work on the stability of all these features as well as
> extending Wayang with more advanced features. Furthermore, Wayang currently
> supports Apache Spark, Standalone Java, GraphChi, relational databases (via
> JDBC). We plan to incorporate more data processing platforms, such as
> Apache Flink and Apache Hive.
>
> === Background ===
>
> Many organizations and companies collect or produce a large variety of data
> to apply data analytics over them. This is because insights from data
> rapidly allow them to make better decisions. Thus, the pursuit for
> efficient and scalable data analytics as well as the
> one-size-does-not-fit-all philosophy has given rise to a plethora of data
> processing platforms. Examples of these specialized 

Re: [VOTE] Accept Wayang into the Apache Incubator

2020-12-11 Thread Jean-Baptiste Onofre
+1 (binding)

Regards
JB

> On 11 Dec 2020, at 17:33, Christofer Dutz wrote:
> 
> Hi all,
> 
> following up the [DISCUSS] thread on Wayang 
> (https://lists.apache.org/thread.html/r5fc03ae014f44c7c31a509a6db4ac07faedb2e1c6245cd917b744826%40%3Cgeneral.incubator.apache.org%3E)
>  I would like to call a VOTE to accept Wayang Aka Rheem into the Apache 
> Incubator.
> 
> Please cast your vote:
> 
>  [ ] +1, bring Wayang into the Incubator
>  [ ] +0, I don't care either way
>  [ ] -1, do not bring Wayang into the Incubator, because...
> 
> The vote will be open for at least 72 hours and only votes from the Incubator 
> PMC are binding, but votes from everyone are welcome.
> 
> Chris
> 
> -
> 
> Wayang Proposal 
> (https://cwiki.apache.org/confluence/display/INCUBATOR/WayangProposal)
> 
> == Abstract ==
> 
> Wayang is a cross-platform data processing system that aims at decoupling the 
> business logic of data analytics applications from concrete data processing 
> platforms, such as Apache Flink or Apache Spark. Hence, it tames the 
> complexity that arises from the "Cambrian explosion" of novel data processing 
> platforms that we currently witness.
> 
> Note that the Wayang project is the Rheem project, but we have renamed the 
> project because of trademark issues.
> 
> You can find the project web page at: https://rheem-ecosystem.github.io/
> 
> = Proposal =
> 
> Wayang is a cross-platform system that provides an abstraction over data 
> processing platforms to free users from the burdens of (i) performing tedious 
> and costly data migration and integration tasks to run their applications, 
> and (ii) choosing the right data processing platforms for their applications. 
> To achieve this, Wayang: (1) provides an abstraction on top of existing data 
> processing platforms that allows users to specify their data analytics tasks 
> in a form of a DAG of operators; (2) comes with a cross-platform optimizer 
> for automating the selection of suitable/efficient platforms; and (3) 
> finally takes care of executing the optimized plan, including communication 
> across platforms. In summary, Wayang has the following salient features:
> 
> - Flexible Data Model - It considers a flexible and simple data model based 
> on data quanta. A data quantum is an atomic processing unit in the system, 
> that can represent a large spectrum of data formats, such as data points for 
> a machine learning application, tuples for a database application, or RDF 
> triples. Hence, Wayang is able to express a wide range of data analytics 
> tasks.
> - Platform independence - It provides a simple interface (currently Java and 
> Scala) that is inspired by established programming models, such as that of 
> Apache Spark and Apache Flink. Users represent their data analytic tasks as a 
> DAG (Wayang plan), where vertices correspond to Wayang operators and edges 
> represent data flows (data quanta flowing) among these operators. A Wayang 
> operator defines a particular kind of data transformation over an input data 
> quantum, ranging from basic functionality (e.g., transformations, filters, 
> joins) to complex, extensible tasks (e.g., PageRank).
> - Cross-platform execution - Besides running a data analytic task on any data 
> processing platform, it also comes with an optimizer that can decide to 
> execute a single data analytic task using multiple data processing platforms. 
> This allows for exploiting the capabilities of different data processing 
> platforms to perform complex data analytic tasks more efficiently.
> - Self-tuning UDF-based cost model - Its optimizer uses a cost model fully 
> based on UDFs. This not only enables Wayang to learn the cost functions of 
> newly added data processing platforms, but also allows developers to tune the 
> optimizer at will.
> - Extensibility - It treats data processing platforms as plugins to allow 
> users (developers) to easily incorporate new data processing platforms into 
> the system. This is achieved by exposing the functionalities of data 
> processing platforms as operators (execution operators). The same approach is 
> followed at the Wayang interface, where users can also extend Wayang 
> capabilities, i.e., the operators, easily.
> 
> We plan to work on the stability of all these features as well as extending 
> Wayang with more advanced features. Furthermore, Wayang currently supports 
> Apache Spark, Standalone Java, GraphChi, relational databases (via JDBC). We 
> plan to incorporate more data processing platforms, such as Apache Flink and 
> Apache Hive.
> 
> === Background ===
> 
> Many organizations and companies collect or produce a large variety of data to 
> apply data analytics over them. This is because insights from data rapidly 
> allow them to make better decisions. Thus, the pursuit for efficient and 
> scalable data analytics as well as the one-size-does-not-fit-all philosophy 
> has given rise to a plethora of data processing platforms. Examples of these 
> 

Re: [VOTE] Accept Wayang into the Apache Incubator

2020-12-11 Thread Alexander Alten-Lorenz
+1 (non-binding)

On Fri, Dec 11, 2020 at 5:40 PM Kevin Ratnasekera
 wrote:
>
> +1 (binding)
>
> On Fri, Dec 11, 2020 at 10:05 PM Dave Fisher  wrote:
>
> > +1 (binding)
> >
> > Sent from my iPhone
> >
> > > On Dec 11, 2020, at 8:33 AM, Christofer Dutz 
> > wrote:
> > >
> > > Hi all,
> > >
> > > following up the [DISCUSS] thread on Wayang (
> > https://lists.apache.org/thread.html/r5fc03ae014f44c7c31a509a6db4ac07faedb2e1c6245cd917b744826%40%3Cgeneral.incubator.apache.org%3E)
> > I would like to call a VOTE to accept Wayang Aka Rheem into the Apache
> > Incubator.
> > >
> > > Please cast your vote:
> > >
> > >  [ ] +1, bring Wayang into the Incubator
> > >  [ ] +0, I don't care either way
> > >  [ ] -1, do not bring Wayang into the Incubator, because...
> > >
> > > > The vote will be open for at least 72 hours and only votes from the
> > Incubator PMC are binding, but votes from everyone are welcome.
> > >
> > > Chris
> > >
> > > -
> > >
> > > Wayang Proposal (
> > https://cwiki.apache.org/confluence/display/INCUBATOR/WayangProposal)
> > >
> > > == Abstract ==
> > >
> > > Wayang is a cross-platform data processing system that aims at
> > decoupling the business logic of data analytics applications from concrete
> > data processing platforms, such as Apache Flink or Apache Spark. Hence, it
> > tames the complexity that arises from the "Cambrian explosion" of novel
> > data processing platforms that we currently witness.
> > >
> > > > Note that the Wayang project is the Rheem project, but we have renamed the
> > project because of trademark issues.
> > >
> > > You can find the project web page at: https://rheem-ecosystem.github.io/
> > >
> > > = Proposal =
> > >
> > > Wayang is a cross-platform system that provides an abstraction over data
> > processing platforms to free users from the burdens of (i) performing
> > tedious and costly data migration and integration tasks to run their
> > applications, and (ii) choosing the right data processing platforms for
> > their applications. To achieve this, Wayang: (1) provides an abstraction on
> > top of existing data processing platforms that allows users to specify
> > their data analytics tasks in a form of a DAG of operators; (2) comes with
> > a cross-platform optimizer for automating the selection of
> > > suitable/efficient platforms; and (3) finally takes care of executing
> > the optimized plan, including communication across platforms. In summary,
> > Wayang has the following salient features:
> > >
> > > - Flexible Data Model - It considers a flexible and simple data model
> > based on data quanta. A data quantum is an atomic processing unit in the
> > system, that can represent a large spectrum of data formats, such as data
> > points for a machine learning application, tuples for a database
> > application, or RDF triples. Hence, Wayang is able to express a wide range
> > of data analytics tasks.
> > > - Platform independence - It provides a simple interface (currently Java
> > and Scala) that is inspired by established programming models, such as that
> > of Apache Spark and Apache Flink. Users represent their data analytic tasks
> > as a DAG (Wayang plan), where vertices correspond to Wayang operators and
> > edges represent data flows (data quanta flowing) among these operators. A
> > Wayang operator defines a particular kind of data transformation over an
> > input data quantum, ranging from basic functionality (e.g.,
> > transformations, filters, joins) to complex, extensible tasks (e.g.,
> > PageRank).
> > > - Cross-platform execution - Besides running a data analytic task on any
> > data processing platform, it also comes with an optimizer that can decide
> > to execute a single data analytic task using multiple data processing
> > platforms. This allows for exploiting the capabilities of different data
> > processing platforms to perform complex data analytic tasks more
> > efficiently.
> > > > - Self-tuning UDF-based cost model - Its optimizer uses a cost model fully
> > based on UDFs. This not only enables Wayang to learn the cost functions of
> > newly added data processing platforms, but also allows developers to tune
> > the optimizer at will.
> > > - Extensibility - It treats data processing platforms as plugins to
> > allow users (developers) to easily incorporate new data processing
> > platforms into the system. This is achieved by exposing the functionalities
> > of data processing platforms as operators (execution operators). The same
> > approach is followed at the Wayang interface, where users can also extend
> > Wayang capabilities, i.e., the operators, easily.
> > >
> > > We plan to work on the stability of all these features as well as
> > extending Wayang with more advanced features. Furthermore, Wayang currently
> > supports Apache Spark, Standalone Java, GraphChi, relational databases (via
> > JDBC). We plan to incorporate more data processing platforms, such as
> > Apache Flink and Apache Hive.
> > >
> > > === Background ===
> > >
> > 

Re: [VOTE] Accept Wayang into the Apache Incubator

2020-12-11 Thread Kevin Ratnasekera
+1 (binding)

On Fri, Dec 11, 2020 at 10:05 PM Dave Fisher  wrote:

> +1 (binding)
>
> Sent from my iPhone
>
> > On Dec 11, 2020, at 8:33 AM, Christofer Dutz 
> wrote:
> >
> > Hi all,
> >
> > following up the [DISCUSS] thread on Wayang (
> https://lists.apache.org/thread.html/r5fc03ae014f44c7c31a509a6db4ac07faedb2e1c6245cd917b744826%40%3Cgeneral.incubator.apache.org%3E)
> I would like to call a VOTE to accept Wayang Aka Rheem into the Apache
> Incubator.
> >
> > Please cast your vote:
> >
> >  [ ] +1, bring Wayang into the Incubator
> >  [ ] +0, I don't care either way
> >  [ ] -1, do not bring Wayang into the Incubator, because...
> >
> > > The vote will be open for at least 72 hours and only votes from the
> Incubator PMC are binding, but votes from everyone are welcome.
> >
> > Chris
> >
> > -
> >
> > Wayang Proposal (
> https://cwiki.apache.org/confluence/display/INCUBATOR/WayangProposal)
> >
> > == Abstract ==
> >
> > Wayang is a cross-platform data processing system that aims at
> decoupling the business logic of data analytics applications from concrete
> data processing platforms, such as Apache Flink or Apache Spark. Hence, it
> tames the complexity that arises from the "Cambrian explosion" of novel
> data processing platforms that we currently witness.
> >
> > > Note that the Wayang project is the Rheem project, but we have renamed the
> project because of trademark issues.
> >
> > You can find the project web page at: https://rheem-ecosystem.github.io/
> >
> > = Proposal =
> >
> > Wayang is a cross-platform system that provides an abstraction over data
> processing platforms to free users from the burdens of (i) performing
> tedious and costly data migration and integration tasks to run their
> applications, and (ii) choosing the right data processing platforms for
> their applications. To achieve this, Wayang: (1) provides an abstraction on
> top of existing data processing platforms that allows users to specify
> their data analytics tasks in a form of a DAG of operators; (2) comes with
> a cross-platform optimizer for automating the selection of
> suitable/efficient platforms; and (3) finally takes care of executing
> the optimized plan, including communication across platforms. In summary,
> Wayang has the following salient features:
> >
> > - Flexible Data Model - It considers a flexible and simple data model
> based on data quanta. A data quantum is an atomic processing unit in the
> system, that can represent a large spectrum of data formats, such as data
> points for a machine learning application, tuples for a database
> application, or RDF triples. Hence, Wayang is able to express a wide range
> of data analytics tasks.
> > - Platform independence - It provides a simple interface (currently Java
> and Scala) that is inspired by established programming models, such as that
> of Apache Spark and Apache Flink. Users represent their data analytic tasks
> as a DAG (Wayang plan), where vertices correspond to Wayang operators and
> edges represent data flows (data quanta flowing) among these operators. A
> Wayang operator defines a particular kind of data transformation over an
> input data quantum, ranging from basic functionality (e.g.,
> transformations, filters, joins) to complex, extensible tasks (e.g.,
> PageRank).
> > - Cross-platform execution - Besides running a data analytic task on any
> data processing platform, it also comes with an optimizer that can decide
> to execute a single data analytic task using multiple data processing
> platforms. This allows for exploiting the capabilities of different data
> processing platforms to perform complex data analytic tasks more
> efficiently.
> > - Self-tuning UDF-based cost model - Its optimizer uses a cost model fully
> based on UDFs. This not only enables Wayang to learn the cost functions of
> newly added data processing platforms, but also allows developers to tune
> the optimizer at will.
> > - Extensibility - It treats data processing platforms as plugins to
> allow users (developers) to easily incorporate new data processing
> platforms into the system. This is achieved by exposing the functionalities
> of data processing platforms as operators (execution operators). The same
> approach is followed at the Wayang interface, where users can also extend
> Wayang capabilities, i.e., the operators, easily.
> >
> > We plan to work on the stability of all these features as well as
> extending Wayang with more advanced features. Furthermore, Wayang currently
> supports Apache Spark, Standalone Java, GraphChi, relational databases (via
> JDBC). We plan to incorporate more data processing platforms, such as
> Apache Flink and Apache Hive.
> >
> > === Background ===
> >
> > Many organizations and companies collect or produce a large variety of
> data to apply data analytics over it. This is because insights from data
> allow them to rapidly make better decisions. Thus, the pursuit of
> efficient and scalable data analytics as well as the
> 

Re: [VOTE] Accept Wayang into the Apache Incubator

2020-12-11 Thread Dave Fisher
+1 (binding)

> On Dec 11, 2020, at 8:33 AM, Christofer Dutz wrote:
> 
> Hi all,
> 
> following up on the [DISCUSS] thread on Wayang
> (https://lists.apache.org/thread.html/r5fc03ae014f44c7c31a509a6db4ac07faedb2e1c6245cd917b744826%40%3Cgeneral.incubator.apache.org%3E),
> I would like to call a VOTE to accept Wayang (aka Rheem) into the Apache
> Incubator.
> 
> Please cast your vote:
> 
>  [ ] +1, bring Wayang into the Incubator
>  [ ] +0, I don't care either way
>  [ ] -1, do not bring Wayang into the Incubator, because...
> 
> The vote will be open for at least 72 hours and only votes from the Incubator
> PMC are binding, but votes from everyone are welcome.
> 
> Chris
> 
> -
> 
> Wayang Proposal 
> (https://cwiki.apache.org/confluence/display/INCUBATOR/WayangProposal)
> 
> == Abstract ==
> 
> Wayang is a cross-platform data processing system that aims at decoupling the 
> business logic of data analytics applications from concrete data processing 
> platforms, such as Apache Flink or Apache Spark. Hence, it tames the 
> complexity that arises from the "Cambrian explosion" of novel data processing 
> platforms that we currently witness.
> 
> Note that the Wayang project is the Rheem project, which we have renamed
> because of trademark issues.
> 
> You can find the project web page at: https://rheem-ecosystem.github.io/
> 
> = Proposal =
> 
> Wayang is a cross-platform system that provides an abstraction over data 
> processing platforms to free users from the burdens of (i) performing tedious 
> and costly data migration and integration tasks to run their applications, 
> and (ii) choosing the right data processing platforms for their applications. 
> To achieve this, Wayang: (1) provides an abstraction on top of existing data 
> processing platforms that allows users to specify their data analytics tasks 
> in the form of a DAG of operators; (2) comes with a cross-platform optimizer
> for automating the selection of suitable/efficient platforms; and (3)
> finally takes care of executing the optimized plan, including communication 
> across platforms. In summary, Wayang has the following salient features:
> 
> - Flexible Data Model - It considers a flexible and simple data model based 
> on data quanta. A data quantum is an atomic processing unit in the system
> that can represent a large spectrum of data formats, such as data points for 
> a machine learning application, tuples for a database application, or RDF 
> triples. Hence, Wayang is able to express a wide range of data analytics 
> tasks.
> - Platform independence - It provides a simple interface (currently Java and 
> Scala) that is inspired by established programming models, such as those of
> Apache Spark and Apache Flink. Users represent their data analytic tasks as a 
> DAG (Wayang plan), where vertices correspond to Wayang operators and edges 
> represent data flows (data quanta flowing) among these operators. A Wayang 
> operator defines a particular kind of data transformation over an input data 
> quantum, ranging from basic functionality (e.g., transformations, filters, 
> joins) to complex, extensible tasks (e.g., PageRank).
> - Cross-platform execution - Besides running a data analytic task on any data 
> processing platform, it also comes with an optimizer that can decide to 
> execute a single data analytic task using multiple data processing platforms. 
> This allows for exploiting the capabilities of different data processing 
> platforms to perform complex data analytic tasks more efficiently.
> - Self-tuning UDF-based cost model - Its optimizer uses a cost model fully
> based on UDFs. This not only enables Wayang to learn the cost functions of 
> newly added data processing platforms, but also allows developers to tune the 
> optimizer at will.
> - Extensibility - It treats data processing platforms as plugins to allow 
> users (developers) to easily incorporate new data processing platforms into 
> the system. This is achieved by exposing the functionalities of data 
> processing platforms as operators (execution operators). The same approach is 
> followed at the Wayang interface, where users can also extend Wayang 
> capabilities, i.e., the operators, easily.
> 
> We plan to work on the stability of all these features as well as on extending
> Wayang with more advanced features. Furthermore, Wayang currently supports
> Apache Spark, Standalone Java, GraphChi, and relational databases (via JDBC). We
> plan to incorporate more data processing platforms, such as Apache Flink and 
> Apache Hive.
> 
> === Background ===
> 
> Many organizations and companies collect or produce a large variety of data to
> apply data analytics over it. This is because insights from data allow them
> to rapidly make better decisions. Thus, the pursuit of efficient and
> scalable data analytics as well as the one-size-does-not-fit-all philosophy 
> has given rise to a plethora of data processing platforms. 

[VOTE] Accept Wayang into the Apache Incubator

2020-12-11 Thread Christofer Dutz
Hi all,

following up on the [DISCUSS] thread on Wayang
(https://lists.apache.org/thread.html/r5fc03ae014f44c7c31a509a6db4ac07faedb2e1c6245cd917b744826%40%3Cgeneral.incubator.apache.org%3E),
I would like to call a VOTE to accept Wayang (aka Rheem) into the Apache
Incubator.

Please cast your vote:

  [ ] +1, bring Wayang into the Incubator
  [ ] +0, I don't care either way
  [ ] -1, do not bring Wayang into the Incubator, because...

The vote will be open for at least 72 hours and only votes from the Incubator PMC
are binding, but votes from everyone are welcome.

Chris

-

Wayang Proposal 
(https://cwiki.apache.org/confluence/display/INCUBATOR/WayangProposal)

== Abstract ==

Wayang is a cross-platform data processing system that aims at decoupling the 
business logic of data analytics applications from concrete data processing 
platforms, such as Apache Flink or Apache Spark. Hence, it tames the complexity 
that arises from the "Cambrian explosion" of novel data processing platforms 
that we currently witness.

Note that the Wayang project is the Rheem project, which we have renamed
because of trademark issues.

You can find the project web page at: https://rheem-ecosystem.github.io/

= Proposal =

Wayang is a cross-platform system that provides an abstraction over data 
processing platforms to free users from the burdens of (i) performing tedious 
and costly data migration and integration tasks to run their applications, and 
(ii) choosing the right data processing platforms for their applications. To 
achieve this, Wayang: (1) provides an abstraction on top of existing data 
processing platforms that allows users to specify their data analytics tasks in 
the form of a DAG of operators; (2) comes with a cross-platform optimizer for
automating the selection of suitable/efficient platforms; and (3) finally
takes care of executing the optimized plan, including communication across 
platforms. In summary, Wayang has the following salient features:

- Flexible Data Model - It considers a flexible and simple data model based on 
data quanta. A data quantum is an atomic processing unit in the system that
can represent a large spectrum of data formats, such as data points for a 
machine learning application, tuples for a database application, or RDF 
triples. Hence, Wayang is able to express a wide range of data analytics tasks.
- Platform independence - It provides a simple interface (currently Java and 
Scala) that is inspired by established programming models, such as those of
Apache Spark and Apache Flink. Users represent their data analytic tasks as a 
DAG (Wayang plan), where vertices correspond to Wayang operators and edges 
represent data flows (data quanta flowing) among these operators. A Wayang 
operator defines a particular kind of data transformation over an input data 
quantum, ranging from basic functionality (e.g., transformations, filters, 
joins) to complex, extensible tasks (e.g., PageRank).
- Cross-platform execution - Besides running a data analytic task on any data 
processing platform, it also comes with an optimizer that can decide to execute 
a single data analytic task using multiple data processing platforms. This 
allows for exploiting the capabilities of different data processing platforms 
to perform complex data analytic tasks more efficiently.
- Self-tuning UDF-based cost model - Its optimizer uses a cost model fully based
on UDFs. This not only enables Wayang to learn the cost functions of newly 
added data processing platforms, but also allows developers to tune the 
optimizer at will.
- Extensibility - It treats data processing platforms as plugins to allow users 
(developers) to easily incorporate new data processing platforms into the 
system. This is achieved by exposing the functionalities of data processing 
platforms as operators (execution operators). The same approach is followed at 
the Wayang interface, where users can also extend Wayang capabilities, i.e., 
the operators, easily.
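
To make the cross-platform execution and UDF-based cost model features above
more concrete, here is a minimal, hypothetical sketch in Python. It is not
Wayang's API or its actual optimizer: the operator names, platform names, cost
functions, and the fixed switch_cost are all made-up assumptions, and the plan
is simplified to a linear pipeline rather than a general DAG. It only
illustrates the idea of picking a platform per operator from user-supplied
cost functions.

```python
from typing import Callable, Dict, List, Tuple

# A cost UDF maps an estimated input cardinality to an estimated cost.
CostUdf = Callable[[int], float]

# Made-up cost functions per operator and platform (assumptions for illustration only).
COSTS: Dict[str, Dict[str, CostUdf]] = {
    "read": {"java": lambda n: 0.001 * n, "spark": lambda n: 5.0 + 0.0001 * n},
    "filter": {"java": lambda n: 0.001 * n, "spark": lambda n: 5.0 + 0.0001 * n},
    "pagerank": {"java": lambda n: 0.01 * n, "spark": lambda n: 10.0 + 0.0002 * n},
}


def choose_platforms(
    pipeline: List[str],
    costs: Dict[str, Dict[str, CostUdf]],
    cardinality: int,
    switch_cost: float,
) -> Tuple[float, List[str]]:
    """Return (estimated total cost, platform per operator) for a linear pipeline."""
    # best[p] = cheapest partial plan whose last operator runs on platform p
    best: Dict[str, Tuple[float, List[str]]] = {}
    for i, op in enumerate(pipeline):
        nxt: Dict[str, Tuple[float, List[str]]] = {}
        for platform, udf in costs[op].items():
            op_cost = udf(cardinality)
            if i == 0:
                nxt[platform] = (op_cost, [platform])
                continue
            candidates = []
            for prev, (cost_so_far, path) in best.items():
                # Moving data between platforms is charged a flat, assumed cost.
                move = 0.0 if prev == platform else switch_cost
                candidates.append((cost_so_far + move + op_cost, path + [platform]))
            nxt[platform] = min(candidates, key=lambda t: t[0])
        best = nxt
    return min(best.values(), key=lambda t: t[0])


if __name__ == "__main__":
    for n in (1_000, 3_000, 1_000_000):
        cost, plan = choose_platforms(["read", "filter", "pagerank"], COSTS, n, 2.0)
        print(n, plan, round(cost, 2))
```

With these made-up numbers, small inputs stay on a single platform, while
medium-sized inputs can yield a mixed plan because the per-operator savings
outweigh the assumed cost of moving data between platforms.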

We plan to work on the stability of all these features as well as on extending
Wayang with more advanced features. Furthermore, Wayang currently supports
Apache Spark, Standalone Java, GraphChi, and relational databases (via JDBC). We
plan to incorporate more data processing platforms, such as Apache Flink and 
Apache Hive.

=== Background ===

Many organizations and companies collect or produce a large variety of data to
apply data analytics over it. This is because insights from data allow them
to rapidly make better decisions. Thus, the pursuit of efficient and
scalable data analytics as well as the one-size-does-not-fit-all philosophy has 
given rise to a plethora of data processing platforms. Examples of these 
specialized processing platforms range from DBMSs to MapReduce-like platforms.

However, today's data analytics are moving beyond the limits of a single data 
processing platform. More and more applications need to perform complex data 
analytics over several data