Re: [VOTE] Accept Apex into the Apache Incubator
This vote is now closed and passes with 15 binding +1 votes, 9 non-binding +1 votes, and no 0 or -1 votes.

Vote tally (* indicates a binding vote):

+1:
Pramod Immaneni
Chris Nauroth*
Gaurav Gupta
Julian Hyde*
Seetharam Venkatesh
Hitesh Shah*
Ted Dunning*
Alan Gates*
P. Taylor Goetz*
Henry Saputra*
Ashwin Chandra Putta
David Yan
John D. Ament*
Amol Kekre
Luke Han
Atri Sharma
Chris Douglas*
Justin Mclean*
Naresh Agarwal
Bertrand Delacretaz*
Jan Iversen*
Amareshwari Sriramdasu*
Roman Shaposhnik*
Niall Pemberton*

0: -none-

-1: -none-

Thank you to all who voted.

-Taylor

On Aug 13, 2015, at 10:48 AM, P. Taylor Goetz ptgo...@apache.org wrote:

Following the discussion thread [1], I would like to call a VOTE for Accepting Apex as a new Apache Incubator project.

The proposal is available on the wiki [2] and is also attached below.

The VOTE will be open for at least 72 hours.

[ ] +1 Accept Apex into the Incubator
[ ] ±0 No opinion
[ ] -1 Do not accept Apex into the Incubator because…

Thanks,
-Taylor

[1] http://s.apache.org/apex_discuss
[2] https://wiki.apache.org/incubator/ApexProposal

== Abstract ==

Apex is an enterprise grade native YARN big data-in-motion platform that unifies stream processing as well as batch processing. Apex processes big data in-motion in a highly scalable, highly performant, fault tolerant, stateful, secure, distributed, and easily operable way. It provides a simple API that enables users to write or re-use generic Java code, thereby lowering the expertise needed to write big data applications.

Functional and operational specifications are separated. Apex is designed in a way to enable users to write their own code (aka user defined functions) as is and leave all operability to the platform. The API is very simple and is designed to allow users to drop in their code as is. The platform mainly deals with operability and treats functional code as a black box. Operability includes fault tolerance, scalability, security, ease of use, metrics API, web services, etc. In other words, there is no separation of UDF (user defined functions), as all functional code is UDF. This frees users to focus on functional development, and lets the platform provide operability support. The same code runs as is with different operability attributes. The data-in-motion architecture of Apex unifies stream as well as batch processing in a single platform. Since Apex is a native YARN application, it leverages all the components of YARN without duplication. Apex was developed with YARN in mind and has no overlapping components/functionality with YARN.

The Apex platform is supplemented by project Malhar, which is a library of operators that implement common business logic functions needed by customers who want to quickly develop applications. These operators provide access to HDFS, S3, NFS, FTP, and other file systems; Kafka, ActiveMQ, RabbitMQ, JMS, and other message systems; MySQL, Cassandra, MongoDB, Redis, HBase, CouchDB and other databases along with JDBC connectors. The Malhar library also includes a host of other common business logic patterns that help users to significantly reduce the time it takes to go into production. Ease of integration with all other big data technologies is one of the primary missions of Malhar.

== Proposal ==

The goal of this proposal is to establish the core engine of the DataTorrent RTS product as an Apache Software Foundation (ASF) project in order to build a vibrant, diverse, and self-governed open source community around the technology. DataTorrent will continue to sell management tools, application building tools, easy to use big data applications, and custom high end business logic operators. This proposal covers the Apex source code (written in Java), Apex documentation and other materials currently available on https://github.com/DataTorrent/Apex. This proposal also covers the Malhar source code (written in Java), Malhar documentation, and other materials currently available on https://github.com/DataTorrent/Malhar. We have done a trademark check on the name Apex, and have concluded that the Apex name is likely to be a suitable project name.

== Background ==

DataTorrent RTS is a mature and robust product developed as a native YARN application. RTS 1.0 was launched in the summer of 2014; RTS 2.0 was launched in January 2015. Both were well received by customers. RTS 3.0 was launched at the end of July 2015. RTS is among the first enterprise grade platforms developed from the ground up as a native YARN application. DataTorrent RTS is currently maintained by engineers as a closed source project. Even though the engineers behind RTS are experienced software engineers and are knowledge leaders
Re: [VOTE] Accept Apex into the Apache Incubator
+1 (binding) have fun jan i On Friday, August 14, 2015, Bertrand Delacretaz bdelacre...@apache.org wrote: On Thu, Aug 13, 2015 at 4:48 PM, P. Taylor Goetz ptgo...@apache.org wrote: Following the discussion thread [1], I would like to call a VOTE for Accepting Apex as a new Apache Incubator project. +1, binding -Bertrand - To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org
Re: [VOTE] Accept Apex into the Apache Incubator
On Thu, Aug 13, 2015 at 4:48 PM, P. Taylor Goetz ptgo...@apache.org wrote: Following the discussion thread [1], I would like to call a VOTE for Accepting Apex as a new Apache Incubator project. +1, binding -Bertrand
Re: [VOTE] Accept Apex into the Apache Incubator
+1 binding

On Thu, Aug 13, 2015 at 8:18 PM, P. Taylor Goetz ptgo...@apache.org wrote:

Following the discussion thread [1], I would like to call a VOTE for Accepting Apex as a new Apache Incubator project.

The proposal is available on the wiki [2] and is also attached below.

The VOTE will be open for at least 72 hours.

[ ] +1 Accept Apex into the Incubator
[ ] ±0 No opinion
[ ] -1 Do not accept Apex into the Incubator because…

Thanks,
-Taylor

[1] http://s.apache.org/apex_discuss
[2] https://wiki.apache.org/incubator/ApexProposal

== Abstract ==

Apex is an enterprise grade native YARN big data-in-motion platform that unifies stream processing as well as batch processing. Apex processes big data in-motion in a highly scalable, highly performant, fault tolerant, stateful, secure, distributed, and easily operable way. It provides a simple API that enables users to write or re-use generic Java code, thereby lowering the expertise needed to write big data applications.

Functional and operational specifications are separated. Apex is designed in a way to enable users to write their own code (aka user defined functions) as is and leave all operability to the platform. The API is very simple and is designed to allow users to drop in their code as is. The platform mainly deals with operability and treats functional code as a black box. Operability includes fault tolerance, scalability, security, ease of use, metrics API, web services, etc. In other words, there is no separation of UDF (user defined functions), as all functional code is UDF. This frees users to focus on functional development, and lets the platform provide operability support. The same code runs as is with different operability attributes. The data-in-motion architecture of Apex unifies stream as well as batch processing in a single platform. Since Apex is a native YARN application, it leverages all the components of YARN without duplication. Apex was developed with YARN in mind and has no overlapping components/functionality with YARN.

The Apex platform is supplemented by project Malhar, which is a library of operators that implement common business logic functions needed by customers who want to quickly develop applications. These operators provide access to HDFS, S3, NFS, FTP, and other file systems; Kafka, ActiveMQ, RabbitMQ, JMS, and other message systems; MySQL, Cassandra, MongoDB, Redis, HBase, CouchDB and other databases along with JDBC connectors. The Malhar library also includes a host of other common business logic patterns that help users to significantly reduce the time it takes to go into production. Ease of integration with all other big data technologies is one of the primary missions of Malhar.

== Proposal ==

The goal of this proposal is to establish the core engine of the DataTorrent RTS product as an Apache Software Foundation (ASF) project in order to build a vibrant, diverse, and self-governed open source community around the technology. DataTorrent will continue to sell management tools, application building tools, easy to use big data applications, and custom high end business logic operators. This proposal covers the Apex source code (written in Java), Apex documentation and other materials currently available on https://github.com/DataTorrent/Apex. This proposal also covers the Malhar source code (written in Java), Malhar documentation, and other materials currently available on https://github.com/DataTorrent/Malhar. We have done a trademark check on the name Apex, and have concluded that the Apex name is likely to be a suitable project name.

== Background ==

DataTorrent RTS is a mature and robust product developed as a native YARN application. RTS 1.0 was launched in the summer of 2014; RTS 2.0 was launched in January 2015. Both were well received by customers. RTS 3.0 was launched at the end of July 2015. RTS is among the first enterprise grade platforms developed from the ground up as a native YARN application. DataTorrent RTS is currently maintained by engineers as a closed source project. Even though the engineers behind RTS are experienced software engineers and are knowledge leaders in data-in-motion platforms, they have had little exposure to the open source governance process. Customers are currently running applications based on DataTorrent RTS in production.

== Rationale ==

Big data applications written for non-Hadoop platforms typically require major rewrites to get them to work with Hadoop. This rewriting creates a significant bottleneck in terms of resources (expertise), which in turn jeopardizes the viability of such an endeavour. It is hard enough to acquire big data expertise; demanding additional expertise to do a major code conversion makes it very hard for projects to migrate to Hadoop successfully. Also, due to the batch processing nature of Hadoop’s MapReduce paradigm, users often have to wait tens of minutes to see
Re: [VOTE] Accept Apex into the Apache Incubator
On Thu, Aug 13, 2015 at 7:48 AM, P. Taylor Goetz ptgo...@apache.org wrote: Following the discussion thread [1], I would like to call a VOTE for Accepting Apex as a new Apache Incubator project. The proposal is available on the wiki [2] and is also attached below. The VOTE will be open for at least 72 hours. [ ] +1 Accept Apex into the Incubator [ ] ±0 No opinion [ ] -1 Do not accept Apex into the Incubator because… +1 (binding) Thanks, Roman.
Re: [VOTE] Accept Apex into the Apache Incubator
+1 Niall On Thu, Aug 13, 2015 at 3:48 PM, P. Taylor Goetz ptgo...@apache.org wrote: Following the discussion thread [1], I would like to call a VOTE for Accepting Apex as a new Apache Incubator project.
Re: [VOTE] Accept Apex into the Apache Incubator
+1 (binding) I believe the current proposal covers everything required. Thank you to Amol for incorporating the community's feedback. --Chris Nauroth From: P. Taylor Goetz ptgo...@apache.org Reply-To: general@incubator.apache.org Date: Thursday, August 13, 2015 at 7:48 AM To: Incubator general@incubator.apache.org Subject: [VOTE] Accept Apex into the Apache Incubator Following the discussion thread [1], I would like to call a VOTE for Accepting Apex as a new Apache Incubator project.
Re: [VOTE] Accept Apex into the Apache Incubator
+1 (Non-binding) On Thu, Aug 13, 2015 at 7:48 AM, P. Taylor Goetz ptgo...@apache.org wrote: Following the discussion thread [1], I would like to call a VOTE for Accepting Apex as a new Apache Incubator project.
Re: [VOTE] Accept Apex into the Apache Incubator
+1 (Non-binding) -Gaurav On Aug 13, 2015, at 10:22 AM, Pramod Immaneni pra...@datatorrent.com wrote: +1 (Non-binding) On Thu, Aug 13, 2015 at 7:48 AM, P. Taylor Goetz ptgo...@apache.org wrote: Following the discussion thread [1], I would like to call a VOTE for Accepting Apex as a new Apache Incubator project.
Re: [VOTE] Accept Apex into the Apache Incubator
+1 (binding)

Good luck guys!

- Henry

On Thu, Aug 13, 2015 at 7:48 AM, P. Taylor Goetz ptgo...@apache.org wrote:

Following the discussion thread [1], I would like to call a VOTE for Accepting Apex as a new Apache Incubator project.

The proposal is available on the wiki [2] and is also attached below.

The VOTE will be open for at least 72 hours.

[ ] +1 Accept Apex into the Incubator
[ ] ±0 No opinion
[ ] -1 Do not accept Apex into the Incubator because…

Thanks,
-Taylor

[1] http://s.apache.org/apex_discuss
[2] https://wiki.apache.org/incubator/ApexProposal

== Abstract ==

Apex is an enterprise grade native YARN big data-in-motion platform that unifies stream processing and batch processing. Apex processes big data in-motion in a highly scalable, highly performant, fault tolerant, stateful, secure, distributed, and easily operable way. It provides a simple API that enables users to write or re-use generic Java code, thereby lowering the expertise needed to write big data applications.

Functional and operational specifications are separated. Apex is designed to enable users to write their own code (aka user defined functions) as is and leave all operability to the platform. The API is very simple and is designed to allow users to drop in their code as is. The platform mainly deals with operability and treats functional code as a black box. Operability includes fault tolerance, scalability, security, ease of use, metrics API, web services, etc. In other words, there is no separation of UDF (user defined functions), as all functional code is UDF. This frees users to focus on functional development, and lets the platform provide operability support. The same code runs as is with different operability attributes. The data-in-motion architecture of Apex unifies stream as well as batch processing in a single platform.

Since Apex is a native YARN application, it leverages all the components of YARN without duplication. Apex was developed with YARN in mind and has no overlapping components/functionality with YARN.

The Apex platform is supplemented by project Malhar, which is a library of operators that implement common business logic functions needed by customers who want to quickly develop applications. These operators provide access to HDFS, S3, NFS, FTP, and other file systems; Kafka, ActiveMQ, RabbitMQ, JMS, and other message systems; MySQL, Cassandra, MongoDB, Redis, HBase, CouchDB, and other databases, along with JDBC connectors. The Malhar library also includes a host of other common business logic patterns that help users significantly reduce the time it takes to go into production. Ease of integration with all other big data technologies is one of the primary missions of Malhar.

== Proposal ==

The goal of this proposal is to establish the core engine of the DataTorrent RTS product as an Apache Software Foundation (ASF) project in order to build a vibrant, diverse, and self-governed open source community around the technology. DataTorrent will continue to sell management tools, application building tools, easy to use big data applications, and custom high end business logic operators.

This proposal covers the Apex source code (written in Java), Apex documentation, and other materials currently available on https://github.com/DataTorrent/Apex. This proposal also covers the Malhar source code (written in Java), Malhar documentation, and other materials currently available on https://github.com/DataTorrent/Malhar.

We have done a trademark check on the name Apex, and have concluded that the Apex name is likely to be a suitable project name.

== Background ==

DataTorrent RTS is a mature and robust product developed as a native YARN application. RTS 1.0 was launched in the summer of 2014; RTS 2.0 was launched in Jan 2015. Both were well received by customers. RTS 3.0 was launched at the end of July 2015. RTS is among the first enterprise grade platforms that were developed from the ground up as a native YARN application.

DataTorrent RTS is currently maintained by engineers as a closed source project. Even though the engineers behind RTS are experienced software engineers and are knowledge leaders in data-in-motion platforms, they have had little exposure to the open source governance process. Customers are currently running applications based on DataTorrent RTS in production.

== Rationale ==

Big data applications written for non-Hadoop platforms typically require major rewrites to get them to work with Hadoop. This rewriting creates a significant bottleneck in terms of resources (expertise), which in turn jeopardizes the viability of such an endeavour. It is hard enough to acquire big data expertise; demanding additional expertise to do a major code conversion makes it a very hard problem for projects to successfully migrate to Hadoop. Also, due to the batch processing nature of Hadoop’s MapReduce paradigm, users often have to
Re: [VOTE] Accept Apex into the Apache Incubator
+1 (Non-binding)

On Thu, Aug 13, 2015 at 2:32 PM, Alan Gates alanfga...@gmail.com wrote:

+1.

Alan.

Chris Nauroth cnaur...@hortonworks.com August 13, 2015 at 9:59

+1 (binding)

I believe the current proposal covers everything required. Thank you to Amol for incorporating the community's feedback.

--Chris Nauroth
Re: [VOTE] Accept Apex into the Apache Incubator
+1 (binding)

-Taylor
Re: [VOTE] Accept Apex into the Apache Incubator
+1 (non-binding)
Re: [VOTE] Accept Apex into the Apache Incubator
+1
Re: [VOTE] Accept Apex into the Apache Incubator
+1 (Non-binding)

On Thu, Aug 13, 2015 at 1:09 PM Julian Hyde jh...@apache.org wrote:

+1 (binding)

Julian

On Aug 13, 2015, at 12:40 PM, Gaurav Gupta gau...@datatorrent.com wrote:

+1 (Non-binding)

-Gaurav

On Aug 13, 2015, at 10:22 AM, Pramod Immaneni pra...@datatorrent.com wrote:

+1 (Non-binding)
Re: [VOTE] Accept Apex into the Apache Incubator
+1 (binding)

On Thu, Aug 13, 2015 at 1:47 PM, Hitesh Shah hit...@apache.org wrote:

+1 (binding)

-- Hitesh
Re: [VOTE] Accept Apex into the Apache Incubator
+1. Alan. Chris Nauroth mailto:cnaur...@hortonworks.com August 13, 2015 at 9:59 +1 (binding) I believe the current proposal covers everything required. Thank you to Amol for incorporating the community's feedback. --Chris Nauroth
Re: [VOTE] Accept Apex into the Apache Incubator
+1 (binding) Julian On Aug 13, 2015, at 12:40 PM, Gaurav Gupta gau...@datatorrent.com wrote: +1 (Non-binding) -Gaurav On Aug 13, 2015, at 10:22 AM, Pramod Immaneni pra...@datatorrent.com wrote: +1 (Non-binding)
Re: [VOTE] Accept Apex into the Apache Incubator
+1 (Non-binding) Best Regards! - Luke Han On Fri, Aug 14, 2015 at 9:52 AM, Amol Kekre a...@datatorrent.com wrote: +1 (Non-binding) Amol
Re: [VOTE] Accept Apex into the Apache Incubator
+1 (Non-binding) Amol
Re: [VOTE] Accept Apex into the Apache Incubator
+1 (Non Binding)

On 13 Aug 2015 22:18, P. Taylor Goetz ptgo...@apache.org wrote:

Following the discussion thread [1], I would like to call a VOTE for Accepting Apex as a new Apache Incubator project. The proposal is available on the wiki [2] and is also attached below. The VOTE will be open for at least 72 hours.

[ ] +1 Accept Apex into the Incubator
[ ] ±0 No opinion
[ ] -1 Do not accept Apex into the Incubator because…

Thanks,
-Taylor

[1] http://s.apache.org/apex_discuss
[2] https://wiki.apache.org/incubator/ApexProposal

== Abstract ==

Apex is an enterprise grade native YARN big data-in-motion platform that unifies stream processing as well as batch processing. Apex processes big data in-motion in a highly scalable, highly performant, fault tolerant, stateful, secure, distributed, and easily operable way. It provides a simple API that enables users to write or re-use generic Java code, thereby lowering the expertise needed to write big data applications.

Functional and operational specifications are separated. Apex is designed in a way to enable users to write their own code (aka user defined functions) as is and leave all operability to the platform. The API is very simple and is designed to allow users to drop in their code as is. The platform mainly deals with operability and treats functional code as a black box. Operability includes fault tolerance, scalability, security, ease of use, metrics API, web services, etc. In other words, there is no separation of UDF (user defined functions), as all functional code is UDF. This frees users to focus on functional development, and lets the platform provide operability support. The same code runs as is with different operability attributes. The data-in-motion architecture of Apex unifies stream as well as batch processing in a single platform.

Since Apex is a native YARN application, it leverages all the components of YARN without duplication. Apex was developed with YARN in mind and has no overlapping components/functionality with YARN.

The Apex platform is supplemented by project Malhar, which is a library of operators that implement common business logic functions needed by customers who want to quickly develop applications. These operators provide access to HDFS, S3, NFS, FTP, and other file systems; Kafka, ActiveMQ, RabbitMQ, JMS, and other message systems; MySQL, Cassandra, MongoDB, Redis, HBase, CouchDB, and other databases, along with JDBC connectors. The Malhar library also includes a host of other common business logic patterns that help users significantly reduce the time it takes to go into production. Ease of integration with all other big data technologies is one of the primary missions of Malhar.

== Proposal ==

The goal of this proposal is to establish the core engine of the DataTorrent RTS product as an Apache Software Foundation (ASF) project in order to build a vibrant, diverse, and self-governed open source community around the technology. DataTorrent will continue to sell management tools, application building tools, easy to use big data applications, and custom high end business logic operators.

This proposal covers the Apex source code (written in Java), Apex documentation, and other materials currently available on https://github.com/DataTorrent/Apex. This proposal also covers the Malhar source code (written in Java), Malhar documentation, and other materials currently available on https://github.com/DataTorrent/Malhar. We have done a trademark check on the name Apex, and have concluded that the Apex name is likely to be a suitable project name.

== Background ==

DataTorrent RTS is a mature and robust product developed as a native YARN application. RTS 1.0 was launched in the summer of 2014; RTS 2.0 was launched in Jan 2015. Both were well received by customers. RTS 3.0 was launched at the end of July 2015. RTS is among the first enterprise grade platforms developed from the ground up as a native YARN application.

DataTorrent RTS is currently maintained by engineers as a closed source project. Even though the engineers behind RTS are experienced software engineers and are knowledge leaders in data-in-motion platforms, they have had little exposure to the open source governance process. Customers are currently running applications based on DataTorrent RTS in production.

== Rationale ==

Big data applications written for non-Hadoop platforms typically require major rewrites to get them to work with Hadoop. This rewriting creates a significant bottleneck in terms of resources (expertise), which in turn jeopardizes the viability of such an endeavour. It is hard enough to acquire big data expertise; demanding additional expertise to do a major code conversion makes it a very hard problem for projects to successfully migrate to Hadoop. Also, due to the batch processing nature of Hadoop’s MapReduce paradigm, users often have to wait tens of minutes to see
Re: [VOTE] Accept Apex into the Apache Incubator
+1 (binding) -C
Re: [VOTE] Accept Apex into the Apache Incubator
+1 binding
Re: [VOTE] Accept Apex into the Apache Incubator
+1 (non-binding) Thanks Naresh On Fri, Aug 14, 2015 at 11:14 AM, Justin Mclean jus...@classsoftware.com wrote: +1 binding
Re: [VOTE] Accept Apex into the Apache Incubator
+1 (binding) — Hitesh On Aug 13, 2015, at 7:48 AM, P. Taylor Goetz ptgo...@apache.org wrote: Following the discussion thread [1], I would like to call a VOTE for Accepting Apex as a new Apache Incubator project. The proposal is available on the wiki [2] and is also attached below. The VOTE will be open for at least 72 hours. [ ] +1 Accept Apex into the Incubator [ ] ±0 No opinion [ ] -1 Do not accept Apex into the Incubator because… Thanks, -Taylor [1] http://s.apache.org/apex_discuss [2] https://wiki.apache.org/incubator/ApexProposal == Abstract == Apex is an enterprise grade native YARN big data-in-motion platform that unifies stream processing as well as batch processing. Apex processes big data in-motion in a highly scalable, highly performant, fault tolerant, stateful, secure, distributed, and an easily operable way. It provides a simple API that enables users to write or re-use generic Java code, thereby lowering the expertise needed to write big data applications. Functional and operational specifications are separated. Apex is designed in a way to enable users to write their own code (aka user defined functions) as is and leave all operability to the platform. The API is very simple and is designed to allow users to drop in their code as is. The platform mainly deals with operability and treats functional code as a black box. Operability includes fault tolerance, scalability, security, ease of use, metrics api, webservices, etc. In other words there is no separation of UDF (user defined functions), as all functional code is UDF. This frees users to focus on functional development, and lets platform provide operability support. The same code runs as is with different operability attributes. The data-in-motion architecture of Apex unifies stream as well as batch processing in a single platform. Since Apex is a native YARN application, it leverages all the components of YARN without duplication. 
Apex was developed with YARN in mind and has no overlapping components/functionality with YARN. The Apex platform is supplemented by project Malhar, which is a library of operators that implement common business logic functions needed by customers who want to quickly develop applications. These operators provide access to HDFS, S3, NFS, FTP, and other file systems; Kafka, ActiveMQ, RabbitMQ, JMS, and other message systems; MySQL, Cassandra, MongoDB, Redis, HBase, CouchDB and other databases along with JDBC connectors. The Malhar library also includes a host of other common business logic patterns that help users to significantly reduce the time it takes to go into production. Ease of integration with all other big data technologies is one of the primary missions of Malhar. == Proposal == The goal of this proposal is to establish the core engine of the DataTorrent RTS product as an Apache Software Foundation (ASF) project in order to build a vibrant, diverse, and self-governed open source community around the technology. DataTorrent will continue to sell management tools, application building tools, easy-to-use big data applications, and custom high-end business logic operators. This proposal covers the Apex source code (written in Java), Apex documentation and other materials currently available on https://github.com/DataTorrent/Apex. This proposal also covers the Malhar source code (written in Java), Malhar documentation, and other materials currently available on https://github.com/DataTorrent/Malhar. We have done a trademark check on the name Apex, and have concluded that the Apex name is likely to be a suitable project name. == Background == DataTorrent RTS is a mature and robust product developed as a native YARN application. RTS 1.0 was launched in the summer of 2014; RTS 2.0 was launched in Jan 2015. Both were well received by customers. RTS 3.0 was launched at the end of July 2015. 
RTS is among the first enterprise-grade platforms developed from the ground up as a native YARN application. DataTorrent RTS is currently maintained by engineers as a closed source project. Even though the engineers behind RTS are experienced software engineers and are knowledge leaders in data-in-motion platforms, they have had little exposure to the open source governance process. Customers are currently running applications based on DataTorrent RTS in production. == Rationale == Big data applications written for non-Hadoop platforms typically require major rewrites to get them to work with Hadoop. This rewriting creates a significant bottleneck in terms of resources (expertise), which in turn jeopardizes the viability of such an endeavour. It is hard enough to acquire big data expertise; demanding additional expertise to do a major code conversion makes it very hard for projects to successfully migrate to Hadoop. Also, due to the batch processing nature of