[RESULT] [VOTE] Accept REEF into the Apache Incubator
Thanks everyone who voted!

The vote has passed with 13 binding +1 votes and 2 non-binding +1 votes and no +0 or -1 votes.

Binding (+1)
Ross Gardler
Till Westmann
Alan D. Cabrera
Konstantin Boudnik
Bertrand Delacretaz
Jakob Glen Homan
Chris A Mattmann
Andrew Purtell
Owen O'Malley
Jake Farrell
Suresh Srinivas
Roman Shaposhnik
Chris Douglas

Non-binding (+1)
Hitesh Shah
Jan Iversen

We will follow the next steps under the guidance of our mentors.

Thanks!
- Gon

---
Byung-Gon Chun

On Sat, Aug 9, 2014 at 2:40 PM, Byung-Gon Chun bgc...@gmail.com wrote:

Hi,

Thanks for participating in the proposal discussion on REEF. The discussion has calmed. I would like to call a vote for acceptance of REEF into the Apache Incubator.

The proposal is attached below, and it is also available at https://wiki.apache.org/incubator/ReefProposal

Let's keep this vote open for three business days, closing the voting on August 11, 11:59PM (PDT).

[] +1 Accept REEF into the Incubator
[] 0 Don't care
[] -1 Don't accept REEF because...

Thanks!
-Gon

--
Byung-Gon Chun

# REEFProposal - Incubator

# Abstract

REEF (Retainable Evaluator Execution Framework) is a scale-out computing fabric that eases the development of Big Data applications on top of resource managers such as Apache YARN and Mesos.

# Proposal

REEF is a Big Data system that makes it easy to implement scalable, fault-tolerant runtime environments for a range of data processing models (e.g., graph processing and machine learning) on top of resource managers such as Apache YARN and Mesos. REEF provides capabilities to run multiple heterogeneous frameworks and workflows of those efficiently.

Additionally, REEF contains two libraries that are of independent value: Wake is an event-based-programming framework inspired by Rx and SEDA. Tang is a dependency injection framework inspired by Google Guice, but designed specifically for configuring distributed systems.

# Background

The resource management layer such as Apache YARN and Mesos has emerged as a critical layer in the new scale-out data processing stack; resource managers assume the responsibility of multiplexing a cluster of shared-nothing machines across heterogeneous applications. They operate behind an interface for leasing containers - a slice of a machine’s resources - to computations in an elastic fashion.

However, building data processing frameworks directly on this layer comes at a high cost: each framework must tackle the same challenges (e.g., fault-tolerance, task scheduling and coordination) and reimplement common mechanisms (e.g., caching, bulk transfers). REEF provides a reusable control-plane for scheduling and coordinating task-level work on cluster resource managers. The REEF design enables sophisticated optimizations, such as container re-use and data caching, and facilitates workflows that span multiple frameworks. Examples include pipelining data between different operators in a relational system, retaining state across iterations in iterative or recursive data flow, and passing the result of a MapReduce job to a Machine Learning computation.

# Rationale

Since REEF is a library that makes it easy to write distributed applications on top of Apache YARN or Mesos, the Apache Software Foundation is the perfect home for hosting REEF.

# Current Status

REEF has been developed mostly by Microsoft, UCLA and the Seoul National University. The REEF codebase is open-sourced under Apache License 2.0 and is currently hosted in a public repository at github.com.

# Meritocracy

We plan to build a strong open community by following the Apache meritocracy principles. We will work with those who contribute significantly to the project and invite them to be its committers.

# Community

REEF is currently being used internally at Microsoft. Also, SK Telecom builds their data analytics infrastructure on top of REEF in collaboration with Seoul National University. We hope to extend our contributor base by becoming an Apache incubator project. REEF will attract developers who are interested in creating common building blocks for simplifying the development of large-scale big data applications.

# Core Developers

Core developers are engineers from Microsoft, Purestorage, UCB, UCLA, UW and Seoul National University.

# Alignment

REEF depends on many Apache projects and dependencies. REEF is built on resource managers such as Apache YARN and Apache Mesos. REEF also uses HDFS as a distributed storage layer.

# Known Risks

## Orphaned Products

The risk of REEF being orphaned is small because Microsoft products are built on REEF. The core REEF developers continue to work on REEF at Microsoft, UCLA, and Seoul National University. The REEF project is gaining interest from other institutions to be used as their infrastructure.

## Inexperience with Open Source

Several core developers have experience with open source
Re: [VOTE] Accept REEF into the Apache Incubator
+1 (binding)

On Mon, Aug 11, 2014 at 6:20 PM, Hitesh Shah hit...@apache.org wrote:

+1 ( non-binding )
- Hitesh

On Aug 8, 2014, at 10:40 PM, Byung-Gon Chun bgc...@gmail.com wrote:

...I would like to call a vote for acceptance of REEF into the Apache Incubator...
Re: [VOTE] Accept REEF into the Apache Incubator
+1 (binding)
-Jake

On Sat, Aug 9, 2014 at 1:40 AM, Byung-Gon Chun bgc...@gmail.com wrote:

...I would like to call a vote for acceptance of REEF into the Apache Incubator...
Re: [VOTE] Accept REEF into the Apache Incubator
+1 (binding)

On Fri, Aug 8, 2014 at 10:40 PM, Byung-Gon Chun bgc...@gmail.com wrote:

...I would like to call a vote for acceptance of REEF into the Apache Incubator...
Re: [VOTE] Accept REEF into the Apache Incubator
On Aug 12, 2014 7:26 PM, Suresh Srinivas sur...@hortonworks.com wrote:

+1 (binding)

+1

On Fri, Aug 8, 2014 at 10:40 PM, Byung-Gon Chun bgc...@gmail.com wrote:

...I would like to call a vote for acceptance of REEF into the Apache Incubator...
Re: [VOTE] Accept REEF into the Apache Incubator
On Fri, Aug 8, 2014 at 10:40 PM, Byung-Gon Chun bgc...@gmail.com wrote:

...I would like to call a vote for acceptance of REEF into the Apache Incubator...

+1 (binding)

Thanks,
Roman.

- To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org
Re: [VOTE] Accept REEF into the Apache Incubator
+1
-C

On Fri, Aug 8, 2014 at 10:40 PM, Byung-Gon Chun bgc...@gmail.com wrote:

...I would like to call a vote for acceptance of REEF into the Apache Incubator...

...

## Inexperience with Open Source

Several core developers have experience with open source development. REEF committers will be guided by the mentors with strong Apache open source project backgrounds.

## Homogeneous Developers

The initial committers include developers from several institutions including Microsoft, Purestorage, UCB, UCLA, and Seoul National University.

## Reliance on Salaried Developers

Developers from Microsoft are paid to work on REEF. Since the work is used internally at Microsoft, Microsoft will keep supporting the developers to
Re: [VOTE] Accept REEF into the Apache Incubator
On Sat, Aug 9, 2014 at 7:40 AM, Byung-Gon Chun bgc...@gmail.com wrote:

...I would like to call a vote for acceptance of REEF into the Apache Incubator...

+1
-Bertrand
Re: [VOTE] Accept REEF into the Apache Incubator
+1 (binding)
-Jakob

From: Bertrand Delacretaz
Sent: Monday, August 11, 2014 1:16 AM
To: general@incubator.apache.org

On Sat, Aug 9, 2014 at 7:40 AM, Byung-Gon Chun bgc...@gmail.com wrote:

...I would like to call a vote for acceptance of REEF into the Apache Incubator...

+1
-Bertrand
Re: [VOTE] Accept REEF into the Apache Incubator
+1 binding thanks On Aug 8, 2014, at 10:40 PM, Byung-Gon Chun bgc...@gmail.com wrote: ...I would like to call a vote for acceptance of REEF into the Apache Incubator...
Re: [VOTE] Accept REEF into the Apache Incubator
+1 (binding) On Fri, Aug 8, 2014 at 10:40 PM, Byung-Gon Chun bgc...@gmail.com wrote: ...I would like to call a vote for acceptance of REEF into the Apache Incubator...
Re: [VOTE] Accept REEF into the Apache Incubator
+1 (non-binding) -- Hitesh On Aug 8, 2014, at 10:40 PM, Byung-Gon Chun bgc...@gmail.com wrote: ...I would like to call a vote for acceptance of REEF into the Apache Incubator...
Re: [VOTE] Accept REEF into the Apache Incubator
+1 On Sat, Aug 09, 2014 at 02:40PM, Byung-Gon Chun wrote: ...I would like to call a vote for acceptance of REEF into the Apache Incubator...
Re: [VOTE] Accept REEF into the Apache Incubator
+1 On Fri, Aug 8, 2014 at 10:40 PM, Byung-Gon Chun bgc...@gmail.com wrote: ...I would like to call a vote for acceptance of REEF into the Apache Incubator...
Re: [VOTE] Accept REEF into the Apache Incubator
+1 binding Regards, Alan On Aug 8, 2014, at 10:40 PM, Byung-Gon Chun bgc...@gmail.com wrote: Let's keep this vote open for three business days, closing the voting on August 11, 11:59PM (PDT). [] +1 Accept REEF into the Incubator [] 0 Don't care [] -1 Don't accept REEF because...
[VOTE] Accept REEF into the Apache Incubator
Hi,

Thanks for participating in the proposal discussion on REEF. The discussion has calmed. I would like to call a vote for acceptance of REEF into the Apache Incubator.

The proposal is attached below, and it is also available at https://wiki.apache.org/incubator/ReefProposal

Let's keep this vote open for three business days, closing the voting on August 11, 11:59PM (PDT).

[] +1 Accept REEF into the Incubator
[] 0 Don't care
[] -1 Don't accept REEF because...

Thanks!
-Gon

--
Byung-Gon Chun

# REEFProposal - Incubator

# Abstract

REEF (Retainable Evaluator Execution Framework) is a scale-out computing fabric that eases the development of Big Data applications on top of resource managers such as Apache YARN and Mesos.

# Proposal

REEF is a Big Data system that makes it easy to implement scalable, fault-tolerant runtime environments for a range of data processing models (e.g., graph processing and machine learning) on top of resource managers such as Apache YARN and Mesos. REEF provides capabilities to efficiently run multiple heterogeneous frameworks and workflows that span them. Additionally, REEF contains two libraries that are of independent value: Wake is an event-based programming framework inspired by Rx and SEDA. Tang is a dependency injection framework inspired by Google Guice, but designed specifically for configuring distributed systems.

# Background

The resource management layer, exemplified by Apache YARN and Mesos, has emerged as a critical layer in the new scale-out data processing stack; resource managers assume the responsibility of multiplexing a cluster of shared-nothing machines across heterogeneous applications. They operate behind an interface for leasing containers - a slice of a machine’s resources - to computations in an elastic fashion. However, building data processing frameworks directly on this layer comes at a high cost: each framework must tackle the same challenges (e.g., fault-tolerance, task scheduling and coordination) and reimplement common mechanisms (e.g., caching, bulk transfers). REEF provides a reusable control-plane for scheduling and coordinating task-level work on cluster resource managers. The REEF design enables sophisticated optimizations, such as container re-use and data caching, and facilitates workflows that span multiple frameworks. Examples include pipelining data between different operators in a relational system, retaining state across iterations in iterative or recursive data flow, and passing the result of a MapReduce job to a Machine Learning computation.

# Rationale

Since REEF is a library that makes it easy to write distributed applications on top of Apache YARN or Mesos, the Apache Software Foundation is the perfect home for hosting REEF.

# Current Status

REEF has been developed mostly by Microsoft, UCLA and Seoul National University. The REEF codebase is open-sourced under Apache License 2.0 and is currently hosted in a public repository at github.com.

# Meritocracy

We plan to build a strong open community by following the Apache meritocracy principles. We will work with those who contribute significantly to the project and invite them to be its committers.

# Community

REEF is currently being used internally at Microsoft. Also, SK Telecom builds their data analytics infrastructure on top of REEF in collaboration with Seoul National University. We hope to extend our contributor base by becoming an Apache incubator project. REEF will attract developers who are interested in creating common building blocks for simplifying the development of large-scale big data applications.

# Core Developers

Core developers are engineers from Microsoft, Purestorage, UCB, UCLA, UW and Seoul National University.

# Alignment

REEF depends on many Apache projects. REEF is built on resource managers such as Apache YARN and Apache Mesos. REEF also uses HDFS as a distributed storage layer.

# Known Risks

## Orphaned Products

The risk of REEF being orphaned is small because Microsoft products are built on REEF. The core REEF developers continue to work on REEF at Microsoft, UCLA, and Seoul National University. Other institutions are showing growing interest in using REEF as their infrastructure.

## Inexperience with Open Source

Several core developers have experience with open source development. REEF committers will be guided by mentors with strong Apache open source project backgrounds.

## Homogeneous Developers

The initial committers include developers from several institutions including Microsoft, Purestorage, UCB, UCLA, and Seoul National University.

## Reliance on Salaried Developers

Developers from Microsoft are paid to work on REEF. Since the work is used internally at Microsoft, Microsoft will keep supporting the developers to work on REEF. There are also engineers and graduate students that contribute to REEF from UCLA, UCB, UW and Seoul National University. We plan to attract active developers
Re: [VOTE] Accept REEF into the Apache Incubator
[x] +1 Accept REEF into the Incubator On 8 August 2014 22:40, Byung-Gon Chun bgc...@gmail.com wrote: ...I would like to call a vote for acceptance of REEF into the Apache Incubator...