Re: [PROPOSAL] REEF for the Apache Incubator
Looks like the feedback has been well received. Any reason not to start a vote? Thanks, Roman. On Mon, Aug 4, 2014 at 11:12 PM, Byung-Gon Chun bgc...@gmail.com wrote: Hi Jake, Thank you for the comment. We had discussions on how to structure mailing lists with our mentors. We took our mentors' suggestions to start with a minimal set (two mailing lists) not to miss important discussions and to split them if there are demands. Thanks! -Gon --- Byung-Gon Chun
Re: [PROPOSAL] REEF for the Apache Incubator
Hi Roman, I will send an email to start a vote soon. Thanks! -Gon On Sat, Aug 9, 2014 at 8:32 AM, Roman Shaposhnik r...@apache.org wrote: Looks like the feedback has been well received. Any reason not to start a vote? Thanks, Roman.
Re: [PROPOSAL] REEF for the Apache Incubator
Hi Roman, Thank you for the comment. We will add the following description that covers Helix to the proposal page. Apache Helix automates application-wide management operations which require global knowledge and coordination, such as repartitioning of resources and scheduling of maintenance tasks. Helix separates global coordination concerns from the functional tasks of the application with a state machine abstraction. REEF's generic layer makes it easy to program the functional and management tasks, which may span small or large groups within the application. Helix can work hand-in-hand with REEF, by providing the global management component for REEF applications. Thanks! - Gon --- Byung-Gon Chun On Tue, Aug 5, 2014 at 1:59 AM, Roman Shaposhnik r...@apache.org wrote: Hi! On Fri, Aug 1, 2014 at 12:14 AM, Byung-Gon Chun bgc...@gmail.com wrote: Since REEF is a library that makes it easy to write distributed applications on top of Apache YARN or Mesos, the Apache Software Foundation is the perfect home for hosting REEF. [ snip...snip...snip ] ## Relationships with Other Apache Products Really appreciated the detailed review of potential relationships, but was surprised not to see Apache Helix on the list of related projects. Given the exec summary of the project -- there must be some relationship. Or am I reading it incorrectly? Thanks, Roman. - To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org -- Byung-Gon Chun
Re: [PROPOSAL] REEF for the Apache Incubator
Hi Jake, Thank you for the comment. We had discussions on how to structure mailing lists with our mentors. We took our mentors' suggestions to start with a minimal set (two mailing lists) not to miss important discussions and to split them if there are demands. Thanks! -Gon --- Byung-Gon Chun On Tue, Aug 5, 2014 at 3:04 AM, Jake Farrell jfarr...@apache.org wrote: Would suggest you use the following format for the mailing lists (you have the older format listed) and also split the dev and commits. Also a lot of new projects have been also splitting out the jira issues from dev to cut down on noise on the dev list, would add issues@reef if you want to do this. private@reef for private PMC discussions dev@reef for technical discussions commits@reef notification about commits issues@reef jira notifications -Jake
Re: [PROPOSAL] REEF for the Apache Incubator
Hi! On Fri, Aug 1, 2014 at 12:14 AM, Byung-Gon Chun bgc...@gmail.com wrote: Since REEF is a library that makes it easy to write distributed applications on top of Apache YARN or Mesos, the Apache Software Foundation is the perfect home for hosting REEF. [ snip...snip...snip ] ## Relationships with Other Apache Products Really appreciated the detailed review of potential relationships, but was surprised not to see Apache Helix on the list of related projects. Given the exec summary of the project -- there must be some relationship. Or am I reading it incorrectly? Thanks, Roman.
Re: [PROPOSAL] REEF for the Apache Incubator
Would suggest you use the following format for the mailing lists (you have the older format listed) and also split the dev and commits. A lot of new projects have also been splitting out the jira issues from dev to cut down on noise on the dev list; would add issues@reef if you want to do this.
private@reef for private PMC discussions
dev@reef for technical discussions
commits@reef notification about commits
issues@reef jira notifications
-Jake
Re: [PROPOSAL] REEF for the Apache Incubator
John, Thank you for the feedback and the offer to help! On the mentors, I think that four mentors (including Chris Douglas) can cover REEF at this point. Thanks! -Gon --- Byung-Gon Chun On Sat, Aug 2, 2014 at 10:12 PM, John D. Ament john.d.am...@gmail.com wrote: Byung-Gon It looks like a good proposal. There are some minor edit I'd recommend you'd do: - Use the same github URL consistently. - I just fixed the section of the proposal guide to include how to reference a git repository. This should help you and future proposed podlings get things going better. If you like - you already have 3 mentors, I'd be willing to step up and help mentor REEF as well. John
RE: [PROPOSAL] REEF for the Apache Incubator
I have no objection to additional mentors. Please sign up. -Original Message- From: Byung-Gon Chun bgc...@gmail.com Sent: 8/3/2014 0:43 To: general@incubator.apache.org Subject: Re: [PROPOSAL] REEF for the Apache Incubator John, Thank you for the feedback and the offer to help! On the mentors, I think that four mentors (including Chris Douglas) can cover REEF at this point. Thanks! -Gon --- Byung-Gon Chun
Re: [PROPOSAL] REEF for the Apache Incubator
Byung-Gon It looks like a good proposal. There are some minor edit I'd recommend you'd do: - Use the same github URL consistently. - I just fixed the section of the proposal guide to include how to reference a git repository. This should help you and future proposed podlings get things going better. If you like - you already have 3 mentors, I'd be willing to step up and help mentor REEF as well. John On Fri, Aug 1, 2014 at 3:14 AM, Byung-Gon Chun bgc...@gmail.com wrote: Hi everyone, I would like to propose REEF to be an Apache Incubator project. REEF is a scale-out computing fabric that eases the development of Big Data applications on top of resource managers such as Apache YARN and Mesos. The proposal is included in plain text below. I would also like to put this on wiki but I don't have privileges to create wiki pages. I look forward to hearing everyone's thoughts and feedback! -Gon -- Byung-Gon Chun === # REEFProposal - Incubator # Abstract REEF (Retainable Evaluator Execution Framework) is a scale-out computing fabric that eases the development of Big Data applications on top of resource managers such as Apache YARN and Mesos. # Proposal REEF is a Big Data system that makes it easy to implement scalable, fault-tolerant runtime environments for a range of data processing models (e.g., graph processing and machine learning) on top of resource managers such as Apache YARN and Mesos. REEF provides capabilities to run multiple heterogeneous frameworks and workflows of those efficiently. Additionally, REEF contains two libraries that are of independent value: Wake is an event-based-programming framework inspired by Rx and SEDA. Tang is a dependency injection framework inspired by Google Guice, but designed specifically for configuring distributed systems. 
# Background
The resource management layer such as Apache YARN and Mesos has emerged as a critical layer in the new scale-out data processing stack; resource managers assume the responsibility of multiplexing a cluster of shared-nothing machines across heterogeneous applications. They operate behind an interface for leasing containers - a slice of a machine’s resources - to computations in an elastic fashion. However, building data processing frameworks directly on this layer comes at a high cost: each framework must tackle the same challenges (e.g., fault-tolerance, task scheduling and coordination) and reimplement common mechanisms (e.g., caching, bulk transfers). REEF provides a reusable control-plane for scheduling and coordinating task-level work on cluster resource managers. The REEF design enables sophisticated optimizations, such as container re-use and data caching, and facilitates workflows that span multiple frameworks. Examples include pipelining data between different operators in a relational system, retaining state across iterations in iterative or recursive data flow, and passing the result of a MapReduce job to a Machine Learning computation.

# Rationale
Since REEF is a library that makes it easy to write distributed applications on top of Apache YARN or Mesos, the Apache Software Foundation is the perfect home for hosting REEF.

# Current Status
REEF has been developed mostly by Microsoft, UCLA and the Seoul National University. The REEF codebase is open-sourced under Apache License 2.0 and is currently hosted in a public repository at github.com.

# Meritocracy
We plan to build a strong open community by following the Apache meritocracy principles. We will work with those who contribute significantly to the project and invite them to be its committers.

# Community
REEF is currently being used internally at Microsoft. Also, SK Telecom builds their data analytics infrastructure on top of REEF in collaboration with Seoul National University. We hope to extend our contributor base by becoming an Apache incubator project. REEF will attract developers who are interested in creating common building blocks for simplifying the development of large-scale big data applications.

# Core Developers
Core developers are engineers from Microsoft, Purestorage, UCB, UCLA, UW and Seoul National University.

# Alignment
REEF depends on many Apache projects and dependencies. REEF is built on resource managers such as Apache YARN and Apache Mesos. REEF also uses HDFS as a distributed storage layer.

# Known Risks

## Orphaned Products
The risk of REEF being orphaned is small because Microsoft products are built on REEF. The core REEF developers continue to work on REEF at Microsoft, UCLA, and Seoul National University. The REEF project is gaining interest from other institutions to be used as their infrastructure.

## Inexperience with Open Source
Several core developers have experience with open source development. REEF committers will be guided by the mentors with strong Apache open source project backgrounds.

##
Re: [PROPOSAL] REEF for the Apache Incubator
I added the proposal to the Wiki at http://wiki.apache.org/incubator/ReefProposal

Sent from Windows Mail

From: bgc...@gmail.com
Sent: Friday, August 1, 2014 12:14 AM
To: general@incubator.apache.org

Hi everyone,

I would like to propose REEF to be an Apache Incubator project. REEF is a scale-out computing fabric that eases the development of Big Data applications on top of resource managers such as Apache YARN and Mesos. The proposal is included in plain text below. I would also like to put this on wiki but I don't have privileges to create wiki pages. I look forward to hearing everyone's thoughts and feedback!

-Gon

--
Byung-Gon Chun

===

# REEFProposal - Incubator

# Abstract
REEF (Retainable Evaluator Execution Framework) is a scale-out computing fabric that eases the development of Big Data applications on top of resource managers such as Apache YARN and Mesos.

# Proposal
REEF is a Big Data system that makes it easy to implement scalable, fault-tolerant runtime environments for a range of data processing models (e.g., graph processing and machine learning) on top of resource managers such as Apache YARN and Mesos. REEF provides capabilities to run multiple heterogeneous frameworks and workflows of those efficiently. Additionally, REEF contains two libraries that are of independent value: Wake is an event-based-programming framework inspired by Rx and SEDA. Tang is a dependency injection framework inspired by Google Guice, but designed specifically for configuring distributed systems.

# Background
The resource management layer such as Apache YARN and Mesos has emerged as a critical layer in the new scale-out data processing stack; resource managers assume the responsibility of multiplexing a cluster of shared-nothing machines across heterogeneous applications. They operate behind an interface for leasing containers - a slice of a machine’s resources - to computations in an elastic fashion. However, building data processing frameworks directly on this layer comes at a high cost: each framework must tackle the same challenges (e.g., fault-tolerance, task scheduling and coordination) and reimplement common mechanisms (e.g., caching, bulk transfers). REEF provides a reusable control-plane for scheduling and coordinating task-level work on cluster resource managers. The REEF design enables sophisticated optimizations, such as container re-use and data caching, and facilitates workflows that span multiple frameworks. Examples include pipelining data between different operators in a relational system, retaining state across iterations in iterative or recursive data flow, and passing the result of a MapReduce job to a Machine Learning computation.

# Rationale
Since REEF is a library that makes it easy to write distributed applications on top of Apache YARN or Mesos, the Apache Software Foundation is the perfect home for hosting REEF.

# Current Status
REEF has been developed mostly by Microsoft, UCLA and the Seoul National University. The REEF codebase is open-sourced under Apache License 2.0 and is currently hosted in a public repository at github.com.

# Meritocracy
We plan to build a strong open community by following the Apache meritocracy principles. We will work with those who contribute significantly to the project and invite them to be its committers.

# Community
REEF is currently being used internally at Microsoft. Also, SK Telecom builds their data analytics infrastructure on top of REEF in collaboration with Seoul National University. We hope to extend our contributor base by becoming an Apache incubator project. REEF will attract developers who are interested in creating common building blocks for simplifying the development of large-scale big data applications.

# Core Developers
Core developers are engineers from Microsoft, Purestorage, UCB, UCLA, UW and Seoul National University.

# Alignment
REEF depends on many Apache projects and dependencies. REEF is built on resource managers such as Apache YARN and Apache Mesos. REEF also uses HDFS as a distributed storage layer.

# Known Risks

## Orphaned Products
The risk of REEF being orphaned is small because Microsoft products are built on REEF. The core REEF developers continue to work on REEF at Microsoft, UCLA, and Seoul National University. The REEF project is gaining interest from other institutions to be used as their infrastructure.

## Inexperience with Open Source
Several core developers have experience with open source development. REEF committers will be guided by the mentors with strong Apache open source project backgrounds.

## Homogeneous Developers
The initial committers include developers from several institutions including Microsoft, Purestorage, UCB, UCLA, and Seoul National University.

## Reliance on Salaried Developers
Developers from Microsoft are paid to work on REEF. Since the work is used internally at Microsoft, Microsoft will keep supporting the developers
Re: [PROPOSAL] REEF for the Apache Incubator
Thank you!

---
Byung-Gon Chun
Sent from my phone

On Aug 2, 2014, at 11:08 AM, rgard...@opendirective.com wrote:

I added the proposal to the Wiki at http://wiki.apache.org/incubator/ReefProposal
Re: [PROPOSAL] REEF for the Apache Incubator
REEF looks great! Look fwd to see it grow in the ASF. Welcome!

thanks,
Arun

On Aug 1, 2014, at 12:14 AM, Byung-Gon Chun bgc...@gmail.com wrote:

Hi everyone,

I would like to propose REEF to be an Apache Incubator project. REEF is a scale-out computing fabric that eases the development of Big Data applications on top of resource managers such as Apache YARN and Mesos. The proposal is included in plain text below. I would also like to put this on wiki but I don't have privileges to create wiki pages. I look forward to hearing everyone's thoughts and feedback!

-Gon

--
Byung-Gon Chun