Re: [VOTE] S4 to join the Incubator
Hi Leo, I am Sajeevan. I work for Ericsson, Ireland, and have 13 years of experience in Java technologies and distributed computing. We (Ericsson) are looking at distributed streaming projects for monitoring the performance of telecommunication devices and analyzing mobile phone user experience. This project is very interesting; I have plenty of experience in TCP/IP data stream processing and am very interested in joining this project and helping to implement it. If you are interested, you can add me to the committers' list. Thanks, Sajeevan

On 27 September 2011 18:23, Flavio Junqueira f...@s4.io wrote: I'm thrilled to see that it passed. Thanks for all the support so far, and I'm looking forward to setting it up and getting the project going. -Flavio

On Sep 26, 2011, at 6:47 PM, Patrick Hunt wrote: This passes, with 16 +1 votes, plenty of them binding, and no -1 votes. Thanks to all who voted! We can now get started creating the Apache S4 podling. Patrick

On Tue, Sep 20, 2011 at 1:56 PM, Patrick Hunt ph...@apache.org wrote: It's been nearly a week since the S4 proposal was submitted for discussion. A few questions were asked, and the proposal was clarified in response. Sufficient mentors have volunteered. I thus feel we are now ready for a vote. The latest proposal can be found at the end of this email and at: http://wiki.apache.org/incubator/S4Proposal The discussion regarding the proposal can be found at: http://s.apache.org/RMU

Please cast your votes:
[ ] +1 Accept S4 for incubation
[ ] +0 Indifferent to S4 incubation
[ ] -1 Reject S4 for incubation

This vote will close 72 hours from now. Thanks, Patrick

--

= S4 Proposal =

== Abstract ==

S4 (Simple Scalable Streaming System) is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous, unbounded streams of data.

== Proposal ==

S4 is a software platform written in Java.
Clients that send and receive events can be written in any programming language. S4 also includes a collection of modules called Processing Elements (or PEs for short) that implement basic functionality and can be used by application developers. In S4, keyed data events are routed with affinity to Processing Elements (PEs), which consume the events and do one or both of the following: (1) ''emit'' one or more events which may be consumed by other PEs, (2) ''publish'' results. The architecture resembles the Actors model, providing semantics of encapsulation and location transparency, thus allowing applications to be massively concurrent while exposing a simple programming interface to application developers.

To drive adoption and increase the number of contributors to the project, we may need to prioritize the focus based on feedback from the community. We believe that one of the top priorities and driving design principles for the S4 project is to provide a simple API that hides most of the complexity associated with distributed systems and concurrency. The project grew out of the need to provide a flexible platform for application developers and scientists that can be used for quick experimentation and production.

S4 differs from existing Apache projects in a number of fundamental ways. Flume is an Incubator project that focuses on log processing, performing lightweight processing in a distributed fashion and accumulating log data in a centralized repository for batch processing. S4 instead performs all stream processing in a distributed fashion and enables applications to form arbitrary graphs to process streams of events. We see Flume as a complementary project. We also expect S4 to complement Hadoop processing and in some cases to supersede it. Kafka is another Incubator project that focuses on processing large amounts of stream data. The design of Kafka, however, follows the pub-sub paradigm, which focuses on delivering messages containing arbitrary data from source processes (publishers) to consumer processes (subscribers). Compared to S4, Kafka is an intermediate step between data generation and processing, while S4 is itself a platform for processing streams of events. Overall, S4 addresses the need of existing applications to process streams of events beyond moving data to a centralized repository for batch processing. It complements the features of existing Apache projects, such as Hadoop, Flume, and Kafka, by providing a flexible platform for distributed event processing.

== Background ==

S4 was initially developed at Yahoo! Labs starting in 2008 to process user feedback in the context of search advertising. The project was licensed under the Apache License version 2.0 in October 2010. The project documentation is currently available at http://s4.io .

== Rationale ==

Stream computing has been growing steadily over the last 20 years. However, recently there has been an explosion in real-time data sources, including the Web, sensor networks, financial securities analysis and trading, traffic monitoring, natural language processing of news and social data, and much more. While Hadoop has evolved into a standard open source solution for batch processing of massive data sets, there is no equivalent community-supported open source platform for processing data streams in real time. While various research projects have evolved into proprietary commercial products, S4 has the potential to fill the gap. Many projects that require a scalable stream processing architecture currently use Hadoop by segmenting the input stream into data batches. This solution is not efficient, results in high latency, and introduces unnecessary complexity. The S4 design is primarily driven by large-scale applications for data mining and machine learning in a production environment.
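The keyed-routing behavior the proposal describes — events routed with affinity to a PE instance per key, which then emits or publishes — can be sketched roughly as follows. This is an illustrative sketch only, not the S4 API: the names `Event`, `ProcessingElement`, `CountPE`, and `App` are hypothetical, and the real platform distributes PE instances across cluster nodes rather than holding them in one in-process map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical sketch of keyed events routed with affinity to PE instances.
// These class names are illustrative, not the actual S4 API.
final class Event {
    final String key;
    final Object value;
    Event(String key, Object value) { this.key = key; this.value = value; }
}

interface ProcessingElement {
    void onEvent(Event e); // consume an event; may emit further events or publish results
}

// A PE that counts events for its key and (2) publishes the running count.
final class CountPE implements ProcessingElement {
    private long count = 0;
    private final BiConsumer<String, Long> publish;
    CountPE(BiConsumer<String, Long> publish) { this.publish = publish; }
    public void onEvent(Event e) {
        count++;
        publish.accept(e.key, count);
    }
}

final class App {
    // "Affinity": one PE instance per distinct key; all events with the same
    // key are delivered to the same instance.
    private final Map<String, CountPE> pesByKey = new HashMap<>();
    final Map<String, Long> published = new HashMap<>();

    void route(Event e) {
        pesByKey.computeIfAbsent(e.key, k -> new CountPE(published::put))
                .onEvent(e);
    }
}
```

In the distributed setting, `route` would hash the key to a cluster partition and ship the event over the network, but the affinity guarantee (same key, same PE instance) is the same.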
Re: [VOTE] S4 to join the Incubator
Hi Sajeevan, This is great! We really need people with your background and experience. We are just putting things together, including some minimal processes for people who want to join. We will announce shortly. In the meantime, please clone this repo to get started with the latest code: https://github.com/leoneu/s4-piper Here you can see the API I am proposing for the next S4 release. The experimental integration with the communication layer is being done here: https://github.com/brucerobbins/s4-piper-commlayer_experiment Once we start the new repository in the Incubator, we will merge everything in one place. The communication layer is an abstraction that makes it possible to implement network communication using any framework. We have a simple UDP-based implementation and a new Netty-based implementation. If you can help with the design and code of the Netty implementation, or suggest other ideas, that would be extremely valuable. thanks! -leo

On Sep 28, 2011, at 12:36 PM, Sajeevan Achuthan wrote: ...
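The communication-layer abstraction Leo describes — one interface, multiple transports (UDP, Netty, ...) — might look something like the sketch below. All names (`Emitter`, `Listener`, `InMemoryTransport`) are hypothetical illustrations, not the actual s4-piper interfaces; a UDP- or Netty-based transport would implement the same pair of interfaces against real sockets.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a pluggable communication layer. The interface
// names are assumptions for illustration, not the s4-piper API.
interface Emitter {
    void send(int partition, byte[] message); // deliver a message to a cluster partition
}

interface Listener {
    byte[] receive(); // next message addressed to this node, or null if none
}

// A trivial single-process transport, useful for tests. A UDP- or
// Netty-based implementation would provide the same two methods over
// real network sockets, which is what makes the layer pluggable.
final class InMemoryTransport implements Emitter, Listener {
    private final Deque<byte[]> queue = new ArrayDeque<>();
    public void send(int partition, byte[] message) { queue.addLast(message); }
    public byte[] receive() { return queue.pollFirst(); }
}
```

The design choice is that PE-routing code depends only on `Emitter`/`Listener`, so swapping the UDP implementation for the Netty one requires no application changes.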
Re: [VOTE] S4 to join the Incubator
I'm thrilled to see that it passed. Thanks for all the support so far, and I'm looking forward to setting it up and getting the project going. -Flavio

On Sep 26, 2011, at 6:47 PM, Patrick Hunt wrote: ...
Re: [VOTE] S4 to join the Incubator
This passes, with 16 +1 votes, plenty of them binding, and no -1 votes. Thanks to all who voted! We can now get started creating the Apache S4 podling. Patrick

On Tue, Sep 20, 2011 at 1:56 PM, Patrick Hunt ph...@apache.org wrote: ...
Re: [VOTE] S4 to join the Incubator
Thank you all for your support, looking forward to working with the Apache community. -leo

On Sep 26, 2011, at 9:47 AM, Patrick Hunt wrote: ...
Re: [VOTE] S4 to join the Incubator
+1 Doug

On Sep 20, 2011 1:57 PM, Patrick Hunt ph...@apache.org wrote: ...
Re: [VOTE] S4 to join the Incubator
+1 Cheers, Adam

On Tue, Sep 20, 2011 at 9:56 PM, Patrick Hunt ph...@apache.org wrote: ...
Re: [VOTE] S4 to join the Incubator
On Tue, Sep 20, 2011 at 10:56 PM, Patrick Hunt ph...@apache.org wrote: ...Please cast your votes: [ X] +1 Accept S4 for incubation ... * Matthieu Morel (mm at s4 dot io) * Anish Nair (an at s4 dot com)... Shouldn't that be s4 dot io instead? -Bertrand - To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org
Re: [VOTE] S4 to join the Incubator
+1 (binding) 2011/9/20 Patrick Hunt ph...@apache.org: ...Please cast your votes: [ ] +1 Accept S4 for incubation [ ] +0 Indifferent to S4 incubation [ ] -1 Reject S4 for incubation ...
Re: [VOTE] S4 to join the Incubator
+1 (binding) Regards JB On 09/21/2011 11:04 AM, Olivier Lamy wrote: +1 (binding) 2011/9/20 Patrick Hunt ph...@apache.org: ...Please cast your votes: [ ] +1 Accept S4 for incubation [ ] +0 Indifferent to S4 incubation [ ] -1 Reject S4 for incubation ...
Re: [VOTE] S4 to join the Incubator
Thanks for pointing it out, Bertrand. I have just fixed it on the wiki. -Flavio On Sep 21, 2011, at 10:51 AM, Bertrand Delacretaz wrote: On Tue, Sep 20, 2011 at 10:56 PM, Patrick Hunt ph...@apache.org wrote: ...Please cast your votes: [ X] +1 Accept S4 for incubation ... * Matthieu Morel (mm at s4 dot io) * Anish Nair (an at s4 dot com)... Shouldn't that be s4 dot io instead? -Bertrand
Re: [VOTE] S4 to join the Incubator
+1 (binding) On Wed, Sep 21, 2011 at 12:05 PM, Flavio Junqueira f...@s4.io wrote: Thanks for pointing it out, Bertrand. I have just fixed it on the wiki. -Flavio
Re: [VOTE] S4 to join the Incubator
On Tue, Sep 20, 2011 at 4:56 PM, Patrick Hunt ph...@apache.org wrote: ...Please cast your votes: [ ] +1 Accept S4 for incubation [ ] +0 Indifferent to S4 incubation [ ] -1 Reject S4 for incubation +1 --tim
Re: [VOTE] S4 to join the Incubator
+1 (non-binding), great project On Wed, Sep 21, 2011 at 2:41 PM, Tim Williams william...@gmail.com wrote: ...+1 --tim
Re: [VOTE] S4 to join the Incubator
+1 (binding) Patrick On Wed, Sep 21, 2011 at 6:28 AM, Raffaele P. Guidi raffaele.p.gu...@gmail.com wrote: +1 (non binding) great project ...
[VOTE] S4 to join the Incubator
It's been nearly a week since the S4 proposal was submitted for discussion. A few questions were asked, and the proposal was clarified in response. Sufficient mentors have volunteered. I thus feel we are now ready for a vote. The latest proposal can be found at the end of this email and at: http://wiki.apache.org/incubator/S4Proposal The discussion regarding the proposal can be found at: http://s.apache.org/RMU Please cast your votes: [ ] +1 Accept S4 for incubation [ ] +0 Indifferent to S4 incubation [ ] -1 Reject S4 for incubation This vote will close 72 hours from now. Thanks, Patrick --

= S4 Proposal =

== Abstract ==

S4 (Simple Scalable Streaming System) is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous, unbounded streams of data.

== Proposal ==

S4 is a software platform written in Java. Clients that send and receive events can be written in any programming language. S4 also includes a collection of modules called Processing Elements (or PEs for short) that implement basic functionality and can be used by application developers. In S4, keyed data events are routed with affinity to Processing Elements (PEs), which consume the events and do one or both of the following: (1) ''emit'' one or more events which may be consumed by other PEs, (2) ''publish'' results. The architecture resembles the Actors model, providing semantics of encapsulation and location transparency, thus allowing applications to be massively concurrent while exposing a simple programming interface to application developers. To drive adoption and increase the number of contributors to the project, we may need to prioritize the focus based on feedback from the community. We believe that a top priority and driving design principle for the S4 project is to provide a simple API that hides most of the complexity associated with distributed systems and concurrency.

The project grew out of the need to provide a flexible platform for application developers and scientists that can be used for quick experimentation as well as production. S4 differs from existing Apache projects in a number of fundamental ways. Flume is an Incubator project that focuses on log processing, performing lightweight processing in a distributed fashion and accumulating log data in a centralized repository for batch processing. S4 instead performs all stream processing in a distributed fashion and enables applications to form arbitrary graphs to process streams of events. We see Flume as a complementary project. We also expect S4 to complement Hadoop processing and in some cases to supersede it. Kafka is another Incubator project that focuses on processing large amounts of stream data. The design of Kafka, however, follows the pub-sub paradigm, which focuses on delivering messages containing arbitrary data from source processes (publishers) to consumer processes (subscribers). Compared to S4, Kafka is an intermediate step between data generation and processing, while S4 is itself a platform for processing streams of events. Overall, S4 addresses the need of existing applications to process streams of events beyond moving data to a centralized repository for batch processing. It complements the features of existing Apache projects, such as Hadoop, Flume, and Kafka, by providing a flexible platform for distributed event processing.

== Background ==

S4 was initially developed at Yahoo! Labs starting in 2008 to process user feedback in the context of search advertising. The project was licensed under the Apache License version 2.0 in October 2010. The project documentation is currently available at http://s4.io .

== Rationale ==

Stream computing has been growing steadily over the last 20 years. Recently, however, there has been an explosion in real-time data sources, including the Web, sensor networks, financial securities analysis and trading, traffic monitoring, natural language processing of news and social data, and much more. While Hadoop has evolved into a standard open source solution for batch processing of massive data sets, there is no equivalent community-supported open source platform for processing data streams in real time. While various research projects have evolved into proprietary commercial products, S4 has the potential to fill the gap. Many projects that require a scalable stream processing architecture currently use Hadoop by segmenting the input stream into data batches. This solution is not efficient, results in high latency, and introduces unnecessary complexity. The S4 design is primarily driven by large-scale applications for data mining and machine learning in a production environment. We think that the S4 design is surprisingly flexible and lends itself to run in large clusters built with commodity hardware. S4 enables application programmers to focus more on the application and less on
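The keyed-affinity routing described in the proposal can be sketched in a few lines of Java. This is a hypothetical, single-process illustration of the pattern only, not the actual S4 API; the class and method names (WordCountPE, processEvent, route) are invented for the example:

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of a Processing Element (PE): every event sharing a key
// is routed to the same PE instance, which consumes it. A real PE may
// also emit events downstream or publish results; here it just counts.
public class WordCountPE {
    private final String key; // the key this PE instance is bound to
    private long count = 0;

    public WordCountPE(String key) { this.key = key; }

    // consume one keyed event
    public void processEvent(String word) { count++; }

    public long getCount() { return count; }

    // toy router: lazily create one PE instance per distinct key,
    // mimicking routing "with affinity" on a single node
    static final Map<String, WordCountPE> pes = new HashMap<>();

    static WordCountPE route(String key) {
        return pes.computeIfAbsent(key, WordCountPE::new);
    }

    public static void main(String[] args) {
        for (String w : new String[] {"s4", "stream", "s4"}) {
            route(w).processEvent(w);
        }
        System.out.println(pes.get("s4").getCount());     // 2
        System.out.println(pes.get("stream").getCount()); // 1
    }
}
```

In the distributed setting the map lookup is replaced by hashing the key across cluster nodes, which is what gives the platform its Actors-like encapsulation and location transparency.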
Re: [VOTE] S4 to join the Incubator
Great project. +1 (non-binding) On Tue, Sep 20, 2011 at 1:56 PM, Patrick Hunt ph...@apache.org wrote: ...Please cast your votes: [ ] +1 Accept S4 for incubation [ ] +0 Indifferent to S4 incubation [ ] -1 Reject S4 for incubation ...
Re: [VOTE] S4 to join the Incubator
+1 On Wed, Sep 21, 2011 at 2:26 AM, Patrick Hunt ph...@apache.org wrote: ...Please cast your votes: [ ] +1 Accept S4 for incubation [ ] +0 Indifferent to S4 incubation [ ] -1 Reject S4 for incubation ...
Re: [VOTE] S4 to join the Incubator
On Tue, Sep 20, 2011 at 4:56 PM, Patrick Hunt ph...@apache.org wrote: [ ] +1 Accept S4 for incubation [ ] +0 Indifferent to S4 incubation [ ] -1 Reject S4 for incubation +1 Cheers, Phil
Re: [VOTE] S4 to join the Incubator
+1 (binding) Arun

On Sep 20, 2011, at 1:56 PM, Patrick Hunt wrote: It's been nearly a week since the S4 proposal was submitted for discussion. A few questions were asked, and the proposal was clarified in response. Sufficient mentors have volunteered. I thus feel we are now ready for a vote. The latest proposal can be found at the end of this email and at: http://wiki.apache.org/incubator/S4Proposal The discussion regarding the proposal can be found at: http://s.apache.org/RMU Please cast your votes: [ ] +1 Accept S4 for incubation [ ] +0 Indifferent to S4 incubation [ ] -1 Reject S4 for incubation This vote will close 72 hours from now. Thanks, Patrick

--

= S4 Proposal =

== Abstract ==

S4 (Simple Scalable Streaming System) is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous, unbounded streams of data.

== Proposal ==

S4 is a software platform written in Java. Clients that send and receive events can be written in any programming language. S4 also includes a collection of modules called Processing Elements (PEs for short) that implement basic functionality and can be used by application developers. In S4, keyed data events are routed with affinity to Processing Elements (PEs), which consume the events and do one or both of the following: (1) ''emit'' one or more events that may be consumed by other PEs, (2) ''publish'' results. The architecture resembles the Actors model, providing semantics of encapsulation and location transparency, thus allowing applications to be massively concurrent while exposing a simple programming interface to application developers. To drive adoption and increase the number of contributors to the project, we may need to prioritize the focus based on feedback from the community. We believe that one of the top priorities and driving design principles for the S4 project is to provide a simple API that hides most of the complexity associated with distributed systems and concurrency. The project grew out of the need to provide a flexible platform for application developers and scientists that can be used for quick experimentation and for production.

S4 differs from existing Apache projects in a number of fundamental ways. Flume is an Incubator project that focuses on log processing, performing lightweight processing in a distributed fashion and accumulating log data in a centralized repository for batch processing. S4 instead performs all stream processing in a distributed fashion and enables applications to form arbitrary graphs to process streams of events. We see Flume as a complementary project. We also expect S4 to complement Hadoop processing and in some cases to supersede it. Kafka is another Incubator project that focuses on processing large amounts of stream data. The design of Kafka, however, follows the pub-sub paradigm, which focuses on delivering messages containing arbitrary data from source processes (publishers) to consumer processes (subscribers). Compared to S4, Kafka is an intermediate step between data generation and processing, while S4 is itself a platform for processing streams of events. Overall, S4 addresses the need of existing applications to process streams of events beyond moving data to a centralized repository for batch processing. It complements the features of existing Apache projects, such as Hadoop, Flume, and Kafka, by providing a flexible platform for distributed event processing.

== Background ==

S4 was initially developed at Yahoo! Labs starting in 2008 to process user feedback in the context of search advertising. The project was licensed under the Apache License version 2.0 in October 2010. The project documentation is currently available at http://s4.io .

== Rationale ==

Stream computing has been growing steadily over the last 20 years. Recently, however, there has been an explosion in real-time data sources, including the Web, sensor networks, financial securities analysis and trading, traffic monitoring, natural language processing of news and social data, and much more. While Hadoop has evolved into a standard open source solution for batch processing of massive data sets, there is no equivalent community-supported open source platform for processing data streams in real time. While various research projects have evolved into proprietary commercial products, S4 has the potential to fill this gap. Many projects that require a scalable stream processing architecture currently use Hadoop by segmenting the input stream into data batches. This approach is inefficient, results in high latency, and introduces unnecessary complexity. The S4 design is primarily driven by large scale applications for data mining and machine learning in a production environment. We think that the S4 design
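The PE model described in the proposal (keyed events routed with affinity to a Processing Element per key, which consumes events, may emit further events, and publishes results) can be illustrated with a minimal Java sketch. All names below (Event, ProcessingElement, Router, CountPE) are hypothetical and for illustration only; they are not the actual S4 API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of the PE abstraction; not the real S4 classes.
final class Event {
    final String key;
    final Object value;
    Event(String key, Object value) { this.key = key; this.value = value; }
}

abstract class ProcessingElement {
    protected Router router;            // available for emitting downstream events
    abstract void onEvent(Event e);     // consume one keyed event
    abstract Object publish();          // expose the current result for this key
}

// Routes each event to the PE instance owning its key, creating PE instances
// on demand -- the "affinity" the proposal describes.
class Router {
    private final Map<String, ProcessingElement> pes = new HashMap<>();
    private final Supplier<ProcessingElement> factory;
    Router(Supplier<ProcessingElement> factory) { this.factory = factory; }

    void route(Event e) {
        ProcessingElement pe = pes.computeIfAbsent(e.key, k -> {
            ProcessingElement p = factory.get();
            p.router = this;
            return p;
        });
        pe.onEvent(e);
    }

    Object result(String key) { return pes.get(key).publish(); }
}

// A trivial PE that counts events per key, e.g. word counts over a stream.
class CountPE extends ProcessingElement {
    private long count = 0;
    void onEvent(Event e) { count++; }
    Object publish() { return count; }
}

public class PESketch {
    public static void main(String[] args) {
        Router router = new Router(CountPE::new);
        for (String w : new String[] {"a", "b", "a", "a"}) {
            router.route(new Event(w, null));
        }
        System.out.println(router.result("a")); // prints 3
        System.out.println(router.result("b")); // prints 1
    }
}
```

In a real distributed deployment the routing step would hash the key across cluster nodes rather than a local map, which is what gives the model its location transparency.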
Re: [VOTE] S4 to join the Incubator
+1 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/

From: Patrick Hunt ph...@apache.org To: general@incubator.apache.org Sent: Tuesday, September 20, 2011 4:56 PM Subject: [VOTE] S4 to join the Incubator
Re: [VOTE] S4 to join the Incubator
+1 (non-binding)

On Tue, Sep 20, 2011 at 4:56 PM, Patrick Hunt ph...@apache.org wrote: [ ] +1 Accept S4 for incubation [ ] +0 Indifferent to S4 incubation [ ] -1 Reject S4 for incubation
Re: [VOTE] S4 to join the Incubator
+1 (non-binding) +Vinod

On Wed, Sep 21, 2011 at 2:26 AM, Patrick Hunt ph...@apache.org wrote: [ ] +1 Accept S4 for incubation [ ] +0 Indifferent to S4 incubation [ ] -1 Reject S4 for incubation