Re: [VOTE] S4 to join the Incubator

2011-09-28 Thread Sajeevan Achuthan
Hi Leo,

I am Sajeevan, and I work for Ericsson, Ireland. I have 13 years of
experience in Java technologies and distributed computing.
   We (Ericsson) are looking for distributed streaming projects for
telecommunication device performance monitoring and mobile phone user
experience analysis.
   This project is very interesting. I have plenty of experience in TCP/IP
data stream processing and am very interested in joining this project and
helping to implement it.
   If you are interested, you can add me to the committers list.
Thanks
Sajeevan

On 27 September 2011 18:23, Flavio Junqueira f...@s4.io wrote:

 I'm thrilled to see that it passed. Thanks for all the support so far, and
 I'm looking forward to setting it up and getting the project going.

 -Flavio


 On Sep 26, 2011, at 6:47 PM, Patrick Hunt wrote:

  This passes, with 16 +1 votes, plenty of them binding, and no -1 votes.

 Thanks to all who voted!

 We can now get started creating the Apache S4 podling.

 Patrick

 On Tue, Sep 20, 2011 at 1:56 PM, Patrick Hunt ph...@apache.org wrote:

 It's been nearly a week since the S4 proposal was submitted for
 discussion.  A few questions were asked, and the proposal was clarified
 in response.  Sufficient mentors have volunteered.  I thus feel we are
 now ready for a vote.

 The latest proposal can be found at the end of this email and at:

  
  http://wiki.apache.org/incubator/S4Proposal

 The discussion regarding the proposal can be found at:

  http://s.apache.org/RMU

 Please cast your votes:

 [  ] +1 Accept S4 for incubation
 [  ] +0 Indifferent to S4 incubation
 [  ] -1 Reject S4 for incubation

 This vote will close 72 hours from now.

 Thanks,

 Patrick

 --
 = S4 Proposal =

 == Abstract ==

 S4 (Simple Scalable Streaming System) is a general-purpose,
 distributed, scalable, partially fault-tolerant, pluggable platform
 that allows programmers to easily develop applications for processing
 continuous, unbounded streams of data.

 == Proposal ==

 S4 is a software platform written in Java. Clients that send and
 receive events can be written in any programming language. S4 also
 includes a collection of modules called Processing Elements (or PEs
 for short) that implement basic functionality and can be used by
 application developers. In S4, keyed data events are routed with
 affinity to Processing Elements (PEs), which consume the events and do
 one or both of the following: (1) ''emit'' one or more events which
 may be consumed by other PEs, (2) ''publish'' results. The
 architecture resembles the Actors model, providing semantics of
 encapsulation and location transparency, thus allowing applications to
 be massively concurrent while exposing a simple programming  interface
 to application developers.
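
 As an illustration of the keyed routing described above, here is a minimal
 Java sketch. It is not the S4 API: KeyedEvent, CountingPE, and KeyedRouter
 are hypothetical names, and a single in-process map stands in for the
 cluster-wide routing that the proposal describes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event type: a routing key plus a payload. Not the S4 API.
final class KeyedEvent {
    final String key;
    final String payload;

    KeyedEvent(String key, String payload) {
        this.key = key;
        this.payload = payload;
    }
}

// Hypothetical PE: one instance per key, counting the events it has consumed.
final class CountingPE {
    private final String key;
    private int count;

    CountingPE(String key) {
        this.key = key;
    }

    // Consume one event and emit a new event carrying the running count.
    KeyedEvent onEvent(KeyedEvent event) {
        count++;
        return new KeyedEvent(key, Integer.toString(count));
    }
}

// Routes every event to the PE instance that owns its key (key affinity).
final class KeyedRouter {
    private final Map<String, CountingPE> pes = new HashMap<>();
    private final List<KeyedEvent> published = new ArrayList<>();

    void dispatch(KeyedEvent event) {
        // Create the PE for this key on first use, then reuse it afterwards.
        CountingPE pe = pes.computeIfAbsent(event.key, CountingPE::new);
        published.add(pe.onEvent(event));
    }

    List<KeyedEvent> published() {
        return published;
    }

    int peCount() {
        return pes.size();
    }
}
```

 Because all events with the same key reach the same PE instance, each PE can
 keep its state in plain fields, with no locking in this single-threaded
 sketch.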

 To drive adoption and increase the number of contributors to the
 project, we may need to prioritize the focus based on feedback from
 the community. We believe that one of the top priorities and driving
 design principle for the S4 project is to provide a simple API that
 hides most of the complexity associated with distributed systems and
 concurrency. The project grew out of the need to provide a flexible
 platform for application developers and scientists that can be used
 for quick experimentation and production.

 S4 differs from existing Apache projects in a number of fundamental
 ways. Flume is an Incubator project that focuses on log processing,
 performing lightweight processing in a distributed fashion and
 accumulating log data in a centralized repository for batch
 processing. S4 instead performs all stream processing in a distributed
 fashion and enables applications to form arbitrary graphs to process
 streams of events. We see Flume as a complementary project. We also
 expect S4 to complement Hadoop processing and in some cases to
 supersede it. Kafka is another Incubator project that focuses on
 processing large amounts of stream data. The design of Kafka, however,
 follows the pub-sub paradigm, which focuses on delivering messages
 containing arbitrary data from source processes (publishers) to
 consumer processes (subscribers). Compared to S4, Kafka is an
 intermediate step between data generation and processing, while S4 is
 itself a platform for processing streams of events.

 S4 overall addresses a need of existing applications to process
 streams of events beyond moving data to a centralized repository for
 batch processing. It complements the features of existing Apache
 projects, such as Hadoop, Flume, and Kafka, by providing a flexible
 platform for distributed event processing.

 == Background ==

 S4 was initially developed at Yahoo! Labs starting in 2008 to process
 user feedback in the context of search advertising. The project was
 licensed under the Apache License version 2.0 in October 2010. The
 project documentation is currently available at http://s4.io .

 == Rationale ==

 Stream computing 

Re: [VOTE] S4 to join the Incubator

2011-09-28 Thread Leo Neumeyer
Hi Sajeevan,

This is great! We really need people with your background and experience. We
are just putting things together, including some minimal processes for people
who want to join. We will announce them shortly. In the meantime, please clone
this repo to get started with the latest code:
https://github.com/leoneu/s4-piper
Here you can see the API I am proposing for the next S4 release. The
experimental integration with the communication layer is being done here:
https://github.com/brucerobbins/s4-piper-commlayer_experiment
Once we start the new repository in the Incubator, we will merge everything in
one place. The communication layer is an abstraction that makes it possible to
implement network communication using any framework. We have a simple
UDP-based implementation and the new Netty-based implementation. If you can
help with the design and code of the Netty implementation or suggest other
ideas, that would be extremely valuable.
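
To illustrate what such an abstraction might look like, here is a minimal Java
sketch. The Transport interface and both implementations are assumptions made
for illustration, not the actual s4-piper communication layer; a Netty-backed
class would implement the same interface.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

// Hypothetical transport abstraction; names are illustrative, not the s4-piper API.
interface Transport {
    void send(byte[] message);
    byte[] receive();
    void close();
}

// UDP-backed transport bound to the loopback interface; it sends to its own port.
final class UdpLoopbackTransport implements Transport {
    private final DatagramSocket socket;

    UdpLoopbackTransport() {
        try {
            // Port 0 asks the operating system for any free port.
            socket = new DatagramSocket(0, InetAddress.getLoopbackAddress());
            socket.setSoTimeout(2000); // fail after 2 s instead of blocking forever
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void send(byte[] message) {
        try {
            socket.send(new DatagramPacket(message, message.length,
                    InetAddress.getLoopbackAddress(), socket.getLocalPort()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public byte[] receive() {
        try {
            byte[] buffer = new byte[1500];
            DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
            socket.receive(packet);
            // Trim the buffer to the number of bytes actually received.
            return Arrays.copyOf(packet.getData(), packet.getLength());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        socket.close();
    }
}

// In-memory transport; stands in for any other framework and is handy in tests.
final class InMemoryTransport implements Transport {
    private final Queue<byte[]> queue = new ArrayDeque<>();

    @Override
    public void send(byte[] message) {
        queue.add(message.clone());
    }

    @Override
    public byte[] receive() {
        byte[] next = queue.poll();
        if (next == null) {
            throw new IllegalStateException("no message available");
        }
        return next;
    }

    @Override
    public void close() {
        queue.clear();
    }
}

// Application code depends only on the interface, not on the transport behind it.
final class Echo {
    static String roundTrip(Transport transport, String text) {
        transport.send(text.getBytes(StandardCharsets.UTF_8));
        return new String(transport.receive(), StandardCharsets.UTF_8);
    }
}
```

Calling Echo.roundTrip with either transport exercises the same application
code, which is the point of keeping the network framework behind an interface.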

thanks!
-leo 

On Sep 28, 2011, at 12:36 PM, Sajeevan Achuthan wrote:

 Hi Leo,
 
 I am Sajeevan, and I work for Ericsson, Ireland. I have 13 years of
 experience in Java technologies and distributed computing.
    We (Ericsson) are looking for distributed streaming projects for
 telecommunication device performance monitoring and mobile phone user
 experience analysis.
    This project is very interesting. I have plenty of experience in TCP/IP
 data stream processing and am very interested in joining this project and
 helping to implement it.
    If you are interested, you can add me to the committers list.
 Thanks
 Sajeevan
 
 On 27 September 2011 18:23, Flavio Junqueira f...@s4.io wrote:
 
 I'm thrilled to see that it passed. Thanks for all the support so far, and
 I'm looking forward to setting it up and getting the project going.
 
 -Flavio
 
 
 On Sep 26, 2011, at 6:47 PM, Patrick Hunt wrote:
 
 This passes, with 16 +1 votes, plenty of them binding, and no -1 votes.
 
 Thanks to all who voted!
 
 We can now get started creating the Apache S4 podling.
 
 Patrick
 
 On Tue, Sep 20, 2011 at 1:56 PM, Patrick Hunt ph...@apache.org wrote:
 
 It's been nearly a week since the S4 proposal was submitted for
 discussion.  A few questions were asked, and the proposal was clarified
 in response.  Sufficient mentors have volunteered.  I thus feel we are
 now ready for a vote.
 
 The latest proposal can be found at the end of this email and at:
 
 http://wiki.apache.org/incubator/S4Proposal
 
 The discussion regarding the proposal can be found at:
 
 http://s.apache.org/RMU
 
 Please cast your votes:
 
 [  ] +1 Accept S4 for incubation
 [  ] +0 Indifferent to S4 incubation
 [  ] -1 Reject S4 for incubation
 
 This vote will close 72 hours from now.
 
 Thanks,
 
 Patrick
 
 --
 = S4 Proposal =
 
 == Abstract ==
 
 S4 (Simple Scalable Streaming System) is a general-purpose,
 distributed, scalable, partially fault-tolerant, pluggable platform
 that allows programmers to easily develop applications for processing
 continuous, unbounded streams of data.
 
 == Proposal ==
 
 S4 is a software platform written in Java. Clients that send and
 receive events can be written in any programming language. S4 also
 includes a collection of modules called Processing Elements (or PEs
 for short) that implement basic functionality and can be used by
 application developers. In S4, keyed data events are routed with
 affinity to Processing Elements (PEs), which consume the events and do
 one or both of the following: (1) ''emit'' one or more events which
 may be consumed by other PEs, (2) ''publish'' results. The
 architecture resembles the Actors model, providing semantics of
 encapsulation and location transparency, thus allowing applications to
 be massively concurrent while exposing a simple programming  interface
 to application developers.
 
 To drive adoption and increase the number of contributors to the
 project, we may need to prioritize the focus based on feedback from
 the community. We believe that one of the top priorities and driving
 design principle for the S4 project is to provide a simple API that
 hides most of the complexity associated with distributed systems and
 concurrency. The project grew out of the need to provide a flexible
 platform for application developers and scientists that can be used
 for quick experimentation and production.
 
 S4 differs from existing Apache projects in a number of fundamental
 ways. Flume is an Incubator project that focuses on log processing,
 performing lightweight processing in a distributed fashion and
 accumulating log data in a centralized repository for batch
 processing. S4 instead performs all stream processing in a distributed
 fashion and enables applications to form arbitrary graphs to process
 streams of events. We see Flume as a complementary project. We also
 expect S4 to complement Hadoop processing and in some cases to
 supersede it. Kafka is another Incubator project that 

Re: [VOTE] S4 to join the Incubator

2011-09-27 Thread Flavio Junqueira
I'm thrilled to see that it passed. Thanks for all the support so far,  
and I'm looking forward to setting it up and getting the project going.


-Flavio

On Sep 26, 2011, at 6:47 PM, Patrick Hunt wrote:

This passes, with 16 +1 votes, plenty of them binding, and no -1 votes.


Thanks to all who voted!

We can now get started creating the Apache S4 podling.

Patrick

On Tue, Sep 20, 2011 at 1:56 PM, Patrick Hunt ph...@apache.org wrote:

It's been nearly a week since the S4 proposal was submitted for
discussion.  A few questions were asked, and the proposal was clarified
in response.  Sufficient mentors have volunteered.  I thus feel we are
now ready for a vote.

The latest proposal can be found at the end of this email and at:

 http://wiki.apache.org/incubator/S4Proposal

The discussion regarding the proposal can be found at:

 http://s.apache.org/RMU

Please cast your votes:

[  ] +1 Accept S4 for incubation
[  ] +0 Indifferent to S4 incubation
[  ] -1 Reject S4 for incubation

This vote will close 72 hours from now.

Thanks,

Patrick

--
= S4 Proposal =

== Abstract ==

S4 (Simple Scalable Streaming System) is a general-purpose,
distributed, scalable, partially fault-tolerant, pluggable platform
that allows programmers to easily develop applications for processing
continuous, unbounded streams of data.

== Proposal ==

S4 is a software platform written in Java. Clients that send and
receive events can be written in any programming language. S4 also
includes a collection of modules called Processing Elements (or PEs
for short) that implement basic functionality and can be used by
application developers. In S4, keyed data events are routed with
affinity to Processing Elements (PEs), which consume the events and do
one or both of the following: (1) ''emit'' one or more events which
may be consumed by other PEs, (2) ''publish'' results. The
architecture resembles the Actors model, providing semantics of
encapsulation and location transparency, thus allowing applications to
be massively concurrent while exposing a simple programming interface
to application developers.

To drive adoption and increase the number of contributors to the
project, we may need to prioritize the focus based on feedback from
the community. We believe that one of the top priorities and driving
design principle for the S4 project is to provide a simple API that
hides most of the complexity associated with distributed systems and
concurrency. The project grew out of the need to provide a flexible
platform for application developers and scientists that can be used
for quick experimentation and production.

S4 differs from existing Apache projects in a number of fundamental
ways. Flume is an Incubator project that focuses on log processing,
performing lightweight processing in a distributed fashion and
accumulating log data in a centralized repository for batch
processing. S4 instead performs all stream processing in a distributed
fashion and enables applications to form arbitrary graphs to process
streams of events. We see Flume as a complementary project. We also
expect S4 to complement Hadoop processing and in some cases to
supersede it. Kafka is another Incubator project that focuses on
processing large amounts of stream data. The design of Kafka, however,
follows the pub-sub paradigm, which focuses on delivering messages
containing arbitrary data from source processes (publishers) to
consumer processes (subscribers). Compared to S4, Kafka is an
intermediate step between data generation and processing, while S4 is
itself a platform for processing streams of events.

S4 overall addresses a need of existing applications to process
streams of events beyond moving data to a centralized repository for
batch processing. It complements the features of existing Apache
projects, such as Hadoop, Flume, and Kafka, by providing a flexible
platform for distributed event processing.

== Background ==

S4 was initially developed at Yahoo! Labs starting in 2008 to process
user feedback in the context of search advertising. The project was
licensed under the Apache License version 2.0 in October 2010. The
project documentation is currently available at http://s4.io .

== Rationale ==

Stream computing has been growing steadily over the last 20 years.
However, recently there has been an explosion in real-time data
sources including the Web, sensor networks, financial securities
analysis and trading, traffic monitoring, natural language processing
of news and social data, and much more.

As Hadoop evolved as a standard open source solution for batch
processing of massive data sets, there is no equivalent community
supported open source platform for processing data streams in
real-time. While various research projects have evolved into
proprietary commercial products, S4 has the potential to fill the gap.

Many projects that require a scalable stream processing architecture
currently use Hadoop by segmenting 

Re: [VOTE] S4 to join the Incubator

2011-09-26 Thread Patrick Hunt
This passes, with 16 +1 votes, plenty of them binding, and no -1 votes.

Thanks to all who voted!

We can now get started creating the Apache S4 podling.

Patrick

On Tue, Sep 20, 2011 at 1:56 PM, Patrick Hunt ph...@apache.org wrote:
 It's been nearly a week since the S4 proposal was submitted for
 discussion.  A few questions were asked, and the proposal was clarified
 in response.  Sufficient mentors have volunteered.  I thus feel we are
 now ready for a vote.

 The latest proposal can be found at the end of this email and at:

  http://wiki.apache.org/incubator/S4Proposal

 The discussion regarding the proposal can be found at:

  http://s.apache.org/RMU

 Please cast your votes:

 [  ] +1 Accept S4 for incubation
 [  ] +0 Indifferent to S4 incubation
 [  ] -1 Reject S4 for incubation

 This vote will close 72 hours from now.

 Thanks,

 Patrick

 --
 = S4 Proposal =

 == Abstract ==

 S4 (Simple Scalable Streaming System) is a general-purpose,
 distributed, scalable, partially fault-tolerant, pluggable platform
 that allows programmers to easily develop applications for processing
 continuous, unbounded streams of data.

 == Proposal ==

 S4 is a software platform written in Java. Clients that send and
 receive events can be written in any programming language. S4 also
 includes a collection of modules called Processing Elements (or PEs
 for short) that implement basic functionality and can be used by
 application developers. In S4, keyed data events are routed with
 affinity to Processing Elements (PEs), which consume the events and do
 one or both of the following: (1) ''emit'' one or more events which
 may be consumed by other PEs, (2) ''publish'' results. The
 architecture resembles the Actors model, providing semantics of
 encapsulation and location transparency, thus allowing applications to
 be massively concurrent while exposing a simple programming  interface
 to application developers.

 To drive adoption and increase the number of contributors to the
 project, we may need to prioritize the focus based on feedback from
 the community. We believe that one of the top priorities and driving
 design principle for the S4 project is to provide a simple API that
 hides most of the complexity associated with distributed systems and
 concurrency. The project grew out of the need to provide a flexible
 platform for application developers and scientists that can be used
 for quick experimentation and production.

 S4 differs from existing Apache projects in a number of fundamental
 ways. Flume is an Incubator project that focuses on log processing,
 performing lightweight processing in a distributed fashion and
 accumulating log data in a centralized repository for batch
 processing. S4 instead performs all stream processing in a distributed
 fashion and enables applications to form arbitrary graphs to process
 streams of events. We see Flume as a complementary project. We also
 expect S4 to complement Hadoop processing and in some cases to
 supersede it. Kafka is another Incubator project that focuses on
 processing large amounts of stream data. The design of Kafka, however,
 follows the pub-sub paradigm, which focuses on delivering messages
 containing arbitrary data from source processes (publishers) to
 consumer processes (subscribers). Compared to S4, Kafka is an
 intermediate step between data generation and processing, while S4 is
 itself a platform for processing streams of events.

 S4 overall addresses a need of existing applications to process
 streams of events beyond moving data to a centralized repository for
 batch processing. It complements the features of existing Apache
 projects, such as Hadoop, Flume, and Kafka, by providing a flexible
 platform for distributed event processing.

 == Background ==

 S4 was initially developed at Yahoo! Labs starting in 2008 to process
 user feedback in the context of search advertising. The project was
 licensed under the Apache License version 2.0 in October 2010. The
 project documentation is currently available at http://s4.io .

 == Rationale ==

 Stream computing has been growing steadily over the last 20 years.
 However, recently there has been an explosion in real-time data
 sources including the Web, sensor networks, financial securities
 analysis and trading, traffic monitoring, natural language processing
 of news and social data, and much more.

 As Hadoop evolved as a standard open source solution for batch
 processing of massive data sets, there is no equivalent community
 supported open source platform for processing data streams in
 real-time. While various research projects have evolved into
 proprietary commercial products, S4 has the potential to fill the gap.
 Many projects that require a scalable stream processing architecture
 currently use Hadoop by segmenting the input stream into data batches.
 This solution is not efficient, results in high latency, and
 introduces unnecessary complexity.

 The S4 design is 

Re: [VOTE] S4 to join the Incubator

2011-09-26 Thread Leo Neumeyer
Thank you all for your support, looking forward to working with the Apache 
community.

-leo

On Sep 26, 2011, at 9:47 AM, Patrick Hunt wrote:

 This passes, with 16 +1 votes, plenty of them binding, and no -1 votes.
 
 Thanks to all who voted!
 
 We can now get started creating the Apache S4 podling.
 
 Patrick
 
 On Tue, Sep 20, 2011 at 1:56 PM, Patrick Hunt ph...@apache.org wrote:
 It's been nearly a week since the S4 proposal was submitted for
 discussion.  A few questions were asked, and the proposal was clarified
 in response.  Sufficient mentors have volunteered.  I thus feel we are
 now ready for a vote.
 
 The latest proposal can be found at the end of this email and at:
 
  http://wiki.apache.org/incubator/S4Proposal
 
 The discussion regarding the proposal can be found at:
 
  http://s.apache.org/RMU
 
 Please cast your votes:
 
 [  ] +1 Accept S4 for incubation
 [  ] +0 Indifferent to S4 incubation
 [  ] -1 Reject S4 for incubation
 
 This vote will close 72 hours from now.
 
 Thanks,
 
 Patrick
 
 --
 = S4 Proposal =
 
 == Abstract ==
 
 S4 (Simple Scalable Streaming System) is a general-purpose,
 distributed, scalable, partially fault-tolerant, pluggable platform
 that allows programmers to easily develop applications for processing
 continuous, unbounded streams of data.
 
 == Proposal ==
 
 S4 is a software platform written in Java. Clients that send and
 receive events can be written in any programming language. S4 also
 includes a collection of modules called Processing Elements (or PEs
 for short) that implement basic functionality and can be used by
 application developers. In S4, keyed data events are routed with
 affinity to Processing Elements (PEs), which consume the events and do
 one or both of the following: (1) ''emit'' one or more events which
 may be consumed by other PEs, (2) ''publish'' results. The
 architecture resembles the Actors model, providing semantics of
 encapsulation and location transparency, thus allowing applications to
 be massively concurrent while exposing a simple programming  interface
 to application developers.
 
 To drive adoption and increase the number of contributors to the
 project, we may need to prioritize the focus based on feedback from
 the community. We believe that one of the top priorities and driving
 design principle for the S4 project is to provide a simple API that
 hides most of the complexity associated with distributed systems and
 concurrency. The project grew out of the need to provide a flexible
 platform for application developers and scientists that can be used
 for quick experimentation and production.
 
 S4 differs from existing Apache projects in a number of fundamental
 ways. Flume is an Incubator project that focuses on log processing,
 performing lightweight processing in a distributed fashion and
 accumulating log data in a centralized repository for batch
 processing. S4 instead performs all stream processing in a distributed
 fashion and enables applications to form arbitrary graphs to process
 streams of events. We see Flume as a complementary project. We also
 expect S4 to complement Hadoop processing and in some cases to
 supersede it. Kafka is another Incubator project that focuses on
 processing large amounts of stream data. The design of Kafka, however,
 follows the pub-sub paradigm, which focuses on delivering messages
 containing arbitrary data from source processes (publishers) to
 consumer processes (subscribers). Compared to S4, Kafka is an
 intermediate step between data generation and processing, while S4 is
 itself a platform for processing streams of events.
 
 S4 overall addresses a need of existing applications to process
 streams of events beyond moving data to a centralized repository for
 batch processing. It complements the features of existing Apache
 projects, such as Hadoop, Flume, and Kafka, by providing a flexible
 platform for distributed event processing.
 
 == Background ==
 
 S4 was initially developed at Yahoo! Labs starting in 2008 to process
 user feedback in the context of search advertising. The project was
 licensed under the Apache License version 2.0 in October 2010. The
 project documentation is currently available at http://s4.io .
 
 == Rationale ==
 
 Stream computing has been growing steadily over the last 20 years.
 However, recently there has been an explosion in real-time data
 sources including the Web, sensor networks, financial securities
 analysis and trading, traffic monitoring, natural language processing
 of news and social data, and much more.
 
 As Hadoop evolved as a standard open source solution for batch
 processing of massive data sets, there is no equivalent community
 supported open source platform for processing data streams in
 real-time. While various research projects have evolved into
 proprietary commercial products, S4 has the potential to fill the gap.
 Many projects that require a scalable stream processing architecture
 currently 

Re: [VOTE] S4 to join the Incubator

2011-09-24 Thread Doug Cutting
+1

Doug
On Sep 20, 2011 1:57 PM, Patrick Hunt ph...@apache.org wrote:
 It's been nearly a week since the S4 proposal was submitted for
 discussion. A few questions were asked, and the proposal was clarified
 in response. Sufficient mentors have volunteered. I thus feel we are
 now ready for a vote.

 The latest proposal can be found at the end of this email and at:

 http://wiki.apache.org/incubator/S4Proposal

 The discussion regarding the proposal can be found at:

 http://s.apache.org/RMU

 Please cast your votes:

 [ ] +1 Accept S4 for incubation
 [ ] +0 Indifferent to S4 incubation
 [ ] -1 Reject S4 for incubation

 This vote will close 72 hours from now.

 Thanks,

 Patrick

 --
 = S4 Proposal =

 == Abstract ==

 S4 (Simple Scalable Streaming System) is a general-purpose,
 distributed, scalable, partially fault-tolerant, pluggable platform
 that allows programmers to easily develop applications for processing
 continuous, unbounded streams of data.

 == Proposal ==

 S4 is a software platform written in Java. Clients that send and
 receive events can be written in any programming language. S4 also
 includes a collection of modules called Processing Elements (or PEs
 for short) that implement basic functionality and can be used by
 application developers. In S4, keyed data events are routed with
 affinity to Processing Elements (PEs), which consume the events and do
 one or both of the following: (1) ''emit'' one or more events which
 may be consumed by other PEs, (2) ''publish'' results. The
 architecture resembles the Actors model, providing semantics of
 encapsulation and location transparency, thus allowing applications to
 be massively concurrent while exposing a simple programming interface
 to application developers.

 To drive adoption and increase the number of contributors to the
 project, we may need to prioritize the focus based on feedback from
 the community. We believe that one of the top priorities and driving
 design principle for the S4 project is to provide a simple API that
 hides most of the complexity associated with distributed systems and
 concurrency. The project grew out of the need to provide a flexible
 platform for application developers and scientists that can be used
 for quick experimentation and production.

 S4 differs from existing Apache projects in a number of fundamental
 ways. Flume is an Incubator project that focuses on log processing,
 performing lightweight processing in a distributed fashion and
 accumulating log data in a centralized repository for batch
 processing. S4 instead performs all stream processing in a distributed
 fashion and enables applications to form arbitrary graphs to process
 streams of events. We see Flume as a complementary project. We also
 expect S4 to complement Hadoop processing and in some cases to
 supersede it. Kafka is another Incubator project that focuses on
 processing large amounts of stream data. The design of Kafka, however,
 follows the pub-sub paradigm, which focuses on delivering messages
 containing arbitrary data from source processes (publishers) to
 consumer processes (subscribers). Compared to S4, Kafka is an
 intermediate step between data generation and processing, while S4 is
 itself a platform for processing streams of events.

 S4 overall addresses a need of existing applications to process
 streams of events beyond moving data to a centralized repository for
 batch processing. It complements the features of existing Apache
 projects, such as Hadoop, Flume, and Kafka, by providing a flexible
 platform for distributed event processing.

 == Background ==

 S4 was initially developed at Yahoo! Labs starting in 2008 to process
 user feedback in the context of search advertising. The project was
 licensed under the Apache License version 2.0 in October 2010. The
 project documentation is currently available at http://s4.io .

 == Rationale ==

 Stream computing has been growing steadily over the last 20 years.
 However, recently there has been an explosion in real-time data
 sources including the Web, sensor networks, financial securities
 analysis and trading, traffic monitoring, natural language processing
 of news and social data, and much more.

 As Hadoop evolved as a standard open source solution for batch
 processing of massive data sets, there is no equivalent community
 supported open source platform for processing data streams in
 real-time. While various research projects have evolved into
 proprietary commercial products, S4 has the potential to fill the gap.
 Many projects that require a scalable stream processing architecture
 currently use Hadoop by segmenting the input stream into data batches.
 This solution is not efficient, results in high latency, and
 introduces unnecessary complexity.

 The S4 design is primarily driven by large scale applications for data
 mining and machine learning in a production environment. We think that
 the S4 design is surprisingly flexible and 

Re: [VOTE] S4 to join the Incubator

2011-09-21 Thread adam wojtuniak
+1

Cheers,
Adam


On Tue, Sep 20, 2011 at 9:56 PM, Patrick Hunt ph...@apache.org wrote:

 It's been nearly a week since the S4 proposal was submitted for
 discussion.  A few questions were asked, and the proposal was clarified
 in response.  Sufficient mentors have volunteered.  I thus feel we are
 now ready for a vote.

 The latest proposal can be found at the end of this email and at:

  http://wiki.apache.org/incubator/S4Proposal

 The discussion regarding the proposal can be found at:

  http://s.apache.org/RMU

 Please cast your votes:

 [  ] +1 Accept S4 for incubation
 [  ] +0 Indifferent to S4 incubation
 [  ] -1 Reject S4 for incubation

 This vote will close 72 hours from now.

 Thanks,

 Patrick


Re: [VOTE] S4 to join the Incubator

2011-09-21 Thread Bertrand Delacretaz
On Tue, Sep 20, 2011 at 10:56 PM, Patrick Hunt ph...@apache.org wrote:
 ...Please cast your votes:

 [ X] +1 Accept S4 for incubation

...

  * Matthieu Morel (mm at s4 dot io)
  * Anish Nair (an at s4 dot com)...

Shouldn't that be s4 dot io instead?

-Bertrand

-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: [VOTE] S4 to join the Incubator

2011-09-21 Thread Olivier Lamy
+1 (binding)

2011/9/20 Patrick Hunt ph...@apache.org:
 It's been nearly a week since the S4 proposal was submitted for
 discussion.  A few questions were asked, and the proposal was clarified
 in response.  Sufficient mentors have volunteered.  I thus feel we are
 now ready for a vote.

 The latest proposal can be found at the end of this email and at:

  http://wiki.apache.org/incubator/S4Proposal

 The discussion regarding the proposal can be found at:

  http://s.apache.org/RMU

 Please cast your votes:

 [  ] +1 Accept S4 for incubation
 [  ] +0 Indifferent to S4 incubation
 [  ] -1 Reject S4 for incubation

 This vote will close 72 hours from now.

 Thanks,

 Patrick


Re: [VOTE] S4 to join the Incubator

2011-09-21 Thread Jean-Baptiste Onofré

+1 (binding)

Regards
JB

On 09/21/2011 11:04 AM, Olivier Lamy wrote:

+1 (binding)

2011/9/20 Patrick Hunt ph...@apache.org:

It's been nearly a week since the S4 proposal was submitted for
discussion.  A few questions were asked, and the proposal was clarified
in response.  Sufficient mentors have volunteered.  I thus feel we are
now ready for a vote.

The latest proposal can be found at the end of this email and at:

  http://wiki.apache.org/incubator/S4Proposal

The discussion regarding the proposal can be found at:

  http://s.apache.org/RMU

Please cast your votes:

[  ] +1 Accept S4 for incubation
[  ] +0 Indifferent to S4 incubation
[  ] -1 Reject S4 for incubation

This vote will close 72 hours from now.

Thanks,

Patrick


Re: [VOTE] S4 to join the Incubator

2011-09-21 Thread Flavio Junqueira

Thanks for pointing it out, Bertrand. I have just fixed it on the wiki.

-Flavio

On Sep 21, 2011, at 10:51 AM, Bertrand Delacretaz wrote:

On Tue, Sep 20, 2011 at 10:56 PM, Patrick Hunt ph...@apache.org wrote:

...Please cast your votes:

[ X] +1 Accept S4 for incubation


...


 * Matthieu Morel (mm at s4 dot io)
 * Anish Nair (an at s4 dot com)...


Shouldn't that be s4 dot io instead?

-Bertrand




Re: [VOTE] S4 to join the Incubator

2011-09-21 Thread Julien Vermillard
+1 (binding)

On Wed, Sep 21, 2011 at 12:05 PM, Flavio Junqueira f...@s4.io wrote:
 Thanks for pointing it out, Bertrand. I have just fixed it on the wiki.

 -Flavio

 On Sep 21, 2011, at 10:51 AM, Bertrand Delacretaz wrote:

 On Tue, Sep 20, 2011 at 10:56 PM, Patrick Hunt ph...@apache.org wrote:

 ...Please cast your votes:

 [ X] +1 Accept S4 for incubation

 ...

  * Matthieu Morel (mm at s4 dot io)
  * Anish Nair (an at s4 dot com)...

 Shouldn't that be s4 dot io instead?

 -Bertrand




Re: [VOTE] S4 to join the Incubator

2011-09-21 Thread Tim Williams
On Tue, Sep 20, 2011 at 4:56 PM, Patrick Hunt ph...@apache.org wrote:
 It's been nearly a week since the S4 proposal was submitted for
 discussion.  A few questions were asked, and the proposal was clarified
 in response.  Sufficient mentors have volunteered.  I thus feel we are
 now ready for a vote.

 The latest proposal can be found at the end of this email and at:

  http://wiki.apache.org/incubator/S4Proposal

 The discussion regarding the proposal can be found at:

  http://s.apache.org/RMU

 Please cast your votes:

 [  ] +1 Accept S4 for incubation
 [  ] +0 Indifferent to S4 incubation
 [  ] -1 Reject S4 for incubation

+1

--tim




Re: [VOTE] S4 to join the Incubator

2011-09-21 Thread Raffaele P. Guidi
+1 (non binding) great project

On Wed, Sep 21, 2011 at 2:41 PM, Tim Williams william...@gmail.com wrote:

 On Tue, Sep 20, 2011 at 4:56 PM, Patrick Hunt ph...@apache.org wrote:
  It's been nearly a week since the S4 proposal was submitted for
  discussion.  A few questions were asked, and the proposal was clarified
  in response.  Sufficient mentors have volunteered.  I thus feel we are
  now ready for a vote.
 
  The latest proposal can be found at the end of this email and at:
 
   http://wiki.apache.org/incubator/S4Proposal
 
  The discussion regarding the proposal can be found at:
 
   http://s.apache.org/RMU
 
  Please cast your votes:
 
  [  ] +1 Accept S4 for incubation
  [  ] +0 Indifferent to S4 incubation
  [  ] -1 Reject S4 for incubation

 +1

 --tim





Re: [VOTE] S4 to join the Incubator

2011-09-21 Thread Patrick Hunt
+1 (binding)

Patrick

On Wed, Sep 21, 2011 at 6:28 AM, Raffaele P. Guidi
raffaele.p.gu...@gmail.com wrote:
 +1 (non binding) great project

 On Wed, Sep 21, 2011 at 2:41 PM, Tim Williams william...@gmail.com wrote:

 On Tue, Sep 20, 2011 at 4:56 PM, Patrick Hunt ph...@apache.org wrote:
  It's been nearly a week since the S4 proposal was submitted for
  discussion.  A few questions were asked, and the proposal was clarified
  in response.  Sufficient mentors have volunteered.  I thus feel we are
  now ready for a vote.
 
  The latest proposal can be found at the end of this email and at:
 
   http://wiki.apache.org/incubator/S4Proposal
 
  The discussion regarding the proposal can be found at:
 
   http://s.apache.org/RMU
 
  Please cast your votes:
 
  [  ] +1 Accept S4 for incubation
  [  ] +0 Indifferent to S4 incubation
  [  ] -1 Reject S4 for incubation

 +1

 --tim




[VOTE] S4 to join the Incubator

2011-09-20 Thread Patrick Hunt
It's been nearly a week since the S4 proposal was submitted for
discussion.  A few questions were asked, and the proposal was clarified
in response.  Sufficient mentors have volunteered.  I thus feel we are
now ready for a vote.

The latest proposal can be found at the end of this email and at:

 http://wiki.apache.org/incubator/S4Proposal

The discussion regarding the proposal can be found at:

 http://s.apache.org/RMU

Please cast your votes:

[  ] +1 Accept S4 for incubation
[  ] +0 Indifferent to S4 incubation
[  ] -1 Reject S4 for incubation

This vote will close 72 hours from now.

Thanks,

Patrick

--
= S4 Proposal =

== Abstract ==

S4 (Simple Scalable Streaming System) is a general-purpose,
distributed, scalable, partially fault-tolerant, pluggable platform
that allows programmers to easily develop applications for processing
continuous, unbounded streams of data.

== Proposal ==

S4 is a software platform written in Java. Clients that send and
receive events can be written in any programming language. S4 also
includes a collection of modules called Processing Elements (or PEs
for short) that implement basic functionality and can be used by
application developers. In S4, keyed data events are routed with
affinity to Processing Elements (PEs), which consume the events and do
one or both of the following: (1) ''emit'' one or more events which
may be consumed by other PEs, (2) ''publish'' results. The
architecture resembles the Actors model, providing semantics of
encapsulation and location transparency, thus allowing applications to
be massively concurrent while exposing a simple programming interface
to application developers.
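
As a rough illustration of the routing model described above, the
following single-process Java sketch shows keyed events being delivered
with affinity to per-key PE instances, where one PE emits events and
another publishes results. All names here (KeyedRoutingSketch, Event,
ProcessingElement, WordCountPE, MaxPE, Router) are hypothetical and
chosen for illustration only; this is not the S4 API, and it ignores
distribution across nodes entirely.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Illustrative sketch only: hypothetical names, not the S4 API. */
public class KeyedRoutingSketch {

    /** A keyed event on a named stream (assumed shape). */
    record Event(String stream, String key, int value) {}

    /** A PE consumes an event, may publish to the sink, and returns the events it emits. */
    interface ProcessingElement {
        List<Event> onEvent(Event event, List<String> sink);
    }

    /** Counts events for one word and emits the running count downstream. */
    static final class WordCountPE implements ProcessingElement {
        private int count;

        @Override
        public List<Event> onEvent(Event event, List<String> sink) {
            count += event.value();
            return List.of(new Event("counts", "max", count)); // (1) emit
        }
    }

    /** Tracks the largest count seen so far and publishes it. */
    static final class MaxPE implements ProcessingElement {
        private int max;

        @Override
        public List<Event> onEvent(Event event, List<String> sink) {
            max = Math.max(max, event.value());
            sink.add("max=" + max);                            // (2) publish
            return List.of();
        }
    }

    /** Routes each event to the PE instance owned by its (stream, key) pair. */
    static final class Router {
        private final Map<String, Supplier<ProcessingElement>> prototypes = new HashMap<>();
        private final Map<String, ProcessingElement> instances = new HashMap<>();
        final List<String> sink = new ArrayList<>();

        void register(String stream, Supplier<ProcessingElement> prototype) {
            prototypes.put(stream, prototype);
        }

        void dispatch(Event first) {
            Deque<Event> queue = new ArrayDeque<>();
            queue.add(first);
            while (!queue.isEmpty()) {
                Event e = queue.poll();
                Supplier<ProcessingElement> prototype = prototypes.get(e.stream());
                if (prototype == null) {
                    continue; // no PE is registered for this stream
                }
                // Affinity: the same (stream, key) always reaches the same instance.
                ProcessingElement pe = instances.computeIfAbsent(
                        e.stream() + "/" + e.key(), k -> prototype.get());
                queue.addAll(pe.onEvent(e, sink));
            }
        }

        int instanceCount() {
            return instances.size();
        }
    }

    public static void main(String[] args) {
        Router router = new Router();
        router.register("words", WordCountPE::new);
        router.register("counts", MaxPE::new);
        for (String w : "a b a c a b".split(" ")) {
            router.dispatch(new Event("words", w, 1));
        }
        System.out.println(router.sink);
    }
}
```

With the input above, the sketch creates one WordCountPE per distinct
word plus a single MaxPE, and prints
[max=1, max=1, max=2, max=2, max=3, max=3]. A real deployment would
additionally map each key to a cluster node, which this sketch leaves
out.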

To drive adoption and increase the number of contributors to the
project, we may need to prioritize the focus based on feedback from
the community. We believe that one of the top priorities and driving
design principles for the S4 project is to provide a simple API that
hides most of the complexity associated with distributed systems and
concurrency. The project grew out of the need to provide a flexible
platform for application developers and scientists that can be used
for quick experimentation and production.

S4 differs from existing Apache projects in a number of fundamental
ways. Flume is an Incubator project that focuses on log processing,
performing lightweight processing in a distributed fashion and
accumulating log data in a centralized repository for batch
processing. S4 instead performs all stream processing in a distributed
fashion and enables applications to form arbitrary graphs to process
streams of events. We see Flume as a complementary project. We also
expect S4 to complement Hadoop processing and in some cases to
supersede it. Kafka is another Incubator project that focuses on
processing large amounts of stream data. The design of Kafka, however,
follows the pub-sub paradigm, which focuses on delivering messages
containing arbitrary data from source processes (publishers) to
consumer processes (subscribers). Compared to S4, Kafka is an
intermediate step between data generation and processing, while S4 is
itself a platform for processing streams of events.

Overall, S4 addresses the need of existing applications to process
streams of events beyond moving data to a centralized repository for
batch processing. It complements the features of existing Apache
projects, such as Hadoop, Flume, and Kafka, by providing a flexible
platform for distributed event processing.

== Background ==

S4 was initially developed at Yahoo! Labs starting in 2008 to process
user feedback in the context of search advertising. The project was
licensed under the Apache License version 2.0 in October 2010. The
project documentation is currently available at http://s4.io .

== Rationale ==

Stream computing has been growing steadily over the last 20 years.
However, recently there has been an explosion in real-time data
sources including the Web, sensor networks, financial securities
analysis and trading, traffic monitoring, natural language processing
of news and social data, and much more.

Although Hadoop has evolved into a standard open source solution for
batch processing of massive data sets, there is no equivalent
community-supported open source platform for processing data streams
in real time. While various research projects have evolved into
proprietary commercial products, S4 has the potential to fill the gap.
Many projects that require a scalable stream processing architecture
currently use Hadoop by segmenting the input stream into data batches.
This solution is not efficient, results in high latency, and
introduces unnecessary complexity.

The S4 design is primarily driven by large-scale applications for data
mining and machine learning in a production environment. We think that
the S4 design is surprisingly flexible and lends itself to running in
large clusters built with commodity hardware.

S4 enables application programmers to focus more on the application
and less on 

Re: [VOTE] S4 to join the Incubator

2011-09-20 Thread Ahmed Radwan
Great project. +1 (non-binding)


On Tue, Sep 20, 2011 at 1:56 PM, Patrick Hunt ph...@apache.org wrote:

 It's been nearly a week since the S4 proposal was submitted for
 discussion.  A few questions were asked, and the proposal was clarified
 in response.  Sufficient mentors have volunteered.  I thus feel we are
 now ready for a vote.

 The latest proposal can be found at the end of this email and at:

  http://wiki.apache.org/incubator/S4Proposal

 The discussion regarding the proposal can be found at:

  http://s.apache.org/RMU

 Please cast your votes:

 [  ] +1 Accept S4 for incubation
 [  ] +0 Indifferent to S4 incubation
 [  ] -1 Reject S4 for incubation

 This vote will close 72 hours from now.

 Thanks,

 Patrick


Re: [VOTE] S4 to join the Incubator

2011-09-20 Thread Ashish
+1

On Wed, Sep 21, 2011 at 2:26 AM, Patrick Hunt ph...@apache.org wrote:
 It's been nearly a week since the S4 proposal was submitted for
 discussion.  A few questions were asked, and the proposal was clarified
 in response.  Sufficient mentors have volunteered.  I thus feel we are
 now ready for a vote.

 The latest proposal can be found at the end of this email and at:

  http://wiki.apache.org/incubator/S4Proposal

 The discussion regarding the proposal can be found at:

  http://s.apache.org/RMU

 Please cast your votes:

 [  ] +1 Accept S4 for incubation
 [  ] +0 Indifferent to S4 incubation
 [  ] -1 Reject S4 for incubation

 This vote will close 72 hours from now.

 Thanks,

 Patrick

 --
 = S4 Proposal =

 == Abstract ==

 S4 (Simple Scalable Streaming System) is a general-purpose,
 distributed, scalable, partially fault-tolerant, pluggable platform
 that allows programmers to easily develop applications for processing
 continuous, unbounded streams of data.

 == Proposal ==

 S4 is a software platform written in Java. Clients that send and
 receive events can be written in any programming language. S4 also
 includes a collection of modules called Processing Elements (or PEs
 for short) that implement basic functionality and can be used by
 application developers. In S4, keyed data events are routed with
 affinity to Processing Elements (PEs), which consume the events and do
 one or both of the following: (1) ''emit'' one or more events which
 may be consumed by other PEs, (2) ''publish'' results. The
 architecture resembles the Actors model, providing semantics of
 encapsulation and location transparency, thus allowing applications to
 be massively concurrent while exposing a simple programming  interface
 to application developers.

 To drive adoption and increase the number of contributors to the
 project, we may need to prioritize the focus based on feedback from
 the community. We believe that one of the top priorities and driving
 design principle for the S4 project is to provide a simple API that
 hides most of the complexity associated with distributed systems and
 concurrency. The project grew out of the need to provide a flexible
 platform for application developers and scientists that can be used
 for quick experimentation and production.

 S4 differs from existing Apache projects in a number of fundamental
 ways. Flume is an Incubator project that focuses on log processing,
 performing lightweight processing in a distributed fashion and
 accumulating log data in a centralized repository for batch
 processing. S4 instead performs all stream processing in a distributed
 fashion and enables applications to form arbitrary graphs to process
 streams of events. We see Flume as a complementary project. We also
 expect S4 to complement Hadoop processing and in some cases to
 supersede it. Kafka is another Incubator project that focuses on
 processing large amounts of stream data. The design of Kafka, however,
 follows the pub-sub paradigm, which focuses on delivering messages
 containing arbitrary data from source processes (publishers) to
 consumer processes (subscribers). Compared to S4, Kafka is an
 intermediate step between data generation and processing, while S4 is
 itself a platform for processing streams of events.

 Overall, S4 addresses the need of existing applications to process
 streams of events beyond moving data to a centralized repository for
 batch processing. It complements the features of existing Apache
 projects, such as Hadoop, Flume, and Kafka, by providing a flexible
 platform for distributed event processing.

 == Background ==

 S4 was initially developed at Yahoo! Labs starting in 2008 to process
 user feedback in the context of search advertising. The project was
 licensed under the Apache License version 2.0 in October 2010. The
 project documentation is currently available at http://s4.io .

 == Rationale ==

 Stream computing has been growing steadily over the last 20 years.
 However, recently there has been an explosion in real-time data
 sources including the Web, sensor networks, financial securities
 analysis and trading, traffic monitoring, natural language processing
 of news and social data, and much more.

 Although Hadoop has evolved into a standard open source solution for batch
 processing of massive data sets, there is no equivalent
 community-supported open source platform for processing data streams in
 real time. While various research projects have evolved into
 proprietary commercial products, S4 has the potential to fill the gap.
 Many projects that require a scalable stream processing architecture
 currently use Hadoop by segmenting the input stream into data batches.
 This solution is not efficient, results in high latency, and
 introduces unnecessary complexity.

 The S4 design is primarily driven by large scale applications for data
 mining and machine learning in a production environment. We think that
 the S4 design is surprisingly 

Re: [VOTE] S4 to join the Incubator

2011-09-20 Thread Phillip Rhodes
On Tue, Sep 20, 2011 at 4:56 PM, Patrick Hunt ph...@apache.org wrote:

 [  ] +1 Accept S4 for incubation
 [  ] +0 Indifferent to S4 incubation
 [  ] -1 Reject S4 for incubation

+1


Cheers,


Phil




Re: [VOTE] S4 to join the Incubator

2011-09-20 Thread Arun C Murthy
+1 (binding)

Arun


Re: [VOTE] S4 to join the Incubator

2011-09-20 Thread Otis Gospodnetic
+1

Otis


Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



Re: [VOTE] S4 to join the Incubator

2011-09-20 Thread Joey Echeverria
+1 (non-binding)


Re: [VOTE] S4 to join the Incubator

2011-09-20 Thread Vinod Kumar Vavilapalli
+1 (non-binding)

+Vinod
