Re: [VOTE] Pulsar 1.22.0-incubating Release Candidate 2

2018-02-20 Thread Jai Asher
Closing this thread since we have a new release candidate.
The new voting thread is titled "[VOTE] Pulsar 1.22.0-incubating Release
Candidate 3".

On Tue, Feb 20, 2018 at 1:38 PM, Jai Asher  wrote:

> This is the third release candidate for Apache Pulsar, version
> 1.22.0-incubating.
>
> It fixes the following issues:
> https://github.com/apache/incubator-pulsar/milestone/11?closed=1
>
> *** Please download, test and vote by Friday, Feb 23, 2018, 10:00 GMT.
>
> Note that we are voting upon the source (tag); binaries are provided for
> convenience.
>
> Source and binary files:
> https://dist.apache.org/repos/dist/dev/incubator/pulsar/pulsar-1.22.0-incubating-candidate-2/
>
> Maven staging repo:
> https://repository.apache.org/content/repositories/orgapachepulsar-1009/
>
> The tag to be voted upon:
> v1.22.0-incubating-candidate-2 (44fd82654fbf19f31a708b4c9d6ce1681e32a5fb)
> https://github.com/apache/incubator-pulsar/releases/tag/v1.22.0-incubating-candidate-2
>
> Pulsar's KEYS file containing PGP keys we use to sign the release:
> https://dist.apache.org/repos/dist/release/incubator/pulsar/KEYS
>
> Please download the source package and follow the README to build
> and run the Pulsar standalone service.
>


[VOTE] Pulsar 1.22.0-incubating Release Candidate 3

2018-02-20 Thread Jai Asher
This is the fourth release candidate for Apache Pulsar, version
1.22.0-incubating.

It fixes the following issues:
https://github.com/apache/incubator-pulsar/milestone/11?closed=1

*** Please download, test and vote by Friday, Feb 23, 2018, 10:00 GMT.

Note that we are voting upon the source (tag); binaries are provided for
convenience.

Source and binary files:
https://dist.apache.org/repos/dist/dev/incubator/pulsar/pulsar-1.22.0-incubating-candidate-3/

Maven staging repo:
https://repository.apache.org/content/repositories/orgapachepulsar-1010/

The tag to be voted upon:
v1.22.0-incubating-candidate-3 (5d14788e510faec23fd8ed189ed343e93b489dda)
https://github.com/apache/incubator-pulsar/releases/tag/v1.22.0-incubating-candidate-3

Pulsar's KEYS file containing PGP keys we use to sign the release:
https://dist.apache.org/repos/dist/release/incubator/pulsar/KEYS

Please download the source package and follow the README to build
and run the Pulsar standalone service.
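Before voting, reviewers usually check the downloaded artifacts. The sketch below covers only a digest comparison, and it assumes the artifact is accompanied by a `.sha512` file whose first whitespace-separated token is the hex digest; that file name and format are assumptions, so check what the dist directory actually contains. Verifying the PGP signature against the KEYS file is a separate step done with GPG and is not shown here.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large tarballs are not read into memory at once."""
    digest = hashlib.sha512()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(artifact: Path, checksum_file: Path) -> bool:
    """Compare the artifact's SHA-512 with the digest in the checksum file.

    Assumes a hypothetical "<hex digest>  <file name>" layout; an empty
    checksum file is treated as a mismatch.
    """
    tokens = checksum_file.read_text().split()
    if not tokens:
        return False
    expected = tokens[0].lower()
    return sha512_of(artifact) == expected
```

For example, `verify_checksum(Path("<source tarball>"), Path("<source tarball>.sha512"))` returns `True` only when the recomputed digest matches the published one.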


Re: [DISCUSS] PIP 15: Pulsar Functions

2018-02-20 Thread Sanjeev Kulkarni
Hi Dave,
Chaining functions is certainly on the roadmap. The PIP document briefly
talks about at least two ways of doing it, but it probably warrants a
separate PIP at a later stage.
With respect to parallelism: for functions managed by the Pulsar cluster,
parallelism can be specified at submission time. For functions run as a
simple process, the user is responsible for managing parallelism.
With respect to CPU/memory and other configuration, the aim of the built-in
Pulsar cluster is to keep things simple by just distributing functions
across multiple workers. The aim is not to replicate features that are
already present in full-fledged schedulers like Mesos/YARN/Kubernetes. If
one needs memory/CPU bounds for a function, the ideal way would be to run
it on one of these full-blown schedulers. We could provide an easier path
for users to run these functions on such schedulers by providing launch
templates.
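The "simple distribution across multiple workers" described above could look roughly like the round-robin sketch below. The PIP does not specify the placement algorithm, so round-robin, the function names, and `assign_instances` are hypothetical. Parallelism is taken as the per-function value given at submission time, and no CPU/memory bounds are modeled, in line with leaving those to external schedulers.

```python
from collections import defaultdict


def assign_instances(functions: dict[str, int], workers: list[str]) -> dict[str, list[str]]:
    """Spread every function instance across workers in round-robin order.

    `functions` maps a (hypothetical) function name to the parallelism
    requested at submission time. The result maps each worker to the
    instance ids it should run; workers that receive nothing are omitted.
    Resource limits are deliberately ignored.
    """
    if not workers:
        raise ValueError("need at least one worker")
    assignment: dict[str, list[str]] = defaultdict(list)
    slot = 0
    for name in sorted(functions):  # sorted names give a deterministic layout
        for instance in range(functions[name]):
            worker = workers[slot % len(workers)]
            assignment[worker].append(f"{name}-{instance}")
            slot += 1
    return dict(assignment)
```

With `{"wordcount": 3, "exclaim": 2}` and workers `["w1", "w2"]`, `w1` gets `exclaim-0`, `wordcount-0`, `wordcount-2` and `w2` gets `exclaim-1`, `wordcount-1`. Scaling up or down then amounts to changing the parallelism and recomputing the assignment.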
Hope that helps.

On Tue, Feb 20, 2018 at 6:08 PM, Dave Fisher  wrote:

> Hi -
>
> This is very interesting. I’ve been thinking about using Heron for this
> functionality.
>
> An Admin API for configuring the functions on live Executors and
> specifying a unique return value Topic needs discussion. I would also like
> to chain Functions.
>
> I think Functions will need Profiles to include metadata for parallelism,
> memory, configuration, etc.
>
> Regards,
> Dave
>
> Sent from my iPhone
>

Re: [DISCUSS] PIP 15: Pulsar Functions

2018-02-20 Thread Dave Fisher
Hi -

This is very interesting. I’ve been thinking about using Heron for this 
functionality.

An Admin API for configuring the functions on live Executors and specifying a
unique return value Topic needs discussion. I would also like to chain Functions.

I think Functions will need Profiles to include metadata for parallelism, 
memory, configuration, etc. 

Regards,
Dave

Sent from my iPhone




Slack digest for #dev - 2018-02-20

2018-02-20 Thread Apache Pulsar Slack
2018-02-20 18:42:47 UTC - Anil Muppalla: @Anil Muppalla has joined the channel



[DISCUSS] PIP 15: Pulsar Functions

2018-02-20 Thread Sanjeev Kulkarni
https://github.com/apache/incubator-pulsar/wiki/PIP-15:-Pulsar-Functions

---

 * **Status**: Proposal
 * **Author**: Sanjeev Kulkarni/Sijie Guo/Jerry Peng - Streamlio
 * **Pull Request**: See Below
 * **Mailing List discussion**:

Motivation

There has been renewed interest from users in lightweight computing
frameworks. By "lightweight", users typically mean that:

   1. They are not compute systems that need to be installed/run/monitored,
   so they are much lighter on operations. Some are offered as pure SaaS
   (like AWS Lambda), while others are integrated with message queues (like
   KStreams).
   2. Their interface should be as simple as it gets. Typically it takes the
   form of a function/subroutine, the basic compute block in most
   programming languages. The API must also be multi-language capable.
   3. The deployment models should be flexible. Users should be able to run
   these functions using their favorite management tools, or run them
   alongside the brokers.

The aim of all of this is to dramatically increase the pace of
experimentation and developer productivity. These frameworks also fit the
event-driven architecture that most companies are moving towards, where
data is constantly arriving. The goal is for users to run simple functions
against arriving data without having to master complicated APIs/semantics
or manage/monitor a complex compute infrastructure.

A message queue like Pulsar sits at the heart of any event-driven
architecture. Data coming in from all sources typically lands in the
message bus first. Thus, if Pulsar (or a Pulsar extension) could register
and run simple user functions, it could go a long way toward driving Pulsar
adoption. Users could just deploy Pulsar and instantly have a very flexible
way of doing basic computation.
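To illustrate what registering and running simple user functions against arriving data could mean, here is a toy, in-memory sketch: the user supplies only a plain function, and a hypothetical runner moves messages from an input topic to an output topic. `LocalFunctionRunner`, its methods, and the deque-based "topics" are assumptions for illustration; nothing here uses a real Pulsar client or broker.

```python
from collections import defaultdict, deque
from typing import Callable


class LocalFunctionRunner:
    """Toy stand-in for "register a function, run it on arriving data".

    Topics are plain in-memory deques; no real broker is involved.
    """

    def __init__(self) -> None:
        self.topics: dict[str, deque] = defaultdict(deque)
        self.functions: list[tuple[str, Callable[[str], str], str]] = []

    def register(self, input_topic: str, fn: Callable[[str], str], output_topic: str) -> None:
        """Record that `fn` consumes `input_topic` and publishes to `output_topic`."""
        self.functions.append((input_topic, fn, output_topic))

    def publish(self, topic: str, message: str) -> None:
        self.topics[topic].append(message)

    def run_once(self) -> int:
        """Drain each registered input topic once and return the number of messages processed."""
        processed = 0
        for input_topic, fn, output_topic in self.functions:
            queue = self.topics[input_topic]
            while queue:
                self.publish(output_topic, fn(queue.popleft()))
                processed += 1
        return processed


def exclaim(message: str) -> str:
    """The user's entire contribution: a plain function."""
    return message + "!"
```

Registering `exclaim` from `"input"` to `"output"`, publishing `"hello"` and `"world"`, and calling `run_once()` processes two messages and leaves `"hello!"` and `"world!"` on the output topic. The point of the design is that the user-facing surface is just the function; everything else belongs to the runtime.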

This document outlines the goals/design of what we want in such a system
and how it can be built into Pulsar.

Goals

   1. Simplest possible programmability: This is the overarching goal.
   Anyone who can write a function in a supported language should be able
   to become productive in a matter of minutes.
   2. Multi-Language Capability: We should provide the API in at least the
   most popular languages: Java, Scala, Python, Go, and JavaScript.
   3. Flexible Runtime Deployment: Users should be able to run these
   functions as a simple process using their favorite management tools. They
   should also be able to submit their functions to be run in a Pulsar cluster.
   4. Built-In State Management: Functions should be able to keep state
   across invocations. The system should take care of persisting this state
   robustly. Basic operations like incrBy/get/put/update are a must. This
   dramatically simplifies the architecture for the developer.
   5. Queryable State: The state written by a function should be queryable
   using standard REST APIs.
   6. Automatic Load Balancing: The managed runtime should take care of
   assigning workers to the functions.
   7. Scale Up/Down: Users should be able to scale the number of function
   instances in the managed runtime up or down.
   8. Flexible Invocation: Thread-based, process-based, and Docker-based
   invocation should be supported for running each function.
   9. Metrics: Basic metrics like events processed per second, failures,
   latency, etc. should be made available on a per-function basis. Users
   should also be able to publish their own metrics.
   10. REST Interface: Function control should use a REST interface to
   achieve the widest adoption.
   11. Library/CLI: Simple libraries should exist in all supported
   languages, along with a basic CLI to register, list, query, and get stats
   for functions, plus other admin activities.
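Goals 4 and 9 can be made concrete with a small in-memory sketch. `InMemoryContext`, `incr_by`/`get`/`put`/`update`, `invoke`, and `word_count` are hypothetical names, not the Pulsar Functions API. State is held in a plain dict rather than persisted, and persistence is exactly the part the proposal asks the system to handle robustly.

```python
import time
from typing import Callable


class InMemoryContext:
    """Hypothetical per-function context: key/value state plus basic metrics.

    State lives in a dict and is lost on restart; a real runtime would
    persist it.
    """

    def __init__(self) -> None:
        self._state: dict[str, int] = {}
        self.processed = 0          # successful invocations
        self.failures = 0           # invocations that raised
        self.total_latency_s = 0.0  # summed wall-clock time of all invocations

    # State operations mirroring incrBy/get/put/update from the Goals list.
    def incr_by(self, key: str, amount: int) -> int:
        self._state[key] = self._state.get(key, 0) + amount
        return self._state[key]

    def get(self, key: str, default: int = 0) -> int:
        return self._state.get(key, default)

    def put(self, key: str, value: int) -> None:
        self._state[key] = value

    def update(self, key: str, fn: Callable[[int], int], default: int = 0) -> int:
        self._state[key] = fn(self._state.get(key, default))
        return self._state[key]

    # Per-function metrics: count successes and failures, accumulate latency.
    def invoke(self, fn: Callable[[str, "InMemoryContext"], None], message: str) -> None:
        start = time.perf_counter()
        try:
            fn(message, self)
            self.processed += 1
        except Exception:
            self.failures += 1
        finally:
            self.total_latency_s += time.perf_counter() - start


def word_count(message: str, ctx: InMemoryContext) -> None:
    """Example stateful function: counts words across all messages seen so far."""
    for word in message.split():
        ctx.incr_by(word, 1)
```

After invoking `word_count` on `"the quick the"` and `"the end"`, `ctx.get("the")` is 3 and `ctx.processed` is 2. Events per second could then be derived from `processed` over a time window, which this sketch leaves out.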

More details on the PIP page.
Thanks!


[VOTE] Pulsar 1.22.0-incubating Release Candidate 2

2018-02-20 Thread Jai Asher
This is the third release candidate for Apache Pulsar, version
1.22.0-incubating.

It fixes the following issues:
https://github.com/apache/incubator-pulsar/milestone/11?closed=1

*** Please download, test and vote by Friday, Feb 23, 2018, 10:00 GMT.

Note that we are voting upon the source (tag); binaries are provided for
convenience.

Source and binary files:
https://dist.apache.org/repos/dist/dev/incubator/pulsar/pulsar-1.22.0-incubating-candidate-2/

Maven staging repo:
https://repository.apache.org/content/repositories/orgapachepulsar-1009/

The tag to be voted upon:
v1.22.0-incubating-candidate-2 (44fd82654fbf19f31a708b4c9d6ce1681e32a5fb)
https://github.com/apache/incubator-pulsar/releases/tag/v1.22.0-incubating-candidate-2

Pulsar's KEYS file containing PGP keys we use to sign the release:
https://dist.apache.org/repos/dist/release/incubator/pulsar/KEYS

Please download the source package and follow the README to build
and run the Pulsar standalone service.


[BLOG POST] Effectively-once semantics in Apache Pulsar

2018-02-20 Thread Matteo Merli
https://streaml.io/blog/pulsar-effectively-once/

Matteo
-- 
Matteo Merli



RE: [VOTE] Pulsar 1.22.0-incubating Release Candidate 1

2018-02-20 Thread Masahiro Sakamoto
I'm sorry I'm late. I have opened the pull request.
https://github.com/apache/incubator-pulsar/pull/1256

Masahiro

--
Masahiro Sakamoto
Yahoo Japan Corp.
E-mail: massa...@yahoo-corp.jp
--

> -----Original Message-----
> From: Masahiro Sakamoto [mailto:massa...@yahoo-corp.jp]
> Sent: Tuesday, February 20, 2018 10:59 AM
> To: dev@pulsar.incubator.apache.org
> Subject: RE: [VOTE] Pulsar 1.22.0-incubating Release Candidate 1
> 
> Hi Jai,
> 
> I have found an issue related to the feature that was added in the
> following pull request:
> https://github.com/apache/incubator-pulsar/pull/899
> 
> I will create a pull request to fix that issue by the end of today.
> If possible, I would like to include the fix in 1.22.
> 
> Regards,
> 
> Masahiro
> 
> --
> Masahiro Sakamoto
> Yahoo Japan Corp.
> E-mail: massa...@yahoo-corp.jp
> --
> 
> > -----Original Message-----
> > From: Jai Asher [mailto:jai.ashe...@gmail.com]
> > Sent: Tuesday, February 20, 2018 5:01 AM
> > To: dev@pulsar.incubator.apache.org
> > Subject: Re: [VOTE] Pulsar 1.22.0-incubating Release Candidate 1
> >
> > Thanks, Matteo :-)
> > I will have another release candidate ready by the end of the day.
> >
> > On Mon, Feb 19, 2018 at 10:49 AM, Matteo Merli  wrote:
> >
> > > Jai,
> > >
> > > I have merged the mentioned fix in master & 1.22 branch.
> > >
> > > On Mon, Feb 19, 2018 at 9:53 AM Dave Fisher  wrote:
> > >
> > > > Hi -
> > > >
> > > > Given this comment I’ll wait until tomorrow to review the release.
> > > >
> > > > Regards,
> > > > Dave
> > > >
> > > > > On Feb 19, 2018, at 9:27 AM, Matteo Merli  wrote:
> > > > >
> > > > > Jai, the artifacts look good.
> > > > >
> > > > > Though I think we should squeeze in the fix for
> > > > > https://github.com/apache/incubator-pulsar/pull/1251
> > > > > Without it, partitions are unusable on non-persistent topics.
> > > > >
> > > > > Matteo
> > > > >
> > > > > On Sat, Feb 17, 2018 at 1:13 AM Jai Asher  wrote:
> > > > >
> > > > >> This is the second release candidate for Apache Pulsar, version
> > > > >> 1.22.0-incubating.
> > > > >>
> > > > >> It fixes the following issues:
> > > > >> https://github.com/apache/incubator-pulsar/milestone/11?closed=1
> > > > >>
> > > > >> *** Please download, test and vote by Tuesday, Feb 20th, 2018, 10:00 GMT.
> > > > >>
> > > > >> Note that we are voting upon the source (tag); binaries are provided
> > > > >> for convenience.
> > > > >>
> > > > >> Source and binary files:
> > > > >> https://dist.apache.org/repos/dist/dev/incubator/pulsar/pulsar-1.22.0-incubating-candidate-1/
> > > > >>
> > > > >> Maven staging repo:
> > > > >> https://repository.apache.org/content/repositories/orgapachepulsar-1008/
> > > > >>
> > > > >> The tag to be voted upon:
> > > > >> v1.22.0-incubating-candidate-0 (c7c8a408e377e979350453e06c68340bc66c512c)
> > > > >> https://github.com/apache/incubator-pulsar/releases/tag/v1.22.0-incubating-candidate-1
> > > > >>
> > > > >> Pulsar's KEYS file containing PGP keys we use to sign the release:
> > > > >> https://dist.apache.org/repos/dist/release/incubator/pulsar/KEYS
> > > > >>
> > > > >> Please download the source package and follow the README to
> > > > >> build and run the Pulsar standalone service.
> > > > >>
> > > > > --
> > > > > Matteo Merli
> > > > > 
> > > >
> > > --
> > > Matteo Merli
> > > 
> > >