Podling Report Reminder - March 2018

2018-03-01 Thread johndament
Dear podling,

This email was sent by an automated system on behalf of the Apache
Incubator PMC. It is an initial reminder to give you plenty of time to
prepare your quarterly board report.

The board meeting is scheduled for Wed, 21 March 2018, 10:30 am PDT.
The report for your podling will form a part of the Incubator PMC
report. The Incubator PMC requires your report to be submitted 2 weeks
before the board meeting, to allow sufficient time for review and
submission (Wed, March 07).

Please submit your report with sufficient time to allow the Incubator
PMC, and subsequently board members, to review and digest it. Again, the
very latest you should submit your report is 2 weeks prior to the board
meeting.

Thanks,

The Apache Incubator PMC

Submitting your Report
---

Your report should contain the following:

*   Your project name
*   A brief description of your project, which assumes no knowledge of
    the project or necessarily of its field
*   A list of the three most important issues to address in the move
    towards graduation
*   Any issues that the Incubator PMC or ASF Board might wish/need to be
    aware of
*   How the community has developed since the last report
*   How the project has developed since the last report
*   How the podling rates its own maturity

This should be appended to the Incubator Wiki page at:

https://wiki.apache.org/incubator/March2018

Note: This is manually populated. You may need to wait a little before
this page is created from a template.

Mentors
---

Mentors should review reports for their project(s) and sign them off on
the Incubator wiki page. Signing off reports shows that you are
following the project; projects whose reports are not signed off may
raise alarms for the Incubator PMC.

Incubator PMC


Re: Pulsar now using standard Apache BookKeeper

2018-03-01 Thread Enrico Olivelli
Great!
Congrats

Do you have any numbers comparing performance?

Enrico

On Thu, 1 Mar 2018, 19:06 Sijie Guo  wrote:

> Awesome work, Ivan & Jia!
>
> Congrats Pulsar community!
>
> - Sijie
>
> On Thu, Mar 1, 2018 at 9:53 AM, Matteo Merli  wrote:
>
> > In the Pulsar master branch, we have switched the BookKeeper dependency
> > from the Yahoo fork to the 4.7.0-SNAPSHOT version from BookKeeper's
> > master branch.
> >
> > All the changes that were in the Yahoo fork, which was based on the
> > 4.3.1 release, have already been merged upstream.
> >
> > This was a big effort that took ~1 year to get through. There were 246
> > commits to merge into a codebase that had slightly changed over a 4-year
> > timespan. For the curious, this is the spreadsheet we used to track the
> > merging:
> > https://docs.google.com/spreadsheets/d/1jAy3EfjViqNEKpCKpWiRv-PCZGzdjwm_PclL7Obog4Q/
> >
> > I would like to call out Ivan Kelly & Jia Zhai for giving a big push in
> > porting many of the changes into BookKeeper, and the BookKeeper community
> > for being very receptive and helpful in getting this load of changes back
> > into mainline.
> >
> >
> > Matteo
> > --
> > Matteo Merli
> > 
> >
>
-- 


-- Enrico Olivelli


Re: Pulsar now using standard Apache BookKeeper

2018-03-01 Thread Sijie Guo
Awesome work, Ivan & Jia!

Congrats Pulsar community!

- Sijie

On Thu, Mar 1, 2018 at 9:53 AM, Matteo Merli  wrote:

> In the Pulsar master branch, we have switched the BookKeeper dependency from
> the Yahoo fork to the 4.7.0-SNAPSHOT version from BookKeeper's master
> branch.
>
> All the changes that were in the Yahoo fork, which was based on the 4.3.1
> release, have already been merged upstream.
>
> This was a big effort that took ~1 year to get through. There were 246
> commits to merge into a codebase that had slightly changed over a 4-year
> timespan. For the curious, this is the spreadsheet we used to track the
> merging:
> https://docs.google.com/spreadsheets/d/1jAy3EfjViqNEKpCKpWiRv-PCZGzdjwm_PclL7Obog4Q/
>
> I would like to call out Ivan Kelly & Jia Zhai for giving a big push in
> porting many of the changes into BookKeeper, and the BookKeeper community
> for being very receptive and helpful in getting this load of changes back
> into mainline.
>
>
> Matteo
> --
> Matteo Merli
> 
>


Pulsar now using standard Apache BookKeeper

2018-03-01 Thread Matteo Merli
In the Pulsar master branch, we have switched the BookKeeper dependency from
the Yahoo fork to the 4.7.0-SNAPSHOT version from BookKeeper's master
branch.

All the changes that were in the Yahoo fork, which was based on the 4.3.1
release, have already been merged upstream.

This was a big effort that took ~1 year to get through. There were 246
commits to merge into a codebase that had slightly changed over a 4-year
timespan. For the curious, this is the spreadsheet we used to track the
merging:
https://docs.google.com/spreadsheets/d/1jAy3EfjViqNEKpCKpWiRv-PCZGzdjwm_PclL7Obog4Q/

I would like to call out Ivan Kelly & Jia Zhai for giving a big push in
porting many of the changes into BookKeeper, and the BookKeeper community
for being very receptive and helpful in getting this load of changes back
into mainline.


Matteo
-- 
Matteo Merli



Re: Fine grained control over batching in tests

2018-03-01 Thread Ivan Kelly
To answer my own question, batching can be controlled with:

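import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.ProducerConfiguration;

ProducerConfiguration producerConf = new ProducerConfiguration();
producerConf.setMaxPendingMessages(10);     // at most 10 messages awaiting broker ack
producerConf.setBatchingEnabled(true);
producerConf.setBatchingMaxMessages(10);    // send a batch once it holds 10 messages
producerConf.setBatchingMaxPublishDelay(1, TimeUnit.HOURS); // long but finite, so the count triggers first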

This produces batches of 10. The non-determinism I saw was due to a bug
in Netty's HashedWheelTimer: I had set the delay to Long.MAX_VALUE, which
caused the batching trigger to run in a tight loop
(https://github.com/netty/netty/issues/7760).
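A minimal sketch of one plausible way a Long.MAX_VALUE delay turns into a tight loop. It assumes the timer computes an absolute deadline as now + delay, which is common in timer wheels; the class and method names are hypothetical and this is not copied from Netty or Pulsar. TimeUnit.toNanos saturates at Long.MAX_VALUE, but adding it to a positive "now" still wraps to a negative deadline that looks already expired, so a batch task that reschedules itself would fire again immediately. The guarded variant clamps the wrapped value instead.

```java
import java.util.concurrent.TimeUnit;

public class DeadlineOverflow {
    // Hypothetical: absolute deadline computed as now + delay, with no overflow check.
    static long naiveDeadline(long nowNanos, long delay, TimeUnit unit) {
        // toNanos saturates at Long.MAX_VALUE, but the addition itself can still wrap.
        return nowNanos + unit.toNanos(delay);
    }

    // Same computation, clamped to Long.MAX_VALUE when a positive delay wrapped negative.
    static long guardedDeadline(long nowNanos, long delay, TimeUnit unit) {
        long deadline = nowNanos + unit.toNanos(delay);
        if (delay > 0 && deadline < 0) {
            deadline = Long.MAX_VALUE;
        }
        return deadline;
    }

    public static void main(String[] args) {
        long now = 1_000L; // any positive elapsed time, in nanoseconds
        // A "never fire" delay wraps around to a deadline in the past.
        System.out.println(naiveDeadline(now, Long.MAX_VALUE, TimeUnit.MILLISECONDS));   // -9223372036854774809
        System.out.println(guardedDeadline(now, Long.MAX_VALUE, TimeUnit.MILLISECONDS)); // 9223372036854775807
        // The workaround above: a long but finite delay does not overflow.
        System.out.println(guardedDeadline(now, 1, TimeUnit.HOURS));                     // 3600000001000
    }
}
```

This is also why a long but finite delay such as 1 hour is a safer "effectively never" value than Long.MAX_VALUE.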

-Ivan

On Mon, Feb 26, 2018 at 9:53 PM, Ivan Kelly  wrote:
> Hi folks,
>
> I'd like to do some tests with batching and compaction. So far the
> only way I've found to control batching is by setting
> BatchingMaxMessages and BatchingMaxPublishDelay, but this doesn't seem
> 100% deterministic to me.
>
> Does anyone have a better way to control this, or would it make more
> sense to expose ProducerImpl#batchMessageAndSend() just for tests?
>
> Regards,
> Ivan
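The two triggers mentioned above (a message-count limit and a publish delay) can be illustrated with a minimal, hypothetical count-triggered batcher. This is not Pulsar's ProducerImpl; CountBatcher, add() and flush() are illustrative names. The count trigger on its own is deterministic, and an explicitly callable flush() stands in for the delay timer, which is roughly what exposing batchMessageAndSend() for tests would give you.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical count-triggered batcher; NOT Pulsar's ProducerImpl.
public class CountBatcher<T> {
    private final int maxMessages;           // plays the role of BatchingMaxMessages
    private final Consumer<List<T>> sender;  // receives each completed batch
    private final List<T> pending = new ArrayList<>();

    public CountBatcher(int maxMessages, Consumer<List<T>> sender) {
        if (maxMessages <= 0) throw new IllegalArgumentException("maxMessages must be positive");
        this.maxMessages = maxMessages;
        this.sender = sender;
    }

    // Adds a message and sends the batch as soon as it reaches maxMessages.
    public synchronized void add(T msg) {
        pending.add(msg);
        if (pending.size() >= maxMessages) {
            flush();
        }
    }

    // Sends whatever is pending. In a real producer a timer (the publish-delay
    // trigger) would call this; a test can call it directly for determinism.
    public synchronized void flush() {
        if (pending.isEmpty()) return;
        sender.accept(new ArrayList<>(pending));
        pending.clear();
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        CountBatcher<String> batcher = new CountBatcher<>(10, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 25; i++) {
            batcher.add("msg-" + i);
        }
        batcher.flush(); // explicit flush stands in for the delay timer firing
        System.out.println(batchSizes); // [10, 10, 5]
    }
}
```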


Slack digest for #dev - 2018-03-01

2018-03-01 Thread Apache Pulsar Slack
2018-03-01 00:49:07 UTC - Sahaya Andrews Albert: Good to hear @Matteo Merli. Could we 
target it for the 2.0 release?

2018-03-01 00:49:29 UTC - Matteo Merli: It’s **merged**

2018-03-01 00:50:05 UTC - Matteo Merli: Pulsar master is already using 
BookKeeper-4.7.0-SNAPSHOT

2018-03-01 00:50:25 UTC - Matteo Merli: BookKeeper-4.7.0 will be released soon

2018-03-01 00:51:24 UTC - Matteo Merli: the idea is to start testing it out 
earlier, before approaching the release

2018-03-01 00:51:31 UTC - Matteo Merli: (Pulsar release 2.0)

2018-03-01 00:51:52 UTC - Sahaya Andrews Albert: I see, but did the release 
candidate go out with that version as well?

2018-03-01 00:55:12 UTC - Matteo Merli: No no, the release candidate is for 
Pulsar 1.22 — Master has already moved to 2.0-SNAPSHOT

2018-03-01 00:56:22 UTC - Sahaya Andrews Albert: Got it, thanks

2018-03-01 01:02:50 UTC - Matteo Merli: :beers:

2018-03-01 08:14:03 UTC - Sijie Guo: any committers around? can I get a review 
on this -  (fixing a typo)

2018-03-01 08:31:11 UTC - Jai Asher: Please download, test and vote "Pulsar 
1.22.0-incubating Release Candidate 3" 
This time it for 
 mailing list



Re: [DISCUSS] PIP 15: Pulsar Functions

2018-03-01 Thread Sijie Guo
Hi all,

We've sent out a PR to contribute pulsar-functions:
https://github.com/apache/incubator-pulsar/pull/1314

All the changes are made within a sub-module called `pulsar-functions`.

Looking forward to comments and reviews.

- Sijie
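The PIP quoted in this thread lists three runtimes (process, Docker, and thread). As an illustration only, here is a minimal sketch of what a thread runtime could look like, assuming a hypothetical FunctionRuntime interface with start() and stop(); it is not the API in PR #1314. Blocking queues stand in for Pulsar input and output topics, and the user function is a plain java.util.function.Function. Running the function on a thread inside the worker's JVM is what limits this runtime to Java functions.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical runtime contract: one function instance, started and stopped by a worker.
interface FunctionRuntime {
    void start();
    void stop() throws InterruptedException;
}

// Thread runtime: the user function runs on a thread inside the worker JVM.
class ThreadRuntime<I, O> implements FunctionRuntime {
    private final Function<I, O> function; // the user's compute block
    private final BlockingQueue<I> input;  // stands in for an input topic
    private final Consumer<O> output;      // stands in for an output topic
    private Thread worker;

    ThreadRuntime(Function<I, O> function, BlockingQueue<I> input, Consumer<O> output) {
        this.function = function;
        this.input = input;
        this.output = output;
    }

    @Override
    public void start() {
        worker = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    // take() blocks until a message arrives
                    output.accept(function.apply(input.take()));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop() was called
            }
        }, "function-instance");
        worker.start();
    }

    @Override
    public void stop() throws InterruptedException {
        worker.interrupt();
        worker.join();
    }
}

public class ThreadRuntimeDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> in = new LinkedBlockingQueue<>();
        BlockingQueue<String> out = new LinkedBlockingQueue<>();
        FunctionRuntime runtime = new ThreadRuntime<String, String>(s -> s.toUpperCase() + "!", in, out::add);
        runtime.start();
        in.put("hello");
        System.out.println(out.take()); // HELLO!
        runtime.stop();
    }
}
```

A process or Docker runtime would implement the same start()/stop() contract by launching an external process or container instead of a thread, which is what lets those runtimes host non-Java functions.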

On Tue, Feb 27, 2018 at 2:43 PM, Sijie Guo  wrote:

> Hi all,
>
> Thank you, everyone, in this email thread. It seems that people are
> interested in this feature. We'd like to contribute our initial work as
> part of pulsar and continue the development of this idea under the ASF.
>
> I am going to send a pull request later this week. The pull request is
> going to be a "giant" one; however, we have made sure all the changes are
> made under a submodule called "pulsar-functions", so that the pull request
> will not contain any changes to the main pulsar repo. Hopefully that will
> make it easier for the community to accept this feature :-)
>
> I know the pulsar repo only accepts squash merges now. I am wondering
> whether there is any way to accept this feature while keeping all of its
> commits?
>
> We would also like to know whether there are any better approaches for
> merging this change :-)
>
> Thanks,
> Sijie
>
> On Wed, Feb 21, 2018 at 1:48 PM, Sanjeev Kulkarni  wrote:
>
>> Thread-based and process-based runtimes exist in code. The Docker-based
>> runtime is still to be done. We plan to release a preview version in a
>> couple of weeks, and evolve it from there based on community feedback.
>> Hope that helps!
>>
>> On Wed, Feb 21, 2018 at 1:12 PM Dave Fisher  wrote:
>>
>> > Hi Sanjeev -
>> >
>> > I have read the PIP more carefully on my computer (rather than iPhone).
>> >
>> >
>> > 1. Process Runtime, in which each instance is run as a process.
>> > 2. Docker Runtime, in which each instance is run as a Docker container.
>> > 3. Threaded Runtime, in which each instance is run as a thread. This
>> >    type is applicable only to Java instances, since the Pulsar Functions
>> >    framework itself is written in Java.
>> >
>> > I’m interested in knowing a bit more about the Runtime API for these
>> > three types.
>> >
>> > How much of the PIP exists in code?
>> >
>> > Best Regards,
>> > Dave
>> >
>> >
>> > On Feb 20, 2018, at 7:33 PM, Sanjeev Kulkarni  wrote:
>> >
>> > Hi Dave,
>> > Chaining functions is certainly on the roadmap. The PIP document briefly
>> > talks about at least two ways of doing it, but it probably requires
>> > another PIP by itself at a later stage.
>> > WRT parallelism, for functions managed by the Pulsar cluster, parallelism
>> > can be provided at submission time. For functions that will be run as a
>> > simple process, the parallelism should be managed by the user.
>> > WRT CPU/memory and other configuration, the aim of the built-in Pulsar
>> > cluster is to keep it simple by just doing some simple distribution
>> > across multiple workers. The aim is not to replicate features that are
>> > already present in full-fledged schedulers like Mesos/YARN/K8s. If one
>> > needs memory/CPU bounds for a function, the ideal way to do that would be
>> > to run it on one of these full-blown schedulers. We could provide an
>> > easier path for users to run these functions on these schedulers by
>> > providing launch templates.
>> > Hope that helps.
>> >
>> > On Tue, Feb 20, 2018 at 6:08 PM, Dave Fisher  wrote:
>> >
>> > Hi -
>> >
>> > This is very interesting. I’ve been thinking about using Heron for this
>> > functionality.
>> >
>> > An Admin API for configuring the functions on live Executors and
>> > specifying a unique return-value Topic needs discussion. I would also
>> > like to chain Functions.
>> >
>> > I think Functions will need Profiles to include metadata for
>> > parallelism, memory, configuration, etc.
>> >
>> > Regards,
>> > Dave
>> >
>> > Sent from my iPhone
>> >
>> > On Feb 20, 2018, at 4:05 PM, Sanjeev Kulkarni  wrote:
>> >
>> >
>> > https://github.com/apache/incubator-pulsar/wiki/PIP-15:-Pulsar-Functions
>> >
>> > ---
>> >
>> > * **Status**: Proposal
>> > * **Author**: Sanjeev Kulkarni/Sijie Guo/Jerry Peng - Streamlio
>> > * **Pull Request**: See Below
>> > * **Mailing List discussion**:
>> >
>> > Motivation
>> >
>> > There has been a renewed interest from users in lightweight computing
>> > frameworks. Typically, what they mean by lightweight is:
>> >
>> > 1. They are not compute systems that need to be installed/run/monitored.
>> > Thus they are much lighter on operations. Some of them are offered as
>> > pure SaaS (like AWS Lambda), while others are integrated with message
>> > queues (like KStreams).
>> > 2. Their interface should be as simple as it gets. Typically it takes
>> > the form of a function/subroutine, which is the basic compute block in
>> > most programming languages. And the API must be multi-language capable.
>> > 3. The deployment models should be flexible. Users should be able to run
>> > these functions using their favorite management tools, or they can run
>> >
>