Re: [VOTE] SEP-32: Elasticity for Samza

2023-02-08 Thread Yi Pan
+1 (binding)

Thanks!

-Yi

On Tue, Feb 7, 2023 at 2:14 PM Bharath Kumara Subramanian <
codin.mart...@gmail.com> wrote:

> +1 (binding)
>
> Cheers,
> Bharath
>
> On Tue, Feb 7, 2023 at 12:56 PM Lakshmi Manasa 
> wrote:
>
> > Hi folks,
> >
> >  This is a call for vote on SEP-32: Elasticity for Samza.
> > Thank you for reviewing the SEP and giving feedback.
> >
> > I have addressed the comments on the SEP and, since there were three +1s on
> > the discuss thread, I am starting this vote.
> >
> > Discussion thread:
> > https://lists.apache.org/thread/vjtl5fnf64kpkoxc591466y92dlt2bsb
> >
> > SEP:
> > https://cwiki.apache.org/confluence/display/SAMZA/SEP-32%3A+Elasticity+for+Samza
> >
> > Please vote:
> > [ ] +1 approve
> > [ ] +0 no opinion
> > [ ] -1 disapprove (and reason why)
> >
> > thanks,
> > Manasa
> >
>


Re: [DISCUSS] SEP-32: Elasticity for Samza

2023-02-06 Thread Yi Pan
ty as virtual tasks can be spread across hosts whereas
>> increased throughput due to all keys (single task) in key ordered executor
>> sitting in the same host will increase the load on the host and (c) if one
>> or more of the parallel units (threads here) needs more resources, it will
>> result in large container which makes scheduling harder as finding large
>> chunks takes longer in a cluster whereas with virtual tasks, we can have
>> smaller containers for virtual tasks.
>>
>>
>> Please let me know if the above answers make sense and if there are any
>> follow-ups for this SEP.
>>
>> On Thu, Jan 19, 2023 at 10:33 PM Yi Pan  wrote:
>>
>>> Hey, Manasa,
>>>
>>> Sorry to chime in late. A few questions:
>>> a) how are states for the virtual tasks managed during split/merge?
>>> b) what's the perf impact when we have 2 virtual tasks on the same SSP in the
>>> same container, while one virtual task is much faster than the other?
>>> c) what's the reason that a virtual task cannot filter older messages from
>>> a previous offset, in case the container restarts from a smaller offset
>>> from another virtual task consuming the same SSP?
>>> d) how do we compare this w/ an alternative idea that implements a
>>> KeyedOrderedExecutor w/ multiple parallel threads within the single task's
>>> main event loop to increase the parallelism?
>>>
>>> Best,
>>>
>>> -Yi
>>>
>>>
>>> On Thu, Jan 19, 2023 at 3:26 PM Lakshmi Manasa <
>>> lakshmimanas...@gmail.com>
>>> wrote:
>>>
>>> > hi all,
>>> >
>>> >  if there are no concerns or questions about this SEP, I shall start the
>>> > vote email thread tomorrow.
>>> >
>>> > thanks,
>>> > Manasa
>>> >
>>> > On Fri, Jan 6, 2023 at 8:08 AM Lakshmi Manasa <
>>> lakshmimanas...@gmail.com>
>>> > wrote:
>>> >
>>> > > Hi all,
>>> > >   We created SEP-32: Elasticity for Samza.
>>> > >
>>> > > Please find SEP here (
>>> > > https://cwiki.apache.org/confluence/display/SAMZA/SEP-32%3A+Elasticity+for+Samza
>>> > > )
>>> > >   Please take a look and provide feedback. thanks, Manasa
>>> > >
>>> >
>>>
>>
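
To make the virtual-task discussion in this thread concrete, here is a minimal sketch of how the messages of one SSP could be divided among virtual tasks by key. It is an illustration only: the hash-modulo bucketing, the `elasticity_factor` parameter, and both helper functions are assumptions made for this example and are not taken from the SEP or from Samza's code.

```python
import zlib
from collections import defaultdict


def key_bucket(key: bytes, elasticity_factor: int) -> int:
    """Map a message key to one of `elasticity_factor` buckets (hypothetical scheme)."""
    return zlib.crc32(key) % elasticity_factor


def route(messages, elasticity_factor):
    """Group (key, value) messages of one SSP by the virtual task that would own them.

    Messages with the same key always land in the same bucket, so the relative
    order of messages for a given key is preserved inside its virtual task.
    """
    virtual_tasks = defaultdict(list)
    for key, value in messages:
        virtual_tasks[key_bucket(key, elasticity_factor)].append((key, value))
    return dict(virtual_tasks)
```

Because each bucket is an independent unit in this model, the buckets could be placed on different hosts, which is the contrast drawn above with a key-ordered executor that keeps all keys of a task on one host.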


Re: [DISCUSS] SEP-32: Elasticity for Samza

2023-01-19 Thread Yi Pan
Hey, Manasa,

Sorry to chime in late. A few questions:
a) how are states for the virtual tasks managed during split/merge?
b) what's the perf impact when we have 2 virtual tasks on the same SSP in the
same container, while one virtual task is much faster than the other?
c) what's the reason that a virtual task cannot filter older messages from
a previous offset, in case the container restarts from a smaller offset
from another virtual task consuming the same SSP?
d) how do we compare this w/ an alternative idea that implements a
KeyedOrderedExecutor w/ multiple parallel threads within the single task's
main event loop to increase the parallelism?

Best,

-Yi


On Thu, Jan 19, 2023 at 3:26 PM Lakshmi Manasa 
wrote:

> hi all,
>
>  if there are no concerns or questions about this SEP, I shall start the
> vote email thread tomorrow.
>
> thanks,
> Manasa
>
> On Fri, Jan 6, 2023 at 8:08 AM Lakshmi Manasa 
> wrote:
>
> > Hi all,
> >   We created SEP-32: Elasticity for Samza.
> >
> > Please find SEP here (
> > https://cwiki.apache.org/confluence/display/SAMZA/SEP-32%3A+Elasticity+for+Samza
> > )
> >   Please take a look and provide feedback. thanks, Manasa
> >
>


Re: [VOTE] Apache Samza 1.8.0 RC0

2023-01-06 Thread Yi Pan
+1 (binding).

Downloaded the src tarball and ran check-all.sh; all tests passed.

One thing I noticed: configurations for your personal keys used for
publishing the jars are checked in to gradle.properties. I don't think that we
need to include that in the published src tarball.

Otherwise, lgtm.

Thanks a lot!

-Yi

On Wed, Dec 21, 2022 at 4:40 PM Xinyu Liu  wrote:

> +1 (binding).
>
> Verified the md5 and sha1 checksums. Ran check-all.sh on Linux, and the
> build/tests all passed. Please also generate the sha256 checksums for the
> release artifacts, according to Apache's requirements for open source
> releases.
>
> Thanks,
> Xinyu
>
> On Wed, Dec 21, 2022 at 9:47 AM Ajo Thomas  wrote:
>
> > Hey All,
> >
> > This is a call for a vote on the release of *Apache Samza 1.8.0.*
> > Thanks to everyone who contributed to this release.
> >
> > The release candidate can be downloaded from here:
> > https://home.apache.org/~ajothomas/samza-1.8.0-rc0/
> > The release candidate is signed with pgp key *1A4639DA*, which is included
> > in the repository's KEYS file:
> > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=blob_plain;f=KEYS
> > and can also be found on keyservers:
> > https://keyserver.ubuntu.com/pks/lookup?search=ajothomas%40apache.org=on=index
> >
> > The git tag is *release-1.8.0-rc0* and signed with the same pgp key:
> > https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.8.0-rc0
> >
> > Test binaries have been published to Maven's staging repository, and are
> > available here:
> > URL: https://repository.apache.org/#stagingRepositories
> > 
> > Repository: orgapachesamza-1095 (org.apache.samza)
> >
> > Please download the release candidate, check the hashes/signature, build
> > it and test it, and then please vote:
> > [ ] +1 approve
> > [ ] +0 no opinion
> > [ ] -1 disapprove (and reason why)
> >
> > Please note that check-all.sh was run and integration tests *were not* run
> > as there are some issues with the legacy zopkio library used for
> > integration testing. However, most of the key features being released as a
> > part of this release have been tested and are currently used in many of our
> > production jobs at LinkedIn. hadoop/yarn3 changes have been tested with
> > https://github.com/apache/samza-hello-samza which brings up yarn 3,
> > zookeeper and kafka clusters locally for testing.
> >
> > Thanks,
> > Ajo
> >
>
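
The checksum verification mentioned in this thread can be sketched as follows. This is a minimal illustration using Python's standard `hashlib`; the function names are made up for this example, and the expected digest is assumed to be a plain hex string copied from the checksum file published next to the release artifact.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare a file's digest with an expected hex string, ignoring case and whitespace."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

A reviewer would call `verify_checksum` once per artifact and algorithm, for example with `"sha256"` for the checksums requested above. Signature verification with the release manager's PGP key is a separate step that this sketch does not cover.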


Re: [ANNOUNCE] Welcome Ajo Thomas as Samza Committer

2022-12-16 Thread Yi Pan
Welcome and congrats, Ajo!

- Yi

On Wed, Dec 14, 2022 at 3:42 PM Xinyu Liu  wrote:

> Hi, All,
>
> I am glad to announce that Ajo Thomas has officially accepted our
> invitation and become an Apache Samza Committer now.
>
> Ajo has made contributions to improve both Samza user experience and
> operability greatly. He added the partial update functionality to Samza
> Table API to allow field-level updates to stores. He developed the
> “Pipeline Drain” feature for cleaning up intermediate data and state before
> introducing backward incompatible changes. He is also actively working on
> the next release of Samza 1.8.
>
> Considering his contributions, the Samza PMC trusts Ajo with the
> responsibilities of a Samza Committer.
>
> Please join me to give him a warm welcome!
>
> Xinyu Liu
> on behalf of the Apache Samza PMC
>


Re: [VOTE] SEP-31: Pipeline Drain- Support the ability to drain pipelines to allow incompatible intermediate schema changes

2022-12-08 Thread Yi Pan
+1. Long awaited feature! Thanks!

-Yi

On Tue, Nov 29, 2022 at 11:46 AM Xinyu Liu  wrote:

> +1.
>
> Overall the design looks good. Thanks for contributing to this feature.
>
> Thanks,
> Xinyu
>
> On Tue, Nov 29, 2022 at 10:44 AM Ajo Thomas 
> wrote:
>
> > Hi All,
> >
> > This is a call for a vote on *SEP-31: Pipeline Drain- Support the ability
> > to drain pipelines to allow incompatible intermediate schema changes.*
> > Thanks to everyone involved with the design and reviews to refine the
> > proposal.
> >
> > Discuss Email Thread:
> > https://lists.apache.org/thread/7m2hqcqq9lx9o1d48gb64glplb3g2crt
> >
> > SEP-31:
> >
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-31%3A+Pipeline+Drain-+Support+the+ability+to+drain+pipelines+to+allow+incompatible+intermediate+schema+changes
> >
> > Jira ticket:
> > https://issues.apache.org/jira/browse/SAMZA-2741
> >
> > Please vote:
> > [ ] +1 approve
> > [ ] +0 no opinion
> > [ ] -1 disapprove (and reason why)
> >
> > Thanks,
> > Ajo
> >
>


Re: SEP-31: Pipeline Drain: Support the ability to drain pipelines to allow incompatible intermediate schema changes

2022-12-08 Thread Yi Pan
As discussed offline, and given the clarifications in the SEP, +1 (binding)

On Fri, Dec 2, 2022 at 8:05 AM Ajo Thomas  wrote:

> Hi Yi,
>
> The current order is the infinity watermark followed by the drain control
> message, both inserted into the in-memory buffer in SystemConsumers for
> every source SSP (all input SSPs minus intermediate SSPs). Prior to this
> step, we also stop calling refresh in Chooser to make sure that the last
> messages in the in-memory SSP buffer are the watermark and drain messages.
> The infinity watermark is essentially tasked with flushing windows and
> triggers.
> The drain message essentially signals to the processing logic that it is the
> last message for the SSP and that it should shut down. We track the SSPs that
> have received this token in a task. Once all SSPs have been drained, the task
> is marked ready to shut down. Once all tasks are ready to shut down, RunLoop
> shuts down.
>
> Do you see any issues with it ?
>
> - Ajo
>
>
> On Thu, 1 Dec 2022 at 20:06, Yi Pan  wrote:
>
> > Hi, Ajo,
> >
> > Sorry to reply this late. Could you clarify one thing in the design: for
> > watermark-triggered window draining, does the infinite watermark trigger
> > happen first, or does the drain token in all source SSPs happen first?
> > Shouldn't it be the following sequence: a) all drain tokens from all input
> > source SSPs (except for intermediate streams) are received by tasks ==>
> > b) the infinite watermark triggers from the source and flushes all
> > windows/triggers in the pipeline ==> c) once the infinite watermark is
> > propagated through all stages in the pipeline, the tasks stop. Could you
> > confirm?
> >
> > Thanks a lot!
> >
> > -Yi
> >
> > On Thu, Nov 17, 2022 at 9:48 AM Ajo Thomas 
> wrote:
> >
> > > Hi All,
> > >
> > > Samza currently doesn't have a way to gracefully drain pipelines before
> > > making a backward-incompatible intermediate schema change. We have added a
> > > feature called Pipeline Drain to the samza engine to address this problem.
> > > Here is the SEP page for it:
> > >
> > > https://cwiki.apache.org/confluence/display/SAMZA/SEP-31%3A+Pipeline+Drain%3A+Support+the+ability+to+drain+pipelines+to+allow+incompatible+intermediate+schema+changes
> > >
> > >
> > > If there are no major blockers, we are tentatively seeking to open a vote
> > > on Monday, Nov 28th, 2022.
> > >
> > > Thanks,
> > > Ajo
> > >
> >
>
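
The drain sequence described in this thread can be sketched as a small model: each source SSP buffer receives an infinity watermark and then a drain message as its last two entries, a task tracks which SSPs it has drained, and the loop stops once every task is ready. The class, method, and constant names below are hypothetical and do not mirror Samza's RunLoop, Chooser, or SystemConsumers code.

```python
from collections import deque

# Sentinel control messages (hypothetical stand-ins for the real control messages).
WATERMARK = object()
DRAIN = object()


class TaskModel:
    """Toy model of one task consuming several source SSPs."""

    def __init__(self, source_ssps):
        self.buffers = {ssp: deque() for ssp in source_ssps}
        self.drained = set()
        self.events = []  # (kind, ssp, payload) tuples in processing order

    def enqueue(self, ssp, message):
        self.buffers[ssp].append(message)

    def start_drain(self):
        # Assumes no data is enqueued afterwards (refresh has stopped), so the
        # watermark and drain messages are the last entries in every buffer.
        for buf in self.buffers.values():
            buf.append(WATERMARK)
            buf.append(DRAIN)

    def process_next(self, ssp):
        message = self.buffers[ssp].popleft()
        if message is WATERMARK:
            # In the real system this is where windows and triggers would flush.
            self.events.append(("watermark", ssp, None))
        elif message is DRAIN:
            self.drained.add(ssp)
            self.events.append(("drain", ssp, None))
        else:
            self.events.append(("data", ssp, message))

    def ready_to_shut_down(self):
        return self.drained == set(self.buffers)


def run_loop(tasks):
    """Process buffered messages until every task has drained all of its SSPs."""
    while not all(task.ready_to_shut_down() for task in tasks):
        progressed = False
        for task in tasks:
            for ssp, buf in task.buffers.items():
                if buf:
                    task.process_next(ssp)
                    progressed = True
        if not progressed:
            raise RuntimeError("buffers are empty but the drain has not completed")
```

In this model the per-SSP order is always data, then watermark, then drain, which is why the watermark can flush windows before the task is marked ready to shut down.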


Re: SEP-31: Pipeline Drain: Support the ability to drain pipelines to allow incompatible intermediate schema changes

2022-12-01 Thread Yi Pan
Hi, Ajo,

Sorry to reply this late. Could you clarify one thing in the design: for
watermark-triggered window draining, does the infinite watermark trigger
happen first, or does the drain token in all source SSPs happen first?
Shouldn't it be the following sequence: a) all drain tokens from all input
source SSPs (except for intermediate streams) are received by tasks ==>
b) the infinite watermark triggers from the source and flushes all
windows/triggers in the pipeline ==> c) once the infinite watermark is
propagated through all stages in the pipeline, the tasks stop. Could you
confirm?

Thanks a lot!

-Yi

On Thu, Nov 17, 2022 at 9:48 AM Ajo Thomas  wrote:

> Hi All,
>
> Samza currently doesn't have a way to gracefully drain pipelines before
> making a backward-incompatible intermediate schema change. We have added a
> feature called Pipeline Drain to the samza engine to address this problem.
> Here is the SEP page for it:
>
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-31%3A+Pipeline+Drain%3A+Support+the+ability+to+drain+pipelines+to+allow+incompatible+intermediate+schema+changes
>
>
> If there are no major blockers, we are tentatively seeking to open a vote
> on Monday, Nov 28th, 2022.
>
> Thanks,
> Ajo
>


[REPORT] Samza - Nov 2022

2022-11-09 Thread Yi Pan
## Description:
The mission of Samza is the creation and maintenance of software related to
a distributed stream processing framework.

## Issues:
- There are no issues requiring board attention.

## Membership Data:
Apache Samza was founded 2015-01-22 (8 years ago)
There are currently 29 committers and 17 PMC members in this project.
The Committer-to-PMC ratio is roughly 8:5.

Community changes, past quarter:
- No new PMC members. Last addition was Bharath Kumarasubramanian on
2020-02-13.
- No new committers. Last addition was Daniel Chen on 2021-09-17.

## Project Activity:
- Samza upgrade to be runtime compatible w/ Java 11 and YARN 3.3
- Stream Processing Meetup@LinkedIn on Kafka, Samza held on 2022-10-19

## Community Health:
JIRA:
13 issues opened in JIRA, past quarter (no change)
10 issues closed in JIRA, past quarter (233% increase)
Commits:
11 commits in the past quarter (-50% change)
7 code contributors in the past quarter (-36% change)
16 PRs opened on GitHub, past quarter (-23% change)
13 PRs closed on GitHub, past quarter (-35% change)


Re: Request for new release

2022-10-15 Thread Yi Pan
Hi, James,

Thanks for the reminder. We are preparing the new 1.8 release. It is
expected by the end of this quarter.

Best,

-Yi

On Mon, Oct 10, 2022 at 12:52 PM James DeMichele
 wrote:

> Hello, we just had a pr merged to main in the Samza app that now supports
> Java 11 runtime environments.
>
> Could we get a new official release of this project?
>
> Here's the pr: https://github.com/apache/samza/pull/1628
>
> Jamie
>


Re: Java 11 Checkin again

2022-09-20 Thread Yi Pan
Hi, James,

Sorry that I was busy during the day and couldn't check your email. I have
joined the slack channel you created. Let's discuss there.

Best,

-Yi

On Mon, Sep 19, 2022 at 9:23 AM James DeMichele
 wrote:

> Hey, Yi! Thanks.
>
> I started a slack channel that maybe would make it easier to communicate if
> I have questions. I do have one issue that I am hitting between the 2
> different Yarn versions I think and that I am not entirely sure what to do
> about. I made a change to a Test class that was needed for a
> compilation fix:
>
> https://github.com/apache/samza/pull/1628/files#diff-34db8b18730bda1058014e87ec3ad88dfc03f79854b00a339407761d224f66e9
> and the class is TestSamzaYarnAppMasterLifecycle.scala.
>
> I'm not sure how to go about toggling this class between a compatible one
> for yarn 2.10.1 and 3.3.4.
>
> Thanks!
>
> -Jamie
>
> On Mon, Sep 19, 2022 at 11:00 AM Yi Pan  wrote:
>
> > Hi, James,
> >
> > Thanks a lot for reporting this. I will take a look this week.
> >
> > Best!
> >
> > -Yi
> >
> > On Mon, Sep 19, 2022 at 8:20 AM James DeMichele
> >  wrote:
> >
> > > Hey Yi. I can take a look at this. I do want to point out that your current
> > > "master" branch is actually broken for using Scala 2.11.
> > >
> > > You can repro by just going into the master branch using Java 8 and
> > > compiling like this
> > >
> > > $ java -version
> > > openjdk version "1.8.0_332"
> > > OpenJDK Runtime Environment (Temurin)(build 1.8.0_332-b09)
> > > OpenJDK 64-Bit Server VM (Temurin)(build 25.332-b09, mixed mode)
> > >
> > > ./gradlew build -PscalaSuffix=2.11
> > >
> > > The build fails with this command using that version of Java 8 ^.
> > >
> > > Anyway, just wanted to point that out since I hit this in my branch
> > > trying to utilize the "bin/check-all.sh" script. That doesn't block
> > > me/us, but just wanted to call it out.
> > >
> > > -Jamie
> > >
> > >
> > >
> > > On Wed, Sep 14, 2022 at 5:52 PM Yi Pan  wrote:
> > >
> > > > Hey, James,
> > > >
> > > > In order to merge your PR without breaking the jdk8 older modules, we will
> > > > need the changes proposed here. Can you try to add those build script
> > > > changes in the same PR? We will definitely help review and merge it.
> > > >
> > > > Best!
> > > >
> > > > -Yi
> > > >
> > > > On Wed, Sep 14, 2022 at 8:00 AM James DeMichele
> > > >  wrote:
> > > >
> > > > > Also, do you have a timeline for when this could be completed? Thanks.
> > > > >
> > > > > On Wed, Sep 14, 2022 at 7:16 AM James DeMichele <
> > > > > james.demich...@redfin.com>
> > > > > wrote:
> > > > >
> > > > > > That sounds like a great solution to me if that works for y'all!
> > > > > >
> > > > > > Note too, the Java 11 and yarn 3 module need to only use the Scala 2.12
> > > > > > version of the build.
> > > > > >
> > > > > > Jamie
> > > > > >
> > > > > >
> > > > > > On Wed, Sep 14, 2022, 2:38 AM Yi Pan wrote:
> > > > > >
> > > > > >> Hi, James,
> > > > > >>
> > > > > >> Sorry to reply late. I just came back from a trip and had a discussion with
> > > > > >> our internal team as well. So, there is one proposal other than creating a
> > > > > >> branch. Let me elaborate on it below:
> > > > > >> a) create a new module samza-yarn3 that depends on YARN 3.3.0 and is the
> > > > > >> hosting module for most of the jdk11-related changes.
> > > > > >> b) modify the build script s.t. samza-yarn3 will only compile and build
> > > > > >> with jdk11 and samza-yarn will only compile and build with jdk8.
> > > > > >> Thus, we can have two builds: a jdk8 build that builds with samza-yarn w/
> > > > > >> YARN 2.10.0, and a jdk11 build that builds with samza-yarn3 w/ YARN 3.3.0. We

Re: Java 11 Checkin again

2022-09-19 Thread Yi Pan
Hi, James,

Thanks a lot for reporting this. I will take a look this week.

Best!

-Yi

On Mon, Sep 19, 2022 at 8:20 AM James DeMichele
 wrote:

> Hey Yi. I can take a look at this. I do want to point out that your current
> "master" branch is actually broken for using Scala 2.11.
>
> You can repro by just going into the master branch using Java 8 and
> compiling like this
>
> $ java -version
> openjdk version "1.8.0_332"
> OpenJDK Runtime Environment (Temurin)(build 1.8.0_332-b09)
> OpenJDK 64-Bit Server VM (Temurin)(build 25.332-b09, mixed mode)
>
> ./gradlew build -PscalaSuffix=2.11
>
> The build fails with this command using that version of Java 8 ^.
>
> Anyway, just wanted to point that out since I hit this in my branch
> trying to utilize the "bin/check-all.sh" script. That doesn't block
> me/us, but just wanted to call it out.
>
> -Jamie
>
>
>
> On Wed, Sep 14, 2022 at 5:52 PM Yi Pan  wrote:
>
> > Hey, James,
> >
> > In order to merge your PR without breaking the jdk8 older modules, we will
> > need the changes proposed here. Can you try to add those build script
> > changes in the same PR? We will definitely help review and merge it.
> >
> > Best!
> >
> > -Yi
> >
> > On Wed, Sep 14, 2022 at 8:00 AM James DeMichele
> >  wrote:
> >
> > > Also, do you have a timeline for when this could be completed? Thanks.
> > >
> > > On Wed, Sep 14, 2022 at 7:16 AM James DeMichele <
> > > james.demich...@redfin.com>
> > > wrote:
> > >
> > > > That sounds like a great solution to me if that works for y'all!
> > > >
> > > > Note too, the Java 11 and yarn 3 module need to only use the Scala 2.12
> > > > version of the build.
> > > >
> > > > Jamie
> > > >
> > > >
> > > > On Wed, Sep 14, 2022, 2:38 AM Yi Pan  wrote:
> > > >
> > > >> Hi, James,
> > > >>
> > > >> Sorry to reply late. I just came back from a trip and had a discussion with
> > > >> our internal team as well. So, there is one proposal other than creating a
> > > >> branch. Let me elaborate on it below:
> > > >> a) create a new module samza-yarn3 that depends on YARN 3.3.0 and is the
> > > >> hosting module for most of the jdk11-related changes.
> > > >> b) modify the build script s.t. samza-yarn3 will only compile and build
> > > >> with jdk11 and samza-yarn will only compile and build with jdk8.
> > > >> Thus, we can have two builds: a jdk8 build that builds with samza-yarn w/
> > > >> YARN 2.10.0, and a jdk11 build that builds with samza-yarn3 w/ YARN 3.3.0. We
> > > >> can manage to publish both jdk8 and jdk11 artifacts if needed.
> > > >> The benefit of this approach is that we can still maintain the trunk
> > > >> release while opening up the jdk11 support.
> > > >>
> > > >> Let me know if that works for you and we can work together to get the code
> > > >> in.
> > > >>
> > > >> Best!
> > > >>
> > > >> -Yi
> > > >>
> > > >> On Tue, Sep 13, 2022 at 7:53 AM James DeMichele
> > > >>  wrote:
> > > >>
> > > >> > Hey Yi, I wanted to follow up here and figure what a path forward is here.
> > > >> > We need to move to Java 11, and Samza currently is our only blocking issue.
> > > >> > In order to move to Java 11, the Yarn Cluster would need to run on Java 11
> > > >> > correct? If that's the case, then it would need to be 3.3+. I don't know
> > > >> > what it entails on your end to have a new Major version, but that seems
> > > >> > like a good option here right? Version 2 could be where we can move this
> > > >> > project forward to Java 11, while Version 1 can still remain, and would not
> > > >> > break people that can't/won't upgrade to Java 11.
> > > >> >
> > > >> > -Jamie
> > > >> >
> > > >> > On Tue, Sep 6, 2022 at 9:55 AM James DeMichele <
> > > >> james.demich...@redfin.com

Re: Java 11 Checkin again

2022-09-14 Thread Yi Pan
Hey, James,

In order to merge your PR without breaking the jdk8 older modules, we will
need the changes proposed here. Can you try to add those build script
changes in the same PR? We will definitely help review and merge it.

Best!

-Yi

On Wed, Sep 14, 2022 at 8:00 AM James DeMichele
 wrote:

> Also, do you have a timeline for when this could be completed? Thanks.
>
> On Wed, Sep 14, 2022 at 7:16 AM James DeMichele <
> james.demich...@redfin.com>
> wrote:
>
> > That sounds like a great solution to me if that works for y'all!
> >
> > Note too, the Java 11 and yarn 3 module need to only use the Scala 2.12
> > version of the build.
> >
> > Jamie
> >
> >
> > On Wed, Sep 14, 2022, 2:38 AM Yi Pan  wrote:
> >
> >> Hi, James,
> >>
> >> Sorry to reply late. I just came back from a trip and had a discussion with
> >> our internal team as well. So, there is one proposal other than creating a
> >> branch. Let me elaborate on it below:
> >> a) create a new module samza-yarn3 that depends on YARN 3.3.0 and is the
> >> hosting module for most of the jdk11-related changes.
> >> b) modify the build script s.t. samza-yarn3 will only compile and build
> >> with jdk11 and samza-yarn will only compile and build with jdk8.
> >> Thus, we can have two builds: a jdk8 build that builds with samza-yarn w/
> >> YARN 2.10.0, and a jdk11 build that builds with samza-yarn3 w/ YARN 3.3.0. We
> >> can manage to publish both jdk8 and jdk11 artifacts if needed.
> >> The benefit of this approach is that we can still maintain the trunk
> >> release while opening up the jdk11 support.
> >>
> >> Let me know if that works for you and we can work together to get the code
> >> in.
> >>
> >> Best!
> >>
> >> -Yi
> >>
> >> On Tue, Sep 13, 2022 at 7:53 AM James DeMichele
> >>  wrote:
> >>
> >> > Hey Yi, I wanted to follow up here and figure what a path forward is here.
> >> > We need to move to Java 11, and Samza currently is our only blocking issue.
> >> > In order to move to Java 11, the Yarn Cluster would need to run on Java 11
> >> > correct? If that's the case, then it would need to be 3.3+. I don't know
> >> > what it entails on your end to have a new Major version, but that seems
> >> > like a good option here right? Version 2 could be where we can move this
> >> > project forward to Java 11, while Version 1 can still remain, and would not
> >> > break people that can't/won't upgrade to Java 11.
> >> >
> >> > -Jamie
> >> >
> >> > On Tue, Sep 6, 2022 at 9:55 AM James DeMichele <
> >> james.demich...@redfin.com
> >> > >
> >> > wrote:
> >> >
> >> > > Yeah I mean if Samza works fine with the hadoop-yarn library running
> >> > > against a 3.3.x YARN cluster, then I don't mind keeping that library of
> >> > > 2.10.x in Samza's code. But it is still a moot point in terms of upgrading
> >> > > your YARN cluster, since it must be upgraded to 3.3.x+ in order to be able
> >> > > to run the Cluster with Java 11.
> >> > >
> >> > > @Yi, I think that moving to a new major version might be the solution
> >> > > here. That way LinkedIn can still have a pathway of upgrading code for the
> >> > > old legacy 1.x version of Samza. While a new major version of 2.x of Samza
> >> > > could then make it a requirement that it runs with a YARN cluster of 3.3.x
> >> > > if you want to use Java 11.
> >> > >
> >> > > The only issue there is that you'll probably need to backport changes
> >> > > between the 2 versions. But in all honesty, this project does not look
> >> > > extremely active with commits so it might not be that big of a problem.
> >> > >
> >> > > -Jamie
> >> > >
> >> > > On Fri, Sep 2, 2022 at 9:08 PM Malcolm McFarland <
> >> mmcfarl...@cavulus.com
> >> > >
> >> > > wrote:
> >> > >
> >> > >> Hi all,
> >> > >>
> >> > >> I've been doing a little bit of testing with Samza and Hadoop 3.3.4;

Re: Java 11 Checkin again

2022-09-14 Thread Yi Pan
Hi, James,

Sorry to reply late. I just came back from a trip and had a discussion with
our internal team as well. So, there is one proposal other than creating a
branch. Let me elaborate on it below:
a) create a new module samza-yarn3 that depends on YARN 3.3.0 and is the
hosting module for most of the jdk11-related changes.
b) modify the build script s.t. samza-yarn3 will only compile and build
with jdk11 and samza-yarn will only compile and build with jdk8.
Thus, we can have two builds: a jdk8 build that builds with samza-yarn w/
YARN 2.10.0, and a jdk11 build that builds with samza-yarn3 w/ YARN 3.3.0. We
can manage to publish both jdk8 and jdk11 artifacts if needed.
The benefit of this approach is that we can still maintain the trunk
release while opening up the jdk11 support.

Let me know if that works for you and we can work together to get the code
in.

Best!

-Yi
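
The two-build layout proposed in this message can be summarised with a small sketch. It only models which YARN module and YARN version each JDK build would pair with, using the module names and versions from the proposal; the table and function names are illustrative assumptions, and the real change would live in the Gradle build scripts rather than in Python.

```python
# JDK major version -> YARN module and YARN version, as proposed in this message.
YARN_MODULE_BY_JDK = {
    8: {"module": "samza-yarn", "yarn_version": "2.10.0"},
    11: {"module": "samza-yarn3", "yarn_version": "3.3.0"},
}


def yarn_module_for(jdk_major: int) -> dict:
    """Return the YARN module settings for a JDK major version (hypothetical helper)."""
    try:
        return YARN_MODULE_BY_JDK[jdk_major]
    except KeyError:
        raise ValueError(f"no YARN module configured for JDK {jdk_major}") from None
```

Keeping the mapping in one place makes the intent explicit: each build compiles exactly one of the two YARN modules, so the jdk8 artifacts stay unchanged while the jdk11 artifacts pick up YARN 3.3.0.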

On Tue, Sep 13, 2022 at 7:53 AM James DeMichele
 wrote:

> Hey Yi, I wanted to follow up here and figure what a path forward is here.
> We need to move to Java 11, and Samza currently is our only blocking issue.
> In order to move to Java 11, the Yarn Cluster would need to run on Java 11
> correct? If that's the case, then it would need to be 3.3+. I don't know
> what it entails on your end to have a new Major version, but that seems
> like a good option here right? Version 2 could be where we can move this
> project forward to Java 11, while Version 1 can still remain, and would not
> break people that can't/won't upgrade to Java 11.
>
> -Jamie
>
> On Tue, Sep 6, 2022 at 9:55 AM James DeMichele
> wrote:
>
> > Yeah I mean if Samza works fine with the hadoop-yarn library running
> > against a 3.3.x YARN cluster, then I don't mind keeping that library of
> > 2.10.x in Samza's code. But it is still a moot point in terms of upgrading
> > your YARN cluster, since it must be upgraded to 3.3.x+ in order to be able
> > to run the Cluster with Java 11.
> >
> > @Yi, I think that moving to a new major version might be the solution
> > here. That way LinkedIn can still have a pathway of upgrading code for the
> > old legacy 1.x version of Samza. While a new major version of 2.x of Samza
> > could then make it a requirement that it runs with a YARN cluster of 3.3.x
> > if you want to use Java 11.
> >
> > The only issue there is that you'll probably need to backport changes
> > between the 2 versions. But in all honesty, this project does not look
> > extremely active with commits so it might not be that big of a problem.
> >
> > -Jamie
> >
> > On Fri, Sep 2, 2022 at 9:08 PM Malcolm McFarland
> > wrote:
> >
> >> Hi all,
> >>
> >> I've been doing a little bit of testing with Samza and Hadoop 3.3.4;
> >> afaict, in light testing, Samza seems to work fine using the 2.10.x
> >> hadoop-yarn library against a YARN cluster running 3.3.x. As Jamie pointed
> >> out, YARN didn't incorporate Java 11 compatibility until v3.3.0 (
> >> https://hadoop.apache.org/docs/r3.3.0/index.html). Are there any unit tests
> >> in Samza that verify compatibility against a YARN cluster? If so, that
> >> could be a place to validate YARN v2.10/v3.3 cross-compatibility.
> >>
> >> Just throwing my 2 cents out there,
> >> Malcolm McFarland
> >> Cavulus
> >>
> >> On Fri, Sep 2, 2022 at 6:27 PM James DeMichele
> >>  wrote:
> >>
> >> > Hey Yi,
> >> >
> >> > Thanks for getting back to me. I have not tried the older yarn cluster
> >> > version yet in the Samza app running against 3.3.4 but I am wary it would
> >> > work. Yarn itself is not compatible at 2.10.1 with Java 11 so you would
> >> > have to update yarn even if the Java library here wasn't updated.
> >> >
> >> > Could we move this version I'm proposing to a 2.x version of Samza? So
> >> > people that wanted to move forward with yarn upgrade and Samza and Java 11
> >> > (like us) could do so? Then 1.x could only be java 8 compatible and 2.x
> >> > could be java 11.
> >> >
> >> > Jamie
> >> >
> >> > On Fri, Sep 2, 2022, 6:44 PM Yi Pan  wrote:
> >> >
> >> > > Hey, James,
> >> > >
> >> > > Thanks for the ping. @prateek, can we have someone to review this change?
> >> > >
> >> > > One question: have you tested the change w/ the older YARN cluster version
> >> > > (running 2.10.1)? If this change requires YARN cluster
Re: Running v1.7.0 locally

2022-09-02 Thread Yi Pan
Hey, Malcolm,

Thanks for reporting this issue. Could you open a JIRA to track that?

Best!

-Yi

On Mon, Aug 29, 2022 at 5:53 PM Malcolm McFarland 
wrote:

> Hey folks,
>
> I've recently been attempting to upgrade our legacy application from Samza
> 1.5.1 to 1.7.0. With version 1.5.1, I've had no problems running the
> application with this command:
>
> ./bin/run-app.sh --config-path=path/to/file.properties
>
> Starting in 1.6.0, this doesn't seem to work. As far as I can tell, the
> application starts up fully without errors and then simply shuts
> down, once again without error. Afaict it runs fine on YARN. Does Samza
> v1.6.0+ support running local processes? I've tried this on both OS X and
> Ubuntu, using Java 1.8.
>
> Here are the relevant portions of the properties file:
>
> task.class=com.cavulus.task.SimpleLegacyTask
> job.factory.class=org.apache.samza.job.local.ThreadJobFactory
> job.default.system=kafka
>
> systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
> job.name=simple-legacy-task
> task.inputs=kafka.event-input
>
> ...plus serdes, ZooKeeper configuration, etc, etc. Here are the last few
> lines of logging output:
>
> 2022-08-29 17:19:42,842  DEBUG  [org.apache.kafka.clients.NetworkClient]
>  [Consumer clientId=kafka_admin_consumer-simple_legacy_task-1,
> groupId=simple-legacy-task-1] Sending metadata request
> (type=MetadataRequest, topics=) to node localhost:9092 (id: -1 rack: null)
> 2022-08-29 17:19:42,843  INFO   [org.apache.kafka.clients.Metadata]
>  Cluster ID: fwnjhL2kQayFxN0xpatT-g
> 2022-08-29 17:19:42,843  DEBUG  [org.apache.kafka.clients.Metadata]
>  Updated cluster metadata version 2 to Cluster(id = fwnjhL2kQayFxN0xpatT-g,
> nodes = [localhost:9092 (id: 0 rack: null)], partitions = [], controller =
> localhost:9092 (id: 0 rack: null))
> 2022-08-29 17:19:42,843  DEBUG
>  [org.apache.samza.system.kafka.KafkaSystemAdmin]  Stream
> simple-legacy-task-broadcast-stream has partitions [Partition(topic =
> simple-legacy-task-broadcast-stream, partition = 0, leader = 0, replicas =
> [0], isr = [0], offlineReplicas = [])]
> 2022-08-29 17:19:42,844  DEBUG  [org.apache.kafka.clients.NetworkClient]
>  [Consumer clientId=kafka_admin_consumer-simple_legacy_task-1,
> groupId=simple-legacy-task-1] Initiating connection to node localhost:9092
> (id: 0 rack: null)
> 2022-08-29 17:19:42,844  DEBUG  [org.apache.kafka.common.metrics.Metrics]
>  Added sensor with name node-0.bytes-sent
> 2022-08-29 17:19:42,844  DEBUG  [org.apache.kafka.common.metrics.Metrics]
>  Added sensor with name node-0.bytes-received
> 2022-08-29 17:19:42,844  DEBUG  [org.apache.kafka.common.metrics.Metrics]
>  Added sensor with name node-0.latency
> 2022-08-29 17:19:42,844  DEBUG  [org.apache.kafka.common.network.Selector]
>  [Consumer clientId=kafka_admin_consumer-simple_legacy_task-1,
> groupId=simple-legacy-task-1] Created socket with SO_RCVBUF = 342972,
> SO_SNDBUF = 146988, SO_TIMEOUT = 0 to node 0
> 2022-08-29 17:19:42,844  DEBUG  [org.apache.kafka.clients.NetworkClient]
>  [Consumer clientId=kafka_admin_consumer-simple_legacy_task-1,
> groupId=simple-legacy-task-1] Completed connection to node 0. Fetching API
> versions.
> 2022-08-29 17:19:42,844  DEBUG  [org.apache.kafka.clients.NetworkClient]
>  [Consumer clientId=kafka_admin_consumer-simple_legacy_task-1,
> groupId=simple-legacy-task-1] Initiating API versions fetch from node 0.
> 2022-08-29 17:19:42,845  DEBUG  [org.apache.kafka.clients.NetworkClient]
>  [Consumer clientId=kafka_admin_consumer-simple_legacy_task-1,
> groupId=simple-legacy-task-1] Recorded API versions for node 0:
> (Produce(0): 0 to 7 [usable: 6], Fetch(1): 0 to 11 [usable: 8],
> ListOffsets(2): 0 to 5 [usable: 3], Metadata(3): 0 to 8 [usable: 6],
> LeaderAndIsr(4): 0 to 2 [usable: 1], StopReplica(5): 0 to 1 [usable: 0],
> UpdateMetadata(6): 0 to 5 [usable: 4], ControlledShutdown(7): 0 to 2
> [usable: 1], OffsetCommit(8): 0 to 7 [usable: 4], OffsetFetch(9): 0 to 5
> [usable: 4], FindCoordinator(10): 0 to 2 [usable: 2], JoinGroup(11): 0 to 5
> [usable: 3], Heartbeat(12): 0 to 3 [usable: 2], LeaveGroup(13): 0 to 2
> [usable: 2], SyncGroup(14): 0 to 3 [usable: 2], DescribeGroups(15): 0 to 3
> [usable: 2], ListGroups(16): 0 to 2 [usable: 2], SaslHandshake(17): 0 to 1
> [usable: 1], ApiVersions(18): 0 to 2 [usable: 2], CreateTopics(19): 0 to 3
> [usable: 3], DeleteTopics(20): 0 to 3 [usable: 2], DeleteRecords(21): 0 to
> 1 [usable: 1], InitProducerId(22): 0 to 1 [usable: 1],
> OffsetForLeaderEpoch(23): 0 to 3 [usable: 1], AddPartitionsToTxn(24): 0 to
> 1 [usable: 1], AddOffsetsToTxn(25): 0 to 1 [usable: 1], EndTxn(26): 0 to 1
> [usable: 1], WriteTxnMarkers(27): 0 [usable: 0], TxnOffsetCommit(28): 0 to
> 2 [usable: 1], DescribeAcls(29): 0 to 1 [usable: 1], CreateAcls(30): 0 to 1
> [usable: 1], DeleteAcls(31): 0 to 1 [usable: 1], DescribeConfigs(32): 0 to
> 2 [usable: 2], AlterConfigs(33): 0 to 1 [usable: 1],
> AlterReplicaLogDirs(34): 0 to 1 [usable: 1], 

Re: Java 11 Checkin again

2022-09-02 Thread Yi Pan
Hey, James,

Thanks for the ping. @prateek, can we have someone to review this change?

One question: have you tested the change w/ the older YARN cluster version
(running 2.10.1)? If this change requires YARN cluster upgrade to 3.3.4 as
well, that may be a breaking change to existing Samza users (i.e. LinkedIn
is still running a YARN cluster with version 2.10.1).

Best, and apologies for the delay.

-Yi

On Fri, Sep 2, 2022 at 8:56 AM James DeMichele
 wrote:

> Hey y'all. I just am not sure how to get some traction on these Java 11
> PRs.
>
> https://github.com/apache/samza/pull/1628
> https://github.com/apache/samza-hello-samza/pull/87
>
> Would someone that is a maintainer for Samza just let us know that y'all
> are looking at them? I can stop pestering you :)
>
> I ran all tests in both PRs, all pass. I also confirmed that using my Samza
> PR in the Hello World app all works with Java 11.
>
> Thanks!
>
> -Jamie
>


[REPORT] Samza - April 2022

2022-04-13 Thread Yi Pan
## Description:
The mission of Samza is the creation and maintenance of software related to
a distributed stream processing framework

## Issues:
- There are no issues requiring board attention.

## Membership Data:
Apache Samza was founded 2015-01-22 (7 years ago)
There are currently 29 committers and 17 PMC members in this project.
The Committer-to-PMC ratio is roughly 8:5.

Community changes, past quarter:
- No new PMC members. Last addition was Bharath Kumarasubramanian on
2020-02-13.
- No new committers. Last addition was Daniel Chen on 2021-09-17.

## Project Activity:
- Samza 1.7.0 was released on 2022-04-04
- Stream Processing Meetup@LinkedIn on Kafka and Samza was held on 2022-04-07

## Community Health:
JIRA:
13 issues opened in JIRA, past quarter (-13% change)
13 issues closed in JIRA, past quarter (-23% change)
Commits:
32 commits in the past quarter (10% increase)
11 code contributors in the past quarter (-21% change)
22 PRs opened on GitHub, past quarter (-15% change)
20 PRs closed on GitHub, past quarter (-20% change)


Re: [RESULT][VOTE] Apache Samza 1.7.0 RC1

2022-03-15 Thread Yi Pan
Thanks, Daniel!

Just want to mention that Boris also voted +1 (binding).

Best!

-Yi

On Tue, Mar 15, 2022 at 9:22 AM Daniel Chen  wrote:

> Hey all,
>
> The vote for 1.7.0 release has been out for more than 72 hours and we got
> +1(binding) x3  from Yi, Xinyu, Daniel
>
> Samza 1.7.0 officially passed the VOTE phase!
>
> Thanks to everyone who helped with the validation!
>
> Daniel
>


Re: [VOTE] Apache Samza 1.7.0 RC1

2022-03-15 Thread Yi Pan
+1 (binding).

Ran check-all, verified the signature and checksums. All passed.

Thanks for pushing 1.7.0 out of the door!

Yi

On Fri, Mar 11, 2022 at 2:31 PM Xinyu Liu  wrote:

> +1 (binding).
>
> Verified the signature and checksums, and also ran check-all tests which
> all passed.
>
> Thanks,
> Xinyu
>
> On Fri, Mar 11, 2022 at 2:00 PM Bob S  wrote:
>
> > +1
> > Ran build,test and both integration tests (regular + standalone) and
> > check-all.
> > Verified signatures, sha and md5.
> > Thanks Daniel!
> >
> > On Wed, Mar 9, 2022 at 4:34 PM Daniel Chen  wrote:
> >
> > > Hey all, This is a call for a vote on a release of Apache Samza 1.7.0.
> > > Thanks to everyone who has contributed to this release.
> > >
> > > The release candidate can be downloaded from here:
> > >
> > > https://home.apache.org/~dchen/samza-1.7.0-rc1/
> > >
> > > The release candidate is signed with pgp key 1D9ADCE059431C34, which is
> > > included in the repository's KEYS file:
> > >
> > >
> >
> https://git-wip-us.apache.org/repos/asf?p=samza.git;a=blob_plain;f=KEYS;hb=c5831bfc01b2e70ba57c4bd3505c6a84a73c8a7b
> > > and can also be found on keyservers:
> > >
> > >
> > >
> >
> https://keyserver.ubuntu.com/pks/lookup?search=dchen%40apache.org=on=index
> > >
> > > The git tag is release-1.7.0-rc1 and signed with the same pgp key:
> > >
> > >
> > >
> >
> https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.7.0-rc1
> > >
> > > Test binaries have been published to Maven's staging repository, and
> are
> > > available here:
> > >
> > > Scala 2.11:
> > >
> https://repository.apache.org/content/repositories/orgapachesamza-1092/
> > > Scala 2.12:
> > >
> https://repository.apache.org/content/repositories/orgapachesamza-1093/
> > >
> > > The vote will be open for 72 hours ( end in 5:00pm Saturday, 03/12/2022
> > ).
> > > Please download the release candidate, check the hashes/signature,
> build
> > it
> > > and test it, and then please vote: [ ] +1 approve [ ] +0 no opinion [ ]
> > -1
> > > disapprove (and reason why)
> > >
> > > I ran check-all.sh and bor...@apache.org helped run integration tests
> > > (both
> > > YARN and standalone) passed, for rc1
> > >
> > > +1 from my side for the release.
> > > Thanks,
> > > Daniel
> > >
> >
>


[REPORT] Samza - Feb 2022

2022-02-09 Thread Yi Pan
## Description:
The mission of Samza is the creation and maintenance of software related to
a distributed stream processing framework

## Issues:
- There are no issues requiring board attention.

## Membership Data:
Apache Samza was founded 2015-01-22 (7 years ago)
There are currently 29 committers and 17 PMC members in this project.
The Committer-to-PMC ratio is roughly 8:5.

Community changes, past quarter:
- No new PMC members. Last addition was Bharath Kumarasubramanian on
2020-02-13.
- No new committers. Last addition was Daniel Chen on 2021-09-17.

## Project Activity:
- Samza 1.7.x release is in DISCUSSION to include the following major
features
  - [SAMZA-2591] Introduce Async State Backup API (SEP-28)
  - [SAMZA-2657] Blob store backed state backup and restore (SEP-29)
  - [SAMZA-2709] Adding partial updates to Samza Table API (SEP-30)
  - [SAMZA-2716] Upgrade to Kafka 2.4
- Samza auto-scaling presented in Stream Processing Meetup@LinkedIn on Dec 1

## Community Health:
JIRA
- 11 issues opened in JIRA, past quarter (-72% change)
- 24 issues closed in JIRA, past quarter (166% increase)
COMMITS
- 28 commits in the past quarter (-15% decrease)
- 15 code contributors in the past quarter (87% increase)
- 24 PRs opened on GitHub, past quarter (-42% change)
- 25 PRs closed on GitHub, past quarter (-30% change)


Re: [DISCUSS] Apache Samza 1.7.0 RC0

2022-01-26 Thread Yi Pan
Huge +1! Can't wait to see this list of features coming out!

-Yi

On Wed, Jan 26, 2022 at 2:36 PM Daniel Chen  wrote:

> Hi folks,
>
> We have added a number of major features and changes to master since
>
> 1.6, that warrants a major 1.7 release.
>
> Within LinkedIn, some of these features have already been tested as
>
> part of our test suites and are currently used in many of our production
> jobs. We plan to continue our testing in the coming weeks to validate the
> stability prior to release.
>
> We wanted to kick off the discussion in the open source forum to keep
>
> the momentum flowing.
>
> Here is a selected list of major features that are part of the new release:
>
>
>
>-
>
>SEP-28: Samza State Backend Interface and Checkpointing Improvements
>(#1514)
>-
>
>SEP-29: Blob Store as backend for Samza State backup and restore (#1501)
>-
>
>SEP-30: Adding partial update api to Table API (#1560)
>
>
>
> You can find a concrete list of the features, bug-fixes, upgrades here
>
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20AND%20fixVersion%20%3D%201.7
>


Re: [VOTE] SEP-30: Support Updates in Table API

2022-01-24 Thread Yi Pan
Discussed and resolved the minor concerns offline. +1 (binding) for this
one.

Thanks!

-Yi

On Tue, Dec 21, 2021 at 1:28 PM Xinyu Liu  wrote:

> +1 on my side.
>
> Glad to see this feature coming. Please make sure the api changes are
> reflected in the documents, e.g.
> https://samza.apache.org/learn/documentation/1.0.0/api/table-api.html.
>
> Thanks,
> Xinyu
>
> On Mon, Dec 20, 2021 at 10:44 AM Ajo Thomas 
> wrote:
>
> > Hi All,
> >
> > This is a call for a vote on SEP-30: Support Updates in Table API
> > Thanks to everyone involved with the design and reviews to refine the
> > proposal.
> >
> > Email Thread:
> >
> >
> http://mail-archives.apache.org/mod_mbox/samza-dev/202112.mbox/%3cCAAMuQDN9fX64KONdqD1n06xTXvgMXNUqkt2RnPnt9Zr=vjn...@mail.gmail.com%3e
> >
> > SEP-30:
> >
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-30%3A+Support+Updates+in+Table+API
> >
> > Jira ticket:
> > https://issues.apache.org/jira/browse/SAMZA-2709
> >
> > Please vote:
> > [ ] +1 approve
> > [ ] +0 no opinion
> > [ ] -1 disapprove (and reason why)
> >
> > Thanks,
> > Ajo Thomas
> >
>


[Draft] [REPORT] Apache Samza Jan 2022

2022-01-12 Thread Yi Pan
Hi, team,

Please read the draft report below and let me know if I missed anything.
Thanks!

- Yi

==
## Description:
The mission of Samza is the creation and maintenance of software related to
a distributed stream processing framework

## Issues:
- There are no issues requiring board attention.

## Membership Data:
Apache Samza was founded 2015-01-22 (7 years ago)
There are currently 29 committers and 17 PMC members in this project.
The Committer-to-PMC ratio is roughly 8:5.

Community changes, past quarter:
- No new PMC members. Last addition was Bharath Kumarasubramanian on
2020-02-13.
- No new committers. Last addition was Daniel Chen on 2021-09-17.

## Project Activity:
- Samza 1.7.x release is WIP to include the following major features
  - [SAMZA-2591] Introduce Async State Backup API (SEP-28)
  - [SAMZA-2657] Blob store backed state backup and restore (SEP-29)
  - [SAMZA-2709] Adding partial updates to Samza Table API (SEP-30)
  - [SAMZA-2716] Upgrade to Kafka 2.4
- Samza auto-scaling presented in Stream Processing Meetup@LinkedIn on Dec 1

## Community Health:
JIRA
- 13 issues opened in JIRA, past quarter (-66% change)
- 17 issues closed in JIRA, past quarter (70% increase)
COMMITS
- 27 commits in the past quarter (-37% change)
- 14 code contributors in the past quarter (100% increase)
- 24 PRs opened on GitHub, past quarter (-36% change)
- 23 PRs closed on GitHub, past quarter (-32% change)


[Draft][REPORT] Apache Samza Oct 2021

2021-10-11 Thread Yi Pan
Hi, team,

Please read the following draft report for Oct 2021. Let me know if I
missed anything. Thanks a lot!



## Description:
The mission of Samza is the creation and maintenance of software related to
a distributed stream processing framework

## Issues:
- There are no issues requiring board attention.

## Membership Data:
Apache Samza was founded 2015-01-22 (7 years ago)
There are currently 29 committers and 17 PMC members in this project.
The Committer-to-PMC ratio is roughly 8:5.

Community changes, past quarter:
- No new PMC members. Last addition was Bharath Kumarasubramanian on
2020-02-13.
- Daniel Chen was added as committer on 2021-09-17

## Project Activity:
Major feature WIP:
- [SAMZA-2687] Samza elasticity project to scale beyond the input partition
count

## Community Health:
JIRA:
- 36 issues opened in JIRA, past quarter (100% increase)
- 10 issues closed in JIRA, past quarter (-37% change)
COMMITS:
- 40 commits in the past quarter (110% increase)
- 7 code contributors in the past quarter (16% increase)
- 34 PRs opened on GitHub, past quarter (47% increase)
- 31 PRs closed on GitHub, past quarter (34% increase)


Re: [ANNOUNCE] Welcome Daniel Chen as Samza Committer

2021-09-17 Thread Yi Pan
Congrats, Daniel, well deserved!!!

-Yi

On Fri, Sep 17, 2021 at 11:23 AM Xinyu Liu  wrote:

> Hi, all,
>
> I am glad to announce that Daniel Chen has officially accepted our
> invitation and become an Apache Samza Committer now.
>
> Daniel has contributed to many areas of Samza, from his early work on
> Eventhub connector, to recently state restore and checkpointing
> improvements. Daniel also contributed tremendously to integrate Apache Beam
> Python API on top of Samza. As an active member in Samza, he has
> participated frequently in the design, code reviews and mailing list
> discussions. He has also contributed to Samza tutorials, website, releases
> and bug fixes.
>
> Considering his contributions, the Samza PMC trusts Daniel with the
> responsibilities of a Samza Committer.
>
> Please join me to give him a warm welcome!
>
> Xinyu Liu
> on behalf of the Apache Samza PMC
>


Re: Malformed URL issue while deploying Samza on a pure IPv6 VM

2021-08-26 Thread Yi Pan
Hi Vishal,

Could you open a JIRA to track this one? I will circle back to our
internal team to do a quick assessment on IPv6 related issues. Meanwhile, I
strongly encourage you to submit the patch and continue your test, since
that's the best way to discover any hidden issues on that front.

Thanks for reporting this issue!

-Yi

On Thu, Aug 26, 2021 at 10:25 AM Vishal Ranjan 
wrote:

> In our product we are expanding our support for IPv6 environment.
> We earlier tested deploying Samza in a dual stack environment, and it
> worked fine.
> Recently we got our hands on a pure IPv6 lab, and while testing we came
> across an issue when starting Samza.
>
> Looking at the stack trace it seems like the url is not formed correctly
> for IPv6 deployment. The IPv6 address should have been decorated inside '[
> ]’, but that isn’t the case.
> Here is the stack trace. The ip of the VM where Samza is deployed is
> 'fc00:192:168:22::14’.
>
> 2021-08-26T07:28:52.232Z INFO jetty.server.AbstractConnector main
> doStart:331 Started ServerConnector@6c1cfa53{HTTP/1.1, (http/1.1)}{
> 0.0.0.0:44843}
> 2021-08-26T07:28:52.232Z INFO jetty.server.Server main doStart:415 Started
> @5524ms
> 2021-08-26T07:28:52.235Z ERROR
> samza.clustermanager.ClusterBasedJobCoordinator main run:314 Exception
> thrown in the JobCoordinator loop
> java.net.MalformedURLException: Error at index 3 in:
> "192:168:22:0:0:0:14:44843"
> at java.base/java.net.URL.<init>(URL.java:679)
> at java.base/java.net.URL.<init>(URL.java:541)
> at java.base/java.net.URL.<init>(URL.java:488)
> at
> org.apache.samza.coordinator.server.HttpServer.getUrl(HttpServer.scala:134)
> at
> org.apache.samza.coordinator.server.HttpServer.$anonfun$start$4(HttpServer.scala:113)
> at org.apache.samza.util.Logging.info(Logging.scala:63)
> at org.apache.samza.util.Logging.info$(Logging.scala:61)
> at org.apache.samza.coordinator.server.HttpServer.info
> (HttpServer.scala:39)
> at
> org.apache.samza.coordinator.server.HttpServer.start(HttpServer.scala:113)
> at
> org.apache.samza.job.yarn.SamzaYarnAppMasterService.onInit(SamzaYarnAppMasterService.scala:53)
> at
> org.apache.samza.job.yarn.YarnClusterResourceManager.start(YarnClusterResourceManager.java:212)
> at
> org.apache.samza.clustermanager.ContainerProcessManager.start(ContainerProcessManager.java:230)
> at
> org.apache.samza.clustermanager.ClusterBasedJobCoordinator.run(ClusterBasedJobCoordinator.java:289)
> at
> org.apache.samza.clustermanager.ClusterBasedJobCoordinator.runClusterBasedJobCoordinator(ClusterBasedJobCoordinator.java:547)
> at
> org.apache.samza.clustermanager.ClusterBasedJobCoordinator.main(ClusterBasedJobCoordinator.java:473)
> Caused by: java.lang.NumberFormatException: Error at index 3 in:
> "192:168:22:0:0:0:14:44843"
> at
> java.base/java.lang.NumberFormatException.forCharSequence(NumberFormatException.java:81)
> at java.base/java.lang.Integer.parseInt(Integer.java:735)
> at java.base/java.net
> .URLStreamHandler.parseURL(URLStreamHandler.java:223)
>
> This is the problematic line of code:
> ~l/.m2/repository/org/apache/samza/samza-core_2.12/1.5.1-arkin-jdk11/samza-core_2.12-1.5.1-arkin-jdk11-sources.jar!/org/apache/samza/coordinator/server/HttpServer.scala:134
>  new URL("http://" + Util.getLocalHost.getHostName + ":" + runningPort +
> rootPath)
>
>
> While we are planning to add a patch (proper decoration based on IP type)
> to fix this, we are not certain if there will be any further issue with
> IPv6.
> We searched over internet but couldn’t find whether Samza supports IPv6 or
> not. We need to know if there will be any further issues or not.
>
> We are at a crucial stage of release cycle, and having a proper estimate
> will help us plan better. Any help on this matter is highly appreciated.
>
> Thanks,
> Vishal
>
>
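
[Editor's sketch] The "proper decoration based on IP type" that Vishal describes could look roughly like the following. This is a minimal Java illustration, not the actual Samza patch: the class and method names (Ipv6UrlSketch, formatHostForUrl, buildUrl) are hypothetical, and the "contains a colon" test is a simplifying assumption for spotting a bare IPv6 literal.

```java
import java.net.URI;

// Hypothetical helper, not part of Samza: shows bracket decoration of IPv6 hosts.
class Ipv6UrlSketch {

    // Wrap a bare IPv6 literal in "[...]"; leave hostnames, IPv4 addresses,
    // and already-bracketed literals untouched.
    // Assumption: any host string containing ':' is an IPv6 literal.
    static String formatHostForUrl(String host) {
        boolean looksLikeIpv6 = host.indexOf(':') >= 0;
        boolean alreadyBracketed = host.startsWith("[") && host.endsWith("]");
        return (looksLikeIpv6 && !alreadyBracketed) ? "[" + host + "]" : host;
    }

    // Same concatenation as the line quoted from HttpServer.scala,
    // but with the host decorated first.
    static String buildUrl(String host, int port, String rootPath) {
        return "http://" + formatHostForUrl(host) + ":" + port + rootPath;
    }

    public static void main(String[] args) {
        String url = buildUrl("fc00:192:168:22::14", 44843, "/");
        System.out.println(url);
        // Parsing with java.net.URI is only a sanity check that host and port
        // are now separable.
        System.out.println(URI.create(url).getPort());
    }
}
```

Without the brackets, the parser cannot tell where the address ends and the port begins, which is consistent with the NumberFormatException in the stack trace above.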


[Draft] [REPORT] Samza Report July 2021

2021-07-14 Thread Yi Pan
## Description:
The mission of Samza is the creation and maintenance of software related to
a distributed stream processing framework

## Issues:
- There are no issues requiring board attention.

## Membership Data:
Apache Samza was founded 2015-01-22 (6 years ago)
There are currently 28 committers and 17 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:5.

Community changes, past quarter:
- No new PMC members. Last addition was Bharath Kumarasubramanian on
2020-02-13.
- No new committers. Last addition was Ke Wu on 2021-02-25.

## Project Activity:
New features added:
- container placement (SEP22)
- remote blob storage as state backup store (SEP29)
- checkpoint to include state snapshot in remote blob stores (SEP28)

## Community Health:
- JIRA
  17 issues opened in JIRA, past quarter (-41% change)
  16 issues closed in JIRA, past quarter (45% increase)
- Commits
  18 commits in the past quarter (-48% change)
  6 code contributors in the past quarter (-53% change)
  19 PRs opened on GitHub, past quarter (-51% change)
  23 PRs closed on GitHub, past quarter (-39% change)
- Community activities
  Streams Meetup @LinkedIn on 6/24 (https://lnkd.in/g7Fq-3K)


Re: [VOTE] SEP-29: Blob Store Based State Backup And Restore

2021-06-22 Thread Yi Pan
+1 (binding). Thanks for rolling out this big feature!

-Yi

On Tue, Jun 22, 2021 at 1:42 PM Sanil Jain  wrote:

> +1 (non-binding) Thanks for this contribution!
>
> -Sanil
>
> On Tue, 22 Jun 2021 at 13:13, Daniel Chen  wrote:
>
> > +1 (non-binding), thanks!
> >
> > On Tue, Jun 22, 2021 at 1:10 PM Prateek Maheshwari 
> > wrote:
> >
> > > +1 (binding) from me. Thanks for the contribution!
> > >
> > > - Prateek
> > >
> > > On Tue, Jun 22, 2021 at 11:45 AM Prateek Maheshwari <
> prate...@utexas.edu
> > >
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > This is a call for a vote on SEP-29: Blob Store Based State Backup
> And
> > > > Restore.
> > > >
> > > > Discussion thread:
> > > >
> > >
> >
> https://mail-archives.apache.org/mod_mbox/samza-dev/202106.mbox/%3cCAMja7KdMNU_Zk-vDnwcm4GSJs==126-mu6djgtsoukzkxzf...@mail.gmail.com%3e
> > > >
> > > > SEP-29:
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-29%3A+Blob+Store+Based+State+Backup+And+Restore
> > > >
> > > > Please vote:
> > > > [  ] +1 approve
> > > > [  ] +0 no opinion
> > > > [  ] -1 disapprove (and the reason why)
> > > >
> > > > Thanks,
> > > > Shekhar and Prateek
> > > >
> > >
> >
>


Re: [VOTE] SEP-28: Samza State Backend Interface and Checkpointing Improvement

2021-06-22 Thread Yi Pan
+1 (binding) this is going to improve our state recovery story
significantly!

-Yi

On Mon, Jun 21, 2021 at 1:03 PM Daniel Chen  wrote:

> Hi all,
>
> This is a call for a vote on SEP-28: Samza State Backend Interface and
> Checkpointing Improvements. Thanks to everyone who was involved with the
> design and reviews to refine the proposal.
>
> Discussion thread:
>
> http://mail-archives.apache.org/mod_mbox/samza-dev/202106.mbox/%3cCA+6YmWVVxz=xr244rpg2a-a6qaor0mjrw9ck41-u7tsuv8o...@mail.gmail.com%3e
>
> SEP-28:
>
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-28%3A+Samza+State+Backend+Interface+and+Checkpointing+Improvements
>
> Jira ticket:
> https://issues.apache.org/jira/browse/SAMZA-2591
>
> Please vote:
>
> [ ] +1 approve
>
> [ ] +0 no opinion
>
> [ ] -1 disapprove (and reason why)
>
> Thanks,
> Daniel
>


Reminder: LinkedIn Stream Processing Meetup on Jun 24

2021-06-15 Thread Yi Pan
The meet up is coming in less than 10 days! Please remember to sign up and
join us!
https://www.meetup.com/Stream-Processing-Meetup-LinkedIn/events/278266182/

Short abstracts for the contents:
https://www.linkedin.com/posts/celiakkung_stream-processing-with-apache-kafka-apache-activity-6802706543746920448-Rori

Looking forward to seeing you virtually!

-Yi


Upcoming streams meetup event

2021-05-25 Thread Yi Pan
Hi all,

Please join us for our upcoming virtual Stream Processing meetup on
Thursday, June 24th (6:00-8:00pm PT)!

We will have 3 very exciting presentations:

Data Integration Platform using Brooklin
- Santosh Domalapalli, Wayfair

Toward Unified Stream-Batch Processing at LinkedIn
- Xinyu Liu & Yuhong Cheng, LinkedIn

Scaling Kafka Audit Service
- Abhishek Mendhekar, LinkedIn

Please visit our Meetup site for access to more details & the live event
link (TBD), and to let us know you're coming! https://lnkd.in/g7Fq-3K

Hope to see you there!

Best,

- Yi Pan


Re: How are samza.container.id generated in yarn?

2021-04-22 Thread Yi Pan
Hi, Debraj,

In YARN environment, Samza uses YARN generated containerIds as
environmental variables to set each container process's samza.container.id.
i.e. when containers are requested by Samza AM process in YARN, YARN RM
will reply with a set of allocated container objects, which is of class
org.apache.hadoop.yarn.api.records.Container. That's the resource class to
uniquely identify a container in YARN and Container#getId().toString() is
the container ID string we set to samza.container.id.

Best,

-Yi

On Wed, Apr 21, 2021 at 11:28 PM Debraj Manna 
wrote:

> The same has been asked in stackoverflow
> <
> https://stackoverflow.com/questions/67207850/how-does-samza-generate-the-container-id-when-the-application-is-deployed-in-yar
> >
> also. Anyone any thoughts on this?
>
>
> https://stackoverflow.com/questions/67207850/how-does-samza-generate-the-container-id-when-the-application-is-deployed-in-yar
>
> On Wed, Apr 21, 2021 at 6:08 PM Debraj Manna 
> wrote:
>
> > Hi
> >
> > Can someone let me know how is "samza.container.id" generated when a
> > samza app is running in yarn?
> >
> > Thanks,
> >
> >
>


Welcome Ke Wu as Apache Samza committer

2021-02-23 Thread Yi Pan
Hi, everyone,

I am glad to announce that Ke Wu has officially accepted our invitation and
become an Apache Samza committer now.

Please join me to give him a warm welcome!

Cheers!

-Yi


Re: Zookeeper and Client upgrade

2021-01-14 Thread Yi Pan
Hi, Stuart,

Thanks for opening the JIRAs. Feel free to take ownership of them.

Cheers!

-Yi

On Tue, Jan 12, 2021 at 12:03 PM Stuart Perks 
wrote:

> Two jiras raised.
>
> ZK Upgrade: https://issues.apache.org/jira/browse/SAMZA-2616
> Kafka Client Upgrade: https://issues.apache.org/jira/browse/SAMZA-2617
>
> Happy to start looking onto one of these.
>
> On 2021/01/11 23:04:08, Yi Pan  wrote:
> > Hi, Stuart,
> >
> > Please feel free to raise tickets for update requests like these.
> >
> > Thanks for reporting!
> >
> > -Yi
> >
> > On Fri, Jan 8, 2021 at 9:58 PM Stuart Perks
> > wrote:
> >
> > > The Zookeeper version sits at 3.4.6 with 3.6.2 now available bringing
> > > security enhancements.
> > >
> > > https://github.com/apache/samza/blob/master/gradle/dependency-versions.gradle
> > >
> > > The client is at 0.8 and has 0.11 available.
> > >
> > > Are there any plans to upgrade these? Or should we raise a Jira for
> > > this?
> > >
> > > Thanks,
> > >
> > > Stuart
> > >
> >


Re: [VOTE] Apache Samza 1.6.0 RC1

2021-01-13 Thread Yi Pan
Verified signature and sha1. Ran check-all and integration tests. All
passed.

+1 (binding).

-Yi

On Mon, Jan 11, 2021 at 4:08 PM Bharath Kumara Subramanian <
codin.mart...@gmail.com> wrote:

> Thanks for driving this release Boris! Verified signatures and ran all the
> tests.
> check-all and integration tests passed.
>
> +1 (binding)
>
> --
> Bharath
>
> On Mon, Jan 11, 2021 at 1:28 PM Boris S  wrote:
>
> > Quick reminder,
> > Please take some time to validate the release and vote on it.
> >
> > Thanks,
> > Boris.
> >
> > On Wed, Jan 6, 2021 at 11:13 PM Boris S  wrote:
> >
> > > Hi all,
> > >
> > > This is a call for a vote on a release of Apache Samza 1.6.0. We are
> > > excited to see some new features and improvements in this release.
> > >
> > > The release candidate can be downloaded from here:
> > > http://people.apache.org/~boryas/samza-1.6.0-rc1/
> > > 
> > >
> > > The release candidate is signed with pgp key D2103453, which is
> > > included in the repository's KEYS file:
> > > https://github.com/apache/samza/blob/master/KEYS
> > >
> > > or to directly see the public key here:
> > >
> > >
> >
> https://keyserver.ubuntu.com/pks/lookup?search=Boris+Shkolnik=on=index
> > >
> > > The git tag is release-1.6.0-rc1 and signed with the same pgp key
> above:
> > >
> > >
> >
> https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.6.0-rc1
> > > <
> >
> https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.6.0-rc0
> > >
> > >
> > > Test binaries have been published to Maven's staging repository, and
> are
> > > available here:
> > >
> https://repository.apache.org/content/repositories/orgapachesamza-1088/
> > > <
> https://repository.apache.org/content/repositories/orgapachesamza-1080/
> > >
> > > and
> > >
> https://repository.apache.org/content/repositories/orgapachesamza-1089/
> > > <
> https://repository.apache.org/content/repositories/orgapachesamza-1080/
> > >
> > > (for Scala 2.12)
> > >
> > > The vote will be open for 72 hours (end at 06:00pm, Mon 1/11/2021).
> > > Please download the release candidate, check the hashes/signature,
> build
> > it
> > > and test it, and vote:
> > >
> > > [ ] +1 approve
> > > [ ] +0 no opinion
> > > [ ] -1 disapprove (and reason why)
> > >
> > > I ran check-all.sh and integration tests (both YARN and standalone)
> > passed.
> > >
> > > +1 on my end for the release.
> > >
> > > Thanks,
> > > Boris
> > >
> >
>


[REPORT] Apache Samza - Jan 2021

2021-01-13 Thread Yi Pan
## Description
Apache Samza is a distributed stream processing engine that is highly
configurable to process events from various data sources, including
real-time
messaging system (e.g. Kafka) and distributed file systems (e.g. HDFS).

## Issues:
- There are no issues requiring board attention.

## Membership Data:
Apache Samza was founded 2015-01-22 (6 years ago)
There are currently 26 committers and 17 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:5.

Community changes, past quarter:
No new PMC members. Last addition was Bharath Kumarasubramanian on
2020-02-13.
No new committers. Last addition was Rayman Preet Singh on 2019-07-08.

## Project Activity:
- Added AM-HA to allow continuation of job when AM restarts
- Preparing for release 1.6.0

## Community Health:
- We held another virtual meetup for Stream Processing on 12/16
- We have another Samza podcast on Software Engineering Radio channel on
11/24
- JIRA Activity:
  - 22 issues opened in JIRA, past quarter (-37% decrease)
  - 13 issues closed in JIRA, past quarter (85% increase)
- Commit Activity:
  - 34 commits in the past quarter (-17% decrease)
  - 9 code contributors in the past quarter (-55% decrease)
  - 21 PRs opened on GitHub, past quarter (-41% decrease)
  - 20 PRs closed on GitHub, past quarter (-47% decrease)


Re: Zookeeper and Client upgrade

2021-01-11 Thread Yi Pan
Hi, Stuart,

Please feel free to raise tickets for update requests like these.

Thanks for reporting!

-Yi

On Fri, Jan 8, 2021 at 9:58 PM Stuart Perks 
wrote:

> The Zookeeper version sits at 3.4.6 with 3.6.2 now available bringing
> security enhancements.
>
>
> https://github.com/apache/samza/blob/master/gradle/dependency-versions.gradle
> <
> https://github.com/apache/samza/blob/master/gradle/dependency-versions.gradle
> >
>
> The client is at 0.8 and has 0.11 available.
>
> Are there any plans to upgrade these? Or should we raise a Jira for this?
>
> Thanks,
>
> Stuart
>
>
>


Re: SAMZA-2612: Kafka topic naming not supported fully

2020-12-28 Thread Yi Pan
Hey, Stuart,

Sounds great that you have found the way around it! Thanks!

-Yi

On Sat, Dec 19, 2020 at 12:28 PM Stuart Perks 
wrote:

> This can be done using withPhysicalName
>
> Closed the JIRA
>
> On 2020/12/17 12:19:27, Stuart Perks  wrote:
> > https://issues.apache.org/jira/browse/SAMZA-2612
> >
> > Raised a bug JIRA but wanted to check with the community. Any thoughts
> > would be great.
> >
> > The StreamDescriptor class cannot accept all acceptable formats for
> > Kafka topic names.
> >
> > StreamDescriptor:
> >   private static final Pattern STREAM_ID_PATTERN =
> >       Pattern.compile("[\\d\\w-_]+");
> >
> > Kafka topic validation:
> >   public static final String LEGAL_CHARS = "[a-zA-Z0-9._-]";
> >
> > Taking the example, this is valid:
> >   KafkaInputDescriptor pageViewStreamDescriptor =
> >       kafkaSystemDescriptor.getInputDescriptor("page-view-topic", new
> >       JsonSerdeV2<>(PageView.class));
> >
> > but this is not, because the "." in the name page.view.topic is not
> > valid in the StreamDescriptor:
> >   KafkaInputDescriptor pageViewStreamDescriptor =
> >       kafkaSystemDescriptor.getInputDescriptor("page.view.topic", new
> >       JsonSerdeV2<>(PageView.class));
> >
> > Stream Descriptor Validation:
> > https://github.com/apache/samza/blob/master/samza-api/src/main/java/org/apache/samza/system/descriptors/StreamDescriptor.java#L48
> >
> > Kafka Topic Validation:
> > https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/internals/Topic.java#L29
>
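
For readers following the thread, the mismatch between the two patterns quoted
above can be reproduced with plain java.util.regex. This is only a minimal
sketch: the two pattern strings are copied from the quoted message, while the
class name TopicNameCheck, its helper methods, and the "+" appended to Kafka's
LEGAL_CHARS (so that it matches a whole name) are assumptions made for
illustration, not Samza or Kafka code.

```java
import java.util.regex.Pattern;

final class TopicNameCheck {
    // Pattern string as quoted from StreamDescriptor in the message above.
    static final Pattern STREAM_ID_PATTERN = Pattern.compile("[\\d\\w-_]+");

    // Kafka's LEGAL_CHARS as quoted above; the trailing "+" is added here
    // (assumption) so the pattern can be matched against a full topic name.
    static final Pattern KAFKA_TOPIC_PATTERN = Pattern.compile("[a-zA-Z0-9._-]+");

    /** True if the whole name is accepted by the stream id pattern. */
    public static boolean isValidStreamId(String name) {
        return STREAM_ID_PATTERN.matcher(name).matches();
    }

    /** True if the whole name consists only of Kafka's legal topic characters. */
    public static boolean isLegalKafkaTopic(String name) {
        return KAFKA_TOPIC_PATTERN.matcher(name).matches();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"page-view-topic", "page.view.topic"}) {
            System.out.println(name
                + " streamId=" + isValidStreamId(name)
                + " kafkaTopic=" + isLegalKafkaTopic(name));
        }
    }
}
```

The dotted name passes the Kafka character check but fails the stream id
check, which is why the workaround mentioned earlier in this message
(withPhysicalName) keeps a dot-free stream id and maps it to the dotted topic.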


Re: Samza jdk11 support

2020-10-16 Thread Yi Pan
Hi, Debraj,

Sorry to reply late. Unfortunately, we don't have an official doc to run
integration tests on JDK11 yet. However, the set of integration tests that
you can run locally should be included in the open source code:
{code}
> ./bin/integration-tests.sh 
[yarn-integration-tests|standalone-integration-tests]
{code}

Hope that helps.

-Yi

On Sat, Oct 10, 2020 at 10:27 AM Debraj Manna 
wrote:

> Hi Yan
>
> Can you give me some pointers on running all the integration tests that are
> present on the Samza 1.5 release branch in a local environment with OpenJDK 11?
> I found the doc below, but it talks about Samza 0.10.
>
> https://samza.apache.org/contribute/tests.html
>
> Thanks,
>
> On Thu, Jun 4, 2020 at 12:18 AM Yi Pan  wrote:
>
> > Hi, Debraj and Jordan,
> >
> > Thanks a lot for the ping. I dug a bit deeper in the past email thread on
> > this topic and talked with our internal team. Unfortunately, LI currently
> > does not have a short term plan to migrate to JDK 11. However, we
> encourage
> > you to contribute to the matter if that is important to you. Here is an
> > earlier investigation on this matter from this mailing list that I want
> to
> > share:
> >
> >
> http://mail-archives.apache.org/mod_mbox/samza-dev/201810.mbox/%3C1538433805053.85238%40helixeducation.com%3E
> > .
> >
> > Best!
> >
> > -Yi
> >
> > On Mon, Jun 1, 2020 at 9:09 AM Jordan Messec 
> > wrote:
> >
> > > This is a pressing issue for our team as well.
> > >
> > > Jordan
> > >
> > > > On May 31, 2020, at 7:15 PM, Debraj Manna 
> > > wrote:
> > > >
> > > > Thanks Yi.
> > > >
> > > > Did you get any update on the JDK 11 roadmap?
> > > >
> > > >
> > > > On Wed, May 27, 2020 at 1:04 AM Yi Pan  wrote:
> > > >
> > > >> Hi, Debraj,
> > > >>
> > > >> Thanks for the reminder. We did discuss JDK11 before.
> > > >> Unfortunately, I don't know whether JDK11 is on the roadmap as of now.
> > > >> Let me sync up with the team and get back to you.
> > > >>
> > > >> -Yi
> > > >>
> > > >> On Mon, May 25, 2020 at 9:53 AM Debraj Manna <
> > subharaj.ma...@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> Hi
> > > >>>
> > > >>> I have seen a few earlier discussions on the email group which
> > > >>> suggested that samza officially does not support jdk11. But those
> > > >>> discussions seem to be old.
> > > >>>
> > > >>> Does the latest Samza support JDK 11?
> > > >>>
> > > >>> Thanks,
> > > >>>
> > > >>
> > >
> > >
> >
>


Draft board report for Samza

2020-10-10 Thread Yi Pan
Hi, all,

Here is the draft report for Samza. Please let me know if I missed
anything. Thanks!

==
## Description:
The mission of Samza is the creation and maintenance of software related to
a distributed stream processing framework

## Issues:
[Insert your own data here]

## Membership Data:
Apache Samza was founded 2015-01-22 (6 years ago)
There are currently 26 committers and 17 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:5.

Community changes, past quarter:
- No new PMC members. Last addition was Bharath Kumarasubramanian on
2020-02-13.
- No new committers. Last addition was Rayman Preet Singh on 2019-07-08.

## Project Activity:
- New version 1.5.1 was released on 2020-08-28

## Community Health:
- We held an online stream processing meetup on July 21, 2020
- We presented Python stream processing on Beam Samza Runner in Beam Summit
on Aug 24-28, 2020
- We presented Fast SamzaSQL in ApacheCon on Sept 30, 2020
- Beam Samza runner performance improvement was published in LinkedIn
Engineering blog on Oct 1, 2020




Re: Issues with Samza-Kafka compatibility while upgrading to latest versions

2020-10-10 Thread Yi Pan
Hi, Choudhary,

Thanks for reporting your issues. Samza manages its dependencies in a few
gradle files under ${ROOT}/gradle/dependency-versions-*.gradle.
Specifically, the Kafka dependency is defined in the file
${ROOT}/gradle/dependency-versions.gradle. I quickly checked the versions
listed in 1.4.0 and it is still at 2.0.1. Please try to use 2.0.1 instead
of 2.5.0. Let me know if that works for you.

Thanks!

-Yi

On Thu, Oct 8, 2020 at 1:48 PM Choudhary, Suraj 
wrote:

> Hi,
>
> I am working on updating our infrastructure by upgrading Samza, Kafka and
> Hadoop(our container manager for samza).
>
> I am upgrading from/to these versions:
> Samza:  from 2.11-0.12.0 to 2.12-1.4.0
> Kafka:  from 2.11-0.10.0.1 to 2.12-2.5.0
> Hadoop: from 2.7.2 to 2.9.2
>
> After updating the versions, I get the following error when starting the
> Samza application:
>
> Exception in thread "main" java.lang.NoClassDefFoundError:
> kafka/common/TopicAndPartition
> at
> org.apache.samza.system.kafka.KafkaSystemConsumer.toTopicAndPartition(KafkaSystemConsumer.java:317)
> at
> org.apache.samza.system.kafka.KafkaSystemConsumer.register(KafkaSystemConsumer.java:284)
> at
> org.apache.samza.coordinator.stream.CoordinatorStreamSystemConsumer.register(CoordinatorStreamSystemConsumer.java:130)
> at
> org.apache.samza.util.CoordinatorStreamUtil$.writeConfigToCoordinatorStream(CoordinatorStreamUtil.scala:159)
> at org.apache.samza.job.JobRunner.run(JobRunner.scala:80)
> at org.apache.samza.job.JobRunner$.doOperation(JobRunner.scala:52)
> at org.apache.samza.job.JobRunner$.main(JobRunner.scala:47)
> at org.apache.samza.job.JobRunner.main(JobRunner.scala)
> Caused by: java.lang.ClassNotFoundException: kafka.common.TopicAndPartition
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> ... 8 more
>
> On further checking I found that ‘KafkaSystemConsumer’ is part of
> samza-kafka_2.12-1.4.0.jar. And I looked into our deploy jars and I am
> finding only one copy of this class.
> The reference class ‘TopicAndPartition’ is not present anywhere in the
> deploy jars or kafka_2.12-2.5.0 jar in particular.
>
> I tried to find Samza-Kafka version compatibility documentation, but I
> didn’t find any. On investigating our import of samza-kafka_2.12-1.4.0.jar,
> I found that it is including the kafka_2.12-2.5.0 dependency, which leads
> me to think that Samza should be compatible with the Kafka version I am
> referencing.
>
> I am also including all the required maven dependencies such as kafka,
> kafka-clients, samza-core, samza-api of these latest versions and made sure
> no other version of the same is present in my deploy jars.
>
> Please let me know if I am missing something, and what I can do to resolve
> this issue.
>
> Thanks in Advance,
> Suraj
>
>
>
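
A quick way to confirm which Kafka jars actually end up on the deploy
classpath is to probe for the class named in the stack trace above. The sketch
below uses only the JDK; the class name ClasspathProbe and its method are
hypothetical helpers written for this note, not part of Samza or Kafka. Run it
with the same classpath as the job: if kafka.common.TopicAndPartition cannot
be loaded, the Kafka jar on the classpath does not match what
samza-kafka_2.12-1.4.0 was built against (2.0.1, per the reply above).

```java
final class ClasspathProbe {
    /** Returns true if the named class can be located by this class's loader. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class reported missing in the stack trace above.
        String name = args.length > 0 ? args[0] : "kafka.common.TopicAndPartition";
        System.out.println(name + " on classpath: " + isOnClasspath(name));
    }
}
```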


Re: Problem regarding the Monitoring of Samza Applications

2020-09-02 Thread Yi Pan
Hi, Jan,

Thanks for reporting this metrics issue. We will take a further look and
get back to you.

Thanks a lot!

-Yi

On Tue, Sep 1, 2020 at 8:01 AM Jan Bensien 
wrote:

> Hello,
>
> I hope this is the right place to ask.
>
> I am having problems monitoring my Samza application. Using the
> JMX-Exporter I found that the groups: KafkaSystemConsumersMetrics,
> ZkJobCoordinatorMetrics and ZkUtilsMetrics are not emitted. I am using
> Samza 1.5.0 in combination with Beam 2.22.0 and my application is
> written using the Beam Api. My configuration file is a copy of the file
> used in the Beam example, except that I added:
>
>
> metrics.reporter.jmx.class=org.apache.samza.metrics.reporter.JmxReporterFactory
>
> metrics.reporters=jmx
>
> Using these configs for the execution of the Beam example I was able to
> get all metrics groups.
>
> I found the following warnings in my own application log multiple times:
> javax.management.InstanceAlreadyExistsException:
>
> kafka.consumer:type=app-info,id=kafka_admin_consumer-uc1applicationbeam_jan_0901133413_faffdbda-d2e5666d_be95_4bd1_b6b7_86f6c854d9a0
>
> Looking further I saw that multiple kafka consumers using the same
> client.id have been instantiated. I am not sure if this is the
> underlying issue and was not able to track why this happened.
> Furthermore, I added the full error message and execution log here:
> https://gist.github.com/janb15/c580fa814a895302954cef998cff419d
> Did someone else run into this issue and managed to find a solution?
>
> Many thanks,
> Jan


Re: [VOTE] Apache Samza 1.5.1 RC0

2020-08-27 Thread Yi Pan
+1 (binding), ran ./bin/check-all.sh and all integration tests. Verified
signatures.

On Sat, Aug 22, 2020 at 8:09 PM Bharath Kumara Subramanian <
codin.mart...@gmail.com> wrote:

> Hi all,
>
> This is a call for a vote on a release of Apache Samza 1.5.1. We are
> releasing 1.5.1 to address a critical bug related to the transactional state
> feature.
>
> More details on the bug can be found here:
> https://issues.apache.org/jira/browse/SAMZA-2578
>
> The release candidate can be downloaded from here:
> http://home.apache.org/~bharathkk/samza-1.5.1-rc0/
>
> The release candidate is signed with pgp key F3B965A6B192DAB7, which is
> included in the repository's KEYS file:
> https://github.com/apache/samza/blob/master/KEYS
>
> or to directly see the public key here:
>
> https://keyserver.ubuntu.com/pks/lookup?search=Bharath+Kumarasubramanian&fingerprint=on&op=index
>
> The git tag is release-1.5.1-rc0 and signed with the same pgp key above:
>
> https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.5.1-rc0
>
> Test binaries have been published to Maven's staging repository, and are
> available here:
> https://repository.apache.org/content/repositories/orgapachesamza-1081
>
> The vote will be open for 72 hours (end at 07:15pm Wednesday, 08/26/2020).
> Please download the release candidate, check the hashes/signature, build it
> and test it, and vote:
>
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove (and reason why)
>
> I ran check-all.sh and integration tests (both YARN and standalone) passed.
>
> +1 on my end for the release.
>
> Thanks,
> Bharath
>


Re: [DISCUSS] Samza 1.5.1 release

2020-08-19 Thread Yi Pan
+1 to this release as well!

On Wed, Aug 19, 2020 at 9:21 AM Prateek Maheshwari 
wrote:

> +1, this is a critical bug and we should release the fix ASAP.
>
> Thanks,
> Prateek
>
> On Tue, Aug 18, 2020 at 9:02 PM Bharath Kumara Subramanian <
> codin.mart...@gmail.com> wrote:
>
> > Hi all,
> >
> > In 1.5 release, we enabled transactional state by default for all samza
> > jobs. We identified a critical bug related to trimming the state which
> > requires a minor release.
> >
> > I wanted to kick off the discussion on the open source forum as the bug
> fix
> > has been validated internally at LinkedIn.
> >
> > More details on the bug can be found in SAMZA-2578.
> > The patch that contains the fix: samza/pull/1413
> >
> > I'd like to target early next week for voting.
> >
> > Cheers,
> > Bharath
> >
>


Re: [VOTE] Apache Samza 1.5.0 RC1

2020-06-12 Thread Yi Pan
Hey, Yang,

Can you open a JIRA for this failure? This is again caused by test code
that uses sleep() to wait for the work under test to complete before
verification.

-Yi

On Fri, Jun 12, 2020 at 8:01 AM Yang Zhang  wrote:

> Being late to the party. Verified apache-samza-1.5.0-src.tgz in both Linux
> and Mac.
>
> Linux (pass):
>  - check-all.sh
>  - yarn integration tests
>  - standalone integration tests.
>
> Mac (fail, may be due to my mac local setup):
>  - check-all.sh fails (can be reproduced)
>
> testBatchOperationTriggeredByTimer FAILED
>
> java.lang.AssertionError: expected:<0> but was:<100>
>
> at org.junit.Assert.fail(Assert.java:88)
>
> at org.junit.Assert.failNotEquals(Assert.java:834)
>
> at org.junit.Assert.assertEquals(Assert.java:645)
>
> at org.junit.Assert.assertEquals(Assert.java:631)
>
> at
>
> org.apache.samza.table.batching.TestBatchProcessor$TestBatchTriggered.testBatchOperationTriggeredByTimer(TestBatchProcessor.java:144)
>
>
> 1303 tests completed, 1 failed, 3 skipped
>
> +0 no opinion (non binding)
>
>
> On Thu, Jun 11, 2020 at 4:02 PM Boris S  wrote:
>
> > +1 (binding)
> >
> >- verified signatures
> >- ran check-all
> >- ran integration tests.
> >
> >
> > On Mon, Jun 8, 2020 at 5:16 PM Bharath Kumara Subramanian <
> > codin.mart...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > This is a call for a vote on a release of Apache Samza 1.5.0. We are
> > > excited to see some new features and improvements in this release.
> > >
> > > The release candidate can be downloaded from here:
> > > http://home.apache.org/~bharathkk/samza-1.5.0-rc1/
> > >
> > > The release candidate is signed with pgp key F3B965A6B192DAB7, which is
> > > included in the repository's KEYS file:
> > > https://github.com/apache/samza/blob/master/KEYS
> > >
> > > or to directly see the public key here:
> > >
> > >
> >
> https://keyserver.ubuntu.com/pks/lookup?search=Bharath+Kumarasubramanian&fingerprint=on&op=index
> > >
> > > The git tag is release-1.5.0-rc1 and signed with the same pgp key
> above:
> > >
> > >
> >
> https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.5.0-rc1
> > >
> > > Test binaries have been published to Maven's staging repository, and
> are
> > > available here:
> > >
> https://repository.apache.org/content/repositories/orgapachesamza-1080/
> > >
> > > The vote will be open for 72 hours (end at 05:15pm Thursday,
> 06/11/2020).
> > > Please download the release candidate, check the hashes/signature,
> build
> > it
> > > and test it, and vote:
> > >
> > > [ ] +1 approve
> > > [ ] +0 no opinion
> > > [ ] -1 disapprove (and reason why)
> > >
> > > I ran check-all.sh and integration tests (both YARN and standalone)
> > passed.
> > >
> > > +1 on my end for the release.
> > >
> > > Thanks,
> > > Bharath
> > >
> >
>
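
On the sleep() point raised in the reply at the top of this message: one
common way to remove this kind of timing dependence is to have the test wait
on an explicit completion signal with a generous timeout. The sketch below is
only an illustration using JDK classes; LatchInsteadOfSleep and
runTimerTriggeredBatch are hypothetical names, the value 100 is a stand-in for
whatever the timer-triggered batch produces, and none of this is the actual
TestBatchProcessor code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class LatchInsteadOfSleep {
    /**
     * Schedules a timer-triggered action, then blocks until the action signals
     * completion (or the timeout expires) instead of sleeping a fixed time.
     */
    public static int runTimerTriggeredBatch(long delayMs, long timeoutMs) {
        AtomicInteger processed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(1);
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            timer.schedule(() -> {
                processed.addAndGet(100); // stand-in for the batch flushed by the timer
                done.countDown();         // signal the waiting test thread
            }, delayMs, TimeUnit.MILLISECONDS);

            // Wait for the signal; fail loudly if the timer never fires.
            if (!done.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("timer did not fire within " + timeoutMs + " ms");
            }
            return processed.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the timer", e);
        } finally {
            timer.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(runTimerTriggeredBatch(50, 5_000));
    }
}
```

The verification then runs only after the work has signalled completion, so a
slow machine makes the test slower rather than making it fail.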


Re: [VOTE] Apache Samza 1.5.0 RC1

2020-06-10 Thread Yi Pan
+1 (binding). Ran check-all, verified sha1 and signature, ran both
standalone and YARN integration tests. LGTM.

On Mon, Jun 8, 2020 at 5:16 PM Bharath Kumara Subramanian <
codin.mart...@gmail.com> wrote:

> Hi all,
>
> This is a call for a vote on a release of Apache Samza 1.5.0. We are
> excited to see some new features and improvements in this release.
>
> The release candidate can be downloaded from here:
> http://home.apache.org/~bharathkk/samza-1.5.0-rc1/
>
> The release candidate is signed with pgp key F3B965A6B192DAB7, which is
> included in the repository's KEYS file:
> https://github.com/apache/samza/blob/master/KEYS
>
> or to directly see the public key here:
>
> https://keyserver.ubuntu.com/pks/lookup?search=Bharath+Kumarasubramanian&fingerprint=on&op=index
>
> The git tag is release-1.5.0-rc1 and signed with the same pgp key above:
>
> https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.5.0-rc1
>
> Test binaries have been published to Maven's staging repository, and are
> available here:
> https://repository.apache.org/content/repositories/orgapachesamza-1080/
>
> The vote will be open for 72 hours (end at 05:15pm Thursday, 06/11/2020).
> Please download the release candidate, check the hashes/signature, build it
> and test it, and vote:
>
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove (and reason why)
>
> I ran check-all.sh and integration tests (both YARN and standalone) passed.
>
> +1 on my end for the release.
>
> Thanks,
> Bharath
>


Re: Samza jdk11 support

2020-06-03 Thread Yi Pan
Hi, Debraj and Jordan,

Thanks a lot for the ping. I dug a bit deeper in the past email thread on
this topic and talked with our internal team. Unfortunately, LI currently
does not have a short term plan to migrate to JDK 11. However, we encourage
you to contribute to the matter if that is important to you. Here is an
earlier investigation on this matter from this mailing list that I want to
share:
http://mail-archives.apache.org/mod_mbox/samza-dev/201810.mbox/%3C1538433805053.85238%40helixeducation.com%3E
.

Best!

-Yi

On Mon, Jun 1, 2020 at 9:09 AM Jordan Messec 
wrote:

> This is a pressing issue for our team as well.
>
> Jordan
>
> > On May 31, 2020, at 7:15 PM, Debraj Manna 
> wrote:
> >
> > Thanks Yi.
> >
> > Did you get any update on the JDK 11 roadmap?
> >
> >
> > On Wed, May 27, 2020 at 1:04 AM Yi Pan  wrote:
> >
> >> Hi, Debraj,
> >>
> >> Thanks for the reminder. We did discuss JDK11 before.
> >> Unfortunately, I don't know whether JDK11 is on the roadmap as of now.
> >> Let me sync up with the team and get back to you.
> >>
> >> -Yi
> >>
> >> On Mon, May 25, 2020 at 9:53 AM Debraj Manna 
> >> wrote:
> >>
> >>> Hi
> >>>
> >>> I have seen a few earlier discussions on the email group which
> >>> suggested that samza officially does not support jdk11. But those
> >>> discussions seem to be old.
> >>>
> >>> Does the latest Samza support JDK 11?
> >>>
> >>> Thanks,
> >>>
> >>
>
>


Re: [DISCUSS] Samza 1.5 release

2020-06-01 Thread Yi Pan
lgtm. Thanks for kicking off the discussion!

-Yi

On Tue, May 26, 2020 at 8:55 PM Bharath Kumara Subramanian <
codin.mart...@gmail.com> wrote:

> Hi all,
>
> We have accumulated a few features/improvements since the last release and
> would like to make a Samza 1.5 release.
>
> I wanted to kick off the discussion on the open source forum as some of
> these changes have already been tested internally at LinkedIn. The
> features/improvements include, but are not limited to, simplifying the Job
> Runner, Container Placements, and auto-enabling transactional state.
>
> A comprehensive list of changes can be found here:
>
> https://issues.apache.org/jira/browse/SAMZA-2527?jql=project%20%3D%20%22SAMZA%22%20and%20fixVersion%20in%20(1.5)
>
> The new release branch has already been cut and the name is "1.5.0". I
> would like to target the week of June 1st for voting.
>
> Thank you,
> Bharath
>


Re: Samza jdk11 support

2020-05-26 Thread Yi Pan
Hi, Debraj,

Thanks for the reminder. We did discuss JDK11 before.
Unfortunately, I don't know whether JDK11 is on the roadmap as of now.
Let me sync up with the team and get back to you.

-Yi

On Mon, May 25, 2020 at 9:53 AM Debraj Manna 
wrote:

> Hi
>
> I have seen a few earlier discussions on the email group which
> suggested that samza officially does not support jdk11. But those
> discussions seem to be old.
>
> Does the latest Samza support JDK 11?
>
> Thanks,
>


Re: Streamtasks not instantiating consumers on startup

2020-04-10 Thread Yi Pan
Hi, Malcolm,

Samza 0.14.1 is pretty old, and since you have already upgraded Kafka to 2.2.1,
I would highly recommend migrating to the latest Samza version, which uses the
Kafka 2.0 client. Depending on how you configure your broker, a Kafka 2.0
broker can be incompatible with an older client like 0.8.6, since there have
been wire-protocol changes in Kafka since 0.11.

P.S. If you still choose to stay with Samza 0.14.1, a more detailed log and
configuration would be required to debug your issue (probably with debug
logging turned on for the Kafka consumer lib).

Best,

-Yi

On Thu, Apr 9, 2020 at 12:50 PM Malcolm McFarland 
wrote:

> Hey folks,
>
> We're occasionally seeing an issue when starting Samza containers where
> none of the streamtasks for a job will instantiate consumers. We'll see the
> beginnings of an attempt to read the checkpoint stream and nothing further
> (including no errors). All of these streamtasks seem to be creating
> producers -- there are plenty of "Registering
> TaskName-SystemStreamPartition [kafka, , ] with
> producer." messages. We're not seeing any of the usual messages about
> instantiating consumers; all we're seeing is this:
>
> 2020-04-09T19:24:14.968Z Validating offset 0 for topic and partition
> [__samza_checkpoint_ver_1_for__1,0]
> 2020-04-09T19:24:14.969Z Able to successfully read from offset 0 for topic
> and partition [__samza_checkpoint_ver_1_for__1,0]. Using it to
> instantiate consumer.
> 2020-04-09T19:24:14.974Z Reading checkpoint for taskName
> SystemStreamPartition [kafka, , ]
>
> ...and that's as far as it will go. I am also noticing that when using
> different versions of librdkafka, I *can* read the checkpoint stream with
> librdkafka 1.3.0, but *not* with librdkafka 0.8.6. Could there be a version
> incompatibility with how the data is being stored on the kafka server?
> We're running Samza 0.14.1 and using AWS MSK which is running version Kafka
> 2.2.1.
>
>
> Thanks so much,
> Malcolm McFarland
> Cavulus
>


[REPORT] Samza - April 2020

2020-04-08 Thread Yi Pan
Apache Samza is a distributed stream processing engine that is highly
configurable to process events from various data sources, including real-time
messaging systems (e.g. Kafka) and distributed file systems (e.g. HDFS).

## Issues:
- No issues require board attention

## Membership Data:
Apache Samza was founded 2015-01-22 (5 years ago) There are currently 26
committers and 17 PMC members in this project. The Committer-to-PMC ratio is
roughly 7:5.

Community changes, past quarter:
- Bharath Kumarasubramanian was added to the PMC on 2020-02-13
- No new committers. Last addition was Rayman Preet Singh on 2019-07-08.

## Project Activity:
- New version 1.3.1 was released on 2020-02-20
- New version 1.4.0 was released on 2020-03-18
- There have been 5 new Samza Enhancement Proposals (SEPs) to add new features
  in the last quarter. Out of these, 3 have been accepted, and 2 are under
  discussion.
- JIRA Activity:
  - 75 issues opened in JIRA, past quarter (12% decrease)
  - 56 issues closed in JIRA, past quarter (133% increase)
- Commit Activity:
  - 125 commits in the past quarter (47% increase)
  - 99 PRs closed on GitHub, past quarter (62% increase)

## Community Health:
- We continue to engage and support the community via the dev@samza.apache.org
  mailing list. The mailing list has had a 98% increase in traffic in the past
  quarter (189 emails compared to 95)

- We presented Samza in the following meetup talks:
  - Stateful Stream Processing with Apache Samza and RocksDB: RocksDB meetup
    2020 at Rockset
  - Defending users from Abuse using Stream Processing at LinkedIn: Stream
    Processing with Apache Kafka & Apache Samza Meetup at LinkedIn
  - Enabling Mission-critical Stateful Stream Processing with Samza: Stream
    Processing with Apache Kafka & Apache Samza Meetup at LinkedIn


Re: [VOTE] SEP-24: Cluster-based Job Coordinator Dependency Isolation

2020-03-16 Thread Yi Pan
Hey, Cameron,

Thanks for the detailed answers. It would be good to add this explanation
to the SEP page as well.

Otherwise, +1 from my side. Thanks!

-Yi

On Mon, Mar 16, 2020 at 10:06 AM Cameron Lee 
wrote:

> You have the correct understanding about the "yarn.resources.*"
> configuration, and your question is a good one. Currently, the
> implementation is that Samza will look in a specific place on the file
> system (i.e. <working directory>/__samzaFrameworkApi and
> <working directory>/__samzaFrameworkInfrastructure) to get the
> API/infrastructure classpaths. I have a TODO in the code to make the file
> system location configurable (or specified through an environment
> variable). The configuration or environment variable for the file system
> location would not be YARN-specific, and it would be applicable to any
> execution environment.
>
> On Wed, Mar 11, 2020 at 10:54 PM Yi Pan  wrote:
>
> > OK. If I understand correctly, your answer is the following:
> > yarn.resources.* configuration variables are used by YARN localizer to
> make
> > API and infrastructure classpath available, together with the
> application's
> > own classpath, which is also determined by the YARN localizer.
> > The question here is: how do we let the container JVM know the
> > API/infrastructure classpaths when launching the container processes? If
> > the API and infrastructure classpaths (i.e. installation path determined
> by
> > the localizer) are customizable, then we would need to tell the container
> > JVM those API/infra classpaths via some configuration variables as well,
> > right? Hence, those configuration variable names need to be understood by
> > the Samza application's code (which is run within the container) as well.
> > If not, what's the mechanism that we will use to let the container JVM
> > process to know where the YARN localizer has put API/infra classpaths?
> >
> > Thanks!
> >
> > -Yi
> >
> >
> >
> > On Wed, Mar 11, 2020 at 8:09 PM Cameron Lee 
> > wrote:
> >
> > > The configuration variables are only used by the YARN localizer. The
> > Samza
> > > application will look for the framework resources in certain places in
> > the
> > > application's working directory when it needs to access them. My aim is
> > to
> > > do something similar to how "yarn.package.path" works. In other
> execution
> > > environments, it is my understanding that "yarn.package.path" would get
> > > replaced by a different environment-specific configuration key/value.
> > > I agree that we should not use "yarn.resources.*" if the configurations
> > are
> > > not YARN-specific. Do you think that these resource localization
> configs
> > > are generalizable to arbitrary environments? If so, does that mean
> > > "yarn.package.path" is also generalizable? For example, what if some
> > > execution environment does not use URLs to specify resource locations
> > > (although maybe this isn't a reasonable concern to worry about?)?
> > >
> > > Thanks,
> > > Cameron
> > >
> > > On Wed, Mar 11, 2020 at 4:43 PM Yi Pan  wrote:
> > >
> > > > Hi, Cameron,
> > > >
> > > > Thanks for the quick responses! Appreciate it.
> > > >
> > > > I am still having a concern on a): are those configuration variables
> > used
> > > > by YARN localizer or by Samza applications? If those are used only by
> > the
> > > > YARN localizer, I agree that we should keep those as yarn specific.
> > > > Otherwise, I think that would still be better to name those as
> > > > cluster.based.resources.*. The reason being: Samza applications are
> > > > supposed to be able to run on different execution environments.
> > Ideally,
> > > > when we are deploying the same Samza application on YARN vs Mesos or
> > > > managed K8s clusters, we should only need to change the configuration
> > > > values, not the configuration variable names. Does it make sense?
> > > > Otherwise, we can schedule a conf call to clarify that.
> > > >
> > > > Thanks!
> > > >
> > > > -Yi
> > > >
> > > > On Tue, Mar 10, 2020 at 3:25 PM Cameron Lee  >
> > > > wrote:
> > > >
> > > > > a) The "yarn.resources.*" configs are for localizing the necessary
> > > > > resources into the working directory for the process. I felt that
> the
> > > > > specific co

Re: Got Error Produce Respons with Correlation Id.

2020-03-13 Thread Yi Pan
Cool! Sounds good to me! Happy to help!

-Yi

On Fri, Mar 13, 2020 at 1:13 PM Jeremiah Adams
 wrote:

> Yes, I explicitly commit via code for this job as an effort to ensure only
> once processing.
>
> Thanks for taking the time to look into our concerns.
>
> Jeremiah Adams
> Software Engineer
> www.helixeducation.com
> Blog | Twitter | Facebook | LinkedIn
>
> ________
> From: Yi Pan 
> Sent: Friday, March 13, 2020 1:27 PM
> To: dev@samza.apache.org
> Subject: Re: Got Error Produce Respons with Correlation Id.
>
> Hi, Jeremiah,
>
> From what you have answered, it looks to me as a transient error (probably
> timeout due to some transient network errors as you mentioned) and your job
> was able to retry/recover and make progress.
>
> Just one thing to confirm: I saw your configured task.commit.ms=-1, and
> you
> have mentioned that your checkpointed offset metrics DOES increment over
> time. Are you calling commit in your user code?
>
> Thanks!
>
> -Yi
>
> On Fri, Mar 13, 2020 at 9:46 AM Jeremiah Adams
>  wrote:
>
> >  Do you see the Samza job hanging after that?
> > The job does not hang.
> >
> >
> > Is the checkpointed offset metrics incrementing in this case?
> > We do get incremented offsets.
> >
> > Not clear on your claiming: "logs stop at that point". No logs are
> written
> > after the WARN lines?
> > My apologies for the confusion - I see no lag messages related to the
> > warning. I see all of our normal processing logs. I'm assuming this means
> > the retry worked.
> >
> >
> > What's your Samza configuration?
> >
> >
> job.coordinator.factory=org.apache.samza.standalone.PassthroughJobCoordinatorFactory
> > job.coordinator.replication.factor=1
> > job.default.system=kafka
> > systems.kafka.producer.bootstrap.servers=.confluent.cloud:9092
> >
> >
> task.name.grouper.factory=org.apache.samza.container.grouper.task.GroupByContainerIdsFactory
> > systems.kafka.producer.ssl.endpoint.identification.algorithm=https
> > systems.kafka.producer.sasl.mechanism=PLAIN
> > systems.kafka.producer.request.timeout.ms=2
> > systems.kafka.producer.retry.backoff.ms=500
> >
> systems.kafka.producer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule
> > required username="" password="";
> > systems.kafka.producer.security.protocol=SASL_SSL
> > systems.kafka.consumer.ssl.endpoint.identification.algorithm=https
> > systems.kafka.consumer.sasl.mechanism=PLAIN
> > systems.kafka.consumer.request.timeout.ms=2
> > systems.kafka.consumer.retry.backoff.ms=500
> >
> systems.kafka.consumer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule
> > required username="" password="";
> > systems.kafka.consumer.security.protocol=SASL_SSL
> > processor.id=0
> >
> > # checkpointing
> >
> >
> task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
> > task.checkpoint.system=kafka
> > task.checkpoint.replication.factor=3
> > task.commit.ms=-1
> >
> > Is the Samza container still running after you see those WARN logs?
> > Yes.
> >
> >
> > I am thinking this is a timeout issue. We've never seen the issue before.
> > The warning first appeared after  testing Confluent's Cloud kafka
> offering.
> > We had no issues when running our own kafka clusters in aws.
> >
> >
> > Jeremiah Adams
> > Software Engineer
> >
> https://url.emailprotection.link/?bM9S-3pRw1lv8pYfwa-TwdjElP4W2K6b9vP5Crz22L_YcgsRJ-13h-OgPZSwFtU7GSNTDi1z-jdaRvWESRhtTVA~~
> > Blog | Twitter | Facebook | LinkedIn
> >
> > 
> > From: Yi Pan 
> > Sent: Wednesday, March 11, 2020 5:48 PM
> > To: dev@samza.apache.org
> > Subject: Re: Got Error Produce Respons with Correlation Id.
> >
> > Hi, Jeremiah,
> >
> > Sorry to reply late. This WARN message indicates that producer failed to
> > flush to checkpoint topic and would retry. Do you see the Samza job
> hanging
> > after that? Is the checkpointed offset metrics incrementing in this case?
> > Not clear on your claiming: "logs stop at that point". No logs are
> written
> > after the WARN lines? What's your Samza configuration? Is the Samza
> > container still running after you see those WARN logs?
> >
> > Thanks!
> >
> > -Yi
> >
> > On Wed, Mar 11, 2020 at 2:39 PM Jeremiah Adams
> >  wrote:
> >
> > > Can 

Re: Got Error Produce Respons with Correlation Id.

2020-03-13 Thread Yi Pan
Hi, Jeremiah,

From what you have answered, it looks to me like a transient error (probably
a timeout due to some transient network errors, as you mentioned), and your job
was able to retry/recover and make progress.

Just one thing to confirm: I saw that you configured task.commit.ms=-1, and you
have mentioned that your checkpointed offset metrics DO increment over
time. Are you calling commit in your user code?

Thanks!

-Yi
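
To illustrate the question above: with task.commit.ms=-1, the periodic commit
is presumably not what advances the checkpoint, so the task itself would have
to request commits. The sketch below shows one way user code might do that,
committing after every N processed messages. ManualCommitSketch and
CommitAction are hypothetical stand-ins, not Samza types; in a real StreamTask
the commit action would be a call such as
coordinator.commit(TaskCoordinator.RequestScope.CURRENT_TASK).

```java
/** Hypothetical sketch: request a commit after every N processed messages. */
final class ManualCommitSketch {

    /** Stand-in for the real commit call; not a Samza interface. */
    interface CommitAction {
        void commit();
    }

    private final int messagesPerCommit;
    private final CommitAction commitAction;
    private int sinceLastCommit = 0;

    ManualCommitSketch(int messagesPerCommit, CommitAction commitAction) {
        if (messagesPerCommit <= 0) {
            throw new IllegalArgumentException("messagesPerCommit must be positive");
        }
        this.messagesPerCommit = messagesPerCommit;
        this.commitAction = commitAction;
    }

    /** Call once per processed message; triggers a commit on every Nth call. */
    void onMessageProcessed() {
        sinceLastCommit++;
        if (sinceLastCommit >= messagesPerCommit) {
            commitAction.commit();
            sinceLastCommit = 0;
        }
    }

    public static void main(String[] args) {
        int[] commits = {0};
        ManualCommitSketch policy = new ManualCommitSketch(100, () -> commits[0]++);
        for (int i = 0; i < 250; i++) {
            policy.onMessageProcessed();
        }
        System.out.println("commits requested: " + commits[0]);
    }
}
```

If the job never calls commit itself and the checkpointed offsets still
advance, that would point to some other commit path, which is what the
question above is trying to establish.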

On Fri, Mar 13, 2020 at 9:46 AM Jeremiah Adams
 wrote:

>  Do you see the Samza job hanging after that?
> The job does not hang.
>
>
> Is the checkpointed offset metrics incrementing in this case?
> We do get incremented offsets.
>
> Not clear on your claiming: "logs stop at that point". No logs are written
> after the WARN lines?
> My apologies for the confusion - I see no lag messages related to the
> warning. I see all of our normal processing logs. I'm assuming this means
> the retry worked.
>
>
> What's your Samza configuration?
>
> job.coordinator.factory=org.apache.samza.standalone.PassthroughJobCoordinatorFactory
> job.coordinator.replication.factor=1
> job.default.system=kafka
> systems.kafka.producer.bootstrap.servers=.confluent.cloud:9092
>
> task.name.grouper.factory=org.apache.samza.container.grouper.task.GroupByContainerIdsFactory
> systems.kafka.producer.ssl.endpoint.identification.algorithm=https
> systems.kafka.producer.sasl.mechanism=PLAIN
> systems.kafka.producer.request.timeout.ms=2
> systems.kafka.producer.retry.backoff.ms=500
> systems.kafka.producer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule
> required username="" password="";
> systems.kafka.producer.security.protocol=SASL_SSL
> systems.kafka.consumer.ssl.endpoint.identification.algorithm=https
> systems.kafka.consumer.sasl.mechanism=PLAIN
> systems.kafka.consumer.request.timeout.ms=2
> systems.kafka.consumer.retry.backoff.ms=500
> systems.kafka.consumer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule
> required username="" password="";
> systems.kafka.consumer.security.protocol=SASL_SSL
> processor.id=0
>
> # checkpointing
>
> task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
> task.checkpoint.system=kafka
> task.checkpoint.replication.factor=3
> task.commit.ms=-1
>
> Is the Samza container still running after you see those WARN logs?
> Yes.
>
>
> I am thinking this is a timeout issue. We've never seen the issue before.
> The warning first appeared after  testing Confluent's Cloud kafka offering.
> We had no issues when running our own kafka clusters in aws.
>
>
> Jeremiah Adams
> Software Engineer
> www.helixeducation.com
> Blog | Twitter | Facebook | LinkedIn
>
> 
> From: Yi Pan 
> Sent: Wednesday, March 11, 2020 5:48 PM
> To: dev@samza.apache.org
> Subject: Re: Got Error Produce Respons with Correlation Id.
>
> Hi, Jeremiah,
>
> Sorry to reply late. This WARN message indicates that producer failed to
> flush to checkpoint topic and would retry. Do you see the Samza job hanging
> after that? Is the checkpointed offset metrics incrementing in this case?
> Not clear on your claiming: "logs stop at that point". No logs are written
> after the WARN lines? What's your Samza configuration? Is the Samza
> container still running after you see those WARN logs?
>
> Thanks!
>
> -Yi
>
> On Wed, Mar 11, 2020 at 2:39 PM Jeremiah Adams
>  wrote:
>
> > Can anyone take a look at the message below? We are trying to gauge our
> > risk before moving forward.
> >
> >
> > Jeremiah Adams
> > Software Engineer
> >
> > www.helixeducation.com
> > Blog | Twitter | Facebook | LinkedIn
> >
> > 
> > From: Jeremiah Adams 
> > Sent: Wednesday, March 4, 2020 2:28 PM
> > To: dev@samza.apache.org
> > Subject: Got Error Produce Response iwth Correlation Id.
> >
> > Hello devs,
> >
> >
> > I've got a warning showing up in the logs while testing our new Confluent
> > Cloud config.  Can anyone tell me how concerned I should be about this
> > warning? Is there a setting to control timeouts?
> >
> >
> > Also, logs stop at that point, so I can't tell if the "metatdata update"
> > was complete.
> >
> >
> >
> > 2020-03-04 21:17:51 Sender [WARN] [Producer
> > clientId=kafka_producer-application_submission-1] Got error produce
> > response with correlation id 144 on topic-partition

Re: [VOTE] SEP-24: Cluster-based Job Coordinator Dependency Isolation

2020-03-11 Thread Yi Pan
OK. If I understand correctly, your answer is the following:
yarn.resources.* configuration variables are used by YARN localizer to make
API and infrastructure classpath available, together with the application's
own classpath, which is also determined by the YARN localizer.
The question here is: how do we let the container JVM know the
API/infrastructure classpaths when launching the container processes? If
the API and infrastructure classpaths (i.e. installation path determined by
the localizer) are customizable, then we would need to tell the container
JVM those API/infra classpaths via some configuration variables as well,
right? Hence, those configuration variable names need to be understood by
the Samza application's code (which is run within the container) as well.
If not, what mechanism will we use to let the container JVM process know
where the YARN localizer has put the API/infra classpaths?

Thanks!

-Yi



On Wed, Mar 11, 2020 at 8:09 PM Cameron Lee  wrote:

> The configuration variables are only used by the YARN localizer. The Samza
> application will look for the framework resources in certain places in the
> application's working directory when it needs to access them. My aim is to
> do something similar to how "yarn.package.path" works. In other execution
> environments, it is my understanding that "yarn.package.path" would get
> replaced by a different environment-specific configuration key/value.
> I agree that we should not use "yarn.resources.*" if the configurations are
> not YARN-specific. Do you think that these resource localization configs
> are generalizable to arbitrary environments? If so, does that mean
> "yarn.package.path" is also generalizable? For example, what if some
> execution environment does not use URLs to specify resource locations
> (although maybe this isn't a reasonable concern to worry about?)?
>
> Thanks,
> Cameron
>
> On Wed, Mar 11, 2020 at 4:43 PM Yi Pan  wrote:
>
> > Hi, Cameron,
> >
> > Thanks for the quick responses! Appreciate it.
> >
> > I am still having a concern on a): are those configuration variables used
> > by YARN localizer or by Samza applications? If those are used only by the
> > YARN localizer, I agree that we should keep those as yarn specific.
> > Otherwise, I think that would still be better to name those as
> > cluster.based.resources.*. The reason being: Samza applications are
> > supposed to be able to run on different execution environments. Ideally,
> > when we are deploying the same Samza application on YARN vs Mesos or
> > managed K8s clusters, we should only need to change the configure values,
> > not the configuration variable names and values. Does it make sense?
> > Otherwise, we can schedule a conf call to clarify that.
> >
> > Thanks!
> >
> > -Yi
> >
> > On Tue, Mar 10, 2020 at 3:25 PM Cameron Lee 
> > wrote:
> >
> > > a) The "yarn.resources.*" configs are for localizing the necessary
> > > resources into the working directory for the process. I felt that the
> > > specific configuration format to specify these resources might be
> > > YARN-specific (e.g. YARN has type and visibility configs for each of
> its
> > > resources), so a generic format might not apply. In a non-YARN case,
> the
> > > localization configs would need to be specified according to the
> > technology
> > > being used.
> > > b) It is correct that the Avro version will need to be compatible with
> > the
> > > version that is used by the infrastructure, if infrastructure needs to
> > use
> > > Avro and pass the Avro object to the application. This is the case with
> > any
> > > serde technology that needs to be used. For the job coordinator, it is
> > not
> > > much of a concern anyways, since it is not doing serde of Avro
> messages.
> > > This may be more of a concern for general split deployment, which will
> > > impact the processing containers, and will be a separate SEP.
> > > c) It should work to leave infrastructure serdes in the infrastructure
> > > classpath. The infrastructure serdes just see generic types (which are
> > > java.lang.Object at runtime) for the messages, and they don't do
> anything
> > > with the concrete types, so in the infrastructure classes, the messages
> > get
> > > passed around as Object, but their concrete classes can still be loaded
> > > from the application. As with (b), this is more of a concern for
> general
> > > split deployment, since the job coordinator doesn't do message serde. I
> > > have run some tests regarding this cl

Re: Got Error Produce Response with Correlation Id.

2020-03-11 Thread Yi Pan
Hi, Jeremiah,

Sorry to reply late. This WARN message indicates that the producer failed
to flush to the checkpoint topic and will retry. Do you see the Samza job
hanging after that? Are the checkpointed offset metrics incrementing in this
case? I am not clear on your statement that "logs stop at that point": are no
logs written after the WARN lines? What's your Samza configuration? Is the
Samza container still running after you see those WARN logs?

Thanks!

-Yi

On Wed, Mar 11, 2020 at 2:39 PM Jeremiah Adams
 wrote:

> Can anyone take a look at the message below? We are trying to gauge our
> risk before moving forward.
>
>
> Jeremiah Adams
> Software Engineer
> www.helixeducation.com
> Blog | Twitter | Facebook | LinkedIn
>
> 
> From: Jeremiah Adams 
> Sent: Wednesday, March 4, 2020 2:28 PM
> To: dev@samza.apache.org
> Subject: Got Error Produce Response iwth Correlation Id.
>
> Hello devs,
>
>
> I've got a warning showing up in the logs while testing our new Confluent
> Cloud config.  Can anyone tell me how concerned I should be about this
> warning? Is there a setting to control timeouts?
>
>
> Also, logs stop at that point, so I can't tell if the "metatdata update"
> was complete.
>
>
>
> 2020-03-04 21:17:51 Sender [WARN] [Producer
> clientId=kafka_producer-application_submission-1] Got error produce
> response with correlation id 144 on topic-partition
> __samza_checkpoint_ver_1_for_application-submission_1-0, retrying
> (2147483646 attempts left). Error: NETWORK_EXCEPTION
> 2020-03-04 21:17:51 Sender [WARN] [Producer
> clientId=kafka_producer-application_submission-1] Received invalid metadata
> error in produce request on partition
> __samza_checkpoint_ver_1_for_application-submission_1-0 due to
> org.apache.kafka.common.errors.NetworkException: The server disconnected
> before a response was received.. Going to request metadata update now
>
>
> Jeremiah Adams
> Software Engineer
>
> www.helixeducation.com
> Blog | Twitter | Facebook | LinkedIn
>


Re: [VOTE] SEP-24: Cluster-based Job Coordinator Dependency Isolation

2020-03-11 Thread Yi Pan
Hi, Cameron,

Thanks for the quick responses! Appreciate it.

I still have a concern about a): are those configuration variables used by
the YARN localizer or by Samza applications? If they are used only by the
YARN localizer, I agree that we should keep them YARN-specific. Otherwise, I
think it would still be better to name them cluster.based.resources.*. The
reason is that Samza applications are supposed to be able to run in different
execution environments. Ideally, when we deploy the same Samza application on
YARN vs. Mesos or managed K8s clusters, we should only need to change the
configuration values, not the configuration variable names. Does that make
sense? Otherwise, we can schedule a conf call to clarify.

Thanks!

-Yi

On Tue, Mar 10, 2020 at 3:25 PM Cameron Lee  wrote:

> a) The "yarn.resources.*" configs are for localizing the necessary
> resources into the working directory for the process. I felt that the
> specific configuration format to specify these resources might be
> YARN-specific (e.g. YARN has type and visibility configs for each of its
> resources), so a generic format might not apply. In a non-YARN case, the
> localization configs would need to be specified according to the technology
> being used.
> b) It is correct that the Avro version will need to be compatible with the
> version that is used by the infrastructure, if infrastructure needs to use
> Avro and pass the Avro object to the application. This is the case with any
> serde technology that needs to be used. For the job coordinator, it is not
> much of a concern anyways, since it is not doing serde of Avro messages.
> This may be more of a concern for general split deployment, which will
> impact the processing containers, and will be a separate SEP.
> c) It should work to leave infrastructure serdes in the infrastructure
> classpath. The infrastructure serdes just see generic types (which are
> java.lang.Object at runtime) for the messages, and they don't do anything
> with the concrete types, so in the infrastructure classes, the messages get
> passed around as Object, but their concrete classes can still be loaded
> from the application. As with (b), this is more of a concern for general
> split deployment, since the job coordinator doesn't do message serde. I
> have run some tests regarding this classloading pattern, but we will do
> further verification for general split deployment.
> d) Yes, you are correct. Good catch. It should be "described above at
> Application classloader".
>
> Thanks for all of your questions. I will clarify some details in the doc
> regarding your questions.
>
> Cameron
>
> On Mon, Mar 9, 2020 at 12:07 PM Yi Pan  wrote:
>
> > Hi, Cameron,
> >
> > Sorry to chime in late. Overall, looks great! I do have a few
> > suggestions/questions before I can cast my vote here:
> > a) for the configuration variable names, why are we limiting ourselves to
> > yarn.resource.*? We have changed some of the configuration variables from
> > yarn specific to non-yarn specific. I would love to keep that consistent
> > (i.e. gradually moving all our yarn-specific configuration variables to
> > non-yarn-specifc names)
> > b) for the avro case as referred to in the delegation case in the
> > Infrastructure classloader, if we delegate the object deserialization
> class
> > to the application classloader, would it be possible that the application
> > provides an non-compatible version of avro class than the ones used
> within
> > the "infrastructure plugins" and hence causing runtime exception in the
> > infrastructure plugin? Or is the solution being: do not directly use
> serde
> > classes in the infrastructure code?
> > c) following the description of infrastructure classloader flow, where
> > should we expect the serde classes? In the application classpath, I
> guess?
> > So, does that mean that we should exclude serde classes (including
> > SerializableSerde and JsonSerdeV2) in the Samza infrastructure package,
> and
> > tell the users to package them in application package?
> > d) I am a bit confused about the description on "multiple" application
> > classloaders on the job coordinator: one is for the describe flow and the
> > other is in the "Application" classloader, instead of "API" classloader,
> > right?
> >
> > Best,
> >
> > -Yi
> >
> >
> > On Wed, Mar 4, 2020 at 11:32 AM Ke Wu  wrote:
> >
> > > +1.
> > >
> > > Thanks for driving this effort.
> > >
> > > Best,
> > > Ke
> > >
> > > > On Mar 3, 2020, at 6:28 PM, Jagadish Venkatraman <

Re: [VOTE] SEP-24: Cluster-based Job Coordinator Dependency Isolation

2020-03-09 Thread Yi Pan
Hi, Cameron,

Sorry to chime in late. Overall, looks great! I do have a few
suggestions/questions before I can cast my vote here:
a) for the configuration variable names, why are we limiting ourselves to
yarn.resource.*? We have changed some of the configuration variables from
YARN-specific to non-YARN-specific names. I would love to keep that consistent
(i.e. gradually move all our YARN-specific configuration variables to
non-YARN-specific names).
b) for the Avro case referred to in the delegation case in the
Infrastructure classloader: if we delegate the object deserialization class
to the application classloader, would it be possible for the application to
provide a version of the Avro classes that is incompatible with the one used
within the "infrastructure plugins", and hence cause a runtime exception in
the infrastructure plugin? Or is the solution to not use serde classes
directly in the infrastructure code?
c) following the description of infrastructure classloader flow, where
should we expect the serde classes? In the application classpath, I guess?
So, does that mean that we should exclude serde classes (including
SerializableSerde and JsonSerdeV2) in the Samza infrastructure package, and
tell the users to package them in application package?
d) I am a bit confused about the description on "multiple" application
classloaders on the job coordinator: one is for the describe flow and the
other is in the "Application" classloader, instead of "API" classloader,
right?

Best,

-Yi
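
A minimal sketch of the delegation pattern that questions b) and c) refer to,
assuming an infrastructure classloader that hands selected class-name prefixes
to the application classloader and resolves everything else through its own
parent. DelegatingClassLoader, RecordingLoader, and the prefix list are
hypothetical and are not taken from the SEP; a real setup might pass a prefix
such as org.apache.avro. instead of the java.util. prefix used here so that
the demo runs without extra jars.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ClassLoaderDelegationDemo {
    public static void main(String[] args) throws Exception {
        ClassLoader system = ClassLoader.getSystemClassLoader();
        RecordingLoader application = new RecordingLoader(system);
        DelegatingClassLoader infrastructure =
            new DelegatingClassLoader(system, application, Arrays.asList("java.util."));
        infrastructure.loadClass("java.util.ArrayList"); // routed to the application loader
        infrastructure.loadClass("java.lang.String");    // resolved by the parent
        System.out.println("delegated to application: " + application.requested);
    }
}

/** Hypothetical infrastructure loader that delegates chosen prefixes to the application loader. */
final class DelegatingClassLoader extends ClassLoader {
    private final ClassLoader applicationLoader;
    private final List<String> applicationPrefixes;

    DelegatingClassLoader(ClassLoader infrastructureParent,
                          ClassLoader applicationLoader,
                          List<String> applicationPrefixes) {
        super(infrastructureParent);
        this.applicationLoader = applicationLoader;
        this.applicationPrefixes = new ArrayList<>(applicationPrefixes);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        for (String prefix : applicationPrefixes) {
            if (name.startsWith(prefix)) {
                // The application decides which version of these classes is used.
                return applicationLoader.loadClass(name);
            }
        }
        // Everything else follows the normal parent-first delegation.
        return super.loadClass(name, resolve);
    }
}

/** Demo helper: records which class names were requested from it. */
final class RecordingLoader extends ClassLoader {
    final List<String> requested = new ArrayList<>();

    RecordingLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    public Class<?> loadClass(String name) throws ClassNotFoundException {
        requested.add(name);
        return super.loadClass(name);
    }
}
```

The version concern in b) shows up directly in this shape: whatever the
application loader returns for a delegated prefix is what the infrastructure
code ends up linking against.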


On Wed, Mar 4, 2020 at 11:32 AM Ke Wu  wrote:

> +1.
>
> Thanks for driving this effort.
>
> Best,
> Ke
>
> > On Mar 3, 2020, at 6:28 PM, Jagadish Venkatraman 
> wrote:
> >
> > +1 binding.
> >
> > Thanks Cameron. I look forward to this feature taking our "Stream
> > Processing as a service" offering to the next level.
> >
> > Cheers
> >
> > On Tuesday, March 3, 2020, Prateek Maheshwari 
> wrote:
> >
> >> +1 (binding) from me. Thanks for contributing this feature. Looking
> forward
> >> to having dependency isolation and to the ability to upgrade the
> framework
> >> independently from an application.
> >>
> >> Thanks,
> >> Prateek
> >>
> >> On Fri, Feb 28, 2020 at 10:48 AM Cameron Lee 
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>> This is a call for a vote on SEP-24: Cluster-based Job Coordinator
> >>> Dependency Isolation. Thanks to everyone who reviewed the proposal and
> >>> provided feedback.
> >>>
> >>> I have addressed comments on the SEP, and I am not aware of any further
> >>> major questions or objections, so I am starting this vote.
> >>>
> >>> SEP link:
> >>>
> >>> https://cwiki.apache.org/confluence/display/SAMZA/SEP-
> >> 24%3A+Cluster-based+Job+Coordinator+Dependency+Isolation
> >>>
> >>> Discuss thread:
> >>>
> >>> https://mail-archives.apache.org/mod_mbox/samza-dev/202001.mbox/%
> >> 3cCAMja7KeGcRZ3H95Rxk5XE=60zxm6jxjkjuwwxmgmadpfbyk...@mail.gmail.com%3e
> >>> There was also some discussion through comments on the SEP page (see
> >>> Resolved Comments).
> >>>
> >>> Please vote:
> >>> [ ] +1 approve
> >>> [ ] +0 no opinion
> >>> [ ] -1 disapprove (and reason why)
> >>>
> >>> Thank you,
> >>> Cameron
> >>>
> >>
> >
> >
> > --
> > Jagadish
>
>


Re: [VOTE] Apache Samza 1.4.0 RC1

2020-03-06 Thread Yi Pan
I have downloaded the files, built with check-all.sh, and run both the YARN
and standalone integration tests. All passed.

+1 (binding).

Thanks!

-Yi
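
The call for votes quoted below asks reviewers to check the hashes of the
release candidate. A minimal sketch of that step, assuming the expected digest
is available as a hex string and that the algorithm is SHA-512 (the thread
does not say which algorithm is published; pass a different one if needed).
ReleaseHashCheck is a hypothetical helper, not part of the Samza release
tooling, and it does not cover the separate PGP signature check.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: compare a downloaded artifact against a published hex digest. */
final class ReleaseHashCheck {

    /** Streams the input through the given digest algorithm and returns lowercase hex. */
    static String hexDigest(InputStream in, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Returns true when the artifact's digest equals the expected hex string. */
    static boolean matches(Path artifact, String expectedHex, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        try (InputStream in = Files.newInputStream(artifact)) {
            return hexDigest(in, algorithm).equalsIgnoreCase(expectedHex.trim());
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: ReleaseHashCheck <artifact> <expected-hex> [algorithm]");
            return;
        }
        String algorithm = args.length > 2 ? args[2] : "SHA-512";
        boolean ok = matches(Paths.get(args[0]), args[1], algorithm);
        System.out.println(ok ? "hash matches" : "HASH MISMATCH");
    }
}
```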

On Tue, Mar 3, 2020 at 3:03 PM Cameron Lee  wrote:

> Hi all,
>
> This is a call for a vote on a release of Apache Samza 1.4.0. Thanks to
> everyone who has contributed to this release.
>
> The release candidate can be downloaded from here:
> https://home.apache.org/~cameronlee/samza-1.4.0-rc1/
>
> The release candidate is signed with pgp key 0x54CB3CE3, which can be found
> here:
>
> https://keyserver.ubuntu.com/pks/lookup?search=0x54CB3CE3=on=index
> or to directly see the public key here:
>
> https://keyserver.ubuntu.com/pks/lookup?op=get=0x71b0145290ecdbfa5caea6dbd786a7ba54cb3ce3
>
> The git tag is release-1.4.0-rc1, signed by the same pgp key above:
>
> https://gitbox.apache.org/repos/asf?p=samza.git;a=commit;h=5327fafb8502b126482ec0c4efc8d1aa9b96ba44
>
> Test binaries have been published to Maven's staging repository, and are
> available here:
> https://repository.apache.org/content/repositories/orgapachesamza-1077
>
> The vote will be open for 72 hours (until Friday, March 6, 2020 at 3pm
> PST).
>
> Please download the release candidate, check the hashes/signature, build it
> and test it, and then please vote:
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove (and reason why)
>
> I ran check-all.sh and integration tests.
>
> +1 (non-binding) from my side.
>
> Thank you,
> Cameron
>


Re: [DISCUSS] 1.4 release

2020-02-28 Thread Yi Pan
Hi, Cameron,

Thanks for the clarification. Sounds good to me.

-Yi

On Fri, Feb 28, 2020 at 10:25 AM Cameron Lee 
wrote:

> Hi Yi,
> Those tickets are actually all committed already. They just aren't marked
> as closed. I have reached out to owners to close them.
> I was planning on sending out the VOTE thread today, and then release by
> the end of next week.
> Cameron
>
> On Thu, Feb 27, 2020 at 9:08 PM Yi Pan  wrote:
>
> > Hey, Cameron,
> >
> > Briefly browsed through the list and there are total 24 tickets tagged
> with
> > 1.4 and 11 are assigned/in-progress/done. Are we targeting to finish all
> 24
> > for 1.4? And what's the proposed timeline for 1.4?
> >
> > Thanks!
> >
> > -Yi
> >
> > On Wed, Feb 26, 2020 at 1:25 PM Xinyu Liu  wrote:
> >
> > > This is great! Thanks for driving the new release.
> > >
> > > Thanks,
> > > Xinyu
> > >
> > > On Tue, Feb 25, 2020 at 2:45 PM Cameron Lee 
> > > wrote:
> > >
> > > > Hi all,
> > > > We have made some updates to Samza, and we would like to make a Samza
> > 1.4
> > > > release.
> > > > We have been testing some of these changes internally at Linkedin,
> and
> > we
> > > > would like to send this thread for discussion for releasing to open
> > > source.
> > > > Highlights of the release include improvements to local state
> > management,
> > > > improvements to the Samza SQL API, and a new system producer to write
> > to
> > > > Azure blob storage, along with some other miscellaneous bug fixes and
> > > > clean-up.
> > > > A comprehensive list of changes can be found at
> > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20and%20fixVersion%20in%20(1.4)
> > > > .
> > > > Some of the tickets are still open, but the corresponding commits
> have
> > > > already been pushed. If you do have an open ticket that is actually
> > > > complete, please close it.
> > > > The new release branch has already been cut. The name of the branch
> is
> > > > "1.4.0".
> > > > I would like to target a release VOTE email thread to start on
> February
> > > > 28th.
> > > > Thank you,
> > > > Cameron
> > > >
> > >
> >
>


Re: Samza - New Relic integration

2020-02-28 Thread Yi Pan
Hi, Vaibhav,

Not quite sure whether I understand your ask. How do you package and
distribute the New Relic jar in the YARN cluster? It would be very helpful if
you could share the job configuration that worked and the places that you
would like to change.

Best,

-Yi

On Fri, Feb 28, 2020 at 11:55 AM Vaibhav Garg 
wrote:

> OK. I have been able to use NewRelic by specifying -javaagent option in
> task.opts.
>
> However, I had to give the absolute path to the NewRelic jar file in order
> to make it work.
>
> How can I give a path that is relative to the container since I would like
> to integrate NewRelic dependencies along with other container files?
>
> Thanks,
> Vaibhav Garg
> +91-9505020924
> vaibhavgar...@gmail.com
> LinkedIn <https://www.linkedin.com/in/vaibhavgarg90/>
>
>
> On Sat, Feb 29, 2020 at 12:41 AM Vaibhav Garg 
> wrote:
>
> > Yes. That's exactly what I am looking for.
> >
> > Let me try it out and I will get back to you if I face any issues.
> >
> > Thanks again
> >
> > Vaibhav Garg
> > +91-9505020924
> > vaibhavgar...@gmail.com
> > LinkedIn <https://www.linkedin.com/in/vaibhavgarg90/>
> >
> >
> > On Fri, Feb 28, 2020 at 1:23 PM Yi Pan  wrote:
> >
> >> Hi, Vaibhav,
> >>
> >> Check the description of task.opts in the configuration doc here:
> >>
> >>
> http://samza.apache.org/learn/documentation/latest/jobs/samza-configurations.html
> >>
> >> Is this what you are looking for?
> >>
> >> On Thu, Feb 27, 2020 at 9:54 PM Vaibhav Garg 
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > Any help here would be great.
> >> >
> >> > Thanks,
> >> > Vaibhav Garg
> >> > +91-9505020924
> >> > vaibhavgar...@gmail.com
> >> > LinkedIn <https://www.linkedin.com/in/vaibhavgarg90/>
> >> >
> >> >
> >> > On Thu, Feb 27, 2020 at 10:20 AM Vaibhav Garg <
> vaibhavgar...@gmail.com>
> >> > wrote:
> >> >
> >> > > Dear Bharath,
> >> > >
> >> > > Thanks for your reply. I now realize that I put a very ambiguous
> >> question
> >> > > to the community. Here is another attempt:
> >> > >
> >> > > I have set up a Yarn cluster that is configured to run containers of
> >> > > multiple Samza jobs.
> >> > >
> >> > > I would like to treat all the containers of the job as a single
> >> > > application in New Relic (as expected).
> >> > >
> >> > > Since the containers of the application can be killed on one Yarn
> node
> >> > and
> >> > > can start on another Yarn node, I would have to specify New Relic
> >> > > environment settings at the container start.
> >> > >
> >> > > Now, I am not sure how to specify java agent (to include New Relic
> >> java
> >> > > agent) and pass additional arguments such as New Relic environment
> >> every
> >> > > time the container starts.
> >> > >
> >> > > Please let me know if my understanding is wrong or some other
> >> > > clarification is needed.
> >> > >
> >> > > Thanks in advance,
> >> > > Vaibhav Garg
> >> > > +91-9505020924
> >> > > vaibhavgar...@gmail.com
> >> > > LinkedIn <https://www.linkedin.com/in/vaibhavgarg90/>
> >> > >
> >> > >
> >> > > On Thu, Feb 27, 2020 at 8:25 AM Bharath Kumara Subramanian <
> >> > > codin.mart...@gmail.com> wrote:
> >> > >
> >> > >> Hi,
> >> > >>
> >> > >> I am not sure I fully understand the ask.
> >> > >>
> >> > >> IIUC, Samza doesn't have native integration with New relic.
> However,
> >> you
> >> > >> should still be able to integrate your application with New relic
> on
> >> > your
> >> > >> end without native support.
> >> > >> If you are particularly looking to integrate native Samza metrics
> w/
> >> New
> >> > >> relic, you might need to implement your own custom metrics
> reporter.
> >> You
> >> > >> can find more details here
> >> > >> <
> >> > >>
> >> >
> >>
> https://samza.apache.org/learn/documentation/latest/operations/monitoring.html#customreporter
> >> > >> >
> >> > >>
> >> > >> Thanks,
> >> > >> Bharath
> >> > >>
> >> > >> On Wed, Feb 26, 2020 at 3:15 AM Vaibhav Garg <
> >> vaibhavgar...@gmail.com>
> >> > >> wrote:
> >> > >>
> >> > >> > Hi,
> >> > >> >
> >> > >> > I want to integrate New Relic in my Samza jobs. Can anyone help
> >> with
> >> > >> this,
> >> > >> > please?
> >> > >> >
> >> > >> > Thanks,
> >> > >> > Vaibhav Garg
> >> > >> > +91-9505020924
> >> > >> > vaibhavgar...@gmail.com
> >> > >> > LinkedIn <https://www.linkedin.com/in/vaibhavgarg90/>
> >> > >> >
> >> > >>
> >> > >
> >> >
> >>
> >
>


Re: Samza - New Relic integration

2020-02-27 Thread Yi Pan
Hi, Vaibhav,

Check the description of task.opts in the configuration doc here:
http://samza.apache.org/learn/documentation/latest/jobs/samza-configurations.html

Is this what you are looking for?
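
As an illustration of the task.opts approach, the sketch below assembles a
task.opts value that attaches a Java agent and passes New Relic settings as
JVM system properties, so that every container of the job starts with the same
options regardless of which node it lands on. The jar location
(./__package/lib/newrelic.jar, relative to the container working directory),
the environment name, and the application name are assumptions for
illustration; the newrelic.environment and newrelic.config.app_name properties
should be checked against the New Relic Java agent documentation.

```java
/** Hypothetical helper that builds the value for the task.opts configuration. */
final class TaskOptsSketch {

    /**
     * Joins the agent flag and the New Relic system properties with single spaces.
     * Values containing spaces are not quoted here and would need extra handling.
     */
    static String buildTaskOpts(String agentJarPath, String environment, String appName) {
        return String.join(" ",
            "-javaagent:" + agentJarPath,
            "-Dnewrelic.environment=" + environment,
            "-Dnewrelic.config.app_name=" + appName);
    }

    public static void main(String[] args) {
        // Assumed values; adjust to where the agent jar is actually packaged.
        String opts = buildTaskOpts("./__package/lib/newrelic.jar", "staging", "my-samza-job");
        System.out.println("task.opts=" + opts);
    }
}
```

Using the same application name for every container is what would make them
show up as a single application on the New Relic side.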

On Thu, Feb 27, 2020 at 9:54 PM Vaibhav Garg 
wrote:

> Hi,
>
> Any help here would be great.
>
> Thanks,
> Vaibhav Garg
> +91-9505020924
> vaibhavgar...@gmail.com
> LinkedIn 
>
>
> On Thu, Feb 27, 2020 at 10:20 AM Vaibhav Garg 
> wrote:
>
> > Dear Bharath,
> >
> > Thanks for your reply. I now realize that I put a very ambiguous question
> > to the community. Here is another attempt:
> >
> > I have set up a Yarn cluster that is configured to run containers of
> > multiple Samza jobs.
> >
> > I would like to treat all the containers of the job as a single
> > application in New Relic (as expected).
> >
> > Since the containers of the application can be killed on one Yarn node
> and
> > can start on another Yarn node, I would have to specify New Relic
> > environment settings at the container start.
> >
> > Now, I am not sure how to specify java agent (to include New Relic java
> > agent) and pass additional arguments such as New Relic environment every
> > time the container starts.
> >
> > Please let me know if my understanding is wrong or some other
> > clarification is needed.
> >
> > Thanks in advance,
> > Vaibhav Garg
> > +91-9505020924
> > vaibhavgar...@gmail.com
> > LinkedIn 
> >
> >
> > On Thu, Feb 27, 2020 at 8:25 AM Bharath Kumara Subramanian <
> > codin.mart...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I am not sure I fully understand the ask.
> >>
> >> IIUC, Samza doesn't have native integration with New relic. However, you
> >> should still be able to integrate your application with New relic on
> your
> >> end without native support.
> >> If you are particularly looking to integrate native Samza metrics w/ New
> >> relic, you might need to implement your own custom metrics reporter. You
> >> can find more details here
> >> <
> >>
> https://samza.apache.org/learn/documentation/latest/operations/monitoring.html#customreporter
> >> >
> >>
> >> Thanks,
> >> Bharath
> >>
> >> On Wed, Feb 26, 2020 at 3:15 AM Vaibhav Garg 
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > I want to integrate New Relic in my Samza jobs. Can anyone help with
> >> this,
> >> > please?
> >> >
> >> > Thanks,
> >> > Vaibhav Garg
> >> > +91-9505020924
> >> > vaibhavgar...@gmail.com
> >> > LinkedIn 
> >> >
> >>
> >
>


Re: [DISCUSS] 1.4 release

2020-02-27 Thread Yi Pan
Hey, Cameron,

I briefly browsed through the list: there are 24 tickets in total tagged with
1.4, and 11 of them are assigned/in-progress/done. Are we aiming to finish all
24 for 1.4? And what's the proposed timeline for 1.4?

Thanks!

-Yi

On Wed, Feb 26, 2020 at 1:25 PM Xinyu Liu  wrote:

> This is great! Thanks for driving the new release.
>
> Thanks,
> Xinyu
>
> On Tue, Feb 25, 2020 at 2:45 PM Cameron Lee 
> wrote:
>
> > Hi all,
> > We have made some updates to Samza, and we would like to make a Samza 1.4
> > release.
> > We have been testing some of these changes internally at Linkedin, and we
> > would like to send this thread for discussion for releasing to open
> source.
> > Highlights of the release include improvements to local state management,
> > improvements to the Samza SQL API, and a new system producer to write to
> > Azure blob storage, along with some other miscellaneous bug fixes and
> > clean-up.
> > A comprehensive list of changes can be found at
> >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20and%20fixVersion%20in%20(1.4)
> > .
> > Some of the tickets are still open, but the corresponding commits have
> > already been pushed. If you do have an open ticket that is actually
> > complete, please close it.
> > The new release branch has already been cut. The name of the branch is
> > "1.4.0".
> > I would like to target a release VOTE email thread to start on February
> > 28th.
> > Thank you,
> > Cameron
> >
>


Re: [VOTE] Apache Samza 1.3.1 RC0

2020-02-18 Thread Yi Pan
Ran check-all and integration tests successfully.

+1 (binding)

On Thu, Feb 13, 2020 at 12:02 PM Hai Lu  wrote:

> Hi,
>
> This is a call for a vote on a release of Apache Samza 1.3.1 to redress
> certain issues found in 1.3.0
>
> The release candidate can be downloaded from here:
> http://home.apache.org/~lhaiesp/samza-1.3.1-rc0/
>
> The release candidate is signed with pgp key 0x256F8FA2, which can be found
> here:
>
> https://keyserver.ubuntu.com/pks/lookup?search=0x256F8FA2&fingerprint=on&op=index
> or to directly see the public key here:
>
> https://keyserver.ubuntu.com/pks/lookup?op=get&search=0x9ebc0889d43fae16dd0d8f5ba2f50cf4256f8fa2
>
> The git tag is release-1.3.1-rc0 and signed with the same pgp key above:
>
> https://gitbox.apache.org/repos/asf?p=samza.git;a=commit;h=7b849c047827587dec55ac169f41aac7321ce1cb
>
> Test binaries have been published to Maven's staging repository, and are
> available here:
> https://repository.apache.org/content/repositories/orgapachesamza-1074
>
> The vote will be open for 128 hours (ending at 8:00 PM PST Tuesday,
> 2/18/2020).
>
> Please download the release candidate, check the hashes/signature, build it
> and test it, and then please vote:
>
> [ ] +1 approve
>
> [ ] +0 no opinion
>
> [ ] -1 disapprove (and reason why)
>
> I ran check-all.sh and integration tests (both YARN and standalone).
>
> +1 (non-binding) from my side.
>
> Thanks,
> Hai
>
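
The "check the hashes" step in the vote instructions above can be scripted. A minimal sketch in Python, assuming the release artifact ships with a published SHA-512 digest; the function names and the sample payload are hypothetical and not part of the Samza release tooling. Verifying the pgp signature is a separate step that this sketch does not cover.

```python
import hashlib
import os
import tempfile


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published_hex):
    """Compare the file's digest with a published hex string.

    Whitespace and letter case in the published value are ignored, since
    digest files are not always formatted the same way.
    """
    normalized = "".join(published_hex.split()).lower()
    return sha512_of(path) == normalized


def demo():
    """Self-check on a throwaway file; a real run would pass the downloaded artifact."""
    payload = b"stand-in for a release tarball"  # hypothetical content
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        handle.write(payload)
        path = handle.name
    try:
        expected = hashlib.sha512(payload).hexdigest()
        return matches_published_digest(path, expected.upper())
    finally:
        os.remove(path)
```

Reading in chunks keeps memory use flat for large tarballs; the comparison is deliberately forgiving about formatting but not about content.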


Re: job.coordinator.replication.factor configuration

2020-02-15 Thread Yi Pan
JIRA opened: SAMZA-2460.

Thanks!

-Yi

On Sat, Feb 15, 2020 at 9:42 AM Yi Pan  wrote:

> Hi, Robert,
>
> Thanks for pointing this out. This is an error on the documentation
> side. The meaning of job.coordinator.replication.factor should not have
> changed after 1.0. I will file a bug against the doc site.
>
> Thanks!
>
> -Yi
>
> On Fri, Feb 14, 2020 at 7:52 AM Robert Wigginton
>  wrote:
>
>> Hello,
>>
>>
>>
>> Could you verify the description of the
>> job.coordinator.replication.factor property?  Documentation is different
>> for releases prior to 1.0.0 and I would like to confirm if the property
>> truly did change so we can set that property correctly.
>>
>>
>>
>> <= 1.0.0 Description
>>
>>
>>
>> job.coordinator.replication.factor
>>
>> If you are using Kafka for coordinator stream, this is the number of
>> Kafka nodes to which you want the coordinator topic replicated for
>> durability.
>>
>> > 1.0.0 Description
>>
>>
>>
>> job.coordinator.replication.factor
>>
>>
>>
>> The frequency at which the input streams’ partition count change should
>> be detected. When the input partition count change is detected, Samza will
>> automatically restart a stateless job or fail a stateful job. A longer time
>> interval is recommended for jobs w/ large number of input system stream
>> partitions, since gathering partition count may incur measurable overhead
>> to the job. You can completely disable partition count monitoring by
>> setting this value to 0 or a negative integer, which will also disable
>> auto-restart/failing behavior of a Samza job on partition count changes.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *Robert Wigginton*
>>
>> Sr. Systems Engineer
>>
>> Denver, Colorado
>>
>> C: 303-304-4914
>>
>> www.helixeducation.com
>>
>>
>>
>>
>


Re: job.coordinator.replication.factor configuration

2020-02-15 Thread Yi Pan
Hi, Robert,

Thanks for pointing this out. This is an error on the documentation
side. The meaning of job.coordinator.replication.factor should not have
changed after 1.0. I will file a bug against the doc site.

Thanks!

-Yi

On Fri, Feb 14, 2020 at 7:52 AM Robert Wigginton
 wrote:

> Hello,
>
>
>
> Could you verify the description of the job.coordinator.replication.factor
> property?  Documentation is different for releases prior to 1.0.0 and I
> would like to confirm if the property truly did change so we can set that
> property correctly.
>
>
>
> <= 1.0.0 Description
>
>
>
> job.coordinator.replication.factor
>
> If you are using Kafka for coordinator stream, this is the number of Kafka
> nodes to which you want the coordinator topic replicated for durability.
>
> > 1.0.0 Description
>
>
>
> job.coordinator.replication.factor
>
>
>
> The frequency at which the input streams’ partition count change should be
> detected. When the input partition count change is detected, Samza will
> automatically restart a stateless job or fail a stateful job. A longer time
> interval is recommended for jobs w/ large number of input system stream
> partitions, since gathering partition count may incur measurable overhead
> to the job. You can completely disable partition count monitoring by
> setting this value to 0 or a negative integer, which will also disable
> auto-restart/failing behavior of a Samza job on partition count changes.
>
>
>
>
>
>
>
>
>
>
>
>
> *Robert Wigginton*
>
> Sr. Systems Engineer
>
> Denver, Colorado
>
> C: 303-304-4914
>
> www.helixeducation.com
>
>
>
>


Re: [ANNOUNCE] Please welcome Bharath Kumarasubramanian to the Samza PMC

2020-02-13 Thread Yi Pan
Congrats! Bharath, well deserved!

-Yi

On Thu, Feb 13, 2020 at 10:17 PM Wei Song 
wrote:

> Congrats, Bharath, well deserved !!!
>
>
> On 2/13/20, 8:51 PM, "Jagadish Venkatraman" 
> wrote:
>
> Congrats Bharath. Great work! Looking forward to continued
> contributions!
>
> On Thursday, February 13, 2020, Yang Zhang  wrote:
>
> > Congratulations, Bharath! Nice work and thanks for the contributions!
> >
> > Best,
> > Yang
> >
> > On Thu, Feb 13, 2020 at 4:27 PM Xinyu Liu 
> wrote:
> >
> > > Hi all,
> > >
> > > I'm very pleased to announce that the Samza PMC has voted Bharath
> > > Kumarasubramanian to be a Project Management Committee (PMC)
> Member.  The
> > > PMC is responsible for the overall health of a project and for
> voting in
> > > new committers and PMC members, as well as voting on releases.
> Over the
> > > past few years, Bharath has been a valuable committer on the
> project.
> > >
> > > Congrats Bharath!
> > >
> > > Thanks,
> > > Xinyu
> > > on behalf of the Samza PMC
> > >
> >
>
>
> --
> Jagadish
>
>
>


Re: Samza consumer without Zookeeper

2020-02-05 Thread Yi Pan
Hi, Robert,

Thanks! I believe the Kafka consumer's ZK dependency has been removed from
Samza since 1.2. I briefly checked the code base on master, and there is no
reference to KafkaConsumerConfig.getZkConnect(). Do you see any error
messages when you remove systems.x.consumer.zookeeper.connect from your
config?

-Yi

On Wed, Feb 5, 2020 at 3:12 PM Robert Wigginton
 wrote:

> Currently 1.0.0 but are in the process of upgrading to 1.3.0.
>
> On Wed, 2020-02-05 at 15:01 -0800, Yi Pan wrote:
> > Hi, Robert,
> >
> > Which version of Samza are you using?
> >
> > -Yi
> >
> > On Tue, Feb 4, 2020 at 9:17 AM Robert Wigginton
> >  wrote:
> >
> > > Hello,
> > >
> > > We are currently evaluating moving from an on prem Kafka deployment
> > > to
> > > Confluent Cloud.  Confluent Cloud does not expose Zookeeper to
> > > user.
> > > Is it possible to setup a Samza consumer without the zookeeper
> > > connect
> > > option?
> > >
> > > Thanks,
> > >
> > > Robert
> > >
>
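
To try the suggestion in this thread, it helps to find every per-system ZooKeeper consumer setting in a job config before removing it. A minimal sketch in Python, assuming a plain key=value properties file; the helper names and the sample values (system name "kafka", localhost addresses) are illustrative assumptions, and only the key pattern systems.x.consumer.zookeeper.connect comes from the thread.

```python
import re

# Matches systems.<system-name>.consumer.zookeeper.connect, the key named in the thread.
ZK_KEY = re.compile(r"^systems\.[^.]+\.consumer\.zookeeper\.connect$")

# Illustrative job config; the values are placeholders, not recommendations.
SAMPLE = """
# sample job config
systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
systems.kafka.consumer.zookeeper.connect=localhost:2181
systems.kafka.producer.bootstrap.servers=localhost:9092
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("!"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def find_zookeeper_consumer_keys(props):
    """Return the sorted config keys that set a consumer ZooKeeper connect string."""
    return sorted(key for key in props if ZK_KEY.match(key))


def without_zookeeper_consumer_keys(props):
    """Return a copy of the config with the ZooKeeper consumer keys dropped."""
    return {key: value for key, value in props.items() if not ZK_KEY.match(key)}
```

Whether the job then starts cleanly depends on the Samza version in use, which is the open question in the thread; the sketch only prepares the config for that experiment.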


Re: Samza consumer without Zookeeper

2020-02-05 Thread Yi Pan
Hi, Robert,

Which version of Samza are you using?

-Yi

On Tue, Feb 4, 2020 at 9:17 AM Robert Wigginton
 wrote:

> Hello,
>
> We are currently evaluating moving from an on-prem Kafka deployment to
> Confluent Cloud.  Confluent Cloud does not expose Zookeeper to users.
> Is it possible to set up a Samza consumer without the zookeeper connect
> option?
>
> Thanks,
>
> Robert
>


Re: [VOTE] SEP-26: Add SystemProducer for Azure Blob Storage

2020-01-09 Thread Yi Pan
+1 (binding). Good to see more cloud native integrations in Samza.

-Yi

On Wed, Jan 8, 2020 at 10:31 AM Prateek Maheshwari 
wrote:

> +1 (binding). Thanks for the contribution.
>
> - Prateek
>
> On Tue, Jan 7, 2020 at 7:59 PM Jagadish Venkatraman <
> jagadish1...@gmail.com>
> wrote:
>
> > +1 (binding), looking forward to Samza's integration with Azure blobs
> >
> > On Wednesday, January 8, 2020, Lakshmi Manasa  >
> > wrote:
> >
> > > Hi,
> > >
> > > This is a call for a vote on SEP-26: Add SystemProducer for Azure Blob
> > > Storage.
> > > Thanks for taking a look and giving feedback.
> > >
> > > I have addressed the comments on the SEP and since there were no major
> > > questions/objections, starting this vote.
> > >
> > > Discussion thread:
> > > http://mail-archives.apache.org/mod_mbox/samza-dev/202001.
> > > mbox/%3CCAEwD47cW2T24C9A_tzj7Qxuv3P%2B2an47GkmaA4-
> > > 41WZfvY_vgw%40mail.gmail.com%3E
> > >
> > > SEP:
> > > https://cwiki.apache.org/confluence/display/SAMZA/SEP-
> > > 26%3A+Azure+Blob+Storage+Producer
> > >
> > > Please vote:
> > >
> > > [ ] +1 approve
> > >
> > > [ ] +0 no opinion
> > >
> > > [ ] -1 disapprove (and reason why)
> > >
> > > Thanks,
> > > Manasa
> > >
> >
> >
> > --
> > Jagadish
> >
>


Re: [Draft] Samza quarterly report

2020-01-09 Thread Yi Pan
## Description:
- Apache Samza is a distributed stream processing engine that is highly
  configurable to process events from various data sources, including
  real-time messaging systems (e.g. Kafka) and distributed file systems (e.g.
  HDFS).

## Issues:
- No issues require board attention

## Membership Data:
Apache Samza was founded 2015-01-22 (5 years ago)
There are currently 26 committers and 16 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:4.

Community changes, past quarter:
- No new PMC members. Last addition was Boris Shkolnik on 2019-06-06.
- No new committers. Last addition was Rayman Preet Singh on 2019-07-08.

## Project Activity:
- New version 1.3 was released on 12/05/2019
- New features are proposed continuously via SEPs (i.e. Samza Enhancement
  Proposals). In the last quarter, there were 4 new SEPs.

## Community Health:
- We continue to engage with new users via Q&A on the dev email lists.
- We gave Samza talks at many conferences:
  - Strange Loop - Riding the Stream Processing Wave
  - Apache Beam Summit (Berlin) - Streaming Pipelines at Scale with Apache
    Beam and Samza
  - ApacheCon North America - Samza 1.0: How we scaled stream processing at
    LinkedIn
  - ApacheCon North America - Samza Portable Runner for Beam
  - KubeCon North America - Running Apache Samza on Kubernetes
- We organized meetups with the following Samza talks:
  - Sunnyvale - Stream Processing in Python with Samza and Beam
  - Sunnyvale - Apache Samza 1.0: Recent Advances and our plans for future
    in Stream Processing
  - Seattle - Scalable Stream Processing with Apache Samza


P.S. just fixing one typo.

-Yi

On Thu, Jan 9, 2020 at 1:42 PM Yi Pan  wrote:

> ## Description:
> - Apache Samza is a distributed stream processing engine that are highly
>   configurable to process events from various data sources, including
>   real-time messaging system (e.g. Kafka) and distributed file systems
> (e.g.
>   HDFS).
>
> ## Issues:
> - No issues requires board attention
>
> ## Membership Data:
> Apache Samza was founded 2015-01-22 (5 years ago)
> There are currently 26 committers and 16 PMC members in this project.
> The Committer-to-PMC ratio is roughly 7:4.
>
> Community changes, past quarter:
> - No new PMC members. Last addition was Boris Shkolnik on 2019-06-06.
> - No new committers. Last addition was Rayman Preet Singh on 2019-07-08.
>
> ## Project Activity:
> - New version 1.3 was released on 12/05/2019
> - New features via SEPs (i.e. Samza Enhancement Proposals) are proposed
> continuously.
> In the last quarter, there are 4 new SEPs.
>
> ## Community Health:
> - We continue engage with new users via the Q on dev email lists.
> - We have Samza talks in many Conferences:
> Strange Loop - Riding the Stream Processing Wave
> Apache Beam Summit (Berlin) - Streaming Pipelines at Scale with Apache
> Beam and Samza
> ApacheCon North America - Samza 1.0: How we scaled stream processing
> at LinkedIn
> ApacheCon North America - Samza Portable Runner for Beam
> KubeCon North America - Running Apache Samza on Kubernetes
> - We have organized meetups with the following Samza Talks:
> Sunnyvale - Stream Processing in Python with Samza and Beam
> Sunnyvale - Apache Samza 1.0: Recent Advances and our plans for future
> in Stream Processing
> Seattle - Scalable Stream Processing with Apache Samza
>
> If the above report looks good, I will submit today.
>
> Thanks a lot!
>
> -Yi
>
> On Thu, Jan 9, 2020 at 10:23 AM Prateek Maheshwari 
> wrote:
>
>> Thanks for preparing this Yi. We had the following Samza talks and meetups
>> in 2019. Let's highlight them under Community Health:
>>
>> Conferences:
>> Strange Loop - Riding the Stream Processing Wave
>> Apache Beam Summit (Berlin) - Streaming Pipelines at Scale with Apache
>> Beam
>> and Samza
>> ApacheCon North America - Samza 1.0: How we scaled stream processing at
>> LinkedIn
>> ApacheCon North America - Samza Portable Runner for Beam
>> KubeCon North America - Running Apache Samza on Kubernetes
>>
>> Meetup Talks:
>> Sunnyvale - Stream Processing in Python with Samza and Beam
>> Sunnyvale - Apache Samza 1.0: Recent Advances and our plans for future in
>> Stream Processing
>> Seattle - Scalable Stream Processing with Apache Samza
>>
>> On Thu, Jan 9, 2020 at 1:23 AM Yi Pan  wrote:
>>
>> > Hi, all,
>> >
>> > Another time to report our project status. I have a draft below and
>> would
>> > like input from the community to fill in some more details:
>> >
>> > ## Description:
>> > - Apache Samza is a distributed stream processing engine that are highly
>> >   configurable to proc

Re: [Draft] Samza quarterly report

2020-01-09 Thread Yi Pan
## Description:
- Apache Samza is a distributed stream processing engine that is highly
  configurable to process events from various data sources, including
  real-time messaging systems (e.g. Kafka) and distributed file systems (e.g.
  HDFS).

## Issues:
- No issues requires board attention

## Membership Data:
Apache Samza was founded 2015-01-22 (5 years ago)
There are currently 26 committers and 16 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:4.

Community changes, past quarter:
- No new PMC members. Last addition was Boris Shkolnik on 2019-06-06.
- No new committers. Last addition was Rayman Preet Singh on 2019-07-08.

## Project Activity:
- New version 1.3 was released on 12/05/2019
- New features are proposed continuously via SEPs (i.e. Samza Enhancement
  Proposals). In the last quarter, there were 4 new SEPs.

## Community Health:
- We continue to engage with new users via Q&A on the dev email lists.
- We gave Samza talks at many conferences:
  - Strange Loop - Riding the Stream Processing Wave
  - Apache Beam Summit (Berlin) - Streaming Pipelines at Scale with Apache
    Beam and Samza
  - ApacheCon North America - Samza 1.0: How we scaled stream processing at
    LinkedIn
  - ApacheCon North America - Samza Portable Runner for Beam
  - KubeCon North America - Running Apache Samza on Kubernetes
- We organized meetups with the following Samza talks:
  - Sunnyvale - Stream Processing in Python with Samza and Beam
  - Sunnyvale - Apache Samza 1.0: Recent Advances and our plans for future
    in Stream Processing
  - Seattle - Scalable Stream Processing with Apache Samza

If the above report looks good, I will submit today.

Thanks a lot!

-Yi

On Thu, Jan 9, 2020 at 10:23 AM Prateek Maheshwari 
wrote:

> Thanks for preparing this Yi. We had the following Samza talks and meetups
> in 2019. Let's highlight them under Community Health:
>
> Conferences:
> Strange Loop - Riding the Stream Processing Wave
> Apache Beam Summit (Berlin) - Streaming Pipelines at Scale with Apache Beam
> and Samza
> ApacheCon North America - Samza 1.0: How we scaled stream processing at
> LinkedIn
> ApacheCon North America - Samza Portable Runner for Beam
> KubeCon North America - Running Apache Samza on Kubernetes
>
> Meetup Talks:
> Sunnyvale - Stream Processing in Python with Samza and Beam
> Sunnyvale - Apache Samza 1.0: Recent Advances and our plans for future in
> Stream Processing
> Seattle - Scalable Stream Processing with Apache Samza
>
> On Thu, Jan 9, 2020 at 1:23 AM Yi Pan  wrote:
>
> > Hi, all,
> >
> > Another time to report our project status. I have a draft below and would
> > like input from the community to fill in some more details:
> >
> > ## Description:
> > - Apache Samza is a distributed stream processing engine that are highly
> >   configurable to process events from various data sources, including
> >   real-time messaging system (e.g. Kafka) and distributed file systems
> > (e.g.
> >   HDFS).
> >
> > ## Issues:
> > - No issues requires board attention
> >
> > ## Membership Data:
> > Apache Samza was founded 2015-01-22 (5 years ago)
> > There are currently 26 committers and 16 PMC members in this project.
> > The Committer-to-PMC ratio is roughly 7:4.
> >
> > Community changes, past quarter:
> > - No new PMC members. Last addition was Boris Shkolnik on 2019-06-06.
> > - No new committers. Last addition was Rayman Preet Singh on 2019-07-08.
> >
> > ## Project Activity:
> > - New version 1.3 was released on 12/05/2019
> > *- [please add related project activities you know here]*
> >
> > ## Community Health:
> > - The community is actively pushing new features via SEPs (i.e. Samza
> >   Enhancement Proposals). In the last quarter, there are 4 new SEPs.
> > - We continue engage with new users via the Q on dev email lists.
> > *- [please add examples of community health indicators here, like new
> > companies/users, new meetups/talks, new initiatives proposed and
> > in-progress etc.]*
> >
>


Re: [VOTE] SEP 25: PR Title and Description Guidelines

2019-12-18 Thread Yi Pan
+1 (binding)

On Wed, Dec 18, 2019 at 10:49 AM Bharath Kumara Subramanian <
codin.mart...@gmail.com> wrote:

> +1 (non-binding).
>
> Thanks,
> Bharath
>
> On Wed, Dec 18, 2019 at 10:42 AM Prateek Maheshwari 
> wrote:
>
> > Hi folks,
> >
> > This is a call for a vote on SEP 25: PR Title and Description Guidelines.
> > Thanks to everyone who helped review the proposal and provided feedback.
> >
> > Feedback from the discussion is positive:
> >
> >
> http://mail-archives.apache.org/mod_mbox/samza-dev/201912.mbox/%3CCAMja7KeQr9C048UVZwfSC46h%3DEX_9S%2BSEvMF9NPg0V5dPTPfZg%40mail.gmail.com%3E
> >  <
> >
> >
> http://mail-archives.apache.org/mod_mbox/samza-dev/201912.mbox/%3CCAMja7KeQr9C048UVZwfSC46h%3DEX_9S%2BSEvMF9NPg0V5dPTPfZg%40mail.gmail.com%3E
> > r <
> http://mail-archives.apache.org/mod_mbox/samza-dev/201912.mbox/browser
> > >>
> >
> > SEP can be found at:
> >
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-25%3A+PR+Title+And+Description+Guidelines
> >  <
> >
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-25%3A+PR+Title+And+Description+Guidelines
> > >
> >
> > Please vote:
> >
> > [ ] +1 approve
> >
> > [ ] +0 no opinion
> >
> > [ ] -1 disapprove (and reason why)
> >
> > Thanks,
> > Prateek
> >
>


Re: [DISCUSS] SEP 25: PR Title and Description Guidelines

2019-12-16 Thread Yi Pan
+1 (binding). lgtm. Thanks!

-Yi

On Mon, Dec 16, 2019 at 8:08 AM Daniel Nishimura 
wrote:

> +1. Thanks, Prateek, for standardizing the PR process better.
>
> On Sun, Dec 15, 2019 at 10:55 PM Bharath Kumara Subramanian <
> codin.mart...@gmail.com> wrote:
>
> > +1.  Template looks good to me.
> > It will be really helpful to sift through, categorize and prepare release
> > notes from notable PRs during releases.
> >
> > Thanks,
> > Bharath
> >
> >
> > On Fri, Dec 13, 2019 at 11:24 PM Jagadish Venkatraman <
> > jagadish1...@gmail.com> wrote:
> >
> > > +1, thanks for the write-up Prateek.
> > >
> > > Let's also update the contributor's guidelines at:
> > > https://samza.apache.org/contribute/contributors-corner.html
> > >
> > >
> > > On Friday, December 13, 2019, Prateek Maheshwari  >
> > > wrote:
> > >
> > > > Hi folks,
> > > >
> > > > In order to make Samza PR descriptions and commit messages more
> > > consistent,
> > > > informative and discoverable, we propose the following requirements
> for
> > > new
> > > > PRs submitted to the Samza project
> > > >
> > > > https://cwiki.apache.org/confluence/display/SAMZA/SEP-25%3A+
> > > > PR+Title+And+Description+Guidelines
> > > >
> > > > Contributors should copy-paste and update the description template
> when
> > > > submitting PRs.
> > > > Committers should ensure that the guidelines are followed before
> > merging
> > > > changes.
> > > >
> > > > Please take a look and let us know if you have any concerns or
> > > suggestions.
> > > >
> > > > Thanks,
> > > > Prateek
> > > >
> > >
> > >
> > > --
> > > Jagadish
> > >
> >
>


Re: [VOTE] SEP-23: Simplify Job Runner

2019-12-11 Thread Yi Pan
+1(binding).

Thanks!

-Yi

On Wed, Dec 11, 2019 at 12:13 PM Prateek Maheshwari 
wrote:

> +1 (binding).
>
> Thanks,
> Prateek
>
> On Wed, Dec 11, 2019 at 11:50 AM Xinyu Liu  wrote:
>
> > +1 (binding). This proposal will help future split deployment as well as
> > make the deployment simple. Thanks for making the effort!
> >
> > Thanks,
> > Xinyu
> >
> > On Wed, Dec 11, 2019 at 10:30 AM Ke Wu  wrote:
> >
> > > Hi,
> > >
> > > > This is a call for a vote on SEP-23: Simplify Job Runner. Thanks to
> > > > everyone who helped review and refine the proposal.
> > > >
> > > > Feedback from the discussion is positive:
> > > http://mail-archives.apache.org/mod_mbox/samza-dev/201912.mbox/browser
> <
> > > http://mail-archives.apache.org/mod_mbox/samza-dev/201912.mbox/browser
> >
> > >
> > > SEP can be found at
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-23%3A+Simplify+Job+Runner
> > > <
> > >
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-23:+Simplify+Job+Runner
> > > >
> > >
> > > Jira can be found at
> > > https://issues.apache.org/jira/browse/SAMZA-2405 <
> > > https://issues.apache.org/jira/browse/SAMZA-2405>
> > >
> > > Please vote:
> > >
> > > [ ] +1 approve
> > >
> > > [ ] +0 no opinion
> > >
> > > [ ] -1 disapprove (and reason why)
> > >
> > > Thanks,
> > > Ke
> >
>


Re: [VOTE] Apache Samza 1.3.0 RC2

2019-12-01 Thread Yi Pan
+1 (binding). Verified the signature and built; local integration tests
passed.

Thanks!

-Yi

On Wed, Nov 27, 2019 at 2:49 PM Hai Lu  wrote:

> Hi,
>
> This is a call for a vote on a release of Apache Samza 1.3.0. Thanks to
> everyone who has contributed to this release.
>
> The release candidate can be downloaded from here:
> http://home.apache.org/~lhaiesp/samza-1.3.0-rc2/
>
> The release candidate is signed with pgp key 0x07678C76, which can be found
> here:
>
> https://keyserver.ubuntu.com/pks/lookup?search=0x07678C76=on=index
> or to directly see the public key here:
>
> https://keyserver.ubuntu.com/pks/lookup?op=get=0x1513eaedf69d7ca77ff283b534ea3ca507678c76
>
> The git tag is release-1.3.0-rc2 and signed with the same pgp key above:
>
> https://gitbox.apache.org/repos/asf?p=samza.git;a=commit;h=573ef951dd9d96d9d547db86bbc8023557714f47
>
> Test binaries have been published to Maven's staging repository, and are
> available here:
> https://repository.apache.org/content/repositories/orgapachesamza-1073
>
> The vote will be open for 171 hours (ending at 6:00 PM PST Wednesday,
> 12/4/2019).
>
> Please download the release candidate, check the hashes/signature, build it
> and test it, and then please vote:
>
> [ ] +1 approve
>
> [ ] +0 no opinion
>
> [ ] -1 disapprove (and reason why)
>
> I ran check-all.sh and integration tests (both YARN and standalone).
>
> +1 (non-binding) from my side.
>
> Thanks,
> Hai
>


[REPORT] Samza - November 2019

2019-11-10 Thread Yi Pan
## Description:
- Apache Samza is a distributed stream processing engine that is highly
  configurable to process events from various data sources, including
  real-time messaging systems (e.g. Kafka) and distributed file systems (e.g.
  HDFS).

## Issues:
- No issues require board attention

## Health report:
- Project is in healthy status with 1.2 released in June 2019

## PMC changes:

- Currently 16 PMC members.
- Boris Shkolnik was added to the PMC on Thu Jun 06 2019

## Committer base changes:

- Currently 26 committers.
- Rayman Preet Singh was added as a committer on July 8 2019

## Releases:

- Last release was 1.2.0 on June 11 2019

## Mailing list activity:

- dev@samza.apache.org:
   - 56 emails sent to list (133 in previous quarter)


## JIRA activity:

- 78 JIRA tickets created in the last 3 months
- 26 JIRA tickets closed/resolved in the last 3 months


## Commit activity

- 90 commits in the past quarter (23% decrease)
- 27 code contributors in the past quarter (12% increase)

## GitHub PR activity:

- 80 PRs opened on GitHub, past quarter (27% decrease)
- 79 PRs closed on GitHub, past quarter (28% decrease)

## Other activities:
- Samza stream processing meetup @LinkedIn was held on Oct 3, 2019
  (https://www.meetup.com/Stream-Processing-Meetup-LinkedIn/events/264589317/)


Re: [VOTE] SEP-20: Samza on Kubernetes

2019-11-08 Thread Yi Pan
+1 (binding)

On Thu, Nov 7, 2019 at 7:38 PM Jagadish Venkatraman 
wrote:

> +1 binding.
>
> Thanks Weiqing for driving this!
>
> On Thursday, November 7, 2019, Xinyu Liu  wrote:
>
> > +1 (binding).
> >
> > Thanks,
> > Xinyu
> >
> > On Thu, Nov 7, 2019 at 10:50 AM Weiqing Yang 
> > wrote:
> >
> > > Hi All,
> > >
> > > The feedback from the discussion thread:
> > > http://mail-archives.apache.org/mod_mbox/samza-dev/201911.mbox/browser
> > is
> > > positive. This is a call for a vote for SEP-20: Samza on Kubernetes <
> > >
> > > https://cwiki.apache.org/confluence/display/SAMZA/SEP-
> > 20%3A+Samza+on+Kubernetes
> > > >.
> > > Thanks to the committers and contributors that were involved with the
> > > review, design, etc.
> > >
> > > The link to the jira ticket: SAMZA-2067 <
> > > https://issues.apache.org/jira/browse/SAMZA-2067>.
> > >
> > > Thanks,
> > > Weiqing
> > >
> >
>
>
> --
> Jagadish
>


Re: [DISCUSS] SEP-20: Samza on Kubernetes

2019-11-05 Thread Yi Pan
+1! Great to see this coming through!

-Yi

On Mon, Nov 4, 2019 at 9:20 PM Jagadish Venkatraman 
wrote:

> +1, look forward to Samza K8s integration :)
>
> On Monday, November 4, 2019, Xinyu Liu  wrote:
>
> > +1 on the design. This is a great feature to allow Samza to expand its
> > deployment to Kubernetes clusters. Nice job!
> >
> > Thanks,
> > Xinyu
> >
> > On Mon, Nov 4, 2019 at 10:10 AM Weiqing Yang 
> > wrote:
> >
> > > Hi all,
> > >
> > > We created SEP-20: Samza on Kubernetes, which supports Samza to run on
> > > Kubernetes natively. Please find the SEP wiki below:
> > >
> > > https://cwiki.apache.org/confluence/display/SAMZA/SEP-
> > 20%3A+Samza+on+Kubernetes
> > >
> > > Please take a look and chime in your feedback.
> > >
> > > Thanks,
> > > Weiqing
> > >
> >
>
>
> --
> Jagadish
>


Re: Questions about using custom groupers

2019-09-30 Thread Yi Pan
Hi, Malcolm,

The configuration should be in the *_coordinator_* topic. If the cleanup
policy configured for this topic is compact only, you should not lose the
configuration. If the policy on this topic is a combination of compact +
time retention (i.e. a newer Kafka version on the broker side has enabled
this feature), you may lose your configuration.

-Yi

On Wed, Sep 25, 2019 at 3:14 PM Malcolm McFarland 
wrote:

> Hey folks,
>
> We implemented a custom grouper several months ago to do some basic
> stats collection prior to startup. After our most recent restart, I
> started seeing this error in our system:
>
> Grouper mismatch. Configured:
> com.cavulus.grouper.OurSystemStreamPartitionGrouperFactory Actual:
> org.apache.samza.container.grouper.stream.GroupByPartitionFactory
>
> Nothing has changed in either our Kafka cluster or our grouper code in
> many months. In sleuthing out the cause, it occurred to me that
> perhaps a data retention or cleanup policy was causing older messages
> in the *_checkpoint_* or *_coordinator_* topics to be removed.
>
> I have two questions:
>
> 1) Where within these queues is the grouper configuration stored?
> 2) Would a Kafka topic cleanup.policy of "compact" cause trouble here?
>
> Cheers,
> Malcolm McFarland
> Cavulus
>
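
The rule of thumb from the reply above can be written down as a small check. A minimal sketch in Python, assuming the topic's Kafka cleanup.policy value has already been fetched as a string (Kafka accepts "compact", "delete", or both as a comma-separated list); the function names are hypothetical, and treating a delete-only policy as risky is an extrapolation from the reply, which only discusses compact and compact + retention.

```python
def parse_cleanup_policy(value):
    """Split a cleanup.policy string such as "compact,delete" into a set of policies."""
    return frozenset(part.strip().lower() for part in value.split(",") if part.strip())


def coordinator_config_at_risk(cleanup_policy):
    """Return True when the policy includes "delete".

    With "delete" present, old records are removed by time or size retention,
    so configuration written to the coordinator topic may eventually disappear.
    A compact-only policy keeps the latest record per key.
    """
    return "delete" in parse_cleanup_policy(cleanup_policy)
```

For example, coordinator_config_at_risk("compact") is False, while coordinator_config_at_risk("compact,delete") is True, which matches the two cases described in the reply.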


[REPORT] Samza - July 2019

2019-07-12 Thread Yi Pan (Data Infrastructure)
## Description:
- Apache Samza is a distributed stream processing engine that is highly
  configurable to process events from various data sources, including
  real-time messaging systems (e.g. Kafka) and distributed file systems (e.g.
  HDFS).

## Issues:
- No issues require board attention

## Activity:
- Samza 1.2 is released: 
http://samza.apache.org/blog/2019-06-11-announcing-the-release-of-apache-samza--1.2.0

## Health report:
- Project is in healthy status with 1.2 released in June 2019

## PMC changes:

- Currently 16 PMC members.
- Boris Shkolnik was added to the PMC on Thu Jun 06 2019

## Committer base changes:

- Currently 26 committers.
- New committers:
  - Bharath Kumarasubramanian was added as a committer on Mon Jun 24 2019
  - Cameron Lee was added as a committer on Thu Apr 11 2019
  - Rayman Preet Singh was added as a committer on Mon Jul 08 2019

## Releases:

- Last release was 1.2.0 on June 11 2019

## /dist/ errors: 9
- This has been fixed.

## Mailing list activity:
- dev@samza.apache.org:
  - 267 subscribers (down 3 in the last 3 months)
  - 215 emails sent to list (1011 in previous quarter)


## JIRA activity:

- 107 JIRA tickets created in the last 3 months
- 84 JIRA tickets closed/resolved in the last 3 months

Draft July report for Samza

2019-07-11 Thread Yi Pan
Hi, all,

Here is the draft report I have for Samza this quarter. Please let me know
if I missed anything.

Thanks!

## Description:
- Apache Samza is a distributed stream processing engine that is highly
  configurable to process events from various data sources, including
  real-time messaging systems (e.g. Kafka) and distributed file systems (e.g.
  HDFS).

## Issues:
- No issues require board attention

## Activity:
- Samza 1.2 is released:
http://samza.apache.org/blog/2019-06-11-announcing-the-release-of-apache-samza--1.2.0

## Health report:
- Project is in healthy status with 1.1 released in Mar 2019

## PMC changes:

 - Currently 16 PMC members.
 - Boris Shkolnik was added to the PMC on Thu Jun 06 2019

## Committer base changes:

 - Currently 26 committers.
 - New committers:
   - Bharath Kumarasubramanian was added as a committer on Mon Jun 24 2019
   - Cameron Lee was added as a committer on Thu Apr 11 2019
   - Rayman Preet Singh was added as a committer on Mon Jul 08 2019

## Releases:

 - Last release was 1.1.0 on Thu Mar 21 2019

## /dist/ errors: 9
 - TODO - has it been fixed? Here is the link to the checker report:
https://checker.apache.org/projs/samza.html


## Mailing list activity:
 - dev@samza.apache.org:
   - 267 subscribers (down 3 in the last 3 months)
   - 215 emails sent to list (1011 in previous quarter)


## JIRA activity:

 - 107 JIRA tickets created in the last 3 months
 - 84 JIRA tickets closed/resolved in the last 3 months


Re: [ANNOUNCE] Welcoming Rayman and Bharath as new Samza committers!

2019-07-09 Thread Yi Pan
Big congrats to Ray and Bharath! Looking forward to more PRs from you guys!

-Yi

On Tue, Jul 9, 2019 at 9:54 AM Weiqing Yang 
wrote:

> Congrats Rayman and Bharath!
>
> - Weiqing
>
> On Mon, Jul 8, 2019 at 10:55 PM Yang Zhang  wrote:
>
> > Congratulations!
> >
> > Best,
> > Yang
> >
> > On Mon, Jul 8, 2019 at 22:09 Sanil Jain  wrote:
> >
> > > Congrats Rayman and Bharath!!!
> > >
> > > On Mon, 8 Jul 2019 at 21:47, Wei Song  wrote:
> > >
> > > > Congrats Rayman and Bharath!!!
> > > >
> > > > -Wei
> > > >
> > > >
> > > > > On 7/8/19, 9:19 PM, "Jagadish Venkatraman" wrote:
> > > > >
> > > > > I'm pleased to announce that the Samza PMC has voted to invite Rayman &
> > > > > Bharath as committers, and they have accepted.
> > > > >
> > > > > Please join us in congratulating them on this recognition!
> > > > >
> > > > > A quick summary of their accomplishments..
> > > > >
> > > > > *Rayman*
> > > > > Ray has been driving multiple improvements to Samza for stateful
> > > > > applications. He built a state-replication feature for quicker failure
> > > > > recovery (aka "Hot standby containers
> > > > > <https://cwiki.apache.org/confluence/display/SAMZA/SEP-19%3A+Hot+standby+state+for+Samza+applications>").
> > > > > His work on parallel restore
> > > > > <https://drive.google.com/file/d/1CHJl7K9QE3eB2QQPklwn76k8WvUQLqf-/view>
> > > > > to RocksDb reduces our bootstrap times considerably. In addition to being
> > > > > the release-master for the Samza 1.0
> > > > > <https://engineering.linkedin.com/blog/2018/11/samza-1-0--stream-processing-at-massive-scale>
> > > > > release, he improved Samza's operability with his work on the
> > > > > "Diagnostics" feature.
> > > > >
> > > > > *Bharath*
> > > > > Bharath has a history of contributing multiple impactful features to
> > > > > Samza. His contributions include "side-input" stores, in-memory streams
> > > > > <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=71013043>
> > > > > to enable unit-testing, stability enhancements to Samza-standalone and
> > > > > our upgrade to Kafka 2.0. In addition to shepherding a successful 0.14
> > > > > release, he also designed and built our async-high-level API
> > > > > <https://cwiki.apache.org/confluence/display/SAMZA/SEP-21%3A+Samza+Async+API+for+High+Level>.
> > > > >
> > > > > Thank you Ray and Bharath for your contributions. We look forward to
> > > > > more :)
> > > >
> > > > --
> > > > Jagadish V
> > > > (for the Apache Samza PMC)
> > > >
> > > >
> > >
> >
>


Re: Tracing the Samza+YARN startup process

2019-06-19 Thread Yi Pan
Great and detailed report! Really appreciate it!

-Yi

On Tue, Jun 18, 2019 at 2:37 PM Malcolm McFarland 
wrote:

> Just want to follow up on this, for anybody that might be trying to do
> something similar.
>
> There are two things that were getting in the way of us using YARN+Samza on
> ECS: 1) YARN needs to be able to resolve its hostname to something that's
> publicly available; and 2) Samza needs to be able to open connections on
> arbitrary ports in the 3+ range.
>
> Docker confounds each of these in a different way. For the first, Docker's
> hostname inside of the container is an arbitrary hash, and this is what
> java.net.InetAddress will resolve to. I took Rayman's suggestion and used
> dnsmasq to create a local CNAME mapping inside the container, mapping the
> local "hostname" to one that is publicly available. This should work well
> for any Docker-hosted JVM app relying on java.net.InetAddress.
>
> Docker also only allows 100 ports to be publicly exposed, and there is no
> configuration option in Samza to specify what the range of ports will be.
> The way we worked around this on ECS was to create an elastic network
> interface (ENI) for each of the node manager containers. Although I can't
> find any documentation on this, I suspect that Fargate does this by
> default, as the whole point of that service is to bypass the restrictions
> placed on containers running on EC2 instances. With the ENI, we no longer
> had to explicitly expose any ports; all ports will be available if the
> security group allows.
>
> As an aside, you might wonder: why not just run these on Fargate? Well,
> Fargate only allows 10GB of storage (this can be extended a small amount
> via an ephemeral mounted volume but seemingly not enough to satisfy YARN's
> VM requirements).
>
> Hth, and thanks for everybody's patience,
>
> Malcolm McFarland
> Cavulus
>
>
> This correspondence is from HealthPlanCRM, LLC, d/b/a Cavulus. Any
> unauthorized or improper disclosure, copying, distribution, or use of the
> contents of this message is prohibited. The information contained in this
> message is intended only for the personal and confidential use of the
> recipient(s) named above. If you have received this message in error,
> please notify the sender immediately and delete the original message.
>
>
> On Fri, May 31, 2019 at 3:08 PM rayman preet  wrote:
>
> > Apart from /etc/hosts and /bin/hostname the only other relevant place
> might
> > be
> > to modify values in /etc/resolv.conf, to point to, e.g., a dnsmasq
> > instance.
> >
> > On Fri, May 31, 2019 at 2:43 PM Malcolm McFarland <
> mmcfarl...@cavulus.com>
> > wrote:
> >
> > > Hey Rayman,
> > >
> > > The ops group and I went through the configuration today and observed
> the
> > > YARN containers as they were coming up. We seem to have found the root
> of
> > > the problem, and I'm putting this out there for anybody else that's
> > trying
> > > to do something similar on AWS ECS:
> > >
> > > The ECS container instances set their hostname to the container ID on
> > > startup (ie 717b6f75aaf8), and this looks like it's interfering with
> the
> > > YARN container startup process. This *seems* to be corroborated in that
> > > containers that start on the same host as their AM look to be starting
> > fine
> > > (ie they can locally resolve their IP address correctly), but
> containers
> > > starting on other hosts don't seem to be. We were *not* having this
> > problem
> > > on Fargate, and my only guess is that, given Fargate's intended use
> case
> > as
> > > a replicated-services-in-the-cloud environment, AWS sets the hostname
> for
> > > Fargate-bound Docker containers on launch (ie
> > > ip-10-#-#-#.us-west-#.internal.local or whatever). (As a side note, we
> > > probably would have stuck with Fargate and not run into this problem,
> but
> > > Fargate instances are only allowed 10GB of disk space, and this wasn't
> > > enough for YARN's VM requirements.)
> > >
> > > I've been fishing around for a way to get Samza to resolve the hostname
> > to
> > > something more publicly-available. I've thus far tried a) changing the
> > > /etc/hosts file, and b) replacing the /bin/hostname binary in the
> > container
> > > with a static script, but neither of these options seem to have an
> effect
> > > on Java's DNS resolution. Two further options I can think of are:
> > >
> > > - find some place in the Samza configuration where the hostname can be
> > set
> > > explicitly; or
> > > - change just the right piece of information in the system so that
> > > java.net.InetAddress will resolve the localhost to something other than
> > > what's returned from /bin/hostname (I'm guessing it uses gethostname()
> on
> > > Ubuntu, could be wrong).
> > >
> > > Anybody ideas?
> > >
> > > Cheers,
> > > Malcolm McFarland
> > > Cavulus
> > >
> > >
> > > This correspondence is from HealthPlanCRM, LLC, d/b/a Cavulus. Any
> > > unauthorized or improper disclosure, copying, distribution, or use of
> the
> > 
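For anyone debugging the same hostname problem: the thread above hinges on what java.net.InetAddress reports for the local host inside the container. A minimal sketch for checking that from a plain JVM is below. The class name HostnameProbe is made up for this example; only the java.net calls are real. Running it before and after a change to /etc/hosts, /etc/resolv.conf or a dnsmasq mapping should show whether the JVM picked the change up.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper: reports how the JVM resolves the local host. */
final class HostnameProbe {

    /** Returns "name/address" as resolved by the JVM, or a marker string if resolution fails. */
    static String describeLocalHost() {
        try {
            // getLocalHost() takes the host name from the OS and resolves it to an address.
            InetAddress local = InetAddress.getLocalHost();
            return local.getHostName() + "/" + local.getHostAddress();
        } catch (UnknownHostException e) {
            // Thrown when the local host name cannot be resolved to an address.
            return "unresolvable: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeLocalHost());
    }
}
```

If the printed name is still the container ID, other hosts will most likely not be able to reach the container by that name either.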

Re: Apache Samza 1.2.0 is released

2019-06-12 Thread Yi Pan
Thanks, Boris!

-Yi

On Wed, Jun 12, 2019 at 1:17 PM Boris Shkolnik  wrote:

> Documentation and Blog are published.
> Thanks everyone.
>


Re: REMINDER. [VOTE] Apache Samza 1.2.0 RC4

2019-06-10 Thread Yi Pan
+1 (binding), verified signature and built successfully.

One more: can we make sure that we address SAMZA-2064? The Infra team has
mentioned this issue multiple times in the feedback on our report.

Thanks!

-Yi

On Thu, Jun 6, 2019 at 5:59 PM Jagadish Venkatraman 
wrote:

> +1 (binding)
>
> Verified signatures, built the RC successfully.
>
>
>
>
> On Wed, Jun 5, 2019 at 5:17 PM Prateek Maheshwari 
> wrote:
>
> > +1 (binding)
> >
> > Verified build + check-all +  integration tests + signatures.
> > Thanks for help with the release, Boris and Pawas.
> >
> > - Prateek
> >
> > On Wed, Jun 5, 2019 at 3:03 AM Bharath Kumara Subramanian
> >  wrote:
> > >
> > > +1 (non-binding)
> > >
> > > Verified build and test on Linux. I too noticed some intermittent
> > failures
> > > on mac for Scala 2.12.
> > >
> > > Thanks,
> > > Bharath
> > >
> > > On Tue, Jun 4, 2019 at 2:00 PM Hai Lu  wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > Verified build and test on Linux box. On mac the test is failing but
> > seems
> > > > like flakiness not real failure.
> > > >
> > > > Thanks,
> > > > Hai
> > > >
> > > > On Tue, Jun 4, 2019 at 1:55 PM santhosh venkat <
> > > > santhoshvenkat1...@gmail.com>
> > > > wrote:
> > > >
> > > > > +1(non-binding)
> > > > >
> > > > > 1. ./bin/check-all.sh succeeded
> > > > > 2. ./bin/integration-tests.sh succeeded
> > > > > 3. Expanded samza-tools and followed the tutorial steps for
> > standalone
> > > > SQL
> > > > > examples Succeeded.
> > > > > 4. Verified all sha1 hash code and asc signatures successfully
> > > > >
> > > > > Thanks,
> > > > >
> > > > >
> > > > > On Tue, Jun 4, 2019 at 1:26 PM Xinyu Liu 
> > wrote:
> > > > >
> > > > > > +1 (binding).
> > > > > >
> > > > > > run check-all.sh and the build passed.
> > > > > >
> > > > > > Having trouble running the integration tests in both linux and
> mac,
> > > > > > possibly due to my local machine env.
> > > > > >
> > > > > > Thanks,
> > > > > > Xinyu
> > > > > >
> > > > > > On Mon, Jun 3, 2019 at 11:00 AM Daniel Nishimura <
> > dnishim...@gmail.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > check-all.sh and integration tests passed. +1 from me.
> > > > > > >
> > > > > > > Just a side note, the link in the original email is a broken
> > link.
> > > > The
> > > > > > link
> > > > > > > to the RC archive is:
> > http://home.apache.org/~boryas/samza-1.2.0-rc4
> > > > > > >
> > > > > > > On Sun, Jun 2, 2019 at 5:00 PM Boris Shkolnik <
> bor...@apache.org
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > This is a call for a vote on a release of Apache Samza 1.2.0.
> > > > Thanks
> > > > > to
> > > > > > > > everyone who has contributed to this release.
> > > > > > > >
> > > > > > > >
> > > > > > > > The release candidate can be downloaded from here:
> > > > > > > > > http://home.apache.org/~boryas/samza-1.2.0-rc4
> > > > > > > >
> > > > > > > > (this release has a fix for standalone integration test)
> > > > > > > >
> > > > > > > > > The release candidate is signed with pgp key 0x7D74D0CD5B5EB041, which
> > > > > > > > > can be found
> > > > > > > > > http://keyserver.ubuntu.com/pks/lookup?op=get=0x7d74d0cd5b5eb041
> > > > > > > > > <http://keyserver.ubuntu.com/pks/lookup?op=get=0xF8B95961A401BF0F>
> > > > > > > > >
> > > > > > > > > The git tag is release-1.2.0-rc4 and signed with the same pgp key:
> > > > > > > > >
> > > > > > > > > https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.2.0-rc4
> > > > > > > > > <https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.1.0-rc1>
> > > > > > > > >
> > > > > > > > > Test binaries have been published to Maven's staging repository, and
> > > > > > > > > are available here:
> > > > > > > > > https://repository.apache.org/content/repositories/orgapachesamza-1069
> > > > > > > > > <https://repository.apache.org/content/repositories/orgapachesamza-1065/org/>
> > > > > > > >
> > > > > > > > The vote will be open until 06:00 PM PST Monday, 06/03/2019.
> > > > > > > >
> > > > > > > >
> > > > > > > > Please download the release candidate, check the
> > hashes/signature,
> > > > > > build
> > > > > > > it
> > > > > > > > and test it, and then please vote:
> > > > > > > >
> > > > > > > > [ ] +1 approve
> > > > > > > >
> > > > > > > > [ ] +0 no opinion
> > > > > > > >
> > > > > > > > [ ] -1 disapprove (and reason why)
> > > > > > > >
> > > > > > > > I ran check-all.sh and integration tests.
> > > > > > > >
> > > > > > > > +1 from my side.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > >
> > > > > 

Re: [ANNOUNCE] Please welcome Boris Shkolnik to the Samza PMC

2019-06-09 Thread Yi Pan
Welcome and well deserved, Boris!

-Yi

On Sat, Jun 8, 2019 at 3:28 PM Hai Lu  wrote:

> Congratulations, Boris!
>
> On Fri, Jun 7, 2019 at 6:13 PM Aditya  wrote:
>
> > Congrats Boris!
> >
> > > On Jun 7, 2019, at 4:58 PM, Weiqing Yang 
> > wrote:
> > >
> > > Congrats, Boris!
> > >
> > > On Fri, Jun 7, 2019 at 4:50 PM santhosh venkat <
> > santhoshvenkat1...@gmail.com>
> > > wrote:
> > >
> > >> Congratulations boris! Very well deserved.
> > >>
> > >> On Fri, Jun 7, 2019 at 3:41 PM Daniel Nishimura  >
> > >> wrote:
> > >>
> > >>> Congrats!
> > >>>
> >  On Fri, Jun 7, 2019 at 3:35 PM Ignacio Solis 
> wrote:
> > 
> >  Congrats Boris!
> > 
> >  On Fri, Jun 7, 2019 at 3:20 PM Bharath Kumara Subramanian <
> >  codin.mart...@gmail.com> wrote:
> > 
> > > Congratulations Boris!
> > >
> > > On Fri, Jun 7, 2019 at 3:19 PM Jagadish Venkatraman <
> > > jagadish1...@gmail.com>
> > > wrote:
> > >
> > >> Congratulations Boris!
> > >>
> > >> On Fri, Jun 7, 2019 at 3:15 PM Xinyu Liu 
> >  wrote:
> > >>
> > >>> Congrats, Boris!
> > >>>
> > >>> Xinyu
> > >>>
> > >>> On Fri, Jun 7, 2019 at 3:13 PM Jakob Homan 
> >  wrote:
> > >>>
> >  Howdy all-
> >    I'm very pleased to announce that the Samza PMC has voted Boris
> >  Shkolnik to be a Project Management Committee (PMC) Member. The PMC
> >  is responsible for the overall health of a project and for voting in
> >  new committers and PMC members, as well as VOTEing on releases. Over
> >  the past two years, Boris has been a valuable committer on the
> >  project.
> >
> >  Congrats Boris!
> >
> >  Thanks,
> >
> >  Jakob
> >  on behalf of the Samza PMC
> > 
> > >>>
> > >>
> > >>
> > >> --
> > >> Jagadish V,
> > >> Graduate Student,
> > >> Department of Computer Science,
> > >> Stanford University
> > >>
> > >
> > 
> > 
> >  --
> >  Nacho - Ignacio Solis - iso...@igso.net
> > 
> > >>>
> > >>
> >
>


Re: [DISCUSS] Hygene for merging PRs

2019-05-19 Thread Yi Pan
Hi, Cameron,

That's generally the case. Thanks to Xinyu for bringing this to
our attention! +1 to the stated guidelines.

-Yi

On Fri, May 17, 2019 at 11:10 AM Cameron Lee 
wrote:

> Thanks Xinyu for starting this thread.
> I support the guidelines that you mentioned, with a couple clarifications
> regarding "PR Review": If a Samza PR is authored by a committer, then
> another second committer should provide an approval before that code is
> merged, correct? Once the second committer provides approval, then are we
> ok with any committer (including the PR author) doing the merge?
>
> Cameron
>
> On Thu, May 16, 2019 at 11:30 AM Xinyu Liu  wrote:
>
> > Hi, all,
> >
> > I've seen different practices around how PRs are contributed, reviewed
> and
> > merged for Samza open source. I think it's time to bring up our committer
> > guide again to make sure we follow exactly the guidelines. It's also an
> > opportunity to talk about future improvement to the flow.
> >
> > *PR Contribution*
> > According to our committer guide [1], a JIRA must be created before
> > creating the PR, unless the PR is trivial typo or doc fixes. The PR needs
> > to have the JIRA ticket name in the following format:
> >
> > *SAMZA- : *
> >
> > As an example:
> > SAMZA-2168: Remove redundant SystemAdmin creation in ApplicationMaster
> [2].
> >
> > *PR Review*
> > As discussed before, a Samza PR requires an approval from a committer
> > before merging. Contributors are welcome to review the code, but a final
> > "LGTM" from a committer is a MUST.
> >
> > *PR merge*
> > As we now use the simple merge flow in github to merge a PR, I think we
> > should mostly squash the commits for merging. Otherwise it's hard to roll
> > back changes and it generally generates a lot of noise in the commit
> > history.
> >
> > Any further suggestions are highly appreciated.
> >
> > Thanks,
> > Xinyu
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/SAMZA/Contributor%27s+Corner
> > [2] https://github.com/apache/samza/pull/1001
> >
>
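The PR title convention quoted above can be checked mechanically. The sketch below assumes the format is "SAMZA-<number>: <summary>", which is inferred from the SAMZA-2168 example in the thread rather than from the committer guide itself. The class name PrTitleCheck is made up for illustration, and the check does not cover the exception for trivial typo or doc fixes.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: checks a PR title against the assumed JIRA-prefixed format. */
final class PrTitleCheck {

    // Assumed format, inferred from the example in the thread: "SAMZA-<number>: <summary>".
    private static final Pattern TITLE = Pattern.compile("^SAMZA-\\d+: \\S.*$");

    /** Returns true only if the whole title matches the assumed format. */
    static boolean isValid(String title) {
        return title != null && TITLE.matcher(title).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("SAMZA-2168: Remove redundant SystemAdmin creation in ApplicationMaster"));
        System.out.println(isValid("Fix typo in docs"));
    }
}
```

A check like this could run in a pre-merge hook, with the trivial-fix exception handled by a reviewer.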


Re: [ANNOUNCE] New committer announcement: Cameron Lee

2019-04-17 Thread Yi Pan
Congrats Cameron!!!

On Wed, Apr 17, 2019 at 6:12 AM Daniel Nishimura 
wrote:

> Congrats!
>
> Sent from my iPhone
>
> > On Apr 16, 2019, at 9:22 PM, Jagadish Venkatraman <
> jagadish1...@gmail.com> wrote:
> >
> > Awesome addition! Congrats Cameron. well deserved.
> >
> >> On Tuesday, April 16, 2019, Xinyu Liu  wrote:
> >>
> >> Hi, all,
> >>
> >> Please join me and the rest of the Samza PMC in welcoming a new
> committer:
> >> Cameron Lee.
> >>
> >> Cameron has been contributing to Samza since early 2018. He worked on
> >> multiple areas: the runtime context in Samza, checkpoint enhancements,
> as
> >> well as testing and gradle improvements. He is also very active in code
> >> reviews and discussions.
> >>
> >> Through his work on refining existing Samza code base, the Samza PMC
> trusts
> >> Cameron with the responsibilities of a Samza committer.
> >>
> >> Look forward to seeing more contributions from you, Cameron!
> >>
> >> Xinyu
> >>
> >
> >
> > --
> > Jagadish V,
> > Graduate Student,
> > Department of Computer Science,
> > Stanford University
>


Fwd: [REPORT] Samza - April 2019

2019-04-11 Thread Yi Pan
-- Forwarded message -
From: Yi Pan (Data Infrastructure) 
Date: Thu, Apr 11, 2019 at 9:16 PM
Subject: [REPORT] Samza - April 2019
To: 
Cc: 


## Description:
- Apache Samza is a distributed stream processing engine that is highly
  configurable to process events from various data sources, including
  real-time messaging system (e.g. Kafka) and distributed file systems (e.g.
  HDFS).

## Issues:
- No issues require board attention

## Activity:
- Samza 1.1 is released:
http://samza.apache.org/blog/2019-03-22-announcing-the-release-of-apache-samza--1.1.0
- Continued SEP projects initiated or in-progress:
- SEP-20: Samza on Kubernetes
- SEP-21: Async high-level api
- Stream Processing meetup held on 03/20/2019:
  https://www.meetup.com/Stream-Processing-Meetup-LinkedIn/events/259437388/
- YouTube 101 Tutorial for Stream Processing using Samza:
  https://www.youtube.com/playlist?list=PLZDyxA22zzGyNgtBMUIXAgIaO5Ok3PR-x

## Health report:
- Project is in healthy status with 1.1 released in Mar 2019

## PMC changes:

- Currently 15 PMC members.
- No new PMC members added in the last 3 months
- Last PMC addition was Prateek Maheshwari on Thu Nov 01 2018

## Committer base changes:

- Currently 23 committers.
- New committer since the last report: Santhosh Venkat, added on Fri Jan 11
   2019

## Releases:

- 1.1.0 was released on Thu Mar 21 2019

## /dist/ errors: 9
- JIRAs are wip

## Mailing list activity:
- dev@samza.apache.org:
- 270 subscribers (down -3 in the last 3 months):
- 1047 emails sent to list (453 in previous quarter)


## JIRA activity:
- 105 JIRA tickets created in the last 3 months
- 132 JIRA tickets closed/resolved in the last 3 months


Re: Context when Converting from Yarn Task to Standalone LocalApplicationRunner

2019-04-03 Thread Yi Pan
Hi, Jeremiah,

In the new apis, you should be using ApplicationContainerContextFactory and
ApplicationTaskContextFactory to instantiate context objects used in the
whole container or in a task instance, respectively. The context factories
should be implemented as dependencies injected to your implementation of
StreamApplication. In your example, you should add the context factories in
the InquirySubmissionApp:
public class InquirySubmissionApp implements StreamApplication {
  @Override
  public void describe(StreamApplicationDescriptor appDescriptor) {
    // Register the factories; Samza calls them when it creates containers and tasks.
    appDescriptor.withApplicationContainerContextFactory(new MyContainerContextFactory(...));
    appDescriptor.withApplicationTaskContextFactory(new MyTaskContextFactory(...));
    // user processing logic using MessageStream and the transform operators...
  }
}
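
To illustrate why no Context has to be built in main(): the application only registers a factory, and the runner calls it later and hands the result to init(). The sketch below shows that inversion with no Samza dependency. Descriptor, MyTaskContext, InitableFn and run() are hypothetical stand-ins written for this example, not Samza classes or signatures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Dependency-free sketch of "register a factory, let the runner create the context". */
final class ContextFactorySketch {

    /** Hypothetical stand-in for an application-defined task context. */
    static final class MyTaskContext {
        final String storeName;
        MyTaskContext(String storeName) { this.storeName = storeName; }
    }

    /** Hypothetical stand-in for a descriptor that only records the factory. */
    static final class Descriptor {
        private Supplier<MyTaskContext> taskContextFactory;

        Descriptor withTaskContextFactory(Supplier<MyTaskContext> factory) {
            this.taskContextFactory = factory;
            return this;
        }
    }

    /** Hypothetical stand-in for a function that the runner initializes. */
    interface InitableFn {
        void init(MyTaskContext context);
    }

    /** Hypothetical runner: creates one context per task and passes it to init(). */
    static List<String> run(Descriptor descriptor, int taskCount) {
        List<String> seen = new ArrayList<>();
        InitableFn fn = context -> seen.add(context.storeName);
        for (int task = 0; task < taskCount; task++) {
            fn.init(descriptor.taskContextFactory.get());
        }
        return seen;
    }

    public static void main(String[] args) {
        Descriptor descriptor = new Descriptor()
            .withTaskContextFactory(() -> new MyTaskContext("inquiry-store"));
        System.out.println(run(descriptor, 2));
    }
}
```

The application code never calls init() itself, which is why the explicit app.init() call in the main() below should not be needed.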

Best,

-Yi

On Wed, Apr 3, 2019 at 12:05 PM Jeremiah Adams 
wrote:

> I am working to move our code from Yarn based StreamTask to standalone
> StreamApplication via LocalTaskRunner.
>
>
> I'm having some trouble understanding how to create/fetch a Context for
> use in the InitableFunction.init() interface. My old code used context to
> get the store for LocalCacheManager initialization so I need this context.
> I see no interesting methods in StreamApplicationDescriptor.
>
>
> I am launching my StreamApplication via a TaskRunner:
>
>
> Where/how do i get a Samza.context.Context?
>
>
> public static void main( String[] args )
> {
>
> CommandLine cmdLine = new CommandLine();
> OptionSet options = cmdLine.parser().parse(args);
> Config config = cmdLine.loadConfig(options);
>
> InquirySubmissionApp app= new InquirySubmissionApp(config);
> // Need to get a context here?
> app.init();
> LocalApplicationRunner localApplicationRunner = new
> LocalApplicationRunner(app, config);
> localApplicationRunner.run();
> localApplicationRunner.waitForFinish();
> }
>
>
>
>
>
>
>
>
> Jeremiah Adams
> Software Engineer
> www.helixeducation.com
> Blog | Twitter<
> https://twitter.com/HelixEducation> | Facebook<
> https://www.facebook.com/HelixEducation> | LinkedIn<
> http://www.linkedin.com/company/3609946>
>


Re: Trouble running samza 0.14 in standalone mode

2019-03-07 Thread Yi Pan
Great! Glad that you were able to figure it out!

-Yi

On Thu, Mar 7, 2019 at 3:14 AM Anoop Krishnakumar <
anoop.krishnaku...@gmail.com> wrote:

> Hi Yi,
>
> Apologies for my ignorance. I did not realize that attachments won't make it
> through and that a gist is the preferred method of sharing logs and code
> snippets.
>
> Issue is resolved. I was using the default task.name.grouper.factory
> instead of GroupByContainerIdsFactory.
> Appreciate and thanks for the response. I will consider using 1.0 version.
>
> -anoop
>
>
> On Thu, 7 Mar 2019 at 01:41, Yi Pan  wrote:
>
> > Hi, Anoop,
> >
> > 1. Please provide the full log file if possible. Just listing out a
> single
> > log line reporting the failure does not help.
> > 2. Is there any reason that you still stay with Samza 0.14? I would
> highly
> > recommend to upgrade to 1.0 since there are tons of API and standalone
> > related improvements went in to that version after 0.14.
> >
> > -Yi
> >
> > On Wed, Mar 6, 2019 at 1:54 PM Anoop Krishnakumar <
> > anoop.krishnaku...@gmail.com> wrote:
> >
> > > I am prototyping an application where I have two input topics and two
> > > output topics. The application also uses a rocksdb state store that is
> > > backed by kafka.
> > >
> > > I followed WikipediaZkLocalApplication example, but the application
> > always
> > > exits without listening for messages from topic. I could see the
> > execution
> > > plan and job coordinator communicating with zookeeper. I could see the
> > > following line in logs
> > > StreamProcessor [INFO] Container is not instantiated for stream
> > processor:
> > > 5f7e2a46-fb7a-4054-bf5c-c62423ca2e35.
> > > I could not figure out why the container wouldn't start. Any help would
> > be
> > > appreciated.
> > >
> > > Attached is the log and the job configuration.
> > >
> > > -anoop
> > >
> >
>


Re: [DISCUSS] Samza 1.1.0 release

2019-03-06 Thread Yi Pan
+1 (binding)

On Wed, Mar 6, 2019 at 10:08 PM Daniel Chen  wrote:

> Hello everyone,
>
> We have added a couple of major features to master since 1.0.0 that warrant
> a major release.
>
> Within LinkedIn, some of these features have already been tested as part of
> our test suites. We plan to continue our testing in coming weeks to
> validate the stability prior to release.
>
> Here is the highlighted list of features that are part of the new release
> (in chronological order)
> - SAMZA-1981: Consolidate table descriptors to samza-api
> - SAMZA-1985: Implement Startpoints model and StartpointManager
> - SAMZA-1998: Table API refactoring
> - SAMZA-2012: Add API for wiring an external context through to application
>   processing code
> - SAMZA-2041: Add system descriptors for HDFS and Kinesis
> - SAMZA-2043: Consolidate ReadableTable and ReadWriteTable
> - SAMZA-2106: Samza App & Job Config Refactor
> - SAMZA-2081: Samza SQL : Type system for Samza SQL
>
> You can find a complete list of features here:
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20AND%20resolution%20%20%3D%20Fixed%20%20AND%20(fixVersion%20%3E%3D%201.1%20)%20ORDER%20BY%20createdDate%20%20DESC
>
> Here is my proposal on our release schedule and timelines.
>
>1. Cut a release version 1.1.0 from master
>2. Target a release vote on the week March 13th (next week)
>
> Thoughts?
>
> Thanks,
> Daniel
>


Re: Trouble running samza 0.14 in standalone mode

2019-03-06 Thread Yi Pan
Hi, Anoop,

1. Please provide the full log file if possible. Just listing out a single
log line reporting the failure does not help.
2. Is there any reason that you still stay with Samza 0.14? I would highly
recommend upgrading to 1.0, since tons of API and standalone-related
improvements went into that version after 0.14.

-Yi

On Wed, Mar 6, 2019 at 1:54 PM Anoop Krishnakumar <
anoop.krishnaku...@gmail.com> wrote:

> I am prototyping an application where I have two input topics and two
> output topics. The application also uses a rocksdb state store that is
> backed by kafka.
>
> I followed WikipediaZkLocalApplication example, but the application always
> exits without listening for messages from topic. I could see the execution
> plan and job coordinator communicating with zookeeper. I could see the
> following line in logs
> StreamProcessor [INFO] Container is not instantiated for stream processor:
> 5f7e2a46-fb7a-4054-bf5c-c62423ca2e35.
> I could not figure out why the container wouldn't start. Any help would be
> appreciated.
>
> Attached is the log and the job configuration.
>
> -anoop
>


Re: code cleanups

2019-02-04 Thread Yi Pan
Hi, Andrey,

Thanks for bringing this up. See my response below:

On Mon, Feb 4, 2019 at 4:54 AM Andrey Paykin  wrote:

>
> 1. Is this idea make sense (code cleanup)
>

Yes, it totally makes sense.


> 2. Do I need JIRA ticket(s)? May be there is some example ticket or
> additional requirements?
>

I would recommend opening JIRAs to track this effort.


> 3. If answer for 2 is yes, how should I organize it? If all cleanups are
> submitted in one PR it can be difficult to review. Or cleanups can be
> submitted for each module separately with many PRs for each ticket, it
> leads to many same tickets. What is better way?
>
>
It would be better to organize this as an umbrella JIRA + a set of sub-JIRAs,
where each sub-JIRA addresses one subset of cleanups (organized by module, and
by meaningful logical unit within a module if there are many in a single
module, e.g. samza-yarn/).

Does that make sense?

Thanks and welcome to the community!

-Yi


Re: [VOTE] Migration of Samza git repo to gitbox.apache.org

2019-01-24 Thread Yi Pan
+1 (binding). Thanks a lot!

On Thu, Jan 24, 2019 at 6:02 PM Boris S  wrote:

> +1. Thanks!
>
> On Thu, Jan 24, 2019 at 8:53 AM santhosh venkat <
> santhoshvenkat1...@gmail.com> wrote:
>
> > +1 (non-binding).
> >
> > On Thu, Jan 24, 2019 at 7:10 AM Jake Maes  wrote:
> >
> > > +1 (binding)
> > >
> > > On Wed, Jan 23, 2019 at 10:35 PM santhosh venkat <
> > > santhoshvenkat1...@gmail.com> wrote:
> > >
> > > > +1 (binding).
> > > >
> > > > Thanks,
> > > >
> > > > On Wed, Jan 23, 2019 at 2:43 PM Jagadish Venkatraman <
> > > > jagadish1...@gmail.com>
> > > > wrote:
> > > >
> > > > > +1 (binding). Thank you Pawas for driving this!
> > > > >
> > > > > On Wed, Jan 23, 2019 at 2:40 PM Xinyu Liu 
> > > wrote:
> > > > >
> > > > > > +1 (binding).
> > > > > >
> > > > > > On Wed, Jan 23, 2019 at 2:39 PM Prateek Maheshwari <
> > > > prateek...@gmail.com
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > +1 (binding) again
> > > > > > >
> > > > > > > - Prateek
> > > > > > >
> > > > > > > On Wed, Jan 23, 2019 at 11:50 AM Pawas Chhokra <
> > > pawas2...@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > This is a call for a vote on migrating Samza git repo to
> > > > > > > gitbox.apache.org, on
> > > > > > > > 11 AM, Jan 29, 2019. As mandated by the Apache Infrastructure
> > > Team,
> > > > > all
> > > > > > > git
> > > > > > > > repositories must be migrated from git-wip-us.apache.org URL
> > to
> > > > > > > > gitbox.apache.org, as the old service is being
> decommissioned.
> > > > > > > > The vote will be open for 72 hours (ending at 12:00 PM PST
> > > Monday,
> > > > > > > > January 28). You can vote as follows:
> > > > > > > >
> > > > > > > > [ ] +1 approve
> > > > > > > >
> > > > > > > > [ ] +0 no opinion
> > > > > > > >
> > > > > > > > [ ] -1 disapprove (and reason why)
> > > > > > > >
> > > > > > > > The vote is +1 from my side.
> > > > > > > >
> > > > > > > > Thanks & Regards,
> > > > > > > > Pawas Chhokra
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Jagadish V,
> > > > > Graduate Student,
> > > > > Department of Computer Science,
> > > > > Stanford University
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] Mandatory migration of Samza git repo to gitbox.apache.org

2019-01-15 Thread Yi Pan
+1 (binding) for the move as well.

@Jake Maes  I think our merge script requires setting
up a local alias 'apache-samza' for the remote repo. Hence, it should be a
simple step to point the 'apache-samza' repo to the new repo on gitbox,
hopefully.

-Yi

On Tue, Jan 15, 2019 at 12:12 PM Jake Maes  wrote:

> Yeah, sounds pretty light weight.
>
> Quick question for committers: Do we anticipate this affecting our merge
> script? Perhaps we won't need the merge script, since we'll be able to use
> github directly. Thoughts?
>
> -Jake
>
> On Tue, Jan 15, 2019 at 11:49 AM Prateek Maheshwari 
> wrote:
>
> > Thanks for starting the discussion Pawas. I'm +1 (binding) for the
> > migration.
> >
> > - Prateek
> >
> > On Tue, Jan 15, 2019 at 11:44 AM Pawas Chhokra 
> > wrote:
> > >
> > > Hi all,
> > >
> > > As mandated by the Apache Infrastructure Team, all git repositories
> must
> > be
> > > migrated from git-wip-us.apache.org URL to gitbox.apache.org, as the
> old
> > > service is being decommissioned. This needs to happen before February
> > 7th,
> > > and this ticket  is
> to
> > > check if migrating Samza on 11 AM, Jan 25, 2019 is acceptable to
> > everyone.
> > >
> > > Thanks & Regards,
> > > Pawas Chhokra
> >
>


[REPORT] Samza - January 2019

2019-01-10 Thread Yi Pan
## Description:
- Apache Samza is a distributed stream processing engine that is highly
   configurable to process events from various data sources, including
   real-time messaging system (e.g. Kafka) and distributed file systems
   (e.g. HDFS).

## Issues:
- No issues require board attention

## Activity:
- Samza 1.0 is released:
- News coverage:
https://www.zdnet.com/article/real-time-data-processing-just-got-more-options-linkedin-releases-apache-samza-1-0-streaming/
- Engineering blogs:
https://engineering.linkedin.com/blog/2018/11/samza-1-0--stream-processing-at-massive-scale
- Major online website refresh: http://samza.apache.org/
- Critical improvement projects completed:
- Changelog restore parallelization
- Evaluated HDFS based backup/restore of state stores
- Multiple SEP projects initiated or in-progress:
- SEP-18: allows manipulating starting offsets and time-based rewind
- SEP-19: Fast failover for stateful jobs on container failure (i.e.
  standby container)
- SEP to come soon: async high-level API
- Beam Samza runner upgrade to use Samza 1.0
- Go and Python support via Beam Samza runner

## Health report:
- Project is in healthy status with 1.0 released in Nov 2018

## PMC changes:

- Currently 15 PMC members.
- Prateek Maheshwari was added to the PMC on Thu Nov 01 2018

## Committer base changes:

- Currently 22 committers.
- New committers:
  - Aditya Toomula was added as a committer on Mon Nov 05 2018
  - Hai Lu was added as a committer on Mon Nov 05 2018

## Releases:

- Last release was 1.0 on Nov 28, 2018

## /dist/ errors: 9
- Project is in healthy status with a major release pending in Oct

## Mailing list activity:

- dev@samza.apache.org:
  - 271 subscribers (down 13 in the last 3 months)
  - 445 emails sent to list (288 in previous quarter)


## JIRA activity:

- 111 JIRA tickets created in the last 3 months
- 57 JIRA tickets closed/resolved in the last 3 months


Re: Draft report to board - Jan 2019

2019-01-09 Thread Yi Pan
Thanks! Updated inline accordingly.

-Yi

On Wed, Jan 9, 2019 at 12:32 PM Prateek Maheshwari 
wrote:

> Thanks for the summary Yi. I'd change: "HDFS based backup/restore of
> state stores" to "Evaluation for HDFS based backup/restore of state
> stores" since this was an intern project and is not checked in to
> master. Otherwise LGTM.
>
> Thanks,
> Prateek
>
> On Wed, Jan 9, 2019 at 12:28 PM Yi Pan  wrote:
> >
> > Hi, all,
> >
> > Our quarterly report is due this Wed (1/9). The following is the draft
> > report. Please let me know by the end of the day if I missed anything.
> > Thanks!
> >
> > ## Description:
> >
> >  - Apache Samza is a distributed stream processing engine that is highly
> >    configurable to process events from various data sources, including
> >    real-time messaging systems (e.g. Kafka) and distributed file systems
> >    (e.g. HDFS).
> >
> > ## Issues:
> >
> >  - No issues require board attention
> >
> > ## Activity:
> >
> >  - Samza 1.0 is released:
> >     - News coverage:
> >       https://www.zdnet.com/article/real-time-data-processing-just-got-more-options-linkedin-releases-apache-samza-1-0-streaming/
> >     - Engineering blogs:
> >       https://engineering.linkedin.com/blog/2018/11/samza-1-0--stream-processing-at-massive-scale
> >     - Major online website refresh: http://samza.apache.org/
> >  - Critical improvement projects completed:
> >     - Changelog restore parallelization
> >     - Evaluation for HDFS based backup/restore of state stores
> >  - Multiple SEP projects initiated or in-progress:
> >     - SEP-18: allows manipulating starting offsets and time-based rewind
> >     - SEP-19: Fast failover for stateful jobs on container failure (i.e.
> >       standby container)
> >     - SEP to come soon: async high-level API
> >  - Beam Samza runner upgrade to use Samza 1.0
> >  - Go and Python support via Beam Samza runner
> >
> > ## Health report:
> >
> >  - Project is in healthy status with 1.0 released in Nov 2018
> >
> > ## PMC changes:
> >
> >  - Currently 15 PMC members.
> >  - Prateek Maheshwari was added to the PMC on Thu Nov 01 2018
> >
> > ## Committer base changes:
> >
> >  - Currently 22 committers.
> >  - New committers:
> >     - Aditya Toomula was added as a committer on Mon Nov 05 2018
> >     - Hai Lu was added as a committer on Mon Nov 05 2018
> >
> > ## Releases:
> >
> >  - Last release was 1.0 on Nov 28, 2018
> >
> > ## /dist/ errors: 9
> >
> >  - Project is in healthy status with 1.0 released in Nov 2018
> >
> > ## Mailing list activity:
> >
> >  - dev@samza.apache.org:
> >     - 271 subscribers (down 13 in the last 3 months)
> >     - 445 emails sent to list (288 in previous quarter)
> >
> > ## JIRA activity:
> >
> >  - 111 JIRA tickets created in the last 3 months
> >  - 57 JIRA tickets closed/resolved in the last 3 months
>


Draft report to board - Jan 2019

2019-01-09 Thread Yi Pan
Hi, all,

Our quarterly report is due this Wed (1/9). The following is the draft
report. Please let me know by the end of the day if I missed anything.
Thanks!

## Description:

 - Apache Samza is a distributed stream processing engine that is highly
   configurable to process events from various data sources, including
   real-time messaging systems (e.g. Kafka) and distributed file systems
   (e.g. HDFS).

## Issues:

 - No issues require board attention

## Activity:

 - Samza 1.0 is released:
    - News coverage:
      https://www.zdnet.com/article/real-time-data-processing-just-got-more-options-linkedin-releases-apache-samza-1-0-streaming/
    - Engineering blogs:
      https://engineering.linkedin.com/blog/2018/11/samza-1-0--stream-processing-at-massive-scale
    - Major online website refresh: http://samza.apache.org/
 - Critical improvement projects completed:
    - Changelog restore parallelization
    - HDFS based backup/restore of state stores
 - Multiple SEP projects initiated or in-progress:
    - SEP-18: allows manipulating starting offsets and time-based rewind
    - SEP-19: Fast failover for stateful jobs on container failure (i.e.
      standby container)
    - SEP to come soon: async high-level API
 - Beam Samza runner upgrade to use Samza 1.0
 - Go and Python support via Beam Samza runner

## Health report:

 - Project is in healthy status with 1.0 released in Nov 2018

## PMC changes:

 - Currently 15 PMC members.
 - Prateek Maheshwari was added to the PMC on Thu Nov 01 2018

## Committer base changes:

 - Currently 22 committers.
 - New committers:
    - Aditya Toomula was added as a committer on Mon Nov 05 2018
    - Hai Lu was added as a committer on Mon Nov 05 2018

## Releases:

 - Last release was 1.0 on Nov 28, 2018

## /dist/ errors: 9

 - Project is in healthy status with 1.0 released in Nov 2018

## Mailing list activity:

 - dev@samza.apache.org:
    - 271 subscribers (down 13 in the last 3 months)
    - 445 emails sent to list (288 in previous quarter)

## JIRA activity:

 - 111 JIRA tickets created in the last 3 months
 - 57 JIRA tickets closed/resolved in the last 3 months

