from:"Prateek Maheshwari"

Re: [VOTE] SEP-28: Samza State Backend Interface and Checkpointing Improvement

2021-06-24 Thread Prateek Maheshwari

It has been 72+ hours since the VOTE thread started, and we have received 3
binding + 1 non-binding votes on the proposal.
The vote has passed, and this proposal is accepted.

Thanks,
Prateek

On Tue, Jun 22, 2021 at 2:14 PM Bharath Kumara Subramanian <
codin.mart...@gmail.com> wrote:

> +1 (binding). Looking forward to seeing this running in production.
>
> --
> Bharath
>
> On Tue, Jun 22, 2021 at 1:41 PM Sanil Jain  wrote:
>
> > +1 (non-binding) Thanks for the contribution
> >
> > On Tue, 22 Jun 2021 at 13:09, Prateek Maheshwari 
> > wrote:
> >
> > > +1 (binding). Thanks for the contribution!
> > >
> > > - Prateek
> > >
> > > On Tue, Jun 22, 2021 at 11:13 AM Yi Pan  wrote:
> > >
> > > > +1 (binding) this is going to improve our state recovery story
> > > > significantly!
> > > >
> > > > -Yi
> > > >
> > > > On Mon, Jun 21, 2021 at 1:03 PM Daniel Chen 
> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > This is a call for a vote on SEP-28: Samza State Backend Interface
> > and
> > > > > Checkpointing Improvements. Thanks to everyone who was involved
> with
> > > the
> > > > > design and reviews to refine the proposal.
> > > > >
> > > > > Discussion thread:
> > > > >
> > > > >
> > > >
> > >
> >
> http://mail-archives.apache.org/mod_mbox/samza-dev/202106.mbox/%3cCA+6YmWVVxz=xr244rpg2a-a6qaor0mjrw9ck41-u7tsuv8o...@mail.gmail.com%3e
> > > > >
> > > > > SEP-28:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-28%3A+Samza+State+Backend+Interface+and+Checkpointing+Improvements
> > > > >
> > > > > Jira ticket:
> > > > > https://issues.apache.org/jira/browse/SAMZA-2591
> > > > >
> > > > > Please vote:
> > > > >
> > > > > [ ] +1 approve
> > > > >
> > > > > [ ] +0 no opinion
> > > > >
> > > > > [ ] -1 disapprove (and reason why)
> > > > >
> > > > > Thanks,
> > > > > Daniel
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] SEP-29: Blob Store Based State Backup And Restore

2021-06-24 Thread Prateek Maheshwari

It has been > 72 hours since the VOTE started, and we have received 3
binding + 2 non-binding votes on the proposal.
The vote has passed, and this proposal is accepted.

Thanks,
Prateek

On Wed, Jun 23, 2021 at 1:47 PM Bharath Kumara Subramanian <
codin.mart...@gmail.com> wrote:

> +1 (binding). Thanks for the contribution!
>
> --
> Bharath
>
> On Tue, Jun 22, 2021 at 8:19 PM Yi Pan  wrote:
>
> > +1 (binding). Thanks for rolling out this big feature!
> >
> > -Yi
> >
> > On Tue, Jun 22, 2021 at 1:42 PM Sanil Jain 
> wrote:
> >
> > > +1 (non-binding) Thanks for this contribution!
> > >
> > > -Sanil
> > >
> > > On Tue, 22 Jun 2021 at 13:13, Daniel Chen  wrote:
> > >
> > > > +1 (non-binding), thanks!
> > > >
> > > > On Tue, Jun 22, 2021 at 1:10 PM Prateek Maheshwari <
> > prate...@utexas.edu>
> > > > wrote:
> > > >
> > > > > +1 (binding) from me. Thanks for the contribution!
> > > > >
> > > > > - Prateek
> > > > >
> > > > > On Tue, Jun 22, 2021 at 11:45 AM Prateek Maheshwari <
> > > prate...@utexas.edu
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > This is a call for a vote on SEP-29: Blob Store Based State
> Backup
> > > And
> > > > > > Restore.
> > > > > >
> > > > > > Discussion thread:
> > > > > >
> > > > >
> > > >
> > >
> >
> https://mail-archives.apache.org/mod_mbox/samza-dev/202106.mbox/%3cCAMja7KdMNU_Zk-vDnwcm4GSJs==126-mu6djgtsoukzkxzf...@mail.gmail.com%3e
> > > > > >
> > > > > > SEP-29:
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-29%3A+Blob+Store+Based+State+Backup+And+Restore
> > > > > >
> > > > > > Please vote:
> > > > > > [  ] +1 approve
> > > > > > [  ] +0 no opinion
> > > > > > [  ] -1 disapprove (and the reason why)
> > > > > >
> > > > > > Thanks,
> > > > > > Shekhar and Prateek
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] SEP-29: Blob Store Based State Backup And Restore

2021-06-22 Thread Prateek Maheshwari

+1 (binding) from me. Thanks for the contribution!

- Prateek

On Tue, Jun 22, 2021 at 11:45 AM Prateek Maheshwari 
wrote:

> Hi all,
>
> This is a call for a vote on SEP-29: Blob Store Based State Backup And
> Restore.
>
> Discussion thread:
> https://mail-archives.apache.org/mod_mbox/samza-dev/202106.mbox/%3cCAMja7KdMNU_Zk-vDnwcm4GSJs==126-mu6djgtsoukzkxzf...@mail.gmail.com%3e
>
> SEP-29:
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-29%3A+Blob+Store+Based+State+Backup+And+Restore
>
> Please vote:
> [  ] +1 approve
> [  ] +0 no opinion
> [  ] -1 disapprove (and the reason why)
>
> Thanks,
> Shekhar and Prateek
>

Re: [VOTE] SEP-28: Samza State Backend Interface and Checkpointing Improvement

2021-06-22 Thread Prateek Maheshwari

+1 (binding). Thanks for the contribution!

- Prateek

On Tue, Jun 22, 2021 at 11:13 AM Yi Pan  wrote:

> +1 (binding) this is going to improve our state recovery story
> significantly!
>
> -Yi
>
> On Mon, Jun 21, 2021 at 1:03 PM Daniel Chen  wrote:
>
> > Hi all,
> >
> > This is a call for a vote on SEP-28: Samza State Backend Interface and
> > Checkpointing Improvements. Thanks to everyone who was involved with the
> > design and reviews to refine the proposal.
> >
> > Discussion thread:
> >
> >
> http://mail-archives.apache.org/mod_mbox/samza-dev/202106.mbox/%3cCA+6YmWVVxz=xr244rpg2a-a6qaor0mjrw9ck41-u7tsuv8o...@mail.gmail.com%3e
> >
> > SEP-28:
> >
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-28%3A+Samza+State+Backend+Interface+and+Checkpointing+Improvements
> >
> > Jira ticket:
> > https://issues.apache.org/jira/browse/SAMZA-2591
> >
> > Please vote:
> >
> > [ ] +1 approve
> >
> > [ ] +0 no opinion
> >
> > [ ] -1 disapprove (and reason why)
> >
> > Thanks,
> > Daniel
> >
>

[VOTE] SEP-29: Blob Store Based State Backup And Restore

2021-06-22 Thread Prateek Maheshwari

Hi all,

This is a call for a vote on SEP-29: Blob Store Based State Backup And
Restore.

Discussion thread:
https://mail-archives.apache.org/mod_mbox/samza-dev/202106.mbox/%3cCAMja7KdMNU_Zk-vDnwcm4GSJs==126-mu6djgtsoukzkxzf...@mail.gmail.com%3e

SEP-29:
https://cwiki.apache.org/confluence/display/SAMZA/SEP-29%3A+Blob+Store+Based+State+Backup+And+Restore

Please vote:
[  ] +1 approve
[  ] +0 no opinion
[  ] -1 disapprove (and the reason why)

Thanks,
Shekhar and Prateek

[DISCUSS] SEP-29: Blob Store Based State Backup And Restore

2021-06-15 Thread Prateek Maheshwari

[Sending on behalf of Shekhar due to email delivery issues]

Hi folks,

We've been working on adding support for a blob store based state backend
for backing up and restoring local state. In our experiments this has led
to significantly faster state restore times compared to Kafka changelogs,
simplifying management of applications with large local state.

Please take a look at the SEP for more details:
https://cwiki.apache.org/confluence/display/SAMZA/SEP-29%3A+Blob+Store+Based+State+Backup+And+Restore

Looking forward to your review and feedback. If there are no blockers, we
will open the vote on Monday, June 21st.

Thanks,
Prateek and Shekhar

Re: Draft board report for Samza

2021-04-16 Thread Prateek Maheshwari

Looks good to me, thanks. May be worth calling out the recent improvements
in standalone stability under Project Activity as well.

- Prateek

On Fri, Apr 16, 2021 at 12:28 PM Bharath Kumara Subramanian <
codin.mart...@gmail.com> wrote:

> Hi all,
>
> Here is a draft report for Samza. Please let me know if I missed anything.
>
>
> 
>
> ## Description:
> The mission of Samza is the creation and maintenance of software related to
> distributed stream processing framework
>
> ## Issues:
> - There are no issues requiring board attention.
>
> ## Membership Data:
> Apache Samza was founded 2015-01-22 (6 years ago)
> There are currently 28 committers and 17 PMC members in this project.
> The Committer-to-PMC ratio is roughly 7:5.
>
> Community changes, past quarter:
> - No new PMC members. Last addition was Bharath Kumarasubramanian on
> 2020-02-13.
> - Ke Wu was added as committer on 2021-02-25
> - Sanil Jain was added as committer on 2021-02-01
>
> ## Project Activity:
> - 1.6.0 was released on 2021-01-28.
> - Support rack aware standby in YARN
>
> ## Community Health:
> - JIRA Activity
>   - 11 issues opened in JIRA, past quarter (-50% decrease)
>   - 7 issues closed in JIRA, past quarter (-46% decrease)
> - Commit Activity
>   - 34 commits in the past quarter
>   - 13 code contributors in the past quarter (44% increase)
>   - 35 PRs opened on GitHub, past quarter (66% increase)
>   - 38 PRs closed on GitHub, past quarter (90% increase)
>

Re: [DISCUSS] Samza 1.5.1 release

2020-08-19 Thread Prateek Maheshwari

+1, this is a critical bug and we should release the fix ASAP.

Thanks,
Prateek

On Tue, Aug 18, 2020 at 9:02 PM Bharath Kumara Subramanian <
codin.mart...@gmail.com> wrote:

> Hi all,
>
> In 1.5 release, we enabled transactional state by default for all samza
> jobs. We identified a critical bug related to trimming the state which
> requires a minor release.
>
> I wanted to kick off the discussion on the open source forum as the bug fix
> has been validated internally at LinkedIn.
>
> More details on the bug can be found in SAMZA-2578
> .
> The patch that contains the fix: samza/pull/1413
> 
>
> I'd like to target early next week for voting.
>
> Cheers,
> Bharath
>

Re: [VOTE] SEP-22: Container Placements in Samza

2020-05-29 Thread Prateek Maheshwari

+1 from me too. Thanks for adding this feature!

- Prateek

On Fri, May 29, 2020 at 11:56 AM Boris S  wrote:

> Hi,
> LGTM.
> +1.
>
> On Wed, May 27, 2020 at 11:28 AM Sanil Jain 
> wrote:
>
> > Hi all,
> >
> > This is a call for a vote on SEP-22: Container Placements in Samza
> >
> > Thanks to everyone who reviewed the proposal and
> > provided feedback.
> >
> > I have addressed comments on the SEP, and I am not aware of any further
> > major questions or objections, so I am starting this vote.
> >
> > *SEP link: *
> >
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-22%3A+Container+Placements+in+Samza
> >
> >
> > *Discuss thread:*
> >
> >
> >
> http://mail-archives.apache.org/mod_mbox/samza-dev/202001.mbox/%3CCAKkRg%3D94NY8cLn89u%3DVeL1K52R3XuOimzxXsy7BLzS7fpS%3DLfg%40mail.gmail.com%3E
> >
> > There was also some discussion through comments on the SEP page (see
> > Resolved Comments).
> >
> > Please vote:
> > [ ] +1 approve
> > [ ] +0 no opinion
> > [ ] -1 disapprove (and reason why)
> >
>

[Draft] Samza Quarterly ASF Board Report

2020-04-06 Thread Prateek Maheshwari

Hi folks,

Below is the draft for the quarterly ASF Board Report for Apache Samza. If
you're aware of recent community activity (new users, talks, meetups,
projects etc.) related to Samza not covered below, we'd love to hear about
it.

Thanks,
Prateek

== Draft Report ==

## Description:
Apache Samza is a distributed stream processing engine that is highly
  configurable to process events from various data sources, including
  real-time messaging systems (e.g. Kafka) and distributed file systems
(e.g.
  HDFS).

## Issues:
- No issues require board attention

## Membership Data:
Apache Samza was founded 2015-01-22 (5 years ago)
There are currently 26 committers and 17 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:5.

Community changes, past quarter:
- Bharath Kumarasubramanian was added to the PMC on 2020-02-13
- No new committers. Last addition was Rayman Preet Singh on 2019-07-08.

## Project Activity:
- New version 1.3.1 was released on 2020-02-20
- New version 1.4.0 was released on 2020-03-18
- There have been 5 new Samza Enhancement Proposals (SEPs)
to add new features in the last quarter. Out of these,
3 have been accepted, and 2 are under discussion.
- JIRA Activity:
  - 75 issues opened in JIRA, past quarter (-12% decrease)
  - 56 issues closed in JIRA, past quarter (133% increase)
- Commit Activity:
  - 125 commits in the past quarter (47% increase)
  - 99 PRs closed on GitHub, past quarter (62% increase)

## Community Health:
- We continue to engage and support the community via
the dev@samza.apache.org mailing list. The mailing
list has had a 98% increase in traffic in the past
quarter (189 emails compared to 95)

- We presented about Samza in the following meetup talks:
  - Stateful Stream Processing with Apache Samza and RocksDB:
RocksDB meetup 2020 at Rockset
  - Defending users from Abuse using Stream Processing at LinkedIn:
Stream Processing with Apache Kafka & Apache Samza Meetup at LinkedIn
  - Enabling Mission-critical Stateful Stream Processing with Samza:
Stream Processing with Apache Kafka & Apache Samza Meetup at LinkedIn

Re: [VOTE] SEP-24: Cluster-based Job Coordinator Dependency Isolation

2020-03-03 Thread Prateek Maheshwari

+1 (binding) from me. Thanks for contributing this feature. Looking forward
to having dependency isolation and to the ability to upgrade the framework
independently from an application.

Thanks,
Prateek

On Fri, Feb 28, 2020 at 10:48 AM Cameron Lee 
wrote:

> Hi all,
>
> This is a call for a vote on SEP-24: Cluster-based Job Coordinator
> Dependency Isolation. Thanks to everyone who reviewed the proposal and
> provided feedback.
>
> I have addressed comments on the SEP, and I am not aware of any further
> major questions or objections, so I am starting this vote.
>
> SEP link:
>
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-24%3A+Cluster-based+Job+Coordinator+Dependency+Isolation
>
> Discuss thread:
>
> https://mail-archives.apache.org/mod_mbox/samza-dev/202001.mbox/%3cCAMja7KeGcRZ3H95Rxk5XE=60zxm6jxjkjuwwxmgmadpfbyk...@mail.gmail.com%3e
> There was also some discussion through comments on the SEP page (see
> Resolved Comments).
>
> Please vote:
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove (and reason why)
>
> Thank you,
> Cameron
>

Re: [VOTE] Apache Samza 1.3.1 RC0

2020-02-19 Thread Prateek Maheshwari

Integration tests and check-all passed successfully. +1 (binding) from me.

Thanks,
Prateek

On Tue, Feb 18, 2020 at 12:47 PM Bharath Kumara Subramanian <
codin.mart...@gmail.com> wrote:

> Ran check-all and integration tests passed with one exception.
> The following test was flaky and passed on the second attempt for a
> specific combination (Scala - 2.12, YARN - 2.7.1 & JDK 8)
>
> > Task :samza-test_2.12:test
>
>
> testRepartitionedSessionWindowCounter FAILED
>
> java.lang.AssertionError: expected:<0> but was:<2>
>
> at org.junit.Assert.fail(Assert.java:88)
>
> at org.junit.Assert.failNotEquals(Assert.java:834)
>
> at org.junit.Assert.assertEquals(Assert.java:645)
>
> at org.junit.Assert.assertEquals(Assert.java:631)
>
> at
>
> org.apache.samza.test.operator.TestRepartitionWindowApp.testRepartitionedSessionWindowCounter(TestRepartitionWindowApp.java:72)
>
> We can follow it up with a ticket if we someone else runs into it as well.
>
> +1 (binding)
>
> Thanks,
> Bharath
>
> On Tue, Feb 18, 2020 at 12:12 AM Yi Pan  wrote:
>
> > Ran check-all and integration tests successfully.
> >
> > +1 (binding)
> >
> > On Thu, Feb 13, 2020 at 12:02 PM Hai Lu  wrote:
> >
> > > Hi,
> > >
> > > This is a call for a vote on a release of Apache Samza 1.3.1 to redress
> > > certain issues found in 1.3.0
> > >
> > > The release candidate can be downloaded from here:
> > > http://home.apache.org/~lhaiesp/samza-1.3.1-rc0/
> > >
> > > The release candidate is signed with pgp key 0x256F8FA2, which can be
> > found
> > > here:
> > >
> > >
> >
> https://keyserver.ubuntu.com/pks/lookup?search=0x256F8FA2=on=index
> > > or to directly see the public key here:
> > >
> > >
> >
> https://keyserver.ubuntu.com/pks/lookup?op=get=0x9ebc0889d43fae16dd0d8f5ba2f50cf4256f8fa2
> > >
> > > The git tag is release-1.3.1-rc0 and signed with the same pgp key
> above:
> > >
> > >
> >
> https://gitbox.apache.org/repos/asf?p=samza.git;a=commit;h=7b849c047827587dec55ac169f41aac7321ce1cb
> > >
> > > Test binaries have been published to Maven's staging repository, and
> are
> > > available here:
> > > https://repository.apache.org/content/repositories/orgapachesamza-1074
> > >
> > > The vote will be open for 128 hours (ending at 8:00 PM PST Tuesday,
> > > 2/18/2020).
> > >
> > > Please download the release candidate, check the hashes/signature,
> build
> > it
> > > and test it, and then please vote:
> > >
> > > [ ] +1 approve
> > >
> > > [ ] +0 no opinion
> > >
> > > [ ] -1 disapprove (and reason why)
> > >
> > > I ran check-all.sh and integration tests (both YARN and standalone).
> > >
> > > +1 (non-binding) from my side.
> > >
> > > Thanks,
> > > Hai
> > >
> >
>

Re: [ANNOUNCE] Please welcome Bharath Kumarasubramanian to the Samza PMC

2020-02-14 Thread Prateek Maheshwari

Congrats Bharath!

On Fri, Feb 14, 2020 at 9:15 AM Sanil Jain  wrote:

> Congrats Bharath!
>
>
>
> On Fri, 14 Feb 2020 at 08:30, Daniel Nishimura 
> wrote:
>
> > Congrats Bharath!
> >
> >
> > > On Feb 13, 2020, at 10:46 PM, Yi Pan  wrote:
> > >
> > > Congrats! Bharath, well deserved!
> > >
> > > -Yi
> > >
> > >> On Thu, Feb 13, 2020 at 10:17 PM Wei Song  >
> > >> wrote:
> > >>
> > >> Congrats, Bharath, well deserved !!!
> > >>
> > >>
> > >> On 2/13/20, 8:51 PM, "Jagadish Venkatraman" 
> > >> wrote:
> > >>
> > >>Congrats Bharath. Great work! Looking forward to continued
> > >> contributions!
> > >>
> > >>>On Thursday, February 13, 2020, Yang Zhang 
> > wrote:
> > >>>
> > >>> Congratulations, Bharath! Nice work and thanks for the contributions!
> > >>>
> > >>> Best,
> > >>> Yang
> > >>>
> > >>> On Thu, Feb 13, 2020 at 4:27 PM Xinyu Liu 
> > >> wrote:
> > >>>
> >  Hi all,
> > 
> >  I'm very pleased to announce that the Samza PMC has voted Bharath
> >  Kumarasubramanian to be a Project Management Committee (PMC)
> > >> Member.  The
> >  PMC is responsible for the overall health of a project and for
> > >> voting in
> >  new committers and PMC members, as well as voting on releases.
> > >> Over the
> >  past few years, Bharath has been a valuable committer on the
> > >> project.
> > 
> >  Congrats Bharath!
> > 
> >  Thanks,
> >  Xinyu
> >  on behalf of the Samza PMC
> > 
> > >>>
> > >>
> > >>
> > >>--
> > >>Jagadish
> > >>
> > >>
> > >>
> >
>

Re: Samza consumer without Zookeeper

2020-02-10 Thread Prateek Maheshwari

Robert, from the stacktrace this looks like something in the application
logic is looking for that key (i.e. in the HelixApp constructor).

Thanks,
Prateek

On Mon, Feb 10, 2020 at 1:14 PM Robert Wigginton
 wrote:

> Yi,
>
> I think I am getting ahead of myself.  We are not getting past the job
> coordinator portion of the config.  Running 1.3 and defining
> job.coordinator.factory=org.apache.samza.standalone.PassthroughJobCoordinatorFactory
> in the properties file we still get the error below.
>
>
> Exception in thread "main" org.apache.samza.config.ConfigException:
> Missing key job.coordinator.zk.connect.
> at org.apache.samza.config.Config.getList(Config.java:168)
> at com.helixeducation.porter.HelixApp.(HelixApp.java:29)
> at
> com.helixeducation.porter.HelixRocksDBApp.(HelixRocksDBApp.java:16)
> at
> com.helixeducation.porter.inquirySubmission.InquirySubmissionApp.(InquirySubmissionApp.java:25)
> at
> com.helixeducation.porter.inquirySubmission.InquirySubmissionTaskRunner.main(InquirySubmissionTaskRunner.java:19)
>
> On 2/5/20, 6:32 PM, "Yi Pan"  wrote:
>
> Hi, Robert,
>
> Thanks! I believe since Samza 1.2, the ZK dependency from Kafka
> consumer is
> removed from Samza. I briefly checked the code base from master and
> there
> is no reference to KafkaConsumerConfig.getZkConnect(). Do you see any
> error
> messages when you remove the systems.x.consumer.zookeeper.connect from
> your
> config?
>
> -Yi
>
> On Wed, Feb 5, 2020 at 3:12 PM Robert Wigginton
>  wrote:
>
> > Currently 1.0.0 but are in the process of upgrading to 1.3.0.
> >
> > On Wed, 2020-02-05 at 15:01 -0800, Yi Pan wrote:
> > > Hi, Robert,
> > >
> > > Which version of Samza are you using?
> > >
> > > -Yi
> > >
> > > On Tue, Feb 4, 2020 at 9:17 AM Robert Wigginton
> > >  wrote:
> > >
> > > > Hello,
> > > >
> > > > We are currently evaluating moving from an on prem Kafka
> deployment
> > > > to
> > > > Confluent Cloud.  Confluent Cloud does not expose Zookeeper to
> > > > user.
> > > > Is it possible to setup a Samza consumer without the zookeeper
> > > > connect
> > > > option?
> > > >
> > > > Thanks,
> > > >
> > > > Robert
> > > >
> >
>
>
>

Re: Nullpointer After Upgrade from samza 1.0.0 to 1.3.0

2020-01-23 Thread Prateek Maheshwari

Brett, can you take a look at this?

- Prateek

On Wed, Jan 15, 2020 at 9:41 AM Jeremiah Adams
 wrote:

> I am updating our jobs to use samza 1.3.0. I'm getting a null pointer when
> manually committing via taskCoordinator.commit().
>
>
> Below is the stack trace - can anyone point me in the right direction?
>
> Thanks.
>
>
> 2020-01-15 10:33:35 RunLoop [ERROR] Task Partition 0 commit failed
> java.lang.NullPointerException
> at
> scala.collection.mutable.ArrayOps$ofRef$.newBuilder$extension(ArrayOps.scala:190)
> at
> scala.collection.mutable.ArrayOps$ofRef.newBuilder(ArrayOps.scala:186)
> at
> scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:246)
> at
> scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
> at scala.collection.mutable.ArrayOps$ofRef.filter(ArrayOps.scala:186)
> at
> org.apache.samza.storage.TransactionalStateTaskStorageManager$$anonfun$removeOldCheckpoints$2.apply(TransactionalStateTaskStorageManager.scala:94)
> at
> org.apache.samza.storage.TransactionalStateTaskStorageManager$$anonfun$removeOldCheckpoints$2.apply(TransactionalStateTaskStorageManager.scala:86)
> at
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
> at
> org.apache.samza.storage.TransactionalStateTaskStorageManager.removeOldCheckpoints(TransactionalStateTaskStorageManager.scala:86)
> at
> org.apache.samza.container.TaskInstance.commit(TaskInstance.scala:277)
> at
> org.apache.samza.container.RunLoop$AsyncTaskWorker$5.run(RunLoop.java:547)
> at
> org.apache.samza.container.RunLoop$AsyncTaskWorker.commit(RunLoop.java:566)
> at
> org.apache.samza.container.RunLoop$AsyncTaskWorker.run(RunLoop.java:432)
> at
> org.apache.samza.container.RunLoop$AsyncTaskWorker.access$300(RunLoop.java:357)
> at org.apache.samza.container.RunLoop.runTasks(RunLoop.java:244)
> at org.apache.samza.container.RunLoop.run(RunLoop.java:176)
> at
> org.apache.samza.container.SamzaContainer.run(SamzaContainer.scala:768)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
> at java.util.concurrent.FutureTask.run(FutureTask.java)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2020-01-15 10:33:35 RunLoop [ERROR] Caught throwable and stopping run loop
> java.lang.NullPointerException
> at
> scala.collection.mutable.ArrayOps$ofRef$.newBuilder$extension(ArrayOps.scala:190)
> at
> scala.collection.mutable.ArrayOps$ofRef.newBuilder(ArrayOps.scala:186)
> at
> scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:246)
> at
> scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
> at scala.collection.mutable.ArrayOps$ofRef.filter(ArrayOps.scala:186)
> at
> org.apache.samza.storage.TransactionalStateTaskStorageManager$$anonfun$removeOldCheckpoints$2.apply(TransactionalStateTaskStorageManager.scala:94)
> at
> org.apache.samza.storage.TransactionalStateTaskStorageManager$$anonfun$removeOldCheckpoints$2.apply(TransactionalStateTaskStorageManager.scala:86)
> at
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
> at
> org.apache.samza.storage.TransactionalStateTaskStorageManager.removeOldCheckpoints(TransactionalStateTaskStorageManager.scala:86)
> at
> org.apache.samza.container.TaskInstance.commit(TaskInstance.scala:277)
> at
> org.apache.samza.container.RunLoop$AsyncTaskWorker$5.run(RunLoop.java:547)
> at
> org.apache.samza.container.RunLoop$AsyncTaskWorker.commit(RunLoop.java:566)
> at
> org.apache.samza.container.RunLoop$AsyncTaskWorker.run(RunLoop.java:432)
> at
> org.apache.samza.container.RunLoop$AsyncTaskWorker.access$300(RunLoop.java:357)
> at org.apache.samza.container.RunLoop.runTasks(RunLoop.java:244)
> at org.apache.samza.container.RunLoop.run(RunLoop.java:176)
> at
> org.apache.samza.container.SamzaContainer.run(SamzaContainer.scala:768)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
> at java.util.concurrent.FutureTask.run(FutureTask.java)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>
>
> Jeremiah Adams
> Software Engineer
> www.helixeducation.com
> Blog | Twitter<
>

Stream Processing Meetup at LinkedIn (Sunnyvale) on Feb 5th

2020-01-22 Thread Prateek Maheshwari

Hi folks,

The next Bay Area Stream Processing meetup will be held on Wednesday,
February 5, 2020. This meetup will focus on Apache Kafka, Apache Samza and
related streaming technologies.

Where: Unify Conference room, 950 W Maude Ave, Sunnyvale
When: 5:00 - 8:00 PM
RSVP: link


Agenda:
5:00 PM: Doors open and catered food available

5:00 - 6:00 PM: Networking

6:00 - 6:30 PM: High-performance data replication at Salesforce with Mirus
Paul Davidson, Salesforce
At Salesforce we manage high-volume Apache Kafka clusters in a growing
number of data centers around the globe. In the past we relied on Kafka's
Mirror Maker tool for cross-data center replication but, as the volume and
variety of data increased, we needed a new solution to maintain a high
standard of service reliability. In this talk, we will describe Mirus, our
open-source data replication tool based on Kafka Connect. Mirus was
designed for reliable, high-performance data replication at scale. It
successfully replaced MirrorMaker at Salesforce and has now been running
reliably in production for more than a year. We will give an overview of
the Mirus design and discuss the lessons we learned deploying, tuning, and
operating Mirus in a high-volume production environment.

6:30 - 7:00 PM: Defending users from Abuse using Stream Processing at
LinkedIn
Bhargav Golla, LinkedIn
When there are more than half a billion users, how can one effectively,
reliably and scalably classify them as good and bad users? This talk will
highlight how Anti-Abuse team at LinkedIn leverages Streams Processing
techniques like Samza and Brooklin to keep the good users in a trusted
environment devoid of bad actors.

7:00 - 7:30 PM: Enabling Mission-critical Stateful Stream Processing with
Samza
Ray Manpreet Singh Matharu, LinkedIn
Samza powers a variety of large-scale business-critical stateful stream
processing applications at LinkedIn. Their scale necessitates using
persistent and replicated local state. Unfortunately, hard failures can
cause a loss of this local state, and re-caching it can incur downtime
ranging from a few minutes to hours! In this talk, we describe the systems
and protocols that we've devised that bound the down time to a few seconds.
We detail the tradeoffs our approach brings and how we tackle them in
production at LinkedIn.

7:30 - 8:00 PM: Additional networking and Q

If you are interested in attending, please RSVP via this meetup.com link

.

Hope to see you there!
Prateek

Re: [DISCUSS] SEP-24: Cluster-based Job Coordinator Dependency Isolation

2020-01-21 Thread Prateek Maheshwari

Thanks for addressing the feedback. I'm +1 for the proposal. Looking
forward to the ability to upgrade the framework independently from the
application.

- Prateek

On Mon, Jan 6, 2020 at 10:39 AM Cameron Lee  wrote:

> Hi all,
> I am just refreshing this thread to check if there is any further feedback
> on this SEP.
> Thank you,
> Cameron
>
> On Fri, Nov 22, 2019 at 11:24 AM Cameron Lee 
> wrote:
>
> > Hi all,
> > We created SEP-24: Cluster-based Job Coordinator Dependency Isolation.
> > Please find the SEP wiki here (
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-24%3A+Cluster-based+Job+Coordinator+Dependency+Isolation
> > ).
> > Please take a look and provide any feedback.
> > Thank you,
> > Cameron
> >
>

Re: [Draft] Samza quarterly report

2020-01-09 Thread Prateek Maheshwari

Looks good to me, thanks!

- Prateek

On Thu, Jan 9, 2020 at 1:43 PM Yi Pan  wrote:

> ## Description:
> - Apache Samza is a distributed stream processing engine that are highly
>   configurable to process events from various data sources, including
>   real-time messaging system (e.g. Kafka) and distributed file systems
> (e.g.
>   HDFS).
>
> ## Issues:
> - No issues require board attention
>
> ## Membership Data:
> Apache Samza was founded 2015-01-22 (5 years ago)
> There are currently 26 committers and 16 PMC members in this project.
> The Committer-to-PMC ratio is roughly 7:4.
>
> Community changes, past quarter:
> - No new PMC members. Last addition was Boris Shkolnik on 2019-06-06.
> - No new committers. Last addition was Rayman Preet Singh on 2019-07-08.
>
> ## Project Activity:
> - New version 1.3 was released on 12/05/2019
> - New features via SEPs (i.e. Samza Enhancement Proposals) are proposed
> continuously.
> In the last quarter, there are 4 new SEPs.
>
> ## Community Health:
> - We continue engage with new users via the Q on dev email lists.
> - We have Samza talks in many Conferences:
> Strange Loop - Riding the Stream Processing Wave
> Apache Beam Summit (Berlin) - Streaming Pipelines at Scale with Apache
> Beam and Samza
> ApacheCon North America - Samza 1.0: How we scaled stream processing at
> LinkedIn
> ApacheCon North America - Samza Portable Runner for Beam
> KubeCon North America - Running Apache Samza on Kubernetes
> - We have organized meetups with the following Samza Talks:
> Sunnyvale - Stream Processing in Python with Samza and Beam
> Sunnyvale - Apache Samza 1.0: Recent Advances and our plans for future
> in Stream Processing
> Seattle - Scalable Stream Processing with Apache Samza
>
>
> P.S. just fixing one typo.
>
> -Yi
>
> On Thu, Jan 9, 2020 at 1:42 PM Yi Pan  wrote:
>
> > ## Description:
> > - Apache Samza is a distributed stream processing engine that are highly
> >   configurable to process events from various data sources, including
> >   real-time messaging system (e.g. Kafka) and distributed file systems
> > (e.g.
> >   HDFS).
> >
> > ## Issues:
> > - No issues requires board attention
> >
> > ## Membership Data:
> > Apache Samza was founded 2015-01-22 (5 years ago)
> > There are currently 26 committers and 16 PMC members in this project.
> > The Committer-to-PMC ratio is roughly 7:4.
> >
> > Community changes, past quarter:
> > - No new PMC members. Last addition was Boris Shkolnik on 2019-06-06.
> > - No new committers. Last addition was Rayman Preet Singh on 2019-07-08.
> >
> > ## Project Activity:
> > - New version 1.3 was released on 12/05/2019
> > - New features via SEPs (i.e. Samza Enhancement Proposals) are proposed
> > continuously.
> > In the last quarter, there are 4 new SEPs.
> >
> > ## Community Health:
> > - We continue engage with new users via the Q on dev email lists.
> > - We have Samza talks in many Conferences:
> > Strange Loop - Riding the Stream Processing Wave
> > Apache Beam Summit (Berlin) - Streaming Pipelines at Scale with
> Apache
> > Beam and Samza
> > ApacheCon North America - Samza 1.0: How we scaled stream processing
> > at LinkedIn
> > ApacheCon North America - Samza Portable Runner for Beam
> > KubeCon North America - Running Apache Samza on Kubernetes
> > - We have organized meetups with the following Samza Talks:
> > Sunnyvale - Stream Processing in Python with Samza and Beam
> > Sunnyvale - Apache Samza 1.0: Recent Advances and our plans for
> future
> > in Stream Processing
> > Seattle - Scalable Stream Processing with Apache Samza
> >
> > If the above report looks good, I will submit today.
> >
> > Thanks a lot!
> >
> > -Yi
> >
> > On Thu, Jan 9, 2020 at 10:23 AM Prateek Maheshwari  >
> > wrote:
> >
> >> Thanks for preparing this Yi. We had the following Samza talks and
> meetups
> >> in 2019. Let's highlight them under Community Health:
> >>
> >> Conferences:
> >> Strange Loop - Riding the Stream Processing Wave
> >> Apache Beam Summit (Berlin) - Streaming Pipelines at Scale with Apache
> >> Beam
> >> and Samza
> >> ApacheCon North America - Samza 1.0: How we scaled stream processing at
> >> LinkedIn
> >> ApacheCon North America - Samza Portable Runner for Beam
> >> KubeCon North America - Running Apache Samza on Kubernetes
> >>
> >> Meetup Talks:
> >> Sunnyvale - Stream Pr

Re: [Draft] Samza quarterly report

2020-01-09 Thread Prateek Maheshwari

Thanks for preparing this Yi. We had the following Samza talks and meetups
in 2019. Let's highlight them under Community Health:

Conferences:
Strange Loop - Riding the Stream Processing Wave
Apache Beam Summit (Berlin) - Streaming Pipelines at Scale with Apache Beam
and Samza
ApacheCon North America - Samza 1.0: How we scaled stream processing at
LinkedIn
ApacheCon North America - Samza Portable Runner for Beam
KubeCon North America - Running Apache Samza on Kubernetes

Meetup Talks:
Sunnyvale - Stream Processing in Python with Samza and Beam
Sunnyvale - Apache Samza 1.0: Recent Advances and our plans for future in
Stream Processing
Seattle - Scalable Stream Processing with Apache Samza

On Thu, Jan 9, 2020 at 1:23 AM Yi Pan  wrote:

> Hi, all,
>
> Another time to report our project status. I have a draft below and would
> like input from the community to fill in some more details:
>
> ## Description:
> - Apache Samza is a distributed stream processing engine that are highly
>   configurable to process events from various data sources, including
>   real-time messaging system (e.g. Kafka) and distributed file systems
> (e.g.
>   HDFS).
>
> ## Issues:
> - No issues requires board attention
>
> ## Membership Data:
> Apache Samza was founded 2015-01-22 (5 years ago)
> There are currently 26 committers and 16 PMC members in this project.
> The Committer-to-PMC ratio is roughly 7:4.
>
> Community changes, past quarter:
> - No new PMC members. Last addition was Boris Shkolnik on 2019-06-06.
> - No new committers. Last addition was Rayman Preet Singh on 2019-07-08.
>
> ## Project Activity:
> - New version 1.3 was released on 12/05/2019
> *- [please add related project activities you know here]*
>
> ## Community Health:
> - The community is actively pushing new features via SEPs (i.e. Samza
>   Enhancement Proposals). In the last quarter, there are 4 new SEPs.
> - We continue engage with new users via the Q on dev email lists.
> *- [please add examples of community health indicators here, like new
> companies/users, new meetups/talks, new initiatives proposed and
> in-progress etc.]*
>

Re: [VOTE] SEP-26: Add SystemProducer for Azure Blob Storage

2020-01-08 Thread Prateek Maheshwari

+1 (binding). Thanks for the contribution.

- Prateek

On Tue, Jan 7, 2020 at 7:59 PM Jagadish Venkatraman 
wrote:

> +1 (binding), looking forward to Samza's integration with Azure blobs
>
> On Wednesday, January 8, 2020, Lakshmi Manasa 
> wrote:
>
> > Hi,
> >
> > This is a call for a vote on SEP-26: Add SystemProducer for Azure Blob
> > Storage.
> > Thanks for taking a look and giving feedback.
> >
> > I have addressed the comments on the SEP and since there were no major
> > questions/objections, starting this vote.
> >
> > Discussion thread:
> > http://mail-archives.apache.org/mod_mbox/samza-dev/202001.
> > mbox/%3CCAEwD47cW2T24C9A_tzj7Qxuv3P%2B2an47GkmaA4-
> > 41WZfvY_vgw%40mail.gmail.com%3E
> >
> > SEP:
> > https://cwiki.apache.org/confluence/display/SAMZA/SEP-
> > 26%3A+Azure+Blob+Storage+Producer
> >
> > Please vote:
> >
> > [ ] +1 approve
> >
> > [ ] +0 no opinion
> >
> > [ ] -1 disapprove (and reason why)
> >
> > Thanks,
> > Manasa
> >
>
>
> --
> Jagadish
>

[RESULT][VOTE] SEP-25: PR Title And Description Guidelines

2020-01-06 Thread Prateek Maheshwari

The vote on SEP-25 has been open for > 72 hours and has 4 binding and 4
non-binding +1s. SEP-25 is accepted.

Thanks,
Prateek

Re: [VOTE] SEP 25: PR Title and Description Guidelines

2020-01-06 Thread Prateek Maheshwari

The VOTE has been open for > 3 business days and has 3 binding +1s, so the
proposal has been accepted. I'll send out a RESULT email.

Thanks everyone for reviewing and voting.

- Prateek

On Fri, Dec 20, 2019 at 7:40 AM Yang Zhang  wrote:

> +1 (non-binding)
>
> On Thu, Dec 19, 2019 at 5:28 PM Jake Maes  wrote:
>
> > +1 (binding)
> >
> > On Thu, Dec 19, 2019, 10:59 AM  wrote:
> >
> > >
> > > +1 (non-binding)
> > >
> > > Thanks,
> > > Daniel C
> > >
> > > > On Dec 19, 2019, at 1:48 PM, Xinyu Liu 
> wrote:
> > > >
> > > > +1 (binding).
> > > >
> > > > Thanks,
> > > > Xinyu
> > > >
> > > >> On Wed, Dec 18, 2019 at 7:49 PM Jagadish Venkatraman <
> > > jagadish1...@gmail.com>
> > > >> wrote:
> > > >>
> > > >> +1 (binding)
> > > >>
> > > >> On Thursday, December 19, 2019, Daniel Nishimura <
> > dnishim...@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> +1 non-binding
> > > >>>
> > > >>>> On Dec 18, 2019, at 7:09 PM, Yi Pan  wrote:
> > > >>>>
> > > >>>> +1 (binding)
> > > >>>>
> > > >>>> On Wed, Dec 18, 2019 at 10:49 AM Bharath Kumara Subramanian <
> > > >>>> codin.mart...@gmail.com> wrote:
> > > >>>>
> > > >>>>> +1 (non-binding).
> > > >>>>>
> > > >>>>> Thanks,
> > > >>>>> Bharath
> > > >>>>>
> > > >>>>> On Wed, Dec 18, 2019 at 10:42 AM Prateek Maheshwari <
> > > >>> prate...@utexas.edu>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> Hi folks,
> > > >>>>>>
> > > >>>>>> This is a call for a vote on SEP 25: PR Title and Description
> > > >>> Guidelines.
> > > >>>>>> Thanks to everyone who helped review the proposal and provided
> > > >>> feedback.
> > > >>>>>>
> > > >>>>>> Feedback from the discussion is positive:
> > > >>>>>>
> > > >>>>>>
> > > >>>>> http://mail-archives.apache.org/mod_mbox/samza-dev/201912.mbox/%
> > > >>> 3CCAMja7KeQr9C048UVZwfSC46h%3DEX_9S%2BSEvMF9NPg0V5dPTPfZg%
> > > >>> 40mail.gmail.com%3E
> > > >>>>>> <
> > > >>>>>>
> > > >>>>>>
> > > >>>>> http://mail-archives.apache.org/mod_mbox/samza-dev/201912.mbox/%
> > > >>> 3CCAMja7KeQr9C048UVZwfSC46h%3DEX_9S%2BSEvMF9NPg0V5dPTPfZg%
> > > >>> 40mail.gmail.com%3E
> > > >>>>>> r <
> > > >>>>>
> > > >>
> > http://mail-archives.apache.org/mod_mbox/samza-dev/201912.mbox/browser
> > > >>>>>>>>
> > > >>>>>>
> > > >>>>>> SEP can be found at:
> > > >>>>>>
> > > >>>>>>
> > > >>>>> https://cwiki.apache.org/confluence/display/SAMZA/SEP-
> > > >>> 25%3A+PR+Title+And+Description+Guidelines
> > > >>>>>> <
> > > >>>>>>
> > > >>>>>>
> > > >>>>> https://cwiki.apache.org/confluence/display/SAMZA/SEP-
> > > >>> 25%3A+PR+Title+And+Description+Guidelines
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>> Please vote:
> > > >>>>>>
> > > >>>>>> [ ] +1 approve
> > > >>>>>>
> > > >>>>>> [ ] +0 no opinion
> > > >>>>>>
> > > >>>>>> [ ] -1 disapprove (and reason why)
> > > >>>>>>
> > > >>>>>> Thanks,
> > > >>>>>> Prateek
> > > >>>>>>
> > > >>>>>
> > > >>>
> > > >>
> > > >>
> > > >> --
> > > >> Jagadish
> > > >>
> > >
> >
>

Re: [DISCUSS] SEP-26: Add SystemProducer for Azure Blob Storage

2020-01-06 Thread Prateek Maheshwari

Thanks for the proposal Manasa, it looks good to me. If there are no major
questions or objections by EOD let's move to the VOTE.

Thanks,
Prateek

On Wed, Dec 18, 2019 at 4:27 PM Lakshmi Manasa 
wrote:

> Hi all,
>
> We created SEP-26: Add SystemProducer for Azure Blob Storage.
>
> Please find SEP here (
>
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-26%3A+Azure+Blob+Storage+Producer
> )
>
> Please take a look and provide feedback.
>
> thanks,
> Manasa
>

[VOTE] SEP 25: PR Title and Description Guidelines

2019-12-18 Thread Prateek Maheshwari

Hi folks,

This is a call for a vote on SEP 25: PR Title and Description Guidelines.
Thanks to everyone who helped review the proposal and provided feedback.

Feedback from the discussion is positive:
http://mail-archives.apache.org/mod_mbox/samza-dev/201912.mbox/%3CCAMja7KeQr9C048UVZwfSC46h%3DEX_9S%2BSEvMF9NPg0V5dPTPfZg%40mail.gmail.com%3E
 <
http://mail-archives.apache.org/mod_mbox/samza-dev/201912.mbox/%3CCAMja7KeQr9C048UVZwfSC46h%3DEX_9S%2BSEvMF9NPg0V5dPTPfZg%40mail.gmail.com%3E
r >

SEP can be found at:
https://cwiki.apache.org/confluence/display/SAMZA/SEP-25%3A+PR+Title+And+Description+Guidelines
 <
https://cwiki.apache.org/confluence/display/SAMZA/SEP-25%3A+PR+Title+And+Description+Guidelines
>

Please vote:

[ ] +1 approve

[ ] +0 no opinion

[ ] -1 disapprove (and reason why)

Thanks,
Prateek

Re: [DISCUSS] SEP 25: PR Title and Description Guidelines

2019-12-18 Thread Prateek Maheshwari

Thanks for the feedback folks. Since the feedback is positive, I'll start a
VOTE for the SEP.
@Jagadish Venkatraman , will add a link to this SEP
to the contributor's corner in the appropriate place.

Thanks,
Prateek

On Mon, Dec 16, 2019 at 12:09 PM Yi Pan  wrote:

> +1 (binding). lgtm. Thanks!
>
> -Yi
>
> On Mon, Dec 16, 2019 at 8:08 AM Daniel Nishimura 
> wrote:
>
> > +1. Thank Prateek for standardizing the PR process better.
> >
> > On Sun, Dec 15, 2019 at 10:55 PM Bharath Kumara Subramanian <
> > codin.mart...@gmail.com> wrote:
> >
> > > +1.  Template looks good to me.
> > > It will be really helpful to sift through, categorize and prepare
> release
> > > notes from notable PRs during releases.
> > >
> > > Thanks,
> > > Bharath
> > >
> > >
> > > On Fri, Dec 13, 2019 at 11:24 PM Jagadish Venkatraman <
> > > jagadish1...@gmail.com> wrote:
> > >
> > > > +1, thanks for the write-up Prateek.
> > > >
> > > > Let's also update the contributor's guidelines at:
> > > > https://samza.apache.org/contribute/contributors-corner.html
> > > >
> > > >
> > > > On Friday, December 13, 2019, Prateek Maheshwari <
> prateek...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Hi folks,
> > > > >
> > > > > In order to make Samza PR descriptions and commit messages more
> > > > consistent,
> > > > > informative and discoverable, we propose the following requirements
> > for
> > > > new
> > > > > PRs submitted to the Samza project
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/SAMZA/SEP-25%3A+
> > > > > PR+Title+And+Description+Guidelines
> > > > >
> > > > > Contributors should copy-paste and update the description template
> > when
> > > > > submitting PRs.
> > > > > Committers should ensure that the guidelines are followed before
> > > merging
> > > > > changes.
> > > > >
> > > > > Please take a look and let us know if you have any concerns or
> > > > suggestions.
> > > > >
> > > > > Thanks,
> > > > > Prateek
> > > > >
> > > >
> > > >
> > > > --
> > > > Jagadish
> > > >
> > >
> >
>

[DISCUSS] SEP 25: PR Title and Description Guidelines

2019-12-13 Thread Prateek Maheshwari

Hi folks,

In order to make Samza PR descriptions and commit messages more consistent,
informative and discoverable, we propose the following requirements for new
PRs submitted to the Samza project

https://cwiki.apache.org/confluence/display/SAMZA/SEP-25%3A+PR+Title+And+Description+Guidelines

Contributors should copy-paste and update the description template when
submitting PRs.
Committers should ensure that the guidelines are followed before merging
changes.

Please take a look and let us know if you have any concerns or suggestions.

Thanks,
Prateek

Re: [VOTE] SEP-23: Simplify Job Runner

2019-12-11 Thread Prateek Maheshwari

+1 (binding).

Thanks,
Prateek

On Wed, Dec 11, 2019 at 11:50 AM Xinyu Liu  wrote:

> +1 (binding). This proposal will help future split deployment as well as
> make the deployment simple. Thanks for making the effort!
>
> Thanks,
> Xinyu
>
> On Wed, Dec 11, 2019 at 10:30 AM Ke Wu  wrote:
>
> > Hi,
> >
> > This is a call for a vote on SEP-23: Simplify Job Runner. Thanks to
> > everyone who help review and refine the proposal.
> >
> > Feedbacks from discussion is positive
> > http://mail-archives.apache.org/mod_mbox/samza-dev/201912.mbox/browser <
> > http://mail-archives.apache.org/mod_mbox/samza-dev/201912.mbox/browser>
> >
> > SEP can be found at
> >
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-23%3A+Simplify+Job+Runner
> > <
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-23:+Simplify+Job+Runner
> > >
> >
> > Jira can be found at
> > https://issues.apache.org/jira/browse/SAMZA-2405 <
> > https://issues.apache.org/jira/browse/SAMZA-2405>
> >
> > Please vote:
> >
> > [ ] +1 approve
> >
> > [ ] +0 no opinion
> >
> > [ ] -1 disapprove (and reason why)
> >
> > Thanks,
> > Ke
>

Re: [DISCUSS] SEP-23: Simplify Job Runner

2019-12-10 Thread Prateek Maheshwari

Looks good to me as well. +1 for the overall proposal, and thanks for
putting it together.

- Prateek

On Tue, Dec 10, 2019 at 1:26 PM Xinyu Liu  wrote:

> Thanks for updating the SEP wiki. The revised design looks clear to me.
> Some config name might be simplified, e.g.
> job.config.loader.properties.path.
> Overall it looks good.
>
> Thanks,
> Xinyu
>
> On Fri, Dec 6, 2019 at 1:52 PM Ke Wu  wrote:
>
> > As we are revamping our config loading logic in SEP-23: Simplify Job
> > Runner, we will introduce backward incompatible changes on the job
> > submission logic, i.e. deprecating the usage of --config-factory and
> > --config-path, which reads a full config upon job submission, instead, we
> > will only provide job submission related config through --config and
> > ConfigLoader will load the complete config on AM instead.
> >
> > Please take a look and chime in your thoughts.
> >
> > Best,
> > Ke
> >
> > > On Dec 2, 2019, at 4:42 PM, Ke Wu  wrote:
> > >
> > > Hi Xinyu,
> > >
> > > Please see the response in line:
> > >
> > >>  1. After this change, seems the original config-factory and
> config-path
> > >>  are only used to supply parameters for submitting job. Is that the
> > case?
> > >>  Which configs are still needed in the submission?
> > >
> > > Yes, only configs related to job submission is needed.
> > >
> > > job.name, job.factory.class & yarn.package.path are the minimum three
> > configs needed for the job submission, which may be supplied by --config
> > instead.
> > >
> > >>  2. For backward compatibility, does it still work if the user doesn't
> > >>  specify the new ConfigLoader in the command line? The
> > >>  PropertiesConfigLoader class seems requiring the path of the config
> > after
> > >>  exploding the tgz.
> > >
> > > If the user does not specify config loader in the config, then it will
> > work in the previous flow, where runner publishes configs in coordinator
> > stream and job coordinator/application master will pick it up by reading
> > from Kafka. So this is a backward compatible change.
> > >
> > >>  3. If the final plan is to remove the original config factory/path,
> how
> > >>  do we pass the parameters needed for Yarn submission, e.g. job name,
> > id,
> > >>  and tgz path?
> > >
> > > We can either pass them by --config or introduce delicate command line
> > arguments for it in CommandLine.scala.
> > >
> > >
> > > Let me know if you have any further questions.
> > >
> > > Best,
> > > Ke
> > >
> > >> On Nov 27, 2019, at 11:02 AM, Xinyu Liu 
> wrote:
> > >>
> > >> Thanks a lot for putting out the design for simplifying the job
> > submission
> > >> process. The motivation makes sense to me that most of the planning
> and
> > >> config generation should be done after submitting to the cluster,
> > instead
> > >> of during the submission, which can happen in a local sandbox without
> > the
> > >> access to the resources needed for planning. It also improves the
> > process
> > >> from the security stand of the view.
> > >>
> > >> A few questions regarding to the interface changes:
> > >>
> > >>  1. After this change, seems the original config-factory and
> config-path
> > >>  are only used to supply parameters for submitting job. Is that the
> > case?
> > >>  Which configs are still needed in the submission?
> > >>  2. For backward compatibility, does it still work if the user doesn't
> > >>  specify the new ConfigLoader in the command line? The
> > >>  PropertiesConfigLoader class seems requiring the path of the config
> > after
> > >>  exploding the tgz.
> > >>  3. If the final plan is to remove the original config factory/path,
> how
> > >>  do we pass the parameters needed for Yarn submission, e.g. job name,
> > id,
> > >>  and tgz path?
> > >>
> > >> Thanks,
> > >> Xinyu
> > >>
> > >> On Fri, Nov 15, 2019 at 3:00 PM Ke Wu  wrote:
> > >>
> > >>> We created SEP-23: Simplify Job Runner, which simplifies job runner
> by
> > >>> moving config retrieval and planning to AM.
> > >>>
> > >>> Please find out the SEP wiki below:
> > >>>
> > >>>
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-23%3A+Simplify+Job+Runner
> > >>>
> > >>> Please take a look and chime in your thoughts.
> > >>>
> > >>> Thanks,
> > >>> Ke
> > >>>
> > >
> >
> >
>

Re: [VOTE] Apache Samza 1.3.0 RC2

2019-12-04 Thread Prateek Maheshwari

+ 1 (binding)

Verified the signatures, built and ran check-all and the integration
tests. All tests passed.

Thanks for co-ordinating the release.

- Prateek

On Mon, Dec 2, 2019 at 10:18 AM Xinyu Liu  wrote:

> + 1 (binding)
>
> Verified the signatures, built and ran the integration tests. All passed.
> There is one flaky test failure during running check-all.sh:
>
>
> org.apache.samza.table.batching.TestBatchProcessor$TestBatchTriggered.testBatchOperationTriggeredByBatchSize
> FAILED
> java.lang.AssertionError
> at org.junit.Assert.fail(Assert.java:86)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at org.junit.Assert.assertTrue(Assert.java:52)
> at
>
> org.apache.samza.table.batching.TestBatchProcessor$TestBatchTriggered.testBatchOperationTriggeredByBatchSize(TestBatchProcessor.java:122)
>
> This shouldn't block the release as the test is flaky. We should either fix
> or disable this test for the future releases. Create ticket to track:
> https://issues.apache.org/jira/browse/SAMZA-2411
>
> Thanks,
> Xinyu
>
>
>
> On Sun, Dec 1, 2019 at 6:20 PM Yi Pan  wrote:
>
> > +1 (binding), verified the signature, built and local integration tests
> > passed.
> >
> > Thanks!
> >
> > -Yi
> >
> > On Wed, Nov 27, 2019 at 2:49 PM Hai Lu  wrote:
> >
> > > Hi,
> > >
> > > This is a call for a vote on a release of Apache Samza 1.3.0. Thanks to
> > > everyone who has contributed to this release.
> > >
> > > The release candidate can be downloaded from here:
> > > http://home.apache.org/~lhaiesp/samza-1.3.0-rc2/
> > >
> > > The release candidate is signed with pgp key 0x07678C76, which can be
> > found
> > > here:
> > >
> > >
> >
> https://keyserver.ubuntu.com/pks/lookup?search=0x07678C76=on=index
> > > or to directly see the public key here:
> > >
> > >
> >
> https://keyserver.ubuntu.com/pks/lookup?op=get=0x1513eaedf69d7ca77ff283b534ea3ca507678c76
> > >
> > > The git tag is release-1.3.0-rc2 and signed with the same pgp key
> above:
> > >
> > >
> >
> https://gitbox.apache.org/repos/asf?p=samza.git;a=commit;h=573ef951dd9d96d9d547db86bbc8023557714f47
> > >
> > > Test binaries have been published to Maven's staging repository, and
> are
> > > available here:
> > > https://repository.apache.org/content/repositories/orgapachesamza-1073
> > >
> > > The vote will be open for 171 hours (ending at 6:00 PM PST Wednesday,
> > > 12/4/2019).
> > >
> > > Please download the release candidate, check the hashes/signature,
> build
> > it
> > > and test it, and then please vote:
> > >
> > > [ ] +1 approve
> > >
> > > [ ] +0 no opinion
> > >
> > > [ ] -1 disapprove (and reason why)
> > >
> > > I ran check-all.sh and integration tests (both YARN and standalone).
> > >
> > > +1 (non-binding) from my side.
> > >
> > > Thanks,
> > > Hai
> > >
> >
>

Re: Samza compaction policy

2019-11-06 Thread Prateek Maheshwari

Hi Malcolm,

Using cleanup.policy=compact on the Kafka checkpoint topic should be
sufficient, and is  the default when the topic is created by Samza. Under
normal operations, a checkpoint topic should only have ~ num task messages.

I can suggest the following ways to identify the issue:
1. Read the topic contents using kafka-console-consumer and check if the
extra size is due to incorrect entries (a second / non-samza writer), or
due to duplicate entries for the same key (log compaction issues).
2. If duplicate keys, verify if Kafka's log compaction is kicking in and
compacting stale entries. One evidence of this working is a sawtooth
pattern in the Kafka topic partition size graph. You can also check the
Kafka broker logs for any log compaction related error messages.
3. If log compaction isn't working, verify if the related Kafka topic /
broker configurations are appropriate. E.g, log.cleaner.enable,
log.cleaner.threads, min.cleanable.dirty.ratio, min/max.compaction.lag.ms,
delete.retention.ms etc.

Let us know if you are able to find any more details.

Thanks,
Prateek

On Tue, Nov 5, 2019 at 9:20 AM Malcolm McFarland 
wrote:

> Hey folks,
>
> We have cleanup.policy=compact set on our checkpoint topics. Even with
> this, we have almost 3 billion messages in some of these topics, and this
> is causing huge startup times. Are there any other settings we should set
> to optimize our startup times?
>
> Cheers,
> Malcolm McFarland
> Cavulus
>
>
> This correspondence is from HealthPlanCRM, LLC, d/b/a Cavulus. Any
> unauthorized or improper disclosure, copying, distribution, or use of the
> contents of this message is prohibited. The information contained in this
> message is intended only for the personal and confidential use of the
> recipient(s) named above. If you have received this message in error,
> please notify the sender immediately and delete the original message.
>

Re: [VOTE] SEP-18: Startpoints - Manipulating Starting Offsets for Input Streams

2019-09-06 Thread Prateek Maheshwari

+1 (binding). Thanks!

- Prateek

On Fri, Sep 6, 2019 at 8:32 AM Bharath Kumara Subramanian <
codin.mart...@gmail.com> wrote:

> +1 (non-binding)
>
> Looks good to me.
>
> On Fri, Sep 6, 2019 at 8:11 AM Jagadish Venkatraman <
> jagadish1...@gmail.com>
> wrote:
>
> > +1 (binding)
> >
> > Excellent work Dan! LGTM;
> >
> > On Fri, Sep 6, 2019 at 8:02 AM Daniel Nishimura 
> > wrote:
> >
> > > Please vote for SEP-18
> > > <
> > >
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-18%3A+Startpoints+-+Manipulating+Starting+Offsets+for+Input+Streams
> > > >.
> > > Thanks to the committers and contributors that were involved with the
> > > design, review and implementation!
> > >
> > > SEP-18 has been discussed and implemented using SAMZA-1983
> > > 
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-18%3A+Startpoints+-+Manipulating+Starting+Offsets+for+Input+Streams
> > >
> >
> >
> > --
> > Jagadish V,
> > Graduate Student,
> > Department of Computer Science,
> > Stanford University
> >
>

Re: "send to" ordering is inconsistent

2019-06-23 Thread Prateek Maheshwari

Just FYI Tom, this fix is available in the Samza 1.2 release:
https://samza.apache.org/releases/1.2.0

Thanks,
Prateek

On Tue, Mar 12, 2019 at 9:11 AM Tom Davis  wrote:
>
> Bummer! Yes, that works. The phase from "pending" to "running" is a
> nice-to-have at the moment (operations don't take long enough yet to
> warrant the extra state) so we just removed it for the time being.
>
> Prateek Maheshwari  writes:
>
> > Hi Tom,
> >
> > It looks like we won't be able to include SAMZA-2116 in the upcoming 1.1
> > release due to time constraints. It'll have to go in to the 1.2 release,
> > which will tentatively be in June. Does that still work for you?
> >
> > Thanks,
> > Prateek
> >
> > On Thu, Feb 28, 2019 at 2:16 PM Tom Davis  wrote:
> >
> >> Thanks, Prateek! Yes, the workaround will be fine for the time being.
> >> Thank you again!
> >>
> >> Prateek Maheshwari  writes:
> >>
> >> > Hi Tom,
> >> >
> >> > Thanks for reporting this. I created a ticket (SAMZA-2116
> >> > <https://issues.apache.org/jira/browse/SAMZA-2116>) to make the required
> >> > API changes. We'll include this in the next Samza release, which should
> >> be
> >> > mid to late next month.
> >> >
> >> > In the mean time, the workaround would be to keep all of this
> >> functionality
> >> > in a sink function. Does this work for you?
> >> >
> >> > Thanks,
> >> > Prateek
> >> >
> >> > On Wed, Feb 27, 2019 at 2:54 PM Tom Davis 
> >> wrote:
> >> >
> >> >>
> >> >> Prateek Maheshwari  writes:
> >> >>
> >> >> > Hi Tom,
> >> >> >
> >> >> > I'm assuming that the two sub-DAGs you're talking about are the two
> >> Map
> >> >> ->
> >> >> > Send To chains acting on the "audit-report-requests" input and sending
> >> >> > their results to the "audit-report-status" output.
> >> >> >
> >> >>
> >> >> Yes, that's correct.
> >> >>
> >> >> >
> >> >> > Although processing within each Task is in-order, the framework does
> >> not
> >> >> > guarantee the order in which the multiple chained operators for an
> >> >> operator
> >> >> > are evaluated. Specifically, in the current implementation
> >> >> > <
> >> >>
> >> https://github.com/apache/samza/blob/master/samza-core/src/main/java/org/apache/samza/operators/impl/OperatorImpl.java#L106
> >> >> >,
> >> >> > an Operator's registeredOperators are maintained as a HashSet of
> >> >> > OperatorImpls. This would explain the out-of-order appearance of the
> >> two
> >> >> > messages. I'm not sure what's changed in 1.0 that makes this trigger
> >> now.
> >> >> >
> >> >>
> >> >> Ah! I thought this was the case but I couldn't find the part of the code
> >> >> to prove it. This makes far more sense than Kafka routinely not
> >> >> committing messages in order (though it is still technically a
> >> >> possibility).
> >> >>
> >> >> Upon further investigation, I'm not convinced it's a 1.0 issue; I think
> >> >> we just started using multiple chained operators more heavily.
> >> >>
> >> >> >
> >> >> > Since both sendTo and sink are terminal operators (void return type),
> >> I
> >> >> > don't think you'll be able to easily get around this. Let me discuss
> >> this
> >> >> > with the team and get back to you with a workaround / fix.
> >> >> >
> >> >>
> >> >> Thanks a lot! <3
> >> >>
> >> >> >
> >> >> > Thanks,
> >> >> > Prateek
> >> >> >
> >> >> >
> >> >> > On Tue, Feb 26, 2019 at 7:08 PM Tom Davis 
> >> >> wrote:
> >> >> >
> >> >> >> Hey folks!
> >> >> >>
> >> >> >> We have noticed some inconsistencies in message ordering when
> >> running a
> >> >> >> StreamApplication that calls two separate `map` functions over an
> >> input
> >> >> >

Re: Anyone interested in Elasticsearch 5.x ?

2019-06-21 Thread Prateek Maheshwari

Hi Thunder,

Thanks for the offer to update the ES version!

What's the scope of these changes? Is this a client version upgrade
with minor changes, or a reimplementation of / significant changes to
the ElasticsearchSystemProducer/Consumer? Will the changes be
backwards compatible for existing users (looks like we'll be going
from ES 2.x to 5.x).
If these are small changes, and are backwards compatible for users,
we'll be happy to review and merge it. If it's the latter, would be
good to understand what the user impact / migration complexity would
be first.

Thanks,
Prateek

On Wed, Jun 5, 2019 at 10:07 PM Thunder Stumpges
 wrote:
>
> Sending out a feeler for if a PR to update the Elasticsearch module to ES
> 5.x and expand support from just IndexRequest to ActionRequest (allows
> Deletes from ES indices)  would be desired?
>
> We have had this for some time, and I recently went back through our
> customizations, and submitted a few other PRs, and saw this one. It does
> need a little work to make it up to Samza coding standards and more robust
> unit tests, but we've been using it in production for some time now.
>
> If there is interest, I will make a point to clean this up some and get in
> a PR...
>
> Let me know!
> Thunder Stumpges

Re: REMINDER. [VOTE] Apache Samza 1.2.0 RC4

2019-06-05 Thread Prateek Maheshwari

+1 (binding)

Verified build + check-all +  integration tests + signatures.
Thanks for help with the release, Boris and Pawas.

- Prateek

On Wed, Jun 5, 2019 at 3:03 AM Bharath Kumara Subramanian
 wrote:
>
> +1 (non-binding)
>
> Verified build and test on Linux. I too noticed some intermittent failures
> on mac for Scala 2.12.
>
> Thanks,
> Bharath
>
> On Tue, Jun 4, 2019 at 2:00 PM Hai Lu  wrote:
>
> > +1 (non-binding)
> >
> > Verified build and test on Linux box. On mac the test is failing but seems
> > like flakiness not real failure.
> >
> > Thanks,
> > Hai
> >
> > On Tue, Jun 4, 2019 at 1:55 PM santhosh venkat <
> > santhoshvenkat1...@gmail.com>
> > wrote:
> >
> > > +1(non-binding)
> > >
> > > 1. ./bin/check-all.sh succeeded
> > > 2. ./bin/integration-tests.sh succeeded
> > > 3. Expanded samza-tools and followed the tutorial steps for standalone
> > SQL
> > > examples Succeeded.
> > > 4. Verified all sha1 hash code and asc signatures successfully
> > >
> > > Thanks,
> > >
> > >
> > > On Tue, Jun 4, 2019 at 1:26 PM Xinyu Liu  wrote:
> > >
> > > > +1 (binding).
> > > >
> > > > run check-all.sh and the build passed.
> > > >
> > > > Having trouble running the integration tests in both linux and mac,
> > > > possibly due to my local machine env.
> > > >
> > > > Thanks,
> > > > Xinyu
> > > >
> > > > On Mon, Jun 3, 2019 at 11:00 AM Daniel Nishimura  > >
> > > > wrote:
> > > >
> > > > > check-all.sh and integration tests passed. +1 from me.
> > > > >
> > > > > Just a side note, the link in the original email is a broken link.
> > The
> > > > link
> > > > > to the RC archive is: http://home.apache.org/~boryas/samza-1.2.0-rc4
> > > > >
> > > > > On Sun, Jun 2, 2019 at 5:00 PM Boris Shkolnik 
> > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > This is a call for a vote on a release of Apache Samza 1.2.0.
> > Thanks
> > > to
> > > > > > everyone who has contributed to this release.
> > > > > >
> > > > > >
> > > > > > The release candidate can be downloaded from here:
> > > > > > http://home.apache.org/~boryas/samza-1.2.0-rc
> > > > > > 4
> > > > > >
> > > > > > (this release has a fix for standalone integration test)
> > > > > >
> > > > > > The release candidate is signed with pgp key 0x7D74D0CD5B5EB041,
> > > which
> > > > > can
> > > > > > be found
> > > > > >
> > > >
> > http://keyserver.ubuntu.com/pks/lookup?op=get=0x7d74d0cd5b5eb041
> > > > > > <
> > > >
> > http://keyserver.ubuntu.com/pks/lookup?op=get=0xF8B95961A401BF0F
> > > > > >
> > > > > > The git tag is release-1.2.0-rc4 and signed with the same pgp key:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.2.0-rc
> > > > > > <
> > > > > >
> > > > >
> > > >
> > >
> > https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.1.0-rc1
> > > > > > >
> > > > > > 4
> > > > > >
> > > > > > Test binaries have been published to Maven's staging repository,
> > and
> > > > are
> > > > > > available here:
> > > > > >
> > > https://repository.apache.org/content/repositories/orgapachesamza-106
> > > > > > <
> > > > > >
> > > > >
> > > >
> > >
> > https://repository.apache.org/content/repositories/orgapachesamza-1065/org/
> > > > > > >
> > > > > > 9
> > > > > >
> > > > > > The vote will be open until 06:00 PM PST Monday, 06/03/2019.
> > > > > >
> > > > > >
> > > > > > Please download the release candidate, check the hashes/signature,
> > > > build
> > > > > it
> > > > > > and test it, and then please vote:
> > > > > >
> > > > > > [ ] +1 approve
> > > > > >
> > > > > > [ ] +0 no opinion
> > > > > >
> > > > > > [ ] -1 disapprove (and reason why)
> > > > > >
> > > > > > I ran check-all.sh and integration tests.
> > > > > >
> > > > > > +1 from my side.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > >
> > > >
> > >
> >

Re: Limiting the job coordinator port range

2019-06-04 Thread Prateek Maheshwari

Hi Malcolm,

The JC server is currently started on a random port. This is because
multiple JCs can exist on the same host, so using a framework-wide
pre-defined port causes bind conflicts.
There's currently no way to narrow down the range or fix the port on a
per-job basis. If you want to add a config for specifying per-job AM
ports, please feel free to open a PR. The relevant code can be found
in SamzaYarnAppMasterService.

Thanks,
Prateek

On Tue, Jun 4, 2019 at 10:58 AM Malcolm McFarland
 wrote:
>
> Hey folks,
>
> Is there any way to specify which ports the tasks communicate with the job
> coordinator on? Right now it looks like it's 3+. Is there any way to
> narrow this?
>
> Cheers,
> Malcolm McFarland
> Cavulus
>
>
> This correspondence is from HealthPlanCRM, LLC, d/b/a Cavulus. Any
> unauthorized or improper disclosure, copying, distribution, or use of the
> contents of this message is prohibited. The information contained in this
> message is intended only for the personal and confidential use of the
> recipient(s) named above. If you have received this message in error,
> please notify the sender immediately and delete the original message.

Re: 0.14 to 1.x Low-Level application migration questions

2019-06-03 Thread Prateek Maheshwari

Hi Thunder,

I'm assuming you're talking about the low level (StreamTask) API here,
since the High Level API has stronger requirements for I/O
systems/streams.

> How much IS picked up from config.
All of the system, stream and store properties can still be specified
in configuration. Properties specified in config will override those
specified using descriptors (with a couple of exceptions like
task.inputs).

>  I don't see how to register [custom coordinator] system via the 
> ApplicationDescriptor
Re: dedicated coordinator system, you can continue to specify
'job.coordinator.system' and it's properties in configs. To keep the
API simple, we only support specifying the job.default.system (which
is the default system for intermediate, coordinator, changelog and
checkpoint streams) using descriptors for now.

> It seems that a KafkaSystem is only associated with the ApplicationDescriptor 
> via its Input/Output/Table descriptors.
Yeah, the ApplicationDescriptor is only aware of system descriptors
transitively through the input / output streams or the default system.
However see the response above for adding systems via configs.

> we have dynamic output SystemStream(s) created based on other runtime state
This will still work in Low Level API. It is recommended to, but
there's no hard requirement to pre-specify your output systems and
streams.

In general, when migrating your Low Level TaskApplication to Samza
1.0, you should be able to do
'applicationDescriptor.withTaskFactory(() -> new MyTask)' in your
TaskApplication#describe with no other code changes. Please give that
a shot and let us know if you run into any issues.

Apologies for the confusion, we'll update the upgrade docs.

Thanks,
Prateek

On Sat, Jun 1, 2019 at 11:13 AM Thunder Stumpges
 wrote:
>
> Hey guys,
>
> I'm following the guide here:
> http://samza.apache.org/releases/1.0.0
>
> In step 3 it says:
> "In Samza 1.0, a Samza application’s input, output, and processing-task
> should be specified in code, rather than in config. "
>
> How much IS picked up from config? Will all the configuration of the
> systems (consumer and producer properties, buffering, etc) be picked up
> from the config properties still? What about stream settings like offset
> reset, offset default, etc?
>
> In some of my tasks, I have a dedicated coordinator system. I don't see how
> to register that system via the ApplicationDescriptor, nor how to associate
> it with the coordinator (config setting `*job.coordinator.system*`). It
> seems that a KafkaSystem is only associated with the ApplicationDescriptor
> via its Input/Output/Table descriptors. Is this correct?
>
> I would like to keep my config in config, not in code, but it feels like
> this is forcing me to move some (or all?) of it into code. I had custom
> config re-writers which made this very flexible, but I'm not seeing how to
> adapt this to the "new way". The Application/ApplicationDescriptor seems to
> have no connection to the Configuration / properties...
>
> One other thing, is that in a few of my jobs, we have dynamic output
> SystemStream(s) created based on other runtime state. Is this not going to
> be possible anymore?
>
> A little more guidance would be most helpful.
>
> Thanks!
> Thunder Stumpges

Re: [DISCUSS] Change Apache Samza git comments/merge email recipient to commits@samza

2019-04-05 Thread Prateek Maheshwari

Thanks Xinyu. Separating discussions and commit messages sounds good to me.
I'm +1, but happy to keep it as-is if others find the commit emails useful.

- Prateek

On Thu, Apr 4, 2019 at 3:14 PM Xinyu Liu  wrote:

> Hi, All,
>
> Our dev mailing list has been flooded with github comments/merges so it's
> really hard to see the meaningful discussions and user engagement. Shall we
> move dev@Samza off the JIRA/github messages? We will use
> comm...@samza.apache.org as the recipients.
>
> In addition, I think we should have most of our open source discussions in
> the dev mailing list to raise visibility as well as attract contributors.
>
> Please let me know if you are ok or have objections to this change. It's
> a +1 on my side.
>
> Thanks,
> Xinyu
>

Re: Running w/ multiple CPUs/container on YARN

2019-04-02 Thread Prateek Maheshwari

And just to double check, you also changed the
yarn.resourcemanager.scheduler.class to CapacityScheduler?

On Tue, Apr 2, 2019 at 9:49 AM Prateek Maheshwari 
wrote:

> Is it still the same message from the AM? The one that says: "Got AM
> register response. The YARN RM supports container requests with max-mem:
> 14336, max-cpu: 1"
>
> On Tue, Apr 2, 2019 at 12:09 AM Malcolm McFarland 
> wrote:
>
>> Hey Prateek,
>>
>> The upgrade to Hadoop 2.7.6 went fine; everything seems to be working, and
>> access to S3 via an access key/secret pair is working as well. However, my
>> requested tasks are still only getting allocated 1 core, despite
>> requesting
>> more than that. Once again, I have a 3-node cluster that should have 24
>> vcores available; on the yarn side, I have these options set:
>>
>> nodemanager.resource.cpu-vcores=8
>> yarn.scheduler.minimum-allocation-vcore=1
>> yarn.scheduler.maximum-allocation-vcores=4
>>
>> yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
>>
>> And on the Samza side, I'm setting:
>>
>> cluster-manager.container.cpu.cores=2
>>
>> However, YARN is still telling me that the running task has 1 vcore
>> assigned. Do you have any other suggestions for options to tweak?
>>
>> Cheers,
>> Malcolm
>>
>>
>> On Mon, Apr 1, 2019 at 5:28 PM Malcolm McFarland 
>> wrote:
>>
>> > One more thing -- fwiw, I actually also came across the possibility
>> that I
>> > would need to use the DominantResourceCalculator, but as you point out,
>> > this doesn't seem to be available in Hadoop 2.6.
>> >
>> >
>> > On Mon, Apr 1, 2019 at 5:27 PM Malcolm McFarland <
>> mmcfarl...@cavulus.com>
>> > wrote:
>> >
>> >> That's quite helpful! I actually initially tried using a version of
>> >> Hadoop > 2.6.x; when I did, it seemed like the AWS credentials in YARN
>> >> (fs.s3a.access.key, fs.s3a.secret.key) weren't being accessed, as I
>> >> received lots of "No AWS Credentials
>> >> provided by DefaultAWSCredentialsProviderChain" messages. I found a
>> >> way around this by providing the credentials to the AM directly via
>> >> yarn.am.opts=-Daws.accessKeyId= -Daws.secretKey=, but
>> >> since this seemed very workaround-ish, I just assumed that I would
>> >> eventually hit other problems using a version of Hadoop not pinned in
>> >> the Samza repo. If you're running 2.7.x at LinkedIn, however, I'll
>> >> give it a shot again.
>> >>
>> >> Have you done any AWS credential integration, and if so, did you need
>> >> to do anything special to get it to work?
>> >>
>> >> Cheers,
>> >> Malcolm
>> >>
>> >>
>> >>
>> >> On Mon, Apr 1, 2019 at 5:20 PM Prateek Maheshwari <
>> prateek...@gmail.com>
>> >> wrote:
>> >> >
>> >> > Hi Malcolm,
>> >> >
>> >> > I think this is because in YARN 2.6 the FifoScheduler only accounts
>> for
>> >> > memory for 'maximumAllocation':
>> >> >
>> >>
>> https://github.com/apache/hadoop/blob/branch-2.6.2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java#L218
>> >> >
>> >> > This has been changed as early as 2.7.0:
>> >> >
>> >>
>> https://github.com/apache/hadoop/blob/branch-2.7.0/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java#L218
>> >> >
>> >> > So upgrading will likely fix this issue. For reference, at LinkedIn
>> we
>> >> are
>> >> > running YARN 2.7.2 with the CapacityScheduler
>> >> > <
>> >>
>> https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
>> >> >
>> >> > and DominantResourceCalculator to account for vcore allocations in
>> >> > scheduling.
>> >> >
>> >> > - Prateek
>> >> >
>> >> > On Mon, Apr 1, 2019 at 3:00 PM Malcolm McFarland <
>> >> mmcfarl...@cavulus.com>
>> >> > wrote:
>> >> >
>> >> > > Hi Prateek

Re: Running w/ multiple CPUs/container on YARN

2019-04-02 Thread Prateek Maheshwari

Is it still the same message from the AM? The one that says: "Got AM
register response. The YARN RM supports container requests with max-mem:
14336, max-cpu: 1"

On Tue, Apr 2, 2019 at 12:09 AM Malcolm McFarland 
wrote:

> Hey Prateek,
>
> The upgrade to Hadoop 2.7.6 went fine; everything seems to be working, and
> access to S3 via an access key/secret pair is working as well. However, my
> requested tasks are still only getting allocated 1 core, despite requesting
> more than that. Once again, I have a 3-node cluster that should have 24
> vcores available; on the yarn side, I have these options set:
>
> nodemanager.resource.cpu-vcores=8
> yarn.scheduler.minimum-allocation-vcore=1
> yarn.scheduler.maximum-allocation-vcores=4
>
> yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
>
> And on the Samza side, I'm setting:
>
> cluster-manager.container.cpu.cores=2
>
> However, YARN is still telling me that the running task has 1 vcore
> assigned. Do you have any other suggestions for options to tweak?
>
> Cheers,
> Malcolm
>
>
> On Mon, Apr 1, 2019 at 5:28 PM Malcolm McFarland 
> wrote:
>
> > One more thing -- fwiw, I actually also came across the possibility that
> I
> > would need to use the DominantResourceCalculator, but as you point out,
> > this doesn't seem to be available in Hadoop 2.6.
> >
> >
> > On Mon, Apr 1, 2019 at 5:27 PM Malcolm McFarland  >
> > wrote:
> >
> >> That's quite helpful! I actually initially tried using a version of
> >> Hadoop > 2.6.x; when I did, it seemed like the AWS credentials in YARN
> >> (fs.s3a.access.key, fs.s3a.secret.key) weren't being accessed, as I
> >> received lots of "No AWS Credentials
> >> provided by DefaultAWSCredentialsProviderChain" messages. I found a
> >> way around this by providing the credentials to the AM directly via
> >> yarn.am.opts=-Daws.accessKeyId= -Daws.secretKey=, but
> >> since this seemed very workaround-ish, I just assumed that I would
> >> eventually hit other problems using a version of Hadoop not pinned in
> >> the Samza repo. If you're running 2.7.x at LinkedIn, however, I'll
> >> give it a shot again.
> >>
> >> Have you done any AWS credential integration, and if so, did you need
> >> to do anything special to get it to work?
> >>
> >> Cheers,
> >> Malcolm
> >>
> >>
> >>
> >> On Mon, Apr 1, 2019 at 5:20 PM Prateek Maheshwari  >
> >> wrote:
> >> >
> >> > Hi Malcolm,
> >> >
> >> > I think this is because in YARN 2.6 the FifoScheduler only accounts
> for
> >> > memory for 'maximumAllocation':
> >> >
> >>
> https://github.com/apache/hadoop/blob/branch-2.6.2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java#L218
> >> >
> >> > This has been changed as early as 2.7.0:
> >> >
> >>
> https://github.com/apache/hadoop/blob/branch-2.7.0/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java#L218
> >> >
> >> > So upgrading will likely fix this issue. For reference, at LinkedIn we
> >> are
> >> > running YARN 2.7.2 with the CapacityScheduler
> >> > <
> >>
> https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
> >> >
> >> > and DominantResourceCalculator to account for vcore allocations in
> >> > scheduling.
> >> >
> >> > - Prateek
> >> >
> >> > On Mon, Apr 1, 2019 at 3:00 PM Malcolm McFarland <
> >> mmcfarl...@cavulus.com>
> >> > wrote:
> >> >
> >> > > Hi Prateek,
> >> > >
> >> > > This still seems to be manifesting with the same problem. Since this
> >> seems
> >> > > to be something in the hadoop codebase, and I've emailed the
> >> hadoop-dev
> >> > > mailing list about it.
> >> > >
> >> > > Cheers,
> >> > > Malcolm
> >> > >
> >> > > On Mon, Apr 1, 2019 at 1:51 PM Prateek Maheshwari <
> >> prateek...@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Hi Malcolm,
> >> > > >
> >> &g

Re: Running w/ multiple CPUs/container on YARN

2019-04-01 Thread Prateek Maheshwari

Hi Malcolm,

I think this is because in YARN 2.6 the FifoScheduler only accounts for
memory for 'maximumAllocation':
https://github.com/apache/hadoop/blob/branch-2.6.2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java#L218

This has been changed as early as 2.7.0:
https://github.com/apache/hadoop/blob/branch-2.7.0/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java#L218

So upgrading will likely fix this issue. For reference, at LinkedIn we are
running YARN 2.7.2 with the CapacityScheduler
<https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html>
and DominantResourceCalculator to account for vcore allocations in
scheduling.

- Prateek

On Mon, Apr 1, 2019 at 3:00 PM Malcolm McFarland 
wrote:

> Hi Prateek,
>
> This still seems to be manifesting with the same problem. Since this seems
> to be something in the hadoop codebase, and I've emailed the hadoop-dev
> mailing list about it.
>
> Cheers,
> Malcolm
>
> On Mon, Apr 1, 2019 at 1:51 PM Prateek Maheshwari 
> wrote:
>
> > Hi Malcolm,
> >
> > Yes, the AM is just reporting what the RM specified as the maximum
> allowed
> > request size.
> >
> > I think 'yarn.scheduler.maximum-allocation-vcores' needs to be less than
> > 'yarn.nodemanager.resource.cpu-vcores', since a container must fit on a
> > single NM. Maybe the RM detected this and decided to default to 1? Can
> you
> > try setting maximum-allocation-vcores lower?
> >
> > - Prateek
> >
> > On Mon, Apr 1, 2019 at 11:59 AM Malcolm McFarland <
> mmcfarl...@cavulus.com>
> > wrote:
> >
> > > One other detail: I'm running YARN on ECS in AWS. Has anybody seen
> > > issues with core allocation in this environment? I'm seeing this in
> > > the samza log:
> > >
> > > "Got AM register response. The YARN RM supports container requests
> > > with max-mem: 14336, max-cpu: 1"
> > >
> > > How does samza determine this? Looking at the Samza source on Github,
> > > it appears to be information that's passed back to the AM when it
> > > starts up.
> > >
> > > Cheers,
> > > Malcolm
> > >
> > > On Mon, Apr 1, 2019 at 10:44 AM Malcolm McFarland
> > >  wrote:
> > > >
> > > > Hi Prateek,
> > > >
> > > > Sorry, meant to include these versions with my email; I'm running
> > > > Samza 0.14 and Hadoop 2.6.1. I'm running three containers across 3
> > > > node managers, each with 16GB and 8 vcores. The other two containers
> > > > are requesting 1 vcore each; even with the AMs running, that should
> be
> > > > 4 for them in total, leaving plenty of processing power available.
> > > >
> > > > The error is in the application attempt diagnostics field: "The YARN
> > > > cluster is unable to run your job due to unsatisfiable resource
> > > > requirements. You asked for mem: 2048, and cpu: 2." I do not see this
> > > > error with the same memory request, but a cpu count request of 1.
> > > >
> > > > Here are the configuration options pertaining to resource allocation:
> > > >
> > > > 
> > > > 
> > > >   
> > > > yarn.resourcemanager.scheduler.class
> > > >
> > >
> >
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler
> > > >   
> > > >   
> > > > yarn.nodemanager.vmem-check-enabled
> > > > false
> > > >   
> > > >   
> > > > yarn.nodemanager.vmem-pmem-ratio
> > > > 2.1
> > > >   
> > > >   
> > > > yarn.nodemanager.resource.memory-mb
> > > > 14336
> > > >   
> > > >   
> > > > yarn.scheduler.minimum-allocation-mb
> > > > 256
> > > >   
> > > >   
> > > > yarn.scheduler.maximum-allocation-mb
> > > > 14336
> > > >   
> > > >   
> > > > yarn.scheduler.minimum-allocation-vcores
> > > > 1
> > > >   
> > > >   
> > > > yarn.scheduler.maximum-allocation-vcores
> > > > 16
> > > >   
> > > >   
> > > > yarn.nodemanager.reso

Re: Running w/ multiple CPUs/container on YARN

2019-04-01 Thread Prateek Maheshwari

Hi Malcolm,

Yes, the AM is just reporting what the RM specified as the maximum allowed
request size.

I think 'yarn.scheduler.maximum-allocation-vcores' needs to be less than
'yarn.nodemanager.resource.cpu-vcores', since a container must fit on a
single NM. Maybe the RM detected this and decided to default to 1? Can you
try setting maximum-allocation-vcores lower?

- Prateek

On Mon, Apr 1, 2019 at 11:59 AM Malcolm McFarland 
wrote:

> One other detail: I'm running YARN on ECS in AWS. Has anybody seen
> issues with core allocation in this environment? I'm seeing this in
> the samza log:
>
> "Got AM register response. The YARN RM supports container requests
> with max-mem: 14336, max-cpu: 1"
>
> How does samza determine this? Looking at the Samza source on Github,
> it appears to be information that's passed back to the AM when it
> starts up.
>
> Cheers,
> Malcolm
>
> On Mon, Apr 1, 2019 at 10:44 AM Malcolm McFarland
>  wrote:
> >
> > Hi Prateek,
> >
> > Sorry, meant to include these versions with my email; I'm running
> > Samza 0.14 and Hadoop 2.6.1. I'm running three containers across 3
> > node managers, each with 16GB and 8 vcores. The other two containers
> > are requesting 1 vcore each; even with the AMs running, that should be
> > 4 for them in total, leaving plenty of processing power available.
> >
> > The error is in the application attempt diagnostics field: "The YARN
> > cluster is unable to run your job due to unsatisfiable resource
> > requirements. You asked for mem: 2048, and cpu: 2." I do not see this
> > error with the same memory request, but a cpu count request of 1.
> >
> > Here are the configuration options pertaining to resource allocation:
> >
> > 
> > 
> >   
> > yarn.resourcemanager.scheduler.class
> >
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler
> >   
> >   
> > yarn.nodemanager.vmem-check-enabled
> > false
> >   
> >   
> > yarn.nodemanager.vmem-pmem-ratio
> > 2.1
> >   
> >   
> > yarn.nodemanager.resource.memory-mb
> > 14336
> >   
> >   
> > yarn.scheduler.minimum-allocation-mb
> > 256
> >   
> >   
> > yarn.scheduler.maximum-allocation-mb
> > 14336
> >   
> >   
> > yarn.scheduler.minimum-allocation-vcores
> > 1
> >   
> >   
> > yarn.scheduler.maximum-allocation-vcores
> > 16
> >   
> >   
> > yarn.nodemanager.resource.cpu-vcores
> > 8
> >   
> >   
> > yarn.resourcemanager.cluster-id
> > processor-cluster
> >   
> > 
> >
> > Cheers,
> > Malcolm
> >
> > On Mon, Apr 1, 2019 at 10:25 AM Prateek Maheshwari 
> wrote:
> > >
> > > Hi Malcolm,
> > >
> > > Just setting that configuration should be sufficient. We haven't seen
> this
> > > issue before. What Samza/YARN versions are you using? Can you also
> include
> > > the logs from where you get the error and your yarn configuration?
> > >
> > > - Prateek
> > >
> > > On Mon, Apr 1, 2019 at 2:33 AM Malcolm McFarland <
> mmcfarl...@cavulus.com>
> > > wrote:
> > >
> > > > Hey Folks,
> > > >
> > > > I'm having some issues getting multiple cores for containers in yarn.
> > > > I seem to have my YARN settings correct, and the RM interface says
> > > > that I have 24vcores available. However, when I set the
> > > > cluster-manager.container.cpu.cores Samza setting to anything other
> > > > than 1, I get a message about how the container is requesting more
> > > > resources than it can allocate. With 1 core, everything is fine. Is
> > > > there another Samza option I need to set?
> > > >
> > > > Cheers,
> > > > Malcolm
> > > >
> > > >
> > > > --
> > > > Malcolm McFarland
> > > > Cavulus
> > > >
> >
> >
> >
> > --
> > Malcolm McFarland
> > Cavulus
> > 1-800-760-6915
> > mmcfarl...@cavulus.com
> >
> >
> > This correspondence is from HealthPlanCRM, LLC, d/b/a Cavulus. Any
> > unauthorized or improper disclosure, copying, distribution, or use of
> > the contents of this message is prohibited. The information contained
> > in this message is intended only for the personal and confidential use
> > of the recipient(s) named above. If you have received this message in
> > error, please notify the sender immediately and delete the original
> > message.
>
>
>
> --
> Malcolm McFarland
> Cavulus
> 1-800-760-6915
> mmcfarl...@cavulus.com
>
>
> This correspondence is from HealthPlanCRM, LLC, d/b/a Cavulus. Any
> unauthorized or improper disclosure, copying, distribution, or use of
> the contents of this message is prohibited. The information contained
> in this message is intended only for the personal and confidential use
> of the recipient(s) named above. If you have received this message in
> error, please notify the sender immediately and delete the original
> message.
>

Re: Running w/ multiple CPUs/container on YARN

2019-04-01 Thread Prateek Maheshwari

Hi Malcolm,

Just setting that configuration should be sufficient. We haven't seen this
issue before. What Samza/YARN versions are you using? Can you also include
the logs from where you get the error and your yarn configuration?

- Prateek

On Mon, Apr 1, 2019 at 2:33 AM Malcolm McFarland 
wrote:

> Hey Folks,
>
> I'm having some issues getting multiple cores for containers in yarn.
> I seem to have my YARN settings correct, and the RM interface says
> that I have 24vcores available. However, when I set the
> cluster-manager.container.cpu.cores Samza setting to anything other
> than 1, I get a message about how the container is requesting more
> resources than it can allocate. With 1 core, everything is fine. Is
> there another Samza option I need to set?
>
> Cheers,
> Malcolm
>
>
> --
> Malcolm McFarland
> Cavulus
>

Re: SSL with Samza 0.14.1?

2019-03-27 Thread Prateek Maheshwari

+ Xinyu. I think he's still working on merging the remaining 1.0 changes to
the Beam runner.

On Tue, Mar 26, 2019 at 3:35 PM LeVeck, Matt  wrote:

> Thanks Prateek.  I think I’ve switched all of the relevant libraries to
> Beam Runner 2.12-SNAPSHOTl, and Samza 1.1.0.  But for some reason it’s
> still looking for a class, ContextManager that existed in Samza API 0.14.1,
> and not in Samza 1.1.0.  Any idea off the top of your head what I need to
> change for it to stop looking for that?
>
> Error:
> 2019-03-26 22:13:13 DEBUG PipelineOptionsFactory:296 - Provided Arguments:
> {runner=[SamzaRunner], jobName=[spp-kabini-transformer],
> maxSourceParallelism=[2], configFilePath=[config.properties]}
>
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/samza/operators/ContextManager
>
> at java.lang.Class.getDeclaredMethods0(Native Method)
>
> at
> java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
>
> at java.lang.Class.getDeclaredMethod(Class.java:2128)
>
> at
> org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:191)
>
> at
> org.apache.beam.sdk.util.InstanceBuilder.build(InstanceBuilder.java:155)
>
> at
> org.apache.beam.sdk.PipelineRunner.fromOptions(PipelineRunner.java:55)
>
> at org.apache.beam.sdk.Pipeline.create(Pipeline.java:145)
>
> at
> com.intuit.idp.kabini.transformer.SampleJoiner.main(SampleJoiner.java:257)
>
> Caused by: java.lang.ClassNotFoundException:
> org.apache.samza.operators.ContextManager
>
> at
> java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>
> at
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
>
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>
> ... 8 more
>
> Relevant snippets of pom:
> 
> 1.5.1
> 2.12.0-SNAPSHOT
> 1.1.0
>
>
>
> …
>
> And
> 
> org.apache.samza
> samza-api
> ${samza.version}
> jar
> 
>
> 
> org.apache.samza
> samza-core_2.12
> ${samza.version}
> 
>
> 
> org.apache.samza
> samza-kafka_2.12
> ${samza.version}
> runtime
> 
>
> 
> org.apache.samza
> samza-kv_2.12
> ${samza.version}
> runtime
> 
>
> 
> org.apache.samza
> samza-kv-rocksdb_2.12
> ${samza.version}
> runtime
> 
>
>
>
>
>
> *From: *Prateek Maheshwari 
> *Date: *Tuesday, March 26, 2019 at 11:31 AM
> *To: *"LeVeck, Matt" 
> *Cc: *"dev@samza.apache.org" , "Deshpande, Omkar" <
> omkar_deshpa...@intuit.com>, "Audo, Nicholas" 
> *Subject: *Re: SSL with Samza 0.14.1?
>
>
>
> This email is from an external sender.
>
>
>
> Hi Matt,
>
>
>
> You're right, the KafkaSystemConsumer in 0.14.1 does not support SSL since
> it uses SimpleConsumer in the BrokerProxy. The new KafkaSystemConsumer in
> Samza 1.0 does.
>
>
>
> Backporting this to 14.1 will be non-trivial. Can you upgrade to Samza 1.0
> to pick up the new consumer? I think Xinyu already shared a beam runner
> snapshot build with 1.0 that you can try.
>
>
>
> - Prateek
>
>
>
> On Mon, Mar 25, 2019 at 1:58 PM LeVeck, Matt 
> wrote:
>
> So, this is far from definitive for SSL.  However it is consistent with
> what we would expect from an SSL error.  We don’t get the error if I
> instantiate a full fledged consumer, or use kafka-console-consumer with SSL
> configs.  We do see this error if we try a console consumer with the
> deprecated interface of providing the zookeeper addresses instead of the
> broker addresses.  That, combined with the code I linked to in my previous
> message (where samza builds a consumer that doesn’t take, and is not passed
> any SSL configs) is why we think SSL is the issue.
>
> Thanks,
>
> Matt
> 2019-03-25 20:53:41 DEBUG KafkaSystemAdmin:57 - Exception detail:
> kafka.common.KafkaException: fetching topic metadata for topics
> [Set(__samza_checkpoint_ver_1_for_spp-kabini-transformer_42)] from broker
> [ArrayBuffer(BrokerEndPoint(0,sppkafka.data-lake-dev.a.intuit.com,19701),
> BrokerEndPoint(2,sppkafka.data-lake-dev.a.intuit.com,19901),
> BrokerEndPoint(1,sppkafka.data-lake-dev.a.intuit.com,19801))] failed at
> kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:75) at
> kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:96) at
> org.apache.samza.ut

Re: SSL with Samza 0.14.1?

2019-03-25 Thread Prateek Maheshwari

Hi Matt,

It's possible that the old Kafka AdminClient does not support SSL for ZK
out of the box. I'll check if this is the case, and if this is something
that can be configured.

In the mean time, can you tell us the following:
1. Kafka broker version you're running.
2. Kafka client version for the job.
3. Stacktrace where you see the SSL connect errors.

Thanks,
Prateek



On Mon, Mar 25, 2019 at 9:47 AM Prateek Maheshwari 
wrote:

> Forwarding again. Original email did not show up on the OSS mailing list.
>
> -- Forwarded message -
> From: Deshpande, Omkar 
> Date: Fri, Mar 22, 2019 at 5:08 PM
> Subject: Fwd: SSL with Samza 0.14.1?
> To: prateek...@gmail.com 
>
>
> ++Prateek gmail
> --
> *From:* LeVeck, Matt
> *Sent:* Thursday, March 21, 2019 10:33:11 PM
> *To:* dev@samza.apache.org; pmaheshw...@linkedin.com; Deshpande, Omkar;
> Audo, Nicholas
> *Subject:* SSL with Samza 0.14.1?
>
>
> Prateek, Samza dev team,
>
> This is Matt from Intuit.  We met briefly at the beginning of this
> week’s meetup.  I’m wondering if you could help give us some guidance on
> Kafka SSL with Samza.  Here, I’m talking about the Kafka cluster that Samza
> uses to store checkpoints, etc.  We’re trying to connect to a cluster that
> has SSL enabled, and we’re getting some errors that are indicative of SSL
> connectivity failing.  It might just be that our properties file isn’t
> correct.  But we’re a wondering if there is another possibility. This
> indicates that Samza 0.14.1 uses Kafka 0.11 which should have SSL support.
> But Samza 0.14.1 also requires access to zookeeper for its consumer client,
> which is indicative of older clients (see
> https://samza.apache.org/learn/documentation/0.14/jobs/configuration-table.html#kafka).
> Is it possible that Samza 0.14.1 doesn’t support SSL for Kafka when
> creating its checkpoint topics?
>
> Anyways, I’m hoping that’s not the case, and either our config is wrong or
> we’re doing something else wrong.  Here is our properties snippet in case
> we’ve messed up the config key names.  Any guidance is appreciated.
>
>
> # Kafka System
>
> systems.kafka.zookeeper.connect=
> sppzookeeper.data-lake-dev.a.intuit.com:2181,
> sppzookeeper.data-lake-dev.a.intuit.com:2182,
> sppzookeeper.data-lake-dev.a.intuit.com:2183
>
> systems.kafka.security.protocol=SSL
>
> systems.kafka.ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1
>
> systems.kafka.ssl.truststore.type=JKS
>
> systems.kafka.ssl.truststore.location=/home/appuser/spp/kabini.jks
>
> systems.kafka.ssl.truststore.password=Intuit01
>
> systems.kafka.bootstrap.servers=sppkafka.data-lake-dev.a.intuit.com:19701,
> sppkafka.data-lake-dev.a.intuit.com:19801,
> sppkafka.data-lake-dev.a.intuit.com:19901
>
>
> systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
>
>
>
> We’ve also tried adding producer and consumer specific entries:
>
>
>
> systems.kafka.producer.security.protocol=SSL
>
> systems.kafka.producer.ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1
>
> systems.kafka.producer.ssl.truststore.type=JKS
>
> systems.kafka.producer.ssl.truststore.location=/home/appuser/spp/kabini.jks
>
> systems.kafka.producer.ssl.truststore.password=Intuit01
>
> systems.kafka.producer.bootstrap.servers=
> sppkafka.data-lake-dev.a.intuit.com:19701,
> sppkafka.data-lake-dev.a.intuit.com:19801,
> sppkafka.data-lake-dev.a.intuit.com:19901
>
> systems.kafka.consumer.zookeeper.connect=
> sppzookeeper.data-lake-dev.a.intuit.com:2181,
> sppzookeeper.data-lake-dev.a.intuit.com:2182,
> sppzookeeper.data-lake-dev.a.intuit.com:2183
>
> systems.kafka.consumer.security.protocol=SSL
>
> systems.kafka.consumer.ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1
>
> systems.kafka.consumer.ssl.truststore.type=JKS
>
> systems.kafka.consumer.ssl.truststore.location=/home/appuser/spp/kabini.jks
>
> systems.kafka.consumer.ssl.truststore.password=Intuit01
>
> systems.kafka.consumer.bootstrap.servers=
> sppkafka.data-lake-dev.a.intuit.com:19701,
> sppkafka.data-lake-dev.a.intuit.com:19801,
> sppkafka.data-lake-dev.a.intuit.com:19901
>
> systems.kafka.zookeeper.connect=
> sppzookeeper.data-lake-dev.a.intuit.com:2181,
> sppzookeeper.data-lake-dev.a.intuit.com:2182,
> sppzookeeper.data-lake-dev.a.intuit.com:2183
>
> systems.kafka.security.protocol=SSL
>
> systems.kafka.ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1
>
> systems.kafka.ssl.truststore.type=JKS
>
> systems.kafka.ssl.truststore.location=/home/appuser/spp/kabini.jks
>
> systems.kafka.ssl.truststore.password=Intuit01
>
> systems.kafka.bootstrap.servers=sppkafka.data-lake-dev.a.intuit.com:19701,
> sppkafka.data-lake-dev.a.intuit.com:19801,
> sppkafka.data-lake-dev.a.intuit.com:19901
>
> Thanks,
>
> Matt
>

Fwd: SSL with Samza 0.14.1?

2019-03-25 Thread Prateek Maheshwari

Forwarding again. Original email did not show up on the OSS mailing list.

-- Forwarded message -
From: Deshpande, Omkar 
Date: Fri, Mar 22, 2019 at 5:08 PM
Subject: Fwd: SSL with Samza 0.14.1?
To: prateek...@gmail.com 

++Prateek gmail
--
*From:* LeVeck, Matt
*Sent:* Thursday, March 21, 2019 10:33:11 PM
*To:* dev@samza.apache.org; pmaheshw...@linkedin.com; Deshpande, Omkar;
Audo, Nicholas
*Subject:* SSL with Samza 0.14.1?

Prateek, Samza dev team,

This is Matt from Intuit.  We met briefly at the beginning of this
week’s meetup.  I’m wondering if you could help give us some guidance on
Kafka SSL with Samza.  Here, I’m talking about the Kafka cluster that Samza
uses to store checkpoints, etc.  We’re trying to connect to a cluster that
has SSL enabled, and we’re getting some errors that are indicative of SSL
connectivity failing.  It might just be that our properties file isn’t
correct.  But we’re a wondering if there is another possibility. This
indicates that Samza 0.14.1 uses Kafka 0.11 which should have SSL support.
But Samza 0.14.1 also requires access to zookeeper for its consumer client,
which is indicative of older clients (see
https://samza.apache.org/learn/documentation/0.14/jobs/configuration-table.html#kafka).
Is it possible that Samza 0.14.1 doesn’t support SSL for Kafka when
creating its checkpoint topics?

Anyways, I’m hoping that’s not the case, and either our config is wrong or
we’re doing something else wrong.  Here is our properties snippet in case
we’ve messed up the config key names.  Any guidance is appreciated.

# Kafka System

systems.kafka.zookeeper.connect=sppzookeeper.data-lake-dev.a.intuit.com:2181
,sppzookeeper.data-lake-dev.a.intuit.com:2182,
sppzookeeper.data-lake-dev.a.intuit.com:2183

systems.kafka.security.protocol=SSL

systems.kafka.ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1

systems.kafka.ssl.truststore.type=JKS

systems.kafka.ssl.truststore.location=/home/appuser/spp/kabini.jks

systems.kafka.ssl.truststore.password=Intuit01

systems.kafka.bootstrap.servers=sppkafka.data-lake-dev.a.intuit.com:19701,
sppkafka.data-lake-dev.a.intuit.com:19801,
sppkafka.data-lake-dev.a.intuit.com:19901

systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory

We’ve also tried adding producer and consumer specific entries:

systems.kafka.producer.security.protocol=SSL

systems.kafka.producer.ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1

systems.kafka.producer.ssl.truststore.type=JKS

systems.kafka.producer.ssl.truststore.location=/home/appuser/spp/kabini.jks

systems.kafka.producer.ssl.truststore.password=Intuit01

systems.kafka.producer.bootstrap.servers=
sppkafka.data-lake-dev.a.intuit.com:19701,
sppkafka.data-lake-dev.a.intuit.com:19801,
sppkafka.data-lake-dev.a.intuit.com:19901

systems.kafka.consumer.zookeeper.connect=
sppzookeeper.data-lake-dev.a.intuit.com:2181,
sppzookeeper.data-lake-dev.a.intuit.com:2182,
sppzookeeper.data-lake-dev.a.intuit.com:2183

systems.kafka.consumer.security.protocol=SSL

systems.kafka.consumer.ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1

systems.kafka.consumer.ssl.truststore.type=JKS

systems.kafka.consumer.ssl.truststore.location=/home/appuser/spp/kabini.jks

systems.kafka.consumer.ssl.truststore.password=Intuit01

systems.kafka.consumer.bootstrap.servers=
sppkafka.data-lake-dev.a.intuit.com:19701,
sppkafka.data-lake-dev.a.intuit.com:19801,
sppkafka.data-lake-dev.a.intuit.com:19901

systems.kafka.zookeeper.connect=sppzookeeper.data-lake-dev.a.intuit.com:2181
,sppzookeeper.data-lake-dev.a.intuit.com:2182,
sppzookeeper.data-lake-dev.a.intuit.com:2183

systems.kafka.security.protocol=SSL

systems.kafka.ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1

systems.kafka.ssl.truststore.type=JKS

systems.kafka.ssl.truststore.location=/home/appuser/spp/kabini.jks

systems.kafka.ssl.truststore.password=Intuit01

systems.kafka.bootstrap.servers=sppkafka.data-lake-dev.a.intuit.com:19701,
sppkafka.data-lake-dev.a.intuit.com:19801,
sppkafka.data-lake-dev.a.intuit.com:19901

Thanks,

Matt

Re: Error handling

2019-03-22 Thread Prateek Maheshwari

Hi Tom,

This sounds like a bug. ApplicationRunner should return the correct status
when the processor has shut down. We fixed a similar standalone bug
recently, are you already using Samza 1.0.
If this is reproducible / happens again, a thread dump + logs would also be
very helpful for debugging and verifying if the issue is already fixed.

Thanks,
Prateek

On Fri, Mar 22, 2019 at 7:23 AM Tom Davis  wrote:

>
> Prateek Maheshwari  writes:
>
> > Hi Tom,
> >
> > This would depend on what your k8s container orchestration logic looks
> > like. For example, in YARN, 'status' returns 'not running' after 'start'
> > until all the containers requested from the AM are 'running'. We also
> > leverage YARN to restart containers/job automatically on failures (within
> > some bounds). Additionally, we set up a monitoring alert that goes off if
> > the number of running containers stays lower than the number of expected
> > containers for extended periods of time (~ 5 minutes).
> >
> > Are you saying that you noticed that the LocalApplicationRunner status
> > returns 'running' even if its stream processor / SamzaContainer has
> stopped
> > processing?
> >
>
> Yeah, this is what I mean. We have a health check for the overall
> ApplicationStatus but if the containers enter a failed state that
> doesn't result in a shut down of the runner itself. An example from last
> night: Kafka became unavailable at some point and Samza failed to write
> checkpoints for a while, ultimately leading to container failures. The
> last log line is:
>
> o.a.s.c.SamzaContainer - Shutdown is no-op since the container is already
> in
> state: FAILED
>
> This doesn't cause the Pod to be killed, though, so we just silently
> stop processing events. How do you determine the number of expected
> containers? Or are you speaking of containers in terms of YARN and not
> Samza processors?
>
> >
> > - Prateek
> >
> > On Fri, Mar 15, 2019 at 7:26 AM Tom Davis 
> wrote:
> >
> >> I'm using the LocalApplicationRunner and had added a liveness check
> >> around the `status` method. The app is running in Kubernetes so, in
> >> theory, it could be restarted if exceptions happened during processing.
> >> However, it seems that "container failure" is divorced from "app
> >> failure" because the app continues to run even after all the task
> >> containers have shut down. Is there a better way to check for
> >> application health? Is there a way to shut down the application if all
> >> containers have failed? Should I simply ensure exceptions never escape
> >> operators? Thanks!
> >>
>

Re: [VOTE] Apache Samza 1.1.0 RC2

2019-03-18 Thread Prateek Maheshwari

1. Verified checksum and signatures for the binaries.
2. Ran ./check-all.sh
3. Ran YARN and Standalone integration tests with the config patch
successfully.

+1(binding) from my side as well.

Thanks,
Prateek

On Mon, Mar 18, 2019 at 2:06 PM Jagadish Venkatraman 
wrote:

> 1. Verified check-sum and signatures for the release binaries.
> 2. Ran ./check-all.sh successfully
> 3. Ran YARN integration tests successfully
> 4. Encountered an error on the standalone integration test, but it
> succeeded after setting Kafka's replication factor config to 1.
>
> +1(binding) from my side.
>
> Thanks Daniel Chen and Shanthoosh for shepherding Samza 1.0.1!
>
> On Mon, Mar 18, 2019 at 9:47 AM Jake Maes  wrote:
>
> > Verified with check-all on RHEL 7
> >
> > Verified pgp and sha.
> >
> > +1 (binding)
> >
> > On Fri, Mar 15, 2019 at 11:39 AM rayman preet 
> > wrote:
> >
> > > +1 (Non-binding)
> > >
> > > --
> > > thanks
> > > rayman
> > >
> > > On Wed, Mar 13, 2019 at 7:17 PM Daniel Chen  wrote:
> > >
> > > > Hi,
> > > >
> > > > I performed the following verifications:
> > > >
> > > > 1. ./bin/check-all.sh succeeded.
> > > >
> > > > 2. Verified both ./bin/integration-tests.sh yarn-integration-tests
> and
> > > > ./bin/integration-tests.sh standalone-integration-tests succeeded.
> > > >
> > > > 3. Verified that SQL console available in samza-tool.tgz.
> > > >
> > > > +1 (Non-binding)
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > Daniel
> > > >
> > > >
> > > > On Tue, Mar 12, 2019 at 4:11 PM santhosh venkat <
> > > > santhoshvenkat1...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > This is a call for a vote on a release of Apache Samza 1.1.0.
> Thanks
> > to
> > > > > everyone who has contributed to this release.
> > > > >
> > > > > The release candidate can be downloaded from here:
> > > > > http://home.apache.org/~shanthoosh/samza-1.1.0-rc2/
> > > > >
> > > > > The release candidate is signed with pgp key 0xF8B95961A401BF0F,
> > which
> > > > can
> > > > > be found
> > > > >
> > >
> http://keyserver.ubuntu.com/pks/lookup?op=get=0xF8B95961A401BF0F
> > > > >
> > > > > The git tag is release-1.1.0-rc0 and signed with the same pgp key:
> > > > >
> > > > >
> > > >
> > >
> >
> https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.1.0-rc2
> > > > >
> > > > > Test binaries have been published to Maven's staging repository,
> and
> > > are
> > > > > available here:
> > > > >
> > >
> https://repository.apache.org/content/repositories/orgapachesamza-1060/
> > > > >
> > > > > The vote will be open for 72 hours (ending at 16:30 PM PST
> Thursday,
> > > > > 03/15/2018).
> > > > >
> > > > > Please download the release candidate, check the hashes/signature,
> > > build
> > > > it
> > > > > and test it, and then please vote:
> > > > >
> > > > > [ ] +1 approve
> > > > >
> > > > > [ ] +0 no opinion
> > > > >
> > > > > [ ] -1 disapprove (and reason why)
> > > > >
> > > > > I ran check-all.sh, integration tests and verified the SQL console
> > > > > in samza-tool tgz.
> > > > >
> > > > > +1 (non-binding) from my side.
> > > > >
> > > > > Thanks,
> > > > >
> > > >
> > >
> > >
> > > --
> > > thanks
> > > rayman
> > >
> >
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>

Re: Error handling

2019-03-15 Thread Prateek Maheshwari

Hi Tom,

This would depend on what your k8s container orchestration logic looks
like. For example, in YARN, 'status' returns 'not running' after 'start'
until all the containers requested from the AM are 'running'. We also
leverage YARN to restart containers/job automatically on failures (within
some bounds). Additionally, we set up a monitoring alert that goes off if
the number of running containers stays lower than the number of expected
containers for extended periods of time (~ 5 minutes).

Are you saying that you noticed that the LocalApplicationRunner status
returns 'running' even if its stream processor / SamzaContainer has stopped
processing?

- Prateek

On Fri, Mar 15, 2019 at 7:26 AM Tom Davis  wrote:

> I'm using the LocalApplicationRunner and had added a liveness check
> around the `status` method. The app is running in Kubernetes so, in
> theory, it could be restarted if exceptions happened during processing.
> However, it seems that "container failure" is divorced from "app
> failure" because the app continues to run even after all the task
> containers have shut down. Is there a better way to check for
> application health? Is there a way to shut down the application if all
> containers have failed? Should I simply ensure exceptions never escape
> operators? Thanks!
>

Re: Backing Kafka/Yarn/Zookeeper version for Samza 1.0.0?

2019-03-14 Thread Prateek Maheshwari

Hi Jeremiah,

If you're using YARN, it requires ZK for itself. You'll also need ZK if
you're using Standalone with the ZKJobCoordinator. Other than that the
framework does not rely on ZK for anything.

Hope that helps.

- Prateek

On Tue, Mar 12, 2019 at 7:23 AM Jeremiah Adams 
wrote:

> Is zookeeper required for the tech stack? If samza does all of its
> checkpointing on kafka, I may be able to save us some money by eliminating
> the ZK cluster.
>
>
>
>
>
>
> * Jeremiah Adams*
>
> Software Engineer
>
> www.helixeducation.com
>
> Blog <http://www.helixeducation.com/blog/> | Twitter
> <https://twitter.com/HelixEducation> | Facebook
> <https://www.facebook.com/HelixEducation> | LinkedIn
> <http://www.linkedin.com/company/3609946>
>
>
>
>
>
> *From: *Prateek Maheshwari 
> *Date: *Monday, March 11, 2019 at 6:31 PM
> *To: *"dev@samza.apache.org" , Jeremiah Adams <
> jad...@helixeducation.com>
> *Subject: *Re: Backing Kafka/Yarn/Zookeeper version for Samza 1.0.0?
>
>
>
> Hi Jeremiah,
>
>
>
> We're in the process of upgrading Samza to use Kafka clients version 2.0,
> and YARN client version 2.9. This should be available in the next release
> (version 1.2). In the mean time, Kafka 0.11 and YARN 2.6 / 2.7 are the
> recommended versions.
>
>
>
> Can you clarify what you mean about hard requirements for Zookeeper since
> checkpointing?
>
>
>
> Thanks,
> Prateek
>
>
>
> On Fri, Mar 8, 2019 at 11:27 AM Jeremiah Adams 
> wrote:
>
> I am in the process of updating our stack. I’m not seeing documentation on
> what versions of Kafka, Zookeeper and Yarn Samza 1.0.0 should be run on.
> Best I can tell is a 2.6.1 dependency on Hadoop jars and 0.11.0.2 kafka
> jars.  If these jars are meant to match versions of the software,  they are
> dated. Hadoop is up to version 3.2.0 Kafka is up to version 2.1.1
>
> Any advice concerning versions of the components? Also we are planning on
> moving all Samza jobs to standalone and eliminating the Yarn dependencies.
> I prefer taking iterative steps so want to get the updates done before
> moving to containers.
>
> Also, does Samza have any hard requirements for Zookeeper since
> checkpointing was implemented?
>
>
> Jeremiah Adams
> Software Engineer
> www.helixeducation.com
> <https://url.emailprotection.link/?basKr9vk92a8vVw0XMnK5bnWsxM_w6KChRx8CY_UgrU5RmcwzgGL3Po63B7rJIXeNyMBLKYpptY6Rl-f5kb6p2A~~>
> <http://www.helixeducation.com/
> <https://url.emailprotection.link/?basKr9vk92a8vVw0XMnK5bmaSKuBc0AuEZ7YasYc7Df8YVt3SYmcjmLWdKMWzAAINWlUUA33ebGI7pSoTl9cg1g~~>
> >
> Blog<http://www.helixeducation.com/blog/
> <https://url.emailprotection.link/?basKr9vk92a8vVw0XMnK5bmaSKuBc0AuEZ7YasYc7Df-lAcqG1fqHPpNw-wd9z7HtUJeCG5_8UjCf2mHtn6C_zQ~~>>
> | Twitter<https://twitter.com/HelixEducation
> <https://url.emailprotection.link/?bVO2q0UXR235wN_yOnM0FjqITPdBYMD3reLGNddq-zPV5ChMQK9JwV4Be-QnrbRoXpJl8IcknAqKzYtA3RABKww~~>>
> | Facebook<https://www.facebook.com/HelixEducation
> <https://url.emailprotection.link/?bUU7m4NfMS_EWGtH1yojBHX9sWZ6uxVdT1eQUkmU5vWY01WFZiS2KJ-c9iLIncdHB7Uw1lRYCprEEpPPQCdiK6Q~~>>
> | LinkedIn<http://www.linkedin.com/company/3609946
> <https://url.emailprotection.link/?b0ZQfJ1pZYnASyoShs9MJI46-r1lxPhA-JS5VSkR7so-DFP0_HxbOo2LsajGOaoYXxb1ZCOMAu7hZscPCnIKWpXz0cpgQ386SnNHjPcwsu4z90mzBkuwoZc6YxOCzMGA0>
> >
>
>

Re: Backing Kafka/Yarn/Zookeeper version for Samza 1.0.0?

2019-03-11 Thread Prateek Maheshwari

Hi Jeremiah,

We're in the process of upgrading Samza to use Kafka clients version 2.0,
and YARN client version 2.9. This should be available in the next release
(version 1.2). In the mean time, Kafka 0.11 and YARN 2.6 / 2.7 are the
recommended versions.

Can you clarify what you mean about hard requirements for Zookeeper since
checkpointing?

Thanks,
Prateek

On Fri, Mar 8, 2019 at 11:27 AM Jeremiah Adams 
wrote:

> I am in the process of updating our stack. I’m not seeing documentation on
> what versions of Kafka, Zookeeper and Yarn Samza 1.0.0 should be run on.
> Best I can tell is a 2.6.1 dependency on Hadoop jars and 0.11.0.2 kafka
> jars.  If these jars are meant to match versions of the software,  they are
> dated. Hadoop is up to version 3.2.0 Kafka is up to version 2.1.1
>
> Any advice concerning versions of the components? Also we are planning on
> moving all Samza jobs to standalone and eliminating the Yarn dependencies.
> I prefer taking iterative steps so want to get the updates done before
> moving to containers.
>
> Also, does Samza have any hard requirements for Zookeeper since
> checkpointing was implemented?
>
>
> Jeremiah Adams
> Software Engineer
> www.helixeducation.com
> Blog | Twitter<
> https://twitter.com/HelixEducation> | Facebook<
> https://www.facebook.com/HelixEducation> | LinkedIn<
> http://www.linkedin.com/company/3609946>
>
>

Re: "send to" ordering is inconsistent

2019-03-07 Thread Prateek Maheshwari

Hi Tom,

It looks like we won't be able to include SAMZA-2116 in the upcoming 1.1
release due to time constraints. It'll have to go in to the 1.2 release,
which will tentatively be in June. Does that still work for you?

Thanks,
Prateek

On Thu, Feb 28, 2019 at 2:16 PM Tom Davis  wrote:

> Thanks, Prateek! Yes, the workaround will be fine for the time being.
> Thank you again!
>
> Prateek Maheshwari  writes:
>
> > Hi Tom,
> >
> > Thanks for reporting this. I created a ticket (SAMZA-2116
> > <https://issues.apache.org/jira/browse/SAMZA-2116>) to make the required
> > API changes. We'll include this in the next Samza release, which should
> be
> > mid to late next month.
> >
> > In the mean time, the workaround would be to keep all of this
> functionality
> > in a sink function. Does this work for you?
> >
> > Thanks,
> > Prateek
> >
> > On Wed, Feb 27, 2019 at 2:54 PM Tom Davis 
> wrote:
> >
> >>
> >> Prateek Maheshwari  writes:
> >>
> >> > Hi Tom,
> >> >
> >> > I'm assuming that the two sub-DAGs you're talking about are the two
> Map
> >> ->
> >> > Send To chains acting on the "audit-report-requests" input and sending
> >> > their results to the "audit-report-status" output.
> >> >
> >>
> >> Yes, that's correct.
> >>
> >> >
> >> > Although processing within each Task is in-order, the framework does
> not
> >> > guarantee the order in which the multiple chained operators for an
> >> operator
> >> > are evaluated. Specifically, in the current implementation
> >> > <
> >>
> https://github.com/apache/samza/blob/master/samza-core/src/main/java/org/apache/samza/operators/impl/OperatorImpl.java#L106
> >> >,
> >> > an Operator's registeredOperators are maintained as a HashSet of
> >> > OperatorImpls. This would explain the out-of-order appearance of the
> two
> >> > messages. I'm not sure what's changed in 1.0 that makes this trigger
> now.
> >> >
> >>
> >> Ah! I thought this was the case but I couldn't find the part of the code
> >> to prove it. This makes far more sense than Kafka routinely not
> >> committing messages in order (though it is still technically a
> >> possibility).
> >>
> >> Upon further investigation, I'm not convinced it's a 1.0 issue; I think
> >> we just started using multiple chained operators more heavily.
> >>
> >> >
> >> > Since both sendTo and sink are terminal operators (void return type),
> I
> >> > don't think you'll be able to easily get around this. Let me discuss
> this
> >> > with the team and get back to you with a workaround / fix.
> >> >
> >>
> >> Thanks a lot! <3
> >>
> >> >
> >> > Thanks,
> >> > Prateek
> >> >
> >> >
> >> > On Tue, Feb 26, 2019 at 7:08 PM Tom Davis 
> >> wrote:
> >> >
> >> >> Hey folks!
> >> >>
> >> >> We have noticed some inconsistencies in message ordering when
> running a
> >> >> StreamApplication that calls two separate `map` functions over an
> input
> >> >> and sends results to the same output. I have attached my Execution
> Plan,
> >> >> but the gist is that the first `map` function marks a thing as
> "pending"
> >> >> by sending a message to a status topic and the second `map` function
> >> >> does some work then sends its own status with "done".
> >> >>
> >> >> We have a test set up to read the resulting status topic with a
> normal
> >> >> Kafka consumer to ensure that two status messages were produced by
> Samza
> >> >> and consumed in the proper order (first "pending", then "done", per
> the
> >> >> order of the MessageStream call chains). This test flaps pretty
> >> >> routinely since upgrading to Samza 1.0; we never noticed this in the
> >> >> past. Sometimes, it times out waiting for any messages, though that's
> >> >> considerably less rare than the ordering issue. My understanding is,
> for
> >> >> a given Task, by default, all processing should be done serially. Is
> >> >> that no longer true? Is the guarantee *only* for the order in which
> >> >> messages are consumed, not produced?
> >> >>
> >> >> For test simplicity, there's a single Kafka partition for each topic
> and
> >> >> I attempted to create a configuration file that would eliminate as
> much
> >> >> coordination and concurrency sources as I knew how:
> >> >>
> >> >>   processor.id=0
> >> >>
> >> >>
> >>
> job.coordinator.factory=org.apache.samza.standalone.PassthroughJobCoordinatorFactory
> >> >>   job.container.single.thread.mode=true
> >> >>
> >> >> (We use the ZkJobCoordinatorFactory normally but both produce the
> bug)
> >> >>
> >> >> I realize the KafkaProducer does not *technically* guarantee delivery
> >> >> order except when using transactions, which KafkaSystemProducer
> doesn't
> >> >> appear to do by default. I have checked the actual message envelope
> and
> >> >> when the ordering is wrong, the offset order is correct -- so, "done"
> >> >> was recorded by Kafka prior to "pending". This seems to rule out
> Samza
> >> >> but I'm not entirely confident in that conclusion. Any thoughts?
> >> >>
> >> >> Thanks,
> >> >>
> >> >> Tom
> >> >>
> >>
>

Re: [POSSIBLE PHISHING] Task Partition Commit Failed After Upgrade

2019-03-07 Thread Prateek Maheshwari

Jeremiah, were you able to resolve this issue?

- Prateek

On Wed, Mar 6, 2019 at 10:08 AM Prateek Maheshwari 
wrote:

> Hi Jeremiah,
>
> The configuration you want to look for is:
> 'job.systemstreampartition.grouper.factory'. It should default to:
> 'org.apache.samza.container.grouper.stream.GroupByPartitionFactory'.
> Can you check if you see this value in the configuration logged by
> SamzaContainer during container start? You can grep for: "Using
> configuration".
>
> For context, there are two groupers for a Samza job. One that groups input
> partitions into tasks (this one), and one that groups tasks into containers
> (the one you mentioned above).
>
> Thanks,
> Prateek
>
>
>
> On Wed, Mar 6, 2019 at 8:14 AM Jeremiah Adams 
> wrote:
>
>> It appears that the issue is related to the KafkaCheckpointLogKey.java
>> constructor. grouperFactoryClassName here is null.  THe documentation
>> indicates that task.name.grouper.factory config setting has a default value
>> of
>>  org.apache.samza.container.grouper.task.GroupByContainerCountFactory. I
>> wouldn't expect it to be null here.
>>
>> If I specify GroupByContainerCountFactory for the
>> task.name.grouper.factory in my properties file, I get a
>> NoSuchMethodException:
>>
>> Exception in thread "main" java.lang.InstantiationException:
>> org.apache.samza.container.grouper.task.GroupByContainerCount
>> at java.lang.Class.newInstance(Class.java:427)
>> at org.apache.samza.util.Util$.getObj(Util.scala:80)
>> at
>> org.apache.samza.coordinator.JobModelManager$.readJobModel(JobModelManager.scala:261)
>> at
>> org.apache.samza.coordinator.JobModelManager$.getJobModelManager(JobModelManager.scala:155)
>> at
>> org.apache.samza.coordinator.JobModelManager$.apply(JobModelManager.scala:117)
>> at
>> org.apache.samza.coordinator.JobModelManager.apply(JobModelManager.scala)
>> at
>> org.apache.samza.clustermanager.ClusterBasedJobCoordinator.buildJobModelManager(ClusterBasedJobCoordinator.java:241)
>> at
>> org.apache.samza.clustermanager.ClusterBasedJobCoordinator.(ClusterBasedJobCoordinator.java:152)
>> at
>> org.apache.samza.clustermanager.ClusterBasedJobCoordinator.main(ClusterBasedJobCoordinator.java:297)
>> Caused by: java.lang.NoSuchMethodException:
>> org.apache.samza.container.grouper.task.GroupByContainerCount.()
>> at java.lang.Class.getConstructor0(Class.java:3082)
>> at java.lang.Class.newInstance(Class.java:412)
>> ... 8 more
>>
>>
>>
>> Jeremiah Adams
>> Software Engineer
>> www.helixeducation.com <http://www.helixeducation.com/>
>> Blog <http://www.helixeducation.com/blog/> | Twitter <
>> https://twitter.com/HelixEducation> | Facebook <
>> https://www.facebook.com/HelixEducation> | LinkedIn <
>> http://www.linkedin.com/company/3609946>
>>
>>
>> On 3/4/19, 2:48 PM, "Jeremiah Adams"  wrote:
>>
>> I am updating dependencies and moving from Samza V0.13.0 to V0.14.0.
>> I develop locally using the grid app in the hello-samza project to spin up
>> local yarn/zookeeper/kafka instances.
>>
>> Grid is running these versions:
>> kafka_2.11-0.10.2.1.tgz
>> hadoop-2.6.1.tar.gz
>> zookeeper-3.4.3.tar.gz
>>
>>
>> My job is now failing with the NPE below. anyone have ideas on the
>> cause of this error?
>>
>>
>> 2019-03-04 14:13:49 AsyncRunLoop [ERROR] Task Partition 0 commit
>> failed
>> java.lang.NullPointerException
>> at
>> com.google.common.base.Preconditions.checkNotNull(Preconditions.java:782)
>> at
>> org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey.(KafkaCheckpointLogKey.java:46)
>> at
>> org.apache.samza.checkpoint.kafka.KafkaCheckpointManager.writeCheckpoint(KafkaCheckpointManager.scala:136)
>> at
>> org.apache.samza.checkpoint.OffsetManager.writeCheckpoint(OffsetManager.scala:259)
>> at
>> org.apache.samza.container.TaskInstance.commit(TaskInstance.scala:205)
>> at
>> org.apache.samza.task.AsyncRunLoop$AsyncTaskWorker$5.run(AsyncRunLoop.java:494)
>> at
>> org.apache.samza.task.AsyncRunLoop$AsyncTaskWorker.commit(AsyncRunLoop.java:513)
>> at
>> org.apache.samza.task.AsyncRunLoop$AsyncTaskWorker.run(AsyncRunLoop.java:379)
>> at
>> org.apache.samza.task.AsyncRunLoop$AsyncTaskWorker.access$300(AsyncRu

Re: [DISCUSS] Samza 1.1.0 release

2019-03-07 Thread Prateek Maheshwari

Daniel, let's try to include the following change in the release as well.
SAMZA-2116: Make sendTo and sink operators non-terminal

Other than that, +1 (binding).

- Prateek



On Thu, Mar 7, 2019 at 9:22 AM Xinyu Liu  wrote:

> +1 (binding)
>
> Thanks,
> Xinyu
>
> On Thu, Mar 7, 2019 at 12:43 AM santhosh venkat <
> santhoshvenkat1...@gmail.com> wrote:
>
> > +1 (non-binding)
> >
> > Thanks,
> >
> > On Wed, Mar 6, 2019 at 10:42 PM Yi Pan  wrote:
> >
> > > +1 (binding)
> > >
> > > On Wed, Mar 6, 2019 at 10:08 PM Daniel Chen  wrote:
> > >
> > > > Hello everyone,
> > > >
> > > > We have added couple of major features to master since 1.0.0 that
> > > warrants
> > > > a major release.
> > > >
> > > > Within LinkedIn, some of these features have already been tested as
> > part
> > > of
> > > > our test suites. We plan to continue our testing in coming weeks to
> > > > validate the stability prior to release.
> > > >
> > > > Here is the highlighted list of features that are part of the new
> > release
> > > > (in chronological order)
> > > > SAMZA-1981
> > > > Consolidate table descriptors to samza-api
> > > > SAMZA-1985
> > > > Implement Startpoints model and StartpointManager
> > > > SAMZA-1998
> > > > Table API refactoring
> > > > SAMZA-2012
> > > > Add API for wiring an external context through to application
> > processing
> > > > code
> > > > SAMZA-2041
> > > > Add system descriptors for HDFS and Kinesis
> > > > SAMZA-2043
> > > > Consolidate ReadableTable and ReadWriteTable
> > > > SAMZA-2106
> > > > Samza App & Job Config Refactor
> > > > SAMZA-2081
> > > > Samza SQL : Type system for Samza SQL
> > > >
> > > > You can find a complete list of features here:
> > > >
> > > >
> > >
> >
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fissues%2F%3Fjql%3Dproject%2520%253D%2520SAMZA%2520AND%2520resolution%2520%2520%253D%2520Fixed%2520%2520AND%2520(fixVersion%2520%253E%253D%25201.1%2520)%2520ORDER%2520BY%2520createdDate%2520%2520DESCdata=02%7C01%7Cdchen1%40linkedin.com%7C01251a7438ea4324f3f608d6a2c11a53%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636875347611087937sdata=ZDMaQj5vX6Vlm%2B8vpGhrNygxpI2vvNnYGi1USWe%2FD5A%3Dreserved=0
> > > >
> > > > Here is my proposal on our release schedule and timelines.
> > > >
> > > >1. Cut a release version 1.1.0 from master
> > > >2. Target a release vote on the week March 13th (next week)
> > > >
> > > > Thoughts?
> > > >
> > > > Thanks,
> > > > Daniel
> > > >
> > >
> >
>

Re: [POSSIBLE PHISHING] Task Partition Commit Failed After Upgrade

2019-03-06 Thread Prateek Maheshwari

Hi Jeremiah,

The configuration you want to look for is:
'job.systemstreampartition.grouper.factory'. It should default to:
'org.apache.samza.container.grouper.stream.GroupByPartitionFactory'.
Can you check if you see this value in the configuration logged by
SamzaContainer during container start? You can grep for: "Using
configuration".

For context, there are two groupers for a Samza job. One that groups input
partitions into tasks (this one), and one that groups tasks into containers
(the one you mentioned above).

Thanks,
Prateek



On Wed, Mar 6, 2019 at 8:14 AM Jeremiah Adams 
wrote:

> It appears that the issue is related to the KafkaCheckpointLogKey.java
> constructor. grouperFactoryClassName here is null.  THe documentation
> indicates that task.name.grouper.factory config setting has a default value
> of
>  org.apache.samza.container.grouper.task.GroupByContainerCountFactory. I
> wouldn't expect it to be null here.
>
> If I specify GroupByContainerCountFactory for the
> task.name.grouper.factory in my properties file, I get a
> NoSuchMethodException:
>
> Exception in thread "main" java.lang.InstantiationException:
> org.apache.samza.container.grouper.task.GroupByContainerCount
> at java.lang.Class.newInstance(Class.java:427)
> at org.apache.samza.util.Util$.getObj(Util.scala:80)
> at
> org.apache.samza.coordinator.JobModelManager$.readJobModel(JobModelManager.scala:261)
> at
> org.apache.samza.coordinator.JobModelManager$.getJobModelManager(JobModelManager.scala:155)
> at
> org.apache.samza.coordinator.JobModelManager$.apply(JobModelManager.scala:117)
> at
> org.apache.samza.coordinator.JobModelManager.apply(JobModelManager.scala)
> at
> org.apache.samza.clustermanager.ClusterBasedJobCoordinator.buildJobModelManager(ClusterBasedJobCoordinator.java:241)
> at
> org.apache.samza.clustermanager.ClusterBasedJobCoordinator.(ClusterBasedJobCoordinator.java:152)
> at
> org.apache.samza.clustermanager.ClusterBasedJobCoordinator.main(ClusterBasedJobCoordinator.java:297)
> Caused by: java.lang.NoSuchMethodException:
> org.apache.samza.container.grouper.task.GroupByContainerCount.()
> at java.lang.Class.getConstructor0(Class.java:3082)
> at java.lang.Class.newInstance(Class.java:412)
> ... 8 more
>
>
>
> Jeremiah Adams
> Software Engineer
> www.helixeducation.com 
> Blog  | Twitter <
> https://twitter.com/HelixEducation> | Facebook <
> https://www.facebook.com/HelixEducation> | LinkedIn <
> http://www.linkedin.com/company/3609946>
>
>
> On 3/4/19, 2:48 PM, "Jeremiah Adams"  wrote:
>
> I am updating dependencies and moving from Samza V0.13.0 to V0.14.0.
> I develop locally using the grid app in the hello-samza project to spin up
> local yarn/zookeeper/kafka instances.
>
> Grid is running these versions:
> kafka_2.11-0.10.2.1.tgz
> hadoop-2.6.1.tar.gz
> zookeeper-3.4.3.tar.gz
>
>
> My job is now failing with the NPE below. anyone have ideas on the
> cause of this error?
>
>
> 2019-03-04 14:13:49 AsyncRunLoop [ERROR] Task Partition 0 commit failed
> java.lang.NullPointerException
> at
> com.google.common.base.Preconditions.checkNotNull(Preconditions.java:782)
> at
> org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey.(KafkaCheckpointLogKey.java:46)
> at
> org.apache.samza.checkpoint.kafka.KafkaCheckpointManager.writeCheckpoint(KafkaCheckpointManager.scala:136)
> at
> org.apache.samza.checkpoint.OffsetManager.writeCheckpoint(OffsetManager.scala:259)
> at
> org.apache.samza.container.TaskInstance.commit(TaskInstance.scala:205)
> at
> org.apache.samza.task.AsyncRunLoop$AsyncTaskWorker$5.run(AsyncRunLoop.java:494)
> at
> org.apache.samza.task.AsyncRunLoop$AsyncTaskWorker.commit(AsyncRunLoop.java:513)
> at
> org.apache.samza.task.AsyncRunLoop$AsyncTaskWorker.run(AsyncRunLoop.java:379)
> at
> org.apache.samza.task.AsyncRunLoop$AsyncTaskWorker.access$300(AsyncRunLoop.java:314)
> at
> org.apache.samza.task.AsyncRunLoop.runTasks(AsyncRunLoop.java:228)
> at
> org.apache.samza.task.AsyncRunLoop.run(AsyncRunLoop.java:157)
> at
> org.apache.samza.container.SamzaContainer.run(SamzaContainer.scala:728)
> at
> org.apache.samza.runtime.LocalContainerRunner.run(LocalContainerRunner.java:102)
> at
> org.apache.samza.runtime.LocalContainerRunner.main(LocalContainerRunner.java:147)
> 2019-03-04 14:13:49 AsyncRunLoop [ERROR] Caught throwable and stopping
> run loop
>
>
>
> Jeremiah Adams
> Software Engineer
>
> https://url.emailprotection.link/?ahfhEufaAWbezBrUFPG98ZJcterGfIerU3ZwsA3Gv_C0~
> <
> https://url.emailprotection.link/?a49H2rNGIIBtQOw6md8OcHp-qKE3Xn2gNiZ3dlqAeSDA~
> >
> Blog<
>

Re: "send to" ordering is inconsistent

2019-02-28 Thread Prateek Maheshwari

Hi Tom,

Thanks for reporting this. I created a ticket (SAMZA-2116
<https://issues.apache.org/jira/browse/SAMZA-2116>) to make the required
API changes. We'll include this in the next Samza release, which should be
mid to late next month.

In the mean time, the workaround would be to keep all of this functionality
in a sink function. Does this work for you?

Thanks,
Prateek

On Wed, Feb 27, 2019 at 2:54 PM Tom Davis  wrote:

>
> Prateek Maheshwari  writes:
>
> > Hi Tom,
> >
> > I'm assuming that the two sub-DAGs you're talking about are the two Map
> ->
> > Send To chains acting on the "audit-report-requests" input and sending
> > their results to the "audit-report-status" output.
> >
>
> Yes, that's correct.
>
> >
> > Although processing within each Task is in-order, the framework does not
> > guarantee the order in which the multiple chained operators for an
> operator
> > are evaluated. Specifically, in the current implementation
> > <
> https://github.com/apache/samza/blob/master/samza-core/src/main/java/org/apache/samza/operators/impl/OperatorImpl.java#L106
> >,
> > an Operator's registeredOperators are maintained as a HashSet of
> > OperatorImpls. This would explain the out-of-order appearance of the two
> > messages. I'm not sure what's changed in 1.0 that makes this trigger now.
> >
>
> Ah! I thought this was the case but I couldn't find the part of the code
> to prove it. This makes far more sense than Kafka routinely not
> committing messages in order (though it is still technically a
> possibility).
>
> Upon further investigation, I'm not convinced it's a 1.0 issue; I think
> we just started using multiple chained operators more heavily.
>
> >
> > Since both sendTo and sink are terminal operators (void return type), I
> > don't think you'll be able to easily get around this. Let me discuss this
> > with the team and get back to you with a workaround / fix.
> >
>
> Thanks a lot! <3
>
> >
> > Thanks,
> > Prateek
> >
> >
> > On Tue, Feb 26, 2019 at 7:08 PM Tom Davis 
> wrote:
> >
> >> Hey folks!
> >>
> >> We have noticed some inconsistencies in message ordering when running a
> >> StreamApplication that calls two separate `map` functions over an input
> >> and sends results to the same output. I have attached my Execution Plan,
> >> but the gist is that the first `map` function marks a thing as "pending"
> >> by sending a message to a status topic and the second `map` function
> >> does some work then sends its own status with "done".
> >>
> >> We have a test set up to read the resulting status topic with a normal
> >> Kafka consumer to ensure that two status messages were produced by Samza
> >> and consumed in the proper order (first "pending", then "done", per the
> >> order of the MessageStream call chains). This test flaps pretty
> >> routinely since upgrading to Samza 1.0; we never noticed this in the
> >> past. Sometimes, it times out waiting for any messages, though that's
> >> considerably less rare than the ordering issue. My understanding is, for
> >> a given Task, by default, all processing should be done serially. Is
> >> that no longer true? Is the guarantee *only* for the order in which
> >> messages are consumed, not produced?
> >>
> >> For test simplicity, there's a single Kafka partition for each topic and
> >> I attempted to create a configuration file that would eliminate as much
> >> coordination and concurrency sources as I knew how:
> >>
> >>   processor.id=0
> >>
> >>
> job.coordinator.factory=org.apache.samza.standalone.PassthroughJobCoordinatorFactory
> >>   job.container.single.thread.mode=true
> >>
> >> (We use the ZkJobCoordinatorFactory normally but both produce the bug)
> >>
> >> I realize the KafkaProducer does not *technically* guarantee delivery
> >> order except when using transactions, which KafkaSystemProducer doesn't
> >> appear to do by default. I have checked the actual message envelope and
> >> when the ordering is wrong, the offset order is correct -- so, "done"
> >> was recorded by Kafka prior to "pending". This seems to rule out Samza
> >> but I'm not entirely confident in that conclusion. Any thoughts?
> >>
> >> Thanks,
> >>
> >> Tom
> >>
>

Re: "send to" ordering is inconsistent

2019-02-27 Thread Prateek Maheshwari

Hi Tom,

I'm assuming that the two sub-DAGs you're talking about are the two Map ->
Send To chains acting on the "audit-report-requests" input and sending
their results to the "audit-report-status" output.

Although processing within each Task is in-order, the framework does not
guarantee the order in which the multiple chained operators for an operator
are evaluated. Specifically, in the current implementation
,
an Operator's registeredOperators are maintained as a HashSet of
OperatorImpls. This would explain the out-of-order appearance of the two
messages. I'm not sure what's changed in 1.0 that makes this trigger now.

Since both sendTo and sink are terminal operators (void return type), I
don't think you'll be able to easily get around this. Let me discuss this
with the team and get back to you with a workaround / fix.

Thanks,
Prateek

On Tue, Feb 26, 2019 at 7:08 PM Tom Davis  wrote:

> Hey folks!
>
> We have noticed some inconsistencies in message ordering when running a
> StreamApplication that calls two separate `map` functions over an input
> and sends results to the same output. I have attached my Execution Plan,
> but the gist is that the first `map` function marks a thing as "pending"
> by sending a message to a status topic and the second `map` function
> does some work then sends its own status with "done".
>
> We have a test set up to read the resulting status topic with a normal
> Kafka consumer to ensure that two status messages were produced by Samza
> and consumed in the proper order (first "pending", then "done", per the
> order of the MessageStream call chains). This test flaps pretty
> routinely since upgrading to Samza 1.0; we never noticed this in the
> past. Sometimes, it times out waiting for any messages, though that's
> considerably less rare than the ordering issue. My understanding is, for
> a given Task, by default, all processing should be done serially. Is
> that no longer true? Is the guarantee *only* for the order in which
> messages are consumed, not produced?
>
> For test simplicity, there's a single Kafka partition for each topic and
> I attempted to create a configuration file that would eliminate as much
> coordination and concurrency sources as I knew how:
>
>   processor.id=0
>
> job.coordinator.factory=org.apache.samza.standalone.PassthroughJobCoordinatorFactory
>   job.container.single.thread.mode=true
>
> (We use the ZkJobCoordinatorFactory normally but both produce the bug)
>
> I realize the KafkaProducer does not *technically* guarantee delivery
> order except when using transactions, which KafkaSystemProducer doesn't
> appear to do by default. I have checked the actual message envelope and
> when the ordering is wrong, the offset order is correct -- so, "done"
> was recorded by Kafka prior to "pending". This seems to rule out Samza
> but I'm not entirely confident in that conclusion. Any thoughts?
>
> Thanks,
>
> Tom
>

Re: [VOTE] Migration of Samza git repo to gitbox.apache.org

2019-01-23 Thread Prateek Maheshwari

+1 (binding) again

- Prateek

On Wed, Jan 23, 2019 at 11:50 AM Pawas Chhokra  wrote:
>
> Hi all,
>
> This is a call for a vote on migrating Samza git repo to gitbox.apache.org, on
> 11 AM, Jan 29, 2019. As mandated by the Apache Infrastructure Team, all git
> repositories must be migrated from git-wip-us.apache.org URL to
> gitbox.apache.org, as the old service is being decommissioned.
> The vote will be open for 72 hours (ending at 12:00 PM PST Monday,
> January 28). You can vote as follows:
>
> [ ] +1 approve
>
> [ ] +0 no opinion
>
> [ ] -1 disapprove (and reason why)
>
> The vote is +1 from my side.
>
> Thanks & Regards,
> Pawas Chhokra

Re: [DISCUSS] Mandatory migration of Samza git repo to gitbox.apache.org

2019-01-15 Thread Prateek Maheshwari

Thanks for starting the discussion Pawas. I'm +1 (binding) for the migration.

- Prateek

On Tue, Jan 15, 2019 at 11:44 AM Pawas Chhokra  wrote:
>
> Hi all,
>
> As mandated by the Apache Infrastructure Team, all git repositories must be
> migrated from git-wip-us.apache.org URL to gitbox.apache.org, as the old
> service is being decommissioned. This needs to happen before February 7th,
> and this ticket  is to
> check if migrating Samza on 11 AM, Jan 25, 2019 is acceptable to everyone.
>
> Thanks & Regards,
> Pawas Chhokra

Re: Draft report to board - Jan 2019

2019-01-09 Thread Prateek Maheshwari

Thanks for the summary Yi. I'd change: "HDFS based backup/restore of
state stores" to "Evaluation for HDFS based backup/restore of state
stores" since this was an intern project and is not checked in to
master. Otherwise LGTM.

Thanks,
Prateek

On Wed, Jan 9, 2019 at 12:28 PM Yi Pan  wrote:
>
> Hi, all,
>
> Our quarterly report is due this Wed (1/9). The following is the draft
> report. Please let me know by the end of the day if I missed anything.
> Thanks!
>
> ## Description:
>
>  - Apache Samza is a distributed stream processing engine that are highly
>
>configurable to process events from various data sources, including
>
>real-time messaging system (e.g. Kafka) and distributed file systems
> (e.g.
>
>HDFS).
>
>
>
> ## Issues:
>
>  - No issues requires board attention
>
>
>
> ## Activity:
>
>  - Samza 1.0 is released:
>
> - News coverage:
> https://www.zdnet.com/article/real-time-data-processing-just-got-more-options-linkedin-releases-apache-samza-1-0-streaming/
>
> - Engineering blogs:
> https://engineering.linkedin.com/blog/2018/11/samza-1-0--stream-processing-at-massive-scale
>
> - Major online website refresh: http://samza.apache.org/
>
>  - Critical improvement projects completed:
>
> - Changelog restore parallelization
>
> - HDFS based backup/restore of state stores
>
>  - Multiple SEP projects initiated or in-progress:
>
> - SEP-18: allows manipulating starting offsets and time-based rewind
>
> - SEP-19: Fast failover for stateful jobs on container failure (i.e.
> standby container)
>
> - SEP to come soon: async high-level API
>
>  - Beam Samza runner upgrade to use Samza 1.0
>
>  - Go and Python support via Beam Samza runner
>
>
>
> ## Health report:
>
>  - Project is in healthy status with 1.0 released in Nov 2018
>
>
>
> ## PMC changes:
>
>
>
>  - Currently 15 PMC members.
>
>  - Prateek Maheshwari was added to the PMC on Thu Nov 01 2018
>
>
>
> ## Committer base changes:
>
>
>
>  - Currently 22 committers.
>
>  - New commmitters:
>
> - Aditya Toomula was added as a committer on Mon Nov 05 2018
>
> - Hai Lu was added as a committer on Mon Nov 05 2018
>
>
>
> ## Releases:
>
>
>
>  - Last release was 1.0 on Nov 28, 2018
>
>
>
> ## /dist/ errors: 9
>
>  - Project is in healthy status with 1.0 released in Nov 2018
>
>
>
> ## Mailing list activity:
>
>
>
>  - dev@samza.apache.org:
>
> - 271 subscribers (down -13 in the last 3 months):
>
> - 445 emails sent to list (288 in previous quarter)
>
>
>
>
>
> ## JIRA activity:
>
>
>
>  - 111 JIRA tickets created in the last 3 months
>
>  - 57 JIRA tickets closed/resolved in the last 3 months

Re: Beam Samza Runner - java.lang.UnsupportedOperationException: Cannot create a producer for an input system

2019-01-07 Thread Prateek Maheshwari

+ Xinyu

> On Jan 4, 2019, at 9:58 PM, Deshpande, Omkar  
> wrote:
> 
> Hello,
> 
> I am getting following exception while running Beam Samza Runner –
> 
> java.lang.UnsupportedOperationException: Cannot create a producer for an 
> input system
> 
>  at 
> org.apache.beam.runners.samza.adapter.BoundedSourceSystem$Factory.getProducer(BoundedSourceSystem.java:411)
> 
>  at 
> org.apache.samza.container.SamzaContainer$$anonfun$13.apply(SamzaContainer.scala:223)
> 
>  at 
> org.apache.samza.container.SamzaContainer$$anonfun$13.apply(SamzaContainer.scala:220)
> 
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> 
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> 
>  at scala.collection.immutable.Map$Map3.foreach(Map.scala:161)
> 
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> 
>  at scala.collection.AbstractTraversable.map(Traversable.scala:104)
> 
>  at 
> org.apache.samza.container.SamzaContainer$.apply(SamzaContainer.scala:220)
> 
>  at org.apache.samza.container.SamzaContainer.apply(SamzaContainer.scala)
> 
>  at 
> org.apache.samza.processor.StreamProcessor.createSamzaContainer(StreamProcessor.java:198)
> 
>  at 
> org.apache.samza.processor.StreamProcessor$1.onNewJobModel(StreamProcessor.java:290)
> 
>  at 
> org.apache.samza.zk.ZkJobCoordinator.onNewJobModelConfirmed(ZkJobCoordinator.java:304)
> 
>  at 
> org.apache.samza.zk.ZkJobCoordinator$ZkBarrierListenerImpl.lambda$onBarrierStateChanged$1(ZkJobCoordinator.java:394)
> 
>  at 
> org.apache.samza.zk.ScheduleAfterDebounceTime.lambda$getScheduleableAction$0(ScheduleAfterDebounceTime.java:163)
> 
>  at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> 
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> 
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 
>  at java.lang.Thread.run(Thread.java:748)
> 
> 2019-01-04 21:39:11 ERROR SamzaContainer$:86 - Failed to create a producer 
> for 0-KafkaIO_Read_Read_KafkaUnboundedSource__out__PCollection_, so skipping.
> 
> java.lang.UnsupportedOperationException: Cannot create a producer for an 
> input system
> 
>  at 
> org.apache.beam.runners.samza.adapter.UnboundedSourceSystem$Factory.getProducer(UnboundedSourceSystem.java:452)
> 
>  at 
> org.apache.samza.container.SamzaContainer$$anonfun$13.apply(SamzaContainer.scala:223)
> 
>  at 
> org.apache.samza.container.SamzaContainer$$anonfun$13.apply(SamzaContainer.scala:220)
> 
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> 
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> 
>  at scala.collection.immutable.Map$Map3.foreach(Map.scala:161)
> 
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> 
>  at scala.collection.AbstractTraversable.map(Traversable.scala:104)
> 
>  at 
> org.apache.samza.container.SamzaContainer$.apply(SamzaContainer.scala:220)
> 
>  at org.apache.samza.container.SamzaContainer.apply(SamzaContainer.scala)
> 
>  at 
> org.apache.samza.processor.StreamProcessor.createSamzaContainer(StreamProcessor.java:198)
> 
>  at 
> org.apache.samza.processor.StreamProcessor$1.onNewJobModel(StreamProcessor.java:290)
> 
>  at 
> org.apache.samza.zk.ZkJobCoordinator.onNewJobModelConfirmed(ZkJobCoordinator.java:304)
> 
>  at 
> org.apache.samza.zk.ZkJobCoordinator$ZkBarrierListenerImpl.lambda$onBarrierStateChanged$1(ZkJobCoordinator.java:394)
> 
>  at 
> org.apache.samza.zk.ScheduleAfterDebounceTime.lambda$getScheduleableAction$0(ScheduleAfterDebounceTime.java:163)
> 
>  at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> 
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> 
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 
>  at java.lang.Thread.run(Thread.java:748)
> 
> This exception does not stop the execution, however I would like to 
> understand the reason for this and possible resolution.
> 
> Thanks,
> Omkar Deshpande

Re: app.class or task.class for beam samza runner

2019-01-03 Thread Prateek Maheshwari

Hi Omkar,

I think it's only possible to get that exception with Samza 1.0. Can
you verify that the deployment is indeed using samza 0.14.1?

Thanks,
Prateek

On Wed, Jan 2, 2019 at 11:40 PM Deshpande, Omkar
 wrote:
>
> Hello,
>
> I have been able to execute my Samza-Beam application in Local mode. And now 
> I am trying to run a Samza-Beam application in Standalone mode.
>
> Here is my configFile  config.properties:
>
> app.name=test-app
> job.coordinator.factory=org.apache.samza.zk.ZkJobCoordinatorFactory
> job.coordinator.zk.connect=localhost:2181
> job.coordinator.system=kafka
> job.factory.class=org.apache.samza.job.local.ProcessJobFactory
> # Kafka System
> systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
> systems.kafka.consumer.zookeeper.connect=localhost:2181
> systems.kafka.producer.bootstrap.servers=localhost:9092
> systems.kafka.default.stream.replication.factor=1
>
> I am getting following exception:
>
> org.apache.samza.config.ConfigException: Legacy task applications must set a 
> non-empty task.class in configuration.
>
>   at 
> org.apache.samza.application.ApplicationUtil.fromConfig(ApplicationUtil.java:58)
>
>   at 
> org.apache.samza.runtime.LocalContainerRunner.main(LocalContainerRunner.java:87)
>
> Versions:
> 2.9.0
> 0.14.1
>
> As per my understanding, I shouldn’t have to create implementation of 
> StreamApplication or StreamTask while using Beam SDK.
>
> An example of configFile for Samza-Beam Standalone application would be 
> helpful.
>
> Regards,
> Omkar Deshpande

Re: https://issues.apache.org/jira/browse/SAMZA-2039

2018-12-18 Thread Prateek Maheshwari

For notifying others, you can leave a comment on the ticket that you're working 
on it. Additionally, you can assign the ticket to yourself if you have the 
permissions to do so. 

Thanks for your interest, and please let us know if you need any help.

- Prateek

> On Dec 17, 2018, at 8:27 PM, blitzerr  wrote:
> 
> What is the protocol to work on an issue ? Do I claim it first ? How do we
> make sure multiple people are starting on the same one ?

Re: Welcome Hai Lu and Aditya Toomla as committers to Apache Samza!

2018-11-06 Thread Prateek Maheshwari

Congrats Hai and Aditya, and thanks for your contributions!

- Prateek

> On Nov 6, 2018, at 10:40 AM, Wei Song  wrote:
> 
> Congrats Hai and Aditya!
> 
> 
> 
> On 11/6/18, 10:20 AM, "Yi Pan"  wrote:
> 
>Hi, all,
> 
>All official steps are completed and please join me to welcome Hai and
>Aditya to Apache Samza community as committers! They have been making
>significant contribution to many important projects in Samza such as SQL,
>Samza-on-Hadoop, Kinesis connector, etc.
> 
>Welcome Hai and Aditya!
> 
>-Yi
> 
>

Re: [VOTE] Apache Samza 1.0.0 RC4

2018-11-02 Thread Prateek Maheshwari

Verified signatures and successfully ran check-all and integration tests.

+1 (binding) from me.

Thanks,
Prateek

On Fri, Nov 2, 2018 at 2:39 PM Boris S  wrote:
>
> ran check-all and integration tests. All passed.
> verified signatures.
> +1
>
> On Wed, Oct 31, 2018 at 7:15 PM Jagadish Venkatraman 
> wrote:
>
> > Hi all,
> >
> > This is a call for a vote on a release of Apache Samza 1.0.0. Thanks to
> > everyone who has contributed to this release.
> >
> > The release candidate can be downloaded from here:
> > http://home.apache.org/~jagadish/samza-1.0.0-rc4/
> >
> > The release candidate is signed with pgp key AF81FFBF, which can be found
> > on keyservers:
> > http://pgp.mit.edu/pks/lookup?op=get=0xAF81FFBF
> >
> > The git tag is release-1.0.0-rc4 and signed with the same pgp key:
> >
> > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.0.0-rc4
> >
> > Test binaries have been published to Maven's staging repository, and are
> > available here:
> > https://repository.apache.org/content/repositories/orgapachesamza-1055/
> >
> > The vote will be open for 72 hours (ending at 7:00 PM PST Saturday,
> > November 3).
> >
> > Please download the release candidate, check the hashes/signature, build it
> > and test it, and then please vote:
> >
> > [ ] +1 approve
> >
> > [ ] +0 no opinion
> >
> > [ ] -1 disapprove (and reason why)
> >
> > For me, I ran check-all.sh, integration tests and verified the SQL console
> > in samza-tool tgz. So +1 (binding) from my side.
> >
> > Thanks,
> > Jagadish
> >
> > --
> > Jagadish V
> >

[CANCEL] [VOTE] Apache Samza 1.0.0 RC2

2018-10-30 Thread Prateek Maheshwari

Hi all,

This is the CANCEL notification for the 1.0.0 RC3. We found an issue
with Samza SQL integration with the new ApplicationRunners API that we
will fix in the new RC.

Thanks,
Prateek

Re: [VOTE] Apache Samza 1.0.0 RC3

2018-10-30 Thread Prateek Maheshwari

We found an issue with Samza SQL integration with the new
ApplicationRunners APIs. We'll cancel this vote and create a new RC.

Thanks,
Prateek
On Tue, Oct 30, 2018 at 10:14 AM Jake Maes  wrote:
>
> +1 binding
>
> Ran check-all on OSX with Gradle 2.8
>
> On Mon, Oct 29, 2018 at 10:13 PM Prateek Maheshwari 
> wrote:
>
> > Hi all,
> >
> > This is a call for a vote on a release of Apache Samza 1.0.0. Thanks to
> > everyone who has contributed to this release.
> >
> > The release candidate can be downloaded from here:
> > http://home.apache.org/~pmaheshwari/samza-1.0.0-rc3/
> >
> > The release candidate is signed with pgp key 6585B3D7, which can be found
> > on keyservers: https://pgp.mit.edu/pks/lookup?op=get=0x6585B3D7
> >
> > The git tag is release-1.0.0-rc3 and signed with the same pgp key:
> >
> > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.0.0-rc3
> >
> > Test binaries have been published to Maven's staging repository, and are
> > available here:
> > https://repository.apache.org/content/repositories/orgapachesamza-1054/
> >
> > The vote will be open for 72 hours (ending at 10:00 PM PST Thursday,
> > 11/01/2018).
> >
> > Please download the release candidate, check the hashes/signature, build it
> > and test it, and then please vote:
> >
> > [ ] +1 approve
> >
> > [ ] +0 no opinion
> >
> > [ ] -1 disapprove (and reason why)
> >
> > For me, I ran check-all.sh, integration tests and verified the SQL console
> > in samza-tool tgz. So +1 (non-binding) from my side.
> >
> > Thanks,
> > Prateek
> >

[VOTE] Apache Samza 1.0.0 RC3

2018-10-29 Thread Prateek Maheshwari

Hi all,

This is a call for a vote on a release of Apache Samza 1.0.0. Thanks to
everyone who has contributed to this release.

The release candidate can be downloaded from here:
http://home.apache.org/~pmaheshwari/samza-1.0.0-rc3/

The release candidate is signed with pgp key 6585B3D7, which can be found
on keyservers: https://pgp.mit.edu/pks/lookup?op=get=0x6585B3D7

The git tag is release-1.0.0-rc3 and signed with the same pgp key:
https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.0.0-rc3

Test binaries have been published to Maven's staging repository, and are
available here:
https://repository.apache.org/content/repositories/orgapachesamza-1054/

The vote will be open for 72 hours (ending at 10:00 PM PST Thursday,
11/01/2018).

Please download the release candidate, check the hashes/signature, build it
and test it, and then please vote:

[ ] +1 approve

[ ] +0 no opinion

[ ] -1 disapprove (and reason why)

For me, I ran check-all.sh, integration tests and verified the SQL console
in samza-tool tgz. So +1 (non-binding) from my side.

Thanks,
Prateek

[CANCEL] [VOTE] Apache Samza 1.0.0 RC2

2018-10-25 Thread Prateek Maheshwari

Hi all,

This is the CANCEL notification for the 1.0.0 RC2. We found a
test framework message serialization issue that we will fix in the new RC.

Thanks,
Prateek

Re: [VOTE] Apache Samza 1.0.0 RC2

2018-10-25 Thread Prateek Maheshwari

We found another issue that affects message serialization in the test
framework. We will cancel this vote, fix the issue and create another
RC soon.

Thanks,
Prateek

On Thu, Oct 25, 2018 at 12:38 AM santhosh venkat
 wrote:
>
> 1. ./bin/check-all.sh succeeded.
> 2. Both the commands ./bin/integration-tests.sh yarn-integration-tests and
> ./bin/integration-tests.sh standalone-integration-tests succeeded.
> 3. Verified the SQL console available in samza-tool tgz.
>
> +1
>
> Thanks.
>
> On Wed, Oct 24, 2018 at 10:12 PM Yi Pan  wrote:
>
> > Ran check-all and deployed locally with the test jobs. All tests passed.
> >
> > +1 (binding) from my end.
> >
> > Thanks for push the release!
> >
> > -Yi
> >
> > On Wed, Oct 24, 2018 at 8:53 AM Prateek Maheshwari 
> > wrote:
> >
> > > Hi Jagadish,
> > >
> > > PR 755 is mis-titled. Its only adding back the tests for the old
> > > consumer. The old consumer was already added back in
> > > https://github.com/apache/samza/pull/740.
> > >
> > > Thanks,
> > > Prateek
> > > On Wed, Oct 24, 2018 at 12:02 AM Jagadish Venkatraman
> > >  wrote:
> > > >
> > > > Boris,
> > > >
> > > > Do users have the option to switch to use the "old" Kafka consumer if
> > > they
> > > > encounter any issue with the "new" consumer?. If not, should we pull in
> > > > https://github.com/apache/samza/pull/755? It is my understanding that
> > > > PR-755 adds support for this.
> > > >
> > > > Thanks,
> > > > Jagadish
> > > >
> > > > On Tue, Oct 23, 2018 at 2:50 PM Boris S  wrote:
> > > >
> > > > > Ran build, test and integration test on Linux.
> > > > > Verified the signatures.
> > > > >
> > > > > +1
> > > > >
> > > > > On Tue, Oct 23, 2018 at 11:55 AM Prateek Maheshwari <
> > > prate...@utexas.edu>
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > This is a call for a vote on a release of Apache Samza 1.0.0.
> > Thanks
> > > to
> > > > > > everyone who has contributed to this release.
> > > > > >
> > > > > > The release candidate can be downloaded from here:
> > > > > > http://home.apache.org/~pmaheshwari/samza-1.0.0-rc2/
> > > > > >
> > > > > > The release candidate is signed with pgp key 6585B3D7, which can be
> > > found
> > > > > > on keyservers:
> > > https://pgp.mit.edu/pks/lookup?op=get=0x6585B3D7
> > > > > >
> > > > > > The git tag is release-1.0.0-rc2 and signed with the same pgp key:
> > > > > >
> > > > > >
> > > > >
> > >
> > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.0.0-rc2
> > > > > >
> > > > > > Test binaries have been published to Maven's staging repository,
> > and
> > > are
> > > > > > available here:
> > > > > >
> > > https://repository.apache.org/content/repositories/orgapachesamza-1053/
> > > > > >
> > > > > > The vote will be open for 72 hours (ending at 12:00 PM PST Friday,
> > > > > > 10/26/2018).
> > > > > >
> > > > > > Please download the release candidate, check the hashes/signature,
> > > build
> > > > > it
> > > > > > and test it, and then please vote:
> > > > > >
> > > > > > [ ] +1 approve
> > > > > >
> > > > > > [ ] +0 no opinion
> > > > > >
> > > > > > [ ] -1 disapprove (and reason why)
> > > > > >
> > > > > > For me, I ran check-all.sh, integration tests and verified the SQL
> > > > > console
> > > > > > in samza-tool tgz. So +1 (non-binding) from my side.
> > > > > >
> > > > > > Thanks,
> > > > > > Prateek
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Jagadish V,
> > > > Graduate Student,
> > > > Department of Computer Science,
> > > > Stanford University
> > >
> >

Re: [VOTE] Apache Samza 1.0.0 RC2

2018-10-24 Thread Prateek Maheshwari

Hi Jagadish,

PR 755 is mis-titled. Its only adding back the tests for the old
consumer. The old consumer was already added back in
https://github.com/apache/samza/pull/740.

Thanks,
Prateek
On Wed, Oct 24, 2018 at 12:02 AM Jagadish Venkatraman
 wrote:
>
> Boris,
>
> Do users have the option to switch to use the "old" Kafka consumer if they
> encounter any issue with the "new" consumer?. If not, should we pull in
> https://github.com/apache/samza/pull/755? It is my understanding that
> PR-755 adds support for this.
>
> Thanks,
> Jagadish
>
> On Tue, Oct 23, 2018 at 2:50 PM Boris S  wrote:
>
> > Ran build, test and integration test on Linux.
> > Verified the signatures.
> >
> > +1
> >
> > On Tue, Oct 23, 2018 at 11:55 AM Prateek Maheshwari 
> > wrote:
> >
> > > Hi all,
> > >
> > > This is a call for a vote on a release of Apache Samza 1.0.0. Thanks to
> > > everyone who has contributed to this release.
> > >
> > > The release candidate can be downloaded from here:
> > > http://home.apache.org/~pmaheshwari/samza-1.0.0-rc2/
> > >
> > > The release candidate is signed with pgp key 6585B3D7, which can be found
> > > on keyservers: https://pgp.mit.edu/pks/lookup?op=get=0x6585B3D7
> > >
> > > The git tag is release-1.0.0-rc2 and signed with the same pgp key:
> > >
> > >
> > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.0.0-rc2
> > >
> > > Test binaries have been published to Maven's staging repository, and are
> > > available here:
> > > https://repository.apache.org/content/repositories/orgapachesamza-1053/
> > >
> > > The vote will be open for 72 hours (ending at 12:00 PM PST Friday,
> > > 10/26/2018).
> > >
> > > Please download the release candidate, check the hashes/signature, build
> > it
> > > and test it, and then please vote:
> > >
> > > [ ] +1 approve
> > >
> > > [ ] +0 no opinion
> > >
> > > [ ] -1 disapprove (and reason why)
> > >
> > > For me, I ran check-all.sh, integration tests and verified the SQL
> > console
> > > in samza-tool tgz. So +1 (non-binding) from my side.
> > >
> > > Thanks,
> > > Prateek
> > >
> >
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University

[VOTE] Apache Samza 1.0.0 RC2

2018-10-23 Thread Prateek Maheshwari

Hi all,

This is a call for a vote on a release of Apache Samza 1.0.0. Thanks to
everyone who has contributed to this release.

The release candidate can be downloaded from here:
http://home.apache.org/~pmaheshwari/samza-1.0.0-rc2/

The release candidate is signed with pgp key 6585B3D7, which can be found
on keyservers: https://pgp.mit.edu/pks/lookup?op=get=0x6585B3D7

The git tag is release-1.0.0-rc2 and signed with the same pgp key:
https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.0.0-rc2

Test binaries have been published to Maven's staging repository, and are
available here:
https://repository.apache.org/content/repositories/orgapachesamza-1053/

The vote will be open for 72 hours (ending at 12:00 PM PST Friday, 10/26/2018).

Please download the release candidate, check the hashes/signature, build it
and test it, and then please vote:

[ ] +1 approve

[ ] +0 no opinion

[ ] -1 disapprove (and reason why)

For me, I ran check-all.sh, integration tests and verified the SQL console
in samza-tool tgz. So +1 (non-binding) from my side.

Thanks,
Prateek

[CANCEL] [VOTE] Apache Samza 1.0.0 RC1

2018-10-23 Thread Prateek Maheshwari

Hi all,

This is the CANCEL notification for the 1.0.0 RC1. We found a
checkstyle issue that we will fix in the new RC.

Thanks,
Prateek

Re: [VOTE] Apache Samza 1.0.0 RC1

2018-10-23 Thread Prateek Maheshwari

Thanks for verifying Shanthoosh. We'll cancel this vote, fix the error
and create a new RC.

- Prateek
On Mon, Oct 22, 2018 at 10:31 PM santhosh venkat
 wrote:
>
> I tried building the release candidate(RC1) and it fails with the following
> checkstyle errors.
>
> [ant:checkstyle]
> /Users/svenkata/Documents/apache-samza-1.0.0-src/samza-test/src/main/java/org/apache/samza/test/integration/TestStandaloneIntegrationApplication.java:43:
> 'member def type' have incorrect indentation level 5, expected level should
> be 4.
> [ant:checkstyle]
> /Users/svenkata/Documents/apache-samza-1.0.0-src/samza-test/src/main/java/org/apache/samza/test/integration/TestStandaloneIntegrationApplication.java:43:
> 'method def' child have incorrect indentation level 5, expected level
> should be 4.
> [ant:checkstyle]
> /Users/svenkata/Documents/apache-samza-1.0.0-src/samza-test/src/main/java/org/apache/samza/test/integration/TestStandaloneIntegrationApplication.java:44:
> 'method def' child have incorrect indentation level 5, expected level
> should be 4.
>
> Thanks.
>
> On Mon, Oct 22, 2018 at 9:09 PM Prateek Maheshwari 
> wrote:
>
> > Hi all,
> >
> > This is a call for a vote on a release of Apache Samza 1.0.0. Thanks to
> > everyone who has contributed to this release.
> >
> > The release candidate can be downloaded from here:
> > http://home.apache.org/~pmaheshwari/samza-1.0.0-rc1/
> >
> > The release candidate is signed with pgp key 6585B3D7, which can be found
> > on keyservers: https://pgp.mit.edu/pks/lookup?op=get=0x6585B3D7
> >
> > The git tag is release-1.0.0-rc1 and signed with the same pgp key:
> >
> > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.0.0-rc1
> >
> > Test binaries have been published to Maven's staging repository, and are
> > available here:
> > https://repository.apache.org/content/repositories/orgapachesamza-1052/
> >
> > The vote will be open for 72 hours (ending at 9:00 PM PST Thursday,
> > 10/25/2018).
> >
> > Please download the release candidate, check the hashes/signature, build it
> > and test it, and then please vote:
> >
> > [ ] +1 approve
> >
> > [ ] +0 no opinion
> >
> > [ ] -1 disapprove (and reason why)
> >
> > For me, I ran check-all.sh, integration tests and verified the SQL console
> > in samza-tool tgz. So +1 (non-binding) from my side.
> >
> > Thanks,
> > Prateek
> >

[VOTE] Apache Samza 1.0.0 RC1

2018-10-22 Thread Prateek Maheshwari

Hi all,

This is a call for a vote on a release of Apache Samza 1.0.0. Thanks to
everyone who has contributed to this release.

The release candidate can be downloaded from here:
http://home.apache.org/~pmaheshwari/samza-1.0.0-rc1/

The release candidate is signed with pgp key 6585B3D7, which can be found
on keyservers: https://pgp.mit.edu/pks/lookup?op=get=0x6585B3D7

The git tag is release-1.0.0-rc1 and signed with the same pgp key:
https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.0.0-rc1

Test binaries have been published to Maven's staging repository, and are
available here:
https://repository.apache.org/content/repositories/orgapachesamza-1052/

The vote will be open for 72 hours (ending at 9:00 PM PST Thursday, 10/25/2018).

Please download the release candidate, check the hashes/signature, build it
and test it, and then please vote:

[ ] +1 approve

[ ] +0 no opinion

[ ] -1 disapprove (and reason why)

For me, I ran check-all.sh, integration tests and verified the SQL console
in samza-tool tgz. So +1 (non-binding) from my side.

Thanks,
Prateek

[CANCEL][VOTE] Apache Samza 1.0.0 RC0

2018-10-22 Thread Prateek Maheshwari

Hi all,

This is the CANCEL notification for the 1.0.0 RC0. We found an
integration test setup issue that we will fix. We will also include
the following PR in the new RC:
SAMZA-1901: Implementation of Samza SQL Shell,

Thanks,
Prateek

Re: [VOTE] Apache Samza 1.0.0 RC0

2018-10-22 Thread Prateek Maheshwari

Thanks for verifying Jagadish. I'll fix the test failure and create a
new RC and cancel the vote for this RC.

- Prateek
On Mon, Oct 22, 2018 at 4:23 PM Jagadish Venkatraman
 wrote:
>
> I ran the integration test and encountered this failure:
>
> 2018-10-22 15:58:43,687 zopkio.test_runner [INFO] test_samza_jobfailed
> 2018-10-22 15:58:43,688 zopkio.test_runner [INFO] ['AssertionError: Job
> (negate_number) appears not to have started. Expected to see a log line
> matching regex: .*Submitted application (\\w*)\n']
> 2018-10-22 15:58:43,688 zopkio.test_runner [INFO]
> test_container_performancefailed
> 2018-10-22 15:58:43,688 zopkio.test_runner [INFO] ['AssertionError: Job
> (container-performance) appears not to have started. Expected to see a log
> line matching regex: .*Submitted application (\\w*)\n']
> 2018-10-22 15:58:43,688 zopkio.test_runner [INFO]
> test_kafka_read_write_performancefailed
> 2018-10-22 15:58:43,688 zopkio.test_runner [INFO] ['AssertionError: Job
> (kafka-read-write-performance) appears not to have started. Expected to see
> a log line matching regex: .*Submitted application (\\w*)\n']
>
>
> -- Jagadish
>
>
>
> On Fri, Oct 19, 2018 at 6:59 PM Prateek Maheshwari 
> wrote:
>
> > Hi all,
> >
> > This is a call for a vote on a release of Apache Samza 1.0.0. Thanks to
> > everyone who has contributed to this release.
> >
> > The release candidate can be downloaded from here:
> > http://home.apache.org/~pmaheshwari/samza-1.0.0-rc0/
> >
> > The release candidate is signed with pgp key 6585B3D7, which can be found
> > on keyservers: https://pgp.mit.edu/pks/lookup?op=get=0x6585B3D7
> >
> > The git tag is release-1.0.0-rc0 and signed with the same pgp key:
> >
> > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.0.0-rc0
> >
> > Test binaries have been published to Maven's staging repository, and are
> > available here:
> > https://repository.apache.org/content/repositories/orgapachesamza-1051/
> >
> > The vote will be open for 72 hours (ending at 7:00 PM PST Monday,
> > 10/22/2018).
> >
> > Please download the release candidate, check the hashes/signature, build it
> > and test it, and then please vote:
> >
> > [ ] +1 approve
> >
> > [ ] +0 no opinion
> >
> > [ ] -1 disapprove (and reason why)
> >
> > For me, I ran check-all.sh, integration tests and verified the SQL console
> > in samza-tool tgz. So +1 (non-binding) from my side.
> >
> > Thanks,
> > Prateek
> >
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University

[VOTE] Apache Samza 1.0.0 RC0

2018-10-19 Thread Prateek Maheshwari

Hi all,

This is a call for a vote on a release of Apache Samza 1.0.0. Thanks to
everyone who has contributed to this release.

The release candidate can be downloaded from here:
http://home.apache.org/~pmaheshwari/samza-1.0.0-rc0/

The release candidate is signed with pgp key 6585B3D7, which can be found
on keyservers: https://pgp.mit.edu/pks/lookup?op=get=0x6585B3D7

The git tag is release-1.0.0-rc0 and signed with the same pgp key:
https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.0.0-rc0

Test binaries have been published to Maven's staging repository, and are
available here:
https://repository.apache.org/content/repositories/orgapachesamza-1051/

The vote will be open for 72 hours (ending at 7:00 PM PST Monday, 10/22/2018).

Please download the release candidate, check the hashes/signature, build it
and test it, and then please vote:

[ ] +1 approve

[ ] +0 no opinion

[ ] -1 disapprove (and reason why)

For me, I ran check-all.sh, integration tests and verified the SQL console
in samza-tool tgz. So +1 (non-binding) from my side.

Thanks,
Prateek

Re: [VOTE] SEP-15: New Runtime Context API

2018-10-19 Thread Prateek Maheshwari

Hi all,

This VOTE has been up for > 72 hours and has 4 binding and 1
non-binding votes with no vetos. SEP-15 is now accepted. Thanks for
the contribution, Cameron!

- PrateekOn Tue, Oct 16, 2018 at 7:35 PM Jake Maes  wrote:
>
> +1 (binding)
>
> On Tue, Oct 16, 2018 at 3:00 PM Yi Pan  wrote:
>
> > +1 (binding)
> >
> > This has been long-waited feature to allow us to have better control and
> > access to shared object in different scope of context!
> >
> > -Yi
> >
> > On Mon, Oct 15, 2018 at 10:47 AM Jagadish Venkatraman <
> > jagadish1...@gmail.com> wrote:
> >
> > > +1 (binding) from my side.
> > >
> > > LGTM
> > >
> > > On Mon, Oct 15, 2018 at 10:44 AM Prateek Maheshwari  > >
> > > wrote:
> > >
> > > > +1 (non-binding) for these changes.
> > > >
> > > > (Resending from a non-LI email due to email delivery issues)
> > > >
> > > > - Prateek
> > > > On Fri, Oct 12, 2018 at 3:28 PM Cameron Lee 
> > wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > SEP-15 has been updated now that SAMZA-1714 has been reviewed and
> > > > implemented.
> > > > > Please vote on whether there are further breaking changes needed in
> > the
> > > > API or we can accept this proposal to be included in Samza 1.0.
> > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/SAMZA/SEP-15%3A+New+Runtime+Context+API
> > > > >
> > > > > Thank you,
> > > > > Cameron
> > > >
> > >
> > >
> > > --
> > > Jagadish V,
> > > Graduate Student,
> > > Department of Computer Science,
> > > Stanford University
> > >
> >

Re: [VOTE] SEP-14: System and Stream Descriptors

2018-10-17 Thread Prateek Maheshwari

Hi all,

This vote has been up for over 72 hours and has 3 binding votes, so
SEP-14 is now accepted. Thanks for reviewing and voting.

- Prateek
On Mon, Oct 15, 2018 at 10:42 AM Jagadish Venkatraman
 wrote:
>
> +1 binding from my side, thanks! This is a great addition to Samza-1.0
>
> On Fri, Oct 12, 2018 at 12:30 PM Prateek Maheshwari 
> wrote:
>
> > Hi folks,
> >
> > Now that SAMZA-1804 has been implemented and reviewed, we've updated
> > SEP-14 with the latest APIs and design decisions.
> >
> > Please vote for accepting SEP-14 in its current form for the upcoming
> > Samza 1.0 release.
> >
> > https://cwiki.apache.org/confluence/display/SAMZA/SEP-14%3A+System+and+Stream+Descriptors
> >
> > Thanks,
> > Prateek
> >
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University

Re: [VOTE] SEP-15: New Runtime Context API

2018-10-15 Thread Prateek Maheshwari

+1 (non-binding) for these changes.

(Resending from a non-LI email due to email delivery issues)

- Prateek
On Fri, Oct 12, 2018 at 3:28 PM Cameron Lee  wrote:
>
> Hi all,
>
> SEP-15 has been updated now that SAMZA-1714 has been reviewed and implemented.
> Please vote on whether there are further breaking changes needed in the API 
> or we can accept this proposal to be included in Samza 1.0.
>
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-15%3A+New+Runtime+Context+API
>
> Thank you,
> Cameron

Re: [VOTE] SEP-15: New Runtime Context API

2018-10-15 Thread Prateek Maheshwari

+1 (non-binding) for these changes.

- Prateek

> On Oct 12, 2018, at 3:27 PM, Cameron Lee  wrote:
> 
> Hi all,
> 
> SEP-15 has been updated now that SAMZA-1714 has been reviewed and implemented.
> Please vote on whether there are further breaking changes needed in the API 
> or we can accept this proposal to be included in Samza 1.0.
> 
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-15%3A+New+Runtime+Context+API
> 
> Thank you,
> Cameron

Re: [VOTE] SEP-13: unified ApplicationDescriptor and ApplicationRunner APIs for high and low- level APIs in YARN and standalone deployment

2018-10-12 Thread Prateek Maheshwari

+1 (non-binding) from me. Thanks for making the changes and updating the SEP!

- Prateek

On Fri, Oct 12, 2018 at 12:15 PM Yi Pan  wrote:
>
> Hi, all,
>
> Given SAMZA-1789 has been reviewed and implemented, SEP-13 has been updated
> to the latest API classes as well. Please vote on whether there is further
> breaking changes needed in the API, or we can accept this proposal and seal
> it for 1.0.
>
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-13%3A+unify+high-+and+low-level+user+applications+in+YARN+and+standalone
>
> Thanks a lot!
>
> This email serves as my +1 (binding) to accept SEP-13.
>
> -Yi

[VOTE] SEP-14: System and Stream Descriptors

2018-10-12 Thread Prateek Maheshwari

Hi folks,

Now that SAMZA-1804 has been implemented and reviewed, we've updated
SEP-14 with the latest APIs and design decisions.

Please vote for accepting SEP-14 in its current form for the upcoming
Samza 1.0 release.
https://cwiki.apache.org/confluence/display/SAMZA/SEP-14%3A+System+and+Stream+Descriptors

Thanks,
Prateek

Re: Need Some Help w/ Gradle Build on OpenJDK 11

2018-10-11 Thread Prateek Maheshwari

Hi Jeremiah,

We fixed a Rat related issue yesterday in
https://github.com/apache/samza/pull/703/. I don't know if this is the
same issue you were running into, but might be worth trying again with
the latest master.

Thanks,
Prateek
On Wed, Oct 10, 2018 at 6:58 AM Jeremiah Adams
 wrote:
>
> Anyone have a few cycles to help me out with Apache Rat failing the build on 
> OpenJDK 11?
>
> Some things I’ve tried:
>
> Bumped Gradle to 4.10.2 (required for Java11). Bumped the Rat version to 0.12.
>
>
>
> Jeremiah Adams
> Software Engineer
> www.helixeducation.com
> Blog | Twitter | Facebook | LinkedIn
>
> 
> From: Jeremiah Adams 
> Sent: Tuesday, October 2, 2018 11:58 AM
> To: dev@samza.apache.org
> Subject: [POSSIBLE PHISHING] Need Some Help w/ Gradle Build on OpenJDK 11
>
> I know very little about Gradle.  I got a response on the issue I opened 
> regarding Gradle builds failing on OpenJDK 11. I've since upgraded the Gradle 
> version in gradle-wrapper.properties to 4.10.2. Build now gets past the 
> java11 issue but is dying on the Apache rat task.  The build/plugin can't 
> find a stylesheet.  I've had no luck chasing this down, can anyone point me 
> in the right direction?
>
> Output from build:
>
> jeremiah:samza jeremiah$ gradle --stacktrace clean build
> > Task :rat FAILED
>
> FAILURE: Build failed with an exception.
>
> * Where:
> Script '/Users/jeremiah/projects/open_source/samza/gradle/rat.gradle' line: 90
>
> * What went wrong:
> Execution failed for task ':rat'.
> > stylesheet 
> > /Users/jeremiah/.gradle/daemon/4.10.2/gradle/resources/rat-output-to-html.xsl
> >  doesn't exist.
>
> * Try:
> Run with --info or --debug option to get more log output. Run with --scan to 
> get full insights.
>
> * Exception is:
> org.gradle.api.tasks.TaskExecutionException: Execution failed for task ':rat'.
> at 
> org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.executeActions(ExecuteActionsTaskExecuter.java:110)
> at 
> org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.execute(ExecuteActionsTaskExecuter.java:77)
> at 
> org.gradle.api.internal.tasks.execution.OutputDirectoryCreatingTaskExecuter.execute(OutputDirectoryCreatingTaskExecuter.java:51)
> at 
> org.gradle.api.internal.tasks.execution.SkipUpToDateTaskExecuter.execute(SkipUpToDateTaskExecuter.java:59)
> at 
> org.gradle.api.internal.tasks.execution.ResolveTaskOutputCachingStateExecuter.execute(ResolveTaskOutputCachingStateExecuter.java:54)
> at 
> org.gradle.api.internal.tasks.execution.ValidatingTaskExecuter.execute(ValidatingTaskExecuter.java:59)
> at 
> org.gradle.api.internal.tasks.execution.SkipEmptySourceFilesTaskExecuter.execute(SkipEmptySourceFilesTaskExecuter.java:101)
> at 
> org.gradle.api.internal.tasks.execution.FinalizeInputFilePropertiesTaskExecuter.execute(FinalizeInputFilePropertiesTaskExecuter.java:44)
> at 
> org.gradle.api.internal.tasks.execution.CleanupStaleOutputsExecuter.execute(CleanupStaleOutputsExecuter.java:91)
> at 
> org.gradle.api.internal.tasks.execution.ResolveTaskArtifactStateTaskExecuter.execute(ResolveTaskArtifactStateTaskExecuter.java:62)
> at 
> org.gradle.api.internal.tasks.execution.SkipTaskWithNoActionsExecuter.execute(SkipTaskWithNoActionsExecuter.java:59)
> at 
> org.gradle.api.internal.tasks.execution.SkipOnlyIfTaskExecuter.execute(SkipOnlyIfTaskExecuter.java:54)
> at 
> org.gradle.api.internal.tasks.execution.ExecuteAtMostOnceTaskExecuter.execute(ExecuteAtMostOnceTaskExecuter.java:43)
> at 
> org.gradle.api.internal.tasks.execution.CatchExceptionTaskExecuter.execute(CatchExceptionTaskExecuter.java:34)
> at 
> org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.run(EventFiringTaskExecuter.java:51)
> at 
> org.gradle.internal.operations.DefaultBuildOperationExecutor$RunnableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:300)
> at 
> org.gradle.internal.operations.DefaultBuildOperationExecutor$RunnableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:292)
> at 
> org.gradle.internal.operations.DefaultBuildOperationExecutor.execute(DefaultBuildOperationExecutor.java:174)
> at 
> org.gradle.internal.operations.DefaultBuildOperationExecutor.run(DefaultBuildOperationExecutor.java:90)
> at 
> org.gradle.internal.operations.DelegatingBuildOperationExecutor.run(DelegatingBuildOperationExecutor.java:31)
> at 
> org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter.execute(EventFiringTaskExecuter.java:46)
> at 
> org.gradle.execution.taskgraph.LocalTaskInfoExecutor.execute(LocalTaskInfoExecutor.java:42)
> at 
> org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$BuildOperationAwareWorkItemExecutor.execute(DefaultTaskExecutionGraph.java:277)
> at 
>

Re: [DISCUSS] SEP-14: System and Stream Descriptors

2018-08-21 Thread Prateek Maheshwari

Hi folks,

I updated SEP-14 based on some feedback and discussions on the PR. Major 
changes are:

1. We will not support system level serdes _out of the box_ (e.g., for 
GenericSystemDescritors). Users must specify a stream level serde when creating 
Input/OutputStreamDescriptors. System implementers may still choose to support 
system level serdes for their own descriptors.

2. We will not support stream level input transformers in the initial 
implementation. We can add them later if we find this to be a common use case. 

3. We changed the SystemDescriptor base classes to optional interfaces for 
providing input and output descriptors. This allows system implementers more 
flexibility in providing input and output descriptors to users.

Please let me know if you have any questions or suggestions. If not, we can 
move SEP-14 from discussion to voting.

Thanks,
Prateek

> On Aug 6, 2018, at 4:20 PM, Prateek Maheshwari  wrote:
> 
> Hi all,
> 
> Here's the proposal for System and Stream Descriptors - a way of
> specifying systems, input and output streams properties in application
> code instead of configurations.
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-14%3A+System+and+Stream+Descriptors
> 
> Here's the PR with an implementation for the proposal above:
> https://github.com/apache/samza/pull/603
> 
> Please let me know if you have any questions or feedback for the SEP above.
> 
> Thanks,
> Prateek

[DISCUSS] SEP-14: System and Stream Descriptors

2018-08-06 Thread Prateek Maheshwari

Hi all,

Here's the proposal for System and Stream Descriptors - a way of
specifying systems, input and output streams properties in application
code instead of configurations.
https://cwiki.apache.org/confluence/display/SAMZA/SEP-14%3A+System+and+Stream+Descriptors

Here's the PR with an implementation for the proposal above:
https://github.com/apache/samza/pull/603

Please let me know if you have any questions or feedback for the SEP above.

Thanks,
Prateek

Re: Samza 0.14.1 : OffsetOutOfRangeException even with auto.offset.reset=smallest

2018-07-09 Thread Prateek Maheshwari

Hi Thunder,

Can you provide debug level logs from KafkaSystemConsumer with the
stack trace for the exception? It'll help figure out why the
auto.offset.reset property isn't taking effect.

If this error is due to an older checkpoint for the stream, you can
try resetting the checkpoint using the following two configurations:
streams.stream-id.samza.reset.offset: If set to true, when a Samza
container starts up, it ignores any checkpointed offset for this
particular input stream. Its behavior is thus determined by the
samza.offset.default setting. Note that the reset takes effect every
time a container is started, which may be every time you restart your
job, or more frequently if a container fails and is restarted by the
framework.

streams.stream-id.samza.offset.default: If a container starts up
without a checkpoint, this property determines where in the input
stream we should start consuming. The value must be an OffsetType, one
of the following:
  upcoming: Start processing messages that are published after the job
starts. Any messages published while the job was not running are not
processed.
  oldest: Start processing at the oldest available message in the
system, and reprocess the entire available message history.

I.e., set 'samza.reset.offset' = true, and 'samza.offset.default' =
oldest for your stream. Let us know if this doesn't help.

Thanks,
Prateek

On Fri, Jul 6, 2018 at 11:43 AM, Thunder Stumpges  wrote:
> Hi all,
>
>
> We've just run into a strange problem with samza 0.14.1. We had a job down 
> for a bit, while kafka cleaned past our saved offsets. When starting the job 
> now, we get repeated 
> org.apache.kafka.common.errors.OffsetOutOfRangeException. And it just retries 
> over and over again. We HAVE set
>
> systems.kafka.consumer.auto.offset.reset=smallest as well. Has anyone else 
> seen this? Our understanding from the documentation is that this setting says 
> what to do if the offset is out of range.
>
>
>
> systems.system-name.consumer.auto.offset.reset : This setting determines what 
> happens if a consumer attempts to read an offset that is outside of the 
> current valid range. This could happen if the topic does not exist, or if a 
> checkpoint is older than the maximum message history retained by the brokers.
>
>
>
> This is the set of messages that keeps repeating:
>
>
>
> 2018-07-06 18:32:15 INFO  kafka.utils.VerifiableProperties - Verifying 
> properties
>
> 2018-07-06 18:32:15 INFO  kafka.utils.VerifiableProperties - Property 
> client.id is overridden to samza_consumer-stg_apollo_crawler_stream_task-1
>
> 2018-07-06 18:32:15 INFO  kafka.utils.VerifiableProperties - Property 
> metadata.broker.list is overridden to kafka-server.ntent.com:9092
>
> 2018-07-06 18:32:15 INFO  kafka.utils.VerifiableProperties - Property 
> request.timeout.ms is overridden to 3
>
> 2018-07-06 18:32:15 INFO  kafka.client.ClientUtils$ - Fetching metadata from 
> broker BrokerEndPoint(0,kafka-server,9092) with correlation id 12 for 1 
> topic(s) Set(my-topic)
>
> 2018-07-06 18:32:15 INFO  kafka.producer.SyncProducer - Connected to 
> kafka-server:9092 for producing
>
> 2018-07-06 18:32:15 INFO  kafka.producer.SyncProducer - Disconnecting from 
> kafka-server:9092
>
> 2018-07-06 18:32:15 INFO  o.a.samza.system.kafka.GetOffset - Validating 
> offset 6883929 for topic and partition [my-topic,10]
>
> 2018-07-06 18:32:15 WARN  o.a.s.s.kafka.KafkaSystemConsumer - While 
> refreshing brokers for [my-topic,10]: 
> org.apache.kafka.common.errors.OffsetOutOfRangeException: The requested 
> offset is not within the range of offsets maintained by the server.. Retrying.
>
>
>
> Thanks!
>
> Thunder
>
>

Re: Urgent : Help with latency / backlog / topic lag

2018-06-08 Thread Prateek Maheshwari

Good to hear its working now. Please feel free to reach out if you run
into any issues.

When you upgrade the job to 0.14.1, please remember to change
single.thread.mode to false (or remove this configuration): The bug in
SAMZA-1599 only affects the AsyncRunLoop implementation. Setting
"single.thread.mode = true" reverts to the older and synchronous
RunLoop implementation. Since the bug is fixed for AsyncRunLoop in
0.14.1, you can continue using it again.

Thanks,
Prateek

On Fri, Jun 8, 2018 at 3:23 PM, Thunder Stumpges  wrote:
> I set job.container.single.thread.mode = true
>
> And actually I think we did catch up with that setting. I have since 
> completed also the merge of 0.14.1 and we are able to keep up with the input 
> now.
>
> Thanks again for the pointers and the fast response!
>
> -Original Message-
> From: Prateek Maheshwari [mailto:prateek...@gmail.com]
> Sent: Friday, June 8, 2018 15:00
> To: dev@samza.apache.org
> Subject: Re: Urgent : Help with latency / backlog / topic lag
>
> Just to clarify, when you say you tried single threaded mode, do you mean 
> that you set job.container.thread.pool.size = 1, or that you set 
> job.container.single.thread.mode = true?
>
> On Fri, Jun 8, 2018 at 2:53 PM, Thunder Stumpges  wrote:
>> Thanks for the quick reply. That sounds very much like what I'm
>> seeing. I'm merging in 0.14.1 to our branch now. I did try single
>> threaded mode and unfortunately that didn't seem to make a significant
>> difference. Perhaps I do need some multithreading? I'm seeing a task
>> latency 0.2ms per message but still only achieve ~700/sec
>>
>>
>> -Original Message-
>> From: Prateek Maheshwari [mailto:prateek...@gmail.com]
>> Sent: Friday, June 8, 2018 13:54
>> To: dev@samza.apache.org
>> Subject: Re: Urgent : Help with latency / backlog / topic lag
>>
>> Hi Thunder,
>>
>>> What we believe may be happening is that most of the topics have no
>> backlog, but one topic has all the backlog (this is because one of the 
>> topics accounts for ~60% of the total message rate).  Could there be 
>> something inducing extra latency on processing the one topic with a backlog 
>> just having a bunch of other topics with NO backlog?
>> This seems very similar to this issue:
>> https://issues.apache.org/jira/browse/SAMZA-1599
>> This was fixed in https://github.com/apache/samza/pull/436, and the fix 
>> should be available in the 0.14.1 version.
>> Would it be possible to try upgrading to 0.14.1? It should be backwards 
>> compatible with 0.14.0.
>>
>> For something you can try without upgrading: try setting
>> "job.container.single.thread.mode" to true. From the configuration
>> reference
>> <https://samza.apache.org/learn/documentation/latest/jobs/configuration-table.html>:
>> "If set to true, samza will fallback to legacy single-threaded event loop.
>> Default is false, which enables the multithreading execution."
>>
>> Let us know if this doesn't help.
>>
>> Thanks,
>> Prateek
>>
>> On Fri, Jun 8, 2018 at 1:35 PM, Thunder Stumpges 
>> wrote:
>>
>>> We have a new samza job which we just put into production. This job
>>> processes many topics (~30) but the total rate is not that high
>>> (~1200/sec in aggregate). I am unable to get above ~700/sec and have a 
>>> growing backlog.
>>>
>>> We are running samza 0.12 (I have an update to 0.14 that is not
>>> tested or pushed yet).  When we load tested with a single topic, we
>>> could easily do several thousand per second. The latency of a single
>>> message is about 0.5ms as recorded by our timer metric on our 'process' 
>>> call.
>>>
>>> What we believe may be happening is that most of the topics have no
>>> backlog, but one topic has all the backlog (this is because one of
>>> the topics accounts for ~60% of the total message rate).  Could there
>>> be something inducing extra latency on processing the one topic with
>>> a backlog just having a bunch of other topics with NO backlog?
>>>
>>> Some things I have tried:
>>>
>>>
>>>   1.  Increasing thread pool (10->20->30), no change
>>>   2.  Going from 1 container to 2, no help (the two containers run at
>>> half the speed and total is the same)
>>>   3.  Increasing task.max.concurrency from 1 -> 2 -> 3  (this had
>>> some minor help going from 1 to 2, but not enough)
>>>   4.  Increasing fetch.threshold.bytes (currently at 100,000 and we
>>> have pretty small messages)
>>>
>>> Some observed metrics:
>>>
>>>
>>>   *   "Pending Messages" are > 0  (15+ on some partitions)
>>>   *   "Messages in flight" is almost always 0
>>>   *   Polls rate is ~50/sec
>>>   *   Message chooser "Choos Obj" is ~680-700/sec like our processing rate
>>>   *   Message chooser "choose null" is ~50/sec
>>>
>>> I'm somewhat at a loss because based on the actual processing latency
>>> we should easily be able to do 2000+ with just a small handful of threads.
>>>
>>> Thanks in advance, this is in production I really need a solution.
>>> Thunder
>>>
>>>

Re: Urgent : Help with latency / backlog / topic lag

2018-06-08 Thread Prateek Maheshwari

Just to clarify, when you say you tried single threaded mode, do you
mean that you set job.container.thread.pool.size = 1, or that you set
job.container.single.thread.mode = true?

On Fri, Jun 8, 2018 at 2:53 PM, Thunder Stumpges  wrote:
> Thanks for the quick reply. That sounds very much like what I'm seeing. I'm 
> merging in 0.14.1 to our branch now. I did try single threaded mode and 
> unfortunately that didn't seem to make a significant difference. Perhaps I do 
> need some multithreading? I'm seeing a task latency 0.2ms per message but 
> still only achieve ~700/sec
>
>
> -Original Message-
> From: Prateek Maheshwari [mailto:prateek...@gmail.com]
> Sent: Friday, June 8, 2018 13:54
> To: dev@samza.apache.org
> Subject: Re: Urgent : Help with latency / backlog / topic lag
>
> Hi Thunder,
>
>> What we believe may be happening is that most of the topics have no
> backlog, but one topic has all the backlog (this is because one of the topics 
> accounts for ~60% of the total message rate).  Could there be something 
> inducing extra latency on processing the one topic with a backlog just having 
> a bunch of other topics with NO backlog?
> This seems very similar to this issue:
> https://issues.apache.org/jira/browse/SAMZA-1599
> This was fixed in https://github.com/apache/samza/pull/436, and the fix 
> should be available in the 0.14.1 version.
> Would it be possible to try upgrading to 0.14.1? It should be backwards 
> compatible with 0.14.0.
>
> For something you can try without upgrading: try setting 
> "job.container.single.thread.mode" to true. From the configuration reference
> <https://samza.apache.org/learn/documentation/latest/jobs/configuration-table.html>:
> "If set to true, samza will fallback to legacy single-threaded event loop.
> Default is false, which enables the multithreading execution."
>
> Let us know if this doesn't help.
>
> Thanks,
> Prateek
>
> On Fri, Jun 8, 2018 at 1:35 PM, Thunder Stumpges 
> wrote:
>
>> We have a new samza job which we just put into production. This job
>> processes many topics (~30) but the total rate is not that high
>> (~1200/sec in aggregate). I am unable to get above ~700/sec and have a 
>> growing backlog.
>>
>> We are running samza 0.12 (I have an update to 0.14 that is not tested
>> or pushed yet).  When we load tested with a single topic, we could
>> easily do several thousand per second. The latency of a single message
>> is about 0.5ms as recorded by our timer metric on our 'process' call.
>>
>> What we believe may be happening is that most of the topics have no
>> backlog, but one topic has all the backlog (this is because one of the
>> topics accounts for ~60% of the total message rate).  Could there be
>> something inducing extra latency on processing the one topic with a
>> backlog just having a bunch of other topics with NO backlog?
>>
>> Some things I have tried:
>>
>>
>>   1.  Increasing thread pool (10->20->30), no change
>>   2.  Going from 1 container to 2, no help (the two containers run at
>> half the speed and total is the same)
>>   3.  Increasing task.max.concurrency from 1 -> 2 -> 3  (this had some
>> minor help going from 1 to 2, but not enough)
>>   4.  Increasing fetch.threshold.bytes (currently at 100,000 and we
>> have pretty small messages)
>>
>> Some observed metrics:
>>
>>
>>   *   "Pending Messages" are > 0  (15+ on some partitions)
>>   *   "Messages in flight" is almost always 0
>>   *   Polls rate is ~50/sec
>>   *   Message chooser "Choos Obj" is ~680-700/sec like our processing rate
>>   *   Message chooser "choose null" is ~50/sec
>>
>> I'm somewhat at a loss because based on the actual processing latency
>> we should easily be able to do 2000+ with just a small handful of threads.
>>
>> Thanks in advance, this is in production I really need a solution.
>> Thunder
>>
>>

Re: Urgent : Help with latency / backlog / topic lag

2018-06-08 Thread Prateek Maheshwari

Hi Thunder,

> What we believe may be happening is that most of the topics have no
backlog, but one topic has all the backlog (this is because one of the
topics accounts for ~60% of the total message rate).  Could there be
something inducing extra latency on processing the one topic with a backlog
just having a bunch of other topics with NO backlog?
This seems very similar to this issue:
https://issues.apache.org/jira/browse/SAMZA-1599
This was fixed in https://github.com/apache/samza/pull/436, and the fix
should be available in the 0.14.1 version.
Would it be possible to try upgrading to 0.14.1? It should be backwards
compatible with 0.14.0.

For something you can try without upgrading: try setting
"job.container.single.thread.mode" to true. From the configuration reference
:
"If set to true, samza will fallback to legacy single-threaded event loop.
Default is false, which enables the multithreading execution."

Let us know if this doesn't help.

Thanks,
Prateek

On Fri, Jun 8, 2018 at 1:35 PM, Thunder Stumpges 
wrote:

> We have a new samza job which we just put into production. This job
> processes many topics (~30) but the total rate is not that high (~1200/sec
> in aggregate). I am unable to get above ~700/sec and have a growing backlog.
>
> We are running samza 0.12 (I have an update to 0.14 that is not tested or
> pushed yet).  When we load tested with a single topic, we could easily do
> several thousand per second. The latency of a single message is about 0.5ms
> as recorded by our timer metric on our 'process' call.
>
> What we believe may be happening is that most of the topics have no
> backlog, but one topic has all the backlog (this is because one of the
> topics accounts for ~60% of the total message rate).  Could there be
> something inducing extra latency on processing the one topic with a backlog
> just having a bunch of other topics with NO backlog?
>
> Some things I have tried:
>
>
>   1.  Increasing thread pool (10->20->30), no change
>   2.  Going from 1 container to 2, no help (the two containers run at half
> the speed and total is the same)
>   3.  Increasing task.max.concurrency from 1 -> 2 -> 3  (this had some
> minor help going from 1 to 2, but not enough)
>   4.  Increasing fetch.threshold.bytes (currently at 100,000 and we have
> pretty small messages)
>
> Some observed metrics:
>
>
>   *   "Pending Messages" are > 0  (15+ on some partitions)
>   *   "Messages in flight" is almost always 0
>   *   Polls rate is ~50/sec
>   *   Message chooser "Choos Obj" is ~680-700/sec like our processing rate
>   *   Message chooser "choose null" is ~50/sec
>
> I'm somewhat at a loss because based on the actual processing latency we
> should easily be able to do 2000+ with just a small handful of threads.
>
> Thanks in advance, this is in production I really need a solution.
> Thunder
>
>

Re: Old style "low level" Tasks with alternative deployment model(s)

2018-03-20 Thread Prateek Maheshwari

Glad you were able to figure it out, that was very confusing. Thanks for
the fix too.

- Prateek

On Mon, Mar 19, 2018 at 9:58 PM, Thunder Stumpges <tstump...@ntent.com>
wrote:

> And that last issue was mine. My setting override was not picked up and it
> was using GroupByContainerCount instead.
> -Thanks,
> Thunder
>
>
> -Original Message-
> From: Thunder Stumpges
> Sent: Monday, March 19, 2018 20:58
> To: dev@samza.apache.org
> Cc: Jagadish Venkatraman <jagadish1...@gmail.com>; t...@recursivedream.com;
> yi...@linkedin.com; Yi Pan <nickpa...@gmail.com>
> Subject: RE: Old style "low level" Tasks with alternative deployment
> model(s)
>
> Well I figured it out. My specific issue was due to a simple dependency
> problem where I had gotten an older version of the Jackson-mapper library.
> However the code was throwing NoSuchMethodError (an Error instead of
> Exception) and being silently dropped. I created a pull request to handle
> any Throwable in ScheduleAfterDebounceTime.
> https://github.com/apache/samza/pull/450
>
> I'm now running into an issue with the generation of the JobModel and the
> ProcessorId. The ZkJobCoordinator has a ProcessorId that is a Guid, but
> when GroupByContainerIds class (my TaskNameGrouper) creates the
> ContainerModels, it is using the ContainerId (a numeric value, 0,1,2,etc)
> as the ProcessorId (~ line 105). This results in the JobModel that is
> generated and published immediately causing the processor to quit with this
> message:
>
> INFO  o.apache.samza.zk.ZkJobCoordinator - New JobModel does not contain
> pid=38c637bf-9c2b-4856-afc4-5b1562711cfb. Stopping this processor.
>
> I was assuming I should be using GroupByContainerIds as my
> TaskNameGrouper. I don't see any other promising implementations. Am I just
> missing something?
>
> Thanks,
> Thunder
>
> JobModel
> {
>   "config" : {
>   ...
>   },
>   "containers" : {
> "0" : {
>   "tasks" : {
> "Partition 0" : {
>   "task-name" : "Partition 0",
>   "system-stream-partitions" : [ {
> "system" : "kafka",
> "partition" : 0,
> "stream" : "test_topic1"
>   }, {
> "system" : "kafka",
> "partition" : 0,
> "stream" : "test_topic2"
>   } ],
>   "changelog-partition" : 0
> },
> "Partition 1" : {
>   "task-name" : "Partition 1",
>   "system-stream-partitions" : [ {
> "system" : "kafka",
> "partition" : 1,
> "stream" : "test_topic1"
>   }, {
> "system" : "kafka",
> "partition" : 1,
> "stream" : "test_topic2"
>   } ],
>   "changelog-partition" : 1
> }
>   },
>   "container-id" : 0,
>   "processor-id" : "0"
> }
>   },
>   "max-change-log-stream-partitions" : 2,
>   "all-container-locality" : {
> "0" : null
>   }
> }
>
> -Original Message-
> From: Thunder Stumpges [mailto:tstump...@ntent.com]
> Sent: Friday, March 16, 2018 18:21
> To: dev@samza.apache.org
> Cc: Jagadish Venkatraman <jagadish1...@gmail.com>; t...@recursivedream.com;
> yi...@linkedin.com; Yi Pan <nickpa...@gmail.com>
> Subject: RE: Old style "low level" Tasks with alternative deployment
> model(s)
>
> Attached. I don't see any threads actually running this code which is odd.
>
> There's my main thread that's waiting for the whole thing to finish, the
> "debounce-thread-0" (which logged the other surrounding messages below) has
> this:
>
> "debounce-thread-0" #18 daemon prio=5 os_prio=0 tid=0x7fa0fd719800
> nid=0x21 waiting on condition [0x7fa0d0d45000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x0006f166e350> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.
> java:175)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$
> ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at java.util.concurrent.ScheduledThreadPoolExecutor$
> DelayedWorkQu

Re: Old style "low level" Tasks with alternative deployment model(s)

2018-03-16 Thread Prateek Maheshwari

Hi Thunder,

Can you please take and attach a thread dump with this?

Thanks,
Prateek

On Fri, Mar 16, 2018 at 4:47 PM, Thunder Stumpges 
wrote:

> It appears it IS hung while serializing the JobModel... very strange! I
> added some debug statements around the calls:
>
>   LOG.debug("Getting object mapper to serialize job model");  // this
> IS printed
>   ObjectMapper mmapper = SamzaObjectMapper.getObjectMapper();
>   LOG.debug("Serializing job model"); // this IS printed
>   String jobModelStr = mmapper.writerWithDefaultPrettyPrinter
> ().writeValueAsString(jobModel);
>   LOG.info("jobModelAsString=" + jobModelStr); // this is NOT printed!
>
> Another thing I noticed is that "getObjectMapper" actually creates the
> object mapper twice!
>
> 2018-03-16 23:09:24 logback 24985 [debounce-thread-0] DEBUG
> org.apache.samza.zk.ZkUtils - Getting object mapper to serialize job model
> 2018-03-16 23:09:24 logback 24994 [debounce-thread-0] DEBUG 
> o.a.s.s.model.SamzaObjectMapper
> - Creating new object mapper and simple module
> 2018-03-16 23:09:24 logback 25178 [debounce-thread-0] DEBUG 
> o.a.s.s.model.SamzaObjectMapper
> - Adding SerDes and mixins
> 2018-03-16 23:09:24 logback 25183 [debounce-thread-0] DEBUG 
> o.a.s.s.model.SamzaObjectMapper
> - Adding custom ContainerModel deserializer
> 2018-03-16 23:09:24 logback 25184 [debounce-thread-0] DEBUG 
> o.a.s.s.model.SamzaObjectMapper
> - Setting up naming strategy and registering module
> 2018-03-16 23:09:24 logback 25187 [debounce-thread-0] DEBUG 
> o.a.s.s.model.SamzaObjectMapper
> - Done!
> 2018-03-16 23:09:24 logback 25187 [debounce-thread-0] DEBUG 
> o.a.s.s.model.SamzaObjectMapper
> - Creating new object mapper and simple module
> 2018-03-16 23:09:24 logback 25187 [debounce-thread-0] DEBUG 
> o.a.s.s.model.SamzaObjectMapper
> - Adding SerDes  and mixins
> 2018-03-16 23:09:24 logback 25187 [debounce-thread-0] DEBUG 
> o.a.s.s.model.SamzaObjectMapper
> - Adding custom ContainerModel deserializer
> 2018-03-16 23:09:24 logback 25187 [debounce-thread-0] DEBUG 
> o.a.s.s.model.SamzaObjectMapper
> - Setting up naming strategy and registering module
> 2018-03-16 23:09:24 logback 25187 [debounce-thread-0] DEBUG 
> o.a.s.s.model.SamzaObjectMapper
> - Done!
> 2018-03-16 23:09:24 logback 25187 [debounce-thread-0] DEBUG
> org.apache.samza.zk.ZkUtils - Serializing job model
>
> Could this ObjectMapper be a singleton? I see there is a private static
> instance, but getObjectMapper creates a new one every time...
>
> Anyway, then it takes off to serialize the job model and never comes
> back...
>
> Hoping someone has some idea here... the implementation for this mostly
> comes from Jackson-mapper-asl, and I have the version that is linked in the
> 0.14.0 tag:
> |||+--- org.codehaus.jackson:jackson-mapper-asl:1.9.13
> ||||\--- org.codehaus.jackson:jackson-core-asl:1.9.13
>
> Thanks!
> Thunder
>
> -Original Message-
> From: Thunder Stumpges [mailto:tstump...@ntent.com]
> Sent: Friday, March 16, 2018 15:29
> To: dev@samza.apache.org; Jagadish Venkatraman 
> Cc: t...@recursivedream.com; yi...@linkedin.com; Yi Pan <
> nickpa...@gmail.com>
> Subject: RE: Old style "low level" Tasks with alternative deployment
> model(s)
>
> So, my investigation starts at StreamProcessor.java.  Line 294 in method
> onNewJobModel() logs an INFO message that it's starting a container. This
> message never appears.
>
> I see that ZkJobCoordinator calls onNewJobModel from its
> onNewJobModelConfirmed method which also logs an info message stating
> "version X of the job model got confirmed". I never see this message
> either, so I go up the chain some more.
>
> I DO see:
>
> 2018-03-16 21:43:58 logback 20498 [ZkClient-EventThread-13-10.0.127.114:2181]
> INFO  o.apache.samza.zk.ZkJobCoordinator - ZkJobCoordinator::onBecomeLeader
> - I became the leader!
> And
> 2018-03-16 21:44:18 logback 40712 [debounce-thread-0] INFO
> o.apache.samza.zk.ZkJobCoordinator - 
> pid=91e07d20-ae33-4156-a5f3-534a95642133Generated
> new Job Model. Version = 1
>
> Which led me to method onDoProcessorChange line 210. I see that line, but
> not the line below " Published new Job Model. Version =" so something in
> here is not completing:
>
> LOG.info("pid=" + processorId + "Generated new Job Model. Version = "
> + nextJMVersion);
>
> // Publish the new job model
> zkUtils.publishJobModel(nextJMVersion, jobModel);
>
> // Start the barrier for the job model update
> barrier.create(nextJMVersion, currentProcessorIds);
>
> // Notify all processors about the new JobModel by updating JobModel
> Version number
> zkUtils.publishJobModelVersion(currentJMVersion, nextJMVersion);
>
> LOG.info("pid=" + processorId + "Published new Job Model. Version = "
> + nextJMVersion);
>
> As I mentioned, after the line "Generated new Job Model. Version = 1" I
> just get repeated zk ping responses.. no more

Re: Welcome Xinyu as new Samza PMC!

2018-01-17 Thread Prateek Maheshwari

This is great news. Congrats Xinyu, and thanks for your contributions!

> On Jan 17, 2018, at 10:39 AM, Srinivasulu Punuru  wrote:
> 
> Congrats Xinyu, Very well deserved!
> 
> 
> From: Jagadish Venkatraman 
> Sent: Wednesday, January 17, 2018 10:37:46 AM
> To: dev@samza.apache.org
> Subject: Re: Welcome Xinyu as new Samza PMC!
> 
> Big Congrats Xinyu. Thanks for your continued contributions to all aspects
> of the project!
> 
> On Wed, Jan 17, 2018 at 10:36 AM, Wei Song  wrote:
> 
>> Congrats, Xinyu!
>> 
>> --
>> Thanks
>> -Wei
>> 
>> 
>> On 1/17/18, 10:35 AM, "Navina Ramesh"  wrote:
>> 
>>Congratulations, Xinyu!
>>Thanks for all your contribution and looking forward to more 
>> 
>> 
>>Cheers!
>>Navina
>> 
>>
>>From: Yi Pan 
>>Sent: Wednesday, January 17, 2018 10:26:54 AM
>>To: dev@samza.apache.org
>>Subject: Welcome Xinyu as new Samza PMC!
>> 
>>Finally all the documentation procedure is completed and Xinyu Liu has
>> been
>>officially promoted to Samza PMC member! This is well deserved due to
>> his
>>continued contribution to the Samza project.
>> 
>>Please join me to welcome Xinyu as our newest PMC member!
>> 
>>Cheers!
>> 
>>-Yi Pan
>> 
>> 
>> 
> 
> 
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University

Re: Review Request 63267: Add instructions to README.md

2017-10-25 Thread Prateek Maheshwari via Review Board


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63267/#review189259
---


Ship it!




Thanks for adding this. Let's commit to whichever branch this was tested on, 
and I'll copy and update for the other branch.

- Prateek Maheshwari


On Oct. 24, 2017, 3:54 p.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63267/
> ---
> 
> (Updated Oct. 24, 2017, 3:54 p.m.)
> 
> 
> Review request for samza and Prateek Maheshwari.
> 
> 
> Repository: samza-hello-samza
> 
> 
> Description
> ---
> 
> Add instructions to README.md
> 
> 
> Diffs
> -
> 
>   README.md 0f80e9e52240abc071dcdfb56800826ef5d49d7d 
>   build.gradle ec451d576a157eba2d8889b63b8574f397937197 
> 
> 
> Diff: https://reviews.apache.org/r/63267/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>

Re: Review Request 63267: Add instructions to README.md

2017-10-25 Thread Prateek Maheshwari via Review Board


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63267/#review189258
---


Ship it!




Thanks for adding this. Were these instructions were tested on the latest 
branch or on master? Let's commit to whichever it was, and I'll copy it over 
and update for the other branch.

- Prateek Maheshwari


On Oct. 24, 2017, 3:54 p.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63267/
> ---
> 
> (Updated Oct. 24, 2017, 3:54 p.m.)
> 
> 
> Review request for samza and Prateek Maheshwari.
> 
> 
> Repository: samza-hello-samza
> 
> 
> Description
> ---
> 
> Add instructions to README.md
> 
> 
> Diffs
> -
> 
>   README.md 0f80e9e52240abc071dcdfb56800826ef5d49d7d 
>   build.gradle ec451d576a157eba2d8889b63b8574f397937197 
> 
> 
> Diff: https://reviews.apache.org/r/63267/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>

Re: Kafka client.id collision

2017-07-20 Thread Prateek Maheshwari

+1 for adding system name to the client id.

- Prateek

On Thu, Jul 20, 2017 at 10:43 AM, Navina Ramesh (Apache) 
wrote:

> Hi David,
>
> I think this is expected to occur as a warning since we spin up all kafka
> clients with the same client-id, which is $job.name + $job.id.
>
> As Jagadish mentioned, it will be great if you can provide us the entire
> log so that we can take a look.
>
> As a side note for the samza contributors, I do believe the container spins
> up kafka clients for each kafka systems defined, even if it is not used.
> Iirc, we use `KafkaUtil.getClientId` for generating the client id. Perhaps
> it makes sense to append another identifier with the client id (such as
> system name or component name). That way, we won't lose the kafka-client
> related metrics and there will be no overlap between the client ids.
> Thoughts?
>
> Thanks!
> Navina
>
> On Thu, Jul 20, 2017 at 9:13 AM, Jagadish Venkatraman <
> jagadish1...@gmail.com> wrote:
>
> > Can you share the entire log file if that's okay? The warning should be a
> > red-herring IMHO.
> >
> > On Thu, Jul 20, 2017 at 7:50 AM Davide Simoncelli <
> netcelli@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > Thanks for the reply.
> > >
> > > It is a warning, but the application fails. Here is the logging:
> > >
> > >
> > > 017-07-20 10:43:06.349 [main] AppInfoParser [INFO] Kafka version :
> > 0.10.1.1
> > > 2017-07-20 10:43:06.349 [main] AppInfoParser [INFO] Kafka commitId :
> > > f10ef2720b03b247
> > > 2017-07-20 10:43:06.351 [main] AppInfoParser [WARN] Error registering
> > > AppInfo mbean
> > > javax.management.InstanceAlreadyExistsException:
> > > kafka.producer:type=app-info,id=samza_producer-wikipedia_feed-1
> > > at com.sun.jmx.mbeanserver.Repository.addMBean(
> > Repository.java:437)
> > > at
> > > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.
> > registerWithRepository(DefaultMBeanServerInterceptor.java:1898)
> > > at
> > > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.
> > registerDynamicMBean(DefaultMBeanServerInterceptor.java:966)
> > > at
> > > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(
> > DefaultMBeanServerInterceptor.java:900)
> > > at
> > > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(
> > DefaultMBeanServerInterceptor.java:324)
> > > at
> > > com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(
> > JmxMBeanServer.java:522)
> > > at
> > > org.apache.kafka.common.utils.AppInfoParser.registerAppInfo(
> > AppInfoParser.java:58)
> > > at
> > > org.apache.kafka.clients.producer.KafkaProducer.(
> > KafkaProducer.java:331)
> > > at
> > > org.apache.kafka.clients.producer.KafkaProducer.(
> > KafkaProducer.java:163)
> > > at
> > > org.apache.samza.system.kafka.KafkaSystemFactory$$anonfun$3.
> > apply(KafkaSystemFactory.scala:89)
> > > at
> > > org.apache.samza.system.kafka.KafkaSystemFactory$$anonfun$3.
> > apply(KafkaSystemFactory.scala:89)
> > > at
> > > org.apache.samza.system.kafka.KafkaSystemProducer.send(
> > KafkaSystemProducer.scala:144)
> > > at
> > > org.apache.samza.coordinator.stream.CoordinatorStreamSystemProduce
> > r.send(CoordinatorStreamSystemProducer.java:113)
> > > at
> > > org.apache.samza.coordinator.stream.CoordinatorStreamWriter.
> > sendSetConfigMessage(CoordinatorStreamWriter.java:98)
> > > at
> > > org.apache.samza.coordinator.stream.CoordinatorStreamWriter.
> sendMessage(
> > CoordinatorStreamWriter.java:82)
> > > at
> > > org.apache.samza.job.yarn.SamzaYarnAppMasterService.onInit(
> > SamzaYarnAppMasterService.scala:68)
> > > at
> > > org.apache.samza.job.yarn.YarnClusterResourceManager.start(
> > YarnClusterResourceManager.java:180)
> > > at
> > > org.apache.samza.clustermanager.ContainerProcessManager.start(
> > ContainerProcessManager.java:167)
> > > at
> > > org.apache.samza.clustermanager.ClusterBasedJobCoordinator.run(
> > ClusterBasedJobCoordinator.java:154)
> > > at
> > > org.apache.samza.clustermanager.ClusterBasedJobCoordinator.main(
> > ClusterBasedJobCoordinator.java:222)
> > > 2017-07-20 10:43:06.549 [main] CoordinatorStreamWriter [INFO] Stopping
> > the
> > > coordinator stream producer.
> > > 2017-07-20 10:43:06.549 [main] CoordinatorStreamSystemProducer [INFO]
> > > Stopping coordinator stream producer.
> > > 2017-07-20 10:43:06.549 [main] KafkaProducer [INFO] Closing the Kafka
> > > producer with timeoutMillis = 9223372036854775807 ms.
> > >
> > >
> > > > On 20 Jul 2017, at 3:16 pm, Jagadish Venkatraman <
> > jagadish1...@gmail.com>
> > > wrote:
> > > >
> > > > Hi Davide,
> > > >
> > > > Is this logged as an error or as a warning?
> > > >
> > > > IIUC, this warning should not fail the job. It may not cause some
> Mbean
> > > > sensors / metrics emitted from Kafka to be correctly reported (since,
> > > those
> > > > are reported per-clientId).
> >

Re: [DISCUSS] SEP-2: ApplicationRunner Design

2017-06-20 Thread Prateek Maheshwari

mmatically in env. Suppose we have the
> following
> >>>> > before
> >>>> > > > > runner.input(...):
> >>>> > > > >
> >>>> > > > >   runner.setup(env /* a writable interface of env*/ -> {
> >>>> > > > > env.setStreamSpec(streamId, streamSpec);
> >>>> > > > > env.setSystem(systemName, systemFactory);
> >>>> > > > >   })
> >>>> > > > >
> >>>> > > > > runner.setup(->) also provides setup for stores and other
> runtime
> >>>> > stuff
> >>>> > > > > needed for the execution. The setup should be able to
> serialized
> >>>> to
> >>>> > > > config.
> >>>> > > > > For #6, I haven't figured out a good way to inject
> user-defined
> >>>> > objects
> >>>> > > > > here yet.
> >>>> > > > >
> >>>> > > > > With this API, we should be able to also support #4. For
> remote
> >>>> > > > > runner.run(), the operator user classes/lamdas in the
> StreamGraph
> >>>> > need
> >>>> > > to
> >>>> > > > > be serialized. As today, the existing option is to serialize
> to a
> >>>> > > stream,
> >>>> > > > > either the coordinator stream or the pipeline control stream,
> >>>> which
> >>>> > > will
> >>>> > > > > have the size limit per message. Do you see RPC as an option?
> >>>> > > > >
> >>>> > > > > For this version of API, seems we don't need the
> StreamApplication
> >>>> > > > wrapper
> >>>> > > > > as well as exposing the StreamGraph. Do you think we are on
> the
> >>>> right
> >>>> > > > path?
> >>>> > > > >
> >>>> > > > > Thanks,
> >>>> > > > > Xinyu
> >>>> > > > >
> >>>> > > > >
> >>>> > > > > On Thu, Apr 27, 2017 at 6:09 AM, Chris Pettitt <
> >>>> > > > > cpett...@linkedin.com.invalid> wrote:
> >>>> > > > >
> >>>> > > > >> That should have been:
> >>>> > > > >>
> >>>> > > > >> For #1, Beam doesn't have a hard requirement...
> >>>> > > > >>
> >>>> > > > >> On Thu, Apr 27, 2017 at 9:07 AM, Chris Pettitt <
> >>>> > cpett...@linkedin.com
> >>>> > > >
> >>>> > > > >> wrote:
> >>>> > > > >>
> >>>> > > > >> > For #1, I doesn't have a hard requirement for any change
> from
> >>>> > > Samza. A
> >>>> > > > >> > very nice to have would be to allow the input systems to be
> >>>> set up
> >>>> > > at
> >>>> > > > >> the
> >>>> > > > >> > same time as the rest of the StreamGraph. An even nicer to
> have
> >>>> > > would
> >>>> > > > >> be to
> >>>> > > > >> > do away with the callback based approach and treat graph
> >>>> building
> >>>> > > as a
> >>>> > > > >> > library, a la Beam and Flink.
> >>>> > > > >> >
> >>>> > > > >> > For the moment I've worked around the two pass requirement
> >>>> (once
> >>>> > for
> >>>> > > > >> > config, once for StreamGraph) by introducing an IR layer
> >>>> between
> >>>> > > Beam
> >>>> > > > >> and
> >>>> > > > >> > the Samza Fluent translation. The IR layer is convenient
> >>>> > independent
> >>>> > > > of
> >>>> > > > >> > this problem because it makes it easier to switch between
> the
> >>>> > Fluent
> >>>> > > > and
> >>>> > > > >> > low-level APIs.
> >>>> > > > >> >
>

Re: [DISCUSS] SEP-6: Support Watermark Across Intermediate Streams for Batch Processing

2017-05-25 Thread Prateek Maheshwari

Hi Xinyu,

Thanks for the proposal. Some requests for clarifications. Let's update the
SEP directly instead of replying here.

E.g., in "For any following intermediate stream whose input streams are all
end-of-stream, it will be marked as pending EOS" - Should clarify that
(IIUC) something is injecting EOS messages in all intermediate stream
partitions once it receives EOS from all input stream partitions it's
consuming. Should also clarify what is that something.
Same for "declare end of stream once all the EOS messages have been
received." - What does this declaration involve and who is doing this?

In pro for approach 2: Not clear what this means - "The watermark can
conclude the input messages before this watermark have been complete."

For the cons of approach 2: "Complicated failure scenario of the second
job. It needs to checkpoint all the watermark messages received, so when it
recovered from failure, it can still count." - How is this related to EOS?
How is this related to the checkpoint watermark section?
Also, what is the "more messages required to write.. " referring to?

"Samza needs to reconcile based on the task counts." - Please explain what
reconciliation means, why it needs to happen, and why we need to track the
producer task and total task count in the watermark message to do this.

Checkpoint watermarks section is also unclear. What problem are we trying
to solve here?

Should also move the message format and the watermark message interface
sections to the bottom, since they depend on details in the event time and
checkpoint watermark sections.

Thanks,
Prateek

On Wed, May 24, 2017 at 11:30 AM, xinyu liu  wrote:

> Hi all,
>
> I created SEP-6 for SAMZA-1260
> : Support Watermark
> Across Intermediate Streams for Batch Processing. The link to the SEP is
> here:
>
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-
> 6+Support+Watermark+Across+Intermediate+Streams+for+Batch+Processing
>
> Please review and comments are welcome!
>
> Thanks,
> Xinyu
>

Re: [VOTE] Apache Samza 0.13.0 RC0

2017-05-17 Thread Prateek Maheshwari

Resent the CANCEL email, hopefully it makes it this time.

- Prateek

On Wed, May 17, 2017 at 2:08 PM, Navina Ramesh (Apache) 
wrote:

> Prateek told me that he sent out a cancel email. It didn't reach the
> mail-archive I think. Lately, we have this kind of issues where the emails
> are not reaching our dev list.
>
> On Wed, May 17, 2017 at 2:06 PM, Yi Pan  wrote:
>
> > Hi, all,
> >
> > Based on the conversation above, can we officially cancel this vote?
> >
> > Thanks!
> >
> > -Yi
> >
> > On Mon, May 15, 2017 at 9:31 AM, Ignacio Solis  wrote:
> >
> > > Thanks!
> > >
> > > On Mon, May 15, 2017 at 8:00 AM, Navina Ramesh
> > >  wrote:
> > > > I will try to get the patch out today. Work doesn't look trivial. I
> am
> > on
> > > > it.
> > > >
> > > > Navina
> > > >
> > > > On May 14, 2017 23:10, "Ignacio Solis"  wrote:
> > > >
> > > >> We should hold off until it is solved.  How long will it take to fix
> > > this?
> > > >>
> > > >> On Sun, May 14, 2017 at 10:13 PM, Navina Ramesh (Apache)
> > > >>  wrote:
> > > >> > I just changed the status of this JIRA to "BLOCKER" -
> > > >> > https://issues.apache.org/jira/browse/SAMZA-1128
> > > >> >
> > > >> > This causes a bug in standalone deployment where any failure in
> the
> > > >> barrier
> > > >> > protocol stops the scheduled executorservice. Unfortunately,
> > > >> > CoordinationUtils creates its own scheduled executorservice, which
> > is
> > > >> > incorrect. Scheduled ExecutorService is meant to be the working
> > queue
> > > for
> > > >> > the ZkJobCoordinator. This needs to be fixed. Bharath already ran
> > into
> > > >> this
> > > >> > bug during testing on Friday.
> > > >> >
> > > >> > veto for this release candidate.
> > > >> >
> > > >> > @Prateek/Jagadish:
> > > >> > I recommend sending a "non-vote, testing release candidate" for
> this
> > > >> > release until we complete all pending tasks (includes docs, tests
> > > etc).
> > > >> It
> > > >> > will also be useful to share the pending tasks and their progress.
> > In
> > > >> case
> > > >> > you have already shared it, I might have missed it since some
> emails
> > > are
> > > >> > bouncing off my inbox.
> > > >> >
> > > >> > Thanks!
> > > >> > Navina
> > > >> >
> > > >> > On Sun, May 14, 2017 at 1:30 PM, Boris S 
> wrote:
> > > >> >
> > > >> >> I think we need to add SAMZA-1286 and
> > > >> >> SAMZA-1279 to the release .
> > > >> >>
> > > >> >> On Wed, May 10, 2017 at 7:51 PM, Jagadish Venkatraman <
> > > >> jagad...@apache.org
> > > >> >> >
> > > >> >> wrote:
> > > >> >>
> > > >> >> > This is a call for a vote on a release of Apache Samza 0.13.0.
> > > Thanks
> > > >> to
> > > >> >> > everyone who has contributed to this release. We are very glad
> to
> > > see
> > > >> >> some
> > > >> >> > new contributors and features in this release.
> > > >> >> >
> > > >> >> > The release candidate can be downloaded from here:
> > > >> >> > http://home.apache.org/~jagadish/samza-0.13.0-rc0/
> > > >> >> >
> > > >> >> > The release candidate is signed with pgp key AF81FFBF, which
> can
> > be
> > > >> found
> > > >> >> > on keyservers:
> > > >> >> > http://pgp.mit.edu/pks/lookup?op=get=0xAF81FFBF
> > > >> >> >
> > > >> >> > The git tag is release-0.13.0-rc0 and signed with the same pgp
> > key:
> > > >> >> > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=
> > > >> >> > refs/tags/release-0.13.0-rc0
> > > >> >> >
> > > >> >> > Test binaries have been published to Maven's staging
> repository,
> > > and
> > > >> are
> > > >> >> > available here:
> > > >> >> > https://repository.apache.org/content/repositories/
> > > >> orgapachesamza-1020
> > > >> >> >
> > > >> >> > 127 issues were resolved for this release:
> > > >> >> > https://issues.apache.org/jira/issues/?jql=project%20%
> > > >> >> > 3D%20SAMZA%20AND%20fixVersion%20in%20(0.13%2C%200.13.0)%
> > > >> >> > 20AND%20status%20in%20(Resolved%2C%20Closed)
> > > >> >> >
> > > >> >> > The vote will be open for 72 hours (ending at 8:00PM Saturday,
> > > >> >> 05/13/2017).
> > > >> >> >
> > > >> >> > Please download the release candidate, check the
> > hashes/signature,
> > > >> build
> > > >> >> it
> > > >> >> > and test it, and then please vote:
> > > >> >> >
> > > >> >> >
> > > >> >> > [ ] +1 approve
> > > >> >> >
> > > >> >> > [ ] +0 no opinion
> > > >> >> >
> > > >> >> > [ ] -1 disapprove (and reason why)
> > > >> >> >
> > > >> >> >
> > > >> >> > +1 from my side for the release.
> > > >> >> >
> > > >> >> > Cheers!
> > > >> >> >
> > > >> >>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Nacho - Ignacio Solis - iso...@igso.net
> > > >>
> > >
> > >
> > >
> > > --
> > > Nacho - Ignacio Solis - iso...@igso.net
> > >
> >
>

[DISCUSS] Samza 0.13.0 release

2017-05-05 Thread Prateek Maheshwari

 Hi all,

There have been quite a lot of new features added to master since 0.12
release to warrant a new major release. At LinkedIn, we've done functional
and performance testing against master in the past weeks, and deployed jobs
with the latest build in production. We will continue to test for stability
in the next few weeks.

We've made significant progress on several exciting new features:
SAMZA-1063: Samza Standalone Project
SAMZA-1073: Top-level fluent API
SAMZA-1130: ApplicationRunner for running StreamApplication

Here are a few additional features that will also be included in this
release:
SAMZA-868: Support Elasticsearch version 2.x
SAMZA-1135: Support Scala 2.12
SAMZA-1140: Non blocking commit in Async Runloop
SAMZA-1143: Universal config support for localized resource
SAMZA-1145: Provide Ability To Confgure The Default Number Of Changelog
Replicas
SAMZA-1154: Tasks endpoint to list the complete details of all tasks
related to a job

An exhaustive list of changes in 0.13 can be found here

.

Here are the issues that we should consider as blockers for the 0.13.0
release:
SAMZA-871: Heart-beat mechanism between JobCoordinator and all running
containers
SAMZA-1150: Handling Error propagation between ZkJobCoordinator &
DebounceTimer
SAMZA-1153: Metrics should be added for ZK based JobCoordinator
SAMZA-1155: Allow configuring window.ms to specify trigger duration
SAMZA-1268: Final renamings and javadocs for public APIs for 0.13 release
SAMZA-1267: ApplicationRunner#getLocalRunner returns null

Here's what I propose:
1. Cut an 0.13.0 release branch.
2. Work on getting the blocker JIRAs resolved.
3. Target a release vote on the week of May 15th.

Please let us know if there are any other changes you'd like to go in
0.13.0.

Thanks,
Prateek

1 2 >

1 - 100 of 142 matches

Mail list logo