Re: [VOTE] Apache Samza 1.8.0 RC0

2022-12-21 Thread Xinyu Liu
+1 (binding).

Verified the md5 and sha1 checksums. Ran check-all.sh on Linux, and the
build and tests all passed. Please also generate the sha256 checksums for the
release artifacts, according to Apache's requirements for open source
releases.
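
For anyone who wants to produce or double-check the sha256 values, here is a minimal sketch using only the JDK. The class name Sha256Checksum is hypothetical and not part of Samza or its release scripts, and the output layout (hex digest, two spaces, file name) is just an assumed convention.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper (not part of Samza): prints a sha256 line for each file given. */
final class Sha256Checksum {

    /** Streams the file through SHA-256 and returns the digest as lowercase hex. */
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Each argument is treated as the path of a release artifact.
        for (String name : args) {
            Path file = Paths.get(name);
            System.out.println(sha256Hex(file) + "  " + file.getFileName());
        }
    }
}
```

Running it over the artifacts in the release candidate directory and comparing against the published values would cover the sha256 part of the verification.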

Thanks,
Xinyu

On Wed, Dec 21, 2022 at 9:47 AM Ajo Thomas  wrote:

> Hey All,
>
> This is a call for a vote on the release of *Apache Samza 1.8.0.*
> Thanks to everyone who contributed to this release.
>
> The release candidate can be downloaded from here:
> https://home.apache.org/~ajothomas/samza-1.8.0-rc0/
> The release candidate is signed with pgp key *1A4639DA*, which is included
> in the repository's KEYS file:
> https://git-wip-us.apache.org/repos/asf?p=samza.git;a=blob_plain;f=KEYS
> and
> can also be found on keyservers:
>
> https://keyserver.ubuntu.com/pks/lookup?search=ajothomas%40apache.org=on=index
>
> The git tag is *release-1.8.0-rc0* and signed with the same pgp key:
>
> https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.8.0-rc0
>
> Test binaries have been published to Maven's staging repository, and are
> available here:
> URL: https://repository.apache.org/#stagingRepositories
> 
> Repository: orgapachesamza-1095 (org.apache.samza)
>
> Please download the release candidate, check the hashes/signature, build it
> and test it, and then please vote:
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove (and reason why)
>
> Please note that check-all.sh was run and integration tests *were not* run
> as there are some issues with the legacy zopkio library used for
> integration testing. However, most of the key features being released as a
> part of this release have been tested and are currently used in many of our
> production jobs at LinkedIn. hadoop/yarn3 changes have been tested with
> https://github.com/apache/samza-hello-samza which brings up yarn 3,
> zookeeper and kafka clusters locally for testing.
>
> Thanks,
> Ajo
>


Re: [DISCUSS] Apache Samza 1.8.0 RC0

2022-12-19 Thread Xinyu Liu
+1. Thanks for driving this release!

Thanks,
Xinyu

On Sat, Dec 17, 2022 at 5:18 AM James DeMichele
 wrote:

> Ship it! 
>
> On Fri, Dec 16, 2022, 12:53 PM Ajo Thomas  wrote:
>
> > Hi All,
> >
> > Since the last Samza 1.7.0 release earlier this year, we have added a few
> > major features and improvements to master, which warrant a major 1.8.0
> > release.
> >
> > Within LinkedIn, some of these features have already been tested as part of
> > our test suites and are in use in many of our production jobs. We wanted to
> > kick off the discussion in the open source forum to keep the momentum
> > flowing.
> >
> > Below is a list of key features that we intend to include in the upcoming
> > release:
> > - SEP-31: Pipeline Drain
> >   <https://cwiki.apache.org/confluence/display/SAMZA/SEP-31%3A+Pipeline+Drain-+Support+the+ability+to+drain+pipelines+to+allow+incompatible+intermediate+schema+changes>
> > - SAMZA-2757: Make Samza Compatible with Java 11
> >
> > Here is the complete list of the features, bug-fixes, upgrades: JIRA Link
> > <https://issues.apache.org/jira/browse/SAMZA-2744?jql=project%20%3D%20SAMZA%20AND%20fixVersion%20%3D%201.8>
> >
> > Thanks
> > Ajo
> >
>


[ANNOUNCE] Welcome Ajo Thomas as Samza Committer

2022-12-14 Thread Xinyu Liu
Hi, All,

I am glad to announce that Ajo Thomas has officially accepted our
invitation and is now an Apache Samza Committer.

Ajo has greatly improved both the Samza user experience and
operability. He added partial update functionality to the Samza
Table API to allow field-level updates to stores. He developed the
“Pipeline Drain” feature for cleaning up intermediate data and state before
introducing backward incompatible changes. He is also actively working on
the next release, Samza 1.8.

Considering his contributions, the Samza PMC trusts Ajo with the
responsibilities of a Samza Committer.

Please join me in giving him a warm welcome!

Xinyu Liu
on behalf of the Apache Samza PMC


Re: [VOTE] SEP-31: Pipeline Drain- Support the ability to drain pipelines to allow incompatible intermediate schema changes

2022-11-29 Thread Xinyu Liu
+1.

Overall the design looks good. Thanks for contributing to this feature.

Thanks,
Xinyu

On Tue, Nov 29, 2022 at 10:44 AM Ajo Thomas  wrote:

> Hi All,
>
> This is a call for a vote on *SEP-31: Pipeline Drain- Support the ability
> to drain pipelines to allow incompatible intermediate schema changes.*
> Thanks to everyone involved with the design and reviews to refine the
> proposal.
>
> Discuss Email Thread:
> https://lists.apache.org/thread/7m2hqcqq9lx9o1d48gb64glplb3g2crt
>
> SEP-31:
>
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-31%3A+Pipeline+Drain-+Support+the+ability+to+drain+pipelines+to+allow+incompatible+intermediate+schema+changes
>
> Jira ticket:
> https://issues.apache.org/jira/browse/SAMZA-2741
>
> Please vote:
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove (and reason why)
>
> Thanks,
> Ajo
>


Re: [VOTE] Apache Samza 1.7.0 RC1

2022-03-11 Thread Xinyu Liu
+1 (binding).

Verified the signature and checksums, and also ran check-all tests which
all passed.

Thanks,
Xinyu

On Fri, Mar 11, 2022 at 2:00 PM Bob S  wrote:

> +1
> Ran build, test, and both integration tests (regular + standalone), plus
> check-all.
> Verified signatures, sha and md5.
> Thanks Daniel!
>
> On Wed, Mar 9, 2022 at 4:34 PM Daniel Chen  wrote:
>
> > Hey all, This is a call for a vote on a release of Apache Samza 1.7.0.
> > Thanks to everyone who has contributed to this release.
> >
> > The release candidate can be downloaded from here:
> >
> > https://home.apache.org/~dchen/samza-1.7.0-rc1/
> >
> > The release candidate is signed with pgp key 1D9ADCE059431C34, which is
> > included in the repository's KEYS file:
> >
> > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=blob_plain;f=KEYS;hb=c5831bfc01b2e70ba57c4bd3505c6a84a73c8a7b
> > and can also be found on keyservers:
> >
> > https://keyserver.ubuntu.com/pks/lookup?search=dchen%40apache.org=on=index
> >
> > The git tag is release-1.7.0-rc1 and signed with the same pgp key:
> >
> > https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.7.0-rc1
> >
> > Test binaries have been published to Maven's staging repository, and are
> > available here:
> >
> > Scala 2.11:
> > https://repository.apache.org/content/repositories/orgapachesamza-1092/
> > Scala 2.12:
> > https://repository.apache.org/content/repositories/orgapachesamza-1093/
> >
> > The vote will be open for 72 hours (ending at 5:00pm Saturday, 03/12/2022).
> > Please download the release candidate, check the hashes/signature, build it
> > and test it, and then please vote:
> > [ ] +1 approve
> > [ ] +0 no opinion
> > [ ] -1 disapprove (and reason why)
> >
> > I ran check-all.sh, and bor...@apache.org helped run the integration tests
> > (both YARN and standalone), which passed for rc1.
> >
> > +1 from my side for the release.
> > Thanks,
> > Daniel
> >
>


Re: [VOTE] SEP-30: Support Updates in Table API

2021-12-21 Thread Xinyu Liu
+1 on my side.

Glad to see this feature coming. Please make sure the API changes are
reflected in the documentation, e.g.
https://samza.apache.org/learn/documentation/1.0.0/api/table-api.html.

Thanks,
Xinyu

On Mon, Dec 20, 2021 at 10:44 AM Ajo Thomas  wrote:

> Hi All,
>
> This is a call for a vote on SEP-30: Support Updates in Table API
> Thanks to everyone involved with the design and reviews to refine the
> proposal.
>
> Email Thread:
>
> http://mail-archives.apache.org/mod_mbox/samza-dev/202112.mbox/%3cCAAMuQDN9fX64KONdqD1n06xTXvgMXNUqkt2RnPnt9Zr=vjn...@mail.gmail.com%3e
>
> SEP-30:
>
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-30%3A+Support+Updates+in+Table+API
>
> Jira ticket:
> https://issues.apache.org/jira/browse/SAMZA-2709
>
> Please vote:
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove (and reason why)
>
> Thanks,
> Ajo Thomas
>


[ANNOUNCE] Welcome Daniel Chen as Samza Committer

2021-09-17 Thread Xinyu Liu
Hi, all,

I am glad to announce that Daniel Chen has officially accepted our
invitation and is now an Apache Samza Committer.

Daniel has contributed to many areas of Samza, from his early work on the
Eventhub connector to his recent state restore and checkpointing
improvements. Daniel also contributed tremendously to integrating the Apache
Beam Python API on top of Samza. As an active member of the Samza community,
he has participated frequently in design, code reviews and mailing list
discussions. He has also contributed to Samza tutorials, website, releases
and bug fixes.

Considering his contributions, the Samza PMC trusts Daniel with the
responsibilities of a Samza Committer.

Please join me in giving him a warm welcome!

Xinyu Liu
on behalf of the Apache Samza PMC


Re: Samza Runner for Beam Processing Time support

2021-02-09 Thread Xinyu Liu
Jan,

I tried the latest Beam 2.27 version and ran into the same issue you
saw. I dug a bit deeper, and it is caused by the recent changes in Beam to
enable SplittableParDo in all runners. While we work with Beam
to get this resolved, you can avoid the issue by adding the argument
"--experiments=use_deprecated_read" when running your program. This flag
will disable the new code path to make it work as before.
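
If passing the flag on every launch is inconvenient, it can also be added in code before the options are parsed. Below is a minimal sketch with no Beam dependency; the class and method names are hypothetical, and it assumes the returned array is then handed to Beam's PipelineOptionsFactory.fromArgs(...) and that several experiments may share one comma-separated --experiments argument.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: makes sure use_deprecated_read is among the --experiments values. */
final class DeprecatedReadFlag {
    private static final String PREFIX = "--experiments=";
    private static final String EXPERIMENT = "use_deprecated_read";

    /** Returns a copy of args in which the experiment is enabled exactly as it would be on the CLI. */
    static String[] withDeprecatedRead(String[] args) {
        List<String> result = new ArrayList<>(args.length + 1);
        boolean handled = false;
        for (String arg : args) {
            if (!handled && arg.startsWith(PREFIX)) {
                String value = arg.substring(PREFIX.length());
                boolean present = Arrays.asList(value.split(",")).contains(EXPERIMENT);
                if (!present) {
                    // Append to the existing list (assumed comma-separated) instead of replacing it.
                    arg = value.isEmpty() ? PREFIX + EXPERIMENT : arg + "," + EXPERIMENT;
                }
                handled = true;
            }
            result.add(arg);
        }
        if (!handled) {
            // No --experiments argument was given, so add one.
            result.add(PREFIX + EXPERIMENT);
        }
        return result.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // The result would be passed on to the pipeline options parser (assumption).
        System.out.println(String.join(" ", withDeprecatedRead(args)));
    }
}
```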

I also tried your triggering code in the KafkaWordCount example in the
samza-beam-examples git repo (https://github.com/apache/samza-beam-examples).
It seems to work for me: I can see the 1-second early firing within a
10-second window, and the fired panes are accumulated. You can also use this
git repo as a reference.

Thanks,
Xinyu


On Mon, Feb 8, 2021 at 9:41 AM Xinyu Liu  wrote:

> Hi, Jan,
>
> Thanks for reporting this issue to us. Processing time triggers are
> supported in the Samza Runner in Beam 2.22.0 [1]. The
> exception message wasn't updated after we added support for processing
> time. Apologies for the confusion here. It looks like most of the exception
> messages have been fixed in the latest code.
>
> From reading the code, it seems we will only run into this exception if we
> somehow end up having TimeDomain as synchronized_processing_time [2]. The Samza
> runner does not support this time domain. Are you aware of any way your code
> might use it? If not, I can help debug further. We have other users
> who use processing time triggers for early triggering, and it has been working
> fine.
>
> I will also take a look at 2.27.0. LinkedIn recently upgraded to
> 2.26.0, and we found a few issues. Previously we were using a version close
> to 2.24.0.
>
> Thanks,
> Xinyu
>
> [1]:
> https://github.com/apache/beam/blob/9b43fadb8bb6f4bcabc945fc299b378eb1d7d205/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/SamzaTimerInternalsFactory.java#L347
> [2]:
> https://github.com/apache/beam/blob/055140203ce2df56ba903b05266466cf16562dde/sdks/java/core/src/main/java/org/apache/beam/sdk/state/TimeDomain.java#L49
>
> On Sun, Feb 7, 2021 at 2:24 PM Jan Bensien 
> wrote:
>
>> Hello,
>>
>> I am currently trying to execute my Beam Pipelines using the Samza
>> Runner. I am using processing time triggers for calculating early
>> results for my larger windows.
>> However, I am getting the following error:
>> java.lang.UnsupportedOperationException: class
>> org.apache.beam.runners.samza.SamzaRunner currently only supports event
>> time.
>> Looking at the capability matrix of
>> Beam (https://beam.apache.org/documentation/runners/capability-matrix/),
>> it looks like processing time should be supported.
>> I could not find documentation of the exact features supported by
>> the different runner versions.
>> I am using version 2.22.0 of the Samza Runner but also tried 2.25.0
>> and got the same error. When I tried to upgrade to 2.27.0, I got the
>> following error: java.lang.UnsupportedOperationException:
>> BundleFinalizer unsupported in Samza. This happens whenever I use
>> KafkaIO to read from Kafka, even when I tried a pipeline that did
>> nothing except read from Kafka.
>>
>> The trigger that caused the exception is the following:
>> .triggering(Repeatedly.forever(
>> AfterProcessingTime.pastFirstElementInPane()
>> .plusDelayOf(Duration.standardSeconds(1))))
>> .accumulatingFiredPanes());
>>
>> Running the pipeline with the Direct Runner worked fine. Which version
>> is the latest stable version of the Samza Runner and does it support
>> processing time triggers?
>>
>> With many thanks,
>>
>> Jan
>
>


Re: [DISCUSS] Samza 1.5.1 release

2020-08-19 Thread Xinyu Liu
+1. Hopefully it's not a bug in my code :)

Thanks,
Xinyu

On Wed, Aug 19, 2020 at 10:29 AM Yi Pan  wrote:

> +1 to this release as well!
>
> On Wed, Aug 19, 2020 at 9:21 AM Prateek Maheshwari 
> wrote:
>
> > +1, this is a critical bug and we should release the fix ASAP.
> >
> > Thanks,
> > Prateek
> >
> > On Tue, Aug 18, 2020 at 9:02 PM Bharath Kumara Subramanian <
> > codin.mart...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > In the 1.5 release, we enabled transactional state by default for all Samza
> > > jobs. We identified a critical bug related to trimming the state, which
> > > requires a minor release.
> > >
> > > I wanted to kick off the discussion on the open source forum as the bug fix
> > > has been validated internally at LinkedIn.
> > >
> > > More details on the bug can be found in SAMZA-2578.
> > > The patch that contains the fix: samza/pull/1413
> > >
> > > I'd like to target early next week for voting.
> > >
> > > Cheers,
> > > Bharath
> > >
> >
>


Re: [VOTE] Apache Samza 1.4.0 RC1

2020-03-10 Thread Xinyu Liu
+1 (binding).

Ran check-all.sh and integration tests for both YARN and standalone. All
passed.

Thanks,
Xinyu


On Fri, Mar 6, 2020 at 6:46 PM Yi Pan  wrote:

> Downloaded the files, built with check-all.sh, and ran both YARN and
> standalone integration tests. All passed.
>
> +1 (binding).
>
> Thanks!
>
> -Yi
>
> On Tue, Mar 3, 2020 at 3:03 PM Cameron Lee 
> wrote:
>
> > Hi all,
> >
> > This is a call for a vote on a release of Apache Samza 1.4.0. Thanks to
> > everyone who has contributed to this release.
> >
> > The release candidate can be downloaded from here:
> > https://home.apache.org/~cameronlee/samza-1.4.0-rc1/
> >
> > The release candidate is signed with pgp key 0x54CB3CE3, which can be found
> > here:
> >
> > https://keyserver.ubuntu.com/pks/lookup?search=0x54CB3CE3=on=index
> > or to directly see the public key here:
> >
> > https://keyserver.ubuntu.com/pks/lookup?op=get=0x71b0145290ecdbfa5caea6dbd786a7ba54cb3ce3
> >
> > The git tag is release-1.4.0-rc1, signed by the same pgp key above:
> >
> > https://gitbox.apache.org/repos/asf?p=samza.git;a=commit;h=5327fafb8502b126482ec0c4efc8d1aa9b96ba44
> >
> > Test binaries have been published to Maven's staging repository, and are
> > available here:
> > https://repository.apache.org/content/repositories/orgapachesamza-1077
> >
> > The vote will be open for 72 hours (until Friday, March 6, 2020 at 3pm
> > PST).
> >
> > Please download the release candidate, check the hashes/signature, build it
> > and test it, and then please vote:
> > [ ] +1 approve
> > [ ] +0 no opinion
> > [ ] -1 disapprove (and reason why)
> >
> > I ran check-all.sh and integration tests.
> >
> > +1 (non-binding) from my side.
> >
> > Thank you,
> > Cameron
> >
>


Re: [DISCUSS] 1.4 release

2020-02-26 Thread Xinyu Liu
This is great! Thanks for driving the new release.

Thanks,
Xinyu

On Tue, Feb 25, 2020 at 2:45 PM Cameron Lee  wrote:

> Hi all,
> We have made some updates to Samza, and we would like to make a Samza 1.4
> release.
> We have been testing some of these changes internally at LinkedIn, and we
> would like to send this thread for discussion for releasing to open source.
> Highlights of the release include improvements to local state management,
> improvements to the Samza SQL API, and a new system producer to write to
> Azure blob storage, along with some other miscellaneous bug fixes and
> clean-up.
> A comprehensive list of changes can be found at
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20and%20fixVersion%20in%20(1.4)
> Some of the tickets are still open, but the corresponding commits have
> already been pushed. If you do have an open ticket that is actually
> complete, please close it.
> The new release branch has already been cut. The name of the branch is
> "1.4.0".
> I would like to target a release VOTE email thread to start on February
> 28th.
> Thank you,
> Cameron
>


[ANNOUNCE] Please welcome Bharath Kumarasubramanian to the Samza PMC

2020-02-13 Thread Xinyu Liu
Hi all,

I'm very pleased to announce that the Samza PMC has voted Bharath
Kumarasubramanian to be a Project Management Committee (PMC) Member.  The
PMC is responsible for the overall health of a project and for voting in
new committers and PMC members, as well as voting on releases. Over the
past few years, Bharath has been a valuable committer on the project.

Congrats Bharath!

Thanks,
Xinyu
on behalf of the Samza PMC


Re: [DISCUSS] Samza 1.3.1 release

2020-02-13 Thread Xinyu Liu
Cool! Thanks for driving this release.

Thanks,
Xinyu

On Thu, Feb 13, 2020 at 10:28 AM Hai Lu  wrote:

> Hi all,
>
> We're going to make a 1.3.1 release to address some critical issues that
> were found in 1.3.0.
>
> 1.3.1 will be based off 1.3.0 but include the following additional commits:
>
> SAMZA-2447: Checkpoint dir removal should only search in valid store dirs
> (#1261)
> SAMZA-2446: Invoke onCheckpoint only for registered SSPs (#1260)
> SAMZA-2431: Fix the checkpoint and changelog topic auto-creation. (#1251)
> SAMZA-2434: Fix the coordinator stream creation workflow
> SAMZA-2423: Heartbeat failure causes incorrect container shutdown (#1240)
> SAMZA-2305: Stream processor should ensure previous container is stopped
> during a rebalance (#1213)
>
> I'm going to create the release candidate soon.
>
> Thanks,
> Hai
>


Re: [DISCUSS] SEP-22: Container Placements in Samza

2020-02-13 Thread Xinyu Liu
Thanks for writing a very detailed design doc. My feedback is below:

- The title of the SEP is Container Placements in Samza, which sounds like
it's going to reimplement the container placement logic currently in
ClusterBasedJobCoordinator. After reading the proposal, I think it should
be renamed to reflect a much smaller scope, e.g. Container Relocation Tool.
- Goal: the goal is to build a tool, right? I want to be crystal clear on
that. Once the solution is in place, I think there will be a tool to do
container relocation for Samza open source. Building "the ability" doesn't
mean much to Samza open source users, I believe.
- Proposed Solution: after reading this, I think the rejected option is
to write to the metadata store, and the accepted one is to develop a container
placement service. But the architecture part of the proposal says
the preferred approach is to use the Samza metadata store API to read and
write the container placement metadata. This feels contradictory to me. The
metadata store seems to be a must-have piece in this solution, so it might be
cleaner to remove the first solution.
- I don't understand the "service" part of the container placement. Is it a
separate service that can be hosted somewhere? Is it based on Jetty or
Netty? What's the communication protocol? From reading the proposal, it
looks like a set of APIs instead of a service. If that's the case, please
remove all uses of "container placement service" from this proposal.
- key-value format: the description of the uuid of the
ContainerPlacementRequestMessage has a typo I think (response -> request).
- The diagram of the component interactions looks nice, and definitely helps
me understand what the solution will look like. I didn't find a description
of writing the new locality information. I am interested in the ordering
of that relative to reserving resources and running the
container. I feel that part will be a bit complicated.
- public interfaces: are these interfaces intended to be used by Samza
users? How do they use them? Is there an example? Are these interfaces
going into samza-api? Are those interfaces stable? Why can't the uuid in the
interface be generated by Samza itself? Why is
ContainerPlacementMetadataStore part of the API? Should that be hidden
from the users? If the goal is to provide a tool, I would rather make these
classes internal to samza-core and also describe what the tool will look
like here.

Thanks,
Xinyu



On Wed, Jan 15, 2020 at 10:34 AM Sanil Jain  wrote:

> Hi Prateek,
>
> Thanks for the feedback, I have updated the SEP with your suggestions.
> Please have a look.
>
> Thank you
> Sanil
>
> On Mon, 6 Jan 2020 at 10:54, Sanil Jain  wrote:
>
> > Hi all,
> >
> > Refreshing this back to see if there is any further feedback on this SEP.
> >
> > Thanks
> > Sanil
> >
> > On Thu, 5 Dec 2019 at 16:25, Sanil Jain  wrote:
> >
> >> Hi all,
> >>
> >> I have created SEP-22: Container Placements in Samza, which adds
> >> abilities to move containers for a Samza job seamlessly from one host to
> >> another without restarting jobs.
> >>
> >> Please find the SEP wiki below:
> >>
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-22%3A+Container+Placements+in+Samza
> >>
> >> Please take a look and chime in your feedback.
> >>
> >> Thanks
> >> Sanil
> >>
> >
>


Re: [VOTE] SEP 25: PR Title and Description Guidelines

2019-12-19 Thread Xinyu Liu
+1 (binding).

Thanks,
Xinyu

On Wed, Dec 18, 2019 at 7:49 PM Jagadish Venkatraman 
wrote:

> +1 (binding)
>
> On Thursday, December 19, 2019, Daniel Nishimura 
> wrote:
>
> > +1 non-binding
> >
> > > On Dec 18, 2019, at 7:09 PM, Yi Pan  wrote:
> > >
> > > +1 (binding)
> > >
> > > On Wed, Dec 18, 2019 at 10:49 AM Bharath Kumara Subramanian <
> > > codin.mart...@gmail.com> wrote:
> > >
> > >> +1 (non-binding).
> > >>
> > >> Thanks,
> > >> Bharath
> > >>
> > >> On Wed, Dec 18, 2019 at 10:42 AM Prateek Maheshwari <prate...@utexas.edu>
> > >> wrote:
> > >>
> > >>> Hi folks,
> > >>>
> > >>> This is a call for a vote on SEP 25: PR Title and Description Guidelines.
> > >>> Thanks to everyone who helped review the proposal and provided feedback.
> > >>>
> > >>> Feedback from the discussion is positive:
> > >>>
> > >>> http://mail-archives.apache.org/mod_mbox/samza-dev/201912.mbox/%3CCAMja7KeQr9C048UVZwfSC46h%3DEX_9S%2BSEvMF9NPg0V5dPTPfZg%40mail.gmail.com%3E
> > >>> <http://mail-archives.apache.org/mod_mbox/samza-dev/201912.mbox/browser>
> > >>>
> > >>> SEP can be found at:
> > >>>
> > >>> https://cwiki.apache.org/confluence/display/SAMZA/SEP-25%3A+PR+Title+And+Description+Guidelines
> > >>>
> > >>> Please vote:
> > >>>
> > >>> [ ] +1 approve
> > >>>
> > >>> [ ] +0 no opinion
> > >>>
> > >>> [ ] -1 disapprove (and reason why)
> > >>>
> > >>> Thanks,
> > >>> Prateek
> > >>>
> > >>
> >
>
>
> --
> Jagadish
>


Re: [VOTE] SEP-23: Simplify Job Runner

2019-12-11 Thread Xinyu Liu
+1 (binding). This proposal will help future split deployment as well as
make the deployment simple. Thanks for making the effort!

Thanks,
Xinyu

On Wed, Dec 11, 2019 at 10:30 AM Ke Wu  wrote:

> Hi,
>
> This is a call for a vote on SEP-23: Simplify Job Runner. Thanks to
> everyone who helped review and refine the proposal.
>
> Feedback from the discussion is positive:
> http://mail-archives.apache.org/mod_mbox/samza-dev/201912.mbox/browser
>
> SEP can be found at
>
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-23%3A+Simplify+Job+Runner
>
> Jira can be found at
> https://issues.apache.org/jira/browse/SAMZA-2405
>
> Please vote:
>
> [ ] +1 approve
>
> [ ] +0 no opinion
>
> [ ] -1 disapprove (and reason why)
>
> Thanks,
> Ke


Re: [DISCUSS] SEP-23: Simplify Job Runner

2019-12-10 Thread Xinyu Liu
Thanks for updating the SEP wiki. The revised design looks clear to me.
Some config names could be simplified, e.g. job.config.loader.properties.path.
Overall it looks good.

Thanks,
Xinyu

On Fri, Dec 6, 2019 at 1:52 PM Ke Wu  wrote:

> As we are revamping our config loading logic in SEP-23: Simplify Job
> Runner, we will introduce backward incompatible changes to the job
> submission logic: we will deprecate --config-factory and --config-path,
> which read a full config upon job submission. Instead, we will only provide
> job submission related config through --config, and ConfigLoader will load
> the complete config on the AM.
>
> Please take a look and chime in your thoughts.
>
> Best,
> Ke
>
> > On Dec 2, 2019, at 4:42 PM, Ke Wu  wrote:
> >
> > Hi Xinyu,
> >
> > Please see the response in line:
> >
> >>  1. After this change, it seems the original config-factory and config-path
> >>  are only used to supply parameters for submitting the job. Is that the case?
> >>  Which configs are still needed in the submission?
> >
> > Yes, only configs related to job submission are needed.
> >
> > job.name, job.factory.class & yarn.package.path are the minimum three
> > configs needed for the job submission, which may be supplied by --config
> > instead.
> >
> >>  2. For backward compatibility, does it still work if the user doesn't
> >>  specify the new ConfigLoader in the command line? The
> >>  PropertiesConfigLoader class seems to require the path of the config after
> >>  exploding the tgz.
> >
> > If the user does not specify a config loader in the config, then it will
> > work in the previous flow, where the runner publishes configs in the
> > coordinator stream and the job coordinator/application master picks them up
> > by reading from Kafka. So this is a backward compatible change.
> >
> >>  3. If the final plan is to remove the original config factory/path, how
> >>  do we pass the parameters needed for Yarn submission, e.g. job name, id,
> >>  and tgz path?
> >
> > We can either pass them by --config or introduce dedicated command line
> > arguments for them in CommandLine.scala.
> >
> >
> > Let me know if you have any further questions.
> >
> > Best,
> > Ke
> >
> >> On Nov 27, 2019, at 11:02 AM, Xinyu Liu  wrote:
> >>
> >> Thanks a lot for putting out the design for simplifying the job submission
> >> process. The motivation makes sense to me: most of the planning and
> >> config generation should be done after submitting to the cluster, instead
> >> of during the submission, which can happen in a local sandbox without
> >> access to the resources needed for planning. It also improves the process
> >> from a security standpoint.
> >>
> >> A few questions regarding the interface changes:
> >>
> >>  1. After this change, it seems the original config-factory and config-path
> >>  are only used to supply parameters for submitting the job. Is that the case?
> >>  Which configs are still needed in the submission?
> >>  2. For backward compatibility, does it still work if the user doesn't
> >>  specify the new ConfigLoader in the command line? The
> >>  PropertiesConfigLoader class seems to require the path of the config after
> >>  exploding the tgz.
> >>  3. If the final plan is to remove the original config factory/path, how
> >>  do we pass the parameters needed for Yarn submission, e.g. job name, id,
> >>  and tgz path?
> >>
> >> Thanks,
> >> Xinyu
> >>
> >> On Fri, Nov 15, 2019 at 3:00 PM Ke Wu  wrote:
> >>
> >>> We created SEP-23: Simplify Job Runner, which simplifies job runner by
> >>> moving config retrieval and planning to AM.
> >>>
> >>> Please find out the SEP wiki below:
> >>>
> >>>
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-23%3A+Simplify+Job+Runner
> >>>
> >>> Please take a look and chime in your thoughts.
> >>>
> >>> Thanks,
> >>> Ke
> >>>
> >
>
>
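
The minimum submission configs named in this thread (job.name, job.factory.class and yarn.package.path, supplied through --config) can be checked before submitting. Below is a minimal sketch of such a check, under stated assumptions: the class is hypothetical and not Samza's CommandLine, and it assumes each override arrives as a "--config" token followed by a "key=value" token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical pre-submission check; not part of Samza. */
final class SubmissionConfigCheck {
    /** The three configs the thread names as the minimum for job submission. */
    static final List<String> REQUIRED =
        Arrays.asList("job.name", "job.factory.class", "yarn.package.path");

    /** Collects every "--config key=value" pair (assumed argument shape); later values win. */
    static Map<String, String> parseOverrides(String[] args) {
        Map<String, String> config = new LinkedHashMap<>();
        // A trailing "--config" with no value after it is ignored.
        for (int i = 0; i + 1 < args.length; i++) {
            if (!"--config".equals(args[i])) {
                continue;
            }
            String pair = args[++i];
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value but got: " + pair);
            }
            config.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return config;
    }

    /** Returns the required keys that are absent or empty, in declaration order. */
    static List<String> missingKeys(Map<String, String> config) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = config.get(key);
            if (value == null || value.isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingKeys(parseOverrides(args));
        System.out.println(missing.isEmpty()
            ? "All submission configs present"
            : "Missing submission configs: " + missing);
    }
}
```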


Re: [VOTE] Apache Samza 1.3.0 RC2

2019-12-02 Thread Xinyu Liu
+1 (binding)

Verified the signatures, built and ran the integration tests. All passed.
There is one flaky test failure during running check-all.sh:

org.apache.samza.table.batching.TestBatchProcessor$TestBatchTriggered.testBatchOperationTriggeredByBatchSize FAILED
    java.lang.AssertionError
        at org.junit.Assert.fail(Assert.java:86)
        at org.junit.Assert.assertTrue(Assert.java:41)
        at org.junit.Assert.assertTrue(Assert.java:52)
        at org.apache.samza.table.batching.TestBatchProcessor$TestBatchTriggered.testBatchOperationTriggeredByBatchSize(TestBatchProcessor.java:122)

This shouldn't block the release as the test is flaky. We should either fix
or disable this test for future releases. Created a ticket to track it:
https://issues.apache.org/jira/browse/SAMZA-2411

Thanks,
Xinyu



On Sun, Dec 1, 2019 at 6:20 PM Yi Pan  wrote:

> +1 (binding), verified the signature, built and local integration tests
> passed.
>
> Thanks!
>
> -Yi
>
> On Wed, Nov 27, 2019 at 2:49 PM Hai Lu  wrote:
>
> > Hi,
> >
> > This is a call for a vote on a release of Apache Samza 1.3.0. Thanks to
> > everyone who has contributed to this release.
> >
> > The release candidate can be downloaded from here:
> > http://home.apache.org/~lhaiesp/samza-1.3.0-rc2/
> >
> > The release candidate is signed with pgp key 0x07678C76, which can be found
> > here:
> >
> > https://keyserver.ubuntu.com/pks/lookup?search=0x07678C76=on=index
> > or to directly see the public key here:
> >
> > https://keyserver.ubuntu.com/pks/lookup?op=get=0x1513eaedf69d7ca77ff283b534ea3ca507678c76
> >
> > The git tag is release-1.3.0-rc2 and signed with the same pgp key above:
> >
> > https://gitbox.apache.org/repos/asf?p=samza.git;a=commit;h=573ef951dd9d96d9d547db86bbc8023557714f47
> >
> > Test binaries have been published to Maven's staging repository, and are
> > available here:
> > https://repository.apache.org/content/repositories/orgapachesamza-1073
> >
> > The vote will be open for 171 hours (ending at 6:00 PM PST Wednesday,
> > 12/4/2019).
> >
> > Please download the release candidate, check the hashes/signature, build it
> > and test it, and then please vote:
> >
> > [ ] +1 approve
> >
> > [ ] +0 no opinion
> >
> > [ ] -1 disapprove (and reason why)
> >
> > I ran check-all.sh and integration tests (both YARN and standalone).
> >
> > +1 (non-binding) from my side.
> >
> > Thanks,
> > Hai
> >
>


Re: [DISCUSS] SEP-23: Simplify Job Runner

2019-11-27 Thread Xinyu Liu
Thanks a lot for putting out the design for simplifying the job submission
process. The motivation makes sense to me: most of the planning and
config generation should be done after submitting to the cluster, instead
of during the submission, which can happen in a local sandbox without
access to the resources needed for planning. It also improves the process
from a security standpoint.

A few questions regarding the interface changes:

   1. After this change, it seems the original config-factory and config-path
   are only used to supply parameters for submitting the job. Is that the case?
   Which configs are still needed in the submission?
   2. For backward compatibility, does it still work if the user doesn't
   specify the new ConfigLoader in the command line? The
   PropertiesConfigLoader class seems to require the path of the config after
   exploding the tgz.
   3. If the final plan is to remove the original config factory/path, how
   do we pass the parameters needed for Yarn submission, e.g. job name, id,
   and tgz path?

Thanks,
Xinyu

On Fri, Nov 15, 2019 at 3:00 PM Ke Wu  wrote:

> We created SEP-23: Simplify Job Runner, which simplifies job runner by
> moving config retrieval and planning to AM.
>
> Please find the SEP wiki below:
>
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-23%3A+Simplify+Job+Runner
>
> Please take a look and chime in with your thoughts.
>
> Thanks,
> Ke
>


Re: [VOTE] SEP-20: Samza on Kubernetes

2019-11-07 Thread Xinyu Liu
+1 (binding).

Thanks,
Xinyu

On Thu, Nov 7, 2019 at 10:50 AM Weiqing Yang 
wrote:

> Hi All,
>
> The feedback from the discussion thread:
> http://mail-archives.apache.org/mod_mbox/samza-dev/201911.mbox/browser is
> positive. This is a call for a vote for SEP-20: Samza on Kubernetes <
>
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-20%3A+Samza+on+Kubernetes
> >.
> Thanks to the committers and contributors that were involved with the
> review, design, etc.
>
> The link to the jira ticket: SAMZA-2067 <
> https://issues.apache.org/jira/browse/SAMZA-2067>.
>
> Thanks,
> Weiqing
>


Re: [DISCUSS] SEP-20: Samza on Kubernetes

2019-11-04 Thread Xinyu Liu
+1 on the design. This is a great feature to allow Samza to expand its
deployment to Kubernetes clusters. Nice job!

Thanks,
Xinyu

On Mon, Nov 4, 2019 at 10:10 AM Weiqing Yang 
wrote:

> Hi all,
>
> We created SEP-20: Samza on Kubernetes, which enables Samza to run natively
> on Kubernetes. Please find the SEP wiki below:
>
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-20%3A+Samza+on+Kubernetes
>
> Please take a look and chime in with your feedback.
>
> Thanks,
> Weiqing
>


Re: [ANNOUNCE] Please welcome Boris Shkolnik to the Samza PMC

2019-06-07 Thread Xinyu Liu
Congrats, Boris!

Xinyu

On Fri, Jun 7, 2019 at 3:13 PM Jakob Homan  wrote:

> Howdy all-
> I'm very pleased to announce that the Samza PMC has voted Boris
> Shkolnik to be a Project Management Committee (PMC) Member.  The PMC
> is responsible for the overall health of a project and for voting in
> new committers and PMC members, as well as VOTEing on releases. Over
> the past two years, Boris has been a valuable committer on the
> project.
>
> Congrats Boris!
>
> Thanks,
>
> Jakob
> on behalf of the Samza PMC
>


Re: REMINDER. [VOTE] Apache Samza 1.2.0 RC4

2019-06-04 Thread Xinyu Liu
+1 (binding).

Ran check-all.sh and the build passed.

I had trouble running the integration tests on both Linux and Mac,
possibly due to my local machine env.

Thanks,
Xinyu

On Mon, Jun 3, 2019 at 11:00 AM Daniel Nishimura 
wrote:

> check-all.sh and integration tests passed. +1 from me.
>
> Just a side note, the link in the original email is a broken link. The link
> to the RC archive is: http://home.apache.org/~boryas/samza-1.2.0-rc4
>
> On Sun, Jun 2, 2019 at 5:00 PM Boris Shkolnik  wrote:
>
> > Hi,
> >
> > This is a call for a vote on a release of Apache Samza 1.2.0. Thanks to
> > everyone who has contributed to this release.
> >
> >
> > The release candidate can be downloaded from here:
> > http://home.apache.org/~boryas/samza-1.2.0-rc4
> >
> > (this release has a fix for standalone integration test)
> >
> > The release candidate is signed with pgp key 0x7D74D0CD5B5EB041, which
> can
> > be found
> > http://keyserver.ubuntu.com/pks/lookup?op=get=0x7d74d0cd5b5eb041
> >  >
> > The git tag is release-1.2.0-rc4 and signed with the same pgp key:
> >
> >
> > https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.2.0-rc4
> > <
> > https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.1.0-rc1
> > >
> >
> > Test binaries have been published to Maven's staging repository, and are
> > available here:
> > https://repository.apache.org/content/repositories/orgapachesamza-1069
> > <
> > https://repository.apache.org/content/repositories/orgapachesamza-1065/org/
> > >
> >
> > The vote will be open until 06:00 PM PST Monday, 06/03/2019.
> >
> >
> > Please download the release candidate, check the hashes/signature, build
> it
> > and test it, and then please vote:
> >
> > [ ] +1 approve
> >
> > [ ] +0 no opinion
> >
> > [ ] -1 disapprove (and reason why)
> >
> > I ran check-all.sh and integration tests.
> >
> > +1 from my side.
> >
> > Thanks
> >
>


Re: [DISCUSS] 1.2 release

2019-05-20 Thread Xinyu Liu
+1 on the 1.2 release. It's time to get the newly added features out!

Thanks,
Xinyu

On Mon, May 20, 2019 at 9:39 AM Jake Maes  wrote:

> I don't think we did anything for "Making sendTo(table), sendTo(stream)
> non-terminal". The ticket was just closed as a "won't fix" IIRC.
>
> Nevertheless, I think the Kafka 2.0 upgrade warrants a release by itself.
>
> Let's do it.
>
> On Fri, May 17, 2019 at 12:17 PM Boris S  wrote:
>
> > Hi all,
> >
> > We have added a number of major features and changes to master since
> > 1.1 that warrant a new 1.2 release.
> >
> > Within LinkedIn, some of these features have already been tested as
> > part of our test suites. We plan to continue our testing in coming
> > week to validate the stability prior to release.
> >
> > We wanted to kick off the discussion in the open source forum to keep
> > the momentum flowing.
> > Here is a selected list of features that are part of the new release
> >
> >   Kafka 2.0 upgrade
> >
> >   Couchbase support for Samza Table API
> >   Making sendTo(table), sendTo(stream) non-terminal
> >
> > We have also worked on the following upgrades and bugfixes.
> > You can find a concrete list of the features, bug-fixes, upgrades
> > here:
> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20%22SAMZA%22%20and%20fixVersion%20in%20(1.2)
> >
> >
> > Some of these Jiras are not marked as fixed (but they are marked as
> > committed in the git log). Please close the Jiras if they are fixed.
> >
> > Here is my proposal on our release schedule and timelines.
> >1. Cut the 1.2 release branch.
> >2. Target a release vote on the week of May 20, 2019
> >
> >
> > Thanks
> > Boris
> >
>


[DISCUSS] Hygiene for merging PRs

2019-05-16 Thread Xinyu Liu
Hi, all,

I've seen different practices around how PRs are contributed, reviewed and
merged for Samza open source. I think it's time to bring up our committer
guide again to make sure we follow the guidelines exactly. It's also an
opportunity to talk about future improvements to the flow.

*PR Contribution*
According to our committer guide [1], a JIRA must be created before
creating the PR, unless the PR is a trivial typo or doc fix. The PR title
needs to include the JIRA ticket name in the following format:

*SAMZA-<ticket number>: <description>*

As an example:
SAMZA-2168: Remove redundant SystemAdmin creation in ApplicationMaster [2].
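
A title in this format could also be checked mechanically. The following is a
minimal Java sketch only: the class name PrTitleCheck and the regex are
assumptions derived from the example above, not an existing Samza tool or an
official rule.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: checks that a PR title looks like "SAMZA-2168: Some description". */
class PrTitleCheck {
    // Assumed pattern: "SAMZA-" + ticket digits + ": " + a non-empty description.
    private static final Pattern TITLE = Pattern.compile("^SAMZA-\\d+: \\S.*$");

    static boolean isValid(String title) {
        return title != null && TITLE.matcher(title).matches();
    }
}
```

Titles without a ticket number, or without the "SAMZA-" prefix, are rejected by
this sketch; trivial typo or doc fixes would need to be exempted separately.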

*PR Review*
As discussed before, a Samza PR requires an approval from a committer
before merging. Contributors are welcome to review the code, but a final
"LGTM" from a committer is a MUST.

*PR merge*
As we now use the simple merge flow in GitHub to merge a PR, I think we
should mostly squash the commits when merging. Otherwise it's hard to roll
back changes, and it generally generates a lot of noise in the commit
history.

Any further suggestions are highly appreciated.

Thanks,
Xinyu

[1] https://cwiki.apache.org/confluence/display/SAMZA/Contributor%27s+Corner
[2] https://github.com/apache/samza/pull/1001


[ANNOUNCE] New committer announcement: Cameron Lee

2019-04-16 Thread Xinyu Liu
Hi, all,

Please join me and the rest of the Samza PMC in welcoming a new committer:
Cameron Lee.

Cameron has been contributing to Samza since early 2018. He worked on
multiple areas: the runtime context in Samza, checkpoint enhancements, as
well as testing and gradle improvements. He is also very active in code
reviews and discussions.

Through his work on refining the existing Samza code base, the Samza PMC
trusts Cameron with the responsibilities of a Samza committer.

Look forward to seeing more contributions from you, Cameron!

Xinyu


[DISCUSS] Change Apache Samza git comments/merge email recipient to commits@samza

2019-04-04 Thread Xinyu Liu
Hi, All,

Our dev mailing list has been flooded with GitHub comments/merges, so it's
really hard to see the meaningful discussions and user engagement. Shall we
move the JIRA/GitHub messages off dev@Samza? We would use
comm...@samza.apache.org as the recipient.

In addition, I think we should have most of our open source discussions in
the dev mailing list to raise visibility as well as attract contributors.

Please let me know if you are ok or have objections to this change. It's
a +1 on my side.

Thanks,
Xinyu


Re: [VOTE] SEP-21: Samza Async API for High Level

2019-03-28 Thread Xinyu Liu
Looks great! +1 (binding).

On Thu, Mar 28, 2019 at 12:03 PM Jake Maes  wrote:

> +1 (binding)
>
> On Mon, Mar 25, 2019 at 6:09 PM Jagadish Venkatraman <
> jagadish1...@gmail.com>
> wrote:
>
> > +1 (binding);
> >
> > thanks Bharath for the proposal and the implementation.
> >
> > LGTM;
> >
> > On Monday, March 25, 2019, Bharath Kumara Subramanian <
> > codin.mart...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > >
> > > This is a call for a vote for SEP-21: Samza Async API for High Level
> > >
> > >
> > > SEP-21 has been discussed and implemented using SAMZA-2055. For
> > reference,
> > > the design document can be found -
> > > https://cwiki.apache.org/confluence/display/SAMZA/SEP-21%3A+Samza+Async+API+for+High+Level
> > >
> > >
> > > Thanks,
> > > Bharath
> > >
> >
> >
> > --
> > Jagadish V,
> > Graduate Student,
> > Department of Computer Science,
> > Stanford University
> >
>


Re: [DISCUSS] Samza 1.1.0 release

2019-03-07 Thread Xinyu Liu
+1 (binding)

Thanks,
Xinyu

On Thu, Mar 7, 2019 at 12:43 AM santhosh venkat <
santhoshvenkat1...@gmail.com> wrote:

> +1 (non-binding)
>
> Thanks,
>
> On Wed, Mar 6, 2019 at 10:42 PM Yi Pan  wrote:
>
> > +1 (binding)
> >
> > On Wed, Mar 6, 2019 at 10:08 PM Daniel Chen  wrote:
> >
> > > Hello everyone,
> > >
> > > We have added couple of major features to master since 1.0.0 that
> > warrants
> > > a major release.
> > >
> > > Within LinkedIn, some of these features have already been tested as
> part
> > of
> > > our test suites. We plan to continue our testing in coming weeks to
> > > validate the stability prior to release.
> > >
> > > Here is the highlighted list of features that are part of the new
> release
> > > (in chronological order)
> > > SAMZA-1981
> > > Consolidate table descriptors to samza-api
> > > SAMZA-1985
> > > Implement Startpoints model and StartpointManager
> > > SAMZA-1998
> > > Table API refactoring
> > > SAMZA-2012
> > > Add API for wiring an external context through to application
> processing
> > > code
> > > SAMZA-2041
> > > Add system descriptors for HDFS and Kinesis
> > > SAMZA-2043
> > > Consolidate ReadableTable and ReadWriteTable
> > > SAMZA-2106
> > > Samza App & Job Config Refactor
> > > SAMZA-2081
> > > Samza SQL : Type system for Samza SQL
> > >
> > > You can find a complete list of features here:
> > >
> > >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20AND%20resolution%20%20%3D%20Fixed%20%20AND%20(fixVersion%20%3E%3D%201.1%20)%20ORDER%20BY%20createdDate%20%20DESC
> > >
> > > Here is my proposal on our release schedule and timelines.
> > >
> > >1. Cut a release version 1.1.0 from master
> > >2. Target a release vote on the week March 13th (next week)
> > >
> > > Thoughts?
> > >
> > > Thanks,
> > > Daniel
> > >
> >
>


Re: [VOTE] Migration of Samza git repo to gitbox.apache.org

2019-01-23 Thread Xinyu Liu
+1 (binding).

On Wed, Jan 23, 2019 at 2:39 PM Prateek Maheshwari 
wrote:

> +1 (binding) again
>
> - Prateek
>
> On Wed, Jan 23, 2019 at 11:50 AM Pawas Chhokra 
> wrote:
> >
> > Hi all,
> >
> > This is a call for a vote on migrating Samza git repo to
> gitbox.apache.org, on
> > 11 AM, Jan 29, 2019. As mandated by the Apache Infrastructure Team, all
> git
> > repositories must be migrated from git-wip-us.apache.org URL to
> > gitbox.apache.org, as the old service is being decommissioned.
> > The vote will be open for 72 hours (ending at 12:00 PM PST Monday,
> > January 28). You can vote as follows:
> >
> > [ ] +1 approve
> >
> > [ ] +0 no opinion
> >
> > [ ] -1 disapprove (and reason why)
> >
> > The vote is +1 from my side.
> >
> > Thanks & Regards,
> > Pawas Chhokra
>


Re: Samza runner bundle implementation

2019-01-18 Thread Xinyu Liu
Your understanding is correct: both GroupIntoBatches and the stateful
processing of ParDo are on a per-key basis. The DoFn in this example uses
bufferState to accumulate the batch and countState to keep track of the
size of the batch. I believe GroupIntoBatches implements the same logic:
it keeps the batch elements in state and emits the whole batch once the
element count reaches the batch size. So if you are trying this example,
you don't need to use GroupIntoBatches.
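
To make the per-key buffer/count logic concrete, here is a minimal sketch in
plain Java. It deliberately does not use the Beam state API: the class
PerKeyBatcher is hypothetical, a HashMap stands in for bufferState, and the
buffer size stands in for countState.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only: models per-key batching without any Beam classes. */
class PerKeyBatcher<K, V> {
    private final int batchSize;
    // Plays the role of bufferState: the elements seen so far for each key.
    private final Map<K, List<V>> buffers = new HashMap<>();

    PerKeyBatcher(int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
    }

    /** Adds an element; returns the full batch once the key reaches batchSize, else an empty list. */
    List<V> add(K key, V value) {
        List<V> buffer = buffers.computeIfAbsent(key, k -> new ArrayList<>());
        buffer.add(value);
        // buffer.size() plays the role of countState.
        if (buffer.size() < batchSize) {
            return new ArrayList<>();
        }
        // Batch is full: clear the state for this key and emit the batch.
        buffers.remove(key);
        return buffer;
    }
}
```

Each key fills its own buffer independently, which is why a batch only ever
contains elements of a single key.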

Note this is not related to the bundle support, which is an internal
implementation detail of the runner that helps performance. The internal
bundling will let the runner process elements in a micro-batch mode (similar
to Spark), so that certain computations can be optimized, e.g. Python serde
of protobuf messages. Overall it's not visible in the user APIs. Batching
on the user side has to be done explicitly through the PTransforms.

Btw, it seems my email still doesn't work well with the intuit domain, so
please include my gmail account directly in this email chain, which is added
here.

Thanks,
Xinyu

On Fri, Jan 18, 2019 at 2:35 PM Daniel Chen  wrote:

>
>
> On 1/18/19, 1:33 PM, "Deshpande, Omkar" 
> wrote:
>
> Hey Xinyu,
>
> I am trying to implement "Batched RPC" as described in -
> https://beam.apache.org/blog/2017/08/28/timely-processing.html
> The documentation for GroupIntoBatches says "Batches will contain only
> elements of a single key".
> And my understanding is for "Batched RPC", I need a batch of keys. So,
> I am not sure if I can use GroupIntoBatches.
>
> On 1/18/19, 10:40 AM, "Xinyu Liu"  wrote:
>
> sorry, the correct link to the first reference:
>
> https://github.com/apache/beam/blob/master/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/DoFnOp.java
> .
>
> Thanks,
> Xinyu
>
> On Fri, Jan 18, 2019 at 10:35 AM Xinyu Liu 
> wrote:
>
> > Hi, Omkar,
> >
> > Your observation is correct. Currently bundle is implemented in a
> > per-event basis (Code is DoFnOp.processElement,
> >
> https://docs.google.com/spreadsheets/d/1pIUQ8J658B7GPNDt5dwiJBRQyWep1n2rXlkONyA6ZMM/edit#gid=1709587251
> ).
> > We are working on supporting bundles in Samza right now so in
> Beam we can
> > take advantage of it. Bundling is also critical to have better
> python
> > performance so we are trying to get it out very soon (Feb-March).
> >
> > On the other hand, in java if you want to process in a batch
> fashion, you
> > can use the Beam GroupIntoBatches api (
> >
> https://beam.apache.org/releases/javadoc/2.0.0/org/apache/beam/sdk/transforms/GroupIntoBatches.html
> ).
> > This will group the elements into batches and then deliver to
> your ParDo
> > afterwards. Please let us know whether this works for you.
> >
> > Thanks,
> > Xinyu
> >
> >
> >
> > On Fri, Jan 18, 2019 at 10:27 AM Daniel Chen <
> dch...@linkedin.com> wrote:
> >
> >> + Xinyu
> >>
> >> On 1/18/19, 10:05 AM, "Deshpande, Omkar" <
> omkar_deshpa...@intuit.com>
> >> wrote:
> >>
> >> Hello,
> >>
> >> I am using Samza runner with Apache Beam. Is there any
> documentation
> >> available on how bundles are implemented in the Samza runner?
> >> I have observed every Kafka record getting processed in its
> own
> >> bundle. How can I get larger bundles?
> >>
> >>  2.9.0 0.14.1
> >>
> >> Thanks,
> >> Omkar
> >>
> >>
> >>
>
>
>
>
>


Re: Samza runner bundle implementation

2019-01-18 Thread Xinyu Liu
sorry, the correct link to the first reference:
https://github.com/apache/beam/blob/master/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/DoFnOp.java
.

Thanks,
Xinyu

On Fri, Jan 18, 2019 at 10:35 AM Xinyu Liu  wrote:

> Hi, Omkar,
>
> Your observation is correct. Currently bundle is implemented in a
> per-event basis (Code is DoFnOp.processElement,
> https://docs.google.com/spreadsheets/d/1pIUQ8J658B7GPNDt5dwiJBRQyWep1n2rXlkONyA6ZMM/edit#gid=1709587251).
> We are working on supporting bundles in Samza right now so in Beam we can
> take advantage of it. Bundling is also critical to have better python
> performance so we are trying to get it out very soon (Feb-March).
>
> On the other hand, in java if you want to process in a batch fashion, you
> can use the Beam GroupIntoBatches api (
> https://beam.apache.org/releases/javadoc/2.0.0/org/apache/beam/sdk/transforms/GroupIntoBatches.html).
> This will group the elements into batches and then deliver to your ParDo
> afterwards. Please let us know whether this works for you.
>
> Thanks,
> Xinyu
>
>
>
> On Fri, Jan 18, 2019 at 10:27 AM Daniel Chen  wrote:
>
>> + Xinyu
>>
>> On 1/18/19, 10:05 AM, "Deshpande, Omkar" 
>> wrote:
>>
>> Hello,
>>
>> I am using Samza runner with Apache Beam. Is there any documentation
>> available on how bundles are implemented in the Samza runner?
>> I have observed every Kafka record getting processed in its own
>> bundle. How can I get larger bundles?
>>
>>  2.9.0 0.14.1
>>
>> Thanks,
>> Omkar
>>
>>
>>


Re: Samza runner bundle implementation

2019-01-18 Thread Xinyu Liu
Hi, Omkar,

Your observation is correct. Currently bundles are implemented on a
per-event basis (the code is in DoFnOp.processElement,
https://docs.google.com/spreadsheets/d/1pIUQ8J658B7GPNDt5dwiJBRQyWep1n2rXlkONyA6ZMM/edit#gid=1709587251).
We are working on supporting bundles in Samza right now so that Beam can
take advantage of them. Bundling is also critical for better Python
performance, so we are trying to get it out very soon (Feb-March).

On the other hand, in Java, if you want to process in a batch fashion, you
can use the Beam GroupIntoBatches API (
https://beam.apache.org/releases/javadoc/2.0.0/org/apache/beam/sdk/transforms/GroupIntoBatches.html).
This will group the elements into batches and then deliver them to your ParDo
afterwards. Please let us know whether this works for you.

Thanks,
Xinyu



On Fri, Jan 18, 2019 at 10:27 AM Daniel Chen  wrote:

> + Xinyu
>
> On 1/18/19, 10:05 AM, "Deshpande, Omkar" 
> wrote:
>
> Hello,
>
> I am using Samza runner with Apache Beam. Is there any documentation
> available on how bundles are implemented in the Samza runner?
> I have observed every Kafka record getting processed in its own
> bundle. How can I get larger bundles?
>
>  2.9.0 0.14.1
>
> Thanks,
> Omkar
>
>
>


Re: InMemorySystemDescriptor ignores serde

2019-01-17 Thread Xinyu Liu
Hi, Tom,

First, your observation about the current InMemorySystem is exactly right,
and thanks for raising this issue to the community!

The current InMemorySystem is tightly coupled with the Samza test
framework, which I believe puts quite a few limitations on its uses, e.g.
using InMemorySystem in prototyping. Currently we are working on ways to
improve it so it can be used by normal user code. For your case, it seems to
me the inconsistent use of serdes caused the confusion. Your point about
being more consistent by supporting serdes in in-memory streams sounds
reasonable to me. I have the same impression that in-memory streams can be
treated the same way as other input streams. The initial rollout doesn't have
this feature, and I created the ticket
https://issues.apache.org/jira/browse/SAMZA-2075 to track this. We will see
whether we can get it into the next release.

Thanks,
Xinyu





On Thu, Jan 17, 2019 at 6:11 AM Tom Davis  wrote:

> Hey Sanil, thanks for the reply. I eventually figured that not
> supporting serdes for in-memory streams was an intentional restriction,
> I was just pointing out that it is inconsistent with earlier versions
> since it was relatively easy to supply stream serdes directly before the
> Descriptor API.
>
> I can't really send a test case along because it's all in Clojure and
> uses a Clojure-based API wrapper I wrote for interacting with Samza. In
> theory, the easiest test would be one where a Config contains the
> property I mentioned; with that, you should be able to run a simple
> pipeline that shows -- despite the NoOpSerde forced by
> InMemorySystemDescriptor -- the input is serialized using that serde.
>
> Anyway, I'm not sure if it's worth the trouble. I get why you'd forgo
> serialization for the in-memory system, it was just a handy way to test
> my entire pipeline which contains a few non-trivial custom serdes.
>
>
> Sanil Jain  writes:
>
> > Hi Tom,
> >
> > InMemorySystem is a system that is supposed to only support NoOpSerde
> since
> > all the associated steams for this system are maintained in memory. In
> > addition to this, if your test is using the Samza's Test Framework, it
> will
> > override any explicit serde configs specified for streams to NoOp.
> >
> >
> > You are expected to supply deserialized objects to the in-memory system.
> >
> >
> > In addition to that in your email you mentioned:
> >
> >
> > {unformat}
> >
> > I had still specified in my config:
> >
> > streams.in-0.samza.msg.serde=integer
> >
> >
> > Apparently, that *was* respected by some part of the system because
> > integers were
> > deserialized properly! Removing this configuration value results in my
> > operator
> > receiving a byte array since the in-memory system only uses NoOpSerde.
> >
> > {unformat}
> >
> >
> > Can you send me a snippet of test you were trying to fix so that I can
> > understand the problem better?
> >
> >
> > Thanks
> >
> > Sanil
> >
> > On Tue, 8 Jan 2019 at 17:28, Tom Davis  wrote:
> >
> >> I am in the process of updating a project to 1.0 and spent today
> debugging
> >> a
> >> rather odd test failure. When using input/output streams with
> IntegerSerde,
> >> things worked fine -- however, using LongSerde, every message value was
> 0!
> >> I
> >> eventually found that InMemorySystemDescriptor#getInputDescriptor
> ignores
> >> the
> >> serde passed to it. However, I had still specified in my config:
> >>
> >> streams.in-0.samza.msg.serde=integer
> >>
> >> Apparently that *was* respected by some part of the system because
> >> integers were
> >> deserialized properly! Removing this configuration value results in my
> >> operator
> >> receiving a byte array since the in-memory system only uses NoOpSerde.
> >>
> >> This behavior appears inconsistent with the previous version of Samza.
> The
> >> old
> >> `getInputStream` was passed a serde that was always used, but since the
> new
> >> version receives a Descriptor that has already discarded the serde, I am
> >> forced
> >> into assuming NoOpSerde everywhere, at least for testing purposes.
> >>
> >> Not the end of the world, but it does introduce an inconsistency between
> >> the
> >> in-memory system and any other -- one that requires a fair bit of domain
> >> knowledge to avoid.
> >>
> >> As always, thanks for the great project!
> >>
>


[ANNOUNCE] New committer announcement

2019-01-15 Thread Xinyu Liu
Hi, all,

Please join me and the rest of the Samza PMC in welcoming a new committer:

 - Shanthoosh Venkataraman: for, but not limited to, his work on Samza
standalone.

Through his work on developing new features such as host affinity on
standalone and refining the existing Samza code base, the PMC trusts
Shanthoosh with the responsibilities of a Samza committer.

Xinyu


Re: app.class or task.class for beam samza runner

2019-01-03 Thread Xinyu Liu
Adding Omkar's email back to this email list.

For your later error, I think you need to add the following config as you
are using standalone:

app.runner.class=org.apache.samza.runtime.LocalApplicationRunner

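Putting this thread's suggestions together, a standalone config might look
like the sketch below. This is an untested assumption assembled from the
config quoted further down (without job.factory.class, with
job.default.system and app.runner.class added); hosts, ports and the app
name are the ones from that quoted config and will differ in your setup.

```properties
# Sketch only: assembled from the settings discussed in this thread, not verified.
app.name=test-app
# Needed when running standalone, per the note above.
app.runner.class=org.apache.samza.runtime.LocalApplicationRunner
job.coordinator.factory=org.apache.samza.zk.ZkJobCoordinatorFactory
job.coordinator.zk.connect=localhost:2181
job.coordinator.system=kafka
# Default system, used for any data repartitioning.
job.default.system=kafka
# Kafka System
systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
systems.kafka.consumer.zookeeper.connect=localhost:2181
systems.kafka.producer.bootstrap.servers=localhost:9092
systems.kafka.default.stream.replication.factor=1
```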

Please keep us updated if you run into any further issues.

Thanks,

Xinyu


On Thu, Jan 3, 2019 at 12:14 PM Xinyu Liu  wrote:

> As Prateek mentioned, I also double checked the exception, which comes
> from a class (ApplicationUtil.java) that only exists in Samza 1.0. Please
> remove any Samza 1.0 dependency since Beam api currently works with Samza
> 0.14.1.
>
> Your config looks mostly correct to me. The following is not needed:
>
> job.factory.class=org.apache.samza.job.local.ProcessJobFactory
>
> And you probably need to config this for any data repartitioning:
>
> job.default.system=kafka
>
> Thanks,
> Xinyu
>
>
> On Thu, Jan 3, 2019 at 10:03 AM Prateek Maheshwari 
> wrote:
>
>> Hi Omkar,
>>
>> I think it's only possible to get that exception with Samza 1.0. Can
>> you verify that the deployment is indeed using samza 0.14.1?
>>
>> Thanks,
>> Prateek
>>
>> On Wed, Jan 2, 2019 at 11:40 PM Deshpande, Omkar
>>  wrote:
>> >
>> > Hello,
>> >
>> > I have been able to execute my Samza-Beam application in Local mode.
>> And now I am trying to run a Samza-Beam application in Standalone mode.
>> >
>> > Here is my configFile  config.properties:
>> >
>> > app.name=test-app
>> > job.coordinator.factory=org.apache.samza.zk.ZkJobCoordinatorFactory
>> > job.coordinator.zk.connect=localhost:2181
>> > job.coordinator.system=kafka
>> > job.factory.class=org.apache.samza.job.local.ProcessJobFactory
>> > # Kafka System
>> >
>> systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
>> > systems.kafka.consumer.zookeeper.connect=localhost:2181
>> > systems.kafka.producer.bootstrap.servers=localhost:9092
>> > systems.kafka.default.stream.replication.factor=1
>> >
>> > I am getting following exception:
>> >
>> > org.apache.samza.config.ConfigException: Legacy task applications must
>> set a non-empty task.class in configuration.
>> >
>> >   at
>> org.apache.samza.application.ApplicationUtil.fromConfig(ApplicationUtil.java:58)
>> >
>> >   at
>> org.apache.samza.runtime.LocalContainerRunner.main(LocalContainerRunner.java:87)
>> >
>> > Versions:
>> > 2.9.0
>> > 0.14.1
>> >
>> > As per my understanding, I shouldn’t have to create implementation of
>> StreamApplication or StreamTask while using Beam SDK.
>> >
>> > An example of configFile for Samza-Beam Standalone application would be
>> helpful.
>> >
>> > Regards,
>> > Omkar Deshpande
>>
>


Re: app.class or task.class for beam samza runner

2019-01-03 Thread Xinyu Liu
As Prateek mentioned, I also double checked the exception, which comes from
a class (ApplicationUtil.java) that only exists in Samza 1.0. Please remove
any Samza 1.0 dependency since the Beam API currently works with Samza 0.14.1.

Your config looks mostly correct to me. The following is not needed:

job.factory.class=org.apache.samza.job.local.ProcessJobFactory

And you probably need to configure this for any data repartitioning:

job.default.system=kafka

Thanks,
Xinyu


On Thu, Jan 3, 2019 at 10:03 AM Prateek Maheshwari 
wrote:

> Hi Omkar,
>
> I think it's only possible to get that exception with Samza 1.0. Can
> you verify that the deployment is indeed using samza 0.14.1?
>
> Thanks,
> Prateek
>
> On Wed, Jan 2, 2019 at 11:40 PM Deshpande, Omkar
>  wrote:
> >
> > Hello,
> >
> > I have been able to execute my Samza-Beam application in Local mode. And
> now I am trying to run a Samza-Beam application in Standalone mode.
> >
> > Here is my configFile  config.properties:
> >
> > app.name=test-app
> > job.coordinator.factory=org.apache.samza.zk.ZkJobCoordinatorFactory
> > job.coordinator.zk.connect=localhost:2181
> > job.coordinator.system=kafka
> > job.factory.class=org.apache.samza.job.local.ProcessJobFactory
> > # Kafka System
> >
> systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
> > systems.kafka.consumer.zookeeper.connect=localhost:2181
> > systems.kafka.producer.bootstrap.servers=localhost:9092
> > systems.kafka.default.stream.replication.factor=1
> >
> > I am getting following exception:
> >
> > org.apache.samza.config.ConfigException: Legacy task applications must
> set a non-empty task.class in configuration.
> >
> >   at
> org.apache.samza.application.ApplicationUtil.fromConfig(ApplicationUtil.java:58)
> >
> >   at
> org.apache.samza.runtime.LocalContainerRunner.main(LocalContainerRunner.java:87)
> >
> > Versions:
> > 2.9.0
> > 0.14.1
> >
> > As per my understanding, I shouldn’t have to create implementation of
> StreamApplication or StreamTask while using Beam SDK.
> >
> > An example of configFile for Samza-Beam Standalone application would be
> helpful.
> >
> > Regards,
> > Omkar Deshpande
>


Re: Welcome Hai Lu and Aditya Toomla as committers to Apache Samza!

2018-11-06 Thread Xinyu Liu
Well deserved! Congrats!

Xinyu

On Tue, Nov 6, 2018 at 10:39 AM Jake Maes  wrote:

> Welcome Hai and Aditya. Nice work!
>
> On Tue, Nov 6, 2018 at 10:20 AM Yi Pan  wrote:
>
> > Hi, all,
> >
> > All official steps are completed and please join me to welcome Hai and
> > Aditya to Apache Samza community as committers! They have been making
> > significant contribution to many important projects in Samza such as SQL,
> > Samza-on-Hadoop, Kinesis connector, etc.
> >
> > Welcome Hai and Aditya!
> >
> > -Yi
> >
>


Re: [VOTE] SEP-15: New Runtime Context API

2018-10-15 Thread Xinyu Liu
+1 (binding).

@Cameron: your first voting email doesn't seem to be showing up for me. For
future communications, please use your personal email instead.

Thanks,
Xinyu

On Mon, Oct 15, 2018 at 10:44 AM Prateek Maheshwari 
wrote:

> +1 (non-binding) for these changes.
>
> (Resending from a non-LI email due to email delivery issues)
>
> - Prateek
> On Fri, Oct 12, 2018 at 3:28 PM Cameron Lee  wrote:
> >
> > Hi all,
> >
> > SEP-15 has been updated now that SAMZA-1714 has been reviewed and
> implemented.
> > Please vote on whether there are further breaking changes needed in the
> API or we can accept this proposal to be included in Samza 1.0.
> >
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-15%3A+New+Runtime+Context+API
> >
> > Thank you,
> > Cameron
>


Re: [VOTE] SEP-14: System and Stream Descriptors

2018-10-15 Thread Xinyu Liu
+1 (binding). Great API for IO!

Thanks,
Xinyu

On Fri, Oct 12, 2018 at 5:36 PM Yi Pan  wrote:

> +1 (binding) from me. Thanks!
>
> On Fri, Oct 12, 2018 at 12:30 PM Prateek Maheshwari 
> wrote:
>
> > Hi folks,
> >
> > Now that SAMZA-1804 has been implemented and reviewed, we've updated
> > SEP-14 with the latest APIs and design decisions.
> >
> > Please vote for accepting SEP-14 in its current form for the upcoming
> > Samza 1.0 release.
> >
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-14%3A+System+and+Stream+Descriptors
> >
> > Thanks,
> > Prateek
> >
>


Re: [VOTE] SEP-13: unified ApplicationDescriptor and ApplicationRunner APIs for high and low- level APIs in YARN and standalone deployment

2018-10-15 Thread Xinyu Liu
+1 (binding). Thanks for the effort in making our API consistent.

Thanks,
Xinyu

On Fri, Oct 12, 2018 at 12:32 PM Prateek Maheshwari 
wrote:

> +1 (non-binding) from me. Thanks for making the changes and updating the
> SEP!
>
> - Prateek
>
> On Fri, Oct 12, 2018 at 12:15 PM Yi Pan  wrote:
> >
> > Hi, all,
> >
> > Given SAMZA-1789 has been reviewed and implemented, SEP-13 has been
> updated
> > to the latest API classes as well. Please vote on whether there is
> further
> > breaking changes needed in the API, or we can accept this proposal and
> seal
> > it for 1.0.
> >
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-13%3A+unify+high-+and+low-level+user+applications+in+YARN+and+standalone
> >
> > Thanks a lot!
> >
> > This email serves as my +1 (binding) to accept SEP-13.
> >
> > -Yi
>


Re: Review Request 68867: Update hello-samza with latest code

2018-09-28 Thread Xinyu Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68867/
---

(Updated Sept. 28, 2018, 11:50 p.m.)


Review request for samza and Prateek Maheshwari.


Changes
---

Add Prateek's comments.


Repository: samza-hello-samza


Description
---

Update hello-samza with latest code


Diffs (updated)
-

  build.gradle 9d1f5433afc548ee19b00c0e0e2e73e7963d25ef
  src/main/config/pageview-adclick-joiner.properties eba7b0b6efc86869dbdf9402ae069550ed4c1723
  src/main/config/pageview-filter.properties 331ee1a1f1315c13aa955353795853eca792d669
  src/main/config/pageview-profile-table-joiner.properties 7cec6013e744042295568cef17928321abf95b35
  src/main/config/pageview-sessionizer.properties 420cdde0d2d82f22daafc49b7428040a7dcd1eef
  src/main/config/stock-price-table-joiner.properties f9bd684ed3601ba887b48c87c070f31c8b137f8e
  src/main/config/tumbling-pageview-counter.properties b58dbe9a951a1ec1978a745efdd43e0f9fe87483
  src/main/config/wikipedia-application-local-runner.properties b770f1317dfd61b33122c71127ba8163135a26e5
  src/main/config/wikipedia-application.properties 841fcc5a3ca6b5e52c82e717fb7baa1e380d134f
  src/main/java/samza/examples/azure/AzureApplication.java 9f565fe47373c5bb054cfcc1ccc00e54111d786e
  src/main/java/samza/examples/azure/AzureZKLocalApplication.java 3d4f8b05cd1405070448c7598afa2712bb2db13d
  src/main/java/samza/examples/cookbook/PageViewAdClickJoiner.java f6c3810ad59532563994ff344952490369c87350
  src/main/java/samza/examples/cookbook/PageViewFilterApp.java a2accfdd8fee6033133e77e74fa8c6fd8de186c5
  src/main/java/samza/examples/cookbook/PageViewProfileTableJoiner.java 86deb614cc7bf9720df86e9d9f49bb4613ce1384
  src/main/java/samza/examples/cookbook/PageViewSessionizerApp.java 2bcd9f5b8f8e14e451451a12b627e189ad673288
  src/main/java/samza/examples/cookbook/StockPriceTableJoiner.java cb735d284f82b67f01f797c2ede2c130b80eb047
  src/main/java/samza/examples/cookbook/TumblingPageViewCounterApp.java acf1411af7d951081ad988c92822fde2b140608d
  src/main/java/samza/examples/wikipedia/application/WikipediaApplication.java 032608f4f1f2525c0d5598ea80b84ed4d40f6f7e
  src/main/java/samza/examples/wikipedia/application/WikipediaZkLocalApplication.java 51dd28f69b146962b2853d2426e153e13a6ab1e6
  src/main/java/samza/examples/wikipedia/model/WikipediaParser.java 93479626afa0fb9f8a230fa755e8f30fdbc57563
  src/main/java/samza/examples/wikipedia/system/WikipediaInputDescriptor.java PRE-CREATION
  src/main/java/samza/examples/wikipedia/system/WikipediaSystemDescriptor.java PRE-CREATION


Diff: https://reviews.apache.org/r/68867/diff/2/

Changes: https://reviews.apache.org/r/68867/diff/1-2/


Testing
---

I have only tested the high-level Wikipedia example so far and will keep testing
the rest in the next few days. I am committing the change now so that others are
unblocked to test.


Thanks,

Xinyu Liu



Review Request 68867: Update hello-samza with latest code

2018-09-27 Thread Xinyu Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68867/
---

Review request for samza and Prateek Maheshwari.


Repository: samza-hello-samza


Description
---

Update hello-samza with latest code


Diffs
-

  build.gradle 9d1f5433afc548ee19b00c0e0e2e73e7963d25ef
  src/main/config/pageview-adclick-joiner.properties eba7b0b6efc86869dbdf9402ae069550ed4c1723
  src/main/config/pageview-filter.properties 331ee1a1f1315c13aa955353795853eca792d669
  src/main/config/pageview-profile-table-joiner.properties 7cec6013e744042295568cef17928321abf95b35
  src/main/config/pageview-sessionizer.properties 420cdde0d2d82f22daafc49b7428040a7dcd1eef
  src/main/config/stock-price-table-joiner.properties f9bd684ed3601ba887b48c87c070f31c8b137f8e
  src/main/config/tumbling-pageview-counter.properties b58dbe9a951a1ec1978a745efdd43e0f9fe87483
  src/main/config/wikipedia-application-local-runner.properties b770f1317dfd61b33122c71127ba8163135a26e5
  src/main/config/wikipedia-application.properties 841fcc5a3ca6b5e52c82e717fb7baa1e380d134f
  src/main/java/samza/examples/azure/AzureApplication.java 9f565fe47373c5bb054cfcc1ccc00e54111d786e
  src/main/java/samza/examples/azure/AzureZKLocalApplication.java 3d4f8b05cd1405070448c7598afa2712bb2db13d
  src/main/java/samza/examples/cookbook/PageViewAdClickJoiner.java f6c3810ad59532563994ff344952490369c87350
  src/main/java/samza/examples/cookbook/PageViewFilterApp.java a2accfdd8fee6033133e77e74fa8c6fd8de186c5
  src/main/java/samza/examples/cookbook/PageViewProfileTableJoiner.java 86deb614cc7bf9720df86e9d9f49bb4613ce1384
  src/main/java/samza/examples/cookbook/PageViewSessionizerApp.java 2bcd9f5b8f8e14e451451a12b627e189ad673288
  src/main/java/samza/examples/cookbook/StockPriceTableJoiner.java cb735d284f82b67f01f797c2ede2c130b80eb047
  src/main/java/samza/examples/cookbook/TumblingPageViewCounterApp.java acf1411af7d951081ad988c92822fde2b140608d
  src/main/java/samza/examples/wikipedia/application/WikipediaApplication.java 032608f4f1f2525c0d5598ea80b84ed4d40f6f7e
  src/main/java/samza/examples/wikipedia/application/WikipediaZkLocalApplication.java 51dd28f69b146962b2853d2426e153e13a6ab1e6
  src/main/java/samza/examples/wikipedia/model/WikipediaParser.java 93479626afa0fb9f8a230fa755e8f30fdbc57563
  src/main/java/samza/examples/wikipedia/system/WikipediaInputDescriptor.java PRE-CREATION
  src/main/java/samza/examples/wikipedia/system/WikipediaSystemDescriptor.java PRE-CREATION


Diff: https://reviews.apache.org/r/68867/diff/1/


Testing
---

I have only tested the high-level Wikipedia example so far and will keep testing
the rest in the next few days. I am committing the change now so that others are
unblocked to test.


Thanks,

Xinyu Liu



Re: [VOTE] Committership for Wei and Srini

2018-07-04 Thread Xinyu Liu
Sorry, sent to the wrong mailing list. Please ignore.

Thanks,
Xinyu

On Wed, Jul 4, 2018 at 8:17 AM, Xinyu Liu  wrote:

> Hi, All,
>
> The feedback from the discussion thread seems very positive and we
> didn't receive any objections, so this is a vote for committership for Wei
> and Srini.
>
> Please vote:
> [] +1, Approve to become a new committer
> [] -1, Do not approve (please provide specific comments)
>
> From my side, it's +1 for both.
>
> Thanks,
> Xinyu
>


[VOTE] Committership for Wei and Srini

2018-07-04 Thread Xinyu Liu
Hi, All,

The feedback from the discussion thread seems very positive and we
didn't receive any objections, so this is a vote for committership for Wei
and Srini.

Please vote:
[] +1, Approve to become a new committer
[] -1, Do not approve (please provide specific comments)

From my side, it's +1 for both.

Thanks,
Xinyu


Re: [VOTE] Apache Samza 0.14.1 RC2

2018-05-23 Thread Xinyu Liu
The vote on 0.14.1 RC2 has been open for more than 72 hours and we got
+1 (binding) x 3 and +1 (non-binding) x 1.

Samza 0.14.1 officially passed the VOTE!

Thanks!
Xinyu


On Tue, May 22, 2018 at 11:31 PM, Weiqing Yang <yangweiqing...@gmail.com>
wrote:

> +1
>
> 1. Verified the hashes/signature;
> 2. Ran both ./bin/check-all.sh and ./bin/integration-tests.sh successfully;
> 3. Per http://samza.apache.org/learn/tutorials/latest/samza-sql.html, I
> ran
> some SQL queries locally, and it worked. (only verified in standalone mode)
>
>
> On Fri, May 18, 2018 at 7:08 PM, Jagadish Venkatraman <jagad...@apache.org
> >
> wrote:
>
> >- Verified the signature in the release tag.
> >- Ran check-all.sh successfully
> >- Ran integration-tests.sh successfully
> >- Executed all SQL queries successfully in the tutorial at
> >http://samza.apache.org/learn/tutorials/0.14/samza-sql.html
> ><http://samza.apache.org/learn/tutorials/0.14/samza-sql.html>.
> >
> > +1 (binding) from my side for RC2.
> >
> > Big thanks to Xinyu for driving the 0.14.1 release and to everyone for
> your
> > continued contributions!
> >
> > -- Jagadish
> >
> > On Fri, May 18, 2018 at 5:57 PM, Yi Pan <nickpa...@gmail.com> wrote:
> >
> > > Re-ran the RC w/ the following results:
> > > 1) ./bin/check-all.sh succeeded
> > > 2) ./bin/integration-tests.sh succeeded
> > > 3) expanded samza-tools and followed the tutorial steps for standalone
> > SQL
> > > examples in http://samza.apache.org/learn/
> tutorials/0.14/samza-sql.html.
> > > Succeeded.
> > > 4) verified all sha1 hash code and asc signatures successfully
> > >
> > > All passed and +1 (binding). Thanks for Xinyu to carry it on!
> > >
> > > -Yi
> > >
> > > On Fri, May 18, 2018 at 2:31 PM, Xinyu Liu <xinyuliu...@gmail.com>
> > wrote:
> > >
> > > > Hi, All,
> > > >
> > > > This is a call for a vote on a release of Apache Samza 0.14.1. Thanks
> > to
> > > > everyone who has contributed to this release.
> > > >
> > > > The release candidate can be downloaded from here:
> > > > http://home.apache.org/~xinyu/samza-0.14.1-rc2/
> > > >
> > > > The release candidate is signed with pgp key C31D7061, which can be
> > found
> > > > on keyservers: http://pgp.mit.edu/pks/lookup?
> op=get=0xC31D7061
> > > >
> > > > The git tag is release-0.14.1-rc2 and signed with the same pgp key:
> > > > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=
> > > > refs/tags/release-0.14.1-rc2
> > > >
> > > > Test binaries have been published to Maven's staging repository, and
> > are
> > > > available here:
> > > >
> > https://repository.apache.org/content/repositories/orgapachesamza-1049/
> > > >
> > > > 46 issues were resolved for this release: https://issues.
> > > > apache.org/jira/projects/SAMZA/versions/12343155
> > > >
> > > > The vote will be open for 72 hours (ending at 3:00PM Wednesday,
> > > > 05/23/2018).
> > > >
> > > > Please download the release candidate, check the hashes/signature,
> > build
> > > it
> > > > and test it, and then please vote:
> > > >
> > > > [ ] +1 approve
> > > >
> > > > [ ] +0 no opinion
> > > >
> > > > [ ] -1 disapprove (and reason why)
> > > >
> > > > For me, I ran check-all.sh, integration tests and verified the SQL
> > > console
> > > > in samza-tool tgz. So +1 (binding) from my side.
> > > >
> > > > Thanks,
> > > > Xinyu
> > > >
> > >
> >
>
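
The tally reported at the top of this thread (+1 binding x 3, +1 non-binding
x 1) can be checked mechanically. The following is a minimal sketch, assuming
the usual ASF release-vote rule as I understand it: at least three binding +1
votes and more binding +1 than binding -1 votes. The function name `tally` and
the vote representation are hypothetical and only for illustration.

```python
from collections import Counter


def tally(votes):
    """Summarize release votes.

    `votes` is an iterable of (value, binding) pairs, where value is +1, 0
    or -1 and binding is True for PMC (binding) votes.
    """
    counts = Counter()
    for value, binding in votes:
        counts[(value, binding)] += 1

    binding_plus = counts[(1, True)]
    binding_minus = counts[(-1, True)]
    # Assumed rule: three or more binding +1 votes, and more binding +1
    # votes than binding -1 votes.
    passed = binding_plus >= 3 and binding_plus > binding_minus
    return {
        "binding_plus": binding_plus,
        "non_binding_plus": counts[(1, False)],
        "binding_minus": binding_minus,
        "passed": passed,
    }


if __name__ == "__main__":
    # The 0.14.1 RC2 result: three binding +1 votes and one non-binding +1.
    rc2_votes = [(1, True), (1, True), (1, True), (1, False)]
    print(tally(rc2_votes))
```

Non-binding votes are counted and reported but do not affect the outcome in
this sketch.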


[VOTE] Apache Samza 0.14.1 RC2

2018-05-18 Thread Xinyu Liu
Hi, All,

This is a call for a vote on a release of Apache Samza 0.14.1. Thanks to
everyone who has contributed to this release.

The release candidate can be downloaded from here:
http://home.apache.org/~xinyu/samza-0.14.1-rc2/

The release candidate is signed with pgp key C31D7061, which can be found
on keyservers: http://pgp.mit.edu/pks/lookup?op=get=0xC31D7061

The git tag is release-0.14.1-rc2 and signed with the same pgp key:
https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-0.14.1-rc2

Test binaries have been published to Maven's staging repository, and are
available here:
https://repository.apache.org/content/repositories/orgapachesamza-1049/

46 issues were resolved for this release:
https://issues.apache.org/jira/projects/SAMZA/versions/12343155

The vote will be open for 72 hours (ending at 3:00PM Wednesday, 05/23/2018).

Please download the release candidate, check the hashes/signature, build it
and test it, and then please vote:

[ ] +1 approve

[ ] +0 no opinion

[ ] -1 disapprove (and reason why)

For me, I ran check-all.sh, integration tests and verified the SQL console
in samza-tool tgz. So +1 (binding) from my side.

Thanks,
Xinyu
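
For voters who want to script the "check the hashes" step, here is a minimal
sketch in Python. It assumes the published checksum file puts the hex digest
in its first whitespace-separated token, which may not match the layout of
every checksum file in the release directory. The function names are
hypothetical. Verifying the .asc signature needs a PGP tool and is not
covered here.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`, read in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_text, algorithm="sha256"):
    """Compare a file's digest with the contents of a checksum file.

    Assumption: the first whitespace-separated token of `published_text`
    is the expected hex digest (as in "<hex>  <filename>").
    """
    tokens = published_text.split()
    if not tokens:
        return False
    return file_digest(path, algorithm) == tokens[0].lower()
```

The same helper works for sha1 or md5 files by passing `algorithm="sha1"` or
`algorithm="md5"`.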


[CANCEL][VOTE] Apache Samza 0.14.1 RC1

2018-05-18 Thread Xinyu Liu
We found an issue when running the SQL console tool. I am cancelling this vote
and will submit another candidate after it's fixed.

Thanks,
Xinyu


Re: [VOTE] Apache Samza 0.14.1 RC1

2018-05-18 Thread Xinyu Liu
Cancelling the vote due to this issue. I will reach out to the Samza SQL folks
to get this resolved.

Thanks,
Xinyu

On Fri, May 18, 2018 at 12:33 AM, Yi Pan <nickpa...@gmail.com> wrote:

> -1.
>
> Verified the sha1 and asc signatures are correct.
> Ran check-all.sh and succeeded.
> Ran ./bin/integration-tests.sh succeeded.
>
> Followed instruction in http://samza.apache.org/learn/
> tutorials/0.14/samza-
> sql.html and saw the following error:
> 1) localhost samza-tools/scripts/generate-kafka-events.sh failed on MacOS:
> realpath command not found
> 2) after modifying the scripts to set the base_dir,
> samza-tools/scripts/samza-sql-console.sh  --sql "insert into
> log.consoleoutput select * from kafka.ProfileChangeStream" failed w/
> the following error:
> {noformat}
> Executing sql insert into log.consoleoutput select Name as __key__, Name,
> NewCompany, RegexMatch('.*soft', OldCompany) from kafka.ProfileChangeStream
> where NewCompany = 'LinkedIn' log4j:WARN Continuable parsing error 42 and
> column 23 log4j:WARN The content of element type "log4j:configuration" must
> match
> "(renderer*,throwableRenderer?,appender*,plugin*,(category|
> logger)*,root?,(categoryFactory|loggerFactory)?)".
> 2018-05-18 00:18:50 INFO VerifiableProperties:70 - Verifying properties
> 2018-05-18 00:18:50 INFO VerifiableProperties:70 - Property client.id is
> overridden to samza_admin-sql_job-1 2018-05-18 00:18:50 INFO
> VerifiableProperties:70 - Property group.id is overridden to
> undefined-samza-consumer-group-3edff597-2257-427b-aea4-5d11c664bba4
> 2018-05-18 00:18:50 INFO VerifiableProperties:70 - Property
> zookeeper.connect is overridden to localhost:2181 2018-05-18 00:18:50 INFO
> SamzaSqlApplicationConfig:148 - Instantiating SqlIOResolver using factory
> org.apache.samza.sql.impl.ConfigBasedIOResolverFactory with props
> {log.relSchemaProviderName=config,
> factory=org.apache.samza.sql.impl.ConfigBasedIOResolverFactory,
> kafka.samzaRelConverterName=avro, kafka.relSchemaProviderName=config,
> log.samzaRelConverterName=json} 2018-05-18 00:18:50 DEBUG parser:632 -
> Reduced `NewCompany` = 'LinkedIn' Exception in thread "main"
> org.apache.commons.lang.NotImplementedException: No sink support in
> ConfigBasedIOResolver. at
> org.apache.samza.sql.impl.ConfigBasedIOResolverFactory$
> ConfigBasedIOResolver.fetchSinkInfo(ConfigBasedIOResolverFactory.java:111)
> at
> org.apache.samza.sql.runner.SamzaSqlApplicationRunner.computeSamzaConfigs(
> SamzaSqlApplicationRunner.java:91)
> at
> org.apache.samza.sql.runner.SamzaSqlApplicationRunner.(
> SamzaSqlApplicationRunner.java:66)
> at
> org.apache.samza.tools.SamzaSqlConsole.executeSql(
> SamzaSqlConsole.java:110)
> at org.apache.samza.tools.SamzaSqlConsole.main(SamzaSqlConsole.java:104)
> {noformat}
>
> On Thu, May 17, 2018 at 12:00 PM, Xinyu Liu <xinyuliu...@gmail.com> wrote:
>
> > Extend the vote to Friday, May 18 5pm. Please take a chance to run the
> > tests and vote for it.
> >
> > Thanks,
> > Xinyu
> >
> > On Wed, May 16, 2018 at 10:44 AM, Xinyu Liu <xinyuliu...@gmail.com>
> wrote:
> >
> > > A reminder to download the build and vote for it. Since we are passing
> > the
> > > end date for the vote, let's extend to tomorrow 12pm.
> > >
> > > > For me, I ran the check-all.sh and integration tests on Linux RHEL7 and
> > > all tests passed. So +1 (binding) from my side.
> > >
> > > Thanks,
> > > Xinyu
> > >
> > > On Thu, May 10, 2018 at 2:44 PM, Xinyu Liu <xinyuliu...@gmail.com>
> > wrote:
> > >
> > >> Hi, All,
> > >>
> > >> This is a call for a vote on a release of Apache Samza 0.14.1. Thanks
> to
> > >> everyone who has contributed to this release.
> > >>
> > >> The release candidate can be downloaded from here:
> > >> http://home.apache.org/~xinyu/samza-0.14.1-rc1/
> > >>
> > >> The release candidate is signed with pgp key C31D7061, which can be
> > found
> > >> on keyservers: http://pgp.mit.edu/pks/lookup?op=get=0xC31D7061
> > >>
> > > >> The git tag is release-0.14.1-rc1 and signed with the same pgp key:
> > >> https://git-wip-us.apache.org/repos/asf?p=samza.git;a=
> > >> tag;h=refs/tags/release-0.14.1-rc1
> > >>
> > >> Test binaries have been published to Maven's staging repository, and
> are
> > >> available here: https://repository.apache.org/content/repositories/
> > >> orgapachesamza-1047/
> > >>
> > >> 46 issues were resolved for this release: https://issues.apache
> > >> .org/jira/projects/SAMZA/versions/12343155
> > >>
> > >> The vote will be open for 72 hours (ending at 3:00PM Tuesday,
> > 05/15/2018).
> > >>
> > >> Please download the release candidate, check the hashes/signature,
> build
> > >> it and test it, and then please vote:
> > >>
> > >> [ ] +1 approve
> > >>
> > >> [ ] +0 no opinion
> > >>
> > >> [ ] -1 disapprove (and reason why)
> > >>
> > >> Thanks,
> > >> Xinyu
> > >>
> > >>
> > >
> >
>
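
The first failure reported in this thread (the `realpath` command not being
found on MacOS) comes from locating the tool's base directory with a command
that is not installed on every platform. The following is a small sketch of
the same lookup done portably in Python. It assumes the script lives one
level below the base directory (for example `<base_dir>/scripts/`), which is
a guess based on the paths quoted above, and `resolve_base_dir` is a
hypothetical helper, not part of samza-tools.

```python
import os


def resolve_base_dir(script_path):
    """Return the directory one level above the script's real location.

    Symbolic links are followed first, so the result is the same whether
    the script is invoked directly or through a symlink.
    """
    real_script = os.path.realpath(script_path)
    scripts_dir = os.path.dirname(real_script)
    return os.path.dirname(scripts_dir)
```

Because `os.path.realpath` is part of the Python standard library, this does
not depend on which command-line utilities the host provides.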


Re: [VOTE] Apache Samza 0.14.1 RC1

2018-05-17 Thread Xinyu Liu
Extend the vote to Friday, May 18 5pm. Please take a chance to run the
tests and vote for it.

Thanks,
Xinyu

On Wed, May 16, 2018 at 10:44 AM, Xinyu Liu <xinyuliu...@gmail.com> wrote:

> A reminder to download the build and vote for it. Since we are passing the
> end date for the vote, let's extend to tomorrow 12pm.
>
> For me, I ran the check-all.sh and integration tests on Linux RHEL7 and
> all tests passed. So +1 (binding) from my side.
>
> Thanks,
> Xinyu
>
> On Thu, May 10, 2018 at 2:44 PM, Xinyu Liu <xinyuliu...@gmail.com> wrote:
>
>> Hi, All,
>>
>> This is a call for a vote on a release of Apache Samza 0.14.1. Thanks to
>> everyone who has contributed to this release.
>>
>> The release candidate can be downloaded from here:
>> http://home.apache.org/~xinyu/samza-0.14.1-rc1/
>>
>> The release candidate is signed with pgp key C31D7061, which can be found
>> on keyservers: http://pgp.mit.edu/pks/lookup?op=get=0xC31D7061
>>
>> The git tag is release-0.14.1-rc1 and signed with the same pgp key:
>> https://git-wip-us.apache.org/repos/asf?p=samza.git;a=
>> tag;h=refs/tags/release-0.14.1-rc1
>>
>> Test binaries have been published to Maven's staging repository, and are
>> available here: https://repository.apache.org/content/repositories/
>> orgapachesamza-1047/
>>
>> 46 issues were resolved for this release: https://issues.apache
>> .org/jira/projects/SAMZA/versions/12343155
>>
>> The vote will be open for 72 hours (ending at 3:00PM Tuesday, 05/15/2018).
>>
>> Please download the release candidate, check the hashes/signature, build
>> it and test it, and then please vote:
>>
>> [ ] +1 approve
>>
>> [ ] +0 no opinion
>>
>> [ ] -1 disapprove (and reason why)
>>
>> Thanks,
>> Xinyu
>>
>>
>


Re: [VOTE] Apache Samza 0.14.1 RC1

2018-05-16 Thread Xinyu Liu
A reminder to download the build and vote for it. Since we are passing the
end date for the vote, let's extend to tomorrow 12pm.

For me, I ran the check-all.sh and integration tests on Linux RHEL7 and all
tests passed. So +1 (binding) from my side.

Thanks,
Xinyu

On Thu, May 10, 2018 at 2:44 PM, Xinyu Liu <xinyuliu...@gmail.com> wrote:

> Hi, All,
>
> This is a call for a vote on a release of Apache Samza 0.14.1. Thanks to
> everyone who has contributed to this release.
>
> The release candidate can be downloaded from here:
> http://home.apache.org/~xinyu/samza-0.14.1-rc1/
>
> The release candidate is signed with pgp key C31D7061, which can be found
> on keyservers: http://pgp.mit.edu/pks/lookup?op=get=0xC31D7061
>
> The git tag is release-0.14.1-rc1 and signed with the same pgp key:
> https://git-wip-us.apache.org/repos/asf?p=samza.
> git;a=tag;h=refs/tags/release-0.14.1-rc1
>
> Test binaries have been published to Maven's staging repository, and are
> available here: https://repository.apache.org/content/
> repositories/orgapachesamza-1047/
>
> 46 issues were resolved for this release: https://issues.
> apache.org/jira/projects/SAMZA/versions/12343155
>
> The vote will be open for 72 hours (ending at 3:00PM Tuesday, 05/15/2018).
>
> Please download the release candidate, check the hashes/signature, build
> it and test it, and then please vote:
>
> [ ] +1 approve
>
> [ ] +0 no opinion
>
> [ ] -1 disapprove (and reason why)
>
> Thanks,
> Xinyu
>
>


[VOTE] Apache Samza 0.14.1 RC1

2018-05-10 Thread Xinyu Liu
Hi, All,

This is a call for a vote on a release of Apache Samza 0.14.1. Thanks to
everyone who has contributed to this release.

The release candidate can be downloaded from here:
http://home.apache.org/~xinyu/samza-0.14.1-rc1/

The release candidate is signed with pgp key C31D7061, which can be found
on keyservers: http://pgp.mit.edu/pks/lookup?op=get=0xC31D7061

The git tag is release-0.14.1-rc1 and signed with the same pgp key:
https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-0.14.1-rc1

Test binaries have been published to Maven's staging repository, and are
available here:
https://repository.apache.org/content/repositories/orgapachesamza-1047/

46 issues were resolved for this release:
https://issues.apache.org/jira/projects/SAMZA/versions/12343155

The vote will be open for 72 hours (ending at 3:00PM Tuesday, 05/15/2018).

Please download the release candidate, check the hashes/signature, build it
and test it, and then please vote:

[ ] +1 approve

[ ] +0 no opinion

[ ] -1 disapprove (and reason why)

Thanks,
Xinyu


[CANCEL][VOTE] Apache Samza 0.14.1 RC0

2018-05-10 Thread Xinyu Liu
We need a performance fix for the eventhub system producer (SAMZA-1706) as
well as a fix for the unit test in TestRocksDbKeyValueStoreJava. I am
cancelling this vote and will submit another RC soon.

Thanks,
Xinyu


Re: [VOTE] Apache Samza 0.14.1 RC0

2018-05-10 Thread Xinyu Liu
We are pulling in a fix in SAMZA-1706: lazy initialization for eventhub
system producer, and also a fix for test failure in
TestRocksDbKeyValueStoreJava#testIterate in this release. RC0 is cancelled.
Will send out the cancellation email shortly.

Thanks,
Xinyu

On Tue, May 8, 2018 at 6:48 PM, Xinyu Liu <xinyuliu...@gmail.com> wrote:

> Hi, All,
>
> This is a call for a vote on a release of Apache Samza 0.14.1. Thanks to
> everyone who has contributed to this release.
>
> The release candidate can be downloaded from here:
> http://home.apache.org/~xinyu/samza-0.14.1-rc0/
>
> The release candidate is signed with pgp key C31D7061, which can be found
> on keyservers: http://pgp.mit.edu/pks/lookup?op=get=0xC31D7061
>
> The git tag is release-0.14.1-rc0 and signed with the same pgp key:
> https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=
> refs/tags/release-0.14.1-rc0
>
> Test binaries have been published to Maven's staging repository, and are
> available here: https://repository.apache.org/content/repositories/
> orgapachesamza-1046/
>
> 46 issues were resolved for this release: https://issues.apache.org/
> jira/projects/SAMZA/versions/12343155
>
> The vote will be open for 72 hours (ending at 7:00PM Friday, 05/11/2018).
>
> Please download the release candidate, check the hashes/signature, build
> it and test it, and then please vote:
>
> [ ] +1 approve
>
> [ ] +0 no opinion
>
> [ ] -1 disapprove (and reason why)
>
> Thanks,
> Xinyu
>


[Discuss] Samza 0.14.1 release

2018-04-30 Thread Xinyu Liu
Hi, All,

We have been adding many improvements and critical bug fixes in the areas
of Samza sql, standalone, eventhub system consumer and host-affinity since
0.14.0 release. The changes should warrant a new minor release:

SQL
- SAMZA-1681: Samza-sql: Add support for handling older record schema
versions in AvroRelConverter
- SAMZA-1671: SamzaSQL: add insert into table support
- SAMZA-1651: Samza-sql: Implement GROUP BY SQL operator

Standalone
- SAMZA-1689: Add validations before state transitions in
ZkBarrierForVersionUpgrade.
- SAMZA-1686: Set finite operation timeout when creating zkClient.
- SAMZA-1667: Skip storing configuration as a part of JobModel in zookeeper
data nodes.
- SAMZA-1647: Fix NPE in JobModelExpired event handler.

Eventhub
- SAMZA-1688: use per partition eventhubs client
- SAMZA-1676: miscellaneous fix and improvement for eventhubs system
- SAMZA-1656: EventHubSystemAdmin does not fetch metadata for valid streams.

host-affinity
- SAMZA-1687: Prioritize preferred host requests over ANY-HOST requests
- SAMZA-1649: Improve host-aware allocation to account for strict locality

The complete list of changes is here:
https://issues.apache.org/jira/browse/SAMZA-1624?jql=project%20%3D%2012314526%20AND%20fixVersion%20%3D%200.14.1%20%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC.
Most JIRAs in the list have been completed and merged, and we will try to
get the remaining ones completed before the 0.14.1 release.

Here's what I propose:
- Cut the 0.14.1 release branch
- Target a release sometime next week (5/7 - 5/11).
Thoughts?

Xinyu


Re: [VOTE] SEP-11: Host affinity in standalone.

2018-04-05 Thread xinyu liu
+1 (binding). Look forward to the implementation.

Xinyu

On Wed, Apr 4, 2018 at 2:43 PM, Yi Pan  wrote:

> +1 (binding). Thanks for the revisions!
>
> -Yi
>
> On Wed, Apr 4, 2018 at 2:39 PM, santhosh venkat <
> santhoshvenkat1...@gmail.com> wrote:
>
> > Hi,
> >
> > This is a voting thread for SEP-11: Host affinity in standalone.
> >
> > For reference, here is the wiki link: https://cwiki.apache.org
> > /confluence/pages/viewpage.action?pageId=75957309
> >
> > Thanks.
> >
>


Samza 0.14.0 officially released!

2018-01-05 Thread xinyu liu
Hi, all,

I am pleased to let you know that we have officially released Samza 0.14.0!

Huge thanks to everyone for working on the features and bugs in this
release, and cooperating in the release process.

Here is the announcement blog: https://blogs.apache.org/samza/

Cheers!
Xinyu


Re: [VOTE] Apache Samza 0.14.0 RC5

2017-12-28 Thread xinyu liu
+1 on my side.

Verified by running check-all.sh and integration tests. They both passed.

Thanks,
Xinyu

On Thu, Dec 28, 2017 at 5:06 AM, Jagadish Venkatraman <
jagadish1...@gmail.com> wrote:

> +1 (binding)
>
> Verified the RC. Ran *check-all.sh* and integration tests successfully on
> OS X. Thanks Xinyu, and everyone for driving Samza-0.14!
>
> On Thu, Dec 28, 2017 at 3:22 AM, Yi Pan <nickpa...@gmail.com> wrote:
>
> > +1 (binding).
> >
> > Verified the signature and MD5
> > Ran ./bin/check-all.sh on OSX
> > Ran integration tests on OSX
> > Verified ./gradlew releaseToolsTarGz generated samza-tools-0.14.0.tgz in
> > build directory
> >
> > Thanks for all!
> >
> > -Yi
> >
> > On Fri, Dec 22, 2017 at 6:10 PM, Boris S <bor...@gmail.com> wrote:
> >
> > > Verified the signature.
> > > Ran build, tests and integration tests on Unix.
> > > All passed (as before requires python 2.7, neither higher nor lower).
> > >
> > > +1
> > > Thanks guys !!
> > >
> > > On Fri, Dec 22, 2017 at 2:50 PM, xinyu liu <xinyuliu...@gmail.com>
> > wrote:
> > >
> > > > This is a call for a vote on a release of Apache Samza 0.14.0. Thanks
> > > > to everyone
> > > > who has contributed to this release.
> > > >
> > > > The release candidate can be downloaded from here:
> > > > http://home.apache.org/~xinyu/samza-0.14.0-rc5/
> > > >
> > > > The release candidate is signed with pgp key C31D7061, which can be
> > found
> > > > on
> > > > keyservers:
> > > > http://pgp.mit.edu/pks/lookup?op=get=0x35964389C31D7061
> > > >
> > > > > The git tag is release-0.14.0-rc5 and signed with the same pgp key:
> > > > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=
> > > > refs/tags/release-0.14.0-rc5
> > > >
> > > > Test binaries have been published to Maven's staging repository, and
> > > > are available
> > > > here:
> > > > https://repository.apache.org/content/repositories/
> orgapachesamza-1042
> > > >
> > > > 61 issues have been resolved as part of this release
> > > > https://issues.apache.org/jira/browse/SAMZA-1519?jql=project
> > > > %20%3D%20SAMZA%20AND%20fixVersion%20%3D%200.14.0%20AND%
> > > > 20status%20%3D%20Resolved
> > > >
> > > > > The vote will be open for 72 hours (ending at 3:00 PM Thursday,
> > > > 12/28/2017).
> > > >
> > > > Please download the release candidate, check the hashes/signature,
> > build
> > > it
> > > > and test it, and then please vote:
> > > >
> > > > [ ] +1 approve
> > > >
> > > > [ ] +0 no opinion
> > > >
> > > > [ ] -1 disapprove (and reason why)
> > > >
> > > > Thanks,
> > > > Xinyu
> > > >
> > >
> >
>
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>


[VOTE] Apache Samza 0.14.0 RC5

2017-12-22 Thread xinyu liu
This is a call for a vote on a release of Apache Samza 0.14.0. Thanks to
everyone who has contributed to this release.

The release candidate can be downloaded from here:
http://home.apache.org/~xinyu/samza-0.14.0-rc5/

The release candidate is signed with pgp key C31D7061, which can be found on
keyservers:
http://pgp.mit.edu/pks/lookup?op=get=0x35964389C31D7061

The git tag is release-0.14.0-rc5 and signed with the same pgp key:
https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-0.14.0-rc5

Test binaries have been published to Maven's staging repository, and are
available here:
https://repository.apache.org/content/repositories/orgapachesamza-1042

61 issues have been resolved as part of this release
https://issues.apache.org/jira/browse/SAMZA-1519?jql=project%20%3D%20SAMZA%20AND%20fixVersion%20%3D%200.14.0%20AND%20status%20%3D%20Resolved

The vote will be open for 72 hours (ending at 3:00 PM Thursday,
12/28/2017).

Please download the release candidate, check the hashes/signature, build it
and test it, and then please vote:

[ ] +1 approve

[ ] +0 no opinion

[ ] -1 disapprove (and reason why)

Thanks,
Xinyu


[CANCEL][VOTE] Apache Samza 0.14.0 RC4

2017-12-22 Thread xinyu liu
Hi,

This is an official CANCEL for the RC4 vote as we found an issue in the
KafkaCheckpointManager. We will create a new RC with the fix.

Thanks,
Xinyu


[VOTE] Apache Samza 0.14.0 RC4

2017-12-22 Thread xinyu liu
This is a call for a vote on a release of Apache Samza 0.14.0. Thanks to
everyone who has contributed to this release.

The release candidate can be downloaded from here:
http://home.apache.org/~xinyu/samza-0.14.0-rc4/

The release candidate is signed with pgp key C31D7061, which can be found on
keyservers:
http://pgp.mit.edu/pks/lookup?op=get=0x35964389C31D7061

The git tag is release-0.14.0-rc4 and signed with the same pgp key:
https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-0.14.0-rc4

Test binaries have been published to Maven's staging repository, and are
available here:
https://repository.apache.org/content/repositories/orgapachesamza-1041

61 issues have been resolved as part of this release
https://issues.apache.org/jira/browse/SAMZA-1519?jql=project%20%3D%20SAMZA%20AND%20fixVersion%20%3D%200.14.0%20AND%20status%20%3D%20Resolved

The vote will be open for 72 hours (ending at 12:00 PM Thursday,
12/28/2017).

Please download the release candidate, check the hashes/signature, build it
and test it, and then please vote:

[ ] +1 approve

[ ] +0 no opinion

[ ] -1 disapprove (and reason why)

Thanks,
Xinyu


[CANCEL][VOTE] Apache Samza 0.14.0 RC3

2017-12-22 Thread xinyu liu
Hi,

This is an official CANCEL for the RC3 vote as we found the documentation
for Samza SQL (SAMZA-1526) was not merged to the 0.14.0 release branch. We
will create a new RC with it.

Thanks,
Xinyu


[RESULT][VOTE] Apache Samza 0.14.0 RC3

2017-12-21 Thread xinyu liu
The vote on 0.14.0 RC3 has been open for more than 72 hours and we got
+1 (binding) x 3 and +1 (non-binding) x 3.

Samza 0.14.0 officially passed the VOTE!

Thanks!
Xinyu


Re: [VOTE] Apache Samza 0.14.0 RC3

2017-12-21 Thread xinyu liu
+1

Ran check-all.sh successfully on both OSX and Linux.

Thanks,
Xinyu

On Thu, Dec 21, 2017 at 10:51 AM, Daniel Nishimura <dnishim...@linkedin.com>
wrote:

> +1
> Verified signatures and built on Ubuntu 14.04 and macOS 10.12 with JDK
> 1.8.0_121
> Note: The build currently works with Gradle 2.8. The following error
> happens on higher versions of Gradle: “bad option: '-feature
> -language:implicitConversions -language:reflectiveCalls'”
> There’s a PR to address this in a future release:
> https://github.com/apache/samza/pull/326
>
> On 12/21/17, 8:10 AM, "Jagadish Venkatraman" <jagadish1...@gmail.com>
> wrote:
>
> LGTM.
>
> I verified signatures and ran all tests on OsX. Additionally,
> *check-all.sh*
>  succeeded.
>
> +1 (binding)
>
>
>
>
> On Thu, Dec 21, 2017 at 1:07 AM, Yi Pan <nickpa...@gmail.com> wrote:
>
> > +1 binding
> >
> > Verified git tag and source signatures
> >
> > Ran check-all.sh on OSX
> > Ran integration tests from OSX
> >
> > Thanks for pushing it forward!
> >
> > -Yi
> >
> > On Wed, Dec 20, 2017 at 3:19 PM, Jake Maes <jma...@apache.org>
> wrote:
> >
> > > +1 binding
> > >
> > > Verified git tag `release-0.14.0-rc3`
> > > Verified source signature
> > >
> > > Ran check-all.sh on both Linux and OSX
> > > Ran integration tests from OSX
> > >
> > > LGTM
> > >
> > >
> > > On Tue, Dec 19, 2017 at 10:00 AM, Boris S <bor...@gmail.com>
> wrote:
> > >
> > > > Verified signature.
> > > > Ran unit and integration tests.
> > > > As usual had to force python to 2.7 for integration tests to run
> on
> > > Linux.
> > > >
> > > > +1
> > > >
> > > > On Mon, Dec 18, 2017 at 11:01 AM, xinyu liu <
> xinyuliu...@gmail.com>
> > > wrote:
> > > >
> > > > > Correction: This is a call for a vote on a release of Apache
> Samza
> > > > > *0.14.0*.
> > > > >
> > > > > On Mon, Dec 18, 2017 at 10:57 AM, xinyu liu <
> xinyuliu...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > This is a call for a vote on a release of Apache Samza
> 0.13.1.
> > Thanks
> > > > to
> > > > > everyone
> > > > > > who has contributed to this release.
> > > > > >
> > > > > > The release candidate can be downloaded from here:
> > > > > > http://home.apache.org/~xinyu/samza-0.14.0-rc3/
> > > > > >
> > > > > > The release candidate is signed with pgp key C31D7061, which
> can be
> > > > > found on
> > > > > > keyservers:
> > > > > > http://pgp.mit.edu/pks/lookup?op=get=
> 0x35964389C31D7061
> > > > > >
> > > > > > > The git tag is release-0.14.0-rc3 and signed with the same pgp key:
> > > > > > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=
> > > > > > refs/tags/release-0.14.0-rc3
> > > > > >
> > > > > > Test binaries have been published to Maven's staging
> repository,
> > and
> > > > are
> > > > > available
> > > > > > here:
> > > > > > https://repository.apache.org/content/repositories/
> > > orgapachesamza-1036
> > > > > >
> > > > > > 61 issues have been resolved as part of this release
> > > > > > https://issues.apache.org/jira/browse/SAMZA-1519?jql=
> > > > > > project%20%3D%20SAMZA%20AND%20fixVersion%20%3D%200.14.0%
> > > > > > 20AND%20status%20%3D%20Resolved
> > > > > >
> > > > > > The vote will be open for 72 hours (ending at 11:00 AM
> Thursday,
> > > > > > 12/21/2017).
> > > > > >
> > > > > > Please download the release candidate, check the
> hashes/signature,
> > > > build
> > > > > > it and test it, and then please vote:
> > > > > >
> > > > > > [ ] +1 approve
> > > > > >
> > > > > > [ ] +0 no opinion
> > > > > >
> > > > > > [ ] -1 disapprove (and reason why)
> > > > > >
> > > > > > Thanks,
> > > > > > Xinyu
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>
>
>


Re: [VOTE] Apache Samza 0.14.0 RC3

2017-12-18 Thread xinyu liu
Correction: This is a call for a vote on a release of Apache Samza *0.14.0*.

On Mon, Dec 18, 2017 at 10:57 AM, xinyu liu <xinyuliu...@gmail.com> wrote:

> This is a call for a vote on a release of Apache Samza 0.13.1. Thanks to 
> everyone
> who has contributed to this release.
>
> The release candidate can be downloaded from here:
> http://home.apache.org/~xinyu/samza-0.14.0-rc3/
>
> The release candidate is signed with pgp key C31D7061, which can be found on
> keyservers:
> http://pgp.mit.edu/pks/lookup?op=get=0x35964389C31D7061
>
> The git tag is release-0.14.0-rc3 and signed with the same pgp key:
> https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=
> refs/tags/release-0.14.0-rc3
>
> Test binaries have been published to Maven's staging repository, and are 
> available
> here:
> https://repository.apache.org/content/repositories/orgapachesamza-1036
>
> 61 issues have been resolved as part of this release
> https://issues.apache.org/jira/browse/SAMZA-1519?jql=
> project%20%3D%20SAMZA%20AND%20fixVersion%20%3D%200.14.0%
> 20AND%20status%20%3D%20Resolved
>
> The vote will be open for 72 hours (ending at 11:00 AM Thursday,
> 12/21/2017).
>
> Please download the release candidate, check the hashes/signature, build
> it and test it, and then please vote:
>
> [ ] +1 approve
>
> [ ] +0 no opinion
>
> [ ] -1 disapprove (and reason why)
>
> Thanks,
> Xinyu
>


[VOTE] Apache Samza 0.14.0 RC3

2017-12-18 Thread xinyu liu
This is a call for a vote on a release of Apache Samza 0.13.1. Thanks to
everyone who has contributed to this release.

The release candidate can be downloaded from here:
http://home.apache.org/~xinyu/samza-0.14.0-rc3/

The release candidate is signed with pgp key C31D7061, which can be found on
keyservers:
http://pgp.mit.edu/pks/lookup?op=get=0x35964389C31D7061

The git tag is release-0.14.0-rc3 and signed with the same pgp key:
https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-0.14.0-rc3

Test binaries have been published to Maven's staging repository, and are
available here:
https://repository.apache.org/content/repositories/orgapachesamza-1036

61 issues have been resolved as part of this release
https://issues.apache.org/jira/browse/SAMZA-1519?jql=project%20%3D%20SAMZA%20AND%20fixVersion%20%3D%200.14.0%20AND%20status%20%3D%20Resolved

The vote will be open for 72 hours (ending at 11:00 AM Thursday,
12/21/2017).

Please download the release candidate, check the hashes/signature, build it
and test it, and then please vote:

[ ] +1 approve

[ ] +0 no opinion

[ ] -1 disapprove (and reason why)

Thanks,
Xinyu


[CANCEL][VOTE] Apache Samza 0.14.0 RC2

2017-12-15 Thread xinyu liu
Hi,

This is an official CANCEL for the RC2 vote as we have another javadoc
update in the release.

Thanks,
Xinyu


Re: [DISCUSS] Samza 0.14.0 release

2017-11-27 Thread xinyu liu
+1.

Very happy to see a lot of important features added in this release.

Thanks,
Xinyu

On Mon, Nov 27, 2017 at 10:00 AM, Jagadish Venkatraman 
wrote:

> +1 from my side.
>
> Thank you Bharath for driving the release!
>
>
>
> On Mon, Nov 27, 2017 at 9:50 AM, Bharath Kumara Subramanian <
> codin.mart...@gmail.com> wrote:
>
> > Hi all,
> >
> >
> >
> > We have added a couple of major features to master since 0.13.1 that
> > warrant a major release.
> >
> > Within LinkedIn, some of these features have already been tested as part
> > of our test suites. We plan to continue our testing in the coming weeks to
> > validate the stability prior to release.
> >
> > We wanted to kick off the discussion in open source forum to keep the
> > momentum flowing.
> >
> >
> >
> > Here is the list of features that are part of the new release
> >
> >    - SAMZA-1510 - Samza SQL
> >    - SAMZA-1417 - Add support for multistage batch to Samza on Hadoop
> >    - SAMZA-1438 - Event-hub connectors for Samza
> >
> >
> >
> > We have also worked on stabilizing our 0.13 features. Here are some
> > highlights
> >
> >    - SAMZA-1454, SAMZA-1493 - Add support for durable state for high
> >      level API
> >    - SAMZA-1417, SAMZA-1330, SAMZA-1289 - Stabilization of ZooKeeper
> >      based deployment model
> >    - SAMZA-1471, SAMZA-1392, SAMZA-1465 - Performance improvements
> >
> >
> >
> > You can find the concrete list of the features here
> >  > project%20%3D%20samza%20AND%20fixVersion%20%3D%200.14.0%
> > 20AND%20resolution%20%3D%20fixed>
> > .
> >
> >
> >
> > Here is my proposal on our release schedule and timelines.
> >
> >    1. Create a release candidate with the current 0.14.0 HEAD
> >    2. Target a release vote in the week of Dec 4th
> >
> >
> >
> > Thoughts?
> >
> >
> >
> > Thanks,
> >
> > Bharath
> >
>


Re: Samza questions (downtime during deployment and num partition per task)

2017-10-31 Thread xinyu liu
Hi, Tony,

For your questions:

1) Having a hot-standby job instance for fail-over may introduce certain
operational complications. For example, if both instances produce to the same
output, both will be running for a short period of time, which might lead to
duplicates in the output. If the job has local state, it will be more
complicated to continue using the previous state from the active job. So
if your job doesn't have state and can handle duplicates, this might work.
Another option I can think of is to use the zk-based Samza deployment. This
is a new feature we added to Samza. The description is at
http://samza.apache.org/startup/preview/#flexible-deployment-model, and
there is a hello-samza example for it:
http://samza.apache.org/learn/tutorials/latest/hello-samza-high-level-zk.html.
For zk-based deployment, we support rolling bounce so you can upgrade your
containers one at a time.

2) Yes, you can implement your own SystemStreamPartitionGrouper if you want
a static assignment (from config). For example, please take a look at
AllSspToSingleTaskGrouper which groups all system stream partitions into a
single task.
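
To illustrate what such a grouper computes, here is a minimal,
language-neutral sketch in Python. It is not the Samza Java interface: the
function name, the shape of the static assignment, and the task names are all
made up for the example.

```python
from collections import defaultdict


def group_partitions(partitions, static_assignment, default_task="Task-0"):
    """Group (stream, partition) pairs into tasks from a static mapping.

    Hypothetical helper: partitions missing from the mapping fall back to
    default_task, so an empty mapping puts everything into a single task.
    """
    groups = defaultdict(set)
    for ssp in partitions:
        groups[static_assignment.get(ssp, default_task)].add(ssp)
    return dict(groups)


# Four partitions of a hypothetical "clicks" stream.
partitions = [("clicks", p) for p in range(4)]

# A static assignment, as it might be read from job config.
assignment = {
    ("clicks", 0): "Task-A",
    ("clicks", 1): "Task-A",
    ("clicks", 2): "Task-B",
    ("clicks", 3): "Task-B",
}

two_tasks = group_partitions(partitions, assignment)
single_task = group_partitions(partitions, {})
```

With the static assignment, two tasks each own two partitions; with an empty
assignment, all partitions land in one task, which is the behaviour described
for AllSspToSingleTaskGrouper.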

Thanks,
Xinyu

On Mon, Oct 30, 2017 at 2:02 PM, Tony Du  wrote:

> Hi, we're looking into Samza for doing real-time processing. We have a couple
> of questions about Samza functionality.
>
> 1. One "must-have" requirement for us is zero/minimal downtime during
> deployment of Samza jobs. One approach that we're thinking of is to start a
> new instance of the same Samza job and make sure it's running before taking
> down the running one. Is there any problem with this approach? If so, is
> there any suggestion for this problem? I'm unable to find anything
> related to this in the documentation.
>
> 2. Is it possible to assign multiple Kafka topic partitions to a single
> task instance within a job? Since each task has its own JVM and some of the
> jobs that we have use a lot of memory, it would be very wasteful to run many
> instances of them.
>
> Thanks much for your help!
>
> 
> --
> Tony T Du
> Sr. Software Engineer - Dataminr
>
> I hear I forget. I see I remember. I do I understand.
> 
> --
>


Review Request 63267: Add instructions to README.md

2017-10-24 Thread Xinyu Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63267/
---

Review request for samza and Prateek Maheshwari.


Repository: samza-hello-samza


Description
---

Add instructions to README.md


Diffs
-

  README.md 0f80e9e52240abc071dcdfb56800826ef5d49d7d 
  build.gradle ec451d576a157eba2d8889b63b8574f397937197 


Diff: https://reviews.apache.org/r/63267/diff/1/


Testing
---


Thanks,

Xinyu Liu



Re: [VOTE] SEP-8: Add in-memory system consumer & producer

2017-09-06 Thread xinyu liu
+1 on the overall design. This will make testing a lot easier!

Thanks,
Xinyu

On Wed, Sep 6, 2017 at 10:45 AM, Bharath Kumara Subramanian <
codin.mart...@gmail.com> wrote:

> Hi all,
>
> Can you please vote for SEP-8?
> You can find the design document here
>  >.
>
> Thanks,
> Bharath
>


Re: [Discuss] Samza 0.13.1 release

2017-08-14 Thread xinyu liu
+1. Thanks for pushing the release.

Xinyu

On Mon, Aug 14, 2017 at 10:58 AM, Boris S  wrote:

> +1 for the release.
>
> On Mon, Aug 14, 2017 at 10:38 AM, Yi Pan  wrote:
>
> > +1 for the list! Let's proceed!
> >
> > On Fri, Aug 11, 2017 at 6:13 PM, Ignacio Solis  wrote:
> >
> > > +1
> > >
> > > On Fri, Aug 11, 2017 at 3:52 PM, Jacob Maes 
> > wrote:
> > > > Looks good!
> > > >
> > > > +1
> > > >
> > > > On Thu, Aug 10, 2017 at 6:53 PM, Jagadish Venkatraman <
> > > > jagadish1...@gmail.com> wrote:
> > > >
> > > >> +1 for the release. thanks for the summary and for driving this
> Fred!
> > > >>
> > > >> On Thu, Aug 10, 2017 at 5:15 PM Fred Haifeng Ji <
> haifeng...@gmail.com
> > >
> > > >> wrote:
> > > >>
> > > >> > The format was messed up when sent from my yahoo mail to
> > > >> > dev@samza.apache.org. I am resending it from my gmail account.
> > Sorry
> > > for
> > > >> > inconvenience!
> > > >> >
> > > >> > Hi all,
> > > >> >
> > > >> > There have been some new features and critical bug fixes added to
> > > master
> > > >> > since 0.13.0 release, which makes Samza Standalone features more
> > > stable.
> > > >> It
> > > >> > is now good enough to warrant *a new minor release*. We will
> > continue
> > > to
> > > >> > test for stability and performance in the next few weeks.
> > > >> >
> > > >> > Here are the main JIRA tickets that will be included in this
> release
> > > (but
> > > >> > not limited to):
> > > >> > SAMZA-1165: Cleanup data created by ZkStandalone in ZK;
> > > >> > SAMZA-1324: Add a metricsreporter lifecycle for JobCoordinator
> > > component
> > > >> of
> > > >> > StreamProcessor;
> > > >> > SAMZA-1336: Standalone session expiration propagation;
> > > >> > SAMZA-1337: LocalApplicationRunner needs to support StreamTask;
> > > >> > SAMZA-1339: Add standalone integration tests;
> > > >> > …
> > > >> >
> > > >> > There are also quite a few bug fixes in 0.13.1, *please check the
> > > >> complete
> > > >> > list of changes in 0.13.1 here
> > > >> > <
> > > >> > https://issues.apache.org/jira/browse/SAMZA-1165?jql=
> > > >> project%20%3D%2012314526%20AND%20fixVersion%20%3D%
> > > 2012340845%20ORDER%20BY%
> > > >> 20priority%20DESC%2C%20key%20ASC
> > > >> > >*
> > > >> > .
> > > >> >
> > > >> > Most JIRAs in the list have been completed and merged, with the
> > > following
> > > >> > one remaining, but we should try to get it completed before 0.13.1
> > is
> > > >> > released.
> > > >> > SAMZA-1385: Coordination utils in LocalApplicationRunner uses same
> > Zk
> > > >> node
> > > >> > as ZkJobCoordinatorFactory for leader election
> > > >> >
> > > >> > Here's what I propose:
> > > >> > 1. Cut an 0.13.1 release branch.
> > > >> > 2. Work on getting the remaining open JIRA done.
> > > >> > 3. Target a release vote by Aug 18.
> > > >> >
> > > >> > Thoughts?
> > > >> >
> > > >> > Fred
> > > >> >
> > > >> --
> > > >> Sent from my iphone.
> > > >>
> > >
> > >
> > >
> > > --
> > > Nacho - Ignacio Solis - iso...@igso.net
> > >
> >
>


Re: [VOTE] SEP-5: Enable partition expansion of input streams

2017-06-20 Thread xinyu liu
+1 (non-binding) on this design.

To me the task-count based groupers should work well in practice for
partition expansion of systems that use hashing for partitioning, e.g. Kafka.
It will not cause any state transfer between hosts, so the runtime cost will
be minimal. In the future, when we support dynamically re-balancing the tasks,
we can further scale the task count if needed.
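
A small sketch of why this works, under two assumptions that are mine and not
taken from the SEP text: the producer picks a partition as hash(key) modulo
the partition count, and the grouper assigns partition p to task p modulo a
fixed task count. The function names are hypothetical.

```python
def partition_for(key_hash, partition_count):
    # Assumed producer-side partitioner: hash modulo partition count.
    return key_hash % partition_count


def task_for(partition, task_count):
    # Assumed grouper rule: the task count stays fixed across expansions.
    return partition % task_count


def same_task_after_expansion(old_count, new_count, samples=1000):
    """True if every sampled key hash stays in the same task after expansion."""
    task_count = old_count  # one task per original partition
    return all(
        task_for(partition_for(h, old_count), task_count)
        == task_for(partition_for(h, new_count), task_count)
        for h in range(samples)
    )


doubling_is_safe = same_task_after_expansion(4, 8)
four_to_six_is_safe = same_task_after_expansion(4, 6)
```

Going from 4 to 8 partitions keeps every key in its original task, so no
local state has to move. Going from 4 to 6 does not, which is why the new
partition count has to be a multiple of the old one under these assumptions.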

Thanks,
Xinyu

On Mon, Jun 19, 2017 at 9:27 AM, Dong Lin  wrote:

> Hi everyone,
>
> Can you please vote for SEP-5? The wiki can be found at
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-5%3A+Enable+partition+expansion+of+input+streams
>
> Thanks,
> Dong
>


Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-12 Thread xinyu liu
How about making the partition mapping function a pluggable component in
the partition expansion? Mathematically, this is a mapping function which
is able to map the new partitions to the old ones:

  *f (new partition) -> old partition*

If the function is a surjective function (
https://en.wikipedia.org/wiki/Surjective_function), we are able to keep the
tasks as they were by replacing the old partition assignment with the new
one using the mapping function. By making this function pluggable, users
can provide their own mapping functions to make this work for different
kinds of input systems. Samza should check whether the function is
surjective so it knows whether we can keep the same task count with a
different grouping. For Kafka, we can provide a simple modulo function as
the mapping, and it's surjective. I agree it would be very nice to have more
general support for splitting the state of tasks, expanding the changelog,
etc., but this SEP is still useful and can address quite a large number of
scenarios in practice. Do you agree?
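
A rough sketch of the check and the regrouping, in Python for brevity. The
helper names are hypothetical and this is not an existing Samza API; it only
shows the idea of a pluggable mapping with a surjectivity check.

```python
def is_surjective(mapping, new_count, old_count):
    """Check that every old partition is hit by at least one new partition."""
    image = {mapping(p) for p in range(new_count)}
    return image == set(range(old_count))


def regroup(mapping, new_count):
    """Assign each new partition to the task that owned its old partition."""
    tasks = {}
    for p in range(new_count):
        tasks.setdefault(mapping(p), []).append(p)
    return tasks


old_count, new_count = 4, 8


def modulo_mapping(p):
    # The simple modulo mapping suggested for Kafka.
    return p % old_count


def constant_mapping(p):
    # A mapping that is not surjective: old partitions 1..3 are never hit.
    return 0


kafka_ok = is_surjective(modulo_mapping, new_count, old_count)
constant_ok = is_surjective(constant_mapping, new_count, old_count)
new_assignment = regroup(modulo_mapping, new_count)
```

With the modulo mapping the task that used to own old partition 0 now owns
new partitions 0 and 4, and so on, so the task count stays at 4.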

Thanks,
Xinyu

On Mon, Jun 12, 2017 at 3:54 PM, Jacob Maes  wrote:

> Hey Dong,
>
> I'm opposed (or a +0, at best) to this limited, Kafka-specific solution. I
> understand that the proposal is relatively simple to implement, but I think
> it will cause headaches for Samza users. They will not only have to
> understand all the limitations (increase only, double partitions only,
> partition using hash+modulo, etc) of this approach, but enforcing these
> limitations can be a major problem, especially when the Samza jobs and
> message brokers are managed by separate orgs in a company. Separate orgs
> are often difficult to coordinate and a system which depends on such
> significant process/coordination is too fragile for my taste.
>
> That said, I realize that my opinion is just one of many in the broader
> community which may feel differently, so let me respond to some of the
> other items in the discussion so we can clear them up:
>
> The task-to-container assignment matters because if the correlated tasks
> > (i.e. tasks that consume messages with the same key) needs to be in the
> > same container so that they can share the same key/value local store on
> the
> > same physical machine.
>
> There is currently no supported way of sharing state among the tasks of a
> container.  Each task has its own isolated store and that logical isolation
> is the primary thing that enables Samza jobs to scale with a simple
> container count change. My feeling is that we should not change this
> without good reason.
>
> I think we can hardcode new logic in KafkaCheckpointLogKey.scala such that
> > exception will not be thrown if new grouper is
> > GroupByPartitionWithFixedTaskNum and old grouper is GroupByPartition.
> Does
> > this look reasonable?
>
> With the current proposal, we'd also need a similar check for
> GroupBySystemStreamPartitionWithFixedTaskNum as well. And if any other
> groupers were later added with both these modes, we'd probably need to add
> those too. It might be easier and cleaner to add a config to ignore that
> check temporarily. Down side is that it further complicates the Samza
> config, which is already huge. Thoughts?
>
> I think storing the previous task-to-partition mapping is more general than
> > storing the partition count of all topics for the following reasons:
> > - Samza already stores the task-to-container mapping and
> container-to-host
> > mapping in the coordinator stream. It seems consistent to also store the
> > partition-to-task mapping. And this information may be useful for other
> > use-case such as debugging.
> > - By having the new interface take the previous task-to-partition
> > assignment instead of a topic-to-partition-count mapping as new
> parameter,
> > we can potentially have grouper implementation to support other types of
> > input systems.
> > - It is slightly simpler to store the task-to-partition assignment because
> > we don't need to know whether this is the first time a job is started or
> > not. On the other hand, you can write topic-to-partition-count mapping to
> > the coordinator stream only if this is the first time the job is run
>
> The task-to-container and container-to-host mappings are both meaningful in
> context of the JobModel. Partition-to-task mapping is not meaningful
> without some definition of the key-to-partition assignments. It's
> incomplete information and therefore misleading. I think it only makes
> sense to use this mapping if we adopt a solution wherein Samza also knows
> the partition key assignment.
>
> -Jake
>
> On Tue, Jun 6, 2017 at 11:06 PM, Dong Lin  wrote:
>
> > Hey Jacob,
> >
> > Thanks for taking time to review the SEP.
> >
> > I agree with you and Navina that the current SEP doesn't provide support
> to
> > arbitrary input systems and it doesn't support partition shrink. I think
> > the scope of this SEP is to support partition expansion for 

Re: [VOTE] Apache Samza 0.13.0 RC6

2017-06-06 Thread xinyu liu
+1 (non-binding).

Downloaded the source tar, built it and ran check-all.sh on RHEL6 with
gradle 2.8. All passed.

As a side note to Jagadish's comments, the build doesn't work on a higher
gradle version either (gradle 3.5). Seems "-language:implicitConversions
-language:reflectiveCalls" is not a valid build option anymore.

Thanks,
Xinyu

On Mon, Jun 5, 2017 at 10:06 AM, Jagadish Venkatraman <
jagadish1...@gmail.com> wrote:

> Checked out, ran tests, and all of them pass.
>
> +1 (non-binding)
>
> I did get an error when running with gradle 2.4:
> >>Could not resolve all dependencies for configuration
> ':samza-kafka_2.11:compile'. > java.lang.UnsupportedOperationException (no
> error message)
>
> However, when I used gradle 2.8, it was resolved.
>
> *gradle wrapper --gradle-version 2.8*
>
> Best,
> Jagadish
>
>
>
>
>
>
> On Mon, Jun 5, 2017 at 8:37 AM, Jake Maes  wrote:
>
> > This is a call for a vote on a release of Apache Samza 0.13.0. Thanks to
> > everyone who has contributed to this release. We are very glad to see
> some
> > new contributors and features in this release.
> >
> > The release candidate can be downloaded from here:
> > http://home.apache.org/~jmakes/samza-0.13.0-rc6/
> >
> > The release candidate is signed with pgp key 940AFC5A, which can be found
> > on keyservers:
> > *http://pgp.mit.edu/pks/lookup?op=get=0x940AFC5A
> > *
> >
> > The git tag is release-0.13.0-rc6 and signed with the same pgp key:
> > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=
> > refs/tags/release-0.13.0-rc6
> >
> > Test binaries have been published to Maven's staging repository, and are
> > available here:
> > https://repository.apache.org/content/repositories/orgapachesamza-1026
> >
> > 144 issues were resolved for this release:
> > https://issues.apache.org/jira/issues/?jql=project%20%3D%
> > 20SAMZA%20AND%20fixVersion%20in%20(0.13%2C%200.13.0)%
> > 20AND%20status%20in%20(
> > Resolved%2C%20Closed)
> >
> > The vote will be open for 72 hours (ending at 9:00AM Thursday,
> 06/08/2017).
> >
> > Please download the release candidate, check the hashes/signature, build
> it
> > and test it, and then please vote:
> >
> >
> > [ ] +1 approve
> >
> > [ ] +0 no opinion
> >
> > [ ] -1 disapprove (and reason why)
> >
>
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>


[VOTE] SEP 6: Support Control Message Across Intermediate Streams

2017-06-05 Thread xinyu liu
Hi, everyone,

This is the voting thread for SEP 6: Support Control Message Across
Intermediate Streams. The wiki page that discusses the SEP is:
https://cwiki.apache.org/confluence/display/SAMZA/SEP-6+Support+Control+Message+Across+Intermediate+Streams
.

Please vote.

Thanks,
Xinyu


Re: [DISCUSS] SEP-6: Support Watermark Across Intermediate Streams for Batch Processing

2017-06-01 Thread xinyu liu
Chatted with Yi offline and we both agree on keeping watermarks and
end-of-stream separate. @Chris: thanks for the detailed explanation.

@Yi: for your questions:

1) The propagation and the reconciliation process of watermarks and
end-of-stream are the same. As we discussed, this can be done inside the
consumer TaskInstance. After reconciliation, a single eos/watermark message
will be emitted to the task. The message emission will be based on the
control message type. For watermark we will calculate the min watermark
timestamp. For end-of-stream it's an empty envelope with END-OF-STREAM
offset, as we have today.

2) We need to checkpoint the control message bookkeeping states in the
TaskInstance along with the incoming message offsets. During failure
recovery, we will restore the states and continue processing future control
messages. Re-emission of upstream watermarks will not cause issues since
the state keeps the latest watermarks received. Any older watermarks will
be thrown away.

3) Yes, the integration with future exactly-once processing is out-of-scope
for this SEP. I will sync with Becket to make sure the exactly-once marker
can use the mechanism I am building here (in-band control messages) for
propagation.
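
To make points 1) and 2) concrete, here is a minimal sketch of the watermark
bookkeeping in Python. The class and method names are hypothetical, the
snapshot format is invented for the example, and end-of-stream handling is
left out; it is only meant to show the min-watermark rule, the dropping of
older watermarks, and restoring the state after a failure.

```python
class WatermarkState:
    """Tracks the latest watermark per upstream task (hypothetical sketch)."""

    def __init__(self, upstream_task_count):
        self.upstream_task_count = upstream_task_count
        self.latest = {}     # upstream task name -> latest watermark timestamp
        self.emitted = None  # last watermark handed to the task

    def on_watermark(self, task_name, timestamp):
        """Return the watermark to emit to the task, or None if nothing changes."""
        # Older or repeated watermarks (e.g. re-emitted after an upstream
        # restart) are thrown away.
        if task_name in self.latest and timestamp <= self.latest[task_name]:
            return None
        self.latest[task_name] = timestamp
        # Wait until every upstream task has reported at least once.
        if len(self.latest) < self.upstream_task_count:
            return None
        current = min(self.latest.values())
        # No emission if the minimum did not advance.
        if self.emitted is not None and current <= self.emitted:
            return None
        self.emitted = current
        return current

    def snapshot(self):
        # What would be checkpointed along with the input offsets.
        return {"latest": dict(self.latest), "emitted": self.emitted}

    @classmethod
    def restore(cls, upstream_task_count, snap):
        state = cls(upstream_task_count)
        state.latest = dict(snap["latest"])
        state.emitted = snap["emitted"]
        return state


state = WatermarkState(upstream_task_count=2)
results = [
    state.on_watermark("upstream-0", 10),  # only one upstream seen so far
    state.on_watermark("upstream-1", 5),   # min(10, 5) = 5 is emitted
    state.on_watermark("upstream-1", 5),   # repeated, dropped
    state.on_watermark("upstream-1", 20),  # min(10, 20) = 10 is emitted
    state.on_watermark("upstream-0", 8),   # older than 10, dropped
]

# Simulate failure recovery from the checkpointed state.
recovered = WatermarkState.restore(2, state.snapshot())
after_restore = [
    recovered.on_watermark("upstream-0", 10),  # re-emitted, dropped
    recovered.on_watermark("upstream-0", 30),  # min(30, 20) = 20 is emitted
]
```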

I updated SEP-6 according to our discussion and changed its name to be more
general. The new link is:
https://cwiki.apache.org/confluence/display/SAMZA/SEP-6+Support+Control+Message+Across+Intermediate+Streams
Please take a look.

Thanks,
Xinyu



On Thu, Jun 1, 2017 at 10:41 AM, Chris Pettitt <
cpett...@linkedin.com.invalid> wrote:

> I would recommend keeping watermarks and end-of-stream separate. It is
> lossy to represent end-of-stream as a watermark - does that mean we
> hit the max watermark on a stream or that we're in a bounded stream
> with an end-of-stream marker? Also keep in mind that watermarks will
> eventually be user overridable and it would be possible for a user to
> effectively emit an end-of-stream control message on an unbounded
> stream.
>
> On Wed, May 31, 2017 at 8:12 PM, Yi Pan <nickpa...@gmail.com> wrote:
> > Hi, Xinyu,
> >
> > Thanks for the update. So I have two suggestions:
> > - It seems to me that EndOfStream can be implemented as a special type of
> > Watermark as well. a) we can use MAX_INT in the watermark value to
> indicate
> > the end-of-stream; b) the streamId are simply the key to the Map<String,
> > Long> in the source ingestion task. When the source ingestion task
> received
> > enough count of EoS, it simply emits an EoS with its own taskName to the
> > intermediate stream as a watermark and the watermark propagation rule
> will
> > work. The only different thing the tasks will do in EoS is shutdown the
> > current tasks, while non EoS watermarks does not trigger shutdown.
> However,
> > that will allow us to simplify the type of messages and data structure to
> > pass through. And the reasoning in reconciliation in the downstream tasks
> > are pretty simple: a) # of watermarks == # of upstream tasks (i.e.
> > producers) b) propagation rule for the watermark message is the same;
> > - Based on our discussion yesterday, I think that we also need a detailed
> > description in the design to talk about the failure recovery scenario,
> > especially to answer the questions: a) in failure recovery, how the
> > checkpoint of offsets in the input streams and the watermark checkpoint
> > recovered in the current task? b) What's the correlation between the
> input
> > offsets and watermarks in the checkpoint in the current task?; c) what's
> > the implication of re-emitted watermarks from the current task to the
> > downstream tasks?
> >
> > And for Beam's watermark algorithm that Chris pointed out, I think that
> > OldestWork(stage) would be corresponding to the watermark timestamp of
> some
> > messages/state that we buffer in the current task and have not generate
> the
> > output or commit the state change yet. That would be needed if we
> implement
> > the exact-once algorithm Backett is working on since the algorithm will
> > require buffering message/state that are not committed yet. For now, if
> we
> > are processing each incoming message immediately and only checkpoint the
> > messages being processed completely, I think that we can ignore it.
> >
> > Just my few points.
> >
> > Thanks!
> >
> > -Yi
> >
> > On Tue, May 30, 2017 at 5:47 PM, xinyu liu <xinyuliu...@gmail.com>
> wrote:
> >
> >> @Chris: thanks a lot for providing the definitions. The first equation
> is
> >> exactly what I want to say about the watermark reconciliation. I haven't
> >> got to the second equation yet. Will probably think i

Re: [DISCUSS] SEP-6: Support Watermark Across Intermediate Streams for Batch Processing

2017-05-30 Thread xinyu liu
ing intermediate streams that all the inputs
> > have been end-of-stream” what does it mean? The task consuming the input
> > stream(s) reconcile all EoS from all input stream partitions and then
> > propagate EoS messages to all partitions in intermediate streams? This is
> > not super clear to me.
> >
> > - in step-3, how does the consumer of intermediate streams know how many
> > EOS messages should be received? And we should make it clear that it
> should
> > be EOS / producer and the count of the downstream consumer is counting on
> > the number of unique EOS from all producers from the upstream.
> >
> > - In comparison table, “checkpoint the control messages received” ==> is
> it
> > referring to the partially accumulated upstream EOS messages?
> >
> > - Please make a clear definition on “Watermark” and “EndOfStream”. Why
> are
> > they different? Are they both control messages that requires the same
> > delivery pattern (i.e. broadcast to downstream, reconcile at the
> consumer)?
> > If yes, should we make the “watermark” vs “EndOfStream” a sub-category in
> > control message?
> >
> > - As for the serde for intermediate stream, I assume that we will need an
> > envelope serde that is avro to wrap the user message and control message
> > in? So, user-defined serde now only applies to the “UserMessage”? And
> > what’s the message key in the message format?
> >
> > - A big question regarding to the watermark propagation: “When Samza
> > receives watermark messages, it will emit a watermark with the earliest
> > event time across all the stream partitions. No emission if the earliest
> > event time doesn’t change.” Does the watermark propagation requires
> > synchronization/coordination between all producers at the source? Say, if
> > the task taking one input source emits watermark at 1min interval and the
> > task taking another input source emits watermark at 5min interval, how
> does
> > the downstream consumer reconcile the watermarks?
> >
> > - In the checkpoint message format, it seems that it is only design for
> > watermark messages? Any streamId info that EoS is carrying over?
> >
> >
> > Thanks a lot!
> >
> >
> > -Yi
> >
> > On Tue, May 30, 2017 at 9:46 AM, xinyu liu <xinyuliu...@gmail.com>
> wrote:
> >
> > > Makes sense. I noticed that too and I dropped the ControlMessage type
> in
> > my
> > > code. I also moved taskName, taskCount to the parent ControlMessage
> > class.
> > > Just updated the SEP-6. Please take a look again.
> > >
> > > Thanks,
> > > Xinyu
> > >
> > > On Tue, May 30, 2017 at 9:12 AM, Chris Pettitt <
> > > cpett...@linkedin.com.invalid> wrote:
> > >
> > > > MessageType and ControlMessage.Type look redundant. You could either
> > use
> > > > "ControlMessage" as the type in MessageType or drop
> > ControlMessage.Type.
> > > >
> > > > On Fri, May 26, 2017 at 5:14 PM, xinyu liu <xinyuliu...@gmail.com>
> > > wrote:
> > > >
> > > > > Thanks a lot for the comments. I updated the SEP with more details
> > and
> > > > > clarification. Please let me know if you have further questions.
> > > > >
> > > > > Thanks,
> > > > > Xinyu
> > > > >
> > > > > On Thu, May 25, 2017 at 11:19 AM, Prateek Maheshwari <
> > > > > pmaheshw...@linkedin.com.invalid> wrote:
> > > > >
> > > > > > Hi Xinyu,
> > > > > >
> > > > > > Thanks for the proposal. Some requests for clarifications. Let's
> > > update
> > > > > the
> > > > > > SEP directly instead of replying here.
> > > > > >
> > > > > > E.g., in "For any following intermediate stream whose input
> streams
> > > are
> > > > > all
> > > > > > end-of-stream, it will be marked as pending EOS" - Should clarify
> > > that
> > > > > > (IIUC) something is injecting EOS messages in all intermediate
> > stream
> > > > > > partitions once it receives EOS from all input stream partitions
> > it's
> > > > > > consuming. Should also clarify what is that something.
> > > > > > Same for "declare end of stream once all the EOS messages have
> been
> > > > > > received." - What does this declaration involve and who is doing
>

Re: [DISCUSS] SEP-6: Support Watermark Across Intermediate Streams for Batch Processing

2017-05-30 Thread xinyu liu
Makes sense. I noticed that too and I dropped the ControlMessage type in my
code. I also moved taskName, taskCount to the parent ControlMessage class.
Just updated the SEP-6. Please take a look again.

Thanks,
Xinyu

On Tue, May 30, 2017 at 9:12 AM, Chris Pettitt <
cpett...@linkedin.com.invalid> wrote:

> MessageType and ControlMessage.Type look redundant. You could either use
> "ControlMessage" as the type in MessageType or drop ControlMessage.Type.
>
> On Fri, May 26, 2017 at 5:14 PM, xinyu liu <xinyuliu...@gmail.com> wrote:
>
> > Thanks a lot for the comments. I updated the SEP with more details and
> > clarification. Please let me know if you have further questions.
> >
> > Thanks,
> > Xinyu
> >
> > On Thu, May 25, 2017 at 11:19 AM, Prateek Maheshwari <
> > pmaheshw...@linkedin.com.invalid> wrote:
> >
> > > Hi Xinyu,
> > >
> > > Thanks for the proposal. Some requests for clarifications. Let's update
> > the
> > > SEP directly instead of replying here.
> > >
> > > E.g., in "For any following intermediate stream whose input streams are
> > all
> > > end-of-stream, it will be marked as pending EOS" - Should clarify that
> > > (IIUC) something is injecting EOS messages in all intermediate stream
> > > partitions once it receives EOS from all input stream partitions it's
> > > consuming. Should also clarify what is that something.
> > > Same for "declare end of stream once all the EOS messages have been
> > > received." - What does this declaration involve and who is doing this?
> > >
> > > In pro for approach 2: Not clear what this means - "The watermark can
> > > conclude the input messages before this watermark have been complete."
> > >
> > > For the cons of approach 2: "Complicated failure scenario of the second
> > > job. It needs to checkpoint all the watermark messages received, so
> when
> > it
> > > recovered from failure, it can still count." - How is this related to
> > EOS?
> > > How is this related to the checkpoint watermark section?
> > > Also, what is the "more messages required to write.. " referring to?
> > >
> > > "Samza needs to reconcile based on the task counts." - Please explain
> > what
> > > reconciliation means, why it needs to happen, and why we need to track
> > the
> > > producer task and total task count in the watermark message to do this.
> > >
> > > Checkpoint watermarks section is also unclear. What problem are we
> trying
> > > to solve here?
> > >
> > > Should also move the message format and the watermark message interface
> > > sections to the bottom, since they depend on details in the event time
> > and
> > > checkpoint watermark sections.
> > >
> > > Thanks,
> > > Prateek
> > >
> > >
> > > On Wed, May 24, 2017 at 11:30 AM, xinyu liu <xinyuliu...@gmail.com>
> > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I created SEP-6 for SAMZA-1260
> > > > <https://issues.apache.org/jira/browse/SAMZA-1260>: Support
> Watermark
> > > > Across Intermediate Streams for Batch Processing. The link to the SEP
> > is
> > > > here:
> > > >
> > > > https://cwiki.apache.org/confluence/display/SAMZA/SEP-
> > > > 6+Support+Watermark+Across+Intermediate+Streams+for+Batch+Processing
> > > >
> > > > Please review and comments are welcome!
> > > >
> > > > Thanks,
> > > > Xinyu
> > > >
> > >
> >
>


Re: [NON-VOTING][RELEASE-TESTING] Apache Samza 0.13.0 RC1

2017-05-25 Thread xinyu liu
Change subject to [NON-VOTING][RELEASE-TESTING].

Thanks,
Xinyu


On Wed, May 24, 2017 at 7:42 PM, Navina Ramesh (Apache) 
wrote:

> Hi everyone,
>
> This is a call for a vote on a release of Apache Samza 0.13.0. Thanks to
> everyone who has contributed to this release. We are very glad to see some
> new contributors and features in this release.
>
> The release candidate can be downloaded from here:
> http://home.apache.org/~navina/samza-0.13.0-rc1/
>
> The release candidate is signed with pgp key 331C8F69 , which can be found
> on keyservers:
> http://pgp.mit.edu/pks/lookup?op=get=0x331C8F69
>
> The git tag is release-0.13.0-rc1 and signed with the same pgp key:
> https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=
> refs/tags/release-0.13.0-rc1
>
> Test binaries have been published to Maven's staging repository, and are
> available here:
> https://repository.apache.org/content/repositories/orgapachesamza-1021
>
> 137 issues were resolved for this release:
> https://issues.apache.org/jira/issues/?jql=project%20%
> 3D%20SAMZA%20AND%20fixVersion%20in%20(0.13%2C%200.13.0)%
> 20AND%20status%20in%20(Resolved%2C%20Closed)
>
> The vote will be open for 3 *working* days (ending at 8:00PM Monday,
> 05/13/2017). We have an extended deadline this time as it is too close to a
> long weekend.
>
> Please download the release candidate, check the hashes/signature, build it
> and test it, and then please vote:
>
>
> [ ] +1 approve
>
> [ ] +0 no opinion
>
> [ ] -1 disapprove (and reason why)
>
> Cheers!
> Navina
>


[DISCUSS] SEP-6: Support Watermark Across Intermediate Streams for Batch Processing

2017-05-24 Thread xinyu liu
Hi all,

I created SEP-6 for SAMZA-1260: Support Watermark Across Intermediate
Streams for Batch Processing. The link to the SEP is here:

https://cwiki.apache.org/confluence/display/SAMZA/SEP-6+Support+Watermark+Across+Intermediate+Streams+for+Batch+Processing

Please review and comments are welcome!

Thanks,
Xinyu


Re: [DISCUSS] SEP-4: Adjunct Data Store for Unbounded DataSets

2017-05-19 Thread xinyu liu
Hi, Wei,

+1 on the proposed design. This is going to remove a lot of the heavy
lifting that user code has to do today to bootstrap a data stream into a
local store. The configs look quite straightforward and easy to set up.
Overall the design looks great to me.

I have one question: in the proposal you mentioned "When Samza is running
in 24x7 mode, the stream for a bounded dataset may deliver multiple
versions." So after the bootstrap of the initial version is done, what
will happen when a new version arrives? Right now a bootstrap stream is by
default set to priority INT_MAX, meaning it preempts the processing of other
streams while the bootstrap is going on. Are we expecting pauses when a new
version of the adjunct data arrives? Please let me know what the plan is
for handling this scenario.

Thanks,
Xinyu

On Tue, May 16, 2017 at 2:15 PM, Navina Ramesh (Apache) 
wrote:

> Thanks for trying 3 times, Wei. Sorry about the trouble. Not sure where the
> problem lies. Looking forward to review your design.
>
> Navina
>
> On Tue, May 16, 2017 at 8:56 AM, Wei Song  wrote:
>
> > Hey everyone,
> >
> > I created a proposal for SAMZA-1278, Adjunct Data Store
> > for Unbounded DataSets, which introduces an automatic mechanism to store
> > adjunct data for stream tasks.
> >
> > https://cwiki.apache.org/confluence/display/SAMZA/Adjunct+Da
> > ta+Store+for+Unbounded+DataSets
> >
> > Please review and comments are welcome!
> >
> > For those who are not actively following the master branch, you may have
> > more questions than others. Feel free to ask them here.
> >
> > P.S. this is the 3rd try, sent this last week, but apparently no one at
> > Linkedin has received, including samza-dev here just to be sure.
> >
> > --
> > Thanks,
> > -Wei
> >
>


Re: [DISCUSS] SEP-2: ApplicationRunner Design

2017-05-01 Thread xinyu liu
I looked again at Chris's beam-samza-runner implementation. It seems
LocalApplicationRunner.run() should be asynchronous too. The current
implementation actually uses a latch to wait for the StreamProcessors
to finish, which seems unnecessary. Instead, we can provide a
waitUntilFinish() counterpart to the user. I created
https://issues.apache.org/jira/browse/SAMZA-1252 to track it.
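
A minimal sketch of the non-blocking run() plus waitUntilFinish() idea
described above. All class names here (AsyncRunnerSketch, SketchRunner) are
hypothetical stand-ins, not the actual LocalApplicationRunner code, and a
plain Runnable stands in for a StreamProcessor:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class AsyncRunnerSketch {
  public static void main(String[] args) {
    AtomicInteger processed = new AtomicInteger();
    SketchRunner runner = new SketchRunner(2, processed::incrementAndGet);
    runner.run(); // returns immediately; processors run on background threads
    boolean done = runner.waitUntilFinish(5, TimeUnit.SECONDS);
    System.out.println("finished=" + done + ", processors run=" + processed.get());
  }
}

/** Hypothetical runner: run() is asynchronous, waiting is opt-in. */
class SketchRunner {
  private final int processorCount;
  private final Runnable processorBody;
  private final CountDownLatch finished;
  private final ExecutorService pool;

  SketchRunner(int processorCount, Runnable processorBody) {
    this.processorCount = processorCount;
    this.processorBody = processorBody;
    this.finished = new CountDownLatch(processorCount);
    this.pool = Executors.newFixedThreadPool(processorCount);
  }

  /** Starts every processor and returns without waiting for them. */
  void run() {
    for (int i = 0; i < processorCount; i++) {
      pool.submit(() -> {
        try {
          processorBody.run();
        } finally {
          finished.countDown(); // count down even if the processor throws
        }
      });
    }
    pool.shutdown(); // accept no new tasks; submitted ones keep running
  }

  /** Blocks until all processors finish or the timeout elapses. */
  boolean waitUntilFinish(long timeout, TimeUnit unit) {
    try {
      return finished.await(timeout, unit);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      return false;
    }
  }
}
```

The latch is still there, but it moves behind waitUntilFinish(), so only
callers that want to block pay for it.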

Thanks,
Xinyu

On Fri, Apr 28, 2017 at 5:55 PM, xinyu liu <xinyuliu...@gmail.com> wrote:

> Right, option #2 seems redundant for defining streams after further
> discussion here. StreamSpec itself is flexible enough to achieve both
> static and programmatic specification of the stream. Agree it's not
> convenient for now (pretty obvious after looking at your bsr
> beam.runners.samza.wrapper), and we should provide similar predefined
> convenient wrappers for user to create the StreamSpec. In your case
> something like BoundedStreamSpec.file() which will generate the system
> and serialize the data as you did.
>
> We're still thinking the callback proposed in #2 can be useful for
> requirement #6: injecting other user objects in run time, such as stores
> and metrics. To simplify the user understanding further, I think we might
> hide the ApplicationRunner and expose the StreamApplication instead, which
> will make requirement #3 not user facing. So the API becomes like:
>
>   StreamApplication app = StreamApplication.local(config)
> .init (env -> {
>env.registerStore("my-store", new MyStoreFactory());
>env.registerMetricsReporter("my-reporter", new
> MyMetricsReporterFactory());
> })
> .withLifeCycleListener(myListener);
>
>   app.input(BoundedStreamSpec.create("/sample/input.txt"))
> .map(...)
> .window(...)
>
>   app.run();
>
> For requirement #5, I add a .withLifeCycleListener() in the API, which can
> trigger the callbacks with life cycle events.
>
> For #4: distribution of the jars will be what we have today using the Yarn
> localization with a remote store like artifactory or http server. We
> discussed where to put the graph serialization. The current thinking is to
> define a general interface which can backed by a remote store, like Kafka,
> artifactory or http server. For Kafka, it's straightforward but we will
> have the size limit or cut it by ourselves. For the other two, we need to
> investigate whether we can easily upload jars to our artifactory and
> localizing it with Yarn. Any opinions on this?
>
> Thanks,
> Xinyu
>
> On Fri, Apr 28, 2017 at 11:34 AM, Chris Pettitt <
> cpett...@linkedin.com.invalid> wrote:
>
>> Your proposal for #1 looks good.
>>
>> I'm not quite sure how to reconcile the proposals for #1 and #2. In #1 you add
>> the stream spec straight onto the runner while in #2 you do it in a
>> callback. If it is either-or, #1 looks a lot better for my purposes.
>>
>> For #4 what mechanism are you using to distribute the JARs? Can you use
>> the
>> same mechanism to distribute the serialized graph?
>>
>> On Fri, Apr 28, 2017 at 12:14 AM, xinyu liu <xinyuliu...@gmail.com>
>> wrote:
>>
>> > btw, I will get to SAMZA-1246 as soon as possible.
>> >
>> > Thanks,
>> > Xinyu
>> >
>> > On Thu, Apr 27, 2017 at 9:11 PM, xinyu liu <xinyuliu...@gmail.com>
>> wrote:
>> >
>> > > Let me try to capture the updated requirements:
>> > >
>> > > 1. Set up input streams outside StreamGraph, and treat graph building
>> as
>> > a
>> > > library (*Fluent, Beam*).
>> > >
>> > > 2. Improve ease of use for ApplicationRunner to avoid complex
>> > > configurations such as zkCoordinator, zkCoordinationService.
>> > (*Standalone*).
>> > > Provide some programmatic way to tweak them in the API.
>> > >
>> > > 3. Clean up ApplicationRunner into a single interface (*Fluent*). We
>> can
>> > > have one or more implementations but it's hidden from the users.
>> > >
>> > > 4. Separate StreamGraph from runtime environment so it can be
>> serialized
>> > (*Beam,
>> > > Yarn*)
>> > >
>> > > 5. Better life cycle management of application, parity with
>> > > StreamProcessor (*Standalone, Beam*). Stats should include exception
>> in
>> > > case of failure (tracked in SAMZA-1246).
>> > >
>> > > 6. Support injecting user-defined objects into ApplicationRunner.
>> > >
>> > > Prateek and I iterated on the ApplicationRunner API based on these
>> > > requirements

Re: [DISCUSS] SEP-2: ApplicationRunner Design

2017-04-28 Thread xinyu liu
Right, option #2 seems redundant for defining streams after further
discussion here. StreamSpec itself is flexible enough to achieve both
static and programmatic specification of the stream. Agree it's not
convenient for now (pretty obvious after looking at your bsr
beam.runners.samza.wrapper), and we should provide similar predefined
convenient wrappers for user to create the StreamSpec. In your case
something like BoundedStreamSpec.file() which will generate the system
and serialize the data as you did.

We're still thinking the callback proposed in #2 can be useful for
requirement #6: injecting other user objects in run time, such as stores
and metrics. To simplify the user understanding further, I think we might
hide the ApplicationRunner and expose the StreamApplication instead, which
will make requirement #3 not user facing. So the API becomes like:

  StreamApplication app = StreamApplication.local(config)
    .init(env -> {
      env.registerStore("my-store", new MyStoreFactory());
      env.registerMetricsReporter("my-reporter", new MyMetricsReporterFactory());
    })
    .withLifeCycleListener(myListener);

  app.input(BoundedStreamSpec.create("/sample/input.txt"))
    .map(...)
    .window(...);

  app.run();

For requirement #5, I added a .withLifeCycleListener() call to the API, which
can trigger callbacks on life cycle events.
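
To make the listener idea concrete, here is a rough sketch, assuming a
hypothetical SketchLifeCycleListener interface with three callbacks and a
hypothetical SketchApp class standing in for StreamApplication. Passing the
Throwable to onFailure is one way to surface the cause of failure asked for
in requirement #5:

```java
class LifeCycleSketch {
  public static void main(String[] args) {
    new SketchApp(() -> { throw new IllegalStateException("boom"); })
        .withLifeCycleListener(new SketchLifeCycleListener() {
          public void onStart() { System.out.println("started"); }
          public void onShutdown() { System.out.println("shut down cleanly"); }
          public void onFailure(Throwable cause) {
            System.out.println("failed: " + cause.getMessage());
          }
        })
        .run();
  }
}

/** Hypothetical callback interface; names are assumptions, not Samza API. */
interface SketchLifeCycleListener {
  void onStart();
  void onShutdown();
  void onFailure(Throwable cause);
}

/** Hypothetical application wrapper that reports life cycle events. */
class SketchApp {
  private final Runnable body;
  private SketchLifeCycleListener listener;

  SketchApp(Runnable body) {
    this.body = body;
  }

  SketchApp withLifeCycleListener(SketchLifeCycleListener listener) {
    this.listener = listener;
    return this;
  }

  void run() {
    if (listener != null) listener.onStart();
    try {
      body.run();
    } catch (RuntimeException e) {
      // Hand the cause to the listener instead of only exposing a status.
      if (listener != null) listener.onFailure(e);
      return;
    }
    if (listener != null) listener.onShutdown();
  }
}
```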

For #4: distribution of the jars will stay as it is today, using Yarn
localization with a remote store like artifactory or an http server. We
discussed where to put the serialized graph. The current thinking is to
define a general interface which can be backed by a remote store, like Kafka,
artifactory or an http server. For Kafka, it's straightforward, but we will
hit the message size limit or have to split the graph ourselves. For the
other two, we need to investigate whether we can easily upload jars to our
artifactory and localize them with Yarn. Any opinions on this?

Thanks,
Xinyu

On Fri, Apr 28, 2017 at 11:34 AM, Chris Pettitt <
cpett...@linkedin.com.invalid> wrote:

> Your proposal for #1 looks good.
>
> I'm not quite sure how to reconcile the proposals for #1 and #2. In #1 you add
> the stream spec straight onto the runner while in #2 you do it in a
> callback. If it is either-or, #1 looks a lot better for my purposes.
>
> For #4 what mechanism are you using to distribute the JARs? Can you use the
> same mechanism to distribute the serialized graph?
>
> On Fri, Apr 28, 2017 at 12:14 AM, xinyu liu <xinyuliu...@gmail.com> wrote:
>
> > btw, I will get to SAMZA-1246 as soon as possible.
> >
> > Thanks,
> > Xinyu
> >
> > On Thu, Apr 27, 2017 at 9:11 PM, xinyu liu <xinyuliu...@gmail.com>
> wrote:
> >
> > > Let me try to capture the updated requirements:
> > >
> > > 1. Set up input streams outside StreamGraph, and treat graph building
> as
> > a
> > > library (*Fluent, Beam*).
> > >
> > > 2. Improve ease of use for ApplicationRunner to avoid complex
> > > configurations such as zkCoordinator, zkCoordinationService.
> > (*Standalone*).
> > > Provide some programmatic way to tweak them in the API.
> > >
> > > 3. Clean up ApplicationRunner into a single interface (*Fluent*). We
> can
> > > have one or more implementations but it's hidden from the users.
> > >
> > > 4. Separate StreamGraph from runtime environment so it can be
> serialized
> > (*Beam,
> > > Yarn*)
> > >
> > > 5. Better life cycle management of application, parity with
> > > StreamProcessor (*Standalone, Beam*). Stats should include exception in
> > > case of failure (tracked in SAMZA-1246).
> > >
> > > 6. Support injecting user-defined objects into ApplicationRunner.
> > >
> > > Prateek and I iterated on the ApplicationRunner API based on these
> > > requirements. To support #1, we can set up input streams on the runner
> > > level, which returns the MessageStream and allows graph building
> > > afterwards. The code looks like below:
> > >
> > >   ApplicationRunner runner = ApplicationRunner.local();
> > >   runner.input(streamSpec)
> > > .map(..)
> > > .window(...)
> > >   runner.run();
> > >
> > > StreamSpec is the building block for setting up streams here. It can be
> > > set up in different ways:
> > >
> > >   - Direct creation of stream spec, like runner.input(new
> StreamSpec(id,
> > > system, stream))
> > >   - Load from streamId from env or config, like
> > runner.input(runner.env().
> > > getStreamSpec(id))
> > >   - Canned Spec which generates the

Re: [DISCUSS] SEP-2: ApplicationRunner Design

2017-04-27 Thread xinyu liu
btw, I will get to SAMZA-1246 as soon as possible.

Thanks,
Xinyu

On Thu, Apr 27, 2017 at 9:11 PM, xinyu liu <xinyuliu...@gmail.com> wrote:

> Let me try to capture the updated requirements:
>
> 1. Set up input streams outside StreamGraph, and treat graph building as a
> library (*Fluent, Beam*).
>
> 2. Improve ease of use for ApplicationRunner to avoid complex
> configurations such as zkCoordinator, zkCoordinationService. (*Standalone*).
> Provide some programmatic way to tweak them in the API.
>
> 3. Clean up ApplicationRunner into a single interface (*Fluent*). We can
> have one or more implementations but it's hidden from the users.
>
> 4. Separate StreamGraph from runtime environment so it can be serialized 
> (*Beam,
> Yarn*)
>
> 5. Better life cycle management of application, parity with
> StreamProcessor (*Standalone, Beam*). Stats should include exception in
> case of failure (tracked in SAMZA-1246).
>
> 6. Support injecting user-defined objects into ApplicationRunner.
>
> Prateek and I iterated on the ApplicationRunner API based on these
> requirements. To support #1, we can set up input streams on the runner
> level, which returns the MessageStream and allows graph building
> afterwards. The code looks like below:
>
>   ApplicationRunner runner = ApplicationRunner.local();
>   runner.input(streamSpec)
> .map(..)
> .window(...)
>   runner.run();
>
> StreamSpec is the building block for setting up streams here. It can be
> set up in different ways:
>
>   - Direct creation of stream spec, like runner.input(new StreamSpec(id,
> system, stream))
>   - Load from streamId from env or config, like runner.input(runner.env().
> getStreamSpec(id))
>   - Canned Spec which generates the StreamSpec with id, system and stream
> to minimize the configuration. For example, CollectionSpec.create(new
> ArrayList[]{1,2,3,4}), which will auto generate the system and stream in
> the spec.
>
> To support #2, we need to be able to set up StreamSpec-related objects and
> factories programmatically in env. Suppose we have the following before
> runner.input(...):
>
>   runner.setup(env /* a writable interface of env*/ -> {
> env.setStreamSpec(streamId, streamSpec);
> env.setSystem(systemName, systemFactory);
>   })
>
> runner.setup(->) also provides setup for stores and other runtime stuff
> needed for the execution. The setup should be able to serialized to config.
> For #6, I haven't figured out a good way to inject user-defined objects
> here yet.
>
> With this API, we should be able to also support #4. For remote
> runner.run(), the operator user classes/lambdas in the StreamGraph need to
> be serialized. As today, the existing option is to serialize to a stream,
> either the coordinator stream or the pipeline control stream, which will
> have the size limit per message. Do you see RPC as an option?
>
> For this version of API, seems we don't need the StreamApplication wrapper
> as well as exposing the StreamGraph. Do you think we are on the right path?
>
> Thanks,
> Xinyu
>
>
> On Thu, Apr 27, 2017 at 6:09 AM, Chris Pettitt <
> cpett...@linkedin.com.invalid> wrote:
>
>> That should have been:
>>
>> For #1, Beam doesn't have a hard requirement...
>>
>> On Thu, Apr 27, 2017 at 9:07 AM, Chris Pettitt <cpett...@linkedin.com>
>> wrote:
>>
>> > For #1, I doesn't have a hard requirement for any change from Samza. A
>> > very nice to have would be to allow the input systems to be set up at
>> the
>> > same time as the rest of the StreamGraph. An even nicer to have would
>> be to
>> > do away with the callback based approach and treat graph building as a
>> > library, a la Beam and Flink.
>> >
>> > For the moment I've worked around the two pass requirement (once for
>> > config, once for StreamGraph) by introducing an IR layer between Beam
>> and
>> > the Samza Fluent translation. The IR layer is convenient independent of
>> > this problem because it makes it easier to switch between the Fluent and
>> > low-level APIs.
>> >
>> >
>> > For #4, if we had parity with StreamProcessor for lifecycle we'd be in
>> > great shape. One additional issue with the status call that I may not
>> have
>> > mentioned is that it provides you no way to get at the cause of failure.
>> > The StreamProcessor API does allow this via the callback.
>> >
>> >
>> > Re. #2 and #3, I'm a big fan of getting rid of the extra configuration
>> > indirection you currently have to jump through (this is also related to
>>

Re: [DISCUSS] SEP-2: ApplicationRunner Design

2017-04-27 Thread xinyu liu
Let me try to capture the updated requirements:

1. Set up input streams outside StreamGraph, and treat graph building as a
library (*Fluent, Beam*).

2. Improve ease of use for ApplicationRunner to avoid complex
configurations such as zkCoordinator, zkCoordinationService. (*Standalone*).
Provide some programmatic way to tweak them in the API.

3. Clean up ApplicationRunner into a single interface (*Fluent*). We can
have one or more implementations but it's hidden from the users.

4. Separate StreamGraph from runtime environment so it can be serialized
(*Beam, Yarn*).

5. Better life cycle management of application, parity with StreamProcessor
(*Standalone, Beam*). Stats should include exception in case of failure
(tracked in SAMZA-1246).

6. Support injecting user-defined objects into ApplicationRunner.

Prateek and I iterated on the ApplicationRunner API based on these
requirements. To support #1, we can set up input streams at the runner
level, which returns the MessageStream and allows graph building
afterwards. The code looks like this:

  ApplicationRunner runner = ApplicationRunner.local();
  runner.input(streamSpec)
    .map(...)
    .window(...);
  runner.run();

StreamSpec is the building block for setting up streams here. It can be set
up in different ways:

  - Direct creation of a stream spec, like
    runner.input(new StreamSpec(id, system, stream))
  - Loading by streamId from env or config, like
    runner.input(runner.env().getStreamSpec(id))
  - A canned spec which generates the StreamSpec with id, system and stream
    to minimize the configuration. For example,
    CollectionSpec.create(Arrays.asList(1, 2, 3, 4)), which will auto-generate
    the system and stream in the spec.
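
As a rough illustration of the canned-spec option, the sketch below uses
hypothetical SketchStreamSpec and SketchCollectionSpec classes (not the real
StreamSpec). The "in-memory" system name and the "collection-N" id scheme are
made up for the example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class CannedSpecSketch {
  public static void main(String[] args) {
    // Direct creation: the user supplies id, system and stream.
    SketchStreamSpec explicit = new SketchStreamSpec("page-views", "kafka", "PageViewEvent");
    // Canned spec: id, system and stream are generated for the user.
    SketchCollectionSpec<Integer> canned =
        SketchCollectionSpec.create(Arrays.asList(1, 2, 3, 4));
    System.out.println(explicit.id + " -> " + explicit.system + "/" + explicit.stream);
    System.out.println(canned.id + " -> " + canned.system + "/" + canned.stream
        + " elements=" + canned.elements());
  }
}

/** Hypothetical minimal stream spec: a logical id mapped to system + stream. */
class SketchStreamSpec {
  final String id;
  final String system;
  final String stream;

  SketchStreamSpec(String id, String system, String stream) {
    this.id = id;
    this.system = system;
    this.stream = stream;
  }
}

/** Hypothetical canned spec backed by an in-memory collection. */
class SketchCollectionSpec<T> extends SketchStreamSpec {
  private static final AtomicInteger COUNTER = new AtomicInteger();
  private final List<T> elements;

  private SketchCollectionSpec(String id, List<T> elements) {
    // Assumed convention: the generated id doubles as the stream name.
    super(id, "in-memory", id);
    this.elements = Collections.unmodifiableList(new ArrayList<>(elements));
  }

  static <T> SketchCollectionSpec<T> create(List<T> elements) {
    return new SketchCollectionSpec<>("collection-" + COUNTER.incrementAndGet(), elements);
  }

  List<T> elements() {
    return elements;
  }
}
```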

To support #2, we need to be able to set up StreamSpec-related objects and
factories programmatically in env. Suppose we have the following before
runner.input(...):

  runner.setup(env /* a writable interface of env*/ -> {
env.setStreamSpec(streamId, streamSpec);
env.setSystem(systemName, systemFactory);
  })

runner.setup(->) also provides setup for stores and other runtime objects
needed for the execution. The setup should be serializable to config.
For #6, I haven't figured out a good way to inject user-defined objects
here yet.

With this API, we should also be able to support #4. For a remote
runner.run(), the operator user classes/lambdas in the StreamGraph need to
be serialized. As of today, the existing option is to serialize to a stream,
either the coordinator stream or the pipeline control stream, which has
a per-message size limit. Do you see RPC as an option?

For this version of the API, it seems we need neither the StreamApplication
wrapper nor to expose the StreamGraph. Do you think we are on the right path?

Thanks,
Xinyu


On Thu, Apr 27, 2017 at 6:09 AM, Chris Pettitt <
cpett...@linkedin.com.invalid> wrote:

> That should have been:
>
> For #1, Beam doesn't have a hard requirement...
>
> On Thu, Apr 27, 2017 at 9:07 AM, Chris Pettitt <cpett...@linkedin.com>
> wrote:
>
> > For #1, I doesn't have a hard requirement for any change from Samza. A
> > very nice to have would be to allow the input systems to be set up at the
> > same time as the rest of the StreamGraph. An even nicer to have would be
> to
> > do away with the callback based approach and treat graph building as a
> > library, a la Beam and Flink.
> >
> > For the moment I've worked around the two pass requirement (once for
> > config, once for StreamGraph) by introducing an IR layer between Beam and
> > the Samza Fluent translation. The IR layer is convenient independent of
> > this problem because it makes it easier to switch between the Fluent and
> > low-level APIs.
> >
> >
> > For #4, if we had parity with StreamProcessor for lifecycle we'd be in
> > great shape. One additional issue with the status call that I may not
> have
> > mentioned is that it provides you no way to get at the cause of failure.
> > The StreamProcessor API does allow this via the callback.
> >
> >
> > Re. #2 and #3, I'm a big fan of getting rid of the extra configuration
> > indirection you currently have to jump through (this is also related to
> > system consumer configuration from #1). It makes it much easier to
> discover
> > what the configurable parameters are too, if we provide some programmatic
> > way to tweak them in the API - which can turn into config under the hood.
> >
> > On Wed, Apr 26, 2017 at 9:20 PM, xinyu liu <xinyuliu...@gmail.com>
> wrote:
> >
> >> Let me give a shot to summarize the requirements for ApplicationRunner
> we
> >> have discussed so far:
> >>
> >> - Support environment for passing in user-defined objects (streams
> >> potentially) into ApplicationRunner (*

Re: [DISCUSS] SEP-2: ApplicationRunner Design

2017-04-26 Thread xinyu liu
Let me take a shot at summarizing the requirements for ApplicationRunner we
have discussed so far:

- Support environment for passing in user-defined objects (streams
potentially) into ApplicationRunner (*Beam*)

- Improve ease of use for ApplicationRunner to avoid complex configurations
such as zkCoordinator, zkCoordinationService. (*Standalone*)

- Clean up ApplicationRunner into a single interface (*Fluent*). We can
have one or more implementations but it's hidden from the users.

- Separate StreamGraph from environment so it can be serializable (*Beam,
Yarn*)

- Better life cycle management of application, including start/stop/stats
(*Standalone, Beam*)


One way to address the second and third requirements is to provide
pre-packaged runners through static factory methods whose return type is the
ApplicationRunner interface. So we can have:

  ApplicationRunner runner = ApplicationRunner.zk();
  // or ApplicationRunner.local() / ApplicationRunner.remote() / ApplicationRunner.test()

Internally we will package the right configs and run-time environment with
the runner. For example, ApplicationRunner.zk() will define all the configs
needed for zk coordination.
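
A small sketch of the static-factory idea, using a hypothetical
SketchApplicationRunner interface. The config keys and values below are
invented for illustration and are not real Samza configs:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class RunnerFactorySketch {
  public static void main(String[] args) {
    // The caller only sees the interface type, never a concrete runner class.
    SketchApplicationRunner runner = SketchApplicationRunner.zk();
    System.out.println(runner.config());
  }
}

/** Hypothetical single user-facing interface with pre-packaged factories. */
interface SketchApplicationRunner {
  /** The configs packaged with this runner. */
  Map<String, String> config();

  /** Runner for a single local JVM; needs no coordination configs. */
  static SketchApplicationRunner local() {
    return fromConfig(Collections.singletonMap("sketch.coordinator", "none"));
  }

  /** Runner that bundles the (made-up) configs for zk coordination. */
  static SketchApplicationRunner zk() {
    Map<String, String> config = new HashMap<>();
    config.put("sketch.coordinator", "zk");
    config.put("sketch.zk.connect", "localhost:2181");
    return fromConfig(config);
  }

  /** Wraps a defensive, read-only copy of the given configs. */
  static SketchApplicationRunner fromConfig(Map<String, String> config) {
    Map<String, String> frozen = Collections.unmodifiableMap(new HashMap<>(config));
    return () -> frozen;
  }
}
```

Because the factories live on the interface, adding remote() or test() later
would not change any user-facing type.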

To support 1 and 4, can we pass in a lambda function in the runner, and
then we can run the stream graph? Like the following:

  ApplicationRunner.zk().env(config -> environment).run(streamGraph);

Then we need a way to pass the environment into the StreamGraph. This can
be done either by adding an extra parameter to each operator or by adding a
getEnv() function to the MessageStream, which seems pretty hacky.

What do you think?

Thanks,
Xinyu





On Sun, Apr 23, 2017 at 11:01 PM, Prateek Maheshwari <
pmaheshw...@linkedin.com.invalid> wrote:

> Thanks for putting this together Yi!
>
> I agree with Jake, it does seem like there are a few too many moving parts
> here. That said, the problem being solved is pretty broad, so let me try to
> summarize my current understanding of the requirements. Please correct me
> if I'm wrong or missing something.
>
> ApplicationRunner and JobRunner first, ignoring test environment for the
> moment.
> ApplicationRunner:
> 1. Create execution plan: Same in Standalone and Yarn
> 2. Create intermediate streams: Same logic but different leader election
> (ZK-based or pre-configured in standalone, AM in Yarn).
> 3. Run jobs: In JVM in standalone. Submit to the cluster in Yarn.
>
> JobRunner:
> 1. Run the StreamProcessors: Same process in Standalone & Test. Remote host
> in Yarn.
>
> To get a single ApplicationRunner implementation, like Jake suggested, we
> need to make leader election and JobRunner implementation pluggable.
> There's still the question of whether ApplicationRunner#run API should be
> blocking or non-blocking. It has to be non-blocking in YARN. We want it to
> be blocking in standalone, but seems like the main reason is ease of use
> when launched from main(). I'd prefer making it consistently non-blocking
> instead, esp. since in embedded standalone mode (where the processor is
> running in another container) a blocking API would not be user-friendly
> either. If not, we can add both run and runBlocking.
>
> Coming to RuntimeEnvironment, which is the least clear to me so far:
> 1. I don't think RuntimeEnvironment should be responsible for providing
> StreamSpecs for streamIds - they can be obtained with a config/util class.
> The StreamProcessor should only know about logical streamIds and the
> streamId <-> actual stream mapping should happen within the
> SystemProducer/Consumer/Admins provided by the RuntimeEnvironment.
> 2. There's also other components that the user might be interested in
> providing implementations of in embedded Standalone mode (i.e., not just in
> tests) - MetricsRegistry and JMXServer come to mind.
> 3. Most importantly, it's not clear to me who creates and manages the
> RuntimeEnvironment. It seems like it should be the ApplicationRunner or the
> user because of (2) above and because StreamManager also needs access to
> SystemAdmins for creating intermediate streams which users might want to
> mock. But it also needs to be passed down to the StreamProcessor - how
> would this work on Yarn?
>
> I think we should figure out how to integrate RuntimeEnvironment with
> ApplicationRunner before we can make a call on one vs. multiple
> ApplicationRunner implementations. If we do keep LocalApplicationRunner and
> RemoteApplication (and TestApplicationRunner) separate, agree with Jake
> that we should remove the JobRunners and roll them up into the respective
> ApplicationRunners.
>
> - Prateek
>
> On Thu, Apr 20, 2017 at 10:06 AM, Jacob Maes  wrote:
>
> > Thanks for the SEP!
> >
> > +1 on introducing these new components
> > -1 on the current definition of their roles (see Design feedback below)
> >
> > *Design*
> >
> >- If LocalJobRunner and RemoteJobRunner handle the different methods
> of
> >launching a Job, what additional value do the different types of
> >

Re: [VOTE] SEP-1: Semantics of ProcessorId in Samza

2017-03-28 Thread xinyu liu
+1 on my side. Very happy to see this proposal. This is a blocker for
integrating fluent API with StreamProcessor, and hopefully we can get it
resolved soon :).

Thanks,
Xinyu

On Tue, Mar 28, 2017 at 11:28 AM, Navina Ramesh (Apache) 
wrote:

> Hi everyone,
>
> This is a voting thread for SEP-1: Semantics of ProcessorId in Samza.
> For reference, here is the wiki link:
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-
> 1%3A+Semantics+of+ProcessorId+in+Samza
>
> Link to discussion mail thread:
> http://mail-archives.apache.org/mod_mbox/samza-dev/201703.
> mbox/%3CCANazzuuHiO%3DvZQyFbTiYU-0Sfh3riK%3Dz4j_AdCicQ8rBO%3DXuYQ%40mail.
> gmail.com%3E
>
> Please vote on this SEP asap. :)
>
> Thanks!
> Navina
>


Re: Review Request 52168: Tasks endpoint to list the complete details of all tasks related to a job

2017-03-16 Thread Xinyu Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52168/#review169231
---


Ship it!




Ship It!

- Xinyu Liu


On Feb. 16, 2017, 6:26 a.m., Shanthoosh Venkataraman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52168/
> ---
> 
> (Updated Feb. 16, 2017, 6:26 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> This patch contains the following changes
>  * HTTP GET API to list the complete details of all the tasks that belong to 
> a job. 
>  * Refactored some methods in coordinator stream, to reuse the existing 
> functionality of getting jobConfig from the coordinator stream.
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/rest/resource-directory.md 
> 79746d1e2eb3491e4bd26c3c7cf6c7efd150d8ef 
>   docs/learn/documentation/versioned/rest/resources/tasks.md PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/standalone/StandaloneJobCoordinator.java
>  46dbf30eb37f7d617fb868fb4e1561fef18522d5 
>   samza-core/src/main/java/org/apache/samza/zk/ZkJobCoordinator.java 
> PRE-CREATION 
>   
> samza-core/src/main/scala/org/apache/samza/coordinator/JobModelManager.scala 
> 14d5dff00758c6e35cae018b4ebaa686d67bb57d 
>   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 
> 022b480856483059cb9f837a08f97a718bc04c31 
>   samza-core/src/main/scala/org/apache/samza/util/Util.scala 
> 9019d02bc83e0e76a1b6a6fb9958733dfe8b86a4 
>   
> samza-core/src/test/java/org/apache/samza/execution/TestExecutionPlanner.java 
> PRE-CREATION 
>   samza-rest/src/main/config/samza-rest.properties 
> 7be0b47d1466d2199ae278247e8d81522fb6a91c 
>   samza-rest/src/main/java/org/apache/samza/rest/model/Partition.java 
> PRE-CREATION 
>   samza-rest/src/main/java/org/apache/samza/rest/model/Task.java PRE-CREATION 
>   
> samza-rest/src/main/java/org/apache/samza/rest/proxy/job/AbstractJobProxy.java
>  4d8647f3e1e650632e38b47ba5a8a2dac004f545 
>   
> samza-rest/src/main/java/org/apache/samza/rest/proxy/job/JobProxyFactory.java 
> 067711a74e5b0d7277a9c8b2d2517b56e9cfbcca 
>   
> samza-rest/src/main/java/org/apache/samza/rest/proxy/job/SimpleYarnJobProxy.java
>  a935c98730f85f448c688a6baf2e8ddffdbb2cb4 
>   
> samza-rest/src/main/java/org/apache/samza/rest/proxy/job/SimpleYarnJobProxyFactory.java
>  11d93d4608d23a4e3fb3bfc50dfac35ab6dbdf3c 
>   
> samza-rest/src/main/java/org/apache/samza/rest/proxy/task/SamzaTaskProxy.java 
> PRE-CREATION 
>   
> samza-rest/src/main/java/org/apache/samza/rest/proxy/task/SamzaTaskProxyFactory.java
>  PRE-CREATION 
>   samza-rest/src/main/java/org/apache/samza/rest/proxy/task/TaskProxy.java 
> PRE-CREATION 
>   
> samza-rest/src/main/java/org/apache/samza/rest/proxy/task/TaskProxyFactory.java
>  PRE-CREATION 
>   
> samza-rest/src/main/java/org/apache/samza/rest/proxy/task/TaskResourceConfig.java
>  PRE-CREATION 
>   
> samza-rest/src/main/java/org/apache/samza/rest/resources/BaseResourceConfig.java
>  PRE-CREATION 
>   
> samza-rest/src/main/java/org/apache/samza/rest/resources/DefaultResourceFactory.java
>  e0224c6bcf4aeaa336e5786ac472482507fcd382 
>   samza-rest/src/main/java/org/apache/samza/rest/resources/JobsResource.java 
> a566db598c284d69ea61af88fdc0851483d5a089 
>   
> samza-rest/src/main/java/org/apache/samza/rest/resources/JobsResourceConfig.java
>  527482d2ee55747e7b3f9c54c8a3b1afe7ad8797 
>   samza-rest/src/main/java/org/apache/samza/rest/resources/Responses.java 
> PRE-CREATION 
>   samza-rest/src/main/java/org/apache/samza/rest/resources/TasksResource.java 
> PRE-CREATION 
>   
> samza-rest/src/test/java/org/apache/samza/rest/resources/TestJobsResource.java
>  7db437b348ecd286185898b8f8ab0220d59da71a 
>   
> samza-rest/src/test/java/org/apache/samza/rest/resources/TestTasksResource.java
>  PRE-CREATION 
>   
> samza-rest/src/test/java/org/apache/samza/rest/resources/mock/MockInstallationFinder.java
>  PRE-CREATION 
>   
> samza-rest/src/test/java/org/apache/samza/rest/resources/mock/MockResourceFactory.java
>  PRE-CREATION 
>   
> samza-rest/src/test/java/org/apache/samza/rest/resources/mock/MockTaskProxy.java
>  PRE-CREATION 
>   
> samza-rest/src/test/java/org/apache/samza/rest/resources/mock/MockTaskProxyFactory.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/52168/diff/15/
> 
> 
> Testing
> ---
> 
> Manual and unit testing has been done to verify the apis.
> 
> 
> Thanks,
> 
> Shanthoosh Venkataraman
> 
>



Re: [DISCUSS] SEP-1: Semantics of ProcessorId in Samza

2017-03-16 Thread xinyu liu
Right, the static factory is very simple as you said. It's pretty
convenient for the client to use.

I am working on the ApplicationRunner SEP right now. Will send out the
discussion email once I am done.

Thanks,
Xinyu

On Thu, Mar 16, 2017 at 4:50 PM, Navina Ramesh (Apache) <nav...@apache.org>
wrote:

> > One minor thing I found is that the name of the config is camel case
> (*processor.idGenerator.class*). Seems Samza's practice is to use all
> lower
> case configs with "." delimiter. Do you think we should stick to this
> convention?
>
> I am always torn between the "convention" we have and the better way of
> doing things. But I don't have strong opinions about it. I can change it.
>
> > One more suggestion is to have a static factory method in the
> ProcessorIdGenerator (Like what we have in ApplicationRunner):
>
> I couldn't grasp these requirements from the ApplicationRunner design. It
> will be great if you can put it out in an SEP :)
>
> I can add the static factory method for it. Just to clarify, the static
> method simply class-loads the ProcessorIdGenerator? It uses reflection to
> create the instance?
>
> Thanks!
> Navina
>
>
>
> On Thu, Mar 16, 2017 at 4:31 PM, xinyu liu <xinyuliu...@gmail.com> wrote:
>
> > The proposal looks great to me! Changing the id type to string will make
> > sure this can work with other types of cluster which doesn't support
> > integer id. The interface and config provides a pluggable way to have
> > different id generators for different use cases. One minor thing I found
> is
> > that the name of the config is camel case (*processor.idGenerator.class*
> ).
> > Seems Samza's practice is to use all lower case configs with "."
> delimiter.
> > Do you think we should stick to this convention?
> >
> > One more suggestion is to have a static factory method in
> > the ProcessorIdGenerator (Like what we have in ApplicationRunner):
> >
> > static ProcessorIdGenerator fromConfig(Config config) { ... }
> >
> > With this, It will be more convenient for the ApplicationRunner to
> > construct the generator. What do you think?
> >
> > Thanks,
> > Xinyu
> >
> > On Wed, Mar 15, 2017 at 10:59 PM, Navina Ramesh (Apache) <
> > nav...@apache.org>
> > wrote:
> >
> > > Hi everyone,
> > > I created a proposal for SAMZA-1126, which addresses the semantics of
> > > ProcessorId in Samza. For most purposes, ProcessorId is same as the
> > logical
> > > id that Samza assigns for each Yarn container. It is primarily used in
> > > JobModel as a key for the corresponding ContainerModel and also, in
> > > container-level metrics. We are expanding the applicability of
> > processorId
> > > to be beyond a fixed set of processors.
> > >
> > > Please review and comment on this SEP.
> > >
> > > For those who are not actively following the master branch, you may
> have
> > > more questions than others. Feel free to ask them here.
> > >
> > > @Xinyu: Since you are working on SAMZA-1067 and other related
> integration
> > > APIs, can you please add an SEP for SAMZA-1067 ? This will help others
> > > (and
> > > me as well) get on the same page with your design/code. Let me know if
> > > SEP-1 will work per your design for ApplicationRunner.
> > >
> > > Thanks!
> > > Navina
> > >
> >
>


Re: [DISCUSS] SEP-1: Semantics of ProcessorId in Samza

2017-03-16 Thread xinyu liu
The proposal looks great to me! Changing the id type to string will make
sure this can work with other types of clusters that don't support
integer ids. The interface and config provide a pluggable way to have
different id generators for different use cases. One minor thing I found is
that the name of the config is camel case (*processor.idGenerator.class*).
Samza's practice seems to be all-lowercase configs with a "." delimiter.
Do you think we should stick to this convention?

One more suggestion is to have a static factory method in
ProcessorIdGenerator (like what we have in ApplicationRunner):

static ProcessorIdGenerator fromConfig(Config config) { ... }

With this, it will be more convenient for the ApplicationRunner to
construct the generator. What do you think?
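
To make the suggestion concrete, here is a minimal sketch of such a factory. It is only an illustration under stated assumptions, not the SEP's actual code: a plain Map<String, String> stands in for Samza's Config, the generateProcessorId method and the HostNameIdGenerator class are hypothetical, and the config key is the camel-case name quoted above, which may still change.

```java
import java.util.Map;

/** Hypothetical stand-in for the interface proposed in SEP-1. */
interface ProcessorIdGenerator {

  // Config key as written in this thread; the final name was still under discussion.
  String CONFIG_KEY = "processor.idGenerator.class";

  // Assumed method shape, for illustration only.
  String generateProcessorId(Map<String, String> config);

  /** Loads the configured implementation by class name and creates it via reflection. */
  static ProcessorIdGenerator fromConfig(Map<String, String> config) {
    String className = config.get(CONFIG_KEY);
    if (className == null || className.isEmpty()) {
      throw new IllegalArgumentException("Missing config: " + CONFIG_KEY);
    }
    try {
      Class<?> clazz = Class.forName(className);
      // Requires an accessible no-argument constructor on the implementation.
      return (ProcessorIdGenerator) clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException | ClassCastException e) {
      throw new IllegalStateException("Cannot create " + className, e);
    }
  }
}

/** Hypothetical implementation: derives the id from a host name found in the config. */
final class HostNameIdGenerator implements ProcessorIdGenerator {
  public HostNameIdGenerator() {}

  @Override
  public String generateProcessorId(Map<String, String> config) {
    return config.getOrDefault("host.name", "localhost") + "-processor";
  }
}
```

A caller such as the ApplicationRunner would then only need ProcessorIdGenerator.fromConfig(config) and would not have to know which implementation is configured.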

Thanks,
Xinyu

On Wed, Mar 15, 2017 at 10:59 PM, Navina Ramesh (Apache) 
wrote:

> Hi everyone,
> I created a proposal for SAMZA-1126, which addresses the semantics of
> ProcessorId in Samza. For most purposes, ProcessorId is same as the logical
> id that Samza assigns for each Yarn container. It is primarily used in
> JobModel as a key for the corresponding ContainerModel and also, in
> container-level metrics. We are expanding the applicability of processorId
> to be beyond a fixed set of processors.
>
> Please review and comment on this SEP.
>
> For those who are not actively following the master branch, you may have
> more questions than others. Feel free to ask them here.
>
> @Xinyu: Since you are working on SAMZA-1067 and other related integration
> APIs, can you please add an SEP for SAMZA-1067? This will help others (and
> me as well) get on the same page with your design/code. Let me know if
> SEP-1 will work per your design for ApplicationRunner.
>
> Thanks!
> Navina
>


Re: Review Request 52168: Tasks endpoint to list the complete details of all tasks related to a job

2017-03-15 Thread Xinyu Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52168/#review169086
---



Please rebase this patch with master.


samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala
Line 64 (original), 78 (patched)
<https://reviews.apache.org/r/52168/#comment241409>

Add Javadoc indicating that this not only reads the JobModel but also 
updates some other state.



samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala
Line 120 (original), 155 (patched)
<https://reviews.apache.org/r/52168/#comment241408>

Make this private and don't expose it outside the class.



samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala
Line 188 (original), 224 (patched)
<https://reviews.apache.org/r/52168/#comment241407>

Rename this to getJobModel. In the Javadoc, make clear that there is no side effect.


- Xinyu Liu


On Feb. 16, 2017, 6:26 a.m., Shanthoosh Venkataraman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52168/
> ---
> 
> (Updated Feb. 16, 2017, 6:26 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> This patch contains the following changes
>  * HTTP GET API to list the complete details of all the tasks that belong to 
> a job. 
>  * Refactored some methods in coordinator stream, to reuse the existing 
> functionality of getting jobConfig from the coordinator stream.
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/rest/resource-directory.md 
> 79746d1e2eb3491e4bd26c3c7cf6c7efd150d8ef 
>   docs/learn/documentation/versioned/rest/resources/tasks.md PRE-CREATION 
>   samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala 
> df63b97e9d598ecd1840111ba490a723e410d089 
>   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 
> 022b480856483059cb9f837a08f97a718bc04c31 
>   samza-core/src/main/scala/org/apache/samza/util/Util.scala 
> c4836f202f7eda1d4e71eac94fd48e46207b0316 
>   samza-rest/src/main/config/samza-rest.properties 
> 7be0b47d1466d2199ae278247e8d81522fb6a91c 
>   samza-rest/src/main/java/org/apache/samza/rest/model/Partition.java 
> PRE-CREATION 
>   samza-rest/src/main/java/org/apache/samza/rest/model/Task.java PRE-CREATION 
>   
> samza-rest/src/main/java/org/apache/samza/rest/proxy/job/AbstractJobProxy.java
>  4d8647f3e1e650632e38b47ba5a8a2dac004f545 
>   
> samza-rest/src/main/java/org/apache/samza/rest/proxy/job/JobProxyFactory.java 
> 067711a74e5b0d7277a9c8b2d2517b56e9cfbcca 
>   
> samza-rest/src/main/java/org/apache/samza/rest/proxy/job/SimpleYarnJobProxy.java
>  a935c98730f85f448c688a6baf2e8ddffdbb2cb4 
>   
> samza-rest/src/main/java/org/apache/samza/rest/proxy/job/SimpleYarnJobProxyFactory.java
>  11d93d4608d23a4e3fb3bfc50dfac35ab6dbdf3c 
>   
> samza-rest/src/main/java/org/apache/samza/rest/proxy/task/SamzaTaskProxy.java 
> PRE-CREATION 
>   
> samza-rest/src/main/java/org/apache/samza/rest/proxy/task/SamzaTaskProxyFactory.java
>  PRE-CREATION 
>   samza-rest/src/main/java/org/apache/samza/rest/proxy/task/TaskProxy.java 
> PRE-CREATION 
>   
> samza-rest/src/main/java/org/apache/samza/rest/proxy/task/TaskProxyFactory.java
>  PRE-CREATION 
>   
> samza-rest/src/main/java/org/apache/samza/rest/proxy/task/TaskResourceConfig.java
>  PRE-CREATION 
>   
> samza-rest/src/main/java/org/apache/samza/rest/resources/BaseResourceConfig.java
>  PRE-CREATION 
>   
> samza-rest/src/main/java/org/apache/samza/rest/resources/DefaultResourceFactory.java
>  e0224c6bcf4aeaa336e5786ac472482507fcd382 
>   samza-rest/src/main/java/org/apache/samza/rest/resources/JobsResource.java 
> a566db598c284d69ea61af88fdc0851483d5a089 
>   
> samza-rest/src/main/java/org/apache/samza/rest/resources/JobsResourceConfig.java
>  527482d2ee55747e7b3f9c54c8a3b1afe7ad8797 
>   samza-rest/src/main/java/org/apache/samza/rest/resources/Responses.java 
> PRE-CREATION 
>   samza-rest/src/main/java/org/apache/samza/rest/resources/TasksResource.java 
> PRE-CREATION 
>   
> samza-rest/src/test/java/org/apache/samza/rest/resources/TestJobsResource.java
>  7db437b348ecd286185898b8f8ab0220d59da71a 
>   
> samza-rest/src/test/java/org/apache/samza/rest/resources/TestTasksResource.java
>  PRE-CREATION 
>   
> samza-rest/src/test/java/org/apache/samza/rest/resources/mock/MockInstallationFinder.java
>  PRE-CREATION 
>   
> samza-rest/src/test/java/org/apache/samza/rest/resources/mock/MockResourceFac

Re: [DISCUSS] SAMZA-1141 - Apache Samza Development Process Improvements

2017-03-14 Thread xinyu liu
+1 on this proposal too. Could you actually put this proposal as the first
SEP (like SEP-0), so it serves as an example of what it will look like in
practice?

Xinyu

On Tue, Mar 14, 2017 at 3:34 PM, Navina Ramesh  wrote:

> Just to clarify: The proposal for code and design process change is
> attached as a PDF/markdown to the JIRA - SAMZA-1141.
>
> Also, please show your support specifically for code and design process. My
> bad for not calling it out earlier :)
>
> Thanks!
> Navina
>
> On Tue, Mar 14, 2017 at 3:30 PM, Jagadish Venkatraman <
> jagadish1...@gmail.com> wrote:
>
> > Thanks for writing this up.
> >
> > I'm +1 on this proposal.
> >
> >
> >
> > On Tue, Mar 14, 2017 at 3:15 PM, Navina Ramesh (Apache) <
> nav...@apache.org
> > >
> > wrote:
> >
> > > Hi everyone,
> > >
> > > We switched to using Pull Requests for code reviews a few months back.
> > > Clearly, there are some drawbacks to that model and we are trying to
> > > > address the shortcomings. I have gathered input from some of the
> > > > committers regarding what is missing from the code review process and
> > > > what can be improved.
> > > > Please take a look and provide feedback.
> > >
> > > > Additionally, we are considering moving to a KIP/FLIP-like model for
> > > > submitting design proposals (major changes to Samza). Lately, there
> > > > have been some major feature discussions that are not documented
> > > > consistently in a centralized location. The proposal in SAMZA-1141
> > > > addresses the design review process as well. Please review it too. I
> > > > have already created a wiki page describing the Samza Enhancement
> > > > Proposal (SEP) process and an SEP template. Going forward, let's start
> > > > adding all major change proposals to the wiki and discuss the design on
> > > > the mailing list.
> > >
> > > Your cooperation is highly appreciated during this period of transition
> > in
> > > the process :)
> > >
> > > > Feedback welcome!
> > >
> > > Thanks!
> > > --
> > > Navina R
> > >
> > > > PS: Alternative name suggestions for "SEP" are welcome!
> > >
> >
> >
> >
> > --
> > Jagadish V,
> > Graduate Student,
> > Department of Computer Science,
> > Stanford University
> >
>
>
>
> --
> Navina R.
>


Re: Understanding metrics

2017-03-14 Thread xinyu liu
Hi, Ankit,

When running your job with multithreading, block-ns actually includes
process-ns. This is because after your task.process() is submitted to
the thread pool, the run loop thread is blocked until process() completes
for one of the tasks. It's interesting that block-ns (0.3 ms) is
much longer than process-ns (0.12 ms). I am wondering whether you also have
window and checkpoint configured for your job. Since window and checkpoint
also run inside this thread pool to improve parallelism, block-ns will be
affected, because the run loop blocks until window/checkpoint completes.
If you are using window/commit, please send
us the configs (task.window.ms and task.commit.ms) and the timer metrics
(window-ns and commit-ns). Then we can correlate them better with block-ns.
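
To illustrate why block-ns includes process-ns, here is a small stand-alone sketch. It is not Samza's run loop: RunLoopTimingSketch and timeOneMessage are hypothetical names, the sleep stands in for task.process(), and the pool size of 3 just mirrors job.container.thread.pool.size from this thread. The caller hands work to a pool and waits for it, so the time it spends blocked can never be shorter than the processing time measured inside the pool.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class RunLoopTimingSketch {

  /** Returns {blockNs, processNs} for one simulated message. */
  static long[] timeOneMessage(ExecutorService pool, long workMillis) {
    long blockStart = System.nanoTime();
    Future<Long> processTime = pool.submit(() -> {
      long processStart = System.nanoTime();
      Thread.sleep(workMillis); // stands in for task.process()
      return System.nanoTime() - processStart;
    });
    try {
      long processNs = processTime.get(); // the calling "run loop" thread blocks here
      long blockNs = System.nanoTime() - blockStart;
      return new long[] {blockNs, processNs};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(3);
    try {
      long[] t = timeOneMessage(pool, 5);
      System.out.printf("block-ns=%d process-ns=%d%n", t[0], t[1]);
    } finally {
      pool.shutdown();
    }
  }
}
```

Any extra work that runs in the same pool while the caller waits (window or commit in the real job) would widen the gap between the two numbers in the same way.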

Thanks,
Xinyu

On Tue, Mar 14, 2017 at 3:33 PM, Ankit Malhotra 
wrote:

> Wait, block-ns = 0.3ms (300,000ns). Also, why are we not adding in
> choose-ns?
>
> Thanks
> Ankit
>
> On 3/14/17, 6:26 PM, "Jagadish Venkatraman" 
> wrote:
>
> I would expect (process_ns + block_ns) to be almost the same as 0.15
> which
> makes sense.
>
> process_ns = 0.12 ms
> block_ns = 0.03 ms
> process_ns + block_ns ~ 0.15ms
>
> This corresponds to the number of process calls roughly 1/7000 ~
> 0.15ms per
> process call.
>
> >> Each process call is a separate thread.
> Given that you have one partition in each container, and you want
> in-order
> processing, there will be only one thread that's processing messages.
> The
> two other threads are not doing work, and entail a constant (albeit
> insignificant) synchronization overhead.
>
>
>
>
>
> On Tue, Mar 14, 2017 at 3:03 PM, Ankit Malhotra <
> amalho...@appnexus.com>
> wrote:
>
> > Hi,
> >
> > We are trying to understand metrics that are being populated by our
> samza
> > job and are a little confused what each of these metrics mean
> especially
> > since we’re running the job with a thread pool.
> >
> >
> > · We have 3 input streams
> >
> > · job.container.thread.pool.size=3
> >
> > · 1 container per partition
> >
> > · Using a RocksDB backed store with changelogging
> >
> > · process-ns = 120,000
> >
> > · get-ns ~ 30,000
> >
> > · put-ns ~ 90,000
> >
> > · block-ns ~ 300,000
> >
> > · choose-ns ~ 500,000
> >
> > Metrics are avg(metric) for all containers/partitions.
> >
> > Process-envelopes ~ 7000/sec.
> >
> > If I back the math out, this correlates quite closely to process-ns.
> > (1/7000 ~ 0.15ms).
> >
> > What I don’t understand is that the event loop is single threaded.
> > Even though each process call is a separate thread, the main event loop is
> > blocking (block-ns) and choosing (choose-ns) every time, and these
> > times are quite high. Given these metrics, how is it that we are
> > consistently seeing the above metrics?
> >
> > Also, lag (messages behind high watermark) is ~ 0.
> >
> > Thanks
> > Ankit
> >
> >
> >
> >
> >
> >
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>
>
>


Re: Review Request 56928: SAMZA-1099: Changes to hello-samza (latest branch) for the 0.12.0 release

2017-02-22 Thread Xinyu Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56928/#review166399
---


Ship it!




Ship It!

- Xinyu Liu


On Feb. 22, 2017, 3:45 p.m., Jagadish Venkatraman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56928/
> ---
> 
> (Updated Feb. 22, 2017, 3:45 p.m.)
> 
> 
> Review request for samza, Jake Maes, Prateek Maheshwari, Xinyu Liu, and Yi 
> Pan (Data Infrastructure).
> 
> 
> Repository: samza-hello-samza
> 
> 
> Description
> ---
> 
> * Upgrade kafka version to 0.10.
> * Upgrade the kafka broker versions downloaded in bin/grid scripts to 0.10
> * Fix all binary dependencies to point to _2.11 instead of _2.10 because of 
> changes in scala version.
> * Update hello-samza version to 0.13.0
> * Update the samza version being used by hello-samza to 0.13.0-SNAPSHOT.
> 
> 
> Diffs
> -
> 
>   bin/grid 74ee026987e68d431785fd690b1a6a3501c5953d 
>   build.gradle e838b1806e5925c6837966e4dc74b7cd9dc32260 
>   gradle.properties 3d8a0ea504f2a2d628162730a77c2ee79b121a30 
>   pom.xml d90b1f90a354006724907835635068b2cf06aae6 
>   src/main/assembly/src.xml f57fee2acb0c27f78761efdcc63ed8208a1888bd 
> 
> Diff: https://reviews.apache.org/r/56928/diff/
> 
> 
> Testing
> ---
> 
> Tested, and deployed the changes with sample jobs.
> Verified hello-samza was working with the latest samza-0.13.0-SNAPSHOT 
> version.
> 
> 
> Thanks,
> 
> Jagadish Venkatraman
> 
>



Re: Review Request 56909: SAMZA-1099: Changes to hello-samza (master branch) for the 0.12.0 release

2017-02-22 Thread Xinyu Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56909/#review166398
---


Ship it!




Ship It!

- Xinyu Liu


On Feb. 22, 2017, 3:55 p.m., Jagadish Venkatraman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56909/
> ---
> 
> (Updated Feb. 22, 2017, 3:55 p.m.)
> 
> 
> Review request for samza, Jake Maes, Navina Ramesh, Prateek Maheshwari, Xinyu 
> Liu, and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza-hello-samza
> 
> 
> Description
> ---
> 
> * Upgrade kafka version to 0.10.
> * Upgrade the kafka broker versions downloaded in bin/grid scripts to 0.10
> * Fix all binary dependencies to point to _2.11 instead of _2.10 because of 
> changes in scala version.
> * Update hello-samza version to 0.12.0
> * Update the samza version being used by hello-samza to 0.12.0.
> 
> 
> Diffs
> -
> 
>   bin/grid 042cabe355ec3a8fff06f193252fec494d01a864 
>   build.gradle 1a597801c4f2025f02e1f9d890a4d0419e6a3980 
>   gradle.properties 2605f933fdf69f7e8c4903411a6b46763ef58fb4 
>   pom.xml 4e45c7e5a902bfa46bbd78b438edef88ac60a8fa 
>   src/main/assembly/src.xml f57fee2acb0c27f78761efdcc63ed8208a1888bd 
> 
> Diff: https://reviews.apache.org/r/56909/diff/
> 
> 
> Testing
> ---
> 
> Deployed 3 sample jobs, and tested each of them by consuming outputs.
> 
> 
> Thanks,
> 
> Jagadish Venkatraman
> 
>



Re: Review Request 56913: SAMZA-1099: Documentation updates for Samza 0.12 release (for master branch)

2017-02-22 Thread Xinyu Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56913/#review166396
---


Ship it!




Ship It!

- Xinyu Liu


On Feb. 22, 2017, 7:29 p.m., Jagadish Venkatraman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56913/
> ---
> 
> (Updated Feb. 22, 2017, 7:29 p.m.)
> 
> 
> Review request for samza, Jake Maes, Navina Ramesh, Prateek Maheshwari, Xinyu 
> Liu, and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> SAMZA-1099: Documentation updates for Samza 0.12 release (for master branch)
> 
> 
> Diffs
> -
> 
>   docs/_config.yml 2b5bf059f5d1c67fb9bddf6f9cf1fc606a73b428 
>   docs/_layouts/default.html dfcb6d06223961f0c6a5886845da2fa1806be608 
>   docs/archive/index.html e92f50e0dfbb7856cf5b94b026d409fb5b6ba2a4 
>   docs/learn/documentation/versioned/jobs/split-deployment.md 
> ebab670ae0476f3dd2bba8d0ce0c893135cf2ea2 
>   docs/learn/tutorials/versioned/deploy-samza-job-from-hdfs.md 
> 6f7dcc1107b9b2ad5ab83449efbe9780a600700c 
>   docs/learn/tutorials/versioned/deploy-samza-to-CDH.md 
> fff209f558aceb4069e40b533e4f3ae8d1419df0 
>   docs/learn/tutorials/versioned/remote-debugging-samza.md 
> 7cc3a0ebf59f76e2191df7b0ddc6cb690b80bba6 
>   docs/learn/tutorials/versioned/run-in-multi-node-yarn.md 
> 7e7ba8de537c2b28e581ba6f369ab147a74cddc6 
>   docs/learn/tutorials/versioned/samza-rest-getting-started.md 
> c0e1cf5eb1aba5cd87a62cbc3248336e47072b7b 
>   docs/startup/download/index.md 7dfdfda21caf874b72f45e2e25db6ad9980b8342 
>   docs/startup/hello-samza/versioned/index.md 
> d0aca547cfbf38aaa24ef0d2c6cf3b29a17fb64b 
> 
> Diff: https://reviews.apache.org/r/56913/diff/
> 
> 
> Testing
> ---
> 
> Built, tested and verified that the website loads locally.
> 
> 
> Thanks,
> 
> Jagadish Venkatraman
> 
>



Re: Review Request 56911: SAMZA-1099: Documentation updates for Samza 0.12 release (for 0.12.0 branch)

2017-02-22 Thread Xinyu Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56911/#review166393
---


Ship it!




Ship It!

- Xinyu Liu


On Feb. 22, 2017, 4:27 p.m., Jagadish Venkatraman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56911/
> ---
> 
> (Updated Feb. 22, 2017, 4:27 p.m.)
> 
> 
> Review request for samza, Jake Maes, Navina Ramesh, Prateek Maheshwari, Xinyu 
> Liu, and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> SAMZA-1099: Documentation updates for Samza 0.12 release (for 0.12.0 branch)
> 
> 
> Diffs
> -
> 
>   docs/_config.yml 2b5bf059f5d1c67fb9bddf6f9cf1fc606a73b428 
>   docs/learn/documentation/versioned/jobs/split-deployment.md 
> ebab670ae0476f3dd2bba8d0ce0c893135cf2ea2 
>   docs/learn/tutorials/versioned/deploy-samza-job-from-hdfs.md 
> 6f7dcc1107b9b2ad5ab83449efbe9780a600700c 
>   docs/learn/tutorials/versioned/deploy-samza-to-CDH.md 
> fff209f558aceb4069e40b533e4f3ae8d1419df0 
>   docs/learn/tutorials/versioned/remote-debugging-samza.md 
> 7cc3a0ebf59f76e2191df7b0ddc6cb690b80bba6 
>   docs/learn/tutorials/versioned/run-in-multi-node-yarn.md 
> 7e7ba8de537c2b28e581ba6f369ab147a74cddc6 
>   docs/learn/tutorials/versioned/samza-rest-getting-started.md 
> c0e1cf5eb1aba5cd87a62cbc3248336e47072b7b 
>   docs/startup/download/index.md 7dfdfda21caf874b72f45e2e25db6ad9980b8342 
>   docs/startup/hello-samza/versioned/index.md 
> d0aca547cfbf38aaa24ef0d2c6cf3b29a17fb64b 
> 
> Diff: https://reviews.apache.org/r/56911/diff/
> 
> 
> Testing
> ---
> 
> - Built and verified that all versions of the website (0.12,0.11,0.10, 
> latest) were displaying correctly.
> 
> 
> Thanks,
> 
> Jagadish Venkatraman
> 
>



Re: [VOTE] Apache Samza 0.12.0 RC2

2017-02-08 Thread xinyu liu
Ran build, checkAll and integration tests. All passed.

+1 non-binding.
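
For anyone who wants to script the "check the hashes" step from the vote call quoted below, here is a rough sketch using only the JDK. ChecksumSketch and hexDigest are hypothetical names, and the artifact path, algorithm and expected digest are passed on the command line rather than taken from any real release. It does not cover the PGP signature check, which still needs gpg.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ChecksumSketch {

  /** Returns the lowercase hex digest of data for a JCA algorithm name such as "MD5" or "SHA-256". */
  static String hexDigest(byte[] data, String algorithm) {
    try {
      byte[] digest = MessageDigest.getInstance(algorithm).digest(data);
      StringBuilder hex = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        hex.append(String.format("%02x", b & 0xff));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalArgumentException("Unknown digest algorithm: " + algorithm, e);
    }
  }

  /** Usage: java ChecksumSketch <artifact> <algorithm> <expected-hex> */
  public static void main(String[] args) throws IOException {
    // Reads the whole file into memory, which is fine for a sketch but not for huge artifacts.
    byte[] artifact = Files.readAllBytes(Paths.get(args[0]));
    String actual = hexDigest(artifact, args[1]);
    boolean matches = actual.equalsIgnoreCase(args[2].trim());
    System.out.println(matches ? "checksum OK" : "checksum MISMATCH: " + actual);
  }
}
```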

Thanks,
Xinyu

On Wed, Feb 8, 2017 at 4:18 PM, Boris S  wrote:

> Cloned the release and ran build, test and checkAll.sh
> All passed.
> Verified MD5 and the signature.
> Got warning - "this key is not certified with a trusted signature". I guess
> it is ok.
>
> +1
>
> On Mon, Feb 6, 2017 at 5:32 PM, Jagadish Venkatraman <
> jagadish1...@gmail.com
> > wrote:
>
> > This is a call for a vote on a release of Apache Samza 0.12.0. Thanks to
> > everyone who has contributed to this release. We are very glad to see
> some
> > new contributors in this release.
> >
> > The release candidate can be downloaded from here:
> > http://home.apache.org/~jagadish/samza-0.12.0-rc2/
> >
> > The release candidate is signed with pgp key AF81FFBF, which can be found
> > on keyservers:
> > http://pgp.mit.edu/pks/lookup?op=get=0xAF81FFBF
> >
> > The git tag is release-0.12.0-rc2 and signed with the same pgp key:
> > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=
> > refs/tags/release-0.12.0-rc2
> >
> > Test binaries have been published to Maven's staging repository, and are
> > available here:
> > https://repository.apache.org/content/repositories/orgapachesamza-1018
> >
> > Note that the binaries were built with JDK8 without incident.
> >
> > 26 issues were resolved for this release:
> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20S
> > AMZA%20AND%20fixVersion%20in%20(0.12%2C%200.12.0)%20AND%20st
> > atus%20in%20(Resolved%2C%20Closed)
> >
> > The vote will be open for 72 hours (ends at 6 PM Thursday, 02/09/2017).
> >
> > Please download the release candidate, check the hashes/signature, build
> it
> > and test it, and then please vote:
> >
> >
> > [ ] +1 approve
> >
> > [ ] +0 no opinion
> >
> > [ ] -1 disapprove (and reason why)
> >
> >
> > +1 from my side for the release.
> >
> > Cheers!
> >
> > --
> > Jagadish V,
> > Graduate Student,
> > Department of Computer Science,
> > Stanford University
> >
>


Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

2016-11-11 Thread Xinyu Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review155740
---




docs/learn/documentation/versioned/hdfs/consumer.md (line 22)
<https://reviews.apache.org/r/52570/#comment225813>

Please add more details about how to extend it to read other record types, 
such as which interface to extend. More technical detail will be useful to 
other users.



docs/learn/documentation/versioned/hdfs/consumer.md (line 66)
<https://reviews.apache.org/r/52570/#comment225814>

This doesn't look great to users. Can we put the details here instead 
of just referring to a JIRA? Since this is user-facing documentation, please 
add the complete information.

Another note: the doc doesn't mention the job's behavior at end of stream. 
That needs to be described here in detail too.



docs/learn/documentation/versioned/hdfs/producer.md (line 70)
<https://reviews.apache.org/r/52570/#comment225812>

It seems odd to me that the entry point to the consumer page is on the 
producer page. I would suggest adding the link in versioned/index.html, 
something like "Reading from HDFS" before the "Writing to HDFS" link.


- Xinyu Liu


On Oct. 5, 2016, 8:54 p.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> ---
> 
> (Updated Oct. 5, 2016, 8:54 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
> https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> documentation of hdfs consumer
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md 
> b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> f60cd50fb197423ac3c84fd364bbe4fb3767883e 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>



  1   2   3   >