Re: [ANNOUNCE] Please welcome Boris Shkolnik to the Samza PMC

2019-06-07 Thread Navina Ramesh
Yaay, Boris! Congrats! 

From: Daniel Nishimura 
Sent: Friday, June 7, 2019 3:40 PM
To: dev@samza.apache.org
Subject: Re: [ANNOUNCE] Please welcome Boris Shkolnik to the Samza PMC

Congrats!

On Fri, Jun 7, 2019 at 3:35 PM Ignacio Solis  wrote:

> Congrats Boris!
>
> On Fri, Jun 7, 2019 at 3:20 PM Bharath Kumara Subramanian <
> codin.mart...@gmail.com> wrote:
>
> > Congratulations Boris!
> >
> > On Fri, Jun 7, 2019 at 3:19 PM Jagadish Venkatraman <
> > jagadish1...@gmail.com>
> > wrote:
> >
> > > Congratulations Boris!
> > >
> > > On Fri, Jun 7, 2019 at 3:15 PM Xinyu Liu 
> wrote:
> > >
> > > > Congrats, Boris!
> > > >
> > > > Xinyu
> > > >
> > > > On Fri, Jun 7, 2019 at 3:13 PM Jakob Homan 
> wrote:
> > > >
> > > > > Howdy all-
> > > > > > I'm very pleased to announce that the Samza PMC has voted Boris
> > > > > > Shkolnik to be a Project Management Committee (PMC) Member. The PMC
> > > > > > is responsible for the overall health of a project and for voting in
> > > > > > new committers and PMC members, as well as VOTEing on releases. Over
> > > > > > the past two years, Boris has been a valuable committer on the
> > > > > > project.
> > > > >
> > > > > Congrats Boris!
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jakob
> > > > > on behalf of the Samza PMC
> > > > >
> > > >
> > >
> > >
> > > --
> > > Jagadish V,
> > > Graduate Student,
> > > Department of Computer Science,
> > > Stanford University
> > >
> >
>
>
> --
> Nacho - Ignacio Solis - iso...@igso.net
>


Re: [VOTE] Migration of Samza git repo to gitbox.apache.org

2019-01-23 Thread Navina Ramesh
It looks like a mandatory migration. Why do we need a vote for this?


Thanks!
Navina


From: Pawas Chhokra 
Sent: Wednesday, January 23, 2019 11:50:32 AM
To: dev@samza.apache.org
Subject: [VOTE] Migration of Samza git repo to gitbox.apache.org

Hi all,

This is a call for a vote on migrating the Samza git repo to gitbox.apache.org at
11 AM on Jan 29, 2019. As mandated by the Apache Infrastructure Team, all git
repositories must be migrated from git-wip-us.apache.org URL to
gitbox.apache.org, as the old service is being decommissioned.
The vote will be open for 72 hours (ending at 12:00 PM PST Monday,
January 28). You can vote as follows:

[ ] +1 approve

[ ] +0 no opinion

[ ] -1 disapprove (and reason why)

The vote is +1 from my side.

Thanks & Regards,
Pawas Chhokra


Re: [VOTE] SEP-12: Integration Test Framework

2018-05-17 Thread Navina Ramesh
+1 can't wait to have this in! :)


From: Jacob Maes 
Sent: Thursday, May 17, 2018 10:13:57 AM
To: dev@samza.apache.org
Subject: Re: [VOTE] SEP-12: Integration Test Framework

+1

On Thu, May 17, 2018 at 9:56 AM, Jagadish Venkatraman <
jagadish1...@gmail.com> wrote:

> Thanks Sanil for the proposal. This will go a long way in simplifying
> testing of Samza applications.
>
> +1 (binding)
>
>
>
> On Thu, May 17, 2018 at 9:45 AM, Daniel Nishimura 
> wrote:
>
> > +1
> >
> > Looks great!
> >
> > On Thu, May 17, 2018 at 9:08 AM, Xinyu Liu 
> wrote:
> >
> > > +1
> > >
> > > > The proposal looks great to me. Look forward to seeing the
> > > > implementation.
> > >
> > > Thanks,
> > > Xinyu
> > >
> > > On Wed, May 16, 2018 at 6:12 PM, Sanil Jain 
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > > This is a call for a vote for Samza's Integration Test Framework as
> > > > > described by:
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/SAMZA/SEP-12%3A+Integration+Test+Framework
> > > > >
> > > > > The vote will be open for 3 days (ending at 6:00PM Monday, 05/21/2018).
> > > > >
> > > > > Link to the discuss mailing thread:
> > > > >
> > > > > http://mail-archives.apache.org/mod_mbox/samza-dev/201805.mbox/%3CDM5PR21MB02827A6FA9F47CB8EF99A339A2810%40DM5PR21MB0282.namprd21.prod.outlook.com%3E
> > > >
> > > >
> > > > Please vote:
> > > >
> > > > [ ] +1 approve
> > > >
> > > > [ ] +0 no opinion
> > > >
> > > > [ ] -1 disapprove (and reason why)
> > > >
> > > > Thanks
> > > >
> > >
> >
>
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>


Re: Welcome Xinyu as new Samza PMC!

2018-01-17 Thread Navina Ramesh
Congratulations, Xinyu!
Thanks for all your contributions, and looking forward to more!


Cheers!
Navina


From: Yi Pan 
Sent: Wednesday, January 17, 2018 10:26:54 AM
To: dev@samza.apache.org
Subject: Welcome Xinyu as new Samza PMC!

Finally, all the documentation procedures are complete and Xinyu Liu has been
officially promoted to Samza PMC member! This is well deserved due to his
continued contributions to the Samza project.

Please join me in welcoming Xinyu as our newest PMC member!

Cheers!

-Yi Pan


Re: [VOTE] Apache Samza 0.14.0 RC5

2017-12-22 Thread Navina Ramesh
+1 on RC5.


Verified signature and checksum. Ran ./bin/check-all.sh
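
For readers who want to repeat these checks, here is a minimal sketch rather than the exact procedure used for this vote. The thread does not say which digest the release shipped with, so SHA-256 and the helper name verify_sha256 are assumptions; the file names in the trailing comment are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Samza): compare a file's SHA-256 digest
# with an expected hex string. Returns 0 on match, non-zero otherwise.
verify_sha256() {
  local file="$1" expected="$2" actual expected_lc
  # sha256sum prints "<digest>  <file>"; keep only the digest.
  actual="$(sha256sum "$file" 2>/dev/null | awk '{print $1}')"
  # Published digests vary in case, so normalise the expected value.
  expected_lc="$(printf '%s' "$expected" | tr 'A-F' 'a-f')"
  [ -n "$actual" ] && [ "$actual" = "$expected_lc" ]
}

# Signature check with GnuPG (placeholder file names):
#   gpg --verify <artifact>.asc <artifact>
```

Usage would look like: verify_sha256 <artifact> <published-digest> && echo "checksum OK". The signing key has to be imported first for the gpg check to succeed.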


Cheers!
Navina


From: xinyu liu 
Sent: Friday, December 22, 2017 2:50:48 PM
To: dev@samza.apache.org
Subject: [VOTE] Apache Samza 0.14.0 RC5

This is a call for a vote on a release of Apache Samza 0.14.0. Thanks to
everyone who has contributed to this release.

The release candidate can be downloaded from here:
http://home.apache.org/~xinyu/samza-0.14.0-rc5/

The release candidate is signed with pgp key C31D7061, which can be found on
keyservers:
http://pgp.mit.edu/pks/lookup?op=get&search=0x35964389C31D7061

The git tag is release-0.14.0-rc5 and signed with the same pgp key:
https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-0.14.0-rc5

Test binaries have been published to Maven's staging repository, and are
available here:
https://repository.apache.org/content/repositories/orgapachesamza-1042

61 issues have been resolved as part of this release:
https://issues.apache.org/jira/browse/SAMZA-1519?jql=project%20%3D%20SAMZA%20AND%20fixVersion%20%3D%200.14.0%20AND%20status%20%3D%20Resolved

The vote will be open for 72 hours (ending at 3:00 PM Thursday,
12/28/2017).

Please download the release candidate, check the hashes/signature, build it
and test it, and then please vote:

[ ] +1 approve

[ ] +0 no opinion

[ ] -1 disapprove (and reason why)

Thanks,
Xinyu


Re: [DISCUSS] Samza 0.14.0 release

2017-11-28 Thread Navina Ramesh
The list looks awesome! :)


From: Jacob Maes 
Sent: Tuesday, November 28, 2017 10:03:58 AM
To: dev@samza.apache.org
Subject: Re: [DISCUSS] Samza 0.14.0 release

+1

On Mon, Nov 27, 2017 at 8:15 PM, Fred Haifeng Ji 
wrote:

> +1! Thanks Bharath!
>
> Fred
>
> On Mon, Nov 27, 2017 at 11:10 AM, Yi Pan  wrote:
>
> > Thanks for driving this! +1
> >
> > A few minor things that are pending that I think we should pull in:
> > 1) https://issues.apache.org/jira/browse/SAMZA-1459
> > 2) https://github.com/apache/samza/pull/302
> > 3) https://github.com/apache/samza/pull/301
> > 4) https://github.com/apache/samza/pull/286
> > 5) https://issues.apache.org/jira/browse/SAMZA-1406
> > 6) https://issues.apache.org/jira/browse/SAMZA-1356
> > 7) https://github.com/apache/samza/pull/10
> > 8) https://github.com/apache/samza/pull/7
> >
> > Let's pull in the patches that are ready as well.
> >
> > Thanks!
> >
> > On Mon, Nov 27, 2017 at 10:45 AM, Debraj Manna  >
> > wrote:
> >
> > > +1
> > >
> > > On Mon, Nov 27, 2017 at 11:32 PM, xinyu liu 
> > wrote:
> > >
> > > > +1.
> > > >
> > > > Very happy to see a lot of important features added in this release.
> > > >
> > > > Thanks,
> > > > Xinyu
> > > >
> > > > On Mon, Nov 27, 2017 at 10:00 AM, Jagadish Venkatraman <
> > > > jagad...@apache.org>
> > > > wrote:
> > > >
> > > > > +1 from my side.
> > > > >
> > > > > Thank you Bharath for driving the release!
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Nov 27, 2017 at 9:50 AM, Bharath Kumara Subramanian <
> > > > > codin.mart...@gmail.com> wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > >
> > > > > >
> > > > > > > We have added a couple of major features to master since 0.13.1 that
> > > > > > > warrant a major release.
> > > > > > >
> > > > > > > Within LinkedIn, some of these features have already been tested as part
> > > > > > > of our test suites. We plan to continue our testing in the coming weeks to
> > > > > > > validate the stability prior to release.
> > > > > > >
> > > > > > > We wanted to kick off the discussion in the open source forum to keep the
> > > > > > > momentum flowing.
> > > > > >
> > > > > >
> > > > > >
> > > > > > > Here is the list of features that are part of the new release:
> > > > > > >
> > > > > > >    - SAMZA-1510 <https://issues.apache.org/jira/browse/SAMZA-1510> - Samza SQL
> > > > > > >    - SAMZA-1417 <https://issues.apache.org/jira/browse/SAMZA-1417> - Add
> > > > > > >      support for multistage batch to Samza on Hadoop
> > > > > > >    - SAMZA-1438 <https://issues.apache.org/jira/browse/SAMZA-1438> - Event-hub
> > > > > > >      connectors for Samza
> > > > > >
> > > > > >
> > > > > > > We have also worked on stabilizing our 0.13 features. Here are some
> > > > > > > highlights:
> > > > > > >
> > > > > > >    - SAMZA-1454 <https://issues.apache.org/jira/browse/SAMZA-1454>,
> > > > > > >      SAMZA-1493 - Add support for durable state for high level API
> > > > > > >    - SAMZA-1417 <https://issues.apache.org/jira/browse/SAMZA-1417>,
> > > > > > >      SAMZA-1330, SAMZA-1289 - Stabilization of ZooKeeper based
> > > > > > >      deployment model
> > > > > > >    - SAMZA-1471 <https://issues.apache.org/jira/browse/SAMZA-1471>,
> > > > > > >      SAMZA-1392, SAMZA-1465 - Performance improvements
> > > > > >
> > > > > >
> > > > > > You can find the concrete list of the features here
> > > > > >  > > > > > project%20%3D%20samza%20AND%20fixVersion%20%3D%200.14.0%
> > > > > > 20AND%20resolution%20%3D%20fixed>
> > > > > > .
> > > > > >
> > > > > >
> > > > > >
> > > > > > Here is my proposal on our release schedule and timelines.
> > > > > >
> > > > > > >    1. Create a release candidate with the current 0.14.0 HEAD
> > > > > > >    2. Target a release vote in the week of Dec 4th
> > > > > >
> > > > > >
> > > > > >
> > > > > > Thoughts?
> > > > > >
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Bharath
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Haifeng (Fred)  Ji
>


Re: Historical container logs in YARN UI

2017-10-02 Thread Navina Ramesh
+Jon (Adding Jon to cc)


From: Yi Pan 
Sent: Monday, October 2, 2017 2:10:27 AM
To: dev@samza.apache.org
Subject: Re: Historical container logs in YARN UI

Hi, XiaoChuan,

Our SRE team have been using timeline server in YARN at LinkedIn to get the
historical container logs in our admin dashboard. @Jon Bringburst, can you
share some experience regarding how to configure the timeline server in YARN?

Thanks a lot!

-Yi

On Sat, Sep 30, 2017 at 1:08 PM, XiaoChuan Yu  wrote:

> Hi,
>
> Is there a way to view historical container logs in YARN UI?
> When I try to view historical logs from the YARN UI right now I get the following
> message:
> Failed while trying to construct the redirect url to the log server. Log
> Server url may not be configured
> ...
>
> I configured log aggregation and timeline server in YARN.
> I know there's a history server implementation for Map Reduce. Is there a
> similar history server implementation available for Samza?
>
> Thanks,
> Xiaochuan Yu
>


Re: Connection timed out error while installing "Hello Samza"

2017-09-14 Thread Navina Ramesh
If they are already installed on your system, I don't think you need to re-install 
them. However, do make sure that there are no ACLs enabled on ZK because 
Kafka may not support them.


> My guess Even if there are conflicts on earlier installations it should throw 
> some other error but not the connection time out

I think your connection timeout was related to when you were trying to download 
samza from apache.


If you can re-use the already installed Kafka and ZooKeeper client, then install 
YARN and follow the remaining steps to build and deploy Samza. In any case, 
more logs will be useful for us to debug.


> it could either related resolving "localhost" or firewall that prevents 
> communication between ports"

I meant this could be the reason why ZK is not able to install and/or talk to 
the kafka broker.


It totally stumps me why you can't download Samza from your server. Is it 
possible for you to try on another clean host and see if it works or fails for you?


Navina


From: Anantharaman, Srinatha (Contractor) <srinatha_ananthara...@comcast.com>
Sent: Thursday, September 14, 2017 12:33:26 PM
To: dev@samza.apache.org
Subject: RE: Connection timed out error while installing "Hello Samza"

That is a good catch, Navina. The server on which I am trying to install is a 
Hadoop edge node; it already has a Kafka broker and ZooKeeper client installed 
for my Hadoop.
Does it matter if I am installing it in a separate folder? My guess is that even if 
there are conflicts with earlier installations, it should throw some other error, 
not the connection timeout.

Coming to your point that "it could either related resolving "localhost" or 
firewall that prevents communication between ports": how can I prove it is 
a firewall issue? I am able to clone Samza files from Apache git, and I 
can download any external files on this server.

~Sri

-Original Message-
From: Navina Ramesh [mailto:nram...@linkedin.com]
Sent: Thursday, September 14, 2017 2:51 PM
To: dev@samza.apache.org
Subject: Re: Connection timed out error while installing "Hello Samza"

I wonder if this has anything to do with previous kafka / zookeeper installed 
on your box. Just for sanity, try clearing /tmp/zookeeper* and /tmp/kafka* 
before re-trying those steps.


Same as Yi, I strongly suspect issues with your local laptop setup - it could 
be related either to resolving "localhost" or to a firewall that prevents 
communication between ports.
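
Both suspicions (name resolution and blocked ports) can be checked directly. This is a minimal sketch, assuming a Linux host with bash, getent and timeout available. The helper names are hypothetical, and ports 2181 (ZooKeeper) and 9092 (Kafka) are the usual defaults, not values confirmed in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the checks suggested above; not part of hello-samza.

# Return 0 if the address is an IPv4 or IPv6 loopback address.
is_loopback_addr() {
  case "$1" in
    127.*|::1) return 0 ;;
    *) return 1 ;;
  esac
}

# Return 0 if "localhost" resolves to a loopback address.
localhost_ok() {
  local addr
  addr="$(getent hosts localhost | awk 'NR==1 {print $1}')"
  [ -n "$addr" ] && is_loopback_addr "$addr"
}

# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
port_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}
```

For example, run localhost_ok || echo "localhost does not resolve to loopback", then port_open localhost 2181 after ./bin/grid start zookeeper and port_open localhost 9092 after ./bin/grid start kafka. A failure on either port while the process is running would point toward a local firewall or binding problem.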


Navina


From: Yi Pan <nickpa...@gmail.com>
Sent: Thursday, September 14, 2017 11:37:46 AM
To: dev@samza.apache.org
Subject: Re: Connection timed out error while installing "Hello Samza"

Hi, Anantharaman,

I just did the same steps as you described in your email and all passed on my 
box. Hence, I strongly suspect that it is related to your local laptop network 
setup.

Could you post all the command line output when you ran the sequence of 
commands?

-Yi

On Thu, Sep 14, 2017 at 11:21 AM, Yi Pan <nickpa...@gmail.com> wrote:

> Hi, Anantharaman,
>
> It is very strange that you are seeing this timeout exception that we
> do not see. I am trying to follow the exact steps you did to see
> whether there is anything broken. I will update you this afternoon.
>
> Meanwhile, could you check your hostname setup and firewall
> configuration to see whether your local laptop has blocked access via
> the public IP address to your laptop? Could you verify that your
> localhost resolves to 127.0.0.1 and is accessible?
>
> -Yi
>
> On Thu, Sep 14, 2017 at 11:18 AM, Anantharaman, Srinatha (Contractor)
> < srinatha_ananthara...@comcast.com> wrote:
>
>> Yi,
>>
>> Is there any alternate way to install Samza, or a solution to the
>> connection timeout error?
>>
>> Regards,
>> ~Sri
>>
>> From: Anantharaman, Srinatha (Contractor)
>> Sent: Wednesday, September 13, 2017 11:37 AM
>> To: dev@samza.apache.org
>> Subject: RE: Connection timed out error while installing "Hello Samza"
>>
>>
>> Yi,
>>
>> I am trying to build Samza locally by following the steps provided by
>> Navina.
>>
>> As per those steps, Kafka will be installed after ZooKeeper. I am
>> getting an error while starting ZooKeeper after it is installed.
>>
>> Steps followed:
>>
>> Yes. You can clone apache/samza locally and build it with:
>>
>> cd 
>>
>> gradle -b bootstrap.gradle
>>
>> ./gradlew clean build -x test
>>
>> ./gradlew publishToMavenLocal## This publishes a snapshot ver

Re: Connection timed out error while installing "Hello Samza"

2017-09-14 Thread Navina Ramesh
 - Got user-level KeeperException when
>>
>> > processing sessionid:0x15e770d6192 type:setData cxid:0x25
>>
>> > zxid:0x15
>>
>> > txntype:-1 reqpath:n/a Error Path:/controller_epoch
>>
>> > Error:KeeperErrorCode = NoNode for /controller_epoch
>>
>> > 2017-09-12 17:18:07,050 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::
>>
>> > PrepRequestProcessor@617] - Got user-level KeeperException when
>>
>> > processing sessionid:0x15e770d6192 type:delete cxid:0x34 zxid:0x17
>>
>> > txntype:-1 reqpath:n/a Error Path:/admin/preferred_replica_election
>>
>> > Error:KeeperErrorCode = NoNode for /admin/preferred_replica_election
>>
>> > 2017-09-12 17:18:07,288 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::
>>
>> > PrepRequestProcessor@617] - Got user-level KeeperException when
>>
>> > processing sessionid:0x15e770d6192 type:create cxid:0x3f zxid:0x18
>>
>> > txntype:-1 reqpath:n/a Error Path:/brokers Error:KeeperErrorCode =
>>
>> > NodeExists for /brokers
>>
>> > 2017-09-12 17:18:07,290 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::
>>
>> > PrepRequestProcessor@617] - Got user-level KeeperException when
>>
>> > processing sessionid:0x15e770d6192 type:create cxid:0x40 zxid:0x19
>>
>> > txntype:-1 reqpath:n/a Error Path:/brokers/ids Error:KeeperErrorCode =
>>
>> > NodeExists for /brokers/ids
>>
>> >
>>
>> >
>>
>> >
>>
>> > -Original Message-
>>
>> > From: Yi Pan [mailto:nickpa...@gmail.com]
>>
>> > Sent: Tuesday, September 12, 2017 2:02 PM
>>
>> > To: dev@samza.apache.org<mailto:dev@samza.apache.org>
>>
>> > Subject: Re: Connection timed out error while installing "Hello Samza"
>>
>> >
>>
>> > Hi, Anantharaman,
>>
>> >
>>
>> > Could you post your zookeeper startup logs here?
>>
>> >
>>
>> > On Tue, Sep 12, 2017 at 10:21 AM, Anantharaman, Srinatha (Contractor)
>>
>> > < srinatha_ananthara...@comcast.com<mailto:Srinatha_Anantharam
>> a...@comcast.com>> wrote:
>>
>> >
>>
>> > > It hangs while bringing up the service
>>
>> > >
>>
>> > > [root@codehdplak-po-r19p hello-samza]# pwd
>>
>> > > /app/home/eventsvc/samza-git/hello-samza
>>
>> > > [root@codehdplak-po-r19p hello-samza]# ./bin/grid install zookeeper
>>
>> > > EXECUTING: install zookeeper
>>
>> > > Using previously downloaded file /root/.samza/download/
>>
>> > > zookeeper-3.4.3.tar.gz [root@codehdplak-po-r19p hello-samza]#
>>
>> > > ./bin/grid start zookeeper
>>
>> > > EXECUTING: start zookeeper
>>
>> > > JMX enabled by default
>>
>> > > Using config:
>>
>> > > /app/home/eventsvc/samza-git/hello-samza/deploy/zookeeper/
>>
>> > > bin/../conf/zoo.cfg
>>
>> > > Starting zookeeper ... STARTED
>>
>> > > Waiting for zookeeper to start...
>>
>> > >
>>
>> > > ^C
>>
>> > > [root@codehdplak-po-r19p hello-samza]# ./bin/grid start zookeeper^C
>>
>> > > [root@codehdplak-po-r19p hello-samza]# ./bin/grid install kafka
>>
>> > > EXECUTING: install kafka
>>
>> > > Using previously downloaded file /root/.samza/download/kafka_2.
>>
>> > > 11-0.10.1.1.tgz
>>
>> > > [root@codehdplak-po-r19p hello-samza]# ./bin/grid start kafka
>>
>> > > EXECUTING: start kafka
>>
>> > > Waiting for kafka to start...
>>
>> > > Ncat: Connection refused.
>>
>> > > .Ncat: Connection refused.
>>
>> > > .Ncat: Connection refused.
>>
>> > > .^C
>>
>> > > [root@codehdplak-po-r19p hello-samza]#
>>
>> > >
>>
>> > >
>>
>> > >
>>
>> > > -Original Message-
>>
>> > > From: Navina Ramesh [mailto:nram...@linkedin.com]
>>
>> > > Sent: Tuesday, September 12, 2017 12:43 PM
>>
>> > > To: dev@samza.apache.org<mailto:dev@samza.apache.org>
>>
>> > > Subject: Re: Connection timed out error while installing "Hello Samza"
>>
>> > >
>>
>> > > Yes. You can clone apache/samza locally and build it with:
>>
>> &g

Re: Connection timed out error while installing "Hello Samza"

2017-09-12 Thread Navina Ramesh
Yes. You can clone apache/samza locally and build it with:


cd 

gradle -b bootstrap.gradle

./gradlew clean build -x test

./gradlew publishToMavenLocal  ## This publishes a snapshot version of the 
latest apache/samza into your local maven repo


Then, head to hello-samza workspace and build again:

cd 

mvn clean package  ## This should create a build target

./bin/grid install zookeeper

./bin/grid start zookeeper

./bin/grid install kafka

./bin/grid start kafka

./bin/grid install yarn

./bin/grid start yarn


mkdir -p deploy/samza

tar -xvf ./target/hello-samza-*-SNAPSHOT-dist.tar.gz -C deploy/samza


After this, you can follow steps in the tutorial to "Run" the example Samza job.


HTH! Let me know if you need further help.

Navina


From: Anantharaman, Srinatha (Contractor) <srinatha_ananthara...@comcast.com>
Sent: Tuesday, September 12, 2017 9:21:53 AM
To: dev@samza.apache.org
Subject: RE: Connection timed out error while installing "Hello Samza"

Navina,

Is there any other way we can install Hello Samza?

Regards,
~Sri

-Original Message-
From: Navina Ramesh [mailto:nram...@linkedin.com]
Sent: Tuesday, September 12, 2017 11:42 AM
To: dev@samza.apache.org
Subject: Re: Connection timed out error while installing "Hello Samza"

Ok. I tried again for the "latest" branch in hello-samza and it still works.


> While installing it says "Building samza from master..."

It is expected to build from "master" in apache/samza repo. So, the output line 
is expected.


It is weird that you are unable to connect. Is it possible you are behind a 
firewall or something? Can you try to ping "git.apache.org" ? Or try the setup 
on a different box?
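
A ping only shows that the host answers ICMP; it does not show that the port git uses is open. A minimal sketch of a per-port check follows, assuming bash and timeout are available. The helper names are hypothetical, and the idea that the clone goes over the git:// protocol (default port 9418) is an inference from the shape of the error message, not something confirmed in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of hello-samza or git.

# Print the default TCP port for a URL scheme, or fail for unknown schemes.
default_port() {
  case "$1" in
    git)   echo 9418 ;;
    https) echo 443 ;;
    http)  echo 80 ;;
    ssh)   echo 22 ;;
    *)     return 1 ;;
  esac
}

# Return 0 if the host accepts TCP connections on the scheme's default port.
can_reach() {
  local scheme="$1" host="$2" port
  port="$(default_port "$scheme")" || return 1
  timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

For example, compare can_reach git git.apache.org with can_reach https github.com. If the first fails and the second succeeds, a firewall that blocks port 9418 is a likely cause, and cloning https://github.com/apache/samza over HTTPS may be a workaround.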


Navina


From: Anantharaman, Srinatha (Contractor) <srinatha_ananthara...@comcast.com>
Sent: Tuesday, September 12, 2017 8:33:30 AM
To: dev@samza.apache.org
Subject: RE: Connection timed out error while installing "Hello Samza"

Navina,

I tried again but still same error

While installing it says "Building samza from master..."

But after I cloned, I executed "git checkout latest"

Regards,
~Sri


-Original Message-
From: Navina Ramesh [mailto:nram...@linkedin.com]
Sent: Tuesday, September 12, 2017 11:10 AM
To: dev@samza.apache.org
Subject: Re: Connection timed out error while installing "Hello Samza"

Hi Anantharaman,

It looks like a transient connection failure to connect to Apache's git. I 
tried on my host and it seems to be working.

Can you give it another shot?


If it still doesn't work, please let me know if you are running the command 
under the "master" or "latest" branch of samza-hello-samza.


Thanks!

Navina


From: Anantharaman, Srinatha (Contractor) <srinatha_ananthara...@comcast.com>
Sent: Tuesday, September 12, 2017 7:45:37 AM
To: dev@samza.apache.org
Subject: Connection timed out error while installing "Hello Samza"

Hi,

I am trying to install "Hello Samza" on a single node.
Initially I installed Kafka, YARN and ZooKeeper using bin/grid install 
kafka/yarn/zookeeper.
When I run bin/grid bootstrap, I get a connection timed out error.
It also says that Kafka, YARN and ZooKeeper are not installed.

Please find below the error message

[root@codehdplak-po-r19p bin]# cd ..
[root@codehdplak-po-r19p hello-samza]#  bin/grid install kafka
EXECUTING: install kafka
Using previously downloaded file /root/.samza/download/kafka_2.11-0.10.1.1.tgz
[root@codehdplak-po-r19p hello-samza]# bin/grid install yarn
EXECUTING: install yarn
Using previously downloaded file /root/.samza/download/hadoop-2.6.1.tar.gz
[root@codehdplak-po-r19p hello-samza]# bin/grid install zookeeper
EXECUTING: install zookeeper
Using previously downloaded file /root/.samza/download/zookeeper-3.4.3.tar.gz
[root@codehdplak-po-r19p hello-samza]# bin/grid bootstrap
Bootstrapping the system...
EXECUTING: stop kafka
No kafka server to stop
EXECUTING: stop yarn
no resourcemanager to stop
no nodemanager to stop
EXECUTING: stop zookeeper
JMX enabled by default
Using config: /app/home/eventsvc/samza-git/hello-samza/deploy/zookeeper/bin/../conf/zoo.cfg
Stopping zookeeper ... no zookeeper to stop (could not find file /tmp/zookeeper/zookeeper_server.pid)
EXECUTING: install samza
Building samza from master...
~/.samza/download /app/home/eventsvc/samza-git/hello-samza
Cloning into 'samza'...
fatal: unable to connect to git.apache.org:
git.apache.org[0: 54.84.58.65]: errno=Connection timed out


Could you please help me to resolve this issue?

Regards,
~Sri


Re: Connection timed out error while installing "Hello Samza"

2017-09-12 Thread Navina Ramesh
Ok. I tried again for the "latest" branch in hello-samza and it still works.


> While installing it says "Building samza from master..."

It is expected to build from "master" in apache/samza repo. So, the output line 
is expected.


It is weird that you are unable to connect. Is it possible you are behind a 
firewall or something? Can you try to ping "git.apache.org" ? Or try the setup 
on a different box?


Navina


From: Anantharaman, Srinatha (Contractor) <srinatha_ananthara...@comcast.com>
Sent: Tuesday, September 12, 2017 8:33:30 AM
To: dev@samza.apache.org
Subject: RE: Connection timed out error while installing "Hello Samza"

Navina,

I tried again but still same error

While installing it says "Building samza from master..."

But after I cloned, I executed "git checkout latest"

Regards,
~Sri


-Original Message-
From: Navina Ramesh [mailto:nram...@linkedin.com]
Sent: Tuesday, September 12, 2017 11:10 AM
To: dev@samza.apache.org
Subject: Re: Connection timed out error while installing "Hello Samza"

Hi Anantharaman,

It looks like a transient connection failure to connect to Apache's git. I 
tried on my host and it seems to be working.

Can you give it another shot?


If it still doesn't work, please let me know if you are running the command 
under the "master" or "latest" branch of samza-hello-samza.


Thanks!

Navina


From: Anantharaman, Srinatha (Contractor) <srinatha_ananthara...@comcast.com>
Sent: Tuesday, September 12, 2017 7:45:37 AM
To: dev@samza.apache.org
Subject: Connection timed out error while installing "Hello Samza"

Hi,

I am trying to install "Hello Samza" on a single node.
Initially I installed Kafka, YARN and ZooKeeper using bin/grid install 
kafka/yarn/zookeeper.
When I run bin/grid bootstrap, I get a connection timed out error.
It also says that Kafka, YARN and ZooKeeper are not installed.

Please find below the error message

[root@codehdplak-po-r19p bin]# cd ..
[root@codehdplak-po-r19p hello-samza]#  bin/grid install kafka
EXECUTING: install kafka
Using previously downloaded file /root/.samza/download/kafka_2.11-0.10.1.1.tgz
[root@codehdplak-po-r19p hello-samza]# bin/grid install yarn
EXECUTING: install yarn
Using previously downloaded file /root/.samza/download/hadoop-2.6.1.tar.gz
[root@codehdplak-po-r19p hello-samza]# bin/grid install zookeeper
EXECUTING: install zookeeper
Using previously downloaded file /root/.samza/download/zookeeper-3.4.3.tar.gz
[root@codehdplak-po-r19p hello-samza]# bin/grid bootstrap
Bootstrapping the system...
EXECUTING: stop kafka
No kafka server to stop
EXECUTING: stop yarn
no resourcemanager to stop
no nodemanager to stop
EXECUTING: stop zookeeper
JMX enabled by default
Using config: /app/home/eventsvc/samza-git/hello-samza/deploy/zookeeper/bin/../conf/zoo.cfg
Stopping zookeeper ... no zookeeper to stop (could not find file /tmp/zookeeper/zookeeper_server.pid)
EXECUTING: install samza
Building samza from master...
~/.samza/download /app/home/eventsvc/samza-git/hello-samza
Cloning into 'samza'...
fatal: unable to connect to git.apache.org:
git.apache.org[0: 54.84.58.65]: errno=Connection timed out


Could you please help me to resolve this issue?

Regards,
~Sri


Re: Connection timed out error while installing "Hello Samza"

2017-09-12 Thread Navina Ramesh
Hi Anantharaman,

It looks like a transient connection failure to connect to Apache's git. I 
tried on my host and it seems to be working.

Can you give it another shot?


If it still doesn't work, please let me know if you are running the command 
under the "master" or "latest" branch of samza-hello-samza.


Thanks!

Navina


From: Anantharaman, Srinatha (Contractor) 
Sent: Tuesday, September 12, 2017 7:45:37 AM
To: dev@samza.apache.org
Subject: Connection timed out error while installing "Hello Samza"

Hi,

I am trying to install "Hello Samza" on a single node.
Initially I installed Kafka, YARN and ZooKeeper using bin/grid install 
kafka/yarn/zookeeper.
When I run bin/grid bootstrap, I get a connection timed out error.
It also says that Kafka, YARN and ZooKeeper are not installed.

Please find below the error message

[root@codehdplak-po-r19p bin]# cd ..
[root@codehdplak-po-r19p hello-samza]#  bin/grid install kafka
EXECUTING: install kafka
Using previously downloaded file /root/.samza/download/kafka_2.11-0.10.1.1.tgz
[root@codehdplak-po-r19p hello-samza]# bin/grid install yarn
EXECUTING: install yarn
Using previously downloaded file /root/.samza/download/hadoop-2.6.1.tar.gz
[root@codehdplak-po-r19p hello-samza]# bin/grid install zookeeper
EXECUTING: install zookeeper
Using previously downloaded file /root/.samza/download/zookeeper-3.4.3.tar.gz
[root@codehdplak-po-r19p hello-samza]# bin/grid bootstrap
Bootstrapping the system...
EXECUTING: stop kafka
No kafka server to stop
EXECUTING: stop yarn
no resourcemanager to stop
no nodemanager to stop
EXECUTING: stop zookeeper
JMX enabled by default
Using config: /app/home/eventsvc/samza-git/hello-samza/deploy/zookeeper/bin/../conf/zoo.cfg
Stopping zookeeper ... no zookeeper to stop (could not find file /tmp/zookeeper/zookeeper_server.pid)
EXECUTING: install samza
Building samza from master...
~/.samza/download /app/home/eventsvc/samza-git/hello-samza
Cloning into 'samza'...
fatal: unable to connect to git.apache.org:
git.apache.org[0: 54.84.58.65]: errno=Connection timed out


Could you please help me to resolve this issue?

Regards,
~Sri


Re: [VOTE] SEP-8: Add in-memory system consumer & producer

2017-09-06 Thread Navina Ramesh
Hi Bharath,


Really good design!


  1.  Based on your SEP, you have listed 3 implementation approaches. Do you 
know which one we are choosing? I suspect it is Approach C. Can you please 
confirm and update the SEP?
  2.  Perhaps rename "Test Plan" to "Proposed Usage" or "Usage Example"

Overall, +1 on this. We need this asap!! 

Thanks!

Navina


From: xinyu liu 
Sent: Wednesday, September 6, 2017 2:06:45 PM
To: dev@samza.apache.org
Subject: Re: [VOTE] SEP-8: Add in-memory system consumer & producer

+1 on the overall design. This will make testing a lot easier!

Thanks,
Xinyu

On Wed, Sep 6, 2017 at 10:45 AM, Bharath Kumara Subramanian <
codin.mart...@gmail.com> wrote:

> Hi all,
>
> Can you please vote for SEP-8?
> You can find the design document here
>  >.
>
> Thanks,
> Bharath
>


Re: [VOTE] Apache Samza 0.13.1 RC0

2017-08-22 Thread Navina Ramesh
Ran check-all on Mac. Build looks good.


+1 (binding)


Thanks!
Navina


From: Fred Haifeng Ji 
Sent: Tuesday, August 22, 2017 11:19 AM
To: dev@samza.apache.org
Subject: Re: [VOTE] Apache Samza 0.13.1 RC0

Thanks to those who already tested the RC and voted.

Due to the weekend and the eclipse day, we are extending the vote till 1pm
Wednesday 8/23.

Thanks,

Fred

On Tue, Aug 22, 2017 at 9:59 AM, Jagadish Venkatraman <
jagadish1...@gmail.com> wrote:

> Ran check-all.sh, and it succeeded!
>
> +1 (non binding)
>
> On Mon, Aug 21, 2017 at 4:34 PM, xinyu liu  wrote:
>
> > Built the src, and ran the tests using check-all.sh. Most of the tests ran
> > fine. There was a transient test failure
> > (https://issues.apache.org/jira/browse/SAMZA-1405), which seems to be caused
> > by the testing env (further investigation needed). I reran the tests and
> > they passed. Since this test doesn't affect the build itself, I am +1
> > (non-binding).
> >
> > Thanks,
> > Xinyu
> >
> > On Mon, Aug 21, 2017 at 2:24 PM, Yi Pan  wrote:
> >
> > > Downloaded the source, compiled and ran the integration tests. All passed.
> > >
> > > +1 (binding) w/ the following minor comments:
> > > # Please make a note in the release note that this version requires JDK
> > > 1.8.0.112+ (I have tested w/ JDK 1.8.0.121)
> > > # Please make sure that we publish artifacts compiled w/ Scala 2.10, Scala
> > > 2.11, and Scala 2.12
> > >
> > > -Yi
> > >
> > > On Fri, Aug 18, 2017 at 11:59 AM, Fred Haifeng Ji <
> haifeng...@gmail.com>
> > > wrote:
> > >
> > > > This is a call for a vote on a release of Apache Samza 0.13.1. Thanks to
> > > > everyone who has contributed to this release.
> > > >
> > > > The release candidate can be downloaded from here:
> > > > http://home.apache.org/~navina/samza-0.13.1-rc0/
> > > >
> > > >
> > > > The release candidate is signed with pgp key A211312E, which can be found
> > > > on keyservers:
> > > > http://pgp.mit.edu/pks/lookup?op=get&search=0xEDFD8F9AA211312E
> > > >
> > > >
> > > > The git tag is release-0.13.1-rc0 and signed with the same pgp key:
> > > > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-0.13.1-rc0
> > > >
> > > > Test binaries have been published to Maven's staging repository, and are
> > > > available here:
> > > > https://repository.apache.org/content/repositories/orgapachesamza-1030/
> > > >
> > > >
> > > > 29 issues were resolved for this release:
> > > > https://issues.apache.org/jira/issues/?jql=project%20%3D%2012314526%20AND%20fixVersion%20%3D%2012340845%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC
> > > >
> > > >
> > > > The vote will be open for 72 hours (ending at 1:00PM Monday, 08/21/2017).
> > > >
> > > > Please download the release candidate, check the hashes/signature, build it
> > > > and test it, and then please vote:
> > > >
> > > > [ ] +1 approve
> > > >
> > > > [ ] +0 no opinion
> > > >
> > > > [ ] -1 disapprove (and reason why)
> > > >
> > > >
> > > > --
> > > > Fred Ji
> > > >
> > >
> >
>
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>



--
Haifeng (Fred)  Ji
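
The "check the hashes" step in the vote call above can be sketched roughly as follows. This is only an illustration in Python: the helper names and the SHA-512 default are assumptions that do not come from the thread, so use whichever digest is actually published next to the release candidate. Signature verification is a separate step and is not covered here.

```python
import hashlib


def file_digest(path, algorithm="sha512", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published, algorithm="sha512"):
    """Compare a local file against a published hex digest string.

    `published` is assumed to hold only the hex digest (no file name);
    whitespace and letter case are ignored.
    """
    expected = "".join(published.split()).lower()
    return file_digest(path, algorithm) == expected
```

A voter would call `matches_published_digest` with the path of the downloaded tarball and the contents of the published digest file, and only then move on to building and testing.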


Re: [Discuss] Samza 0.13.1 release

2017-08-12 Thread Navina Ramesh
Fred,
Thanks for starting the release process. I am unable to open the link you have 
provided, though. It opens the JIRA SAMZA-1165, instead of the entire list of 
0.13.1 bug fixes. Can you please re-check?

Navina

On 8/11/17, 6:13 PM, "ignacio.so...@gmail.com on behalf of Ignacio Solis" 
 wrote:

+1

On Fri, Aug 11, 2017 at 3:52 PM, Jacob Maes  wrote:
> Looks good!
>
> +1
>
> On Thu, Aug 10, 2017 at 6:53 PM, Jagadish Venkatraman <
> jagadish1...@gmail.com> wrote:
>
>> +1 for the release. thanks for the summary and for driving this Fred!
>>
>> On Thu, Aug 10, 2017 at 5:15 PM Fred Haifeng Ji 
>> wrote:
>>
>> > The format was messed up when sent from my yahoo mail to
>> > dev@samza.apache.org. I am resending it from my gmail account. Sorry for
>> > the inconvenience!
>> >
>> > Hi all,
>> >
>> > There have been some new features and critical bug fixes added to master
>> > since the 0.13.0 release, which make Samza Standalone features more stable.
>> > It is now good enough to warrant *a new minor release*. We will continue to
>> > test for stability and performance in the next few weeks.
>> >
>> > Here are the main JIRA tickets that will be included in this release
>> > (but not limited to):
>> > SAMZA-1165: Cleanup data created by ZkStandalone in ZK;
>> > SAMZA-1324: Add a metricsreporter lifecycle for JobCoordinator component of
>> > StreamProcessor;
>> > SAMZA-1336: Standalone session expiration propagation;
>> > SAMZA-1337: LocalApplicationRunner needs to support StreamTask;
>> > SAMZA-1339: Add standalone integration tests;
>> > …
>> >
>> > There are also quite a few bug fixes in 0.13.1, *please check the complete
>> > list of changes in 0.13.1 here
>> > <https://issues.apache.org/jira/browse/SAMZA-1165?jql=project%20%3D%2012314526%20AND%20fixVersion%20%3D%2012340845%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC>*.
>> >
>> > Most JIRAs in the list have been completed and merged, with the following
>> > one remaining, but we should try to get it completed before 0.13.1 is
>> > released.
>> > SAMZA-1385: Coordination utils in LocalApplicationRunner uses same Zk node
>> > as ZkJobCoordinatorFactory for leader election
>> >
>> > Here's what I propose:
>> > 1. Cut an 0.13.1 release branch.
>> > 2. Work on getting the remaining open JIRA done.
>> > 3. Target a release vote by Aug 18.
>> >
>> > Thoughts?
>> >
>> > Fred
>> >
>> --
>> Sent from my iphone.
>>



-- 
Nacho - Ignacio Solis - iso...@igso.net






Re: Kafka client.id collision

2017-07-20 Thread Navina Ramesh (Apache)
Hi David,

I think this is expected to occur as a warning since we spin up all kafka
clients with the same client-id, which is $job.name + $job.id.

As Jagadish mentioned, it will be great if you can provide us the entire
log so that we can take a look.

As a side note for the Samza contributors, I do believe the container spins
up Kafka clients for each Kafka system defined, even if it is not used.
Iirc, we use `KafkaUtil.getClientId` for generating the client id. Perhaps
it makes sense to append another identifier to the client id (such as
system name or component name). That way, we won't lose the kafka-client
related metrics and there will be no overlap between the client ids.
Thoughts?
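
To make the suggestion concrete, here is a minimal sketch in Python (for brevity) of appending a system name to the client id. The function name, the `prefix` argument and the `-` separator are assumptions for illustration only; this is not the actual `KafkaUtil.getClientId` code. The sample values are taken from the id in the log quoted below (`samza_producer-wikipedia_feed-1`).

```python
def client_id(prefix, job_name, job_id, system_name=None):
    """Build a client id from prefix, job name and job id.

    Hypothetical helper: when `system_name` is given it is appended as an
    extra component, so clients created for different systems no longer
    share one id.
    """
    parts = [prefix, job_name, job_id]
    if system_name is not None:
        parts.append(system_name)
    return "-".join(parts)


# Without the extra component, every system ends up with the same id.
shared = {client_id("samza_producer", "wikipedia_feed", "1") for _ in ("kafka", "metrics")}

# With it, each system gets its own id (system names here are made up).
distinct = {client_id("samza_producer", "wikipedia_feed", "1", s) for s in ("kafka", "metrics")}
```

Here `shared` holds a single id while `distinct` holds two, which is the property that would keep the per-client MBeans from colliding.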

Thanks!
Navina

On Thu, Jul 20, 2017 at 9:13 AM, Jagadish Venkatraman <
jagadish1...@gmail.com> wrote:

> Can you share the entire log file if that's okay? The warning should be a
> red-herring IMHO.
>
> On Thu, Jul 20, 2017 at 7:50 AM Davide Simoncelli 
> wrote:
>
> > Hi,
> >
> > Thanks for the reply.
> >
> > It is a warning, but the application fails. Here is the logging:
> >
> >
> > 2017-07-20 10:43:06.349 [main] AppInfoParser [INFO] Kafka version : 0.10.1.1
> > 2017-07-20 10:43:06.349 [main] AppInfoParser [INFO] Kafka commitId : f10ef2720b03b247
> > 2017-07-20 10:43:06.351 [main] AppInfoParser [WARN] Error registering AppInfo mbean
> > javax.management.InstanceAlreadyExistsException: kafka.producer:type=app-info,id=samza_producer-wikipedia_feed-1
> >     at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:437)
> >     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerWithRepository(DefaultMBeanServerInterceptor.java:1898)
> >     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:966)
> >     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:900)
> >     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:324)
> >     at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
> >     at org.apache.kafka.common.utils.AppInfoParser.registerAppInfo(AppInfoParser.java:58)
> >     at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:331)
> >     at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:163)
> >     at org.apache.samza.system.kafka.KafkaSystemFactory$$anonfun$3.apply(KafkaSystemFactory.scala:89)
> >     at org.apache.samza.system.kafka.KafkaSystemFactory$$anonfun$3.apply(KafkaSystemFactory.scala:89)
> >     at org.apache.samza.system.kafka.KafkaSystemProducer.send(KafkaSystemProducer.scala:144)
> >     at org.apache.samza.coordinator.stream.CoordinatorStreamSystemProducer.send(CoordinatorStreamSystemProducer.java:113)
> >     at org.apache.samza.coordinator.stream.CoordinatorStreamWriter.sendSetConfigMessage(CoordinatorStreamWriter.java:98)
> >     at org.apache.samza.coordinator.stream.CoordinatorStreamWriter.sendMessage(CoordinatorStreamWriter.java:82)
> >     at org.apache.samza.job.yarn.SamzaYarnAppMasterService.onInit(SamzaYarnAppMasterService.scala:68)
> >     at org.apache.samza.job.yarn.YarnClusterResourceManager.start(YarnClusterResourceManager.java:180)
> >     at org.apache.samza.clustermanager.ContainerProcessManager.start(ContainerProcessManager.java:167)
> >     at org.apache.samza.clustermanager.ClusterBasedJobCoordinator.run(ClusterBasedJobCoordinator.java:154)
> >     at org.apache.samza.clustermanager.ClusterBasedJobCoordinator.main(ClusterBasedJobCoordinator.java:222)
> > 2017-07-20 10:43:06.549 [main] CoordinatorStreamWriter [INFO] Stopping the coordinator stream producer.
> > 2017-07-20 10:43:06.549 [main] CoordinatorStreamSystemProducer [INFO] Stopping coordinator stream producer.
> > 2017-07-20 10:43:06.549 [main] KafkaProducer [INFO] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
> >
> >
> > > On 20 Jul 2017, at 3:16 pm, Jagadish Venkatraman <
> jagadish1...@gmail.com>
> > wrote:
> > >
> > > Hi Davide,
> > >
> > > Is this logged as an error or as a warning?
> > >
> > > IIUC, this warning should not fail the job. It may cause some MBean
> > > sensors / metrics emitted from Kafka to be reported incorrectly (since
> > > those are reported per-clientId).
> > >
> > > The job should still continue to run.
> > >
> > > The entire log file will be helpful for further debugging!
> > >
> > > On Thu, Jul 20, 2017 at 3:32 AM, Davide Simoncelli <
> > netcelli@gmail.com >
> > > wrote:
> > >
> > >> Hello,
> > >>
> > >> We are running Kafka 0.10.1.1 in production. Unfortunately the Samza
> app
> > >> fails to start 

Re: Samza Meetup

2017-07-20 Thread Navina Ramesh (Apache)
No worries. We would love to meet you in person too. Keep an eye out on the
mailing list for the Meetup link.

Cheers!
Navina

On Jul 20, 2017 08:37, "Renato Marroquín Mogrovejo" <
renatoj.marroq...@gmail.com> wrote:

> Thanks Jagadish and Navina!
> I am really interested in attending as I am in the area, it'd be my first
> in-person Samza meetup :D
> But unfortunately I don't have anything to present this time :(
>
>
> Renato M.
>
> 2017-07-18 23:46 GMT-07:00 Navina Ramesh (Apache) <nav...@apache.org>:
>
> > Hi Renato,
> >
> > We are planning for mid-August as a tentative target for the next meetup.
> >
> > If you are interested in participating or speaking at the meetup, please
> > let us know.
> >
> > Thanks!
> > Navina
> >
> >
> >
> > On Tue, Jul 18, 2017 at 10:36 AM, Renato Marroquín Mogrovejo <
> > renatoj.marroq...@gmail.com> wrote:
> >
> > > Hi Samza experts and users,
> > >
> > > I was wondering if there is going to be a meetup this summer or when
> the
> > > next one is.
> > > Thanks!
> > >
> > >
> > > Best,
> > >
> > > Renato M.
> > >
> >
>


Re: Samza Meetup

2017-07-19 Thread Navina Ramesh (Apache)
Hi Renato,

We are planning for mid-August as a tentative target for the next meetup.

If you are interested in participating or speaking at the meetup, please
let us know.

Thanks!
Navina



On Tue, Jul 18, 2017 at 10:36 AM, Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:

> Hi Samza experts and users,
>
> I was wondering if there is going to be a meetup this summer or when the
> next one is.
> Thanks!
>
>
> Best,
>
> Renato M.
>


Re: [VOTE] SEP-5: Enable partition expansion of input streams

2017-06-23 Thread Navina Ramesh (Apache)
After a lot of Q&A, let's get this done :)

+1 (binding)

Thanks!
Navina

On Tue, Jun 20, 2017 at 10:31 AM, xinyu liu  wrote:

> +1 (non-binding) on this design.
>
> To me the task-count based groupers should work well in practice for
> partition expansion of systems that use hashing for partitioning, e.g. Kafka. It
> will not cause any state transfer between hosts so the runtime cost will be
> minimal. In the future when we support dynamically re-balancing the tasks,
> we can further scale the task count if needed.
>
> Thanks,
> Xinyu
>
> On Mon, Jun 19, 2017 at 9:27 AM, Dong Lin  wrote:
>
> > Hi everyone,
> >
> > Can you please vote for SEP-5? The wiki can be found at
> > *<https://cwiki.apache.org/confluence/display/SAMZA/SEP-5%3A+Enable+partition+expansion+of+input+streams>.*
> >
> > Thanks,
> > Dong
> >
>
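
The point above about task-count based groupers and hash partitioning can be illustrated with a small sketch. It is written in Python and rests on stated assumptions: messages are routed with `hash(key) % partition_count`, the partition count grows by an integer multiple, and a partition is assigned to task `partition % task_count`. The function names are made up for this example and are not the grouper classes proposed in SEP-5.

```python
def partition_for(key_hash, partition_count):
    """Partition chosen by a hash-based partitioner (assumed scheme)."""
    return key_hash % partition_count


def task_for(partition, task_count):
    """Task that owns a partition when the task count is kept fixed."""
    return partition % task_count


def task_for_key(key_hash, partition_count, task_count):
    """Task that ends up processing a key for a given partition count."""
    return task_for(partition_for(key_hash, partition_count), task_count)


# Expanding from 4 to 8 partitions with 4 tasks keeps every key on its task,
# so no task state has to move.
stable = all(task_for_key(h, 4, 4) == task_for_key(h, 8, 4) for h in range(1000))

# Expanding from 4 to 6 partitions (not a multiple of 4) breaks that property.
broken = any(task_for_key(h, 4, 4) != task_for_key(h, 6, 4) for h in range(1000))
```

Both `stable` and `broken` evaluate to `True`, which is why the approach depends on an assumption about how the input system maps old partitions to new ones.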


Re: [VOTE] SEP-5: Enable partition expansion of input streams

2017-06-23 Thread Navina Ramesh
After a lot of Q&A, let's get this done :)

+1 (binding)

Thanks!
Navina

On Tue, Jun 20, 2017 at 10:31 AM, xinyu liu  wrote:

> +1 (non-binding) on this design.
>
> To me the task-count based groupers should work well in practice for
> partition expansion of systems that use hashing for partitioning, e.g. Kafka. It
> will not cause any state transfer between hosts so the runtime cost will be
> minimal. In the future when we support dynamically re-balancing the tasks,
> we can further scale the task count if needed.
>
> Thanks,
> Xinyu
>
> On Mon, Jun 19, 2017 at 9:27 AM, Dong Lin  wrote:
>
> > Hi everyone,
> >
> > Can you please vote for SEP-5? The wiki can be found at
> > *<https://cwiki.apache.org/confluence/display/SAMZA/SEP-5%3A+Enable+partition+expansion+of+input+streams>.*
> >
> > Thanks,
> > Dong
> >
>



-- 
Navina R.


Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-23 Thread Navina Ramesh
Yi,
Thanks for summarizing. I think we should deal with further code related
changes/discussions in the PR directly since this SEP has been open for a
while. Let's try to wrap up the discussions by today.

@Dong: Thanks for updating the SEP. I think the TestPlan section is TBD
right now. You can update it whenever you get to it. Thanks a bunch for
your patience!

Cheers!
Navina

On Thu, Jun 22, 2017 at 3:36 PM, Dong Lin <lindon...@gmail.com> wrote:

> Hey Yi,
>
> Thanks for the detailed comment and the summary!
>
> To address your comments:
>
> 1) The current names are GroupByPartitionWithFixedTaskNum and
> GroupBySystemStreamPartitionWithFixedTaskNum. Instead of
> FixedTasksGroupByPartition and FixedTasksGroupBySystemStreamPartition, how
> about GroupByPartitionFixedTasks and GroupBySystemStreamPartitionFixedTasks?
> The new names are as long as the names you suggested. They seem a bit more
> intuitive because they would be prefixed with the grouper class name of their
> no-fixed-tasks counterpart. I have updated the wiki with the new names. Can
> you let me know if it is OK?
>
> 2) Initially I wanted to design that config and interface later, when we have
> more use cases, so that we can have higher confidence in the interface
> design. But it seems that one common concern with the proposal is about its
> limiting assumption in the old-partition-to-new-partition mapping. I
> have updated the wiki to illustrate the design of this interface and the
> new (and more general) assumption for the input system to use this
> partition expansion. Can you take a look and see if it is reasonable?
>
> 3) Yeah previously Jacob has raised the same concern and the solution is
> exactly the same as you suggested.
>
> Hey everyone,
>
> I have made a non-trivial change to the wiki to illustrate the use of the new
> config and interface for users to specify the new-partition-to-old-partition
> mapping. Can you please help review it?
>
> Thanks,
> Dong
>
>
> On Thu, Jun 22, 2017 at 2:25 AM, Yi Pan <nickpa...@gmail.com> wrote:
>
> > Hi, Dong and everyone,
> >
> > Thanks for the detailed discussion on SEP-5! Really appreciate the thorough
> > consideration on this issue. I also noticed that Dong has updated the SEP-5
> > wiki to clarify:
> > 1) SEP-5 provides a solution to retain the same number of task/state w/o
> > re-partitioning (as illustrated in the stateful join example)
> > 2) Future work to expand number of tasks need to work together with
> > flexible re-partitioning to provide a complete solution
> >
> > Due to the cost to be paid in task number expansion:
> > 1) additional network I/O and latency in re-partitioning
> > 2) shuffling of the states among tasks
> > The current form of SEP-5 provides an alternative when partition expansion
> > in the messaging system is not due to an increase of the total input rate.
> >
> > The concern on the added complexity in grouper logic is valid. However, the
> > grouper-based solution is not completely unreasonable:
> > 1) Grouper is a public interface and we are already open to customized
> > implementations of groupers, although that is not a main use case
> > 2) Deprecating the existing config-driven grouper will take longer: it has
> > to wait until the fluent API has a better planner that automatically figures
> > out the grouper to be used, and until stateful task expansion is automated.
> > Hence, for the foreseeable future, the grouper is still configured by the
> > user.
> >
> > So, in general, I am in favor of the proposed SEP-5, given that it provides
> > a path of least resistance to address some pain points for Samza users, w/o
> > breaking any existing use cases in opt-in mode.
> >
> > Some minor suggestions:
> > 1) The class names are too long. Can we change them to
> > FixedTasksGroupByPartition and FixedTasksGroupBySystemStreamPartition?
> > 2) I am still in favor of a configurable partition expansion (i.e. new<->old
> > partition mapping) policy, since it makes this solution more general and
> > not fixed for Kafka. I am OK with defaulting to a power-of-2 expansion policy
> > and not introducing a new config variable now.
> > 3) In the checkpoint/coordinator topic validation, the KafkaCheckpointLogKey
> > class validates that the current grouper factory class == the grouper
> > factory class in the previous checkpoint. We need to make sure that we allow
> > the compatible change from GroupByPartition to FixedTasksGroupByPartition,
> > etc. Since FixedTasksGroupByPartition is a derived interface of
> > GroupByPartition, one possible solution is to check assignable (if current
> 

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-21 Thread Navina Ramesh (Apache)
> But IMO it is the best available solution towards the support of
> partition expansion in comparison to alternative, no?

At this time, relative to the other alternatives you have listed, this is the
path of least effort to solve this problem. I agree with that. :)

> I can merge those two sections or update the statement if the current
> statement has not clearly explained the reason of partition expansion in Kafka.

Given the significance of what you are actually trying to solve, I think it
will be better to have it in points. Let me come find you and we can update
it.

> I have updated wiki and added the task expansion to the Future Work section.
> On the other hand I still keep it in the Rejected Alternative section to
> explain why this future work does not replace the existing proposal in
> SEP-5. Does this sound reasonable?

It is very confusing to me how the same point can be under "Future Work"
and "Rejected Alternative". There is no question about the future work
*replacing* SEP-5. Iiuc, this SEP is a subset for the partition expansion
solution. So, I don't think increasing task count should be a rejected
alternative.

> I am also not sure why a feature needs to be "utmost priority" in order
> to be accepted. Can you explain a bit on that?

I don't think I ever claimed that the feature needs to be of "utmost
priority" to be accepted. I was just stating my opinion.


Thanks!
Navina

On Wed, Jun 21, 2017 at 3:52 PM, Dong Lin <lindon...@gmail.com> wrote:

> Thanks much for the reply Navina. Please see my reply inline.
>
> On Wed, Jun 21, 2017 at 2:57 PM, Navina Ramesh (Apache) <nav...@apache.org>
> wrote:
>
> > Thanks to Jake, Dong and Kartik for keeping the discussion going.
> >
> > > Here are the pros and cons of the extra re-partitioning stage in
> > > comparison to SEP-5.
> >
> > I think that is a good summary of the pros/cons of the repartitioning
> > stage based solution. Can you please include it in your SEP? It seems like
> > you already have access. If you are still unable to access the wiki page,
> > feel free to walk over to the Samza area and find me!
> >
>
> Sure. I have added this summary to the Alternative Section.
>
>
> >
> > > I think there is always a way for user to mess up their job if they
> > configure the Samza job incorrectly.
> >
> > I don't think Jake or anyone is arguing about an "incorrectly" configured
> > Samza job. The question was towards how easy/difficult it is for users to
> > *not mess* up their job with incorrect configurations.
> >
> > > I also think the assumption made in this SEP is not particularly harder
> > > to understand than other existing configs in Samza.
> >
> > I disagree here. Other configs don't require you to understand more than one
> > assumption.
> >
> > There is already an overload of configs in Samza and I think we are trying
> > to shield the users from it as much as possible (esp. with the fluent API).
> > More specifically, we don't want the user to know about the internals of
> > Samza such as the SSP grouper, taskname grouper etc. Since the proposed solution
> > makes the configuration more complex to understand, it *is a* burden on the
> > user.
> > Just because configs are the way it is, it doesn't mean we increase the
> > complexity of it and push the burden on users to manage it correctly. My
> > two cents.
> >
>
> Sure, I agree the proposal requires user to understand the assumption in
> order to expand the partition of the topic. But it is very subjective as to
> whether the added complexity is acceptable or not. If there is better way
> to allow user to expand partition of the input stream without making
> assumption, then we can just do that. The current solution is not perfect.
> But IMO it is the best available solution towards the support of partition
> expansion in comparison to alternative, no?
>
>
> > Here are a few things that I believe are needed for wrapping up the SEP:
> >
> > 1. For the longest time, I thought partition expansion happens in Kafka
> > only when the volume of messages across partitions is too high. Based on
> > this assumption, I would only assume that re-mapping expanded partitions
> to
> > the same task will have adverse effect on the throughput/resource
> > utilization of the processor/container in Samza (for example, disk
> > utilization may increase significantly. With disk quota throttling, it
> > could cause the processor to drop.). However, after speaking with Xinyu,
> it
> > turns out that partition expansion also happens when there is a
> > per-partition data ret

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-21 Thread Navina Ramesh (Apache)
Thanks to Jake, Dong and Kartik for keeping the discussion going.

> Here are the pros and cons of the extra re-partitioning stage in
> comparison to SEP-5.

I think that is a good summary of the pros/cons of the repartitioning
stage based solution. Can you please include it in your SEP? It seems like
you already have access. If you are still unable to access the wiki page,
feel free to walk over to the Samza area and find me!

> I think there is always a way for user to mess up their job if they
> configure the Samza job incorrectly.

I don't think Jake or anyone is arguing about an "incorrectly" configured
Samza job. The question was towards how easy/difficult it is for users to
*not mess* up their job with incorrect configurations.

> I also think the assumption made in this SEP is not particularly harder
> to understand than other existing configs in Samza.

I disagree here. Other configs don't require you to understand more than one
assumption.

There is already an overload of configs in Samza and I think we are trying
to shield the users from it as much as possible (esp. with the fluent API).
More specifically, we don't want the user to know about the internals of
Samza such as the SSP grouper, taskname grouper etc. Since the proposed solution
makes the configuration more complex to understand, it *is a* burden on the
user.

Just because configs are the way they are doesn't mean we should increase
their complexity and push the burden on users to manage them correctly. My
two cents.

Here are a few things that I believe are needed for wrapping up the SEP:

1. For the longest time, I thought partition expansion happens in Kafka
only when the volume of messages across partitions is too high. Based on
this assumption, I would only assume that re-mapping expanded partitions to
the same task will have an adverse effect on the throughput/resource
utilization of the processor/container in Samza (for example, disk
utilization may increase significantly; with disk quota throttling, it
could cause the processor to drop). However, after speaking with Xinyu, it
turns out that partition expansion also happens when there is a
per-partition data retention limit imposed by Kafka (not sure if it is only
in LinkedIn or in Kafka open-source as well). Imo, this is the primary
use-case that we are trying to solve for in Samza and it is not very
obvious from the SEP.
@Dong, can you please explain *the circumstances under which partition
expansion can happen*, under the "Motivation" section? I disagree with the
current motivation described as -> "This design doc provides a solution to
increase partition number of the input streams of a stateful Samza job
while still ensuring the correctness of Samza job output."
This is a solution, albeit not fully done through this SEP alone.

2. I think we are in consensus about the fact that increasing the task
number and handling the state correctly is a good solution for Samza in the
long-run. In your rejected alternatives, you mention "However, this feature
alone does not solve the problem of allowing partition expansion.". What
else is required to allow partition expansion? Can you please elaborate on
that in point #1 of the rejected alternatives? If there is still more work
to be done to support partition expansion in Samza, it is worthwhile to
mention it under *Future Work*, instead of under "Rejected Alternatives".
Perhaps you were waiting for edit permissions to the wiki. Please make this
change so it is well-tracked.

I am still not totally crazy about the proposed solution because it is not
clear for open-source, who or which use-cases stand to benefit. I am not
convinced that this problem is of utmost priority for the Samza community
*at this point of time*.

I am on the same page as Jake on this one. Not a +1, just a 0 (if that even
matters).

Thanks!
Navina

On Sun, Jun 18, 2017 at 12:04 AM, Dong Lin  wrote:

> BTW, I will update the SEP-5 wiki with our latest discussion after I have
> got the wiki edit access.
>
> On Sat, Jun 17, 2017 at 11:36 PM, Dong Lin  wrote:
>
> > Thanks everyone for the comment!
> >
> > I am currently leaning towards the current approach. I think Kartik raised
> > a good point that the extra repartitioning stage will also incur additional
> > throughput on Kafka in addition to the potential storage cost. Can other
> > Samza developers also chime in and provide their opinions on this proposal?
> >
> > Since this discussion thread has been open for three weeks, I will
> > initiate a voting thread on Monday if there is no major revision suggestion.
> >
> > Thanks,
> > Dong
> >
> >
> > On Thu, Jun 15, 2017 at 6:32 PM, Kartik Paramasivam <
> > kparamasi...@linkedin.com.invalid> wrote:
> >
> >> Great discussion !
> >>
> >> Here are some more thoughts
> >>
> >> The point that repartitioning is a more general purpose solution is
> surely
> >> spot on.  For many source systems (Kinesis, Google Pub-Sub, any of the
> >> older queuing systems 

Re: [VOTE] Apache Samza 0.13.0 RC6

2017-06-08 Thread Navina Ramesh (Apache)
+1 (binding)

Thanks to everyone for diligently testing out the RCs and getting this
release out!

Cheers!
Navina

On Thu, Jun 8, 2017 at 9:09 AM, Chris Riccomini 
wrote:

> +1 (binding)
>
> On Wed, Jun 7, 2017 at 8:55 AM, Yi Pan  wrote:
>
> > +1 (binding)
> > Built and ran all local integration tests on Linux.
> >
> > On Tue, Jun 6, 2017 at 4:01 PM, Boris S  wrote:
> >
> > > +1 (non-binding)
> > > build and tested on Linux (with python 2.7; 2.4 and 3.5 - didn't work)
> > >
> > > On Tue, Jun 6, 2017 at 2:49 PM, Jacob Maes 
> wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > Built and tested on both OSX and RHEL with gradle 2.0 and 2.2
> > > > respectively.
> > > >
> > > > Also verified the high level API + YARN host affinity on a test job with
> > > > 32 containers.
> > > >
> > > >
> > > >
> > > > On Tue, Jun 6, 2017 at 9:14 AM, xinyu liu 
> > wrote:
> > > >
> > > > > +1 (non-binding).
> > > > >
> > > > > Downloaded the source tar, built it and ran check-all.sh on RHEL6 with
> > > > > gradle 2.8. All passed.
> > > > >
> > > > > As a side note to Jagadish's comments, the build doesn't work on a higher
> > > > > gradle version either (gradle 3.5). Seems "-language:implicitConversions
> > > > > -language:reflectiveCalls" is not a valid build option anymore.
> > > > >
> > > > > Thanks,
> > > > > Xinyu
> > > > >
> > > > > On Mon, Jun 5, 2017 at 10:06 AM, Jagadish Venkatraman <
> > > > > jagadish1...@gmail.com> wrote:
> > > > >
> > > > > > Checked out, ran tests, and all of them pass.
> > > > > >
> > > > > > +1 (non-binding)
> > > > > >
> > > > > > I did get an error when running with gradle 2.4:
> > > > > > >>Could not resolve all dependencies for configuration
> > > > > > ':samza-kafka_2.11:compile'. > java.lang.UnsupportedOperationException
> > > > > > (no error message)
> > > > > >
> > > > > > However, when I used gradle 2.8, it was resolved.
> > > > > >
> > > > > > *gradle wrapper --gradle-version 2.8*
> > > > > >
> > > > > > Best,
> > > > > > Jagadish
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Jun 5, 2017 at 8:37 AM, Jake Maes 
> > wrote:
> > > > > >
> > > > > > > This is a call for a vote on a release of Apache Samza 0.13.0. Thanks to
> > > > > > > everyone who has contributed to this release. We are very glad to see some
> > > > > > > new contributors and features in this release.
> > > > > > >
> > > > > > > The release candidate can be downloaded from here:
> > > > > > > http://home.apache.org/~jmakes/samza-0.13.0-rc6/
> > > > > > >
> > > > > > > The release candidate is signed with pgp key 940AFC5A, which can be found
> > > > > > > on keyservers:
> > > > > > > http://pgp.mit.edu/pks/lookup?op=get&search=0x940AFC5A
> > > > > > >
> > > > > > > The git tag is release-0.13.0-rc6 and signed with the same pgp key:
> > > > > > > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-0.13.0-rc6
> > > > > > >
> > > > > > > Test binaries have been published to Maven's staging repository, and are
> > > > > > > available here:
> > > > > > > https://repository.apache.org/content/repositories/orgapachesamza-1026
> > > > > > >
> > > > > > > 144 issues were resolved for this release:
> > > > > > > https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20AND%20fixVersion%20in%20(0.13%2C%200.13.0)%20AND%20status%20in%20(Resolved%2C%20Closed)
> > > > > > >
> > > > > > > The vote will be open for 72 hours (ending at 9:00AM Thursday, 06/08/2017).
> > > > > > >
> > > > > > > Please download the release candidate, check the hashes/signature, build it
> > > > > > > and test it, and then please vote:
> > > > > > >
> > > > > > > [ ] +1 approve
> > > > > > >
> > > > > > > [ ] +0 no opinion
> > > > > > >
> > > > > > > [ ] -1 disapprove (and reason why)
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Jagadish V,
> > > > > > Graduate Student,
> > > > > > Department of Computer Science,
> > > > > > Stanford University
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Wiki Spam

2017-06-07 Thread Navina Ramesh
> Given that admin privs are handed out to PMCs along with explicit
> instructions not to change the permissions for the anonymous user, I'd
> like to understand what went wrong in this case (with a view to ensuring
> it doesn't happen again) before re-enabling admin permissions.

Agreed. Afaik, there are only 2 "active" PMCs in our project and I don't
believe either of us gave permissions for anonymous user.

> There were also a bunch of people who are neither PMC members nor
committers who had admin privs on your space. I'd very much prefer to
see admin privs limited to active PMC members and committers moving
forwards.

Yes. This was a mistake on our part; we should have been more cautious about
the permissions we grant to contributors. Going forward, we want to correct
these permission grants. We just want to make sure there is an avenue for
us to request permissions.

Thanks!

On Wed, Jun 7, 2017 at 12:47 PM, Mark Thomas  wrote:

> On 07/06/17 18:04, Jagadish Venkatraman wrote:
> > Hi Mark,
> >
> > Thanks for bringing this to our notice.
> >
> >>> This is because someone, going against ASF infrastructure policy,
> > altered the permissions for the anonymous user allowing them write
> > permissions
> >
> > Do we know when this occurred? I presume this was a lapse.
>
> It looks as if it was around the beginning of last month based on the
> dates of the pages I removed.
>
> >
> >>>  A samza-dev user has been created and configured to watch the
> > Samza wiki space for changes
> >
> > Sounds great! Does that mean that notifications for changes in the Samza
> > wiki space will now be sent to this mailing list?
>
> This wasn't working. It looks like those notifications will need to go
> to the commits list. I'll get that changed shortly and see if that fixes
> the problem.
>
> >>>  All users currently assigned permissions on the Samza wiki have had
> all
> > their permissions revoked except for viewing.
> >
> > We will re-assess all permissions, and set them up again.  I'm assuming
> > PMCs will still be able to do this?
>
> Not at the moment. PMC members currently have read access only.
>
> Given that admin privs are handed out to PMCs along with explicit
> instructions not to change the permissions for the anonymous user, I'd
> like to understand what went wrong in this case (with a view to ensuring
> it doesn't happen again) before re-enabling admin permissions.
>
> There were also a bunch of people who are neither PMC members nor
> committers who had admin privs on your space. I'd very much prefer to
> see admin privs limited to active PMC members and committers moving
> forwards.
>
> Mark
>
>
> >
> > Best,
> > Jagadish
> >
> > On Wed, Jun 7, 2017 at 6:13 AM, Mark Thomas wrote:
> >
> > Dear Samza developer community,
> >
> > It has been brought to the infrastructure team's attention that your
> > wiki [1] is covered in spam. This is because someone, going against
> ASF
> > infrastructure policy, altered the permissions for the anonymous user
> > allowing them write permissions.
> >
> > During the investigation it was noticed that change notifications for
> > your wiki were not being sent to a public mailing list so that the
> > community could monitor all changes to the wiki.
> >
> > Therefore, the following actions have been taken:
> >
> > - All users currently assigned permissions on the Samza wiki have had
> > all their permissions revoked except for viewing.
> >
> > - A samza-dev user has been created and configured to watch the Samza
> > wiki space for changes
> >
> > Additionally, the spam pages will shortly be removed.
> >
> > Mark
> > on behalf of the ASF infrastructure team
> >
> > [1] https://cwiki.apache.org/confluence/display/SAMZA/Apache+Samza
> > 
> >
> >
>
>


-- 
Navina R.


Re: Wiki Spam

2017-06-07 Thread Navina Ramesh
Hi Mark,
Thanks for letting us know.

We will re-assess our permissions and set them up. Should we reach out to
Gavin to set them up? It would be great if one or more PMC members had
access to assign permissions, to reduce the turnaround time. Please let us
know the procedure.

Thanks!
Navina

On Wed, Jun 7, 2017 at 10:04 AM, Jagadish Venkatraman 
wrote:

> Hi Mark,
>
> Thanks for bringing this to our notice.
>
> >> This is because someone, going against ASF infrastructure policy,
> altered the permissions for the anonymous user allowing them write
> permissions
>
> Do we know when this occurred? I presume this was a lapse.
>
> >>  A samza-dev user has been created and configured to watch the Samza
> wiki
> space for changes
>
> Sounds great! Does that mean that notifications for changes in the Samza
> wiki space will now be sent to this mailing list?
>
> >>  All users currently assigned permissions on the Samza wiki have had all
> their permissions revoked except for viewing.
>
> We will re-assess all permissions, and set them up again.  I'm assuming
> PMCs will still be able to do this?
>
> Best,
> Jagadish
>
> On Wed, Jun 7, 2017 at 6:13 AM, Mark Thomas  wrote:
>
> > Dear Samza developer community,
> >
> > It has been brought to the infrastructure team's attention that your
> > wiki [1] is covered in spam. This is because someone, going against ASF
> > infrastructure policy, altered the permissions for the anonymous user
> > allowing them write permissions.
> >
> > During the investigation it was noticed that change notifications for
> > your wiki were not being sent to a public mailing list so that the
> > community could monitor all changes to the wiki.
> >
> > Therefore, the following actions have been taken:
> >
> > - All users currently assigned permissions on the Samza wiki have had
> > all their permissions revoked except for viewing.
> >
> > - A samza-dev user has been created and configured to watch the Samza
> > wiki space for changes
> >
> > Additionally, the spam pages will shortly be removed.
> >
> > Mark
> > on behalf of the ASF infrastructure team
> >
> > [1] https://cwiki.apache.org/confluence/display/SAMZA/Apache+Samza
> >
>



-- 
Navina R.


Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-05-31 Thread Navina Ramesh (Apache)
 at the end of the Motivation section that "*The feature of
> > task expansion is out of the scope of this proposal and will be addressed
> > in a future SEP*". The second paragraph in the Motivation section is
> mainly
> > used to explain the thinking process that we have gone through, what
> other
> > alternative we have considered, and we plan to do in Samza in the next
> step.
> >
> > To answer your question why increasing the partition number will increase
> > the throughput of the kafka consumer in the container, Kafka consumer can
> > potentially fetch more data in one FetchResponse with more partitions in
> > the FetchRequest. This is because we limit the maximum amount of data
> that
> > can be fetched for a given partition in the FetchResponse. This by default
> is
> > set to 1 MB. And there is a reason that we can not arbitrarily bump up this
> > limit.
> >
> > To answer your question how partition expansion in Kafka impacts the
> > clients, Kafka consumer is able to automatically detect new partition of
> > the topic and reassign all (both old and new) partitions across consumers
> > in the consumer group IF you tell consumer the topic to be subscribed.
> But
> > consumer in Samza's container uses another way of subscription. Instead
> of
> > subscribing to the topic, the consumer in Samza's container subscribes to
> > the specific partitions of the topic. In this case, if new partitions
> have
> > been added, Samza will need to explicitly subscribe to the new partitions
> > of the topic. The "Handle partition expansion while tasks are running"
> > section in the SEP addresses this issue in Samza -- it recalculates the
> job
> > model and restarts the container so that the consumer can subscribe to the new
> > partitions.
> >
> > I will ask other dev to take a look at the proposal. I will start the
> > voting thread tomorrow if there is no further concern with the SEP.
> >
> > Thanks!
> > Dong
> >
> >
> > On Wed, May 31, 2017 at 12:01 AM, Navina Ramesh (Apache) <
> > nav...@apache.org>
> > wrote:
> >
> > > Hey Dong,
> > >
> > > >  I have updated the motivation section to clarify this.
> > >
> > > Thanks for updating the motivation. Couple of notes here:
> > >
> > > 1.
> > > > "The motivation of increasing partition number of Kafka topic
> includes
> > 1)
> > > limit the maximum size of a partition in order to improve broker
> > > performance and 2) increase throughput of Kafka consumer in the Samza
> > > container."
> > >
> > > It's unclear to me how increasing the partition number will increase
> the
> > > throughput of the kafka consumer in the container? Theoretically, you
> > will
> > > still be consuming the same amount of data in the container,
> irrespective
> > > of whether it is coming from one partition or more than one expanded
> > > partitions. Can you please explain it for me here, what you mean by
> that?
> > >
> > > 2. I believe the second paragraph under motivation is simply talking
> > about
> > > the scope of the current SEP. It will be easier to read if it states what
> solution
> > is
> > > included in this SEP and what is left out as not in scope. (for
> example,
> > > expansions for stateful jobs is supported or not).
> > >
> > > > We need to persist the task-to-sspList mapping in the
> > > coordinator stream so that the job can derive the original number of
> > > partitions of each input stream regardless of how many times the
> > partition
> > > has expanded. Does this make sense?
> > >
> > > Yes. It does!
> > >
> > > > I am not sure how this is related to the locality though. Can you
> > clarify
> > > your question if I haven't answered your question?
> > >
> > > It's not related. I just meant to give an example of yet another
> > > coordinator message that is persisted. Your ssp-to-task mapping is
> > > following a similar pattern for persisting. Just wanted to clarify
> that.
> > >
> > > > Can you let me know if this, together with the answers in the
> previous
> > > email, addresses all your questions?
> > >
> > > Yes. I believe you have addressed most of my questions. Thanks for
> taking
> > > time to do that.
> > >
> > > > Is there specific question you have regarding partition
> > > expansion in Kafka?
> > >
> > >
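
A rough illustration of the fetch-size argument in the quoted reply: if each partition is capped at 1 MB per FetchResponse, the ceiling on one response grows with the number of partitions in the FetchRequest. This is only a sketch under that assumption; the function and constant names are made up for illustration, and other consumer or broker limits may cap the response further in practice.

```python
# Per-partition cap quoted in the thread (1 MB by default).
PER_PARTITION_FETCH_LIMIT_BYTES = 1024 * 1024


def max_fetch_response_bytes(partitions_in_request,
                             per_partition_limit=PER_PARTITION_FETCH_LIMIT_BYTES):
    """Upper bound on the data one fetch can return, ignoring any other limits."""
    if partitions_in_request < 0:
        raise ValueError("partition count must be non-negative")
    return partitions_in_request * per_partition_limit


# A container reading 4 partitions vs. the same topic expanded to 8 partitions.
before = max_fetch_response_bytes(4)
after = max_fetch_response_bytes(8)
print(before, after)
```

Doubling the partitions doubles the ceiling per round trip; whether real throughput improves depends on how much data is actually available to fetch.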

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-05-31 Thread Navina Ramesh (Apache)
> On Wed, May 24, 2017 at 11:15 PM, Dong Lin <lindon...@gmail.com> wrote:
>
> > Hey Navina,
> >
> > Thanks much for your comments. Please see my reply inline.
> >
> > On Wed, May 24, 2017 at 10:22 AM, Navina Ramesh (Apache) <
> > nav...@apache.org> wrote:
> >
> >> Thanks for the SEP, Dong. I have a couple of questions to understand
> your
> >> proposal better:
> >>
> >> * Under motivation, you mention that "_We expect this solution to work
> >> similarly with other input system as well._", yet I don't see any
> >> discussion on how it will work with other input systems. That is, what
> >> kind
> >> of contract does samza expect from other input systems ? If we are not
> >> planning to provide a generic solution, it might be worth calling it out
> >> in
> >> the SEP.
> >>
> >
> > I think the contract we expect from other systems is exactly the
> > operational requirement mentioned in the SEP, i.e. partitions should
> always
> > be doubled and the hash algorithm should be modulo the number of partitions.
> > SEP-5 should also allow partition expansion of all input systems that
> meet
> > these two requirements. I have updated the motivation section to clarify
> > this.
> >
> >
> >>
> >> * I understand the partition mapping logic you have proposed. But I
> think
> >> the example explanation doesn't match the diagram. In the diagram, after
> >> expansion, partition-0 and partition-1 are pointing to bucket 0 and
> >> partition-3 and partition-4 are pointing to bucket 1. I think the former
> >> has to be partition-0 and partition-2 and the latter, is partition-1 and
> >> partition-3. If I am wrong, please help me understand the logic :)
> >>
> >
> > Good catch. I will update the figure to fix this problem.
> >
> >
> >>
> >> * I don't know how partition expansion in Kafka works. I am familiar
> with
> >> how shard splitting happens in Kinesis - there is hierarchical relation
> >> between the parent and child shards. This way, it will also allow the
> >> shards to be merged back. Iiuc, Kafka only supports partition
> "expansion",
> >> as opposed to "splits". Can you provide some context or link related to
> >> how
> >> partition expansion works in Kafka?
> >>
> >
> > I couldn't find any wiki on partition expansion in Kafka. The partition
> > expansion logic in Kafka is very simple -- it simply adds new partition
> to
> > the existing topic. Is there specific question you have regarding
> partition
> > expansion in Kafka?
> >
> >
> >>
> >> * Are you only recommending that expansion can be supported for samza
> jobs
> >> that use Kafka as input systems **and** configure the SSPGrouper as
> >> GroupByPartitionFixedTaskNum? Sounds to me like this only applies for
> >> GroupByPartition. Please correct me if I am wrong. What is the
> expectation
> >> for custom SSP Groupers?
> >>
> >
> > The expansion can be supported for Samza jobs if the input system meets
> > the operational requirement mentioned above. It doesn't have to use Kafka
> > as input system.
> >
> > The current proposal provides a solution for jobs that currently use
> > GroupByPartition. The proposal can be extended to support jobs that use
> > other groupers that are pre-defined in Samza. The custom SSP grouper needs
> > to handle partition expansion similar to how GroupByPartitionFixedTaskNum
> > handles it and it is users' responsibility to update their custom grouper
> > implementation.
> >
> >
> >>
> >> * Regarding storing SSP-to-Task assignment to coordinator stream: Today,
> >> the JobModel encapsulates the data model in samza which also includes
> >> **TaskModels**. TaskModel, typically shows the task-to-sspList mapping.
> >> What is the reason for using a separate coordinator stream message
> >> *SetSSPTaskMapping*? Is it because the JobModel itself is not persisted
> in
> >> the coordinator stream today?  The reason locality exists outside of the
> >> jobmodel is because *locality* information is written by each container,
> >> whereas it is consumed only by the leader jobcoordinator/AM. In this
> >> case,
> >> the writer of the mapping information and the reader is still the leader
> >> jobcoordinator/AM. So, I want to understand the motivation for this
> >> choice.
> >>
> >

[VOTE] Apache Samza 0.13.0 RC1

2017-05-24 Thread Navina Ramesh (Apache)
Hi everyone,

This is a call for a vote on a release of Apache Samza 0.13.0. Thanks to
everyone who has contributed to this release. We are very glad to see some
new contributors and features in this release.

The release candidate can be downloaded from here:
http://home.apache.org/~navina/samza-0.13.0-rc1/

The release candidate is signed with pgp key 331C8F69, which can be found
on keyservers:
http://pgp.mit.edu/pks/lookup?op=get=0x331C8F69

The git tag is release-0.13.0-rc1 and signed with the same pgp key:
https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-0.13.0-rc1

Test binaries have been published to Maven's staging repository, and are
available here:
https://repository.apache.org/content/repositories/orgapachesamza-1021

137 issues were resolved for this release:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20AND%20fixVersion%20in%20(0.13%2C%200.13.0)%20AND%20status%20in%20(Resolved%2C%20Closed)

The vote will be open for 3 *working* days (ending at 8:00PM Monday,
05/13/2017). We have an extended deadline this time as it is too close to a
long weekend.

Please download the release candidate, check the hashes/signature, build it
and test it, and then please vote:


[ ] +1 approve

[ ] +0 no opinion

[ ] -1 disapprove (and reason why)

Cheers!
Navina


Re: [VOTE] Apache Samza 0.13.0 RC0

2017-05-17 Thread Navina Ramesh (Apache)
Prateek told me that he sent out a cancel email. It didn't reach the
mail-archive I think. Lately, we have had this kind of issue, where emails
are not reaching our dev list.

On Wed, May 17, 2017 at 2:06 PM, Yi Pan <nickpa...@gmail.com> wrote:

> Hi, all,
>
> Based on the conversation above, can we officially cancel this vote?
>
> Thanks!
>
> -Yi
>
> On Mon, May 15, 2017 at 9:31 AM, Ignacio Solis <iso...@igso.net> wrote:
>
> > Thanks!
> >
> > On Mon, May 15, 2017 at 8:00 AM, Navina Ramesh
> > <nram...@linkedin.com.invalid> wrote:
> > > I will try to get the patch out today. Work doesn't look trivial. I am
> on
> > > it.
> > >
> > > Navina
> > >
> > > On May 14, 2017 23:10, "Ignacio Solis" <iso...@igso.net> wrote:
> > >
> > >> We should hold off until it is solved.  How long will it take to fix
> > this?
> > >>
> > >> On Sun, May 14, 2017 at 10:13 PM, Navina Ramesh (Apache)
> > >> <nav...@apache.org> wrote:
> > >> > I just changed the status of this JIRA to "BLOCKER" -
> > >> > https://issues.apache.org/jira/browse/SAMZA-1128
> > >> >
> > >> > This causes a bug in standalone deployment where any failure in the
> > >> barrier
> > >> > protocol stops the scheduled executorservice. Unfortunately,
> > >> > CoordinationUtils creates its own scheduled executorservice, which
> is
> > >> > incorrect. Scheduled ExecutorService is meant to be the working
> queue
> > for
> > >> > the ZkJobCoordinator. This needs to be fixed. Bharath already ran
> into
> > >> this
> > >> > bug during testing on Friday.
> > >> >
> > >> > veto for this release candidate.
> > >> >
> > >> > @Prateek/Jagadish:
> > >> > I recommend sending a "non-vote, testing release candidate" for this
> > >> > release until we complete all pending tasks (includes docs, tests
> > etc).
> > >> It
> > >> > will also be useful to share the pending tasks and their progress.
> In
> > >> case
> > >> > you have already shared it, I might have missed it since some emails
> > are
> > >> > bouncing off my inbox.
> > >> >
> > >> > Thanks!
> > >> > Navina
> > >> >
> > >> > On Sun, May 14, 2017 at 1:30 PM, Boris S <bor...@gmail.com> wrote:
> > >> >
> > >> >> I think we need to add SAMZA-1286 and
> > >> >> SAMZA-1279 to the release .
> > >> >>
> > >> >> On Wed, May 10, 2017 at 7:51 PM, Jagadish Venkatraman <
> > >> jagad...@apache.org
> > >> >> >
> > >> >> wrote:
> > >> >>
> > >> >> > This is a call for a vote on a release of Apache Samza 0.13.0.
> > Thanks
> > >> to
> > >> >> > everyone who has contributed to this release. We are very glad to
> > see
> > >> >> some
> > >> >> > new contributors and features in this release.
> > >> >> >
> > >> >> > The release candidate can be downloaded from here:
> > >> >> > http://home.apache.org/~jagadish/samza-0.13.0-rc0/
> > >> >> >
> > >> >> > The release candidate is signed with pgp key AF81FFBF, which can
> be
> > >> found
> > >> >> > on keyservers:
> > >> >> > http://pgp.mit.edu/pks/lookup?op=get=0xAF81FFBF
> > >> >> >
> > >> >> > The git tag is release-0.13.0-rc0 and signed with the same pgp
> key:
> > >> >> > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=
> > >> >> > refs/tags/release-0.13.0-rc0
> > >> >> >
> > >> >> > Test binaries have been published to Maven's staging repository,
> > and
> > >> are
> > >> >> > available here:
> > >> >> > https://repository.apache.org/content/repositories/
> > >> orgapachesamza-1020
> > >> >> >
> > >> >> > 127 issues were resolved for this release:
> > >> >> > https://issues.apache.org/jira/issues/?jql=project%20%
> > >> >> > 3D%20SAMZA%20AND%20fixVersion%20in%20(0.13%2C%200.13.0)%
> > >> >> > 20AND%20status%20in%20(Resolved%2C%20Closed)
> > >> >> >
> > >> >> > The vote will be open for 72 hours (ending at 8:00PM Saturday,
> > >> >> 05/13/2017).
> > >> >> >
> > >> >> > Please download the release candidate, check the
> hashes/signature,
> > >> build
> > >> >> it
> > >> >> > and test it, and then please vote:
> > >> >> >
> > >> >> >
> > >> >> > [ ] +1 approve
> > >> >> >
> > >> >> > [ ] +0 no opinion
> > >> >> >
> > >> >> > [ ] -1 disapprove (and reason why)
> > >> >> >
> > >> >> >
> > >> >> > +1 from my side for the release.
> > >> >> >
> > >> >> > Cheers!
> > >> >> >
> > >> >>
> > >>
> > >>
> > >>
> > >> --
> > >> Nacho - Ignacio Solis - iso...@igso.net
> > >>
> >
> >
> >
> > --
> > Nacho - Ignacio Solis - iso...@igso.net
> >
>


Re: [DISCUSS] SEP-4: Adjunct Data Store for Unbounded DataSets

2017-05-16 Thread Navina Ramesh (Apache)
Thanks for trying 3 times, Wei. Sorry about the trouble. Not sure where the
problem lies. Looking forward to reviewing your design.

Navina

On Tue, May 16, 2017 at 8:56 AM, Wei Song  wrote:

> Hey everyone,
>
> I created a proposal for SAMZA-1278, Adjunct Data Store
> for Unbounded DataSets, which introduces an automatic mechanism to store
> adjunct data for stream tasks.
>
> https://cwiki.apache.org/confluence/display/SAMZA/Adjunct+Data+Store+for+Unbounded+DataSets
>
> Please review and comments are welcome!
>
> For those who are not actively following the master branch, you may have
> more questions than others. Feel free to ask them here.
>
> P.S. This is the 3rd try. I sent this last week, but apparently no one at
> LinkedIn received it. Including samza-dev here just to be sure.
>
> --
> Thanks,
> -Wei
>


Re: [VOTE] Apache Samza 0.13.0 RC0

2017-05-15 Thread Navina Ramesh
I will try to get the patch out today. Work doesn't look trivial. I am on
it.

Navina

On May 14, 2017 23:10, "Ignacio Solis" <iso...@igso.net> wrote:

> We should hold off until it is solved.  How long will it take to fix this?
>
> On Sun, May 14, 2017 at 10:13 PM, Navina Ramesh (Apache)
> <nav...@apache.org> wrote:
> > I just changed the status of this JIRA to "BLOCKER" -
> > https://issues.apache.org/jira/browse/SAMZA-1128
> >
> > This causes a bug in standalone deployment where any failure in the
> barrier
> > protocol stops the scheduled executorservice. Unfortunately,
> > CoordinationUtils creates its own scheduled executorservice, which is
> > incorrect. Scheduled ExecutorService is meant to be the working queue for
> > the ZkJobCoordinator. This needs to be fixed. Bharath already ran into
> this
> > bug during testing on Friday.
> >
> > veto for this release candidate.
> >
> > @Prateek/Jagadish:
> > I recommend sending a "non-vote, testing release candidate" for this
> > release until we complete all pending tasks (includes docs, tests etc).
> It
> > will also be useful to share the pending tasks and their progress. In
> case
> > you have already shared it, I might have missed it since some emails are
> > bouncing off my inbox.
> >
> > Thanks!
> > Navina
> >
> > On Sun, May 14, 2017 at 1:30 PM, Boris S <bor...@gmail.com> wrote:
> >
> >> I think we need to add SAMZA-1286 and
> >> SAMZA-1279 to the release .
> >>
> >> On Wed, May 10, 2017 at 7:51 PM, Jagadish Venkatraman <
> jagad...@apache.org
> >> >
> >> wrote:
> >>
> >> > This is a call for a vote on a release of Apache Samza 0.13.0. Thanks
> to
> >> > everyone who has contributed to this release. We are very glad to see
> >> some
> >> > new contributors and features in this release.
> >> >
> >> > The release candidate can be downloaded from here:
> >> > http://home.apache.org/~jagadish/samza-0.13.0-rc0/
> >> >
> >> > The release candidate is signed with pgp key AF81FFBF, which can be
> found
> >> > on keyservers:
> >> > http://pgp.mit.edu/pks/lookup?op=get=0xAF81FFBF
> >> >
> >> > The git tag is release-0.13.0-rc0 and signed with the same pgp key:
> >> > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=
> >> > refs/tags/release-0.13.0-rc0
> >> >
> >> > Test binaries have been published to Maven's staging repository, and
> are
> >> > available here:
> >> > https://repository.apache.org/content/repositories/
> orgapachesamza-1020
> >> >
> >> > 127 issues were resolved for this release:
> >> > https://issues.apache.org/jira/issues/?jql=project%20%
> >> > 3D%20SAMZA%20AND%20fixVersion%20in%20(0.13%2C%200.13.0)%
> >> > 20AND%20status%20in%20(Resolved%2C%20Closed)
> >> >
> >> > The vote will be open for 72 hours (ending at 8:00PM Saturday,
> >> 05/13/2017).
> >> >
> >> > Please download the release candidate, check the hashes/signature,
> build
> >> it
> >> > and test it, and then please vote:
> >> >
> >> >
> >> > [ ] +1 approve
> >> >
> >> > [ ] +0 no opinion
> >> >
> >> > [ ] -1 disapprove (and reason why)
> >> >
> >> >
> >> > +1 from my side for the release.
> >> >
> >> > Cheers!
> >> >
> >>
>
>
>
> --
> Nacho - Ignacio Solis - iso...@igso.net
>


Re: [VOTE] Apache Samza 0.13.0 RC0

2017-05-14 Thread Navina Ramesh (Apache)
I just changed the status of this JIRA to "BLOCKER" -
https://issues.apache.org/jira/browse/SAMZA-1128

This causes a bug in standalone deployment where any failure in the barrier
protocol stops the scheduled executorservice. Unfortunately,
CoordinationUtils creates its own scheduled executorservice, which is
incorrect. Scheduled ExecutorService is meant to be the working queue for
the ZkJobCoordinator. This needs to be fixed. Bharath already ran into this
bug during testing on Friday.

veto for this release candidate.

@Prateek/Jagadish:
I recommend sending a "non-vote, testing release candidate" for this
release until we complete all pending tasks (includes docs, tests etc). It
will also be useful to share the pending tasks and their progress. In case
you have already shared it, I might have missed it since some emails are
bouncing off my inbox.

Thanks!
Navina

On Sun, May 14, 2017 at 1:30 PM, Boris S  wrote:

> I think we need to add SAMZA-1286 and
> SAMZA-1279 to the release.
>
> On Wed, May 10, 2017 at 7:51 PM, Jagadish Venkatraman
> wrote:
>
> > This is a call for a vote on a release of Apache Samza 0.13.0. Thanks to
> > everyone who has contributed to this release. We are very glad to see
> some
> > new contributors and features in this release.
> >
> > The release candidate can be downloaded from here:
> > http://home.apache.org/~jagadish/samza-0.13.0-rc0/
> >
> > The release candidate is signed with pgp key AF81FFBF, which can be found
> > on keyservers:
> > http://pgp.mit.edu/pks/lookup?op=get=0xAF81FFBF
> >
> > The git tag is release-0.13.0-rc0 and signed with the same pgp key:
> > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=
> > refs/tags/release-0.13.0-rc0
> >
> > Test binaries have been published to Maven's staging repository, and are
> > available here:
> > https://repository.apache.org/content/repositories/orgapachesamza-1020
> >
> > 127 issues were resolved for this release:
> > https://issues.apache.org/jira/issues/?jql=project%20%
> > 3D%20SAMZA%20AND%20fixVersion%20in%20(0.13%2C%200.13.0)%
> > 20AND%20status%20in%20(Resolved%2C%20Closed)
> >
> > The vote will be open for 72 hours (ending at 8:00PM Saturday,
> 05/13/2017).
> >
> > Please download the release candidate, check the hashes/signature, build
> it
> > and test it, and then please vote:
> >
> >
> > [ ] +1 approve
> >
> > [ ] +0 no opinion
> >
> > [ ] -1 disapprove (and reason why)
> >
> >
> > +1 from my side for the release.
> >
> > Cheers!
> >
>


Re: [VOTE] SEP 3 : Heart-beat mechanism between JobCoordinator and all running containers

2017-05-03 Thread Navina Ramesh
+1 (binding)

Awesome work.

Cheers!
Navina

On Wed, May 3, 2017 at 12:01 PM, Jagadish Venkatraman <
jagadish1...@gmail.com> wrote:

> +1 from my side (as described in my previous email).
>
> Thanks for incorporating all feedback from my previous review.
>
> Nice work!
>
> On Wed, May 3, 2017 at 11:46 AM, Abhishek Shivanna 
> wrote:
>
> > Hey everyone,
> >
> > This is the voting thread for SEP 3: Heart-beat mechanism between
> > JobCoordinator and all running containers
> > The Wiki page that discusses the SEP is:
> > https://cwiki.apache.org/confluence/display/SAMZA/SEP-
> > 3%3A+Heart-beat+mechanism+between+JobCoordinator+and+
> > all+running+containers
> >
> > Please vote.
> >
> > Thanks,
> > Abhishek
> >
>
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>



-- 
Navina R.


Re: [DISCUSS] SEP-3: Heart-beat mechanism between JobCoordinator and all running containers

2017-05-03 Thread Navina Ramesh (Apache)
Hi Abhishek,
I checked your latest proposal in SEP and it looks good to me.

QQ:
> A new ContainerHeartbeatMonitor class that accepts a
ContainerHeartbeatClient (which has the business logic to make heartbeat
checks on the JC endpoint) and a callback.

Are you planning on exposing this monitor class as a public api? What is
the significance of doing so?

> set an environment variable with the "Execution Container ID" during
container launch. This can be read from the container to make requests to
the above endpoint.

Is "Execution Container ID" the name of the environment variable? I don't
think environment variable names can contain whitespace.

> On the container side we start a new thread that periodically polls this
endpoint described above to check if the container is valid. If it's not, we
shutdown the run loop and raise an error (so that the exit code is non 0 so
that YARN reschedules the container)
The plan is to setup a monitor in the LocalContainerRunner class that
schedules a thread to check the above endpoint at regular intervals. On
failure the thread modifies state on the LocalContainerRunner to denote
that there was an error. This state is checked during exit in the
LocalContainerRunner to exit with a non-zero code.

I think the first sentence corresponds to your design. The second one is
more of an implementation detail. You may want to split it up or just
discard one of them. I got confused reading them together because one talks
about adding to container and the other about the ContainerRunner.
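
To make the polling flow quoted above concrete, here is a minimal sketch of a monitor that checks a heartbeat source at a fixed interval and records an error state that the runner can turn into a non-zero exit code. Everything in it is an assumption for illustration: the class name HeartbeatMonitorSketch, the BooleanSupplier standing in for the heartbeat client, and the choice to treat a failed request as "invalid" are not taken from the SEP or from Samza's code.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical sketch only; this is not the ContainerHeartbeatMonitor from the SEP.
final class HeartbeatMonitorSketch {
    // Stand-in for the heartbeat client: returns true while the coordinator
    // still lists this container as valid.
    private final BooleanSupplier containerIsValid;
    // Invoked at most once, when the container is found to be invalid.
    private final Runnable onExpiry;
    private final long periodMs;
    private final AtomicBoolean expired = new AtomicBoolean(false);
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "heartbeat-monitor-sketch");
            t.setDaemon(true); // never keep the JVM alive just for heartbeats
            return t;
        });

    HeartbeatMonitorSketch(BooleanSupplier containerIsValid, Runnable onExpiry, long periodMs) {
        this.containerIsValid = containerIsValid;
        this.onExpiry = onExpiry;
        this.periodMs = periodMs;
    }

    // Starts polling at a fixed rate on a background thread.
    void start() {
        scheduler.scheduleAtFixedRate(this::checkOnce, 0, periodMs, TimeUnit.MILLISECONDS);
    }

    // One heartbeat check. A failed request is treated as "invalid" here,
    // which is an assumption of this sketch.
    void checkOnce() {
        boolean valid;
        try {
            valid = containerIsValid.getAsBoolean();
        } catch (RuntimeException e) {
            valid = false;
        }
        if (!valid && expired.compareAndSet(false, true)) {
            scheduler.shutdown(); // no further checks are needed
            onExpiry.run();       // e.g. ask the runner to shut the container down
        }
    }

    void stop() {
        scheduler.shutdownNow();
    }

    // The runner would consult this on exit; non-zero signals an error.
    int exitCode() {
        return expired.get() ? 1 : 0;
    }
}
```

Keeping the single check in checkOnce() separate from the scheduling in start() means the expiry logic can be exercised without waiting on a timer.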

Design looks pretty elegant and easily portable.

Thanks!
Navina


On Wed, May 3, 2017 at 9:52 AM, Abhishek Shivanna  wrote:

> Hey Jagadish,
>
> Thank you for taking the time to review the design.
> I agree with moving the heartbeat into the LocalContainerRunner instead
> of fitting it into the SamzaContainer. I will update the SEP with the new
> design changes.
> Also agree with the changes to the configuration and choosing suitable
> defaults should be good enough.
>
> Thanks,
> Abhishek
>
>
>
> On Wed, Apr 26, 2017 at 3:23 PM, Jagadish Venkatraman <
> jagadish1...@gmail.com> wrote:
>
> > Hi Abhishek,
> >
> > Heartbeat between the AM and container has been a long awaited Samza
> > feature. It will go a long way in ensuring our reliability! +1 for this
> > SEP.
> >
> > *High level comments:*
> >
> > Currently, the only use-case for the heartbeat mechanism seems to be when
> > running Samza on Yarn. IMHO, it makes sense to pull the heart beat logic
> > into the *LocalContainerRunner* instead of baking it into the
> > *SamzaContainer* class. Long term, we can re-visit this when we have a
> > pluggable liveness detection mechanism.
> >
> > I'm thinking of a flow like this:
> >
> > There is a separate component (or a thread) inside LocalContainerRunner
> > that periodically polls the coordinator, and determines if it should
> > continue running. If the coordinator determines that the container should
> > not run, the *LocalContainerRunner* cleanly shuts-down the container and
> > the process exits with a non-zero exit status.
> >
> > The following nice properties fall out:
> >
> >- We can remove the proposed config *job.container.validator.enabled.
> *
> >- We can also remove the proposed *Killable* interface since
> >*SamzaContainer* (and runLoops) don't have to implement *Killable *
> >anymore. The life-cycle is managed by the *LocalContainerRunner* that
> >started it.
> >
> > *On the proposed public interfaces:*
> >
> > *job.container.validator.enabled:  *I am not in favor of adding this as
> a
> > new public config. IIUC, When running Samza jobs on Yarn, we always want
> > the validator/heartbeats to be enabled. OTOH, when running Samza jobs in
> > standalone mode, we currently do not have a pluggable mechanism for
> > heartbeat.
> >
> > *job.container.schedule.ms : *It does
> > seem that we can pick a sensible default, and be done with it (instead of
> > adding a new config)? Is there a reason this needs to be configurable?
> >
> > *On proposed Killable interface: *
> >
> > Not entirely sure we need this new "*Killable"* interface (esp. given
> that
> > there's currently only one implementation - *SamzaContainer*).
> >
> >- The *LocalContainerRunner* can instead directly invoke shut-down on
> >the *SamzaContainer* when its heart-beat expires. The extra level of
> >indirection (making *SamzaContainer* to implement *Killable*) is
> >probably unnecessary IMHO.
> >
> >
> >- Since, the *LocalContainerRunner* invokes *start/run* on the
> >*SamzaContainer*, it seems simpler also have it invoke *shutdown* on
> the
> >*SamzaContainer. *
> >
> > *Minor Comments:*
> >
> > >> Expose a REST endpoint (eg: /isContainerValid) whose purpose is to get
> > requests from the Samza container periodically and respond back whether
> the
> > container is in the Job Coordinator's current list of valid containers.
> 

Re: Review Request 58866: fixed SAMZA-1248. use processor id for stand alone barrier

2017-05-01 Thread Navina Ramesh via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58866/#review173529
---


Ship it!




- Navina Ramesh


On May 1, 2017, 11:24 p.m., Boris Shkolnik wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58866/
> ---
> 
> (Updated May 1, 2017, 11:24 p.m.)
> 
> 
> Review request for samza and Navina Ramesh.
> 
> 
> Bugs: SAMZA-1248
> https://issues.apache.org/jira/browse/SAMZA-1248
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> use processor id for stand alone barrier
> 
> 
> Diffs
> -
> 
>   samza-core/src/main/java/org/apache/samza/zk/ZkJobCoordinator.java 
> 2535654cee37feeb472517b8673a7bb12b3cc1fc 
>   samza-core/src/main/java/org/apache/samza/zk/ZkUtils.java 
> fee840511fbc19da2e19525a97fcfb5812a70a53 
>   samza-core/src/test/java/org/apache/samza/zk/TestZkUtils.java 
> b8dc2953ead2fb11fa22db5ec30b19a74a779830 
> 
> 
> Diff: https://reviews.apache.org/r/58866/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Boris Shkolnik
> 
>



Re: Review Request 58866: fixed SAMZA-1248. use processor id for stand alone barrier

2017-04-28 Thread Navina Ramesh via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58866/#review173409
---




samza-core/src/main/java/org/apache/samza/zk/ZkJobCoordinator.java
Line 65 (original), 65 (patched)
<https://reviews.apache.org/r/58866/#comment246410>

Why is newJobModel useful? Please add some comments as it is not very 
obvious.


- Navina Ramesh


On April 28, 2017, 11:54 p.m., Boris Shkolnik wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58866/
> ---
> 
> (Updated April 28, 2017, 11:54 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1248
> https://issues.apache.org/jira/browse/SAMZA-1248
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> use processor id for stand alone barrier
> 
> 
> Diffs
> -
> 
>   samza-core/src/main/java/org/apache/samza/zk/ZkJobCoordinator.java 
> 2535654cee37feeb472517b8673a7bb12b3cc1fc 
>   samza-core/src/main/java/org/apache/samza/zk/ZkUtils.java 
> fee840511fbc19da2e19525a97fcfb5812a70a53 
>   samza-core/src/test/java/org/apache/samza/zk/TestZkUtils.java 
> b8dc2953ead2fb11fa22db5ec30b19a74a779830 
> 
> 
> Diff: https://reviews.apache.org/r/58866/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Boris Shkolnik
> 
>



Re: Review Request 58851: SAMZA-1212 - Refactor interaction between StreamProcessor, JobCoordinator and SamzaContainer

2017-04-28 Thread Navina Ramesh via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58851/
---

(Updated April 28, 2017, 6:50 p.m.)


Review request for samza and Prateek Maheshwari.


Bugs: SAMZA-1212
https://issues.apache.org/jira/browse/SAMZA-1212


Repository: samza


Description
---

(Same as PR - https://github.com/apache/samza/pull/148)
See SAMZA-1212 for motivation toward this refactoring.

Changes here are:
- Removed awaitStart (blocking) method in StreamProcessor, JobCoordinator and 
SamzaContainer
- Introduced SamzaContainerListener and JobCoordinatorListener interface 
implemented by StreamProcessor
- Introduced SamzaContainerStatus to handle failures and lifecycle using 
Listener interfaces
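
As a rough illustration of the change list above, the sketch below replaces a blocking "await start" call with listener callbacks and a status field. All names and signatures (ContainerListenerSketch, ContainerStatusSketch, ContainerSketch, ProcessorSketch) are hypothetical; the actual SamzaContainerListener, JobCoordinatorListener and SamzaContainerStatus in the patch may look quite different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener; stands in for the idea of SamzaContainerListener.
interface ContainerListenerSketch {
    void onContainerStart();
    void onContainerStop();
    void onContainerFailed(Throwable cause);
}

// Hypothetical lifecycle states; stands in for the idea of SamzaContainerStatus.
enum ContainerStatusSketch { NOT_STARTED, STARTED, STOPPED, FAILED }

// A toy container that reports lifecycle changes instead of exposing awaitStart().
final class ContainerSketch {
    private final ContainerListenerSketch listener;
    private final Runnable work;
    private volatile ContainerStatusSketch status = ContainerStatusSketch.NOT_STARTED;

    ContainerSketch(ContainerListenerSketch listener, Runnable work) {
        this.listener = listener;
        this.work = work;
    }

    void run() {
        status = ContainerStatusSketch.STARTED;
        listener.onContainerStart();
        try {
            work.run();
        } catch (RuntimeException e) {
            // Failures are reported through the listener rather than rethrown.
            status = ContainerStatusSketch.FAILED;
            listener.onContainerFailed(e);
            return;
        }
        status = ContainerStatusSketch.STOPPED;
        listener.onContainerStop();
    }

    ContainerStatusSketch status() {
        return status;
    }
}

// The processor reacts to callbacks; here it only records them.
final class ProcessorSketch implements ContainerListenerSketch {
    final List<String> events = new ArrayList<>();

    @Override public void onContainerStart() { events.add("start"); }
    @Override public void onContainerStop() { events.add("stop"); }
    @Override public void onContainerFailed(Throwable cause) {
        events.add("failed: " + cause.getMessage());
    }
}
```

With this shape the caller never blocks waiting for start-up; it learns about start, stop and failure through the callbacks and can inspect the status afterwards.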


Diffs
-

  samza-core/src/main/java/org/apache/samza/SamzaContainerStatus.java 
PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/coordinator/JobCoordinator.java 
af2ef6a0338a0f0ab015e615a5dc213941095801 
  
samza-core/src/main/java/org/apache/samza/coordinator/JobCoordinatorFactory.java
 7f7e1ede822cf16b78e6e753ebc083a17ebf2aca 
  
samza-core/src/main/java/org/apache/samza/processor/JobCoordinatorListener.java 
PRE-CREATION 
  
samza-core/src/main/java/org/apache/samza/processor/SamzaContainerController.java
 4af413a14aaa3976f45b0646a3feb745ea3f0e97 
  
samza-core/src/main/java/org/apache/samza/processor/SamzaContainerListener.java 
PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/processor/StreamProcessor.java 
191059443e3d65869207a5f1e11526f97833f468 
  
samza-core/src/main/java/org/apache/samza/processor/StreamProcessorLifecycleListener.java
 7bca074a4d83bb9bc2434b6769ecf39c5694e2f9 
  samza-core/src/main/java/org/apache/samza/runtime/LocalContainerRunner.java 
80350dfc02b577faf0dce00cf5695c23d202ad9c 
  
samza-core/src/main/java/org/apache/samza/standalone/StandaloneJobCoordinator.java
 0d74fb82590ba6f183905c9b0328b16d88adc0ab 
  
samza-core/src/main/java/org/apache/samza/standalone/StandaloneJobCoordinatorFactory.java
 0faeca917aa5fb12acef9fb539d81a01255a0441 
  samza-core/src/main/java/org/apache/samza/zk/ZkBarrierForVersionUpgrade.java 
0afd840dc2083dc78b853423f27776d6b5a2538f 
  samza-core/src/main/java/org/apache/samza/zk/ZkControllerImpl.java 
61f78762a3a1a50687ec00f783685f53d17bd645 
  samza-core/src/main/java/org/apache/samza/zk/ZkJobCoordinator.java 
2535654cee37feeb472517b8673a7bb12b3cc1fc 
  samza-core/src/main/java/org/apache/samza/zk/ZkJobCoordinatorFactory.java 
a44565c083dc73b0f5d56174d82e9ae62136cf02 
  samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
8481c92b5666710edd8381526f824daed4dd27c5 
  samza-core/src/main/scala/org/apache/samza/job/local/ThreadJobFactory.scala 
dcef3af45bf5fe139be7744276adaddac3fb3505 
  samza-core/src/test/java/org/apache/samza/processor/TestStreamProcessor.java 
PRE-CREATION 
  samza-core/src/test/scala/org/apache/samza/container/TestSamzaContainer.scala 
010ff7e85ff1c5e507f3e9fa7d6c196b58d929ab 
  
samza-core/src/test/scala/org/apache/samza/processor/StreamProcessorTestUtils.scala
 PRE-CREATION 
  
samza-kafka/src/test/java/org/apache/samza/system/kafka/TestKafkaSystemAdminJava.java
 a786468722cc49b4b6c3c67d89a6b09f1be4c939 
  
samza-test/src/test/java/org/apache/samza/test/processor/TestStreamProcessor.java
 f37a224f64eec162e60e3a891b257175dbf4ec3c 
  
samza-test/src/test/scala/org/apache/samza/test/integration/StreamTaskTestUtil.scala
 29fb6d3f6e07f356d4a25556221fa76ecdc7bf77 


Diff: https://reviews.apache.org/r/58851/diff/1/


Testing
---

unit tests and ./gradlew clean build


Thanks,

Navina Ramesh



Review Request 58851: SAMZA-1212 - Refactor interaction between StreamProcessor, JobCoordinator and SamzaContainer

2017-04-28 Thread Navina Ramesh via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58851/
---

Review request for samza and Prateek Maheshwari.


Repository: samza


Description
---

(Same as PR - https://github.com/apache/samza/pull/148)
See SAMZA-1212 for motivation toward this refactoring.

Changes here are:
- Removed awaitStart (blocking) method in StreamProcessor, JobCoordinator and 
SamzaContainer
- Introduced SamzaContainerListener and JobCoordinatorListener interface 
implemented by StreamProcessor
- Introduced SamzaContainerStatus to handle failures and lifecycle using 
Listener interfaces


Diffs
-

  samza-core/src/main/java/org/apache/samza/SamzaContainerStatus.java 
PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/coordinator/JobCoordinator.java 
af2ef6a0338a0f0ab015e615a5dc213941095801 
  
samza-core/src/main/java/org/apache/samza/coordinator/JobCoordinatorFactory.java
 7f7e1ede822cf16b78e6e753ebc083a17ebf2aca 
  
samza-core/src/main/java/org/apache/samza/processor/JobCoordinatorListener.java 
PRE-CREATION 
  
samza-core/src/main/java/org/apache/samza/processor/SamzaContainerController.java
 4af413a14aaa3976f45b0646a3feb745ea3f0e97 
  
samza-core/src/main/java/org/apache/samza/processor/SamzaContainerListener.java 
PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/processor/StreamProcessor.java 
191059443e3d65869207a5f1e11526f97833f468 
  
samza-core/src/main/java/org/apache/samza/processor/StreamProcessorLifecycleListener.java
 7bca074a4d83bb9bc2434b6769ecf39c5694e2f9 
  samza-core/src/main/java/org/apache/samza/runtime/LocalContainerRunner.java 
80350dfc02b577faf0dce00cf5695c23d202ad9c 
  
samza-core/src/main/java/org/apache/samza/standalone/StandaloneJobCoordinator.java
 0d74fb82590ba6f183905c9b0328b16d88adc0ab 
  
samza-core/src/main/java/org/apache/samza/standalone/StandaloneJobCoordinatorFactory.java
 0faeca917aa5fb12acef9fb539d81a01255a0441 
  samza-core/src/main/java/org/apache/samza/zk/ZkBarrierForVersionUpgrade.java 
0afd840dc2083dc78b853423f27776d6b5a2538f 
  samza-core/src/main/java/org/apache/samza/zk/ZkControllerImpl.java 
61f78762a3a1a50687ec00f783685f53d17bd645 
  samza-core/src/main/java/org/apache/samza/zk/ZkJobCoordinator.java 
2535654cee37feeb472517b8673a7bb12b3cc1fc 
  samza-core/src/main/java/org/apache/samza/zk/ZkJobCoordinatorFactory.java 
a44565c083dc73b0f5d56174d82e9ae62136cf02 
  samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
8481c92b5666710edd8381526f824daed4dd27c5 
  samza-core/src/main/scala/org/apache/samza/job/local/ThreadJobFactory.scala 
dcef3af45bf5fe139be7744276adaddac3fb3505 
  samza-core/src/test/java/org/apache/samza/processor/TestStreamProcessor.java 
PRE-CREATION 
  samza-core/src/test/scala/org/apache/samza/container/TestSamzaContainer.scala 
010ff7e85ff1c5e507f3e9fa7d6c196b58d929ab 
  
samza-core/src/test/scala/org/apache/samza/processor/StreamProcessorTestUtils.scala
 PRE-CREATION 
  
samza-kafka/src/test/java/org/apache/samza/system/kafka/TestKafkaSystemAdminJava.java
 a786468722cc49b4b6c3c67d89a6b09f1be4c939 
  
samza-test/src/test/java/org/apache/samza/test/processor/TestStreamProcessor.java
 f37a224f64eec162e60e3a891b257175dbf4ec3c 
  
samza-test/src/test/scala/org/apache/samza/test/integration/StreamTaskTestUtil.scala
 29fb6d3f6e07f356d4a25556221fa76ecdc7bf77 


Diff: https://reviews.apache.org/r/58851/diff/1/


Testing
---

unit tests and ./gradlew clean build


Thanks,

Navina Ramesh



Re: [DISCUSS] SEP-2: ApplicationRunner Design

2017-04-21 Thread Navina Ramesh
Hey Yi,
Thanks a lot for your work on this document. I know it must have been
crazy trying to put-together everything in a single doc :)

Here are my comments. Sorry about the delay :(

1. It will be useful to set some background for the benefit of the
community members who haven't been following design docs in the JIRAs. Can
you briefly explain the definition of StreamApplication and how it
translates to jobs through the stack?

2. "Problem" section doesn't seem to describe any problem that
ApplicationRunner is solving :) Imo, ApplicationRunner basically provides a
unified programming pattern for the user to execute StreamApplications
defined using fluent-api or task-level API. I think the problem and
motivation section can use a little bit of re-wording.

3. In the "Overview of ApplicationRunner" section:
* How the components within ApplicationRunner interact isn't very obvious
from the overview image. For example, ExecutionPlanner translates a
"StreamApplication" into an "ExecutionPlan" which is essentially a
specification of the DAG. (Please correct me, if I am wrong here!). The
ExecutionPlan is used by the JobRunner to launch Samza jobs.
* The roles of ExecutionPlanner and JobRunner are fairly well-defined.
StreamManager seems like a util class that helps class-load systems and
create streams. The ExecutionPlan will be consumed by JobRunner and
JobRunner will use StreamManager to create intermediate streams, prior to
launching jobs. It doesn't sound like a StreamManager is a "component" of
the ApplicationRunner.
* What is the role of the RuntimeEnvironment? That has not been explained.
Maybe explaining that will fill the gap in understanding for the readers. I
see that you have tried to explain the flow of control in the code using
the sequence diagram. Perhaps, if we can articulate the
roles/responsibilities of the RuntimeEnvironment, there will not be a need
for the control flow diagram.
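
To illustrate the interaction as I read it in point 3 (planner produces a plan, the job runner creates intermediate streams through the stream manager and then launches jobs), here is a small sketch. Every type here (PlanSketch, PlannerSketch, StreamManagerSketch, JobRunnerSketch) and every method on it is hypothetical and exists only for this example; none of it is taken from the SEP or from Samza's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plan: the intermediate streams and jobs derived from an application.
final class PlanSketch {
    final List<String> intermediateStreams;
    final List<String> jobs;

    PlanSketch(List<String> intermediateStreams, List<String> jobs) {
        this.intermediateStreams = intermediateStreams;
        this.jobs = jobs;
    }
}

// Hypothetical planner: turns an application (named by a string here) into a plan.
interface PlannerSketch {
    PlanSketch plan(String applicationName);
}

// Hypothetical utility that creates streams; it only records the requests.
final class StreamManagerSketch {
    final List<String> created = new ArrayList<>();

    void createStreams(List<String> streams) {
        created.addAll(streams);
    }
}

// Hypothetical runner: consumes the plan, creates streams first, then launches jobs.
final class JobRunnerSketch {
    private final StreamManagerSketch streamManager;
    final List<String> log = new ArrayList<>();

    JobRunnerSketch(StreamManagerSketch streamManager) {
        this.streamManager = streamManager;
    }

    void run(PlanSketch plan) {
        streamManager.createStreams(plan.intermediateStreams);
        for (String stream : plan.intermediateStreams) {
            log.add("create " + stream);
        }
        for (String job : plan.jobs) {
            log.add("launch " + job);
        }
    }
}
```

In this reading the stream manager is a helper owned by the job runner rather than a peer component, which is the distinction raised in the second bullet of point 3.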

4. How is the runtime environment defined by the user? Is it configurable?
Answering these questions in the doc will be useful.

5. In the "Interaction between RuntimeEnvironment and ApplicationRunners"
section:
* Samza container is interacting with the RuntimeEnvironment. Does that
make the RuntimeEnvironment as a shared component between the
LocalApplicationRunner and the SamzaContainer? It doesn't seem to be the
case for RemoteApplicationRunner. So, I am confused as to why it is
different.

6. In general, what does "app.class" config represent?  It seems
straightforward when a "StreamApplication" is defined. Is it applicable
when using low-level task api?

7. Interface definitions:
* Perhaps when you implement this, can you specifically call out whether each
method is blocking or not in the javadoc?

8. Minor nit-picking:
* "ApplicationRunners in Different Execution Environments" -> should it be
RuntimeEnvironments as that is the terminology used in the rest of the
document.
* In the "How this works in standalone deployment" section:
* "Deploy the application to standalone hosts" and *Run run-local-app.sh on
each node to start/stop the local application* are probably just a single
step - Deploy the application to standalone hosts using run-local-app.sh??


General question:
It seems like, even with extensive changes to the interfaces/programming
model, we are still class loading the components for most parts. In such a
world, we are not close to integrating with frameworks that already have a
lifecycle model and can provide instantiated objects directly. For example,
in the Samza as a library use-case, it makes sense for the user to provide
a JmxServer or a taskFactory or a custom metricReporter for the
StreamProcessor. One of the motivations for this case was that most
applications are already running within a servlet/jetty container model
with its own lifecycle. If ApplicationRunner(s) is the unified interface,
doesn't that prohibit Samza from being integrated with such frameworks?

Thanks!
Navina

On Thu, Apr 20, 2017 at 10:06 AM, Jacob Maes  wrote:

> Thanks for the SEP!
>
> +1 on introducing these new components
> -1 on the current definition of their roles (see Design feedback below)
>
> *Design*
>
>- If LocalJobRunner and RemoteJobRunner handle the different methods of
>launching a Job, what additional value do the different types of
>ApplicationRunner and RuntimeEnvironment provide? It seems like a red
> flag
>that all 3 would need to change from environment to environment. It
>indicates that they don't have proper modularity. The
> call-sequence-figures
>support this; LocalApplicationRunner and RemoteApplicationRunner make
> the
>same calls and the diagram only varies after jobRunner.start()
>- As far as I can tell, the only difference between Local and Remote
>ApplicationRunner is that one is blocking and the other is
> non-blocking. If
>that's all they're for then either the names should be changed to
> reflect
>this, or they should be 

Re: [VOTE] Samza Logo

2017-04-18 Thread Navina Ramesh
@Renato:
Thanks for your feedback. Always appreciate help from our contributors :)

@Jagadish:
Thanks for stating your points. It is clear that you are against the
butterfly ones. But are you in support of any of the others? Please vote :)

Thanks!
Navina

On Tue, Apr 18, 2017 at 10:38 AM, Jagadish Venkatraman <
jagadish1...@gmail.com> wrote:

> FWIW, I have a contrarian perspective on this one. Here's my 2 cents:
>
> I'm -1 for having for our logo to do anything with a butterfly.
>
> - Samza and Kafka are separate top-level projects. I do not think the
> connection to Franz Kafka's novel on "metamorphosis", and the fact that a
> salesman named "Samsa" in the novel was transformed to an "insect" should
> dictate our logo.  Agreed, the butterfly is a cute insect remotely
> relatable(?) to stream processing via a convoluted story.
>
> - For a choice of a mascot, I'd much rather have something that signifies
> scale, sturdiness or swiftness instead of a cute butterfly :-)
>
> - The 2 other non-butterfly logos at-least have a "node", "stream",
> "edges", "graph" like feel which I like.
>
> Thanks,
> Jagadish
>
> On Sat, Apr 15, 2017 at 12:24 PM, Jacob Maes <jacob.m...@gmail.com> wrote:
>
> > I think I voted the exact opposite to everyone else in this thread.
> >
> > I don't want anything to do with a butterfly. The metaphor is even
> further
> > removed from the Samsa story than a cockroach, so I think we should give
> up
> > on that. I don't want a mascot; we're not building a university football
> > team. And as animals go, the only slower one I can think of is a sloth,
> so
> > I don't feel a butterfly says "scalable stream processing". This,
> combined
> > with my preference to eschew the color red for logos, puts the red
> > butterfly last.
> >
> > The blue butterfly is a little more abstract and formal looking, but
> still
> > a butterfly, so that is second to last.
> >
> > The other 2 are very close, in my opinion.
> >
> > The one with the circles is reminiscent of orbital loops, which gives me
> > the feeling of scale. It also has the dots at varying places along the
> > lines, which to me conveys the different proportions of input/output
> stream
> > sizes/TTLs. And the cyclical shape could also be used for animations
> > portraying the concept of "reprocessing"
> >
> > The one with the "S" dots reminds me of the Kafka logo without the lines.
> > If the lines are the streams and the dots are processing nodes, then I
> > think it's clever for the Samza logo to be a "negative" of the Kafka one.
> > That's not to say samza is any more related to Kafka than it is; but if
> the
> > Kafka logo says "streams" then to me this Samza logo says "processors"
> >
> > My 2 cents.
> >
> > On Fri, Apr 14, 2017 at 11:05 PM, Ignacio Solis <iso...@igso.net> wrote:
> >
> > > You're making me feel bad for linking that one! :-)
> > >
> > > I don't see it as a maze. To me, that one is like circles that turn,
> > > representing the processing. Like cogs on an engine. The little
> > > circles are like the messages. The concentric circles are like the
> > > streams.
> > >
> > > The red butterfly is my second favorite.
> > >
> > > Vote note:
> > >
> > > Once we close voting we'll look at the actual results.  The way the
> > > ranking gets calculated it you don't vote for a design at all, that
> > > vote does not get factored in. It assumes you have no opinion. So if
> > > somebody votes 5 stars on A and 1 start on B.  And second person only
> > > votes 5 stars on B, then the ranking would be  A-5 stars, B-3 stars.
> > > (or something along those lines).   So if you only vote on the 5 star
> > > ones, you're missing your vote on the ones you don't like.
> > >
> > > So, once we close, we'll see how people voted.
> > >
> > > Nacho
> > >
> > >
> > > On Fri, Apr 14, 2017 at 9:21 PM, Yi Pan <nickpa...@gmail.com> wrote:
> > > > Really? The one with the maze on the left currently is top one? I
> can't
> > > > relate to that either. My favorite was the logo w/ Taiji symbol.
> Since
> > > that
> > > > did not make the top 4, I am voting for the red butterfly one, same
> as
> > > > Navina.
> > > >
> > > > -Yi
> > > >
> > > > On Fri, Apr 14, 2017 at 3:33 PM, Navina Ramesh
> > > &

Re: [VOTE] Samza Logo

2017-04-14 Thread Navina Ramesh
I prefer to have open discussions in the official mailing list or JIRA
since it is an open-community. It also helps track the discussions.

Fwiw, I am in favor of the red themed butterfly design because:
1. Knowing the origin of the name "Samza" (from Gregor Samsa character in
"Metamorphosis"), it isn't very far-fetched in terms of relating stream
processing to some kind of transformation. Butterfly is probably the
prettiest insect to associate with "metamorphosis", without giving the
impression of a bug :)
2. Red theme ties it with the current logo, although we can improvise on
the gradients.
3. We can have a "mascot" , instead of an abstract symbol.

One comment on the butterfly one - it seems to have only 1 antenna.

-1 for the dots only logo. It feels like a color-blindness test :D
-1 for the blue-based logo - it is just not relatable and it's an extreme
change from the current one.

I couldn't relate to the circular one. What are we trying to portray/imply
here? That we are a bunch of disconnected links?

Thanks!
Navina

On Fri, Apr 14, 2017 at 3:16 PM, Ignacio Solis <iso...@igso.net> wrote:

> Vote directly at design crowd.  But feel free to leave comments here,
> maybe you can try to persuade people or argue for your favorite. :-)
>
> Nacho
>
> On Fri, Apr 14, 2017 at 2:31 PM, Navina Ramesh
> <nram...@linkedin.com.invalid> wrote:
> > Hi Nacho,
> > Do you want us to vote on this mail thread or directly at design crowd?
> >
> > Thanks!
> > Navina
> >
> > On Fri, Apr 14, 2017 at 2:19 PM, Ignacio Solis <iso...@igso.net> wrote:
> >
> >> Hi folks.
> >>
> >> After some feedback and culling, we are down to 4 candidates.  Please
> >> vote on your favorite designs. We will be able to make minor
> >> modifications to the selected logo as we talk to the designer.  We can
> >> always have changes in colors or fonts.
> >>
> >> http://www.designcrowd.com/vote/apachesamzalogo
> >>
> >> This poll will stay open for about a week to collect all votes and
> >> comments.
> >>
> >> For completeness, the relevant JIRA is this:
> >> https://issues.apache.org/jira/browse/SAMZA-
> >>
> >> Cheers,
> >>
> >> Nacho
> >>
> >> --
> >> Nacho - Ignacio Solis - iso...@igso.net
> >>
> >
> >
> >
> > --
> > Navina R.
>
>
>
> --
> Nacho - Ignacio Solis - iso...@igso.net
>



-- 
Navina R.


Re: [VOTE] Samza Logo

2017-04-14 Thread Navina Ramesh
Hi Nacho,
Do you want us to vote on this mail thread or directly at design crowd?

Thanks!
Navina

On Fri, Apr 14, 2017 at 2:19 PM, Ignacio Solis  wrote:

> Hi folks.
>
> After some feedback and culling, we are down to 4 candidates.  Please
> vote on your favorite designs. We will be able to make minor
> modifications to the selected logo as we talk to the designer.  We can
> always have changes in colors or fonts.
>
> http://www.designcrowd.com/vote/apachesamzalogo
>
> This poll will stay open for about a week to collect all votes and
> comments.
>
> For completeness, the relevant JIRA is this:
> https://issues.apache.org/jira/browse/SAMZA-
>
> Cheers,
>
> Nacho
>
> --
> Nacho - Ignacio Solis - iso...@igso.net
>



-- 
Navina R.


Re: Samza Logo designs

2017-04-10 Thread Navina Ramesh
Hi Nacho,
I rated on the designcrowd website directly. In terms of feedback:
1. I like the concept of using a butterfly to indicate "metamorphosis".
However, a lot of the designs there look like a bow-tie :)
2. I think we should stick with a red theme for the name.

Sorry about the late response.

Thanks!
Navina



On Sun, Apr 9, 2017 at 1:11 PM, Ignacio Solis  wrote:

> I'll leave this open one more day for feedback. Then wait a couple of
> days for redesigns, then do a final round.
>
> On Fri, Apr 7, 2017 at 1:11 PM, Ignacio Solis  wrote:
> > Hi folks.
> >
> > I started a Designcrowd campaign for a logo.  I got some initial
> > designs.  I would like some feedback - voting.
> >
> > This is NOT a final design, just a way to get some more feedback for
> > the designers.  We do want to get a design selected before the next
> > release.
> >
> > Please provide feedback to individual designs if you want (along with
> > voting), and provide general feedback here on this mailing list.
> >
> > Things that would be helpful include design ideas, concepts, themes, etc.
> >
> > http://www.designcrowd.com/vote/samza-logo-poll-phase1
> >
> > For completeness, the relevant JIRA is this:
> > https://issues.apache.org/jira/browse/SAMZA-
> >
> > Cheers,
> >
> > Nacho
> >
> > --
> > Nacho - Ignacio Solis - iso...@igso.net
>
>
>
> --
> Nacho - Ignacio Solis - iso...@igso.net
>



-- 
Navina R.


[RESULT] [VOTE] SEP-1: Semantics of ProcessorId in Samza

2017-04-03 Thread Navina Ramesh (Apache)
Hi everyone,

The vote on SEP-1 passes with 7 +1 votes (3 binding) and no -1.

Votes are as follows:
+1 (binding) - Navina Ramesh, Yi Pan, Yan Fang
+1 (non-binding) - Boris Shkolnik, Xinyu Liu, Renato Marroquin Mogrovejo,
Ignacio Solis

The following are the discuss and vote mail threads:
DISCUSS mail thread -
http://mail-archives.apache.org/mod_mbox/samza-dev/201703.mbox/%3CCANazzuuHiO%3DvZQyFbTiYU-0Sfh3riK%3Dz4j_AdCicQ8rBO%3DXuYQ%40mail.gmail.com%3E

VOTE mail thread -
http://mail-archives.apache.org/mod_mbox/samza-dev/201703.mbox/%3CCANazzutAX23PYv3%2BN%2BGkXbDTrF0kvRG5aHRDifX5rJ%3Din0VtzA%40mail.gmail.com%3E

Thanks to everyone who participated.

Cheers!
Navina


Re: [VOTE] SEP-1: Semantics of ProcessorId in Samza

2017-04-03 Thread Navina Ramesh (Apache)
+1 (binding) from me :)

Navina

On Sun, Apr 2, 2017 at 9:31 PM, Ignacio Solis <iso...@igso.net> wrote:

> +1 (non binding)
>
> May this be the first of many SEPs...  I mean just as many as needed. :-)
>
> Nacho
>
> On Sat, Apr 1, 2017 at 1:03 PM, Kartik Paramasivam
> <kparamasi...@linkedin.com.invalid> wrote:
> > +1 (non binding)
> >
> > Great to see the SEP process being followed.
> >
> > cheers
> > Kartik
> >
> > On Thu, Mar 30, 2017 at 1:48 PM, Renato Marroquín Mogrovejo <
> > renatoj.marroq...@gmail.com> wrote:
> >
> >> Thanks for the answers Navina!
> >>
> >> +1 (non-binding)
> >>
> >> 2017-03-30 22:32 GMT+02:00 Navina Ramesh <nram...@linkedin.com.invalid>
> :
> >>
> >> > Hi Renato,
> >> >
> >> > > Having the big proposals documented on SEPs is really great to have
> a
> >> > good understanding on the system!
> >> > I agree. Our previous design process was not being strictly enforced.
> We
> >> > hope to enforce it going forward as there are major changes coming
> into
> >> the
> >> > next release.
> >> >
> >> > > So this means that inside a container there will be a single
> processor?
> >> > StreamProcessor is nothing more than a Samza container, along with an
> >> > instance of JobCoordinator in it. Think about it as a thin-wrapper
> around
> >> > SamzaContainer and JobCoordinator instance. You can find more details
> on
> >> > this idea here - https://issues.apache.org/jira/browse/SAMZA-1063
> >> > Going forward, we want a Samza job to consist of one or more
> >> > StreamProcessors, instead of N SamzaContainers and 1 AppMaster.
> >> >
> >> > >  is this related to SAMZA-1080 somehow?
> >> > Yep. SAMZA-1080 introduces StreamProcessor with an almost pass-through
> >> > JobCoordinator. In fact, at LinkedIn, one of the teams is already
> using
> >> > this API with the StandaloneJobCoordinator and delegating partition
> >> > distribution to kafka high-level consumer (since systemconsumer is
> >> > pluggable in Samza, we have some internal wrappers around high-level
> >> > consumer). It has been working really well for stateless
> applications, I
> >> > believe.
> >> >
> >> > Cheers!
> >> > Navina
> >> >
> >> > On Thu, Mar 30, 2017 at 1:23 PM, Renato Marroquín Mogrovejo <
> >> > renatoj.marroq...@gmail.com> wrote:
> >> >
> >> > > Hi Navina,
> >> > >
> >> > > Thanks for the great proposal! Having the big proposals documented
> on
> >> > SEPs
> >> > > is really great to have a good understanding on the system!
> >> > > I have only a clarification question, the proposal states that every
> >> > > containerId is the same as the processorId. So this means that
> inside a
> >> > > container there will be a single processor? is this related to
> >> SAMZA-1080
> >> > > somehow?
> >> > >
> >> > >
> >> > > Best,
> >> > >
> >> > > Renato M.
> >> > >
> >> > > 2017-03-30 20:45 GMT+02:00 Navina Ramesh
> <nram...@linkedin.com.invalid
> >> >:
> >> > >
> >> > > > Hi Yi,
> >> > > > Good question. Three reasons:
> >> > > >
> >> > > > 1. In SAMZA-881, we came up with a set of responsibilities for the
> >> > > > JobCoordinator. One of them was to generate/assign processorId.
> So,
> >> it
> >> > > > makes sense to keep getProcessorId() within JobCoordinator
> interface.
> >> > > > 2. StreamProcessor was initially introduced as a user-facing API in
> >> > > > SAMZA-1080. ProcessorId was an argument in StreamProcessor constructor.
> >> > > > It was pushing the burden of guaranteeing uniqueness among the
> >> > > > processors of a job to the user. This was not favorable.
> >> > > > 3. In general, I think we have consensus that the processorIdGenerator
> >> > > > is going to be specific to a runtime environment. Hence, it seems more
> >> > > > appropriate to move it to a lower abstraction layer tha

Re: [VOTE] SEP-1: Semantics of ProcessorId in Samza

2017-03-30 Thread Navina Ramesh
Hi Renato,

> Having the big proposals documented on SEPs is really great to have a
good understanding on the system!
I agree. Our previous design process was not being strictly enforced. We
hope to enforce it going forward as there are major changes coming into the
next release.

> So this means that inside a container there will be a single processor?
StreamProcessor is nothing more than a Samza container, along with an
instance of JobCoordinator in it. Think about it as a thin-wrapper around
SamzaContainer and JobCoordinator instance. You can find more details on
this idea here - https://issues.apache.org/jira/browse/SAMZA-1063
Going forward, we want a Samza job to consist of one or more
StreamProcessors, instead of N SamzaContainers and 1 AppMaster.

>  is this related to SAMZA-1080 somehow?
Yep. SAMZA-1080 introduces StreamProcessor with an almost pass-through
JobCoordinator. In fact, at LinkedIn, one of the teams is already using
this API with the StandaloneJobCoordinator and delegating partition
distribution to kafka high-level consumer (since systemconsumer is
pluggable in Samza, we have some internal wrappers around high-level
consumer). It has been working really well for stateless applications, I
believe.
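
The "thin wrapper" idea above can be sketched roughly as follows. This is a minimal illustration only: the interface and class names (SketchJobCoordinator, SketchContainer, SketchStreamProcessor) are hypothetical stand-ins, not the real Samza types, and the start/stop ordering is an assumption rather than something stated in this thread.

```java
// Hypothetical stand-ins for illustration; these are not the real Samza types.
interface SketchJobCoordinator {
    void start();
    void stop();
}

interface SketchContainer {
    void run();
    void shutdown();
}

// A thin wrapper that owns exactly one coordinator and one container.
final class SketchStreamProcessor {
    private final SketchJobCoordinator coordinator;
    private final SketchContainer container;

    SketchStreamProcessor(SketchJobCoordinator coordinator, SketchContainer container) {
        this.coordinator = coordinator;
        this.container = container;
    }

    // Assumed ordering: join coordination first, then start processing.
    void start() {
        coordinator.start();
        container.run();
    }

    // Assumed ordering: stop processing before leaving coordination.
    void stop() {
        container.shutdown();
        coordinator.stop();
    }
}
```

Under this sketch, a job would simply be one or more such processor instances, each carrying its own coordinator, rather than N containers plus a separate AppMaster.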

Cheers!
Navina

On Thu, Mar 30, 2017 at 1:23 PM, Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:

> Hi Navina,
>
> Thanks for the great proposal! Having the big proposals documented on SEPs
> is really great to have a good understanding on the system!
> I have only a clarification question, the proposal states that every
> containerId is the same as the processorId. So this means that inside a
> container there will be a single processor? is this related to SAMZA-1080
> somehow?
>
>
> Best,
>
> Renato M.
>
> 2017-03-30 20:45 GMT+02:00 Navina Ramesh <nram...@linkedin.com.invalid>:
>
> > Hi Yi,
> > Good question. Three reasons:
> >
> > 1. In SAMZA-881, we came up with a set of responsibilities for the
> > JobCoordinator. One of them was to generate/assign processorId. So, it
> > makes sense to keep getProcessorId() within JobCoordinator interface.
> > 2. StreamProcessor was initially introduced as a user-facing API in
> > SAMZA-1080. ProcessorId was an argument in StreamProcessor constructor. It
> > was pushing the burden of guaranteeing uniqueness among the processors of a
> > job to the user. This was not favorable.
> > 3. In general, I think we have consensus that the processorIdGenerator is
> > going to be specific to a runtime environment. Hence, it seems more
> > appropriate to move it to a lower abstraction layer that deals with the
> > underlying execution environment.
> >
> > Let me know if you have a different perspective on this.
> >
> > Cheers!
> > Navina
> >
> > On Thu, Mar 30, 2017 at 9:42 AM, Yi Pan <nickpa...@gmail.com> wrote:
> >
> > > @Navina,
> > >
> > > Sorry to chime in late. One question:
> > > 1. Why is it in JobCoordinator, and why not in StreamProcessor class?
> > > Because JobCoordinator provides coordination service across many
> > > processors, an interface getProcessorId() in JobCoordinator is
> confusing
> > > regarding to which processorId we are getting.
> > >
> > > Otherwise, the proposal looks good.
> > >
> > > -Yi
> > >
> > > On Wed, Mar 29, 2017 at 7:57 PM, Navina Ramesh
> > > <nram...@linkedin.com.invalid
> > > > wrote:
> > >
> > > > Good to hear from you, Yan. Thanks! :)
> > > >
> > > > On Wed, Mar 29, 2017 at 7:48 PM, Yan Fang <yanfang...@gmail.com>
> > wrote:
> > > >
> > > > > +1 . Thanks for the proposal, Navina. :)
> > > > >
> > > > > Fang, Yan
> > > > > yanfang...@gmail.com
> > > > >
> > > > > On Thu, Mar 30, 2017 at 4:24 AM, Prateek Maheshwari <
> > > > > pmaheshw...@linkedin.com.invalid> wrote:
> > > > >
> > > > > > +1 (non binding) from me.
> > > > > >
> > > > > > - Prateek
> > > > > >
> > > > > > On Tue, Mar 28, 2017 at 2:17 PM, Boris S <bor...@gmail.com>
> wrote:
> > > > > >
> > > > > > > +1 Looks good to me.
> > > > > > >
> > > > > > > On Tue, Mar 28, 2017 at 2:00 PM, xinyu liu <
> > xinyuliu...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > +1 on my side. Very happy to see this proposal. This is a
> > blocker
> &g

Re: Steps to Upgrading Samza (0.9 to 0.12)

2017-03-30 Thread Navina Ramesh
Hi everyone,
Apologies for chiming in late on this issue again.

> I'm not sure I agree with the policy (removing migration code and wanting
people to upgrade seem at odds to me), but minimally I think we should not
assume people are upgrading to each new Samza version.

I agree that we should not assume that people will upgrade by stepping
through each version of Samza. However, I don't agree that migration code
should not be removed at all. Thinking in terms of project management and
maintenance, I think it is a common practice (at least in companies, if not
in open-source and I could be wrong too :D ) to keep migration code only
for the version it applies to. It does add significant overhead to maintain
version upgrade/migration code across all future versions.

In this case, this was the first time we tried "automatic upgrade" from one
version to the other (0.9 -> 0.10). We could have done a better job at
documenting the upgrade steps with each version. I wish we had more
outspoken voices in the community sooner than later :)

Every project takes time to iron out issues related to release and version
upgrade. I am glad that we have so much feedback now. As Yi suggested, the
SEP process is a starting step towards documenting our changes across
versions. Additionally, we will work on adding a dedicated page for
upgrades and these will be available for all of the *upcoming* versions.

Please let us know if you have any other concerns or ideas on how we can
improve on our process.

@XiaoChuan: Unfortunately, we don't have proper documentation on upgrading
Samza across various versions. Like I mentioned before, we will put in
extra efforts going forward. There aren't any migration/upgrade steps
needed for versions post 0.10.*. You should be able to simply upgrade
without any issues. Upgrade from 0.9 to 0.10 is an exceptional case. Happy
to help you out in case you encounter more issues.
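
As a rough illustration of the upgrade advice in this thread, here is a minimal Java sketch that plans the upgrade hops. The class and method names (UpgradePlanner, planHops) are hypothetical and not part of Samza. The only rule encoded is the one discussed here: with changelog-backed stores, stop at 0.10.0 when moving from a pre-0.10 release to a later one. It also assumes 0.x version strings such as "0.9.1".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Samza.
final class UpgradePlanner {
    private UpgradePlanner() {}

    // Extracts the minor number from a 0.x version string, e.g. "0.9.1" -> 9.
    static int minorOf(String version) {
        String[] parts = version.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected a version like 0.9.1, got: " + version);
        }
        return Integer.parseInt(parts[1]);
    }

    // Returns the ordered list of versions to deploy, starting with the current one.
    static List<String> planHops(String from, String to, boolean hasChangelogStores) {
        List<String> hops = new ArrayList<>();
        hops.add(from);
        // The 0.9 -> 0.10 migration code is only present in the 0.10 line.
        boolean skipsMigrationRelease = minorOf(from) < 10 && minorOf(to) > 10;
        if (skipsMigrationRelease && hasChangelogStores) {
            hops.add("0.10.0");
        }
        hops.add(to);
        return hops;
    }
}
```

For the setup described in this thread (0.9.1 with changelog-backed stores, target 0.12.0), the sketch yields three entries: the current version, the intermediate 0.10.0 stop, and the target.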

Cheers!
Navina

On Thu, Mar 30, 2017 at 11:04 AM, XiaoChuan Yu  wrote:

> Is there some sort of document on how to upgrade Samza through various
> versions like the page here for Kafka:
> https://kafka.apache.org/documentation/#upgrade ?
> Having something like this would be ideal.
> On Thu, Mar 30, 2017 at 1:51 PM Thomas Becker  wrote:
>
> > Thanks for the reply Yi, and I apologize if I came off a bit snarky.
> > I'm not sure I agree with the policy (removing migration code and
> > wanting people to upgrade seem at odds to me), but minimally I think we
> > should not assume people are upgrading to each new Samza version. We
> > have done so when features or fixes warrant, and even then on a per-job
> > basis, and I would expect this is a common practice.
> >
> > -Tommy
> >
> > On Thu, 2017-03-30 at 09:50 -0700, Yi Pan wrote:
> > > Hi, Thomas,
> > >
> > > Sorry to hear that you were hit by the removal of migration in Samza
> > > 0.11.
> > > The reason we removed it is following a deprecate-removal policy in
> > > two
> > > versions. We are not aware that people still using 0.9 after we
> > > released
> > > 0.11 and were not expecting a direct upgrade from 0.9 to 0.12.
> > > Document can
> > > be better to capture that. We are making changes to the design
> > > proposal
> > > s.t. it is more transparent and open to the whole community, through
> > > the
> > > newly proposed SEP process. These kind of breaking changes will go
> > > through
> > > the SEP discuss-vote process in the future and hopefully capture all
> > > these
> > > kind of concerns earlier.
> > >
> > > Best!
> > >
> > > -Yi
> > >
> > > On Thu, Mar 30, 2017 at 7:45 AM, Thomas Becker 
> > > wrote:
> > >
> > > >
> > > > Yes, we were burned by this. The changelog mapping will be
> > > > regenerated
> > > > instead of migrated and the result will completely hose the job
> > > > (because the mapping was not generated deterministically in
> > > > previous
> > > > versions of Samza). I don't understand why the migration code was
> > > > removed but it was, and to the best of my knowledge the necessity
> > > > to
> > > > not skip version 0.10.0 when upgrading was not documented, let
> > > > alone
> > > > enforced.
> > > >
> > > > On Mon, 2017-03-27 at 10:07 -0700, Jagadish Venkatraman wrote:
> > > > >
> > > > > Good observation Jake!
> > > > >
> > > > > The code for migration was removed in Samza 11. The migration
> > > > > would
> > > > > read
> > > > > change-log offsets from the checkpoint topic and write them to
> > > > > the
> > > > > coordinator stream.
> > > > >
> > > > > If you're using change-logged stores, I'd recommend upgrading
> > > > > from
> > > > > 0.9.1 to
> > > > > 0.10.0 first.
> > > > > > Otherwise, you will lose offsets for change-logged stores.
> > > > >
> > > > > I suspect you should be okay for 0.10.0 to 0.12 upgrade.
> > > > >
> > > > > > On Mon, Mar 27, 2017 at 9:30 AM, Jacob Maes wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > > As I recall, 

Re: [VOTE] SEP-1: Semantics of ProcessorId in Samza

2017-03-30 Thread Navina Ramesh
Hi Yi,
Good question. Three reasons:

1. In SAMZA-881, we came up with a set of responsibilities for the
JobCoordinator. One of them was to generate/assign processorId. So, it
makes sense to keep getProcessorId() within JobCoordinator interface.
2. StreamProcessor was initially introduced as a user-facing API in
SAMZA-1080. ProcessorId was an argument in StreamProcessor constructor. It
was pushing the burden of guaranteeing uniqueness among the processors of a
job to the user. This was not favorable.
3. In general, I think we have consensus that the processorIdGenerator is
going to be specific to a runtime environment. Hence, it seems more
appropriate to move it to a lower abstraction layer that deals with the
underlying execution environment.
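
The three reasons above could be sketched along these lines. This is a minimal illustration under stated assumptions: IdGeneratorSketch, FixedIdGenerator, HostUuidIdGenerator and CoordinatorSketch are hypothetical names, not the interfaces defined in the SEP, and the two generator strategies are examples of "environment-specific" behavior rather than anything Samza ships.

```java
import java.util.UUID;

// Hypothetical generator abstraction; the real SEP-1 interface may differ.
interface IdGeneratorSketch {
    String generateProcessorId();
}

// Example strategy for an environment that hands out ids itself (e.g. from config).
final class FixedIdGenerator implements IdGeneratorSketch {
    private final String id;

    FixedIdGenerator(String id) {
        this.id = id;
    }

    public String generateProcessorId() {
        return id;
    }
}

// Example strategy for an environment with no central id assignment.
final class HostUuidIdGenerator implements IdGeneratorSketch {
    private final String host;

    HostUuidIdGenerator(String host) {
        this.host = host;
    }

    public String generateProcessorId() {
        return host + "-" + UUID.randomUUID();
    }
}

// The coordinator owns the id, so the user never has to guarantee uniqueness.
final class CoordinatorSketch {
    private final String processorId;

    CoordinatorSketch(IdGeneratorSketch generator) {
        // Generated once so the id stays stable for the coordinator's lifetime.
        this.processorId = generator.generateProcessorId();
    }

    String getProcessorId() {
        return processorId;
    }
}
```

The point of the design is that the caller constructing a processor no longer passes an id in; swapping the runtime environment only means swapping the generator.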

Let me know if you have a different perspective on this.

Cheers!
Navina

On Thu, Mar 30, 2017 at 9:42 AM, Yi Pan <nickpa...@gmail.com> wrote:

> @Navina,
>
> Sorry to chime in late. One question:
> 1. Why is it in JobCoordinator, and why not in StreamProcessor class?
> Because JobCoordinator provides coordination service across many
> processors, an interface getProcessorId() in JobCoordinator is confusing
> regarding to which processorId we are getting.
>
> Otherwise, the proposal looks good.
>
> -Yi
>
> On Wed, Mar 29, 2017 at 7:57 PM, Navina Ramesh
> <nram...@linkedin.com.invalid
> > wrote:
>
> > Good to hear from you, Yan. Thanks! :)
> >
> > On Wed, Mar 29, 2017 at 7:48 PM, Yan Fang <yanfang...@gmail.com> wrote:
> >
> > > +1 . Thanks for the proposal, Navina. :)
> > >
> > > Fang, Yan
> > > yanfang...@gmail.com
> > >
> > > On Thu, Mar 30, 2017 at 4:24 AM, Prateek Maheshwari <
> > > pmaheshw...@linkedin.com.invalid> wrote:
> > >
> > > > +1 (non binding) from me.
> > > >
> > > > - Prateek
> > > >
> > > > On Tue, Mar 28, 2017 at 2:17 PM, Boris S <bor...@gmail.com> wrote:
> > > >
> > > > > +1 Looks good to me.
> > > > >
> > > > > On Tue, Mar 28, 2017 at 2:00 PM, xinyu liu <xinyuliu...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > +1 on my side. Very happy to see this proposal. This is a blocker
> > for
> > > > > > integrating fluent API with StreamProcessor, and hopefully we can
> > get
> > > > it
> > > > > > resolved soon :).
> > > > > >
> > > > > > Thanks,
> > > > > > Xinyu
> > > > > >
> > > > > > On Tue, Mar 28, 2017 at 11:28 AM, Navina Ramesh (Apache) <
> > > > > > nav...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > This is a voting thread for SEP-1: Semantics of ProcessorId in
> > > Samza.
> > > > > > > For reference, here is the wiki link:
> > > > > > > https://cwiki.apache.org/confluence/display/SAMZA/SEP-
> > > > > > > 1%3A+Semantics+of+ProcessorId+in+Samza
> > > > > > >
> > > > > > > Link to discussion mail thread:
> > > > > > > http://mail-archives.apache.org/mod_mbox/samza-dev/201703.
> > > > > > > mbox/%3CCANazzuuHiO%3DvZQyFbTiYU-0Sfh3riK%3Dz4j_
> > > > > > AdCicQ8rBO%3DXuYQ%40mail.
> > > > > > > gmail.com%3E
> > > > > > >
> > > > > > > Please vote on this SEP asap. :)
> > > > > > >
> > > > > > > Thanks!
> > > > > > > Navina
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Navina R.
> >
>



-- 
Navina R.


Re: [VOTE] SEP-1: Semantics of ProcessorId in Samza

2017-03-29 Thread Navina Ramesh
Good to hear from you, Yan. Thanks! :)

On Wed, Mar 29, 2017 at 7:48 PM, Yan Fang <yanfang...@gmail.com> wrote:

> +1 . Thanks for the proposal, Navina. :)
>
> Fang, Yan
> yanfang...@gmail.com
>
> On Thu, Mar 30, 2017 at 4:24 AM, Prateek Maheshwari <
> pmaheshw...@linkedin.com.invalid> wrote:
>
> > +1 (non binding) from me.
> >
> > - Prateek
> >
> > On Tue, Mar 28, 2017 at 2:17 PM, Boris S <bor...@gmail.com> wrote:
> >
> > > +1 Looks good to me.
> > >
> > > On Tue, Mar 28, 2017 at 2:00 PM, xinyu liu <xinyuliu...@gmail.com>
> > wrote:
> > >
> > > > +1 on my side. Very happy to see this proposal. This is a blocker for
> > > > integrating fluent API with StreamProcessor, and hopefully we can get
> > it
> > > > resolved soon :).
> > > >
> > > > Thanks,
> > > > Xinyu
> > > >
> > > > On Tue, Mar 28, 2017 at 11:28 AM, Navina Ramesh (Apache) <
> > > > nav...@apache.org>
> > > > wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > This is a voting thread for SEP-1: Semantics of ProcessorId in
> Samza.
> > > > > For reference, here is the wiki link:
> > > > > https://cwiki.apache.org/confluence/display/SAMZA/SEP-
> > > > > 1%3A+Semantics+of+ProcessorId+in+Samza
> > > > >
> > > > > Link to discussion mail thread:
> > > > > http://mail-archives.apache.org/mod_mbox/samza-dev/201703.
> > > > > mbox/%3CCANazzuuHiO%3DvZQyFbTiYU-0Sfh3riK%3Dz4j_
> > > > AdCicQ8rBO%3DXuYQ%40mail.
> > > > > gmail.com%3E
> > > > >
> > > > > Please vote on this SEP asap. :)
> > > > >
> > > > > Thanks!
> > > > > Navina
> > > > >
> > > >
> > >
> >
>



-- 
Navina R.


Re: [VOTE] SEP-1: Semantics of ProcessorId in Samza

2017-03-28 Thread Navina Ramesh
Thanks, Xinyu. I have already implemented a draft. Waiting for the voting
to close soon.

Navina

On Tue, Mar 28, 2017 at 2:17 PM, Boris S <bor...@gmail.com> wrote:

> +1 Looks good to me.
>
> On Tue, Mar 28, 2017 at 2:00 PM, xinyu liu <xinyuliu...@gmail.com> wrote:
>
> > +1 on my side. Very happy to see this proposal. This is a blocker for
> > integrating fluent API with StreamProcessor, and hopefully we can get it
> > resolved soon :).
> >
> > Thanks,
> > Xinyu
> >
> > On Tue, Mar 28, 2017 at 11:28 AM, Navina Ramesh (Apache) <
> > nav...@apache.org>
> > wrote:
> >
> > > Hi everyone,
> > >
> > > This is a voting thread for SEP-1: Semantics of ProcessorId in Samza.
> > > For reference, here is the wiki link:
> > > https://cwiki.apache.org/confluence/display/SAMZA/SEP-
> > > 1%3A+Semantics+of+ProcessorId+in+Samza
> > >
> > > Link to discussion mail thread:
> > > http://mail-archives.apache.org/mod_mbox/samza-dev/201703.
> > > mbox/%3CCANazzuuHiO%3DvZQyFbTiYU-0Sfh3riK%3Dz4j_
> > AdCicQ8rBO%3DXuYQ%40mail.
> > > gmail.com%3E
> > >
> > > Please vote on this SEP asap. :)
> > >
> > > Thanks!
> > > Navina
> > >
> >
>



-- 
Navina R.


[VOTE] SEP-1: Semantics of ProcessorId in Samza

2017-03-28 Thread Navina Ramesh (Apache)
Hi everyone,

This is a voting thread for SEP-1: Semantics of ProcessorId in Samza.
For reference, here is the wiki link:
https://cwiki.apache.org/confluence/display/SAMZA/SEP-1%3A+Semantics+of+ProcessorId+in+Samza

Link to discussion mail thread:
http://mail-archives.apache.org/mod_mbox/samza-dev/201703.mbox/%3CCANazzuuHiO%3DvZQyFbTiYU-0Sfh3riK%3Dz4j_AdCicQ8rBO%3DXuYQ%40mail.gmail.com%3E

Please vote on this SEP asap. :)

Thanks!
Navina


Re: Steps to Upgrading Samza (0.9 to 0.12)

2017-03-27 Thread Navina Ramesh (Apache)
@Jake: Yes. We removed the migration code (for 0.9 to 0.10) in the 0.11
release, I believe.

@XiaoChuan: As per Jagadish's recommendation, if you have changelog backed
stores, you should upgrade from 0.9.1 to 0.10.0 before upgrading to samza
0.12.0.

I checked with LinkedIn's internal release notes. The most significant
change listed is adding a new configuration *job.coordinator.system*. This
system can be the same as your currently configured checkpoint system
(task.checkpoint.system). I am assuming you are using
KafkaCheckpointManagerFactory. If you are using other custom checkpoint
managers, the migration may be more involved. Please let us know and we can
try to help you out.
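
As a small illustration of the configuration change described above, here is a minimal Java sketch that fills in job.coordinator.system from task.checkpoint.system when it is missing. The helper class (CoordinatorSystemConfig) is hypothetical and is not something Samza does for you; reusing the checkpoint system is only one valid choice, as noted above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical pre-upgrade helper for illustration; not part of Samza.
final class CoordinatorSystemConfig {
    static final String COORDINATOR_SYSTEM = "job.coordinator.system";
    static final String CHECKPOINT_SYSTEM = "task.checkpoint.system";

    private CoordinatorSystemConfig() {}

    // Returns a copy of the config with job.coordinator.system set.
    // An explicitly configured value is never overwritten.
    static Map<String, String> withCoordinatorSystem(Map<String, String> config) {
        Map<String, String> updated = new HashMap<>(config);
        if (!updated.containsKey(COORDINATOR_SYSTEM)) {
            String checkpointSystem = updated.get(CHECKPOINT_SYSTEM);
            if (checkpointSystem == null) {
                throw new IllegalStateException(
                    "Neither " + COORDINATOR_SYSTEM + " nor " + CHECKPOINT_SYSTEM + " is set");
            }
            // Reuse the checkpoint system as the coordinator system.
            updated.put(COORDINATOR_SYSTEM, checkpointSystem);
        }
        return updated;
    }
}
```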

Feel free to email us if you have more questions.

Cheers!
Navina

On Mon, Mar 27, 2017 at 10:07 AM, Jagadish Venkatraman <
jagadish1...@gmail.com> wrote:

> Good observation Jake!
>
> The code for migration was removed in Samza 11. The migration would read
> change-log offsets from the checkpoint topic and write them to the
> coordinator stream.
>
> If you're using change-logged stores, I'd recommend upgrading from 0.9.1 to
> 0.10.0 first.
> Otherwise, you will lose offsets for change-logged stores.
>
> I suspect you should be okay for 0.10.0 to 0.12 upgrade.
>
> On Mon, Mar 27, 2017 at 9:30 AM, Jacob Maes  wrote:
>
> > As I recall, samza 0.10 introduced the coordinator stream and there was
> > code to do an automatic migration to use that feature. @navina, @yi, do
> you
> > know if that migration code is still in samza 12?
> >
> > If not, then it's probably better to update from 0.9.1 to 0.10.0 and then
> > to 0.12.0. I don't think there were any changes requiring migration
> between
> > 0.10 and 0.12, so upgrading directly from 0.10 to 0.12 is probably less
> of
> > an issue.
> >
> > On Fri, Mar 24, 2017 at 11:05 PM, Jagadish Venkatraman <
> > jagadish1...@gmail.com> wrote:
> >
> > > Hi Xiaochuan,
> > >
> > > >> Do I need to upgrade Kafka and/or YARN?
> > >
> > > *Yarn version:*
> > >
> > >- Samza 0.12 supports Yarn 2.6.1 and 2.7.1.
> > >- If you already have 2.6.0 installed (as you have said), I believe
> > you
> > >will be fine. (but I'm not sure)
> > >
> > > *Kafka version: *
> > >
> > >- Samza 0.12 upgraded the version of Kafka to 0.10.
> > >- If your Kafka brokers are on an older version of Kafka, you should
> > >upgrade them to use at-least 0.10. Kafka clients are usually
> > >incompatible with older versions of brokers.
> > >
> > > *Java version: *
> > >
> > >
> > >
> > >- Samza 0.12 binaries are compiled using Java 8.  Hence, they cannot
> > be
> > >run on older versions of the Java run-time.
> > >
> > >
> > > >> I'm extremely new to Samza in terms of operations aspect. I'm not
> sure
> > > what
> > > information would be relevant in this case so please ask away.
> > >
> > > I'd first start by upgrading the Kafka brokers (assuming you're on Java
> > 8+
> > > already).
> > > Let us know how the migration goes!
> > >
> > > Thanks,
> > > Jagadish
> > >
> > >
> > > On Fri, Mar 24, 2017 at 8:23 PM, XiaoChuan Yu 
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > What are the general steps for upgrading Samza from 0.9 to 0.12?
> > > > Do I need to upgrade Kafka and/or YARN?
> > > >
> > > > I don't know how Samza was setup initially but we currently have the
> > > > following setup:
> > > >
> > > > Samza version: 0.9.1
> > > > YARN version: Hadoop 2.6.0-cdh5.4.8
> > > > Kafka version: 0.9.0.1
> > > >
> > > > I think installation of Kafka and YARN were managed through Puppet.
> > > > I'm extremely new to Samza in terms of operations aspect. I'm not
> sure
> > > what
> > > > information would be relevant in this case so please ask away.
> > > >
> > > > Thanks,
> > > > Xiaochuan Yu
> > > >
> > >
> > >
> > >
> > > --
> > > Jagadish V,
> > > Graduate Student,
> > > Department of Computer Science,
> > > Stanford University
> > >
> >
>
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>


Re: [DISCUSS] SEP-1: Semantics of ProcessorId in Samza

2017-03-21 Thread Navina Ramesh
Hi everyone,
I have updated the SEP
<https://cwiki.apache.org/confluence/display/SAMZA/SEP-1%3A+Semantics+of+ProcessorId+in+Samza>
based on all the feedback. Feel free to comment.

I will start the [vote] mail thread, if there are no further questions
within the next 24 hours.

Thanks!
Navina

On Tue, Mar 21, 2017 at 10:33 AM, Navina Ramesh (Apache) <nav...@apache.org>
wrote:

> Hi Jagadish,
> Thanks for the suggestion. You are right in that it should be the
> responsibility of the JobCoordinator to assign identifiers.
>
> > I'm only wondering if this logic could instead reside inside the
> Job Coordinator (which is internal to the StreamProcessor) instead of
> relying on something external to it?
>
> I think this is a consequence of our initial StandaloneJobCoordinator,
> which is pretty much a pass-through. I didn't see any usage for
> getProcessorId() and was wondering why we put it in the JobCoordinator
> interface. I think I should keep your design proposal from last year handy
> :) Thanks for pitching in!
>
>
> @All:
> Yesterday, there was a discussion on naming of the configuration used in
> this SEP - whether it should be within the "job" scope or "app" scope
> (introduced by SAMZA-1041
> <https://issues.apache.org/jira/browse/SAMZA-1041>).  Multi-stage feature
> and fluent-api for Samza introduces the notion of "application". Since the
> processorId generator config applies to all jobs within a Samza
> application, we decided to add the config for generator under "app" scope.
> Further details on config scope changes can be found in SAMZA-1120.
> <https://issues.apache.org/jira/browse/SAMZA-1120>
>
> I will send out an update once I change the SEP based on yesterday's
> meeting and Jagadish's idea.
>
> Thanks!
> Navina
>
> On Mon, Mar 20, 2017 at 5:22 PM, Jagadish Venkatraman <
> jagadish1...@gmail.com> wrote:
>
>> Thanks for writing this SEP!
>>
>> Here's an alternate approach instead of taking the "String processorId" as
>> a parameter in the constructor. In my view, the "processorId" could be
>> generated by the StreamProcessor internally (instead of being generated
>> up-stream and passed in). The Job Coordinator API could be as follows:
>>
>>
>> public interface JobCoordinator {
>>
>>  ProcessorIdGenerator getProcessorIDGenerator();
>>
>> // could be String getProcessorID()
>>
>>  JobModel getJobModel();
>>
>> }
>>
>> public interface ProcessorIdGenerator {
>>
>>  String getProcessorID();
>> }
>>
>>
>> For instance, a YARN job coordinator can merely parse the ID from config,
>> and return it. A Zk backed implementation of the Job coordinator can agree
>> on IDs using coordination leveraging Zk. One nice property with this
>> approach is that it keeps all logic related to coordination, agreement on
>> the Job Model, leader election (with potentially pluggable components for
>> each) inside the JobCoordinator.
>>
>> To be clear, I'm all for pluggability for ID generation logic that this
>> SEP
>> advocates. I'm only wondering if this logic could instead reside inside
>> the
>> Job Coordinator (which is internal to the StreamProcessor) instead of
>> relying on something external to it?
>>
>> Of course, there may be other considerations around the way the current
>> code is structured that may prevent this. Let me know if you agree with
>> this change.
>>
>> Thanks,
>> Jag
>>
>>
>> On Thu, Mar 16, 2017 at 5:21 PM, Navina Ramesh
>> <nram...@linkedin.com.invalid
>> > wrote:
>>
>> > > I am working on the ApplicationRunner SEP right now. Will send out the
>> > discussion email once I am done.
>> >
>> > Perfect! :)
>> >
>> > On Thu, Mar 16, 2017 at 5:13 PM, xinyu liu <xinyuliu...@gmail.com>
>> wrote:
>> >
>> > > Right, the static factory is very simple as you said. It's pretty
>> > > convenient for the client to use.
>> > >
>> > > I am working on the ApplicationRunner SEP right now. Will send out the
>> > > discussion email once I am done.
>> > >
>> > > Thanks,
>> > > Xinyu
>> > >
>> > > On Thu, Mar 16, 2017 at 4:50 PM, Navina Ramesh (Apache) <
>> > nav...@apache.org
>> > > >
>> > > wrote:
>> > >
>> > > > > One minor thing I found is that the name of the config is camel
>> case
>> > > > (*processor.idGenerato

Re: [DISCUSS] SEP-1: Semantics of ProcessorId in Samza

2017-03-21 Thread Navina Ramesh (Apache)
Hi everyone,
I have updated the SEP
<https://cwiki.apache.org/confluence/display/SAMZA/SEP-1%3A+Semantics+of+ProcessorId+in+Samza>
based on all the feedback. Feel free to comment.

I will start the [vote] mail thread, if there are no further questions
within the next 24 hours.

Thanks!
Navina

On Tue, Mar 21, 2017 at 10:33 AM, Navina Ramesh (Apache) <nav...@apache.org>
wrote:

> Hi Jagadish,
> Thanks for the suggestion. You are right in that it should be the
> responsibility of the JobCoordinator to assign identifiers.
>
> > I'm only wondering if this logic could instead reside inside the
> Job Coordinator (which is internal to the StreamProcessor) instead of
> relying on something external to it?
>
> I think this is a consequence of our initial StandaloneJobCoordinator,
> which is pretty much a pass-through. I didn't see any usage for
> getProcessorId() and was wondering why we put it in the JobCoordinator
> interface. I think I should keep your design proposal from last year handy
> :) Thanks for pitching in!
>
>
> @All:
> Yesterday, there was a discussion on naming of the configuration used in
> this SEP - whether it should be within the "job" scope or "app" scope
> (introduced by SAMZA-1041
> <https://issues.apache.org/jira/browse/SAMZA-1041>).  Multi-stage feature
> and fluent-api for Samza introduces the notion of "application". Since the
> processorId generator config applies to all jobs within a Samza
> application, we decided to add the config for generator under "app" scope.
> Further details on config scope changes can be found in SAMZA-1120.
> <https://issues.apache.org/jira/browse/SAMZA-1120>
>
> I will send out an update once I change the SEP based on yesterday's
> meeting and Jagadish's idea.
>
> Thanks!
> Navina
>
> On Mon, Mar 20, 2017 at 5:22 PM, Jagadish Venkatraman <
> jagadish1...@gmail.com> wrote:
>
>> Thanks for writing this SEP!
>>
>> Here's an alternate approach instead of taking the "String processorId" as
>> a parameter in the constructor. In my view, the "processorId" could be
>> generated by the StreamProcessor internally (instead of being generated
>> up-stream and passed in). The Job Coordinator API could be as follows:
>>
>>
>> public interface JobCoordinator {
>>
>>  ProcessorIdGenerator getProcessorIDGenerator();
>>
>> // could be String getProcessorID()
>>
>>  JobModel getJobModel();
>>
>> }
>>
>> public interface ProcessorIdGenerator {
>>
>>  String getProcessorID();
>> }
>>
>>
>> For instance, a YARN job coordinator can merely parse the ID from config,
>> and return it. A Zk backed implementation of the Job coordinator can agree
>> on IDs using coordination leveraging Zk. One nice property with this
>> approach is that it keeps all logic related to coordination, agreement on
>> the Job Model, leader election (with potentially pluggable components for
>> each) inside the JobCoordinator.
>>
>> To be clear, I'm all for pluggability for ID generation logic that this
>> SEP
>> advocates. I'm only wondering if this logic could instead reside inside
>> the
>> Job Coordinator (which is internal to the StreamProcessor) instead of
>> relying on something external to it?
>>
>> Of course, there may be other considerations around the way the current
>> code is structured that may prevent this. Let me know if you agree with
>> this change.
>>
>> Thanks,
>> Jag
>>
>>
>> On Thu, Mar 16, 2017 at 5:21 PM, Navina Ramesh
>> <nram...@linkedin.com.invalid
>> > wrote:
>>
>> > > I am working on the ApplicationRunner SEP right now. Will send out the
>> > discussion email once I am done.
>> >
>> > Perfect! :)
>> >
>> > On Thu, Mar 16, 2017 at 5:13 PM, xinyu liu <xinyuliu...@gmail.com>
>> wrote:
>> >
>> > > Right, the static factory is very simple as you said. It's pretty
>> > > convenient for the client to use.
>> > >
>> > > I am working on the ApplicationRunner SEP right now. Will send out the
>> > > discussion email once I am done.
>> > >
>> > > Thanks,
>> > > Xinyu
>> > >
>> > > On Thu, Mar 16, 2017 at 4:50 PM, Navina Ramesh (Apache) <
>> > nav...@apache.org
>> > > >
>> > > wrote:
>> > >
>> > > > > One minor thing I found is that the name of the config is camel
>> case
>> > > > (*processor.idGenerato

Re: [DISCUSS] SEP-1: Semantics of ProcessorId in Samza

2017-03-21 Thread Navina Ramesh (Apache)
Hi Jagadish,
Thanks for the suggestion. You are right in that it should be the
responsibility of the JobCoordinator to assign identifiers.

> I'm only wondering if this logic could instead reside inside the
Job Coordinator (which is internal to the StreamProcessor) instead of
relying on something external to it?

I think this is a consequence of our initial StandaloneJobCoordinator,
which is pretty much a pass-through. I didn't see any usage for
getProcessorId() and was wondering why we put it in the JobCoordinator
interface. I think I should keep your design proposal from last year handy
:) Thanks for pitching in!


@All:
Yesterday, there was a discussion on naming of the configuration used in
this SEP - whether it should be within the "job" scope or "app" scope
(introduced by SAMZA-1041 <https://issues.apache.org/jira/browse/SAMZA-1041>).
The multi-stage feature and fluent API for Samza introduce the notion of an
"application". Since the processorId generator config applies to all jobs
within a Samza application, we decided to add the generator config under the
"app" scope. Further details on config scope changes can be found in
SAMZA-1120 <https://issues.apache.org/jira/browse/SAMZA-1120>.

I will send out an update once I change the SEP based on yesterday's
meeting and Jagadish's idea.

Thanks!
Navina

On Mon, Mar 20, 2017 at 5:22 PM, Jagadish Venkatraman <
jagadish1...@gmail.com> wrote:

> Thanks for writing this SEP!
>
> Here's an alternate approach instead of taking the "String processorId" as
> a parameter in the constructor. In my view, the "processorId" could be
> generated by the StreamProcessor internally (instead of being generated
> up-stream and passed in). The Job Coordinator API could be as follows:
>
>
> public interface JobCoordinator {
>
>   ProcessorIdGenerator getProcessorIdGenerator();
>
>   // could instead be: String getProcessorId()
>
>   JobModel getJobModel();
>
> }
>
> public interface ProcessorIdGenerator {
>
>   String getProcessorId();
> }
>
>
> For instance, a YARN job coordinator can merely parse the ID from config,
> and return it. A Zk backed implementation of the Job coordinator can agree
> on IDs using coordination leveraging Zk. One nice property with this
> approach is that it keeps all logic related to coordination, agreement on
> the Job Model, leader election (with potentially pluggable components for
> each) inside the JobCoordinator.
>
> To be clear, I'm all for pluggability for ID generation logic that this SEP
> advocates. I'm only wondering if this logic could instead reside inside the
> Job Coordinator (which is internal to the StreamProcessor) instead of
> relying on something external to it?
>
> Of course, there may be other considerations around the way the current
> code is structured that may prevent this. Let me know if you agree with
> this change.
>
> Thanks,
> Jag
>
>
> On Thu, Mar 16, 2017 at 5:21 PM, Navina Ramesh
> <nram...@linkedin.com.invalid
> > wrote:
>
> > > I am working on the ApplicationRunner SEP right now. Will send out the
> > discussion email once I am done.
> >
> > Perfect! :)
> >
> > On Thu, Mar 16, 2017 at 5:13 PM, xinyu liu <xinyuliu...@gmail.com>
> wrote:
> >
> > > Right, the static factory is very simple as you said. It's pretty
> > > convenient for the client to use.
> > >
> > > I am working on the ApplicationRunner SEP right now. Will send out the
> > > discussion email once I am done.
> > >
> > > Thanks,
> > > Xinyu
> > >
> > > On Thu, Mar 16, 2017 at 4:50 PM, Navina Ramesh (Apache) <
> > nav...@apache.org
> > > >
> > > wrote:
> > >
> > > > > One minor thing I found is that the name of the config is camel
> case
> > > > (*processor.idGenerator.class*). Seems Samza's practice is to use
> all
> > > > lower
> > > > case configs with "." delimiter. Do you think we should stick to this
> > > > convention?
> > > >
> > > > I am always torn between the "convention" we have and the better way
> of
> > > > doing things. But I don't have strong opinions about it. I can change
> > it.
> > > >
> > > > > One more suggestion is to have a static factory method in the
> > > > ProcessorIdGenerator (Like what we have in ApplicationRunner):
> > > >
> > > > I couldn't grasp these requirements from the ApplicationRunner
> design.
> > It
> > > > will be great if you can put it out in an SEP :)
> > > >
> > > > I can add th

Re: [DISCUSS] Support Scala 2.12

2017-03-17 Thread Navina Ramesh
Thanks for creating the DISCUSS email!

This is good. It's a good idea to update to 2.12 since it looks like we are
fully backward compatible with older versions. +1 from me.

Cheers!
Navina

On Fri, Mar 17, 2017 at 1:34 PM, Jagadish Venkatraman <
jagadish1...@gmail.com> wrote:

> Thanks for starting this discussion and the patch. +1 for supporting scala
> 2.12.  I assume the changes are fully backwards compatible with scala 2.10,
> 2.11 (as evidenced by your check-all)?
>
> Also, another observation is that the generated Samza binaries will have
> 2.12 as the suffix for the future release (I think this should be totally OK).
>
>
> On Fri, Mar 17, 2017 at 1:26 PM, Maksim Logvinenko 
> wrote:
>
> > Hi guys,
> >
> > I’ve created JIRA and already submitted patch which adds support of scala
> > 2.12. Here is the ticket: https://issues.apache.org/
> jira/browse/SAMZA-1135
> > .
> > Nothing serious: I’ve removed JavaConversions usage (because it’s marked
> as
> > deprecated now) and bumped kafka and scalatest versions since previous
> > versions don’t have scala 2.12 support. I run ./bin/check-all.sh on my
> > laptop and it was successful for all scala versions (2.10, 2.11 and 2.12)
> > and for both YARN versions.
> >
> > Thanks,
> > Maxim Logvinenko
> >
>
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>



-- 
Navina R.


Re: [DISCUSS] SEP-1: Semantics of ProcessorId in Samza

2017-03-16 Thread Navina Ramesh
> I am working on the ApplicationRunner SEP right now. Will send out the
discussion email once I am done.

Perfect! :)

On Thu, Mar 16, 2017 at 5:13 PM, xinyu liu <xinyuliu...@gmail.com> wrote:

> Right, the static factory is very simple as you said. It's pretty
> convenient for the client to use.
>
> I am working on the ApplicationRunner SEP right now. Will send out the
> discussion email once I am done.
>
> Thanks,
> Xinyu
>
> On Thu, Mar 16, 2017 at 4:50 PM, Navina Ramesh (Apache) <nav...@apache.org
> >
> wrote:
>
> > > One minor thing I found is that the name of the config is camel case
> > (*processor.idGenerator.class*). Seems Samza's practice is to use all
> > lower
> > case configs with "." delimiter. Do you think we should stick to this
> > convention?
> >
> > I am always torn between the "convention" we have and the better way of
> > doing things. But I don't have strong opinions about it. I can change it.
> >
> > > One more suggestion is to have a static factory method in the
> > ProcessorIdGenerator (Like what we have in ApplicationRunner):
> >
> > I couldn't grasp these requirements from the ApplicationRunner design. It
> > will be great if you can put it out in an SEP :)
> >
> > I can add the static factory method for it. Just to clarify, the static
> > method simply class loads the ProcessorIdGenerator ? It uses reflection
> to
> > create the instance ?
> >
> > Thanks!
> > Navina
> >
> >
> >
> > On Thu, Mar 16, 2017 at 4:31 PM, xinyu liu <xinyuliu...@gmail.com>
> wrote:
> >
> > > The proposal looks great to me! Changing the id type to string will
> make
> > > sure this can work with other types of cluster which doesn't support
> > > integer id. The interface and config provides a pluggable way to have
> > > different id generators for different use cases. One minor thing I
> found
> > is
> > > that the name of the config is camel case
> (*processor.idGenerator.class*
> > ).
> > > Seems Samza's practice is to use all lower case configs with "."
> > delimiter.
> > > Do you think we should stick to this convention?
> > >
> > > One more suggestion is to have a static factory method in
> > > the ProcessorIdGenerator (Like what we have in ApplicationRunner):
> > >
> > > > static ProcessorIdGenerator fromConfig(Config config) { ... }.
> > >
> > > With this, It will be more convenient for the ApplicationRunner to
> > > construct the generator. What do you think?
> > >
> > > Thanks,
> > > Xinyu
> > >
> > > On Wed, Mar 15, 2017 at 10:59 PM, Navina Ramesh (Apache) <
> > > nav...@apache.org>
> > > wrote:
> > >
> > > > Hi everyone,
> > > > I created a proposal for SAMZA-1126, which addresses the semantics of
> > > > ProcessorId in Samza. For most purposes, ProcessorId is same as the
> > > logical
> > > > id that Samza assigns for each Yarn container. It is primarily used
> in
> > > > JobModel as a key for the corresponding ContainerModel and also, in
> > > > container-level metrics. We are expanding the applicability of
> > > processorId
> > > > to be beyond a fixed set of processors.
> > > >
> > > > Please review and comment on this SEP.
> > > >
> > > > For those who are not actively following the master branch, you may
> > have
> > > > more questions than others. Feel free to ask them here.
> > > >
> > > > @Xinyu: Since you are working on SAMZA-1067 and other related
> > integration
> > > > APIs, can you please add an SEP for SAMZA-1067 ? This will help
> others
> > > > (and
> > > > me as well) get on the same page with your design/code. Let me know
> if
> > > > SEP-1 will work per your design for ApplicationRunner.
> > > >
> > > > Thanks!
> > > > Navina
> > > >
> > >
> >
>



-- 
Navina R.


Re: [DISCUSS] SEP-1: Semantics of ProcessorId in Samza

2017-03-16 Thread Navina Ramesh (Apache)
> One minor thing I found is that the name of the config is camel case
(*processor.idGenerator.class*). Seems Samza's practice is to use all lower
case configs with "." delimiter. Do you think we should stick to this
convention?

I am always torn between the "convention" we have and the better way of
doing things. But I don't have strong opinions about it. I can change it.

> One more suggestion is to have a static factory method in the
ProcessorIdGenerator (Like what we have in ApplicationRunner):

I couldn't grasp these requirements from the ApplicationRunner design. It
will be great if you can put it out in an SEP :)

I can add the static factory method for it. Just to clarify, the static
method simply class loads the ProcessorIdGenerator ? It uses reflection to
create the instance ?
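
If that is the intent, a minimal Java sketch could look like the following. This is an assumption about the shape, not the actual implementation: a plain Map stands in for the Samza Config, the config key "app.processor-id-generator.class" and the FixedIdGenerator class are hypothetical, and the generator method signature is only a placeholder.

```java
import java.util.Map;

public class IdGeneratorFactorySketch {

  /** Stand-in for the pluggable interface proposed in the SEP. */
  public interface ProcessorIdGenerator {
    // Hypothetical, lower-case, "."-delimited config key.
    String GENERATOR_CLASS_KEY = "app.processor-id-generator.class";

    String generateProcessorId(Map<String, String> config);

    /** Static factory: reads the class name from config and instantiates it via reflection. */
    static ProcessorIdGenerator fromConfig(Map<String, String> config) {
      String className = config.get(GENERATOR_CLASS_KEY);
      if (className == null) {
        throw new IllegalArgumentException("Missing config: " + GENERATOR_CLASS_KEY);
      }
      try {
        // Requires the implementation to expose a no-argument constructor.
        return Class.forName(className)
            .asSubclass(ProcessorIdGenerator.class)
            .getDeclaredConstructor()
            .newInstance();
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException("Cannot instantiate " + className, e);
      }
    }
  }

  /** Hypothetical implementation used only to exercise the factory. */
  public static class FixedIdGenerator implements ProcessorIdGenerator {
    @Override
    public String generateProcessorId(Map<String, String> config) {
      return config.getOrDefault("processor.id", "processor-0");
    }
  }

  public static void main(String[] args) {
    Map<String, String> config = Map.of(
        ProcessorIdGenerator.GENERATOR_CLASS_KEY, FixedIdGenerator.class.getName());
    ProcessorIdGenerator generator = ProcessorIdGenerator.fromConfig(config);
    System.out.println(generator.generateProcessorId(config));
  }
}
```

The caller then needs only the config to obtain a generator, which is the convenience being asked for on the ApplicationRunner side.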

Thanks!
Navina



On Thu, Mar 16, 2017 at 4:31 PM, xinyu liu <xinyuliu...@gmail.com> wrote:

> The proposal looks great to me! Changing the id type to string will make
> sure this can work with other types of cluster which doesn't support
> integer id. The interface and config provides a pluggable way to have
> different id generators for different use cases. One minor thing I found is
> that the name of the config is camel case (*processor.idGenerator.class*).
> Seems Samza's practice is to use all lower case configs with "." delimiter.
> Do you think we should stick to this convention?
>
> One more suggestion is to have a static factory method in
> the ProcessorIdGenerator (Like what we have in ApplicationRunner):
>
> static ProcessorIdGenerator fromConfig(Config config) { ... }.
>
> With this, It will be more convenient for the ApplicationRunner to
> construct the generator. What do you think?
>
> Thanks,
> Xinyu
>
> On Wed, Mar 15, 2017 at 10:59 PM, Navina Ramesh (Apache) <
> nav...@apache.org>
> wrote:
>
> > Hi everyone,
> > I created a proposal for SAMZA-1126, which addresses the semantics of
> > ProcessorId in Samza. For most purposes, ProcessorId is same as the
> logical
> > id that Samza assigns for each Yarn container. It is primarily used in
> > JobModel as a key for the corresponding ContainerModel and also, in
> > container-level metrics. We are expanding the applicability of
> processorId
> > to be beyond a fixed set of processors.
> >
> > Please review and comment on this SEP.
> >
> > For those who are not actively following the master branch, you may have
> > more questions than others. Feel free to ask them here.
> >
> > @Xinyu: Since you are working on SAMZA-1067 and other related integration
> > APIs, can you please add an SEP for SAMZA-1067 ? This will help others
> (and
> > me as well) get on the same page with your design/code. Let me know if
> > SEP-1 will work per your design for ApplicationRunner.
> >
> > Thanks!
> > Navina
> >
>


Re: [DISCUSS] SAMZA-1141 - Apache Samza Development Process Improvements

2017-03-14 Thread Navina Ramesh
Xinyu,
I considered doing that as an example. But I want to keep SEP to be only
for technical discussions and not process related proposals.

Navina

On Mar 14, 2017 17:23, "xinyu liu" <xinyuliu...@gmail.com> wrote:

> +1 on this proposal too. Could you actually put this proposal as the first
> SEP (like SEP-0), so it serves as an example of what it will look like in
> practice?
>
> Xinyu
>
> On Tue, Mar 14, 2017 at 3:34 PM, Navina Ramesh
> <nram...@linkedin.com.invalid
> > wrote:
>
> > Just to clarify: The proposal for code and design process change is
> > attached as a PDF/markdown to the JIRA - SAMZA-1141.
> >
> > Also, please show your support specifically for code and design process.
> My
> > bad for not calling it out earlier :)
> >
> > Thanks!
> > Navina
> >
> > On Tue, Mar 14, 2017 at 3:30 PM, Jagadish Venkatraman <
> > jagadish1...@gmail.com> wrote:
> >
> > > Thanks for writing this up.
> > >
> > > I'm +1 on this proposal.
> > >
> > >
> > >
> > > On Tue, Mar 14, 2017 at 3:15 PM, Navina Ramesh (Apache) <
> > nav...@apache.org
> > > >
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > We switched to using Pull Requests for code reviews a few months
> back.
> > > > Clearly, there are some drawbacks to that model and we are trying to
> > > > address the shortcomings. I have gathered input from some of the
> > > committers
> > > > > regarding what is missing from the code review process and what can be
> > > improved.
> > > > Please take a look and provide feedback.
> > > >
> > > > Additionally, we are considering moving to a KIP/FLIP-like model for
> > > > submitting design proposals (major changes to samza). Lately, there
> > have
> > > > been some major feature discussions that are not documented
> > consistently
> > > in
> > > > a centralized location. The proposal in SAMZA-1141
> > > > > <https://issues.apache.org/jira/browse/SAMZA-1141> addresses the
> design
> > > > review process as well. Please review it too. I have already created
> a
> > > wiki
> > > > page
> > > > <https://cwiki.apache.org/confluence/display/SAMZA/
> > > > Samza+Enhancement+Proposal>
> > > > describing the Samza Enhancement Proposal (SEP) process and an SEP
> > > > template. Going forward, let's start adding all major change
> proposals
> > to
> > > > the wiki and discuss the design on the mailing list.
> > > >
> > > > Your cooperation is highly appreciated during this period of
> transition
> > > in
> > > > the process :)
> > > >
> > > > > Feedback welcome!
> > > >
> > > > Thanks!
> > > > --
> > > > Navina R
> > > >
> > > > > PS: Alternative name suggestions for "SEP" are welcome!
> > > >
> > >
> > >
> > >
> > > --
> > > Jagadish V,
> > > Graduate Student,
> > > Department of Computer Science,
> > > Stanford University
> > >
> >
> >
> >
> > --
> > Navina R.
> >
>


Re: [DISCUSS] SAMZA-1141 - Apache Samza Development Process Improvements

2017-03-14 Thread Navina Ramesh
Just to clarify: The proposal for code and design process change is
attached as a PDF/markdown to the JIRA - SAMZA-1141.

Also, please show your support specifically for code and design process. My
bad for not calling it out earlier :)

Thanks!
Navina

On Tue, Mar 14, 2017 at 3:30 PM, Jagadish Venkatraman <
jagadish1...@gmail.com> wrote:

> Thanks for writing this up.
>
> I'm +1 on this proposal.
>
>
>
> On Tue, Mar 14, 2017 at 3:15 PM, Navina Ramesh (Apache) <nav...@apache.org
> >
> wrote:
>
> > Hi everyone,
> >
> > We switched to using Pull Requests for code reviews a few months back.
> > Clearly, there are some drawbacks to that model and we are trying to
> > address the shortcomings. I have gathered input from some of the
> committers
> > regarding what is missing from the code review process and what can be
> improved.
> > Please take a look and provide feedback.
> >
> > Additionally, we are considering moving to a KIP/FLIP-like model for
> > submitting design proposals (major changes to samza). Lately, there have
> > been some major feature discussions that are not documented consistently
> in
> > a centralized location. The proposal in SAMZA-1141
> > <https://issues.apache.org/jira/browse/SAMZA-1141> addresses the design
> > review process as well. Please review it too. I have already created a
> wiki
> > page
> > <https://cwiki.apache.org/confluence/display/SAMZA/
> > Samza+Enhancement+Proposal>
> > describing the Samza Enhancement Proposal (SEP) process and an SEP
> > template. Going forward, let's start adding all major change proposals to
> > the wiki and discuss the design on the mailing list.
> >
> > Your cooperation is highly appreciated during this period of transition
> in
> > the process :)
> >
> > Feedback welcome!
> >
> > Thanks!
> > --
> > Navina R
> >
> > PS: Alternative name suggestions for "SEP" are welcome!
> >
>
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>



-- 
Navina R.


Re: [VOTE] Apache Samza 0.12.0 RC2

2017-02-13 Thread Navina Ramesh
I ran check-all against Mac and integration tests on Linux. Looks good with
no concerning issues.

+1 (binding)

Thanks!
Navina

On Fri, Feb 10, 2017 at 9:25 AM, Boris S  wrote:

> I also successfully ran the integration tests on Linux. All passed.
> +1 non-binding
>
> On Wed, Feb 8, 2017 at 4:57 PM, Jacob Maes  wrote:
>
> > Build and integration tests were successful for me.
> >
> > +1 non-binding
> >
> > On Wed, Feb 8, 2017 at 4:48 PM, xinyu liu  wrote:
> >
> > > Ran build, checkAll and integration tests. All passed.
> > >
> > > +1 non-binding.
> > >
> > > Thanks,
> > > Xinyu
> > >
> > > On Wed, Feb 8, 2017 at 4:18 PM, Boris S  wrote:
> > >
> > > > Cloned the release and ran build, test and checkAll.sh
> > > > All passed.
> > > > Verified MD5 and the signature.
> > > > Got warning - "this key is not certified with a trusted signature". I
> > > guess
> > > > it is ok.
> > > >
> > > > +1
> > > >
> > > > On Mon, Feb 6, 2017 at 5:32 PM, Jagadish Venkatraman <
> > > > jagadish1...@gmail.com
> > > > > wrote:
> > > >
> > > > > This is a call for a vote on a release of Apache Samza 0.12.0.
> Thanks
> > > to
> > > > > everyone who has contributed to this release. We are very glad to
> see
> > > > some
> > > > > new contributors in this release.
> > > > >
> > > > > The release candidate can be downloaded from here:
> > > > > http://home.apache.org/~jagadish/samza-0.12.0-rc2/
> > > > >
> > > > > The release candidate is signed with pgp key AF81FFBF, which can be
> > > found
> > > > > on keyservers:
> > > > > http://pgp.mit.edu/pks/lookup?op=get=0xAF81FFBF
> > > > >
> > > > > The git tag is release-0.12.0-rc2 and signed with the same pgp key:
> > > > > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=
> > > > > refs/tags/release-0.12.0-rc2
> > > > >
> > > > > Test binaries have been published to Maven's staging repository,
> and
> > > are
> > > > > available here:
> > > > > https://repository.apache.org/content/repositories/
> > orgapachesamza-1018
> > > > >
> > > > > Note that the binaries were built with JDK8 without incident.
> > > > >
> > > > > 26 issues were resolved for this release:
> > > > > https://issues.apache.org/jira/issues/?jql=project%20%3D%20S
> > > > > AMZA%20AND%20fixVersion%20in%20(0.12%2C%200.12.0)%20AND%20st
> > > > > atus%20in%20(Resolved%2C%20Closed)
> > > > >
> > > > > > The vote will be open for 72 hours (ending at 6PM Thursday, 02/09/2017).
> > > > >
> > > > > Please download the release candidate, check the hashes/signature,
> > > build
> > > > it
> > > > > and test it, and then please vote:
> > > > >
> > > > >
> > > > > [ ] +1 approve
> > > > >
> > > > > [ ] +0 no opinion
> > > > >
> > > > > [ ] -1 disapprove (and reason why)
> > > > >
> > > > >
> > > > > +1 from my side for the release.
> > > > >
> > > > > Cheers!
> > > > >
> > > > > --
> > > > > Jagadish V,
> > > > > Graduate Student,
> > > > > Department of Computer Science,
> > > > > Stanford University
> > > > >
> > > >
> > >
> >
>



-- 
Navina R.


Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

2017-01-27 Thread Navina Ramesh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163355
---


Fix it, then Ship it!




Some nits and comments. Otherwise, looks good. Thanks! +1


docs/learn/documentation/versioned/hdfs/consumer.md (line 39)
<https://reviews.apache.org/r/52570/#comment234764>

This line is confusing. Are you implying that I can read from non-avro 
formatted files that are in HDFS ? 
What is the significance of the SingleFileHdfsReader interface ? It is not 
clear to the reader.



docs/learn/documentation/versioned/hdfs/consumer.md (line 89)
<https://reviews.apache.org/r/52570/#comment234762>

Nit: Can you move the explanation of what advanced partitioning is  outside 
of the code block? 
You can emphasize the reserved term note by doing -> 
**note**  , when it is outside the code block



docs/learn/documentation/versioned/jobs/configuration-table.html (line 1822)
<https://reviews.apache.org/r/52570/#comment234763>

Looks like a typo. Should it be "systems.*" instead of "system.*"?


- Navina Ramesh


On Jan. 27, 2017, 5:48 p.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> ---
> 
> (Updated Jan. 27, 2017, 5:48 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
> https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md 
> b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html 
> d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>



Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

2017-01-27 Thread Navina Ramesh


> On Jan. 25, 2017, 10:36 p.m., Jagadish Venkatraman wrote:
> > docs/learn/documentation/versioned/hdfs/consumer.md, line 67
> > 
> >
> > The relationship between whitelist and blacklist was not very obvious 
> > to me.
> > 
> > Is the behavior that the whitelist is applied first, and the blacklist 
> > is applied to the matched files later? (to determine which files are to be 
> > ignored).
> 
> Hai Lu wrote:
> The order doesn't matter. (X & whitelist) - blacklist == (X - blacklist) 
> & whitelist

This is assuming that whitelist and blacklist are mutually exclusive, right?
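
Thinking about it with plain sets, the identity does not seem to need that assumption: both sides reduce to "in X, in the whitelist, and not in the blacklist". A small Java check, treating the two lists as literal sets of file names (an assumption for illustration; the consumer itself may match patterns rather than exact names, and the file names below are made up):

```java
import java.util.HashSet;
import java.util.Set;

public class FilterOrderSketch {

  /** (X & whitelist) - blacklist */
  static Set<String> whitelistThenBlacklist(
      Set<String> files, Set<String> whitelist, Set<String> blacklist) {
    Set<String> result = new HashSet<>(files);
    result.retainAll(whitelist);
    result.removeAll(blacklist);
    return result;
  }

  /** (X - blacklist) & whitelist */
  static Set<String> blacklistThenWhitelist(
      Set<String> files, Set<String> whitelist, Set<String> blacklist) {
    Set<String> result = new HashSet<>(files);
    result.removeAll(blacklist);
    result.retainAll(whitelist);
    return result;
  }

  public static void main(String[] args) {
    Set<String> files = Set.of("a.avro", "b.avro", "c.tmp", "d.avro");
    // The two lists deliberately overlap on b.avro and c.tmp.
    Set<String> whitelist = Set.of("a.avro", "b.avro", "c.tmp");
    Set<String> blacklist = Set.of("b.avro", "c.tmp");

    Set<String> first = whitelistThenBlacklist(files, whitelist, blacklist);
    Set<String> second = blacklistThenWhitelist(files, whitelist, blacklist);
    System.out.println(first.equals(second)); // prints "true"; both sets are {a.avro}
  }
}
```

So a file that appears in both lists is dropped regardless of the order in which the filters run.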


- Navina


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163011
---


On Jan. 27, 2017, 5:48 p.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> ---
> 
> (Updated Jan. 27, 2017, 5:48 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
> https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md 
> b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html 
> d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>



Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

2017-01-27 Thread Navina Ramesh


> On Jan. 25, 2017, 10:50 p.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/hdfs/consumer.md, line 26
> > <https://reviews.apache.org/r/52570/diff/2/?file=1613256#file1613256line26>
> >
> > Can you include the diagram from your design document, or something
> > similar to illustrate what the setup should look like?
> 
> Hai Lu wrote:
> The diagram was mostly for the situation at LinkedIn where we have 
> separate yarn clusters - one for Samza, one for Hadoop. "Your job needs to run 
> on the same YARN cluster which hosts the HDFS you want to consume from."  Is 
> this statement not clear enough? What suggestion do you have in terms of the 
> wording?

My bad. I thought it was generic architecture diagram. Didn't realize it was 
specific to LinkedIn's deployment. Please ignore this comment.


- Navina


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163036
---


On Jan. 27, 2017, 5:48 p.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> ---
> 
> (Updated Jan. 27, 2017, 5:48 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
> https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md 
> b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html 
> d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>



Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

2017-01-25 Thread Navina Ramesh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163036
---



Thanks for adding the documentation!


docs/learn/documentation/versioned/hdfs/consumer.md (line 26)
<https://reviews.apache.org/r/52570/#comment234455>

Can you include the diagram from your design document, or something
similar to illustrate what the setup should look like?



docs/learn/documentation/versioned/hdfs/consumer.md (line 42)
<https://reviews.apache.org/r/52570/#comment234458>

Can you add a snippet of the interface here as well? It is easier for the 
user to skim through.



docs/learn/documentation/versioned/hdfs/consumer.md (line 50)
<https://reviews.apache.org/r/52570/#comment234460>

Replace "users" with "user application". 

We provide the capability for the user application to get notified when ...

Rephrase "To do so, simply implement the interface 
[EndOfStreamListenerTask]" as "In order to receive notifications on EndOfStream 
with the task, the user application should simply implement the interface ..."



docs/learn/documentation/versioned/hdfs/consumer.md (line 54)
<https://reviews.apache.org/r/52570/#comment234461>

I think you can skip the "job.properties" file. The readers may easily 
assume there is a separate properties file.



docs/learn/documentation/versioned/hdfs/consumer.md (line 75)
<https://reviews.apache.org/r/52570/#comment234462>

Typo "configs are required"



docs/learn/documentation/versioned/hdfs/consumer.md (line 77)
<https://reviews.apache.org/r/52570/#comment234464>

I don't think you need to mention the JIRA that introduced a feature. If 
there is documentation related to security in Samza, you can link to it. You 
can link to the javadoc for SamzaContainerSecurityManager.

You can add a brief description of the feature. For example, the 
SamzaContainer fetches and renews the Kerberos delegation tokens when the job 
is running in a secure environment. User application needs to specify ..



docs/learn/documentation/versioned/hdfs/consumer.md (line 93)
<https://reviews.apache.org/r/52570/#comment234465>

Can you elaborate the "advanced partitioning" feature here  and remove the 
link for design doc? If it helps, you can just copy-and-paste the design doc 
content here and edit it :)



docs/learn/documentation/versioned/hdfs/consumer.md (line 102)
<https://reviews.apache.org/r/52570/#comment234468>

Looks like there are more configurations that are mentioned in the 
configuration table. Can you please add the link to configuration table to 
imply that?



docs/learn/documentation/versioned/hdfs/consumer.md (line 105)
<https://reviews.apache.org/r/52570/#comment234466>

You can add a link to the design doc pdf here , instead of the JIRA link.


- Navina Ramesh


On Jan. 24, 2017, 2:07 a.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> ---
> 
> (Updated Jan. 24, 2017, 2:07 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
> https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md 
> b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html 
> d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>



Upcoming Stream Processing Meetup in February!

2017-01-24 Thread Navina Ramesh
Hi everyone,

We would like to invite you to a Stream Processing Meetup at LinkedIn’s
Sunnyvale campus on Thursday, February 16 at 6pm.

Please RSVP here (if you intend to attend in person):
https://www.meetup.com/Stream-Processing-Meetup-LinkedIn/events/237171557/


We have the following agenda scheduled:

   - 6PM: Doors open
   - 6-6:35PM: Networking & Welcome
   - 6:35-7:10PM: Asynchronous Processing and Multithreading in Apache
     Samza (Xinyu Liu, LinkedIn)
   - 7:15-7:50PM: SSD Benchmarks for Apache Kafka (Mingmin Chen, Uber)
   - 7:55-8:30PM: Batching to Streaming Analytics at Optimizely (Vignesh
     Sukumar, Optimizely)

Hope to see you there!

Cheers!
-- 
Navina R.


Re: [DISCUSS] Samza 0.12.0 release

2016-12-23 Thread Navina Ramesh
+1

Navina

On Fri, Dec 23, 2016 at 2:53 PM, xinyu liu  wrote:

> +1 on the new release.
>
> Thanks,
> Xinyu
>
> On Fri, Dec 23, 2016 at 2:50 PM, Yi Pan  wrote:
>
> > Yep! Thanks for pointing it out! @Shanthoosh, can you update the list
> > accordingly?
> >
> > Thanks!
> >
> > On Fri, Dec 23, 2016 at 2:42 PM, Fred Ji 
> wrote:
> >
> > > I think we also would like to put this one in the potential breaking
> > > change:
> > >
> > > SAMZA-1048 : upgrade jetty dependency to Jetty 9 from Jetty 8
> > >
> > > Thanks,
> > >
> > > Fred
> > >
> > > On Fri, Dec 23, 2016 at 2:38 PM, Yi Pan  wrote:
> > >
> > > > lgtm, +1
> > > >
> > > > On Fri, Dec 23, 2016 at 10:44 AM, santhosh venkat <
> > > > santhoshvenkat1...@gmail.com> wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > There have been quite a lot of new features added to master since
> > 0.11
> > > > > release to warrant a new major release. At LinkedIn, we've done
> > > > functional
> > > > > and performance testing against master in the past weeks, and
> > deployed
> > > > jobs
> > > > > with the latest build in production. We will continue to test for
> > > > stability
> > > > > in the next few weeks.
> > > > >
> > > > > Here are the list of JIRA patches that are available to be added
> as a
> > > > part
> > > > > of 0.12.0 release.
> > > > >
> > > > > Potential breaking changes :
> > > > >
> > > > >* SAMZA-469 : Support Scala 2.11 (not binary compatible with
> > 2.10.x
> > > > > versions)
> > > > >* SAMZA-855 : Upgrade Samza's Kafka client version to 0.10.0.0 (
> > > > > https://kafka.apache.org/documentation#upgrade_10_breaking)
> > > > >* SAMZA-1031 : Use Java 1.8 source compatibility for Samza
> (Java 7
> > > is
> > > > no
> > > > > longer supported)
> > > > >
> > > > > Here are the JIRAs of other main features that will be included in
> > this
> > > > > release (sorted in chronological order):
> > > > >
> > > > >* SAMZA-967 : Add HDFS system consumer to Samza
> > > > >* SAMZA-974 : Build an end-of-stream concept into Samza
> > > > >* SAMZA-1012 : Generated changelog mappings are not consistent
> > > > >* SAMZA-1013 : Add YARN Node label support
> > > > >* SAMZA-1014 : Add property to set YARN AM cpu cores
> > > > >* SAMZA-1015 : Support lambdas, streams in checkStyle checks
> > > > >* SAMZA-1017 : Disk Quotas - Add throttling support for
> > AsyncRunLoop
> > > > >* SAMZA-1033 : Remove import-control from checkstyle
> > > > >* SAMZA-1040 : Revert the ClassLoaderHelper change in
> > SamzaContainer
> > > > >* SAMZA-1042 : Allow offset notifications for input systems
> > > > >* SAMZA-1043 : Samza performance improvements
> > > > >* SAMZA-1047 : testEndOfStreamWithOutOfOrderProcess is flaky
> > > > >* SAMZA-1048 : upgrade jetty dependency to Jetty 9 from Jetty 8
> > > > >* SAMZA-1055 : Disable broken tests in SamzaRest due to Jetty
> > > version
> > > > > upgrade
> > > > >* SAMZA-1058 : Fix check-all.sh to remove JDK7 build
> > > > >* SAMZA-1060 : Allow to specify a changelog system separately
> > > > >* SAMZA-1065 : Change the commit order when deduping with local
> > > state
> > > > > store
> > > > >* SAMZA-1066 : Update JavaStorageConfig
> > > > >* SAMZA-1069 : Deadlock between KafkaSystemProducer and
> > > KafkaProducer
> > > > > from kafka-clients lib
> > > > >
> > > > > > Here's what I propose:
> > > > >
> > > > > 1. Cut an 0.12.0 release branch.
> > > > > 2. Work on getting as many of the pending JIRAs done as possible.
> > > > > 3. Target a release vote on the first week of Jan'17.
> > > > >
> > > > >
> > > > > Thoughts?
> > > > > Shanthoosh
> > > > >
> > > >
> > >
> >
>



-- 
Navina R.


Re: SAMZA-469 - scala 2.11 / move to java?

2016-11-16 Thread Navina Ramesh
Thanks a lot ! Please let us know if you need any help :)

Cheers!
Navina

On Tue, Nov 15, 2016 at 8:45 PM, Thunder Stumpges <tstump...@ntent.com>
wrote:

> OK! We are in a push for the next couple weeks, but I will look to put
> some time to this in the next month or there about.
>
> Thanks!
> Thunder
>
>
> -Original Message-
> From: Navina Ramesh [mailto:nram...@linkedin.com.INVALID]
> Sent: Thursday, November 10, 2016 2:19 PM
> To: dev@samza.apache.org
> Subject: Re: SAMZA-469 - scala 2.11 / move to java?
>
> Hi Thunder,
> Sorry about the late response. I was refreshing my memory of SAMZA-469 :)
>
> Is there something preventing this from getting into a release?
> > Seems to me like the main roadblock on upgrading to scala 2.11 was
> related to JDK6. We have already moved away from JDK6 and JDK7. So, I
> don't see anything blocking us from upgrading to scala 2.11, if someone can
> invest some time and push it out the door :)
>
> There is mention in that issue of going away from scala altogether.  Is
> there another issue which details this? When would that happen?
> > By going away, I believe we decided to not add more scala
> > code/dependency
> to the code base so that eventually, we can replace them all with jdk
> based lambdas. There is no concrete plan on when and how this will happen.
> So, imo, it wouldn't be anytime soon :D
>
> if some effort is needed to make the move to 2.11, we would be glad to
> help out!
> > That will be much appreciated! I don't think we have seen any strong
> motivation for doing this within LinkedIn. If you guys can help us
> upgrade, that will be awesome!!
>
> Thanks!
> Navina
>
> On Mon, Nov 7, 2016 at 11:30 AM, Thunder Stumpges <tstump...@ntent.com>
> wrote:
>
> > Hi all,
> >
> > We are looking to get our projects upgraded to scala 2.11 as some of
> > our dependencies are dropping their compiling of 2.10 libraries. We
> > found
> > SAMZA-469 which looks to have had a patch for some time, however the
> > last couple comments seem pretty shaky about when (or if!?) support
> > will be available.
> >
> > Is there something preventing this from getting into a release?
> >
> > There is mention in that issue of going away from scala altogether.
> > Is there another issue which details this? When would that happen?
> >
> > Thanks, and if some effort is needed to make the move to 2.11, we
> > would be glad to help out!
> > Thunder Stumpges
> >
> >
>
>
> --
> Navina R.
>



-- 
Navina R.


Re: SAMZA-469 - scala 2.11 / move to java?

2016-11-10 Thread Navina Ramesh
Hi Thunder,
Sorry about the late response. I was refreshing my memory of SAMZA-469 :)

Is there something preventing this from getting into a release?
> Seems to me like the main roadblock on upgrading to scala 2.11 was
related to JDK6. We have already moved away from JDK6 and JDK7. So, I don't
see anything blocking us from upgrading to scala 2.11, if someone can
invest some time and push it out the door :)

There is mention in that issue of going away from scala altogether.  Is
there another issue which details this? When would that happen?
> By going away, I believe we decided to not add more scala code/dependency
to the code base so that eventually, we can replace them all with jdk based
lambdas. There is no concrete plan on when and how this will happen. So,
imo, it wouldn't be anytime soon :D

if some effort is needed to make the move to 2.11, we would be glad to help
out!
> That will be much appreciated! I don't think we have seen any strong
motivation for doing this within LinkedIn. If you guys can help us upgrade,
that will be awesome!!

Thanks!
Navina

On Mon, Nov 7, 2016 at 11:30 AM, Thunder Stumpges 
wrote:

> Hi all,
>
> We are looking to get our projects upgraded to scala 2.11 as some of our
> dependencies are dropping their compiling of 2.10 libraries. We found
> SAMZA-469 which looks to have had a patch for some time, however the last
> couple comments seem pretty shaky about when (or if!?) support will be
> available.
>
> Is there something preventing this from getting into a release?
>
> There is mention in that issue of going away from scala altogether.  Is
> there another issue which details this? When would that happen?
>
> Thanks, and if some effort is needed to make the move to 2.11, we would be
> glad to help out!
> Thunder Stumpges
>
>


-- 
Navina R.


Re: Review Request 52168: Tasks endpoint to list the complete details of all tasks related to a job

2016-10-25 Thread Navina Ramesh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52168/#review153819
---



I haven't read the previous comments. Hence, I may be repeating the same 
question asked by someone else. Please feel free to point me to your responses 
in such cases. Thanks!


samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala 
(line 101)
<https://reviews.apache.org/r/52168/#comment223285>

Don't need a return in scala



samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala 
(line 125)
<https://reviews.apache.org/r/52168/#comment223288>

Why can't we re-use the same metadata cache that is used when retrieving 
the job model?? Populating the cache on init is particularly expensive. This 
should be avoided, unless there is any strong motivation for different 
components to maintain their own cache!



samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 
<https://reviews.apache.org/r/52168/#comment223289>

I would think the ideal place for this method will be in Util as opposed to 
JobConfig. Can you please elaborate the rationale behind your choice?



samza-core/src/main/scala/org/apache/samza/util/Util.scala (line 31)
<https://reviews.apache.org/r/52168/#comment223290>

Please use specific imports as opposed to .* or ._



samza-rest/src/main/java/org/apache/samza/rest/model/Task.java (line 27)
<https://reviews.apache.org/r/52168/#comment223292>

nit: What is preferred preferredHost? :) Why not just say preferred host? 
I think you are trying to use the variable name in the documentation. It 
doesn't necessarily improve readability though :)



samza-rest/src/main/java/org/apache/samza/rest/resources/ConfigFactoryBuilder.java
 (line 32)
<https://reviews.apache.org/r/52168/#comment223295>

"Builder" in the name of the class is confusing. I was expecting a Builder 
pattern for constructing an instance of ConfigFactory. Why not use the 
ClassLoaderHelper directly wherever this is used?


- Navina Ramesh


On Oct. 24, 2016, 10 p.m., Shanthoosh Venkataraman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52168/
> ---
> 
> (Updated Oct. 24, 2016, 10 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> This patch contains the following changes
>  * HTTP GET API to list the complete details of all the tasks that belong to 
> a job. 
>  * Refactored some methods in coordinator stream, to reuse the existing 
> functionality of getting jobConfig from the coordinator stream.
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/rest/resource-directory.md 
> 79746d1e2eb3491e4bd26c3c7cf6c7efd150d8ef 
>   docs/learn/documentation/versioned/rest/resources/tasks.md PRE-CREATION 
>   samza-core/src/main/scala/org/apache/samza/config/JobConfig.scala 
> 13b72fae7815ddaea7ae03a24f1a426ca51613cc 
>   samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
> 05a996c98075ea8ed3767af666b9beeb1933f2a6 
>   samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala 
> df63b97e9d598ecd1840111ba490a723e410d089 
>   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 
> 022b480856483059cb9f837a08f97a718bc04c31 
>   samza-core/src/main/scala/org/apache/samza/util/Util.scala 
> c4836f202f7eda1d4e71eac94fd48e46207b0316 
>   samza-rest/src/main/java/org/apache/samza/rest/model/Partition.java 
> PRE-CREATION 
>   samza-rest/src/main/java/org/apache/samza/rest/model/Task.java PRE-CREATION 
>   
> samza-rest/src/main/java/org/apache/samza/rest/proxy/job/AbstractJobProxy.java
>  4d8647f3e1e650632e38b47ba5a8a2dac004f545 
>   
> samza-rest/src/main/java/org/apache/samza/rest/proxy/job/JobProxyFactory.java 
> 067711a74e5b0d7277a9c8b2d2517b56e9cfbcca 
>   
> samza-rest/src/main/java/org/apache/samza/rest/proxy/job/SimpleYarnJobProxy.java
>  a935c98730f85f448c688a6baf2e8ddffdbb2cb4 
>   
> samza-rest/src/main/java/org/apache/samza/rest/proxy/job/SimpleYarnJobProxyFactory.java
>  11d93d4608d23a4e3fb3bfc50dfac35ab6dbdf3c 
>   
> samza-rest/src/main/java/org/apache/samza/rest/proxy/task/SamzaTaskProxy.java 
> PRE-CREATION 
>   
> samza-rest/src/main/java/org/apache/samza/rest/proxy/task/SamzaTaskProxyFactory.java
>  PRE-CREATION 
>   samza-rest/src/main/java/org/apache/samza/rest/proxy/task/TaskProxy.java 
> PRE-CREATION 
>   
> samza-rest/src/main/java/org/apache/samza/rest/proxy/task/TaskProxyFactory.java
>  PRE-CREAT

Re: Review Request 52962: SAMZA-1029: Prepare release candidate for 0.11.0

2016-10-17 Thread Navina Ramesh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52962/#review153001
---


Ship it!




Ship It!

- Navina Ramesh


On Oct. 18, 2016, midnight, Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52962/
> ---
> 
> (Updated Oct. 18, 2016, midnight)
> 
> 
> Review request for samza and Navina Ramesh.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> docs update for 0.11.0 release branch
> 
> 
> Diffs
> -
> 
>   docs/_config.yml dc1a66fa743d464c70d92406540fd7122c45272c 
>   docs/learn/tutorials/versioned/deploy-samza-job-from-hdfs.md 
> ca7b5f1a59724bbae9c46c7abd0d68cb3f019e3b 
>   docs/learn/tutorials/versioned/deploy-samza-to-CDH.md 
> daf762bc9f536520cceb503c5053283a80488bb1 
>   docs/learn/tutorials/versioned/remote-debugging-samza.md 
> 40db31a8152a999b549fde8f9155f4541d03147d 
>   docs/learn/tutorials/versioned/run-in-multi-node-yarn.md 
> bf2b59e3f4e0c6a3bfde0187db0b799f76797afb 
>   docs/learn/tutorials/versioned/samza-rest-getting-started.md 
> 942329e968bb02886df44b680bea8f75a221a289 
>   docs/startup/download/index.md 6a0c670bca01e01b9d8a73482af35cc144f1d524 
>   docs/startup/hello-samza/versioned/index.md 
> 8baacd390d41c5c87a426d63eec9ce5028de0cc2 
> 
> Diff: https://reviews.apache.org/r/52962/diff/
> 
> 
> Testing
> ---
> 
> local website.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>



Re: Review Request 52960: SAMZA-1029: Prepare release candidate for 0.11.0

2016-10-17 Thread Navina Ramesh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52960/#review153000
---




docs/learn/tutorials/versioned/samza-rest-getting-started.md (line 51)
<https://reviews.apache.org/r/52960/#comment06>

Shouldn't this be 0.11.1-SNAPSHOT as well?


- Navina Ramesh


On Oct. 17, 2016, 11:45 p.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52960/
> ---
> 
> (Updated Oct. 17, 2016, 11:45 p.m.)
> 
> 
> Review request for samza and Navina Ramesh.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Doc updates for the master.
> 
> 
> Diffs
> -
> 
>   docs/_config.yml dc1a66fa743d464c70d92406540fd7122c45272c 
>   docs/_layouts/default.html 60e56b5a14f5211a5ff0e2812c1fc331a25ebfe5 
>   docs/archive/index.html b0a44c6ab40f4eeb7ea4dfc17a8c7243b7e6e035 
>   docs/learn/tutorials/versioned/deploy-samza-job-from-hdfs.md 
> ca7b5f1a59724bbae9c46c7abd0d68cb3f019e3b 
>   docs/learn/tutorials/versioned/deploy-samza-to-CDH.md 
> daf762bc9f536520cceb503c5053283a80488bb1 
>   docs/learn/tutorials/versioned/remote-debugging-samza.md 
> 40db31a8152a999b549fde8f9155f4541d03147d 
>   docs/learn/tutorials/versioned/run-in-multi-node-yarn.md 
> bf2b59e3f4e0c6a3bfde0187db0b799f76797afb 
>   docs/learn/tutorials/versioned/samza-rest-getting-started.md 
> 942329e968bb02886df44b680bea8f75a221a289 
>   docs/startup/download/index.md 6a0c670bca01e01b9d8a73482af35cc144f1d524 
>   docs/startup/hello-samza/versioned/index.md 
> 8baacd390d41c5c87a426d63eec9ce5028de0cc2 
>   gradle.properties f032b745a7ceae319996314f22c16fe0b664e705 
> 
> Diff: https://reviews.apache.org/r/52960/diff/
> 
> 
> Testing
> ---
> 
> Local website
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>



Re: [VOTE] Apache Samza 0.11.0 RC2

2016-10-12 Thread Navina Ramesh
Validated MD5 and build. Ran check-all and integration test on Mac.

+1 (binding)

Thanks!
Navina

On Tue, Oct 11, 2016 at 1:16 PM, Yi Pan  wrote:

> Build, validated MD5, test w/ integration tests and passed. Thanks!
>
> +1 (binding)
>
> On Mon, Oct 10, 2016 at 4:07 PM, xinyu liu  wrote:
>
> > Hey all,
> >
> > This is a call for a vote on a release of Apache Samza 0.11.0. Thanks to
> > everyone who has contributed to this release. We are very glad to see
> some
> > new contributors in this release.
> >
> > Note: this release candidate reverted the changes that caused the
> > AsyncStreamTask to break in 0.11.0 RC1. All tests are verified for this
> > candidate.
> >
> > The release candidate can be downloaded from here:
> > http://home.apache.org/~xinyu/samza-0.11.0-rc2/
> >
> > The release candidate is signed with pgp key C31D7061, which can be found
> > on keyservers:
> > http://pgp.mit.edu/pks/lookup?op=get&search=0xC31D7061
> >
> > The git tag is release-0.11.0-rc2 and signed with the same pgp key:
> > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=
> > refs/tags/release-0.11.0-rc2
> >
> > Test binaries have been published to Maven's staging repository, and are
> > available here:
> > https://repository.apache.org/content/repositories/orgapachesamza-1015
> >
> > Note that the binaries were built with JDK7 without incident.
> >
> > 38 issues were resolved for this release:
> > https://issues.apache.org/jira/issues/?jql=project%20%
> > 3D%20SAMZA%20AND%20fixVersion%20in%20(0.11%2C%200.11.0)%
> > 20AND%20status%20in%20(Resolved%2C%20Closed)
> >
> > The vote will be open for 72 hours (ending at 12:00pm Thursday, 10/13/2016).
> >
> > Please download the release candidate, check the hashes/signature, build
> it
> > and test it, and then please vote:
> >
> >
> > [ ] +1 approve
> >
> > [ ] +0 no opinion
> >
> > [ ] -1 disapprove (and reason why)
> >
> >
> > +1 from my side for the release.
> >
> > Cheers!
> >
> > Xinyu
> >
>



-- 
Navina R.
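
The "Validated MD5" step mentioned in these votes can be sketched in Java as follows. This is only a minimal illustration built on the JDK's `MessageDigest`, not the project's release tooling; the class name `Md5Check` and its two command-line arguments are assumptions made for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of Samza.
class Md5Check {
    /** Streams the file through MessageDigest and returns the lowercase hex MD5. */
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: Md5Check <artifact> <expected-md5-hex>");
            return;
        }
        // args[0]: downloaded artifact; args[1]: hex digest copied from the published .md5 file
        String actual = md5Hex(Paths.get(args[0]));
        boolean matches = actual.equalsIgnoreCase(args[1].trim());
        System.out.println(matches ? "MD5 OK" : "MD5 MISMATCH: " + actual);
    }
}
```

A matching digest only shows that the download is intact; verifying the PGP signature remains a separate step.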


Re: [DISCUSS] [VOTE] Apache Samza 0.11.0 RC0

2016-10-04 Thread Navina Ramesh (Apache)
Verified MD5, signature. Ran bin/check-all.sh on MacOS. My RHEL box is
broken. I think others have tested it on RHEL.

Hence, +1 (binding) from me!

**On a side note** I think we need to upgrade the gradle version used by
the bootstrap script to 2.6 or higher. At least, make sure that Samza
doesn't throw checkstyle errors when used with a higher version of gradle.
This should be followed-up after release. Hopefully, someone can volunteer
:)

Thanks for driving this release, Xinyu!

Cheers!
Navina

On Tue, Oct 4, 2016 at 12:45 PM, Jakob Homan  wrote:

> +1 binding.
>
> Verified MD5, checked code, built and tested.
>
> Good job, everyone.
> -Jakob
>
>
> On 4 October 2016 at 09:23, Jacob Maes  wrote:
> > +1 (non-binding)
> >
> > Downloaded and ran bin/check-all.sh on OSX
> > Downloaded, built and ran unit tests on RHEL
> >
> > I got a checkstyle error when I tried to run bin/check-all.sh on RHEL,
> but
> > I think it's something environmental.
> >
> > Looks good.
> >
> > On Mon, Oct 3, 2016 at 4:19 PM, Jagadish Venkatraman <
> jagadish1...@gmail.com
> >> wrote:
> >
> >> +1 from my side for the release (non-binding)
> >>
> >> On Mon, Oct 3, 2016 at 12:36 PM, Boris Shkolnik 
> wrote:
> >>
> >> > +1
> >> >
> >> > On Fri, Sep 30, 2016 at 1:39 PM, xinyu liu 
> >> wrote:
> >> >
> >> > > Subject correction: [VOTE] Apache Samza 0.11.0 RC0.
> >> > >
> >> > > Thanks,
> >> > > Xinyu
> >> > >
> >> > > On Fri, Sep 30, 2016 at 12:00 PM, xinyu liu 
> >> > wrote:
> >> > >
> >> > > > Hey all,
> >> > > >
> >> > > > This is a call for a vote on a release of Apache Samza 0.11.0.
> Thanks
> >> > to
> >> > > > everyone who has contributed to this release. We are very glad to
> see
> >> > > > some new contributors in this release.
> >> > > >
> >> > > > The release candidate can be downloaded from here:
> >> > > > http://home.apache.org/~xinyu/samza-0.11.0-rc0/
> >> > > >
> >> > > > The release candidate is signed with pgp key C31D7061, which can
> be
> >> > > found on
> >> > > > keyservers:
> >> > > > http://pgp.mit.edu/pks/lookup?op=get&search=0xC31D7061
> >> > > >
> >> > > > The git tag is release-0.11.0-rc0 and signed with the same pgp
> key:
> >> > > > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=
> >> > > > refs/tags/release-0.11.0-rc0
> >> > > >
> >> > > > Test binaries have been published to Maven's staging repository,
> and
> >> > are
> >> > > > available here:
> >> > > > https://repository.apache.org/content/repositories/
> >> orgapachesamza-1013
> >> > > >
> >> > > > Note that the binaries were built with JDK7 without incident.
> >> > > >
> >> > > > 38 issues were resolved for this release:
> >> > > > https://issues.apache.org/jira/issues/?jql=project%20%3D%
> >> > > > 20SAMZA%20AND%20fixVersion%20in%20(0.11%2C%200.11.0)%20AND%
> >> > > > 20status%20in%20(Resolved%2C%20Closed)
> >> > > >
> >> > > > The vote will be open for 72 hours (ending at 12:00pm Wednesday,
> >> > > > 10/05/2016).
> >> > > >
> >> > > > Please download the release candidate, check the hashes/signature,
> >> > build
> >> > > > it and test it, and then please vote:
> >> > > >
> >> > > >
> >> > > > [ ] +1 approve
> >> > > >
> >> > > > [ ] +0 no opinion
> >> > > >
> >> > > > [ ] -1 disapprove (and reason why)
> >> > > >
> >> > > >
> >> > > > +1 from my side for the release.
> >> > > >
> >> > > > Cheers!
> >> > > > Xinyu Liu
> >> > > >
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> Jagadish V,
> >> Graduate Student,
> >> Department of Computer Science,
> >> Stanford University
> >>
>


Re: Review Request 51142: SAMZA-967: HDFS System Consumer

2016-09-29 Thread Navina Ramesh


> On Sept. 14, 2016, 6:19 a.m., Yi Pan (Data Infrastructure) wrote:
> > samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala,
> >  line 38
> > <https://reviews.apache.org/r/51142/diff/5/?file=1493812#file1493812line38>
> >
> > Not related to your RB, but could you open a JIRA for this one? Using 
> > KafkaUtil class in HdfsSystemFactory seems really weird.
> 
> Yi Pan (Data Infrastructure) wrote:
> Just realized, w/ this dependency, are we creating a dependency on 
> samza-kafka in samza-hdfs? I don't think that is right. samza-kafka and 
> samza-hdfs should remain as two independent modules implementing different 
> SystemFactory and can not depend on each other. We definitely need to have a 
> JIRA addressing this one.

I think Hai created a JIRA for this yesterday. 
https://issues.apache.org/jira/browse/SAMZA-1026?jql=project%20%3D%20SAMZA%20AND%20created%3E%3D-1w%20ORDER%20BY%20created%20DESC


- Navina


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51142/#review148780
---


On Sept. 28, 2016, 9:57 p.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51142/
> ---
> 
> (Updated Sept. 28, 2016, 9:57 p.m.)
> 
> 
> Review request for samza, Yi Pan (Data Infrastructure) and Navina Ramesh.
> 
> 
> Bugs: SAMZA-967
> https://issues.apache.org/jira/browse/SAMZA-967
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Add HDFS System Consumer: 
> 
> 1. System admin, partitioner
> 2. System consumer with metrics
> 
> Design doc can be found here: 
> https://issues.apache.org/jira/secure/attachment/12824078/HDFSSystemConsumer.pdf
> 
> An overview of the high level architecture: 
> 
> The system factory is used by Samza to instantiate SystemConsumer, 
> SystemProducer, and SystemAdmin for a specific system. The 
> FileDataSystemFactory can be reused for other file-system-like sources. 
> 
> HDFSSystemAdmin will start a “DirectoryPartitioner” to figure out the set of 
> HDFS files that need to be consumed for this job. The DirectoryPartitioner also 
> uses “GroupingPattern” to group files into partitions if advanced 
> partitioning is required. HDFSSystemAdmin will then persist the 
> “PartitionDescriptor” to HDFS.
> 
> The HDFSSystemConsumer will then pick up the “PartitionDescriptor” from HDFS. 
> Based on this information as well as the actual assignment of partitions, it 
> then knows which files to read from.
> 
> The initial implementation of the HDFS system consumer supports only Avro 
> data files. It’s very easy to extend it to a variety of file formats by 
> implementing the FileReader interface.
> 
> [ASCII architecture diagram omitted: garbled and truncated in this archive. Recoverable labels: HDFS, Obtain Partition Description, Persist Partition Description, Filtering/Grouping, HDFSAvroFileReader.]

Re: Review Request 51142: SAMZA-967: HDFS System Consumer

2016-09-29 Thread Navina Ramesh


> On Sept. 29, 2016, 5:56 p.m., Prateek Maheshwari wrote:
> > samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala, 
> > line 66
> > <https://reviews.apache.org/r/51142/diff/5/?file=1493810#file1493810line66>
> >
> > "systems.%s.consumer.buffer-capacity" makes sense to me. Regarding the 
> > "hdfs" prefix, there's already inconsistency in current configs. The kafka 
> > system configs don't include kafka in the config name, but the hdfs 
> > producer configs do. The kafka convention is better IMHO.
> > 
> > Either way, we should at least be consistent between this and the new 
> > partitioner/reader configs which don't have the hdfs prefix.
> 
> Prateek Maheshwari wrote:
> Btw, in the new configs we'll be using camelCase instead of dashes, so 
> we'll eventually need to change it to bufferCapacity.
> 
> Hai Lu wrote:
> There won't be any problems if I change it to camelCase style now, right? 
> Or should I keep the dash?

You can use camelCase style. Going forward, we should strictly use the 
period as the namespace delimiter, so it's fine to make this camelCase. Thanks, 
Hai!
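
The naming convention discussed here (periods separating namespaces, camelCase for the property itself) can be sketched in Java as below. The key string `systems.%s.consumer.bufferCapacity` is inferred from the thread rather than taken from the committed code, and the class `HdfsConfigKeys` with its methods is purely hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for config key templates; names are assumptions for illustration.
class HdfsConfigKeys {
    // Periods delimit namespaces; the final segment is a camelCase property name.
    static final String CONSUMER_BUFFER_CAPACITY = "systems.%s.consumer.bufferCapacity";

    /** Fills the system name into the key template. */
    static String bufferCapacityKey(String systemName) {
        return String.format(CONSUMER_BUFFER_CAPACITY, systemName);
    }

    /** Reads the capacity from a flat key/value config, falling back to a default. */
    static int bufferCapacity(Map<String, String> config, String systemName, int defaultValue) {
        String raw = config.get(bufferCapacityKey(systemName));
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put(bufferCapacityKey("hdfs"), "64");
        System.out.println(bufferCapacityKey("hdfs") + " = " + bufferCapacity(config, "hdfs", 10));
    }
}
```

Because the system name is substituted into the template, the same key shape works for any system and does not need an "hdfs" prefix in the property name.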


- Navina


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51142/#review150883
---


On Sept. 28, 2016, 9:57 p.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51142/
> ---
> 
> (Updated Sept. 28, 2016, 9:57 p.m.)
> 
> 
> Review request for samza, Yi Pan (Data Infrastructure) and Navina Ramesh.
> 
> 
> Bugs: SAMZA-967
> https://issues.apache.org/jira/browse/SAMZA-967
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Add HDFS System Consumer: 
> 
> 1. System admin, partitioner
> 2. System consumer with metrics
> 
> Design doc can be found here: 
> https://issues.apache.org/jira/secure/attachment/12824078/HDFSSystemConsumer.pdf
> 
> An overview of the high level architecture: 
> 
> The system factory is used by Samza to instantiate SystemConsumer, 
> SystemProducer, and SystemAdmin for a specific system. The 
> FileDataSystemFactory can be reused for other file-system-like sources. 
> 
> HDFSSystemAdmin will start a “DirectoryPartitioner” to figure out the set of 
> HDFS files that need to be consumed for this job. The DirectoryPartitioner also 
> uses “GroupingPattern” to group files into partitions if advanced 
> partitioning is required. HDFSSystemAdmin will then persist the 
> “PartitionDescriptor” to HDFS.
> 
> The HDFSSystemConsumer will then pick up the “PartitionDescriptor” from HDFS. 
> Based on this information as well as the actual assignment of partitions, it 
> then knows which files to read from.
> 
> The initial implementation of the HDFS system consumer supports only Avro 
> data files. It’s very easy to extend it to a variety of file formats by 
> implementing the FileReader interface.
> 
> [ASCII architecture diagram omitted: garbled and truncated in this archive. Recoverable labels: HDFS, Obtain Partition Description, Filtering/Grouping, HDFSAvroFileReader.]

Re: Review Request 51142: SAMZA-967: HDFS System Consumer

2016-09-27 Thread Navina Ramesh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51142/#review150648
---


Fix it, then Ship it!




LGTM, +1. I think you were planning to add documentation in a separate 
JIRA/RB, correct?


build.gradle (line 308)
<https://reviews.apache.org/r/51142/#comment218682>

why is this dependency needed here? It seems like this compile dependency 
is required for samza-hdfs and now samza-hdfs depends on samza-yarn. Is there a 
better way to do this?



samza-hdfs/src/main/java/org/apache/samza/system/hdfs/partitioner/DirectoryPartitioner.java
 (line 53)
<https://reviews.apache.org/r/51142/#comment218683>

It is important to document the assumption that we consider the HDFS file 
set to be immutable, and how we handle inconsistencies. It looks like you validate 
and throw an exception in validateAndGetOriginalFilteredFiles. 
Sorry for nagging about documentation. This feels like a complicated 
class, where we may easily forget our design assumptions. Better to clarify it 
in the doc.



samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala
 (line 42)
<https://reviews.apache.org/r/51142/#comment218691>

Not related to your change. But can you clean up some unused imports before 
you commit this file? Thanks!



samza-shell/src/main/bash/bash-run-job.sh (line 27)
<https://reviews.apache.org/r/51142/#comment218688>

If we want to use this only for the azkaban runner, we should perhaps 
rename the file to run-job-for-azkaban.sh or something along those lines.


- Navina Ramesh


On Sept. 20, 2016, 11:22 p.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51142/
> ---
> 
> (Updated Sept. 20, 2016, 11:22 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Yi Pan (Data Infrastructure), and 
> Navina Ramesh.
> 
> 
> Bugs: SAMZA-967
> https://issues.apache.org/jira/browse/SAMZA-967
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Add HDFS System Consumer: 
> 
> 1. System admin, partitioner
> 2. System consumer with metrics
> 
> Design doc can be found here: 
> https://issues.apache.org/jira/secure/attachment/12824078/HDFSSystemConsumer.pdf
> 
> An overview of the high level architecture: 
> 
> The system factory is used by Samza to instantiate SystemConsumer, 
> SystemProducer, and SystemAdmin for a specific system. The 
> FileDataSystemFactory can be reused for other file-system-like sources. 
> 
> HDFSSystemAdmin will start a “DirectoryPartitioner” to figure out the set of 
> HDFS files that need to be consumed for this job. The DirectoryPartitioner also 
> uses “GroupingPattern” to group files into partitions if advanced 
> partitioning is required. HDFSSystemAdmin will then persist the 
> “PartitionDescriptor” to HDFS.
> 
> The HDFSSystemConsumer will then pick up the “PartitionDescriptor” from HDFS. 
> Based on this information as well as the actual assignment of partitions, it 
> then knows which files to read from.
> 
> The initial implementation of the HDFS system consumer supports only Avro 
> data files. It’s very easy to extend it to a variety of file formats by 
> implementing the FileReader interface.
> 
> [ASCII architecture diagram omitted: garbled and truncated in this archive. Recoverable labels: HDFS, Obtain Partition Description, Filtering/.]

Re: Review Request 51142: SAMZA-967: HDFS System Consumer

2016-09-27 Thread Navina Ramesh


> On Sept. 13, 2016, 1:37 a.m., Yi Pan (Data Infrastructure) wrote:
> > samza-hdfs/src/main/java/org/apache/samza/system/hdfs/reader/AvroFileHdfsReader.java,
> >  line 24
> > <https://reviews.apache.org/r/51142/diff/5/?file=1493806#file1493806line24>
> >
> > One concern I had w/ this HdfsAvroFileReader/Writer is the version 
> > conflict issue. LinkedIn's Kafka version still uses avro-1.4 in the serde, 
> > while hdfs already uses avro-1.7 in 2.6.1. I guess that we need to find a 
> > solution inside LinkedIn to resolve it. Let's sync up face-to-face tomorrow.
> 
> Hai Lu wrote:
> I was well aware of the avro issue. I tried so many different APIs that I 
> finally found the set of APIs that work for both 1.4 and 1.7
> 
> Yi Pan (Data Infrastructure) wrote:
> Great! I am really curious what are the set of compatible APIs! So, I 
> guess that we just enforce avro-1.4 when compiling samza-hdfs module? I 
> remember that I tried last time and got a build failure in samza-hdfs w/ 
> AvroDataFileHdfsWriter in samza-li build. I am curious how you made it work.
> 
> Navina Ramesh wrote:
> Right now, we exclude samza-hdfs build in samza-li. 
>   "build": "ligradle -PscalaVersion=2.10 -Prelease=true 
> -PallArtifacts build -x:samza-hdfs_2.10:build",
>   
> We may want to fully understand the avro changes introduced by 
> HdfsProducer and/or HdfsConsumer in samza-li. This sounds like a blocker for 
> me right now. How are we going to overcome avro conflict introduced in 
> HdfsSystemProducer?
> 
> Hai Lu wrote:
> I know. I included it back in samza-li and it worked just fine. Just need 
> some extra dependency to make the tests pass. I have been only using li_trunk 
> to deploy to Hadoop's YARN at LinkedIn

Got it. Thanks!


- Navina


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51142/#review148629
---


On Sept. 20, 2016, 11:22 p.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51142/
> ---
> 
> (Updated Sept. 20, 2016, 11:22 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Yi Pan (Data Infrastructure), and 
> Navina Ramesh.
> 
> 
> Bugs: SAMZA-967
> https://issues.apache.org/jira/browse/SAMZA-967
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Add HDFS System Consumer: 
> 
> 1. System admin, partitioner
> 2. System consumer with metrics
> 
> Design doc can be found here: 
> https://issues.apache.org/jira/secure/attachment/12824078/HDFSSystemConsumer.pdf
> 
> An overview of the high level architecture: 
> 
> The system factory is used by Samza to instantiate SystemConsumer, 
> SystemProducer, and SystemAdmin for a specific system. The 
> FileDataSystemFactory can be reused for other file-system-like sources. 
> 
> HDFSSystemAdmin will start a “DirectoryPartitioner” to figure out the set of 
> HDFS files that need to be consumed for this job. The DirectoryPartitioner also 
> uses “GroupingPattern” to group files into partitions if advanced 
> partitioning is required. HDFSSystemAdmin will then persist the 
> “PartitionDescriptor” to HDFS.
> 
> The HDFSSystemConsumer will then pick up the “PartitionDescriptor” from HDFS. 
> Based on this information as well as the actual assignment of partitions, it 
> then knows which files to read from.
> 
> The initial implementation of the HDFS system consumer supports only Avro 
> data files. It’s very easy to extend it to a variety of file formats by 
> implementing the FileReader interface.
> 
>   
> 
>  
> +--+
>  
>  |
>   | 
>+-+ HDFS   
>   | 
>|   Obtain|
>   | 
>|  Partition  
> +--+--^--+-^---+
>

Re: Review Request 51142: SAMZA-967: HDFS System Consumer

2016-09-27 Thread Navina Ramesh


> On Sept. 13, 2016, 12:33 a.m., Yi Pan (Data Infrastructure) wrote:
> > samza-hdfs/src/main/java/org/apache/samza/system/hdfs/HdfsSystemConsumer.java,
> >  line 101
> > <https://reviews.apache.org/r/51142/diff/5/?file=1493801#file1493801line101>
> >
> > Better, to avoid the wasteful remote IO if there are multiple calls to 
> > getPartitionDescriptor from multiple threads, is to use bucketized locks on 
> > the ConcurrentHashMap entries to ensure synchronization in populating a 
> > certain hash map entry. Guava cache implemented the bucketized locking as a 
> > built-in method already: 
> > http://www.tutorialspoint.com/guava/guava_caching_utilities.htm
> 
> Navina Ramesh wrote:
> What was the resolution here? Was there any change to the IO pattern to 
> use caching?
> 
> Hai Lu wrote:
> I believe this just optimizes the case where multiple calls happen at 
> the same time, causing every caller to make a remote call. After the 
> change here, only the first caller actually makes the remote call while 
> the others block.
> 
> It's a very, very tiny improvement, to be honest.

Ah, got it. I was looking for the change in the diff window and 
couldn't figure out where you had used it. I understood once I applied your patch 
in my IDE. Thanks!

Yes, it is a tiny but important improvement.
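
For anyone following along, here is a minimal sketch of the idea discussed 
above: populate each map entry at most once so that concurrent callers do not 
all make the remote call. It is not the code from the patch. The class name 
DescriptorCache, the loader function, and the List<String> descriptor type are 
assumptions made for illustration; only ConcurrentHashMap.computeIfAbsent is a 
real JDK call.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical illustration; not the code from the patch under review. */
final class DescriptorCache {
  private final ConcurrentHashMap<String, List<String>> cache = new ConcurrentHashMap<>();
  // Stand-in for the remote HDFS read; supplied by the caller (assumption).
  private final Function<String, List<String>> remoteLoader;

  DescriptorCache(Function<String, List<String>> remoteLoader) {
    this.remoteLoader = remoteLoader;
  }

  /** Returns the cached descriptor, invoking the loader at most once per stream name. */
  List<String> getPartitionDescriptor(String streamName) {
    // computeIfAbsent applies the function atomically for a given key, so
    // concurrent callers for the same key wait for the first load to finish.
    return cache.computeIfAbsent(streamName, remoteLoader);
  }

  public static void main(String[] args) {
    AtomicInteger remoteCalls = new AtomicInteger();
    DescriptorCache cache = new DescriptorCache(name -> {
      remoteCalls.incrementAndGet();
      return Collections.singletonList(name + "/part-0");
    });
    cache.getPartitionDescriptor("stream-a");
    cache.getPartitionDescriptor("stream-a");
    System.out.println("remote loads: " + remoteCalls.get());
  }
}
```

The Guava cache suggested in the review comment offers the same per-key 
loading guarantee plus eviction; the plain map above is only the smallest 
version of the technique.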


- Navina


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51142/#review148612
---


On Sept. 20, 2016, 11:22 p.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51142/
> ---
> 
> (Updated Sept. 20, 2016, 11:22 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Yi Pan (Data Infrastructure), and 
> Navina Ramesh.
> 
> 
> Bugs: SAMZA-967
> https://issues.apache.org/jira/browse/SAMZA-967
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Add HDFS System Consumer: 
> 
> 1. System admin, partitioner
> 2. System consumer with metrics
> 
> Design doc can be found here: 
> https://issues.apache.org/jira/secure/attachment/12824078/HDFSSystemConsumer.pdf
> 
> An overview of the high level architecture: 
> 
> The system factory is used by Samza to instantiate SystemConsumer, 
> SystemProducer, and SystemAdmin for a specific system. The 
> FileDataSystemFactory can be reused for other file-system-like sources. 
> 
> HDFSSystemAdmin will start a “DirectoryPartitioner” to figure out the set of 
> HDFS files that need to be consumed for this job. The DirectoryPartitioner also 
> uses “GroupingPattern” to group files into partitions if advanced 
> partitioning is required. HDFSSystemAdmin will then persist the 
> “PartitionDescriptor” to HDFS.
> 
> The HDFSSystemConsumer will then pick up the “PartitionDescriptor” from HDFS. 
> Based on this information as well as the actual assignment of partitions, it 
> then knows which files to read from.
> 
> The initial implementation of the HDFS system consumer supports only Avro 
> data files. It is easy to extend it to a variety of file formats by 
> implementing the FileReader interface.
> 
> [Architecture diagram truncated in the list archive. Legible labels: HDFS, 
> Obtain Partition Description, Filtering/Grouping.]

Re: Review Request 51142: SAMZA-967: HDFS System Consumer

2016-09-27 Thread Navina Ramesh


> On Sept. 14, 2016, 6:19 a.m., Yi Pan (Data Infrastructure) wrote:
> > samza-hdfs/src/main/java/org/apache/samza/system/hdfs/reader/MultiFileHdfsReader.java,
> >  line 59
> > <https://reviews.apache.org/r/51142/diff/5/?file=1493808#file1493808line59>
> >
> > Not sure what are we doing here? What's the ordering that we are 
> > enforcing in this multi-file partition? I saw that you are trying to make 
> > the offsets as an offset vector on top of all files in the same partition. 
> > Why? Can we simplify it by making it full-ordered in the same partition 
> > instead of partial-ordered via an offset vector?

I couldn't figure out the resolution for this issue. Perhaps you discussed 
offline. Can you please update the discussion here for everyone's benefit?


> On Sept. 14, 2016, 6:19 a.m., Yi Pan (Data Infrastructure) wrote:
> > samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala, 
> > line 66
> > <https://reviews.apache.org/r/51142/diff/5/?file=1493810#file1493810line66>
> >
> > It would be nicer to make it conform to the Offspring style of config 
> > variable scoping, i.e. if the scope of the configuration is the hdfs consumer, 
> > use systems.%s.consumer.hdfs.buffer-capacity. I would suggest consulting 
> > Prateek since he has been working on the Offspring config refactoring. For 
> > new config variables, "." should strictly be used as the delimiter between 
> > scopes, not as the delimiter between words.

Going by the logic of using a period to delimit scopes, shouldn't it be 
systems.%s.consumer.hdfs-buffer-capacity? Unless there is an hdfs scope that I 
am not seeing. It is kind of weird because we assume the indirection from 
the system name (%s) to its factory will act as a scope. I am not sure what the 
correct pattern should be.
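
To make the two candidates concrete, here is a small sketch of how each key 
pattern expands for a given system name. The class HdfsConfigKeys, its 
constants, and the system name "hdfs-in" are hypothetical and only illustrate 
the naming question; which pattern is correct is still open in this thread.

```java
/** Hypothetical key builder; the two patterns are the candidates discussed in this review. */
final class HdfsConfigKeys {
  // Candidate from the review comment: "hdfs" treated as its own scope.
  static final String SCOPED = "systems.%s.consumer.hdfs.buffer-capacity";
  // Candidate from this reply: "." only between scopes, "-" between words.
  static final String FLAT = "systems.%s.consumer.hdfs-buffer-capacity";

  /** Substitutes the system name for the %s placeholder. */
  static String key(String pattern, String systemName) {
    return String.format(pattern, systemName);
  }

  public static void main(String[] args) {
    System.out.println(key(SCOPED, "hdfs-in"));
    System.out.println(key(FLAT, "hdfs-in"));
  }
}
```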


- Navina


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51142/#review148780
---


On Sept. 20, 2016, 11:22 p.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51142/
> -----------
> 
> (Updated Sept. 20, 2016, 11:22 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Yi Pan (Data Infrastructure), and 
> Navina Ramesh.
> 
> 
> Bugs: SAMZA-967
> https://issues.apache.org/jira/browse/SAMZA-967
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Add HDFS System Consumer: 
> 
> 1. System admin, partitioner
> 2. System consumer with metrics
> 
> Design doc can be found here: 
> https://issues.apache.org/jira/secure/attachment/12824078/HDFSSystemConsumer.pdf
> 
> An overview of the high level architecture: 
> 
> The system factory is used by Samza to instantiate SystemConsumer, 
> SystemProducer, and SystemAdmin for a specific system. The 
> FileDataSystemFactory can be reused for other file-system-like sources. 
> 
> HDFSSystemAdmin will start a “DirectoryPartitioner” to figure out the set of 
> HDFS files that need to be consumed for this job. The DirectoryPartitioner also 
> uses “GroupingPattern” to group files into partitions if advanced 
> partitioning is required. HDFSSystemAdmin will then persist the 
> “PartitionDescriptor” to HDFS.
> 
> The HDFSSystemConsumer will then pick up the “PartitionDescriptor” from HDFS. 
> Based on this information as well as the actual assignment of partitions, it 
> then knows which files to read from.
> 
> The initial implementation of the HDFS system consumer supports only Avro 
> data files. It is easy to extend it to a variety of file formats by 
> implementing the FileReader interface.
> 
> [Architecture diagram truncated in the list archive. Legible labels: HDFS, 
> Obtain Partition.]
>

Re: Review Request 51142: SAMZA-967: HDFS System Consumer

2016-09-27 Thread Navina Ramesh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51142/#review150637
---




samza-hdfs/src/main/java/org/apache/samza/system/hdfs/HdfsSystemConsumer.java 
(line 56)
<https://reviews.apache.org/r/51142/#comment218668>

Can you please add javadoc related to thread-safety of the class?



samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala (line 
70)
<https://reviews.apache.org/r/51142/#comment218674>

What is the "default-partitioner"? Is it possible to have more than one 
partitioner?



samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala 
(line 37)
<https://reviews.apache.org/r/51142/#comment218678>

Doesn't this add a dependency between samza-hdfs and samza-kafka?

It seems to have been introduced by the HdfsSystemProducer. Can we please 
fix it forward? Or, as Yi suggested, please create a JIRA and add a TODO comment 
here referring to the JIRA.

Thanks!


- Navina Ramesh


On Sept. 20, 2016, 11:22 p.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51142/
> ---
> 
> (Updated Sept. 20, 2016, 11:22 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Yi Pan (Data Infrastructure), and 
> Navina Ramesh.
> 
> 
> Bugs: SAMZA-967
> https://issues.apache.org/jira/browse/SAMZA-967
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Add HDFS System Consumer: 
> 
> 1. System admin, partitioner
> 2. System consumer with metrics
> 
> Design doc can be found here: 
> https://issues.apache.org/jira/secure/attachment/12824078/HDFSSystemConsumer.pdf
> 
> An overview of the high level architecture: 
> 
> The system factory is used by Samza to instantiate SystemConsumer, 
> SystemProducer, and SystemAdmin for a specific system. The 
> FileDataSystemFactory can be reused for other file-system-like sources. 
> 
> HDFSSystemAdmin will start a “DirectoryPartitioner” to figure out the set of 
> HDFS files that need to be consumed for this job. The DirectoryPartitioner also 
> uses “GroupingPattern” to group files into partitions if advanced 
> partitioning is required. HDFSSystemAdmin will then persist the 
> “PartitionDescriptor” to HDFS.
> 
> The HDFSSystemConsumer will then pick up the “PartitionDescriptor” from HDFS. 
> Based on this information as well as the actual assignment of partitions, it 
> then knows which files to read from.
> 
> The initial implementation of the HDFS system consumer supports only Avro 
> data files. It is easy to extend it to a variety of file formats by 
> implementing the FileReader interface.
> 
> [Architecture diagram truncated in the list archive. Legible labels: HDFS, 
> Obtain Partition Description, Filtering/Grouping, HDFSAvroFileReader, 
> Persist Partition Description.]

Re: Review Request 51142: SAMZA-967: HDFS System Consumer

2016-09-27 Thread Navina Ramesh


> On Sept. 13, 2016, 12:33 a.m., Yi Pan (Data Infrastructure) wrote:
> > samza-hdfs/src/main/java/org/apache/samza/system/hdfs/HdfsSystemConsumer.java,
> >  line 101
> > <https://reviews.apache.org/r/51142/diff/5/?file=1493801#file1493801line101>
> >
> > Better, to avoid the wasteful remote IO if there are multiple calls to 
> > getPartitionDescriptor from multiple threads, is to use bucketized locks on 
> > the ConcurrentHashMap entries to ensure synchronization in populating a 
> > certain hash map entry. Guava cache implemented the bucketized locking as a 
> > built-in method already: 
> > http://www.tutorialspoint.com/guava/guava_caching_utilities.htm

What was the resolution here? Was there any change to the IO pattern to use 
caching?


- Navina


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51142/#review148612
---


On Sept. 20, 2016, 11:22 p.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51142/
> ---
> 
> (Updated Sept. 20, 2016, 11:22 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Yi Pan (Data Infrastructure), and 
> Navina Ramesh.
> 
> 
> Bugs: SAMZA-967
> https://issues.apache.org/jira/browse/SAMZA-967
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Add HDFS System Consumer: 
> 
> 1. System admin, partitioner
> 2. System consumer with metrics
> 
> Design doc can be found here: 
> https://issues.apache.org/jira/secure/attachment/12824078/HDFSSystemConsumer.pdf
> 
> An overview of the high level architecture: 
> 
> The system factory is used by Samza to instantiate SystemConsumer, 
> SystemProducer, and SystemAdmin for a specific system. The 
> FileDataSystemFactory can be reused for other file-system-like sources. 
> 
> HDFSSystemAdmin will start a “DirectoryPartitioner” to figure out the set of 
> HDFS files that need to be consumed for this job. The DirectoryPartitioner also 
> uses “GroupingPattern” to group files into partitions if advanced 
> partitioning is required. HDFSSystemAdmin will then persist the 
> “PartitionDescriptor” to HDFS.
> 
> The HDFSSystemConsumer will then pick up the “PartitionDescriptor” from HDFS. 
> Based on this information as well as the actual assignment of partitions, it 
> then knows which files to read from.
> 
> The initial implementation of the HDFS system consumer supports only Avro 
> data files. It is easy to extend it to a variety of file formats by 
> implementing the FileReader interface.
> 
> [Architecture diagram truncated in the list archive. Legible labels: HDFS, 
> Obtain Partition Description, Filtering/Grouping, HDFSAvroFileReader, 
> Persist Partition Description.]

Re: Review Request 51142: SAMZA-967: HDFS System Consumer

2016-09-27 Thread Navina Ramesh


> On Sept. 13, 2016, 1:37 a.m., Yi Pan (Data Infrastructure) wrote:
> > samza-hdfs/src/main/java/org/apache/samza/system/hdfs/reader/AvroFileHdfsReader.java,
> >  line 24
> > <https://reviews.apache.org/r/51142/diff/5/?file=1493806#file1493806line24>
> >
> > One concern I had w/ this HdfsAvroFileReader/Writer is the version 
> > conflict issue. LinkedIn's Kafka version still uses avro-1.4 in the serde, 
> > while hdfs already uses avro-1.7 in 2.6.1. I guess that we need to find a 
> > solution inside LinkedIn to resolve it. Let's sync up face-to-face tomorrow.
> 
> Hai Lu wrote:
> I was well aware of the avro issue. I tried so many different APIs that I 
> finally found the set of APIs that work for both 1.4 and 1.7
> 
> Yi Pan (Data Infrastructure) wrote:
> Great! I am really curious what are the set of compatible APIs! So, I 
> guess that we just enforce avro-1.4 when compiling samza-hdfs module? I 
> remember that I tried last time and got a build failure in samza-hdfs w/ 
> AvroDataFileHdfsWriter in samza-li build. I am curious how you made it work.

Right now, we exclude the samza-hdfs build in samza-li:
  "build": "ligradle -PscalaVersion=2.10 -Prelease=true -PallArtifacts build -x:samza-hdfs_2.10:build",
  
We may want to fully understand the Avro changes introduced by HdfsProducer 
and/or HdfsConsumer in samza-li. This sounds like a blocker to me right now. 
How are we going to overcome the Avro conflict introduced in HdfsSystemProducer?


- Navina


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51142/#review148629
---


On Sept. 20, 2016, 11:22 p.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51142/
> ---
> 
> (Updated Sept. 20, 2016, 11:22 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Yi Pan (Data Infrastructure), and 
> Navina Ramesh.
> 
> 
> Bugs: SAMZA-967
> https://issues.apache.org/jira/browse/SAMZA-967
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Add HDFS System Consumer: 
> 
> 1. System admin, partitioner
> 2. System consumer with metrics
> 
> Design doc can be found here: 
> https://issues.apache.org/jira/secure/attachment/12824078/HDFSSystemConsumer.pdf
> 
> An overview of the high level architecture: 
> 
> The system factory is used by Samza to instantiate SystemConsumer, 
> SystemProducer, and SystemAdmin for a specific system. The 
> FileDataSystemFactory can be reused for other file-system-like sources. 
> 
> HDFSSystemAdmin will start a “DirectoryPartitioner” to figure out the set of 
> HDFS files that need to be consumed for this job. The DirectoryPartitioner also 
> uses “GroupingPattern” to group files into partitions if advanced 
> partitioning is required. HDFSSystemAdmin will then persist the 
> “PartitionDescriptor” to HDFS.
> 
> The HDFSSystemConsumer will then pick up the “PartitionDescriptor” from HDFS. 
> Based on this information as well as the actual assignment of partitions, it 
> then knows which files to read from.
> 
> The initial implementation of the HDFS system consumer supports only Avro 
> data files. It is easy to extend it to a variety of file formats by 
> implementing the FileReader interface.
> 
> [Architecture diagram truncated in the list archive. Legible labels: HDFS, 
> Obtain Partition Description, Filtering/.]

Re: Review Request 52066: change warn about loopback address to a debug

2016-09-21 Thread Navina Ramesh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52066/#review149877
---


Ship it!




Ship It!

- Navina Ramesh


On Sept. 19, 2016, 8:55 p.m., Boris Shkolnik wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52066/
> ---
> 
> (Updated Sept. 19, 2016, 8:55 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1022
> https://issues.apache.org/jira/browse/SAMZA-1022
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> change warn about loopback address to a debug
> 
> 
> Diffs
> -
> 
>   samza-core/src/main/scala/org/apache/samza/util/Util.scala 
> 95a5aa0a23db2a890c19166b6031b2a3b96689f2 
> 
> Diff: https://reviews.apache.org/r/52066/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Boris Shkolnik
> 
>



Re: Issue with consuming non-existent topics in 0.10.1

2016-09-16 Thread Navina Ramesh
Hey Tommy,
Yeah. That totally makes sense. Thanks for explaining it.  :)

Thanks!
Navina
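
For the archive, a minimal sketch of the difference Tommy describes in the
quoted message below: a fetch that surfaces the broker error gets retried
until the topic exists, while a fetch that ignores the error code returns an
empty partition set on the first attempt. This is not Samza's code. The class
MetadataRetrySketch, the LeaderNotAvailable stand-in exception, and the fake
broker are assumptions made for illustration, and the loop omits any backoff.

```java
import java.util.Collections;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical illustration of retry-on-error metadata fetching; not Samza's code. */
final class MetadataRetrySketch {
  /** Stand-in for the broker error seen while a topic is still being auto-created. */
  static final class LeaderNotAvailable extends RuntimeException {}

  /** Retries the fetch while it throws LeaderNotAvailable, up to maxAttempts times. */
  static Set<Integer> fetchWithRetry(Supplier<Set<Integer>> fetch, int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    LeaderNotAvailable last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return fetch.get();
      } catch (LeaderNotAvailable e) {
        last = e; // topic not ready yet; try again
      }
    }
    throw last;
  }

  /** Fake broker: reports an error for the first `failures` calls, then returns partition 0. */
  static Supplier<Set<Integer>> fakeBroker(int failures, boolean checkErrorCode) {
    int[] calls = {0};
    return () -> {
      if (calls[0]++ < failures) {
        if (checkErrorCode) {
          throw new LeaderNotAvailable(); // error surfaced, so the caller retries
        }
        return Collections.emptySet(); // error ignored: looks like "no partitions"
      }
      return Collections.singleton(0);
    };
  }

  public static void main(String[] args) {
    System.out.println("checked:   " + fetchWithRetry(fakeBroker(2, true), 5));
    System.out.println("unchecked: " + fetchWithRetry(fakeBroker(2, false), 5));
  }
}
```

With the error code checked, the caller ends up with the real partition set;
with it ignored, the empty set flows on to the grouper, which matches the
"No tasks found" failure reported in this thread.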

On Fri, Sep 16, 2016 at 12:12 PM, Tommy Becker <tobec...@tivo.com> wrote:

> Hey Navina,
>
> This was consistently reproducible both locally and in our integration
> test environment. We have auto.create.topics.enable on our brokers (or more
> accurately, we do not have it disabled; it's the default). I did not mean
> to imply there is a problem with the logic of the change in SAMZA-971; I
> understand the desire to make fewer calls, but at the time I did not have
> time to dig in and see exactly what the root cause of the difference was. I
> think I've found it now though.
>
> Prior to the 971 fix, we eventually wind up in
> KafkaSystemAdmin.getTopicsAndPartitionsByBroker(), which contains this
> code:
>
> KafkaUtil.maybeThrowException(topicMetadata.errorCode)
>
> What I found was that this was indeed throwing a
> LeaderNotAvailableException in the case where the topic did not already
> exist. This has the effect of triggering a retry in
> KafkaSystemAdmin.getSystemStreamMetadata(), and this continues until the
> broker has finished creating the topic and returns the correct partition
> metadata. The optimized path introduced by the SAMZA-971 fix goes into
> KafkaSystemAdmin.getSystemStreamPartitionCounts() which does not check
> this errorCode, and simply returns an empty set of partitions. Does that
> make sense?
>
>
> -Tommy
>
>
>
>
>
>
> On 09/15/2016 09:54 PM, Navina Ramesh wrote:
>
> Hi Tommy,
>
> Yi and I discussed it and, initially, we thought it could have
> something to do with the topic auto-creation setting on your Kafka server.
> Is it enabled or disabled in your case?
>
> I kind of suspect that the request timeout is insufficient. However, we do
> have retries in Samza to fetch the metadata. So, even if the topic does get
> auto-created and the metadata fetch is delayed, it will try to fetch the
> metadata again. It is not very clear why SAMZA-971 has anything to do with this.
> That JIRA just reduces the number of calls we make to the broker.
>
> Another question: are you able to reproduce this issue?
>
> Thanks!
> Navina
>
> On Wed, Sep 14, 2016 at 1:33 PM, Tommy Becker <tobec...@tivo.com> wrote:
>
>
>
> Thanks for the response, and done.
>
> https://issues.apache.org/jira/browse/SAMZA-1018
>
> On 09/14/2016 01:14 PM, Yi Pan wrote:
>
> Hi, Tommy,
>
> Could you open a JIRA for this one? Also, could you include the Kafka
> broker version in this test?
>
> Thanks!
>
> -Yi
>
> On Wed, Sep 14, 2016 at 6:06 AM, Tommy Becker <tobec...@tivo.com> wrote:
>
>
>
> We are testing an upgrade to 0.10.1 from 0.9.1 and noticed a regression.
> When starting a stream job that consumes a topic that does not yet exist,
> the job dies with the following exception:
>
> Exception in thread "main" java.lang.IllegalArgumentException: No tasks
> found. Likely due to no input partitions. Can't run a job with no tasks.
>  at org.apache.samza.container.grouper.task.GroupByContainerCount.validateTasks(GroupByContainerCount.java:193)
>  at org.apache.samza.container.grouper.task.GroupByContainerCount.balance(GroupByContainerCount.java:86)
>  at org.apache.samza.coordinator.JobModelManager$.refreshJobModel(JobCoordinator.scala:278)
>  at org.apache.samza.coordinator.JobModelManager$.jobModelGenerator$1(JobCoordinator.scala:211)
>  at org.apache.samza.coordinator.JobModelManager$.initializeJobModel(JobCoordinator.scala:217)
>  at org.apache.samza.coordinator.JobModelManager$.getJobCoordinator(JobCoordinator.scala:122)
>  at org.apache.samza.coordinator.JobModelManager$.apply(JobCoordinator.scala:106)
>  at org.apache.samza.coordinator.JobModelManager$.apply(JobCoordinator.scala:112)
>  at org.apache.samza.job.local.ThreadJobFactory.getJob(ThreadJobFactory.scala:40)
>  at org.apache.samza.job.JobRunner.run(JobRunner.scala:129)
>  at org.apache.samza.job.JobRunner$.main(JobRunner.scala:66)
>  at org.apache.samza.job.JobRunner.main(JobRunner.scala)
>
>
>
>
>
> The root cause seems to be commit 920f803a2e3dab809f4d7bb518259b0f4164407f
> from SAMZA-971. From what I can tell passing partitionsMetadataOnly = true
> to the StreamMetadataCache in JobModelManager#getInputStreamPartitions is
> what's causing this behavior. The input topic is still created, but
> the proper partition metadata is not returned, resulting in an empty set
> being returned. The behavior of Kafka here is screwy, but this s

Re: Question about Samza Metrics

2016-09-16 Thread Navina Ramesh
Hi Shuqi,
It depends on the type of metric you are looking for. Most counters and
gauges are cumulative. Timer is the only metric that works within a
sliding window (if I am not mistaken).
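
If you only have cumulative counters, one common workaround is to sample the
counter periodically and take the difference between consecutive samples.
Below is a minimal sketch of that idea. CounterWindow is a hypothetical helper
and not part of Samza's metrics API; treating a drop in the value as a counter
reset (for example after a container restart) is also an assumption.

```java
/** Hypothetical helper; not part of Samza's metrics API. */
final class CounterWindow {
  private long lastValue;
  private boolean primed;

  /**
   * Returns the increase since the previous sample. The first sample returns 0.
   * If the value went down, the counter is assumed to have been reset and the
   * new value itself is reported.
   */
  synchronized long delta(long cumulativeValue) {
    long d;
    if (!primed) {
      d = 0L;
    } else if (cumulativeValue < lastValue) {
      d = cumulativeValue; // assumed reset, e.g. after a restart
    } else {
      d = cumulativeValue - lastValue;
    }
    lastValue = cumulativeValue;
    primed = true;
    return d;
  }

  public static void main(String[] args) {
    CounterWindow window = new CounterWindow();
    long[] samples = {100L, 130L, 5L};
    for (long sample : samples) {
      System.out.println("sample=" + sample + " delta=" + window.delta(sample));
    }
  }
}
```

Calling delta once per reporting interval gives a per-interval count that can
be charted or alerted on alongside the cumulative value.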

What use case are you trying to solve?

Thanks!
Navina

On Mon, Sep 12, 2016 at 11:50 PM, 舒琦  wrote:

> Hi,
>
> I found that most Samza metrics are cumulative. Is there a way to get
> metrics for a certain time frame?
>
> Thanks
>
> 
> ShuQi




-- 
Navina R.


Re: Issue with consuming non-existent topics in 0.10.1

2016-09-15 Thread Navina Ramesh
Hi Tommy,

Yi and I discussed it and, initially, we thought it could have
something to do with the topic auto-creation setting on your Kafka server.
Is it enabled or disabled in your case?

I kind of suspect that the request timeout is insufficient. However, we do
have retries in Samza to fetch the metadata. So, even if the topic does get
auto-created and the metadata fetch is delayed, it will try to fetch the
metadata again. It is not very clear why SAMZA-971 has anything to do with this.
That JIRA just reduces the number of calls we make to the broker.

Another question: are you able to reproduce this issue?

Thanks!
Navina

On Wed, Sep 14, 2016 at 1:33 PM, Tommy Becker  wrote:

> Thanks for the response, and done.
>
> https://issues.apache.org/jira/browse/SAMZA-1018
>
> On 09/14/2016 01:14 PM, Yi Pan wrote:
>
> Hi, Tommy,
>
> Could you open a JIRA for this one? Also, could you include the Kafka
> broker version in this test?
>
> Thanks!
>
> -Yi
>
> On Wed, Sep 14, 2016 at 6:06 AM, Tommy Becker <tobec...@tivo.com> wrote:
>
>
>
> We are testing an upgrade to 0.10.1 from 0.9.1 and noticed a regression.
> When starting a stream job that consumes a topic that does not yet exist,
> the job dies with the following exception:
>
> Exception in thread "main" java.lang.IllegalArgumentException: No tasks
> found. Likely due to no input partitions. Can't run a job with no tasks.
>   at org.apache.samza.container.grouper.task.GroupByContainerCount.validateTasks(GroupByContainerCount.java:193)
>   at org.apache.samza.container.grouper.task.GroupByContainerCount.balance(GroupByContainerCount.java:86)
>   at org.apache.samza.coordinator.JobModelManager$.refreshJobModel(JobCoordinator.scala:278)
>   at org.apache.samza.coordinator.JobModelManager$.jobModelGenerator$1(JobCoordinator.scala:211)
>   at org.apache.samza.coordinator.JobModelManager$.initializeJobModel(JobCoordinator.scala:217)
>   at org.apache.samza.coordinator.JobModelManager$.getJobCoordinator(JobCoordinator.scala:122)
>   at org.apache.samza.coordinator.JobModelManager$.apply(JobCoordinator.scala:106)
>   at org.apache.samza.coordinator.JobModelManager$.apply(JobCoordinator.scala:112)
>   at org.apache.samza.job.local.ThreadJobFactory.getJob(ThreadJobFactory.scala:40)
>   at org.apache.samza.job.JobRunner.run(JobRunner.scala:129)
>   at org.apache.samza.job.JobRunner$.main(JobRunner.scala:66)
>   at org.apache.samza.job.JobRunner.main(JobRunner.scala)
>
>
>
>
>
> The root cause seems to be commit 920f803a2e3dab809f4d7bb518259b0f4164407f
> from SAMZA-971. From what I can tell passing partitionsMetadataOnly = true
> to the StreamMetadataCache in JobModelManager#getInputStreamPartitions is
> what's causing this behavior. The input topic is still created, but
> the proper partition metadata is not returned, resulting in an empty set
> being returned. The behavior of Kafka here is screwy, but this still seems
> like a regression. The old behavior is nice because it doesn't require that
> producer systems come up before the stream processors.
>
> --
> Tommy Becker
> Senior Software Engineer
>
> Digitalsmiths
> A TiVo Company
>
> www.digitalsmiths.com
> tobec...@tivo.com
>
>
> 
>
> This email and any attachments may contain confidential and privileged
> material for the sole use of the intended recipient. Any review, copying,
> or distribution of this email (or any attachments) by others is prohibited.
> If you are not the intended recipient, please contact the sender
> immediately and permanently delete this email and any attachments. No
> employee or agent of TiVo Inc. is authorized to conclude any binding
> agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo
> Inc. may only be made by a signed written agreement.
>
>



-- 
Navina R.


Re: Samza kinesis implementation

2016-09-13 Thread Navina Ramesh (Apache)
Hi Shekar,

Last year, one of the Samza committers, Yan Fang, mentored a PhD
student, Renato, as part of the GSoC program, where they worked on
integrating Samza with Amazon Kinesis. I have cc'd the two of them so they
can provide you more context.
Here is the presentation from ApacheCon 2015 that resulted from their work:
http://www.slideshare.net/RenatoJavierMarroqun/apachecon-bigdata-europe-2015

We have an implementation at LinkedIn, although it has not been submitted
to open-source. I have cc'd Ryanne and Jason who work on the Kinesis
integration at LinkedIn.

I am afraid that's all the guidance I can offer for now. Please let us know
if you have any questions. We would be happy to review your design and
code contribution!

Thanks!
Navina

On Tue, Sep 13, 2016 at 5:17 PM, Shekar Tippur  wrote:

> Hello,
>
> I am looking for direction on implementing samza over Kinesis.
> I see that jira ticket is in unresolved state.
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/SAMZA-489
>
> I also saw that with Samza release 0.10 this implementation is in place.
>
> Appreciate any pointers on this.
>
> Sent from my iPhone
>


Re: Review Request 51630: SAMZA-1007: Pass job config from factory to grouper

2016-09-10 Thread Navina Ramesh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51630/#review148404
---


Ship it!




Ship It!

- Navina Ramesh


On Sept. 6, 2016, 8:41 a.m., Neil Fordyce wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51630/
> ---
> 
> (Updated Sept. 6, 2016, 8:41 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1007
> https://issues.apache.org/jira/browse/SAMZA-1007
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Supplying config to the grouper to allow broadcast streams to be configured 
> when using GroupBySystemStreamPartitionFactory.
> 
> 
> Diffs
> -
> 
>   
> samza-core/src/main/java/org/apache/samza/container/grouper/stream/GroupBySystemStreamPartition.java
>  a8b41de5a8afa94d16727e6900f8556ce35c9f9c 
>   
> samza-core/src/main/java/org/apache/samza/container/grouper/stream/GroupBySystemStreamPartitionFactory.java
>  04a744492c0a4576c2812063ccf19c303c4d6ccf 
>   
> samza-core/src/test/java/org/apache/samza/container/grouper/stream/TestGroupBySystemStreamPartition.java
>  1bd14a41bffa9c9b6c7bc3c5e890c1346124149a 
> 
> Diff: https://reviews.apache.org/r/51630/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew clean build
> 
> 
> Thanks,
> 
> Neil Fordyce
> 
>



Re: Review Request 51726: SAMZA-1005: Refactor class instantiation code to a helper class.

2016-09-10 Thread Navina Ramesh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51726/#review148401
---


Ship it!




Patch looks awesome! I think you got most of the occurrences of Class loaders 
:) Thanks for the patch!

- Navina Ramesh


On Sept. 8, 2016, 12:44 p.m., Branislav Cogic wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51726/
> ---
> 
> (Updated Sept. 8, 2016, 12:44 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1005
> https://issues.apache.org/jira/browse/SAMZA-1005
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Refactor class instantiation code to a helper class
> 
> 
> Diffs
> -
> 
>   
> samza-core/src/main/java/org/apache/samza/clustermanager/ContainerProcessManager.java
>  c6bfec00691270a443d94b2b48569b6f01b69489 
>   samza-core/src/main/java/org/apache/samza/util/ClassLoaderHelper.java 
> PRE-CREATION 
>   samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
> f786fc08c8f7eced4f4084dc8326b28b6422 
>   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 
> 383bb13b18ace639607541c1bf6d0f42569cd4ff 
>   samza-core/src/main/scala/org/apache/samza/util/CommandLine.scala 
> f26501b2820b99d1ad2964c6f7833ef7eaddba97 
>   samza-rest/src/main/java/org/apache/samza/monitor/MonitorLoader.java 
> 75f3867281240e821bc847eb21a83bb891be6667 
>   samza-rest/src/main/java/org/apache/samza/rest/SamzaRestApplication.java 
> 61f3c462b61e531a566f885c5955c178ad25226b 
>   
> samza-rest/src/main/java/org/apache/samza/rest/proxy/job/AbstractJobProxy.java
>  bcc88d0fc1ab7a3d3010815c69820ab292ac42f2 
>   
> samza-yarn/src/main/java/org/apache/samza/validation/YarnJobValidationTool.java
>  c47e8d1214763ba1cac4ca31322746107a5f1260 
> 
> Diff: https://reviews.apache.org/r/51726/diff/
> 
> 
> Testing
> ---
> 
> Ran those commands successfully on Linux:
> ./gradlew clean build
> ./gradlew checkstyleMain checkstyleTest
> ./bin/check-all.sh
> 
> And ran hello-samza jobs successfully.
> 
> 
> Thanks,
> 
> Branislav Cogic
> 
>



Re: Question about Samza Metrics

2016-09-08 Thread Navina Ramesh
Hi ShuQi,

Auto-creation of streams depends on your Kafka server configuration. For the
coordinator stream and the checkpoint stream, Samza explicitly creates a
stream with 1 partition before publishing to it. This doesn't apply to the
metrics stream. So, if auto-creation is turned off on the Kafka server, you
have to create it manually, as you did.

Glad you figured it out.

Cheers!
Navina

On Wed, Sep 7, 2016 at 6:37 PM, 舒琦  wrote:

> Hi,
>
> Thanks for your help.
>
> I think the checkpoint stream and coordinator stream are auto-created per
> job when using Kafka, but the metrics stream is not.
>
> After I manually created the metrics stream in Kafka, the metrics are written
> into the stream.
>
>
> ShuQi
>
> > On Sep 7, 2016, at 23:15, Jagadish Venkatraman wrote:
> >
> > Can you run your program in DEBUG log-level? Does sending the metric to the
> > producer fail? Is the metric reporter thread showing an exception? (check
> > the stderr file too)
> >
> > Producing to a kafka topic should usually auto-create it.
> >
> > On Wed, Sep 7, 2016 at 2:09 AM, 舒琦  wrote:
> >
> >> Hi,
> >>
> >> My Samza job has the following metrics configuration:
> >>
> >> serializers.registry.metrics.class=org.apache.samza.serializers.MetricsSnapshotSerdeFactory
> >>
> >> systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
> >> systems.kafka.consumer.zookeeper.connect=zk11:3181,zk12:3181,zk13:3181
> >> systems.kafka.producer.bootstrap.servers=buka1:9096,buka2:9096,buka3:9096
> >>
> >> systems.kafka.streams.samza-metrics.samza.msg.serde=metrics
> >>
> >> metrics.reporter.snapshot.class=org.apache.samza.metrics.reporter.MetricsSnapshotReporterFactory
> >> metrics.reporter.snapshot.stream=kafka.samza-metrics
> >> metrics.reporters=snapshot
> >>
> >> The job is deployed on YARN. After the job started, everything is fine. I
> >> can also find the coordinator stream and checkpoint stream in the same Kafka
> >> cluster, but there is no samza-metrics stream.
> >>
> >> One of the container logs:
> >>
> >> 2016-09-07 16:32:31.947 [main] MetricsSnapshotReporterFactory [WARN] Unable to find implementation version in jar's meta info. Defaulting to 0.0.1.
> >> 2016-09-07 16:32:31.948 [main] MetricsSnapshotReporterFactory [INFO] Got system stream SystemStream [system=kafka, stream=samza-metrics].
> >> 2016-09-07 16:32:31.949 [main] MetricsSnapshotReporterFactory [INFO] Got system factory org.apache.samza.system.kafka.KafkaSystemFactory@1eed1f10.
> >> 2016-09-07 16:32:31.950 [main] MetricsSnapshotReporterFactory [INFO] Got producer org.apache.samza.system.kafka.KafkaSystemProducer@16d96b45.
> >> 2016-09-07 16:32:31.951 [main] MetricsSnapshotReporterFactory [INFO] Got serde org.apache.samza.serializers.MetricsSnapshotSerde@569f129d.
> >> 2016-09-07 16:32:31.952 [main] MetricsSnapshotReporterFactory [INFO] Setting polling interval to 60
> >> 2016-09-07 16:32:31.954 [main] MetricsSnapshotReporter [INFO] got metrics snapshot reporter properties [job name: data-status-persistent-hstore, job id: 1, containerName: samza-container-1, version: 0.0.1, samzaVersion: 0.10.1, host: store116, pollingInterval 60]
> >> 2016-09-07 16:32:31.955 [main] MetricsSnapshotReporter [INFO] Registering MetricsSnapshotReporterFactory with producer.
> >> 2016-09-07 16:32:31.955 [main] SamzaContainer$ [INFO] Got metrics reporters: Set(snapshot)
> >>
> >> 2016-09-07 16:32:32.016 [main] MetricsSnapshotReporter [INFO] Registering TaskName-Partition 7 with producer.
> >> 2016-09-07 16:32:32.016 [main] MetricsSnapshotReporter [INFO] Registering TaskName-Partition 1 with producer.
> >> 2016-09-07 16:32:32.016 [main] MetricsSnapshotReporter [INFO] Registering TaskName-Partition 5 with producer.
> >> 2016-09-07 16:32:32.016 [main] MetricsSnapshotReporter [INFO] Registering TaskName-Partition 3 with producer.
> >> 2016-09-07 16:32:32.017 [main] SamzaContainer [INFO] Starting JVM metrics.
> >> 2016-09-07 16:32:32.017 [main] SamzaContainer [INFO] Starting metrics reporters.
> >> 2016-09-07 16:32:32.018 [main] MetricsSnapshotReporter [INFO] Registering samza-container-1 with producer.
> >> 2016-09-07 16:32:32.018 [main] MetricsSnapshotReporter [INFO] Starting producer.
> >> 2016-09-07 16:32:32.018 [main] MetricsSnapshotReporter [INFO] Starting reporter timer.
> >> 2016-09-07 16:32:32.019 [main] SamzaContainer [INFO] Registering task instances with offsets.
> >> 2016-09-07 16:32:32.022 [main] SamzaContainer [INFO] Starting offset manager.
> >>
> >> 2016-09-07 16:32:32.212 [SAMZA-METRIC-SNAPSHOT-REPORTER] KafkaSystemProducer [INFO] Creating a new producer for system kafka.
> >> 2016-09-07 16:32:32.221 [SAMZA-METRIC-SNAPSHOT-REPORTER] ProducerConfig [INFO] ProducerConfig values:
> >>    value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
> >>

Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-09-08 Thread Navina Ramesh


> On Aug. 31, 2016, 12:54 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/jobs/configuration-table.html, line 357
> > <https://reviews.apache.org/r/50174/diff/3/?file=1488352#file1488352line357>
> >
> > Is it too late to comment on the config key pattern? Traditionally, we 
> > don't have any strict conventions for config names listed anywhere. Since 
> > we use period as a delimiter to distinguish configs belonging to a 
> > namespace, I think we should try to avoid using the same in the actual 
> > config part. 
> > For example, job.container.single.thread.mode can be 
> > job.container.single-thread-mode.
> > Same for job.container.thread.pool.size, task.callback.timeout.ms and 
> > task.max.concurrency
> 
> Xinyu Liu wrote:
> This config is pretty awkward (so is the name). It's just a fallback to the 
> old RunLoop if there is any issue with the new AsyncRunLoop, so I don't 
> expect users to set it normally. Once the AsyncRunLoop has stabilized, I will 
> remove this config as well as the old RunLoop.

As I mentioned above, I think it is important to clarify in the documentation 
that this is a fallback option to provide compatibility with versions < 0.11.0. 
Is it too late to change the config name? :)
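
To illustrate the naming point from the quoted comment, here is a minimal 
sketch in Java. ConfigKeyShape and its methods are hypothetical helpers, not 
part of Samza; the sketch only assumes that a period separates namespace 
segments, as described in the comment.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a Samza class): inspects how a config key
// breaks apart when '.' is treated as the namespace delimiter.
final class ConfigKeyShape {
    private ConfigKeyShape() {}

    // Splits the key on '.', returning every namespace segment.
    static List<String> segments(String key) {
        return Arrays.asList(key.split("\\."));
    }

    // Everything before the last '.' is treated as the namespace.
    static String namespace(String key) {
        int lastDot = key.lastIndexOf('.');
        return lastDot < 0 ? "" : key.substring(0, lastDot);
    }
}
```

With the dotted name, job.container.single.thread.mode splits into five 
segments and the apparent namespace is job.container.single.thread. With the 
hyphenated name, job.container.single-thread-mode splits into three segments 
and the namespace stays job.container.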


- Navina


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review147388
---


On Sept. 7, 2016, 5:16 p.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> -------
> 
> (Updated Sept. 7, 2016, 5:16 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/api/overview.md 
> 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 
> 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> ---
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>



Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-09-08 Thread Navina Ramesh


> On Aug. 30, 2016, 1 a.m., Xinyu Liu wrote:
> > docs/learn/documentation/versioned/jobs/configuration-table.html, line 368
> > <https://reviews.apache.org/r/50174/diff/2/?file=1455861#file1455861line368>
> >
> > Oh, actually job.container.single.thread.mode means using the old 
> > RunLoop, which supports single-threaded execution only. It defaults to false, 
> > so we will use AsyncRunLoop. I plan to get rid of RunLoop in 11.1 once 
> > AsyncRunLoop is fully stabilized.

If you are planning to remove support for some of the existing features or 
config (like runloop and single.thread.mode), it is common practice to mark it 
as deprecated and remove it in the next major release. It will be better to 
state this fact in the documentation now so that users know that this config is 
going away.


- Navina


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review147252
---


On Sept. 7, 2016, 5:16 p.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> ---
> 
> (Updated Sept. 7, 2016, 5:16 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/api/overview.md 
> 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 
> 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> ---
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>



Re: Periodic cleanup of unused local stores

2016-09-06 Thread Navina Ramesh
Hi Santhosh,

Thanks for picking SAMZA-656. This is long overdue and will help make our
host-affinity based solution more robust. I have a couple of thoughts on
your design proposal.

1. It is always very useful to provide more context to the reader, esp. in
explaining what the different terms mean (like host-affinity, tombstone
etc) and how it relates to the problem being described.

2. "The Host Affinity feature in Samza enables it to restore local state
from disk instead of bootstrapping the entire changelog" -> host-affinity
as a feature only tries to bring up the container on the same host as
before. This helps Samza leverage the locally persisted store data. It
doesn't actually help it restore state in any way.

3. "To achieve this, Samza stores local state for change logged stores in a
shared directory so it is not tied to a resource manager’s storage
structure and cleanup schedule." -> I think by shared directory, you are
referring to the yarn application's workspace. This shared workspace is
part of the NM, not the RM. You can rephrase this and additionally, provide
the logical path to the state stores.

4. "Expose an API in samza-rest that" -> Can you elaborate on what the API
looks like?

5. Is the REST API to be invoked by the monitor for all jobs in the cluster
or only for running jobs? What are the criteria there? Please mention them,
if any.

Thanks!
Navina

On Thu, Sep 1, 2016 at 11:06 AM, santhosh venkat <
santhoshvenkat1...@gmail.com> wrote:

> Currently in Samza, to enable reuse of local store between restarts, local
> store is persisted outside of the YARN’s working directory. However, there
> is no mechanism currently available to periodically clean up the unused
> local stores. Here is a proposal detailing a possible way to accomplish
> this:
>
> https://issues.apache.org/jira/secure/attachment/12826531/GCstalelocalstate.pdf
>
> This is tracked in SAMZA-656. Any feedback/comments are welcome.
>
> Thanks.
>



-- 
Navina R.


Re: Review Request 51630: SAMZA-1007: Pass job config from factory to grouper

2016-09-05 Thread Navina Ramesh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51630/#review147794
---


Fix it, then Ship it!




1 comment. Otherwise, looks good. Thanks!


samza-core/src/main/java/org/apache/samza/container/grouper/stream/GroupBySystemStreamPartitionFactory.java
 (line 24)
<https://reviews.apache.org/r/51630/#comment215069>

Can you remove the default constructor in GroupBySystemStreamPartition? 
Just to avoid future incorrect usages. thanks!
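
A minimal sketch of the idea, assuming a plain Map in place of Samza's Config 
type. ConfiguredGrouper and ConfiguredGrouperFactory are hypothetical 
stand-ins, not the classes in the patch:

```java
import java.util.Map;

// Hypothetical stand-in for a grouper. It has no default constructor,
// so it cannot be created without the job config.
final class ConfiguredGrouper {
    private final Map<String, String> config;

    // The only constructor: config is mandatory.
    ConfiguredGrouper(Map<String, String> config) {
        if (config == null) {
            throw new IllegalArgumentException("config must not be null");
        }
        this.config = config;
    }

    // Reads a value from the config, falling back to the given default.
    String get(String key, String defaultValue) {
        return config.getOrDefault(key, defaultValue);
    }
}

// Hypothetical factory that always passes the config through.
final class ConfiguredGrouperFactory {
    ConfiguredGrouper create(Map<String, String> config) {
        return new ConfiguredGrouper(config);
    }
}
```

With the no-argument constructor gone, code that tries to build the grouper 
without config fails at compile time rather than silently ignoring settings.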


- Navina Ramesh


On Sept. 5, 2016, 6:05 p.m., Neil Fordyce wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51630/
> ---
> 
> (Updated Sept. 5, 2016, 6:05 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1007
> https://issues.apache.org/jira/browse/SAMZA-1007
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Supplying config to the grouper to allow broadcast streams to be configured 
> when using GroupBySystemStreamPartitionFactory.
> 
> 
> Diffs
> -
> 
>   
> samza-core/src/main/java/org/apache/samza/container/grouper/stream/GroupBySystemStreamPartitionFactory.java
>  04a744492c0a4576c2812063ccf19c303c4d6ccf 
>   
> samza-core/src/test/java/org/apache/samza/container/grouper/stream/TestGroupBySystemStreamPartition.java
>  1bd14a41bffa9c9b6c7bc3c5e890c1346124149a 
> 
> Diff: https://reviews.apache.org/r/51630/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew clean build
> 
> 
> Thanks,
> 
> Neil Fordyce
> 
>



Re: Review Request 51142: SAMZA-967: HDFS System Consumer

2016-09-01 Thread Navina Ramesh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51142/#review147596
---



@lhaiesp: Your patch looks awesome. Happy to review again once you have 
addressed the comments. It would be great if you could add some unit tests for 
HdfsSystemConsumer. The documentation you will need to include for this 
feature:
* Add newly introduced configs to configuration-table.html
* Add newly introduced metrics to metrics-table.html (Pending SAMZA-702 commit)
* Add a webpage describing the behavior of the HDFS system consumer (or, more 
generically, consuming from bounded input sources) and how to use the HDFS 
consumer

You can choose to keep the documentation as part of this RB or create a 
follow-up JIRA for documentation and assign it to yourself. Ideally, we don't 
want a large gap between code and documentation. 

Thanks for such thorough work!


samza-hdfs/src/main/java/org/apache/samza/system/hdfs/HdfsSystemAdmin.java 
(line 51)
<https://reviews.apache.org/r/51142/#comment214793>

nit: these config keys can be private or package-private



samza-hdfs/src/main/java/org/apache/samza/system/hdfs/HdfsSystemAdmin.java 
(line 54)
<https://reviews.apache.org/r/51142/#comment214797>

nit: typo in variable name "CONFOG"



samza-hdfs/src/main/java/org/apache/samza/system/hdfs/HdfsSystemConsumer.java 
(line 149)
<https://reviews.apache.org/r/51142/#comment214803>

Isn't numTotalEventsCounter the sum of all the counters in the 
numEventsCounter map? Do we want to maintain a separate running sum?
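
A minimal sketch of the alternative, deriving the total from the per-partition 
counters on demand. EventCounters is a hypothetical class using plain 
AtomicLong values, not Samza's metrics types, and whether this fits depends on 
how the metrics reporter reads its values:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical holder for per-partition event counts.
final class EventCounters {
    private final Map<String, AtomicLong> perPartition = new ConcurrentHashMap<>();

    // Increments the counter for one partition, creating it if needed.
    void increment(String partition) {
        perPartition.computeIfAbsent(partition, p -> new AtomicLong()).incrementAndGet();
    }

    // Total computed from the map instead of a separately maintained sum.
    long total() {
        return perPartition.values().stream().mapToLong(AtomicLong::get).sum();
    }
}
```

The trade-off is that total() walks the map on every read, while a separate 
counter is cheaper to read but has to be kept in step with the per-partition 
counters.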



samza-hdfs/src/main/java/org/apache/samza/system/hdfs/PartitionDescriptionUtil.java
 (line 32)
<https://reviews.apache.org/r/51142/#comment214794>

nit: remove unused import



samza-hdfs/src/main/java/org/apache/samza/system/hdfs/PartitionDescriptionUtil.java
 (line 38)
<https://reviews.apache.org/r/51142/#comment214795>

You can add a private default constructor to ensure that the class doesn't 
get instantiated.
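
A minimal sketch of that idiom. DescriptorUtil and isBlank are hypothetical, 
not the actual PartitionDescriptionUtil:

```java
// Hypothetical utility class that only hosts static helpers.
final class DescriptorUtil {
    // Private constructor: prevents instantiation from other classes.
    private DescriptorUtil() {
        throw new AssertionError("no instances");
    }

    // Returns true for null, empty, or whitespace-only strings.
    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}
```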



samza-hdfs/src/main/java/org/apache/samza/system/hdfs/partitioner/DirectoryPartitioner.java
 (line 171)
<https://reviews.apache.org/r/51142/#comment214800>

Question: Is generateOldestOffset simply returning a comma-delimited string 
of "0"s, where the number of "0"s matches the number of files in the 
group?
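
If that reading is right (it is only my interpretation, not confirmed by the 
patch), the behavior would amount to something like the sketch below. 
OffsetSketch and oldestOffset are hypothetical names, not the code under 
review:

```java
import java.util.Collections;

// Hypothetical illustration of the interpretation in the question above.
final class OffsetSketch {
    private OffsetSketch() {}

    // Builds one "0" per file in the group, joined by commas.
    static String oldestOffset(int numFiles) {
        if (numFiles < 0) {
            throw new IllegalArgumentException("numFiles must be >= 0");
        }
        return String.join(",", Collections.nCopies(numFiles, "0"));
    }
}
```

For a group of three files this yields "0,0,0".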



samza-hdfs/src/main/java/org/apache/samza/system/hdfs/partitioner/HdfsFileSystemAdapter.java
 (line 39)
<https://reviews.apache.org/r/51142/#comment214796>

nit: assigned but unused
You can get rid of the constructor here as well.



samza-hdfs/src/test/java/org/apache/samza/system/hdfs/partitioner/TestDirectoryPartitioner.java
 (line 164)
<https://reviews.apache.org/r/51142/#comment214805>

When using groupPattern, is the file name required to represent the file 
length (in the suffix string)? Or did you just add it for testing?


- Navina Ramesh


On Aug. 29, 2016, 5:27 p.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51142/
> ---
> 
> (Updated Aug. 29, 2016, 5:27 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Yi Pan (Data Infrastructure), and 
> Navina Ramesh.
> 
> 
> Bugs: SAMZA-967
> https://issues.apache.org/jira/browse/SAMZA-967
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Add HDFS System Consumer: 
> 
> 1. System admin, partitioner
> 2. System consumer with metrics
> 
> 
> Diffs
> -
> 
>   build.gradle 1d4eb74b1294318db8454631ddd0901596121ab2 
>   gradle/dependency-versions.gradle 47c71bfde027835682889407261d4798b629d214 
>   samza-hdfs/src/main/java/org/apache/samza/system/hdfs/HdfsSystemAdmin.java 
> PRE-CREATION 
>   
> samza-hdfs/src/main/java/org/apache/samza/system/hdfs/HdfsSystemConsumer.java 
> PRE-CREATION 
>   
> samza-hdfs/src/main/java/org/apache/samza/system/hdfs/PartitionDescriptionUtil.java
>  PRE-CREATION 
>   
> samza-hdfs/src/main/java/org/apache/samza/system/hdfs/partitioner/DirectoryPartitioner.java
>  PRE-CREATION 
>   
> samza-hdfs/src/main/java/org/apache/samza/system/hdfs/partitioner/FileSystemAdapter.java
>  PRE-CREATION 
>   
> samza-hdfs/src/main/java/org/apache/samza/system/hdfs/partitioner/HdfsFileSystemAdapter.java
>  PRE-CREATION 
>   
> samza-hdfs/src/main/java/org/apache/samza/system/hdfs/reader/AvroFileHdfsReader.java
>  PRE-CREATION 
>   
> samza-hdfs/src/main/java/org/apache/samza/system/hdfs/reader/HdfsReaderFactory.java
>  PRE-CREATION 
>   
> samza-hdfs/src/main/

Re: Review Request 51516: SAMZA-702: Document the significance of all the different metrics emitted by Samza out of the box

2016-08-31 Thread Navina Ramesh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51516/#review147523
---


Fix it, then Ship it!




Overall, the patch looks pretty good! The only roadblock I see is that with 
SAMZA-680, we introduced "ContainerProcessManagerMetrics" which would replace 
"SamzaAppMasterMetrics". This is kind of problematic because it seems to expose 
2 copies of essentially the same metrics - under different "groups" or "class 
names". I am not sure why we have 2 copies of the same metric? 
@vjagadish : why do we have 2 copies of these metrics? What is the plan going 
forward? 

Otherwise, +1 for this patch. Thanks a lot!


docs/learn/documentation/versioned/container/metrics-table.html (line 538)
<https://reviews.apache.org/r/51516/#comment214709>

With SAMZA-837, I think we removed the usage of this metric, and it only 
seems to be used as a dummy check in a unit test. Can you please mark this as 
deprecated and remove it from the unit test class?



docs/learn/documentation/versioned/container/metrics-table.html (line 795)
<https://reviews.apache.org/r/51516/#comment214711>

did you mean per-system here?


- Navina Ramesh


On Aug. 30, 2016, 7:36 a.m., Branislav Cogic wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51516/
> ---
> 
> (Updated Aug. 30, 2016, 7:36 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-702
> https://issues.apache.org/jira/browse/SAMZA-702
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> All the metrics documented in a metrics-table.
> 
> A few counters and a timer were removed because they are not used:
> "send-calls" counter and "chooser-update-ns" timer from SamzaContainerMetrics
> "batch-resets" counter from BootstrapingChooserMetrics
> 
> 
> Diffs
> -
> 
>   docs/_layouts/default.html 7beb734ddeaecb7a6369f7d2a5d4e0c67655269c 
>   docs/learn/documentation/versioned/container/metrics-table.html 
> PRE-CREATION 
>   docs/learn/documentation/versioned/container/metrics.md 
> b053b792097400536ea385cb3db720f6f71da017 
>   
> samza-core/src/main/scala/org/apache/samza/container/SamzaContainerMetrics.scala
>  1e7515e8e8eb5ff2f769bea3184ce49308bada9a 
>   
> samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala
>  1cd8e0637e2192460a9e9fe078c735444be8eb97 
> 
> Diff: https://reviews.apache.org/r/51516/diff/
> 
> 
> Testing
> ---
> 
> Site ran locally using local-site-test.sh
> 
> 
> Thanks,
> 
> Branislav Cogic
> 
>



Re: Review Request 51516: SAMZA-702: Document the significance of all the different metrics emitted by Samza out of the box

2016-08-31 Thread Navina Ramesh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51516/#review147518
---




docs/learn/documentation/versioned/container/metrics-table.html (line 367)
<https://reviews.apache.org/r/51516/#comment214701>

I think this metric just represents the number of partitions of a 
particular system that were empty and were provided to the consumer to poll for 
new messages. I don't think the correlation between 
$system-ssp-fetches-per-poll and $system-polls holds. Please correct me if I 
have misunderstood this.



docs/learn/documentation/versioned/container/metrics-table.html (line 375)
<https://reviews.apache.org/r/51516/#comment214703>

nit: number of messages that were chosen (by the MessageChooser) for a 
particular system stream partition



docs/learn/documentation/versioned/container/metrics-table.html (line 379)
<https://reviews.apache.org/r/51516/#comment214705>

nit: Average time spent polling all underlying systems for new messages (in 
nanoseconds)


- Navina Ramesh


On Aug. 30, 2016, 7:36 a.m., Branislav Cogic wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51516/
> ---
> 
> (Updated Aug. 30, 2016, 7:36 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-702
> https://issues.apache.org/jira/browse/SAMZA-702
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> All the metrics documented in a metrics-table.
> 
> A few counters and a timer were removed because they are not used:
> "send-calls" counter and "chooser-update-ns" timer from SamzaContainerMetrics
> "batch-resets" counter from BootstrapingChooserMetrics
> 
> 
> Diffs
> -
> 
>   docs/_layouts/default.html 7beb734ddeaecb7a6369f7d2a5d4e0c67655269c 
>   docs/learn/documentation/versioned/container/metrics-table.html 
> PRE-CREATION 
>   docs/learn/documentation/versioned/container/metrics.md 
> b053b792097400536ea385cb3db720f6f71da017 
>   
> samza-core/src/main/scala/org/apache/samza/container/SamzaContainerMetrics.scala
>  1e7515e8e8eb5ff2f769bea3184ce49308bada9a 
>   
> samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala
>  1cd8e0637e2192460a9e9fe078c735444be8eb97 
> 
> Diff: https://reviews.apache.org/r/51516/diff/
> 
> 
> Testing
> ---
> 
> Site ran locally using local-site-test.sh
> 
> 
> Thanks,
> 
> Branislav Cogic
> 
>


