Re: [VOTE] Apache Samza 0.13.0 RC6

2017-06-08 Thread Chris Riccomini
+1 (binding)

On Wed, Jun 7, 2017 at 8:55 AM, Yi Pan  wrote:

> +1 (binding)
> Built and ran all local integration tests on Linux.
>
> On Tue, Jun 6, 2017 at 4:01 PM, Boris S  wrote:
>
> > +1 (non-binding)
> > Built and tested on Linux (with Python 2.7; 2.4 and 3.5 didn't work).
> >
> > On Tue, Jun 6, 2017 at 2:49 PM, Jacob Maes  wrote:
> >
> > > +1 (non-binding)
> > >
> > > Built and tested on both OSX and RHEL with gradle 2.0 and 2.2
> > > respectively.
> > >
> > > Also verified the high level API + YARN host affinity on a test job
> > > with 32 containers.
> > >
> > >
> > >
> > > On Tue, Jun 6, 2017 at 9:14 AM, xinyu liu wrote:
> > >
> > > > +1 (non-binding).
> > > >
> > > > Downloaded the source tar, built it and ran check-all.sh on RHEL6
> > > > with gradle 2.8. All passed.
> > > >
> > > > As a side note to Jagadish's comments, the build doesn't work on a
> > > > higher gradle version either (gradle 3.5). It seems that
> > > > "-language:implicitConversions -language:reflectiveCalls" is not a
> > > > valid build option anymore.
> > > >
> > > > Thanks,
> > > > Xinyu
> > > >
> > > > On Mon, Jun 5, 2017 at 10:06 AM, Jagadish Venkatraman <
> > > > jagadish1...@gmail.com> wrote:
> > > >
> > > > > Checked out, ran tests, and all of them pass.
> > > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > I did get an error when running with gradle 2.4:
> > > > > >>Could not resolve all dependencies for configuration
> > > > > ':samza-kafka_2.11:compile'. > java.lang.UnsupportedOperationException
> > > > > (no error message)
> > > > >
> > > > > However, when I used gradle 2.8, it was resolved.
> > > > >
> > > > > *gradle wrapper --gradle-version 2.8*
> > > > >
> > > > > Best,
> > > > > Jagadish
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Jun 5, 2017 at 8:37 AM, Jake Maes wrote:
> > > > >
> > > > > > This is a call for a vote on a release of Apache Samza 0.13.0.
> > > > > > Thanks to everyone who has contributed to this release. We are
> > > > > > very glad to see some new contributors and features in this
> > > > > > release.
> > > > > >
> > > > > > The release candidate can be downloaded from here:
> > > > > > http://home.apache.org/~jmakes/samza-0.13.0-rc6/
> > > > > >
> > > > > > The release candidate is signed with pgp key 940AFC5A, which can
> > > > > > be found on keyservers:
> > > > > > http://pgp.mit.edu/pks/lookup?op=get&search=0x940AFC5A
> > > > > >
> > > > > > The git tag is release-0.13.0-rc6 and signed with the same pgp key:
> > > > > > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-0.13.0-rc6
> > > > > >
> > > > > > Test binaries have been published to Maven's staging repository,
> > > > > > and are available here:
> > > > > > https://repository.apache.org/content/repositories/orgapachesamza-1026
> > > > > >
> > > > > > 144 issues were resolved for this release:
> > > > > > https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20AND%20fixVersion%20in%20(0.13%2C%200.13.0)%20AND%20status%20in%20(Resolved%2C%20Closed)
> > > > > >
> > > > > > The vote will be open for 72 hours (ending at 9:00AM Thursday,
> > > > > > 06/08/2017).
> > > > > >
> > > > > > Please download the release candidate, check the hashes/signature,
> > > > > > build it and test it, and then please vote:
> > > > > >
> > > > > >
> > > > > > [ ] +1 approve
> > > > > >
> > > > > > [ ] +0 no opinion
> > > > > >
> > > > > > [ ] -1 disapprove (and reason why)
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Jagadish V,
> > > > > Graduate Student,
> > > > > Department of Computer Science,
> > > > > Stanford University
> > > > >
> > > >
> > >
> >
>


[REPORT] Samza - July 2016

2016-07-12 Thread Chris Riccomini
## Description:

 - Apache Samza is a stream processing framework built on top of Apache Hadoop
   YARN and Apache Kafka.

## Issues:

There are no issues requiring board attention at this time.

## Activity:

 - Good coverage of Apache Samza at Hadoop summit. Several talks about it.
 - Meetup at LinkedIn on streaming included presentations on Apache Samza.

## Health report:

 - Project seems healthy, but growth has been relatively flat. Likely due to
   complete saturation of projects in the stream processing space.

## PMC changes:

 - Currently 11 PMC members.
 - No new PMC members added in the last 3 months
 - Last PMC addition was Navina Ramesh on Thu Jan 07 2016

## Committer base changes:

 - Currently 13 committers.
 - No new committers added in the last 3 months
 - Last committer addition was Navina Ramesh at Fri May 22 2015

## Releases:

 - Last release was 0.10.0 on Fri Dec 18 2015

## Mailing list activity:

 - dev@samza.apache.org:
    - 307 subscribers (up 2 in the last 3 months)
    - 537 emails sent to list (561 in previous quarter)

## JIRA activity:

 - 39 JIRA tickets created in the last 3 months
 - 38 JIRA tickets closed/resolved in the last 3 months


Re: [VOTE] Samza 0.10.0 Release Candidate 2

2015-12-09 Thread Chris Riccomini
+1 binding

On Wed, Dec 9, 2015 at 7:16 PM, Boris Shkolnik  wrote:

> I've opened a ticket to track this rocksdb:test failure:
>
>1. SAMZA-836 
>
>
> On Wed, Dec 9, 2015 at 7:11 PM, Boris Shkolnik  wrote:
>
> > +1 (non-binding).
> > One note: it failed for me a couple of times with the test below, which
> > might've been related to some old versions of rocksdbjni, but I am not
> > sure. I've run it a couple more times and it ran fine.
> >
> >
> > On Wed, Dec 9, 2015 at 12:15 PM, xinyu liu wrote:
> >
> >> +1 on my side. I also ran the gradle build and unit tests without
> >> failure.
> >>
> >> Thanks,
> >> Xinyu
> >>
> >> On Wed, Dec 9, 2015 at 9:54 AM, Tao Feng  wrote:
> >>
> >> > +1 from my side (non-binding). I downloaded the package and
> >> > successfully ran all the unit tests without failure.
> >> >
> >> > On Tue, Dec 8, 2015 at 3:38 PM, Yi Pan  wrote:
> >> >
> >> > > Hey all,
> >> > >
> >> > >
> >> > > This is a call for a vote on a release of Apache Samza 0.10.0.
> >> > > Thanks to everyone who has contributed to this release. We are very
> >> > > glad to see some new contributors in this release.
> >> > >
> >> > >
> >> > > The release candidate can be downloaded from here:
> >> > >
> >> > >
> >> > > http://home.apache.org/~nickpan47/samza-0.10.0-rc2/
> >> > >
> >> > >
> >> > > The release candidate is signed with pgp key 911402D8, which is
> >> > >
> >> > > included in the repository's KEYS file:
> >> > >
> >> > >
> >> > >
> >> > >
> >> >
> >>
> https://git-wip-us.apache.org/repos/asf?p=samza.git;a=blob;f=KEYS;h=66cbd15cddbbd798c3529e9a8b7f052aab0037a7
> >> > >
> >> > >
> >> > > and can also be found on keyservers:
> >> > >
> >> > > http://pgp.mit.edu/pks/lookup?op=get&search=0x911402D8
> >> > >
> >> > >
> >> > > The git tag is release-0.10.0-rc2 and signed with the same pgp key:
> >> > >
> >> > >
> >> > >
> >> > >
> >> >
> >>
> https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=4a37a3c4754b94805646522fc6644f2dd998e828
> >> > >
> >> > >
> >> > > Test binaries have been published to Maven's staging repository, and
> >> are
> >> > >
> >> > > available here:
> >> > >
> >> > >
> >> > >
> >> https://repository.apache.org/content/repositories/orgapachesamza-1011/
> >> > >
> >> > >
> >> > > Note that the binaries were built with JDK7 without incident. This
> >> > > is the first version of Samza that no longer supports JDK6.
> >> > >
> >> > >
> >> > > 128 issues were resolved for this release:
> >> > >
> >> > >
> >> > >
> >> > >
> >> >
> >>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20AND%20fixVersion%20%3D%200.10.0%20AND%20status%20in%20(Resolved%2C%20Closed)
> >> > >
> >> > >
> >> > > The vote will be open for 72 hours (ending at 4:00pm Friday,
> >> > > 12/11/2015).
> >> > >
> >> > > Please download the release candidate, check the hashes/signature,
> >> > > build it and test it, and then please vote:
> >> > >
> >> > >
> >> > > [ ] +1 approve
> >> > >
> >> > > [ ] +0 no opinion
> >> > >
> >> > > [ ] -1 disapprove (and reason why)
> >> > >
> >> > >
> >> > > +1 from my side for the release.
> >> > >
> >> > >
> >> > > Yi Pan
> >> > >
> >> > > nickpa...@gmail.com
> >> > >
> >> >
> >>
> >
> >
>


Fwd: [REPORT] Apache Samza

2015-10-09 Thread Chris Riccomini
Hey all,

I've submitted the Samza October board report filing.

Cheers,
Chris

-- Forwarded message --
From: Chris Riccomini 
Date: Fri, Oct 9, 2015 at 7:38 AM
Subject: [REPORT] Apache Samza
To: "bo...@apache.org" 


Attachment AU: Report from the Apache Samza Project  [Chris Riccomini]

Report from the Apache Samza committee [Chris Riccomini]

## Description:
Apache Samza is a distributed stream processing framework.

## Activity:
- Talks about Samza given at Strata, as well as other meetups.
- Meetup scheduled for October 13th in Bay Area.
- Ongoing work on Samza SQL operators.

## Health report:
Overall, health of project remains strong. Activity from both committers and
users in mailing list is steady. Several talks at various meetups have been
given.

## Issues:
 - There are no issues requiring board attention at this time

## PMC changes:

 - Currently 10 PMC members.
 - No new PMC members added in the last 3 months
 - Last PMC addition was Yi Pan at Mon Jun 15 2015

## LDAP changes:

 - Currently 13 committers and 10 committee group members.
 - No new committee group members added in the last 3 months
 - Last committee group addition was Yi Pan at Tue Jun 16 2015
 - No new committers added in the last 3 months
 - Last committer addition was Navina Ramesh at Fri May 22 2015

## Releases:

 - 0.9.1 was released on Fri Jul 10 2015

## Mailing list activity:

 - dev@samza.apache.org:
    - 265 subscribers (up 8 in the last 3 months)
    - 747 emails sent to list (869 in previous quarter)

## JIRA activity:

 - 61 JIRA tickets created in the last 3 months
 - 52 JIRA tickets closed/resolved in the last 3 months


Re: [Discuss/Vote] upgrade to Yarn 2.6.0

2015-08-17 Thread Chris Riccomini
+1 on my end

On Monday, August 17, 2015, Yi Pan  wrote:

> Hi, Yan,
>
> Thanks for rolling the ball!
>
> +1 from me to upgrade the minimum supported version to YARN 2.6, assuming
> that we are going to fix SAMZA-750 together.
>
> On Mon, Aug 17, 2015 at 4:41 PM, Yan Fang wrote:
>
> > Hi guys,
> >
> > We have been discussing upgrading to Yarn 2.6.0 (SAMZA-536), because
> > there are some bug fixes after 2.4.0 and we cannot enable the Yarn RM
> > recovery feature in Yarn 2.4.0 (SAMZA-750,
> > https://issues.apache.org/jira/browse/SAMZA-750).
> >
> > So we just want to check whether any production users are still using
> > Yarn 2.4.0 and do not plan to upgrade to 2.6.0+.
> >
> > If there are no further concerns, I think we can go ahead and upgrade to
> > Yarn 2.6.0 in the Samza 0.10.0 release.
> >
> > Thanks,
> >
> > Fang, Yan
> > yanfang...@gmail.com 
> >
>


KIP-28 kafka processor

2015-08-15 Thread Chris Riccomini
Hey all,

I wanted to call attention to KIP-28:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-28+-+Add+a+processor+client

This is the result of the last conversation that we had about
samza's future direction.

It would be good to have the samza community involved in this.

Cheers,
Chris


Re: Thoughts and observations on Samza

2015-07-12 Thread Chris Riccomini
That was meant to be "thread" not "threat". lol. :)

On Sun, Jul 12, 2015 at 5:54 PM, Chris Riccomini wrote:

> Hey all,
>
> I want to start by saying that I'm absolutely thrilled to be a part of
> this community. The amount of level-headed, thoughtful, educated discussion
> that's gone on over the past ~10 days is overwhelming. Wonderful.
>
> It seems like discussion is waning a bit, and we've reached some
> conclusions. There are several key emails in this threat, which I want to
> call out:
>
> 1. Jakob's summary of the three potential ways forward.
>
> http://mail-archives.apache.org/mod_mbox/samza-dev/201507.mbox/%3CCADiKvVu-hxdBfyQ4qm3LDC55cUQbPdmbe4zGzTOOatYF1Pz43A%40mail.gmail.com%3E
> 2. Julian's call out that we should be focusing on community over code.
>
> http://mail-archives.apache.org/mod_mbox/samza-dev/201507.mbox/%3CCAPSgeESZ_7bVFbwN%2Bzqi5MH%3D4CWu9MZUSanKg0-1woMqt55Fvg%40mail.gmail.com%3E
> 3. Martin's summary about the benefits of merging communities.
>
> http://mail-archives.apache.org/mod_mbox/samza-dev/201507.mbox/%3CBFB866B6-D9D8-4578-93C0-FFAEB1DF00FC%40kleppmann.com%3E
> 4. Jakob's comments about the distinction between community and code paths.
>
> http://mail-archives.apache.org/mod_mbox/samza-dev/201507.mbox/%3CCADiKvVtWPjHLLDsmxvz9KggVA5DfBi-nUvfqB6QdA-du%2B_a9Ng%40mail.gmail.com%3E
>
> I agree with the comments on all of these emails. I think Martin's summary
> of his position aligns very closely with my own. To that end, I think we
> should get concrete about what the proposal is, and call a vote on it.
> Given that Jay, Martin, and I seem to be aligning fairly closely, I think
> we should start with:
>
> 1. [community] Make Samza a subproject of Kafka.
> 2. [community] Make all Samza PMC/committers committers of the subproject.
> 3. [community] Migrate Samza's website/documentation into Kafka's.
> 4. [code] Have the Samza community and the Kafka community start a
> from-scratch reboot together in the new Kafka subproject. We can
> borrow/copy & paste significant chunks of code from Samza's code base.
> 5. [code] The subproject would intentionally eliminate support for both
> other streaming systems and all deployment systems.
> 6. [code] Attempt to provide a bridge from our SystemConsumer to KIP-26
> (copy cat)
> 7. [code] Attempt to provide a bridge from the new subproject's processor
> interface to our legacy StreamTask interface.
> 8. [code/community] Sunset Samza as a TLP when we have a working Kafka
> subproject that has a fault-tolerant container with state management.
>
> It's likely that (6) and (7) won't be fully drop-in. Still, the closer we
> can get, the better it's going to be for our existing community.
>
> One thing that I didn't touch on with (2) is whether any Samza PMC members
> should be rolled into Kafka PMC membership as well (though, Jay and Jakob
> are already PMC members on both). I think that Samza's community deserves a
> voice on the PMC, so I'd propose that we roll at least a few PMC members
> into the Kafka PMC, but I don't have a strong framework for which people to
> pick.
>
> Before (8), I think that Samza's TLP can continue to commit bug fixes and
> patches as it sees fit, provided that we openly communicate that we won't
> necessarily migrate new features to the new subproject, and that the TLP
> will be shut down after the migration to the Kafka subproject occurs.
>
> Jakob, I could use your guidance here about how to achieve this from
> an Apache process perspective (sorry).
>
> * Should I just call a vote on this proposal?
> * Should it happen on dev or private?
> * Do committers have binding votes, or just PMC?
>
> Having trouble finding much detail on the Apache wikis. :(
>
> Cheers,
> Chris
>
> On Fri, Jul 10, 2015 at 2:38 PM, Yan Fang  wrote:
>
>> Thanks, Jay. This argument persuaded me actually. :)
>>
>> Fang, Yan
>> yanfang...@gmail.com
>>
>> On Fri, Jul 10, 2015 at 2:33 PM, Jay Kreps  wrote:
>>
>> > Hey Yan,
>> >
>> > Yeah philosophically I think the argument is that you should capture the
>> > stream in Kafka independent of the transformation. This is obviously a
>> > Kafka-centric view point.
>> >
>> > Advantages of this:
>> > - In practice I think this is what e.g. Storm people often end up doing
>> > anyway. You usually need to throttle any access to a live serving
>> > database.
>> > - Can have multiple subscribers and they get the same thing without
>> > additional load on the source system.
>> > - Applications can 

Re: Thoughts and observations on Samza

2015-07-12 Thread Chris Riccomini
Hey all,

I want to start by saying that I'm absolutely thrilled to be a part of this
community. The amount of level-headed, thoughtful, educated discussion
that's gone on over the past ~10 days is overwhelming. Wonderful.

It seems like discussion is waning a bit, and we've reached some
conclusions. There are several key emails in this threat, which I want to
call out:

1. Jakob's summary of the three potential ways forward.

http://mail-archives.apache.org/mod_mbox/samza-dev/201507.mbox/%3CCADiKvVu-hxdBfyQ4qm3LDC55cUQbPdmbe4zGzTOOatYF1Pz43A%40mail.gmail.com%3E
2. Julian's call out that we should be focusing on community over code.

http://mail-archives.apache.org/mod_mbox/samza-dev/201507.mbox/%3CCAPSgeESZ_7bVFbwN%2Bzqi5MH%3D4CWu9MZUSanKg0-1woMqt55Fvg%40mail.gmail.com%3E
3. Martin's summary about the benefits of merging communities.

http://mail-archives.apache.org/mod_mbox/samza-dev/201507.mbox/%3CBFB866B6-D9D8-4578-93C0-FFAEB1DF00FC%40kleppmann.com%3E
4. Jakob's comments about the distinction between community and code paths.

http://mail-archives.apache.org/mod_mbox/samza-dev/201507.mbox/%3CCADiKvVtWPjHLLDsmxvz9KggVA5DfBi-nUvfqB6QdA-du%2B_a9Ng%40mail.gmail.com%3E

I agree with the comments on all of these emails. I think Martin's summary
of his position aligns very closely with my own. To that end, I think we
should get concrete about what the proposal is, and call a vote on it.
Given that Jay, Martin, and I seem to be aligning fairly closely, I think
we should start with:

1. [community] Make Samza a subproject of Kafka.
2. [community] Make all Samza PMC/committers committers of the subproject.
3. [community] Migrate Samza's website/documentation into Kafka's.
4. [code] Have the Samza community and the Kafka community start a
from-scratch reboot together in the new Kafka subproject. We can
borrow/copy & paste significant chunks of code from Samza's code base.
5. [code] The subproject would intentionally eliminate support for both
other streaming systems and all deployment systems.
6. [code] Attempt to provide a bridge from our SystemConsumer to KIP-26
(copy cat)
7. [code] Attempt to provide a bridge from the new subproject's processor
interface to our legacy StreamTask interface.
8. [code/community] Sunset Samza as a TLP when we have a working Kafka
subproject that has a fault-tolerant container with state management.

It's likely that (6) and (7) won't be fully drop-in. Still, the closer we
can get, the better it's going to be for our existing community.

One thing that I didn't touch on with (2) is whether any Samza PMC members
should be rolled into Kafka PMC membership as well (though, Jay and Jakob
are already PMC members on both). I think that Samza's community deserves a
voice on the PMC, so I'd propose that we roll at least a few PMC members
into the Kafka PMC, but I don't have a strong framework for which people to
pick.

Before (8), I think that Samza's TLP can continue to commit bug fixes and
patches as it sees fit, provided that we openly communicate that we won't
necessarily migrate new features to the new subproject, and that the TLP
will be shut down after the migration to the Kafka subproject occurs.

Jakob, I could use your guidance here about how to achieve this from
an Apache process perspective (sorry).

* Should I just call a vote on this proposal?
* Should it happen on dev or private?
* Do committers have binding votes, or just PMC?

Having trouble finding much detail on the Apache wikis. :(

Cheers,
Chris

On Fri, Jul 10, 2015 at 2:38 PM, Yan Fang  wrote:

> Thanks, Jay. This argument persuaded me actually. :)
>
> Fang, Yan
> yanfang...@gmail.com
>
> On Fri, Jul 10, 2015 at 2:33 PM, Jay Kreps  wrote:
>
> > Hey Yan,
> >
> > Yeah philosophically I think the argument is that you should capture the
> > stream in Kafka independent of the transformation. This is obviously a
> > Kafka-centric view point.
> >
> > Advantages of this:
> > - In practice I think this is what e.g. Storm people often end up doing
> > anyway. You usually need to throttle any access to a live serving
> > database.
> > - Can have multiple subscribers and they get the same thing without
> > additional load on the source system.
> > - Applications can tap into the stream if need be by subscribing.
> > - You can debug your transformation by tailing the Kafka topic with the
> > console consumer
> > - Can tee off the same data stream for batch analysis or Lambda arch
> > style re-processing
> >
> > The disadvantage is that it will use Kafka resources. But the idea is
> > eventually you will have multiple subscribers to any data source (at
> > least for monitoring) so you will end up there soon enough anyway.
> >
> > Down the road the technical benefit is that I think it gives us a good
> > path towards end-to-end exactly once semantics from source to
> > destination. Basically the connectors need to support idempotence when
> > talking to Kafka and we need the transactional write feature in Kafka to
> > make
Re: [VOTE] Apache Samza 0.9.1 RC1

2015-06-30 Thread Chris Riccomini
+1

Verified MD5 and asc signature. Built locally, and all tests pass.
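
For anyone repeating the MD5 step, here is a minimal sketch in Python. It
assumes the published .md5 file holds the hex digest as its first
whitespace-separated token; the function names are hypothetical and not part
of the release tooling, and the asc signature check still needs gpg.

```python
import hashlib
from pathlib import Path


def md5_hex(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(artifact, md5_file):
    """True if the artifact's MD5 equals the digest recorded in `md5_file`.

    Assumption: the first whitespace-separated token of `md5_file` is the
    hex digest. Other layouts need different parsing.
    """
    tokens = Path(md5_file).read_text().split()
    if not tokens:
        return False
    return md5_hex(artifact) == tokens[0].lower()
```

Reading in chunks keeps memory flat for large source tarballs.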

Cheers,
Chris

On Tue, Jun 30, 2015 at 1:26 PM, Milinda Pathirage wrote:

> +1 (non-binding)
>
> Verified signature. Tested locally using ./bin/check-all.sh.
>
> Thanks
> Milinda
>
> On Tue, Jun 30, 2015 at 2:10 AM, Yan Fang  wrote:
>
> > +1
> >
> > Verified MD5, Signature.
> >
> > Tested locally.
> >
> > Thanks,
> >
> > Fang, Yan
> > yanfang...@gmail.com
> >
> > On Sun, Jun 28, 2015 at 12:31 PM, Yi Pan  wrote:
> >
> > > Hey all,
> > >
> > > This is a call for a vote on a release of Apache Samza 0.9.1. This is a
> > > bug-fix release against 0.9.0.
> > >
> > > The release candidate can be downloaded from here:
> > >
> > > http://people.apache.org/~nickpan47/samza-0.9.1-rc1/
> > >
> > > The release candidate is signed with pgp key 911402D8, which is
> > > included in the repository's KEYS file:
> > >
> > >
> > >
> >
> https://git-wip-us.apache.org/repos/asf?p=samza.git;a=blob_plain;f=KEYS;hb=6f5bafb6cd93934781161eb6b1868d11ea347c95
> > >
> > > and can also be found on keyservers:
> > >
> > > http://pgp.mit.edu/pks/lookup?op=get&search=0x911402D8
> > >
> > > The git tag is release-0.9.1-rc1 and signed with the same pgp key:
> > >
> > >
> > >
> >
> https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=e78b9e7f34650538b4bb68b338eb472b98a5709e
> > >
> > > Test binaries have been published to Maven's staging repository, and
> > > are available here:
> > >
> https://repository.apache.org/content/repositories/orgapachesamza-1007/
> > >
> > > Note release 0.9.1 is still supporting JDK6 and the binaries were built
> > > with JDK6 without incident.
> > >
> > > 6 critical bugs were resolved for this release:
> > >
> > >
> > >
> >
> https://issues.apache.org/jira/browse/SAMZA-715?jql=project%20%3D%20SAMZA%20AND%20fixVersion%20%3D%200.9.1%20AND%20status%20in%20%28Resolved%2C%20Closed%29
> > >
> > > The vote will be open for 72 hours (ending at 12:00pm Wed, 07/01/2015).
> > > Please download the release candidate, check the hashes/signature,
> > > build it and test it, and then please vote:
> > >
> > > [ ] +1 approve
> > > [ ] +0 no opinion
> > > [ ] -1 disapprove (and reason why)
> > >
> >
>
>
>
> --
> Milinda Pathirage
>
> PhD Student | Research Assistant
> School of Informatics and Computing | Data to Insight Center
> Indiana University
>
> twitter: milindalakmal
> skype: milinda.pathirage
> blog: http://milinda.pathirage.org
>


Thoughts and observations on Samza

2015-06-30 Thread Chris Riccomini
Hey all,

I have had some discussions with Samza engineers at LinkedIn and Confluent.
We came up with a few observations about Samza's design that I want to call
out, and I'd like to propose some changes.

* Samza is dependent upon a dynamic deployment system.
* Samza is too pluggable.
* Samza's SystemConsumer/SystemProducer and Kafka's consumer APIs are
trying to solve a lot of the same problems.

All three of these issues are related, but I'll address them in order.

Deployment

Samza strongly depends on the use of a dynamic deployment scheduler such as
YARN, Mesos, etc. When we initially built Samza, we bet that there would be
one or two winners in this area, and we could support them, and the rest
would go away. In reality, there are many variations. Furthermore, many
people still prefer to just start their processors like normal Java
processes, and use traditional deployment scripts such as Fabric, Chef,
Ansible, etc. Forcing a deployment system on users makes the Samza start-up
process really painful for first time users.

Dynamic deployment as a requirement was also a bit of a misfire because of
a fundamental misunderstanding of the difference between batch jobs and
stream processing jobs. Early on, we made a conscious effort to favor the
Hadoop (Map/Reduce) way of doing things, since it worked and was well
understood.
One thing that we missed was that batch jobs have a definite beginning, and
end, and stream processing jobs don't (usually). This leads to a much
simpler scheduling problem for stream processors. You basically just need
to find a place to start the processor, and start it. The way we run grids,
at LinkedIn, there's no concept of a cluster being "full". We always add
more machines. The problem with coupling Samza with a scheduler is that
Samza (as a framework) now has to handle deployment. This pulls in a bunch
of things such as configuration distribution (config stream), shell scripts
(bin/run-job.sh, JobRunner), packaging (all the .tgz stuff), etc.

Another reason for requiring dynamic deployment was to support data
locality. If you want to have locality, you need to put your processors
close to the data they're processing. Upon further investigation, though,
this feature is not that beneficial. There is some good discussion about
some problems with it on SAMZA-335. Again, we took the Map/Reduce path, but
there are some fundamental differences between HDFS and Kafka. HDFS has
blocks, while Kafka has partitions. This leads to less optimization
potential with stream processors on top of Kafka.

This feature is also used as a crutch. Samza doesn't have any built in
fault-tolerance logic. Instead, it depends on the dynamic deployment
scheduling system to handle restarts when a processor dies. This has made
it very difficult to write a standalone Samza container (SAMZA-516).

Pluggability

In some cases pluggability is good, but I think that we've gone too far
with it. Currently, Samza has:

* Pluggable config.
* Pluggable metrics.
* Pluggable deployment systems.
* Pluggable streaming systems (SystemConsumer, SystemProducer, etc).
* Pluggable serdes.
* Pluggable storage engines.
* Pluggable strategies for just about every component (MessageChooser,
SystemStreamPartitionGrouper, ConfigRewriter, etc).

There's probably more that I've forgotten, as well. Some of these are
useful, but some have proven not to be. This all comes at a cost:
complexity. This complexity is making it harder for our users to pick up
and use Samza out of the box. It also makes it difficult for Samza
developers to reason about the characteristics of the container (since
the characteristics change depending on which plugins are used).

The issues with pluggability are most visible in the System APIs. What
Samza really requires to be functional is Kafka as its transport layer. But
we've conflated two unrelated use cases into one API:

1. Get data into/out of Kafka.
2. Process the data in Kafka.

The current System API supports both of these use cases. The problem is, we
actually want different features for each use case. By papering over these
two use cases, and providing a single API, we've introduced a ton of leaky
abstractions.

For example, what we'd really like in (2) is to have monotonically
increasing longs for offsets (like Kafka). This would be at odds with (1),
though, since different systems have different SCNs/Offsets/UUIDs/vectors.
There was discussion both on the mailing list and the SQL JIRAs about the
need for this.
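
As a rough illustration of why ordered offsets help in (2), here is a small
Python sketch. It assumes offsets are integers that only ever increase within
a partition (as in Kafka), so a checkpoint comparison is enough to skip
already-processed messages after a restart. The function name is hypothetical
and is not part of Samza.

```python
def messages_after_checkpoint(messages, checkpoint):
    """Yield (offset, payload) pairs whose offset is past the checkpoint.

    Assumes `messages` is ordered by a monotonically increasing integer
    offset, so `offset > checkpoint` identifies unprocessed messages.
    """
    for offset, payload in messages:
        if offset > checkpoint:
            yield offset, payload
```

With opaque offsets such as UUIDs there is no natural ordering to compare
against, so a check like this is not available for every system in (1).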

The same thing holds true for replayability. Kafka allows us to rewind when
we have a failure. Many other systems don't. In some cases, systems return
null for their offsets (e.g. WikipediaSystemConsumer) because they have no
offsets.

Partitioning is another example. Kafka supports partitioning, but many
systems don't. We model this by having a single partition for those
systems. Still, other systems model partitioning dif

Re: Measuring Samza Job Throughput

2015-06-16 Thread Chris Riccomini
Hmm, correction. I think this has to be done at the KafkaSystem level. We
allow consumers and producers to return non-byte messages, which means
nothing in the container can safely assume that a message is a byte array
except the serde manager. I took a look there but didn't see any byte
throughput metrics after all.
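
To make the idea concrete, here is a rough sketch in Python of counting bytes
at the one point where a message is known to be a byte array. ByteThroughput
and counting_serializer are hypothetical names, not Samza or Kafka classes,
and the rate is a plain average since the meter was created.

```python
import time


class ByteThroughput:
    """Counts serialized bytes and reports an average bytes/sec rate."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = clock()
        self._bytes = 0

    def record(self, payload):
        # `payload` must already be bytes; len() then gives the byte count.
        self._bytes += len(payload)

    def bytes_per_second(self):
        elapsed = self._clock() - self._start
        return self._bytes / elapsed if elapsed > 0 else 0.0


def counting_serializer(serialize, meter):
    """Wrap `serialize` so every encoded message is counted by `meter`."""
    def wrapped(message):
        payload = serialize(message)
        meter.record(payload)
        return payload
    return wrapped
```

The clock is injectable only so the rate can be checked deterministically; in
a real job the default monotonic clock would be used.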

On Tuesday, June 16, 2015, Chris Riccomini  wrote:

> Hey Milinda,
>
> Specifically, for bytes/sec, you might want to look at serde metrics. I
> believe the serde manager tracks bytes serialized and deserialized per
> second. The consumers and producers also do this for Kafka, but on a more
> granular basis. If you want container-level throughput, serde manager is
> worth looking at.
>
> Cheers,
> Chris
>
> On Tuesday, June 16, 2015, Milinda Pathirage wrote:
>
>> Hi Devs,
>>
>> I was looking for a way to measure Samza job throughput and found that it's
>> possible to do it via Samza's metrics reporter. But there are several types
>> of metrics reported via this method. For example, TaskInstanceMetrics
>> reports number of messages sent. But if I wanted to get a measurement like
>> bytes per second produced, is there a way to do that? It looks
>> like KafkaSystemProducerMetrics and TaskInstanceMetrics only provide the
>> number of messages sent.
>>
>> If any of you have any experience in measuring Samza job throughput, can
>> you please share. Really appreciate any ideas on measuring job throughput.
>>
>> Thanks
>> Milinda
>> --
>> Milinda Pathirage
>>
>> PhD Student | Research Assistant
>> School of Informatics and Computing | Data to Insight Center
>> Indiana University
>>
>> twitter: milindalakmal
>> skype: milinda.pathirage
>> blog: http://milinda.pathirage.org
>>
>


Re: Measuring Samza Job Throughput

2015-06-16 Thread Chris Riccomini
Hey Milinda,

Specifically, for bytes/sec, you might want to look at serde metrics. I
believe the serde manager tracks bytes serialized and deserialized per
second. The consumers and producers also do this for Kafka, but on a more
granular basis. If you want container-level throughput, serde manager is
worth looking at.

Cheers,
Chris

On Tuesday, June 16, 2015, Milinda Pathirage  wrote:

> Hi Devs,
>
> I was looking for a way to measure Samza job throughput and found that it's
> possible to do it via Samza's metrics reporter. But there are several types
> of metrics reported via this method. For example, TaskInstanceMetrics reports
> number of messages sent. But if I wanted to get a measurement like bytes
> per second produced, is there a way to do that? It looks
> like KafkaSystemProducerMetrics and TaskInstanceMetrics only provide the
> number of messages sent.
>
> If any of you have any experience in measuring Samza job throughput, can
> you please share. Really appreciate any ideas on measuring job throughput.
>
> Thanks
> Milinda
> --
> Milinda Pathirage
>
> PhD Student | Research Assistant
> School of Informatics and Computing | Data to Insight Center
> Indiana University
>
> twitter: milindalakmal
> skype: milinda.pathirage
> blog: http://milinda.pathirage.org
>


Powered by page update

2015-06-16 Thread Chris Riccomini
Hey all,

I'm seeing a lot of new faces on the mailing list, which is really awesome.
I want to invite you all to add yourselves to our Powered by page:

https://cwiki.apache.org/confluence/display/SAMZA/Powered+By

The Apache wiki is pretty locked down due to spam. If you'd like to send me
a link and short write-up, I'll be happy to add your entry to the page,
though.

Cheers,
Chris


Re: improving hello-samza / testing

2015-06-16 Thread Chris Riccomini
Hey Tim,

This is a really good discussion to have. The testing that I've seen with
Samza falls into three categories:

1. Instantiate your StreamTask, and mock all params in the process()/init()
methods.
2. A mini-integration test that starts ZooKeeper and Kafka, feeds
messages into a topic, and validates that it gets messages back out from the
output topic.
3. A full blown integration test that uses Zopkio.

For an example of (2) in practice, have a look at TestStatefulTask:


https://git-wip-us.apache.org/repos/asf?p=samza.git;a=blob;f=samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala;h=ea702a919348305ff95ce0b4ca1996a13aff04ec;hb=HEAD

As you can see, writing this kind of integration test can be a bit painful.

(3) is documented here:

  http://samza.apache.org/contribute/tests.html

Another way to test would be to start a full-blown container using
ThreadJobFactory/ProcessJobFactory, but use a MockSystemFactory to mock out
the system consumer/system producer.

Has anyone else tested Samza in other ways?

Cheers,
Chris

On Tue, Jun 16, 2015 at 11:00 AM, Tim Williams  wrote:

> I'm learning samza by the hello-samza project and notice the lack of
> tests.  Where's a good place to learn how folks are properly testing
> things written with samza?
>
> Thanks,
> --tim
>


Re: [DISCUSS] Samza 0.9.1 release

2015-06-16 Thread Chris Riccomini
+1 Here.

On Tue, Jun 16, 2015 at 12:01 PM, Guozhang Wang  wrote:

> Cool. I will start a voting process soon.
>
> On Tue, Jun 16, 2015 at 11:55 AM, Chinmay Soman  >
> wrote:
>
> > +1
> >
> > On Tue, Jun 16, 2015 at 11:17 AM, Navina Ramesh <
> > nram...@linkedin.com.invalid> wrote:
> >
> > > +1 for the release!
> > >
> > > On 6/16/15, 11:03 AM, "Yi Pan"  wrote:
> > >
> > > >+1 Agreed.
> > > >
> > > >Thanks!
> > > >
> > > >On Tue, Jun 16, 2015 at 10:15 AM, Yan Fang 
> > wrote:
> > > >
> > > >> Agreed on this.
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Fang, Yan
> > > >> yanfang...@gmail.com
> > > >>
> > > >> On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang  >
> > > >> wrote:
> > > >>
> > > >> > Hi all,
> > > >> >
> > > >> > We have been running a couple of our jobs against `0.9.1` branch
> > last
> > > >> week
> > > >> > at LinkedIn with some critical bug fixes back-ported, including:
> > > >> >
> > > >> > SAMZA-608
> > > >> > Deserialization error causes SystemConsumers to hang
> > > >> >
> > > >> > SAMZA-616
> > > >> > Shutdown hook does not wait for container to finish
> > > >> >
> > > >> > SAMZA-658
> > > >> > Iterator.remove breaks caching layer
> > > >> >
> > > >> > SAMZA-662 / 686
> > > >> > Samza auto-creates changelog stream without sufficient partitions
> > when
> > > >> > container number > 1
> > > >> >
> > > >> > I am proposing a release vote on the current 0.9.1 branch for
> these
> > > >>bug
> > > >> > fixes. Thoughts?
> > > >> >
> > > >> > -- Guozhang
> > > >> >
> > > >>
> > >
> > >
> >
> >
> > --
> > Thanks and regards
> >
> > Chinmay Soman
> >
>
>
>
> --
> -- Guozhang
>


Re: How to configure the Resource Manager endpoint for YARN?

2015-04-15 Thread Chris Riccomini
Hey Roger,

Hmm, that's good to know, lol. Wonder how ours is working. :) I'll poke
around.

Cheers,
Chris

On Wed, Apr 15, 2015 at 11:17 AM, Roger Hoover 
wrote:

> Turns out that HADOOP_CONF_DIR is the right env var (YARN_CONF_DIR did not
> work).  I had just messed up the directory path.  Doh!
>
> Sent from my iPhone
>
> > On Apr 15, 2015, at 9:41 AM, Roger Hoover 
> wrote:
> >
> > I'll try that.  Thanks, Chris.
> >
> >> On Wed, Apr 15, 2015 at 9:37 AM, Chris Riccomini 
> wrote:
> >> Hey Roger,
> >>
> >> Not sure if this makes a difference, but have you tried using:
> >>
> >>   export YARN_CONF_DIR=...
> >>
> >> Instead? This is what we use.
> >>
> >> Cheers,
> >> Chris
> >>
> >> On Wed, Apr 15, 2015 at 9:33 AM, Roger Hoover 
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > I'm trying to deploy a job to a small YARN cluster.  How do I tell the
> >> > launcher script where to find the Resource Manager?  I tried creating
> a
> >> > yarn-site.xml and setting HADOOP_CONF_DIR environment variable but it
> >> > doesn't find my config.
> >> >
> >> > 2015-04-14 22:02:45 ClientHelper [INFO] trying to connect to RM
> >> > 0.0.0.0:8032
> >> > 2015-04-14 22:02:45 RMProxy [INFO] Connecting to ResourceManager at /
> >> > 0.0.0.0:8032
> >> >
> >> > Thanks,
> >> >
> >> > Roger
> >> >
> >
>


Re: Maximum number of jobs

2015-04-15 Thread Chris Riccomini
Hey Jeremy,

Samza will be fine, but at this scale you need to start worrying about
Kafka and YARN. 1 million jobs will likely start to put pressure on YARN's
RM due to memory usage and CPU usage for the scheduler. With 1 million
jobs, assuming 1 container each, you'll have over 1 million connections to
Kafka, which means you'll need enough brokers to handle those connections.

Can you describe your use case in more detail? Running 1 million jobs seems
like it might be a mis-use of this technology.

Cheers,
Chris

On Wed, Apr 15, 2015 at 10:24 AM, jeremy p 
wrote:

> What's the maximum number of Samza jobs I can run simultaneously on a
> single cluster?  Let's say these jobs are very lightweight -- they require
> little memory or processing power.  However, I need a lot of them -- let's
> say I need to have 1,000,000 running at any given time.  Is this reasonable
> or even possible?
>


Re: How to configure the Resource Manager endpoint for YARN?

2015-04-15 Thread Chris Riccomini
Hey Roger,

Not sure if this makes a difference, but have you tried using:

  export YARN_CONF_DIR=...

Instead? This is what we use.

Cheers,
Chris

On Wed, Apr 15, 2015 at 9:33 AM, Roger Hoover 
wrote:

> Hi,
>
> I'm trying to deploy a job to a small YARN cluster.  How do I tell the
> launcher script where to find the Resource Manager?  I tried creating a
> yarn-site.xml and setting HADOOP_CONF_DIR environment variable but it
> doesn't find my config.
>
> 2015-04-14 22:02:45 ClientHelper [INFO] trying to connect to RM
> 0.0.0.0:8032
> 2015-04-14 22:02:45 RMProxy [INFO] Connecting to ResourceManager at /
> 0.0.0.0:8032
>
> Thanks,
>
> Roger
>


Re: Updating samza-sql branch to Java 1.7

2015-04-14 Thread Chris Riccomini
@Yi, are you going to merge master into the samza-sql branch, or should I?

On Tue, Apr 14, 2015 at 2:02 PM, Yi Pan  wrote:

> Hi, Milinda,
>
> Jacob already committed a change to remove Java 1.6 support in SAMZA-646. I
> think that it would be fine to move samza-sql branch to Java 1.7.
>
> Regards.
>
> -Yi
>
> On Tue, Apr 14, 2015 at 12:47 PM, Milinda Pathirage  >
> wrote:
>
> > Hi Devs,
> >
> > Calcite dropped support for Java 1.6 in 1.1.0-incubating. I want to use
> > Calcite 1.2.0-incubating-SNAPSHOT in samza-sql branch. Is it okay to
> update
> > samza-sql branch to Java 1.7?
> >
> > Thanks
> > Milinda
> >
> > --
> > Milinda Pathirage
> >
> > PhD Student | Research Assistant
> > School of Informatics and Computing | Data to Insight Center
> > Indiana University
> >
> > twitter: milindalakmal
> > skype: milinda.pathirage
> > blog: http://milinda.pathirage.org
> >
>


Re: Review Request 33146: Adding a new KV store contract: BatchingKeyValueStore

2015-04-14 Thread Chris Riccomini

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33146/#review80110
---



samza-test/src/test/scala/org/apache/samza/storage/kv/TestKeyValueStores.scala
<https://reviews.apache.org/r/33146/#comment129874>

Prefer not to introduce a dependency on google common.


- Chris Riccomini


On April 13, 2015, 9:24 p.m., Mohamed Mahmoud (El-Geish) wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33146/
> ---
> 
> (Updated April 13, 2015, 9:24 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-647
> https://issues.apache.org/jira/browse/SAMZA-647
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Adding a new KV store contract, BatchingKeyValueStore, which adds the 
> following methods:
> * Map getAll(List), and
> * void deleteAll(List)
> 
> Since Samza does not require Java 8, the above cannot be implemented as 
> default interface method in KeyValueStore, and a new contract (that extends 
> KeyValueStore) is necessary to maintain backward compatibility.
> Existing KV stores extend the new contract now to be consistent and to enable 
> API callers to use KeyValueStore or BatchingKeyValueStore interchangeably.
> RocksDbKeyValueStore overrides the getAll behavior to call multiGet(List); 
> Preliminary tests showed that multiget is at least 1.25x faster per key than 
> get (see 
> https://reviews.facebook.net/rROCKSDB4985a9f73b9fb8a0323fbbb06222ae1f758a6b1d).
> Java source compatibility: 1.6
> 
> 
> Diffs
> -
> 
>   
> samza-kv-inmemory/src/main/scala/org/apache/samza/storage/kv/inmemory/InMemoryKeyValueStore.scala
>  217333c84c696c0cc1bc3eeabf1c4066a6e89795 
>   
> samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
>  66c2a0dc2e38e21f951727a30f0987776ac52fe2 
>   
> samza-kv/src/main/java/org/apache/samza/storage/kv/BatchingKeyValueStore.java 
> PRE-CREATION 
>   samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala 
> 61bb3f6acb080b653f8b11176538549738255acc 
>   
> samza-kv/src/main/scala/org/apache/samza/storage/kv/KeyValueStoreMetrics.scala
>  79092b91c9498e55f1c4e28661b7280c6c19cef7 
>   
> samza-kv/src/main/scala/org/apache/samza/storage/kv/NullSafeKeyValueStore.scala
>  4f48cf490d6c1012591a602c0d29dcc71473090f 
>   
> samza-kv/src/main/scala/org/apache/samza/storage/kv/SerializedKeyValueStore.scala
>  531e8bef2069a77fa9ceab36fa738bbaa162fe8c 
>   
> samza-test/src/test/scala/org/apache/samza/storage/kv/TestKeyValueStores.scala
>  50dfc10bb053d74dba70fdbce0ef87609ba447ea 
> 
> Diff: https://reviews.apache.org/r/33146/diff/
> 
> 
> Testing
> ---
> 
> Unit-tested.
> 
> 
> Thanks,
> 
> Mohamed Mahmoud (El-Geish)
> 
>



Re: Review Request 33146: Adding a new KV store contract: BatchingKeyValueStore

2015-04-14 Thread Chris Riccomini

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33146/#review80036
---


I'm concerned that there might be an issue with this approach. In 
BaseKeyValueStorageEngineFactory, we compose stores by nesting them. If this is 
the case, I think that the top-most store will implement the batching key value 
store, and will use the default implementation of getAll/deleteAll. As such, 
I'm not sure that the RocksDB implementation ever actually gets called.
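
A minimal sketch of the concern, using hypothetical stand-in classes rather than Samza's real store classes. It assumes the "default" getAll is a loop over get(), and that an outer layer forwards get() to the store it wraps without overriding getAll. The last class shows one possible remedy (every layer forwarding getAll as well); that remedy is an illustration, not something stated in the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; not Samza's real classes.
abstract class SketchStore {
  abstract String get(String key);

  // Default batch read: falls back to one get() per key.
  List<String> getAll(List<String> keys) {
    List<String> values = new ArrayList<String>();
    for (String key : keys) {
      values.add(get(key));
    }
    return values;
  }
}

// Innermost store with an optimized batch read (stands in for the RocksDB store).
class BatchCapableStore extends SketchStore {
  int singleReads = 0;
  int batchReads = 0;

  @Override
  String get(String key) {
    singleReads++;
    return "value-of-" + key;
  }

  @Override
  List<String> getAll(List<String> keys) {
    batchReads++;
    List<String> values = new ArrayList<String>();
    for (String key : keys) {
      values.add("value-of-" + key);
    }
    return values;
  }
}

// Outer layer that forwards get() but inherits the default getAll().
class ForwardGetOnlyStore extends SketchStore {
  private final SketchStore inner;

  ForwardGetOnlyStore(SketchStore inner) {
    this.inner = inner;
  }

  @Override
  String get(String key) {
    return inner.get(key);
  }
}

// Outer layer that also forwards getAll(), so the optimized path is reached.
class ForwardAllStore extends SketchStore {
  private final SketchStore inner;

  ForwardAllStore(SketchStore inner) {
    this.inner = inner;
  }

  @Override
  String get(String key) {
    return inner.get(key);
  }

  @Override
  List<String> getAll(List<String> keys) {
    return inner.getAll(keys);
  }
}
```

Calling getAll on a ForwardGetOnlyStore wrapped around a BatchCapableStore increments singleReads once per key and leaves batchReads at zero, which is the behavior described above. With ForwardAllStore the single batch call reaches the inner store.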


samza-kv/src/main/java/org/apache/samza/storage/kv/BatchingKeyValueStore.java
<https://reviews.apache.org/r/33146/#comment129786>

Nit: indentation should be 2 spaces.



samza-kv/src/main/scala/org/apache/samza/storage/kv/KeyValueStoreMetrics.scala
<https://reviews.apache.org/r/33146/#comment129787>

Do we need a deleteAlls metric as well?


- Chris Riccomini


On April 13, 2015, 9:24 p.m., Mohamed Mahmoud (El-Geish) wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33146/
> ---
> 
> (Updated April 13, 2015, 9:24 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-647
> https://issues.apache.org/jira/browse/SAMZA-647
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Adding a new KV store contract, BatchingKeyValueStore, which adds the 
> following methods:
> * Map getAll(List), and
> * void deleteAll(List)
> 
> Since Samza does not require Java 8, the above cannot be implemented as 
> default interface method in KeyValueStore, and a new contract (that extends 
> KeyValueStore) is necessary to maintain backward compatibility.
> Existing KV stores extend the new contract now to be consistent and to enable 
> API callers to use KeyValueStore or BatchingKeyValueStore interchangeably.
> RocksDbKeyValueStore overrides the getAll behavior to call multiGet(List); 
> Preliminary tests showed that multiget is at least 1.25x faster per key than 
> get (see 
> https://reviews.facebook.net/rROCKSDB4985a9f73b9fb8a0323fbbb06222ae1f758a6b1d).
> Java source compatibility: 1.6
> 
> 
> Diffs
> -
> 
>   
> samza-kv-inmemory/src/main/scala/org/apache/samza/storage/kv/inmemory/InMemoryKeyValueStore.scala
>  217333c84c696c0cc1bc3eeabf1c4066a6e89795 
>   
> samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
>  66c2a0dc2e38e21f951727a30f0987776ac52fe2 
>   
> samza-kv/src/main/java/org/apache/samza/storage/kv/BatchingKeyValueStore.java 
> PRE-CREATION 
>   samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala 
> 61bb3f6acb080b653f8b11176538549738255acc 
>   
> samza-kv/src/main/scala/org/apache/samza/storage/kv/KeyValueStoreMetrics.scala
>  79092b91c9498e55f1c4e28661b7280c6c19cef7 
>   
> samza-kv/src/main/scala/org/apache/samza/storage/kv/NullSafeKeyValueStore.scala
>  4f48cf490d6c1012591a602c0d29dcc71473090f 
>   
> samza-kv/src/main/scala/org/apache/samza/storage/kv/SerializedKeyValueStore.scala
>  531e8bef2069a77fa9ceab36fa738bbaa162fe8c 
>   
> samza-test/src/test/scala/org/apache/samza/storage/kv/TestKeyValueStores.scala
>  50dfc10bb053d74dba70fdbce0ef87609ba447ea 
> 
> Diff: https://reviews.apache.org/r/33146/diff/
> 
> 
> Testing
> ---
> 
> Unit-tested.
> 
> 
> Thanks,
> 
> Mohamed Mahmoud (El-Geish)
> 
>



Re: Dealing with partitioning mismatches between bootstrap and input streams

2015-04-07 Thread Chris Riccomini
Hey Tommy,

Your summary sounds pretty accurate. One other way, which requires no
change to Samza, would be to repartition the input topic properly for each
task. This is kind of hacky, though.

(2) is the ideal solution. It is a bit of work, but it might not be so bad.
I think most of the changes would be isolated to the TaskStorageManager.
We'd also need to make the KV store read-only, which is pretty easy to do.
If you're not comfortable with it, though, then (1) would be your next-best
bet.

Cheers,
Chris

On Tue, Apr 7, 2015 at 10:16 AM, Tommy Becker  wrote:

> We have a Kafka topic containing data needed by several Samza jobs. These
> jobs will essentially read the data and build up state that will be used
> for processing their inputs. Ideally, we would use the topic as a bootstrap
> stream to build up this state. The problem with that is the topic
> containing the data has a single partition but the topics these jobs are
> processing as input have multiple partitions. So my understanding is that
> only one task instance in the job would actually process the bootstrap
> stream, and therefore any state it built up would be local to that task. So
> I'm thinking my options are the following:
>
> 1) Implement SAMZA-353 and allow the bootstrap SSP to be assigned to each
> task instance
> 2) Implement the shared state store component of SAMZA-402
> 3) Layer the shared state on top of Samza in our tasks themselves, maybe
> by using something like RocksDB directly.
>
> Number 1 seems easiest to implement at the cost of having the entire state
> duplicated for each task.  I'd prefer not to do number 3 given the
> existence of this feature on Samza's roadmap, but I am a bit concerned
> about the scope of work with number 2, and the fact that this is mostly
> Scala code.
>
> Are there any alternatives that I'm missing?  Note that we need to process
> the data stream as a bootstrap stream.  Using it as a changelog is
> insufficient because we need to be able to manipulate the data before
> building up the state store.
>
> --
> Tommy Becker
> Senior Software Engineer
>
> Digitalsmiths
> A TiVo Company
>
> www.digitalsmiths.com
> tobec...@tivo.com
>
> 
>
> This email and any attachments may contain confidential and privileged
> material for the sole use of the intended recipient. Any review, copying,
> or distribution of this email (or any attachments) by others is prohibited.
> If you are not the intended recipient, please contact the sender
> immediately and permanently delete this email and any attachments. No
> employee or agent of TiVo Inc. is authorized to conclude any binding
> agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo
> Inc. may only be made by a signed written agreement.
>


Re: Producer performance in 0.9.0

2015-04-07 Thread Chris Riccomini
Hey Gian,

Hmm, this is strange. We ran some tests, and found the new producer to
be faster than the old producer default (sync), and almost as fast as the
old producer's async producer.

Could you paste all of your configs?

Cheers,
Chris

On Tue, Apr 7, 2015 at 10:40 AM, Gian Merlino  wrote:

> Has anyone else seen issues with producer performance in 0.9.0?
>
> I updated a few of our jobs recently and ended up rolling one back to 0.8
> since it was being really sluggish. I profiled it for a bit and a lot of
> time was being spent in BufferPool.allocate and the busy-loop in
> KafkaSystemProducer's flush. The flush-ms metric said that flushes were
> taking on the order of tens of seconds.
>
> The slow topic was a state changelog for a stream-stream join buffer. I
> tried setting producer.linger.ms to 1, 5, and 1000, but that didn't change
> the behavior much.
>
> Gian
>


Re: Review Request 32147: SAMZA-465

2015-04-02 Thread Chris Riccomini

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32147/#review78743
---



samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamMessage.java
<https://reviews.apache.org/r/32147/#comment127739>

Should this be container ID?



samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamMessage.java
<https://reviews.apache.org/r/32147/#comment127740>

Should this be container ID?



samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamMessage.java
<https://reviews.apache.org/r/32147/#comment127741>

typo



samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java
<https://reviews.apache.org/r/32147/#comment127738>

nit: () {

single line.



samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java
<https://reviews.apache.org/r/32147/#comment127742>

Might want to do type.equalsIgnoreCase(coordinatorStreamMessage.getType()) 
in the off chance that a message with a null type (malformed) gets into the 
stream.

Or maybe check for nulls and throw a more meaningful error than an NPE.
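
A minimal sketch of both options. The class, method names, and error message are hypothetical and not taken from the patch. It relies only on the documented behavior that String.equalsIgnoreCase returns false when passed null, so the call is safe as long as the receiver is the known non-null value.

```java
// Hypothetical helper for illustration; names are not from the patch under review.
final class MessageTypeCheck {
  private MessageTypeCheck() {}

  // Option 1: call equalsIgnoreCase on the known non-null expected type.
  // A null actualType simply yields false instead of a NullPointerException.
  static boolean matchesType(String expectedType, String actualType) {
    return expectedType.equalsIgnoreCase(actualType);
  }

  // Option 2: fail fast with a descriptive error when the type is missing.
  static String requireType(String actualType) {
    if (actualType == null) {
      throw new IllegalArgumentException(
          "Message has a null type; it is probably malformed");
    }
    return actualType;
  }
}
```

Option 1 silently skips malformed messages, while option 2 surfaces them; which is preferable depends on whether a bad message should stop the consumer.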



samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala
<https://reviews.apache.org/r/32147/#comment127744>

nit: lowercase j



samza-core/src/main/scala/org/apache/samza/coordinator/server/HttpServer.scala
<https://reviews.apache.org/r/32147/#comment127745>

Delete



samza-log4j/src/main/java/org/apache/samza/logging/log4j/StreamAppender.java
<https://reviews.apache.org/r/32147/#comment127747>

Does it make sense to just have JobCoordinator.apply default to new 
MetricsRegistryMap? It seems like we had to make a bunch of changes rather than 
just set the default in the constructor, and leave the code alone everywhere.


- Chris Riccomini


On April 1, 2015, 9:02 p.m., Naveen Somasundaram wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32147/
> ---
> 
> (Updated April 1, 2015, 9:02 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> SAMZA 465
> 
> 
> Diffs
> -
> 
>   build.gradle 97de3a28f6379e3862eec845da87587b1d4f742e 
>   samza-api/src/main/java/org/apache/samza/checkpoint/Checkpoint.java  
>   samza-api/src/main/java/org/apache/samza/checkpoint/CheckpointManager.java 
> 092cb910b40d312217e86420bf1ddfbaf605e9e5 
>   
> samza-api/src/main/java/org/apache/samza/checkpoint/CheckpointManagerFactory.java
>  a97ff0919d8205928efee1a2a20780659180849d 
>   samza-api/src/main/java/org/apache/samza/container/TaskName.java 
> 083358686fc69ab45bbc73e898f419224ebc3a9f 
>   samza-api/src/main/java/org/apache/samza/system/SystemAdmin.java 
> 8995ba30c823bddcdfd3af7100e1440e71ef7998 
>   samza-api/src/main/java/org/apache/samza/task/TaskCoordinator.java 
> 6ff1a555f3c48e416bb78e94c5df71eff0a27f3d 
>   
> samza-api/src/main/java/org/apache/samza/util/SinglePartitionWithoutOffsetsSystemAdmin.java
>  01997ae22641b735cd452a0e89a49219e2874892 
>   samza-core/src/main/java/org/apache/samza/checkpoint/CheckpointManager.java 
> PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamMessage.java
>  PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java
>  PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemProducer.java
>  PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/job/model/TaskModel.java 
> eb22d2ec5f09ca59790e2871d9bff9745fe925dc 
>   
> samza-core/src/main/java/org/apache/samza/serializers/model/JsonTaskModelMixIn.java
>  7dc431c74a3fc2ba80eb47d6c5d87524cb4c9bde 
>   
> samza-core/src/main/java/org/apache/samza/serializers/model/SamzaObjectMapper.java
>  3517912eaafbf95f8c8cc70ab5869548a56b76e7 
>   
> samza-core/src/main/java/org/apache/samza/storage/ChangelogPartitionManager.java
>  PRE-CREATION 
>   samza-core/src/main/scala/org/apache/samza/checkpoint/CheckpointTool.scala 
> ddc30af7c52d8a4d5c5de02f6757c040b1f31c93 
>   samza-core/src/main/scala/org/apache/samza/checkpoint/OffsetManager.scala 
> a40c87fa7865746a5612c55a4cf24c8d005be7e0 
>   
> samza-core/src/main/scala/org/apache/samza/checkpoint/file/FileSystemCheckpointManager.scala
>  2a87a6e0cef72179b5383fc824266de1f9606293 
>   samza-core/src/main/scala/org/

Re: Review Request 32147: SAMZA-465

2015-04-02 Thread Chris Riccomini

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32147/#review78680
---



samza-core/src/main/java/org/apache/samza/checkpoint/CheckpointManager.java
<https://reviews.apache.org/r/32147/#comment127632>

Can we make this private final, and call new in the constructor?



samza-core/src/main/java/org/apache/samza/checkpoint/CheckpointManager.java
<https://reviews.apache.org/r/32147/#comment127631>

All this stuff should be private final, right?



samza-core/src/main/java/org/apache/samza/checkpoint/CheckpointManager.java
<https://reviews.apache.org/r/32147/#comment127630>

Prefer two constructors. One with source, and the other without.



samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamMessage.java
<https://reviews.apache.org/r/32147/#comment127633>

TaskName or container id?



samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamMessage.java
<https://reviews.apache.org/r/32147/#comment127634>

TaskName or container id?



samza-core/src/main/java/org/apache/samza/storage/ChangelogPartitionManager.java
<https://reviews.apache.org/r/32147/#comment127635>

Same comments as CheckpointManager. private finals and second constructor.



samza-core/src/main/scala/org/apache/samza/checkpoint/CheckpointTool.scala
<https://reviews.apache.org/r/32147/#comment127636>

nit: lowercase c



samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala
<https://reviews.apache.org/r/32147/#comment127732>

Nit: lowercase 'j'



samza-core/src/main/scala/org/apache/samza/util/Util.scala
<https://reviews.apache.org/r/32147/#comment127734>

Is it just 200, or all 2XX codes that we're OK with?
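
If the answer turns out to be "any 2XX", the check could be sketched as below. The class and method names are hypothetical; the patch itself may compare against 200 only.

```java
// Hypothetical helper for illustration; not the code under review.
final class HttpStatusSketch {
  private HttpStatusSketch() {}

  // Treats every status code in the 2XX range as a success.
  static boolean isSuccess(int statusCode) {
    return statusCode >= 200 && statusCode < 300;
  }
}
```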



samza-core/src/main/scala/org/apache/samza/util/Util.scala
<https://reviews.apache.org/r/32147/#comment127733>

I don't think that you need to parameterize this (the {}). SLF4J's Logger 
takes an error(String, Throwable) .. 
http://www.slf4j.org/apidocs/org/slf4j/Logger.html


- Chris Riccomini


On April 1, 2015, 9:02 p.m., Naveen Somasundaram wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32147/
> ---
> 
> (Updated April 1, 2015, 9:02 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> SAMZA 465
> 
> 
> Diffs
> -
> 
>   build.gradle 97de3a28f6379e3862eec845da87587b1d4f742e 
>   samza-api/src/main/java/org/apache/samza/checkpoint/Checkpoint.java  
>   samza-api/src/main/java/org/apache/samza/checkpoint/CheckpointManager.java 
> 092cb910b40d312217e86420bf1ddfbaf605e9e5 
>   
> samza-api/src/main/java/org/apache/samza/checkpoint/CheckpointManagerFactory.java
>  a97ff0919d8205928efee1a2a20780659180849d 
>   samza-api/src/main/java/org/apache/samza/container/TaskName.java 
> 083358686fc69ab45bbc73e898f419224ebc3a9f 
>   samza-api/src/main/java/org/apache/samza/system/SystemAdmin.java 
> 8995ba30c823bddcdfd3af7100e1440e71ef7998 
>   samza-api/src/main/java/org/apache/samza/task/TaskCoordinator.java 
> 6ff1a555f3c48e416bb78e94c5df71eff0a27f3d 
>   
> samza-api/src/main/java/org/apache/samza/util/SinglePartitionWithoutOffsetsSystemAdmin.java
>  01997ae22641b735cd452a0e89a49219e2874892 
>   samza-core/src/main/java/org/apache/samza/checkpoint/CheckpointManager.java 
> PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamMessage.java
>  PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java
>  PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemProducer.java
>  PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/job/model/TaskModel.java 
> eb22d2ec5f09ca59790e2871d9bff9745fe925dc 
>   
> samza-core/src/main/java/org/apache/samza/serializers/model/JsonTaskModelMixIn.java
>  7dc431c74a3fc2ba80eb47d6c5d87524cb4c9bde 
>   
> samza-core/src/main/java/org/apache/samza/serializers/model/SamzaObjectMapper.java
>  3517912eaafbf95f8c8cc70ab5869548a56b76e7 
>   
> samza-core/src/main/java/org/apache/samza/storage/ChangelogPartitionManager.java
>  PRE-CREATION 
>   samza-core/src/main/scala/org/apache/samza/checkpoint/CheckpointTool.scala 
> ddc30af7c52d8a4d5c5de02f6757c040b1f31c93 
>   samza-core/src/main/scala/org/apache/samza/checkpoint/OffsetManager.scala 
> a40c87fa7865746a5612c55a4cf24c8d005be7e0 
>   
> samz

Re: Store changelog

2015-04-02 Thread Chris Riccomini
Hey Dan,

I think you might have a misunderstanding in how changelogs work with
Samza. Suppose you have a job with two tasks, and a single kv-store is
configured with a changelog attached. The changelog, in Kafka, will have
two partitions. Each task will use one partition of the changelog topic.
You only need one topic per-changelog (and no prefix) because there are
multiple partitions per changelog, and there's a 1:1 mapping between a task
and its changelog partition.
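
A minimal sketch of that 1:1 mapping, with hypothetical names. It assumes partitions are handed out in the order the task names are listed; how Samza actually orders the assignment is not stated here, so treat the ordering as an assumption. The point is only that one changelog topic with one partition per task is enough.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not Samza's actual assignment code.
final class ChangelogMappingSketch {
  private ChangelogMappingSketch() {}

  // Gives each distinct task its own partition of a single changelog topic.
  static Map<String, Integer> assignPartitions(List<String> taskNames) {
    Map<String, Integer> mapping = new LinkedHashMap<String, Integer>();
    int nextPartition = 0;
    for (String taskName : taskNames) {
      if (!mapping.containsKey(taskName)) {
        mapping.put(taskName, nextPartition);
        nextPartition++;
      }
    }
    return mapping;
  }
}
```

With two tasks the changelog topic needs two partitions, and each task reads and writes only its own partition, so no per-task topic or prefix is required.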

Cheers,
Chris

On Thu, Apr 2, 2015 at 10:30 AM, Dan  wrote:

> Hi all,
>
> We're just starting out using Samza to process streams we've already got in
> Kafka. Some of the jobs we've written are using the per task KV store which
> are being persisted to a changelog topic in Kafka. As you need a different
> changelog topic per task we are wondering how people are dealing with
> ensuring that each task's store has a separate changelog.
>
> I think we could define multiple stores in the properties file, then pick
> the correct one for each task index. But that seems quite a verbose way to
> go about that?
>
> If Samza could use a prefix in the properties file then generate a topic
> name for each task it would simplify using that. Maybe there's something
> I'm missing from this?
>
> Thanks,
> Dan
>


Re: [VOTE] Apache Samza 0.9.0 RC0

2015-04-01 Thread Chris Riccomini
Hey Yan,

Awesome! Just realized--could you also open a ticket to migrate
hello-samza/master to 0.9.0, and hello-samza/latest to 0.10.0?

Cheers,
Chris

On Wed, Apr 1, 2015 at 1:42 PM, Yan Fang  wrote:

> Hi guys,
>
> Have updated website for 0.9 release. Feel free to check it. Only one thing
> remaining - waiting for the blog account.
>
> Thanks,
>
> Fang, Yan
> yanfang...@gmail.com
>
> On Tue, Mar 31, 2015 at 8:44 PM, Roger Hoover 
> wrote:
>
> > Nice.  Thanks Yan!
> >
> > On Tue, Mar 31, 2015 at 3:24 PM, Yan Fang  wrote:
> >
> > > Cool.
> > >
> > > * Published to maven, it's already there.
> > > * Uploaded to dist/release. It may take a while for mirrors to pick it
> > up.
> > > * Updated the downloading page in
> > > https://issues.apache.org/jira/browse/SAMZA-624
> > > ** Will publish the website after mirrors pick up the 0.9.0 release
> > > ** In terms of the blog, I don't seem to have access?
> > >
> > > Thanks,
> > >
> > > Fang, Yan
> > > yanfang...@gmail.com
> > >
> > > On Tue, Mar 31, 2015 at 1:36 PM, Chris Riccomini <
> criccom...@apache.org>
> > > wrote:
> > >
> > > > Hey Yan/Jakob,
> > > >
> > > > Awesome, thanks! Yan, feel free to finish up the release. :) Very
> cool!
> > > >
> > > > Cheers,
> > > > Chris
> > > >
> > > > On Tue, Mar 31, 2015 at 1:27 PM, Jakob Homan 
> > wrote:
> > > >
> > > > > Correct.  All that's necessary for a release is a
> > > > > more-+1s-than--1s-from-PMC-members vote, and then we can go ahead
> > with
> > > > > distribution, publicity, etc.
> > > > > -jg
> > > > >
> > > > > On 31 March 2015 at 12:44, Chris Riccomini 
> > > > wrote:
> > > > > > Hey Yan,
> > > > > >
> > > > > > Let's confirm with Jakob. I *think* we don't need any
> intervention
> > > from
> > > > > > Apache. We should be able to move forward with the release.
> @Jakob,
> > > can
> > > > > you
> > > > > > confirm this?
> > > > > >
> > > > > > Cheers,
> > > > > > Chris
> > > > > >
> > > > > > On Tue, Mar 31, 2015 at 11:17 AM, Yan Fang  >
> > > > wrote:
> > > > > >
> > > > > >> Hi all,
> > > > > >>
> > > > > >> After 72+ hours, we got +4 binding votes (Chris, Jakob, Chinmay,
> > > Yan)
> > > > ,
> > > > > +2
> > > > > >> non-binding votes (Roger, Yi Pan). The release vote passes.
> > > > > >>
> > > > > >> @Chris, Do we need the vote from apache general mailing list?
> Or I
> > > can
> > > > > go
> > > > > >> ahead to update to release dist, update download page, publish
> > 0.8.0
> > > > > >> binaries to Maven and write 0.8.0 blog post?
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Fang, Yan
> > > > > >> yanfang...@gmail.com
> > > > > >>
> > > > > >> On Tue, Mar 31, 2015 at 9:21 AM, Ash W Matheson <
> > > > ash.mathe...@gmail.com
> > > > > >
> > > > > >> wrote:
> > > > > >>
> > > > > > >>> I'd say yes, it's been a few days with little traffic on the
> topic.
> > > > > >>> On Mar 31, 2015 9:18 AM, "Chris Riccomini" <
> > criccom...@apache.org>
> > > > > wrote:
> > > > > >>>
> > > > > >>> > Hey all,
> > > > > >>> >
> > > > > >>> > Is the vote done?
> > > > > >>> >
> > > > > >>> > Cheers,
> > > > > >>> > Chris
> > > > > >>> >
> > > > > >>> > On Mon, Mar 30, 2015 at 2:10 PM, Chris Riccomini <
> > > > > criccom...@apache.org
> > > > > >>> >
> > > > > >>> > wrote:
> > > > > >>> >
> > > > > >>> > > +1
> > > > > >>> > >
> > > > > >>> > 

Re: Samza on Yarn

2015-04-01 Thread Chris Riccomini
Hey Shekar,

To contribute to that page (or any docs), do the following:

1. Open a JIRA defining the issue (that you intend to fix) with the docs.
2. Check out samza's code base.
3. Change the docs (located in the 'docs' folder). They are markdown
formatted docs.
4. Run a `git diff` from the root directory to get a .patch file.
5. Attach the patch to your JIRA, and click the "Submit patch" button at
the top.

Cheers,
Chris

On Wed, Apr 1, 2015 at 6:59 AM, Shekar Tippur  wrote:

> Chris,
>
> I think I am comfortable now to add a couple of steps to the multi node
> setup.
> I am guessing I need some privileges to contribute to
> http://samza.apache.org/learn/tutorials/0.7.0/run-in-multi-node-yarn.html
> (need a jira ticket and privilege to add context and close it as well)
>
> - Shekar
>
> On Fri, Mar 13, 2015 at 1:28 PM, Shekar Tippur  wrote:
>
> > After adding the classpath to yarn-site.xml, I found that the jars that were
> > created for argos (extension of Samza) were not part of the tar.gz file that
> > was exposed to HTTP requests. I changed the post install script on the rpm to
> > expose that.
> >
> > I see that the 2 nodes are showing up on rm. We are testing redundancy
> now.
> >
> > - Shekar
> >
> >
> >
> > On Fri, Mar 13, 2015 at 1:20 PM, Chris Riccomini 
> > wrote:
> >
> >> Hey Shekar,
> >>
> >> Awesome, thanks! Would love to get any doc updates that would be useful.
> >>
> >> Curious: what was wrong?
> >>
> >> Cheers,
> >> Chris
> >>
> >> On Fri, Mar 13, 2015 at 1:00 PM, Shekar Tippur 
> wrote:
> >>
> >> > Thanks for your help Chris. Got it to work now. I will test my case
> and
> >> > documentation further. I can edit the Samza documentation to reflect
> any
> >> > changes.
> >> >
> >> > - Shekar
> >> >
> >> > On Thu, Mar 12, 2015 at 5:19 PM, Chris Riccomini <
> criccom...@apache.org
> >> >
> >> > wrote:
> >> >
> >> > > Hey Shekar,
> >> > >
> >> > > Yes, this is definitely a classpath issue. The pastebin you sent
> does
> >> not
> >> > > include any of the samza-core/samza-yarn/scala JARs. This is rather
> >> > > strange, since you said you put the JARs in this path:
> >> > >
> >> > >   /home/hadoop/hadoop-2.5.2/share/hadoop/hdfs/lib/
> >> > >
> >> > > And I do see *other* JARs listed with this path. Are you sure you
> put
> >> the
> >> > > Samza JARs on *all* machines, not just the RM machine? According to
> >> the
> >> > > yarn-default.xml logs, it says:
> >> > >
> >> > > CLASSPATH for YARN applications. A comma-separated list of CLASSPATH
> >> > > entries. When this value is empty, the following default CLASSPATH
> for
> >> > YARN
> >> > > applications would be used. For Linux: $HADOOP_CONF_DIR,
> >> > > $HADOOP_COMMON_HOME/share/hadoop/common/*,
> >> > > $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
> >> > > $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
> >> > > $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
> >> > > $HADOOP_YARN_HOME/share/hadoop/yarn/*,
> >> > > $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
> >> > >
> >> > > So, it seems like it should pick up the JARs, if they're in the NM's
> >> > > directory.
> >> > >
> >> > > The exception that you're now seeing seems to suggest that one of
> the
> >> > Samza
> >> > > containers is failing:
> >> > >
> >> > > Container for appattempt_1426204312971_0001_02 exited with
> >> exitCode:
> >> > 1
> >> > >
> >> > > The _02 suffix indicates a non-AM failure (i.e. the Samza
> >> container
> >> > > failed, not the Samza AM). Can you check the AM logs, and find the
> >> > http://
> >> > > ...
> >> > > link to the container logs? It should give a hint about why the
> >> container
> >> > > failed.
> >> > >
> >> > > Cheers,
> >> > > Chris
> >> > >
> >> > > On Thu, Mar 12, 2015 at 4:58 PM, Shekar Tippur 
> >> > wrote:
> >> > >
> >> > > > Chris,

Re: [VOTE] Apache Samza 0.9.0 RC0

2015-03-31 Thread Chris Riccomini
Hey Yan,

Based on this:


https://issues.apache.org/jira/issues/?jql=project%20%3D%20INFRA%20AND%20component%20%3D%20Blogs

It looks like you'll need to open an INFRA ticket to get your blog account.
=)

Cheers,
Chris

On Tue, Mar 31, 2015 at 3:24 PM, Yan Fang  wrote:

> Cool.
>
> * Published to maven, it's already there.
> * Uploaded to dist/release. It may take a while for mirrors to pick it up.
> * Updated the downloading page in
> https://issues.apache.org/jira/browse/SAMZA-624
> ** Will publish the website after mirrors pick up the 0.9.0 release
> ** In terms of the blog, I don't seem to have access?
>
> Thanks,
>
> Fang, Yan
> yanfang...@gmail.com
>
> On Tue, Mar 31, 2015 at 1:36 PM, Chris Riccomini 
> wrote:
>
>> Hey Yan/Jakob,
>>
>> Awesome, thanks! Yan, feel free to finish up the release. :) Very cool!
>>
>> Cheers,
>> Chris
>>
>> On Tue, Mar 31, 2015 at 1:27 PM, Jakob Homan  wrote:
>>
>> > Correct.  All that's necessary for a release is a
>> > more-+1s-than--1s-from-PMC-members vote, and then we can go ahead with
>> > distribution, publicity, etc.
>> > -jg
>> >
>> > On 31 March 2015 at 12:44, Chris Riccomini 
>> wrote:
>> > > Hey Yan,
>> > >
>> > > Let's confirm with Jakob. I *think* we don't need any intervention
>> from
>> > > Apache. We should be able to move forward with the release. @Jakob,
>> can
>> > you
>> > > confirm this?
>> > >
>> > > Cheers,
>> > > Chris
>> > >
>> > > On Tue, Mar 31, 2015 at 11:17 AM, Yan Fang 
>> wrote:
>> > >
>> > >> Hi all,
>> > >>
>> > >> After 72+ hours, we got +4 binding votes (Chris, Jakob, Chinmay,
>> Yan) ,
>> > +2
>> > >> non-binding votes (Roger, Yi Pan). The release vote passes.
>> > >>
>> > >> @Chris, Do we need the vote from apache general mailing list? Or I
>> can
>> > go
>> > >> ahead to update to release dist, update download page, publish 0.8.0
>> > >> binaries to Maven and write 0.8.0 blog post?
>> > >>
>> > >> Thanks,
>> > >> Fang, Yan
>> > >> yanfang...@gmail.com
>> > >>
>> > >> On Tue, Mar 31, 2015 at 9:21 AM, Ash W Matheson <
>> ash.mathe...@gmail.com
>> > >
>> > >> wrote:
>> > >>
>> > >>> I'd say yes, it's been a few days with little traffic on the topic.
>> > >>> On Mar 31, 2015 9:18 AM, "Chris Riccomini" 
>> > wrote:
>> > >>>
>> > >>> > Hey all,
>> > >>> >
>> > >>> > Is the vote done?
>> > >>> >
>> > >>> > Cheers,
>> > >>> > Chris
>> > >>> >
>> > >>> > On Mon, Mar 30, 2015 at 2:10 PM, Chris Riccomini <
>> > criccom...@apache.org
>> > >>> >
>> > >>> > wrote:
>> > >>> >
>> > >>> > > +1
>> > >>> > >
>> > >>> > > 1. Validated hello-samza works with 0.9.0 Maven binaries.
>> > >>> > > 2. Validated release-0.9.0-rc0 tag exists and has correct
>> > checksums.
>> > >>> > > 3. Validated source release tarball builds, and has correct
>> > licenses
>> > >>> > > (bin/check-all.sh).
>> > >>> > > 4. Validated source release tarball validates against Yan's PGP
>> > key.
>> > >>> > > 5. Ran rolling bounce of Kafka cluster with large job (1
>> million+
>> > >>> > > messages/sec)
>> > >>> > > 6. Ran Zopkio integration tests, and SAMZA-394 torture test.
>> > >>> > >
>> > >>> > > For (6), I ran the SAMZA-394 tests for > 72 hours with torture
>> test
>> > >>> > > running. No consistency/data loss issues! I did find an issue
>> with
>> > the
>> > >>> > > checker integration test, but I think it's best left for
>> 0.10.0, so
>> > >>> I'll
>> > >>> > > open a JIRA to track that.
>> > >>> > >
>> > >>> > >
>> > >>> > > On Mon, Mar 30, 2015 at 10:49 AM, Roger Hoover <
>> > >>> roger.hoo

Re: Samza closing and re-opening kafka connection rapidly, cannot consume or produce, no useful logs

2015-03-31 Thread Chris Riccomini
Hey Andrew,

It looks like your attachment was stripped by Apache's mailing server.
Looking at the info you pasted, I can tell you that YARN is most likely
unable to provision your containers due to a memory constraint. Here's the
issue:

Memory Used: 1 GB
Memory Total: 1.76 GB

The YARN AM and YARN container both take 1G. Your AM is requesting a 1G
container, which YARN then queues up, and waits for 1G of space to become
available. Because you only have 760MB left on the node, this will never
happen. The AM (and YARN) will just sit and wait for more resources.

To test this theory, try setting:

yarn.am.container.memory.mb=512
yarn.container.memory.mb=512

The first config sets the AM container's memory to 512MB. The second config
sets each SamzaContainer's memory to 512MB. Both of these should fit on a
1.76G node.
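
The arithmetic behind this can be sketched in a few lines of Python. This is
only an illustration: the function name is made up, and it ignores any
rounding the YARN scheduler may apply to container requests.

```python
from typing import Sequence


def fits_on_node(total_mb: float, requests_mb: Sequence[float]) -> bool:
    """Return True if all requested containers fit in the node's memory at once."""
    return sum(requests_mb) <= total_mb


# "Memory Total: 1.76 GB" from the YARN UI, converted to MB.
NODE_TOTAL_MB = 1.76 * 1024

# 1G for the AM plus 1G for one SamzaContainer, as described above.
print(fits_on_node(NODE_TOTAL_MB, [1024, 1024]))  # False

# 512MB each, as in the two settings suggested above.
print(fits_on_node(NODE_TOTAL_MB, [512, 512]))  # True
```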

Thanks!
Chris

On Tue, Mar 31, 2015 at 2:59 PM, Andrew Sannier  wrote:

> Thanks so much for getting back to me, Chris.
>
> I’ve attached the AM log from my most recent attempt to run the
> hello-samza wikipedia-feed task. I’ve been using pretty small nodes to
> keep costs down while I test and so forth, so that makes a lot of sense
> (though I definitely hoped I’d configured appropriate memory ceilings).
> Here are the values from the YARN UI:
>
> Containers Running: 1
> Memory Used: 1 GB
>   Memory Total: 1.76 GB
>   Memory Reserved: 0 B
>   VCores Used: 1
>   VCores Total: 8
>   VCores Reserved: 0
> Active Nodes: 1
> Decommissioned Nodes: 0
> Lost Nodes: 0
> Unhealthy Nodes: 0
> Rebooted Nodes: 0
>
>
> Again, much obliged for your response.
>
> Andrew Sannier
>
>
>
> On 3/31/15, 3:54 PM, "Chris Riccomini"  wrote:
>
> >Hey Andrew,
> >
> >I'm wondering if your YARN cluster doesn't have enough memory to fit both
> >the AM and its containers. The fact that the AM UI shows no running
> >containers is suspicious. Can you check these settings in your YARN
> >RM's UI:
> >
> >  Memory Used
> >  Memory Total
> >  Memory Reserved
> >  VCores Used
> >  VCores Total
> >  VCores Reserved
> >
> >Can you also attach (or post to gist/pastebin/etc) the YARN AM's full log?
> >
> >Cheers,
> >Chris
> >
> >On Tue, Mar 31, 2015 at 2:32 PM, Andrew Sannier
> > >> wrote:
> >
> >> Something to add here: there are a couple of weird things in the Samza
> >> Application Master web UI: Application master task ID is -1, which seems
> >> odd, and the Running Containers table is completely empty. How could
> >>YARN
> >> call a task “Running” if there’s no container?
> >>
> >> Thanks,
> >> Andrew Sannier
> >>
> >>
> >>
> >>
> >>
> >> On 3/31/15, 2:19 PM, "Andrew Sannier" 
> >>wrote:
> >>
> >> >Hi all -
> >> >
> >> >Thanks in advance for your help; I have been totally stuck on this for a
> >> >couple of days.
> >> >
> >> >I have a small YARN cluster with one ResourceManager and one NodeManager
> >> >as well as one Zookeeper node and one Kafka node - trying to keep the
> >> >number of moving parts to a minimum. I've been following the guide to
> >> >running Samza on YARN
> >> >(https://samza.apache.org/learn/tutorials/0.8/run-in-multi-node-yarn.html),
> >> >and I get to the end of the tutorial with a Running job in the YARN web
> >> >UI, as expected. However, the job doesn't actually appear to do anything -
> >> >messages are not produced to the "wikipedia-raw" topic (nor is the topic
> >> >created), and no data is logged at all.
> >> >
> >> >To that point, I am having a ton of trouble with Samza's logging - in
> >> >samza.log.dir on the ResourceManager node, there's only gc.log.0.current,
> >> >and in the YARN log directory I have only the resourcemanager log which of
> >> >course contains no application information. On the NodeManager side,
> >> >samza.log.dir contains application-manager.log, which ends at "[INFO]
> >> >Requesting 1 container(s) with 1700mb of memory" right after the job
> >> >enters the Running state, its own copy of gc.log.0.current, and stderr
> >> >and stdout which contain no useful information and also don't grow after
> >> >the first second of the job running. In YARN's logs, there's only the node

Re: Samza closing and re-opening kafka connection rapidly, cannot consume or produce, no useful logs

2015-03-31 Thread Chris Riccomini
Hey Andrew,

I'm wondering if your YARN cluster doesn't have enough memory to fit both
the AM and its containers. The fact that the AM UI shows no running
containers is suspicious. Can you check these settings in your YARN
RM's UI:

  Memory Used
  Memory Total
  Memory Reserved
  VCores Used
  VCores Total
  VCores Reserved

Can you also attach (or post to gist/pastebin/etc) the YARN AM's full log?

Cheers,
Chris

On Tue, Mar 31, 2015 at 2:32 PM, Andrew Sannier  wrote:

> Something to add here: there are a couple of weird things in the Samza
> Application Master web UI: Application master task ID is -1, which seems
> odd, and the Running Containers table is completely empty. How could YARN
> call a task “Running” if there’s no container?
>
> Thanks,
> Andrew Sannier
>
>
>
>
>
> On 3/31/15, 2:19 PM, "Andrew Sannier"  wrote:
>
> >Hi all -
> >
> >Thanks in advance for your help; I have been totally stuck on this for a
> >couple of days.
> >
> >I have a small YARN cluster with one ResourceManager and one NodeManager
> >as well as one Zookeeper node and one Kafka node - trying to keep the
> >number of moving parts to a minimum. I've been following the guide to
> >running Samza on YARN
> >(https://samza.apache.org/learn/tutorials/0.8/run-in-multi-node-yarn.html),
> >and I get to the end of the tutorial with a Running job in the YARN web
> >UI, as expected. However, the job doesn't actually appear to do anything -
> >messages are not produced to the "wikipedia-raw" topic (nor is the topic
> >created), and no data is logged at all.
> >
> >To that point, I am having a ton of trouble with Samza's logging - in
> >samza.log.dir on the ResourceManager node, there's only gc.log.0.current,
> >and in the YARN log directory I have only the resourcemanager log which of
> >course contains no application information. On the NodeManager side,
> >samza.log.dir contains application-manager.log, which ends at "[INFO]
> >Requesting 1 container(s) with 1700mb of memory" right after the job
> >enters the Running state, its own copy of gc.log.0.current, and stderr
> >and stdout which contain no useful information and also don't grow after
> >the first second of the job running. In YARN's logs, there's only the node
> >manager log, which has no errors or warnings and just logs the startup of
> >the container and then its memory usage from then on, which seems fine:
> >
> >2015-03-31 20:17:34,635 INFO  [Container Monitor]
> >monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) -
> >Memory usage of ProcessTree 25767 for container-id
> >container_1427823389325_0011_01_01: 104.9 MB of 1 GB physical memory
> >used; 2.4 GB of 3.1 GB virtual memory used
> >
> >
> >What am I missing here? WikipediaFeed.java contains a whole bunch of
> >logging statements, but nothing ever hits any file I can find. Even if you
> >can't help with the problem I'm having with hello-samza, I would greatly
> >appreciate any advice on how I can get useful logs from Samza jobs.
> >
> >I've checked that I can ping the Wikipedia IRC URL and consume
> >from/produce to the Kafka cluster with the console shell scripts from both
> >the ResourceManager and NodeManager nodes, and other applications can work
> >with my Kafka and Zookeeper with no issues. From the application-master
> >log on the worker node, all I can see is that Samza configures the
> >Wikipedia IRC system, starts the Webapp, and requests a container. It
> >enters the Running state with YARN, after which point nothing happens at
> >all. There's no activity at all in the Kafka or Zookeeper logs.
> >
> >And that's it; the job will run for hours if I let it but at no point is
> >anything produced to Kafka or logged at all. I wrote a simpler task that
> >just accepts a json message from a topic on Kafka, adds a timestamp, and
> >produces to another topic, but almost nothing is different. From
> >application-master log:
> >
> >2015-03-31 20:07:05 ClientUtils$ [INFO] Fetching metadata from broker
> >id:0,host:172.31.2.19,port:9092 with correlation id 0 for 1 topic(s)
> >Set(test)
> >2015-03-31 20:07:05 SyncProducer [INFO] Connected to 172.31.2.19:9092 for
> >producing
> >2015-03-31 20:07:05 SyncProducer [INFO] Disconnecting from
> >172.31.2.19:9092
> >2015-03-31 20:07:06 KafkaSystemAdmin$ [INFO] Got metadata: Map(test ->
> >SystemStreamMetadata [streamName=test, partitionMetadata={Partition
> >[partition=0]=SystemStreamPartitionMetadata [oldestOffset=0,
> >newestOffset=4, upcomingOffset=5], Partition
> >[partition=1]=SystemStreamPartitionMetadata [oldestOffset=null,
> >newestOffset=null, upcomingOffset=0]}])
> >
> >
> >which all looks correct. Then it connects to ResourceManager, starts the
> >Webapp, requests a container and starts running. All I see in Kafka's log
> >is
> >
> >[2015-03-31 20:07:05,999] INFO Closing socket connection to /172.31.1.229.
> >(kafka.network.Processor)
> >[2015-03-31 20:07:06,090] INFO Closing socket connection to /172.31.1.229.
> >(kafka.network

Re: [VOTE] Apache Samza 0.9.0 RC0

2015-03-31 Thread Chris Riccomini
Hey Yan/Jakob,

Awesome, thanks! Yan, feel free to finish up the release. :) Very cool!

Cheers,
Chris

On Tue, Mar 31, 2015 at 1:27 PM, Jakob Homan  wrote:

> Correct.  All that's necessary for a release is a
> more-+1s-than--1s-from-PMC-members vote, and then we can go ahead with
> distribution, publicity, etc.
> -jg
>
> On 31 March 2015 at 12:44, Chris Riccomini  wrote:
> > Hey Yan,
> >
> > Let's confirm with Jakob. I *think* we don't need any intervention from
> > Apache. We should be able to move forward with the release. @Jakob, can
> you
> > confirm this?
> >
> > Cheers,
> > Chris
> >
> > On Tue, Mar 31, 2015 at 11:17 AM, Yan Fang  wrote:
> >
> >> Hi all,
> >>
> >> After 72+ hours, we got +4 binding votes (Chris, Jakob, Chinmay, Yan) ,
> +2
> >> non-binding votes (Roger, Yi Pan). The release vote passes.
> >>
> >> @Chris, Do we need the vote from apache general mailing list? Or I can
> go
> >> ahead to update to release dist, update download page, publish 0.8.0
> >> binaries to Maven and write 0.8.0 blog post?
> >>
> >> Thanks,
> >> Fang, Yan
> >> yanfang...@gmail.com
> >>
> >> On Tue, Mar 31, 2015 at 9:21 AM, Ash W Matheson  >
> >> wrote:
> >>
> >>> I'd say yes, it's been a few days with little traffic on the topic.
> >>> On Mar 31, 2015 9:18 AM, "Chris Riccomini" 
> wrote:
> >>>
> >>> > Hey all,
> >>> >
> >>> > Is the vote done?
> >>> >
> >>> > Cheers,
> >>> > Chris
> >>> >
> >>> > On Mon, Mar 30, 2015 at 2:10 PM, Chris Riccomini <
> criccom...@apache.org
> >>> >
> >>> > wrote:
> >>> >
> >>> > > +1
> >>> > >
> >>> > > 1. Validated hello-samza works with 0.9.0 Maven binaries.
> >>> > > 2. Validated release-0.9.0-rc0 tag exists and has correct
> checksums.
> >>> > > 3. Validated source release tarball builds, and has correct
> licenses
> >>> > > (bin/check-all.sh).
> >>> > > 4. Validated source release tarball validates against Yan's PGP
> key.
> >>> > > 5. Ran rolling bounce of Kafka cluster with large job (1 million+
> >>> > > messages/sec)
> >>> > > 6. Ran Zopkio integration tests, and SAMZA-394 torture test.
> >>> > >
> >>> > > For (6), I ran the SAMZA-394 tests for > 72 hours with torture test
> >>> > > running. No consistency/data loss issues! I did find an issue with
> the
> >>> > > checker integration test, but I think it's best left for 0.10.0, so
> >>> I'll
> >>> > > open a JIRA to track that.
> >>> > >
> >>> > >
> >>> > > On Mon, Mar 30, 2015 at 10:49 AM, Roger Hoover <
> >>> roger.hoo...@gmail.com>
> >>> > > wrote:
> >>> > >
> >>> > >> +1
> >>> > >>
> >>> > >> * Created and tested a sample job doing a join
> >>> > >> * Built packages
> >>> > >> * Couldn't get integration tests to work.  It seemed like it was
> >>> timing
> >>> > >> out
> >>> > >> trying to download dependencies.  I may have been having network
> >>> issues
> >>> > >> last night.
> >>> > >>
> >>> > >> Cheers,
> >>> > >>
> >>> > >> Roger
> >>> > >>
> >>> > >> On Sun, Mar 29, 2015 at 9:09 PM, Jakob Homan 
> >>> wrote:
> >>> > >>
> >>> > >> > +1 (binding)
> >>> > >> >
> >>> > >> > * Verified sig and checksum
> >>> > >> > * Spot checked files, verified license and notice
> >>> > >> > * Built packages
> >>> > >> > * Ran hello samza
> >>> > >> >
> >>> > >> > Good work, Yan.
> >>> > >> > -jg
> >>> > >> >
> >>> > >> >
> >>> > >> > On 29 March 2015 at 13:08, Chinmay Soman <
> >>> chinmay.cere...@gmail.com>
> >>> > >> > wrote:
> >>> > >> > > +1

Re: [VOTE] Apache Samza 0.9.0 RC0

2015-03-31 Thread Chris Riccomini
Hey Yan,

Let's confirm with Jakob. I *think* we don't need any intervention from
Apache. We should be able to move forward with the release. @Jakob, can you
confirm this?

Cheers,
Chris

On Tue, Mar 31, 2015 at 11:17 AM, Yan Fang  wrote:

> Hi all,
>
> After 72+ hours, we got +4 binding votes (Chris, Jakob, Chinmay, Yan) , +2
> non-binding votes (Roger, Yi Pan). The release vote passes.
>
> @Chris, do we need a vote from the Apache general mailing list? Or can I go
> ahead and update the release dist, update the download page, publish the 0.8.0
> binaries to Maven and write the 0.8.0 blog post?
>
> Thanks,
> Fang, Yan
> yanfang...@gmail.com
>
> On Tue, Mar 31, 2015 at 9:21 AM, Ash W Matheson 
> wrote:
>
>> I'd say yes, it's been a few days with little traffic on the topic.
>> On Mar 31, 2015 9:18 AM, "Chris Riccomini"  wrote:
>>
>> > Hey all,
>> >
>> > Is the vote done?
>> >
>> > Cheers,
>> > Chris
>> >
>> > On Mon, Mar 30, 2015 at 2:10 PM, Chris Riccomini > >
>> > wrote:
>> >
>> > > +1
>> > >
>> > > 1. Validated hello-samza works with 0.9.0 Maven binaries.
>> > > 2. Validated release-0.9.0-rc0 tag exists and has correct checksums.
>> > > 3. Validated source release tarball builds, and has correct licenses
>> > > (bin/check-all.sh).
>> > > 4. Validated source release tarball validates against Yan's PGP key.
>> > > 5. Ran rolling bounce of Kafka cluster with large job (1 million+
>> > > messages/sec)
>> > > 6. Ran Zopkio integration tests, and SAMZA-394 torture test.
>> > >
>> > > For (6), I ran the SAMZA-394 tests for > 72 hours with torture test
>> > > running. No consistency/data loss issues! I did find an issue with the
>> > > checker integration test, but I think it's best left for 0.10.0, so
>> I'll
>> > > open a JIRA to track that.
>> > >
>> > >
>> > > On Mon, Mar 30, 2015 at 10:49 AM, Roger Hoover <
>> roger.hoo...@gmail.com>
>> > > wrote:
>> > >
>> > >> +1
>> > >>
>> > >> * Created and tested a sample job doing a join
>> > >> * Built packages
>> > >> * Couldn't get integration tests to work.  It seemed like it was
>> timing
>> > >> out
>> > >> trying to download dependencies.  I may have been having network
>> issues
>> > >> last night.
>> > >>
>> > >> Cheers,
>> > >>
>> > >> Roger
>> > >>
>> > >> On Sun, Mar 29, 2015 at 9:09 PM, Jakob Homan 
>> wrote:
>> > >>
>> > >> > +1 (binding)
>> > >> >
>> > >> > * Verified sig and checksum
>> > >> > * Spot checked files, verified license and notice
>> > >> > * Built packages
>> > >> > * Ran hello samza
>> > >> >
>> > >> > Good work, Yan.
>> > >> > -jg
>> > >> >
>> > >> >
>> > >> > On 29 March 2015 at 13:08, Chinmay Soman <
>> chinmay.cere...@gmail.com>
>> > >> > wrote:
>> > >> > > +1
>> > >> > >
>> > >> > > * Verified signature against the release
>> > >> > > * Verified authenticity of the key
>> > >> > > * Ran hello-samza (latest branch) - all 3 jobs succeed against
>> 0.9.0
>> > >> > release
>> > >> > > * Ran integration tests - all 3 tests pass (after addressing
>> > SAMZA-621.
>> > >> > > Also, once a failure occurs - I have to manually kill the Yarn
>> > >> daemons.
>> > >> > Not
>> > >> > > sure if there's a ticket open for that - a quick search did not
>> > reveal
>> > >> > > anything).
>> > >> > >
>> > >> > >
>> > >> > > Good job guys !
>> > >> > >
>> > >> > > On Sat, Mar 28, 2015 at 1:36 AM, Yan Fang 
>> > >> wrote:
>> > >> > >
>> > >> > >> Hi Chris and Jakob,
>> > >> > >>
>> > >> > >> Sure. Let's do the voting until Monday. Hope more guys have
>> time to
>> > >> try
>> > >> > and
>> > >> > >>

Re: Kafka Question

2015-03-31 Thread Chris Riccomini
Hey Shekar,

The full list of 0.9.0 features and fixes is here:


https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20AND%20fixVersion%20%3D%200.9.0%20AND%20status%20in%20(Resolved%2C%20Closed)

Re: SQL and windowing, most of that work has been done in isolation in the
samza-sql branch:


https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tree;h=refs/heads/samza-sql;hb=refs/heads/samza-sql

This branch is periodically rebased with 0.9.0, but has added patches from
Milinda, Navina, Yi, etc.

Cheers,
Chris

On Tue, Mar 31, 2015 at 10:46 AM, Shekar Tippur  wrote:

> Perfect Chris. We will test out the latest version. Did not get a chance to
> test it.
> On the same note, as I have not caught up with 0.9.0, is there any way to
> get the feature list or some release notes for 0.9.0? I am interested in
> the windowing and SQL-like capabilities.
>
> - Shekar
>
> On Tue, Mar 31, 2015 at 9:19 AM, Chris Riccomini 
> wrote:
>
> > Hey Shekar,
> >
> > Are you running with 0.8.0 when you run these tests? If so, there are
> some
> > known issues where a Samza consumer can get stuck when brokers
> disappear.
> > All known issues have been resolved in the 0.9.0 release, but they exist
> in
> > the 0.8.0 release.
> >
> > Cheers,
> > Chris
> >
> > On Mon, Mar 30, 2015 at 5:26 PM, Shekar Tippur 
> wrote:
> >
> > > One more thing ..
> > >
> > > We have this configured as well.
> > >
> > >
> > >
> >
> systems.kafka.producer.metadata.broker.list=sprfargas102:6667,sprfargas103:6667
> > >
> > > On Mon, Mar 30, 2015 at 5:08 PM, Shekar Tippur 
> > wrote:
> > >
> > > > Hello,
> > > >
> > > > I realise this is a Kafka question. Since there are quite a few Kafka
> > > > experts here and the answer to this question can help Samza community
> > as
> > > > well, bringing this question here.
> > > >
> > > > We are testing Kafka redundancy (just 2 brokers for now).
> > > >
> > > > We have increased replication factor to 2 servers (Both kafka
> brokers)
> > > > We have included auto.leader.rebalance.enable=true so that the leader
> > > gets
> > > > redundancy and bounced Kafka.
> > > > We have moved the topic across both the servers.
> > > >
> > > > Below table shows the sequence of events. Can someone please explain
> > why
> > > > we see this behaviour in step 4?
> > > >
> > > >
> > > >
> > > >         Broker 1  Broker 2  Kafka status
> > > > Step 1  Up        Up        Running
> > > > Step 2  Down      Up        Running
> > > > Step 3  Up        Up        Running
> > > > Step 4  Up        Down      Not running (Connection refused)
> > > >
> > >
> >
>


Re: Samza install quick guide

2015-03-31 Thread Chris Riccomini
Hey Jordi,

Thanks for this! I've copied your pastebin script, and put it on:

  https://issues.apache.org/jira/browse/SAMZA-189

So that we don't lose track of it.

Cheers,
Chris

On Tue, Mar 31, 2015 at 3:48 AM, Jordi Blasi Uribarri 
wrote:

> Hi,
>
> I am not sure what is the correct way of doing this (I am sure this is not
> it, but anyway…). Following your advice I was able to get Samza working and
> now I am trying to explore its full capabilities. As promised, the notes
> (a quick installation guide) I was writing in the process are in the pastebin
> link I include here. I hope it helps the next newbie who comes to Samza like
> me.
>
> http://pastebin.com/XDDhi29k
>
> I don’t know how long they will keep it there, so I recommend that if you
> find it interesting, you download and include it somewhere accessible.
>
> Bye,
>
> Jordi
> 
> Jordi Blasi Uribarri
> Área I+D+i
>
> jbl...@nextel.es
> Oficina Bilbao
>
>


Re: Kafka Question

2015-03-31 Thread Chris Riccomini
Hey Shekar,

Are you running with 0.8.0 when you run these tests? If so, there are some
known issues where a Samza consumer can get stuck when brokers disappear.
All known issues have been resolved in the 0.9.0 release, but they exist in
the 0.8.0 release.

Cheers,
Chris

On Mon, Mar 30, 2015 at 5:26 PM, Shekar Tippur  wrote:

> One more thing ..
>
> We have this configured as well.
>
>
> systems.kafka.producer.metadata.broker.list=sprfargas102:6667,sprfargas103:6667
>
> On Mon, Mar 30, 2015 at 5:08 PM, Shekar Tippur  wrote:
>
> > Hello,
> >
> > I realise this is a Kafka question. Since there are quite a few Kafka
> > experts here and the answer to this question can help Samza community as
> > well, bringing this question here.
> >
> > We are testing Kafka redundancy (just 2 brokers for now).
> >
> > We have increased replication factor to 2 servers (Both kafka brokers)
> > We have included auto.leader.rebalance.enable=true so that the leader
> gets
> > redundancy and bounced Kafka.
> > We have moved the topic across both the servers.
> >
> > Below table shows the sequence of events. Can someone please explain why
> > we see this behaviour in step 4?
> >
> >
> >
> >         Broker 1  Broker 2  Kafka status
> > Step 1  Up        Up        Running
> > Step 2  Down      Up        Running
> > Step 3  Up        Up        Running
> > Step 4  Up        Down      Not running (Connection refused)
> >
>


Re: [VOTE] Apache Samza 0.9.0 RC0

2015-03-31 Thread Chris Riccomini
Hey all,

Is the vote done?

Cheers,
Chris

On Mon, Mar 30, 2015 at 2:10 PM, Chris Riccomini 
wrote:

> +1
>
> 1. Validated hello-samza works with 0.9.0 Maven binaries.
> 2. Validated release-0.9.0-rc0 tag exists and has correct checksums.
> 3. Validated source release tarball builds, and has correct licenses
> (bin/check-all.sh).
> 4. Validated source release tarball validates against Yan's PGP key.
> 5. Ran rolling bounce of Kafka cluster with large job (1 million+
> messages/sec)
> 6. Ran Zopkio integration tests, and SAMZA-394 torture test.
>
> For (6), I ran the SAMZA-394 tests for > 72 hours with torture test
> running. No consistency/data loss issues! I did find an issue with the
> checker integration test, but I think it's best left for 0.10.0, so I'll
> open a JIRA to track that.
>
>
> On Mon, Mar 30, 2015 at 10:49 AM, Roger Hoover 
> wrote:
>
>> +1
>>
>> * Created and tested a sample job doing a join
>> * Built packages
>> * Couldn't get integration tests to work.  It seemed like it was timing
>> out
>> trying to download dependencies.  I may have been having network issues
>> last night.
>>
>> Cheers,
>>
>> Roger
>>
>> On Sun, Mar 29, 2015 at 9:09 PM, Jakob Homan  wrote:
>>
>> > +1 (binding)
>> >
>> > * Verified sig and checksum
>> > * Spot checked files, verified license and notice
>> > * Built packages
>> > * Ran hello samza
>> >
>> > Good work, Yan.
>> > -jg
>> >
>> >
>> > On 29 March 2015 at 13:08, Chinmay Soman 
>> > wrote:
>> > > +1
>> > >
>> > > * Verified signature against the release
>> > > * Verified authenticity of the key
>> > > * Ran hello-samza (latest branch) - all 3 jobs succeed against 0.9.0
>> > release
>> > > * Ran integration tests - all 3 tests pass (after addressing SAMZA-621.
>> > > Also, once a failure occurs - I have to manually kill the Yarn
>> daemons.
>> > Not
>> > > sure if there's a ticket open for that - a quick search did not reveal
>> > > anything).
>> > >
>> > >
>> > > Good job guys !
>> > >
>> > > On Sat, Mar 28, 2015 at 1:36 AM, Yan Fang 
>> wrote:
>> > >
>> > >> Hi Chris and Jakob,
>> > >>
>> > >> Sure. Let's do the voting until Monday. Hope more guys have time to
>> try
>> > and
>> > >> validate the 0.9.0 version.
>> > >>
>> > >> Thanks,
>> > >>
>> > >> Fang, Yan
>> > >> yanfang...@gmail.com
>> > >> +1 (206) 849-4108
>> > >>
>> > >> On Fri, Mar 27, 2015 at 5:09 PM, Chris Riccomini <
>> criccom...@apache.org
>> > >
>> > >> wrote:
>> > >>
>> > >> > Hey Yan,
>> > >> >
>> > >> > Yea, could we delay until Monday? I have been doing a lot of
>> burn-in,
>> > and
>> > >> > have found some issues with the torture tests in SAMZA-394. The
>> issues
>> > >> > appear to be with the tests themselves, not data-loss/correctness
>> > issues
>> > >> in
>> > >> > Samza, but I just want to make sure. Planning to run the burn-in
>> over
>> > the
>> > >> > weekend. I'll open up JIRAs to fix the SAMZA-394 torture tests once
>> > I'm
>> > >> > confident it's stable.
>> > >> >
>> > >> > Cheers,
>> > >> > Chris
>> > >> >
>> > >> > On Fri, Mar 27, 2015 at 4:28 PM, Jakob Homan 
>> > wrote:
>> > >> >
>> > >> > > That would be great.  I've been trying to get to this but have
>> > failed.
>> > >> > > I can definitely look at the release tomorrow.
>> > >> > > -jg
>> > >> > >
>> > >> > >
>> > >> > > On 27 March 2015 at 16:08, Yan Fang 
>> wrote:
>> > >> > > > Hi guys,
>> > >> > > >
>> > >> > > > It has been 72 hours. We got +1 from Yi Pan. Do we extend the
>> > voting
>> > >> to
>> > >> > > > this weekend ?
>> > >> > > >
>> > >> > > > Thanks,
>> > >> > > > Fang, Yan
>> 

Re: [VOTE] Apache Samza 0.9.0 RC0

2015-03-30 Thread Chris Riccomini
+1

1. Validated hello-samza works with 0.9.0 Maven binaries.
2. Validated release-0.9.0-rc0 tag exists and has correct checksums.
3. Validated source release tarball builds, and has correct licenses
(bin/check-all.sh).
4. Validated source release tarball validates against Yan's PGP key.
5. Ran rolling bounce of Kafka cluster with large job (1 million+
messages/sec)
6. Ran Zopkio integration tests, and SAMZA-394 torture test.

For (6), I ran the SAMZA-394 tests for more than 72 hours with the torture
test running. No consistency/data loss issues! I did find an issue with the
checker integration test, but I think it's best left for 0.10.0, so I'll
open a JIRA to track that.
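
The checksum part of steps 2 and 3 can be sketched in Python. This is only a
sketch under assumptions: the thread does not say which digest algorithm the
release files use, so the algorithm is a required parameter, and the function
names are illustrative rather than part of any Samza tooling.

```python
import hashlib


def file_digest(path, algorithm, chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large tarballs are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_matches(path, expected_hex, algorithm):
    """Compare a file's digest with a published hex string (case-insensitive)."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

Calling digest_matches("apache-samza-0.9.0-src.tgz", published_value, "sha1")
would return True only when the downloaded tarball matches; both the "sha1"
choice and published_value are placeholders here.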


On Mon, Mar 30, 2015 at 10:49 AM, Roger Hoover 
wrote:

> +1
>
> * Created and tested a sample job doing a join
> * Built packages
> * Couldn't get integration tests to work.  It seemed like it was timing out
> trying to download dependencies.  I may have been having network issues
> last night.
>
> Cheers,
>
> Roger
>
> On Sun, Mar 29, 2015 at 9:09 PM, Jakob Homan  wrote:
>
> > +1 (binding)
> >
> > * Verified sig and checksum
> > * Spot checked files, verified license and notice
> > * Built packages
> > * Ran hello samza
> >
> > Good work, Yan.
> > -jg
> >
> >
> > On 29 March 2015 at 13:08, Chinmay Soman 
> > wrote:
> > > +1
> > >
> > > * Verified signature against the release
> > > * Verified authenticity of the key
> > > * Ran hello-samza (latest branch) - all 3 jobs succeed against 0.9.0
> > release
> > > * Ran integration tests - all 3 tests pass (after addressing SAMZA-621.
> > > Also, once a failure occurs - I have to manually kill the Yarn daemons.
> > Not
> > > sure if there's a ticket open for that - a quick search did not reveal
> > > anything).
> > >
> > >
> > > Good job guys !
> > >
> > > On Sat, Mar 28, 2015 at 1:36 AM, Yan Fang 
> wrote:
> > >
> > >> Hi Chris and Jakob,
> > >>
> > >> Sure. Let's do the voting until Monday. Hope more guys have time to
> try
> > and
> > >> validate the 0.9.0 version.
> > >>
> > >> Thanks,
> > >>
> > >> Fang, Yan
> > >> yanfang...@gmail.com
> > >> +1 (206) 849-4108
> > >>
> > >> On Fri, Mar 27, 2015 at 5:09 PM, Chris Riccomini <
> criccom...@apache.org
> > >
> > >> wrote:
> > >>
> > >> > Hey Yan,
> > >> >
> > >> > Yea, could we delay until Monday? I have been doing a lot of
> burn-in,
> > and
> > >> > have found some issues with the torture tests in SAMZA-394. The
> issues
> > >> > appear to be with the tests themselves, not data-loss/correctness
> > issues
> > >> in
> > >> > Samza, but I just want to make sure. Planning to run the burn-in
> over
> > the
> > >> > weekend. I'll open up JIRAs to fix the SAMZA-394 torture tests once
> > I'm
> > >> > confident it's stable.
> > >> >
> > >> > Cheers,
> > >> > Chris
> > >> >
> > >> > On Fri, Mar 27, 2015 at 4:28 PM, Jakob Homan 
> > wrote:
> > >> >
> > >> > > That would be great.  I've been trying to get to this but have
> > failed.
> > >> > > I can definitely look at the release tomorrow.
> > >> > > -jg
> > >> > >
> > >> > >
> > >> > > On 27 March 2015 at 16:08, Yan Fang  wrote:
> > >> > > > Hi guys,
> > >> > > >
> > >> > > > It has been 72 hours. We got +1 from Yi Pan. Do we extend the
> > voting
> > >> to
> > >> > > > this weekend ?
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Fang, Yan
> > >> > > > yanfang...@gmail.com
> > >> > > > +1 (206) 849-4108
> > >> > > >
> > >> > > > On Thu, Mar 26, 2015 at 11:07 PM, Yi Pan 
> > >> wrote:
> > >> > > >
> > >> > > >> I have run the integration test suite w/ 0.9.0-rc0. There were
> > some
> > >> > > issues
> > >> > > >> related w/ the integration test: SAMZA-621, but the test suite
> > >> passed
> > >> > > after
> > >> > > >> I manually cr

Re: [VOTE] Apache Samza 0.9.0 RC0

2015-03-27 Thread Chris Riccomini
Hey Yan,

Yea, could we delay until Monday? I have been doing a lot of burn-in, and
have found some issues with the torture tests in SAMZA-394. The issues
appear to be with the tests themselves, not data-loss/correctness issues in
Samza, but I just want to make sure. Planning to run the burn-in over the
weekend. I'll open up JIRAs to fix the SAMZA-394 torture tests once I'm
confident it's stable.

Cheers,
Chris

On Fri, Mar 27, 2015 at 4:28 PM, Jakob Homan  wrote:

> That would be great.  I've been trying to get to this but have failed.
> I can definitely look at the release tomorrow.
> -jg
>
>
> On 27 March 2015 at 16:08, Yan Fang  wrote:
> > Hi guys,
> >
> > It has been 72 hours. We got +1 from Yi Pan. Do we extend the voting
> > through this weekend?
> >
> > Thanks,
> > Fang, Yan
> > yanfang...@gmail.com
> > +1 (206) 849-4108
> >
> > On Thu, Mar 26, 2015 at 11:07 PM, Yi Pan  wrote:
> >
> >> I have run the integration test suite w/ 0.9.0-rc0. There were some
> issues
> >> related w/ the integration test: SAMZA-621, but the test suite passed
> after
> >> I manually created a symlink to the file name the test script is looking
> >> for.
> >>
> >> Hence, +1 on the release.
> >>
> >> On Thu, Mar 26, 2015 at 5:39 PM, Roger Hoover 
> >> wrote:
> >>
> >> > Hi Chris + all,
> >> >
> >> > I created a basic job that does a join from local state with Samza
> 0.9.0
> >> (
> >> > https://github.com/Quantiply/rico-playground/tree/master/join/samza).
> >> So
> >> > far so good. I'm hoping to get some time this weekend to benchmark it
> on my
> >> > laptop.  I think I saw that 0.9.0 includes support for sending job
> logs
> >> to
> >> > a topic. I want to try this out as well.
> >> >
> >> > Cheers,
> >> >
> >> > Roger
> >> >
> >> > On Thu, Mar 26, 2015 at 5:25 PM, Chris Riccomini <
> criccom...@apache.org>
> >> > wrote:
> >> >
> >> > > Hey all,
> >> > >
> >> > > I'm running validations and some burn-in. I'll post my vote
> tomorrow.
> >> > >
> >> > > It's been pretty quiet. It'd be good to get other
> >> > committers/non-committers
> >> > > to do validation as well.
> >> > >
> >> > > Cheers,
> >> > > Chris
> >> > >
> >> > >
> >> > > On Wed, Mar 25, 2015 at 11:20 AM, Yan Fang 
> >> wrote:
> >> > >
> >> > > > Hi Chris,
> >> > > >
> >> > > > Oops, signed it with another key. Now updated all files in
> >> > > > http://people.apache.org/~yanfang/samza-0.9.0-rc0/ . Verified.
> Sorry
> >> > for
> >> > > > the inconvenience.
> >> > > >
> >> > > > Cheers,
> >> > > >
> >> > > > Fang, Yan
> >> > > > yanfang...@gmail.com
> >> > > > +1 (206) 849-4108
> >> > > >
> >> > > > On Wed, Mar 25, 2015 at 10:58 AM, Chris Riccomini <
> >> > criccom...@apache.org
> >> > > >
> >> > > > wrote:
> >> > > >
> >> > > > > Hey Yan,
> >> > > > >
> >> > > > > Were you able to validate the source tarball? I ran:
> >> > > > >
> >> > > > > $ gpg --keyserver pgpkeys.mit.edu --recv-key CAC06239EA00BA80
> >> > > > > gpg: requesting key EA00BA80 from hkp server pgpkeys.mit.edu
> >> > > > > gpg: key EA00BA80: public key "Yan Fang (CODE SIGNING KEY) <
> >> > > > > yanf...@apache.org>" imported
> >> > > > > gpg: Total number processed: 1
> >> > > > > gpg:   imported: 1  (RSA: 1)
> >> > > > >
> >> > > > > $ gpg --fingerprint CAC06239EA00BA80
> >> > > > > pub   4096R/EA00BA80 2015-03-24 [expires: 2020-03-22]
> >> > > > >   Key fingerprint = 7091 46DA 2CF3 EACF 476E  B077 CAC0 6239
> >> EA00
> >> > > > BA80
> >> > > > > uid  Yan Fang (CODE SIGNING KEY) <
> >> yanf...@apache.org
> >> > >
> >> > > > > sub   4096R/E3F3DAD3 2015-03-24 [expires: 

Re: java.lang.NoClassDefFoundError on Yarn job

2015-03-27 Thread Chris Riccomini
Hey Jordi,

"No FileSystem for scheme: http" means that your YARN NMs aren't configured
with the http filesystem in your core-site.xml:



  <property>
    <name>fs.http.impl</name>
    <value>org.apache.samza.util.hadoop.HttpFileSystem</value>
  </property>



As documented here:

  http://samza.apache.org/learn/tutorials/0.8/run-in-multi-node-yarn.html
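
One way to confirm the property is really present on a NodeManager host is to
parse its core-site.xml and look the name up. A minimal Python sketch, assuming
the usual Hadoop configuration/property/name/value layout; the function name
and the SAMPLE document are illustrative only:

```python
import xml.etree.ElementTree as ET


def hadoop_property(xml_text, name):
    """Return the value of a Hadoop-style property, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


# Illustrative stand-in for the contents of a real core-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>fs.http.impl</name>
    <value>org.apache.samza.util.hadoop.HttpFileSystem</value>
  </property>
</configuration>"""

print(hadoop_property(SAMPLE, "fs.http.impl"))
```

If the lookup returns None for the file on a given NM, that host would be
expected to fail with the "No FileSystem for scheme: http" error above.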

Cheers,
Chris

On Fri, Mar 27, 2015 at 2:24 AM, Jordi Blasi Uribarri 
wrote:

> I thought it was working, but not really. The job runs and I can see it on
> the web admin. But when it has to process a message it fails. It goes down
> and I get this exception:
>
> Application application_1427403490569_0002 failed 2 times due to AM
> Container for appattempt_1427403490569_0002_02 exited with exitCode:
> -1000
> For more detailed output, check application tracking page:
> http://samza01:8088/proxy/application_1427403490569_0002/Then, click on
> links to logs of each attempt.
> Diagnostics: No FileSystem for scheme: http
> java.io.IOException: No FileSystem for scheme: http
> at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
> at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:249)
> at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Failing this attempt. Failing the application.
>
> What am I doing wrong?
>
> Thanks,
>
> Jordi
>
> -----Original Message-----
> From: Jordi Blasi Uribarri [mailto:jbl...@nextel.es]
> Sent: Friday, March 27, 2015 9:45
> To: dev@samza.apache.org
> Subject: RE: java.lang.NoClassDefFoundError on Yarn job
>
> Solved. My application was using the 0.9.0 version of yarn. When I
> downgraded to 0.8.0 it worked.
>
> Thanks,
>
>  Jordi
>
> -----Original Message-----
> From: Jordi Blasi Uribarri [mailto:jbl...@nextel.es]
> Sent: Friday, March 27, 2015 9:05
> To: dev@samza.apache.org
> Subject: RE: java.lang.NoClassDefFoundError on Yarn job
>
> I did the steps that were included in the case and I am getting the same
> error.
>
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/hadoop/conf/Configuration
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:191)
> at org.apache.samza.job.JobRunner.run(JobRunner.scala:56)
> at org.apache.samza.job.JobRunner$.main(JobRunner.scala:37)
> at org.apache.samza.job.JobRunner.main(JobRunner.scala)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.hadoop.conf.Configuration
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> ... 5 more
>
> The only difference I see is that I downloaded hadoop 2.6 instead of 2.4.
> Is the version mandatory? Should I downgrade?
>
> Thanks,
>
> Jordi
>
> -----Original Message-----
> From: Roger Hoover [mailto:roger.hoo...@gmail.com]
> Sent: Thursday, March 26, 2015 17:25
> To: dev@samza.apache.org
> Subject: Re: java.lang.NoClassDefFoundError on Yarn job
>
> Hi Jordi,
>
> You might be running into this issue (
> https://issues.apache.org/jira/browse/SAMZA-456) which I just hit as well.
> You probably need to add a couple more jars to your YARN lib dir.
>
> Cheers,
>
> Roger
>
> On Thu, Mar 26, 2015 at 9:21 AM, Jordi Blasi Uribarri 
> wrote:
>
> > Hi:
> >
> > I got samza runni

Re: [VOTE] Apache Samza 0.9.0 RC0

2015-03-26 Thread Chris Riccomini
Hey all,

I'm running validations and some burn-in. I'll post my vote tomorrow.

It's been pretty quiet. It'd be good to get other committers/non-committers
to do validation as well.

Cheers,
Chris


On Wed, Mar 25, 2015 at 11:20 AM, Yan Fang  wrote:

> Hi Chris,
>
> Oops, signed it with another key. Now updated all files in
> http://people.apache.org/~yanfang/samza-0.9.0-rc0/ . Verified. Sorry for
> the inconvenience.
>
> Cheers,
>
> Fang, Yan
> yanfang...@gmail.com
> +1 (206) 849-4108
>
> On Wed, Mar 25, 2015 at 10:58 AM, Chris Riccomini 
> wrote:
>
> > Hey Yan,
> >
> > Were you able to validate the source tarball? I ran:
> >
> > $ gpg --keyserver pgpkeys.mit.edu --recv-key CAC06239EA00BA80
> > gpg: requesting key EA00BA80 from hkp server pgpkeys.mit.edu
> > gpg: key EA00BA80: public key "Yan Fang (CODE SIGNING KEY) <
> > yanf...@apache.org>" imported
> > gpg: Total number processed: 1
> > gpg:   imported: 1  (RSA: 1)
> >
> > $ gpg --fingerprint CAC06239EA00BA80
> > pub   4096R/EA00BA80 2015-03-24 [expires: 2020-03-22]
> >   Key fingerprint = 7091 46DA 2CF3 EACF 476E  B077 CAC0 6239 EA00
> BA80
> > uid  Yan Fang (CODE SIGNING KEY) 
> > sub   4096R/E3F3DAD3 2015-03-24 [expires: 2020-03-22]
> >
> > $ gpg apache-samza-0.9.0-src.tgz.asc
> > gpg: Signature made Tue Mar 24 11:51:58 2015 PDT using RSA key ID
> 0CAE52EA
> > gpg: Can't check signature: public key not found
> >
> > Cheers,
> > Chris
> >
> > On Tue, Mar 24, 2015 at 3:18 PM, Yan Fang  wrote:
> >
> > > Hey all,
> > >
> > > This is a call for a vote on a release of Apache Samza 0.9.0. This is
> our
> > > > first release as an Apache top-level project. Thanks to everyone who
> has
> > > contributed to this release. We are very glad to see some new
> > contributors
> > > in this release.
> > >
> > > The release candidate can be downloaded from here:
> > >
> > > http://people.apache.org/~yanfang/samza-0.9.0-rc0/
> > >
> > > The release candidate is signed with pgp key CAC06239EA00BA80, which is
> > > included in the repository's KEYS file:
> > >
> > >
> > >
> >
> https://git-wip-us.apache.org/repos/asf?p=samza.git;a=blob_plain;f=KEYS;hb=6f5bafb6cd93934781161eb6b1868d11ea347c95
> > >
> > > and can also be found on keyservers:
> > >
> > > http://pgp.mit.edu/pks/lookup?op=get&search=0xCAC06239EA00BA80
> > >
> > > The git tag is release-0.9.0-rc0 and signed with the same pgp key:
> > >
> > >
> > >
> >
> https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=1039f7ede6490f9420dcecd6adc7677b97e78bcf
> > >
> > > Test binaries have been published to Maven's staging repository, and
> are
> > > available here:
> > >
> > >
> https://repository.apache.org/content/repositories/orgapachesamza-1005/
> > >
> > > Note that the binaries were built with JDK6 without incident.
> > >
> > > 95 issues were resolved for this release:
> > >
> > >
> > >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20AND%20fixVersion%20%3D%200.9.0%20AND%20status%20in%20(Resolved%2C%20Closed)
> > >
> > > > The vote will be open for 72 hours (ending at 4:00pm Friday, 03/27/2015).
> > > Please download the release candidate, check the hashes/signature,
> build
> > it
> > > and test it, and then please vote:
> > >
> > > [ ] +1 approve
> > > [ ] +0 no opinion
> > > [ ] -1 disapprove (and reason why)
> > >
> > > +1 from my side for the release.
> > >
> > > Fang, Yan
> > > yanfang...@gmail.com
> > > +1 (206) 849-4108
> > >
> >
>


Re: Error running integration tests

2015-03-25 Thread Chris Riccomini
Hey Roger,

Are you able to run this command?

  ssh localhost

This is effectively what Zopkio is doing. Wondering if you need to enable
SSH on your laptop? I have "remote login" enabled on my OSX laptop.
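
The "Connection refused" in the traceback below is what a client sees when
nothing is listening on the SSH port. A small Python sketch of the same
reachability check, assuming sshd uses the default port 22; the function name
is made up for illustration and is not part of Zopkio:

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


# Example: port_is_open("localhost", 22) should be True once remote login is on.
```

This only shows that something accepts connections on the port; it does not
check that the login itself would succeed.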

Cheers,
Chris

On Wed, Mar 25, 2015 at 4:29 PM, Roger Hoover 
wrote:

> Do I need to bring up sshd on my laptop or can the tests be made to not
> ssh?
>
> On Wed, Mar 25, 2015 at 4:27 PM, Roger Hoover 
> wrote:
>
> > Hi,
> >
> > I wanted to see if I could run the integration tests on the 0.9.0 branch
> > on my Mac.
> >
> > I cloned the 0.9.0 branch from the github mirror, built everything
> > (./gradlew clean build), and tried to run the integration tests.
> >
> > ./bin/integration-tests.sh /tmp/roger
> > I get an error when the test script tries to deploy ZooKeeper using SSH.
> > I'm running on Mac OS X.
> >
> > Any suggestions?
> >
> > Thanks,
> >
> > Roger
> >
> > 2015-03-25 16:11:40,368 zopkio.test_runner [ERROR] Aborting single
> > execution due to setup_suite failure:
> >
> > Traceback (most recent call last):
> >
> >   File
> >
> "/tmp/roger/samza-integration-tests/lib/python2.7/site-packages/zopkio/test_runner.py",
> > line 107, in run
> >
> > self.deployment_module.setup_suite()
> >
> >   File "/tmp/roger/scripts/deployment.py", line 76, in setup_suite
> >
> > 'hostname': host
> >
> >   File
> >
> "/tmp/roger/samza-integration-tests/lib/python2.7/site-packages/zopkio/deployer.py",
> > line 77, in deploy
> >
> > self.install(unique_id, configs)
> >
> >   File
> >
> "/tmp/roger/samza-integration-tests/lib/python2.7/site-packages/zopkio/adhoc_deployer.py",
> > line 129, in install
> >
> > with get_ssh_client(hostname, username=runtime.get_username(),
> > password=runtime.get_password()) as ssh:
> >
> >   File
> >
> "/usr/local/Cellar/python/2.7.9/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.py",
> > line 17, in __enter__
> >
> > return self.gen.next()
> >
> >   File
> >
> "/tmp/roger/samza-integration-tests/lib/python2.7/site-packages/zopkio/remote_host_helper.py",
> > line 204, in get_ssh_client
> >
> > ssh.connect(hostname, username=username, password=password)
> >
> >   File
> >
> "/tmp/roger/samza-integration-tests/lib/python2.7/site-packages/paramiko/client.py",
> > line 251, in connect
> >
> > retry_on_signal(lambda: sock.connect(addr))
> >
> >   File
> >
> "/tmp/roger/samza-integration-tests/lib/python2.7/site-packages/paramiko/util.py",
> > line 270, in retry_on_signal
> >
> > return function()
> >
> >   File
> >
> "/tmp/roger/samza-integration-tests/lib/python2.7/site-packages/paramiko/client.py",
> > line 251, in <lambda>
> >
> > retry_on_signal(lambda: sock.connect(addr))
> >
> >   File
> >
> "/usr/local/Cellar/python/2.7.9/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py",
> > line 224, in meth
> >
> > return getattr(self._sock,name)(*args)
> >
> > error: [Errno 61] Connection refused
> >
> >
>


Re: [VOTE] Apache Samza 0.9.0 RC0

2015-03-25 Thread Chris Riccomini
Hey Yan,

Were you able to validate the source tarball? I ran:

$ gpg --keyserver pgpkeys.mit.edu --recv-key CAC06239EA00BA80
gpg: requesting key EA00BA80 from hkp server pgpkeys.mit.edu
gpg: key EA00BA80: public key "Yan Fang (CODE SIGNING KEY) <
yanf...@apache.org>" imported
gpg: Total number processed: 1
gpg:   imported: 1  (RSA: 1)

$ gpg --fingerprint CAC06239EA00BA80
pub   4096R/EA00BA80 2015-03-24 [expires: 2020-03-22]
  Key fingerprint = 7091 46DA 2CF3 EACF 476E  B077 CAC0 6239 EA00 BA80
uid  Yan Fang (CODE SIGNING KEY) 
sub   4096R/E3F3DAD3 2015-03-24 [expires: 2020-03-22]

$ gpg apache-samza-0.9.0-src.tgz.asc
gpg: Signature made Tue Mar 24 11:51:58 2015 PDT using RSA key ID 0CAE52EA
gpg: Can't check signature: public key not found
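
The output above points at a key mismatch: the signature reports key ID
0CAE52EA, while the key I imported is EA00BA80. A rough Python sketch of that
comparison, assuming gpg output shaped like the lines above; the function
names are made up for illustration:

```python
import re


def signing_key_id(gpg_output):
    """Extract the key ID that gpg says made the signature, or None."""
    match = re.search(r"using \w+ key(?: ID)? ([0-9A-Fa-f]+)", gpg_output)
    return match.group(1).upper() if match else None


def signed_with(gpg_output, expected_key):
    """True if the reported key ID is a suffix of the expected key ID."""
    key_id = signing_key_id(gpg_output)
    return key_id is not None and expected_key.upper().endswith(key_id)


OUTPUT = (
    "gpg: Signature made Tue Mar 24 11:51:58 2015 PDT using RSA key ID 0CAE52EA\n"
    "gpg: Can't check signature: public key not found\n"
)

print(signing_key_id(OUTPUT))
print(signed_with(OUTPUT, "CAC06239EA00BA80"))
```

A short key ID match is only a quick sanity check; the full fingerprint is
what should be compared before trusting a signature.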

Cheers,
Chris

On Tue, Mar 24, 2015 at 3:18 PM, Yan Fang  wrote:

> Hey all,
>
> This is a call for a vote on a release of Apache Samza 0.9.0. This is our
> first release as an Apache top-level project. Thanks to everyone who has
> contributed to this release. We are very glad to see some new contributors
> in this release.
>
> The release candidate can be downloaded from here:
>
> http://people.apache.org/~yanfang/samza-0.9.0-rc0/
>
> The release candidate is signed with pgp key CAC06239EA00BA80, which is
> included in the repository's KEYS file:
>
>
> https://git-wip-us.apache.org/repos/asf?p=samza.git;a=blob_plain;f=KEYS;hb=6f5bafb6cd93934781161eb6b1868d11ea347c95
>
> and can also be found on keyservers:
>
> http://pgp.mit.edu/pks/lookup?op=get&search=0xCAC06239EA00BA80
>
> The git tag is release-0.9.0-rc0 and signed with the same pgp key:
>
>
> https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=1039f7ede6490f9420dcecd6adc7677b97e78bcf
>
> Test binaries have been published to Maven's staging repository, and are
> available here:
>
> https://repository.apache.org/content/repositories/orgapachesamza-1005/
>
> Note that the binaries were built with JDK6 without incident.
>
> 95 issues were resolved for this release:
>
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20AND%20fixVersion%20%3D%200.9.0%20AND%20status%20in%20(Resolved%2C%20Closed)
>
> The vote will be open for 72 hours (ending at 4:00pm Friday, 03/27/2015).
> Please download the release candidate, check the hashes/signature, build it
> and test it, and then please vote:
>
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove (and reason why)
>
> +1 from my side for the release.
>
> Fang, Yan
> yanfang...@gmail.com
> +1 (206) 849-4108
>


Re: Log error deploying on YARN [Samza 0.8.0]

2015-03-24 Thread Chris Riccomini
Cool, the website should be fixed when we publish for the 0.9.0 release.

On Tue, Mar 24, 2015 at 2:32 PM, Roger Hoover 
wrote:

> Ah, yes. That's it!  Thanks, Chris.
>
> On Tue, Mar 24, 2015 at 2:30 PM, Chris Riccomini 
> wrote:
>
> > Hey Roger,
> >
> > You're likely hitting this issue:
> >
> >   https://issues.apache.org/jira/browse/SAMZA-456
> >
> > Can you have a look and see if that's the problem? We missed some JARs
> that
> > need to be put in to the YARN NM classpath.
> >
> > Cheers,
> > Chris
> >
> > On Tue, Mar 24, 2015 at 2:22 PM, Roger Hoover 
> > wrote:
> >
> > > Hi all,
> > >
> > > I'm new to YARN and trying to have YARN download the Samza job tarball
> (
> > >
> https://samza.apache.org/learn/tutorials/0.8/run-in-multi-node-yarn.html
> > ).
> > > From the log, it seems that the download failed.  I've tested that the
> > file
> > > is available via curl.  The error message is:
> > > org/apache/samza/util/Logging
> > >
> > > I appreciate any suggestions.
> > >
> > > Roger
> > >
> > >
> > > 2015-03-24 17:13:05,469 INFO  [Socket Reader #1 for port 33749]
> > ipc.Server
> > > (Server.java:saslProcess(1294)) - Auth successful for
> > > appattempt_1427226422217_0005_02 (auth:SIMPLE)
> > >
> > > 2015-03-24 17:13:05,473 INFO  [IPC Server handler 15 on 33749]
> > > containermanager.ContainerManagerImpl
> > > (ContainerManagerImpl.java:startContainerInternal(572)) - Start request
> > for
> > > container_1427226422217_0005_02_01 by user opintel
> > >
> > > 2015-03-24 17:13:05,473 INFO  [IPC Server handler 15 on 33749]
> > > nodemanager.NMAuditLogger (NMAuditLogger.java:logSuccess(89)) -
> > > USER=opintel
> > > IP=10.53.152.54 OPERATION=Start Container Request
> > > TARGET=ContainerManageImpl
> > > RESULT=SUCCESS APPID=application_1427226422217_0005
> > > CONTAINERID=container_1427226422217_0005_02_01
> > >
> > > 2015-03-24 17:13:05,473 INFO  [AsyncDispatcher event handler]
> > > application.Application (ApplicationImpl.java:transition(296)) - Adding
> > > container_1427226422217_0005_02_01 to application
> > > application_1427226422217_0005
> > >
> > > 2015-03-24 17:13:05,474 INFO  [AsyncDispatcher event handler]
> > > container.Container (ContainerImpl.java:handle(884)) - Container
> > > container_1427226422217_0005_02_01 transitioned from NEW to
> > LOCALIZING
> > >
> > > 2015-03-24 17:13:05,474 INFO  [AsyncDispatcher event handler]
> > > containermanager.AuxServices (AuxServices.java:handle(175)) - Got event
> > > CONTAINER_INIT for appId application_1427226422217_0005
> > >
> > > 2015-03-24 17:13:05,475 INFO  [AsyncDispatcher event handler]
> > > localizer.LocalizedResource (LocalizedResource.java:handle(196)) -
> > Resource
> > > http://somehost.fake.com/samza/web-log-0.0.1-dist.tar.gz transitioned
> > from
> > > INIT to DOWNLOADING
> > >
> > > 2015-03-24 17:13:05,475 INFO  [AsyncDispatcher event handler]
> > > localizer.ResourceLocalizationService
> > > (ResourceLocalizationService.java:handle(596)) - Created localizer for
> > > container_1427226422217_0005_02_01
> > >
> > > 2015-03-24 17:13:05,480 INFO  [LocalizerRunner for
> > > container_1427226422217_0005_02_01]
> > > localizer.ResourceLocalizationService
> > > (ResourceLocalizationService.java:writeCredentials(1029)) - Writing
> > > credentials to the nmPrivate file
> > >
> > >
> >
> /tmp/hadoop-opintel/nm-local-dir/nmPrivate/container_1427226422217_0005_02_01.tokens.
> > > Credentials list:
> > >
> > > 2015-03-24 17:13:05,481 INFO  [LocalizerRunner for
> > > container_1427226422217_0005_02_01]
> > > nodemanager.DefaultContainerExecutor
> > > (DefaultContainerExecutor.java:createUserCacheDirs(469)) - Initializing
> > > user opintel
> > >
> > > 2015-03-24 17:13:05,492 INFO  [LocalizerRunner for
> > > container_1427226422217_0005_02_01]
> > > nodemanager.DefaultContainerExecutor
> > > (DefaultContainerExecutor.java:startLocalizer(103)) - Copying from
> > >
> > >
> >
> /tmp/hadoop-opintel/nm-local-dir/nmPrivate/container_1427226422217_0005_02_01.tokens
> > > to
> > >
> > >
> >
> /tmp/hadoop-opintel/nm-local-dir/usercach

Re: Log error deploying on YARN [Samza 0.8.0]

2015-03-24 Thread Chris Riccomini
Hey Roger,

You're likely hitting this issue:

  https://issues.apache.org/jira/browse/SAMZA-456

Can you have a look and see if that's the problem? We missed some JARs that
need to be put into the YARN NM classpath.

Cheers,
Chris

On Tue, Mar 24, 2015 at 2:22 PM, Roger Hoover 
wrote:

> Hi all,
>
> I'm new to YARN and trying to have YARN download the Samza job tarball (
> https://samza.apache.org/learn/tutorials/0.8/run-in-multi-node-yarn.html).
> From the log, it seems that the download failed.  I've tested that the file
> is available via curl.  The error message is:
> org/apache/samza/util/Logging
>
> I appreciate any suggestions.
>
> Roger
>
>
> 2015-03-24 17:13:05,469 INFO  [Socket Reader #1 for port 33749] ipc.Server
> (Server.java:saslProcess(1294)) - Auth successful for
> appattempt_1427226422217_0005_02 (auth:SIMPLE)
>
> 2015-03-24 17:13:05,473 INFO  [IPC Server handler 15 on 33749]
> containermanager.ContainerManagerImpl
> (ContainerManagerImpl.java:startContainerInternal(572)) - Start request for
> container_1427226422217_0005_02_01 by user opintel
>
> 2015-03-24 17:13:05,473 INFO  [IPC Server handler 15 on 33749]
> nodemanager.NMAuditLogger (NMAuditLogger.java:logSuccess(89)) -
> USER=opintel
> IP=10.53.152.54 OPERATION=Start Container Request
> TARGET=ContainerManageImpl
> RESULT=SUCCESS APPID=application_1427226422217_0005
> CONTAINERID=container_1427226422217_0005_02_01
>
> 2015-03-24 17:13:05,473 INFO  [AsyncDispatcher event handler]
> application.Application (ApplicationImpl.java:transition(296)) - Adding
> container_1427226422217_0005_02_01 to application
> application_1427226422217_0005
>
> 2015-03-24 17:13:05,474 INFO  [AsyncDispatcher event handler]
> container.Container (ContainerImpl.java:handle(884)) - Container
> container_1427226422217_0005_02_01 transitioned from NEW to LOCALIZING
>
> 2015-03-24 17:13:05,474 INFO  [AsyncDispatcher event handler]
> containermanager.AuxServices (AuxServices.java:handle(175)) - Got event
> CONTAINER_INIT for appId application_1427226422217_0005
>
> 2015-03-24 17:13:05,475 INFO  [AsyncDispatcher event handler]
> localizer.LocalizedResource (LocalizedResource.java:handle(196)) - Resource
> http://somehost.fake.com/samza/web-log-0.0.1-dist.tar.gz transitioned from
> INIT to DOWNLOADING
>
> 2015-03-24 17:13:05,475 INFO  [AsyncDispatcher event handler]
> localizer.ResourceLocalizationService
> (ResourceLocalizationService.java:handle(596)) - Created localizer for
> container_1427226422217_0005_02_01
>
> 2015-03-24 17:13:05,480 INFO  [LocalizerRunner for
> container_1427226422217_0005_02_01]
> localizer.ResourceLocalizationService
> (ResourceLocalizationService.java:writeCredentials(1029)) - Writing
> credentials to the nmPrivate file
>
> /tmp/hadoop-opintel/nm-local-dir/nmPrivate/container_1427226422217_0005_02_01.tokens.
> Credentials list:
>
> 2015-03-24 17:13:05,481 INFO  [LocalizerRunner for
> container_1427226422217_0005_02_01]
> nodemanager.DefaultContainerExecutor
> (DefaultContainerExecutor.java:createUserCacheDirs(469)) - Initializing
> user opintel
>
> 2015-03-24 17:13:05,492 INFO  [LocalizerRunner for
> container_1427226422217_0005_02_01]
> nodemanager.DefaultContainerExecutor
> (DefaultContainerExecutor.java:startLocalizer(103)) - Copying from
>
> /tmp/hadoop-opintel/nm-local-dir/nmPrivate/container_1427226422217_0005_02_01.tokens
> to
>
> /tmp/hadoop-opintel/nm-local-dir/usercache/opintel/appcache/application_1427226422217_0005/container_1427226422217_0005_02_01.tokens
>
> 2015-03-24 17:13:05,492 INFO  [LocalizerRunner for
> container_1427226422217_0005_02_01]
> nodemanager.DefaultContainerExecutor
> (DefaultContainerExecutor.java:startLocalizer(105)) - CWD set to
>
> /tmp/hadoop-opintel/nm-local-dir/usercache/opintel/appcache/application_1427226422217_0005
> =
>
> file:/tmp/hadoop-opintel/nm-local-dir/usercache/opintel/appcache/application_1427226422217_0005
>
> 2015-03-24 17:13:05,520 INFO  [IPC Server handler 4 on 8040]
> localizer.ResourceLocalizationService
> (ResourceLocalizationService.java:update(928)) - DEBUG: FAILED { http://
> somehost.fake.com/samza/web-log-0.0.1-dist.tar.gz, 0, ARCHIVE, null },
> org/apache/samza/util/Logging
>
> 2015-03-24 17:13:05,520 INFO  [IPC Server handler 4 on 8040]
> localizer.LocalizedResource (LocalizedResource.java:handle(196)) - Resource
> http://somehost.fake.com/samza/web-log-0.0.1-dist.tar.gz transitioned from
> DOWNLOADING to FAILED
>
> 2015-03-24 17:13:05,520 INFO  [AsyncDispatcher event handler]
> container.Container (ContainerImpl.java:handle(884)) - Container
> container_1427226422217_0005_02_01 transitioned from LOCALIZING to
> LOCALIZATION_FAILED
>
> 2015-03-24 17:13:05,521 INFO  [AsyncDispatcher event handler]
> localizer.LocalResourcesTrackerImpl
> (LocalResourcesTrackerImpl.java:handle(137)) - Container
> container_1427226422217_0005_02_01 sent RELEASE event on a resource
> request { http://somehost.fake.com/samza/web-log-0.0

Re: Submitting yarn job with custom properties

2015-03-24 Thread Chris Riccomini
FYI, I haven't actually tried this, but I think it should work. If the
__package/ path doesn't work, try without the __package/ prefix (i.e.
fopen('yourfile.txt')).
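
The lookup order can be sketched as follows. Samza tasks run on the JVM, so
this Python version only illustrates the idea: try __package/ under the
working directory first, then the working directory itself, and parse the
whitespace-separated name/value lines. The function names are hypothetical,
and the file name is the placeholder used above.

```python
import os


def find_packaged_file(filename, base_dir="."):
    """Return the first existing candidate path for filename, or None."""
    candidates = (
        os.path.join(base_dir, "__package", filename),  # unpacked tarball root
        os.path.join(base_dir, filename),               # fallback: plain CWD
    )
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


def parse_pairs(text):
    """Parse 'name value' lines into a dict, skipping blanks and # comments."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        pairs[parts[0]] = parts[1] if len(parts) > 1 else ""
    return pairs
```

A task would call find_packaged_file("yourfile.txt") once at startup, read the
file it returns, and pass the text to parse_pairs.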

On Tue, Mar 24, 2015 at 1:26 PM, Chris Riccomini 
wrote:

> Cool, that should work. In YARN, the .tgz file is unzipped into
> $CWD/__package/, so if you have a file in the root of your tarball, then
> fopen('__package/yourfile.txt') should work.
>
> On Tue, Mar 24, 2015 at 1:22 PM, Shekar Tippur  wrote:
>
>> Thanks Chris. I was trying option #2.
>>
>> - Shekar
>>
>> On Tue, Mar 24, 2015 at 11:09 AM, Chris Riccomini 
>> wrote:
>>
>> > Hey Shekar,
>> >
>> > So you have two properties files? One is the Samza job file, and the
>> other
>> > is one with the format you described above?
>> >
>> > Several options:
>> >
>> > 1. Move these properties into the main Samza job file. You'll then have
>> > access to them via the Config object.
>> > 2. Put the property file into your .tgz file. You'll then have access
>> to it
>> > relative to the current working directory, from your code.
>> > 3. Put it somewhere remote (HDFS, HTTP, etc), and have your job pull it
>> > from there.
>> >
>> > Cheers,
>> > Chris
>> >
>> > On Mon, Mar 23, 2015 at 4:10 PM, Shekar Tippur 
>> wrote:
>> >
>> > > I would like to decouple Samza properties from the custom ones (if
>> > > possible).
>> > >
>> > > - Shekar
>> > >
>> > > On Mon, Mar 23, 2015 at 3:12 PM, Shekar Tippur 
>> > wrote:
>> > >
>> > > > Hello,
>> > > >
>> > > > I have a custom properties file with name value pairs.
>> > > >
>> > > > name1 value1
>> > > > name2 value2
>> > > > name3 value3
>> > > >
>> > > > I want to pass this to Yarn job. What is the best way to achieve
>> this?
>> > > > This works well when we declare locally or thread job.
>> > > >
>> > > > - Shekar
>> > > >
>> > >
>> >
>>
>
>


Re: Submitting yarn job with custom properties

2015-03-24 Thread Chris Riccomini
Cool, that should work. In YARN, the .tgz file is unzipped into
$CWD/__package/, so if you have a file in the root of your tarball, then
fopen('__package/yourfile.txt') should work.

On Tue, Mar 24, 2015 at 1:22 PM, Shekar Tippur  wrote:

> Thanks Chris. I was trying option #2.
>
> - Shekar
>
> On Tue, Mar 24, 2015 at 11:09 AM, Chris Riccomini 
> wrote:
>
> > Hey Shekar,
> >
> > So you have two properties files? One is the Samza job file, and the
> other
> > is one with the format you described above?
> >
> > Several options:
> >
> > 1. Move these properties into the main Samza job file. You'll then have
> > access to them via the Config object.
> > 2. Put the property file into your .tgz file. You'll then have access to
> it
> > relative to the current working directory, from your code.
> > 3. Put it somewhere remote (HDFS, HTTP, etc), and have your job pull it
> > from there.
> >
> > Cheers,
> > Chris
> >
> > On Mon, Mar 23, 2015 at 4:10 PM, Shekar Tippur 
> wrote:
> >
> > > I would like to decouple Samza properties with the custom ones (if
> > > possible).
> > >
> > > - Shekar
> > >
> > > On Mon, Mar 23, 2015 at 3:12 PM, Shekar Tippur 
> > wrote:
> > >
> > > > Hello,
> > > >
> > > > I have a custom properties file with name value pairs.
> > > >
> > > > name1 value1
> > > > name2 value2
> > > > name3 value3
> > > >
> > > > I want to pass this to Yarn job. What is the best way to achieve
> this?
> > > > This works well when we declare locally or thread job.
> > > >
> > > > - Shekar
> > > >
> > >
> >
>


Samza 0.9.0

2015-03-24 Thread Chris Riccomini
Hey all,

We've cut the 0.9.0 release for Samza. Yan is going to run the release.
Expect a [VOTE] email shortly.

Cheers,
Chris


Re: Submitting yarn job with custom properties

2015-03-24 Thread Chris Riccomini
Hey Shekar,

So you have two properties files? One is the Samza job file, and the other
is one with the format you described above?

Several options:

1. Move these properties into the main Samza job file. You'll then have
access to them via the Config object.
2. Put the property file into your .tgz file. You'll then have access to it
relative to the current working directory, from your code.
3. Put it somewhere remote (HDFS, HTTP, etc), and have your job pull it
from there.

Cheers,
Chris

On Mon, Mar 23, 2015 at 4:10 PM, Shekar Tippur  wrote:

> I would like to decouple Samza properties with the custom ones (if
> possible).
>
> - Shekar
>
> On Mon, Mar 23, 2015 at 3:12 PM, Shekar Tippur  wrote:
>
> > Hello,
> >
> > I have a custom properties file with name value pairs.
> >
> > name1 value1
> > name2 value2
> > name3 value3
> >
> > I want to pass this to Yarn job. What is the best way to achieve this?
> > This works well when we declare locally or thread job.
> >
> > - Shekar
> >
>


Re: Review Request 32407: SAMZA-571: add suppression interface for uncaught exceptions

2015-03-23 Thread Chris Riccomini

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32407/#review77470
---


* Nit: 2 space, not 4 space for indentation.
* It's kind of odd to have exceptionHandler.maybeHandle { 
maybeHandle(coordinator, envelope, tryBlock = { ... I had a comment in the RB 
about consolidating this.
* TestTaskInstance.scala has a bunch of red-highlighted (in RB) white space 
indents that should be removed.


samza-api/src/main/java/org/apache/samza/task/ExceptionTask.java
<https://reviews.apache.org/r/32407/#comment125559>

Javadoc



samza-api/src/main/java/org/apache/samza/task/exception/TaskExceptionContext.java
<https://reviews.apache.org/r/32407/#comment125561>

Rather than adding more contexts, I think we should just keep method 
signatures like we have with init() and process():

exception(Exception e, MessageCollector collector, TaskCoordinator 
coordinator)

Also, rather than having a lot of potentially null parameters, what do you 
think about having multiple callbacks in ExceptionTask, each for different 
points in the lifecycle (e.g. a callback with no coordinator for init, a 
callback with no incoming message envelope for window/commit, etc)?

I'm a little concerned about usability if any param can be null at any 
given time. It seems more intuitive to make the callbacks explicit by having 
different params for different points in the lifecycle. Could even have method 
names like: initException(), processException(), etc.
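
To make the suggestion concrete, here is a rough Java sketch of what per-lifecycle callbacks might look like. It is an assumption-laden illustration, not the interface in the patch: the three empty interfaces are stand-ins for the real Samza API types so the sketch compiles on its own, `windowException()` and `RecordingExceptionTask` are hypothetical names, and the exact parameter lists are a guess at "no coordinator for init, no envelope for window/commit".

```java
import java.util.ArrayList;
import java.util.List;

// Stand-ins for the Samza API types, declared here only so the sketch is
// self-contained. The real types live in samza-api.
interface MessageCollector {}
interface TaskCoordinator {}
interface IncomingMessageEnvelope {}

// Hypothetical shape: one callback per lifecycle point, each taking only the
// parameters that exist at that point, so nothing needs to be passed as null.
interface ExceptionTask {
    // init: no coordinator and no envelope are available yet.
    void initException(Exception e);

    // process: the failing envelope is known.
    void processException(Exception e, IncomingMessageEnvelope envelope,
                          MessageCollector collector, TaskCoordinator coordinator);

    // window/commit: there is no incoming envelope.
    void windowException(Exception e, MessageCollector collector,
                         TaskCoordinator coordinator);
}

// Minimal implementation that just records which callback fired.
final class RecordingExceptionTask implements ExceptionTask {
    final List<String> seen = new ArrayList<>();

    @Override
    public void initException(Exception e) {
        seen.add("init: " + e.getMessage());
    }

    @Override
    public void processException(Exception e, IncomingMessageEnvelope envelope,
                                 MessageCollector collector, TaskCoordinator coordinator) {
        seen.add("process: " + e.getMessage());
    }

    @Override
    public void windowException(Exception e, MessageCollector collector,
                                TaskCoordinator coordinator) {
        seen.add("window: " + e.getMessage());
    }
}
```

The trade-off is a wider interface in exchange for signatures that document which inputs exist at each point in the lifecycle.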



samza-core/src/main/scala/org/apache/samza/config/TaskConfig.scala
<https://reviews.apache.org/r/32407/#comment125577>

Is this ever used?



samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala
<https://reviews.apache.org/r/32407/#comment125578>

Seems like white space got out of whack here.



samza-core/src/main/scala/org/apache/samza/container/TaskInstance.scala
<https://reviews.apache.org/r/32407/#comment125579>

It seems like we shouldn't have two of these here. I think 
TaskInstanceExceptionHandler should implement ExceptionTask, and be used when 
isExceptionTask is false.



samza-core/src/main/scala/org/apache/samza/container/TaskInstance.scala
<https://reviews.apache.org/r/32407/#comment125580>

Nit: white space.



samza-core/src/main/scala/org/apache/samza/system/SystemConsumers.scala
<https://reviews.apache.org/r/32407/#comment125581>

Nit: white space


- Chris Riccomini


On March 23, 2015, 5:39 p.m., Yi Pan (Data Infrastructure) wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32407/
> ---
> 
> (Updated March 23, 2015, 5:39 p.m.)
> 
> 
> Review request for samza, Yan Fang, Chinmay Soman, Chris Riccomini, Navina 
> Ramesh, and Naveen Somasundaram.
> 
> 
> Bugs: SAMZA-571
> https://issues.apache.org/jira/browse/SAMZA-571
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> [SAMZA-571] Adding task interface to allow customized handling of exceptions 
> from user code in tasks
> 
> Just to add the first part: add ExceptionTask interface to allow user to add 
> code to handle the exceptions from user code in the task.
> 
> commit is also wrapped w/ the ExceptionTask since some errors such as 
> MessageSizeTooLargeException is not seen when user code calls send() but will 
> be returned by Kafka broker and thrown back in commit()
> 
> 
> Diffs
> -
> 
>   samza-api/src/main/java/org/apache/samza/task/ExceptionTask.java 
> PRE-CREATION 
>   
> samza-api/src/main/java/org/apache/samza/task/exception/TaskExceptionContext.java
>  PRE-CREATION 
>   samza-core/src/main/scala/org/apache/samza/config/TaskConfig.scala 
> 1ca9e2cc5673c537b6a48224809847e94da81fca 
>   samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala 
> 4098235c8c13fad68c8aded3b2a8a4ef07c216e7 
>   samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
> 9fc3b557bdcc2756a0ddfed6642deb529936b7a9 
>   samza-core/src/main/scala/org/apache/samza/container/TaskInstance.scala 
> be0b55ace5b4b9d29f42da17fabac93bb6a25605 
>   samza-core/src/main/scala/org/apache/samza/system/SystemConsumers.scala 
> 125d37602e2c0a9da75674f37580a1ac02f94796 
>   samza-core/src/test/scala/org/apache/samza/container/TestTaskInstance.scala 
> 54b4df84f47f818d62ac0361196567ad1f430fde 
> 
> Diff: https://reviews.apache.org/r/32407/diff/
> 
> 
> Testing
> ---
> 
> Unit test added. Pass with ./gradlew clean build
> 
> 
> Thanks,
> 
> Yi Pan (Data Infrastructure)
> 
>



Re: Message processing stuck after invalid JSON [BUG?!] - Version 0.8.x

2015-03-19 Thread Chris Riccomini
Hey all,

I've reproduced and documented the bug here:

  https://issues.apache.org/jira/browse/SAMZA-608

Cheers,
Chris

On Thu, Mar 19, 2015 at 9:08 AM, Chris Riccomini 
wrote:

> Hey Michael,
>
> Hmm. I checked the SystemConsumers code, and nothing jumped out at me as
> broken. Could you paste your logs somewhere (pastebin/gist) with
> DEBUG-level logging enabled?
>
> Cheers,
> Chris
>
> On Thu, Mar 19, 2015 at 6:52 AM, Michael Strobl <
> michael.str...@gameforge.com> wrote:
>
>> Hey there,
>>
>> I used the config setting "task.drop.deserialization.errors=true" to drop
>> all invalid JSON messages. It seems to work fine in the case of an ongoing
>> message stream. But the system gets stuck when the last message was invalid
>> and the queue is empty. Subsequent messages won't be processed until I
>> kill the task and start it again - then the new messages are processed.
>>
>>
>> Works fine in case of:
>>
>> VALID MESSAGE
>> VALID MESSAGE
>> INVALID MESSAGE
>> VALID MESSAGE
>> VALID MESSAGE
>>
>>
>> But it gets stuck when the last message was invalid:
>>
>> VALID MESSAGE
>> VALID MESSAGE
>> INVALID MESSAGE
>> ... (some time later)...
>> VALID MESSAGE 
>> VALID MESSAGE 
>>
>>
>> Can you please verify this behavior? I am not sure, but the system may get
>> stuck because the method "update" of the SystemConsumers class returns
>> "false" when the queue is empty and the last message produced a serde
>> exception:
>>
>> https://github.com/apache/samza/blob/0.8.1/samza-core/src/main/scala/org/apache/samza/system/SystemConsumers.scala
>>
>> Thanks for helping!
>>
>> Cheers,
>> Michael Strobl
>>
>
>


Re: How to pass quoted string in the configuration

2015-03-19 Thread Chris Riccomini
Hey Jae,

If you're using the PropertiesConfigFactory (the default), then it's just a
Java Properties object. You should be able to escape it. I'm actually not
even sure if quotes are a problem in Java Properties objects. I would think
that they wouldn't be. The second '=' sign might be, though.
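
One way to check the parsing step in isolation is a small Java sketch like the one below. `QuotedPropertyCheck` is a made-up class name, and the sketch assumes the property sits on a single line in the file (a wrapped line would need a trailing backslash). `java.util.Properties` ends the key at the first unescaped `=`, `:`, or whitespace and keeps the rest of the line as the value, so quotes and later `=` signs should come through unchanged. Note that this only exercises `Properties` parsing; it says nothing about how the value survives being handed to the AM or container afterwards.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper for illustration only.
final class QuotedPropertyCheck {
    // Parses a single properties line and returns the value stored under key.
    static String parseValue(String line, String key) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(line));
        return props.getProperty(key);
    }

    public static void main(String[] args) throws IOException {
        // The property from this thread, joined onto one line.
        String line = "filter.map.filter1.property=xpath(\"name\") in "
            + "(\"uiBrowseStartup.ended\", \"subscription.ended\", \"uiStartup.ended\") "
            + "or xpath(\"category\") = \"uiIntent\"";
        // Prints the value exactly as Properties parsed it.
        System.out.println(parseValue(line, "filter.map.filter1.property"));
    }
}
```

If the printed value matches what was typed, the config file itself is probably not the culprit, and the AM container logs are the next place to look.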

In any case, to see what went wrong, you'll have to check your AM container
logs. Can you find those? They're usually linked to from the YARN RM UI.

Cheers,
Chris

On Thu, Mar 19, 2015 at 2:32 PM, Bae, Jae Hyeon  wrote:

> Hi Samza Devs
>
> I want to pass the quoted string like
>
> filter.map.filter1.property=xpath("name") in ("uiBrowseStartup.ended",
> "subscription.ended", "uiStartup.ended") or xpath("category") = "uiIntent"
>
> through the configuration to the container but AM keeps failing
>
> Application application_1423090724595_0045 failed 2 times due to AM
> Container for appattempt_1423090724595_0045_02 exited with exitCode: 1
> due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException:
>  org.apache.hadoop.util.Shell$ExitCodeException:
>  at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
>  at org.apache.hadoop.util.Shell.run(Shell.java:418)
>  at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
>  at
>
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
>
>  at
>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
>
>  at
>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
>
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>
>  at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>
>  at java.lang.Thread.run(Thread.java:745)
>Container exited with a non-zero exit code 1
>  .Failing this attempt.. Failing the application.
>
> Any idea or recommendation?
>
> Thank you
> Best, Jae
>


Re: Review Request 32147: SAMZA-465

2015-03-19 Thread Chris Riccomini
<https://reviews.apache.org/r/32147/#comment124879>

"changelogEntries", since each row results in a new message.



samza-core/src/main/java/org/apache/samza/checkpoint/CheckpointManager.java
<https://reviews.apache.org/r/32147/#comment124882>

Given how similar this class is to ChangelogManager, does it make sense to 
just have one CoordinatorStreamManager that has read/write methods for all of 
the CoordinatorStreamMessage types? Seems like it would allow us to delete a fair 
amount of code.

Or, perhaps we should write a base class, and extend from it on a 
per-message-type basis?



samza-core/src/main/java/org/apache/samza/checkpoint/CheckpointManager.java
<https://reviews.apache.org/r/32147/#comment124863>

Javadocs.



samza-core/src/main/java/org/apache/samza/checkpoint/CheckpointManager.java
<https://reviews.apache.org/r/32147/#comment124881>

Seems like a dupe of what's in ChangelogManager. Think this logic should be 
moved to CoordinatorStreamConsumer, and used in both places. Or maybe moved to 
a shared base class.



samza-core/src/main/java/org/apache/samza/checkpoint/CheckpointManager.java
<https://reviews.apache.org/r/32147/#comment124885>

else we should probably do something nasty like throw an exception.



samza-core/src/main/java/org/apache/samza/job/model/TaskModel.java
<https://reviews.apache.org/r/32147/#comment124934>

Did IntelliJ generate this (and hashCode)?



samza-core/src/main/java/org/apache/samza/serializers/model/SamzaObjectMapper.java
<https://reviews.apache.org/r/32147/#comment124943>

Is this in Util, or does any other code do something similar? Seems like it 
belongs in Util.



samza-core/src/main/java/org/apache/samza/serializers/model/SamzaObjectMapper.java
<https://reviews.apache.org/r/32147/#comment124944>

Same Util question as above.



samza-core/src/main/scala/org/apache/samza/checkpoint/CheckpointTool.scala
<https://reviews.apache.org/r/32147/#comment124982>

Can you make CheckpointTool take a CheckpointManager param, and add an 
apply() method to the companion object, which builds a CheckpointTool object 
given config and newOffsets? That way you can inject CheckpointManagers for 
unit testing rather than having the setCheckpointManager method below.



samza-core/src/main/scala/org/apache/samza/checkpoint/OffsetManager.scala
<https://reviews.apache.org/r/32147/#comment124949>

filter(_._2 != null)



samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala
<https://reviews.apache.org/r/32147/#comment124965>

nit: I think we've been using "Checkpoint" not "CheckPoint" in most places.



samza-test/src/main/config/join/common.properties
<https://reviews.apache.org/r/32147/#comment124989>

Should this be kafka-checkpoints? Also, should we rename that system? 
also^2, do we need two systems anymore?



samza-test/src/main/config/negate-number.properties
<https://reviews.apache.org/r/32147/#comment124988>

Is this required? There's only one system defined.



samza-test/src/main/config/perf/container-performance.properties
<https://reviews.apache.org/r/32147/#comment124990>

Does this matter? Should coordinator stream always start from the oldest 
offset?



samza-test/src/main/python/samza_failure_testing.py
<https://reviews.apache.org/r/32147/#comment124869>

Convention is usually #!/ with no space.

Also, `#!/usr/bin/env python` is usually the preferred location, I think 
(space after env is intentional). See:


http://stackoverflow.com/questions/2429511/why-do-people-write-usr-bin-env-python-on-the-first-line-of-a-python-script


- Chris Riccomini


On March 17, 2015, 11:25 p.m., Naveen Somasundaram wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32147/
> ---
> 
> (Updated March 17, 2015, 11:25 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> SAMZA 465
> 
> 
> Diffs
> -
> 
>   build.gradle 0a268ac7a3819cf46b54a93e0e3171455371456a 
>   samza-api/src/main/java/org/apache/samza/checkpoint/Checkpoint.java 
> 593d11872430100e000c7d4b6edc5ef29649d8d4 
>   samza-api/src/main/java/org/apache/samza/checkpoint/CheckpointManager.java 
> 092cb910b40d312217e86420bf1ddfbaf605e9e5 
>   
> samza-api/src/main/java/org/apache/samza/checkpoint/CheckpointManagerFactory.java
>  a97ff0919d8205928efee1a2a20780659180849d 
>   samza-api/src/main/java/org/apache/samza/container/TaskName.java 
> 083358686fc69ab45bbc73e898f419224ebc3a9f 
>   samza-api/src/main/java/org/apache/samza/system/SystemAdmin.java 
> 8995ba30c

Re: Message processing stuck after invalid JSON [BUG?!] - Version 0.8.x

2015-03-19 Thread Chris Riccomini
Hey Michael,

Hmm. I checked the SystemConsumers code, and nothing jumped out at me as
broken. Could you paste your logs somewhere (pastebin/gist) with
DEBUG-level logging enabled?

Cheers,
Chris

On Thu, Mar 19, 2015 at 6:52 AM, Michael Strobl <
michael.str...@gameforge.com> wrote:

> Hey there,
>
> I used the config setting "task.drop.deserialization.errors=true" to drop
> all invalid JSON messages. It seems to work fine in the case of an ongoing
> message stream. But the system gets stuck when the last message was invalid
> and the queue is empty. Subsequent messages won't be processed until I
> kill the task and start it again - then the new messages are processed.
>
>
> Works fine in case of:
>
> VALID MESSAGE
> VALID MESSAGE
> INVALID MESSAGE
> VALID MESSAGE
> VALID MESSAGE
>
>
> But it gets stuck when the last message was invalid:
>
> VALID MESSAGE
> VALID MESSAGE
> INVALID MESSAGE
> ... (some time later)...
> VALID MESSAGE 
> VALID MESSAGE 
>
>
> Can you please verify this behavior? I am not sure, but the system may get
> stuck because the method "update" of the SystemConsumers class returns
> "false" when the queue is empty and the last message produced a serde
> exception:
>
> https://github.com/apache/samza/blob/0.8.1/samza-core/src/main/scala/org/apache/samza/system/SystemConsumers.scala
>
> Thanks for helping!
>
> Cheers,
> Michael Strobl
>


Re: Kafka topic naming conventions

2015-03-18 Thread Chris Riccomini
Hey Chinmay,

Cool, this is good feedback. I didn't think I was *that* crazy. :)

Cheers,
Chris

On Wed, Mar 18, 2015 at 6:10 PM, Chinmay Soman 
wrote:

> That's what we're doing as well - appending partition count to the kafka
> topic name. This actually helps keep track of the #partitions for each
> topic (since Kafka doesn't have a Metadata store yet).
>
> In case of topic expansion - we actually just resort to creating a new
> topic. Although that is an overhead - the thought process is that this will
> minimize operational errors. Also, this is necessary to do in case we're
> doing some kind of joins.
>
>
> On Wed, Mar 18, 2015 at 5:59 PM, Jakob Homan  wrote:
>
> > On 18 March 2015 at 17:48, Chris Riccomini 
> wrote:
> > > One thing I haven't seen, but might be relevant, is including partition
> > > counts in the topic.
> >
> > Yeah, but then if you change the partition count later on, you've got
> > incorrect information forever. Or you need to create a new stream,
> > which might be a nice forcing function to make sure your join isn't
> > screwed up.  There'd need to be something somewhere to enforce that
> > though.
> >
>
>
>
> --
> Thanks and regards
>
> Chinmay Soman
>


Re: Kafka topic naming conventions

2015-03-18 Thread Chris Riccomini
Hey Jakob,

> Yeah, but then if you change the partition count later on, you've got
> incorrect information forever.

You're right. But IMO this further reinforces that you *can't* change
partition counts on a topic that you're using for a JOIN. This completely
breaks the operation.

Agree that it's just best effort, and kind of hacky. Was just a thought. I
haven't seen anyone actually do this.

Cheers,
Chris

On Wed, Mar 18, 2015 at 5:59 PM, Jakob Homan  wrote:

> On 18 March 2015 at 17:48, Chris Riccomini  wrote:
> > One thing I haven't seen, but might be relevant, is including partition
> > counts in the topic.
>
> Yeah, but then if you change the partition count later on, you've got
> incorrect information forever. Or you need to create a new stream,
> which might be a nice forcing function to make sure your join isn't
> screwed up.  There'd need to be something somewhere to enforce that
> though.
>


Re: Kafka topic naming conventions

2015-03-18 Thread Chris Riccomini
Hey Roger,

We haven't thought about this in great detail. People do all kinds of wacky
things in practice. We have some that are like, "AdViewsByMemberId". There
are various permutations of that.

One thing I haven't seen, but might be relevant, is including partition
counts in the topic. If you're doing joins, you kind of care about both the
join key and partition count.
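
As a rough illustration of that idea, a naming helper could look something like the Java sketch below. Everything in it is an assumption made for the example: the `-p<count>` suffix is an invented convention, `TopicNames` is a hypothetical class, and the check only compares the counts encoded in the names, not what the broker actually reports.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical naming helper; the "-p<count>" suffix is an invented convention.
final class TopicNames {
    private static final Pattern WITH_COUNT = Pattern.compile("^(.+)-p(\\d{1,9})$");

    // Builds a topic name that carries its partition count, e.g. "Foo-p8".
    static String withPartitionCount(String baseName, int partitions) {
        if (partitions <= 0) {
            throw new IllegalArgumentException("partitions must be positive");
        }
        return baseName + "-p" + partitions;
    }

    // Returns the partition count encoded in the name, or -1 if there is none.
    static int partitionCount(String topic) {
        Matcher m = WITH_COUNT.matcher(topic);
        return m.matches() ? Integer.parseInt(m.group(2)) : -1;
    }

    // Two topics are join candidates only if both names carry the same
    // positive partition count.
    static boolean sameEncodedPartitionCount(String left, String right) {
        int count = partitionCount(left);
        return count > 0 && count == partitionCount(right);
    }
}
```

With this convention, `withPartitionCount("AdViewsByMemberId", 8)` gives `AdViewsByMemberId-p8`, and a join against a topic ending in `-p16` would be flagged by `sameEncodedPartitionCount`. The name is only a hint, so it goes stale if partitions are added to an existing topic.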

Sorry I don't have better guidance. :/

Cheers,
Chris

On Wed, Mar 18, 2015 at 5:23 PM, Roger Hoover 
wrote:

> Hi,
>
> Wondering what naming conventions people are using for topics in Kafka.
> When there's re-partitioning involved, you can end up with multiple topics
> that have the exact same data but are partitioned differently.  How do you
> name them?
>
> Thanks,
>
> Roger
>


Re: Example Samza job using Confluent Platform

2015-03-18 Thread Chris Riccomini
Hey Roger,

This is awesome! I've added it to the ecosystem wiki:

  https://cwiki.apache.org/confluence/display/SAMZA/Ecosystem

Cheers,
Chris

On Wed, Mar 18, 2015 at 10:59 AM, Roger Hoover 
wrote:

> Hi all,
>
> In case others find this useful I created a simple Samza job that uses Avro
> + the schema registry from the Confluent Platform.  It's using the pending
> Pull Request to decode to Avro SpecificRecords.
>
> Suggestions for improvement are also welcome.
>
> https://github.com/theduderog/hello-samza-confluent
>
> Cheers,
>
> Roger
>


Re: Hello Samza: yarn symlink problem

2015-03-17 Thread Chris Riccomini
Hey George,

Could you paste the log in plain-text format, rather than image? Apache is
stripping out the image.

Based on your comment, I wonder if this is the problem:


http://superuser.com/questions/584146/why-is-this-archive-failing-to-create-symlinks

What file system are you using? Are you using a virtual machine? What OS?

Cheers,
Chris

On Tue, Mar 17, 2015 at 11:25 AM, Prakash, George 
wrote:

> Hello,
>
>
>
> We are having issues while installing YARN (as part of trying out
>  hello-samza -> http://samza.apache.org/startup/hello-samza/latest/). We
> are repeatedly getting stuck at the tar command (with the message – 'cannot
> create symlink …'). It looks like there is a problem with the tar file?
>
>
>
>
>
>
>
> I'd really appreciate if you could let us know at the earliest.
>
>
>
> Regards
>
> George
>


Re: NoSuchMethodError

2015-03-17 Thread Chris Riccomini
Hey Jordi,

Good catch. Thanks for raising this. I've opened:

  https://issues.apache.org/jira/browse/SAMZA-605

To fix the issue.

Cheers,
Chris

On Tue, Mar 17, 2015 at 8:52 AM, Jordi Blasi Uribarri 
wrote:

> I hope that I am reading the right documentation. In this page
>
> http://samza.apache.org/learn/documentation/latest/jobs/job-runner.html
>
> you can read:
>
> Samza jobs are started using a script called run-job.sh.
>
> samza-example/target/bin/run-job.sh \
>   --config-factory=samza.config.factories.PropertiesConfigFactory \
>   --config-path=file://$PWD/config/hello-world.properties
>
> The way you say it works. Now I have another different problem that I will
> have to check before asking.
>
> Thanks for your help.
>
> Cheers,
>
> Jordi
>
>
> -Original Message-
> From: Chris Riccomini [mailto:criccom...@apache.org]
> Sent: Tuesday, March 17, 2015 16:43
> To: dev@samza.apache.org
> Subject: Re: NoSuchMethodError
>
> Hey Jordi,
>
> PropertiesConfigFactory is located in this package:
> org.apache.samza.config.factories
>
> You have the package set to samza.config.factories. You'll need to set it
> to:
>
>   org.apache.samza.config.factories.PropertiesConfigFactory
>
> Curious where you're getting that value from? We haven't had a "samza.*"
> prefix to packages since we open sourced Samza. What docs are you looking
> at?
>
> Cheers,
> Chris
>
> On Tue, Mar 17, 2015 at 5:45 AM, Jordi Blasi Uribarri 
> wrote:
>
> > After doing what you told me, now I am including all the dependencies
> > in a package. What I am seeing now is another ClassNotFoundException
> > but in this case it does not seem that it is related to external
> > libraries but to Samza itself, as it is referencing the config factory.
> >
> > # bin/run-job.sh
> > --config-factory=samza.config.factories.PropertiesConfigFactory
> > --config-path=file://\$PWD/samzafroga.jar
> > java version "1.7.0_75"
> > OpenJDK Runtime Environment (IcedTea 2.5.4) (7u75-2.5.4-2) OpenJDK
> > 64-Bit Server VM (build 24.75-b04, mixed mode)
> > /usr/lib/jvm/java-7-openjdk-amd64/bin/java
> > -Dlog4j.configuration=file:bin/log4j-console.xml
> > -Dsamza.log.dir=/opt/jobs -Djava.io.tmpdir=/opt/jobs/tmp -Xmx768M
> > -XX:+PrintGCDateStamps -Xloggc:/opt/jobs/gc.log
> > -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
> > -XX:GCLogFileSize=10241024 -d64 -cp
> > /root/.samza/conf:/opt/jobs/lib/samzafroga-0.0.1-SNAPSHOT.jar:/opt/job
> > s/lib/samzafroga-0.0.1-SNAPSHOT-jar-with-dependencies.jar
> > org.apache.samza.job.JobRunner
> > --config-factory=samza.config.factories.PropertiesConfigFactory
> > --config-path=file://$PWD/samzafroga.jar
> > Exception in thread "main" java.lang.ClassNotFoundException:
> > samza.config.factories.PropertiesConfigFactory
> > at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> > at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> > at java.lang.Class.forName0(Native Method)
> > at java.lang.Class.forName(Class.java:191)
> > at
> > org.apache.samza.util.CommandLine.loadConfig(CommandLine.scala:66)
> > at org.apache.samza.job.JobRunner$.main(JobRunner.scala:36)
> > at org.apache.samza.job.JobRunner.main(JobRunner.scala)
> >
> > Do I have to include anything more?
> >
> > Thanks,
> >
> > Jordi
> >
> > >Hey Jordi,
> > >
> > >The stack trace you've pasted suggests that you're missing Scala in
> > >the classpath, or have a different version of Scala in the classpath
> > >than what Samza was compiled with.
> > >
> > >You should not be manually assembling the dependencies for your job.
> > >Your build system should be doing this for you. Please see
> > >hello-samza's
> > pom.xml:
> > >
> > >  https://github.com/apache/samza-hello-samza/blob/master/pom.xml
> > >
> > >For an example of how to do this. Specifically, the "assembly" plugin
> > >in Maven is used to build a .tgz file for your job, which has all of
> > >its required components:
> > >
> >

Re: NoSuchMethodError

2015-03-17 Thread Chris Riccomini
Hey Jordi,

PropertiesConfigFactory is located in this package:
org.apache.samza.config.factories

You have the package set to samza.config.factories. You'll need to set it
to:

  org.apache.samza.config.factories.PropertiesConfigFactory

Curious where you're getting that value from? We haven't had a "samza.*"
prefix to packages since we open sourced Samza. What docs are you looking
at?
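
Since the stack trace shows the factory being loaded by name via Class.forName, a quick way to sanity-check a class name before submitting the job is a tiny Java program like the sketch below. `ClasspathCheck` is a made-up name, and the result depends entirely on the classpath the program is started with, so it should be run with the same `-cp` as the job runner.

```java
// Hypothetical diagnostic helper; not part of Samza.
final class ClasspathCheck {
    // Returns true if the named class can be found on the current classpath.
    // The class is located but not initialized.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass one or more fully qualified class names as arguments.
        for (String name : args) {
            System.out.println(name + " -> " + (isLoadable(name) ? "found" : "NOT found"));
        }
    }
}
```

Running it with both `samza.config.factories.PropertiesConfigFactory` and `org.apache.samza.config.factories.PropertiesConfigFactory` as arguments should show which of the two names the classpath can actually resolve.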

Cheers,
Chris

On Tue, Mar 17, 2015 at 5:45 AM, Jordi Blasi Uribarri 
wrote:

> After doing what you told me, now I am including all the dependencies in a
> package. What I am seeing now is another ClassNotFoundException but in this
> case it does not seem that it is related to external libraries but to Samza
> itself, as it is referencing the config factory.
>
> # bin/run-job.sh
> --config-factory=samza.config.factories.PropertiesConfigFactory
> --config-path=file://\$PWD/samzafroga.jar
> java version "1.7.0_75"
> OpenJDK Runtime Environment (IcedTea 2.5.4) (7u75-2.5.4-2)
> OpenJDK 64-Bit Server VM (build 24.75-b04, mixed mode)
> /usr/lib/jvm/java-7-openjdk-amd64/bin/java
> -Dlog4j.configuration=file:bin/log4j-console.xml -Dsamza.log.dir=/opt/jobs
> -Djava.io.tmpdir=/opt/jobs/tmp -Xmx768M -XX:+PrintGCDateStamps
> -Xloggc:/opt/jobs/gc.log -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10241024 -d64 -cp
> /root/.samza/conf:/opt/jobs/lib/samzafroga-0.0.1-SNAPSHOT.jar:/opt/jobs/lib/samzafroga-0.0.1-SNAPSHOT-jar-with-dependencies.jar
> org.apache.samza.job.JobRunner
> --config-factory=samza.config.factories.PropertiesConfigFactory
> --config-path=file://$PWD/samzafroga.jar
> Exception in thread "main" java.lang.ClassNotFoundException:
> samza.config.factories.PropertiesConfigFactory
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:191)
> at
> org.apache.samza.util.CommandLine.loadConfig(CommandLine.scala:66)
> at org.apache.samza.job.JobRunner$.main(JobRunner.scala:36)
> at org.apache.samza.job.JobRunner.main(JobRunner.scala)
>
> Do I have to include anything more?
>
> Thanks,
>
> Jordi
>
> >Hey Jordi,
> >
> >The stack trace you've pasted suggests that you're missing Scala in the
> >classpath, or have a different version of Scala in the classpath than what
> >Samza was compiled with.
> >
> >You should not be manually assembling the dependencies for your job. Your
> >build system should be doing this for you. Please see hello-samza's
> pom.xml:
> >
> >  https://github.com/apache/samza-hello-samza/blob/master/pom.xml
> >
> >For an example of how to do this. Specifically, the "assembly" plugin in
> >Maven is used to build a .tgz file for your job, which has all of its
> >required components:
> >
> >  http://maven.apache.org/plugins/maven-assembly-plugin/
> >
> >If you're not using Maven, Gradle and SBT can both assemble .tgz files as
> >well.
> >
> >Cheers,
> >Chris
> >
> >On Mon, Mar 16, 2015 at 4:11 AM, Jordi Blasi Uribarri 
> >wrote:
> >
> >> Hello,
> >>
> >> I am new to Samza and I am trying to test it. I have not found much
> >> documentation and I am not sure if this is the correct place for this
> kind
> >> of questions. Please let me know if I am in the wrong place. I have
> tried
> >> to follow the documentation but I guess I missed something or  did
> >> something wrong.
> >>
> >> I have installed a clean debian box and followed the instructions to
> >> download and build from git.
> >>
> >> git clone http://git-wip-us.apache.org/repos/asf/samza.git
> >> cd samza
> >> ./gradlew clean build
> >>
> >> I have also installed Scala (2.9.2) and the Java 7 JDK and JRE.
> >>
> >> I have created a simple job in java and I am trying to run it but I am
> >> seeing some java dependencies problems when I try to run both run-job.sh
> >> and run-am.sh scripts.
> >>
> >> What I have done is create a folder for the jobs in /opt/jobs. There I
> >> have created a bin folder for the scripts and a lib folder for all the
> jars
> >> that I find that are required (as I have seen in the script that this
> the
> >> place where they are obtained). I have copied there all the jar
> contained
> >> in the samza folders and the ones I have obtained from a hadoop-2.6.0
> >> instalation package. Some of the dependencies have been solved but I am
> >> stuck in the following error when I run run-am.sh:
> >>
> >> Exception in thread "main" java.lang.NoSuchMethodError:
> >> scala.Predef$.augmentString(Ljava/lang/String;)Ljava/lang/String;
> >> at
> >>
> org

Review Request 32127: SAMZA-586

2015-03-17 Thread Chris Riccomini

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32127/
---

Review request for samza.


Repository: samza


Description
---

fix compression.type in kafka checkpoint manager. add test for kafka system 
factory.


add injected props for producer to disable compression


Diffs
-

  samza-core/src/main/scala/org/apache/samza/config/StorageConfig.scala 
f977b8bd1f20386175d0dd79a9844f1b5394f84c 
  samza-core/src/test/scala/org/apache/samza/config/TestStorageConfig.scala 
PRE-CREATION 
  
samza-kafka/src/main/scala/org/apache/samza/checkpoint/kafka/KafkaCheckpointManagerFactory.scala
 7fc6d89b5d703a7c10a212aaa8d3f9231996b897 
  
samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemFactory.scala
 4f15002325bc0154991f9a35312e903d15ef81e7 
  
samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestKafkaSystemFactory.scala
 5f65144146109418b4ec2ac40b3a69cb97f26224 

Diff: https://reviews.apache.org/r/32127/diff/


Testing
---


Thanks,

Chris Riccomini



Re: NoSuchMethodError

2015-03-16 Thread Chris Riccomini
Hey Jordi,

The stack trace you've pasted suggests that you're missing Scala in the
classpath, or have a different version of Scala in the classpath than what
Samza was compiled with.

You should not be manually assembling the dependencies for your job. Your
build system should be doing this for you. Please see hello-samza's pom.xml:

  https://github.com/apache/samza-hello-samza/blob/master/pom.xml

for an example of how to do this. Specifically, the "assembly" plugin in
Maven is used to build a .tgz file for your job, which has all of its
required components:

  http://maven.apache.org/plugins/maven-assembly-plugin/

If you're not using Maven, Gradle and SBT can both assemble .tgz files as
well.
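
As a quick sanity check for this kind of mismatch, here is a minimal Python sketch that groups the jars in a lib directory by the Scala binary version their names imply. The regular expressions, the function names, and the example jar list are illustrative assumptions, not part of Samza; the sketch relies only on the common convention of a `_2.x` suffix in Scala artifact names.

```python
import re
from collections import defaultdict

# Matches e.g. "scala-library-2.10.4.jar" and captures "2.10".
SCALA_LIBRARY = re.compile(r"^scala-library-(\d+\.\d+)(?:\.\d+)?.*\.jar$")
# Matches the Scala binary-version suffix in names such as
# "samza-core_2.10-0.8.0.jar" and captures "2.10".
SCALA_SUFFIX = re.compile(r"^.+_(\d+\.\d+)-.*\.jar$")


def scala_versions(jar_names):
    """Group jar file names by the Scala binary version they imply."""
    found = defaultdict(list)
    for name in jar_names:
        match = SCALA_LIBRARY.match(name) or SCALA_SUFFIX.match(name)
        if match:
            found[match.group(1)].append(name)
    return dict(found)


def has_conflict(jar_names):
    """True when the jars imply more than one Scala binary version."""
    return len(scala_versions(jar_names)) > 1


# Hypothetical contents of a job's lib/ directory (an assumption for illustration).
example_lib = [
    "samza-core_2.10-0.8.0.jar",
    "samza-yarn_2.10-0.8.0.jar",
    "scala-library-2.9.2.jar",
    "hadoop-common-2.6.0.jar",
]
```

If `has_conflict(example_lib)` is true, the lib directory mixes jars built for different Scala versions, which is one way to end up with a NoSuchMethodError like the one above.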

Cheers,
Chris

On Mon, Mar 16, 2015 at 4:11 AM, Jordi Blasi Uribarri 
wrote:

> Hello,
>
> I am new to Samza and I am trying to test it. I have not found much
> documentation and I am not sure if this is the correct place for this kind
> of question. Please let me know if I am in the wrong place. I have tried
> to follow the documentation but I guess I missed something or did
> something wrong.
>
> I have installed a clean debian box and followed the instructions to
> download and build from git.
>
> git clone http://git-wip-us.apache.org/repos/asf/samza.git
> cd samza
> ./gradlew clean build
>
> I have also installed Scala (2.9.2) and the Java 7 JDK and JRE.
>
> I have created a simple job in Java and I am trying to run it, but I am
> seeing some Java dependency problems when I try to run both the run-job.sh
> and run-am.sh scripts.
>
> What I have done is create a folder for the jobs in /opt/jobs. There I
> have created a bin folder for the scripts and a lib folder for all the jars
> that I find are required (as I have seen in the script that this is the
> place where they are obtained). I have copied there all the jars contained
> in the samza folders and the ones I have obtained from a hadoop-2.6.0
> installation package. Some of the dependencies have been resolved but I am
> stuck on the following error when I run run-am.sh:
>
> Exception in thread "main" java.lang.NoSuchMethodError:
> scala.Predef$.augmentString(Ljava/lang/String;)Ljava/lang/String;
> at org.apache.samza.job.yarn.SamzaAppMaster$$anonfun$main$3.apply(SamzaAppMaster.scala:63)
> at org.apache.samza.job.yarn.SamzaAppMaster$$anonfun$main$3.apply(SamzaAppMaster.scala:63)
> at org.apache.samza.util.Logging$class.info(Logging.scala:55)
> at org.apache.samza.job.yarn.SamzaAppMaster$.info(SamzaAppMaster.scala:55)
> at org.apache.samza.job.yarn.SamzaAppMaster$.main(SamzaAppMaster.scala:63)
> at org.apache.samza.job.yarn.SamzaAppMaster.main(SamzaAppMaster.scala)
>
> What am I missing?
>
> As a more general question, I am having quite a hard time compiling the
> dependencies. Is there a reference of the jar files needed for the jobs and
> scripts to run correctly?
>
> thanks for your help,
>
> Jordi
> 
> Jordi Blasi Uribarri
> Área I+D+i
>
> jbl...@nextel.es
> Oficina Bilbao
>


Re: Review Request 32052: SAMZA-592

2015-03-13 Thread Chris Riccomini

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32052/
---

(Updated March 13, 2015, 9:33 p.m.)


Review request for samza.


Bugs: SAMZA-592
https://issues.apache.org/jira/browse/SAMZA-592


Repository: samza


Description (updated)
---

updating based on Ewen's feedback


minor nit formatting fix for import


refresh topic metadata if partitions have bad error codes. add a test


add a little test to verify we ignore replica not available exceptions


remove partition metadata check from KafkaSystemAdmin since it's already done 
in getOffsets


switch to KafkaUtil.maybeThrowException


Diffs (updated)
-

  
samza-kafka/src/main/scala/org/apache/samza/checkpoint/kafka/KafkaCheckpointManager.scala
 4a1b31f025ba7b05a7b46041aa8e12074599ce24 
  samza-kafka/src/main/scala/org/apache/samza/system/kafka/BrokerProxy.scala 
c6e231a2588ce95940aa2da9483a98c6115e38d9 
  samza-kafka/src/main/scala/org/apache/samza/system/kafka/GetOffset.scala 
147aabc947f0cb01c0780edb693e9714f810b5f6 
  
samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemAdmin.scala 
b790be17cfe08da28220ffb381cbd618ebe25cf0 
  
samza-kafka/src/main/scala/org/apache/samza/system/kafka/TopicMetadataCache.scala
 4a49d22a3fc403f624ca17a6414d84eaba1898be 
  samza-kafka/src/main/scala/org/apache/samza/util/KafkaUtil.scala 
2482f23cc6b9c072651df9cbfe9714ffeb203687 
  
samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestKafkaSystemAdmin.scala
 3d1e6ecbb3fd95816c722a68c4f5907120eb20d0 
  
samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestTopicMetadataCache.scala
 e698d2f1f004740a4d74a488c469d8ca8426c6e4 
  samza-kafka/src/test/scala/org/apache/samza/utils/TestKafkaUtil.scala 
PRE-CREATION 
  
samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala
 a8b724bf781003142e455fdf1fed2f13d6c18353 

Diff: https://reviews.apache.org/r/32052/diff/


Testing
---


Thanks,

Chris Riccomini



Re: Review Request 32052: SAMZA-592

2015-03-13 Thread Chris Riccomini

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32052/
---

(Updated March 13, 2015, 8:48 p.m.)


Review request for samza.


Bugs: SAMZA-592
https://issues.apache.org/jira/browse/SAMZA-592


Repository: samza


Description (updated)
---

minor nit formatting fix for import


refresh topic metadata if partitions have bad error codes. add a test


add a little test to verify we ignore replica not available exceptions


remove partition metadata check from KafkaSystemAdmin since it's already done 
in getOffsets


switch to KafkaUtil.maybeThrowException


Diffs (updated)
-

  
samza-kafka/src/main/scala/org/apache/samza/checkpoint/kafka/KafkaCheckpointManager.scala
 4a1b31f025ba7b05a7b46041aa8e12074599ce24 
  samza-kafka/src/main/scala/org/apache/samza/system/kafka/BrokerProxy.scala 
c6e231a2588ce95940aa2da9483a98c6115e38d9 
  samza-kafka/src/main/scala/org/apache/samza/system/kafka/GetOffset.scala 
147aabc947f0cb01c0780edb693e9714f810b5f6 
  
samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemAdmin.scala 
b790be17cfe08da28220ffb381cbd618ebe25cf0 
  
samza-kafka/src/main/scala/org/apache/samza/system/kafka/TopicMetadataCache.scala
 4a49d22a3fc403f624ca17a6414d84eaba1898be 
  samza-kafka/src/main/scala/org/apache/samza/util/KafkaUtil.scala 
2482f23cc6b9c072651df9cbfe9714ffeb203687 
  
samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestKafkaSystemAdmin.scala
 3d1e6ecbb3fd95816c722a68c4f5907120eb20d0 
  
samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestTopicMetadataCache.scala
 e698d2f1f004740a4d74a488c469d8ca8426c6e4 
  samza-kafka/src/test/scala/org/apache/samza/utils/TestKafkaUtil.scala 
PRE-CREATION 
  
samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala
 a8b724bf781003142e455fdf1fed2f13d6c18353 

Diff: https://reviews.apache.org/r/32052/diff/


Testing
---


Thanks,

Chris Riccomini



Re: Review Request 32052: SAMZA-592

2015-03-13 Thread Chris Riccomini

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32052/#review76431
---



samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemAdmin.scala
<https://reviews.apache.org/r/32052/#comment123974>

> 1. What would happen to the cache entry if we don't check partitionMetadata.errorCode?

If we don't check partitionMetadata.errorCode *anywhere*, then any 
partition-level errors will get triggered when getOffsets is called. For 
example, if a leader is not available, getOffsets' OffsetFetchRequest will have 
an errorCode set that will say the leader isn't available.

We're basically putting off the error check until after we've made the 
request we care about (e.g. produce request, offset fetch request, etc).

> 2. Assuming that the errored partition metadata is not inserted in the cache, getOffsets would raise an exception? And how do we capture that case?

Yes, getOffsets calls:

  .getOffsetsBefore(new OffsetRequest(partitionOffsetInfo))

And then checks the results:

  KafkaUtil.maybeThrowException(partitionErrorAndOffset.error)

Any bad errorCodes from the metadata will also get surfaced here (e.g. 
unknown topic/partition, leader offline, etc).

If there is a failure, the maybeThrowException method will throw an 
exception, which will be caught in KafkaSystemAdmin.getSystemStreamMetadata. 
The retryBackoff loop will then catch the exception, and restart from scratch.
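
To illustrate the deferred check and retry described above, here is a minimal Python sketch. It is not the Samza implementation: `maybe_throw_exception`, `retry_backoff`, and `make_get_offsets` are hypothetical stand-ins, and the doubling delay is an assumption. The numeric values are the standard Kafka protocol error codes.

```python
import time

# Kafka protocol error codes relevant to this discussion.
NO_ERROR = 0
UNKNOWN_TOPIC_OR_PARTITION = 3
LEADER_NOT_AVAILABLE = 5
REPLICA_NOT_AVAILABLE = 9


class KafkaError(Exception):
    """Raised for error codes that should trigger a retry."""


def maybe_throw_exception(error_code):
    """Raise unless the code is benign (no error, or a replica being down)."""
    if error_code not in (NO_ERROR, REPLICA_NOT_AVAILABLE):
        raise KafkaError("bad error code: %d" % error_code)


def retry_backoff(operation, max_attempts=5, initial_delay=0.01, sleep=time.sleep):
    """Run operation, restarting from scratch with doubling delays on KafkaError."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except KafkaError:
            if attempt == max_attempts:
                raise
            sleep(delay)
            delay *= 2


def make_get_offsets(error_codes):
    """Hypothetical stand-in for getOffsets: consumes one error code per call."""
    codes = iter(error_codes)

    def get_offsets():
        # The error check happens only after the request we care about.
        maybe_throw_exception(next(codes))
        return {"partition-0": "42"}

    return get_offsets
```

Calling `retry_backoff(make_get_offsets([LEADER_NOT_AVAILABLE, NO_ERROR]))` fails once, waits, and then succeeds on the second attempt, which mirrors the "catch and restart from scratch" flow.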


- Chris Riccomini


On March 13, 2015, 7:56 p.m., Chris Riccomini wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32052/
> ---
> 
> (Updated March 13, 2015, 7:56 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-592
> https://issues.apache.org/jira/browse/SAMZA-592
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> refresh topic metadata if partitions have bad error codes. add a test
> 
> 
> add a little test to verify we ignore replica not available exceptions
> 
> 
> remove partition metadata check from KafkaSystemAdmin since it's already done 
> in getOffsets
> 
> 
> switch to KafkaUtil.maybeThrowException
> 
> 
> Diffs
> -
> 
>   
> samza-kafka/src/main/scala/org/apache/samza/checkpoint/kafka/KafkaCheckpointManager.scala
>  4a1b31f025ba7b05a7b46041aa8e12074599ce24 
>   samza-kafka/src/main/scala/org/apache/samza/system/kafka/BrokerProxy.scala 
> c6e231a2588ce95940aa2da9483a98c6115e38d9 
>   samza-kafka/src/main/scala/org/apache/samza/system/kafka/GetOffset.scala 
> 147aabc947f0cb01c0780edb693e9714f810b5f6 
>   
> samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemAdmin.scala
>  b790be17cfe08da28220ffb381cbd618ebe25cf0 
>   
> samza-kafka/src/main/scala/org/apache/samza/system/kafka/TopicMetadataCache.scala
>  4a49d22a3fc403f624ca17a6414d84eaba1898be 
>   samza-kafka/src/main/scala/org/apache/samza/util/KafkaUtil.scala 
> 2482f23cc6b9c072651df9cbfe9714ffeb203687 
>   
> samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestKafkaSystemAdmin.scala
>  3d1e6ecbb3fd95816c722a68c4f5907120eb20d0 
>   
> samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestTopicMetadataCache.scala
>  e698d2f1f004740a4d74a488c469d8ca8426c6e4 
>   samza-kafka/src/test/scala/org/apache/samza/utils/TestKafkaUtil.scala 
> PRE-CREATION 
>   
> samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala
>  a8b724bf781003142e455fdf1fed2f13d6c18353 
> 
> Diff: https://reviews.apache.org/r/32052/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Chris Riccomini
> 
>



Re: Review Request 32006: SAMZA-597

2015-03-13 Thread Chris Riccomini

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32006/
---

(Updated March 13, 2015, 8:39 p.m.)


Review request for samza.


Repository: samza


Description (updated)
---

fixing yi's feedback


add docs


fail if containerName is not set.


update tests


move location enabled config to log4j config file


fix MDC link in docs


add a logging event json serde for log4j, and set it as default.


default to log4j string serde for now


Diffs (updated)
-

  build.gradle 08583e07f1c0bda88433bacb59bc2fd9ef6ce310 
  docs/learn/documentation/versioned/jobs/configuration-table.html 
ec1287418042b95df73ff7c36a684d3123c46372 
  docs/learn/documentation/versioned/jobs/logging.md 
af2fd0ea6929230cdc6bc3c51d9ae62adacb55fa 
  samza-log4j/src/main/java/org/apache/samza/config/Log4jSystemConfig.java 
107ddf0c3d4e0f584a2f68a23debbada5f68dcb8 
  samza-log4j/src/main/java/org/apache/samza/logging/log4j/StreamAppender.java 
4ef3551f470e77e27bd156e81ce96486f25c21bf 
  
samza-log4j/src/main/java/org/apache/samza/logging/log4j/serializers/LoggingEventJsonSerde.java
 PRE-CREATION 
  
samza-log4j/src/main/java/org/apache/samza/logging/log4j/serializers/LoggingEventJsonSerdeFactory.java
 PRE-CREATION 
  samza-log4j/src/test/java/org/apache/samza/config/TestLog4jSystemConfig.java 
16ccb459892f62245648235eb65f53b26e8ecb87 
  
samza-log4j/src/test/java/org/apache/samza/logging/log4j/TestStreamAppender.java
 3e4ddc9c72868e22f993f60015224cd3a153266c 

Diff: https://reviews.apache.org/r/32006/diff/


Testing
---


Thanks,

Chris Riccomini



Re: Review Request 32006: SAMZA-597

2015-03-13 Thread Chris Riccomini


> On March 13, 2015, 5:31 p.m., Yi Pan (Data Infrastructure) wrote:
> > samza-log4j/src/main/java/org/apache/samza/config/Log4jSystemConfig.java, 
> > line 54
> > <https://reviews.apache.org/r/32006/diff/1/?file=892561#file892561line54>
> >
> > It seems that the default is set to false. Just curious: don't we 
> > always want to see the file location info when using StreamAppender? Or is 
> > it included somewhere else?

I am following the lead of logstash's appender. I also think it makes sense to 
disable by default. It increases message payload by 10%-20%. We could 
definitely default to true, but in practice, I haven't seen this used much.
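
For a rough sense of where the extra payload comes from, here is a minimal Python sketch of a JSON log-event serializer with an optional location block. The field names and the sample event are assumptions for illustration and do not reflect the actual output format of LoggingEventJsonSerde.

```python
import json


def serialize_event(event, include_location=False):
    """Serialize a log event dict to JSON, optionally with source location."""
    payload = {
        "timestamp": event["timestamp"],
        "level": event["level"],
        "logger": event["logger"],
        "message": event["message"],
    }
    if include_location:
        # The location fields are what make each message larger.
        payload["location"] = {
            "file": event["file"],
            "line": event["line"],
            "method": event["method"],
        }
    return json.dumps(payload, sort_keys=True)


# Hypothetical event; the field names are illustrative only.
event = {
    "timestamp": 1426267860000,
    "level": "INFO",
    "logger": "org.apache.samza.job.yarn.SamzaAppMaster",
    "message": "got container id",
    "file": "SamzaAppMaster.scala",
    "line": 63,
    "method": "main",
}

without_location = serialize_event(event)
with_location = serialize_event(event, include_location=True)
overhead = len(with_location) - len(without_location)
```

Comparing `overhead` against `len(without_location)` on real messages would show how much the location block costs for a given workload.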


> On March 13, 2015, 5:31 p.m., Yi Pan (Data Infrastructure) wrote:
> > samza-log4j/src/main/java/org/apache/samza/config/Log4jSystemConfig.java, 
> > line 102
> > <https://reviews.apache.org/r/32006/diff/1/?file=892561#file892561line102>
> >
> > It seems that the default return value of this method has changed and 
> > no longer defaults to LoggingEventStringSerdeFactory. It 
> > would be better to remove this Javadoc description here.

Done.


> On March 13, 2015, 5:31 p.m., Yi Pan (Data Infrastructure) wrote:
> > samza-log4j/src/main/java/org/apache/samza/logging/log4j/serializers/LoggingEventJsonSerde.java,
> >  line 53
> > <https://reviews.apache.org/r/32006/diff/1/?file=892563#file892563line53>
> >
> > nit: Serde

Done.


- Chris


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32006/#review76381
---


On March 13, 2015, 12:57 a.m., Chris Riccomini wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32006/
> ---
> 
> (Updated March 13, 2015, 12:57 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> add docs
> 
> 
> fail if containerName is not set.
> 
> 
> update tests
> 
> 
> move location enabled config to log4j config file
> 
> 
> fix MDC link in docs
> 
> 
> add a logging event json serde for log4j, and set it as default.
> 
> 
> default to log4j string serde for now
> 
> 
> Diffs
> -
> 
>   build.gradle 08583e07f1c0bda88433bacb59bc2fd9ef6ce310 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> ec1287418042b95df73ff7c36a684d3123c46372 
>   docs/learn/documentation/versioned/jobs/logging.md 
> af2fd0ea6929230cdc6bc3c51d9ae62adacb55fa 
>   samza-log4j/src/main/java/org/apache/samza/config/Log4jSystemConfig.java 
> 107ddf0c3d4e0f584a2f68a23debbada5f68dcb8 
>   
> samza-log4j/src/main/java/org/apache/samza/logging/log4j/StreamAppender.java 
> 4ef3551f470e77e27bd156e81ce96486f25c21bf 
>   
> samza-log4j/src/main/java/org/apache/samza/logging/log4j/serializers/LoggingEventJsonSerde.java
>  PRE-CREATION 
>   
> samza-log4j/src/main/java/org/apache/samza/logging/log4j/serializers/LoggingEventJsonSerdeFactory.java
>  PRE-CREATION 
>   
> samza-log4j/src/test/java/org/apache/samza/config/TestLog4jSystemConfig.java 
> 16ccb459892f62245648235eb65f53b26e8ecb87 
>   
> samza-log4j/src/test/java/org/apache/samza/logging/log4j/TestStreamAppender.java
>  3e4ddc9c72868e22f993f60015224cd3a153266c 
> 
> Diff: https://reviews.apache.org/r/32006/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Chris Riccomini
> 
>



Re: Samza on Yarn

2015-03-13 Thread Chris Riccomini
Hey Shekar,

Awesome, thanks! Would love to get any doc updates that would be useful.

Curious: what was wrong?

Cheers,
Chris

On Fri, Mar 13, 2015 at 1:00 PM, Shekar Tippur  wrote:

> Thanks for your help Chris. Got it to work now. I will test my case and
> documentation further. I can edit the Samza documentation to reflect any
> changes.
>
> - Shekar
>
> On Thu, Mar 12, 2015 at 5:19 PM, Chris Riccomini 
> wrote:
>
> > Hey Shekar,
> >
> > Yes, this is definitely a classpath issue. The pastebin you sent does not
> > include any of the samza-core/samza-yarn/scala JARs. This is rather
> > strange, since you said you put the JARs in this path:
> >
> >   /home/hadoop/hadoop-2.5.2/share/hadoop/hdfs/lib/
> >
> > And I do see *other* JARs listed with this path. Are you sure you put the
> > Samza JARs on *all* machines, not just the RM machine? The yarn-default.xml
> > description of yarn.application.classpath says:
> >
> > CLASSPATH for YARN applications. A comma-separated list of CLASSPATH
> > entries. When this value is empty, the following default CLASSPATH for YARN
> > applications would be used. For Linux: $HADOOP_CONF_DIR,
> > $HADOOP_COMMON_HOME/share/hadoop/common/*,
> > $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
> > $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
> > $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
> > $HADOOP_YARN_HOME/share/hadoop/yarn/*,
> > $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
> >
> > So, it seems like it should pick up the JARs, if they're in the NM's
> > directory.
> >
> > The exception that you're now seeing seems to suggest that one of the Samza
> > containers is failing:
> >
> > Container for appattempt_1426204312971_0001_02 exited with exitCode: 1
> >
> > The _02 suffix indicates a non-AM failure (i.e. the Samza container
> > failed, not the Samza AM). Can you check the AM logs, and find the http://...
> > link to the container logs? It should give a hint about why the container
> > failed.
> >
> > Cheers,
> > Chris
> >
> > On Thu, Mar 12, 2015 at 4:58 PM, Shekar Tippur 
> wrote:
> >
> > > Chris,
> > >
> > > Made some progress.
> > >
> > > By adding yarn.application.classpath to yarn-site.xml, I am no longer
> > > getting class not found error. However, I am getting a different error:
> > >
> > > Application application_1426204312971_0001 failed 2 times due to AM
> > > Container for appattempt_1426204312971_0001_02 exited with exitCode: 1
> > > due to: Exception from container-launch: ExitCodeException exitCode=1:
> > > ExitCodeException exitCode=1:
> > > at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
> > > at org.apache.hadoop.util.Shell.run(Shell.java:455)
> > > at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
> > > at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> > > at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
> > > at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
> > > at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > at java.lang.Thread.run(Thread.java:745)
> > > Container exited with a non-zero exit code 1
> > > .Failing this attempt.. Failing the application.
> > >
> > > Looks like a common issue with yarn but not sure how to resolve as yet.
> > >
> > >
> > > - Shekar
> > >
> > > On Thu, Mar 12, 2015 at 1:44 PM, Shekar Tippur 
> > wrote:
> > >
> > > > Chris - Here it is.
> > > >
> > > > http://pastebin.com/c3e21Hzf
> > > >
> > > > - Shekar
> > > >
> > > > On Thu, Mar 12, 2015 at 10:58 AM, Chris Riccomini <criccom...@apache.org>
> > > > wrote:
> > > >
> > > >> This is the line that I'm interested in:
> > > >>

Review Request 32052: SAMZA-592

2015-03-13 Thread Chris Riccomini

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32052/
---

Review request for samza.


Bugs: SAMZA-592
https://issues.apache.org/jira/browse/SAMZA-592


Repository: samza


Description
---

refresh topic metadata if partitions have bad error codes. add a test


add a little test to verify we ignore replica not available exceptions


remove partition metadata check from KafkaSystemAdmin since it's already done 
in getOffsets


switch to KafkaUtil.maybeThrowException


Diffs
-

  
samza-kafka/src/main/scala/org/apache/samza/checkpoint/kafka/KafkaCheckpointManager.scala
 4a1b31f025ba7b05a7b46041aa8e12074599ce24 
  samza-kafka/src/main/scala/org/apache/samza/system/kafka/BrokerProxy.scala 
c6e231a2588ce95940aa2da9483a98c6115e38d9 
  samza-kafka/src/main/scala/org/apache/samza/system/kafka/GetOffset.scala 
147aabc947f0cb01c0780edb693e9714f810b5f6 
  
samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemAdmin.scala 
b790be17cfe08da28220ffb381cbd618ebe25cf0 
  
samza-kafka/src/main/scala/org/apache/samza/system/kafka/TopicMetadataCache.scala
 4a49d22a3fc403f624ca17a6414d84eaba1898be 
  samza-kafka/src/main/scala/org/apache/samza/util/KafkaUtil.scala 
2482f23cc6b9c072651df9cbfe9714ffeb203687 
  
samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestKafkaSystemAdmin.scala
 3d1e6ecbb3fd95816c722a68c4f5907120eb20d0 
  
samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestTopicMetadataCache.scala
 e698d2f1f004740a4d74a488c469d8ca8426c6e4 
  samza-kafka/src/test/scala/org/apache/samza/utils/TestKafkaUtil.scala 
PRE-CREATION 
  
samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala
 a8b724bf781003142e455fdf1fed2f13d6c18353 

Diff: https://reviews.apache.org/r/32052/diff/


Testing
---


Thanks,

Chris Riccomini



Re: Review Request 32006: SAMZA-597

2015-03-13 Thread Chris Riccomini


> On March 13, 2015, 6:24 p.m., Yan Fang wrote:
> > samza-log4j/src/main/java/org/apache/samza/config/Log4jSystemConfig.java, 
> > line 109
> > <https://reviews.apache.org/r/32006/diff/1/?file=892561#file892561line109>
> >
> > what is the reason of getting rid of the system serde?

I just didn't think it made any sense. Since the serdes for StreamAppender 
always handle log4j logging events, the probability that you'd have a system 
configured with a serde for that type seemed to be zero. You can still 
override the log topic's serde using the systems.%s.streams.%s.samza.msg.serde 
(stream-level) serde config.
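
As a sketch of that stream-level override, the Python snippet below builds the two config entries involved. The system name, stream name, and serde name are hypothetical; the factory class name is taken from the file path in this review, and `serializers.registry.%s.class` is the usual Samza key for registering a serde.

```python
def stream_serde_overrides(system, stream, serde_name, serde_factory_class):
    """Build config entries that set a stream-level message serde."""
    return {
        # Register the serde factory under a short name.
        "serializers.registry.%s.class" % serde_name: serde_factory_class,
        # Point the stream's message serde at that name.
        "systems.%s.streams.%s.samza.msg.serde" % (system, stream): serde_name,
    }


# Hypothetical system, stream, and serde names (assumptions for illustration).
overrides = stream_serde_overrides(
    system="kafka",
    stream="my-job-logs",
    serde_name="log-json",
    serde_factory_class=(
        "org.apache.samza.logging.log4j.serializers.LoggingEventJsonSerdeFactory"
    ),
)

# Render the entries in .properties form, one key=value pair per line.
properties_text = "\n".join("%s=%s" % item for item in sorted(overrides.items()))
```

Printing `properties_text` gives two lines that could be pasted into a job's properties file once the placeholder names are replaced with real ones.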


- Chris


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32006/#review76396
---


On March 13, 2015, 12:57 a.m., Chris Riccomini wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32006/
> ---
> 
> (Updated March 13, 2015, 12:57 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> add docs
> 
> 
> fail if containerName is not set.
> 
> 
> update tests
> 
> 
> move location enabled config to log4j config file
> 
> 
> fix MDC link in docs
> 
> 
> add a logging event json serde for log4j, and set it as default.
> 
> 
> default to log4j string serde for now
> 
> 
> Diffs
> -
> 
>   build.gradle 08583e07f1c0bda88433bacb59bc2fd9ef6ce310 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> ec1287418042b95df73ff7c36a684d3123c46372 
>   docs/learn/documentation/versioned/jobs/logging.md 
> af2fd0ea6929230cdc6bc3c51d9ae62adacb55fa 
>   samza-log4j/src/main/java/org/apache/samza/config/Log4jSystemConfig.java 
> 107ddf0c3d4e0f584a2f68a23debbada5f68dcb8 
>   
> samza-log4j/src/main/java/org/apache/samza/logging/log4j/StreamAppender.java 
> 4ef3551f470e77e27bd156e81ce96486f25c21bf 
>   
> samza-log4j/src/main/java/org/apache/samza/logging/log4j/serializers/LoggingEventJsonSerde.java
>  PRE-CREATION 
>   
> samza-log4j/src/main/java/org/apache/samza/logging/log4j/serializers/LoggingEventJsonSerdeFactory.java
>  PRE-CREATION 
>   
> samza-log4j/src/test/java/org/apache/samza/config/TestLog4jSystemConfig.java 
> 16ccb459892f62245648235eb65f53b26e8ecb87 
>   
> samza-log4j/src/test/java/org/apache/samza/logging/log4j/TestStreamAppender.java
>  3e4ddc9c72868e22f993f60015224cd3a153266c 
> 
> Diff: https://reviews.apache.org/r/32006/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Chris Riccomini
> 
>



SAMZA-576 and SAMZA-597 Reviews

2015-03-13 Thread Chris Riccomini
Hey all,

I could use reviews for:

  https://issues.apache.org/jira/browse/SAMZA-576
  https://issues.apache.org/jira/browse/SAMZA-597

Thanks! :)

Cheers,
Chris


Review Request 32006: SAMZA-597

2015-03-12 Thread Chris Riccomini

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32006/
---

Review request for samza.


Repository: samza


Description
---

add docs


fail if containerName is not set.


update tests


move location enabled config to log4j config file


fix MDC link in docs


add a logging event json serde for log4j, and set it as default.


default to log4j string serde for now


Diffs
-

  build.gradle 08583e07f1c0bda88433bacb59bc2fd9ef6ce310 
  docs/learn/documentation/versioned/jobs/configuration-table.html 
ec1287418042b95df73ff7c36a684d3123c46372 
  docs/learn/documentation/versioned/jobs/logging.md 
af2fd0ea6929230cdc6bc3c51d9ae62adacb55fa 
  samza-log4j/src/main/java/org/apache/samza/config/Log4jSystemConfig.java 
107ddf0c3d4e0f584a2f68a23debbada5f68dcb8 
  samza-log4j/src/main/java/org/apache/samza/logging/log4j/StreamAppender.java 
4ef3551f470e77e27bd156e81ce96486f25c21bf 
  
samza-log4j/src/main/java/org/apache/samza/logging/log4j/serializers/LoggingEventJsonSerde.java
 PRE-CREATION 
  
samza-log4j/src/main/java/org/apache/samza/logging/log4j/serializers/LoggingEventJsonSerdeFactory.java
 PRE-CREATION 
  samza-log4j/src/test/java/org/apache/samza/config/TestLog4jSystemConfig.java 
16ccb459892f62245648235eb65f53b26e8ecb87 
  
samza-log4j/src/test/java/org/apache/samza/logging/log4j/TestStreamAppender.java
 3e4ddc9c72868e22f993f60015224cd3a153266c 

Diff: https://reviews.apache.org/r/32006/diff/


Testing
---


Thanks,

Chris Riccomini



Re: Samza on Yarn

2015-03-12 Thread Chris Riccomini
Hey Shekar,

Yes, this is definitely a classpath issue. The pastebin you sent does not
include any of the samza-core/samza-yarn/scala JARs. This is rather
strange, since you said you put the JARs in this path:

  /home/hadoop/hadoop-2.5.2/share/hadoop/hdfs/lib/

And I do see *other* JARs listed with this path. Are you sure you put the
Samza JARs on *all* machines, not just the RM machine? The yarn-default.xml
description of yarn.application.classpath says:

CLASSPATH for YARN applications. A comma-separated list of CLASSPATH
entries. When this value is empty, the following default CLASSPATH for YARN
applications would be used. For Linux: $HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/share/hadoop/common/*,
$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
$HADOOP_YARN_HOME/share/hadoop/yarn/*,
$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*

So, it seems like it should pick up the JARs, if they're in the NM's
directory.
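
One way to check this is to take the text after `STARTUP_MSG:   classpath` in the NM log and look for the expected JARs. A minimal Python sketch, assuming the log lists individual JAR paths separated by colons; the function name, the required prefixes, and the sample classpaths are illustrative assumptions.

```python
REQUIRED_PREFIXES = ("samza-core", "samza-yarn", "scala-library")


def missing_jars(classpath, required=REQUIRED_PREFIXES, separator=":"):
    """Return required jar prefixes with no matching entry on the classpath."""
    # Keep only the file name of each classpath entry.
    names = [entry.rsplit("/", 1)[-1] for entry in classpath.split(separator) if entry]
    return [
        prefix
        for prefix in required
        if not any(name.startswith(prefix) and name.endswith(".jar") for name in names)
    ]


# Hypothetical classpath values, shaped like the text after
# "STARTUP_MSG:   classpath" in a NodeManager log.
LIB = "/home/hadoop/hadoop-2.5.2/share/hadoop/hdfs/lib/"
incomplete = LIB + "commons-io-2.4.jar:" + LIB + "guava-11.0.2.jar"
complete = ":".join(
    LIB + jar
    for jar in (
        "commons-io-2.4.jar",
        "samza-core_2.10-0.8.0.jar",
        "samza-yarn_2.10-0.8.0.jar",
        "scala-library-2.10.4.jar",
    )
)
```

Running `missing_jars` against the classpath from each NM machine would show which hosts lack the Samza or Scala JARs.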

The exception that you're now seeing seems to suggest that one of the Samza
containers is failing:

Container for appattempt_1426204312971_0001_02 exited with exitCode: 1

The _02 suffix indicates a non-AM failure (i.e. the Samza container
failed, not the Samza AM). Can you check the AM logs, and find the http://...
link to the container logs? It should give a hint about why the container
failed.

Cheers,
Chris

On Thu, Mar 12, 2015 at 4:58 PM, Shekar Tippur  wrote:

> Chris,
>
> Made some progress.
>
> By adding yarn.application.classpath to yarn-site.xml, I am no longer
> getting class not found error. However, I am getting a different error:
>
> Application application_1426204312971_0001 failed 2 times due to AM
> Container for appattempt_1426204312971_0001_02 exited with exitCode: 1
> due to: Exception from container-launch: ExitCodeException exitCode=1:
> ExitCodeException exitCode=1:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
> at org.apache.hadoop.util.Shell.run(Shell.java:455)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
> at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Container exited with a non-zero exit code 1
> .Failing this attempt.. Failing the application.
>
> Looks like a common issue with yarn but not sure how to resolve as yet.
>
>
> - Shekar
>
> On Thu, Mar 12, 2015 at 1:44 PM, Shekar Tippur  wrote:
>
> > Chris - Here it is.
> >
> > http://pastebin.com/c3e21Hzf
> >
> > - Shekar
> >
> > On Thu, Mar 12, 2015 at 10:58 AM, Chris Riccomini  >
> > wrote:
> >
> >> This is the line that I'm interested in:
> >>
> >> STARTUP_MSG:   classpath 
> >>
> >> On Thu, Mar 12, 2015 at 10:55 AM, Chris Riccomini <criccom...@apache.org>
> >> wrote:
> >>
> >> > Hey Shekar,
> >> >
> >> > Could you paste the full log on pastebin? It really seems like something's
> >> > missing from the classpath. If samza-yarn is there, it should be able to
> >> > see that file. I think the full log has a dump of the classpath. If it
> >> > doesn't, could you paste the line where the YARN NM is starting up, and
> >> > dumps the full classpath?
> >> >
> >> > Cheers,
> >> > Chris
> >> >
> >> > On Thu, Mar 12, 2015 at 10:17 AM, Shekar Tippur 
> >> wrote:
> >> >
> >> >> I think all these jars are in place (Under
> >> >> $HADOOP_YARN_HOME/share/hadoop/hdfs/lib)
> >> >>
> >> >> - Shekar
> >> >>
> >> >> On Thu, Mar 12, 2015 at 9:36 AM, Chris Riccomini <criccom...@apache.org>
> >> >> wrote:
> >> >>
> >> >> > Hey Shekar,
> >> >> >
> >> >> > You need that samza-yarn file on your RM/NM's classpath, along with scala.
> >> >> > We missed this in the docs, and are tracking the issue here:

Re: Samza on Yarn

2015-03-12 Thread Chris Riccomini
This is the line that I'm interested in:

STARTUP_MSG:   classpath 

On Thu, Mar 12, 2015 at 10:55 AM, Chris Riccomini 
wrote:

> Hey Shekar,
>
> Could you paste the full log on pastebin? It really seems like something's
> missing from the classpath. If samza-yarn is there, it should be able to
> see that file. I think the full log has a dump of the classpath. If it
> doesn't, could you paste the line where the YARN NM is starting up, and
> dumps the full classpath?
>
> Cheers,
> Chris
>
> On Thu, Mar 12, 2015 at 10:17 AM, Shekar Tippur  wrote:
>
>> I think all these jars are in place (Under
>> $HADOOP_YARN_HOME/share/hadoop/hdfs/lib)
>>
>> - Shekar
>>
>> On Thu, Mar 12, 2015 at 9:36 AM, Chris Riccomini 
>> wrote:
>>
>> > Hey Shekar,
>> >
>> > You need that samza-yarn file on your RM/NM's classpath, along with
>> scala.
>> > We missed this in the docs, and are tracking the issue here:
>> >
>> >   https://issues.apache.org/jira/browse/SAMZA-456
>> >
>> > You'll also need samza-core in the classpath, based on the discussion on
>> > SAMZA-456. Sorry about that. If you want to update the tutorial when you
>> > get your cluster working, and submit a patch, that'd be great! :)
>> >
>> > Cheers,
>> > Chris
>> >
>> > On Wed, Mar 11, 2015 at 9:43 PM, Shekar Tippur 
>> wrote:
>> >
>> > > Here is the corresponding log:
>> > >
>> > > 2015-03-11 20:43:09,665 INFO  [AsyncDispatcher event handler]
>> > > localizer.LocalizedResource (LocalizedResource.java:handle(203)) -
>> > Resource
>> > > http://sprfargas102:8000/hello-samza-0.8.0-dist.tar.gz transitioned
>> from
>> > > INIT to DOWNLOADING
>> > >
>> > > 2015-03-11 20:43:09,665 INFO  [AsyncDispatcher event handler]
>> > > localizer.ResourceLocalizationService
>> > > (ResourceLocalizationService.java:handle(679)) - Created localizer for
>> > > container_1426121400423_2587_01_01
>> > >
>> > > 2015-03-11 20:43:09,669 INFO  [LocalizerRunner for
>> > > container_1426121400423_2587_01_01]
>> > > localizer.ResourceLocalizationService
>> > > (ResourceLocalizationService.java:writeCredentials(1107)) - Writing
>> > > credentials to the nmPrivate file
>> > >
>> > >
>> >
>> /tmp/hadoop-hadoop/nm-local-dir/nmPrivate/container_1426121400423_2587_01_01.tokens.
>> > > Credentials list:
>> > >
>> > > 2015-03-11 20:43:09,675 INFO  [DeletionService #0]
>> > > nodemanager.DefaultContainerExecutor
>> > > (DefaultContainerExecutor.java:deleteAsUser(378)) - Deleting path :
>> > > /home/hadoop/hadoop-2.5.2/logs/userlogs/application_1426120927668_0010
>> > >
>> > > 2015-03-11 20:43:09,676 INFO  [LocalizerRunner for
>> > > container_1426121400423_2587_01_01]
>> > > nodemanager.DefaultContainerExecutor
>> > > (DefaultContainerExecutor.java:createUserCacheDirs(469)) -
>> Initializing
>> > > user root
>> > >
>> > > 2015-03-11 20:43:09,685 INFO  [LocalizerRunner for
>> > > container_1426121400423_2587_01_01]
>> > > nodemanager.DefaultContainerExecutor
>> > > (DefaultContainerExecutor.java:startLocalizer(103)) - Copying from
>> > >
>> > >
>> >
>> /tmp/hadoop-hadoop/nm-local-dir/nmPrivate/container_1426121400423_2587_01_01.tokens
>> > > to
>> > >
>> > >
>> >
>> /tmp/hadoop-hadoop/nm-local-dir/usercache/root/appcache/application_1426121400423_2587/container_1426121400423_2587_01_01.tokens
>> > >
>> > > *2015-03-11 20:43:09,685 INFO  [LocalizerRunner for
>> > > container_1426121400423_2587_01_01]
>> > > nodemanager.DefaultContainerExecutor
>> > > (DefaultContainerExecutor.java:startLocalizer(105)) - CWD set to
>> > >
>> > >
>> >
>> /tmp/hadoop-hadoop/nm-local-dir/usercache/root/appcache/application_1426121400423_2587
>> > > =
>> > >
>> > >
>> >
>> file:/tmp/hadoop-hadoop/nm-local-dir/usercache/root/appcache/application_1426121400423_2587*
>> > >
>> > > *2015-03-11 20:43:09,716 INFO  [IPC Server handler 2 on 8040]
>> > > localizer.ResourceLocalizationService
>> > > (ResourceLocalizationService.java:update(1007)) - DEBUG: FAILED {
>> > &g

Re: Samza on Yarn

2015-03-12 Thread Chris Riccomini
Hey Shekar,

Could you paste the full log on pastebin? It really seems like something's
missing from the classpath. If samza-yarn is there, it should be able to
see that file. I think the full log has a dump of the classpath. If it
doesn't, could you paste the line where the YARN NM is starting up, and
dumps the full classpath?

Cheers,
Chris

On Thu, Mar 12, 2015 at 10:17 AM, Shekar Tippur  wrote:

> I think all these jars are in place (Under
> $HADOOP_YARN_HOME/share/hadoop/hdfs/lib)
>
> - Shekar
>
> On Thu, Mar 12, 2015 at 9:36 AM, Chris Riccomini 
> wrote:
>
> > Hey Shekar,
> >
> > You need that samza-yarn file on your RM/NM's classpath, along with
> scala.
> > We missed this in the docs, and are tracking the issue here:
> >
> >   https://issues.apache.org/jira/browse/SAMZA-456
> >
> > You'll also need samza-core in the classpath, based on the discussion on
> > SAMZA-456. Sorry about that. If you want to update the tutorial when you
> > get your cluster working, and submit a patch, that'd be great! :)
> >
> > Cheers,
> > Chris
> >
> > On Wed, Mar 11, 2015 at 9:43 PM, Shekar Tippur 
> wrote:
> >
> > > Here is the corresponding log:
> > >
> > > 2015-03-11 20:43:09,665 INFO  [AsyncDispatcher event handler]
> > > localizer.LocalizedResource (LocalizedResource.java:handle(203)) -
> > Resource
> > > http://sprfargas102:8000/hello-samza-0.8.0-dist.tar.gz transitioned
> from
> > > INIT to DOWNLOADING
> > >
> > > 2015-03-11 20:43:09,665 INFO  [AsyncDispatcher event handler]
> > > localizer.ResourceLocalizationService
> > > (ResourceLocalizationService.java:handle(679)) - Created localizer for
> > > container_1426121400423_2587_01_01
> > >
> > > 2015-03-11 20:43:09,669 INFO  [LocalizerRunner for
> > > container_1426121400423_2587_01_01]
> > > localizer.ResourceLocalizationService
> > > (ResourceLocalizationService.java:writeCredentials(1107)) - Writing
> > > credentials to the nmPrivate file
> > >
> > >
> >
> /tmp/hadoop-hadoop/nm-local-dir/nmPrivate/container_1426121400423_2587_01_01.tokens.
> > > Credentials list:
> > >
> > > 2015-03-11 20:43:09,675 INFO  [DeletionService #0]
> > > nodemanager.DefaultContainerExecutor
> > > (DefaultContainerExecutor.java:deleteAsUser(378)) - Deleting path :
> > > /home/hadoop/hadoop-2.5.2/logs/userlogs/application_1426120927668_0010
> > >
> > > 2015-03-11 20:43:09,676 INFO  [LocalizerRunner for
> > > container_1426121400423_2587_01_01]
> > > nodemanager.DefaultContainerExecutor
> > > (DefaultContainerExecutor.java:createUserCacheDirs(469)) - Initializing
> > > user root
> > >
> > > 2015-03-11 20:43:09,685 INFO  [LocalizerRunner for
> > > container_1426121400423_2587_01_01]
> > > nodemanager.DefaultContainerExecutor
> > > (DefaultContainerExecutor.java:startLocalizer(103)) - Copying from
> > >
> > >
> >
> /tmp/hadoop-hadoop/nm-local-dir/nmPrivate/container_1426121400423_2587_01_01.tokens
> > > to
> > >
> > >
> >
> /tmp/hadoop-hadoop/nm-local-dir/usercache/root/appcache/application_1426121400423_2587/container_1426121400423_2587_01_01.tokens
> > >
> > > *2015-03-11 20:43:09,685 INFO  [LocalizerRunner for
> > > container_1426121400423_2587_01_01]
> > > nodemanager.DefaultContainerExecutor
> > > (DefaultContainerExecutor.java:startLocalizer(105)) - CWD set to
> > >
> > >
> >
> /tmp/hadoop-hadoop/nm-local-dir/usercache/root/appcache/application_1426121400423_2587
> > > =
> > >
> > >
> >
> file:/tmp/hadoop-hadoop/nm-local-dir/usercache/root/appcache/application_1426121400423_2587*
> > >
> > > *2015-03-11 20:43:09,716 INFO  [IPC Server handler 2 on 8040]
> > > localizer.ResourceLocalizationService
> > > (ResourceLocalizationService.java:update(1007)) - DEBUG: FAILED {
> > > http://sprfargas102:8000/hello-samza-0.8.0-dist.tar.gz
> > > <http://sprfargas102:8000/hello-samza-0.8.0-dist.tar.gz>, 0, ARCHIVE,
> > null
> > > }, java.lang.ClassNotFoundException: Class
> > > org.apache.samza.util.hadoop.HttpFileSystem not found*
> > >
> > > *2015-03-11 20:43:09,716 INFO  [IPC Server handler 2 on 8040]
> > > localizer.LocalizedResource (LocalizedResource.java:handle(203)) -
> > Resource
> > > http://sprfargas102:8000/hello-samza-0.8.0-dist.tar.gz(-
> > > <http://spr

Re: Samza on Yarn

2015-03-12 Thread Chris Riccomini
Hey Shekar,

You need that samza-yarn file on your RM/NM's classpath, along with scala.
We missed this in the docs, and are tracking the issue here:

  https://issues.apache.org/jira/browse/SAMZA-456

You'll also need samza-core in the classpath, based on the discussion on
SAMZA-456. Sorry about that. If you want to update the tutorial when you
get your cluster working, and submit a patch, that'd be great! :)
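
A quick way to confirm whether the class named in the log is visible to a given JVM is a small probe like the one below. This is only a sketch: `ClasspathProbe` and `isVisible` are made-up names, not part of Samza or Hadoop, and the probe only sees the classpath of the JVM it runs in, so it would have to be launched with the same classpath as the NM to tell you anything useful.

```java
// Hypothetical helper for illustration; not part of Samza or Hadoop.
final class ClasspathProbe {
    private ClasspathProbe() {}

    /** Returns true if the named class can be located by this JVM's class loader. */
    static boolean isVisible(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class reported missing in the log above.
        String name = args.length > 0 ? args[0] : "org.apache.samza.util.hadoop.HttpFileSystem";
        System.out.println(name + " visible: " + isVisible(name));
    }
}
```

If the probe reports the class as not visible when run with the NM's classpath, the samza-yarn jar is most likely not where the NM is looking.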

Cheers,
Chris

On Wed, Mar 11, 2015 at 9:43 PM, Shekar Tippur  wrote:

> Here is the corresponding log:
>
> 2015-03-11 20:43:09,665 INFO  [AsyncDispatcher event handler]
> localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource
> http://sprfargas102:8000/hello-samza-0.8.0-dist.tar.gz transitioned from
> INIT to DOWNLOADING
>
> 2015-03-11 20:43:09,665 INFO  [AsyncDispatcher event handler]
> localizer.ResourceLocalizationService
> (ResourceLocalizationService.java:handle(679)) - Created localizer for
> container_1426121400423_2587_01_01
>
> 2015-03-11 20:43:09,669 INFO  [LocalizerRunner for
> container_1426121400423_2587_01_01]
> localizer.ResourceLocalizationService
> (ResourceLocalizationService.java:writeCredentials(1107)) - Writing
> credentials to the nmPrivate file
>
> /tmp/hadoop-hadoop/nm-local-dir/nmPrivate/container_1426121400423_2587_01_01.tokens.
> Credentials list:
>
> 2015-03-11 20:43:09,675 INFO  [DeletionService #0]
> nodemanager.DefaultContainerExecutor
> (DefaultContainerExecutor.java:deleteAsUser(378)) - Deleting path :
> /home/hadoop/hadoop-2.5.2/logs/userlogs/application_1426120927668_0010
>
> 2015-03-11 20:43:09,676 INFO  [LocalizerRunner for
> container_1426121400423_2587_01_01]
> nodemanager.DefaultContainerExecutor
> (DefaultContainerExecutor.java:createUserCacheDirs(469)) - Initializing
> user root
>
> 2015-03-11 20:43:09,685 INFO  [LocalizerRunner for
> container_1426121400423_2587_01_01]
> nodemanager.DefaultContainerExecutor
> (DefaultContainerExecutor.java:startLocalizer(103)) - Copying from
>
> /tmp/hadoop-hadoop/nm-local-dir/nmPrivate/container_1426121400423_2587_01_01.tokens
> to
>
> /tmp/hadoop-hadoop/nm-local-dir/usercache/root/appcache/application_1426121400423_2587/container_1426121400423_2587_01_01.tokens
>
> *2015-03-11 20:43:09,685 INFO  [LocalizerRunner for
> container_1426121400423_2587_01_01]
> nodemanager.DefaultContainerExecutor
> (DefaultContainerExecutor.java:startLocalizer(105)) - CWD set to
>
> /tmp/hadoop-hadoop/nm-local-dir/usercache/root/appcache/application_1426121400423_2587
> =
>
> file:/tmp/hadoop-hadoop/nm-local-dir/usercache/root/appcache/application_1426121400423_2587*
>
> *2015-03-11 20:43:09,716 INFO  [IPC Server handler 2 on 8040]
> localizer.ResourceLocalizationService
> (ResourceLocalizationService.java:update(1007)) - DEBUG: FAILED {
> http://sprfargas102:8000/hello-samza-0.8.0-dist.tar.gz, 0, ARCHIVE, null
> }, java.lang.ClassNotFoundException: Class
> org.apache.samza.util.hadoop.HttpFileSystem not found*
>
> *2015-03-11 20:43:09,716 INFO  [IPC Server handler 2 on 8040]
> localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource
> http://sprfargas102:8000/hello-samza-0.8.0-dist.tar.gz(->/tmp/hadoop-hadoop/nm-local-dir/usercache/root/appcache/application_1426121400423_2587/filecache/10/hello-samza-0.8.0-dist.tar.gz)
> transitioned from DOWNLOADING to FAILED*
>
> 2015-03-11 20:43:09,717 INFO  [AsyncDispatcher event handler]
> container.Container (ContainerImpl.java:handle(918)) - Container
> container_1426121400423_2587_01_01 transitioned from LOCALIZING to
> LOCALIZATION_FAILED
>
> 2015-03-11 20:43:09,717 INFO  [AsyncDispatcher event handler]
> localizer.LocalResourcesTrackerImpl
> (LocalResourcesTrackerImpl.java:handle(151)) - Container
> container_1426121400423_2587_01_01 sent RELEASE event on a resource
> request { http://sprfargas102:8000/hello-samza-0.8.0-dist.tar.gz, 0,
> ARCHIVE, null } not present in cache.
>
> 2015-03-11 20:43:09,717 WARN  [AsyncDispatcher event handler]
> nodemanager.NMAuditLogger (NMAuditLogger.java:logFailure(150)) -
> USER=root OPERATION=Container
> Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container
> failed with state: LOCALIZATION_FAILED APPID=application_1426121400423_2587
> CONTAINERID=container_1426121400423_2587_01_01
>
> 2015-03-11 20:43:09,717 INFO  [AsyncDispatcher event handler]
> container.Container (ContainerImpl.java:handle(918)) - Container
> container_1426121400423_2587_01_01 transitioned from
> LOCALIZATION_FAILED to DONE
>
> 2015-03-11 20:43:09,717 INFO  [AsyncDispatcher event handler]
> application.Application (ApplicationImpl.java:transition(340)) - Removing
> container_1426121400423_2587_01_01 from application
> application_1426121400423_2587
>
> 2015-03-11 20:43:09,717 INFO  [AsyncDispatcher event handler]
> containermanager.AuxServices (AuxServices.java

Re: Review Request 31881: SAMZA-506: Shutdown RunLoop on SIGTERM.

2015-03-12 Thread Chris Riccomini

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31881/#review75881
---


Seems like it should work. Could you write a test to verify? I know it's a 
little hard with shutdown hooks (or anything that interacts with System/Runtime 
Java stuff), but I can't actually prove to myself that this works without a 
test.


samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala
<https://reviews.apache.org/r/31881/#comment123203>

Nit: move outside of run. Optionally, make shutdownHook injectable via 
constructor for testing purposes.



samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala
<https://reviews.apache.org/r/31881/#comment123201>

I think shutdownHook needs to be made volatile. The hook will be executed 
on a different thread from the one running the RunLoop, which reads the variable.
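
To illustrate the visibility concern, here is a minimal Java sketch, assuming the loop polls a boolean flag that the shutdown hook sets. `VolatileFlagLoop`, `shutdownNow`, and `requestShutdown` are made-up names; the actual RunLoop is Scala and is not reproduced here. Without `volatile`, the Java memory model does not guarantee that the loop thread ever observes the write made by the hook thread.

```java
// Hypothetical stand-in for a run loop; not the actual RunLoop.scala code.
final class VolatileFlagLoop {
    // Written by the shutdown-hook thread, read by the loop thread.
    // volatile guarantees the loop thread sees the update.
    private volatile boolean shutdownNow = false;
    private long iterations = 0;

    void requestShutdown() {
        shutdownNow = true;
    }

    long run() {
        while (!shutdownNow) {
            iterations++;      // stand-in for processing one message
            Thread.yield();
        }
        return iterations;
    }

    public static void main(String[] args) throws InterruptedException {
        VolatileFlagLoop loop = new VolatileFlagLoop();
        Thread worker = new Thread(loop::run, "run-loop");
        // The hook runs on its own thread when the JVM begins shutting down
        // (SIGTERM, or System.exit as used below).
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            loop.requestShutdown();
            try {
                worker.join(); // wait for the loop to finish its current iteration
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            System.out.println("loop stopped cleanly");
        }));
        worker.start();
        Thread.sleep(100);
        System.exit(0); // triggers the shutdown hook
    }
}
```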


- Chris Riccomini


On March 10, 2015, 6:55 a.m., Ewen Cheslack-Postava wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31881/
> ---
> 
> (Updated March 10, 2015, 6:55 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-506
> https://issues.apache.org/jira/browse/SAMZA-506
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Shutdown RunLoop on SIGTERM.
> 
> 
> Diffs
> -
> 
>   samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala 499f5c6 
> 
> Diff: https://reviews.apache.org/r/31881/diff/
> 
> 
> Testing
> ---
> 
> Passes unit and zopkio tests.
> 
> 
> Thanks,
> 
> Ewen Cheslack-Postava
> 
>



Re: Extraneous wiki page

2015-03-10 Thread Chris Riccomini
Hey all,

There were about 10 junk pages that had been created. I've deleted them and
limited permissions to just viewing.

Cheers,
Chris

On Tuesday, March 10, 2015, Sean Mackrory  wrote:

> I can't take the credit for asdf :) Thank you, Chris!
> On Mar 10, 2015 6:57 PM, "Milinda Pathirage" wrote:
>
> > Hi Chris,
> >
> > There is another page which is not related to Samza.
> >
> > https://cwiki.apache.org/confluence/display/SAMZA/asdf
> >
> > Thanks
> > Milinda
> >
> > On Tue, Mar 10, 2015 at 8:16 PM, Chris Riccomini wrote:
> >
> > > Lol, no worries. Fixed. :)
> > >
> > > Wonder if we should re-evaluate the perms on the wiki to either allow
> > > deletions, or not allow page adds.
> > >
> > > On Tue, Mar 10, 2015 at 4:04 PM, Sean Mackrory wrote:
> > >
> > > > All,
> > > >
> > > > I intended a wiki page I created to be in the Bigtop space, however I
> > > > inadvertently created it in the Samza space and now I can't delete it
> > or
> > > > move it out of the space:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/SAMZA/Social+Media+Guidelines
> > > >
> > > > Could someone with sufficient karma go ahead and remove that for me?
> > It's
> > > > already been re-created in the Bigtop space so nothing will be lost.
> I
> > > > apologize for the inconvenience.
> > > >
> > >
> >
> >
> >
> > --
> > Milinda Pathirage
> >
> > PhD Student | Research Assistant
> > School of Informatics and Computing | Data to Insight Center
> > Indiana University
> >
> > twitter: milindalakmal
> > skype: milinda.pathirage
> > blog: http://milinda.pathirage.org
> >
>


Re: Review Request 31910: SAMZA-505

2015-03-10 Thread Chris Riccomini

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31910/
---

(Updated March 11, 2015, 12:26 a.m.)


Review request for samza.


Bugs: SAMZA-505
https://issues.apache.org/jira/browse/SAMZA-505


Repository: samza


Description (updated)
---

fixing yan's comments


add a safety check and warning when using byte arrays in cached store


Adding a ByteBufferSerde. Useful in cases where people want to use CachedStore 
with byte arrays as a key (i.e. they have to wrap in ByteBuffer), but also want 
a changelog.
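
For context on why byte-array keys need wrapping: Java arrays use identity-based `equals`/`hashCode`, so two arrays with the same contents are different keys in any hash-based map, while `ByteBuffer` compares by content. The sketch below shows this with a plain `HashMap`; it assumes CachedStore behaves like a hash-keyed cache in this respect and does not use any Samza classes. `ByteKeyDemo` is a made-up name.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Illustration only; uses java.util.HashMap, not Samza's CachedStore.
final class ByteKeyDemo {
    private ByteKeyDemo() {}

    /** Looks up a key using a second array with identical contents. */
    static boolean foundWithRawArrays() {
        Map<byte[], String> cache = new HashMap<>();
        cache.put(new byte[] {1, 2, 3}, "value");
        // Arrays inherit Object.equals, so this is a different key.
        return cache.containsKey(new byte[] {1, 2, 3});
    }

    /** Same lookup, but with both arrays wrapped in ByteBuffer. */
    static boolean foundWithByteBuffers() {
        Map<ByteBuffer, String> cache = new HashMap<>();
        cache.put(ByteBuffer.wrap(new byte[] {1, 2, 3}), "value");
        // ByteBuffer.equals/hashCode compare the remaining bytes.
        return cache.containsKey(ByteBuffer.wrap(new byte[] {1, 2, 3}));
    }

    public static void main(String[] args) {
        System.out.println("raw byte[] key found:  " + foundWithRawArrays());
        System.out.println("ByteBuffer key found:  " + foundWithByteBuffers());
    }
}
```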


Diffs (updated)
-

  build.gradle 0a268ac7a3819cf46b54a93e0e3171455371456a
  samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 275eb1a924d09a0a43efe6273e0d2af9217e1c74
  samza-core/src/main/scala/org/apache/samza/serializers/ByteBufferSerde.scala PRE-CREATION
  samza-core/src/main/scala/org/apache/samza/serializers/ByteSerde.scala e7ce09f714bd5d0c6b764ddb3e02cda4b7c7f2e6
  samza-core/src/test/scala/org/apache/samza/serializers/TestByteBufferSerde.scala PRE-CREATION
  samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala 84cf6db3a1b035c639a6dec30fe9ee997b282a80
  samza-kv/src/test/scala/org/apache/samza/storage/kv/TestCachedStore.scala PRE-CREATION

Diff: https://reviews.apache.org/r/31910/diff/


Testing
---


Thanks,

Chris Riccomini



Re: Extraneous wiki page

2015-03-10 Thread Chris Riccomini
Lol, no worries. Fixed. :)

Wonder if we should re-evaluate the perms on the wiki to either allow
deletions, or not allow page adds.

On Tue, Mar 10, 2015 at 4:04 PM, Sean Mackrory  wrote:

> All,
>
> I intended a wiki page I created to be in the Bigtop space, however I
> inadvertently created it in the Samza space and now I can't delete it or
> move it out of the space:
>
> https://cwiki.apache.org/confluence/display/SAMZA/Social+Media+Guidelines
>
> Could someone with sufficient karma go ahead and remove that for me? It's
> already been re-created in the Bigtop space so nothing will be lost. I
> apologize for the inconvenience.
>


Re: Review Request 31881: SAMZA-506: Shutdown RunLoop on SIGTERM.

2015-03-10 Thread Chris Riccomini

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31881/#review75985
---

Ship it!


Looks good. Thanks!

- Chris Riccomini


On March 10, 2015, 7:59 p.m., Ewen Cheslack-Postava wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31881/
> ---
> 
> (Updated March 10, 2015, 7:59 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-506
> https://issues.apache.org/jira/browse/SAMZA-506
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Shutdown RunLoop on SIGTERM.
> 
> 
> Diffs
> -
> 
>   samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala 499f5c6 
>   samza-core/src/test/scala/org/apache/samza/container/TestRunLoop.scala ea48853
> 
> Diff: https://reviews.apache.org/r/31881/diff/
> 
> 
> Testing
> ---
> 
> Passes unit and zopkio tests.
> 
> 
> Thanks,
> 
> Ewen Cheslack-Postava
> 
>



Review Request 31910: SAMZA-505

2015-03-10 Thread Chris Riccomini

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31910/
---

Review request for samza.


Bugs: SAMZA-505
https://issues.apache.org/jira/browse/SAMZA-505


Repository: samza


Description
---

add a safety check and warning when using byte arrays in cached store


Adding a ByteBufferSerde. Useful in cases where people want to use CachedStore 
with byte arrays as a key (i.e. they have to wrap in ByteBuffer), but also want 
a changelog.


Diffs
-

  build.gradle 0a268ac7a3819cf46b54a93e0e3171455371456a
  samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 275eb1a924d09a0a43efe6273e0d2af9217e1c74
  samza-core/src/main/scala/org/apache/samza/serializers/ByteBufferSerde.scala PRE-CREATION
  samza-core/src/main/scala/org/apache/samza/serializers/ByteSerde.scala e7ce09f714bd5d0c6b764ddb3e02cda4b7c7f2e6
  samza-core/src/test/scala/org/apache/samza/serializers/TestByteBufferSerde.scala PRE-CREATION
  samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala 84cf6db3a1b035c639a6dec30fe9ee997b282a80
  samza-kv/src/test/scala/org/apache/samza/storage/kv/TestCachedStore.scala PRE-CREATION

Diff: https://reviews.apache.org/r/31910/diff/


Testing
---


Thanks,

Chris Riccomini



Review Request 31909: SAMZA-590

2015-03-10 Thread Chris Riccomini

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31909/
---

Review request for samza.


Bugs: SAMZA-590
https://issues.apache.org/jira/browse/SAMZA-590


Repository: samza


Description
---

update test to check for refreshes as well


add test


abdicate all partitions in broker proxy when there's a consumer failure


Diffs
-

  samza-kafka/src/main/scala/org/apache/samza/system/kafka/BrokerProxy.scala f768263961d395c5d21a857a7581e0c472bbe547
  samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestBrokerProxy.scala d559d8b276007ebb99b37b2ebf42c1b415e3fbce

Diff: https://reviews.apache.org/r/31909/diff/


Testing
---


Thanks,

Chris Riccomini



Re: Review Request 31881: SAMZA-506: Shutdown RunLoop on SIGTERM.

2015-03-10 Thread Chris Riccomini

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31881/#review75947
---



samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala
<https://reviews.apache.org/r/31881/#comment123276>

Thought about this a bit more. Agree this is a bit messy. What do you think 
about just having two protected methods in RunLoop: addShutdownHook and 
removeShutdownHook? They can both do the Runtime.whatever invocation. In the 
unit test, we can just override these two methods. Saves us the null checks, 
and the params in the constructor.
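
A rough Java sketch of that idea, under the assumption that the loop registers its hook at the start of run and removes it at the end. The real RunLoop is Scala and shaped differently; `HookableLoop`, `RecordingLoop`, and the iteration cap are invented here purely so the example terminates and can be tested.

```java
// Hypothetical sketch; names and structure are illustrative, not the RunLoop.scala code.
class HookableLoop {
    private volatile boolean shutdownNow = false;
    private final Thread hook = new Thread(() -> { shutdownNow = true; });

    // Default behaviour talks to the real Runtime; tests override these two methods.
    protected void addShutdownHook(Thread hook) {
        Runtime.getRuntime().addShutdownHook(hook);
    }

    protected void removeShutdownHook(Thread hook) {
        try {
            Runtime.getRuntime().removeShutdownHook(hook);
        } catch (IllegalStateException e) {
            // JVM is already shutting down; nothing to remove.
        }
    }

    /** Runs until shutdown is requested or maxIterations is reached; returns iterations done. */
    public int run(int maxIterations) {
        addShutdownHook(hook);
        int done = 0;
        try {
            while (!shutdownNow && done < maxIterations) {
                done++; // stand-in for processing one message
            }
        } finally {
            removeShutdownHook(hook);
        }
        return done;
    }

    public static void main(String[] args) {
        System.out.println("iterations: " + new HookableLoop().run(1000));
    }
}

// Test double: records the hook instead of registering it with the JVM,
// and can fire it immediately to simulate a SIGTERM.
class RecordingLoop extends HookableLoop {
    private final boolean fireOnAdd;
    Thread added;
    Thread removed;

    RecordingLoop(boolean fireOnAdd) {
        this.fireOnAdd = fireOnAdd;
    }

    @Override
    protected void addShutdownHook(Thread hook) {
        added = hook;
        if (fireOnAdd) {
            hook.run(); // run the hook body synchronously on the calling thread
        }
    }

    @Override
    protected void removeShutdownHook(Thread hook) {
        removed = hook;
    }
}
```

With the two methods overridden, a unit test never touches the JVM-wide hook list, and the production path needs neither null checks nor extra constructor parameters.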


- Chris Riccomini


On March 10, 2015, 6:01 p.m., Ewen Cheslack-Postava wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31881/
> ---
> 
> (Updated March 10, 2015, 6:01 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-506
> https://issues.apache.org/jira/browse/SAMZA-506
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Shutdown RunLoop on SIGTERM.
> 
> 
> Diffs
> -
> 
>   samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala 499f5c6 
>   samza-core/src/test/scala/org/apache/samza/container/TestRunLoop.scala ea48853
> 
> Diff: https://reviews.apache.org/r/31881/diff/
> 
> 
> Testing
> ---
> 
> Passes unit and zopkio tests.
> 
> 
> Thanks,
> 
> Ewen Cheslack-Postava
> 
>



Re: Java Opts and Max Heap

2015-03-10 Thread Chris Riccomini
Hey Jordan,

There are two settings you need to care about when changing memory. One is
yarn.container.memory.mb. The other is the task.opts setting, which will
allow you to specify a custom -Xmx parameter.

If you set the yarn.container.memory.mb to 4000 (4GB), as you suggest, the
JAVA_OPTS check still takes effect, and your Xmx will remain 768MB. If you
wish to increase your heap, you'll need to set both
yarn.container.memory.mb *and* the task.opts setting to be something like
task.opts=-Xmx3g. The first tells YARN to kill any container that uses > 4GB
of physical memory on the machine. The Xmx tells the container not to use
more than 3GB of heap.

It's a bit confusing, but there's a reason for this. The problem we have is
that we (Samza) don't know how much off-heap/non-JVM memory you're going to
use in your container. YARN only pays attention to the amount of physical
memory used by a process. With Java, you can set the heap, but there's also
permgen, JVM, JNI libraries, and off-heap memory usage. All of these
contribute to the physical memory usage that YARN cares about, but are
outside the JVM heap. This means that we can't just use one memory setting
for both YARN and Java. We have to have two.
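
To make the interaction between the two settings concrete, here is a small Java sketch. It mirrors the run-class.sh check quoted below (add -Xmx768M only when no -Xmx is present) and computes how much of the YARN allotment is left for non-heap memory. `HeapOpts` and its methods are made-up names; nothing here is Samza code, and the 4096/3g numbers are just the ones from this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only; not part of Samza.
final class HeapOpts {
    static final String DEFAULT_MAX_HEAP = "-Xmx768M";
    private static final Pattern XMX = Pattern.compile("-Xmx(\\d+)([kKmMgG]?)");

    private HeapOpts() {}

    /** Mirrors the run-class.sh check: append a default -Xmx only if none is present. */
    static String withDefaultHeap(String javaOpts) {
        if (javaOpts.contains("-Xmx")) {
            return javaOpts;
        }
        return (javaOpts + " " + DEFAULT_MAX_HEAP).trim();
    }

    /** Returns the -Xmx value in megabytes, or -1 if no -Xmx option is present. */
    static long maxHeapMb(String javaOpts) {
        Matcher m = XMX.matcher(javaOpts);
        if (!m.find()) {
            return -1;
        }
        long n = Long.parseLong(m.group(1));
        switch (m.group(2).toLowerCase()) {
            case "g": return n * 1024;
            case "m": return n;
            case "k": return n / 1024;
            default:  return n / (1024 * 1024); // no suffix means bytes
        }
    }

    /** Physical memory left for permgen, JNI, off-heap buffers, etc. under the YARN limit. */
    static long headroomMb(int containerMemoryMb, String javaOpts) {
        return containerMemoryMb - maxHeapMb(withDefaultHeap(javaOpts));
    }

    public static void main(String[] args) {
        // yarn.container.memory.mb=4096 with no task.opts: heap stays at the 768M default.
        System.out.println("headroom, default heap: " + headroomMb(4096, "") + " MB");
        // yarn.container.memory.mb=4096 with task.opts=-Xmx3g.
        System.out.println("headroom, -Xmx3g:       " + headroomMb(4096, "-Xmx3g") + " MB");
    }
}
```

The headroom is what everything outside the heap has to fit into before YARN kills the container, which is why the two values are set separately.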

Cheers,
Chris

On Tue, Mar 10, 2015 at 1:13 AM, Jordan Shaw  wrote:

> Hey Everyone,
> I have a question somewhat related to SAMZA-109 and this line in
> run-class.sh:
> # Check if a max-heap size is specified. If not - set a 768M heap
> [[ $JAVA_OPTS != *-Xmx* ]] && JAVA_OPTS="$JAVA_OPTS -Xmx768M"
>
> If I were to set the container.memory.mb for yarn to 4GB (
> yarn.container.memory.mb = 4096) the above JAVA_OPTS check would cause this
> to be ignored right? I can understand why preventing the Heap from going
> crazy is important in these long running jobs but to me this might cause
> some confusion especially when trying to debug a thrashing GC and Java
> isn't committing the memory amount set in yarn.container.memory.mb.
>
> I was trying to follow why this was decided on in SAMZA-109 but nothing
> stuck out to me. Would it make more sense to just not specify a max heap
> and let Yarn kill the job if it goes over its specified allotment
> and/or let the user explicitly set these opts?  Thanks!
>
> - Jordan
>


Re: Storing sensitive data in the Config

2015-03-09 Thread Chris Riccomini
Hey Tommy,

Yea, this has come up a few times. We don't currently have an answer for
it. The simplest thing to do would be to have a prefix. Any config with the
prefix could be stripped from the AM and logs. Another possibility is to
store the configs in an encrypted way, and have the code decrypt the
configs at runtime.
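
As a sketch of the prefix idea only: the snippet below drops every key that starts with a marker prefix before the config is handed to anything that logs or displays it. The prefix `sensitive.`, the class `SensitiveConfig`, and the sample keys are all hypothetical; Samza has no such convention today, which is the point of the question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; Samza does not currently provide this.
final class SensitiveConfig {
    // Assumed naming convention for marking a config value as sensitive.
    static final String SENSITIVE_PREFIX = "sensitive.";

    private SensitiveConfig() {}

    /** Returns a copy of the config that is safe to log or show in a UI. */
    static Map<String, String> stripSensitive(Map<String, String> config) {
        Map<String, String> safe = new TreeMap<>();
        for (Map.Entry<String, String> e : config.entrySet()) {
            if (!e.getKey().startsWith(SENSITIVE_PREFIX)) {
                safe.put(e.getKey(), e.getValue());
            }
        }
        return safe;
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("job.name", "example-job");
        config.put("sensitive.db.password", "not-a-real-password");
        // Only the non-sensitive entry is printed.
        System.out.println(stripSensitive(config));
    }
}
```

The job itself would still read the full config, so sensitive values stay retrievable by explicit key lookup while the incidental display paths only ever see the stripped copy.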

Can you open a JIRA up to track this? Do you have any other thoughts on the
best way to handle this?

Cheers,
Chris

On Mon, Mar 9, 2015 at 1:00 PM, Tommy Becker  wrote:

> We have some sensitive information that we are currently storing in the
> Samza config.  Our ops guys have some concern regarding where the config is
> displayed (e.g. in logs, app master UI, etc).  I'm curious if others have
> had similar concerns and if so what you did about it.  Seems like we might
> be able to use system properties for these things, albeit at a significant
> cost to convenience.  It would be nice if it were possible to mark config
> values as sensitive (perhaps via some sort of naming convention), and have
> such values be retrievable only via explicit get on the key so these sort
> of incidental exposures can't happen.
>
> --
> Tommy Becker
> Senior Software Engineer
>
> Digitalsmiths
> A TiVo Company
>
> www.digitalsmiths.com
> tobec...@tivo.com
>
> 
>
> This email and any attachments may contain confidential and privileged
> material for the sole use of the intended recipient. Any review, copying,
> or distribution of this email (or any attachments) by others is prohibited.
> If you are not the intended recipient, please contact the sender
> immediately and permanently delete this email and any attachments. No
> employee or agent of TiVo Inc. is authorized to conclude any binding
> agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo
> Inc. may only be made by a signed written agreement.
>


Re: Review Request 31392: SAMZA-555

2015-03-08 Thread Chris Riccomini

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31392/
---

(Updated March 8, 2015, 7:39 p.m.)


Review request for samza.


Bugs: SAMZA-555
https://issues.apache.org/jira/browse/SAMZA-555


Repository: samza


Description (updated)
---

merge master


don't npe if a container has no assignments


killing zk works


listen to existing containers when becoming leader


spelling mistake


fix comments


add notes


clear assignments after coordinator is elected.


add logging


make job runner block until job finishes


add standalone job and standalone job factory. make zk work with hierarchical 
chroot paths.


add unsubscribe to coordinator controller


unsubscribe to assignment changes in container listener


both tests pass.


fix bug in container controller to announce ownership


add initial container controller


make test do asserts


standalone main method works as expected.


rework assignTasksToContainers. controller isn't watching /containers 
sequential ids yet, but it should.


fiddling around with CLI test


writing a little main method to play with zk and coordinator


initial coordinator draft


add missing leadership check


working on zk coordinator controller.


messing around


sketch out a hacky controller.


adding core dependency


create samza-standalone project


Diffs (updated)
-

  build.gradle 0a268ac7a3819cf46b54a93e0e3171455371456a
  gradle/dependency-versions.gradle 84be50b216e7dc3c430e0979a933d846f8ebbb8d
  samza-core/src/main/scala/org/apache/samza/config/JobConfig.scala 3b6685e00837a4aaf809813e62b7e52823bc07a9
  samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala 499f5c6a58ff88ef9105b7b6ca168215fb82f35c
  samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 0b720ec4dd83c71fd1ce5071571c7a10babf0ddc
  samza-log4j/src/main/java/org/apache/samza/logging/log4j/serializers/LoggingEventStringSerde.java 8d8f5e8a8e5fd1e4d9e5482a5accd4a7ece463bc
  samza-standalone/src/main/java/org/apache/samza/job/standalone/StandaloneJob.java PRE-CREATION
  samza-standalone/src/main/java/org/apache/samza/job/standalone/StandaloneJobFactory.java PRE-CREATION
  samza-standalone/src/main/java/org/apache/samza/job/standalone/controller/StandaloneZkContainerController.java PRE-CREATION
  samza-standalone/src/main/java/org/apache/samza/job/standalone/controller/StandaloneZkCoordinatorController.java PRE-CREATION
  samza-standalone/src/main/java/org/apache/samza/job/standalone/controller/StandaloneZkCoordinatorState.java PRE-CREATION
  samza-standalone/src/main/java/org/apache/samza/serializers/zk/ZkJsonSerde.java PRE-CREATION
  samza-standalone/src/main/java/org/apache/samza/util/ZkUtil.java PRE-CREATION
  samza-test/src/main/java/org/apache/samza/system/mock/MockSystemConsumer.java a282dbb2976cb916d41649c3fbe070008c6621ee
  samza-test/src/main/java/org/apache/samza/task/MockTask.java PRE-CREATION
  samza-test/src/test/java/org/apache/samza/job/standalone/controller/TestStandaloneZkCoordinatorController.java PRE-CREATION
  settings.gradle bb07a3b84b14dcef94da1bb166eab6aa3d0026bb

Diff: https://reviews.apache.org/r/31392/diff/


Testing
---


Thanks,

Chris Riccomini



Re: Review Request 31718: Samza 465 [Draft]

2015-03-06 Thread Chris Riccomini

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31718/#review75489
---


We should definitely open a follow-on JIRA to convert Util to Java, and remove 
UtilJ.

I can't find the UtilJ file. CheckpointManager seems to be deleted. This RB seems 
broken.

Overall, looks good. Will need to see the rest of the files, though.


samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala
<https://reviews.apache.org/r/31718/#comment122594>

Nit: white space.



samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala
<https://reviews.apache.org/r/31718/#comment122595>

Nit: white space.



samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala
<https://reviews.apache.org/r/31718/#comment122596>

Config required?



samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala
<https://reviews.apache.org/r/31718/#comment122567>

Config required as a param?



samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala
<https://reviews.apache.org/r/31718/#comment122599>

I'm confused. The RB shows CheckpointManager is fully deleted. It seems 
like this diff might be missing some stuff.

Does CheckpointManager need to be started before it's used?



samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemFactory.scala
<https://reviews.apache.org/r/31718/#comment122600>

Does this comment make sense anymore?



samza-api/src/main/java/org/apache/samza/container/TaskName.java
<https://reviews.apache.org/r/31718/#comment122565>

Javadoc.

Also, more clarity. I think what you're trying to say is that the class 
should never have anything other than taskName in it.



samza-core/src/main/java/org/apache/samza/changelog/ChangelogManager.java
<https://reviews.apache.org/r/31718/#comment122566>

Javadoc.

More logging in this class, please.



samza-core/src/main/java/org/apache/samza/changelog/ChangelogManager.java
<https://reviews.apache.org/r/31718/#comment122569>

SLF4J style: log.debug("Adding taskName {} to {}", taskName, this).



samza-core/src/main/java/org/apache/samza/changelog/ChangelogManager.java
<https://reviews.apache.org/r/31718/#comment122570>

Does this need to be public?



samza-core/src/main/java/org/apache/samza/changelog/ChangelogManager.java
<https://reviews.apache.org/r/31718/#comment122568>

Style: Brackets if() {}



samza-core/src/main/java/org/apache/samza/changelog/ChangelogManager.java
<https://reviews.apache.org/r/31718/#comment122571>

CheckpointManager?



samza-core/src/main/java/org/apache/samza/changelog/ChangelogManager.java
<https://reviews.apache.org/r/31718/#comment122572>

CheckpointManager?



samza-core/src/main/java/org/apache/samza/changelog/ChangelogManager.java
<https://reviews.apache.org/r/31718/#comment122574>

Oddly, I can't seem to find the 
CoordinatorStreamMessage.SetChangelogMapping class anywhere. I checked the full 
RB (not just the 1-2 diff), but couldn't find it. Where am I missing this?



samza-core/src/main/java/org/apache/samza/job/model/TaskModel.java
<https://reviews.apache.org/r/31718/#comment122576>

Nit: getCheckpointedOffsets



samza-core/src/main/java/org/apache/samza/job/model/TaskModel.java
<https://reviews.apache.org/r/31718/#comment122577>

Generated by IntelliJ, right?



samza-core/src/main/java/org/apache/samza/job/model/TaskModel.java
<https://reviews.apache.org/r/31718/#comment122575>

This is all auto-generated from IntelliJ, right?



samza-core/src/main/java/org/apache/samza/serializers/model/JsonTaskModelMixIn.java
<https://reviews.apache.org/r/31718/#comment122579>

Nit: getCheckpointedOffsets



samza-core/src/main/java/org/apache/samza/serializers/model/SamzaObjectMapper.java
<https://reviews.apache.org/r/31718/#comment122580>

Style.



samza-core/src/main/scala/org/apache/samza/checkpoint/OffsetManager.scala
<https://reviews.apache.org/r/31718/#comment122586>

Collapse these two lines using idiomatic scala:

val lastProcessedOffsets = previousCheckpointedOffsets.filter { case 
(systemStreamPartition, offset) => Option(offset).isDefined }



samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala
<https://reviews.apache.org/r/31718/#comment122589>

nit: Whitespace.



samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala
<https://reviews.apache.org/r/31718/#comment122590>

Nit: white space.



samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala
<https://reviews.apache.org/r/31718/#comment122592>

Idiomatic scala, please. I think

Re: How to get Container ID from configuration

2015-03-05 Thread Chris Riccomini
Hey Jae,

Ah, yes, you are correct. 0.8.0 has container names, not IDs. In YARN,
these names will be the YARN container names. I'm not sure what they're set
to in Mesos--I'd have to check the Mesos JobRunner stuff.
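
For illustration, a minimal sketch of a lookup that works whether the ID or only the name is set. The class, the method, and the two string values are assumptions: I believe ShellCommandConfig.ENV_CONTAINER_ID and ENV_CONTAINER_NAME resolve to these names, but check the constants in the Samza version you run. The environment is passed in as a Map so the logic can be tried without a running container.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper, not part of Samza. */
final class ContainerIdentitySketch {
    // Assumed values of the ShellCommandConfig constants; verify for your version.
    static final String ENV_CONTAINER_ID = "SAMZA_CONTAINER_ID";
    static final String ENV_CONTAINER_NAME = "SAMZA_CONTAINER_NAME";

    /** Returns the container ID if present, else the container name, else empty. */
    static Optional<String> containerIdentity(Map<String, String> env) {
        String id = env.get(ENV_CONTAINER_ID);
        if (id != null && !id.isEmpty()) {
            return Optional.of(id);
        }
        String name = env.get(ENV_CONTAINER_NAME);
        if (name != null && !name.isEmpty()) {
            return Optional.of(name);
        }
        return Optional.empty();
    }
}
```

In a task you would call ContainerIdentitySketch.containerIdentity(System.getenv()) and handle the empty case yourself, for example by falling back to the process ID.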

Cheers,
Chris

On Thu, Mar 5, 2015 at 5:47 PM, Bae, Jae Hyeon  wrote:

> ENV_CONTAINER_ID does not appear to exist in 0.8.0. I will try
> ENV_CONTAINER_NAME.
>
> On Thu, Mar 5, 2015 at 5:09 PM, Chris Riccomini 
> wrote:
>
>> Hey Jae,
>>
>> Also, this should work:
>>
>>   System.getenv(ShellCommandConfig.ENV_CONTAINER_ID).toInt
>>
>> This is actually how SamzaContainer gets its containerId variable. BTW,
>> I've opened:
>>
>> https://issues.apache.org/jira/browse/SAMZA-586
>>
>> To track exposing SamzaContainerContext.
>>
>> Cheers,
>> Chris
>>
>
>


Re: [DISCUSS] Samza 0.9.0 release

2015-03-05 Thread Chris Riccomini
Hey all,

No major objections. I'm moving forward with this. I've moved all 0.9.0
JIRAs to 0.10.0, except for the ones in the above list. Here's a list of
all open 0.9.0 bugs:

https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%200.9.0%20ORDER%20BY%20key%20ASC%2C%20priority%20DESC

Any help with any of these would be GREATLY appreciated. I'd like to see
them closed before the vote.

Cheers,
Chris

On Wed, Mar 4, 2015 at 9:16 AM, Chinmay Soman 
wrote:

> +1 on the plan. Sounds good to me
>
> On Tue, Mar 3, 2015 at 5:03 PM, Yan Fang  wrote:
>
> > Yes, it sounds good. I asked because I saw the "patch available"
> > status. :)
> >
> > Cheers,
> >
> > Fang, Yan
> > yanfang...@gmail.com
> > +1 (206) 849-4108
> >
> > On Tue, Mar 3, 2015 at 12:18 PM, Chris Riccomini 
> > wrote:
> >
> > > Hey Yan,
> > >
> > > I want to hold off on SAMZA-448 until 0.10.0. The reason is that I
> don't
> > > want both a coordinator stream and a checkpoint stream. It just seems
> > ugly.
> > > Naveen is currently working on unifying these classes in SAMZA-465.
> > Because
> > > these are both big changes, I wanted to burn them in on master before
> > > cutting a release. Does that sound good?
> > >
> > > Cheers,
> > > Chris
> > >
> > > On Tue, Mar 3, 2015 at 11:26 AM, Yan Fang 
> wrote:
> > >
> > > > What about SAMZA-448 ?
> > > >
> > > > Agreed on the release vote date. I think it is a good idea to have a
> > > > release before the ApacheCon (April 16-3-17), which can help spread
> the
> > > > > word.
> > > >
> > > > Cheers,
> > > >
> > > > Fang, Yan
> > > > yanfang...@gmail.com
> > > > +1 (206) 849-4108
> > > >
> > > > On Tue, Mar 3, 2015 at 8:01 AM, Chris Riccomini <
> criccom...@apache.org
> > >
> > > > wrote:
> > > >
> > > > > Hey all,
> > > > >
> > > > > We haven't done a release in a couple of months, and I think master
> > is
> > > > > getting to the point where it's got enough work to warrant a
> release.
> > > > We've
> > > > > (LinkedIn) begun running a couple of our jobs against `master` last
> > > week,
> > > > > and will continue to test for stability.
> > > > >
> > > > > Here are the JIRAs that I think we should try and get done before
> > 0.9.0
> > > > is
> > > > > released (sorted in priority order):
> > > > >
> > > > > SAMZA-458
> > > > > Close in KafkaSystemProducer should flush all source buffers
> > > > > SAMZA-497
> > > > > Add MetricsSnapshotSerde back
> > > > > SAMZA-543
> > > > > Disable WAL in RocksDB KV store
> > > > > SAMZA-567
> > > > > Can't interact with KV store from InitiableTask.init()
> > > > > SAMZA-506
> > > > > Shutdown container on SIGTERM
> > > > > SAMZA-571
> > > > > Add task interface suppress uncaught exceptions
> > > > > SAMZA-576
> > > > > Transient failures in
> > > > > TestKafkaSystemAdmin.testShouldGetOldestNewestAndNextOffsets
> > > > > SAMZA-505
> > > > > CachedStore doesn't support Array keys well
> > > > >
> > > > > Here's what I propose:
> > > > >
> > > > > 1. Cut an 0.9.0 branch.
> > > > > 2. Work on getting as many of these JIRAs done as possible.
> > > > > 3. Target a release vote on the week of March 23.
> > > > >
> > > > > Thoughts?
> > > > >
> > > > > Cheers,
> > > > > Chris
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Thanks and regards
>
> Chinmay Soman
>


Re: How to get Container ID from configuration

2015-03-05 Thread Chris Riccomini
Hey Jae,

Also, this should work:

  System.getenv(ShellCommandConfig.ENV_CONTAINER_ID).toInt

This is actually how SamzaContainer gets its containerId variable. BTW,
I've opened:

https://issues.apache.org/jira/browse/SAMZA-586

To track exposing SamzaContainerContext.

Cheers,
Chris


Re: How to get Container ID from configuration

2015-03-05 Thread Chris Riccomini
Hey Jae,

We expose this in SamzaContainerContext. Unfortunately, this class never
gets exposed to the StreamTask. The plan was to expose this class via
TaskContext, but we haven't done that yet.

Have you tried using System.getProperty("samza.container.id")?

Cheers,
Chris

On Thu, Mar 5, 2015 at 4:41 PM, Bae, Jae Hyeon  wrote:

> Hi Samza devs
>
> How can I get container id such as 0, 1, 2, ... up to yarn.container.count?
>
> I tried to get some environment variable but it didn't work well.
>
> If it's not easy to get the container id, I have to use process id to
> distinguish containers from the same job running in the same machine but
> container id will be more useful than process id.
>
> Thank you
> Best, Jae
>


Re: [DISCUSS] Samza 0.9.0 release

2015-03-03 Thread Chris Riccomini
Hey Yan,

I want to hold off on SAMZA-448 until 0.10.0. The reason is that I don't
want both a coordinator stream and a checkpoint stream. It just seems ugly.
Naveen is currently working on unifying these classes in SAMZA-465. Because
these are both big changes, I wanted to burn them in on master before
cutting a release. Does that sound good?

Cheers,
Chris

On Tue, Mar 3, 2015 at 11:26 AM, Yan Fang  wrote:

> What about SAMZA-448 ?
>
> Agreed on the release vote date. I think it is a good idea to have a
> release before the ApacheCon (April 16-3-17), which can help spread the
> word.
>
> Cheers,
>
> Fang, Yan
> yanfang...@gmail.com
> +1 (206) 849-4108
>
> On Tue, Mar 3, 2015 at 8:01 AM, Chris Riccomini 
> wrote:
>
> > Hey all,
> >
> > We haven't done a release in a couple of months, and I think master is
> > getting to the point where it's got enough work to warrant a release.
> We've
> > (LinkedIn) begun running a couple of our jobs against `master` last week,
> > and will continue to test for stability.
> >
> > Here are the JIRAs that I think we should try and get done before 0.9.0
> is
> > released (sorted in priority order):
> >
> > SAMZA-458
> > Close in KafkaSystemProducer should flush all source buffers
> > SAMZA-497
> > Add MetricsSnapshotSerde back
> > SAMZA-543
> > Disable WAL in RocksDB KV store
> > SAMZA-567
> > Can't interact with KV store from InitiableTask.init()
> > SAMZA-506
> > Shutdown container on SIGTERM
> > SAMZA-571
> > Add task interface suppress uncaught exceptions
> > SAMZA-576
> > Transient failures in
> > TestKafkaSystemAdmin.testShouldGetOldestNewestAndNextOffsets
> > SAMZA-505
> > CachedStore doesn't support Array keys well
> >
> > Here's what I propose:
> >
> > 1. Cut an 0.9.0 branch.
> > 2. Work on getting as many of these JIRAs done as possible.
> > 3. Target a release vote on the week of March 23.
> >
> > Thoughts?
> >
> > Cheers,
> > Chris
> >
>


Re: Stream-Stream joins - restricting the size of the KV store

2015-03-03 Thread Chris Riccomini
Hey Kishore,

Right now, that has to be done by the application code. Our plan is to
support a TTL in RocksDB once that feature is available to us in the Java
JNI bindings (should be soon). In the meantime, you'll have to implement
the WindowableTask.window() method, and call store.all() to periodically
clear out old keys.
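
As a rough sketch of that purge (an illustration under assumptions, not Samza code): a plain java.util.Map stands in for the KV store, and Timestamped is a hypothetical value wrapper that records when each entry was written. With the real store you would walk the iterator returned by store.all(), collect the expired keys, close the iterator, and then delete those keys.

```java
import java.util.Iterator;
import java.util.Map;

/** Hypothetical sketch; the Map stands in for the task's KV store. */
final class TtlPurgeSketch {
    /** Hypothetical value wrapper: the payload plus the time it was written. */
    static final class Timestamped {
        final String payload;
        final long writtenAtMs;

        Timestamped(String payload, long writtenAtMs) {
            this.payload = payload;
            this.writtenAtMs = writtenAtMs;
        }
    }

    /**
     * Meant to be called from window(): scans every entry and removes those
     * whose age exceeds ttlMs. Returns the number of entries removed.
     */
    static int purgeExpired(Map<String, Timestamped> store, long nowMs, long ttlMs) {
        int removed = 0;
        Iterator<Map.Entry<String, Timestamped>> it = store.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMs - it.next().getValue().writtenAtMs > ttlMs) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

Note that this is a full scan on every window, so the window interval should be chosen with the store size in mind.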

Cheers,
Chris

On Tue, Mar 3, 2015 at 9:08 AM, Kishore N C  wrote:

> Hi all,
>
> I just read through this post
> <
> http://samza.apache.org/learn/documentation/0.7.0/container/state-management.html#stream-stream-join
> >
> about implementing streaming joins in Samza using a KV store that Samza
> provides. Can someone tell me how I can ensure that this KV storage does
> not grow too large (even though I know that this is on disk)? Does Samza
> provide a way to purge older keys, or is that expected to be done by the
> application code?
>
> Thanks.
>
> KN.
>


[DISCUSS] Samza 0.9.0 release

2015-03-03 Thread Chris Riccomini
Hey all,

We haven't done a release in a couple of months, and I think master is
getting to the point where it's got enough work to warrant a release. We've
(LinkedIn) begun running a couple of our jobs against `master` last week,
and will continue to test for stability.

Here are the JIRAs that I think we should try and get done before 0.9.0 is
released (sorted in priority order):

SAMZA-458
Close in KafkaSystemProducer should flush all source buffers
SAMZA-497
Add MetricsSnapshotSerde back
SAMZA-543
Disable WAL in RocksDB KV store
SAMZA-567
Can't interact with KV store from InitiableTask.init()
SAMZA-506
Shutdown container on SIGTERM
SAMZA-571
Add task interface suppress uncaught exceptions
SAMZA-576
Transient failures in
TestKafkaSystemAdmin.testShouldGetOldestNewestAndNextOffsets
SAMZA-505
CachedStore doesn't support Array keys well

Here's what I propose:

1. Cut an 0.9.0 branch.
2. Work on getting as many of these JIRAs done as possible.
3. Target a release vote on the week of March 23.

Thoughts?

Cheers,
Chris


Re: Apache Samza Meetup - March 4 @6PM hosted at LinkedIn's campus in Mountain View CA

2015-03-02 Thread Chris Riccomini
Hey all,

Replying to this message in case anyone missed it. Appears that GMail
thinks Ed is spam. :)

Cheers,
Chris

On Mon, Mar 2, 2015 at 9:58 AM, Ed Yakabosky <
eyakabo...@linkedin.com.invalid> wrote:

> Hi all -
>
> I would like to announce the first Bay Area Apache Samza Meetup<
> http://www.meetup.com/Bay-Area-Samza-Meetup/events/220354853/> hosted at
> LinkedIn in Mountain View, CA on March 4, 2015 @6PM.  We plan to host the
> event every 2-months to encourage knowledge sharing & collaboration in
> Samza’s usage<http://wiki.apache.org/samza/PoweredBy> and open source<
> http://samza.apache.org/> community.<http://samza.apache.org/>
>
> The agenda for the meetup is:
>
>   *   6:00 – 6:15PM: Doors open, sign NDAs, networking, food & drinks
>   *   6:15 - 6:45PM: Naveen Somasundaram (LinkedIn) – Getting Started with
> Apache Samza
>   *   6:45 - 7:30PM: Shekar Tippur (Intuit) – Powering Contextual Alerts
> for Intuit’s Operations Center with Apache Samza
>   *   7:30 - 8:00PM: Chris Riccomini (LinkedIn) – Where we’re headed next:
> Apache Samza Roadmap
>
> We plan to provide food & drinks so please RSVP here<
> http://www.meetup.com/Bay-Area-Samza-Meetup/events/220354853/> to help us
> with estimation.  Please let me know if you have any questions or ideas for
> future meet ups.
>
> We plan to announce a live stream the day of the event for remote
> attendance.
>
> Excited to see you there!
> Ed Yakabosky
>
> [BCC:
> Kafka Open Source
> Samza Open Source
> LinkedIn’s DDS and DAI teams
> Linkedin’s Samza customers
> Tech-Talk]
>


Review Request 31649: SAMZA-584

2015-03-02 Thread Chris Riccomini

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31649/
---

Review request for samza.


Bugs: SAMZA-584
https://issues.apache.org/jira/browse/SAMZA-584


Repository: samza


Description
---

fixing race condition


Diffs
-

  
samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemConsumer.scala
 38117e2193dd99fe08f0be9d7736cb89575c0723 

Diff: https://reviews.apache.org/r/31649/diff/


Testing
---


Thanks,

Chris Riccomini



Re: Stream SQL for Samza Query Language Guide and Design

2015-03-02 Thread Chris Riccomini
Hey all,

Just closing the loop on this. I've migrated the wiki to:

  https://cwiki.apache.org/confluence/display/SAMZA/Apache+Samza

The website has been updated accordingly.

Cheers,
Chris

On Wed, Feb 18, 2015 at 2:45 PM, Chris Riccomini 
wrote:

> Hey all,
>
> I've set up a request with INFRA to migrate to CWiki:
>
> https://issues.apache.org/jira/browse/INFRA-9176
>
> Cheers,
> Chris
>
> On Tue, Feb 17, 2015 at 9:57 AM, Yi Pan  wrote:
>
>> I shared the same pain with wiki before. Either cwiki or Markdown sounds
>> good to me.
>>
>> -Yi
>>
>> On Tue, Feb 17, 2015 at 9:53 AM, Chris Riccomini 
>> wrote:
>>
>> > Hey Milinda,
>> >
>> > Yea, I agree. Confluence is better than Moin Moin. If others agree, I
>> think
>> > we should just switch to Confluence.
>> >
>> > So, shall we block on Wiki upgrade? Probably will take a couple of days.
>> > What do others think about migrating to Confluence for wiki?
>> >
>> > Cheers,
>> > Chris
>> >
>> > On Tue, Feb 17, 2015 at 9:48 AM, Milinda Pathirage <
>> mpath...@umail.iu.edu>
>> > wrote:
>> >
>> > > Hi Chris,
>> > >
>> > > I am okay with Markdown.
>> > >
>> > > But we have the option of using cwiki (https://cwiki.apache.org). I
>> have
>> > > used it in the past for Axis project. It's pretty flexible, with
>> respect
>> > to
>> > > user management and I think its better than wiki deployed at
>> > > wiki.apache.org.
>> > > For an example, you can give permission to a user who is not a
>> committer
>> > > (of the project that owns the space) to edit certain pages in a space
>> > (Each
>> > > project can have its own space). Also it has commenting.
>> > >
>> > > On the other hand, JIRA may be  better  because we can track
>> everything
>> > in
>> > > a single place.
>> > >
>> > > Thanks
>> > > Milinda
>> > >
>> > >
>> > >
>> > > On Tue, Feb 17, 2015 at 12:16 PM, Chris Riccomini <
>> criccom...@apache.org
>> > >
>> > > wrote:
>> > >
>> > > > Hey Milinda,
>> > > >
>> > > > We can do wiki, but Apache's wiki is pretty locked down, sadly.
>> You'll
>> > > have
>> > > > to get an account for it.
>> > > >
>> > > > What if we just convert it to Markdown (like SAMZA-516), and attach
>> to
>> > > the
>> > > > patch? This will make it versioned, and editable. Comments can
>> happen
>> > in
>> > > > the JIRA. I'm OK either way, but the wiki has been a little
>> cumbersome
>> > in
>> > > > the past.
>> > > >
>> > > > Cheers,
>> > > > Chris
>> > > >
>> > > > On Mon, Feb 16, 2015 at 1:59 PM, Milinda Pathirage <
>> > > mpath...@umail.iu.edu>
>> > > > wrote:
>> > > >
>> > > > > Hi Chris and Yi,
>> > > > >
>> > > > > Most of the query language definitions  and examples in
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://issues.apache.org/jira/secure/attachment/12690583/StreamSQLforSAMZA-v0.1.docx.docx
>> > > > > (document attached to
>> > https://issues.apache.org/jira/browse/SAMZA-390)
>> > > > are
>> > > > > no longer valid because we moved to extended SQL  supported in
>> > Calcite.
>> > > > So
>> > > > > we need to change the document to reflect the changes.
>> > > > >
>> > > > > Also I found that, having a document with several basic stream SQL
>> > > > samples
>> > > > > will be useful when developing the query planning and physical
>> plan
>> > > > > generation logic.
>> > > > >
>> > > > > How about moving above document to a wiki page or somewhere we
>> can do
>> > > > > shared editing?
>> > > > >
>> > > > > Thanks
>> > > > > Milinda
>> > > > >
>> > > > > --
>> > > > > Milinda Pathirage
>> > > > >
>> > > > > PhD Student | Research Assistant
>> > > > > School of Informatics and Computing | Data to Insight Center
>> > > > > Indiana University
>> > > > >
>> > > > > twitter: milindalakmal
>> > > > > skype: milinda.pathirage
>> > > > > blog: http://milinda.pathirage.org
>> > > > >
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Milinda Pathirage
>> > >
>> > > PhD Student | Research Assistant
>> > > School of Informatics and Computing | Data to Insight Center
>> > > Indiana University
>> > >
>> > > twitter: milindalakmal
>> > > skype: milinda.pathirage
>> > > blog: http://milinda.pathirage.org
>> > >
>> >
>>
>
>


Re: Review Request 31417: Samza 465 [Draft]

2015-03-01 Thread Chris Riccomini

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31417/#review74715
---


Hey Naveen, could you please re-post this in two stages? First, do an `rbt 
post` with just my changes, then a second with yours? This way, I can diff 
between my changes, and yours, to see just what you've done. I don't want to 
get things tangled between SAMZA-448 and SAMZA-465.

- Chris Riccomini


On Feb. 25, 2015, 5:45 p.m., Naveen Somasundaram wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31417/
> ---
> 
> (Updated Feb. 25, 2015, 5:45 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> The patch does the following (Other than Chris's previous changes):
> 1. Removes all checkpointmanagers specific to the system and replaces them with 
> a single entity that uses coordinator stream to do the checkpointing
> 2. Decoupled change log from checkpoint manager, a new component called 
> changelog manager will do that.
> 3. Passes the checkpoint information to the containers from the jobcoordinator
> 4. Modifies the offsetmanager to use the new checkpoint manager and starting 
> offsets from jobcoordinator.
> 
> Pending Issues:
> Please note that this is a draft patch (unit tests pass, but I still need to 
> run the Zopkio integration test). I also need to do some cosmetic checks and 
> add more docs after testing.
> 1. The failure scenario (when the container fails and the jobcoordinator 
> serving the latest offsets) has still not been addressed, I am working on it 
> now. 
> 2. The metrics registry is right now not used correctly, I need to pass the 
> right reference to it
> 3. The coordinator stream does not use the same consumer and producer from 
> the "systemconsumers" and "systemproducers" we have.
> 
> 
> Diffs
> -
> 
>   build.gradle 38383bd9e3f0847d6088a4ea4c1ee6f3dcd1e430 
>   samza-api/src/main/java/org/apache/samza/checkpoint/Checkpoint.java 
> 593d11872430100e000c7d4b6edc5ef29649d8d4 
>   samza-api/src/main/java/org/apache/samza/checkpoint/CheckpointManager.java 
> 092cb910b40d312217e86420bf1ddfbaf605e9e5 
>   
> samza-api/src/main/java/org/apache/samza/checkpoint/CheckpointManagerFactory.java
>  a97ff0919d8205928efee1a2a20780659180849d 
>   samza-api/src/main/java/org/apache/samza/container/TaskName.java 
> 083358686fc69ab45bbc73e898f419224ebc3a9f 
>   samza-api/src/main/java/org/apache/samza/system/SystemAdmin.java 
> 8995ba30c823bddcdfd3af7100e1440e71ef7998 
>   samza-api/src/main/java/org/apache/samza/task/TaskCoordinator.java 
> 6ff1a555f3c48e416bb78e94c5df71eff0a27f3d 
>   
> samza-api/src/main/java/org/apache/samza/util/SinglePartitionWithoutOffsetsSystemAdmin.java
>  01997ae22641b735cd452a0e89a49219e2874892 
>   samza-core/src/main/java/org/apache/samza/changelog/ChangelogManager.java 
> PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/job/model/TaskModel.java 
> eb22d2ec5f09ca59790e2871d9bff9745fe925dc 
>   
> samza-core/src/main/java/org/apache/samza/serializers/model/JsonTaskModelMixIn.java
>  7dc431c74a3fc2ba80eb47d6c5d87524cb4c9bde 
>   
> samza-core/src/main/java/org/apache/samza/serializers/model/SamzaObjectMapper.java
>  3517912eaafbf95f8c8cc70ab5869548a56b76e7 
>   samza-core/src/main/scala/org/apache/samza/checkpoint/CheckpointTool.scala 
> ddc30af7c52d8a4d5c5de02f6757c040b1f31c93 
>   samza-core/src/main/scala/org/apache/samza/checkpoint/OffsetManager.scala 
> a40c87fa7865746a5612c55a4cf24c8d005be7e0 
>   
> samza-core/src/main/scala/org/apache/samza/checkpoint/file/FileSystemCheckpointManager.scala
>  2a87a6e0cef72179b5383fc824266de1f9606293 
>   samza-core/src/main/scala/org/apache/samza/config/JobConfig.scala 
> 3b6685e00837a4aaf809813e62b7e52823bc07a9 
>   samza-core/src/main/scala/org/apache/samza/config/ShellCommandConfig.scala 
> 1a2dd4413f56e53dbeeb47b5637d7b0c50522f02 
>   samza-core/src/main/scala/org/apache/samza/config/StreamConfig.scala 
> adef09e15c666cb2dbb2e4c5507fc2d605b82a1e 
>   samza-core/src/main/scala/org/apache/samza/config/TaskConfig.scala 
> 1ca9e2cc5673c537b6a48224809847e94da81fca 
>   samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
> 8a6d8656c14ad9c7cc7b5d9a39f1f733afd71a88 
>   samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala 
> c14f2f623bb4bae911dd3085ce428175930e4545 
>   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 
> 16345cd1c
