Re: [VOTE] 2.4.1 RC0

2020-03-04 Thread Eno Thereska
Hi Bill,

I built from source and ran unit and integration tests. They passed.
There was a large number of skipped tests, but I'm assuming that is
intentional.

Cheers
Eno

On Tue, Mar 3, 2020 at 8:42 PM Eric Lalonde  wrote:
>
> Hi,
>
> I ran:
> $ https://github.com/elalonde/kafka/blob/master/bin/verify-kafka-rc.sh 2.4.1 https://home.apache.org/~bbejeck/kafka-2.4.1-rc0
>
> All checksums and signatures are good and all unit and integration tests that 
> were executed passed successfully.
>
> - Eric
>
> > On Mar 2, 2020, at 6:39 PM, Bill Bejeck  wrote:
> >
> > Hello Kafka users, developers and client-developers,
> >
> > This is the first candidate for release of Apache Kafka 2.4.1.
> >
> > This is a bug fix release and it includes fixes and improvements from 38
> > JIRAs, including a few critical bugs.
> >
> > Release notes for the 2.4.1 release:
> > https://home.apache.org/~bbejeck/kafka-2.4.1-rc0/RELEASE_NOTES.html
> >
> > *Please download, test and vote by Thursday, March 5, 9 am PT*
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > https://kafka.apache.org/KEYS
> >
> > * Release artifacts to be voted upon (source and binary):
> > https://home.apache.org/~bbejeck/kafka-2.4.1-rc0/
> >
> > * Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >
> > * Javadoc:
> > https://home.apache.org/~bbejeck/kafka-2.4.1-rc0/javadoc/
> >
> > * Tag to be voted upon (off 2.4 branch) is the 2.4.1 tag:
> > https://github.com/apache/kafka/releases/tag/2.4.1-rc0
> >
> > * Documentation:
> > https://kafka.apache.org/24/documentation.html
> >
> > * Protocol:
> > https://kafka.apache.org/24/protocol.html
> >
> > * Successful Jenkins builds for the 2.4 branch:
> > Unit/integration tests: Links to successful unit/integration test build to
> > follow
> > System tests:
> > https://jenkins.confluent.io/job/system-test-kafka/job/2.4/152/
> >
> >
> > Thanks,
> > Bill Bejeck
>


Operationalizing Zookeeper and common gotchas

2019-03-18 Thread Eno Thereska
Hi folks,

The team here has come up with a couple of clarifying tips for
operationalizing Zookeeper for Kafka that we found missing from the
official documentation, and passed them along to share. If you find them
useful, I'm thinking of putting on
https://cwiki.apache.org/confluence/display/KAFKA/FAQ. Meanwhile any
feedback is appreciated.

---
Operationalizing Zookeeper FAQ

The discussion below uses a 3-instance Zookeeper cluster as an example. The
findings apply to a larger cluster as well, but you’ll need to adjust the
numbers.

- Does it make sense to have a config with only 2 Zookeeper instances? I.e.,
the zookeeper.properties file has entries for server 1 and server 2 only.
A: No. A setup with 2 Zookeeper instances is not fault tolerant to even 1
failure. If one of the Zookeeper instances fails, the remaining one will not
be functional, since there is no quorum majority (1 out of 2 is not a
majority). If you run a “stat” command on that remaining instance you’ll see
the output “This ZooKeeper instance is not currently serving requests”.
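
For reference, a healthy 3-instance ensemble config looks roughly like the
following (hostnames and paths below are placeholders):

    # zookeeper.properties on each of the three instances (sketch)
    dataDir=/var/lib/zookeeper
    clientPort=2181
    tickTime=2000
    initLimit=5
    syncLimit=2
    server.1=zk1.example.com:2888:3888
    server.2=zk2.example.com:2888:3888
    server.3=zk3.example.com:2888:3888

    # plus a myid file on each instance containing just that server's number, e.g.
    #   echo 1 > /var/lib/zookeeper/myid
    # the "stat" check mentioned above:
    #   echo stat | nc zk1.example.com 2181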

- What if you end up with only 2 running Zookeeper instances, e.g., you
started with 3 but one failed? Isn’t that the same as the case above? A: No,
it’s not the same scenario. First of all, the 3-instance setup did
tolerate 1 instance down. The 2 remaining Zookeeper instances will continue
to function because the quorum majority (2 out of 3) is there.

- I had a 3 Zookeeper instance setup and one instance just failed. How
should I recover? A: Restart the failed instance with the same
configuration it had before (i.e., same “myid” ID file, and same IP
address). It is not important to recover the data volume of the failed
instance, but it is a bonus if you do so. Once the instance comes up, it
will sync with the other 2 Zookeeper instances and get all the data.

- I had a 3 Zookeeper instance setup and two instances failed. How should I
recover? Is my Zookeeper cluster even running at that point? A: First of
all, ZooKeeper is now unavailable and the remaining instance will show
“This ZooKeeper instance is not currently serving requests” if probed.
Second, you should make sure this situation is extremely rare. It should be
possible to recover the first failed instance quickly before the second
instance fails. Third, bring up the two failed instances one by one without
changing anything in their config. Similar to the case above, it is not
important to recover the data volumes of the failed instances, but it is a
bonus if you do so. Once each instance comes up, it will sync with the
remaining ZooKeeper instance and get all the data.

- I had a 3 Zookeeper instance setup and two instances failed. I can’t
recover the failed instances for whatever reason. What should I do? A: You
will have to restart the remaining healthy ZooKeeper in “standalone” mode
and restart all the brokers, pointing them to this standalone ZooKeeper
(instead of all 3 ZooKeepers).
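
Concretely, that means something along these lines (hostnames below are
placeholders):

    # zookeeper.properties on the surviving instance, restarted in standalone mode
    dataDir=/var/lib/zookeeper
    clientPort=2181
    # no server.N entries -> standalone mode

    # and in every broker's server.properties, point only at that instance:
    zookeeper.connect=zk3.example.com:2181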

- The Zookeeper cluster is unavailable (for any of the reasons mentioned
above, e.g., no quorum, all instances have failed). What is the impact on
Kafka applications producing/consuming? What is the impact on admin tools
used to manage topics and the cluster? What is the impact on brokers? A:
Applications will be able to continue producing and consuming, at least for
a while. This is true if the ZooKeeper cluster is temporarily unavailable
but eventually becomes available (after a few minutes). On the other hand,
if the ZooKeeper cluster is permanently unavailable, then applications will
slowly start to see problems with producing/consuming, especially if some
brokers fail, because partition leadership will not be moved to other
brokers. Taking one extreme, if the ZooKeeper cluster is down for a month,
it is very likely that applications will get produce/consume errors. Admin
tools (e.g., those that create topics, set ACLs or change configs) will not
work. Brokers will not be impacted by Zookeeper being unavailable; they will
periodically try to reconnect to the ZooKeeper cluster. If you take care to
use the same IP address for a recovered Zookeeper instance as it had before
it failed, brokers will not need to be restarted.
--

Cheers,
Eno


Re: [VOTE] 2.1.1 RC2

2019-02-09 Thread Eno Thereska
+1 passes unit + integration tests. Eno

On Fri, Feb 8, 2019 at 11:10 PM Magnus Edenhill  wrote:

> +1
>
> Passes librdkafka test suite.
>
> Den fre 8 feb. 2019 kl 21:02 skrev Colin McCabe :
>
> > Hi all,
> >
> > This is the third candidate for release of Apache Kafka 2.1.1.  This
> > release includes many bug fixes for Apache Kafka 2.1.
> >
> > Compared to rc1, this release includes the following changes:
> > * MINOR: release.py: fix some compatibility problems.
> > * KAFKA-7897; Disable leader epoch cache when older message formats are
> > used
> > * KAFKA-7902: Replace original loginContext if SASL/OAUTHBEARER refresh
> > login fails
> > * MINOR: Fix more places where the version should be bumped from 2.1.0 ->
> > 2.1.1
> > * KAFKA-7890: Invalidate ClusterConnectionState cache for a broker if the
> > hostname of the broker changes.
> > * KAFKA-7873; Always seek to beginning in KafkaBasedLog
> > * MINOR: Correctly set dev version in version.py
> >
> > Check out the release notes here:
> > http://home.apache.org/~cmccabe/kafka-2.1.1-rc2/RELEASE_NOTES.html
> >
> > The vote will go until Wednesday, February 13th.
> >
> > * Release artifacts to be voted upon (source and binary):
> > http://home.apache.org/~cmccabe/kafka-2.1.1-rc2/
> >
> > * Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/
> >
> > * Javadoc:
> > http://home.apache.org/~cmccabe/kafka-2.1.1-rc2/javadoc/
> >
> > * Tag to be voted upon (off 2.1 branch) is the 2.1.1 tag:
> > https://github.com/apache/kafka/releases/tag/2.1.1-rc2
> >
> > * Jenkins builds for the 2.1 branch:
> > Unit/integration tests: https://builds.apache.org/job/kafka-2.1-jdk8/
> >
> > Thanks to everyone who tested the earlier RCs.
> >
> > cheers,
> > Colin
> >
>


Re: [VOTE] 2.1.1 RC1

2019-01-30 Thread Eno Thereska
I couldn't repro locally, that was on an m3.large. And it's not happening
anymore. Might be a transient issue.

Thanks,
Eno

On Wed, Jan 30, 2019 at 6:46 PM Colin McCabe  wrote:

> (+all lists)
>
> Hi Eno,
>
> Thanks for testing this.
>
> Those tests passed in the Jenkins build we did here:
> https://builds.apache.org/job/kafka-2.1-jdk8/118/
>
> Perhaps there is an environment issue at play here?  Do you get the same
> failures running those tests on the 2.1 release?
>
> Best,
> Colin
>
> On Wed, Jan 30, 2019, at 09:11, Eno Thereska wrote:
> > Hi Colin,
> >
> > I've been running the tests and so far I get the following failures. Are
> > they known?
> >
> > kafka.server.ReplicaManagerQuotasTest >
> shouldGetBothMessagesIfQuotasAllow
> > FAILED
> > kafka.server.ReplicaManagerQuotasTest >
> > testCompleteInDelayedFetchWithReplicaThrottling FAILED
> > kafka.server.ReplicaManagerQuotasTest >
> > shouldExcludeSubsequentThrottledPartitions FAILED
> > kafka.server.ReplicaManagerQuotasTest >
> > shouldGetNoMessagesIfQuotasExceededOnSubsequentPartitions FAILED
> > kafka.server.ReplicaManagerQuotasTest >
> > shouldIncludeInSyncThrottledReplicas FAILED
> >
> > Thanks
> > Eno
> >
> > On Sun, Jan 27, 2019 at 9:46 PM Colin McCabe  wrote:
> >
> > > Hi all,
> > >
> > > This is the second candidate for release of Apache Kafka 2.1.1.  This
> > > release includes many bug fixes for Apache Kafka 2.1.
> > >
> > > Compared to rc0, this release includes the following changes:
> > > * MINOR: Upgrade ducktape to 0.7.5 (#6197)
> > > * KAFKA-7837: Ensure offline partitions are picked up as soon as
> possible
> > > when shrinking ISR
> > > * tests/kafkatest/__init__.py now contains __version__ = '2.1.1' rather
> > > than '2.1.1.dev0'
> > > * Maven artifacts should be properly staged this time
> > > * I have added my GPG key to https://kafka.apache.org/KEYS
> > >
> > > Check out the release notes here:
> > > http://home.apache.org/~cmccabe/kafka-2.1.1-rc1/RELEASE_NOTES.html
> > >
> > > The vote will go until Friday, February 1st.
> > >
> > > * Release artifacts to be voted upon (source and binary):
> > > http://home.apache.org/~cmccabe/kafka-2.1.1-rc1/
> > >
> > > * Maven artifacts to be voted upon:
> > > https://repository.apache.org/content/groups/staging/
> > >
> > > * Javadoc:
> > > http://home.apache.org/~cmccabe/kafka-2.1.1-rc1/javadoc/
> > >
> > > * Tag to be voted upon (off 2.1 branch) is the 2.1.1 tag:
> > > https://github.com/apache/kafka/releases/tag/2.1.1-rc1
> > >
> > > * Successful Jenkins builds for the 2.1 branch:
> > > Unit/integration tests:
> https://builds.apache.org/job/kafka-2.1-jdk8/118/
> > >
> > > thanks,
> > > Colin
> > >
> >
>


Re: [VOTE] 2.1.1 RC1

2019-01-30 Thread Eno Thereska
Hi Colin,

I've been running the tests and so far I get the following failures. Are
they known?

kafka.server.ReplicaManagerQuotasTest > shouldGetBothMessagesIfQuotasAllow
FAILED
kafka.server.ReplicaManagerQuotasTest >
testCompleteInDelayedFetchWithReplicaThrottling FAILED
kafka.server.ReplicaManagerQuotasTest >
shouldExcludeSubsequentThrottledPartitions FAILED
kafka.server.ReplicaManagerQuotasTest >
shouldGetNoMessagesIfQuotasExceededOnSubsequentPartitions FAILED
kafka.server.ReplicaManagerQuotasTest >
shouldIncludeInSyncThrottledReplicas FAILED

Thanks
Eno

On Sun, Jan 27, 2019 at 9:46 PM Colin McCabe  wrote:

> Hi all,
>
> This is the second candidate for release of Apache Kafka 2.1.1.  This
> release includes many bug fixes for Apache Kafka 2.1.
>
> Compared to rc0, this release includes the following changes:
> * MINOR: Upgrade ducktape to 0.7.5 (#6197)
> * KAFKA-7837: Ensure offline partitions are picked up as soon as possible
> when shrinking ISR
> * tests/kafkatest/__init__.py now contains __version__ = '2.1.1' rather
> than '2.1.1.dev0'
> * Maven artifacts should be properly staged this time
> * I have added my GPG key to https://kafka.apache.org/KEYS
>
> Check out the release notes here:
> http://home.apache.org/~cmccabe/kafka-2.1.1-rc1/RELEASE_NOTES.html
>
> The vote will go until Friday, February 1st.
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~cmccabe/kafka-2.1.1-rc1/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/
>
> * Javadoc:
> http://home.apache.org/~cmccabe/kafka-2.1.1-rc1/javadoc/
>
> * Tag to be voted upon (off 2.1 branch) is the 2.1.1 tag:
> https://github.com/apache/kafka/releases/tag/2.1.1-rc1
>
> * Successful Jenkins builds for the 2.1 branch:
> Unit/integration tests: https://builds.apache.org/job/kafka-2.1-jdk8/118/
>
> thanks,
> Colin
>


Re: Kafka running on AWS - how to retain broker.id on new instance spun-up in-place of instance/broker failed

2018-11-15 Thread Eno Thereska
The general answer depends on what control plane software is taking care of
your Kafka deployment. You probably have a layer that launches Kafka
instances and monitors their health, right? If so, that layer should take
care of the mapping between instances and broker IDs and keep that in a
table persisted somewhere (e.g., DynamoDB).

Eno
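
For what it's worth, here is a rough sketch of what such a layer could write
into each broker's server.properties at launch time (the id value is
hypothetical and would come from your instance-to-id table):

    # server.properties fragment written at instance boot (sketch)
    broker.id=1001                      # looked up from the control-plane table for this instance
    broker.id.generation.enable=false   # don't let the broker auto-generate a new id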

On Wed, Nov 14, 2018 at 7:38 PM Srinivas Rapolu  wrote:

> EBS is one of the options. But we use instance-level storage, where we lose
> all data as soon as a broker fails in AWS.
>
> In such a scenario, does anyone have a better launch script or configuration that can be
> executed on a new broker to retain the old id without conflicting with existing
> broker ids?
>
> On Wed, Nov 14, 2018, 11:58 AM Andrey Dyachkov  wrote:
>
> > You can attach EBS volume, which will store data and metadata(e.g. broker
> > id), and then attach it to the new AWS instance and start Kafka, it will
> > pick the broker id plus you won’t need to rebalance the cluster.
> >
> > On Wed 14. Nov 2018 at 19:48, naresh Goud 
> > wrote:
> >
> > > A static IP. Buying a static IP may help. I am not an AWS expert.
> > >
> > > On Wed, Nov 14, 2018 at 12:47 PM Srinivas Rapolu 
> > > wrote:
> > >
> > > > Hello Kafka experts,
> > > >
> > > > We are running Kafka on AWS; the main question is what is the best way to
> > > > retain the broker.id on a new instance spun up in place of a failed
> > > > instance/broker.
> > > >
> > > > We are currently running Kafka in AWS with broker.id auto-generated.
> > > > But we are having issues when a broker fails: the new broker/instance
> > > > spun up in AWS gets assigned a new broker.id. The issue is, with this
> > > > approach, we need to re-assign the topics/replicas to the new broker
> > > > manually.
> > > >
> > > > We learned that replication can be auto-resolved by Kafka if we can
> > > > manage to get the same broker.id on the new AWS instance spun up
> > > > in place of the failed broker/instance.
> > > >
> > > > I have read we can set broker.id.generation.enable=false, but what is the
> > > > best way to identify and retain the broker.id? Any links/help is
> > > > appreciated.
> > > > Thanks and Regards,
> > > > Cnu
> > > >
> > > --
> > > Thanks,
> > > Naresh
> > > www.linkedin.com/in/naresh-dulam
> > > http://hadoopandspark.blogspot.com/
> > >
> > --
> > Thanks,
> > Andrey
> >
>


Re: [VOTE] 2.1.0 RC1

2018-11-13 Thread Eno Thereska
Built code and ran tests. Getting a single integration test failure:

kafka.log.LogCleanerParameterizedIntegrationTest >
testCleansCombinedCompactAndDeleteTopic[3] FAILED
java.lang.AssertionError: Contents of the map shouldn't change
expected:<Map(0 -> (340,340), 5 -> (345,345), 10 -> (350,350), 14 ->
(354,354), 1 -> (341,341), 6 -> (346,346), 9 -> (349,349), 13 -> (353,353),
2 -> (342,342), 17 -> (357,357), 12 -> (352,352), 7 -> (347,347), 3 ->
(343,343), 18 -> (358,358), 16 -> (356,356), 11 -> (351,351), 8 ->
(348,348), 19 -> (359,359), 4 -> (344,344), 15 -> (355,355))> but
was:<Map(0 -> (340,340), 5 -> (345,345), 10 -> (350,350), 14 -> (354,354),
1 -> (341,341), 6 -> (346,346), 9 -> (349,349), 13 -> (353,353), 2 ->
(342,342), 17 -> (357,357), 12 -> (352,352), 7 -> (347,347), 3 ->
(343,343), 18 -> (358,358), 16 -> (356,356), 11 -> (351,351), 99 ->
(299,299), 8 -> (348,348), 19 -> (359,359), 4 -> (344,344), 15 ->
(355,355))>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:118)
at
kafka.log.LogCleanerParameterizedIntegrationTest.testCleansCombinedCompactAndDeleteTopic(LogCleanerParameterizedIntegrationTest.scala:129)

Thanks
Eno

On Sun, Nov 11, 2018 at 7:34 PM Jonathan Santilli <
jonathansanti...@gmail.com> wrote:

> Hello,
>
> +1
>
> I have downloaded the release artifact from
> http://home.apache.org/~lindong/kafka-2.1.0-rc1/
> Executed a 3-broker cluster. (java8 8u192b12)
> Executed kafka-monitor for about 1 hour without problems.
>
> Thanks,
> --
> Jonathan
>
>
> On Fri, Nov 9, 2018 at 11:33 PM Dong Lin  wrote:
>
> > Hello Kafka users, developers and client-developers,
> >
> > This is the second candidate for feature release of Apache Kafka 2.1.0.
> >
> > This is a major version release of Apache Kafka. It includes 28 new KIPs
> > and critical bug fixes. Please see the Kafka 2.1.0 release plan for more
> > details:
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044
> >
> > Here are a few notable highlights:
> >
> > - Java 11 support
> > - Support for Zstandard, which achieves compression comparable to gzip
> > with higher compression and especially decompression speeds (KIP-110)
> > - Avoid expiring committed offsets for active consumer group (KIP-211)
> > - Provide Intuitive User Timeouts in The Producer (KIP-91)
> > - Kafka's replication protocol now supports improved fencing of zombies.
> > Previously, under certain rare conditions, if a broker became partitioned
> > from Zookeeper but not the rest of the cluster, then the logs of replicated
> > partitions could diverge and cause data loss in the worst case (KIP-320)
> > - Streams API improvements (KIP-319, KIP-321, KIP-330, KIP-353, KIP-356)
> > - Admin script and admin client API improvements to simplify admin
> > operation (KIP-231, KIP-308, KIP-322, KIP-324, KIP-338, KIP-340)
> > - DNS handling improvements (KIP-235, KIP-302)
> >
> > Release notes for the 2.1.0 release:
> > http://home.apache.org/~lindong/kafka-2.1.0-rc0/RELEASE_NOTES.html
> >
> > *** Please download, test and vote by Thursday, Nov 15, 12 pm PT ***
> >
> > * Kafka's KEYS file containing PGP keys we use to sign the release:
> > http://kafka.apache.org/KEYS
> >
> > * Release artifacts to be voted upon (source and binary):
> > http://home.apache.org/~lindong/kafka-2.1.0-rc1/
> >
> > * Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/
> >
> > * Javadoc:
> > http://home.apache.org/~lindong/kafka-2.1.0-rc1/javadoc/
> >
> > * Tag to be voted upon (off 2.1 branch) is the 2.1.0-rc1 tag:
> > https://github.com/apache/kafka/tree/2.1.0-rc1
> >
> > * Documentation:
> > http://kafka.apache.org/21/documentation.html
> >
> > * Protocol:
> > http://kafka.apache.org/21/protocol.html
> >
> > * Successful Jenkins builds for the 2.1 branch:
> > Unit/integration tests: https://builds.apache.org/job/kafka-2.1-jdk8/50/
> >
> > Please test and verify the release artifacts and submit a vote for this RC,
> > or report any issues so we can fix them and get a new RC out ASAP. Although
> > this release vote requires PMC votes to pass, testing, votes, and bug
> > reports are valuable and appreciated from everyone.
> >
> > Cheers,
> > Dong
> >
>
>
> --
> Santilli Jonathan
>


Re: [VOTE] 2.0.1 RC0

2018-11-01 Thread Eno Thereska
Anything else holding this up?

Thanks
Eno

On Thu, Nov 1, 2018 at 10:27 AM Jakub Scholz  wrote:

> +1 (non-binding) ... I used the staged binaries and ran tests with
> different clients.
>
> On Fri, Oct 26, 2018 at 4:29 AM Manikumar 
> wrote:
>
> > Hello Kafka users, developers and client-developers,
> >
> > This is the first candidate for release of Apache Kafka 2.0.1.
> >
> > This is a bug fix release closing 49 tickets:
> > https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+2.0.1
> >
> > Release notes for the 2.0.1 release:
> > http://home.apache.org/~manikumar/kafka-2.0.1-rc0/RELEASE_NOTES.html
> >
> > *** Please download, test and vote by  Tuesday, October 30, end of day
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > http://kafka.apache.org/KEYS
> >
> > * Release artifacts to be voted upon (source and binary):
> > http://home.apache.org/~manikumar/kafka-2.0.1-rc0/
> >
> > * Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/
> >
> > * Javadoc:
> > http://home.apache.org/~manikumar/kafka-2.0.1-rc0/javadoc/
> >
> > * Tag to be voted upon (off 2.0 branch) is the 2.0.1 tag:
> > https://github.com/apache/kafka/releases/tag/2.0.1-rc0
> >
> > * Documentation:
> > http://kafka.apache.org/20/documentation.html
> >
> > * Protocol:
> > http://kafka.apache.org/20/protocol.html
> >
> > * Successful Jenkins builds for the 2.0 branch:
> > Unit/integration tests:
> https://builds.apache.org/job/kafka-2.0-jdk8/177/
> >
> > /**
> >
> > Thanks,
> > Manikumar
> >
>


Re: Problems trying to make kafka 'rack-aware'

2018-09-21 Thread Eno Thereska
Hi Bryan,

I did a simple check with starting a broker with no rack id and then
restarting with a rack id and I can confirm I could get the rack id from
zookeeper after the restart. This was on trunk. Does that basic check work
for you (i.e., without reassigning partitions)?

Thanks
Eno
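
In case it's useful, this is roughly what I checked (using the broker id and
rack from your logs):

    # server.properties
    broker.rack=rack1

    # then, in a ZooKeeper shell session (bin/zookeeper-shell.sh zk-host:2181):
    get /brokers/ids/1234567
    # the registration JSON should include a "rack":"rack1" field; note that rack
    # only appears in broker registration version 3+, i.e. when
    # inter.broker.protocol.version is 0.10.0 or newer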

On Fri, Sep 21, 2018 at 2:07 PM, Bryan Duggan 
wrote:

>
> I didn't get a response to this, but I've been investigating more and can
> now frame the problem slightly differently (hopefully, more accurately).
>
> According to this document
>
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper
>
> Which defines broker data structures in zookeeper, the following is the
> broker schema (from version 0.10 onwards - I am using version 0.11)
>
> { "fields":
> [ {"name": "version", "type": "int", "doc": "version id"},
>   {"name": "host", "type": "string", "doc": "ip address or host name
> of the broker"},
>   {"name": "port", "type": "int", "doc": "port of the broker"},
>   {"name": "jmx_port", "type": "int", "doc": "port for jmx"}
>   {"name": "endpoints", "type": "array", "items": "string", "doc":
> "endpoints supported by the broker"}
>   {"name": "rack", "type": "string", "doc": "Rack of the broker.
> Optional. This will be used in rack aware replication assignment for fault
> tolerance."}
> ]
> }
>
> when I check my broker data in zookeeper (which has a non-null broker.rack
> setting in the properties file), I have the following;
>
> {"endpoints":["PLAINTEXT://x.x.x.x.abcd:9092"],"jmx_port":-1
> ,"host":"x.x.x.x.abc","timestamp":"1537527988341","port":9092,"version":2}
>
> there is no 'rack'.
>
> In the server.log file on my kafka broker I see;
> 
> [2018-09-21 13:00:40,227] INFO KafkaConfig values:
> advertised.host.name = null
> .
> .
> broker.id = 1234567
> broker.rack = rack1
> compression.type = producer
> .
> -
>
> so it looks fine from the broker side. However, when I restart kafka on
> the host, it doesn't load any rack information into zookeeper.
>
> Can someone please confirm to me, if I have rack awareness, should I
> expect to see a value for 'rack' in zookeeper? If so, do I need to do
> something else on the broker side to get it to include it as part of the
> meta-data it writes (as far as I can see it writes the metadata each time
> kafka is restarted).
>
> thanks
> Bryan
>
>
>
>
>
>
>
>
> On 20/09/2018 11:31, Bryan Duggan wrote:
>
>>
>> Hi,
>>
>> I have a kafka cluster consisting of 3 brokers across 3 different AWS
>> availability zones.  It hosts several topics, each of which has a
>> replication factor of 3. The cluster is currently not 'rack-aware'.
>>
>> I am trying to do the following;
>>
>> - add 3 additional brokers (one in each of the 3 AZs)
>>
>> - make the cluster 'rack-aware'. (ie: create 3 racks on a per-AZ
>> basic, each containing 2 brokers)
>>
>> - reassign the topics with the intention of having 1 replica in each
>> of the 3 racks.
>>
>> To achieve this I've added 'broker.rack' to the properties file for each
>> broker. The rack name is the same as the AZ name each broker is in. I've
>> restarted kafka on all brokers (in case that's required for rack-awareness
>> to take effect).
>>
>> Following restart I've attempted to reassign topics across all 6 brokers
>> by running the following;
>>
>> - ./kafka-reassign-partitions.sh --zookeeper $ZK
>> --topics-to-move-json-file topics-to-move.json --broker-list '1,2,3,4,5,6'
>>
>> (where topics-to-move.json is a simple json file containing the topics to
>> reassign)
>>
>> The problem I am having is, after running 'kafka-reassign-partitions.sh'
>> with 6 brokers listed in the broker-list, it doesn't honour
>> rack-awareness, and instead assigns 2 partitions to brokers in a single
>> rack with a 3rd being assigned elsewhere.
>>
>> The version of kafka I am using is 2.11-1.1.1.
>>
>> Any documentation I've read suggests the above should have achieved what
>> I want. However, it is not working as expected.
>>
> Has anyone else made their kafka cluster 'rack-aware'? If so, did you
>> experience any issues doing so?
>>
>> Or, can anyone tell me if there's some step I'm missing to make this work.
>>
>> TIA
>>
>> Bryan
>>
>>
>>
>>
>


Re: Continue to consume messages when exception occurs in Kafka Stream

2017-08-17 Thread Eno Thereska
Hi Duy,

What kind of exception are you getting? With KIP-161 (checked in trunk) we 
allow log-and-skip type exception handlers for deserialization errors: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+deserialization+exception+handlers
 


Is yours a deserialization exception or a higher-level exception in your code? 
From your email I think it's the latter. Could you describe the scenario a bit 
more? We don't currently have a way to skip on the latter type of error, but we would be 
interested to collect more info from users.

Thanks
Eno
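
For reference, on a build that includes KIP-161 the handler is just a streams
config; a minimal sketch (application id and broker address are placeholders):

    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");           // placeholder
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");   // placeholder
    // org.apache.kafka.streams.errors.LogAndContinueExceptionHandler: log a warning
    // and skip the bad record instead of failing the application
    props.put("default.deserialization.exception.handler",
              LogAndContinueExceptionHandler.class.getName());

This only covers deserialization errors, not exceptions thrown by your own
processing code.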

> On Aug 17, 2017, at 5:11 PM, Duy Truong  wrote:
> 
> Hi everyone,
> 
> My kafka stream app has an exception (my business exception), and then it
> doesn't consume messages anymore. Is there any way to make my app continue
> consuming messages when the exception occurs?
> 
> Thanks
> 
> -- 
> *Duy Truong*



Re: Forwarding consumer with kafka streams

2017-08-12 Thread Eno Thereska
Hi Ricardo,

Kafka Streams should handle that case as well. What streams config are you 
using, could you share it? There is one parameter that is called 
“ConsumerConfig.AUTO_OFFSET_RESET_CONFIG” and by default it’s set to 
“earliest”. Any chance your app has changed it to “latest”?

Thanks
Eno
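
For reference, this is the relevant bit of streams config; a minimal sketch
(application id and broker address are placeholders):

    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "forwarding-consumer");  // placeholder
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");       // placeholder
    // when the app has no committed offsets yet, start from the earliest available
    // records so messages already buffered on the source topic are picked up
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

Note this only matters on first start (or after a reset); once offsets are
committed the app resumes from them.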

> On Aug 12, 2017, at 5:13 PM, Ricardo Costa  wrote:
> 
> Hi,
> 
> I've implemented a forwarding consumer which literally just consumes the
> messages from a source topic, logs them and then publishes them to a target
> topic.
> 
> I wanted to keep the implementation simple with very little code so I went
> with kafka-streams. I have a really simple topology with a source for the
> source topic, a sink for the target topic and a logging processor
> in-between.
> 
> I'm quite happy with the solution, really simple and elegant, I ran some
> basic tests and everything seemed to be working. As I went on to build more
> test cases, I found that the stream only does its thing if I push messages
> to the source topic *after* creating the stream and waiting until it is
> fully initialized. Is this the expected behaviour? I need the stream to be
> started at any point in time and forward the messages that were buffered on
> the source topic until then. Are kafka-streams not fit for this use case?
> Or am I missing something?
> 
> Thanks in advance!
> 
> --
> Ricardo



Re: Kafka Streams not auto-creating the state store changelog topic

2017-08-07 Thread Eno Thereska
HI Anish,

Yeah, changing the input topic partitions at runtime could be problematic. But 
it doesn’t seem like that’s what’s going on here. (For the regex case, the application 
will work fine.)

Are there any broker failures going on while the test is running? Also, I wonder 
what the rest of your code looks like. There is some code here 
https://github.com/confluentinc/examples/blob/3.3.0-post/kafka-streams/src/test/java/io/confluent/examples/streams/StateStoresInTheDSLIntegrationTest.java#L158
that shows how to create the state stores and initialize Kafka Streams and the 
order of doing things. Could you please double check if it matches your code?

Thanks
Eno
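
For convenience, the order of operations in that example is roughly the
following sketch (topic, store and processor names are taken from your logs;
MyProcessor is a hypothetical stand-in for your processor):

    StateStoreSupplier store = Stores.create("testing-2-store")
        .withKeys(keySerde).withValues(valueSerde).persistent().build();

    TopologyBuilder builder = new TopologyBuilder();
    builder.addSource("Source", "testing-topic");
    builder.addProcessor("ProcessorUsingStateStore", MyProcessor::new, "Source");
    // connect the store to the processor that uses it *before* starting the app
    builder.addStateStore(store, "ProcessorUsingStateStore");

    KafkaStreams streams = new KafkaStreams(builder, props);
    streams.start();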


> On Aug 5, 2017, at 3:22 AM, Anish Mashankar <an...@systeminsights.com> wrote:
> 
> Hello Eno,
> So, if I change the input topic partitions, it affects the ability of kafka
> streams to find partitions for the state store changelog? I think I'm
> missing something here.
> In my case, the application was new, so it's for sure that there were no
> changes.
> Also, if I have regex for the input topic on kafka streams and a new topic
> is added to kafka matching the regex, the application will break?
> 
> On Fri, Aug 4, 2017, 8:33 PM Eno Thereska <eno.there...@gmail.com> wrote:
> 
>> Hi,
>> 
>> Could you check if this helps:
>> 
>> https://stackoverflow.com/questions/42329387/failed-to-rebalance-error-in-kafka-streams-with-more-than-one-topic-partition
>> 
>> Thanks
>> Eno
>>> On Aug 4, 2017, at 12:48 PM, Anish Mashankar <an...@systeminsights.com>
>> wrote:
>>> 
>>> Hello Eno,
>>> Thanks for considering the question.
>>> 
>>> How I am creating the state stores:
>>> 
>>> StateStoreSupplier stateStoreSupplier =
>>> Stores.create("testing-2-store").withKeys(keySerde).withValues(valueSerde).persistent().build();
>>> TopologyBuilder builder = ...
>>> builder.addStateStore(stateStoreSupplier, "ProcessorUsingStateStore");
>>> 
>>> The Error Message with stack trace is as follows:
>>> 
>>> 2017-08-04 17:11:23,184 53205
>>> [testing-2-9f5aa1d8-35c7-4f0c-9593-be31738cb4c0-StreamThread-1] INFO
>>> o.a.k.s.p.internals.StreamThread - stream-thread
>>> [testing-2-9f5aa1d8-35c7-4f0c-9593-be31738cb4c0-StreamThread-1] Created
>>> active task -727063541_0 with assigned partitions [testing-topic-0]
>>> 
>>> 2017-08-04 17:11:23,185 53206
>>> [testing-2-9f5aa1d8-35c7-4f0c-9593-be31738cb4c0-StreamThread-1] INFO
>>> o.a.k.s.p.internals.StreamThread - stream-thread
>>> [testing-2-9f5aa1d8-35c7-4f0c-9593-be31738cb4c0-StreamThread-1] partition
>>> assignment took 41778 ms.
>>> current active tasks: []
>>> current standby tasks: []
>>> 
>>> 2017-08-04 17:11:23,187 53208
>>> [testing-2-9f5aa1d8-35c7-4f0c-9593-be31738cb4c0-StreamThread-1] ERROR
>>> o.a.k.c.c.i.ConsumerCoordinator - User provided listener
>>> 
>> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener
>>> for group testing-2 failed on partition assignment
>>> org.apache.kafka.streams.errors.StreamsException: Store testing-2-store's
>>> change log (testing-2-testing-2-store-changelog) does not contain
>> partition
>>> 0
>>> at
>>> 
>> org.apache.kafka.streams.processor.internals.StoreChangelogReader.validatePartitionExists(StoreChangelogReader.java:87)
>>> at
>>> 
>> org.apache.kafka.streams.processor.internals.ProcessorStateManager.register(ProcessorStateManager.java:165)
>>> at
>>> 
>> org.apache.kafka.streams.processor.internals.AbstractProcessorContext.register(AbstractProcessorContext.java:100)
>>> at
>>> 
>> org.apache.kafka.streams.state.internals.RocksDBStore.init(RocksDBStore.java:177)
>>> at
>>> 
>> org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStore.init(ChangeLoggingKeyValueBytesStore.java:40)
>>> at

Re: Kafka Streams not auto-creating the state store changelog topic

2017-08-04 Thread Eno Thereska
Hi,

Could you check if this helps:
https://stackoverflow.com/questions/42329387/failed-to-rebalance-error-in-kafka-streams-with-more-than-one-topic-partition

Thanks
Eno
> On Aug 4, 2017, at 12:48 PM, Anish Mashankar <an...@systeminsights.com> wrote:
> 
> Hello Eno,
> Thanks for considering the question.
> 
> How I am creating the state stores:
> 
> StateStoreSupplier stateStoreSupplier =
> Stores.create("testing-2-store").withKeys(keySerde).withValues(valueSerde).persistent().build();
> TopologyBuilder builder = ...
> builder.addStateStore(stateStoreSupplier, "ProcessorUsingStateStore");
> 
> The Error Message with stack trace is as follows:
> 
> 2017-08-04 17:11:23,184 53205
> [testing-2-9f5aa1d8-35c7-4f0c-9593-be31738cb4c0-StreamThread-1] INFO
> o.a.k.s.p.internals.StreamThread - stream-thread
> [testing-2-9f5aa1d8-35c7-4f0c-9593-be31738cb4c0-StreamThread-1] Created
> active task -727063541_0 with assigned partitions [testing-topic-0]
> 
> 2017-08-04 17:11:23,185 53206
> [testing-2-9f5aa1d8-35c7-4f0c-9593-be31738cb4c0-StreamThread-1] INFO
> o.a.k.s.p.internals.StreamThread - stream-thread
> [testing-2-9f5aa1d8-35c7-4f0c-9593-be31738cb4c0-StreamThread-1] partition
> assignment took 41778 ms.
> current active tasks: []
> current standby tasks: []
> 
> 2017-08-04 17:11:23,187 53208
> [testing-2-9f5aa1d8-35c7-4f0c-9593-be31738cb4c0-StreamThread-1] ERROR
> o.a.k.c.c.i.ConsumerCoordinator - User provided listener
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener
> for group testing-2 failed on partition assignment
> org.apache.kafka.streams.errors.StreamsException: Store testing-2-store's
> change log (testing-2-testing-2-store-changelog) does not contain partition
> 0
> at
> org.apache.kafka.streams.processor.internals.StoreChangelogReader.validatePartitionExists(StoreChangelogReader.java:87)
> at
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.register(ProcessorStateManager.java:165)
> at
> org.apache.kafka.streams.processor.internals.AbstractProcessorContext.register(AbstractProcessorContext.java:100)
> at
> org.apache.kafka.streams.state.internals.RocksDBStore.init(RocksDBStore.java:177)
> at
> org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStore.init(ChangeLoggingKeyValueBytesStore.java:40)
> at
> org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueStore.init(ChangeLoggingKeyValueStore.java:57)
> at
> org.apache.kafka.streams.state.internals.MeteredKeyValueStore$7.run(MeteredKeyValueStore.java:99)
> at
> org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:187)
> at
> org.apache.kafka.streams.state.internals.MeteredKeyValueStore.init(MeteredKeyValueStore.java:130)
> at
> org.apache.kafka.streams.processor.internals.AbstractTask.initializeStateStores(AbstractTask.java:201)
> at
> org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:140)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:1234)
> at
> org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:294)
> at
> org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:254)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:1313)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.access$1100(StreamThread.java:73)
> at
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsAssigned(StreamThread.java:183)
> at
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:265)
> at
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:363)
> at
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:310)
> at
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:297)
> at
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1078)
> at
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:582)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:553)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:527)
> 
> I hope this shares m

Re: Kafka Streams not auto-creating the state store changelog topic

2017-08-04 Thread Eno Thereska
Hi Anish,

Could you give more info on how you create the state stores in your code? Also 
could you copy-paste the exact error message from the log?

Thanks
Eno
> On Aug 4, 2017, at 9:05 AM, Anish Mashankar  wrote:
> 
> I have a new application, call it streamsApp with state stores S1 and S2.
> So, according to the documentation, upon the first time startup, the
> application should've created the changelog topics streamsApp-S1-changelog
> and streamsApp-S2-changelog. But I see that these topics are not created.
> Also, the application throws an error that it couldn't find any partition
> for topics streamsApp-S1-changelog and streamsApp-S2-changelog and then
> exits. To get it working, I manually created the topics, but I am
> skeptical because the docs say that this convention might change any time.
> I am using Kafka Streams v0.11, with a Kafka Broker v0.11, but message
> protocol set to v0.10.0. Am I missing something?
> -- 
> 
> Regards,
> Anish Samir Mashankar
> R Engineer
> System Insights
> +91-9789870733



Re: Do we have to query localWindowStore in same java instance we are creating the store

2017-07-17 Thread Eno Thereska
Hi Sachin,

1. You can run a remote query and we provide some example code 
(https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/), 
however by default Apache Kafka ships with just the local query capabilities. 
The above example has some code to do remote querying though.

2. So we don’t have a notion of windows closing in Kafka Streams. The 
application will need to decide how frequently to query. So your Java 
application can query, sleep, and query in a while(1) loop for example.

Cheers
Eno
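
As a rough sketch of (2), within the same application that owns the store
(store name and types are taken from your code below; the poll interval and
time range are placeholders):

    ReadOnlyWindowStore<String, Long> store =
        streams.store("key-table", QueryableStoreTypes.<String, Long>windowStore());

    while (true) {
        long now = System.currentTimeMillis();
        // fetch all windows for the key whose start time falls in the last 30 minutes
        WindowStoreIterator<Long> it =
            store.fetch("some-key", now - TimeUnit.MINUTES.toMillis(30), now);
        while (it.hasNext()) {
            KeyValue<Long, Long> entry = it.next();  // key = window start timestamp, value = count so far
            System.out.println("window " + entry.key + " -> " + entry.value);
        }
        it.close();
        Thread.sleep(60_000);  // counts for a window can still grow until its retention passes
    }

Thread.sleep can throw InterruptedException, so in real code this loop would
live in a method that handles it.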


> On Jul 15, 2017, at 7:07 AM, Sachin Mittal  wrote:
> 
> Hi,
> I have created a simple window store to count occurrences of a given key.
> 
> 
> My pipeline is:
> 
>TimeWindows windows = TimeWindows.of(n).advanceBy(n).until(30n);
>final StateStoreSupplier supplier =
> Stores.create("key-table")
>.withKeys(Serdes.String())
>.withValues(Serdes.Long())
>.persistent()
>.enableLogging(topicConfigMap)
>.windowed(windows.size(), windows.maintainMs(),
> windows.segments, false)
>.build();
> 
>builder.stream(Serdes.String(), valueSerde, "input-topic")
>.groupByKey()
>.count(windows, supplier)
> 
> 
> Now as per docs to query the store I would have to use:
> 
> String storeName = supplier.name();
> ReadOnlyWindowStore<String, Long> localWindowStore =
> streams.store(storeName, QueryableStoreTypes.<String, Long>windowStore());
> String key = "some-key";
> long fromTime = ...;
> long toTime = ...;
> WindowStoreIterator<Long> countForWordsForWindows =
> localWindowStore.fetch(key, fromTime, toTime);
> 
> 
> My questions are:
> 
> 1. Can I run a different java application to query the state store
> created by first application.
> 
> If yes then how can I refer to the state store?
> 
> 
> 2. Value in the state store against any given key will keep
> incrementing as and when we read new data from the topic for a given
> time period.
> 
> So at time t say count against k1 is 5 for a given window
> 
> If we query that time we get 5, but at time t1 for same key and window
> count increases to 10.
> 
> If we query that time we get 10.
> 
> Question is how do we make sure that we query the state store only
> after it has aggregated all the values for a given window?
> 
> And is there a way for that java application to run forever (just like
> streams application) to keep querying the state store and report back the
> values.
> 
> 
> Thanks
> 
> Sachin



Re: Windows OS platform support

2017-07-14 Thread Eno Thereska
Hi Harish,

I believe many people/orgs use it on Windows. We rely on the community to 
test/fix/answer any Windows questions, same as with Linux or MacOS. However, 
based on what I've observed, perhaps there are more people answering 
Linux-related questions.

Eno

> On 14 Jul 2017, at 13:24, harish jadhav  
> wrote:
> 
> Hello Team,
> 
> I am exploring Apache Kafka and found that one of the best MQ I have 
> encountered. I was exploring option to use it in Windows machine and started 
> some kind of proof of concept work referring installation section on windows 
> and it work perfectly. Later realized that Kafka documentation says under 
> Hardware and OS section " We have seen a few issues running on Windows and 
> Windows is not currently a well supported platform though we would be happy 
> to change that. "
> 
> I am curious to know whether there is any actual issue running Kafka on Windows OS, as 
> whichever feature I am using with a single instance (Producer-Consumer) works 
> perfectly in a Windows test bed. Can I use it on production Windows machines?
> 
> Please advice. 
> 
> Thanks
> Harish



Re: State management & restore functionality

2017-07-14 Thread Eno Thereska
None of these questions are naive, so no worries. Answer inline:

> During restore why does Kafka replay the whole topic / partition to recreate 
> the state in the local state store ? Isn't there any way to just have the 
> latest message as the current state ? Because that's what it is .. right ? 
> The last message in the topic / partition IS the latest state. May be I am 
> missing something obvious ?
> 

Let's say Kafka streams is doing an aggregate, e.g., computing sum(). For each 
key, it will compute the new sum() as new records arrive and store the result 
in the changelog topic in Kafka as well as it keeps a copy on RocksDB locally. 
Now, after a failure, a fresh instance comes along with no local state in 
RocksDB. It's necessary to re-construct that state. You are right that only the 
latest value for a key is needed. That is accomplished since the changelog 
topic is a compacted topic, and Kafka will do the compaction and keep only the 
latest value for a key. So what you are saying is effectively happening.

Note that if state restoration ends up being too long for your application 
needs, consider using standby tasks 
http://docs.confluent.io/current/streams/developer-guide.html#streams-developer-guide-standby-replicas.
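
Standby replicas are just a Streams config knob; a minimal sketch (application
id and broker address are placeholders):

    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
    // keep one warm copy of each task's state on another instance to speed up failover
    props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);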

Hope this helps,
Eno


> regards.
> 
> On Fri, Jul 14, 2017 at 6:23 PM, Eno Thereska <eno.there...@gmail.com> wrote:
> Hi Debasish,
> 
> Your intuition about the first part is correct. Kafka Streams automatically 
> assigns a partition of a topic to
> a task in an instance. It will never be the case that the same partition is 
> assigned to two tasks.
> 
> About the merging or changing of partitions part, it would help if we know 
> more about what you
> are trying to do. For example, if behind the scenes you add or remove 
> partitions that would not work
> well with Kafka Streams. However, if you use the Kafka Streams itself to 
> create new topics (e.g.,
> by merging two topics into one, or vice versa by taking one topic and 
> splitting it into more topics), then
> that would work fine.
> 
> Eno
> 
> > On 13 Jul 2017, at 23:49, Debasish Ghosh <ghosh.debas...@gmail.com> wrote:
> >
> > Hi -
> >
> > I have a question which is mostly to clarify some conceptions regarding
> > state management and restore functionality using Kafka Streams ..
> >
> > When I have multiple instances of the same application running (same
> > application id for each of the instances), are the following assumptions
> > correct ?
> >
> >   1. each instance has a separate state store (local)
> >   2. all instances are backed up by a *single* changelog topic
> >
> > Now the question arises, how does restore work in the above case when we
> > have 1 changelog topic backing up multiple state stores ?
> >
> > Each instance of the application ingests data from specific partitions of
> > the topic. And there can be multiple topics too. e.g. if we have m topics
> > with n partitions in each, and p instances of the application, then all the
> > (m x n) partitions are distributed across the p instances of the
> > application. Is this true ?
> >
> > If so, then does the changelog topic also has (m x n) partitions, so that
> > Kafka knows which state to restore in which store in case of a restore
> > operation ?
> >
> > And finally, if we decide to merge topics / partitions in between without
> > complete reset of the application, will (a) it work ? and (b) the changelog
> > topic gets updated accordingly and (c) is this recommended ?
> >
> > regards.
> >
> > --
> > Debasish Ghosh
> > http://manning.com/ghosh2
> > http://manning.com/ghosh
> >
> > Twttr: @debasishg
> > Blog: http://debasishg.blogspot.com
> > Code: http://github.com/debasishg
> 
> 
> 
> 
> -- 
> Debasish Ghosh
> http://manning.com/ghosh2
> http://manning.com/ghosh
> 
> Twttr: @debasishg
> Blog: http://debasishg.blogspot.com
> Code: http://github.com/debasishg


Re: Kafka streams: Record linkage / collecting all messages linked to one entity

2017-07-14 Thread Eno Thereska
So a couple of things that can help hopefully:

- it's worth thinking about how to map the problem into KStreams, KTables and 
GlobalKTables. For example, events A seem static and read-only to me, and 
possibly the data footprint is small, so probably they should be represented in 
the system as a GlobalKTable. A GlobalKTable has certain advantages 
(http://docs.confluent.io/current/streams/concepts.html#streams-concepts-globalktable)

- whether to use KStreams or KTables for the other events depends on whether 
you want to interpret the records as, for example, "I only care about the 
current location of the user" (KTable) or "I care about the path the user has 
taken" (KStream is probably best).

Once the above mapping is in place, it's fine to do multi-joins (they will have 
to be done pair-wise though).

Hope this helps
Eno
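
To make the GlobalKTable + pair-wise join idea a bit more concrete, a rough
sketch using the 0.11 DSL (topic names and the match-key helper are
hypothetical; it assumes the input has already been split into per-source
topics and that the default serdes are strings):

    KStreamBuilder builder = new KStreamBuilder();

    // events A as a GlobalKTable keyed by the match-key: small, mostly static reference data
    GlobalKTable<String, String> eventsA = builder.globalTable("events-a");

    // events B as a KStream, joined pair-wise against A via the computed match-key
    KStream<String, String> eventsB = builder.stream("events-b");
    KStream<String, String> bWithA = eventsB.join(
        eventsA,
        (bKey, bValue) -> matchKeyForA(bValue),      // hypothetical: derive A's key from a B record
        (bValue, aValue) -> bValue + "|" + aValue);  // merge the two payloads (placeholder)

A join against E via C would then be another pair-wise join on the output of
this one.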


> On 12 Jul 2017, at 14:26, Wladislaw Mitzel <mit...@tawadi.de> wrote:
> 
> Hello Eno,
> 
> I have to think through this approach. I could split the messages using the 
> source attribute. However one challenge is the fact that I would need to do 
> many joins. The example I gave is simplified. The real problem has about 10 
> sources of data. And there are various possible matches. A with B, B with C, 
> A with D, C with E and so on. Furthermore there might be several match-keys 
> in order to match A with B.
> 
> So based on this additional information I am wondering 
> 
> a. whether a "multi level join" is feasible in order to get the neighbors of 
> the neighbors (correlate A with E using C)
> 
> b. how to cope with the fact that there are multiple possible match-keys in 
> order to link two sources.
> 
> I am not sure whether I am in the right mind set and thinking in a "streaming 
> way". The algorithm that is in my mind is based on a graph representation of 
> the problem. Each message is a node. Each match-key is a node. Connect the 
> messages with the match-keys using edges. Now the message nodes are connected 
> through the match-key nodes. Each entity is defined by the graph that 
> connects all messages that are linked together. 
> 
> Kind regards,
> 
> Wladislaw
> 
> Eno Thereska <eno.there...@gmail.com> wrote on 12 July 2017 at 00:23:
> 
> Hi Wladislaw,
> 
> Would splitting the one topic into multiple topics be acceptable at all? 
> E.g., you could use the "branch" function in the DSL to split the messages 
> and send to different topics. Then, once you have multiple topics you can do 
> the joins etc.
> 
> Thoughts?
> 
> Thanks
> Eno
> 
> On 11 Jul 2017, at 05:02, Wladislaw Mitzel <mit...@tawadi.de> wrote:
> 
> Hi all. How would one approach the following scenario with Kafka streams?
> 
> There is one input topic. It has data from different sources in a normalized 
> format. There is a need to join records that come from different sources but 
> are linked to the same entity (record linkage). There is a deterministic 
> rule-set to calculate (composed) "match-keys" for every incoming record that 
> allow the correlation of records that are linked to the same entity.
> 
> Example: There are events A (userid,first name,last name,), B(username, 
> location,.) and C(location, weather-data,). There is a set of rules 
> in order to correlate A with B (A.firstName+A.lastName = B.username) and B 
> with C (B.location = C.location). At the end, we want to get the whole graph 
> of correlated records.
> 
> Constraints: The latency of the records linkage should be as low as possible. 
> The state stores should contain the messages of the last 180 days for 
> linkage. (We are talking about tens to hundreds of GB of data)
> 
> I already implemented a solution with spark + an external database. I 
> calculate the match-keys and then store mappings for event-id => 
> list-of-match-keys, match-key => list-of-event-ids and event-id => 
> event-payload in the database. By querying the database one can get a graph 
> of "event -> match-keys -> more events" and so on. I do the querying in a 
> loop until there are no new events added. As a last step, I read the payloads 
> using the accumulated event-ids. However, this solution has a high latency 
> because of the external database calls. That’s why the idea of having KTables 
> as local state stores sounds so interesting to me.
> 
> Now with Kafka streams I would like to use the embedded state with KTables 
> but I find it quite hard to come up with a solution. I think what I want to 
> do is a self-join on the incoming topic which is not yet supported by 

Re: State management & restore functionality

2017-07-14 Thread Eno Thereska
Hi Debasish,

Your intuition about the first part is correct. Kafka Streams automatically 
assigns a partition of a topic to 
a task in an instance. It will never be the case that the same partition is 
assigned to two tasks.

About the merging or changing of partitions part, it would help if we know more 
about what you 
are trying to do. For example, if behind the scenes you add or remove 
partitions that would not work
well with Kafka Streams. However, if you use the Kafka Streams itself to create 
new topics (e.g., 
by merging two topics into one, or vice versa by taking one topic and splitting 
it into more topics), then
that would work fine.

Eno

> On 13 Jul 2017, at 23:49, Debasish Ghosh  wrote:
> 
> Hi -
> 
> I have a question which is mostly to clarify some conceptions regarding
> state management and restore functionality using Kafka Streams ..
> 
> When I have multiple instances of the same application running (same
> application id for each of the instances), are the following assumptions
> correct ?
> 
>   1. each instance has a separate state store (local)
>   2. all instances are backed up by a *single* changelog topic
> 
> Now the question arises, how does restore work in the above case when we
> have 1 changelog topic backing up multiple state stores ?
> 
> Each instance of the application ingests data from specific partitions of
> the topic. And there can be multiple topics too. e.g. if we have m topics
> with n partitions in each, and p instances of the application, then all the
> (m x n) partitions are distributed across the p instances of the
> application. Is this true ?
> 
> If so, then does the changelog topic also has (m x n) partitions, so that
> Kafka knows which state to restore in which store in case of a restore
> operation ?
> 
> And finally, if we decide to merge topics / partitions in between without
> complete reset of the application, will (a) it work ? and (b) the changelog
> topic gets updated accordingly and (c) is this recommended ?
> 
> regards.
> 
> -- 
> Debasish Ghosh
> http://manning.com/ghosh2
> http://manning.com/ghosh
> 
> Twttr: @debasishg
> Blog: http://debasishg.blogspot.com
> Code: http://github.com/debasishg



Re: Is this a decent use case for Kafka Streams?

2017-07-13 Thread Eno Thereska
From just looking at your description of the problem, I'd say yes, this looks 
like a typical scenario for Kafka Streams. Kafka Streams supports exactly once 
semantics too in 0.11.

Cheers
Eno
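
For the exactly-once part it's a single config in 0.11; a minimal sketch
(application id and broker address are placeholders):

    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "tenant-batcher");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
    // enable exactly-once processing within Kafka Streams (0.11+)
    props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);

Note that this guarantee covers the Kafka-to-Kafka part of the pipeline; the
call out to the third-party API is a separate concern.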

> On 12 Jul 2017, at 17:06, Stephen Powis  wrote:
> 
> Hey! I was hoping I could get some input from people more experienced with
> Kafka Streams to determine if they'd be a good use case/solution for me.
> 
> I have multi-tenant clients submitting data to a Kafka topic that they want
> ETL'd to a third party service.  I'd like to batch and group these by
> tenant over a time window, somewhere between 1 and 5 minutes.  At the end
> of a time window then issue an API request to the third party service for
> each tenant sending the batch of data over.
> 
> Other points of note:
> - Ideally we'd have exactly-once semantics, sending data multiple times
> would typically be bad.  But we'd need to gracefully handle things like API
> request errors / service outages.
> 
> - We currently use Storm for doing stream processing, but the long running
> time-windows and potentially large amount of data stored in memory make me
> a bit nervous to use it for this.
> 
> Thoughts?  Thanks in Advance!
> Stephen



Re: Kafka streams: Record linkage / collecting all messages linked to one entity

2017-07-11 Thread Eno Thereska
Hi Wladislaw,

Would splitting the one topic into multiple topics be acceptable at all? E.g., 
you could use the "branch" function in the DSL to split the messages and send 
to different topics. Then, once you have multiple topics you can do the joins 
etc.

Thoughts?

Thanks
Eno
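
To illustrate the branch idea, a rough sketch with the 0.11 DSL (the output
topic names and the source-detection predicates are hypothetical):

    KStreamBuilder builder = new KStreamBuilder();
    KStream<String, String> all = builder.stream(Serdes.String(), Serdes.String(), "input-topic");

    // split the normalized input by source; the predicates below are placeholders for
    // however the source is encoded in your records
    KStream<String, String>[] branches = all.branch(
        (key, value) -> value.startsWith("A"),
        (key, value) -> value.startsWith("B"),
        (key, value) -> value.startsWith("C"));

    branches[0].to(Serdes.String(), Serdes.String(), "events-a");
    branches[1].to(Serdes.String(), Serdes.String(), "events-b");
    branches[2].to(Serdes.String(), Serdes.String(), "events-c");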

> On 11 Jul 2017, at 05:02, Wladislaw Mitzel  wrote:
> 
> Hi all. How would one approach the following scenario with Kafka streams?
> 
> There is one input topic. It has data from different sources in a normalized 
> format. There is a need to join records that come from different sources but 
> are linked to the same entity (record linkage). There is a deterministic 
> rule-set to calculate (composed) "match-keys" for every incoming record that 
> allow the correlation of records that are linked to the same entity.
> 
> Example: There are events A (userid,first name,last name,), B(username, 
> location,.) and C(location, weather-data,). There is a set of rules 
> in order to correlate A with B (A.firstName+A.lastName = B.username) and B 
> with C (B.location = C.location). At the end, we want to get the whole graph 
> of correlated records.
> 
> Constraints: The latency of the records linkage should be as low as possible. 
> The state stores should contain the messages of the last 180 days for 
> linkage. (We are talking about tens to hundreds of GB of data)
> 
> I already implemented a solution with spark + an external database. I 
> calculate the match-keys and then store mappings for event-id => 
> list-of-match-keys, match-key => list-of-event-ids and event-id => 
> event-payload in the database. By querying the database one can get a graph 
> of "event -> match-keys -> more events" and so on. I do the querying in a 
> loop until there are no new events added. As a last step, I read the payloads 
> using the accumulated event-ids. However, this solution has a high latency 
> because of the external database calls. That’s why the idea of having KTables 
> as local state stores sounds so interesting to me.
> 
> Now with Kafka streams I would like to use the embedded state with KTables 
> but I find it quite hard to come up with a solution. I think what I want to 
> do is a self-join on the incoming topic which is not yet supported by the 
> DSL. I thought of using the Processor API implementing a very similar 
> solution to the one I described with spark: using several state stores for 
> the mapping of event => match-keys, match-key => events. Beside the fact that 
> I don't know how to address the partitioning (or whether I need a global 
> store) I am not sure whether this is the way one would go with Kafka streams.
> 
> Another solution I could think of is a loop in the topology so that an event 
> would flow several times through the loop (which again has KTables for the 
> mapping of event-id and match-key) until there are no new matches. Are loops 
> possible at all and if so, is it a good approach or should one avoid loops? 
> At the end of the record linkage process I’d like to have *one* message that 
> contains the payloads of all correlated events and is then processed by the 
> downstream processors. However I can only think of solutions where I need to 
> do a flatMap() (do a join for every match-key) so that there is more than one 
> message.
> 
> Do you have any feedback or suggestions? Any examples that could help?
> 
> Kind regards,
> 
> Wladislaw



Re: kafka-streams app(s) stopped consuming new events

2017-06-30 Thread Eno Thereska
It’s hard to tell, the logs do not contain much, I agree. It could be a number 
of things.

If it’s happening as you say on restart as well (so it’s reproducible), any 
chance you could start streaming with DEBUG logs on and collect those logs? I’m 
hoping something shows up there.
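
For example, assuming a standard log4j.properties setup, something like this should
surface the streams internals:

# turn on detailed logging for the streams library and the embedded consumer
log4j.logger.org.apache.kafka.streams=DEBUG
log4j.logger.org.apache.kafka.clients.consumer=DEBUG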

Thanks,
Eno


> On Jun 28, 2017, at 5:30 PM, Dmitriy Vsekhvalnov  
> wrote:
> 
> Nothing for stat-change.log for giving time window. Last line logged 4
> hours before app stopped.
> 
> Any ideas so far? Personally I don't see anything relevant in the logs.
> 
> On Wed, Jun 28, 2017 at 6:33 PM, Bill Bejeck  wrote:
> 
>> Sure, couldn't hurt.
>> 
>> Thanks,
>> Bill
>> 
>> On Wed, Jun 28, 2017 at 9:51 AM, Dmitriy Vsekhvalnov <
>> dvsekhval...@gmail.com
>>> wrote:
>> 
>>> Here are logs:
>>> 
>>> app:
>>> https://gist.github.com/dvsekhvalnov/f98afc3463f0c63b1722417e3710a8
>>> e7#file-kafka-streams-log
>>> brokers:
>>> https://gist.github.com/dvsekhvalnov/8e870f7347394e8d004c282880ef38
>>> 5a#file-kafka-broker-1-2-3-log
>>> 
>>> All broker logs are same, so single gist.
>>> 
>>> There are also state-change.log files, do you want to take a look at
>> those
>>> as well?
>>> 
>>> On Wed, Jun 28, 2017 at 4:31 PM, Bill Bejeck  wrote:
>>> 
 Hi Dmitry,
 
 At the moment I don't have anything specific to look for, just trying
>> to
 get more context around the issue.
 
 As for the logs maybe broker and streams logs for the last 30 minutes
>> up
>>> to
 the time the application stopped processing records.
 
 Thanks,
 Bill
 
 On Wed, Jun 28, 2017 at 9:04 AM, Dmitriy Vsekhvalnov <
 dvsekhval...@gmail.com
> wrote:
 
> Hi Bill,
> 
> 1. sure, can extract some logs, what exactly to look for? There are
>> 11
> hours of logs and most of them looks like:
> 
> [2017-06-27 03:30:50,553] [] [INFO ] [StreamThread-1]
> [org.apache.kafka.streams.processor.internals.StreamThread]
 [stream-thread
> [StreamThread-1] Committing all tasks because the commit interval
>>> 5000ms
> has elapsed]
> 
> [2017-06-27 03:30:50,553] [] [INFO ] [StreamThread-1]
> [org.apache.kafka.streams.processor.internals.StreamThread]
 [stream-thread
> [StreamThread-1] Committing task StreamTask 0_0]
> 
> [2017-06-27 03:30:50,554] [] [INFO ] [StreamThread-1]
> [org.apache.kafka.streams.processor.internals.StreamThread]
 [stream-thread
> [StreamThread-1] Committing task StreamTask 2_0]
> 
> Something specific to search for?
> 
> 2. Yes, there are more messages coming to topic.
> 
> On Wed, Jun 28, 2017 at 3:43 PM, Bill Bejeck 
>>> wrote:
> 
>> Hi Dimitry,
>> 
>> I'm happy to help, but I could use more information.  Can you share
>>> the
>> streams logs and broker logs?
>> 
>> Have you confirmed messages are still being delivered to topics
>> (via
>> console consumer)?
>> 
>> Thanks,
>> Bill
>> 
>> On Wed, Jun 28, 2017 at 8:24 AM, Dmitriy Vsekhvalnov <
>> dvsekhval...@gmail.com
>>> wrote:
>> 
>>> Hi all,
>>> 
>>> looking for some assistance in debugging kafka-streams
>> application.
>>> 
>>> Kafka broker  0.10.2.1  - x3 Node cluster
>>> kafka-streams 0.10.2.1 -  x2 application nodes x 1 stream thread
 each.
>>> 
>>> In streams configuration only:
>>> - SSL transport
>>> - kafka.streams.commitIntervalMs set to 5000 (instead of default
 30s).
>>> 
>>> We running simple aggregation app with several grouping streams.
> Running
>> 2
>>> instances of an app for redundancy. Both instances were working
 pretty
>> fine
>>> for 11 hours 15 minutes then stopped consuming new events from
>>> topic.
>>> 
>>> Hosting JVM processes were working fine, just streams stopped
 reacting
> to
>>> new data. No exceptions, errors, e.t.c. in logs.  After restart
 streams
>>> still not consuming new messages.
>>> 
>>> Below is 2 last entries from kafka-streams logs from both hosts:
>>> 
>>> [2017-06-27 14:45:09,663] [] [INFO ] [StreamThread-1]
>>> [org.apache.kafka.streams.processor.internals.StreamThread]
>> [stream-thread
>>> [StreamThread-1] Committing task StreamTask 4_2]
>>> 
>>> [2017-06-27 14:45:09,723] [] [INFO ] [StreamThread-1]
>>> [org.apache.kafka.streams.processor.internals.StreamThread]
>> [stream-thread
>>> [StreamThread-1] Committing task StreamTask 2_1]
>>> 
>>> Pretty puzzling why they stopped exactly same moment (with
>> respect
>>> to
>>> millis).
>>> 
>>> Really appreciate any ideas where to dig to.
>>> 
>>> Thank you.
>>> 
>> 
> 
 
>>> 
>> 



Re: Kafka Stream invalid partitions

2017-06-27 Thread Eno Thereska
Thanks. I believe we’ve addressed this issue in 0.10.2.1, any chance you could 
try that?

Thanks
Eno
> On Jun 27, 2017, at 11:14 AM, D Stephan <kafkastre...@gmail.com> wrote:
> 
> Hello,
> 
> Thanks for your reply.
> 
> I use Kafka & KafkaStream version 0.10.2.0.
> Between the runs, the number of partitions are not intentionally changed
> programmatically or manually.
> 
> This topic:  "external-batch-request-store-repartition" is an internally
> generated topic from this KafkaStream DSL
> "aggregate"
> https://kafka.apache.org/0102/javadoc/org/apache/kafka/streams/kstream/KGroupedStream.html#aggregate(org.apache.kafka.streams.kstream.Initializer,%20org.apache.kafka.streams.kstream.Aggregator,%20org.apache.kafka.streams.kstream.Windows,%20org.apache.kafka.common.serialization.Serde,%20java.lang.String)
> 
> 
> 
> I use this API as follows:
> 
> ...
> .groupByKey()
> .aggregate(...)
> .toStream(...);
> 
> 
> Please let me know if you need addiotional information.
> 
> Thanks,
> 
> 
> 2017-06-27 11:39 GMT+02:00 Eno Thereska <eno.there...@gmail.com>:
> 
>> Hi there,
>> 
>> Thanks for the report. What version of Kafka are you using? Also, between
>> runs do you change the number of partitions for your topics? I’m trying to
>> figure out how this problem happens, any information on what is changing in
>> between runs is appreciated.
>> 
>> Thanks,
>> Eno
>> 
>>> On Jun 27, 2017, at 8:52 AM, D Stephan <kafkastre...@gmail.com> wrote:
>>> 
>>> Hello,
>>> 
>>> When I use KafkaStreams DSL GroupByKey and Aggregate APIs, I have
>> randomly
>>> & frequently below exceptions:
>>> In my opinion, it is not practical to clean up the invalid partitions
>>> every day.  For your information, this partition is an internal partition
>>> that is automatically created by the KafkaStream Aggregate API.
>>> Do you have any ideas or workarounds to mitigate this exception?
>>> 
>>> 
>>> 
>>> 
>>> 2017-06-21T06:48:31.488210812Z 2017-06-21 06:48:31.487 WARN 1 --- [
>>> StreamThread-4] o.a.k.s.p.i.InternalTopicManager :
>>> Could not create internal topics: Existing internal topic
>>> external-batch-request-store-repartition has invalid partitions.
>>> Expected: 20 Actual: 1. Use 'kafka.tools.StreamsResetter' tool to clean
>> up
>>> invalid topics before processing. Retry #4
>>> 
>>> 2017-06-21T06:48:31.491071442Z Exception in thread "StreamThread-4"
>>> org.apache.kafka.streams.errors.StreamsException: Could not create
>> internal
>>> topics.
>>> 2017-06-21T06:48:31.491087557Z at
>>> org.apache.kafka.streams.processor.internals.InternalTopicManager.
>> makeReady(InternalTopicManager.java:70)
>>> 2017-06-21T06:48:31.491091661Z at
>>> org.apache.kafka.streams.processor.internals.StreamPartitionAssignor.
>> prepareTopic(StreamPartitionAssignor.java:618)
>>> 2017-06-21T06:48:31.491096794Z at
>>> org.apache.kafka.streams.processor.internals.StreamPartitionAssignor.
>> assign(StreamPartitionAssignor.java:372)
>>> 2017-06-21T06:48:31.491368662Z at
>>> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.
>> performAssignment(ConsumerCoordinator.java:339)
>>> 2017-06-21T06:48:31.491390576Z at
>>> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.
>> onJoinLeader(AbstractCoordinator.java:488)
>>> 2017-06-21T06:48:31.491397476Z at
>>> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.access$
>> 1100(AbstractCoordinator.java:89)
>>> 2017-06-21T06:48:31.491403757Z at
>>> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$
>> JoinGroupResponseHandler.handle(AbstractCoordinator.java:438)
>>> 2017-06-21T06:48:31.491408328Z at
>>> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$
>> JoinGroupResponseHandler.handle(AbstractCoordinator.java:420)
>>> 2017-06-21T06:48:31.491413053Z at
>>> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$
>> CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:764)
>> 
>> 



Re: Kafka Stream invalid partitions

2017-06-27 Thread Eno Thereska
Hi there,

Thanks for the report. What version of Kafka are you using? Also, between runs 
do you change the number of partitions for your topics? I’m trying to figure 
out how this problem happens, any information on what is changing in between 
runs is appreciated.

Thanks,
Eno

> On Jun 27, 2017, at 8:52 AM, D Stephan  wrote:
> 
> Hello,
> 
> When I use KafkaStreams DSL GroupByKey and Aggregate APIs, I have randomly
> & frequently below exceptions:
> In my opinion, it is not practical to clean up the invalid partitions
> every day.  For your information, this partition is an internal partition
> that is automatically created by the KafkaStream Aggregate API.
> Do you have any ideas or workarounds to mitigate this exception?
> 
> 
> 
> 
> 2017-06-21T06:48:31.488210812Z 2017-06-21 06:48:31.487 WARN 1 --- [
> StreamThread-4] o.a.k.s.p.i.InternalTopicManager :
> Could not create internal topics: Existing internal topic
> external-batch-request-store-repartition has invalid partitions.
> Expected: 20 Actual: 1. Use 'kafka.tools.StreamsResetter' tool to clean up
> invalid topics before processing. Retry #4
> 
> 2017-06-21T06:48:31.491071442Z Exception in thread "StreamThread-4"
> org.apache.kafka.streams.errors.StreamsException: Could not create internal
> topics.
> 2017-06-21T06:48:31.491087557Z at
> org.apache.kafka.streams.processor.internals.InternalTopicManager.makeReady(InternalTopicManager.java:70)
> 2017-06-21T06:48:31.491091661Z at
> org.apache.kafka.streams.processor.internals.StreamPartitionAssignor.prepareTopic(StreamPartitionAssignor.java:618)
> 2017-06-21T06:48:31.491096794Z at
> org.apache.kafka.streams.processor.internals.StreamPartitionAssignor.assign(StreamPartitionAssignor.java:372)
> 2017-06-21T06:48:31.491368662Z at
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.performAssignment(ConsumerCoordinator.java:339)
> 2017-06-21T06:48:31.491390576Z at
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.onJoinLeader(AbstractCoordinator.java:488)
> 2017-06-21T06:48:31.491397476Z at
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.access$1100(AbstractCoordinator.java:89)
> 2017-06-21T06:48:31.491403757Z at
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:438)
> 2017-06-21T06:48:31.491408328Z at
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:420)
> 2017-06-21T06:48:31.491413053Z at
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:764)



Re: Kafka 10.2.1 KStreams rebalancing issue

2017-06-23 Thread Eno Thereska
Hi Sameer,

Could you elaborate on your question? Are you concerned that machine2 does not 
have any tasks in the beginning? 

Could you share your streams configuration? In particular how many threads does 
each stream instance have? Also how many topics and partitions do you have?
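
For reference, the thread count per instance is the num.stream.threads setting; a
minimal sketch (the application id and broker address are placeholders):

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);  // tasks are spread over the threads of all instances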


Thanks,
Eno

> On 23 Jun 2017, at 17:31, Sameer Kumar  wrote:
> 
> Hi,
> 
> Came across a rebalancing issue in using KafkaStreams. I have two machines,
> Machine1 and Machine2, machine1 is consuming all partitions and machine2 is
> completely free and not processing any partitions. If I shutdown machine1,
> then machine2 will take over and would start consuming all partitions.
> 
> But, in this scenario, its not using the complete cluster.
> 
> *Machine 1*
> 2017-06-23 21:54:07 INFO  KafkaStreams:224 - stream-client
> [LIC3-43-2b05186c-cbf2-4f9c-acc4-0dd929c7d647] State transition from
> REBALANCING to REBALANCING.
> 2017-06-23 21:54:07 INFO  StreamThread:858 - stream-thread
> [StreamThread-27] Creating active task 1_2 with assigned partitions
> [LIC3-43-lic3-deb-ci2-43-repartition-2,
> LIC3-43-lic3-cnt-ci-43-repartition-2]
> 2017-06-23 21:54:07 INFO  StreamThread:163 - stream-thread
> [StreamThread-20] State transition from PARTITIONS_REVOKED to
> ASSIGNING_PARTITIONS.
> 2017-06-23 21:54:07 INFO  KafkaStreams:224 - stream-client
> [LIC3-43-2b05186c-cbf2-4f9c-acc4-0dd929c7d647] State transition from
> REBALANCING to REBALANCING.
> 2017-06-23 21:54:07 INFO  StreamThread:163 - stream-thread
> [StreamThread-31] State transition from PARTITIONS_REVOKED to
> ASSIGNING_PARTITIONS.
> 2017-06-23 21:54:07 INFO  StreamThread:163 - stream-thread
> [StreamThread-13] State transition from PARTITIONS_REVOKED to
> ASSIGNING_PARTITIONS.
> 2017-06-23 21:54:07 INFO  KafkaStreams:224 - stream-client
> [LIC3-43-2b05186c-cbf2-4f9c-acc4-0dd929c7d647] State transition from
> REBALANCING to REBALANCING.
> 2017-06-23 21:54:07 INFO  KafkaStreams:224 - stream-client
> [LIC3-43-2b05186c-cbf2-4f9c-acc4-0dd929c7d647] State transition from
> REBALANCING to REBALANCING.
> 2017-06-23 21:54:07 INFO  StreamThread:163 - stream-thread
> [StreamThread-16] State transition from ASSIGNING_PARTITIONS to RUNNING.
> 2017-06-23 21:54:07 INFO  StreamThread:858 - stream-thread
> [StreamThread-18] Creating active task 0_4 with assigned partitions
> [testS5-4]
> 2017-06-23 21:54:07 INFO  StreamThread:858 - stream-thread
> [StreamThread-13] Creating active task 0_2 with assigned partitions
> [testS5-2]
> 2017-06-23 21:54:07 INFO  KafkaStreams:224 - stream-client
> [LIC3-43-2b05186c-cbf2-4f9c-acc4-0dd929c7d647] State transition from
> REBALANCING to REBALANCING.
> 2017-06-23 21:54:07 INFO  KafkaStreams:224 - stream-client
> [LIC3-43-2b05186c-cbf2-4f9c-acc4-0dd929c7d647] State transition from
> REBALANCING to REBALANCING.
> 2017-06-23 21:54:07 INFO  StreamThread:858 - stream-thread
> [StreamThread-31] Creating active task 2_0 with assigned partitions
> [LIC3-43-lic3-cnt-li-43-repartition-0,
> LIC3-43-lic3-cnt-li-43_1-repartition-0]
> 2017-06-23 21:54:07 INFO  StreamThread:858 - stream-thread
> [StreamThread-20] Creating active task 0_9 with assigned partitions
> [testS5-9]
> 2017-06-23 21:54:07 INFO  KafkaStreams:224 - stream-client
> [LIC3-43-2b05186c-cbf2-4f9c-acc4-0dd929c7d647] State transition from
> REBALANCING to REBALANCING.
> 2017-06-23 21:54:07 INFO  KafkaStreams:224 - stream-client
> [LIC3-43-2b05186c-cbf2-4f9c-acc4-0dd929c7d647] State transition from
> REBALANCING to REBALANCING.
> 2017-06-23 21:54:07 INFO  StreamThread:858 - stream-thread
> [StreamThread-29] Creating active task 2_5 with assigned partitions
> [LIC3-43-lic3-cnt-li-43-repartition-5,
> LIC3-43-lic3-cnt-li-43_1-repartition-5]
> 
> 
> *Machine2*
> 
> 21:54:08 INFO  StreamThread:248 - stream-thread [StreamThread-8] at state
> RUNNING: partitions [] revoked at the beginning of consumer rebalance.
> 2017-06-23 21:54:08 INFO  AbstractCoordinator:420 - (Re-)joining group
> LIC3-43
> 2017-06-23 21:54:08 INFO  StreamThread:1042 - stream-thread
> [StreamThread-5] Updating suspended tasks to contain active tasks []
> 2017-06-23 21:54:08 INFO  StreamThread:1049 - stream-thread
> [StreamThread-5] Removing all active tasks []
> 2017-06-23 21:54:08 INFO  KafkaStreams:224 - stream-client
> [LIC3-43-47a995d4-2f53-4837-8b8a-c5550bc7b3eb] State transition from
> REBALANCING to REBALANCING.
> 2017-06-23 21:54:08 INFO  StreamThread:1064 - stream-thread
> [StreamThread-5] Removing all standby tasks []
> 2017-06-23 21:54:08 INFO  StreamThread:1042 - stream-thread
> [StreamThread-18] Updating suspended tasks to contain active tasks []
> 2017-06-23 21:54:08 INFO  StreamThread:163 - stream-thread [StreamThread-8]
> State transition from RUNNING to PARTITIONS_REVOKED.
> 2017-06-23 21:54:08 INFO  StreamThread:1049 - stream-thread
> [StreamThread-18] Removing all active tasks []
> 2017-06-23 21:54:08 INFO  AbstractCoordinator:420 - (Re-)joining group
> LIC3-43
> 2017-06-23 21:54:08 INFO  KafkaStreams:224 

Re: [DISCUSS]: KIP-161: streams record processing exception handlers

2017-06-22 Thread Eno Thereska
Answers inline: 

> On 22 Jun 2017, at 03:26, Guozhang Wang <wangg...@gmail.com> wrote:
> 
> Thanks for the updated KIP, some more comments:
> 
> 1.The config name is "default.deserialization.exception.handler" while the
> interface class name is "RecordExceptionHandler", which is more general
> than the intended purpose. Could we rename the class name accordingly?

Sure.


> 
> 2. Could you describe the full implementation of "DefaultExceptionHandler",
> currently it is not clear to me how it is implemented with the configured
> value.
> 
> In addition, I think we do not need to include an additional
> "DEFAULT_DESERIALIZATION_EXCEPTION_RESPONSE_CONFIG" as the configure()
> function is mainly used for users to pass any customized parameters that is
> out of the Streams library; plus adding such additional config sounds
> over-complicated for a default exception handler. Instead I'd suggest we
> just provide two handlers (or three if people feel strong about the
> LogAndThresholdExceptionHandler), one for FailOnExceptionHandler and one
> for LogAndContinueOnExceptionHandler. And we can set
> LogAndContinueOnExceptionHandler
> by default.
> 

That's what I had originally. Jay mentioned he preferred one default class, 
with config options.
So with that approach, you'd have 2 config options, one for failing, one for 
continuing, and the one
exception handler would take those options during its configure() call.

After checking the other exception handlers in the code, I might revert back to 
what I originally had (2 default handlers) 
as Guozhang also re-suggests, but still have the interface extend Configurable. 
Guozhang, you ok with that? In that case
there is no need for the response config option.
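
Just to make that concrete, here is a rough sketch of what a log-and-continue handler
could look like with the interface discussed in this thread (the return type and the
exact handle() signature are placeholders, since the KIP is still being finalised):

public class LogAndContinueOnExceptionHandler implements RecordExceptionHandler {

    private static final Logger log = LoggerFactory.getLogger(LogAndContinueOnExceptionHandler.class);

    @Override
    public void configure(Map<String, ?> configs) { }  // the interface extends Configurable

    @Override
    public Response handle(ProcessorContext context, ConsumerRecord<byte[], byte[]> record,
                           Exception exception) {
        log.warn("Skipping record at topic={} partition={} offset={} due to deserialization error",
                 record.topic(), record.partition(), record.offset(), exception);
        return Response.CONTINUE;  // a fail-fast handler would return Response.FAIL instead
    }
}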

Thanks
Eno


> 
> Guozhang
> 
> 
> 
> 
> 
> 
> 
> 
> On Wed, Jun 21, 2017 at 1:39 AM, Eno Thereska <eno.there...@gmail.com 
> <mailto:eno.there...@gmail.com>>
> wrote:
> 
>> Thanks Guozhang,
>> 
>> I’ve updated the KIP and hopefully addressed all the comments so far. In
>> the process also changed the name of the KIP to reflect its scope better:
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+ 
>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+>
>> deserialization+exception+handlers <https://cwiki.apache.org/ 
>> <https://cwiki.apache.org/>
>> confluence/display/KAFKA/KIP-161:+streams+deserialization+
>> exception+handlers>
>> 
>> Any other feedback appreciated, otherwise I’ll start the vote soon.
>> 
>> Thanks
>> Eno
>> 
>>> On Jun 12, 2017, at 6:28 AM, Guozhang Wang <wangg...@gmail.com> wrote:
>>> 
>>> Eno, Thanks for bringing this proposal up and sorry for getting late on
>>> this. Here are my two cents:
>>> 
>>> 1. First some meta comments regarding "fail fast" v.s. "making
>> progress". I
>>> agree that in general we should better "enforce user to do the right
>> thing"
>>> in system design, but we also need to keep in mind that Kafka is a
>>> multi-tenant system, i.e. from a Streams app's pov you probably would not
>>> control the whole streaming processing pipeline end-to-end. E.g. Your
>> input
>>> data may not be controlled by yourself; it could be written by another
>> app,
>>> or another team in your company, or even a different organization, and if
>>> an error happens maybe you cannot fix "to do the right thing" just by
>>> yourself in time. In such an environment I think it is important to leave
>>> the door open to let users be more resilient. So I find the current
>>> proposal which does leave the door open for either fail-fast or make
>>> progress quite reasonable.
>>> 
>>> 2. On the other hand, if the question is whether we should provide a
>>> built-in "send to bad queue" handler from the library, I think that might
>>> be an overkill: with some tweaks (see my detailed comments below) on the
>>> API we can allow users to implement such handlers pretty easily. In
>> fact, I
>>> feel even "LogAndThresholdExceptionHandler" is not necessary as a
>> built-in
>>> handler, as it would then require users to specify the threshold via
>>> configs, etc. I think letting people provide such "eco-libraries" may be
>>> better.
>>> 
>>> 3. Regarding the CRC error: today we validate CRC on both the broker end
>>> upon receiving produce requests and on consumer end upon receiving fetch
>>> responses; and if 

Re: [DISCUSS] Streams DSL/StateStore Refactoring

2017-06-22 Thread Eno Thereska
Note that while I agree with the initial proposal (withKeySerdes, withJoinType, 
etc), I don't agree with things like .materialize(), .enableCaching(), 
.enableLogging(). 

The former maintains the declarative DSL, while the latter breaks the declarative 
part by mixing system decisions into the DSL.  I think there is a difference 
between the two proposals.

Eno

> On 22 Jun 2017, at 03:46, Guozhang Wang <wangg...@gmail.com> wrote:
> 
> I have been thinking about reducing all these overloaded functions for
> stateful operations (there are some other places that introduces overloaded
> functions but let's focus on these only in this discussion), what I used to
> have is to use some "materialize" function on the KTables, like:
> 
> ---
> 
> // specifying the topology
> 
> KStream stream1 = builder.stream();
> KTable table1 = stream1.groupby(...).aggregate(initializer, aggregator,
> sessionMerger, sessionWindows);  // do not allow to pass-in a state store
> supplier here any more
> 
> // additional specs along with the topology above
> 
> table1.materialize("queryableStoreName"); // or..
> table1.materialize("queryableStoreName").enableCaching().enableLogging();
> // or..
> table1.materialize(stateStoreSupplier); // add the metrics / logging /
> caching / windowing functionalities on top of the store, or..
> table1.materialize(stateStoreSupplier).enableCaching().enableLogging(); //
> etc..
> 
> ---
> 
> But thinking about it more, I feel Damian's first proposal is better since
> my proposal would likely to break the concatenation (e.g. we may not be
> able to do sth. like "table1.filter().map().groupBy().aggregate()" if we
> want to use different specs for the intermediate filtered KTable).
> 
> 
> But since this is a incompatibility change, and we are going to remove the
> compatibility annotations soon it means we only have one chance and we
> really have to make it right. So I'd call out for anyone try to rewrite
> your examples / demo code with the proposed new API and see if it feel
> natural, for example, if I want to use a different storage engine than the
> default rockDB engine how could I easily specify that with the proposed
> APIs?
> 
> Meanwhile Damian could you provide a formal set of APIs for people to
> exercise on them? Also could you briefly describe how custom storage
> engines could be swapped in with the above APIs?
> 
> 
> 
> Guozhang
> 
> 
> On Wed, Jun 21, 2017 at 9:08 AM, Eno Thereska <eno.there...@gmail.com>
> wrote:
> 
>> To make it clear, it’s outlined by Damian, I just copy pasted what he told
>> me in person :)
>> 
>> Eno
>> 
>>> On Jun 21, 2017, at 4:40 PM, Bill Bejeck <bbej...@gmail.com> wrote:
>>> 
>>> +1 for the approach outlined above by Eno.
>>> 
>>> On Wed, Jun 21, 2017 at 11:28 AM, Damian Guy <damian@gmail.com>
>> wrote:
>>> 
>>>> Thanks Eno.
>>>> 
>>>> Yes i agree. We could apply this same approach to most of the operations
>>>> where we have multiple overloads, i.e., we have a single method for each
>>>> operation that takes the required parameters and everything else is
>>>> specified as you have done above.
>>>> 
>>>> On Wed, 21 Jun 2017 at 16:24 Eno Thereska <eno.there...@gmail.com>
>> wrote:
>>>> 
>>>>> (cc’ing user-list too)
>>>>> 
>>>>> Given that we already have StateStoreSuppliers that are configurable
>>>> using
>>>>> the fluent-like API, probably it’s worth discussing the other examples
>>>> with
>>>>> joins and serdes first since those have many overloads and are in need
>> of
>>>>> some TLC.
>>>>> 
>>>>> So following your example, I guess you’d have something like:
>>>>> .join()
>>>>>  .withKeySerdes(…)
>>>>>  .withValueSerdes(…)
>>>>>  .withJoinType(“outer”)
>>>>> 
>>>>> etc?
>>>>> 
>>>>> I like the approach since it still remains declarative and it’d reduce
>>>> the
>>>>> number of overloads by quite a bit.
>>>>> 
>>>>> Eno
>>>>> 
>>>>>> On Jun 21, 2017, at 3:37 PM, Damian Guy <damian@gmail.com> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I'd like to get a discussion going around some of the API choices
>> we've
>>

Re: [DISCUSS] Streams DSL/StateStore Refactoring

2017-06-21 Thread Eno Thereska
To make it clear, it’s outlined by Damian, I just copy pasted what he told me 
in person :)

Eno

> On Jun 21, 2017, at 4:40 PM, Bill Bejeck <bbej...@gmail.com> wrote:
> 
> +1 for the approach outlined above by Eno.
> 
> On Wed, Jun 21, 2017 at 11:28 AM, Damian Guy <damian@gmail.com> wrote:
> 
>> Thanks Eno.
>> 
>> Yes i agree. We could apply this same approach to most of the operations
>> where we have multiple overloads, i.e., we have a single method for each
>> operation that takes the required parameters and everything else is
>> specified as you have done above.
>> 
>> On Wed, 21 Jun 2017 at 16:24 Eno Thereska <eno.there...@gmail.com> wrote:
>> 
>>> (cc’ing user-list too)
>>> 
>>> Given that we already have StateStoreSuppliers that are configurable
>> using
>>> the fluent-like API, probably it’s worth discussing the other examples
>> with
>>> joins and serdes first since those have many overloads and are in need of
>>> some TLC.
>>> 
>>> So following your example, I guess you’d have something like:
>>> .join()
>>>   .withKeySerdes(…)
>>>   .withValueSerdes(…)
>>>   .withJoinType(“outer”)
>>> 
>>> etc?
>>> 
>>> I like the approach since it still remains declarative and it’d reduce
>> the
>>> number of overloads by quite a bit.
>>> 
>>> Eno
>>> 
>>>> On Jun 21, 2017, at 3:37 PM, Damian Guy <damian@gmail.com> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> I'd like to get a discussion going around some of the API choices we've
>>>> made in the DSL. In particular those that relate to stateful operations
>>>> (though this could expand).
>>>> As it stands we lean heavily on overloaded methods in the API, i.e,
>> there
>>>> are 9 overloads for KGroupedStream.count(..)! It is becoming noisy and
>> i
>>>> feel it is only going to get worse as we add more optional params. In
>>>> particular we've had some requests to be able to turn caching off, or
>>>> change log configs,  on a per operator basis (note this can be done now
>>> if
>>>> you pass in a StateStoreSupplier, but this can be a bit cumbersome).
>>>> 
>>>> So this is a bit of an open question. How can we change the DSL
>> overloads
>>>> so that it flows, is simple to use and understand, and is easily
>> extended
>>>> in the future?
>>>> 
>>>> One option would be to use a fluent API approach for providing the
>>> optional
>>>> params, so something like this:
>>>> 
>>>> groupedStream.count()
>>>>  .withStoreName("name")
>>>>  .withCachingEnabled(false)
>>>>  .withLoggingEnabled(config)
>>>>  .table()
>>>> 
>>>> 
>>>> 
>>>> Another option would be to provide a Builder to the count method, so it
>>>> would look something like this:
>>>> groupedStream.count(new
>>>> CountBuilder("storeName").withCachingEnabled(false).build())
>>>> 
>>>> Another option is to say: Hey we don't need this, what are you on
>> about!
>>>> 
>>>> The above has focussed on state store related overloads, but the same
>>> ideas
>>>> could  be applied to joins etc, where we presently have many join
>> methods
>>>> and many overloads.
>>>> 
>>>> Anyway, i look forward to hearing your opinions.
>>>> 
>>>> Thanks,
>>>> Damian
>>> 
>>> 
>> 



Re: [DISCUSS] Streams DSL/StateStore Refactoring

2017-06-21 Thread Eno Thereska
(cc’ing user-list too)

Given that we already have StateStoreSuppliers that are configurable using the 
fluent-like API, probably it’s worth discussing the other examples with joins 
and serdes first since those have many overloads and are in need of some TLC.

So following your example, I guess you’d have something like:
.join()
   .withKeySerdes(…)
   .withValueSerdes(…)
   .withJoinType(“outer”)

etc?

I like the approach since it still remains declarative and it’d reduce the 
number of overloads by quite a bit.

Eno

> On Jun 21, 2017, at 3:37 PM, Damian Guy  wrote:
> 
> Hi,
> 
> I'd like to get a discussion going around some of the API choices we've
> made in the DSL. In particular those that relate to stateful operations
> (though this could expand).
> As it stands we lean heavily on overloaded methods in the API, i.e., there
> are 9 overloads for KGroupedStream.count(..)! It is becoming noisy and I
> feel it is only going to get worse as we add more optional params. In
> particular we've had some requests to be able to turn caching off, or
> change log configs,  on a per operator basis (note this can be done now if
> you pass in a StateStoreSupplier, but this can be a bit cumbersome).
> 
> So this is a bit of an open question. How can we change the DSL overloads
> so that it flows, is simple to use and understand, and is easily extended
> in the future?
> 
> One option would be to use a fluent API approach for providing the optional
> params, so something like this:
> 
> groupedStream.count()
>   .withStoreName("name")
>   .withCachingEnabled(false)
>   .withLoggingEnabled(config)
>   .table()
> 
> 
> 
> Another option would be to provide a Builder to the count method, so it
> would look something like this:
> groupedStream.count(new
> CountBuilder("storeName").withCachingEnabled(false).build())
> 
> Another option is to say: Hey we don't need this, what are you on about!
> 
> The above has focussed on state store related overloads, but the same ideas
> could  be applied to joins etc, where we presently have many join methods
> and many overloads.
> 
> Anyway, i look forward to hearing your opinions.
> 
> Thanks,
> Damian



Re: [DISCUSS]: KIP-161: streams record processing exception handlers

2017-06-21 Thread Eno Thereska
Thanks Guozhang,

I’ve updated the KIP and hopefully addressed all the comments so far. In the 
process also changed the name of the KIP to reflect its scope better: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+deserialization+exception+handlers

Any other feedback appreciated, otherwise I’ll start the vote soon.

Thanks
Eno

> On Jun 12, 2017, at 6:28 AM, Guozhang Wang <wangg...@gmail.com> wrote:
> 
> Eno, Thanks for bringing this proposal up and sorry for getting late on
> this. Here are my two cents:
> 
> 1. First some meta comments regarding "fail fast" v.s. "making progress". I
> agree that in general we should better "enforce user to do the right thing"
> in system design, but we also need to keep in mind that Kafka is a
> multi-tenant system, i.e. from a Streams app's pov you probably would not
> control the whole streaming processing pipeline end-to-end. E.g. Your input
> data may not be controlled by yourself; it could be written by another app,
> or another team in your company, or even a different organization, and if
> an error happens maybe you cannot fix "to do the right thing" just by
> yourself in time. In such an environment I think it is important to leave
> the door open to let users be more resilient. So I find the current
> proposal which does leave the door open for either fail-fast or make
> progress quite reasonable.
> 
> 2. On the other hand, if the question is whether we should provide a
> built-in "send to bad queue" handler from the library, I think that might
> be an overkill: with some tweaks (see my detailed comments below) on the
> API we can allow users to implement such handlers pretty easily. In fact, I
> feel even "LogAndThresholdExceptionHandler" is not necessary as a built-in
> handler, as it would then require users to specify the threshold via
> configs, etc. I think letting people provide such "eco-libraries" may be
> better.
> 
> 3. Regarding the CRC error: today we validate CRC on both the broker end
> upon receiving produce requests and on consumer end upon receiving fetch
> responses; and if the CRC validation fails in the former case it would not
> be appended to the broker logs. So if we do see a CRC failure on the
> consumer side it has to be that either we have a flipped bit on the broker
> disks or over the wire. For the first case it is fatal while for the second
> it is retriable. Unfortunately we cannot tell which case it is when seeing
> CRC validation failures. But in either case, just skipping and making
> progress seems not a good choice here, and hence I would personally exclude
> these errors from the general serde errors to NOT leave the door open of
> making progress.
> 
> Currently such errors are thrown as KafkaException that wraps an
> InvalidRecordException, which may be too general and we could consider just
> throwing the InvalidRecordException directly. But that could be an
> orthogonal discussion if we agrees that CRC failures should not be
> considered in this KIP.
> 
> 
> 
> Now some detailed comments:
> 
> 4. Could we consider adding the processor context in the handle() function
> as well? This context will be wrapping as the source node that is about to
> process the record. This could expose more info like which task / source
> node sees this error, which timestamp of the message, etc, and also can
> allow users to implement their handlers by exposing some metrics, by
> calling context.forward() to implement the "send to bad queue" behavior etc.
> 
> 5. Could you add the string name of
> StreamsConfig.DEFAULT_RECORD_EXCEPTION_HANDLER as well in the KIP?
> Personally I find "default" prefix a bit misleading since we do not allow
> users to override it per-node yet. But I'm okay either way as I can see we
> may extend it in the future and probably would like to not rename the
> config again. Also from the experience of `default partitioner` and
> `default timestamp extractor` we may also make sure that the passed in
> object can be either a string "class name" or a class object?
> 
> 
> Guozhang
> 
> 
> On Wed, Jun 7, 2017 at 2:16 PM, Jan Filipiak <jan.filip...@trivago.com>
> wrote:
> 
>> Hi Eno,
>> 
>> On 07.06.2017 22:49, Eno Thereska wrote:
>> 
>>> Comments inline:
>>> 
>>> On 5 Jun 2017, at 18:19, Jan Filipiak <jan.filip...@trivago.com> wrote:
>>>> 
>>>> Hi
>>>> 
>>>> just my few thoughts
>>>> 
>>>> On 05.06.2017 11:44, Eno The

Re: KStream Usage spikes memory consumption and breaks Kafka

2017-06-20 Thread Eno Thereska
Could you provide some configuration information and more context? What 
application are you running, when is it running out of memory? Otherwise it's 
hard to tell.

Eno
> On 20 Jun 2017, at 22:15, IT Consultant <0binarybudd...@gmail.com> wrote:
> 
> Hi All ,
> 
> The Kafka instance is breaking down when using KStream. It runs out of memory
> frequently, resulting in service unavailability.
> 
> 
> Is it good practice to use KStream?
> What other options should be tried to avoid such breakage?
> If it is best practice, how do we fine-tune Kafka to withstand the load?
> 
> Thanks for your help in advance .



Re: [VOTE] 0.11.0.0 RC1

2017-06-19 Thread Eno Thereska
+1 (non-binding) passes Kafka Streams tests.

Thanks,
Eno
> On 19 Jun 2017, at 06:49, Magnus Edenhill  wrote:
> 
> +1 (non-binding)
> 
> Passes librdkafka integration tests (v0.9.5 and master)
> 
> 
> 2017-06-19 0:32 GMT+02:00 Ismael Juma :
> 
>> Hello Kafka users, developers and client-developers,
>> 
>> This is the second candidate for release of Apache Kafka 0.11.0.0.
>> 
>> This is a major version release of Apache Kafka. It includes 32 new KIPs.
>> See
>> the release notes and release plan (https://cwiki.apache.org/conf
>> luence/display/KAFKA/Release+Plan+0.11.0.0) for more details. A few
>> feature
>> highlights:
>> 
>> * Exactly-once delivery and transactional messaging
>> * Streams exactly-once semantics
>> * Admin client with support for topic, ACLs and config management
>> * Record headers
>> * Request rate quotas
>> * Improved resiliency: replication protocol improvement and single-threaded
>> controller
>> * Richer and more efficient message format
>> 
>> A number of issues have been resolved since RC0 and there are no known
>> blockers remaining.
>> 
>> Release notes for the 0.11.0.0 release:
>> http://home.apache.org/~ijuma/kafka-0.11.0.0-rc1/RELEASE_NOTES.html
>> 
>> *** Please download, test and vote by Thursday, June 22, 9am PT
>> 
>> Kafka's KEYS file containing PGP keys we use to sign the release:
>> http://kafka.apache.org/KEYS
>> 
>> * Release artifacts to be voted upon (source and binary):
>> http://home.apache.org/~ijuma/kafka-0.11.0.0-rc1/
>> 
>> * Maven artifacts to be voted upon:
>> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>> 
>> * Javadoc:
>> http://home.apache.org/~ijuma/kafka-0.11.0.0-rc1/javadoc/
>> 
>> * Tag to be voted upon (off 0.11.0 branch) is the 0.11.0.0 tag:
>> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
>> 4818d4e1cbef1a8e9c027100fef317077fb3fb99
>> 
>> * Documentation:
>> http://kafka.apache.org/0110/documentation.html
>> 
>> * Protocol:
>> http://kafka.apache.org/0110/protocol.html
>> 
>> * Successful Jenkins builds for the 0.11.0 branch:
>> Unit/integration tests: https://builds.apache.org/job/
>> kafka-0.11.0-jdk7/167/
>> System tests: https://jenkins.confluent.io/job/system-test-kafka-0.11.0/
>> 16/
>> (all 274 tests passed, the reported failure was not related to the tests)
>> 
>> /**
>> 
>> Thanks,
>> Ismael
>> 



Re: Scala type mismatch after upgrade to 0.10.2.1

2017-06-18 Thread Eno Thereska
Hi there,

Yeah with 0.10.2 some Scala applications need to explicitly declare the type of 
certain variables. See this: 
http://docs.confluent.io/current/streams/upgrade-guide.html#scala 
 

Thanks
Eno

> On Jun 17, 2017, at 10:03 PM, Björn Häuser  wrote:
> 
> Hi!
> 
> I am maintaining an application which is written in Scala and uses the 
> kafka-streams library.
> 
> As said in the topic, after trying to upgrade from 0.10.1.1 to 0.10.2.1, I am 
> getting the following compilation error:
> 
> [error]  found   : service.streams.transformers.FilterMainCoverSupplier
> [error]  required: org.apache.kafka.streams.kstream.TransformerSupplier[_ >: 
> String, _ >: ?0(in value x$1), org.apache.kafka.streams.KeyValue[?,?]]
> [error] Note: String <: Any (and 
> service.streams.transformers.FilterMainCoverSupplier <: 
> org.apache.kafka.streams.kstream.TransformerSupplier[String,dto.ContentDataDto,org.apache.kafka.streams.KeyValue[String,dto.ContentDataDto]]),
>  but Java-defined trait TransformerSupplier is invariant in type K.
> [error] You may wish to investigate a wildcard type such as `_ <: Any`. (SLS 
> 3.2.10)
> [error] Note: dto.ContentDataDto <: Any (and 
> service.streams.transformers.FilterMainCoverSupplier <: 
> org.apache.kafka.streams.kstream.TransformerSupplier[String,dto.ContentDataDto,org.apache.kafka.streams.KeyValue[String,dto.ContentDataDto]]),
>  but Java-defined trait TransformerSupplier is invariant in type V.
> [error] You may wish to investigate a wildcard type such as `_ <: Any`. (SLS 
> 3.2.10)
> [error]   .transform(filterMainCover, 
> FilterMainCoverSupplier.StateStoreName)
> 
> The definition of the Transformer is as follows:
> 
> class FilterMainCover extends Transformer[String, ContentDataDto, 
> KeyValue[String, ContentDataDto]] {
> }
> 
> The definition of the TransformerSupplier is as follows:
> 
> class FilterMainCoverSupplier extends TransformerSupplier[String, 
> ContentDataDto, KeyValue[String, ContentDataDto]] {
> 
>  override def get(): Transformer[String, ContentDataDto, KeyValue[String, 
> ContentDataDto]] = new FilterMainCover()
> 
> }
> 
> 
> I went through the confluent examples and could see that it is supposed to 
> just work. Anyone got an Idea what I am doing wrong?
> 
> Thanks
> Björn
> 



Re: Slow Consumer Group Startup

2017-06-16 Thread Eno Thereska
Hi Bryan,

So this must be something else since KIP-134 is not in 0.10.2.1, but in the new 
release 0.11 that hasn't come out yet.
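
For reference, once 0.11 is available the delay introduced by KIP-134 is a broker-side
setting; per the KIP it looks like this in server.properties (3 seconds being the
proposed default):

group.initial.rebalance.delay.ms=3000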

Eno

> On 14 Jun 2017, at 21:35, Bryan Baugher <bjb...@gmail.com> wrote:
> 
> It does seem like we are in a similar situation described in the KIP (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-134%3A+Delay+initial+consumer+group+rebalance).
> For some historical reason we had set our session.timeout.ms to a high
> value (5 minutes) which corresponds with the amount of the group would
> wait. Lowering helps new consumers join faster but now I'm seeing group
> fluctuations so I'll have to continue to dig into why thats happening. Also
> worth noting the broker's rebalance timeout seems to switch to
> max.poll.interval.ms in 0.10.1.
> 
> On Wed, Jun 14, 2017 at 10:53 AM Bryan Baugher <bjb...@gmail.com> wrote:
> 
>> While I do have some logs its not trivial to share since the logs are
>> across 16 JVMs and a few different hosts.
>> 
>> On Wed, Jun 14, 2017 at 10:34 AM Eno Thereska <eno.there...@gmail.com>
>> wrote:
>> 
>>> The delay in that KIP is just 3 seconds, not minutes though, right? Would
>>> you have any logs to share?
>>> 
>>> Thanks
>>> Eno
>>>> On 14 Jun 2017, at 16:14, Bryan Baugher <bjb...@gmail.com> wrote:
>>>> 
>>>> Our consumer group isn't doing anything stateful and we've seen this
>>>> behavior for existing groups as well. It seems like timing could be an
>>>> issue, thanks for the information.
>>>> 
>>>> On Tue, Jun 13, 2017 at 7:39 PM James Cheng <wushuja...@gmail.com>
>>> wrote:
>>>> 
>>>>> Bryan,
>>>>> 
>>>>> This sounds related to
>>>>> 
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-134%3A+Delay+initial+consumer+group+rebalance
>>>>> and https://issues.apache.org/jira/browse/KAFKA-4925.
>>>>> 
>>>>> -James
>>>>> 
>>>>>> On Jun 13, 2017, at 7:02 AM, Bryan Baugher <bjb...@gmail.com> wrote:
>>>>>> 
>>>>>> The topics already exist prior to starting any of the consumers
>>>>>> 
>>>>>> On Mon, Jun 12, 2017 at 9:35 PM J Pai <jai.forums2...@gmail.com>
>>> wrote:
>>>>>> 
>>>>>>> When are the topics on which these consumer groups consume, created?
>>>>>>> 
>>>>>>> -Jaikiran
>>>>>>> On 13-Jun-2017, at 3:18 AM, Bryan Baugher <bjb...@gmail.com> wrote:
>>>>>>> 
>>>>>>> Hi everyone,
>>>>>>> 
>>>>>>> We are currently experiencing slow startup times for our consumer
>>> groups
>>>>>>> (16-32 processes for a hundred or more partitions) in the range of
>>>>> minutes
>>>>>>> (3-15 minutes), where little to no messages are consumed before
>>> suddenly
>>>>>>> everything just starts working at full speed.
>>>>>>> 
>>>>>>> I'm currently using Kafka 0.9.0.1 but we are in the middle of
>>> upgrading
>>>>> to
>>>>>>> Kafka 0.10.2.1. We also using the newer kafka consumer API and group
>>>>>>> management on a simple Apache Storm topology. We don't make use of
>>>>> Storm's
>>>>>>> kafka spout but instead wrote a simple one ourselves.
>>>>>>> 
>>>>>>> Using the kafka AdminClient I was able to poll for consumer group
>>>>> summary
>>>>>>> information. What I've found is that the group seems to sit
>>>>>>> in PreparingRebalance state for minutes before finally becoming
>>> Stable
>>>>>>> which then everything starts processing quickly. I've also enabled
>>> debug
>>>>>>> logging around the consumer's coordinator classes but didn't see
>>>>> anything
>>>>>>> to indicate the issue.
>>>>>>> 
>>>>>>> I'm hoping that just upgrading to 0.10 or tweaking how we use our
>>>>> consumer
>>>>>>> in Apache Storm is the problem but are there any pointers on things I
>>>>>>> should look at or try?
>>>>>>> 
>>>>>>> Bryan
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 



Re: Single Key Aggregation

2017-06-15 Thread Eno Thereska
I'm not sure if I fully understand this but let me check:

- if you start 2 instances, one instance will process half of the partitions, 
the other instance will process the other half
- for any given key, like key 100, it will only be processed on one of the 
instances, not both.
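
The reason for the second point is that a keyed record always hashes to the same
partition, and each partition is owned by exactly one task on one instance. Roughly
(simplified for illustration only, this is not the actual Kafka code, which uses
murmur2 on the serialized key):

static int partitionFor(byte[] keyBytes, int numPartitions) {
    // same key bytes -> same partition -> same instance/task
    return (java.util.Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
}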

Does this help?

Eno



> On 15 Jun 2017, at 07:40, Sameer Kumar  wrote:
> 
> Also, I am writing a single key in the output all the time. I believe
> machine2 will have to write a key and since a state store is local it
> wouldn't know about the counter state on another machine. So, I guess this
> will happen.
> 
> -Sameer.
> 
> On Thu, Jun 15, 2017 at 11:11 AM, Sameer Kumar 
> wrote:
> 
>> The input topic contains 60 partitions and data is distributed well across
>> different partitions on different keys. While consumption, I am doing some
>> filtering and writing only single key data.
>> 
>> Output would be something of the form:- Machine 1
>> 
>> 2017-06-13 16:49:10 INFO  LICountClickImprMR2:116 - licount
>> k=P:LIS:1236667:2017_06_13:I,v=651
>> 2017-06-13 16:49:30 INFO  LICountClickImprMR2:116 - licount
>> k=P:LIS:1236667:2017_06_13:I,v=652
>> 
>> Machine 2
>> 2017-06-13 16:49:10 INFO  LICountClickImprMR2:116 - licount
>> k=P:LIS:1236667:2017_06_13:I,v=1
>> 2017-06-13 16:49:30 INFO  LICountClickImprMR2:116 - licount
>> k=P:LIS:1236667:2017_06_13:I,v=2
>> 
>> I am sharing a snippet of code,
>> 
>> private KTable extractLICount(KStream> AdLog> joinedImprLogs) {
>>KTable liCount = joinedImprLogs.flatMap((key, value)
>> -> {
>>  List> l = new ArrayList<>();
>>  if (value == null) {
>>return l;
>>  }
>>  String date = new SimpleDateFormat("_MM_dd").format(new
>> Date(key.window().end()));
>>  // Lineitemids
>>  if (value != null && value.getAdLogType() == 3) {
>>// log.info("Invalid data: " + value);
>>return l;
>>  }
>>  if (value.getAdLogType() == 2) {
>>long lineitemid = value.getAdClickLog().getItmClmbLId();
>>if (lineitemid == TARGETED_LI) {
>>  String liKey = String.format("P:LIS:%s:%s:C", lineitemid, date);
>>  l.add(new KeyValue<>(liKey, 1));
>>}
>>return l;
>>  } else if (value.getAdLogType() == 1){
>> 
>>long[] lineitemids = value.getAdImprLog().getItmClmbLIds();
>>if (value.getAdImprLog().isVisible()) {
>>  for (int i = 0; i < lineitemids.length; i++) {
>>long li = lineitemids[i];
>>if (li == TARGETED_LI) {
>>  // log.info("valid impression ids= " +
>> value.getAdImprLog().toString());
>>  String liKey = String.format("P:LIS:%s:%s:I", li, date);
>>  l.add(new KeyValue<>(liKey, 1));
>>}
>>  }
>>}
>>return l;
>>  }
>>  return l;
>>}).groupBy((k, v) -> k, Serdes.String(), Serdes.Integer())
>>.reduce((value1, value2) -> value1 + value2,
>> LINE_ITEM_COUNT_STORE);
>>return liCount;
>>  }
>> 
Re: KStream and KTable different behaviour on filter() operation

2017-06-15 Thread Eno Thereska
Yeah the semantics are slightly different. For a KTable, a null value just 
means that the record is a tombstone, and will be ignored by subsequent 
processing anyway:
http://docs.confluent.io/current/streams/javadocs/org/apache/kafka/streams/kstream/KTable.html#filter-org.apache.kafka.streams.kstream.Predicate-
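
To make the difference concrete, a small sketch (the topic and store names are made up):

KTable<String, Long> counts =
    builder.table(Serdes.String(), Serdes.Long(), "counts-topic", "counts-store");

// Keys whose value fails the predicate are not silently dropped; a <key, null>
// tombstone is forwarded instead, so downstream views can delete that key.
KTable<String, Long> bigCounts = counts.filter((key, value) -> value > 10);

// On a KStream the same predicate simply drops the record and nothing is forwarded.
KStream<String, Long> bigCountEvents =
    builder.stream(Serdes.String(), Serdes.Long(), "count-events")
           .filter((key, value) -> value > 10);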
 


Eno


> On 15 Jun 2017, at 13:51, Paolo Patierno  wrote:
> 
> Hi all,
> 
> 
> I was asking why the different behaviour of filter() operation on a KStream 
> and KTable.
> 
> On KStream, if the predicate is false, the message isn't passed to the next 
> node (so for example if a sinknode, it doesn't arrive to the destination 
> topic).
> 
> On KTable, if the predicate is true, a message with null value is passed.
> 
> With the filter operation I want to avoid receiving messages for which the 
> predicate does not hold.
> 
> Why this difference ?
> 
> 
> Thanks,
> 
> Paolo
> 
> 
> Paolo Patierno
> Senior Software Engineer (IoT) @ Red Hat
> Microsoft MVP on Windows Embedded & IoT
> Microsoft Azure Advisor
> 
> Twitter : @ppatierno
> Linkedin : paolopatierno
> Blog : DevExperience



Re: Kafka Streams vs Spark Streaming : reduce by window

2017-06-15 Thread Eno Thereska
Hi Paolo,

Yeah, so if you want fewer records, you should actually "not" disable cache. If 
you disable cache you'll get all the records as you described.
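
If you re-enable the cache, these are the two knobs that control how often the
aggregate is flushed downstream (the values here are just examples, not recommendations):

Properties props = new Properties();
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 10 * 1024 * 1024L);  // per-thread cache size
props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 5000);  // flush/commit roughly every 5 seconds

With the cache on, updates to the same windowed key are coalesced and only the latest
value goes downstream when the cache is flushed, so you see far fewer intermediate records.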

About closing windows: if you close a window and a late record arrives that 
should have been in that window, you basically lose the ability to process that 
record. Kafka Streams is robust to that, in that it handles late-arriving 
records. There is a comparison here, for example, with other approaches that depend on 
watermarks or triggers: 
https://www.confluent.io/blog/watermarks-tables-event-time-dataflow-model/

Eno


> On 15 Jun 2017, at 14:57, Paolo Patierno <ppatie...@live.com> wrote:
> 
> Hi Emo,
> 
> 
> thanks for the reply !
> 
> Regarding the cache I'm already using CACHE_MAX_BYTES_BUFFERING_CONFIG = 0 
> (so disabling cache).
> 
> Regarding the interactive query API (I'll take a look), it means that it's up 
> to the application to do something like what we get out of the box with Spark.
> 
> May I ask what you mean by "We don’t believe in closing windows"? Isn't 
> it much more code that the user has to write to get the same result?
> 
> I'm exploring Kafka Streams and it's very powerful IMHO, partly because the 
> usage is pretty simple, but this scenario could be a gap compared to Spark.
> 
> 
> Thanks,
> 
> Paolo.
> 
> 
> Paolo Patierno
> Senior Software Engineer (IoT) @ Red Hat
> Microsoft MVP on Windows Embedded & IoT
> Microsoft Azure Advisor
> 
> Twitter : @ppatierno<http://twitter.com/ppatierno>
> Linkedin : paolopatierno<http://it.linkedin.com/in/paolopatierno>
> Blog : DevExperience<http://paolopatierno.wordpress.com/>
> 
> 
> 
> From: Eno Thereska <eno.there...@gmail.com>
> Sent: Thursday, June 15, 2017 1:45 PM
> To: users@kafka.apache.org
> Subject: Re: Kafka Streams vs Spark Streaming : reduce by window
> 
> Hi Paolo,
> 
> That is indeed correct. We don’t believe in closing windows in Kafka Streams.
> You could reduce the number of downstream records by using record caches: 
> http://docs.confluent.io/current/streams/developer-guide.html#record-caches-in-the-dsl
>  
> <http://docs.confluent.io/current/streams/developer-guide.html#record-caches-in-the-dsl>.
> 
> Alternatively you can just query the KTable whenever you want using the 
> Interactive Query APIs (so when you query dictates what  data you receive), 
> see this 
> https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/
>  
> <https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/>
> 
> Thanks
> Eno
>> On Jun 15, 2017, at 2:38 PM, Paolo Patierno <ppatie...@live.com> wrote:
>> 
>> Hi,
>> 
>> 
>> using the streams library I noticed a difference (or there is a lack of 
>> knowledge on my side) with Apache Spark.
>> 
>> Imagine following scenario ...
>> 
>> 
>> I have a source topic where numeric values come in and I want to check the 
>> maximum value in the latest 5 seconds but ... putting the max value into a 
>> destination topic every 5 seconds.
>> 
>> This is what happens with reduceByWindow method in Spark.
>> 
>> I'm using reduce on a KStream here that process the max value taking into 
>> account previous values in the latest 5 seconds but the final value is put 
>> into the destination topic for each incoming value.
>> 
>> 
>> For example ...
>> 
>> 
>> An application sends numeric values every 1 second.
>> 
>> With Spark ... the source gets values every 1 second, process max in a 
>> window of 5 seconds, puts the max into the destination every 5 seconds (so 
>> when the window ends). If the sequence is 21, 25, 22, 20, 26 the output will 
>> be just 26.
>> 
>> With Kafka Streams ... the source gets values every 1 second, process max in 
>> a window of 5 seconds, puts the max into the destination every 1 seconds (so 
>> every time an incoming value arrives). Of course, if for example the 
>> sequence is 21, 25, 22, 20, 26 ... the output will be 21, 25, 25, 25, 26.
>> 
>> 
>> Is it possible with Kafka Streams ? Or it's something to do at application 
>> level ?
>> 
>> 
>> Thanks,
>> 
>> Paolo
>> 
>> 
>> Paolo Patierno
>> Senior Software Engineer (IoT) @ Red Hat
>> Microsoft MVP on Windows Embedded & IoT
>> Microsoft Azure Advisor
>> 
>> Twitter : @ppatierno<http://twitter.com/ppatierno>
>> Linkedin : paolopatierno<http://it.linkedin.com/in/paolopatierno>
>> Blog : DevExperience<http://paolopatierno.wordpress.com/>
> 



Re: Kafka Streams vs Spark Streaming : reduce by window

2017-06-15 Thread Eno Thereska
Hi Paolo,

That is indeed correct. We don’t believe in closing windows in Kafka Streams. 
You could reduce the number of downstream records by using record caches: 
http://docs.confluent.io/current/streams/developer-guide.html#record-caches-in-the-dsl
 
.

Alternatively you can just query the KTable whenever you want using the 
Interactive Query APIs (so when you query dictates what data you receive), see 
this 
https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/
 


Thanks
Eno
> On Jun 15, 2017, at 2:38 PM, Paolo Patierno  wrote:
> 
> Hi,
> 
> 
> using the streams library I noticed a difference (or there is a lack of 
> knowledge on my side) with Apache Spark.
> 
> Imagine following scenario ...
> 
> 
> I have a source topic where numeric values come in and I want to check the 
> maximum value in the latest 5 seconds but ... putting the max value into a 
> destination topic every 5 seconds.
> 
> This is what happens with reduceByWindow method in Spark.
> 
> I'm using reduce on a KStream here that process the max value taking into 
> account previous values in the latest 5 seconds but the final value is put 
> into the destination topic for each incoming value.
> 
> 
> For example ...
> 
> 
> An application sends numeric values every 1 second.
> 
> With Spark ... the source gets values every 1 second, process max in a window 
> of 5 seconds, puts the max into the destination every 5 seconds (so when the 
> window ends). If the sequence is 21, 25, 22, 20, 26 the output will be just 
> 26.
> 
> With Kafka Streams ... the source gets values every 1 second, process max in 
> a window of 5 seconds, puts the max into the destination every 1 seconds (so 
> every time an incoming value arrives). Of course, if for example the sequence 
> is 21, 25, 22, 20, 26 ... the output will be 21, 25, 25, 25, 26.
> 
> 
> Is it possible with Kafka Streams? Or is it something to do at the application 
> level?
> 
> 
> Thanks,
> 
> Paolo
> 
> 
> Paolo Patierno
> Senior Software Engineer (IoT) @ Red Hat
> Microsoft MVP on Windows Embedded & IoT
> Microsoft Azure Advisor
> 
> Twitter : @ppatierno
> Linkedin : paolopatierno
> Blog : DevExperience



Re: Question on Support provision

2017-06-14 Thread Eno Thereska
Hi Sofia, 

Thank you for your recent enquiry for Kafka support services.

Confluent employs some of the world’s foremost Apache Kafka experts, and that 
expertise shows in the level of support we can provide. The subscription offers 
a scaling level of support that is appropriate for your streaming environment. 
Our Enterprise subscription is offered at a range of service levels, which can 
include:

- 24/7 support
- Access to the Confluent Knowledge Base
- Response times as fast as 30 minutes based on SLA
- Full application lifecycle support from development to operations
- Emergency patches not yet included in the open source Apache distribution

The subscription plans map both to the size of the environment supported and 
the level of support (e.g., speed of response) required. Confluent provides 
both operational and development support. 

We'll contact you to discuss your needs further.

Thanks
Eno


> On 14 Jun 2017, at 13:59, Sofia Miari  wrote:
> 
> Good Day Team – may I please ask if there are any companies in EMEA 
> supporting Kafka tool?
> First Data would like to utilize the tool but we will need support so looking 
> for someone in EMEA who can provide us with support (paid, subscription etc).
>  
> Thank you in advance,
>  
> Sofia Miari
> Manager, Strategic Sourcing International | Software Team
> First Data, Stratopedou AVYP 2, Krioneri, Greece, 14568 
> Office: 0030 210 62 44 554
> 
> sofia.mi...@firstdata.gr  | firstdata.com 
>  
> 
> 
>  
> 
> 
> This message and/or its attachments may contain confidential and privileged 
> information and is intended for the sole use of the named person or entity to 
> which it is addressed. If the reader of this e-mail is not the intended 
> recipient or the employee or agent responsible for delivering it to the 
> intended recipient, any dissemination, copying or other use of, or taking any 
> action in reliance upon this e-mail is prohibited. If you have received this 
> e-mail in error please e-mail the sender by replying to this message, 
> following which you should delete it from your system. The internet is not a 
> secure environment and First Data Hellas cannot accept any responsibility for 
> the accuracy or completeness of this message or any liability for any loss or 
> damage arising from the use of this message or from delayed, corrupted, 
> intercepted or virus-infected transmission. Any views expressed in this 
> communication reflect personal opinions of the sender unless otherwise stated.
> 



Re: Slow Consumer Group Startup

2017-06-14 Thread Eno Thereska
The delay in that KIP is just 3 seconds, not minutes though, right? Would you 
have any logs to share?
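
(For reference, the delay being discussed is the broker-side setting introduced by 
KIP-134, which only exists from Kafka 0.11 onwards; shown here just to make the 
3 seconds concrete:)

  # server.properties; the default is 3000 ms
  group.initial.rebalance.delay.ms=3000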

Thanks
Eno
> On 14 Jun 2017, at 16:14, Bryan Baugher  wrote:
> 
> Our consumer group isn't doing anything stateful and we've seen this
> behavior for existing groups as well. It seems like timing could be an
> issue, thanks for the information.
> 
> On Tue, Jun 13, 2017 at 7:39 PM James Cheng  wrote:
> 
>> Bryan,
>> 
>> This sounds related to
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-134%3A+Delay+initial+consumer+group+rebalance
>> and https://issues.apache.org/jira/browse/KAFKA-4925.
>> 
>> -James
>> 
>>> On Jun 13, 2017, at 7:02 AM, Bryan Baugher  wrote:
>>> 
>>> The topics already exist prior to starting any of the consumers
>>> 
>>> On Mon, Jun 12, 2017 at 9:35 PM J Pai  wrote:
>>> 
 When are the topics on which these consumer groups consume, created?
 
 -Jaikiran
 On 13-Jun-2017, at 3:18 AM, Bryan Baugher  wrote:
 
 Hi everyone,
 
 We are currently experiencing slow startup times for our consumer groups
 (16-32 processes for a hundred or more partitions) in the range of
>> minutes
 (3-15 minutes), where little to no messages are consumed before suddenly
 everything just starts working at full speed.
 
 I'm currently using Kafka 0.9.0.1 but we are in the middle of upgrading
>> to
 Kafka 0.10.2.1. We also using the newer kafka consumer API and group
 management on a simple Apache Storm topology. We don't make use of
>> Storm's
 kafka spout but instead wrote a simple one ourselves.
 
 Using the kafka AdminClient I was able to poll for consumer group
>> summary
 information. What I've found is that the group seems to sit
 in PreparingRebalance state for minutes before finally becoming Stable
 which then everything starts processing quickly. I've also enabled debug
 logging around the consumer's coordinator classes but didn't see
>> anything
 to indicate the issue.
 
 I'm hoping that just upgrading to 0.10 or tweaking how we use our
>> consumer
 in Apache Storm is the problem but are there any pointers on things I
 should look at or try?
 
 Bryan
 
 
>> 
>> 



Re: getting intermittent TimeoutException at producer side in streams application

2017-06-09 Thread Eno Thereska
Hi Sachin,

As Damian mentioned it'd be useful to see some logs from both broker and 
streams.

One thing that comes to mind is whether your topics are replicated at all. You 
could try setting the replication factor of streams topics (e.g., changelogs 
and repartition topics) to 2 or 3 using StreamsConfig.REPLICATION_FACTOR_CONFIG.
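
A minimal sketch of that setting (the value 3 is only an example and should not 
exceed the number of brokers):

  Properties props = new Properties();
  // replication factor for the internal changelog and repartition topics Streams creates
  props.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 3);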

Thanks
Eno


> On 9 Jun 2017, at 20:01, Sachin Mittal  wrote:
> 
> Hi All,
> We still intermittently get this error.
> 
> We had added the config
> props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
> 
> and timeout as mentioned above is set as:
> props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 1800000);
> 
> So we increased from the default 30 seconds, first to 3 minutes and now to 30 minutes.
> 
> If this is connectivity issue then does this mean that for 30 minutes
> client could not connect to broker ?
> I doubt that will be the case because such error we see at a time on one or
> two partitions only.
> 
> Also we note that changelog topic partitions that get this error sometimes
> become unavailable with leader set as -1.
> 
> Also for this client side error what kind of server exception we should
> expect so we can correlate it with server logs to get better understanding.
> 
> Thanks
> Sachin
> 
> 
> 
> 
> 
> On Mon, Dec 19, 2016 at 5:43 PM, Damian Guy  wrote:
> 
>> Hi Sachin,
>> 
>> This would usually indicate that there is a connectivity
>> issue with the brokers. You would need to correlate the logs etc on the
>> brokers with the streams logs to try and understand what is happening.
>> 
>> Thanks,
>> Damian
>> 
>> On Sun, 18 Dec 2016 at 07:26 Sachin Mittal  wrote:
>> 
>>> Hi all,
>>> I have a simple stream application pipeline
>>> src.filter.aggragteByKey.mapValues.forEach
>>> 
>>> From time to time I get the following exception:
>>> Error sending record to topic test-stream-key-table-changelog
>>> org.apache.kafka.common.errors.TimeoutException: Batch containing 2
>>> record(s) expired due to timeout while requesting metadata from brokers
>> for
>>> test-stream-key-table-changelog-0
>>> 
>>> What could be causing the issue?
>>> I investigated a bit and saw none of the stage takes a long time. Even in
>>> forEach stage where we commit the output to external db takes sub 100 ms
>> in
>>> worst case.
>>> 
>>> I have right now done a workaround of
>>> props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 180000);
>>> 
>>> Increased the default timeout from 30 seconds to 3 minutes.
>>> 
>>> However to dig deep into the issue where can the problem be?
>>> 
>>> Is it that some stage is taking beyond 30 seconds to execute. Or is it
>> some
>>> network issue where it is taking a long time to connect to broker itself?
>>> 
>>> Any logging that I can enable at the streams side to get more complete
>>> stacktraces?
>>> 
>>> Note that issue occurs in bunches. Then everything works fine for a while
>>> then these exceptions come in bunch and then it works fine for sometime
>>> then again exceptions and so on.
>>> 
>>> Note that my version is kafka_2.10-0.10.0.1.
>>> 
>>> Thanks
>>> Sachin
>>> 
>> 



Re: Kafka Streams Failed to rebalance error

2017-06-09 Thread Eno Thereska
Even without a state store the tasks themselves will get rebalanced.

So you will definitely trigger the problem with steps 1-3 you describe, and 
that is confirmed. The reason we increased "max.poll.interval.ms" to basically 
infinite is precisely to avoid this problem.
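
(For completeness: on 0.10.2.x this is a plain consumer config that the streams 
Properties forward to the internal consumers, so an application can also set it 
explicitly. The value below mirrors what newer Streams releases use as the default; 
sketch only:)

  Properties props = new Properties();
  // effectively disable eviction from the group because of a slow poll loop
  props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, Integer.MAX_VALUE);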

Eno
> On 9 Jun 2017, at 07:40, João Peixoto <joao.harti...@gmail.com> wrote:
> 
> I am now able to consistently reproduce this issue with a dummy project.
> 
> 1. Set "max.poll.interval.ms" to a low value
> 2. Have the pipeline take longer than the interval above
> 3. Profit
> 
> This happens every single time and never recovers.
> I simulated the delay by adding a breakpoint on my IDE on a sink "foreach"
> step and then proceeding after the above interval had elapsed.
> 
> Any advice on how to work around this using 0.10.2.1 would be greatly
> appreciated.
> Hope it helps
> 
> On Wed, Jun 7, 2017 at 10:19 PM João Peixoto <joao.harti...@gmail.com>
> wrote:
> 
>> But my stream definition does not have a state store at all, Rocksdb or in
>> memory... That's the most concerning part...
>> On Wed, Jun 7, 2017 at 9:48 PM Sachin Mittal <sjmit...@gmail.com> wrote:
>> 
>>> One instance with 10 threads may cause rocksdb issues.
>>> What is the RAM you have?
>>> 
>>> Also check CPU wait time. Many rocks db instances on one machine (depends
>>> upon number of partitions) may cause lot of disk i/o causing wait times to
>>> increase and hence slowing down the message processing causing frequent
>>> rebalance's.
>>> 
>>> Also what is your topic partitions. My experience is having one thread per
>>> partition is ideal.
>>> 
>>> Thanks
>>> Sachin
>>> 
>>> 
>>> On Thu, Jun 8, 2017 at 9:58 AM, João Peixoto <joao.harti...@gmail.com>
>>> wrote:
>>> 
>>>> There is one instance with 10 threads.
>>>> 
>>>> On Wed, Jun 7, 2017 at 9:07 PM Guozhang Wang <wangg...@gmail.com>
>>> wrote:
>>>> 
>>>>> João,
>>>>> 
>>>>> Do you also have multiple running instances in parallel, and how many
>>>>> threads are your running within each instance?
>>>>> 
>>>>> Guozhang
>>>>> 
>>>>> 
>>>>> On Wed, Jun 7, 2017 at 3:18 PM, João Peixoto <joao.harti...@gmail.com
>>>> 
>>>>> wrote:
>>>>> 
>>>>>> Eno before I do so I just want to be sure this would not be a
>>>> duplicate.
>>>>> I
>>>>>> just found the following issues:
>>>>>> 
>>>>>> * https://issues.apache.org/jira/browse/KAFKA-5167. Marked as being
>>>>> fixed
>>>>>> on 0.11.0.0/0.10.2.2 (both not released afaik)
>>>>>> * https://issues.apache.org/jira/browse/KAFKA-5070. Currently in
>>>>> progress
>>>>>> 
>>>>>> On Wed, Jun 7, 2017 at 2:24 PM Eno Thereska <eno.there...@gmail.com
>>>> 
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi there,
>>>>>>> 
>>>>>>> This might be a bug, would you mind opening a JIRA (copy-pasting
>>>> below
>>>>> is
>>>>>>> sufficient).
>>>>>>> 
>>>>>>> Thanks
>>>>>>> Eno
>>>>>>>> On 7 Jun 2017, at 21:38, João Peixoto <joao.harti...@gmail.com>
>>>>> wrote:
>>>>>>>> 
>>>>>>>> I'm using Kafka Streams 0.10.2.1 and I still see this error
>>>>>>>> 
>>>>>>>> 2017-06-07 20:28:37.211  WARN 73 --- [ StreamThread-1]
>>>>>>>> o.a.k.s.p.internals.StreamThread : Could not create task
>>>>> 0_31.
>>>>>>> Will
>>>>>>>> retry.
>>>>>>>> 
>>>>>>>> org.apache.kafka.streams.errors.LockException: task [0_31]
>>> Failed
>>>> to
>>>>>> lock
>>>>>>>> the state directory for task 0_31
>>>>>>>> at
>>>>>>>> 
>>>>>>> org.apache.kafka.streams.processor.internals.
>>>>>> ProcessorStateManager.(ProcessorStateManager.java:100)
>>>>>>>> ~[kafka-streams-0.10.2.1.jar!/:na]
>>>>>&

Re: Kafka Streams Failed to rebalance error

2017-06-07 Thread Eno Thereska
Hi there,

This might be a bug, would you mind opening a JIRA (copy-pasting below is 
sufficient).

Thanks
Eno
> On 7 Jun 2017, at 21:38, João Peixoto  wrote:
> 
> I'm using Kafka Streams 0.10.2.1 and I still see this error
> 
> 2017-06-07 20:28:37.211  WARN 73 --- [ StreamThread-1]
> o.a.k.s.p.internals.StreamThread : Could not create task 0_31. Will
> retry.
> 
> org.apache.kafka.streams.errors.LockException: task [0_31] Failed to lock
> the state directory for task 0_31
> at
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.<init>(ProcessorStateManager.java:100)
> ~[kafka-streams-0.10.2.1.jar!/:na]
> at
> org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:73)
> ~[kafka-streams-0.10.2.1.jar!/:na]
> at
> org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:108)
> ~[kafka-streams-0.10.2.1.jar!/:na]
> at
> org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:864)
> [kafka-streams-0.10.2.1.jar!/:na]
> at
> org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:1237)
> ~[kafka-streams-0.10.2.1.jar!/:na]
> at
> org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:1210)
> ~[kafka-streams-0.10.2.1.jar!/:na]
> at
> org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:967)
> [kafka-streams-0.10.2.1.jar!/:na]
> at
> org.apache.kafka.streams.processor.internals.StreamThread.access$600(StreamThread.java:69)
> [kafka-streams-0.10.2.1.jar!/:na]
> at
> org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:234)
> [kafka-streams-0.10.2.1.jar!/:na]
> at
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:259)
> [kafka-clients-0.10.2.1.jar!/:na]
> at
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:352)
> [kafka-clients-0.10.2.1.jar!/:na]
> at
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:303)
> [kafka-clients-0.10.2.1.jar!/:na]
> at
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:290)
> [kafka-clients-0.10.2.1.jar!/:na]
> at
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1029)
> [kafka-clients-0.10.2.1.jar!/:na]
> at
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995)
> [kafka-clients-0.10.2.1.jar!/:na]
> at
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:592)
> [kafka-streams-0.10.2.1.jar!/:na]
> at
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:361)
> [kafka-streams-0.10.2.1.jar!/:na]
> 
> 
> It has been printing it for hours now, so it does not recover at all.
> The most worrying thing is that this stream definition does not even use
> state stores, it literally looks like this:
> 
> KStreamBuilder builder = new KStreamBuilder();
> KStream kStream = builder.stream(appOptions.getInput().getTopic());
> kStream.process(() -> processor);
> new KafkaStreams(builder, streamsConfiguration);
> 
> The "processor" does its thing and calls "context().commit()" when done.
> That's it. Looking at the actual machine running the instance, the folders
> under /tmp/kafka-streams// only have a .lock file.
> 
> This seems to have been bootstrapped by the exception:
> 
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be
> completed since the group has already rebalanced and assigned the
> partitions to another member. This means that the time between subsequent
> calls to poll() was longer than the configured max.poll.interval.ms, which
> typically implies that the poll loop is spending too much time message
> processing. You can address this either by increasing the session timeout
> or by reducing the maximum size of batches returned in poll() with
> max.poll.records.
> 
> We are addressing the latter by reducing "max.poll.records" and increasing "
> commit.interval.ms", nonetheless, shouldn't Kafka Streams not worry about
> state dirs if there are no state stores? Since it doesn't seem to do so
> automatically, can I configured it somehow to achieve this end?
> 
> Additionally, what could lead to it not being able to recover?
> 
> On Tue, May 16, 2017 at 3:17 PM Matthias J. Sax 
> wrote:
> 
>> Great! :)
>> 
>> On 5/16/17 2:31 AM, Sameer Kumar wrote:
>>> I see now that my Kafka cluster is very stable, and these errors dont
>> come
>>> now.
>>> 
>>> -Sameer.
>>> 
>>> On Fri, May 5, 2017 at 7:53 AM, Sameer Kumar 
>> wrote:
>>> 
 Yes, I have upgraded my cluster and client both to version 10.2.1 and
 currently monitoring the situation.
 Will report back in 

Re: Reliably implementing global KeyValueStore#get

2017-06-07 Thread Eno Thereska
Hi Steven,

You are right in principle. The thing is that what we shipped in Kafka is just 
the low level bare bones that in a sense belong to Kafka. A middle layer that 
keeps track of the data is absolutely needed, and it should hopefully hide the 
distributed system challenges from the end user. Now the question is what 
such a layer should look like. I think in all systems there are some basic assumptions 
made about frequency of failures and rebalances just to keep the number of 
retries sane. I agree with you that in principle a rebalance could be always 
happening though.

Eno

> On 6 Jun 2017, at 23:29, Steven Schlansker <sschlans...@opentable.com> wrote:
> 
> 
>> On Jun 6, 2017, at 2:52 PM, Damian Guy <damian@gmail.com> wrote:
>> 
>> Steven,
>> 
>> In practice, data shouldn't be migrating that often. If it is then you
>> probably have bigger problems.
> 
> Understood and agreed, but when designing distributed systems, it usually
> helps to model for the worst case rather than the "well that should never
> happen" case, lest you find yourself fixing those bugs at 3am instead :)
> 
> I'd like to be able to induce extreme pain at the Kafka layer (change leader
> every 3 seconds and migrate all partitions around randomly) and still have
> my app behave correctly.
> 
>> You should be able to use the metadata api
>> to find the instance the key should be on and then when you check that node
>> you can also check with the metadata api that the key should still be on
>> this host. If streams is rebalancing while you query an exception will be
>> raised and you'll need to retry the request once the rebalance has
>> completed.
> 
> Agreed here as well.  But let's assume I have a very fast replication
> setup (assume it takes zero time, for the sake of argument) -- I'm fairly
> sure there's still a race here as this exception only fires *during a 
> migration*
> not *after a migration that may have invalidated your metadata lookup 
> completes*
> 
>> 
>> HTH,
>> Damian
>> 
>> On Tue, 6 Jun 2017 at 18:11 Steven Schlansker <sschlans...@opentable.com>
>> wrote:
>> 
>>> 
>>>> On Jun 6, 2017, at 6:16 AM, Eno Thereska <eno.there...@gmail.com> wrote:
>>>> 
>>>> Hi Steven,
>>>> 
>>>> Do you know beforehand if a key exists? If you know that and are getting
>>> null() the code will have to retry by refreshing the metadata and going to
>>> the new instance. If you don’t know beforehand if a key exists or not you
>>> might have to check all instances of a store to make sure.
>>>> 
>>> 
>>> No, I am not presupposing that the key can exist -- this is a user visible
>>> API and will
>>> be prone to "accidents" :)
>>> 
>>> Thanks for the insight.  I worry that even checking all stores is not
>>> truly sufficient,
>>> as querying all workers at different times in the presence of
>>> migrating data
>>> can still in theory miss it given pessimal execution.
>>> 
>>> I'm sure I've long wandered off into the hypothetical, but I dream of some
>>> day being
>>> cool like Jepsen :)
>>> 
>>>> Eno
>>>> 
>>>> 
>>>>> On Jun 5, 2017, at 10:12 PM, Steven Schlansker <
>>> sschlans...@opentable.com> wrote:
>>>>> 
>>>>> Hi everyone, me again :)
>>>>> 
>>>>> I'm still trying to implement my "remoting" layer that allows
>>>>> my clients to see the partitioned Kafka Streams state
>>>>> regardless of which instance they hit.  Roughly, my lookup is:
>>>>> 
>>>>> Message get(Key key) {
>>>>>   RemoteInstance instance = selectPartition(key);
>>>>>   return instance.get(key); // http remoting
>>>>> }
>>>>> 
>>>>> RemoteInstance.get(Key key) { // http endpoint
>>>>>   return readOnlyKeyValueStore.get(key);
>>>>> }
>>>>> 
>>>>> However, the mapping of partitions to instances may change.
>>>>> If you call KeyValueStore.get(K) where K is on a partition you
>>>>> don't own, it returns null.  This is indistinguishable from a
>>>>> successful get on a key that doesn't exist.
>>>>> 
>>>>> If one instance selects a sibling instance right as the partition is
>>> failing
>>>>> off of that instance, it may get routed there and by the time it gets
>>>>> the request no longer "owns" the partition -- returns a false 'null'.
>>>>> 
>>>>> You can try re-checking after you get a null value, but that's
>>> susceptible
>>>>> to the same race -- it's unlikely but possible that the data migrates
>>> *back*
>>>>> before you do this re-check.
>>>>> 
>>>>> Is there any way to correctly implement this without races?  I'd imagine
>>>>> you need a new primitive like KeyValueStore#get that atomically finds
>>>>> the key or throws an exception if it is not in an owned partition
>>>>> at the time of lookup so you know to recheck the partition and retry.
>>>>> 
>>>>> Thoughts?
>>>>> 
>>>>> Thanks again,
>>>>> Steven
>>>>> 
>>>> 
>>> 
>>> 
> 



Re: [DISCUSS]: KIP-161: streams record processing exception handlers

2017-06-07 Thread Eno Thereska
Comments inline:

> On 5 Jun 2017, at 18:19, Jan Filipiak <jan.filip...@trivago.com> wrote:
> 
> Hi
> 
> just my few thoughts
> 
> On 05.06.2017 11:44, Eno Thereska wrote:
>> Hi there,
>> 
>> Sorry for the late reply, I was out this past week. Looks like good progress 
>> was made with the discussions either way. Let me recap a couple of points I 
>> saw into one big reply:
>> 
>> 1. Jan mentioned CRC errors. I think this is a good point. As these happen 
>> in Kafka, before Kafka Streams gets a chance to inspect anything, I'd like 
>> to hear the opinion of more Kafka folks like Ismael or Jason on this one. 
>> Currently the documentation is not great with what to do once a CRC check 
>> has failed. From looking at the code, it looks like the client gets a 
>> KafkaException (bubbled up from the fetcher) and currently we in streams 
>> catch this as part of poll() and fail. It might be advantageous to treat CRC 
>> handling in a similar way to serialisation handling (e.g., have the option 
>> to fail/skip). Let's see what the other folks say. Worst-case we can do a 
>> separate KIP for that if it proved too hard to do in one go.
> There is no reasonable way to "skip" a CRC error. How can you know the length 
> you read was anything reasonable? You might be completely lost inside your 
> response.

On the client side, every record received is checked for validity. As it 
happens, if the CRC check fails the exception is wrapped with a KafkaException 
that is thrown all the way to poll(). Assuming we change that and poll() throws 
a CRC exception, I was thinking we could treat it similarly to a deserialize 
exception and pass it to the exception handler to decide what to do. Default 
would be to fail. This might need a Kafka KIP btw and can be done separately 
from this KIP, but Jan, would you find this useful?

>> 
>> 
>> At a minimum, handling this type of exception will need to involve the 
>> exactly-once (EoS) logic. We'd still allow the option of failing or 
>> skipping, but EoS would need to clean up by rolling back all the side 
>> effects from the processing so far. Matthias, how does this sound?
> EoS will not help; the record might be 5-6 repartitions down into the 
> topology. I haven't followed closely, but I pray you made EoS optional! We don't need 
> this and we don't want this, and we will turn it off if it comes. So I 
> wouldn't recommend relying on it. The option to turn it off is better than 
> forcing it and still being unable to roll back bad pills (as explained before).
>> 

Yeah as Matthias mentioned EoS is optional.

Thanks,
Eno


>> 6. Will add an end-to-end example as Michael suggested.
>> 
>> Thanks
>> Eno
>> 
>> 
>> 
>>> On 4 Jun 2017, at 02:35, Matthias J. Sax <matth...@confluent.io> wrote:
>>> 
>>> What I don't understand is this:
>>> 
>>>> From there on its the easiest way forward: fix, redeploy, start => done
>>> If you have many producers that work fine and a new "bad" producer
>>> starts up and writes bad data into your input topic, your Streams app
>>> dies but all your producers, including the bad one, keep writing.
>>> 
>>> Thus, how would you fix this, as you cannot "remove" the corrupted date
>>> from the topic? It might take some time to identify the root cause and
>>> stop the bad producer. Up to this point you get good and bad data into
>>> your Streams input topic. If the Streams app is not able to skip over those
>>> bad records, how would you get all the good data from the topic? Not
>>> saying it's not possible, but it's extra work copying the data with a
>>> new non-Streams consumer-producer app into a new topic and then feeding
>>> your Streams app from this new topic -- you also need to update all your
>>> upstream producers to write to the new topic.
>>> 
>>> Thus, if you want to fail fast, you can still do this. And after you
>>> detected and fixed the bad producer you might just reconfigure your app
>>> to skip bad records until it reaches the good part of the data.
>>> Afterwards, you could redeploy with fail-fast again.
>>> 
>>> 
>>> Thus, for this pattern, I actually don't see any reason why to stop the
>>> Streams app at all. If you have a callback, and use the callback to
>>> raise an alert (and maybe get the bad data into a bad record queue), it
>>> will not take longer to identify and stop the "bad" producer. But for
>>> this case, you have zero downtime for your Streams app.
>>> 
>>> This seems to be

Re: Reliably implementing global KeyValueStore#get

2017-06-06 Thread Eno Thereska
Hi Steven,

Do you know beforehand if a key exists? If you know that and are getting null() 
the code will have to retry by refreshing the metadata and going to the new 
instance. If you don’t know beforehand if a key exists or not you might have to 
check all instances of a store to make sure. 
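
(A rough sketch of the lookup-with-retry pattern, using the metadata API available 
since 0.10.1. The store name, keySerializer, thisHost and the remoteGet() HTTP call 
are placeholders from Steven's description, and as discussed in this thread the retry 
narrows the race but does not fully remove it:)

  Message get(Key key) {
      while (true) {
          // ask Streams which instance currently hosts this key
          StreamsMetadata meta = streams.metadataForKey("message-store", key, keySerializer);
          try {
              if (meta.hostInfo().equals(thisHost)) {
                  ReadOnlyKeyValueStore<Key, Message> store =
                      streams.store("message-store", QueryableStoreTypes.keyValueStore());
                  return store.get(key);   // may be a genuine miss, or the key may have just migrated
              }
              return remoteGet(meta.host(), meta.port(), key);   // http remoting
          } catch (InvalidStateStoreException rebalanceInProgress) {
              // a rebalance is in flight: back off, let the metadata refresh, then retry
          }
      }
  }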

Eno


> On Jun 5, 2017, at 10:12 PM, Steven Schlansker  
> wrote:
> 
> Hi everyone, me again :)
> 
> I'm still trying to implement my "remoting" layer that allows
> my clients to see the partitioned Kafka Streams state
> regardless of which instance they hit.  Roughly, my lookup is:
> 
> Message get(Key key) {
>RemoteInstance instance = selectPartition(key);
>return instance.get(key); // http remoting
> }
> 
> RemoteInstance.get(Key key) { // http endpoint
>return readOnlyKeyValueStore.get(key);
> }
> 
> However, the mapping of partitions to instances may change.
> If you call KeyValueStore.get(K) where K is on a partition you
> don't own, it returns null.  This is indistinguishable from a
> successful get on a key that doesn't exist.
> 
> If one instance selects a sibling instance right as the partition is failing
> off of that instance, it may get routed there and by the time it gets
> the request no longer "owns" the partition -- returns a false 'null'.
> 
> You can try re-checking after you get a null value, but that's susceptible
> to the same race -- it's unlikely but possible that the data migrates *back*
> before you do this re-check.
> 
> Is there any way to correctly implement this without races?  I'd imagine
> you need a new primitive like KeyValueStore#get that atomically finds
> the key or throws an exception if it is not in an owned partition
> at the time of lookup so you know to recheck the partition and retry.
> 
> Thoughts?
> 
> Thanks again,
> Steven
> 



Re: [DISCUSS]: KIP-161: streams record processing exception handlers

2017-06-05 Thread Eno Thereska
Hi there,

Sorry for the late reply, I was out this past week. Looks like good progress 
was made with the discussions either way. Let me recap a couple of points I saw 
into one big reply:

1. Jan mentioned CRC errors. I think this is a good point. As these happen in 
Kafka, before Kafka Streams gets a chance to inspect anything, I'd like to hear 
the opinion of more Kafka folks like Ismael or Jason on this one. Currently the 
documentation is not great with what to do once a CRC check has failed. From 
looking at the code, it looks like the client gets a KafkaException (bubbled up 
from the fetcher) and currently we in streams catch this as part of poll() and 
fail. It might be advantageous to treat CRC handling in a similar way to 
serialisation handling (e.g., have the option to fail/skip). Let's see what the 
other folks say. Worst-case we can do a separate KIP for that if it proved too 
hard to do in one go.

2. Damian has convinced me that the KIP should just be for deserialisation from 
the network, not from local state store DBs. For the latter we'll follow the 
current way of failing since the DB is likely corrupt.

3. Dead letter queue option. There was never any intention here to do anything 
super clever like attempt to re-inject the failed records from the dead letter 
queue back into the system. Reasoning about when that'd be useful in light of 
all sorts of semantic breakage would be hard (arguably impossible). The idea 
was to just have a place to keep all these dead records to help with subsequent 
debugging. We could also just log a whole bunch of info for a poison pill 
record and not have a dead letter queue at all. Perhaps that's a better, 
simpler, starting point. 

4. Agree with Jay on style, a DefaultHandler with some config options. Will add 
options to KIP. Also as part of this let's remove the threshold logger since it 
gets complex and arguably the ROI is low. 
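
As an illustration only (the interface and enum names below are the ones from the 
current KIP draft and may well change), a log-and-continue default handler could look 
roughly like this; a fail-fast variant would simply return FAIL:

  public class LogAndContinueExceptionHandler implements RecordExceptionHandler {
      @Override
      public HandlerResponse handle(ConsumerRecord<byte[], byte[]> record, Exception exception) {
          // keep enough context to find and debug the poison pill later
          System.err.printf("Skipping bad record %s-%d@%d: %s%n",
                  record.topic(), record.partition(), record.offset(), exception);
          return HandlerResponse.CONTINUE;
      }
  }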

5. Jay's JSON example, where serialisation passes but the JSON message doesn't 
have the expected fields, is an interesting one. It's a bit complicated to 
handle this in the middle of processing. For example, some operators in the DAG 
might actually find the needed JSON fields and make progress, but other 
operators, for the same record, might not find their fields and will throw an 
exception.

At a minimum, handling this type of exception will need to involve the 
exactly-once (EoS) logic. We'd still allow the option of failing or skipping, 
but EoS would need to clean up by rolling back all the side effects from the 
processing so far. Matthias, how does this sound?

6. Will add an end-to-end example as Michael suggested.

Thanks
Eno



> On 4 Jun 2017, at 02:35, Matthias J. Sax  wrote:
> 
> What I don't understand is this:
> 
>> From there on its the easiest way forward: fix, redeploy, start => done 
> 
> If you have many producers that work fine and a new "bad" producer
> starts up and writes bad data into your input topic, your Streams app
> dies but all your producers, including the bad one, keep writing.
> 
> Thus, how would you fix this, as you cannot "remove" the corrupted date
> from the topic? It might take some time to identify the root cause and
> stop the bad producer. Up to this point you get good and bad data into
> your Streams input topic. If the Streams app is not able to skip over those
> bad records, how would you get all the good data from the topic? Not
> saying it's not possible, but it's extra work copying the data with a
> new non-Streams consumer-producer app into a new topic and then feeding
> your Streams app from this new topic -- you also need to update all your
> upstream producers to write to the new topic.
> 
> Thus, if you want to fail fast, you can still do this. And after you
> detected and fixed the bad producer you might just reconfigure your app
> to skip bad records until it reaches the good part of the data.
> Afterwards, you could redeploy with fail-fast again.
> 
> 
> Thus, for this pattern, I actually don't see any reason why to stop the
> Streams app at all. If you have a callback, and use the callback to
> raise an alert (and maybe get the bad data into a bad record queue), it
> will not take longer to identify and stop the "bad" producer. But for
> this case, you have zero downtime for your Streams app.
> 
> This seems to be much simpler. Or do I miss anything?
> 
> 
> Having said this, I agree that the "threshold based callback" might be
> questionable. But as you argue for strict "fail-fast", I want to argue
> that this must not always be the best pattern to apply and that the
> overall KIP idea is super useful from my point of view.
> 
> 
> -Matthias
> 
> 
> On 6/3/17 11:57 AM, Jan Filipiak wrote:
>> Could not agree more!
>> 
>> But then I think the easiest is still: print exception and die.
>> From there on its the easiest way forward: fix, redeploy, start => done
>> 
>> All the other ways to recover a pipeline that was processing partially
>> all 

Re: [DISCUSS]: KIP-161: streams record processing exception handlers

2017-05-26 Thread Eno Thereska
Hi Damian,

I was thinking of cases when there is bit-rot on the storage itself and we get 
a malformed record that cannot be de-serialized. There is an interesting 
intersection here with CRCs in both Kafka (already there, they throw on 
deserialization) and potentially local storage (we don't have CRCs here on the 
data files, though RocksDB has them on its write-ahead log records). 

Basically in a nutshell, I'm saying that every deserialization exception should 
go through this new path. The user can decide to fail or continue. We could 
start with just poison pills from Kafka though and punt the storage one to 
later. 

Eno

> On 26 May 2017, at 16:59, Damian Guy <damian@gmail.com> wrote:
> 
> Eno,
> 
> Under what circumstances would you get a deserialization exception from the
> state store? I can only think of the case where someone has provided a bad
> deserializer to a method that creates a state store. In which case it would
> be a user error and probably should just abort?
> 
> Thanks,
> Damian
> 
> On Fri, 26 May 2017 at 16:32 Eno Thereska <eno.there...@gmail.com> wrote:
> 
>> See latest reply to Jan's note. I think I unnecessarily broadened the
>> scope of this KIP to the point where it sounded like it handles all sorts
>> of exceptions. The scope should be strictly limited to "poison pill"
>> records for now. Will update KIP,
>> 
>> Thanks
>> Eno
>>> On 26 May 2017, at 16:16, Matthias J. Sax <matth...@confluent.io> wrote:
>>> 
>>> "bad" for this case would mean, that we got an
>>> `DeserializationException`. I am not sure if any other processing error
>>> should be covered?
>>> 
>>> @Eno: this raises one one question. Might it be better to allow for two
>>> handlers instead of one? One for deserialization exception and one for
>>> all other exceptions from user code?
>>> 
>>> Just a thought.
>>> 
>>> 
>>> -Matthias
>>> 
>>> On 5/26/17 7:49 AM, Jim Jagielski wrote:
>>>> 
>>>>> On May 26, 2017, at 5:13 AM, Eno Thereska <eno.there...@gmail.com>
>> wrote:
>>>>> 
>>>>> 
>>>>>> 
>>>>>> 
>>>>>> With regard to `DeserializationException`, do you thing it might make
>>>>>> sense to have a "dead letter queue" as a feature to provide
>> out-of-the-box?
>>>>> 
>>>>> We could provide a special topic where bad messages go to, and then
>> we'd have to add a config option for the user to provide a topic. Is that
>> what you're thinking?
>>>>> 
>>>> 
>>>> For various definitions of "bad"??
>>>> 
>>> 
>> 
>> 



Re: [DISCUSS]: KIP-161: streams record processing exception handlers

2017-05-26 Thread Eno Thereska
See latest reply to Jan's note. I think I unnecessarily broadened the scope of 
this KIP to the point where it sounded like it handles all sorts of exceptions. 
The scope should be strictly limited to "poison pill" records for now. Will 
update KIP, 

Thanks
Eno
> On 26 May 2017, at 16:16, Matthias J. Sax <matth...@confluent.io> wrote:
> 
> "bad" for this case would mean, that we got an
> `DeserializationException`. I am not sure if any other processing error
> should be covered?
> 
> @Eno: this raises one one question. Might it be better to allow for two
> handlers instead of one? One for deserialization exception and one for
> all other exceptions from user code?
> 
> Just a thought.
> 
> 
> -Matthias
> 
> On 5/26/17 7:49 AM, Jim Jagielski wrote:
>> 
>>> On May 26, 2017, at 5:13 AM, Eno Thereska <eno.there...@gmail.com> wrote:
>>> 
>>> 
>>>> 
>>>> 
>>>> With regard to `DeserializationException`, do you thing it might make
>>>> sense to have a "dead letter queue" as a feature to provide out-of-the-box?
>>> 
>>> We could provide a special topic where bad messages go to, and then we'd 
>>> have to add a config option for the user to provide a topic. Is that what 
>>> you're thinking?
>>> 
>> 
>> For various definitions of "bad"??
>> 
> 



Re: [DISCUSS]: KIP-161: streams record processing exception handlers

2017-05-26 Thread Eno Thereska
Hi Jan,

You're right. I think I got carried away and broadened the scope of this KIP 
beyond its original purpose. This handler will only be there for 
deserialization errors, i.e., "poison pills", and is not intended to be a 
catch-all handler for all sorts of other problems (e.g., an NPE in user 
code). Deserialization errors can happen either when polling or when 
deserialising from a state store. So that narrows down the scope of the KIP, 
will update it.

Thanks
Eno

> On 26 May 2017, at 11:31, Jan Filipiak <jan.filip...@trivago.com> wrote:
> 
> Hi
> 
> Unfortunately no. Think about "caching": these records popping out of there, or 
> multiple-step tasks (join, aggregate, repartition all in one go), where the last 
> repartitioner might throw because it can't determine the partition, only because 
> a get on the join store caused a flush through the aggregates. This has 
> nothing to do with a ConsumerRecord at all. Especially not the one we most 
> recently processed.
> 
> To be completely honest, anything but grinding to a halt is not appealing to me at 
> all. Sure, maybe lag monitoring will call me on Sunday, but I can at least be 
> confident it's working the rest of the time.
> 
> Best Jan
> 
> PS.:
> 
> Hope you get my point. I am mostly complaining about
> 
> public interface RecordExceptionHandler {
>     /**
>      * Inspect a record and the exception received
>      */
>     HandlerResponse handle(that guy here >>>>>>   ConsumerRecord<byte[], byte[]> record, Exception exception);
> }
> 
> public enum HandlerResponse {
>     /* continue with processing */
>     CONTINUE(1),
>     /* fail the processing and stop */
>     FAIL(2);
> }
> 
> 
> 
> On 26.05.2017 11:18, Eno Thereska wrote:
>> Thanks Jan,
>> 
>> The record passed to the handler will always be the problematic record. 
>> There are 2 cases/types of exceptions for the purposes of this KIP: 1) any 
>> exception during deserialization. The bad record + the exception (i.e. 
>> DeserializeException) will be passed to the handler. The handler will be 
>> able to tell this was a deserialization error.
>> 2) any exception during processing of this record. So whenever a processor 
>> gets the record (after some caching, etc) it starts to process it, then it 
>> fails, then it will call the handler with this record.
>> 
>> Does that match your thinking?
>> 
>> Thanks,
>> Eno
>> 
>> 
>>> On 26 May 2017, at 09:51, Jan Filipiak <jan.filip...@trivago.com> wrote:
>>> 
>>> Hi,
>>> 
>>> quick question: From the KIP it doesn't quite make sense to me how that 
>>> fits with caching.
>>> With caching the consumer record might not be at all related to some 
>>> processor throwing while processing.
>>> 
>>> would it not make more sense to get the ProcessorName + object object for 
>>> processing and
>>> statestore or topic name + byte[] byte[]  for serializers? maybe passing in 
>>> the used serdes?
>>> 
>>> Best Jan
>>> 
>>> 
>>> 
>>> On 25.05.2017 11:47, Eno Thereska wrote:
>>>> Hi there,
>>>> 
>>>> I’ve added a KIP on improving exception handling in streams:
>>>> KIP-161: streams record processing exception handlers. 
>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+record+processing+exception+handlers
>>>> 
>>>> Discussion and feedback is welcome, thank you.
>>>> Eno
> 



Re: [DISCUSS]: KIP-161: streams record processing exception handlers

2017-05-26 Thread Eno Thereska
Thanks Jan,

The record passed to the handler will always be the problematic record. There 
are 2 cases/types of exceptions for the purposes of this KIP: 1) any exception 
during deserialization. The bad record + the exception (i.e. 
DeserializeException) will be passed to the handler. The handler will be able 
to tell this was a deserialization error. 
2) any exception during processing of this record. So whenever a processor gets 
the record (after some caching, etc) it starts to process it, then it fails, 
then it will call the handler with this record.

Does that match your thinking?

Thanks,
Eno


> On 26 May 2017, at 09:51, Jan Filipiak <jan.filip...@trivago.com> wrote:
> 
> Hi,
> 
> quick question: From the KIP it doesn't quite make sense to me how that fits 
> with caching.
> With caching the consumer record might not be at all related to some 
> processor throwing while processing.
> 
> would it not make more sense to get the ProcessorName + object object for 
> processing and
> statestore or topic name + byte[] byte[]  for serializers? maybe passing in 
> the used serdes?
> 
> Best Jan
> 
> 
> 
> On 25.05.2017 11:47, Eno Thereska wrote:
>> Hi there,
>> 
>> I’ve added a KIP on improving exception handling in streams:
>> KIP-161: streams record processing exception handlers. 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+record+processing+exception+handlers
>> 
>> Discussion and feedback is welcome, thank you.
>> Eno
> 



Re: Streams error handling

2017-05-25 Thread Eno Thereska
Hi Mike, 

Just a heads up, we’ve started the feedback process on this in the DISCUSS 
thread for KIP-161. Feel free to read that thread and the KIP and comment.

Thanks
Eno
> On May 24, 2017, at 3:35 PM, Mike Gould <mikeyg...@gmail.com> wrote:
> 
> Watching it with interest thanks
> 
> Not sure where appropriate to add suggestions but I'd vote for exceptions
> being passed along the stream in something like a hidden Either wrapper.
> Most of the KStream methods would ignore this but new onException() or
> similar methods would be able to examine the error with the key/value prior
> to the error and handle it - possibly by replacing the message, sending a
> message to a new stream, or even putting it back on the original stream for
> retry.
> 
> Regards
> MikeG
> 
> On Wed, 24 May 2017 at 10:09, Eno Thereska <eno.there...@gmail.com> wrote:
> 
>> Just a heads up that we're tracking this and other improvements in
>> exception handling at https://issues.apache.org/jira/browse/KAFKA-5156.
>> 
>> Thanks
>> Eno
>>> On 23 May 2017, at 17:31, Mike Gould <mikeyg...@gmail.com> wrote:
>>> 
>>> That's great for the value but not the key
>>> 
>>> On Thu, 13 Apr 2017 at 18:27, Sachin Mittal <sjmit...@gmail.com> wrote:
>>> 
>>>> We are also catching the exception in serde and returning null and then
>>>> filtering out null values downstream so as they are not included.
>>>> 
>>>> Thanks
>>>> Sachin
>>>> 
>>>> 
>>>> On Thu, Apr 13, 2017 at 9:13 PM, Mike Gould <mikeyg...@gmail.com>
>> wrote:
>>>> 
>>>>> Great to know I've not gone off in the wrong direction
>>>>> Thanks
>>>>> 
>>>>> On Thu, 13 Apr 2017 at 16:34, Matthias J. Sax <matth...@confluent.io>
>>>>> wrote:
>>>>> 
>>>>>> Mike,
>>>>>> 
>>>>>> thanks for your feedback. You are absolutely right that Streams API
>>>> does
>>>>>> not have great support for this atm. And it's very valuable that you
>>>>>> report this (you are not the first person). It helps us prioritizing
>> :)
>>>>>> 
>>>>>> For now, there is no better solution as the one you described in your
>>>>>> email, but its on our roadmap to improve the API -- and its priority
>>>> got
>>>>>> just increase by your request.
>>>>>> 
>>>>>> I am sorry, that I can't give you a better answer right now :(
>>>>>> 
>>>>>> 
>>>>>> -Matthias
>>>>>> 
>>>>>> 
>>>>>> On 4/13/17 8:16 AM, Mike Gould wrote:
>>>>>>> Hi
>>>>>>> Are there any better error handling options for Kafka streams in
>>>> java.
>>>>>>> 
>>>>>>> Any errors in the serdes will break the stream.  The suggested
>>>>>>> implementation is to use the byte[] serde and do the deserialisation
>>>>> in a
>>>>>>> map operation.  However this isn't ideal either as there's no great
>>>> way
>>>>>> to
>>>>>>> handle exceptions.
>>>>>>> My current tactics are to use flatMap in place of map everywhere and
>>>>>> return
>>>>>>> empySet on error. Unfortunately this means the error has to be
>>>> handled
>>>>>>> directly in the function where it happened and can only be handled
>>>> as a
>>>>>>> side effect.
>>>>>>> 
>>>>>>> It seems to me that this could be done better. Maybe the *Mapper
>>>>>> interfaces
>>>>>>> could allow specific checked exceptions. These could be handled by
>>>>>> specific
>>>>>>> downstream KStream.mapException() steps which might e.g. Put an error
>>>>>>> response on another stream branch.
>>>>>>> Alternatively could it be made easier to return something like an
>>>>> Either
>>>>>>> from the Mappers with a the addition of few extra mapError or mapLeft
>>>>>>> mapRight methods on KStream?
>>>>>>> 
>>>>>>> Unless there's a better error handling pattern which I've entirely
>>>>>> missed?
>>>>>>> 
>>>>>>> Thanks
>>>>>>> MIkeG
>>>>>>> 
>>>>>> 
>>>>>> --
>>>>> - MikeG
>>>>> http://en.wikipedia.org/wiki/Common_misconceptions
>>>>> <http://en.wikipedia.org/wiki/Special:Random>
>>>>> 
>>>> 
>>> --
>>> - MikeG
>>> http://en.wikipedia.org/wiki/Common_misconceptions
>>> <http://en.wikipedia.org/wiki/Special:Random>
>> 
>> --
> - MikeG
> http://en.wikipedia.org/wiki/Common_misconceptions
> <http://en.wikipedia.org/wiki/Special:Random>



[DISCUSS]: KIP-161: streams record processing exception handlers

2017-05-25 Thread Eno Thereska
Hi there,

I’ve added a KIP on improving exception handling in streams:
KIP-161: streams record processing exception handlers. 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+record+processing+exception+handlers
 


Discussion and feedback is welcome, thank you.
Eno

Re: Streams error handling

2017-05-24 Thread Eno Thereska
Thanks Mike,

Most of the JIRAs there are internal cleanups, but for the user-facing ones 
we're planning on creating a wiki and collecting feedback like yours, and a 
KIP, so stay tuned (your current feedback already noted, thanks).

Eno
> On 24 May 2017, at 15:35, Mike Gould <mikeyg...@gmail.com> wrote:
> 
> Watching it with interest thanks
> 
> Not sure where appropriate to add suggestions but I'd vote for exceptions
> being passed along the stream in something like a hidden Either wrapper.
> Most of the KStream methods would ignore this but new onException() or
> similar methods would be able to examine the error with the key/value prior
> to the error and handle it - possibly by replacing the message, sending a
> message to a new stream, or even putting it back on the original stream for
> retry.
> 
> Regards
> MikeG
> 
> On Wed, 24 May 2017 at 10:09, Eno Thereska <eno.there...@gmail.com> wrote:
> 
>> Just a heads up that we're tracking this and other improvements in
>> exception handling at https://issues.apache.org/jira/browse/KAFKA-5156.
>> 
>> Thanks
>> Eno
>>> On 23 May 2017, at 17:31, Mike Gould <mikeyg...@gmail.com> wrote:
>>> 
>>> That's great for the value but not the key
>>> 
>>> On Thu, 13 Apr 2017 at 18:27, Sachin Mittal <sjmit...@gmail.com> wrote:
>>> 
>>>> We are also catching the exception in serde and returning null and then
>>>> filtering out null values downstream so as they are not included.
>>>> 
>>>> Thanks
>>>> Sachin
>>>> 
>>>> 
>>>> On Thu, Apr 13, 2017 at 9:13 PM, Mike Gould <mikeyg...@gmail.com>
>> wrote:
>>>> 
>>>>> Great to know I've not gone off in the wrong direction
>>>>> Thanks
>>>>> 
>>>>> On Thu, 13 Apr 2017 at 16:34, Matthias J. Sax <matth...@confluent.io>
>>>>> wrote:
>>>>> 
>>>>>> Mike,
>>>>>> 
>>>>>> thanks for your feedback. You are absolutely right that Streams API
>>>> does
>>>>>> not have great support for this atm. And it's very valuable that you
>>>>>> report this (you are not the first person). It helps us prioritizing
>> :)
>>>>>> 
>>>>>> For now, there is no better solution as the one you described in your
>>>>>> email, but its on our roadmap to improve the API -- and its priority
>>>> got
>>>>>> just increase by your request.
>>>>>> 
>>>>>> I am sorry, that I can't give you a better answer right now :(
>>>>>> 
>>>>>> 
>>>>>> -Matthias
>>>>>> 
>>>>>> 
>>>>>> On 4/13/17 8:16 AM, Mike Gould wrote:
>>>>>>> Hi
>>>>>>> Are there any better error handling options for Kafka streams in
>>>> java.
>>>>>>> 
>>>>>>> Any errors in the serdes will break the stream.  The suggested
>>>>>>> implementation is to use the byte[] serde and do the deserialisation
>>>>> in a
>>>>>>> map operation.  However this isn't ideal either as there's no great
>>>> way
>>>>>> to
>>>>>>> handle exceptions.
>>>>>>> My current tactics are to use flatMap in place of map everywhere and
>>>>>> return
>>>>>>> empySet on error. Unfortunately this means the error has to be
>>>> handled
>>>>>>> directly in the function where it happened and can only be handled
>>>> as a
>>>>>>> side effect.
>>>>>>> 
>>>>>>> It seems to me that this could be done better. Maybe the *Mapper
>>>>>> interfaces
>>>>>>> could allow specific checked exceptions. These could be handled by
>>>>>> specific
>>>>>>> downstream KStream.mapException() steps which might e.g. Put an error
>>>>>>> response on another stream branch.
>>>>>>> Alternatively could it be made easier to return something like an
>>>>> Either
>>>>>>> from the Mappers with a the addition of few extra mapError or mapLeft
>>>>>>> mapRight methods on KStream?
>>>>>>> 
>>>>>>> Unless there's a better error handling pattern which I've entirely
>>>>>> missed?
>>>>>>> 
>>>>>>> Thanks
>>>>>>> MIkeG
>>>>>>> 
>>>>>> 
>>>>>> --
>>>>> - MikeG
>>>>> http://en.wikipedia.org/wiki/Common_misconceptions
>>>>> <http://en.wikipedia.org/wiki/Special:Random>
>>>>> 
>>>> 
>>> --
>>> - MikeG
>>> http://en.wikipedia.org/wiki/Common_misconceptions
>>> <http://en.wikipedia.org/wiki/Special:Random>
>> 
>> --
> - MikeG
> http://en.wikipedia.org/wiki/Common_misconceptions
> <http://en.wikipedia.org/wiki/Special:Random>



Re: Streams error handling

2017-05-24 Thread Eno Thereska
Just a heads up that we're tracking this and other improvements in exception 
handling at https://issues.apache.org/jira/browse/KAFKA-5156.

Thanks
Eno
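
(For reference, a minimal sketch of the serde-level workaround Sachin and Mike 
describe further down in this thread: swallow the deserialization error, return null, 
and filter the nulls out downstream. The wrapped StringDeserializer, the value type 
and the "stream" variable are only illustrative; in practice the wrapper would go 
around whatever serde actually fails, e.g. a JSON deserializer.)

  public class SkippingDeserializer implements Deserializer<String> {
      private final StringDeserializer inner = new StringDeserializer();
      public void configure(Map<String, ?> configs, boolean isKey) { inner.configure(configs, isKey); }
      public String deserialize(String topic, byte[] data) {
          try {
              return inner.deserialize(topic, data);
          } catch (Exception poisonPill) {
              return null;   // bad record: turn it into a null and drop it below
          }
      }
      public void close() { inner.close(); }
  }

  // downstream, drop the records that failed to deserialize
  stream.filter((key, value) -> value != null);
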
> On 23 May 2017, at 17:31, Mike Gould  wrote:
> 
> That's great for the value but not the key
> 
> On Thu, 13 Apr 2017 at 18:27, Sachin Mittal  wrote:
> 
>> We are also catching the exception in serde and returning null and then
>> filtering out null values downstream so as they are not included.
>> 
>> Thanks
>> Sachin
>> 
>> 
>> On Thu, Apr 13, 2017 at 9:13 PM, Mike Gould  wrote:
>> 
>>> Great to know I've not gone off in the wrong direction
>>> Thanks
>>> 
>>> On Thu, 13 Apr 2017 at 16:34, Matthias J. Sax 
>>> wrote:
>>> 
 Mike,
 
 thanks for your feedback. You are absolutely right that Streams API
>> does
 not have great support for this atm. And it's very valuable that you
 report this (you are not the first person). It helps us prioritizing :)
 
 For now, there is no better solution as the one you described in your
 email, but its on our roadmap to improve the API -- and its priority
>> got
 just increase by your request.
 
 I am sorry, that I can't give you a better answer right now :(
 
 
 -Matthias
 
 
 On 4/13/17 8:16 AM, Mike Gould wrote:
> Hi
> Are there any better error handling options for Kafka streams in
>> java.
> 
> Any errors in the serdes will break the stream.  The suggested
> implementation is to use the byte[] serde and do the deserialisation
>>> in a
> map operation.  However this isn't ideal either as there's no great
>> way
 to
> handle exceptions.
> My current tactics are to use flatMap in place of map everywhere and
 return
> empySet on error. Unfortunately this means the error has to be
>> handled
> directly in the function where it happened and can only be handled
>> as a
> side effect.
> 
> It seems to me that this could be done better. Maybe the *Mapper
 interfaces
> could allow specific checked exceptions. These could be handled by
 specific
> downstream KStream.mapException() steps which might e.g. Put an error
> response on another stream branch.
> Alternatively could it be made easier to return something like an
>>> Either
> from the Mappers with a the addition of few extra mapError or mapLeft
> mapRight methods on KStream?
> 
> Unless there's a better error handling pattern which I've entirely
 missed?
> 
> Thanks
> MIkeG
> 
 
 --
>>> - MikeG
>>> http://en.wikipedia.org/wiki/Common_misconceptions
>>> 
>>> 
>> 
> -- 
> - MikeG
> http://en.wikipedia.org/wiki/Common_misconceptions
> 



Re: Kafka Streams RocksDB permanent StateStoreSupplier seemingly never deletes data

2017-05-18 Thread Eno Thereska
Hi Vincent,

Could you share your code, the part where you write to the state store and then 
delete. I'm wondering if you have iterators in your code that need to be 
closed().
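
(Context for readers: range/all iterators over the state stores hold the underlying 
RocksDB iterator open, which can keep obsolete SST files alive until the iterator is 
closed. KeyValueIterator implements Closeable, so try-with-resources works; store and 
type names below are purely illustrative:)

  try (KeyValueIterator<String, Long> iter = store.all()) {
      while (iter.hasNext()) {
          KeyValue<String, Long> entry = iter.next();
          // ... use entry.key and entry.value ...
      }
  }   // closing the iterator releases the RocksDB resources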

Eno
> On 16 May 2017, at 16:22, Vincent Bernardi <vinc...@kameleoon.com> wrote:
> 
> Hi Eno,
> Thanks for your answer. I tried sending a followup email when I realised I
> forgot to tell you the version number but it must have fallen through.
> I'm using 0.10.1.1 both for Kafka and for the streams library.
> Currently my application works on 4 partitions and only uses about 100% of
> one core, so I don't see how it could be CPU starved. Still I will of
> course try your suggestion.
> 
> Thanks again,
> V.
> 
> 
> On Tue, May 16, 2017 at 5:15 PM, Eno Thereska <eno.there...@gmail.com>
> wrote:
> 
>> Which version of Kafka are you using? It might be that RocksDb doesn't get
>> enough resources to compact the data fast enough. If that's the case you
>> can try increasing the number of background compaction threads for RocksDb
>> through the RocksDBConfigSetter class (see 
>> http://docs.confluent.io/current/streams/developer-guide.html#streams-developer-guide-rocksdb-config) 
>> by calling "options.setIncreaseParallelism(/* number of threads for compaction, e.g., 5 */)"
>> 
>> Eno
>> 
>>> On 16 May 2017, at 14:58, Vincent Bernardi <vinc...@kameleoon.com>
>> wrote:
>>> 
>>> Hi,
>>> I'm running an experimental Kafka Stream Processor which accumulates lots
>>> of data in a StateStoreSupplier during transform() and forwards lots of
>>> data during punctuate (and deletes it from the StateStoreSupplier). I'm
>>> currently using a persistent StateStore, meaning that Kafka Streams
>>> provides me with a RocksDB instance which writes everything on disk. The
>>> average amount of data that I keep in my StateStore at any time is at
>> most
>>> 1GB.
>>> 
>>> My problem is that it seems that this data is never really deleted, as if
>>> no compaction never happened: the directory size for my RocksDB instance
>>> goes ever up and eventually uses up all disk space at which point my
>>> application crashes (I've seen it go up to 60GB before I stopped it).
>>> 
>>> Does anyone know if this can be a normal behaviour for RocksDB? Is there
>>> any way that I can manually log or trigger RocksDB compactions to see if
>>> that is my problem?
>>> 
>>> Thanks in advance for any pointer,
>>> V.
>> 
>> 



Re: Kafka Streams RocksDB permanent StateStoreSupplier seemingly never deletes data

2017-05-16 Thread Eno Thereska
0.10.2.1 is compatible with Kafka 0.10.1.

Eno
> On 16 May 2017, at 20:45, Vincent Bernardi <vinc...@kameleoon.com> wrote:
> 
> The LOG files stay small. The SST files are growing but not in size, in
> numbers. Old .sst files seem never written to anymore but are not deleted
> and new ones appear regularly.
> I can certainly try streams 0.10.2.1 if it's compatible with Kafka 0.10.1.
> I have not checked the compatibility matrix yet.
> 
> Thanks for the help,
> V.
> 
> On Tue, 16 May 2017 at 17:57, Eno Thereska <eno.there...@gmail.com> wrote:
> 
>> Thanks. Which RocksDb files are growing indefinitely, the LOG or SST ones?
>> Also, any chance you could use the latest streams library 0.10.2.1 to
>> check if problem still exists?
>> 
>> 
>> Eno
>> 
>>> On 16 May 2017, at 16:43, Vincent Bernardi <vinc...@kameleoon.com>
>> wrote:
>>> 
>>> Just tried setting compaction threads to 5, but I have the exact same
>>> problem: the rocksdb files get bigger and bigger, while my application
>>> never stores more than 200k K/V pairs.
>>> 
>>> V.
>>> 
>>> On Tue, May 16, 2017 at 5:22 PM, Vincent Bernardi <vinc...@kameleoon.com
>>> 
>>> wrote:
>>> 
>>>> Hi Eno,
>>>> Thanks for your answer. I tried sending a followup email when I
>> realised I
>>>> forgot to tell you the version number but it must have fallen through.
>>>> I'm using 0.10.1.1 both for Kafka and for the streams library.
>>>> Currently my application works on 4 partitions and only uses about 100%
>> of
>>>> one core, so I don't see how it could be CPU starved. Still I will of
>>>> course try your suggestion.
>>>> 
>>>> Thanks again,
>>>> V.
>>>> 
>>>> 
>>>> On Tue, May 16, 2017 at 5:15 PM, Eno Thereska <eno.there...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Which version of Kafka are you using? It might be that RocksDb doesn't
>>>>> get enough resources to compact the data fast enough. If that's the
>> case
>>>>> you can try increasing the number of background compaction threads for
>>>>> RocksDb through the RocksDbConfigSetter class (see
>>>>> http://docs.confluent.io/current/streams/developer-guide.
>>>>> html#streams-developer-guide-rocksdb-config <
>>>>> http://docs.confluent.io/current/streams/developer-guide.
>>>>> html#streams-developer-guide-rocksdb-config>) by calling
>>>>> "options.setIncreaseParallelism(/* number of threads for compaction,
>>>>> e.g., 5 */)"
>>>>> 
>>>>> Eno
>>>>> 
>>>>>> On 16 May 2017, at 14:58, Vincent Bernardi <vinc...@kameleoon.com>
>>>>> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> I'm running an experimental Kafka Stream Processor which accumulates
>>>>> lots
>>>>>> of data in a StateStoreSupplier during transform() and forwards lots
>> of
>>>>>> data during punctuate (and deletes it form the StateStoreSupplier).
>> I'm
>>>>>> currently using a persistent StateStore, meaning that Kafka Streams
>>>>>> provides me with a RocksDB instance which writes everything on disk.
>> The
>>>>>> average amount of data that I keep in my StateStore at any time is at
>>>>> most
>>>>>> 1GB.
>>>>>> 
>>>>>> My problem is that it seems that this data is never really deleted, as
>>>>> if
>>>>>> no compaction never happened: the directory size for my RocksDB
>> instance
>>>>>> goes ever up and eventually uses up all disk space at which point my
>>>>>> application crashes (I've seen it go up to 60GB before I stopped it).
>>>>>> 
>>>>>> Does anyone know if this can be a normal behaviour for RocksDB? Is
>> there
>>>>>> any way that I can manually log or trigger RocksDB compactions to see
>> if
>>>>>> that is my problem?
>>>>>> 
>>>>>> Thanks in advance for any pointer,
>>>>>> V.
>>>>> 
>>>>> 
>>>> 
>> 
>> 



Re: Kafka Streams RocksDB permanent StateStoreSupplier seemingly never deletes data

2017-05-16 Thread Eno Thereska
Thanks. Which RocksDb files are growing indefinitely, the LOG or SST ones?
Also, any chance you could use the latest streams library 0.10.2.1 to check if 
problem still exists?


Eno

> On 16 May 2017, at 16:43, Vincent Bernardi <vinc...@kameleoon.com> wrote:
> 
> Just tried setting compaction threads to 5, but I have the exact same
> problem: the rocksdb files get bigger and bigger, while my application
> never stores more than 200k K/V pairs.
> 
> V.
> 
> On Tue, May 16, 2017 at 5:22 PM, Vincent Bernardi <vinc...@kameleoon.com>
> wrote:
> 
>> Hi Eno,
>> Thanks for your answer. I tried sending a followup email when I realised I
>> forgot to tell you the version number but it must have fallen through.
>> I'm using 0.10.1.1 both for Kafka and for the streams library.
>> Currently my application works on 4 partitions and only uses about 100% of
>> one core, so I don't see how it could be CPU starved. Still I will of
>> course try your suggestion.
>> 
>> Thanks again,
>> V.
>> 
>> 
>> On Tue, May 16, 2017 at 5:15 PM, Eno Thereska <eno.there...@gmail.com>
>> wrote:
>> 
>>> Which version of Kafka are you using? It might be that RocksDb doesn't
>>> get enough resources to compact the data fast enough. If that's the case
>>> you can try increasing the number of background compaction threads for
>>> RocksDb through the RocksDbConfigSetter class (see
>>> http://docs.confluent.io/current/streams/developer-guide.
>>> html#streams-developer-guide-rocksdb-config <
>>> http://docs.confluent.io/current/streams/developer-guide.
>>> html#streams-developer-guide-rocksdb-config>) by calling
>>> "options.setIncreaseParallelism(/* number of threads for compaction,
>>> e.g., 5 */)"
>>> 
>>> Eno
>>> 
>>>> On 16 May 2017, at 14:58, Vincent Bernardi <vinc...@kameleoon.com>
>>> wrote:
>>>> 
>>>> Hi,
>>>> I'm running an experimental Kafka Stream Processor which accumulates
>>> lots
>>>> of data in a StateStoreSupplier during transform() and forwards lots of
>>>> data during punctuate (and deletes it form the StateStoreSupplier). I'm
>>>> currently using a persistent StateStore, meaning that Kafka Streams
>>>> provides me with a RocksDB instance which writes everything on disk. The
>>>> average amount of data that I keep in my StateStore at any time is at
>>> most
>>>> 1GB.
>>>> 
>>>> My problem is that it seems that this data is never really deleted, as
>>> if
>>>> no compaction never happened: the directory size for my RocksDB instance
>>>> goes ever up and eventually uses up all disk space at which point my
>>>> application crashes (I've seen it go up to 60GB before I stopped it).
>>>> 
>>>> Does anyone know if this can be a normal behaviour for RocksDB? Is there
>>>> any way that I can manually log or trigger RocksDB compactions to see if
>>>> that is my problem?
>>>> 
>>>> Thanks in advance for any pointer,
>>>> V.
>>> 
>>> 
>> 



Re: KafkaStreams reports RUNNING even though all StreamThreads has crashed

2017-05-16 Thread Eno Thereska
Hi Andreas,

Thanks for reporting. This sounds like a bug, but could also be a semantic 
thing. Couple of questions:

- which version of Kafka are you using?
- what is the nature of the failure of the threads, i.e., how come they have 
all crashed? If all threads crashed, was there an exception they threw that was 
caught? And if all threads have crashed, would it be useful to still have the Kafka 
Streams instance running at all? (I'd expect it to also crash or be 
terminated, in which case I don't see the value in providing a state().)

Eno

> On 16 May 2017, at 08:03, Andreas Gabrielsson 
>  wrote:
> 
> Hi All,
> 
> We recently implemented a health check for a Kafka Streams based application. 
> The health check is simply checking the state of Kafka Streams by calling 
> KafkaStreams.state(). It reports healthy if it’s not in PENDING_SHUTDOWN or 
> NOT_RUNNING states. 
> 
> We truly appreciate having the possibility to easily check the state of Kafka 
> Streams but to our surprise we noticed that KafkaStreams.state() returns 
> RUNNING even though all StreamThreads has crashed and reached NOT_RUNNING 
> state. Is this intended behaviour or is it a bug? Semantically it seems weird 
> to me that KafkaStreams would say it’s RUNNING when it is in fact not 
> consuming anything since all underlying working threads has crashed. 
> 
> If this is intended behaviour I would appreciate an explanation of why that 
> is the case. Also in that case, how could I determine if the consumption from 
> Kafka hasn’t crashed? 
> 
> If this is not intended behaviour, how fast could I expect it to be fixed? I 
> wouldn’t mind fixing it myself but I’m not sure if this is considered trivial 
> or big enough to require a JIRA. Also, if I would implement a fix I’d like 
> your input on what would be a reasonable solution. By just inspecting to code 
> I have an idea but I’m not sure I understand all the implication so I’d be 
> happy to hear your thoughts first. 
> 
> Thanks in advance,
> Andreas Gabrielsson
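
For reference, a minimal sketch of this kind of health check against the
0.10.2-era API. The wrapper class is hypothetical; the point is that, given
the behaviour described above, checking state() alone is not enough, while
the uncaught-exception handler does fire when a StreamThread dies:

    import java.util.concurrent.atomic.AtomicBoolean;
    import org.apache.kafka.streams.KafkaStreams;

    public class StreamsHealthCheck {
        private final KafkaStreams streams;
        private final AtomicBoolean threadsAlive = new AtomicBoolean(true);

        // register the handler before calling streams.start()
        public StreamsHealthCheck(KafkaStreams streams) {
            this.streams = streams;
            streams.setUncaughtExceptionHandler(
                (thread, throwable) -> threadsAlive.set(false));
        }

        public boolean isHealthy() {
            KafkaStreams.State state = streams.state();
            return threadsAlive.get()
                    && state != KafkaStreams.State.PENDING_SHUTDOWN
                    && state != KafkaStreams.State.NOT_RUNNING;
        }
    }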



Re: Can state stores function as a caching layer for persistent storage

2017-05-16 Thread Eno Thereska
(it's preferred to create another email thread for a different topic to make it 
easier to look back)

Yes, there could be room for optimizations, e.g., see this: 
http://mail-archives.apache.org/mod_mbox/kafka-users/201705.mbox/%3cCAJikTEUHR=r0ika6vlf_y+qajxg8f_q19og_-s+q-gozpqb...@mail.gmail.com%3e

Eno

> On 16 May 2017, at 16:01, João Peixoto <joao.harti...@gmail.com> wrote:
> 
> Follow up doubt (let me know if a new question should be created).
> 
> Do we always need a repartitioning topic?
> 
> If I'm reading things correctly when we change the key of a record we need
> make sure the new key falls on the same partition that we are processing.
> This makes a lot of sense if after such change we'd need to join on some
> other stream/table or cases where we sink to a topic.
> However, in cases where none of these things, the repartition topic does
> nothing? If this is true can we somehow not create it?
> 
> On Sun, May 14, 2017 at 7:58 PM João Peixoto <joao.harti...@gmail.com>
> wrote:
> 
>> Very useful links, thank you.
>> 
>> Part of my original misunderstanding was that the at-least-once guarantee
>> was considered fulfilled if the record reached a sink node.
>> 
>> Thanks for all the feedback, you may consider my question answered.
>> Feel free to ask further questions about the use case if found interesting.
>> On Sun, May 14, 2017 at 4:31 PM Matthias J. Sax <matth...@confluent.io>
>> wrote:
>> 
>>> Yes.
>>> 
>>> It is basically "documented", as Streams guarantees at-least-once
>>> semantics. Thus, we make sure, you will not loose any data in case of
>>> failure. (ie, the overall guarantee is documented)
>>> 
>>> To achieve this, we always flush before we commit offsets. (This is not
>>> explicitly documented as it's an implementation detail.)
>>> 
>>> There is some doc's in the wiki:
>>> 
>>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Internal+Data+Management#KafkaStreamsInternalDataManagement-Commits
>>> 
>>> This might also help in case you want to dig into the code:
>>> 
>>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Architecture
>>> 
>>> 
>>> -Matthias
>>> 
>>> On 5/14/17 4:07 PM, João Peixoto wrote:
>>>> I think I now understand what Matthias meant when he said "If you use a
>>>> global remote store, you would not need to back your changes in a
>>> changelog
>>>> topic, as the store would not be lost if case of failure".
>>>> 
>>>> I had the misconception that if a state store threw an exception during
>>>> "flush", all messages received between now and the previous flush would
>>> be
>>>> "lost", hence the need for a changelog topic. However, it seems that the
>>>> "repartition" topic actually solves this problem.
>>>> 
>>>> There's very little information about the latter, at least that I could
>>>> find, but an entry seems to be added whenever a record enters the
>>>> "aggregate", but the state store "consumer" of this topic only updates
>>> its
>>>> offset after the flush completes, meaning that the repartition topic
>>> will
>>>> be replayed! It seems this problem is already solved for me, I'd
>>> appreciate
>>>> if someone could point me to the documentation or code that backs up the
>>>> above.
>>>> 
>>>> 
>>>> On Sat, May 13, 2017 at 3:11 PM João Peixoto <joao.harti...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Replies in line as well
>>>>> 
>>>>> 
>>>>> On Sat, May 13, 2017 at 3:25 AM Eno Thereska <eno.there...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> Hi João,
>>>>>> 
>>>>>> Some answers inline:
>>>>>> 
>>>>>>> On 12 May 2017, at 18:27, João Peixoto <joao.harti...@gmail.com>
>>> wrote:
>>>>>>> 
>>>>>>> Thanks for the comments, here are some clarifications:
>>>>>>> 
>>>>>>> I did look at interactive queries, if I understood them correctly it
>>>>>> means
>>>>>>> that my state store

Re: Kafka Streams RocksDB permanent StateStoreSupplier seemingly never deletes data

2017-05-16 Thread Eno Thereska
Which version of Kafka are you using? It might be that RocksDb doesn't get 
enough resources to compact the data fast enough. If that's the case you can 
try increasing the number of background compaction threads for RocksDb through 
the RocksDbConfigSetter class (see 
http://docs.confluent.io/current/streams/developer-guide.html#streams-developer-guide-rocksdb-config) 
by calling "options.setIncreaseParallelism(/* number of threads for 
compaction, e.g., 5 */)"

Eno

> On 16 May 2017, at 14:58, Vincent Bernardi  wrote:
> 
> Hi,
> I'm running an experimental Kafka Stream Processor which accumulates lots
> of data in a StateStoreSupplier during transform() and forwards lots of
> data during punctuate (and deletes it form the StateStoreSupplier). I'm
> currently using a persistent StateStore, meaning that Kafka Streams
> provides me with a RocksDB instance which writes everything on disk. The
> average amount of data that I keep in my StateStore at any time is at most
> 1GB.
> 
> My problem is that it seems that this data is never really deleted, as if
> no compaction never happened: the directory size for my RocksDB instance
> goes ever up and eventually uses up all disk space at which point my
> application crashes (I've seen it go up to 60GB before I stopped it).
> 
> Does anyone know if this can be a normal behaviour for RocksDB? Is there
> any way that I can manually log or trigger RocksDB compactions to see if
> that is my problem?
> 
> Thanks in advance for any pointer,
> V.
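
For reference, a minimal sketch of the RocksDbConfigSetter approach suggested
above; the class name is arbitrary and the thread count of 5 is just the
example value from the thread:

    import java.util.Map;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.state.RocksDBConfigSetter;
    import org.rocksdb.Options;

    public class CompactionParallelismSetter implements RocksDBConfigSetter {
        @Override
        public void setConfig(String storeName, Options options, Map<String, Object> configs) {
            // give RocksDB more background threads for flushes and compactions
            options.setIncreaseParallelism(5);
        }
    }

It is wired in through the streams configuration, e.g.
props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, CompactionParallelismSetter.class).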



Re: Reg: [VOTE] KIP 157 - Add consumer config options to streams reset tool

2017-05-16 Thread Eno Thereska
+1 thanks.

Eno
> On 16 May 2017, at 04:20, BigData dev  wrote:
> 
> Hi All,
> Given the simple and non-controversial nature of the KIP, I would like to
> start the voting process for KIP-157: Add consumer config options to
> streams reset tool
> 
> *https://cwiki.apache.org/confluence/display/KAFKA/KIP+157+-+Add+consumer+config+options+to+streams+reset+tool
> *
> 
> 
> The vote will run for a minimum of 72 hours.
> 
> Thanks,
> 
> Bharat



Re: Kafka-streams process stopped processing messages

2017-05-16 Thread Eno Thereska
Hi Shimi,

Could we start a new email thread on the slow booting to separate it from the 
initial thread (call it "slow boot" or something)? Thank you. Also, could you 
provide the logs for the booting part if possible, together with your streams 
config.

Thanks
Eno
> On 15 May 2017, at 20:49, Shimi Kiviti <shim...@gmail.com> wrote:
> 
> I do run the clients with 0.10.2.1 and it takes hours
> What I don't understand is why it takes hours to boot on a server that has
> all the data in RocksDB already. Is that related to the amount of data in
> RocksDB (changelog topics) or the data in the source topic the processes
> reads from?
> On Mon, 15 May 2017 at 20:32 Guozhang Wang <wangg...@gmail.com> wrote:
> 
>> Hello Shimi,
>> 
>> Could you try upgrading your clients to 0.10.2.1 (note you do not need to
>> upgrade your servers if it is already on 0.10.1, since newer Streams
>> clients can directly talk to older versioned brokers since 0.10.1+) and try
>> it out again? I have a few optimizations to reduce rebalance latencies in
>> both the underlying consumer client as well as streams library, and
>> hopefully they will help with your rebalance issues.
>> 
>> Also, we have a bunch of more fixes on consumer rebalance that we have
>> already pushed in trunk and hence will be included in the upcoming June
>> release of 0.11.0.0.
>> 
>> 
>> Guozhang
>> 
>> On Sat, May 13, 2017 at 12:32 PM, Shimi Kiviti <shim...@gmail.com> wrote:
>> 
>>> I tried all these configurations and now like version 0.10.1.1 I see a
>> very
>>> slow startup.
>>> I decreased the cluster to a single server which was running without any
>>> problem for a few hours. Now, each time I restart this process it gets
>> into
>>> rebalancing state for several hours.
>>> That mean that every time we need to deploy a new version of our app
>> (which
>>> can be several times a day) we have a down time of hours.
>>> 
>>> 
>>> On Sat, May 6, 2017 at 5:13 PM, Eno Thereska <eno.there...@gmail.com>
>>> wrote:
>>> 
>>>> Yeah we’ve seen cases when the session timeout might also need
>>> increasing.
>>>> Could you try upping it to something like 6ms and let us know how
>> it
>>>> goes:
>>>> 
>>>>>> streamsProps.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 6);
>>>> 
>>>> 
>>>> Thanks
>>>> Eno
>>>> 
>>>>> On May 6, 2017, at 8:35 AM, Shimi Kiviti <shim...@gmail.com> wrote:
>>>>> 
>>>>> Thanks Eno,
>>>>> I already set the the recurve buffer size to 1MB
>>>>> I will also try producer
>>>>> 
>>>>> What about session timeout and heart beat timeout? Do you think it
>>> should
>>>>> be increased?
>>>>> 
>>>>> Thanks,
>>>>> Shimi
>>>>> 
>>>>> On Sat, 6 May 2017 at 0:21 Eno Thereska <eno.there...@gmail.com>
>>> wrote:
>>>>> 
>>>>>> Hi Shimi,
>>>>>> 
>>>>>> I’ve noticed with our benchmarks that on AWS environments with high
>>>>>> network latency the network socket buffers often need adjusting. Any
>>>> chance
>>>>>> you could add the following to your streams configuration to change
>>> the
>>>>>> default socket size bytes to a higher value (at least 1MB) and let
>> us
>>>> know?
>>>>>> 
>>>>>> private static final int SOCKET_SIZE_BYTES = 1 * 1024 * 1024; // at
>>>> least
>>>>>> 1MB
>>>>>> streamsProps.put(ConsumerConfig.RECEIVE_BUFFER_CONFIG,
>>>> SOCKET_SIZE_BYTES);
>>>>>> streamsProps.put(ProducerConfig.SEND_BUFFER_CONFIG,
>>> SOCKET_SIZE_BYTES);
>>>>>> 
>>>>>> Thanks
>>>>>> Eno
>>>>>> 
>>>>>>> On May 4, 2017, at 3:45 PM, Shimi Kiviti <shim...@gmail.com>
>> wrote:
>>>>>>> 
>>>>>>> Thanks Eno,
>>>>>>> 
>>>>>>> We still see problems on our side.
>>>>>>> when we run kafka-streams 0.10.1.1 eventually the problem goes away
>>> but
>>>>>>> with 0.10.2.1 it is not.
>>>>>>> We see a lot of the rebalancing messages I wrote before
>>>>>>> 

Re: Kafka Streams reports: "The timestamp of the message is out of acceptable range"

2017-05-15 Thread Eno Thereska
Hi Frank,

Could you confirm that you're using 0.10.2.1? This error was fixed as part of 
this JIRA I believe: https://issues.apache.org/jira/browse/KAFKA-4861 


Thanks
Eno
> On 14 May 2017, at 23:09, Frank Lyaruu  wrote:
> 
> Hi Kafka people...
> 
> After a bit of tuning and an upgrade to Kafka 0.10.1.2, this error starts
> showing up and the whole thing kind of dies.
> 
> 2017-05-14 18:51:52,342 | ERROR | hread-3-producer | RecordCollectorImpl
>   | 91 - com.dexels.kafka.streams - 0.0.115.201705131415 | task
> [1_0] Error sending record to topic
> KNBSB-test-generation-7-personcore-personcore-photopersongeneration-7-repartition.
> No more offsets will be recorded for this task and the exception will
> eventually be thrown
> org.apache.kafka.common.errors.InvalidTimestampException: The timestamp of
> the message is out of acceptable range.
> 2017-05-14 18:51:52,343 | INFO  | StreamThread-3   | StreamThread
>  | 91 - com.dexels.kafka.streams - 0.0.115.201705131415 |
> stream-thread [StreamThread-3] Flushing state stores of task 1_0
> 2017-05-14 18:51:52,345 | ERROR | StreamThread-3   | StreamThread
>  | 91 - com.dexels.kafka.streams - 0.0.115.201705131415 |
> stream-thread [StreamThread-3] Failed while executing StreamTask 1_0 due to
> flush state:
> org.apache.kafka.streams.errors.StreamsException: task [1_0] exception
> caught when producing
> 
> at
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.checkForException(RecordCollectorImpl.java:121)[91:com.dexels.kafka.streams:0.0.115.201705131415]
> at
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.flush(RecordCollectorImpl.java:129)[91:com.dexels.kafka.streams:0.0.115.201705131415]
> at
> org.apache.kafka.streams.processor.internals.StreamTask.flushState(StreamTask.java:422)[91:com.dexels.kafka.streams:0.0.115.201705131415]
> at
> org.apache.kafka.streams.processor.internals.StreamThread$4.apply(StreamThread.java:555)[91:com.dexels.kafka.streams:0.0.115.201705131415]
> at
> org.apache.kafka.streams.processor.internals.StreamThread.performOnTasks(StreamThread.java:501)[91:com.dexels.kafka.streams:0.0.115.201705131415]
> at
> org.apache.kafka.streams.processor.internals.StreamThread.flushAllState(StreamThread.java:551)[91:com.dexels.kafka.streams:0.0.115.201705131415]
> at
> org.apache.kafka.streams.processor.internals.StreamThread.shutdownTasksAndState(StreamThread.java:449)[91:com.dexels.kafka.streams:0.0.115.201705131415]
> at
> org.apache.kafka.streams.processor.internals.StreamThread.shutdown(StreamThread.java:391)[91:com.dexels.kafka.streams:0.0.115.201705131415]
> at
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:372)[91:com.dexels.kafka.streams:0.0.115.201705131415]
> 
> Caused by: org.apache.kafka.common.errors.InvalidTimestampException: The
> timestamp of the message is out of acceptable range.
> 
> What does this mean? How can I debug this?
> 
> Two observations:
> - I only see this on *-repartition topics
> - ... which are also the only topic with cleanup policy = 'delete'



Re: Can state stores function as a caching layer for persistent storage

2017-05-13 Thread Eno Thereska
Hi João,

Some answers inline:

> On 12 May 2017, at 18:27, João Peixoto  wrote:
> 
> Thanks for the comments, here are some clarifications:
> 
> I did look at interactive queries, if I understood them correctly it means
> that my state store must hold all the results in order for it to be
> queried, either in memory or through disk (RocksDB).

Yes, that's correct.


> 1. The retention policy on my aggregate operations, in my case, is 90 days,
> which is way too much data to hold in memory

It will depend on how much data/memory you have, but perhaps it could be too 
much to hold in memory
for that long (especially because some failure is bound to happen in 90 days)

> 2. My stream instances do no have access to disk, even if they did,
> wouldn't it mean I'd need almost twice the disk space to hold the same
> data? I.e. kafka brokers golding the topics + RocksDB holding the state?

That's interesting, is there a reason why the streams instances don't have 
access to a local file system? I'm curious
what kind of deployment you have.

It is true that twice the disk space is needed to hold the data on RocksDb as 
well as in the Kafka changelog topic, 
however that is no different from the current situation where the data is 
stored in a remote database, right?
I understand your point that you might not have access to local disk though.


> 3. Because a crash may happen between an entry is added to the changelog
> and the data store is flushed, I need to get all the changes everytime if I
> want to guarantee that all data is eventually persisted. This is why
> checkpoint files may not work for me.

The upcoming exactly-once support in 0.11 will help with these kinds of 
guarantees.

> 
> Standby tasks looks great, I forgot about those.
> 
> I'm at the design phase so this is all tentative. Answering Matthias
> questions
> 
> My state stores are local. As mentioned above I do not have access to disk
> therefore I need to recover all data from somewhere, in this case I'm
> thinking about the changelog.

So this is where I get confused a bit, since you mention that your state stores 
are "local", i.e., the streams instance
does have access to a local file system.

> I read about Kafka Connect but have never used it, maybe that'll simplify
> things, but I need to do some studying there.
> 
> The reason why even though my stores are local but still I want to store
> them on a database and not use straight up RocksDB (or global stores) is
> because this would allow me to migrate my current processing pipeline to
> Kafka Streams without needing to change the frontend part of the
> application, which fetches data from MongoDB.

Makes sense.

> 
> PS When you mention Global State Stores I'm thinking of
> http://docs.confluent.io/3.2.0/streams/developer-guide.html#querying-remote-state-stores-for-the-entire-application,
> is this correct?

No, I think Matthias means the case where you have a remote server somewhere 
that stores all your data (like a shared file system). This is not something 
Kafka would provide. 

Eno

> 
> 
> On Fri, May 12, 2017 at 10:02 AM Matthias J. Sax 
> wrote:
> 
>> Hi,
>> 
>> I am not sure about your overall setup. Are your stores local (similar
>> to RocksDB) or are you using one global remote store? If you use a
>> global remote store, you would not need to back your changes in a
>> changelog topic, as the store would not be lost if case of failure.
>> 
>> Also (in case that your stores are remote), did you consider using Kafka
>> Connect to export your data into an external store like MySQL or MongoDB
>> instead of writing your own custom stores for Streams?
>> 
>> If your stores are local, why do you write custom stores? I am curious
>> to understand why RocksDB does not serve your needs.
>> 
>> 
>> About your two comment:
>> 
>> (1) Streams uses RocksDB by default and the default implementation is
>> using "checkpoint files" in next release. Those checkpoint files track
>> the changelog offsets of the data that got flushed to disc. This allows
>> to reduce the startup time, as only the tail of the changelog needs to
>> be read to bring the store up to date. For this, you would always (1)
>> write to the changelog, (2) write to you store. Each time you need to
>> flush, you know that all data is in the changelog already. After each
>> flush, you can update the "local offset checkpoint file".
>> 
>> I guess, if you use local stores you can apply a similar pattern in you
>> custom store implementation. (And as mentioned above, for global remote
>> store you would not need the changelog anyway. -- This also applies to
>> your recovery question from below.)
>> 
>> (2) You can configure standby task (via StreamConfig
>> "num.standby.replicas"). This will set up standby tasks that passively
>> replicate your stores to another instance. In error case, state will be
>> migrated to those "hot standbys" reducing recovery time significantly.
>> 
>> 
>> 
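
As a concrete example of the standby-task suggestion above, a sketch of the
relevant streams settings (application id and bootstrap servers are
placeholders):

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
    // keep one passive replica of each local state store on another instance,
    // so failover migrates to the standby instead of replaying the full changelog
    props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);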

Re: Can state stores function as a caching layer for persistent storage

2017-05-12 Thread Eno Thereska
Hi there,

A couple of general comments, plus some answers:

- general comment: have you thought of using Interactive Queries to directly 
query the aggregate data, without needing to store them to an external database 
(see this blog: 
https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/). 
That has the potential to simplify the overall architecture.



> 
> 1. I cannot store too much data in the changelog, even with compaction, if
> I have too much data, bootstrapping a stream instance would take a long time

> 2. On the other hand, if I take too long to recover from a failure, I may
> lose data. So there seems to be a delicate tradeoff here

There is the option of using standby tasks to reduce the recovery time 
(http://docs.confluent.io/current/streams/architecture.html#fault-tolerance)

> 
> 2. In a scenario where my stream would have a fanout (multiple sub-streams
> based on the same stream), each branch would perform different "aggregate"
> operations, each with its own state store. Are state stores flushed in
> parallel or sequentially?

Depends on how many threads you have. If you have N threads, each works and 
flushes in parallel.

> 3. The above also applies per-partition. As a stream definition is
> parallelized by partition, will one instance hold different store instances
> for each one?

Yes.

> 4. Through synthetic sleeps I simulated slow flushes, slower than the
> commit interval. The stream seems to be ok with it and didn't throw, I
> assume the Kafka consumer does not poll more records until all of the
> previous poll's are committed, but I couldn't find documentation to back
> this statement. Is there a timeout for "commit" operations?
> 
> 

Yes, a single thread polls() then commits(). The timeout for commit operations 
is controlled through the
REQUEST_TIMEOUT_MS_CONFIG option on the streams producer (by default 30 
seconds, you shouldn't have to change it normally).

Thanks
Eno


> Sample code
> 
> public class AggregateHolder {
> 
>private Long commonKey;
>private List rawValues = new ArrayList<>();
>private boolean persisted;
> 
> // ...
> }
> 
> And stream definition
> 
> source.groupByKey(Serdes.String(), recordSerdes)
>  .aggregate(
>  AggregateHolder::new,
>  (aggKey, value, aggregate) ->
> aggregate.addValue(value.getValue()),
>  new DemoStoreSupplier<>(/* ... */)
>  )
>  .foreach((key, agg) -> log.debug("Aggregate: {}={} ({})",
> key, agg.getAverage(), agg));



Re: [VOTE] KIP-156 Add option "dry run" to Streams application reset tool

2017-05-09 Thread Eno Thereska
+1 for me. I’m not sure we even need a KIP for this but it’s better to be safe 
I guess.

Eno

> On May 9, 2017, at 8:41 PM, BigData dev  wrote:
> 
> Hi, Everyone,
> 
> Since this is a relatively simple change, I would like to start the voting
> process for KIP-156: Add option "dry run" to Streams application reset tool
> 
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=69410150
> 
> 
> The vote will run for a minimum of 72 hours.
> 
> 
> Thanks,
> 
> Bharat



Re: Kafka Stream stops polling new messages

2017-05-09 Thread Eno Thereska
Yeah that's a good point, I'm not taking action then.

Eno

On Mon, May 8, 2017 at 10:38 PM, Matthias J. Sax <matth...@confluent.io>
wrote:

> Hey,
>
> I am not against opening a JIRA, but I am wondering what we should
> describe/report there. If I understand the scenario correctly, João uses
> a custom RocksDB store and calls seek() in user code land. As it is a
> bug in RocksDB that seek takes so long, I am not sure what we could
> improve within Streams to prevent this?  The only thing I am seeing
> right now is that we could reduce `max.poll.interval.ms` that we just
> increased to guard against failure for long state recreation phases.
>
> Any thoughts?
>
>
> -Matthias
>
>
> On 5/3/17 8:48 AM, João Peixoto wrote:
> > That'd be great as I'm not familiar with the protocol there
> > On Wed, May 3, 2017 at 8:41 AM Eno Thereska <eno.there...@gmail.com>
> wrote:
> >
> >> Cool, thanks, shall we open a JIRA?
> >>
> >> Eno
> >>> On 3 May 2017, at 16:16, João Peixoto <joao.harti...@gmail.com> wrote:
> >>>
> >>> Actually I need to apologize, I pasted the wrong issue, I meant to
> paste
> >>> https://github.com/facebook/rocksdb/issues/261.
> >>>
> >>> RocksDB did not produce a crash report since it didn't actually crash.
> I
> >>> performed thread dumps on stale and not-stale instances which revealed
> >> the
> >>> common behavior and I collect and plot several Kafka metrics, including
> >>> "punctuate" durations, therefore I know it took a long time and
> >> eventually
> >>> finished.
> >>>
> >>> Joao
> >>>
> >>> On Wed, May 3, 2017 at 6:22 AM Eno Thereska <eno.there...@gmail.com>
> >> wrote:
> >>>
> >>>> Hi there,
> >>>>
> >>>> Thanks for double checking. Does RocksDB actually crash or produce a
> >> crash
> >>>> dump? I’m curious how you know that the issue is
> >>>> https://github.com/facebook/rocksdb/issues/1121 <
> >>>> https://github.com/facebook/rocksdb/issues/1121>, so just double
> >> checking
> >>>> with you.
> >>>>
> >>>> If that’s indeed the case, do you mind opening a JIRA (a copy-paste of
> >> the
> >>>> below should suffice)? Alternatively let us know and we’ll open it.
> >> Sounds
> >>>> like we should handle this better.
> >>>>
> >>>> Thanks,
> >>>> Eno
> >>>>
> >>>>
> >>>>> On May 3, 2017, at 5:49 AM, João Peixoto <joao.harti...@gmail.com>
> >>>> wrote:
> >>>>>
> >>>>> I believe I found the root cause of my problem. I seem to have hit
> this
> >>>>> RocksDB bug https://github.com/facebook/rocksdb/issues/1121
> >>>>>
> >>>>> On my stream configuration I have a custom transformer used for
> >>>>> deduplicating records, highly inspired in the
> >>>>> EventDeduplicationLambdaIntegrationTest
> >>>>> <
> >>>>
> >> https://github.com/confluentinc/examples/blob/3.
> 2.x/kafka-streams/src/test/java/io/confluent/examples/streams/
> EventDeduplicationLambdaIntegrationTest.java#L161
> >>>>>
> >>>>> but
> >>>>> adjusted to my use case, special emphasis on the "punctuate" method.
> >>>>>
> >>>>> All the stale instances had the main stream thread "RUNNING" the
> >>>>> "punctuate" method of this transformer, which in term was running
> >> RocksDB
> >>>>> "seekToFirst".
> >>>>>
> >>>>> Also during my debugging one such instance finished the "punctuate"
> >>>> method,
> >>>>> which took ~11h, exactly the time the instance was stuck for.
> >>>>> Changing the backing state store from "persistent" to "inMemory"
> solved
> >>>> my
> >>>>> issue, at least after several days running, no stuck instances.
> >>>>>
> >>>>> This leads me to ask, shouldn't Kafka detect such a situation fairly
> >>>>> quickly? Instead of just stopping polling? My guess is that the
> >> heartbeat
> >>>>> thread which now is separate continues working fine, since by
> >> definition
> >>&

Re: Large Kafka Streams deployment takes a long time to bootstrap

2017-05-06 Thread Eno Thereska
Hi there,

I wanted to add something: how many CPU cores does each of your Kubernetes 
instance have? In 0.10.2.1 we noticed a regression in environments with 1 core 
as described in https://issues.apache.org/jira/browse/KAFKA-5174. 

If you have 1 core, the workaround is to change a config as described here:
http://docs.confluent.io/current/streams/upgrade-guide.html#known-issues-and-workarounds
 


Thanks
Eno


> On May 6, 2017, at 9:48 AM, Sachin Mittal  wrote:
> 
> Note on few things.
> Set changelog topic delete retention time to as less as possible if the
> previous values for same key are not needed and can be safely cleaned up.
> Set segment size and segment retention time also low so older segments can
> be compacted and cleaned up.
> Set delete ratio to be aggressive 0.01 so segments don't grow too big.
> 
> This way state stores would be created much faster.
> 
> Also when using Windows smaller window size helps.
> 
> Try not running many stream threads on single machine unless you have a
> great hardware.
> 
> Make sure a thread is not reading from many partitions. Make sure ratio of
> partitions to total threads is low.
> 
> Hope this helps.
> 
> Sachin
> 
> On 6 May 2017 13:28, "Shimi Kiviti"  wrote:
> 
>> This is very similar to issues that we see.
>> 
>> Did you check the status of the consumer group? In my case it will be in
>> rebalancing most of the time. Once in a while it will show consumers and
>> offsets but after a short time will go back to rebalancing.
>> 
>> How much storage does your Kafka-streams use?
>> Also, what is your k8s configuration?
>> Deployment? Deployment with emptyDir, hostPath or EBS? Statefulset?
>> 
>> Thanks,
>> Shimi
>> On Sat, 6 May 2017 at 2:34 João Peixoto  wrote:
>> 
>>> After a while the instance started running.
>>> 
>>> 2017-05-05 22:40:26.806  INFO 85 --- [ StreamThread-4]
>>> o.a.k.s.p.internals.StreamThread : stream-thread
>> [StreamThread-4]
>>> Committing task StreamTask 1_62
>>> (this is literally the next message)
>>> 2017-05-05 23:13:27.820  INFO 85 --- [ StreamThread-4]
>>> o.a.k.s.p.internals.StreamThread : stream-thread
>> [StreamThread-4]
>>> Committing all tasks because the commit interval 1ms has elapsed
>>> 
>>> On Fri, May 5, 2017 at 3:48 PM João Peixoto 
>>> wrote:
>>> 
 Warning, long message
 
 *Problem*: Initializing a Kafka Stream is taking a long time.
 Currently at the 40 minute mark
 
 *Setup*:
 2 co-partition topics with 100 partitions.
 First topic contains a lot of messages in the order of hundreds of
>>> millions
 Second topic is a KTable and contains ~30k records
 
 Kafka cluster with 6 brokers running 0.10.1
 
 Kafka streams running on 0.10.2.1. 5 instances with 5 threads each.
 The instances are running on Kubernetes
 
 *Stream Configuration*:
 Properties props = new Properties();
 props.put(StreamsConfig.APPLICATION_ID_CONFIG, streamName);
 props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, ...);
 props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG,
 Serdes.String().getClass().getName());
 props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG,
 Serdes.ByteArray().getClass().getName());
 props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 1);
 props.put(StreamsConfig.METRICS_RECORDING_LEVEL_CONFIG, "DEBUG");
 props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 5);
 props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
 props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 500);
 
 *The events*:
 I started 5 instances of my stream configuration at the same time. This
>>> is
 the first
 time this configuration is running.
 
 2017-05-05 21:23:03.283  INFO 71 --- [   main]
 o.a.k.s.p.internals.StreamThread : stream-thread
>> [StreamThread-1]
 Creating producer client
 2017-05-05 21:23:03.415  INFO 71 --- [   main]
 o.a.k.s.p.internals.StreamThread : stream-thread
>> [StreamThread-1]
 Creating consumer client
 2017-05-05 21:23:03.520  INFO 71 --- [   main]
 o.a.k.s.p.internals.StreamThread : stream-thread
>> [StreamThread-1]
 Creating restore consumer client
 2017-05-05 21:23:03.528  INFO 71 --- [   main]
 o.a.k.s.p.internals.StreamThread : stream-thread
>> [StreamThread-1]
 State transition from NOT_RUNNING to RUNNING.
 2017-05-05 21:23:03.531  INFO 71 --- [   main]
 o.a.k.s.p.internals.StreamThread : stream-thread
>> [StreamThread-2]
 Creating producer client
 2017-05-05 21:23:03.564  INFO 71 --- [   main]
 o.a.k.s.p.internals.StreamThread : stream-thread
>> [StreamThread-2]
 Creating 

Re: Kafka-streams process stopped processing messages

2017-05-06 Thread Eno Thereska
Yeah we’ve seen cases when the session timeout might also need increasing. 
Could you try upping it to something like 60000ms and let us know how it goes:

>> streamsProps.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 60000);


Thanks
Eno

> On May 6, 2017, at 8:35 AM, Shimi Kiviti <shim...@gmail.com> wrote:
> 
> Thanks Eno,
> I already set the receive buffer size to 1MB
> I will also try producer
> 
> What about session timeout and heart beat timeout? Do you think it should
> be increased?
> 
> Thanks,
> Shimi
> 
> On Sat, 6 May 2017 at 0:21 Eno Thereska <eno.there...@gmail.com> wrote:
> 
>> Hi Shimi,
>> 
>> I’ve noticed with our benchmarks that on AWS environments with high
>> network latency the network socket buffers often need adjusting. Any chance
>> you could add the following to your streams configuration to change the
>> default socket size bytes to a higher value (at least 1MB) and let us know?
>> 
>> private static final int SOCKET_SIZE_BYTES = 1 * 1024 * 1024; // at least
>> 1MB
>> streamsProps.put(ConsumerConfig.RECEIVE_BUFFER_CONFIG, SOCKET_SIZE_BYTES);
>> streamsProps.put(ProducerConfig.SEND_BUFFER_CONFIG, SOCKET_SIZE_BYTES);
>> 
>> Thanks
>> Eno
>> 
>>> On May 4, 2017, at 3:45 PM, Shimi Kiviti <shim...@gmail.com> wrote:
>>> 
>>> Thanks Eno,
>>> 
>>> We still see problems on our side.
>>> when we run kafka-streams 0.10.1.1 eventually the problem goes away but
>>> with 0.10.2.1 it is not.
>>> We see a lot of the rebalancing messages I wrote before
>>> 
>>> on at least 1 kafka-stream nodes we see disconnection messages like the
>>> following. These messages repeat all the time
>>> 
>>> 2017-05-04 14:25:56,063 [StreamThread-1] INFO
>>> o.a.k.c.c.i.AbstractCoordinator: Discovered coordinator
>>> ip-10-0-91-10.ec2.internal:9092 (id: 2147483646 rack: null) for group sa.
>>> 2017-05-04 14:25:56,063 [StreamThread-1] DEBUG o.a.k.c.NetworkClient:
>>> Initiating connection to node 2147483646 at
>> ip-10-0-91-10.ec2.internal:9092.
>>> 2017-05-04 14:25:56,091 [StreamThread-1] INFO
>>> o.a.k.c.c.i.AbstractCoordinator: (Re-)joining group sa
>>> 2017-05-04 14:25:56,093 [StreamThread-1] DEBUG
>>> o.a.k.s.p.i.StreamPartitionAssignor: stream-thread [StreamThread-1] found
>>> [sa-events] topics possibly matching regex
>>> 2017-05-04 14:25:56,096 [StreamThread-1] DEBUG o.a.k.s.p.TopologyBuilder:
>>> stream-thread [StreamThread-1] updating builder with
>>> SubscriptionUpdates{updatedTopicSubscriptions=[sa-events]} topic(s) with
>> po
>>> ssible matching regex subscription(s)
>>> 2017-05-04 14:25:56,096 [StreamThread-1] DEBUG
>>> o.a.k.c.c.i.AbstractCoordinator: Sending JoinGroup ((type:
>>> JoinGroupRequest, groupId=sa, sessionTimeout=1,
>>> rebalanceTimeout=2147483647, memb
>>> erId=, protocolType=consumer,
>>> 
>> groupProtocols=org.apache.kafka.common.requests.JoinGroupRequest$ProtocolMetadata@2f894d9b
>> ))
>>> to coordinator ip-10-0-91-10.ec2.internal:9092 (id: 2147483646 rack:
>> null)
>>> 2017-05-04 14:25:56,097 [StreamThread-1] DEBUG o.a.k.c.n.Selector:
>> Created
>>> socket with SO_RCVBUF = 1048576, SO_SNDBUF = 131072, SO_TIMEOUT = 0 to
>> node
>>> 2147483646
>>> 2017-05-04 14:25:56,097 [StreamThread-1] DEBUG o.a.k.c.NetworkClient:
>>> Completed connection to node 2147483646.  Fetching API versions.
>>> 2017-05-04 14:25:56,097 [StreamThread-1] DEBUG o.a.k.c.NetworkClient:
>>> Initiating API versions fetch from node 2147483646.
>>> 2017-05-04 14:25:56,104 [StreamThread-1] DEBUG o.a.k.c.NetworkClient:
>>> Recorded API versions for node 2147483646: (Produce(0): 0 to 2 [usable:
>> 2],
>>> Fetch(1): 0 to 3 [usable: 3], Offsets(2): 0 to 1 [usable: 1],
>>> Metadata(3): 0 to 2 [usable: 2], LeaderAndIsr(4): 0 [usable: 0],
>>> StopReplica(5): 0 [usable: 0], UpdateMetadata(6): 0 to 2 [usable: 2],
>>> ControlledShutdown(7): 1 [usable: 1], OffsetCommit(8): 0 to 2 [usable:
>>> 2], OffsetFetch(9): 0 to 1 [usable: 1], GroupCoordinator(10): 0 [usable:
>>> 0], JoinGroup(11): 0 to 1 [usable: 1], Heartbeat(12): 0 [usable: 0],
>>> LeaveGroup(13): 0 [usable: 0], SyncGroup(14): 0 [usable: 0], Desc
>>> ribeGroups(15): 0 [usable: 0], ListGroups(16): 0 [usable: 0],
>>> SaslHandshake(17): 0 [usable: 0], ApiVersions(18): 0 [usable: 0],
>>> CreateTopics(19): 0 [usable: 0], DeleteTopics(20): 0 [usable: 0])
>>> 2017-05-04 14:29:44,800 [kafka-producer
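
Pulling the two suggestions from this thread together, a sketch of the
streams configuration; the concrete values are illustrative, and the
consumer/producer settings are simply passed through to the embedded clients
(which is how the snippets quoted above use them):

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.producer.ProducerConfig;

    Properties streamsProps = new Properties();
    int socketSizeBytes = 1024 * 1024; // at least 1MB, as suggested above
    streamsProps.put(ConsumerConfig.RECEIVE_BUFFER_CONFIG, socketSizeBytes);
    streamsProps.put(ProducerConfig.SEND_BUFFER_CONFIG, socketSizeBytes);
    streamsProps.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 60000); // illustrative, e.g. 60s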

Re: Kafka-streams process stopped processing messages

2017-05-05 Thread Eno Thereska
 (id: 2 rack: null), ip-10
> -0-91-10.ec2.internal:9092 (id: 1 rack: null)], partitions =
> [Partition(topic = sa-events, partition = 0, leader = 1, replicas = [1,2],
> isr = [2,1]), Partition(topic = sa-events, partition = 1, lea
> der = 2, replicas = [1,2], isr = [2,1]), Partition(topic = sa-events,
> partition = 2, leader = 1, replicas = [1,2], isr = [2,1])])
> 2017-05-04 14:31:06,085 [StreamThread-1] DEBUG o.a.k.c.NetworkClient:
> Disconnecting from node 2147483646 due to request timeout.
> 2017-05-04 14:31:06,086 [StreamThread-1] DEBUG
> o.a.k.c.c.i.ConsumerNetworkClient: Cancelled JOIN_GROUP request
> {api_key=11,api_version=1,correlation_id=16,client_id=sa-5788b5a5-aadc-4276-916f
> -1640008c17da-StreamThread-1-consumer} with correlation id 16 due to node
> 2147483646 being disconnected
> 2017-05-04 14:31:06,086 [StreamThread-1] INFO
> o.a.k.c.c.i.AbstractCoordinator: Marking the coordinator
> ip-10-0-91-10.ec2.internal:9092 (id: 2147483646 rack: null) dead for group
> sa
> 2017-05-04 14:31:06,195 [StreamThread-1] DEBUG
> o.a.k.c.c.i.AbstractCoordinator: Sending GroupCoordinator request for group
> sa to broker ip-10-0-91-10.ec2.internal:9092 (id: 1 rack: null)
> 2017-05-04 14:31:06,196 [StreamThread-1] DEBUG o.a.k.c.NetworkClient:
> Sending metadata request (type=MetadataRequest, topics=) to node 2
> 2017-05-04 14:31:06,200 [StreamThread-1] DEBUG
> o.a.k.c.c.i.AbstractCoordinator: Received GroupCoordinator response
> ClientResponse(receivedTimeMs=1493908266200, latencyMs=5,
> disconnected=false, requestHeader=
> {api_key=10,api_version=0,correlation_id=19,client_id=sa-5788b5a5-aadc-4276-916f-1640008c17da-StreamThread-1-consumer},
> responseBody={error_code=0,coordinator={node_id=1,host=ip-10-0-91-10.ec
> 2.internal,port=9092}}) for group sa
> 
> 
> On Mon, May 1, 2017 at 4:19 PM, Eno Thereska <eno.there...@gmail.com> wrote:
> 
>> Hi Shimi,
>> 
>> 0.10.2.1 contains a number of fixes that should make the out of box
>> experience better, including resiliency under broker failures and better
>> exception handling. If you ever get back to it, and if the problem happens
>> again, please do send us the logs and we'll happily have a look.
>> 
>> Thanks
>> Eno
>>> On 1 May 2017, at 12:05, Shimi Kiviti <shim...@gmail.com> wrote:
>>> 
>>> Hi Eno,
>>> I am afraid I played too much with the configuration to make this
>>> productive investigation :(
>>> 
>>> This is a QA environment which includes 2 kafka instances and 3 zookeeper
>>> instances in AWS. There are only 3 partition for this topic.
>>> Kafka broker and kafka-stream are version 0.10.1.1
>>> Our kafka-stream app run on docker using kubernetes.
>>> I played around with with 1 to 3  kafka-stream processes, but I got the
>>> same results. It is too easy to scale with kubernetes :)
>>> Since there are only 3 partitions, I didn't start more then 3 instances.
>>> 
>>> I was too quick to upgraded only the kafka-stream app to 0.10.2.1 with
>> hope
>>> that it will solve the problem, It didn't.
>>> The log I sent before are from this version.
>>> 
>>> I did notice "unknown" offset for the main topic with kafka-stream
>> version
>>> 0.10.2.1
>>> $ ./bin/kafka-consumer-groups.sh   --bootstrap-server localhost:9092
>>> --describe --group sa
>>> GROUP  TOPIC  PARTITION
>>> CURRENT-OFFSET  LOG-END-OFFSET  LAG OWNER
>>> sa sa-events 0  842199
>>> 842199  0
>>> sa-4557bf2d-ba79-42a6-aa05-5b4c9013c022-StreamThread-1-consumer_/
>> 10.0.10.9
>>> sa sa-events 1  1078428
>>> 1078428 0
>>> sa-4557bf2d-ba79-42a6-aa05-5b4c9013c022-StreamThread-1-consumer_/
>> 10.0.10.9
>>> sa sa-events 2  unknown
>>> 26093910unknown
>>> sa-4557bf2d-ba79-42a6-aa05-5b4c9013c022-StreamThread-1-consumer_/
>> 10.0.10.9
>>> 
>>> After that I downgraded the kafka-stream app back to version 0.10.1.1
>>> After a LONG startup time (more than an hour) where the status of the
>> group
>>> was rebalancing, all the 3 processes started processing messages again.
>>> 
>>> This all thing started after we hit a bug in our code (NPE) that crashed
>>> the stream processing thread.
>>> So now after 4 days, everything is back to normal.
>>> This worries me since it can happen again
>>> 
>>> 
>>> On Mon, May 1, 2017 at 11:45 

Re: Windowed aggregations memory requirements

2017-05-03 Thread Eno Thereska
This is a timely question and we've updated the documentation here on capacity 
planning and sizing for Kafka Streams jobs: 
http://docs.confluent.io/current/streams/sizing.html. Any feedback welcome. 
It has scenarios with windowed stores too.

Thanks
Eno
> On 3 May 2017, at 18:51, Garrett Barton  wrote:
> 
> That depends on if your using event, processing or ingestion time.
> 
> My understanding is that if you play a record through that is T-6, the only
> way that 
> TimeWindows.of(TimeUnit.MINUTES.toMillis(1)).until(TimeUnit.MINUTES.toMillis(5))
> would actually process that record in your window is if your using
> processing time.  Otherwise the record is skipped and no data is
> generated/calculated for that operation.  So depending on what your doing
> you would not increase any more memory usage than when consuming from
> real-time.
> 
> On Wed, May 3, 2017 at 3:37 AM, João Peixoto 
> wrote:
> 
>> The base question I'm trying to answer is "how much memory does my instance
>> need".
>> 
>> Considering a use case where I want to keep a rolling average on a tumbling
>> window of 1 minute size allowing for late arrivals up to 5 minutes (lower
>> bound) we would have something like this:
>> 
>> TimeWindows.of(TimeUnit.MINUTES.toMillis(1)).until(
>> TimeUnit.MINUTES.toMillis(5))
>> 
>> The aggregate key size is 8 bytes, the average value is 8 bytes and for
>> de-duplication purposes we need to keep track of which messages we saw
>> already, so a list of keys averaging 10 entries.
>> 
>> If I understand correctly this means that each window will be on average 96
>> bytes in size.
>> 
>> A single topic generates 100 messages/minute, which aggregate into 10
>> independent windows.
>> 
>> On any given point in time the windowed aggregates require 960 bytes of
>> memory at least.
>> 
>> Here's the confusing part. Lets say I found an issue with my averaging
>> operation and I want to reprocess the last 10 hours worth of messages.
>> 
>> 1. Windows will be regenerated, since most likely they were cleaned up
>> already
>> 2. The retention policy now has different semantics? If I had a late
>> arrival of 6 minutes, all of the sudden the reprocessing will account for
>> it right? Since the window is now active due to recreation (Assuming my app
>> is capable of processing all messages under 5 minutes)
>> 3. I'll be keeping 10 windows * (60 * 10) minutes for the first 5 minutes,
>> so my memory requirement is now 576,000 bytes? This is orders of magnitude
>> bigger.
>> 
>> I hope this gets my doubts across clearly, feel free to ask more details.
>> And thanks in advance
>> 
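
For reference, a sketch of how the window size and until() retention
discussed here look in the 0.10.2 DSL; `source` is assumed to be a
KStream<String, String>, and a count stands in for the rolling-average
aggregate:

    import java.util.concurrent.TimeUnit;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.TimeWindows;
    import org.apache.kafka.streams.kstream.Windowed;

    // 1-minute tumbling windows kept for 5 minutes to admit late arrivals;
    // until() bounds how many windows (and how much state) are retained
    KTable<Windowed<String>, Long> perMinute = source
            .groupByKey(Serdes.String(), Serdes.String())
            .count(TimeWindows.of(TimeUnit.MINUTES.toMillis(1))
                              .until(TimeUnit.MINUTES.toMillis(5)),
                   "per-minute-store");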



Re: Kafka Streams Failed to rebalance error

2017-05-03 Thread Eno Thereska
Hi,

Which version of Kafka are you using? This should be fixed in 0.10.2.1, any 
chance you could try that release?

Thanks
Eno
> On 3 May 2017, at 14:04, Sameer Kumar  wrote:
> 
> Hi,
> 
>  
> I ran two nodes in my streams compute cluster, they were running fine for few 
> hours before outputting with failure to rebalance errors.
> 
> 
> 
> I couldn't understand why this happened, but I saw one strange behaviour...
> 
> at 16:53 on node1, I saw "Failed to lock the state directory" error, this 
> might have caused the partitions to relocate and hence the error.
> 
>  
> I am attaching detailed logs for both the nodes, please see if you can help.
> 
>  
> Some of the logs for quick reference are these.
> 
>  
> 2017-05-03 16:53:53 ERROR Kafka10Base:44 - Exception caught in thread 
> StreamThread-2
> 
> org.apache.kafka.streams.errors.StreamsException: stream-thread 
> [StreamThread-2] Failed to rebalance
> 
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:612)
> 
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:368)
> 
> Caused by: org.apache.kafka.streams.errors.StreamsException: stream-thread 
> [StreamThread-2] failed to suspend stream tasks
> 
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.suspendTasksAndState(StreamThread.java:488)
> 
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.access$1200(StreamThread.java:69)
> 
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsRevoked(StreamThread.java:259)
> 
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinPrepare(ConsumerCoordinator.java:396)
> 
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:329)
> 
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:303)
> 
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:286)
> 
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1030)
> 
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995)
> 
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:582)
> 
> ... 1 more
> 
> Caused by: org.apache.kafka.clients.consumer.CommitFailedException: Commit 
> cannot be completed since the group has already rebalanced and assigned the 
> partitions to another member. This means that the time between subsequent 
> calls to poll() was longer than the configured max.poll.interval.ms 
> , which typically implies that the poll loop is 
> spending too much time message processing. You can address this either by 
> increasing the session timeout or by reducing the maximum size of batches 
> returned in poll() with max.poll.records.
> 
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:698)
> 
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:577)
> 
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1125)
> 
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.commitOffsets(StreamTask.java:296)
> 
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$3.apply(StreamThread.java:535)
> 
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.performOnAllTasks(StreamThread.java:503)
> 
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.commitOffsets(StreamThread.java:531)
> 
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.suspendTasksAndState(StreamThread.java:480)
> 
> ... 10 more
> 
>  
> 2017-05-03 16:53:57 WARN  StreamThread:1184 - Could not create task 1_38. 
> Will retry.
> 
> org.apache.kafka.streams.errors.LockException: task [1_38] Failed to lock the 
> state directory: /data/streampoc/LIC2-5/1_38
> 
> at 
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.(ProcessorStateManager.java:102)
> 
> at 
> org.apache.kafka.streams.processor.internals.AbstractTask.(AbstractTask.java:73)
> 
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:108)
> 
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:834)
> 
> at 
> 
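
The exception text above already names the relevant knobs; as a sketch, both
can be passed straight through the streams configuration (assuming `props`
is the streams Properties object, values illustrative):

    import org.apache.kafka.clients.consumer.ConsumerConfig;

    // allow more time between poll() calls before the member is kicked from the group
    props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 300000);
    // hand back smaller batches so each poll()'s records are processed faster
    props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 100);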

Re: Kafka Stream stops polling new messages

2017-05-03 Thread Eno Thereska
Cool, thanks, shall we open a JIRA?

Eno
> On 3 May 2017, at 16:16, João Peixoto <joao.harti...@gmail.com> wrote:
> 
> Actually I need to apologize, I pasted the wrong issue, I meant to paste
> https://github.com/facebook/rocksdb/issues/261.
> 
> RocksDB did not produce a crash report since it didn't actually crash. I
> performed thread dumps on stale and not-stale instances which revealed the
> common behavior and I collect and plot several Kafka metrics, including
> "punctuate" durations, therefore I know it took a long time and eventually
> finished.
> 
> Joao
> 
> On Wed, May 3, 2017 at 6:22 AM Eno Thereska <eno.there...@gmail.com> wrote:
> 
>> Hi there,
>> 
>> Thanks for double checking. Does RocksDB actually crash or produce a crash
>> dump? I’m curious how you know that the issue is
>> https://github.com/facebook/rocksdb/issues/1121 <
>> https://github.com/facebook/rocksdb/issues/1121>, so just double checking
>> with you.
>> 
>> If that’s indeed the case, do you mind opening a JIRA (a copy-paste of the
>> below should suffice)? Alternatively let us know and we’ll open it. Sounds
>> like we should handle this better.
>> 
>> Thanks,
>> Eno
>> 
>> 
>>> On May 3, 2017, at 5:49 AM, João Peixoto <joao.harti...@gmail.com>
>> wrote:
>>> 
>>> I believe I found the root cause of my problem. I seem to have hit this
>>> RocksDB bug https://github.com/facebook/rocksdb/issues/1121
>>> 
>>> On my stream configuration I have a custom transformer used for
>>> deduplicating records, highly inspired in the
>>> EventDeduplicationLambdaIntegrationTest
>>> <
>> https://github.com/confluentinc/examples/blob/3.2.x/kafka-streams/src/test/java/io/confluent/examples/streams/EventDeduplicationLambdaIntegrationTest.java#L161
>>> 
>>> but
>>> adjusted to my use case, special emphasis on the "punctuate" method.
>>> 
>>> All the stale instances had the main stream thread "RUNNING" the
>>> "punctuate" method of this transformer, which in term was running RocksDB
>>> "seekToFirst".
>>> 
>>> Also during my debugging one such instance finished the "punctuate"
>> method,
>>> which took ~11h, exactly the time the instance was stuck for.
>>> Changing the backing state store from "persistent" to "inMemory" solved
>> my
>>> issue, at least after several days running, no stuck instances.
>>> 
>>> This leads me to ask, shouldn't Kafka detect such a situation fairly
>>> quickly? Instead of just stopping polling? My guess is that the heartbeat
>>> thread which now is separate continues working fine, since by definition
>>> the stream runs a message through the whole pipeline this step probably
>>> just looked like it was VERY slow. Not sure what the best approach here
>>> would be.
>>> 
>>> PS The linked code clearly states "This code is for demonstration
>> purposes
>>> and was not tested for production usage" so that's on me
>>> 
>>> On Tue, May 2, 2017 at 11:20 AM Matthias J. Sax <matth...@confluent.io>
>>> wrote:
>>> 
>>>> Did you check the logs? Maybe you need to increase log level to DEBUG to
>>>> get some more information.
>>>> 
>>>> Did you double check committed offsets via bin/kafka-consumer-groups.sh?
>>>> 
>>>> -Matthias
>>>> 
>>>> On 4/28/17 9:22 AM, João Peixoto wrote:
>>>>> My stream gets stale after a while and it simply does not receive any
>> new
>>>>> messages, aka does not poll.
>>>>> 
>>>>> I'm using Kafka Streams 0.10.2.1 (same happens with 0.10.2.0) and the
>>>>> brokers are running 0.10.1.1.
>>>>> 
>>>>> The stream state is RUNNING and there are no exceptions in the logs.
>>>>> 
>>>>> Looking at the JMX metrics, the threads are there and running, just not
>>>>> doing anything.
>>>>> The metric "consumer-coordinator-metrics > heartbeat-response-time-max"
>>>>> (The max time taken to receive a response to a heartbeat request) reads
>>>>> 43,361 seconds (almost 12 hours) which is consistent with the time of
>> the
>>>>> hang. Shouldn't this trigger a failure somehow?
>>>>> 
>>>>> The stream configuration looks something like this:
>>>>> 
>>>>

Re: Setting up Kafka & Kafka Streams for loading real-time and 'older' data concurrently

2017-05-03 Thread Eno Thereska
Just to add to this, there is a JIRA that tracks the fact that we don’t have an 
in-memory windowed store. https://issues.apache.org/jira/browse/KAFKA-4730 


Eno
> On May 3, 2017, at 12:42 PM, Damian Guy  wrote:
> 
> The windowed state store is only RocksDB at this point, so it isn't going
> to all be in memory. If you chose to implement your own Windowed Store,
> then you could hold it in memory if it would fit.
> 
> On Wed, 3 May 2017 at 04:37 João Peixoto  wrote:
> 
>> Out of curiosity, would this mean that a state store for such a window
>> could hold 90 days worth of data in memory?
>> 
>> Or filesystem if we're talking about Rocksdb
>> On Tue, May 2, 2017 at 10:08 AM Damian Guy  wrote:
>> 
>>> Hi Garret,
>>> 
>>> No, log.retention.hours doesn't impact compacted topics.
>>> 
>>> Thanks,
>>> Damian
>>> 
>>> On Tue, 2 May 2017 at 18:06 Garrett Barton 
>>> wrote:
>>> 
 Thanks Damian,
 
 Does setting log.retention.hours have anything to do with compacted
 topics?  Meaning would a topic not compact now for 90 days? I am
>> thinking
 all the internal topics that streams creates in the flow.  Having
>>> recovery
 through 90 days of logs would take a good while I'd imagine.
 
 Thanks for clarifying that the until() does in fact set properties
>>> against
 the internal topics created.  That makes sense.
 
 On Tue, May 2, 2017 at 11:44 AM, Damian Guy 
>>> wrote:
 
> Hi Garret,
> 
> 
>> I was running into data loss when segments are deleted faster than
>> downstream can process.  My knee jerk reaction was to set the
>> broker
>> configs log.retention.hours=2160 and log.segment.delete.delay.ms=
> 2160
>> and that made it go away, but I do not think this is right?
>> 
>> 
> I think setting log.retention.hours to 2160 is correct (not sure
>> about
> log.segment.delete.delay.ms) as segment retention is based on the
>>> record
> timestamps. So if you have 90 day old data you want to process then
>> you
> should set it to at least 90 days.
> 
> 
>> For examples sake, assume a source topic 'feed', assume a stream to
>> calculate min/max/avg to start with, using windows of 1 minute and
>> 5
>> minutes.  I wish to use the interactive queries against the window
> stores,
>> and I wish to retain 90 days of window data to query.
>> 
> So I need advice for configuration of kafka, the 'feed' topic, the
>>> store
>> topics, and the stores themselves.
>> 
>> 
> When you create the Windows as part of the streams app you should
>>> specify
> them something like so: TimeWindows.of(1minute).until(90days) - in
>> this
> way
> the stores and underling changelog topics will be configured with the
> correct retention periods.
> 
> Thanks,
> Damian
> 
 
>>> 
>> 
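To make the TimeWindows.of(...).until(...) point concrete, here is a minimal sketch against the 0.10.2-era DSL; the topic, store name and serdes are placeholders rather than anything from this thread:

final KStreamBuilder builder = new KStreamBuilder();
final KStream<String, String> feed =
        builder.stream(Serdes.String(), Serdes.String(), "feed");

// 1-minute windows retained for 90 days: the windowed store and its changelog
// topic are then configured to keep 90 days of data for interactive queries.
final KTable<Windowed<String>, Long> countsPerMinute = feed
        .groupByKey(Serdes.String(), Serdes.String())
        .count(TimeWindows.of(TimeUnit.MINUTES.toMillis(1))
                          .until(TimeUnit.DAYS.toMillis(90)),
               "feed-counts-1m");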



Re: Kafka Stream stops polling new messages

2017-05-03 Thread Eno Thereska
Hi there,

Thanks for double checking. Does RocksDB actually crash or produce a crash 
dump? I’m curious how you know that the issue is 
https://github.com/facebook/rocksdb/issues/1121 
, so just double checking with 
you.

If that’s indeed the case, do you mind opening a JIRA (a copy-paste of the 
below should suffice)? Alternatively let us know and we’ll open it. Sounds like 
we should handle this better.

Thanks,
Eno


> On May 3, 2017, at 5:49 AM, João Peixoto  wrote:
> 
> I believe I found the root cause of my problem. I seem to have hit this
> RocksDB bug https://github.com/facebook/rocksdb/issues/1121
> 
> On my stream configuration I have a custom transformer used for
> deduplicating records, highly inspired in the
> EventDeduplicationLambdaIntegrationTest
> 
> but
> adjusted to my use case, special emphasis on the "punctuate" method.
> 
> All the stale instances had the main stream thread "RUNNING" the
> "punctuate" method of this transformer, which in term was running RocksDB
> "seekToFirst".
> 
> Also during my debugging one such instance finished the "punctuate" method,
> which took ~11h, exactly the time the instance was stuck for.
> Changing the backing state store from "persistent" to "inMemory" solved my
> issue, at least after several days running, no stuck instances.
> 
> This leads me to ask, shouldn't Kafka detect such a situation fairly
> quickly? Instead of just stopping polling? My guess is that the heartbeat
> thread which now is separate continues working fine, since by definition
> the stream runs a message through the whole pipeline this step probably
> just looked like it was VERY slow. Not sure what the best approach here
> would be.
> 
> PS The linked code clearly states "This code is for demonstration purposes
> and was not tested for production usage" so that's on me
> 
> On Tue, May 2, 2017 at 11:20 AM Matthias J. Sax 
> wrote:
> 
>> Did you check the logs? Maybe you need to increase log level to DEBUG to
>> get some more information.
>> 
>> Did you double check committed offsets via bin/kafka-consumer-groups.sh?
>> 
>> -Matthias
>> 
>> On 4/28/17 9:22 AM, João Peixoto wrote:
>>> My stream gets stale after a while and it simply does not receive any new
>>> messages, aka does not poll.
>>> 
>>> I'm using Kafka Streams 0.10.2.1 (same happens with 0.10.2.0) and the
>>> brokers are running 0.10.1.1.
>>> 
>>> The stream state is RUNNING and there are no exceptions in the logs.
>>> 
>>> Looking at the JMX metrics, the threads are there and running, just not
>>> doing anything.
>>> The metric "consumer-coordinator-metrics > heartbeat-response-time-max"
>>> (The max time taken to receive a response to a heartbeat request) reads
>>> 43,361 seconds (almost 12 hours) which is consistent with the time of the
>>> hang. Shouldn't this trigger a failure somehow?
>>> 
>>> The stream configuration looks something like this:
>>> 
>>> Properties props = new Properties();
>>>props.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG,
>>>  CustomTimestampExtractor.class.getName());
>>>props.put(StreamsConfig.APPLICATION_ID_CONFIG, streamName);
>>>props.put(StreamsConfig.CLIENT_ID_CONFIG, streamName);
>>>props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG,
>>> myConfig.getBrokerList());
>>>props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG,
>>> Serdes.String().getClass().getName());
>>>props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG,
>>> Serdes.ByteArray().getClass().getName());
>>>props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG,
>>> myConfig.getCommitIntervalMs()); // 5000
>>>props.put(StreamsConfig.METRICS_RECORDING_LEVEL_CONFIG, "DEBUG");
>>>props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG,
>>> myConfig.getStreamThreadsCount()); // 1
>>>props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG,
>>> myConfig.getMaxCacheBytes()); // 524_288_000L
>>>props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
>>>props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 50);
>>> 
>>> The stream LEFT JOINs 2 topics, one of them being a KTable, and outputs
>> to
>>> another topic.
>>> 
>>> Thanks in advance for the help!
>>> 
>> 
>> 
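For reference, a minimal sketch of the workaround described in this thread: building the transformer's backing store as an in-memory store instead of a persistent (RocksDB-backed) one with the 0.10.x Stores API. The store name and serdes are placeholders:

final KStreamBuilder builder = new KStreamBuilder();

// In-memory store: the same logical store the transformer looks up in init() via
// context.getStateStore("dedup-store"), but with no RocksDB underneath, so there
// is no seekToFirst() stall during punctuate().
final StateStoreSupplier dedupStore = Stores.create("dedup-store")
        .withStringKeys()
        .withByteArrayValues()
        .inMemory()          // the .persistent() variant is the RocksDB-backed one
        .build();

builder.addStateStore(dedupStore);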



Re: session window bug not fixed in 0.10.2.1?

2017-05-02 Thread Eno Thereska
Hi Ara,

The PR https://github.com/apache/kafka/pull/2645 has gone to both trunk and
0.10.2.1, I just checked. What error are you seeing, could you give us an
update?

Thanks
Eno

On Fri, Apr 28, 2017 at 7:10 PM, Ara Ebrahimi 
wrote:

> Hi,
>
> I upgraded to 0.10.2.1 yesterday, enabled caching for session windows and
> tested again. It doesn’t seem to be fixed?
>
> Ara.
>
> > On Mar 27, 2017, at 2:10 PM, Damian Guy  wrote:
> >
> > Hi Ara,
> >
> > There is a performance issue in the 0.10.2 release of session windows. It
> > is fixed with this PR: https://github.com/apache/kafka/pull/2645
> > You can work around this on 0.10.2 by calling the aggregate(..),
> reduce(..)
> > etc methods and supplying StateStoreSupplier with caching
> > disabled, i.e, by doing something like:
> >
> > final StateStoreSupplier sessionStore =
> > Stores.create(*"session-store-name"*)
> >.withKeys(Serdes.String())
> >.withValues(Serdes.String())
> >.persistent()
> >.sessionWindowed(TimeUnit.MINUTES.toMillis(7))
> >.build();
> >
> >
> > The fix has also been cherry-picked to the 0.10.2 branch, so you could
> > build from source and not have to create the StateStoreSupplier.
> >
> > Thanks,
> > Damian
> >
> > On Mon, 27 Mar 2017 at 21:56 Ara Ebrahimi 
> > wrote:
> >
> > Thanks for the response Mathias!
> >
> > The reason we want this exact task assignment to happen is that a
> critical
> > part of our pipeline involves grouping relevant records together (that’s
> > what the aggregate function in the topology is for). And for hot keys
> this
> > can lead to sometimes 100s of records to get grouped together. Even
> worse,
> > these records are session bound, we use session windows. Hence we see
> lots
> > of activity around the store backing the aggregate function and even
> though
> > we use SSD drives we’re not seeing the kind of performance we want to
> see.
> > It seems like the aggregate function leads to lots of updates to these
> hot
> > keys which lead to lots of rocksdb activity.
> >
> > Now there are many ways to fix this problem:
> > - just don’t aggregate, create an algorithm which is not reliant on
> > grouping/aggregating records. Not what we can do with our tight schedule
> > right now.
> > - do grouping/aggregating but employ n instances and rely on uniform
> > distribution of these tasks. This is the easiest solution and what we
> > expected to work but didn’t work as you can tell from this thread. We
> threw
> > 4 instances at it but only 2 got used.
> > - tune rocksdb? I tried this actually but it didn’t really help us much,
> > aside from the fact that tuning rocksdb is very tricky.
> > - use in-memory store instead? Unfortunately we have to use session
> windows
> > for this aggregate function and apparently there’s no in-memory session
> > store impl? I tried to create one but soon realized it’s too much work
> :) I
> > looked at default PartitionAssigner code too, but that ain’t trivial
> either.
> >
> > So I’m a bit hopeless :(
> >
> > Ara.
> >
> > On Mar 27, 2017, at 1:35 PM, Matthias J. Sax <matth...@confluent.io> wrote:
> >
> > From: "Matthias J. Sax" <matth...@confluent.io>
> > Subject: Re: more uniform task assignment across kafka stream nodes
> > Date: March 27, 2017 at 1:35:30 PM PDT
> > To: users@kafka.apache.org
> >
> >
> > Ara,
> >
> > thanks for the detailed information.
> >
> > If I parse this correctly, both instances run the same number of tasks
> > (12 each). That is all Streams promises.
> >
> > To come back to your initial question:
> >
> > Is there a way to tell kafka streams to uniformly assign partitions
> across
> > instances? If I have n kafka streams instances running, I want each to
> > handle EXACTLY 1/nth number of partitions. No dynamic task assignment
> > logic. Just dumb 1/n assignment.
> >
> > That is exactly what you get: each of you two instances get 24/2 = 12
> > tasks assigned. That is dumb 1/n assignment, isn't it? So my original
> > response was correct.
> >
> > However, I now understand better what you are actually meaning by your
> > question. Note that Streams does not distinguish "type" of tasks -- it
> > only sees 24 tasks and assigns those in a balanced way.
> >
> > Thus, currently there is no easy way to get the assignment you want to
> > have, except, you implement your own 

Re: Kafka Streams 0.10.0.1 - multiple consumers not receiving messages

2017-05-02 Thread Eno Thereska
Could you make sure you don’t have a firewall in the way, and that the Kafka brokers are set 
up correctly and can be accessed? Is the SSL port the same as the PLAINTEXT 
port in your server.properties file? E.g., see this: 
https://stackoverflow.com/questions/43534220/marking-the-coordinator-dead-for-groupkafka/43537521
 
<https://stackoverflow.com/questions/43534220/marking-the-coordinator-dead-for-groupkafka/43537521>

Eno
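On the SSL point, a minimal sketch of the client-side settings a Streams application typically needs when the brokers expose an SSL listener; the port, path and password below are placeholders, not values from this thread:

final Properties config = new Properties();
// bootstrap.servers must point at the brokers' SSL listener, not the PLAINTEXT one.
config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9093");   // placeholder port
config.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SSL");
// Truststore containing the CA that signed the broker certificates (placeholder path/password).
config.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, "/var/private/ssl/client.truststore.jks");
config.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, "changeit");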
> On May 2, 2017, at 10:59 AM, Henry Thacker <he...@henrythacker.com> wrote:
> 
> Hi Eno,
> 
> At the moment this is hard coded, but overridable with command line
> parameters:
> 
> config.put(StreamsConfig.APPLICATION_ID_CONFIG, appId + "-" + topic);
> config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, brokers);
> config.put(StreamsConfig.ZOOKEEPER_CONNECT_CONFIG, zookeepers);
> config.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG,
> Serdes.Bytes().getClass().getName());
> config.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG,
> Serdes.Bytes().getClass().getName());
> config.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG,
> maxMessageBytes);
> config.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, maxMessageBytes);
> config.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG,
> WallclockTimestampExtractor.class.getName());
> config.put(StreamsConfig.STATE_DIR_CONFIG, tmpDir);
> config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
> config.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 200);
> config.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 2);
> config.put(ProducerConfig.RETRIES_CONFIG, 2);
> 
> if (ssl)
> config.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SSL");
> 
> Variables:
> appId - "my-streamer-app"
> topic - "20170502_instancea_1234"
> brokers - "localhost:9092,localhost:9093,localhost:9094"
> zookeepers - "localhost:2181,localhost:2182,localhost:2183"
> maxMessageBytes - 3000
> ssl - true
> 
> Thanks,
> Henry
> -- 
> Henry Thacker
> 
> On 2 May 2017 at 10:16:25, Eno Thereska (eno.there...@gmail.com) wrote:
> 
>> Hi Henry,
>> 
>> Could you share the streams configuration for your apps? I.e., the part
>> where you assign application id and all the rest of the configs (just
>> configs, not code).
>> 
>> Thanks
>> Eno
>> 
>> On May 2, 2017, at 8:53 AM, Henry Thacker <he...@henrythacker.com> wrote:
>> 
>> Thanks all for your replies - I have checked out the docs which were very
>> helpful.
>> 
>> I have now moved the separate topic streams to different processes each
>> with their own app.id and I'm getting the following pattern, with no data
>> consumed:
>> 
>> "Starting stream thread [StreamThread-1]
>> Discovered coordinator  for group ..
>> Marking the coordinator  dead for group ..
>> Discovered coordinator  for group ..
>> Marking the coordinator  dead for group .."
>> 
>> The discover and dead states repeat every few minutes.
>> 
>> During this time, the broker logs look happy.
>> 
>> One other, hopefully unrelated point, is this cluster is all SSL
>> encrypted.
>> 
>> Thanks,
>> Henry
>> 
>> --
>> Henry Thacker
>> 
>> On 29 April 2017 at 05:31:30, Matthias J. Sax (matth...@confluent.io)
>> wrote:
>> 
>> Henry,
>> 
>> you might want to check out the docs, that give an overview of the
>> architecture:
>> http://docs.confluent.io/current/streams/architecture.html#example
>> 
>> Also, I am wondering why your application did not crash: I would expect
>> an exception like
>> 
>> java.lang.IllegalArgumentException: Assigned partition foo-2 for
>> non-subscribed topic regex pattern; subscription pattern is bar
>> 
>> Maybe you just don't hit it, because both topics have a single partition
>> and not multiple.
>> 
>> Out of interest though, had I subscribed for both topics in one subscriber
>> - I would have expected records for both topics interleaved
>> 
>> 
>> Yes. That should happen.
>> 
>> why when
>> 
>> running this in two separate processes do I not observe the same?
>> 
>> 
>> Not sure what you mean by this?
>> 
>> If I fix this by changing the application ID for each streaming process -
>> does this mean I lose the ability to share state stores between the
>> applications?
>> 
>> 
>> Yes.
>> 
>> 
>> If both your topics are single partitioned, and you want to share state,
>> you will not be able to run with more than one thread in your Streams app.
>> 
>> The on

Re: Failure on timestamp extraction for kafka streams 0.10.2.0

2017-05-02 Thread Eno Thereska
Hi Sachin,

This should be fixed in 0.10.2.1, could you upgrade to that release? Here is 
JIRA: https://issues.apache.org/jira/browse/KAFKA-4861 
.

Thanks
Eno
> On May 2, 2017, at 8:43 AM, Sachin Mittal  wrote:
> 
> The timestamp of the message is out of acceptable range



Re: Kafka Streams 0.10.0.1 - multiple consumers not receiving messages

2017-05-02 Thread Eno Thereska
Hi Henry,

Could you share the streams configuration for your apps? I.e., the part where 
you assign application id and all the rest of the configs (just configs, not 
code).

Thanks
Eno
> On May 2, 2017, at 8:53 AM, Henry Thacker  wrote:
> 
> Thanks all for your replies - I have checked out the docs which were very
> helpful.
> 
> I have now moved the separate topic streams to different processes each
> with their own app.id and I'm getting the following pattern, with no data
> consumed:
> 
> "Starting stream thread [StreamThread-1]
> Discovered coordinator  for group ..
> Marking the coordinator  dead for group ..
> Discovered coordinator  for group ..
> Marking the coordinator  dead for group .."
> 
> The discover and dead states repeat every few minutes.
> 
> During this time, the broker logs look happy.
> 
> One other, hopefully unrelated point, is this cluster is all SSL encrypted.
> 
> Thanks,
> Henry
> 
> -- 
> Henry Thacker
> 
> On 29 April 2017 at 05:31:30, Matthias J. Sax (matth...@confluent.io) wrote:
> 
>> Henry,
>> 
>> you might want to check out the docs, that give an overview of the
>> architecture:
>> http://docs.confluent.io/current/streams/architecture.html#example
>> 
>> Also, I am wondering why your application did not crash: I would expect
>> an exception like
>> 
>> java.lang.IllegalArgumentException: Assigned partition foo-2 for
>> non-subscribed topic regex pattern; subscription pattern is bar
>> 
>> Maybe you just don't hit it, because both topics have a single partition
>> and not multiple.
>> 
>> Out of interest though, had I subscribed for both topics in one subscriber
>> - I would have expected records for both topics interleaved
>> 
>> 
>> Yes. That should happen.
>> 
>> why when
>> 
>> running this in two separate processes do I not observe the same?
>> 
>> 
>> Not sure what you mean by this?
>> 
>> If I fix this by changing the application ID for each streaming process -
>> does this mean I lose the ability to share state stores between the
>> applications?
>> 
>> 
>> Yes.
>> 
>> 
>> If both your topics are single partitioned, and you want to share state,
>> you will not be able to run with more than one thread in your Streams app.
>> 
>> The only way to work around this, would be to copy the data into another
>> topic with more partitions before you process them -- of course, this
>> would mean data duplication.
>> 
>> 
>> -Matthias
>> 
>> 
>> On 4/28/17 12:45 PM, Henry Thacker wrote:
>> 
>> Thanks Michael and Eno for your help - I always thought the unit of
>> parallelism was a combination of topic & partition rather than just
>> partition.
>> 
>> Out of interest though, had I subscribed for both topics in one subscriber
>> - I would have expected records for both topics interleaved, why when
>> running this in two separate processes do I not observe the same? Just
>> wanting to try and form a mental model of how this is all working - I will
>> try and look through some code over the weekend.
>> 
>> If I fix this by changing the application ID for each streaming process -
>> does this mean I lose the ability to share state stores between the
>> applications?
>> 
>> Unfortunately the data on the input topics are provided by a third party
>> component which sends these keyless messages on a single partition per
>> topic, so I have little ability to fix this at source :-(
>> 
>> Thanks,
>> Henry
>> 
>> 
>> --
>> 



Re: Kafka-streams process stopped processing messages

2017-05-01 Thread Eno Thereska
Hi Shimi,

0.10.2.1 contains a number of fixes that should make the out-of-the-box experience 
better, including resiliency under broker failures and better exception 
handling. If you ever get back to it, and if the problem happens again, please 
do send us the logs and we'll happily have a look.

Thanks
Eno
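Since the trigger here was an NPE that killed the stream processing thread, one small safeguard (a sketch; builder and props stand for whatever the application already uses) is to register an uncaught exception handler so a dying StreamThread is at least logged and can be alerted on:

final KafkaStreams streams = new KafkaStreams(builder, props);

// Without a handler, a thread that hits an unhandled exception just dies and the
// instance can sit in the group doing nothing; log it and let the orchestrator react.
streams.setUncaughtExceptionHandler((thread, throwable) ->
        System.err.println("Stream thread " + thread.getName() + " died: " + throwable));

streams.start();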
> On 1 May 2017, at 12:05, Shimi Kiviti <shim...@gmail.com> wrote:
> 
> Hi Eno,
> I am afraid I played too much with the configuration to make this
> productive investigation :(
> 
> This is a QA environment which includes 2 kafka instances and 3 zookeeper
> instances in AWS. There are only 3 partition for this topic.
> Kafka broker and kafka-stream are version 0.10.1.1
> Our kafka-stream app run on docker using kubernetes.
> I played around with 1 to 3 kafka-stream processes, but I got the
> same results. It is too easy to scale with kubernetes :)
> Since there are only 3 partitions, I didn't start more than 3 instances.
> 
> I was too quick to upgrade only the kafka-stream app to 0.10.2.1, hoping
> that it would solve the problem. It didn't.
> The log I sent before are from this version.
> 
> I did notice "unknown" offset for the main topic with kafka-stream version
> 0.10.2.1
> $ ./bin/kafka-consumer-groups.sh   --bootstrap-server localhost:9092
> --describe --group sa
> GROUP   TOPIC      PARTITION   CURRENT-OFFSET   LOG-END-OFFSET   LAG       OWNER
> sa      sa-events  0           842199           842199           0         sa-4557bf2d-ba79-42a6-aa05-5b4c9013c022-StreamThread-1-consumer_/10.0.10.9
> sa      sa-events  1           1078428          1078428          0         sa-4557bf2d-ba79-42a6-aa05-5b4c9013c022-StreamThread-1-consumer_/10.0.10.9
> sa      sa-events  2           unknown          26093910         unknown   sa-4557bf2d-ba79-42a6-aa05-5b4c9013c022-StreamThread-1-consumer_/10.0.10.9
> 
> After that I downgraded the kafka-stream app back to version 0.10.1.1
> After a LONG startup time (more than an hour) where the status of the group
> was rebalancing, all the 3 processes started processing messages again.
> 
> This whole thing started after we hit a bug in our code (NPE) that crashed
> the stream processing thread.
> So now after 4 days, everything is back to normal.
> This worries me since it can happen again
> 
> 
> On Mon, May 1, 2017 at 11:45 AM, Eno Thereska <eno.there...@gmail.com>
> wrote:
> 
>> Hi Shimi,
>> 
>> Could you provide more info on your setup? How many Kafka Streams
>> processes do you have, and how many partitions are they consuming from?
>> If you have more processes than partitions, some of the processes will be
>> idle and won’t do anything.
>> 
>> Eno
>>> On Apr 30, 2017, at 5:58 PM, Shimi Kiviti <shim...@gmail.com> wrote:
>>> 
>>> Hi Everyone,
>>> 
>>> I have a problem and I hope one of you can help me figuring it out.
>>> One of our kafka-streams processes stopped processing messages
>>> 
>>> When I turn on debug log I see lots of these messages:
>>> 
>>> 2017-04-30 15:42:20,228 [StreamThread-1] DEBUG o.a.k.c.c.i.Fetcher:
>> Sending
>>> fetch for partitions [devlast-changelog-2] to broker ip-x-x-x-x
>>> .ec2.internal:9092 (id: 1 rack: null)
>>> 2017-04-30 15:42:20,696 [StreamThread-1] DEBUG o.a.k.c.c.i.Fetcher:
>>> Ignoring fetched records for devlast-changelog-2 at offset 2962649 since
>>> the current position is 2963379
>>> 
>>> After a LONG time, the only messages in the log are these:
>>> 
>>> 2017-04-30 16:46:33,324 [kafka-coordinator-heartbeat-thread | sa] DEBUG
>>> o.a.k.c.c.i.AbstractCoordinator: Sending Heartbeat request for group sa
>> to
>>> coordinator ip-x-x-x-x.ec2.internal:9092 (id: 2147483646 rack: null)
>>> 2017-04-30 16:46:33,425 [kafka-coordinator-heartbeat-thread | sa] DEBUG
>>> o.a.k.c.c.i.AbstractCoordinator: Received successful Heartbeat response
>> for
>>> group same
>>> 
>>> Any idea?
>>> 
>>> Thanks,
>>> Shimi
>> 
>> 



Re: Kafka-streams process stopped processing messages

2017-05-01 Thread Eno Thereska
Hi Shimi,

Could you provide more info on your setup? How many Kafka Streams processes do 
you have, and how many partitions are they consuming from? If you have more 
processes than partitions, some of the processes will be idle and won’t do 
anything.

Eno
> On Apr 30, 2017, at 5:58 PM, Shimi Kiviti  wrote:
> 
> Hi Everyone,
> 
> I have a problem and I hope one of you can help me figuring it out.
> One of our kafka-streams processes stopped processing messages
> 
> When I turn on debug log I see lots of these messages:
> 
> 2017-04-30 15:42:20,228 [StreamThread-1] DEBUG o.a.k.c.c.i.Fetcher: Sending
> fetch for partitions [devlast-changelog-2] to broker ip-x-x-x-x
> .ec2.internal:9092 (id: 1 rack: null)
> 2017-04-30 15:42:20,696 [StreamThread-1] DEBUG o.a.k.c.c.i.Fetcher:
> Ignoring fetched records for devlast-changelog-2 at offset 2962649 since
> the current position is 2963379
> 
> After a LONG time, the only messages in the log are these:
> 
> 2017-04-30 16:46:33,324 [kafka-coordinator-heartbeat-thread | sa] DEBUG
> o.a.k.c.c.i.AbstractCoordinator: Sending Heartbeat request for group sa to
> coordinator ip-x-x-x-x.ec2.internal:9092 (id: 2147483646 rack: null)
> 2017-04-30 16:46:33,425 [kafka-coordinator-heartbeat-thread | sa] DEBUG
> o.a.k.c.c.i.AbstractCoordinator: Received successful Heartbeat response for
> group same
> 
> Any idea?
> 
> Thanks,
> Shimi



Re: Kafka Streams 0.10.0.1 - multiple consumers not receiving messages

2017-04-28 Thread Eno Thereska
Hi Henry,

Kafka Streams scales differently and does not support having multiple instances with 
the same application ID subscribe to different topics for scale-out. If you want to 
keep a single application ID, scaling out happens through partitions, i.e., Kafka 
Streams automatically assigns partitions across your instances. If you want to scale 
out by topic, you'll need to use different application IDs.

So in a nutshell this pattern is not supported. Was there a reason you needed 
to do it like that? 

Thanks
Eno
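A minimal sketch of the different-application-IDs option, assuming one process is started per single-partition input topic; the names are placeholders:

final String topic = args[0];   // e.g. each verifier instance is started with its own input topic

final Properties props = new Properties();
// A distinct application.id per topic means a distinct consumer group per topic,
// so each process gets its topic's partitions assigned independently.
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streamer-app-" + topic);
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder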

> On 28 Apr 2017, at 11:41, Henry Thacker <he...@henrythacker.com> wrote:
> 
> Should also add - there are definitely live incoming messages on both input
> topics when my streams are running. The auto offset reset config is set to
> "earliest" and because the input data streams are quite large (several
> millions records each), I set a relatively small max poll records (200) so
> we don't run into heartbeating issues if we restart intraday.
> 
> Thanks,
> Henry
> 
> -- 
> Henry Thacker
> 
> On 28 April 2017 at 11:37:53, Henry Thacker (he...@henrythacker.com) wrote:
> 
>> Hi Eno,
>> 
>> Thanks for your reply - the code that builds the topology is something
>> like this (I don't have email and the code access on the same machine
>> unfortunately - so might not be 100% accurate / terribly formatted!).
>> 
>> The stream application is a simple verifier which stores a tiny bit of
>> state in a state store. The processor is custom and only has logic in
>> init() to store the context and retrieve the store and process(...) to
>> validate the incoming messages and forward these on when appropriate.
>> 
>> There is no joining, aggregates or windowing.
>> 
>> In public static void main:
>> 
>> String topic = args[0];
>> String output = args[1];
>> 
>> KStreamBuilder builder = new KStreamBuilder();
>> 
>> StateStoreSupplier stateStore =
>> Stores.create("mystore").withStringKeys().withByteArrayValues().persistent().build();
>> 
>> KStream<Bytes, Bytes> stream = builder.stream(topic);
>> 
>> builder.addStateStore(stateStore);
>> 
>> stream.process(this::buildStreamProcessor, "mystore");
>> 
>> stream.to(outputTopic);
>> 
>> KafkaStreams streams = new KafkaStreams(builder, getProps());
>> streams.setUncaughtExceptionHandler(...);
>> streams.start();
>> 
>> Thanks,
>> Henry
>> 
>> 
>> On 28 April 2017 at 11:26:07, Eno Thereska (eno.there...@gmail.com) wrote:
>> 
>>> Hi Henry,
>>> 
>>> Could you share the code that builds your topology so we see how the
>>> topics are passed in? Also, this would depend on what the streaming logic
>>> is doing with the topics, e.g., if you're joining them then both partitions
>>> need to be consumed by the same instance.
>>> 
>>> Eno
>>> 
>>> On 28 Apr 2017, at 11:01, Henry Thacker <he...@henrythacker.com> wrote:
>>> 
>>> Hi,
>>> 
>>> I'm using Kafka 0.10.0.1 and Kafka streams. When I have two different
>>> processes, Consumer 1 and 2. They both share the same application ID, but
>>> subscribe for different single-partition topics. Only one stream consumer
>>> receives messages.
>>> 
>>> The non working stream consumer just sits there logging:
>>> 
>>> Starting stream thread [StreamThread-1]
>>> Discovered coordinator  (Id: ...) for group my-streamer
>>> Revoking previously assigned partitions [] for group my-streamer
>>> (Re-)joining group my-streamer
>>> Successfully joined group my-streamer with generation 3
>>> Setting newly assigned partitions [] for group my-streamer
>>> (Re-)joining group my-streamer
>>> Successfully joined group my-streamer with generation 4
>>> 
>>> If I was trying to subscribe to the same topic & partition I could
>>> understand this behaviour, but given that the subscriptions are for
>>> different input topics, I would have thought this should work?
>>> 
>>> Thanks,
>>> Henry
>>> 
>>> --
>>> Henry Thacker
>>> 
>>> 
>>> 



Re: Kafka Streams 0.10.0.1 - multiple consumers not receiving messages

2017-04-28 Thread Eno Thereska
Hi Henry,

Could you share the code that builds your topology so we see how the topics are 
passed in? Also, this would depend on what the streaming logic is doing with 
the topics, e.g., if you're joining them then both partitions need to be 
consumed by the same instance.

Eno
> On 28 Apr 2017, at 11:01, Henry Thacker  wrote:
> 
> Hi,
> 
> I'm using Kafka 0.10.0.1 and Kafka streams. When I have two different
> processes, Consumer 1 and 2. They both share the same application ID, but
> subscribe for different single-partition topics. Only one stream consumer
> receives messages.
> 
> The non working stream consumer just sits there logging:
> 
> Starting stream thread [StreamThread-1]
> Discovered coordinator  (Id: ...) for group my-streamer
> Revoking previously assigned partitions [] for group my-streamer
> (Re-)joining group my-streamer
> Successfully joined group my-streamer with generation 3
> Setting newly assigned partitions [] for group my-streamer
> (Re-)joining group my-streamer
> Successfully joined group my-streamer with generation 4
> 
> If I was trying to subscribe to the same topic & partition I could
> understand this behaviour, but given that the subscriptions are for
> different input topics, I would have thought this should work?
> 
> Thanks,
> Henry
> 
> -- 
> Henry Thacker



Re: Stream applications dying on broker ISR change

2017-04-25 Thread Eno Thereska
Hi Ian,

Any chance you could share the full log? Feel free to send it to me directly if 
you don't want to broadcast it everywhere.

Thanks
Eno


> On 25 Apr 2017, at 17:36, Ian Duffy <i...@ianduffy.ie> wrote:
> 
> Thanks again for the quick response Eno.
> 
> We just left the application running in the hope it would recover; After
> ~1hour it's still just continuously spilling out the same exception and not
> managing to continue processing.
> 
> On 25 April 2017 at 16:24, Eno Thereska <eno.there...@gmail.com> wrote:
> 
>> Hi Ian,
>> 
>> Retries are sometimes expected and don't always indicate a problem. We
>> should probably adjust the printing of the messages to not print this
>> warning frequently. Are you seeing any crash or does the app proceed?
>> 
>> Thanks
>> Eno
>> 
>> On 25 Apr 2017 4:02 p.m., "Ian Duffy" <i...@ianduffy.ie> wrote:
>> 
>> Upgraded a handful of our streams applications to 0.10.2.1 as suggested.
>> Seeing much less issues and much smoother performance.
>> They withstood ISR changes.
>> 
>> Seen the following when more consumers were added to a consumer group:
>> 
>> 2017-04-25 14:57:37,200 - [WARN] - [1.1.0-11] - [StreamThread-2]
>> o.a.k.s.p.internals.StreamThread - Could not create task 1_21. Will retry.
>> org.apache.kafka.streams.errors.LockException: task [1_21] Failed to lock
>> the state directory for task 1_21
>> at org.apache.kafka.streams.processor.internals.ProcessorStateManager.<init>(ProcessorStateManager.java:100)
>> at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:73)
>> at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:108)
>> at org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:864)
>> at org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:1237)
>> at org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:1210)
>> at org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:967)
>> at org.apache.kafka.streams.processor.internals.StreamThread.access$600(StreamThread.java:69)
>> at org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:234)
>> at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:259)
>> at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:352)
>> at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:303)
>> at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:290)
>> at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1029)
>> at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995)
>> at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:592)
>> at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:361)
>> 
>> 
>> 
>> On 24 April 2017 at 16:02, Eno Thereska <eno.there...@gmail.com> wrote:
>> 
>>> Hi Sachin,
>>> 
>>> In KIP-62 a background heartbeat thread was introduced to deal with the
>>> group protocol arrivals and departures. There is a setting called
>>> session.timeout.ms that specifies the timeout of that background thread.
>>> So if the thread has died that background thread will also die and the
>>> right thing will happen.
>>> 
>>> Eno
>>> 
>>>> On 24 Apr 2017, at 15:34, Sachin Mittal <sjmit...@gmail.com> wrote:
>>>> 
>>>> I had a question about this setting
>>>> ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG,
>>> Integer.toString(Integer.MAX_
>>>> VALUE)
>>>> 
>>>> How would the broker know if a thread has died or say we simply stopped
>>> an
>>>> instance and needs to be booted out of the group.
>>>> 
>>>> Thanks
>>>> Sachin
>>>> 
>>>> 
>>>> On Mon, Apr 24, 2017 at 5:55 PM, Eno Thereska <eno.there...@gmail.com>

Re: Stream applications dying on broker ISR change

2017-04-25 Thread Eno Thereska
Hi Ian,

Retries are sometimes expected and don't always indicate a problem. We
should probably adjust the printing of the messages to not print this
warning frequently. Are you seeing any crash or does the app proceed?

Thanks
Eno

On 25 Apr 2017 4:02 p.m., "Ian Duffy" <i...@ianduffy.ie> wrote:

Upgraded a handful of our streams applications to 0.10.2.1 as suggested.
Seeing much less issues and much smoother performance.
They withstood ISR changes.

Seen the following when more consumers were added to a consumer group:

2017-04-25 14:57:37,200 - [WARN] - [1.1.0-11] - [StreamThread-2]
o.a.k.s.p.internals.StreamThread - Could not create task 1_21. Will retry.
org.apache.kafka.streams.errors.LockException: task [1_21] Failed to lock
the state directory for task 1_21
at org.apache.kafka.streams.processor.internals.ProcessorStateManager.<init>(ProcessorStateManager.java:100)
at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:73)
at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:108)
at org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:864)
at org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:1237)
at org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:1210)
at org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:967)
at org.apache.kafka.streams.processor.internals.StreamThread.access$600(StreamThread.java:69)
at org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:234)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:259)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:352)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:303)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:290)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1029)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:592)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:361)



On 24 April 2017 at 16:02, Eno Thereska <eno.there...@gmail.com> wrote:

> Hi Sachin,
>
> In KIP-62 a background heartbeat thread was introduced to deal with the
> group protocol arrivals and departures. There is a setting called
> session.timeout.ms that specifies the timeout of that background thread.
> So if the thread has died that background thread will also die and the
> right thing will happen.
>
> Eno
>
> > On 24 Apr 2017, at 15:34, Sachin Mittal <sjmit...@gmail.com> wrote:
> >
> > I had a question about this setting
> > ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG,
> Integer.toString(Integer.MAX_
> > VALUE)
> >
> > How would the broker know if a thread has died or say we simply stopped
> an
> > instance and needs to be booted out of the group.
> >
> > Thanks
> > Sachin
> >
> >
> > On Mon, Apr 24, 2017 at 5:55 PM, Eno Thereska <eno.there...@gmail.com>
> > wrote:
> >
> >> Hi Ian,
> >>
> >>
> >> This is now fixed in 0.10.2.1. The default configuration needs tweaking. If
> If
> >> you can't pick that up (it's currently being voted), make sure you have
> >> these two parameters set as follows in your streams config:
> >>
> >> final Properties props = new Properties();
> >> ...
> >> props.put(ProducerConfig.RETRIES_CONFIG, 10);  // <-- increase to 10 from default of 0
> >> props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG,
> >>           Integer.toString(Integer.MAX_VALUE)); // <-- increase to infinity from default of 300 s
> >>
> >> Thanks
> >> Eno
> >>
> >>> On 24 Apr 2017, at 10:38, Ian Duffy <i...@ianduffy.ie> wrote:
> >>>
> >>> Hi All,
> >>>
> >>> We're running multiple Kafka Stream applications using Kafka client
> >>> 0.10.2.0 against a 6 node broker cluster running 0.10.1.1
> >>> Additionally, we're running Kafka Connect 0.10.2.0 with the
> ElasticSearch
> >>> connector by confluent [1]
> >>>
> >>> On an ISR change occurring on the brokers, all of the streams
> >> applications
> >>> and the Kafka connect ES connector threw exceptions and never
> recov

Re: [ANNOUNCE] New committer: Rajini Sivaram

2017-04-25 Thread Eno Thereska
Congrats!

Eno
> On Apr 25, 2017, at 12:17 PM, Rajini Sivaram  wrote:
> 
> Thanks everyone!
> 
> It has been a pleasure working with all of you in the Kafka community. Many
> thanks to the PMC for this exciting opportunity.
> 
> Regards,
> 
> Rajini
> 
> On Tue, Apr 25, 2017 at 10:51 AM, Damian Guy  wrote:
> 
>> Congrats
>> On Tue, 25 Apr 2017 at 09:57, Mickael Maison 
>> wrote:
>> 
>>> Congratulation Rajini !
>>> Great news
>>> 
>>> On Tue, Apr 25, 2017 at 8:54 AM, Edoardo Comar 
>> wrote:
 Congratulations Rajini !!!
 Well deserved
 --
 Edoardo Comar
 IBM MessageHub
 eco...@uk.ibm.com
 IBM UK Ltd, Hursley Park, SO21 2JN
 
 IBM United Kingdom Limited Registered in England and Wales with number
 741598 Registered office: PO Box 41, North Harbour, Portsmouth, Hants.
>>> PO6
 3AU
 
 
 
 From:   Gwen Shapira 
 To: d...@kafka.apache.org, Users ,
 priv...@kafka.apache.org
 Date:   24/04/2017 22:07
 Subject:[ANNOUNCE] New committer: Rajini Sivaram
 
 
 
 The PMC for Apache Kafka has invited Rajini Sivaram as a committer and
>> we
 are pleased to announce that she has accepted!
 
 Rajini contributed 83 patches, 8 KIPs (all security and quota
 improvements) and a significant number of reviews. She is also on the
 conference committee for Kafka Summit, where she helped select content
 for our community event. Through her contributions she's shown good
 judgement, good coding skills, willingness to work with the community
>> on
 finding the best
 solutions and very consistent follow through on her work.
 
 Thank you for your contributions, Rajini! Looking forward to many more
>> :)
 
 Gwen, for the Apache Kafka PMC
 
 
 
 Unless stated otherwise above:
 IBM United Kingdom Limited - Registered in England and Wales with
>> number
 741598.
 Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
>>> 3AU
>>> 
>> 



Re: Stream applications dying on broker ISR change

2017-04-24 Thread Eno Thereska
Hi Sachin,

In KIP-62 a background heartbeat thread was introduced to deal with the group 
protocol arrivals and departures. There is a setting called session.timeout.ms 
that specifies the timeout of that background thread. So if the thread has died 
that background thread will also die and the right thing will happen.

Eno
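For concreteness, a sketch of where those settings live in a Streams configuration; the values are placeholders, and consumer-level keys like these are passed through to the internally created consumers:

final Properties props = new Properties();
// ... application.id, bootstrap.servers, serdes, etc. ...

// Heartbeats come from the background thread (KIP-62): if the instance is stopped
// or dies, heartbeats stop and the coordinator evicts it after session.timeout.ms.
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 30000);            // placeholder value

// max.poll.interval.ms only bounds the time between poll() calls on a live thread.
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, Integer.toString(Integer.MAX_VALUE));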

> On 24 Apr 2017, at 15:34, Sachin Mittal <sjmit...@gmail.com> wrote:
> 
> I had a question about this setting
> ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, Integer.toString(Integer.MAX_
> VALUE)
> 
> How would the broker know if a thread has died, or if we simply stopped an
> instance and it needs to be booted out of the group?
> 
> Thanks
> Sachin
> 
> 
> On Mon, Apr 24, 2017 at 5:55 PM, Eno Thereska <eno.there...@gmail.com>
> wrote:
> 
>> Hi Ian,
>> 
>> 
>> This is now fixed in 0.10.2.1. The default configuration needs tweaking. If
>> you can't pick that up (it's currently being voted), make sure you have
>> these two parameters set as follows in your streams config:
>> 
>> final Properties props = new Properties();
>> ...
>> props.put(ProducerConfig.RETRIES_CONFIG, 10);  // <-- increase to 10 from default of 0
>> props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG,
>>           Integer.toString(Integer.MAX_VALUE)); // <-- increase to infinity from default of 300 s
>> 
>> Thanks
>> Eno
>> 
>>> On 24 Apr 2017, at 10:38, Ian Duffy <i...@ianduffy.ie> wrote:
>>> 
>>> Hi All,
>>> 
>>> We're running multiple Kafka Stream applications using Kafka client
>>> 0.10.2.0 against a 6 node broker cluster running 0.10.1.1
>>> Additionally, we're running Kafka Connect 0.10.2.0 with the ElasticSearch
>>> connector by confluent [1]
>>> 
>>> On an ISR change occurring on the brokers, all of the streams
>> applications
>>> and the Kafka connect ES connector threw exceptions and never recovered.
>>> 
>>> We've seen a correlation between Kafka Broker ISR change and stream
>>> applications dying.
>>> 
>>> The logs from the streams applications throw out the following and fail
>> to
>>> recover:
>>> 
>>> 07:01:23.323 stream-processor /var/log/application.log  2017-04-24
>>> 06:01:23,323 - [WARN] - [1.1.0-6] - [StreamThread-1]
>>> o.a.k.s.p.internals.StreamThread - Unexpected state transition from
>> RUNNING
>>> to NOT_RUNNING
>>> 07:01:23.323 stream-processor /var/log/application.log  2017-04-24
>>> 06:01:23,324 - [ERROR] - [1.1.0-6] - [StreamThread-1] Application -
>>> Unexpected Exception caught in thread [StreamThread-1]:
>>> org.apache.kafka.streams.errors.StreamsException: Exception caught in
>>> process. taskId=0_81, processor=KSTREAM-SOURCE-00,
>>> topic=kafka-topic, partition=81, offset=479285
>>> at
>>> org.apache.kafka.streams.processor.internals.
>> StreamTask.process(StreamTask.java:216)
>>> at
>>> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(
>> StreamThread.java:641)
>>> at
>>> org.apache.kafka.streams.processor.internals.
>> StreamThread.run(StreamThread.java:368)
>>> Caused by: org.apache.kafka.streams.errors.StreamsException: task [0_81]
>>> exception caught when producing
>>> at
>>> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.
>> checkForException(RecordCollectorImpl.java:119)
>>> at
>>> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(
>> RecordCollectorImpl.java:76)
>>> at
>>> org.apache.kafka.streams.processor.internals.SinkNode.
>> process(SinkNode.java:79)
>>> at
>>> org.apache.kafka.streams.processor.internals.
>> ProcessorContextImpl.forward(ProcessorContextImpl.java:83)
>>> at
>>> org.apache.kafka.streams.kstream.internals.KStreamFlatMap$
>> KStreamFlatMapProcessor.process(KStreamFlatMap.java:43)
>>> at
>>> org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(
>> ProcessorNode.java:48)
>>> at
>>> org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.
>> measureLatencyNs(StreamsMetricsImpl.java:188)
>>> at
>>> org.apache.kafka.streams.processor.internals.ProcessorNode.process(
>> ProcessorNode.java:134)
>>> at
>>> org.apache.kafka.streams.processor.internals.
>> ProcessorContextImpl.forward(ProcessorContextImpl.java:83)
>>> at
>>> org.apache.kafka.streams.processor.internals.
>> SourceNode.process(SourceNode.java:7

Re: Stream applications dying on broker ISR change

2017-04-24 Thread Eno Thereska
Hi Ian,


This is now fixed in 0.10.2.1. The default configuration needs tweaking. If you 
can't pick that up (it's currently being voted), make sure you have these two 
parameters set as follows in your streams config:

final Properties props = new Properties();
...
props.put(ProducerConfig.RETRIES_CONFIG, 10);  // <-- increase to 10 from default of 0
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG,
          Integer.toString(Integer.MAX_VALUE)); // <-- increase to infinity from default of 300 s

Thanks
Eno

> On 24 Apr 2017, at 10:38, Ian Duffy  wrote:
> 
> Hi All,
> 
> We're running multiple Kafka Stream applications using Kafka client
> 0.10.2.0 against a 6 node broker cluster running 0.10.1.1
> Additionally, we're running Kafka Connect 0.10.2.0 with the ElasticSearch
> connector by confluent [1]
> 
> On an ISR change occurring on the brokers, all of the streams applications
> and the Kafka connect ES connector threw exceptions and never recovered.
> 
> We've seen a correlation between Kafka Broker ISR change and stream
> applications dying.
> 
> The logs from the streams applications throw out the following and fail to
> recover:
> 
> 07:01:23.323 stream-processor /var/log/application.log  2017-04-24
> 06:01:23,323 - [WARN] - [1.1.0-6] - [StreamThread-1]
> o.a.k.s.p.internals.StreamThread - Unexpected state transition from RUNNING
> to NOT_RUNNING
> 07:01:23.323 stream-processor /var/log/application.log  2017-04-24
> 06:01:23,324 - [ERROR] - [1.1.0-6] - [StreamThread-1] Application -
> Unexpected Exception caught in thread [StreamThread-1]:
> org.apache.kafka.streams.errors.StreamsException: Exception caught in
> process. taskId=0_81, processor=KSTREAM-SOURCE-00,
> topic=kafka-topic, partition=81, offset=479285
> at
> org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:216)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:641)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:368)
> Caused by: org.apache.kafka.streams.errors.StreamsException: task [0_81]
> exception caught when producing
> at
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.checkForException(RecordCollectorImpl.java:119)
> at
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:76)
> at
> org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:79)
> at
> org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:83)
> at
> org.apache.kafka.streams.kstream.internals.KStreamFlatMap$KStreamFlatMapProcessor.process(KStreamFlatMap.java:43)
> at
> org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:48)
> at
> org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:188)
> at
> org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:134)
> at
> org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:83)
> at
> org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:70)
> at
> org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:197)
> ... 2 common frames omitted
> Caused by: org.apache.kafka.common.errors.NotLeaderForPartitionException:
> This server is not the leader for that topic-partition.
> 07:01:23.558 stream-processor /var/log/application.log  2017-04-24
> 06:01:23,558 - [WARN] - [1.1.0-6] - [StreamThread-3]
> o.a.k.s.p.internals.StreamThread - Unexpected state transition from RUNNING
> to NOT_RUNNING
> 07:01:23.558 stream-processor /var/log/application.log  2017-04-24
> 06:01:23,559 - [ERROR] - [1.1.0-6] - [StreamThread-3] Application -
> Unexpected Exception caught in thread [StreamThread-3]:
> org.apache.kafka.streams.errors.StreamsException: Exception caught in
> process. taskId=0_55, processor=KSTREAM-SOURCE-00,
> topic=kafka-topic, partition=55, offset=479308
> at
> org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:216)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:641)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:368)
> Caused by: org.apache.kafka.streams.errors.StreamsException: task [0_55]
> exception caught when producing
> at
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.checkForException(RecordCollectorImpl.java:119)
> at
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:76)
> at
> org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:79)
> at
> org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:83)
> at
> 

Re: [VOTE] 0.10.2.1 RC3

2017-04-22 Thread Eno Thereska
+1 tested the usual streams tests as before.

Thanks
Eno
> On 21 Apr 2017, at 17:56, Gwen Shapira  wrote:
> 
> Hello Kafka users, developers, friends, romans, countrypersons,
> 
> This is the fourth (!) candidate for release of Apache Kafka 0.10.2.1.
> 
> It is a bug fix release, so we have lots of bug fixes, some super
> important.
> 
> Release notes for the 0.10.2.1 release:
> http://home.apache.org/~gwenshap/kafka-0.10.2.1-rc3/RELEASE_NOTES.html
> 
> *** Please download, test and vote by Wednesday, April 26, 2017 ***
> 
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
> 
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~gwenshap/kafka-0.10.2.1-rc3/
> 
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/
> 
> * Javadoc:
> http://home.apache.org/~gwenshap/kafka-0.10.2.1-rc3/javadoc/
> 
> * Tag to be voted upon (off 0.10.2 branch) is the 0.10.2.1 tag:
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=8e4f09caeaa877f06dc75c7da1af7a727e5e599f
> 
> 
> * Documentation:
> http://kafka.apache.org/0102/documentation.html
> 
> * Protocol:
> http://kafka.apache.org/0102/protocol.html
> 
> /**
> 
> Your help in validating this bugfix release is super valuable, so
> please take the time to test and vote!
> 
> Suggested tests:
> * Grab the source archive and make sure it compiles
> * Grab one of the binary distros and run the quickstarts against them
> * Extract and verify one of the site docs jars
> * Build a sample against jars in the staging repo
> * Validate GPG signatures on at least one file
> * Validate the javadocs look ok
> * The 0.10.2 documentation was updated for this bugfix release
> (especially upgrade, streams and connect portions) - please make sure
> it looks ok: http://kafka.apache.org/documentation.html
> 
> But above all, try to avoid finding new bugs - we want to get this release
> out the door already :P
> 
> 
> Thanks,
> Gwen
> 
> 
> 
> -- 
> *Gwen Shapira*
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter  | blog
> 



Re: Calculating time elapsed using event start / stop notification messages

2017-04-21 Thread Eno Thereska
Hi Ali,

One starting point would be the low-level Processor API, where you get each 
event and process it. You can also use a state store (it can probably be an 
in-memory store) to keep track of the events seen so far. An entry can be 
deleted once both the start and stop events for it have been observed. If 
each record also carries an event timestamp, that helps with ordering the 
events in your processing logic.

After computing the time differences, you can either write that difference to a 
topic, and then use a KTable to read from it and compute various windowed 
aggregates; or alternatively you can do the per hour/day/month processing in 
your own logic and stay entirely in the Processor API world.

Hope this helps
Eno
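A minimal sketch of such a Processor, assuming the key is the user id, the value is simply "START" or "STOP", the record timestamp is used as the event time, and a key-value store named "open-starts" has been registered on the topology:

import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;

public class WorkTimeProcessor implements Processor<String, String> {

    private ProcessorContext context;
    private KeyValueStore<String, Long> openStarts;   // user id -> start timestamp

    @Override
    @SuppressWarnings("unchecked")
    public void init(final ProcessorContext context) {
        this.context = context;
        // "open-starts" must be added to the topology with addStateStore(...)
        this.openStarts = (KeyValueStore<String, Long>) context.getStateStore("open-starts");
    }

    @Override
    public void process(final String userId, final String state) {
        final long ts = context.timestamp();
        if ("START".equals(state)) {
            openStarts.put(userId, ts);
        } else if ("STOP".equals(state)) {
            final Long start = openStarts.get(userId);
            if (start != null) {
                // Forward (user, elapsed ms) downstream, e.g. to a topic that a
                // windowed aggregation turns into per-hour/per-day totals.
                // Out-of-order stop/start pairs would need extra buffering and
                // are not handled in this sketch.
                context.forward(userId, ts - start);
                openStarts.delete(userId);
            }
        }
    }

    @Override
    public void punctuate(final long timestamp) { }   // not used in this sketch

    @Override
    public void close() { }
}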

> On 21 Apr 2017, at 15:20, Ali Akhtar  wrote:
> 
> I have a tricky use case where a user initiates an event (by clicking a
> button) and then stops it (by clicking it again, losing connection, closing
> the browser, etc).
> 
> Each time the event starts or stops, a notification is sent to a kafka
> topic, with the user's id as the message key and the current timestamp, and
> the state of the event (started, or stopped).
> 
> I'm using Kafka streaming to process these events.
> 
> Based on the notifications, I need to determine the total time spent
> 'working', i.e the time between user clicked start, and they stopped. Per
> hour, per day, etc.
> 
> E.g total time spent 'working' per hour, per day.
> 
> Any ideas how this could be achieved, while accounting for messages
> arriving out of order due to latency, etc (e.g the stop notification may
> arrive before start)?
> 
> Would the kafka streams local store be of any use here (all events by the
> same user will have the same message key), or should i use Redis? Or do I
> need an hourly job which runs and processes last hour's events?



Re: Kafka streams 0.10.2 Producer throwing exception eventually causing streams shutdown

2017-04-18 Thread Eno Thereska
Hi Mahendra,

I see the java.lang.NoSuchMethodError: org.apache.kafka.clients... error. That 
usually points to mismatched jars on the classpath, e.g. an older kafka-clients 
jar being picked up alongside the newer streams jar -- could you double-check 
your dependencies?

Eno

> On 18 Apr 2017, at 12:46, Mahendra Kariya <mahendra.kar...@go-jek.com> wrote:
> 
> Hey Eno,
> 
> I just pulled the latest jar from the link you shared and tried to run my
> code. I am getting the following exception on new KafkaStreams(). The same
> code is working fine with 0.10.2.0 jar.
> 
> 
> Exception in thread "main" org.apache.kafka.common.KafkaException: Failed
> to construct kafka consumer
>at org.apache.kafka.clients.consumer.KafkaConsumer.(
> KafkaConsumer.java:717)
>at org.apache.kafka.clients.consumer.KafkaConsumer.(
> KafkaConsumer.java:566)
>at org.apache.kafka.streams.processor.internals.
> DefaultKafkaClientSupplier.getConsumer(DefaultKafkaClientSupplier.java:38)
>at org.apache.kafka.streams.processor.internals.StreamThread.(
> StreamThread.java:316)
>at org.apache.kafka.streams.KafkaStreams.(
> KafkaStreams.java:358)
>at org.apache.kafka.streams.KafkaStreams.(
> KafkaStreams.java:279)
> Caused by: java.lang.NoSuchMethodError: org.apache.kafka.clients.
> Metadata.update(Lorg/apache/kafka/common/Cluster;Ljava/util/Set;J)V
>at org.apache.kafka.streams.processor.internals.
> StreamsKafkaClient.(StreamsKafkaClient.java:98)
>at org.apache.kafka.streams.processor.internals.
> StreamsKafkaClient.(StreamsKafkaClient.java:82)
>at org.apache.kafka.streams.processor.internals.
> StreamPartitionAssignor.configure(StreamPartitionAssignor.java:219)
>at org.apache.kafka.common.config.AbstractConfig.
> getConfiguredInstances(AbstractConfig.java:254)
>at org.apache.kafka.common.config.AbstractConfig.
> getConfiguredInstances(AbstractConfig.java:220)
>at org.apache.kafka.clients.consumer.KafkaConsumer.(
> KafkaConsumer.java:673)
>... 6 more
> 
> 
> 
> On Tue, Apr 18, 2017 at 5:47 AM, Mahendra Kariya <mahendra.kar...@go-jek.com
>> wrote:
> 
>> Thanks!
>> 
>> On Tue, Apr 18, 2017, 12:26 AM Eno Thereska <eno.there...@gmail.com>
>> wrote:
>> 
>>> The RC candidate build is here: http://home.apache.org/~
>>> gwenshap/kafka-0.10.2.1-rc1/ <http://home.apache.org/~
>>> gwenshap/kafka-0.10.2.1-rc1/>
>>> 
>>> Eno
>>>> On 17 Apr 2017, at 17:20, Mahendra Kariya <mahendra.kar...@go-jek.com>
>>> wrote:
>>>> 
>>>> Thanks!
>>>> 
>>>> In the meantime, is the jar published somewhere on github or as a part
>>> of
>>>> build pipeline?
>>>> 
>>>> On Mon, Apr 17, 2017 at 9:18 PM, Eno Thereska <eno.there...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Not yet, but as soon as 0.10.2 is voted it should be. Hopefully this
>>> week.
>>>>> 
>>>>> Eno
>>>>>> On 17 Apr 2017, at 13:25, Mahendra Kariya <mahendra.kar...@go-jek.com
>>>> 
>>>>> wrote:
>>>>>> 
>>>>>> Are the bug fix releases published to Maven central repo?
>>>>>> 
>>>>>> On Sat, Apr 1, 2017 at 12:26 PM, Eno Thereska <eno.there...@gmail.com
>>>> 
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi Sachin,
>>>>>>> 
>>>>>>> In the bug fix release for 0.10.2 (and in trunk) we have now set
>>>>>>> max.poll.interval to infinite since from our experience with streams
>>>>> this
>>>>>>> should not be something that users set: https://github.com/apache/
>>>>>>> kafka/pull/2770/files <https://github.com/apache/
>>> kafka/pull/2770/files
>>>>>> .
>>>>>>> 
>>>>>>> We're in the process of documenting that change. For now you can
>>>>> increase
>>>>>>> the request timeout without worrying about max.poll.interval
>>> anymore. In
>>>>>>> fact I'd suggest you also increase max.poll.interval as we've done it
>>>>> above.
>>>>>>> 
>>>>>>> Thanks
>>>>>>> Eno
>>>>>>> 
>>>>>>>> On 1 Apr 2017, at 03:28, Sachin Mittal <sjmit...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Should this timeout be less than max poll interval value? if yes
>>> than
>>>>>>>> generally speaking what should be the ratio betwee

Re: Kafka Streams - Join synchronization issue

2017-04-18 Thread Eno Thereska
Hi Marco,

I noticed your join window is 1 second wide, not 1 minute wide. Is that 
intentional?
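
Just on the units: JoinWindows.of() takes the window size in milliseconds, so 
of(1000) means records at most 1 second apart get joined. A 1-minute window 
retained for 5 minutes would look roughly like this (only the window spec, not 
your full join):

JoinWindows.of(60 * 1000L).until(5 * 60 * 1000L)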

Thanks
Eno
> On 17 Apr 2017, at 19:41, Marco Abitabile <marco.abitab...@gmail.com> wrote:
> 
> hello Eno,
> thanks for your support. The two streams are both kstreams. The window is of 
> 1 minute-width until 5 minutes. This is the code:
> 
> //Other Stream: User Location, is a string with the name of the city the
> //user is (like "San Francisco")
> KStreamBuilder builder = new KStreamBuilder();
> KStream<String, String> userLocationStream = locationStreamBuilder
> .stream(stringSerde, stringSerde,"userLocationStreamData");
> KStream<String, String> locationKstream = userLocationStream
> .map(MyStreamUtils::enhanceWithAreaDetails);
> locationKstream.to("user_location");
> //This Stream: User Activity
> KStream<String, JsonObject> activity = builder.stream(stringSerde, jsonSerde, 
> "activityStreamData");
> activity.filter(MyStreamUtils::filterOutFakeUsers)
> .map(MyStreamUtils::enhanceWithScoreDetails)
> .join(
> locationKstream,
> MyStreamUtils::locationActivityJoiner,
> JoinWindows.of(1000).until(1000 * 60 * 5),
> stringSerde, jsonSerde, stringSerde)
> .to("usersWithLocation")
> 
> KafkaStreams stream = new KafkaStreams(builder, propsActivity);
> stream.start();
> 
> 
> And MyStreamUtils::locationActivityJoiner does:
> 
> public static JsonObject locationActivityJoiner(JsonObject activity, String
> loc) {
> JsonObject join = activity.copy();
> join.put("city" , loc);
> return join;
> }
> 
> hmm... your question is making me think... are you telling me that since 
> both are KStreams, they actually need to be re-streamed in sync?
> 
> Thanks a lot.
> 
> Marco
> 
> 
> 2017-04-16 21:45 GMT+02:00 Eno Thereska <eno.there...@gmail.com 
> <mailto:eno.there...@gmail.com>>:
> Hi Marco,
> 
> Could you share a bit of your code, or at a minimum provide some info on:
> - is userActivitiesStream and geoDataStream a KStream of KTable?
> - what is the length of "timewindow"?
> 
> Thanks
> Eno
> 
> > On 16 Apr 2017, at 19:44, Marco Abitabile <marco.abitab...@gmail.com 
> > <mailto:marco.abitab...@gmail.com>> wrote:
> >
> > Hi All!
> >
> > I need a little hint to understand how join works, in regards of stream
> > synchronization.
> >
> > This mail is a bit long, I need to explain the issue I'm facing.
> >
> > *TL-TR: *
> > it seems that join synchonization between stream is not respected as
> > explained here:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-28+-+Add+a+processor+client#KIP-28-Addaprocessorclient-StreamSynchronization
> >  
> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-28+-+Add+a+processor+client#KIP-28-Addaprocessorclient-StreamSynchronization>
> >
> > *The need:*
> > I have historical data residing into some databases, more specifically:
> >  - time series of user activities
> >  - time series of user geo positions
> >
> > *What I do:*
> > since I have a new algorithm I want to try, the historical data has been
> > already pruned by kafka retention policy and I have it into a database.
> > This is what I'm doing:
> >  1- spin up kafka-connect sink that takes historical gps data (let's say,
> > one day of data), ordered by event time, and push them into
> > "HistoricalGpsData" topic. This tasks pushes historical geo data as fast as
> > possible into kafka topic, respecting the original event time.
> >  2- spin up kafka-connect sink that takes historical user activities
> > (let's say, one day of data, the same day of gps data, of course), ordered
> > by event time, and push them into "HistoricalUserActivites" topic. This
> > tasks pushes historical user activities data as fast as possible into kafka
> > topic, respecting the original event time.
> >  3- spin up my new stream processor algorithm
> >
> > As per the nature of the data, I have the quantity of activity data much
> > higher than geo data, thus the task1 pushes all the needed geo data into
> > kafka topic within few minutes (around 10 minutes), while activities data,
> > since has a higher volume, is entirely pushed within 1 hour.
> > --> the two streams are pushed into kafka regardless of their
> > synchronization (however being aware of their nature, as explained above)
> >
> > *What I expect:*
> > Now, what I would expect is that when I perform the join between the 

Re: Kafka streams 0.10.2 Producer throwing exception eventually causing streams shutdown

2017-04-17 Thread Eno Thereska
The RC candidate build is here: 
http://home.apache.org/~gwenshap/kafka-0.10.2.1-rc1/ 
<http://home.apache.org/~gwenshap/kafka-0.10.2.1-rc1/>

Eno
> On 17 Apr 2017, at 17:20, Mahendra Kariya <mahendra.kar...@go-jek.com> wrote:
> 
> Thanks!
> 
> In the meantime, is the jar published somewhere on github or as a part of
> build pipeline?
> 
> On Mon, Apr 17, 2017 at 9:18 PM, Eno Thereska <eno.there...@gmail.com>
> wrote:
> 
>> Not yet, but as soon as 0.10.2 is voted it should be. Hopefully this week.
>> 
>> Eno
>>> On 17 Apr 2017, at 13:25, Mahendra Kariya <mahendra.kar...@go-jek.com>
>> wrote:
>>> 
>>> Are the bug fix releases published to Maven central repo?
>>> 
>>> On Sat, Apr 1, 2017 at 12:26 PM, Eno Thereska <eno.there...@gmail.com>
>>> wrote:
>>> 
>>>> Hi Sachin,
>>>> 
>>>> In the bug fix release for 0.10.2 (and in trunk) we have now set
>>>> max.poll.interval to infinite since from our experience with streams
>> this
>>>> should not be something that users set: https://github.com/apache/
>>>> kafka/pull/2770/files <https://github.com/apache/kafka/pull/2770/files
>>> .
>>>> 
>>>> We're in the process of documenting that change. For now you can
>> increase
>>>> the request timeout without worrying about max.poll.interval anymore. In
>>>> fact I'd suggest you also increase max.poll.interval as we've done it
>> above.
>>>> 
>>>> Thanks
>>>> Eno
>>>> 
>>>>> On 1 Apr 2017, at 03:28, Sachin Mittal <sjmit...@gmail.com> wrote:
>>>>> 
>>>>> Should this timeout be less than max poll interval value? if yes than
>>>>> generally speaking what should be the ratio between two or range for
>> this
>>>>> timeout value .
>>>>> 
>>>>> Thanks
>>>>> Sachin
>>>>> 
>>>>> On 1 Apr 2017 04:57, "Matthias J. Sax" <matth...@confluent.io> wrote:
>>>>> 
>>>>> Yes, you can increase ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG
>>>>> 
>>>>> 
>>>>> -Matthias
>>>>> 
>>>>> 
>>>>> On 3/31/17 11:32 AM, Sachin Mittal wrote:
>>>>>> Hi,
>>>>>> So I have added the config ProducerConfig.RETRIES_CONFIG,
>>>>> Integer.MAX_VALUE
>>>>>> and the NotLeaderForPartitionException is gone.
>>>>>> 
>>>>>> However we see a new exception especially under heavy load:
>>>>>> org.apache.kafka.streams.errors.StreamsException: task [0_1]
>> exception
>>>>>> caught when producing
>>>>>> at
>>>>>> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.
>>>>> checkForException(RecordCollectorImpl.java:119)
>>>>>> ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
>>>>>> at
>>>>>> org.apache.kafka.streams.processor.internals.
>> RecordCollectorImpl.flush(
>>>>> RecordCollectorImpl.java:127)
>>>>>> ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]at
>>>>>> org.apache.kafka.streams.processor.internals.
>>>> StreamTask$1.run(StreamTask.
>>>>> java:76)
>>>>>> ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
>>>>>> at
>>>>>> org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.
>>>>> measureLatencyNs(StreamsMetricsImpl.java:188)
>>>>>> ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
>>>>>> at
>>>>>> org.apache.kafka.streams.processor.internals.
>>>> StreamTask.commit(StreamTask.
>>>>> java:280)
>>>>>> ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]at
>>>>>> org.apache.kafka.streams.processor.internals.StreamThread.commitOne(
>>>>> StreamThread.java:787)
>>>>>> ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
>>>>>> at
>>>>>> org.apache.kafka.streams.processor.internals.StreamThread.commitAll(
>>>>> StreamThread.java:774)
>>>>>> ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]at
>>>>>> org.apache.kafka.streams.processor.internals.
>> StreamThread.maybeCommit(
>>>>> StreamThread.java:749)
>>>>>> ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
>>>>>

Re: Kafka streams 0.10.2 Producer throwing exception eventually causing streams shutdown

2017-04-17 Thread Eno Thereska
Not yet, but as soon as 0.10.2 is voted it should be. Hopefully this week.

Eno
> On 17 Apr 2017, at 13:25, Mahendra Kariya <mahendra.kar...@go-jek.com> wrote:
> 
> Are the bug fix releases published to Maven central repo?
> 
> On Sat, Apr 1, 2017 at 12:26 PM, Eno Thereska <eno.there...@gmail.com>
> wrote:
> 
>> Hi Sachin,
>> 
>> In the bug fix release for 0.10.2 (and in trunk) we have now set
>> max.poll.interval to infinite since from our experience with streams this
>> should not be something that users set: https://github.com/apache/
>> kafka/pull/2770/files <https://github.com/apache/kafka/pull/2770/files>.
>> 
>> We're in the process of documenting that change. For now you can increase
>> the request timeout without worrying about max.poll.interval anymore. In
>> fact I'd suggest you also increase max.poll.interval as we've done it above.
>> 
>> Thanks
>> Eno
>> 
>>> On 1 Apr 2017, at 03:28, Sachin Mittal <sjmit...@gmail.com> wrote:
>>> 
>>> Should this timeout be less than max poll interval value? if yes than
>>> generally speaking what should be the ratio between two or range for this
>>> timeout value .
>>> 
>>> Thanks
>>> Sachin
>>> 
>>> On 1 Apr 2017 04:57, "Matthias J. Sax" <matth...@confluent.io> wrote:
>>> 
>>> Yes, you can increase ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG
>>> 
>>> 
>>> -Matthias
>>> 
>>> 
>>> On 3/31/17 11:32 AM, Sachin Mittal wrote:
>>>> Hi,
>>>> So I have added the config ProducerConfig.RETRIES_CONFIG,
>>> Integer.MAX_VALUE
>>>> and the NotLeaderForPartitionException is gone.
>>>> 
>>>> However we see a new exception especially under heavy load:
>>>> org.apache.kafka.streams.errors.StreamsException: task [0_1] exception
>>>> caught when producing
>>>> at
>>>> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.
>>> checkForException(RecordCollectorImpl.java:119)
>>>> ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
>>>> at
>>>> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.flush(
>>> RecordCollectorImpl.java:127)
>>>> ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]at
>>>> org.apache.kafka.streams.processor.internals.
>> StreamTask$1.run(StreamTask.
>>> java:76)
>>>> ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
>>>> at
>>>> org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.
>>> measureLatencyNs(StreamsMetricsImpl.java:188)
>>>> ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
>>>> at
>>>> org.apache.kafka.streams.processor.internals.
>> StreamTask.commit(StreamTask.
>>> java:280)
>>>> ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]at
>>>> org.apache.kafka.streams.processor.internals.StreamThread.commitOne(
>>> StreamThread.java:787)
>>>> ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
>>>> at
>>>> org.apache.kafka.streams.processor.internals.StreamThread.commitAll(
>>> StreamThread.java:774)
>>>> ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]at
>>>> org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(
>>> StreamThread.java:749)
>>>> ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
>>>> at
>>>> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(
>>> StreamThread.java:671)
>>>> ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]        at
>>>> org.apache.kafka.streams.processor.internals.
>>> StreamThread.run(StreamThread.java:378)
>>>> ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
>>>> org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s)
>> for
>>>> new-part-advice-key-table-changelog-1: 30001 ms has passed since last
>>> append
>>>> 
>>>> So any idea as why TimeoutException is happening.
>>>> Is this controlled by
>>>> ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG
>>>> 
>>>> If yes
>>>> What should the value be set in this given that out consumer
>>>> max.poll.interval.ms is defaul 5 minutes.
>>>> 
>>>> Is there any other setting that we should try to avoid such errors which
>>>> causes stream thread to die.
>>>> 
>>>> Thanks
>>>> Sachin
>>>> 
>>>> 
>>>> On Sun, Mar 26, 2017 at 1:

Re: Kafka-Streams: Cogroup

2017-04-13 Thread Eno Thereska
Hi Kyle, (cc-ing user list as well)

This could be an interesting scenario. Two things to help us think it through 
some more: 1) it seems you attached a figure, but I cannot open it; 2) what 
about using the low-level Processor API instead of the DSL as approach 3? Do 
you have any thoughts on that? A rough sketch of what I mean is below.
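
To make 3) a bit more concrete (all topic names, the store name, the 
CogroupProcessor class and the patientSerde below are made up, and I'm glossing 
over the fact that a single source node needs compatible value serdes across 
the three topics): the claim topics feed one processor that owns a single 
shared state store and dispatches on context().topic() to pick the right 
add*Claim() call.

import org.apache.kafka.streams.processor.TopologyBuilder;
import org.apache.kafka.streams.state.Stores;

TopologyBuilder builder = new TopologyBuilder();
builder.addSource("claims", "medical-claims", "pharmacy-claims", "lab-claims")
       // one processor handles all three streams
       .addProcessor("cogroup", CogroupProcessor::new, "claims")
       // the single backing state store, connected only to the cogroup processor
       .addStateStore(
           Stores.create("patients").withStringKeys().withValues(patientSerde).persistent().build(),
           "cogroup")
       // optional: forward each updated Patient downstream
       .addSink("out", "patients-by-id", "cogroup");

// Inside CogroupProcessor.process(key, value) the logic would be roughly:
//   Patient p = patients.get(key);        // patients = (KeyValueStore) context().getStateStore("patients")
//   if (p == null) p = new Patient();     // the single initializer
//   switch (context().topic()) { ... }    // pick addMedicalClaim / addPharmacyClaim / addLabClaim
//   patients.put(key, p);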

Thanks
Eno

> On 13 Apr 2017, at 11:26, Winkelman, Kyle G  wrote:
> 
> Hello,
>  
> I am wondering if there is any way to aggregate together many streams at once 
> to build a larger object. Example (Healthcare Domain):
> I have streams of Medical, Pharmacy, and Lab claims. Key is PatientId, Value 
> is a different Avro Record for each stream.
> I was hoping there was a way to supply a single Initializer, () -> new 
> Patient(), and 3 aggregators, (key, value, patient) -> 
> patient.add**Claim(value).
>  
> Currently the only way that I see to do the above use case is by aggregating 
> each individual stream then joining them. This doesn’t scale well with a 
> large number of input streams because for each stream I would be creating 
> another state store.
>  
> I was hoping to get thoughts on a KCogroupedStream api. I have spent a little 
> time conceptualizing it.
>  
> Approach 1:
> In KGroupedStream add a cogroup method that takes the single initializer, a 
> list of other kgroupedstreams, and a list of other aggregators.
> This would then all flow through a single processor and have a single 
> backing state store.
> The aggregator that the object will get sent to is determined by the 
> context().topic() which we should be able to trace back to one of the 
> kgroupedstreams in the list.
>  
> The problem I am having with this approach is that, because everything is 
> going through a single processor and Java doesn't handle generic types well, I 
> have to either pass in a list of Type objects for casting the object before 
> sending it to the aggregator, or create aggregators that accept an Object and 
> cast it to the appropriate type.
>  
> Approach 2:
> Create one processor for each aggregator and have a single state store. Then 
> have a single KStreamPassThrough that just passes on the new aggregate value.
> The positive for this is you know which stream it will be coming from and 
> won’t need to do the context().topic() trick.
>  
> The problem I am having with this approach is understanding if there is a 
> race condition. Obviously the source topics would be copartitioned. But would 
> it be multithreaded and possibly cause one of the processors to grab patient 
> 1 at the same time a different processor has grabbed patient 1?
> My understanding is that for each partition there would be a single complete 
> set of processors and a new incoming record would go completely through the 
> processor topology from a source node to a sink node before the next one is 
> sent through. Is this correct?
>  
> 
>  
> If anyone has any additional ideas about this let me know. I don’t know if I 
> have the time to actually create this api so if someone likes the idea and 
> wants to develop it feel free.



Re: [VOTE] 0.10.2.1 RC1

2017-04-13 Thread Eno Thereska
+1 (non-binding) 

Built sources, ran all unit and integration tests, checked new documentation, 
esp with an eye on the streams library.

Thanks Gwen
Eno

> On 12 Apr 2017, at 17:25, Gwen Shapira  wrote:
> 
> Hello Kafka users, developers, client-developers, friends, romans,
> citizens, etc,
> 
> This is the second candidate for release of Apache Kafka 0.10.2.1.
> 
> This is a bug fix release and it includes fixes and improvements from 24 JIRAs
> (including a few critical bugs).
> 
> Release notes for the 0.10.2.1 release:
> http://home.apache.org/~gwenshap/kafka-0.10.2.1-rc1/RELEASE_NOTES.html
> 
> *** Please download, test and vote by Monday, April 17, 5:30 pm PT
> 
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
> 
> Your help in validating this bugfix release is super valuable, so
> please take the time to test and vote!
> 
> Suggested tests:
> * Grab the source archive and make sure it compiles
> * Grab one of the binary distros and run the quickstarts against them
> * Extract and verify one of the site docs jars
> * Build a sample against jars in the staging repo
> * Validate GPG signatures on at least one file
> * Validate the javadocs look ok
> * The 0.10.2 documentation was updated for this bugfix release
> (especially upgrade, streams and connect portions) - please make sure
> it looks ok: http://kafka.apache.org/documentation.html
> 
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~gwenshap/kafka-0.10.2.1-rc1/
> 
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/
> 
> * Javadoc:
> http://home.apache.org/~gwenshap/kafka-0.10.2.1-rc1/javadoc/
> 
> * Tag to be voted upon (off 0.10.2 branch) is the 0.10.2.1 tag:
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=e133f2ca57670e77f8114cc72dbc2f91a48e3a3b
> 
> * Documentation:
> http://kafka.apache.org/0102/documentation.html
> 
> * Protocol:
> http://kafka.apache.org/0102/protocol.html
> 
> /**
> 
> Thanks,
> 
> Gwen Shapira



Re: Kafka Streams Application does not start after 10.1 to 10.2 update if topics need to be auto-created

2017-04-13 Thread Eno Thereska
No, internal topics do not need to be manually created.

Eno
> On 13 Apr 2017, at 10:00, Shimi Kiviti  wrote:
> 
> Is that (manual topic creation) also true for internal topics?
> 
> On Thu, 13 Apr 2017 at 19:14 Matthias J. Sax  wrote:
> 
>> Hi,
>> 
>> thanks for reporting this issue. We are aware of a bug in 0.10.2 that
>> seems to be related: https://issues.apache.org/jira/browse/KAFKA-5037
>> 
>> However, I also want to point out, that it is highly recommended to not
>> use auto topic create for Streams, but to manually create all
>> input/output topics before you start your Streams application.
>> 
>> For more details, see
>> 
>> http://docs.confluent.io/current/streams/developer-guide.html#managing-topics-of-a-kafka-streams-application
>> 
>> 
>> May I ask why you are using topic auto-create?
>> 
>> 
>> -Matthias
>> 
>> 
>> On 4/11/17 1:09 PM, Dmitry Minkovsky wrote:
>>> I updated from 10.1 to 10.2. I updated both the broker and maven
>>> dependency.
>>> 
>>> I am using topic auto-create. With 10.1, starting the application with a
>>> broker would sometimes result in an error like:
>>> 
>>>> Exception in thread "StreamThread-1"
>>> org.apache.kafka.streams.errors.TopologyBuilderException: Invalid
>> topology
>>> building: stream-thread [StreamThread-1] Topic not found: $topic
>>> 
>>> But this would only happen once. Upon the second attempt, the topics are
>>> already created and everything works fine.
>>> 
>>> But with 10.2 this error does not go away. I have confirmed and tested
>> that
>>> auto topic creation is enabled.
>>> 
>>> Here is the error/trace:
>>> 
>>> 
>>> Exception in thread "StreamThread-1"
>>> org.apache.kafka.streams.errors.TopologyBuilderException: Invalid
>> topology
>>> building: stream-thread [StreamThread-1] Topic not found: session-updates
>>> at
>>> 
>> org.apache.kafka.streams.processor.internals.StreamPartitionAssignor$CopartitionedTopicsValidator.validate(StreamPartitionAssignor.java:734)
>>> at
>>> 
>> org.apache.kafka.streams.processor.internals.StreamPartitionAssignor.ensureCopartitioning(StreamPartitionAssignor.java:648)
>>> at
>>> 
>> org.apache.kafka.streams.processor.internals.StreamPartitionAssignor.assign(StreamPartitionAssignor.java:368)
>>> at
>>> 
>> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.performAssignment(ConsumerCoordinator.java:339)
>>> at
>>> 
>> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.onJoinLeader(AbstractCoordinator.java:488)
>>> at
>>> 
>> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.access$1100(AbstractCoordinator.java:89)
>>> at
>>> 
>> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:438)
>>> at
>>> 
>> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:420)
>>> at
>>> 
>> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:764)
>>> at
>>> 
>> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:745)
>>> at
>>> 
>> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:186)
>>> at
>>> 
>> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:149)
>>> at
>>> 
>> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:116)
>>> at
>>> 
>> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:493)
>>> at
>>> 
>> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:322)
>>> at
>>> 
>> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:253)
>>> at
>>> 
>> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:172)
>>> at
>>> 
>> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:334)
>>> at
>>> 
>> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:303)
>>> at
>>> 
>> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:286)
>>> at
>>> 
>> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1030)
>>> at
>>> 
>> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995)
>>> at
>>> 
>> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:582)
>>> at
>>> 
>> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:368)
>>> 
>>> 
>>> It does not occur if my topology only defines streams and tables.
>> However,
>>> when I attempt to join a stream and a table, this error is thrown:
>>> 

Re: Streams error handling

2017-04-13 Thread Eno Thereska
Hi Mike, 

Thank you. Could you open a JIRA to capture this specific problem (a copy-paste 
would suffice)? Alternatively we can open it, up to you.
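
For anyone hitting this in the meantime, the flatMap-style workaround described 
below looks roughly like the following (a minimal sketch; MyEvent and its 
parse() method are placeholders for your own type and deserialisation):

import java.util.Collections;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

KStreamBuilder builder = new KStreamBuilder();

// read raw bytes so the built-in serde can never throw inside the framework
KStream<String, byte[]> raw = builder.stream(Serdes.String(), Serdes.ByteArray(), "events");

KStream<String, MyEvent> events = raw.flatMapValues(bytes -> {
    try {
        return Collections.singletonList(MyEvent.parse(bytes));   // your own deserialisation
    } catch (Exception e) {
        // the error can only be handled as a side effect here, e.g. log it or
        // send the raw bytes to a dead-letter topic with a separate producer
        return Collections.<MyEvent>emptyList();
    }
});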

Thanks
Eno
> On 13 Apr 2017, at 08:43, Mike Gould  wrote:
> 
> Great to know I've not gone off in the wrong direction
> Thanks
> 
> On Thu, 13 Apr 2017 at 16:34, Matthias J. Sax  wrote:
> 
>> Mike,
>> 
>> thanks for your feedback. You are absolutely right that Streams API does
>> not have great support for this atm. And it's very valuable that you
>> report this (you are not the first person). It helps us prioritize :)
>> 
>> For now, there is no better solution than the one you described in your
>> email, but it's on our roadmap to improve the API -- and its priority just
>> got increased by your request.
>> 
>> I am sorry, that I can't give you a better answer right now :(
>> 
>> 
>> -Matthias
>> 
>> 
>> On 4/13/17 8:16 AM, Mike Gould wrote:
>>> Hi
>>> Are there any better error handling options for Kafka Streams in Java?
>>> 
>>> Any errors in the serdes will break the stream.  The suggested
>>> implementation is to use the byte[] serde and do the deserialisation in a
>>> map operation.  However this isn't ideal either as there's no great way to
>>> handle exceptions.
>>> My current tactic is to use flatMap in place of map everywhere and return
>>> emptySet on error. Unfortunately this means the error has to be handled
>>> directly in the function where it happened and can only be handled as a
>>> side effect.
>>> 
>>> It seems to me that this could be done better. Maybe the *Mapper interfaces
>>> could allow specific checked exceptions. These could be handled by specific
>>> downstream KStream.mapException() steps which might e.g. put an error
>>> response on another stream branch.
>>> Alternatively could it be made easier to return something like an Either
>>> from the Mappers with the addition of a few extra mapError or mapLeft
>>> mapRight methods on KStream?
>>> 
>>> Unless there's a better error handling pattern which I've entirely
>>> missed?
>>> 
>>> Thanks
>>> MIkeG
>>> 
>> 
>> --
> - MikeG
> http://en.wikipedia.org/wiki/Common_misconceptions
> 



Re: In kafka streams consumer seems to hang while retrieving the offsets

2017-04-10 Thread Eno Thereska
Hi Sachin,

In 0.10.2.1 we've changed the default value of max.poll.interval.ms (to avoid 
rebalancing during recovery) as well as the default value of the streams 
producer retries (to retry during a temporary broker failure). I think you are 
aware of the changes, but just double checking. You don't need to wait for 
0.10.2.1, you can make the changes directly yourself:

final Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, ID);
...
// retry transient broker failures instead of failing the task straight away
props.put(ProducerConfig.RETRIES_CONFIG, 10);
// effectively disable the poll-interval check so long restores don't trigger a rebalance
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, Integer.toString(Integer.MAX_VALUE));

This doesn't address the RocksDB issue though, still looking into that.

Thanks
Eno

> On 9 Apr 2017, at 22:55, Sachin Mittal <sjmit...@gmail.com> wrote:
> 
> Let me try to get the debug log when this error happens.
> 
> Right now we have three instances each with 4 threads consuming from 12
> partition topic.
> So one thread per partition.
> 
> The application is running fine much better than before. Now it usually
> runs for a week even during peak load.
> 
> Sometimes, out of the blue, either RocksDB throws an exception with a
> single-character message (which I guess is a known issue with RocksDB, fixed in
> a later release).
> Or the producer gets timed out while committing some changelog topic
> record. I had increased the timeout from 30 seconds to 180 seconds, but it
> still throws the exception even with the larger timeout.
> 
> Not sure if these are due to VM issue or network.
> 
> But whenever something like this happens, the application goes into
> rebalance and soon things take turn for worse. Soon some of the threads go
> into deadlock with above stack trace and application is now in perpetual
> rebalance state.
> 
> Only way to resolve this is kill all instances using -9 and restart the
> instances one by one.
> 
> So also long as we have a steady state of one thread per partition
> everything is working fine. I am still working out a way to limit the
> changelog topic size by more aggressive compaction and let me see if that
> will make things better.
> 
> I will try to get the logs when this happens next time.
> 
> Thanks
> Sachin
> 
> 
> 
> On Sun, Apr 9, 2017 at 6:05 PM, Eno Thereska <eno.there...@gmail.com> wrote:
> 
>> Hi Sachin,
>> 
>> It's not necessarily a deadlock. Do you have any debug traces from those
>> nodes? Also would be useful to know the config (e.g., how many partitions
>> do you have and how many app instances.)
>> 
>> Thanks
>> Eno
>> 
>>> On 9 Apr 2017, at 04:45, Sachin Mittal <sjmit...@gmail.com> wrote:
>>> 
>>> Hi,
>>> In my streams applications cluster in one or more instances I see some
>>> threads always waiting with the following stack.
>>> 
>>> Every time I check on jstack I see the following trace.
>>> 
>>> Is this some kind of new deadlock that we have failed to identify.
>>> 
>>> Thanks
>>> Sachin
>>> 
>>> here is the stack trace:
>>> 
>> 
>> --
>>> "StreamThread-4" #20 prio=5 os_prio=0 tid=0x7fb814be3000 nid=0x19bf
>>> runnable [0x7fb7cb4f6000]
>>>  java.lang.Thread.State: RUNNABLE
>>>   at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>>>   at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>>>   at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.
>> java:93)
>>>   at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>>>   - locked <0x000701c50c98> (a sun.nio.ch.Util$3)
>>>   - locked <0x000701c50c88> (a java.util.Collections$
>>> UnmodifiableSet)
>>>   - locked <0x000701c4f6a8> (a sun.nio.ch.EPollSelectorImpl)
>>>   at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
>>>   at org.apache.kafka.common.network.Selector.select(
>>> Selector.java:489)
>>>   at org.apache.kafka.common.network.Selector.poll(
>> Selector.java:298)
>>>   at org.apache.kafka.clients.NetworkClient.poll(
>>> NetworkClient.java:349)
>>>   at org.apache.kafka.clients.consumer.internals.
>>> ConsumerNetworkClient.poll(ConsumerNetworkClient.java:226)
>>>   - locked <0x000701c5da48> (a org.apache.kafka.clients.
>>> consumer.internals.ConsumerNetworkClient)
>>>   at org.apache.kafka.clients.consumer.internals.
>>> ConsumerNetworkClient.poll
