Re: KafkaStream: punctuate() never called even when data is received by process()

2016-11-23 Thread shahab
Thanks for the comments. @David: yes, I have a source which is reading data from two topics, and one of them was empty while the second one was loaded with plenty of data. So what do you suggest to solve this? Here is a snippet of my code: StreamsConfig config = new

Data (re)processing with Kafka (new wiki page)

2016-11-23 Thread Matthias J. Sax
Hi, we added a new wiki page that is supposed to collect data (re)processing scenarios with Kafka: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Data+(Re)Processing+Scenarios We have already added a couple of scenarios we think might be common, and we want to invite all of you to add

Re: Oversized Message 40k

2016-11-23 Thread Felipe Santos
Thanks guys. For your information, I will do some performance tests. On Wed, Nov 23, 2016 at 05:14, Ignacio Solis wrote: > At LinkedIn we have a number of use cases for large messages. We stick to > the 1MB message limit at the high end though. > > Nacho > > On Tue, Nov
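For reference, these are the main size-related knobs (a sketch; the values shown are the roughly-1MB defaults, and all three should be raised together when allowing larger messages):

# Broker: largest message the broker will accept
message.max.bytes=1000012
# Broker: replicas must be able to fetch the largest message
replica.fetch.max.bytes=1048576
# Producer: largest request the producer will send
max.request.size=1048576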

Re: KafkaStream: punctuate() never called even when data is received by process()

2016-11-23 Thread David Garcia
If you are consuming from more than one topic/partition, punctuate is triggered when the “smallest” time-value changes. So, if there is a partition that doesn’t have any more messages on it, it will always have the smallest time-value and that time value won’t change…hence punctuate never gets
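To make the mechanics concrete, here is a minimal low-level Processor sketch (0.10.x API; class name, types, and the interval are illustrative):

import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;

public class MyProcessor implements Processor<String, String> {
    private ProcessorContext context;

    @Override
    public void init(ProcessorContext context) {
        this.context = context;
        // Request punctuate() every 10 seconds of "stream time" (not wall-clock time).
        context.schedule(10000L);
    }

    @Override
    public void process(String key, String value) {
        // Per-record logic; arriving records are what advance stream time.
    }

    @Override
    public void punctuate(long timestamp) {
        // Only fires once the smallest time-value across all input
        // partitions has advanced past the scheduled interval.
    }

    @Override
    public void close() {}
}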

Re: kafka 0.10.1 ProcessorTopologyTestDriver issue with .map and .groupByKey.count

2016-11-23 Thread Matthias J. Sax
CACHE_MAX_BYTES_BUFFERING_CONFIG does not have any impact if you query the state. If you query it, you will always get the latest values. CACHE_MAX_BYTES_BUFFERING_CONFIG only affects the downstream KTable changelog stream (but you do not use this anyway). If I understand you correctly, if you
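For anyone following along, disabling the cache is a one-line config change; a minimal sketch assuming the 0.10.1 StreamsConfig API:

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
// 0 disables record caching, so every update is forwarded downstream immediately.
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);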

Re: KafkaStream: punctuate() never called even when data is received by process()

2016-11-23 Thread Matthias J. Sax
Your understanding is correct: Punctuate is not triggered based on wall-clock time, but based on an internally tracked "stream time" that is derived from the TimestampExtractor. Even if you use WallclockTimestampExtractor, "stream time" is only advanced if there are input records. Not sure why
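To illustrate where "stream time" comes from, a sketch of a custom TimestampExtractor (0.10.1 single-argument API; the fallback logic is illustrative only, not a recommendation):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

public class FallbackTimestampExtractor implements TimestampExtractor {
    @Override
    public long extract(ConsumerRecord<Object, Object> record) {
        // "Stream time" is the minimum of these extracted values across input
        // partitions, so it only advances when records actually arrive.
        long ts = record.timestamp();
        return ts >= 0 ? ts : System.currentTimeMillis();
    }
}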

Re: KafkaStreams KTable#through not creating changelog topic

2016-11-23 Thread Matthias J. Sax
> 1) Create a state store AND the changelog topic, 2) follow the Kafka Streams naming convention for changelog topics. > Basically, I want to have a method that does what .through() is supposed to do according to the documentation, but without the "topic" parameter. I understand what you are

Re: KafkaStreams KTable#through not creating changelog topic

2016-11-23 Thread Mikael Högqvist
Hi Michael, thanks for the extensive explanation, and yes, it definitely helps with my understanding of through(). :) You guessed correctly that I'm doing some "shenanigans" where I'm trying to derive the changelog of a state store from the state store name. This works perfectly fine with a

A strange controller log in Kafka 0.9.0.1

2016-11-23 Thread Json Tu
Hi, we have a Kafka 0.9.0.1 cluster with 3 nodes, and we found a strange controller log, as below. [2016-11-07 03:14:48,575] INFO [SessionExpirationListener on 100], ZK expired; shut down all controller components and try to re-elect

Re: KafkaStreams KTable#through not creating changelog topic

2016-11-23 Thread Michael Noll
> - Also, in case you want to do some shenanigans (like for some tooling you're building > around state stores/changelogs/interactive queries) such as detecting all state store changelogs > by doing the equivalent of `ls *-changelog`, then this will miss changelogs of KTables that are > created by

Re: KafkaStreams KTable#through not creating changelog topic

2016-11-23 Thread Michael Noll
Mikael, regarding your second question: > 2) Regarding the use case, the topology looks like this: > > .stream(...) > .aggregate(..., "store-1") > .mapValues(...) > .through(..., "store-2") The last operator above would, without the "..." ellipsis, be something like `KTable#through("through-topic",
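A possible completion of that topology, as a sketch only (topic names, store names, and value types are made up):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.kstream.KStreamBuilder;
import org.apache.kafka.streams.kstream.KTable;

KStreamBuilder builder = new KStreamBuilder();

KTable<String, Long> aggregated = builder
    .stream(Serdes.String(), Serdes.Long(), "input-topic")
    .groupByKey(Serdes.String(), Serdes.Long())
    .aggregate(() -> 0L, (key, value, agg) -> agg + value,
               Serdes.Long(), "store-1");

// through() writes to the given user topic and re-reads it into "store-2";
// that topic itself then serves as the changelog, so no "-changelog"
// topic is created for it.
KTable<String, Long> result = aggregated
    .mapValues(v -> v * 2)
    .through("through-topic", "store-2");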

Re: kafka 0.10.1 ProcessorTopologyTestDriver issue with .map and .groupByKey.count

2016-11-23 Thread Hamidreza Afzali
Thanks Matthias. Disabling the cache didn't solve the issue. Here's a code sample: https://gist.github.com/hrafzali/c2f50e7b957030dab13693eec1e49c13 The topology doesn't produce any results, but it works when commenting out .map(...) in line 21. Thanks, Hamid
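For context, a sketch of the topology shape in question (names are illustrative; 0.10.1 API). One thing worth checking: map() may change the key type and marks the stream for repartitioning through an internal topic, so groupByKey() typically needs explicit serdes here:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

KStream<String, String> input =
    builder.stream(Serdes.String(), Serdes.String(), "input-topic");

KTable<String, Long> counts = input
    .map((key, value) -> KeyValue.pair(value, value)) // re-key; triggers repartitioning
    .groupByKey(Serdes.String(), Serdes.String())     // explicit serdes after map()
    .count("counts-store");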

RE: Kafka consumers are not equally distributed

2016-11-23 Thread Ghosh, Achintya (Contractor)
No, that is not the reason. Initially all the partitions were assigned messages, and those were processed very fast; now those consumers sit idle even though other partitions still have a lot of messages to be processed. So I was under the impression that a rebalance should be triggered and messages would be
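For what it's worth, a rebalance is triggered by group membership or subscription changes, not by uneven load across partitions. A sketch (assuming an existing KafkaConsumer; topic name is illustrative) that makes assignment changes visible:

import java.util.Collection;
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;

// assuming an existing KafkaConsumer<String, String> consumer
consumer.subscribe(Collections.singletonList("my-topic"),
    new ConsumerRebalanceListener() {
        @Override
        public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
            System.out.println("Revoked: " + partitions);
        }
        @Override
        public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
            // Fires when members join, leave, or fail -- not when some
            // partitions simply have more backlog than others.
            System.out.println("Assigned: " + partitions);
        }
    });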

KafkaStream: punctuate() never called even when data is received by process()

2016-11-23 Thread shahab
Hello, I am using the low-level processor API and I set context.schedule(1), assuming that the punctuate() method is invoked every 10 sec. I have set configProperties.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG, WallclockTimestampExtractor.class.getCanonicalName()). Although data keeps

Re: Error "Unknown magic byte" occurred while deserializing Avro message

2016-11-23 Thread Dayong
As I remember, this error complains that the first byte of the message is not 0x00. I think the console producer does not support JSON since it uses a string schema. Thanks, Dayong > On Nov 23, 2016, at 4:28 AM, ZHU Hua B wrote: > > Hi, > > > We tried to produce and consume an Avro
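For context, the Confluent Avro serializer prefixes each message with a magic byte and a schema ID; a sketch of a check against that wire format:

import java.nio.ByteBuffer;

public class WireFormatCheck {
    // Confluent wire format: byte 0 = magic 0x00, bytes 1-4 = schema ID
    // (big-endian, as registered in the Schema Registry), remainder = Avro payload.
    public static int schemaIdOf(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        byte magic = buf.get();
        if (magic != 0x00) {
            // This is the condition behind the "Unknown magic byte!" error.
            throw new IllegalArgumentException("Unknown magic byte: " + magic);
        }
        return buf.getInt();
    }
}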

Messages intermittently get lost

2016-11-23 Thread Zac Harvey
I am playing around with Kafka and have a simple setup: * 1-node Kafka (Ubuntu) server * 3-node ZK cluster (each node on its own Ubuntu server) I have a consumer written in Scala and am using the kafka-console-producer (v0.10) that ships with the distribution. I'd say about 20% of the
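One common culprit for intermittent loss is the producer not waiting for broker acknowledgement. A hedged sketch of stricter settings for a test producer (0.10.x Java producer API; broker address and topic are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");   // wait until the full ISR has the record
props.put("retries", 3);    // retry transient send failures
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    producer.send(new ProducerRecord<>("test-topic", "key", "value"),
        (metadata, exception) -> {
            if (exception != null) exception.printStackTrace(); // surface failures
        });
}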

Error "Unknown magic byte" occurred while deserializing Avro message

2016-11-23 Thread ZHU Hua B
Hi, We tried to produce and consume an Avro message (ZooKeeper, broker and Schema Registry have been launched), but the error "Unknown magic byte" occurred while deserializing the Avro message. Did I miss anything? Thanks! From producer: # bin/kafka-avro-console-producer --broker-list localhost:9092

Re: Delete "kafka-logs"

2016-11-23 Thread Sachin Mittal
Maybe it's because of these settings: log.retention.check.interval.ms=1 log.retention.ms=1. These values are too low; try setting the check interval to at least 1000 ms. Better yet, set retention to 1 hr and keep the check interval at its default, which is 300 seconds. Also check the permissions of
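In server.properties terms, that suggestion would look something like this (values are examples only):

# Retain log segments for 1 hour instead of 1 ms
log.retention.ms=3600000
# Leave the cleanup check at its 5-minute default
log.retention.check.interval.ms=300000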