Re: Kafka stream join scenarios

2016-05-23 Thread Srikanth
Guozhang, I guess you are referring to a scenario where noOfThreads < totalNoOfTasks. We could have a KTable task and a KStream task running on the same thread, and a sleep would be counterproductive? On this note, will a Processor always run on the same thread? Are process() and punctuate() guaranteed

Re: Kafka Streams change log behavior

2016-05-23 Thread Manu Zhang
Well, that's something our user is asking for, so I'd like to learn how Kafka Streams has done it. FYI, Gearpump checkpoints the mapping of the event-time clock to the Kafka offset for replay. We are thinking about allowing users to configure a topic to recover from after an application upgrade. Meanwhile,

Re: Kafka stream join scenarios

2016-05-23 Thread Guozhang Wang
Srikanth, Note that the same thread may be used for fetching both the "semi-static" KTable stream as well as the continuous KStream stream, so sleep-on-no-match may not work. I think setting timestamps for this KTable to make sure its values are smaller than the KStream stream's will work, and there
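A minimal sketch of the timestamp idea, assuming the single-argument TimestampExtractor interface from the 0.10-era Streams API; the topic name "table-dump" and the choice of 0L are placeholders. The extractor would be registered via StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

// Pins all records from the "table" topic to timestamp 0 so the KTable
// side always sorts as older than the KStream side.
public class TableFirstTimestampExtractor implements TimestampExtractor {
    @Override
    public long extract(ConsumerRecord<Object, Object> record) {
        if (record.topic().equals("table-dump")) {   // assumed table topic name
            return 0L;                               // always "older" than stream records
        }
        return record.timestamp();                   // stream side keeps its own time
    }
}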

Re: Large kafka deployment on virtual hardware

2016-05-23 Thread Jens Rantil
Hi Jahn, How is the load on Zookeeper? How often are you committing your offsets? Could that be an issue? Cheers, Jens On Mon, 23 May 2016 at 18:12, Jahn Roux wrote: > I have a large Kafka deployment on virtual hardware: 120 brokers on 32gb > memory 8 core virtual machines.

Re: Kafka stream join scenarios

2016-05-23 Thread Srikanth
Thanks Guozhang & Matthias! For 1), it is going to be a common ask, so a DSL API will be good. For 2), the source for the KTable currently is a file. Think of it as a dump from a DB table. We are thinking of ways to stream updates from this table, but for now it's a new file every day or so. I

reading from kafka topic in reverse order

2016-05-23 Thread Vu Tuan Nguyen
Hi, I have a use case that would benefit from reading from a Kafka topic in reverse order. I saw some similar questions on this in the mailing list, but didn't see a way to do this discussed with the 0.9 consumer API. Is it possible to do this in the 0.9 consumer API? I don't need it to be
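There is no built-in reverse iteration, but the 0.9 consumer's seek primitives can emulate it by paging backwards through a partition. A sketch, assuming a single partition, a placeholder topic name, and the 0.9-era varargs seekToEnd signature:

import java.util.*;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;

public class ReverseReader {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("enable.auto.commit", "false");    // no group management needed
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0);  // placeholder topic
            consumer.assign(Collections.singletonList(tp));
            consumer.seekToEnd(tp);                  // 0.9-era varargs signature
            long end = consumer.position(tp);
            final long window = 500;                 // arbitrary page size
            while (end > 0) {
                long start = Math.max(0, end - window);
                consumer.seek(tp, start);
                List<ConsumerRecord<String, String>> page = new ArrayList<>();
                while (consumer.position(tp) < end) {
                    for (ConsumerRecord<String, String> r : consumer.poll(100)) {
                        if (r.offset() < end) page.add(r);   // drop records past the window
                    }
                }
                Collections.reverse(page);           // newest-first within this page
                for (ConsumerRecord<String, String> r : page) {
                    System.out.println(r.offset() + ": " + r.value());
                }
                end = start;                         // step the window backwards
            }
        }
    }
}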

Re: newbie: kafka 0.9.0.0 producer does not terminate after producer.close()

2016-05-23 Thread Andy Davidson
Thanks, Andy. From: Kamal C, Date: Sunday, May 22, 2016 at 8:36 AM, Subject: Re: newbie: kafka 0.9.0.0 producer does not terminate after producer.close() > Andy, > > Kafka 0.9.0 server supports the

Kafka Scalability with the Number of Partitions

2016-05-23 Thread Yazeed Alabdulkarim
Hi, I am running simple experiments to evaluate the scalability of Kafka consumers with respect to the number of partitions. I assign every consumer to a specific partition. Each consumer polls the records in its assigned partition and prints the first one, then polls again from the offset of the
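A sketch of the per-partition setup described above, using manual assignment so no consumer-group rebalancing is involved; the topic name and bootstrap server are placeholders:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;

// One consumer pinned to one partition via assign(); the partition id
// would come from the experiment harness.
public class PartitionReader implements Runnable {
    private final int partition;
    public PartitionReader(int partition) { this.partition = partition; }

    @Override
    public void run() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("enable.auto.commit", "false");    // no group.id, so no commits
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singletonList(new TopicPartition("bench-topic", partition)));
            while (!Thread.currentThread().isInterrupted()) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                if (!records.isEmpty()) {
                    System.out.println("p" + partition + " first offset: "
                            + records.iterator().next().offset());
                }
            }
        }
    }
}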

No message seen on console consumer after message sent by Java Producer

2016-05-23 Thread Ninad Phalak
I created a Java producer on my local machine and set up a Kafka broker on another machine, say M2, on the network (I can ping, SSH, and connect to this machine). On the producer side, in the Eclipse console, I get "Message sent". But when I check the console consumer on machine M2, I cannot see those
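An asynchronous send() followed by a "Message sent" log proves nothing about delivery; one way to surface the real error is to block on the Future that send() returns. A sketch with placeholder host and topic names (on 0.8/0.9 brokers, a mismatched advertised.host.name on M2 is a common cause of exactly this symptom):

import java.util.Properties;
import org.apache.kafka.clients.producer.*;

public class DeliveryCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "M2:9092");   // the remote broker
        props.put("acks", "all");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            RecordMetadata md = producer.send(
                    new ProducerRecord<>("test-topic", "key", "value")).get(); // throws on failure
            System.out.println("Delivered to " + md.topic() + "-" + md.partition()
                    + " @ offset " + md.offset());
        } catch (Exception e) {
            e.printStackTrace();                     // e.g. a timeout if M2 is unreachable
        }
    }
}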

Kafka running on Ceph

2016-05-23 Thread Connie Yang
Hi All, Does anyone have any performance metrics for running Kafka on Ceph? I briefly gathered at the 2016 Kafka Summit that there's ongoing work between the Kafka community and Red Hat on getting Kafka running successfully on Ceph. Is this correct? If so, what's the timeline for that? Thanks

Re: Kafka stream join scenarios

2016-05-23 Thread Guozhang Wang
Hi Srikanth, How do you define whether the "KTable is read completely" to get to the "current content"? Since, as you said, that table is not purely static but still has a (maybe low-traffic) update stream, I guess "catch up to current content" still depends on some timestamp? BTW, about 1), we are

Re: Kafka Streams change log behavior

2016-05-23 Thread Guozhang Wang
Hi Manu, That is a good question. Currently, if users change their application logic while upgrading, the intermediate state stores may not be valid anymore, and hence neither is the change log. In this case we need to wipe out the invalid internal data and restart from scratch. We are working on a better

Re: Schema registry question

2016-05-23 Thread Avi Flax
Mike, the schema registry you’re using is a part of Confluent’s platform, which builds upon but is not affiliated with Kafka itself. You might want to post your question to Confluent’s mailing list: https://groups.google.com/d/forum/confluent-platform HTH, Avi On 5/20/16, 21:40, "Mike

Large kafka deployment on virtual hardware

2016-05-23 Thread Jahn Roux
I have a large Kafka deployment on virtual hardware: 120 brokers on 32 GB memory, 8-core virtual machines. Gigabit network, RHEL 6.7. 4 topics, 1200 partitions each, replication factor of 2, running Kafka 0.8.1.2. We are running into issues where our cluster is not keeping up. We have 4 sets

Re: Kafka stream join scenarios

2016-05-23 Thread Srikanth
Matthias, For (2), how do you achieve this using transform()? Thanks, Srikanth On Sat, May 21, 2016 at 9:10 AM, Matthias J. Sax wrote: > Hi Srikanth, > > 1) there is no support on DSL level, but if you use Processor API you > can do "anything" you like. So yes, a

Re: Kafka Streams change log behavior

2016-05-23 Thread Manu Zhang
That's one case. Well, if I get it right, the change log's topic name is bound to the applicationId, so what I want to ask is how the change log works for scenarios like an application upgrade (with a new applicationId, I guess). Does the system handle that for the user? On Mon, May 23, 2016 at 9:21 PM Matthias

Re: Kafka Streams change log behavior

2016-05-23 Thread Matthias J. Sax
If I understand correctly, you want to have a non-Kafka-Streams consumer read a change-log topic that was written by a Kafka Streams application? That is certainly possible. Kafka is agnostic to Kafka Streams, i.e., all topics are regular topics and can be read by any consumer. -Matthias On
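A sketch of such an external consumer, assuming the "<applicationId>-<storeName>-changelog" naming convention for internal change-log topics; all names are placeholders, and the String deserializers are an assumption (they must match the serdes the Streams app used for the store):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.*;

public class ChangelogReader {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "changelog-reader");
        props.put("auto.offset.reset", "earliest");  // replay the compacted log from the start
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-app-my-store-changelog"));
            while (true) {
                for (ConsumerRecord<String, String> r : consumer.poll(1000)) {
                    // a null value is a tombstone (deleted key) in a compacted topic
                    System.out.println(r.key() + " -> " + r.value());
                }
            }
        }
    }
}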

Re: kafka 0.8.2 broker behaviour

2016-05-23 Thread Gerard Klijs
Are you sure the consumers are always up? When they are behind, they could generate a lot of traffic in a small amount of time. On Mon, May 23, 2016 at 9:11 AM Anishek Agarwal wrote: > additionally all the read / writes are happening via storm topologies. > > On Mon, May 23, 2016

Re: Kafka Producer and Buffer Issues

2016-05-23 Thread Tom Crayford
It puts things on several internal queues. I'd benchmark what kind of rates you're looking at - we handily do a few hundred thousand per second per process with a 2GB JVM heap. On Mon, May 23, 2016 at 1:31 PM, Joe San wrote: > When you say Threadsafe, I assume that the

Re: Kafka Producer and Buffer Issues

2016-05-23 Thread Joe San
When you say thread-safe, I assume that the calls to the send method are synchronized? If that is the case, I see this as a bottleneck. We have a high-frequency system, and the frequency at which calls are made to the send method is very high. This was the reason why I came up with multiple

Re: Kafka Producer and Buffer Issues

2016-05-23 Thread Tom Crayford
That's accurate. Why are you creating so many producers? The Kafka producer is thread safe and *should* be shared to take advantage of batching, so I'd recommend just having a single producer. Thanks Tom Crayford Heroku Kafka On Mon, May 23, 2016 at 10:41 AM, Joe San
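A sketch of the single-shared-producer pattern recommended above; send() is asynchronous (it appends to the client's internal batches), so sharing one instance across threads does not serialize callers on the network. Names and broker addresses are placeholders:

import java.util.Properties;
import org.apache.kafka.clients.producer.*;

// One process-wide producer shared by all threads.
public final class SharedProducer {
    private static final KafkaProducer<String, String> PRODUCER = create();

    private static KafkaProducer<String, String> create() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092,localhost:9032");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return new KafkaProducer<>(props);
    }

    // Any thread may call this; batching happens inside the client.
    public static void send(String topic, String key, String value) {
        PRODUCER.send(new ProducerRecord<>(topic, key, value), (metadata, e) -> {
            if (e != null) {
                e.printStackTrace();                 // surface asynchronous failures
            }
        });
    }
}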

Re: Kafka event logging

2016-05-23 Thread Tom Crayford
Hi there, You could probably wrangle this with log4j and filters. A single broker doesn't really have a consistent view of whether "the cluster goes down", so it'd be hard to log that, but you could write some external monitoring that checks brokers are up via JMX and logs from there. Thanks Tom
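A minimal sketch of the external JMX probe idea, assuming each broker was started with the JMX_PORT environment variable set; the hostnames and port are placeholders:

import javax.management.MBeanServerConnection;
import javax.management.remote.*;

// Liveness probe: tries to open a JMX connection to each broker and logs
// an audit line when one is unreachable.
public class BrokerJmxCheck {
    public static void main(String[] args) {
        String[] brokers = {"broker1", "broker2", "broker3"};
        for (String host : brokers) {
            try {
                JMXServiceURL url = new JMXServiceURL(
                        "service:jmx:rmi:///jndi/rmi://" + host + ":9999/jmxrmi");
                try (JMXConnector c = JMXConnectorFactory.connect(url)) {
                    MBeanServerConnection conn = c.getMBeanServerConnection();
                    System.out.println(host + " is up, mbeans: " + conn.getMBeanCount());
                }
            } catch (Exception e) {
                System.err.println("AUDIT: broker " + host + " unreachable: " + e.getMessage());
            }
        }
    }
}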

Kafka event logging

2016-05-23 Thread Ghosh, Prabal Kumar
Hi, I am working on a 3-node Kafka broker BOSH release. I want the Kafka broker events to be saved somewhere for audit logging. By Kafka events, I mean Kafka connection and disconnection logs. Also, if a node in the Kafka cluster goes down, the event should be logged. Is there any

Kafka Producer and Buffer Issues

2016-05-23 Thread Joe San
In one of our applications, we have the following setting:

# kafka configuration
# ~
kafka {
  # comma-separated list of brokers
  # e.g., "localhost:9092,localhost:9032"
  brokers = "localhost:9092,localhost:9032"
  topic = "asset-computed-telemetry"
  isEnabled = true
  # for a detailed

Re: Kafka Streams change log behavior

2016-05-23 Thread Manu Zhang
Thanks Matthias. Is there a way to allow users to read change logs from a previous application? On Mon, May 23, 2016 at 3:57 PM Matthias J. Sax wrote: > Hi Manu, > > Yes. If a StreamTask recovers, it will write to the same change log's > topic partition. Log compaction

Re: KafkaStreams / Processor API - How to retrieve offsets

2016-05-23 Thread Matthias J. Sax
Hi, you can use the "ProcessorContext" that is provided via init(). The context object is dynamically adjusted to the currently processed record. To get the offset, use "context.offset()". -Matthias On 05/22/2016 10:38 PM, Florian Hussonnois wrote: > Hi everyone, > > Is there any way to retrieve
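A sketch of that pattern against the 0.10-era Processor interface; the offset-tagging logic is just an illustration:

import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;

// Tags each value with the offset of the record currently being processed.
public class OffsetTaggingProcessor implements Processor<String, String> {
    private ProcessorContext context;

    @Override
    public void init(ProcessorContext context) {
        this.context = context;                  // keep the handle; it tracks the current record
    }

    @Override
    public void process(String key, String value) {
        long offset = context.offset();          // offset of the record being processed
        context.forward(key, value + "@" + offset);
    }

    @Override
    public void punctuate(long timestamp) { }    // nothing scheduled

    @Override
    public void close() { }
}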

Re: Kafka Streams change log behavior

2016-05-23 Thread Matthias J. Sax
Hi Manu, Yes. If a StreamTask recovers, it will write to the same change-log topic partition. Log compaction is enabled by default for those topics. You still might see some duplicates in your output. Currently, Kafka Streams guarantees at-least-once processing (exactly-once processing is on

Re: kafka 0.8.2 broker behaviour

2016-05-23 Thread Anishek Agarwal
Additionally, all the reads/writes are happening via Storm topologies. On Mon, May 23, 2016 at 12:17 PM, Anishek Agarwal wrote: > Hello, > > we are using 4 kafka machines in production with 4 topics and each topic > either 16/32 partitions and replication factor of 2. each

kafka 0.8.2 broker behaviour

2016-05-23 Thread Anishek Agarwal
Hello, we are using 4 Kafka machines in production with 4 topics; each topic has either 16 or 32 partitions and a replication factor of 2. Each machine has 3 disks for Kafka logs. We see a strange behaviour: high disk usage spikes on one of the disks on all machines. It varies over time,