Re: Prioritized Topics for Kafka

2019-01-17 Thread Jeff Widman
Use case: I work for a company that ingests events that come from both real-time sources (which spike during the day) and historical log data. We want the real-time data processed in minutes, and the historical log data processed within hours. The consumer's business logic is the same. Our

Re: rebalancing latency spikes on high throughput kafka-streams services

2019-01-17 Thread Guozhang Wang
Hello Javier, I read your SO thread before I noticed the question here, so I've already answered it on SO. For reference, for other readers interested in this thread:

Re: Prioritized Topics for Kafka

2019-01-17 Thread Subhash Sriram
Use case: we process documents from a variety of sources. We want to process some of these sources in a priority order, but we don’t want to necessarily finish all the higher priority sources before going to lower priority because the volume of higher priority sources can be extremely high. We
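The "priority order without finishing all higher-priority sources first" requirement above describes weighted polling rather than strict priority. The sketch below is a minimal illustration under stated assumptions: plain in-memory `deque`s stand in for per-source topics, and `weighted_drain` and its `weights` are hypothetical names, not a Kafka client API. Each round takes up to `weights[i]` items from source `i`, so high-priority sources get more throughput while low-priority sources still make progress.

```python
from collections import deque


def weighted_drain(queues, weights):
    """Drain several in-memory queues (hypothetical stand-ins for topics),
    taking up to weights[i] items from queues[i] per round.

    Higher weights get more throughput, but every non-empty queue with a
    weight >= 1 is served each round, so no source is starved.
    """
    if any(w < 1 for w in weights):
        raise ValueError("every weight must be >= 1, or that queue never drains")
    order = []
    while any(queues):  # an empty deque is falsy
        for q, w in zip(queues, weights):
            # Take at most w items from this source in this round.
            for _ in range(min(w, len(q))):
                order.append(q.popleft())
    return order


high = deque(["h0", "h1", "h2", "h3", "h4", "h5"])
low = deque(["l0", "l1", "l2"])
print(weighted_drain([high, low], [3, 1]))
# ['h0', 'h1', 'h2', 'l0', 'h3', 'h4', 'h5', 'l1', 'l2']
```

With weights `[3, 1]`, the low-priority source still gets one item per round even while the high-priority backlog is large, which is the non-starvation property the use case asks for.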

Re: Prioritized Topics for Kafka

2019-01-17 Thread Ryanne Dolan
Nick, I think it's worth noting that Kafka is not a real-time system, and most of these use cases (and TBH any that I can imagine) for prioritized topics are real-time use cases. For example, you wouldn't want to pilot a jet (a canonical real-time system) based on Kafka events, as there is no

Re: Prioritized Topics for Kafka

2019-01-17 Thread Michal Michalski
Hi, This sounds like a great idea, and thanks for reaching out for feedback. Here are two use cases I've worked on that I'd seriously consider using such feature for: 1. Priority Republish of Data - in an event driven system, there's a "republish" functionality used for e.g. fixing data affected

RE: Prioritized Topics for Kafka

2019-01-17 Thread Tim Ward
Use cases: processing alerts. High priority alerts ("a large chunk of your system has stopped providing service, immediate action essential") should be processed before low priority alerts ("some minor component has put out a not-very serious warning, somebody should probably have a look at it

Re: Prioritized Topics for Kafka

2019-01-17 Thread Tobias Adamson
Use cases: prioritise current data. When processing messages, sometimes there is a need to reprocess old data. It would be nice to be able to send the old data as messages to a separate topic that would only be processed when the current topic doesn't have any messages left to process. This
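The "only process old data when the current topic is empty" behaviour is strict priority. A minimal sketch, assuming in-memory `deque`s as hypothetical stand-ins for the current and backlog topics (`next_message` is an illustrative name, not a Kafka API): the backlog is read only when `current` is empty, and any newly arrived current message preempts the remaining backlog.

```python
from collections import deque


def next_message(current, backlog):
    """Strict priority: return the next message from `current` if it has
    any, otherwise fall back to `backlog`; return None when both are empty.

    `current` and `backlog` are hypothetical in-memory stand-ins for the
    real-time topic and the reprocessing topic.
    """
    if current:
        return current.popleft()
    if backlog:
        return backlog.popleft()
    return None


cur = deque(["c1", "c2"])
old = deque(["o1", "o2"])
order = [next_message(cur, old) for _ in range(3)]  # c1, c2, then o1
cur.append("c3")                       # new real-time data arrives
order.append(next_message(cur, old))   # c3 preempts the remaining backlog
order.append(next_message(cur, old))   # o2
print(order)  # ['c1', 'c2', 'o1', 'c3', 'o2']
```

The trade-off is that the backlog can starve indefinitely if current data never stops arriving. With the Java client, a comparable effect is often approximated by calling `KafkaConsumer.pause()` on the backlog partitions while current data is flowing and `resume()` once it goes idle.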

Re: Total Volume metrics of Kafka

2019-01-17 Thread Gabriele Paggi
On Thu, 17 Jan 2019 at 00:44, Peter Bukowinski wrote:
> On each broker, we have a process (scheduled with cron) that polls the
> kafka jmx api every 60 seconds. It sends the metrics data to graphite (
> https://graphiteapp.org). We have graphite configured as a data source
> for grafana
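The setup quoted above ends with metrics being sent to Graphite. As a rough sketch of that last hop, the code below formats values using Graphite's plaintext protocol (`<metric.path> <value> <unix_timestamp>` per line) and sends them over TCP. Assumptions: the JMX collection step is not shown, the metric names and the `kafka.broker1` prefix are hypothetical, and port 2003 is Carbon's conventional plaintext port, which may differ in a given install.

```python
import socket
import time


def to_graphite_lines(prefix, metrics, timestamp=None):
    """Format a dict of metric values as Graphite plaintext-protocol lines:
    '<prefix>.<name> <value> <unix_timestamp>\\n' (sorted by name)."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return "".join(
        f"{prefix}.{name} {value} {ts}\n" for name, value in sorted(metrics.items())
    )


def send_to_graphite(payload, host="localhost", port=2003):
    """Send a payload to a Carbon plaintext listener (host/port assumed)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))


# Hypothetical values, e.g. as read from broker JMX by a cron job every 60 s.
payload = to_graphite_lines(
    "kafka.broker1", {"BytesInPerSec": 1024.5, "MessagesInPerSec": 300}, timestamp=1547683200
)
print(payload, end="")
# kafka.broker1.BytesInPerSec 1024.5 1547683200
# kafka.broker1.MessagesInPerSec 300 1547683200
```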

Re: rebalancing latency spikes on high throughput kafka-streams services

2019-01-17 Thread Raman Gupta
The first thing I'd take a look at is your `max.poll.records` setting. The default for streams is 1000 (see https://docs.confluent.io/current/streams/developer-guide/config-streams.html#default-values). Depending on your workloads, this could definitely cause long rebalances -- it did for me, but
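The reasoning behind checking `max.poll.records` can be put in numbers: a consumer takes part in a rebalance from inside `poll()`, so a member that just received a full batch has to finish processing it before it can rejoin, and the rest of the group waits for it. A rough estimate under an assumed, constant per-record processing time (the 5 ms figure and the function names are hypothetical):

```python
def worst_case_batch_seconds(max_poll_records, per_record_ms):
    """Rough upper bound on the time between two poll() calls when a full
    batch is returned and every record is processed before polling again."""
    return max_poll_records * per_record_ms / 1000.0


def max_records_for_budget(budget_seconds, per_record_ms):
    """Largest max.poll.records that keeps one batch within a time budget."""
    return int(budget_seconds * 1000 // per_record_ms)


print(worst_case_batch_seconds(1000, 5))  # 5.0 seconds per full batch
print(worst_case_batch_seconds(100, 5))   # 0.5 seconds per full batch
print(max_records_for_budget(2, 5))       # 400 records fit in a 2 s budget
```

Real per-record times vary, so this only bounds the typical case; the point is that the delay scales linearly with `max.poll.records`.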

Quickstart: Error with MinGW and one minor suggestion

2019-01-17 Thread Bernhard Molz
Hi, I'm just working through the Quickstart page (https://kafka.apache.org/quickstart). As I'm working on Windows, I use MinGW to emulate Linux.
1) On MinGW, I get this error:
me@computer MINGW64 /c/java/kafka_2.12
$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

Kafka Stream Best Node Module in NodeJs

2019-01-17 Thread marimuthu eee
Hi, I have a question about implementing Kafka Streams in Node.js: which npm module is best for Kafka Streams operations? My other question is how Kafka Streams does parallel processing for each Kafka topic partition.
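On the second question: Kafka Streams (a Java library) splits work into tasks, roughly one per input partition, and spreads those tasks over the stream threads (`num.stream.threads`) of all running application instances. The sketch below is a simplified model only, assuming a single input topic and plain round-robin placement; the real assignor also accounts for state stores and stickiness, and `assign_tasks` is a hypothetical name.

```python
def assign_tasks(num_partitions, instances, threads_per_instance):
    """Simplified model: one task per input partition, spread round-robin
    over every stream thread of every application instance.

    Returns {(instance, thread): [task ids]}, where a task id here is just
    the partition number it processes.
    """
    threads = [(i, t) for i in range(instances) for t in range(threads_per_instance)]
    assignment = {th: [] for th in threads}
    for p in range(num_partitions):
        assignment[threads[p % len(threads)]].append(p)
    return assignment


print(assign_tasks(6, instances=2, threads_per_instance=2))
# {(0, 0): [0, 4], (0, 1): [1, 5], (1, 0): [2], (1, 1): [3]}
print(assign_tasks(2, instances=1, threads_per_instance=4))
# {(0, 0): [0], (0, 1): [1], (0, 2): [], (0, 3): []}
```

The second call shows why the partition count caps parallelism: threads beyond the number of tasks have nothing to do.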