Efficient Kafka batch processing

2016-12-10 Thread Dominik Safaric
Hi everyone, What are some of the most efficient ways to quickly consume, transform and process Kafka messages? Importantly, I am neither referring to nor interested in streams, because the Kafka topic from which I would like to process the messages will eventually stop receiving messages, after which I sho
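
One way to treat a topic that no longer receives messages as a finite batch is to record each partition's end offset up front and stop once every partition has been read up to that point. The sketch below only models the idea with in-memory lists instead of a real client; `drain`, `process` and `max_batch` are hypothetical names, not part of any Kafka API. With the Java consumer, the snapshot would presumably come from `KafkaConsumer.endOffsets()`.

```python
from typing import Callable, Dict, List


def drain(partitions: Dict[int, List[str]],
          process: Callable[[List[str]], None],
          max_batch: int = 500) -> int:
    """Read every partition up to the end offset observed at start.

    `partitions` maps a partition id to its log (a plain list here,
    standing in for a real topic partition).
    """
    # Snapshot of end offsets; a real client would ask the broker for these.
    end_offsets = {p: len(log) for p, log in partitions.items()}
    positions = {p: 0 for p in partitions}
    total = 0

    # Keep fetching until every partition has reached its snapshot offset.
    while any(positions[p] < end_offsets[p] for p in partitions):
        for p, log in partitions.items():
            start = positions[p]
            stop = min(start + max_batch, end_offsets[p])
            if start == stop:
                continue  # this partition is already fully read
            batch = log[start:stop]
            process(batch)  # transform/process one batch at a time
            positions[p] = stop
            total += len(batch)
    return total
```

Bounding the loop by a snapshot, rather than waiting for a poll to come back empty, avoids guessing how long to wait before deciding the topic is exhausted.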

Re: Oversized Message 40k

2016-11-22 Thread Dominik Safaric
Big is a relative term. And the question you ask is quite difficult to answer because no other information is available, including the configuration of the Kafka cluster, hardware specification, etc. I suggest the following: (1) read a couple of benchmarks such as [1], (2) investigate o

Kafka consumer offset uniqueness

2016-11-14 Thread Dominik Safaric
Hi all, I've been wondering: is the offset obtained with ConsumerRecord<>().offset() always unique for each partition? I'm asking because, while running a consumer group, I've observed offset values equal to zero more times than there are Kafka partitions
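
Offsets are assigned per partition, so the same offset value (for example 0) can legitimately appear once in every partition of every topic. A record is therefore identified by the triple (topic, partition, offset) rather than by the offset alone. The sketch below, using a hypothetical `Record` tuple instead of a real `ConsumerRecord`, counts how often each triple is seen; a triple appearing more than once would suggest redelivery (for instance after an offset reset) rather than non-unique offsets, though that diagnosis is an assumption about the setup described here.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Record(NamedTuple):
    # Hypothetical stand-in for the fields of a consumed record.
    topic: str
    partition: int
    offset: int


def duplicate_keys(records: Iterable[Record]) -> List[Tuple[str, int, int]]:
    """Return every (topic, partition, offset) triple seen more than once."""
    counts = Counter((r.topic, r.partition, r.offset) for r in records)
    return [key for key, n in counts.items() if n > 1]
```

Logging the topic and partition next to each offset is usually enough to tell these cases apart.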

Re: Modify in-flight messages

2016-11-01 Thread Dominik Safaric
.html#timestamp().

> On Tue, Nov 1, 2016 at 12:17 PM, Dominik Safaric wrote:
>> Is it possible to somehow modify the Kafka message payload before being sent
>> to the consumer for consumption? Such as for example adding a timestamp to
>> the current message payload indicating the time of message consumption.
>>
>> Dominik Šafarić

Modify in-flight messages

2016-11-01 Thread Dominik Safaric
Is it possible to somehow modify the Kafka message payload before being sent to the consumer for consumption? Such as for example adding a timestamp to the current message payload indicating the time of message consumption. Dominik Šafarić
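
If the goal is only to record when a message was consumed, one option is to add the timestamp on the consumer side right after the record is received, leaving the stored message untouched. The following is a minimal sketch under that assumption; `stamp_consumed` and the `consumed_at_ms` field are hypothetical names, and the payload is assumed to be UTF-8 text.

```python
import json
import time
from typing import Optional


def stamp_consumed(payload: bytes, now_ms: Optional[int] = None) -> bytes:
    """Wrap a received payload in an envelope carrying the consumption time."""
    if now_ms is None:
        # Wall-clock time at the consumer, in milliseconds since the epoch.
        now_ms = int(time.time() * 1000)
    envelope = {
        "consumed_at_ms": now_ms,
        "payload": payload.decode("utf-8"),  # assumes a UTF-8 text payload
    }
    return json.dumps(envelope).encode("utf-8")
```

Wrapping rather than rewriting the payload keeps the original bytes recoverable for downstream processing.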

Modify log message during consumer fetch

2016-10-31 Thread Dominik Safaric
Dear all, I am aware that Kafka is pull-based; however, I’ve been curious: is it possible to modify a message between the consumer's fetch request and the point at which the message is queued on the consumer side? Thanks in advance, Dominik

Increasing producer throughput

2016-10-29 Thread Dominik Safaric
Dear all, As my team is in the process of benchmarking several stream processing engines consuming data from Kafka, I’ve been looking into boosting the Kafka producer throughput. For running Kafka we use a single-node, single-broker configuration. Kafka heap size is set to 4GB. Al

Re: Benchmarking kafka performance

2016-09-22 Thread Dominik Safaric
> Good morning. Which benchmarking tools should we use to compare performance
> of 0.8 and 0.10 versions? Which metrics should we monitor?

It depends on your use case/requirements (if any). I suggest you take a look at a general high-level benchmark made by LinkedIn. They’ve focused on

Re: Kafka Producer performance - 400GB of transfer on single instance taking > 72 hours?

2016-08-25 Thread Dominik Safaric
Dear Dana,

> I would recommend other tools for bulk transfers.

What tools/languages would you recommend instead of Python? I could certainly accomplish the same using the native Java Kafka Producer API, but should this really affect the performance under the assumption that the Ka

Kafka Producer performance - 400GB of transfer on single instance taking > 72 hours?

2016-08-25 Thread Dominik Safaric
t is the “secret” behind setting an optimum number of log partitions? Can I improve the performance by increasing the number of IO threads, given my hardware configuration? By increasing e.g. the number of log partitions in order to increase throughput, is the log message consu
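
To put the figures from the subject line in perspective, the average rate implied by 400 GB in 72 hours can be worked out directly. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes); `avg_throughput_mb_s` is a hypothetical helper name.

```python
def avg_throughput_mb_s(total_gb: float, hours: float) -> float:
    """Average transfer rate in MB/s, using decimal units."""
    total_bytes = total_gb * 1e9   # 1 GB taken as 10^9 bytes
    seconds = hours * 3600         # 72 h = 259,200 s
    return total_bytes / seconds / 1e6


# 400 GB over 72 hours works out to roughly 1.54 MB/s.
rate = avg_throughput_mb_s(400, 72)
```

Since the transfer took more than 72 hours, roughly 1.54 MB/s is an upper bound on the average rate actually achieved.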