Re: Insanely long recovery time with Kafka 0.11.0.2

2018-01-11 Thread James Cheng
We saw this as well, when updating from 0.10.1.1 to 0.11.0.1. Have you restarted your brokers since then? Did it take 8h to start up again, or did it take its normal 45 minutes? I don't think it's related to the crash/recovery. Rather, I think it's due to the upgrade from 0.10.1.1 to 0.11.0.1

Re: Capturing and storing these Kafka events for query.

2018-01-11 Thread Hans Jespersen
Another approach would be to create the query first (in something like KSQL) and then send the Kafka data through the pre-existing streaming query. In this case the results would be going into various result topics. Tools like KSQL also let you query historical data but you need to be sure

Re: Does MirrorMaker ensures exactly-once delivery across clusters?

2018-01-11 Thread Matthias J. Sax
From a transaction point of view yes. However, the MirrorMaker consumer must know to read its offsets from the target cluster instead of the source cluster, and this is quite unnatural for a consumer... So it's a little bit trickier than just piggybacking commits on the producer... -Matthias

Re: mirror maker producer thread dies with error

2018-01-11 Thread Manikumar
"Memory records is not writable" error was fixed in 0.10.0.0 release https://issues.apache.org/jira/browse/KAFKA-3594 On Fri, Jan 12, 2018 at 6:10 AM, Sunil Parmar wrote: > We see multiple instance of this error > > 2017-12-23 05:30:53,722 WARN >

Re: Does MirrorMaker ensures exactly-once delivery across clusters?

2018-01-11 Thread Stephane Maarek
One could refactor MirrorMaker to commit the source cluster's offsets in the target cluster instead (in a special topic). This would technically allow achieving exactly-once using the Transactional API. But there's work associated with that. Let me know if I’m missing something On

[DISCUSS] KIP-247: Add public test utils for Kafka Streams

2018-01-11 Thread Matthias J. Sax
Dear Kafka community, I want to propose KIP-247 to add public test utils to the Streams API. The goal is to simplify testing of Kafka Streams applications. Please find details in the wiki: https://cwiki.apache.org/confluence/display/KAFKA/KIP-247%3A+Add+public+test+utils+for+Kafka+Streams This

Re: Kafka 1.0 upgrade

2018-01-11 Thread Brett Rann
We run a 1.0.1 prerelease in production just fine, but the scale is smaller. 20+ clusters with 3-10 brokers each, each cluster with about 120 topics and about 15k partitions. We have unusual message sizes, so peaks of around 40k messages, 60MB in, 400MB out, per sec in the largest one. We run a

mirror maker producer thread dies with error

2018-01-11 Thread Sunil Parmar
We see multiple instances of this error 2017-12-23 05:30:53,722 WARN org.apache.kafka.clients.producer.internals.Sender: Got error produce response with correlation id 41987642 on topic-partition events-14, retrying (2147482899 attempts left). Error: NOT_LEADER_FOR_PARTITION Followed by this

Re: Kafka per topic retention.bytes and global log.retention.bytes not working

2018-01-11 Thread Thunder Stumpges
Thanks, yes we upgraded to 1.0.0 and that has indeed fixed the issue. Thanks for the pointer! -Thunder On Tue, Jan 9, 2018 at 9:50 PM Wim Van Leuven < wim.vanleu...@highestpoint.biz> wrote: > Upgrade? > > On Wed, Jan 10, 2018, 00:26 Thunder Stumpges > wrote: > > >

Kafka 1.0 upgrade

2018-01-11 Thread Tolga Can
Hi, We are in the process of upgrading our Kafka cluster from 0.9 to 1.0. We will need a cluster similar to LinkedIn’s busiest clusters as described here, 50K partitions 50 VMs. I could not find enough information about if the latest

Re: Capturing and storing these Kafka events for query.

2018-01-11 Thread Manoj Khangaonkar
Hi, If I understood the question correctly, then the better approach is to consume events from the topic and store them in your favorite database. Then query the database as needed. Querying the topic for messages in Kafka is not recommended as that will be a linear search. regards On Thu, Jan 11,
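The consume-then-store approach described above can be sketched roughly as follows. This is only an illustration, not code from the thread: it uses SQLite from the Python standard library as a stand-in for "your favorite database", the table layout and the function names (open_store, store_events, events_for_key) are assumptions made up for the example, and the records are plain tuples rather than the output of a real Kafka consumer.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the event store. Schema is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS events (
            topic         TEXT    NOT NULL,
            partition_id  INTEGER NOT NULL,
            record_offset INTEGER NOT NULL,
            event_key     TEXT,
            event_value   TEXT,
            PRIMARY KEY (topic, partition_id, record_offset)
        )
        """
    )
    # The index is what turns a lookup by key into something other than a
    # linear scan over everything that was ever consumed.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS events_by_key ON events (event_key)"
    )
    return conn


def store_events(conn, records):
    """Store (topic, partition, offset, key, value) tuples.

    INSERT OR IGNORE plus the primary key makes redelivered records harmless:
    storing the same topic/partition/offset twice keeps a single row.
    """
    with conn:  # one database transaction per batch
        conn.executemany(
            "INSERT OR IGNORE INTO events VALUES (?, ?, ?, ?, ?)", records
        )


def events_for_key(conn, key):
    """Return all stored values for a key, in partition/offset order."""
    rows = conn.execute(
        "SELECT event_value FROM events WHERE event_key = ? "
        "ORDER BY partition_id, record_offset",
        (key,),
    )
    return [row[0] for row in rows]
```

In a real deployment the tuples would come from whatever consumer loop reads the topic, and consumer offsets would be committed only after the batch has been stored.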

Re: Does MirrorMaker ensures exactly-once delivery across clusters?

2018-01-11 Thread Matthias J. Sax
No. Transactions are designed to work within a single cluster, not cross cluster, i.e., if you have a read-process-write pattern similar to what Kafka Streams does. -Matthias On 1/11/18 12:46 AM, Jiri Humpolicek wrote: > Hi Everyone, > > since Kafka 0.11.x supports exactly-once semantics, I

Re: Insanely long recovery time with Kafka 0.11.0.2

2018-01-11 Thread Vincent Rischmann
If anyone else has any idea, I'd love to hear it. Meanwhile, I'll resume upgrading my brokers and hope it doesn't crash and/or take so much time for recovery. On Sat, Jan 6, 2018, at 7:25 PM, Vincent Rischmann wrote: > Hi, > > just to clarify: this is the cause of the crash >

Does MirrorMaker ensures exactly-once delivery across clusters?

2018-01-11 Thread Jiri Humpolicek
Hi Everyone, since Kafka 0.11.x supports exactly-once semantics, I want to be sure that it is possible to achieve exactly-once delivery across Kafka clusters using MirrorMaker. We have got two locations with a "primary" cluster in each location and for each location we have got one "aggregation"

Capturing and storing these Kafka events for query.

2018-01-11 Thread Maria Pilar
Hi all, I have a requirement to be able to capture and store events for query, and I'm trying to choose the best option for that: 1) Capture the events from a separate topic and store the events as state, in order to convert a stream to a table, which means materializing the stream. The option for it,

RE: Best practice for publishing byte messages to Kafka

2018-01-11 Thread Tauzell, Dave
Whatever you use I recommend some sort of wrapper since Kafka doesn't support any sort of metadata (like the version of the serialization format). -Dave -Original Message- From: Matt Farmer [mailto:m...@frmr.me] Sent: Thursday, January 11, 2018 8:56 AM To: users@kafka.apache.org Subject:
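One way such a wrapper could look is sketched below. It is a minimal illustration, not something posted in the thread: the header layout (a two-byte marker, a one-byte serialization format id and a two-byte schema version), the marker value b"KW" and the function names wrap and unwrap are all assumptions chosen for the example.

```python
import struct

# Hypothetical envelope layout: 2-byte marker, 1-byte format id,
# 2-byte schema version, all big-endian, followed by the raw payload.
MAGIC = b"KW"
HEADER = struct.Struct(">2sBH")


def wrap(payload: bytes, format_id: int, schema_version: int) -> bytes:
    """Prefix an already-serialized payload with the envelope header."""
    return HEADER.pack(MAGIC, format_id, schema_version) + payload


def unwrap(message: bytes):
    """Split a wrapped message into (format_id, schema_version, payload)."""
    if len(message) < HEADER.size:
        raise ValueError("message is shorter than the envelope header")
    magic, format_id, schema_version = HEADER.unpack_from(message)
    if magic != MAGIC:
        raise ValueError("message does not start with the expected marker")
    return format_id, schema_version, message[HEADER.size:]
```

A consumer would call unwrap first and pick the deserializer (Thrift, Kryo, Protobuf, and so on) from the format id and schema version, which lets producers change the serialization format without breaking older readers.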

Re: Best practice for publishing byte messages to Kafka

2018-01-11 Thread Matt Farmer
We use Thrift and Kryo, and we haven’t done any formal analysis recently. Performance numbers for those should be easy to find by googling around. I can say that we push tens of thousands of messages per second at peak and serialization hasn’t been a cause of production lag in the entire year

Re: Best practice for publishing byte messages to Kafka

2018-01-11 Thread Ron Arts
Benchmarks can be found here: https://eng.uber.com/trip-data-squeeze/ -- Ron 2018-01-09 14:12 GMT+01:00 Ali Nazemian : > Hi All, > > I was wondering whether there is any best practice/recommendation for > publishing byte messages to Kafka. Is there any specific

Re: Best practice for publishing byte messages to Kafka

2018-01-11 Thread Xin Li
Protobuf? On 11.01.18, 01:33, "Ali Nazemian" wrote: Oops, I was mistaken. I meant serialization of an object as a byte array in the first place! On Wed, Jan 10, 2018 at 3:20 PM, Thunder Stumpges < thunder.stump...@gmail.com> wrote: > Byte