Memory Leak in Kafka

2018-01-23 Thread Avinash Herle
Hi, I'm using Kafka as a messaging system in my data pipeline. I have a couple of producer processes in my pipeline, plus Spark Streaming and Druid's Kafka indexing service

best practices for replication factor / partitions __consumer_offsets

2018-01-23 Thread Dennis
Hi, Are there any best practices on how to size __consumer_offsets and the associated replication factor? Regards, Dennis O.
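For reference, the internal offsets topic is sized by two broker settings, `offsets.topic.num.partitions` (default 50) and `offsets.topic.replication.factor` (default 3). A sketch of a `server.properties` fragment, assuming a cluster with at least three brokers; note these only take effect before the cluster first creates `__consumer_offsets`:

```properties
# server.properties -- internal __consumer_offsets topic sizing
# (defaults shown; must be set before the topic is first auto-created)
offsets.topic.num.partitions=50
offsets.topic.replication.factor=3
```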

Non-contiguous offset for consumer seek

2018-01-23 Thread namesuperwood
Hi all, Kafka version: kafka_2.11-0.11.0.2. A topic-partition "adn-tracking,15" in Kafka whose earliest offset is 1255644602 and latest offset is 1271253441. While starting a Spark Streaming job to process the data from the topic, we got an exception with "Got wrong record even after seeking

Re: Memory Leak in Kafka

2018-01-23 Thread Ted Yu
Did you attach two .png files? Please use a third-party site, since the attachment didn't come through. On Tue, Jan 23, 2018 at 5:20 PM, Avinash Herle wrote: > > Hi, > > I'm using Kafka as a messaging system in my data pipeline. I've a couple > of producer processes in my

Re: can't feed remote broker with producer demo

2018-01-23 Thread Manoj Khangaonkar
In your server.properties , in either the listeners or advertised.listeners , replace the localhost with the ip address. regards On Mon, Jan 22, 2018 at 7:16 AM, Rotem Jacobi wrote: > Hi, > When running the quickstart guide (producer, broker (with zookeeper) and > consumer
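A minimal `server.properties` sketch of that fix; the address `192.0.2.10` is a placeholder and should be replaced with the broker's actual IP or a hostname that is resolvable from the client machine:

```properties
# server.properties -- advertise a routable address instead of localhost
listeners=PLAINTEXT://0.0.0.0:9092
# clients connect to the advertised address, so it must be reachable remotely
advertised.listeners=PLAINTEXT://192.0.2.10:9092
```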

Re: [VOTE] KIP-247: Add public test utils for Kafka Streams

2018-01-23 Thread Matthias J. Sax
One minor change to the KIP. The class TopologyTestDriver will be in package `org.apache.kafka.streams` (instead of `o.a.k.streams.test`). +1 (binding). I am closing this vote as accepted with 3 binding votes (Damian, Guozhang, Matthias) and 2 non-binding votes (Bill, James). Thanks for the

Kafka Connect high CPU / Memory usage

2018-01-23 Thread Ziliang Chen
Hi, I ran Kafka Connect 0.11.2 (Kafka version) with only one instance and 6 tasks in it for a while; it then showed very high CPU usage (~250%) and memory consumption (~10 GB), and it reported consumer group rebalance warnings. After the Kafka Connect process was killed and restarted, it showed the same

Re: Contiguous Offsets on non-compacted topics

2018-01-23 Thread Justin Miller
Hi Matthias and Guozhang, Given that information, I think I’m going to try out the following in our data lake persisters (spark-streaming-kafka): https://issues.apache.org/jira/browse/SPARK-17147 Skipping one message out of 10+ billion a day

Re: Contiguous Offsets on non-compacted topics

2018-01-23 Thread Guozhang Wang
Hello Justin, There are actually multiple reasons that can cause non-contiguous offsets, or "holes", in the Kafka partition logs: 1. compaction, which you knew already; 2. when transactions are turned on, some offsets are actually taken by the "transaction marker" messages, which will not be exposed

Re: Merging Two KTables

2018-01-23 Thread Guozhang Wang
Hi Sameer, Dmitry: Just a side note that for KStream.merge(), we do not guarantee timestamp ordering, so the resulting KStream will likely be out of order with regard to timestamps. If you do want a merging operation that respects the timestamps of the input streams because you
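To make the distinction concrete, here is a small sketch (plain Python, not the Kafka Streams API) of what a timestamp-respecting merge of two already-ordered streams would produce, i.e. the behavior `KStream.merge()` does *not* guarantee and that one would have to build, e.g., via the Processor API:

```python
import heapq

def merge_by_timestamp(*streams):
    """Merge already-timestamp-ordered streams of (timestamp, value)
    records into one list ordered by timestamp."""
    return list(heapq.merge(*streams, key=lambda rec: rec[0]))

left = [(1, "a"), (4, "d")]
right = [(2, "b"), (3, "c")]
merged = merge_by_timestamp(left, right)
# merged is globally ordered by timestamp, interleaving both inputs
```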

Re: Contiguous Offsets on non-compacted topics

2018-01-23 Thread Matthias J. Sax
In general, offsets should be consecutive. However, this is no "official guarantee", and you should not build applications that rely on consecutive offsets. Also note that with Kafka 0.11 and transactions, commit/abort markers occupy an offset in the partition, and thus having "offset gaps" is normal
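A consumer-side sketch of what "not relying on consecutive offsets" means in practice: detect and tolerate gaps rather than treating them as data loss. This is illustrative Python over a plain list of offsets, not Kafka client code:

```python
def find_offset_gaps(offsets):
    """Return (first_missing, next_seen) pairs wherever consumed offsets
    jump -- e.g. because a transaction marker occupied an offset."""
    gaps = []
    for prev, curr in zip(offsets, offsets[1:]):
        if curr != prev + 1:
            gaps.append((prev + 1, curr))
    return gaps

# offsets as a consumer might see them when offset 102 holds a commit marker
seen = [100, 101, 103, 104]
gaps = find_offset_gaps(seen)  # one gap: offset 102 is missing
```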

Re: What is the best way to re-key a KTable?

2018-01-23 Thread Matthias J. Sax
Exactly. Because you might get multiple values with the same key, you need to specify an aggregation to get a single value for each key. -Matthias On 1/23/18 9:14 AM, Dmitry Minkovsky wrote: > KStream has a simple `#selectKey()` method, but it appears the only way to > re-key a KTable is by
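The collision problem can be shown with a plain-Python sketch (illustrative only, not the Kafka Streams API): re-keying a table means two old keys may map to the same new key, so a reducer is needed to collapse them to one value, mirroring `toStream().groupBy(...).reduce(...)`:

```python
def rekey_table(table, key_mapper, reducer):
    """Re-key a dict; collisions under the new key are resolved
    with the supplied reducer function."""
    out = {}
    for key, value in table.items():
        new_key = key_mapper(key, value)
        out[new_key] = reducer(out[new_key], value) if new_key in out else value
    return out

orders = {"order-1": 10, "order-2": 5, "order-3": 7}
# "order-2" and "order-3" collide under the key "small", so their
# values are combined by the reducer (here: summed)
by_bucket = rekey_table(orders,
                        lambda k, v: "small" if v < 10 else "big",
                        lambda a, b: a + b)
```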

Re: Merging Two KTables

2018-01-23 Thread Matthias J. Sax
Well. That is one possibility I guess. But some other way might be to "merge both values" into a single one... There is no "straight forward" best semantics IMHO. If you really need this, you can build it via Processor API. -Matthias On 1/23/18 7:46 AM, Dmitry Minkovsky wrote: >> Merging two

Re: Kafka/zookeeper logs in every command

2018-01-23 Thread Xin Li
Look for log4j.properties in the config directories. Best, Xin On 23.01.18, 16:58, "José Ribeiro" wrote: Good morning. I have a problem with Kafka logs showing in my outputs. When I started to work with Kafka, the outputs were
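As a sketch of the kind of change meant here: in recent Kafka distributions the CLI tools read `config/tools-log4j.properties`, and raising the root logger level suppresses the INFO/SLF4J noise on stdout (the exact file and appender name may differ between Kafka versions):

```properties
# config/tools-log4j.properties -- quiet the CLI tools' console output
# (appender name varies by version; keep whatever the file already defines)
log4j.rootLogger=WARN, stderr
```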

JSONSchema Kafka Connect Converter

2018-01-23 Thread Andrew Otto
Hi all, I’ve been thinking a lot recently about JSON and Kafka. Because JSON is not strongly typed, it isn’t treated as a first class citizen of the Kafka ecosystem. At Wikimedia, we use JSONSchema validated JSON for Kafka messages.

Adding company CICS to "Powered by" page?

2018-01-23 Thread Christoffer Ramm
Hi, I would like for Swedish software company CICS (https://www.cics.se) to be added to the “Powered by Kafka” reference page. Thanks in advance! Cheers, Chris CICS provides Customer Experience software to Communications Service Providers, and Apache Kafka is used as a queue for input data from

Regarding : Store stream for infinite time

2018-01-23 Thread Aman Rastogi
Hi All, We have a use case to store a stream for an infinite time (given we have enough storage). We are planning to solve this with Log Compaction. If each message key is unique and Log Compaction is enabled, it will store the whole stream for an infinite time. Just wanted to check if my assumption is

Re: Regarding : Store stream for infinite time

2018-01-23 Thread Aman Rastogi
Thanks Svante. Regards, Aman On Tue, Jan 23, 2018 at 11:38 PM, Svante Karlsson wrote: > Yes, it will store the last value for each key > > 2018-01-23 18:30 GMT+01:00 Aman Rastogi : > > > Hi All, > > > > We have a use case to store stream for

Re: Regarding : Store stream for infinite time

2018-01-23 Thread Svante Karlsson
Yes, it will store the last value for each key 2018-01-23 18:30 GMT+01:00 Aman Rastogi : > Hi All, > > We have a use case to store stream for infinite time (given we have enough > storage). > > We are planning to solve this by Log Compaction. If each message key is >
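The compaction guarantee being relied on here can be sketched in a few lines of plain Python (illustrative, not how the broker implements it): after compaction, only the latest value per key survives, so with unique keys nothing is ever removed:

```python
def compact(log):
    """Simulate log compaction: keep only the latest value per key.
    `log` is a list of (key, value) records in offset order."""
    latest = {}
    for key, value in log:
        latest[key] = value  # later records overwrite earlier ones
    return latest

log = [("user-1", "v1"), ("user-2", "v1"), ("user-1", "v2")]
compacted = compact(log)  # "user-1" keeps only its latest value, "v2"
```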

Contiguous Offsets on non-compacted topics

2018-01-23 Thread Justin Miller
Greetings, We’ve seen a strange situation wherein the topic is not compacted but the offset numbers inside the partition (#93) are not contiguous. This only happens once a day though, on a topic with billions of messages per day. next offset = 1786997223 next offset = 1786997224 next offset

What is the best way to re-key a KTable?

2018-01-23 Thread Dmitry Minkovsky
KStream has a simple `#selectKey()` method, but it appears the only way to re-key a KTable is by doing `.toStream(mapper).groupByKey().reduce()`. Is this correct? I'm guessing this is because an attempt to re-key a table might result in multiple values at the new key.

Kafka/zookeeper logs in every command

2018-01-23 Thread José Ribeiro
Good morning. I have a problem with Kafka logs showing in my outputs. When I started to work with Kafka, the outputs were normal. For example: kafka-topics.sh --list --zookeeper localhost:2181 returned test. Now, with the same command line, I get this: SLF4J: Class path contains

Re: Merging Two KTables

2018-01-23 Thread Dmitry Minkovsky
> Merging two tables does not make too much sense because each table might contain an entry for the same key. So it's unclear, which of both values the merged table should contain. Which of both values should the table contain? Seems straightforward: it should contain the value with the highest
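The semantics Dmitry proposes can be sketched in plain Python (illustrative only, not the Kafka Streams API): merge two keyed tables of timestamped values, and on a key collision keep the entry with the higher timestamp:

```python
def merge_tables(left, right):
    """Merge two dicts of key -> (timestamp, value); on a key collision,
    keep the entry with the higher timestamp."""
    merged = dict(left)
    for key, (ts, value) in right.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged

a = {"k1": (5, "a1"), "k2": (3, "a2")}
b = {"k1": (7, "b1"), "k3": (1, "b3")}
# "k1" collides; b's entry wins because its timestamp 7 > 5
result = merge_tables(a, b)
```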

bug config port / listeners

2018-01-23 Thread Elias Abacioglu
Hi, I just upgraded one node from Kafka 0.10.1 to 1.0.0, and I've discovered a bug. Having this config listeners=PLAINTEXT://0.0.0.0:6667 and NOT setting port will result in the port using its default 9092. And when the broker starts, it advertises the wrong port: [2018-01-23 12:08:56,694] INFO