Re: Kafka null keys - OK or a problem?

2016-10-09 Thread Ali Akhtar
If keys are null, what happens in terms of partitioning? Is the load spread evenly? On Mon, Oct 10, 2016 at 7:59 AM, Gwen Shapira wrote: > Kafka itself supports null keys. I'm not sure about the Go client you > use, but Confluent's Go client also supports null keys >

Re: Kafka null keys - OK or a problem?

2016-10-09 Thread Gwen Shapira
Kafka itself supports null keys. I'm not sure about the Go client you use, but Confluent's Go client also supports null keys (https://github.com/confluentinc/confluent-kafka-go/). If you decide to generate keys and you want even spread, a random number generator is probably your best bet. Gwen
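For illustration, a minimal sketch of both approaches with the Java producer (the thread concerns a Go client; the topic name and broker address below are assumptions):

```java
import java.util.Properties;
import java.util.concurrent.ThreadLocalRandom;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class NullKeyProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Null key: the default partitioner spreads these records across
            // the topic's partitions rather than pinning them to one.
            producer.send(new ProducerRecord<>("my-topic", null, "value-1")); // assumed topic

            // Random key: the other option suggested above for an even spread.
            String randomKey = Long.toString(ThreadLocalRandom.current().nextLong());
            producer.send(new ProducerRecord<>("my-topic", randomKey, "value-2"));
        }
    }
}
```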

Re: How to merge two topics in kafka?

2016-10-09 Thread Ratha v
Thank you. I'll try the solution. But in the high-level consumer API, will topics be created automatically? We are not using zookeeper. On 10 October 2016 at 12:34, Sachin Mittal wrote: > You can use topicA.leftJoin(topicB).to(new-topic). > > You can then consume messages

Re: How to merge two topics in kafka?

2016-10-09 Thread Sachin Mittal
You can use topicA.leftJoin(topicB).to(new-topic). You can then consume messages from that new topic via the second process. Note you need to create all three topics in zookeeper first. On 10 Oct 2016 5:19 a.m., "Ratha v" wrote: Hi all; I have two topics in the broker.
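A minimal sketch of the suggested pipeline, written against a newer Kafka Streams API than the 0.10.x releases current in this thread (the topic names follow the thread; the application id, broker address, serdes, and join window are assumptions):

```java
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;

public class TopicJoinExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "topic-join-example"); // assumed id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> topicA = builder.stream("topicA");
        KStream<String, String> topicB = builder.stream("topicB");

        // Stream-stream left join on matching keys within a 5-minute window;
        // records from topicA with no match in topicB get a null right value.
        topicA.leftJoin(topicB,
                        (a, b) -> a + "|" + b,
                        JoinWindows.of(Duration.ofMinutes(5)))
              .to("new-topic");

        new KafkaStreams(builder.build(), props).start();
    }
}
```

Note that a stream-stream join also requires the two input topics to be co-partitioned, as discussed elsewhere in this digest.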

Kafka null keys - OK or a problem?

2016-10-09 Thread Ali Akhtar
A kafka producer written elsewhere that I'm using, which uses the Go kafka driver, is sending messages where the key is null. Is this OK - or will this cause issues due to partitioning not happening correctly? What would be a good way to generate keys in this case, to ensure even partition

How to merge two topics in kafka?

2016-10-09 Thread Ratha v
Hi all; I have two topics in the broker. Without consuming from one topic, I want to merge both topics, and will consume messages from the second topic. This is because I have two processes: one process pushes messages to topic A, and the second process, once it has finished processing, wants to

Re: In Kafka Streaming, Serdes should use Optionals

2016-10-09 Thread Ali Akhtar
It isn't a fatal error. It should be logged as a warning, and then the stream should continue w/ the next message. Checking for null is 'ok', in the sense that it gets the job done, but after Java 8's release, we really should be using Optionals. Hopefully we can break compatibility w/ the

Re: In Kafka Streaming, Serdes should use Optionals

2016-10-09 Thread Guozhang Wang
Ali, In your scenario, if the serde fails to parse the bytes, should that be treated as a fatal failure, or is it expected? In the former case, instead of returning a null I think it is better to throw a runtime exception in order to let the whole client stop and notify the error; in the latter
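A hypothetical deserializer sketch showing the two policies side by side (the class, payload format, and error handling here are illustrative assumptions, not part of the thread):

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;
import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Deserializer;

// Hypothetical deserializer illustrating the two policies discussed above.
public class IntPayloadDeserializer implements Deserializer<Optional<Integer>> {
    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {}

    @Override
    public Optional<Integer> deserialize(String topic, byte[] data) {
        if (data == null) {
            return Optional.empty(); // expected case, e.g. a tombstone record
        }
        try {
            return Optional.of(Integer.parseInt(new String(data, StandardCharsets.UTF_8)));
        } catch (NumberFormatException e) {
            // Fatal-failure policy: stop the client and surface the error.
            // (The expected-failure policy would instead log a warning and
            // return Optional.empty() so the stream continues.)
            throw new SerializationException("Cannot parse payload", e);
        }
    }

    @Override
    public void close() {}
}
```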

Re: I found kafka lost message

2016-10-09 Thread Guozhang Wang
Jerry, Message loss scenarios are usually due to misconfigured producers and consumers. Have you read the client configs web docs? http://kafka.apache.org/documentation#producerconfigs http://kafka.apache.org/documentation#newconsumerconfigs If not, I'd suggest reading those first and
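As a minimal sketch of the producer side, these are durability-oriented settings commonly tuned to avoid loss (the broker address and topic name are assumptions; the linked docs above are authoritative):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class DurableProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Wait for all in-sync replicas to acknowledge before considering a send done.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // Retry transient send failures instead of dropping the record.
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Check the callback for errors; silent send failures are a common loss source.
            producer.send(new ProducerRecord<>("my-topic", "key", "value"), // assumed topic
                          (metadata, e) -> { if (e != null) e.printStackTrace(); });
        }
    }
}
```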

Re: Tuning for high RAM and 10GBe

2016-10-09 Thread Christopher Stelly
Hey Eno, thanks Yes, localhost would mean testing SSD, maybe moving to a remote producer in the future over 10GBe but baby steps first. My record size is pretty flexible - I've tested from 4k to 16M and the results are +- 10MB/s. SSD is a samsung 850, claiming 100k IOPS and >500MB/s read/write.

Re: Regarding Kafka

2016-10-09 Thread Abhit Kalsotra
Yeah, that I realized, and later read that in the Java Kafka consumer there is one thread; that's why such behavior does not arise there. Maybe I need to restrict my application to a single thread :( in order to achieve that. I need to ask Magnus Edenhill, who is the librdkafka expert. Thanks for

Re: Tuning for high RAM and 10GBe

2016-10-09 Thread Eno Thereska
Hi Chris, A couple of things: looks like you're primarily testing the SSD if you're running on localhost, right? The 10GBe shouldn't matter in this case. The performance will depend a lot on the record sizes you are using. To get a ballpark number it would help to know the SSD type, would

Re: Regarding Kafka

2016-10-09 Thread Hans Jespersen
I'm pretty sure Jun was talking about the Java API in the quoted blog text, not librdkafka. There is only one thread in the new Java consumer so you wouldn't see this behavior. I do not think that librdkafka makes any such guarantee to dispatch unique keys to each thread but I'm not an expert

Tuning for high RAM and 10GBe

2016-10-09 Thread Christopher Stelly
Hello, The last thread available regarding 10GBe is about 2 years old, with no obvious recommendations on tuning. Is there a more complex tuning guide than the example production config available on Kafka's main site? Anything other than the list of possible configs? I currently have access to

Re: Understanding how joins work in Kafka streams

2016-10-09 Thread Sachin Mittal
Hi, It is actually a KTable-KTable join. I have a stream (K1, A) which is aggregated as (Key, List), hence it creates a KTable. I have another stream (K2, B) which is aggregated as (Key, List), hence it creates another KTable. Then I have KTable(Key, List).leftJoin(KTable(Key, List),
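A minimal sketch of this KTable-KTable left join in a newer Streams API (topic names are assumptions, and comma-joined strings stand in for the List aggregation to keep custom serde setup out of the example):

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KTable;

public class KTableJoinExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "ktable-join-example"); // assumed id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Aggregate each input stream per key into a KTable (comma-joined
        // values stand in for the List<A>/List<B> described in the thread).
        KTable<String, String> table1 = builder.<String, String>stream("topic1")
                .groupByKey()
                .reduce((agg, v) -> agg + "," + v);
        KTable<String, String> table2 = builder.<String, String>stream("topic2")
                .groupByKey()
                .reduce((agg, v) -> agg + "," + v);

        // KTable-KTable left join: fires on updates from either side; the
        // right value is null when table2 has no entry for the key yet.
        table1.leftJoin(table2, (left, right) -> left + " | " + right)
              .toStream()
              .to("joined-topic");

        new KafkaStreams(builder.build(), props).start();
    }
}
```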

Temporary node of broker in zookeeper lost

2016-10-09 Thread 隋玉增
hi, I met an issue where the ephemeral node of a broker in zookeeper was lost while the broker process still exists. At this time, the controller considers it offline. According to the zkClient log, I find the session timed out, but handleStateChanged and handleNewSession (in

Content based enhancement for Apache Kafka

2016-10-09 Thread Janagan Sivagnanasundaram
Publisher/subscriber systems can be divided into two categories: 1) the topic-based model; 2) the content-based model, which provides more precise results than the topic-based model, since subscribers are interested in the content of the messages rather than subscribing to a topic and getting all of them. Kafka

Re: Regarding Kafka

2016-10-09 Thread Abhit Kalsotra
I did that but I am getting confusing results, e.g. I have created 4 Kafka consumer threads for doing data analytics. These threads just wait for Kafka messages to be consumed, and I have provided the key when I produce, which means that all the messages will go to one single partition, ref "

Re: Understanding how joins work in Kafka streams

2016-10-09 Thread Eno Thereska
Hi Sachin, Some comments inline: > On 9 Oct 2016, at 08:19, Sachin Mittal wrote: > > Hi, > I needed some light on how joins actually work on continuous stream of data. > > Say I have 2 topics which I need to join (left join). > Data record in each topic is aggregated like

Re: Regarding Kafka

2016-10-09 Thread Hans Jespersen
Then publish with the user ID as the key and all messages for the same key will be guaranteed to go to the same partition and therefore be in order for whichever consumer gets that partition. //h...@confluent.io Original message From: Abhit Kalsotra Date:
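A minimal Java sketch of keying by user ID (the topic name and payloads are assumptions; the user ID 4456 comes from the thread):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // All records with the same key hash to the same partition, so
            // events for user 4456 are consumed in the order they were sent.
            producer.send(new ProducerRecord<>("user-events", "4456", "event-1")); // assumed topic
            producer.send(new ProducerRecord<>("user-events", "4456", "event-2"));
        }
    }
}
```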

Re: Regarding Kafka

2016-10-09 Thread Abhit Kalsotra
What about the order in which messages are received if I don't mention the partition? Let's say I have user ID 4456 and I have to do some analytics at the Kafka consumer end; if messages are not consumed in the order I sent them, then my analytics will go haywire. Abhi On Sun, Oct

Re: Understanding org.apache.kafka.streams.errors.TopologyBuilderException

2016-10-09 Thread Sachin Mittal
Thanks for pointing this out. I am doing exactly this now and it is working fine. Sachin On Sun, Oct 9, 2016 at 12:32 PM, Matthias J. Sax wrote: > You must ensure that both streams are co-partitioned (ie, same number > of partitions and using the join key). > >

Re: Regarding Kafka

2016-10-09 Thread Hans Jespersen
You don't even have to do that because the default partitioner will spread the data you publish to the topic over the available partitions for you. Just try it out to see. Publish multiple messages to the topic without using keys, and without specifying a partition, and observe that they are
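One hypothetical way to "just try it out": a consumer loop that prints which partition each record arrived on (the topic and group id are assumptions, written against a newer Java client API):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class PartitionSpreadCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "spread-check");            // assumed group
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test-topic")); // assumed topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records) {
                    // With null keys, the printed partition should vary across records.
                    System.out.printf("partition=%d offset=%d value=%s%n",
                                      r.partition(), r.offset(), r.value());
                }
            }
        }
    }
}
```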

Understanding how joins work in Kafka streams

2016-10-09 Thread Sachin Mittal
Hi, I needed some light on how joins actually work on a continuous stream of data. Say I have 2 topics which I need to join (left join). The data record in each topic is aggregated like (key, value) <=> (string, list). Topic 1: key1: [A01, A02, A03, A04 ..], key2: [A11, A12, A13, A14 ..]. Topic 2

Re: Understanding org.apache.kafka.streams.errors.TopologyBuilderException

2016-10-09 Thread Matthias J. Sax
You must ensure that both streams are co-partitioned (i.e., same number of partitions, and partitioned on the join key). (See the "Note" box: http://docs.confluent.io/3.0.1/streams/developer-guide.html#joining-streams) You can enforce co-partitioning by introducing a call to .through() before doing the join
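A minimal sketch of that repartitioning step (the key extractor and topic names are hypothetical; the intermediate topic must be created up front with the same partition count as the other join input, and newer clients replace .through() with .repartition()):

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

public class RepartitionSketch {
    // Hypothetical key extractor: assumes the join key is the first CSV field.
    static String extractJoinKey(String value) {
        return value.split(",")[0];
    }

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Re-key the stream, then route it through an intermediate topic so
        // the records are physically repartitioned on the join key.
        KStream<String, String> coPartitioned = builder.<String, String>stream("input-topic")
                .selectKey((k, v) -> extractJoinKey(v))
                .through("rekeyed-topic"); // pre-create with the join peer's partition count

        // ... coPartitioned is now safe to join against the other input ...
    }
}
```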

Re: Regarding Kafka

2016-10-09 Thread Abhit Kalsotra
Hans Thanks for the response. Yes, you could say I am treating topics like partitions; my current logic for producing to a respective topic goes something like this: RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic[whichTopic],