Kafka Cluster fail to start if one instance of zookeeper not up

2017-03-06 Thread safique ahemad
Hi All, Below are two scenarios that are failing 1) 3 node zookeeper cluster and 3 node Kafka cluster: One of zookeeper node was down due to some reason when Kafka cluster started, none of its stances came up. All are failing with connectivity to down zookeeper instance. Whereas, Kafka contains

Re: Fixing two critical bugs in kafka streams

2017-03-06 Thread Sachin Mittal
> As for the second issue you brought up, I agree it is indeed a bug; but just to clarify it is the CREATION of the first task including restoring stores that can take longer than MAX_POLL_INTERVAL_MS_CONFIG, not processing it right Yes this is correct. I may have misused the terminology so lets

Re: Performance and encryption

2017-03-06 Thread Ismael Juma
Hi Todd, I agree that KAFKA-2561 would be good to have for the reasons you state. Ismael On Mon, Mar 6, 2017 at 5:17 PM, Todd Palino wrote: > Thanks for the link, Ismael. I had thought that the most recent kernels > already implemented this, but I was probably confusing it

Re: Clarification on min.insync.replicas​

2017-03-06 Thread Todd Palino
Default broker configurations do not show in the topic overrides (which is what you are showing with the topics tool). It is more accurate to say that the min.insync.replicas setting in your server.properties file is what will apply to every topic (regardless of when it is created), if there

Clarification on min.insync.replicas​

2017-03-06 Thread Shrikant Patel
Hi All, Need details about min.insync.replicas​ in the server.properties. I thought once I add this to server.properties, all subsequent topic create should have this as default value. C:\JAVA_INSTALLATION\kafka\kafka_2.11-0.10.1.1>bin\windows\kafka-topics.bat --zookeeper

Re: Can I create user defined stream processing function/api?

2017-03-06 Thread Matthias J. Sax
Hi, you can implements custom operator via process(), transform(), and transform() values. Also, if you want to have even more control over the topology, you can use low-level Processor API directly instead of DSL. http://docs.confluent.io/current/streams/developer-guide.html#processor-api

Can I create user defined stream processing function/api?

2017-03-06 Thread Wang LongTian
Dear folks, Background: I'm leaning Kafka stream and want to use that in my product for real time streaming process with data from various sensors. Question: 1. Can I define my own processing function/api in Kafka stream except the predefined functions like groupby(), count() etc.? 2. If I

MirrorMaker and producers

2017-03-06 Thread Jack Foy
Hey, all. Is there any general guidance around using mirrored topics in the context of a cluster migration? We're moving operations from one data center to another, and we want to stream mirrored data from the old cluster to the new, migrate consumers, then migrate producers. Our basic question

Re: Kafka Kerberos Ansible

2017-03-06 Thread Mudit Agarwal
thanks Le.However my cluster is kerberized. From: Le Cyberian To: Mudit Agarwal Sent: Monday, 6 March 2017 9:24 PM Subject: Re: Kafka Kerberos Ansible Hi Mudit, I guess its more related to Ansible rather than Kafka itself, However i will

Re: Fixing two critical bugs in kafka streams

2017-03-06 Thread Matthias J. Sax
Thanks for your input. I now understood the first issue (and the fix). Still not sure about the second issue. From my understanding, the deadlock is "caused" by your fix of problem one. If the thread would die, the lock would get release and no deadlock would occur. However, because the thread

Re: Kafka Kerberos Ansible

2017-03-06 Thread Le Cyberian
Hi Mudit, I guess its more related to Ansible rather than Kafka itself, However i will try to answer. Since Ansible uses SSH and you already have passwordless ssh between ansible host (which executes playbooks) to Kafka Cluster. You can simply use ansible command or shell module to get the list

3 node Kafka cluster with 3 node ZK ensemble

2017-03-06 Thread Mich Talebzadeh
Hi, Assuming that we are building ZK and Kafka as a messaging system. Two scenarios we consider. 1. Deploy 3 physical hosts (as opposed to VM) and create a Kafka cluster there. On the same physical hosts create three ZK cluster. 2. Deploy 3 physical hosts (as opposed to VM) and

Re: Kafka Kerberos Ansible

2017-03-06 Thread Mudit Agarwal
Let me reframe the questions. How can i list the topics using ansible script from ansible host which is outside the kafka cluster.My kafka cluster is kerberized.Kafka and ansible are passwordless ssh. Thanks,Mudit From: Le Cyberian To: users@kafka.apache.org; Mudit

Re: Having 4 Node Kafka Cluster

2017-03-06 Thread Le Cyberian
Hi Han, Thank you for your response. I understand. Its not possible to have a third rack/server room at the moment as the requirement is to have redundancy between both. I tried already to get one :-/ Is it possible to have a Zookeeper Ensemble (3 node) in one server room and same in the other

Re: Fixing two critical bugs in kafka streams

2017-03-06 Thread Guozhang Wang
Hello Sachin, Thanks for your finds!! Just to add what Damian said regarding 1), in KIP-129 where we are introducing exactly-once processing semantics to Streams we have also described different categories of error handling for exactly-once. Commit exceptions due to rebalance will be handled as

Re: Having 4 Node Kafka Cluster

2017-03-06 Thread Hans Jespersen
Is there any way you can find a third rack/server room/power supply nearby just for the 1 extra zookeeper node? You don’t have to put any kafka brokers there, just a single zookeeper. It’s less likely to have a 3-way split brain because of a network partition. It’s so much cleaner with 3

Re: Kafka Kerberos Ansible

2017-03-06 Thread Le Cyberian
Hi Mudit, What do you mean by accessing Kafka cluster outside Ansible VM ? It needs to listen to a interface which is available for the network outside of the VM BR, Lee On Mon, Mar 6, 2017 at 7:42 PM, Mudit Agarwal wrote: > Hi, > How we can access the kafka

Re: Having 4 Node Kafka Cluster

2017-03-06 Thread Le Cyberian
Thanks Han and Alexander for taking time out and your responses. I now understand the risks and the possible outcome of having the desired setup. What would be better in your opinion to have failover (active-active) between both of these server rooms to avoid switching to the clone / 3rd

Kafka Kerberos Ansible

2017-03-06 Thread Mudit Agarwal
Hi, How we can access the kafka cluster from an outside Ansible VM.The kafka is kerberiszed.All linux environment. Thanks,Mudit

Re: Kafka streams questions

2017-03-06 Thread Matthias J. Sax
Yes, that is the parameter I was referring, too. And yes, you can set consumer/producer config via StreamsConfig. However, it's recommended to use > props.put(StreamsConfig.consumerPrefix("consumer.parameter.name"), value); -Matthias On 3/6/17 6:48 AM, Neil Moore wrote: > Thanks for the

Re: java.lang.IllegalStateException: Correlation id for response () does not match request ()

2017-03-06 Thread Ismael Juma
Hi Mickael, This looks to be the same as KAFKA-4669. In theory, this should never happen and it's unclear when/how it can happen. Not sure if someone has investigated it in more detail. Ismael On Mon, Mar 6, 2017 at 5:15 PM, Mickael Maison wrote: > Hi, > > In one of

Re: Performance and encryption

2017-03-06 Thread IT Consultant
Hi Todd Can you please help me with notes or document on how did you achieve encryption ? I have followed data available on official sites but failed as I m no good with TLS . On Mar 6, 2017 19:55, "Todd Palino" wrote: > It’s not that Kafka has to decode it, it’s that it

Re: Performance and encryption

2017-03-06 Thread Todd Palino
Thanks for the link, Ismael. I had thought that the most recent kernels already implemented this, but I was probably confusing it with BSD. Most of my systems are stuck in the stone age right now anyway. It would be nice to get KAFKA-2561 in, either way. First off, if you can take advantage of it

java.lang.IllegalStateException: Correlation id for response () does not match request ()

2017-03-06 Thread Mickael Maison
Hi, In one of our clusters, some of our clients occasionally see this exception: java.lang.IllegalStateException: Correlation id for response (4564) does not match request (4562) at org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:486) at

Re: Performance and encryption

2017-03-06 Thread Ismael Juma
Even though OpenSSL is much faster than the Java 8 TLS implementation (I haven't tested against Java 9, which is much faster than Java 8, but probably still slower than OpenSSL), all the tests were without zero copy in the sense that is being discussed here (i.e. sendfile). To benefit from

Re: Performance and encryption

2017-03-06 Thread Todd Palino
So that’s not quite true, Hans. First, as far as the performance hit being not a big impact (25% is huge). Or that it’s to be expected. Part of the problem is that the Java TLS implementation does not support zero copy. OpenSSL does, and in fact there’s been a ticket open to allow Kafka to support

Re: Fixing two critical bugs in kafka streams

2017-03-06 Thread Sachin Mittal
Please find the JIRA https://issues.apache.org/jira/browse/KAFKA-4848 On Mon, Mar 6, 2017 at 5:20 PM, Damian Guy wrote: > Hi Sachin, > > If it is a bug then please file a JIRA for it, too. > > Thanks, > Damian > > On Mon, 6 Mar 2017 at 11:23 Sachin Mittal

Re: Having 4 Node Kafka Cluster

2017-03-06 Thread Alexander Binzberger
I agree on this is one cluster but having one additional ZK node per site does not help. (as far as I understand ZK) A 3 out of 6 is also not a majority. So I think you mean 3/5 with a cloned 3rd one. This would mean manually switching the cloned one for majority which can cause issues again.

Re: Kafka streams questions

2017-03-06 Thread Neil Moore
Thanks for the answers, Matthias. You mention a metadata refresh interval. I see Kafka producers and consumers have a property called metadata.max.age.ms which sounds similar. From the documentation and looking at the Javadoc for Kafka streams it is not clear to me how I can affect

Re: Strange behaviour in Session Windows

2017-03-06 Thread Damian Guy
Hi Marco, I've done some testing and found that there is a performance issue when caching is enabled. I suspect his might be what you are hitting. It looks to me that you can work around this by doing something like: final StateStoreSupplier sessionStore = Stores.create(*"session-store-name"*)

Re: Performance and encryption

2017-03-06 Thread Hans Jespersen
Its not a single message at a time that is encrypted with TLS its the entire network byte stream so a Kafka broker can’t even see the Kafka Protocol tunneled inside TLS unless it’s terminated at the broker. It is true that losing the zero copy optimization impacts performance somewhat but

Re: Performance and encryption

2017-03-06 Thread Todd Palino
It’s not that Kafka has to decode it, it’s that it has to send it across the network. This is specific to enabling TLS support (transport encryption), and won’t affect any end-to-end encryption you do at the client level. The operation in question is called “zero copy”. In order to send a message

Re: Having 4 Node Kafka Cluster

2017-03-06 Thread Hans Jespersen
In that case it’s really one cluster. Make sure to set different rack ids for each server room so kafka will ensure that the replicas always span both floors and you don’t loose availability of data if a server room goes down. You will have to configure one addition zookeeper node in each site

Re: Having 4 Node Kafka Cluster

2017-03-06 Thread Le Cyberian
Hi Hans, Thank you for your reply. Its basically two different server rooms on different floors and they are connected with fiber connectivity so its almost like a local connection between them no network latencies / lag. If i do a Mirror Maker / Replicator then i will not be able to use them

Re: Having 4 Node Kafka Cluster

2017-03-06 Thread Le Cyberian
Hi Hans, Thank you for your reply. Its basically two different server rooms on different floors and they are connected with fiber connectivity so its almost like a local connection between them no network latencies / lag. If i do a Mirror Maker / Replicator then i will not be able to use them

Re: Having 4 Node Kafka Cluster

2017-03-06 Thread Hans Jespersen
What do you mean when you say you have "2 sites not datacenters"? You should be very careful configuring a stretch cluster across multiple sites. What is the RTT between the two sites? Why do you think that MIrror Maker (or Confluent Replicator) would not work between the sites and yet you think a

Re: Having 4 Node Kafka Cluster

2017-03-06 Thread Le Cyberian
Hi Guys, Thank you very much for you reply. The scenario which i have to implement is that i have 2 sites not datacenters so mirror maker would not work here. There will be 4 nodes in total, like 2 in Site A and 2 in Site B. The idea is to have Active-Active setup along with fault tolerance so

Performance and encryption

2017-03-06 Thread Nicolas Motte
Hi everyone, I understand one of the reasons why Kafka is performant is by using zero-copy. I often hear that when encryption is enabled, then Kafka has to copy the data in user space to decode the message, so it has a big impact on performance. If it is true, I don t get why the message has to

Re: Having 4 Node Kafka Cluster

2017-03-06 Thread Le Cyberian
Hi Guys, Thank you very much for you reply. The scenario which i have to implement is that i have 2 sites not datacenters so mirror maker would not work here. There will be 4 nodes in total, like 2 in Site A and 2 in Site B. The idea is to have Active-Active setup along with fault tolerance so

Re: Strange behaviour in Session Windows

2017-03-06 Thread Marco Abitabile
Thanks Damian, sure, you are right, these details are modified to be compliant with my company rules. However the main points are unchanged. The producer of the original data is a "data ingestor" that attach few extra fields and produces a message such as: row = new JsonObject({ "id" :

Re: Strange behaviour in Session Windows

2017-03-06 Thread Damian Guy
Hi Marco, Can you try setting StreamsConfig.CACHE_MAX_BYTES_BUFFER_CONFIG to 0 and see if that resolves the issue? Thanks, Damian On Mon, 6 Mar 2017 at 10:59 Damian Guy wrote: > Hi Marco, > > Your config etc look ok. > > 1. It is pretty hard to tell what is going on

Re: Fixing two critical bugs in kafka streams

2017-03-06 Thread Damian Guy
Hi Sachin, If it is a bug then please file a JIRA for it, too. Thanks, Damian On Mon, 6 Mar 2017 at 11:23 Sachin Mittal wrote: > Ok that's great. > So you have already fixed that issue. > > I have modified my PR to remove that change (which was done keeping > 0.10.2.0 in

Re: Fixing two critical bugs in kafka streams

2017-03-06 Thread Sachin Mittal
Ok that's great. So you have already fixed that issue. I have modified my PR to remove that change (which was done keeping 0.10.2.0 in mind). However the other issue is still valid. Please review that change. https://github.com/apache/kafka/pull/2642 Thanks Sachin On Mon, Mar 6, 2017 at

Re: Having 4 Node Kafka Cluster

2017-03-06 Thread Hans Jespersen
Jens, I think you are correct that a 4 node zookeeper ensemble can be made to work but it will be slightly less resilient than a 3 node ensemble because it can only tolerate 1 failure (same as a 3 node ensemble) and the likelihood of node failures is higher because there is 1 more node that

Re: Strange behaviour in Session Windows

2017-03-06 Thread Damian Guy
Hi Marco, Your config etc look ok. 1. It is pretty hard to tell what is going on from just your code below, unfortunately. But the behaviour doesn't seem to be inline with what I'm reading in the streams code. For example your MySession::new function should be called once per record. The merger

Re: Fixing two critical bugs in kafka streams

2017-03-06 Thread Damian Guy
On trunk the CommitFailedException isn't thrown anymore. The commitOffsets method doesn't throw an exception. It returns one if it was thrown. We used to throw this exception during suspendTasksAndState, but we don't anymore. On Mon, 6 Mar 2017 at 05:04 Sachin Mittal wrote:

Re: Kafka Streams - ordering grouped messages

2017-03-06 Thread Damian Guy
Hi Ofir, My advice it to handle the duplicates. As you said compaction only runs on the non-active segments. There could be duplicates in the active segment. Further, even after compaction has run there could still be duplicates. You can attempt to minimize the occurrence of duplicates by

Strange behaviour in Session Windows

2017-03-06 Thread Marco Abitabile
Hello, I'm playing around with the brand new SessionWindows. I have a simple topology such as: KStream sess = builder.stream(stringSerde, jsonSerde, SOURCE_TOPIC); sess .map(MySession::enhanceWithUserId_And_PutUserIdAsKey) .groupByKey(stringSerde, jsonSerde)

kafka authenticate

2017-03-06 Thread 陈江枫
Hi, all I'm trying to modify kafka authentication using our own authenticating procedure, authorization will stick to kafka's acls . Does every entry which fetches data from certain topic need to go through authentication? ( Including KafkaStreams, replica to leader ,etc.)

Re: Kafka streams DSL advantage

2017-03-06 Thread Michael Noll
The DSL has some unique features that aren't in the Processor API, such as: - KStream and KTable abstractions. - Support for time windows (tumbling windows, hopping windows) and session windows. The Processor API only has stream-time based `punctuate()`. - Record caching, which is slightly

Re: Writing data from kafka-streams to remote database

2017-03-06 Thread Michael Noll
I'd use option 2 (Kafka Connect). Advantages of #2: - The code is decoupled from the processing code and easier to refactor in the future. (same as #4) - The runtime/uptime/scalability of your Kafka Streams app (processing) is decoupled from the runtime/uptime/scalability of the data ingestion