Re: Does Kafka support topic level group.id?

2017-11-30 Thread Avi Levi
XinYi Long, If I understood you correctly, then you have to create different groups. Cheers, Avi On Dec 1, 2017 4:09 AM, "XinYi Long" wrote: Hi, guys, My consumers want to share the data of one topic, so they use the same group.id. Meanwhile, they also

Re: Question about disks and log.dirs

2017-11-30 Thread Manikumar
All the data log files for a given topic-partition are stored under a topic-partition directory in a particular data.dir. So a topic-partition directory can grow up to the capacity of the log.dir directory. And there can be multiple topic-partition directories in a data.dir. It depends

Does Kafka support topic level group.id?

2017-11-30 Thread XinYi Long
Hi, guys, My consumers want to share the data of one topic, so they use the same group.id. Meanwhile, they also want to read every single message of another topic, so they use a unique group.id. Does Kafka support this situation? Or do I need to create different consumers
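Editor's note: the scenario in this question can be illustrated with a toy model. The sketch below is not the Kafka client API; the function `deliver`, the consumer names, and the modulo-based partition ownership are all assumptions made for illustration (real assignment is done by the group coordinator and a configurable assignor). It only shows the general rule that a group.id is set per consumer instance, that members of one group split a topic's partitions, and that every distinct group receives every message.

```python
from collections import defaultdict


def deliver(messages, subscriptions):
    """Toy model of consumer-group delivery (hypothetical helper, not Kafka code).

    messages:      list of (topic, partition, value)
    subscriptions: list of (consumer_name, group_id, topic)
    """
    # Collect the members of each group per subscribed topic.
    members = defaultdict(list)
    for name, group, topic in subscriptions:
        members[(group, topic)].append(name)

    received = defaultdict(list)
    for topic, partition, value in messages:
        for (group, sub_topic), names in members.items():
            if sub_topic != topic:
                continue
            # Simplified ownership rule: exactly one member of the group
            # owns each partition. Real assignors are more elaborate.
            owner = sorted(names)[partition % len(names)]
            received[owner].append(value)
    return dict(received)


# Two application instances. Each runs two consumers: one in a shared
# group for "jobs" and one in its own group for "events".
subscriptions = [
    ("app1-shared", "workers", "jobs"),
    ("app2-shared", "workers", "jobs"),
    ("app1-bcast", "bcast-app1", "events"),
    ("app2-bcast", "bcast-app2", "events"),
]
messages = [
    ("jobs", 0, "j0"), ("jobs", 1, "j1"),
    ("events", 0, "e0"), ("events", 1, "e1"),
]
result = deliver(messages, subscriptions)
```

In this model the "jobs" messages are split between the two shared-group consumers, while both broadcast consumers see every "events" message, which matches the advice in the reply to use different groups (and therefore separate consumer instances) for the two topics.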

Re: Kafka in virtualized environments

2017-11-30 Thread Girish Aher
I am no storage or ESX expert; what I was told by our storage folks is that they essentially created a dedicated storage pool in the SAN for zookeeper VMs plus other VMs that did not have a lot of IO activity (non DB VMs). I assume that implies dedicated physical disks in the SAN for that pool. I

Re: Question about disks and log.dirs

2017-11-30 Thread Raghav
Can someone please help here? On Thu, Nov 23, 2017 at 10:42 AM, Raghav wrote: > Anyone here? > > On Wed, Nov 22, 2017 at 4:04 PM, Raghav wrote: > >> Hi >> >> If I give several locations with smaller capacity for log.dirs vs one >> large drive for

Kafka Producer HA - using Kafka Connect

2017-11-30 Thread sham singh
We are looking at implementing Kafka Producer HA, i.e. there are 2 producers which can produce the same data. The objective is to have high availability for the Kafka producer, i.e. if Producer1 goes down, Producer2 kicks in and produces data starting from the offset

RE: Multiple brokers - do they share the load?

2017-11-30 Thread Tauzell, Dave
You then also need to set this up for each topic you create: > bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 3 --topic my-replicated-topic -Dave -Original Message- From: Skip Montanaro [mailto:skip.montan...@gmail.com] Sent: Thursday,

Re: Kafka in virtualized environments

2017-11-30 Thread Sean Glover
Girish, I'm curious what your solution was. Did you use locally attached storage for your ZK ensemble? Did you move it to static machines? On Thu, Nov 30, 2017 at 4:50 PM, John Yost wrote: > Great point by Girish--it's the delays of syncing with Zookeeper that are >

Re: Joins in Kafka Streams and partitioning of the topics

2017-11-30 Thread Artur Mrozowski
Yes, you are probably right. I was inspired by the KIP 150 blog post, so the entire statement would be like this: KTable customerGrouped= kStreamBuilder.stream(stringSerde, customerMessageSerde, CUSTOMER_TOPIC) .groupBy((key,value) ->

Re: Multiple brokers - do they share the load?

2017-11-30 Thread Skip Montanaro
> If you create a partitioned topic with at least 3 partitions then you will > see your client connect to all of the brokers. The client decides which > partition a message should go to and then sends it directly to the broker > that is the leader for that partition. If you have replicated
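Editor's note: the quoted explanation (the client picks a partition, then sends directly to that partition's leader) can be sketched as a small stdlib-only model. This is not the Kafka producer API. The functions `choose_partition` and `route`, the broker names, and the leader map are hypothetical, and `zlib.crc32` is only a stand-in for the hash used by the real client.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition (stand-in hash, an assumption)."""
    return zlib.crc32(key) % num_partitions


def route(key: bytes, leaders: dict) -> tuple:
    """Return (partition, leader broker) for a key.

    leaders maps partition number -> broker currently leading it; in a
    real cluster the client learns this from metadata requests.
    """
    partition = choose_partition(key, len(leaders))
    return partition, leaders[partition]


# Hypothetical 3-partition topic with one leader on each of three brokers.
leaders = {0: "broker-1", 1: "broker-2", 2: "broker-3"}

# Count how many keys end up on each broker.
load = {}
for i in range(100):
    _, broker = route(f"key-{i}".encode(), leaders)
    load[broker] = load.get(broker, 0) + 1
```

With three partitions led by three different brokers, keys spread across all of them, which is why a client of such a topic ends up holding connections to every broker.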

Re: Kafka in virtualized environments

2017-11-30 Thread John Yost
Great point by Girish--it's the delays of syncing with Zookeeper that are particularly problematic. Moreover, Zookeeper sync delays and session timeouts impact other systems as well, such as Storm. --John On Thu, Nov 30, 2017 at 10:14 AM, Girish Aher wrote: > We did not

Re: Joins in Kafka Streams and partitioning of the topics

2017-11-30 Thread Jan Filipiak
There are some oddities in your topology that make me wonder if they are the true drivers of your question. https://github.com/afuyo/KStreamsDemo/blob/master/src/main/java/kstream.demo/CustomerStreamPipelineHDI.java#L300 Feels like it should be a KTable to begin with for example otherwise

Re: Kafka Performance testing

2017-11-30 Thread Saïd Bouras
Hi, Depending on what you want, you can go to: 1) https://cwiki.apache.org/confluence/display/KAFKA/Performance+testing where it explains the procedure to initiate performance tests and how to use *kafka-consumer-perf-test.sh* and *kafka-producer-perf-test.sh* among other useful scripts. 2)

Re: Joins in Kafka Streams and partitioning of the topics

2017-11-30 Thread Artur Mrozowski
What if I start two instances of that application? Does the state migrate between the applications? Is that when I have to use a global table? BR Artur On Thu, Nov 30, 2017 at 7:40 PM, Jan Filipiak wrote: > Hi, > > Haven't checked your code. But from what you describe

Re: Joins in Kafka Streams and partitioning of the topics

2017-11-30 Thread Jan Filipiak
Hi, Haven't checked your code. But from what you describe you should be fine. Upgrading the version might help here and there but should still work with 0.10 I guess. Best Jan On 30.11.2017 19:16, Artur Mrozowski wrote: Thank you Damian, it was very helpful. I have implemented my solution

Re: Joins in Kafka Streams and partitioning of the topics

2017-11-30 Thread Artur Mrozowski
Thank you Damian, it was very helpful. I have implemented my solution in version 0.11.0.2 but there is one thing I still wonder about. What I am trying to do is what is described in KIP 150. Since it didn't make it into the 1.0 release, I do it the old-fashioned way.

Doc says when doing re-balance, sort by leader then partition, but the code seems sort only on partition

2017-11-30 Thread 李响
Dear Kafka community, In the doc -> https://kafka.apache.org/documentation/#distributionimpl 4. sort Pt (so partitions on the same broker are clustered together) and During rebalancing, we try to assign partitions to consumers in such a way that reduces the number of broker nodes each consumer

Kafka Performance testing

2017-11-30 Thread Ranganath
I am new to the Kafka world and wanted to do performance testing on Kafka. I googled around, but didn't find any useful document that I can start with. If you could explain or point me to a document, it would be great. Also, which performance loading tool would be best to test Kafka? I prefer Jmeter

Re: Lost messages and messed up offsets

2017-11-30 Thread Thakrar, Jayesh
Can you also check if you have partition leaders flapping or changing rapidly? Also, look at the following settings on your client configs: max.partition.fetch.bytes, fetch.max.bytes, receive.buffer.bytes. We had a similar situation in our environment when the brokers were flooded with data. The

Re: Kafka in virtualized environments

2017-11-30 Thread Girish Aher
We did not face any problems with kafka application per se but we have faced problems with zookeeper in virtualized environments due to slowness in fsyncs. We were using a shared SAN storage with shared pools with other VMs. So every time, there was some kind of considerable storage activity like

Mirrormaker consumption slowness

2017-11-30 Thread tao xiao
Hi There, We are running into a weird situation when using Mirrormaker to replicate messages between Kafka clusters across datacenters, and are reaching out for help in case you have encountered this kind of problem before or have some insight into this kind of issue. Here is the scenario. We have

Incorrect auto-commited offsets

2017-11-30 Thread Tom van den Berge
I'm using a (java) consumer with default configuration, so auto-commit is enabled. The consumer is reading from 5 partitions of a single topic. The consumer processes one message at a time (synchronously). Sometimes, large numbers of messages are posted to the topic, and the consumer will have to

Re: Kafka in virtualized environments

2017-11-30 Thread Thomas Crayford
We run many thousands of clusters on EC2 without notable issues, and achieve great performance there. The real thing that matters is how good your virtualization layer is and how much of a performance impact it has. E.g. in modern EC2, the performance overhead of using virtualized IO is around

Re: Kafka in virtualized environments

2017-11-30 Thread Wim Van Leuven
We are running kafka on openstack for a testing/staging environment. It runs well and is stable, but it obviously is way slower than bare-metal. The simple reason is the distance to the disk (as with any IO batch oriented system on virtualisation) and the virtual network. HTH -wim On Thu, 30 Nov 2017 at

Re: Kafka 11 | Stream Application crashed the brokers

2017-11-30 Thread Sameer Kumar
Just wanted to let everyone know that this issue got fixed in Kafka 1.0.0. I recently migrated to it and didn't find the issue any longer. -Sameer. On Thu, Sep 14, 2017 at 5:50 PM, Sameer Kumar wrote: > Ok. I will inspect this further and keep everyone posted on this. >

Kafka in virtualized environments

2017-11-30 Thread Viktor Somogyi
Hi folks, Recently I bumped into an interesting question: using kafka in virtualized environments, such as vmware. I'm not really familiar with virtualization in-depth (how disk virtualization works, what are the OS level supports etc.), therefore I think this is an interesting discussion from

Connect to Kafka through SOCKS Proxy

2017-11-30 Thread Aljoscha Schulte
Hi, I have a Kafka cluster running on AWS. I want to connect to the cluster with the standard kafka-console-consumer from my application server. The application server has access to the internet via a SOCKS proxy. No authentication is required. How do I tell the Kafka client to connect through

Re: Plans to extend streams?

2017-11-30 Thread Jan Filipiak
Hi, I understand your point better now. I think plenty of systems of that kind have been built and I never liked their trade-offs. Samza and Kafka-streams form a great alternative to what is out there in great numbers. I am a big fan of how this is designed and think it's really great. Maybe

Re: kafka compacted topic

2017-11-30 Thread Ben Stopford
The CPU/IO required to complete a compaction phase will grow as the log grows but you can manage this via the cleaner's various configs. Check out properties starting log.cleaner in the docs ( https://kafka.apache.org/documentation). All databases that implement LSM storage have a similar overhead

RE: Too many open files in kafka 0.9

2017-11-30 Thread REYMOND Jean-max (BPCE-IT - SYNCHRONE TECHNOLOGIES)
Thanks for your valuable advice. Yes, we had upgraded the nofiles parameter in limits.conf, but after one week came the big crash. Precisely, on the broken node, __consumer_offsets-XX directories are never deleted and after 20 hours, we have 70 GB of these directories and files. This is the huge

Re: Lost messages and messed up offsets

2017-11-30 Thread Tom van den Berge
The consumers are using default settings, which means that enable.auto.commit=true and auto.commit.interval.ms=5000. I'm not committing manually; just consuming messages. On Thu, Nov 30, 2017 at 1:09 AM, Frank Lyaruu wrote: > Do you commit the received messages? Either by

Re: [EXTERNAL] - Lost messages and messed up offsets

2017-11-30 Thread Tom van den Berge
If I understand correctly, the "auto.offset.reset" setting is only used if there is no offset available in Kafka (i.e. no offset has ever been committed?), or if the offset does not exist anymore. In my situation, I don't understand how either situation would be possible. The consumers continuously
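Editor's note: the rule described in this message can be written down as a small decision function. This is a sketch under stated assumptions, not consumer source code. The function name `resolve_start_offset` and its parameters are hypothetical; the policy values "earliest", "latest" and "none" are the documented values of auto.offset.reset, with "latest" as the default.

```python
def resolve_start_offset(committed, log_start, log_end, reset_policy="latest"):
    """Toy model of where a consumer starts fetching a partition.

    committed: last committed offset for the group, or None if none exists
    log_start: first offset still retained in the partition
    log_end:   offset that the next produced message will get
    """
    # A committed offset is used as-is when it still points into the log.
    if committed is not None and log_start <= committed <= log_end:
        return committed
    # Otherwise (nothing committed, or the offset is out of range) the
    # reset policy decides.
    if reset_policy == "earliest":
        return log_start
    if reset_policy == "latest":
        return log_end
    raise LookupError("no valid offset and auto.offset.reset=none")
```

For example, a committed offset that has already been removed by retention falls below `log_start`, so with the default policy the consumer would jump to `log_end` and skip whatever lies in between.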