XinYi Long,
If I understood you correctly, then you have to create different groups
Cheers
Avi
On Dec 1, 2017 4:09 AM, "XinYi Long" wrote:
Hi, guys,
My consumers want to share the data of one topic, so they use the same
group.id. And meanwhile, they also
All the data log files for a given topic-partition are stored under a
topic-partition directory in a particular data.dir.
So a topic-partition directory can grow up to the capacity of the log.dir
directory. And there can be multiple
topic-partition directories in a data.dir. It depends
Hi, guys,
My consumers want to share the data of one topic, so they use the same
group.id. And meanwhile, they also want to read every single message of
another topic, so they use unique group.ids.
Does Kafka support this situation? Or do I need to create different consumers
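For illustration, a sketch of the two configurations (broker address and group names are made up; each Properties object would be passed to its own KafkaConsumer instance in the same process):

```java
import java.util.Properties;
import java.util.UUID;

public class GroupIdSketch {

    // Shared topic: every instance uses the SAME group.id, so the topic's
    // partitions are split among the instances and each message is
    // processed by only one of them.
    static Properties sharedGroupConfig() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        p.put("group.id", "shared-workers");          // identical everywhere
        return p;
    }

    // Broadcast topic: every instance generates a UNIQUE group.id, so each
    // instance receives its own copy of every message.
    static Properties broadcastConfig() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        p.put("group.id", "broadcast-" + UUID.randomUUID());
        return p;
    }

    public static void main(String[] args) {
        System.out.println(sharedGroupConfig().getProperty("group.id"));
        System.out.println(broadcastConfig().getProperty("group.id"));
    }
}
```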
I am no storage or ESX expert, what I was told by our storage folks is that
they essentially created a dedicated storage pool in the SAN for zookeeper
VMs plus other VMs that did not have a lot of IO activity (non DB VMs). I
assume that implies dedicated physical disks in the SAN for that pool.
I
Can someone please help here?
On Thu, Nov 23, 2017 at 10:42 AM, Raghav wrote:
> Anyone here ?
>
> On Wed, Nov 22, 2017 at 4:04 PM, Raghav wrote:
>
>> Hi
>>
>> If I give several locations with smaller capacity for log.dirs vs one
>> large drive for
We are looking at implementing Kafka Producer HA,
i.e. there are 2 producers which can produce the same data.
The objective is to have High Availability implemented for the Kafka
Producer,
i.e. if Producer1 goes down, Producer2 kicks in and produces data
starting from the offset
You then also need to set this up for each topic you create:
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor
> 3 --partitions 3 --topic my-replicated-topic
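After creating the topic, one way to verify the replication assignment (same local ZooKeeper address as above; output needs a running cluster):

```shell
# Show partition count, leader, replicas, and in-sync replicas (ISR)
# for the topic created above.
bin/kafka-topics.sh --describe --zookeeper localhost:2181 \
  --topic my-replicated-topic
```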
-Dave
-----Original Message-----
From: Skip Montanaro [mailto:skip.montan...@gmail.com]
Sent: Thursday,
Girish, I'm curious what your solution was. Did you use locally attached
storage for your ZK ensemble? Did you move it to static machines?
On Thu, Nov 30, 2017 at 4:50 PM, John Yost wrote:
> Great point by Girish--it's the delays of syncing with Zookeeper that are
>
Yes, you are probably right. I was inspired by the KIP 150 blog post, so
the entire statement would be like this:
KTable customerGrouped =
    kStreamBuilder.stream(stringSerde, customerMessageSerde, CUSTOMER_TOPIC)
        .groupBy((key, value) ->
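For context, a sketch of how the truncated statement might continue on the 0.10/0.11-era Streams DSL. CustomerList, getCustomerId, and customerListSerde are hypothetical names standing in for the real ones, not taken from the original mail:

```java
// Sketch only: assumes the old KStreamBuilder API and that
// CustomerList.add returns the updated list.
KTable<String, CustomerList> customerGrouped =
    kStreamBuilder.stream(stringSerde, customerMessageSerde, CUSTOMER_TOPIC)
        .groupBy((key, value) -> value.getCustomerId(),
                 stringSerde, customerMessageSerde)
        .aggregate(
            CustomerList::new,                    // initializer: empty aggregate
            (custId, msg, list) -> list.add(msg), // aggregator: append message
            customerListSerde,                    // serde for the aggregate type
            "customer-store");                    // local state store name
```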
> If you create a partitioned topic with at least 3 partitions then you will
> see your client connect to all of the brokers. The client decides which
> partition a message should go to and then sends it directly to the broker
> that is the leader for that partition. If you have replicated
Great point by Girish--it's the delays of syncing with Zookeeper that are
particularly problematic. Moreover, Zookeeper sync delays and session
timeouts impact other systems as well such as Storm.
--John
On Thu, Nov 30, 2017 at 10:14 AM, Girish Aher wrote:
> We did not
There are some oddities in your topology that make me wonder whether
they are the true drivers of your question.
https://github.com/afuyo/KStreamsDemo/blob/master/src/main/java/kstream.demo/CustomerStreamPipelineHDI.java#L300
Feels like it should be a KTable to begin with for example otherwise
Hi,
Depending on what you want, you can go :
1) https://cwiki.apache.org/confluence/display/KAFKA/Performance+testing
which explains the procedure to initiate performance tests and how to
use *kafka-consumer-perf-test.sh* and *kafka-producer-perf-test.sh*, among
other useful scripts.
2)
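For a quick start with the scripts mentioned in 1), illustrative invocations (topic name, record counts, and broker address are placeholders; running them requires a live cluster):

```shell
# Producer side: write 1M 100-byte records as fast as possible.
bin/kafka-producer-perf-test.sh \
  --topic perf-test \
  --num-records 1000000 \
  --record-size 100 \
  --throughput -1 \
  --producer-props bootstrap.servers=localhost:9092

# Consumer side: read the same 1M records back and report throughput.
bin/kafka-consumer-perf-test.sh \
  --broker-list localhost:9092 \
  --topic perf-test \
  --messages 1000000
```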
what if I start two instances of that application? Does the state migrate
between the applications? Is it then that I have to use a global table?
BR
Artur
On Thu, Nov 30, 2017 at 7:40 PM, Jan Filipiak
wrote:
> Hi,
>
> Haven't checked your code. But from what you describe
Hi,
I haven't checked your code, but from what you describe you should be fine.
Upgrading the version might help here and there, but it should still work
with 0.10, I guess.
Best Jan
On 30.11.2017 19:16, Artur Mrozowski wrote:
Thank you Damian, it was very helpful.
I have implemented my solution
Thank you Damian, it was very helpful.
I have implemented my solution in version 0.11.0.2 but there is one thing I
still wonder.
So what I am trying to do is what is described in KIP 150. Since it didn't
make it into the 1.0 release, I do it the old-fashioned way.
Dear Kafka community,
In the doc -> https://kafka.apache.org/documentation/#distributionimpl
4. sort Pt (so partitions on the same broker are clustered together)
and
During rebalancing, we try to assign partitions to consumers in such a way
that reduces the number of broker nodes each consumer
I am new to the Kafka world and wanted to do performance testing on Kafka.
I googled around but didn't find any useful document to start with.
Could you explain or point me to a document? It would be great.
Also, which performance/load testing tool would be best to test Kafka? I
prefer JMeter.
Can you also check if you have partition leaders flapping or changing rapidly?
Also, look at the following settings on your client configs:
max.partition.fetch.bytes
fetch.max.bytes
receive.buffer.bytes
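The three client settings listed above can be sketched like this (the values shown are the broker-era ~0.11/1.0 consumer defaults; the right values for you depend on message sizes and memory):

```java
import java.util.Properties;

public class FetchTuning {

    // Consumer fetch-related settings mentioned above, at their defaults.
    static Properties fetchConfig() {
        Properties p = new Properties();
        p.put("max.partition.fetch.bytes", "1048576"); // 1 MiB per partition
        p.put("fetch.max.bytes", "52428800");          // 50 MiB per fetch request
        p.put("receive.buffer.bytes", "65536");        // 64 KiB TCP receive buffer
        return p;
    }

    public static void main(String[] args) {
        fetchConfig().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```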
We had a similar situation in our environment when the brokers were flooded
with data.
The
We did not face any problems with the Kafka application per se, but we have
faced problems with zookeeper in virtualized environments due to slowness
in fsyncs. We were using a shared SAN storage with shared pools with other
VMs. So every time, there was some kind of considerable storage activity
like
Hi There,
We are running into a weird situation when using MirrorMaker to replicate
messages between Kafka clusters across datacenters, and we are reaching out
in case you have also encountered this kind of problem before or have some
insights into this kind of issue.
Here is the scenario. We have
I'm using a (java) consumer with default configuration, so auto-commit is
enabled. The consumer is reading from 5 partitions of a single topic. The
consumer processes one message at a time (synchronously). Sometimes, large
numbers of messages are posted to the topic, and the consumer will have to
We run many thousands of clusters on EC2 without notable issues, and
achieve great performance there. The real thing that matters is how good
your virtualization layer is and how much of a performance impact it has.
E.g. in modern EC2, the performance overhead of using virtualized IO is
around
We are running Kafka on OpenStack for a testing/staging environment.
It runs well and is stable, but it is obviously way slower than bare metal.
The simple reason is the distance to the disk (as with any IO-batch-oriented
system on virtualization) and the virtual network.
HTH
-wim
On Thu, 30 Nov 2017 at
Just wanted to let everyone know that this issue got fixed in Kafka 1.0.0.
I recently migrated to it and didn't find the issue any longer.
-Sameer.
On Thu, Sep 14, 2017 at 5:50 PM, Sameer Kumar
wrote:
> Ok. I will inspect this further and keep everyone posted on this.
>
Hi folks,
Recently I bumped into an interesting question: using kafka in virtualized
environments, such as vmware. I'm not really familiar with virtualization
in-depth (how disk virtualization works, what are the OS level supports
etc.), therefore I think this is an interesting discussion from
Hi,
I have a Kafka cluster running on AWS. I want to connect to the cluster with
the standard kafka-console-consumer from my application server. The application
server has access to the internet via a SOCKS-Proxy. No authentication is
required
How do I tell the Kafka client to connect through
hi,
I understand your point better now.
I think systems of that kind have been built plenty of times, and I never
liked their trade-offs.
Samza and Kafka-streams form a great alternative to what is out there in
great numbers.
I am a big fan of how this is designed and think it's really great. Maybe
The CPU/IO required to complete a compaction phase will grow as the log
grows but you can manage this via the cleaner's various configs. Check out
properties starting log.cleaner in the docs (
https://kafka.apache.org/documentation). All databases that implement LSM
storage have a similar overhead
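As an illustration of the kind of broker-side settings meant here, a `server.properties` fragment (values are examples, not recommendations):

```properties
# Enable log compaction and bound the cleaner's resource usage.
log.cleanup.policy=compact
log.cleaner.enable=true
log.cleaner.threads=2
# Throttle cleaner IO to ~50 MiB/s so compaction does not starve produce/fetch.
log.cleaner.io.max.bytes.per.second=52428800
# Dedupe buffer shared by cleaner threads (128 MiB is the default).
log.cleaner.dedupe.buffer.size=134217728
```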
Thanks for your valuable advice.
Yes, we had raised the nofiles parameter in limits.conf, but after one week
came the big crash.
Precisely, on the broken node, __consumer_offsets-XX directories are never
deleted and after 20 hours, we have 70 GB of these directories and files. This
is the huge
The consumers are using default settings, which means that
enable.auto.commit=true and auto.commit.interval.ms=5000. I'm not
committing manually; just consuming messages.
On Thu, Nov 30, 2017 at 1:09 AM, Frank Lyaruu wrote:
> Do you commit the received messages? Either by
If I understand correctly, the "auto.offset.reset" setting is only used if
there is no offset available in Kafka (i.e. no offset has ever been
committed?), or if the offset does not exist anymore. In my situation, I
don't understand how either situation would be possible. The consumers
continuously
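To make the semantics described above concrete, a small sketch (pure stdlib; the string values are the three valid policies):

```java
import java.util.Properties;

public class OffsetResetSketch {

    // auto.offset.reset is only consulted when the group has no committed
    // offset, or when the committed offset points at data the broker has
    // already deleted; otherwise the committed offset always wins.
    static Properties resetConfig(String policy) {
        Properties p = new Properties();
        p.put("auto.offset.reset", policy); // "earliest", "latest" (default), or "none"
        return p;
    }

    public static void main(String[] args) {
        System.out.println(resetConfig("earliest").getProperty("auto.offset.reset"));
    }
}
```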