The way offset management works in Kafka is:
It stores the offsets for a particular (groupId, topic, partitionId) in a
particular partition of the __consumer_offsets topic.
1) By default the value is 50. You can change it by setting the
*offsets.topic.num.partitions* property in your config.
2) No
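As an illustrative sketch of the mapping above (not Kafka's exact source; to my understanding it hashes the group id with Java's String.hashCode, drops the sign bit, and takes it modulo the partition count):

```python
# Sketch: which __consumer_offsets partition holds a given group's offsets.
# Kafka hashes the group id and takes it modulo offsets.topic.num.partitions
# (default 50). Treat this as illustration, not the exact broker code.
def offsets_partition_for(group_id: str, num_partitions: int = 50) -> int:
    h = 0
    for ch in group_id:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # Java String.hashCode
    return (h & 0x7FFFFFFF) % num_partitions  # drop sign bit, then mod
```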
I was wondering if anybody has already tried to mirror a Kafka topic to
HDFS by just copying the log files from the topic directory of the broker
(like 23244237.log).
The file format is very simple:
https://twitter.com/amiorin/status/576448691139121152/photo/1
Implementing an
Just curious - why - is Camus not suitable/working?
Thanks,
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr Elasticsearch Support * http://sematext.com/
On Fri, Mar 13, 2015 at 2:33 PM, Alberto Miorin amiorin78+ka...@gmail.com
wrote:
I was wondering if
You might want to increase the number of Replica Fetcher threads by setting
this property : *num.replica.fetchers*.
Thanks,
Mayuresh
On Thu, Mar 12, 2015 at 10:39 PM, Zakee kzak...@netzero.net wrote:
With the producer throughput as large as 150MB/s to 5 brokers on a
continuous basis, I see
I would think that this is not a particularly great solution, as you will
end up running into quite a few edge cases, and I can't see this scaling
particularly well - how do you know which server to copy logs from in a
clustered and replicated environment? What happens when Kafka detects a
failure
The Flume solution looks very good.
Thx.
On Fri, Mar 13, 2015 at 8:15 PM, William Briggs wrbri...@gmail.com wrote:
I would think that this is not a particularly great solution, as you will
end up running into quite a few edge cases, and I can't see this scaling
particularly well - how do you
Sorry, but I'm still confused. Is it the maximum number of threads (fetchers) to
fetch from a leader, or the maximum number of threads within a follower broker?
Thanks for clarifying,
-Zakee
On Mar 12, 2015, at 11:11 PM, tao xiao xiaotao...@gmail.com wrote:
The number of fetchers is configurable via
Ctrl+C should work. Did you see any issue with that?
On 3/12/15, 11:49 PM, tao xiao xiaotao...@gmail.com wrote:
Hi,
I wanted to know how I can shut down mirror maker safely (Ctrl+C) when
there are no messages coming in to consume. I am using mirror maker off the
trunk code.
--
Regards,
Tao
We use spark on mesos. I don't want to partition our cluster because of one
YARN job (camus).
Best
Alberto
On Fri, Mar 13, 2015 at 7:43 PM, Otis Gospodnetic
otis.gospodne...@gmail.com wrote:
Just curious - why - is Camus not suitable/working?
Thanks,
Otis
--
Monitoring * Alerting *
I suppose that the patch for KAFKA-1641 had a fix for this issue.
Also it might be worth looking at KAFKA-1755.
Thanks,
Mayuresh
On Fri, Mar 13, 2015 at 8:13 AM, Marc Labbe mrla...@gmail.com wrote:
Not exactly: the topics are compacted but the messages are not compressed.
I get the exact same
https://kafka.apache.org/documentation.html#basic_ops_cluster_expansion
~ Joe Stein
- - - - - - - - - - - - - - - - -
http://www.stealth.ly
- - - - - - - - - - - - - - - - -
On Fri, Mar 13, 2015 at 3:05 PM, sunil kalva sambarc...@gmail.com wrote:
Joe
Well, I know it is semantic but right
We are currently using spark streaming 1.2.1 with kafka and write-ahead log.
I will only say one thing : a nightmare. ;-)
I’d be really interested in hearing about your experience here. I’m exploring
streaming frameworks a bit, and Spark Streaming is just so easy to use and set
up. I’d be
Update:
Turns out this error happens in 2 scenarios:
1. When there is a mismatch between the broker and ZooKeeper libs inside of
your process (found that on Stack Overflow)
2. Apparently when anything that uses the Scala parser combinators libs (in our
case scala.util.parsing.json.JSON) runs
It seemed really counter-intuitive; I can only imagine that it happened
because nobody wanted to refactor the existing KafkaInputDStream to use the
SimpleConsumer instead of the High Level Consumer (unless I'm misreading
the source - it looks like that's what the new DirectKafkaInputDStream is
I'll try this too. It looks very promising.
Thx
On Fri, Mar 13, 2015 at 8:25 PM, Gwen Shapira gshap...@cloudera.com wrote:
There's a KafkaRDD that can be used in Spark:
https://github.com/tresata/spark-kafka. It doesn't exactly replace
Camus, but should be useful in building Camus-like
The newest version of Spark came out today.
https://spark.apache.org/releases/spark-release-1-3-0.html
Apparently they made improvements to the Kafka connector for Spark
Streaming (see Approach 2):
http://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html
Best,
Niek
Hi Mayuresh,
I have currently set this property to 4 and I see from the logs that it starts
12 threads on each broker. I will try increasing it further.
Thanks
Zakee
On Mar 13, 2015, at 11:53 AM, Mayuresh Gharat gharatmayures...@gmail.com
wrote:
You might want to increase the number of
Spark Streaming also has built-in support for Kafka, and as of Spark 1.2,
it supports using an HDFS write-ahead log to ensure zero data loss while
streaming:
https://databricks.com/blog/2015/01/15/improved-driver-fault-tolerance-and-zero-data-loss-in-spark-streaming.html
-Will
On Fri, Mar 13,
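For reference, turning the WAL on is a configuration switch (property name per the Spark 1.2 docs; verify against your version):

```properties
# spark-defaults.conf (illustrative)
spark.streaming.receiver.writeAheadLog.enable=true
```

The application must also set a checkpoint directory on a fault-tolerant filesystem, e.g. StreamingContext.checkpoint("hdfs://..."), for the WAL to have anywhere durable to live.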
I really like the new approach. The WAL in HDFS never made much sense
to me (I mean, Kafka is a log. I know they don't want the Kafka
dependency, but a log for a log makes no sense).
Still experimental, but I think that's the right direction.
On Fri, Mar 13, 2015 at 12:38 PM, Alberto Miorin
Thanks for the heads-up, Alberto, that's good to know. We were about to
start a few projects working with Spark Streaming + Kafka; sounds like
there's still quite a bit of work to be done there.
-Will
On Fri, Mar 13, 2015 at 3:38 PM, Alberto Miorin amiorin78+ka...@gmail.com
wrote:
We are
We are currently using spark streaming 1.2.1 with kafka and write-ahead log.
I will only say one thing : a nightmare. ;-)
Let's see if things are better with 1.3.0 :
http://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html
On Fri, Mar 13, 2015 at 8:33 PM, William Briggs
1) You save everything 2 times (Kafka and HDFS).
2) You need to enable the checkpoint feature, which means you cannot change
the configuration of the job, because the Spark Streaming context is
deserialized from HDFS every time you restart the job.
3) It is not clear what happens if HDFS is unavailable.
Also very interested in hearing about them.
I prefer war stories in the form of Jiras for the relevant project ;)
There's a good chance we can make things less horrible if issues are reported.
Gwen
On Fri, Mar 13, 2015 at 12:48 PM, Andrew Otto ao...@wikimedia.org wrote:
We are currently using
Sorry to go back in time on this thread, but Camus does NOT use YARN. We have
been using camus for a while on our CDH4 (no YARN) Hadoop cluster. It really is
fairly easy to set up, and seems to be quite good so far.
-Thunder
-Original Message-
From: amiori...@gmail.com
+1 - if you have a way to reproduce that would be ideal. We don't know
the root cause of this yet. Our guess is a corner case around
shutdowns, but not sure.
On Fri, Mar 13, 2015 at 03:13:45PM -0700, Jun Rao wrote:
Is there a way that you can reproduce this easily?
Thanks,
Jun
On Fri,
I did a shutdown of the cluster and then tried to restart, and I see the below
error on one of the 5 brokers. I can’t restart this instance and am not sure
how to fix this.
[2015-03-13 15:27:31,793] ERROR There was an error in one of the threads during
logs loading: java.lang.IllegalArgumentException:
Camus uses MapReduce though.
If Alberto uses Spark exclusively, I can see why installing a MapReduce
cluster (with or without YARN) is not a desirable solution.
On Fri, Mar 13, 2015 at 1:06 PM, Thunder Stumpges tstump...@ntent.com wrote:
Sorry to go back in time on this thread, but Camus does
Sorry for the late reply. Not sure what more details you need.
Here's an example of exposing Kafka through remoting (HTTP/REST):
http://confluent.io/docs/current/kafka-rest/docs/intro.html :-)
One can, without looking into the Kafka REST proxy code, see based on its
limitations that it's using the HL consumer, with
Can you verify that the leaders are evenly spread, and if necessary
run a preferred leader election?
On Fri, Mar 13, 2015 at 05:10:22PM -0700, Zakee wrote:
I have 35 topics spread with total 398 partitions (2 of them are supposed to
be very high volume and so allocated 28 partitions to them,
Hi Stevo,
I won't speak for Joe, but what we do is documented in the link that Joe
provided:
Adding servers to a Kafka cluster is easy, just assign them a unique
broker id and start up Kafka on your new servers. However these new servers
will not automatically be assigned any data partitions, so
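The procedure the docs describe boils down to the partition reassignment tool; a minimal sketch (broker ids, topic name, and file names here are made up for illustration):

```shell
# Hypothetical topic and broker ids, for illustration only.
cat > topics-to-move.json <<'EOF'
{"version": 1, "topics": [{"topic": "my-topic"}]}
EOF

# Generate a candidate assignment that includes the new broker (id 5):
# bin/kafka-reassign-partitions.sh --zookeeper zk:2181 \
#     --topics-to-move-json-file topics-to-move.json \
#     --broker-list "0,1,2,3,4,5" --generate
#
# Save the proposed plan (e.g. to expand-cluster-plan.json), review it, then:
# bin/kafka-reassign-partitions.sh --zookeeper zk:2181 \
#     --reassignment-json-file expand-cluster-plan.json --execute
```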
The index files work in the following way:
It's a mapping from logical offsets to a particular file location within the
log file segment.
If you look at the comments in the OffsetIndex.scala code:
The file format is a series of entries. The physical format is a 4-byte
relative offset and a 4-byte
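A rough illustration of that entry layout (a sketch, not Kafka's code; real index files are preallocated, so trailing zeroed entries would need extra handling):

```python
import struct

def read_index_entries(path: str, base_offset: int):
    """Parse a Kafka .index file as described above: each entry is a
    4-byte relative offset followed by a 4-byte file position, big-endian.
    The base offset comes from the file name (e.g. 23244237.index)."""
    entries = []
    with open(path, "rb") as f:
        while True:
            raw = f.read(8)
            if len(raw) < 8:
                break
            rel_offset, position = struct.unpack(">ii", raw)
            entries.append((base_offset + rel_offset, position))
    return entries
```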
I have 35 topics spread with total 398 partitions (2 of them are supposed to be
very high volume and so allocated 28 partitions to them, others vary between 5
to 14).
Thanks
Zakee
On Mar 13, 2015, at 3:25 PM, Joel Koshy jjkosh...@gmail.com wrote:
I think what people have observed in the
Thanks, Mayuresh. I did the same and it fixed the issue.
Thanks
Zakee
On Mar 13, 2015, at 3:56 PM, Mayuresh Gharat gharatmayures...@gmail.com
wrote:
The index files work in the following way:
It's a mapping from logical offsets to a particular file location within the
log file segment.
I think what people have observed in the past is that increasing
num-replica-fetcher-threads has diminishing returns fairly quickly.
You may want to instead increase the number of partitions in the topic
you are producing to. (How many do you have right now?)
On Fri, Mar 13, 2015 at 02:48:17PM
Can you reproduce this problem? Although the fix is straightforward, we
would like to understand why this happened.
On 3/13/15, 3:56 PM, Zakee kzak...@netzero.net wrote:
Just found there is a known issue to be resolved in a future Kafka version:
https://issues.apache.org/jira/browse/KAFKA-1554
The number of fetchers is configurable via num.replica.fetchers. The
description of num.replica.fetchers in the Kafka documentation is not quite
accurate. num.replica.fetchers actually controls the max number of fetchers
per broker. In your case, num.replica.fetchers=8 and 5 brokers means no
more than 8
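A quick sanity check of the arithmetic (the per-source-broker interpretation is taken from the description above; treat it as an assumption):

```python
# Sketch: if num.replica.fetchers caps fetcher threads per *source* broker
# a follower fetches from, the total on one broker is roughly:
def max_replica_fetcher_threads(num_replica_fetchers: int,
                                source_brokers: int) -> int:
    return num_replica_fetchers * source_brokers

# e.g. num.replica.fetchers=4 with leaders spread over 3 other brokers
# gives 12 threads, matching the count reported earlier in this thread.
```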
Hi,
I am using Kafka 0.8.2.1. I have two topics with 10 partitions each.
I noticed that one more topic exists, named __consumer_offsets, with 50
partitions. My questions are:
1. Why is this topic created with 50 partitions?
2. How do I get the consumer group names for a topic? Is there any document or
Did you get into that issue for the same reason as in the jira, i.e.,
somehow compressed messages were sent to the compact topics?
Thanks,
Jun
On Fri, Mar 13, 2015 at 6:45 AM, Marc Labbe mrla...@gmail.com wrote:
Hello,
we're often seeing log cleaner exceptions reported in KAFKA-1641 and I'd
Hello Apache Kafka community,
On the Apache Kafka website home page http://kafka.apache.org/ it is stated
that Kafka can be elastically and transparently expanded without downtime.
Is that really true? More specifically, can one just add one more broker,
have another partition added for the topic,
Not exactly: the topics are compacted but the messages are not compressed.
I get the exact same error though. Any other options I should consider?
We're on 0.8.2.0 and we also had this on 0.8.1.1 before.
marc
On Fri, Mar 13, 2015 at 10:47 AM, Jun Rao j...@confluent.io wrote:
Did you get into that
Hi,
our cluster is deployed on AWS; we have brokers on r3.large instances and a
decent amount of topics+partitions (600+ partitions). We're not making that
many requests/sec, roughly 80 produces/sec and 240 fetches/sec (not counting
internal replication requests), and yet CPU hovers around 40%, which I
Hi Marc,
If you are seeing high CPU usages with a large number of partitions on
0.8.2 you should definitely upgrade to 0.8.2.1 as the following issue was
fixed: https://issues.apache.org/jira/browse/KAFKA-1952
Also see the 0.8.2.1 release notes for other fixes:
Thanks, I'll start with that before changing my deployment for oracle jdk.
On Fri, Mar 13, 2015 at 11:40 AM, Mark Reddy mark.l.re...@gmail.com wrote:
Hi Marc,
If you are seeing high CPU usages with a large number of partitions on
0.8.2 you should definitely upgrade to 0.8.2.1 as the
Hey Stevo, "can be elastically and transparently expanded without downtime" is
the goal of Kafka on Mesos https://github.com/mesos/kafka, as Kafka has the
ability (knobs/levers) to do this but has to be made to do this out of the
box.
e.g. in Kafka on Mesos when a broker fails, after the
Hello,
we're often seeing log cleaner exceptions reported in KAFKA-1641 and I'd
like to know if it's safe to apply the patch from that issue resolution to
0.8.2.1?
Reference: https://issues.apache.org/jira/browse/KAFKA-1641
Also there are 2 patches in there, I suppose I should be using only the
I wrote a short blog on what's being fixed in 0.8.2.1 release.
http://blog.confluent.io/2015/03/13/apache-kafka-0-8-2-1-release/
We recommend everyone upgrade to 0.8.2.1.
Thanks,
Jun
OK, thanks for the heads up.
When reading the Apache Kafka docs, I expect what they say Apache Kafka can
do to already be available in the latest general availability release,
not what's planned as part of some other project.
Kind regards,
Stevo Slavic.
On Fri, Mar 13, 2015 at 2:32 PM, Joe Stein
Well, I know it is semantics, but right now it can be elastically scaled
without downtime; you just have to integrate it into your environment for what
that means. It has been that way since 0.8.0, imho.
My point was just another way to do that out of the box... folks do this
elastic scaling today