Re: How to fetch consumer group names of a Topic from Kafka offset manager in Kafka 0.8.2.1

2015-03-13 Thread Mayuresh Gharat
The way offset management works with kafka is : It stores offsets for a particular (groupId, Topic, partitionId) in a particular partition of __consumer_offset topic. 1) By default the value is 50. You can change it by setting this property : *offsets.topic.num.partitions* in your config. 2) No

Alternative to camus

2015-03-13 Thread Alberto Miorin
I was wondering if anybody has already tried to mirror a kafka topic to hdfs just copying the log files from the topic directory of the broker (like 23244237.log). The file format is very simple : https://twitter.com/amiorin/status/576448691139121152/photo/1 Implementing an

Re: Alternative to camus

2015-03-13 Thread Otis Gospodnetic
Just curious - why - is Camus not suitable/working? Thanks, Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Fri, Mar 13, 2015 at 2:33 PM, Alberto Miorin amiorin78+ka...@gmail.com wrote: I was wondering if

Re: High Replica Max Lag

2015-03-13 Thread Mayuresh Gharat
You might want to increase the number of Replica Fetcher threads by setting this property : *num.replica.fetchers*. Thanks, Mayuresh On Thu, Mar 12, 2015 at 10:39 PM, Zakee kzak...@netzero.net wrote: With the producer throughput as large as 150MB/s to 5 brokers on a continuous basis, I see

Re: Alternative to camus

2015-03-13 Thread William Briggs
I would think that this is not a particularly great solution, as you will end up running into quite a few edge cases, and I can't see this scaling particularly well - how do you know which server to copy logs from in a clustered and replicated environment? What happens when Kafka detects a failure

Re: Alternative to camus

2015-03-13 Thread Alberto Miorin
Flume solution looks very good. Thx. On Fri, Mar 13, 2015 at 8:15 PM, William Briggs wrbri...@gmail.com wrote: I would think that this is not a particularly great solution, as you will end up running into quite a few edge cases, and I can't see this scaling particularly well - how do you

Re: createMessageStreams vs createMessageStreamsByFilter

2015-03-13 Thread Zakee
Sorry, but still confused. Maximum number of threads (fetchers) to fetch from a Leader or maximum number of threads within a follower broker? Thanks for clarifying, -Zakee On Mar 12, 2015, at 11:11 PM, tao xiao xiaotao...@gmail.com wrote: The number of fetchers is configurable via

Re: How to shutdown mirror maker safely

2015-03-13 Thread Jiangjie Qin
ctrl+c should work. Did you see any issue for that? On 3/12/15, 11:49 PM, tao xiao xiaotao...@gmail.com wrote: Hi, I wanted to know that how I can shutdown mirror maker safely (ctrl+c) when there is no message coming to consume. I am using mirror maker off trunk code. -- Regards, Tao

Re: Alternative to camus

2015-03-13 Thread Alberto Miorin
We use spark on mesos. I don't want to partition our cluster because of one YARN job (camus). Best Alberto On Fri, Mar 13, 2015 at 7:43 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Just curious - why - is Camus not suitable/working? Thanks, Otis -- Monitoring * Alerting *

Re: Log cleaner patch (KAFKA-1641) on 0.8.2.1

2015-03-13 Thread Mayuresh Gharat
I suppose that the patch for KAFKA-1641 had a fix for this issue. Also it might be worth looking at Kafka-1755. Thanks, Mayuresh On Fri, Mar 13, 2015 at 8:13 AM, Marc Labbe mrla...@gmail.com wrote: No exactly, the topics are compacted but messages are not compressed. I get the exact same

Re: Kafka elastic no downtime scalability

2015-03-13 Thread Joe Stein
https://kafka.apache.org/documentation.html#basic_ops_cluster_expansion ~ Joe Stein - - - - - - - - - - - - - - - - - http://www.stealth.ly - - - - - - - - - - - - - - - - - On Fri, Mar 13, 2015 at 3:05 PM, sunil kalva sambarc...@gmail.com wrote: Joe Well, I know it is semantic but right

Re: Alternative to camus

2015-03-13 Thread Andrew Otto
We are currently using spark streaming 1.2.1 with kafka and write-ahead log. I will only say one thing : a nightmare. ;-) I’d be really interested in hearing about your experience here. I’m exploring streaming frameworks a bit, and Spark Streaming is just so easy to use and set up. I’d be

RE: JSON parsing causing rebalance to fail

2015-03-13 Thread Arunkumar Srambikkal (asrambik)
Update : Turns out this error happens in 2 scenarios 1. When there is a mis-match between the broker and zookeeper libs inside of your process (found that from stackoverflow) 2.Apparetly when anything that uses scala parser combinators libs (in our case scala.util.parsing.json.JSON) runs

Re: Alternative to camus

2015-03-13 Thread William Briggs
It seemed really counter-intuitive; I can only imagine that it happened because nobody wanted to refactor the existing KafkaInputDStream to use the SimpleConsumer instead of the High Level Consumer (unless I'm misreading the source - it looks like that's what the new DirectKafkaInputDStream is

Re: Alternative to camus

2015-03-13 Thread Alberto Miorin
I'll try this too. It looks very promising. Thx On Fri, Mar 13, 2015 at 8:25 PM, Gwen Shapira gshap...@cloudera.com wrote: There's a KafkaRDD that can be used in Spark: https://github.com/tresata/spark-kafka. It doesn't exactly replace Camus, but should be useful in building Camus-like

Kafka and Spark 1.3.0

2015-03-13 Thread Niek Sanders
The newest version of Spark came out today. https://spark.apache.org/releases/spark-release-1-3-0.html Apparently they made improvements to the Kafka connector for Spark Streaming (see Approach 2): http://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html Best, Niek

Re: High Replica Max Lag

2015-03-13 Thread Zakee
Hi Mayuresh, I have currently set this property to 4 and I see from the logs that it starts 12 threads on each broker. I will try increasing it further. Thanks Zakee On Mar 13, 2015, at 11:53 AM, Mayuresh Gharat gharatmayures...@gmail.com wrote: You might want to increase the number of

Re: Alternative to camus

2015-03-13 Thread William Briggs
Spark Streaming also has built-in support for Kafka, and as of Spark 1.2, it supports using an HDFS write-ahead log to ensure zero data loss while streaming: https://databricks.com/blog/2015/01/15/improved-driver-fault-tolerance-and-zero-data-loss-in-spark-streaming.html -Will On Fri, Mar 13,

Re: Alternative to camus

2015-03-13 Thread Gwen Shapira
I really like the new approach. The WAL in HDFS never made much sense to me (I mean, Kafka is a log. I know they don't want the Kafka dependency, but a log for a log makes no sense). Still experimental, but I think thats the right direction. On Fri, Mar 13, 2015 at 12:38 PM, Alberto Miorin

Re: Alternative to camus

2015-03-13 Thread William Briggs
Thanks for the heads-up, Alberto, that's good to know. We were about to start a few projects working with Spark Streaming + Kafka; sounds like there's still quite a bit of work to be done there. -Will On Fri, Mar 13, 2015 at 3:38 PM, Alberto Miorin amiorin78+ka...@gmail.com wrote: We are

Re: Alternative to camus

2015-03-13 Thread Alberto Miorin
We are currently using spark streaming 1.2.1 with kafka and write-ahead log. I will only say one thing : a nightmare. ;-) Let's see if things are better with 1.3.0 : http://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html On Fri, Mar 13, 2015 at 8:33 PM, William Briggs

Re: Alternative to camus

2015-03-13 Thread Alberto Miorin
1) You save everything 2 times (kafka and hdfs). 2) You need to enable the checkpoint feature, that means you cannot change the configuration of the job, because the spark streaming context is deserialized from hdfs every time you restart the job. 3) What happens if hdfs is unavailable, not clear?

Re: Alternative to camus

2015-03-13 Thread Gwen Shapira
Also very interesting in hearing about them. I prefer war stories in form for Jira for the relevant project ;) There's a good chance we can make things less horrible if issues are reported. Gwen On Fri, Mar 13, 2015 at 12:48 PM, Andrew Otto ao...@wikimedia.org wrote: We are currently using

RE: Alternative to camus

2015-03-13 Thread Thunder Stumpges
Sorry to go back in time on this thread, but Camus does NOT use YARN. We have been using camus for a while on our CDH4 (no YARN) Hadoop cluster. It really is fairly easy to set up, and seems to be quite good so far. -Thunder -Original Message- From: amiori...@gmail.com

Re: Log cleaner patch (KAFKA-1641) on 0.8.2.1

2015-03-13 Thread Joel Koshy
+1 - if you have a way to reproduce that would be ideal. We don't know the root cause of this yet. Our guess is a corner case around shutdowns, but not sure. On Fri, Mar 13, 2015 at 03:13:45PM -0700, Jun Rao wrote: Is there a way that you can reproduce this easily? Thanks, Jun On Fri,

Broker Restart failed w/ Corrupt index found

2015-03-13 Thread Zakee
I did a shutdown of the cluster and then try to restart and see the below error on one of the 5 brokers, I can’t restart this instance and not sure how to fix this. [2015-03-13 15:27:31,793] ERROR There was an error in one of the threads during logs loading: java.lang.IllegalArgumentException:

Re: Alternative to camus

2015-03-13 Thread Gwen Shapira
Camus uses MapReduce though. If Alberto uses Spark exclusively, I can see why installing MapReduce cluster (with or without YARN) is not a desirable solution. On Fri, Mar 13, 2015 at 1:06 PM, Thunder Stumpges tstump...@ntent.com wrote: Sorry to go back in time on this thread, but Camus does

Re: Reusable consumer across consumer groups

2015-03-13 Thread Stevo Slavić
Sorry for late reply. Not sure what more details you need. Here's example http://confluent.io/docs/current/kafka-rest/docs/intro.html of exposing Kafka through remoting (http/rest) :-) One can without looking into kafka rest proxy code see based on its limitations that it's using HL consumer, with

Re: High Replica Max Lag

2015-03-13 Thread Joel Koshy
Can you verify that the leaders are evenly spread? and if necessary run a preferred leader election? On Fri, Mar 13, 2015 at 05:10:22PM -0700, Zakee wrote: I have 35 topics spread with total 398 partitions (2 of them are supposed to be very high volume and so allocated 28 partitions to them,

Re: Kafka elastic no downtime scalability

2015-03-13 Thread Chi Hoang
Hi Stevo, I won't speak for Joe, but what we do is documented in the link that Joe provided: Adding servers to a Kafka cluster is easy, just assign them a unique broker id and start up Kafka on your new servers. However these new servers will not automatically be assigned any data partitions, so

Re: Broker Restart failed w/ Corrupt index found

2015-03-13 Thread Mayuresh Gharat
The index files work in the following way : Its a mapping from logical offsets to a particular file location within the log file segment. If you see the comments under OffsetIndex.scala code : The file format is a series of entries. The physical format is a 4 byte relative offset and a 4 byte

Re: High Replica Max Lag

2015-03-13 Thread Zakee
I have 35 topics spread with total 398 partitions (2 of them are supposed to be very high volume and so allocated 28 partitions to them, others vary between 5 to 14). Thanks Zakee On Mar 13, 2015, at 3:25 PM, Joel Koshy jjkosh...@gmail.com wrote: I think what people have observed in the

Re: Broker Restart failed w/ Corrupt index found

2015-03-13 Thread Zakee
Thanks, Mayuresh. I did the same and it fixed the issue. Thanks Zakee On Mar 13, 2015, at 3:56 PM, Mayuresh Gharat gharatmayures...@gmail.com wrote: The index files work in the following way : Its a mapping from logical offsets to a particular file location within the log file segment.

Re: High Replica Max Lag

2015-03-13 Thread Joel Koshy
I think what people have observed in the past is that increasing num-replica-fetcher-threads has diminishing returns fairly quickly. You may want to instead increase the number of partitions in the topic you are producing to. (How many do you have right now?) On Fri, Mar 13, 2015 at 02:48:17PM

Re: Broker Restart failed w/ Corrupt index found

2015-03-13 Thread Jiangjie Qin
Can you reproduce this problem? Although the the fix is strait forward we would like to understand why this happened. On 3/13/15, 3:56 PM, Zakee kzak...@netzero.net wrote: Just found there is a known issue to be resolved in future kafka version: https://issues.apache.org/jira/browse/KAFKA-1554

Re: createMessageStreams vs createMessageStreamsByFilter

2015-03-13 Thread tao xiao
The number of fetchers is configurable via num.replica.fetchers. The description of num.replica.fetchers in Kafka documentation is not quite accurate. num.replica.fetchers actually controls the max number of fetchers per broker. In you case num.replica.fetchers=8 and 5 brokers the means no more 8

How to fetch consumer group names of a Topic from Kafka offset manager in Kafka 0.8.2.1

2015-03-13 Thread Madhukar Bharti
Hi, I am using Kafka 0.8.2.1. I have two topics with 10 partitions each. Noticed that one more topic exist named as __consumer_offset with 50 partitions. My questions are: 1. Why this topic is created with 50 partition? 2. How to get consumer group names for a topic? Is there any document or

Re: Log cleaner patch (KAFKA-1641) on 0.8.2.1

2015-03-13 Thread Jun Rao
Did you get into that issue for the same reason as in the jira, i.e., somehow compressed messages were sent to the compact topics? Thanks, Jun On Fri, Mar 13, 2015 at 6:45 AM, Marc Labbe mrla...@gmail.com wrote: Hello, we're often seeing log cleaner exceptions reported in KAFKA-1641 and I'd

Kafka elastic no downtime scalability

2015-03-13 Thread Stevo Slavić
Hello Apache Kafka community, On Apache Kafka website home page http://kafka.apache.org/ it is stated that Kafka can be elastically and transparently expanded without downtime. Is that really true? More specifically, can one just add one more broker, have another partition added for the topic,

Re: Log cleaner patch (KAFKA-1641) on 0.8.2.1

2015-03-13 Thread Marc Labbe
No exactly, the topics are compacted but messages are not compressed. I get the exact same error though. Any other options I should consider? We're on 0.8.2.0 and we also had this on 0.8.1.1 before. marc On Fri, Mar 13, 2015 at 10:47 AM, Jun Rao j...@confluent.io wrote: Did you get into that

Kafka High CPU, 0.8.2.1 or openjdk?

2015-03-13 Thread Marc Labbe
Hi, our cluster is deployed on AWS, we have brokers on r3.large instances, a decent amount of topics+partitions (+600 partitions). We're not making that many requests/sec, roughly 80 produce/sec and 240 fetch/sec (not counting internal replication requests) and yet CPU hovers around 40%, which I

Re: Kafka High CPU, 0.8.2.1 or openjdk?

2015-03-13 Thread Mark Reddy
Hi Marc, If you are seeing high CPU usages with a large number of partitions on 0.8.2 you should definitely upgrade to 0.8.2.1 as the following issue was fixed: https://issues.apache.org/jira/browse/KAFKA-1952 Also see the 0.8.2.1 release notes for other fixes:

Re: Kafka High CPU, 0.8.2.1 or openjdk?

2015-03-13 Thread Marc Labbe
Thanks, I'll start with that before changing my deployment for oracle jdk. On Fri, Mar 13, 2015 at 11:40 AM, Mark Reddy mark.l.re...@gmail.com wrote: Hi Marc, If you are seeing high CPU usages with a large number of partitions on 0.8.2 you should definitely upgrade to 0.8.2.1 as the

Re: Kafka elastic no downtime scalability

2015-03-13 Thread Joe Stein
Hey Stevo, can be elastically and transparently expanded without downtime. is the goal of Kafka on Mesos https://github.com/mesos/kafka as Kafka as the ability (knobs/levers) to-do this but has to be made to-do this out of the box. e.g. in Kafka on Mesos when a broker fails, after the

Log cleaner patch (KAFKA-1641) on 0.8.2.1

2015-03-13 Thread Marc Labbe
Hello, we're often seeing log cleaner exceptions reported in KAFKA-1641 and I'd like to know if it's safe to apply the patch from that issue resolution to 0.8.2.1? Reference: https://issues.apache.org/jira/browse/KAFKA-1641 Also there are 2 patches in there, I suppose I should be using only the

What's new in Apache Kafka 0.8.2.1 release

2015-03-13 Thread Jun Rao
I wrote a short blog on what's being fixed in 0.8.2.1 release. http://blog.confluent.io/2015/03/13/apache-kafka-0-8-2-1-release/ We recommend everyone upgrade to 0.8.2.1. Thanks, Jun

Re: Kafka elastic no downtime scalability

2015-03-13 Thread Stevo Slavić
OK, thanks for heads up. When reading Apache Kafka docs, and reading what Apache Kafka can I expect it to already be available in latest general availability release, not what's planned as part of some other project. Kind regards, Stevo Slavic. On Fri, Mar 13, 2015 at 2:32 PM, Joe Stein

Re: Kafka elastic no downtime scalability

2015-03-13 Thread Joe Stein
Well, I know it is semantic but right now it can be elastically scaled without down time but you have to integrate into your environment for what that means it has been that way since 0.8.0 imho. My point was just another way to-do that out of the box... folks do this elastic scailing today