Any restrictions on consumer group name?

2016-06-09 Thread Jaikiran Pai
We are using version 0.9.0.1 of the Kafka server and (Java) clients. Our (Java) consumers are assigned to dynamically generated groups, i.e. the consumer group name is generated at runtime using some application-specific logic. I have been looking at the docs but haven't yet found anything
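While that question is open, one option is to keep generated group names within a conservative character set. The sketch below rests on an assumption and is not a documented Kafka 0.9 rule. It borrows the character set Kafka enforces for topic names (ASCII letters, digits, '.', '_' and '-') and applies a 249-character cap that we chose ourselves. `make_group_id` and its naming scheme are hypothetical. Only the `group.id` and `bootstrap.servers` keys are standard consumer settings.

```python
import re
import uuid

# ASSUMPTION: not an official group.id rule. We reuse the topic-name character
# set plus our own 249-character cap to avoid surprises with ZooKeeper-based
# tooling and shell quoting.
_UNSAFE = re.compile(r"[^A-Za-z0-9._-]")
_MAX_LEN = 249


def make_group_id(app: str, instance_key: str) -> str:
    """Build a runtime-generated group id (hypothetical scheme) and sanitize it."""
    raw = f"{app}-{instance_key}"
    # Replace every character outside the conservative set with '_' and truncate.
    return _UNSAFE.sub("_", raw)[:_MAX_LEN]


def consumer_props(bootstrap: str, group_id: str) -> dict:
    """Minimal consumer settings; a Java client takes the same keys."""
    return {"bootstrap.servers": bootstrap, "group.id": group_id}


if __name__ == "__main__":
    gid = make_group_id("orders", "tenant/42 " + uuid.uuid4().hex[:8])
    print(consumer_props("localhost:9092", gid))
```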

Re: Question about heterogeneous brokers in a cluster

2016-06-09 Thread Todd Palino
So as Alex noted, there’s no immediate problem to doing this. Kafka itself doesn’t know much about the underlying hardware, so it’s not going to care. At the same time, this means that it has no way natively to know that those systems have more storage capacity. So they’re not going to
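Todd's point that Kafka knows nothing about disk capacity shows up in replica placement. The sketch below uses a simplified round-robin placement. As far as we know, Kafka's own assignment also randomizes the starting broker and shifts follower positions. The function takes no capacity input, so brokers with more storage get the same share of replicas as the others. `round_robin_replicas` is our own name.

```python
from collections import Counter
from typing import Dict, List


def round_robin_replicas(brokers: List[int], partitions: int,
                         replication: int, start: int = 0) -> Dict[int, List[int]]:
    """Place each partition's replicas on consecutive brokers, round-robin.

    Simplified model: no rack awareness, no random start, and no notion of
    broker storage capacity.
    """
    n = len(brokers)
    if replication > n:
        raise ValueError("replication factor larger than broker count")
    assignment = {}
    for p in range(partitions):
        first = (start + p) % n
        assignment[p] = [brokers[(first + r) % n] for r in range(replication)]
    return assignment


if __name__ == "__main__":
    plan = round_robin_replicas([1, 2, 3, 4], partitions=12, replication=3)
    # Every broker ends up with the same replica count, whatever its disk size.
    print(Counter(b for replicas in plan.values() for b in replicas))
```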

Re: JVM Optimizations

2016-06-09 Thread Todd Palino
16GB is really not needed. Even our large brokers with 4000 partitions are using a 6GB heap. One thing to watch out for with using G1 and Kafka is humongous allocations. It’s very easy for Kafka to allocate large chunks of memory for large message batches, and this can cause some serious problems
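A little arithmetic makes Todd's point about humongous allocations concrete. The sketch below approximates how HotSpot's G1 collector (JDK 8) picks a default region size when no -XX:G1HeapRegionSize is given. It aims for roughly 2048 regions, rounds down to a power of two, and clamps the result to 1 MB..32 MB. G1 treats any object larger than half a region as humongous. This is a simplified model under those assumptions, not the JVM's exact code, and the function names are ours.

```python
MB = 1024 * 1024


def g1_region_size(heap_bytes: int) -> int:
    """Approximate JDK 8 default G1 region size for a heap with -Xms == -Xmx.

    Targets about 2048 regions, rounds down to a power of two and clamps to
    the 1 MB .. 32 MB range. Ignores an explicit -XX:G1HeapRegionSize.
    """
    target = max(heap_bytes // 2048, 1 * MB)
    size = 1 << (target.bit_length() - 1)  # largest power of two <= target
    return min(size, 32 * MB)


def is_humongous(alloc_bytes: int, heap_bytes: int) -> bool:
    """G1 puts objects bigger than half a region into dedicated humongous regions."""
    return alloc_bytes > g1_region_size(heap_bytes) // 2


if __name__ == "__main__":
    for heap_gb in (4, 6, 16):
        region = g1_region_size(heap_gb * 1024 * MB)
        print(f"{heap_gb} GB heap -> {region // MB} MB regions, "
              f"humongous above {region // 2 // MB} MB")
```

Under this model a 6 GB heap gets 2 MB regions. Any single message-batch buffer over 1 MB is therefore already a humongous allocation.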

Re: Rate that connect delivers messages

2016-06-09 Thread Barry Kaplan
Hmm, well CPU is pretty much zero. Heap is barely used. I even made the task put method a no-op other than logging the time since the last call. No change. With YourKit I see that ES has a thread that is sleeping, but it's in a monitor thread pool and clearly not blocking Kafka. Anyway, I even removed

Re: Skipping assignment for topic * since no metadata is available

2016-06-09 Thread Jaikiran Pai
On Thursday 09 June 2016 08:00 PM, Patrick Kaufmann wrote: Hello Recently we’ve run into a problem when starting our application for the first time. At the moment all our topics are auto-created. Now, at the first start there are no topics, so naturally some consumers try to connect to

Re: Message duplicated on incorrect topic - anyone else see this?

2016-06-09 Thread Jaikiran Pai
How do you check/verify the duplication of the message? Can you post relevant part of your producer code too? -Jaikiran On Thursday 09 June 2016 10:36 PM, Clark Breyman wrote: We're seeing a situation in one of our clusters where a message will occasionally be duplicated on an incorrect topic.

Re: Rate that connect delivers messages

2016-06-09 Thread Ewen Cheslack-Postava
Barry, It might help to know whether you're hitting a (single threaded) CPU limit or if the bottleneck is elsewhere. Also, how large on average are the messages you are consuming? There's nothing that'll force batching like you're talking about. You can tweak any consumer settings via

Rate that connect delivers messages

2016-06-09 Thread Barry Kaplan
I am running a connect consumer that receives JSON records and indexes them into Elasticsearch. The consumer is pushing out 300 messages/s into a topic with a single partition. The connect job is configured with 1 task. (This is all for testing.) What I see is that push is called about every 10s

Re: JVM Optimizations

2016-06-09 Thread Dustin Cote
Yes, but 16GB is probably not necessary and potentially detrimental. Please have a look at the doc here that shows what LinkedIn runs in production (at the time of writing). That should give you some idea of the ballpark of heap size you should be

Re: JVM Optimizations

2016-06-09 Thread Shane Hender
@Dustin, @Ben Do you not need to increase the heap size to correspond to the replica num * the partition count? So that partition records can be held in memory until they are sent to the replicas? I believe @Ben's kafka setup is such that there are thousands of partitions across the topics. On

Re: Question about heterogeneous brokers in a cluster

2016-06-09 Thread Alex Loddengaard
Hi Kevin, If you keep the same configs on the new brokers with more storage capacity, I don't foresee any issues. Although I haven't tried it myself. What may introduce headaches is if you have different configuration options per broker. Or if you try to assign more partitions to the newer

Re: KafkaStream and Kafka consumer group

2016-06-09 Thread Saeed Ansari
Thank you Eno, adding more threads greatly increased the throughput of the stream. As I said, after processing I send the event to another topic. For that I was opening a connection to the cluster via KafkaProducer, and I think that was the issue. Now there is just one producer for sending events to
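Saeed's fix replaces a new connection per event with one long-lived producer. The sketch below shows that pattern. `HypotheticalProducer` is a stand-in that only counts instances and records sends. For the real Java client, the KafkaProducer Javadoc notes that the producer is thread safe and that sharing one instance is generally faster than using several.

```python
from typing import List, Tuple


class HypotheticalProducer:
    """Stand-in for a real producer client; counts how many are created."""
    created = 0

    def __init__(self, bootstrap: str) -> None:
        HypotheticalProducer.created += 1
        self.bootstrap = bootstrap
        self.sent: List[Tuple[str, bytes]] = []

    def send(self, topic: str, value: bytes) -> None:
        self.sent.append((topic, value))


class EventForwarder:
    """Forwards processed events to an output topic through ONE shared producer."""

    def __init__(self, bootstrap: str, out_topic: str) -> None:
        # Created once, up front, instead of once per event.
        self.producer = HypotheticalProducer(bootstrap)
        self.out_topic = out_topic

    def forward(self, value: bytes) -> None:
        self.producer.send(self.out_topic, value)


if __name__ == "__main__":
    fwd = EventForwarder("localhost:9092", "processed-events")
    for i in range(1000):
        fwd.forward(f"event-{i}".encode())
    print(HypotheticalProducer.created, len(fwd.producer.sent))  # 1 1000
```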

Re: JVM Optimizations

2016-06-09 Thread Dustin Cote
@Ben, the big GC stalls could be related to the 16GB max heap size. When you have a bigger heap size, you need more time to GC if/when you hit a garbage collection. In general, Kafka shouldn't need more than a 5GB heap, and lowering your heap size combined with using the G1GC (and preferably

Re: JVM Optimizations

2016-06-09 Thread Stephen Powis
Hey Ben Using G1 with those settings appears to be working well for us. Infrequent younggen/minor GCs averaging a run time of 12ms, no full GCs in the 24 hours logged that I uploaded. I'd say enable the GC log flags and let it run for a bit, then change a setting or two and compare. On Thu,

Re: JVM Optimizations

2016-06-09 Thread Ben Osheroff
We've been having quite a few symptoms that appear to be big GC stalls (nonsensical ZK session timeouts) with the following config: -Xmx16g -Xms16g -server -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark -XX:+UseG1GC -XX:+DisableExplicitGC Next steps will be to turn on gc logging and

Re: JVM Optimizations

2016-06-09 Thread Stephen Powis
NOTE -- GC tuning is well outside my expertise, so I'm not sure I'd use our info as any kind of benchmark. But in the interest of sharing, we use the following options:

    export KAFKA_HEAP_OPTS="-Xmx12G -Xms12G"
    export KAFKA_JVM_PERFORMANCE_OPTS="-server

Re: JVM Optimizations

2016-06-09 Thread Lawrence Weikum
Hi Tom, Currently we’re using the default settings – no special tuning whatsoever. I think the kafka-run-class.sh has this:

    # Memory options
    if [ -z "$KAFKA_HEAP_OPTS" ]; then
      KAFKA_HEAP_OPTS="-Xmx256M"
    fi
    # JVM performance options
    if [ -z "$KAFKA_JVM_PERFORMANCE_OPTS" ]; then

Re: KafkaStream and Kafka consumer group

2016-06-09 Thread Eno Thereska
Hi Saeed, There could be several reasons why things appear slow and it is difficult to say without knowing the exact details of the setup and the results you are observing. One thing to check is the number of threads you have assigned to the Kafka Stream application. By default just one thread

Re: JVM Optimizations

2016-06-09 Thread Tom Crayford
Hi Lawrence, What JVM options were you using? There are a few pages in the Confluent docs on JVM tuning, IIRC. We simply use G1 and a 4GB max heap, and things work well (running many thousands of clusters). Thanks Tom Crayford Heroku Kafka On Thursday, 9 June 2016, Lawrence Weikum

JVM Optimizations

2016-06-09 Thread Lawrence Weikum
Hello all, We’ve been running a benchmark test on a Kafka cluster of ours running 0.9.0.1 – slamming it with messages to see when/if things might break. During our test, we caused two brokers to throw OutOfMemory errors (looks like from the Heap) even though each machine still has 43% of the

Re: Handling of nulls in KTable groupBy

2016-06-09 Thread Guozhang Wang
Hello Jeff, Yes, outputting null upon no-match is by design, as we are intentionally differentiating from the join semantics of an RDBMS, where tables are "static", whereas in Kafka Streams "KTables are continuously evolving / being updated". In fact, the semantics of inner / left / outer
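A toy model of an inner join between two continuously updated tables illustrates the behaviour Guozhang describes. Every update re-evaluates the join for its key. When one side has no value, the model emits `None` so that downstream consumers drop any earlier joined result. This is a simplified illustration in plain Python, not the Kafka Streams implementation. `ToyTableInnerJoin` and its methods are our own names.

```python
from typing import Callable, Dict, List, Optional, Tuple


class ToyTableInnerJoin:
    """Toy inner join of two evolving key/value tables."""

    def __init__(self, joiner: Callable[[str, str], str]) -> None:
        self.left: Dict[str, str] = {}
        self.right: Dict[str, str] = {}
        self.joiner = joiner
        self.output: List[Tuple[str, Optional[str]]] = []

    @staticmethod
    def _apply(table: Dict[str, str], key: str, value: Optional[str]) -> None:
        # A None value acts as a delete (tombstone) for that key.
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value

    def _emit(self, key: str) -> None:
        lv, rv = self.left.get(key), self.right.get(key)
        # No match on one side: emit None instead of staying silent.
        joined = self.joiner(lv, rv) if lv is not None and rv is not None else None
        self.output.append((key, joined))

    def update_left(self, key: str, value: Optional[str]) -> None:
        self._apply(self.left, key, value)
        self._emit(key)

    def update_right(self, key: str, value: Optional[str]) -> None:
        self._apply(self.right, key, value)
        self._emit(key)


if __name__ == "__main__":
    j = ToyTableInnerJoin(lambda a, b: f"{a}|{b}")
    j.update_left("k", "a")    # no right value yet -> ("k", None)
    j.update_right("k", "b")   # match             -> ("k", "a|b")
    j.update_right("k", None)  # right deleted     -> ("k", None)
    print(j.output)
```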

Re: Kafka take too long to update the client with metadata when a broker is gone

2016-06-09 Thread safique ahemad
Hello guys, Below is the link where the Kafka logs can be seen with TRACE enabled. https://drive.google.com/file/d/0B-nANlrsm5ogQkh1NUR2UHYtbkU/view?usp=sharing I have truncated the log as it was very big, but it covers the whole period of the problem. Scenario: 1) There were three kafka running

RE: HDFS Connector configuration

2016-06-09 Thread Tauzell, Dave
Thanks, that worked. I used the command "hdfs getconf -confKey fs.defaultFS" to get the correct value. My local development instance has a single NN. -Dave

Re: HDFS Connector configuration

2016-06-09 Thread Mudit Kumar
This should point to the fs.defaultFS property in core-site.xml; basically it should be the namespace. Do you have an HA cluster or a single NN? On 6/9/16, 11:18 PM, "Tauzell, Dave" wrote: >What should this point to hdfs.url=hdfs://mdl-clda01:9000 ? Does it reference
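Mudit's advice, and the `hdfs getconf -confKey fs.defaultFS` command Dave used, both come down to copying the `fs.defaultFS` value from core-site.xml into the connector's `hdfs.url`. The minimal sketch below assumes the standard Hadoop `<configuration>/<property>/<name>,<value>` layout. The helper names are hypothetical, and the sample XML reuses the host from Dave's question.

```python
import xml.etree.ElementTree as ET

SAMPLE_CORE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mdl-clda01:9000</value>
  </property>
</configuration>"""


def default_fs(core_site_xml: str) -> str:
    """Return the fs.defaultFS value from the text of a core-site.xml file."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "fs.defaultFS":
            value = (prop.findtext("value") or "").strip()
            if value:
                return value
    raise KeyError("fs.defaultFS is not set in core-site.xml")


def hdfs_connector_url(core_site_xml: str) -> dict:
    # HA cluster: the nameservice URI; single NameNode: hdfs://<host>:<port>.
    return {"hdfs.url": default_fs(core_site_xml)}


if __name__ == "__main__":
    print(hdfs_connector_url(SAMPLE_CORE_SITE))
```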

HDFS Connector configuration

2016-06-09 Thread Tauzell, Dave
What should hdfs.url=hdfs://mdl-clda01:9000 point to? Does it reference the NameNode? -Dave

Question about heterogeneous brokers in a cluster

2016-06-09 Thread Kevin A
Hi there, I have a couple of Kafka brokers and thinking about adding a few more. The new broker machines would have a lot more storage available to them than the existing brokers. Am I setting myself up for operational headaches by deploying a heterogeneous (in terms of storage capacity) cluster?

Re: KafkaStream and Kafka consumer group

2016-06-09 Thread Saeed Ansari
Hi Eno, Thank you for the response. Actually, I did not know it automatically assigns partitions to consumers. Now I have one KafkaStream reading from 12 partitions, like below: Controller is an actor that I am sending the message to, and it then creates child actors to send messages out.
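The automatic partition assignment Saeed ran into can be pictured with the range strategy that Kafka's consumers use by default. A topic's partitions are split into contiguous ranges over the sorted consumer (or stream) ids, and the first `partitions % consumers` members each get one extra partition. The sketch below assumes that strategy and a single topic. `range_assign` is our own helper, not a Kafka API.

```python
from typing import Dict, List


def range_assign(num_partitions: int, members: List[str]) -> Dict[str, List[int]]:
    """Range-style assignment of one topic's partitions to group members."""
    ordered = sorted(members)
    per_member, extra = divmod(num_partitions, len(ordered))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for i, member in enumerate(ordered):
        # The first `extra` members take one additional partition each.
        count = per_member + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


if __name__ == "__main__":
    # One stream owns all 12 partitions; with 5 streams the split is 3/3/2/2/2.
    print(range_assign(12, ["stream-0"]))
    split = range_assign(12, [f"stream-{i}" for i in range(5)])
    print({m: len(p) for m, p in split.items()})
```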

Message duplicated on incorrect topic - anyone else see this?

2016-06-09 Thread Clark Breyman
We're seeing a situation in one of our clusters where a message will occasionally be duplicated on an incorrect topic. No identifiable issues spotted in either the client application or Kafka logs. Has anyone else seen this? Seems like something that would raise concern. Any recommendations for

Re: Skipping assignment for topic * since no metadata is available

2016-06-09 Thread Kaufman Ng
For consumers connecting to non-existent topics there's no metadata available initially, that's the reason why you are seeing this message. There are multiple ways to work around this: - create the topics manually (e.g. kafka-topics.sh --create --topic ...) - start your producer *before* your
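Kaufman's first workaround, creating the topics up front, can be scripted. The minimal sketch below only builds the `kafka-topics.sh --create` argument lists, using the flags shown in the Kafka 0.9 quickstart. The ZooKeeper address, topic names, and partition and replication counts are placeholders. Running the commands, for example with `subprocess.run`, is left to the caller and has to happen before the consumers start.

```python
import shlex
from typing import Iterable, List


def create_topic_cmds(zookeeper: str, topics: Iterable[str],
                      partitions: int, replication: int) -> List[List[str]]:
    """Build one kafka-topics.sh --create command per topic (not executed here)."""
    return [
        ["kafka-topics.sh", "--create",
         "--zookeeper", zookeeper,
         "--topic", topic,
         "--partitions", str(partitions),
         "--replication-factor", str(replication)]
        for topic in topics
    ]


if __name__ == "__main__":
    for cmd in create_topic_cmds("localhost:2181", ["orders", "payments"], 6, 3):
        print(shlex.join(cmd))  # review, then run with subprocess.run(cmd, check=True)
```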

Skipping assignment for topic * since no metadata is available

2016-06-09 Thread Patrick Kaufmann
Hello Recently we’ve run into a problem when starting our application for the first time. At the moment all our topics are auto-created. Now, at the first start there are no topics, so naturally some consumers try to connect to topics which don’t exist. Those consumers now fail quite

Kafka Config Changes

2016-06-09 Thread Harkrishn Patro
Hi, I am trying to tweak two config params of a Kafka topic dynamically (without restart), i.e. flush.messages and flush.ms, to restrict the number of writes to disk, as disk seems to be the bottleneck in our case. But these configuration changes are not getting applied to the topic. flush.ms has been set to
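Here is some context on what those two settings control, as we understand the 0.9 broker. `flush.messages` is checked on append: the log is flushed once that many messages are unflushed. `flush.ms` is checked by a background task that runs every `log.flush.scheduler.interval.ms`. The sketch below is a simplified model of that decision, not broker code. In this model, larger values mean fewer forced flushes. The helper names are ours.

```python
def should_flush(unflushed_messages: int, ms_since_last_flush: int,
                 flush_messages: int, flush_ms: int) -> bool:
    """Simplified per-log flush decision: whichever threshold is hit first wins."""
    return unflushed_messages >= flush_messages or ms_since_last_flush >= flush_ms


def count_flushes(num_messages: int, interval_ms: int,
                  flush_messages: int, flush_ms: int) -> int:
    """Count forced flushes for evenly spaced appends (toy simulation).

    Simplification: both thresholds are checked on every append, whereas the
    broker checks the time threshold from a periodic scheduler.
    """
    flushes, unflushed, last_flush_ms = 0, 0, 0
    for i in range(1, num_messages + 1):
        now = i * interval_ms
        unflushed += 1
        if should_flush(unflushed, now - last_flush_ms, flush_messages, flush_ms):
            flushes += 1
            unflushed, last_flush_ms = 0, now
    return flushes


if __name__ == "__main__":
    # 1000 messages, one per ms: compare a message-count and a time threshold.
    print(count_flushes(1000, 1, flush_messages=100, flush_ms=10**9))
    print(count_flushes(1000, 1, flush_messages=10**9, flush_ms=250))
```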

Re: Change zookeeper cluster for kafka

2016-06-09 Thread Elias Abacioglu
Thanks Dustin, I guess I have to go with the second option since we didn't migrate ZooKeeper. We did a side-by-side upgrade of Hadoop, so we set up an entire new Hadoop ecosystem and migrated all data from the old cluster. On Wed, Jun 8, 2016 at 4:50 PM, Dustin Cote wrote:

Re: Quotas feature Kafka 0.9.0.1

2016-06-09 Thread Ben Stopford
Hi Liju Alas we can’t use quotas directly to throttle replication. The problem is that, currently, fetch requests from followers include critical traffic (the replication of produce requests) as well as non critical traffic (brokers catching up etc) so we can’t apply the current quotas

Re: kafka 9 group offset

2016-06-09 Thread Spico Florin
Hi! If you have subscribed to a topic via a consumer group, you can use: ./kafka-consumer-groups.sh --new-consumer --bootstrap-server brokerhost:brokerport --describe --group your_group_id You can see the group list via the command: ./kafka-consumer-groups.sh --new-consumer --bootstrap-server

Re: Questions on Kafka Security

2016-06-09 Thread Gerard Klijs
If you can put the ACL in a file, and there will be few or no changes, you might be best off writing your own Authorizer implementation. If you can use a shared file system to store the config, you would even be able to change it easily, and it will be the same across the cluster. On Thu, Jun
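Gerard's suggestion of a custom Authorizer backed by a shared file can be sketched in outline. The real plug-in would implement Kafka's Scala `kafka.security.auth.Authorizer` trait and be configured via `authorizer.class.name`. The Python below only models the lookup logic. The one-rule-per-line `principal,operation,resource` file format and the `*` wildcard are our own assumptions.

```python
from typing import Iterable, Set, Tuple

Rule = Tuple[str, str, str]  # (principal, operation, resource); hypothetical format


def load_acls(lines: Iterable[str]) -> Set[Rule]:
    """Parse 'principal,operation,resource' lines; skip blanks and '#' comments."""
    rules: Set[Rule] = set()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        principal, operation, resource = (part.strip() for part in line.split(","))
        rules.add((principal, operation, resource))
    return rules


def authorize(rules: Set[Rule], principal: str, operation: str, resource: str) -> bool:
    """Allow only if some rule matches; '*' matches anything. Deny by default."""
    return any(
        p in (principal, "*") and o in (operation, "*") and r in (resource, "*")
        for p, o, r in rules
    )


if __name__ == "__main__":
    acl_file = """
    # principal,operation,resource
    User:alice,Read,Topic:orders
    User:admin,*,*
    """
    rules = load_acls(acl_file.splitlines())
    print(authorize(rules, "User:alice", "Read", "Topic:orders"))   # True
    print(authorize(rules, "User:alice", "Write", "Topic:orders"))  # False
```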

Re: Questions about Kafka Scripts

2016-06-09 Thread Gwen Shapira
[A] Unfortunately, we only documented this in the code:

    /**
     * For verifying the consistency among replicas.
     *
     * 1. start a fetcher on every broker.
     * 2. each fetcher does the following
     *    2.1 issues fetch request
     *    2.2 puts the fetched result in a shared buffer
     *    2.3 waits for
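The consistency check behind that comment can be sketched roughly as follows. Once every broker's fetch for a partition is in the shared buffer, the replicas are compared message by message over the offsets they all have. The sketch assumes each replica's fetch result has been reduced to an `{offset: checksum}` map, and we believe, without certainty, that comparing per-message checksums matches what the tool does. `first_divergence` is our own name, and this is not the tool's actual code.

```python
from typing import Dict, Optional


def first_divergence(replicas: Dict[int, Dict[int, int]]) -> Optional[int]:
    """Return the lowest offset where replicas disagree, or None if consistent.

    replicas maps broker id -> {offset: message checksum} for ONE partition.
    Only offsets present on every replica are compared, since a follower may
    simply be behind the leader.
    """
    if not replicas:
        return None
    common = set.intersection(*(set(r) for r in replicas.values()))
    for offset in sorted(common):
        if len({r[offset] for r in replicas.values()}) > 1:
            return offset
    return None


if __name__ == "__main__":
    fetched = {
        1: {0: 0xA1, 1: 0xB2, 2: 0xC3},
        2: {0: 0xA1, 1: 0xFF, 2: 0xC3},  # differs at offset 1
        3: {0: 0xA1, 1: 0xB2},           # lagging replica
    }
    print(first_divergence(fetched))  # 1
```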