Re: More OS packages, please!

2016-07-23 Thread Ewen Cheslack-Postava
Confluent Platform includes RPM and Debian packages: http://www.confluent.io/download We tag them a bit differently due to different release schedules, but the CP builds are entirely open source and effectively map directly to Apache releases. Check out

Re: [kafka-connect] multiple or single clusters?

2016-07-23 Thread Ewen Cheslack-Postava
On Fri, Jun 24, 2016 at 11:16 AM, noah wrote: > I'm having some trouble figuring out the right way to run Kafka Connect in > production. We will have multiple sink connectors that we need to remain > running indefinitely and have at least once semantics (with as little >

Re: Colocating Kafka Connect on Kafka Broker

2016-07-23 Thread Ewen Cheslack-Postava
Generally we discourage colocating services with Kafka. Kafka relies heavily on the page cache. It's generally light on CPU (except maybe if it has to recompress messages), but may not play well with other services. For very light installations, colocating some services (e.g. both ZK and Kafka),

Re: Kafka cluster

2016-07-23 Thread Ewen Cheslack-Postava
2 is technically enough but you're at risk of losing data if there is a failure and the second broker fails while a replacement broker is replicating the data. In general, 3 brokers (and replicas) is a good minimum, but there are some cases that might warrant using fewer, even as few as 1. For

Re: Kafka Connect issues

2016-07-23 Thread Ewen Cheslack-Postava
That definitely sounds unusual -- rebalancing normally only happens either when a) there are new workers or b) there are connectivity issues/failures. Is it possible there's something causing large latencies? -Ewen On Sat, Jul 16, 2016 at 6:09 AM, Kristoffer Sjögren wrote: >

Re: Kafka does not preserve an offset on topic.

2016-07-23 Thread Ewen Cheslack-Postava
The parameter you want is AUTO_OFFSET_RESET_CONFIG. If setting that to latest isn't working, can you include some code that reproduces the issue? -Ewen On Wed, Jul 6, 2016 at 6:21 AM, Pawel Huszcza wrote: > Hello, > > I tried every different property I can think of
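As a rough sketch of where that setting goes: `ConsumerConfig.AUTO_OFFSET_RESET_CONFIG` is the constant for the `auto.offset.reset` key in the new consumer's configuration. The snippet below uses plain string keys so it runs without the Kafka client on the classpath. The broker address and group name are placeholders, not values from this thread. Keep in mind that the reset policy is only consulted when the group has no valid committed offset for a partition, so a group that has already committed offsets will resume from them regardless of this setting.

```java
import java.util.Properties;

class OffsetResetExample {
    // Builds new-consumer properties. "auto.offset.reset" is the string value
    // behind ConsumerConfig.AUTO_OFFSET_RESET_CONFIG.
    static Properties consumerProps(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers); // placeholder address
        props.setProperty("group.id", groupId);                   // placeholder group name
        // Start from the end of the log when no committed offset exists.
        props.setProperty("auto.offset.reset", "latest");
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        Properties props = consumerProps("localhost:9092", "example-group");
        System.out.println("auto.offset.reset=" + props.getProperty("auto.offset.reset"));
    }
}
```

With the Kafka client available, these properties would be passed to `new KafkaConsumer<String, String>(props)`.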

Re: Monitoring Kafka Connect

2016-07-23 Thread Ewen Cheslack-Postava
On Wed, Jun 29, 2016 at 9:44 AM, Sumit Arora wrote: > Hello, > > We are currently building our data-pipeline using Confluent and as part of > this implementation, we have written couple of Kafka Connect Sink > Connectors for Azure and MS SQL server. To provide

Re: Rebalance and Failures

2016-07-23 Thread Ewen Cheslack-Postava
Since you mention ZK timeout, I think you might be confused about new vs old consumer semantics. With the new consumer, there's no ZK interaction. If one of the members dies after indicating membership but before the group protocol completes, it will simply be assigned data and not process it.

Re: Kafka consumer performance with large network delay

2016-07-23 Thread Ewen Cheslack-Postava
Kafka will batch messages, but if the rate of delivery is too slow it'll fall back to delivering only one message per batch. What is the total throughput per broker? -Ewen On Fri, Jul 15, 2016 at 5:21 PM, Boris Sorochkin wrote: > Hi All, > I have Kafka setup with default

Re: Nginx Logs to Kafka

2016-07-23 Thread Ewen Cheslack-Postava
Kafka Connect can also help you here. There's nothing nginx specific, but even a very simple file connector can help you ingest nginx logs into Kafka. -Ewen On Tue, Jul 19, 2016 at 11:22 AM, Steve Brandon wrote: > You can use the ELK stack to push your logs to Kafka,
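To make the file connector idea concrete, here is a minimal sketch of the configuration for the `FileStreamSourceConnector` example connector that ships with Kafka, rendered in the `.properties` form that standalone mode reads. The connector name, log path, and topic name are assumptions for illustration, not values from this thread. This is a simple example connector, so behavior around log rotation should be checked before relying on it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NginxFileSourceConfig {
    // Assembles the settings for the file source connector bundled with Kafka.
    static Map<String, String> connectorConfig(String logPath, String topic) {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("name", "nginx-access-log-source"); // hypothetical connector name
        config.put("connector.class",
                "org.apache.kafka.connect.file.FileStreamSourceConnector");
        config.put("tasks.max", "1");  // a single file is read by a single task
        config.put("file", logPath);   // file to read lines from
        config.put("topic", topic);    // topic each line is written to
        return config;
    }

    // Renders the settings as key=value lines, one per entry, in insertion order.
    static String toPropertiesText(Map<String, String> config) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : config.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toPropertiesText(
                connectorConfig("/var/log/nginx/access.log", "nginx-logs")));
    }
}
```

The printed text could be saved to a file and passed, together with a worker configuration, to `bin/connect-standalone.sh`.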

Re: Maximum number of producers per topic per broker

2016-07-23 Thread Ewen Cheslack-Postava
There's no strict limit on the number of producers. If you're hitting some CPU limit, perhaps you are simply overloading the broker? 6 or 700 producers doesn't sound that bad, but if they are producing too much data then of course eventually the broker will become overwhelmed. How much total data is

Re: Duplicates consumed on rebalance. No compression, autocommit enabled.

2016-07-23 Thread Ewen Cheslack-Postava
I'd suggest using the new consumer instead of the old consumer. We've refined the implementation such that even with auto-commit you should get at least once processing in the worst case (and when there aren't failures, exactly once). The 0.10.0.0 release should get all of these semantics right.

Re: Topic naming convention and common message envelope.

2016-07-23 Thread Ewen Cheslack-Postava
On Tue, Jul 19, 2016 at 12:48 AM, Denis Mikhaylov wrote: > Hi, I plan to use Kafka for event-based integration between services. And > I have two questions that bother me a lot: > > 1) What topic naming convention do you use? > There's no strict convention, but using '.' or

Re: Deploying new connector to existing Kafka cluster

2016-07-23 Thread Ewen Cheslack-Postava
You're right that today you need to distribute jars manually -- we don't have a built-in distribution mechanism, we just depend on what's on the classpath. Once you've got the jars installed, to make the jars accessible you'll need to do a rolling bounce with updated classpaths. We know

Re: Understanding Consumer Pooling vs Streaming

2016-07-23 Thread Ewen Cheslack-Postava
They implement generally the same consumer group functionality, but the new consumer (your option 1) is more modern, will be supported for a long time (whereas option 2 will eventually be deprecated and removed), and has a better implementation. The new consumer takes into account a lot of lessons

Re: Opportunity to contribute in Apache Kafka

2016-07-23 Thread Ewen Cheslack-Postava
Hey Shubham, I'd highly recommend a couple of newbie bugs just to get familiarized ( https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20labels%20%3D%20%22newbie%22%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20key%20DESC ) After getting familiarized with the project

Re: Kafka Active Segment List Diagram Typo?

2016-07-23 Thread Ewen Cheslack-Postava
Yes, that's just a bug in the image -- the second log segment should hold messages in the range indicated in the left side of the image. -Ewen On Sun, Jul 3, 2016 at 10:03 AM, Adam Cardenas wrote: > Good day Kafka users, > > Was looking over the current Kafka docs; >

Re: consumer.subscribe(Pattern p , ..) method fails with Authorizer

2016-07-23 Thread Ewen Cheslack-Postava
Manikumar, Yeah, that seems bad. Seems like maybe instead of moving to server-side processing we should make the metadata request limit results to topics the principal is authorized for? I suspect this is important anyway since generally it seems we don't want to reveal errors when there's

Re: 0.9 client persistently high CPU usage

2016-07-23 Thread Ewen Cheslack-Postava
That exception indicates that another thread is interrupting the consumer thread. Is there something else in the process that could be causing that interruption? The -1 broker ID actually isn't unusual. Since broker IDs should be positive, this is just a simple approach to identifying bootstrap

Re: How to know the producer of one topic?

2016-07-23 Thread Ewen Cheslack-Postava
Unfortunately there's no ID for the producer of messages -- the client ID is included when the request is sent, but it isn't recorded on disk. You *might* be able to dig out the producer of bad messages from the Kafka logs, but there's nothing in the stored data that would lead you directly to the

Re: Find partition offsets in a kerberized kafka cluster

2016-07-23 Thread Ewen Cheslack-Postava
The GetOffsetShell utility still uses the SimpleConsumer, so I don't think there's a way to use it with Kerberos. The new consumer doesn't expose all the APIs that SimpleConsumer does, so I don't think the tool can be converted to the new consumer yet. -Ewen On Wed, Jul 6, 2016 at 11:02 AM,

Kafka (Streams) scalability

2016-07-23 Thread Alex Glikson
Hi all, I wonder whether limitations mentioned in [1] regarding Kafka scalability in number of topics are still valid. For example, did the recent changes in the design around usage of ZooKeeper versus internal membership protocol affect the scalability - one way or the other? Also, it seems

Re: Kafka Consumer stops consuming from a topic

2016-07-23 Thread OGrandeDiEnne
Mmh... Some time ago we had an issue with Kafka 0.8.x. The consumer was extremely slow (the CPU was sucked up by other processes) and it was not picking up any messages. Looking at ZooKeeper we saw the offset was committed as the messages were already read by the consumer. We disabled auto back

Re: Kafka (Streams) scalability

2016-07-23 Thread Jagat Singh
My post is not directly referring to KS. The new free book from O'Reilly has a very good explanation of Kafka topic counts. You can download it from the link below (see Chapter 4): http://shop.oreilly.com/product/0636920049463.do In short, quoting from there: >>> These problems are likely