Re: Log retention just for offset topic

2016-06-29 Thread Sathyakumar Seshachalam
Or maybe I'm wrong, and the log cleaner only picks up topics with a cleanup.policy. From the documentation it is not very obvious what the behaviour is. On Thu, Jun 30, 2016 at 10:33 AM, Sathyakumar Seshachalam < sathyakumar_seshacha...@trimble.com> wrote: > Hi, > > Thanks for the response. > > I

Re: Log retention just for offset topic

2016-06-29 Thread Sathyakumar Seshachalam
Hi, Thanks for the response. I'd still like to know what happens for topics which have not defined a cleanup.policy. I assume the default value is compact, and hence all topics' logs will be compacted, which I want to avoid. I'm running 0.9.0, so I will have to manually set log.cleaner.enable=true

Re: Log retention just for offset topic

2016-06-29 Thread Manikumar Reddy
Hi, Kafka internally creates the offsets topic (__consumer_offsets) with compact mode on. From 0.9.0.1 onwards, log.cleaner.enable=true by default. This means topics with cleanup.policy=compact will now be compacted by default. You can tweak the offsets topic configuration by using below
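Manikumar's point can be sketched as per-topic overrides; the topic name and ZooKeeper address below are placeholders, and the tooling assumes a 0.9-era broker where topic configs are still altered through kafka-topics.sh:

```shell
# server.properties -- needed explicitly on 0.9.0.0 (default from 0.9.0.1):
#   log.cleaner.enable=true

# Keep an ordinary topic on time-based retention (the default policy):
bin/kafka-topics.sh --zookeeper localhost:2181 --alter \
  --topic my-topic --config cleanup.policy=delete

# __consumer_offsets is created with cleanup.policy=compact automatically,
# so only topics that explicitly carry "compact" are compacted.
```

With the cleaner enabled cluster-wide, compaction still applies only to topics whose policy is "compact", which is what keeps the other topics untouched.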

Re: Kafka Producer connection issue on AWS private EC2 instance

2016-06-29 Thread vivek thakre
There are no errors in the broker logs. The Kafka cluster itself is functional. I have other producers and consumers working that are in the public subnet (same as the Kafka cluster). On Wed, Jun 29, 2016 at 7:15 PM, Kamesh Kompella wrote: > For what it's worth, I used to get

Log retention just for offset topic

2016-06-29 Thread Sathyakumar Seshachalam
I'm a little confused about how the log cleaner works. My use case is that I want to compact just selected topics (in my case, just the internal topic __consumer_offsets) and leave the other topics as is. What's the right setting/configuration for this to happen? As I understand, the log cleaner

Re: Streams RocksDB State Store Disk Usage

2016-06-29 Thread Guozhang Wang
Hello Avi, One way to mentally quantify your state store usage is to consider the total key space in your reduceByKey() operator, and multiply by the average key-value pair size. Then you need to consider the RocksDB write / space amplification factor as well. Currently Kafka Streams hard-write
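Guozhang's back-of-the-envelope estimate can be sketched with made-up numbers; the key count, average pair size, and amplification factor below are assumptions for illustration, not measurements:

```shell
keys=1000000          # distinct keys flowing through reduceByKey()
avg_pair_bytes=200    # average serialized key + value size, in bytes
amplification=3       # assumed RocksDB write/space amplification factor

# keys * bytes-per-pair * amplification, converted to MB:
echo "estimated store size: $(( keys * avg_pair_bytes * amplification / 1024 / 1024 )) MB"
# → estimated store size: 572 MB
```

Swapping in your own key cardinality and record sizes gives a first-order sanity check on whether observed disk usage is in the expected range.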

Re: Kafka Roadmap

2016-06-29 Thread Guozhang Wang
Hello Szymon, Currently the community does not have regular time-based release plans for Kafka. From your question, I think you are more interested in how to let older-versioned clients talk to different / possibly newer-versioned brokers. We are working on a proposal to help you

Re: Kafka Producer connection issue on AWS private EC2 instance

2016-06-29 Thread Kamesh Kompella
For what it's worth, I used to get similar messages with docker instances on centos. The way I debugged the problem was by looking at Kafka logs. In that case, it turned out that brokers could not reach zk and this info was in the logs. The logs will list the parameters the broker used at

Kafka Producer connection issue on AWS private EC2 instance

2016-06-29 Thread vivek thakre
I have a Kafka cluster set up on an AWS public subnet, with all brokers having elastic IPs. My producers are on a private subnet and are not able to produce to the Kafka cluster on the public subnet. Both subnets are in the same VPC. I added the private IP/CIDR of the producer EC2 instance to the public Kafka security group. (I can
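One way to narrow this kind of problem down; the security group ID, CIDR, and broker address below are placeholders, not values from this thread:

```shell
# 1. Allow the producer subnet through the brokers' security group:
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 --protocol tcp --port 9092 \
  --cidr 10.0.1.0/24

# 2. From the producer instance, check that the broker port is reachable:
nc -vz 10.0.2.15 9092

# 3. Even with the port open, the broker must advertise an address the
#    producer can resolve and reach (server.properties, 0.9-era setting):
#      advertised.host.name=<address reachable from the private subnet>
```

Within a VPC it is often simpler to have clients connect to the brokers' private addresses and reserve the elastic IPs for traffic that genuinely comes from outside.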

Re: Consumer Group, rebalancing and partition uniqueness

2016-06-29 Thread Milind Vaidya
Florin, Thanks, I got your point. The documentation, as well as the diagram showing the consumer group mechanism, indicates that the partitions are shared disjointly by consumers in a group. You also stated above "Each of your consumer will receive message for its allocated partition for that they

Re: Consumer Group, rebalancing and partition uniqueness

2016-06-29 Thread Spico Florin
Hi! By default, Kafka uses a partitioner that sends each message to a partition based on the message key (falling back to round robin when there is no key). Each of your consumers will receive messages for the partitions allocated to it through its subscription. In case of rebalance, if you add more consumers than the

Consumer Group, rebalancing and partition uniqueness

2016-06-29 Thread Milind Vaidya
Hi, Background: I am using a Java-based multithreaded Kafka consumer. Two instances of this consumer are running on two different machines (one consumer process per box) and belong to the same consumer group. Internally, each process has two threads. Both consumer processes consume from

Re: Streams RocksDB State Store Disk Usage

2016-06-29 Thread Avi Flax
On Jun 29, 2016, at 14:15, Matthias J. Sax wrote: > > If you use window-operations, windows are kept until their retention > time expires. Thus, reducing the retention time should decrease the > memory RocksDB needs to preserve windows. Thanks Matthias, that makes sense

Re: Streams RocksDB State Store Disk Usage

2016-06-29 Thread Avi Flax
On Jun 29, 2016, at 11:49, Eno Thereska wrote: > These are internal files to RocksDB. Yeah, that makes sense. However, since Streams is encapsulating/employing RocksDB, in my view it’s Streams’ responsibility to configure RocksDB well with good defaults and/or at least

Re: Streams RocksDB State Store Disk Usage

2016-06-29 Thread Matthias J. Sax
One thing I want to add: If you use window-operations, windows are kept until their retention time expires. Thus, reducing the retention time should decrease the memory RocksDB needs to preserve windows. See

Monitoring Kafka Connect

2016-06-29 Thread Sumit Arora
Hello, We are currently building our data pipeline using Confluent and, as part of this implementation, we have written a couple of Kafka Connect sink connectors for Azure and MS SQL Server. To provide some more context, we are planning to use SQL Server's Change Data Capture feature to track

Re: Building API to make Kafka reactive

2016-06-29 Thread Shekar Tippur
Thanks for the suggestion, Lohith. I will try that and provide feedback. - Shekar On Tue, Jun 28, 2016 at 11:45 PM, Lohith Samaga M wrote: > Hi Shekar, > Alternatively, you could make each stage of your pipeline to write > to a Cassandra (or other DB) and

Re: Building API to make Kafka reactive

2016-06-29 Thread Shekar Tippur
Thanks Rajini, I have seen this. Looks like quite a bit of work has been done. I was trying to go through this code and understand how to get started. - Shekar On Wed, Jun 29, 2016 at 12:49 AM, Rajini Sivaram < rajinisiva...@googlemail.com> wrote: > Hi Shekar, > > We are working on a reactive

kafka-console-producer.sh TimeOutException

2016-06-29 Thread Ludek Cigler
Hi, I am trying to send messages using the Kafka console producer to a Kafka broker that is running on the same machine. When I run $ echo "Hello world" | ./kafka-console-producer.sh --broker-list localhost:9092 --topic test I receive the following error message: [2016-06-29 15:00:44,069] ERROR
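A TimeoutException from the console producer often means the metadata request never reached a broker; two quick local checks, assuming the broker host/port from the command above and a default config file location:

```shell
# Is a broker actually listening on the port the producer targets?
nc -vz localhost 9092

# What host/port does the broker advertise to clients? A hostname that
# resolves on the broker but not for the client causes exactly this timeout.
grep -E '^(advertised|listeners|port|host\.name)' config/server.properties
```

If the port check fails, the broker is down or bound elsewhere; if it succeeds, the advertised host name is the usual suspect.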

Re: Streams RocksDB State Store Disk Usage

2016-06-29 Thread Eno Thereska
Hi Avi, These are internal files to RocksDB. Depending on your load in the system, I suppose they could contain quite a bit of data. How large was the load in the system these past two weeks, so we can calibrate? Otherwise I'm not sure if 1-2GB is a lot or not (sounds like not that big to make

Streams RocksDB State Store Disk Usage

2016-06-29 Thread Avi Flax
Hello all, I’ve been working on a Streams app and so far it’s going quite well. I’ve had the app up and running in a staging environment for a few weeks, processing real data. Today I logged into the server to check in on some things, and I found the disk was full. I managed (with ncdu) to

Kafka Roadmap

2016-06-29 Thread Panas, Szymon
All, I am building a solution around Apache Kafka with a set of dependencies on other products, mainly the Kafka client. Is there a defined road map for Kafka? I am particularly interested in version 1.X, as it would be marked stable. Does the community plan a release every 6 months (each version of Kafka changes

Re: Can I access Kafka Streams Key/Value store outside of Processor?

2016-06-29 Thread Eno Thereska
Yes! We just made KIP-67 available yesterday to start the discussions: https://cwiki.apache.org/confluence/display/KAFKA/KIP-67%3A+Queryable+state+for+Kafka+Streams Any feedback is welcome; there is a

Re: Can I access Kafka Streams Key/Value store outside of Processor?

2016-06-29 Thread Yi Chen
This is awesome Eno! Would you mind sharing the JIRA ticket if you have one? On Sun, Jun 19, 2016 at 12:07 PM, Eno Thereska wrote: > Hi Yi, > > Your observation about accessing the state stores that are already there > vs. keeping state outside of Kafka Streams is a good

Re: Question about bootstrap processing in KafkaStreams.

2016-06-29 Thread Matthias J. Sax
Hi, there was a similar discussion on the list already "Kafka stream join scenario": http://search-hadoop.com/m/uyzND1WsAGW1vB5O91=Kafka+stream+join+scenarios Long story short: there is no explicit support or guarantee. As Jay mentioned, some alignment is best effort. However, the main issues

Re: AWS EFS

2016-06-29 Thread Tom Crayford
I think you'll be far better off using EBS and Kafka's inbuilt distribution than NFS mounts. Kafka's designed for distributing data natively, and not for NFS style mounts. On Wed, Jun 29, 2016 at 11:46 AM, Ben Davison wrote: > Does anyone have any opinions on this? > >

AWS EFS

2016-06-29 Thread Ben Davison
Does anyone have any opinions on this? https://aws.amazon.com/blogs/aws/amazon-elastic-file-system-production-ready-in-three-regions/ Looks interesting, just wondering if anyone else uses NFS mounts with Kafka? Thanks, Ben

Colocating Kafka Connect on Kafka Broker

2016-06-29 Thread Kristoffer Sjögren
Hi, We want to use Kafka Connect to copy data to HDFS (using kafka-connect-hdfs) in Parquet format and were wondering if it's a good idea to colocate distributed Kafka Connect 1-1 on Kafka brokers? Considering the Parquet indexing process would steal (a lot of / too much?) computing resources from

Re: Building API to make Kafka reactive

2016-06-29 Thread Rajini Sivaram
Hi Shekar, We are working on a reactive streams API for Kafka. It is in its very early experimental stage, but if you want to take a look, the code is in github ( https://github.com/reactor/reactor-kafka). I think you can add a session id without making it part of the Kafka API. In the coming

RE: Building API to make Kafka reactive

2016-06-29 Thread Lohith Samaga M
Hi Shekar, Alternatively, you could make each stage of your pipeline write to Cassandra (or another DB), and your API will read from it. With a Cassandra TTL, the row will be deleted after the TTL has passed. No manual cleanup is required.
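In CQL terms, the TTL idea looks roughly like this; the keyspace, table, and column names are made up for illustration:

```shell
# Each pipeline stage writes its result with a TTL; Cassandra expires the
# row on its own, so no cleanup job is needed.
cqlsh -e "INSERT INTO pipeline.stage_results (session_id, stage, payload)
          VALUES ('abc-123', 'enrich', '{\"k\": \"v\"}')
          USING TTL 86400;"   # row disappears after 24 hours
```

The TTL is per write, so each stage can choose how long its intermediate result should stay queryable.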

Building API to make Kafka reactive

2016-06-29 Thread Shekar Tippur
I am looking at building a reactive API on top of Kafka. This API produces events to a Kafka topic. I want to add a unique session id into the payload. The data gets transformed as it goes through the different stages of a pipeline. I want to specify a final topic where I want the API to know that the