Re: Kafka Streams: lockException

2017-03-20 Thread Mahendra Kariya
Hey Guozhang, Thanks a lot for these insights. We are facing the exact same problem as Tianji. Our commit frequency is also quite high. We flush around 16K messages per minute to Kafka at the end of the topology. Another issue we are facing is that RocksDB is not deleting old data. We

Re: Kafka Streams: lockException

2017-03-20 Thread Mahendra Kariya
We did some more analysis on why the disk utilisation is continuously increasing. Turns out it's the RocksDB WAL that's utilising most of the disk space. The LOG.old WAL files are not getting deleted. Ideally they should have been. RocksDB provides certain configuration for purging WAL files

Re: Kafka Streams: lockException

2017-03-20 Thread Damian Guy
Mahendra, The WAL is turned off in KafkaStreams. This file is just the rocksdb log, you can probably just delete the old ones: https://github.com/facebook/rocksdb/issues/849 In 0.10.0.1 there is no way of configuring RocksDB via KafkaStreams. Thanks, Damian On Mon, 20 Mar 2017 at 09:22 Mahendra K
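
Since the `LOG.old.*` files Damian mentions are just rotated RocksDB info logs (not the WAL, which Kafka Streams disables), they can be deleted while the app runs. A minimal sketch of a cleanup helper; the state-directory layout is an assumption and paths would differ per deployment:

```python
from pathlib import Path

def purge_rocksdb_info_logs(state_dir: str) -> int:
    """Delete rotated RocksDB info-log files (LOG.old.*) under a
    Kafka Streams state directory. These are plain-text diagnostic
    logs, not the write-ahead log, so removing them is safe."""
    removed = 0
    for log_file in Path(state_dir).rglob("LOG.old.*"):
        log_file.unlink()
        removed += 1
    return removed
```

Running this periodically (e.g. from cron) keeps the state directory from growing unboundedly on 0.10.0.x, where RocksDB options cannot yet be configured through Kafka Streams.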

clearing an aggregation?

2017-03-20 Thread Jon Yeargers
Is this possible? I'm wondering about gathering data from a stream into a series of windowed aggregators: minute, hour and day. A separate process would start at fixed intervals, query the appropriate state store for available values and then hopefully clear / zero / reset everything for the next in

Re: Kafka Streams: lockException

2017-03-20 Thread Tianji Li
Hi Guys, Great information again as usual, very helpful! Very appreciated, thanks so much! Tianji PS: The Kafka Community is simply great! On Fri, Mar 17, 2017 at 3:00 PM, Guozhang Wang wrote: > Tianji and Sachin (and also cc'ing people who I remember have reported > similar RocksDB memory

Re: Out of order message processing with Kafka Streams

2017-03-20 Thread Michael Noll
And since you asked for a pointer, Ali: http://docs.confluent.io/current/streams/concepts.html#windowing On Mon, Mar 20, 2017 at 5:43 PM, Michael Noll wrote: > Late-arriving and out-of-order data is only treated specially for windowed > aggregations. > > For stateless operations such as `KStrea

Re: Out of order message processing with Kafka Streams

2017-03-20 Thread Michael Noll
Late-arriving and out-of-order data is only treated specially for windowed aggregations. For stateless operations such as `KStream#foreach()` or `KStream#map()`, records are processed in the order they arrive (per partition). -Michael On Sat, Mar 18, 2017 at 10:47 PM, Ali Akhtar wrote: > >

Re: Out of order message processing with Kafka Streams

2017-03-20 Thread Ali Akhtar
Can windows only be used for aggregations, or can they also be used for foreach(), and such? And is it possible to get metadata on the message, such as whether or not it's late, its index/position within the other messages, etc? On Mon, Mar 20, 2017 at 9:44 PM, Michael Noll wrote: > And since yo

ConsumerRebalanceListerner

2017-03-20 Thread jeffrey venus
Hi, I am trying to use Kafka as a messaging platform for my microservices. There is a problem I am facing in real time: when I try to bring in a new consumer group to consume from a certain topic, I have to restart the producer, and only then does it (the new consumer group) start consuming. Is there

Re: clearing an aggregation?

2017-03-20 Thread Michael Noll
Jon, the windowing operation of Kafka's Streams API (in its DSL) aligns time-based windows to the epoch [1]: Quoting from e.g. hopping windows (sometimes called sliding windows in other technologies): > Hopping time windows are aligned to the epoch, with the lower interval bound > being inclusiv
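
The epoch alignment Michael quotes can be reproduced with simple arithmetic. A sketch (function and variable names are mine, not from the Streams API) of how epoch-aligned hopping windows are derived for a record timestamp:

```python
def window_bounds(timestamp_ms: int, size_ms: int, advance_ms: int):
    """Return the [start, end) bounds of every epoch-aligned hopping
    window containing the given timestamp, newest window first.
    Window starts are multiples of advance_ms, so the assignment is
    deterministic regardless of arrival order."""
    windows = []
    # Latest aligned window start at or before the timestamp.
    start = timestamp_ms - (timestamp_ms % advance_ms)
    # Walk back while the window [start, start + size) still covers it.
    while start >= 0 and start + size_ms > timestamp_ms:
        windows.append((start, start + size_ms))
        start -= advance_ms
    return windows
```

With size == advance this degenerates to tumbling windows, where each timestamp lands in exactly one bucket; because the lower bound is always a multiple of the advance, Jon's separate query process can compute bucket boundaries independently without coordinating with the Streams app.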

Re: ConsumerRebalanceListerner

2017-03-20 Thread Robert Quinlivan
Does the consumer or producer log anything when you connect the new consumer? Is there anything logged in the broker logs? On Mon, Mar 20, 2017 at 10:58 AM, jeffrey venus wrote: > Hi, > I am trying to use Kafka as a messaging platform for my microservices. > There is a problem I am facing

Kafka timing to auto create topics

2017-03-20 Thread Matheus Gonçalves da Silva
Hello, I have configured auto.create.topics.enable=true in my server.properties, but I'm having issues with it. When I launch Kafka for the first time and send a message to a topic (to be created), it doesn't arrive at the topic, but if I launch Kafka and wait a few minutes to send the mes
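
Auto-created topics take a moment to exist: the first metadata request triggers creation, and leader election has to finish before produce requests succeed. A common workaround is retrying the first send with backoff; a sketch, where `send` is a hypothetical stand-in for whatever raises on failure (e.g. a wrapper around `producer.send(...).get()`):

```python
import time

def send_with_retry(send, record, attempts=5, backoff_s=1.0):
    """Retry a send while metadata for a freshly auto-created topic
    propagates. `send` is any callable that raises on failure."""
    for attempt in range(attempts):
        try:
            return send(record)
        except Exception:
            if attempt == attempts - 1:
                raise
            # Exponential backoff between attempts.
            time.sleep(backoff_s * (2 ** attempt))
```

Pre-creating topics with `kafka-topics.sh --create` before the first produce avoids the race entirely and is usually the more robust option.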

Kafka Producer for Sending Log Data

2017-03-20 Thread Tony S. Wu
Hi, I am looking to send log files periodically to Kafka. Before I set out to write my own producer, I was wondering what everyone uses to send log files to Kafka through an HTTP API. Ideally it should also prune the log and have some sort of error handling. Thanks very much. Tony S. Wu

org.apache.kafka.common.errors.TimeoutException

2017-03-20 Thread Mina Aslani
Hi, I get ERROR Error when sending message to topic my-topic with key: null, value: ... bytes with error: (org.apache.kafka.clients.producer.internals. ErrorLoggingCallback) org.apache.kafka.common.errors.TimeoutException: Expiring 11 record(s) for my-topic-0: 1732 ms has passed since last append
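
This TimeoutException means records expired in the producer's accumulator before the broker acknowledged them. Hedged starting points for the producer config, not a definitive fix (the right values depend on broker load and batch sizes):

```properties
# Give in-flight batches longer before they are expired
request.timeout.ms=60000
# Send batches promptly instead of letting records age in the accumulator
linger.ms=5
batch.size=16384
# Bound how long send() may block when buffers are full
max.block.ms=60000
```

If the timeouts persist with these settings, the broker side (slow disks, under-replicated partitions, or an unreachable leader for `my-topic-0`) is the more likely culprit than the client.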

Processing multiple topics

2017-03-20 Thread Manasa Danda
Hi, I am Manasa, currently working on a project that requires processing data from multiple topics at the same time. I am looking for advice on how to approach this problem. Below is the use case. We have 4 topics, with data coming in at a different rate in each topic, but the messages in eac

Re: Processing multiple topics

2017-03-20 Thread Ali Akhtar
Are you saying, that it should process all messages from topic 1, then topic 2, then topic 3, then 4? Or that they need to be processed exactly at the same time? On Mon, Mar 20, 2017 at 10:05 PM, Manasa Danda wrote: > Hi, > > I am Manasa, currently working on a project that requires processing

Re: Out of order message processing with Kafka Streams

2017-03-20 Thread Michael Noll
> Can windows only be used for aggregations, or can they also be used for > foreach(), and such? As of today, you can use windows only in aggregations. > And is it possible to get metadata on the message, such as whether or not its late, its index/position within the other messages, etc? If you

Re: Out of order message processing with Kafka Streams

2017-03-20 Thread Ali Akhtar
It would be helpful to know the 'start' and 'end' of the current metadata, so if an out of order message arrives late, and is being processed in foreach(), you'd know which window / bucket it belongs to, and can handle it accordingly. I'm guessing that's not possible at the moment. (My use case i

Fwd: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-20 Thread Matthias J. Sax
\cc users list Forwarded Message Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API Date: Mon, 20 Mar 2017 11:51:01 -0700 From: Matthias J. Sax Organization: Confluent Inc To: d...@kafka.apache.org I want to push this discussion further. Guozhang's argument abo

Re: Out of order message processing with Kafka Streams

2017-03-20 Thread Michael Noll
Ali, what you describe is (roughly!) how Kafka Streams implements the internal state stores to support windowing. Some users have been following a similar approach as you outlined, using the Processor API. On Mon, Mar 20, 2017 at 7:54 PM, Ali Akhtar wrote: > It would be helpful to know the '

Re: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-20 Thread Michael Noll
Hmm, I must admit I don't like this last update all too much. Basically we would have: StreamsBuilder builder = new StreamsBuilder(); // And here you'd define your...well, what actually? // Ah right, you are composing a topology here, though you are not aware of it. KafkaStreams

Re: Out of order message processing with Kafka Streams

2017-03-20 Thread Ali Akhtar
Yeah, windowing seems perfect, if only I could find out the current window's start time (so I can log the current bucket's start & end times) and process window messages individually rather than as aggregates. It doesn't seem like I can get this metadata from ProcessorContext though, from looking

validate identity of producer in each record

2017-03-20 Thread Matt Magoffin
Hello, I am new to Kafka and am looking for a way for consumers to be able to identify the producer of each message in a topic. There are a large number of producers (let's say on the order of millions), and each producer would be connecting via SSL and using a unique client certificate. Essenti

Re: validate identity of producer in each record

2017-03-20 Thread Hans Jespersen
You can configure Kafka with ACLs that only allow certain users to produce/consume to certain topics but if multiple producers are allowed to produce to a shared topic then you cannot identify them without adding something to the messages. For example, you can have each producer digitally sign (or

Re: Processing multiple topics

2017-03-20 Thread Matthias J. Sax
I would recommend trying out Kafka's Streams API instead of Spark Streaming. http://docs.confluent.io/current/streams/index.html -Matthias On 3/20/17 11:32 AM, Ali Akhtar wrote: > Are you saying, that it should process all messages from topic 1, then > topic 2, then topic 3, then 4? > > Or tha

Re: validate identity of producer in each record

2017-03-20 Thread Matt Magoffin
Thanks, Hans. Signing messages is a good idea. Other than that, is there possibly an extension point in Kafka itself on the receiving of records, before they are stored/distributed? I was thinking along the lines of org.apache.kafka.clients.producer.ProducerInterceptor but on the server side?

Re: Kafka Producer for Sending Log Data

2017-03-20 Thread Ram Vittal
Hi Tony, We are using a custom logging API that wraps a Kafka producer as a singleton for each app. The custom API takes structured log data, converts it to JSON, and writes it to an app-specific Kafka topic. That topic is then bridged to a Logstash consumer and the logs get ingested into Elasticsearch. Now these lo
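
A minimal sketch of the wrapper Ram describes; the class and the `producer_send` callable are hypothetical (in a real app it would wrap `KafkaProducer.send`), and this only illustrates the singleton-plus-JSON shape, not Ram's actual API:

```python
import json

class KafkaLogHandler:
    """One shared handler per app: structured log data is serialized
    to JSON and handed to a send function for an app-specific topic."""
    _instance = None

    def __new__(cls, *args, **kwargs):
        # Singleton: every construction returns the same instance.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self, topic, producer_send):
        self.topic = topic
        self.producer_send = producer_send

    def log(self, level, message, **fields):
        record = {"level": level, "message": message, **fields}
        self.producer_send(self.topic, json.dumps(record).encode())
```

Because the records are already JSON, the Logstash side needs no grok parsing, only a JSON codec on the Kafka input.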

Re: validate identity of producer in each record

2017-03-20 Thread Hans Jespersen
Nothing on the broker today but if you use Kafka Connect API in 0.10.2 and above there is a pluggable interface called Transformations. See org.apache.kafka.connect.transforms in https://kafka.apache.org/0102/javadoc/index.html?org/apache/kafka/connect Source Connector transformations happen b

Re: validate identity of producer in each record

2017-03-20 Thread Matt Magoffin
OK, thank you for that. I looked at org.apache.kafka.connect.transforms.Transformation and org.apache.kafka.connect.source.SourceRecord, but am not discovering where the authenticated username (or Principal) might be available for the call to Transformation.apply()… or does Connect even support