Re: Kafka Streaming message loss

2016-11-18 Thread Eno Thereska
Hi Ryan, Perhaps you could share some of your code so we can have a look? One thing I'd check is if you are using compacted Kafka topics. If so, and if you have non-unique keys, compaction happens automatically and you might only see the latest value for a key. Thanks Eno > On 18 Nov 2016, at
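As background to the compaction point above: a topic is compacted when its `cleanup.policy` is set to `compact`. A hedged sketch using the topic tooling of that era (topic name and ZooKeeper address are made up):

```shell
# Create a compacted topic; the log cleaner eventually retains only the
# most recent value for each key, so a consumer replaying the topic may
# never see older values recorded under a non-unique key.
kafka-topics.sh --create --zookeeper localhost:2181 \
  --topic my-compacted-topic --partitions 1 --replication-factor 1 \
  --config cleanup.policy=compact
```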

Re: Change current offset with KStream

2016-11-18 Thread Eno Thereska
21, John Hayles wrote: > > Eno, thanks! > > In my case I think I need more functionality for fault tolerance, so will > look at KafkaConsumer. > > Thanks, > John > > -Original Message- > From: Eno Thereska [mailto:eno.there...@gmail.com] > Sent: F

Re: Change current offset with KStream

2016-11-18 Thread Eno Thereska
Hi John, Currently you can only change the following global configuration by using "earliest" or "latest", as shown in the code snippet below. As the Javadoc mentions: "What to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. becau
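The setting being quoted is the consumer's `auto.offset.reset`, which Kafka Streams passes through to its underlying consumers; as a properties fragment:

```
# What to do when there is no initial offset in Kafka, or when the
# committed offset no longer exists on the server:
#   earliest - start from the beginning of the topic
#   latest   - start from the end, i.e. only new records
auto.offset.reset=earliest
```

Note this is a global choice, as the reply says; per-topic start offsets cannot be picked through this setting.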

Re: Unit tests (in Java) take a *very* long time to 'clean up'?

2016-11-11 Thread Eno Thereska
out which class / method creates the processes / threads. > > Thanks. > > On Fri, Nov 11, 2016 at 9:24 PM, Eno Thereska > wrote: > >> Hi Ali, >> >> You're right, shutting down the broker and ZK is expensive. We kept the >> number of integration tests rel

Re: Unit tests (in Java) take a *very* long time to 'clean up'?

2016-11-11 Thread Eno Thereska
Hi Ali, You're right, shutting down the broker and ZK is expensive. We kept the number of integration tests relatively small (and pushed some more tests as system tests, while doing as many as possible as unit tests). It's not just the shutdown that's expensive, it's also the starting up unfort

Re: Understanding the topology of high level kafka stream

2016-11-09 Thread Eno Thereska
Hi Sachin, Kafka Streams is built on top of standard Kafka consumers. For every topic it consumes from (whether changelog topic or source topic, it doesn't matter), the consumer stores the offset it last consumed from. Upon restart, by default it starts consuming from where it left off from

Re: Kafka streaming changelog topic max.message.bytes exception

2016-11-08 Thread Eno Thereska
I don't know how can I >>> control this topic. >>> >>> If I can set some retention.ms for this topic then I can purge old >>> messages >>> thereby ensuring that message size stays within limit. >>> >>> Thanks >>> Sachin >>
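For reference, topic-level overrides such as the `retention.ms` and `max.message.bytes` discussed in this thread can be altered on an existing topic; a hedged sketch with made-up topic name and values, using the tooling current at the time:

```shell
# Set a retention period and raise the per-message size limit on an
# existing (e.g. changelog) topic; the values here are illustrative only.
kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics --entity-name my-app-changelog \
  --add-config retention.ms=86400000,max.message.bytes=2097152
```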

Re: Kafka streaming changelog topic max.message.bytes exception

2016-11-08 Thread Eno Thereska
Hi Sachin, Could you clarify what you mean by "message size increases"? Are messages going to the changelog topic increasing in size? Or is the changelog topic getting full? Thanks Eno > On 8 Nov 2016, at 16:49, Sachin Mittal wrote: > > Hi, > We are using aggregation by key on a kstream to

Re: Kafka Streams fails permanently when used with an unstable network

2016-11-02 Thread Eno Thereska
Hi Sai, For your second note on rebalancing taking a long time, we have just improved the situation in trunk after fixing this JIRA: https://issues.apache.org/jira/browse/KAFKA-3559 . Feel free to give it a go if rebalancing time continues to b

Re: Modify in-flight messages

2016-11-02 Thread Eno Thereska
Hi Dominik, Not sure if this is 100% relevant, but since I noticed you saying that you are benchmarking stream processing engines, one way to modify a message would be to use the Kafka Streams library, where you consume a message from a topic, modify it as needed/do some processing, and then pr

Re: windowing with the processor api

2016-11-02 Thread Eno Thereska
t find a place where to define the >> size of the window and where to specify the time of the advance. >> >> >> Hamza >> >> Thanks >> >> From: Eno Thereska >> Sent: Tuesday 1 November 2016 22:44:47 To >> : users@kafka.apache.org Subject: Re:

Re: windowing with the processor api

2016-11-02 Thread Eno Thereska
Hi Hamza, Are you getting a particular error? Here is an example: Stores.create("window-store").withStringKeys().withStringValues().persistent().windowed(10, 10, 2, false).build(), "the-pr

Re: Kafka multitenancy

2016-10-26 Thread Eno Thereska
Hi Mudit, This is a relevant read: http://www.confluent.io/blog/sharing-is-caring-multi-tenancy-in-distributed-data-systems It talks about two aspects of multi-tenancy in Kafka: security and performance

Re: How to block tests of Kafka Streams until messages processed?

2016-10-19 Thread Eno Thereska
Wanted to add that there is nothing too special about these utility functions, they are built using a normal consumer. Eno > On 19 Oct 2016, at 21:59, Eno Thereska wrote: > > Hi Ali, > > Any chance you could recycle some of the code we have in > streams/src/test/java/.../s

Re: How to block tests of Kafka Streams until messages processed?

2016-10-19 Thread Eno Thereska
Hi Ali, Any chance you could recycle some of the code we have in streams/src/test/java/.../streams/integration/utils? (I know we don't have it easily accessible in Maven, for now perhaps you could copy to your directory?) For example there is a method there "IntegrationTestUtils.waitUntilMinVa

Re: Embedded Kafka Cluster - Maven artifact?

2016-10-19 Thread Eno Thereska
I'm afraid we haven't released this as a maven artefact yet :( Eno > On 18 Oct 2016, at 13:22, Ali Akhtar wrote: > > Is there a maven artifact that can be used to create instances > of EmbeddedSingleNodeKafkaCluster for unit / integration tests?

Re: JVM crash when closing persistent store (rocksDB)

2016-10-12 Thread Eno Thereska
specifying the >>> kafa state dir properties, but no luck, same behavior. >>> I will try to run with the trunk tomorrow to see if it's stop correctly, >>> and I will keep you inform. There must be something with my >> configuration, >>> because I go

Re: JVM crash when closing persistent store (rocksDB)

2016-10-11 Thread Eno Thereska
Hi Pierre, I tried the exact code on MacOs and am not getting any errors. Could you check if all the directories in /tmp where Kafka Streams writes the RocksDb files are empty? I'm wondering if there is some bad state left over. Finally looks like you are running 0.10.0, could you try running

Re: Tuning for high RAM and 10GBe

2016-10-11 Thread Eno Thereska
Sounds good that you got up to 500MB/s. At that point I suspect you reach a sort of steady state where the cache is continuously flushing to the SSDs, so you are effectively bottlenecked by the SSD. I believe this is as expected (the bottleneck resource will dominate the end to end throughput ev

Re: Tuning for high RAM and 10GBe

2016-10-10 Thread Eno Thereska
850, claiming 100k IOPS and >500MB/s read/write. > > Again, thanks! > > On Sun, Oct 9, 2016 at 12:55 PM, Eno Thereska > wrote: > >> Hi Chris, >> >> A couple of things: looks like you're primarily testing the SSD if you're >> running on the local

Re: Understanding how joins work in Kafka streams

2016-10-10 Thread Eno Thereska
ose it will be > called every time a key is modified or it buffers the changes and calls it > once in a given time span. > > Thanks > Sachin > > > > On Sun, Oct 9, 2016 at 3:07 PM, Eno Thereska wrote: > >> Hi Sachin, >> >> Some comments inline: >

Re: Tuning for high RAM and 10GBe

2016-10-09 Thread Eno Thereska
Hi Chris, A couple of things: looks like you're primarily testing the SSD if you're running on the localhost, right? The 10GBe shouldn't matter in this case. The performance will depend a lot on the record sizes you are using. To get a ballpark number it would help to know the SSD type, would y

Re: Understanding how joins work in Kafka streams

2016-10-09 Thread Eno Thereska
Hi Sachin, Some comments inline: > On 9 Oct 2016, at 08:19, Sachin Mittal wrote: > > Hi, > I needed some light on how joins actually work on continuous stream of data. > > Say I have 2 topics which I need to join (left join). > Data record in each topic is aggregated like (key, value) <=> (str

Re: Kafka Streams windowed aggregation

2016-10-05 Thread Eno Thereska
Hi Davood, The behaviour is indeed as you say. Recently we checked in KIP-63 in trunk (it will be part of the 0.10.1 release coming up). That should reduce the amount of downstream traffic you see (https://cwiki.apache.org/confluence/display/KAFKA/KIP-63%3A+Unify+store+and+downstream+caching+in

Re: Handling out-of-order messaging w/ Kafka Streams

2016-09-27 Thread Eno Thereska
Hi Mathieu, If the messages are sent asynchronously, then what you're observing is indeed right. There is no guarantee that the first will arrive at the destination first. Perhaps you can try sending them synchronously (i.e., wait until the first one is received, before sending the second). Tha

Re: Time of derived records in Kafka Streams

2016-09-19 Thread Eno Thereska
> > On Sat, Sep 10, 2016 at 9:17 AM, Eno Thereska > wrote: > >> >> For aggregations, the timestamp will be that of the latest record being >> aggregated. >> > > How does that account for out of order records? > > What about kstream-kstream jo

Re: Performance issue with KafkaStreams

2016-09-17 Thread Eno Thereska
streams/perf/SimpleBenchmark.java > > bash$ find . -name SimpleBenchmark.class > > bash$ jar -tf streams/build/libs/kafka-streams-0.10.1.0-SNAPSHOT.jar | grep > SimpleBenchmark > > > > On Sat, Sep 10, 2016 at 6:18 AM, Eno Thereska > wrote: > >> Hi Caleb

Re: Performance issue with KafkaStreams

2016-09-11 Thread Eno Thereska
titions and num.stream.threads num.consumer.fetchers > and other such parameters? On a single node setup with x partitions, what’s > the best way to make sure these partitions are consumed and processed by > kafka streams optimally? > > Ara. > >> On Sep 10, 2016, at 3:18 AM, Eno Thereska w

Re: Time of derived records in Kafka Streams

2016-09-10 Thread Eno Thereska
Hi Elias, Good question. The general answer is that each time a record is output, the timestamp is that of the current Kafka Streams task that processes it, so it's the internal Kafka Streams time. If the Kafka Streams task is processing records with event time, the timestamp at any point is th

Re: Performance issue with KafkaStreams

2016-09-10 Thread Eno Thereska
oup.id=test-consumer-group > bootstrap.servers=broker1:9092,broker2:9092 > zookeeper.connect=zk1:2181 > replication.factor=2 > > auto.offset.reset=earliest > > > On Friday, September 9, 2016 8:48 AM, Eno Thereska > wrote: > > > > Hi Caleb, > > C

Re: Performance issue with KafkaStreams

2016-09-09 Thread Eno Thereska
Hi Caleb, Could you share your Kafka Streams configuration (i.e., StreamsConfig properties you might have set before the test)? Thanks Eno On Thu, Sep 8, 2016 at 12:46 AM, Caleb Welton wrote: > I have a question with respect to the KafkaStreams API. > > I noticed during my prototyping work tha

Re: Partitioning at the edges

2016-09-06 Thread Eno Thereska
is to just use a single partition for most topics but then "wrap" the ledger system with a process that re-partitions its input as necessary for scale. Cheers, Andy On Sat, Sep 3, 2016 at 6:13 AM, Eno Thereska wrote: Hi Andy, Could you share a bit more info or pseudocode so that w

Re: Partitioning at the edges

2016-09-03 Thread Eno Thereska
Hi Andy, Could you share a bit more info or pseudocode so that we can understand the scenario a bit better? Especially around the streams at the edges. How are they created and what is the join meant to do? Thanks Eno > On 3 Sep 2016, at 02:43, Andy Chambers wrote: > > Hey Folks, > > We are

Re: Kafka streams

2016-08-25 Thread Eno Thereska
ors, > first of which is source, would all of them run in single thread? There may > be scenarios wherein I want to have different number of threads for > different processors since some may be CPU bound and some may be IO bound. > > On Thu, Aug 25, 2016 at 10:49 PM, Eno Theres

Re: Kafka streams

2016-08-25 Thread Eno Thereska
Hi Abhishek, - Correct on connecting to external stores. You can use Kafka Connect to get things in or out. (Note that in the 0.10.1 release KIP-67 allows you to directly query Kafka Stream's stores so, for some kind of data you don't need to move it to an external store. This is pushed in trun

Re: Kafka Streams on Windows?

2016-08-05 Thread Eno Thereska
tps://groups.google.com/forum/#!topic/confluent-platform/Z1rsfSNrVJk) > and the conclusion there was that high-level streams DSL doesn't support > configuration of the stores. > > Mathieu > > > On Thu, Aug 4, 2016 at 10:28 AM, Eno Thereska > wrote: > >> Hi Ma

Re: Kafka Streams on Windows?

2016-08-04 Thread Eno Thereska
Hi Mathieu, Have you had a chance to look at http://rocksdb.org/blog/2033/rocksdb-is-now-available-in-windows-platform/? Curious to hear your and others' comments on whether that worked. It is possible to have Kafka

Re: Kafka Streams input

2016-08-04 Thread Eno Thereska
Hi Jeyhun, You can use Kafka Connect to bring data into Kafka from HDFS then use Kafka Streams as usual. I believe there is a desire to make the Connect + Streams more integrated so it doesn't feel like you're running two different things, and hopefully that integration happens soon. Thanks E

Re: Kafka Streams: Merging of partial results

2016-07-24 Thread Eno Thereska
Hi Michael-Keith, Good question. Two answers: in the default case the same key (e.g., "world") would end up in the same partition, so you wouldn't have the example you describe here where the same key is in two different partitions of the same topic. E.g., this default case applies if you are w
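The default case described here rests on key-based partitioning: the producer hashes the serialized key and reduces it modulo the number of partitions, so equal keys always land in the same partition. A simplified, self-contained sketch (Kafka's DefaultPartitioner actually uses murmur2 over the key bytes, not `String.hashCode`; this is only to illustrate the routing property):

```java
public class PartitionSketch {
    // Stand-in for Kafka's default partitioner: any deterministic hash of
    // the key, reduced modulo the partition count, routes records with the
    // same key (e.g. "world") to the same partition every time.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // Two records keyed "world" always map to the same partition:
        System.out.println(partitionFor("world", 4) == partitionFor("world", 4));
    }
}
```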

Re: Kafka Streams Latency

2016-07-22 Thread Eno Thereska
Hi Adrienne, Also you might want to have a look at the SimpleBenchmark.java file included with Kafka Streams (org.apache.kafka.streams.perf.SimpleBenchmark). It does some simple measurements of consumer, producer and Kafka Streams throughput. Thanks Eno > On 22 Jul 2016, at 07:21, David Garci

Re: Questions about Kafka Streams Partitioning & Deployment

2016-07-20 Thread Eno Thereska
Hi Michael, These are good questions and I can confirm that the system works in the way you hope it works, if you use the DSL and don't make up keys arbitrarily. In other words, there is nothing currently that prevents you from shooting yourself in the foot e.g., by making up keys and using the

Re: Kafka Connect or Streams Output to local File System

2016-07-15 Thread Eno Thereska
Hi David, One option would be to first output your info to a topic using Kafka Streams, and then use Connect again (as a sink) to read from the topic and write to a file in the file system. Eno > On 15 Jul 2016, at 08:24, David Newberger > wrote: > > Hello All, > > I'm curious if I can out
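A sketch of the stock file sink connector that ships with Kafka covers the second step; topic and file names here are made up, and the class name follows the bundled connect-file-sink example:

```
# connect-file-sink.properties - FileStreamSink reads from a Kafka topic
# and appends each record's value as a line in a local file.
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
topics=streams-output-topic
file=/tmp/streams-output.txt
```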

Re: Kafka Streams reducebykey and tumbling window - need final windowed KTable values

2016-07-07 Thread Eno Thereska
dowed store at some >> time soon after window end time. I can then use the methods specified to >> iterate over keys and values. Can you point me to the relevant >> method/technique for this? >> >> Thanks, >> Clive >> >> >>On Tuesday, 28 Ju

Re: Streams RocksDB State Store Disk Usage

2016-06-29 Thread Eno Thereska
Hi Avi, These are files internal to RocksDB. Depending on your load in the system I suppose they could contain quite a bit of data. How large was the load in the system these past two weeks, so we can calibrate? Otherwise I'm not sure if 1-2GB is a lot or not (sounds like not that big to make the

Re: Can I access Kafka Streams Key/Value store outside of Processor?

2016-06-29 Thread Eno Thereska
e is a mail thread in the dev mailing list. Thanks Eno > On 29 Jun 2016, at 15:52, Yi Chen wrote: > > This is awesome Eno! Would you mind sharing the JIRA ticket if you have one? > > On Sun, Jun 19, 2016 at 12:07 PM, Eno Thereska > wrote: > >> Hi Yi, >>

Re: Kafka Streams reducebykey and tumbling window - need final windowed KTable values

2016-06-28 Thread Eno Thereska
ks Eno > On 27 Jun 2016, at 20:56, Eno Thereska wrote: > > Hi Clive, > > We are working on exposing the state store behind a KTable as part of > allowing for queries to the structures currently hidden behind the language > (DSL). The KIP should be out today or tomorrow

Re: Kafka Streams reducebykey and tumbling window - need final windowed KTable values

2016-06-27 Thread Eno Thereska
end-time had been passed I can send the consolidated aggregation for that >> window and then throw the Window away. >> >> It sounds like from the references you give that this is not possible at >> present in Kafka Streams? >> >> Thanks, >> Clive

Re: Kafka Streams backend for Apache Beam?

2016-06-26 Thread Eno Thereska
Hi Alex, There are no such plans currently. Thanks Eno > On 26 Jun 2016, at 17:13, Alex Glikson wrote: > > Hi all, > > I wonder whether there are plans to implement Apache Beam backend based on > Kafka Streams? > > Thanks, > Alex > >

Re: Can I access Kafka Streams Key/Value store outside of Processor?

2016-06-19 Thread Eno Thereska
Hi Yi, Your observation about accessing the state stores that are already there vs. keeping state outside of Kafka Streams is a good one. We are currently working on having the state stores accessible like you mention and should be able to share some design docs shortly. Thanks Eno > On 19 Ju

Re: Kafka Streams table persistence

2016-06-16 Thread Eno Thereska
Hi Jan, It's a good question. Your observations are correct. A KTable is sometimes materialised to a changelog topic lazily, depending on whether the result in the KTable is needed for subsequent operations or not. We are working on improving the documentation of the materialisation rules and

Re: Fail to build examples with gradle on Kafka using JDK 8

2016-06-16 Thread Eno Thereska
Hi Phil, Feel free to comment on that JIRA and re-open if necessary. Eno > On 16 Jun 2016, at 17:02, Philippe Derome wrote: > > The issue had apparently existed and is apparently resolved, but so far it > does not work for me: https://issues.apache.org/jira/browse/KAFKA-2203. > > I issue sam

Re: About null keys

2016-06-15 Thread Eno Thereska
Hi Adrienne, How do you enter the input on t1 topic? If using kafka-console-producer, you'll need to pass in keys as well as values. Here is an example: http://www.shayne.me/blog/2015/2015-06-25-everything-about-kafka-part-2/
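With kafka-console-producer, a record key is only sent when key parsing is switched on; otherwise every record gets a null key. A hedged invocation (the separator character is an arbitrary choice):

```shell
# Type each input line as key:value; parse.key makes the tool split it
# into a record key and value instead of sending null-keyed records.
kafka-console-producer.sh --broker-list localhost:9092 --topic t1 \
  --property parse.key=true --property key.separator=:
```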

Re: WordCount example errors

2016-06-14 Thread Eno Thereska
Hi Jeyhun, What version of Kafka are you using? I haven't run this using Eclipse, but could you try building and running from the command line (instead of from within Eclipse) as described in that quickstart document? Does that work? Thanks Eno > On 13 Jun 2016, at 20:27, Jeyhun Karimov wro

Re: Kafka Streams reducebykey and tumbling window - need final windowed KTable values

2016-06-13 Thread Eno Thereska
this is not possible at > present in Kafka Streams? > > Thanks, > Clive > >On Monday, 13 June 2016, 11:32, Eno Thereska > wrote: > > > Hi Clive, > > The behaviour you are seeing is indeed correct (though not necessarily > optimal in terms of p

Re: Kafka Streams reducebykey and tumbling window - need final windowed KTable values

2016-06-13 Thread Eno Thereska
Hi Clive, The behaviour you are seeing is indeed correct (though not necessarily optimal in terms of performance as described in this JIRA: https://issues.apache.org/jira/browse/KAFKA-3101 ) The key observation is that windows never close/compl

Re: Kafka Streaming - Window expiration

2016-06-13 Thread Eno Thereska
Hi Pari, Try the .until method like this: > (TimeWindows) TimeWindows.of("tumbling-window-example", > windowSizeMs).until(60 * 1000L) Thanks Eno > On 13 Jun 2016, at 08:31, Pariksheet Barapatre wrote: > > Hello Experts, > > As per documentation in kafka docs - > *Windowing* is a common pre

Re: KafkaStream and Kafka consumer group

2016-06-09 Thread Eno Thereska
r.stream("test").foreach((x, y) -> { >controller.tell(y, controller.noSender()); > }); > > > The msg/sec rate I get from receiving messages to sending them out is > really slow! > > Do you think it is about how I consume messages? > > Thank yo

Re: KafkaStream and Kafka consumer group

2016-06-08 Thread Eno Thereska
Hi Saeed, Kafka Streams takes care of assigning partitions to consumers automatically for you. You don't have to write anything explicit to do that. See WordCountDemo.java as an example. Was there another reason you wanted control over partition assignment? Thanks Eno > On 7 Jun 2016, at 20:0

Re: Serdes class

2016-06-07 Thread Eno Thereska
Hi Saeed, You'll also need to add a dependency on org.apache.kafka's kafka-clients, since that has the serialization classes. org.apache.kafka kafka-clients 0.10.0.0 Thanks Eno > On 7 Jun 2016, at 15:50, Saeed Ansari wrote: > > Hi everyone, > I h
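Reconstructed as a Maven POM snippet, the dependency listed above would read:

```xml
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>0.10.0.0</version>
</dependency>
```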

Re: Does the Kafka Streams DSL support non-Kafka sources/sinks?

2016-06-02 Thread Eno Thereska
Hi Avi, Using the low-level streams API you can definitely read or write to arbitrary locations inside the process() method. However, back to your original question: even with the low-level streams API the sources and sinks can only be Kafka topics for now. So, as Gwen mentioned, Connect would
