Re: KafkaStream and Kafka consumer group

2016-06-08 Thread Eno Thereska
Hi Saeed, Kafka Streams takes care of assigning partitions to consumers automatically for you. You don't have to write anything explicit to do that. See WordCountDemo.java as an example. Was there another reason you wanted control over partition assignment? Thanks Eno > On 7 Jun 2016, at

Re: Serdes class

2016-06-07 Thread Eno Thereska
Hi Saeed, You'll also need to add a dependency on org.apache.kafka's kafka-clients, since that has the serialization classes. org.apache.kafka kafka-clients 0.10.0.0 Thanks Eno > On 7 Jun 2016, at 15:50, Saeed Ansari
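The Maven coordinates in the snippet above were flattened by the archive; restored as a pom.xml fragment (using the 0.10.0.0 version stated in the reply) they would read:

```xml
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>0.10.0.0</version>
</dependency>
```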

Re: KafkaStream and Kafka consumer group

2016-06-09 Thread Eno Thereska
ld actors to send messages out. > > builder.stream("test").foreach((x, y) -> { >controller.tell(y, controller.noSender()); > }); > > > The msg/sec rate I get from receiving messages to sending them out is > really slow! > > Do you think it is about how consume message

Re: Kafka Streaming - Window expiration

2016-06-13 Thread Eno Thereska
Hi Pari, Try the .until method like this: > (TimeWindows) TimeWindows.of("tumbling-window-example", > windowSizeMs).until(60 * 1000L) Thanks Eno > On 13 Jun 2016, at 08:31, Pariksheet Barapatre wrote: > > Hello Experts, > > As per documentation in kafka docs - >


Re: Kafka Streams reducebykey and tumbling window - need final windowed KTable values

2016-06-13 Thread Eno Thereska
Hi Clive, The behaviour you are seeing is indeed correct (though not necessarily optimal in terms of performance as described in this JIRA: https://issues.apache.org/jira/browse/KAFKA-3101 ) The key observation is that windows never

Re: WordCount example errors

2016-06-14 Thread Eno Thereska
Hi Jeyhun, What version of Kafka are you using? I haven't run this using Eclipse, but could you try building and running from the command line (instead of from within Eclipse) as described in that quickstart document? Does that work? Thanks Eno > On 13 Jun 2016, at 20:27, Jeyhun Karimov

Re: Kafka Streams reducebykey and tumbling window - need final windowed KTable values

2016-06-13 Thread Eno Thereska
the references you give that this is not possible at > present in Kafka Streams? > > Thanks, > Clive > >On Monday, 13 June 2016, 11:32, Eno Thereska <eno.there...@gmail.com> > wrote: > > > Hi Clive, > > The behaviour you are seeing is indeed corr

Re: Does the Kafka Streams DSL support non-Kafka sources/sinks?

2016-06-02 Thread Eno Thereska
Hi Avi, Using the low-level streams API you can definitely read or write to arbitrary locations inside the process() method. However, back to your original question: even with the low-level streams API the sources and sinks can only be Kafka topics for now. So, as Gwen mentioned, Connect

Re: About null keys

2016-06-15 Thread Eno Thereska
Hi Adrienne, How do you enter the input on t1 topic? If using kafka-console-producer, you'll need to pass in keys as well as values. Here is an example: http://www.shayne.me/blog/2015/2015-06-25-everything-about-kafka-part-2/
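If kafka-console-producer is the tool in use, keys can be supplied alongside values with the parse.key and key.separator reader properties; a sketch (broker address, topic name, and separator are placeholders for your setup):

```shell
# Produce keyed records from the console; without parse.key=true,
# every record is sent with a null key.
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic t1 \
  --property parse.key=true \
  --property key.separator=:
# then type lines such as:
#   mykey:myvalue
```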

Re: Can I access Kafka Streams Key/Value store outside of Processor?

2016-06-19 Thread Eno Thereska
Hi Yi, Your observation about accessing the state stores that are already there vs. keeping state outside of Kafka Streams is a good one. We are currently working on having the state stores accessible like you mention and should be able to share some design docs shortly. Thanks Eno > On 19

Re: Fail to build examples with gradle on Kafka using JDK 8

2016-06-16 Thread Eno Thereska
Hi Phil, Feel free to comment on that JIRA and re-open if necessary. Eno > On 16 Jun 2016, at 17:02, Philippe Derome wrote: > > The issue had apparently existed and is apparently resolved, but so far it > does not work for me:

Re: Kafka Streams table persistence

2016-06-16 Thread Eno Thereska
Hi Jan, It's a good question. Your observations are correct. A KTable is sometimes materialised to a changelog topic lazily, depending on whether the result in the KTable is needed for subsequent operations or not. We are working on improving the documentation of the materialisation rules and

Re: Kafka Streams reducebykey and tumbling window - need final windowed KTable values

2016-06-28 Thread Eno Thereska
ks Eno > On 27 Jun 2016, at 20:56, Eno Thereska <eno.there...@gmail.com> wrote: > > Hi Clive, > > We are working on exposing the state store behind a KTable as part of > allowing for queries to the structures currently hidden behind the language > (DSL). The KI

Re: Kafka Streams reducebykey and tumbling window - need final windowed KTable values

2016-06-27 Thread Eno Thereska
lp, > Clive > > > On Monday, 13 June 2016, 16:03, Eno Thereska <eno.there...@gmail.com> > wrote: > > > Hi Clive, > > For now this optimisation is not present. We're working on it as part of > KIP-63. One manual work-around might be to use a

Re: Kafka Streams backend for Apache Beam?

2016-06-27 Thread Eno Thereska
Hi Alex, There are no such plans currently. Thanks Eno > On 26 Jun 2016, at 17:13, Alex Glikson wrote: > > Hi all, > > I wonder whether there are plans to implement Apache Beam backend based on > Kafka Streams? > > Thanks, > Alex > >

Re: Kafka Streams Latency

2016-07-22 Thread Eno Thereska
Hi Adrienne, Also you might want to have a look at the SimpleBenchmark.java file included with Kafka Streams (org.apache.kafka.streams.perf. SimpleBenchmark). It does some simple measurements of consumer, producer and Kafka Streams throughput. Thanks Eno > On 22 Jul 2016, at 07:21, David

Re: Kafka Streams: Merging of partial results

2016-07-24 Thread Eno Thereska
Hi Michael-Keith, Good question. Two answers: in the default case the same key (e.g., "world") would end up in the same partition, so you wouldn't have the example you describe here where the same key is in two different partitions of the same topic. E.g., this default case applies if you are
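A toy illustration of the default case described above. Note that the real producer partitioner hashes the serialized key bytes with murmur2; this hashCode-based stand-in (all names invented here) only demonstrates the determinism that sends a given key to one partition, not Kafka's exact placement:

```java
public class PartitionSketch {
    // Toy stand-in for the producer's default partitioner: a deterministic
    // hash of the key, modulo the partition count. (Kafka itself uses
    // murmur2 over the serialized key bytes.)
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int p1 = partitionFor("world", 4);
        int p2 = partitionFor("world", 4);
        // The same key always maps to the same partition...
        System.out.println(p1 == p2); // true
        // ...which is why a per-key count never sees "world" split
        // across two partitions of the same topic in the default case.
    }
}
```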

Re: Kafka Streams input

2016-08-04 Thread Eno Thereska
Hi Jeyhun, You can use Kafka Connect to bring data into Kafka from HDFS, then use Kafka Streams as usual. I believe there is a desire to make Connect + Streams more integrated so it doesn't feel like you're running two different things, and hopefully that integration happens soon. Thanks

Re: Questions about Kafka Streams Partitioning & Deployment

2016-07-20 Thread Eno Thereska
Hi Michael, These are good questions and I can confirm that the system works in the way you hope it works, if you use the DSL and don't make up keys arbitrarily. In other words, there is nothing currently that prevents you from shooting yourself in the foot e.g., by making up keys and using

Re: Kafka Connect or Streams Output to local File System

2016-07-15 Thread Eno Thereska
Hi David, One option would be to first output your info to a topic using Kafka Streams, and then use Connect again (as a sink) to read from the topic and write to a file in the file system. Eno > On 15 Jul 2016, at 08:24, David Newberger > wrote: > > Hello
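A sketch of the Connect sink leg of that suggestion, using the FileStreamSink connector that ships with Kafka (standalone-mode properties file; the connector name, file path, and topic name are placeholders):

```properties
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
# Local file to append records to
file=/tmp/streams-output.txt
# Topic that the Kafka Streams job writes its results to
topics=streams-output-topic
```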

Re: Kafka Streams reducebykey and tumbling window - need final windowed KTable values

2016-07-07 Thread Eno Thereska
make to do what I presently need: Get access to each windowed store at some >> time soon after window end time. I can then use the methods specified to >> iterate over keys and values. Can you point me to the relevant >> method/technique for this? >> >> Thanks,

Re: Kafka Streams on Windows?

2016-08-05 Thread Eno Thereska
m mailing > list (https://groups.google.com/forum/#!topic/confluent-platform/Z1rsfSNrVJk) > and the conclusion there was that high-level streams DSL doesn't support > configuration of the stores. > > Mathieu > > > On Thu, Aug 4, 2016 at 10:28 AM, Eno Thereska <eno

Re: Can I access Kafka Streams Key/Value store outside of Processor?

2016-06-29 Thread Eno Thereska
elcome, there is a mail thread in the dev mailing list. Thanks Eno > On 29 Jun 2016, at 15:52, Yi Chen <y...@symphonycommerce.com> wrote: > > This is awesome Eno! Would you mind sharing the JIRA ticket if you have one? > > On Sun, Jun 19, 2016 at 12:07 PM, Eno Thereska <eno.there..

Re: Streams RocksDB State Store Disk Usage

2016-06-29 Thread Eno Thereska
Hi Avi, These are internal files to RocksDB. Depending on your load in the system I suppose they could contain quite a bit of data. How large was the load in the system these past two weeks so we can calibrate? Otherwise I'm not sure if 1-2GB is a lot or not (sounds like not that big to make

Re: Kafka Streams delivery semantics and state store

2017-02-02 Thread Eno Thereska
Hi Krzysztof, There are several scenarios where you want a set of records to be sent atomically (to a statestore, downstream topics etc). In case of failure then, either all of them commit successfully, or none does. We are working to add exactly-once processing to Kafka Streams and I suspect

Re: [DISCUSS] KIP-118: Drop Support for Java 7 in Kafka 0.11

2017-02-03 Thread Eno Thereska
Makes sense. Eno > On 3 Feb 2017, at 10:38, Ismael Juma wrote: > > Hi all, > > I have posted a KIP for dropping support for Java 7 in Kafka 0.11: > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-118%3A+Drop+Support+for+Java+7+in+Kafka+0.11 > > Most people were

Re: [DISCUSS] KIP-114: KTable materialization and improved semantics

2017-02-01 Thread Eno Thereska
atthias > > > On 1/30/17 3:50 AM, Jan Filipiak wrote: >> Hi Eno, >> >> thanks for putting into different points. I want to put a few remarks >> inline. >> >> Best Jan >> >> On 30.01.2017 12:19, Eno Thereska wrote: >>> So I

Re: KIP-121 [Discuss]: Add KStream peek method

2017-02-07 Thread Eno Thereska
Hi, I like the proposal, thank you. I have found it frustrating myself not to be able to understand simple things, like how many records have been processed so far. The peek method would allow those kinds of diagnostics and debugging. Gwen, it is possible to do this with the existing

Re: KIP-121 [Discuss]: Add KStream peek method

2017-02-08 Thread Eno Thereska
hich is supposed to do exactly that). >>>>> >>>>> For reference, here are the descriptions of peek, map, foreach in Java >>> 8. >>>>> I could have also included links to StackOverflow questions where people >>>>> were confused abo

Re: KTable and cleanup.policy=compact

2017-02-08 Thread Eno Thereska
If you fail to set the policy to compact, there shouldn't be any correctness implications, however your topics will grow larger than necessary. Eno > On 8 Feb 2017, at 18:56, Jon Yeargers wrote: > > What are the ramifications of failing to do this? > > On Tue, Feb
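For completeness, a sketch of creating a compacted topic up front with the stock kafka-topics tool of that era (ZooKeeper address, topic name, and counts are placeholders):

```shell
# Create a topic whose old entries per key are compacted away
# rather than growing without bound.
bin/kafka-topics.sh --zookeeper localhost:2181 --create \
  --topic my-ktable-topic --partitions 4 --replication-factor 2 \
  --config cleanup.policy=compact
```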

Re: KTable and cleanup.policy=compact

2017-02-08 Thread Eno Thereska
cleanup.policy of delete would mean that topic entries past the retention >> policy might have been removed. If you scale up the application, new >> application instances won't be able to restore a complete table into its >> local state store. An operation like a join against that K

Re: At Least Once semantics for Kafka Streams

2017-02-05 Thread Eno Thereska
Hi Mahendra, That is a good question. Streams uses consumers and that config applies to consumers. However, in streams we always set enable.auto.commit to false, and manage commits using the other commit parameter. That way streams has more control on when offsets are committed. Eno > On 6

Re: At Least Once semantics for Kafka Streams

2017-02-06 Thread Eno Thereska
Oh, by "other" I meant the original one you started discussing: COMMIT_INTERVAL_MS_CONFIG. Eno > On 6 Feb 2017, at 09:28, Mahendra Kariya <mahendra.kar...@go-jek.com> wrote: > > Thanks Eno! > > I am just wondering what is this other commit parameter? > &

Fwd: [DISCUSS] KIP-114: KTable materialization and improved semantics

2017-01-26 Thread Eno Thereska
always materialized, and so are the >>>>>> joining KTables to always be materialized. >>>>>> d) KTable.filter/mapValues resulted KTables materialization depend on >>> its >>>>>> parent's materialization; >>>>>> >>

Re: [DISCUSS] KIP-114: KTable materialization and improved semantics

2017-01-30 Thread Eno Thereska
em would be implementing it > like this. > Yes I am aware that the current IQ API will not support querying by > KTableProcessorName instread of statestoreName. But I think that had to > change if you want it to be intuitive > IMO you gotta apply the filter read time > > Looking f

Re: Cannot access Kafka Streams JMX metrics using jmxterm

2017-01-30 Thread Eno Thereska
Hi Jendrik, I haven't tried jmxterm. Can you confirm if it is able to access the Kafka producer/consumer metrics (they exist since Kafka Streams internally uses Kafka)? I've personally used jconsole to look at the collected streams metrics, but that might be limited for your needs. Thanks

Re: [DISCUSS] KIP-114: KTable materialization and improved semantics

2017-01-30 Thread Eno Thereska
n basically uses the same mechanism that is currently used. >> From my point of view this is the least confusing way for DSL users. If >> its to tricky to get a hand on the streams instance one could ask the user >> to pass it in before executing queries, therefore making sure the str

Re: [DISCUSS] KIP-114: KTable materialization and improved semantics

2017-01-28 Thread Eno Thereska
rent question is what e.g. Jan Filipiak >>>>> mentioned earlier in this thread: >>>>> I think we need to explain more clearly why KIP-114 doesn't propose the >>>>> seemingly simpler solution of always materializing tables/state stores. >>>>&

Re: [DISCUSS] KIP-114: KTable materialization and improved semantics

2017-01-28 Thread Eno Thereska
alueGetter of Filter it will apply the filter and should be completely >>> transparent as to if another processor or IQ is accessing it? How can this >>> new method help? >>> >>> I cannot see the reason for the additional materialize method being >>&g

Re: What are wall clock-driven alternatives to punctuate()?

2017-01-22 Thread Eno Thereska
Dmitry, You are right about punctuate being data driven. We are discussing options for changing that, but that doesn't help you now. Depending on what you're doing, one option would be to use the Interactive Query APIs to periodically query the state stores depending on your desired frequency

Re: Implementing a non-key in Kafka Streams using the Processor API

2017-02-21 Thread Eno Thereska
Hi Frank, As far as I know the design in that wiki has been superseded by the Global KTables design which is now coming in 0.10.2. Hence, the JIRAs that are mentioned there (like KAFKA-3705). There are some extensive comments in https://issues.apache.org/jira/browse/KAFKA-3705

Re: Consumer / Streams causes deletes in __consumer_offsets?

2017-02-22 Thread Eno Thereska
Hi Mathieu, It could be that the offset retention period has expired. See this: http://stackoverflow.com/questions/39131465/how-does-an-offset-expire-for-an-apache-kafka-consumer-group

Re: System went OutOfMemory when running Kafka-Streaming

2017-02-17 Thread Eno Thereska
Hi Deepak, Could you tell us a bit more on what your app is doing (e.g., aggregates, or joins)? Also which version of Kafka are you using? Have you changed any default configs? Thanks Eno > On 17 Feb 2017, at 05:30, Deepak Pengoria wrote: > > Hi, > I ran my

Re: Immutable Record with Kafka Stream

2017-02-24 Thread Eno Thereska
Hi Kohki, As you mentioned, this is expected behavior. However, if you are willing to tolerate some more latency, you can improve the chance that a message with the same key is overwritten by increasing the commit time. By default it is 30 seconds, but you can increase it:
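A minimal sketch of that override using only java.util.Properties and the literal config key; in a real application these properties would be handed to the KafkaStreams constructor:

```java
import java.util.Properties;

public class CommitIntervalConfig {
    public static Properties streamsProps() {
        Properties props = new Properties();
        // Default commit.interval.ms is 30000 (30 s); raising it gives the
        // record cache more time to overwrite entries with the same key
        // before anything is flushed downstream, at the cost of latency.
        props.put("commit.interval.ms", "60000");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(streamsProps().getProperty("commit.interval.ms"));
    }
}
```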

Re: KTable send old values API

2017-02-21 Thread Eno Thereska
Hi Dmitry, Could you tell us more on the exact API you'd like? Perhaps if others find it useful too we/you can do a KIP. Thanks Eno > On 21 Feb 2017, at 22:01, Dmitry Minkovsky wrote: > > At KAFKA-2984: ktable sends old values when required >

Re: Shutting down a Streams job

2017-02-09 Thread Eno Thereska
Hi Dmitry, Elias, You raise a valid point, and thanks for opening https://issues.apache.org/jira/browse/KAFKA-4748 Elias. We'll hopefully have some ideas to share soon. Eno > On 9 Feb 2017, at 16:54, Dmitry Minkovsky

Re: Table a KStream

2017-02-11 Thread Eno Thereska
Yes you are correct Elliot. If you look at it just from the point of view of what's in the topic, and how the topic is configured (e.g., with "compact") there is no difference. Thanks Eno > On 11 Feb 2017, at 17:56, Elliot Crosby-McCullough > wrote: > > For my own

Re: KTable Windowed state store name

2017-02-15 Thread Eno Thereska
Hi Shimi, Did you pass in a state store name for the windowed example? E.g., if you are doing "count" or another aggregate, you can pass in a desired state store name. Thanks Eno > On 15 Feb 2017, at 00:35, Shimi Kiviti wrote: > > Hello, > > I have a code that access a

Re: [VOTE] 0.10.2.0 RC1

2017-02-13 Thread Eno Thereska
+1 (non binding) Checked streams. Verified that stream tests work and examples from confluentinc/examples/kafka-streams work. Thanks Eno > On 10 Feb 2017, at 16:51, Ewen Cheslack-Postava wrote: > > Hello Kafka users, developers and client-developers, > > This is RC1 for

Re: ProcessorContext commit question

2017-02-09 Thread Eno Thereska
Hi Adrian, It's also done in the DSL, but at a different point, in Task.commit(), since the flow is slightly different. Yes, once data is stored in stores, the offsets should be committed, so in case of a crash the same offsets are not processed again. Thanks Eno > On 9 Feb 2017, at 16:06,

Re: ProcessorContext commit question

2017-02-09 Thread Eno Thereska
> Thanks Eno that makes sense. > > If then this is an implementation of Transformer which is in a DSL topology > with DSL sinks ie `to()`, is the commit surplus to requirement? I suspect it > will do no harm at the very least. > > Thanks > Adrian > > -Original Message--

Re: KIP-121 [VOTE]: Add KStream peek method

2017-02-15 Thread Eno Thereska
KIP is accepted, discussion now moves to PR. Thanks Eno On Wed, Feb 15, 2017 at 12:28 PM, Steven Schlansker < sschlans...@opentable.com> wrote: > Oops, sorry, a number of votes were sent only to -dev and not to > -user and so I missed those in the email I just sent. The actual count is > more

Re: KTable and cleanup.policy=compact

2017-02-13 Thread Eno Thereska
ta flows.. ppl are brought closer together.. there is > peace in the valley.. for me... ) > ... > > KafkaStreams = new KafkaStream(KStreamBuilder, > config_with_cleanup_policy_or_not?) > KafkaStream.start > > On Wed, Feb 8, 2017 at 12:30 PM, Eno Thereska <eno.there...@gmail.com> >

Re: Kafka Streams: got bit by WindowedSerializer (only window.start is serialized)

2017-01-17 Thread Eno Thereska
focus.io>: > >> Hi Eno, >> I thought it would be impossible to put this in Kafka because of backward >> incompatibility with the existing windowed keys, no ? >> In my case, I had to recreate a new output topic, reset the topology, and >> and reprocess all my data. >&g

Re: Kafka Streams: got bit by WindowedSerializer (only window.start is serialized)

2017-01-16 Thread Eno Thereska
Nicolas, I'm checking with Bill who originally was interested in KAFKA-4468. If he isn't actively working on it, why don't you give it a go and create a pull request (PR) for it? That way your contribution is properly acknowledged etc. We can help you through with that. Thanks Eno > On 16 Jan

Re: Streams: Global state & topic multiplication questions

2017-01-19 Thread Eno Thereska
Hi Peter, About Q1: The DSL has the "branch" API, where one stream is branched to several streams, based on a predicate. I think that could help. About Q2: I'm not entirely sure I understand the problem space. What is the definition of a "full image"? Thanks Eno > On 19 Jan 2017, at 12:07,

Re: Streams: Global state & topic multiplication questions

2017-01-19 Thread Eno Thereska
etails of > other images (99% of records are not interesting for the specific user). > > Thanks, > > Peter > > > > On Thu, Jan 19, 2017 at 4:55 PM, Eno Thereska <eno.there...@gmail.com> > wrote: > >> Hi Peter, >> >> About Q1: The DSL has the

Re: Stream topology with multiple Kaka clusters

2017-02-27 Thread Eno Thereska
Hi Mahendra, The short answer is "not yet", but see this link for more: https://groups.google.com/forum/?pli=1#!msg/confluent-platform/LC88ijQaEMM/sa96OfK9AgAJ;context-place=forum/confluent-platform Thanks Eno > On 27 Feb 2017, at 13:37, Mahendra Kariya wrote: > >

Re: Large state directory with Kafka Streams

2017-02-27 Thread Eno Thereska
Hi Ian, It looks like you have a lot of partitions for the count store. Each RocksDb database uses off heap memory (around 60-70MB in 0.10.2) which will add up if you have these many stores in one instance. One solution would be to scale out your streams application by using another Kafka

Re: kafka streams locking issue in 0.10.20.0

2017-02-26 Thread Eno Thereska
Hi Ara, There are some instructions here for monitoring whether RocksDb is having its writes stalled: https://github.com/facebook/rocksdb/wiki/Write-Stalls , together with some instructions for tuning. Could you share RocksDb's LOG file?

Re: Lock Exception - Failed to lock the state directory

2017-02-26 Thread Eno Thereska
Hi Dan, Just checking on the version: you are using 0.10.0.2 or 0.10.2.0 (i.e., latest)? I ask because state locking was something that was fixed in 0.10.2. Thanks Eno > On 26 Feb 2017, at 13:37, Dan Ofir wrote: > > Hi, > > I encountered an exception while using

Re: Large state directory with Kafka Streams

2017-02-27 Thread Eno Thereska
hrottle the amount of topics a stream subscribes > to? > > I see https://issues.apache.org/jira/browse/KAFKA-3775 which seems to be > along these lines. > > Thanks again, > Ian. > > On 27 February 2017 at 15:41, Eno Thereska <eno.there...@gmail.com> wrote: > &

Re: Large state directory with Kafka Streams

2017-02-27 Thread Eno Thereska
nt since the 0.10.1.1 client. > Any specifics I can include within the JIRA ticket to help with getting to > the bottom of it? I can't seem to pinpoint a specific trigger for them. > Sometimes the application can run for hours/days before hitting them. > > On 27 February 2017 at 14:17, En

Re: Partitioning at the edges

2016-09-03 Thread Eno Thereska
Hi Andy, Could you share a bit more info or pseudocode so that we can understand the scenario a bit better? Especially around the streams at the edges. How are they created and what is the join meant to do? Thanks Eno > On 3 Sep 2016, at 02:43, Andy Chambers wrote:

Re: Partitioning at the edges

2016-09-06 Thread Eno Thereska
an is to just use a single partition for most topics but then "wrap" the ledger system with a process that re-partitions its input as necessary for scale. Cheers, Andy On Sat, Sep 3, 2016 at 6:13 AM, Eno Thereska <eno.there...@gmail.com> wrote: Hi Andy, Could you share a

Re: Performance issue with KafkaStreams

2016-09-09 Thread Eno Thereska
Hi Caleb, Could you share your Kafka Streams configuration (i.e., StreamsConfig properties you might have set before the test)? Thanks Eno On Thu, Sep 8, 2016 at 12:46 AM, Caleb Welton wrote: > I have a question with respect to the KafkaStreams API. > > I noticed during my

Re: Performance issue with KafkaStreams

2016-09-10 Thread Eno Thereska
ion.id=test-prototype > > group.id=test-consumer-group > bootstrap.servers=broker1:9092,broker2:9092 > zookeeper.connect=zk1:2181 > replication.factor=2 > > auto.offset.reset=earliest > > > On Friday, September 9, 2016 8:48 AM, Eno Thereska <eno.there...@gmail.com> > wr

Re: Time of derived records in Kafka Streams

2016-09-10 Thread Eno Thereska
Hi Elias, Good question. The general answer is that each time a record is output, the timestamp is that of the current Kafka Streams task that processes it, so it's the internal Kafka Streams time. If the Kafka Streams task is processing records with event time, the timestamp at any point is

Re: Performance issue with KafkaStreams

2016-09-11 Thread Eno Thereska
gt; relationships between partitions and num.stream.threads num.consumer.fetchers > and other such parameters? On a single node setup with x partitions, what’s > the best way to make sure these partitions are consumed and processed by > kafka streams optimally? > > Ara. > >> On Se

Re: Kafka Streams windowed aggregation

2016-10-05 Thread Eno Thereska
Hi Davood, The behaviour is indeed as you say. Recently we checked in KIP-63 in trunk (it will be part of the 0.10.1 release coming up). That should reduce the amount of downstream traffic you see

Re: Understanding how joins work in Kafka streams

2016-10-10 Thread Eno Thereska
nderstanding is correct. I suppose it will be > called every time a key is modified or it buffers the changes and calls it > once in a given time span. > > Thanks > Sachin > > > > On Sun, Oct 9, 2016 at 3:07 PM, Eno Thereska <eno.there...@gmail.com> wrote: > >

Re: Tuning for high RAM and 10GBe

2016-10-09 Thread Eno Thereska
Hi Chris, A couple of things: looks like you're primarily testing the SSD if you're running on the localhost, right? The 10GBe shouldn't matter in this case. The performance will depend a lot on the record sizes you are using. To get a ballpark number it would help to know the SSD type, would

Re: Time of derived records in Kafka Streams

2016-09-19 Thread Eno Thereska
.lucid...@gmail.com> wrote: > > On Sat, Sep 10, 2016 at 9:17 AM, Eno Thereska <eno.there...@gmail.com> > wrote: > >> >> For aggregations, the timestamp will be that of the latest record being >> aggregated. >> > > How does that account for ou

Re: Performance issue with KafkaStreams

2016-09-17 Thread Eno Thereska
test/java/org/apache/kafka/streams/perf/SimpleBenchmark.java > > bash$ find . -name SimpleBenchmark.class > > bash$ jar -tf streams/build/libs/kafka-streams-0.10.1.0-SNAPSHOT.jar | grep > SimpleBenchmark > > > > On Sat, Sep 10, 2016 at 6:18 AM, Eno Thereska <eno.there...@gmail

Re: Kafka streams

2016-08-25 Thread Eno Thereska
DAG has three stream processors, > first of which is source, would all of them run in single thread? There may > be scenarios wherein I want to have different number of threads for > different processors since some may be CPU bound and some may be IO bound. > > On Thu, Aug 25, 2016

Re: Kafka streams

2016-08-25 Thread Eno Thereska
Hi Abhishek, - Correct on connecting to external stores. You can use Kafka Connect to get things in or out. (Note that in the 0.10.1 release KIP-67 allows you to directly query Kafka Stream's stores so, for some kind of data you don't need to move it to an external store. This is pushed in

Re: Handling out-of-order messaging w/ Kafka Streams

2016-09-27 Thread Eno Thereska
Hi Mathieu, If the messages are sent asynchronously, then what you're observing is indeed right. There is no guarantee that the first will arrive at the destination first. Perhaps you can try sending them synchronously (i.e., wait until the first one is received, before sending the second).

Re: Kafka streaming changelog topic max.message.bytes exception

2016-11-08 Thread Eno Thereska
Hi Sachin, Could you clarify what you mean by "message size increases"? Are messages going to the changelog topic increasing in size? Or is the changelog topic getting full? Thanks Eno > On 8 Nov 2016, at 16:49, Sachin Mittal wrote: > > Hi, > We are using aggregation by

Re: Kafka streaming changelog topic max.message.bytes exception

2016-11-08 Thread Eno Thereska
x.message.bytes. >>> Since this is an internal topic based off a table I don't know how can I >>> control this topic. >>> >>> If I can set some retention.ms for this topic then I can purge old >>> messages >>> thereby ensuring that message size stay

Re: Understanding the topology of high level kafka stream

2016-11-09 Thread Eno Thereska
Hi Sachin, Kafka Streams is built on top of standard Kafka consumers. For every topic it consumes from (whether changelog topic or source topic, it doesn't matter), the consumer stores the offset it last consumed from. Upon restart, by default it starts consuming from where it left off from

Re: How to block tests of Kafka Streams until messages processed?

2016-10-19 Thread Eno Thereska
Wanted to add that there is nothing too special about these utility functions, they are built using a normal consumer. Eno > On 19 Oct 2016, at 21:59, Eno Thereska <eno.there...@gmail.com> wrote: > > Hi Ali, > > Any chance you could recycle some of the code we have in &g

Re: How to block tests of Kafka Streams until messages processed?

2016-10-19 Thread Eno Thereska
Hi Ali, Any chance you could recycle some of the code we have in streams/src/test/java/.../streams/integration/utils? (I know we don't have it easily accessible in Maven, for now perhaps you could copy to your directory?) For example there is a method there

Re: Embedded Kafka Cluster - Maven artifact?

2016-10-19 Thread Eno Thereska
I'm afraid we haven't released this as a maven artefact yet :( Eno > On 18 Oct 2016, at 13:22, Ali Akhtar wrote: > > Is there a maven artifact that can be used to create instances > of EmbeddedSingleNodeKafkaCluster for unit / integration tests?

Re: KafkaStreams KTable#through not creating changelog topic

2016-11-22 Thread Eno Thereska
Hi Mikael, 1) The JavaDoc looks incorrect, thanks for reporting. Matthias is looking into fixing it. I agree that it can be confusing to have topic names that are not what one would expect. 2) If your goal is to query/read from the state stores, you can use Interactive Queries to do that (you

Re: KafkaStreams KTable#through not creating changelog topic

2016-11-22 Thread Eno Thereska
Hi Mikael, If you perform an aggregate and thus create another KTable, that KTable will have a changelog topic (and a state store that you can query with Interactive Queries - but this is tangential). It is true that source KTables don't need to create another topic, since they already have

Re: Abnormal working in the method punctuate and error linked to session.timeout.ms

2016-11-28 Thread Eno Thereska
; So I can't understand why the application is doing so. > > > Hamza > > ____ > From: Eno Thereska <eno.there...@gmail.com> > Sent: Sunday 27 November 2016 23:21:24 > To: users@kafka.apache.org > Subject: Re: Abnormal working in the method punct

Re: Abnormal working in the method punctuate and error linked to session.timeout.ms

2016-11-28 Thread Eno Thereska
ferred to send the whole code to make it easier for you even though you don't need all of it From: Eno Thereska <eno.there...@gmail.com> Sent: Monday 28 November 2016 01:12:14 To: users@kafka.apache.org Subject: Re: Abnormal working in the method punctuate and error linked to session.timeout.ms

Re: Abnormal working in the method punctuate and error linked to session.timeout.ms

2016-11-28 Thread Eno Thereska
Hi Hamza, If you have an infinite while loop, that would mean the app would spend all the time in that loop and poll() would never be called. Eno > On 28 Nov 2016, at 10:49, Hamza HACHANI wrote: > > Hi, > > I've some troubles with the method punctuate. In fact when

Re: Kafka Streaming message loss

2016-11-18 Thread Eno Thereska
Hi Ryan, Perhaps you could share some of your code so we can have a look? One thing I'd check is if you are using compacted Kafka topics. If so, and if you have non-unique keys, compaction happens automatically and you might only see the latest value for a key. Thanks Eno > On 18 Nov 2016, at

Re: Change current offset with KStream

2016-11-18 Thread Eno Thereska
ohn Hayles <jhay...@etcc.com> wrote: > > Eno, thanks! > > In my case I think I need more functionality for fault tolerance, so will > look at KafkaConsumer. > > Thanks, > John > > -Original Message- > From: Eno Thereska [mailto:eno.there...@gmail.co

Re: Unit tests (in Java) take a *very* long time to 'clean up'?

2016-11-11 Thread Eno Thereska
Hi Ali, You're right, shutting down the broker and ZK is expensive. We kept the number of integration tests relatively small (and pushed some more tests as system tests, while doing as many as possible as unit tests). It's not just the shutdown that's expensive, it's also the starting up

Re: Unit tests (in Java) take a *very* long time to 'clean up'?

2016-11-11 Thread Eno Thereska
ou can point > out which class / method creates the processes / threads. > > Thanks. > > On Fri, Nov 11, 2016 at 9:24 PM, Eno Thereska <eno.there...@gmail.com> > wrote: > >> Hi Ali, >> >> You're right, shutting down the broker and ZK is expensive. We kep

Re: How to get and reset Kafka stream application's offsets

2016-11-21 Thread Eno Thereska
Hi Sachin, Currently you can only change the following global configuration by using "earliest" or "latest", as shown in the code snippet below. As the Javadoc mentions: "What to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g.
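The code snippet referenced above was cut off by the archive; as a stand-in, a minimal sketch with the literal config key (plain java.util.Properties, no Kafka classes needed):

```java
import java.util.Properties;

public class OffsetResetConfig {
    // This setting only takes effect when no committed offset exists for
    // the group, or when the committed offset no longer exists on the
    // server (e.g. the data was deleted).
    public static Properties withOffsetReset(String policy) {
        Properties props = new Properties();
        props.put("auto.offset.reset", policy); // "earliest" or "latest"
        return props;
    }

    public static void main(String[] args) {
        System.out.println(
            withOffsetReset("earliest").getProperty("auto.offset.reset"));
    }
}
```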

Re: windowing with the processor api

2016-11-02 Thread Eno Thereska
place where to define the >> size of the window and where to precise the time of the advance. >> >> >> Hamza >> >> Thanks >> >> De : Eno Thereska >> <eno.there...@gmail.com> Envoyé : mardi 1 novembre 2016 22:44:47 À

Re: Modify in-flight messages

2016-11-02 Thread Eno Thereska
Hi Dominik, Not sure if this is 100% relevant, but since I noticed you saying that you are benchmarking stream processing engines, one way to modify a message would be to use the Kafka Streams library, where you consume a message from a topic, modify it as needed/do some processing, and then

Re: Kafka Streams fails permanently when used with an unstable network

2016-11-02 Thread Eno Thereska
Hi Sai, For your second note on rebalancing taking a long time, we have just improved the situation in trunk after fixing this JIRA: https://issues.apache.org/jira/browse/KAFKA-3559 . Feel free to give it a go if rebalancing time continues to

Re: windowing with the processor api

2016-11-02 Thread Eno Thereska
Hi Hamza, Are you getting a particular error? Here is an example: Stores.create("window-store").withStringKeys().withStringValues().persistent().windowed(10, 10, 2, false).build(),

Re: Kafka multitenancy

2016-10-26 Thread Eno Thereska
Hi Mudit, This is a relevant read: http://www.confluent.io/blog/sharing-is-caring-multi-tenancy-in-distributed-data-systems It talks about two aspects of multi tenancy in Kafka, security and performance

Re: Tuning for high RAM and 10GBe

2016-10-11 Thread Eno Thereska
Sounds good that you got up to 500MB/s. At that point I suspect you reach a sort of steady state where the cache is continuously flushing to the SSDs, so you are effectively bottlenecked by the SSD. I believe this is as expected (the bottleneck resource will dominate the end to end throughput

Re: JVM crash when closing persistent store (rocksDB)

2016-10-11 Thread Eno Thereska
Hi Pierre, I tried the exact code on macOS and am not getting any errors. Could you check if all the directories in /tmp where Kafka Streams writes the RocksDB files are empty? I'm wondering if there is some bad state left over. Finally, it looks like you are running 0.10.0; could you try running
