Re: In kafka streams consumer seems to hang while retrieving the offsets

2017-04-09 Thread Eno Thereska
Hi Sachin, It's not necessarily a deadlock. Do you have any debug traces from those nodes? Also would be useful to know the config (e.g., how many partitions do you have and how many app instances.) Thanks Eno > On 9 Apr 2017, at 04:45, Sachin Mittal wrote: > > Hi, > In

Re: Leader not available error after kafka node goes down

2017-04-08 Thread Eno Thereska
Hi Ali, Try changing the default value for the streams producer retries to something large, since the default is 0 (which means that if a broker is temporarily down, streams would give that error), e.g.: final Properties props = new Properties();
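A minimal sketch of the override described above, assuming the producer setting is passed straight through the Streams Properties; the application id, servers, and retry count below are illustrative, not from the original thread:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.streams.StreamsConfig;

    final Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");     // hypothetical app id
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    // Raise the internal producer's retries so a transient "leader not available"
    // error is retried rather than surfacing as a failure (the 0.10.x default is 0).
    props.put(ProducerConfig.RETRIES_CONFIG, 10);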

Re: Kafka Streams: Is it possible to set heap size for the JVMs running Kafka Streams?

2017-04-06 Thread Eno Thereska
Hi Tianji, Yes, with some caveats. First, to set the heap size of the JVM, you can treat it just like any other Java program, e.g., see http://stackoverflow.com/questions/27681511/how-do-i-set-the-java-options-for-kafka
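As a rough illustration (the jar and class names are hypothetical), the heap is set with the usual JVM flags when launching the application:

    java -Xms512m -Xmx4g -cp my-streams-app.jar com.example.MyStreamsApp

Note that RocksDB state stores allocate memory off-heap, so -Xmx bounds only the Java heap, not the total memory footprint of the process.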

Re: compaction not happening for change log topic

2017-04-03 Thread Eno Thereska
Hi Sachin, I’ve found this useful: http://stackoverflow.com/questions/34281643/how-to-test-whether-log-compaction-is-working-or-not-in-kafka, as well as this

Re: Kafka Streams: Is it possible to pause/resume consuming a topic?

2017-04-01 Thread Eno Thereska
Cheers Eno > On 1 Apr 2017, at 17:14, Tianji Li <skyah...@gmail.com> wrote: > > Hi Eno, > > Could you point to me where in code this is happening please? > > Thanks > Tianji > > On Sat, Apr 1, 2017 at 11:45 AM, Eno Thereska <eno.there...@gmail.com> >

Re: Kafka Streams: Is it possible to pause/resume consuming a topic?

2017-04-01 Thread Eno Thereska
Tianji, You shouldn’t have to worry about pausing and resuming the consumer, since that happens internally automatically. Eno > On Apr 1, 2017, at 3:26 PM, Tianji Li wrote: > > Hi there, > > Say a processor that is consuming topic A and producing into topic B, and >

Re: Kafka streams 0.10.2 Producer throwing exception eventually causing streams shutdown

2017-04-01 Thread Eno Thereska
passed since last > append >> >> So any idea as why TimeoutException is happening. >> Is this controlled by >> ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG >> >> If yes >> What should the value be set in this given that out consumer >> max.poll.interva

Re: ThoughWorks Tech Radar: Assess Kafka Streams

2017-03-29 Thread Eno Thereska
Thanks for the heads up Jan! Eno > On 29 Mar 2017, at 19:08, Jan Filipiak wrote: > > Regardless of how usefull you find the tech radar. > > Well deserved! even though we all here agree that trial or adopt is in reach > >

Re: 3 machines 12 threads all fail within 2 hours of starting the streams application

2017-03-27 Thread Eno Thereska
> exception, or streams application itself doing a better job in handling > such and not shutting down the threads. > > Thanks > Sachin > > > On Sat, Mar 25, 2017 at 11:03 PM, Eno Thereska <eno.there...@gmail.com> > wrote: > >> Hi Sachin, >> >

Re: Kafka streams 0.10.2 Producer throwing exception eventually causing streams shutdown

2017-03-25 Thread Eno Thereska
Hi Sachin, Not in this case. Thanks Eno > On Mar 25, 2017, at 6:19 PM, Sachin Mittal <sjmit...@gmail.com> wrote: > > OK. > I will try this out. > > Do I need to change anything for > max.in.flight.requests.per.connection > > Thanks > Sachin > >

Re: Kafka streams 0.10.2 Producer throwing exception eventually causing streams shutdown

2017-03-25 Thread Eno Thereska
Hi Sachin, For this particular error, “org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.”, could you try setting the number of retries to something large like this: Properties props = new Properties();

Re: YASSQ (yet another state store question)

2017-03-24 Thread Eno Thereska
Hi Jon, This is expected, see this: https://groups.google.com/forum/?pli=1#!searchin/confluent-platform/migrated$20to$20another$20instance%7Csort:relevance/confluent-platform/LglWC_dZDKw/qsPuCRT_DQAJ

Re: Benchmarking streaming frameworks

2017-03-23 Thread Eno Thereska
Hi Giselle, Great idea! In Kafka Streams we have a few micro-benchmarks we run nightly. They are at: https://github.com/apache/kafka/blob/trunk/streams/src/test/java/org/apache/kafka/streams/perf/SimpleBenchmark.java

Re: Commits of slow moving topics in Kafka Streams time out

2017-03-21 Thread Eno Thereska
Hi Frank, There is a similar discussion here https://www.mail-archive.com/users@kafka.apache.org/msg25089.html , with a JIRA. Have you tried to increase the retention minutes to something large? Thanks Eno > On 21 Mar 2017,

Re: Kafka Streams: lockException

2017-03-17 Thread Eno Thereska
150, 50, 20 threads and the probabilities of crashing decreased with >> this number. When using 1 thread, no crash! >> >> My max.poll.interval is 5 minutes and all the processing won't last that >> long, so that parameter does not help. >> >> >> Thanks &g

Re: Kafka Streams: lockException

2017-03-16 Thread Eno Thereska
Hi Tianji, How many threads does your app use? One reason is explained here: https://groups.google.com/forum/#!topic/confluent-platform/wgCSuwIJo5g , you might want to increase max.poll.interval config value. If that
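A sketch of the settings discussed there, assuming the consumer config is passed through the Streams Properties; the values are illustrative:

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.streams.StreamsConfig;

    final Properties props = new Properties();
    // Fewer threads per instance reduces contention on the state directory locks.
    props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);
    // Allow more time between poll() calls before the consumer is considered dead.
    props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 10 * 60 * 1000);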

Re: Kafka Stream: RocksDBKeyValueStoreSupplier performance

2017-03-15 Thread Eno Thereska
lly, make sure to use 0.10.2, the latest release. Thanks Eno > On 15 Mar 2017, at 18:14, Tianji Li <skyah...@gmail.com> wrote: > > Hi Eno, > > Rocksdb without caching took around 7 minutes. > > Tianji > > > On Wed, Mar 15, 2017 at 9:40 AM, Eno Thereska

Re: Trying to use Kafka Stream

2017-03-15 Thread Eno Thereska
p-zookeeper:3.2.0: zookeeper >>>>> >>>>> I used example @ https://github.com/confluent >>>>> inc/examples/blob/3.2.x/kafka-streams/src/main/java/io/confl >>>>> uent/examples/streams/WordCountLambdaExample.java#L178-L181 and >>>>> f

Re: Trying to use Kafka Stream

2017-03-14 Thread Eno Thereska
Hi there, I noticed in your example that you are using localhost:9092 to produce but localhost:29092 to consume? Or is that a typo? Is zookeeper, kafka, and the Kafka Streams app all running within one docker container, or in different containers? I just tested the WordCountLambdaExample and

Re: Capacity planning for Kafka Streams

2017-03-13 Thread Eno Thereska
Thanks, A couple of things: - I’d recommend moving to 0.10.2 (latest release) if you can since several improvements were made in the last two releases that make rebalancing and performance better. - When running on environments with large latency on AWS at least (haven’t tried Google cloud),

Re: Capacity planning for Kafka Streams

2017-03-13 Thread Eno Thereska
Hi Mahenda, We are in the process of documenting capacity planning for streams, stay tuned. Could you send some more info on your problem? What Kafka version are you using? Are the VMs on the same or different hosts? Also what exactly do you mean by “the lag keeps fluctuating”, what metric are

Re: ~20gb of kafka-streams state, unexpected?

2017-03-10 Thread Eno Thereska
> Thanks for your input here. > > On 10 March 2017 at 10:59, Eno Thereska <eno.there...@gmail.com> wrote: > >> Hi Ian, >> >> Sounds like you have a total topic size of ~20GB (96 partitions x 200mb). >> If most keys are unique then gr

Re: ~20gb of kafka-streams state, unexpected?

2017-03-10 Thread Eno Thereska
Hi Ian, Sounds like you have a total topic size of ~20GB (96 partitions x 200mb). If most keys are unique then group and reduce might not be as effective in grouping/reducing. Can you comment on the key distribution? Are most keys unique? Or do you expect lots of keys to be the same in the

Re: Writing data from kafka-streams to remote database

2017-03-05 Thread Eno Thereska
Hi Shimi, Could you tell us more about your scenario? Kafka Streams uses embedded databases (RocksDb) to store its state, so often you don't need to write anything to an external database and you can query your streams state directly from streams. Have a look at this blog if that matches your

Re: Fixing two critical bugs in kafka streams

2017-03-05 Thread Eno Thereska
pull/2640 > > Thanks > Sachin > > > On Sun, Mar 5, 2017 at 1:49 PM, Eno Thereska <eno.there...@gmail.com> wrote: > >> Thanks Sachin for your contribution. Could you create a pull request out >> of the commit (so we can add comments, and also so you are ac

Re: Need some help in identifying some important metrics to monitor for streams

2017-03-05 Thread Eno Thereska
ew-part-advice"-"processID: a UUID to guarantee uniqueness across >>>> machines" as its clientId. >>>> >>>> >>>> As for metricsName, it is always set as "clientId + "-" + threadName" >>> where >>

Re: Fixing two critical bugs in kafka streams

2017-03-05 Thread Eno Thereska
Thanks Sachin for your contribution. Could you create a pull request out of the commit (so we can add comments, and also so you are acknowledged properly for your contribution)? Thanks Eno > On 5 Mar 2017, at 07:34, Sachin Mittal wrote: > > Hi, > So far in our experiment

Re: Need some help in identifying some important metrics to monitor for streams

2017-03-03 Thread Eno Thereska
Lastly can we get some help on names like new-part-advice-d1094e71-0f59- > 45e8-98f4-477f9444aa91-StreamThread-1 and have more standard name of thread > like new-advice-1-StreamThread-1(as in version 10.1.1) so we can log these > metrics as part of out cron jobs. > > Thanks > Sac

Re: Need some help in identifying some important metrics to monitor for streams

2017-03-02 Thread Eno Thereska
Hi Sachin, The new streams metrics are now documented at https://kafka.apache.org/documentation/#kafka_streams_monitoring . Note that not all of them are turned on by default. We have several benchmarks that run nightly to
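On the "not all of them are turned on by default" point: in 0.10.2 the finer-grained streams sensors are gated behind the metrics recording level, which can be raised like this (a sketch; the key and the INFO/DEBUG values are the standard ones, the rest of the app config is omitted):

    final Properties props = new Properties();
    props.put("metrics.recording.level", "DEBUG");   // default is INFO; DEBUG enables the per-task/store sensors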

Re: Large state directory with Kafka Streams

2017-02-27 Thread Eno Thereska
hrottle the amount of topics a stream subscribes > to? > > I see https://issues.apache.org/jira/browse/KAFKA-3775 which seems to be > along these lines. > > Thanks again, > Ian. > > On 27 February 2017 at 15:41, Eno Thereska <eno.there...@gmail.com> wrote: > &

Re: Large state directory with Kafka Streams

2017-02-27 Thread Eno Thereska
nt since the 0.10.1.1 client. > Any specifics I can include within the JIRA ticket to help with getting to > the bottom of it? I can't seem to pinpoint a specific trigger for them. > Sometimes the application can run for hours/days before hitting them. > > On 27 February 2017 at 14:17, En

Re: Large state directory with Kafka Streams

2017-02-27 Thread Eno Thereska
Hi Ian, It looks like you have a lot of partitions for the count store. Each RocksDb database uses off heap memory (around 60-70MB in 0.10.2) which will add up if you have these many stores in one instance. One solution would be to scale out your streams application by using another Kafka

Re: Stream topology with multiple Kaka clusters

2017-02-27 Thread Eno Thereska
Hi Mahendra, The short answer is "not yet", but see this link for more: https://groups.google.com/forum/?pli=1#!msg/confluent-platform/LC88ijQaEMM/sa96OfK9AgAJ;context-place=forum/confluent-platform Thanks Eno > On 27 Feb 2017, at 13:37, Mahendra Kariya wrote: > >

Re: Lock Exception - Failed to lock the state directory

2017-02-26 Thread Eno Thereska
Hi Dan, Just checking on the version: you are using 0.10.0.2 or 0.10.2.0 (i.e., latest)? I ask because state locking was something that was fixed in 0.10.2. Thanks Eno > On 26 Feb 2017, at 13:37, Dan Ofir wrote: > > Hi, > > I encountered an exception while using

Re: kafka streams locking issue in 0.10.20.0

2017-02-26 Thread Eno Thereska
Hi Ara, There are some instructions here for monitoring whether RocksDb is having its writes stalled: https://github.com/facebook/rocksdb/wiki/Write-Stalls , together with some instructions for tuning. Could you share RocksDb's LOG file?

Re: Immutable Record with Kafka Stream

2017-02-24 Thread Eno Thereska
Hi Kohki, As you mentioned, this is expected behavior. However, if you are willing to tolerate some more latency, you can improve the chance that a message with the same key is overwritten by increasing the commit time. By default it is 30 seconds, but you can increase it:
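A sketch of that change; the value is illustrative:

    final Properties props = new Properties();
    // Default is 30000 ms (30 seconds); a larger value lets the record cache
    // overwrite more same-key updates before they are flushed downstream.
    props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 60 * 1000);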

Re: Consumer / Streams causes deletes in __consumer_offsets?

2017-02-22 Thread Eno Thereska
Hi Mathieu, It could be that the offset retention period has expired. See this: http://stackoverflow.com/questions/39131465/how-does-an-offset-expire-for-an-apache-kafka-consumer-group

Re: KTable send old values API

2017-02-21 Thread Eno Thereska
Hi Dmitry, Could you tell us more on the exact API you'd like? Perhaps if others find it useful too we/you can do a KIP. Thanks Eno > On 21 Feb 2017, at 22:01, Dmitry Minkovsky wrote: > > At KAFKA-2984: ktable sends old values when required >

Re: Implementing a non-key in Kafka Streams using the Processor API

2017-02-21 Thread Eno Thereska
Hi Frank, As far as I know the design in that wiki has been superseded by the Global KTables design which is now coming in 0.10.2. Hence, the JIRAs that are mentioned there (like KAFKA-3705). There are some extensive comments in https://issues.apache.org/jira/browse/KAFKA-3705

Re: System went OutOfMemory when running Kafka-Streaming

2017-02-17 Thread Eno Thereska
Hi Deepak, Could you tell us a bit more on what your app is doing (e.g., aggregates, or joins)? Also which version of Kafka are you using? Have you changed any default configs? Thanks Eno > On 17 Feb 2017, at 05:30, Deepak Pengoria wrote: > > Hi, > I ran my

Re: KIP-121 [VOTE]: Add KStream peek method

2017-02-15 Thread Eno Thereska
KIP is accepted, discussion now moves to PR. Thanks Eno On Wed, Feb 15, 2017 at 12:28 PM, Steven Schlansker < sschlans...@opentable.com> wrote: > Oops, sorry, a number of votes were sent only to -dev and not to > -user and so I missed those in the email I just sent. The actual count is > more

Re: KTable Windowed state store name

2017-02-15 Thread Eno Thereska
Hi Shimi, Did you pass in a state store name for the windowed example? E.g., if you are doing "count" or another aggregate, you can pass in a desired state store name. Thanks Eno > On 15 Feb 2017, at 00:35, Shimi Kiviti wrote: > > Hello, > > I have a code that access a
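For reference, a sketch of naming the store on a windowed aggregate so it can be found later through Interactive Queries; the topic, store name, and window size are illustrative, and String default serdes are assumed:

    KStream<String, String> textLines = builder.stream("input-topic");   // builder is a KStreamBuilder
    KTable<Windowed<String>, Long> counts = textLines
        .groupByKey()
        .count(TimeWindows.of(60 * 1000L), "my-windowed-counts-store");  // named, queryable store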

Re: KTable and cleanup.policy=compact

2017-02-13 Thread Eno Thereska
ta flows.. ppl are brought closer together.. there is > peace in the valley.. for me... ) > ... > > KafkaStreams = new KafkaStream(KStreamBuilder, > config_with_cleanup_policy_or_not?) > KafkaStream.start > > On Wed, Feb 8, 2017 at 12:30 PM, Eno Thereska <eno.there...@gmail.com> >

Re: [VOTE] 0.10.2.0 RC1

2017-02-13 Thread Eno Thereska
+1 (non binding) Checked streams. Verified that stream tests work and examples off confluentinc/examples/kafka-streams work. Thanks Eno > On 10 Feb 2017, at 16:51, Ewen Cheslack-Postava wrote: > > Hello Kafka users, developers and client-developers, > > This is RC1 for

Re: Table a KStream

2017-02-11 Thread Eno Thereska
Yes you are correct Elliot. If you look at it just from the point of view of what's in the topic, and how the topic is configured (e.g., with "compact") there is no difference. Thanks Eno > On 11 Feb 2017, at 17:56, Elliot Crosby-McCullough > wrote: > > For my own

Re: Shutting down a Streams job

2017-02-09 Thread Eno Thereska
Hi Dmitry, Elias, You raise a valid point, and thanks for opening https://issues.apache.org/jira/browse/KAFKA-4748 Elias. We'll hopefully have some ideas to share soon. Eno > On 9 Feb 2017, at 16:54, Dmitry Minkovsky

Re: ProcessorContext commit question

2017-02-09 Thread Eno Thereska
> Thanks Eno that makes sense. > > If then this is an implementation of Transformer which is in a DSL topology > with DSL sinks ie `to()`, is the commit surplus to requirement? I suspect it > will do no harm at the very least. > > Thanks > Adrian > > -Original Message--

Re: ProcessorContext commit question

2017-02-09 Thread Eno Thereska
Hi Adrian, It's also done in the DSL, but at a different point, in Task.commit(), since the flow is slightly different. Yes, once data is stored in stores, the offsets should be committed, so in case of a crash the same offsets are not processed again. Thanks Eno > On 9 Feb 2017, at 16:06,

Re: KTable and cleanup.policy=compact

2017-02-08 Thread Eno Thereska
cleanup.policy of delete would mean that topic entries past the retention >> policy might have been removed. If you scale up the application, new >> application instances won't be able to restore a complete table into its >> local state store. An operation like a join against that K

Re: KTable and cleanup.policy=compact

2017-02-08 Thread Eno Thereska
If you fail to set the policy to compact, there shouldn't be any correctness implications, however your topics will grow larger than necessary. Eno > On 8 Feb 2017, at 18:56, Jon Yeargers wrote: > > What are the ramifications of failing to do this? > > On Tue, Feb

Re: KIP-121 [Discuss]: Add KStream peek method

2017-02-08 Thread Eno Thereska
hich is supposed to do exactly that). >>>>> >>>>> For reference, here are the descriptions of peek, map, foreach in Java >>> 8. >>>>> I could have also included links to StackOverflow questions where people >>>>> were confused abo

Re: KIP-121 [Discuss]: Add KStream peek method

2017-02-07 Thread Eno Thereska
Hi, I like the proposal, thank you. I have found it frustrating myself not to be able to understand simple things, like how many records have been currently processed. The peek method would allow those kinds of diagnostics and debugging. Gwen, it is possible to do this with the existing

Re: At Least Once semantics for Kafka Streams

2017-02-06 Thread Eno Thereska
Oh, by "other" I meant the original one you started discussing: COMMIT_INTERVAL_MS_CONFIG. Eno > On 6 Feb 2017, at 09:28, Mahendra Kariya <mahendra.kar...@go-jek.com> wrote: > > Thanks Eno! > > I am just wondering what is this other commit parameter? > &

Re: At Least Once semantics for Kafka Streams

2017-02-05 Thread Eno Thereska
Hi Mahendra, That is a good question. Streams uses consumers and that config applies to consumers. However, in streams we always set enable.auto.commit to false, and manage commits using the other commit parameter. That way streams has more control on when offsets are committed. Eno > On 6

Re: [DISCUSS] KIP-118: Drop Support for Java 7 in Kafka 0.11

2017-02-03 Thread Eno Thereska
Makes sense. Eno > On 3 Feb 2017, at 10:38, Ismael Juma wrote: > > Hi all, > > I have posted a KIP for dropping support for Java 7 in Kafka 0.11: > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-118%3A+Drop+Support+for+Java+7+in+Kafka+0.11 > > Most people were

Re: Kafka Streams delivery semantics and state store

2017-02-02 Thread Eno Thereska
Hi Krzysztof, There are several scenarios where you want a set of records to be sent atomically (to a statestore, downstream topics etc). In case of failure then, either all of them commit successfully, or none does. We are working to add exactly-once processing to Kafka Streams and I suspect

Re: [DISCUSS] KIP-114: KTable materialization and improved semantics

2017-02-01 Thread Eno Thereska
atthias > > > On 1/30/17 3:50 AM, Jan Filipiak wrote: >> Hi Eno, >> >> thanks for putting into different points. I want to put a few remarks >> inline. >> >> Best Jan >> >> On 30.01.2017 12:19, Eno Thereska wrote: >>> So I

Re: Cannot access Kafka Streams JMX metrics using jmxterm

2017-01-30 Thread Eno Thereska
Hi Jendrik, I haven't tried jmxterm. Can you confirm if it is able to access the Kafka producer/consumer metrics (they exist since Kafka Streams internally uses Kafka)? I've personally used jconsole to look at the collected streams metrics, but that might be limited for your needs. Thanks

Re: [DISCUSS] KIP-114: KTable materialization and improved semantics

2017-01-30 Thread Eno Thereska
em would be implementing it > like this. > Yes I am aware that the current IQ API will not support querying by > KTableProcessorName instread of statestoreName. But I think that had to > change if you want it to be intuitive > IMO you gotta apply the filter read time > > Looking f

Re: [DISCUSS] KIP-114: KTable materialization and improved semantics

2017-01-30 Thread Eno Thereska
n basically uses the same mechanism that is currently used. >> From my point of view this is the least confusing way for DSL users. If >> its to tricky to get a hand on the streams instance one could ask the user >> to pass it in before executing queries, therefore making sure the str

Re: [DISCUSS] KIP-114: KTable materialization and improved semantics

2017-01-28 Thread Eno Thereska
alueGetter of Filter it will apply the filter and should be completely >>> transparent as to if another processor or IQ is accessing it? How can this >>> new method help? >>> >>> I cannot see the reason for the additional materialize method being >>&g

Re: [DISCUSS] KIP-114: KTable materialization and improved semantics

2017-01-28 Thread Eno Thereska
rent question is what e.g. Jan Filipiak >>>>> mentioned earlier in this thread: >>>>> I think we need to explain more clearly why KIP-114 doesn't propose the >>>>> seemingly simpler solution of always materializing tables/state stores. >>>>&

Fwd: [DISCUSS] KIP-114: KTable materialization and improved semantics

2017-01-26 Thread Eno Thereska
always materialized, and so are the >>>>>> joining KTables to always be materialized. >>>>>> d) KTable.filter/mapValues resulted KTables materialization depend on >>> its >>>>>> parent's materialization; >>>>>> >>

Re: What are wall clock-driven alternatives to punctuate()?

2017-01-22 Thread Eno Thereska
Dmitry, You are right about punctuate being data driven. We are discussing options for changing that, but that doesn't help you now. Depending on what you're doing, one option would be to use the Interactive Query APIs to periodically query the state stores depending on your desired frequency
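A rough sketch of that wall clock-driven alternative: a plain scheduler that periodically reads a key-value store through the Interactive Query API. The store name and key are hypothetical, and `streams` is assumed to be a running KafkaStreams instance:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import org.apache.kafka.streams.state.QueryableStoreTypes;
    import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleAtFixedRate(() -> {
        ReadOnlyKeyValueStore<String, Long> store =
            streams.store("counts-store", QueryableStoreTypes.<String, Long>keyValueStore());
        Long value = store.get("some-key");
        System.out.println("current value: " + value);   // act on the value every 30s of wall-clock time
    }, 0, 30, TimeUnit.SECONDS);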

Re: Streams: Global state & topic multiplication questions

2017-01-19 Thread Eno Thereska
etails of > other images (99% of records are not interesting for the specific user). > > Thanks, > > Peter > > > > On Thu, Jan 19, 2017 at 4:55 PM, Eno Thereska <eno.there...@gmail.com> > wrote: > >> Hi Peter, >> >> About Q1: The DSL has the

Re: Streams: Global state & topic multiplication questions

2017-01-19 Thread Eno Thereska
Hi Peter, About Q1: The DSL has the "branch" API, where one stream is branched to several streams, based on a predicate. I think that could help. About Q2: I'm not entirely sure I understand the problem space. What is the definition of a "full image"? Thanks Eno > On 19 Jan 2017, at 12:07,
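For Q1, a sketch of what "branch" looks like; the predicates and topic names are illustrative:

    KStream<String, String> images = builder.stream("images");   // builder is a KStreamBuilder

    KStream<String, String>[] branches = images.branch(
        (key, value) -> value.contains("interesting"),   // branch 0: records matching the predicate
        (key, value) -> true                             // branch 1: everything else
    );

    branches[0].to("interesting-images");
    branches[1].to("other-images");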

Re: Kafka Streams: got bit by WindowedSerializer (only window.start is serialized)

2017-01-17 Thread Eno Thereska
focus.io>: > >> Hi Eno, >> I thought it would be impossible to put this in Kafka because of backward >> incompatibility with the existing windowed keys, no ? >> In my case, I had to recreate a new output topic, reset the topology, and >> and reprocess all my data. >&g

Re: Kafka Streams: got bit by WindowedSerializer (only window.start is serialized)

2017-01-16 Thread Eno Thereska
Nicolas, I'm checking with Bill who originally was interested in KAFKA-4468. If he isn't actively working on it, why don't you give it a go and create a pull request (PR) for it? That way your contribution is properly acknowledged etc. We can help you through with that. Thanks Eno > On 16 Jan

Re: Kafka Streams: from a KStream, aggregating records with the same key and updated metrics ?

2017-01-13 Thread Eno Thereska
Hi Nicolas, There is a lot here, so let's try to split the concerns around some themes: 1. The Processor API is flexible and can definitely do what you want, but as you mentioned, at the cost of you having to manually craft the code. 2. Why are the versions used? I sense there is concern about

Re: Kafka Streams: consume 6 months old data VS windows maintain durations

2017-01-13 Thread Eno Thereska
and then I'll > restart the application with the original window duration time (without a > reset this time). Let's hope Kafka Streams will detect this window duration > change and drop old windows immediately ? > > > 2017-01-12 17:06 GMT+01:00 Eno Thereska <eno.there...@gmail.com>:

Re: Kafka Streams: consume 6 months old data VS windows maintain durations

2017-01-12 Thread Eno Thereska
Hi Nicolas, I've seen your previous message thread too. I think your best bet for now is to increase the window duration time to 6 months. If you change your application logic, e.g., by changing the duration time, the semantics of the change wouldn't immediately be clear and it's worth
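A sketch of what increasing the window maintain duration looks like in the DSL; the window size is illustrative, and until() controls how long old windows are retained:

    import java.util.concurrent.TimeUnit;
    import org.apache.kafka.streams.kstream.TimeWindows;

    TimeWindows.of(TimeUnit.DAYS.toMillis(1))        // window size: 1 day (illustrative)
               .until(TimeUnit.DAYS.toMillis(180));  // retain old windows for ~6 months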

Re: [ANNOUNCE] New committer: Grant Henke

2017-01-11 Thread Eno Thereska
Congrats! Eno > On 11 Jan 2017, at 20:06, Ben Stopford wrote: > > Congrats Grant!! > On Wed, 11 Jan 2017 at 20:01, Ismael Juma wrote: > >> Congratulations Grant, well deserved. :) >> >> Ismael >> >> On 11 Jan 2017 7:51 pm, "Gwen Shapira"

Re: effect of high IOWait on KStream app?

2016-12-17 Thread Eno Thereska
h > correlation id 1 : {joinSourceTopic2kTableKTable=LEADER_NOT_AVAILABLE} > (org.apache.kafka.clients.NetworkClient:709) > > Streams KTableKTable LeftJoin Performance [MB/s joined]: 6.530348031376133 > > On Sat, Dec 17, 2016 at 6:39 AM, Jon Yeargers <jon.yearg...@cedexis.c

Re: Restarting a failed kafka application

2016-12-17 Thread Eno Thereska
Hi Sachin, I think Matthias meant that you can change the compaction configuration parameters when you create the topic, but you're right, by default you shouldn't need to do anything since the topic will be eventually compacted automatically. Eno > On 17 Dec 2016, at 08:23, Sachin Mittal

Re: effect of high IOWait on KStream app?

2016-12-17 Thread Eno Thereska
Jon, It's hard to tell. Would you be willing to run a simple benchmark and report back the numbers? The benchmark is called SimpleBenchmark.java, it's included with the source, and it will start a couple of streams apps. It requires a ZK and a broker to be up. Then you run it:

Re: KTable#through from window store to key value store

2016-12-17 Thread Eno Thereska
s, > Mikael > > On Fri, Dec 16, 2016 at 6:20 PM Eno Thereska <eno.there...@gmail.com> wrote: > >> Hi Mikael, >> >> Currently that is not possible. Could you elaborate why you'd need that >> since you can query from tableOne. >> >> Thanks >&g

Re: Error in Kafka Streams (that shouldn't be there)

2016-12-16 Thread Eno Thereska
Hi Frank, Is this happening when you are shutting down the app, or is the app shutting down unexpectedly? The state transition error is not the main issue, it's largely a side effect of the unexpected transition so the first error is the cause probably. However, I can't tell what happened

Re: KTable#through from window store to key value store

2016-12-16 Thread Eno Thereska
Hi Mikael, Currently that is not possible. Could you elaborate why you'd need that since you can query from tableOne. Thanks Eno > On 16 Dec 2016, at 10:45, Mikael Högqvist wrote: > > Hi, > > I have a small example topology that count words per minute (scala): > >

Re: doing > 1 'parallel' operation on a stream

2016-12-12 Thread Eno Thereska
Hi there, Just pass your stream twice to the aggregator. No need to duplicate it or copy it etc. Eg. stream2 = stream1.aggregate(...) stream3 = stream1.aggregate(/* some different aggregation */) Thanks Eno > On 12 Dec 2016, at 11:09, Jon Yeargers wrote: > > If I
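A slightly fuller sketch of the same idea, assuming String keys/values with default serdes; the topic and store names are illustrative:

    KStream<String, String> source = builder.stream("input-topic");   // builder is a KStreamBuilder

    // Two independent aggregations driven by the same stream; no copying required.
    KTable<String, Long> counts = source.groupByKey().count("counts-store");
    KTable<String, String> latest = source.groupByKey().reduce((oldV, newV) -> newV, "latest-store");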

Re: accessing state-store ala WordCount example

2016-12-07 Thread Eno Thereska
@cedexis.com> wrote: > > Im having trouble finding documentation on this new feature. Can you point > me to anything? > > Specifically on how to get available "from/to" values but more generally on > how to use the "windowed" query. > > On Wed, Dec 7

Re: accessing state-store ala WordCount example

2016-12-07 Thread Eno Thereska
Hi Jon, This will be a windowed store. Have a look at the Jetty-server bits for windowedByKey: "/windowed/{storeName}/{key}/{from}/{to}" Thanks Eno > On 6 Dec 2016, at 23:33, Jon Yeargers wrote: > > I copied out some of the WordCountInteractive >
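Behind an endpoint like that, the handler typically does something along these lines (a sketch; the store name, key, and time range are illustrative, and `streams` is a running KafkaStreams instance):

    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.state.QueryableStoreTypes;
    import org.apache.kafka.streams.state.ReadOnlyWindowStore;
    import org.apache.kafka.streams.state.WindowStoreIterator;

    ReadOnlyWindowStore<String, Long> store =
        streams.store("word-counts", QueryableStoreTypes.<String, Long>windowStore());

    long to = System.currentTimeMillis();
    long from = to - 60 * 60 * 1000L;                  // last hour (illustrative)

    WindowStoreIterator<Long> it = store.fetch("some-word", from, to);
    while (it.hasNext()) {
        KeyValue<Long, Long> entry = it.next();        // entry.key is the window start timestamp
        System.out.println("window " + entry.key + " -> " + entry.value);
    }
    it.close();                                        // always close the iterator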

Re: Attempting to put a clean entry for key [...] into NamedCache [...] when it already contains a dirty entry for the same key

2016-12-03 Thread Eno Thereska
Hi Mathieu, What version of Kafka are you using? There was recently a fix that went into trunk, just checking if you're using an older version. (to make forward progress you can turn the cache off, like this: streamsConfiguration.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0); ) Thanks

Re: Is there a way to control pipeline flow to downstream

2016-12-01 Thread Eno Thereska
his option. > > Thanks > Sachin > > > On Thu, Dec 1, 2016 at 3:06 PM, Eno Thereska <eno.there...@gmail.com> wrote: > >> Hi Sachin, >> >> If you are using the DSL, currently there is no way to do fine-grained >> control of the downstream sending. The

Re: Is there a way to control pipeline flow to downstream

2016-12-01 Thread Eno Thereska
Hi Sachin, If you are using the DSL, currently there is no way to do fine-grained control of the downstream sending. There is some coarse-grained control in that you can use the record cache to dedup messages with the same key before sending downstream, or you can choose to get all records by

Re: Initializing StateStores takes *really* long for large datasets

2016-11-30 Thread Eno Thereska
Mathieu, You are absolutely right. We've written about the memory management strategy below: https://cwiki.apache.org/confluence/display/KAFKA/Discussion%3A+Memory+Management+in+Kafka+Streams

Re: KStream window - end value out of bounds

2016-11-29 Thread Eno Thereska
ng to. Was curious whether this was related but it seems > not. Bummer. > > On Tue, Nov 29, 2016 at 8:52 AM, Eno Thereska <eno.there...@gmail.com> > wrote: > >> Hi Jon, >> >> There is an optimization in org.apache.kafka.streams.kstream.internals. >> W

Re: KStream window - end value out of bounds

2016-11-29 Thread Eno Thereska
Hi Jon, There is an optimization in org.apache.kafka.streams.kstream.internals.WindowedSerializer/Deserializer where we don't encode and decode the end of the window since the user can always calculate it. So instead we return a default of Long.MAX_VALUE, which is the big number you see. In
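In other words, given the deserialized Windowed key and the window size the topology used, the caller can recover the end themselves; a tiny sketch, with windowSizeMs and windowedKey standing in for your own values:

    long windowSizeMs = 60 * 1000L;                               // the size used when defining the window
    long windowEnd = windowedKey.window().start() + windowSizeMs; // window().end() would be Long.MAX_VALUE here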

Re: Question about state store

2016-11-29 Thread Eno Thereska
Hi Simon, See if this helps: - you can create the KTable with the state store "storeAdminItem" as you mentioned - for each element you want to check, you want to query that state store directly to check if that element is in the state store. You can do the querying via the Interactive Query

Re: Abnormal working in the method punctuate and error linked to seesion.timeout.ms

2016-11-28 Thread Eno Thereska
ferred to send the whole code sto make it easier to you even though you don't need all of it De : Eno Thereska <eno.there...@gmail.com> Envoyé : lundi 28 novembre 2016 01:12:14 À : users@kafka.apache.org Objet : Re: Abnormal working in the method punctuate and error linked to seesion.timeout.ms

Re: Abnormal working in the method punctuate and error linked to seesion.timeout.ms

2016-11-28 Thread Eno Thereska
; So I can't understand why the application is doing so. > > > Hamza > > ____ > De : Eno Thereska <eno.there...@gmail.com> > Envoyé : dimanche 27 novembre 2016 23:21:24 > À : users@kafka.apache.org > Objet : Re: Abnormal working in the method punct

Re: Abnormal working in the method punctuate and error linked to seesion.timeout.ms

2016-11-28 Thread Eno Thereska
Hi Hamza, If you have an infinite while loop, that would mean the app would spend all the time in that loop and poll() would never be called. Eno > On 28 Nov 2016, at 10:49, Hamza HACHANI wrote: > > Hi, > > I've some troubles with the method puctuate.In fact when

Re: KafkaStreams KTable#through not creating changelog topic

2016-11-22 Thread Eno Thereska
Hi Mikael, 1) The JavaDoc looks incorrect, thanks for reporting. Matthias is looking into fixing it. I agree that it can be confusing to have topic names that are not what one would expect. 2) If your goal is to query/read from the state stores, you can use Interactive Queries to do that (you

Re: KafkaStreams KTable#through not creating changelog topic

2016-11-22 Thread Eno Thereska
Hi Mikael, If you perform an aggregate and thus create another KTable, that KTable will have a changelog topic (and a state store that you can query with Interactive Queris - but this is tangential). It is true that source KTables don't need to create another topic, since they already have

Re: How to get and reset Kafka stream application's offsets

2016-11-21 Thread Eno Thereska
Hi Sachin, Currently you can only change the following global configuration by using "earliest" or "latest", as shown in the code snippet below. As the Javadoc mentions: "What to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g.
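The code snippet referred to above is cut off in the archive; the setting in question looks like this (a sketch, with the value being whichever of the two you want):

    final Properties props = new Properties();
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");   // or "latest"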

Re: Kafka Streaming message loss

2016-11-18 Thread Eno Thereska
Hi Ryan, Perhaps you could share some of your code so we can have a look? One thing I'd check is if you are using compacted Kafka topics. If so, and if you have non-unique keys, compaction happens automatically and you might only see the latest value for a key. Thanks Eno > On 18 Nov 2016, at

Re: Change current offset with KStream

2016-11-18 Thread Eno Thereska
ohn Hayles <jhay...@etcc.com> wrote: > > Eno, thanks! > > In my case I think I need more functionality for fault tolerance, so will > look at KafkaConsumer. > > Thanks, > John > > -Original Message- > From: Eno Thereska [mailto:eno.there...@gmail.co

Re: Unit tests (in Java) take a *very* long time to 'clean up'?

2016-11-11 Thread Eno Thereska
ou can point > out which class / method creates the processes / threads. > > Thanks. > > On Fri, Nov 11, 2016 at 9:24 PM, Eno Thereska <eno.there...@gmail.com> > wrote: > >> Hi Ali, >> >> You're right, shutting down the broker and ZK is expensive. We kep

Re: Unit tests (in Java) take a *very* long time to 'clean up'?

2016-11-11 Thread Eno Thereska
Hi Ali, You're right, shutting down the broker and ZK is expensive. We kept the number of integration tests relatively small (and pushed some more tests as system tests, while doing as many as possible as unit tests). It's not just the shutdown that's expensive, it's also the starting up

Re: Understanding the topology of high level kafka stream

2016-11-09 Thread Eno Thereska
Hi Sachin, Kafka Streams is built on top of standard Kafka consumers. For every topic it consumes from (whether changelog topic or source topic, it doesn't matter), the consumer stores the offset it last consumed from. Upon restart, by default it starts consuming from where it left off from

Re: Kafka streaming changelog topic max.message.bytes exception

2016-11-08 Thread Eno Thereska
x.message.bytes. >>> Since this is an internal topic based off a table I don't know how can I >>> control this topic. >>> >>> If I can set some retention.ms for this topic then I can purge old >>> messages >>> thereby ensuring that message size stay
