Re: KTable.filter usage, memory consumption and materialized view semantics

2016-06-30 Thread Guozhang Wang
Hmm, I'm not sure if we can optimize this case as well. Following your example: KTable T1 = builder.table("source-topic"); KTable T2 = table.filter(value > 2); And suppose the "source-topic" is piping the messages to T1 as: {a: 3}, {b: 5}, {a: 1}... When {a: 3} is passed from T1 to T2, the

Re: KTable.filter usage, memory consumption and materialized view semantics

2016-06-30 Thread Philippe Derome
Guozhang, my latest commit would propose that semantics of your JIRA case 2 be changed a little to suppress nulls when not sendingOldValues and not materializing. When a table T2 is created first from another table T1 and the filter does not match for the key k from T1, the invalid key k does not

Re: KTable.filter usage, memory consumption and materialized view semantics

2016-06-29 Thread Philippe Derome
good. On Wed, Jun 29, 2016 at 6:44 PM, Guozhang Wang wrote: > Yes, they are related in the sense that if we always materialize a source > KTable, then we can completely replace the `sendOldValues` as it will > always be true. But since 3911 is a rather big change, I'd prefer

Re: KTable.filter usage, memory consumption and materialized view semantics

2016-06-29 Thread Guozhang Wang
Yes, they are related in the sense that if we always materialize a source KTable, then we can completely replace the `sendOldValues` as it will always be true. But since 3911 is a rather big change, I'd prefer to complete this ticket first, and refactor it when we decided to work on 3911 later.

Re: KTable.filter usage, memory consumption and materialized view semantics

2016-06-28 Thread Philippe Derome
Is this point of view consistent with new ticket 3911 (Enforce KTable materialisation ) just submitted by Eno. T? Should the two tickets be linked somehow if they are related? My concern is that, the overhead of requesting the source KTable to be materialized (i.e. creating a state store, and

Re: KTable.filter usage, memory consumption and materialized view semantics

2016-06-27 Thread Guozhang Wang
The boolean flag can be reset by a child operator which requires the source to be materialized, more details can be found in this design wiki: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65143671 But just to give you a concrete idea, if your topology is defined as: KTable

Re: KTable.filter usage, memory consumption and materialized view semantics

2016-06-27 Thread Philippe Derome
Then I don't see any simple solution here at least for a novice, especially since I don't know what can trigger the boolean flag to true. On 27 Jun 2016 5:38 p.m., "Guozhang Wang" wrote: > My concern is that, the overhead of requesting the source KTable to be > materialized

Re: KTable.filter usage, memory consumption and materialized view semantics

2016-06-27 Thread Guozhang Wang
My concern is that, the overhead of requesting the source KTable to be materialized (i.e. creating a state store, and sending the {old -> new} pair instead of the new value only) may be over-whelming compared with its potential benefits of reducing the downstream traffic. Guozhang On Sun, Jun

Re: KTable.filter usage, memory consumption and materialized view semantics

2016-06-26 Thread Philippe Derome
Guozhang, would you say it's advisable to initialize KTableFilter.sendOldValues to true instead of false? That's what I see that can trigger your described case 3 to potentially desirable effect, but I didn't include it into pull request. If left to default value of false, I don't know what

Re: KTable.filter usage, memory consumption and materialized view semantics

2016-06-25 Thread Guozhang Wang
Thanks! You can follow this step-by-step guidance to contribute to Kafka via github. https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes#ContributingCodeChanges-PullRequest Guozhang On Sat, Jun 25, 2016 at 8:40 PM, Philippe Derome wrote: > I have

Re: KTable.filter usage, memory consumption and materialized view semantics

2016-06-25 Thread Philippe Derome
I have a 1 liner solution for this in KTableFilter.java and about 5-6 lines changes to existing unit test KTableFilterTest.testSendingOldValue. I included those lines with context in the JIRA. I am struggling a bit with github being new to it and how to do a proper pull request so hopefully that

Re: KTable.filter usage, memory consumption and materialized view semantics

2016-06-24 Thread Guozhang Wang
Hi Philippe, Great, since you agree with my reasonings, I have created a JIRA ticket for optimizing KTableFilter (feel free to pick it up if you are interested in contributing): https://issues.apache.org/jira/browse/KAFKA-3902 About case 3-c-1), what I meant is that since "predicate return true

Re: KTable.filter usage, memory consumption and materialized view semantics

2016-06-24 Thread Philippe Derome
Yes, it looks very good. Your detailed explanation appears compelling enough to reveal that some of the details of the complexity of a streams system are probably inherent complexity (not that I dared assume it was "easy" but I could afford to be conveniently unaware). It took me 30 minutes to

Re: KTable.filter usage, memory consumption and materialized view semantics

2016-06-24 Thread Guozhang Wang
Hello Philippe, Very good points, let me dump my thoughts about "KTable.filter" specifically and how we can improve on that: 1. Some context: when a KTable participates in a downstream operators (e.g. if that operator is an aggregation), then we need to materialize this KTable and send both its

Re: KTable.filter usage, memory consumption and materialized view semantics

2016-06-23 Thread Philippe Derome
Thanks a lot for the detailed feedback, its clarity and the reference to KIP-63, which however is for the most part above my head for now. Having said that, I still hold the view that the behaviour I presented is undesirable and hardly defensible and we may have no choice but to agree to disagree

Re: KTable.filter usage, memory consumption and materialized view semantics

2016-06-23 Thread Guozhang Wang
Hello Philippe, I think your question is really in two-folds: 1. What is the semantic difference between a KTable and a KStream, and more specifically how should we interpret (key, null) in KTable? You can find some explanations in this documentation: