Hmm, I'm not sure if we can optimize this case as well. Following your
example:
KTable T1 = builder.table("source-topic");
KTable T2 = table.filter(value > 2);
And suppose the "source-topic" is piping the messages to T1 as: {a: 3}, {b:
5}, {a: 1}...
When {a: 3} is passed from T1 to T2, the
Guozhang,
my latest commit would propose that semantics of your JIRA case 2 be
changed a little to suppress nulls when not sendingOldValues and not
materializing. When a table T2 is created first from another table T1 and
the filter does not match for the key k from T1, the invalid key k does not
good.
On Wed, Jun 29, 2016 at 6:44 PM, Guozhang Wang wrote:
> Yes, they are related in the sense that if we always materialize a source
> KTable, then we can completely replace the `sendOldValues` as it will
> always be true. But since 3911 is a rather big change, I'd prefer
Yes, they are related in the sense that if we always materialize a source
KTable, then we can completely replace the `sendOldValues` as it will
always be true. But since 3911 is a rather big change, I'd prefer to
complete this ticket first, and refactor it when we decided to work on 3911
later.
Is this point of view consistent with new ticket 3911 (Enforce KTable
materialisation ) just submitted by Eno. T?
Should the two tickets be linked somehow if they are related?
My concern is that, the overhead of requesting the source KTable to be
materialized (i.e. creating a state store, and
The boolean flag can be reset by a child operator which requires the source
to be materialized, more details can be found in this design wiki:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65143671
But just to give you a concrete idea, if your topology is defined as:
KTable
Then I don't see any simple solution here at least for a novice, especially
since I don't know what can trigger the boolean flag to true.
On 27 Jun 2016 5:38 p.m., "Guozhang Wang" wrote:
> My concern is that, the overhead of requesting the source KTable to be
> materialized
My concern is that, the overhead of requesting the source KTable to be
materialized (i.e. creating a state store, and sending the {old -> new}
pair instead of the new value only) may be over-whelming compared with its
potential benefits of reducing the downstream traffic.
Guozhang
On Sun, Jun
Guozhang,
would you say it's advisable to initialize KTableFilter.sendOldValues to
true instead of false? That's what I see that can trigger your described
case 3 to potentially desirable effect, but I didn't include it into pull
request. If left to default value of false, I don't know what
Thanks! You can follow this step-by-step guidance to contribute to Kafka
via github.
https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes#ContributingCodeChanges-PullRequest
Guozhang
On Sat, Jun 25, 2016 at 8:40 PM, Philippe Derome wrote:
> I have
I have a 1 liner solution for this in KTableFilter.java and about 5-6 lines
changes to existing unit test KTableFilterTest.testSendingOldValue. I
included those lines with context in the JIRA. I am struggling a bit with
github being new to it and how to do a proper pull request so hopefully
that
Hi Philippe,
Great, since you agree with my reasonings, I have created a JIRA ticket for
optimizing KTableFilter (feel free to pick it up if you are interested in
contributing):
https://issues.apache.org/jira/browse/KAFKA-3902
About case 3-c-1), what I meant is that since "predicate return true
Yes, it looks very good. Your detailed explanation appears compelling
enough to reveal that some of the details of the complexity of a streams
system are probably inherent complexity (not that I dared assume it was
"easy" but I could afford to be conveniently unaware). It took me 30
minutes to
Hello Philippe,
Very good points, let me dump my thoughts about "KTable.filter"
specifically and how we can improve on that:
1. Some context: when a KTable participates in a downstream operators (e.g.
if that operator is an aggregation), then we need to materialize this
KTable and send both its
Thanks a lot for the detailed feedback, its clarity and the reference to
KIP-63, which however is for the most part above my head for now.
Having said that, I still hold the view that the behaviour I presented is
undesirable and hardly defensible and we may have no choice but to agree to
disagree
Hello Philippe,
I think your question is really in two-folds:
1. What is the semantic difference between a KTable and a KStream, and more
specifically how should we interpret (key, null) in KTable?
You can find some explanations in this documentation:
16 matches
Mail list logo