[jira] [Commented] (KAFKA-4281) Should be able to forward aggregation values immediately

    [ https://issues.apache.org/jira/browse/KAFKA-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15658156#comment-15658156 ]

Guozhang Wang commented on KAFKA-4281:
--------------------------------------

In hindsight on KIP-63, I think it is generally better to have finer-grained control over result forwarding when caching is turned on.

[~damianguy] Do you want to chime in here and, if you agree, help Greg contribute in this direction?

> Should be able to forward aggregation values immediately
> --------------------------------------------------------
>
>                 Key: KAFKA-4281
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4281
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.1.0
>            Reporter: Greg Fodor
>            Assignee: Greg Fodor
>
> KIP-63 introduced changes to the behavior of aggregations such that the
> result of aggregations will not appear to subsequent processors until a state
> store flush occurs. This is problematic for latency-sensitive aggregations
> since flushes generally occur at commit.interval.ms, which is usually a few
> seconds. Combined with several aggregations, this can result in several
> seconds of latency through a topology for steps dependent upon aggregations.
> Two potential solutions:
> - Allow finer control over the state store flushing intervals
> - Allow users to change the behavior so that certain aggregations will
> immediately forward records to the next step (as was the case pre-KIP-63)
> A PR is attached that takes the second approach. Unfortunately, adding this
> required touching a large number of files, and it effectively doubles the
> number of method signatures around grouping on KTable and KStream. I tried an
> alternative approach that let the user opt in to immediate forwarding via an
> additional builder method on KGroupedStream/Table, but this didn't work as
> expected because, in order for the latency to go away, the KTableImpl itself
> must also mark its source as forward immediate (otherwise we will still see
> latency due to the materialization of the KTableSource still relying upon
> state store flushes to propagate).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
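
For readers following the thread, the first option in the description (finer control over flushing) can be roughly approximated today with two existing Kafka Streams settings: `cache.max.bytes.buffering` and `commit.interval.ms`. The sketch below only builds a `java.util.Properties` object with those keys, so it runs without Kafka on the classpath. The class name, the application id, and the broker address are placeholders chosen for illustration and do not come from the issue. Note that this is an application-wide workaround, not the per-aggregation control the issue asks for.

```java
import java.util.Properties;

// Illustrative helper (hypothetical name): builds Kafka Streams properties
// that trade cache-based deduplication for lower end-to-end latency.
public class LowLatencyStreamsConfig {

    public static Properties build(long commitIntervalMs) {
        Properties props = new Properties();
        // Placeholder values, not taken from the issue.
        props.setProperty("application.id", "low-latency-aggregation");
        props.setProperty("bootstrap.servers", "localhost:9092");
        // A cache size of 0 disables the KIP-63 record cache for the whole
        // application, so aggregation updates are not held back until a flush.
        props.setProperty("cache.max.bytes.buffering", "0");
        // With caching enabled, flushes follow the commit interval, so a
        // smaller value is the other knob for reducing latency.
        props.setProperty("commit.interval.ms", Long.toString(commitIntervalMs));
        return props;
    }

    public static void main(String[] args) {
        Properties props = build(100L);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The resulting `Properties` would normally be passed to the Kafka Streams client when it is constructed. Disabling the cache increases the number of records written downstream and to changelog topics, which is the trade-off KIP-63 was designed to avoid.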
[jira] [Commented] (KAFKA-4281) Should be able to forward aggregation values immediately

    [ https://issues.apache.org/jira/browse/KAFKA-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560765#comment-15560765 ]

Greg Fodor commented on KAFKA-4281:
-----------------------------------

PR: https://github.com/apache/kafka/pull/1998

If this approach seems sane, please take a close look at the window variants in particular, since I am not too familiar with those APIs.

> Should be able to forward aggregation values immediately
> --------------------------------------------------------
>
>                 Key: KAFKA-4281
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4281
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.1.0
>            Reporter: Greg Fodor
>            Assignee: Guozhang Wang
>
> KIP-63 introduced changes to the behavior of aggregations such that the
> result of aggregations will not appear to subsequent processors until a state
> store flush occurs. This is problematic for latency-sensitive aggregations
> since flushes generally occur at commit.interval.ms, which is usually a few
> seconds. Combined with several aggregations, this can result in several
> seconds of latency through a topology for steps dependent upon aggregations.
> Two potential solutions:
> - Allow finer control over the state store flushing intervals
> - Allow users to change the behavior so that certain aggregations will
> immediately forward records to the next step (as was the case pre-KIP-63)
> A PR is attached that takes the second approach. Unfortunately, adding this
> required touching a large number of files, and it effectively doubles the
> number of method signatures around grouping on KTable and KStream. I tried an
> alternative approach that let the user opt in to immediate forwarding via an
> additional builder method on KGroupedStream/Table, but this didn't work as
> expected because, in order for the latency to go away, the KTableImpl itself
> must also mark its source as forward immediate (otherwise we will still see
> latency due to the materialization of the KTableSource still relying upon
> state store flushes to propagate).
[jira] [Commented] (KAFKA-4281) Should be able to forward aggregation values immediately

    [ https://issues.apache.org/jira/browse/KAFKA-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560764#comment-15560764 ]

ASF GitHub Bot commented on KAFKA-4281:
---------------------------------------

GitHub user gfodor opened a pull request:

    https://github.com/apache/kafka/pull/1998

    KAFKA-4281: Should be able to forward aggregation values immediately

    https://issues.apache.org/jira/browse/KAFKA-4281

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/AltspaceVR/kafka KAFKA-4281

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/1998.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1998

commit dfe004a24ff6491f286ac9fd405b6a1cae8ae2f5
Author: Greg Fodor
Date:   2016-10-09T22:46:02Z

    Added forwardImmediately argument to various grouping APIs to allow users to specify that records should be immediately forwarded during aggregations, etc

> Should be able to forward aggregation values immediately
> --------------------------------------------------------
>
>                 Key: KAFKA-4281
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4281
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.1.0
>            Reporter: Greg Fodor
>            Assignee: Guozhang Wang
>
> KIP-63 introduced changes to the behavior of aggregations such that the
> result of aggregations will not appear to subsequent processors until a state
> store flush occurs. This is problematic for latency-sensitive aggregations
> since flushes generally occur at commit.interval.ms, which is usually a few
> seconds. Combined with several aggregations, this can result in several
> seconds of latency through a topology for steps dependent upon aggregations.
> Two potential solutions:
> - Allow finer control over the state store flushing intervals
> - Allow users to change the behavior so that certain aggregations will
> immediately forward records to the next step (as was the case pre-KIP-63)
> A PR is attached that takes the second approach. Unfortunately, adding this
> required touching a large number of files, and it effectively doubles the
> number of method signatures around grouping on KTable and KStream. I tried an
> alternative approach that let the user opt in to immediate forwarding via an
> additional builder method on KGroupedStream/Table, but this didn't work as
> expected because, in order for the latency to go away, the KTableImpl itself
> must also mark its source as forward immediate (otherwise we will still see
> latency due to the materialization of the KTableSource still relying upon
> state store flushes to propagate).
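
To make the behavioral difference discussed in this thread concrete, here is a toy model of a counting aggregation that either forwards every update immediately or holds the latest value per key until a flush. It is a simplified illustration only and is not Kafka Streams code or the code in the pull request: the class `ForwardingModel` and its methods are hypothetical, and the only name borrowed from the thread is the `forwardImmediately` flag.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical, not Kafka Streams internals) of a count
// aggregation with and without cached forwarding.
public class ForwardingModel {
    private final boolean forwardImmediately;
    private final Map<String, Long> store = new LinkedHashMap<>();   // aggregate state
    private final Map<String, Long> dirty = new LinkedHashMap<>();   // updates awaiting flush
    private final List<String> downstream = new ArrayList<>();       // what the next step sees

    public ForwardingModel(boolean forwardImmediately) {
        this.forwardImmediately = forwardImmediately;
    }

    // Counts one record for the key and either forwards the new count now
    // or remembers only the latest count until flush() is called.
    public void process(String key) {
        long count = store.merge(key, 1L, Long::sum);
        if (forwardImmediately) {
            downstream.add(key + "=" + count);
        } else {
            dirty.put(key, count);
        }
    }

    // Stands in for the state store flush that happens on commit.
    public void flush() {
        dirty.forEach((key, count) -> downstream.add(key + "=" + count));
        dirty.clear();
    }

    public List<String> downstream() {
        return downstream;
    }

    public static void main(String[] args) {
        ForwardingModel cached = new ForwardingModel(false);
        ForwardingModel immediate = new ForwardingModel(true);
        for (String key : new String[] {"a", "a", "b"}) {
            cached.process(key);
            immediate.process(key);
        }
        System.out.println("cached before flush: " + cached.downstream());
        cached.flush();
        System.out.println("cached after flush:  " + cached.downstream());
        System.out.println("immediate:           " + immediate.downstream());
    }
}
```

In the cached mode, downstream steps see nothing until `flush()` runs and then receive one deduplicated value per key. In the immediate mode, every intermediate count is visible right away, at the cost of more downstream records.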