[jira] [Commented] (BEAM-2477) BeamAggregationRel should use Combine.perKey instead of GroupByKey
[ https://issues.apache.org/jira/browse/BEAM-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061796#comment-16061796 ]

ASF GitHub Bot commented on BEAM-2477:
--------------------------------------

Github user JingsongLi closed the pull request at:

    https://github.com/apache/beam/pull/3398

> BeamAggregationRel should use Combine.perKey instead of GroupByKey
> ------------------------------------------------------------------
>
>                 Key: BEAM-2477
>                 URL: https://issues.apache.org/jira/browse/BEAM-2477
>             Project: Beam
>          Issue Type: Improvement
>          Components: dsl-sql
>            Reporter: Jingsong Lee
>            Assignee: Jingsong Lee
>              Labels: dsl_sql_merge
>
> Their semantics are the same, but the efficiency of implementation is quite
> different, and at the runner level there is a lot of optimization for
> `Combine.perKey`.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (BEAM-2477) BeamAggregationRel should use Combine.perKey instead of GroupByKey
[ https://issues.apache.org/jira/browse/BEAM-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055416#comment-16055416 ]

Jingsong Lee commented on BEAM-2477:
------------------------------------

*Local combine*: Cloud Dataflow and Flink (batch) optimize Combine operations (such as Count and Sum) by performing partial combining locally before sending the data to the main grouping operation. See the graph optimizations described in
https://cloud.google.com/blog/big-data/2017/05/after-lambda-exactly-once-processing-in-cloud-dataflow-part-2-ensuring-low-latency

*Incremental aggregation*: Similar to Flink's concept of a window function with incremental aggregation:
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/windows.html#windowfunction-with-incremental-aggregation
A Combine can fold each element into an accumulator as it arrives, whereas GroupByKey keeps every element until the window closes (AFAIK, in the Flink runner).
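
To make the two optimizations in the comment above concrete, here is a minimal sketch in plain Java. It does not use the Beam SDK or any runner; the class `CombineSketch`, its methods, and the "workers" and "shuffle" it simulates are hypothetical names introduced only for illustration. It assumes a simple per-key sum and counts how many records would cross the shuffle under each strategy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: not Beam code and not how any runner is implemented.
public class CombineSketch {

    /** Per-key sums plus the number of records that crossed the simulated shuffle. */
    static final class Result {
        final Map<String, Long> sums;
        final int shuffledRecords;

        Result(Map<String, Long> sums, int shuffledRecords) {
            this.sums = sums;
            this.shuffledRecords = shuffledRecords;
        }
    }

    /** GroupByKey-style: every element is shuffled and buffered, then summed per key. */
    static Result groupThenSum(List<List<Map.Entry<String, Long>>> workers) {
        Map<String, List<Long>> buffered = new HashMap<>();
        int shuffled = 0;
        for (List<Map.Entry<String, Long>> worker : workers) {
            for (Map.Entry<String, Long> kv : worker) {
                // Each raw element is kept until the whole group is available.
                buffered.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
                shuffled++;
            }
        }
        Map<String, Long> sums = new TreeMap<>();
        for (Map.Entry<String, List<Long>> group : buffered.entrySet()) {
            long sum = 0;
            for (long v : group.getValue()) {
                sum += v;
            }
            sums.put(group.getKey(), sum);
        }
        return new Result(sums, shuffled);
    }

    /** Combine-style: each worker folds elements into one accumulator per key first. */
    static Result combinePerKey(List<List<Map.Entry<String, Long>>> workers) {
        Map<String, Long> merged = new TreeMap<>();
        int shuffled = 0;
        for (List<Map.Entry<String, Long>> worker : workers) {
            // Local, incremental aggregation: only the running sum is kept per key.
            Map<String, Long> partial = new HashMap<>();
            for (Map.Entry<String, Long> kv : worker) {
                partial.merge(kv.getKey(), kv.getValue(), Long::sum);
            }
            // Only one partial result per key per worker crosses the shuffle.
            for (Map.Entry<String, Long> p : partial.entrySet()) {
                merged.merge(p.getKey(), p.getValue(), Long::sum);
                shuffled++;
            }
        }
        return new Result(merged, shuffled);
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Long>>> workers = List.of(
            List.of(Map.entry("a", 1L), Map.entry("a", 2L), Map.entry("b", 3L)),
            List.of(Map.entry("a", 4L), Map.entry("b", 5L), Map.entry("b", 6L)));
        Result grouped = groupThenSum(workers);
        Result combined = combinePerKey(workers);
        System.out.println("GroupByKey-style: " + grouped.sums + ", shuffled=" + grouped.shuffledRecords);
        System.out.println("Combine-style:    " + combined.sums + ", shuffled=" + combined.shuffledRecords);
    }
}
```

With the sample input in `main`, both strategies produce the same sums (a=7, b=14), but the GroupByKey-style path moves six records through the simulated shuffle while the Combine-style path moves four. The gap grows with the number of elements per key on each worker, which is the intuition behind the request; actual savings depend on the runner.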
[jira] [Commented] (BEAM-2477) BeamAggregationRel should use Combine.perKey instead of GroupByKey
[ https://issues.apache.org/jira/browse/BEAM-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055380#comment-16055380 ]

James Xu commented on BEAM-2477:
--------------------------------

[~lzljs3620320] can you provide more information on why `Combine.perKey` performs better than `GroupByKey`, e.g. a link?
[jira] [Commented] (BEAM-2477) BeamAggregationRel should use Combine.perKey instead of GroupByKey
[ https://issues.apache.org/jira/browse/BEAM-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055029#comment-16055029 ]

ASF GitHub Bot commented on BEAM-2477:
--------------------------------------

GitHub user JingsongLi opened a pull request:

    https://github.com/apache/beam/pull/3398

    [BEAM-2477] BeamAggregationRel should use Combine.perKey instead of GroupByKey

    Be sure to do all of the following to help us incorporate your contribution
    quickly and easily:

     - [ ] Make sure the PR title is formatted like:
       `[BEAM-] Description of pull request`
     - [ ] Make sure tests pass via `mvn clean verify`.
     - [ ] Replace `` in the title with the actual Jira issue number, if there is one.
     - [ ] If this contribution is large, please file an Apache
       [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

    ---

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JingsongLi/beam BEAM-2477

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/3398.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3398

commit 2ed6075ece20b553047e5ee3918460ea4e2c995f
Author: JingsongLi
Date:   2017-06-20T01:20:19Z

    [BEAM-2477] BeamAggregationRel should use Combine.perKey instead of GroupByKey