[jira] [Commented] (BEAM-2477) BeamAggregationRel should use Combine.perKey instead of GroupByKey

2017-06-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061796#comment-16061796
 ] 

ASF GitHub Bot commented on BEAM-2477:
--

GitHub user JingsongLi closed the pull request at:

https://github.com/apache/beam/pull/3398


> BeamAggregationRel should use Combine.perKey instead of GroupByKey
> --
>
> Key: BEAM-2477
> URL: https://issues.apache.org/jira/browse/BEAM-2477
> Project: Beam
> Issue Type: Improvement
> Components: dsl-sql
> Reporter: Jingsong Lee
> Assignee: Jingsong Lee
> Labels: dsl_sql_merge
>
> For this aggregation, `Combine.perKey` and `GroupByKey` have the same 
> semantics, but their implementations differ considerably in efficiency: 
> runners apply many optimizations to `Combine.perKey`.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2477) BeamAggregationRel should use Combine.perKey instead of GroupByKey

2017-06-20 Thread Jingsong Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055416#comment-16055416
 ] 

Jingsong Lee commented on BEAM-2477:


*Local combine*: Cloud Dataflow and Flink (in batch mode) optimize Combine 
operations (such as Count and Sum) by performing partial combining locally 
before sending the data to the main grouping operation. See "Graph 
optimizations" in 
https://cloud.google.com/blog/big-data/2017/05/after-lambda-exactly-once-processing-in-cloud-dataflow-part-2-ensuring-low-latency
*Incremental aggregation*: Each element can be folded into an accumulator as 
it arrives, similar to Flink's concept: 
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/windows.html#windowfunction-with-incremental-aggregation

GroupByKey, by contrast, keeps every individual element until the window 
closes (AFAIK, in the Flink runner).
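
To make the difference concrete, here is a minimal sketch in plain Python. It is not Beam or Flink code: the bundle layout, the function names, and the "shuffled record" count are all assumptions made up for illustration. It only shows that summing per key gives the same answer either way, while pre-combining inside each bundle sends fewer records through the grouping step and needs only one accumulator per key.

```python
from collections import defaultdict

# Hypothetical input: (key, value) pairs already split across two workers.
BUNDLES = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("a", 4), ("b", 5), ("b", 6)],
]


def sum_via_group_by_key(bundles):
    """Shuffle every element, buffer all values per key, then sum."""
    shuffled = [kv for bundle in bundles for kv in bundle]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)  # every value is buffered until the end
    result = {key: sum(values) for key, values in grouped.items()}
    return result, len(shuffled)


def sum_via_combine_per_key(bundles):
    """Pre-sum inside each bundle, shuffle only partial sums, then merge."""
    shuffled = []
    for bundle in bundles:
        partial = defaultdict(int)
        for key, value in bundle:
            partial[key] += value  # incremental: one accumulator per key
        shuffled.extend(partial.items())
    result = defaultdict(int)
    for key, partial_sum in shuffled:
        result[key] += partial_sum  # merge accumulators after the shuffle
    return dict(result), len(shuffled)


gbk_result, gbk_shuffled = sum_via_group_by_key(BUNDLES)
combine_result, combine_shuffled = sum_via_combine_per_key(BUNDLES)
print(gbk_result == combine_result)
print(gbk_shuffled, combine_shuffled)
```

With only six input elements the saving is small, but the number of shuffled records in the combine path is bounded by keys per bundle rather than by elements, which is where the gain comes from on larger inputs.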



[jira] [Commented] (BEAM-2477) BeamAggregationRel should use Combine.perKey instead of GroupByKey

2017-06-20 Thread James Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055380#comment-16055380
 ] 

James Xu commented on BEAM-2477:


[~lzljs3620320] can you provide more info on why `Combine.perKey` performs 
better than `GroupByKey`, e.g. a link?



[jira] [Commented] (BEAM-2477) BeamAggregationRel should use Combine.perKey instead of GroupByKey

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055029#comment-16055029
 ] 

ASF GitHub Bot commented on BEAM-2477:
--

GitHub user JingsongLi opened a pull request:

https://github.com/apache/beam/pull/3398

[BEAM-2477] BeamAggregationRel should use Combine.perKey instead of 
GroupByKey



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JingsongLi/beam BEAM-2477

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3398.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3398


commit 2ed6075ece20b553047e5ee3918460ea4e2c995f
Author: JingsongLi 
Date:   2017-06-20T01:20:19Z

[BEAM-2477] BeamAggregationRel should use Combine.perKey instead of 
GroupByKey



