[jira] [Commented] (FLINK-15004) Choose two-phase Aggregate if the statistics is unknown

2021-04-27 Thread Flink Jira Bot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17333839#comment-17333839
 ] 

Flink Jira Bot commented on FLINK-15004:


This issue was marked "stale-assigned" and has not received an update in 7 
days. It is now automatically unassigned. If you are still working on it, you 
can assign it to yourself again. Please also give an update about the status of 
the work.

> Choose two-phase Aggregate if the statistics is unknown
> ---
>
> Key: FLINK-15004
> URL: https://issues.apache.org/jira/browse/FLINK-15004
> Project: Flink
> Issue Type: Improvement
> Components: Table SQL / Planner
> Affects Versions: 1.9.1, 1.10.0
> Reporter: godfrey he
> Assignee: godfrey he
> Priority: Major
> Labels: pull-request-available, stale-assigned
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, the blink planner uses the default rowCount value (defined in 
> {{FlinkPreparingTableBase#DEFAULT_ROWCOUNT}}) when statistics are unknown, 
> and may therefore choose a one-phase Aggregate. The job will hang if the data 
> is skewed. It is therefore better to use a two-phase Aggregate for execution 
> stability when statistics are unknown.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-15004) Choose two-phase Aggregate if the statistics is unknown

2021-04-16 Thread Flink Jira Bot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17323147#comment-17323147
 ] 

Flink Jira Bot commented on FLINK-15004:


This issue is assigned but has not received an update in 7 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.



[jira] [Commented] (FLINK-15004) Choose two-phase Aggregate if the statistics is unknown

2019-12-03 Thread godfrey he (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987434#comment-16987434
 ] 

godfrey he commented on FLINK-15004:


Yes, you are right.

> Choose two-phase Aggregate if the statistics is unknown
> ---
>
> Key: FLINK-15004
> URL: https://issues.apache.org/jira/browse/FLINK-15004
> Project: Flink
> Issue Type: Sub-task
> Components: Table SQL / Planner
> Affects Versions: 1.9.0, 1.9.1
> Reporter: godfrey he
> Assignee: godfrey he
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.10.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, the blink planner uses the default rowCount value (defined in 
> {{FlinkPreparingTableBase#DEFAULT_ROWCOUNT}}) when statistics are unknown, 
> and may therefore choose a one-phase Aggregate. The job will hang if the data 
> is skewed. It is therefore better to use a two-phase Aggregate for execution 
> stability when statistics are unknown.





[jira] [Commented] (FLINK-15004) Choose two-phase Aggregate if the statistics is unknown

2019-12-03 Thread Kurt Young (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986768#comment-16986768
 ] 

Kurt Young commented on FLINK-15004:


So eventually we need both ndv and row count to determine the aggregation 
ratio, right? That sounds reasonable to me. 



[jira] [Commented] (FLINK-15004) Choose two-phase Aggregate if the statistics is unknown

2019-12-03 Thread godfrey he (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986742#comment-16986742
 ] 

godfrey he commented on FLINK-15004:


[~ykt836], yes, `ndv` was already taken into account: if `ndv` is unknown, the 
planner chooses two-phase aggregate. 
The `DistinctRowCount` metadata handler can return null, which means unknown, 
while the `RowCount` metadata handler always returns a primitive type, so the 
planner cannot tell whether the inputs have a real row count or just the 
default value. This issue mainly addresses the following scenario: `ndv` is 
known, while the row count is unknown.
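
The decision described in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not the planner's actual code: the class `AggPhaseSketch`, the method `choose`, the use of `OptionalDouble` to model an unknown row count, and the `RATIO_THRESHOLD` value are all hypothetical. The only ideas taken from the thread are that an unknown `ndv` (null) leads to two-phase aggregate, that an unknown row count should do the same, and that `ndv` and row count together give an aggregation ratio.

```java
import java.util.OptionalDouble;

// Hypothetical sketch; none of these names exist in Flink.
final class AggPhaseSketch {

    enum Strategy { ONE_PHASE, TWO_PHASE }

    // Assumed cut-off for ndv / rowCount; the real planner's value may differ.
    static final double RATIO_THRESHOLD = 0.5;

    /**
     * @param ndv      distinct row count of the grouping keys, or null if unknown
     * @param rowCount input row count, empty if only a default value is available
     */
    static Strategy choose(Double ndv, OptionalDouble rowCount) {
        // Unknown statistics: prefer two-phase for execution stability.
        if (ndv == null || !rowCount.isPresent()) {
            return Strategy.TWO_PHASE;
        }
        double rows = rowCount.getAsDouble();
        if (rows <= 0) {
            return Strategy.TWO_PHASE;
        }
        // Aggregation ratio: fraction of input rows that survive local aggregation.
        double ratio = Math.min(ndv / rows, 1.0);
        // A small ratio means local aggregation shrinks the data considerably,
        // so the extra phase pays off; otherwise one phase is enough.
        return ratio < RATIO_THRESHOLD ? Strategy.TWO_PHASE : Strategy.ONE_PHASE;
    }
}
```

With these assumed values, `choose(10.0, OptionalDouble.of(1000.0))` yields `TWO_PHASE` (ratio 0.01), `choose(900.0, OptionalDouble.of(1000.0))` yields `ONE_PHASE` (ratio 0.9), and any call with a null `ndv` or an empty row count yields `TWO_PHASE`.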



[jira] [Commented] (FLINK-15004) Choose two-phase Aggregate if the statistics is unknown

2019-12-03 Thread Kurt Young (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986719#comment-16986719
 ] 

Kurt Young commented on FLINK-15004:


Is row count sufficient for us to decide whether we want one-phase or two-phase 
aggregation? I think the key's ndv will be much more important here. 
