[jira] [Updated] (FLINK-11714) Add cost model for both batch and streaming

2019-02-28 Thread Kurt Young (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kurt Young updated FLINK-11714:
---
Component/s: (was: API / Table SQL)
 SQL / Planner

> Add cost model for both batch and streaming
> ---
>
> Key: FLINK-11714
> URL: https://issues.apache.org/jira/browse/FLINK-11714
> Project: Flink
>  Issue Type: New Feature
>  Components: SQL / Planner
>Reporter: godfrey he
>Assignee: godfrey he
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.9.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Calcite's default cost model only contains ROWS, IO and CPU, and does not 
> take IO and CPU into account when the cost is compared.
> There are two improvements:
> 1. Add NETWORK and MEMORY to represent distribution cost and memory usage.
> 2. The optimization goal is now to use minimal resources, so the comparison 
> order of factors is:
> (1). First compare CPU. Every operator uses CPU, so we think it's the 
> most important factor.
> (2). Then compare MEMORY, NETWORK and IO as a single normalized value. 
> The relative order of these three is hard to decide, so each is converted 
> to CPU cost by its own ratio.
> (3). Finally compare ROWS. ROWS has already been counted when calculating 
> the other factors,
>  e.g. CPU of Sort = nLogN(ROWS) * number of sort keys, CPU of Filter 
> = ROWS * condition cost on a row.
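
The comparison order described in the issue can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Flink's planner code: the `Cost` class, its `is_lt` method and the three `*_TO_CPU_RATIO` constants are hypothetical, and the ratio values are placeholders because the issue does not specify them. Python's lexicographic tuple comparison yields the order CPU, then normalized MEMORY/NETWORK/IO, then ROWS; a real planner would more likely compare floating-point values with a tolerance.

```python
from dataclasses import dataclass

# Hypothetical ratios for converting each resource into CPU-cost units.
# The issue gives no concrete values; these are placeholders.
MEMORY_TO_CPU_RATIO = 1.0
IO_TO_CPU_RATIO = 2.0
NETWORK_TO_CPU_RATIO = 4.0


@dataclass(frozen=True)
class Cost:
    """Five-factor cost: Calcite's ROWS, CPU and IO plus NETWORK and MEMORY."""

    rows: float
    cpu: float
    io: float
    network: float
    memory: float

    def normalized_resource(self) -> float:
        # Fold MEMORY, IO and NETWORK into one CPU-equivalent value,
        # since their relative importance is hard to order directly.
        return (self.memory * MEMORY_TO_CPU_RATIO
                + self.io * IO_TO_CPU_RATIO
                + self.network * NETWORK_TO_CPU_RATIO)

    def comparison_key(self) -> tuple:
        # CPU first, then the normalized resource value, ROWS last.
        return (self.cpu, self.normalized_resource(), self.rows)

    def is_lt(self, other: "Cost") -> bool:
        # Exact lexicographic comparison; no epsilon tolerance.
        return self.comparison_key() < other.comparison_key()


# Usage: equal CPU, so the normalized MEMORY/NETWORK/IO value decides.
cheaper = Cost(rows=1000.0, cpu=5000.0, io=100.0, network=2000.0, memory=0.0)
pricier = Cost(rows=10.0, cpu=5000.0, io=0.0, network=2000.0, memory=800.0)
print(cheaper.is_lt(pricier))  # True
```

Here both plans tie on CPU, so the normalized value (8200 vs. 8800) decides, and the plan with more rows still counts as cheaper because ROWS is only consulted last.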



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-11714) Add cost model for both batch and streaming

2019-02-26 Thread ASF GitHub Bot (JIRA)



ASF GitHub Bot updated FLINK-11714:
---
Labels: pull-request-available  (was: )

> Add cost model for both batch and streaming
> ---
>
> Key: FLINK-11714
> URL: https://issues.apache.org/jira/browse/FLINK-11714
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: godfrey he
>Assignee: godfrey he
>Priority: Major
>  Labels: pull-request-available
>





[jira] [Updated] (FLINK-11714) Add cost model for both batch and streaming

2019-02-24 Thread godfrey he (JIRA)



godfrey he updated FLINK-11714:
---
Description: 
Calcite's default cost model only contains ROWS, IO and CPU, and does not take 
IO and CPU into account when the cost is compared.

There are two improvements:

1. Add NETWORK and MEMORY to represent distribution cost and memory usage.

2. The optimization goal is now to use minimal resources, so the comparison 
order of factors is:
(1). First compare CPU. Every operator uses CPU, so we think it's the 
most important factor.
(2). Then compare MEMORY, NETWORK and IO as a single normalized value. The 
relative order of these three is hard to decide, so each is converted to CPU 
cost by its own ratio.
(3). Finally compare ROWS. ROWS has already been counted when calculating the 
other factors,
 e.g. CPU of Sort = nLogN(ROWS) * number of sort keys, CPU of Filter = 
ROWS * condition cost on a row.

  was:
Calcite's default cost model only contains ROWS, IO and CPU, and does not take 
IO and CPU into account when the cost is compared.

There are two improvements:

1. Add NETWORK and MEMORY to represent distribution cost and memory usage.

2. Compare the CPU value first, because every operator uses CPU. Compare the 
ROWS value last, because ROWS has already been counted when calculating the 
other values, e.g. CPU of Sort = nLogN(ROWS) * number of sort keys.
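
The per-operator CPU estimates used as examples in this issue (Sort and Filter) can be written out as a small sketch. The function names are hypothetical, the base-2 logarithm is an assumption (the issue only says nLogN), and the per-row condition cost is treated as a plain number supplied by the caller.

```python
import math


def sort_cpu(rows: float, num_sort_keys: int) -> float:
    """CPU of Sort = nLogN(ROWS) * number of sort keys."""
    # Assumption: base-2 log; inputs of at most one row cost nothing to sort.
    if rows <= 1.0:
        return 0.0
    return rows * math.log2(rows) * num_sort_keys


def filter_cpu(rows: float, condition_cost_per_row: float) -> float:
    """CPU of Filter = ROWS * condition cost on a row."""
    return rows * condition_cost_per_row


print(sort_cpu(1024.0, 2))      # 20480.0
print(filter_cpu(1000.0, 0.5))  # 500.0
```

Because ROWS already appears inside both formulas, a plan that processes more rows usually shows a higher CPU value as well, which is the reason the issue gives for comparing ROWS last.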

 


> Add cost model for both batch and streaming
> ---
>
> Key: FLINK-11714
> URL: https://issues.apache.org/jira/browse/FLINK-11714
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: godfrey he
>Assignee: godfrey he
>Priority: Major
>


