[jira] [Updated] (DRILL-7558) Generalize filter push-down planner phase

2021-04-25 Thread Paul Rogers (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-7558:
---
Fix Version/s: (was: 1.19.0)

> Generalize filter push-down planner phase
> -
>
> Key: DRILL-7558
> URL: https://issues.apache.org/jira/browse/DRILL-7558
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.18.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
>
> DRILL-7458 provides a base framework for storage plugins, including a 
> simplified filter push-down mechanism. [~volodymyr] notes that it may be 
> *too* simple:
> {quote}
> What about the case where this rule was applied to one filter, but the planner 
> at some point placed another filter above the scan? For example, suppose we 
> have this case:
> {code}
> Filter(a=2)
>   Join(t1.b=t2.b, type=inner)
>     Filter(b=3)
>       Scan(t1)
>     Scan(t2)
> {code}
> Filter b=3 will be pushed into the scan, and the planner will push the filter 
> that was above the join down to the scan:
> {code}
> Join(t1.b=t2.b, type=inner)
>   Filter(a=2)
>     Scan(t1, b=3)
>   Scan(t2)
> {code}
> In this case, checking whether a filter was pushed is not enough.
> {quote}
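
The situation in the quoted example can be illustrated with a small model. This is only a sketch in Python, not Drill or Calcite code: the `Scan` and `Filter` classes and the `push_once` and `push_merge` functions are hypothetical names invented for illustration. It shows why a rule that only checks "was a filter already pushed?" leaves `a=2` stranded above the scan, while a rule that merges the new predicate into the scan does not.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Scan:
    """Hypothetical scan node; `pushed` holds predicates already pushed down."""
    table: str
    pushed: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Filter:
    """Hypothetical filter node sitting directly above a scan."""
    predicate: str
    child: Scan


def push_once(node: Filter) -> Union[Filter, Scan]:
    """Naive rule: do nothing if the scan already carries a pushed filter."""
    if node.child.pushed:
        return node  # the second filter stays above the scan
    return Scan(node.child.table, (node.predicate,))


def push_merge(node: Filter) -> Scan:
    """Alternative rule: AND the new predicate with those already in the scan."""
    return Scan(node.child.table, node.child.pushed + (node.predicate,))


# State of the t1 branch after the planner moves Filter(a=2) below the join.
plan = Filter("a=2", Scan("t1", ("b=3",)))

naive = push_once(plan)    # still a Filter: a=2 was never pushed
merged = push_merge(plan)  # Scan carrying both b=3 and a=2
```

Under these assumptions, `naive` is still a `Filter` node, whereas `merged` is a single scan carrying both predicates.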
> Drill divides planning into a number of *phases*, each defined by a set of 
> *rules*. Most storage plugins perform filter push-down during the physical 
> planning stage. However, by this point, Drill has already decided on the 
> degree of parallelism: it is too late to use filter push-down to set the 
> degree of parallelism. Yet, if using something like a REST API, we want to 
> use filters to help us shard the query (that is, to set the degree of 
> parallelism).
>  
> DRILL-7458 performs filter push-down at *logical* planning time to work 
> around the above limitation. (In Drill, there are three different phases that 
> could be considered the logical phase, depending on which planning options 
> are set to control Calcite.)
> [~volodymyr] points out that the logical planning phase may be the wrong 
> place, because it performs rewrites of the type he cited.
> Thus, we need to research where to insert filter push-down. It must come:
> * After rewrites of the kind described above.
> * After join equivalence computations. (See DRILL-7556.)
> * Before the decision is made about the number of minor fragments.
> The goal of this ticket is to either:
> * Research to identify an existing phase which satisfies these requirements, 
> or
> * Create a new phase.
> Due to the way Calcite works, it is not a good idea to have a single phase 
> handle two tasks that depend on one another. That is, we cannot combine 
> filter push-down with a phase that defines the filters, nor can we add filter 
> push-down to a phase that chooses parallelism.
> Background: Calcite is a rule-based query planner inspired by 
> [Volcano|https://paperhub.s3.amazonaws.com/dace52a42c07f7f8348b08dc2b186061.pdf].
> The above issue is a flaw with rule-based planners and was identified as 
> early as the [Cascades query framework 
> paper|https://www.csd.uoc.gr/~hy460/pdf/CascadesFrameworkForQueryOptimization.pdf]
>  which was the follow-up to Volcano.
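
The ordering requirements listed in the description (after the filter rewrites, after join equivalence, before the minor-fragment decision) amount to finding a slot in an ordered list of phases. The following is a minimal sketch in Python; the phase names and the `valid_slots` helper are hypothetical and do not correspond to Drill's actual planner phases.

```python
from typing import List


def valid_slots(phases: List[str], after: List[str], before: List[str]) -> List[int]:
    """Return every index i at which a new phase could be inserted so that it
    runs after all phases in `after` and before all phases in `before`."""
    lo = max((phases.index(p) + 1 for p in after), default=0)
    hi = min((phases.index(p) for p in before), default=len(phases))
    return list(range(lo, hi + 1))  # empty when the constraints conflict


# Hypothetical phase order, for illustration only.
phases = [
    "logical_rewrite",      # filter rewrites of the kind quoted above
    "join_equivalence",     # see DRILL-7556
    "physical_conversion",
    "parallelization",      # decides the number of minor fragments
]

slots = valid_slots(
    phases,
    after=["logical_rewrite", "join_equivalence"],
    before=["parallelization"],
)
```

An empty result would mean that no existing gap satisfies the constraints and the phases themselves would need reordering. A non-empty result only shows where a separate push-down phase could sit; per the description, it should not be folded into either neighbour.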



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7558) Generalize filter push-down planner phase

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7558:
---
Fix Version/s: (was: 1.18.0)
   1.19.0

> Generalize filter push-down planner phase
> -
>
> Key: DRILL-7558
> URL: https://issues.apache.org/jira/browse/DRILL-7558
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.18.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.19.0
>
>





[jira] [Updated] (DRILL-7558) Generalize filter push-down planner phase

2020-09-06 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7558:
---
Target Version/s: 1.19.0

> Generalize filter push-down planner phase
> -
>
> Key: DRILL-7558
> URL: https://issues.apache.org/jira/browse/DRILL-7558
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.18.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.18.0
>
>


