[jira] [Updated] (DRILL-7558) Generalize filter push-down planner phase
[ https://issues.apache.org/jira/browse/DRILL-7558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-7558:
-------------------------------
Fix Version/s: (was: 1.19.0)

> Generalize filter push-down planner phase
> -----------------------------------------
>
> Key: DRILL-7558
> URL: https://issues.apache.org/jira/browse/DRILL-7558
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.18.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
>
> DRILL-7458 provides a base framework for storage plugins, including a
> simplified filter push-down mechanism. [~volodymyr] notes that it may be
> *too* simple:
> {quote}
> What about the case when this rule was applied for one filter, but the planner
> at some point pushed another filter above the scan? For example, consider
> this case:
> {code}
> Filter(a=2)
>   Join(t1.b=t2.b, type=inner)
>     Filter(b=3)
>       Scan(t1)
>     Scan(t2)
> {code}
> Filter b=3 will be pushed into the scan, and the planner will then push the
> filter that sits above the join down below it:
> {code}
> Join(t1.b=t2.b, type=inner)
>   Filter(a=2)
>     Scan(t1, b=3)
>   Scan(t2)
> {code}
> In this case, checking whether a filter was already pushed is not enough.
> {quote}
> Drill divides planning into a number of *phases*, each defined by a set of
> *rules*. Most storage plugins perform filter push-down during the physical
> planning stage. However, by this point, Drill has already decided on the
> degree of parallelism: it is too late to use filter push-down to set the
> degree of parallelism. Yet, if using something like a REST API, we want to
> use filters to help us shard the query (that is, to set the degree of
> parallelism).
>
> DRILL-7458 performs filter push-down at *logical* planning time to work
> around the above limitation. (In Drill, there are three different phases that
> could be considered the logical phase, depending on which planning options
> are set to control Calcite.)
> [~volodymyr] points out that the logical plan phase may be wrong because
> it will perform rewrites of the type he cited.
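A minimal Python sketch of the failure mode Volodymyr describes, assuming toy plan nodes: `Scan`, `Filter`, `Join`, `push_down()` and `push_filter_below_join()` are hypothetical names for illustration, not Drill or Calcite classes. A "naive" rule that skips any scan already marked as having received a filter leaves `Filter(a=2)` stranded above `Scan(t1, b=3)` after the planner's rewrite, while a rule that merges each new condition into the scan's existing conditions absorbs both. The sketch only illustrates the problem; it does not decide which planner phase should host the rule.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Toy plan nodes for illustration only; these are NOT Drill or Calcite classes.
@dataclass
class Scan:
    table: str
    pushed: List[str] = field(default_factory=list)  # conditions the scan enforces
    filter_pushed: bool = False                      # naive "already pushed" marker

@dataclass
class Filter:
    cond: str
    child: "Node"

@dataclass
class Join:
    cond: str
    left: "Node"
    right: "Node"

Node = Union[Scan, Filter, Join]

def push_down(node: Node, naive: bool) -> Node:
    """Hypothetical bottom-up Filter-over-Scan push-down rule."""
    if isinstance(node, Join):
        return Join(node.cond, push_down(node.left, naive), push_down(node.right, naive))
    if isinstance(node, Filter):
        child = push_down(node.child, naive)
        if isinstance(child, Scan):
            if naive and child.filter_pushed:
                # Naive check: this scan already absorbed a filter, so skip.
                return Filter(node.cond, child)
            # Merge the new condition with whatever the scan already enforces.
            return Scan(child.table, child.pushed + [node.cond], True)
        return Filter(node.cond, child)
    return node

def push_filter_below_join(node: Node) -> Node:
    """Stand-in for the planner rewrite in the ticket: Filter(Join(l, r)) becomes
    Join(Filter(l), r). Assumes, as in the example, that the filter only
    references columns of the left input (t1)."""
    if isinstance(node, Filter) and isinstance(node.child, Join):
        j = node.child
        return Join(j.cond, Filter(node.cond, j.left), j.right)
    return node

def plan_after_rewrite(naive: bool) -> Node:
    plan = Filter("a=2", Join("t1.b=t2.b", Filter("b=3", Scan("t1")), Scan("t2")))
    plan = push_down(plan, naive)         # b=3 is absorbed into Scan(t1)
    plan = push_filter_below_join(plan)   # planner moves a=2 down to the t1 side
    return push_down(plan, naive)         # second chance to push a=2

if __name__ == "__main__":
    print(plan_after_rewrite(naive=True))   # Filter(a=2) is left above Scan(t1)
    print(plan_after_rewrite(naive=False))  # Scan(t1) enforces both b=3 and a=2
```

Tracking pushed conditions per condition rather than per scan is what lets the second firing pick up `a=2`. In a planner where a rule may fire more than once on equivalent expressions, a real rule would presumably also need to skip conditions the scan already holds.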
> Thus, we need to research where to insert filter push-down. It must come:
> * After rewrites of the kind described above.
> * After join equivalence computations. (See DRILL-7556.)
> * Before the decision is made about the number of minor fragments.
>
> The goal of this ticket is to either:
> * Research and identify an existing phase that satisfies these requirements,
> or
> * Create a new phase.
>
> Due to the way Calcite works, it is not a good idea to have a single phase
> handle two tasks that depend on one another. That is, we cannot perform
> filter push-down in the same phase that defines the filters, nor can we add
> filter push-down to a phase that chooses parallelism.
>
> Background: Calcite is a rule-based query planner inspired by
> [Volcano|https://paperhub.s3.amazonaws.com/dace52a42c07f7f8348b08dc2b186061.pdf].
> The above issue is a flaw in rule-based planners and was identified as
> early as the [Cascades query framework paper|https://www.csd.uoc.gr/~hy460/pdf/CascadesFrameworkForQueryOptimization.pdf],
> which was the follow-up to Volcano.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
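The placement requirements in DRILL-7558 can be expressed as a simple ordering check. This sketch assumes planning is a linear sequence of named phases (the phase names are hypothetical placeholders, not Drill's actual phases) and ignores any cost-based behavior inside a phase. `existing_phase_candidates()` finds existing phases that could host push-down without sharing a phase with either dependency; `new_phase_positions()` lists where a new, dedicated phase could be inserted instead.

```python
from typing import List, Set

def existing_phase_candidates(phases: List[str], after: Set[str],
                              before: Set[str]) -> List[str]:
    """Existing phases that run strictly after every phase in `after` and
    strictly before every phase in `before`. Requiring "strictly" keeps
    push-down out of any phase it depends on, or that depends on it."""
    return [p for j, p in enumerate(phases)
            if after <= set(phases[:j]) and before <= set(phases[j + 1:])]

def new_phase_positions(phases: List[str], after: Set[str],
                        before: Set[str]) -> List[int]:
    """Indices at which a new, dedicated push-down phase could be inserted."""
    return [i for i in range(len(phases) + 1)
            if after <= set(phases[:i]) and before <= set(phases[i:])]

if __name__ == "__main__":
    # Hypothetical phase names, for illustration only.
    after = {"logical rewrites", "join equivalence"}
    before = {"choose parallelism"}
    phases = ["logical rewrites", "join equivalence", "choose parallelism"]
    print(existing_phase_candidates(phases, after, before))  # [] -> no existing phase fits
    print(new_phase_positions(phases, after, before))        # [2] -> insert a new phase
```

If some unrelated phase already ran between join equivalence and the parallelism decision, the first function would return it as a candidate; otherwise the second function shows where a new phase would have to go.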
[jira] [Updated] (DRILL-7558) Generalize filter push-down planner phase
[ https://issues.apache.org/jira/browse/DRILL-7558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Girish updated DRILL-7558:
-----------------------------------
Fix Version/s: (was: 1.18.0) 1.19.0

> Generalize filter push-down planner phase
> -----------------------------------------
>
> Key: DRILL-7558
> URL: https://issues.apache.org/jira/browse/DRILL-7558
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.18.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.19.0
[jira] [Updated] (DRILL-7558) Generalize filter push-down planner phase
[ https://issues.apache.org/jira/browse/DRILL-7558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Girish updated DRILL-7558:
-----------------------------------
Target Version/s: 1.19.0

> Generalize filter push-down planner phase
> -----------------------------------------
>
> Key: DRILL-7558
> URL: https://issues.apache.org/jira/browse/DRILL-7558
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.18.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.18.0