[jira] [Updated] (CALCITE-963) Hoist literals
[ https://issues.apache.org/jira/browse/CALCITE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Reynolds updated CALCITE-963: --- Attachment: (was: HoistedVariables.png) > Hoist literals > -- > > Key: CALCITE-963 > URL: https://issues.apache.org/jira/browse/CALCITE-963 > Project: Calcite > Issue Type: Bug >Reporter: Julian Hyde >Priority: Major > Labels: pull-request-available > Attachments: HoistedVariables.png > > Time Spent: 10m > Remaining Estimate: 0h > > Convert literals into (internal) bind variables so that statements that > differ only in literal values can be executed using the same plan. > In [mail > thread|http://mail-archives.apache.org/mod_mbox/calcite-dev/201511.mbox/%3c56437bf8.70...@gmail.com%3E] > Homer wrote: > {quote}Imagine that it is common to run a large number of very similar > machine generated queries that just change the literals in the sql query. > For example (the real queries would be much more complex): > {code}Select * from emp where empno = 1; > Select * from emp where empno = 2; > etc.{code} > The plan that is likely being generated for these kind of queries is going to > be very much the same each time, so to save some time, I would like to > recognize that the literals are all that have changed in a query and use the > previously optimized execution plan and just replace the literals.{quote} > I think this could be done as a transform on the initial RelNode tree. It > would find literals (RexLiteral), replace them with bind variables > (RexDynamicParam) and write the value into a pool. The next statement would > go through the same process and the RelNode tree would be identical, but with > possibly different values for the bind variables. > The bind variables are of course internal; not visible from JDBC. When the > statement is executed, the bind variables are implicitly bound. > Statements would be held in a Guava cache. > This would be enabled by a config parameter. Unfortunately I don't think we > could do this by default -- we'd lose optimization power because we would no > longer be able to do constant reduction. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (CALCITE-963) Hoist literals
[ https://issues.apache.org/jira/browse/CALCITE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932829#comment-16932829 ] Scott Reynolds edited comment on CALCITE-963 at 9/18/19 8:57 PM: - h1. Goal When a query is issued to Calcite it is parsed, optimized and then generates a String of Java Class that implements {{Bindable}}. {{EnumerableInterpretable}} creates this string and checks to see if that string exists in {{com.google.common.cache}} and if it doesn't it will call into a Java compiler. Compilation process can take a considerable amount of time, Apache Kylin reported 50 to 150ms of additional computation time. Today, Apache Calcite will generate unique Java Class strings whenever any part of the query changes. This document details out the design and implementation of a hoisting technique within Apache Calcite. This design and implementation greatly increases the cache hit rate of {{EnumerableInterpretable}}'s {{BINDABLE_CACHE}}. h1. Non Goals This implementation is not designed to change the planning process. It does not transform {{RexLiteral}} into {{RexDynamicParam}}, and doesn't change the cost calculation of the query. h1. Implementation Details After a query has been optimized there are three phases that remaining phases to the query: # Generating the Java code # Binding Hoisted Variables # Runtime execution via {{Bindable.bind(DataContext, HoistedVariables)}} Each of these phases will interact with a new class called {{HoistedVariables}} !HoistedVariables.png! Each of these methods are used in the above three phases to hoist a variable from within the query into the runtime execution of the {{Bindable}}. The method {{implement}} of the interface {{EnumerableRel}} is used to generate the Java code in phase one. Each of these {{RelNode}} can now call {{registerVariable(String)}} to allocate a {{Slot}} for their unbound value. This {{Slot}} is reserved for their use and is unique for the query plan. When a {{RelNode}} registers a variable it needs to save that {{Slot}} into a property so it can be referenced in phase 2. This {{Slot}} is then referenced in code generation by calling {{EnumerableRel.lookupValue}} which returns an {{Expression}} that will extract the bound value at for the {{Slot}}. Below is a snippet from {{EnumerableLimit}} implementation of {{implement}} that uses {{HoistedVariables}}. {code:java} Expression v = builder.append("child", result.block); if (offset != null) { if (offset instanceof RexDynamicParam) { v = getDynamicExpression((RexDynamicParam) offset); } else { // Register with Hoisted Variable here offsetIndex = variables.registerVariable("offset"); v = builder.append( "offset", Expressions.call( v, BuiltInMethod.SKIP.method, // At runtime, fetch the bound variable. This returns the Java code to do that. EnumerableRel.lookupValue(offsetIndex, Integer.class))); } } if (fetch != null) { if (fetch instanceof RexDynamicParam) { v = getDynamicExpression((RexDynamicParam) fetch); } else { // Register with Hoisted Variable here this.fetchIndex = variables.registerVariable("fetch"); v = builder.append( "fetch", Expressions.call( v, BuiltInMethod.TAKE.method, // At runtime, fetch the bound variable. This returns the Java code to do that. EnumerableRel.lookupValue(fetchIndex, Integer.class))); } } {code} The second phase of the query execution is where registered {{Slots}} get bound. To this, our change adds a new optional method to {{Bindable}} called {{hoistVariables}}. This method is where an instance of {{EnumerableRel}} extracts the values out of the query plan and binds them into the {{HoistedVariables}} instance just prior to executing the query. Below is {{EnumerableLimit}} implementation: {code:java} @Override public void hoistedVariables(HoistedVariables variables) { getInputs() .stream() .forEach(rel -> { final EnumerableRel enumerable = (EnumerableRel) rel; enumerable.hoistedVariables(variables); }); if (fetchIndex != null) { // fetchIndex is the registered slot for this variable. Bind fetchIndex to fetch variables.setVariable(fetchIndex, RexLiteral.intValue(fetch)); } if (offsetIndex != null) { // offsetIndex is the registered slot for this variable. Bind offsetIndex to offset. variables.setVariable(offsetIndex, RexLiteral.intValue(offset)); } } {code} To tie these three phases together, {{CalcitePrepareImpl}} needs to setup the variables when it creates a {{PreparedResult}}: {code:java} try { CatalogReader.THREAD_LOCAL.set(catalogReader); final SqlConformance conformance = context.config().conformance(); internalParameters.put("_conformance", conformance); // Get the compiled Bindable instance either from cache or generate a new one.
[jira] [Commented] (CALCITE-963) Hoist literals
[ https://issues.apache.org/jira/browse/CALCITE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932891#comment-16932891 ] Scott Reynolds commented on CALCITE-963: {quote} * I was surprised that HoistedVariables was added to so many API calls. Maybe it could belong to another piece of the preparation context{quote} I was tempted to add it to {{DataContext}}. It does have to land into {{CalciteSignature}} {code:java} public CalciteSignature(String sql, List parameterList, Map internalParameters, RelDataType rowType, List columns, Meta.CursorFactory cursorFactory, CalciteSchema rootSchema, List collationList, long maxRowCount, Bindable bindable, Meta.StatementType statementType, HoistedVariables variables) { super(columns, sql, parameterList, internalParameters, cursorFactory, statementType); this.rowType = rowType; this.rootSchema = rootSchema; this.collationList = collationList; this.maxRowCount = maxRowCount; this.bindable = bindable; this.variables = variables; } public Enumerable enumerable(DataContext dataContext) { Enumerable enumerable = bindable.bind(dataContext, variables); if (maxRowCount >= 0) { // Apply limit. In JDBC 0 means "no limit". But for us, -1 means // "no limit", and 0 is a valid limit. enumerable = EnumerableDefaults.take(enumerable, maxRowCount); } return enumerable; } {code} {quote}I was surprised that this work touched so much Enumerable code. An alternative approach would be to transform a RelNode tree, early in the planning process, transforming some RexLiteral instances into RexDynamicParam. The rest of the planning process would proceed as if the user had provided a statement with bind variables. I know you state that this was a non-goal, but why was it not a goal? It probably would have been a lot simpler, and it would have worked with conventions besides Enumerable.{quote} Ya I should add color to this. Given the discussions on the mailing list I would like to add an optimization to our use case. Our Fact Tables contain a single account hierarchy represented by a column {{sub_account}}. Often a Fact will not be associated with an {{sub_account}} and so instead of storing that as null and dealing with missing right hand row, we store a sentinel value in the Fact so our joins to the account dimension are simpler. So when a query comes in that for all Facts that do not contain a sub account, we want to change the cost calculation of that filter as {{sub_account_sid == 'IS_NULL'}} is not as selective as {{sub_account_sid = 'AC128485'}}. So my understanding is, by translating into a {{RexDynamicParam}}, we lose out on this optimization. > Hoist literals > -- > > Key: CALCITE-963 > URL: https://issues.apache.org/jira/browse/CALCITE-963 > Project: Calcite > Issue Type: Bug >Reporter: Julian Hyde >Priority: Major > Labels: pull-request-available > Attachments: HoistedVariables.png > > Time Spent: 10m > Remaining Estimate: 0h > > Convert literals into (internal) bind variables so that statements that > differ only in literal values can be executed using the same plan. > In [mail > thread|http://mail-archives.apache.org/mod_mbox/calcite-dev/201511.mbox/%3c56437bf8.70...@gmail.com%3E] > Homer wrote: > {quote}Imagine that it is common to run a large number of very similar > machine generated queries that just change the literals in the sql query. > For example (the real queries would be much more complex): > {code}Select * from emp where empno = 1; > Select * from emp where empno = 2; > etc.{code} > The plan that is likely being generated for these kind of queries is going to > be very much the same each time, so to save some time, I would like to > recognize that the literals are all that have changed in a query and use the > previously optimized execution plan and just replace the literals.{quote} > I think this could be done as a transform on the initial RelNode tree. It > would find literals (RexLiteral), replace them with bind variables > (RexDynamicParam) and write the value into a pool. The next statement would > go through the same process and the RelNode tree would be identical, but with > possibly different values for the bind variables. > The bind variables are of course internal; not visible from JDBC. When the > statement is executed, the bind variables are implicitly bound. > Statements would be held in a Guava cache. > This would be enabled by a config parameter. Unfortunately I don't think we > could do this by default -- we'd lose optimization power because we would no > longer be able to do constant reduction. -- This message was
[jira] [Commented] (CALCITE-963) Hoist literals
[ https://issues.apache.org/jira/browse/CALCITE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932829#comment-16932829 ] Scott Reynolds commented on CALCITE-963: h1. Goal When a query is issued to Calcite it is parsed, optimized and then generates a String of Java Class that implements {{Bindable}}. {{EnumerableInterpretable}} creates this string and checks to see if that string exists in {{com.google.common.cache}} and if it doesn't it will call into a Java compiler. Compilation process can take a considerable amount of time, Apache Kylin reported 50 to 150ms of additional computation time. Today, Apache Calcite will generate unique Java Class strings whenever any part of the query changes. This document details out the design and implementation of a hoisting technique within Apache Calcite. This design and implementation greatly increases the cache hit rate of {{EnumerableInterpretable}}'s {{BINDABLE_CACHE}}. h1. Non Goals This implementation is not designed to change the planning process. It does not transform {{RexLiteral}} into {{RexDynamicParam}}, and doesn't change the cost calculation of the query. h1. Implementation Details After a query has been optimized there are three phases that remaining phases to the query: # Generating the Java code # Binding Hoisted Variables # Runtime execution via {{Bindable.bind(DataContext, HoistedVariables)}} Each of these phases will interact with a new class called {{HoistedVariables}} [file:HoistedVariables.png|file:///HoistedVariables.png] Each of these methods are used in the above three phases to hoist a variable from within the query into the runtime execution of the {{Bindable}}. The method {{implement}} of the interface {{EnumerableRel}} is used to generate the Java code in phase one. Each of these {{RelNode}} can now call {{registerVariable(String)}} to allocate a {{Slot}} for their unbound value. This {{Slot}} is reserved for their use and is unique for the query plan. When a {{RelNode}} registers a variable it needs to save that {{Slot}} into a property so it can be referenced in phase 2. This {{Slot}} is then referenced in code generation by calling {{EnumerableRel.lookupValue}} which returns an {{Expression}} that will extract the bound value at for the {{Slot}}. Below is a snippet from {{EnumerableLimit}} implementation of {{implement}} that uses {{HoistedVariables}}. {code:java} Expression v = builder.append("child", result.block); if (offset != null) { if (offset instanceof RexDynamicParam) { v = getDynamicExpression((RexDynamicParam) offset); } else { // Register with Hoisted Variable here offsetIndex = variables.registerVariable("offset"); v = builder.append( "offset", Expressions.call( v, BuiltInMethod.SKIP.method, // At runtime, fetch the bound variable. This returns the Java code to do that. EnumerableRel.lookupValue(offsetIndex, Integer.class))); } } if (fetch != null) { if (fetch instanceof RexDynamicParam) { v = getDynamicExpression((RexDynamicParam) fetch); } else { // Register with Hoisted Variable here this.fetchIndex = variables.registerVariable("fetch"); v = builder.append( "fetch", Expressions.call( v, BuiltInMethod.TAKE.method, // At runtime, fetch the bound variable. This returns the Java code to do that. EnumerableRel.lookupValue(fetchIndex, Integer.class))); } } {code} The second phase of the query execution is where registered {{Slots}} get bound. To this, our change adds a new optional method to {{Bindable}} called {{hoistVariables}}. This method is where an instance of {{EnumerableRel}} extracts the values out of the query plan and binds them into the {{HoistedVariables}} instance just prior to executing the query. Below is {{EnumerableLimit}} implementation: {code:java} @Override public void hoistedVariables(HoistedVariables variables) { getInputs() .stream() .forEach(rel -> { final EnumerableRel enumerable = (EnumerableRel) rel; enumerable.hoistedVariables(variables); }); if (fetchIndex != null) { // fetchIndex is the registered slot for this variable. Bind fetchIndex to fetch variables.setVariable(fetchIndex, RexLiteral.intValue(fetch)); } if (offsetIndex != null) { // offsetIndex is the registered slot for this variable. Bind offsetIndex to offset. variables.setVariable(offsetIndex, RexLiteral.intValue(offset)); } } {code} To tie these three phases together, {{CalcitePrepareImpl}} needs to setup the variables when it creates a {{PreparedResult}}: {code:java} try { CatalogReader.THREAD_LOCAL.set(catalogReader); final SqlConformance conformance = context.config().conformance(); internalParameters.put("_conformance", conformance); // Get the compiled Bindable instance either from cache or generate a new one. bindable =
[jira] [Comment Edited] (CALCITE-963) Hoist literals
[ https://issues.apache.org/jira/browse/CALCITE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932829#comment-16932829 ] Scott Reynolds edited comment on CALCITE-963 at 9/18/19 8:56 PM: - h1. Goal When a query is issued to Calcite it is parsed, optimized and then generates a String of Java Class that implements {{Bindable}}. {{EnumerableInterpretable}} creates this string and checks to see if that string exists in {{com.google.common.cache}} and if it doesn't it will call into a Java compiler. Compilation process can take a considerable amount of time, Apache Kylin reported 50 to 150ms of additional computation time. Today, Apache Calcite will generate unique Java Class strings whenever any part of the query changes. This document details out the design and implementation of a hoisting technique within Apache Calcite. This design and implementation greatly increases the cache hit rate of {{EnumerableInterpretable}}'s {{BINDABLE_CACHE}}. h1. Non Goals This implementation is not designed to change the planning process. It does not transform {{RexLiteral}} into {{RexDynamicParam}}, and doesn't change the cost calculation of the query. h1. Implementation Details After a query has been optimized there are three phases that remaining phases to the query: # Generating the Java code # Binding Hoisted Variables # Runtime execution via {{Bindable.bind(DataContext, HoistedVariables)}} Each of these phases will interact with a new class called {{HoistedVariables}} Each of these methods are used in the above three phases to hoist a variable from within the query into the runtime execution of the {{Bindable}}. The method {{implement}} of the interface {{EnumerableRel}} is used to generate the Java code in phase one. Each of these {{RelNode}} can now call {{registerVariable(String)}} to allocate a {{Slot}} for their unbound value. This {{Slot}} is reserved for their use and is unique for the query plan. When a {{RelNode}} registers a variable it needs to save that {{Slot}} into a property so it can be referenced in phase 2. This {{Slot}} is then referenced in code generation by calling {{EnumerableRel.lookupValue}} which returns an {{Expression}} that will extract the bound value at for the {{Slot}}. Below is a snippet from {{EnumerableLimit}} implementation of {{implement}} that uses {{HoistedVariables}}. {code:java} Expression v = builder.append("child", result.block); if (offset != null) { if (offset instanceof RexDynamicParam) { v = getDynamicExpression((RexDynamicParam) offset); } else { // Register with Hoisted Variable here offsetIndex = variables.registerVariable("offset"); v = builder.append( "offset", Expressions.call( v, BuiltInMethod.SKIP.method, // At runtime, fetch the bound variable. This returns the Java code to do that. EnumerableRel.lookupValue(offsetIndex, Integer.class))); } } if (fetch != null) { if (fetch instanceof RexDynamicParam) { v = getDynamicExpression((RexDynamicParam) fetch); } else { // Register with Hoisted Variable here this.fetchIndex = variables.registerVariable("fetch"); v = builder.append( "fetch", Expressions.call( v, BuiltInMethod.TAKE.method, // At runtime, fetch the bound variable. This returns the Java code to do that. EnumerableRel.lookupValue(fetchIndex, Integer.class))); } } {code} The second phase of the query execution is where registered {{Slots}} get bound. To this, our change adds a new optional method to {{Bindable}} called {{hoistVariables}}. This method is where an instance of {{EnumerableRel}} extracts the values out of the query plan and binds them into the {{HoistedVariables}} instance just prior to executing the query. Below is {{EnumerableLimit}} implementation: {code:java} @Override public void hoistedVariables(HoistedVariables variables) { getInputs() .stream() .forEach(rel -> { final EnumerableRel enumerable = (EnumerableRel) rel; enumerable.hoistedVariables(variables); }); if (fetchIndex != null) { // fetchIndex is the registered slot for this variable. Bind fetchIndex to fetch variables.setVariable(fetchIndex, RexLiteral.intValue(fetch)); } if (offsetIndex != null) { // offsetIndex is the registered slot for this variable. Bind offsetIndex to offset. variables.setVariable(offsetIndex, RexLiteral.intValue(offset)); } } {code} To tie these three phases together, {{CalcitePrepareImpl}} needs to setup the variables when it creates a {{PreparedResult}}: {code:java} try { CatalogReader.THREAD_LOCAL.set(catalogReader); final SqlConformance conformance = context.config().conformance(); internalParameters.put("_conformance", conformance); // Get the compiled Bindable instance either from cache or generate a new one. bindable =
[jira] [Comment Edited] (CALCITE-963) Hoist literals
[ https://issues.apache.org/jira/browse/CALCITE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932829#comment-16932829 ] Scott Reynolds edited comment on CALCITE-963 at 9/18/19 9:09 PM: - h1. Goal When a query is issued to Calcite it is parsed, optimized and then generates a Java Class that implements {{Bindable}}. {{EnumerableInterpretable. }}This class is then checked to see if it exists in {{com.google.common.cache}} and if it doesn't it will call into a Java compiler. Compilation process can take a considerable amount of time, Apache Kylin reported 50 to 150ms of additional computation time. Today, Apache Calcite will generate unique Java Class strings whenever any part of the query changes. This document details out the design and implementation of a hoisting technique within Apache Calcite. This design and implementation greatly increases the cache hit rate of {{EnumerableInterpretable}}'s {{BINDABLE_CACHE}}. h1. Non Goals This implementation is not designed to change the planning process. It does not transform {{RexLiteral}} into {{RexDynamicParam}}, and doesn't change the cost calculation of the query. h1. Implementation Details After a query has been optimized there are three phases that remaining phases to the query: # Generating the Java code # Binding Hoisted Variables # Runtime execution via {{Bindable.bind(DataContext, HoistedVariables)}} Each of these phases will interact with a new class called {{HoistedVariables}} !HoistedVariables.png! Each of these methods are used in the above three phases to hoist a variable from within the query into the runtime execution of the {{Bindable}}. The method {{implement}} of the interface {{EnumerableRel}} is used to generate the Java code in phase one. Each of these {{RelNode}} can now call {{registerVariable(String)}} to allocate a {{Slot}} for their unbound value. This {{Slot}} is reserved for their use and is unique for the query plan. When a {{RelNode}} registers a variable it needs to save that {{Slot}} into a property so it can be referenced in phase 2. This {{Slot}} is then referenced in code generation by calling {{EnumerableRel.lookupValue}} which returns an {{Expression}} that will extract the bound value at for the {{Slot}}. Below is a snippet from {{EnumerableLimit}} implementation of {{implement}} that uses {{HoistedVariables}}. {code:java} Expression v = builder.append("child", result.block); if (offset != null) { if (offset instanceof RexDynamicParam) { v = getDynamicExpression((RexDynamicParam) offset); } else { // Register with Hoisted Variable here offsetIndex = variables.registerVariable("offset"); v = builder.append( "offset", Expressions.call( v, BuiltInMethod.SKIP.method, // At runtime, fetch the bound variable. This returns the Java code to do that. EnumerableRel.lookupValue(offsetIndex, Integer.class))); } } if (fetch != null) { if (fetch instanceof RexDynamicParam) { v = getDynamicExpression((RexDynamicParam) fetch); } else { // Register with Hoisted Variable here this.fetchIndex = variables.registerVariable("fetch"); v = builder.append( "fetch", Expressions.call( v, BuiltInMethod.TAKE.method, // At runtime, fetch the bound variable. This returns the Java code to do that. EnumerableRel.lookupValue(fetchIndex, Integer.class))); } } {code} The second phase of the query execution is where registered {{Slots}} get bound. To this, our change adds a new optional method to {{Bindable}} called {{hoistVariables}}. This method is where an instance of {{EnumerableRel}} extracts the values out of the query plan and binds them into the {{HoistedVariables}} instance just prior to executing the query. Below is {{EnumerableLimit}} implementation: {code:java} @Override public void hoistedVariables(HoistedVariables variables) { getInputs() .stream() .forEach(rel -> { final EnumerableRel enumerable = (EnumerableRel) rel; enumerable.hoistedVariables(variables); }); if (fetchIndex != null) { // fetchIndex is the registered slot for this variable. Bind fetchIndex to fetch variables.setVariable(fetchIndex, RexLiteral.intValue(fetch)); } if (offsetIndex != null) { // offsetIndex is the registered slot for this variable. Bind offsetIndex to offset. variables.setVariable(offsetIndex, RexLiteral.intValue(offset)); } } {code} To tie these three phases together, {{CalcitePrepareImpl}} needs to setup the variables when it creates a {{PreparedResult}}: {code:java} try { CatalogReader.THREAD_LOCAL.set(catalogReader); final SqlConformance conformance = context.config().conformance(); internalParameters.put("_conformance", conformance); // Get the compiled Bindable instance either from cache or generate a new one. bindable =
[jira] [Comment Edited] (CALCITE-963) Hoist literals
[ https://issues.apache.org/jira/browse/CALCITE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932829#comment-16932829 ] Scott Reynolds edited comment on CALCITE-963 at 9/18/19 9:09 PM: - h1. Goal When a query is issued to Calcite it is parsed, optimized and then generates a Java Class that implements {{Bindable}}. {{EnumerableInterpretable. This class is then checked to see if it exists in {{com.google.common.cache and if it doesn't it will call into a Java compiler. Compilation process can take a considerable amount of time, Apache Kylin reported 50 to 150ms of additional computation time. Today, Apache Calcite will generate unique Java Class strings whenever any part of the query changes. This document details out the design and implementation of a hoisting technique within Apache Calcite. This design and implementation greatly increases the cache hit rate of {{EnumerableInterpretable}}'s {{BINDABLE_CACHE}}. h1. Non Goals This implementation is not designed to change the planning process. It does not transform {{RexLiteral}} into {{RexDynamicParam}}, and doesn't change the cost calculation of the query. h1. Implementation Details After a query has been optimized there are three phases that remaining phases to the query: # Generating the Java code # Binding Hoisted Variables # Runtime execution via {{Bindable.bind(DataContext, HoistedVariables)}} Each of these phases will interact with a new class called {{HoistedVariables}} !HoistedVariables.png! Each of these methods are used in the above three phases to hoist a variable from within the query into the runtime execution of the {{Bindable}}. The method {{implement}} of the interface {{EnumerableRel}} is used to generate the Java code in phase one. Each of these {{RelNode}} can now call {{registerVariable(String)}} to allocate a {{Slot}} for their unbound value. This {{Slot}} is reserved for their use and is unique for the query plan. When a {{RelNode}} registers a variable it needs to save that {{Slot}} into a property so it can be referenced in phase 2. This {{Slot}} is then referenced in code generation by calling {{EnumerableRel.lookupValue}} which returns an {{Expression}} that will extract the bound value at for the {{Slot}}. Below is a snippet from {{EnumerableLimit}} implementation of {{implement}} that uses {{HoistedVariables}}. {code:java} Expression v = builder.append("child", result.block); if (offset != null) { if (offset instanceof RexDynamicParam) { v = getDynamicExpression((RexDynamicParam) offset); } else { // Register with Hoisted Variable here offsetIndex = variables.registerVariable("offset"); v = builder.append( "offset", Expressions.call( v, BuiltInMethod.SKIP.method, // At runtime, fetch the bound variable. This returns the Java code to do that. EnumerableRel.lookupValue(offsetIndex, Integer.class))); } } if (fetch != null) { if (fetch instanceof RexDynamicParam) { v = getDynamicExpression((RexDynamicParam) fetch); } else { // Register with Hoisted Variable here this.fetchIndex = variables.registerVariable("fetch"); v = builder.append( "fetch", Expressions.call( v, BuiltInMethod.TAKE.method, // At runtime, fetch the bound variable. This returns the Java code to do that. EnumerableRel.lookupValue(fetchIndex, Integer.class))); } } {code} The second phase of the query execution is where registered {{Slots}} get bound. To this, our change adds a new optional method to {{Bindable}} called {{hoistVariables}}. This method is where an instance of {{EnumerableRel}} extracts the values out of the query plan and binds them into the {{HoistedVariables}} instance just prior to executing the query. Below is {{EnumerableLimit}} implementation: {code:java} @Override public void hoistedVariables(HoistedVariables variables) { getInputs() .stream() .forEach(rel -> { final EnumerableRel enumerable = (EnumerableRel) rel; enumerable.hoistedVariables(variables); }); if (fetchIndex != null) { // fetchIndex is the registered slot for this variable. Bind fetchIndex to fetch variables.setVariable(fetchIndex, RexLiteral.intValue(fetch)); } if (offsetIndex != null) { // offsetIndex is the registered slot for this variable. Bind offsetIndex to offset. variables.setVariable(offsetIndex, RexLiteral.intValue(offset)); } } {code} To tie these three phases together, {{CalcitePrepareImpl}} needs to setup the variables when it creates a {{PreparedResult}}: {code:java} try { CatalogReader.THREAD_LOCAL.set(catalogReader); final SqlConformance conformance = context.config().conformance(); internalParameters.put("_conformance", conformance); // Get the compiled Bindable instance either from cache or generate a new one. bindable =
[jira] [Comment Edited] (CALCITE-963) Hoist literals
[ https://issues.apache.org/jira/browse/CALCITE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932829#comment-16932829 ] Scott Reynolds edited comment on CALCITE-963 at 9/18/19 9:14 PM: - h1. Goal When a query is issued to Calcite it is parsed, optimized and then generates a Java Class that implements {{Bindable}}. {{EnumerableInterpretable}}. This class is then checked to see if it exists in {{com.google.common.cache}} and if it doesn't it will call into a Java compiler. Compilation process can take a considerable amount of time, Apache Kylin reported 50 to 150ms of additional computation time. Today, Apache Calcite will generate unique Java Class strings whenever any part of the query changes. This document details out the design and implementation of a hoisting technique within Apache Calcite. This design and implementation greatly increases the cache hit rate of {{EnumerableInterpretable}}'s {{BINDABLE_CACHE}}. h1. Non Goals This implementation is not designed to change the planning process. It does not transform {{RexLiteral}} into {{RexDynamicParam}}, and doesn't change the cost calculation of the query. h1. Implementation Details After a query has been optimized there are three phases that remaining phases to the query: # Generating the Java code # Binding Hoisted Variables # Runtime execution via {{Bindable.bind(DataContext, HoistedVariables)}} Each of these phases will interact with a new class called {{HoistedVariables}} !HoistedVariables.png! Each of these methods are used in the above three phases to hoist a variable from within the query into the runtime execution of the {{Bindable}}. The method {{implement}} of the interface {{EnumerableRel}} is used to generate the Java code in phase one. Each of these {{RelNode}} can now call {{registerVariable(String)}} to allocate a {{Slot}} for their unbound value. This {{Slot}} is reserved for their use and is unique for the query plan. When a {{RelNode}} registers a variable it needs to save that {{Slot}} into a property so it can be referenced in phase 2. This {{Slot}} is then referenced in code generation by calling {{EnumerableRel.lookupValue}} which returns an {{Expression}} that will extract the bound value at for the {{Slot}}. Below is a snippet from {{EnumerableLimit}} implementation of {{implement}} that uses {{HoistedVariables}}. {code:java} Expression v = builder.append("child", result.block); if (offset != null) { if (offset instanceof RexDynamicParam) { v = getDynamicExpression((RexDynamicParam) offset); } else { // Register with Hoisted Variable here offsetIndex = variables.registerVariable("offset"); v = builder.append( "offset", Expressions.call( v, BuiltInMethod.SKIP.method, // At runtime, fetch the bound variable. This returns the Java code to do that. EnumerableRel.lookupValue(offsetIndex, Integer.class))); } } if (fetch != null) { if (fetch instanceof RexDynamicParam) { v = getDynamicExpression((RexDynamicParam) fetch); } else { // Register with Hoisted Variable here this.fetchIndex = variables.registerVariable("fetch"); v = builder.append( "fetch", Expressions.call( v, BuiltInMethod.TAKE.method, // At runtime, fetch the bound variable. This returns the Java code to do that. EnumerableRel.lookupValue(fetchIndex, Integer.class))); } } {code} The second phase of the query execution is where registered {{Slots}} get bound. To this, our change adds a new optional method to {{Bindable}} called {{hoistVariables}}. This method is where an instance of {{EnumerableRel}} extracts the values out of the query plan and binds them into the {{HoistedVariables}} instance just prior to executing the query. Below is {{EnumerableLimit}} implementation: {code:java} @Override public void hoistedVariables(HoistedVariables variables) { getInputs() .stream() .forEach(rel -> { final EnumerableRel enumerable = (EnumerableRel) rel; enumerable.hoistedVariables(variables); }); if (fetchIndex != null) { // fetchIndex is the registered slot for this variable. Bind fetchIndex to fetch variables.setVariable(fetchIndex, RexLiteral.intValue(fetch)); } if (offsetIndex != null) { // offsetIndex is the registered slot for this variable. Bind offsetIndex to offset. variables.setVariable(offsetIndex, RexLiteral.intValue(offset)); } } {code} To tie these three phases together, {{CalcitePrepareImpl}} needs to setup the variables when it creates a {{PreparedResult}}: {code:java} try { CatalogReader.THREAD_LOCAL.set(catalogReader); final SqlConformance conformance = context.config().conformance(); internalParameters.put("_conformance", conformance); // Get the compiled Bindable instance either from cache or generate a new one. bindable =
[jira] [Commented] (CALCITE-963) Hoist literals
[ https://issues.apache.org/jira/browse/CALCITE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932924#comment-16932924 ] Scott Reynolds commented on CALCITE-963: Giving it some more thought, going to take another stab at this and just do the {{RexDynamicParam}}. All of these changes and breakage is a good sign this approach is fighting against the system. I will try another pass tomorrow. > Hoist literals > -- > > Key: CALCITE-963 > URL: https://issues.apache.org/jira/browse/CALCITE-963 > Project: Calcite > Issue Type: Bug >Reporter: Julian Hyde >Priority: Major > Labels: pull-request-available > Attachments: HoistedVariables.png > > Time Spent: 10m > Remaining Estimate: 0h > > Convert literals into (internal) bind variables so that statements that > differ only in literal values can be executed using the same plan. > In [mail > thread|http://mail-archives.apache.org/mod_mbox/calcite-dev/201511.mbox/%3c56437bf8.70...@gmail.com%3E] > Homer wrote: > {quote}Imagine that it is common to run a large number of very similar > machine generated queries that just change the literals in the sql query. > For example (the real queries would be much more complex): > {code}Select * from emp where empno = 1; > Select * from emp where empno = 2; > etc.{code} > The plan that is likely being generated for these kind of queries is going to > be very much the same each time, so to save some time, I would like to > recognize that the literals are all that have changed in a query and use the > previously optimized execution plan and just replace the literals.{quote} > I think this could be done as a transform on the initial RelNode tree. It > would find literals (RexLiteral), replace them with bind variables > (RexDynamicParam) and write the value into a pool. The next statement would > go through the same process and the RelNode tree would be identical, but with > possibly different values for the bind variables. > The bind variables are of course internal; not visible from JDBC. When the > statement is executed, the bind variables are implicitly bound. > Statements would be held in a Guava cache. > This would be enabled by a config parameter. Unfortunately I don't think we > could do this by default -- we'd lose optimization power because we would no > longer be able to do constant reduction. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-963) Hoist literals
[ https://issues.apache.org/jira/browse/CALCITE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934792#comment-16934792 ] Scott Reynolds commented on CALCITE-963: So took a break from being in this code base and came back with a fresh brain and eyes. Went back over my original notes and noticed I hadn't explored what {{EnumerableRelImplementor#stash}} method did. After reviewing that method I determined that is *exactly* what I needed during code compilation. This method takes an object stores it in {{DataContext}} and returns an {{Expression}} that calls {{DataContext#get}} to pull out the object. This generates the same code for each query of the format like: {code:sql} Select * from emp where empno = 1; Select * from emp where empno = 2; {code} as it stashes all the predicates via {{EnumerableRelImplementaor#stash}} This is awesome! and this pull request and design is objectively worse then this implementation. So sorry for the digression in this ticket, it appears I : a.) Don't understand what variable hoisting really is b.) Didn't spend enough time researching and understanding all the tools Calcite provides. > Hoist literals > -- > > Key: CALCITE-963 > URL: https://issues.apache.org/jira/browse/CALCITE-963 > Project: Calcite > Issue Type: Bug >Reporter: Julian Hyde >Priority: Major > Labels: pull-request-available > Attachments: HoistedVariables.png > > Time Spent: 0.5h > Remaining Estimate: 0h > > Convert literals into (internal) bind variables so that statements that > differ only in literal values can be executed using the same plan. > In [mail > thread|http://mail-archives.apache.org/mod_mbox/calcite-dev/201511.mbox/%3c56437bf8.70...@gmail.com%3E] > Homer wrote: > {quote}Imagine that it is common to run a large number of very similar > machine generated queries that just change the literals in the sql query. > For example (the real queries would be much more complex): > {code}Select * from emp where empno = 1; > Select * from emp where empno = 2; > etc.{code} > The plan that is likely being generated for these kind of queries is going to > be very much the same each time, so to save some time, I would like to > recognize that the literals are all that have changed in a query and use the > previously optimized execution plan and just replace the literals.{quote} > I think this could be done as a transform on the initial RelNode tree. It > would find literals (RexLiteral), replace them with bind variables > (RexDynamicParam) and write the value into a pool. The next statement would > go through the same process and the RelNode tree would be identical, but with > possibly different values for the bind variables. > The bind variables are of course internal; not visible from JDBC. When the > statement is executed, the bind variables are implicitly bound. > Statements would be held in a Guava cache. > This would be enabled by a config parameter. Unfortunately I don't think we > could do this by default -- we'd lose optimization power because we would no > longer be able to do constant reduction. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CALCITE-3508) Strengthen outer Join to inner if it is under a Filter that discards null values
[ https://issues.apache.org/jira/browse/CALCITE-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Reynolds updated CALCITE-3508: Description: Today, FilterJoinRule given an Outer Join the rule strengthens it to Inner Join when the nullable side contains a filter IS_NOT_NULL. Below is the code. {code:java} for (RexNode filter : aboveFilters) { if (joinType.generatesNullsOnLeft() && Strong.isNotTrue(filter, leftBitmap)) { joinType = joinType.cancelNullsOnLeft(); } if (joinType.generatesNullsOnRight() && Strong.isNotTrue(filter, rightBitmap)) { joinType = joinType.cancelNullsOnRight(); } if (!joinType.isOuterJoin()) { break; } } {code} This code looks at the filter to determine if it is always true, then it can alter the join type by removing the null on that side. We can see this in the following test RelOptRules#testStrengthenJoinType, which executes the following SQL that transforms from a LEFT OUTER JOIN to an INNER JOIN {code:sql} select * from dept left join emp on dept.deptno = emp.deptno where emp.deptno is not null and emp.sal > 100 {code} This ticket is about broadening the application of this rule to a sql like the following: {code:sql} select * from dept left join emp on dept.deptno = emp.deptno where emp.sal > 100 {code} This originally came up on the mailing list: [https://mail-archives.apache.org/mod_mbox/calcite-dev/201909.mbox/%3CCAGzrZ38hh%2B8B8TG8gVHUfVKunJZ4L%2BK7rmrinQxpwOHcdJbzGQ%40mail.gmail.com%3E] and in that thread it was pointed out that there are filters that prevent this from being applied: {code:sql} SELECT b.title FROM Book b LEFT JOIN Author a ON b.author = a.id WHERE a.name <> 'Victor' {code} This means we need to ensure we that the OUTER JOIN doesn't contain – for lack of a different term – negation filters. If there is a negation – like NOT_EQUAL – the JOIN cannot be strengthened. was: Today, FilterJoinRule given an Outer Join the rule strengthens it to Inner Join when the nullable side contains a filter IS_NOT_NULL. Below is the code. {code:java} for (RexNode filter : aboveFilters) { if (joinType.generatesNullsOnLeft() && Strong.isNotTrue(filter, leftBitmap)) { joinType = joinType.cancelNullsOnLeft(); } if (joinType.generatesNullsOnRight() && Strong.isNotTrue(filter, rightBitmap)) { joinType = joinType.cancelNullsOnRight(); } if (!joinType.isOuterJoin()) { break; } } {code} This code looks at the filter to determine if it is always true, then it can alter the join type by removing the null on that side. We can see this in the following test RelOptRules#testStrengthenJoinType, which executes the following SQL that transforms from a LEFT OUTER JOIN to an INNER JOIN {code:sql} select * from dept left join emp on dept.deptno = emp.deptno where emp.deptno is not null and emp.sal > 100 {code} This ticket is about broadening the application of this rule to a sql like the following: {code:sql} select * from dept left join emp on dept.deptno = emp.deptno where emp.sal > 100 {code} This originally came up on the mailing list: [https://mail-archives.apache.org/mod_mbox/calcite-dev/201909.mbox/browser] and in that thread it was pointed out that there are filters that prevent this from being applied: {code:sql} SELECT b.title FROM Book b LEFT JOIN Author a ON b.author = a.id WHERE a.name <> 'Victor' {code} This means we need to ensure we that the OUTER JOIN doesn't contain – for lack of a different term – negation filters. If there is a negation – like NOT_EQUAL – the JOIN cannot be strengthened. > Strengthen outer Join to inner if it is under a Filter that discards null > values > > > Key: CALCITE-3508 > URL: https://issues.apache.org/jira/browse/CALCITE-3508 > Project: Calcite > Issue Type: Improvement >Reporter: Scott Reynolds >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Today, FilterJoinRule given an Outer Join the rule strengthens it to Inner > Join when the nullable side contains a filter IS_NOT_NULL. Below is the code. > {code:java} > for (RexNode filter : aboveFilters) { > if (joinType.generatesNullsOnLeft() > && Strong.isNotTrue(filter, leftBitmap)) { > joinType = joinType.cancelNullsOnLeft(); > } > if (joinType.generatesNullsOnRight() > && Strong.isNotTrue(filter, rightBitmap)) { > joinType = joinType.cancelNullsOnRight(); > } > if (!joinType.isOuterJoin()) { > break; > } > } > {code} > This code looks at the filter to determine if it is always true, then it can > alter the join type by
[jira] [Commented] (CALCITE-3508) Strengthen outer Join to inner if it is under a Filter that discards null values
[ https://issues.apache.org/jira/browse/CALCITE-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976210#comment-16976210 ] Scott Reynolds commented on CALCITE-3508: - I am left with one more remaining design question. Today the {{FilterJoinRule}} has the following Javadoc {code:java} /** * Planner rule that pushes filters above and * within a join node into the join node and/or its children nodes. */ {code} We should consider doing one of two things: 1. Updating the javadoc to indicate it can Strengthen the join type 2. Break the Strengthen join type into a separate rule Should we move this logic into a separate rule like we do with {{JoinProjectTransposeRule}} > Strengthen outer Join to inner if it is under a Filter that discards null > values > > > Key: CALCITE-3508 > URL: https://issues.apache.org/jira/browse/CALCITE-3508 > Project: Calcite > Issue Type: Improvement >Reporter: Scott Reynolds >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > Today, FilterJoinRule given an Outer Join the rule strengthens it to Inner > Join when the nullable side contains a filter IS_NOT_NULL. Below is the code. > {code:java} > for (RexNode filter : aboveFilters) { > if (joinType.generatesNullsOnLeft() > && Strong.isNotTrue(filter, leftBitmap)) { > joinType = joinType.cancelNullsOnLeft(); > } > if (joinType.generatesNullsOnRight() > && Strong.isNotTrue(filter, rightBitmap)) { > joinType = joinType.cancelNullsOnRight(); > } > if (!joinType.isOuterJoin()) { > break; > } > } > {code} > This code looks at the filter to determine if it is always true, then it can > alter the join type by removing the null on that side. > We can see this in the following test RelOptRules#testStrengthenJoinType, > which executes the following SQL that transforms from a LEFT OUTER JOIN to an > INNER JOIN > {code:sql} > select * > from dept left join emp on dept.deptno = emp.deptno > where emp.deptno is not null and emp.sal > 100 > {code} > This ticket is about broadening the application of this rule to a sql like > the following: > {code:sql} > select * > from dept left join emp on dept.deptno = emp.deptno > where emp.sal > 100 > {code} > This originally came up on the mailing list: > [https://mail-archives.apache.org/mod_mbox/calcite-dev/201909.mbox/%3CCAGzrZ38hh%2B8B8TG8gVHUfVKunJZ4L%2BK7rmrinQxpwOHcdJbzGQ%40mail.gmail.com%3E] > and in that thread it was pointed out that there are filters that prevent > this from being applied: > {code:sql} > SELECT b.title > FROM Book b > LEFT JOIN Author a ON b.author = a.id > WHERE a.name <> 'Victor' > {code} > This means we need to ensure we that the OUTER JOIN doesn't contain – for > lack of a different term – negation filters. If there is a negation – like > NOT_EQUAL – the JOIN cannot be strengthened. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CALCITE-3508) Strengthen JOINs when Filter enforces the nullable side(s) to non-nullable
Scott Reynolds created CALCITE-3508: --- Summary: Strengthen JOINs when Filter enforces the nullable side(s) to non-nullable Key: CALCITE-3508 URL: https://issues.apache.org/jira/browse/CALCITE-3508 Project: Calcite Issue Type: Improvement Reporter: Scott Reynolds Today, FilterJoinRule given an Outer Join the rule strengthens it to Inner Join when the nullable side contains a filter IS_NOT_NULL. Below is the code. {code:java} for (RexNode filter : aboveFilters) { if (joinType.generatesNullsOnLeft() && Strong.isNotTrue(filter, leftBitmap)) { joinType = joinType.cancelNullsOnLeft(); } if (joinType.generatesNullsOnRight() && Strong.isNotTrue(filter, rightBitmap)) { joinType = joinType.cancelNullsOnRight(); } if (!joinType.isOuterJoin()) { break; } } {code} This code looks at the filter to determine if it is always true, then it can alter the join type by removing the null on that side. We can see this in the following test RelOptRules#testStrengthenJoinType, which executes the following SQL that transforms from a LEFT OUTER JOIN to an INNER JOIN {code:sql} select * from dept left join emp on dept.deptno = emp.deptno where emp.deptno is not null and emp.sal > 100 {code} This ticket is about broadening the application of this rule to a sql like the following: {code:sql} select * from dept left join emp on dept.deptno = emp.deptno where emp.sal > 100 {code} This originally came up on the mailing list: [https://mail-archives.apache.org/mod_mbox/calcite-dev/201909.mbox/browser] and in that thread it was pointed out that there are filters that prevent this from being applied: {code:sql} SELECT b.title FROM Book b LEFT JOIN Author a ON b.author = a.id WHERE a.name <> 'Victor' {code} This means we need to ensure we that the OUTER JOIN doesn't contain – for lack of a different term – negation filters. If there is a negation – like NOT_EQUAL – the JOIN cannot be strengthened. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CALCITE-3508) Strengthen Outer Joins based on FILTER clauses
[ https://issues.apache.org/jira/browse/CALCITE-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Reynolds updated CALCITE-3508: Summary: Strengthen Outer Joins based on FILTER clauses (was: Strengthen JOINs when Filter enforces the nullable side(s) to non-nullable) > Strengthen Outer Joins based on FILTER clauses > -- > > Key: CALCITE-3508 > URL: https://issues.apache.org/jira/browse/CALCITE-3508 > Project: Calcite > Issue Type: Improvement >Reporter: Scott Reynolds >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Today, FilterJoinRule given an Outer Join the rule strengthens it to Inner > Join when the nullable side contains a filter IS_NOT_NULL. Below is the code. > {code:java} > for (RexNode filter : aboveFilters) { > if (joinType.generatesNullsOnLeft() > && Strong.isNotTrue(filter, leftBitmap)) { > joinType = joinType.cancelNullsOnLeft(); > } > if (joinType.generatesNullsOnRight() > && Strong.isNotTrue(filter, rightBitmap)) { > joinType = joinType.cancelNullsOnRight(); > } > if (!joinType.isOuterJoin()) { > break; > } > } > {code} > This code looks at the filter to determine if it is always true, then it can > alter the join type by removing the null on that side. > We can see this in the following test RelOptRules#testStrengthenJoinType, > which executes the following SQL that transforms from a LEFT OUTER JOIN to an > INNER JOIN > {code:sql} > select * > from dept left join emp on dept.deptno = emp.deptno > where emp.deptno is not null and emp.sal > 100 > {code} > This ticket is about broadening the application of this rule to a sql like > the following: > {code:sql} > select * > from dept left join emp on dept.deptno = emp.deptno > where emp.sal > 100 > {code} > This originally came up on the mailing list: > [https://mail-archives.apache.org/mod_mbox/calcite-dev/201909.mbox/browser] > and in that thread it was pointed out that there are filters that prevent > this from being applied: > {code:sql} > SELECT b.title > FROM Book b > LEFT JOIN Author a ON b.author = a.id > WHERE a.name <> 'Victor' > {code} > This means we need to ensure we that the OUTER JOIN doesn't contain – for > lack of a different term – negation filters. If there is a negation – like > NOT_EQUAL – the JOIN cannot be strengthened. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3508) Strengthen outer Join to inner if it is under a Filter that discards null values
[ https://issues.apache.org/jira/browse/CALCITE-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975780#comment-16975780 ] Scott Reynolds commented on CALCITE-3508: - Thank you for the subject change it is considerably better! Updated pull request title to reflect that as well. Looking at {code:java} JoinProjectTransposeRule{code} I see that it is using {code:java} /** Returns whether all expressions in a list are strong. */ public static boolean allStrong(List operands) {code} Today, this will return true for {code:java} a.name <> 'Victor'{code} . Is that a what we should expect from {noformat}public static boolean isStrong(RexNode){noformat} ? Perhaps we should extend that to cover more cases? {code:java} /** * Returns whether a given expression is strong. * * Examples: * * Returns true for {@code c = 1} since it returns null if and only if * c is null * Returns false for {@code c IS NULL} since it always returns TRUE * or FALSE * * * @param e Expression * @return true if the expression is strong, false otherwise */ public static boolean isStrong(RexNode e) {code} To handle `RexCall`s with negations etc? > Strengthen outer Join to inner if it is under a Filter that discards null > values > > > Key: CALCITE-3508 > URL: https://issues.apache.org/jira/browse/CALCITE-3508 > Project: Calcite > Issue Type: Improvement >Reporter: Scott Reynolds >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Today, FilterJoinRule given an Outer Join the rule strengthens it to Inner > Join when the nullable side contains a filter IS_NOT_NULL. Below is the code. > {code:java} > for (RexNode filter : aboveFilters) { > if (joinType.generatesNullsOnLeft() > && Strong.isNotTrue(filter, leftBitmap)) { > joinType = joinType.cancelNullsOnLeft(); > } > if (joinType.generatesNullsOnRight() > && Strong.isNotTrue(filter, rightBitmap)) { > joinType = joinType.cancelNullsOnRight(); > } > if (!joinType.isOuterJoin()) { > break; > } > } > {code} > This code looks at the filter to determine if it is always true, then it can > alter the join type by removing the null on that side. > We can see this in the following test RelOptRules#testStrengthenJoinType, > which executes the following SQL that transforms from a LEFT OUTER JOIN to an > INNER JOIN > {code:sql} > select * > from dept left join emp on dept.deptno = emp.deptno > where emp.deptno is not null and emp.sal > 100 > {code} > This ticket is about broadening the application of this rule to a sql like > the following: > {code:sql} > select * > from dept left join emp on dept.deptno = emp.deptno > where emp.sal > 100 > {code} > This originally came up on the mailing list: > [https://mail-archives.apache.org/mod_mbox/calcite-dev/201909.mbox/browser] > and in that thread it was pointed out that there are filters that prevent > this from being applied: > {code:sql} > SELECT b.title > FROM Book b > LEFT JOIN Author a ON b.author = a.id > WHERE a.name <> 'Victor' > {code} > This means we need to ensure we that the OUTER JOIN doesn't contain – for > lack of a different term – negation filters. If there is a negation – like > NOT_EQUAL – the JOIN cannot be strengthened. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CALCITE-5025) Update commons-io:commons-io Directory Travesal vulnerabliltiy
Scott Reynolds created CALCITE-5025: --- Summary: Update commons-io:commons-io Directory Travesal vulnerabliltiy Key: CALCITE-5025 URL: https://issues.apache.org/jira/browse/CALCITE-5025 Project: Calcite Issue Type: Bug Reporter: Scott Reynolds Calcite depends commons-io:commons-io 2.4 – which was released on {{2012-06-12}} -- which can be exploited to access parent directories. In recent months, there have been a fair number of releases for this package and [Synk lists this as the only vulnerability it has seen|https://snyk.io/vuln/maven:commons-io:commons-io]. Task is simple, bump the version to 2.7 or higher -- if I may suggest just going to 2.11.0. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (CALCITE-5226) Resolve security Vuln in Commons-DBCP
Scott Reynolds created CALCITE-5226: --- Summary: Resolve security Vuln in Commons-DBCP Key: CALCITE-5226 URL: https://issues.apache.org/jira/browse/CALCITE-5226 Project: Calcite Issue Type: Bug Reporter: Scott Reynolds In DBCP-562, the information is leaked. This was fixed in 2021. We should bump to to the latest version to resolve it. [https://ossindex.sonatype.org/vulnerability/sonatype-2020-1349?component-type=maven=org.apache.commons%2Fcommons-dbcp2_source=ossindex-client_medium=integration_content=1.7.0] [https://ossindex.sonatype.org/vulnerability/sonatype-2020-0460?component-type=maven=org.apache.commons%2Fcommons-dbcp2_source=ossindex-client_medium=integration_content=1.7.0] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (CALCITE-5226) Resolve security Vuln in Commons-DBCP
[ https://issues.apache.org/jira/browse/CALCITE-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Reynolds reassigned CALCITE-5226: --- Assignee: Scott Reynolds > Resolve security Vuln in Commons-DBCP > - > > Key: CALCITE-5226 > URL: https://issues.apache.org/jira/browse/CALCITE-5226 > Project: Calcite > Issue Type: Bug >Reporter: Scott Reynolds >Assignee: Scott Reynolds >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In DBCP-562, the information is leaked. This was fixed in 2021. We should > bump to to the latest version to resolve it. > > [https://ossindex.sonatype.org/vulnerability/sonatype-2020-1349?component-type=maven=org.apache.commons%2Fcommons-dbcp2_source=ossindex-client_medium=integration_content=1.7.0] > [https://ossindex.sonatype.org/vulnerability/sonatype-2020-0460?component-type=maven=org.apache.commons%2Fcommons-dbcp2_source=ossindex-client_medium=integration_content=1.7.0] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5391) JoinOnUniqueToSemiJoinRule should preserve field names, if possible
[ https://issues.apache.org/jira/browse/CALCITE-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636938#comment-17636938 ] Scott Reynolds commented on CALCITE-5391: - I am worried my application – which depends on field names – is going to break in other ways because of this. I did some exploration and identified a few other places where the field names change – some in meaningful ways. I found Rules that rewrote all the names and ignoring all the aliases the user provided. And I found – I believe mostly caused by ambiguous {{{}RexNode{}}}s – field changes from {{Something}} to {{{}Something0{}}}. How *should* applications code against the result set? Should applications keep their own map of field names to ordinals instead of relying on the field names from the root {{RelNode}} ? h3. How I ran the Tests I updated {{RelOptFixture#check()}} to compare the before and after plan's {{RelDataType}} {code:none} diff --git a/testkit/src/main/java/org/apache/calcite/test/RelOptFixture.java b/testkit/src/main/java/org/apache/calcite/test/RelOptFixture.java index ecb474a9e..ed99437c7 100644 --- a/testkit/src/main/java/org/apache/calcite/test/RelOptFixture.java +++ b/testkit/src/main/java/org/apache/calcite/test/RelOptFixture.java @@ -400,6 +400,8 @@ private void checkPlanning(boolean unchanged) { } } assertThat(relAfter, relIsValid()); + +assertThat(relAfter.getRowType(), is(relBefore.getRowType())); } {code} And then ran {{./gradlew :core:test --tests org.apache.calcite.test.RelOptRulesTest}} h3. What I found h4. Tests that rewrote all the column names # testWithinDistinctCountDistinct # testProjectToWindowRuleForMultipleWindows # testReduceConstantsWindow # testExpressionInWindowFunction # testProjectWindowTransposeRuleWithConstants # testProjectAggregateMergeSum0 h4. Column Names changed from {{Something}} to {{Something0}} # testPushAggregateThroughOuterJoin2 # testPushAggregateThroughOuterJoin3 # testAggregateJoinRemove6 h4. Plans with expressions without an alias There where a few plans that have expressions without a name so they get a name change through the planning process. These seemed harmless. > JoinOnUniqueToSemiJoinRule should preserve field names, if possible > --- > > Key: CALCITE-5391 > URL: https://issues.apache.org/jira/browse/CALCITE-5391 > Project: Calcite > Issue Type: Bug >Affects Versions: 1.32.0 >Reporter: Scott Reynolds >Assignee: Scott Reynolds >Priority: Major > Labels: pull-request-available > Fix For: 1.33.0 > > Time Spent: 10m > Remaining Estimate: 0h > > The new rule pushes a copy of the [original project without the alias > names|https://github.com/apache/calcite/pull/2848/files#r1026959819] > {code:java} > builder.project(project.getProjects()); > {code} > This results in a new SQL plan without the name fields -- they become {{$fN}} > fields in different plans. > Small change is required to put the names from the {{RelDataType}} of the > {{Project}} similar to the other {{Semijoin}} rule. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5391) JoinOnUniqueToSemiJoinRule should preserve field names, if possible
[ https://issues.apache.org/jira/browse/CALCITE-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17637977#comment-17637977 ] Scott Reynolds commented on CALCITE-5391: - Thanks so much, after reading that through and looking at a bunch of calcite code I understand now. In my application, we will have to adopt much the same way as the JDBC implementation within Calcite adopts. And had we done that, our application wouldn't have broke. Thanks for sharing CALCITE-1584. I have updated the pull request with tests and links to this issue. > JoinOnUniqueToSemiJoinRule should preserve field names, if possible > --- > > Key: CALCITE-5391 > URL: https://issues.apache.org/jira/browse/CALCITE-5391 > Project: Calcite > Issue Type: Bug >Affects Versions: 1.32.0 >Reporter: Scott Reynolds >Assignee: Scott Reynolds >Priority: Major > Labels: pull-request-available > Fix For: 1.33.0 > > Time Spent: 10m > Remaining Estimate: 0h > > The new rule pushes a copy of the [original project without the alias > names|https://github.com/apache/calcite/pull/2848/files#r1026959819] > {code:java} > builder.project(project.getProjects()); > {code} > This results in a new SQL plan without the name fields -- they become {{$fN}} > fields in different plans. > Small change is required to put the names from the {{RelDataType}} of the > {{Project}} similar to the other {{Semijoin}} rule. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (CALCITE-5391) JoinOnUniqueToSemiJoinRule removes a Projections aliases
Scott Reynolds created CALCITE-5391: --- Summary: JoinOnUniqueToSemiJoinRule removes a Projections aliases Key: CALCITE-5391 URL: https://issues.apache.org/jira/browse/CALCITE-5391 Project: Calcite Issue Type: Bug Affects Versions: 1.32.0 Reporter: Scott Reynolds Assignee: Scott Reynolds The new rule pushes a copy of the [original project without the alias names|https://github.com/apache/calcite/pull/2848/files#r1026959819] {code:java} builder.project(project.getProjects()); {code} This results in a new SQL plan without the name fields -- they become {{$fN}} fields in different plans. Small change is required to put the names from the {{RelDataType}} of the {{Project}} similar to the other {{Semijoin}} rule. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5391) JoinOnUniqueToSemiJoinRule removes a Projections aliases
[ https://issues.apache.org/jira/browse/CALCITE-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636853#comment-17636853 ] Scott Reynolds commented on CALCITE-5391: - bq. I think "JoinOnUniqueToSemiJoinRule should preserve field names, if possible" would be a better summary and commit message. If possible? Is there a case where the plan wouldn't preserve field names? > JoinOnUniqueToSemiJoinRule removes a Projections aliases > > > Key: CALCITE-5391 > URL: https://issues.apache.org/jira/browse/CALCITE-5391 > Project: Calcite > Issue Type: Bug >Affects Versions: 1.32.0 >Reporter: Scott Reynolds >Assignee: Scott Reynolds >Priority: Major > Labels: pull-request-available > Fix For: 1.33.0 > > Time Spent: 10m > Remaining Estimate: 0h > > The new rule pushes a copy of the [original project without the alias > names|https://github.com/apache/calcite/pull/2848/files#r1026959819] > {code:java} > builder.project(project.getProjects()); > {code} > This results in a new SQL plan without the name fields -- they become {{$fN}} > fields in different plans. > Small change is required to put the names from the {{RelDataType}} of the > {{Project}} similar to the other {{Semijoin}} rule. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (CALCITE-5391) JoinOnUniqueToSemiJoinRule should preserve field names, if possible
[ https://issues.apache.org/jira/browse/CALCITE-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Reynolds updated CALCITE-5391: Summary: JoinOnUniqueToSemiJoinRule should preserve field names, if possible (was: JoinOnUniqueToSemiJoinRule should preserve field names) > JoinOnUniqueToSemiJoinRule should preserve field names, if possible > --- > > Key: CALCITE-5391 > URL: https://issues.apache.org/jira/browse/CALCITE-5391 > Project: Calcite > Issue Type: Bug >Affects Versions: 1.32.0 >Reporter: Scott Reynolds >Assignee: Scott Reynolds >Priority: Major > Labels: pull-request-available > Fix For: 1.33.0 > > Time Spent: 10m > Remaining Estimate: 0h > > The new rule pushes a copy of the [original project without the alias > names|https://github.com/apache/calcite/pull/2848/files#r1026959819] > {code:java} > builder.project(project.getProjects()); > {code} > This results in a new SQL plan without the name fields -- they become {{$fN}} > fields in different plans. > Small change is required to put the names from the {{RelDataType}} of the > {{Project}} similar to the other {{Semijoin}} rule. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (CALCITE-5391) JoinOnUniqueToSemiJoinRule should preserve field names
[ https://issues.apache.org/jira/browse/CALCITE-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Reynolds updated CALCITE-5391: Summary: JoinOnUniqueToSemiJoinRule should preserve field names (was: JoinOnUniqueToSemiJoinRule removes a Projections aliases) > JoinOnUniqueToSemiJoinRule should preserve field names > -- > > Key: CALCITE-5391 > URL: https://issues.apache.org/jira/browse/CALCITE-5391 > Project: Calcite > Issue Type: Bug >Affects Versions: 1.32.0 >Reporter: Scott Reynolds >Assignee: Scott Reynolds >Priority: Major > Labels: pull-request-available > Fix For: 1.33.0 > > Time Spent: 10m > Remaining Estimate: 0h > > The new rule pushes a copy of the [original project without the alias > names|https://github.com/apache/calcite/pull/2848/files#r1026959819] > {code:java} > builder.project(project.getProjects()); > {code} > This results in a new SQL plan without the name fields -- they become {{$fN}} > fields in different plans. > Small change is required to put the names from the {{RelDataType}} of the > {{Project}} similar to the other {{Semijoin}} rule. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (CALCITE-5226) Resolve security Vulnnerability in Commons-DBCP
[ https://issues.apache.org/jira/browse/CALCITE-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Reynolds updated CALCITE-5226: Summary: Resolve security Vulnnerability in Commons-DBCP (was: Resolve security Vuln in Commons-DBCP) > Resolve security Vulnnerability in Commons-DBCP > --- > > Key: CALCITE-5226 > URL: https://issues.apache.org/jira/browse/CALCITE-5226 > Project: Calcite > Issue Type: Bug >Reporter: Scott Reynolds >Assignee: Scott Reynolds >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In DBCP-562, the information is leaked. This was fixed in 2021. We should > bump to to the latest version to resolve it. > > [https://ossindex.sonatype.org/vulnerability/sonatype-2020-1349?component-type=maven=org.apache.commons%2Fcommons-dbcp2_source=ossindex-client_medium=integration_content=1.7.0] > [https://ossindex.sonatype.org/vulnerability/sonatype-2020-0460?component-type=maven=org.apache.commons%2Fcommons-dbcp2_source=ossindex-client_medium=integration_content=1.7.0] -- This message was sent by Atlassian Jira (v8.20.10#820010)