[jira] [Updated] (CALCITE-963) Hoist literals

2019-09-18 Thread Scott Reynolds (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Reynolds updated CALCITE-963:
---
Attachment: (was: HoistedVariables.png)

> Hoist literals
> --
>
> Key: CALCITE-963
> URL: https://issues.apache.org/jira/browse/CALCITE-963
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Priority: Major
>  Labels: pull-request-available
> Attachments: HoistedVariables.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Convert literals into (internal) bind variables so that statements that 
> differ only in literal values can be executed using the same plan.
> In [mail 
> thread|http://mail-archives.apache.org/mod_mbox/calcite-dev/201511.mbox/%3c56437bf8.70...@gmail.com%3E]
>  Homer wrote:
> {quote}Imagine that it is common to run a large number of very similar 
> machine generated queries that just change the literals in the sql query.
> For example (the real queries would be much more complex):
> {code}Select * from emp where empno = 1;
> Select * from emp where empno = 2;
> etc.{code}
> The plan that is likely being generated for these kind of queries is going to 
> be very much the same each time, so to save some time, I would like to 
> recognize that the literals are all that have changed in a query and use the 
> previously optimized execution plan and just replace the literals.{quote}
> I think this could be done as a transform on the initial RelNode tree. It 
> would find literals (RexLiteral), replace them with bind variables 
> (RexDynamicParam) and write the value into a pool. The next statement would 
> go through the same process and the RelNode tree would be identical, but with 
> possibly different values for the bind variables.
> The bind variables are of course internal; not visible from JDBC. When the 
> statement is executed, the bind variables are implicitly bound.
> Statements would be held in a Guava cache.
> This would be enabled by a config parameter. Unfortunately I don't think we 
> could do this by default -- we'd lose optimization power because we would no 
> longer be able to do constant reduction.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (CALCITE-963) Hoist literals

2019-09-18 Thread Scott Reynolds (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932829#comment-16932829
 ] 

Scott Reynolds edited comment on CALCITE-963 at 9/18/19 8:57 PM:
-

h1. Goal

When a query is issued to Calcite it is parsed, optimized and then generates a 
String of Java Class that implements {{Bindable}}. {{EnumerableInterpretable}} 
creates this string and checks to see if that string exists in 
{{com.google.common.cache}} and if it doesn't it will call into a Java 
compiler. Compilation process can take a considerable amount of time, Apache 
Kylin reported 50 to 150ms of additional computation time. Today, Apache 
Calcite will generate unique Java Class strings whenever any part of the query 
changes. This document details out the design and implementation of a hoisting 
technique within Apache Calcite. This design and implementation greatly 
increases the cache hit rate of {{EnumerableInterpretable}}'s 
{{BINDABLE_CACHE}}.
h1. Non Goals

This implementation is not designed to change the planning process. It does not 
transform {{RexLiteral}} into {{RexDynamicParam}}, and doesn't change the cost 
calculation of the query.
h1. Implementation Details

After a query has been optimized there are three phases that remaining phases 
to the query:
 # Generating the Java code
 # Binding Hoisted Variables
 # Runtime execution via {{Bindable.bind(DataContext, HoistedVariables)}}

Each of these phases will interact with a new class called {{HoistedVariables}}

!HoistedVariables.png!

Each of these methods are used in the above three phases to hoist a variable 
from within the query into the runtime execution of the {{Bindable}}.

The method {{implement}} of the interface {{EnumerableRel}} is used to generate 
the Java code in phase one. Each of these {{RelNode}} can now call 
{{registerVariable(String)}} to allocate a {{Slot}} for their unbound value. 
This {{Slot}} is reserved for their use and is unique for the query plan. When 
a {{RelNode}} registers a variable it needs to save that {{Slot}} into a 
property so it can be referenced in phase 2. This {{Slot}} is then referenced 
in code generation by calling {{EnumerableRel.lookupValue}} which returns an 
{{Expression}} that will extract the bound value at for the {{Slot}}.

Below is a snippet from {{EnumerableLimit}} implementation of {{implement}} 
that uses {{HoistedVariables}}.
{code:java}
Expression v = builder.append("child", result.block);
if (offset != null) {
  if (offset instanceof RexDynamicParam) {
v = getDynamicExpression((RexDynamicParam) offset);
  } else {
// Register with Hoisted Variable here
offsetIndex = variables.registerVariable("offset");
v = builder.append(
"offset",
Expressions.call(
  v,
  BuiltInMethod.SKIP.method,
  // At runtime, fetch the bound variable. This returns the Java code to do 
that.
  EnumerableRel.lookupValue(offsetIndex, Integer.class)));
  }
}
if (fetch != null) {
  if (fetch instanceof RexDynamicParam) {
v = getDynamicExpression((RexDynamicParam) fetch);
  } else {
// Register with Hoisted Variable here
this.fetchIndex = variables.registerVariable("fetch");
v = builder.append(
  "fetch",
  Expressions.call(
  v,
  BuiltInMethod.TAKE.method,
  // At runtime, fetch the bound variable. This returns the Java code to do 
that.
  EnumerableRel.lookupValue(fetchIndex, Integer.class)));
  }
}
{code}
The second phase of the query execution is where registered {{Slots}} get 
bound. To this, our change adds a new optional method to {{Bindable}} called 
{{hoistVariables}}. This method is where an instance of {{EnumerableRel}} 
extracts the values out of the query plan and binds them into the 
{{HoistedVariables}} instance just prior to executing the query. Below is 
{{EnumerableLimit}} implementation:
{code:java}
@Override public void hoistedVariables(HoistedVariables variables) {
  getInputs()
  .stream()
  .forEach(rel -> {
final EnumerableRel enumerable = (EnumerableRel) rel;
enumerable.hoistedVariables(variables);
  });
  if (fetchIndex != null) {
// fetchIndex is the registered slot for this variable. Bind fetchIndex to 
fetch
variables.setVariable(fetchIndex, RexLiteral.intValue(fetch));
  }
  if (offsetIndex != null) {
// offsetIndex is the registered slot for this variable. Bind offsetIndex 
to offset.
variables.setVariable(offsetIndex, RexLiteral.intValue(offset));
  }
}
{code}
To tie these three phases together, {{CalcitePrepareImpl}} needs to setup the 
variables when it creates a {{PreparedResult}}:
{code:java}
try {
  CatalogReader.THREAD_LOCAL.set(catalogReader);
  final SqlConformance conformance = context.config().conformance();
  internalParameters.put("_conformance", conformance);
  // Get the compiled Bindable instance either from cache or generate a new one.
  

[jira] [Commented] (CALCITE-963) Hoist literals

2019-09-18 Thread Scott Reynolds (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932891#comment-16932891
 ] 

Scott Reynolds commented on CALCITE-963:


{quote} * I was surprised that HoistedVariables was added to so many API calls. 
Maybe it could belong to another piece of the preparation context{quote}
I was tempted to add it to {{DataContext}}. It does have to land into 
{{CalciteSignature}}
{code:java}
public CalciteSignature(String sql,
List parameterList,
Map internalParameters,
RelDataType rowType,
List columns,
Meta.CursorFactory cursorFactory,
CalciteSchema rootSchema,
List collationList,
long maxRowCount,
Bindable bindable,
Meta.StatementType statementType,
HoistedVariables variables) {
  super(columns, sql, parameterList, internalParameters, cursorFactory,
  statementType);
  this.rowType = rowType;
  this.rootSchema = rootSchema;
  this.collationList = collationList;
  this.maxRowCount = maxRowCount;
  this.bindable = bindable;
  this.variables = variables;
}

public Enumerable enumerable(DataContext dataContext) {
  Enumerable enumerable = bindable.bind(dataContext, variables);
  if (maxRowCount >= 0) {
// Apply limit. In JDBC 0 means "no limit". But for us, -1 means
// "no limit", and 0 is a valid limit.
enumerable = EnumerableDefaults.take(enumerable, maxRowCount);
  }
  return enumerable;
}

{code} 

{quote}I was surprised that this work touched so much Enumerable code. An 
alternative approach would be to transform a RelNode tree, early in the 
planning process, transforming some RexLiteral instances into RexDynamicParam. 
The rest of the planning process would proceed as if the user had provided a 
statement with bind variables. I know you state that this was a non-goal, but 
why was it not a goal? It probably would have been a lot simpler, and it would 
have worked with conventions besides Enumerable.{quote}

Ya I should add color to this. Given the discussions on the mailing list I 
would like to add an optimization to our use case. Our Fact Tables contain a 
single account hierarchy represented by a column {{sub_account}}. Often a Fact 
will not be associated with an {{sub_account}} and so instead of storing that 
as null and dealing with missing right hand row,  we store a sentinel value in 
the Fact so our joins to the account dimension are simpler. So when a query 
comes in that for all Facts that do not contain a sub account, we want to 
change the cost calculation of that filter as {{sub_account_sid == 'IS_NULL'}} 
is not as selective as {{sub_account_sid = 'AC128485'}}. So my understanding 
is, by translating into a {{RexDynamicParam}}, we lose out on this optimization.

> Hoist literals
> --
>
> Key: CALCITE-963
> URL: https://issues.apache.org/jira/browse/CALCITE-963
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Priority: Major
>  Labels: pull-request-available
> Attachments: HoistedVariables.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Convert literals into (internal) bind variables so that statements that 
> differ only in literal values can be executed using the same plan.
> In [mail 
> thread|http://mail-archives.apache.org/mod_mbox/calcite-dev/201511.mbox/%3c56437bf8.70...@gmail.com%3E]
>  Homer wrote:
> {quote}Imagine that it is common to run a large number of very similar 
> machine generated queries that just change the literals in the sql query.
> For example (the real queries would be much more complex):
> {code}Select * from emp where empno = 1;
> Select * from emp where empno = 2;
> etc.{code}
> The plan that is likely being generated for these kind of queries is going to 
> be very much the same each time, so to save some time, I would like to 
> recognize that the literals are all that have changed in a query and use the 
> previously optimized execution plan and just replace the literals.{quote}
> I think this could be done as a transform on the initial RelNode tree. It 
> would find literals (RexLiteral), replace them with bind variables 
> (RexDynamicParam) and write the value into a pool. The next statement would 
> go through the same process and the RelNode tree would be identical, but with 
> possibly different values for the bind variables.
> The bind variables are of course internal; not visible from JDBC. When the 
> statement is executed, the bind variables are implicitly bound.
> Statements would be held in a Guava cache.
> This would be enabled by a config parameter. Unfortunately I don't think we 
> could do this by default -- we'd lose optimization power because we would no 
> longer be able to do constant reduction.



--
This message was 

[jira] [Commented] (CALCITE-963) Hoist literals

2019-09-18 Thread Scott Reynolds (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932829#comment-16932829
 ] 

Scott Reynolds commented on CALCITE-963:


h1. Goal

When a query is issued to Calcite it is parsed, optimized and then generates a 
String of Java Class that implements {{Bindable}}. {{EnumerableInterpretable}} 
creates this string and checks to see if that string exists in 
{{com.google.common.cache}} and if it doesn't it will call into a Java 
compiler. Compilation process can take a considerable amount of time, Apache 
Kylin reported 50 to 150ms of additional computation time. Today, Apache 
Calcite will generate unique Java Class strings whenever any part of the query 
changes. This document details out the design and implementation of a hoisting 
technique within Apache Calcite. This design and implementation greatly 
increases the cache hit rate of {{EnumerableInterpretable}}'s 
{{BINDABLE_CACHE}}.
h1. Non Goals

This implementation is not designed to change the planning process. It does not 
transform {{RexLiteral}} into {{RexDynamicParam}}, and doesn't change the cost 
calculation of the query.
h1. Implementation Details

After a query has been optimized there are three phases that remaining phases 
to the query:
 # Generating the Java code
 # Binding Hoisted Variables
 # Runtime execution via {{Bindable.bind(DataContext, HoistedVariables)}}

Each of these phases will interact with a new class called {{HoistedVariables}}
 [file:HoistedVariables.png|file:///HoistedVariables.png]

Each of these methods are used in the above three phases to hoist a variable 
from within the query into the runtime execution of the {{Bindable}}.

The method {{implement}} of the interface {{EnumerableRel}} is used to generate 
the Java code in phase one. Each of these {{RelNode}} can now call 
{{registerVariable(String)}} to allocate a {{Slot}} for their unbound value. 
This {{Slot}} is reserved for their use and is unique for the query plan. When 
a {{RelNode}} registers a variable it needs to save that {{Slot}} into a 
property so it can be referenced in phase 2. This {{Slot}} is then referenced 
in code generation by calling {{EnumerableRel.lookupValue}} which returns an 
{{Expression}} that will extract the bound value at for the {{Slot}}.

Below is a snippet from {{EnumerableLimit}} implementation of {{implement}} 
that uses {{HoistedVariables}}.
{code:java}
Expression v = builder.append("child", result.block);
if (offset != null) {
  if (offset instanceof RexDynamicParam) {
v = getDynamicExpression((RexDynamicParam) offset);
  } else {
// Register with Hoisted Variable here
offsetIndex = variables.registerVariable("offset");
v = builder.append(
"offset",
Expressions.call(
  v,
  BuiltInMethod.SKIP.method,
  // At runtime, fetch the bound variable. This returns the Java code to do 
that.
  EnumerableRel.lookupValue(offsetIndex, Integer.class)));
  }
}
if (fetch != null) {
  if (fetch instanceof RexDynamicParam) {
v = getDynamicExpression((RexDynamicParam) fetch);
  } else {
// Register with Hoisted Variable here
this.fetchIndex = variables.registerVariable("fetch");
v = builder.append(
  "fetch",
  Expressions.call(
  v,
  BuiltInMethod.TAKE.method,
  // At runtime, fetch the bound variable. This returns the Java code to do 
that.
  EnumerableRel.lookupValue(fetchIndex, Integer.class)));
  }
}
{code}
The second phase of the query execution is where registered {{Slots}} get 
bound. To this, our change adds a new optional method to {{Bindable}} called 
{{hoistVariables}}. This method is where an instance of {{EnumerableRel}} 
extracts the values out of the query plan and binds them into the 
{{HoistedVariables}} instance just prior to executing the query. Below is 
{{EnumerableLimit}} implementation:
{code:java}
@Override public void hoistedVariables(HoistedVariables variables) {
  getInputs()
  .stream()
  .forEach(rel -> {
final EnumerableRel enumerable = (EnumerableRel) rel;
enumerable.hoistedVariables(variables);
  });
  if (fetchIndex != null) {
// fetchIndex is the registered slot for this variable. Bind fetchIndex to 
fetch
variables.setVariable(fetchIndex, RexLiteral.intValue(fetch));
  }
  if (offsetIndex != null) {
// offsetIndex is the registered slot for this variable. Bind offsetIndex 
to offset.
variables.setVariable(offsetIndex, RexLiteral.intValue(offset));
  }
}
{code}
To tie these three phases together, {{CalcitePrepareImpl}} needs to setup the 
variables when it creates a {{PreparedResult}}:
{code:java}
try {
  CatalogReader.THREAD_LOCAL.set(catalogReader);
  final SqlConformance conformance = context.config().conformance();
  internalParameters.put("_conformance", conformance);
  // Get the compiled Bindable instance either from cache or generate a new one.
  bindable = 

[jira] [Comment Edited] (CALCITE-963) Hoist literals

2019-09-18 Thread Scott Reynolds (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932829#comment-16932829
 ] 

Scott Reynolds edited comment on CALCITE-963 at 9/18/19 8:56 PM:
-

h1. Goal

When a query is issued to Calcite it is parsed, optimized and then generates a 
String of Java Class that implements {{Bindable}}. {{EnumerableInterpretable}} 
creates this string and checks to see if that string exists in 
{{com.google.common.cache}} and if it doesn't it will call into a Java 
compiler. Compilation process can take a considerable amount of time, Apache 
Kylin reported 50 to 150ms of additional computation time. Today, Apache 
Calcite will generate unique Java Class strings whenever any part of the query 
changes. This document details out the design and implementation of a hoisting 
technique within Apache Calcite. This design and implementation greatly 
increases the cache hit rate of {{EnumerableInterpretable}}'s 
{{BINDABLE_CACHE}}.
h1. Non Goals

This implementation is not designed to change the planning process. It does not 
transform {{RexLiteral}} into {{RexDynamicParam}}, and doesn't change the cost 
calculation of the query.
h1. Implementation Details

After a query has been optimized there are three phases that remaining phases 
to the query:
 # Generating the Java code
 # Binding Hoisted Variables
 # Runtime execution via {{Bindable.bind(DataContext, HoistedVariables)}}

Each of these phases will interact with a new class called {{HoistedVariables}}



Each of these methods are used in the above three phases to hoist a variable 
from within the query into the runtime execution of the {{Bindable}}.

The method {{implement}} of the interface {{EnumerableRel}} is used to generate 
the Java code in phase one. Each of these {{RelNode}} can now call 
{{registerVariable(String)}} to allocate a {{Slot}} for their unbound value. 
This {{Slot}} is reserved for their use and is unique for the query plan. When 
a {{RelNode}} registers a variable it needs to save that {{Slot}} into a 
property so it can be referenced in phase 2. This {{Slot}} is then referenced 
in code generation by calling {{EnumerableRel.lookupValue}} which returns an 
{{Expression}} that will extract the bound value at for the {{Slot}}.

Below is a snippet from {{EnumerableLimit}} implementation of {{implement}} 
that uses {{HoistedVariables}}.
{code:java}
Expression v = builder.append("child", result.block);
if (offset != null) {
  if (offset instanceof RexDynamicParam) {
v = getDynamicExpression((RexDynamicParam) offset);
  } else {
// Register with Hoisted Variable here
offsetIndex = variables.registerVariable("offset");
v = builder.append(
"offset",
Expressions.call(
  v,
  BuiltInMethod.SKIP.method,
  // At runtime, fetch the bound variable. This returns the Java code to do 
that.
  EnumerableRel.lookupValue(offsetIndex, Integer.class)));
  }
}
if (fetch != null) {
  if (fetch instanceof RexDynamicParam) {
v = getDynamicExpression((RexDynamicParam) fetch);
  } else {
// Register with Hoisted Variable here
this.fetchIndex = variables.registerVariable("fetch");
v = builder.append(
  "fetch",
  Expressions.call(
  v,
  BuiltInMethod.TAKE.method,
  // At runtime, fetch the bound variable. This returns the Java code to do 
that.
  EnumerableRel.lookupValue(fetchIndex, Integer.class)));
  }
}
{code}
The second phase of the query execution is where registered {{Slots}} get 
bound. To this, our change adds a new optional method to {{Bindable}} called 
{{hoistVariables}}. This method is where an instance of {{EnumerableRel}} 
extracts the values out of the query plan and binds them into the 
{{HoistedVariables}} instance just prior to executing the query. Below is 
{{EnumerableLimit}} implementation:
{code:java}
@Override public void hoistedVariables(HoistedVariables variables) {
  getInputs()
  .stream()
  .forEach(rel -> {
final EnumerableRel enumerable = (EnumerableRel) rel;
enumerable.hoistedVariables(variables);
  });
  if (fetchIndex != null) {
// fetchIndex is the registered slot for this variable. Bind fetchIndex to 
fetch
variables.setVariable(fetchIndex, RexLiteral.intValue(fetch));
  }
  if (offsetIndex != null) {
// offsetIndex is the registered slot for this variable. Bind offsetIndex 
to offset.
variables.setVariable(offsetIndex, RexLiteral.intValue(offset));
  }
}
{code}
To tie these three phases together, {{CalcitePrepareImpl}} needs to setup the 
variables when it creates a {{PreparedResult}}:
{code:java}
try {
  CatalogReader.THREAD_LOCAL.set(catalogReader);
  final SqlConformance conformance = context.config().conformance();
  internalParameters.put("_conformance", conformance);
  // Get the compiled Bindable instance either from cache or generate a new one.
  bindable = 

[jira] [Comment Edited] (CALCITE-963) Hoist literals

2019-09-18 Thread Scott Reynolds (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932829#comment-16932829
 ] 

Scott Reynolds edited comment on CALCITE-963 at 9/18/19 9:09 PM:
-

h1. Goal

When a query is issued to Calcite it is parsed, optimized and then generates a 
Java Class that implements {{Bindable}}. {{EnumerableInterpretable. }}This 
class is then checked to see if it exists in {{com.google.common.cache}} and if 
it doesn't it will call into a Java compiler. Compilation process can take a 
considerable amount of time, Apache Kylin reported 50 to 150ms of additional 
computation time. Today, Apache Calcite will generate unique Java Class strings 
whenever any part of the query changes. This document details out the design 
and implementation of a hoisting technique within Apache Calcite. This design 
and implementation greatly increases the cache hit rate of 
{{EnumerableInterpretable}}'s {{BINDABLE_CACHE}}.
h1. Non Goals

This implementation is not designed to change the planning process. It does not 
transform {{RexLiteral}} into {{RexDynamicParam}}, and doesn't change the cost 
calculation of the query.
h1. Implementation Details

After a query has been optimized there are three phases that remaining phases 
to the query:
 # Generating the Java code
 # Binding Hoisted Variables
 # Runtime execution via {{Bindable.bind(DataContext, HoistedVariables)}}

Each of these phases will interact with a new class called {{HoistedVariables}}

!HoistedVariables.png!

Each of these methods are used in the above three phases to hoist a variable 
from within the query into the runtime execution of the {{Bindable}}.

The method {{implement}} of the interface {{EnumerableRel}} is used to generate 
the Java code in phase one. Each of these {{RelNode}} can now call 
{{registerVariable(String)}} to allocate a {{Slot}} for their unbound value. 
This {{Slot}} is reserved for their use and is unique for the query plan. When 
a {{RelNode}} registers a variable it needs to save that {{Slot}} into a 
property so it can be referenced in phase 2. This {{Slot}} is then referenced 
in code generation by calling {{EnumerableRel.lookupValue}} which returns an 
{{Expression}} that will extract the bound value at for the {{Slot}}.

Below is a snippet from {{EnumerableLimit}} implementation of {{implement}} 
that uses {{HoistedVariables}}.
{code:java}
Expression v = builder.append("child", result.block);
if (offset != null) {
  if (offset instanceof RexDynamicParam) {
v = getDynamicExpression((RexDynamicParam) offset);
  } else {
// Register with Hoisted Variable here
offsetIndex = variables.registerVariable("offset");
v = builder.append(
"offset",
Expressions.call(
  v,
  BuiltInMethod.SKIP.method,
  // At runtime, fetch the bound variable. This returns the Java code to do 
that.
  EnumerableRel.lookupValue(offsetIndex, Integer.class)));
  }
}
if (fetch != null) {
  if (fetch instanceof RexDynamicParam) {
v = getDynamicExpression((RexDynamicParam) fetch);
  } else {
// Register with Hoisted Variable here
this.fetchIndex = variables.registerVariable("fetch");
v = builder.append(
  "fetch",
  Expressions.call(
  v,
  BuiltInMethod.TAKE.method,
  // At runtime, fetch the bound variable. This returns the Java code to do 
that.
  EnumerableRel.lookupValue(fetchIndex, Integer.class)));
  }
}
{code}
The second phase of the query execution is where registered {{Slots}} get 
bound. To this, our change adds a new optional method to {{Bindable}} called 
{{hoistVariables}}. This method is where an instance of {{EnumerableRel}} 
extracts the values out of the query plan and binds them into the 
{{HoistedVariables}} instance just prior to executing the query. Below is 
{{EnumerableLimit}} implementation:
{code:java}
@Override public void hoistedVariables(HoistedVariables variables) {
  getInputs()
  .stream()
  .forEach(rel -> {
final EnumerableRel enumerable = (EnumerableRel) rel;
enumerable.hoistedVariables(variables);
  });
  if (fetchIndex != null) {
// fetchIndex is the registered slot for this variable. Bind fetchIndex to 
fetch
variables.setVariable(fetchIndex, RexLiteral.intValue(fetch));
  }
  if (offsetIndex != null) {
// offsetIndex is the registered slot for this variable. Bind offsetIndex 
to offset.
variables.setVariable(offsetIndex, RexLiteral.intValue(offset));
  }
}
{code}
To tie these three phases together, {{CalcitePrepareImpl}} needs to setup the 
variables when it creates a {{PreparedResult}}:
{code:java}
try {
  CatalogReader.THREAD_LOCAL.set(catalogReader);
  final SqlConformance conformance = context.config().conformance();
  internalParameters.put("_conformance", conformance);
  // Get the compiled Bindable instance either from cache or generate a new one.
  bindable = 

[jira] [Comment Edited] (CALCITE-963) Hoist literals

2019-09-18 Thread Scott Reynolds (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932829#comment-16932829
 ] 

Scott Reynolds edited comment on CALCITE-963 at 9/18/19 9:09 PM:
-

h1. Goal

When a query is issued to Calcite it is parsed, optimized and then generates a 
Java Class that implements {{Bindable}}. {{EnumerableInterpretable. This class 
is then checked to see if it exists in {{com.google.common.cache and if it 
doesn't it will call into a Java compiler. Compilation process can take a 
considerable amount of time, Apache Kylin reported 50 to 150ms of additional 
computation time. Today, Apache Calcite will generate unique Java Class strings 
whenever any part of the query changes. This document details out the design 
and implementation of a hoisting technique within Apache Calcite. This design 
and implementation greatly increases the cache hit rate of 
{{EnumerableInterpretable}}'s {{BINDABLE_CACHE}}.
h1. Non Goals

This implementation is not designed to change the planning process. It does not 
transform {{RexLiteral}} into {{RexDynamicParam}}, and doesn't change the cost 
calculation of the query.
h1. Implementation Details

After a query has been optimized there are three phases that remaining phases 
to the query:
 # Generating the Java code
 # Binding Hoisted Variables
 # Runtime execution via {{Bindable.bind(DataContext, HoistedVariables)}}

Each of these phases will interact with a new class called {{HoistedVariables}}

!HoistedVariables.png!

Each of these methods are used in the above three phases to hoist a variable 
from within the query into the runtime execution of the {{Bindable}}.

The method {{implement}} of the interface {{EnumerableRel}} is used to generate 
the Java code in phase one. Each of these {{RelNode}} can now call 
{{registerVariable(String)}} to allocate a {{Slot}} for their unbound value. 
This {{Slot}} is reserved for their use and is unique for the query plan. When 
a {{RelNode}} registers a variable it needs to save that {{Slot}} into a 
property so it can be referenced in phase 2. This {{Slot}} is then referenced 
in code generation by calling {{EnumerableRel.lookupValue}} which returns an 
{{Expression}} that will extract the bound value at for the {{Slot}}.

Below is a snippet from {{EnumerableLimit}} implementation of {{implement}} 
that uses {{HoistedVariables}}.
{code:java}
Expression v = builder.append("child", result.block);
if (offset != null) {
  if (offset instanceof RexDynamicParam) {
v = getDynamicExpression((RexDynamicParam) offset);
  } else {
// Register with Hoisted Variable here
offsetIndex = variables.registerVariable("offset");
v = builder.append(
"offset",
Expressions.call(
  v,
  BuiltInMethod.SKIP.method,
  // At runtime, fetch the bound variable. This returns the Java code to do 
that.
  EnumerableRel.lookupValue(offsetIndex, Integer.class)));
  }
}
if (fetch != null) {
  if (fetch instanceof RexDynamicParam) {
v = getDynamicExpression((RexDynamicParam) fetch);
  } else {
// Register with Hoisted Variable here
this.fetchIndex = variables.registerVariable("fetch");
v = builder.append(
  "fetch",
  Expressions.call(
  v,
  BuiltInMethod.TAKE.method,
  // At runtime, fetch the bound variable. This returns the Java code to do 
that.
  EnumerableRel.lookupValue(fetchIndex, Integer.class)));
  }
}
{code}
The second phase of the query execution is where registered {{Slots}} get 
bound. To this, our change adds a new optional method to {{Bindable}} called 
{{hoistVariables}}. This method is where an instance of {{EnumerableRel}} 
extracts the values out of the query plan and binds them into the 
{{HoistedVariables}} instance just prior to executing the query. Below is 
{{EnumerableLimit}} implementation:
{code:java}
@Override public void hoistedVariables(HoistedVariables variables) {
  getInputs()
  .stream()
  .forEach(rel -> {
final EnumerableRel enumerable = (EnumerableRel) rel;
enumerable.hoistedVariables(variables);
  });
  if (fetchIndex != null) {
// fetchIndex is the registered slot for this variable. Bind fetchIndex to 
fetch
variables.setVariable(fetchIndex, RexLiteral.intValue(fetch));
  }
  if (offsetIndex != null) {
// offsetIndex is the registered slot for this variable. Bind offsetIndex 
to offset.
variables.setVariable(offsetIndex, RexLiteral.intValue(offset));
  }
}
{code}
To tie these three phases together, {{CalcitePrepareImpl}} needs to setup the 
variables when it creates a {{PreparedResult}}:
{code:java}
try {
  CatalogReader.THREAD_LOCAL.set(catalogReader);
  final SqlConformance conformance = context.config().conformance();
  internalParameters.put("_conformance", conformance);
  // Get the compiled Bindable instance either from cache or generate a new one.
  bindable = 

[jira] [Comment Edited] (CALCITE-963) Hoist literals

2019-09-18 Thread Scott Reynolds (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932829#comment-16932829
 ] 

Scott Reynolds edited comment on CALCITE-963 at 9/18/19 9:14 PM:
-

h1. Goal

When a query is issued to Calcite it is parsed, optimized and then generates a 
Java Class that implements {{Bindable}}. {{EnumerableInterpretable}}. This 
class is then checked to see if it exists in {{com.google.common.cache}} and if 
it doesn't it will call into a Java compiler. Compilation process can take a 
considerable amount of time, Apache Kylin reported 50 to 150ms of additional 
computation time. Today, Apache Calcite will generate unique Java Class strings 
whenever any part of the query changes. This document details out the design 
and implementation of a hoisting technique within Apache Calcite. This design 
and implementation greatly increases the cache hit rate of 
{{EnumerableInterpretable}}'s {{BINDABLE_CACHE}}.
h1. Non Goals

This implementation is not designed to change the planning process. It does not 
transform {{RexLiteral}} into {{RexDynamicParam}}, and doesn't change the cost 
calculation of the query.
h1. Implementation Details

After a query has been optimized there are three phases that remaining phases 
to the query:
 # Generating the Java code
 # Binding Hoisted Variables
 # Runtime execution via {{Bindable.bind(DataContext, HoistedVariables)}}

Each of these phases will interact with a new class called {{HoistedVariables}}

!HoistedVariables.png!

Each of these methods are used in the above three phases to hoist a variable 
from within the query into the runtime execution of the {{Bindable}}.

The method {{implement}} of the interface {{EnumerableRel}} is used to generate 
the Java code in phase one. Each of these {{RelNode}} can now call 
{{registerVariable(String)}} to allocate a {{Slot}} for their unbound value. 
This {{Slot}} is reserved for their use and is unique for the query plan. When 
a {{RelNode}} registers a variable it needs to save that {{Slot}} into a 
property so it can be referenced in phase 2. This {{Slot}} is then referenced 
in code generation by calling {{EnumerableRel.lookupValue}} which returns an 
{{Expression}} that will extract the bound value at for the {{Slot}}.

Below is a snippet from {{EnumerableLimit}} implementation of {{implement}} 
that uses {{HoistedVariables}}.
{code:java}
Expression v = builder.append("child", result.block);
if (offset != null) {
  if (offset instanceof RexDynamicParam) {
v = getDynamicExpression((RexDynamicParam) offset);
  } else {
// Register with Hoisted Variable here
offsetIndex = variables.registerVariable("offset");
v = builder.append(
"offset",
Expressions.call(
  v,
  BuiltInMethod.SKIP.method,
  // At runtime, fetch the bound variable. This returns the Java code to do 
that.
  EnumerableRel.lookupValue(offsetIndex, Integer.class)));
  }
}
if (fetch != null) {
  if (fetch instanceof RexDynamicParam) {
v = getDynamicExpression((RexDynamicParam) fetch);
  } else {
// Register with Hoisted Variable here
this.fetchIndex = variables.registerVariable("fetch");
v = builder.append(
  "fetch",
  Expressions.call(
  v,
  BuiltInMethod.TAKE.method,
  // At runtime, fetch the bound variable. This returns the Java code to do 
that.
  EnumerableRel.lookupValue(fetchIndex, Integer.class)));
  }
}
{code}
The second phase of the query execution is where registered {{Slots}} get 
bound. To this, our change adds a new optional method to {{Bindable}} called 
{{hoistVariables}}. This method is where an instance of {{EnumerableRel}} 
extracts the values out of the query plan and binds them into the 
{{HoistedVariables}} instance just prior to executing the query. Below is 
{{EnumerableLimit}} implementation:
{code:java}
@Override public void hoistedVariables(HoistedVariables variables) {
  getInputs()
  .stream()
  .forEach(rel -> {
final EnumerableRel enumerable = (EnumerableRel) rel;
enumerable.hoistedVariables(variables);
  });
  if (fetchIndex != null) {
// fetchIndex is the registered slot for this variable. Bind fetchIndex to 
fetch
variables.setVariable(fetchIndex, RexLiteral.intValue(fetch));
  }
  if (offsetIndex != null) {
// offsetIndex is the registered slot for this variable. Bind offsetIndex 
to offset.
variables.setVariable(offsetIndex, RexLiteral.intValue(offset));
  }
}
{code}
To tie these three phases together, {{CalcitePrepareImpl}} needs to setup the 
variables when it creates a {{PreparedResult}}:
{code:java}
try {
  CatalogReader.THREAD_LOCAL.set(catalogReader);
  final SqlConformance conformance = context.config().conformance();
  internalParameters.put("_conformance", conformance);
  // Get the compiled Bindable instance either from cache or generate a new one.
  bindable = 

[jira] [Commented] (CALCITE-963) Hoist literals

2019-09-18 Thread Scott Reynolds (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932924#comment-16932924
 ] 

Scott Reynolds commented on CALCITE-963:


Giving it some more thought, going to take another stab at this and just do the 
{{RexDynamicParam}}. All of these changes and breakage is a good sign this 
approach is fighting against the system.  I will try another pass tomorrow. 

> Hoist literals
> --
>
> Key: CALCITE-963
> URL: https://issues.apache.org/jira/browse/CALCITE-963
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Priority: Major
>  Labels: pull-request-available
> Attachments: HoistedVariables.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Convert literals into (internal) bind variables so that statements that 
> differ only in literal values can be executed using the same plan.
> In [mail 
> thread|http://mail-archives.apache.org/mod_mbox/calcite-dev/201511.mbox/%3c56437bf8.70...@gmail.com%3E]
>  Homer wrote:
> {quote}Imagine that it is common to run a large number of very similar 
> machine generated queries that just change the literals in the sql query.
> For example (the real queries would be much more complex):
> {code}Select * from emp where empno = 1;
> Select * from emp where empno = 2;
> etc.{code}
> The plan that is likely being generated for these kind of queries is going to 
> be very much the same each time, so to save some time, I would like to 
> recognize that the literals are all that have changed in a query and use the 
> previously optimized execution plan and just replace the literals.{quote}
> I think this could be done as a transform on the initial RelNode tree. It 
> would find literals (RexLiteral), replace them with bind variables 
> (RexDynamicParam) and write the value into a pool. The next statement would 
> go through the same process and the RelNode tree would be identical, but with 
> possibly different values for the bind variables.
> The bind variables are of course internal; not visible from JDBC. When the 
> statement is executed, the bind variables are implicitly bound.
> Statements would be held in a Guava cache.
> This would be enabled by a config parameter. Unfortunately I don't think we 
> could do this by default -- we'd lose optimization power because we would no 
> longer be able to do constant reduction.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-963) Hoist literals

2019-09-20 Thread Scott Reynolds (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934792#comment-16934792
 ] 

Scott Reynolds commented on CALCITE-963:


So took a break from being in this code base and came back with a fresh brain 
and eyes. Went back over my original notes and noticed I hadn't explored what 
{{EnumerableRelImplementor#stash}} method did. After reviewing that method I 
determined that is *exactly* what I needed during code compilation. This method 
takes an object stores it in {{DataContext}} and returns an {{Expression}} that 
calls {{DataContext#get}} to pull out the object. This generates the same code 
for each query of the format like:
{code:sql}
Select * from emp where empno = 1;
Select * from emp where empno = 2;
{code}
as it stashes all the predicates via {{EnumerableRelImplementaor#stash}}

This is awesome! and this pull request and design is objectively worse then 
this implementation. So sorry for the digression in this ticket, it appears I :
a.) Don't understand what variable hoisting really is
b.) Didn't spend enough time researching and understanding all the tools 
Calcite provides.



> Hoist literals
> --
>
> Key: CALCITE-963
> URL: https://issues.apache.org/jira/browse/CALCITE-963
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Priority: Major
>  Labels: pull-request-available
> Attachments: HoistedVariables.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Convert literals into (internal) bind variables so that statements that 
> differ only in literal values can be executed using the same plan.
> In [mail 
> thread|http://mail-archives.apache.org/mod_mbox/calcite-dev/201511.mbox/%3c56437bf8.70...@gmail.com%3E]
>  Homer wrote:
> {quote}Imagine that it is common to run a large number of very similar 
> machine generated queries that just change the literals in the sql query.
> For example (the real queries would be much more complex):
> {code}Select * from emp where empno = 1;
> Select * from emp where empno = 2;
> etc.{code}
> The plan that is likely being generated for these kind of queries is going to 
> be very much the same each time, so to save some time, I would like to 
> recognize that the literals are all that have changed in a query and use the 
> previously optimized execution plan and just replace the literals.{quote}
> I think this could be done as a transform on the initial RelNode tree. It 
> would find literals (RexLiteral), replace them with bind variables 
> (RexDynamicParam) and write the value into a pool. The next statement would 
> go through the same process and the RelNode tree would be identical, but with 
> possibly different values for the bind variables.
> The bind variables are of course internal; not visible from JDBC. When the 
> statement is executed, the bind variables are implicitly bound.
> Statements would be held in a Guava cache.
> This would be enabled by a config parameter. Unfortunately I don't think we 
> could do this by default -- we'd lose optimization power because we would no 
> longer be able to do constant reduction.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CALCITE-3508) Strengthen outer Join to inner if it is under a Filter that discards null values

2019-11-16 Thread Scott Reynolds (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Reynolds updated CALCITE-3508:

Description: 
Today, FilterJoinRule given an Outer Join the rule strengthens it to Inner Join 
when the nullable side contains a filter IS_NOT_NULL. Below is the code.
{code:java}
for (RexNode filter : aboveFilters) {
  if (joinType.generatesNullsOnLeft()
  && Strong.isNotTrue(filter, leftBitmap)) {
joinType = joinType.cancelNullsOnLeft();
  }
  if (joinType.generatesNullsOnRight()
  && Strong.isNotTrue(filter, rightBitmap)) {
joinType = joinType.cancelNullsOnRight();
  }
  if (!joinType.isOuterJoin()) {
break;
  }
}
{code}
This code looks at the filter to determine if it is always true, then it can 
alter the join type by removing the null on that side.

We can see this in the following test RelOptRules#testStrengthenJoinType, which 
executes the following SQL that transforms from a LEFT OUTER JOIN to an INNER 
JOIN
{code:sql}
select *
from dept left join emp on dept.deptno = emp.deptno
where emp.deptno is not null and emp.sal > 100
{code}
This ticket is about broadening the application of this rule to a sql like the 
following:
{code:sql}
select *
from dept left join emp on dept.deptno = emp.deptno
where emp.sal > 100
{code}
 This originally came up on the mailing list: 
[https://mail-archives.apache.org/mod_mbox/calcite-dev/201909.mbox/%3CCAGzrZ38hh%2B8B8TG8gVHUfVKunJZ4L%2BK7rmrinQxpwOHcdJbzGQ%40mail.gmail.com%3E]

and in that thread it was pointed out that there are filters that prevent this 
from being applied:
{code:sql}
SELECT b.title
FROM Book b
LEFT JOIN Author a ON b.author = a.id
WHERE a.name <> 'Victor'
{code}
This means we need to ensure we that the OUTER JOIN doesn't contain – for lack 
of a different term – negation filters. If there is a negation – like NOT_EQUAL 
– the JOIN cannot be strengthened.

  was:
Today, FilterJoinRule given an Outer Join the rule strengthens it to Inner Join 
when the nullable side contains a filter IS_NOT_NULL. Below is the code.
{code:java}
for (RexNode filter : aboveFilters) {
  if (joinType.generatesNullsOnLeft()
  && Strong.isNotTrue(filter, leftBitmap)) {
joinType = joinType.cancelNullsOnLeft();
  }
  if (joinType.generatesNullsOnRight()
  && Strong.isNotTrue(filter, rightBitmap)) {
joinType = joinType.cancelNullsOnRight();
  }
  if (!joinType.isOuterJoin()) {
break;
  }
}
{code}
This code looks at the filter to determine if it is always true, then it can 
alter the join type by removing the null on that side.

We can see this in the following test RelOptRules#testStrengthenJoinType, which 
executes the following SQL that transforms from a LEFT OUTER JOIN to an INNER 
JOIN
{code:sql}
select *
from dept left join emp on dept.deptno = emp.deptno
where emp.deptno is not null and emp.sal > 100
{code}
This ticket is about broadening the application of this rule to a sql like the 
following:
{code:sql}
select *
from dept left join emp on dept.deptno = emp.deptno
where emp.sal > 100
{code}
 This originally came up on the mailing list: 
[https://mail-archives.apache.org/mod_mbox/calcite-dev/201909.mbox/browser]

and in that thread it was pointed out that there are filters that prevent this 
from being applied:
{code:sql}
SELECT b.title
FROM Book b
LEFT JOIN Author a ON b.author = a.id
WHERE a.name <> 'Victor'
{code}
This means we need to ensure we that the OUTER JOIN doesn't contain – for lack 
of a different term – negation filters. If there is a negation – like NOT_EQUAL 
– the JOIN cannot be strengthened.


> Strengthen outer Join to inner if it is under a Filter that discards null 
> values
> 
>
> Key: CALCITE-3508
> URL: https://issues.apache.org/jira/browse/CALCITE-3508
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Scott Reynolds
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Today, FilterJoinRule given an Outer Join the rule strengthens it to Inner 
> Join when the nullable side contains a filter IS_NOT_NULL. Below is the code.
> {code:java}
> for (RexNode filter : aboveFilters) {
>   if (joinType.generatesNullsOnLeft()
>   && Strong.isNotTrue(filter, leftBitmap)) {
> joinType = joinType.cancelNullsOnLeft();
>   }
>   if (joinType.generatesNullsOnRight()
>   && Strong.isNotTrue(filter, rightBitmap)) {
> joinType = joinType.cancelNullsOnRight();
>   }
>   if (!joinType.isOuterJoin()) {
> break;
>   }
> }
> {code}
> This code looks at the filter to determine if it is always true, then it can 
> alter the join type by 

[jira] [Commented] (CALCITE-3508) Strengthen outer Join to inner if it is under a Filter that discards null values

2019-11-17 Thread Scott Reynolds (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976210#comment-16976210
 ] 

Scott Reynolds commented on CALCITE-3508:
-

I am left with one more remaining design question. Today the {{FilterJoinRule}} 
has the following Javadoc
{code:java}
/**
 * Planner rule that pushes filters above and
 * within a join node into the join node and/or its children nodes.
 */
{code}
We should consider doing one of two things:
 1. Updating the javadoc to indicate it can Strengthen the join type
 2. Break the Strengthen join type into a separate rule

Should we move this logic into a separate rule like we do with 
{{JoinProjectTransposeRule}}

> Strengthen outer Join to inner if it is under a Filter that discards null 
> values
> 
>
> Key: CALCITE-3508
> URL: https://issues.apache.org/jira/browse/CALCITE-3508
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Scott Reynolds
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Today, FilterJoinRule given an Outer Join the rule strengthens it to Inner 
> Join when the nullable side contains a filter IS_NOT_NULL. Below is the code.
> {code:java}
> for (RexNode filter : aboveFilters) {
>   if (joinType.generatesNullsOnLeft()
>   && Strong.isNotTrue(filter, leftBitmap)) {
> joinType = joinType.cancelNullsOnLeft();
>   }
>   if (joinType.generatesNullsOnRight()
>   && Strong.isNotTrue(filter, rightBitmap)) {
> joinType = joinType.cancelNullsOnRight();
>   }
>   if (!joinType.isOuterJoin()) {
> break;
>   }
> }
> {code}
> This code looks at the filter to determine if it is always true, then it can 
> alter the join type by removing the null on that side.
> We can see this in the following test RelOptRules#testStrengthenJoinType, 
> which executes the following SQL that transforms from a LEFT OUTER JOIN to an 
> INNER JOIN
> {code:sql}
> select *
> from dept left join emp on dept.deptno = emp.deptno
> where emp.deptno is not null and emp.sal > 100
> {code}
> This ticket is about broadening the application of this rule to a sql like 
> the following:
> {code:sql}
> select *
> from dept left join emp on dept.deptno = emp.deptno
> where emp.sal > 100
> {code}
>  This originally came up on the mailing list: 
> [https://mail-archives.apache.org/mod_mbox/calcite-dev/201909.mbox/%3CCAGzrZ38hh%2B8B8TG8gVHUfVKunJZ4L%2BK7rmrinQxpwOHcdJbzGQ%40mail.gmail.com%3E]
> and in that thread it was pointed out that there are filters that prevent 
> this from being applied:
> {code:sql}
> SELECT b.title
> FROM Book b
> LEFT JOIN Author a ON b.author = a.id
> WHERE a.name <> 'Victor'
> {code}
> This means we need to ensure we that the OUTER JOIN doesn't contain – for 
> lack of a different term – negation filters. If there is a negation – like 
> NOT_EQUAL – the JOIN cannot be strengthened.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-3508) Strengthen JOINs when Filter enforces the nullable side(s) to non-nullable

2019-11-15 Thread Scott Reynolds (Jira)
Scott Reynolds created CALCITE-3508:
---

 Summary: Strengthen JOINs when Filter enforces the nullable 
side(s) to non-nullable
 Key: CALCITE-3508
 URL: https://issues.apache.org/jira/browse/CALCITE-3508
 Project: Calcite
  Issue Type: Improvement
Reporter: Scott Reynolds


Today, FilterJoinRule given an Outer Join the rule strengthens it to Inner Join 
when the nullable side contains a filter IS_NOT_NULL. Below is the code.
{code:java}
for (RexNode filter : aboveFilters) {
  if (joinType.generatesNullsOnLeft()
  && Strong.isNotTrue(filter, leftBitmap)) {
joinType = joinType.cancelNullsOnLeft();
  }
  if (joinType.generatesNullsOnRight()
  && Strong.isNotTrue(filter, rightBitmap)) {
joinType = joinType.cancelNullsOnRight();
  }
  if (!joinType.isOuterJoin()) {
break;
  }
}
{code}
This code looks at the filter to determine if it is always true, then it can 
alter the join type by removing the null on that side.

We can see this in the following test RelOptRules#testStrengthenJoinType, which 
executes the following SQL that transforms from a LEFT OUTER JOIN to an INNER 
JOIN
{code:sql}
select *
from dept left join emp on dept.deptno = emp.deptno
where emp.deptno is not null and emp.sal > 100
{code}
This ticket is about broadening the application of this rule to a sql like the 
following:
{code:sql}
select *
from dept left join emp on dept.deptno = emp.deptno
where emp.sal > 100
{code}
 This originally came up on the mailing list: 
[https://mail-archives.apache.org/mod_mbox/calcite-dev/201909.mbox/browser]

and in that thread it was pointed out that there are filters that prevent this 
from being applied:
{code:sql}
SELECT b.title
FROM Book b
LEFT JOIN Author a ON b.author = a.id
WHERE a.name <> 'Victor'
{code}
This means we need to ensure we that the OUTER JOIN doesn't contain – for lack 
of a different term – negation filters. If there is a negation – like NOT_EQUAL 
– the JOIN cannot be strengthened.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CALCITE-3508) Strengthen Outer Joins based on FILTER clauses

2019-11-15 Thread Scott Reynolds (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Reynolds updated CALCITE-3508:

Summary: Strengthen Outer Joins based on FILTER clauses  (was: Strengthen 
JOINs when Filter enforces the nullable side(s) to non-nullable)

> Strengthen Outer Joins based on FILTER clauses
> --
>
> Key: CALCITE-3508
> URL: https://issues.apache.org/jira/browse/CALCITE-3508
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Scott Reynolds
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Today, FilterJoinRule given an Outer Join the rule strengthens it to Inner 
> Join when the nullable side contains a filter IS_NOT_NULL. Below is the code.
> {code:java}
> for (RexNode filter : aboveFilters) {
>   if (joinType.generatesNullsOnLeft()
>   && Strong.isNotTrue(filter, leftBitmap)) {
> joinType = joinType.cancelNullsOnLeft();
>   }
>   if (joinType.generatesNullsOnRight()
>   && Strong.isNotTrue(filter, rightBitmap)) {
> joinType = joinType.cancelNullsOnRight();
>   }
>   if (!joinType.isOuterJoin()) {
> break;
>   }
> }
> {code}
> This code looks at the filter to determine if it is always true, then it can 
> alter the join type by removing the null on that side.
> We can see this in the following test RelOptRules#testStrengthenJoinType, 
> which executes the following SQL that transforms from a LEFT OUTER JOIN to an 
> INNER JOIN
> {code:sql}
> select *
> from dept left join emp on dept.deptno = emp.deptno
> where emp.deptno is not null and emp.sal > 100
> {code}
> This ticket is about broadening the application of this rule to a sql like 
> the following:
> {code:sql}
> select *
> from dept left join emp on dept.deptno = emp.deptno
> where emp.sal > 100
> {code}
>  This originally came up on the mailing list: 
> [https://mail-archives.apache.org/mod_mbox/calcite-dev/201909.mbox/browser]
> and in that thread it was pointed out that there are filters that prevent 
> this from being applied:
> {code:sql}
> SELECT b.title
> FROM Book b
> LEFT JOIN Author a ON b.author = a.id
> WHERE a.name <> 'Victor'
> {code}
> This means we need to ensure we that the OUTER JOIN doesn't contain – for 
> lack of a different term – negation filters. If there is a negation – like 
> NOT_EQUAL – the JOIN cannot be strengthened.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-3508) Strengthen outer Join to inner if it is under a Filter that discards null values

2019-11-16 Thread Scott Reynolds (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975780#comment-16975780
 ] 

Scott Reynolds commented on CALCITE-3508:
-

Thank you for the subject change it is considerably better! Updated pull 
request title to reflect that as well.

Looking at
{code:java}
JoinProjectTransposeRule{code}
I see that it is using
{code:java}
/** Returns whether all expressions in a list are strong. */
  public static boolean allStrong(List operands)
{code}
Today, this will return true for
{code:java}
a.name <> 'Victor'{code}
. Is that a what we should expect from
{noformat}public static boolean isStrong(RexNode){noformat}
? Perhaps we should extend that to cover more cases?
{code:java}
/**
   * Returns whether a given expression is strong.
   *
   * Examples:
   * 
   *   Returns true for {@code c = 1} since it returns null if and only if
   *   c is null
   *   Returns false for {@code c IS NULL} since it always returns TRUE
   *   or FALSE
   *
   *
   * @param e Expression
   * @return true if the expression is strong, false otherwise
   */
  public static boolean isStrong(RexNode e)
{code}
To handle `RexCall`s with negations etc?

> Strengthen outer Join to inner if it is under a Filter that discards null 
> values
> 
>
> Key: CALCITE-3508
> URL: https://issues.apache.org/jira/browse/CALCITE-3508
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Scott Reynolds
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Today, FilterJoinRule given an Outer Join the rule strengthens it to Inner 
> Join when the nullable side contains a filter IS_NOT_NULL. Below is the code.
> {code:java}
> for (RexNode filter : aboveFilters) {
>   if (joinType.generatesNullsOnLeft()
>   && Strong.isNotTrue(filter, leftBitmap)) {
> joinType = joinType.cancelNullsOnLeft();
>   }
>   if (joinType.generatesNullsOnRight()
>   && Strong.isNotTrue(filter, rightBitmap)) {
> joinType = joinType.cancelNullsOnRight();
>   }
>   if (!joinType.isOuterJoin()) {
> break;
>   }
> }
> {code}
> This code looks at the filter to determine if it is always true, then it can 
> alter the join type by removing the null on that side.
> We can see this in the following test RelOptRules#testStrengthenJoinType, 
> which executes the following SQL that transforms from a LEFT OUTER JOIN to an 
> INNER JOIN
> {code:sql}
> select *
> from dept left join emp on dept.deptno = emp.deptno
> where emp.deptno is not null and emp.sal > 100
> {code}
> This ticket is about broadening the application of this rule to a sql like 
> the following:
> {code:sql}
> select *
> from dept left join emp on dept.deptno = emp.deptno
> where emp.sal > 100
> {code}
>  This originally came up on the mailing list: 
> [https://mail-archives.apache.org/mod_mbox/calcite-dev/201909.mbox/browser]
> and in that thread it was pointed out that there are filters that prevent 
> this from being applied:
> {code:sql}
> SELECT b.title
> FROM Book b
> LEFT JOIN Author a ON b.author = a.id
> WHERE a.name <> 'Victor'
> {code}
> This means we need to ensure we that the OUTER JOIN doesn't contain – for 
> lack of a different term – negation filters. If there is a negation – like 
> NOT_EQUAL – the JOIN cannot be strengthened.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-5025) Update commons-io:commons-io Directory Travesal vulnerabliltiy

2022-02-26 Thread Scott Reynolds (Jira)
Scott Reynolds created CALCITE-5025:
---

 Summary: Update commons-io:commons-io Directory Travesal 
vulnerabliltiy
 Key: CALCITE-5025
 URL: https://issues.apache.org/jira/browse/CALCITE-5025
 Project: Calcite
  Issue Type: Bug
Reporter: Scott Reynolds


Calcite depends commons-io:commons-io 2.4 – which was released on 
{{2012-06-12}} -- which can be exploited to access parent directories. In 
recent months, there have been a fair number of releases for this package and 
[Synk lists this as the only vulnerability it has 
seen|https://snyk.io/vuln/maven:commons-io:commons-io].

Task is simple, bump the version to 2.7 or higher -- if I may suggest just 
going to 2.11.0.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (CALCITE-5226) Resolve security Vuln in Commons-DBCP

2022-08-03 Thread Scott Reynolds (Jira)
Scott Reynolds created CALCITE-5226:
---

 Summary: Resolve security Vuln in Commons-DBCP
 Key: CALCITE-5226
 URL: https://issues.apache.org/jira/browse/CALCITE-5226
 Project: Calcite
  Issue Type: Bug
Reporter: Scott Reynolds


In DBCP-562, the information is leaked. This was fixed in 2021. We should bump 
to to the latest version to resolve it.

 

[https://ossindex.sonatype.org/vulnerability/sonatype-2020-1349?component-type=maven=org.apache.commons%2Fcommons-dbcp2_source=ossindex-client_medium=integration_content=1.7.0]

[https://ossindex.sonatype.org/vulnerability/sonatype-2020-0460?component-type=maven=org.apache.commons%2Fcommons-dbcp2_source=ossindex-client_medium=integration_content=1.7.0]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (CALCITE-5226) Resolve security Vuln in Commons-DBCP

2022-08-03 Thread Scott Reynolds (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Reynolds reassigned CALCITE-5226:
---

Assignee: Scott Reynolds

> Resolve security Vuln in Commons-DBCP
> -
>
> Key: CALCITE-5226
> URL: https://issues.apache.org/jira/browse/CALCITE-5226
> Project: Calcite
>  Issue Type: Bug
>Reporter: Scott Reynolds
>Assignee: Scott Reynolds
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In DBCP-562, the information is leaked. This was fixed in 2021. We should 
> bump to to the latest version to resolve it.
>  
> [https://ossindex.sonatype.org/vulnerability/sonatype-2020-1349?component-type=maven=org.apache.commons%2Fcommons-dbcp2_source=ossindex-client_medium=integration_content=1.7.0]
> [https://ossindex.sonatype.org/vulnerability/sonatype-2020-0460?component-type=maven=org.apache.commons%2Fcommons-dbcp2_source=ossindex-client_medium=integration_content=1.7.0]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5391) JoinOnUniqueToSemiJoinRule should preserve field names, if possible

2022-11-21 Thread Scott Reynolds (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636938#comment-17636938
 ] 

Scott Reynolds commented on CALCITE-5391:
-

I am worried my application – which depends on field names – is going to break 
in other ways because of this. I did some exploration and identified a few 
other places where the field names change – some in meaningful ways. I found 
Rules that rewrote all the names and ignoring all the aliases the user 
provided. And I found – I believe mostly caused by ambiguous {{{}RexNode{}}}s – 
field changes from {{Something}} to {{{}Something0{}}}.

How *should* applications code against the result set? Should applications keep 
their own map of field names to ordinals instead of relying on the field names 
from the root {{RelNode}} ?
h3. How I ran the Tests

I updated {{RelOptFixture#check()}} to compare the before and after plan's 
{{RelDataType}}
{code:none}
diff --git a/testkit/src/main/java/org/apache/calcite/test/RelOptFixture.java 
b/testkit/src/main/java/org/apache/calcite/test/RelOptFixture.java
index ecb474a9e..ed99437c7 100644
--- a/testkit/src/main/java/org/apache/calcite/test/RelOptFixture.java
+++ b/testkit/src/main/java/org/apache/calcite/test/RelOptFixture.java
@@ -400,6 +400,8 @@ private void checkPlanning(boolean unchanged) {
   }
 }
 assertThat(relAfter, relIsValid());
+
+assertThat(relAfter.getRowType(), is(relBefore.getRowType()));
   }
{code}
And then ran {{./gradlew :core:test --tests 
org.apache.calcite.test.RelOptRulesTest}}
h3. What I found
h4. Tests that rewrote all the column names
 # testWithinDistinctCountDistinct
 # testProjectToWindowRuleForMultipleWindows
 # testReduceConstantsWindow
 # testExpressionInWindowFunction
 # testProjectWindowTransposeRuleWithConstants
 # testProjectAggregateMergeSum0

h4. Column Names changed from {{Something}} to {{Something0}}
 # testPushAggregateThroughOuterJoin2
 # testPushAggregateThroughOuterJoin3
 # testAggregateJoinRemove6

h4. Plans with expressions without an alias

There where a few plans that have expressions without a name so they get a name 
change through the planning process. These seemed harmless.

> JoinOnUniqueToSemiJoinRule should preserve field names, if possible
> ---
>
> Key: CALCITE-5391
> URL: https://issues.apache.org/jira/browse/CALCITE-5391
> Project: Calcite
>  Issue Type: Bug
>Affects Versions: 1.32.0
>Reporter: Scott Reynolds
>Assignee: Scott Reynolds
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.33.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The new rule pushes a copy of the [original project without the alias 
> names|https://github.com/apache/calcite/pull/2848/files#r1026959819] 
> {code:java}
> builder.project(project.getProjects());
> {code}
> This results in a new SQL plan without the name fields -- they become {{$fN}} 
> fields in different plans. 
> Small change is required to put the names from the {{RelDataType}} of the 
> {{Project}} similar to the other {{Semijoin}} rule.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5391) JoinOnUniqueToSemiJoinRule should preserve field names, if possible

2022-11-23 Thread Scott Reynolds (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17637977#comment-17637977
 ] 

Scott Reynolds commented on CALCITE-5391:
-

Thanks so much, after reading that through and looking at a bunch of calcite 
code I understand now. In my application, we will have to adopt much the same 
way as the JDBC implementation within Calcite adopts. And had we done that, our 
application wouldn't have broke. Thanks for sharing CALCITE-1584.

I have updated the pull request with tests and links to this issue.

> JoinOnUniqueToSemiJoinRule should preserve field names, if possible
> ---
>
> Key: CALCITE-5391
> URL: https://issues.apache.org/jira/browse/CALCITE-5391
> Project: Calcite
>  Issue Type: Bug
>Affects Versions: 1.32.0
>Reporter: Scott Reynolds
>Assignee: Scott Reynolds
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.33.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The new rule pushes a copy of the [original project without the alias 
> names|https://github.com/apache/calcite/pull/2848/files#r1026959819] 
> {code:java}
> builder.project(project.getProjects());
> {code}
> This results in a new SQL plan without the name fields -- they become {{$fN}} 
> fields in different plans. 
> Small change is required to put the names from the {{RelDataType}} of the 
> {{Project}} similar to the other {{Semijoin}} rule.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5391) JoinOnUniqueToSemiJoinRule removes a Projections aliases

2022-11-18 Thread Scott Reynolds (Jira)
Scott Reynolds created CALCITE-5391:
---

 Summary: JoinOnUniqueToSemiJoinRule removes a Projections aliases
 Key: CALCITE-5391
 URL: https://issues.apache.org/jira/browse/CALCITE-5391
 Project: Calcite
  Issue Type: Bug
Affects Versions: 1.32.0
Reporter: Scott Reynolds
Assignee: Scott Reynolds


The new rule pushes a copy of the [original project without the alias 
names|https://github.com/apache/calcite/pull/2848/files#r1026959819] 

{code:java}
builder.project(project.getProjects());
{code}

This results in a new SQL plan without the name fields -- they become {{$fN}} 
fields in different plans. 

Small change is required to put the names from the {{RelDataType}} of the 
{{Project}} similar to the other {{Semijoin}} rule.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5391) JoinOnUniqueToSemiJoinRule removes a Projections aliases

2022-11-21 Thread Scott Reynolds (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636853#comment-17636853
 ] 

Scott Reynolds commented on CALCITE-5391:
-

bq. I think "JoinOnUniqueToSemiJoinRule should preserve field names, if 
possible" would be a better summary and commit message.

If possible? Is there a case where the plan wouldn't preserve field names? 

> JoinOnUniqueToSemiJoinRule removes a Projections aliases
> 
>
> Key: CALCITE-5391
> URL: https://issues.apache.org/jira/browse/CALCITE-5391
> Project: Calcite
>  Issue Type: Bug
>Affects Versions: 1.32.0
>Reporter: Scott Reynolds
>Assignee: Scott Reynolds
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.33.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The new rule pushes a copy of the [original project without the alias 
> names|https://github.com/apache/calcite/pull/2848/files#r1026959819] 
> {code:java}
> builder.project(project.getProjects());
> {code}
> This results in a new SQL plan without the name fields -- they become {{$fN}} 
> fields in different plans. 
> Small change is required to put the names from the {{RelDataType}} of the 
> {{Project}} similar to the other {{Semijoin}} rule.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-5391) JoinOnUniqueToSemiJoinRule should preserve field names, if possible

2022-11-21 Thread Scott Reynolds (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Reynolds updated CALCITE-5391:

Summary: JoinOnUniqueToSemiJoinRule should preserve field names, if 
possible  (was: JoinOnUniqueToSemiJoinRule should preserve field names)

> JoinOnUniqueToSemiJoinRule should preserve field names, if possible
> ---
>
> Key: CALCITE-5391
> URL: https://issues.apache.org/jira/browse/CALCITE-5391
> Project: Calcite
>  Issue Type: Bug
>Affects Versions: 1.32.0
>Reporter: Scott Reynolds
>Assignee: Scott Reynolds
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.33.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The new rule pushes a copy of the [original project without the alias 
> names|https://github.com/apache/calcite/pull/2848/files#r1026959819] 
> {code:java}
> builder.project(project.getProjects());
> {code}
> This results in a new SQL plan without the name fields -- they become {{$fN}} 
> fields in different plans. 
> Small change is required to put the names from the {{RelDataType}} of the 
> {{Project}} similar to the other {{Semijoin}} rule.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-5391) JoinOnUniqueToSemiJoinRule should preserve field names

2022-11-21 Thread Scott Reynolds (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Reynolds updated CALCITE-5391:

Summary: JoinOnUniqueToSemiJoinRule should preserve field names  (was: 
JoinOnUniqueToSemiJoinRule removes a Projections aliases)

> JoinOnUniqueToSemiJoinRule should preserve field names
> --
>
> Key: CALCITE-5391
> URL: https://issues.apache.org/jira/browse/CALCITE-5391
> Project: Calcite
>  Issue Type: Bug
>Affects Versions: 1.32.0
>Reporter: Scott Reynolds
>Assignee: Scott Reynolds
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.33.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The new rule pushes a copy of the [original project without the alias 
> names|https://github.com/apache/calcite/pull/2848/files#r1026959819] 
> {code:java}
> builder.project(project.getProjects());
> {code}
> This results in a new SQL plan without the name fields -- they become {{$fN}} 
> fields in different plans. 
> Small change is required to put the names from the {{RelDataType}} of the 
> {{Project}} similar to the other {{Semijoin}} rule.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-5226) Resolve security Vulnnerability in Commons-DBCP

2023-03-16 Thread Scott Reynolds (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Reynolds updated CALCITE-5226:

Summary: Resolve security Vulnnerability in Commons-DBCP  (was: Resolve 
security Vuln in Commons-DBCP)

> Resolve security Vulnnerability in Commons-DBCP
> ---
>
> Key: CALCITE-5226
> URL: https://issues.apache.org/jira/browse/CALCITE-5226
> Project: Calcite
>  Issue Type: Bug
>Reporter: Scott Reynolds
>Assignee: Scott Reynolds
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In DBCP-562, the information is leaked. This was fixed in 2021. We should 
> bump to to the latest version to resolve it.
>  
> [https://ossindex.sonatype.org/vulnerability/sonatype-2020-1349?component-type=maven=org.apache.commons%2Fcommons-dbcp2_source=ossindex-client_medium=integration_content=1.7.0]
> [https://ossindex.sonatype.org/vulnerability/sonatype-2020-0460?component-type=maven=org.apache.commons%2Fcommons-dbcp2_source=ossindex-client_medium=integration_content=1.7.0]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)