[jira] [Commented] (CALCITE-963) Hoist literals

2019-10-10 Thread Danny Chen (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949073#comment-16949073
 ] 

Danny Chen commented on CALCITE-963:


[~ScottReynolds] Sorry to bother, can you explain more about 
EnumerableRelImplementaor#stash ? I only see one direct invoke[1]

> This method takes an object stores it in DataContext and returns an 
> Expression that calls DataContext#get to pull out the object.

I didn't see a logic like what you described. It seems that 
EnumerableRelImplementaor#stash did some expression cache, but for constant, it 
always returns a constant expression directly.

> Hoist literals
> --
>
> Key: CALCITE-963
> URL: https://issues.apache.org/jira/browse/CALCITE-963
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Priority: Major
>  Labels: pull-request-available
> Attachments: HoistedVariables.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Convert literals into (internal) bind variables so that statements that 
> differ only in literal values can be executed using the same plan.
> In [mail 
> thread|http://mail-archives.apache.org/mod_mbox/calcite-dev/201511.mbox/%3c56437bf8.70...@gmail.com%3E]
>  Homer wrote:
> {quote}Imagine that it is common to run a large number of very similar 
> machine generated queries that just change the literals in the sql query.
> For example (the real queries would be much more complex):
> {code}Select * from emp where empno = 1;
> Select * from emp where empno = 2;
> etc.{code}
> The plan that is likely being generated for these kind of queries is going to 
> be very much the same each time, so to save some time, I would like to 
> recognize that the literals are all that have changed in a query and use the 
> previously optimized execution plan and just replace the literals.{quote}
> I think this could be done as a transform on the initial RelNode tree. It 
> would find literals (RexLiteral), replace them with bind variables 
> (RexDynamicParam) and write the value into a pool. The next statement would 
> go through the same process and the RelNode tree would be identical, but with 
> possibly different values for the bind variables.
> The bind variables are of course internal; not visible from JDBC. When the 
> statement is executed, the bind variables are implicitly bound.
> Statements would be held in a Guava cache.
> This would be enabled by a config parameter. Unfortunately I don't think we 
> could do this by default -- we'd lose optimization power because we would no 
> longer be able to do constant reduction.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-963) Hoist literals

2019-09-20 Thread Scott Reynolds (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934792#comment-16934792
 ] 

Scott Reynolds commented on CALCITE-963:


So took a break from being in this code base and came back with a fresh brain 
and eyes. Went back over my original notes and noticed I hadn't explored what 
{{EnumerableRelImplementor#stash}} method did. After reviewing that method I 
determined that is *exactly* what I needed during code compilation. This method 
takes an object stores it in {{DataContext}} and returns an {{Expression}} that 
calls {{DataContext#get}} to pull out the object. This generates the same code 
for each query of the format like:
{code:sql}
Select * from emp where empno = 1;
Select * from emp where empno = 2;
{code}
as it stashes all the predicates via {{EnumerableRelImplementaor#stash}}

This is awesome! and this pull request and design is objectively worse then 
this implementation. So sorry for the digression in this ticket, it appears I :
a.) Don't understand what variable hoisting really is
b.) Didn't spend enough time researching and understanding all the tools 
Calcite provides.



> Hoist literals
> --
>
> Key: CALCITE-963
> URL: https://issues.apache.org/jira/browse/CALCITE-963
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Priority: Major
>  Labels: pull-request-available
> Attachments: HoistedVariables.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Convert literals into (internal) bind variables so that statements that 
> differ only in literal values can be executed using the same plan.
> In [mail 
> thread|http://mail-archives.apache.org/mod_mbox/calcite-dev/201511.mbox/%3c56437bf8.70...@gmail.com%3E]
>  Homer wrote:
> {quote}Imagine that it is common to run a large number of very similar 
> machine generated queries that just change the literals in the sql query.
> For example (the real queries would be much more complex):
> {code}Select * from emp where empno = 1;
> Select * from emp where empno = 2;
> etc.{code}
> The plan that is likely being generated for these kind of queries is going to 
> be very much the same each time, so to save some time, I would like to 
> recognize that the literals are all that have changed in a query and use the 
> previously optimized execution plan and just replace the literals.{quote}
> I think this could be done as a transform on the initial RelNode tree. It 
> would find literals (RexLiteral), replace them with bind variables 
> (RexDynamicParam) and write the value into a pool. The next statement would 
> go through the same process and the RelNode tree would be identical, but with 
> possibly different values for the bind variables.
> The bind variables are of course internal; not visible from JDBC. When the 
> statement is executed, the bind variables are implicitly bound.
> Statements would be held in a Guava cache.
> This would be enabled by a config parameter. Unfortunately I don't think we 
> could do this by default -- we'd lose optimization power because we would no 
> longer be able to do constant reduction.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-963) Hoist literals

2019-09-18 Thread Scott Reynolds (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932924#comment-16932924
 ] 

Scott Reynolds commented on CALCITE-963:


Giving it some more thought, going to take another stab at this and just do the 
{{RexDynamicParam}}. All of these changes and breakage is a good sign this 
approach is fighting against the system.  I will try another pass tomorrow. 

> Hoist literals
> --
>
> Key: CALCITE-963
> URL: https://issues.apache.org/jira/browse/CALCITE-963
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Priority: Major
>  Labels: pull-request-available
> Attachments: HoistedVariables.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Convert literals into (internal) bind variables so that statements that 
> differ only in literal values can be executed using the same plan.
> In [mail 
> thread|http://mail-archives.apache.org/mod_mbox/calcite-dev/201511.mbox/%3c56437bf8.70...@gmail.com%3E]
>  Homer wrote:
> {quote}Imagine that it is common to run a large number of very similar 
> machine generated queries that just change the literals in the sql query.
> For example (the real queries would be much more complex):
> {code}Select * from emp where empno = 1;
> Select * from emp where empno = 2;
> etc.{code}
> The plan that is likely being generated for these kind of queries is going to 
> be very much the same each time, so to save some time, I would like to 
> recognize that the literals are all that have changed in a query and use the 
> previously optimized execution plan and just replace the literals.{quote}
> I think this could be done as a transform on the initial RelNode tree. It 
> would find literals (RexLiteral), replace them with bind variables 
> (RexDynamicParam) and write the value into a pool. The next statement would 
> go through the same process and the RelNode tree would be identical, but with 
> possibly different values for the bind variables.
> The bind variables are of course internal; not visible from JDBC. When the 
> statement is executed, the bind variables are implicitly bound.
> Statements would be held in a Guava cache.
> This would be enabled by a config parameter. Unfortunately I don't think we 
> could do this by default -- we'd lose optimization power because we would no 
> longer be able to do constant reduction.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-963) Hoist literals

2019-09-18 Thread Scott Reynolds (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932891#comment-16932891
 ] 

Scott Reynolds commented on CALCITE-963:


{quote} * I was surprised that HoistedVariables was added to so many API calls. 
Maybe it could belong to another piece of the preparation context{quote}
I was tempted to add it to {{DataContext}}. It does have to land into 
{{CalciteSignature}}
{code:java}
public CalciteSignature(String sql,
List parameterList,
Map internalParameters,
RelDataType rowType,
List columns,
Meta.CursorFactory cursorFactory,
CalciteSchema rootSchema,
List collationList,
long maxRowCount,
Bindable bindable,
Meta.StatementType statementType,
HoistedVariables variables) {
  super(columns, sql, parameterList, internalParameters, cursorFactory,
  statementType);
  this.rowType = rowType;
  this.rootSchema = rootSchema;
  this.collationList = collationList;
  this.maxRowCount = maxRowCount;
  this.bindable = bindable;
  this.variables = variables;
}

public Enumerable enumerable(DataContext dataContext) {
  Enumerable enumerable = bindable.bind(dataContext, variables);
  if (maxRowCount >= 0) {
// Apply limit. In JDBC 0 means "no limit". But for us, -1 means
// "no limit", and 0 is a valid limit.
enumerable = EnumerableDefaults.take(enumerable, maxRowCount);
  }
  return enumerable;
}

{code} 

{quote}I was surprised that this work touched so much Enumerable code. An 
alternative approach would be to transform a RelNode tree, early in the 
planning process, transforming some RexLiteral instances into RexDynamicParam. 
The rest of the planning process would proceed as if the user had provided a 
statement with bind variables. I know you state that this was a non-goal, but 
why was it not a goal? It probably would have been a lot simpler, and it would 
have worked with conventions besides Enumerable.{quote}

Ya I should add color to this. Given the discussions on the mailing list I 
would like to add an optimization to our use case. Our Fact Tables contain a 
single account hierarchy represented by a column {{sub_account}}. Often a Fact 
will not be associated with an {{sub_account}} and so instead of storing that 
as null and dealing with missing right hand row,  we store a sentinel value in 
the Fact so our joins to the account dimension are simpler. So when a query 
comes in that for all Facts that do not contain a sub account, we want to 
change the cost calculation of that filter as {{sub_account_sid == 'IS_NULL'}} 
is not as selective as {{sub_account_sid = 'AC128485'}}. So my understanding 
is, by translating into a {{RexDynamicParam}}, we lose out on this optimization.

> Hoist literals
> --
>
> Key: CALCITE-963
> URL: https://issues.apache.org/jira/browse/CALCITE-963
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Priority: Major
>  Labels: pull-request-available
> Attachments: HoistedVariables.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Convert literals into (internal) bind variables so that statements that 
> differ only in literal values can be executed using the same plan.
> In [mail 
> thread|http://mail-archives.apache.org/mod_mbox/calcite-dev/201511.mbox/%3c56437bf8.70...@gmail.com%3E]
>  Homer wrote:
> {quote}Imagine that it is common to run a large number of very similar 
> machine generated queries that just change the literals in the sql query.
> For example (the real queries would be much more complex):
> {code}Select * from emp where empno = 1;
> Select * from emp where empno = 2;
> etc.{code}
> The plan that is likely being generated for these kind of queries is going to 
> be very much the same each time, so to save some time, I would like to 
> recognize that the literals are all that have changed in a query and use the 
> previously optimized execution plan and just replace the literals.{quote}
> I think this could be done as a transform on the initial RelNode tree. It 
> would find literals (RexLiteral), replace them with bind variables 
> (RexDynamicParam) and write the value into a pool. The next statement would 
> go through the same process and the RelNode tree would be identical, but with 
> possibly different values for the bind variables.
> The bind variables are of course internal; not visible from JDBC. When the 
> statement is executed, the bind variables are implicitly bound.
> Statements would be held in a Guava cache.
> This would be enabled by a config parameter. Unfortunately I don't think we 
> could do this by default -- we'd lose optimization power because we would no 
> longer be able to do constant reduction.



--
This message was 

[jira] [Commented] (CALCITE-963) Hoist literals

2019-09-18 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932872#comment-16932872
 ] 

Julian Hyde commented on CALCITE-963:
-

I looked at the PR briefly:
* I was surprised that HoistedVariables was added to so many API calls. Maybe 
it could belong to another piece of the preparation context?
* As a plural noun, HoistedVariables is a poor name for a class, in my opinion. 
An instance of HoistedVariables is an object (singular) so it will be difficult 
to write good javadoc sentences describing it. 
* I was surprised that this work touched so much Enumerable code. An 
alternative approach would be to transform a RelNode tree, early in the 
planning process, transforming some RexLiteral instances into RexDynamicParam. 
The rest of the planning process would proceed as if the user had provided a 
statement with bind variables. I know you state that this was a non-goal, but 
why was it not a goal? It probably would have been a lot simpler, and it would 
have worked with conventions besides Enumerable.

> Hoist literals
> --
>
> Key: CALCITE-963
> URL: https://issues.apache.org/jira/browse/CALCITE-963
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Priority: Major
>  Labels: pull-request-available
> Attachments: HoistedVariables.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Convert literals into (internal) bind variables so that statements that 
> differ only in literal values can be executed using the same plan.
> In [mail 
> thread|http://mail-archives.apache.org/mod_mbox/calcite-dev/201511.mbox/%3c56437bf8.70...@gmail.com%3E]
>  Homer wrote:
> {quote}Imagine that it is common to run a large number of very similar 
> machine generated queries that just change the literals in the sql query.
> For example (the real queries would be much more complex):
> {code}Select * from emp where empno = 1;
> Select * from emp where empno = 2;
> etc.{code}
> The plan that is likely being generated for these kind of queries is going to 
> be very much the same each time, so to save some time, I would like to 
> recognize that the literals are all that have changed in a query and use the 
> previously optimized execution plan and just replace the literals.{quote}
> I think this could be done as a transform on the initial RelNode tree. It 
> would find literals (RexLiteral), replace them with bind variables 
> (RexDynamicParam) and write the value into a pool. The next statement would 
> go through the same process and the RelNode tree would be identical, but with 
> possibly different values for the bind variables.
> The bind variables are of course internal; not visible from JDBC. When the 
> statement is executed, the bind variables are implicitly bound.
> Statements would be held in a Guava cache.
> This would be enabled by a config parameter. Unfortunately I don't think we 
> could do this by default -- we'd lose optimization power because we would no 
> longer be able to do constant reduction.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-963) Hoist literals

2019-09-18 Thread Scott Reynolds (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932829#comment-16932829
 ] 

Scott Reynolds commented on CALCITE-963:


h1. Goal

When a query is issued to Calcite it is parsed, optimized and then generates a 
String of Java Class that implements {{Bindable}}. {{EnumerableInterpretable}} 
creates this string and checks to see if that string exists in 
{{com.google.common.cache}} and if it doesn't it will call into a Java 
compiler. Compilation process can take a considerable amount of time, Apache 
Kylin reported 50 to 150ms of additional computation time. Today, Apache 
Calcite will generate unique Java Class strings whenever any part of the query 
changes. This document details out the design and implementation of a hoisting 
technique within Apache Calcite. This design and implementation greatly 
increases the cache hit rate of {{EnumerableInterpretable}}'s 
{{BINDABLE_CACHE}}.
h1. Non Goals

This implementation is not designed to change the planning process. It does not 
transform {{RexLiteral}} into {{RexDynamicParam}}, and doesn't change the cost 
calculation of the query.
h1. Implementation Details

After a query has been optimized there are three phases that remaining phases 
to the query:
 # Generating the Java code
 # Binding Hoisted Variables
 # Runtime execution via {{Bindable.bind(DataContext, HoistedVariables)}}

Each of these phases will interact with a new class called {{HoistedVariables}}
 [file:HoistedVariables.png|file:///HoistedVariables.png]

Each of these methods are used in the above three phases to hoist a variable 
from within the query into the runtime execution of the {{Bindable}}.

The method {{implement}} of the interface {{EnumerableRel}} is used to generate 
the Java code in phase one. Each of these {{RelNode}} can now call 
{{registerVariable(String)}} to allocate a {{Slot}} for their unbound value. 
This {{Slot}} is reserved for their use and is unique for the query plan. When 
a {{RelNode}} registers a variable it needs to save that {{Slot}} into a 
property so it can be referenced in phase 2. This {{Slot}} is then referenced 
in code generation by calling {{EnumerableRel.lookupValue}} which returns an 
{{Expression}} that will extract the bound value at for the {{Slot}}.

Below is a snippet from {{EnumerableLimit}} implementation of {{implement}} 
that uses {{HoistedVariables}}.
{code:java}
Expression v = builder.append("child", result.block);
if (offset != null) {
  if (offset instanceof RexDynamicParam) {
v = getDynamicExpression((RexDynamicParam) offset);
  } else {
// Register with Hoisted Variable here
offsetIndex = variables.registerVariable("offset");
v = builder.append(
"offset",
Expressions.call(
  v,
  BuiltInMethod.SKIP.method,
  // At runtime, fetch the bound variable. This returns the Java code to do 
that.
  EnumerableRel.lookupValue(offsetIndex, Integer.class)));
  }
}
if (fetch != null) {
  if (fetch instanceof RexDynamicParam) {
v = getDynamicExpression((RexDynamicParam) fetch);
  } else {
// Register with Hoisted Variable here
this.fetchIndex = variables.registerVariable("fetch");
v = builder.append(
  "fetch",
  Expressions.call(
  v,
  BuiltInMethod.TAKE.method,
  // At runtime, fetch the bound variable. This returns the Java code to do 
that.
  EnumerableRel.lookupValue(fetchIndex, Integer.class)));
  }
}
{code}
The second phase of the query execution is where registered {{Slots}} get 
bound. To this, our change adds a new optional method to {{Bindable}} called 
{{hoistVariables}}. This method is where an instance of {{EnumerableRel}} 
extracts the values out of the query plan and binds them into the 
{{HoistedVariables}} instance just prior to executing the query. Below is 
{{EnumerableLimit}} implementation:
{code:java}
@Override public void hoistedVariables(HoistedVariables variables) {
  getInputs()
  .stream()
  .forEach(rel -> {
final EnumerableRel enumerable = (EnumerableRel) rel;
enumerable.hoistedVariables(variables);
  });
  if (fetchIndex != null) {
// fetchIndex is the registered slot for this variable. Bind fetchIndex to 
fetch
variables.setVariable(fetchIndex, RexLiteral.intValue(fetch));
  }
  if (offsetIndex != null) {
// offsetIndex is the registered slot for this variable. Bind offsetIndex 
to offset.
variables.setVariable(offsetIndex, RexLiteral.intValue(offset));
  }
}
{code}
To tie these three phases together, {{CalcitePrepareImpl}} needs to setup the 
variables when it creates a {{PreparedResult}}:
{code:java}
try {
  CatalogReader.THREAD_LOCAL.set(catalogReader);
  final SqlConformance conformance = context.config().conformance();
  internalParameters.put("_conformance", conformance);
  // Get the compiled Bindable instance either from cache or generate a new one.
  bindable =