[jira] [Comment Edited] (CALCITE-2741) Add operator table with Hive-specific built-in functions

2019-05-06 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833649#comment-16833649
 ] 

Lai Zhou edited comment on CALCITE-2741 at 5/6/19 9:49 AM:
---

[~zabetak], I also think it is not exactly an adapter. My initial goal was to build a real-time, high-performance in-memory SQL engine on top of Calcite that supports the Hive SQL dialect.

I tried the JDBC interface first, but I ran into some issues:
 # custom config issue: For every JDBC connection we need to put the data of the current session into the schema, which means the current schema is bound to the current session.

So the static SchemaFactory can't handle this; we need to introduce DDL functions like those in the calcite-server module. The SqlDdlNodes in the calcite-server module populate the table through the FrameworkConfig API.

When we execute a SQL statement like
{code:java}
create table t1 as select * from t2 where t2.id>100{code}
the populate method is invoked; see [SqlDdlNodes.java#L221|https://github.com/apache/calcite/blob/0d504d20d47542e8d461982512ae0e7a94e4d6cb/server/src/main/java/org/apache/calcite/sql/ddl/SqlDdlNodes.java#L221]. We need to customize the FrameworkConfig here, including the OperatorTable, SqlConformance, and other custom configs. By the way, the FrameworkConfig should be built with all the configs from the current CalcitePrepare.Context rather than only the rootSchema; that was a bug.

Also, the config options of CalcitePrepare.Context are just a subset of FrameworkConfig; most of the time we need to use the FrameworkConfig API directly to build a new SQL engine.
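The shape of that bug can be sketched in a self-contained way (the class and field names below are hypothetical stand-ins for CalcitePrepare.Context and FrameworkConfig, not the real Calcite API): a config built from only one field of the context silently drops every other custom setting, so the builder should copy all of them.

```java
/** Hypothetical stand-ins for CalcitePrepare.Context and FrameworkConfig. */
public class EngineConfig {
  final String rootSchema;     // stand-in for the root SchemaPlus
  final String operatorTable;  // stand-in for the SqlOperatorTable
  final String conformance;    // stand-in for the SqlConformance

  private EngineConfig(String rootSchema, String operatorTable, String conformance) {
    this.rootSchema = rootSchema;
    this.operatorTable = operatorTable;
    this.conformance = conformance;
  }

  /** The buggy shape: only the root schema survives; custom settings fall back to defaults. */
  static EngineConfig fromRootSchemaOnly(Context ctx) {
    return new EngineConfig(ctx.rootSchema, "standard", "default");
  }

  /** The fix: build the config from every setting of the current context. */
  static EngineConfig fromContext(Context ctx) {
    return new EngineConfig(ctx.rootSchema, ctx.operatorTable, ctx.conformance);
  }

  /** Stand-in for CalcitePrepare.Context carrying the session's settings. */
  static class Context {
    String rootSchema = "root";
    String operatorTable = "hive";
    String conformance = "hive";
  }
}
```

In the real code the analogous fix is to feed every relevant setting of the prepare context into the FrameworkConfig builder, not just the schema.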

When we execute a SQL statement like
{code:java}
select * from t2 where t2.id>100
{code}
CalcitePrepareImpl handles the flow. It does a similar thing, but some configs are hard-coded, such as the RexExecutor and Programs.

When implementing the EnumerableRel, the RelImplementor might also need to be customized; see the example [HiveEnumerableRelImplementor.java|https://github.com/51nb/marble/blob/master/marble-table-hive/src/main/java/org/apache/calcite/adapter/hive/HiveEnumerableRelImplementor.java].

Currently the JDBC interface doesn't provide a way to customize these configs, so we proposed a new Table API, inspired by Apache Flink, to simplify the use of Calcite when building a new SQL engine.

      2. cache issue: It's not easy to cache the whole SQL plan when using the JDBC interface to handle a query, due to its multi-phase processing flow, but it is very easy to do with the Table API; see [TableEnv.java#L412|https://github.com/51nb/marble/blob/master/marble-table/src/main/java/org/apache/calcite/table/TableEnv.java#L412].
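The whole-plan cache idea can be sketched independently of the Calcite types; this is a minimal illustration of the pattern (the class and method names are hypothetical, not the actual TableEnv API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Minimal sketch of a whole-plan cache keyed by the SQL text.
 * The type parameter P stands in for whatever a compiled, executable
 * plan is in the real engine; names here are illustrative only.
 */
public class PlanCache<P> {
  private final Map<String, P> cache = new ConcurrentHashMap<>();
  private final Function<String, P> compiler;

  public PlanCache(Function<String, P> compiler) {
    this.compiler = compiler;
  }

  /** Parses/validates/optimizes only on first use; every later call is a cache hit. */
  public P plan(String sql) {
    return cache.computeIfAbsent(sql, compiler);
  }
}
```

This only pays off when the query text is deterministic, which matches the conditions listed in the summary.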

Summary:

The proposed Table API makes it easy to configure the SQL engine and cache the whole SQL plan to improve query performance. It fits scenarios that satisfy these conditions:

the data sources are deterministic and already in memory, and no computation needs to be pushed down;

-the SQL queries are deterministic, without dynamic parameters, so the whole-plan cache will be helpful (we can also use placeholders in the execution plan to cache dynamic queries).-



[jira] [Comment Edited] (CALCITE-2741) Add operator table with Hive-specific built-in functions

2019-04-30 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830061#comment-16830061
 ] 

Lai Zhou edited comment on CALCITE-2741 at 4/30/19 9:40 AM:


hi, [~julianhyde], [~zabetak], [~zhztheplayer], [~hyuan], [~francischuang]

I created a new adapter for Calcite that supports Hive SQL queries on datasets.

Since the extensions are based on Calcite 1.18.0, I pushed the project to a new codebase: [https://github.com/51nb/marble]

And I proposed a Table API to make it easy to execute a SQL query.

We use it in our company's core financial business to unify the way we compute lots of model variables.

This project shows how we extend the Calcite core to support Hive SQL queries; it may be helpful to people who want to build a customized SQL engine on top of Calcite.

 

 

 

 



> Add operator table with Hive-specific built-in functions
> 
>
> Key: CALCITE-2741
> URL: https://issues.apache.org/jira/browse/CALCITE-2741
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Reporter: Lai Zhou
>Priority: Minor
>
> [~julianhyde],
> I wrote a Hive adapter for Calcite to support Hive SQL, including
> UDF、UDAF、UDTF and some of the SqlSpecialOperators.
> What do you think of supporting a direct implementation of Hive SQL like this?
> I think it will be valuable when someone wants to migrate their Hive ETL jobs
> to a real-time scene.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2741) Add operator table with Hive-specific built-in functions

2019-01-09 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738061#comment-16738061
 ] 

Lai Zhou edited comment on CALCITE-2741 at 1/9/19 10:10 AM:


hi, [~zabetak], can you please give me a correct IntelliJ code-style template?

I already have one from [https://gist.github.com/gianm/27a4e3cad99d7b9b6513b6885d3cfcc9], but there are still a lot of errors when running maven-checkstyle:
{code:java}
[ERROR] /Users/zhoulai/Downloads/big-data/calcite/core/src/main/java/org/apache/calcite/util/Util.java:2105:13: ';' is followed by whitespace. [EmptyForIteratorPad]
[ERROR] /Users/zhoulai/Downloads/big-data/calcite/core/src/main/java/org/apache/calcite/util/Util.java:2110:13: ';' is preceded with whitespace. [NoWhitespaceBefore]
[ERROR] /Users/zhoulai/Downloads/big-data/calcite/core/src/main/java/org/apache/calcite/util/Util.java:2110:15: ';' is followed by whitespace. [EmptyForIteratorPad]
[ERROR] /Users/zhoulai/Downloads/big-data/calcite/core/src/main/java/org/apache/calcite/util/Util.java:2361: 'toImmutableList' have incorrect indentation level 2, expected level should be 6. [Indentation]
[ERROR] /Users/zhoulai/Downloads/big-data/calcite/core/src/main/java/org/apache/calcite/config/CalciteConnectionConfigImpl.java:121: Line is longer than 100 characters (found 105). [LineLength]
[ERROR] /Users/zhoulai/Downloads/big-data/calcite/core/src/main/java/org/apache/calcite/config/CalciteConnectionConfigImpl.java:121:31: '=' is not preceded with whitespace. [WhitespaceAround]
[ERROR] /Users/zhoulai/Downloads/big-data/calcite/core/src/main/java/org/apache/calcite/config/CalciteConnectionConfigImpl.java:121:32: '=' is not followed by whitespace. [WhitespaceAround]
[ERROR] /Users/zhoulai/Downloads/big-data/calcite/core/src/main/java/org/apache/calcite/config/CalciteConnectionConfigImpl.java:122:30: '=' is not preceded with whitespace. [WhitespaceAround]
[ERROR] /Users/zhoulai/Downloads/big-data/calcite/core/src/main/java/org/apache/calcite/config/CalciteConnectionConfigImpl.java:122:31: '=' is not followed by whitespace. [WhitespaceAround]
[ERROR] /Users/zhoulai/Downloads/big-data/calcite/core/src/main/java/org/apache/calcite/config/CalciteConnectionConfigImpl.java:123:24: '=' is not preceded with whitespace. [WhitespaceAround]
[ERROR] /Users/zhoulai/Downloads/big-data/calcite/core/src/main/java/org/apache/calcite/config/CalciteConnectionConfigImpl.java:126:8: 'catch' is not preceded with whitespace. [WhitespaceAround]
[ERROR] /Users/zhoulai/Downloads/big-data/calcite/core/src/main/java/org/apache/calcite/config/CalciteConnectionConfigImpl.java:126:8: '}' is not followed by whitespace. [WhitespaceAround]
[ERROR] /Users/zhoulai/Downloads/big-data/calcite/core/src/main/java/org/apache/calcite/config/CalciteConnectionConfigImpl.java:126:28: '{' is not preceded with whitespace. [WhitespaceAround]
[ERROR] /Users/zhoulai/Downloads/big-data/calcite/core/src/main/java/org/apache/calcite/config/CalciteConnectionConfigImpl.java:127: Line is longer than 100 characters (found 132). [LineLength]
[ERROR] /Users/zhoulai/Downloads/big-data/calcite/core/src/main/java/org/apache/calcite/config/CalciteConnectionConfigImpl.java:127:127: ',' is preceded with whitespace. [NoWhitespaceBefore]
[ERROR] /Users/zhoulai/Downloads/big-data/calcite/core/src/main/java/org/apache/calcite/config/CalciteConnectionConfigImpl.java:127:129: ',' is not followed by whitespace. [WhitespaceAfter]
[ERROR] /Users/zhoulai/Downloads/big-data/calcite/core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java:212:13: ';' is preceded with whitespace. [NoWhitespaceBefore]
[ERROR] /Users/zhoulai/Downloads/big-data/calcite/core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java:212:15: ';' is followed by whitespace. [EmptyForIteratorPad]
{code}
 



[jira] [Comment Edited] (CALCITE-2741) Add operator table with Hive-specific built-in functions

2019-01-07 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16736820#comment-16736820
 ] 

Lai Zhou edited comment on CALCITE-2741 at 1/8/19 7:20 AM:
---

[~julianhyde], is there a right way to add local field declarations into the bind method of the 'Baz' class?
{code:java}
public org.apache.calcite.linq4j.Enumerable bind(final org.apache.calcite.DataContext root)
{code}
I wrote a new NotNullImplementor for Hive operators that returns an expression like
{code:java}
org.apache.calcite.hivesql.function.HiveUDFInvoke.invokeGenericUdfGetBoolean(udfInstance_1, new Object[] {...})
{code}
where udfInstance_1 is a Hive GenericUDF instance that should be constructed at the beginning of the bind method block, like
{code:java}
public org.apache.calcite.linq4j.Enumerable bind(final org.apache.calcite.DataContext root) {
  final org.apache.hadoop.hive.ql.udf.generic.GenericUDF udfInstance_2 =
      org.apache.calcite.hivesql.function.HiveUDFInvoke.createGenericUDFInstance("OR", org.apache.calcite.sql.SqlSyntax.BINARY);
  final org.apache.hadoop.hive.ql.udf.generic.GenericUDF udfInstance_3 =
      org.apache.calcite.hivesql.function.HiveUDFInvoke.createGenericUDFInstance("AND", org.apache.calcite.sql.SqlSyntax.BINARY);
  final org.apache.hadoop.hive.ql.udf.generic.GenericUDF udfInstance_4 =
      org.apache.calcite.hivesql.function.HiveUDFInvoke.createGenericUDFInstance("<", org.apache.calcite.sql.SqlSyntax.BINARY);
  final org.apache.hadoop.hive.ql.udf.generic.GenericUDF udfInstance_1 =
      org.apache.calcite.hivesql.function.HiveUDFInvoke.createGenericUDFInstance("=", org.apache.calcite.sql.SqlSyntax.BINARY);
  final org.apache.hadoop.hive.ql.udf.generic.GenericUDF udfInstance_0 =
      org.apache.calcite.hivesql.function.HiveUDFInvoke.createGenericUDFInstance(">", org.apache.calcite.sql.SqlSyntax.BINARY);
  final org.apache.hadoop.hive.ql.udf.generic.GenericUDF udfInstance_5 =
      org.apache.calcite.hivesql.function.HiveUDFInvoke.createGenericUDFInstance("SUBSTR", org.apache.calcite.sql.SqlSyntax.FUNCTION);
  final org.apache.calcite.linq4j.Enumerable _inputEnumerable =
      org.apache.calcite.schema.Schemas.queryable(root, root.getRootSchema().getSubSchema("DEFAULT_SCH"), java.lang.Object[].class, "T").asEnumerable();

{code}
I think it'd be better to stash the local field declarations when implementing a RexCall. But the RexToLixTranslator does not hold a reference to the EnumerableRelImplementor; can you give me some suggestions for supporting this feature? (For now I use an arbitrary way to support it: a ThreadLocal context to stash things, cleared when parsing a new SQL query.)
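The ThreadLocal workaround described above can be sketched as follows. This is a simplified, self-contained illustration of the stash pattern, not the actual code; the declarations are represented as plain strings:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of a ThreadLocal "stash": operator implementors record the
 * declarations they need while translating RexCalls, and the code
 * generator drains them once at the top of the generated bind method.
 * Names are illustrative only.
 */
public final class DeclarationStash {
  private static final ThreadLocal<List<String>> DECLS =
      ThreadLocal.withInitial(ArrayList::new);

  private DeclarationStash() {}

  /** Called while translating a RexCall that needs a shared local field. */
  public static void stash(String declaration) {
    DECLS.get().add(declaration);
  }

  /** Called once per query; returns and clears everything stashed so far. */
  public static List<String> drain() {
    List<String> out = new ArrayList<>(DECLS.get());
    DECLS.get().clear();
    return out;
  }
}
```

The per-thread state must be cleared before each query, which is exactly the fragility noted above; passing the declarations through the implementor would avoid the hidden global state.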

 

 



[jira] [Comment Edited] (CALCITE-2741) Add operator table with Hive-specific built-in functions

2018-12-20 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726393#comment-16726393
 ] 

Lai Zhou edited comment on CALCITE-2741 at 12/21/18 4:07 AM:
-

It will take some time to merge my modifications, based on Calcite 1.17.0, into the master branch. I will add a new module "calcite-hivesql" to the project, which includes tests for my use case. Afterwards, I will submit the PR.




[jira] [Comment Edited] (CALCITE-2741) Add operator table with Hive-specific built-in functions

2018-12-16 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722744#comment-16722744
 ] 

Lai Zhou edited comment on CALCITE-2741 at 12/17/18 7:31 AM:
-

[~julianhyde], I have seen what you did for the Oracle functions, but we need to do more to support Hive operators.

Here is what I did:

1. Add a 'fun=hive' config and register all ops in a HiveSqlOperatorTable, so a SqlCall can be converted dynamically to a RexCall which holds a Hive operator instance. I think it'd be better not to reuse the op instances of SqlStdOperatorTable, because the type of a Hive operator is not deterministic: the data type of a Hive operator's output depends on the data types of its input parameters. We use a HiveOperatorWrapper to resolve the Hive GenericUDF dynamically; when deriving the type for a SqlCall, it creates an instance of the GenericUDF and calls its initialize method to get the correct result type.

So we need to add a SQL type mapping from Java to Hive.
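The point about non-deterministic result types can be shown with a toy sketch. This is not Hive's actual promotion logic or the GenericUDF API, just the shape of the idea: the operator's return type is computed from the argument types at validation time, much like GenericUDF.initialize does, instead of being fixed in the operator table:

```java
import java.util.List;

/**
 * Toy sketch: like a Hive GenericUDF, the operator's result type is
 * derived from its argument types rather than declared statically.
 * The widening order below is illustrative, not Hive's real rules.
 */
public class DynamicReturnType {
  /** Widening order: INT < BIGINT < DOUBLE. */
  enum NumericType { INT, BIGINT, DOUBLE }

  /** Mimics the initialize step: the widest argument type wins. */
  static NumericType deriveType(List<NumericType> argTypes) {
    NumericType result = NumericType.INT;
    for (NumericType t : argTypes) {
      if (t.compareTo(result) > 0) {
        result = t;  // widen to the larger of the two types
      }
    }
    return result;
  }
}
```

In the real wrapper, this derivation happens by instantiating the GenericUDF and calling its initialize method, then mapping the returned ObjectInspector back to a SQL type.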

2. Define an Implementor for Hive operators. We have to modify the RexImplTable to define an Implementor for each Hive operator, because we want to reuse the enumerable implementation as far as possible. It would be reasonable to provide an extension hook for a user-defined Implementor; maybe I just have not found the right way.

3. To improve the execution performance of an EnumerableRel, we inject some final fields into the generated class 'Baz' which hold the GenericUDF instances, to avoid creating new instances repeatedly.

Besides, we made a small modification to the Parser to support some special operators, such as rlike and regexp (where c rlike '...').

All the above modifications were done on Calcite 1.17.0.

Now can you give me some suggestions: what's the right way to get all these things done?

 

