[jira] [Comment Edited] (FLINK-5568) Introduce interface for catalog, and provide an in-memory implementation, and integrate with calcite schema

2017-02-23 Thread jingzhang (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15880229#comment-15880229 ]

jingzhang edited comment on FLINK-5568 at 2/23/17 10:30 AM:


[~fhueske], thanks for your advice.
Here are my thoughts on your questions; I'm looking forward to your opinions.
1. {{ExternalCatalogTable}} is the table definition or description in the external catalog.
{{ExternalCatalogTable}} does not extend {{FlinkTable}} ({{FlinkTable}} is a table of the Calcite catalog because it extends Calcite's {{Table}}); {{ExternalCatalogTable}} is a table of the external catalog.
When {{CalciteCatalogReader}} looks up a table in the Calcite catalog, the Calcite schema first delegates to its underlying externalCatalog to look up the {{ExternalCatalogTable}} instance, then returns a {{TableSourceTable}} which holds the {{TableSource}} generated from the {{ExternalCatalogTable}} by the converter. A rough sketch of this flow follows.
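To make the flow concrete, here is a self-contained sketch (not the actual implementation; the types and signatures below are simplified stand-ins for the proposed interfaces):
{code}
// Simplified stand-ins for the proposed interfaces, for illustration only.
trait TableSource
case class ExternalCatalogTable(tableName: String, properties: Map[String, String])
trait ExternalCatalog {
  def getTable(tableName: String): ExternalCatalogTable
}
// Plays the role of the Calcite Table handed to the planner.
class TableSourceTable(val tableSource: TableSource)

class ExternalCatalogSchema(
    externalCatalog: ExternalCatalog,
    converter: ExternalCatalogTable => TableSource) {

  // Mirrors the lookup described above:
  def getTable(tableName: String): TableSourceTable = {
    // 1. delegate to the underlying external catalog for the table description
    val externalTable = externalCatalog.getTable(tableName)
    // 2. convert the description into a TableSource
    val tableSource = converter(externalTable)
    // 3. wrap it in a TableSourceTable, which is what Calcite consumes
    new TableSourceTable(tableSource)
  }
}
{code}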
2. Yes, it's better to move {{partitionColumnNames}} into {{properties}}.
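For instance, the partition columns could be encoded as a plain property entry (the key name below is only an assumption):
{code}
// Hypothetical encoding of the former partitionColumnNames field as a
// property entry; the key "partition.columns" is an assumption.
val properties: Map[String, String] = Map("partition.columns" -> "dt,country")
val partitionColumnNames: Array[String] = properties("partition.columns").split(",")
{code}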
3. Sorry for being unclear. We don't want to implement a new Schema class; in fact, we prefer to use Flink's own representation. The DataSchema model is as follows:
{code}
case class DataSchema(
  columnTypes: Array[TypeInformation[_]],
  columnNames: Array[String])
{code}
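For illustration, a DataSchema describing a two-column table would be built like this (a usage sketch of the model above; the column names are made up):
{code}
import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeInformation}

// A DataSchema for a table with a String "name" column and a Long "cnt" column.
val schema = DataSchema(
  columnTypes = Array[TypeInformation[_]](
    BasicTypeInfo.STRING_TYPE_INFO,
    BasicTypeInfo.LONG_TYPE_INFO),
  columnNames = Array("name", "cnt"))
{code}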
4. It is important to know where to scan for the {{TableSource}} classes annotated with {{@ExternalCatalogCompatible}}. We plan to rely on a configuration file:
 * each connector specifies its scan packages in a designated configuration file (so if there is no such configuration file in a connector module, we would not try to scan that module for {{TableSource}} classes);
 * we look up all the resources with the given name via the classloader and parse their scan-packages fields; a minimal sketch of this step follows the list.
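The sketch below assumes a (hypothetical) resource name {{external-catalog.properties}} carrying a comma-separated {{scan.packages}} key:
{code}
import java.util.Properties
import scala.collection.JavaConverters._

def discoverScanPackages(classLoader: ClassLoader): Seq[String] = {
  // Collect every copy of the configuration file visible on the classpath,
  // one per connector module that opts in.
  classLoader.getResources("external-catalog.properties").asScala.flatMap { url =>
    val props = new Properties()
    val in = url.openStream()
    try props.load(in) finally in.close()
    // Parse this module's comma-separated scan-packages field.
    Option(props.getProperty("scan.packages")).toSeq
      .flatMap(_.split(',').map(_.trim).filter(_.nonEmpty))
  }.toSeq
}
{code}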

Looking forward to your advice, thanks.


> Introduce interface for catalog, and provide an in-memory implementation, and 
> integrate with calcite schema
> ---
>
> Key: FLINK-5568
> URL: https://issues.apache.org/jira/browse/FLINK-5568
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
> Reporter: Kurt Young
> Assignee: jingzhang
>
> The {{TableEnvironment}} now provides a mechanism to register temporary 
> tables. It registers the temp tables in the calcite catalog, so SQL and TableAPI 
> queries can access those temp tables. Now DatasetTable, DataStreamTable 
> and TableSourceTable can be registered in {{TableEnvironment}} as temporary 
> tables.
> This issue aims to provide a mechanism to connect external catalogs such as 
> HCatalog to the {{TableEnvironment}}, so SQL and TableAPI queries can 
> access tables in the external catalogs without registering those tables in 
> {{TableEnvironment}} beforehand.
> First, we should point out that there are actually two kinds of catalog in Flink.
> The first one is the external catalog mentioned before; it provides CRUD 
> operations on databases/tables.
> The second one is the calcite catalog; it defines the namespaces that can be 
> accessed in Calcite queries. It depends on the Calcite Schema/Table abstraction. 
> SqlValidator and SqlConverter depend on the calcite catalog to fetch the 
> tables referenced in SQL or TableAPI.
> So we need to do the following things:
> 1. introduce an interface for catalogs, and provide an in-memory implementation
> 2. integrate with the calcite schema
