[jira] [Comment Edited] (FLINK-5568) Introduce interface for catalog, and provide an in-memory implementation, and integrate with calcite schema
[ https://issues.apache.org/jira/browse/FLINK-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15880229#comment-15880229 ]

jingzhang edited comment on FLINK-5568 at 2/23/17 10:30 AM:

[~fhueske], thanks for your advice. Here are my thoughts on your questions; looking forward to your opinions.

1. {{ExternalCatalogTable}} is the table definition or description in the external catalog. {{ExternalCatalogTable}} does not extend {{FlinkTable}}. ({{FlinkTable}} is a table of the Calcite catalog because it extends Calcite's {{Table}}.) {{ExternalCatalogTable}}, by contrast, is a table of the external catalog. When {{CalciteCatalogReader}} looks up a table in the Calcite catalog, the Calcite schema first delegates to its underlying externalCatalog to look up the {{ExternalCatalogTable}} instance, then returns a {{TableSourceTable}} which holds the {{TableSource}} that the converter generates from the {{ExternalCatalogTable}}.

2. Yes, it's better to move {{partitionColumnNames}} into {{properties}}.

3. Sorry for being unclear. We don't want to implement a new {{Schema}} class; in fact, we prefer to use Flink's representation. The {{DataSchema}} model is as follows:
{code}
case class DataSchema(
    columnTypes: Array[TypeInformation[_]],
    columnNames: Array[String])
{code}

4. It is important to know where to scan for the {{TableSource}}s that are annotated with {{@ExternalCatalogCompatible}}. We plan to rely on a configuration file:
* let each connector specify its scan packages in a designated configuration file (so if a connector module contains no such file, we would not try to scan that module for {{TableSource}}s);
* look up all classpath resources with the given name via the classloader, and parse their scan-packages fields.

Looking forward to your advice, thanks.

> Introduce interface for catalog, and provide an in-memory implementation, and
> integrate with calcite schema
> ---
>
> Key: FLINK-5568
> URL: https://issues.apache.org/jira/browse/FLINK-5568
> Project: Flink
> Issue Type: Sub-task
> Components: Table API & SQL
> Reporter: Kurt Young
> Assignee: jingzhang
>
> The {{TableEnvironment}} now provides a mechanism to register temporary
> tables. It registers a temp table in the Calcite catalog, so SQL and Table API
> queries can access those temp tables. Currently, DatasetTable, DataStreamTable,
> and TableSourceTable can be registered with {{TableEnvironment}} as temporary
> tables.
> This issue wants to provide a mechanism to connect external catalogs, such as
> HCatalog, to the {{TableEnvironment}}, so SQL and Table API queries could
> access tables in the external catalogs without registering those tables with
> {{TableEnvironment}} beforehand.
> First, we should point out that there are actually two kinds of catalog in
> Flink.
> The first one is the external catalog, as mentioned before; it provides CRUD
> operations on databases/tables.
> The second one is the Calcite catalog; it defines the namespaces that can be
> accessed in Calcite queries. It depends on the Calcite Schema/Table abstraction.
> SqlValidator and SqlConverter depend on the Calcite catalog to fetch the
> tables referenced in SQL or Table API queries.
> So we need to do the following things:
> 1. introduce an interface for catalogs, 2. provide an in-memory implementation,
> and 3. integrate it with the Calcite schema.
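The lookup path described in point 1 of the comment above can be sketched as follows. This is a minimal illustration with hypothetical stand-in types; the names {{ExternalCatalog}}, {{ExternalCatalogSchema}}, and the string-based {{TableSource}} here are placeholders for the real Calcite/Flink classes, not their actual APIs.

```scala
// External-catalog side: a plain table description, not a Calcite Table.
case class ExternalCatalogTable(name: String, properties: Map[String, String])

trait ExternalCatalog {
  def getTable(name: String): Option[ExternalCatalogTable]
}

// Calcite side: what the schema hands back to the planner.
trait TableSource { def explain: String }
case class TableSourceTable(source: TableSource)

// Converter from the external table description to a TableSource.
def convert(t: ExternalCatalogTable): TableSource =
  new TableSource { def explain = s"scan of ${t.name}" }

// The Calcite schema delegates the lookup to its underlying external catalog,
// then wraps the converted TableSource in a TableSourceTable.
class ExternalCatalogSchema(catalog: ExternalCatalog) {
  def getTable(name: String): Option[TableSourceTable] =
    catalog.getTable(name).map(t => TableSourceTable(convert(t)))
}
```

The point of the indirection is that the external catalog never has to know about Calcite: only the schema wrapper and the converter touch Calcite's abstractions.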
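The configuration-file discovery proposed in point 4 of the comment could look roughly like the sketch below. The resource name and the {{scan-packages}} key are assumptions for illustration only; the idea is that each connector jar ships such a file, every copy on the classpath is enumerated via the classloader, and its scan-packages field is parsed.

```scala
import java.io.StringReader
import java.util.Properties

// Hypothetical resource name each connector module would ship.
val resourceName = "org/apache/flink/externalCatalog.properties"

// Parse the scan-packages field of one such configuration file.
def scanPackages(content: String): Seq[String] = {
  val props = new Properties()
  props.load(new StringReader(content))
  Option(props.getProperty("scan-packages"))
    .toSeq
    .flatMap(_.split(",").map(_.trim).filter(_.nonEmpty))
}

// In the real lookup, every copy of the file on the classpath would be
// enumerated through the classloader, e.g.:
//   val urls = getClass.getClassLoader.getResources(resourceName)
//   while (urls.hasMoreElements) { /* load and parse each URL */ }
```

A module without the file simply contributes no packages, which matches the first bullet of point 4: no configuration file, no scanning.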