Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )
Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables ...................................................................... Patch Set 22: (7 comments) http://gerrit.cloudera.org:8080/#/c/17842/21//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17842/21//COMMIT_MSG@11 PS21, Line 11: It has some limitations due to the restrictions of "external data > There also seems to be restrictions on column data types being received fro Right. Updated commit messages to add restriction for column data type. http://gerrit.cloudera.org:8080/#/c/17842/21/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java: http://gerrit.cloudera.org:8080/#/c/17842/21/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@148 PS21, Line 148: // 1. Check init string again because the call in prepare() was from : // the frontend and used a different instance of this JdbcDataSource class. : if (!convertInitStringToConfiguration(params.getInit_string())) { : return new TOpenResult( : new TStatus(TErrorCode.INTERNAL_ERROR, : > This is the same check as one in prepare(). This logic is copied from our sample test code AllTypesDataSource.java (https://github.com/apache/impala/blob/master/java/ext-data-source/test/src/main/java/org/apache/impala/extdatasource/AllTypesDataSource.java#L179-L181). There is a comment in AllTypesDataSource.open(), which says "Need to check again because the call in Prepare() was from the frontend and used a different instance of this data source class." The prepare() API is called by Impala frontend, open() API is called by Impala backend. Frontend and backend create separate JdbcDataSource objects. We duplicate checking in prepare() and open() for safety. I think we should change external data source mechanism to share JdbcDataSource objects between frontend and backend for the long term, then we can remove these duplicated code. http://gerrit.cloudera.org:8080/#/c/17842/21/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java: http://gerrit.cloudera.org:8080/#/c/17842/21/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java@95 PS21, Line 95: private static boolean isEmptyString(String value) { > We don't seem to be calling this function anywhere? This class is replicated from Hive JDBC Storage Handler. I think we should keep the change to a minimum so that we can easily port from Hive again if there is any improvement or bug fixing in Hive JDBC Storage Handler. http://gerrit.cloudera.org:8080/#/c/17842/21/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java: http://gerrit.cloudera.org:8080/#/c/17842/21/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java@66 PS21, Line 66: protected DataSource dbcpDataSource = null; > `dbcpDataSource_` for instance variables This class is replicated from Hive JDBC Storage Handler. I think we should keep the original naming convention. http://gerrit.cloudera.org:8080/#/c/17842/21/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java@68 PS21, Line 68: public static Cache<String, DataSource> dataSourceCache = CacheBuilder > naming convention: `dataSourceCache_` This class is replicated from Hive JDBC Storage Handler. I think we should keep the original naming convention. http://gerrit.cloudera.org:8080/#/c/17842/21/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java: http://gerrit.cloudera.org:8080/#/c/17842/21/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java@53 PS21, Line 53: private final Connection conn; > proper naming convention for all instance variables. This class is replicated from Hive JDBC Storage Handler. I think we should keep the original naming convention. http://gerrit.cloudera.org:8080/#/c/17842/21/testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test File testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test: http://gerrit.cloudera.org:8080/#/c/17842/21/testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test@132 PS21, Line 132: move tests for jdbc data source to a new file. -- To view, visit http://gerrit.cloudera.org:8080/17842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2 Gerrit-Change-Number: 17842 Gerrit-PatchSet: 22 Gerrit-Owner: Fucun Chu <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Fucun Chu <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Wenzhe Zhou <[email protected]> Gerrit-Comment-Date: Mon, 02 Oct 2023 05:56:29 +0000 Gerrit-HasComments: Yes
