Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )
Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables ...................................................................... Patch Set 23: (16 comments) http://gerrit.cloudera.org:8080/#/c/17842/22//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17842/22//COMMIT_MSG@11 PS22, Line 11: It has some limitations due to the restrictions of "external data : source": > nit: I assume this patch is also still limited by IMPALA-7131 (does not wor mentioned IMPALA-7131 and IMPALA-12375 http://gerrit.cloudera.org:8080/#/c/17842/22//COMMIT_MSG@33 PS22, Line 33: c drivers and the da > nit: General comment about naming: instead of referring things with *data-s renamed the scripts as suggested. http://gerrit.cloudera.org:8080/#/c/17842/22//COMMIT_MSG@39 PS22, Line 39: > nit: This can be removed? impala-shell will connect to first impalad. removed http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java: http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@273 PS22, Line 273: String project; > Is quoting added somewhere? Postgres will convert unquoted capitals to lowe The case sensitive column names must be provided in column mapping property in create table statement since Impala is case in-sensitive and column names are save as low case. The column names in column mapping property will be quoted with double quotes if the name is not quoted, and jdbc table name will be quoted if the column mapping is not empty. See sample create table statement in testdata/bin/create-ext-data-source-table.sql http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java: http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java@96 PS22, Line 96: // TODO: If a target database cannot flatten this view query, try to text > Are we sure all of the target databases will flatten this view query? Other This function is replicated from Hive JDBC Storage Handler. I think it had been tested. We did not find issue in Postgres and MySql so far. Add TODO comment and will revisit this code when finding issue on a target database. http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java@285 PS22, Line 285: // by calling BasicDataSource.setDriverClassLoader. > Are these database-specific? These are configurations for dbcp (Database Connection Pools) component, and not database-specific. http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java: http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java@86 PS22, Line 86: try { > This should re-throw fixed http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java@175 PS22, Line 175: public void close() throws JdbcDatabaseAccessException { > probably should re-throw fixed http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java: http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java@39 PS22, Line 39: return "Select * from (" + sql + ") as \"tmp\" limit " + limit; > This may not always flatten well. consider text replacement instead. This class is replicated from Hive JDBC Storage Handler. I guess it should be working. I will leave it as is until we have chance to run test again JethroData database. http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java: http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java@54 PS22, Line 54: joiner.add(String.format("%s %s %s", name, op, value)); > May need quoting here. jdbc column names in columnMapping are quoted. http://gerrit.cloudera.org:8080/#/c/17842/22/testdata/bin/copy-data-sources.sh File testdata/bin/copy-data-sources.sh: http://gerrit.cloudera.org:8080/#/c/17842/22/testdata/bin/copy-data-sources.sh@28 PS22, Line 28: > nit: Contain this path into its own bash variable and use that for the foll defined variables EXT_DATA_SOURCES_PATH and JDBC_DRIVERS_PATH http://gerrit.cloudera.org:8080/#/c/17842/22/testdata/bin/copy-data-sources.sh@40 PS22, Line 40: : : > nit: print something at the end of this script to indicate successful run? Added printing messages as suggested http://gerrit.cloudera.org:8080/#/c/17842/23/testdata/bin/copy-ext-data-sources.sh File testdata/bin/copy-ext-data-sources.sh: http://gerrit.cloudera.org:8080/#/c/17842/23/testdata/bin/copy-ext-data-sources.sh@39 PS23, Line 39: echo "Copied" ${IMPALA_HOME}/java/ext-data-source/test/target/impala-data-source-test-*.jar \ > line too long (93 > 90) fixed http://gerrit.cloudera.org:8080/#/c/17842/23/testdata/bin/copy-ext-data-sources.sh@46 PS23, Line 46: echo "Copied" ${IMPALA_HOME}/java/ext-data-source/jdbc/target/impala-data-source-jdbc-*.jar \ > line too long (93 > 90) fixed http://gerrit.cloudera.org:8080/#/c/17842/22/testdata/bin/create-data-source-table.sql File testdata/bin/create-data-source-table.sql: http://gerrit.cloudera.org:8080/#/c/17842/22/testdata/bin/create-data-source-table.sql@56 PS22, Line 56: > Add tests that require quoting Added a Postgres table with case sensitive column names and table name. Made alltypes_jdbc_datasource_2 point to that new Postgres table. Added column.mapping table property to map columns between Impala and Postgres. Added unit-test to read the new Postgres's table. http://gerrit.cloudera.org:8080/#/c/17842/22/testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test File testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test: http://gerrit.cloudera.org:8080/#/c/17842/22/testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test@28 PS22, Line 28: 11,false,1,1,1,10,1.100000023841858,10.1,'01/02/09','1',2009-01-02 00:11:00.450000000 : 12,true,2,2,2,20,2.200000047683716,20.2,'01/02/09','2',2009-01-02 00:12:00.460000000 : 13,false,3,3,3,30,3.299999952316284,30.3,'01/02/09','3',2009-01-02 00:13:00.480000000 : 14,true,4,4,4,40,4.400000095367432,40.4,'01/02/09','4',2009-01-02 00:14:00.510000000 : 20,true,0,0,0,0,0,0,'01/03/09','0',2009-01-03 00:20:00.900000000 > Tried to run the test myself and I get following error: This is time zone issue. jdbc driver use system time zone by default, but Impala use UTC by default. Fixed the issue when reading the value of timestamp column from RecordSet by calling ResultSet.getTimestamp() with UTC time zone. -- To view, visit http://gerrit.cloudera.org:8080/17842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2 Gerrit-Change-Number: 17842 Gerrit-PatchSet: 23 Gerrit-Owner: Fucun Chu <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Fucun Chu <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Wenzhe Zhou <[email protected]> Gerrit-Comment-Date: Wed, 04 Oct 2023 03:48:07 +0000 Gerrit-HasComments: Yes
