Wenzhe Zhou has uploaded a new patch set (#30) to the change originally created by Fucun Chu. ( http://gerrit.cloudera.org:8080/17842 )
Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................

IMPALA-5741: Initial support for reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala to
implement a data source for querying JDBC.

It has some limitations due to the restrictions of "external data
source":
  - It is not distributed, e.g., the fragment is unpartitioned. The
    queries are executed on the coordinator.
  - Queries which read the following data types from external JDBC
    tables are not supported:
    BINARY, CHAR, DATETIME, and COMPLEX.
  - Only binary predicates with operators =, !=, <=, >=, <, > can be
    pushed to the RDBMS.
  - The following data types are not supported for predicates:
    DECIMAL, TIMESTAMP, DATE, and BINARY.
  - External tables with complex types of columns are not supported.
  - Support is limited to the following databases:
    MySQL, Postgres, Oracle, MSSQL, H2, DB2, and JETHRO_DATA.
  - Catalog V2 is not supported (IMPALA-7131).
  - DataSource objects are not persistent (IMPALA-12375).

Additional fixes are planned on top of this patch.

Source files under jdbc/conf, jdbc/dao and jdbc/exception are
replicated from Hive JDBC Storage Handler.

In order to query the RDBMS tables, the following steps should be
followed (note that existing data source tables will be rebuilt):
1. Make sure the Impala cluster has been started.

2. Copy the jar files of JDBC drivers and the data source library into
HDFS.
    ${IMPALA_HOME}/testdata/bin/copy-ext-data-sources.sh

3. Create an `alltypes` table in the Postgres database.
    ${IMPALA_HOME}/testdata/bin/load-ext-data-sources.sh

4. Create data source tables (alltypes_jdbc_datasource and
alltypes_jdbc_datasource_2).
    ${IMPALA_HOME}/bin/impala-shell.sh -f \
      ${IMPALA_HOME}/testdata/bin/create-ext-data-source-table.sql

5. The data source tables created in the last step are now ready to
query. There is no need to restart the Impala cluster.

Testing:
  - Added unit tests for Postgres and ran them with JDBC driver
    postgresql-42.5.1.jar.
  - Manually ran unit tests for MySQL with JDBC driver
    mysql-connector-j-8.1.0.jar.
  - Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
---
M bin/rat_exclude_files.txt
M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
A java/ext-data-source/jdbc/pom.xml
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/README.md
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
A java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java
A java/ext-data-source/jdbc/src/test/resources/log4j.properties
A java/ext-data-source/jdbc/src/test/resources/test_script.sql
M java/ext-data-source/pom.xml
D testdata/bin/copy-data-sources.sh
A testdata/bin/copy-ext-data-sources.sh
D testdata/bin/create-data-source-table.sql
A testdata/bin/create-ext-data-source-table.sql
M testdata/bin/create-load-data.sh
A testdata/bin/load-ext-data-sources.sh
A testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test
A tests/query_test/test_ext_data_sources.py
M tests/query_test/test_queries.py
34 files changed, 2,488 insertions(+), 86 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/30
--
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 30
Gerrit-Owner: Fucun Chu <chufu...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gsi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <chufu...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
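
Illustrative note on the predicate pushdown limits listed in the commit message: the sketch below shows one way such a filter could work. It is not the patch's QueryConditionUtil.java. The names BinaryPredicate, split_predicates and build_where_clause are hypothetical, and the use of "?" placeholders and unquoted column names are assumptions made for illustration only. Only the operator list and the unsupported predicate types come from the commit message.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple

# Operators the commit message lists as eligible for pushdown.
PUSHABLE_OPERATORS = {"=", "!=", "<=", ">=", "<", ">"}

# Column types the commit message lists as unsupported for predicates.
UNSUPPORTED_PREDICATE_TYPES = {"DECIMAL", "TIMESTAMP", "DATE", "BINARY"}


@dataclass(frozen=True)
class BinaryPredicate:
    """Hypothetical container for a `column <op> literal` predicate."""
    column: str
    column_type: str
    operator: str
    value: Any


def is_pushable(pred: BinaryPredicate) -> bool:
    """True if the predicate satisfies both limits from the commit message."""
    return (pred.operator in PUSHABLE_OPERATORS
            and pred.column_type.upper() not in UNSUPPORTED_PREDICATE_TYPES)


def split_predicates(
    preds: List[BinaryPredicate],
) -> Tuple[List[BinaryPredicate], List[BinaryPredicate]]:
    """Split into (pushed to the RDBMS, left for the query engine)."""
    pushed = [p for p in preds if is_pushable(p)]
    remaining = [p for p in preds if not is_pushable(p)]
    return pushed, remaining


def build_where_clause(preds: List[BinaryPredicate]) -> Tuple[str, List[Any]]:
    """Join predicates with AND, using '?' placeholders (an assumption)."""
    clause = " AND ".join(f"{p.column} {p.operator} ?" for p in preds)
    return clause, [p.value for p in preds]
```

In this sketch, a predicate that fails either check is not dropped; it stays in the "remaining" list so the engine can still evaluate it after the rows are fetched.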