[jira] [Assigned] (IMPALA-13019) Add query option to keep DBCP DataSource objects in cache for longer time
[ https://issues.apache.org/jira/browse/IMPALA-13019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenzhe Zhou reassigned IMPALA-13019: Assignee: Wenzhe Zhou (was: Pranav Yogi Lodha) > Add query option to keep DBCP DataSource objects in cache for longer time > - > > Key: IMPALA-13019 > URL: https://issues.apache.org/jira/browse/IMPALA-13019 > Project: IMPALA > Issue Type: Sub-task > Components: Frontend >Reporter: Wenzhe Zhou >Assignee: Wenzhe Zhou >Priority: Major > Fix For: Impala 4.5.0 > > > IMPALA-12910 adds a new class, DataSourceObjectCache, to cache DBCP DataSource objects. DBCP DataSource objects hold the JDBC driver, so by sharing them across multiple JDBC requests we can avoid reloading the driver. The current code checks the reference count for each DBCP DataSource object and removes the cached object once the reference count reaches 0. DBCP DataSource objects should be kept in the cache even after the reference count reaches 0; we could add a query option to enable this. This needs a worker thread to periodically clean up objects that have been idle for a long time, and possibly separate maps for active objects and idle objects. Query options could be passed to the JDBC code in TOpenParams. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
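The design described above (keep objects cached after the reference count hits 0, with a periodic cleanup pass evicting long-idle entries) can be sketched as follows. This is an illustrative sketch only; the class and method names are hypothetical and do not reflect Impala's actual DataSourceObjectCache API.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a reference-counted cache that keeps idle entries
// until a timeout, instead of evicting as soon as the ref count reaches 0.
public class RefCountedCache<K, V> {
    private static class Entry<V> {
        final V value;
        int refCount = 0;
        long idleSinceMillis = -1;  // -1 while the entry is in use
        Entry(V value) { this.value = value; }
    }

    private final Map<K, Entry<V>> cache = new HashMap<>();
    private final long idleTimeoutMillis;

    public RefCountedCache(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    // Acquire a cached value, creating it if absent; bumps the ref count.
    public synchronized V acquire(K key, Function<K, V> creator) {
        Entry<V> e = cache.computeIfAbsent(key, k -> new Entry<>(creator.apply(k)));
        e.refCount++;
        e.idleSinceMillis = -1;
        return e.value;
    }

    // Release; when the count drops to 0 the entry stays cached but is
    // marked idle so a later cleanup pass can expire it.
    public synchronized void release(K key) {
        Entry<V> e = cache.get(key);
        if (e != null && --e.refCount == 0) {
            e.idleSinceMillis = System.currentTimeMillis();
        }
    }

    // Called periodically by a cleanup thread; evicts entries that have
    // been idle for at least idleTimeoutMillis.
    public synchronized void evictIdle() {
        long now = System.currentTimeMillis();
        Iterator<Map.Entry<K, Entry<V>>> it = cache.entrySet().iterator();
        while (it.hasNext()) {
            Entry<V> e = it.next().getValue();
            if (e.refCount == 0 && e.idleSinceMillis >= 0
                    && now - e.idleSinceMillis >= idleTimeoutMillis) {
                it.remove();
            }
        }
    }

    public synchronized int size() { return cache.size(); }
}
```

A cleanup thread would simply call `evictIdle()` on a fixed interval; the ticket's proposed query option would toggle whether `release()` marks the entry idle or closes it immediately.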
[jira] [Resolved] (IMPALA-12910) Run TPCH/TPCDS queries for external JDBC tables
[ https://issues.apache.org/jira/browse/IMPALA-12910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenzhe Zhou resolved IMPALA-12910. -- Fix Version/s: Impala 4.5.0 Resolution: Fixed > Run TPCH/TPCDS queries for external JDBC tables > --- > > Key: IMPALA-12910 > URL: https://issues.apache.org/jira/browse/IMPALA-12910 > Project: IMPALA > Issue Type: Sub-task > Components: Perf Investigation >Reporter: Wenzhe Zhou >Assignee: Wenzhe Zhou >Priority: Major > Fix For: Impala 4.5.0 > > > Need performance data for queries on external JDBC tables to be documented in > the design doc.
[jira] [Commented] (IMPALA-12910) Run TPCH/TPCDS queries for external JDBC tables
[ https://issues.apache.org/jira/browse/IMPALA-12910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842837#comment-17842837 ] ASF subversion and git services commented on IMPALA-12910: -- Commit 08f8a300250df7b4f9a517cdb6bab48c379b7e03 in impala's branch refs/heads/master from wzhou-code [ https://gitbox.apache.org/repos/asf?p=impala.git;h=08f8a3002 ]

IMPALA-12910: Support running TPCH/TPCDS queries for JDBC tables

This patch adds a script to create external JDBC tables for the TPCH and TPCDS datasets, and adds unit tests that run TPCH and TPCDS queries against external JDBC tables with Impala-Impala federation. Note that JDBC tables are mapping tables; they don't take additional disk space.

It fixes a race condition in the caching of SQL DataSource objects by using a new DataSourceObjectCache class, which checks the reference count before closing a SQL DataSource. Adds a new query option 'clean_dbcp_ds_cache' with a default value of true. When it is set to false, a SQL DataSource object is not closed when its reference count reaches 0; instead it is kept in the cache until it has been idle for more than 5 minutes. The flag variable 'dbcp_data_source_idle_timeout_s' is added to make this duration configurable.

java.sql.Connection.close() sometimes fails to remove a closed connection from the connection pool, which causes JDBC worker threads to wait a long time for available connections from the pool. The workaround is to call the BasicDataSource.invalidateConnection() API to close a connection. Two flag variables are added for the DBCP configuration properties 'maxTotal' and 'maxWaitMillis'. Note that the 'maxActive' and 'maxWait' properties were renamed to 'maxTotal' and 'maxWaitMillis' respectively in apache.commons.dbcp v2.

Fixes a bug in database type comparison: the type strings specified by the user could be lowercase or mixed case, but the code compared them against uppercase strings.

Fixes an issue where the SQL DataSource object was not closed in JdbcDataSource.open() and JdbcDataSource.getNext() when errors were returned from DBCP APIs or JDBC drivers.

testdata/bin/create-tpc-jdbc-tables.py supports creating JDBC tables for Impala-Impala, Postgres and MySQL. The following sample commands create TPCDS JDBC tables for Impala-Impala federation with a remote coordinator running at 10.19.10.86, and for a Postgres server running at 10.19.10.86:

${IMPALA_HOME}/testdata/bin/create-tpc-jdbc-tables.py \
  --jdbc_db_name=tpcds_jdbc --workload=tpcds \
  --database_type=IMPALA --database_host=10.19.10.86 --clean

${IMPALA_HOME}/testdata/bin/create-tpc-jdbc-tables.py \
  --jdbc_db_name=tpcds_jdbc --workload=tpcds \
  --database_type=POSTGRES --database_host=10.19.10.86 \
  --database_name=tpcds --clean

TPCDS tests for JDBC tables run only in release/exhaustive builds. TPCH tests for JDBC tables run in core and exhaustive builds, except Dockerized builds.

Remaining Issues:
- tpcds-decimal_v2-q80a failed with returned rows not matching expected results for some decimal values. This will be fixed in IMPALA-13018.

Testing:
- Passed core tests.
- Passed query_test/test_tpcds_queries.py in a release/exhaustive build.
- Manually verified that only one SQL DataSource object was created for test_tpcds_queries.py::TestTpcdsQueryForJdbcTables when query option 'clean_dbcp_ds_cache' was set to false, and that the SQL DataSource object was closed by the cleanup thread.

Change-Id: I44e8c1bb020e90559c7f22483a7ab7a151b8f48a Reviewed-on: http://gerrit.cloudera.org:8080/21304 Reviewed-by: Abhishek Rawat Tested-by: Impala Public Jenkins > Run TPCH/TPCDS queries for external JDBC tables > --- > > Key: IMPALA-12910 > URL: https://issues.apache.org/jira/browse/IMPALA-12910 > Project: IMPALA > Issue Type: Sub-task > Components: Perf Investigation >Reporter: Wenzhe Zhou >Assignee: Wenzhe Zhou >Priority: Major > > Need performance data for queries on external JDBC tables to be documented in > the design doc.
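The database-type comparison fix mentioned in the commit above (user input may be lowercase or mixed case, while the code compared against uppercase constants) amounts to normalizing before comparing. A minimal illustrative sketch, with a hypothetical helper name rather than Impala's actual code:

```java
import java.util.Locale;

// Illustrative sketch: normalize user-supplied database type strings so that
// "postgres", "Postgres" and "POSTGRES" all match the supported-type constants.
public class DbTypeCheck {
    public static boolean isSupportedType(String userType) {
        if (userType == null) return false;
        String normalized = userType.trim().toUpperCase(Locale.ROOT);
        return normalized.equals("IMPALA") || normalized.equals("MYSQL")
                || normalized.equals("POSTGRES");
    }
}
```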
[jira] [Commented] (IMPALA-13018) Fix test_tpcds_queries.py/TestTpcdsQueryForJdbcTables.test_tpcdstpcds-decimal_v2-q80a failure
[ https://issues.apache.org/jira/browse/IMPALA-13018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842838#comment-17842838 ] ASF subversion and git services commented on IMPALA-13018: -- Commit 08f8a300250df7b4f9a517cdb6bab48c379b7e03 in impala's branch refs/heads/master from wzhou-code [ https://gitbox.apache.org/repos/asf?p=impala.git;h=08f8a3002 ] IMPALA-12910: Support running TPCH/TPCDS queries for JDBC tables (commit message identical to the one quoted in full under IMPALA-12910 above) > Fix > test_tpcds_queries.py/TestTpcdsQueryForJdbcTables.test_tpcdstpcds-decimal_v2-q80a > failure > - > > Key: IMPALA-13018 > URL: https://issues.apache.org/jira/browse/IMPALA-13018 > Project: IMPALA > Issue Type: Sub-task > Components: Backend, Frontend >Reporter: Wenzhe Zhou >Assignee: Pranav Yogi Lodha >Priority: Major > > The returned rows are not matching expected results for some decimal type of > columns.
[jira] [Created] (IMPALA-13051) Speed up test_query_log test runs
Michael Smith created IMPALA-13051: -- Summary: Speed up test_query_log test runs Key: IMPALA-13051 URL: https://issues.apache.org/jira/browse/IMPALA-13051 Project: IMPALA Issue Type: Task Affects Versions: Impala 4.4.0 Reporter: Michael Smith test_query_log.py takes 11 minutes to run. Most of them use graceful shutdown, and provide an unnecessary grace period. Optimize test_query_log test runs, and do some other code cleanup around workload management.
[jira] [Work started] (IMPALA-13051) Speed up test_query_log test runs
[ https://issues.apache.org/jira/browse/IMPALA-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-13051 started by Michael Smith. -- > Speed up test_query_log test runs > - > > Key: IMPALA-13051 > URL: https://issues.apache.org/jira/browse/IMPALA-13051 > Project: IMPALA > Issue Type: Task >Affects Versions: Impala 4.4.0 >Reporter: Michael Smith >Assignee: Michael Smith >Priority: Major > > test_query_log.py takes 11 minutes to run. Most of them use graceful > shutdown, and provide an unnecessary grace period. Optimize test_query_log > test runs, and do some other code cleanup around workload management.
[jira] [Assigned] (IMPALA-13051) Speed up test_query_log test runs
[ https://issues.apache.org/jira/browse/IMPALA-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Smith reassigned IMPALA-13051: -- Assignee: Michael Smith > Speed up test_query_log test runs > - > > Key: IMPALA-13051 > URL: https://issues.apache.org/jira/browse/IMPALA-13051 > Project: IMPALA > Issue Type: Task >Affects Versions: Impala 4.4.0 >Reporter: Michael Smith >Assignee: Michael Smith >Priority: Major > > test_query_log.py takes 11 minutes to run. Most of them use graceful > shutdown, and provide an unnecessary grace period. Optimize test_query_log > test runs, and do some other code cleanup around workload management.
[jira] [Resolved] (IMPALA-12353) Support Impala on ARM
[ https://issues.apache.org/jira/browse/IMPALA-12353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Smith resolved IMPALA-12353. Fix Version/s: Impala 4.4.0 Resolution: Fixed > Support Impala on ARM > - > > Key: IMPALA-12353 > URL: https://issues.apache.org/jira/browse/IMPALA-12353 > Project: IMPALA > Issue Type: Epic > Components: Infrastructure >Reporter: Michael Smith >Assignee: Michael Smith >Priority: Major > Labels: arm64 > Fix For: Impala 4.4.0 > > > Ensure Impala runs on ARM. Provide native toolchain builds, developer > workflows, and testing for ARM. > Create jobs in jenkins.impala.io as needed to enable testing ARM builds.
[jira] [Updated] (IMPALA-12353) Support Impala on ARM
[ https://issues.apache.org/jira/browse/IMPALA-12353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Smith updated IMPALA-12353: --- Labels: arm64 (was: ) > Support Impala on ARM > - > > Key: IMPALA-12353 > URL: https://issues.apache.org/jira/browse/IMPALA-12353 > Project: IMPALA > Issue Type: Epic > Components: Infrastructure >Reporter: Michael Smith >Assignee: Michael Smith >Priority: Major > Labels: arm64 > > Ensure Impala runs on ARM. Provide native toolchain builds, developer > workflows, and testing for ARM. > Create jobs in jenkins.impala.io as needed to enable testing ARM builds.
[jira] [Created] (IMPALA-13050) Impala fails to start with TSAN in RHEL aarch64
Michael Smith created IMPALA-13050: -- Summary: Impala fails to start with TSAN in RHEL aarch64 Key: IMPALA-13050 URL: https://issues.apache.org/jira/browse/IMPALA-13050 Project: IMPALA Issue Type: Bug Affects Versions: Impala 4.4.0 Reporter: Michael Smith Impala fails to start in TSAN builds on RHEL 8.8 on aarch64 (arm64) machines. With Java 8, the jni-util.cc call to {{hdfsConnect}} causes impalad/catalogd to crash with {quote} Exception: java.lang.StackOverflowError thrown from the UncaughtExceptionHandler in thread "process reaper" {quote} With Java 11 and 17 (with {{TEST_JDK_VERSION}}, or {{IMPALA_JDK_VERSION}} and only restarting Impala), we don't get that error, but impalad/catalogd still crash during {{hdfsConnect}}.
[jira] [Resolved] (IMPALA-13049) Add dependency management for the log4j2 version
[ https://issues.apache.org/jira/browse/IMPALA-13049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe McDonnell resolved IMPALA-13049. Fix Version/s: Impala 4.5.0 Resolution: Fixed > Add dependency management for the log4j2 version > > > Key: IMPALA-13049 > URL: https://issues.apache.org/jira/browse/IMPALA-13049 > Project: IMPALA > Issue Type: Bug > Components: Frontend, Infrastructure >Affects Versions: Impala 4.4.0 >Reporter: Joe McDonnell >Assignee: Joe McDonnell >Priority: Critical > Fix For: Impala 4.5.0 > > > In some internal builds, we see cases where one dependency brings in one > version of log4j2 and another brings in a different version on a different > artifact. In particular, we have seen cases where Hive brings in log4j-api > 2.17.1 while something else brings in log4j-core 2.18.0. This is a bad > combination, because log4j-core 2.18.0 relies on the ServiceLoaderUtil class > existing in log4j-api, but log4j-api 2.17.1 doesn't have it. This can result > in class not found exceptions. > Impala itself uses reload4j rather than log4j2, so this is purely about > coordinating dependencies rather than Impala code. > We should add dependency management for log4j-api and log4j-core. It makes > sense to standardize on 2.18.0.
[jira] [Commented] (IMPALA-13049) Add dependency management for the log4j2 version
[ https://issues.apache.org/jira/browse/IMPALA-13049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842753#comment-17842753 ] ASF subversion and git services commented on IMPALA-13049: -- Commit d09c5024907aaf387aaa584dc86cb2b4d641a582 in impala's branch refs/heads/master from Joe McDonnell [ https://gitbox.apache.org/repos/asf?p=impala.git;h=d09c50249 ] IMPALA-13049: Add dependency management for log4j2 to use 2.18.0 Currently, there is no dependency management for the log4j2 version. Impala itself doesn't use log4j2. However, recently we encountered a case where one dependency brought in log4-core 2.18.0 and another brought in log4j-api 2.17.1. log4j-core 2.18.0 relies on the existence of the ServiceLoaderUtil class from log4j-api 2.18.0. log4j-api 2.17.1 doesn't have this class, which causes class not found exceptions. This uses dependency management to set the log4j2 version to 2.18.0 for log4j-core and log4j-api to avoid any mismatch. Testing: - Ran a local build and verified that both log4j-core and log4j-api are using 2.18.0. Change-Id: Ib4f8485adadb90f66f354a5dedca29992c6d4e6f Reviewed-on: http://gerrit.cloudera.org:8080/21379 Reviewed-by: Michael Smith Reviewed-by: Abhishek Rawat Tested-by: Impala Public Jenkins > Add dependency management for the log4j2 version > > > Key: IMPALA-13049 > URL: https://issues.apache.org/jira/browse/IMPALA-13049 > Project: IMPALA > Issue Type: Bug > Components: Frontend, Infrastructure >Affects Versions: Impala 4.4.0 >Reporter: Joe McDonnell >Assignee: Joe McDonnell >Priority: Critical > > In some internal builds, we see cases where one dependency brings in one > version of log4j2 and another brings in a different version on a different > artifact. In particular, we have seen cases where Hive brings in log4j-api > 2.17.1 while something else brings in log4j-core 2.18.0. 
This is a bad > combination, because log4j-core 2.18.0 relies on the ServiceLoaderUtil class > existing in log4j-api, but log4j-api 2.17.1 doesn't have it. This can result > in class not found exceptions. > Impala itself uses reload4j rather than log4j2, so this is purely about > coordinating dependencies rather than Impala code. > We should add dependency management for log4j-api and log4j-core. It makes > sense to standardize on 2.18.0.
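The fix described above pins both log4j2 artifacts to one version through Maven dependency management. A fragment of the general shape involved might look like the following; the artifact IDs and the 2.18.0 version come from the text above, but the exact placement in Impala's pom files is not shown here, and the standard `org.apache.logging.log4j` group ID is assumed.

```xml
<dependencyManagement>
  <dependencies>
    <!-- Pin both log4j2 artifacts to the same version so that a transitive
         log4j-core 2.18.0 never pairs with an older log4j-api. -->
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-api</artifactId>
      <version>2.18.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-core</artifactId>
      <version>2.18.0</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```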
[jira] [Commented] (IMPALA-12754) Update Impala document to cover external jdbc table
[ https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842739#comment-17842739 ] Wenzhe Zhou commented on IMPALA-12754: -- Design doc for JDBC table: https://docs.google.com/document/d/14gmhaG5fRQg3s4jNh8UrlVAeDdNtQfoGbZGl0emRcdE/edit?usp=sharing > Update Impala document to cover external jdbc table > --- > > Key: IMPALA-12754 > URL: https://issues.apache.org/jira/browse/IMPALA-12754 > Project: IMPALA > Issue Type: Sub-task > Components: Docs >Reporter: Wenzhe Zhou >Assignee: gaurav singh >Priority: Major > > We need to document the SQL syntax to create external JDBC table and alter > external JDBC table, including the table properties to be set for JDBC and > DBCP (Database Connection Pool). >
[jira] [Comment Edited] (IMPALA-12754) Update Impala document to cover external jdbc table
[ https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842441#comment-17842441 ] Wenzhe Zhou edited comment on IMPALA-12754 at 5/1/24 6:31 PM: -- *Impala SQL syntax to create external JDBC table* When creating an external JDBC table, the user needs to specify the minimum information: database type, jdbc url, driver class, driver file location, user name and password for querying database, table name. Here are two samples to create JDBC tables for tables on Postgres server and another Impala cluster respectively. {code:java} CREATE EXTERNAL TABLE alltypes_jdbc ( id INT, bool_col BOOLEAN, tinyint_col TINYINT, smallint_col SMALLINT, int_col INT, bigint_col BIGINT, float_col FLOAT, double_col DOUBLE, date_col DATE, string_col STRING, timestamp_col TIMESTAMP) STORED BY JDBC TBLPROPERTIES ( "database.type"="POSTGRES", "jdbc.url"="jdbc:postgresql://10.96.132.138:5432/functional", "jdbc.driver"="org.postgresql.Driver", "driver.url"="/test-warehouse/data-sources/jdbc-drivers/postgresql-jdbc.jar", "dbcp.username"="hiveuser", "dbcp.password"="password", "table"="alltypes"); CREATE EXTERNAL TABLE alltypes_jdbc ( id INT, bool_col BOOLEAN, tinyint_col TINYINT, smallint_col SMALLINT, int_col INT, bigint_col BIGINT, float_col FLOAT, double_col DOUBLE, date_col DATE, string_col STRING, timestamp_col TIMESTAMP) STORED BY JDBC TBLPROPERTIES ( "database.type"="IMPALA", "jdbc.url"="jdbc:impala://10.96.132.138:21050/functional", "jdbc.auth"="AuthMech=3", "jdbc.properties"="MEM_LIMIT=10, MAX_ERRORS = 1", "jdbc.driver"="com.cloudera.impala.jdbc.Driver", "driver.url"="hdfs://test-warehouse/data-sources/jdbc-drivers/ImpalaJDBC42.jar", "dbcp.username"="hiveuser", "dbcp.password.keystore"="jceks://hdfs/test-warehouse/data-sources/test.jceks", "dbcp.password.key"="hiveuser", "table"="alltypes"); {code} *Supported data types* The column data type for an external JDBC table can be: * Numeric data type: boolean, tinyint, 
smallint, int, bigint, float, double
* Decimal with scale and precision
* String type: string
* Date
* Timestamp

Note that the following data types are not supported: char, varchar, and binary. Complex data types (struct, map, array) and nested types are not supported.

*Table Properties specified for external JDBC table*

In the create external JDBC table statement, the user is required to specify the following table properties:
* database.type: IMPALA, MYSQL, POSTGRES
* jdbc.url: JDBC connection string, including the database type, IP address, port number, and database name. For example, “jdbc:impala://10.96.132.138:21050/functional”.
* jdbc.driver: class name of the JDBC driver
* driver.url: URL for downloading the jar file package that is used to access the external database.
* table: name of the external table to be mapped in Impala.

Besides the above required properties, the user can also specify optional parameters to use different authentication methods, allow case-sensitive column names in remote tables, or specify additional database properties:
* jdbc.auth: authentication mechanism of the JDBC driver. It is used for Impala-Impala federation.
* dbcp.username: JDBC user name
* dbcp.password: JDBC password in clear text; this parameter is strongly discouraged in production environments. The recommended way is to store it in a keystore. See section “Securing password” for details.
* dbcp.password.key: key of the keystore
* dbcp.password.keystore: keystore URI.
* jdbc.properties: additional properties applied to the database engine, like Impala query options. Properties are specified as a comma-delimited key=value string.
* jdbc.fetch.size: number of rows to fetch in a batch
* column.mapping: mapping of column names between the external table and the Impala JDBC table. See section "Support case-sensitive table/column names" for details.
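Some of these properties are comma-delimited key=value strings, e.g. the 'jdbc.properties' value "MEM_LIMIT=10, MAX_ERRORS = 1" from the sample above. A sketch of how such a string can be parsed into a map (a hypothetical helper, not Impala's actual parser):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: parse a comma-delimited key=value property string such
// as "MEM_LIMIT=10, MAX_ERRORS = 1" into a map, trimming surrounding whitespace.
public class KeyValueProps {
    public static Map<String, String> parse(String props) {
        Map<String, String> result = new LinkedHashMap<>();
        if (props == null || props.trim().isEmpty()) return result;
        for (String pair : props.split(",")) {
            String[] parts = pair.split("=", 2);  // split on first '=' only
            if (parts.length == 2) {
                result.put(parts[0].trim(), parts[1].trim());
            }
        }
        return result;
    }
}
```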
*Securing password*

To mitigate password leaks, the value of the “dbcp.password” table property is masked in the output of the commands “SHOW CREATE TABLE table-name” and “DESCRIBE FORMATTED | EXTENDED table-name”. In production deployments, it is strongly discouraged to save the JDBC password in clear text in the table property "dbcp.password". Instead, the user can store the password in a Java keystore file on HDFS, using a command like the one below to create the keystore file:
{code:java}
hadoop credential create host1.password -provider jceks://hdfs/user/test.jceks -v passwd1
{code}
Then specify “dbcp.password.key” and “dbcp.password.keystore” instead of “dbcp.password” in the create table statement.

*Support case-sensitive table/column names*

Column names of remote tables may be different from the JDBC table schema. For example, Postgres allows case-sensitive column names, but Impala always saves column names in lowercase. In this case, the user
[jira] [Updated] (IMPALA-12754) Update Impala document to cover external jdbc table
[ https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenzhe Zhou updated IMPALA-12754: - Description: We need to document the SQL syntax to create external JDBC table and alter external JDBC table, including the table properties to be set for JDBC and DBCP (Database Connection Pool). was: We need to document the SQL syntax to create jdbc table and alter jdbc table, including the table properties to be set for jdbc and DBCP (Database connection pool). > Update Impala document to cover external jdbc table > --- > > Key: IMPALA-12754 > URL: https://issues.apache.org/jira/browse/IMPALA-12754 > Project: IMPALA > Issue Type: Sub-task > Components: Docs >Reporter: Wenzhe Zhou >Assignee: gaurav singh >Priority: Major > > We need to document the SQL syntax to create external JDBC table and alter > external JDBC table, including the table properties to be set for JDBC and > DBCP (Database Connection Pool). >
[jira] [Comment Edited] (IMPALA-12754) Update Impala document to cover external jdbc table
[ https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842441#comment-17842441 ] Wenzhe Zhou edited comment on IMPALA-12754 at 5/1/24 6:07 PM: -- *Impala SQL syntax to create external JDBC table* When creating an external JDBC table, the user must specify at minimum: the database type, JDBC URL, driver class, driver file location, user name and password for querying the database, and the remote table name. Here are two examples that create JDBC tables for a table on a Postgres server and for a table on another Impala cluster, respectively. {code:java} CREATE EXTERNAL TABLE alltypes_jdbc ( id INT, bool_col BOOLEAN, tinyint_col TINYINT, smallint_col SMALLINT, int_col INT, bigint_col BIGINT, float_col FLOAT, double_col DOUBLE, date_col DATE, string_col STRING, timestamp_col TIMESTAMP) STORED BY JDBC TBLPROPERTIES ( "database.type"="POSTGRES", "jdbc.url"="jdbc:postgresql://10.96.132.138:5432/functional", "jdbc.driver"="org.postgresql.Driver", "driver.url"="/test-warehouse/data-sources/jdbc-drivers/postgresql-jdbc.jar", "dbcp.username"="hiveuser", "dbcp.password"="password", "table"="alltypes"); CREATE EXTERNAL TABLE alltypes_jdbc ( id INT, bool_col BOOLEAN, tinyint_col TINYINT, smallint_col SMALLINT, int_col INT, bigint_col BIGINT, float_col FLOAT, double_col DOUBLE, date_col DATE, string_col STRING, timestamp_col TIMESTAMP) STORED BY JDBC TBLPROPERTIES ( "database.type"="IMPALA", "jdbc.url"="jdbc:impala://10.96.132.138:21050/functional", "jdbc.auth"="AuthMech=3", "jdbc.properties"="MEM_LIMIT=10, MAX_ERRORS = 1", "jdbc.driver"="com.cloudera.impala.jdbc.Driver", "driver.url"="hdfs://test-warehouse/data-sources/jdbc-drivers/ImpalaJDBC42.jar", "dbcp.username"="hiveuser", "dbcp.password.keystore"="jceks://hdfs/test-warehouse/data-sources/test.jceks", "dbcp.password.key"="hiveuser", "table"="alltypes"); {code} *Supported data types* The column data types for an external JDBC table can be: * Numeric data types: boolean, tinyint, 
smallint, int, bigint, float, double * Decimal with scale and precision * String type: string * Date * Timestamp Note that the following data types are not supported: char, varchar, and binary. Complex data types (struct, map, array, and nested types) are also not supported. *Table Properties specified for external JDBC table* In the CREATE EXTERNAL TABLE statement, the user is required to specify the following table properties: * database.type: IMPALA, MYSQL, POSTGRES * jdbc.url: JDBC connection string, including the database type, IP address, port number, and database name. For example, “jdbc:impala://10.96.132.138:21050/functional”. * jdbc.driver: class name of the JDBC driver * driver.url: URL for downloading the jar file that contains the driver used to access the external database. * table: name of the remote table to be mapped in Impala. Besides the required properties above, the user can also specify optional properties to use different authentication methods, to allow case-sensitive column names in remote tables, or to pass additional database properties: * jdbc.auth: authentication mechanism of the JDBC driver. It is used for Impala-Impala federation. * dbcp.username: JDBC user name * dbcp.password: JDBC password in clear text. This property is strongly discouraged in production environments; the recommended way is to store the password in a keystore. See the section “Securing password” for details. * dbcp.password.key: key of the keystore entry * dbcp.password.keystore: keystore URI * jdbc.properties: additional properties applied to the database engine, like Impala query options, specified as a comma-delimited key=value string. * jdbc.fetch.size: number of rows to fetch in a batch * column.mapping: mapping of column names between the remote table and the Impala JDBC table. See the section "Support case-sensitive table/column names" for details. 
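The "jdbc.properties" value is a comma-delimited key=value string such as "MEM_LIMIT=10, MAX_ERRORS = 1". A minimal Python sketch of how such a string can be split into individual properties (the helper name is hypothetical, not part of Impala's code):

```python
def parse_jdbc_properties(value):
    """Split a comma-delimited key=value string into a dict,
    tolerating whitespace around keys, values, and delimiters."""
    props = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue  # skip empty segments such as trailing commas
        key, _, val = pair.partition("=")
        props[key.strip()] = val.strip()
    return props

print(parse_jdbc_properties("MEM_LIMIT=10, MAX_ERRORS = 1"))
# → {'MEM_LIMIT': '10', 'MAX_ERRORS': '1'}
```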
*Securing password* To mitigate password leaks, the value of the “dbcp.password” table property is masked in the output of the commands “SHOW CREATE TABLE table-name” and “DESCRIBE FORMATTED | EXTENDED table-name”. In production deployments, saving the JDBC password in clear text in the table property "dbcp.password" is strongly discouraged. Instead, the user can store the password in a Java keystore file on HDFS, created with a command like: {code:java} hadoop credential create host1.password -provider jceks://hdfs/user/test.jceks -v passwd1 {code} Then specify “dbcp.password.key” and “dbcp.password.keystore” instead of “dbcp.password” in the CREATE TABLE statement. *Support case-sensitive table/column names* Column names of remote tables may be different from the JDBC table schema. For example, Postgres allows case-sensitive column names, but Impala always saves column names in lowercase. In this case, the user can set the
[jira] [Comment Edited] (IMPALA-12754) Update Impala document to cover external jdbc table
[ https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842441#comment-17842441 ] Wenzhe Zhou edited comment on IMPALA-12754 at 5/1/24 6:06 PM: -- *Impala SQL syntax to create external JDBC table* When creating an external JDBC table, the user needs to specify the minimum information: database type, jdbc url, driver class, driver file location, user name and password for querying database, table name. Here are two samples to create JDBC tables for tables on Postgres server and another Impala cluster respectively. {code:java} CREATE EXTERNAL TABLE alltypes_jdbc ( id INT, bool_col BOOLEAN, tinyint_col TINYINT, smallint_col SMALLINT, int_col INT, bigint_col BIGINT, float_col FLOAT, double_col DOUBLE, date_col DATE, string_col STRING, timestamp_col TIMESTAMP) STORED BY JDBC TBLPROPERTIES ( "database.type"="POSTGRES", "jdbc.url"="jdbc:postgresql://10.96.132.138:5432/functional", "jdbc.driver"="org.postgresql.Driver", "driver.url"="/test-warehouse/data-sources/jdbc-drivers/postgresql-jdbc.jar", "dbcp.username"="hiveuser", "dbcp.password"="password", "table"="alltypes"); CREATE EXTERNAL TABLE alltypes_jdbc ( id INT, bool_col BOOLEAN, tinyint_col TINYINT, smallint_col SMALLINT, int_col INT, bigint_col BIGINT, float_col FLOAT, double_col DOUBLE, date_col DATE, string_col STRING, timestamp_col TIMESTAMP) STORED BY JDBC TBLPROPERTIES ( "database.type"="IMPALA", "jdbc.url"="jdbc:impala://10.96.132.138:21050/functional", "jdbc.auth"="AuthMech=3", "jdbc.properties"="MEM_LIMIT=10, MAX_ERRORS = 1", "jdbc.driver"="com.cloudera.impala.jdbc.Driver", "driver.url"="hdfs://test-warehouse/data-sources/jdbc-drivers/ImpalaJDBC42.jar", "dbcp.username"="hiveuser", "dbcp.password.keystore"="jceks://hdfs/test-warehouse/data-sources/test.jceks", "dbcp.password.key"="hiveuser", "table"="alltypes"); {code} *Supported data types* The column data type for an external JDBC table can be: * Numeric data type: boolean, tinyint, 
smallint, int, bigint, float, double * Decimal with scale and precision * String type: string * Date * Timestamp Note that following data types are not supported: char, varchar, and binary. Complex data type: struct, map, array and nested type are not supported. *Table Properties specified for external JDBC table* In the create external JDBC table statement, user is required to specify the following table properties: * database.type: IMPALA, MYSQL, POSTGRES * jdbc.url: jdbc connection string, including the database type, IP address, port number, and database name. For example, “jdbc:impala://10.96.132.138:21050/functional”. * jdbc.driver: class name of jdbc driver * driver.url: driver URL for downloading the Jar file package that is used to access the external database. * table: name of the external table to be mapped in Impala. Besides the above required properties, user can also specify optional parameters to use different authentication methods, or to allow case sensitive columns names in remote tables, or to specify additional database properties, etc: jdbc.auth: authentication mechanisms of JDBC driver. It's used for Impala-Impala federation. dbcp.username: jdbc user name dbcp.password: jdbc password in clear text, this parameter is strongly discouraged in production environments. The recommended way is to store it in a keystore. See section “securing password” for details. dbcp.password.key: key of the keystore dbcp.password.keystore: keystore URI. jdbc.properties: additional properties applied to database engine, like Impala Query options. Properties are specified as comma-delimited key=value string. jdbc.fetch.size: number of rows to fetch in a batch column.mappting: Mapping of column names between external table and Impala JDBC table. See section "Support case-sensitive table/column names" for details. 
*Securing password* To mitigate the password leak, the value of “dbcp.password” table property is masked in the output of commands “SHOW CREATE TABLE table-name” and “DESCRIBE FORMATTED | EXTENDED table-name”. In production deployment, it is strongly discouraged to save the jdbc password in clear text in table property "dbcp.password". Instead, user can store password in a Java keystore file on HDFS by using the command like below to create a keystore file: hadoop credential create host1.password -provider jceks://hdfs/user/test.jceks -v passwd1 Then specify “dbcp.password.key” and “dbcp.password.keystore” instead of “dbcp.password” in the create table statement. *Support case-sensitive table/column names* Column names of remote tables may be different from the JDBC table schema. For example, Postgres allows case-sensitive column names, but Impala always saves column names in lowercase. In this case, the user can set the “column.mapping”
[jira] [Comment Edited] (IMPALA-12754) Update Impala document to cover external jdbc table
[ https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842441#comment-17842441 ] Wenzhe Zhou edited comment on IMPALA-12754 at 5/1/24 6:03 PM: -- *Impala SQL syntax to create external JDBC table* When creating an external JDBC table, the user needs to specify the minimum information: database type, jdbc url, driver class, driver file location, user name and password for querying database, table name. Here are two samples to create JDBC tables for tables on Postgres server and another Impala cluster respectively. {code:java} CREATE EXTERNAL TABLE alltypes_jdbc ( id INT, bool_col BOOLEAN, tinyint_col TINYINT, smallint_col SMALLINT, int_col INT, bigint_col BIGINT, float_col FLOAT, double_col DOUBLE, date_col DATE, string_col STRING, timestamp_col TIMESTAMP) STORED BY JDBC TBLPROPERTIES ( "database.type"="POSTGRES", "jdbc.url"="jdbc:postgresql://10.96.132.138:5432/functional", "jdbc.driver"="org.postgresql.Driver", "driver.url"="/test-warehouse/data-sources/jdbc-drivers/postgresql-jdbc.jar", "dbcp.username"="hiveuser", "dbcp.password"="password", "table"="alltypes"); CREATE EXTERNAL TABLE alltypes_jdbc ( id INT, bool_col BOOLEAN, tinyint_col TINYINT, smallint_col SMALLINT, int_col INT, bigint_col BIGINT, float_col FLOAT, double_col DOUBLE, date_col DATE, string_col STRING, timestamp_col TIMESTAMP) STORED BY JDBC TBLPROPERTIES ( "database.type"="IMPALA", "jdbc.url"="jdbc:impala://10.96.132.138:21050/functional", "jdbc.auth"="AuthMech=3", "jdbc.properties"="MEM_LIMIT=10, MAX_ERRORS = 1", "jdbc.driver"="com.cloudera.impala.jdbc.Driver", "driver.url"="hdfs://test-warehouse/data-sources/jdbc-drivers/ImpalaJDBC42.jar", "dbcp.username"="hiveuser", "dbcp.password.keystore"="jceks://hdfs/test-warehouse/data-sources/test.jceks", "dbcp.password.key"="hiveuser", "table"="alltypes"); {code} *Supported data types* The column data type for an external JDBC table can be: Numeric data type: boolean, tinyint, 
smallint, int, bigint, float, double Decimal with scale and precision String type: string Date Timestamp Note that following data types are not supported: char, varchar, and binary. Complex data type: struct, map, array and nested type are not supported. *Table Properties specified for external JDBC table* In the create external JDBC table statement, user is required to specify the following table properties: database.type: IMPALA, MYSQL, POSTGRES jdbc.url: jdbc connection string, including the database type, IP address, port number, and database name. For example, “jdbc:impala://10.96.132.138:21050/functional”. jdbc.driver: class name of jdbc driver driver.url: driver URL for downloading the Jar file package that is used to access the external database. table: name of the external table to be mapped in Impala. Besides the above required properties, user can also specify optional parameters to use different authentication methods, or to allow case sensitive columns names in remote tables, or to specify additional database properties, etc: jdbc.auth: authentication mechanisms of JDBC driver. It's used for Impala-Impala federation. dbcp.username: jdbc user name dbcp.password: jdbc password in clear text, this parameter is strongly discouraged in production environments. The recommended way is to store it in a keystore. See section “securing password” for details. dbcp.password.key: key of the keystore dbcp.password.keystore: keystore URI. jdbc.properties: additional properties applied to database engine, like Impala Query options. Properties are specified as comma-delimited key=value string. jdbc.fetch.size: number of rows to fetch in a batch column.mappting: Mapping of column names between external table and Impala JDBC table. See section "Support case-sensitive table/column names" for details. 
*Securing password* To mitigate the password leak, the value of “dbcp.password” table property is masked in the output of commands “SHOW CREATE TABLE table-name” and “DESCRIBE FORMATTED | EXTENDED table-name”. In production deployment, it is strongly discouraged to save the jdbc password in clear text in table property "dbcp.password". Instead, user can store password in a Java keystore file on HDFS by using the command like below to create a keystore file: hadoop credential create host1.password -provider jceks://hdfs/user/test.jceks -v passwd1 Then specify “dbcp.password.key” and “dbcp.password.keystore” instead of “dbcp.password” in the create table statement. *Support case-sensitive table/column names* Column names of remote tables may be different from the JDBC table schema. For example, Postgres allows case-sensitive column names, but Impala always saves column names in lowercase. In this case, the user can set the “column.mapping” table property to map
[jira] [Comment Edited] (IMPALA-12754) Update Impala document to cover external jdbc table
[ https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842441#comment-17842441 ] Wenzhe Zhou edited comment on IMPALA-12754 at 5/1/24 6:02 PM: -- *Impala SQL syntax to create external JDBC table* When creating an external JDBC table, the user needs to specify the minimum information: database type, jdbc url, driver class, driver file location, user name and password for querying database, table name. Here are two samples to create JDBC tables for tables on Postgres server and another Impala cluster respectively. {code:java} *CREATE EXTERNAL TABLE* alltypes_jdbc ( id INT, bool_col BOOLEAN, tinyint_col TINYINT, smallint_col SMALLINT, int_col INT, bigint_col BIGINT, float_col FLOAT, double_col DOUBLE, date_col DATE, string_col STRING, timestamp_col TIMESTAMP) *STORED BY JDBC* *TBLPROPERTIES* ( "database.type"="POSTGRES", "jdbc.url"="jdbc:postgresql://10.96.132.138:5432/functional", "jdbc.driver"="org.postgresql.Driver", "driver.url"="/test-warehouse/data-sources/jdbc-drivers/postgresql-jdbc.jar", "dbcp.username"="hiveuser", "dbcp.password"="password", "table"="alltypes"); *CREATE EXTERNAL TABLE* alltypes_jdbc ( id INT, bool_col BOOLEAN, tinyint_col TINYINT, smallint_col SMALLINT, int_col INT, bigint_col BIGINT, float_col FLOAT, double_col DOUBLE, date_col DATE, string_col STRING, timestamp_col TIMESTAMP) *STORED BY JDBC* *TBLPROPERTIES* ( "database.type"="IMPALA", "jdbc.url"="jdbc:impala://10.96.132.138:21050/functional", "jdbc.auth"="AuthMech=3", "jdbc.properties"="MEM_LIMIT=10, MAX_ERRORS = 1", "jdbc.driver"="com.cloudera.impala.jdbc.Driver", "driver.url"="hdfs://test-warehouse/data-sources/jdbc-drivers/ImpalaJDBC42.jar", "dbcp.username"="hiveuser", "dbcp.password.keystore"="jceks://hdfs/test-warehouse/data-sources/test.jceks", "dbcp.password.key"="hiveuser", "table"="alltypes"); {code} *Supported data types* The column data type for an external JDBC table can be: Numeric data type: boolean, 
tinyint, smallint, int, bigint, float, double Decimal with scale and precision String type: string Date Timestamp Note that following data types are not supported: char, varchar, and binary. Complex data type: struct, map, array and nested type are not supported. *Table Properties specified for external JDBC table* In the create external JDBC table statement, user is required to specify the following table properties: database.type: IMPALA, MYSQL, POSTGRES jdbc.url: jdbc connection string, including the database type, IP address, port number, and database name. For example, “jdbc:impala://10.96.132.138:21050/functional”. jdbc.driver: class name of jdbc driver driver.url: driver URL for downloading the Jar file package that is used to access the external database. table: name of the external table to be mapped in Impala. Besides the above required properties, user can also specify optional parameters to use different authentication methods, or to allow case sensitive columns names in remote tables, or to specify additional database properties, etc: jdbc.auth: authentication mechanisms of JDBC driver. It's used for Impala-Impala federation. dbcp.username: jdbc user name dbcp.password: jdbc password in clear text, this parameter is strongly discouraged in production environments. The recommended way is to store it in a keystore. See section “securing password” for details. dbcp.password.key: key of the keystore dbcp.password.keystore: keystore URI. jdbc.properties: additional properties applied to database engine, like Impala Query options. Properties are specified as comma-delimited key=value string. jdbc.fetch.size: number of rows to fetch in a batch column.mappting: Mapping of column names between external table and Impala JDBC table. See section "Support case-sensitive table/column names" for details. 
*Securing password* To mitigate the password leak, the value of “dbcp.password” table property is masked in the output of commands “SHOW CREATE TABLE table-name” and “DESCRIBE FORMATTED | EXTENDED table-name”. In production deployment, it is strongly discouraged to save the jdbc password in clear text in table property "dbcp.password". Instead, user can store password in a Java keystore file on HDFS by using the command like below to create a keystore file: hadoop credential create host1.password -provider jceks://hdfs/user/test.jceks -v passwd1 Then specify “dbcp.password.key” and “dbcp.password.keystore” instead of “dbcp.password” in the create table statement. *Support case-sensitive table/column names* Column names of remote tables may be different from the JDBC table schema. For example, Postgres allows case-sensitive column names, but Impala always saves column names in lowercase. In this case, the user can set the “column.mapping” table
[jira] [Comment Edited] (IMPALA-12754) Update Impala document to cover external jdbc table
[ https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842441#comment-17842441 ] Wenzhe Zhou edited comment on IMPALA-12754 at 5/1/24 6:00 PM: -- *Impala SQL syntax to create external JDBC table* When creating an external JDBC table, the user needs to specify the minimum information: database type, jdbc url, driver class, driver file location, user name and password for querying database, table name. Here are two samples to create JDBC tables for tables on Postgres server and another Impala cluster respectively. *CREATE EXTERNAL TABLE* alltypes_jdbc ( id INT, bool_col BOOLEAN, tinyint_col TINYINT, smallint_col SMALLINT, int_col INT, bigint_col BIGINT, float_col FLOAT, double_col DOUBLE, date_col DATE, string_col STRING, timestamp_col TIMESTAMP) *STORED BY JDBC* *TBLPROPERTIES* ( "database.type"="POSTGRES", "jdbc.url"="jdbc:postgresql://10.96.132.138:5432/functional", "jdbc.driver"="org.postgresql.Driver", "driver.url"="/test-warehouse/data-sources/jdbc-drivers/postgresql-jdbc.jar", "dbcp.username"="hiveuser", "dbcp.password"="password", "table"="alltypes"); *CREATE EXTERNAL TABLE* alltypes_jdbc ( id INT, bool_col BOOLEAN, tinyint_col TINYINT, smallint_col SMALLINT, int_col INT, bigint_col BIGINT, float_col FLOAT, double_col DOUBLE, date_col DATE, string_col STRING, timestamp_col TIMESTAMP) *STORED BY JDBC* *TBLPROPERTIES* ( "database.type"="IMPALA", "jdbc.url"="jdbc:impala://10.96.132.138:21050/functional", "jdbc.auth"="AuthMech=3", "jdbc.properties"="MEM_LIMIT=10, MAX_ERRORS = 1", "jdbc.driver"="com.cloudera.impala.jdbc.Driver", "driver.url"="hdfs://test-warehouse/data-sources/jdbc-drivers/ImpalaJDBC42.jar", "dbcp.username"="hiveuser", "dbcp.password.keystore"="jceks://hdfs/test-warehouse/data-sources/test.jceks", "dbcp.password.key"="hiveuser", "table"="alltypes"); *Supported data types* The column data type for an external JDBC table can be: Numeric data type: boolean, tinyint, smallint, int, 
bigint, float, double Decimal with scale and precision String type: string Date Timestamp Note that following data types are not supported: char, varchar, and binary. Complex data type: struct, map, array and nested type are not supported. *Table Properties specified for external JDBC table* In the create external JDBC table statement, user is required to specify the following table properties: database.type: IMPALA, MYSQL, POSTGRES jdbc.url: jdbc connection string, including the database type, IP address, port number, and database name. For example, “jdbc:impala://10.96.132.138:21050/functional”. jdbc.driver: class name of jdbc driver driver.url: driver URL for downloading the Jar file package that is used to access the external database. table: name of the external table to be mapped in Impala. Besides the above required properties, user can also specify optional parameters to use different authentication methods, or to allow case sensitive columns names in remote tables, or to specify additional database properties, etc: jdbc.auth: authentication mechanisms of JDBC driver. It's used for Impala-Impala federation. dbcp.username: jdbc user name dbcp.password: jdbc password in clear text, this parameter is strongly discouraged in production environments. The recommended way is to store it in a keystore. See section “securing password” for details. dbcp.password.key: key of the keystore dbcp.password.keystore: keystore URI. jdbc.properties: additional properties applied to database engine, like Impala Query options. Properties are specified as comma-delimited key=value string. jdbc.fetch.size: number of rows to fetch in a batch column.mappting: Mapping of column names between external table and Impala JDBC table. See section "Support case-sensitive table/column names" for details. 
*Securing password* To mitigate the password leak, the value of “dbcp.password” table property is masked in the output of commands “SHOW CREATE TABLE table-name” and “DESCRIBE FORMATTED | EXTENDED table-name”. In production deployment, it is strongly discouraged to save the jdbc password in clear text in table property "dbcp.password". Instead, user can store password in a Java keystore file on HDFS by using the command like below to create a keystore file: hadoop credential create host1.password -provider jceks://hdfs/user/test.jceks -v passwd1 Then specify “dbcp.password.key” and “dbcp.password.keystore” instead of “dbcp.password” in the create table statement. *Support case-sensitive table/column names* Column names of remote tables may be different from the JDBC table schema. For example, Postgres allows case-sensitive column names, but Impala always saves column names in lowercase. In this case, the user can set the “column.mapping” table property to map
[jira] [Comment Edited] (IMPALA-12754) Update Impala document to cover external jdbc table
[ https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842441#comment-17842441 ] Wenzhe Zhou edited comment on IMPALA-12754 at 5/1/24 5:59 PM: -- *Impala SQL syntax to create external JDBC table* When creating an external JDBC table, the user needs to specify the minimum information: database type, jdbc url, driver class, driver file location, user name and password for querying database, table name. Here are two samples to create JDBC tables for tables on Postgres server and another Impala cluster respectively. *CREATE EXTERNAL TABLE* alltypes_jdbc ( id INT, bool_col BOOLEAN, tinyint_col TINYINT, smallint_col SMALLINT, int_col INT, bigint_col BIGINT, float_col FLOAT, double_col DOUBLE, date_col DATE, string_col STRING, timestamp_col TIMESTAMP) *STORED BY JDBC* *TBLPROPERTIES* ( "database.type"="POSTGRES", "jdbc.url"="jdbc:postgresql://10.96.132.138:5432/functional", "jdbc.driver"="org.postgresql.Driver", "driver.url"="/test-warehouse/data-sources/jdbc-drivers/postgresql-jdbc.jar", "dbcp.username"="hiveuser", "dbcp.password"="password", "table"="alltypes"); *CREATE EXTERNAL TABLE* alltypes_jdbc ( id INT, bool_col BOOLEAN, tinyint_col TINYINT, smallint_col SMALLINT, int_col INT, bigint_col BIGINT, float_col FLOAT, double_col DOUBLE, date_col DATE, string_col STRING, timestamp_col TIMESTAMP) *STORED BY JDBC* *TBLPROPERTIES* ( "database.type"="IMPALA", "jdbc.url"="jdbc:impala://10.96.132.138:21050/functional", "jdbc.auth"="AuthMech=3", "jdbc.properties"="MEM_LIMIT=10, MAX_ERRORS = 1", "jdbc.driver"="com.cloudera.impala.jdbc.Driver", "driver.url"="hdfs://test-warehouse/data-sources/jdbc-drivers/ImpalaJDBC42.jar", "dbcp.username"="hiveuser", "dbcp.password.keystore"="jceks://hdfs/test-warehouse/data-sources/test.jceks", "dbcp.password.key"="hiveuser", "table"="alltypes"); *Supported data types* The column data type for an external JDBC table can be: Numeric data type: boolean, tinyint, smallint, int, 
bigint, float, double Decimal with scale and precision String type: string Date Timestamp Note that following data types are not supported: char, varchar, and binary. Complex data type: struct, map, array and nested type are not supported. *Table Properties specified for external JDBC table* In the create external JDBC table statement, user is required to specify the following table properties: database.type: IMPALA, MYSQL, POSTGRES jdbc.url: jdbc connection string, including the database type, IP address, port number, and database name. For example, “jdbc:impala://10.96.132.138:21050/functional”. jdbc.driver: class name of jdbc driver driver.url: driver URL for downloading the Jar file package that is used to access the external database. table: name of the external table to be mapped in Impala. Besides the above required properties, user can also specify optional parameters to use different authentication methods, or to allow case sensitive columns names in remote tables, or to specify additional database properties, etc: jdbc.auth: authentication mechanisms of JDBC driver. It's used for Impala-Impala federation. dbcp.username: jdbc user name dbcp.password: jdbc password in clear text, this parameter is strongly discouraged in production environments. The recommended way is to store it in a keystore. See section “securing password” for details. dbcp.password.key: key of the keystore dbcp.password.keystore: keystore URI. jdbc.properties: additional properties applied to database engine, like Impala Query options. Properties are specified as comma-delimited key=value string. jdbc.fetch.size: number of rows to fetch in a batch column.mappting: Mapping of column names between external table and Impala JDBC table. See section "Support case-sensitive table/column names" for details. 
*Securing password* To mitigate the password leak, the value of “dbcp.password” table property is masked in the output of commands “SHOW CREATE TABLE table-name” and “DESCRIBE FORMATTED | EXTENDED table-name”. In production deployment, it is strongly discouraged to save the jdbc password in clear text in table property "dbcp.password". Instead, user can store password in a Java keystore file on HDFS by using the command like below to create a keystore file: hadoop credential create host1.password -provider jceks://hdfs/user/test.jceks -v passwd1 Then specify “dbcp.password.key” and “dbcp.password.keystore” instead of “dbcp.password” in the create table statement. *Support case-sensitive table/column names* Column names of remote tables may be different from the JDBC table schema. For example, Postgres allows case-sensitive column names, but Impala always saves column names in lowercase. In this case, the user can set the “column.mapping” table property to map
[jira] [Comment Edited] (IMPALA-12754) Update Impala document to cover external jdbc table
[ https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842441#comment-17842441 ] Wenzhe Zhou edited comment on IMPALA-12754 at 5/1/24 5:55 PM: -- *Impala SQL syntax to create external JDBC table* When creating an external JDBC table, the user needs to specify the minimum information: database type, jdbc url, driver class, driver file location, user name and password for querying database, table name. Here are two samples to create JDBC tables for tables on Postgres server and another Impala cluster respectively. *CREATE EXTERNAL TABLE* alltypes_jdbc ( id INT, bool_col BOOLEAN, tinyint_col TINYINT, smallint_col SMALLINT, int_col INT, bigint_col BIGINT, float_col FLOAT, double_col DOUBLE, date_col DATE, string_col STRING, timestamp_col TIMESTAMP) *STORED BY JDBC* *TBLPROPERTIES* ( "database.type"="POSTGRES", "jdbc.url"="jdbc:postgresql://10.96.132.138:5432/functional", "jdbc.driver"="org.postgresql.Driver", "driver.url"="/test-warehouse/data-sources/jdbc-drivers/postgresql-jdbc.jar", "dbcp.username"="hiveuser", "dbcp.password"="password", "table"="alltypes"); *CREATE EXTERNAL TABLE* alltypes_jdbc ( id INT, bool_col BOOLEAN, tinyint_col TINYINT, smallint_col SMALLINT, int_col INT, bigint_col BIGINT, float_col FLOAT, double_col DOUBLE, date_col DATE, string_col STRING, timestamp_col TIMESTAMP) *STORED BY JDBC* *TBLPROPERTIES* ( "database.type"="IMPALA", "jdbc.url"="jdbc:impala://10.96.132.138:21050/functional", "jdbc.auth"="AuthMech=3", "jdbc.properties"="MEM_LIMIT=10, MAX_ERRORS = 1", "jdbc.driver"="com.cloudera.impala.jdbc.Driver", "driver.url"="hdfs://test-warehouse/data-sources/jdbc-drivers/ImpalaJDBC42.jar", "dbcp.username"="hiveuser", "dbcp.password.keystore"="jceks://hdfs/test-warehouse/data-sources/test.jceks", "dbcp.password.key"="hiveuser", "table"="alltypes"); *Supported data types* The column data type for an external JDBC table can be: Numeric data type: boolean, tinyint, smallint, int, 
bigint, float, double Decimal with scale and precision String type: string Date Timestamp Note that following data types are not supported: char, varchar, and binary. Complex data type: struct, map, array and nested type are not supported. *Table Properties specified for external JDBC table* In the create external JDBC table statement, user is required to specify the following table properties: database.type: IMPALA, MYSQL, POSTGRES jdbc.url: jdbc connection string, including the database type, IP address, port number, and database name. For example, “jdbc:impala://10.96.132.138:21050/functional”. jdbc.driver: class name of jdbc driver driver.url: driver URL for downloading the Jar file package that is used to access the external database. table: name of the external table to be mapped in Impala. Besides the above required properties, user can also specify optional parameters to use different authentication methods, or to allow case sensitive columns names in remote tables, or to specify additional database properties, etc: jdbc.auth: authentication mechanisms of JDBC driver. It's used for Impala-Impala federation. dbcp.username: jdbc user name dbcp.password: jdbc password in clear text, this parameter is strongly discouraged in production environments. The recommended way is to store it in a keystore. See section “securing password” for details. dbcp.password.key: key of the keystore dbcp.password.keystore: keystore URI. jdbc.properties: additional properties applied to database engine, like Impala Query options. Properties are specified as comma-delimited key=value string. jdbc.fetch.size: number of rows to fetch in a batch column.mappting: Mapping of column names between external table and Impala JDBC table. See section "Support case-sensitive table/column names" for details. 
*Securing password*

To mitigate password leaks, the value of the “dbcp.password” table property is masked in the output of the commands “SHOW CREATE TABLE table-name” and “DESCRIBE FORMATTED | EXTENDED table-name”. In production deployments, it is strongly discouraged to save the JDBC password in clear text in the table property "dbcp.password". Instead, the user can store the password in a Java keystore file on HDFS, created with a command like the following:

hadoop credential create host1.password -provider jceks://hdfs/user/test.jceks -v passwd1

Then specify “dbcp.password.key” and “dbcp.password.keystore” instead of “dbcp.password” in the CREATE TABLE statement.

*Support case-sensitive table/column names*

Column names of remote tables may differ from the JDBC table schema. For example, Postgres allows case-sensitive column names, while Impala always saves column names in lowercase. In this case, the user can set the “column.mapping” table property to map the lowercase Impala column names to the corresponding column names in the remote table.
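The exact value format of “column.mapping” is not spelled out in the comment above. Assuming it is a comma-delimited list of impalaName=remoteName pairs (the sample string "id=ID, bool_col=Bool_col" below is hypothetical), parsing such a mapping could be sketched as:

```python
# Hypothetical sketch: split a "column.mapping" table-property value of the
# assumed form "impala_name=RemoteName, ..." into a lookup dict. This is an
# illustration, not the actual Impala frontend code.
def parse_column_mapping(mapping: str) -> dict:
    """Map lowercase Impala column names to case-sensitive remote names."""
    result = {}
    for pair in mapping.split(","):
        if not pair.strip():
            continue  # tolerate empty segments and trailing commas
        impala_name, _, remote_name = pair.partition("=")
        result[impala_name.strip().lower()] = remote_name.strip()
    return result

print(parse_column_mapping("id=ID, bool_col=Bool_col"))
```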
[jira] [Updated] (IMPALA-12754) Update Impala document to cover external jdbc table
[ https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenzhe Zhou updated IMPALA-12754: - Description: We need to document the SQL syntax to create jdbc table and alter jdbc table, including the table properties to be set for jdbc and DBCP (Database connection pool). was: Impala external data source is undocumented in upstream. We need to document the external data source APIs, SQL syntax to create jdbc table, including the properties to be set for jdbc and DBCP (Database connection pool). > Update Impala document to cover external jdbc table > --- > > Key: IMPALA-12754 > URL: https://issues.apache.org/jira/browse/IMPALA-12754 > Project: IMPALA > Issue Type: Sub-task > Components: Docs >Reporter: Wenzhe Zhou >Assignee: gaurav singh >Priority: Major > > We need to document the SQL syntax to create jdbc table and alter jdbc > table, including the table properties to be set for jdbc and DBCP (Database > connection pool). >
[jira] [Assigned] (IMPALA-12118) Consider using pytorch cpuinfo library for CPU detection / logging
[ https://issues.apache.org/jira/browse/IMPALA-12118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Smith reassigned IMPALA-12118: -- Assignee: Michael Smith > Consider using pytorch cpuinfo library for CPU detection / logging > -- > > Key: IMPALA-12118 > URL: https://issues.apache.org/jira/browse/IMPALA-12118 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 4.3.0 >Reporter: Joe McDonnell >Assignee: Michael Smith >Priority: Major > Labels: arm > > The code in be/src/util/cpu-info.h/.cc provides information about the CPU > that we run on, along with NUMA configurations, cache sizes, etc. > Pytorch's cpuinfo package seems to cover most of what we want to do across a > much broader set of processors / architectures. It has a compatible license. > See [https://github.com/pytorch/cpuinfo] > We should see if this is useful for our cpu detection code.
[jira] [Resolved] (IMPALA-12521) gerrit-verify-dryrun should include an ARM build
[ https://issues.apache.org/jira/browse/IMPALA-12521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Smith resolved IMPALA-12521. Resolution: Fixed > gerrit-verify-dryrun should include an ARM build > > > Key: IMPALA-12521 > URL: https://issues.apache.org/jira/browse/IMPALA-12521 > Project: IMPALA > Issue Type: Task > Components: Infrastructure >Reporter: Michael Smith >Assignee: Laszlo Gaal >Priority: Major > > Impala has some ARM-specific code that has occasionally been broken. We > should include a build-only job as part of the set of jobs run by > gerrit-verify-dryrun to avoid regressions.
[jira] [Updated] (IMPALA-12118) Consider using pytorch cpuinfo library for CPU detection / logging
[ https://issues.apache.org/jira/browse/IMPALA-12118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Smith updated IMPALA-12118: --- Epic Link: (was: IMPALA-12353) > Consider using pytorch cpuinfo library for CPU detection / logging > -- > > Key: IMPALA-12118 > URL: https://issues.apache.org/jira/browse/IMPALA-12118 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 4.3.0 >Reporter: Joe McDonnell >Priority: Major > Labels: arm > > The code in be/src/util/cpu-info.h/.cc provides information about the CPU > that we run on, along with NUMA configurations, cache sizes, etc. > Pytorch's cpuinfo package seems to cover most of what we want to do across a > much broader set of processors / architectures. It has a compatible license. > See [https://github.com/pytorch/cpuinfo] > We should see if this is useful for our cpu detection code.
[jira] [Assigned] (IMPALA-12869) Add flag for bin/start-impala-cluster.py to start with local catalog mode
[ https://issues.apache.org/jira/browse/IMPALA-12869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saurabh Katiyal reassigned IMPALA-12869: Assignee: Anshula Jain > Add flag for bin/start-impala-cluster.py to start with local catalog mode > - > > Key: IMPALA-12869 > URL: https://issues.apache.org/jira/browse/IMPALA-12869 > Project: IMPALA > Issue Type: New Feature > Components: Infrastructure >Reporter: Quanlong Huang >Assignee: Anshula Jain >Priority: Major > Labels: newbie, ramp-up > > It'd be convenient to start an Impala cluster in local catalog mode with a > command like > {code:bash} > bin/start-impala-cluster.py --use_local_catalog {code} > Currently, we need to use a longer command which is more prone to typos: > {code:bash} > bin/start-impala-cluster.py --catalogd_args=--catalog_topic_mode=minimal > --impalad_args=--use_local_catalog{code}
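The proposed shorthand could simply expand to the two long-form arguments before the daemons are launched. A minimal sketch (function name and approach are assumptions, not the actual start-impala-cluster.py implementation):

```python
# Hypothetical sketch: rewrite a --use_local_catalog shorthand into the existing
# catalogd/impalad arguments that the issue shows. Not real Impala code.
def expand_local_catalog_args(args):
    """If --use_local_catalog is present, replace it with the two long-form flags."""
    if "--use_local_catalog" not in args:
        return list(args)
    expanded = [a for a in args if a != "--use_local_catalog"]
    expanded.append("--catalogd_args=--catalog_topic_mode=minimal")
    expanded.append("--impalad_args=--use_local_catalog")
    return expanded

print(expand_local_catalog_args(["bin/start-impala-cluster.py", "--use_local_catalog"]))
```

Expanding the shorthand early keeps the rest of the script unchanged, since it already knows how to forward --catalogd_args and --impalad_args to the daemons.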
[jira] [Assigned] (IMPALA-13033) impala-profile-tool should support parsing thrift profiles downloaded from WebUI
[ https://issues.apache.org/jira/browse/IMPALA-13033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saurabh Katiyal reassigned IMPALA-13033: Assignee: Anshula Jain > impala-profile-tool should support parsing thrift profiles downloaded from > WebUI > > > Key: IMPALA-13033 > URL: https://issues.apache.org/jira/browse/IMPALA-13033 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Reporter: Quanlong Huang >Assignee: Anshula Jain >Priority: Major > Labels: newbie, ramp-up > > In the coordinator WebUI, users can download query profiles in > text/json/thrift formats. The thrift profile is the same as one line in the > profile log without the timestamp and query id at the beginning. > impala-profile-tool fails to parse such a file. It should retry parsing the > whole line as the encoded profile. Current code snippet: > {code:cpp} > // Parse out fields from the line. > istringstream liness(line); > int64_t timestamp; > string query_id, encoded_profile; > liness >> timestamp >> query_id >> encoded_profile; > if (liness.fail()) { > cerr << "Error parsing line " << lineno << ": '" << line << "'\n"; > ++errors; > continue; > }{code} > https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/util/impala-profile-tool.cc#L109
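The fallback the issue proposes — when the three-field parse fails, retry treating the whole line as the encoded profile — can be sketched as follows (shown in Python for brevity; the real tool is C++, and the helper name and sample strings are hypothetical):

```python
# Sketch of the proposed fallback. Profile-log lines normally look like
# "<timestamp> <query_id> <encoded_profile>", but a thrift profile downloaded
# from the WebUI is just the encoded profile with no prefix.
def parse_profile_line(line: str):
    """Return (timestamp, query_id, encoded_profile); fall back for WebUI files."""
    fields = line.split()
    if len(fields) == 3 and fields[0].isdigit():
        timestamp, query_id, encoded = fields
        return int(timestamp), query_id, encoded
    # Fallback: treat the entire line as the encoded profile.
    return None, None, line.strip()

print(parse_profile_line("1714000000 a4b:1 SGVsbG8="))  # normal log line
print(parse_profile_line("SGVsbG8="))                   # WebUI thrift download
```

The same shape in the C++ tool would be: after `liness.fail()`, reset the stream and attempt to decode the whole line before counting it as an error.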