[jira] [Assigned] (IMPALA-13019) Add query option to keep DBCP DataSource objects in cache for longer time

2024-05-01 Thread Wenzhe Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenzhe Zhou reassigned IMPALA-13019:


Assignee: Wenzhe Zhou  (was: Pranav Yogi Lodha)

> Add query option to keep DBCP DataSource objects in cache for longer time
> -
>
> Key: IMPALA-13019
> URL: https://issues.apache.org/jira/browse/IMPALA-13019
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> IMPALA-12910 adds a new class, DataSourceObjectCache, to save DBCP DataSource 
> objects. DBCP DataSource objects hold the JDBC driver; by sharing them across 
> multiple JDBC requests, we can avoid reloading the JDBC driver. The current 
> code checks the reference count for each DBCP DataSource object and removes 
> the cached object once the reference count reaches 0.
> DBCP DataSource objects should be kept in the cache even after the reference 
> count reaches 0. We could add a query option to enable this. This needs a 
> worker thread that periodically cleans up objects that have been idle for a 
> long time. Separate maps could hold active objects and idle objects. Query 
> options could be passed to the JDBC code in TOpenParams.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-12910) Run TPCH/TPCDS queries for external JDBC tables

2024-05-01 Thread Wenzhe Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenzhe Zhou resolved IMPALA-12910.
--
Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Run TPCH/TPCDS queries for external JDBC tables
> ---
>
> Key: IMPALA-12910
> URL: https://issues.apache.org/jira/browse/IMPALA-12910
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Perf Investigation
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> Need performance data for queries on external JDBC tables to be documented in 
> the design doc.






[jira] [Commented] (IMPALA-12910) Run TPCH/TPCDS queries for external JDBC tables

2024-05-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842837#comment-17842837
 ] 

ASF subversion and git services commented on IMPALA-12910:
--

Commit 08f8a300250df7b4f9a517cdb6bab48c379b7e03 in impala's branch 
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=08f8a3002 ]

IMPALA-12910: Support running TPCH/TPCDS queries for JDBC tables

This patch adds a script to create external JDBC tables for the TPCH
and TPCDS datasets, and adds unit tests that run TPCH and TPCDS queries
against external JDBC tables with Impala-Impala federation. Note that
JDBC tables are mapping tables; they don't take additional disk space.
It fixes a race condition in the caching of SQL DataSource objects by
using a new DataSourceObjectCache class, which checks the reference
count before closing a SQL DataSource.
Adds a new query option 'clean_dbcp_ds_cache' with a default value of
true. When it is set to false, a SQL DataSource object is not closed
when its reference count reaches 0; instead it is kept in the cache
until it has been idle for more than 5 minutes. A flag variable
'dbcp_data_source_idle_timeout_s' is added to make the duration
configurable.
java.sql.Connection.close() sometimes fails to remove a closed
connection from the connection pool, which causes JDBC worker threads
to wait a long time for available connections. The workaround is to
call the BasicDataSource.invalidateConnection() API to close a
connection.
Two flag variables are added for the DBCP configuration properties
'maxTotal' and 'maxWaitMillis'. Note that the 'maxActive' and 'maxWait'
properties were renamed to 'maxTotal' and 'maxWaitMillis' respectively
in apache.commons.dbcp v2.
Fixes a bug in database type comparison: the type strings specified by
the user may be lowercase or mixed case, but the code compared them
against uppercase strings.
Fixes an issue where the SQL DataSource object was not closed in
JdbcDataSource.open() and JdbcDataSource.getNext() when errors were
returned from DBCP APIs or JDBC drivers.

testdata/bin/create-tpc-jdbc-tables.py supports creating JDBC tables
for Impala-Impala, Postgres, and MySQL.
The following sample commands create TPCDS JDBC tables for
Impala-Impala federation with a remote coordinator running at
10.19.10.86, and for a Postgres server running at 10.19.10.86:
  ${IMPALA_HOME}/testdata/bin/create-tpc-jdbc-tables.py \
--jdbc_db_name=tpcds_jdbc --workload=tpcds \
--database_type=IMPALA --database_host=10.19.10.86 --clean

  ${IMPALA_HOME}/testdata/bin/create-tpc-jdbc-tables.py \
--jdbc_db_name=tpcds_jdbc --workload=tpcds \
--database_type=POSTGRES --database_host=10.19.10.86 \
--database_name=tpcds --clean

TPCDS tests for JDBC tables run only in release/exhaustive builds.
TPCH tests for JDBC tables run in core and exhaustive builds, except
Dockerized builds.

Remaining Issues:
 - tpcds-decimal_v2-q80a failed with returned rows not matching expected
   results for some decimal values. This will be fixed in IMPALA-13018.

Testing:
 - Passed core tests.
 - Passed query_test/test_tpcds_queries.py in release/exhaustive build.
 - Manually verified that only one SQL DataSource object was created for
   test_tpcds_queries.py::TestTpcdsQueryForJdbcTables when the query option
   'clean_dbcp_ds_cache' was set to false, and that the SQL DataSource
   object was closed by the cleanup thread.

Change-Id: I44e8c1bb020e90559c7f22483a7ab7a151b8f48a
Reviewed-on: http://gerrit.cloudera.org:8080/21304
Reviewed-by: Abhishek Rawat 
Tested-by: Impala Public Jenkins 
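The database-type comparison fix mentioned in the commit message above boils down to normalizing case before comparing. A minimal sketch under assumed names (DatabaseTypeUtil and its methods are hypothetical, not Impala's actual code):

```java
// Hypothetical illustration of the case-insensitive database-type check.
// User-specified strings like "postgres" or "Postgres" must match the
// canonical uppercase type names such as "POSTGRES".
class DatabaseTypeUtil {
    static final String[] SUPPORTED = {"IMPALA", "MYSQL", "POSTGRES"};

    // Buggy version: exact comparison fails for lower- or mixed-case input.
    static boolean matchesBuggy(String userType) {
        for (String t : SUPPORTED) {
            if (t.equals(userType)) return true;
        }
        return false;
    }

    // Fixed version: normalize case before comparing.
    static boolean matchesFixed(String userType) {
        if (userType == null) return false;
        String normalized = userType.trim().toUpperCase(java.util.Locale.ROOT);
        for (String t : SUPPORTED) {
            if (t.equals(normalized)) return true;
        }
        return false;
    }
}
```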


> Run TPCH/TPCDS queries for external JDBC tables
> ---
>
> Key: IMPALA-12910
> URL: https://issues.apache.org/jira/browse/IMPALA-12910
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Perf Investigation
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
>
> Need performance data for queries on external JDBC tables to be documented in 
> the design doc.






[jira] [Commented] (IMPALA-13018) Fix test_tpcds_queries.py/TestTpcdsQueryForJdbcTables.test_tpcdstpcds-decimal_v2-q80a failure

2024-05-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842838#comment-17842838
 ] 

ASF subversion and git services commented on IMPALA-13018:
--

Commit 08f8a300250df7b4f9a517cdb6bab48c379b7e03 in impala's branch 
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=08f8a3002 ]

IMPALA-12910: Support running TPCH/TPCDS queries for JDBC tables

This patch adds a script to create external JDBC tables for the TPCH
and TPCDS datasets, and adds unit tests that run TPCH and TPCDS queries
against external JDBC tables with Impala-Impala federation. Note that
JDBC tables are mapping tables; they don't take additional disk space.
It fixes a race condition in the caching of SQL DataSource objects by
using a new DataSourceObjectCache class, which checks the reference
count before closing a SQL DataSource.
Adds a new query option 'clean_dbcp_ds_cache' with a default value of
true. When it is set to false, a SQL DataSource object is not closed
when its reference count reaches 0; instead it is kept in the cache
until it has been idle for more than 5 minutes. A flag variable
'dbcp_data_source_idle_timeout_s' is added to make the duration
configurable.
java.sql.Connection.close() sometimes fails to remove a closed
connection from the connection pool, which causes JDBC worker threads
to wait a long time for available connections. The workaround is to
call the BasicDataSource.invalidateConnection() API to close a
connection.
Two flag variables are added for the DBCP configuration properties
'maxTotal' and 'maxWaitMillis'. Note that the 'maxActive' and 'maxWait'
properties were renamed to 'maxTotal' and 'maxWaitMillis' respectively
in apache.commons.dbcp v2.
Fixes a bug in database type comparison: the type strings specified by
the user may be lowercase or mixed case, but the code compared them
against uppercase strings.
Fixes an issue where the SQL DataSource object was not closed in
JdbcDataSource.open() and JdbcDataSource.getNext() when errors were
returned from DBCP APIs or JDBC drivers.

testdata/bin/create-tpc-jdbc-tables.py supports creating JDBC tables
for Impala-Impala, Postgres, and MySQL.
The following sample commands create TPCDS JDBC tables for
Impala-Impala federation with a remote coordinator running at
10.19.10.86, and for a Postgres server running at 10.19.10.86:
  ${IMPALA_HOME}/testdata/bin/create-tpc-jdbc-tables.py \
--jdbc_db_name=tpcds_jdbc --workload=tpcds \
--database_type=IMPALA --database_host=10.19.10.86 --clean

  ${IMPALA_HOME}/testdata/bin/create-tpc-jdbc-tables.py \
--jdbc_db_name=tpcds_jdbc --workload=tpcds \
--database_type=POSTGRES --database_host=10.19.10.86 \
--database_name=tpcds --clean

TPCDS tests for JDBC tables run only in release/exhaustive builds.
TPCH tests for JDBC tables run in core and exhaustive builds, except
Dockerized builds.

Remaining Issues:
 - tpcds-decimal_v2-q80a failed with returned rows not matching expected
   results for some decimal values. This will be fixed in IMPALA-13018.

Testing:
 - Passed core tests.
 - Passed query_test/test_tpcds_queries.py in release/exhaustive build.
 - Manually verified that only one SQL DataSource object was created for
   test_tpcds_queries.py::TestTpcdsQueryForJdbcTables when the query option
   'clean_dbcp_ds_cache' was set to false, and that the SQL DataSource
   object was closed by the cleanup thread.

Change-Id: I44e8c1bb020e90559c7f22483a7ab7a151b8f48a
Reviewed-on: http://gerrit.cloudera.org:8080/21304
Reviewed-by: Abhishek Rawat 
Tested-by: Impala Public Jenkins 


> Fix 
> test_tpcds_queries.py/TestTpcdsQueryForJdbcTables.test_tpcdstpcds-decimal_v2-q80a
>  failure
> -
>
> Key: IMPALA-13018
> URL: https://issues.apache.org/jira/browse/IMPALA-13018
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend, Frontend
>Reporter: Wenzhe Zhou
>Assignee: Pranav Yogi Lodha
>Priority: Major
>
> The returned rows do not match the expected results for some decimal-type 
> columns.






[jira] [Created] (IMPALA-13051) Speed up test_query_log test runs

2024-05-01 Thread Michael Smith (Jira)
Michael Smith created IMPALA-13051:
--

 Summary: Speed up test_query_log test runs
 Key: IMPALA-13051
 URL: https://issues.apache.org/jira/browse/IMPALA-13051
 Project: IMPALA
  Issue Type: Task
Affects Versions: Impala 4.4.0
Reporter: Michael Smith


test_query_log.py takes 11 minutes to run. Most of its tests use a graceful 
shutdown, which provides an unnecessary grace period. Optimize the 
test_query_log test runs, and do some other code cleanup around workload 
management.






[jira] [Work started] (IMPALA-13051) Speed up test_query_log test runs

2024-05-01 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-13051 started by Michael Smith.
--
> Speed up test_query_log test runs
> -
>
> Key: IMPALA-13051
> URL: https://issues.apache.org/jira/browse/IMPALA-13051
> Project: IMPALA
>  Issue Type: Task
>Affects Versions: Impala 4.4.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>
> test_query_log.py takes 11 minutes to run. Most of its tests use a graceful 
> shutdown, which provides an unnecessary grace period. Optimize the 
> test_query_log test runs, and do some other code cleanup around workload 
> management.






[jira] [Assigned] (IMPALA-13051) Speed up test_query_log test runs

2024-05-01 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith reassigned IMPALA-13051:
--

Assignee: Michael Smith

> Speed up test_query_log test runs
> -
>
> Key: IMPALA-13051
> URL: https://issues.apache.org/jira/browse/IMPALA-13051
> Project: IMPALA
>  Issue Type: Task
>Affects Versions: Impala 4.4.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>
> test_query_log.py takes 11 minutes to run. Most of its tests use a graceful 
> shutdown, which provides an unnecessary grace period. Optimize the 
> test_query_log test runs, and do some other code cleanup around workload 
> management.






[jira] [Resolved] (IMPALA-12353) Support Impala on ARM

2024-05-01 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-12353.

Fix Version/s: Impala 4.4.0
   Resolution: Fixed

> Support Impala on ARM
> -
>
> Key: IMPALA-12353
> URL: https://issues.apache.org/jira/browse/IMPALA-12353
> Project: IMPALA
>  Issue Type: Epic
>  Components: Infrastructure
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>  Labels: arm64
> Fix For: Impala 4.4.0
>
>
> Ensure Impala runs on ARM. Provide native toolchain builds, developer 
> workflows, and testing for ARM.
> Create jobs in jenkins.impala.io as needed to enable testing ARM builds.






[jira] [Updated] (IMPALA-12353) Support Impala on ARM

2024-05-01 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-12353:
---
Labels: arm64  (was: )

> Support Impala on ARM
> -
>
> Key: IMPALA-12353
> URL: https://issues.apache.org/jira/browse/IMPALA-12353
> Project: IMPALA
>  Issue Type: Epic
>  Components: Infrastructure
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>  Labels: arm64
>
> Ensure Impala runs on ARM. Provide native toolchain builds, developer 
> workflows, and testing for ARM.
> Create jobs in jenkins.impala.io as needed to enable testing ARM builds.






[jira] [Created] (IMPALA-13050) Impala fails to start with TSAN in RHEL aarch64

2024-05-01 Thread Michael Smith (Jira)
Michael Smith created IMPALA-13050:
--

 Summary: Impala fails to start with TSAN in RHEL aarch64
 Key: IMPALA-13050
 URL: https://issues.apache.org/jira/browse/IMPALA-13050
 Project: IMPALA
  Issue Type: Bug
Affects Versions: Impala 4.4.0
Reporter: Michael Smith


Impala fails to start up in TSAN builds on RHEL 8.8 aarch64 (arm64) machines.

With Java 8, the jni-util.cc call to {{hdfsConnect}} causes impalad/catalogd to 
crash with
{quote}
Exception: java.lang.StackOverflowError thrown from the 
UncaughtExceptionHandler in thread "process reaper"
{quote}

With Java 11 and 17 (selected via {{TEST_JDK_VERSION}}, or {{IMPALA_JDK_VERSION}} 
with only an Impala restart), we don't get an error, but Impala still crashes 
during {{hdfsConnect}}.






[jira] [Resolved] (IMPALA-13049) Add dependency management for the log4j2 version

2024-05-01 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-13049.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Add dependency management for the log4j2 version
> 
>
> Key: IMPALA-13049
> URL: https://issues.apache.org/jira/browse/IMPALA-13049
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend, Infrastructure
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> In some internal builds, we see cases where one dependency brings in one 
> version of log4j2 and another brings in a different version on a different 
> artifact. In particular, we have seen cases where Hive brings in log4j-api 
> 2.17.1 while something else brings in log4j-core 2.18.0. This is a bad 
> combination, because log4j-core 2.18.0 relies on the ServiceLoaderUtil class 
> existing in log4j-api, but log4j-api 2.17.1 doesn't have it. This can result 
> in class not found exceptions.
> Impala itself uses reload4j rather than log4j2, so this is purely about 
> coordinating dependencies rather than Impala code.
> We should add dependency management for log4j-api and log4j-core. It makes 
> sense to standardize on 2.18.0.






[jira] [Commented] (IMPALA-13049) Add dependency management for the log4j2 version

2024-05-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842753#comment-17842753
 ] 

ASF subversion and git services commented on IMPALA-13049:
--

Commit d09c5024907aaf387aaa584dc86cb2b4d641a582 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d09c50249 ]

IMPALA-13049: Add dependency management for log4j2 to use 2.18.0

Currently, there is no dependency management for the log4j2
version. Impala itself doesn't use log4j2. However, recently
we encountered a case where one dependency brought in
log4j-core 2.18.0 and another brought in log4j-api 2.17.1.
log4j-core 2.18.0 relies on the existence of the ServiceLoaderUtil
class from log4j-api 2.18.0. log4j-api 2.17.1 doesn't have this
class, which causes class not found exceptions.

This uses dependency management to set the log4j2 version to 2.18.0
for log4j-core and log4j-api to avoid any mismatch.

Testing:
 - Ran a local build and verified that both log4j-core and log4j-api
   are using 2.18.0.

Change-Id: Ib4f8485adadb90f66f354a5dedca29992c6d4e6f
Reviewed-on: http://gerrit.cloudera.org:8080/21379
Reviewed-by: Michael Smith 
Reviewed-by: Abhishek Rawat 
Tested-by: Impala Public Jenkins 
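The dependency management described in this commit would amount to a pom.xml fragment along these lines (a sketch only; the exact placement in Impala's build files is an assumption):

```xml
<!-- Sketch: pin both log4j2 artifacts to one version via dependencyManagement
     so transitive dependencies cannot mix log4j-api and log4j-core versions. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-api</artifactId>
      <version>2.18.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-core</artifactId>
      <version>2.18.0</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```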


> Add dependency management for the log4j2 version
> 
>
> Key: IMPALA-13049
> URL: https://issues.apache.org/jira/browse/IMPALA-13049
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend, Infrastructure
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
>
> In some internal builds, we see cases where one dependency brings in one 
> version of log4j2 and another brings in a different version on a different 
> artifact. In particular, we have seen cases where Hive brings in log4j-api 
> 2.17.1 while something else brings in log4j-core 2.18.0. This is a bad 
> combination, because log4j-core 2.18.0 relies on the ServiceLoaderUtil class 
> existing in log4j-api, but log4j-api 2.17.1 doesn't have it. This can result 
> in class not found exceptions.
> Impala itself uses reload4j rather than log4j2, so this is purely about 
> coordinating dependencies rather than Impala code.
> We should add dependency management for log4j-api and log4j-core. It makes 
> sense to standardize on 2.18.0.






[jira] [Commented] (IMPALA-12754) Update Impala document to cover external jdbc table

2024-05-01 Thread Wenzhe Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842739#comment-17842739
 ] 

Wenzhe Zhou commented on IMPALA-12754:
--

Design doc for JDBC table: 
https://docs.google.com/document/d/14gmhaG5fRQg3s4jNh8UrlVAeDdNtQfoGbZGl0emRcdE/edit?usp=sharing

> Update Impala document to cover external jdbc table
> ---
>
> Key: IMPALA-12754
> URL: https://issues.apache.org/jira/browse/IMPALA-12754
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Wenzhe Zhou
>Assignee: gaurav singh
>Priority: Major
>
> We need to document the SQL syntax to create and alter external JDBC tables, 
> including the table properties to be set for JDBC and DBCP (Database 
> Connection Pool).
>  






[jira] [Comment Edited] (IMPALA-12754) Update Impala document to cover external jdbc table

2024-05-01 Thread Wenzhe Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842441#comment-17842441
 ] 

Wenzhe Zhou edited comment on IMPALA-12754 at 5/1/24 6:31 PM:
--

*Impala SQL syntax to create external JDBC table*

When creating an external JDBC table, the user needs to specify the following 
minimum information: database type, JDBC URL, driver class, driver file 
location, username and password for querying the database, and table name. 
Here are two samples that create JDBC tables for tables on a Postgres server 
and on another Impala cluster, respectively.
{code:java}
CREATE EXTERNAL TABLE alltypes_jdbc (
  id INT,
  bool_col BOOLEAN,
  tinyint_col TINYINT,
  smallint_col SMALLINT,
  int_col INT,
  bigint_col BIGINT,
  float_col FLOAT,
  double_col DOUBLE,
  date_col DATE,
  string_col STRING,
  timestamp_col TIMESTAMP)
STORED BY JDBC
TBLPROPERTIES (
  "database.type"="POSTGRES",
  "jdbc.url"="jdbc:postgresql://10.96.132.138:5432/functional",
  "jdbc.driver"="org.postgresql.Driver",
  "driver.url"="/test-warehouse/data-sources/jdbc-drivers/postgresql-jdbc.jar",
  "dbcp.username"="hiveuser",
  "dbcp.password"="password",
  "table"="alltypes");

CREATE EXTERNAL TABLE alltypes_jdbc (
  id INT,
  bool_col BOOLEAN,
  tinyint_col TINYINT,
  smallint_col SMALLINT,
  int_col INT,
  bigint_col BIGINT,
  float_col FLOAT,
  double_col DOUBLE,
  date_col DATE,
  string_col STRING,
  timestamp_col TIMESTAMP)
STORED BY JDBC
TBLPROPERTIES (
  "database.type"="IMPALA",
  "jdbc.url"="jdbc:impala://10.96.132.138:21050/functional",
  "jdbc.auth"="AuthMech=3",
  "jdbc.properties"="MEM_LIMIT=10, MAX_ERRORS = 1",
  "jdbc.driver"="com.cloudera.impala.jdbc.Driver",
  
"driver.url"="hdfs://test-warehouse/data-sources/jdbc-drivers/ImpalaJDBC42.jar",
  "dbcp.username"="hiveuser",
  
"dbcp.password.keystore"="jceks://hdfs/test-warehouse/data-sources/test.jceks",
  "dbcp.password.key"="hiveuser",
  "table"="alltypes");
{code}

*Supported data types*
The column data types supported for an external JDBC table are:
* Numeric types: boolean, tinyint, smallint, int, bigint, float, double
* Decimal, with scale and precision
* String type: string
* Date
* Timestamp

Note that the following data types are not supported: char, varchar, and 
binary. Complex data types (struct, map, array) and nested types are also not 
supported.

*Table properties specified for external JDBC tables*
In the CREATE EXTERNAL TABLE statement, the user is required to specify the 
following table properties:
* database.type: IMPALA, MYSQL, or POSTGRES
* jdbc.url: the JDBC connection string, including the database type, IP 
address, port number, and database name. For example, 
"jdbc:impala://10.96.132.138:21050/functional".
* jdbc.driver: the class name of the JDBC driver
* driver.url: the URL for downloading the jar file package that is used to 
access the external database.
* table: the name of the table in the external database to be mapped in 
Impala.

Besides the required properties above, the user can also specify optional 
parameters to use different authentication methods, allow case-sensitive 
column names in remote tables, or pass additional database properties:
* jdbc.auth: the authentication mechanism of the JDBC driver. It is used for 
Impala-Impala federation.
* dbcp.username: the JDBC username
* dbcp.password: the JDBC password in clear text. This parameter is strongly 
discouraged in production environments; the recommended way is to store the 
password in a keystore. See the "Securing password" section for details.
* dbcp.password.key: the key within the keystore
* dbcp.password.keystore: the keystore URI.
* jdbc.properties: additional properties applied to the database engine, such 
as Impala query options, specified as a comma-delimited key=value string.
* jdbc.fetch.size: the number of rows to fetch in a batch
* column.mapping: the mapping of column names between the external table and 
the Impala JDBC table. See the "Support case-sensitive table/column names" 
section for details.

*Securing password*
To mitigate password leaks, the value of the "dbcp.password" table property is 
masked in the output of the commands "SHOW CREATE TABLE table-name" and 
"DESCRIBE FORMATTED | EXTENDED table-name".
In production deployments, it is strongly discouraged to save the JDBC 
password in clear text in the "dbcp.password" table property. Instead, the 
user can store the password in a Java keystore file on HDFS, created with a 
command like:
{code:java}
hadoop credential create host1.password -provider jceks://hdfs/user/test.jceks 
-v passwd1
{code}
Then specify "dbcp.password.key" and "dbcp.password.keystore" instead of 
"dbcp.password" in the CREATE TABLE statement.

*Support case-sensitive table/column names*

Column names of remote tables may be different from the JDBC table schema. For 
example, Postgres allows case-sensitive column names, but Impala always saves 
column names in lowercase. In this case, the user 

[jira] [Comment Edited] (IMPALA-12754) Update Impala document to cover external jdbc table

2024-05-01 Thread Wenzhe Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842441#comment-17842441
 ] 

Wenzhe Zhou edited comment on IMPALA-12754 at 5/1/24 6:11 PM:
--

*Impala SQL syntax to create external JDBC table*

When creating an external JDBC table, the user needs to specify the minimum 
information: database type, jdbc url, driver class, driver file location, user 
name and password for querying database, table name. Here are two samples to 
create JDBC tables for tables on Postgres server and another Impala cluster 
respectively.
{code:java}
CREATE EXTERNAL TABLE alltypes_jdbc (
  id INT,
  bool_col BOOLEAN,
  tinyint_col TINYINT,
  smallint_col SMALLINT,
  int_col INT,
  bigint_col BIGINT,
  float_col FLOAT,
  double_col DOUBLE,
  date_col DATE,
  string_col STRING,
  timestamp_col TIMESTAMP)
STORED BY JDBC
TBLPROPERTIES (
  "database.type"="POSTGRES",
  "jdbc.url"="jdbc:postgresql://10.96.132.138:5432/functional",
  "jdbc.driver"="org.postgresql.Driver",
  "driver.url"="/test-warehouse/data-sources/jdbc-drivers/postgresql-jdbc.jar",
  "dbcp.username"="hiveuser",
  "dbcp.password"="password",
  "table"="alltypes");

CREATE EXTERNAL TABLE alltypes_jdbc (
  id INT,
  bool_col BOOLEAN,
  tinyint_col TINYINT,
  smallint_col SMALLINT,
  int_col INT,
  bigint_col BIGINT,
  float_col FLOAT,
  double_col DOUBLE,
  date_col DATE,
  string_col STRING,
  timestamp_col TIMESTAMP)
STORED BY JDBC
TBLPROPERTIES (
  "database.type"="IMPALA",
  "jdbc.url"="jdbc:impala://10.96.132.138:21050/functional",
  "jdbc.auth"="AuthMech=3",
  "jdbc.properties"="MEM_LIMIT=10, MAX_ERRORS = 1",
  "jdbc.driver"="com.cloudera.impala.jdbc.Driver",
  
"driver.url"="hdfs://test-warehouse/data-sources/jdbc-drivers/ImpalaJDBC42.jar",
  "dbcp.username"="hiveuser",
  
"dbcp.password.keystore"="jceks://hdfs/test-warehouse/data-sources/test.jceks",
  "dbcp.password.key"="hiveuser",
  "table"="alltypes");
{code}

*Supported data types*
The column data type for an external JDBC table can be:
* Numeric data type: boolean, tinyint, smallint, int, bigint, float, double
* Decimal with scale and precision
* String type: string
* Date
* Timestamp

Note that following data types are not supported: char, varchar, and binary. 
Complex data type: struct, map, array and nested type are not supported.

*Table Properties specified for external JDBC table*
In the create external JDBC table statement, user is required to specify the 
following table properties:
* database.type: IMPALA, MYSQL, POSTGRES
* jdbc.url: jdbc connection string, including the database type, IP address, 
port number, and database name. For example, 
“jdbc:impala://10.96.132.138:21050/functional”.
* jdbc.driver: class name of jdbc driver
* driver.url: driver URL for downloading the Jar file package that is used to 
access the external database.
* table: name of the external table to be mapped in Impala.

Besides the above required properties, user can also specify optional 
parameters to use different authentication methods, or to allow case sensitive 
columns names in remote tables, or to specify additional database properties, 
etc:
* jdbc.auth: authentication mechanisms of JDBC driver. It's used for 
Impala-Impala federation.
* dbcp.username: jdbc user name
* dbcp.password: jdbc password in clear text, this parameter is strongly 
discouraged in production environments. The recommended way is to store it in a 
keystore. See section  “securing password” for details.
* dbcp.password.key:  key of the keystore
* dbcp.password.keystore: keystore URI.
* jdbc.properties: additional properties applied to database engine, like 
Impala Query options. Properties are specified as comma-delimited key=value 
string. 
* jdbc.fetch.size: number of rows to fetch in a batch
* column.mappting: Mapping of column names between external table and Impala 
JDBC table. See section "Support case-sensitive table/column names" for 
details. 

*Securing password*
To mitigate password leakage, the value of the “dbcp.password” table property 
is masked in the output of the commands “SHOW CREATE TABLE table-name” and 
“DESCRIBE FORMATTED | EXTENDED table-name”.
In production deployments, it is strongly discouraged to save the JDBC password 
in clear text in the table property "dbcp.password". Instead, the user can 
store the password in a Java keystore file on HDFS, created with a command like 
the following:
{code:java}
hadoop credential create host1.password -provider jceks://hdfs/user/test.jceks -v passwd1
{code}
Then specify “dbcp.password.key” and “dbcp.password.keystore” instead of 
“dbcp.password” in the CREATE TABLE statement.

*Support case-sensitive table/column names*

Column names of remote tables may differ from the Impala JDBC table schema. For 
example, Postgres allows case-sensitive column names, but Impala always saves 
column names in lowercase. In this case, the user can set the “column.mapping” 

[jira] [Updated] (IMPALA-12754) Update Impala document to cover external jdbc table

2024-05-01 Thread Wenzhe Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenzhe Zhou updated IMPALA-12754:
-
Description: 
We need to document the SQL syntax to create and alter external JDBC tables, 
including the table properties to be set for JDBC and DBCP (Database Connection 
Pool).
 

  was:
We need to document  the SQL syntax to create jdbc table and alter jdbc table, 
including the table properties to be set for jdbc and DBCP (Database connection 
pool).
 


> Update Impala document to cover external jdbc table
> ---
>
> Key: IMPALA-12754
> URL: https://issues.apache.org/jira/browse/IMPALA-12754
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Wenzhe Zhou
>Assignee: gaurav singh
>Priority: Major
>
> We need to document  the SQL syntax to create external JDBC table and alter 
> external JDBC table, including the table properties to be set for JDBC and 
> DBCP (Database Connection Pool).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-12754) Update Impala document to cover external jdbc table

2024-05-01 Thread Wenzhe Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842441#comment-17842441
 ] 

Wenzhe Zhou edited comment on IMPALA-12754 at 5/1/24 6:07 PM:
--

*Impala SQL syntax to create external JDBC table*

When creating an external JDBC table, the user needs to specify at minimum the 
following information: the database type, JDBC URL, driver class, driver file 
location, user name and password for querying the database, and the remote 
table name. Here are two samples that create JDBC tables for a table on a 
Postgres server and for a table on another Impala cluster, respectively.
{code:sql}
CREATE EXTERNAL TABLE alltypes_jdbc (
  id INT,
  bool_col BOOLEAN,
  tinyint_col TINYINT,
  smallint_col SMALLINT,
  int_col INT,
  bigint_col BIGINT,
  float_col FLOAT,
  double_col DOUBLE,
  date_col DATE,
  string_col STRING,
  timestamp_col TIMESTAMP)
STORED BY JDBC
TBLPROPERTIES (
  "database.type"="POSTGRES",
  "jdbc.url"="jdbc:postgresql://10.96.132.138:5432/functional",
  "jdbc.driver"="org.postgresql.Driver",
  "driver.url"="/test-warehouse/data-sources/jdbc-drivers/postgresql-jdbc.jar",
  "dbcp.username"="hiveuser",
  "dbcp.password"="password",
  "table"="alltypes");

CREATE EXTERNAL TABLE alltypes_jdbc (
  id INT,
  bool_col BOOLEAN,
  tinyint_col TINYINT,
  smallint_col SMALLINT,
  int_col INT,
  bigint_col BIGINT,
  float_col FLOAT,
  double_col DOUBLE,
  date_col DATE,
  string_col STRING,
  timestamp_col TIMESTAMP)
STORED BY JDBC
TBLPROPERTIES (
  "database.type"="IMPALA",
  "jdbc.url"="jdbc:impala://10.96.132.138:21050/functional",
  "jdbc.auth"="AuthMech=3",
  "jdbc.properties"="MEM_LIMIT=10, MAX_ERRORS = 1",
  "jdbc.driver"="com.cloudera.impala.jdbc.Driver",
  "driver.url"="hdfs://test-warehouse/data-sources/jdbc-drivers/ImpalaJDBC42.jar",
  "dbcp.username"="hiveuser",
  "dbcp.password.keystore"="jceks://hdfs/test-warehouse/data-sources/test.jceks",
  "dbcp.password.key"="hiveuser",
  "table"="alltypes");
{code}
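Once created, such a table can be queried like any other Impala table. A minimal usage sketch, assuming the tables above exist and the remote servers are reachable:

```sql
-- Rows are fetched from the remote database through the JDBC driver;
-- the optional jdbc.fetch.size property controls the fetch batch size.
SELECT id, string_col
FROM alltypes_jdbc
WHERE int_col > 10;
```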

*Supported data types*
The column data types supported for an external JDBC table are:
* Numeric types: boolean, tinyint, smallint, int, bigint, float, double
* Decimal, with scale and precision
* String type: string
* Date
* Timestamp

Note that the following data types are not supported: char, varchar, and 
binary. Complex data types (struct, map, array) and nested types are also not 
supported.

*Table Properties specified for external JDBC table*
In the CREATE EXTERNAL TABLE statement for a JDBC table, the user is required 
to specify the following table properties:
* database.type: IMPALA, MYSQL, or POSTGRES
* jdbc.url: the JDBC connection string, including the database type, IP 
address, port number, and database name. For example, 
“jdbc:impala://10.96.132.138:21050/functional”.
* jdbc.driver: the class name of the JDBC driver
* driver.url: the URL for downloading the jar file that contains the JDBC 
driver used to access the external database.
* table: the name of the table in the remote database that is mapped to the 
Impala table.

Besides the required properties above, the user can also specify optional 
parameters to use different authentication methods, to allow case-sensitive 
column names in remote tables, or to pass additional database properties:
* jdbc.auth: the authentication mechanism of the JDBC driver. It is used for 
Impala-Impala federation.
* dbcp.username: the JDBC user name
* dbcp.password: the JDBC password in clear text. This parameter is strongly 
discouraged in production environments; the recommended way is to store the 
password in a keystore. See the section “Securing password” for details.
* dbcp.password.key: the key of the keystore entry
* dbcp.password.keystore: the keystore URI
* jdbc.properties: additional properties applied to the database engine, such 
as Impala query options, specified as a comma-delimited key=value string.
* jdbc.fetch.size: the number of rows to fetch in a batch
* column.mapping: the mapping of column names between the external table and 
the Impala JDBC table. See the section "Support case-sensitive table/column 
names" for details.
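As a hedged illustration of column mapping, a comma-delimited list of name pairs is one plausible shape for the "column.mapping" value; the exact pair syntax below is an assumption, not confirmed by this thread:

```sql
-- Hypothetical sketch: map lowercase Impala column names to
-- case-sensitive column names in a remote Postgres table.
-- The "impala_name=Remote_Name" pair format is an assumption here.
ALTER TABLE alltypes_jdbc SET TBLPROPERTIES (
  "column.mapping"="id=ID, bool_col=Bool_Col");
```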

*Securing password*
To mitigate password leakage, the value of the “dbcp.password” table property 
is masked in the output of the commands “SHOW CREATE TABLE table-name” and 
“DESCRIBE FORMATTED | EXTENDED table-name”.
In production deployments, it is strongly discouraged to save the JDBC password 
in clear text in the table property "dbcp.password". Instead, the user can 
store the password in a Java keystore file on HDFS, created with a command like 
the following:
{code:java}
hadoop credential create host1.password -provider jceks://hdfs/user/test.jceks -v passwd1
{code}
Then specify “dbcp.password.key” and “dbcp.password.keystore” instead of 
“dbcp.password” in the CREATE TABLE statement.

*Support case-sensitive table/column names*

Column names of remote tables may differ from the Impala JDBC table schema. For 
example, Postgres allows case-sensitive column names, but Impala always saves 
column names in lowercase. In this case, the user can set the “column.mapping” 
table property to map 

[jira] [Comment Edited] (IMPALA-12754) Update Impala document to cover external jdbc table

2024-05-01 Thread Wenzhe Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842441#comment-17842441
 ] 

Wenzhe Zhou edited comment on IMPALA-12754 at 5/1/24 6:06 PM:
--

*Impala SQL syntax to create external JDBC table*

When creating an external JDBC table, the user needs to specify the minimum 
information: database type, jdbc url, driver class, driver file location, user 
name and password for querying database, table name. Here are two samples to 
create JDBC tables for tables on Postgres server and another Impala cluster 
respectively.
{code:java}
CREATE EXTERNAL TABLE alltypes_jdbc (
  id INT,
  bool_col BOOLEAN,
  tinyint_col TINYINT,
  smallint_col SMALLINT,
  int_col INT,
  bigint_col BIGINT,
  float_col FLOAT,
  double_col DOUBLE,
  date_col DATE,
  string_col STRING,
  timestamp_col TIMESTAMP)
STORED BY JDBC
TBLPROPERTIES (
  "database.type"="POSTGRES",
  "jdbc.url"="jdbc:postgresql://10.96.132.138:5432/functional",
  "jdbc.driver"="org.postgresql.Driver",
  "driver.url"="/test-warehouse/data-sources/jdbc-drivers/postgresql-jdbc.jar",
  "dbcp.username"="hiveuser",
  "dbcp.password"="password",
  "table"="alltypes");

CREATE EXTERNAL TABLE alltypes_jdbc (
  id INT,
  bool_col BOOLEAN,
  tinyint_col TINYINT,
  smallint_col SMALLINT,
  int_col INT,
  bigint_col BIGINT,
  float_col FLOAT,
  double_col DOUBLE,
  date_col DATE,
  string_col STRING,
  timestamp_col TIMESTAMP)
STORED BY JDBC
TBLPROPERTIES (
  "database.type"="IMPALA",
  "jdbc.url"="jdbc:impala://10.96.132.138:21050/functional",
  "jdbc.auth"="AuthMech=3",
  "jdbc.properties"="MEM_LIMIT=10, MAX_ERRORS = 1",
  "jdbc.driver"="com.cloudera.impala.jdbc.Driver",
  
"driver.url"="hdfs://test-warehouse/data-sources/jdbc-drivers/ImpalaJDBC42.jar",
  "dbcp.username"="hiveuser",
  
"dbcp.password.keystore"="jceks://hdfs/test-warehouse/data-sources/test.jceks",
  "dbcp.password.key"="hiveuser",
  "table"="alltypes");
{code}

*Supported data types*
The column data type for an external JDBC table can be:
* Numeric data type: boolean, tinyint, smallint, int, bigint, float, double
* Decimal with scale and precision
* String type: string
* Date
* Timestamp

Note that following data types are not supported: char, varchar, and binary. 
Complex data type: struct, map, array and nested type are not supported.

*Table Properties specified for external JDBC table*
In the create external JDBC table statement, user is required to specify the 
following table properties:
* database.type: IMPALA, MYSQL, POSTGRES
* jdbc.url: jdbc connection string, including the database type, IP address, 
port number, and database name. For example, 
“jdbc:impala://10.96.132.138:21050/functional”.
* jdbc.driver: class name of jdbc driver
* driver.url: driver URL for downloading the Jar file package that is used to 
access the external database.
* table: name of the external table to be mapped in Impala.

Besides the above required properties, user can also specify optional 
parameters to use different authentication methods, or to allow case sensitive 
columns names in remote tables, or to specify additional database properties, 
etc:
jdbc.auth: authentication mechanisms of JDBC driver. It's used for 
Impala-Impala federation.
dbcp.username: jdbc user name
dbcp.password: jdbc password in clear text, this parameter is strongly 
discouraged in production environments. The recommended way is to store it in a 
keystore. See section  “securing password” for details.
dbcp.password.key:  key of the keystore
dbcp.password.keystore: keystore URI.
jdbc.properties: additional properties applied to database engine, like Impala 
Query options. Properties are specified as comma-delimited key=value string. 
jdbc.fetch.size: number of rows to fetch in a batch
column.mappting: Mapping of column names between external table and Impala JDBC 
table. See section "Support case-sensitive table/column names" for details. 

*Securing password*
To mitigate the password leak, the value of “dbcp.password” table property is 
masked in the output of commands “SHOW CREATE TABLE table-name” and “DESCRIBE 
FORMATTED | EXTENDED table-name”.
In production deployment, it is strongly discouraged to save the jdbc password 
in clear text in table property "dbcp.password". Instead, user can store 
password in a Java keystore file on HDFS by using the command like below to 
create a keystore file:
  hadoop credential create host1.password -provider 
jceks://hdfs/user/test.jceks -v passwd1
Then specify “dbcp.password.key” and “dbcp.password.keystore” instead of 
“dbcp.password” in the create table statement.

*Support case-sensitive table/column names*

Column names of remote tables may be different from the JDBC table schema. For 
example, Postgres allows case-sensitive column names, but Impala always saves 
column names in lowercase. In this case, the user can set the “column.mapping” 

[jira] [Comment Edited] (IMPALA-12754) Update Impala document to cover external jdbc table

2024-05-01 Thread Wenzhe Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842441#comment-17842441
 ] 

Wenzhe Zhou edited comment on IMPALA-12754 at 5/1/24 6:03 PM:
--

*Impala SQL syntax to create external JDBC table*

When creating an external JDBC table, the user needs to specify the minimum 
information: database type, jdbc url, driver class, driver file location, user 
name and password for querying database, table name. Here are two samples to 
create JDBC tables for tables on Postgres server and another Impala cluster 
respectively.
{code:java}
CREATE EXTERNAL TABLE alltypes_jdbc (
  id INT,
  bool_col BOOLEAN,
  tinyint_col TINYINT,
  smallint_col SMALLINT,
  int_col INT,
  bigint_col BIGINT,
  float_col FLOAT,
  double_col DOUBLE,
  date_col DATE,
  string_col STRING,
  timestamp_col TIMESTAMP)
STORED BY JDBC
TBLPROPERTIES (
  "database.type"="POSTGRES",
  "jdbc.url"="jdbc:postgresql://10.96.132.138:5432/functional",
  "jdbc.driver"="org.postgresql.Driver",
  "driver.url"="/test-warehouse/data-sources/jdbc-drivers/postgresql-jdbc.jar",
  "dbcp.username"="hiveuser",
  "dbcp.password"="password",
  "table"="alltypes");

CREATE EXTERNAL TABLE alltypes_jdbc (
  id INT,
  bool_col BOOLEAN,
  tinyint_col TINYINT,
  smallint_col SMALLINT,
  int_col INT,
  bigint_col BIGINT,
  float_col FLOAT,
  double_col DOUBLE,
  date_col DATE,
  string_col STRING,
  timestamp_col TIMESTAMP)
STORED BY JDBC
TBLPROPERTIES (
  "database.type"="IMPALA",
  "jdbc.url"="jdbc:impala://10.96.132.138:21050/functional",
  "jdbc.auth"="AuthMech=3",
  "jdbc.properties"="MEM_LIMIT=10, MAX_ERRORS = 1",
  "jdbc.driver"="com.cloudera.impala.jdbc.Driver",
  
"driver.url"="hdfs://test-warehouse/data-sources/jdbc-drivers/ImpalaJDBC42.jar",
  "dbcp.username"="hiveuser",
  
"dbcp.password.keystore"="jceks://hdfs/test-warehouse/data-sources/test.jceks",
  "dbcp.password.key"="hiveuser",
  "table"="alltypes");
{code}

*Supported data types*
The column data type for an external JDBC table can be:
Numeric data type: boolean, tinyint, smallint, int, bigint, float, double
Decimal with scale and precision
String type: string
Date
Timestamp

Note that following data types are not supported: char, varchar, and binary. 
Complex data type: struct, map, array and nested type are not supported.

*Table Properties specified for external JDBC table*
In the create external JDBC table statement, user is required to specify the 
following table properties:
database.type: IMPALA, MYSQL, POSTGRES
jdbc.url: jdbc connection string, including the database type, IP address, port 
number, and database name. For example, 
“jdbc:impala://10.96.132.138:21050/functional”.
jdbc.driver: class name of jdbc driver
driver.url: driver URL for downloading the Jar file package that is used to 
access the external database.
table: name of the external table to be mapped in Impala.

Besides the above required properties, user can also specify optional 
parameters to use different authentication methods, or to allow case sensitive 
columns names in remote tables, or to specify additional database properties, 
etc:
jdbc.auth: authentication mechanisms of JDBC driver. It's used for 
Impala-Impala federation.
dbcp.username: jdbc user name
dbcp.password: jdbc password in clear text, this parameter is strongly 
discouraged in production environments. The recommended way is to store it in a 
keystore. See section  “securing password” for details.
dbcp.password.key:  key of the keystore
dbcp.password.keystore: keystore URI.
jdbc.properties: additional properties applied to database engine, like Impala 
Query options. Properties are specified as comma-delimited key=value string. 
jdbc.fetch.size: number of rows to fetch in a batch
column.mappting: Mapping of column names between external table and Impala JDBC 
table. See section "Support case-sensitive table/column names" for details. 

*Securing password*
To mitigate the password leak, the value of “dbcp.password” table property is 
masked in the output of commands “SHOW CREATE TABLE table-name” and “DESCRIBE 
FORMATTED | EXTENDED table-name”.
In production deployment, it is strongly discouraged to save the jdbc password 
in clear text in table property "dbcp.password". Instead, user can store 
password in a Java keystore file on HDFS by using the command like below to 
create a keystore file:
  hadoop credential create host1.password -provider 
jceks://hdfs/user/test.jceks -v passwd1
Then specify “dbcp.password.key” and “dbcp.password.keystore” instead of 
“dbcp.password” in the create table statement.

*Support case-sensitive table/column names*

Column names of remote tables may be different from the JDBC table schema. For 
example, Postgres allows case-sensitive column names, but Impala always saves 
column names in lowercase. In this case, the user can set the “column.mapping” 
table property to map 

[jira] [Comment Edited] (IMPALA-12754) Update Impala document to cover external jdbc table

2024-05-01 Thread Wenzhe Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842441#comment-17842441
 ] 

Wenzhe Zhou edited comment on IMPALA-12754 at 5/1/24 6:02 PM:
--

*Impala SQL syntax to create external JDBC table*

When creating an external JDBC table, the user needs to specify the minimum 
information: database type, jdbc url, driver class, driver file location, user 
name and password for querying database, table name. Here are two samples to 
create JDBC tables for tables on Postgres server and another Impala cluster 
respectively.
{code:java}
*CREATE EXTERNAL TABLE* alltypes_jdbc (
  id INT,
  bool_col BOOLEAN,
  tinyint_col TINYINT,
  smallint_col SMALLINT,
  int_col INT,
  bigint_col BIGINT,
  float_col FLOAT,
  double_col DOUBLE,
  date_col DATE,
  string_col STRING,
  timestamp_col TIMESTAMP)
*STORED BY JDBC*
*TBLPROPERTIES* (
  "database.type"="POSTGRES",
  "jdbc.url"="jdbc:postgresql://10.96.132.138:5432/functional",
  "jdbc.driver"="org.postgresql.Driver",
  "driver.url"="/test-warehouse/data-sources/jdbc-drivers/postgresql-jdbc.jar",
  "dbcp.username"="hiveuser",
  "dbcp.password"="password",
  "table"="alltypes");

*CREATE EXTERNAL TABLE* alltypes_jdbc (
  id INT,
  bool_col BOOLEAN,
  tinyint_col TINYINT,
  smallint_col SMALLINT,
  int_col INT,
  bigint_col BIGINT,
  float_col FLOAT,
  double_col DOUBLE,
  date_col DATE,
  string_col STRING,
  timestamp_col TIMESTAMP)
*STORED BY JDBC*
*TBLPROPERTIES* (
  "database.type"="IMPALA",
  "jdbc.url"="jdbc:impala://10.96.132.138:21050/functional",
  "jdbc.auth"="AuthMech=3",
  "jdbc.properties"="MEM_LIMIT=10, MAX_ERRORS = 1",
  "jdbc.driver"="com.cloudera.impala.jdbc.Driver",
  
"driver.url"="hdfs://test-warehouse/data-sources/jdbc-drivers/ImpalaJDBC42.jar",
  "dbcp.username"="hiveuser",
  
"dbcp.password.keystore"="jceks://hdfs/test-warehouse/data-sources/test.jceks",
  "dbcp.password.key"="hiveuser",
  "table"="alltypes");
{code}

*Supported data types*
The column data type for an external JDBC table can be:
Numeric data type: boolean, tinyint, smallint, int, bigint, float, double
Decimal with scale and precision
String type: string
Date
Timestamp

Note that following data types are not supported: char, varchar, and binary. 
Complex data type: struct, map, array and nested type are not supported.

*Table Properties specified for external JDBC table*
In the create external JDBC table statement, user is required to specify the 
following table properties:
database.type: IMPALA, MYSQL, POSTGRES
jdbc.url: jdbc connection string, including the database type, IP address, port 
number, and database name. For example, 
“jdbc:impala://10.96.132.138:21050/functional”.
jdbc.driver: class name of jdbc driver
driver.url: driver URL for downloading the Jar file package that is used to 
access the external database.
table: name of the external table to be mapped in Impala.

Besides the above required properties, user can also specify optional 
parameters to use different authentication methods, or to allow case sensitive 
columns names in remote tables, or to specify additional database properties, 
etc:
jdbc.auth: authentication mechanisms of JDBC driver. It's used for 
Impala-Impala federation.
dbcp.username: jdbc user name
dbcp.password: jdbc password in clear text, this parameter is strongly 
discouraged in production environments. The recommended way is to store it in a 
keystore. See section  “securing password” for details.
dbcp.password.key:  key of the keystore
dbcp.password.keystore: keystore URI.
jdbc.properties: additional properties applied to database engine, like Impala 
Query options. Properties are specified as comma-delimited key=value string. 
jdbc.fetch.size: number of rows to fetch in a batch
column.mappting: Mapping of column names between external table and Impala JDBC 
table. See section "Support case-sensitive table/column names" for details. 

*Securing password*
To mitigate the password leak, the value of “dbcp.password” table property is 
masked in the output of commands “SHOW CREATE TABLE table-name” and “DESCRIBE 
FORMATTED | EXTENDED table-name”.
In production deployment, it is strongly discouraged to save the jdbc password 
in clear text in table property "dbcp.password". Instead, user can store 
password in a Java keystore file on HDFS by using the command like below to 
create a keystore file:
  hadoop credential create host1.password -provider 
jceks://hdfs/user/test.jceks -v passwd1
Then specify “dbcp.password.key” and “dbcp.password.keystore” instead of 
“dbcp.password” in the create table statement.

*Support case-sensitive table/column names*

Column names of remote tables may be different from the JDBC table schema. For 
example, Postgres allows case-sensitive column names, but Impala always saves 
column names in lowercase. In this case, the user can set the “column.mapping” 
table 

[jira] [Comment Edited] (IMPALA-12754) Update Impala document to cover external jdbc table

2024-05-01 Thread Wenzhe Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842441#comment-17842441
 ] 

Wenzhe Zhou edited comment on IMPALA-12754 at 5/1/24 6:00 PM:
--

*Impala SQL syntax to create external JDBC table*

When creating an external JDBC table, the user needs to specify the minimum 
information: database type, jdbc url, driver class, driver file location, user 
name and password for querying database, table name. Here are two samples to 
create JDBC tables for tables on Postgres server and another Impala cluster 
respectively.

*CREATE EXTERNAL TABLE* alltypes_jdbc (
  id INT,
  bool_col BOOLEAN,
  tinyint_col TINYINT,
  smallint_col SMALLINT,
  int_col INT,
  bigint_col BIGINT,
  float_col FLOAT,
  double_col DOUBLE,
  date_col DATE,
  string_col STRING,
  timestamp_col TIMESTAMP)
*STORED BY JDBC*
*TBLPROPERTIES* (
  "database.type"="POSTGRES",
  "jdbc.url"="jdbc:postgresql://10.96.132.138:5432/functional",
  "jdbc.driver"="org.postgresql.Driver",
  "driver.url"="/test-warehouse/data-sources/jdbc-drivers/postgresql-jdbc.jar",
  "dbcp.username"="hiveuser",
  "dbcp.password"="password",
  "table"="alltypes");

*CREATE EXTERNAL TABLE* alltypes_jdbc (
  id INT,
  bool_col BOOLEAN,
  tinyint_col TINYINT,
  smallint_col SMALLINT,
  int_col INT,
  bigint_col BIGINT,
  float_col FLOAT,
  double_col DOUBLE,
  date_col DATE,
  string_col STRING,
  timestamp_col TIMESTAMP)
*STORED BY JDBC*
*TBLPROPERTIES* (
  "database.type"="IMPALA",
  "jdbc.url"="jdbc:impala://10.96.132.138:21050/functional",
  "jdbc.auth"="AuthMech=3",
  "jdbc.properties"="MEM_LIMIT=10, MAX_ERRORS = 1",
  "jdbc.driver"="com.cloudera.impala.jdbc.Driver",
  
"driver.url"="hdfs://test-warehouse/data-sources/jdbc-drivers/ImpalaJDBC42.jar",
  "dbcp.username"="hiveuser",
  
"dbcp.password.keystore"="jceks://hdfs/test-warehouse/data-sources/test.jceks",
  "dbcp.password.key"="hiveuser",
  "table"="alltypes");

*Supported data types*
The column data type for an external JDBC table can be:
Numeric data type: boolean, tinyint, smallint, int, bigint, float, double
Decimal with scale and precision
String type: string
Date
Timestamp

Note that following data types are not supported: char, varchar, and binary. 
Complex data type: struct, map, array and nested type are not supported.

*Table Properties specified for external JDBC table*
In the create external JDBC table statement, user is required to specify the 
following table properties:
database.type: IMPALA, MYSQL, POSTGRES
jdbc.url: jdbc connection string, including the database type, IP address, port 
number, and database name. For example, 
“jdbc:impala://10.96.132.138:21050/functional”.
jdbc.driver: class name of jdbc driver
driver.url: driver URL for downloading the Jar file package that is used to 
access the external database.
table: name of the external table to be mapped in Impala.

Besides the above required properties, user can also specify optional 
parameters to use different authentication methods, or to allow case sensitive 
columns names in remote tables, or to specify additional database properties, 
etc:
jdbc.auth: authentication mechanisms of JDBC driver. It's used for 
Impala-Impala federation.
dbcp.username: jdbc user name
dbcp.password: jdbc password in clear text, this parameter is strongly 
discouraged in production environments. The recommended way is to store it in a 
keystore. See section  “securing password” for details.
dbcp.password.key:  key of the keystore
dbcp.password.keystore: keystore URI.
jdbc.properties: additional properties applied to database engine, like Impala 
Query options. Properties are specified as comma-delimited key=value string. 
jdbc.fetch.size: number of rows to fetch in a batch
column.mappting: Mapping of column names between external table and Impala JDBC 
table. See section "Support case-sensitive table/column names" for details. 

*Securing password*
To mitigate the password leak, the value of “dbcp.password” table property is 
masked in the output of commands “SHOW CREATE TABLE table-name” and “DESCRIBE 
FORMATTED | EXTENDED table-name”.
In production deployment, it is strongly discouraged to save the jdbc password 
in clear text in table property "dbcp.password". Instead, user can store 
password in a Java keystore file on HDFS by using the command like below to 
create a keystore file:
  hadoop credential create host1.password -provider 
jceks://hdfs/user/test.jceks -v passwd1
Then specify “dbcp.password.key” and “dbcp.password.keystore” instead of 
“dbcp.password” in the create table statement.

*Support case-sensitive table/column names*

Column names of remote tables may be different from the JDBC table schema. For 
example, Postgres allows case-sensitive column names, but Impala always saves 
column names in lowercase. In this case, the user can set the “column.mapping” 
table property to map 

[jira] [Comment Edited] (IMPALA-12754) Update Impala document to cover external jdbc table

2024-05-01 Thread Wenzhe Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842441#comment-17842441
 ] 

Wenzhe Zhou edited comment on IMPALA-12754 at 5/1/24 5:59 PM:
--

*Impala SQL syntax to create external JDBC table*

When creating an external JDBC table, the user needs to specify the minimum 
information: database type, jdbc url, driver class, driver file location, user 
name and password for querying database, table name. Here are two samples to 
create JDBC tables for tables on Postgres server and another Impala cluster 
respectively.

*CREATE EXTERNAL TABLE* alltypes_jdbc (
  id INT,
  bool_col BOOLEAN,
  tinyint_col TINYINT,
  smallint_col SMALLINT,
  int_col INT,
  bigint_col BIGINT,
  float_col FLOAT,
  double_col DOUBLE,
  date_col DATE,
  string_col STRING,
  timestamp_col TIMESTAMP)
*STORED BY JDBC*
*TBLPROPERTIES* (
  "database.type"="POSTGRES",
  "jdbc.url"="jdbc:postgresql://10.96.132.138:5432/functional",
  "jdbc.driver"="org.postgresql.Driver",
  "driver.url"="/test-warehouse/data-sources/jdbc-drivers/postgresql-jdbc.jar",
  "dbcp.username"="hiveuser",
  "dbcp.password"="password",
  "table"="alltypes");

*CREATE EXTERNAL TABLE* alltypes_jdbc (
  id INT,
  bool_col BOOLEAN,
  tinyint_col TINYINT,
  smallint_col SMALLINT,
  int_col INT,
  bigint_col BIGINT,
  float_col FLOAT,
  double_col DOUBLE,
  date_col DATE,
  string_col STRING,
  timestamp_col TIMESTAMP)
*STORED BY JDBC*
*TBLPROPERTIES* (
  "database.type"="IMPALA",
  "jdbc.url"="jdbc:impala://10.96.132.138:21050/functional",
  "jdbc.auth"="AuthMech=3",
  "jdbc.properties"="MEM_LIMIT=10, MAX_ERRORS = 1",
  "jdbc.driver"="com.cloudera.impala.jdbc.Driver",
  
"driver.url"="hdfs://test-warehouse/data-sources/jdbc-drivers/ImpalaJDBC42.jar",
  "dbcp.username"="hiveuser",
  
"dbcp.password.keystore"="jceks://hdfs/test-warehouse/data-sources/test.jceks",
  "dbcp.password.key"="hiveuser",
  "table"="alltypes");

*Supported data types*
The column data type for an external JDBC table can be:
Numeric data type: boolean, tinyint, smallint, int, bigint, float, double
Decimal with scale and precision
String type: string
Date
Timestamp

Note that following data types are not supported: char, varchar, and binary. 
Complex data type: struct, map, array and nested type are not supported.

*Table Properties specified for external JDBC table*
In the create external JDBC table statement, user is required to specify the 
following table properties:
database.type: IMPALA, MYSQL, POSTGRES
jdbc.url: jdbc connection string, including the database type, IP address, port 
number, and database name. For example, 
“jdbc:impala://10.96.132.138:21050/functional”.
jdbc.driver: class name of jdbc driver
driver.url: driver URL for downloading the Jar file package that is used to 
access the external database.
[jira] [Comment Edited] (IMPALA-12754) Update Impala document to cover external jdbc table

2024-05-01 Thread Wenzhe Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842441#comment-17842441
 ] 

Wenzhe Zhou edited comment on IMPALA-12754 at 5/1/24 5:55 PM:
--

*Impala SQL syntax to create external JDBC table*

When creating an external JDBC table, the user needs to specify the minimum 
information: database type, JDBC URL, driver class, driver file location, user 
name and password for querying the database, and table name. Here are two 
samples that create JDBC tables for tables on a Postgres server and on another 
Impala cluster, respectively.

*CREATE EXTERNAL TABLE* alltypes_jdbc (
  id INT,
  bool_col BOOLEAN,
  tinyint_col TINYINT,
  smallint_col SMALLINT,
  int_col INT,
  bigint_col BIGINT,
  float_col FLOAT,
  double_col DOUBLE,
  date_col DATE,
  string_col STRING,
  timestamp_col TIMESTAMP)
*STORED BY JDBC*
*TBLPROPERTIES* (
  "database.type"="POSTGRES",
  "jdbc.url"="jdbc:postgresql://10.96.132.138:5432/functional",
  "jdbc.driver"="org.postgresql.Driver",
  "driver.url"="/test-warehouse/data-sources/jdbc-drivers/postgresql-jdbc.jar",
  "dbcp.username"="hiveuser",
  "dbcp.password"="password",
  "table"="alltypes");

*CREATE EXTERNAL TABLE* alltypes_jdbc (
  id INT,
  bool_col BOOLEAN,
  tinyint_col TINYINT,
  smallint_col SMALLINT,
  int_col INT,
  bigint_col BIGINT,
  float_col FLOAT,
  double_col DOUBLE,
  date_col DATE,
  string_col STRING,
  timestamp_col TIMESTAMP)
*STORED BY JDBC*
*TBLPROPERTIES* (
  "database.type"="IMPALA",
  "jdbc.url"="jdbc:impala://10.96.132.138:21050/functional",
  "jdbc.auth"="AuthMech=3",
  "jdbc.properties"="MEM_LIMIT=10, MAX_ERRORS = 1",
  "jdbc.driver"="com.cloudera.impala.jdbc.Driver",
  "driver.url"="hdfs://test-warehouse/data-sources/jdbc-drivers/ImpalaJDBC42.jar",
  "dbcp.username"="hiveuser",
  "dbcp.password.keystore"="jceks://hdfs/test-warehouse/data-sources/test.jceks",
  "dbcp.password.key"="hiveuser",
  "table"="alltypes");

*Supported data types*
The column data type for an external JDBC table can be:
Numeric data types: boolean, tinyint, smallint, int, bigint, float, double
Decimal with scale and precision
String type: string
Date
Timestamp

Note that the following data types are not supported: char, varchar, and binary. 
Complex data types (struct, map, array) and nested types are also not supported.

*Table Properties specified for external JDBC table*
In the CREATE EXTERNAL TABLE statement for a JDBC table, the user is required 
to specify the following table properties:
database.type: IMPALA, MYSQL, or POSTGRES
jdbc.url: JDBC connection string, including the database type, IP address, port 
number, and database name. For example, 
“jdbc:impala://10.96.132.138:21050/functional”.
jdbc.driver: class name of the JDBC driver
driver.url: URL for downloading the jar file package that is used to access the 
external database.
table: name of the external table to be mapped in Impala.

Besides the required properties above, the user can also specify optional 
parameters to use different authentication methods, allow case-sensitive 
column names in remote tables, specify additional database properties, etc.:
jdbc.auth: authentication mechanism of the JDBC driver. It's used for 
Impala-Impala federation.
dbcp.username: JDBC user name
dbcp.password: JDBC password in clear text. This parameter is strongly 
discouraged in production environments; the recommended way is to store the 
password in a keystore. See section “Securing password” for details.
dbcp.password.key: key of the keystore
dbcp.password.keystore: keystore URI
jdbc.properties: additional properties applied to the database engine, like 
Impala query options. Properties are specified as a comma-delimited key=value 
string.
jdbc.fetch.size: number of rows to fetch in a batch
column.mapping: mapping of column names between the external table and the 
Impala JDBC table. See section "Support case-sensitive table/column names" for 
details.

*Securing password*
To mitigate password leakage, the value of the “dbcp.password” table property 
is masked in the output of the commands “SHOW CREATE TABLE table-name” and 
“DESCRIBE FORMATTED | EXTENDED table-name”.
In production deployments, it is strongly discouraged to save the JDBC password 
in clear text in the table property "dbcp.password". Instead, the user can 
store the password in a Java keystore file on HDFS, created with a command like:
  hadoop credential create host1.password -provider 
jceks://hdfs/user/test.jceks -v passwd1
Then specify “dbcp.password.key” and “dbcp.password.keystore” instead of 
“dbcp.password” in the CREATE TABLE statement.

*Support case-sensitive table/column names*

Column names of remote tables may be different from the JDBC table schema. For 
example, Postgres allows case-sensitive column names, but Impala always saves 
column names in lowercase. In this case, the user can set the “column.mapping” 
table property to map 

[jira] [Updated] (IMPALA-12754) Update Impala document to cover external jdbc table

2024-05-01 Thread Wenzhe Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenzhe Zhou updated IMPALA-12754:
-
Description: 
We need to document  the SQL syntax to create jdbc table and alter jdbc table, 
including the table properties to be set for jdbc and DBCP (Database connection 
pool).
 

  was:
Impala external data source is undocumented in upstream. We need to document 
the external data source APIs, SQL syntax to create jdbc table, including the 
properties to be set for jdbc and DBCP (Database connection pool).
 


> Update Impala document to cover external jdbc table
> ---
>
> Key: IMPALA-12754
> URL: https://issues.apache.org/jira/browse/IMPALA-12754
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Wenzhe Zhou
>Assignee: gaurav singh
>Priority: Major
>
> We need to document  the SQL syntax to create jdbc table and alter jdbc 
> table, including the table properties to be set for jdbc and DBCP (Database 
> connection pool).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-12118) Consider using pytorch cpuinfo library for CPU detection / logging

2024-05-01 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith reassigned IMPALA-12118:
--

Assignee: Michael Smith

> Consider using pytorch cpuinfo library for CPU detection / logging
> --
>
> Key: IMPALA-12118
> URL: https://issues.apache.org/jira/browse/IMPALA-12118
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Major
>  Labels: arm
>
> The code in be/src/util/cpu-info.h/.cc provides information about the CPU 
> that we run on, along with NUMA configurations, cache sizes, etc. 
> Pytorch's cpuinfo package seems to cover most of what we want to do across a 
> much broader set of processors / architectures. It has a compatible license. 
> See [https://github.com/pytorch/cpuinfo]
> We should see if this is useful for our cpu detection code.






[jira] [Resolved] (IMPALA-12521) gerrit-verify-dryrun should include an ARM build

2024-05-01 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-12521.

Resolution: Fixed

> gerrit-verify-dryrun should include an ARM build
> 
>
> Key: IMPALA-12521
> URL: https://issues.apache.org/jira/browse/IMPALA-12521
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Reporter: Michael Smith
>Assignee: Laszlo Gaal
>Priority: Major
>
> Impala has some ARM-specific code that has occasionally been broken. We 
> should include a build-only job as part of the set of jobs run by 
> gerrit-verify-dryrun to avoid regressions.






[jira] [Updated] (IMPALA-12118) Consider using pytorch cpuinfo library for CPU detection / logging

2024-05-01 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-12118:
---
Epic Link:   (was: IMPALA-12353)

> Consider using pytorch cpuinfo library for CPU detection / logging
> --
>
> Key: IMPALA-12118
> URL: https://issues.apache.org/jira/browse/IMPALA-12118
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Priority: Major
>  Labels: arm
>
> The code in be/src/util/cpu-info.h/.cc provides information about the CPU 
> that we run on, along with NUMA configurations, cache sizes, etc. 
> Pytorch's cpuinfo package seems to cover most of what we want to do across a 
> much broader set of processors / architectures. It has a compatible license. 
> See [https://github.com/pytorch/cpuinfo]
> We should see if this is useful for our cpu detection code.






[jira] [Assigned] (IMPALA-12869) Add flag for bin/start-impala-cluster.py to start with local catalog mode

2024-05-01 Thread Saurabh Katiyal (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saurabh Katiyal reassigned IMPALA-12869:


Assignee: Anshula Jain

> Add flag for bin/start-impala-cluster.py to start with local catalog mode
> -
>
> Key: IMPALA-12869
> URL: https://issues.apache.org/jira/browse/IMPALA-12869
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Infrastructure
>Reporter: Quanlong Huang
>Assignee: Anshula Jain
>Priority: Major
>  Labels: newbie, ramp-up
>
> It'd be convenient to start an Impala cluster in local catalog mode with a 
> command like
> {code:bash}
> bin/start-impala-cluster.py --use_local_catalog {code}
> Currently, we need to use a longer command that is more prone to typos:
> {code:bash}
> bin/start-impala-cluster.py --catalogd_args=--catalog_topic_mode=minimal 
> --impalad_args=--use_local_catalog{code}






[jira] [Assigned] (IMPALA-13033) impala-profile-tool should support parsing thrift profiles downloaded from WebUI

2024-05-01 Thread Saurabh Katiyal (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saurabh Katiyal reassigned IMPALA-13033:


Assignee: Anshula Jain

> impala-profile-tool should support parsing thrift profiles downloaded from 
> WebUI
> 
>
> Key: IMPALA-13033
> URL: https://issues.apache.org/jira/browse/IMPALA-13033
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Quanlong Huang
>Assignee: Anshula Jain
>Priority: Major
>  Labels: newbie, ramp-up
>
> In the coordinator WebUI, users can download query profiles in 
> text/json/thrift formats. The thrift profile is the same as one line in the 
> profile log without the timestamp and query id at the beginning.
> impala-profile-tool fails to parse such a file. It should retry parsing the 
> whole line as the encoded profile. Current code snippet:
> {code:cpp}
> // Parse out fields from the line.
> istringstream liness(line);
> int64_t timestamp;
> string query_id, encoded_profile;
> liness >> timestamp >> query_id >> encoded_profile;
> if (liness.fail()) {
>   cerr << "Error parsing line " << lineno << ": '" << line << "'\n";
>   ++errors;
>   continue;
> }{code}
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/util/impala-profile-tool.cc#L109


