[jira] [Created] (IMPALA-13077) Equality predicate on partition column and uncorrelated subquery doesn't reduce the cardinality estimate
Quanlong Huang created IMPALA-13077:
---
Summary: Equality predicate on partition column and uncorrelated subquery doesn't reduce the cardinality estimate
Key: IMPALA-13077
URL: https://issues.apache.org/jira/browse/IMPALA-13077
Project: IMPALA
Issue Type: Bug
Components: Frontend
Reporter: Quanlong Huang

Let's say 'part_tbl' is a partitioned table whose partition key is 'part_key', and consider the following query:
{code:sql}
select xxx from part_tbl where part_key = (select ... from dim_tbl);
{code}
Its query plan is a JoinNode over two ScanNodes. When estimating the cardinality of the JoinNode, the planner is not aware that 'part_key' is the partition column, so it misses that the JoinNode's cardinality should not be larger than the max row count across partitions. The recent work in IMPALA-12018 (Consider runtime filter for cardinality reduction) helps in some cases, since there are runtime filters on the partition column, but there are still cases where we overestimate the cardinality. For instance, 'ss_sold_date_sk' is the only partition key of tpcds.store_sales.
The following query
{code:sql}
select count(*) from tpcds.store_sales
where ss_sold_date_sk = (select min(d_date_sk) + 1000 from tpcds.date_dim);
{code}
has this query plan:
{noformat}
Max Per-Host Resource Reservation: Memory=18.94MB Threads=6
Per-Host Resource Estimates: Memory=243MB

PLAN-ROOT SINK
|
09:AGGREGATE [FINALIZE]
|  output: count:merge(*)
|  row-size=8B cardinality=1
|
08:EXCHANGE [UNPARTITIONED]
|
04:AGGREGATE
|  output: count(*)
|  row-size=8B cardinality=1
|
03:HASH JOIN [LEFT SEMI JOIN, BROADCAST]
|  hash predicates: ss_sold_date_sk = min(d_date_sk) + 1000
|  runtime filters: RF000 <- min(d_date_sk) + 1000
|  row-size=4B cardinality=2.88M   <-- Should be max(numRows) across partitions
|
|--07:EXCHANGE [BROADCAST]
|  |
|  06:AGGREGATE [FINALIZE]
|  |  output: min:merge(d_date_sk)
|  |  row-size=4B cardinality=1
|  |
|  05:EXCHANGE [UNPARTITIONED]
|  |
|  02:AGGREGATE
|  |  output: min(d_date_sk)
|  |  row-size=4B cardinality=1
|  |
|  01:SCAN HDFS [tpcds.date_dim]
|     HDFS partitions=1/1 files=1 size=9.84MB
|     row-size=4B cardinality=73.05K
|
00:SCAN HDFS [tpcds.store_sales]
   HDFS partitions=1824/1824 files=1824 size=346.60MB
   runtime filters: RF000 -> ss_sold_date_sk
   row-size=4B cardinality=2.88M
{noformat}
CC [~boroknagyz], [~rizaon]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
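The direction of the fix can be sketched in a few lines (illustrative Python, not Impala's actual Java planner code; the function name and inputs are hypothetical):

```python
# Illustrative sketch: when an equality predicate on the partition column is
# bound to a single-row uncorrelated subquery result, at most one partition
# can survive the predicate, so the join output cannot exceed the row count
# of the largest partition.
def capped_join_cardinality(join_estimate, partition_row_counts,
                            eq_pred_on_partition_col):
    """Cap a join cardinality estimate at max(numRows) across partitions."""
    if eq_pred_on_partition_col and partition_row_counts:
        return min(join_estimate, max(partition_row_counts))
    return join_estimate

# With hypothetical per-partition row counts like these, the 2.88M estimate
# for node 03 above would drop to the largest single partition's row count.
print(capped_join_cardinality(2_880_000, [1500, 3500, 2750], True))  # → 3500
```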
[jira] [Assigned] (IMPALA-13076) Add pstack and jstack to Impala docker images
[ https://issues.apache.org/jira/browse/IMPALA-13076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Sherman reassigned IMPALA-13076:
---
Assignee: Andrew Sherman

> Add pstack and jstack to Impala docker images
> -
>
> Key: IMPALA-13076
> URL: https://issues.apache.org/jira/browse/IMPALA-13076
> Project: IMPALA
> Issue Type: Bug
> Affects Versions: Impala 4.4.0
> Reporter: Andrew Sherman
> Assignee: Andrew Sherman
> Priority: Major
>
> When the Impala docker images are deployed in production environments, it can
> be hard to add debugging tools at runtime. Two of the most useful diagnosis
> tools are jstack and pstack, which can be used to find Java and native stack
> traces.
[jira] [Created] (IMPALA-13076) Add pstack and jstack to Impala docker images
Andrew Sherman created IMPALA-13076:
---
Summary: Add pstack and jstack to Impala docker images
Key: IMPALA-13076
URL: https://issues.apache.org/jira/browse/IMPALA-13076
Project: IMPALA
Issue Type: Bug
Affects Versions: Impala 4.4.0
Reporter: Andrew Sherman

When the Impala docker images are deployed in production environments, it can be hard to add debugging tools at runtime. Two of the most useful diagnosis tools are jstack and pstack, which can be used to find Java and native stack traces.
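As a rough illustration of what this might look like, a Dockerfile fragment along these lines could bake the tools into the image (the package names are assumptions for a Debian/Ubuntu base, not the actual Impala build change; on Red Hat bases, pstack is typically provided by the gdb package):

```dockerfile
# Hypothetical sketch: install gdb (whose "thread apply all bt" is the
# pstack equivalent) and a JDK that ships jstack, so both native and Java
# stacks can be collected from a running container.
RUN apt-get update && \
    apt-get install -y --no-install-recommends gdb openjdk-11-jdk-headless && \
    rm -rf /var/lib/apt/lists/*
```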
[jira] [Work started] (IMPALA-12652) Limit Length of Completed Queries Insert DML
[ https://issues.apache.org/jira/browse/IMPALA-12652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on IMPALA-12652 started by Jason Fehr.
---

> Limit Length of Completed Queries Insert DML
>
> Key: IMPALA-12652
> URL: https://issues.apache.org/jira/browse/IMPALA-12652
> Project: IMPALA
> Issue Type: Improvement
> Components: be
> Reporter: Jason Fehr
> Assignee: Jason Fehr
> Priority: Major
> Labels: backend, workload-management
>
> Implement a coordinator startup flag that limits the max length (number of
> characters) of the insert DML statement that inserts records into the
> impala_query_log table.
> The purpose of this flag is to ensure that workload management does not
> generate an insert DML statement that exceeds Impala's max length for a SQL
> statement (approximately 16 megabytes or 16 million characters).
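The intended behavior can be sketched as follows (illustrative Python, not the actual backend implementation; the table name and the ~16M-character cap come from the description above, everything else is assumed):

```python
MAX_SQL_LEN = 16_000_000  # approximate Impala max statement length, in chars

def build_query_log_insert(rows, max_len=MAX_SQL_LEN):
    """Append row tuples to the INSERT DML until adding one more would push
    the statement past max_len; remaining rows are left for a later flush."""
    stmt = "INSERT INTO sys.impala_query_log VALUES "
    pieces, length = [], len(stmt)
    for row in rows:
        piece = "(" + ",".join(repr(v) for v in row) + ")"
        added = len(piece) + (1 if pieces else 0)  # +1 for separator comma
        if length + added > max_len:
            break  # stop at a row boundary rather than exceed the limit
        pieces.append(piece)
        length += added
    return stmt + ",".join(pieces)
```

With the default limit all queued records fit in one statement; with a small limit the DML is truncated at a row boundary instead of exceeding the cap.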
[jira] [Commented] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments
[ https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846065#comment-17846065 ]

Riza Suminto commented on IMPALA-13075:
---

Comparing [^Mem_Limit_1G_Failed.txt] and [^Batch_size_0_Success.txt], I see that both have "Per-Host Resource Estimates" beyond the 1GB MEM_LIMIT, and per-node estimates that are barely under MEM_LIMIT for some nodes. That being said, I also see an abnormally high "PeakMemoryUsage" for HASH_JOIN_NODE (id=8) at cdp-datahub-prod-worker4.cdp-prod.c8g3-pfxs.cloudera.site:27000:

{code:java}
Fragment F07
  - InactiveTotalTime: 0ns (0)
  - TotalTime: 0ns (0)
  Instance eb4b5cd4eb0fa41e:80d577e00053 (host=cdp-datahub-prod-worker4.cdp-prod.c8g3-pfxs.cloudera.site:27000)
  ...
  HASH_JOIN_NODE (id=8)
    ExecOption: Join Build-Side Prepared Asynchronously
    - InactiveTotalTime: 0ns (0)
    - LocalTime: 2.7m (164763538054)
    - PeakMemoryUsage: 7.6 GiB (8132571136)
    - ProbeRows: 65,536 (65536)
    - ProbeRowsPartitioned: 0 (0)
    - ProbeTime: 2.7m (164759538041)
    - RowsReturned: 0 (0)
    - RowsReturnedRate: 0 per second (0)
    - TotalTime: 2.8m (166909545061)
{code}

This high memory usage seems linked to IMPALA-3286, where HashTableCtx::ExprValuesCache::capacity_ is pushed up by the BATCH_SIZE number:
[https://github.com/apache/impala/blob/0d215da8d4e3f93ad3c1cd72aa801fbcb9464fb0/be/src/exec/hash-table.cc#L369-L373]
Note also that the failed query does not have "Probe Side Codegen Enabled" as the successful query does.
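The scaling effect is easy to see with back-of-the-envelope arithmetic (the per-value byte sizes below are assumptions for illustration, not values taken from hash-table.cc; the point is only the linear dependence on BATCH_SIZE):

```python
# ExprValuesCache-style structures size several per-row arrays by the
# configured batch size, so their footprint grows linearly with BATCH_SIZE.
def expr_values_cache_bytes(batch_size, num_join_exprs, bytes_per_value=16):
    # assumed per cached row: one value slot per join expr, a 4-byte hash,
    # and ~2 bytes of null/match bookkeeping
    per_row = num_join_exprs * bytes_per_value + 4 + 2
    return batch_size * per_row

default = expr_values_cache_bytes(1024, 4)   # default BATCH_SIZE
maxed = expr_values_cache_bytes(65536, 4)    # BATCH_SIZE at the 65536 max
print(maxed // default)  # → 64: the cache is 64x larger at the max setting
```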
[jira] [Commented] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments
[ https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846040#comment-17846040 ]

Ezra Zerihun commented on IMPALA-13075:
---

[^Failed (1).txt] [^Success (1).txt] [^Failed_Cognos_pool.txt] [^Mem_Limit_1G_Failed.txt] [^Success_Tableau_Pool.txt] [^Batch_size_0_Success.txt]

The main comparison, as made in the description, is [^Mem_Limit_1G_Failed.txt] vs [^Batch_size_0_Success.txt].

> Setting very high BATCH_SIZE can blow up memory usage of fragments
> -
>
> Key: IMPALA-13075
> URL: https://issues.apache.org/jira/browse/IMPALA-13075
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 4.0.0
> Reporter: Ezra Zerihun
> Priority: Major
> Attachments: Batch_size_0_Success.txt, Failed (1).txt, Failed_Cognos_pool.txt, Mem_Limit_1G_Failed.txt, Success (1).txt, Success_Tableau_Pool.txt
>
> In Impala 4.0, setting BATCH_SIZE very high, at or near the max limit of 65536, can cause some fragments' memory usage to spike far past the query's defined MEM_LIMIT or the pool's Maximum Query Memory Limit with Clamp on. So even though MEM_LIMIT is set to a reasonable value, the query can still fail with out of memory and a huge amount of memory used on a fragment. Reducing BATCH_SIZE to a reasonable amount, or back to the default, allows the query to run without issue and use a reasonable amount of memory within the query's MEM_LIMIT or the pool's Maximum Query Memory Limit.
>
> 1) set BATCH_SIZE=65536; set MEM_LIMIT=1g;
>
> {code:java}
> Query State: EXCEPTION
> Impala Query State: ERROR
> Query Status: Memory limit exceeded: Error occurred on backend ...:27000 by fragment ...
> Memory left in process limit: 145.53 GB
> Memory left in query limit: -6.80 GB
> Query(...): memory limit exceeded. Limit=1.00 GB Reservation=86.44 MB ReservationLimit=819.20 MB OtherMemory=7.71 GB Total=7.80 GB Peak=7.84 GB
>   Unclaimed reservations: Reservation=8.50 MB OtherMemory=0 Total=8.50 MB Peak=56.44 MB
>   Runtime Filter Bank: Reservation=4.00 MB ReservationLimit=4.00 MB OtherMemory=0 Total=4.00 MB Peak=4.00 MB
>   Fragment ...: Reservation=1.94 MB OtherMemory=7.59 GB Total=7.59 GB Peak=7.63 GB
>     HASH_JOIN_NODE (id=8): Reservation=1.94 MB OtherMemory=7.57 GB Total=7.57 GB Peak=7.57 GB
>       Exprs: Total=7.57 GB Peak=7.57 GB
>       Hash Join Builder (join_node_id=8): Total=0 Peak=1.95 MB
> ...
> Query Options (set by configuration): BATCH_SIZE=65536,MEM_LIMIT=1073741824,CLIENT_IDENTIFIER=Impala Shell v4.0.0.7.2.16.0-287 (5ae3917) built on Mon Jan 9 21:23:59 UTC 2023,DEFAULT_FILE_FORMAT=PARQUET,...
> ...
> ExecSummary:
> ...
> 09:AGGREGATE     32  32  0.000ns   0.000ns  0       4.83M    36.31 MB  212.78 MB  STREAMING
> 08:HASH JOIN     32  32  5s149ms   2m44s    0       194.95M  7.57 GB   1.94 MB    RIGHT OUTER JOIN, PARTITIONED
> |--18:EXCHANGE   32  32  93.750us  1.000ms  10.46K  1.55K    1.65 MB   2.56 MB    HASH(...
> {code}
>
> 2) set BATCH_SIZE=0; set MEM_LIMIT=1g;
>
> {code:java}
> Query State: FINISHED
> Impala Query State: FINISHED
> ...
> Query Options (set by configuration and planner): MEM_LIMIT=1073741824,CLIENT_IDENTIFIER=Impala Shell v4.0.0.7.2.16.0-287 (5ae3917) built on Mon Jan 9 21:23:59 UTC 2023,DEFAULT_FILE_FORMAT=PARQUET,...
> ...
> ExecSummary:
> ...
> 09:AGGREGATE     32  32  593.748us  18.999ms  45      4.83M    34.06 MB   212.78 MB  STREAMING
> 08:HASH JOIN     32  32  10s873ms   5m47s     10.47K  194.95M  123.48 MB  1.94 MB    RIGHT OUTER JOIN, PARTITIONED
> |--18:EXCHANGE   32  32  0.000ns    0.000ns   10.46K  1.55K    344.00 KB  1.69 MB    HASH(...
> {code}
[jira] [Updated] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments
[ https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ezra Zerihun updated IMPALA-13075:
---
Attachment: Success_Tableau_Pool.txt
[jira] [Updated] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments
[ https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ezra Zerihun updated IMPALA-13075:
---
Attachment: Failed_Cognos_pool.txt
[jira] [Updated] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments
[ https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ezra Zerihun updated IMPALA-13075:
---
Attachment: Failed (1).txt
[jira] [Updated] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments
[ https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ezra Zerihun updated IMPALA-13075:
---
Attachment: Mem_Limit_1G_Failed.txt
[jira] [Updated] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments
[ https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ezra Zerihun updated IMPALA-13075:
---
Attachment: Batch_size_0_Success.txt
[jira] [Updated] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments
[ https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ezra Zerihun updated IMPALA-13075:
---
Attachment: Success (1).txt
[jira] [Commented] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments
[ https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846017#comment-17846017 ] Riza Suminto commented on IMPALA-13075: ---

Yes, BATCH_SIZE is the basic unit by which Impala estimates and allocates memory. [https://cwiki.apache.org/confluence/display/IMPALA/Impala+Row+Batches] Both the frontend planner and the backend executor respect this BATCH_SIZE value. If MEM_LIMIT is still above the minimum memory resource requirement, I would expect the query to still get admitted and run, even if it is not performant (i.e., it needs to spill rows to disk). Each fragment claims its minimum memory requirement right after it is instantiated. Please attach the full query profiles of both the good and the bad run so we can analyze this further.

> Setting very high BATCH_SIZE can blow up memory usage of fragments
> --
>
> Key: IMPALA-13075
> URL: https://issues.apache.org/jira/browse/IMPALA-13075
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 4.0.0
> Reporter: Ezra Zerihun
> Priority: Major

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments
[ https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846008#comment-17846008 ] Ezra Zerihun commented on IMPALA-13075: ---

This seems to be expected behavior, as a high BATCH_SIZE stores more rows in memory; even the documentation mentions the higher memory footprint. But I have query profiles from a customer who observed the behavior above and did not realize why queries failed with out-of-memory errors when their pool set BATCH_SIZE to the max limit of 65536. So I thought to file this improvement Jira in case the memory consumption of a high BATCH_SIZE can be improved. If not, feel free to close.

> Setting very high BATCH_SIZE can blow up memory usage of fragments
> --
>
> Key: IMPALA-13075
> URL: https://issues.apache.org/jira/browse/IMPALA-13075
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 4.0.0
> Reporter: Ezra Zerihun
> Priority: Major

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments
Ezra Zerihun created IMPALA-13075: -
Summary: Setting very high BATCH_SIZE can blow up memory usage of fragments
Key: IMPALA-13075
URL: https://issues.apache.org/jira/browse/IMPALA-13075
Project: IMPALA
Issue Type: Improvement
Components: Backend
Affects Versions: Impala 4.0.0
Reporter: Ezra Zerihun

In Impala 4.0, setting a very high BATCH_SIZE, at or near the max limit of 65536, can cause some fragments' memory usage to spike far past the query's MEM_LIMIT or the pool's Maximum Query Memory Limit (with Clamp on). So even when MEM_LIMIT is set to a reasonable value, the query can still fail with an out-of-memory error and a huge amount of memory used by a fragment. Reducing BATCH_SIZE to a reasonable value, or back to the default, lets the query run without issue and stay within the query's MEM_LIMIT or the pool's Maximum Query Memory Limit.

1) set BATCH_SIZE=65536; set MEM_LIMIT=1g;

{code:java}
Query State: EXCEPTION
Impala Query State: ERROR
Query Status: Memory limit exceeded: Error occurred on backend ...:27000 by fragment ... Memory left in process limit: 145.53 GB Memory left in query limit: -6.80 GB
Query(...): memory limit exceeded. Limit=1.00 GB Reservation=86.44 MB ReservationLimit=819.20 MB OtherMemory=7.71 GB Total=7.80 GB Peak=7.84 GB
  Unclaimed reservations: Reservation=8.50 MB OtherMemory=0 Total=8.50 MB Peak=56.44 MB
  Runtime Filter Bank: Reservation=4.00 MB ReservationLimit=4.00 MB OtherMemory=0 Total=4.00 MB Peak=4.00 MB
  Fragment ...: Reservation=1.94 MB OtherMemory=7.59 GB Total=7.59 GB Peak=7.63 GB
    HASH_JOIN_NODE (id=8): Reservation=1.94 MB OtherMemory=7.57 GB Total=7.57 GB Peak=7.57 GB
      Exprs: Total=7.57 GB Peak=7.57 GB
      Hash Join Builder (join_node_id=8): Total=0 Peak=1.95 MB
...
Query Options (set by configuration): BATCH_SIZE=65536,MEM_LIMIT=1073741824,CLIENT_IDENTIFIER=Impala Shell v4.0.0.7.2.16.0-287 (5ae3917) built on Mon Jan 9 21:23:59 UTC 2023,DEFAULT_FILE_FORMAT=PARQUET,...
...
ExecSummary:
...
09:AGGREGATE     32  32  0.000ns   0.000ns  0       4.83M    36.31 MB  212.78 MB  STREAMING
08:HASH JOIN     32  32  5s149ms   2m44s    0       194.95M  7.57 GB   1.94 MB    RIGHT OUTER JOIN, PARTITIONED
|--18:EXCHANGE   32  32  93.750us  1.000ms  10.46K  1.55K    1.65 MB   2.56 MB    HASH(...
{code}

2) set BATCH_SIZE=0; set MEM_LIMIT=1g;

{code:java}
Query State: FINISHED
Impala Query State: FINISHED
...
Query Options (set by configuration and planner): MEM_LIMIT=1073741824,CLIENT_IDENTIFIER=Impala Shell v4.0.0.7.2.16.0-287 (5ae3917) built on Mon Jan 9 21:23:59 UTC 2023,DEFAULT_FILE_FORMAT=PARQUET,...
...
ExecSummary:
...
09:AGGREGATE     32  32  593.748us  18.999ms  45      4.83M    34.06 MB   212.78 MB  STREAMING
08:HASH JOIN     32  32  10s873ms   5m47s     10.47K  194.95M  123.48 MB  1.94 MB    RIGHT OUTER JOIN, PARTITIONED
|--18:EXCHANGE   32  32  0.000ns    0.000ns   10.46K  1.55K    344.00 KB  1.69 MB    HASH(...
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
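[Editor's note] The hash join's peak memory above differs by roughly 64x between the two runs (7.57 GB vs 123.48 MB), which tracks the 64x ratio between BATCH_SIZE=65536 and the default of 1024. A back-of-the-envelope model of why per-fragment memory scales roughly linearly with BATCH_SIZE; all constants here (row width, number of buffered batches) are illustrative assumptions, not Impala's real accounting:

```python
# Rough sketch of why a larger BATCH_SIZE inflates per-fragment memory.
# The constants below are illustrative assumptions, not Impala's values.

def batch_memory_bytes(batch_size, row_width_bytes):
    """Memory for one row batch: rows per batch times bytes per row."""
    return batch_size * row_width_bytes

def fragment_estimate_bytes(batch_size, row_width_bytes, buffered_batches):
    """A fragment that buffers several batches (exchanges, joins, ...)."""
    return buffered_batches * batch_memory_bytes(batch_size, row_width_bytes)

default_mem = fragment_estimate_bytes(1024, 64, 8)   # default BATCH_SIZE=1024
maxed_mem   = fragment_estimate_bytes(65536, 64, 8)  # BATCH_SIZE=65536

# Same pipeline, 64x the memory footprint.
assert maxed_mem // default_mem == 64
```

Under this model the memory estimate grows in lockstep with BATCH_SIZE, which is consistent with the comment above that both the planner and the executor treat BATCH_SIZE as the basic unit of memory allocation.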
[jira] [Commented] (IMPALA-13061) Query Live table fails to load if default_transactional_type=insert_only set globally
[ https://issues.apache.org/jira/browse/IMPALA-13061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845906#comment-17845906 ] ASF subversion and git services commented on IMPALA-13061: --

Commit 338fedb44703646664e2e22c6e2f35336924db22 in impala's branch refs/heads/branch-4.4.0 from Michael Smith [ https://gitbox.apache.org/repos/asf?p=impala.git;h=338fedb44 ]

IMPALA-13061: Create query live as external table

Impala determines whether a managed table is transactional based on the 'transactional' table property. It assumes any managed table with transactional=true returns non-null getValidWriteIds. When 'default_transactional_type=insert_only' is set at startup (via default_query_options), impala_query_live is created as a managed table with transactional=true, but SystemTables don't implement getValidWriteIds and are not meant to be transactional. DataSourceTable has a similar problem, and when a JDBC table is created setJdbcDataSourceProperties sets transactional=false.

This patch uses CREATE EXTERNAL TABLE sys.impala_query_live so that it is not created as a managed table and 'transactional' is not set. That avoids creating a SystemTable that Impala can't read (it encounters an IllegalStateException).
Change-Id: Ie60a2bd03fabc63c85bcd9fa2489e9d47cd2aa65 Reviewed-on: http://gerrit.cloudera.org:8080/21401 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins (cherry picked from commit 1233ac3c579b5929866dba23debae63e5d2aae90) > Query Live table fails to load if default_transactional_type=insert_only set > globally > - > > Key: IMPALA-13061 > URL: https://issues.apache.org/jira/browse/IMPALA-13061 > Project: IMPALA > Issue Type: Bug >Reporter: Michael Smith >Assignee: Michael Smith >Priority: Critical > Fix For: Impala 4.5.0 > > > If transactional type defaults to insert_only for all queries via > {code} > --default_query_options=default_transactional_type=insert_only > {code} > the table definition for {{sys.impala_query_live}} is set to transactional, > which causes an exception in catalogd > {code} > I0506 22:07:42.808758 3972 jni-util.cc:302] > 4547b965aeebc5f0:8ba96c58] java.lang.IllegalStateException > at > com.google.common.base.Preconditions.checkState(Preconditions.java:496) > at org.apache.impala.catalog.Table.getPartialInfo(Table.java:851) > at > org.apache.impala.catalog.CatalogServiceCatalog.doGetPartialCatalogObject(CatalogServiceCatalog.java:3818) > at > org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3714) > at > org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3681) > at > org.apache.impala.service.JniCatalog.lambda$getPartialCatalogObject$10(JniCatalog.java:431) > at > org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90) > at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58) > at > org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89) > at > org.apache.impala.service.JniCatalogOp.execAndSerializeSilentStartAndFinish(JniCatalogOp.java:109) > at > org.apache.impala.service.JniCatalog.execAndSerializeSilentStartAndFinish(JniCatalog.java:253) > at > 
org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:430) > {code} > We need to override that setting while creating {{sys.impala_query_live}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
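[Editor's note] The invariant behind the fix above can be modeled in a few lines. This is an illustrative sketch, not Impala's catalog code; the function name and property strings are my own shorthand for the behavior the commit message describes:

```python
# Illustrative model: only a managed table carrying transactional=true is
# treated as transactional, so creating the system table as EXTERNAL keeps
# it out of the transactional code path entirely.

def is_transactional(table_type: str, props: dict) -> bool:
    """Sketch of the check described in the commit message (not real code)."""
    return table_type == "MANAGED_TABLE" and props.get("transactional") == "true"

# With default_transactional_type=insert_only, the managed variant of
# sys.impala_query_live silently picks up transactional=true, yet a
# SystemTable cannot supply the valid write IDs that property implies.
assert is_transactional("MANAGED_TABLE", {"transactional": "true"})

# The fix: CREATE EXTERNAL TABLE leaves 'transactional' unset.
assert not is_transactional("EXTERNAL_TABLE", {})
```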
[jira] [Commented] (IMPALA-13045) Fix intermittent failure in TestQueryLive.test_local_catalog
[ https://issues.apache.org/jira/browse/IMPALA-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845905#comment-17845905 ] ASF subversion and git services commented on IMPALA-13045: -- Commit 39233ba3d134b8c18f6f208a7d85c3fadf8ee371 in impala's branch refs/heads/branch-4.4.0 from Michael Smith [ https://gitbox.apache.org/repos/asf?p=impala.git;h=39233ba3d ] IMPALA-13045: Wait for impala_query_live to exist Waits for creation of 'sys.impala_query_live' in tests to ensure it has been registered with HMS. Change-Id: I5cc3fa3c43be7af9a5f097359a0d4f20d057a207 Reviewed-on: http://gerrit.cloudera.org:8080/21372 Reviewed-by: Impala Public Jenkins Tested-by: Michael Smith (cherry picked from commit b35aa819653dce062109e61d8f30171234dce5f9) > Fix intermittent failure in TestQueryLive.test_local_catalog > > > Key: IMPALA-13045 > URL: https://issues.apache.org/jira/browse/IMPALA-13045 > Project: IMPALA > Issue Type: Task >Reporter: Michael Smith >Assignee: Michael Smith >Priority: Major > Fix For: Impala 4.5.0 > > > IMPALA-13005 introduced {{drop table sys.impala_query_live}}. In some test > environments (notably testing with Ozone), recreating that table in the > following test - test_local_catalog - does not occur before running the test > case portion that attempts to query that table. > Update the test to wait for the table to be available. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
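[Editor's note] The wait-for-creation fix above amounts to polling until the table is visible. A minimal sketch of that pattern; `table_exists` is a hypothetical probe callable (the real test uses Impala's test framework, e.g. a DESCRIBE that succeeds once HMS registration completes):

```python
import time

def wait_for_table(table_exists, table_name, timeout_s=60.0, interval_s=0.5):
    """Poll a probe until it reports the table exists, or time out.

    `table_exists` is a hypothetical callable probe; in Impala's tests the
    equivalent check runs against the actual cluster.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if table_exists(table_name):
            return True
        time.sleep(interval_s)
    raise TimeoutError(f"table {table_name} not available after {timeout_s}s")

# Simulated probe: the table "appears" on the third poll, mimicking the
# race where recreation lags behind the test that queries it.
calls = {"n": 0}
def fake_probe(name):
    calls["n"] += 1
    return calls["n"] >= 3

assert wait_for_table(fake_probe, "sys.impala_query_live",
                      timeout_s=5, interval_s=0.01)
```

The timeout keeps a genuinely missing table from hanging the test suite, which is why the fix waits rather than sleeping for a fixed interval.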
[jira] [Commented] (IMPALA-12910) Run TPCH/TPCDS queries for external JDBC tables
[ https://issues.apache.org/jira/browse/IMPALA-12910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845902#comment-17845902 ] ASF subversion and git services commented on IMPALA-12910: --

Commit 01401a0368cb8f19c86dc3fab764ee4b5732f2f6 in impala's branch refs/heads/branch-4.4.0 from wzhou-code [ https://gitbox.apache.org/repos/asf?p=impala.git;h=01401a036 ]

IMPALA-12910: Support running TPCH/TPCDS queries for JDBC tables

This patch adds a script to create external JDBC tables for the TPCH and TPCDS datasets, and adds unit tests that run the TPCH and TPCDS queries against external JDBC tables with Impala-Impala federation. Note that JDBC tables are mapping tables; they don't take additional disk space.

It fixes a race condition in the caching of SQL DataSource objects by using a new DataSourceObjectCache class, which checks the reference count before closing a SQL DataSource. Adds a new query option 'clean_dbcp_ds_cache' with a default value of true. When it is set to false, a SQL DataSource object is not closed when its reference count reaches 0; it is kept in the cache until it has been idle for more than 5 minutes. The flag variable 'dbcp_data_source_idle_timeout_s' makes this duration configurable.

java.sql.Connection.close() sometimes fails to remove a closed connection from the connection pool, which causes JDBC worker threads to wait a long time for available connections. The workaround is to call the BasicDataSource.invalidateConnection() API to close the connection. Two flag variables are added for the DBCP configuration properties 'maxTotal' and 'maxWaitMillis'. Note that the 'maxActive' and 'maxWait' properties were renamed to 'maxTotal' and 'maxWaitMillis' respectively in apache.commons.dbcp v2.

Fixes a bug in database type comparison: the type string specified by the user may be lower case or mixed case, but the code compared it against an upper-case string.

Fixes an issue where the SQL DataSource object was not closed in JdbcDataSource.open() and JdbcDataSource.getNext() when errors were returned from DBCP APIs or JDBC drivers.

testdata/bin/create-tpc-jdbc-tables.py supports creating JDBC tables for Impala-Impala, Postgres and MySQL. The following sample commands create TPCDS JDBC tables for Impala-Impala federation with a remote coordinator running at 10.19.10.86, and for a Postgres server running at 10.19.10.86:

${IMPALA_HOME}/testdata/bin/create-tpc-jdbc-tables.py \
  --jdbc_db_name=tpcds_jdbc --workload=tpcds \
  --database_type=IMPALA --database_host=10.19.10.86 --clean

${IMPALA_HOME}/testdata/bin/create-tpc-jdbc-tables.py \
  --jdbc_db_name=tpcds_jdbc --workload=tpcds \
  --database_type=POSTGRES --database_host=10.19.10.86 \
  --database_name=tpcds --clean

TPCDS tests for JDBC tables run only in release/exhaustive builds. TPCH tests for JDBC tables run in core and exhaustive builds, except Dockerized builds.

Remaining Issues:
- tpcds-decimal_v2-q80a failed with returned rows not matching expected results for some decimal values. This will be fixed in IMPALA-13018.

Testing:
- Passed core tests.
- Passed query_test/test_tpcds_queries.py in release/exhaustive build.
- Manually verified that only one SQL DataSource object was created for test_tpcds_queries.py::TestTpcdsQueryForJdbcTables since query option 'clean_dbcp_ds_cache' was set to false, and the SQL DataSource object was closed by the cleanup thread.
Change-Id: I44e8c1bb020e90559c7f22483a7ab7a151b8f48a Reviewed-on: http://gerrit.cloudera.org:8080/21304 Reviewed-by: Abhishek Rawat Tested-by: Impala Public Jenkins (cherry picked from commit 08f8a300250df7b4f9a517cdb6bab48c379b7e03) > Run TPCH/TPCDS queries for external JDBC tables > --- > > Key: IMPALA-12910 > URL: https://issues.apache.org/jira/browse/IMPALA-12910 > Project: IMPALA > Issue Type: Sub-task > Components: Perf Investigation >Reporter: Wenzhe Zhou >Assignee: Wenzhe Zhou >Priority: Major > Fix For: Impala 4.5.0 > > > Need performance data for queries on external JDBC tables to be documented in > the design doc. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
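[Editor's note] The reference-counting scheme the commit describes, closing a pooled SQL DataSource only when no caller still holds it, can be sketched as follows. This is a simplified model, not the actual DataSourceObjectCache Java code; class names, the key format, and the idle-timeout handling are illustrative:

```python
import threading
import time

class RefCountedCache:
    """Close a cached resource only when its reference count drops to zero.

    Models the DataSourceObjectCache idea: get() bumps the count,
    release() decrements it, and the resource is closed once unreferenced
    (or, when close_on_zero is False, kept until an idle timeout, which
    this sketch omits). close_on_zero models 'clean_dbcp_ds_cache'.
    """
    def __init__(self, close_on_zero=True):
        self._lock = threading.Lock()
        self._entries = {}  # key -> [resource, refcount, last_used]
        self._close_on_zero = close_on_zero

    def get(self, key, factory):
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                entry = [factory(), 0, time.monotonic()]
                self._entries[key] = entry
            entry[1] += 1
            return entry[0]

    def release(self, key):
        with self._lock:
            entry = self._entries[key]
            entry[1] -= 1
            entry[2] = time.monotonic()
            if entry[1] == 0 and self._close_on_zero:
                entry[0].close()
                del self._entries[key]

class FakeDataSource:
    """Stand-in for a DBCP BasicDataSource in this sketch."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

cache = RefCountedCache()
ds1 = cache.get("jdbc:impala://host", FakeDataSource)
ds2 = cache.get("jdbc:impala://host", FakeDataSource)
assert ds1 is ds2          # second caller reuses the cached DataSource
cache.release("jdbc:impala://host")
assert not ds1.closed      # still referenced by the first caller
cache.release("jdbc:impala://host")
assert ds1.closed          # last reference gone -> closed
```

Checking the count under a lock before closing is what removes the race: without it, one thread can close the DataSource while another has just fetched it from the cache.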
[jira] [Commented] (IMPALA-11499) Refactor UrlEncode function to handle special characters
[ https://issues.apache.org/jira/browse/IMPALA-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845907#comment-17845907 ] ASF subversion and git services commented on IMPALA-11499: -- Commit b8a66b0e104f8e25e70fce0326d36c9b48672dbb in impala's branch refs/heads/branch-4.4.0 from pranavyl [ https://gitbox.apache.org/repos/asf?p=impala.git;h=b8a66b0e1 ] IMPALA-11499: Refactor UrlEncode function to handle special characters An error came from an issue with URL encoding, where certain Unicode characters were being incorrectly encoded due to their UTF-8 representation matching characters in the set of characters to escape. For example, the string '运', which consists of three bytes 0xe8 0xbf 0x90 was wrongly getting encoded into '\E8%FFBF\90', because the middle byte matched one of the two bytes that represented the "\u00FF" literal. Inclusion of "\u00FF" was likely a mistake from the beginning and it should have been '\x7F'. The patch makes three key changes: 1. Before the change, the set of characters that need to be escaped was stored as a string. The current patch uses an unordered_set instead. 2. '\xFF', which is an invalid UTF-8 byte and whose inclusion was erroneous from the beginning, is replaced with '\x7F', which is a control character for DELETE, ensuring consistency and correctness in URL encoding. 3. The list of characters to be escaped is extended to match the current list in Hive. Testing: Tests on both traditional Hive tables and Iceberg tables are included in unicode-column-name.test, insert.test, coding-util-test.cc and test_insert.py. 
Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb Reviewed-on: http://gerrit.cloudera.org:8080/21131 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins (cherry picked from commit 85cd07a11e876f3d8773f2638f699c61a6b0dd4c) > Refactor UrlEncode function to handle special characters > > > Key: IMPALA-11499 > URL: https://issues.apache.org/jira/browse/IMPALA-11499 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Quanlong Huang >Assignee: Pranav Yogi Lodha >Priority: Critical > Fix For: Impala 4.5.0 > > > Partition values are incorrectly URL-encoded in backend for unicode > characters, e.g. '运营业务数据' is encoded to '�%FFBF�营业务数据' which is wrong. > To reproduce the issue, first create a partition table: > {code:sql} > create table my_part_tbl (id int) partitioned by (p string) stored as parquet; > {code} > Then insert data into it using partition values containing '运'. They will > fail: > {noformat} > [localhost:21050] default> insert into my_part_tbl partition(p='运营业务数据') > values (0); > Query: insert into my_part_tbl partition(p='运营业务数据') values (0) > Query submitted at: 2022-08-16 10:03:56 (Coordinator: > http://quanlong-OptiPlex-BJ:25000) > Query progress can be monitored at: > http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=404ac3027c4b7169:39d16a2d > ERROR: Error(s) moving partition files. 
First error (of 1) was: Hdfs op > (RENAME > hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/404ac3027c4b7169_39d16a2d/.404ac3027c4b7169-39d16a2d_1475855322_dir/p=�%FFBF�营业务数据/404ac3027c4b7169-39d16a2d_1585092794_data.0.parq > TO > hdfs://localhost:20500/test-warehouse/my_part_tbl/p=�%FFBF�营业务数据/404ac3027c4b7169-39d16a2d_1585092794_data.0.parq) > failed, error was: > hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/404ac3027c4b7169_39d16a2d/.404ac3027c4b7169-39d16a2d_1475855322_dir/p=�%FFBF�营业务数据/404ac3027c4b7169-39d16a2d_1585092794_data.0.parq > Error(5): Input/output error > [localhost:21050] default> insert into my_part_tbl partition(p='运') values > (0); > Query: insert into my_part_tbl partition(p='运') values (0) > Query submitted at: 2022-08-16 10:04:22 (Coordinator: > http://quanlong-OptiPlex-BJ:25000) > Query progress can be monitored at: > http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=a64e5883473ec28d:86e7e335 > ERROR: Error(s) moving partition files. First error (of 1) was: Hdfs op > (RENAME > hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/a64e5883473ec28d_86e7e335/.a64e5883473ec28d-86e7e335_1582623091_dir/p=�%FFBF�/a64e5883473ec28d-86e7e335_163454510_data.0.parq > TO > hdfs://localhost:20500/test-warehouse/my_part_tbl/p=�%FFBF�/a64e5883473ec28d-86e7e335_163454510_data.0.parq) > failed, error was: > hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/a64e5883473ec28d_86e7e335/.a64e5883473ec28d-86e7e335_1582623091_dir/p=�%FFBF�/a64e5883473ec28d-86e7e335_163454510_data.0.parq >
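[Editor's note] The byte-collision described in the IMPALA-11499 commit above is easy to reproduce outside Impala. The sketch below is a re-implementation of the idea, not Impala's actual UrlEncode code: the escape set is a hypothetical subset, and non-escaped bytes pass through raw so the example stays focused on the collision rather than full percent-encoding:

```python
# Sketch of the escape-set bug: the set of bytes to escape was stored as
# a string, and the '\u00FF' literal contributed its TWO UTF-8 bytes
# (0xC3, 0xBF). Escaping runs byte-by-byte, so the 0xBF inside multi-byte
# UTF-8 characters was wrongly escaped.

def url_encode(data: bytes, escape_bytes) -> str:
    out = []
    for b in data:
        out.append('%{:02X}'.format(b) if b in escape_bytes else chr(b))
    return ''.join(out)

base = set(b' "#%\\')                 # hypothetical subset of the escape list
buggy_set = base | set('\u00FF'.encode('utf-8'))   # adds 0xC3 AND 0xBF
fixed_set = base | {0x7F}             # the fix: single control byte DELETE

yun = '运'.encode('utf-8')            # b'\xe8\xbf\x90'
assert 0xBF in buggy_set              # collides with the middle byte of '运'
assert '%BF' in url_encode(yun, buggy_set)       # corrupted output
assert '%BF' not in url_encode(yun, fixed_set)   # '运' survives intact
```

Switching the escape set from a string to a set of single bytes (as the patch's unordered_set does) removes the possibility of a multi-byte literal leaking partial bytes into the set.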
[jira] [Commented] (IMPALA-13018) Fix test_tpcds_queries.py/TestTpcdsQueryForJdbcTables.test_tpcds-decimal_v2-q80a failure
[ https://issues.apache.org/jira/browse/IMPALA-13018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845903#comment-17845903 ] ASF subversion and git services commented on IMPALA-13018: --

Commit 01401a0368cb8f19c86dc3fab764ee4b5732f2f6 in impala's branch refs/heads/branch-4.4.0 from wzhou-code [ https://gitbox.apache.org/repos/asf?p=impala.git;h=01401a036 ]

IMPALA-12910: Support running TPCH/TPCDS queries for JDBC tables (same commit message as posted on IMPALA-12910 above)

> Fix
> test_tpcds_queries.py/TestTpcdsQueryForJdbcTables.test_tpcds-decimal_v2-q80a
> failure
>
> Key: IMPALA-13018
> URL: https://issues.apache.org/jira/browse/IMPALA-13018
> Project: IMPALA
> Issue Type: Sub-task
> Components: Backend, Frontend
> Reporter: Wenzhe Zhou
> Assignee: Wenzhe Zhou
> Priority: Major
> Fix For: Impala 4.5.0
>
> The returned rows do not match the expected results for some decimal-type columns.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-13074) WRITE TO HDFS node is omitted from Web UI graphic plan
Noemi Pap-Takacs created IMPALA-13074: - Summary: WRITE TO HDFS node is omitted from Web UI graphic plan Key: IMPALA-13074 URL: https://issues.apache.org/jira/browse/IMPALA-13074 Project: IMPALA Issue Type: Bug Reporter: Noemi Pap-Takacs

The query plan shows the nodes that take part in execution, forming a tree structure. It can be displayed in the CLI with the EXPLAIN command. When the query is actually executed, the plan tree can also be viewed in graphic form in the Impala Web UI. However, the explain string and the graphic plan tree do not match: the top node is missing from the Web UI. This is especially confusing for DDL and DML statements, where the data sink is not displayed. It makes a SELECT * FROM a table indistinguishable from a CREATE TABLE, since both display only the SCAN node and omit the WRITE TO HDFS and SELECT nodes. It would make sense to include the WRITE TO HDFS node in DML/DDL plans.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org