[jira] [Created] (IMPALA-13077) Equality predicate on partition column and uncorrelated subquery doesn't reduce the cardinality estimate

2024-05-13 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13077:
---

 Summary: Equality predicate on partition column and uncorrelated 
subquery doesn't reduce the cardinality estimate
 Key: IMPALA-13077
 URL: https://issues.apache.org/jira/browse/IMPALA-13077
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Reporter: Quanlong Huang


Let's say 'part_tbl' is a partitioned table. Its partition key is 'part_key'. 
Consider the following query:
{code:sql}
select xxx from part_tbl
where part_key=(select ... from dim_tbl);
{code}
Its query plan is a JoinNode with two ScanNodes. When estimating the 
cardinality of the JoinNode, the planner is not aware that 'part_key' is the 
partition column, so it misses that the JoinNode's cardinality cannot exceed 
the maximum per-partition row count.
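
For intuition, that bound is what the following query computes from the data 
itself; this is an illustrative sketch against the hypothetical 'part_tbl', 
not something the planner would run:
{code:sql}
-- The largest partition's row count is an upper bound on the rows that any
-- single-valued equality predicate on part_key can select.
select max(rows_per_part)
from (
  select part_key, count(*) as rows_per_part
  from part_tbl
  group by part_key
) t;
{code}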

The recent work in IMPALA-12018 (Consider runtime filter for cardinality 
reduction) helps in some cases, since there are runtime filters on the 
partition column. But there are still cases where we overestimate the 
cardinality. For instance, 'ss_sold_date_sk' is the only partition key of 
tpcds.store_sales.
The following query
{code:sql}
select count(*) from tpcds.store_sales
where ss_sold_date_sk=(
  select min(d_date_sk) + 1000 from tpcds.date_dim);{code}
has query plan:
{noformat}
+-+
| Explain String  |
+-+
| Max Per-Host Resource Reservation: Memory=18.94MB Threads=6 |
| Per-Host Resource Estimates: Memory=243MB   |
| |
| PLAN-ROOT SINK  |
| |   |
| 09:AGGREGATE [FINALIZE] |
| |  output: count:merge(*)   |
| |  row-size=8B cardinality=1|
| |   |
| 08:EXCHANGE [UNPARTITIONED] |
| |   |
| 04:AGGREGATE|
| |  output: count(*) |
| |  row-size=8B cardinality=1|
| |   |
| 03:HASH JOIN [LEFT SEMI JOIN, BROADCAST]|
| |  hash predicates: ss_sold_date_sk = min(d_date_sk) + 1000 |
| |  runtime filters: RF000 <- min(d_date_sk) + 1000  |
| |  row-size=4B cardinality=2.88M  <-- should be max(numRows) across partitions
| |   |
| |--07:EXCHANGE [BROADCAST]  |
| |  ||
| |  06:AGGREGATE [FINALIZE]  |
| |  |  output: min:merge(d_date_sk)  |
| |  |  row-size=4B cardinality=1 |
| |  ||
| |  05:EXCHANGE [UNPARTITIONED]  |
| |  ||
| |  02:AGGREGATE |
| |  |  output: min(d_date_sk)|
| |  |  row-size=4B cardinality=1 |
| |  ||
| |  01:SCAN HDFS [tpcds.date_dim]|
| | HDFS partitions=1/1 files=1 size=9.84MB   |
| | row-size=4B cardinality=73.05K|
| |   |
| 00:SCAN HDFS [tpcds.store_sales]|
|HDFS partitions=1824/1824 files=1824 size=346.60MB   |
|runtime filters: RF000 -> ss_sold_date_sk|
|row-size=4B cardinality=2.88M|
+-+{noformat}
CC [~boroknagyz], [~rizaon]






[jira] [Assigned] (IMPALA-13076) Add pstack and jstack to Impala docker images

2024-05-13 Thread Andrew Sherman (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman reassigned IMPALA-13076:
---

Assignee: Andrew Sherman

> Add pstack and jstack to Impala docker images
> -
>
> Key: IMPALA-13076
> URL: https://issues.apache.org/jira/browse/IMPALA-13076
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.4.0
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
>Priority: Major
>
> When the Impala docker images are deployed in production environments, it can 
> be hard to add debugging tools at runtime. Two of the most useful diagnostic 
> tools are jstack and pstack, which can be used to capture Java and native 
> stack traces.






[jira] [Created] (IMPALA-13076) Add pstack and jstack to Impala docker images

2024-05-13 Thread Andrew Sherman (Jira)
Andrew Sherman created IMPALA-13076:
---

 Summary: Add pstack and jstack to Impala docker images
 Key: IMPALA-13076
 URL: https://issues.apache.org/jira/browse/IMPALA-13076
 Project: IMPALA
  Issue Type: Bug
Affects Versions: Impala 4.4.0
Reporter: Andrew Sherman


When the Impala docker images are deployed in production environments, it can 
be hard to add debugging tools at runtime. Two of the most useful diagnostic 
tools are jstack and pstack, which can be used to capture Java and native 
stack traces.






[jira] [Work started] (IMPALA-12652) Limit Length of Completed Queries Insert DML

2024-05-13 Thread Jason Fehr (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-12652 started by Jason Fehr.
---
> Limit Length of Completed Queries Insert DML
> 
>
> Key: IMPALA-12652
> URL: https://issues.apache.org/jira/browse/IMPALA-12652
> Project: IMPALA
>  Issue Type: Improvement
>  Components: be
>Reporter: Jason Fehr
>Assignee: Jason Fehr
>Priority: Major
>  Labels: backend, workload-management
>
> Implement a coordinator startup flag that limits the max length (number of 
> characters) of the insert DML statement that inserts records into the 
> impala_query_log table.
> The purpose of this flag is to ensure that workload management does not 
> generate an insert DML statement that exceeds Impala's max length for a SQL 
> statement (approximately 16 megabytes, or 16 million characters).
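
For context, here is a rough sketch of the kind of statement being bounded; 
the table name is from the description above, while the column names and 
values are purely illustrative, not the actual workload-management schema:
{code:sql}
-- Completed-query records are written via one insert DML, so its total length
-- grows with the number of records and the size of each SQL text; the new
-- flag caps that length.
INSERT INTO impala_query_log (query_id, sql_stmt)
VALUES ('aaaa000000000000:bbbb111100000000', 'select 1'),
       ('cccc222200000000:dddd333300000000', 'select 2');
{code}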






[jira] [Commented] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments

2024-05-13 Thread Riza Suminto (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846065#comment-17846065
 ] 

Riza Suminto commented on IMPALA-13075:
---

Comparing [^Mem_Limit_1G_Failed.txt] and [^Batch_size_0_Success.txt], I see 
that both have "Per-Host Resource Estimates" beyond the 1GB MEM_LIMIT, and 
"Per-Host Resource Estimates" that are barely under MEM_LIMIT for some nodes.

That being said, I also see an abnormally high "PeakMemoryUsage" for 
HASH_JOIN_NODE (id=8) at 
cdp-datahub-prod-worker4.cdp-prod.c8g3-pfxs.cloudera.site:27000:

 
{code:java}
Fragment F07
  - InactiveTotalTime: 0ns (0)
  - TotalTime: 0ns (0)
  Instance eb4b5cd4eb0fa41e:80d577e00053 
(host=cdp-datahub-prod-worker4.cdp-prod.c8g3-pfxs.cloudera.site:27000)
...
  HASH_JOIN_NODE (id=8)
ExecOption: Join Build-Side Prepared Asynchronously
- InactiveTotalTime: 0ns (0)
- LocalTime: 2.7m (164763538054)
- PeakMemoryUsage: 7.6 GiB (8132571136)
- ProbeRows: 65,536 (65536)
- ProbeRowsPartitioned: 0 (0)
- ProbeTime: 2.7m (164759538041)
- RowsReturned: 0 (0)
- RowsReturnedRate: 0 per second (0)
- TotalTime: 2.8m (166909545061) {code}
This high memory usage seems linked to IMPALA-3286, where 
HashTableCtx::ExprValuesCache::capacity_ is pushed up by the BATCH_SIZE value 
(65536 here, versus the default of 1024, so the per-expression scratch buffers 
grow roughly 64x):

[https://github.com/apache/impala/blob/0d215da8d4e3f93ad3c1cd72aa801fbcb9464fb0/be/src/exec/hash-table.cc#L369-L373]

Note also that the failed query does not have "Probe Side Codegen Enabled", 
unlike the successful query.

> Setting very high BATCH_SIZE can blow up memory usage of fragments
> --
>
> Key: IMPALA-13075
> URL: https://issues.apache.org/jira/browse/IMPALA-13075

[jira] [Commented] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments

2024-05-13 Thread Ezra Zerihun (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846040#comment-17846040
 ] 

Ezra Zerihun commented on IMPALA-13075:
---

[^Failed (1).txt]

[^Success (1).txt]

[^Failed_Cognos_pool.txt]

[^Mem_Limit_1G_Failed.txt]

[^Success_Tableau_Pool.txt]

[^Batch_size_0_Success.txt]

 

The main comparison, which I made in the description, is:
[^Mem_Limit_1G_Failed.txt] vs [^Batch_size_0_Success.txt]

> Setting very high BATCH_SIZE can blow up memory usage of fragments
> --
>
> Key: IMPALA-13075
> URL: https://issues.apache.org/jira/browse/IMPALA-13075






[jira] [Updated] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments

2024-05-13 Thread Ezra Zerihun (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ezra Zerihun updated IMPALA-13075:
--
Attachment: Success_Tableau_Pool.txt

> Setting very high BATCH_SIZE can blow up memory usage of fragments
> --
>
> Key: IMPALA-13075
> URL: https://issues.apache.org/jira/browse/IMPALA-13075






[jira] [Updated] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments

2024-05-13 Thread Ezra Zerihun (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ezra Zerihun updated IMPALA-13075:
--
Attachment: Failed_Cognos_pool.txt

> Setting very high BATCH_SIZE can blow up memory usage of fragments
> --
>
> Key: IMPALA-13075
> URL: https://issues.apache.org/jira/browse/IMPALA-13075






[jira] [Updated] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments

2024-05-13 Thread Ezra Zerihun (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ezra Zerihun updated IMPALA-13075:
--
Attachment: Failed (1).txt

> Setting very high BATCH_SIZE can blow up memory usage of fragments
> --
>
> Key: IMPALA-13075
> URL: https://issues.apache.org/jira/browse/IMPALA-13075






[jira] [Updated] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments

2024-05-13 Thread Ezra Zerihun (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ezra Zerihun updated IMPALA-13075:
--
Attachment: Mem_Limit_1G_Failed.txt

> Setting very high BATCH_SIZE can blow up memory usage of fragments
> --
>
> Key: IMPALA-13075
> URL: https://issues.apache.org/jira/browse/IMPALA-13075






[jira] [Updated] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments

2024-05-13 Thread Ezra Zerihun (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ezra Zerihun updated IMPALA-13075:
--
Attachment: Batch_size_0_Success.txt

> Setting very high BATCH_SIZE can blow up memory usage of fragments
> --
>
> Key: IMPALA-13075
> URL: https://issues.apache.org/jira/browse/IMPALA-13075






[jira] [Updated] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments

2024-05-13 Thread Ezra Zerihun (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ezra Zerihun updated IMPALA-13075:
--
Attachment: Success (1).txt

> Setting very high BATCH_SIZE can blow up memory usage of fragments
> --
>
> Key: IMPALA-13075
> URL: https://issues.apache.org/jira/browse/IMPALA-13075






[jira] [Commented] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments

2024-05-13 Thread Riza Suminto (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846017#comment-17846017
 ] 

Riza Suminto commented on IMPALA-13075:
---

Yes, the BATCH_SIZE number is the basic unit of how Impala estimates and 
allocates memory.
[https://cwiki.apache.org/confluence/display/IMPALA/Impala+Row+Batches]

Both the Frontend Planner and the Backend Executor respect this BATCH_SIZE 
number. If MEM_LIMIT is still above the minimum memory resource requirement, I 
would expect the query to still get admitted and run, even though it is not 
performant (i.e., it needs to spill rows to disk). Each fragment claims its 
minimum memory requirement right after it is instantiated.

Please attach the full query profiles of both the good and the bad run so we 
can analyze them further.
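
As a quick illustration of the knob being discussed, the report's two runs 
differ only in this session option (values taken from the report; 0 falls back 
to the built-in default row batch size):
{code:sql}
-- Failing run from the report:
set BATCH_SIZE=65536;
set MEM_LIMIT=1g;
-- Succeeding run: restore the default batch size.
set BATCH_SIZE=0;
{code}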

> Setting very high BATCH_SIZE can blow up memory usage of fragments
> --
>
> Key: IMPALA-13075
> URL: https://issues.apache.org/jira/browse/IMPALA-13075






[jira] [Commented] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments

2024-05-13 Thread Ezra Zerihun (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846008#comment-17846008
 ] 

Ezra Zerihun commented on IMPALA-13075:
---

This seems to be expected behavior, as a high BATCH_SIZE will store more rows 
in memory; even the documentation mentions the higher memory footprint.

But I have query profiles from a customer who observed the behavior above and 
did not realize why queries failed with out-of-memory errors when their pool 
set BATCH_SIZE to the max limit of 65536. So I filed this improvement Jira in 
case the memory consumption of a high BATCH_SIZE can be improved. If not, feel 
free to close.

> Setting very high BATCH_SIZE can blow up memory usage of fragments
> --
>
> Key: IMPALA-13075
> URL: https://issues.apache.org/jira/browse/IMPALA-13075






[jira] [Created] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments

2024-05-13 Thread Ezra Zerihun (Jira)
Ezra Zerihun created IMPALA-13075:
-

 Summary: Setting very high BATCH_SIZE can blow up memory usage of 
fragments
 Key: IMPALA-13075
 URL: https://issues.apache.org/jira/browse/IMPALA-13075
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Affects Versions: Impala 4.0.0
Reporter: Ezra Zerihun


In Impala 4.0, setting BATCH_SIZE very high, at or near its max limit of 
65536, can cause some fragments' memory usage to spike way past the query's 
defined MEM_LIMIT or the pool's Maximum Query Memory Limit with Clamp on. So 
even though MEM_LIMIT is set to a reasonable value, the query can still fail 
with an out-of-memory error and a huge amount of memory used on a fragment. 
Reducing BATCH_SIZE to a reasonable value, or back to the default, allows the 
query to run without issue and use a reasonable amount of memory within the 
query's MEM_LIMIT or the pool's Maximum Query Memory Limit.

 

1) set BATCH_SIZE=65536; set MEM_LIMIT=1g;

 
{code:java}
    Query State: EXCEPTION
    Impala Query State: ERROR
    Query Status: Memory limit exceeded: Error occurred on backend ...:27000 by 
fragment ... Memory left in process limit: 145.53 GB Memory left in query 
limit: -6.80 GB Query(...): memory limit exceeded. Limit=1.00 GB 
Reservation=86.44 MB ReservationLimit=819.20 MB OtherMemory=7.71 GB Total=7.80 
GB Peak=7.84 GB   Unclaimed reservations: Reservation=8.50 MB OtherMemory=0 
Total=8.50 MB Peak=56.44 MB   Runtime Filter Bank: Reservation=4.00 MB 
ReservationLimit=4.00 MB OtherMemory=0 Total=4.00 MB Peak=4.00 MB   Fragment 
...: Reservation=1.94 MB OtherMemory=7.59 GB Total=7.59 GB Peak=7.63 GB     
HASH_JOIN_NODE (id=8): Reservation=1.94 MB OtherMemory=7.57 GB Total=7.57 GB 
Peak=7.57 GB       Exprs: Total=7.57 GB Peak=7.57 GB       Hash Join Builder 
(join_node_id=8): Total=0 Peak=1.95 MB
...
Query Options (set by configuration): 
BATCH_SIZE=65536,MEM_LIMIT=1073741824,CLIENT_IDENTIFIER=Impala Shell 
v4.0.0.7.2.16.0-287 (5ae3917) built on Mon Jan  9 21:23:59 UTC 
2023,DEFAULT_FILE_FORMAT=PARQUET,...
...
   ExecSummary:
...
09:AGGREGATE                    32     32    0.000ns    0.000ns        0       
4.83M   36.31 MB      212.78 MB  STREAMING                                 
08:HASH JOIN                    32     32    5s149ms      2m44s        0     
194.95M    7.57 GB        1.94 MB  RIGHT OUTER JOIN, PARTITIONED
|--18:EXCHANGE                  32     32   93.750us    1.000ms   10.46K       
1.55K    1.65 MB        2.56 MB  HASH(...
{code}
 

 

2) set BATCH_SIZE=0; set MEM_LIMIT=1g;

 
{code:java}
Query State: FINISHED
Impala Query State: FINISHED
...
Query Options (set by configuration and planner): 
MEM_LIMIT=1073741824,CLIENT_IDENTIFIER=Impala Shell v4.0.0.7.2.16.0-287 
(5ae3917) built on Mon Jan  9 21:23:59 UTC 2023,DEFAULT_FILE_FORMAT=PARQUET,...
...
    ExecSummary:
...
09:AGGREGATE                    32     32  593.748us   18.999ms       45       
4.83M    34.06 MB      212.78 MB  STREAMING
08:HASH JOIN                    32     32   10s873ms      5m47s   10.47K     
194.95M   123.48 MB        1.94 MB  RIGHT OUTER JOIN, PARTITIONED
|--18:EXCHANGE                  32     32    0.000ns    0.000ns   10.46K       
1.55K   344.00 KB        1.69 MB  HASH(...
{code}
 






[jira] [Commented] (IMPALA-13061) Query Live table fails to load if default_transactional_type=insert_only set globally

2024-05-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845906#comment-17845906
 ] 

ASF subversion and git services commented on IMPALA-13061:
--

Commit 338fedb44703646664e2e22c6e2f35336924db22 in impala's branch 
refs/heads/branch-4.4.0 from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=338fedb44 ]

IMPALA-13061: Create query live as external table

Impala determines whether a managed table is transactional based on the
'transactional' table property. It assumes any managed table with
transactional=true returns non-null getValidWriteIds.

When 'default_transactional_type=insert_only' is set at startup (via
default_query_options), impala_query_live is created as a managed table
with transactional=true, but SystemTables don't implement
getValidWriteIds and are not meant to be transactional.

DataSourceTable has a similar problem, and when a JDBC table is
created, setJdbcDataSourceProperties sets transactional=false. This
patch uses CREATE EXTERNAL TABLE sys.impala_query_live so that it is not
created as a managed table and 'transactional' is not set. That avoids
creating a SystemTable that Impala can't read (it encounters an
IllegalStateException).

Change-Id: Ie60a2bd03fabc63c85bcd9fa2489e9d47cd2aa65
Reviewed-on: http://gerrit.cloudera.org:8080/21401
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
(cherry picked from commit 1233ac3c579b5929866dba23debae63e5d2aae90)
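
To make the fix concrete, here is an illustrative contrast under a global 
default_transactional_type=insert_only; the demo table names and columns are 
hypothetical, and this is not the patch's actual DDL:
{code:sql}
-- A managed CREATE TABLE picks up transactional=true from the global default:
CREATE TABLE demo_managed (id INT);
-- CREATE EXTERNAL TABLE bypasses the managed-table defaults, so the
-- 'transactional' property is not set:
CREATE EXTERNAL TABLE demo_external (id INT);
{code}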


> Query Live table fails to load if default_transactional_type=insert_only set 
> globally
> -
>
> Key: IMPALA-13061
> URL: https://issues.apache.org/jira/browse/IMPALA-13061
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> If transactional type defaults to insert_only for all queries via
> {code}
> --default_query_options=default_transactional_type=insert_only
> {code}
> the table definition for {{sys.impala_query_live}} is set to transactional, 
> which causes an exception in catalogd
> {code}
> I0506 22:07:42.808758  3972 jni-util.cc:302] 
> 4547b965aeebc5f0:8ba96c58] java.lang.IllegalStateException
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:496)
> at org.apache.impala.catalog.Table.getPartialInfo(Table.java:851)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.doGetPartialCatalogObject(CatalogServiceCatalog.java:3818)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3714)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3681)
> at 
> org.apache.impala.service.JniCatalog.lambda$getPartialCatalogObject$10(JniCatalog.java:431)
> at 
> org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
> at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
> at 
> org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
> at 
> org.apache.impala.service.JniCatalogOp.execAndSerializeSilentStartAndFinish(JniCatalogOp.java:109)
> at 
> org.apache.impala.service.JniCatalog.execAndSerializeSilentStartAndFinish(JniCatalog.java:253)
> at 
> org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:430)
> {code}
> We need to override that setting while creating {{sys.impala_query_live}}.






[jira] [Commented] (IMPALA-13045) Fix intermittent failure in TestQueryLive.test_local_catalog

2024-05-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845905#comment-17845905
 ] 

ASF subversion and git services commented on IMPALA-13045:
--

Commit 39233ba3d134b8c18f6f208a7d85c3fadf8ee371 in impala's branch 
refs/heads/branch-4.4.0 from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=39233ba3d ]

IMPALA-13045: Wait for impala_query_live to exist

Waits for creation of 'sys.impala_query_live' in tests to ensure it has
been registered with HMS.

Change-Id: I5cc3fa3c43be7af9a5f097359a0d4f20d057a207
Reviewed-on: http://gerrit.cloudera.org:8080/21372
Reviewed-by: Impala Public Jenkins 
Tested-by: Michael Smith 
(cherry picked from commit b35aa819653dce062109e61d8f30171234dce5f9)


> Fix intermittent failure in TestQueryLive.test_local_catalog
> 
>
> Key: IMPALA-13045
> URL: https://issues.apache.org/jira/browse/IMPALA-13045
> Project: IMPALA
>  Issue Type: Task
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> IMPALA-13005 introduced {{drop table sys.impala_query_live}}. In some test 
> environments (notably testing with Ozone), recreating that table in the 
> following test, test_local_catalog, does not happen before the test reaches 
> the portion that attempts to query that table.
> Update the test to wait for the table to be available.
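
One way to poll for the table by hand (illustrative; the actual fix waits in 
the Python test framework):
{code:sql}
-- Returns a row once the table is visible in the catalog.
SHOW TABLES IN sys LIKE 'impala_query_live';
{code}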






[jira] [Commented] (IMPALA-12910) Run TPCH/TPCDS queries for external JDBC tables

2024-05-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845902#comment-17845902
 ] 

ASF subversion and git services commented on IMPALA-12910:
--

Commit 01401a0368cb8f19c86dc3fab764ee4b5732f2f6 in impala's branch 
refs/heads/branch-4.4.0 from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=01401a036 ]

IMPALA-12910: Support running TPCH/TPCDS queries for JDBC tables

This patch adds a script to create external JDBC tables for the TPCH
and TPCDS datasets, and adds unit tests that run the TPCH and TPCDS
queries against external JDBC tables with Impala-Impala federation.
Note that JDBC tables are mapping tables, so they do not take up
additional disk space.
It fixes a race condition in the caching of SQL DataSource objects by
introducing a new DataSourceObjectCache class, which checks the
reference count before closing a SQL DataSource.
Adds a new query option 'clean_dbcp_ds_cache' with a default value of
true. When it is set to false, a SQL DataSource object is not closed
when its reference count drops to 0; instead it is kept in the cache
until it has been idle for more than 5 minutes. The flag variable
'dbcp_data_source_idle_timeout_s' makes this duration configurable.
java.sql.Connection.close() sometimes fails to remove a closed
connection from the connection pool, which causes JDBC worker threads
to wait a long time for available connections. The workaround is to
call the BasicDataSource.invalidateConnection() API to close such a
connection.
Two flag variables are added for the DBCP configuration properties
'maxTotal' and 'maxWaitMillis'. Note that the 'maxActive' and 'maxWait'
properties were renamed to 'maxTotal' and 'maxWaitMillis' respectively
in apache.commons.dbcp v2.
Fixes a bug in database type comparison: the type strings specified by
the user may be lower case or mixed case, but the code compared them
against an upper-case string.
Fixes an issue where the SQL DataSource object was not closed in
JdbcDataSource.open() and JdbcDataSource.getNext() when errors were
returned from DBCP APIs or JDBC drivers.
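
The caching scheme described above can be sketched in a few lines. The Python below is a conceptual model only (the real DataSourceObjectCache is Java, and all names here are illustrative), but it captures the reference count, the 'clean_dbcp_ds_cache' switch, and the idle-timeout eviction:

{code:python}
# Conceptual sketch of a reference-counted object cache with idle eviction.
import threading
import time

IDLE_TIMEOUT_S = 300  # mirrors the 5-minute dbcp_data_source_idle_timeout_s

class DataSourceCache:
    def __init__(self, clean_on_zero_refs=True):  # ~clean_dbcp_ds_cache
        self._lock = threading.Lock()
        self._entries = {}  # key -> [datasource, refcount, last_release_time]
        self._clean_on_zero_refs = clean_on_zero_refs

    def acquire(self, key, factory):
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                entry = [factory(), 0, time.time()]
                self._entries[key] = entry
            entry[1] += 1  # bump the reference count
            return entry[0]

    def release(self, key):
        with self._lock:
            entry = self._entries[key]
            entry[1] -= 1
            entry[2] = time.time()
            # Close immediately only when eager cleanup is enabled.
            if self._clean_on_zero_refs and entry[1] == 0:
                entry[0].close()
                del self._entries[key]

    def evict_idle(self):
        # Called periodically by a cleanup thread when eager cleanup is off.
        with self._lock:
            now = time.time()
            for key, (ds, refs, last) in list(self._entries.items()):
                if refs == 0 and now - last > IDLE_TIMEOUT_S:
                    ds.close()
                    del self._entries[key]
{code}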

testdata/bin/create-tpc-jdbc-tables.py supports creating JDBC tables
for Impala-Impala, Postgres, and MySQL.
The following sample commands create TPCDS JDBC tables for
Impala-Impala federation with a remote coordinator running at
10.19.10.86, and for a Postgres server running at 10.19.10.86:
  ${IMPALA_HOME}/testdata/bin/create-tpc-jdbc-tables.py \
--jdbc_db_name=tpcds_jdbc --workload=tpcds \
--database_type=IMPALA --database_host=10.19.10.86 --clean

  ${IMPALA_HOME}/testdata/bin/create-tpc-jdbc-tables.py \
--jdbc_db_name=tpcds_jdbc --workload=tpcds \
--database_type=POSTGRES --database_host=10.19.10.86 \
--database_name=tpcds --clean

TPCDS tests for JDBC tables run only in release/exhaustive builds.
TPCH tests for JDBC tables run in core and exhaustive builds, except
in Dockerized builds.

Remaining Issues:
 - tpcds-decimal_v2-q80a failed because the returned rows did not match
   the expected results for some decimal values. This will be fixed in
   IMPALA-13018.

Testing:
 - Passed core tests.
 - Passed query_test/test_tpcds_queries.py in release/exhaustive build.
 - Manually verified that only one SQL DataSource object was created for
   test_tpcds_queries.py::TestTpcdsQueryForJdbcTables since the query
   option 'clean_dbcp_ds_cache' was set to false, and that the SQL
   DataSource object was closed by the cleanup thread.

Change-Id: I44e8c1bb020e90559c7f22483a7ab7a151b8f48a
Reviewed-on: http://gerrit.cloudera.org:8080/21304
Reviewed-by: Abhishek Rawat 
Tested-by: Impala Public Jenkins 
(cherry picked from commit 08f8a300250df7b4f9a517cdb6bab48c379b7e03)


> Run TPCH/TPCDS queries for external JDBC tables
> ---
>
> Key: IMPALA-12910
> URL: https://issues.apache.org/jira/browse/IMPALA-12910
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Perf Investigation
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> Need performance data for queries on external JDBC tables to be documented in 
> the design doc.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Commented] (IMPALA-11499) Refactor UrlEncode function to handle special characters

2024-05-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845907#comment-17845907
 ] 

ASF subversion and git services commented on IMPALA-11499:
--

Commit b8a66b0e104f8e25e70fce0326d36c9b48672dbb in impala's branch 
refs/heads/branch-4.4.0 from pranavyl
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b8a66b0e1 ]

IMPALA-11499: Refactor UrlEncode function to handle special characters

An error came from an issue with URL encoding, where certain Unicode
characters were incorrectly encoded because bytes of their UTF-8
representation matched characters in the set of characters to escape.
For example, the string '运', which consists of the three bytes
0xe8 0xbf 0x90, was wrongly encoded as '\E8%FFBF\90' because the
middle byte matched one of the two bytes representing the "\u00FF"
literal. The inclusion of "\u00FF" was likely a mistake from the
beginning; it should have been '\x7F'.

The patch makes three key changes:
1. Before the change, the set of characters that need to be escaped
was stored as a string. The current patch uses an unordered_set
instead.

2. '\xFF', which is an invalid UTF-8 byte and whose inclusion was
erroneous from the beginning, is replaced with '\x7F', which is a
control character for DELETE, ensuring consistency and correctness in
URL encoding.

3. The list of characters to be escaped is extended to match the
current list in Hive.

Testing: Tests on both traditional Hive tables and Iceberg tables
are included in unicode-column-name.test, insert.test,
coding-util-test.cc and test_insert.py.

Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Reviewed-on: http://gerrit.cloudera.org:8080/21131
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
(cherry picked from commit 85cd07a11e876f3d8773f2638f699c61a6b0dd4c)
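
The failure mode is easy to reproduce outside Impala. Below is a hedged Python sketch of byte-wise percent-encoding with an explicit escape set; the set's contents are an illustrative subset, not Hive's exact list, and the real implementation is C++:

{code:python}
# Byte-wise percent-encoding sketch. Every member of the escape set must be
# a single byte: '\x7f' is, while the old two-byte "\u00FF" literal put 0xBF
# into the set and corrupted multi-byte UTF-8 characters such as '运'.
TO_ESCAPE = set(b' "#%\'*/:=?\\{}[]^\x7f')  # illustrative subset only

def url_encode(value: str) -> bytes:
    out = bytearray()
    for b in value.encode('utf-8'):
        if b in TO_ESCAPE:
            out += b'%%%02X' % b  # escape this single byte
        else:
            out.append(b)
    return bytes(out)

assert url_encode('运') == '运'.encode('utf-8')  # passes through intact
assert url_encode('a b') == b'a%20b'
{code}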


> Refactor UrlEncode function to handle special characters
> 
>
> Key: IMPALA-11499
> URL: https://issues.apache.org/jira/browse/IMPALA-11499
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Quanlong Huang
>Assignee: Pranav Yogi Lodha
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> Partition values are incorrectly URL-encoded in the backend for Unicode 
> characters, e.g. '运营业务数据' is encoded as '�%FFBF�营业务数据', which is wrong.
> To reproduce the issue, first create a partition table:
> {code:sql}
> create table my_part_tbl (id int) partitioned by (p string) stored as parquet;
> {code}
> Then insert data into it using partition values containing '运'. The inserts 
> will fail:
> {noformat}
> [localhost:21050] default> insert into my_part_tbl partition(p='运营业务数据') 
> values (0);
> Query: insert into my_part_tbl partition(p='运营业务数据') values (0)
> Query submitted at: 2022-08-16 10:03:56 (Coordinator: 
> http://quanlong-OptiPlex-BJ:25000)
> Query progress can be monitored at: 
> http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=404ac3027c4b7169:39d16a2d
> ERROR: Error(s) moving partition files. First error (of 1) was: Hdfs op 
> (RENAME 
> hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/404ac3027c4b7169_39d16a2d/.404ac3027c4b7169-39d16a2d_1475855322_dir/p=�%FFBF�营业务数据/404ac3027c4b7169-39d16a2d_1585092794_data.0.parq
>  TO 
> hdfs://localhost:20500/test-warehouse/my_part_tbl/p=�%FFBF�营业务数据/404ac3027c4b7169-39d16a2d_1585092794_data.0.parq)
>  failed, error was: 
> hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/404ac3027c4b7169_39d16a2d/.404ac3027c4b7169-39d16a2d_1475855322_dir/p=�%FFBF�营业务数据/404ac3027c4b7169-39d16a2d_1585092794_data.0.parq
> Error(5): Input/output error
> [localhost:21050] default> insert into my_part_tbl partition(p='运') values 
> (0);
> Query: insert into my_part_tbl partition(p='运') values (0)
> Query submitted at: 2022-08-16 10:04:22 (Coordinator: 
> http://quanlong-OptiPlex-BJ:25000)
> Query progress can be monitored at: 
> http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=a64e5883473ec28d:86e7e335
> ERROR: Error(s) moving partition files. First error (of 1) was: Hdfs op 
> (RENAME 
> hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/a64e5883473ec28d_86e7e335/.a64e5883473ec28d-86e7e335_1582623091_dir/p=�%FFBF�/a64e5883473ec28d-86e7e335_163454510_data.0.parq
>  TO 
> hdfs://localhost:20500/test-warehouse/my_part_tbl/p=�%FFBF�/a64e5883473ec28d-86e7e335_163454510_data.0.parq)
>  failed, error was: 
> hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/a64e5883473ec28d_86e7e335/.a64e5883473ec28d-86e7e335_1582623091_dir/p=�%FFBF�/a64e5883473ec28d-86e7e335_163454510_data.0.parq
> {noformat}

[jira] [Commented] (IMPALA-13018) Fix test_tpcds_queries.py/TestTpcdsQueryForJdbcTables.test_tpcds-decimal_v2-q80a failure

2024-05-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845903#comment-17845903
 ] 

ASF subversion and git services commented on IMPALA-13018:
--

Commit 01401a0368cb8f19c86dc3fab764ee4b5732f2f6 in impala's branch 
refs/heads/branch-4.4.0 from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=01401a036 ]

IMPALA-12910: Support running TPCH/TPCDS queries for JDBC tables

This patch adds a script to create external JDBC tables for the TPCH
and TPCDS datasets, and adds unit tests that run the TPCH and TPCDS
queries against external JDBC tables with Impala-Impala federation.
Note that JDBC tables are mapping tables, so they do not take up
additional disk space.
It fixes a race condition in the caching of SQL DataSource objects by
introducing a new DataSourceObjectCache class, which checks the
reference count before closing a SQL DataSource.
Adds a new query option 'clean_dbcp_ds_cache' with a default value of
true. When it is set to false, a SQL DataSource object is not closed
when its reference count drops to 0; instead it is kept in the cache
until it has been idle for more than 5 minutes. The flag variable
'dbcp_data_source_idle_timeout_s' makes this duration configurable.
java.sql.Connection.close() sometimes fails to remove a closed
connection from the connection pool, which causes JDBC worker threads
to wait a long time for available connections. The workaround is to
call the BasicDataSource.invalidateConnection() API to close such a
connection.
Two flag variables are added for the DBCP configuration properties
'maxTotal' and 'maxWaitMillis'. Note that the 'maxActive' and 'maxWait'
properties were renamed to 'maxTotal' and 'maxWaitMillis' respectively
in apache.commons.dbcp v2.
Fixes a bug in database type comparison: the type strings specified by
the user may be lower case or mixed case, but the code compared them
against an upper-case string.
Fixes an issue where the SQL DataSource object was not closed in
JdbcDataSource.open() and JdbcDataSource.getNext() when errors were
returned from DBCP APIs or JDBC drivers.

testdata/bin/create-tpc-jdbc-tables.py supports creating JDBC tables
for Impala-Impala, Postgres, and MySQL.
The following sample commands create TPCDS JDBC tables for
Impala-Impala federation with a remote coordinator running at
10.19.10.86, and for a Postgres server running at 10.19.10.86:
  ${IMPALA_HOME}/testdata/bin/create-tpc-jdbc-tables.py \
--jdbc_db_name=tpcds_jdbc --workload=tpcds \
--database_type=IMPALA --database_host=10.19.10.86 --clean

  ${IMPALA_HOME}/testdata/bin/create-tpc-jdbc-tables.py \
--jdbc_db_name=tpcds_jdbc --workload=tpcds \
--database_type=POSTGRES --database_host=10.19.10.86 \
--database_name=tpcds --clean

TPCDS tests for JDBC tables run only in release/exhaustive builds.
TPCH tests for JDBC tables run in core and exhaustive builds, except
in Dockerized builds.

Remaining Issues:
 - tpcds-decimal_v2-q80a failed because the returned rows did not match
   the expected results for some decimal values. This will be fixed in
   IMPALA-13018.

Testing:
 - Passed core tests.
 - Passed query_test/test_tpcds_queries.py in release/exhaustive build.
 - Manually verified that only one SQL DataSource object was created for
   test_tpcds_queries.py::TestTpcdsQueryForJdbcTables since the query
   option 'clean_dbcp_ds_cache' was set to false, and that the SQL
   DataSource object was closed by the cleanup thread.

Change-Id: I44e8c1bb020e90559c7f22483a7ab7a151b8f48a
Reviewed-on: http://gerrit.cloudera.org:8080/21304
Reviewed-by: Abhishek Rawat 
Tested-by: Impala Public Jenkins 
(cherry picked from commit 08f8a300250df7b4f9a517cdb6bab48c379b7e03)


> Fix 
> test_tpcds_queries.py/TestTpcdsQueryForJdbcTables.test_tpcds-decimal_v2-q80a 
> failure
> 
>
> Key: IMPALA-13018
> URL: https://issues.apache.org/jira/browse/IMPALA-13018
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend, Frontend
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> The returned rows do not match the expected results for some decimal-type 
> columns.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Created] (IMPALA-13074) WRITE TO HDFS node is omitted from Web UI graphic plan

2024-05-13 Thread Noemi Pap-Takacs (Jira)
Noemi Pap-Takacs created IMPALA-13074:
-

 Summary: WRITE TO HDFS node is omitted from Web UI graphic plan
 Key: IMPALA-13074
 URL: https://issues.apache.org/jira/browse/IMPALA-13074
 Project: IMPALA
  Issue Type: Bug
Reporter: Noemi Pap-Takacs


The query plan shows the nodes that take part in the execution, forming a tree 
structure.

It can be displayed in the CLI by issuing an EXPLAIN command. When the query 
is actually executed, the plan tree can also be viewed in graphic form in the 
Impala Web UI.

However, the explain string and the graphic plan tree do not match: the top 
node is missing from the Web UI.

This is especially confusing for DDL and DML statements, where the data sink 
is not displayed. It makes a plain SELECT * FROM a table indistinguishable 
from a CREATE TABLE AS SELECT, since both display only the SCAN node and omit 
the WRITE_TO_HDFS and SELECT nodes.

It would make sense to include the WRITE_TO_HDFS node in DML/DDL plans.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
