[jira] [Assigned] (IMPALA-13109) Use RoaringBitmap in IcebergDeleteNode

2024-06-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-13109:
--

Assignee: Zoltán Borók-Nagy

> Use RoaringBitmap in IcebergDeleteNode
> --
>
> Key: IMPALA-13109
> URL: https://issues.apache.org/jira/browse/IMPALA-13109
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> IcebergDeleteNode currently uses an ordered int64_t array for each data file 
> to hold the deleted positions. This can consume a significant amount of 
> memory when there are lots of deleted records, e.g. 100 million delete 
> records consume 800 MiB of memory.
> RoaringBitmap is a highly compressed and highly efficient data structure for 
> storing bitmaps:
> [https://arxiv.org/pdf/1603.06549]
> [https://github.com/RoaringBitmap/CRoaring]
> We could use it to store the deleted file positions instead of the sorted 
> arrays, as
>  * it consumes significantly less memory
>  * it makes the code simpler
>  * it *_might_* have perf benefits
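
A minimal sketch of the idea using CRoaring's C++ Roaring64Map wrapper (the 
values and the loop below are illustrative, not Impala code):

{code:cpp}
#include <cstdint>
#include <iostream>

#include "roaring/roaring64map.hh"  // CRoaring's 64-bit C++ wrapper

int main() {
  roaring::Roaring64Map deleted_positions;

  // Mark some file positions as deleted (illustrative values).
  for (uint64_t pos = 0; pos < 1000000; pos += 2) deleted_positions.add(pos);

  // Collapse runs of consecutive values into run-length-encoded containers.
  deleted_positions.runOptimize();

  // Probe a position the way IcebergDeleteNode checks whether a row is deleted.
  bool is_deleted = deleted_positions.contains(uint64_t{42});

  std::cout << "deleted: " << is_deleted << ", bytes: "
            << deleted_positions.getSizeInBytes() << std::endl;
  return 0;
}
{code}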



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-13156) S3 tests flaky: the Hadoop-AWS credential provider chain occasionally fails to provide credentials for S3 access

2024-06-13 Thread Laszlo Gaal (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Gaal reassigned IMPALA-13156:


Assignee: Laszlo Gaal

> S3 tests flaky: the Hadoop-AWS credential provider chain occasionally fails 
> to provide credentials for S3 access
> 
>
> Key: IMPALA-13156
> URL: https://issues.apache.org/jira/browse/IMPALA-13156
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Laszlo Gaal
>Assignee: Laszlo Gaal
>Priority: Major
>
> During S3-based test runs executed on private infrastructure the default 
> Hadoop-AWS credential provider throws this error occasionally:
> {code}
> 2024-02-17T18:02:10,175  WARN [TThreadPoolServer WorkerProcess-22] 
> fs.FileSystem: Failed to initialize fileystem 
> s3a://redacted/test-warehouse/test_num_values_def_levels_mismatch_15b31ddb.db/too_many_def_levels:
>  java.nio.file.AccessDeniedException: redacted: 
> org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials 
> provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider 
> EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : 
> software.amazon.awssdk.core.exception.SdkClientException: Unable to load 
> credentials from system settings. Access key must be specified either via 
> environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).
> 2024-02-17T18:02:10,175 ERROR [TThreadPoolServer WorkerProcess-22] 
> utils.MetaStoreUtils: Got exception: java.nio.file.AccessDeniedException 
> redacted: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No 
> AWS Credentials provided by TemporaryAWSCredentialsProvider 
> SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider 
> IAMInstanceCredentialsProvider : 
> software.amazon.awssdk.core.exception.SdkClientException: Unable to load 
> credentials from system settings. Access key must be specified either via 
> environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).
> java.nio.file.AccessDeniedException: redacted: 
> org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials 
> provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider 
> EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : 
> software.amazon.awssdk.core.exception.SdkClientException: Unable to load 
> credentials from system settings. Access key must be specified either via 
> environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).
> at 
> org.apache.hadoop.fs.s3a.AWSCredentialProviderList.maybeTranslateCredentialException(AWSCredentialProviderList.java:351)
>  ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at 
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:201) 
> ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:124) 
> ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$4(Invoker.java:376) 
> ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at 
> org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:468) 
> ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:372) 
> ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:347) 
> ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$verifyBucketExists$2(S3AFileSystem.java:972)
>  ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at 
> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:543)
>  ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
> at 
> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:524)
>  ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
> at 
> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:445)
>  ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2748)
>  ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:970)
>  ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.doBucketProbing(S3AFileSystem.java:859)
>  ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]

[jira] [Work started] (IMPALA-13156) S3 tests flaky: the Hadoop-AWS credential provider chain occasionally fails to provide credentials for S3 access

2024-06-13 Thread Laszlo Gaal (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-13156 started by Laszlo Gaal.

> S3 tests flaky: the Hadoop-AWS credential provider chain occasionally fails 
> to provide credentials for S3 access
> 
>
> Key: IMPALA-13156
> URL: https://issues.apache.org/jira/browse/IMPALA-13156
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Laszlo Gaal
>Assignee: Laszlo Gaal
>Priority: Major
>
> During S3-based test runs executed on private infrastructure the default 
> Hadoop-AWS credential provider throws this error occasionally:
> {code}
> 2024-02-17T18:02:10,175  WARN [TThreadPoolServer WorkerProcess-22] 
> fs.FileSystem: Failed to initialize fileystem 
> s3a://redacted/test-warehouse/test_num_values_def_levels_mismatch_15b31ddb.db/too_many_def_levels:
>  java.nio.file.AccessDeniedException: redacted: 
> org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials 
> provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider 
> EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : 
> software.amazon.awssdk.core.exception.SdkClientException: Unable to load 
> credentials from system settings. Access key must be specified either via 
> environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).
> 2024-02-17T18:02:10,175 ERROR [TThreadPoolServer WorkerProcess-22] 
> utils.MetaStoreUtils: Got exception: java.nio.file.AccessDeniedException 
> redacted: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No 
> AWS Credentials provided by TemporaryAWSCredentialsProvider 
> SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider 
> IAMInstanceCredentialsProvider : 
> software.amazon.awssdk.core.exception.SdkClientException: Unable to load 
> credentials from system settings. Access key must be specified either via 
> environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).
> java.nio.file.AccessDeniedException: redacted: 
> org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials 
> provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider 
> EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : 
> software.amazon.awssdk.core.exception.SdkClientException: Unable to load 
> credentials from system settings. Access key must be specified either via 
> environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).
> at 
> org.apache.hadoop.fs.s3a.AWSCredentialProviderList.maybeTranslateCredentialException(AWSCredentialProviderList.java:351)
>  ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at 
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:201) 
> ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:124) 
> ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$4(Invoker.java:376) 
> ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at 
> org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:468) 
> ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:372) 
> ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:347) 
> ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$verifyBucketExists$2(S3AFileSystem.java:972)
>  ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at 
> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:543)
>  ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
> at 
> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:524)
>  ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
> at 
> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:445)
>  ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2748)
>  ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:970)
>  ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.doBucketProbing(S3AFileSystem.java:859)
>  ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at

[jira] [Commented] (IMPALA-13131) Azure OpenAI API expects 'api-key' instead of 'Authorization' in the request header

2024-06-12 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854592#comment-17854592
 ] 

ASF subversion and git services commented on IMPALA-13131:
--

Commit 3668a9517c4d8097591ed3b6fa672bf87faa77f6 in impala's branch 
refs/heads/master from Abhishek Rawat
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3668a9517 ]

IMPALA-13131: Azure OpenAI API expects 'api-key' instead of 'Authorization' in 
the request header

Updated the POST request used when communicating with an Azure OpenAI
endpoint. The header now includes 'api-key: <key>' instead of
'Authorization: Bearer <key>'.

Also removed 'model' as a required param for the Azure OpenAI API
call, mainly because the endpoint URL already contains the deployment
name, which is mapped to a model.

Testing:
- Updated existing unit test as per the Azure API reference
- Manually tested builtin 'ai_generate_text' using an Azure OpenAI
deployment.

Change-Id: If9cc07940ce355d511bcf0ee615ff31042d13eb5
Reviewed-on: http://gerrit.cloudera.org:8080/21493
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Azure OpenAI API expects 'api-key' instead of 'Authorization' in the request 
> header
> ---
>
> Key: IMPALA-13131
> URL: https://issues.apache.org/jira/browse/IMPALA-13131
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Abhishek Rawat
>Assignee: Abhishek Rawat
>Priority: Major
>
> As per the [API 
> reference|https://learn.microsoft.com/en-us/azure/ai-services/openai/reference],
>  the header expects API key as follows:
>  
> {code:java}
> curl 
> https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/completions?api-version=2024-02-01\
>   -H "Content-Type: application/json" \
>   -H "api-key: YOUR_API_KEY" \ <<<<<<< API Key
>   -d "{
>   \"prompt\": \"Once upon a time\",
>   \"max_tokens\": 5
> }" {code}
> Impala currently sends the API key as follows:
>  
>  
> {code:java}
> curl 
> https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/completions?api-version=2024-02-01\
>   -H "Content-Type: application/json" \
>   -H "Authorization: Bearer YOUR_API_KEY" \   <<<<<<<< API Key
>   -d "{
>   \"prompt\": \"Once upon a time\",
>   \"max_tokens\": 5
> }"{code}
> This causes AI functions calling the Azure OpenAI endpoint to fail with a 401 error:
> {code:java}
> { "statusCode": 401, "message": "Unauthorized. Access token is missing, 
> invalid, audience is incorrect (https://cognitiveservices.azure.com), or have 
> expired." } {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13157) SimpleLogger should support writing to remote storage

2024-06-12 Thread Abhishek Rawat (Jira)
Abhishek Rawat created IMPALA-13157:
---

 Summary: SimpleLogger should support writing to remote storage
 Key: IMPALA-13157
 URL: https://issues.apache.org/jira/browse/IMPALA-13157
 Project: IMPALA
  Issue Type: Bug
Reporter: Abhishek Rawat


SimpleLogger is used for writing query profiles to the local filesystem. In 
some environments the local filesystem may not have enough storage for query 
profiles. For such environments it probably makes sense to add support for 
writing query profiles to remote storage such as HDFS, S3, or ABFS.
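
A rough sketch of what the remote write path could look like through libhdfs, 
which Impala's backend already uses for Hadoop-compatible filesystems (the 
function and parameter names below are illustrative):

{code:cpp}
#include <fcntl.h>

#include <cstddef>

#include "hdfs.h"  // libhdfs C API

// Write a profile blob to a Hadoop-compatible filesystem. 'nn' is the
// namenode host, or "default" to use fs.defaultFS, which may resolve to
// HDFS, S3, or ABFS through their Hadoop connectors.
bool WriteProfile(const char* nn, const char* path,
                  const char* data, size_t len) {
  hdfsFS fs = hdfsConnect(nn, 0);  // port 0: take the port from the config
  if (fs == nullptr) return false;
  hdfsFile f = hdfsOpenFile(fs, path, O_WRONLY, 0, 0, 0);
  if (f == nullptr) {
    hdfsDisconnect(fs);
    return false;
  }
  bool ok = hdfsWrite(fs, f, data, static_cast<tSize>(len)) ==
            static_cast<tSize>(len);
  ok = hdfsCloseFile(fs, f) == 0 && ok;
  hdfsDisconnect(fs);
  return ok;
}
{code}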



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Commented] (IMPALA-12562) CAST(ROUND(INT a/ INT b, INT d)) as STRING) may return wrong result

2024-06-12 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854541#comment-17854541
 ] 

ASF subversion and git services commented on IMPALA-12562:
--

Commit 0d429462f7f61565119ee2e593867a22886d7209 in impala's branch 
refs/heads/master from zhangyifan27
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=0d429462f ]

IMPALA-12562: Cast double and float to string with exact precision

The builtin functions casttostring(DOUBLE) and casttostring(FLOAT)
printed more digits than necessary when converting double and float
values to string values. This patch fixes that by switching to the
existing methods DoubleToBuffer and FloatToBuffer, which are simple and
fast implementations that print only the necessary digits.

Testing:
  - Add end-to-end tests to verify the fixes
  - Add benchmarks for modified functions
  - Update tests in expr-test

Change-Id: Icd79c55dd57dc0fa13e4ec11c2284ef2800e8b1a
Reviewed-on: http://gerrit.cloudera.org:8080/21441
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
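
For context, here is a short illustration of the difference between fixed 
17-digit formatting and shortest round-trip formatting. It uses std::to_chars; 
the patch itself uses DoubleToBuffer/FloatToBuffer, which behave similarly:

{code:cpp}
#include <charconv>
#include <cstdio>

int main() {
  double d = 0.33;  // the nearest double to 0.33, as produced by round(1/3, 2)

  // 17 significant digits always round-trip, but expose representation noise.
  std::printf("%.17g\n", d);  // prints 0.33000000000000002

  // Shortest string that still round-trips to the same double.
  char buf[32];
  auto res = std::to_chars(buf, buf + sizeof(buf), d);
  *res.ptr = '\0';
  std::printf("%s\n", buf);  // prints 0.33
  return 0;
}
{code}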


> CAST(ROUND(INT a/ INT b, INT d)) as STRING) may return wrong result
> ---
>
> Key: IMPALA-12562
> URL: https://issues.apache.org/jira/browse/IMPALA-12562
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.3.0
>Reporter: YifanZhang
>Priority: Major
>
> The following query returns a wrong result:
> {code:java}
>  select cast(round(1/3, 2) as string);
> +---------------------------------+
> | cast(round(1 / 3, 2) as string) |
> +---------------------------------+
> | 0.33000000000000002             |
> +---------------------------------+
> Fetched 1 row(s) in 0.11s {code}
> Remove the cast function and the result is as expected:
> {code:java}
>  select round(1/3,2);
> +-----------------+
> | round(1 / 3, 2) |
> +-----------------+
> | 0.33            |
> +-----------------+ {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13156) S3 tests flaky: the Hadoop-AWS credential provider chain occasionally fails to provide credentials for S3 access

2024-06-12 Thread Laszlo Gaal (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854534#comment-17854534
 ] 

Laszlo Gaal commented on IMPALA-13156:
--

At this time the following options are being investigated:
* Possible throttling at the EC2 instance metadata endpoint 
(http://169.254.169.254/). This seems unlikely, because Hadoop-AWS documents an 
internal singleton object that caches these values precisely to prevent bursts 
of client requests from overloading this interface.
* Offline discussions with Hadoop developers suggested that the specific error 
message above could be misleading: it could be just a generic catch-all error 
message returned from some other chain of errors. Specifying a single, specific 
credential provider in {{core-site.xml}} might allow the credential provider to 
return more detailed and more specific error messages for the same error 
condition; a configuration sketch follows below.
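
For illustration, pinning the chain to a single provider could look like this 
in {{core-site.xml}}. The property name is real Hadoop-AWS configuration; the 
provider chosen here is only an example:

{code:xml}
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <!-- Use a single provider instead of the default chain so that failures
       can be attributed to one specific provider. Example choice only. -->
  <value>org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider</value>
</property>
{code}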

> S3 tests flaky: the Hadoop-AWS credential provider chain occasionally fails 
> to provide credentials for S3 access
> 
>
> Key: IMPALA-13156
> URL: https://issues.apache.org/jira/browse/IMPALA-13156
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Laszlo Gaal
>Priority: Major
>
> During S3-based test runs executed on private infrastructure the default 
> Hadoop-AWS credential provider throws this error occasionally:
> {code}
> 2024-02-17T18:02:10,175  WARN [TThreadPoolServer WorkerProcess-22] 
> fs.FileSystem: Failed to initialize fileystem 
> s3a://redacted/test-warehouse/test_num_values_def_levels_mismatch_15b31ddb.db/too_many_def_levels:
>  java.nio.file.AccessDeniedException: redacted: 
> org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials 
> provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider 
> EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : 
> software.amazon.awssdk.core.exception.SdkClientException: Unable to load 
> credentials from system settings. Access key must be specified either via 
> environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).
> 2024-02-17T18:02:10,175 ERROR [TThreadPoolServer WorkerProcess-22] 
> utils.MetaStoreUtils: Got exception: java.nio.file.AccessDeniedException 
> redacted: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No 
> AWS Credentials provided by TemporaryAWSCredentialsProvider 
> SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider 
> IAMInstanceCredentialsProvider : 
> software.amazon.awssdk.core.exception.SdkClientException: Unable to load 
> credentials from system settings. Access key must be specified either via 
> environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).
> java.nio.file.AccessDeniedException: redacted: 
> org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials 
> provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider 
> EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : 
> software.amazon.awssdk.core.exception.SdkClientException: Unable to load 
> credentials from system settings. Access key must be specified either via 
> environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).
> at 
> org.apache.hadoop.fs.s3a.AWSCredentialProviderList.maybeTranslateCredentialException(AWSCredentialProviderList.java:351)
>  ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at 
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:201) 
> ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:124) 
> ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$4(Invoker.java:376) 
> ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at 
> org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:468) 
> ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:372) 
> ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:347) 
> ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$verifyBucketExists$2(S3AFileSystem.java:972)
>  ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
> at 
> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:543)
>  ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
> at 
> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.

[jira] [Updated] (IMPALA-13156) S3 tests flaky: the Hadoop-AWS credential provider chain occasionally fails to provide credentials for S3 access

2024-06-12 Thread Laszlo Gaal (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Gaal updated IMPALA-13156:
-
Description: 
During S3-based test runs executed on private infrastructure the default 
Hadoop-AWS credential provider throws this error occasionally:
{code}
2024-02-17T18:02:10,175  WARN [TThreadPoolServer WorkerProcess-22] 
fs.FileSystem: Failed to initialize fileystem 
s3a://redacted/test-warehouse/test_num_values_def_levels_mismatch_15b31ddb.db/too_many_def_levels:
 java.nio.file.AccessDeniedException: redacted: 
org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials 
provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider 
EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : 
software.amazon.awssdk.core.exception.SdkClientException: Unable to load 
credentials from system settings. Access key must be specified either via 
environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).
2024-02-17T18:02:10,175 ERROR [TThreadPoolServer WorkerProcess-22] 
utils.MetaStoreUtils: Got exception: java.nio.file.AccessDeniedException 
redacted: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS 
Credentials provided by TemporaryAWSCredentialsProvider 
SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider 
IAMInstanceCredentialsProvider : 
software.amazon.awssdk.core.exception.SdkClientException: Unable to load 
credentials from system settings. Access key must be specified either via 
environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).
java.nio.file.AccessDeniedException: redacted: 
org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials 
provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider 
EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : 
software.amazon.awssdk.core.exception.SdkClientException: Unable to load 
credentials from system settings. Access key must be specified either via 
environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).
at 
org.apache.hadoop.fs.s3a.AWSCredentialProviderList.maybeTranslateCredentialException(AWSCredentialProviderList.java:351)
 ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:201) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:124) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$4(Invoker.java:376) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:468) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:372) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:347) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$verifyBucketExists$2(S3AFileSystem.java:972)
 ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:543)
 ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:524)
 ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:445)
 ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2748)
 ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:970)
 ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.doBucketProbing(S3AFileSystem.java:859) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:715) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3452) 
~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:162) 
~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3557) 
~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3504) 
~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:522) 
~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361) 
~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.hive.metastore.Warehouse.getFs

[jira] [Created] (IMPALA-13156) S3 tests flaky: the Hadoop-AWS credential provider chain occasionally fails to provide credentials for S3 access

2024-06-12 Thread Laszlo Gaal (Jira)
Laszlo Gaal created IMPALA-13156:


 Summary: S3 tests flaky: the Hadoop-AWS credential provider chain 
occasionally fails to provide credentials for S3 access
 Key: IMPALA-13156
 URL: https://issues.apache.org/jira/browse/IMPALA-13156
 Project: IMPALA
  Issue Type: Bug
Reporter: Laszlo Gaal


During S3-based test runs executed on private infrastructure the default 
Hadoop-AWS credential provider throws this error occasionally:
{code}
2024-02-17T18:02:10,175  WARN [TThreadPoolServer WorkerProcess-22] 
fs.FileSystem: Failed to initialize fileystem 
s3a://redacted/test-warehouse/test_num_values_def_levels_mismatch_15b31ddb.db/too_many_def_levels:
 java.nio.file.AccessDeniedException: redacted: 
org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials 
provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider 
EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : 
software.amazon.awssdk.core.exception.SdkClientException: Unable to load 
credentials from system settings. Access key must be specified either via 
environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).
2024-02-17T18:02:10,175 ERROR [TThreadPoolServer WorkerProcess-22] 
utils.MetaStoreUtils: Got exception: java.nio.file.AccessDeniedException 
redacted: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS 
Credentials provided by TemporaryAWSCredentialsProvider 
SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider 
IAMInstanceCredentialsProvider : 
software.amazon.awssdk.core.exception.SdkClientException: Unable to load 
credentials from system settings. Access key must be specified either via 
environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).
java.nio.file.AccessDeniedException: redacted: 
org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials 
provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider 
EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : 
software.amazon.awssdk.core.exception.SdkClientException: Unable to load 
credentials from system settings. Access key must be specified either via 
environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).
at 
org.apache.hadoop.fs.s3a.AWSCredentialProviderList.maybeTranslateCredentialException(AWSCredentialProviderList.java:351)
 ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:201) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:124) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$4(Invoker.java:376) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:468) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:372) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:347) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$verifyBucketExists$2(S3AFileSystem.java:972)
 ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:543)
 ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:524)
 ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:445)
 ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2748)
 ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:970)
 ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.doBucketProbing(S3AFileSystem.java:859) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:715) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3452) 
~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:162) 
~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3557) 
~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3504) 
~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:522) 
~[hadoop-common


[jira] [Commented] (IMPALA-13136) Refactor AnalyzedFunctionCallExpr

2024-06-12 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854526#comment-17854526
 ] 

Joe McDonnell commented on IMPALA-13136:


[~scarlin] I'm ok with punting on this for a while. We have a long list of 
things that need to land, and this is more about code cleanliness than 
functionality.

> Refactor AnalyzedFunctionCallExpr
> -
>
> Key: IMPALA-13136
> URL: https://issues.apache.org/jira/browse/IMPALA-13136
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Steve Carlin
>Priority: Major
>
> Copied from code review:
> The part where we immediately analyze as part of the constructor makes for 
> complicated exception handling. RexVisitor doesn't support exceptions, so it 
> adds complication to handle them under those circumstances. I can't really 
> explain why it is necessary.
> Let me sketch out an alternative:
> 1. Construct the whole Expr tree without analyzing it
> 2. Any errors that happen during this process are not usually actionable by 
> the end user. It's good to have a descriptive error message, but it doesn't 
> mean there is something wrong with the SQL. I think that it is ok for this 
> code to throw subclasses of RuntimeException or use 
> Preconditions.checkState() with a good explanation.
> 3. When we get the Expr tree back in CreateExprVisitor::getExpr(), we call 
> analyze() on the root node, which does a recursive analysis of the whole tree.
> 4. The special Expr classes don't run analyze() in the constructor, don't 
> keep a reference to the Analyzer, and don't override resetAnalysisState(). 
> They override analyzeImpl() and they should be idempotent. The clone 
> constructor should not need to do anything special, just do a deep copy.
> I don't want to bog down this review. If we want to address this as a 
> followup, I can live with that, but I don't want us to go too far down this 
> road. (Or if we have a good explanation for why it is necessary, then we can 
> write a good comment and move on.)
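
A language-agnostic sketch of the proposed two-phase pattern: constructors only 
build the tree, analyze() recurses once from the root, and analyzeImpl() is 
idempotent. C++ is used here for illustration; the classes under review live in 
Impala's Java frontend:

{code:cpp}
#include <memory>
#include <vector>

class Expr {
 public:
  virtual ~Expr() = default;

  // Phase 2: analyze the whole tree from the root, children first.
  void Analyze() {
    if (analyzed_) return;  // idempotent: safe to call more than once
    for (auto& child : children_) child->Analyze();
    AnalyzeImpl();
    analyzed_ = true;
  }

 protected:
  // Phase 1 is plain construction; subclasses do their analysis work here
  // and never in the constructor, so no exceptions escape tree building.
  virtual void AnalyzeImpl() = 0;

  std::vector<std::unique_ptr<Expr>> children_;
  bool analyzed_ = false;
};
{code}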



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13155) Not all Tuple::DeepCopy() smallify string values

2024-06-12 Thread Jira
Zoltán Borók-Nagy created IMPALA-13155:
--

 Summary: Not all Tuple::DeepCopy() smallify string values
 Key: IMPALA-13155
 URL: https://issues.apache.org/jira/browse/IMPALA-13155
 Project: IMPALA
  Issue Type: Bug
Reporter: Zoltán Borók-Nagy


Currently "Tuple::DeepCopy(const TupleDescriptor& desc, char** data, int* 
offset, bool convert_ptrs)" does not try to smallify string values, although it 
could safely do that.
 
We use that version of DeepCopy when we BROADCAST data between fragments, so 
smallifying on that path can be beneficial.
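
For readers unfamiliar with "smallification": Impala's StringValue can store a 
short string inline in the slot instead of pointing into an auxiliary buffer. A 
simplified sketch of the idea (the inline capacity and layout are illustrative, 
not Impala's exact representation):

{code:cpp}
#include <cstddef>
#include <cstring>

// Simplified string slot: short strings live inline, long ones are pointed to.
struct SmallableString {
  static constexpr size_t kInlineCap = 22;  // illustrative capacity
  union {
    struct { const char* ptr; size_t len; } large;
    struct { char buf[kInlineCap]; unsigned char len; } small;
  };
  bool is_small;

  // "Smallify" during a deep copy: copy the bytes inline when they fit, so
  // the copied tuple no longer references external memory.
  void Assign(const char* data, size_t len) {
    if (len <= kInlineCap) {
      std::memcpy(small.buf, data, len);
      small.len = static_cast<unsigned char>(len);
      is_small = true;
    } else {
      large.ptr = data;  // still depends on the external buffer
      large.len = len;
      is_small = false;
    }
  }
};
{code}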



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org





[jira] [Commented] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap

2024-06-12 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854474#comment-17854474
 ] 

ASF subversion and git services commented on IMPALA-12800:
--

Commit 4681666e9386d87c647d19d6333750c16b6fa0c1 in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4681666e9 ]

IMPALA-12800: Add cache for isTrueWithNullSlots() evaluation

isTrueWithNullSlots() can be expensive when it has to query the backend.
Many of the expressions will look similar, especially in large
auto-generated expressions. Adds a cache based on the nullified
expression to avoid querying the backend for expressions with identical
structure.

With DEBUG logging enabled for the Analyzer, computes and logs stats
about the null slots cache.

Adds 'use_null_slots_cache' query option to disable caching. Documents
the new option.

Change-Id: Ib63f5553284f21f775d2097b6c5d6bbb63699acd
Reviewed-on: http://gerrit.cloudera.org:8080/21484
Reviewed-by: Quanlong Huang 
Tested-by: Impala Public Jenkins 
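
A sketch of the caching idea. The real cache lives in Impala's Java Analyzer; 
the names below are illustrative. The key point is keying on a canonical form 
of the nullified expression, so structurally identical expressions skip the 
backend evaluation:

{code:cpp}
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the expensive backend evaluation.
static bool EvalWithNullSlotsViaBackend(const std::string& nullified_expr) {
  (void)nullified_expr;
  return false;  // placeholder; the real call serializes the expr via Thrift
}

class NullSlotsCache {
 public:
  bool IsTrueWithNullSlots(const std::string& nullified_expr) {
    auto it = cache_.find(nullified_expr);
    if (it != cache_.end()) { ++hits_; return it->second; }
    ++misses_;
    bool result = EvalWithNullSlotsViaBackend(nullified_expr);
    cache_.emplace(nullified_expr, result);
    return result;
  }
  // Stats of the kind the patch logs at DEBUG level.
  size_t hits() const { return hits_; }
  size_t misses() const { return misses_; }

 private:
  std::unordered_map<std::string, bool> cache_;
  size_t hits_ = 0;
  size_t misses_ = 0;
};
{code}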


> Queries with many nested inline views see performance issues with 
> ExprSubstitutionMap
> -
>
> Key: IMPALA-12800
> URL: https://issues.apache.org/jira/browse/IMPALA-12800
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
> Fix For: Impala 4.5.0
>
> Attachments: impala12800repro.sql, impala12800schema.sql, 
> long_query_jstacks.tar.gz
>
>
> A user running a query with many layers of inline views saw a large amount of 
> time spent in analysis. 
>  
> {noformat}
> - Authorization finished (ranger): 7s518ms (13.134ms)
> - Value transfer graph computed: 7s760ms (241.953ms)
> - Single node plan created: 2m47s (2m39s)
> - Distributed plan created: 2m47s (7.430ms)
> - Lineage info computed: 2m47s (39.017ms)
> - Planning finished: 2m47s (672.518ms){noformat}
> In reproducing it locally, we found that most of the stacks end up in 
> ExprSubstitutionMap.
>  
> Here are the main stacks seen while running jstack every 3 seconds during a 
> 75 second execution:
> Location 1: (ExprSubstitutionMap::compose -> contains -> indexOf -> Expr 
> equals) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at java.util.ArrayList.indexOf(ArrayList.java:323)
>     at java.util.ArrayList.contains(ArrayList.java:306)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:120){noformat}
> Location 2:  (ExprSubstitutionMap::compose -> verify -> Expr equals) (9 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:126){noformat}
> Location 3: (ExprSubstitutionMap::combine -> verify -> Expr equals) (5 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.combine(ExprSubstitutionMap.java:143){noformat}
> Location 4:  (TupleIsNullPredicate.wrapExprs ->  Analyzer.isTrueWithNullSlots 
> -> FeSupport.EvalPredicate -> Thrift serialization) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at java.lang.StringCoding.encode(StringCoding.java:364)
>     at java.lang.String.getBytes(String.java:941)
>     at 
> org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:227)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:532)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:467)
>     at org.apache.impala.thrift.TClientRequest.write(TClientRequest.java:394)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:3034)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:2709)
>     at org.apache.impala.thrift.TQueryCtx.write(TQueryCtx.java:2400)
>     at org.apache.thrift.TSerializer.serialize(TSerializ

[jira] [Comment Edited] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap

2024-06-12 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854473#comment-17854473
 ] 

Michael Smith edited comment on IMPALA-12800 at 6/12/24 3:36 PM:
-

Performance on these types of queries has been substantially improved. We saw 
an improvement of 20x on the example query. It would likely be more on larger 
queries as we switched from O(n^2) to O(n) operations for ExprSubstitutionMap.


was (Author: JIRAUSER288956):
Performance on these types of queries has been substantially improved. We saw 
an improvement of 20x on the example query. It would likely be more on larger 
queries as we switched from O(n^2) to O(n) operations for ExprSubstitutionMap.

> Queries with many nested inline views see performance issues with 
> ExprSubstitutionMap
> -
>
> Key: IMPALA-12800
> URL: https://issues.apache.org/jira/browse/IMPALA-12800
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
> Fix For: Impala 4.5.0
>
> Attachments: impala12800repro.sql, impala12800schema.sql, 
> long_query_jstacks.tar.gz
>
>
> A user running a query with many layers of inline views saw a large amount of 
> time spent in analysis. 
>  
> {noformat}
> - Authorization finished (ranger): 7s518ms (13.134ms)
> - Value transfer graph computed: 7s760ms (241.953ms)
> - Single node plan created: 2m47s (2m39s)
> - Distributed plan created: 2m47s (7.430ms)
> - Lineage info computed: 2m47s (39.017ms)
> - Planning finished: 2m47s (672.518ms){noformat}
> In reproducing it locally, we found that most of the stacks end up in 
> ExprSubstitutionMap.
>  
> Here are the main stacks seen while running jstack every 3 seconds during a 
> 75 second execution:
> Location 1: (ExprSubstitutionMap::compose -> contains -> indexOf -> Expr 
> equals) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at java.util.ArrayList.indexOf(ArrayList.java:323)
>     at java.util.ArrayList.contains(ArrayList.java:306)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:120){noformat}
> Location 2:  (ExprSubstitutionMap::compose -> verify -> Expr equals) (9 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:126){noformat}
> Location 3: (ExprSubstitutionMap::combine -> verify -> Expr equals) (5 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.combine(ExprSubstitutionMap.java:143){noformat}
> Location 4:  (TupleIsNullPredicate.wrapExprs ->  Analyzer.isTrueWithNullSlots 
> -> FeSupport.EvalPredicate -> Thrift serialization) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at java.lang.StringCoding.encode(StringCoding.java:364)
>     at java.lang.String.getBytes(String.java:941)
>     at 
> org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:227)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:532)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:467)
>     at org.apache.impala.thrift.TClientRequest.write(TClientRequest.java:394)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:3034)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:2709)
>     at org.apache.impala.thrift.TQueryCtx.write(TQueryCtx.java:2400)
>     at org.apache.thrift.TSerializer.serialize(TSerializer.java:84)
>     at 
> org.apache.impala.service.FeSupport.EvalExprWithoutRowBounded(FeSupport.java:206)
>     at 
> org.apache.impala.service.FeSupport.EvalExprWithoutRow(FeSupport.java:194)
>     at org.apache.impala.service.FeSupport.EvalPredicate(FeSupport.java:275)
>     at 

[jira] [Resolved] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap

2024-06-12 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-12800.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

Performance on these types of queries has been substantially improved. We saw 
an improvement of 20x on the example query. It would likely be more on larger 
queries as we switched from O(n^2) to O(n) operations for ExprSubstitutionMap.
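
The flavor of the change, sketched generically: ExprSubstitutionMap's 
compose/verify paths used repeated linear contains() scans, which is O(n^2) 
overall; indexing the entries in hash tables makes each lookup O(1). The code 
below is an illustration, not the actual ExprSubstitutionMap.java:

{code:cpp}
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// A substitution map from expression (canonical string) to replacement.
using SubstMap = std::vector<std::pair<std::string, std::string>>;

// Compose f with g: rewrite f's targets through g, then append g's entries
// whose sources f does not already cover. Hashing makes each check O(1).
SubstMap Compose(const SubstMap& f, const SubstMap& g) {
  std::unordered_map<std::string, std::string> g_index(g.begin(), g.end());
  std::unordered_set<std::string> f_sources;
  SubstMap result;
  for (const auto& [src, dst] : f) {
    auto it = g_index.find(dst);
    result.emplace_back(src, it != g_index.end() ? it->second : dst);
    f_sources.insert(src);
  }
  for (const auto& [src, dst] : g) {
    if (f_sources.count(src) == 0) result.emplace_back(src, dst);
  }
  return result;
}
{code}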

> Queries with many nested inline views see performance issues with 
> ExprSubstitutionMap
> -
>
> Key: IMPALA-12800
> URL: https://issues.apache.org/jira/browse/IMPALA-12800
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
> Fix For: Impala 4.5.0
>
> Attachments: impala12800repro.sql, impala12800schema.sql, 
> long_query_jstacks.tar.gz
>
>
> A user running a query with many layers of inline views saw a large amount of 
> time spent in analysis. 
>  
> {noformat}
> - Authorization finished (ranger): 7s518ms (13.134ms)
> - Value transfer graph computed: 7s760ms (241.953ms)
> - Single node plan created: 2m47s (2m39s)
> - Distributed plan created: 2m47s (7.430ms)
> - Lineage info computed: 2m47s (39.017ms)
> - Planning finished: 2m47s (672.518ms){noformat}
> In reproducing it locally, we found that most of the stacks end up in 
> ExprSubstitutionMap.
>  
> Here are the main stacks seen while running jstack every 3 seconds during a 
> 75 second execution:
> Location 1: (ExprSubstitutionMap::compose -> contains -> indexOf -> Expr 
> equals) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at java.util.ArrayList.indexOf(ArrayList.java:323)
>     at java.util.ArrayList.contains(ArrayList.java:306)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:120){noformat}
> Location 2:  (ExprSubstitutionMap::compose -> verify -> Expr equals) (9 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:126){noformat}
> Location 3: (ExprSubstitutionMap::combine -> verify -> Expr equals) (5 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.combine(ExprSubstitutionMap.java:143){noformat}
> Location 4:  (TupleIsNullPredicate.wrapExprs ->  Analyzer.isTrueWithNullSlots 
> -> FeSupport.EvalPredicate -> Thrift serialization) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at java.lang.StringCoding.encode(StringCoding.java:364)
>     at java.lang.String.getBytes(String.java:941)
>     at 
> org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:227)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:532)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:467)
>     at org.apache.impala.thrift.TClientRequest.write(TClientRequest.java:394)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:3034)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:2709)
>     at org.apache.impala.thrift.TQueryCtx.write(TQueryCtx.java:2400)
>     at org.apache.thrift.TSerializer.serialize(TSerializer.java:84)
>     at 
> org.apache.impala.service.FeSupport.EvalExprWithoutRowBounded(FeSupport.java:206)
>     at 
> org.apache.impala.service.FeSupport.EvalExprWithoutRow(FeSupport.java:194)
>     at org.apache.impala.service.FeSupport.EvalPredicate(FeSupport.java:275)
>     at 
> org.apache.impala.analysis.Analyzer.isTrueWithNullSlots(Analyzer.java:2888)
>     at 
> org.apache.impala.analysis.TupleIsNullPredicate.requiresNullWrapping(TupleIsNullPredicate.java:181)
>     at 
> org.apache.impala.analysis.TupleIsNullPredicate.wrapExpr(TupleIsNullPredicate.java:147)
>     at 
> org.apache.impala.analysis.TupleIsNullPredicate.wrapExprs(TupleIsNullPredicate.java:136){noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)



[jira] [Created] (IMPALA-13154) Some tables are missing in Top-N Tables with Highest Memory Requirements

2024-06-12 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13154:
---

 Summary: Some tables are missing in Top-N Tables with Highest 
Memory Requirements
 Key: IMPALA-13154
 URL: https://issues.apache.org/jira/browse/IMPALA-13154
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Reporter: Quanlong Huang


In the /catalog page of the catalogd WebUI, there is a table for "Top-N Tables 
with Highest Memory Requirements". However, not all tables are counted there. 
E.g. after starting catalogd, run a DESCRIBE on a table to trigger metadata 
loading on it; when loading finishes, the table is still not shown in the WebUI.

The cause is that the list is only updated in HdfsTable.getTHdfsTable() when 
'type' is 
ThriftObjectType.FULL:
[https://github.com/apache/impala/blob/ee21427d26620b40d38c706b4944d2831f84f6f5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L2457-L2459]

This used to be a place that all code paths using the table would go through. 
However, we have since made a number of optimizations that avoid getting the 
FULL thrift object of the table. We should move the code that updates the list 
of largest tables somewhere every table usage can reach, e.g. after loading a 
table's metadata we can update its estimatedMetadataSize.
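
A rough sketch of that direction (hypothetical names, not Impala's actual 
catalog code): keep the Top-N bookkeeping in one place that every 
metadata-load path calls, rather than inside the FULL-thrift serialization:
{code:java}
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical sketch: update the "largest tables" list right after a table's
// metadata is loaded, instead of inside HdfsTable.getTHdfsTable().
class TopNTableTracker {
  private static final int N = 25;
  // Min-heap ordered by estimated metadata size; entry = {tableId, size}.
  private final PriorityQueue<long[]> heap =
      new PriorityQueue<>(Comparator.comparingLong((long[] e) -> e[1]));

  synchronized void onMetadataLoaded(long tableId, long estimatedMetadataSize) {
    heap.offer(new long[] {tableId, estimatedMetadataSize});
    if (heap.size() > N) heap.poll(); // keep only the N largest estimates
  }
}
{code}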



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13153) Unreachable catch clause in MetastoreEvents.java

2024-06-11 Thread Sai Hemanth Gantasala (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854208#comment-17854208
 ] 

Sai Hemanth Gantasala commented on IMPALA-13153:


Thanks for raising the concern. I'll address this issue soon.

> Unreachable catch clause in MetastoreEvents.java
> 
>
> Key: IMPALA-13153
> URL: https://issues.apache.org/jira/browse/IMPALA-13153
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 4.5.0
>Reporter: Laszlo Gaal
>Assignee: Sai Hemanth Gantasala
>Priority: Critical
>
> In recent builds of master the frontend build reports the following warning:
> {code}
> 22:38:28 20:38:19 [WARNING] 
> /home/ubuntu/Impala/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:[1466,9]
>  unreachable catch clause
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13153) Unreachable catch clause in MetastoreEvents.java

2024-06-11 Thread Laszlo Gaal (Jira)
Laszlo Gaal created IMPALA-13153:


 Summary: Unreachable catch clause in MetastoreEvents.java
 Key: IMPALA-13153
 URL: https://issues.apache.org/jira/browse/IMPALA-13153
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Affects Versions: Impala 4.5.0
Reporter: Laszlo Gaal
Assignee: Sai Hemanth Gantasala


In recent builds of master the frontend build reports the following warning:
{code}
22:38:28 20:38:19 [WARNING] 
/home/ubuntu/Impala/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:[1466,9]
 unreachable catch clause
{code}
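
For reference, a minimal standalone illustration of the kind of pattern that 
can draw this warning (assuming the try body can only throw a type that an 
earlier clause already catches; this is not the MetastoreEvents.java code):
{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;

class UnreachableCatchDemo {
  static void demo() {
    try {
      throw new FileNotFoundException("x"); // the only checked type this body can throw
    } catch (FileNotFoundException e) {
      // handles everything the try body can actually throw
    } catch (IOException e) {
      // javac warns here: the clause is legal but unreachable, because the
      // only thrown type has already been caught above
    }
  }
}
{code}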



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Resolved] (IMPALA-11871) INSERT statement does not respect Ranger policies for HDFS

2024-06-11 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-11871.
--
Resolution: Fixed

Resolve the issue since the fix has been merged.

> INSERT statement does not respect Ranger policies for HDFS
> --
>
> Key: IMPALA-11871
> URL: https://issues.apache.org/jira/browse/IMPALA-11871
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> In a cluster with Ranger auth (and with legacy catalog mode), even if you 
> provide RWX to cm_hdfs -> all-path for the user impala, inserting into a 
> table whose HDFS POSIX permissions happen to exclude impala access will 
> result in an
> {noformat}
> "AnalysisException: Unable to INSERT into target table (default.t1) because 
> Impala does not have WRITE access to HDFS location: 
> hdfs://nightly-71x-vx-2.nightly-71x-vx.root.hwx.site:8020/warehouse/tablespace/external/hive/t1"{noformat}
>  
> {noformat}
> [root@nightly-71x-vx-3 ~]# hdfs dfs -getfacl 
> /warehouse/tablespace/external/hive/t1
> file: /warehouse/tablespace/external/hive/t1 
> owner: hive 
> group: supergroup
> user::rwx
> user:impala:rwx #effective:r-x
> group::rwx #effective:r-x
> mask::r-x
> other::---
> default:user::rwx
> default:user:impala:rwx
> default:group::rwx
> default:mask::rwx
> default:other::--- {noformat}
> ~~
> ANALYSIS
> Stack trace from a version of Cloudera's distribution of Impala (impalad 
> version 3.4.0-SNAPSHOT RELEASE (build 
> {*}db20b59a093c17ea4699117155d58fe874f7d68f{*})):
> {noformat}
> at 
> org.apache.impala.catalog.FeFsTable$Utils.checkWriteAccess(FeFsTable.java:585)
> at 
> org.apache.impala.analysis.InsertStmt.analyzeWriteAccess(InsertStmt.java:545)
> at org.apache.impala.analysis.InsertStmt.analyze(InsertStmt.java:391)
> at 
> org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:463)
> at 
> org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:426)
> at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1570)
> at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1536)
> at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1506)
> at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:155){noformat}
> The exception occurs at analysis time, so I tested and succeeded in writing 
> directly into the said directory.
> {noformat}
> [root@nightly-71x-vx-3 ~]# hdfs dfs -touchz 
> /warehouse/tablespace/external/hive/t1/test
> [root@nightly-71x-vx-3 ~]# hdfs dfs -ls 
> /warehouse/tablespace/external/hive/t1/
> Found 8 items
> rw-rw---+ 3 hive supergroup 417 2023-01-27 17:37 
> /warehouse/tablespace/external/hive/t1/00_0
> rw-rw---+ 3 hive supergroup 417 2023-01-27 17:44 
> /warehouse/tablespace/external/hive/t1/00_0_copy_1
> rw-rw---+ 3 hive supergroup 417 2023-01-27 17:49 
> /warehouse/tablespace/external/hive/t1/00_0_copy_2
> rw-rw---+ 3 hive supergroup 417 2023-01-27 17:53 
> /warehouse/tablespace/external/hive/t1/00_0_copy_3
> rw-rw---+ 3 impala hive 355 2023-01-27 17:17 
> /warehouse/tablespace/external/hive/t1/4c4477c12c51ad96-3126b52d_2029811630_data.0.parq
> rw-rw---+ 3 impala hive 355 2023-01-27 17:39 
> /warehouse/tablespace/external/hive/t1/9945b25bb37d1ff2-473c1478_574471191_data.0.parq
> drwxrwx---+ - impala hive 0 2023-01-27 17:39 
> /warehouse/tablespace/external/hive/t1/_impala_insert_staging
> rw-rw---+ 3 impala supergroup 0 2023-01-27 18:01 
> /warehouse/tablespace/external/hive/t1/test{noformat}
> Reviewing the code[1], I traced the {{TAccessLevel}} to the catalogd. And if 
> I add user impala to group supergroup on the catalogd host, this query will 
> succeed past the authorization.
> Additionally, this query does not trip up during analysis when catalog v2 is 
> enabled because the method {{getFirstLocationWithoutWriteAccess()}} is not 
> implemented there yet and always returns null[2].
> [1] 
> [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L494-L504]
> [2] 
> [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java#L295-L298]
> ~~
> Ideally, when Ranger authorization is in place, we should:
> 1) Not check access level during analysis
> 2) Incorporate Ranger ACLs during analysis



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13152) IllegalStateException in computing processing cost when there are predicates on analytic output columns

2024-06-11 Thread Riza Suminto (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854130#comment-17854130
 ] 

Riza Suminto commented on IMPALA-13152:
---

Filed a patch at: [https://gerrit.cloudera.org/c/21504/] 

> IllegalStateException in computing processing cost when there are predicates 
> on analytic output columns
> ---
>
> Key: IMPALA-13152
> URL: https://issues.apache.org/jira/browse/IMPALA-13152
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Quanlong Huang
>Assignee: Riza Suminto
>Priority: Major
>
> Saw an error in the following query when COMPUTE_PROCESSING_COST is on:
> {code:sql}
> create table tbl (a int, b int, c int);
> set COMPUTE_PROCESSING_COST=1;
> explain select a, b from (
>   select a, b, c,
> row_number() over(partition by a order by b desc) as latest
>   from tbl
> )b
> WHERE latest=1
> ERROR: IllegalStateException: Processing cost of PlanNode 01:TOP-N is invalid!
> {code}
> Exception in the logs:
> {noformat}
> I0611 13:04:37.192874 28004 jni-util.cc:321] 
> 264ee79bfb6ac031:42f8006c] java.lang.IllegalStateException: 
> Processing cost of PlanNode 01:TOP-N is invalid!
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:512)
> at 
> org.apache.impala.planner.PlanNode.computeRowConsumptionAndProductionToCost(PlanNode.java:1047)
> at 
> org.apache.impala.planner.PlanFragment.computeCostingSegment(PlanFragment.java:287)
> at 
> org.apache.impala.planner.Planner.computeProcessingCost(Planner.java:560)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1932)
> at 
> org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:2892)
> at 
> org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2676)
> at 
> org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
> at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175){noformat}
> Don't see the error if removing the predicate "latest=1".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM

2024-06-11 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13151.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM
> -
>
> Key: IMPALA-13151
> URL: https://issues.apache.org/jira/browse/IMPALA-13151
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
>  Labels: broken-build
> Fix For: Impala 4.5.0
>
>
> The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is 
> failing with errors like this:
> {noformat}
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912
> Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), 
> actual: 269834 vs 30{noformat}
> So far, I only see failures on ARM jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Commented] (IMPALA-13152) IllegalStateException in computing processing cost when there are predicates on analytic output columns

2024-06-11 Thread Riza Suminto (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854089#comment-17854089
 ] 

Riza Suminto commented on IMPALA-13152:
---

Tried your example and I get NaN for BaseProcessingCost.
{noformat}Query: explain select a, b from (
  select a, b, c,
row_number() over(partition by a order by b desc) as latest
  from tbl
)b
WHERE latest=1
ERROR: IllegalStateException: Processing cost of PlanNode 01:TOP-N is invalid! 
cost-total=0 max-instances=1 cost/inst=0 #cons:#prod=0:0 
total-cost=NaN{noformat}
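
One plausible way such a NaN can arise (a hypothetical sketch, not the actual 
costing code): with a zero row estimate, a cost ratio becomes 0.0/0.0, which 
is NaN under IEEE arithmetic and then trips the validity check:
{code:java}
class CostNanDemo {
  public static void main(String[] args) {
    double totalCost = 0.0;
    double rowsProduced = 0.0;                    // e.g. a zero-cardinality estimate
    double costPerRow = totalCost / rowsProduced; // 0.0 / 0.0 == NaN
    System.out.println(costPerRow);               // prints NaN
    // A guard similar in spirit to the failing Preconditions.checkState():
    if (Double.isNaN(costPerRow)) {
      throw new IllegalStateException("Processing cost is invalid!");
    }
  }
}
{code}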

> IllegalStateException in computing processing cost when there are predicates 
> on analytic output columns
> ---
>
> Key: IMPALA-13152
> URL: https://issues.apache.org/jira/browse/IMPALA-13152
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Quanlong Huang
>Assignee: Riza Suminto
>Priority: Major
>
> Saw an error in the following query when COMPUTE_PROCESSING_COST is on:
> {code:sql}
> create table tbl (a int, b int, c int);
> set COMPUTE_PROCESSING_COST=1;
> explain select a, b from (
>   select a, b, c,
> row_number() over(partition by a order by b desc) as latest
>   from tbl
> )b
> WHERE latest=1
> ERROR: IllegalStateException: Processing cost of PlanNode 01:TOP-N is invalid!
> {code}
> Exception in the logs:
> {noformat}
> I0611 13:04:37.192874 28004 jni-util.cc:321] 
> 264ee79bfb6ac031:42f8006c] java.lang.IllegalStateException: 
> Processing cost of PlanNode 01:TOP-N is invalid!
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:512)
> at 
> org.apache.impala.planner.PlanNode.computeRowConsumptionAndProductionToCost(PlanNode.java:1047)
> at 
> org.apache.impala.planner.PlanFragment.computeCostingSegment(PlanFragment.java:287)
> at 
> org.apache.impala.planner.Planner.computeProcessingCost(Planner.java:560)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1932)
> at 
> org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:2892)
> at 
> org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2676)
> at 
> org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
> at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175){noformat}
> Don't see the error if removing the predicate "latest=1".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13152) IllegalStateException in computing processing cost when there are predicates on analytic output columns

2024-06-11 Thread Riza Suminto (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854069#comment-17854069
 ] 

Riza Suminto commented on IMPALA-13152:
---

[~stigahuang] does this still happen after IMPALA-13119?

I tried a similar query below and it works:
{noformat}
Query: explain select item_sk, rk from (
select
  ss_item_sk item_sk, ss_sold_time_sk, ss_customer_sk,
  row_number()
  over (partition by ss_item_sk order by ss_sold_time_sk) rk
from store_sales
) b
where rk = 1
+-+
| Explain String |
+-+
| Max Per-Host Resource Reservation: Memory=28.00MB Threads=4 |
| Per-Host Resource Estimates: Memory=58MB |
| Analyzed query: SELECT item_sk, rk FROM (SELECT ss_item_sk item_sk, |
| ss_sold_time_sk, ss_customer_sk, row_number() OVER (PARTITION BY ss_item_sk |
| ORDER BY ss_sold_time_sk ASC) rk FROM tpcds_parquet.store_sales) b WHERE rk = |
| CAST(1 AS BIGINT) |
| |
| F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 |
| |  Per-Instance Resources: mem-estimate=4.20MB mem-reservation=4.00MB thread-reservation=1 |
| |  max-parallelism=1 segment-costs=[40262] cpu-comparison-result=6 [max(1 (self) vs 6 (sum children))] |
| PLAN-ROOT SINK |
| |  output exprs: ss_item_sk, row_number() |
| |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0 cost=35950 |
| | |
| 06:EXCHANGE [UNPARTITIONED] |
| |  mem-estimate=201.02KB mem-reservation=0B thread-reservation=0 |
| |  tuple-ids=5,4 row-size=20B cardinality=17.98K cost=4312 |
| |  in pipelines: 05(GETNEXT) |
| | |
| F01:PLAN FRAGMENT [HASH(ss_item_sk)] hosts=3 instances=3 (adjusted from 384) |
| Per-Instance Resources: mem-estimate=10.16MB mem-reservation=10.00MB thread-reservation=1 |
| max-parallelism=3 segment-costs=[146224, 77623] cpu-comparison-result=6 [max(3 (self) vs 6 (sum children))] |
| 03:SELECT |
| |  predicates: row_number() = CAST(1 AS BIGINT) |
| |  mem-estimate=0B mem-reservation=0B thread-reservation=0 |
| |  tuple-ids=5,4 row-size=20B cardinality=17.98K cost=17975 |
| |  in pipelines: 05(GETNEXT) |
| | |
| 02:ANALYTIC |
| |  functions: row_number() |
| |  partition by: ss_item_sk |
| |  order by: ss_sold_time_sk ASC |
| |  window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW |
| |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0 |
| |  tuple-ids=5,4 row-size=20B cardinality=17.98K cost=17975 |
| |  in pipelines: 05(GETNEXT

[jira] [Work started] (IMPALA-13150) Possible buffer overflow in StringVal::CopyFrom()

2024-06-11 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-13150 started by Daniel Becker.
--
> Possible buffer overflow in StringVal::CopyFrom()
> -
>
> Key: IMPALA-13150
> URL: https://issues.apache.org/jira/browse/IMPALA-13150
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> In {{{}StringVal::CopyFrom(){}}}, we take the 'len' parameter as a 
> {{{}size_t{}}}, which is usually a 64-bit unsigned integer. We pass it to the 
> constructor of {{{}StringVal{}}}, which takes it as an {{{}int{}}}, which is 
> usually a 32-bit signed integer. The constructor then allocates memory for 
> the length using the {{int}} value, but back in {{{}CopyFrom(){}}}, we copy 
> the buffer with the {{size_t}} length. If {{size_t}} is indeed 64 bits and 
> {{int}} is 32 bits, and the value is truncated, we may copy more bytes than 
> we have allocated the destination for. See 
> https://github.com/apache/impala/blob/ce8078204e5995277f79e226e26fe8b9eaca408b/be/src/udf/udf.cc#L546
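
The same narrowing hazard sketched in Java as an analogue (Java has no size_t, 
so this is illustrative rather than Impala's backend code): a 64-bit length 
silently truncated to 32 bits makes the allocation disagree with the copy 
length, so the conversion should fail loudly instead:
{code:java}
class NarrowingDemo {
  public static void main(String[] args) {
    long len = (1L << 32) + 16; // a 64-bit length, analogous to size_t
    int truncated = (int) len;  // silently becomes 16: an allocation sized with this
                                // would be far smaller than the intended copy
    System.out.println(truncated);
    try {
      Math.toIntExact(len);     // checked narrowing: throws instead of truncating
    } catch (ArithmeticException e) {
      System.out.println("length does not fit in an int: " + len);
    }
  }
}
{code}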



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-13150) Possible buffer overflow in StringVal::CopyFrom()

2024-06-11 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-13150:
---
Summary: Possible buffer overflow in StringVal::CopyFrom()  (was: Possible 
buffer overflow in StringVal)

> Possible buffer overflow in StringVal::CopyFrom()
> -
>
> Key: IMPALA-13150
> URL: https://issues.apache.org/jira/browse/IMPALA-13150
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> In {{{}StringVal::CopyFrom(){}}}, we take the 'len' parameter as a 
> {{{}size_t{}}}, which is usually a 64-bit unsigned integer. We pass it to the 
> constructor of {{{}StringVal{}}}, which takes it as an {{{}int{}}}, which is 
> usually a 32-bit signed integer. The constructor then allocates memory for 
> the length using the {{int}} value, but back in {{{}CopyFrom(){}}}, we copy 
> the buffer with the {{size_t}} length. If {{size_t}} is indeed 64 bits and 
> {{int}} is 32 bits, and the value is truncated, we may copy more bytes than 
> we have allocated the destination for. See 
> https://github.com/apache/impala/blob/ce8078204e5995277f79e226e26fe8b9eaca408b/be/src/udf/udf.cc#L546



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13152) IllegalStateException in computing processing cost when there are predicates on analytic output columns

2024-06-11 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853924#comment-17853924
 ] 

Quanlong Huang commented on IMPALA-13152:
-

Assigning this to [~rizaon], who knows more about this.

> IllegalStateException in computing processing cost when there are predicates 
> on analytic output columns
> ---
>
> Key: IMPALA-13152
> URL: https://issues.apache.org/jira/browse/IMPALA-13152
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Quanlong Huang
>Assignee: Riza Suminto
>Priority: Major
>
> Saw an error in the following query when COMPUTE_PROCESSING_COST is on:
> {code:sql}
> create table tbl (a int, b int, c int);
> set COMPUTE_PROCESSING_COST=1;
> explain select a, b from (
>   select a, b, c,
> row_number() over(partition by a order by b desc) as latest
>   from tbl
> )b
> WHERE latest=1
> ERROR: IllegalStateException: Processing cost of PlanNode 01:TOP-N is invalid!
> {code}
> Exception in the logs:
> {noformat}
> I0611 13:04:37.192874 28004 jni-util.cc:321] 
> 264ee79bfb6ac031:42f8006c] java.lang.IllegalStateException: 
> Processing cost of PlanNode 01:TOP-N is invalid!
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:512)
> at 
> org.apache.impala.planner.PlanNode.computeRowConsumptionAndProductionToCost(PlanNode.java:1047)
> at 
> org.apache.impala.planner.PlanFragment.computeCostingSegment(PlanFragment.java:287)
> at 
> org.apache.impala.planner.Planner.computeProcessingCost(Planner.java:560)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1932)
> at 
> org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:2892)
> at 
> org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2676)
> at 
> org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
> at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175){noformat}
> Don't see the error if removing the predicate "latest=1".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap

2024-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853919#comment-17853919
 ] 

ASF subversion and git services commented on IMPALA-12800:
--

Commit 800246add5fcb20c34a767870346f6ce255e41f9 in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=800246add ]

IMPALA-12800: Use HashMap for ExprSubstitutionMap lookups

Adds a HashMap to ExprSubstitutionMap to speed lookups while retaining
lists for correct ordering (ordering needs to match to SlotRef order).
Ignores duplicate inserts, preserving the old behavior that only the
first match would actually be usable; duplicates primarily show up as a
result of combining duplicate distinct and aggregate expressions, or
redundant nested aggregation (like the tests for IMPALA-10182).

Implements localHash and hashCode for Expr and related classes.

Avoids deep-cloning LHS Exprs in ExprSubstitutionMap as they're used for
lookup and not expected to be mutated.

Adds the many expressions test, which now runs in a handful of seconds.

Change-Id: Ic538a82c69ee1dd76981fbacf95289c9d00ea9fe
Reviewed-on: http://gerrit.cloudera.org:8080/21483
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Queries with many nested inline views see performance issues with 
> ExprSubstitutionMap
> -
>
> Key: IMPALA-12800
> URL: https://issues.apache.org/jira/browse/IMPALA-12800
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
> Attachments: impala12800repro.sql, impala12800schema.sql, 
> long_query_jstacks.tar.gz
>
>
> A user running a query with many layers of inline views saw a large amount of 
> time spent in analysis. 
>  
> {noformat}
> - Authorization finished (ranger): 7s518ms (13.134ms)
> - Value transfer graph computed: 7s760ms (241.953ms)
> - Single node plan created: 2m47s (2m39s)
> - Distributed plan created: 2m47s (7.430ms)
> - Lineage info computed: 2m47s (39.017ms)
> - Planning finished: 2m47s (672.518ms){noformat}
> In reproducing it locally, we found that most of the stacks end up in 
> ExprSubstitutionMap.
>  
> Here are the main stacks seen while running jstack every 3 seconds during a 
> 75 second execution:
> Location 1: (ExprSubstitutionMap::compose -> contains -> indexOf -> Expr 
> equals) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at java.util.ArrayList.indexOf(ArrayList.java:323)
>     at java.util.ArrayList.contains(ArrayList.java:306)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:120){noformat}
> Location 2:  (ExprSubstitutionMap::compose -> verify -> Expr equals) (9 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:126){noformat}
> Location 3: (ExprSubstitutionMap::combine -> verify -> Expr equals) (5 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.combine(ExprSubstitutionMap.java:143){noformat}
> Location 4:  (TupleIsNullPredicate.wrapExprs ->  Analyzer.isTrueWithNullSlots 
> -> FeSupport.EvalPredicate -> Thrift serialization) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at java.lang.StringCoding.encode(StringCoding.java:364)
>     at java.lang.String.getBytes(String.java:941)
>     at 
> org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:227)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:532)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:467)
>     at org.apache.impala.thrift.TClientRequest.write(TClientRequest.java:394)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:3034)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.wr

[jira] [Commented] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM

2024-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853921#comment-17853921
 ] 

ASF subversion and git services commented on IMPALA-13151:
--

Commit cce6b349f1103c167e2e9ef49fa181ede301b94f in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=cce6b349f ]

IMPALA-13151: Use MonotonicNanos to track test time

Uses MonotonicNanos to track test time rather than MonotonicStopWatch.
IMPALA-2407 updated MonotonicStopWatch to use a low-precision
implementation for performance, which on ARM in particular sometimes
results in undercounting time by a few microseconds. That's enough to
cause a failure in DataStreamTestSlowServiceQueue.TestPrioritizeEos.

Also uses SleepForMs and NANOS_PER_SEC rather than Kudu versions to
better match Impala code base.

Reproduced on ARM and tested the new implementation for several dozen
runs without failure.

Change-Id: I9beb63669c5bdd910e5f713ecd42551841e95400
Reviewed-on: http://gerrit.cloudera.org:8080/21497
Reviewed-by: Riza Suminto 
Tested-by: Impala Public Jenkins 
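
A Java analogue of the underlying idea (illustrative only; Impala's backend is 
C++): elapsed-time assertions in tests should use a monotonic, high-resolution 
clock, because coarse or adjustable clocks can undercount short waits:
{code:java}
class ElapsedTimeDemo {
  public static void main(String[] args) throws InterruptedException {
    long start = System.nanoTime(); // monotonic and high-resolution
    Thread.sleep(3000);             // the "slow" work under test
    long elapsed = System.nanoTime() - start;
    // Compare in nanoseconds, mirroring "3 * kNanosecondsPerSecond".
    if (elapsed <= 3L * 1_000_000_000L) {
      throw new AssertionError("undercounted elapsed time: " + elapsed + "ns");
    }
    // System.currentTimeMillis() would be the wrong tool here: it is
    // wall-clock based and coarse, so it can undercount in exactly this way.
  }
}
{code}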


> DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM
> -
>
> Key: IMPALA-13151
> URL: https://issues.apache.org/jira/browse/IMPALA-13151
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
>  Labels: broken-build
>
> The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is 
> failing with errors like this:
> {noformat}
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912
> Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), 
> actual: 269834 vs 30{noformat}
> So far, I only see failures on ARM jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-2407) Nested Types : Remove calls to clock_gettime for a 9x performance improvement on EC2

2024-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853922#comment-17853922
 ] 

ASF subversion and git services commented on IMPALA-2407:
-

Commit cce6b349f1103c167e2e9ef49fa181ede301b94f in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=cce6b349f ]

IMPALA-13151: Use MonotonicNanos to track test time

Uses MonotonicNanos to track test time rather than MonotonicStopWatch.
IMPALA-2407 updated MonotonicStopWatch to use a low-precision
implementation for performance, which on ARM in particular sometimes
results in undercounting time by a few microseconds. That's enough to
cause a failure in DataStreamTestSlowServiceQueue.TestPrioritizeEos.

Also uses SleepForMs and NANOS_PER_SEC rather than Kudu versions to
better match Impala code base.

Reproduced on ARM and tested the new implementation for several dozen
runs without failure.

Change-Id: I9beb63669c5bdd910e5f713ecd42551841e95400
Reviewed-on: http://gerrit.cloudera.org:8080/21497
Reviewed-by: Riza Suminto 
Tested-by: Impala Public Jenkins 


> Nested Types : Remove calls to clock_gettime for a 9x performance improvement 
> on EC2
> 
>
> Key: IMPALA-2407
> URL: https://issues.apache.org/jira/browse/IMPALA-2407
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.3.0
>Reporter: Mostafa Mokhtar
>Assignee: Jim Apple
>Priority: Critical
>  Labels: ec2, performance, ramp-up
> Fix For: Impala 2.5.0
>
> Attachments: q12Nested.tar.gz
>
>
> Queries against Nested types show that ~90% of the time is spent in 
> clock_gettime. 
> A cheaper accounting method can speed up Nested queries by 8-9x
> {code}
> select
>   count(*)
> from
>   customer.orders_string o,
>   o.lineitems_string l
> where
>   l_shipmode in ('MAIL', 'SHIP')
>   and l_commitdate < l_receiptdate
>   and l_shipdate < l_commitdate
>   and l_receiptdate >= '1994-01-01'
>   and l_receiptdate < '1995-01-01'
> group by
>   l_shipmode
> order by
>   l_shipmode
> {code}
> Schema
> +---------------+----------------------------------+---------+
> | name          | type                             | comment |
> +---------------+----------------------------------+---------+
> | c_custkey     | bigint                           |         |
> | c_name        | string                           |         |
> | c_address     | string                           |         |
> | c_nationkey   | bigint                           |         |
> | c_phone       | string                           |         |
> | c_acctbal     | double                           |         |
> | c_mktsegment  | string                           |         |
> | c_comment     | string                           |         |
> | orders_string | array<struct<                    |         |
> |               |   o_orderkey:bigint,             |         |
> |               |   o_orderstatus:string,          |         |
> |               |   o_totalprice:double,           |         |
> |               |   o_orderdate:string,            |         |
> |               |   o_orderpriority:string,        |         |
> |               |   o_clerk:string,                |         |
> |               |   o_shippriority:bigint,         |         |
> |               |   o_comment:string,              |         |
> |               |   lineitems_string:array<struct< |         |
> |               |     l_partkey:bigint,            |         |
> |               |     l_suppkey:bigint,            |         |
> |               |     l_linenumber:bigint,         |         |
> |               |     l_quantity:double,           |         |
> |               |     l_extendedprice:double,      |         |
> |               |     l_discount:double,           |         |
> |               |     l_tax:double,                |         |

[jira] [Commented] (IMPALA-10182) Rows with NULLs filtered out with duplicate columns in subquery select inside UNION ALL

2024-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853920#comment-17853920
 ] 

ASF subversion and git services commented on IMPALA-10182:
--

Commit 800246add5fcb20c34a767870346f6ce255e41f9 in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=800246add ]

IMPALA-12800: Use HashMap for ExprSubstitutionMap lookups

Adds a HashMap to ExprSubstitutionMap to speed lookups while retaining
lists for correct ordering (ordering needs to match to SlotRef order).
Ignores duplicate inserts, preserving the old behavior that only the
first match would actually be usable; duplicates primarily show up as a
result of combining duplicate distinct and aggregate expressions, or
redundant nested aggregation (like the tests for IMPALA-10182).

Implements localHash and hashCode for Expr and related classes.

Avoids deep-cloning LHS Exprs in ExprSubstitutionMap as they're used for
lookup and not expected to be mutated.

Adds the many expressions test, which now runs in a handful of seconds.

Change-Id: Ic538a82c69ee1dd76981fbacf95289c9d00ea9fe
Reviewed-on: http://gerrit.cloudera.org:8080/21483
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Rows with NULLs filtered out with duplicate columns in subquery select inside 
> UNION ALL
> ---
>
> Key: IMPALA-10182
> URL: https://issues.apache.org/jira/browse/IMPALA-10182
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Tim Armstrong
>Assignee: Aman Sinha
>Priority: Blocker
>  Labels: correctness
> Fix For: Impala 4.0.0
>
>
> Bug report from here - 
> https://community.cloudera.com/t5/Support-Questions/quot-union-all-quot-dropping-records-with-all-null-empty/m-p/303153#M221415
> Repro:
> {noformat}
> create database if not exists as_adventure;
> use as_adventure;
> CREATE tABLE IF NOT EXISTS
> as_adventure.t1 
> ( 
> productsubcategorykey INT, 
> productline STRING);
> insert into t1 values (1,'l1');
> insert into t1 values (2,'l1');
> insert into t1 values (1,'l2');
> insert into t1 values (3,'l3');
> insert into t1 values (null,'');
> select * from t1; 
> SELECT
> MIN(t_53.c_41)   c_41,
> CAST(NULL AS DOUBLE) c_43,
> CAST(NULL AS BIGINT) c_44,
> t_53.c2  c2,
> t_53.c3s0c3s0,
> t_53.c4  c4,
> t_53.c5s0c5s0
> FROM
> (   SELECT
> t.productsubcategorykey c_41,
> t.productline   c2,
> t.productline   c3s0,
> t.productsubcategorykey c4,
> t.productsubcategorykey c5s0
> FROM
> as_adventure.t1 t
> WHERE
> true
> GROUP BY
> 2,
> 3,
> 4,
> 5 ) t_53
> GROUP BY
> 4,
> 5,
> 6,
> 7
>  
> UNION ALL
> SELECT
> MIN(t_53.c_41)   c_41,
> CAST(NULL AS DOUBLE) c_43,
> CAST(NULL AS BIGINT) c_44,
> t_53.c2  c2,
> t_53.c3s0c3s0,
> t_53.c4  c4,
> t_53.c5s0c5s0
> FROM
> (   SELECT
> t.productsubcategorykey c_41,
> t.productline   c2,
> t.productline   c3s0,
> t.productsubcategorykey c4,
> t.productsubcategorykey c5s0
> FROM
> as_adventure.t1 t
> WHERE
> true
> GROUP BY
> 2,
> 3,
> 4,
> 5 ) t_53
> GROUP BY
> 4,
> 5,
> 6,
> 7
> {noformat}
> Somewhat similar to IMPALA-7957 in that the inferred predicates from the 
> column equivalences get placed in a Select node. It's a bit different in that 
> the NULLs that are filtered out from the predicates come from the base table.
> {noformat}
> +-+
> | Explain String |
> +-+
> | Max Per-Host Resource Reservation: Memory=136.02MB Threads=6 |
> | Per-Host Resource Estimates: Memory=576MB |
> | WARNI

[jira] [Commented] (IMPALA-11871) INSERT statement does not respect Ranger policies for HDFS

2024-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853923#comment-17853923
 ] 

ASF subversion and git services commented on IMPALA-11871:
--

Commit f7e629935b77f412bf74aeebd704af88f03de351 in impala's branch 
refs/heads/master from halim.kim
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f7e629935 ]

IMPALA-11871: Skip permissions loading and check on HDFS if Ranger is enabled

Before this patch, Impala checked whether the Impala service user had
the WRITE access to the target HDFS table/partition(s) during the
analysis of the INSERT and LOAD DATA statements in the legacy catalog
mode. The access levels of the corresponding HDFS table and partitions
were computed by the catalog server solely based on the HDFS permissions
and ACLs when the table and partitions were instantiated.

After this patch, we skip loading HDFS permissions and assume the
Impala service user has the READ_WRITE permission on all the HDFS paths
associated with the target table during query analysis when Ranger is
enabled. The assumption could be removed after Impala's implementation
of FsPermissionChecker could additionally take Ranger's policies of HDFS
into consideration when performing the check.

Testing:
 - Added end-to-end tests to verify Impala's behavior with respect to
   the INSERT and LOAD DATA statements when Ranger is enabled in the
   legacy catalog mode.

Change-Id: Id33c400fbe0c918b6b65d713b09009512835a4c9
Reviewed-on: http://gerrit.cloudera.org:8080/20221
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
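
A minimal sketch of the shape of this change (hypothetical names, not the 
actual patch): when Ranger is the authorization provider, assume READ_WRITE 
instead of deriving the access level from HDFS permissions and ACLs:
{code:java}
// Hypothetical sketch of skipping HDFS permission checks under Ranger.
enum AccessLevel { NONE, READ_ONLY, WRITE_ONLY, READ_WRITE }

class AccessLevelSketch {
  static AccessLevel computeAccessLevel(boolean rangerEnabled, String path) {
    if (rangerEnabled) {
      // Ranger policies are enforced at execution time; don't second-guess
      // them with POSIX permissions during analysis.
      return AccessLevel.READ_WRITE;
    }
    return loadAccessLevelFromHdfsPerms(path); // legacy permission/ACL-based path
  }

  static AccessLevel loadAccessLevelFromHdfsPerms(String path) {
    // Placeholder for the permission/ACL computation done in the legacy mode.
    return AccessLevel.READ_ONLY;
  }
}
{code}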


> INSERT statement does not respect Ranger policies for HDFS
> --
>
> Key: IMPALA-11871
> URL: https://issues.apache.org/jira/browse/IMPALA-11871
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> In a cluster with Ranger auth (and with legacy catalog mode), even if you 
> provide RWX to cm_hdfs -> all-path for the user impala, inserting into a 
> table whose HDFS POSIX permissions happen to exclude impala access will 
> result in an
> {noformat}
> "AnalysisException: Unable to INSERT into target table (default.t1) because 
> Impala does not have WRITE access to HDFS location: 
> hdfs://nightly-71x-vx-2.nightly-71x-vx.root.hwx.site:8020/warehouse/tablespace/external/hive/t1"{noformat}
>  
> {noformat}
> [root@nightly-71x-vx-3 ~]# hdfs dfs -getfacl 
> /warehouse/tablespace/external/hive/t1
> file: /warehouse/tablespace/external/hive/t1 
> owner: hive 
> group: supergroup
> user::rwx
> user:impala:rwx #effective:r-x
> group::rwx #effective:r-x
> mask::r-x
> other::---
> default:user::rwx
> default:user:impala:rwx
> default:group::rwx
> default:mask::rwx
> default:other::--- {noformat}
> ~~
> ANALYSIS
> Stack trace from a version of Cloudera's distribution of Impala (impalad 
> version 3.4.0-SNAPSHOT RELEASE (build 
> {*}db20b59a093c17ea4699117155d58fe874f7d68f{*})):
> {noformat}
> at 
> org.apache.impala.catalog.FeFsTable$Utils.checkWriteAccess(FeFsTable.java:585)
> at 
> org.apache.impala.analysis.InsertStmt.analyzeWriteAccess(InsertStmt.java:545)
> at org.apache.impala.analysis.InsertStmt.analyze(InsertStmt.java:391)
> at 
> org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:463)
> at 
> org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:426)
> at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1570)
> at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1536)
> at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1506)
> at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:155){noformat}
> The exception occurs at analysis time, so I tested and succeeded in writing 
> directly into the said directory.
> {noformat}
> [root@nightly-71x-vx-3 ~]# hdfs dfs -touchz 
> /warehouse/tablespace/external/hive/t1/test
> [root@nightly-71x-vx-3 ~]# hdfs dfs -ls 
> /warehouse/tablespace/external/hive/t1/
> Found 8 items
> rw-rw---+ 3 hive supergroup 417 2023-01-27 17:37 
> /warehouse/tablespace/external/hive/t1/00_0
> rw-rw---+ 3 hive supergroup 417 2023-01-27 17:44 
> /warehouse/tablespace/external/hive/t1/00_0_copy_1
> rw-rw---+ 3 hive supergroup 417 2023-01-27 17:49 
> /warehouse/tablespace/external/hive/t1/00_0_copy_2
> rw-rw---+ 3 hive supergroup 417 2023-01-27 17:53 
> /warehouse/tablespace/external/hive/t1/00_0_copy_3
> rw-rw---+ 3 im

[jira] [Created] (IMPALA-13152) IllegalStateException in computing processing cost when there are predicates on analytic output columns

2024-06-10 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13152:
---

 Summary: IllegalStateException in computing processing cost when 
there are predicates on analytic output columns
 Key: IMPALA-13152
 URL: https://issues.apache.org/jira/browse/IMPALA-13152
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Reporter: Quanlong Huang
Assignee: Riza Suminto


Saw an error in the following query when COMPUTE_PROCESSING_COST is on:
{code:sql}
create table tbl (a int, b int, c int);

set COMPUTE_PROCESSING_COST=1;

explain select a, b from (
  select a, b, c,
row_number() over(partition by a order by b desc) as latest
  from tbl
)b
WHERE latest=1

ERROR: IllegalStateException: Processing cost of PlanNode 01:TOP-N is invalid!
{code}
Exception in the logs:
{noformat}
I0611 13:04:37.192874 28004 jni-util.cc:321] 264ee79bfb6ac031:42f8006c] 
java.lang.IllegalStateException: Processing cost of PlanNode 01:TOP-N is 
invalid!
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:512)
at 
org.apache.impala.planner.PlanNode.computeRowConsumptionAndProductionToCost(PlanNode.java:1047)
at 
org.apache.impala.planner.PlanFragment.computeCostingSegment(PlanFragment.java:287)
at 
org.apache.impala.planner.Planner.computeProcessingCost(Planner.java:560)
at 
org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1932)
at 
org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:2892)
at 
org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2676)
at 
org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
at 
org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
at 
org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175){noformat}
Don't see the error if removing the predicate "latest=1".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Commented] (IMPALA-13093) Insert into Huawei OBS table failed

2024-06-10 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853843#comment-17853843
 ] 

Quanlong Huang commented on IMPALA-13093:
-

It seems adding this to hdfs-site.xml can also fix the issue:
{code:xml}
<property>
  <name>fs.obs.file.visibility.enable</name>
  <value>true</value>
</property>
{code}
I'll check whether OBS returns the real block size.
CC [~michaelsmith] [~eyizoha]

> Insert into Huawei OBS table failed
> ---
>
> Key: IMPALA-13093
> URL: https://issues.apache.org/jira/browse/IMPALA-13093
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.3.0
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> Inserting into a table that uses Huawei OBS (Object Storage Service) as the 
> storage will fail with the following error:
> {noformat}
> Query: insert into test_obs1 values (1, 'abc')
> ERROR: Failed to get info on temporary HDFS file: 
> obs://obs-test-ee93/input/test_obs1/_impala_insert_staging/fe4ac1be6462a13f_362a9b5b/.fe4ac1be6462a13f-362a9b5b_1213692075_dir//fe4ac1be6462a13f-362a9b5b_375832652_data.0.txt
> Error(2): No such file or directory {noformat}
> Looking into the logs:
> {noformat}
> I0516 16:40:55.663640 18922 status.cc:129] fe4ac1be6462a13f:362a9b5b] 
> Failed to get info on temporary HDFS file: 
> obs://obs-test-ee93/input/test_obs1/_impala_insert_staging/fe4ac1be6462a13f_362a9b5b/.fe4ac1be6462a13f-362a9b5b_1213692075_dir//fe4ac1be6462a13f-362a9b5b_375832652_data.0.txt
> Error(2): No such file or directory
> @   0xfc6d44  impala::Status::Status()
> @  0x1c42020  impala::HdfsTableSink::CreateNewTmpFile()
> @  0x1c44357  impala::HdfsTableSink::InitOutputPartition()
> @  0x1c4988a  impala::HdfsTableSink::GetOutputPartition()
> @  0x1c46569  impala::HdfsTableSink::Send()
> @  0x14ee25f  impala::FragmentInstanceState::ExecInternal()
> @  0x14efca3  impala::FragmentInstanceState::Exec()
> @  0x148dc4c  impala::QueryState::ExecFInstance()
> @  0x1b3bab9  impala::Thread::SuperviseThread()
> @  0x1b3cdb1  boost::detail::thread_data<>::run()
> @  0x2474a87  thread_proxy
> @ 0x7fe5a562dea5  start_thread
> @ 0x7fe5a25ddb0d  __clone{noformat}
> Note that impalad is started with {{--symbolize_stacktrace=true}} so the 
> stacktrace has symbols.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13077) Equality predicate on partition column and uncorrelated subquery doesn't reduce the cardinality estimate

2024-06-10 Thread Riza Suminto (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853842#comment-17853842
 ] 

Riza Suminto commented on IMPALA-13077:
---

Looks like this is a bug in how lhsNdv and rhsNdv are calculated. In the 
current code, if either the NDV or the cardinality of an equality expression 
is unknown (-1), getSemiJoinCardinality will skip that expression:

[https://github.com/apache/impala/blob/e7dac008bbafb20e4c7d15d46f2bac9a757f/fe/src/main/java/org/apache/impala/planner/JoinNode.java#L720-L726]

If the NDV is unknown but the cardinality is known, that code should assume 
the cardinality as the NDV instead. I tested that hack and confirmed through 
the LOG that it lowers the join cardinality.
{code:java}
I0610 17:09:25.739796 3972670 JoinNode.java:719] 
774dd75ed2b1fc53:c78b86b2] eqJoinConjuncts_.size=1
I0610 17:09:25.739863 3972670 JoinNode.java:755] 
774dd75ed2b1fc53:c78b86b2] getSemiJoinCardinality calculate selectivity 
for (ss_sold_date_sk = min(d_date_sk)) as 5.482456140350877E-4
I0610 17:09:25.739918 3972670 JoinNode.java:760] 
774dd75ed2b1fc53:c78b86b2] getSemiJoinCardinality has 
minSelectivity=5.482456140350877E-4
I0610 17:09:25.739933 3972670 JoinNode.java:762] 
774dd75ed2b1fc53:c78b86b2] Changed cardinality from 2880404 to 1579
I0610 17:09:25.739966 3972670 JoinNode.java:866] 
774dd75ed2b1fc53:c78b86b2] stats Join: cardinality=1579{code}
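
A sketch of that suggested fallback (hypothetical names, not the actual 
JoinNode code): since NDV can never exceed cardinality, a known cardinality is 
a conservative stand-in when the NDV is unknown:
{code:java}
class NdvFallbackSketch {
  // -1 means "unknown" for both stats, as in the planner.
  static long effectiveNdv(long ndv, long cardinality) {
    if (ndv == -1 && cardinality != -1) {
      return cardinality; // NDV <= cardinality always holds, so this is safe
    }
    return ndv;
  }

  public static void main(String[] args) {
    System.out.println(effectiveNdv(-1, 1579)); // falls back to 1579
    System.out.println(effectiveNdv(42, 1579)); // keeps the known NDV
  }
}
{code}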

> Equality predicate on partition column and uncorrelated subquery doesn't 
> reduce the cardinality estimate
> 
>
> Key: IMPALA-13077
> URL: https://issues.apache.org/jira/browse/IMPALA-13077
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> Let's say 'part_tbl' is a partitioned table. Its partition key is 'part_key'. 
> Consider the following query:
> {code:sql}
> select xxx from part_tbl
> where part_key=(select ... from dim_tbl);
> {code}
> Its query plan is a JoinNode with two ScanNodes. When estimating the 
> cardinality of the JoinNode, the planner is not aware that 'part_key' is the 
> partition column and the cardinality of the JoinNode should not be larger 
> than the max row count across partitions.
> The recent work in IMPALA-12018 (Consider runtime filter for cardinality 
> reduction) helps in some cases since there are runtime filters on the 
> partition column. But there are still some cases that we overestimate the 
> cardinality. For instance, 'ss_sold_date_sk' is the only partition key of 
> tpcds.store_sales. The following query
> {code:sql}
> select count(*) from tpcds.store_sales
> where ss_sold_date_sk=(
>   select min(d_date_sk) + 1000 from tpcds.date_dim);{code}
> has query plan:
> {noformat}
> +-+
> | Explain String  |
> +-+
> | Max Per-Host Resource Reservation: Memory=18.94MB Threads=6 |
> | Per-Host Resource Estimates: Memory=243MB   |
> | |
> | PLAN-ROOT SINK  |
> | |   |
> | 09:AGGREGATE [FINALIZE] |
> | |  output: count:merge(*)   |
> | |  row-size=8B cardinality=1|
> | |   |
> | 08:EXCHANGE [UNPARTITIONED] |
> | |   |
> | 04:AGGREGATE|
> | |  output: count(*) |
> | |  row-size=8B cardinality=1|
> | |   |
> | 03:HASH JOIN [LEFT SEMI JOIN, BROADCAST]|
> | |  hash predicates: ss_sold_date_sk = min(d_date_sk) + 1000 |
> | |  runtime filters: RF000 <- min(d_date_sk) + 1000  |
> | |  row-size=4B cardinality=2.88M < Should be max(numRows) across 
> partitions
> | |   |
> | |--07:EXCHANGE [BROADCAST]  |
> | |  ||
> | |  06:AGGREGATE [FINALIZE]  |
> | |  |  output: min:merge(d_date_sk) 

[jira] [Updated] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM

2024-06-10 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-13151:
---
Affects Version/s: Impala 4.5.0
   (was: Impala 4.4.0)

> DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM
> -
>
> Key: IMPALA-13151
> URL: https://issues.apache.org/jira/browse/IMPALA-13151
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
>  Labels: broken-build
>
> The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is 
> failing with errors like this:
> {noformat}
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912
> Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), 
> actual: 269834 vs 30{noformat}
> So far, I only see failures on ARM jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM

2024-06-10 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853824#comment-17853824
 ] 

Michael Smith commented on IMPALA-13151:


Oh, more likely that MonotonicStopWatch is less precise because 
https://github.com/apache/impala/blob/4.4.0/be/src/util/stopwatch.h#L159-L163.

> DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM
> -
>
> Key: IMPALA-13151
> URL: https://issues.apache.org/jira/browse/IMPALA-13151
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
>  Labels: broken-build
>
> The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is 
> failing with errors like this:
> {noformat}
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912
> Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), 
> actual: 269834 vs 30{noformat}
> So far, I only see failures on ARM jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Work started] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM

2024-06-10 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-13151 started by Michael Smith.
--
> DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM
> -
>
> Key: IMPALA-13151
> URL: https://issues.apache.org/jira/browse/IMPALA-13151
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
>  Labels: broken-build
>
> The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is 
> failing with errors like this:
> {noformat}
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912
> Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), 
> actual: 269834 vs 30{noformat}
> So far, I only see failures on ARM jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM

2024-06-10 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853778#comment-17853778
 ] 

Michael Smith commented on IMPALA-13151:


I'm tempted to make that a fuzzy comparison. Maybe the sleep method used for 
debug actions is slightly less precise than the timer.
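
A minimal sketch of such a fuzzy bound, in illustrative Java (the real test is
C++/GTest; the numbers and the 1% slack are assumptions, not the actual fix):
{code:java}
// Hedged sketch: accept an elapsed time slightly below the expected floor,
// tolerating coarse clocks or imprecise sleeps.
public class FuzzyElapsedCheck {
  static boolean atLeastWithTolerance(long elapsedNanos, long expectedNanos, double tolerance) {
    return elapsedNanos >= (long) (expectedNanos * (1.0 - tolerance));
  }

  public static void main(String[] args) {
    long expected = 3_000_000_000L;  // 3 seconds in nanoseconds
    long measured = 2_999_269_834L;  // hypothetically a hair short, as seen on ARM
    System.out.println(atLeastWithTolerance(measured, expected, 0.01));  // true with 1% slack
  }
}{code}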

> DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM
> -
>
> Key: IMPALA-13151
> URL: https://issues.apache.org/jira/browse/IMPALA-13151
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
>  Labels: broken-build
>
> The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is 
> failing with errors like this:
> {noformat}
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912
> Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), 
> actual: 269834 vs 30{noformat}
> So far, I only see failures on ARM jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM

2024-06-10 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-13151:
--

Assignee: Michael Smith

> DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM
> -
>
> Key: IMPALA-13151
> URL: https://issues.apache.org/jira/browse/IMPALA-13151
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
>  Labels: broken-build
>
> The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is 
> failing with errors like this:
> {noformat}
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912
> Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), 
> actual: 269834 vs 30{noformat}
> So far, I only see failures on ARM jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM

2024-06-10 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13151:
--

 Summary: DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on 
ARM
 Key: IMPALA-13151
 URL: https://issues.apache.org/jira/browse/IMPALA-13151
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 4.4.0
Reporter: Joe McDonnell


The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is 
failing with errors like this:
{noformat}
/data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912
Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), 
actual: 269834 vs 30{noformat}
So far, I only see failures on ARM jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-13126) ReloadEvent.isOlderEvent() should hold the table read lock

2024-06-10 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Hemanth Gantasala updated IMPALA-13126:
---
Labels: catalog-2024  (was: )

> ReloadEvent.isOlderEvent() should hold the table read lock
> --
>
> Key: IMPALA-13126
> URL: https://issues.apache.org/jira/browse/IMPALA-13126
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Sai Hemanth Gantasala
>Priority: Critical
>  Labels: catalog-2024
>
> Saw an exception like this:
> {noformat}
> E0601 09:11:25.275251   246 MetastoreEventsProcessor.java:990] Unexpected 
> exception received while processing event
> Java exception follows:
> java.util.ConcurrentModificationException
> at java.util.HashMap$HashIterator.nextNode(HashMap.java:1469)
> at java.util.HashMap$ValueIterator.next(HashMap.java:1498)
> at 
> org.apache.impala.catalog.FeFsTable$Utils.getPartitionFromThriftPartitionSpec(FeFsTable.java:616)
> at 
> org.apache.impala.catalog.HdfsTable.getPartitionFromThriftPartitionSpec(HdfsTable.java:597)
> at 
> org.apache.impala.catalog.Catalog.getHdfsPartition(Catalog.java:511)
> at 
> org.apache.impala.catalog.Catalog.getHdfsPartition(Catalog.java:489)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.isPartitionLoadedAfterEvent(CatalogServiceCatalog.java:4024)
> at 
> org.apache.impala.catalog.events.MetastoreEvents$ReloadEvent.isOlderEvent(MetastoreEvents.java:2754)
> at 
> org.apache.impala.catalog.events.MetastoreEvents$ReloadEvent.processTableEvent(MetastoreEvents.java:2729)
> at 
> org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.process(MetastoreEvents.java:1107)
> at 
> org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:531)
> at 
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1164)
> at 
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:972)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750) {noformat}
> For a partition-level RELOAD event, ReloadEvent.isOlderEvent() needs to check 
> whether the corresponding partition was reloaded after the event. This should 
> be done while holding the table read lock. Otherwise, EventProcessor could 
> hit the error above when there are concurrent DDLs/DMLs modifying the 
> partition list.
> CC [~VenuReddy]
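
A minimal sketch of the locking pattern being proposed (names are illustrative
stand-ins, not the actual MetastoreEvents.java code):
{code:java}
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hedged sketch: take the table's read lock before walking partition state,
// so concurrent DDLs/DMLs cannot mutate the partition list mid-iteration.
class ReloadEventSketch {
  private final ReentrantReadWriteLock tableLock = new ReentrantReadWriteLock();

  boolean isOlderEvent(List<Long> partitionReloadVersions, long eventVersion) {
    tableLock.readLock().lock();
    try {
      for (long reloadedAt : partitionReloadVersions) {
        if (reloadedAt >= eventVersion) return true;  // partition reloaded after the event
      }
      return false;
    } finally {
      tableLock.readLock().unlock();  // always release, even on exception
    }
  }
}{code}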



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13146) Javascript tests sometimes fail to download NodeJS

2024-06-10 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853729#comment-17853729
 ] 

ASF subversion and git services commented on IMPALA-13146:
--

Commit e7dac008bbafb20e4c7d15d46f2bac9a757f in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e7dac008b ]

IMPALA-13146: Download NodeJS from native toolchain

Some test runs have had issues downloading the NodeJS
tarball from the nodejs servers. This changes the
test to download from our native toolchain to make this
more reliable. This means that future upgrades to
NodeJS will need to upload new tarballs to the native
toolchain.

Testing:
 - Ran x86_64/ARM javascript tests

Change-Id: I1def801469cb68633e89b4a0f3c07a771febe599
Reviewed-on: http://gerrit.cloudera.org:8080/21494
Tested-by: Impala Public Jenkins 
Reviewed-by: Surya Hebbar 
Reviewed-by: Wenzhe Zhou 


> Javascript tests sometimes fail to download NodeJS
> --
>
> Key: IMPALA-13146
> URL: https://issues.apache.org/jira/browse/IMPALA-13146
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky
>
> For automated tests, sometimes the Javascript tests fail to download NodeJS:
> {noformat}
> 01:37:16 Fetching NodeJS v16.20.2-linux-x64 binaries ...
> 01:37:16   % Total% Received % Xferd  Average Speed   TimeTime 
> Time  Current
> 01:37:16  Dload  Upload   Total   Spent
> Left  Speed
> 01:37:16 
>   0 00 00 0  0  0 --:--:-- --:--:-- --:--:-- 0
>   0 00 00 0  0  0 --:--:--  0:00:01 --:--:-- 0
>   0 00 00 0  0  0 --:--:--  0:00:02 --:--:-- 0
>   0 21.5M0   9020 0293  0 21:23:04  0:00:03 21:23:01   293
> ...
>  30 21.5M   30 6776k    0     0  50307      0  0:07:28  0:02:17  0:05:11 23826
> 01:39:34 curl: (18) transfer closed with 15617860 bytes remaining to 
> read{noformat}
> If this keeps happening, we should mirror the NodeJS binary on the 
> native-toolchain s3 bucket.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13150) Possible buffer overflow in StringVal

2024-06-10 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-13150:
--

 Summary: Possible buffer overflow in StringVal
 Key: IMPALA-13150
 URL: https://issues.apache.org/jira/browse/IMPALA-13150
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Daniel Becker
Assignee: Daniel Becker


In {{StringVal::CopyFrom()}}, we take the 'len' parameter as a {{size_t}}, which 
is usually a 64-bit unsigned integer. We pass it to the constructor of 
{{StringVal}}, which takes it as an {{int}}, usually a 32-bit signed integer. The 
constructor then allocates memory for the length using the {{int}} value, but 
back in {{CopyFrom()}} we copy the buffer with the {{size_t}} length. If 
{{size_t}} is indeed 64 bits and {{int}} is 32 bits, and the value is truncated, 
we may copy more bytes than what we have allocated the destination for. See 
https://github.com/apache/impala/blob/ce8078204e5995277f79e226e26fe8b9eaca408b/be/src/udf/udf.cc#L546
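
The hazard in miniature, in illustrative Java (the actual code is C++, where
the narrowing from {{size_t}} to {{int}} is implicit; Java forces the cast,
which makes the truncation visible):
{code:java}
// Hedged sketch of the 64-bit -> 32-bit length truncation described above.
public class TruncationSketch {
  public static void main(String[] args) {
    long len = (1L << 32) + 16;   // a 64-bit length just above 4 GiB
    int allocLen = (int) len;     // truncates to 16: the allocation would be tiny
    System.out.println(allocLen); // 16 -- yet the copy would still use all of 'len'
  }
}{code}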



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13149) Show JVM info in the WebUI

2024-06-09 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13149:
---

 Summary: Show JVM info in the WebUI
 Key: IMPALA-13149
 URL: https://issues.apache.org/jira/browse/IMPALA-13149
 Project: IMPALA
  Issue Type: New Feature
Reporter: Quanlong Huang


It'd be helpful to show the JVM info in the WebUI, e.g. show the output of 
"java -version":
{code:java}
openjdk version "1.8.0_412"
OpenJDK Runtime Environment (build 1.8.0_412-b08)
OpenJDK 64-Bit Server VM (build 25.412-b08, mixed mode){code}
On nodes that only have a JRE deployed, we'd like to deploy the same version of 
the JDK to perform heap dumps (jmap), so showing the JVM info in the WebUI would 
be useful.
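
One way the info could be gathered, sketched with standard JVM system
properties (illustrative only; not the actual WebUI code):
{code:java}
// Hedged sketch: the running JVM already exposes its version via system
// properties, so the WebUI could render these instead of shelling out.
public class JvmInfoSketch {
  public static void main(String[] args) {
    System.out.println(System.getProperty("java.runtime.name"));     // e.g. OpenJDK Runtime Environment
    System.out.println(System.getProperty("java.runtime.version"));  // e.g. 1.8.0_412-b08
    System.out.println(System.getProperty("java.vm.name"));          // e.g. OpenJDK 64-Bit Server VM
    System.out.println(System.getProperty("java.vm.version"));       // e.g. 25.412-b08
  }
}{code}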



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-13148) Show the number of in-progress Catalog operations

2024-06-09 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-13148:

Attachment: Selection_123.png
Selection_122.png

> Show the number of in-progress Catalog operations
> -
>
> Key: IMPALA-13148
> URL: https://issues.apache.org/jira/browse/IMPALA-13148
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Quanlong Huang
>Priority: Major
>  Labels: newbie, ramp-up
> Attachments: Selection_122.png, Selection_123.png
>
>
> In the /operations page of the catalogd WebUI, the list of In-progress Catalog 
> Operations is shown. It'd be helpful to also show the number of such 
> operations, similar to how the /queries page of the coordinator WebUI shows, 
> e.g., "100 queries in flight".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13148) Show the number of in-progress Catalog operations

2024-06-09 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13148:
---

 Summary: Show the number of in-progress Catalog operations
 Key: IMPALA-13148
 URL: https://issues.apache.org/jira/browse/IMPALA-13148
 Project: IMPALA
  Issue Type: Improvement
Reporter: Quanlong Huang
 Attachments: Selection_122.png, Selection_123.png

In the /operations page of the catalogd WebUI, the list of In-progress Catalog 
Operations is shown. It'd be helpful to also show the number of such operations, 
similar to how the /queries page of the coordinator WebUI shows, e.g., "100 
queries in flight".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-13143) TestCatalogdHA.test_catalogd_failover_with_sync_ddl times out expecting query failure

2024-06-09 Thread Wenzhe Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenzhe Zhou resolved IMPALA-13143.
--
Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> TestCatalogdHA.test_catalogd_failover_with_sync_ddl times out expecting query 
> failure
> -
>
> Key: IMPALA-13143
> URL: https://issues.apache.org/jira/browse/IMPALA-13143
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Wenzhe Zhou
>Priority: Critical
>  Labels: broken-build, flaky
> Fix For: Impala 4.5.0
>
>
> The new TestCatalogdHA.test_catalogd_failover_with_sync_ddl test is failing 
> intermittently with:
> {noformat}
> custom_cluster/test_catalogd_ha.py:472: in 
> test_catalogd_failover_with_sync_ddl
> self.wait_for_state(handle, QueryState.EXCEPTION, 30, client=client)
> common/impala_test_suite.py:1216: in wait_for_state
> self.wait_for_any_state(handle, [expected_state], timeout, client)
> common/impala_test_suite.py:1234: in wait_for_any_state
> raise Timeout(timeout_msg)
> E   Timeout: query '9d49ab6360f6cbc5:4826a796' did not reach one of 
> the expected states [5], last known state 4{noformat}
> This means the query succeeded even though we expected it to fail. This is 
> currently limited to s3 jobs. In a different test, we saw issues because s3 
> is slower (see IMPALA-12616).
> This test was introduced by IMPALA-13134: 
> https://github.com/apache/impala/commit/70b7b6a78d49c30933d79e0a1c2a725f7e0a3e50



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12266) Sporadic failure after migrating a table to Iceberg

2024-06-09 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853512#comment-17853512
 ] 

Fang-Yu Rao commented on IMPALA-12266:
--

Encountered this failure again at 
[https://jenkins.impala.io/job/ubuntu-20.04-dockerised-tests/1873/testReport/junit/query_test.test_iceberg/TestIcebergTable/test_convert_table_protocol__beeswax___exec_optiontest_replan___1___batch_size___0___num_nodes___0___disable_codegen_rows_threshold___0___disable_codegen___False___abort_on_error___1___exec_single_node_rows_threshold___0table_format__parquet_none_/]
  in a Jenkins job against [https://gerrit.cloudera.org/c/21160/], which did 
not change Impala's behavior in this area.

> Sporadic failure after migrating a table to Iceberg
> ---
>
> Key: IMPALA-12266
> URL: https://issues.apache.org/jira/browse/IMPALA-12266
> Project: IMPALA
>  Issue Type: Bug
>  Components: fe
>Affects Versions: Impala 4.2.0
>Reporter: Tamas Mate
>Assignee: Gabor Kaszab
>Priority: Critical
>  Labels: impala-iceberg
> Attachments: 
> catalogd.bd40020df22b.invalid-user.log.INFO.20230704-181939.1, 
> impalad.6c0f48d9ce66.invalid-user.log.INFO.20230704-181940.1
>
>
> TestIcebergTable.test_convert_table test failed in a recent verify job's 
> dockerised tests:
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/7629
> {code:none}
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EINNER EXCEPTION: 
> EMESSAGE: AnalysisException: Failed to load metadata for table: 
> 'parquet_nopartitioned'
> E   CAUSED BY: TableLoadingException: Could not load table 
> test_convert_table_cdba7383.parquet_nopartitioned from catalog
> E   CAUSED BY: TException: 
> TGetPartialCatalogObjectResponse(status:TStatus(status_code:GENERAL, 
> error_msgs:[NullPointerException: null]), lookup_status:OK)
> {code}
> {code:none}
> E0704 19:09:22.980131   833 JniUtil.java:183] 
> 7145c21173f2c47b:2579db55] Error in Getting partial catalog object of 
> TABLE:test_convert_table_cdba7383.parquet_nopartitioned. Time spent: 49ms
> I0704 19:09:22.980309   833 jni-util.cc:288] 
> 7145c21173f2c47b:2579db55] java.lang.NullPointerException
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.replaceTableIfUnchanged(CatalogServiceCatalog.java:2357)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getOrLoadTable(CatalogServiceCatalog.java:2300)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.doGetPartialCatalogObject(CatalogServiceCatalog.java:3587)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3513)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3480)
>   at 
> org.apache.impala.service.JniCatalog.lambda$getPartialCatalogObject$11(JniCatalog.java:397)
>   at 
> org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
>   at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
>   at 
> org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
>   at 
> org.apache.impala.service.JniCatalogOp.execAndSerializeSilentStartAndFinish(JniCatalogOp.java:109)
>   at 
> org.apache.impala.service.JniCatalog.execAndSerializeSilentStartAndFinish(JniCatalog.java:238)
>   at 
> org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:396)
> I0704 19:09:22.980324   833 status.cc:129] 7145c21173f2c47b:2579db55] 
> NullPointerException: null
> @  0x1012f9f  impala::Status::Status()
> @  0x187f964  impala::JniUtil::GetJniExceptionMsg()
> @   0xfee920  impala::JniCall::Call<>()
> @   0xfccd0f  impala::Catalog::GetPartialCatalogObject()
> @   0xfb55a5  
> impala::CatalogServiceThriftIf::GetPartialCatalogObject()
> @   0xf7a691  
> impala::CatalogServiceProcessorT<>::process_GetPartialCatalogObject()
> @   0xf82151  impala::CatalogServiceProcessorT<>::dispatchCall()
> @   0xee330f  apache::thrift::TDispatchProcessor::process()
> @  0x1329246  
> apache::thrift::server::TAcceptQueueServer::Task::run()
> @  0x1315a89  impala::ThriftThread::RunRunnable()
> @  0x131773d  
> boost::detail::function::void_function_obj_invoker0<>::invoke()
> @  0x195ba8c  impala::Thread::SuperviseThread()
> @  0x195c895  boost::detail

[jira] [Commented] (IMPALA-13143) TestCatalogdHA.test_catalogd_failover_with_sync_ddl times out expecting query failure

2024-06-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853328#comment-17853328
 ] 

ASF subversion and git services commented on IMPALA-13143:
--

Commit bafd1903069163f38812d7fa42f9c4d2f7218fcf in impala's branch 
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=bafd19030 ]

IMPALA-13143: Fix flaky test_catalogd_failover_with_sync_ddl

The test_catalogd_failover_with_sync_ddl test, which was added to
custom_cluster/test_catalogd_ha.py in IMPALA-13134, failed on s3.
The test relies on specific timing: a sleep injected via a debug
action keeps the DDL query running while catalogd failover is
triggered. The failures were caused by catalogd restarting slowly
on s3, so the query finished before the failover was triggered.

This patch fixed the issue by increasing the sleep time for s3 builds
and other slow builds.

Testing:
 - Ran the test 100 times in a loop on s3.

Change-Id: I15bb6aae23a2f544067f993533e322969372ebd5
Reviewed-on: http://gerrit.cloudera.org:8080/21491
Reviewed-by: Riza Suminto 
Tested-by: Impala Public Jenkins 


> TestCatalogdHA.test_catalogd_failover_with_sync_ddl times out expecting query 
> failure
> -
>
> Key: IMPALA-13143
> URL: https://issues.apache.org/jira/browse/IMPALA-13143
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Wenzhe Zhou
>Priority: Critical
>  Labels: broken-build, flaky
>
> The new TestCatalogdHA.test_catalogd_failover_with_sync_ddl test is failing 
> intermittently with:
> {noformat}
> custom_cluster/test_catalogd_ha.py:472: in 
> test_catalogd_failover_with_sync_ddl
> self.wait_for_state(handle, QueryState.EXCEPTION, 30, client=client)
> common/impala_test_suite.py:1216: in wait_for_state
> self.wait_for_any_state(handle, [expected_state], timeout, client)
> common/impala_test_suite.py:1234: in wait_for_any_state
> raise Timeout(timeout_msg)
> E   Timeout: query '9d49ab6360f6cbc5:4826a796' did not reach one of 
> the expected states [5], last known state 4{noformat}
> This means the query succeeded even though we expected it to fail. This is 
> currently limited to s3 jobs. In a different test, we saw issues because s3 
> is slower (see IMPALA-12616).
> This test was introduced by IMPALA-13134: 
> https://github.com/apache/impala/commit/70b7b6a78d49c30933d79e0a1c2a725f7e0a3e50



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13134) DDL hang with SYNC_DDL enabled when Catalogd is changed to standby status

2024-06-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853329#comment-17853329
 ] 

ASF subversion and git services commented on IMPALA-13134:
--

Commit bafd1903069163f38812d7fa42f9c4d2f7218fcf in impala's branch 
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=bafd19030 ]

IMPALA-13143: Fix flaky test_catalogd_failover_with_sync_ddl

The test_catalogd_failover_with_sync_ddl test, which was added to
custom_cluster/test_catalogd_ha.py in IMPALA-13134, failed on s3.
The test relies on specific timing: a sleep injected via a debug
action keeps the DDL query running while catalogd failover is
triggered. The failures were caused by catalogd restarting slowly
on s3, so the query finished before the failover was triggered.

This patch fixed the issue by increasing the sleep time for s3 builds
and other slow builds.

Testing:
 - Ran the test 100 times in a loop on s3.

Change-Id: I15bb6aae23a2f544067f993533e322969372ebd5
Reviewed-on: http://gerrit.cloudera.org:8080/21491
Reviewed-by: Riza Suminto 
Tested-by: Impala Public Jenkins 


> DDL hang with SYNC_DDL enabled when Catalogd is changed to standby status
> -
>
> Key: IMPALA-13134
> URL: https://issues.apache.org/jira/browse/IMPALA-13134
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend, Catalog
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> Catalogd waits for the SYNC_DDL version when it processes a DDL with SYNC_DDL 
> enabled. If the status of Catalogd changes from active to standby while 
> CatalogServiceCatalog.waitForSyncDdlVersion() is running, the standby catalogd 
> does not receive catalog topic updates from the statestore. This causes the 
> catalogd thread to wait indefinitely and the DDL query to hang.
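
A minimal sketch of the hang-avoidance idea (illustrative names; the real fix
lives in CatalogServiceCatalog.waitForSyncDdlVersion()):
{code:java}
// Hedged sketch: a SYNC_DDL wait should also give up when this catalogd is no
// longer active, instead of waiting for topic updates a standby never gets.
public class SyncDdlWaitSketch {
  private final Object versionLock = new Object();
  private volatile long currentTopicVersion = 0;
  private volatile boolean active = true;

  boolean waitForSyncDdlVersion(long targetVersion) throws InterruptedException {
    synchronized (versionLock) {
      while (currentTopicVersion < targetVersion) {
        if (!active) return false;  // demoted to standby: stop waiting, fail the DDL
        versionLock.wait(1000);     // wake periodically to re-check state
      }
    }
    return true;
  }
}{code}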



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-13146) Javascript tests sometimes fail to download NodeJS

2024-06-07 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-13146:
--

Assignee: Joe McDonnell

> Javascript tests sometimes fail to download NodeJS
> --
>
> Key: IMPALA-13146
> URL: https://issues.apache.org/jira/browse/IMPALA-13146
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky
>
> For automated tests, sometimes the Javascript tests fail to download NodeJS:
> {noformat}
> 01:37:16 Fetching NodeJS v16.20.2-linux-x64 binaries ...
> 01:37:16   % Total% Received % Xferd  Average Speed   TimeTime 
> Time  Current
> 01:37:16  Dload  Upload   Total   Spent
> Left  Speed
> 01:37:16 
>   0 00 00 0  0  0 --:--:-- --:--:-- --:--:-- 0
>   0 00 00 0  0  0 --:--:--  0:00:01 --:--:-- 0
>   0 00 00 0  0  0 --:--:--  0:00:02 --:--:-- 0
>   0 21.5M0   9020 0293  0 21:23:04  0:00:03 21:23:01   293
> ...
>  30 21.5M   30 6776k    0     0  50307      0  0:07:28  0:02:17  0:05:11 23826
> 01:39:34 curl: (18) transfer closed with 15617860 bytes remaining to 
> read{noformat}
> If this keeps happening, we should mirror the NodeJS binary on the 
> native-toolchain s3 bucket.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-12680) NullPointerException in addHmsPartitions() during MetastoreEventsProcessor switch state from PAUSED to ACTIVE

2024-06-07 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Hemanth Gantasala resolved IMPALA-12680.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> NullPointerException in addHmsPartitions() during MetastoreEventsProcessor 
> switch state from PAUSED to ACTIVE
> -
>
> Key: IMPALA-12680
> URL: https://issues.apache.org/jira/browse/IMPALA-12680
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Sai Hemanth Gantasala
>Priority: Critical
> Fix For: Impala 4.5.0
>
> Attachments: alterTableAddPartitionProfile.txt, catalogd.INFO.gz, 
> impalad.INFO.gz
>
>
> Event processing is paused during a global INVALIDATE METADATA. 
> catalog_.isEventProcessingActive() returns false in this state.
>  
> If an AlterTableAddPartition statement is running during that time, we could 
> pass in a null value for the 'partitionToEventId' map in here:
> {code:java}
>   Map partitionToEventId = 
> catalog_.isEventProcessingActive() ?
>   Maps.newHashMap() : null;
>   List addedHmsPartitions = 
> addHmsPartitionsInTransaction(msClient,
>   tbl, allHmsPartitionsToAdd, partitionToEventId, ifNotExists); {code}
>  
> https://github.com/apache/impala/blob/fcda98ad99c13324e3ab09f2e92d331d0304bb8e/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4404-L4407
> If the global INVALIDATE METADATA finishes and event processing is back to 
> the ACTIVE state before the AlterTableAddPartition statement runs into 
> addHmsPartitions(), we will have a non-empty 'partitionToEventSubMap' here:
> {code:java}
> List events = 
> getNextMetastoreEventsIfEnabled(eventId,
> event -> AddPartitionEvent.ADD_PARTITION_EVENT_TYPE
> .equals(event.getEventType())
> && msTbl.getDbName().equalsIgnoreCase(event.getDbName())
> && 
> msTbl.getTableName().equalsIgnoreCase(event.getTableName()));
> Map partitionToEventSubMap = Maps.newHashMap();
> getPartitionsFromEvent(events, partitionToEventSubMap);
> // set the eventId to last one which we received so the we fetch the 
> next
> // set of events correctly
> if (!events.isEmpty()) {
>   eventId = events.get(events.size() - 1).getEventId();
> }
> if (partitionToEventSubMap.isEmpty()) {
>   // if partitions couldn't be fetched from events, use the one 
> returned by
>   // add_partitions call above.
>   addedHmsPartitions.addAll(addedPartitions);
> } else {
>   Preconditions.checkNotNull(partitionToEventId); // <-- This will 
> fail{code}
> https://github.com/apache/impala/blob/fcda98ad99c13324e3ab09f2e92d331d0304bb8e/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L5052-L5069
> Then the AlterTableAddPartition statement fails with NullPointerException:
> {code:java}
> I0104 02:46:32.075830  1010 jni-util.cc:302] 
> 4a4eae34f60ba947:b6b2bcfc] java.lang.NullPointerException
> at 
> com.google.common.base.Preconditions.checkNotNull(Preconditions.java:889)
> at 
> org.apache.impala.service.CatalogOpExecutor.addHmsPartitions(CatalogOpExecutor.java:5051)
> at 
> org.apache.impala.service.CatalogOpExecutor.addHmsPartitionsInTransaction(CatalogOpExecutor.java:5082)
> at 
> org.apache.impala.service.CatalogOpExecutor.alterTableAddPartitions(CatalogOpExecutor.java:4388)
> at 
> org.apache.impala.service.CatalogOpExecutor.alterTable(CatalogOpExecutor.java:1136)
> at 
> org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:450)
> at 
> org.apache.impala.service.JniCatalog.lambda$execDdl$3(JniCatalog.java:304)
> at 
> org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
> at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
> at 
> org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
> at 
> org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:100)
> at 
> org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:233)
> at 
> org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:247)
> at 
> org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:303){code}
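
The race in miniature (illustrative Java; the real logic lives in
CatalogOpExecutor and the event processor):
{code:java}
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: the null-or-not decision and the null check happen at two
// different times, and event processing can flip from PAUSED to ACTIVE in
// between -- exactly the window that trips checkNotNull().
public class PausedEventRaceSketch {
  static volatile boolean eventProcessingActive = false;  // paused: global INVALIDATE METADATA

  public static void main(String[] args) {
    Map<String, Long> partitionToEventId =
        eventProcessingActive ? new HashMap<>() : null;   // decided while PAUSED -> null

    eventProcessingActive = true;                         // processor returns to ACTIVE

    if (eventProcessingActive && partitionToEventId == null) {
      // mirrors Preconditions.checkNotNull(partitionToEventId) failing
      throw new NullPointerException("partitionToEventId was fixed at pause time");
    }
  }
}{code}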



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13147) Add support for limiting the concurrency of link jobs

2024-06-07 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13147:
--

 Summary: Add support for limiting the concurrency of link jobs
 Key: IMPALA-13147
 URL: https://issues.apache.org/jira/browse/IMPALA-13147
 Project: IMPALA
  Issue Type: Improvement
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


Link jobs can use a lot of memory due to the amount of debug info. The level of 
concurrency that is useful for compilation can be too high for linking. Running 
a link-heavy command like buildall.sh -skiptests can run out of memory from 
linking all of the backend tests / benchmarks.

It would be useful to be able to limit the number of concurrent link jobs. 
There are two basic approaches:

When using the ninja generator for CMake, ninja supports having job pools with 
limited parallelism. CMake has support for mapping link tasks to their own 
pool. Here is an example:
{noformat}
set(CMAKE_JOB_POOLS compilation_pool=24 link_pool=8)
set(CMAKE_JOB_POOL_COMPILE compilation_pool)
set(CMAKE_JOB_POOL_LINK link_pool){noformat}
The makefile generator does not have equivalent functionality, but we could do 
a more limited version where buildall.sh can split the -skiptests into two make 
invocations. The first does all the compilation with full parallelism 
(equivalent to -notests), and the second make invocation builds the backend 
tests / benchmarks with reduced parallelism.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13096) Cleanup Parser.jj for Calcite planner to only use supported syntax

2024-06-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853235#comment-17853235
 ] 

ASF subversion and git services commented on IMPALA-13096:
--

Commit 141f38197be2ca23757cb8b3f283cdb9dd62de47 in impala's branch 
refs/heads/master from Steve Carlin
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=141f38197 ]

IMPALA-12935: First pass on Calcite planner functions

This commit handles the first pass on getting functions to work
through the Calcite planner. Only basic functions will work with
this commit. Implicit conversions for parameters are not yet supported.
Custom UDFs are also not supported yet.

The ImpalaOperatorTable is used at validation time to check for
existence of the function name for Impala. At first, it will check
Calcite operators for the existence of the function name (A TODO,
IMPALA-13096, is that we need to remove non-supported names from the
parser file). It is preferable to use the Calcite Operator since
Calcite does some optimizations based on the Calcite Operator class.

If the name is not found within the Calcite Operators, a check is done
within the BuiltinsDb (TODO: IMPALA-13095 handle UDFs) for the function.
If found, a SqlOperator class is generated on the fly to handle this
function.

The validation process for Calcite includes a call into the operator
method "inferReturnType". This method will validate that there exists
a function that will handle the operands, and if so, return the "return
type" of the function. In this commit, we will assume that the Calcite
operators will match Impala functionality. In later commits, there
will be overrides where we will use Impala validation for operators
where Calcite's validation isn't good enough.

After validation is complete, the functions will be in a Calcite format.
After the rest of compilation (relnode conversion, optimization) is
complete, the function needs to be converted back into Impala form (the
Expr object) to eventually get it into its thrift request.

In this commit, all functions are converted into Expr starting in the
ImpalaProjectRel, since this is the RelNode where functions do their
thing. The RexCallConverter and RexLiteralConverter get called via the
CreateExprVisitor for this conversion.

Since Calcite is providing the analysis portion of the planning, there
is no need to go through Impala's Analyzer object. However, the Impala
planner requires Expr objects to be analyzed. To get around this, the
AnalyzedFunctionCallExpr and AnalyzedNullLiteral objects exist which
analyze the expression in the constructor. While this could potentially
be combined with the existing FunctionCallExpr and NullLiteral objects,
this fits in with the general plan to avoid changing "fe" Impala code
as much as we can until much later in the commit cycle. Also, there
will be other Analyzed*Expr classes created in the future, but this
commit is intended for basic function call expressions only.

One minor change to the parser is added with this commit. The Calcite
parser does not recognize the "string" datatype, so it has been added
here in Parser.jj and config.fmpp.

Change-Id: I2dd4e402d69ee10547abeeafe893164ffd789b88
Reviewed-on: http://gerrit.cloudera.org:8080/21357
Reviewed-by: Michael Smith 
Tested-by: Impala Public Jenkins 


> Cleanup Parser.jj for Calcite planner to only use supported syntax
> --
>
> Key: IMPALA-13096
> URL: https://issues.apache.org/jira/browse/IMPALA-13096
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Steve Carlin
>    Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13095) Handle UDFs in Calcite planner

2024-06-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853236#comment-17853236
 ] 

ASF subversion and git services commented on IMPALA-13095:
--

Commit 141f38197be2ca23757cb8b3f283cdb9dd62de47 in impala's branch 
refs/heads/master from Steve Carlin
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=141f38197 ]

IMPALA-12935: First pass on Calcite planner functions

This commit handles the first pass on getting functions to work
through the Calcite planner. Only basic functions will work with
this commit. Implicit conversions for parameters are not yet supported.
Custom UDFs are also not supported yet.

The ImpalaOperatorTable is used at validation time to check for
existence of the function name for Impala. At first, it will check
Calcite operators for the existence of the function name (A TODO,
IMPALA-13096, is that we need to remove non-supported names from the
parser file). It is preferable to use the Calcite Operator since
Calcite does some optimizations based on the Calcite Operator class.

If the name is not found within the Calcite Operators, a check is done
within the BuiltinsDb (TODO: IMPALA-13095 handle UDFs) for the function.
If found, a SqlOperator class is generated on the fly to handle this
function.

The validation process for Calcite includes a call into the operator
method "inferReturnType". This method will validate that there exists
a function that will handle the operands, and if so, return the "return
type" of the function. In this commit, we will assume that the Calcite
operators will match Impala functionality. In later commits, there
will be overrides where we will use Impala validation for operators
where Calcite's validation isn't good enough.

After validation is complete, the functions will be in a Calcite format.
After the rest of compilation (relnode conversion, optimization) is
complete, the function needs to be converted back into Impala form (the
Expr object) to eventually get it into its thrift request.

In this commit, all functions are converted into Expr starting in the
ImpalaProjectRel, since this is the RelNode where functions do their
thing. The RexCallConverter and RexLiteralConverter get called via the
CreateExprVisitor for this conversion.

Since Calcite is providing the analysis portion of the planning, there
is no need to go through Impala's Analyzer object. However, the Impala
planner requires Expr objects to be analyzed. To get around this, the
AnalyzedFunctionCallExpr and AnalyzedNullLiteral objects exist which
analyze the expression in the constructor. While this could potentially
be combined with the existing FunctionCallExpr and NullLiteral objects,
this fits in with the general plan to avoid changing "fe" Impala code
as much as we can until much later in the commit cycle. Also, there
will be other Analyzed*Expr classes created in the future, but this
commit is intended for basic function call expressions only.

One minor change to the parser is added with this commit. The Calcite
parser does not recognize the "string" datatype, so it has been added
here in Parser.jj and config.fmpp.

Change-Id: I2dd4e402d69ee10547abeeafe893164ffd789b88
Reviewed-on: http://gerrit.cloudera.org:8080/21357
Reviewed-by: Michael Smith 
Tested-by: Impala Public Jenkins 


> Handle UDFs in Calcite planner
> --
>
> Key: IMPALA-13095
> URL: https://issues.apache.org/jira/browse/IMPALA-13095
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Steve Carlin
>    Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12935) Allow function parsing for Impala Calcite planner

2024-06-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853234#comment-17853234
 ] 

ASF subversion and git services commented on IMPALA-12935:
--

Commit 141f38197be2ca23757cb8b3f283cdb9dd62de47 in impala's branch 
refs/heads/master from Steve Carlin
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=141f38197 ]

IMPALA-12935: First pass on Calcite planner functions

This commit handles the first pass on getting functions to work
through the Calcite planner. Only basic functions will work with
this commit. Implicit conversions for parameters are not yet supported.
Custom UDFs are also not supported yet.

The ImpalaOperatorTable is used at validation time to check for
existence of the function name for Impala. At first, it will check
Calcite operators for the existence of the function name (A TODO,
IMPALA-13096, is that we need to remove non-supported names from the
parser file). It is preferable to use the Calcite Operator since
Calcite does some optimizations based on the Calcite Operator class.

If the name is not found within the Calcite Operators, a check is done
within the BuiltinsDb (TODO: IMPALA-13095 handle UDFs) for the function.
If found, a SqlOperator class is generated on the fly to handle this
function.

The validation process for Calcite includes a call into the operator
method "inferReturnType". This method will validate that there exists
a function that will handle the operands, and if so, return the "return
type" of the function. In this commit, we will assume that the Calcite
operators will match Impala functionality. In later commits, there
will be overrides where we will use Impala validation for operators
where Calcite's validation isn't good enough.

After validation is complete, the functions will be in a Calcite format.
After the rest of compilation (relnode conversion, optimization) is
complete, the function needs to be converted back into Impala form (the
Expr object) to eventually get it into its thrift request.

In this commit, all functions are converted into Expr starting in the
ImpalaProjectRel, since this is the RelNode where functions do their
thing. The RexCallConverter and RexLiteralConverter get called via the
CreateExprVisitor for this conversion.

Since Calcite is providing the analysis portion of the planning, there
is no need to go through Impala's Analyzer object. However, the Impala
planner requires Expr objects to be analyzed. To get around this, the
AnalyzedFunctionCallExpr and AnalyzedNullLiteral objects exist which
analyze the expression in the constructor. While this could potentially
be combined with the existing FunctionCallExpr and NullLiteral objects,
this fits in with the general plan to avoid changing "fe" Impala code
as much as we can until much later in the commit cycle. Also, there
will be other Analyzed*Expr classes created in the future, but this
commit is intended for basic function call expressions only.

One minor change to the parser is added with this commit. The Calcite
parser does not recognize the "string" datatype, so it has been added
here in Parser.jj and config.fmpp.

Change-Id: I2dd4e402d69ee10547abeeafe893164ffd789b88
Reviewed-on: http://gerrit.cloudera.org:8080/21357
Reviewed-by: Michael Smith 
Tested-by: Impala Public Jenkins 
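
A minimal sketch of the analyzed-on-construction pattern the commit describes
(class and method names are illustrative, not the actual Impala classes):
{code:java}
// Hedged sketch: analysis runs inside the constructor, so the Calcite-driven
// planner never hands the Impala planner an unanalyzed expression.
abstract class ExprSketch {
  private boolean analyzed = false;
  protected void analyze() { analyzed = true; }  // stand-in for real analysis work
  boolean isAnalyzed() { return analyzed; }
}

class AnalyzedFunctionCallExprSketch extends ExprSketch {
  private final String fnName;

  AnalyzedFunctionCallExprSketch(String fnName) {
    this.fnName = fnName;
    analyze();  // analyzed immediately, bypassing a separate Analyzer pass
  }

  String fnName() { return fnName; }
}{code}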


> Allow function parsing for Impala Calcite planner
> -
>
> Key: IMPALA-12935
> URL: https://issues.apache.org/jira/browse/IMPALA-12935
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Steve Carlin
>Priority: Major
>
> We need the ability to parse and validate Impala functions using the Calcite 
> planner.
> This commit is not intended to work for all functions, or even most 
> functions. It will work as a base to be reviewed, and at least some 
> functions will work. More complicated functions will be added in a later 
> commit.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13146) Javascript tests sometimes fail to download NodeJS

2024-06-07 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13146:
--

 Summary: Javascript tests sometimes fail to download NodeJS
 Key: IMPALA-13146
 URL: https://issues.apache.org/jira/browse/IMPALA-13146
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


For automated tests, sometimes the Javascript tests fail to download NodeJS:
{noformat}
01:37:16 Fetching NodeJS v16.20.2-linux-x64 binaries ...
01:37:16   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
01:37:16                                  Dload  Upload   Total   Spent    Left  Speed
01:37:16 
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
  0 21.5M    0   902    0     0    293      0 21:23:04  0:00:03 21:23:01   293
...
 30 21.5M   30 6776k    0     0  50307      0  0:07:28  0:02:17  0:05:11 23826
01:39:34 curl: (18) transfer closed with 15617860 bytes remaining to 
read{noformat}
If this keeps happening, we should mirror the NodeJS binary on the 
native-toolchain s3 bucket.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Resolved] (IMPALA-13130) Under heavy load, Impala does not prioritize data stream operations

2024-06-07 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13130.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Under heavy load, Impala does not prioritize data stream operations
> ---
>
> Key: IMPALA-13130
> URL: https://issues.apache.org/jira/browse/IMPALA-13130
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> Under heavy load - where Impala reaches max memory for the DataStreamService 
> and applies backpressure via 
> https://github.com/apache/impala/blob/4.4.0/be/src/rpc/impala-service-pool.cc#L191-L199
>  - DataStreamService does not differentiate between types of requests and may 
> reject requests that could help reduce load.
> The DataStreamService deals with TransmitData, PublishFilter, UpdateFilter, 
> UpdateFilterFromRemote, and EndDataStream. It seems like we should prioritize 
> completing EndDataStream, especially under heavy load, to complete work and 
> release resources more quickly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)





[jira] [Assigned] (IMPALA-13143) TestCatalogdHA.test_catalogd_failover_with_sync_ddl times out expecting query failure

2024-06-07 Thread Wenzhe Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenzhe Zhou reassigned IMPALA-13143:


Assignee: Wenzhe Zhou

> TestCatalogdHA.test_catalogd_failover_with_sync_ddl times out expecting query 
> failure
> -
>
> Key: IMPALA-13143
> URL: https://issues.apache.org/jira/browse/IMPALA-13143
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Wenzhe Zhou
>Priority: Critical
>  Labels: broken-build, flaky
>
> The new TestCatalogdHA.test_catalogd_failover_with_sync_ddl test is failing 
> intermittently with:
> {noformat}
> custom_cluster/test_catalogd_ha.py:472: in 
> test_catalogd_failover_with_sync_ddl
> self.wait_for_state(handle, QueryState.EXCEPTION, 30, client=client)
> common/impala_test_suite.py:1216: in wait_for_state
> self.wait_for_any_state(handle, [expected_state], timeout, client)
> common/impala_test_suite.py:1234: in wait_for_any_state
> raise Timeout(timeout_msg)
> E   Timeout: query '9d49ab6360f6cbc5:4826a796' did not reach one of 
> the expected states [5], last known state 4{noformat}
> This means the query succeeded even though we expected it to fail. This is 
> currently limited to s3 jobs. In a different test, we saw issues because s3 
> is slower (see IMPALA-12616).
> This test was introduced by IMPALA-13134: 
> https://github.com/apache/impala/commit/70b7b6a78d49c30933d79e0a1c2a725f7e0a3e50



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13145) Upgrade mold linker to 2.31.0

2024-06-07 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13145:
--

 Summary: Upgrade mold linker to 2.31.0
 Key: IMPALA-13145
 URL: https://issues.apache.org/jira/browse/IMPALA-13145
 Project: IMPALA
  Issue Type: Improvement
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


Mold 2.31.0 claims performance improvements and a reduction in the memory 
needed for linking. See [https://github.com/rui314/mold/releases/tag/v2.31.0] 
and 
[https://github.com/rui314/mold/commit/53ebcd80d888778cde16952270f73343f090f342]

We should move to that version as some developers are seeing issues with high 
memory usage for linking.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org





[jira] [Commented] (IMPALA-12967) Testcase fails at test_migrated_table_field_id_resolution due to "Table does not exist"

2024-06-07 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853224#comment-17853224
 ] 

Joe McDonnell commented on IMPALA-12967:


There is a separate symptom where this test fails with a Disk I/O error. It is 
probably somewhat related, so we need to decide whether to include that symptom 
here. See IMPALA-13144.

> Testcase fails at test_migrated_table_field_id_resolution due to "Table does 
> not exist"
> ---
>
> Key: IMPALA-12967
> URL: https://issues.apache.org/jira/browse/IMPALA-12967
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Yida Wu
>Assignee: Quanlong Huang
>Priority: Major
>  Labels: broken-build
>
> Testcase test_migrated_table_field_id_resolution fails at exhaustive release 
> build with following messages:
> *Regression*
> {code:java}
> query_test.test_iceberg.TestIcebergTable.test_migrated_table_field_id_resolution[protocol:
>  beeswax | exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none] (from pytest)
> {code}
> *Error Message*
> {code:java}
> query_test/test_iceberg.py:266: in test_migrated_table_field_id_resolution
>  "iceberg_migrated_alter_test_orc", "orc") common/file_utils.py:68: in 
> create_iceberg_table_from_directory file_format)) 
> common/impala_connection.py:215: in execute 
> fetch_profile_after_close=fetch_profile_after_close) 
> beeswax/impala_beeswax.py:191: in execute handle = 
> self.__execute_query(query_string.strip(), user=user) 
> beeswax/impala_beeswax.py:384: in __execute_query 
> self.wait_for_finished(handle) beeswax/impala_beeswax.py:405: in 
> wait_for_finished raise ImpalaBeeswaxException("Query aborted:" + 
> error_log, None) E   ImpalaBeeswaxException: ImpalaBeeswaxException: E
> Query aborted:ImpalaRuntimeException: Error making 'createTable' RPC to Hive 
> Metastore:  E   CAUSED BY: IcebergTableLoadingException: Table does not exist 
> at location: 
> hdfs://localhost:20500/test-warehouse/iceberg_migrated_alter_test_orc
> Stacktrace
> query_test/test_iceberg.py:266: in test_migrated_table_field_id_resolution
> "iceberg_migrated_alter_test_orc", "orc")
> common/file_utils.py:68: in create_iceberg_table_from_directory
> file_format))
> common/impala_connection.py:215: in execute
> fetch_profile_after_close=fetch_profile_after_close)
> beeswax/impala_beeswax.py:191: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:384: in __execute_query
> self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:405: in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EQuery aborted:ImpalaRuntimeException: Error making 'createTable' RPC to 
> Hive Metastore: 
> E   CAUSED BY: IcebergTableLoadingException: Table does not exist at 
> location: 
> hdfs://localhost:20500/test-warehouse/iceberg_migrated_alter_test_orc
> {code}
> *Standard Error*
> {code:java}
> SET 
> client_identifier=query_test/test_iceberg.py::TestIcebergTable::()::test_migrated_table_field_id_resolution[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':True;'abort_on_error':1;'exec_single_;
> SET sync_ddl=False;
> -- executing against localhost:21000
> DROP DATABASE IF EXISTS `test_migrated_table_field_id_resolution_b59d79db` 
> CASCADE;
> -- 2024-04-02 00:56:55,137 INFO MainThread: Started query 
> f34399a8b7cddd67:031a3b96
> SET 
> client_identifier=query_test/test_iceberg.py::TestIcebergTable::()::test_migrated_table_field_id_resolution[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':True;'abort_on_error':1;'exec_single_;
> SET sync_ddl=False;
> -- executing against localhost:21000
> CREATE DATABASE `test_migrated_table_field_id_resolution_b59d79db`;
> -- 2024-04-02 00:56:57,302 INFO MainThread: Started query 
> 94465af69907eac5:e33f17e0
> -- 2024-04-02 00:56:57,353 INFO MainThread: Created database 
> "test_migrated_table_field_id_resolution_b59d79db" for test ID 
> "query_test/test_iceber

[jira] [Commented] (IMPALA-13144) TestIcebergTable.test_migrated_table_field_id_resolution fails with Disk I/O error

2024-06-07 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853223#comment-17853223
 ] 

Joe McDonnell commented on IMPALA-13144:


We need to decide whether we want to track this with IMPALA-12967 (which was 
originally about "Table does not exist at location" on the same test) or keep 
it separate.

> TestIcebergTable.test_migrated_table_field_id_resolution fails with Disk I/O 
> error
> --
>
> Key: IMPALA-13144
> URL: https://issues.apache.org/jira/browse/IMPALA-13144
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky
>
> A couple test jobs hit a failure on 
> TestIcebergTable.test_migrated_table_field_id_resolution:
> {noformat}
> query_test/test_iceberg.py:270: in test_migrated_table_field_id_resolution
> vector, unique_database)
> common/impala_test_suite.py:725: in run_test_case
> result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:660: in __exec_in_impala
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:1013: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:216: in execute
> fetch_profile_after_close=fetch_profile_after_close)
> beeswax/impala_beeswax.py:191: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:384: in __execute_query
> self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:405: in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EQuery aborted:Disk I/O error on 
> impala-ec2-centos79-m6i-4xlarge-xldisk-153e.vpc.cloudera.com:27000: Failed to 
> open HDFS file 
> hdfs://localhost:20500/test-warehouse/iceberg_migrated_alter_test/00_0
> E   Error(2): No such file or directory
> E   Root cause: RemoteException: File does not exist: 
> /test-warehouse/iceberg_migrated_alter_test/00_0
> E at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:87)
> E at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:77)
> E at 
> org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:159)
> E at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2040)
> E at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:738)
> E at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:454)
> E at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> E at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
> E at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
> E at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
> E at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922)
> E at java.security.AccessController.doPrivileged(Native Method)
> E at javax.security.auth.Subject.doAs(Subject.java:422)
> E at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> E at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2899){noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13144) TestIcebergTable.test_migrated_table_field_id_resolution fails with Disk I/O error

2024-06-07 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13144:
--

 Summary: TestIcebergTable.test_migrated_table_field_id_resolution 
fails with Disk I/O error
 Key: IMPALA-13144
 URL: https://issues.apache.org/jira/browse/IMPALA-13144
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


A couple test jobs hit a failure on 
TestIcebergTable.test_migrated_table_field_id_resolution:
{noformat}
query_test/test_iceberg.py:270: in test_migrated_table_field_id_resolution
vector, unique_database)
common/impala_test_suite.py:725: in run_test_case
result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
common/impala_test_suite.py:660: in __exec_in_impala
result = self.__execute_query(target_impalad_client, query, user=user)
common/impala_test_suite.py:1013: in __execute_query
return impalad_client.execute(query, user=user)
common/impala_connection.py:216: in execute
fetch_profile_after_close=fetch_profile_after_close)
beeswax/impala_beeswax.py:191: in execute
handle = self.__execute_query(query_string.strip(), user=user)
beeswax/impala_beeswax.py:384: in __execute_query
self.wait_for_finished(handle)
beeswax/impala_beeswax.py:405: in wait_for_finished
raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
EQuery aborted:Disk I/O error on 
impala-ec2-centos79-m6i-4xlarge-xldisk-153e.vpc.cloudera.com:27000: Failed to 
open HDFS file 
hdfs://localhost:20500/test-warehouse/iceberg_migrated_alter_test/00_0
E   Error(2): No such file or directory
E   Root cause: RemoteException: File does not exist: 
/test-warehouse/iceberg_migrated_alter_test/00_0
E   at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:87)
E   at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:77)
E   at 
org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:159)
E   at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2040)
E   at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:738)
E   at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:454)
E   at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
E   at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
E   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
E   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
E   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922)
E   at java.security.AccessController.doPrivileged(Native Method)
E   at javax.security.auth.Subject.doAs(Subject.java:422)
E   at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
E   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2899){noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org





[jira] [Created] (IMPALA-13143) TestCatalogdHA.test_catalogd_failover_with_sync_ddl times out expecting query failure

2024-06-07 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13143:
--

 Summary: TestCatalogdHA.test_catalogd_failover_with_sync_ddl times 
out expecting query failure
 Key: IMPALA-13143
 URL: https://issues.apache.org/jira/browse/IMPALA-13143
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


The new TestCatalogdHA.test_catalogd_failover_with_sync_ddl test is failing 
intermittently with:
{noformat}
custom_cluster/test_catalogd_ha.py:472: in test_catalogd_failover_with_sync_ddl
self.wait_for_state(handle, QueryState.EXCEPTION, 30, client=client)
common/impala_test_suite.py:1216: in wait_for_state
self.wait_for_any_state(handle, [expected_state], timeout, client)
common/impala_test_suite.py:1234: in wait_for_any_state
raise Timeout(timeout_msg)
E   Timeout: query '9d49ab6360f6cbc5:4826a796' did not reach one of the 
expected states [5], last known state 4{noformat}
This means the query succeeded even though we expected it to fail. This is 
currently limited to s3 jobs. In a different test, we saw issues because s3 is 
slower (see IMPALA-12616).

This test was introduced by IMPALA-13134: 
https://github.com/apache/impala/commit/70b7b6a78d49c30933d79e0a1c2a725f7e0a3e50



--
This message was sent by Atlassian Jira
(v8.20.10#820010)





[jira] [Resolved] (IMPALA-12616) test_restart_catalogd_while_handling_rpc_response* tests fail not reaching expected states

2024-06-07 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-12616.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

I think the s3 slowness version of this is fixed, so I'm going to resolve this.

> test_restart_catalogd_while_handling_rpc_response* tests fail not reaching 
> expected states
> --
>
> Key: IMPALA-12616
> URL: https://issues.apache.org/jira/browse/IMPALA-12616
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 1.4.2
>Reporter: Andrew Sherman
>Assignee: Daniel Becker
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> There are failures in both 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_timeout
>  and 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_max_iters,
>  both look the same:
> {code:java}
> custom_cluster/test_restart_services.py:232: in 
> test_restart_catalogd_while_handling_rpc_response_with_timeout
> self.wait_for_state(handle, self.client.QUERY_STATES["FINISHED"], 
> max_wait_time)
> common/impala_test_suite.py:1181: in wait_for_state
> self.wait_for_any_state(handle, [expected_state], timeout, client)
> common/impala_test_suite.py:1199: in wait_for_any_state
> raise Timeout(timeout_msg)
> E   Timeout: query '6a4e0bad9b511ccf:bf93de68' did not reach one of 
> the expected states [4], last known state 5
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)





[jira] [Commented] (IMPALA-12322) return wrong timestamp when scan kudu timestamp with timezone

2024-06-07 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853203#comment-17853203
 ] 

Csaba Ringhofer commented on IMPALA-12322:
--

Thanks for the feedback, [~eyizoha]. I have uploaded a patch that adds a new 
query option: https://gerrit.cloudera.org/#/c/21492/

> return wrong timestamp when scan kudu timestamp with timezone
> -
>
> Key: IMPALA-12322
> URL: https://issues.apache.org/jira/browse/IMPALA-12322
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.1.1
> Environment: impala 4.1.1
>Reporter: daicheng
>Assignee: Zihao Ye
>Priority: Major
> Attachments: image-2022-04-24-00-01-05-746-1.png, 
> image-2022-04-24-00-01-05-746.png, image-2022-04-24-00-01-37-520.png, 
> image-2022-04-24-00-03-14-467-1.png, image-2022-04-24-00-03-14-467.png, 
> image-2022-04-24-00-04-16-240-1.png, image-2022-04-24-00-04-16-240.png, 
> image-2022-04-24-00-04-52-860-1.png, image-2022-04-24-00-04-52-860.png, 
> image-2022-04-24-00-05-52-086-1.png, image-2022-04-24-00-05-52-086.png, 
> image-2022-04-24-00-07-09-776-1.png, image-2022-04-24-00-07-09-776.png, 
> image-2023-07-28-20-31-09-457.png, image-2023-07-28-22-27-38-521.png, 
> image-2023-07-28-22-29-40-083.png, image-2023-07-28-22-36-17-460.png, 
> image-2023-07-28-22-36-37-884.png, image-2023-07-28-22-38-19-728.png
>
>
> The Impala version is 3.1.0-cdh6.1.
> I have set the system timezone to Asia/Shanghai:
> !image-2022-04-24-00-01-37-520.png!
> !image-2022-04-24-00-01-05-746.png!
> Here is the bug:
> *Step 1*
> I have a Parquet file with two columns as below, and read it with both 
> impala-shell and Spark (timezone=Shanghai):
> !image-2022-04-24-00-03-14-467.png|width=1016,height=154!
> !image-2022-04-24-00-04-16-240.png|width=944,height=367!
> Both results are exactly right.
> *Step 2*
> Create a Kudu table with impala-shell:
> CREATE TABLE default.test_{_}test{_}_test_time2 (id BIGINT, t 
> TIMESTAMP, PRIMARY KEY (id)) STORED AS KUDU;
> Note: the Kudu version is 1.8.
> Then insert 2 rows into the table with Spark:
> !image-2022-04-24-00-04-52-860.png|width=914,height=279!
> *Step 3*
> Read it with Spark (timezone=Shanghai); Spark reads the Kudu table with the 
> kudu-client API. Here is the result:
> !image-2022-04-24-00-05-52-086.png|width=914,height=301!
> The result is still exactly right.
> But reading it with impala-shell: 
> !image-2022-04-24-00-07-09-776.png|width=915,height=154!
> the result shows the timestamps 8 hours late.
> *Conclusion*
>    It seems like the Impala timezone setting doesn't work when the Kudu 
> column type is TIMESTAMP, even though it works fine for Parquet files. I 
> don't know why.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12616) test_restart_catalogd_while_handling_rpc_response* tests fail not reaching expected states

2024-06-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853196#comment-17853196
 ] 

ASF subversion and git services commented on IMPALA-12616:
--

Commit 1935f9e1a199c958c5fb12ad53277fa720d6ae5c in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1935f9e1a ]

IMPALA-12616: Fix test_restart_services.py::TestRestart tests for S3

The test_restart_catalogd_while_handling_rpc_response* tests
from custom_cluster/test_restart_services.py have been failing
consistently on s3. The alter table statement is expected to
succeed, but instead it fails with:
"CatalogException: Detected catalog service ID changes"
This manifests as a timeout waiting for the statement to reach
the finished state.

The test relies on specific timing with a sleep injected via a
debug action. The failure stems from the catalog being slower
on s3. The alter table wakes up before the catalog service ID
change has fully completed, and it fails when it sees the
catalog service ID change.

This increases two sleep times:
1. This increases the sleep time before restarting the catalogd
   from 0.5 seconds to 5 seconds. This gives the catalogd longer
   to receive the message about the alter table and respond back
   to the impalad.
2. This increases the WAIT_BEFORE_PROCESSING_CATALOG_UPDATE
   sleep from 10 seconds to 30 seconds so the alter table
   statement doesn't wake up until the catalog service ID change
   is finalized.
The test is verifying that the right messages are in the impalad
logs, so we know this is still testing the same condition.

This modifies the tests to use wait_for_finished_timeout()
rather than wait_for_state(). This bails out immediately if the
query fails rather than waiting unnecessarily for the full timeout.
This also clears the query options so that later statements
don't inherit the debug_action that the alter table statement
used.

Testing:
 - Ran the tests 100x in a loop on s3
 - Ran the tests 100x in a loop on HDFS

Change-Id: Ieb5699b8fb0b2ad8bad4ac30922a7b4d7fa17d29
Reviewed-on: http://gerrit.cloudera.org:8080/21485
Tested-by: Impala Public Jenkins 
Reviewed-by: Daniel Becker 


> test_restart_catalogd_while_handling_rpc_response* tests fail not reaching 
> expected states
> --
>
> Key: IMPALA-12616
> URL: https://issues.apache.org/jira/browse/IMPALA-12616
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 1.4.2
>Reporter: Andrew Sherman
>Assignee: Daniel Becker
>Priority: Critical
>
> There are failures in both 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_timeout
>  and 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_max_iters,
>  both look the same:
> {code:java}
> custom_cluster/test_restart_services.py:232: in 
> test_restart_catalogd_while_handling_rpc_response_with_timeout
> self.wait_for_state(handle, self.client.QUERY_STATES["FINISHED"], 
> max_wait_time)
> common/impala_test_suite.py:1181: in wait_for_state
> self.wait_for_any_state(handle, [expected_state], timeout, client)
> common/impala_test_suite.py:1199: in wait_for_any_state
> raise Timeout(timeout_msg)
> E   Timeout: query '6a4e0bad9b511ccf:bf93de68' did not reach one of 
> the expected states [4], last known state 5
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13142) Documentation for Impala StateStore HA

2024-06-07 Thread Sanjana Malhotra (Jira)
Sanjana Malhotra created IMPALA-13142:
-

 Summary: Documentation for Impala StateStore HA
 Key: IMPALA-13142
 URL: https://issues.apache.org/jira/browse/IMPALA-13142
 Project: IMPALA
  Issue Type: Documentation
Reporter: Sanjana Malhotra
Assignee: Sanjana Malhotra


IMPALA-12156



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org





[jira] [Commented] (IMPALA-13137) Add additional client fetch metrics columns to the queries page

2024-06-07 Thread Surya Hebbar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853096#comment-17853096
 ] 

Surya Hebbar commented on IMPALA-13137:
---

It was confirmed in the meeting that the expected column is the 
{{ClientFetchWaitTimer}} value, not the difference between "First row 
fetched" and "Last row fetched".

> Add additional client fetch metrics columns to the queries page
> ---
>
> Key: IMPALA-13137
> URL: https://issues.apache.org/jira/browse/IMPALA-13137
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, be
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Major
> Attachments: completed_query.png, in_flight_query_1.png, 
> in_flight_query_2.png, in_flight_query_3.png, very_short_fetch_timer.png
>
>
> To help users better understand query execution times, it would be 
> helpful to add the following columns to the queries page:
> * First row fetched time - time taken for the client to fetch the first row
> * Client fetch wait time - time taken for the client to fetch all rows
> Additional details -
> https://jira.cloudera.com/browse/DWX-18295



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-13141) Partition transactional table is not updated on alter partition when hms_event_incremental_refresh_transactional_table is disabled

2024-06-07 Thread Venugopal Reddy K (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venugopal Reddy K updated IMPALA-13141:
---
Description: 
A partition of a transactional table is not updated on alter partition when 
hms_event_incremental_refresh_transactional_table is disabled.

*Observations:*

1. In case of AlterPartitionEvent, this issue occurs when 
hms_event_incremental_refresh_transactional_table is disabled.

2. In case of BatchPartitionEvent (when more than one AlterPartitionEvent is 
batched together), this issue occurs even without disabling 
hms_event_incremental_refresh_transactional_table.

*Steps to reproduce:*

1. Create a partitioned table and add some partitions from Hive:

Note: This step can be done from Impala too.
{code:java}
0: jdbc:hive2://localhost:11050> create table s(i int, j int, p int);
0: jdbc:hive2://localhost:11050> insert into s values(1,10,100),(2,20,200);

{code}
{code:java}
0: jdbc:hive2://localhost:11050> create table test(i int, j int) partitioned 
by(p int) tblproperties ('transactional'='true', 
'transactional_properties'='insert_only');
0: jdbc:hive2://localhost:11050> set hive.exec.dynamic.partition.mode=nonstrict;
0: jdbc:hive2://localhost:11050> insert into test partition(p) select * from s;
0: jdbc:hive2://localhost:11050> show partitions test;
++
| partition  |
++
| p=100      |
| p=200      |
++
0: jdbc:hive2://localhost:11050> desc formatted test partition(p=100);
+----------------------------------+------------------------------------------------------------+------------+
|             col_name             |                         data_type                          |  comment   |
+----------------------------------+------------------------------------------------------------+------------+
| i                                | int                                                        |            |
| j                                | int                                                        |            |
|                                  | NULL                                                       | NULL       |
| # Partition Information          | NULL                                                       | NULL       |
| # col_name                       | data_type                                                  | comment    |
| p                                | int                                                        |            |
|                                  | NULL                                                       | NULL       |
| # Detailed Partition Information | NULL                                                       | NULL       |
| Partition Value:                 | [100]                                                      | NULL       |
| Database:                        | default                                                    | NULL       |
| Table:                           | test                                                       | NULL       |
| CreateTime:                      | Fri Jun 07 14:21:17 IST 2024                               | NULL       |
| LastAccessTime:                  | UNKNOWN                                                    | NULL       |
| Location:                        | hdfs://localhost:20500/test-warehouse/managed/test/p=100   | NULL       |
| Partition Parameters:            | NULL                                                       | NULL       |
|                                  | numFiles                                                   | 1          |
|                                  | totalSize                                                  | 5          |
|                                  | transient_lastDdlTime                                      | 1717750277 |
|                                  | NULL                                                       | NULL       |
| # Storage Information            | NULL                                                       | NULL       |
| SerDe Library:                   | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe         | NULL       |
| InputFormat:                     | org.apache.hadoop.mapred.TextInputFormat                   | NULL       |
| OutputFormat:                    | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL       |
| Compressed:                      | No                                                         | NULL       |
| Num Buckets:                     | -1                                                         | NULL       |
| Bucket Columns:                  | []                                                         | NULL       |


[jira] [Created] (IMPALA-13141) Partition transactional table is not updated on alter partition when hms_event_incremental_refresh_transactional_table is disabled

2024-06-07 Thread Venugopal Reddy K (Jira)
Venugopal Reddy K created IMPALA-13141:
--

 Summary: Partition transactional table is not updated on alter 
partition when hms_event_incremental_refresh_transactional_table is disabled
 Key: IMPALA-13141
 URL: https://issues.apache.org/jira/browse/IMPALA-13141
 Project: IMPALA
  Issue Type: Bug
Reporter: Venugopal Reddy K


A partition of a transactional table is not updated on alter partition when 
hms_event_incremental_refresh_transactional_table is disabled.

*Observations:*

1. In case of AlterPartitionEvent, this issue occurs when 
hms_event_incremental_refresh_transactional_table is disabled.

2. In case of BatchPartitionEvent (when more than one AlterPartitionEvent is 
batched together), this issue occurs even without disabling 
hms_event_incremental_refresh_transactional_table.

*Steps to reproduce:*

1. Create a partitioned table and add some partitions from Hive:

Note: This step can be done from Impala too.
{code:java}
0: jdbc:hive2://localhost:11050> create table s(i int, j int, p int);
0: jdbc:hive2://localhost:11050> insert into s values(1,10,100),(2,20,200);

{code}
{code:java}
0: jdbc:hive2://localhost:11050> create table test(i int, j int) partitioned 
by(p int) tblproperties ('transactional'='true', 
'transactional_properties'='insert_only');
0: jdbc:hive2://localhost:11050> set hive.exec.dynamic.partition.mode=nonstrict;
0: jdbc:hive2://localhost:11050> insert into test partition(p) select * from s;
0: jdbc:hive2://localhost:11050> show partitions test;
++
| partition  |
++
| p=100      |
| p=200      |
++
0: jdbc:hive2://localhost:11050> desc formatted test partition(p=100);
+----------------------------------+------------------------------------------------------------+------------+
|             col_name             |                         data_type                          |  comment   |
+----------------------------------+------------------------------------------------------------+------------+
| i                                | int                                                        |            |
| j                                | int                                                        |            |
|                                  | NULL                                                       | NULL       |
| # Partition Information          | NULL                                                       | NULL       |
| # col_name                       | data_type                                                  | comment    |
| p                                | int                                                        |            |
|                                  | NULL                                                       | NULL       |
| # Detailed Partition Information | NULL                                                       | NULL       |
| Partition Value:                 | [100]                                                      | NULL       |
| Database:                        | default                                                    | NULL       |
| Table:                           | test                                                       | NULL       |
| CreateTime:                      | Fri Jun 07 14:21:17 IST 2024                               | NULL       |
| LastAccessTime:                  | UNKNOWN                                                    | NULL       |
| Location:                        | hdfs://localhost:20500/test-warehouse/managed/test/p=100   | NULL       |
| Partition Parameters:            | NULL                                                       | NULL       |
|                                  | numFiles                                                   | 1          |
|                                  | totalSize                                                  | 5          |
|                                  | transient_lastDdlTime                                      | 1717750277 |
|                                  | NULL                                                       | NULL       |
| # Storage Information            | NULL                                                       | NULL       |
| SerDe Library:                   | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe         | NULL       |
| InputFormat:                     | org.apache.hadoop.mapred.TextInputFormat                   | NULL       |
| OutputFormat:                    | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL       |
| Compressed:                      | No                                                         | NULL       |
| Num Buckets:


[jira] [Assigned] (IMPALA-13140) Add backend flag to disable small string optimization

2024-06-07 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-13140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-13140:
--

Assignee: Zoltán Borók-Nagy

> Add backend flag to disable small string optimization
> -
>
> Key: IMPALA-13140
> URL: https://issues.apache.org/jira/browse/IMPALA-13140
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Critical
>
> We could have a backend flag that would make SmallableString::Smallify() a 
> no-op.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13140) Add backend flag to disable small string optimization

2024-06-07 Thread Jira
Zoltán Borók-Nagy created IMPALA-13140:
--

 Summary: Add backend flag to disable small string optimization
 Key: IMPALA-13140
 URL: https://issues.apache.org/jira/browse/IMPALA-13140
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Zoltán Borók-Nagy


We could have a backend flag that would make SmallableString::Smallify() a 
no-op.
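
A minimal sketch of what such a flag could look like, assuming a gflags-style
backend flag (the flag name and the SmallableString internals below are
illustrative assumptions, not the actual Impala implementation):

{code:cpp}
#include <gflags/gflags.h>

// Hypothetical flag name, used here for illustration only.
DEFINE_bool(disable_string_smallification, false,
    "If true, SmallableString::Smallify() becomes a no-op, so strings always "
    "keep the long (heap-backed) representation.");

class SmallableString {
 public:
  // Tries to switch to the inline (small) representation. Returns true if
  // the string ends up stored in small form.
  bool Smallify() {
    if (FLAGS_disable_string_smallification) return false;  // proposed no-op
    small_ = true;  // stand-in for the real inlining logic
    return true;
  }

 private:
  bool small_ = false;
};
{code}

Running the existing tests with such a flag turned on (e.g. in an ASAN build,
as suggested in IMPALA-12569) would force the long-string code paths to be
exercised.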



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Commented] (IMPALA-13130) Under heavy load, Impala does not prioritize data stream operations

2024-06-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853074#comment-17853074
 ] 

ASF subversion and git services commented on IMPALA-13130:
--

Commit 3f827bfc2447d8c11a4f09bcb96e86c53b92d753 in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3f827bfc2 ]

IMPALA-13130: Prioritize EndDataStream messages

Prioritize EndDataStream messages over other types handled by
DataStreamService, and avoid rejecting them when memory limit is
reached. They take very little memory (~75 bytes) and will usually help
reduce memory use by closing out in-progress operations.

Adds the 'data_stream_sender_eos_timeout_ms' flag to control EOS
timeouts. Defaults to 1 hour, and can be disabled by setting to -1.

Adds unit tests ensuring EOS are processed even if mem limit is reached
and ahead of TransmitData messages in the queue.

Change-Id: I2829e1ab5bcde36107e10bff5fe629c5ee60f3e8
Reviewed-on: http://gerrit.cloudera.org:8080/21476
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Under heavy load, Impala does not prioritize data stream operations
> ---
>
> Key: IMPALA-13130
> URL: https://issues.apache.org/jira/browse/IMPALA-13130
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>
> Under heavy load - where Impala reaches max memory for the DataStreamService 
> and applies backpressure via 
> https://github.com/apache/impala/blob/4.4.0/be/src/rpc/impala-service-pool.cc#L191-L199
>  - DataStreamService does not differentiate between types of requests and may 
> reject requests that could help reduce load.
> The DataStreamService deals with TransmitData, PublishFilter, UpdateFilter, 
> UpdateFilterFromRemote, and EndDataStream. It seems like we should prioritize 
> completing EndDataStream, especially under heavy load, to complete work and 
> release resources more quickly.
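
A rough sketch of the queueing idea described above (the type names,
single-queue structure, and byte limit are illustrative assumptions, not the
actual impala-service-pool code):

{code:cpp}
#include <cstdint>
#include <deque>

enum class RpcType { kTransmitData, kEndDataStream, kPublishFilter, kUpdateFilter };

struct InboundRpc {
  RpcType type;
  int64_t payload_bytes;
};

class DataStreamQueue {
 public:
  // Returns false if the RPC is rejected for backpressure. EndDataStream is
  // tiny (~75 bytes) and releases resources downstream, so it is admitted
  // even when the memory limit is reached and jumps ahead of queued
  // TransmitData messages.
  bool Offer(const InboundRpc& rpc) {
    if (rpc.type == RpcType::kEndDataStream) {
      queue_.push_front(rpc);
      return true;
    }
    if (queued_bytes_ + rpc.payload_bytes > mem_limit_bytes_) return false;
    queued_bytes_ += rpc.payload_bytes;
    queue_.push_back(rpc);
    return true;
  }

 private:
  std::deque<InboundRpc> queue_;
  int64_t queued_bytes_ = 0;
  const int64_t mem_limit_bytes_ = 64LL << 20;  // illustrative 64 MiB limit
};
{code}

As the commit message notes, the actual change also adds the
'data_stream_sender_eos_timeout_ms' flag (default one hour, -1 to disable) so
senders do not wait indefinitely for EOS acknowledgement.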



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-12569) Harden long string testing

2024-06-07 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-12569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-12569:
---
Priority: Critical  (was: Major)

> Harden long string testing
> --
>
> Key: IMPALA-12569
> URL: https://issues.apache.org/jira/browse/IMPALA-12569
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend, Infrastructure
>Reporter: Zoltán Borók-Nagy
>Priority: Critical
>
> Regarding small string optimization, [~csringhofer] pointed out that most of 
> our test data has small strings. New features are typically tested on the 
> existing test tables (e.g. alltypes, which only has small strings), or new 
> tests are added that usually contain only small strings. The latter is hard 
> to prevent. Therefore long strings might get less test coverage if we don't 
> pay enough attention.
> To make the situation better, we could
>  # Add long string data to the string column of the alltypes table and 
> complextypestbl, and update the tests
>  # Add a backend flag that makes StringValue.Smallify() a no-op, and create a 
> test job (probably with an ASAN build) that runs the tests with that flag 
> turned on.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-13139) Query options set via ImpalaTestSuite::execute_query_expect_success stay set for subsequent queries

2024-06-06 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated IMPALA-13139:
---
Description: 
When debugging TestRestart, I noticed that the debug_action set for one query 
stayed in effect for subsequent queries that didn't specify query_options.
{noformat}
    DEBUG_ACTION = ("WAIT_BEFORE_PROCESSING_CATALOG_UPDATE:SLEEP@{}"
                    .format(debug_action_sleep_time_sec * 1000))

    query = "alter table {} add columns (age int)".format(tbl_name)
    handle = self.execute_query_async(query, query_options={"debug_action": 
DEBUG_ACTION})

...

# debug_action is still set for these queries:
    self.execute_query_expect_success(self.client, "select age from 
{}".format(tbl_name))
self.execute_query_expect_success(self.client,
        "alter table {} add columns (name string)".format(tbl_name))
    self.execute_query_expect_success(self.client, "select name from 
{}".format(tbl_name)){noformat}
The query options can be cleared explicitly (self.client.clear_configuration()), 
but the carry-over is surprising behavior. It's unclear whether some tests rely 
on it.

> Query options set via ImpalaTestSuite::execute_query_expect_success stay set 
> for subsequent queries
> ---
>
> Key: IMPALA-13139
> URL: https://issues.apache.org/jira/browse/IMPALA-13139
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Priority: Major
>
> When debugging TestRestart, I noticed that the debug_action set for one query 
> stayed in effect for subsequent queries that didn't specify query_options.
> {noformat}
>     DEBUG_ACTION = ("WAIT_BEFORE_PROCESSING_CATALOG_UPDATE:SLEEP@{}"
>                     .format(debug_action_sleep_time_sec * 1000))
>     query = "alter table {} add columns (age int)".format(tbl_name)
>     handle = self.execute_query_async(query, query_options={"debug_action": 
> DEBUG_ACTION})
> ...
> # debug_action is still set for these queries:
>     self.execute_query_expect_success(self.client, "select age from 
> {}".format(tbl_name))
> self.execute_query_expect_success(self.client,
>         "alter table {} add columns (name string)".format(tbl_name))
>     self.execute_query_expect_success(self.client, "select name from 
> {}".format(tbl_name)){noformat}
> The query options can be cleared explicitly 
> (self.client.clear_configuration()), but the carry-over is surprising 
> behavior. It's unclear whether some tests rely on it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org


