[jira] [Commented] (IMPALA-13009) Possible leak of partition updates when the table has failed DDL and recovered by INVALIDATE METADATA

2024-04-17 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838405#comment-17838405
 ] 

Fang-Yu Rao commented on IMPALA-13009:
--

Thanks for the detailed steps to reproduce the issue [~stigahuang]!

I have tried your latest script at 
https://issues.apache.org/jira/browse/IMPALA-13009?focusedCommentId=17838211=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17838211
 and found that I could also reproduce the issue after restarting only the 
Impala daemons (via "{*}bin/start-impala-cluster.py -r{*}"), even without the 
command that removes the HDFS path from outside of Impala. I was using Apache 
Impala built from a recent master whose tip commit is IMPALA-12996 (Add 
support for DATE in Iceberg metadata tables).
{code:java}
I0417 16:06:57.716398 16131 ImpaladCatalog.java:232] Adding: 
TABLE:default.my_part version: 1723 size: 1557
I0417 16:06:57.719789 16131 ImpaladCatalog.java:232] Adding: CATALOG_SERVICE_ID 
version: 1723 size: 60
I0417 16:06:57.720358 16131 ImpaladCatalog.java:257] Adding 9 partition(s): 
HDFS_PARTITION:default.my_part:(p=1,p=2,...,p=9), versions=[1706, 1712, 1718], 
size=(avg=588, min=588, max=588, sum=5292)
E0417 16:06:57.917488 16131 ImpaladCatalog.java:264] Error adding catalog 
object: Received stale partition in a statestore update: 
THdfsPartition(partitionKeyExprs:[TExpr(nodes:[TExprNode(node_type:INT_LITERAL, 
type:TColumnType(types:[TTypeNode(type:SCALAR, 
scalar_type:TScalarType(type:INT))]), num_children:0, is_constant:true, 
int_literal:TIntLiteral(value:1), is_codegen_disabled:false)])], 
location:THdfsPartitionLocation(prefix_index:0, suffix:p=1), id:0, 
file_desc:[THdfsFileDesc(file_desc_data:18 00 00 00 00 00 00 00 00 00 0E 00 1C 
00 18 00 10 00 00 00 08 00 04 00 0E 00 00 00 18 00 00 00 A9 E7 4F EE 8E 01 00 
00 02 00 00 00 00 00 00 00 0C 00 00 00 01 00 00 00 4C 00 00 00 37 00 00 00 61 
61 34 36 34 66 61 66 35 61 31 37 36 65 39 65 2D 36 63 66 31 63 38 34 61 30 30 
30 30 30 30 30 30 5F 31 37 31 31 36 38 30 30 38 32 5F 64 61 74 61 2E 30 2E 74 
78 74 00 0C 00 14 00 00 00 0C 00...)], access_level:READ_WRITE, 
stats:TTableStats(num_rows:-1), is_marked_cached:false, 
hms_parameters:{transient_lastDdlTime=1713395198, totalSize=2, 
numFilesErasureCoded=0, numFiles=1}, num_blocks:1, total_file_size_bytes:2, 
has_incremental_stats:false, write_id:0, db_name:default, tbl_name:my_part, 
partition_name:p=1, 
hdfs_storage_descriptor:THdfsStorageDescriptor(lineDelim:10, fieldDelim:1, 
collectionDelim:1, mapKeyDelim:1, escapeChar:0, quoteChar:1, fileFormat:TEXT, 
blockSize:0))
Java exception follows:
java.lang.IllegalStateException: Received stale partition in a statestore 
update: 
THdfsPartition(partitionKeyExprs:[TExpr(nodes:[TExprNode(node_type:INT_LITERAL, 
type:TColumnType(types:[TTypeNode(type:SCALAR, 
scalar_type:TScalarType(type:INT))]), num_children:0, is_constant:true, 
int_literal:TIntLiteral(value:1), is_codegen_disabled:false)])], 
location:THdfsPartitionLocation(prefix_index:0, suffix:p=1), id:0, 
file_desc:[THdfsFileDesc(file_desc_data:18 00 00 00 00 00 00 00 00 00 0E 00 1C 
00 18 00 10 00 00 00 08 00 04 00 0E 00 00 00 18 00 00 00 A9 E7 4F EE 8E 01 00 
00 02 00 00 00 00 00 00 00 0C 00 00 00 01 00 00 00 4C 00 00 00 37 00 00 00 61 
61 34 36 34 66 61 66 35 61 31 37 36 65 39 65 2D 36 63 66 31 63 38 34 61 30 30 
30 30 30 30 30 30 5F 31 37 31 31 36 38 30 30 38 32 5F 64 61 74 61 2E 30 2E 74 
78 74 00 0C 00 14 00 00 00 0C 00...)], access_level:READ_WRITE, 
stats:TTableStats(num_rows:-1), is_marked_cached:false, 
hms_parameters:{transient_lastDdlTime=1713395198, totalSize=2, 
numFilesErasureCoded=0, numFiles=1}, num_blocks:1, total_file_size_bytes:2, 
has_incremental_stats:false, write_id:0, db_name:default, tbl_name:my_part, 
partition_name:p=1, 
hdfs_storage_descriptor:THdfsStorageDescriptor(lineDelim:10, fieldDelim:1, 
collectionDelim:1, mapKeyDelim:1, escapeChar:0, quoteChar:1, fileFormat:TEXT, 
blockSize:0))
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:512)
at 
org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:523)
at 
org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334)
at 
org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262)
at 
org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:120)
at 
org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:565)
at 
org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196)
{code}

> Possible leak of partition updates when the table has failed DDL and 
> recovered by INVALIDATE METADATA
> -
>
> Key: IMPALA-13009
> URL: 

[jira] [Created] (IMPALA-12994) Revise the implementation of FsPermissionChecker to take Ranger policies into consideration

2024-04-10 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12994:


 Summary: Revise the implementation of FsPermissionChecker to take 
Ranger policies into consideration
 Key: IMPALA-12994
 URL: https://issues.apache.org/jira/browse/IMPALA-12994
 Project: IMPALA
  Issue Type: Task
  Components: Frontend
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao


Impala's current implementation of 
[FsPermissionChecker|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/util/FsPermissionChecker.java]
 does not take into consideration the Ranger policies on HDFS or the underlying 
file system, which could result in unwarranted AnalysisExceptions during the 
query analysis phase, as reported in IMPALA-11871 and IMPALA-12291. We should 
consider revising FsPermissionChecker so that it also takes the Ranger 
policies on the storage layer into account.
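To make the mismatch concrete, here is a minimal, self-contained Java model. All class and method names are hypothetical and are not Impala or Ranger APIs. It assumes a Ranger HDFS plugin configured to fall back to native HDFS permissions when no Ranger policy matches a path; under that assumption, a check that looks only at POSIX permissions and ACLs can reject a write that the NameNode would actually allow.
{code:java}
// Hypothetical model for illustration; not an Impala or Ranger API.
final class WriteAccessModel {
  /** Result of evaluating Ranger HDFS policies for one path. */
  enum RangerDecision { ALLOW, DENY, NO_MATCHING_POLICY }

  /** A POSIX/ACL-only check, roughly what an analysis-time checker sees. */
  static boolean posixOnly(boolean posixAllowsWrite) {
    return posixAllowsWrite;
  }

  /**
   * What the NameNode would enforce under the assumed plugin setup: an explicit
   * Ranger decision wins; with no matching policy, native HDFS permissions apply.
   */
  static boolean rangerWithFallback(RangerDecision ranger, boolean posixAllowsWrite) {
    switch (ranger) {
      case ALLOW:
        return true;
      case DENY:
        return false;
      default:
        return posixAllowsWrite;
    }
  }
}
{code}
With RangerDecision.ALLOW and posixAllowsWrite set to false, posixOnly rejects the write while rangerWithFallback accepts it, which is the kind of mismatch behind the AnalysisExceptions reported in IMPALA-11871 and IMPALA-12291.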



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-12985) Use the new constructor when instantiating RangerAccessRequestImpl

2024-04-08 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12985:
-
Description: 
After RANGER-2763, the constructor of the class RangerAccessRequestImpl takes 
an additional input argument 'userRoles', as shown in the following.
{code:java}
public RangerAccessRequestImpl(RangerAccessResource resource, String 
accessType, String user, Set<String> userGroups, Set<String> userRoles) {
...
{code}
The new signature is also provided in CDP Ranger. Thus, to unblock IMPALA-12921, 
or to be able to build Apache Impala with a locally built Apache Ranger, it may 
be faster to switch to the new signature on the Impala side than to wait for 
RANGER-4770 to be resolved on the Ranger side.
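A minimal sketch of the call-site change, written against a hypothetical stand-in class (AccessRequestStandIn) whose constructor has the same five-parameter shape as the post-RANGER-2763 RangerAccessRequestImpl, so that it compiles without Ranger on the classpath. The helper name newSelectRequest and the value of SELECT_ACCESS_TYPE are also hypothetical. Passing an empty role set when the caller has not resolved the user's roles is an assumption; whether Ranger resolves roles on its own in that case should be checked against the Ranger version in use.
{code:java}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in mirroring the 5-argument constructor shape of
// RangerAccessRequestImpl after RANGER-2763. Not the real Ranger class.
final class AccessRequestStandIn {
  final Object resource;
  final String accessType;
  final String user;
  final Set<String> userGroups;
  final Set<String> userRoles;

  AccessRequestStandIn(Object resource, String accessType, String user,
      Set<String> userGroups, Set<String> userRoles) {
    this.resource = resource;
    this.accessType = accessType;
    this.user = user;
    // Defensive copies so later changes by the caller are not observed.
    this.userGroups = Collections.unmodifiableSet(new HashSet<>(userGroups));
    this.userRoles = Collections.unmodifiableSet(new HashSet<>(userRoles));
  }
}

final class CallSite {
  // Hypothetical value for illustration only.
  static final String SELECT_ACCESS_TYPE = "select";

  // Before: new RangerAccessRequestImpl(resource, SELECT_ACCESS_TYPE, user, groups)
  // After:  the same call with an explicit userRoles argument appended.
  static AccessRequestStandIn newSelectRequest(Object resource, String user,
      Set<String> groups) {
    // Assumption: no roles are resolved on the Impala side, so pass an empty set.
    return new AccessRequestStandIn(resource, SELECT_ACCESS_TYPE, user, groups,
        Collections.emptySet());
  }
}
{code}
Keeping the change behind a single helper means a later decision about how to populate userRoles touches one place rather than every call site.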

  was:
After RANGER-2763, the constructor of the class RangerAccessRequestImpl takes 
an additional input argument 'userRoles', as shown in the following.
{code:java}
public RangerAccessRequestImpl(RangerAccessResource resource, String 
accessType, String user, Set<String> userGroups, Set<String> userRoles) {
...
{code}
The new signature is also provided in CDP Ranger. Thus, to unblock IMPALA-12921 
or to be able to build Apache Impala with Apache Ranger, it may be faster to 
switch to the new signature on the Impala side.


> Use the new constructor when instantiating RangerAccessRequestImpl
> --
>
> Key: IMPALA-12985
> URL: https://issues.apache.org/jira/browse/IMPALA-12985
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>






[jira] [Created] (IMPALA-12985) Use the new constructor when instantiating RangerAccessRequestImpl

2024-04-08 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12985:


 Summary: Use the new constructor when instantiating 
RangerAccessRequestImpl
 Key: IMPALA-12985
 URL: https://issues.apache.org/jira/browse/IMPALA-12985
 Project: IMPALA
  Issue Type: Task
  Components: Frontend
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao


After RANGER-2763, the constructor of the class RangerAccessRequestImpl takes 
an additional input argument 'userRoles', as shown in the following.
{code:java}
public RangerAccessRequestImpl(RangerAccessResource resource, String 
accessType, String user, Set<String> userGroups, Set<String> userRoles) {
...
{code}
The new signature is also provided in CDP Ranger. Thus, to unblock IMPALA-12921 
or to be able to build Apache Impala with Apache Ranger, it may be faster to 
switch to the new signature on the Impala side.






[jira] [Updated] (IMPALA-12921) Consider adding support for locally built Ranger

2024-04-05 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12921:
-
Description: 
It would be nice to support a locally built Ranger in Impala's minicluster, 
since it would facilitate the testing of features that require changes to both 
components.

*+Edit:+*
Making the current Apache Impala on *master* (tip is
{*}IMPALA-12925{*}: Fix decimal data type for external JDBC table) support 
Ranger on *master* (tip is 
{*}RANGER-4745{*}: Enhance handling of subAccess authorization in Ranger HDFS 
plugin) may be too ambitious.

The signatures of some classes are already incompatible. For instance, Impala 
currently instantiates *RangerAccessRequestImpl* via the following code, which 
passes 4 input arguments.
{code:java}
RangerAccessRequest req = new RangerAccessRequestImpl(resource,
SELECT_ACCESS_TYPE, user.getShortName(), getUserGroups(user));
{code}
However, the constructor of RangerAccessRequestImpl on the master branch of 
Apache Ranger has the following signature, which requires 5 input arguments 
instead.
{code:java}
public RangerAccessRequestImpl(RangerAccessResource resource, String 
accessType, String user, Set<String> userGroups, Set<String> userRoles)
{code}
It may be more practical to support an earlier version of Ranger, e.g., 
[https://github.com/apache/ranger/blob/release-ranger-2.4.0].

  was:It would be nice to be able to support locally built Ranger in Impala's 
minicluster in that it would facilitate the testing of features that require 
changes to both components.


> Consider adding support for locally built Ranger
> 
>
> Key: IMPALA-12921
> URL: https://issues.apache.org/jira/browse/IMPALA-12921
> Project: IMPALA
>  Issue Type: Task
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>






[jira] [Resolved] (IMPALA-12291) Insert statement fails even if hdfs ranger policy allows it

2024-04-01 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-12291.
--
Resolution: Duplicate

This seems to be a duplicate of IMPALA-11871. We could probably continue our 
discussion there. I will also review the patch at 
https://gerrit.cloudera.org/c/20221/ and see how we could proceed.

cc: [~khr9603], [~stigahuang], [~amansinha]

> Insert statement fails even if hdfs ranger policy allows it
> ---
>
> Key: IMPALA-12291
> URL: https://issues.apache.org/jira/browse/IMPALA-12291
> Project: IMPALA
>  Issue Type: Bug
>  Components: fe, Security
> Environment: - Impala Version (4.1.0)
> - Ranger admin version (2.0)
> - Hive version (3.1.2)
>Reporter: halim kim
>Assignee: halim kim
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Apache Ranger is a framework for providing security and authorization in the 
> Hadoop platform.
> Impala can also utilize Apache Ranger via Ranger Hive policies.
> The problem is that INSERT and some other queries are not executed even if 
> you enable the Ranger HDFS plugin and set a proper allow condition for Impala 
> query execution.
> You can see an error log like the following.
> {code:java}
> AnalysisException: Unable to INSERT into target table (testdb.testtable) 
> because Impala does not have WRITE access to HDFS location: 
> hdfs://testcluster/warehouse/testdb.db/testtable
> {code}
> This happens when the Ranger HDFS plugin is enabled but impala does not have 
> the required HDFS POSIX permissions. 
> For example, if the DB file owner, group, and permissions are set to 
> hdfs:hdfs r-xr-xr-- and the Ranger plugin policies (HDFS, Hive, and Impala) 
> allow impala to execute the query, the INSERT query will fail.
> In my opinion, the main cause is that the Impala fe component checks HDFS 
> POSIX model permissions rather than Ranger policies. 
> Similar issue: https://issues.apache.org/jira/browse/IMPALA-10272
> I'm working on resolving this issue by adding code that checks HDFS Ranger 
> policies.






[jira] [Commented] (IMPALA-11871) INSERT statement does not respect Ranger policies for HDFS

2024-04-01 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832957#comment-17832957
 ] 

Fang-Yu Rao commented on IMPALA-11871:
--

After reading some past JIRAs in this area, I think it should be safe to skip 
{*}analyzeWriteAccess{*}() for the *INSERT* statement (or add a startup flag to 
disable it). Before the fix is ready, we could add the following to the 
*core-site.xml* consumed by the catalog server to allow a user authorized by 
Ranger (via Impala's frontend) to insert values into an HDFS table in the 
{*}legacy catalog mode{*}. Recall that the catalog server would consider the 
service user, usually named '{*}impala{*}', as a super user as long as the user 
'{*}impala{*}' belongs to the super group specified by 
''.
{code:java}
   
dfs.permissions.superusergroup

true
  
{code}
This is still secure when Ranger is the authorization provider for the 
following reasons.
 # For the INSERT statement, Impala's frontend makes sure the logged-in user 
(not necessarily the service user '{*}impala{*}') is granted the necessary 
privilege on the target table. The corresponding audit log entry is also 
produced whether or not the query is authorized, even though we skip 
{*}analyzeWriteAccess{*}().
 # For a query that has been authorized by Impala's frontend and sent to the 
backend for execution, if Impala's backend interacts with the underlying 
services, e.g., HDFS, as the service user '{*}impala{*}', then this service 
user should always be considered a super user or a member of a super group.
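The super-user reasoning above can be modeled in a few lines. The following is a hypothetical simplification, not the code of Impala's FsPermissionChecker: the caller counts as a super user when one of its groups equals the group configured through dfs.permissions.superusergroup, with "supergroup" assumed when the property is unset. The Map stands in for the catalog server's core-site.xml, and the class name SuperGroupCheck is illustrative only.
{code:java}
import java.util.Map;
import java.util.Set;

// Hypothetical model of the super-group check; names are illustrative only.
final class SuperGroupCheck {
  static final String SUPERGROUP_KEY = "dfs.permissions.superusergroup";
  static final String SUPERGROUP_DEFAULT = "supergroup";

  /** Returns true if any of the user's groups is the configured super group. */
  static boolean isSuperUser(Set<String> userGroups, Map<String, String> conf) {
    String superGroup = conf.getOrDefault(SUPERGROUP_KEY, SUPERGROUP_DEFAULT);
    return userGroups.contains(superGroup);
  }
}
{code}
For example, a service user whose groups are impala and hive passes this check only if the property is set to one of those two groups; with the property unset, it passes only if the user also belongs to "supergroup".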

 
+*Detailed Analysis*+
We started performing such permissions checking in [IMPALA-1279: Check ACLs 
for INSERT and LOAD 
statements|https://github.com/cloudera/Impala/commit/0b32bbd899d988f1cd5c526597932b67f4c35cce]
 when we were using Sentry as the authorization provider. The reason for 
implementing IMPALA-1279 was given in the description of that JIRA, which is 
excerpted below for easy reference. In short, we would like to fail a query as 
early as possible if it could run into a permissions-related issue.
{quote}Impala checks permissions for LOAD and INSERT statements before 
executing them to allow for early-exit if the query would not succeed. However, 
it does not take extended ACLs in CDH5 into account.

When a directory has restrictive Posix permissions (e.g. 000), but has an ACL 
allowing writes, Impala should allow INSERTs and LOADs to happen to that 
directory. Instead, the early check will disallow them.

If the checks were disabled, the queries would execute (or not!) correctly, 
because we delegate to libhdfs or the DistributedFileSystem API to actually 
perform the operations we need.
{quote}
We hand-crafted the permissions checker within Impala. Specifically, in our 
[implementation|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/util/FsPermissionChecker.java#L206-L222],
 Hadoop ACL entries take precedence over the POSIX permissions, and we did 
*not* take into consideration the policies that could be defined on the HDFS 
path when the authorization provider is Ranger.

Due to how we implemented 
[FsPermissionChecker|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/util/FsPermissionChecker.java],
 it's possible that even though a logged-in user has been authorized to execute 
an INSERT statement into a table via a policy in Ranger's SQL policy repository, 
the query could fail during analysis simply because the service user, usually 
named '{*}impala{*}', could not pass the permissions checker. For instance, this 
could occur if the target table was created by another query engine, e.g., 
HiveServer2 (HS2), and is thus owned by another service user, e.g., 
'{*}hive{*}'. In addition, the table gets an ACL entry of "{*}group::r-x{*}" by 
default when it is created. The current implementation of Impala's permissions 
checker would deny the service user '{*}impala{*}' write access to the table, 
even though the user '{*}impala{*}' is in the group '{*}hive{*}', as shown in 
the following.
{code:java}
[r...@ccycloud-4.engesc24485d02.root.comops.site ~]# hdfs dfs -getfacl 

# file: 
# owner: hive
# group: hive
user::rwx
group::r-x
other::r-x
 
[r...@ccycloud-4.engesc24485d02.root.comops.site impalad]# groups impala
impala : impala hive {code}
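To see why the check fails, here is a simplified, hypothetical Java model of the order in which HDFS evaluates POSIX permissions and ACL entries: owner first, then named users, then the owning group and named groups, then other, with the mask (if present) limiting named-user and group entries. It sketches the HDFS permissions model that a hand-written checker has to replicate; it is not the code of Impala's FsPermissionChecker, and the class name AclModel is illustrative only. Applied to the listing above (owner hive, group::r-x, no mask) and a caller 'impala' in groups impala and hive, WRITE is denied.
{code:java}
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical model of HDFS permission evaluation with ACLs.
// Permissions are 3-bit masks: r=4, w=2, x=1.
final class AclModel {
  static final int READ = 4, WRITE = 2, EXECUTE = 1;

  final String owner;
  final String group;
  final int ownerPerm, groupPerm, otherPerm;
  final Map<String, Integer> namedUsers;   // user:NAME:perm entries
  final Map<String, Integer> namedGroups;  // group:NAME:perm entries
  final Integer mask;                      // null when there is no mask entry

  AclModel(String owner, String group, int ownerPerm, int groupPerm, int otherPerm,
      Map<String, Integer> namedUsers, Map<String, Integer> namedGroups, Integer mask) {
    this.owner = owner;
    this.group = group;
    this.ownerPerm = ownerPerm;
    this.groupPerm = groupPerm;
    this.otherPerm = otherPerm;
    this.namedUsers = namedUsers;
    this.namedGroups = namedGroups;
    this.mask = mask;
  }

  private int masked(int perm) {
    return mask == null ? perm : perm & mask;
  }

  /** Returns true if 'user' with 'groups' is granted every bit in 'requested'. */
  boolean permits(String user, Set<String> groups, int requested) {
    if (user.equals(owner)) {
      return (ownerPerm & requested) == requested;
    }
    Integer named = namedUsers.get(user);
    if (named != null) {
      return (masked(named) & requested) == requested;
    }
    // Any matching group entry may grant access; if groups matched but none
    // granted it, access is denied without consulting the "other" bits.
    boolean groupMatched = false;
    if (groups.contains(group)) {
      groupMatched = true;
      if ((masked(groupPerm) & requested) == requested) return true;
    }
    for (Map.Entry<String, Integer> e : namedGroups.entrySet()) {
      if (groups.contains(e.getKey())) {
        groupMatched = true;
        if ((masked(e.getValue()) & requested) == requested) return true;
      }
    }
    if (groupMatched) return false;
    return (otherPerm & requested) == requested;
  }
}
{code}
The ACL in the IMPALA-11871 description (user:impala:rwx with mask::r-x) behaves the same way in this model: the mask reduces impala's effective permissions to r-x, so WRITE is denied even though the named-user entry lists rwx.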
 
In 
[IMPALA-3143|https://github.com/apache/impala/commit/a0ad1868bda902fd914bc2be39eb9629a6eceb76],
 we allowed an administrator to specify the name of the super group (from the 
catalog server's perspective). Once the *current user* belongs to the super 
group specified via '{*}DFS_PERMISSIONS_SUPERUSERGROUP_KEY{*}' 
("{*}dfs.permissions.superusergroup{*}"), which defaults to 
'{*}DFS_PERMISSIONS_SUPERUSERGROUP_DEFAULT{*}' ("{*}supergroup{*}"), the 
catalog server would grant the WRITE request against the corresponding table 
from the current user. Refer to 

[jira] [Comment Edited] (IMPALA-11871) INSERT statement does not respect Ranger policies for HDFS

2024-03-25 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830738#comment-17830738
 ] 

Fang-Yu Rao edited comment on IMPALA-11871 at 3/26/24 5:17 AM:
---

Hi [~MikaelSmith], my current understanding is that this is not a regression 
from earlier releases. It's more like a feature request for usability.

The method that performs the permissions checking 
({*}analyzeWriteAccess{*}()) was added in IMPALA-7311. Its purpose, I guess, 
was to make sure the Impala service has the necessary write permissions as 
early as possible, i.e., during the query analysis phase (vs. the query 
execution phase).

After Impala started supporting Ranger as its authorization provider, ideally, 
a cluster administrator should be able to manage the permissions on HDFS via 
either a) Ranger's policy repository for HDFS, or b) the HDFS Access Control 
Lists (HDFS ACLs). But at the moment, Impala's coordinator unconditionally 
performs the permissions checking without consulting Ranger's policy 
repository for HDFS.

IMPALA-10272 resolved a similar issue for the LOAD DATA statement. We could 
resolve this JIRA using the same approach, where Impala's frontend calls 
*hadoop.fs.FileSystem.access(Path path, FsAction mode)* to check the actual 
access permissions, which would also reflect the permissions managed via 
Ranger's HDFS policy repository.
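A minimal sketch of the IMPALA-10272-style approach: instead of re-implementing permission rules, the frontend asks the file system and treats an access-denied exception as "no write access". To stay self-contained, it uses a hypothetical AccessProbe interface and a hypothetical PermissionDeniedException in place of Hadoop's FileSystem and AccessControlException. In real code the probe would be fs.access(path, FsAction.WRITE), which returns normally when access is granted and throws org.apache.hadoop.security.AccessControlException when it is denied, so whatever authorizer the NameNode uses (for example, the Ranger HDFS plugin) is honored.
{code:java}
import java.io.IOException;

// Hypothetical stand-in for org.apache.hadoop.security.AccessControlException.
final class PermissionDeniedException extends IOException {
  PermissionDeniedException(String msg) {
    super(msg);
  }
}

// Hypothetical stand-in for FileSystem.access(Path, FsAction.WRITE): returns
// normally if access is granted, throws PermissionDeniedException if denied.
@FunctionalInterface
interface AccessProbe {
  void checkWrite(String path) throws IOException;
}

final class DelegatingWriteCheck {
  /** Returns true if the file system reports write access to 'path'. */
  static boolean hasWriteAccess(AccessProbe probe, String path) throws IOException {
    try {
      probe.checkWrite(path);
      return true;
    } catch (PermissionDeniedException e) {
      // Denied by whatever authorizer the file system consults.
      return false;
    }
  }
}
{code}
Other IOExceptions, such as a missing path, are deliberately propagated rather than reported as "no access", so the caller can tell the two situations apart.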




[jira] [Commented] (IMPALA-11871) INSERT statement does not respect Ranger policies for HDFS

2024-03-25 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830738#comment-17830738
 ] 

Fang-Yu Rao commented on IMPALA-11871:
--

Hi [~MikaelSmith], my current understanding is that this is not a regression 
from earlier releases. It's more like a feature request for usability.

The method that performs the permissions checking 
({*}analyzeWriteAccess{*}()) was added in IMPALA-7311. Its purpose, I guess, 
was to make sure the Impala service has the necessary write permissions as 
early as possible, i.e., during the query analysis phase (vs. the query 
execution phase).

After Impala started supporting Ranger as its authorization provider, ideally, 
a cluster administrator should be able to manage the permissions on HDFS via 
either a) Ranger's policy repository for HDFS, or b) the HDFS Access Control 
Lists (HDFS ACLs). But at the moment, Impala's coordinator unconditionally 
performs the permissions checking without consulting Ranger's policy 
repository for HDFS.

IMPALA-10272 resolved a similar issue for the LOAD DATA statement. We could 
resolve this JIRA using the same approach, where Impala's frontend calls 
*hadoop.fs.FileSystem.access(Path path, FsAction mode)* to check the actual 
access permissions, which would also reflect the permissions managed via 
Ranger's HDFS policy repository.

> INSERT statement does not respect Ranger policies for HDFS
> --
>
> Key: IMPALA-11871
> URL: https://issues.apache.org/jira/browse/IMPALA-11871
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> In a cluster with Ranger auth (and with legacy catalog mode), even if you 
> provide RWX to cm_hdfs -> all-path for the user impala, inserting into a 
> table whose HDFS POSIX permissions happen to exclude impala access will 
> result in an
> {noformat}
> "AnalysisException: Unable to INSERT into target table (default.t1) because 
> Impala does not have WRITE access to HDFS location: 
> hdfs://nightly-71x-vx-2.nightly-71x-vx.root.hwx.site:8020/warehouse/tablespace/external/hive/t1"{noformat}
>  
> {noformat}
> [root@nightly-71x-vx-3 ~]# hdfs dfs -getfacl 
> /warehouse/tablespace/external/hive/t1
> file: /warehouse/tablespace/external/hive/t1 
> owner: hive 
> group: supergroup
> user::rwx
> user:impala:rwx #effective:r-x
> group::rwx #effective:r-x
> mask::r-x
> other::---
> default:user::rwx
> default:user:impala:rwx
> default:group::rwx
> default:mask::rwx
> default:other::--- {noformat}
> ~~
> ANALYSIS
> Stack trace from a version of Cloudera's distribution of Impala (impalad 
> version 3.4.0-SNAPSHOT RELEASE (build 
> {*}db20b59a093c17ea4699117155d58fe874f7d68f{*})):
> {noformat}
> at 
> org.apache.impala.catalog.FeFsTable$Utils.checkWriteAccess(FeFsTable.java:585)
> at 
> org.apache.impala.analysis.InsertStmt.analyzeWriteAccess(InsertStmt.java:545)
> at org.apache.impala.analysis.InsertStmt.analyze(InsertStmt.java:391)
> at 
> org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:463)
> at 
> org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:426)
> at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1570)
> at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1536)
> at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1506)
> at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:155){noformat}
> The exception occurs at analysis time, so I tested and succeeded in writing 
> directly into the said directory.
> {noformat}
> [root@nightly-71x-vx-3 ~]# hdfs dfs -touchz 
> /warehouse/tablespace/external/hive/t1/test
> [root@nightly-71x-vx-3 ~]# hdfs dfs -ls 
> /warehouse/tablespace/external/hive/t1/
> Found 8 items
> rw-rw---+ 3 hive supergroup 417 2023-01-27 17:37 
> /warehouse/tablespace/external/hive/t1/00_0
> rw-rw---+ 3 hive supergroup 417 2023-01-27 17:44 
> /warehouse/tablespace/external/hive/t1/00_0_copy_1
> rw-rw---+ 3 hive supergroup 417 2023-01-27 17:49 
> /warehouse/tablespace/external/hive/t1/00_0_copy_2
> rw-rw---+ 3 hive supergroup 417 2023-01-27 17:53 
> /warehouse/tablespace/external/hive/t1/00_0_copy_3
> rw-rw---+ 3 impala hive 355 2023-01-27 17:17 
> /warehouse/tablespace/external/hive/t1/4c4477c12c51ad96-3126b52d_2029811630_data.0.parq
> rw-rw---+ 3 impala hive 355 2023-01-27 17:39 
> /warehouse/tablespace/external/hive/t1/9945b25bb37d1ff2-473c1478_574471191_data.0.parq
> drwxrwx---+ - impala hive 0 2023-01-27 17:39 
> /warehouse/tablespace/external/hive/t1/_impala_insert_staging
> rw-rw---+ 3 impala supergroup 0 2023-01-27 18:01 
> 

[jira] [Created] (IMPALA-12921) Consider adding support for locally built Ranger

2024-03-18 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12921:


 Summary: Consider adding support for locally built Ranger
 Key: IMPALA-12921
 URL: https://issues.apache.org/jira/browse/IMPALA-12921
 Project: IMPALA
  Issue Type: Task
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao


It would be nice to be able to support locally built Ranger in Impala's 
minicluster in that it would facilitate the testing of features that require 
changes to both components.






[jira] [Comment Edited] (IMPALA-12830) test_webserver_hide_logs_link() could fail in the exhaustive build

2024-02-21 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819426#comment-17819426
 ] 

Fang-Yu Rao edited comment on IMPALA-12830 at 2/22/24 12:43 AM:


This issue seems to be similar to IMPALA-12170.

cc: [~stigahuang]



> test_webserver_hide_logs_link() could fail in the exhaustive build
> --
>
> Key: IMPALA-12830
> URL: https://issues.apache.org/jira/browse/IMPALA-12830
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Saurabh Katiyal
>Priority: Major
>  Labels: broken-build
>
> We found in an internal Jenkins run that test_webserver_hide_logs_link() 
> could fail in the exhaustive build with the following error.
> +*Error Message*+
> {code:java}
> AssertionError: bad links from webui port 25020 assert ['/', 
> '/catal...g_level', ...] == ['/', '/catalo...g_level', ...]   At index 2 
> diff: u'/events' != '/hadoop-varz'   Full diff:   - [u'/',   ?  -   + ['/',   
> -  u'/catalog',   ?  -   +  '/catalog',   -  u'/events',   -  
> u'/hadoop-varz',   ?  -   +  '/hadoop-varz',   +  '/events',   -  u'/jmx',   
> ?  -   +  '/jmx',   -  u'/log_level',   ?  -   +  '/log_level',   -  
> u'/memz',   ?  -   +  '/memz',   -  u'/metrics',   ?  -   +  '/metrics',   -  
> u'/operations',   ?  -   +  '/operations',   -  u'/profile_docs',   ?  -   +  
> '/profile_docs',   -  u'/rpcz',   ?  -   +  '/rpcz',   -  u'/threadz',   ?  - 
>   +  '/threadz',   -  u'/varz']   ?  -   +  '/varz']
> {code}
> +*Stacktrace*+
> {code:java}
> custom_cluster/test_web_pages.py:248: in test_webserver_hide_logs_link
> assert found_links == expected_catalog_links, msg
> E   AssertionError: bad links from webui port 25020
> E   assert ['/', '/catal...g_level', ...] == ['/', '/catalo...g_level', ...]
> E At index 2 diff: u'/events' != '/hadoop-varz'
> E Full diff:
> E - [u'/',
> E ?  -
> E + ['/',
> E -  u'/catalog',
> E ?  -
> E +  '/catalog',
> E -  u'/events',
> E -  u'/hadoop-varz',
> E ?  -
> E +  '/hadoop-varz',
> E +  '/events',
> E -  u'/jmx',
> E ?  -
> E +  '/jmx',
> E -  u'/log_level',
> E ?  -
> E +  '/log_level',
> E -  u'/memz',
> E ?  -
> E +  '/memz',
> E -  u'/metrics',
> E ?  -
> E +  '/metrics',
> E -  u'/operations',
> E ?  -
> E +  '/operations',
> E -  u'/profile_docs',
> E ?  -
> E +  '/profile_docs',
> E -  u'/rpcz',
> E ?  -
> E +  '/rpcz',
> E -  u'/threadz',
> E ?  -
> E +  '/threadz',
> E -  u'/varz']
> E ?  -
> E +  '/varz']
> {code}






[jira] [Commented] (IMPALA-12830) test_webserver_hide_logs_link() could fail in the exhaustive build

2024-02-21 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819426#comment-17819426
 ] 

Fang-Yu Rao commented on IMPALA-12830:
--

This issue seems to be similar to IMPALA-12170.




[jira] [Commented] (IMPALA-12830) test_webserver_hide_logs_link() could fail in the exhaustive build

2024-02-21 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819425#comment-17819425
 ] 

Fang-Yu Rao commented on IMPALA-12830:
--

Hi [~skatiyal], I assigned the JIRA to you since you revised the test case in 
IMPALA-9086 (Show Hive configurations in /hadoop-varz page) and thus may be 
more familiar with the context. Please feel free to re-assign it as you see 
fit. Thanks!




[jira] [Created] (IMPALA-12830) test_web_pages() could fail in the exhaustive build

2024-02-21 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12830:


 Summary: test_web_pages() could fail in the exhaustive build
 Key: IMPALA-12830
 URL: https://issues.apache.org/jira/browse/IMPALA-12830
 Project: IMPALA
  Issue Type: Bug
Reporter: Fang-Yu Rao
Assignee: Saurabh Katiyal


We found in an internal Jenkins run that test_web_pages() could fail in the 
exhaustive build with the following error.
+*Error Message*+
{code}
AssertionError: bad links from webui port 25020 assert ['/', 
'/catal...g_level', ...] == ['/', '/catalo...g_level', ...]   At index 2 diff: 
u'/events' != '/hadoop-varz'   Full diff:   - [u'/',   ?  -   + ['/',   -  
u'/catalog',   ?  -   +  '/catalog',   -  u'/events',   -  u'/hadoop-varz',   ? 
 -   +  '/hadoop-varz',   +  '/events',   -  u'/jmx',   ?  -   +  '/jmx',   -  
u'/log_level',   ?  -   +  '/log_level',   -  u'/memz',   ?  -   +  '/memz',   
-  u'/metrics',   ?  -   +  '/metrics',   -  u'/operations',   ?  -   +  
'/operations',   -  u'/profile_docs',   ?  -   +  '/profile_docs',   -  
u'/rpcz',   ?  -   +  '/rpcz',   -  u'/threadz',   ?  -   +  '/threadz',   -  
u'/varz']   ?  -   +  '/varz']
{code}

+*Stacktrace*+
{code}
custom_cluster/test_web_pages.py:248: in test_webserver_hide_logs_link
assert found_links == expected_catalog_links, msg
E   AssertionError: bad links from webui port 25020
E   assert ['/', '/catal...g_level', ...] == ['/', '/catalo...g_level', ...]
E At index 2 diff: u'/events' != '/hadoop-varz'
E Full diff:
E - [u'/',
E ?  -
E + ['/',
E -  u'/catalog',
E ?  -
E +  '/catalog',
E -  u'/events',
E -  u'/hadoop-varz',
E ?  -
E +  '/hadoop-varz',
E +  '/events',
E -  u'/jmx',
E ?  -
E +  '/jmx',
E -  u'/log_level',
E ?  -
E +  '/log_level',
E -  u'/memz',
E ?  -
E +  '/memz',
E -  u'/metrics',
E ?  -
E +  '/metrics',
E -  u'/operations',
E ?  -
E +  '/operations',
E -  u'/profile_docs',
E ?  -
E +  '/profile_docs',
E -  u'/rpcz',
E ?  -
E +  '/rpcz',
E -  u'/threadz',
E ?  -
E +  '/threadz',
E -  u'/varz']
E ?  -
E +  '/varz']
{code}
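To make the assertion failure above concrete, the sketch below copies the two link lists exactly as they appear in the pytest "Full diff" (the u'...' side is found_links, which matches the "At index 2 diff: u'/events' != '/hadoop-varz'" line) and compares them in plain Python. The helper first_mismatch is hypothetical and only for illustration. Judging only from this diff, both lists hold the same 13 links and differ solely in order: found_links is alphabetical, while the expected list puts '/hadoop-varz' before '/events'. This does not by itself establish the root cause.

```python
# Link lists copied from the "Full diff" in the failure above.
found_links = [u'/', u'/catalog', u'/events', u'/hadoop-varz', u'/jmx',
               u'/log_level', u'/memz', u'/metrics', u'/operations',
               u'/profile_docs', u'/rpcz', u'/threadz', u'/varz']
expected_catalog_links = ['/', '/catalog', '/hadoop-varz', '/events', '/jmx',
                          '/log_level', '/memz', '/metrics', '/operations',
                          '/profile_docs', '/rpcz', '/threadz', '/varz']


def first_mismatch(a, b):
    """Hypothetical helper: return (index, a[i], b[i]) at the first
    position where the two lists differ, or None if no position differs."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i, x, y
    return None


# Order-sensitive comparison, as in `assert found_links == expected_catalog_links`.
print(found_links == expected_catalog_links)                  # False
# Same links once order is ignored.
print(sorted(found_links) == sorted(expected_catalog_links))  # True
# The first differing position matches pytest's "At index 2 diff".
print(first_mismatch(found_links, expected_catalog_links))    # (2, '/events', '/hadoop-varz')
# found_links is already in alphabetical order.
print(found_links == sorted(found_links))                     # True
```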






[jira] [Updated] (IMPALA-12830) test_webserver_hide_logs_link() could fail in the exhaustive build

2024-02-21 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12830:
-
Summary: test_webserver_hide_logs_link() could fail in the exhaustive build 
 (was: test_web_pages() could fail in the exhaustive build)




[jira] [Updated] (IMPALA-12830) test_webserver_hide_logs_link() could fail in the exhaustive build

2024-02-21 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12830:
-
Description: 
We found in an internal Jenkins run that test_webserver_hide_logs_link() could 
fail in the exhaustive build with the following error.
+*Error Message*+
{code:java}
AssertionError: bad links from webui port 25020 assert ['/', 
'/catal...g_level', ...] == ['/', '/catalo...g_level', ...]   At index 2 diff: 
u'/events' != '/hadoop-varz'   Full diff:   - [u'/',   ?  -   + ['/',   -  
u'/catalog',   ?  -   +  '/catalog',   -  u'/events',   -  u'/hadoop-varz',   ? 
 -   +  '/hadoop-varz',   +  '/events',   -  u'/jmx',   ?  -   +  '/jmx',   -  
u'/log_level',   ?  -   +  '/log_level',   -  u'/memz',   ?  -   +  '/memz',   
-  u'/metrics',   ?  -   +  '/metrics',   -  u'/operations',   ?  -   +  
'/operations',   -  u'/profile_docs',   ?  -   +  '/profile_docs',   -  
u'/rpcz',   ?  -   +  '/rpcz',   -  u'/threadz',   ?  -   +  '/threadz',   -  
u'/varz']   ?  -   +  '/varz']
{code}
+*Stacktrace*+
{code:java}
custom_cluster/test_web_pages.py:248: in test_webserver_hide_logs_link
assert found_links == expected_catalog_links, msg
E   AssertionError: bad links from webui port 25020
E   assert ['/', '/catal...g_level', ...] == ['/', '/catalo...g_level', ...]
E At index 2 diff: u'/events' != '/hadoop-varz'
E Full diff:
E - [u'/',
E ?  -
E + ['/',
E -  u'/catalog',
E ?  -
E +  '/catalog',
E -  u'/events',
E -  u'/hadoop-varz',
E ?  -
E +  '/hadoop-varz',
E +  '/events',
E -  u'/jmx',
E ?  -
E +  '/jmx',
E -  u'/log_level',
E ?  -
E +  '/log_level',
E -  u'/memz',
E ?  -
E +  '/memz',
E -  u'/metrics',
E ?  -
E +  '/metrics',
E -  u'/operations',
E ?  -
E +  '/operations',
E -  u'/profile_docs',
E ?  -
E +  '/profile_docs',
E -  u'/rpcz',
E ?  -
E +  '/rpcz',
E -  u'/threadz',
E ?  -
E +  '/threadz',
E -  u'/varz']
E ?  -
E +  '/varz']
{code}

  was:
We found in an internal Jenkins run that test_web_pages() could fail in the 
exhaustive build with the following error.
+*Error Message*+
{code}
AssertionError: bad links from webui port 25020 assert ['/', 
'/catal...g_level', ...] == ['/', '/catalo...g_level', ...]   At index 2 diff: 
u'/events' != '/hadoop-varz'   Full diff:   - [u'/',   ?  -   + ['/',   -  
u'/catalog',   ?  -   +  '/catalog',   -  u'/events',   -  u'/hadoop-varz',   ? 
 -   +  '/hadoop-varz',   +  '/events',   -  u'/jmx',   ?  -   +  '/jmx',   -  
u'/log_level',   ?  -   +  '/log_level',   -  u'/memz',   ?  -   +  '/memz',   
-  u'/metrics',   ?  -   +  '/metrics',   -  u'/operations',   ?  -   +  
'/operations',   -  u'/profile_docs',   ?  -   +  '/profile_docs',   -  
u'/rpcz',   ?  -   +  '/rpcz',   -  u'/threadz',   ?  -   +  '/threadz',   -  
u'/varz']   ?  -   +  '/varz']
{code}

+*Stacktrace*+
{code}
custom_cluster/test_web_pages.py:248: in test_webserver_hide_logs_link
assert found_links == expected_catalog_links, msg
E   AssertionError: bad links from webui port 25020
E   assert ['/', '/catal...g_level', ...] == ['/', '/catalo...g_level', ...]
E At index 2 diff: u'/events' != '/hadoop-varz'
E Full diff:
E - [u'/',
E ?  -
E + ['/',
E -  u'/catalog',
E ?  -
E +  '/catalog',
E -  u'/events',
E -  u'/hadoop-varz',
E ?  -
E +  '/hadoop-varz',
E +  '/events',
E -  u'/jmx',
E ?  -
E +  '/jmx',
E -  u'/log_level',
E ?  -
E +  '/log_level',
E -  u'/memz',
E ?  -
E +  '/memz',
E -  u'/metrics',
E ?  -
E +  '/metrics',
E -  u'/operations',
E ?  -
E +  '/operations',
E -  u'/profile_docs',
E ?  -
E +  '/profile_docs',
E -  u'/rpcz',
E ?  -
E +  '/rpcz',
E -  u'/threadz',
E ?  -
E +  '/threadz',
E -  u'/varz']
E ?  -
E +  '/varz']
{code}



[jira] [Commented] (IMPALA-12819) InaccessibleObjectException found during LocalCatalogTest

2024-02-17 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818215#comment-17818215
 ] 

Fang-Yu Rao commented on IMPALA-12819:
--

Hi [~MikaelSmith], I assigned the JIRA to you since you helped with IMPALA-11260 
earlier and may be more familiar with the context. Please re-assign the ticket 
as you see fit. Thanks!


> InaccessibleObjectException found during LocalCatalogTest
> -
>
> Key: IMPALA-12819
> URL: https://issues.apache.org/jira/browse/IMPALA-12819
> Project: IMPALA
>  Issue Type: Bug
>  Components: fe
>Affects Versions: Impala 4.4.0
>Reporter: Fang-Yu Rao
>Assignee: Michael Smith
>Priority: Major
>  Labels: broken-build
>
> We found in an internal build that during LocalCatalogTest we could encounter 
> InaccessibleObjectException. This was found by the test 
> [test_no_inaccessible_objects|https://github.com/apache/impala/blob/master/tests/verifiers/test_banned_log_messages.py#L40C7-L40C35]
> {code:java}
> W0217 01:31:14.108255 18119 ObjectGraphWalker.java:251] The JVM is preventing 
> Ehcache from accessing the subgraph beneath 'private final 
> jdk.internal.platform.CgroupV1Metrics 
> jdk.internal.platform.CgroupV1MetricsImpl.metrics' - cache sizes may be 
> underestimated as a result
> Java exception follows:
> java.lang.reflect.InaccessibleObjectException: Unable to make field private 
> final jdk.internal.platform.CgroupV1Metrics 
> jdk.internal.platform.CgroupV1MetricsImpl.metrics accessible: module 
> java.base does not "opens jdk.internal.platform" to unnamed module @2c89cd7f
> at 
> java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:340)
> at 
> java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:280)
> at 
> java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:176)
> at java.base/java.lang.reflect.Field.setAccessible(Field.java:170)
> at 
> org.ehcache.sizeof.ObjectGraphWalker.getAllFields(ObjectGraphWalker.java:245)
> at 
> org.ehcache.sizeof.ObjectGraphWalker.getFilteredFields(ObjectGraphWalker.java:204)
> at 
> org.ehcache.sizeof.ObjectGraphWalker.walk(ObjectGraphWalker.java:159)
> at org.ehcache.sizeof.SizeOf.deepSizeOf(SizeOf.java:74)
> at 
> org.apache.impala.catalog.local.CatalogdMetaProvider$SizeOfWeigher.weigh(CatalogdMetaProvider.java:2234)
> at 
> com.google.common.cache.LocalCache$Segment.setValue(LocalCache.java:2043)
> at 
> com.google.common.cache.LocalCache$Segment.replace(LocalCache.java:2990)
> at com.google.common.cache.LocalCache.replace(LocalCache.java:4324)
> at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:569)
> at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.loadIcebergApiTable(CatalogdMetaProvider.java:1160)
> at 
> org.apache.impala.catalog.local.LocalIcebergTable.loadIcebergTableViaMetaProvider(LocalIcebergTable.java:96)
> at 
> org.apache.impala.catalog.local.LocalTable.load(LocalTable.java:131)
> at 
> org.apache.impala.catalog.local.LocalTable.load(LocalTable.java:114)
> at org.apache.impala.catalog.local.LocalDb.getTable(LocalDb.java:148)
> at 
> org.apache.impala.catalog.local.LocalCatalog.getTable(LocalCatalog.java:139)
> at 
> org.apache.impala.catalog.local.LocalCatalogTest.testLoadIcebergFileDescriptors(LocalCatalogTest.java:280)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at 

[jira] [Created] (IMPALA-12819) InaccessibleObjectException found during LocalCatalogTest

2024-02-17 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12819:


 Summary: InaccessibleObjectException found during LocalCatalogTest
 Key: IMPALA-12819
 URL: https://issues.apache.org/jira/browse/IMPALA-12819
 Project: IMPALA
  Issue Type: Bug
  Components: fe
Affects Versions: Impala 4.4.0
Reporter: Fang-Yu Rao
Assignee: Michael Smith


We found in an internal build that LocalCatalogTest could encounter an 
InaccessibleObjectException. It was detected by the test 
[test_no_inaccessible_objects|https://github.com/apache/impala/blob/master/tests/verifiers/test_banned_log_messages.py#L40C7-L40C35]:
{code:java}
W0217 01:31:14.108255 18119 ObjectGraphWalker.java:251] The JVM is preventing Ehcache from accessing the subgraph beneath 'private final jdk.internal.platform.CgroupV1Metrics jdk.internal.platform.CgroupV1MetricsImpl.metrics' - cache sizes may be underestimated as a result
Java exception follows:
java.lang.reflect.InaccessibleObjectException: Unable to make field private final jdk.internal.platform.CgroupV1Metrics jdk.internal.platform.CgroupV1MetricsImpl.metrics accessible: module java.base does not "opens jdk.internal.platform" to unnamed module @2c89cd7f
	at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:340)
	at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:280)
	at java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:176)
	at java.base/java.lang.reflect.Field.setAccessible(Field.java:170)
	at org.ehcache.sizeof.ObjectGraphWalker.getAllFields(ObjectGraphWalker.java:245)
	at org.ehcache.sizeof.ObjectGraphWalker.getFilteredFields(ObjectGraphWalker.java:204)
	at org.ehcache.sizeof.ObjectGraphWalker.walk(ObjectGraphWalker.java:159)
	at org.ehcache.sizeof.SizeOf.deepSizeOf(SizeOf.java:74)
	at org.apache.impala.catalog.local.CatalogdMetaProvider$SizeOfWeigher.weigh(CatalogdMetaProvider.java:2234)
	at com.google.common.cache.LocalCache$Segment.setValue(LocalCache.java:2043)
	at com.google.common.cache.LocalCache$Segment.replace(LocalCache.java:2990)
	at com.google.common.cache.LocalCache.replace(LocalCache.java:4324)
	at org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:569)
	at org.apache.impala.catalog.local.CatalogdMetaProvider.loadIcebergApiTable(CatalogdMetaProvider.java:1160)
	at org.apache.impala.catalog.local.LocalIcebergTable.loadIcebergTableViaMetaProvider(LocalIcebergTable.java:96)
	at org.apache.impala.catalog.local.LocalTable.load(LocalTable.java:131)
	at org.apache.impala.catalog.local.LocalTable.load(LocalTable.java:114)
	at org.apache.impala.catalog.local.LocalDb.getTable(LocalDb.java:148)
	at org.apache.impala.catalog.local.LocalCatalog.getTable(LocalCatalog.java:139)
	at org.apache.impala.catalog.local.LocalCatalogTest.testLoadIcebergFileDescriptors(LocalCatalogTest.java:280)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:316)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:240)
	at 

[jira] [Updated] (IMPALA-11743) Support the OWNER privilege for UDFs in Impala

2024-01-05 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-11743:
-
Summary: Support the OWNER privilege for UDFs in Impala  (was: Investigate 
how to support the OWNER privilege for UDFs in Impala)

> Support the OWNER privilege for UDFs in Impala
> --
>
> Key: IMPALA-11743
> URL: https://issues.apache.org/jira/browse/IMPALA-11743
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> Currently in Impala a user allowed to create a UDF in a database still has to 
> be explicitly granted the necessary privileges to execute the UDF later in a 
> SELECT query. It would be more convenient if the ownership information of a 
> UDF could also be retrieved during the query analysis of such SELECT queries 
> so that the owner/creator of a UDF will be allowed to execute the UDF without 
> being explicitly granted the necessary privileges on the UDF.






[jira] [Commented] (IMPALA-12578) Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for databases, tables, and columns

2024-01-05 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17803733#comment-17803733
 ] 

Fang-Yu Rao commented on IMPALA-12578:
--

I separated the case of UDFs from this JIRA because Impala currently does not 
have the concept of an owner for UDFs. Based on what we have seen in 
IMPALA-11743, the changes needed to support UDF ownership will be complicated, 
so it is better to track the case of UDFs in a separate JIRA.

> Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for 
> databases, tables, and columns
> ---
>
> Key: IMPALA-12578
> URL: https://issues.apache.org/jira/browse/IMPALA-12578
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> Starting from RANGER-1200, Ranger supports the notion of the OWNER user, 
> which allows each user to perform any operation on the resources it owns. 
> This avoids the need to create a new policy that grants the OWNER user the 
> privileges on every newly created resource. Refer to 
> [apache-ranger-policy-model|https://blogsarchive.apache.org/ranger/entry/apache-ranger-policy-model#:~:text=allow%20each%20user%20to%20access%20all,all].
> Currently, for the GRANT and REVOKE statements, Impala does not pass the owner 
> of the resource to the Ranger plug-in, so a non-administrative user 
> cannot grant/revoke privileges on a resource to/from another user even 
> though this non-administrative user owns the resource. We should pass the 
> ownership information to the Ranger plug-in to make authorization management 
> easier in Impala.
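The OWNER semantics described for IMPALA-12578 can be illustrated with a toy model. This is a minimal sketch under stated assumptions: may_grant, grant_option, and the "alice"/"db1.tbl1" values are all hypothetical, and the function is not Ranger's plug-in API or Impala's code. It only shows why the owner has to be passed along: without it, the owner shortcut can never fire for a non-administrative user who has no explicit policy.

```python
from typing import Optional, Set, Tuple


def may_grant(user: str, resource: str, is_admin: bool,
              grant_option: Set[Tuple[str, str]],
              owner: Optional[str] = None) -> bool:
    """Hypothetical toy check: may `user` GRANT/REVOKE privileges on `resource`?

    `grant_option` holds explicit (user, resource) policies; `owner` is the
    resource owner, which is None when the caller does not pass it along.
    """
    if is_admin:
        return True
    # OWNER shortcut: only usable when the caller supplies the resource owner.
    if owner is not None and owner == user:
        return True
    # Otherwise fall back to an explicit policy granting `user` the grant option.
    return (user, resource) in grant_option


no_policies: Set[Tuple[str, str]] = set()
# Owner not passed: the non-administrative owner "alice" is denied.
print(may_grant("alice", "db1.tbl1", False, no_policies))                 # False
# Owner passed: the same request is allowed without any extra policy.
print(may_grant("alice", "db1.tbl1", False, no_policies, owner="alice"))  # True
```

In this model, passing the owner changes the outcome only for non-administrative users without an explicit policy, which is the case the issue describes.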






[jira] [Updated] (IMPALA-12685) Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for UDFs

2024-01-05 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12685:
-
Summary: Pass the owner user to Ranger plug-in in GRANT and REVOKE 
statements for UDFs  (was: Pass the owner user to Ranger plug-in in GRANT and 
REVOKE statements for UDF)

> Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for UDFs
> -
>
> Key: IMPALA-12685
> URL: https://issues.apache.org/jira/browse/IMPALA-12685
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> This is the follow-up to IMPALA-12578, where we tackle the cases of 
> databases, tables, and columns.






[jira] [Created] (IMPALA-12685) Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for UDF

2024-01-05 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12685:


 Summary: Pass the owner user to Ranger plug-in in GRANT and REVOKE 
statements for UDF
 Key: IMPALA-12685
 URL: https://issues.apache.org/jira/browse/IMPALA-12685
 Project: IMPALA
  Issue Type: New Feature
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao


This is the follow-up to IMPALA-12578, where we tackle the cases of databases, 
tables, and columns.






[jira] [Updated] (IMPALA-12578) Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for databases, tables, and columns

2024-01-05 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12578:
-
Summary: Pass the owner user to Ranger plug-in in GRANT and REVOKE 
statements for databases, tables, and columns  (was: Pass the owner user to the 
Ranger plug-in in GRANT and REVOKE statements)

> Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for 
> databases, tables, and columns
> ---
>
> Key: IMPALA-12578
> URL: https://issues.apache.org/jira/browse/IMPALA-12578
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> Starting from RANGER-1200, Ranger supports the notion of the OWNER user, 
> which allows each user to perform any operation on the resources owned by it. 
> This avoids the need for creating a new policy that grants the OWNER user the 
> privileges on every newly created resource. Refer to 
> [apache-ranger-policy-model|https://blogsarchive.apache.org/ranger/entry/apache-ranger-policy-model#:~:text=allow%20each%20user%20to%20access%20all,all].
> Currently for the GRANT and REVOKE statements, Impala does not pass the owner 
> of the resource to the Ranger plug-in and thus a non-administrative user 
> could not grant/revoke privileges on a resource to/from another user even 
> though this non-administrative user owns the resource. We should pass the 
> ownership information to the Ranger plug-in to make authorization management 
> easier in Impala.






[jira] [Comment Edited] (IMPALA-11743) Investigate how to support the OWNER privilege for UDFs in Impala

2024-01-05 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17803730#comment-17803730
 ] 

Fang-Yu Rao edited comment on IMPALA-11743 at 1/6/24 12:16 AM:
---

This JIRA is related to IMPALA-12578 where we would like to pass to the Ranger 
plug-in the owner of a resource involved in a GRANT/REVOKE statement.

Specifically, in the case when the resource is a user-defined function (UDF), 
Impala has to load this piece of information when instantiating user-defined 
functions in 
[CatalogServiceCatalog.java#loadJavaFunctions()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L1812C16-L1836]
 so that the owner of a UDF will be available in Impala's internal 
representation of it, i.e., 
[Function.java|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/Function.java].

On a related note, in 
[hive_metastore.thrift|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift],
 Hive already has a field of 'ownerName' for a user-defined function.
{code:java}
struct Function {
  1: string   functionName,
  2: string   dbName,
  3: string   className,
  4: string   ownerName,
  5: PrincipalType    ownerType,
  6: i32  createTime,
  7: FunctionType functionType,
  8: list resourceUris,
  9: optional string  catName
}
{code}
 
On the other hand, when an authorized user is creating a persistent UDF via 
Impala, Impala should also pass the requesting user as the owner of the UDF to 
Hive MetaStore. This way Impala will be able to load the owner of a UDF in 
CatalogServiceCatalog.java#loadJavaFunctions() mentioned above.
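The two steps above (record the requesting user as owner at creation time, then copy the owner into Impala's in-memory UDF when loading) can be sketched as follows. This is a minimal illustration under stated assumptions: HmsFunction and CatalogUdf are hypothetical stand-ins for the HMS thrift Function and Impala's Function.java, not the real Hive or Impala classes, and the real loading logic in loadJavaFunctions() is not modeled.

```java
// Hypothetical sketch: HmsFunction and CatalogUdf are simplified stand-ins,
// not Hive's metastore API classes or Impala's Function.java.
public class UdfOwnerSketch {

  // Subset of the fields in the HMS thrift Function struct.
  static final class HmsFunction {
    final String functionName;
    final String dbName;
    final String ownerName; // may be null, e.g. for UDFs created without an owner

    HmsFunction(String functionName, String dbName, String ownerName) {
      this.functionName = functionName;
      this.dbName = dbName;
      this.ownerName = ownerName;
    }
  }

  // Stand-in for Impala's in-memory UDF, extended to carry the owner.
  static final class CatalogUdf {
    final String fullName;
    final String owner;

    CatalogUdf(String fullName, String owner) {
      this.fullName = fullName;
      this.owner = owner;
    }
  }

  // Creation path: record the requesting user as the owner in HMS.
  static HmsFunction create(String dbName, String functionName, String requestingUser) {
    return new HmsFunction(functionName, dbName, requestingUser);
  }

  // Loading path: copy ownerName into the catalog representation.
  static CatalogUdf load(HmsFunction fn) {
    return new CatalogUdf(fn.dbName + "." + fn.functionName, fn.ownerName);
  }

  // Ownership check an authorizer could consult.
  static boolean isOwner(CatalogUdf udf, String user) {
    return udf.owner != null && udf.owner.equals(user);
  }

  public static void main(String[] args) {
    CatalogUdf udf = load(create("default", "my_udf", "alice"));
    System.out.println(udf.fullName + " owner=alice? " + isOwner(udf, "alice")); // true
    System.out.println(udf.fullName + " owner=bob? " + isOwner(udf, "bob"));     // false
  }
}
```

A UDF loaded without an owner (ownerName null) is never treated as owned, so it would still require explicit privileges.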



was (Author: fangyurao):
This JIRA is related to IMPALA-12578 where we would like to pass to the Ranger 
plug-in the owner of a resource involved in a GRANT/REVOKE statement.

Specifically, in the case when the resource is a user-defined function (UDF), 
Impala has to load this piece of information when instantiating user-defined 
functions in 
[CatalogServiceCatalog.java#loadJavaFunctions()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L1812C16-L1836]
 so that the owner of a UDF will be available in Impala's internal 
representation of it, i.e., 
[Function.java|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/Function.java].

On a related note, in 
[hive_metastore.thrift|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift],
 Hive already has a field of 'ownerName' for a user-defined function.
{code:java}
struct Function {
  1: string   functionName,
  2: string   dbName,
  3: string   className,
  4: string   ownerName,
  5: PrincipalType    ownerType,
  6: i32  createTime,
  7: FunctionType functionType,
  8: list resourceUris,
  9: optional string  catName
}
{code}
 

> Investigate how to support the OWNER privilege for UDFs in Impala
> -
>
> Key: IMPALA-11743
> URL: https://issues.apache.org/jira/browse/IMPALA-11743
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> Currently in Impala a user allowed to create a UDF in a database still has to 
> be explicitly granted the necessary privileges to execute the UDF later in a 
> SELECT query. It would be more convenient if the ownership information of a 
> UDF could also be retrieved during the query analysis of such SELECT queries 
> so that the owner/creator of a UDF will be allowed to execute the UDF without 
> being explicitly granted the necessary privileges on the UDF.






[jira] [Commented] (IMPALA-11743) Investigate how to support the OWNER privilege for UDFs in Impala

2024-01-05 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17803730#comment-17803730
 ] 

Fang-Yu Rao commented on IMPALA-11743:
--

This JIRA is related to IMPALA-12578 where we would like to pass to the Ranger 
plug-in the owner of a resource involved in a GRANT/REVOKE statement.

Specifically, in the case when the resource is a user-defined function (UDF), 
Impala has to load this piece of information when instantiating user-defined 
functions in 
[CatalogServiceCatalog.java#loadJavaFunctions()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L1812C16-L1836]
 so that the owner of a UDF will be available in Impala's internal 
representation of it, i.e., 
[Function.java|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/Function.java].

On a related note, in 
[hive_metastore.thrift|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift],
 Hive already has a field of 'ownerName' for a user-defined function.
{code:java}
struct Function {
  1: string   functionName,
  2: string   dbName,
  3: string   className,
  4: string   ownerName,
  5: PrincipalType    ownerType,
  6: i32  createTime,
  7: FunctionType functionType,
  8: list resourceUris,
  9: optional string  catName
}
{code}
 

> Investigate how to support the OWNER privilege for UDFs in Impala
> -
>
> Key: IMPALA-11743
> URL: https://issues.apache.org/jira/browse/IMPALA-11743
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> Currently in Impala a user allowed to create a UDF in a database still has to 
> be explicitly granted the necessary privileges to execute the UDF later in a 
> SELECT query. It would be more convenient if the ownership information of a 
> UDF could also be retrieved during the query analysis of such SELECT queries 
> so that the owner/creator of a UDF will be allowed to execute the UDF without 
> being explicitly granted the necessary privileges on the UDF.






[jira] [Reopened] (IMPALA-12554) Create only one Ranger policy for GRANT statement

2023-12-22 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao reopened IMPALA-12554:
--

> Create only one Ranger policy for GRANT statement
> -
>
> Key: IMPALA-12554
> URL: https://issues.apache.org/jira/browse/IMPALA-12554
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> Currently Impala would create a Ranger policy for each column specified in a 
> GRANT statement. For instance, after the following query, 3 Ranger policies 
> would be created on the Ranger server. This could result in a lot of policies 
> created when there are many columns specified and it may result in Impala's 
> Ranger plug-in taking a long time to download the policies from the Ranger 
> server. It would be great if Impala only creates one single policy for 
> columns in the same table.
> {code:java}
> [localhost:21050] default> grant select(id, bool_col, tinyint_col) on table 
> functional.alltypes to user non_owner;
> Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes 
> to user non_owner
> Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000)
> Query progress can be monitored at: 
> http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69
> +---------------------------------+
> | summary                         |
> +---------------------------------+
> | Privilege(s) have been granted. |
> +---------------------------------+
> Fetched 1 row(s) in 0.67s
> {code}






[jira] [Resolved] (IMPALA-12554) Create only one Ranger policy for GRANT statement

2023-12-22 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-12554.
--
Resolution: Implemented

> Create only one Ranger policy for GRANT statement
> -
>
> Key: IMPALA-12554
> URL: https://issues.apache.org/jira/browse/IMPALA-12554
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> Currently Impala would create a Ranger policy for each column specified in a 
> GRANT statement. For instance, after the following query, 3 Ranger policies 
> would be created on the Ranger server. This could result in a lot of policies 
> created when there are many columns specified and it may result in Impala's 
> Ranger plug-in taking a long time to download the policies from the Ranger 
> server. It would be great if Impala only creates one single policy for 
> columns in the same table.
> {code:java}
> [localhost:21050] default> grant select(id, bool_col, tinyint_col) on table 
> functional.alltypes to user non_owner;
> Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes 
> to user non_owner
> Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000)
> Query progress can be monitored at: 
> http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69
> +---------------------------------+
> | summary                         |
> +---------------------------------+
> | Privilege(s) have been granted. |
> +---------------------------------+
> Fetched 1 row(s) in 0.67s
> {code}






[jira] [Resolved] (IMPALA-12554) Create only one Ranger policy for GRANT statement

2023-12-22 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-12554.
--
Resolution: Later

After some manual testing, we found that RANGER-4585 has some bugs. For example, 
the REVOKE REST API call is not able to revoke the privilege on multiple columns 
from a grantee that was granted the SELECT privilege on the same set of 
columns. Until this is fixed, we are resolving the ticket for now and will 
re-open it once the issue is fixed in a follow-up RANGER JIRA.

 

> Create only one Ranger policy for GRANT statement
> -
>
> Key: IMPALA-12554
> URL: https://issues.apache.org/jira/browse/IMPALA-12554
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> Currently Impala would create a Ranger policy for each column specified in a 
> GRANT statement. For instance, after the following query, 3 Ranger policies 
> would be created on the Ranger server. This could result in a lot of policies 
> created when there are many columns specified and it may result in Impala's 
> Ranger plug-in taking a long time to download the policies from the Ranger 
> server. It would be great if Impala only creates one single policy for 
> columns in the same table.
> {code:java}
> [localhost:21050] default> grant select(id, bool_col, tinyint_col) on table 
> functional.alltypes to user non_owner;
> Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes 
> to user non_owner
> Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000)
> Query progress can be monitored at: 
> http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69
> +---------------------------------+
> | summary                         |
> +---------------------------------+
> | Privilege(s) have been granted. |
> +---------------------------------+
> Fetched 1 row(s) in 0.67s
> {code}






[jira] [Updated] (IMPALA-12578) Pass the owner user to the Ranger plug-in in GRANT and REVOKE statements

2023-11-27 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12578:
-
Description: 
Starting from RANGER-1200, Ranger supports the notion of the OWNER user, which 
allows each user to perform any operation on the resources owned by it. This 
avoids the need for creating a new policy that grants the OWNER user the 
privileges on every newly created resource. Refer to 
[apache-ranger-policy-model|https://blogsarchive.apache.org/ranger/entry/apache-ranger-policy-model#:~:text=allow%20each%20user%20to%20access%20all,all].

Currently for the GRANT and REVOKE statements, Impala does not pass the owner 
of the resource to the Ranger plug-in and thus a non-administrative user could 
not grant/revoke privileges on a resource to/from another user even though this 
non-administrative user owns the resource. We should pass the ownership 
information to the Ranger plug-in to make authorization management easier in 
Impala.

  was:
Starting from RANGER-1200, Ranger supports the notion of the OWNER user, which 
allows each user to perform any operation on the resources owned by them. This 
avoids the need for creating a new policy that grants the OWNER user the 
privileges on every newly created resource. Refer to 
[apache-ranger-policy-model|https://blogsarchive.apache.org/ranger/entry/apache-ranger-policy-model#:~:text=allow%20each%20user%20to%20access%20all,all].

Currently for the GRANT and REVOKE statements, Impala does not pass the owner 
of the resource to the Ranger plug-in and thus a non-administrative user could 
not grant/revoke privileges on a resource to/from another user even though this 
non-administrative user owns the resource. We should pass the ownership 
information to the Ranger plug-in to make authorization management easier in 
Impala.


> Pass the owner user to the Ranger plug-in in GRANT and REVOKE statements
> 
>
> Key: IMPALA-12578
> URL: https://issues.apache.org/jira/browse/IMPALA-12578
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> Starting from RANGER-1200, Ranger supports the notion of the OWNER user, 
> which allows each user to perform any operation on the resources owned by it. 
> This avoids the need for creating a new policy that grants the OWNER user the 
> privileges on every newly created resource. Refer to 
> [apache-ranger-policy-model|https://blogsarchive.apache.org/ranger/entry/apache-ranger-policy-model#:~:text=allow%20each%20user%20to%20access%20all,all].
> Currently for the GRANT and REVOKE statements, Impala does not pass the owner 
> of the resource to the Ranger plug-in and thus a non-administrative user 
> could not grant/revoke privileges on a resource to/from another user even 
> though this non-administrative user owns the resource. We should pass the 
> ownership information to the Ranger plug-in to make authorization management 
> easier in Impala.






[jira] [Created] (IMPALA-12578) Pass the owner user to the Ranger plug-in in GRANT and REVOKE statements

2023-11-27 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12578:


 Summary: Pass the owner user to the Ranger plug-in in GRANT and 
REVOKE statements
 Key: IMPALA-12578
 URL: https://issues.apache.org/jira/browse/IMPALA-12578
 Project: IMPALA
  Issue Type: New Feature
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao


Starting from RANGER-1200, Ranger supports the notion of the OWNER user, which 
allows each user to perform any operation on the resources owned by them. This 
avoids the need for creating a new policy that grants the OWNER user the 
privileges on every newly created resource. Refer to 
[apache-ranger-policy-model|https://blogsarchive.apache.org/ranger/entry/apache-ranger-policy-model#:~:text=allow%20each%20user%20to%20access%20all,all].

Currently for the GRANT and REVOKE statements, Impala does not pass the owner 
of the resource to the Ranger plug-in and thus a non-administrative user could 
not grant/revoke privileges on a resource to/from another user even though this 
non-administrative user owns the resource. We should pass the ownership 
information to the Ranger plug-in to make authorization management easier in 
Impala.
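To make the motivation concrete, here is a minimal sketch of the decision that passing the owner enables: without ownership information only an administrator can grant, while with it an owner can grant on its own resource. GrantRequest and mayGrant are hypothetical names chosen for illustration; they are not Ranger's or Impala's API, and the actual check is performed by Ranger, which this sketch does not model.

```java
import java.util.Set;

// Hypothetical sketch of an owner-aware GRANT/REVOKE check.
// GrantRequest and mayGrant are illustrative names, not Ranger's or Impala's API.
public class OwnerGrantSketch {

  static final class GrantRequest {
    final String grantor;       // user issuing GRANT/REVOKE
    final String grantee;       // user receiving or losing the privilege
    final String resource;      // e.g. "db1.t1"
    final String resourceOwner; // owner passed along by Impala; null if not passed

    GrantRequest(String grantor, String grantee, String resource, String resourceOwner) {
      this.grantor = grantor;
      this.grantee = grantee;
      this.resource = resource;
      this.resourceOwner = resourceOwner;
    }
  }

  // Admins may always grant; otherwise the grantor must own the resource,
  // which can only be verified if the owner was passed with the request.
  static boolean mayGrant(GrantRequest req, Set<String> admins) {
    if (admins.contains(req.grantor)) {
      return true;
    }
    return req.resourceOwner != null && req.resourceOwner.equals(req.grantor);
  }

  public static void main(String[] args) {
    Set<String> admins = Set.of("admin");
    GrantRequest withOwner = new GrantRequest("alice", "bob", "db1.t1", "alice");
    GrantRequest withoutOwner = new GrantRequest("alice", "bob", "db1.t1", null);
    System.out.println(mayGrant(withOwner, admins));    // true
    System.out.println(mayGrant(withoutOwner, admins)); // false
  }
}
```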






[jira] [Updated] (IMPALA-12554) Create only one Ranger policy for GRANT statement

2023-11-24 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12554:
-
Description: 
Currently Impala would create a Ranger policy for each column specified in a 
GRANT statement. For instance, after the following query, 3 Ranger policies 
would be created on the Ranger server. This could result in a lot of policies 
created when there are many columns specified and it may result in Impala's 
Ranger plug-in taking a long time to download the policies from the Ranger 
server. It would be great if Impala only creates one single policy for columns 
in the same table.
{code:java}
[localhost:21050] default> grant select(id, bool_col, tinyint_col) on table 
functional.alltypes to user non_owner;
Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes to 
user non_owner
Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000)
Query progress can be monitored at: 
http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69
+---------------------------------+
| summary                         |
+---------------------------------+
| Privilege(s) have been granted. |
+---------------------------------+
Fetched 1 row(s) in 0.67s
{code}

  was:
Currently Impala would create a Ranger policy for each column specified in a 
GRANT statement. For instance, after the following query, 3 Ranger policies 
would be created on the Ranger server. This could result in a lot of policies 
created when there are many columns specified and it may cause Impala's Ranger 
plug-in a long time to download the policies from the Ranger server. It would 
be great if Impala only creates one single policy for columns in the same table.
{code}
[localhost:21050] default> grant select(id, bool_col, tinyint_col) on table 
functional.alltypes to user non_owner;
Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes to 
user non_owner
Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000)
Query progress can be monitored at: 
http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69
+---------------------------------+
| summary                         |
+---------------------------------+
| Privilege(s) have been granted. |
+---------------------------------+
Fetched 1 row(s) in 0.67s
{code}


> Create only one Ranger policy for GRANT statement
> -
>
> Key: IMPALA-12554
> URL: https://issues.apache.org/jira/browse/IMPALA-12554
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> Currently Impala would create a Ranger policy for each column specified in a 
> GRANT statement. For instance, after the following query, 3 Ranger policies 
> would be created on the Ranger server. This could result in a lot of policies 
> created when there are many columns specified and it may result in Impala's 
> Ranger plug-in taking a long time to download the policies from the Ranger 
> server. It would be great if Impala only creates one single policy for 
> columns in the same table.
> {code:java}
> [localhost:21050] default> grant select(id, bool_col, tinyint_col) on table 
> functional.alltypes to user non_owner;
> Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes 
> to user non_owner
> Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000)
> Query progress can be monitored at: 
> http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69
> +-+
> | summary |
> +-+
> | Privilege(s) have been granted. |
> +-+
> Fetched 1 row(s) in 0.67s
> {code}






[jira] [Updated] (IMPALA-3268) Add command "SHOW VIEWS"

2023-11-22 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-3268:

Description: 
Currently, to get a list of views, a user has to:
 - SHOW TABLES
 - scan through the output list
 - SHOW CREATE TABLE view_name to confirm view_name is a view

which is tedious.

So I would like to request the following:
 - -SHOW TABLES should only return tables-
 - SHOW VIEWS should only return views
 - -add a flag to either above commands to return all tables and views-

This will help lots of end users.

Edit: Moved the first item and the third item out of the scope of this JIRA to 
IMPALA-12574 since more discussion may be required.

  was:
Currently, to get a list of views, a user has to:

- SHOW TABLES
- scan through the output list
- SHOW CREATE TABLE view_name to confirm view_name is a view

which is tedious.

So I would like to request the following:

- SHOW TABLES should only return tables
- SHOW VIEWS should only return views
- add a flag to either above commands to return all tables and views

This will help lots of end users.


> Add command "SHOW VIEWS"
> 
>
> Key: IMPALA-3268
> URL: https://issues.apache.org/jira/browse/IMPALA-3268
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Catalog
>Affects Versions: Impala 2.2.4, Impala 2.3.0, Impala 2.5.0
>Reporter: Eric Lin
>Assignee: Fang-Yu Rao
>Priority: Minor
>  Labels: usability
>
> Currently, to get a list of views, a user has to:
>  - SHOW TABLES
>  - scan through the output list
>  - SHOW CREATE TABLE view_name to confirm view_name is a view
> which is tedious.
> So I would like to request the following:
>  - -SHOW TABLES should only return tables-
>  - SHOW VIEWS should only return views
>  - -add a flag to either above commands to return all tables and views-
> This will help lots of end users.
> Edit: Moved the first item and the third item out of the scope of this JIRA 
> to IMPALA-12574 since more discussion may be required.






[jira] [Updated] (IMPALA-12574) Consider extending SHOW TABLES statement so it only display tables

2023-11-22 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12574:
-
Summary: Consider extending SHOW TABLES statement so it only display tables 
 (was: Consider extending SHOW TABLES statement so it only display the tables)

> Consider extending SHOW TABLES statement so it only display tables
> --
>
> Key: IMPALA-12574
> URL: https://issues.apache.org/jira/browse/IMPALA-12574
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Catalog, Frontend
>Reporter: Fang-Yu Rao
>Priority: Minor
>
> IMPALA-3268 extended Frontend's API of GetTableNames() such that 
> GetTableNames() could return the matching tables whose table type is in the 
> specified set of table types. With this change, it should not be too 
> difficult to extend the SHOW TABLES statement such that SHOW TABLES could 
> display only the tables of a specified type (vs. all types of tables). It 
> would be great to have this functionality.






[jira] [Created] (IMPALA-12574) Consider extending SHOW TABLES statement so it only display the tables

2023-11-22 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12574:


 Summary: Consider extending SHOW TABLES statement so it only 
display the tables
 Key: IMPALA-12574
 URL: https://issues.apache.org/jira/browse/IMPALA-12574
 Project: IMPALA
  Issue Type: New Feature
  Components: Catalog, Frontend
Reporter: Fang-Yu Rao


IMPALA-3268 extended Frontend's API of GetTableNames() such that 
GetTableNames() could return the matching tables whose table type is in the 
specified set of table types. With this change, it should not be too difficult 
to extend the SHOW TABLES statement such that SHOW TABLES could display only 
the tables of a specified type (vs. all types of tables). It would be great to 
have this functionality.
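The type-based filtering described above can be sketched roughly as follows. The TableType enum, the in-memory map, and this getTableNames() signature are simplifying assumptions for illustration only; they are not the actual Frontend.getTableNames() API extended in IMPALA-3268.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: filter table names by a requested set of table types.
// Not Impala's Frontend API; the catalog is modeled as a simple name -> type map.
public class TableTypeFilterSketch {

  enum TableType { TABLE, VIEW, MATERIALIZED_VIEW }

  static List<String> getTableNames(Map<String, TableType> tables, Set<TableType> wanted) {
    List<String> names = new ArrayList<>();
    for (Map.Entry<String, TableType> e : tables.entrySet()) {
      if (wanted.contains(e.getValue())) {
        names.add(e.getKey());
      }
    }
    return names;
  }

  public static void main(String[] args) {
    Map<String, TableType> tables = new LinkedHashMap<>(); // keeps insertion order
    tables.put("alltypes", TableType.TABLE);
    tables.put("alltypes_view", TableType.VIEW);
    tables.put("alltypes_mv", TableType.MATERIALIZED_VIEW);

    // Current SHOW TABLES: every type.
    System.out.println(getTableNames(tables, EnumSet.allOf(TableType.class)));
    // SHOW VIEWS-style filter.
    System.out.println(getTableNames(tables, EnumSet.of(TableType.VIEW)));
    // Possible "tables only" variant of SHOW TABLES.
    System.out.println(getTableNames(tables, EnumSet.of(TableType.TABLE)));
  }
}
```

Passing a set of types (rather than a single type) lets the same call serve SHOW TABLES, SHOW VIEWS, and any combined variant.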






[jira] [Created] (IMPALA-12554) Create only one Ranger policy for GRANT statement

2023-11-10 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12554:


 Summary: Create only one Ranger policy for GRANT statement
 Key: IMPALA-12554
 URL: https://issues.apache.org/jira/browse/IMPALA-12554
 Project: IMPALA
  Issue Type: Improvement
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao


Currently Impala would create a Ranger policy for each column specified in a 
GRANT statement. For instance, after the following query, 3 Ranger policies 
would be created on the Ranger server. This could result in a lot of policies 
created when there are many columns specified and it may cause Impala's Ranger 
plug-in to take a long time to download the policies from the Ranger server. It 
would be great if Impala only creates one single policy for columns in the same 
table.
{code}
[localhost:21050] default> grant select(id, bool_col, tinyint_col) on table 
functional.alltypes to user non_owner;
Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes to 
user non_owner
Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000)
Query progress can be monitored at: 
http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69
+---------------------------------+
| summary                         |
+---------------------------------+
| Privilege(s) have been granted. |
+---------------------------------+
Fetched 1 row(s) in 0.67s
{code}
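The difference between the current and proposed behavior can be sketched as follows, using the three columns from the GRANT above. The Policy class is a hypothetical, simplified model, not Ranger's policy API; the sketch only shows how per-column creation yields one policy per column while a per-table merge yields a single policy listing all of them.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: one-policy-per-column vs. one-policy-per-table.
// Policy is an illustrative model, not Ranger's RangerPolicy API.
public class ColumnPolicySketch {

  static final class Policy {
    final String table;
    final Set<String> columns;
    final String grantee;
    final String privilege;

    Policy(String table, Set<String> columns, String grantee, String privilege) {
      this.table = table;
      this.columns = columns;
      this.grantee = grantee;
      this.privilege = privilege;
    }
  }

  // Behavior described in the ticket: one policy for every listed column.
  static List<Policy> perColumn(String table, List<String> columns,
                                String grantee, String privilege) {
    List<Policy> policies = new ArrayList<>();
    for (String column : columns) {
      policies.add(new Policy(table, Set.of(column), grantee, privilege));
    }
    return policies;
  }

  // Proposed behavior: a single policy covering all listed columns of the table.
  static List<Policy> perTable(String table, List<String> columns,
                               String grantee, String privilege) {
    Set<String> merged = new LinkedHashSet<>(columns); // keeps GRANT order, drops duplicates
    return List.of(new Policy(table, merged, grantee, privilege));
  }

  public static void main(String[] args) {
    List<String> cols = List.of("id", "bool_col", "tinyint_col");
    System.out.println(perColumn("functional.alltypes", cols, "non_owner", "select").size()); // 3
    List<Policy> merged = perTable("functional.alltypes", cols, "non_owner", "select");
    System.out.println(merged.size() + " policy, columns " + merged.get(0).columns);
  }
}
```

With N columns the per-column approach creates N policies for the plug-in to download, while the merged approach keeps it at one per table and grantee.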






[jira] [Assigned] (IMPALA-3268) Add command "SHOW VIEWS"

2023-11-06 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao reassigned IMPALA-3268:
---

Assignee: Fang-Yu Rao

> Add command "SHOW VIEWS"
> 
>
> Key: IMPALA-3268
> URL: https://issues.apache.org/jira/browse/IMPALA-3268
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Catalog
>Affects Versions: Impala 2.2.4, Impala 2.3.0, Impala 2.5.0
>Reporter: Eric Lin
>Assignee: Fang-Yu Rao
>Priority: Minor
>  Labels: usability
>
> Currently, to get a list of views, a user has to:
> - SHOW TABLES
> - scan through the output list
> - SHOW CREATE TABLE view_name to confirm view_name is a view
> which is tedious.
> So I would like to request the following:
> - SHOW TABLES should only return tables
> - SHOW VIEWS should only return views
> - add a flag to either above commands to return all tables and views
> This will help lots of end users.






[jira] [Commented] (IMPALA-12528) test_hdfs_scanner_thread_non_reserved_bytes could occasionally fail

2023-10-29 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780764#comment-17780764
 ] 

Fang-Yu Rao commented on IMPALA-12528:
--

Hi [~rizaon], I assigned this JIRA to you since you are more familiar with the 
corresponding test. Please re-assign the ticket as you see appropriate. Thanks!

> test_hdfs_scanner_thread_non_reserved_bytes could occasionally fail
> ---
>
> Key: IMPALA-12528
> URL: https://issues.apache.org/jira/browse/IMPALA-12528
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Riza Suminto
>Priority: Major
>  Labels: broken-build, flaky-test
>
> [test_hdfs_scanner_thread_non_reserved_bytes()|https://github.com/apache/impala/blob/master/tests/query_test/test_mem_usage_scaling.py#L379]
>  could occasionally fail with the following error.
> *+Stacktrace+*
> {code:java}
> E   AssertionError: Aggregation of SUM over NumScannerThreadsStarted did not 
> match expected results.
> E   EXPECTED VALUE:
> E   3
> E   
> E   
> E   ACTUAL VALUE:
> E   1
> {code}
> The corresponding test file 
> [hdfs-scanner-thread-non-reserved-bytes.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/hdfs-scanner-thread-non-reserved-bytes.test]
>  was recently added in IMPALA-12499.






[jira] [Created] (IMPALA-12528) test_hdfs_scanner_thread_non_reserved_bytes could occasionally fail

2023-10-29 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12528:


 Summary: test_hdfs_scanner_thread_non_reserved_bytes could 
occasionally fail
 Key: IMPALA-12528
 URL: https://issues.apache.org/jira/browse/IMPALA-12528
 Project: IMPALA
  Issue Type: Bug
Reporter: Fang-Yu Rao
Assignee: Riza Suminto


[test_hdfs_scanner_thread_non_reserved_bytes()|https://github.com/apache/impala/blob/master/tests/query_test/test_mem_usage_scaling.py#L379]
 could occasionally fail with the following error.

*+Stacktrace+*
{code:java}
E   AssertionError: Aggregation of SUM over NumScannerThreadsStarted did not 
match expected results.
E   EXPECTED VALUE:
E   3
E   
E   
E   ACTUAL VALUE:
E   1
{code}
The corresponding test file 
[hdfs-scanner-thread-non-reserved-bytes.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/hdfs-scanner-thread-non-reserved-bytes.test]
 was recently added in IMPALA-12499.
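The failing check sums the NumScannerThreadsStarted counter over the query profile and compares the total with the expected value (3), whereas only 1 was observed. Below is a minimal Python sketch of that kind of SUM aggregation. It assumes each counter appears in the text profile as a line like `- NumScannerThreadsStarted: 1 (1)`. The helper `sum_profile_counter` and the sample profile are made up for illustration and are not the test framework's actual implementation.

```python
import re


def sum_profile_counter(profile_text, counter_name):
    """Sum every integer value reported for counter_name in a text profile.

    Hypothetical helper: assumes each occurrence looks like
    "<counter_name>: <integer>", optionally followed by "(<integer>)".
    """
    # \b keeps e.g. "TotalNumScannerThreadsStarted" from matching.
    pattern = re.compile(r"\b" + re.escape(counter_name) + r": (\d+)\b")
    return sum(int(value) for value in pattern.findall(profile_text))


# Made-up profile excerpt with two scan node instances.
sample_profile = """
  HDFS_SCAN_NODE (id=0):
     - NumScannerThreadsStarted: 1 (1)
  HDFS_SCAN_NODE (id=0):
     - NumScannerThreadsStarted: 0 (0)
"""
print(sum_profile_counter(sample_profile, "NumScannerThreadsStarted"))  # 1
```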






[jira] [Commented] (IMPALA-12527) test_metadata_tables could occasionally fail in the s3 build

2023-10-27 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780556#comment-17780556
 ] 

Fang-Yu Rao commented on IMPALA-12527:
--

Hi [~tmate], I assigned the JIRA to you since you recently revised the failed 
test in IMPALA-11996 and are therefore more familiar with this area. Please 
re-assign the ticket as you see appropriate. Thanks!


> test_metadata_tables could occasionally fail in the s3 build
> 
>
> Key: IMPALA-12527
> URL: https://issues.apache.org/jira/browse/IMPALA-12527
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Tamas Mate
>Priority: Major
>  Labels: broken-build, flaky-test
>
> We found that 
> [test_metadata_tables()|https://github.infra.cloudera.com/CDH/Impala/blame/cdw-master-staging/tests/query_test/test_iceberg.py#L1219]
>  that runs 
> [iceberg-metadata-tables.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test]
>  could occasionally fail with the following error message.
> It looks like the actual result does not match the expected result for some 
> columns.
> Stacktrace
> {code}
> query_test/test_iceberg.py:1226: in test_metadata_tables
> '$OVERWRITE_SNAPSHOT_TS': str(overwrite_snapshot_ts.data[0])})
> common/impala_test_suite.py:751: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:587: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> common/test_result_verifier.py:487: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:296: in verify_query_result_is_equal
> assert expected_results == actual_results
> E   assert Comparing QueryTestResults (expected vs actual):
> E 
> row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0
>  != 
> 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/7d479ffb82bfffd3-7ce667e5_544607964_data.0.parq','PARQUET',0,1,351,'NULL',0
> E 
> row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0
>  != 
> 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/ab4ffd0d75a5a68d-13da0831_1541521750_data.0.parq','PARQUET',0,1,351,'NULL',0
> E 
> row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0
>  != 
> 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/b04d1095845359f5-f0799bd0_1209897284_data.0.parq','PARQUET',0,1,351,'NULL',0
> E 
> row_regex:1,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'NULL',NULL
>  != 
> 1,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/delete-1b45db885b2bdd56-4023218d0002_1697110314_data.0.parq','PARQUET',0,1,1531,'NULL',NULL
> {code}
> Specifically, it seems that the value of the second-to-last column differs from 
> the expected value in some rows.






[jira] [Updated] (IMPALA-12527) test_metadata_tables could occasionally fail in the s3 build

2023-10-27 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12527:
-
Description: 
We found that 
[test_metadata_tables()|https://github.infra.cloudera.com/CDH/Impala/blame/cdw-master-staging/tests/query_test/test_iceberg.py#L1219]
 that runs 
[iceberg-metadata-tables.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test]
 could occasionally fail with the following error message.

It looks like the actual result does not match the expected result for some 
columns.

Stacktrace
{code}
query_test/test_iceberg.py:1226: in test_metadata_tables
'$OVERWRITE_SNAPSHOT_TS': str(overwrite_snapshot_ts.data[0])})
common/impala_test_suite.py:751: in run_test_case
self.__verify_results_and_errors(vector, test_section, result, use_db)
common/impala_test_suite.py:587: in __verify_results_and_errors
replace_filenames_with_placeholder)
common/test_result_verifier.py:487: in verify_raw_results
VERIFIER_MAP[verifier](expected, actual)
common/test_result_verifier.py:296: in verify_query_result_is_equal
assert expected_results == actual_results
E   assert Comparing QueryTestResults (expected vs actual):
E 
row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0
 != 
0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/7d479ffb82bfffd3-7ce667e5_544607964_data.0.parq','PARQUET',0,1,351,'NULL',0
E 
row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0
 != 
0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/ab4ffd0d75a5a68d-13da0831_1541521750_data.0.parq','PARQUET',0,1,351,'NULL',0
E 
row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0
 != 
0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/b04d1095845359f5-f0799bd0_1209897284_data.0.parq','PARQUET',0,1,351,'NULL',0
E 
row_regex:1,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'NULL',NULL
 != 
1,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/delete-1b45db885b2bdd56-4023218d0002_1697110314_data.0.parq','PARQUET',0,1,1531,'NULL',NULL
{code}

Specifically, it seems that the value of the second-to-last column differs from 
the expected value in some rows.

  was:
We found that 
[test_metadata_tables()|https://github.infra.cloudera.com/CDH/Impala/blame/cdw-master-staging/tests/query_test/test_iceberg.py#L1219]
 that runs 
[iceberg-metadata-tables.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test]
 could occasionally fail with the following error message.

It looks like the actual result do not match the expected result for some 
columns.

Stacktrace
{code}
query_test/test_iceberg.py:1226: in test_metadata_tables
'$OVERWRITE_SNAPSHOT_TS': str(overwrite_snapshot_ts.data[0])})
common/impala_test_suite.py:751: in run_test_case
self.__verify_results_and_errors(vector, test_section, result, use_db)
common/impala_test_suite.py:587: in __verify_results_and_errors
replace_filenames_with_placeholder)
common/test_result_verifier.py:487: in verify_raw_results
VERIFIER_MAP[verifier](expected, actual)
common/test_result_verifier.py:296: in verify_query_result_is_equal
assert expected_results == actual_results
E   assert Comparing QueryTestResults (expected vs actual):
E 
row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0
 != 
0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/7d479ffb82bfffd3-7ce667e5_544607964_data.0.parq','PARQUET',0,1,351,'NULL',0
E 
row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0
 != 
0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/ab4ffd0d75a5a68d-13da0831_1541521750_data.0.parq','PARQUET',0,1,351,'NULL',0
E 
row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0
 != 
0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/b04d1095845359f5-f0799bd0_1209897284_data.0.parq','PARQUET',0,1,351,'NULL',0
E 

[jira] [Created] (IMPALA-12527) test_metadata_tables could occasionally fail in the s3 build

2023-10-27 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12527:


 Summary: test_metadata_tables could occasionally fail in the s3 
build
 Key: IMPALA-12527
 URL: https://issues.apache.org/jira/browse/IMPALA-12527
 Project: IMPALA
  Issue Type: Bug
Reporter: Fang-Yu Rao
Assignee: Tamas Mate


We found that 
[test_metadata_tables()|https://github.infra.cloudera.com/CDH/Impala/blame/cdw-master-staging/tests/query_test/test_iceberg.py#L1219]
 that runs 
[iceberg-metadata-tables.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test]
 could occasionally fail with the following error message.

It looks like the actual result do not match the expected result for some 
columns.

Stacktrace
{code}
query_test/test_iceberg.py:1226: in test_metadata_tables
'$OVERWRITE_SNAPSHOT_TS': str(overwrite_snapshot_ts.data[0])})
common/impala_test_suite.py:751: in run_test_case
self.__verify_results_and_errors(vector, test_section, result, use_db)
common/impala_test_suite.py:587: in __verify_results_and_errors
replace_filenames_with_placeholder)
common/test_result_verifier.py:487: in verify_raw_results
VERIFIER_MAP[verifier](expected, actual)
common/test_result_verifier.py:296: in verify_query_result_is_equal
assert expected_results == actual_results
E   assert Comparing QueryTestResults (expected vs actual):
E 
row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0
 != 
0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/7d479ffb82bfffd3-7ce667e5_544607964_data.0.parq','PARQUET',0,1,351,'NULL',0
E 
row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0
 != 
0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/ab4ffd0d75a5a68d-13da0831_1541521750_data.0.parq','PARQUET',0,1,351,'NULL',0
E 
row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0
 != 
0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/b04d1095845359f5-f0799bd0_1209897284_data.0.parq','PARQUET',0,1,351,'NULL',0
E 
row_regex:1,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'NULL',NULL
 != 
1,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/delete-1b45db885b2bdd56-4023218d0002_1697110314_data.0.parq','PARQUET',0,1,1531,'NULL',NULL
{code}

Specifically, it seems that the value of the second-to-last column differs from 
the expected value in some rows.
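The following minimal Python sketch extracts the second-to-last column from the first expected/actual pair quoted above, which makes the reported difference easy to see. The expected pattern has an empty string `''` there, while the actual row has `'NULL'`. The helper `second_to_last_column` is hypothetical and is not part of Impala's test_result_verifier.py. It only splits on commas.

```python
# First expected/actual pair from the failure output above (copied verbatim).
expected_row = (r"0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/"
                r"hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq',"
                r"'PARQUET',0,1,[1-9]\d*|0,'',0")
actual_row = ("0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/"
              "iceberg_query_metadata/data/"
              "7d479ffb82bfffd3-7ce667e5_544607964_data.0.parq',"
              "'PARQUET',0,1,351,'NULL',0")


def second_to_last_column(row):
    """Return the second-to-last comma-separated field of a result row.

    Naive split: assumes no field contains a comma, which holds for the
    rows quoted above but not for arbitrary query results.
    """
    return row.split(",")[-2]


print(second_to_last_column(expected_row))  # ''
print(second_to_last_column(actual_row))    # 'NULL'
```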






[jira] [Commented] (IMPALA-12526) BackendConfig.INSTANCE could be null in the frontend test testResetMetadataDesc

2023-10-27 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780524#comment-17780524
 ] 

Fang-Yu Rao commented on IMPALA-12526:
--

Hi [~stigahuang], assigned this JIRA to you since you are more familiar with 
the failed frontend test. Please re-assign the ticket as you see appropriate. 
Thanks!

> BackendConfig.INSTANCE could be null in the frontend test 
> testResetMetadataDesc
> ---
>
> Key: IMPALA-12526
> URL: https://issues.apache.org/jira/browse/IMPALA-12526
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Quanlong Huang
>Priority: Major
>  Labels: broken-build, flaky-test
>
> We found that 
> [BackendConfig.INSTANCE|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java#L265]
>  could be null in the frontend test 
> [testResetMetadataDesc()|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/util/CatalogOpUtilTest.java#L65]
>  and thus 
> [ResetMetadataStmt#toThrift()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java#L265]
>  could fail with the following error.
> {code}
> Cannot invoke "org.apache.impala.service.BackendConfig.getHostname()" because 
> "org.apache.impala.service.BackendConfig.INSTANCE" is null
> {code}






[jira] [Commented] (IMPALA-12526) BackendConfig.INSTANCE could be null in the frontend test testResetMetadataDesc

2023-10-27 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780523#comment-17780523
 ] 

Fang-Yu Rao commented on IMPALA-12526:
--

This issue seems to be the same as IMPALA-11699, but I am not completely 
sure.

> BackendConfig.INSTANCE could be null in the frontend test 
> testResetMetadataDesc
> ---
>
> Key: IMPALA-12526
> URL: https://issues.apache.org/jira/browse/IMPALA-12526
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Quanlong Huang
>Priority: Major
>  Labels: broken-build, flaky-test
>
> We found that 
> [BackendConfig.INSTANCE|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java#L265]
>  could be null in the frontend test 
> [testResetMetadataDesc()|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/util/CatalogOpUtilTest.java#L65]
>  and thus 
> [ResetMetadataStmt#toThrift()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java#L265]
>  could fail with the following error.
> {code}
> Cannot invoke "org.apache.impala.service.BackendConfig.getHostname()" because 
> "org.apache.impala.service.BackendConfig.INSTANCE" is null
> {code}






[jira] [Created] (IMPALA-12526) BackendConfig.INSTANCE could be null in the frontend test testResetMetadataDesc

2023-10-27 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12526:


 Summary: BackendConfig.INSTANCE could be null in the frontend test 
testResetMetadataDesc
 Key: IMPALA-12526
 URL: https://issues.apache.org/jira/browse/IMPALA-12526
 Project: IMPALA
  Issue Type: Bug
Reporter: Fang-Yu Rao
Assignee: Quanlong Huang


We found that 
[BackendConfig.INSTANCE|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java#L265]
 could be null in the frontend test 
[testResetMetadataDesc()|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/util/CatalogOpUtilTest.java#L65]
 and thus 
[ResetMetadataStmt#toThrift()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java#L265]
 could fail with the following error.

{code}
Cannot invoke "org.apache.impala.service.BackendConfig.getHostname()" because 
"org.apache.impala.service.BackendConfig.INSTANCE" is null
{code}






[jira] [Created] (IMPALA-12525) statestore.active-status did not reach value True in 120s

2023-10-27 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12525:


 Summary: statestore.active-status did not reach value True in 120s
 Key: IMPALA-12525
 URL: https://issues.apache.org/jira/browse/IMPALA-12525
 Project: IMPALA
  Issue Type: Bug
Reporter: Fang-Yu Rao
Assignee: Wenzhe Zhou


We found that 
[statestore.active-status|https://github.com/apache/impala/blob/master/tests/custom_cluster/test_statestored_ha.py#L452]
 could fail to reach the value True within 120s.

*+Error Message+*
{code:java}
AssertionError: Metric statestore.active-status did not reach value True in 
120s. Dumping debug webpages in JSON format... Dumped memz JSON to 
$IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/memz.json Dumped 
metrics JSON to 
$IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/metrics.json 
Dumped queries JSON to 
$IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/queries.json 
Dumped sessions JSON to 
$IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/sessions.json 
Dumped threadz JSON to 
$IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/threadz.json 
Dumped rpcz JSON to 
$IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/rpcz.json Dumping 
minidumps for impalads/catalogds... Dumped minidump for Impalad PID 32539 
Dumped minidump for Impalad PID 32543 Dumped minidump for Impalad PID 32550 
Dumped minidump for Catalogd PID 32460
{code}
*+Stacktrace+*
{code:java}
custom_cluster/test_statestored_ha.py:500: in test_statestored_manual_failover
self.__test_statestored_manual_failover(second_failover=True)
custom_cluster/test_statestored_ha.py:452: in __test_statestored_manual_failover
"statestore.active-status", expected_value=True, timeout=120)
common/impala_service.py:144: in wait_for_metric_value
self.__metric_timeout_assert(metric_name, expected_value, timeout)
common/impala_service.py:213: in __metric_timeout_assert
assert 0, assert_string
E   AssertionError: Metric statestore.active-status did not reach value True in 
120s.
E   Dumping debug webpages in JSON format...
E   Dumped memz JSON to 
$IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/memz.json
E   Dumped metrics JSON to 
$IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/metrics.json
E   Dumped queries JSON to 
$IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/queries.json
E   Dumped sessions JSON to 
$IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/sessions.json
E   Dumped threadz JSON to 
$IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/threadz.json
E   Dumped rpcz JSON to 
$IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/rpcz.json
E   Dumping minidumps for impalads/catalogds...
E   Dumped minidump for Impalad PID 32539
E   Dumped minidump for Impalad PID 32543
E   Dumped minidump for Impalad PID 32550
E   Dumped minidump for Catalogd PID 32460
{code}
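The failure comes from a poll-until-timeout check on the statestore.active-status metric. Below is a minimal Python sketch of that pattern, assuming the metric can be read through a zero-argument callable. `wait_for_value` is a hypothetical helper, not the actual `wait_for_metric_value()` in common/impala_service.py.

```python
import time


def wait_for_value(get_value, expected, timeout_s, interval_s=1.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll get_value() until it returns expected or timeout_s elapses.

    Returns the number of polls made; raises AssertionError on timeout.
    Hypothetical helper for illustration only.
    """
    deadline = clock() + timeout_s
    polls = 0
    while True:
        polls += 1
        if get_value() == expected:
            return polls
        if clock() >= deadline:
            raise AssertionError(
                "value did not reach %r in %ss" % (expected, timeout_s))
        sleep(interval_s)
```

The clock and sleep functions are injectable so the loop can be exercised with a fake clock instead of really waiting 120 seconds.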






[jira] [Updated] (IMPALA-12522) test_alter_table_recover could finish less than 10 seconds with JDK 17 when enable_async_ddl_execution is False

2023-10-26 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12522:
-
Priority: Critical  (was: Major)

> test_alter_table_recover could finish less than 10 seconds with JDK 17 when 
> enable_async_ddl_execution is False
> ---
>
> Key: IMPALA-12522
> URL: https://issues.apache.org/jira/browse/IMPALA-12522
> Project: IMPALA
>  Issue Type: Test
>Reporter: Fang-Yu Rao
>Assignee: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky-test
>
> We found that 
> [test_alter_table_recover()|https://github.com/apache/impala/blame/master/tests/metadata/test_ddl.py#L1026]
>  could finish in less than 10 seconds with JDK 17 when 
> enable_async_ddl_execution is False, and thus the check in the 
> [else branch|https://github.com/apache/impala/blame/master/tests/metadata/test_ddl.py#L1079C12-L1079C12]
>  could fail. I am not sure whether this is related to the JDK, but maybe we 
> could reduce the expected execution time a little to make the test less flaky.
> {code}
>   # In sync mode:
>   #  The entire DDL is processed in the exec step with delay. exec_time should be
>   #  more than 10 seconds.
>   #
>   # In async mode:
>   #  The compilation of DDL is processed in the exec step without delay. And the
>   #  processing of the DDL plan is in wait step with delay. The wait time should
>   #  definitely take more time than 10 seconds.
>   if enable_async_ddl:
>     assert(wait_time >= 10)
>   else:
>     assert(exec_time >= 10)
> {code}






[jira] [Commented] (IMPALA-12522) test_alter_table_recover could finish less than 10 seconds with JDK 17 when enable_async_ddl_execution is False

2023-10-26 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780105#comment-17780105
 ] 

Fang-Yu Rao commented on IMPALA-12522:
--

Hi [~joemcdonnell], assigned this JIRA to you since you helped review 
[IMPALA-10811|https://gerrit.cloudera.org/c/17872/38/tests/metadata/test_ddl.py#1012]
 that added this test. Please reassign the JIRA as you see appropriate. Thanks!

> test_alter_table_recover could finish less than 10 seconds with JDK 17 when 
> enable_async_ddl_execution is False
> ---
>
> Key: IMPALA-12522
> URL: https://issues.apache.org/jira/browse/IMPALA-12522
> Project: IMPALA
>  Issue Type: Test
>Reporter: Fang-Yu Rao
>Assignee: Joe McDonnell
>Priority: Major
>  Labels: broken-build, flaky-test
>
> We found that 
> [test_alter_table_recover()|https://github.com/apache/impala/blame/master/tests/metadata/test_ddl.py#L1026]
>  could finish in less than 10 seconds with JDK 17 when 
> enable_async_ddl_execution is False, and thus the check in the 
> [else branch|https://github.com/apache/impala/blame/master/tests/metadata/test_ddl.py#L1079C12-L1079C12]
>  could fail. I am not sure whether this is related to the JDK, but maybe we 
> could reduce the expected execution time a little to make the test less flaky.
> {code}
>   # In sync mode:
>   #  The entire DDL is processed in the exec step with delay. exec_time should be
>   #  more than 10 seconds.
>   #
>   # In async mode:
>   #  The compilation of DDL is processed in the exec step without delay. And the
>   #  processing of the DDL plan is in wait step with delay. The wait time should
>   #  definitely take more time than 10 seconds.
>   if enable_async_ddl:
>     assert(wait_time >= 10)
>   else:
>     assert(exec_time >= 10)
> {code}






[jira] [Created] (IMPALA-12522) test_alter_table_recover could finish less than 10 seconds with JDK 17 when enable_async_ddl_execution is False

2023-10-26 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12522:


 Summary: test_alter_table_recover could finish less than 10 
seconds with JDK 17 when enable_async_ddl_execution is False
 Key: IMPALA-12522
 URL: https://issues.apache.org/jira/browse/IMPALA-12522
 Project: IMPALA
  Issue Type: Test
Reporter: Fang-Yu Rao
Assignee: Joe McDonnell


We found that 
[test_alter_table_recover()|https://github.com/apache/impala/blame/master/tests/metadata/test_ddl.py#L1026]
 could finish in less than 10 seconds with JDK 17 when 
enable_async_ddl_execution is False, and thus the check in the 
[else branch|https://github.com/apache/impala/blame/master/tests/metadata/test_ddl.py#L1079C12-L1079C12]
 could fail. I am not sure whether this is related to the JDK, but maybe we 
could reduce the expected execution time a little to make the test less flaky.
{code}
  # In sync mode:
  #  The entire DDL is processed in the exec step with delay. exec_time should be
  #  more than 10 seconds.
  #
  # In async mode:
  #  The compilation of DDL is processed in the exec step without delay. And the
  #  processing of the DDL plan is in wait step with delay. The wait time should
  #  definitely take more time than 10 seconds.
  if enable_async_ddl:
    assert(wait_time >= 10)
  else:
    assert(exec_time >= 10)
{code}
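The description suggests lowering the expected execution time slightly. Below is a minimal Python sketch of one way to express that, assuming a small fixed tolerance below the 10-second delay. The function name `check_ddl_delay` and the 0.5-second margin are hypothetical and not taken from test_ddl.py.

```python
def check_ddl_delay(enable_async_ddl, exec_time, wait_time,
                    expected_delay_s=10.0, tolerance_s=0.5):
    """Assert that the delayed DDL step took roughly expected_delay_s seconds.

    Hypothetical relaxation of the check quoted above: tolerance_s is an
    assumed margin for timer and scheduling jitter, not a value from Impala.
    Returns the measured duration that was checked.
    """
    # In async mode the delay shows up in the wait step; in sync mode it
    # shows up in the exec step (per the comments in the original test).
    measured = wait_time if enable_async_ddl else exec_time
    assert measured >= expected_delay_s - tolerance_s, (
        "expected at least %.1fs (tolerance %.1fs), got %.3fs"
        % (expected_delay_s, tolerance_s, measured))
    return measured


# Example: a sync-mode run that finished in 9.8s now passes.
print(check_ddl_delay(False, 9.8, 0.0))  # 9.8
```

The description leaves open whether a fixed margin like this is appropriate, or whether the injected delay itself should change.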






[jira] [Commented] (IMPALA-12500) TestObservability.test_global_exchange_counters is flaky

2023-10-23 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17778839#comment-17778839
 ] 

Fang-Yu Rao commented on IMPALA-12500:
--

Hi [~csringhofer], assigned this JIRA to you since you recently revised the 
test at 
[IMPALA-12430|https://github.com/apache/impala/commit/fb2d2b27641a95f51b6789639fab73b60abd7bc5#diff-a317a4067b5728a2d0af9839c1dce94710e7bd50825ceffc0a3c88aca3e27de3R553]
 and thus may be more familiar with the test. Please feel free to reassign the 
JIRA as you see fit. Thanks!

> TestObservability.test_global_exchange_counters is flaky
> 
>
> Key: IMPALA-12500
> URL: https://issues.apache.org/jira/browse/IMPALA-12500
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Csaba Ringhofer
>Priority: Critical
>  Labels: broken-build, flaky
>
> There have been intermittent failures on this test with the following symptom:
> {noformat}
> query_test/test_observability.py:564: in test_global_exchange_counters
> assert "ExchangeScanRatio: 4.63" in profile
> E   assert 'ExchangeScanRatio: 4.63' in 'Query 
> (id=c04b974db37e7046:b5fe4dea):\n  DEBUG MODE WARNING: Query profile 
> created while running a DEBUG buil...: 0.000ns\n - WriteIoBytes: 
> 0\n - WriteIoOps: 0 (0)\n - WriteIoWaitTime: 
> 0.000ns\n'
> -- executing against localhost:21000
> select count(*), sleep(50) from tpch_parquet.orders o
> inner join tpch_parquet.lineitem l on o.o_orderkey = l.l_orderkey
> group by o.o_clerk limit 10;
> -- 2023-10-05 19:47:29,817 INFO MainThread: Started query 
> c04b974db37e7046:b5fe4dea{noformat}
>  






[jira] [Assigned] (IMPALA-12500) TestObservability.test_global_exchange_counters is flaky

2023-10-23 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao reassigned IMPALA-12500:


Assignee: Fang-Yu Rao

> TestObservability.test_global_exchange_counters is flaky
> 
>
> Key: IMPALA-12500
> URL: https://issues.apache.org/jira/browse/IMPALA-12500
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Fang-Yu Rao
>Priority: Critical
>  Labels: broken-build, flaky
>
> There have been intermittent failures on this test with the following symptom:
> {noformat}
> query_test/test_observability.py:564: in test_global_exchange_counters
> assert "ExchangeScanRatio: 4.63" in profile
> E   assert 'ExchangeScanRatio: 4.63' in 'Query 
> (id=c04b974db37e7046:b5fe4dea):\n  DEBUG MODE WARNING: Query profile 
> created while running a DEBUG buil...: 0.000ns\n - WriteIoBytes: 
> 0\n - WriteIoOps: 0 (0)\n - WriteIoWaitTime: 
> 0.000ns\n'
> -- executing against localhost:21000
> select count(*), sleep(50) from tpch_parquet.orders o
> inner join tpch_parquet.lineitem l on o.o_orderkey = l.l_orderkey
> group by o.o_clerk limit 10;
> -- 2023-10-05 19:47:29,817 INFO MainThread: Started query 
> c04b974db37e7046:b5fe4dea{noformat}
>  






[jira] [Assigned] (IMPALA-12500) TestObservability.test_global_exchange_counters is flaky

2023-10-23 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao reassigned IMPALA-12500:


Assignee: Csaba Ringhofer  (was: Fang-Yu Rao)

> TestObservability.test_global_exchange_counters is flaky
> 
>
> Key: IMPALA-12500
> URL: https://issues.apache.org/jira/browse/IMPALA-12500
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Csaba Ringhofer
>Priority: Critical
>  Labels: broken-build, flaky
>
> There have been intermittent failures on this test with the following symptom:
> {noformat}
> query_test/test_observability.py:564: in test_global_exchange_counters
> assert "ExchangeScanRatio: 4.63" in profile
> E   assert 'ExchangeScanRatio: 4.63' in 'Query 
> (id=c04b974db37e7046:b5fe4dea):\n  DEBUG MODE WARNING: Query profile 
> created while running a DEBUG buil...: 0.000ns\n - WriteIoBytes: 
> 0\n - WriteIoOps: 0 (0)\n - WriteIoWaitTime: 
> 0.000ns\n'
> -- executing against localhost:21000
> select count(*), sleep(50) from tpch_parquet.orders o
> inner join tpch_parquet.lineitem l on o.o_orderkey = l.l_orderkey
> group by o.o_clerk limit 10;
> -- 2023-10-05 19:47:29,817 INFO MainThread: Started query 
> c04b974db37e7046:b5fe4dea{noformat}
>  






[jira] [Commented] (IMPALA-10712) SET OWNER ROLE of a database/table/view is not supported when Ranger is the authorization provider

2023-10-13 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775048#comment-17775048
 ] 

Fang-Yu Rao commented on IMPALA-10712:
--

It looks like I created a JIRA more than 2 years ago for the same issue.

> SET OWNER ROLE  of a database/table/view is not supported when 
> Ranger is the authorization provider
> --
>
> Key: IMPALA-10712
> URL: https://issues.apache.org/jira/browse/IMPALA-10712
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 4.0.0
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> We found that {{SET OWNER ROLE}} of a database, table, or view is not 
> supported when Ranger is the authorization provider.
> In the case of setting the owner of a database to a given role with Ranger 
> as the authorization provider, we found that executing {{ALTER DATABASE 
>  SET OWNER ROLE }} hits the non-null check 
> for the given role at 
> [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/AlterDbSetOwnerStmt.java#L59]
>  because the {{AuthorizationPolicy}} returned by 
> {{getAuthPolicy()}} does not cache any policy-related information when the 
> authorization provider is Ranger, unlike the case when 
> Sentry was the authorization provider.
> When Ranger is the authorization provider, the currently existing roles are 
> cached by {{RangerImpalaPlugin}}. Therefore, to address the issue above, we 
> could probably invoke {{getRoles().getRangerRoles()}} provided by the 
> {{RangerImpalaPlugin}} to retrieve the set of existing roles, similar to what 
> is done at 
> [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpaladAuthorizationManager.java#L135].
> Tagged [~joemcdonnell] and [~shajini] since I realized this when reviewing 
> Joe's comment at 
> [https://gerrit.cloudera.org/c/17469/1/docs/topics/impala_alter_database.xml#b68].
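A minimal Python model of the proposal above may help show why the check fails. All names here are hypothetical stand-ins, not Impala's Java API: the policy object models the Sentry-era cache, which holds no roles under Ranger, and `ranger_roles` models the role names the Ranger plugin caches (assumed to be obtainable via `getRoles().getRangerRoles()` and reducible to a set of names).

```python
from typing import Optional, Set


class AuthorizationPolicyModel:
    """Hypothetical stand-in for the AuthorizationPolicy returned by
    getAuthPolicy(). Under Sentry it cached roles; under Ranger it is empty."""

    def __init__(self, cached_roles: Optional[Set[str]] = None) -> None:
        self._roles = set(cached_roles or ())

    def get_role(self, name: str) -> Optional[str]:
        return name if name in self._roles else None


def role_exists(role: str, policy: AuthorizationPolicyModel,
                ranger_roles: Optional[Set[str]] = None) -> bool:
    """Return True if the role exists.

    ranger_roles models the role names cached by the Ranger plugin; None
    means the provider is not Ranger, so fall back to the policy cache.
    """
    if ranger_roles is not None:
        return role in ranger_roles
    return policy.get_role(role) is not None


if __name__ == "__main__":
    ranger_policy = AuthorizationPolicyModel()  # nothing cached under Ranger
    print(role_exists("analyst", ranger_policy))               # False
    print(role_exists("analyst", ranger_policy, {"analyst"}))  # True
```

The first call corresponds to the failing path described in the issue; the second consults the plugin's cached roles instead.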






[jira] [Resolved] (IMPALA-11466) Add jetty-server as an allowed dependency

2023-10-13 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-11466.
--
Fix Version/s: Impala 4.3.0
   Resolution: Fixed

Resolve this JIRA since the fix has been merged thanks to [~rizaon].

> Add jetty-server as an allowed dependency
> -
>
> Key: IMPALA-11466
> URL: https://issues.apache.org/jira/browse/IMPALA-11466
> Project: IMPALA
>  Issue Type: Task
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
> Fix For: Impala 4.3.0
>
>
> We found that after HIVE-21456, instantiating HiveMetaStoreClient requires 
> the class org.eclipse.jetty.server.Connector, which is a banned dependency 
> of impala-frontend. This caused the FE test 
> testTestCaseImport() to fail, since it needs to instantiate a
> HiveMetaStoreClient.
> We should add the required dependency so that the test can run.






[jira] [Resolved] (IMPALA-12250) Remove deprecated Ranger configuration properties after RANGER-2895

2023-10-13 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-12250.
--
Resolution: Fixed

Resolve this since the Ranger artifact we are using already contains 
RANGER-2895.

> Remove deprecated Ranger configuration properties after RANGER-2895
> ---
>
> Key: IMPALA-12250
> URL: https://issues.apache.org/jira/browse/IMPALA-12250
> Project: IMPALA
>  Issue Type: Task
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> In IMPALA-12248, we added 3 new Ranger configuration properties that Ranger's 
> HTTP server requires in order to start once we use a build that includes 
> [RANGER-2895|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05].
> Recall that a Ranger configuration property was deprecated in 
> [RANGER-2895|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05],
>  namely 
> [ranger.jpa.jdbc.idleconnectiontestperiod|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-9669116dca1e5c9fffdb2c81d4d9ac57b489131e90b89ff17b56801131bad5a6L419].
>  Thus, we should also remove it from 
> [ranger-admin-default-site.xml.template|https://github.com/apache/impala/blob/master/testdata/cluster/ranger/ranger-admin-default-site.xml.template]
>  once we start using a build that includes 
> [RANGER-2895|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05].
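The removal described above amounts to dropping one <property> element from the template. A minimal Python sketch, assuming the template uses the usual Hadoop-style <configuration>/<property>/<name> layout; the helper is hypothetical and only the property name comes from the issue.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

# Property name taken from the issue; the helper below is hypothetical.
DEPRECATED = ("ranger.jpa.jdbc.idleconnectiontestperiod",)


def remove_properties(site_xml: str, names: Iterable[str]) -> str:
    """Return site_xml without the <property> elements whose <name> is in names.

    Assumes the layout
    <configuration><property><name>..</name><value>..</value></property>...
    """
    drop = set(names)
    root = ET.fromstring(site_xml)
    for prop in list(root.findall("property")):  # copy, since we mutate root
        if prop.findtext("name", default="").strip() in drop:
            root.remove(prop)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    template = (
        "<configuration>"
        "<property><name>ranger.jpa.jdbc.idleconnectiontestperiod</name>"
        "<value>placeholder</value></property>"
        "<property><name>ranger.jpa.jdbc.idletimeout</name>"
        "<value>placeholder</value></property>"
        "</configuration>"
    )
    print(remove_properties(template, DEPRECATED))
```

ElementTree discards XML comments when parsing, so for a checked-in template a plain text edit may be preferable; the sketch only shows which element has to go.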






[jira] [Resolved] (IMPALA-11498) Change port range of TEZ's web UI server after TEZ-4347

2023-09-07 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-11498.
--
Resolution: Fixed

Resolve the issue since the fix has been merged.

> Change port range of TEZ's web UI server after TEZ-4347
> ---
>
> Key: IMPALA-11498
> URL: https://issues.apache.org/jira/browse/IMPALA-11498
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> After TEZ-4347, by default TEZ would attempt to start a web UI server before 
> opening a session. The default port range for the server specified in 
> [TezConfiguration.java|https://github.infra.cloudera.com/CDH/tez/blob/cdw-master/tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java#L1823]
>  (in the TEZ repository) is "50000-50050", which does not seem to be a good 
> choice in Impala's testing environment, because some other client programs 
> are always holding those ports when TEZ attempts to start its web UI 
> server. As a result, TEZ could not bind a port in that range to start its 
> web UI server, so the TEZ session was not created.
> We should specify a better port range for TEZ once we start using a TEZ 
> dependency with TEZ-4347.
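One way to choose a better range is to check, in the test environment, which ports of a candidate range can actually be bound. A minimal Python sketch with hypothetical helper names; it only probes TCP on loopback, which is an assumption about how the TEZ web UI server binds.

```python
import socket
from typing import List, Tuple


def parse_port_range(spec: str) -> Tuple[int, int]:
    """Parse an inclusive 'low-high' port range string."""
    low_s, high_s = spec.split("-", 1)
    low, high = int(low_s), int(high_s)
    if not 0 < low <= high <= 65535:
        raise ValueError("invalid port range: %r" % spec)
    return low, high


def bindable_ports(spec: str, host: str = "127.0.0.1") -> List[int]:
    """Return the ports in spec that can be bound on host right now.

    The answer is only a snapshot: another process may take a port between
    this check and the moment the web UI server actually binds it.
    """
    low, high = parse_port_range(spec)
    free = []
    for port in range(low, high + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind((host, port))
            except OSError:
                continue  # port already held by another program
            free.append(port)
    return free


if __name__ == "__main__":
    print(bindable_ports("40000-40010"))
```

An empty result for a candidate range is the situation the issue describes, where TEZ cannot start its web UI server.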






[jira] [Resolved] (IMPALA-12248) Add required Ranger configuration properties after RANGER-2895

2023-09-07 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-12248.
--
Resolution: Fixed

Resolve the issue since the fix has been merged.

> Add required Ranger configuration properties after RANGER-2895
> --
>
> Key: IMPALA-12248
> URL: https://issues.apache.org/jira/browse/IMPALA-12248
> Project: IMPALA
>  Issue Type: Task
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> [RANGER-2895|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e]
>  added and removed some configuration properties.
> [Three new configuration properties were 
> added|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05].
>  We found that once we bump up the build number to include RANGER-2895, if 
> those new properties do not exist in 
> [ranger-admin-default-site.xml.template|https://github.com/apache/impala/blob/master/testdata/cluster/ranger/ranger-admin-default-site.xml.template]
>  or 
> [ranger-admin-site.xml.template|https://github.com/apache/impala/blob/master/testdata/cluster/ranger/ranger-admin-site.xml.template]
>  then the generated site files for Ranger will not contain those new 
> properties, resulting in an error like the following in 
> catalina.log. As a result, Ranger's HTTP server cannot be started properly.
> {code:java}
> 23/06/25 04:46:01 ERROR context.ContextLoader: Context initialization failed
> org.springframework.beans.factory.BeanDefinitionStoreException: Invalid bean 
> definition with name 'defaultDataSource' defined in ServletContext resource 
> [/META-INF/applicationContext.xml]: Could not resolve placeholder 
> 'ranger.jpa.jdbc.idletimeout' in value "${ranger.jpa.jdbc.idletimeout}"; 
> nested exception is java.lang.IllegalArgumentException: Could not resolve 
> placeholder 'ranger.jpa.jdbc.idletimeout' in value 
> "${ranger.jpa.jdbc.idletimeout}"
>   at
> {code}
> There are also some configuration properties removed in RANGER-2895, e.g., 
> [ranger.jpa.jdbc.idleconnectiontestperiod|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05L136].
>  In this regard, we could probably add these 3 new properties first and then 
> remove the unnecessary properties once we have bumped up the build number 
> to one that includes RANGER-2895.
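The failure above is a ${...} placeholder with no matching property in the generated site file. A hypothetical Python sketch of a pre-flight check that compares the placeholders a context file references with the names a site file defines; only ranger.jpa.jdbc.idletimeout comes from the log, and the Hadoop-style <configuration>/<property> layout is an assumption.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterable, List, Set

# Matches ${name} placeholders; the ${name:default} form is skipped by
# this sketch because ':' is excluded from the captured name.
PLACEHOLDER = re.compile(r"\$\{([^}:]+)\}")


def referenced_properties(text: str) -> Set[str]:
    """Property names referenced as ${name} in a context file."""
    return set(PLACEHOLDER.findall(text))


def defined_properties(site_xml: str) -> Set[str]:
    """Property names defined in a Hadoop-style <configuration> file."""
    root = ET.fromstring(site_xml)
    return {p.findtext("name", default="").strip()
            for p in root.findall("property")}


def missing_properties(site_xml: str, required: Iterable[str]) -> List[str]:
    """Required names that the site file does not define, sorted."""
    return sorted(set(required) - defined_properties(site_xml))


if __name__ == "__main__":
    context = 'value="${ranger.jpa.jdbc.idletimeout}"'
    site = ("<configuration><property><name>some.other.property</name>"
            "<value>x</value></property></configuration>")
    print(missing_properties(site, referenced_properties(context)))
```

A non-empty result would correspond to the "Could not resolve placeholder" error in catalina.log.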






[jira] [Reopened] (IMPALA-12250) Remove deprecated Ranger configuration properties after RANGER-2895

2023-09-07 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao reopened IMPALA-12250:
--

Sorry, I meant to close IMPALA-12248.







[jira] [Resolved] (IMPALA-12250) Remove deprecated Ranger configuration properties after RANGER-2895

2023-09-07 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-12250.
--
Resolution: Fixed

Resolve the issue since the fix has been merged.







[jira] [Resolved] (IMPALA-12311) Extra newlines are produced when an end-to-end test is run with update_results

2023-09-07 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-12311.
--
Resolution: Fixed

Resolve the issue since the fix has been merged.

> Extra newlines are produced when an end-to-end test is run with 
> update_results 
> ---
>
> Key: IMPALA-12311
> URL: https://issues.apache.org/jira/browse/IMPALA-12311
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.1.2
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Minor
>  Labels: test-infra
>
> We found that extra newlines are produced in the updated golden file when the 
> actual results do not match the expected results specified in the original 
> golden file.
> Take 
> [TestDecimalExprs::test_exprs()|https://github.com/apache/impala/blob/master/tests/query_test/test_decimal_queries.py#L75]
>  for example. This test runs the test cases in 
> [decimal-exprs.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test].
> Suppose that we modify the expected error message at 
> [https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test#L107]
>  from "UDF WARNING: Decimal expression overflowed, returning NULL" to the 
> following (the original string with an additional "x").
> {noformat}
> UDF WARNING: Decimal expression overflowed, returning NULLx
> {noformat}
> Then we run this test using the following command with the command line 
> argument '--update_results'.
> {code:bash}
> $IMPALA_HOME/bin/impala-py.test \
> --update_results \
> --junitxml=$IMPALA_EE_TEST_LOGS_DIR/results/test_decimal.xml \
> $IMPALA_HOME/tests/query_test/test_decimal_queries.py::TestDecimalExprs::test_exprs
> {code}
> In $IMPALA_HOME/logs/ee_tests/QueryTest_decimal-exprs.test, we will find 
> the following subsection corresponding to the query. There are 3 additional 
> newlines in the subsection of 'ERRORS'.
> {noformat}
> ---- ERRORS
> UDF WARNING: Decimal expression overflowed, returning NULL
> 
> {noformat}
> One of the newlines was produced in 
> [join_section_lines()|https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298].
>  This function is called in the following 4 places when the actual results 
> do not match the expected results.
>  # [test_section['ERRORS'] = 
> join_section_lines(actual_errors)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L398].
>  # [test_section['TYPES'] = join_section_lines(\[', 
> '.join(actual_types)\])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L429].
>  # [test_section['LABELS'] = join_section_lines(\[', 
> '.join(actual_labels)\])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L451].
>  # [test_section[result_section] = 
> join_section_lines(actual.result_list)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L489].
> Thus, subsections like TYPES, LABELS, and RESULTS have the same issue in 
> such a scenario (when the actual results do not match the expected ones). It 
> would be good if users and developers did not have to manually remove those 
> extra newlines when generating the golden files for new test files.
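The body of join_section_lines() is not quoted in the issue, so the following is only a hypothetical Python model of how blank lines can pile up: naive_join always appends a newline, and the input is an error log that itself already ends with two newlines. trimmed_join is one possible alternative, not the fix that was merged.

```python
from typing import List


def naive_join(lines: List[str]) -> str:
    """Hypothetical model: join the lines and always append one newline."""
    return "\n".join(lines) + "\n"


def trimmed_join(lines: List[str]) -> str:
    """Hypothetical alternative: drop trailing blank entries, then join
    without appending a newline of its own."""
    end = len(lines)
    while end > 0 and not lines[end - 1].strip():
        end -= 1
    return "\n".join(lines[:end])


if __name__ == "__main__":
    # An error log that already ends with two newlines, split into lines.
    log = "UDF WARNING: Decimal expression overflowed, returning NULL\n\n"
    lines = log.split("\n")           # [message, '', '']
    print(repr(naive_join(lines)))    # message followed by three '\n'
    print(repr(trimmed_join(lines)))  # message only
```

Whether the golden-file writer adds its own separator between sections is not stated here, so the exact count in a real updated file may differ from this model.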






[jira] [Created] (IMPALA-12423) Impala shell should allow a user to set up query options when the underlying protocol is strict_hs2_protocol

2023-09-05 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12423:


 Summary: Impala shell should allow a user to set up query options 
when the underlying protocol is strict_hs2_protocol
 Key: IMPALA-12423
 URL: https://issues.apache.org/jira/browse/IMPALA-12423
 Project: IMPALA
  Issue Type: New Feature
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao


Currently, when we use the Impala shell to connect to a service, e.g., 
HiveServer2, via the strict HS2 protocol, we are not able to execute the SET 
statement or set the value of a query option, as shown below. 
It would be much more convenient if a user were at least able to set the value 
of a query option in the Impala shell when the Impala shell is used to connect 
to an external frontend that sends query plans to the Impala server for 
execution.
{code:java}
fangyurao@fangyu-upstream-dev:~$ impala-shell.sh -i 'localhost:11050' --strict_hs2_protocol
Starting Impala Shell with no authentication using Python 2.7.16
WARNING: Unable to track live progress with strict_hs2_protocol
LDAP password for fangyurao: 
Opened TCP connection to localhost:11050
Connected to localhost:11050
Server version: N/A
***
Welcome to the Impala shell.
(Impala Shell v4.3.0-SNAPSHOT (2f06a7b) built on Tue Sep  5 14:14:24 PDT 2023)

To see how Impala will plan to run your query without actually executing it, use
the EXPLAIN command. You can change the level of detail in the EXPLAIN output by
setting the EXPLAIN_LEVEL query option.
***
[localhost:11050] default> set;
Query options (defaults shown in []):
No options available.

Shell Options
WRITE_DELIMITED: False
VERBOSE: True
VERTICAL: False
LIVE_SUMMARY: False
OUTPUT_FILE: None
DELIMITER: \t
LIVE_PROGRESS: False

Variables:
No variables defined.
[localhost:11050] default> set num_nodes=2;
Unknown query option: num_nodes
Available query options, with their values (defaults shown in []):
Query options (defaults shown in []):
No options available.

Shell Options
WRITE_DELIMITED: False
VERBOSE: True
VERTICAL: False
LIVE_SUMMARY: False
OUTPUT_FILE: None
DELIMITER: \t
LIVE_PROGRESS: False
{code}
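One possible direction, sketched hypothetically in Python: the shell records "SET name=value" statements itself instead of relying on the server to list or validate options, and later forwards them as key/value pairs, for example in the configuration map of an HS2 OpenSession request. The class and regex are illustrations, not impala-shell code, and whether the external frontend honours such keys is an assumption.

```python
import re
from typing import Dict

# Hypothetical client-side parser for "SET name=value" statements.
SET_STMT = re.compile(r"^\s*set\s+(\w+)\s*=\s*(.*?)\s*;?\s*$", re.IGNORECASE)


class ClientSideQueryOptions:
    def __init__(self) -> None:
        self.options: Dict[str, str] = {}

    def handle(self, stmt: str) -> bool:
        """Record the option if stmt is 'SET name=value'; return True if so."""
        m = SET_STMT.match(stmt)
        if m is None:
            return False
        self.options[m.group(1).upper()] = m.group(2)
        return True

    def session_config(self) -> Dict[str, str]:
        """Key/value pairs that could be sent along with the session."""
        return dict(self.options)


if __name__ == "__main__":
    opts = ClientSideQueryOptions()
    opts.handle("set num_nodes=2;")
    print(opts.session_config())  # {'NUM_NODES': '2'}
```

Statements that are not assignments, such as a bare "set;", are left for the existing shell logic to handle.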






[jira] [Created] (IMPALA-12329) Access type of Ranger audit event being set up in more than one place inconsistently

2023-08-01 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12329:


 Summary: Access type of Ranger audit event being set up in more 
than one place inconsistently
 Key: IMPALA-12329
 URL: https://issues.apache.org/jira/browse/IMPALA-12329
 Project: IMPALA
  Issue Type: Bug
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao


We found that for some queries, the access type of a Ranger audit event can be 
set in more than one place, and the values set are inconsistent.

For instance, take the TRUNCATE TABLE statement. During the 
authorization of this query, the access type of the corresponding Ranger audit 
event is first set to "update" at 
[RangerAuthorizationChecker#authorizeResource()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java#L664].

But later, at 
[RangerAuthorizationChecker#updateAuditEvents()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java#L645],
 the access type is set to "insert", which is the value of 
privilege.name().toLowerCase().

We probably should not set the access type differently in 2 places.
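A hypothetical Python model of one way to avoid the inconsistency: decide the access type in a single helper and call it from both places. The function names only mirror the Java methods above and are not Impala's API; which value TRUNCATE TABLE should report ("update" or "insert") is not settled in the issue, so the per-statement override is just a parameter here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AuditEvent:
    resource: str
    access_type: str


def resolve_access_type(privilege: str, override: Optional[str] = None) -> str:
    """Single place that decides the access type: an explicit
    per-statement override if given, else the lower-cased privilege name."""
    return override if override else privilege.lower()


def authorize_resource(resource: str, privilege: str,
                       override: Optional[str] = None) -> AuditEvent:
    # First call site: the event is created with the resolved type.
    return AuditEvent(resource, resolve_access_type(privilege, override))


def update_audit_event(event: AuditEvent, privilege: str,
                       override: Optional[str] = None) -> AuditEvent:
    # Second call site: reuses the same helper rather than recomputing
    # privilege.lower() independently, so the two cannot disagree.
    event.access_type = resolve_access_type(privilege, override)
    return event


if __name__ == "__main__":
    event = authorize_resource("db.tbl", "INSERT", override="update")
    print(update_audit_event(event, "INSERT", override="update").access_type)
```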






[jira] [Comment Edited] (IMPALA-12311) Extra newlines are produced when an end-to-end test is run with update_results

2023-07-25 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747223#comment-17747223
 ] 

Fang-Yu Rao edited comment on IMPALA-12311 at 7/26/23 1:27 AM:
---

I have verified that for the subsections of ERRORS, TYPES, and LABELS, 
additional newlines at the end of a subsection are okay. However, any 
additional newline added to the subsection of RESULTS would fail the test case. 
According to what we have seen here, it should be safe to just not output a 
trailing newline in 
[join_section_lines()|https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L294].

+*Subsection of ERRORS*+

For the subsection of ERRORS, we found that the expected error message is 
post-processed by the following at 
[https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L328-L333],
 which removes extra newlines. Note that when 'expected_error' is an empty 
string (i.e., a blank line), it evaluates to false.
{code:python}
  for expected_error in expected_errors:
    if not expected_error: continue
    if ROW_REGEX_PREFIX.match(expected_error):
      converted_expected_errors.append(expected_error)
    else:
      converted_expected_errors.append("'%s'" % expected_error)
{code}
On the other hand, the actual results (exec_result.log at 
[https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298])
 contain 2 newlines. Since we use the following to create the respective 
QueryTestResult in 
[verify_errors()|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L324],
 additional newlines are also removed. Note that a line 'l' evaluates to false 
if it is an empty string (i.e., a blank line).
{code:python}
  actual = QueryTestResult(["'%s'" % l for l in actual_errors if l], ['STRING'],
      ['DUMMY_LABEL'], order_matters=False)
{code}
That is, additional newlines are removed from both the expected error message 
and the actual error message. Hence, additional newlines added in the 
subsection of ERRORS are okay.

*+Subsection of TYPES+*

Moreover, additional newlines in the subsection of TYPES are okay since we 
remove additional newlines when constructing the expected line of types at 
[https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L406].
{code:python}
expected_types = [c.strip().upper()
  for c in remove_comments(section).rstrip('\n').split(',')]
{code}
*+Subsection of LABELS+*

Additional newlines in the subsection of LABELS are okay as well because we use 
the following to construct the expected line of labels at 
[https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L446].
{code:python}
  expected_labels = [c.strip().upper() for c in
                     test_section['LABELS'].split(',')]
{code}
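The per-subsection behaviour described above can be checked directly in Python. This sketch re-implements only the quoted normalization steps (remove_comments is omitted), assumes each section is split on '\n', and models RESULTS, hypothetically, as a plain line split with no blank-line filtering. It also shows why the truthiness checks work: an empty string is falsy, while '\n' itself would be truthy.

```python
from typing import List


def normalize_errors(section: str) -> List[str]:
    # Mirrors the `if not expected_error: continue` / `if l` filters:
    # empty strings left behind by blank lines are dropped.
    return [line for line in section.split("\n") if line]


def normalize_types(section: str) -> List[str]:
    # Mirrors rstrip('\n') before splitting on commas.
    return [c.strip().upper() for c in section.rstrip("\n").split(",")]


def normalize_labels(section: str) -> List[str]:
    # Mirrors c.strip().upper(); strip() also removes trailing newlines.
    return [c.strip().upper() for c in section.split(",")]


def raw_result_rows(section: str) -> List[str]:
    # Hypothetical model of RESULTS: rows compared with no filtering.
    return section.split("\n")


if __name__ == "__main__":
    print(normalize_errors("W\n") == normalize_errors("W\n\n\n"))  # True
    print(raw_result_rows("1\n") == raw_result_rows("1\n\n"))      # False
```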



[jira] [Comment Edited] (IMPALA-12311) Extra newlines are produced when an end-to-end test is run with update_results

2023-07-25 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747223#comment-17747223
 ] 

Fang-Yu Rao edited comment on IMPALA-12311 at 7/26/23 1:26 AM:
---

I have verified that for the subsections of ERRORS, TYPES, and LABELS, 
additional newlines at the end of a subsection are okay. However, any 
additional newline added to the subsection of RESULTS would fail the test case. 
According to what we have seen here, it should be safe to just not output a 
trailing newline in 
[join_section_lines()|https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L294]

+*Subsection of ERRORS*+

For the subsection of ERRORS, we found that the expected error message is 
post-processed by the following at 
[https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L328-L333],
 which removes extra newlines. Note that when 'expected_error' is an empty 
string (i.e., a blank line), it evaluates to false.
{code:python}
  for expected_error in expected_errors:
    if not expected_error: continue
    if ROW_REGEX_PREFIX.match(expected_error):
      converted_expected_errors.append(expected_error)
    else:
      converted_expected_errors.append("'%s'" % expected_error)
{code}
On the other hand, the actual results (exec_result.log at 
[https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298])
 contain 2 newlines. Since we use the following to create the respective 
QueryTestResult in 
[verify_errors()|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L324],
 additional newlines are also removed. Note that a line 'l' evaluates to false 
if it is an empty string (i.e., a blank line).
{code:python}
  actual = QueryTestResult(["'%s'" % l for l in actual_errors if l], ['STRING'],
      ['DUMMY_LABEL'], order_matters=False)
{code}
That is, additional newlines are removed from both the expected error message 
and the actual error message. Hence, additional newlines added in the 
subsection of ERRORS are okay.

*+Subsection of TYPES+*

Moreover, additional newlines in the subsection of TYPES are okay since we 
remove additional newlines when constructing the expected line of types at 
[https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L406].
{code:python}
expected_types = [c.strip().upper()
  for c in remove_comments(section).rstrip('\n').split(',')]
{code}
*+Subsection of LABELS+*

Additional newlines in the subsection of LABELS are okay as well because we use 
the following to construct the expected line of labels at 
[https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L446].
{code:python}
  expected_labels = [c.strip().upper() for c in
                     test_section['LABELS'].split(',')]
{code}



[jira] [Comment Edited] (IMPALA-12311) Extra newlines are produced when an end-to-end test is run with update_results

2023-07-25 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747223#comment-17747223
 ] 

Fang-Yu Rao edited comment on IMPALA-12311 at 7/26/23 1:25 AM:
---

I have verified that for the subsections of ERRORS, TYPES, and LABELS, 
additional newlines at the end of a subsection are okay. However, any 
additional newline added to the subsection of RESULTS would fail the test case. 
According to what we have seen here, it should be safe to just not output a 
trailing newline in [join_section_lines()]
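A minimal sketch of why RESULTS could behave differently, under the assumption (not verified against test_result_verifier.py) that the expected rows are obtained by dropping exactly one trailing newline and splitting on '\n', with no filtering of empty lines as in the ERRORS path. split_lines() is a hypothetical stand-in for split_section_lines(), and the rows are made-up values:

```python
def split_lines(section):
    """Hypothetical stand-in for split_section_lines(): drop the single
    trailing newline, then split on newlines (empty lines are kept)."""
    if section == '':
        return []
    assert section.endswith('\n')
    return section[:-1].split('\n')


actual_rows = ["1,'a'", "2,'b'"]

clean_section = "1,'a'\n2,'b'\n"
padded_section = "1,'a'\n2,'b'\n\n"  # one extra trailing newline

# Without filtering, the extra newline becomes an extra, empty expected row,
# so the row-by-row comparison no longer matches.
print(split_lines(clean_section) == actual_rows)   # True
print(split_lines(padded_section) == actual_rows)  # False
print(split_lines(padded_section))                 # ["1,'a'", "2,'b'", '']
```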

+*Subsection of ERRORS*+

For the subsection of ERRORS, we found that the expected error message is 
post-processed by the following at 
[https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L328-L333],
 which removes extra newlines. Note that an extra newline leaves behind an 
empty 'expected_error' once the section is split into lines, and an empty 
'expected_error' evaluates to false.
{code:java}
  for expected_error in expected_errors:
    if not expected_error: continue
    if ROW_REGEX_PREFIX.match(expected_error):
      converted_expected_errors.append(expected_error)
    else:
      converted_expected_errors.append("'%s'" % expected_error)
{code}
On the other hand, the actual results (exec_result.log at 
[https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298])
 contain 2 newlines. Since we use the following to create the respective 
QueryTestResult in 
[verify_errors()|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L324],
 additional newlines are also removed. Note that an empty line 'l', which is 
what an extra newline leaves behind once the text is split into lines, 
evaluates to false.
{code:java}
  actual = QueryTestResult(["'%s'" % l for l in actual_errors if l], ['STRING'],
  ['DUMMY_LABEL'], order_matters=False)
{code}
That is, additional newlines are removed from both the expected error message 
and the actual error message. Hence, additional newlines added in the 
subsection of ERRORS are okay.
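The ERRORS argument can be checked with a small self-contained sketch that re-implements the two quoted snippets as plain functions. ROW_REGEX_PREFIX below is an assumed pattern for illustration only (the real one is defined in test_result_verifier.py), and the split of the expected section imitates split_section_lines() under the assumption that it drops one trailing newline and splits on '\n':

```python
import re

# Assumed pattern for illustration; the real ROW_REGEX_PREFIX may differ.
ROW_REGEX_PREFIX = re.compile(r'row_regex:', re.IGNORECASE)


def convert_expected_errors(expected_errors):
    """Mirrors the quoted loop: skip empty lines, keep regex lines as-is,
    and wrap every other line in single quotes."""
    converted = []
    for expected_error in expected_errors:
        if not expected_error:  # '' is falsy, so blank lines are dropped
            continue
        if ROW_REGEX_PREFIX.match(expected_error):
            converted.append(expected_error)
        else:
            converted.append("'%s'" % expected_error)
    return converted


def convert_actual_errors(actual_errors):
    """Mirrors the list comprehension passed to QueryTestResult()."""
    return ["'%s'" % l for l in actual_errors if l]


msg = 'UDF WARNING: Decimal expression overflowed, returning NULL'
# Expected section with 3 extra trailing newlines -> [msg, '', '', '']
expected = (msg + '\n\n\n\n')[:-1].split('\n')
# Hypothetical actual error list with two empty entries.
actual = [msg, '', '']

print(convert_expected_errors(expected) == convert_actual_errors(actual))  # True
```

Because both sides drop empty entries before the comparison, the number of trailing newlines on either side does not matter.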

*+Subsection of TYPES+*

Moreover, additional newlines in the subsection of TYPES are okay since we 
remove additional newlines when constructing the expected line of types at 
[https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L406].
{code:java}
expected_types = [c.strip().upper()
  for c in remove_comments(section).rstrip('\n').split(',')]
{code}
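A quick check of the quoted TYPES expression. remove_comments() is left out here on the assumption that the sample sections contain no comments, and the type names are made-up inputs:

```python
def parse_expected_types(section):
    """Same expression as quoted above, without remove_comments()."""
    return [c.strip().upper() for c in section.rstrip('\n').split(',')]


print(parse_expected_types('decimal, string\n'))        # ['DECIMAL', 'STRING']
print(parse_expected_types('decimal, string\n\n\n\n'))  # ['DECIMAL', 'STRING']
```

rstrip('\n') removes all trailing newlines at once, and c.strip() alone would already remove them from the last element.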
*+Subsection of LABELS+*

Additional newlines in the subsection of LABELS are okay as well because we use 
the following to construct the expected line of labels at 
[https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L446].
{code:java}
  expected_labels = [c.strip().upper() for c in test_section['LABELS'].split(',')]
{code}
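The LABELS expression can be exercised the same way: split(',') leaves the trailing newlines attached to the last label, and c.strip() then removes them together with any spaces. The label names below are made-up inputs:

```python
def parse_expected_labels(section):
    """Same expression as quoted above: split on commas, then strip each item."""
    return [c.strip().upper() for c in section.split(',')]


print(parse_expected_labels('x, y\n'))        # ['X', 'Y']
print(parse_expected_labels('x, y\n\n\n\n'))  # ['X', 'Y']
```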


was (Author: fangyurao):
I have verified that for the subsections of ERRORS, TYPES, and LABELS, 
additional newlines at the end of a subsection are okay.

+*Subsection of ERRORS*+

For the subsection of ERRORS, we found that the expected error message is 
post-processed by the following call to split_section_lines() at 
[https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L391],
 which removes extra newlines.
{code:java}
expected_errors = split_section_lines(remove_comments(test_section['ERRORS']))
{code}
On the other hand, the actual results (exec_result.log at 
[https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298])
 contain 2 newlines. Since we use the following to create the respective 
QueryTestResult in 
[verify_errors()|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L324],
 additional newlines are also removed. Note that an empty line 'l', which is 
what an extra newline leaves behind once the text is split into lines, 
evaluates to false.
{code:java}
  actual = QueryTestResult(["'%s'" % l for l in actual_errors if l], ['STRING'],
  ['DUMMY_LABEL'], order_matters=False)
{code}
That is, additional newlines are removed from both the expected error message 
and the actual error message. Hence, additional newlines added in the 
subsection of ERRORS are okay.

*+Subsection of TYPES+*

Moreover, additional newlines in the subsection of TYPES are okay since we 
remove additional newlines when constructing the expected line of types at 
[https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L406].
{code:java}
expected_types = [c.strip().upper()
  for c in remove_comments(section).rstrip('\n').split(',')]
{code}
*+Subsection of LABELS+*

Additional newlines in the subsection of LABELS are okay as well because we use 
the following to construct the expected line of labels at 
[https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L446].
{code:java}
  expected_labels = [c.strip().upper() for c in test_section['LABELS'].split(',')]
{code}



> Extra newlines are produced when an end-to-end test is run with 
> update_results 
> 

[jira] [Commented] (IMPALA-12311) Extra newlines are produced when an end-to-end test is run with update_results

2023-07-25 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747223#comment-17747223
 ] 

Fang-Yu Rao commented on IMPALA-12311:
--

I have verified that for the subsections of ERRORS, TYPES, and LABELS, 
additional newlines at the end of a subsection are okay.

+*Subsection of ERRORS*+

For the subsection of ERRORS, we found that the expected error message is 
post-processed by the following call to split_section_lines() at 
[https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L391],
 which removes extra newlines.
{code:java}
expected_errors = split_section_lines(remove_comments(test_section['ERRORS']))
{code}
On the other hand, the actual results (exec_result.log at 
[https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298])
 contain 2 newlines. Since we use the following to create the respective 
QueryTestResult in 
[verify_errors()|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L324],
 additional newlines are also removed. Note that an empty line 'l', which is 
what an extra newline leaves behind once the text is split into lines, 
evaluates to false.
{code:java}
  actual = QueryTestResult(["'%s'" % l for l in actual_errors if l], ['STRING'],
  ['DUMMY_LABEL'], order_matters=False)
{code}
That is, additional newlines are removed from both the expected error message 
and the actual error message. Hence, additional newlines added in the 
subsection of ERRORS are okay.

*+Subsection of TYPES+*

Moreover, additional newlines in the subsection of TYPES are okay since we 
remove additional newlines when constructing the expected line of types at 
[https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L406].
{code:java}
expected_types = [c.strip().upper()
  for c in remove_comments(section).rstrip('\n').split(',')]
{code}
*+Subsection of LABELS+*

Additional newlines in the subsection of LABELS are okay as well because we use 
the following to construct the expected line of labels at 
[https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L446].
{code:java}
  expected_labels = [c.strip().upper() for c in test_section['LABELS'].split(',')]
{code}



> Extra newlines are produced when an end-to-end test is run with 
> update_results 
> ---
>
> Key: IMPALA-12311
> URL: https://issues.apache.org/jira/browse/IMPALA-12311
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.1.2
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Minor
>  Labels: test-infra
>
> We found that extra newlines are produced in the updated golden file when the 
> actual results do not match the expected results specified in the original 
> golden file.
> Take 
> [TestDecimalExprs::test_exprs()|https://github.com/apache/impala/blob/master/tests/query_test/test_decimal_queries.py#L75]
>  for example. This test runs the test cases in 
> [decimal-exprs.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test].
> Suppose that we modify the expected error message at 
> [https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test#L107]
>  from "UDF WARNING: Decimal expression overflowed, returning NULL" to the 
> following (the original string with an additional "x").
> {noformat}
> UDF WARNING: Decimal expression overflowed, returning NULLx
> {noformat}
> Then we run this test using the following command with the command line 
> argument '--update_results'.
> {code:java}
> $IMPALA_HOME/bin/impala-py.test \
> --update_results \
> --junitxml=$IMPALA_EE_TEST_LOGS_DIR/results/test_decimal.xml \
> $IMPALA_HOME/tests/query_test/test_decimal_queries.py::TestDecimalExprs::test_exprs
> {code}
> In $IMPALA_HOME/logs/ee_tests/QueryTest_decimal-exprs.test, we will find the 
> following subsection corresponding to the query. There are 3 additional 
> newlines in the subsection of 'ERRORS'.
> {noformat}
>  ERRORS
> UDF WARNING: Decimal expression overflowed, returning NULL
> 
> {noformat}
> One of the newlines was produced in 
> [join_section_lines()|https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298].
>  This function is called when the actual results do not match the expected 
> results in the following 4 places.
>  # [test_section['ERRORS'] = 
> join_section_lines(actual_errors)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L398].
>  # [test_section['TYPES'] = join_section_lines(\[', 
> '.join(actual_types)\])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L429].
>  # [test_section['LABELS'] = 

[jira] [Updated] (IMPALA-12311) Extra newlines are produced when an end-to-end test is run with update_results

2023-07-25 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12311:
-
Description: 
We found that extra newlines are produced in the updated golden file when the 
actual results do not match the expected results specified in the original 
golden file.

Take 
[TestDecimalExprs::test_exprs()|https://github.com/apache/impala/blob/master/tests/query_test/test_decimal_queries.py#L75]
 for example. This test runs the test cases in 
[decimal-exprs.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test].

Suppose that we modify the expected error message at 
[https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test#L107]
 from "UDF WARNING: Decimal expression overflowed, returning NULL" to the 
following (the original string with an additional "x").
{noformat}
UDF WARNING: Decimal expression overflowed, returning NULLx
{noformat}
Then we run this test using the following command with the command line 
argument '--update_results'.
{code:java}
$IMPALA_HOME/bin/impala-py.test \
--update_results \
--junitxml=$IMPALA_EE_TEST_LOGS_DIR/results/test_decimal.xml \
$IMPALA_HOME/tests/query_test/test_decimal_queries.py::TestDecimalExprs::test_exprs
{code}
In $IMPALA_HOME/logs/ee_tests/QueryTest_decimal-exprs.test, we will find the 
following subsection corresponding to the query. There are 3 additional 
newlines in the subsection of 'ERRORS'.
{noformat}
 ERRORS
UDF WARNING: Decimal expression overflowed, returning NULL




{noformat}
One of the newlines was produced in 
[join_section_lines()|https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298].
 This function is called when the actual results do not match the expected 
results in the following 4 places.
 # [test_section['ERRORS'] = 
join_section_lines(actual_errors)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L398].
 # [test_section['TYPES'] = join_section_lines(\[', 
'.join(actual_types)\])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L429].
 # [test_section['LABELS'] = join_section_lines(\[', 
'.join(actual_labels)\])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L451].
 # [test_section[result_section] = 
join_section_lines(actual.result_list)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L489].

Thus, we also have the same issue for subsections like TYPES, LABELS, and 
RESULTS in such a scenario (actual results do not match expected ones). It 
would be good if a user/developer does not have to manually remove those extra 
newlines when trying to generate the golden files for new test files.
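A minimal sketch of the suspected mechanism. It assumes that join_section_lines() joins the lines with '\n' and then appends one trailing newline, and that whatever writes the updated golden file terminates each section with a newline of its own; both the functions and SECTION_TERMINATOR below are hypothetical stand-ins, not the code in test_file_parser.py:

```python
def join_section_lines(lines):
    """Assumed behaviour: join with newlines and always append one more."""
    return '\n'.join(lines) + '\n'


def join_section_lines_no_trailing(lines):
    """Hypothetical variant that does not output a trailing newline."""
    return '\n'.join(lines)


# Hypothetical: the writer of the updated golden file ends each section
# with its own newline.
SECTION_TERMINATOR = '\n'

actual_errors = ['UDF WARNING: Decimal expression overflowed, returning NULL']

with_trailing = join_section_lines(actual_errors) + SECTION_TERMINATOR
without_trailing = join_section_lines_no_trailing(actual_errors) + SECTION_TERMINATOR

print(with_trailing.count('\n'))     # 2 -> one empty line after the message
print(without_trailing.count('\n'))  # 1
```

Under these assumptions, dropping the trailing newline from the join removes exactly one of the extra newlines; the others would come from elsewhere.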

  was:
We found that extra newlines are produced in the updated golden file when the 
actual results do not match the expected results specified in the original 
golden file.

Take 
[TestDecimalExprs::test_exprs()|https://github.com/apache/impala/blob/master/tests/query_test/test_decimal_queries.py#L75]
 for example. This test runs the test cases in 
[decimal-exprs.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test].

Suppose that we modify the expected error message at 
[https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test#L107]
 from "UDF WARNING: Decimal expression overflowed, returning NULL" to the 
following (the original string with an additional "x").
{noformat}
UDF WARNING: Decimal expression overflowed, returning NULLx
{noformat}
Then we run this test using the following command with the command line 
argument '--update_results'.
{code:java}
$IMPALA_HOME/bin/impala-py.test \
--update_results \
--junitxml=$IMPALA_EE_TEST_LOGS_DIR/results/test_decimal.xml \
$IMPALA_HOME/tests/query_test/test_decimal_queries.py::TestDecimalExprs::test_exprs
{code}
In $IMPALA_HOME/logs/ee_tests/QueryTest_decimal-exprs.test, we will find the 
following subsection corresponding to the query. There are 3 additional 
newlines in the subsection of 'ERRORS'.
{noformat}
 ERRORS
UDF WARNING: Decimal expression overflowed, returning NULL




{noformat}
One of the newlines was produced in 
[join_section_lines()|https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298].
 This function is called when the actual results do not match the expected 
results in the following 4 places.
 # [test_section['ERRORS'] = 
join_section_lines(actual_errors)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L398].
 # [test_section['TYPES'] = join_section_lines([', 
'.join(actual_types)])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L429].
 # 

[jira] [Updated] (IMPALA-12311) Extra newlines are produced when an end-to-end test is run with update_results

2023-07-25 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12311:
-
Description: 
We found that extra newlines are produced in the updated golden file when the 
actual results do not match the expected results specified in the original 
golden file.

Take 
[TestDecimalExprs::test_exprs()|https://github.com/apache/impala/blob/master/tests/query_test/test_decimal_queries.py#L75]
 for example. This test runs the test cases in 
[decimal-exprs.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test].

Suppose that we modify the expected error message at 
[https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test#L107]
 from "UDF WARNING: Decimal expression overflowed, returning NULL" to the 
following (the original string with an additional "x").
{noformat}
UDF WARNING: Decimal expression overflowed, returning NULLx
{noformat}
Then we run this test using the following command with the command line 
argument '--update_results'.
{code:java}
$IMPALA_HOME/bin/impala-py.test \
--update_results \
--junitxml=$IMPALA_EE_TEST_LOGS_DIR/results/test_decimal.xml \
$IMPALA_HOME/tests/query_test/test_decimal_queries.py::TestDecimalExprs::test_exprs
{code}
In $IMPALA_HOME/logs/ee_tests/QueryTest_decimal-exprs.test, we will find the 
following subsection corresponding to the query. There are 3 additional 
newlines in the subsection of 'ERRORS'.
{noformat}
 ERRORS
UDF WARNING: Decimal expression overflowed, returning NULL




{noformat}
One of the newlines was produced in 
[join_section_lines()|https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298].
 This function is called when the actual results do not match the expected 
results in the following 4 places.
 # [test_section['ERRORS'] = 
join_section_lines(actual_errors)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L398].
 # [test_section['TYPES'] = join_section_lines([', 
'.join(actual_types)])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L429].
 # [test_section['LABELS'] = join_section_lines([', 
'.join(actual_labels)])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L451].
 # [test_section[result_section] = 
join_section_lines(actual.result_list)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L489].

Thus, we also have the same issue for subsections like TYPES, LABELS, and 
RESULTS in such a scenario (actual results do not match expected ones). It 
would be good if a user/developer does not have to manually remove those extra 
newlines when trying to generate the golden files for new test files.

  was:
We found that extra newlines are produced in the updated golden file when the 
actual results do not match the expected results specified in the original 
golden file.

Take 
[TestDecimalExprs::test_exprs()|https://github.com/apache/impala/blob/master/tests/query_test/test_decimal_queries.py#L75]
 for example. This test runs the test cases in 
[decimal-exprs.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test].

Suppose that we modify the expected error message at 
[https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test#L107]
 from "UDF WARNING: Decimal expression overflowed, returning NULL" to the 
following (the original string with an additional "x").
{noformat}
UDF WARNING: Decimal expression overflowed, returning NULLx
{noformat}
Then we run this test using the following command with the command line 
argument '--update_results'.
{code:java}
$IMPALA_HOME/bin/impala-py.test \
--update_results \
--junitxml=$IMPALA_EE_TEST_LOGS_DIR/results/test_decimal.xml \
$IMPALA_HOME/tests/query_test/test_decimal_queries.py::TestDecimalExprs::test_exprs
{code}
In $IMPALA_HOME/logs/ee_tests/QueryTest_decimal-exprs.test, we will find the 
following subsection corresponding to the query. There are 3 additional 
newlines in the subsection of 'ERRORS'.
{noformat}
 ERRORS
UDF WARNING: Decimal expression overflowed, returning NULL




{noformat}
One of the newlines was produced in 
[join_section_lines()|https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298].
 This function is called when the actual results do not match the expected 
results in the following 4 places.
 # [test_section['ERRORS'] = 
join_section_lines(actual_errors)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L398].
 # [test_section['TYPES'] = join_section_lines(\[', 
'.join(actual_types)\])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L429].
 # 

[jira] [Updated] (IMPALA-12311) Extra newlines are produced when an end-to-end test is run with update_results

2023-07-24 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12311:
-
Description: 
We found that extra newlines are produced in the updated golden file when the 
actual results do not match the expected results specified in the original 
golden file.

Take 
[TestDecimalExprs::test_exprs()|https://github.com/apache/impala/blob/master/tests/query_test/test_decimal_queries.py#L75]
 for example. This test runs the test cases in 
[decimal-exprs.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test].

Suppose that we modify the expected error message at 
[https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test#L107]
 from "UDF WARNING: Decimal expression overflowed, returning NULL" to the 
following (the original string with an additional "x").
{noformat}
UDF WARNING: Decimal expression overflowed, returning NULLx
{noformat}
Then we run this test using the following command with the command line 
argument '--update_results'.
{code:java}
$IMPALA_HOME/bin/impala-py.test \
--update_results \
--junitxml=$IMPALA_EE_TEST_LOGS_DIR/results/test_decimal.xml \
$IMPALA_HOME/tests/query_test/test_decimal_queries.py::TestDecimalExprs::test_exprs
{code}
In $IMPALA_HOME/logs/ee_tests/QueryTest_decimal-exprs.test, we will find the 
following subsection corresponding to the query. There are 3 additional 
newlines in the subsection of 'ERRORS'.
{noformat}
 ERRORS
UDF WARNING: Decimal expression overflowed, returning NULL




{noformat}
One of the newlines was produced in 
[join_section_lines()|https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298].
 This function is called when the actual results do not match the expected 
results in the following 4 places.
 # [test_section['ERRORS'] = 
join_section_lines(actual_errors)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L398].
 # [test_section['TYPES'] = join_section_lines(\[', 
'.join(actual_types)\])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L429].
 # [test_section['LABELS'] = join_section_lines(\[', 
'.join(actual_labels)\])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L451].
 # [test_section[result_section] = 
join_section_lines(actual.result_list)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L489].

Thus, we also have the same issue for subsections like TYPES, LABELS, and 
RESULTS in such a scenario (actual results do not match expected ones). It 
would be good if a user/developer does not have to manually remove those extra 
newlines when trying to generate the golden files for new test files.

  was:
We found that extra newlines are produced in the updated golden file when the 
actual results do not match the expected results specified in the original 
golden file.

Take 
[TestDecimalExprs::test_exprs()|https://github.com/apache/impala/blob/master/tests/query_test/test_decimal_queries.py#L75]
 for example. This test runs the test cases in 
[decimal-exprs.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test].

Suppose that we modify the expected error message at 
[https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test#L107]
 from "UDF WARNING: Decimal expression overflowed, returning NULL" to the 
following (the original string with an additional "x").
{noformat}
UDF WARNING: Decimal expression overflowed, returning NULLx
{noformat}
Then we run this test using the following command with the command line 
argument '--update_results'.
{code:java}
$IMPALA_HOME/bin/impala-py.test \
--update_results \
--junitxml=$IMPALA_EE_TEST_LOGS_DIR/results/test_decimal.xml \
$IMPALA_HOME/tests/query_test/test_decimal_queries.py::TestDecimalExprs::test_exprs
{code}
In $IMPALA_HOME/logs/ee_tests/QueryTest_decimal-exprs.test, we will find the 
following subsection corresponding to the query. There are 3 additional 
newlines in the subsection of 'ERRORS'.
{noformat}
 ERRORS
UDF WARNING: Decimal expression overflowed, returning NULL




{noformat}
One of the newlines was produced in 
[join_section_lines()|https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298].
 This function is called when the actual results do not match the expected 
results in the following 4 places.
 # [test_section['ERRORS'] = 
join_section_lines(actual_errors)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L398].
 # [test_section['TYPES'] = join_section_lines([', 
'.join(actual_types)])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L429].
 # 

[jira] [Updated] (IMPALA-12311) Extra newlines are produced when an end-to-end test is run with update_results

2023-07-24 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12311:
-
Labels: test-infra  (was: )

> Extra newlines are produced when an end-to-end test is run with 
> update_results 
> ---
>
> Key: IMPALA-12311
> URL: https://issues.apache.org/jira/browse/IMPALA-12311
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Minor
>  Labels: test-infra
>
> We found that extra newlines are produced in the updated golden file when the 
> actual results do not match the expected results specified in the original 
> golden file.
> Take 
> [TestDecimalExprs::test_exprs()|https://github.com/apache/impala/blob/master/tests/query_test/test_decimal_queries.py#L75]
>  for example. This test runs the test cases in 
> [decimal-exprs.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test].
> Suppose that we modify the expected error message at 
> [https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test#L107]
>  from "UDF WARNING: Decimal expression overflowed, returning NULL" to the 
> following (the original string with an additional "x").
> {noformat}
> UDF WARNING: Decimal expression overflowed, returning NULLx
> {noformat}
> Then we run this test using the following command with the command line 
> argument '--update_results'.
> {code:java}
> $IMPALA_HOME/bin/impala-py.test \
> --update_results \
> --junitxml=$IMPALA_EE_TEST_LOGS_DIR/results/test_decimal.xml \
> $IMPALA_HOME/tests/query_test/test_decimal_queries.py::TestDecimalExprs::test_exprs
> {code}
> In $IMPALA_HOME/logs/ee_tests/QueryTest_decimal-exprs.test, we will find the 
> following subsection corresponding to the query. There are 3 additional 
> newlines in the subsection of 'ERRORS'.
> {noformat}
>  ERRORS
> UDF WARNING: Decimal expression overflowed, returning NULL
> 
> {noformat}
> One of the newlines was produced in 
> [join_section_lines()|https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298].
>  This function is called when the actual results do not match the expected 
> results in the following 4 places.
>  # [test_section['ERRORS'] = 
> join_section_lines(actual_errors)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L398].
>  # [test_section['TYPES'] = join_section_lines([', 
> '.join(actual_types)])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L429].
>  # [test_section['LABELS'] = join_section_lines([', 
> '.join(actual_labels)])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L451].
>  # [test_section[result_section] = 
> join_section_lines(actual.result_list)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L489].
> Thus, we also have the same issue for subsections like TYPES, LABELS, and 
> RESULTS in such a scenario (actual results do not match expected ones). It 
> would be good if a user/developer does not have to manually remove those 
> extra newlines when trying to generate the golden files for new test files.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-12311) Extra newlines are produced when an end-to-end test is run with update_results

2023-07-24 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12311:
-
Affects Version/s: Impala 4.1.2

> Extra newlines are produced when an end-to-end test is run with 
> update_results 
> ---
>
> Key: IMPALA-12311
> URL: https://issues.apache.org/jira/browse/IMPALA-12311
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.1.2
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Minor
>  Labels: test-infra
>
> We found that extra newlines are produced in the updated golden file when the 
> actual results do not match the expected results specified in the original 
> golden file.
> Take 
> [TestDecimalExprs::test_exprs()|https://github.com/apache/impala/blob/master/tests/query_test/test_decimal_queries.py#L75]
>  for example. This test runs the test cases in 
> [decimal-exprs.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test].
> Suppose that we modify the expected error message at 
> [https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test#L107]
>  from "UDF WARNING: Decimal expression overflowed, returning NULL" to the 
> following (the original string with an additional "x").
> {noformat}
> UDF WARNING: Decimal expression overflowed, returning NULLx
> {noformat}
> Then we run this test using the following command with the command line 
> argument '--update_results'.
> {code:java}
> $IMPALA_HOME/bin/impala-py.test \
> --update_results \
> --junitxml=$IMPALA_EE_TEST_LOGS_DIR/results/test_decimal.xml \
> $IMPALA_HOME/tests/query_test/test_decimal_queries.py::TestDecimalExprs::test_exprs
> {code}
> In $IMPALA_HOME/logs/ee_tests/QueryTest_decimal-exprs.test, we will find the 
> following subsection corresponding to the query. There are 3 additional 
> newlines in the subsection of 'ERRORS'.
> {noformat}
>  ERRORS
> UDF WARNING: Decimal expression overflowed, returning NULL
> 
> {noformat}
> One of the newlines was produced in 
> [join_section_lines()|https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298].
>  This function is called when the actual results do not match the expected 
> results in the following 4 places.
>  # [test_section['ERRORS'] = 
> join_section_lines(actual_errors)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L398].
>  # [test_section['TYPES'] = join_section_lines([', 
> '.join(actual_types)])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L429].
>  # [test_section['LABELS'] = join_section_lines([', 
> '.join(actual_labels)])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L451].
>  # [test_section[result_section] = 
> join_section_lines(actual.result_list)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L489].
> Thus, we also have the same issue for subsections like TYPES, LABELS, and 
> RESULTS in such a scenario (actual results do not match expected ones). It 
> would be good if a user/developer does not have to manually remove those 
> extra newlines when trying to generate the golden files for new test files.






[jira] [Created] (IMPALA-12311) Extra newlines are produced when an end-to-end test is run with update_results

2023-07-24 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12311:


 Summary: Extra newlines are produced when an end-to-end test is 
run with update_results 
 Key: IMPALA-12311
 URL: https://issues.apache.org/jira/browse/IMPALA-12311
 Project: IMPALA
  Issue Type: Bug
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao


We found that extra newlines are produced in the updated golden file when the 
actual results do not match the expected results specified in the original 
golden file.

Take 
[TestDecimalExprs::test_exprs()|https://github.com/apache/impala/blob/master/tests/query_test/test_decimal_queries.py#L75]
 as an example. This test runs the test cases in 
[decimal-exprs.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test].

Suppose that we modify the expected error message at 
[https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test#L107]
 from "UDF WARNING: Decimal expression overflowed, returning NULL" to the 
following (the original string with an additional "x").
{noformat}
UDF WARNING: Decimal expression overflowed, returning NULLx
{noformat}
Then we run this test using the following command with the command line 
argument '--update_results'.
{code:java}
$IMPALA_HOME/bin/impala-py.test \
--update_results \
--junitxml=$IMPALA_EE_TEST_LOGS_DIR/results/test_decimal.xml \
$IMPALA_HOME/tests/query_test/test_decimal_queries.py::TestDecimalExprs::test_exprs
{code}
In $IMPALA_HOME/logs/ee_tests/QueryTest_decimal-exprs.test, we find the 
following subsection for the query. There are 3 additional newlines in the 
'ERRORS' subsection.
{noformat}
 ERRORS
UDF WARNING: Decimal expression overflowed, returning NULL




{noformat}
One of the newlines was produced in 
[join_section_lines()|https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298].
 This function is called when the actual results do not match the expected 
results in the following 4 places.
 # [test_section['ERRORS'] = 
join_section_lines(actual_errors)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L398].
 # [test_section['TYPES'] = join_section_lines([', 
'.join(actual_types)])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L429].
 # [test_section['LABELS'] = join_section_lines([', 
'.join(actual_labels)])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L451].
 # [test_section[result_section] = 
join_section_lines(actual.result_list)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L489].

Thus, the same issue also affects subsections like TYPES, LABELS, and RESULTS 
in such a scenario (actual results do not match expected ones). It would be 
good if a user/developer did not have to manually remove those extra newlines 
when generating the golden files for new test files.
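
To see how a join helper can end up emitting extra blank lines, here is a minimal Python sketch. Both functions are hypothetical and written only for this illustration; they are not the actual join_section_lines() in tests/util/test_file_parser.py. The sketch assumes the extra lines come from joining entries that already end with a newline (or are empty) and then appending one more newline.

```python
# Hypothetical helpers for illustration only; neither is the real
# join_section_lines() from tests/util/test_file_parser.py.


def naive_join(lines):
    # Joins the entries and always appends one more newline. If the entries
    # already end with '\n' (or the list ends with ''), blank lines pile up.
    return '\n'.join(lines) + '\n'


def join_without_trailing_blank_lines(lines):
    # Drop trailing newlines from each entry and any empty entries at the end,
    # so the section ends with exactly one newline.
    cleaned = [line.rstrip('\n') for line in lines]
    while cleaned and cleaned[-1] == '':
        cleaned.pop()
    return '\n'.join(cleaned) + '\n'


errors = ['UDF WARNING: Decimal expression overflowed, returning NULL\n', '']
print(repr(naive_join(errors)))                         # ends with '\n\n\n'
print(repr(join_without_trailing_blank_lines(errors)))  # ends with a single '\n'
```

Stripping trailing newlines before joining is one possible design; whether the real helper or its callers should do the stripping is left open here.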




[jira] [Created] (IMPALA-12250) Remove deprecated Ranger configuration properties after RANGER-2895

2023-06-27 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12250:


 Summary: Remove deprecated Ranger configuration properties after 
RANGER-2895
 Key: IMPALA-12250
 URL: https://issues.apache.org/jira/browse/IMPALA-12250
 Project: IMPALA
  Issue Type: Task
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao


In IMPALA-12248, we added 3 new Ranger configuration properties that Ranger's 
HTTP server will require in order to start once we use a build that includes 
[RANGER-2895|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05].

Recall that a Ranger configuration property was deprecated in 
[RANGER-2895|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05],
 i.e., 
[ranger.jpa.jdbc.idleconnectiontestperiod|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-9669116dca1e5c9fffdb2c81d4d9ac57b489131e90b89ff17b56801131bad5a6L419].
 Thus, we should also remove it from 
[ranger-admin-default-site.xml.template|https://github.com/apache/impala/blob/master/testdata/cluster/ranger/ranger-admin-default-site.xml.template]
 once we start using a build that includes 
[RANGER-2895|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05].
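
A minimal Python sketch of a check that could flag the deprecated property in a site file once a build with RANGER-2895 is in use. It assumes the file uses the Hadoop-style <configuration>/<property>/<name> layout and parses as plain XML (a .template file containing placeholders may need rendering first); the helper names are hypothetical, not part of Impala's test tooling.

```python
import xml.etree.ElementTree as ET

# The property that RANGER-2895 deprecated, as named in the description above.
DEPRECATED = {'ranger.jpa.jdbc.idleconnectiontestperiod'}


def property_names(xml_text):
    """Return the <name> of every <property> in a Hadoop-style site XML."""
    root = ET.fromstring(xml_text)
    return {prop.findtext('name', default='').strip()
            for prop in root.iter('property')}


def deprecated_properties(xml_text, deprecated=DEPRECATED):
    """Return the deprecated properties still present, sorted by name."""
    return sorted(property_names(xml_text) & deprecated)


# Sample input written for this sketch; the value is a placeholder.
sample = """<configuration>
  <property>
    <name>ranger.jpa.jdbc.idleconnectiontestperiod</name>
    <value>placeholder</value>
  </property>
</configuration>"""
print(deprecated_properties(sample))
```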




[jira] [Updated] (IMPALA-12248) Add required Ranger configuration properties after RANGER-2895

2023-06-27 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12248:
-
Description: 
[RANGER-2895|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e]
 added and removed some configuration properties.

[Three new configuration properties were 
added|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05].
 We found that once we bump up the build number to include RANGER-2895 and if 
those new properties do not exist in 
[ranger-admin-default-site.xml.template|https://github.com/apache/impala/blob/master/testdata/cluster/ranger/ranger-admin-default-site.xml.template]
 or 
[ranger-admin-site.xml.template|https://github.com/apache/impala/blob/master/testdata/cluster/ranger/ranger-admin-site.xml.template]
 then the generated site files for Ranger will not contain those new properties, 
resulting in an error like the following in catalina.log. As a result, 
Ranger's HTTP server could not start properly.
{code:java}
23/06/25 04:46:01 ERROR context.ContextLoader: Context initialization failed
org.springframework.beans.factory.BeanDefinitionStoreException: Invalid bean 
definition with name 'defaultDataSource' defined in ServletContext resource 
[/META-INF/applicationContext.xml]: Could not resolve placeholder 
'ranger.jpa.jdbc.idletimeout' in value "${ranger.jpa.jdbc.idletimeout}"; nested 
exception is java.lang.IllegalArgumentException: Could not resolve placeholder 
'ranger.jpa.jdbc.idletimeout' in value "${ranger.jpa.jdbc.idletimeout}"
at
{code}
There are also some configuration properties removed in RANGER-2895, e.g., 
[ranger.jpa.jdbc.idleconnectiontestperiod|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05L136].
 In this regard, we could probably add these 3 new properties first and then 
remove the unnecessary properties once we have bumped up the build number that 
includes RANGER-2895.

  was:
[RANGER-2895|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e]
 added and removed some configuration properties.

Three new configuration properties were added. We found that once we bump up 
the build number to include RANGER-2895 and if those new properties do not 
exist in 
[ranger-admin-default-site.xml.template|https://github.com/apache/impala/blob/master/testdata/cluster/ranger/ranger-admin-default-site.xml.template]
 or 
[ranger-admin-site.xml.template|https://github.com/apache/impala/blob/master/testdata/cluster/ranger/ranger-admin-site.xml.template]
 then the generated site files for Ranger will not contain those new properties, 
resulting in an error like the following in catalina.log. As a result, 
Ranger's HTTP server could not start properly.
{code:java}
23/06/25 04:46:01 ERROR context.ContextLoader: Context initialization failed
org.springframework.beans.factory.BeanDefinitionStoreException: Invalid bean 
definition with name 'defaultDataSource' defined in ServletContext resource 
[/META-INF/applicationContext.xml]: Could not resolve placeholder 
'ranger.jpa.jdbc.idletimeout' in value "${ranger.jpa.jdbc.idletimeout}"; nested 
exception is java.lang.IllegalArgumentException: Could not resolve placeholder 
'ranger.jpa.jdbc.idletimeout' in value "${ranger.jpa.jdbc.idletimeout}"
at
{code}
There are also some configuration properties removed in RANGER-2895, e.g., 
[ranger.jpa.jdbc.idleconnectiontestperiod|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05L136].
 In this regard, we could probably add these 3 new properties first and then 
remove the unnecessary properties once we have bumped up the build number that 
includes RANGER-2895.


> Add required Ranger configuration properties after RANGER-2895
> --
>
> Key: IMPALA-12248
> URL: https://issues.apache.org/jira/browse/IMPALA-12248
> Project: IMPALA
>  Issue Type: Task
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> [RANGER-2895|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e]
>  added and removed some configuration properties.
> [Three new configuration properties were 
> added|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05].
>  We found that once we bump up the build number to include RANGER-2895 and if 
> those new properties do not exist in 
> 

[jira] [Created] (IMPALA-12248) Add required Ranger configuration properties after RANGER-2895

2023-06-27 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12248:


 Summary: Add required Ranger configuration properties after 
RANGER-2895
 Key: IMPALA-12248
 URL: https://issues.apache.org/jira/browse/IMPALA-12248
 Project: IMPALA
  Issue Type: Task
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao


[RANGER-2895|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e]
 added and removed some configuration properties.

Three new configuration properties were added. We found that once we bump up 
the build number to include RANGER-2895 and if those new properties do not 
exist in 
[ranger-admin-default-site.xml.template|https://github.com/apache/impala/blob/master/testdata/cluster/ranger/ranger-admin-default-site.xml.template]
 or 
[ranger-admin-site.xml.template|https://github.com/apache/impala/blob/master/testdata/cluster/ranger/ranger-admin-site.xml.template]
 then the generated site files for Ranger will not contain those new properties, 
resulting in an error like the following in catalina.log. As a result, 
Ranger's HTTP server could not start properly.
{code:java}
23/06/25 04:46:01 ERROR context.ContextLoader: Context initialization failed
org.springframework.beans.factory.BeanDefinitionStoreException: Invalid bean 
definition with name 'defaultDataSource' defined in ServletContext resource 
[/META-INF/applicationContext.xml]: Could not resolve placeholder 
'ranger.jpa.jdbc.idletimeout' in value "${ranger.jpa.jdbc.idletimeout}"; nested 
exception is java.lang.IllegalArgumentException: Could not resolve placeholder 
'ranger.jpa.jdbc.idletimeout' in value "${ranger.jpa.jdbc.idletimeout}"
at
{code}
There are also some configuration properties removed in RANGER-2895, e.g., 
[ranger.jpa.jdbc.idleconnectiontestperiod|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05L136].
 In this regard, we could probably add these 3 new properties first and then 
remove the unnecessary properties once we have bumped up the build number that 
includes RANGER-2895.
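
Along the same lines, a minimal Python sketch of a pre-flight check that reports required properties missing from a generated Ranger site file, so the problem surfaces before it shows up in catalina.log. Only ranger.jpa.jdbc.idletimeout is named in the error above, so the list contains just that entry; the helper and its name are hypothetical, and the Hadoop-style XML layout is an assumption.

```python
import xml.etree.ElementTree as ET

# Only this property is named in the catalina.log excerpt above; the other
# two properties added by RANGER-2895 are intentionally not guessed here.
REQUIRED = ['ranger.jpa.jdbc.idletimeout']


def missing_properties(xml_text, required=REQUIRED):
    """Return required property names absent from a Hadoop-style site XML."""
    root = ET.fromstring(xml_text)
    present = {prop.findtext('name', default='').strip()
               for prop in root.iter('property')}
    return [name for name in required if name not in present]


missing = missing_properties('<configuration></configuration>')
if missing:
    print('Missing required Ranger properties: ' + ', '.join(missing))
```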




[jira] [Commented] (IMPALA-12239) BitWidthZeroRepeated seems to be flaky

2023-06-23 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17736586#comment-17736586
 ] 

Fang-Yu Rao commented on IMPALA-12239:
--

Hi [~daniel.becker], I assigned this JIRA to you since your recent patch for 
IMPALA-12074 involves tests in 
[rle-test.cc|https://github.com/apache/impala/blame/master/be/src/util/rle-test.cc],
 so you may be more familiar with tests in this area. Feel free to re-assign 
the ticket as you see fit. Thanks!


> BitWidthZeroRepeated seems to be flaky
> --
>
> Key: IMPALA-12239
> URL: https://issues.apache.org/jira/browse/IMPALA-12239
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Daniel Becker
>Priority: Major
>  Labels: broken-build
>
> [BitWidthZeroRepeated|https://github.com/apache/impala/blame/master/be/src/util/rle-test.cc#L400]
>  seems to be flaky. We observed the following error in a Jenkins run.
> Error Message
> {code}
> Value of: 0 Expected: val Which is: '\x9F' (159)
> {code}
> Stacktrace
> {code}
> /data/jenkins/workspace/impala-asf-master-exhaustive-release/repos/Impala/be/src/util/rle-test.cc:410
> Value of: 0
> Expected: val
> Which is: '\x9F' (159)
> {code}




[jira] [Created] (IMPALA-12239) BitWidthZeroRepeated seems to be flaky

2023-06-23 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12239:


 Summary: BitWidthZeroRepeated seems to be flaky
 Key: IMPALA-12239
 URL: https://issues.apache.org/jira/browse/IMPALA-12239
 Project: IMPALA
  Issue Type: Bug
Reporter: Fang-Yu Rao
Assignee: Daniel Becker


[BitWidthZeroRepeated|https://github.com/apache/impala/blame/master/be/src/util/rle-test.cc#L400]
 seems to be flaky. We observed the following error in a Jenkins run.

Error Message
{code}
Value of: 0 Expected: val Which is: '\x9F' (159)
{code}

Stacktrace
{code}
/data/jenkins/workspace/impala-asf-master-exhaustive-release/repos/Impala/be/src/util/rle-test.cc:410
Value of: 0
Expected: val
Which is: '\x9F' (159)
{code}
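
For background, in the Parquet RLE/bit-packing hybrid encoding the value of an RLE run is stored in ceil(bit_width / 8) bytes, so with a bit width of 0 no value bytes are stored and every decoded value should be 0. The following is a minimal Python sketch of decoding one RLE run per that encoding spec; it is not Impala's C++ decoder and does not handle bit-packed runs.

```python
def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 varint starting at buf[pos]."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_rle_run(buf, bit_width):
    """Decode a single RLE run from the Parquet RLE/bit-packing hybrid."""
    header, pos = read_uleb128(buf, 0)
    if header & 1:
        raise NotImplementedError('bit-packed runs are not handled here')
    count = header >> 1
    # The repeated value occupies ceil(bit_width / 8) bytes, little-endian.
    # With bit_width == 0 this is zero bytes, so the value is always 0.
    width_bytes = (bit_width + 7) // 8
    value = int.from_bytes(buf[pos:pos + width_bytes], 'little')
    return [value] * count


print(decode_rle_run(bytes([0x14]), 0))        # ten zeros
print(decode_rle_run(bytes([0x14, 0x9F]), 8))  # ten copies of 159
```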




[jira] [Updated] (IMPALA-12235) test_multiple_coordinator() failed because _start_impala_cluster() returned non-zero exit status

2023-06-22 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12235:
-
Description: 
We found that test_multiple_coordinators() could fail because 
[_start_impala_cluster()|https://github.com/apache/impala/blame/master/tests/common/custom_cluster_test_suite.py#L283]
 returned a non-zero exit status. test_multiple_coordinators() calls 
_start_impala_cluster() at 
https://github.com/apache/impala/blame/master/tests/custom_cluster/test_coordinators.py#L41C10-L41C31.

*Error Message*
{code:java}
CalledProcessError: Command 
'['/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/start-impala-cluster.py',
 '--state_store_args=--statestore_update_frequency_ms=50 
--statestore_priority_update_frequency_ms=50 
--statestore_heartbeat_frequency_ms=50', '--cluster_size=3', 
'--num_coordinators=2', 
'--log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests',
 '--log_level=1', '--impalad_args=--default_query_options=']' returned non-zero 
exit status 1
{code}
*Stacktrace*
{code:java}
custom_cluster/test_coordinators.py:41: in test_multiple_coordinators
self._start_impala_cluster([], num_coordinators=2, cluster_size=3)
common/custom_cluster_test_suite.py:330: in _start_impala_cluster
check_call(cmd + options, close_fds=True)
/data/jenkins/workspace/impala-asf-master-core-erasure-coding/Impala-Toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/subprocess.py:190:
 in check_call
raise CalledProcessError(retcode, cmd)
E   CalledProcessError: Command 
'['/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/start-impala-cluster.py',
 '--state_store_args=--statestore_update_frequency_ms=50 
--statestore_priority_update_frequency_ms=50 
--statestore_heartbeat_frequency_ms=50', '--cluster_size=3', 
'--num_coordinators=2', 
'--log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests',
 '--log_level=1', '--impalad_args=--default_query_options=']' returned non-zero 
exit status 1
{code}
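
The stack trace ends in subprocess.check_call(), which raises CalledProcessError whenever the child exits with a non-zero status. The exception therefore only reports the command and exit status; the actual cause has to be found in the output of start-impala-cluster.py. A minimal Python sketch with a stand-in child process (not the real script) showing what the exception carries:

```python
import subprocess
import sys

# Stand-in for bin/start-impala-cluster.py: a child process that exits with 1.
cmd = [sys.executable, '-c', 'import sys; sys.exit(1)']

exit_status = 0
try:
    subprocess.check_call(cmd, close_fds=True)
except subprocess.CalledProcessError as e:
    # check_call() raises this for any non-zero exit; the exception carries the
    # command and exit status, but not the reason the child process failed.
    exit_status = e.returncode
    print('Command %r returned non-zero exit status %d' % (e.cmd, e.returncode))
```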

The following console output shows that 'num_known_live_backends' did not 
reach 3 within 4 minutes, so the command that starts the cluster failed with 
a non-zero exit status.
{code}
-- 2023-06-21 20:54:40,594 INFO MainThread: Starting cluster with command: 
/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/start-impala-cluster.py
 '--state_store_args=--statestore_update_frequency_ms=50 
--statestore_priority_update_frequency_ms=50 
--statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=2 
--log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests
 --log_level=1 --impalad_args=--default_query_options=
20:54:41 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
20:54:41 MainThread: Starting State Store logging to 
/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/statestored.INFO
20:54:42 MainThread: Starting Catalog Service logging to 
/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
20:54:43 MainThread: Starting Impala Daemon logging to 
/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/impalad.INFO
20:54:43 MainThread: Starting Impala Daemon logging to 
/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
20:54:43 MainThread: Starting Impala Daemon logging to 
/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
20:54:46 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
20:54:46 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
20:54:46 MainThread: Getting num_known_live_backends from 
impala-ec2-centos79-m6i-4xlarge-ondemand-1576.vpc.cloudera.com:25000
20:54:46 MainThread: Waiting for num_known_live_backends=3. Current value: 1
20:54:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
20:54:47 MainThread: Getting num_known_live_backends from 
impala-ec2-centos79-m6i-4xlarge-ondemand-1576.vpc.cloudera.com:25000
20:54:47 MainThread: Waiting for num_known_live_backends=3. Current value: 1
20:54:48 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
20:54:48 MainThread: Getting num_known_live_backends from 
impala-ec2-centos79-m6i-4xlarge-ondemand-1576.vpc.cloudera.com:25000
20:54:48 MainThread: num_known_live_backends has reached value: 3
20:54:48 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
20:54:48 MainThread: Getting num_known_live_backends from 
impala-ec2-centos79-m6i-4xlarge-ondemand-1576.vpc.cloudera.com:25001

[jira] [Commented] (IMPALA-12235) test_multiple_coordinator() failed because _start_impala_cluster() returned non-zero exit status

2023-06-22 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17736353#comment-17736353
 ] 

Fang-Yu Rao commented on IMPALA-12235:
--

Hi [~wzhou], I assigned this JIRA to you since your recent patch [IMPALA-12155: 
Support High Availability for 
CatalogD|https://github.com/apache/impala/commit/819db8fa4667e06d1a56fe08baddfbc26983d389]
 involves _start_impala_cluster(), so you may be more familiar with this 
function. Feel free to re-assign the ticket as you see fit. Thanks!

> test_multiple_coordinator() failed because _start_impala_cluster() returned 
> non-zero exit status
> 
>
> Key: IMPALA-12235
> URL: https://issues.apache.org/jira/browse/IMPALA-12235
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Wenzhe Zhou
>Priority: Major
>  Labels: broken-build
>
> We found that test_multiple_coordinators() could fail because 
> [_start_impala_cluster()|https://github.com/apache/impala/blame/master/tests/common/custom_cluster_test_suite.py#L283]
>  returned a non-zero exit status. test_multiple_coordinators() calls 
> _start_impala_cluster() at 
> https://github.com/apache/impala/blame/master/tests/custom_cluster/test_coordinators.py#L41C10-L41C31.
> *Error Message*
> {code:java}
> CalledProcessError: Command 
> '['/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/start-impala-cluster.py',
>  '--state_store_args=--statestore_update_frequency_ms=50 
> --statestore_priority_update_frequency_ms=50 
> --statestore_heartbeat_frequency_ms=50', '--cluster_size=3', 
> '--num_coordinators=2', 
> '--log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests',
>  '--log_level=1', '--impalad_args=--default_query_options=']' returned 
> non-zero exit status 1
> {code}
> *Stacktrace*
> {code:java}
> custom_cluster/test_coordinators.py:41: in test_multiple_coordinators
> self._start_impala_cluster([], num_coordinators=2, cluster_size=3)
> common/custom_cluster_test_suite.py:330: in _start_impala_cluster
> check_call(cmd + options, close_fds=True)
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/Impala-Toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/subprocess.py:190:
>  in check_call
> raise CalledProcessError(retcode, cmd)
> E   CalledProcessError: Command 
> '['/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/start-impala-cluster.py',
>  '--state_store_args=--statestore_update_frequency_ms=50 
> --statestore_priority_update_frequency_ms=50 
> --statestore_heartbeat_frequency_ms=50', '--cluster_size=3', 
> '--num_coordinators=2', 
> '--log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests',
>  '--log_level=1', '--impalad_args=--default_query_options=']' returned 
> non-zero exit status 1
> {code}




[jira] [Created] (IMPALA-12235) test_multiple_coordinator() failed because _start_impala_cluster() returned non-zero exit status

2023-06-22 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12235:


 Summary: test_multiple_coordinator() failed because 
_start_impala_cluster() returned non-zero exit status
 Key: IMPALA-12235
 URL: https://issues.apache.org/jira/browse/IMPALA-12235
 Project: IMPALA
  Issue Type: Bug
Reporter: Fang-Yu Rao
Assignee: Wenzhe Zhou


We found that test_multiple_coordinators() could fail because 
[_start_impala_cluster()|https://github.com/apache/impala/blame/master/tests/common/custom_cluster_test_suite.py#L283]
 returned a non-zero exit status. test_multiple_coordinators() calls 
_start_impala_cluster() at 
https://github.com/apache/impala/blame/master/tests/custom_cluster/test_coordinators.py#L41C10-L41C31.

*Error Message*
{code:java}
CalledProcessError: Command 
'['/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/start-impala-cluster.py',
 '--state_store_args=--statestore_update_frequency_ms=50 
--statestore_priority_update_frequency_ms=50 
--statestore_heartbeat_frequency_ms=50', '--cluster_size=3', 
'--num_coordinators=2', 
'--log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests',
 '--log_level=1', '--impalad_args=--default_query_options=']' returned non-zero 
exit status 1
{code}
*Stacktrace*
{code:java}
custom_cluster/test_coordinators.py:41: in test_multiple_coordinators
self._start_impala_cluster([], num_coordinators=2, cluster_size=3)
common/custom_cluster_test_suite.py:330: in _start_impala_cluster
check_call(cmd + options, close_fds=True)
/data/jenkins/workspace/impala-asf-master-core-erasure-coding/Impala-Toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/subprocess.py:190:
 in check_call
raise CalledProcessError(retcode, cmd)
E   CalledProcessError: Command 
'['/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/start-impala-cluster.py',
 '--state_store_args=--statestore_update_frequency_ms=50 
--statestore_priority_update_frequency_ms=50 
--statestore_heartbeat_frequency_ms=50', '--cluster_size=3', 
'--num_coordinators=2', 
'--log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests',
 '--log_level=1', '--impalad_args=--default_query_options=']' returned non-zero 
exit status 1
{code}




[jira] [Resolved] (IMPALA-12234) test_ctas could fail because CTAS did not reach the expected states

2023-06-22 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-12234.
--
Resolution: Duplicate

Closing this JIRA since it is a duplicate of IMPALA-12148.

> test_ctas could fail because CTAS did not reach the expected states
> ---
>
> Key: IMPALA-12234
> URL: https://issues.apache.org/jira/browse/IMPALA-12234
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.2.0
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>  Labels: broken-build
>
> We found that test_ctas could fail because the CTAS query did not reach the 
> expected state within 'wait_time', which is 20 seconds when the underlying 
> file system is HDFS 
> ([https://github.com/apache/impala/blame/master/tests/metadata/test_ddl.py#L1120C7-L1120C16]).
> Maybe we could slightly increase 'wait_time' to make this test less flaky.
> *Error Message*
> {code:java}
> metadata/test_ddl.py:1122: in test_ctas self.wait_for_state(handle, 
> finished_state, wait_time, client=client) common/impala_test_suite.py:1115: 
> in wait_for_state self.wait_for_any_state(handle, [expected_state], 
> timeout, client) common/impala_test_suite.py:1133: in wait_for_any_state 
> raise Timeout(timeout_msg) E   Timeout: query 
> 'c74765ed9d8472ed:6f2dde90' did not reach one of the expected states 
> [4], last known state 2
> {code}
> *Stacktrace*
> {code:java}
> metadata/test_ddl.py:1122: in test_ctas
> self.wait_for_state(handle, finished_state, wait_time, client=client)
> common/impala_test_suite.py:1115: in wait_for_state
> self.wait_for_any_state(handle, [expected_state], timeout, client)
> common/impala_test_suite.py:1133: in wait_for_any_state
> raise Timeout(timeout_msg)
> E   Timeout: query 'c74765ed9d8472ed:6f2dde90' did not reach one of 
> the expected states [4], last known state 2
> {code}




[jira] [Created] (IMPALA-12234) test_ctas could fail because CTAS did not reach the expected states

2023-06-22 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12234:


 Summary: test_ctas could fail because CTAS did not reach the 
expected states
 Key: IMPALA-12234
 URL: https://issues.apache.org/jira/browse/IMPALA-12234
 Project: IMPALA
  Issue Type: Bug
Affects Versions: Impala 4.2.0
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao


We found that test_ctas could fail because the CTAS query did not reach the 
expected state within 'wait_time', which is 20 seconds when the underlying 
file system is HDFS 
([https://github.com/apache/impala/blame/master/tests/metadata/test_ddl.py#L1120C7-L1120C16]).

Maybe we could slightly increase 'wait_time' to make this test less flaky.
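
A minimal Python sketch of the kind of timeout-bounded polling that produces the error below, assuming the query state can be read through a callable. The names wait_for_any_state and Timeout mirror the stack trace, but this is a hypothetical re-implementation, not the code in common/impala_test_suite.py; increasing 'wait_time' corresponds to passing a larger timeout.

```python
import time


class Timeout(Exception):
    pass


def wait_for_any_state(get_state, expected_states, timeout, interval=0.1):
    """Poll get_state() until it returns one of expected_states.

    Raises Timeout after `timeout` seconds, reporting the last state seen.
    """
    deadline = time.monotonic() + timeout
    state = get_state()
    while state not in expected_states:
        if time.monotonic() >= deadline:
            raise Timeout('did not reach one of the expected states %s, '
                          'last known state %s' % (expected_states, state))
        time.sleep(interval)
        state = get_state()
    return state


# Fake state sequence for the example: the query reaches state 4 on the third poll.
fake_states = iter([2, 2, 4])
print(wait_for_any_state(lambda: next(fake_states), [4], timeout=20, interval=0))
```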

*Error Message*
{code:java}
metadata/test_ddl.py:1122: in test_ctas self.wait_for_state(handle, 
finished_state, wait_time, client=client) common/impala_test_suite.py:1115: in 
wait_for_state self.wait_for_any_state(handle, [expected_state], timeout, 
client) common/impala_test_suite.py:1133: in wait_for_any_state raise 
Timeout(timeout_msg) E   Timeout: query 'c74765ed9d8472ed:6f2dde90' did 
not reach one of the expected states [4], last known state 2
{code}
*Stacktrace*
{code:java}
metadata/test_ddl.py:1122: in test_ctas
self.wait_for_state(handle, finished_state, wait_time, client=client)
common/impala_test_suite.py:1115: in wait_for_state
self.wait_for_any_state(handle, [expected_state], timeout, client)
common/impala_test_suite.py:1133: in wait_for_any_state
raise Timeout(timeout_msg)
E   Timeout: query 'c74765ed9d8472ed:6f2dde90' did not reach one of the 
expected states [4], last known state 2
{code}




[jira] [Updated] (IMPALA-12151) Formula used to estimate the cost of join could be improved

2023-05-18 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12151:
-
Description: 
We found that the formula used in 
[Planner#isInvertedJoinCheaper()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/Planner.java#L719-L724]
 to estimate the cost of a join (per node) can sometimes lead to a bad join 
order.

The issue can be reproduced with the following steps.
{code:java}
create database test_db;
create table test_db.larger_tbl (string_col string, bigint_col bigint, 
int_col_0 int, int_col_1 int) partitioned by (date_string_col string) stored as 
parquet;
create table test_db.smaller_tbl (bigint_col bigint) partitioned by 
(date_string_col string) stored as parquet;

insert into test_db.smaller_tbl partition (date_string_col='2023-05-05') values 
(1000);
insert into test_db.smaller_tbl partition (date_string_col='2023-05-05') values 
(1000);
insert into test_db.smaller_tbl partition (date_string_col='2023-05-05') values 
(1000);

insert into test_db.larger_tbl partition (date_string_col='2023-05-05') values 
('wa', 1000, 6, 1);

alter table test_db.smaller_tbl partition (date_string_col='2023-05-05')
 set tblproperties('numrows'='17000', 'stats_generated_via_stats_task'='true');
alter table test_db.larger_tbl partition (date_string_col='2023-05-05')
 set tblproperties('numrows'='2890', 
'stats_generated_via_stats_task'='true');

explain select
  distinct t0.`string_col`
from
  `test_db`.`larger_tbl` t0
  left outer join `test_db`.`smaller_tbl` t1 on (
t0.`date_string_col` = t1.`date_string_col`
and t0.`bigint_col` = t1.`bigint_col`
  )
where
t0.`date_string_col` in ('2023-05-05') and t0.`int_col_1` in (1)
order by 1 asc
limit 1000;
{code}
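
In this repro, every inserted row carries the same join-key values (date_string_col='2023-05-05', bigint_col=1000). A minimal Python sketch, assuming build-side rows are hash-partitioned across executors by join key, shows why that kind of skew sends every build row to a single executor. zlib.crc32 is only a deterministic stand-in hash, not Impala's hash function, and the input size is made up for illustration.

```python
from collections import Counter
import zlib


def partition_counts(rows, num_executors):
    """Count build-side rows per executor when hash-partitioning on the join key.

    zlib.crc32 is used only as a deterministic stand-in hash; it is not the
    hash function Impala uses.
    """
    counts = Counter()
    for date_string_col, bigint_col in rows:
        key = ('%s|%d' % (date_string_col, bigint_col)).encode('utf-8')
        counts[zlib.crc32(key) % num_executors] += 1
    return counts


# Hypothetical skewed input: every row shares the repro's join-key values.
skewed_rows = [('2023-05-05', 1000)] * 10000
print(partition_counts(skewed_rows, num_executors=3))  # a single non-empty partition
```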
 

The query plan shows that Impala will use the larger table ('larger_tbl') as 
the build side of the hash join node. When the larger table has data skew, it 
is possible that only a single executor builds the hash table, based on the 
only hash partition that contains data, which in turn could cause that 
executor node to run into memory issues.
{code:java}
+--+
| Explain String
+--+
| Max Per-Host Resource Reservation: Memory=110.03MB Threads=7
| Per-Host Resource Estimates: Memory=414MB
| WARNING: The following tables are missing relevant table and/or column statistics.
| test_db.larger_tbl, test_db.smaller_tbl
|
| PLAN-ROOT SINK
| |
| 09:MERGING-EXCHANGE [UNPARTITIONED]
| |  order by: t0.string_col ASC
| |  limit: 1001
| |
| 04:TOP-N [LIMIT=1001]
| |  order by: t0.string_col ASC
| |  row-size=12B cardinality=1.00K
| |
| 08:AGGREGATE [FINALIZE]
| |  group by: t0.string_col
| |  row-size=12B cardinality=2.89M
| |
| 07:EXCHANGE [HASH(t0.string_col)]
| |
| 03:AGGREGATE [STREAMING]
| |  group by: t0.string_col

[jira] [Updated] (IMPALA-12151) Formula used to estimate the cost of join could be improved

2023-05-18 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12151:
-
Description: 
We found that the formula used in 
[Planner#isInvertedJoinCheaper()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/Planner.java#L719-L724]
 to estimate the per-node cost of a join can sometimes lead to a bad join 
order.

The issue can be reproduced with the following steps.
{code:java}
create database test_db;
create table test_db.larger_tbl (string_col string, bigint_col bigint, 
int_col_0 int, int_col_1 int) partitioned by (date_string_col string) stored as 
parquet;
create table test_db.smaller_tbl (bigint_col bigint) partitioned by 
(date_string_col string) stored as parquet;

insert into test_db.smaller_tbl partition (date_string_col='2023-05-05') values 
(1000);
insert into test_db.smaller_tbl partition (date_string_col='2023-05-05') values 
(1000);
insert into test_db.smaller_tbl partition (date_string_col='2023-05-05') values 
(1000);

insert into test_db.larger_tbl partition (date_string_col='2023-05-05') values 
('wa', 1000, 6, 1);

alter table test_db.smaller_tbl partition (date_string_col='2023-05-05')
 set tblproperties('numrows'='17000', 'stats_generated_via_stats_task'='true');
alter table test_db.larger_tbl partition (date_string_col='2023-05-05')
 set tblproperties('numrows'='2890', 
'stats_generated_via_stats_task'='true');

explain select
  distinct t0.`string_col`
from
  `test_db`.`larger_tbl` t0
  left outer join `test_db`.`smaller_tbl` t1 on (
t0.`date_string_col` = t1.`date_string_col`
and t0.`bigint_col` = t1.`bigint_col`
  )
where
t0.`date_string_col` in ('2023-05-05') and t0.`int_col_1` in (1)
order by 1 asc
limit 1000;
{code}
 

The query plan shows that Impala uses the larger table ('larger_tbl') as the 
build side of the hash join. When the data in the larger table is skewed, a 
single executor may end up building the hash table for the only hash partition 
that contains data, which in turn could cause that executor to run into memory 
issues.
{code:java}
+--+
| Explain String
+--+
| Max Per-Host Resource Reservation: Memory=110.03MB Threads=7
| Per-Host Resource Estimates: Memory=414MB
| WARNING: The following tables are missing relevant table and/or column statistics.
| test_db.larger_tbl, test_db.smaller_tbl
|
| PLAN-ROOT SINK
| |
| 09:MERGING-EXCHANGE [UNPARTITIONED]
| |  order by: t0.string_col ASC
| |  limit: 1001
| |
| 04:TOP-N [LIMIT=1001]
| |  order by: t0.string_col ASC
| |  row-size=12B cardinality=1.00K
| |
| 08:AGGREGATE [FINALIZE]
| |  group by: t0.string_col
| |  row-size=12B cardinality=2.89M
| |
| 07:EXCHANGE [HASH(t0.string_col)]
| |
| 03:AGGREGATE [STREAMING]
| |  group by: t0.string_col

[jira] [Updated] (IMPALA-12151) Formula used to estimate the cost of join could be improved

2023-05-18 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12151:
-
Description: 
We found that the formula used in 
[Planner#isInvertedJoinCheaper()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/Planner.java#L719-L724]
 to estimate the per-node cost of a join can sometimes lead to a bad join 
order.

The issue can be reproduced with the following steps.
{code:java}
create database test_db;
create table test_db.larger_tbl (string_col string, bigint_col bigint, 
int_col_0 int, int_col_1 int) partitioned by (date_string_col string) stored as 
parquet;
create table test_db.smaller_tbl (bigint_col bigint) partitioned by 
(date_string_col string) stored as parquet;

insert into test_db.smaller_tbl partition (date_string_col='2023-05-05') values 
(1000);
insert into test_db.smaller_tbl partition (date_string_col='2023-05-05') values 
(1000);
insert into test_db.smaller_tbl partition (date_string_col='2023-05-05') values 
(1000);

insert into test_db.larger_tbl partition (date_string_col='2023-05-05') values 
('wa', 1000, 6, 1);

alter table test_db.smaller_tbl partition (date_string_col='2023-05-05')
 set tblproperties('numrows'='17000', 'stats_generated_via_stats_task'='true');
alter table test_db.larger_tbl partition (date_string_col='2023-05-05')
 set tblproperties('numrows'='2890', 
'stats_generated_via_stats_task'='true');

explain select
  distinct t0.`string_col`
from
  `test_db`.`larger_tbl` t0
  left outer join `test_db`.`smaller_tbl` t1 on (
t0.`date_string_col` = t1.`date_string_col`
and t0.`bigint_col` = t1.`bigint_col`
  )
where
t0.`date_string_col` in ('2023-05-05') and t0.`int_col_1` in (1)
order by 1 asc
limit 1000;
{code}
 

The query plan shows that Impala uses the larger table ('larger_tbl') as the 
build side of the hash join. When the data in the larger table is skewed, a 
single executor may end up building the hash table for the only hash partition 
that contains data, which in turn could cause that executor to run into memory 
issues.
{code:java}
+--+
| Explain String
+--+
| Max Per-Host Resource Reservation: Memory=110.03MB Threads=7
| Per-Host Resource Estimates: Memory=414MB
| WARNING: The following tables are missing relevant table and/or column statistics.
| test_db.larger_tbl, test_db.smaller_tbl
|
| PLAN-ROOT SINK
| |
| 09:MERGING-EXCHANGE [UNPARTITIONED]
| |  order by: t0.string_col ASC
| |  limit: 1001
| |
| 04:TOP-N [LIMIT=1001]
| |  order by: t0.string_col ASC
| |  row-size=12B cardinality=1.00K
| |
| 08:AGGREGATE [FINALIZE]
| |  group by: t0.string_col
| |  row-size=12B cardinality=2.89M
| |
| 07:EXCHANGE [HASH(t0.string_col)]
| |
| 03:AGGREGATE [STREAMING]
| |  group by: t0.string_col

[jira] [Created] (IMPALA-12151) Formula used to estimate the cost of join could be improved

2023-05-18 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12151:


 Summary: Formula used to estimate the cost of join could be 
improved
 Key: IMPALA-12151
 URL: https://issues.apache.org/jira/browse/IMPALA-12151
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 4.1.2
Reporter: Fang-Yu Rao


We found that the formula used in 
[Planner#|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/Planner.java#L719-L724]
 to estimate the per-node cost of a join can sometimes lead to a bad join 
order.

The issue can be reproduced with the following steps.
{code:java}
create database test_db;
create table test_db.larger_tbl (string_col string, bigint_col bigint, 
int_col_0 int, int_col_1 int) partitioned by (date_string_col string) stored as 
parquet;
create table test_db.smaller_tbl (bigint_col bigint) partitioned by 
(date_string_col string) stored as parquet;

insert into test_db.smaller_tbl partition (date_string_col='2023-05-05') values 
(1000);
insert into test_db.smaller_tbl partition (date_string_col='2023-05-05') values 
(1000);
insert into test_db.smaller_tbl partition (date_string_col='2023-05-05') values 
(1000);

insert into test_db.larger_tbl partition (date_string_col='2023-05-05') values 
('wa', 1000, 6, 1);

alter table test_db.smaller_tbl partition (date_string_col='2023-05-05')
 set tblproperties('numrows'='17000', 'stats_generated_via_stats_task'='true');
alter table test_db.larger_tbl partition (date_string_col='2023-05-05')
 set tblproperties('numrows'='2890', 
'stats_generated_via_stats_task'='true');

explain select
  distinct t0.`string_col`
from
  `test_db`.`larger_tbl` t0
  left outer join `test_db`.`smaller_tbl` t1 on (
t0.`date_string_col` = t1.`date_string_col`
and t0.`bigint_col` = t1.`bigint_col`
  )
where
t0.`date_string_col` in ('2023-05-05') and t0.`int_col_1` in (1)
order by 1 asc
limit 1000;
{code}
 

The query plan shows that Impala uses the larger table ('larger_tbl') as the 
build side of the hash join. When the data in the larger table is skewed, a 
single executor may end up building the hash table for the only hash partition 
that contains data, which in turn could cause that executor to run into memory 
issues.
{code:java}
+--+
| Explain String
+--+
| Max Per-Host Resource Reservation: Memory=110.03MB Threads=7
| Per-Host Resource Estimates: Memory=414MB
| WARNING: The following tables are missing relevant table and/or column statistics.
| test_db.larger_tbl, test_db.smaller_tbl
|
| PLAN-ROOT SINK
| |
| 09:MERGING-EXCHANGE [UNPARTITIONED]
| |  order by: t0.string_col ASC
| |  limit: 1001
| |
| 04:TOP-N [LIMIT=1001]
| |  order by: t0.string_col ASC
| |  row-size=12B cardinality=1.00K
| |
| 08:AGGREGATE [FINALIZE]
| |  group by: t0.string_col
| |  row-size=12B cardinality=2.89M
| |
| 07:EXCHANGE [HASH(t0.string_col)]
| |
| 03:AGGREGATE

[jira] [Resolved] (IMPALA-11686) test_corrupts_stats fails in exhaustive tests due to IMPALA-11666

2023-02-22 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-11686.
--
Resolution: Fixed

Resolving the JIRA since the fix has been merged.

> test_corrupts_stats fails in exhaustive tests due to IMPALA-11666
> -
>
> Key: IMPALA-11686
> URL: https://issues.apache.org/jira/browse/IMPALA-11686
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Fang-Yu Rao
>Priority: Major
>  Labels: broken-build
>
> IMPALA-11666 changed the warning message in the query plan when there are 
> potentially corrupt stats.
> test_corrupt_stats expects the old warning message.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-11666) Consider revising the warning message when hasCorruptTableStats_ is true for a table

2023-02-22 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-11666.
--
Resolution: Fixed

Resolving this JIRA since the patch has been merged.

> Consider revising the warning message when hasCorruptTableStats_ is true for 
> a table
> 
>
> Key: IMPALA-11666
> URL: https://issues.apache.org/jira/browse/IMPALA-11666
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> Currently, '{{hasCorruptTableStats_}}' of an HDFS table is set to true 
> when any of the following holds in 
> [HdfsScanNode.java|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java].
>  # Its '{{cardinality_}}' is less than -1.
>  # The number of rows in one of its partitions is less than -1.
>  # The number of rows in one of its partitions is 0 but the total size of 
> that partition's files is greater than 0.
>  # The number of rows in the table is 0 but the total size of the table's 
> files is greater than 0.
> For such a table, the {{EXPLAIN}} output for queries involving the table 
> contains the message "{{WARNING: The following tables have 
> potentially corrupt table statistics. Drop and re-compute statistics to 
> resolve this problem.}}"
> The warning message may be too alarming for an Impala user, especially 
> since Impala's frontend can set '{{hasCorruptTableStats_}}' to true for a 
> table whose statistics are not corrupt.
> Specifically, such a table (no corrupt statistics, but 
> '{{hasCorruptTableStats_}}' set to true) can be created as follows after 
> starting the Impala cluster.
>  # On the command line, execute "{{beeline -u 
> "jdbc:hive2://localhost:11050/default"}}" to enter beeline.
>  # In beeline, create a transactional table via "{{create table 
> test_db.test_tbl_01 (id int, name string) stored as orc tblproperties 
> ('transactional'='true')}}".
>  # In beeline, insert a row into the table just created via "{{insert into 
> table test_db.test_tbl_01 values (1, "Alex");}}".
>  # In beeline, delete the row just inserted via "{{delete from 
> test_db.test_tbl_01 where id = 1}}".
>  # In Impala shell, execute "{{compute stats test_db.test_tbl_01}}".
>  # In Impala shell, execute "{{explain select * from 
> test_db.test_tbl_01}}" to verify that the warning message described above 
> appears in the output.
> The table '{{test_tbl_01}}' above has 0 rows, but the total size of its 
> associated files is greater than 0.
> It may be better to revise the warning message to something less alarming, 
> as shown below.
> {code:java}
> The number of rows in the following tables or in a partition of them has 0 or 
> fewer than -1 row but positive total file size.
> This does not necessarily imply the existence of corrupt statistics.
> In the case of corrupt statistics, drop and re-compute statistics could 
> resolve this problem.
> {code}
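The four conditions above can be restated as a single predicate. This is a hypothetical sketch, not the code in `HdfsScanNode.java`: `CorruptStatsSketch`, `PartitionStats` and `hasPotentiallyCorruptStats` are invented names. It treats -1 as "unknown", which is why only values below -1 count as invalid.

```java
import java.util.List;

/** Hypothetical restatement of the four conditions; not Impala's implementation. */
public class CorruptStatsSketch {
  /** Row count (-1 means unknown) and total file size in bytes of one partition. */
  record PartitionStats(long numRows, long totalFileBytes) {}

  static boolean hasPotentiallyCorruptStats(
      long cardinality, long tableNumRows, long tableFileBytes,
      List<PartitionStats> partitions) {
    if (cardinality < -1) return true;                               // condition 1
    for (PartitionStats p : partitions) {
      if (p.numRows() < -1) return true;                             // condition 2
      if (p.numRows() == 0 && p.totalFileBytes() > 0) return true;   // condition 3
    }
    return tableNumRows == 0 && tableFileBytes > 0;                  // condition 4
  }

  public static void main(String[] args) {
    // Like the transactional-table example: 0 rows after the DELETE, but non-empty
    // files (1024 bytes is a made-up size). The table is unpartitioned.
    System.out.println(hasPotentiallyCorruptStats(0, 0, 1024, List.of()));
  }
}
```

For the transactional-table example above, condition 4 applies, so `main` prints `true` even though nothing about the statistics is actually corrupt.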






[jira] [Commented] (IMPALA-11934) TestBatchReadingFromRemote seems to be flaky in the Ozone build

2023-02-20 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691276#comment-17691276
 ] 

Fang-Yu Rao commented on IMPALA-11934:
--

Hi [~baggio000], I assigned this to you since you are more familiar with the 
failed test. Please reassign the JIRA as you see fit. Thanks!

> TestBatchReadingFromRemote seems to be flaky in the Ozone build
> ---
>
> Key: IMPALA-11934
> URL: https://issues.apache.org/jira/browse/IMPALA-11934
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Fang-Yu Rao
>Assignee: Yida Wu
>Priority: Major
>  Labels: broken-build
>
> We found that TestBatchReadingFromRemote failed in a run of the Ozone build 
> with the following output.
> Error Message
> {code}
> Value of: wait_times-- > 0   Actual: false Expected: true
> {code}
> Stacktrace
> {code}
> /data/jenkins/workspace/impala-asf-master-core-ozone/repos/Impala/be/src/runtime/tmp-file-mgr-test.cc:323
> Value of: wait_times-- > 0
>   Actual: false
> Expected: true
> {code}






[jira] [Created] (IMPALA-11934) TestBatchReadingFromRemote seems to be flaky in the Ozone build

2023-02-20 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-11934:


 Summary: TestBatchReadingFromRemote seems to be flaky in the Ozone 
build
 Key: IMPALA-11934
 URL: https://issues.apache.org/jira/browse/IMPALA-11934
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Fang-Yu Rao
Assignee: Yida Wu


We found that TestBatchReadingFromRemote failed in a run of the Ozone build 
with the following output.

Error Message
{code}
Value of: wait_times-- > 0   Actual: false Expected: true
{code}

Stacktrace
{code}
/data/jenkins/workspace/impala-asf-master-core-ozone/repos/Impala/be/src/runtime/tmp-file-mgr-test.cc:323
Value of: wait_times-- > 0
  Actual: false
Expected: true
{code}






[jira] [Updated] (IMPALA-11932) test_partition_key_scans_with_multiple_blocks_table failed when erasure coding is turned on

2023-02-19 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-11932:
-
Affects Version/s: Impala 4.3.0

> test_partition_key_scans_with_multiple_blocks_table failed when erasure 
> coding is turned on
> ---
>
> Key: IMPALA-11932
> URL: https://issues.apache.org/jira/browse/IMPALA-11932
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.3.0
>Reporter: Fang-Yu Rao
>Assignee: Quanlong Huang
>Priority: Major
>  Labels: broken-build
>
> We found that test_partition_key_scans_with_multiple_blocks_table failed when 
> ERASURE_CODING is true. This test was added in IMPALA-11081 
> (https://gerrit.cloudera.org/c/19471/17/tests/query_test/test_queries.py#366).






[jira] [Commented] (IMPALA-11932) test_partition_key_scans_with_multiple_blocks_table failed when erasure coding is turned on

2023-02-19 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17690924#comment-17690924
 ] 

Fang-Yu Rao commented on IMPALA-11932:
--

Hi [~stigahuang], I assigned this JIRA to you since you helped review the 
patch that added the failed test. Please reassign the JIRA as you see fit. 
Thanks!


> test_partition_key_scans_with_multiple_blocks_table failed when erasure 
> coding is turned on
> ---
>
> Key: IMPALA-11932
> URL: https://issues.apache.org/jira/browse/IMPALA-11932
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.3.0
>Reporter: Fang-Yu Rao
>Assignee: Quanlong Huang
>Priority: Major
>  Labels: broken-build
>
> We found that test_partition_key_scans_with_multiple_blocks_table failed when 
> ERASURE_CODING is true. This test was added in IMPALA-11081 
> (https://gerrit.cloudera.org/c/19471/17/tests/query_test/test_queries.py#366).






[jira] [Created] (IMPALA-11932) test_partition_key_scans_with_multiple_blocks_table failed when erasure coding is turned on

2023-02-19 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-11932:


 Summary: test_partition_key_scans_with_multiple_blocks_table 
failed when erasure coding is turned on
 Key: IMPALA-11932
 URL: https://issues.apache.org/jira/browse/IMPALA-11932
 Project: IMPALA
  Issue Type: Bug
Reporter: Fang-Yu Rao
Assignee: Quanlong Huang


We found that test_partition_key_scans_with_multiple_blocks_table failed when 
ERASURE_CODING is true. This test was added in IMPALA-11081 
(https://gerrit.cloudera.org/c/19471/17/tests/query_test/test_queries.py#366).






[jira] [Updated] (IMPALA-11921) test_large_sql seems to be flaky

2023-02-13 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-11921:
-
Description: 
We observed the following failure in an ASAN run.
{code}
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/shell/test_shell_commandline.py:1026: in test_large_sql
    assert actual_time_s <= time_limit_s, (
E   AssertionError: It took 21.0015001297 seconds to execute the query. Time limit is 20 seconds.
E   assert 21.001500129699707 <= 20
{code}

We had not seen this failure since IMPALA-7428.


  was:
We observed the following failure in an ASAN run.
{noformat}
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/shell/test_shell_commandline.py:1026: in test_large_sql
    assert actual_time_s <= time_limit_s, (
E   AssertionError: It took 21.0015001297 seconds to execute the query. Time limit is 20 seconds.
E   assert 21.001500129699707 <= 20
{noformat}

We had not seen this failure since IMPALA-7428.



> test_large_sql seems to be flaky
> 
>
> Key: IMPALA-11921
> URL: https://issues.apache.org/jira/browse/IMPALA-11921
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>  Labels: broken-build
>
> We observed the following failure in an ASAN run.
> {code}
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/shell/test_shell_commandline.py:1026: in test_large_sql
>     assert actual_time_s <= time_limit_s, (
> E   AssertionError: It took 21.0015001297 seconds to execute the query. Time limit is 20 seconds.
> E   assert 21.001500129699707 <= 20
> {code}
> We had not seen this failure since IMPALA-7428.






[jira] [Created] (IMPALA-11921) test_large_sql seems to be flaky

2023-02-13 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-11921:


 Summary: test_large_sql seems to be flaky
 Key: IMPALA-11921
 URL: https://issues.apache.org/jira/browse/IMPALA-11921
 Project: IMPALA
  Issue Type: Bug
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao


We observed the following failure in an ASAN run.
{noformat}
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/shell/test_shell_commandline.py:1026: in test_large_sql
    assert actual_time_s <= time_limit_s, (
E   AssertionError: It took 21.0015001297 seconds to execute the query. Time limit is 20 seconds.
E   assert 21.001500129699707 <= 20
{noformat}

We had not seen this failure since IMPALA-7428.







[jira] [Updated] (IMPALA-11918) Fix test_java_udfs_from_impala after IMPALA-11745

2023-02-13 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-11918:
-
Labels: broken-build  (was: )

> Fix test_java_udfs_from_impala after IMPALA-11745
> -
>
> Key: IMPALA-11918
> URL: https://issues.apache.org/jira/browse/IMPALA-11918
> Project: IMPALA
>  Issue Type: Bug
>  Components: fe
>Affects Versions: Impala 4.3.0
>Reporter: Peter Rozsa
>Assignee: Peter Rozsa
>Priority: Major
>  Labels: broken-build
>
> IMPALA-11745 changed the error message for failed method extraction from 
> Hive UDFs, but an exhaustive test case was not updated accordingly, causing 
> failures in exhaustive builds.






[jira] [Created] (IMPALA-11871) INSERT statement does not respect Ranger policies for HDFS

2023-01-29 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-11871:


 Summary: INSERT statement does not respect Ranger policies for HDFS
 Key: IMPALA-11871
 URL: https://issues.apache.org/jira/browse/IMPALA-11871
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao


In a cluster with Ranger auth (and with legacy catalog mode), even if you 
provide RWX to cm_hdfs -> all-path for the user impala, inserting into a table 
whose HDFS POSIX permissions happen to exclude impala access will result in an
{noformat}
"AnalysisException: Unable to INSERT into target table (default.t1) because 
Impala does not have WRITE access to HDFS location: 
hdfs://nightly-71x-vx-2.nightly-71x-vx.root.hwx.site:8020/warehouse/tablespace/external/hive/t1"{noformat}
 
{noformat}
[root@nightly-71x-vx-3 ~]# hdfs dfs -getfacl 
/warehouse/tablespace/external/hive/t1

file: /warehouse/tablespace/external/hive/t1 
owner: hive 
group: supergroup
user::rwx
user:impala:rwx #effective:r-x
group::rwx #effective:r-x
mask::r-x
other::---
default:user::rwx
default:user:impala:rwx
default:group::rwx
default:mask::rwx
default:other::--- {noformat}
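The `#effective` annotations in the ACL listing above follow from POSIX ACL semantics. Named-user entries, named-group entries and the owning-group entry are ANDed with the `mask` entry; the owner and `other` entries are not. As a result, `mask::r-x` reduces impala's `rwx` entry to `r-x`, so there is no write permission at the POSIX level. The small sketch below computes this; the class and method names are illustrative.

```java
/** Sketch of the POSIX ACL effective-permission rule for masked entries. */
public class AclEffectiveSketch {
  /** Parses a 3-character permission string such as "rwx" or "r-x" into bits (r=4, w=2, x=1). */
  static int parse(String perm) {
    if (perm.length() != 3) throw new IllegalArgumentException("expected 3 chars: " + perm);
    int bits = 0;
    if (perm.charAt(0) == 'r') bits |= 4;
    if (perm.charAt(1) == 'w') bits |= 2;
    if (perm.charAt(2) == 'x') bits |= 1;
    return bits;
  }

  /** Formats permission bits back into the "rwx" notation used by getfacl. */
  static String format(int bits) {
    return ((bits & 4) != 0 ? "r" : "-")
        + ((bits & 2) != 0 ? "w" : "-")
        + ((bits & 1) != 0 ? "x" : "-");
  }

  /** Effective permission of a named user/group or owning-group entry: entry AND mask. */
  static String effective(String entry, String mask) {
    return format(parse(entry) & parse(mask));
  }

  public static void main(String[] args) {
    // From the getfacl output: user:impala:rwx together with mask::r-x
    System.out.println(effective("rwx", "r-x")); // r-x
  }
}
```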
~~

ANALYSIS

Stack trace from a version of Cloudera's distribution of Impala (impalad 
version 3.4.0-SNAPSHOT RELEASE (build 
{*}db20b59a093c17ea4699117155d58fe874f7d68f{*})):
{noformat}
at 
org.apache.impala.catalog.FeFsTable$Utils.checkWriteAccess(FeFsTable.java:585)
at org.apache.impala.analysis.InsertStmt.analyzeWriteAccess(InsertStmt.java:545)
at org.apache.impala.analysis.InsertStmt.analyze(InsertStmt.java:391)
at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:463)
at 
org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:426)
at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1570)
at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1536)
at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1506)
at 
org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:155){noformat}
The exception occurs at analysis time, so I separately tested writing directly 
into that directory, which succeeded:
{noformat}
[root@nightly-71x-vx-3 ~]# hdfs dfs -touchz 
/warehouse/tablespace/external/hive/t1/test
[root@nightly-71x-vx-3 ~]# hdfs dfs -ls /warehouse/tablespace/external/hive/t1/
Found 8 items
-rw-rw---+ 3 hive supergroup 417 2023-01-27 17:37 /warehouse/tablespace/external/hive/t1/00_0
-rw-rw---+ 3 hive supergroup 417 2023-01-27 17:44 /warehouse/tablespace/external/hive/t1/00_0_copy_1
-rw-rw---+ 3 hive supergroup 417 2023-01-27 17:49 /warehouse/tablespace/external/hive/t1/00_0_copy_2
-rw-rw---+ 3 hive supergroup 417 2023-01-27 17:53 /warehouse/tablespace/external/hive/t1/00_0_copy_3
-rw-rw---+ 3 impala hive 355 2023-01-27 17:17 /warehouse/tablespace/external/hive/t1/4c4477c12c51ad96-3126b52d_2029811630_data.0.parq
-rw-rw---+ 3 impala hive 355 2023-01-27 17:39 /warehouse/tablespace/external/hive/t1/9945b25bb37d1ff2-473c1478_574471191_data.0.parq
drwxrwx---+ - impala hive 0 2023-01-27 17:39 /warehouse/tablespace/external/hive/t1/_impala_insert_staging
-rw-rw---+ 3 impala supergroup 0 2023-01-27 18:01 /warehouse/tablespace/external/hive/t1/test{noformat}
Reviewing the code[1], I traced the {{TAccessLevel}} back to catalogd. If I 
add the user impala to the group supergroup on the catalogd host, the query 
gets past this write-access check.

Additionally, this query does not trip up during analysis when catalog v2 is 
enabled because the method {{getFirstLocationWithoutWriteAccess()}} is not 
implemented there yet and always returns null[2].

[1] 
[https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L494-L504]

[2] 
[https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java#L295-L298]

~~

Ideally, when Ranger authorization is in place, we should:
1) Not check access level during analysis
2) Incorporate Ranger ACLs during analysis






[jira] [Updated] (IMPALA-11728) Set fallback database for functions

2023-01-20 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-11728:
-
Fix Version/s: Impala 4.3.0

> Set fallback database for functions
> ---
>
> Key: IMPALA-11728
> URL: https://issues.apache.org/jira/browse/IMPALA-11728
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: gaoxiaoqing
>Assignee: gaoxiaoqing
>Priority: Major
> Fix For: Impala 4.3.0
>
>
> {code:java}
> CREATE FUNCTION default.function_name([arg_type[, arg_type...]])
>   RETURNS return_type
>   LOCATION 'hdfs_path_to_dot_so'
>   SYMBOL='symbol_name' {code}
>  
> {noformat}
> use functional;
> select function_name();
> ERROR: AnalysisException: functional.function_name() unknown for database 
> functional.{noformat}
>  
> A function created this way can only be called without a database 
> qualifier when {{default}} is the current database.
> Add a query option that specifies a fallback database for functions, so that 
> such functions work in every database without changing the query. 
> {noformat}
> use functional;
> set db_name_with_global_udf=default;
> select function_name(); // It works.{noformat}
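The proposed resolution order can be sketched as follows. The function is looked up in the current database first. Only if it is not found there, and the `db_name_with_global_udf` option is set, does the lookup retry in that fallback database. `FunctionResolverSketch` and its in-memory catalog are hypothetical; the sketch illustrates the idea in the description, not the merged implementation.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of function lookup with a fallback database. */
public class FunctionResolverSketch {
  /** Hypothetical catalog: database name -> names of functions defined in it. */
  private final Map<String, Set<String>> functionsByDb;

  FunctionResolverSketch(Map<String, Set<String>> functionsByDb) {
    this.functionsByDb = functionsByDb;
  }

  /** Returns the database the function resolves to, or empty if it is unknown. */
  Optional<String> resolve(String fnName, String currentDb, String fallbackDb) {
    if (functionsByDb.getOrDefault(currentDb, Set.of()).contains(fnName)) {
      return Optional.of(currentDb);
    }
    // Only consult the fallback database when the option is set.
    if (fallbackDb != null && !fallbackDb.isEmpty()
        && functionsByDb.getOrDefault(fallbackDb, Set.of()).contains(fnName)) {
      return Optional.of(fallbackDb);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    FunctionResolverSketch r = new FunctionResolverSketch(
        Map.of("default", Set.of("function_name"), "functional", Set.of()));
    System.out.println(r.resolve("function_name", "functional", null));      // Optional.empty
    System.out.println(r.resolve("function_name", "functional", "default")); // Optional[default]
  }
}
```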






[jira] [Resolved] (IMPALA-10986) Specific privilege should be required to execute a UDF in Impala

2023-01-20 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-10986.
--
Fix Version/s: Impala 4.3.0
   Resolution: Fixed

Resolving this JIRA since the fix has been merged.

> Specific privilege should be required to execute a UDF in Impala
> 
>
> Key: IMPALA-10986
> URL: https://issues.apache.org/jira/browse/IMPALA-10986
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.0.0
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
> Fix For: Impala 4.3.0
>
> Attachments: ranger_policy_for_udfs_impala.png
>
>
> We found that currently in Impala, to execute a UDF, a user only has to be 
> granted one of the three privileges {{INSERT}}, {{SELECT}}, or 
> {{REFRESH}} on the database where the UDF was created (i.e., the 
> {{VIEW_METADATA}} privilege on the database). No additional privilege on the 
> UDF is required. An example of a policy added via Ranger's web UI that allows 
> a user to execute a UDF is also provided here.
> !ranger_policy_for_udfs_impala.png!
> The privilege request of {{VIEW_METADATA}} on the database is registered 
> within [analyzer.getDb(fnName_.getDb(), Privilege.VIEW_METADATA, 
> true)|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java#L557].
>  This is the reason why the user has to be granted the VIEW_METADATA 
> privilege on the database to be able to execute the UDF.
> Recall that the registration of the privilege mentioned above occurs in 
> [FunctionCallExpr#analyzeImpl()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java#L531]
>  where Impala's frontend analyzes the given function in a query.
> I noticed that in the same method, at 
> [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java#L535],
>  Impala is able to determine whether the current function is a UDF or not. 
> Thus, to fix the problem, it seems we need to register an additional 
> privilege request for a UDF (vs. a built-in function) on top of the 
> {{VIEW_METADATA}} privilege on the database.
> We should thus provide a fix for the issue.
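The proposed fix can be pictured with a small model: analysis of a function call registers privilege requests, and the call is authorized only if all of them are granted. The sketch below is hypothetical and simplified; `Privilege`, `PrivilegeRequest`, `UdfPrivilegeSketch`, and `isUdf` are illustrative names, not Impala's classes, and modeling the extra UDF request as `SELECT` on the function itself is an assumption made only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical, simplified model of privilege registration during analysis of a
// function call. None of these types are Impala's real classes.
enum Privilege { VIEW_METADATA, SELECT }

final class PrivilegeRequest {
  final Privilege privilege;
  final String resource; // "db" or "db.fn" in this sketch

  PrivilegeRequest(Privilege privilege, String resource) {
    this.privilege = privilege;
    this.resource = resource;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof PrivilegeRequest)) return false;
    PrivilegeRequest other = (PrivilegeRequest) o;
    return privilege == other.privilege && resource.equals(other.resource);
  }

  @Override
  public int hashCode() { return Objects.hash(privilege, resource); }
}

final class UdfPrivilegeSketch {
  // Returns the requests registered while analyzing a call to db.fnName.
  static List<PrivilegeRequest> requestsFor(String db, String fnName, boolean isUdf) {
    List<PrivilegeRequest> requests = new ArrayList<>();
    // Behavior described in the issue: VIEW_METADATA on the database.
    requests.add(new PrivilegeRequest(Privilege.VIEW_METADATA, db));
    if (isUdf) {
      // Proposed extra request for UDFs only. Modeling it as SELECT on the
      // function itself is an assumption made for this sketch.
      requests.add(new PrivilegeRequest(Privilege.SELECT, db + "." + fnName));
    }
    return requests;
  }

  // The call is authorized only if every registered request is granted.
  static boolean isAuthorized(List<PrivilegeRequest> requests, Set<PrivilegeRequest> grants) {
    return grants.containsAll(requests);
  }
}
```

In this model, a user granted only `VIEW_METADATA` on `db1` can still call a built-in function, but a call to a UDF in `db1` fails until the function-level request is granted as well.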






[jira] [Resolved] (IMPALA-11728) Set fallback database for functions

2023-01-20 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-11728.
--
Resolution: Fixed

Resolving this JIRA since the fix has been merged.

> Set fallback database for functions
> ---
>
> Key: IMPALA-11728
> URL: https://issues.apache.org/jira/browse/IMPALA-11728
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: gaoxiaoqing
>Assignee: gaoxiaoqing
>Priority: Major
>
> {code:java}
> CREATE FUNCTION default.function_name([arg_type[, arg_type...]])
>   RETURNS return_type
>   LOCATION 'hdfs_path_to_dot_so'
>   SYMBOL='symbol_name' {code}
>  
> {noformat}
> use functional;
> select function_name();
> ERROR: AnalysisException: functional.function_name() unknown for database 
> functional.{noformat}
>  
> Called without a database prefix, a function created this way is only found 
> when {{default}} is the current database. Add a query option that sets a 
> fallback database for functions, so that the function can be called from any 
> database without changing the query. 
> {noformat}
> use functional;
> set db_name_with_global_udf=default;
> select function_name(); // It works.{noformat}
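The fallback lookup requested above can be sketched as a two-step resolution of an unqualified function name. This is a hypothetical illustration: `FallbackFunctionResolver`, the in-memory catalog map, and the lookup order (current database first, then the fallback database) are assumptions, built-in functions are ignored, and `fallbackDb` merely stands in for the proposed `db_name_with_global_udf` option.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of resolving an unqualified function name with a fallback
// database. The catalog map and method names are illustrative, not Impala's API.
final class FallbackFunctionResolver {
  private final Map<String, Set<String>> functionsByDb;

  FallbackFunctionResolver(Map<String, Set<String>> functionsByDb) {
    this.functionsByDb = functionsByDb;
  }

  private boolean exists(String db, String fn) {
    return functionsByDb.getOrDefault(db, Set.of()).contains(fn);
  }

  // Returns the database the function resolves to, or empty if it is unknown.
  // fallbackDb stands in for the proposed query option; null means "not set".
  Optional<String> resolve(String currentDb, String fn, String fallbackDb) {
    if (exists(currentDb, fn)) return Optional.of(currentDb);
    if (fallbackDb != null && exists(fallbackDb, fn)) return Optional.of(fallbackDb);
    return Optional.empty();
  }
}
```

Checking the current database first means a same-named function in the current database shadows the one in the fallback database; that precedence is a design choice of this sketch, not something the issue specifies.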






[jira] [Assigned] (IMPALA-8576) Pass lineage object instead of string to query hook

2023-01-13 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao reassigned IMPALA-8576:
---

Assignee: Fang-Yu Rao

> Pass lineage object instead of string to query hook
> ---
>
> Key: IMPALA-8576
> URL: https://issues.apache.org/jira/browse/IMPALA-8576
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend, Frontend
>Reporter: radford nguyen
>Assignee: Fang-Yu Rao
>Priority: Major
>
> The {{QueryEventHook}} interface currently takes a {{String}} for the 
> {{onQueryComplete}} hook.  This string is the JSON representation of the 
> lineage graph written to the legacy lineage file.
> It would be better to pass the serialized {{byte[]}} of the lineage thrift 
> object itself, so that we can decouple ourselves from any lineage file 
> format(s).
> Additionally, hook implementations should use their own version of Thrift to 
> deserialize the object so that they are not tied to Impala's Thrift version.
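The decoupling described above amounts to a hook that receives opaque serialized bytes and owns its own deserializer. The sketch below is hypothetical: `LineageBytesHook` and `DecodingHook` are illustrative names, not Impala's `QueryEventHook` API, and a plain `Function<byte[], T>` stands in for a hook-side Thrift deserializer so the example runs without any Thrift dependency.

```java
import java.util.function.Function;

// Hypothetical shape of a hook that receives serialized lineage bytes instead of
// a JSON string. Names are illustrative; this is not Impala's QueryEventHook API.
interface LineageBytesHook {
  void onQueryComplete(byte[] serializedLineage);
}

// The hook supplies its own deserializer, so it is not tied to the producer's
// Thrift version. A real hook would plug in its own Thrift-generated class here.
final class DecodingHook<T> implements LineageBytesHook {
  private final Function<byte[], T> deserializer;
  private T lastLineage;

  DecodingHook(Function<byte[], T> deserializer) {
    this.deserializer = deserializer;
  }

  @Override
  public void onQueryComplete(byte[] serializedLineage) {
    // Decoding happens entirely on the hook side.
    lastLineage = deserializer.apply(serializedLineage);
  }

  T lastLineage() { return lastLineage; }
}
```

Because the producer only hands over bytes, it never needs to know which lineage file format or Thrift version the hook uses.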





