[jira] [Commented] (IMPALA-13009) Possible leak of partition updates when the table has failed DDL and recovered by INVALIDATE METADATA
[ https://issues.apache.org/jira/browse/IMPALA-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838405#comment-17838405 ] Fang-Yu Rao commented on IMPALA-13009: -- Thanks for the detailed steps to reproduce the issue [~stigahuang]! I have tried your latest script at https://issues.apache.org/jira/browse/IMPALA-13009?focusedCommentId=17838211=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17838211 and found that I could also reproduce the issue after restarting only the Impala daemons (via "{*}bin/start-impala-cluster.py -r{*}") even though we don't have the command that removes the HDFS path from outside of Impala. I was using Apache Impala on a recent master where the tip commit is IMPALA-12996 (Add support for DATE in Iceberg metadata tables). {code:java} I0417 16:06:57.716398 16131 ImpaladCatalog.java:232] Adding: TABLE:default.my_part version: 1723 size: 1557 I0417 16:06:57.719789 16131 ImpaladCatalog.java:232] Adding: CATALOG_SERVICE_ID version: 1723 size: 60 I0417 16:06:57.720358 16131 ImpaladCatalog.java:257] Adding 9 partition(s): HDFS_PARTITION:default.my_part:(p=1,p=2,...,p=9), versions=[1706, 1712, 1718], size=(avg=588, min=588, max=588, sum=5292) E0417 16:06:57.917488 16131 ImpaladCatalog.java:264] Error adding catalog object: Received stale partition in a statestore update: THdfsPartition(partitionKeyExprs:[TExpr(nodes:[TExprNode(node_type:INT_LITERAL, type:TColumnType(types:[TTypeNode(type:SCALAR, scalar_type:TScalarType(type:INT))]), num_children:0, is_constant:true, int_literal:TIntLiteral(value:1), is_codegen_disabled:false)])], location:THdfsPartitionLocation(prefix_index:0, suffix:p=1), id:0, file_desc:[THdfsFileDesc(file_desc_data:18 00 00 00 00 00 00 00 00 00 0E 00 1C 00 18 00 10 00 00 00 08 00 04 00 0E 00 00 00 18 00 00 00 A9 E7 4F EE 8E 01 00 00 02 00 00 00 00 00 00 00 0C 00 00 00 01 00 00 00 4C 00 00 00 37 00 00 00 61 61 34 36 34 66 61 66 35 61 31 37 36 65 39 65 2D 36 63 66 31 63 38 34 
61 30 30 30 30 30 30 30 30 5F 31 37 31 31 36 38 30 30 38 32 5F 64 61 74 61 2E 30 2E 74 78 74 00 0C 00 14 00 00 00 0C 00...)], access_level:READ_WRITE, stats:TTableStats(num_rows:-1), is_marked_cached:false, hms_parameters:{transient_lastDdlTime=1713395198, totalSize=2, numFilesErasureCoded=0, numFiles=1}, num_blocks:1, total_file_size_bytes:2, has_incremental_stats:false, write_id:0, db_name:default, tbl_name:my_part, partition_name:p=1, hdfs_storage_descriptor:THdfsStorageDescriptor(lineDelim:10, fieldDelim:1, collectionDelim:1, mapKeyDelim:1, escapeChar:0, quoteChar:1, fileFormat:TEXT, blockSize:0)) Java exception follows: java.lang.IllegalStateException: Received stale partition in a statestore update: THdfsPartition(partitionKeyExprs:[TExpr(nodes:[TExprNode(node_type:INT_LITERAL, type:TColumnType(types:[TTypeNode(type:SCALAR, scalar_type:TScalarType(type:INT))]), num_children:0, is_constant:true, int_literal:TIntLiteral(value:1), is_codegen_disabled:false)])], location:THdfsPartitionLocation(prefix_index:0, suffix:p=1), id:0, file_desc:[THdfsFileDesc(file_desc_data:18 00 00 00 00 00 00 00 00 00 0E 00 1C 00 18 00 10 00 00 00 08 00 04 00 0E 00 00 00 18 00 00 00 A9 E7 4F EE 8E 01 00 00 02 00 00 00 00 00 00 00 0C 00 00 00 01 00 00 00 4C 00 00 00 37 00 00 00 61 61 34 36 34 66 61 66 35 61 31 37 36 65 39 65 2D 36 63 66 31 63 38 34 61 30 30 30 30 30 30 30 30 5F 31 37 31 31 36 38 30 30 38 32 5F 64 61 74 61 2E 30 2E 74 78 74 00 0C 00 14 00 00 00 0C 00...)], access_level:READ_WRITE, stats:TTableStats(num_rows:-1), is_marked_cached:false, hms_parameters:{transient_lastDdlTime=1713395198, totalSize=2, numFilesErasureCoded=0, numFiles=1}, num_blocks:1, total_file_size_bytes:2, has_incremental_stats:false, write_id:0, db_name:default, tbl_name:my_part, partition_name:p=1, hdfs_storage_descriptor:THdfsStorageDescriptor(lineDelim:10, fieldDelim:1, collectionDelim:1, mapKeyDelim:1, escapeChar:0, quoteChar:1, fileFormat:TEXT, blockSize:0)) at 
com.google.common.base.Preconditions.checkState(Preconditions.java:512) at org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:523) at org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334) at org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262) at org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:120) at org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:565) at org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196) {code} > Possible leak of partition updates when the table has failed DDL and > recovered by INVALIDATE METADATA > - > > Key: IMPALA-13009 > URL:
[jira] [Created] (IMPALA-12994) Revise the implementation of FsPermissionChecker to take Ranger policies into consideration
Fang-Yu Rao created IMPALA-12994: Summary: Revise the implementation of FsPermissionChecker to take Ranger policies into consideration Key: IMPALA-12994 URL: https://issues.apache.org/jira/browse/IMPALA-12994 Project: IMPALA Issue Type: Task Components: Frontend Reporter: Fang-Yu Rao Assignee: Fang-Yu Rao Impala's current implementation of [FsPermissionChecker|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/util/FsPermissionChecker.java] does not take into consideration the Ranger policies on HDFS or the underlying file system, which could result in unwanted AnalysisExceptions during the query analysis phase, as reported in IMPALA-11871 and IMPALA-12291. We should consider revising FsPermissionChecker to take the Ranger policies on the storage layer into account as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-12985) Use the new constructor when instantiating RangerAccessRequestImpl
[ https://issues.apache.org/jira/browse/IMPALA-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12985: - Description: After RANGER-2763, the signature of the constructor of the class RangerAccessRequestImpl changed by adding an additional input argument 'userRoles', as shown in the following.
{code:java}
public RangerAccessRequestImpl(RangerAccessResource resource, String accessType, String user, Set<String> userGroups, Set<String> userRoles) {
...
{code}
The new signature is also provided in CDP Ranger. Thus, to unblock IMPALA-12921, or to be able to build Apache Impala with a locally built Apache Ranger, it may be faster to switch to the new signature on the Impala side than to wait for RANGER-4770 to be resolved on the Ranger side.
was: After RANGER-2763, the signature of the constructor of the class RangerAccessRequestImpl changed by adding an additional input argument 'userRoles', as shown in the following.
{code:java}
public RangerAccessRequestImpl(RangerAccessResource resource, String accessType, String user, Set<String> userGroups, Set<String> userRoles) {
...
{code}
The new signature is also provided in CDP Ranger. Thus, to unblock IMPALA-12921, or to be able to build Apache Impala with Apache Ranger, it may be faster to switch to the new signature on the Impala side.
> Use the new constructor when instantiating RangerAccessRequestImpl > -- > > Key: IMPALA-12985 > URL: https://issues.apache.org/jira/browse/IMPALA-12985 > Project: IMPALA > Issue Type: Task > Components: Frontend >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > After RANGER-2763, the signature of the constructor of the class > RangerAccessRequestImpl changed by adding an additional input argument 'userRoles', > as shown in the following. > {code:java} > public RangerAccessRequestImpl(RangerAccessResource resource, String > accessType, String user, Set<String> userGroups, Set<String> userRoles) { > ... > {code} > The new signature is also provided in CDP Ranger. 
Thus, to unblock > IMPALA-12921, or to be able to build Apache Impala with a locally built Apache > Ranger, it may be faster to switch to the new signature on the Impala side > than to wait for RANGER-4770 to be resolved on the Ranger side.
[jira] [Created] (IMPALA-12985) Use the new constructor when instantiating RangerAccessRequestImpl
Fang-Yu Rao created IMPALA-12985: Summary: Use the new constructor when instantiating RangerAccessRequestImpl Key: IMPALA-12985 URL: https://issues.apache.org/jira/browse/IMPALA-12985 Project: IMPALA Issue Type: Task Components: Frontend Reporter: Fang-Yu Rao Assignee: Fang-Yu Rao After RANGER-2763, the signature of the constructor of the class RangerAccessRequestImpl changed by adding an additional input argument 'userRoles', as shown in the following.
{code:java}
public RangerAccessRequestImpl(RangerAccessResource resource, String accessType, String user, Set<String> userGroups, Set<String> userRoles) {
...
{code}
The new signature is also provided in CDP Ranger. Thus, to unblock IMPALA-12921, or to be able to build Apache Impala with Apache Ranger, it may be faster to switch to the new signature on the Impala side.
[jira] [Updated] (IMPALA-12921) Consider adding support for locally built Ranger
[ https://issues.apache.org/jira/browse/IMPALA-12921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12921: - Description: It would be nice to be able to support locally built Ranger in Impala's minicluster in that it would facilitate the testing of features that require changes to both components. *+Edit:+* Making the current Apache Impala on *master* (tip is {*}IMPALA-12925{*}: Fix decimal data type for external JDBC table) support Ranger on *master* (tip is {*}RANGER-4745{*}: Enhance handling of subAccess authorization in Ranger HDFS plugin) may be too ambitious. The signatures of some classes are already incompatible. For instance, on the Impala side, Impala instantiates *RangerAccessRequestImpl* via the following code, which takes 4 input arguments.
{code:java}
RangerAccessRequest req = new RangerAccessRequestImpl(resource, SELECT_ACCESS_TYPE, user.getShortName(), getUserGroups(user));
{code}
However, the current signature of RangerAccessRequestImpl's constructor on the master of Apache Ranger is the following. It can be seen that 5 input arguments are needed instead.
{code:java}
public RangerAccessRequestImpl(RangerAccessResource resource, String accessType, String user, Set<String> userGroups, Set<String> userRoles)
{code}
It may be more practical to support Ranger on an earlier version, e.g., [https://github.com/apache/ranger/blob/release-ranger-2.4.0]. was:It would be nice to be able to support locally built Ranger in Impala's minicluster in that it would facilitate the testing of features that require changes to both components. 
> Consider adding support for locally built Ranger > > > Key: IMPALA-12921 > URL: https://issues.apache.org/jira/browse/IMPALA-12921 > Project: IMPALA > Issue Type: Task >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > It would be nice to be able to support locally built Ranger in Impala's > minicluster in that it would facilitate the testing of features that require > changes to both components. > *+Edit:+* > Making the current Apache Impala on *master* (tip is > {*}IMPALA-12925{*}: Fix decimal data type for external JDBC table) support > Ranger on *master* (tip is > {*}RANGER-4745{*}: Enhance handling of subAccess authorization in Ranger HDFS > plugin) may be too ambitious. > The signatures of some classes are already incompatible. For instance, on the > Impala side, Impala instantiates *RangerAccessRequestImpl* > via the following code, which takes 4 input arguments. > {code:java} > RangerAccessRequest req = new RangerAccessRequestImpl(resource, > SELECT_ACCESS_TYPE, user.getShortName(), getUserGroups(user)); > {code} > However, the current signature of RangerAccessRequestImpl's constructor on > the master of Apache Ranger is the following. It can be seen that 5 input > arguments are needed instead. > {code:java} > public RangerAccessRequestImpl(RangerAccessResource resource, String > accessType, String user, Set<String> userGroups, Set<String> userRoles) > {code} > It may be more practical to support Ranger on an earlier version, e.g., > [https://github.com/apache/ranger/blob/release-ranger-2.4.0].
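For illustration only, the following is a minimal, self-contained Java sketch of what adapting the Impala call site to the five-argument constructor could look like. The classes below are hypothetical stand-ins, not the real `org.apache.ranger.plugin.policyengine` types, and the idea of passing an empty role set is an assumption for callers that do not track roles:

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical stand-in for the Ranger resource type (the real class has
// many more members); used only to make this sketch compile on its own.
class RangerAccessResource {}

// Hypothetical stand-in mirroring the post-RANGER-2763 constructor shape:
// a fifth argument carries the user's roles.
class RangerAccessRequestImpl {
    private final Set<String> userRoles;

    RangerAccessRequestImpl(RangerAccessResource resource, String accessType,
            String user, Set<String> userGroups, Set<String> userRoles) {
        this.userRoles = userRoles;
    }

    Set<String> getUserRoles() { return userRoles; }
}

public class ConstructorDemo {
    public static void main(String[] args) {
        // A caller that does not track roles can satisfy the new
        // five-argument signature by passing an empty set.
        RangerAccessRequestImpl req = new RangerAccessRequestImpl(
                new RangerAccessResource(), "select", "impala",
                Set.of("impala", "hive"), Collections.emptySet());
        System.out.println(req.getUserRoles().isEmpty()); // prints "true"
    }
}
```

This keeps the four pieces of information the old call site already had and only appends the new argument, which is the smallest change compatible with both the CDP and post-RANGER-2763 Apache Ranger signatures.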
[jira] [Resolved] (IMPALA-12291) Insert statement fails even if hdfs ranger policy allows it
[ https://issues.apache.org/jira/browse/IMPALA-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao resolved IMPALA-12291. -- Resolution: Duplicate This seems to be a duplicate of IMPALA-11871. We could probably continue our discussion there. I will also review the patch at https://gerrit.cloudera.org/c/20221/ and see how we could proceed. cc: [~khr9603], [~stigahuang], [~amansinha] > Insert statement fails even if hdfs ranger policy allows it > --- > > Key: IMPALA-12291 > URL: https://issues.apache.org/jira/browse/IMPALA-12291 > Project: IMPALA > Issue Type: Bug > Components: fe, Security > Environment: - Impala Version (4.1.0) > - Ranger admin version (2.0) > - Hive version (3.1.2) >Reporter: halim kim >Assignee: halim kim >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Apache Ranger is a framework for providing security and authorization in the Hadoop > platform. > Impala can also utilize Apache Ranger via Ranger Hive policies. > The thing is that an INSERT or some other query is not executed even if you > enable the Ranger HDFS plugin and set a proper allow condition for Impala query > execution. > You can see an error log like the one below. > {code:java} > AnalysisException: Unable to INSERT into target table (testdb.testtable) > because Impala does not have WRITE access to HDFS location: > hdfs://testcluster/warehouse/testdb.db/testtable > {code} > This happens when the Ranger HDFS plugin is enabled but Impala doesn't have > the required HDFS POSIX permissions. > For example, in the case that the DB file owner, group and permissions are set as > hdfs:hdfs r-xr-xr-- and the Ranger plugin policies (hdfs, hive and impala) allow > Impala to execute the query, the INSERT query will fail. > In my opinion, the main cause is that the Impala FE component doesn't check Ranger > policies but HDFS POSIX-model permissions. > Similar issue: https://issues.apache.org/jira/browse/IMPALA-10272 > I'm working on resolving this issue by adding HDFS Ranger policy checking > code. 
[jira] [Commented] (IMPALA-11871) INSERT statement does not respect Ranger policies for HDFS
[ https://issues.apache.org/jira/browse/IMPALA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832957#comment-17832957 ] Fang-Yu Rao commented on IMPALA-11871: -- After reading some past JIRAs in this area, I think it should be safe to skip {*}analyzeWriteAccess{*}() for the *INSERT* statement (or add a startup flag to disable it). Before the fix is ready, we could add the following to the *core-site.xml* consumed by the catalog server to allow an authorized user (by Ranger via Impala's frontend) to insert values into an HDFS table in the {*}legacy catalog mode{*}. Recall that the catalog server would consider the service user, usually named '{*}impala{*}', as a super user as long as the user '{*}impala{*}' belongs to the super group specified by '{*}dfs.permissions.superusergroup{*}'.
{code:xml}
<property>
  <name>dfs.permissions.superusergroup</name>
  <value>...</value>
  <final>true</final>
</property>
{code}
This is still secure when Ranger is the authorization provider because of the following.
# For the INSERT statement, Impala's frontend makes sure the logged-in user (not necessarily the service user '{*}impala{*}') is granted the necessary privilege on the target table. The respective audit log entry is also produced whether or not the query is authorized even though we skip {*}analyzeWriteAccess{*}().
# For a query that has been authorized by Impala's frontend and sent to the backend for execution, if Impala's backend interacts with the underlying services, e.g., HDFS, as the service user '{*}impala{*}', then this service user should always be considered as a super user or a user in a super group.
+*Detailed Analysis*+ We started performing such permissions checking in [IMPALA-1279: Check ACLs for INSERT and LOAD statements|https://github.com/cloudera/Impala/commit/0b32bbd899d988f1cd5c526597932b67f4c35cce] when we were using Sentry as the authorization provider. The reason to implement IMPALA-1279 was also mentioned in the description of the JIRA and is excerpted below for easy reference. 
In short, we would like to fail a query as early as possible if there could be a permissions-related issue. {quote}Impala checks permissions for LOAD and INSERT statements before executing them to allow for early-exit if the query would not succeed. However, it does not take extended ACLs in CDH5 into account. When a directory has restrictive Posix permissions (e.g. 000), but has an ACL allowing writes, Impala should allow INSERTs and LOADs to happen to that directory. Instead, the early check will disallow them. If the checks were disabled, the queries would execute (or not!) correctly, because we delegate to libhdfs or the DistributedFileSystem API to actually perform the operations we need. {quote} We hand-crafted the permissions checker within Impala. Specifically, in our [implementation|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/util/FsPermissionChecker.java#L206-L222], Hadoop ACL entries take precedence over the POSIX permissions, and we did *not* take into consideration the policies that could be defined on the HDFS path when the authorization provider is Ranger. Due to how we implemented [FsPermissionChecker|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/util/FsPermissionChecker.java], it's possible that even though a logged-in user has been authorized to execute an INSERT statement into a table via a policy added to Ranger's SQL policy repository, the query could fail during analysis, simply because the service user, usually named '{*}impala{*}', could not pass the permissions checker. For instance, this could occur if the table to insert into was created by another query engine, e.g., HiveServer2 (HS2), and thus the table is owned by another service user, e.g., '{*}hive{*}'. In addition, we have an ACL entry of "{*}group::r-x{*}" by default when the table was created. 
The current implementation of Impala's permissions checker would deny the service user '{*}impala{*}' write access to the table even though the user '{*}impala{*}' is in the group '{*}hive{*}', as shown in the following.
{code:java}
[r...@ccycloud-4.engesc24485d02.root.comops.site ~]# hdfs dfs -getfacl
# file:
# owner: hive
# group: hive
user::rwx
group::r-x
other::r-x

[r...@ccycloud-4.engesc24485d02.root.comops.site impalad]# groups impala
impala : impala hive
{code}
In [IMPALA-3143|https://github.com/apache/impala/commit/a0ad1868bda902fd914bc2be39eb9629a6eceb76], we allowed an administrator to specify the name of the super group (from the catalog server's perspective). Once the *current user* belongs to the super group denoted via '{*}DFS_PERMISSIONS_SUPERUSERGROUP_KEY{*}' ("{*}dfs.permissions.superusergroup{*}"), which defaults to '{*}DFS_PERMISSIONS_SUPERUSERGROUP_DEFAULT{*}' ("{*}supergroup{*}"), the catalog server would grant the current user's WRITE request against the corresponding table. Refer to
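The precedence just described can be modeled with a short, self-contained sketch. This is an illustrative approximation, not Impala's actual FsPermissionChecker: a named-user ACL entry wins over the POSIX owner/group/other classes, and membership in the configured super group short-circuits the check. All names and the simplified permission representation are assumptions made for the example:

```java
import java.util.Map;
import java.util.Set;

// Illustrative model of the write-access check described above: a named-user
// ACL entry takes precedence over POSIX bits, and membership in the
// configured super group grants everything.
public class PermCheckSketch {
    static boolean canWrite(String user, Set<String> userGroups,
            String owner, String group, String posixBits,  // e.g. "rwxr-xr-x"
            Map<String, String> namedUserAcls,             // user -> "rwx"-style bits
            String superGroup) {
        if (userGroups.contains(superGroup)) return true;  // super-group short-circuit
        String bits;
        if (namedUserAcls.containsKey(user)) {
            bits = namedUserAcls.get(user);                // ACL entry wins over POSIX
        } else if (user.equals(owner)) {
            bits = posixBits.substring(0, 3);              // owner class
        } else if (userGroups.contains(group)) {
            bits = posixBits.substring(3, 6);              // group class
        } else {
            bits = posixBits.substring(6, 9);              // other class
        }
        return bits.charAt(1) == 'w';                      // middle bit is write
    }

    public static void main(String[] args) {
        // The scenario above: table owned by hive:hive with group::r-x, and
        // user 'impala' in group 'hive' -> write is denied.
        System.out.println(canWrite("impala", Set.of("impala", "hive"),
                "hive", "hive", "rwxr-xr-x", Map.of(), "supergroup")); // prints "false"
    }
}
```

Adding "impala" to the configured super group (the `dfs.permissions.superusergroup` workaround mentioned in the comment) flips the result to granted without touching the table's POSIX bits or ACLs.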
[jira] [Comment Edited] (IMPALA-11871) INSERT statement does not respect Ranger policies for HDFS
[ https://issues.apache.org/jira/browse/IMPALA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830738#comment-17830738 ] Fang-Yu Rao edited comment on IMPALA-11871 at 3/26/24 5:17 AM: --- Hi [~MikaelSmith], my current understanding is that this is not a regression from earlier releases. It's more like a feature request for usability. The method that is performing the permissions checking ({*}analyzeWriteAccess{*}()) was added in IMPALA-7311. The purpose, I guess, was to make sure the Impala service has the necessary write permissions as early as possible, i.e., during the query analysis phase (vs. the query execution phase). After Impala started supporting Ranger as its authorization provider, ideally, a cluster administrator should be able to manage the permissions on HDFS via either a) Ranger's policy repository for HDFS, or b) the HDFS Access Control Lists (HDFS ACLs). But at the moment, Impala's coordinator unconditionally performs the permissions checking without consulting Ranger's policy repository for HDFS. IMPALA-10272 resolved a similar issue for the LOAD DATA statement. We could resolve this JIRA using the same approach there, where Impala's frontend calls *hadoop.fs.FileSystem.access(Path path, FsAction mode)* to check the actual access permissions, which could also reflect the permissions managed via Ranger's HDFS policy repository. was (Author: fangyurao): Hi [~MikaelSmith], my current understanding is that this is not a regression from earlier releases. It's more like a feature request for usability. The method that is performing the permissions checking ({*}analyzeWriteAccess{*}()) was added in IMPALA-7311. The purpose, I guess, was to make sure the Impala service has the necessary write permissions as early as possible, i.e., during the query analysis phase (vs. the query execution phase). 
After Impala started supporting Ranger as its authorization provider, ideally, a cluster administrator should be able to manage the permissions on HDFS via either a) Ranger's policy repository for HDFS, or b) the HDFS Access Control Lists (HDFS ACLs). But at the moment, Impala's coordinator unconditionally performs the permissions checking without consulting Ranger's policy repository for HDFS. IMPALA-10272 resolved a similar issue for the LOAD DATA statement. We could resolve this JIRA using the same approach there, where Impala's frontend calls *hadoop.fs.FileSystem.access(Path path, FsAction mode)* to check the actual access permissions, which could also reflect the permissions managed via Ranger's HDFS policy repository. > INSERT statement does not respect Ranger policies for HDFS > -- > > Key: IMPALA-11871 > URL: https://issues.apache.org/jira/browse/IMPALA-11871 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > In a cluster with Ranger auth (and with legacy catalog mode), even if you > provide RWX to cm_hdfs -> all-path for the user impala, inserting into a > table whose HDFS POSIX permissions happen to exclude impala access will > result in an > {noformat} > "AnalysisException: Unable to INSERT into target table (default.t1) because > Impala does not have WRITE access to HDFS location: > hdfs://nightly-71x-vx-2.nightly-71x-vx.root.hwx.site:8020/warehouse/tablespace/external/hive/t1"{noformat} > > {noformat} > [root@nightly-71x-vx-3 ~]# hdfs dfs -getfacl > /warehouse/tablespace/external/hive/t1 > file: /warehouse/tablespace/external/hive/t1 > owner: hive > group: supergroup > user::rwx > user:impala:rwx #effective:r-x > group::rwx #effective:r-x > mask::r-x > other::--- > default:user::rwx > default:user:impala:rwx > default:group::rwx > default:mask::rwx > default:other::--- {noformat} > ~~ > ANALYSIS > Stack trace from a version of Cloudera's distribution of Impala (impalad > version 
3.4.0-SNAPSHOT RELEASE (build > {*}db20b59a093c17ea4699117155d58fe874f7d68f{*})): > {noformat} > at > org.apache.impala.catalog.FeFsTable$Utils.checkWriteAccess(FeFsTable.java:585) > at > org.apache.impala.analysis.InsertStmt.analyzeWriteAccess(InsertStmt.java:545) > at org.apache.impala.analysis.InsertStmt.analyze(InsertStmt.java:391) > at > org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:463) > at > org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:426) > at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1570) > at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1536) > at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1506) > at >
[jira] [Commented] (IMPALA-11871) INSERT statement does not respect Ranger policies for HDFS
[ https://issues.apache.org/jira/browse/IMPALA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830738#comment-17830738 ] Fang-Yu Rao commented on IMPALA-11871: -- Hi [~MikaelSmith], my current understanding is that this is not a regression from earlier releases. It's more like a feature request for usability. The method that is performing the permissions checking ({*}analyzeWriteAccess{*}()) was added in IMPALA-7311. The purpose, I guess, was to make sure the Impala service has the necessary write permissions as early as possible, i.e., during the query analysis phase (vs. the query execution phase). After Impala started supporting Ranger as its authorization provider, ideally, a cluster administrator should be able to manage the permissions on HDFS via either a) Ranger's policy repository for HDFS, or b) the HDFS Access Control Lists (HDFS ACLs). But at the moment, Impala's coordinator unconditionally performs the permissions checking without consulting Ranger's policy repository for HDFS. IMPALA-10272 resolved a similar issue for the LOAD DATA statement. We could resolve this JIRA using the same approach there, where Impala's frontend calls *hadoop.fs.FileSystem.access(Path path, FsAction mode)* to check the actual access permissions, which could also reflect the permissions managed via Ranger's HDFS policy repository. 
> INSERT statement does not respect Ranger policies for HDFS > -- > > Key: IMPALA-11871 > URL: https://issues.apache.org/jira/browse/IMPALA-11871 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > In a cluster with Ranger auth (and with legacy catalog mode), even if you > provide RWX to cm_hdfs -> all-path for the user impala, inserting into a > table whose HDFS POSIX permissions happen to exclude impala access will > result in an > {noformat} > "AnalysisException: Unable to INSERT into target table (default.t1) because > Impala does not have WRITE access to HDFS location: > hdfs://nightly-71x-vx-2.nightly-71x-vx.root.hwx.site:8020/warehouse/tablespace/external/hive/t1"{noformat} > > {noformat} > [root@nightly-71x-vx-3 ~]# hdfs dfs -getfacl > /warehouse/tablespace/external/hive/t1 > file: /warehouse/tablespace/external/hive/t1 > owner: hive > group: supergroup > user::rwx > user:impala:rwx #effective:r-x > group::rwx #effective:r-x > mask::r-x > other::--- > default:user::rwx > default:user:impala:rwx > default:group::rwx > default:mask::rwx > default:other::--- {noformat} > ~~ > ANALYSIS > Stack trace from a version of Cloudera's distribution of Impala (impalad > version 3.4.0-SNAPSHOT RELEASE (build > {*}db20b59a093c17ea4699117155d58fe874f7d68f{*})): > {noformat} > at > org.apache.impala.catalog.FeFsTable$Utils.checkWriteAccess(FeFsTable.java:585) > at > org.apache.impala.analysis.InsertStmt.analyzeWriteAccess(InsertStmt.java:545) > at org.apache.impala.analysis.InsertStmt.analyze(InsertStmt.java:391) > at > org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:463) > at > org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:426) > at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1570) > at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1536) > at 
org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1506) > at > org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:155){noformat} > The exception occurs at analysis time, so I tested and succeeded in writing > directly into the said directory. > {noformat} > [root@nightly-71x-vx-3 ~]# hdfs dfs -touchz > /warehouse/tablespace/external/hive/t1/test > [root@nightly-71x-vx-3 ~]# hdfs dfs -ls > /warehouse/tablespace/external/hive/t1/ > Found 8 items > rw-rw---+ 3 hive supergroup 417 2023-01-27 17:37 > /warehouse/tablespace/external/hive/t1/00_0 > rw-rw---+ 3 hive supergroup 417 2023-01-27 17:44 > /warehouse/tablespace/external/hive/t1/00_0_copy_1 > rw-rw---+ 3 hive supergroup 417 2023-01-27 17:49 > /warehouse/tablespace/external/hive/t1/00_0_copy_2 > rw-rw---+ 3 hive supergroup 417 2023-01-27 17:53 > /warehouse/tablespace/external/hive/t1/00_0_copy_3 > rw-rw---+ 3 impala hive 355 2023-01-27 17:17 > /warehouse/tablespace/external/hive/t1/4c4477c12c51ad96-3126b52d_2029811630_data.0.parq > rw-rw---+ 3 impala hive 355 2023-01-27 17:39 > /warehouse/tablespace/external/hive/t1/9945b25bb37d1ff2-473c1478_574471191_data.0.parq > drwxrwx---+ - impala hive 0 2023-01-27 17:39 > /warehouse/tablespace/external/hive/t1/_impala_insert_staging > rw-rw---+ 3 impala supergroup 0 2023-01-27 18:01 >
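As an aside on the getfacl output quoted above: entries such as user:impala:rwx are reported with #effective:r-x because HDFS ANDs each named-user and group ACL entry with the mask::r-x entry. A toy, self-contained computation of that masking (not the HDFS implementation, just an illustration of the rule):

```java
// Illustrative model of HDFS ACL masking: the effective permission of a
// named-user or group entry is the entry's bits AND the mask's bits,
// position by position ('r', 'w', 'x', or '-').
public class AclMaskSketch {
    static String effective(String entryBits, String maskBits) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 3; i++) {
            char e = entryBits.charAt(i);
            char m = maskBits.charAt(i);
            // A bit survives only if both the entry and the mask grant it.
            sb.append(e == m ? e : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // user:impala:rwx masked by mask::r-x -> effective r-x (no write),
        // which is why the early write-access check rejects the INSERT.
        System.out.println(effective("rwx", "r-x")); // prints "r-x"
    }
}
```

This is why granting user:impala:rwx is not sufficient on its own; the mask entry must also include w for the write bit to take effect.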
[jira] [Created] (IMPALA-12921) Consider adding support for locally built Ranger
Fang-Yu Rao created IMPALA-12921: Summary: Consider adding support for locally built Ranger Key: IMPALA-12921 URL: https://issues.apache.org/jira/browse/IMPALA-12921 Project: IMPALA Issue Type: Task Reporter: Fang-Yu Rao Assignee: Fang-Yu Rao It would be nice to be able to support locally built Ranger in Impala's minicluster in that it would facilitate the testing of features that require changes to both components.
[jira] [Comment Edited] (IMPALA-12830) test_webserver_hide_logs_link() could fail in the exhaustive build
[ https://issues.apache.org/jira/browse/IMPALA-12830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819426#comment-17819426 ] Fang-Yu Rao edited comment on IMPALA-12830 at 2/22/24 12:43 AM:

This issue seems to be similar to IMPALA-12170. cc: [~stigahuang]

was (Author: fangyurao): This issue seems to be similar to IMPALA-12170.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-12830) test_webserver_hide_logs_link() could fail in the exhaustive build
[ https://issues.apache.org/jira/browse/IMPALA-12830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819426#comment-17819426 ] Fang-Yu Rao commented on IMPALA-12830:

This issue seems to be similar to IMPALA-12170.
[jira] [Commented] (IMPALA-12830) test_webserver_hide_logs_link() could fail in the exhaustive build
[ https://issues.apache.org/jira/browse/IMPALA-12830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819425#comment-17819425 ] Fang-Yu Rao commented on IMPALA-12830:

Hi [~skatiyal], assigned the JIRA to you since you revised the test case in IMPALA-9086 (Show Hive configurations in /hadoop-varz page) and thus may be more familiar with the context. Please feel free to re-assign as you see appropriate. Thanks!
[jira] [Created] (IMPALA-12830) test_web_pages() could fail in the exhaustive build
Fang-Yu Rao created IMPALA-12830:

Summary: test_web_pages() could fail in the exhaustive build
Key: IMPALA-12830
URL: https://issues.apache.org/jira/browse/IMPALA-12830
Project: IMPALA
Issue Type: Bug
Reporter: Fang-Yu Rao
Assignee: Saurabh Katiyal

We found in an internal Jenkins run that test_web_pages() could fail in the exhaustive build with the following error.

+*Error Message*+
{code}
AssertionError: bad links from webui port 25020
assert ['/', '/catal...g_level', ...] == ['/', '/catalo...g_level', ...]
  At index 2 diff: u'/events' != '/hadoop-varz'
  Full diff:
  - [u'/',
  ?  -
  + ['/',
  -  u'/catalog',
  ?   -
  +  '/catalog',
  -  u'/events',
  -  u'/hadoop-varz',
  ?   -
  +  '/hadoop-varz',
  +  '/events',
  -  u'/jmx',
  ?   -
  +  '/jmx',
  -  u'/log_level',
  ?   -
  +  '/log_level',
  -  u'/memz',
  ?   -
  +  '/memz',
  -  u'/metrics',
  ?   -
  +  '/metrics',
  -  u'/operations',
  ?   -
  +  '/operations',
  -  u'/profile_docs',
  ?   -
  +  '/profile_docs',
  -  u'/rpcz',
  ?   -
  +  '/rpcz',
  -  u'/threadz',
  ?   -
  +  '/threadz',
  -  u'/varz']
  ?   -
  +  '/varz']
{code}

+*Stacktrace*+
{code}
custom_cluster/test_web_pages.py:248: in test_webserver_hide_logs_link
    assert found_links == expected_catalog_links, msg
E   AssertionError: bad links from webui port 25020
E   assert ['/', '/catal...g_level', ...] == ['/', '/catalo...g_level', ...]
E     At index 2 diff: u'/events' != '/hadoop-varz'
E     Full diff:
E     - [u'/',
E     ?  -
E     + ['/',
E     -  u'/catalog',
E     ?   -
E     +  '/catalog',
E     -  u'/events',
E     -  u'/hadoop-varz',
E     ?   -
E     +  '/hadoop-varz',
E     +  '/events',
E     -  u'/jmx',
E     ?   -
E     +  '/jmx',
E     -  u'/log_level',
E     ?   -
E     +  '/log_level',
E     -  u'/memz',
E     ?   -
E     +  '/memz',
E     -  u'/metrics',
E     ?   -
E     +  '/metrics',
E     -  u'/operations',
E     ?   -
E     +  '/operations',
E     -  u'/profile_docs',
E     ?   -
E     +  '/profile_docs',
E     -  u'/rpcz',
E     ?   -
E     +  '/rpcz',
E     -  u'/threadz',
E     ?   -
E     +  '/threadz',
E     -  u'/varz']
E     ?   -
E     +  '/varz']
{code}
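Read side by side, the two lists in the diff above contain exactly the same links; only the relative order of '/events' and '/hadoop-varz' differs (the u'' prefixes are just Python 2 vs. Python 3 repr noise). A minimal, hypothetical sketch of why an order-insensitive comparison would not flake on this; the data below is illustrative, not Impala's actual test code:

```python
# Illustrative link lists reproducing the shape of the failure above:
# same elements, but '/events' and '/hadoop-varz' are swapped.
expected_catalog_links = ['/', '/catalog', '/hadoop-varz', '/events', '/jmx']
found_links = ['/', '/catalog', '/events', '/hadoop-varz', '/jmx']

# An ordered comparison fails on the swapped pair ...
assert found_links != expected_catalog_links

# ... while a sorted, order-insensitive comparison still verifies that the
# web UI exposes exactly the expected set of links.
assert sorted(found_links) == sorted(expected_catalog_links)
```

Whether registration order matters to the test is a design choice; if it does not, sorting both sides removes this class of flakiness.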
[jira] [Updated] (IMPALA-12830) test_webserver_hide_logs_link() could fail in the exhaustive build
[ https://issues.apache.org/jira/browse/IMPALA-12830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12830:

Summary: test_webserver_hide_logs_link() could fail in the exhaustive build (was: test_web_pages() could fail in the exhaustive build)
[jira] [Updated] (IMPALA-12830) test_webserver_hide_logs_link() could fail in the exhaustive build
[ https://issues.apache.org/jira/browse/IMPALA-12830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12830:

Description: We found in an internal Jenkins run that test_webserver_hide_logs_link() could fail in the exhaustive build with the following error. (was: We found in an internal Jenkins run that test_web_pages() could fail in the exhaustive build with the following error.)
[jira] [Commented] (IMPALA-12819) InaccessibleObjectException found during LocalCatalogTest
[ https://issues.apache.org/jira/browse/IMPALA-12819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818215#comment-17818215 ] Fang-Yu Rao commented on IMPALA-12819:

Hi [~MikaelSmith], assigned the JIRA to you since you helped with IMPALA-11260 earlier and may be more familiar with the context. Please re-assign the ticket as you see appropriate. Thanks!
[jira] [Created] (IMPALA-12819) InaccessibleObjectException found during LocalCatalogTest
Fang-Yu Rao created IMPALA-12819:

Summary: InaccessibleObjectException found during LocalCatalogTest
Key: IMPALA-12819
URL: https://issues.apache.org/jira/browse/IMPALA-12819
Project: IMPALA
Issue Type: Bug
Components: fe
Affects Versions: Impala 4.4.0
Reporter: Fang-Yu Rao
Assignee: Michael Smith

We found in an internal build that during LocalCatalogTest we could encounter InaccessibleObjectException. This was found by the test [test_no_inaccessible_objects|https://github.com/apache/impala/blob/master/tests/verifiers/test_banned_log_messages.py#L40C7-L40C35]

{code:java}
W0217 01:31:14.108255 18119 ObjectGraphWalker.java:251] The JVM is preventing Ehcache from accessing the subgraph beneath 'private final jdk.internal.platform.CgroupV1Metrics jdk.internal.platform.CgroupV1MetricsImpl.metrics' - cache sizes may be underestimated as a result
Java exception follows:
java.lang.reflect.InaccessibleObjectException: Unable to make field private final jdk.internal.platform.CgroupV1Metrics jdk.internal.platform.CgroupV1MetricsImpl.metrics accessible: module java.base does not "opens jdk.internal.platform" to unnamed module @2c89cd7f
        at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:340)
        at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:280)
        at java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:176)
        at java.base/java.lang.reflect.Field.setAccessible(Field.java:170)
        at org.ehcache.sizeof.ObjectGraphWalker.getAllFields(ObjectGraphWalker.java:245)
        at org.ehcache.sizeof.ObjectGraphWalker.getFilteredFields(ObjectGraphWalker.java:204)
        at org.ehcache.sizeof.ObjectGraphWalker.walk(ObjectGraphWalker.java:159)
        at org.ehcache.sizeof.SizeOf.deepSizeOf(SizeOf.java:74)
        at org.apache.impala.catalog.local.CatalogdMetaProvider$SizeOfWeigher.weigh(CatalogdMetaProvider.java:2234)
        at com.google.common.cache.LocalCache$Segment.setValue(LocalCache.java:2043)
        at com.google.common.cache.LocalCache$Segment.replace(LocalCache.java:2990)
        at com.google.common.cache.LocalCache.replace(LocalCache.java:4324)
        at org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:569)
        at org.apache.impala.catalog.local.CatalogdMetaProvider.loadIcebergApiTable(CatalogdMetaProvider.java:1160)
        at org.apache.impala.catalog.local.LocalIcebergTable.loadIcebergTableViaMetaProvider(LocalIcebergTable.java:96)
        at org.apache.impala.catalog.local.LocalTable.load(LocalTable.java:131)
        at org.apache.impala.catalog.local.LocalTable.load(LocalTable.java:114)
        at org.apache.impala.catalog.local.LocalDb.getTable(LocalDb.java:148)
        at org.apache.impala.catalog.local.LocalCatalog.getTable(LocalCatalog.java:139)
        at org.apache.impala.catalog.local.LocalCatalogTest.testLoadIcebergFileDescriptors(LocalCatalogTest.java:280)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
        at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:316)
        at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:240)
        at
{code}
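The verifier referenced above (test_no_inaccessible_objects in test_banned_log_messages.py) conceptually just scans daemon logs for strings that should never appear. A rough, hypothetical sketch of that kind of check, not the actual test code; the banned list and log excerpt below are illustrative:

```python
# Hypothetical banned-log-message scan: return every (line number, message)
# pair where a banned string appears in the log.
BANNED_LOG_MESSAGES = ["InaccessibleObjectException"]

def find_banned_messages(log_lines, banned=BANNED_LOG_MESSAGES):
    """Scan log lines for banned messages; an empty result means a clean log."""
    hits = []
    for lineno, line in enumerate(log_lines, start=1):
        for msg in banned:
            if msg in line:
                hits.append((lineno, msg))
    return hits

# Illustrative two-line log excerpt modeled on the warning above.
log = [
    "I0217 01:31:13.000000 18119 CatalogdMetaProvider.java:569] cache hit",
    "W0217 01:31:14.108255 18119 ObjectGraphWalker.java:251] "
    "java.lang.reflect.InaccessibleObjectException: Unable to make field accessible",
]
assert find_banned_messages(log) == [(2, "InaccessibleObjectException")]
```

A scan like this catches the Ehcache/JPMS warning regardless of which test triggered it, which is why the failure surfaces in a log verifier rather than in LocalCatalogTest itself.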
[jira] [Updated] (IMPALA-11743) Support the OWNER privilege for UDFs in Impala
[ https://issues.apache.org/jira/browse/IMPALA-11743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-11743:

Summary: Support the OWNER privilege for UDFs in Impala (was: Investigate how to support the OWNER privilege for UDFs in Impala)

> Support the OWNER privilege for UDFs in Impala
> --
>
> Key: IMPALA-11743
> URL: https://issues.apache.org/jira/browse/IMPALA-11743
> Project: IMPALA
> Issue Type: New Feature
> Components: Frontend
> Reporter: Fang-Yu Rao
> Assignee: Fang-Yu Rao
> Priority: Major
>
> Currently in Impala a user allowed to create a UDF in a database still has to
> be explicitly granted the necessary privileges to execute the UDF later in a
> SELECT query. It would be more convenient if the ownership information of a
> UDF could also be retrieved during the query analysis of such SELECT queries
> so that the owner/creator of a UDF will be allowed to execute the UDF without
> being explicitly granted the necessary privileges on the UDF.
[jira] [Commented] (IMPALA-12578) Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for databases, tables, and columns
[ https://issues.apache.org/jira/browse/IMPALA-12578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17803733#comment-17803733 ] Fang-Yu Rao commented on IMPALA-12578:

I separate the case of UDFs from this JIRA because currently Impala does not have the concept of owner with respect to UDFs. According to what is seen in IMPALA-11743, the changes needed to support UDF ownership will be complicated and thus it's better to have a separate JIRA for the case of UDFs.

> Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for
> databases, tables, and columns
> --
>
> Key: IMPALA-12578
> URL: https://issues.apache.org/jira/browse/IMPALA-12578
> Project: IMPALA
> Issue Type: New Feature
> Reporter: Fang-Yu Rao
> Assignee: Fang-Yu Rao
> Priority: Major
>
> Starting from RANGER-1200, Ranger supports the notion of the OWNER user,
> which allows each user to perform any operation on the resources owned by it.
> This avoids the need for creating a new policy that grants the OWNER user the
> privileges on every newly created resource. Refer to
> [apache-ranger-policy-model|https://blogsarchive.apache.org/ranger/entry/apache-ranger-policy-model#:~:text=allow%20each%20user%20to%20access%20all,all].
> Currently for the GRANT and REVOKE statements, Impala does not pass the owner
> of the resource to the Ranger plug-in and thus a non-administrative user
> could not grant/revoke privileges on a resource to/from another user even
> though this non-administrative user owns the resource. We should pass the
> ownership information to the Ranger plug-in to make authorization management
> easier in Impala.
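The OWNER-user behavior described in IMPALA-12578 reduces to a simple rule: an administrator, or the resource's owner, may grant or revoke privileges on the resource, with no per-resource policy needed. A hypothetical sketch of that rule; the function and parameter names are illustrative, not Ranger's or Impala's actual API:

```python
# Illustrative owner-aware authorization check for GRANT/REVOKE: once the
# resource owner is passed along with the request, the plug-in can allow the
# owner without an explicit policy entry.
def can_grant(requesting_user, resource_owner, is_admin):
    """An admin, or the owner of the resource, may grant/revoke privileges."""
    return is_admin or requesting_user == resource_owner

assert can_grant("alice", "alice", is_admin=False)    # owner is allowed
assert not can_grant("bob", "alice", is_admin=False)  # non-owner is denied
assert can_grant("bob", "alice", is_admin=True)       # admin is always allowed
```

The point of the JIRA is precisely that the second argument (the owner) is currently not sent to the Ranger plug-in, so the owner branch can never fire.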
[jira] [Updated] (IMPALA-12685) Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for UDFs
[ https://issues.apache.org/jira/browse/IMPALA-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12685:

Summary: Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for UDFs (was: Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for UDF)
[jira] [Created] (IMPALA-12685) Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for UDF
Fang-Yu Rao created IMPALA-12685:

Summary: Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for UDF
Key: IMPALA-12685
URL: https://issues.apache.org/jira/browse/IMPALA-12685
Project: IMPALA
Issue Type: New Feature
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao

This is the follow-up to IMPALA-12578, where we tackle the cases of databases, tables, and columns.
[jira] [Updated] (IMPALA-12578) Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for databases, tables, and columns
[ https://issues.apache.org/jira/browse/IMPALA-12578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12578:

Summary: Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for databases, tables, and columns (was: Pass the owner user to the Ranger plug-in in GRANT and REVOKE statements)
[jira] [Comment Edited] (IMPALA-11743) Investigate how to support the OWNER privilege for UDFs in Impala
[ https://issues.apache.org/jira/browse/IMPALA-11743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17803730#comment-17803730 ] Fang-Yu Rao edited comment on IMPALA-11743 at 1/6/24 12:16 AM: --- This JIRA is related to IMPALA-12578 where we would like to pass to the Ranger plug-in the owner of a resource involved in a GRANT/REVOKE statement. Specifically, in the case when the resource is a user-defined function (UDF), Impala has to load this piece of information when instantiating user-defined functions in [CatalogServiceCatalog.java#loadJavaFunctions()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L1812C16-L1836] so that the owner of a UDF will be available in Impala's internal representation of it, i.e., [Function.java|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/Function.java]. On a related note, in [hive_metastore.thrift|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift], Hive already has a field of 'ownerName' for a user-defined function. {code:java} struct Function { 1: string functionName, 2: string dbName, 3: string className, 4: string ownerName, 5: PrincipalTypeownerType, 6: i32 createTime, 7: FunctionType functionType, 8: list resourceUris, 9: optional string catName } {code} On the other hand, when an authorized user is creating a persistent UDF via Impala, Impala should also pass the requesting user as the owner of the UDF to Hive MetaStore. This way Impala will be able to load the owner of a UDF in CatalogServiceCatalog.java#loadJavaFunctions() mentioned above. was (Author: fangyurao): This JIRA is related to IMPALA-12578 where we would like to pass to the Ranger plug-in the owner of a resource involved in a GRANT/REVOKE statement. 
Specifically, in the case when the resource is a user-defined function (UDF), Impala has to load this piece of information when instantiating user-defined functions in [CatalogServiceCatalog.java#loadJavaFunctions()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L1812C16-L1836] so that the owner of a UDF will be available in Impala's internal representation of it, i.e., [Function.java|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/Function.java]. On a related note, in [hive_metastore.thrift|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift], Hive already has a field of 'ownerName' for a user-defined function.
{code:java}
struct Function {
  1: string functionName,
  2: string dbName,
  3: string className,
  4: string ownerName,
  5: PrincipalType ownerType,
  6: i32 createTime,
  7: FunctionType functionType,
  8: list<ResourceUri> resourceUris,
  9: optional string catName
}
{code}
> Investigate how to support the OWNER privilege for UDFs in Impala > - > > Key: IMPALA-11743 > URL: https://issues.apache.org/jira/browse/IMPALA-11743 > Project: IMPALA > Issue Type: New Feature > Components: Frontend >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > Currently in Impala a user allowed to create a UDF in a database still has to > be explicitly granted the necessary privileges to execute the UDF later in a > SELECT query. It would be more convenient if the ownership information of a > UDF could also be retrieved during the query analysis of such SELECT queries > so that the owner/creator of a UDF will be allowed to execute the UDF without > being explicitly granted the necessary privileges on the UDF.
[jira] [Commented] (IMPALA-11743) Investigate how to support the OWNER privilege for UDFs in Impala
[ https://issues.apache.org/jira/browse/IMPALA-11743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17803730#comment-17803730 ] Fang-Yu Rao commented on IMPALA-11743: -- This JIRA is related to IMPALA-12578 where we would like to pass to the Ranger plug-in the owner of a resource involved in a GRANT/REVOKE statement. Specifically, in the case when the resource is a user-defined function (UDF), Impala has to load this piece of information when instantiating user-defined functions in [CatalogServiceCatalog.java#loadJavaFunctions()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L1812C16-L1836] so that the owner of a UDF will be available in Impala's internal representation of it, i.e., [Function.java|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/Function.java]. On a related note, in [hive_metastore.thrift|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift], Hive already has a field of 'ownerName' for a user-defined function.
{code:java}
struct Function {
  1: string functionName,
  2: string dbName,
  3: string className,
  4: string ownerName,
  5: PrincipalType ownerType,
  6: i32 createTime,
  7: FunctionType functionType,
  8: list<ResourceUri> resourceUris,
  9: optional string catName
}
{code}
> Investigate how to support the OWNER privilege for UDFs in Impala > - > > Key: IMPALA-11743 > URL: https://issues.apache.org/jira/browse/IMPALA-11743 > Project: IMPALA > Issue Type: New Feature > Components: Frontend >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > Currently in Impala a user allowed to create a UDF in a database still has to > be explicitly granted the necessary privileges to execute the UDF later in a > SELECT query.
It would be more convenient if the ownership information of a > UDF could also be retrieved during the query analysis of such SELECT queries > so that the owner/creator of a UDF will be allowed to execute the UDF without > being explicitly granted the necessary privileges on the UDF.
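The loading step described in the IMPALA-11743 comments above can be pictured with a small sketch: pick the 'ownerName' field off the HMS Function object and keep it in a catalog-side record. This is an illustrative Python model only — the field names mirror hive_metastore.thrift, but the helper and dict shape are hypothetical, not Impala's actual Java code.

```python
def load_function_owner(hms_fn):
    """Carry the HMS Function's ownerName into a catalog-side record so the
    owner is available later, e.g. for GRANT/REVOKE authorization checks."""
    return {
        "name": hms_fn["functionName"],
        "db": hms_fn["dbName"],
        # ownerName may be absent for functions created before owners were recorded.
        "owner": hms_fn.get("ownerName"),
    }

fn = load_function_owner({
    "functionName": "my_udf",
    "dbName": "default",
    "className": "org.example.MyUdf",
    "ownerName": "alice",
})
```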
[jira] [Reopened] (IMPALA-12554) Create only one Ranger policy for GRANT statement
[ https://issues.apache.org/jira/browse/IMPALA-12554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao reopened IMPALA-12554: -- > Create only one Ranger policy for GRANT statement > - > > Key: IMPALA-12554 > URL: https://issues.apache.org/jira/browse/IMPALA-12554 > Project: IMPALA > Issue Type: Improvement >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > Currently Impala would create a Ranger policy for each column specified in a > GRANT statement. For instance, after the following query, 3 Ranger policies > would be created on the Ranger server. This could result in a lot of policies > created when there are many columns specified and it may result in Impala's > Ranger plug-in taking a long time to download the policies from the Ranger > server. It would be great if Impala only creates one single policy for > columns in the same table. > {code:java} > [localhost:21050] default> grant select(id, bool_col, tinyint_col) on table > functional.alltypes to user non_owner; > Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes > to user non_owner > Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000) > Query progress can be monitored at: > http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69 > +-+ > | summary | > +-+ > | Privilege(s) have been granted. | > +-+ > Fetched 1 row(s) in 0.67s > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-12554) Create only one Ranger policy for GRANT statement
[ https://issues.apache.org/jira/browse/IMPALA-12554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao resolved IMPALA-12554. -- Resolution: Implemented > Create only one Ranger policy for GRANT statement > - > > Key: IMPALA-12554 > URL: https://issues.apache.org/jira/browse/IMPALA-12554 > Project: IMPALA > Issue Type: Improvement >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > Currently Impala would create a Ranger policy for each column specified in a > GRANT statement. For instance, after the following query, 3 Ranger policies > would be created on the Ranger server. This could result in a lot of policies > created when there are many columns specified and it may result in Impala's > Ranger plug-in taking a long time to download the policies from the Ranger > server. It would be great if Impala only creates one single policy for > columns in the same table. > {code:java} > [localhost:21050] default> grant select(id, bool_col, tinyint_col) on table > functional.alltypes to user non_owner; > Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes > to user non_owner > Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000) > Query progress can be monitored at: > http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69 > +-+ > | summary | > +-+ > | Privilege(s) have been granted. | > +-+ > Fetched 1 row(s) in 0.67s > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-12554) Create only one Ranger policy for GRANT statement
[ https://issues.apache.org/jira/browse/IMPALA-12554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao resolved IMPALA-12554. -- Resolution: Later After some manual testing, we found that RANGER-4585 has some bugs, e.g., REVOKE REST API call is not able to revoke the privilege on multiple columns from a grantee that was granted the SELECT privilege on the same set of columns. Before this is fixed, we resolve the ticket for now and will re-open the ticket once this issue is fixed in a follow-up RANGER JIRA. > Create only one Ranger policy for GRANT statement > - > > Key: IMPALA-12554 > URL: https://issues.apache.org/jira/browse/IMPALA-12554 > Project: IMPALA > Issue Type: Improvement >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > Currently Impala would create a Ranger policy for each column specified in a > GRANT statement. For instance, after the following query, 3 Ranger policies > would be created on the Ranger server. This could result in a lot of policies > created when there are many columns specified and it may result in Impala's > Ranger plug-in taking a long time to download the policies from the Ranger > server. It would be great if Impala only creates one single policy for > columns in the same table. > {code:java} > [localhost:21050] default> grant select(id, bool_col, tinyint_col) on table > functional.alltypes to user non_owner; > Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes > to user non_owner > Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000) > Query progress can be monitored at: > http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69 > +-+ > | summary | > +-+ > | Privilege(s) have been granted. | > +-+ > Fetched 1 row(s) in 0.67s > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
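The improvement this ticket tracks — one Ranger policy covering every column of a GRANT instead of one policy per column — boils down to a grouping step. A minimal sketch, assuming each grant is a simple (database, table, column) triple; the dict layout is illustrative and not the actual Ranger policy JSON:

```python
from collections import defaultdict

def build_policies(grants):
    """Group (db, table, column) grant requests so that all columns of the
    same table land in a single policy, instead of one policy per column."""
    by_table = defaultdict(list)
    for db, table, column in grants:
        by_table[(db, table)].append(column)
    # One policy per (db, table), listing every granted column.
    return [{"database": db, "table": table, "columns": cols}
            for (db, table), cols in by_table.items()]

# The three columns from the example GRANT yield one policy, not three.
policies = build_policies([
    ("functional", "alltypes", "id"),
    ("functional", "alltypes", "bool_col"),
    ("functional", "alltypes", "tinyint_col"),
])
```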
[jira] [Updated] (IMPALA-12578) Pass the owner user to the Ranger plug-in in GRANT and REVOKE statements
[ https://issues.apache.org/jira/browse/IMPALA-12578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12578: - Description: Starting from RANGER-1200, Ranger supports the notion of the OWNER user, which allows each user to perform any operation on the resources owned by it. This avoids the need for creating a new policy that grants the OWNER user the privileges on every newly created resource. Refer to [apache-ranger-policy-model|https://blogsarchive.apache.org/ranger/entry/apache-ranger-policy-model#:~:text=allow%20each%20user%20to%20access%20all,all]. Currently for the GRANT and REVOKE statements, Impala does not pass the owner of the resource to the Ranger plug-in and thus a non-administrative user could not grant/revoke privileges on a resource to/from another user even though this non-administrative user owns the resource. We should pass the ownership information to the Ranger plug-in to make authorization management easier in Impala. was: Starting from RANGER-1200, Ranger supports the notion of the OWNER user, which allows each user to perform any operation on the resources owned by them. This avoids the need for creating a new policy that grants the OWNER user the privileges on every newly created resource. Refer to [apache-ranger-policy-model|https://blogsarchive.apache.org/ranger/entry/apache-ranger-policy-model#:~:text=allow%20each%20user%20to%20access%20all,all]. Currently for the GRANT and REVOKE statements, Impala does not pass the owner of the resource to the Ranger plug-in and thus a non-administrative user could not grant/revoke privileges on a resource to/from another user even though this non-administrative user owns the resource. We should pass the ownership information to the Ranger plug-in to make authorization management easier in Impala. 
> Pass the owner user to the Ranger plug-in in GRANT and REVOKE statements > > > Key: IMPALA-12578 > URL: https://issues.apache.org/jira/browse/IMPALA-12578 > Project: IMPALA > Issue Type: New Feature >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > Starting from RANGER-1200, Ranger supports the notion of the OWNER user, > which allows each user to perform any operation on the resources owned by it. > This avoids the need for creating a new policy that grants the OWNER user the > privileges on every newly created resource. Refer to > [apache-ranger-policy-model|https://blogsarchive.apache.org/ranger/entry/apache-ranger-policy-model#:~:text=allow%20each%20user%20to%20access%20all,all]. > Currently for the GRANT and REVOKE statements, Impala does not pass the owner > of the resource to the Ranger plug-in and thus a non-administrative user > could not grant/revoke privileges on a resource to/from another user even > though this non-administrative user owns the resource. We should pass the > ownership information to the Ranger plug-in to make authorization management > easier in Impala. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-12578) Pass the owner user to the Ranger plug-in in GRANT and REVOKE statements
Fang-Yu Rao created IMPALA-12578: Summary: Pass the owner user to the Ranger plug-in in GRANT and REVOKE statements Key: IMPALA-12578 URL: https://issues.apache.org/jira/browse/IMPALA-12578 Project: IMPALA Issue Type: New Feature Reporter: Fang-Yu Rao Assignee: Fang-Yu Rao Starting from RANGER-1200, Ranger supports the notion of the OWNER user, which allows each user to perform any operation on the resources owned by them. This avoids the need for creating a new policy that grants the OWNER user the privileges on every newly created resource. Refer to [apache-ranger-policy-model|https://blogsarchive.apache.org/ranger/entry/apache-ranger-policy-model#:~:text=allow%20each%20user%20to%20access%20all,all]. Currently for the GRANT and REVOKE statements, Impala does not pass the owner of the resource to the Ranger plug-in and thus a non-administrative user could not grant/revoke privileges on a resource to/from another user even though this non-administrative user owns the resource. We should pass the ownership information to the Ranger plug-in to make authorization management easier in Impala. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
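Conceptually, passing the owner along lets the authorization side apply a rule like the following sketch. The function name and arguments are hypothetical; the real decision is made inside the Ranger plug-in once Impala supplies the owner.

```python
def may_administer_grant(requesting_user, is_admin, resource_owner):
    # Administrators may always grant/revoke. With the OWNER user passed to
    # the Ranger plug-in, a non-administrative user qualifies as well when
    # they own the resource named in the GRANT/REVOKE statement.
    return is_admin or requesting_user == resource_owner

owner_allowed = may_administer_grant("alice", False, "alice")
non_owner_allowed = may_administer_grant("bob", False, "alice")
```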
[jira] [Updated] (IMPALA-12554) Create only one Ranger policy for GRANT statement
[ https://issues.apache.org/jira/browse/IMPALA-12554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12554: - Description: Currently Impala would create a Ranger policy for each column specified in a GRANT statement. For instance, after the following query, 3 Ranger policies would be created on the Ranger server. This could result in a lot of policies created when there are many columns specified and it may result in Impala's Ranger plug-in taking a long time to download the policies from the Ranger server. It would be great if Impala only creates one single policy for columns in the same table. {code:java} [localhost:21050] default> grant select(id, bool_col, tinyint_col) on table functional.alltypes to user non_owner; Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes to user non_owner Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000) Query progress can be monitored at: http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69 +-+ | summary | +-+ | Privilege(s) have been granted. | +-+ Fetched 1 row(s) in 0.67s {code} was: Currently Impala would create a Ranger policy for each column specified in a GRANT statement. For instance, after the following query, 3 Ranger policies would be created on the Ranger server. This could result in a lot of policies created when there are many columns specified and it may cause Impala's Ranger plug-in a long time to download the policies from the Ranger server. It would be great if Impala only creates one single policy for columns in the same table. 
{code} [localhost:21050] default> grant select(id, bool_col, tinyint_col) on table functional.alltypes to user non_owner; Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes to user non_owner Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000) Query progress can be monitored at: http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69 +-+ | summary | +-+ | Privilege(s) have been granted. | +-+ Fetched 1 row(s) in 0.67s {code} > Create only one Ranger policy for GRANT statement > - > > Key: IMPALA-12554 > URL: https://issues.apache.org/jira/browse/IMPALA-12554 > Project: IMPALA > Issue Type: Improvement >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > Currently Impala would create a Ranger policy for each column specified in a > GRANT statement. For instance, after the following query, 3 Ranger policies > would be created on the Ranger server. This could result in a lot of policies > created when there are many columns specified and it may result in Impala's > Ranger plug-in taking a long time to download the policies from the Ranger > server. It would be great if Impala only creates one single policy for > columns in the same table. > {code:java} > [localhost:21050] default> grant select(id, bool_col, tinyint_col) on table > functional.alltypes to user non_owner; > Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes > to user non_owner > Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000) > Query progress can be monitored at: > http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69 > +-+ > | summary | > +-+ > | Privilege(s) have been granted. | > +-+ > Fetched 1 row(s) in 0.67s > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-3268) Add command "SHOW VIEWS"
[ https://issues.apache.org/jira/browse/IMPALA-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-3268: Description: Currently to get a list of views, user has to: - SHOW TABLES - scan through the output list - SHOW CREATE TABLE view_name to confirm view_name is a view which is tedious. So I would like to request the following: - -SHOW TABLES should only return tables- - SHOW VIEWS should only return views - -add a flag to either above commands to return all tables and views- This will help lots of end users. Edit: Moved the first item and the third item out of the scope of this JIRA to IMPALA-12574 since more discussion may be required. was: Currently to get a list of views, user has to: - SHOW TABLES - scan through the output list - SHOW CREATE TABLE view_name to confirm view_name is a view which is tedious. So I would like to request the following: - SHOW TABLES should only return tables - SHOW VIEWS should only return views - add a flag to either above commands to return all tables and views This will help lots of end users. > Add command "SHOW VIEWS" > > > Key: IMPALA-3268 > URL: https://issues.apache.org/jira/browse/IMPALA-3268 > Project: IMPALA > Issue Type: New Feature > Components: Catalog >Affects Versions: Impala 2.2.4, Impala 2.3.0, Impala 2.5.0 >Reporter: Eric Lin >Assignee: Fang-Yu Rao >Priority: Minor > Labels: usability > > Currently to get a list of views, user has to: > - SHOW TABLES > - scan through the output list > - SHOW CREATE TABLE view_name to confirm view_name is a view > which is tedious. > So I would like to request the following: > - -SHOW TABLES should only return tables- > - SHOW VIEWS should only return views > - -add a flag to either above commands to return all tables and views- > This will help lots of end users. > Edit: Moved the first item and the third item out of the scope of this JIRA > to IMPALA-12574 since more discussion may be required. 
[jira] [Updated] (IMPALA-12574) Consider extending SHOW TABLES statement so it only displays tables
[ https://issues.apache.org/jira/browse/IMPALA-12574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12574: - Summary: Consider extending SHOW TABLES statement so it only displays tables (was: Consider extending SHOW TABLES statement so it only display the tables) > Consider extending SHOW TABLES statement so it only displays tables > -- > > Key: IMPALA-12574 > URL: https://issues.apache.org/jira/browse/IMPALA-12574 > Project: IMPALA > Issue Type: New Feature > Components: Catalog, Frontend >Reporter: Fang-Yu Rao >Priority: Minor > > IMPALA-3268 extended Frontend's API of GetTableNames() such that > GetTableNames() could return the matching tables whose table type is in the > specified set of table types. With this change, it should not be too > difficult to extend the SHOW TABLES statement such that SHOW TABLES could > display only the tables of a specified type (vs. all types of tables). It > would be great to have this functionality.
[jira] [Created] (IMPALA-12574) Consider extending SHOW TABLES statement so it only displays the tables
Fang-Yu Rao created IMPALA-12574: Summary: Consider extending SHOW TABLES statement so it only displays the tables Key: IMPALA-12574 URL: https://issues.apache.org/jira/browse/IMPALA-12574 Project: IMPALA Issue Type: New Feature Components: Catalog, Frontend Reporter: Fang-Yu Rao IMPALA-3268 extended Frontend's API of GetTableNames() such that GetTableNames() could return the matching tables whose table type is in the specified set of table types. With this change, it should not be too difficult to extend the SHOW TABLES statement such that SHOW TABLES could display only the tables of a specified type (vs. all types of tables). It would be great to have this functionality.
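The GetTableNames() extension the description refers to amounts to filtering catalog entries by table type. A sketch under the assumption that each entry is a plain (name, type) pair — the real API operates on the catalog, not on a Python list:

```python
def get_table_names(entries, table_types):
    """Return the names whose table type is in the requested set, mimicking
    the type filtering IMPALA-3268 added to the Frontend's GetTableNames()."""
    return [name for name, ttype in entries if ttype in table_types]

entries = [("alltypes", "TABLE"), ("alltypes_view", "VIEW"), ("t2", "TABLE")]
tables_only = get_table_names(entries, {"TABLE"})  # what SHOW TABLES could show
views_only = get_table_names(entries, {"VIEW"})    # what SHOW VIEWS shows
```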
[jira] [Created] (IMPALA-12554) Create only one Ranger policy for GRANT statement
Fang-Yu Rao created IMPALA-12554: Summary: Create only one Ranger policy for GRANT statement Key: IMPALA-12554 URL: https://issues.apache.org/jira/browse/IMPALA-12554 Project: IMPALA Issue Type: Improvement Reporter: Fang-Yu Rao Assignee: Fang-Yu Rao Currently Impala would create a Ranger policy for each column specified in a GRANT statement. For instance, after the following query, 3 Ranger policies would be created on the Ranger server. This could result in a lot of policies created when there are many columns specified and it may cause Impala's Ranger plug-in to take a long time to download the policies from the Ranger server. It would be great if Impala only creates one single policy for columns in the same table. {code} [localhost:21050] default> grant select(id, bool_col, tinyint_col) on table functional.alltypes to user non_owner; Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes to user non_owner Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000) Query progress can be monitored at: http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69 +-+ | summary | +-+ | Privilege(s) have been granted. | +-+ Fetched 1 row(s) in 0.67s {code}
[jira] [Assigned] (IMPALA-3268) Add command "SHOW VIEWS"
[ https://issues.apache.org/jira/browse/IMPALA-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao reassigned IMPALA-3268: --- Assignee: Fang-Yu Rao > Add command "SHOW VIEWS" > > > Key: IMPALA-3268 > URL: https://issues.apache.org/jira/browse/IMPALA-3268 > Project: IMPALA > Issue Type: New Feature > Components: Catalog >Affects Versions: Impala 2.2.4, Impala 2.3.0, Impala 2.5.0 >Reporter: Eric Lin >Assignee: Fang-Yu Rao >Priority: Minor > Labels: usability > > Currently to get a list of views, user has to: > - SHOW TABLES > - scan through the output list > - SHOW CREATE TABLE view_name to confirm view_name is a view > which is tedious. > So I would like to request the following: > - SHOW TABLES should only return tables > - SHOW VIEWS should only return views > - add a flag to either above commands to return all tables and views > This will help lots of end users. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-12528) test_hdfs_scanner_thread_non_reserved_bytes could occasionally fail
[ https://issues.apache.org/jira/browse/IMPALA-12528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780764#comment-17780764 ] Fang-Yu Rao commented on IMPALA-12528: -- Hi [~rizaon], assigned this JIRA to you since you are more familiar with the corresponding test. Please re-assign the ticket as you see appropriate. Thanks! > test_hdfs_scanner_thread_non_reserved_bytes could occasionally fail > --- > > Key: IMPALA-12528 > URL: https://issues.apache.org/jira/browse/IMPALA-12528 > Project: IMPALA > Issue Type: Bug >Reporter: Fang-Yu Rao >Assignee: Riza Suminto >Priority: Major > Labels: broken-build, flaky-test > > [test_hdfs_scanner_thread_non_reserved_bytes()|https://github.com/apache/impala/blob/master/tests/query_test/test_mem_usage_scaling.py#L379] > could occasionally fail with the following error. > *+Stacktrace+* > {code:java} > E AssertionError: Aggregation of SUM over NumScannerThreadsStarted did not > match expected results. > E EXPECTED VALUE: > E 3 > E > E > E ACTUAL VALUE: > E 1 > {code} > The corresponding test file > [hdfs-scanner-thread-non-reserved-bytes.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/hdfs-scanner-thread-non-reserved-bytes.test] > was recently added in IMPALA-12499.
[jira] [Created] (IMPALA-12528) test_hdfs_scanner_thread_non_reserved_bytes could occasionally fail
Fang-Yu Rao created IMPALA-12528: Summary: test_hdfs_scanner_thread_non_reserved_bytes could occasionally fail Key: IMPALA-12528 URL: https://issues.apache.org/jira/browse/IMPALA-12528 Project: IMPALA Issue Type: Bug Reporter: Fang-Yu Rao Assignee: Riza Suminto [test_hdfs_scanner_thread_non_reserved_bytes()|https://github.com/apache/impala/blob/master/tests/query_test/test_mem_usage_scaling.py#L379] could occasionally fail with the following error. *+Stacktrace+* {code:java} E AssertionError: Aggregation of SUM over NumScannerThreadsStarted did not match expected results. E EXPECTED VALUE: E 3 E E E ACTUAL VALUE: E 1 {code} The corresponding test file [hdfs-scanner-thread-non-reserved-bytes.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/hdfs-scanner-thread-non-reserved-bytes.test] was recently added in IMPALA-12499.
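The failing assertion above is a SUM aggregation of a runtime-profile counter across fragment instances; the check is conceptually this simple. In the sketch each instance profile is modeled as a plain dict — the real test parses Impala's runtime profile text, so the shape here is an assumption.

```python
def sum_counter(instance_profiles, counter_name):
    # SUM over a named counter across all fragment instances, as the test's
    # aggregation verifier does for NumScannerThreadsStarted.
    return sum(p.get(counter_name, 0) for p in instance_profiles)

# Expected: three scanner threads started in total; the flaky run saw only one.
profiles = [{"NumScannerThreadsStarted": 1},
            {"NumScannerThreadsStarted": 1},
            {"NumScannerThreadsStarted": 1}]
total = sum_counter(profiles, "NumScannerThreadsStarted")
```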
[jira] [Commented] (IMPALA-12527) test_metadata_tables could occasionally fail in the s3 build
[ https://issues.apache.org/jira/browse/IMPALA-12527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780556#comment-17780556 ] Fang-Yu Rao commented on IMPALA-12527: -- Hi [~tmate], assigned the JIRA to you since you recently revised the failed test in IMPALA-11996 so you are more familiar with this area. Please re-assign the ticket as you see appropriate. Thanks! > test_metadata_tables could occasionally fail in the s3 build > > > Key: IMPALA-12527 > URL: https://issues.apache.org/jira/browse/IMPALA-12527 > Project: IMPALA > Issue Type: Bug >Reporter: Fang-Yu Rao >Assignee: Tamas Mate >Priority: Major > Labels: broken-build, flaky-test > > We found that > [test_metadata_tables()|https://github.infra.cloudera.com/CDH/Impala/blame/cdw-master-staging/tests/query_test/test_iceberg.py#L1219] > that runs > [iceberg-metadata-tables.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test] > could occasionally fail with the following error message. > It looks like the actual result does not match the expected result for some > columns. 
> Stacktrace > {code} > query_test/test_iceberg.py:1226: in test_metadata_tables > '$OVERWRITE_SNAPSHOT_TS': str(overwrite_snapshot_ts.data[0])}) > common/impala_test_suite.py:751: in run_test_case > self.__verify_results_and_errors(vector, test_section, result, use_db) > common/impala_test_suite.py:587: in __verify_results_and_errors > replace_filenames_with_placeholder) > common/test_result_verifier.py:487: in verify_raw_results > VERIFIER_MAP[verifier](expected, actual) > common/test_result_verifier.py:296: in verify_query_result_is_equal > assert expected_results == actual_results > E assert Comparing QueryTestResults (expected vs actual): > E > row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0 > != > 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/7d479ffb82bfffd3-7ce667e5_544607964_data.0.parq','PARQUET',0,1,351,'NULL',0 > E > row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0 > != > 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/ab4ffd0d75a5a68d-13da0831_1541521750_data.0.parq','PARQUET',0,1,351,'NULL',0 > E > row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0 > != > 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/b04d1095845359f5-f0799bd0_1209897284_data.0.parq','PARQUET',0,1,351,'NULL',0 > E > row_regex:1,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'NULL',NULL > != > 1,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/delete-1b45db885b2bdd56-4023218d0002_1697110314_data.0.parq','PARQUET',0,1,1531,'NULL',NULL > {code} > Specifically, it 
seems the value of the second-last column is different from > the expected value in some rows.
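The `row_regex:` entries in the expected output above are matched as regular expressions against each actual result row, which is why a literal S3 path can satisfy them. A minimal sketch of that matching rule (the real logic lives in tests/common/test_result_verifier.py; this is a simplified stand-in):

```python
import re

ROW_REGEX_PREFIX = "row_regex:"

def row_matches(expected_row, actual_row):
    # Rows prefixed with 'row_regex:' are treated as full-row regular
    # expressions; all other rows must compare equal verbatim.
    if expected_row.startswith(ROW_REGEX_PREFIX):
        pattern = expected_row[len(ROW_REGEX_PREFIX):]
        return re.match(pattern + "$", actual_row) is not None
    return expected_row == actual_row

ok = row_matches(r"row_regex:0,'s3a://.*\.parq','PARQUET',0,1,[1-9]\d*,'',0",
                 "0,'s3a://bucket/data/x.parq','PARQUET',0,1,351,'',0")
```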
[jira] [Updated] (IMPALA-12527) test_metadata_tables could occasionally fail in the s3 build
[ https://issues.apache.org/jira/browse/IMPALA-12527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12527: - Description: We found that [test_metadata_tables()|https://github.infra.cloudera.com/CDH/Impala/blame/cdw-master-staging/tests/query_test/test_iceberg.py#L1219] that runs [iceberg-metadata-tables.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test] could occasionally fail with the following error message. It looks like the actual result does not match the expected result for some columns. Stacktrace {code} query_test/test_iceberg.py:1226: in test_metadata_tables '$OVERWRITE_SNAPSHOT_TS': str(overwrite_snapshot_ts.data[0])}) common/impala_test_suite.py:751: in run_test_case self.__verify_results_and_errors(vector, test_section, result, use_db) common/impala_test_suite.py:587: in __verify_results_and_errors replace_filenames_with_placeholder) common/test_result_verifier.py:487: in verify_raw_results VERIFIER_MAP[verifier](expected, actual) common/test_result_verifier.py:296: in verify_query_result_is_equal assert expected_results == actual_results E assert Comparing QueryTestResults (expected vs actual): E row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0 != 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/7d479ffb82bfffd3-7ce667e5_544607964_data.0.parq','PARQUET',0,1,351,'NULL',0 E row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0 != 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/ab4ffd0d75a5a68d-13da0831_1541521750_data.0.parq','PARQUET',0,1,351,'NULL',0 E 
row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0 != 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/b04d1095845359f5-f0799bd0_1209897284_data.0.parq','PARQUET',0,1,351,'NULL',0 E row_regex:1,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'NULL',NULL != 1,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/delete-1b45db885b2bdd56-4023218d0002_1697110314_data.0.parq','PARQUET',0,1,1531,'NULL',NULL {code} Specifically, it seems the value of the second-to-last column is different from the expected value in some rows. was: We found that [test_metadata_tables()|https://github.infra.cloudera.com/CDH/Impala/blame/cdw-master-staging/tests/query_test/test_iceberg.py#L1219] that runs [iceberg-metadata-tables.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test] could occasionally fail with the following error message. It looks like the actual result does not match the expected result for some columns. 
Stacktrace {code} query_test/test_iceberg.py:1226: in test_metadata_tables '$OVERWRITE_SNAPSHOT_TS': str(overwrite_snapshot_ts.data[0])}) common/impala_test_suite.py:751: in run_test_case self.__verify_results_and_errors(vector, test_section, result, use_db) common/impala_test_suite.py:587: in __verify_results_and_errors replace_filenames_with_placeholder) common/test_result_verifier.py:487: in verify_raw_results VERIFIER_MAP[verifier](expected, actual) common/test_result_verifier.py:296: in verify_query_result_is_equal assert expected_results == actual_results E assert Comparing QueryTestResults (expected vs actual): E row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0 != 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/7d479ffb82bfffd3-7ce667e5_544607964_data.0.parq','PARQUET',0,1,351,'NULL',0 E row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0 != 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/ab4ffd0d75a5a68d-13da0831_1541521750_data.0.parq','PARQUET',0,1,351,'NULL',0 E row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0 != 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/b04d1095845359f5-f0799bd0_1209897284_data.0.parq','PARQUET',0,1,351,'NULL',0 E
[jira] [Created] (IMPALA-12527) test_metadata_tables could occasionally fail in the s3 build
Fang-Yu Rao created IMPALA-12527: Summary: test_metadata_tables could occasionally fail in the s3 build Key: IMPALA-12527 URL: https://issues.apache.org/jira/browse/IMPALA-12527 Project: IMPALA Issue Type: Bug Reporter: Fang-Yu Rao Assignee: Tamas Mate We found that [test_metadata_tables()|https://github.infra.cloudera.com/CDH/Impala/blame/cdw-master-staging/tests/query_test/test_iceberg.py#L1219] that runs [iceberg-metadata-tables.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test] could occasionally fail with the following error message. It looks like the actual result does not match the expected result for some columns. Stacktrace {code} query_test/test_iceberg.py:1226: in test_metadata_tables '$OVERWRITE_SNAPSHOT_TS': str(overwrite_snapshot_ts.data[0])}) common/impala_test_suite.py:751: in run_test_case self.__verify_results_and_errors(vector, test_section, result, use_db) common/impala_test_suite.py:587: in __verify_results_and_errors replace_filenames_with_placeholder) common/test_result_verifier.py:487: in verify_raw_results VERIFIER_MAP[verifier](expected, actual) common/test_result_verifier.py:296: in verify_query_result_is_equal assert expected_results == actual_results E assert Comparing QueryTestResults (expected vs actual): E row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0 != 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/7d479ffb82bfffd3-7ce667e5_544607964_data.0.parq','PARQUET',0,1,351,'NULL',0 E row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0 != 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/ab4ffd0d75a5a68d-13da0831_1541521750_data.0.parq','PARQUET',0,1,351,'NULL',0 E 
row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0 != 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/b04d1095845359f5-f0799bd0_1209897284_data.0.parq','PARQUET',0,1,351,'NULL',0 E row_regex:1,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'NULL',NULL != 1,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/delete-1b45db885b2bdd56-4023218d0002_1697110314_data.0.parq','PARQUET',0,1,1531,'NULL',NULL {code} Specifically, it seems the value of the second-to-last column is different from the expected value in some rows. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
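In the failures quoted above, expected lines prefixed with {{row_regex:}} are treated as regular expressions rather than literal rows, so a single non-matching column fails the whole row. A minimal sketch of that matching rule — the function name and structure here are illustrative, not the actual code in common/test_result_verifier.py:

```python
import re

ROW_REGEX_PREFIX = "row_regex:"


def row_matches(expected, actual):
    """Compare one expected result row against one actual row.

    Rows prefixed with 'row_regex:' must match the entire actual row as a
    regular expression; all other rows must be equal verbatim.
    """
    if expected.startswith(ROW_REGEX_PREFIX):
        pattern = expected[len(ROW_REGEX_PREFIX):]
        return re.fullmatch(pattern, actual) is not None
    return expected == actual
```

Under this rule, the quoted rows fail on more than one column at once: the actual paths begin with /test-warehouse/... instead of the expected s3a://... prefix, and the second-to-last column carries 'NULL' where the pattern expects '' or vice versa.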
[jira] [Commented] (IMPALA-12526) BackendConfig.INSTANCE could be null in the frontend test testResetMetadataDesc
[ https://issues.apache.org/jira/browse/IMPALA-12526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780524#comment-17780524 ] Fang-Yu Rao commented on IMPALA-12526: -- Hi [~stigahuang], assigned this JIRA to you since you are more familiar with the failed frontend test. Please re-assign the ticket as you see appropriate. Thanks! > BackendConfig.INSTANCE could be null in the frontend test > testResetMetadataDesc > --- > > Key: IMPALA-12526 > URL: https://issues.apache.org/jira/browse/IMPALA-12526 > Project: IMPALA > Issue Type: Bug >Reporter: Fang-Yu Rao >Assignee: Quanlong Huang >Priority: Major > Labels: broken-build, flaky-test > > We found that > [BackendConfig.INSTANCE|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java#L265] > could be null in the frontend test > [testResetMetadataDesc()|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/util/CatalogOpUtilTest.java#L65] > and thus > [ResetMetadataStmt#toThrift()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java#L265] > could fail with the following error. > {code} > Cannot invoke "org.apache.impala.service.BackendConfig.getHostname()" because > "org.apache.impala.service.BackendConfig.INSTANCE" is null > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-12526) BackendConfig.INSTANCE could be null in the frontend test testResetMetadataDesc
[ https://issues.apache.org/jira/browse/IMPALA-12526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780523#comment-17780523 ] Fang-Yu Rao commented on IMPALA-12526: -- This issue seems to be the same as IMPALA-11699 but I could not be completely sure. > BackendConfig.INSTANCE could be null in the frontend test > testResetMetadataDesc > --- > > Key: IMPALA-12526 > URL: https://issues.apache.org/jira/browse/IMPALA-12526 > Project: IMPALA > Issue Type: Bug >Reporter: Fang-Yu Rao >Assignee: Quanlong Huang >Priority: Major > Labels: broken-build, flaky-test > > We found that > [BackendConfig.INSTANCE|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java#L265] > could be null in the frontend test > [testResetMetadataDesc()|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/util/CatalogOpUtilTest.java#L65] > and thus > [ResetMetadataStmt#toThrift()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java#L265] > could fail with the following error. > {code} > Cannot invoke "org.apache.impala.service.BackendConfig.getHostname()" because > "org.apache.impala.service.BackendConfig.INSTANCE" is null > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-12526) BackendConfig.INSTANCE could be null in the frontend test testResetMetadataDesc
Fang-Yu Rao created IMPALA-12526: Summary: BackendConfig.INSTANCE could be null in the frontend test testResetMetadataDesc Key: IMPALA-12526 URL: https://issues.apache.org/jira/browse/IMPALA-12526 Project: IMPALA Issue Type: Bug Reporter: Fang-Yu Rao Assignee: Quanlong Huang We found that [BackendConfig.INSTANCE|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java#L265] could be null in the frontend test [testResetMetadataDesc()|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/util/CatalogOpUtilTest.java#L65] and thus [ResetMetadataStmt#toThrift()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java#L265] could fail with the following error. {code} Cannot invoke "org.apache.impala.service.BackendConfig.getHostname()" because "org.apache.impala.service.BackendConfig.INSTANCE" is null {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
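The quoted error is the classic uninitialized-singleton failure: the test reaches ResetMetadataStmt#toThrift() before anything has populated BackendConfig.INSTANCE. The shape of the bug, and an explicit guard for it, can be sketched in a few lines of Python (names mirror the Java classes above, but this is an illustration, not Impala code):

```python
class BackendConfig:
    """Process-wide configuration singleton; stays None until create() runs."""
    INSTANCE = None

    def __init__(self, hostname):
        self.hostname = hostname

    @classmethod
    def create(cls, hostname):
        cls.INSTANCE = cls(hostname)


def to_thrift_hostname():
    # Without this guard, dereferencing the singleton before create() fails
    # with an unhelpful error -- the Python analogue of the
    # NullPointerException quoted in the issue above.
    if BackendConfig.INSTANCE is None:
        raise RuntimeError("BackendConfig.INSTANCE has not been initialized")
    return BackendConfig.INSTANCE.hostname
```

A frontend test that calls code like to_thrift_hostname() needs setup that runs create() first, or the guard trips.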
[jira] [Created] (IMPALA-12525) statestore.active-status did not reach value True in 120s
Fang-Yu Rao created IMPALA-12525: Summary: statestore.active-status did not reach value True in 120s Key: IMPALA-12525 URL: https://issues.apache.org/jira/browse/IMPALA-12525 Project: IMPALA Issue Type: Bug Reporter: Fang-Yu Rao Assignee: Wenzhe Zhou We found that it's possible that [statestore.active-status|https://github.com/apache/impala/blob/master/tests/custom_cluster/test_statestored_ha.py#L452] could not reach value True in 120s. *+Error Message+* {code:java} AssertionError: Metric statestore.active-status did not reach value True in 120s. Dumping debug webpages in JSON format... Dumped memz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/memz.json Dumped metrics JSON to $IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/metrics.json Dumped queries JSON to $IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/queries.json Dumped sessions JSON to $IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/sessions.json Dumped threadz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/threadz.json Dumped rpcz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/rpcz.json Dumping minidumps for impalads/catalogds... Dumped minidump for Impalad PID 32539 Dumped minidump for Impalad PID 32543 Dumped minidump for Impalad PID 32550 Dumped minidump for Catalogd PID 32460 {code} *+Stacktrace+* {code:java} custom_cluster/test_statestored_ha.py:500: in test_statestored_manual_failover self.__test_statestored_manual_failover(second_failover=True) custom_cluster/test_statestored_ha.py:452: in __test_statestored_manual_failover "statestore.active-status", expected_value=True, timeout=120) common/impala_service.py:144: in wait_for_metric_value self.__metric_timeout_assert(metric_name, expected_value, timeout) common/impala_service.py:213: in __metric_timeout_assert assert 0, assert_string E AssertionError: Metric statestore.active-status did not reach value True in 120s. 
E Dumping debug webpages in JSON format... E Dumped memz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/memz.json E Dumped metrics JSON to $IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/metrics.json E Dumped queries JSON to $IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/queries.json E Dumped sessions JSON to $IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/sessions.json E Dumped threadz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/threadz.json E Dumped rpcz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/rpcz.json E Dumping minidumps for impalads/catalogds... E Dumped minidump for Impalad PID 32539 E Dumped minidump for Impalad PID 32543 E Dumped minidump for Impalad PID 32550 E Dumped minidump for Catalogd PID 32460 {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
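wait_for_metric_value() in the stack trace above is a poll-until-timeout helper. A minimal Python sketch of that pattern, assuming a metric getter callable — this is an illustration, not the actual code in common/impala_service.py:

```python
import time


def wait_for_metric_value(get_metric, expected_value, timeout_s, interval_s=0.5):
    """Poll get_metric() until it returns expected_value or timeout_s elapses.

    Returns the final value on success; raises AssertionError on timeout,
    mirroring the "did not reach value True in 120s" failure above.
    """
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        value = get_metric()
        if value == expected_value:
            return value
        time.sleep(interval_s)
    raise AssertionError(
        "Metric did not reach value %s in %ss" % (expected_value, timeout_s))
```

The flakiness then comes down to whether the watched process (here, the statestore failing over to active) can reach the expected state inside the polling window.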
[jira] [Updated] (IMPALA-12522) test_alter_table_recover could finish in less than 10 seconds with JDK 17 when enable_async_ddl_execution is False
[ https://issues.apache.org/jira/browse/IMPALA-12522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12522: - Priority: Critical (was: Major) > test_alter_table_recover could finish less than 10 seconds with JDK 17 when > enable_async_ddl_execution is False > --- > > Key: IMPALA-12522 > URL: https://issues.apache.org/jira/browse/IMPALA-12522 > Project: IMPALA > Issue Type: Test >Reporter: Fang-Yu Rao >Assignee: Joe McDonnell >Priority: Critical > Labels: broken-build, flaky-test > > We found that > [test_alter_table_recover()|https://github.com/apache/impala/blame/master/tests/metadata/test_ddl.py#L1026] > could finish the execution within 10 seconds with JDK 17 when > enable_async_ddl_execution is False and thus the check in the [else > branch|https://github.com/apache/impala/blame/master/tests/metadata/test_ddl.py#L1079C12-L1079C12] > could fail. Don't know it has something to do with JDK but maybe we could > reduce the expected execution time a little bit to make the test less flaky. > {code} > # In sync mode: > # The entire DDL is processed in the exec step with delay. exec_time > should be > # more than 10 seconds. > # > # In async mode: > # The compilation of DDL is processed in the exec step without delay. > And the > # processing of the DDL plan is in wait step with delay. The wait time > should > # definitely take more time than 10 seconds. > if enable_async_ddl: > assert(wait_time >= 10) > else: > assert(exec_time >= 10) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-12522) test_alter_table_recover could finish in less than 10 seconds with JDK 17 when enable_async_ddl_execution is False
[ https://issues.apache.org/jira/browse/IMPALA-12522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780105#comment-17780105 ] Fang-Yu Rao commented on IMPALA-12522: -- Hi [~joemcdonnell], assigned this JIRA to you since you helped review [IMPALA-10811|https://gerrit.cloudera.org/c/17872/38/tests/metadata/test_ddl.py#1012] that added this test. Please reassign the JIRA as you see appropriate. Thanks! > test_alter_table_recover could finish less than 10 seconds with JDK 17 when > enable_async_ddl_execution is False > --- > > Key: IMPALA-12522 > URL: https://issues.apache.org/jira/browse/IMPALA-12522 > Project: IMPALA > Issue Type: Test >Reporter: Fang-Yu Rao >Assignee: Joe McDonnell >Priority: Major > Labels: broken-build, flaky-test > > We found that > [test_alter_table_recover()|https://github.com/apache/impala/blame/master/tests/metadata/test_ddl.py#L1026] > could finish the execution within 10 seconds with JDK 17 when > enable_async_ddl_execution is False and thus the check in the [else > branch|https://github.com/apache/impala/blame/master/tests/metadata/test_ddl.py#L1079C12-L1079C12] > could fail. Don't know it has something to do with JDK but maybe we could > reduce the expected execution time a little bit to make the test less flaky. > {code} > # In sync mode: > # The entire DDL is processed in the exec step with delay. exec_time > should be > # more than 10 seconds. > # > # In async mode: > # The compilation of DDL is processed in the exec step without delay. > And the > # processing of the DDL plan is in wait step with delay. The wait time > should > # definitely take more time than 10 seconds. > if enable_async_ddl: > assert(wait_time >= 10) > else: > assert(exec_time >= 10) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-12522) test_alter_table_recover could finish in less than 10 seconds with JDK 17 when enable_async_ddl_execution is False
Fang-Yu Rao created IMPALA-12522: Summary: test_alter_table_recover could finish in less than 10 seconds with JDK 17 when enable_async_ddl_execution is False Key: IMPALA-12522 URL: https://issues.apache.org/jira/browse/IMPALA-12522 Project: IMPALA Issue Type: Test Reporter: Fang-Yu Rao Assignee: Joe McDonnell We found that [test_alter_table_recover()|https://github.com/apache/impala/blame/master/tests/metadata/test_ddl.py#L1026] could finish the execution within 10 seconds with JDK 17 when enable_async_ddl_execution is False and thus the check in the [else branch|https://github.com/apache/impala/blame/master/tests/metadata/test_ddl.py#L1079C12-L1079C12] could fail. We don't know whether it is related to the JDK, but maybe we could reduce the expected execution time a little bit to make the test less flaky. {code} # In sync mode: # The entire DDL is processed in the exec step with delay. exec_time should be # more than 10 seconds. # # In async mode: # The compilation of DDL is processed in the exec step without delay. And the # processing of the DDL plan is in wait step with delay. The wait time should # definitely take more time than 10 seconds. if enable_async_ddl: assert(wait_time >= 10) else: assert(exec_time >= 10) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-12500) TestObservability.test_global_exchange_counters is flaky
[ https://issues.apache.org/jira/browse/IMPALA-12500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17778839#comment-17778839 ] Fang-Yu Rao commented on IMPALA-12500: -- Hi [~csringhofer], assigned this JIRA to you since you recently revised the test at [IMPALA-12430|https://github.com/apache/impala/commit/fb2d2b27641a95f51b6789639fab73b60abd7bc5#diff-a317a4067b5728a2d0af9839c1dce94710e7bd50825ceffc0a3c88aca3e27de3R553] and thus may be more familiar with the test. Please feel free to reassign the JIRA as you see fit. Thanks! > TestObservability.test_global_exchange_counters is flaky > > > Key: IMPALA-12500 > URL: https://issues.apache.org/jira/browse/IMPALA-12500 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.4.0 >Reporter: Joe McDonnell >Assignee: Csaba Ringhofer >Priority: Critical > Labels: broken-build, flaky > > There have been intermittent failures on this test with the following symptom: > {noformat} > query_test/test_observability.py:564: in test_global_exchange_counters > assert "ExchangeScanRatio: 4.63" in profile > E assert 'ExchangeScanRatio: 4.63' in 'Query > (id=c04b974db37e7046:b5fe4dea):\n DEBUG MODE WARNING: Query profile > created while running a DEBUG buil...: 0.000ns\n - WriteIoBytes: > 0\n - WriteIoOps: 0 (0)\n - WriteIoWaitTime: > 0.000ns\n' > -- executing against localhost:21000 > select count(*), sleep(50) from tpch_parquet.orders o > inner join tpch_parquet.lineitem l on o.o_orderkey = l.l_orderkey > group by o.o_clerk limit 10; > -- 2023-10-05 19:47:29,817 INFO MainThread: Started query > c04b974db37e7046:b5fe4dea{noformat} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-12500) TestObservability.test_global_exchange_counters is flaky
[ https://issues.apache.org/jira/browse/IMPALA-12500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao reassigned IMPALA-12500: Assignee: Fang-Yu Rao > TestObservability.test_global_exchange_counters is flaky > > > Key: IMPALA-12500 > URL: https://issues.apache.org/jira/browse/IMPALA-12500 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.4.0 >Reporter: Joe McDonnell >Assignee: Fang-Yu Rao >Priority: Critical > Labels: broken-build, flaky > > There have been intermittent failures on this test with the following symptom: > {noformat} > query_test/test_observability.py:564: in test_global_exchange_counters > assert "ExchangeScanRatio: 4.63" in profile > E assert 'ExchangeScanRatio: 4.63' in 'Query > (id=c04b974db37e7046:b5fe4dea):\n DEBUG MODE WARNING: Query profile > created while running a DEBUG buil...: 0.000ns\n - WriteIoBytes: > 0\n - WriteIoOps: 0 (0)\n - WriteIoWaitTime: > 0.000ns\n' > -- executing against localhost:21000 > select count(*), sleep(50) from tpch_parquet.orders o > inner join tpch_parquet.lineitem l on o.o_orderkey = l.l_orderkey > group by o.o_clerk limit 10; > -- 2023-10-05 19:47:29,817 INFO MainThread: Started query > c04b974db37e7046:b5fe4dea{noformat} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-12500) TestObservability.test_global_exchange_counters is flaky
[ https://issues.apache.org/jira/browse/IMPALA-12500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao reassigned IMPALA-12500: Assignee: Csaba Ringhofer (was: Fang-Yu Rao) > TestObservability.test_global_exchange_counters is flaky > > > Key: IMPALA-12500 > URL: https://issues.apache.org/jira/browse/IMPALA-12500 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.4.0 >Reporter: Joe McDonnell >Assignee: Csaba Ringhofer >Priority: Critical > Labels: broken-build, flaky > > There have been intermittent failures on this test with the following symptom: > {noformat} > query_test/test_observability.py:564: in test_global_exchange_counters > assert "ExchangeScanRatio: 4.63" in profile > E assert 'ExchangeScanRatio: 4.63' in 'Query > (id=c04b974db37e7046:b5fe4dea):\n DEBUG MODE WARNING: Query profile > created while running a DEBUG buil...: 0.000ns\n - WriteIoBytes: > 0\n - WriteIoOps: 0 (0)\n - WriteIoWaitTime: > 0.000ns\n' > -- executing against localhost:21000 > select count(*), sleep(50) from tpch_parquet.orders o > inner join tpch_parquet.lineitem l on o.o_orderkey = l.l_orderkey > group by o.o_clerk limit 10; > -- 2023-10-05 19:47:29,817 INFO MainThread: Started query > c04b974db37e7046:b5fe4dea{noformat} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10712) SET OWNER ROLE of a database/table/view is not supported when Ranger is the authorization provider
[ https://issues.apache.org/jira/browse/IMPALA-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775048#comment-17775048 ] Fang-Yu Rao commented on IMPALA-10712: -- It looks like I created a JIRA more than 2 years ago for the same issue. > SET OWNER ROLE of a database/table/view is not supported when > Ranger is the authorization provider > -- > > Key: IMPALA-10712 > URL: https://issues.apache.org/jira/browse/IMPALA-10712 > Project: IMPALA > Issue Type: Improvement >Affects Versions: Impala 4.0.0 >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > We found that {{SET OWNER ROLE}} of a database, table, or a view is not > supported when Ranger is the authorization provider. > In the case of setting the owner of a database to a given role, when Ranger is > the authorization provider, we found that after executing {{ALTER DATABASE > SET OWNER ROLE }}, we will hit the non-null check > for the given role at > [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/AlterDbSetOwnerStmt.java#L59] > due to the fact that the {{AuthorizationPolicy}} returned from > {{getAuthPolicy()}} does not cache any policy-related information if the > authorization provider is Ranger, which is different from the case when > Sentry was the authorization provider. > When Ranger is the authorization provider, the currently existing roles are > cached by {{RangerImpalaPlugin}}. Therefore, to address the issue above, we > could probably invoke {{getRoles().getRangerRoles()}} provided by the > {{RangerImpalaPlugin}} to retrieve the set of existing roles, similar to what > is done at > [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpaladAuthorizationManager.java#L135]. > Tagged [~joemcdonnell] and [~shajini] since I realized this when reviewing > Joe's comment at > [https://gerrit.cloudera.org/c/17469/1/docs/topics/impala_alter_database.xml#b68]. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-11466) Add jetty-server as an allowed dependency
[ https://issues.apache.org/jira/browse/IMPALA-11466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao resolved IMPALA-11466. -- Fix Version/s: Impala 4.3.0 Resolution: Fixed Resolve this JIRA since the fix has been merged thanks to [~rizaon]. > Add jetty-server as an allowed dependency > - > > Key: IMPALA-11466 > URL: https://issues.apache.org/jira/browse/IMPALA-11466 > Project: IMPALA > Issue Type: Task >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > Fix For: Impala 4.3.0 > > > We found after HIVE-21456, the instantiation of HiveMetaStoreClient requires > the class of org.eclipse.jetty.server.Connector, which is a banned dependency > of impala-frontend. This resulted in the failure of the FE test > testTestCaseImport() since it needs to instantiate a > HiveMetaStoreClient. > We should add the required dependency so that the test could be run. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-12250) Remove deprecated Ranger configuration properties after RANGER-2895
[ https://issues.apache.org/jira/browse/IMPALA-12250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao resolved IMPALA-12250. -- Resolution: Fixed Resolve this since the Ranger artifact we are using already contains RANGER-2895. > Remove deprecated Ranger configuration properties after RANGER-2895 > --- > > Key: IMPALA-12250 > URL: https://issues.apache.org/jira/browse/IMPALA-12250 > Project: IMPALA > Issue Type: Task >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > In IMPALA-12248, we added 3 new Ranger configuration properties that will be > required after we start using a build that includes > [RANGER-2895|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05] > in order to start Ranger's HTTP server. > Recall that a Ranger configuration property was deprecated in > [RANGER-2895|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05], > i.e., > [ranger.jpa.jdbc.idleconnectiontestperiod|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-9669116dca1e5c9fffdb2c81d4d9ac57b489131e90b89ff17b56801131bad5a6L419]. > Thus, we should also remove it from > [ranger-admin-default-site.xml.template|https://github.com/apache/impala/blob/master/testdata/cluster/ranger/ranger-admin-default-site.xml.template] > after starting using a build that includes > [RANGER-2895|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-11498) Change port range of TEZ's web UI server after TEZ-4347
[ https://issues.apache.org/jira/browse/IMPALA-11498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao resolved IMPALA-11498. -- Resolution: Fixed Resolve the issue since the fix has been merged. > Change port range of TEZ's web UI server after TEZ-4347 > --- > > Key: IMPALA-11498 > URL: https://issues.apache.org/jira/browse/IMPALA-11498 > Project: IMPALA > Issue Type: Task > Components: Frontend >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > After TEZ-4347, by default TEZ would attempt to start a web UI server before > opening a session. The default port range for the server specified in > [TezConfiguration.java|https://github.infra.cloudera.com/CDH/tez/blob/cdw-master/tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java#L1823] > (in the TEZ repository) is "5-50050", which does not seem to be a good > choice in Impala's testing environment in that there are always some other > client programs holding those ports when TEZ attempts to start its web UI > server. As a result, TEZ could not bind a port in the port range to start its > web UI > server, resulting in the TEZ session not being created. > We should specify a better port range for TEZ once we start using a TEZ > dependency with TEZ-4347. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
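The failure mode described above — every port in the configured range already held by other programs — can be demonstrated with a small probe that scans a range for a bindable port; when the scan comes back empty, a server configured with that range cannot start. This is an illustrative sketch, not Tez's actual implementation:

```python
import socket


def find_bindable_port(start, end, host="127.0.0.1"):
    """Return the first port in [start, end] that can be bound, or None if
    every port in the range is already occupied (the situation this issue
    describes for TEZ's web UI server)."""
    for port in range(start, end + 1):
        try:
            # The socket is closed on exit from the with-block, so the port
            # is only probed, not held.
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
                s.bind((host, port))
                return port
        except OSError:
            continue
    return None
```

Picking a wider or less commonly used range simply makes it more likely that this scan finds a free port in a busy test environment.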
[jira] [Resolved] (IMPALA-12248) Add required Ranger configuration properties after RANGER-2895
[ https://issues.apache.org/jira/browse/IMPALA-12248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao resolved IMPALA-12248. -- Resolution: Fixed Resolve the issue since the fix has been merged. > Add required Ranger configuration properties after RANGER-2895 > -- > > Key: IMPALA-12248 > URL: https://issues.apache.org/jira/browse/IMPALA-12248 > Project: IMPALA > Issue Type: Task >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > [RANGER-2895|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e] > added and removed some configuration properties. > [Three new configuration properties were > added|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05]. > We found that once we bump up the build number to include RANGER-2895 and if > those new properties do not exist in > [ranger-admin-default-site.xml.template|https://github.com/apache/impala/blob/master/testdata/cluster/ranger/ranger-admin-default-site.xml.template] > or > [ranger-admin-site.xml.template|https://github.com/apache/impala/blob/master/testdata/cluster/ranger/ranger-admin-site.xml.template] > then the produced site files for Ranger will not contain those new > properties, resulting in some error message like the following in > catalina.log. As a result, Ranger's HTTP server could not be properly started. 
> {code:java} > 23/06/25 04:46:01 ERROR context.ContextLoader: Context initialization failed > org.springframework.beans.factory.BeanDefinitionStoreException: Invalid bean > definition with name 'defaultDataSource' defined in ServletContext resource > [/META-INF/applicationContext.xml]: Could not resolve placeholder > 'ranger.jpa.jdbc.idletimeout' in value "${ranger.jpa.jdbc.idletimeout}"; > nested exception is java.lang.IllegalArgumentException: Could not resolve > placeholder 'ranger.jpa.jdbc.idletimeout' in value > "${ranger.jpa.jdbc.idletimeout}" > at > {code} > There are also some configuration properties removed in RANGER-2895, e.g., > [ranger.jpa.jdbc.idleconnectiontestperiod|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05L136]. > In this regard, we could probably add these 3 new properties first and then > remove the unnecessary properties once we have bumped up the build number > that includes RANGER-2895.
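The Spring error quoted above comes from an unresolved `${...}` placeholder in the generated site file. A minimal Python sketch of that resolution step (a simplification; the real resolution happens inside Spring, and `resolve` is a hypothetical stand-in):

```python
import re

# Spring-style ${key} placeholders.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")

def resolve(value, properties):
    """Substitute ${key} placeholders; fail like Spring when a key is missing."""
    def sub(match):
        key = match.group(1)
        if key not in properties:
            raise ValueError("Could not resolve placeholder '%s' in value \"%s\""
                             % (key, value))
        return properties[key]
    return PLACEHOLDER.sub(sub, value)

# A site file generated from a stale template lacks the new property:
site = {"ranger.jpa.jdbc.url": "jdbc:mysql://localhost/ranger"}
print(resolve("${ranger.jpa.jdbc.url}", site))        # resolves fine
# resolve("${ranger.jpa.jdbc.idletimeout}", site)     # would raise ValueError
```

Adding the three new properties to the templates makes the lookup succeed, which is exactly what IMPALA-12248 does.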
[jira] [Reopened] (IMPALA-12250) Remove deprecated Ranger configuration properties after RANGER-2895
[ https://issues.apache.org/jira/browse/IMPALA-12250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao reopened IMPALA-12250: -- Sorry I meant to close IMPALA-12248. > Remove deprecated Ranger configuration properties after RANGER-2895 > --- > > Key: IMPALA-12250 > URL: https://issues.apache.org/jira/browse/IMPALA-12250 > Project: IMPALA > Issue Type: Task >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > In IMPALA-12248, we added 3 new Ranger configuration properties that will be > required after we start using a build that includes > [RANGER-2895|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05] > in order to start Ranger's HTTP server. > Recall that a Ranger configuration property was deprecated in > [RANGER-2895|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05], > i.e., > [ranger.jpa.jdbc.idleconnectiontestperiod|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-9669116dca1e5c9fffdb2c81d4d9ac57b489131e90b89ff17b56801131bad5a6L419]. > Thus, we should also remove it from > [ranger-admin-default-site.xml.template|https://github.com/apache/impala/blob/master/testdata/cluster/ranger/ranger-admin-default-site.xml.template] > after we start using a build that includes > [RANGER-2895|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05].
[jira] [Resolved] (IMPALA-12250) Remove deprecated Ranger configuration properties after RANGER-2895
[ https://issues.apache.org/jira/browse/IMPALA-12250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao resolved IMPALA-12250. -- Resolution: Fixed Resolve the issue since the fix has been merged. > Remove deprecated Ranger configuration properties after RANGER-2895 > --- > > Key: IMPALA-12250 > URL: https://issues.apache.org/jira/browse/IMPALA-12250 > Project: IMPALA > Issue Type: Task >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > In IMPALA-12248, we added 3 new Ranger configuration properties that will be > required after we start using a build that includes > [RANGER-2895|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05] > in order to start Ranger's HTTP server. > Recall that a Ranger configuration property was deprecated in > [RANGER-2895|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05], > i.e., > [ranger.jpa.jdbc.idleconnectiontestperiod|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-9669116dca1e5c9fffdb2c81d4d9ac57b489131e90b89ff17b56801131bad5a6L419]. > Thus, we should also remove it from > [ranger-admin-default-site.xml.template|https://github.com/apache/impala/blob/master/testdata/cluster/ranger/ranger-admin-default-site.xml.template] > after we start using a build that includes > [RANGER-2895|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05].
[jira] [Resolved] (IMPALA-12311) Extra newlines are produced when an end-to-end test is run with update_results
[ https://issues.apache.org/jira/browse/IMPALA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao resolved IMPALA-12311. -- Resolution: Fixed Resolve the issue since the fix has been merged. > Extra newlines are produced when an end-to-end test is run with > update_results > --- > > Key: IMPALA-12311 > URL: https://issues.apache.org/jira/browse/IMPALA-12311 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 4.1.2 >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Minor > Labels: test-infra > > We found that extra newlines are produced in the updated golden file when the > actual results do not match the expected results specified in the original > golden file. > Take > [TestDecimalExprs::test_exprs()|https://github.com/apache/impala/blob/master/tests/query_test/test_decimal_queries.py#L75] > for example; this test runs the test cases in > [decimal-exprs.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test]. > Suppose that we modify the expected error message at > [https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test#L107] > from "UDF WARNING: Decimal expression overflowed, returning NULL" to the > following (the original string with an additional "x"). > {noformat} > UDF WARNING: Decimal expression overflowed, returning NULLx > {noformat} > Then we run this test using the following command with the command line > argument '--update_results'. > {code:java} > $IMPALA_HOME/bin/impala-py.test \ > --update_results \ > --junitxml=$IMPALA_EE_TEST_LOGS_DIR/results/test_decimal.xml \ > $IMPALA_HOME/tests/query_test/test_decimal_queries.py::TestDecimalExprs::test_exprs > {code} > In $IMPALA_HOME/logs/ee_tests/QueryTest_decimal-exprs.test, we will find > the following subsection corresponding to the query. There are 3 additional > newlines in the subsection of 'ERRORS'. 
> {noformat} > ERRORS > UDF WARNING: Decimal expression overflowed, returning NULL > > {noformat} > One of the newlines was produced in > [join_section_lines()|https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298]. > This function is called when the actual results do not match the expected > results in the following 4 places. > # [test_section['ERRORS'] = > join_section_lines(actual_errors)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L398]. > # [test_section['TYPES'] = join_section_lines(\[', > '.join(actual_types)\])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L429]. > # [test_section['LABELS'] = join_section_lines(\[', > '.join(actual_labels)\])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L451]. > # [test_section[result_section] = > join_section_lines(actual.result_list)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L489]. > Thus, we also have the same issue for subsections like TYPES, LABELS, and > RESULTS in such a scenario (actual results do not match expected ones). It > would be good if a user/developer does not have to manually remove those > extra newlines when trying to generate the golden files for new test files.
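One way the extra newline can arise is sketched below. This is a hypothetical reconstruction of the shape of join_section_lines(), not the actual Impala code:

```python
def join_section_lines(lines):
    # Hypothetical: appending '\n' after the join leaves a trailing blank
    # line in the rewritten section of the golden file.
    return '\n'.join(lines) + '\n'

def join_section_lines_no_trailing(lines):
    # Dropping the trailing '\n' keeps the section byte-identical on rewrite.
    return '\n'.join(lines)

section = ["UDF WARNING: Decimal expression overflowed, returning NULL"]
print(repr(join_section_lines(section)))
print(repr(join_section_lines_no_trailing(section)))
```

Since the four call sites listed above all funnel through the same helper, removing the trailing newline there fixes ERRORS, TYPES, LABELS, and RESULTS at once.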
[jira] [Created] (IMPALA-12423) Impala shell should allow a user to set up query options when the underlying protocol is strict_hs2_protocol
Fang-Yu Rao created IMPALA-12423: Summary: Impala shell should allow a user to set up query options when the underlying protocol is strict_hs2_protocol Key: IMPALA-12423 URL: https://issues.apache.org/jira/browse/IMPALA-12423 Project: IMPALA Issue Type: New Feature Reporter: Fang-Yu Rao Assignee: Fang-Yu Rao Currently when we use the Impala shell to connect to a service, e.g., HiveServer2, via the strict HS2 protocol, we are not able to execute the SET statement or to set up the value of a query option as shown in the following. It would be much more convenient if a user is at least able to set up the value of a query option in the Impala shell when the Impala shell is used to connect to an external frontend that sends query plans to the Impala server for execution. {code:java} fangyurao@fangyu-upstream-dev:~$ impala-shell.sh -i 'localhost:11050' --strict_hs2_protocol Starting Impala Shell with no authentication using Python 2.7.16 WARNING: Unable to track live progress with strict_hs2_protocol LDAP password for fangyurao: Opened TCP connection to localhost:11050 Connected to localhost:11050 Server version: N/A *** Welcome to the Impala shell. (Impala Shell v4.3.0-SNAPSHOT (2f06a7b) built on Tue Sep 5 14:14:24 PDT 2023) To see how Impala will plan to run your query without actually executing it, use the EXPLAIN command. You can change the level of detail in the EXPLAIN output by setting the EXPLAIN_LEVEL query option. *** [localhost:11050] default> set; Query options (defaults shown in []): No options available. Shell Options WRITE_DELIMITED: False VERBOSE: True VERTICAL: False LIVE_SUMMARY: False OUTPUT_FILE: None DELIMITER: \t LIVE_PROGRESS: False Variables: No variables defined. [localhost:11050] default> set num_nodes=2; Unknown query option: num_nodes Available query options, with their values (defaults shown in []): Query options (defaults shown in []): No options available. 
Shell Options WRITE_DELIMITED: False VERBOSE: True VERTICAL: False LIVE_SUMMARY: False OUTPUT_FILE: None DELIMITER: \t LIVE_PROGRESS: False {code}
[jira] [Created] (IMPALA-12329) Access type of Ranger audit event being set up in more than one place inconsistently
Fang-Yu Rao created IMPALA-12329: Summary: Access type of Ranger audit event being set up in more than one place inconsistently Key: IMPALA-12329 URL: https://issues.apache.org/jira/browse/IMPALA-12329 Project: IMPALA Issue Type: Bug Reporter: Fang-Yu Rao Assignee: Fang-Yu Rao We found that for some queries, the access type of a Ranger audit event could be set inconsistently in more than one place. Take the TRUNCATE TABLE statement for example. During the authorization of this query, the access type of the corresponding Ranger audit event would first be set to "update" at [RangerAuthorizationChecker#authorizeResource()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java#L664]. But later, at [RangerAuthorizationChecker#updateAuditEvents()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java#L645], the access type will be set to "insert", which is the value of privilege.name().toLowerCase(). We should not have to set the access type differently in 2 places.
[jira] [Comment Edited] (IMPALA-12311) Extra newlines are produced when an end-to-end test is run with update_results
[ https://issues.apache.org/jira/browse/IMPALA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747223#comment-17747223 ] Fang-Yu Rao edited comment on IMPALA-12311 at 7/26/23 1:27 AM: --- I have verified that for the subsections of ERRORS, TYPES, and LABELS, additional newlines at the end of a subsection are okay. However, any additional newline added to the subsection of RESULTS would fail the test case. According to what we have seen here, it should be safe to just not output a trailing newline in [join_section_lines()|https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L294]. +*Subsection of ERRORS*+ For the subsection of ERRORS, we found that the expected error message is post-processed by the following at [https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L328-L333], which removes extra newlines. Note that for a line 'expected_error' containing only a newline, 'expected_error' evaluates to false. {code:java} for expected_error in expected_errors: if not expected_error: continue if ROW_REGEX_PREFIX.match(expected_error): converted_expected_errors.append(expected_error) else: converted_expected_errors.append("'%s'" % expected_error) {code} On the other hand, the actual results (exec_result.log at [https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298]) contain 2 newlines. Since we use the following to create the respective QueryTestResult in [verify_errors()|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L324], additional newlines are also removed. Note that a line 'l' evaluates to false if it's a newline. {code:java} actual = QueryTestResult(["'%s'" % l for l in actual_errors if l], ['STRING'], ['DUMMY_LABEL'], order_matters=False) {code} That is, additional newlines are removed from both the expected error message and the actual error message. Hence, additional newlines added in the subsection of ERRORS are okay. 
*+Subsection of TYPES+* Moreover, additional newlines in the subsection of TYPES are okay since we remove additional newlines when constructing the expected line of types at [https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L406]. {code:java} expected_types = [c.strip().upper() for c in remove_comments(section).rstrip('\n').split(',')] {code} *+Subsection of LABELS+* Additional newlines in the subsection of LABELS are okay as well because we use the following to construct the expected line of labels at [https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L446]. {code:java} expected_labels = [c.strip().upper() for c in test_section['LABELS'].split(',')] {code} was (Author: fangyurao): I have verified that for the subsections of ERRORS, TYPES, and LABELS, additional newlines in the end of a subsection are okay. However, any additional newline added to the subsection of RESULTS would fail the test case. According to what we have seen here, it should be safe to just not output a trailing newline in [join_section_lines()|https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L294] +*Subsection of ERRORS*+ For the subsection of ERRORS, we found that the expected error message is post-processed by the following at [https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L328-L333], which removes extra newlines. Note that for a line 'expected_error' containing only a newline, 'expected_error' evaluates to false. {code:java} for expected_error in expected_errors: if not expected_error: continue if ROW_REGEX_PREFIX.match(expected_error): converted_expected_errors.append(expected_error) else: converted_expected_errors.append("'%s'" % expected_error) {code} On the other hand, the actual results (exec_result.log at [https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298]) contain 2 newlines. 
Since we use the following to create the respective QueryTestResult in [verify_errors()|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L324], additional newlines are also removed. Note that a line 'l' evaluates to false if it's a newline. {code:java} actual = QueryTestResult(["'%s'" % l for l in actual_errors if l], ['STRING'], ['DUMMY_LABEL'], order_matters=False) {code} That is, additional newlines are removed from both the expected error message and the actual error message. Hence, additional newlines added in the subsection of ERRORS are okay. *+Subsection of TYPES+* Moreover, additional newlines in the subsection of TYPES are okay since we remove additional newlines when constructing the expected line of types at
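The filtering quoted in the comment thread above can be exercised in isolation. Below is a runnable excerpt of just the list comprehension from verify_errors(), with the surrounding QueryTestResult machinery omitted:

```python
# Actual error lines as read back from exec_result.log, including the
# trailing blank lines produced by the extra newlines.
actual_errors = [
    "UDF WARNING: Decimal expression overflowed, returning NULL",
    "",
    "",
]
# Falsy lines (empty strings / bare newlines) are filtered out, so extra
# trailing newlines cannot change the comparison:
quoted = ["'%s'" % l for l in actual_errors if l]
print(quoted)  # ["'UDF WARNING: Decimal expression overflowed, returning NULL'"]
```

This is why trailing newlines in the ERRORS subsection are harmless: both sides of the comparison discard blank lines before matching.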
[jira] [Commented] (IMPALA-12311) Extra newlines are produced when an end-to-end test is run with update_results
[ https://issues.apache.org/jira/browse/IMPALA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747223#comment-17747223 ] Fang-Yu Rao commented on IMPALA-12311: -- I have verified that for the subsections of ERRORS, TYPES, and LABELS, additional newlines at the end of a subsection are okay. +*Subsection of ERRORS*+ For the subsection of ERRORS, we found that the expected error message is post-processed by the following call to split_section_lines() at [https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L391], which removes extra newlines. {code:java} expected_errors = split_section_lines(remove_comments(test_section['ERRORS'])) {code} On the other hand, the actual results (exec_result.log at [https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298]) contain 2 newlines. Since we use the following to create the respective QueryTestResult in [verify_errors()|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L324], additional newlines are also removed. Note that a line 'l' evaluates to false if it's a newline. {code:java} actual = QueryTestResult(["'%s'" % l for l in actual_errors if l], ['STRING'], ['DUMMY_LABEL'], order_matters=False) {code} That is, additional newlines are removed from both the expected error message and the actual error message. Hence, additional newlines added in the subsection of ERRORS are okay. *+Subsection of TYPES+* Moreover, additional newlines in the subsection of TYPES are okay since we remove additional newlines when constructing the expected line of types at [https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L406]. 
{code:java} expected_types = [c.strip().upper() for c in remove_comments(section).rstrip('\n').split(',')] {code} *+Subsection of LABELS+* Additional newlines in the subsection of LABELS are okay as well because we use the following to construct the expected line of labels at [https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L446]. {code:java} expected_labels = [c.strip().upper() for c in test_section['LABELS'].split(',')] {code} > Extra newlines are produced when an end-to-end test is run with > update_results > --- > > Key: IMPALA-12311 > URL: https://issues.apache.org/jira/browse/IMPALA-12311 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 4.1.2 >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Minor > Labels: test-infra > > We found that extra newlines are produced in the updated golden file when the > actual results do not match the expected results specified in the original > golden file. > Take > [TestDecimalExprs::test_exprs()|https://github.com/apache/impala/blob/master/tests/query_test/test_decimal_queries.py#L75] > for example, this test runs the test cases in > [decimal-exprs.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test]. > Suppose that we modify the expected error message at > [https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test#L107] > from "UDF WARNING: Decimal expression overflowed, returning NULL" to the > following (the original string with an additional "x"). > {noformat} > UDF WARNING: Decimal expression overflowed, returning NULLx > {noformat} > Then we run this test using the following command with the command line > argument '--update_results'. 
> {code:java} > $IMPALA_HOME/bin/impala-py.test \ > --update_results \ > --junitxml=$IMPALA_EE_TEST_LOGS_DIR/results/test_decimal.xml \ > $IMPALA_HOME/tests/query_test/test_decimal_queries.py::TestDecimalExprs::test_exprs > {code} > In $IMPALA_HOME/logs/ee_tests/QueryTest_decimal-exprs.test, we will find that > the following subsection corresponding to the query. There are 3 additional > newlines in the subsection of 'ERRORS'. > {noformat} > ERRORS > UDF WARNING: Decimal expression overflowed, returning NULL > > {noformat} > One of the newlines was produced in > [join_section_lines()|https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298]. > This function is called when the actual results do not match the expected > results in the following 4 places. > # [test_section['ERRORS'] = > join_section_lines(actual_errors)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L398]. > # [test_section['TYPES'] = join_section_lines(\[', > '.join(actual_types)\])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L429]. > # [test_section['LABELS'] =
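The TYPES and LABELS normalizations quoted in the comments above can also be run in isolation. In this sketch, `remove_comments` is stubbed out as the identity function (an assumption; the real helper strips comment lines):

```python
def remove_comments(section):
    # Stand-in for the real helper; this sample contains no comment lines.
    return section

# Extra trailing newlines in the TYPES section are discarded by rstrip('\n'):
section = "INT, STRING\n\n"
expected_types = [c.strip().upper()
                  for c in remove_comments(section).rstrip('\n').split(',')]

# Whitespace (including a trailing newline) around LABELS is discarded by strip():
labels_section = "id, name\n"
expected_labels = [c.strip().upper() for c in labels_section.split(',')]

print(expected_types)   # ['INT', 'STRING']
print(expected_labels)  # ['ID', 'NAME']
```

Both comparisons are therefore insensitive to the extra newlines that `--update_results` writes, matching the verification in the comments above.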
[jira] [Updated] (IMPALA-12311) Extra newlines are produced when an end-to-end test is run with update_results
[ https://issues.apache.org/jira/browse/IMPALA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12311: - Description: We found that extra newlines are produced in the updated golden file when the actual results do not match the expected results specified in the original golden file. Take [TestDecimalExprs::test_exprs()|https://github.com/apache/impala/blob/master/tests/query_test/test_decimal_queries.py#L75] for example, this test runs the test cases in [decimal-exprs.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test]. Suppose that we modify the expected error message at [https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test#L107] from "UDF WARNING: Decimal expression overflowed, returning NULL" to the following (the original string with an additional "x"). {noformat} UDF WARNING: Decimal expression overflowed, returning NULLx {noformat} Then we run this test using the following command with the command line argument '--update_results'. {code:java} $IMPALA_HOME/bin/impala-py.test \ --update_results \ --junitxml=$IMPALA_EE_TEST_LOGS_DIR/results/test_decimal.xml \ $IMPALA_HOME/tests/query_test/test_decimal_queries.py::TestDecimalExprs::test_exprs {code} In $IMPALA_HOME/logs/ee_tests/QueryTest_decimal-exprs.test, we will find the following subsection corresponding to the query. There are 3 additional newlines in the subsection of 'ERRORS'. {noformat} ERRORS UDF WARNING: Decimal expression overflowed, returning NULL {noformat} One of the newlines was produced in [join_section_lines()|https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298]. This function is called when the actual results do not match the expected results in the following 4 places. 
# [test_section['ERRORS'] = join_section_lines(actual_errors)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L398]. # [test_section['TYPES'] = join_section_lines(\[', '.join(actual_types)\])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L429]. # [test_section['LABELS'] = join_section_lines(\[', '.join(actual_labels)\])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L451]. # [test_section[result_section] = join_section_lines(actual.result_list)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L489]. Thus, we also have the same issue for subsections like TYPES, LABELS, and RESULTS in such a scenario (actual results do not match expected ones). It would be good if a user/developer does not have to manually remove those extra newlines when trying to generate the golden files for new test files. was: We found that extra newlines are produced in the updated golden file when the actual results do not match the expected results specified in the original golden file. Take [TestDecimalExprs::test_exprs()|https://github.com/apache/impala/blob/master/tests/query_test/test_decimal_queries.py#L75] for example, this test runs the test cases in [decimal-exprs.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test]. Suppose that we modify the expected error message at [https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test#L107] from "UDF WARNING: Decimal expression overflowed, returning NULL" to the following (the original string with an additional "x"). {noformat} UDF WARNING: Decimal expression overflowed, returning NULLx {noformat} Then we run this test using the following command with the command line argument '--update_results'. 
{code:java} $IMPALA_HOME/bin/impala-py.test \ --update_results \ --junitxml=$IMPALA_EE_TEST_LOGS_DIR/results/test_decimal.xml \ $IMPALA_HOME/tests/query_test/test_decimal_queries.py::TestDecimalExprs::test_exprs {code} In $IMPALA_HOME/logs/ee_tests/QueryTest_decimal-exprs.test, we will find the following subsection corresponding to the query. There are 3 additional newlines in the subsection of 'ERRORS'. {noformat} ERRORS UDF WARNING: Decimal expression overflowed, returning NULL {noformat} One of the newlines was produced in [join_section_lines()|https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298]. This function is called when the actual results do not match the expected results in the following 4 places. # [test_section['ERRORS'] = join_section_lines(actual_errors)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L398]. # [test_section['TYPES'] = join_section_lines([', '.join(actual_types)])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L429]. #
[jira] [Updated] (IMPALA-12311) Extra newlines are produced when an end-to-end test is run with update_results
[ https://issues.apache.org/jira/browse/IMPALA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12311: - Labels: test-infra (was: ) > Extra newlines are produced when an end-to-end test is run with > update_results > --- > > Key: IMPALA-12311 > URL: https://issues.apache.org/jira/browse/IMPALA-12311 > Project: IMPALA > Issue Type: Bug >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Minor > Labels: test-infra > > We found that extra newlines are produced in the updated golden file when the > actual results do not match the expected results specified in the original > golden file. > Take > [TestDecimalExprs::test_exprs()|https://github.com/apache/impala/blob/master/tests/query_test/test_decimal_queries.py#L75] > for example, this test runs the test cases in > [decimal-exprs.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test]. > Suppose that we modify the expected error message at > [https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test#L107] > from "UDF WARNING: Decimal expression overflowed, returning NULL" to the > following (the original string with an additional "x"). > {noformat} > UDF WARNING: Decimal expression overflowed, returning NULLx > {noformat} > Then we run this test using the following command with the command line > argument '--update_results'. > {code:java} > $IMPALA_HOME/bin/impala-py.test \ > --update_results \ > --junitxml=$IMPALA_EE_TEST_LOGS_DIR/results/test_decimal.xml \ > $IMPALA_HOME/tests/query_test/test_decimal_queries.py::TestDecimalExprs::test_exprs > {code} > In $IMPALA_HOME/logs/ee_tests/QueryTest_decimal-exprs.test, we will find the > following subsection corresponding to the query. There are 3 additional > newlines in the subsection of 'ERRORS'. 
> {noformat} > ERRORS > UDF WARNING: Decimal expression overflowed, returning NULL > > {noformat} > One of the newlines was produced in > [join_section_lines()|https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298]. > This function is called when the actual results do not match the expected > results in the following 4 places. > # [test_section['ERRORS'] = > join_section_lines(actual_errors)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L398]. > # [test_section['TYPES'] = join_section_lines([', > '.join(actual_types)])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L429]. > # [test_section['LABELS'] = join_section_lines([', > '.join(actual_labels)])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L451]. > # [test_section[result_section] = > join_section_lines(actual.result_list)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L489]. > Thus, we also have the same issue for subsections like TYPES, LABELS, and > RESULTS in such a scenario (actual results do not match expected ones). It > would be good if a user/developer does not have to manually remove those > extra newlines when trying to generate the golden files for new test files. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-12311) Extra newlines are produced when an end-to-end test is run with update_results
[ https://issues.apache.org/jira/browse/IMPALA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12311: - Affects Version/s: Impala 4.1.2 > Extra newlines are produced when an end-to-end test is run with > update_results > --- > > Key: IMPALA-12311 > URL: https://issues.apache.org/jira/browse/IMPALA-12311 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 4.1.2 >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Minor > Labels: test-infra > > We found that extra newlines are produced in the updated golden file when the > actual results do not match the expected results specified in the original > golden file. > Take > [TestDecimalExprs::test_exprs()|https://github.com/apache/impala/blob/master/tests/query_test/test_decimal_queries.py#L75] > for example, this test runs the test cases in > [decimal-exprs.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test]. > Suppose that we modify the expected error message at > [https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test#L107] > from "UDF WARNING: Decimal expression overflowed, returning NULL" to the > following (the original string with an additional "x"). > {noformat} > UDF WARNING: Decimal expression overflowed, returning NULLx > {noformat} > Then we run this test using the following command with the command line > argument '--update_results'. > {code:java} > $IMPALA_HOME/bin/impala-py.test \ > --update_results \ > --junitxml=$IMPALA_EE_TEST_LOGS_DIR/results/test_decimal.xml \ > $IMPALA_HOME/tests/query_test/test_decimal_queries.py::TestDecimalExprs::test_exprs > {code} > In $IMPALA_HOME/logs/ee_tests/QueryTest_decimal-exprs.test, we will find the > following subsection corresponding to the query. There are 3 additional > newlines in the subsection of 'ERRORS'. 
> {noformat} > ERRORS > UDF WARNING: Decimal expression overflowed, returning NULL > > {noformat} > One of the newlines was produced in > [join_section_lines()|https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298]. > This function is called when the actual results do not match the expected > results in the following 4 places. > # [test_section['ERRORS'] = > join_section_lines(actual_errors)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L398]. > # [test_section['TYPES'] = join_section_lines([', > '.join(actual_types)])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L429]. > # [test_section['LABELS'] = join_section_lines([', > '.join(actual_labels)])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L451]. > # [test_section[result_section] = > join_section_lines(actual.result_list)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L489]. > Thus, we also have the same issue for subsections like TYPES, LABELS, and > RESULTS in such a scenario (actual results do not match expected ones). It > would be good if a user/developer does not have to manually remove those > extra newlines when trying to generate the golden files for new test files.
[jira] [Created] (IMPALA-12311) Extra newlines are produced when an end-to-end test is run with update_results
Fang-Yu Rao created IMPALA-12311: Summary: Extra newlines are produced when an end-to-end test is run with update_results Key: IMPALA-12311 URL: https://issues.apache.org/jira/browse/IMPALA-12311 Project: IMPALA Issue Type: Bug Reporter: Fang-Yu Rao Assignee: Fang-Yu Rao We found that extra newlines are produced in the updated golden file when the actual results do not match the expected results specified in the original golden file. Take [TestDecimalExprs::test_exprs()|https://github.com/apache/impala/blob/master/tests/query_test/test_decimal_queries.py#L75] for example, this test runs the test cases in [decimal-exprs.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test]. Suppose that we modify the expected error message at [https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test#L107] from "UDF WARNING: Decimal expression overflowed, returning NULL" to the following (the original string with an additional "x"). {noformat} UDF WARNING: Decimal expression overflowed, returning NULLx {noformat} Then we run this test using the following command with the command line argument '--update_results'. {code:java} $IMPALA_HOME/bin/impala-py.test \ --update_results \ --junitxml=$IMPALA_EE_TEST_LOGS_DIR/results/test_decimal.xml \ $IMPALA_HOME/tests/query_test/test_decimal_queries.py::TestDecimalExprs::test_exprs {code} In $IMPALA_HOME/logs/ee_tests/QueryTest_decimal-exprs.test, we will find the following subsection corresponding to the query. There are 3 additional newlines in the subsection of 'ERRORS'. {noformat} ERRORS UDF WARNING: Decimal expression overflowed, returning NULL {noformat} One of the newlines was produced in [join_section_lines()|https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298]. This function is called when the actual results do not match the expected results in the following 4 places. 
# [test_section['ERRORS'] = join_section_lines(actual_errors)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L398]. # [test_section['TYPES'] = join_section_lines([', '.join(actual_types)])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L429]. # [test_section['LABELS'] = join_section_lines([', '.join(actual_labels)])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L451]. # [test_section[result_section] = join_section_lines(actual.result_list)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L489]. Thus, we also have the same issue for subsections like TYPES, LABELS, and RESULTS in such a scenario (actual results do not match expected ones). It would be good if a user/developer does not have to manually remove those extra newlines when trying to generate the golden files for new test files.
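A rough illustration of how the blank lines can accumulate (a toy model for this report, not the actual join_section_lines() implementation): if the captured section already ends with an empty entry for its trailing newline, joining the entries and then terminating the section with another newline stacks blank lines into the rewritten golden file.

```python
# Toy model of the update path: join the captured section lines with '\n'
# and terminate the section with a trailing '\n'. This behavior is assumed
# for illustration; the real code lives in test_file_parser.py.
def join_section_lines(lines):
    return '\n'.join(lines) + '\n'

# If the captured ERRORS output carries an empty trailing entry, the join
# emits a blank line right before the section terminator.
actual_errors = ['UDF WARNING: Decimal expression overflowed, returning NULLx', '']
section = join_section_lines(actual_errors)
print(repr(section))  # ends with '\n\n', i.e. a spurious blank line
```

Each place that both joins the lines and appends its own separator adds one more blank line, which matches the several extra newlines observed in the updated .test file.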
[jira] [Created] (IMPALA-12250) Remove deprecated Ranger configuration properties after RANGER-2895
Fang-Yu Rao created IMPALA-12250: Summary: Remove deprecated Ranger configuration properties after RANGER-2895 Key: IMPALA-12250 URL: https://issues.apache.org/jira/browse/IMPALA-12250 Project: IMPALA Issue Type: Task Reporter: Fang-Yu Rao Assignee: Fang-Yu Rao In IMPALA-12248, we added 3 new Ranger configuration properties that will be required after we start using a build that includes [RANGER-2895|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05] in order to start Ranger's HTTP server. Recall that a Ranger configuration property was deprecated in [RANGER-2895|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05], i.e., [ranger.jpa.jdbc.idleconnectiontestperiod|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-9669116dca1e5c9fffdb2c81d4d9ac57b489131e90b89ff17b56801131bad5a6L419]. Thus, we should also remove it from [ranger-admin-default-site.xml.template|https://github.com/apache/impala/blob/master/testdata/cluster/ranger/ranger-admin-default-site.xml.template] after we start using a build that includes [RANGER-2895|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05].
[jira] [Updated] (IMPALA-12248) Add required Ranger configuration properties after RANGER-2895
[ https://issues.apache.org/jira/browse/IMPALA-12248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12248: - Description: [RANGER-2895|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e] added and removed some configuration properties. [Three new configuration properties were added|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05]. We found that once we bump up the build number to include RANGER-2895 and if those new properties do not exist in [ranger-admin-default-site.xml.template|https://github.com/apache/impala/blob/master/testdata/cluster/ranger/ranger-admin-default-site.xml.template] or [ranger-admin-site.xml.template|https://github.com/apache/impala/blob/master/testdata/cluster/ranger/ranger-admin-site.xml.template] then the produced site files for Ranger will not contain those new properties, resulting in some error message like the following in catalina.log. As a result, Ranger's HTTP server could not be properly started. {code:java} 23/06/25 04:46:01 ERROR context.ContextLoader: Context initialization failed org.springframework.beans.factory.BeanDefinitionStoreException: Invalid bean definition with name 'defaultDataSource' defined in ServletContext resource [/META-INF/applicationContext.xml]: Could not resolve placeholder 'ranger.jpa.jdbc.idletimeout' in value "${ranger.jpa.jdbc.idletimeout}"; nested exception is java.lang.IllegalArgumentException: Could not resolve placeholder 'ranger.jpa.jdbc.idletimeout' in value "${ranger.jpa.jdbc.idletimeout}" at {code} There are also some configuration properties removed in RANGER-2895, e.g., [ranger.jpa.jdbc.idleconnectiontestperiod|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05L136]. 
In this regard, we could probably add these 3 new properties first and then remove the unnecessary properties once we have bumped up the build number that includes RANGER-2895. was: [RANGER-2895|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e] added and removed some configuration properties. Three new configuration properties were added. We found that once we bump up the build number to include RANGER-2895 and if those new properties do not exist in [ranger-admin-default-site.xml.template|https://github.com/apache/impala/blob/master/testdata/cluster/ranger/ranger-admin-default-site.xml.template] or [ranger-admin-site.xml.template|https://github.com/apache/impala/blob/master/testdata/cluster/ranger/ranger-admin-site.xml.template] then the produced site files for Ranger will not contain those new properties, resulting in some error message like the following in catalina.log. As a result, Ranger's HTTP server could not be properly started. {code:java} 23/06/25 04:46:01 ERROR context.ContextLoader: Context initialization failed org.springframework.beans.factory.BeanDefinitionStoreException: Invalid bean definition with name 'defaultDataSource' defined in ServletContext resource [/META-INF/applicationContext.xml]: Could not resolve placeholder 'ranger.jpa.jdbc.idletimeout' in value "${ranger.jpa.jdbc.idletimeout}"; nested exception is java.lang.IllegalArgumentException: Could not resolve placeholder 'ranger.jpa.jdbc.idletimeout' in value "${ranger.jpa.jdbc.idletimeout}" at {code} There are also some configuration properties removed in RANGER-2895, e.g., [ranger.jpa.jdbc.idleconnectiontestperiod|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05L136]. In this regard, we could probably add these 3 new properties first and then remove the unnecessary properties once we have bumped up the build number that includes RANGER-2895. 
> Add required Ranger configuration properties after RANGER-2895 > -- > > Key: IMPALA-12248 > URL: https://issues.apache.org/jira/browse/IMPALA-12248 > Project: IMPALA > Issue Type: Task >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > [RANGER-2895|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e] > added and removed some configuration properties. > [Three new configuration properties were > added|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05]. > We found that once we bump up the build number to include RANGER-2895 and if > those new properties do not exist in >
[jira] [Created] (IMPALA-12248) Add required Ranger configuration properties after RANGER-2895
Fang-Yu Rao created IMPALA-12248: Summary: Add required Ranger configuration properties after RANGER-2895 Key: IMPALA-12248 URL: https://issues.apache.org/jira/browse/IMPALA-12248 Project: IMPALA Issue Type: Task Reporter: Fang-Yu Rao Assignee: Fang-Yu Rao [RANGER-2895|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e] added and removed some configuration properties. Three new configuration properties were added. We found that once we bump up the build number to include RANGER-2895 and if those new properties do not exist in [ranger-admin-default-site.xml.template|https://github.com/apache/impala/blob/master/testdata/cluster/ranger/ranger-admin-default-site.xml.template] or [ranger-admin-site.xml.template|https://github.com/apache/impala/blob/master/testdata/cluster/ranger/ranger-admin-site.xml.template] then the produced site files for Ranger will not contain those new properties, resulting in some error message like the following in catalina.log. As a result, Ranger's HTTP server could not be properly started. {code:java} 23/06/25 04:46:01 ERROR context.ContextLoader: Context initialization failed org.springframework.beans.factory.BeanDefinitionStoreException: Invalid bean definition with name 'defaultDataSource' defined in ServletContext resource [/META-INF/applicationContext.xml]: Could not resolve placeholder 'ranger.jpa.jdbc.idletimeout' in value "${ranger.jpa.jdbc.idletimeout}"; nested exception is java.lang.IllegalArgumentException: Could not resolve placeholder 'ranger.jpa.jdbc.idletimeout' in value "${ranger.jpa.jdbc.idletimeout}" at {code} There are also some configuration properties removed in RANGER-2895, e.g., [ranger.jpa.jdbc.idleconnectiontestperiod|https://github.com/apache/ranger/commit/846031985cae70f7a8c5e92faf186948a302260e#diff-dcab4376623684e416c7e60162c7af7a7d3789fe1d61a2cfdaef794334426f05L136]. 
In this regard, we could probably add these 3 new properties first and then remove the unnecessary properties once we have bumped up the build number that includes RANGER-2895. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
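The proposed fix, adding the new properties to the site templates so the placeholders can be resolved, can be sketched as a small merge step over a Hadoop-style site XML. This is an illustrative sketch, not the actual Impala test-infra script; the `3600` default below is a placeholder assumption, not Ranger's shipped value.

```python
import xml.etree.ElementTree as ET

def ensure_properties(xml_text, required):
    """Add any <property> entries missing from a Hadoop-style site XML."""
    root = ET.fromstring(xml_text)
    present = {p.findtext("name") for p in root.findall("property")}
    for name, value in sorted(required.items()):
        if name in present:
            continue  # keep existing values untouched
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

template = """<configuration>
  <property><name>ranger.jpa.jdbc.url</name><value>jdbc:mysql://localhost/ranger</value></property>
</configuration>"""

# 'ranger.jpa.jdbc.idletimeout' is the property named in the catalina.log
# error above; '3600' is an illustrative default, not Ranger's actual one.
patched = ensure_properties(template, {"ranger.jpa.jdbc.idletimeout": "3600"})
print("ranger.jpa.jdbc.idletimeout" in patched)  # True
```

Running the same merge twice is a no-op, so the step is safe to apply on every build-number bump.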
[jira] [Commented] (IMPALA-12239) BitWidthZeroRepeated seems to be flaky
[ https://issues.apache.org/jira/browse/IMPALA-12239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17736586#comment-17736586 ] Fang-Yu Rao commented on IMPALA-12239: -- Hi [~daniel.becker], assigned this JIRA to you since your recent patch for IMPALA-12074 involves tests in [rle-test.cc|https://github.com/apache/impala/blame/master/be/src/util/rle-test.cc] so you may be more familiar with tests in this area. Feel free to re-assign the ticket as you see appropriate. Thanks! > BitWidthZeroRepeated seems to be flaky > -- > > Key: IMPALA-12239 > URL: https://issues.apache.org/jira/browse/IMPALA-12239 > Project: IMPALA > Issue Type: Bug >Reporter: Fang-Yu Rao >Assignee: Daniel Becker >Priority: Major > Labels: broken-build > > [BitWidthZeroRepeated|https://github.com/apache/impala/blame/master/be/src/util/rle-test.cc#L400] > seems to be flaky. We observed the following error in a Jenkins run. > Error Message > {code} > Value of: 0 Expected: val Which is: '\x9F' (159) > {code} > Stacktrace > {code} > /data/jenkins/workspace/impala-asf-master-exhaustive-release/repos/Impala/be/src/util/rle-test.cc:410 > Value of: 0 > Expected: val > Which is: '\x9F' (159) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-12239) BitWidthZeroRepeated seems to be flaky
Fang-Yu Rao created IMPALA-12239: Summary: BitWidthZeroRepeated seems to be flaky Key: IMPALA-12239 URL: https://issues.apache.org/jira/browse/IMPALA-12239 Project: IMPALA Issue Type: Bug Reporter: Fang-Yu Rao Assignee: Daniel Becker [BitWidthZeroRepeated|https://github.com/apache/impala/blame/master/be/src/util/rle-test.cc#L400] seems to be flaky. We observed the following error in a Jenkins run. Error Message {code} Value of: 0 Expected: val Which is: '\x9F' (159) {code} Stacktrace {code} /data/jenkins/workspace/impala-asf-master-exhaustive-release/repos/Impala/be/src/util/rle-test.cc:410 Value of: 0 Expected: val Which is: '\x9F' (159) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-12235) test_multiple_coordinator() failed because _start_impala_cluster() returned non-zero exit status
[ https://issues.apache.org/jira/browse/IMPALA-12235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12235: - Description: We found that test_multiple_coordinator() could fail because [_start_impala_cluster()|https://github.com/apache/impala/blame/master/tests/common/custom_cluster_test_suite.py#L283] returned non-zero exit status. test_multiple_coordinators() calls _start_impala_cluster() at https://github.com/apache/impala/blame/master/tests/custom_cluster/test_coordinators.py#L41C10-L41C31. *Error Message* {code:java} CalledProcessError: Command '['/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/start-impala-cluster.py', '--state_store_args=--statestore_update_frequency_ms=50 --statestore_priority_update_frequency_ms=50 --statestore_heartbeat_frequency_ms=50', '--cluster_size=3', '--num_coordinators=2', '--log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests', '--log_level=1', '--impalad_args=--default_query_options=']' returned non-zero exit status 1 {code} *Stacktrace* {code:java} custom_cluster/test_coordinators.py:41: in test_multiple_coordinators self._start_impala_cluster([], num_coordinators=2, cluster_size=3) common/custom_cluster_test_suite.py:330: in _start_impala_cluster check_call(cmd + options, close_fds=True) /data/jenkins/workspace/impala-asf-master-core-erasure-coding/Impala-Toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/subprocess.py:190: in check_call raise CalledProcessError(retcode, cmd) E CalledProcessError: Command '['/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/start-impala-cluster.py', '--state_store_args=--statestore_update_frequency_ms=50 --statestore_priority_update_frequency_ms=50 --statestore_heartbeat_frequency_ms=50', '--cluster_size=3', '--num_coordinators=2', 
'--log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests', '--log_level=1', '--impalad_args=--default_query_options=']' returned non-zero exit status 1 {code} The following console output shows that 'num_known_live_backends' could not reach 3 in 4 mins and thus the command that starts the cluster failed with non-zero exit status. {code} -- 2023-06-21 20:54:40,594 INFO MainThread: Starting cluster with command: /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/start-impala-cluster.py '--state_store_args=--statestore_update_frequency_ms=50 --statestore_priority_update_frequency_ms=50 --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=2 --log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests --log_level=1 --impalad_args=--default_query_options= 20:54:41 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es) 20:54:41 MainThread: Starting State Store logging to /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/statestored.INFO 20:54:42 MainThread: Starting Catalog Service logging to /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/catalogd.INFO 20:54:43 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/impalad.INFO 20:54:43 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO 20:54:43 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO 20:54:46 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) 20:54:46 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) 
20:54:46 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-1576.vpc.cloudera.com:25000 20:54:46 MainThread: Waiting for num_known_live_backends=3. Current value: 1 20:54:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) 20:54:47 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-1576.vpc.cloudera.com:25000 20:54:47 MainThread: Waiting for num_known_live_backends=3. Current value: 1 20:54:48 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) 20:54:48 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-1576.vpc.cloudera.com:25000 20:54:48 MainThread: num_known_live_backends has reached value: 3 20:54:48 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) 20:54:48 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-1576.vpc.cloudera.com:25001
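The console output above comes from a poll-until-ready loop over `num_known_live_backends`. A simplified, hypothetical Python sketch of that pattern follows; the real logic lives in bin/start-impala-cluster.py and additionally re-verifies process liveness on each iteration, and the function and exception names here are illustrative.

```python
import time

class ClusterTimeout(Exception):
    pass

def wait_for_metric(get_value, expected, timeout_s=240.0, interval_s=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll get_value() until it equals expected, or raise ClusterTimeout.

    clock and sleep are injectable so the loop is testable without real waits.
    """
    deadline = clock() + timeout_s
    while True:
        value = get_value()
        if value == expected:
            return value
        if clock() >= deadline:
            raise ClusterTimeout("expected %r, last value %r after %ss"
                                 % (expected, value, timeout_s))
        sleep(interval_s)

# e.g. polling num_known_live_backends from a coordinator's debug page;
# here a canned sequence stands in for the HTTP fetch.
values = iter([1, 1, 3])
print(wait_for_metric(lambda: next(values), 3, sleep=lambda s: None))  # 3
```

In the failing run, the loop on port 25001 presumably never saw the value reach 3 within the 4-minute budget, so the start script exited non-zero.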
[jira] [Commented] (IMPALA-12235) test_multiple_coordinator() failed because _start_impala_cluster() returned non-zero exit status
[ https://issues.apache.org/jira/browse/IMPALA-12235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17736353#comment-17736353 ] Fang-Yu Rao commented on IMPALA-12235: -- Hi [~wzhou], assigned this JIRA to you since your recent patch [IMPALA-12155: Support High Availability for CatalogD|https://github.com/apache/impala/commit/819db8fa4667e06d1a56fe08baddfbc26983d389] involves _start_impala_cluster() and thus you may be more familiar with this function. Feel free to re-assign the ticket as you see appropriate. Thanks! > test_multiple_coordinator() failed because _start_impala_cluster() returned > non-zero exit status > > > Key: IMPALA-12235 > URL: https://issues.apache.org/jira/browse/IMPALA-12235 > Project: IMPALA > Issue Type: Bug >Reporter: Fang-Yu Rao >Assignee: Wenzhe Zhou >Priority: Major > Labels: broken-build > > We found that test_multiple_coordinator() could fail because > [_start_impala_cluster()|https://github.com/apache/impala/blame/master/tests/common/custom_cluster_test_suite.py#L283] > returned non-zero exit status. test_multiple_coordinators() calls > _start_impala_cluster() at > https://github.com/apache/impala/blame/master/tests/custom_cluster/test_coordinators.py#L41C10-L41C31. 
> *Error Message* > {code:java} > CalledProcessError: Command > '['/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/start-impala-cluster.py', > '--state_store_args=--statestore_update_frequency_ms=50 > --statestore_priority_update_frequency_ms=50 > --statestore_heartbeat_frequency_ms=50', '--cluster_size=3', > '--num_coordinators=2', > '--log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests', > '--log_level=1', '--impalad_args=--default_query_options=']' returned > non-zero exit status 1 > {code} > *Stacktrace* > {code:java} > custom_cluster/test_coordinators.py:41: in test_multiple_coordinators > self._start_impala_cluster([], num_coordinators=2, cluster_size=3) > common/custom_cluster_test_suite.py:330: in _start_impala_cluster > check_call(cmd + options, close_fds=True) > /data/jenkins/workspace/impala-asf-master-core-erasure-coding/Impala-Toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/subprocess.py:190: > in check_call > raise CalledProcessError(retcode, cmd) > E CalledProcessError: Command > '['/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/start-impala-cluster.py', > '--state_store_args=--statestore_update_frequency_ms=50 > --statestore_priority_update_frequency_ms=50 > --statestore_heartbeat_frequency_ms=50', '--cluster_size=3', > '--num_coordinators=2', > '--log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests', > '--log_level=1', '--impalad_args=--default_query_options=']' returned > non-zero exit status 1 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-12235) test_multiple_coordinator() failed because _start_impala_cluster() returned non-zero exit status
Fang-Yu Rao created IMPALA-12235: Summary: test_multiple_coordinator() failed because _start_impala_cluster() returned non-zero exit status Key: IMPALA-12235 URL: https://issues.apache.org/jira/browse/IMPALA-12235 Project: IMPALA Issue Type: Bug Reporter: Fang-Yu Rao Assignee: Wenzhe Zhou We found that test_multiple_coordinator() could fail because [_start_impala_cluster()|https://github.com/apache/impala/blame/master/tests/common/custom_cluster_test_suite.py#L283] returned non-zero exit status. test_multiple_coordinators() calls _start_impala_cluster() at https://github.com/apache/impala/blame/master/tests/custom_cluster/test_coordinators.py#L41C10-L41C31. *Error Message* {code:java} CalledProcessError: Command '['/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/start-impala-cluster.py', '--state_store_args=--statestore_update_frequency_ms=50 --statestore_priority_update_frequency_ms=50 --statestore_heartbeat_frequency_ms=50', '--cluster_size=3', '--num_coordinators=2', '--log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests', '--log_level=1', '--impalad_args=--default_query_options=']' returned non-zero exit status 1 {code} *Stacktrace* {code:java} custom_cluster/test_coordinators.py:41: in test_multiple_coordinators self._start_impala_cluster([], num_coordinators=2, cluster_size=3) common/custom_cluster_test_suite.py:330: in _start_impala_cluster check_call(cmd + options, close_fds=True) /data/jenkins/workspace/impala-asf-master-core-erasure-coding/Impala-Toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/subprocess.py:190: in check_call raise CalledProcessError(retcode, cmd) E CalledProcessError: Command '['/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/start-impala-cluster.py', '--state_store_args=--statestore_update_frequency_ms=50 --statestore_priority_update_frequency_ms=50 --statestore_heartbeat_frequency_ms=50', 
'--cluster_size=3', '--num_coordinators=2', '--log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests', '--log_level=1', '--impalad_args=--default_query_options=']' returned non-zero exit status 1 {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-12234) test_ctas could fail because CTAS did not reach the expected states
[ https://issues.apache.org/jira/browse/IMPALA-12234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao resolved IMPALA-12234. -- Resolution: Duplicate Close this JIRA since it's a duplicate of IMPALA-12148. > test_ctas could fail because CTAS did not reach the expected states > --- > > Key: IMPALA-12234 > URL: https://issues.apache.org/jira/browse/IMPALA-12234 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 4.2.0 >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > Labels: broken-build > > We found that test_ctas could fail due to CTAS not being able to reach the > expected states in 'wait_time' which is 20 seconds if the underlying file > system is HDFS > ([https://github.com/apache/impala/blame/master/tests/metadata/test_ddl.py#L1120C7-L1120C16]). > Maybe we could slightly increase 'wait_time' to make this test less flaky. > *Error Message* > {code:java} > metadata/test_ddl.py:1122: in test_ctas self.wait_for_state(handle, > finished_state, wait_time, client=client) common/impala_test_suite.py:1115: > in wait_for_state self.wait_for_any_state(handle, [expected_state], > timeout, client) common/impala_test_suite.py:1133: in wait_for_any_state > raise Timeout(timeout_msg) E Timeout: query > 'c74765ed9d8472ed:6f2dde90' did not reach one of the expected states > [4], last known state 2 > {code} > *Stacktrace* > {code:java} > metadata/test_ddl.py:1122: in test_ctas > self.wait_for_state(handle, finished_state, wait_time, client=client) > common/impala_test_suite.py:1115: in wait_for_state > self.wait_for_any_state(handle, [expected_state], timeout, client) > common/impala_test_suite.py:1133: in wait_for_any_state > raise Timeout(timeout_msg) > E Timeout: query 'c74765ed9d8472ed:6f2dde90' did not reach one of > the expected states [4], last known state 2 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, 
e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-12234) test_ctas could fail because CTAS did not reach the expected states
Fang-Yu Rao created IMPALA-12234: Summary: test_ctas could fail because CTAS did not reach the expected states Key: IMPALA-12234 URL: https://issues.apache.org/jira/browse/IMPALA-12234 Project: IMPALA Issue Type: Bug Affects Versions: Impala 4.2.0 Reporter: Fang-Yu Rao Assignee: Fang-Yu Rao We found that test_ctas could fail due to CTAS not being able to reach the expected states in 'wait_time' which is 20 seconds if the underlying file system is HDFS ([https://github.com/apache/impala/blame/master/tests/metadata/test_ddl.py#L1120C7-L1120C16]). Maybe we could slightly increase 'wait_time' to make this test less flaky. *Error Message* {code:java} metadata/test_ddl.py:1122: in test_ctas self.wait_for_state(handle, finished_state, wait_time, client=client) common/impala_test_suite.py:1115: in wait_for_state self.wait_for_any_state(handle, [expected_state], timeout, client) common/impala_test_suite.py:1133: in wait_for_any_state raise Timeout(timeout_msg) E Timeout: query 'c74765ed9d8472ed:6f2dde90' did not reach one of the expected states [4], last known state 2 {code} *Stacktrace* {code:java} metadata/test_ddl.py:1122: in test_ctas self.wait_for_state(handle, finished_state, wait_time, client=client) common/impala_test_suite.py:1115: in wait_for_state self.wait_for_any_state(handle, [expected_state], timeout, client) common/impala_test_suite.py:1133: in wait_for_any_state raise Timeout(timeout_msg) E Timeout: query 'c74765ed9d8472ed:6f2dde90' did not reach one of the expected states [4], last known state 2 {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
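For context, wait_for_any_state() in the test suite is essentially a timed poll over the query's state; a hypothetical simplification (not Impala's actual implementation, and the numeric state codes are just those seen in the error message) is:

```python
import time

class Timeout(Exception):
    pass

def wait_for_any_state(get_state, expected_states, timeout_s,
                       interval_s=0.5, clock=time.monotonic, sleep=time.sleep):
    """Poll get_state() until it returns a state in expected_states,
    raising Timeout with the last known state otherwise."""
    deadline = clock() + timeout_s
    last = get_state()
    while last not in expected_states:
        if clock() >= deadline:
            raise Timeout("query did not reach one of the expected states "
                          "%s, last known state %s" % (expected_states, last))
        sleep(interval_s)
        last = get_state()
    return last

# A CTAS that stays in state 2 for longer than the wait budget produces
# exactly the Timeout above; increasing timeout_s is the proposed mitigation.
states = iter([2, 2, 4])
print(wait_for_any_state(lambda: next(states), [4], timeout_s=1,
                         sleep=lambda s: None))  # 4
```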
[jira] [Updated] (IMPALA-12151) Formula used to estimate the cost of join could be improved
[ https://issues.apache.org/jira/browse/IMPALA-12151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12151: - Description: We found that the formula used in [Planner#isInvertedJoinCheaper()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/Planner.java#L719-L724] to estimate the cost of a join (per node) could sometimes lead to a bad join order. The issue can be reproduced with the following steps. {code:java} create database test_db; create table test_db.larger_tbl (string_col string, bigint_col bigint, int_col_0 int, int_col_1 int) partitioned by (date_string_col string) stored as parquet; create table test_db.smaller_tbl (bigint_col bigint) partitioned by (date_string_col string) stored as parquet; insert into test_db.smaller_tbl partition (date_string_col='2023-05-05') values (1000); insert into test_db.smaller_tbl partition (date_string_col='2023-05-05') values (1000); insert into test_db.smaller_tbl partition (date_string_col='2023-05-05') values (1000); insert into test_db.larger_tbl partition (date_string_col='2023-05-05') values ('wa', 1000, 6, 1); alter table test_db.smaller_tbl partition (date_string_col='2023-05-05') set tblproperties('numrows'='17000', 'stats_generated_via_stats_task'='true'); alter table test_db.larger_tbl partition (date_string_col='2023-05-05') set tblproperties('numrows'='2890', 'stats_generated_via_stats_task'='true'); explain select distinct t0.`string_col` from `test_db`.`larger_tbl` t0 left outer join `test_db`.`smaller_tbl` t1 on ( t0.`date_string_col` = t1.`date_string_col` and t0.`bigint_col` = t1.`bigint_col` ) where t0.`date_string_col` in ('2023-05-05') and t0.`int_col_1` in (1) order by 1 asc limit 1000; {code} The query plan shows that Impala will be using the larger table ('larger_tbl') as the build side table in the hash join node. 
When there is data skew in the larger table, it is possible that only a single executor ends up building the hash table, from the one hash partition that contains data, which in turn could cause that executor node to run into memory issues. {code:java} +--+ | Explain String | +--+ | Max Per-Host Resource Reservation: Memory=110.03MB Threads=7 | | Per-Host Resource Estimates: Memory=414MB | | WARNING: The following tables are missing relevant table and/or column statistics. | | test_db.larger_tbl, test_db.smaller_tbl | | | | PLAN-ROOT SINK | | | | | 09:MERGING-EXCHANGE [UNPARTITIONED] | | | order by: t0.string_col ASC | | | limit: 1001 | | | | | 04:TOP-N [LIMIT=1001] | | | order by: t0.string_col ASC | | | row-size=12B cardinality=1.00K | | | | | 08:AGGREGATE [FINALIZE] | | | group by: t0.string_col | | | row-size=12B cardinality=2.89M | | | | | 07:EXCHANGE [HASH(t0.string_col)] | | | | | 03:AGGREGATE [STREAMING] | | | group by: t0.string_col
[jira] [Created] (IMPALA-12151) Formula used to estimate the cost of join could be improved
Fang-Yu Rao created IMPALA-12151: Summary: Formula used to estimate the cost of join could be improved Key: IMPALA-12151 URL: https://issues.apache.org/jira/browse/IMPALA-12151 Project: IMPALA Issue Type: Bug Components: Frontend Affects Versions: Impala 4.1.2 Reporter: Fang-Yu Rao We found that the formula used in [Planner#isInvertedJoinCheaper()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/Planner.java#L719-L724] to estimate the cost of a join (per node) could sometimes lead to a bad join order. The issue can be reproduced with the following steps. {code:java} create database test_db; create table test_db.larger_tbl (string_col string, bigint_col bigint, int_col_0 int, int_col_1 int) partitioned by (date_string_col string) stored as parquet; create table test_db.smaller_tbl (bigint_col bigint) partitioned by (date_string_col string) stored as parquet; insert into test_db.smaller_tbl partition (date_string_col='2023-05-05') values (1000); insert into test_db.smaller_tbl partition (date_string_col='2023-05-05') values (1000); insert into test_db.smaller_tbl partition (date_string_col='2023-05-05') values (1000); insert into test_db.larger_tbl partition (date_string_col='2023-05-05') values ('wa', 1000, 6, 1); alter table test_db.smaller_tbl partition (date_string_col='2023-05-05') set tblproperties('numrows'='17000', 'stats_generated_via_stats_task'='true'); alter table test_db.larger_tbl partition (date_string_col='2023-05-05') set tblproperties('numrows'='2890', 'stats_generated_via_stats_task'='true'); explain select distinct t0.`string_col` from `test_db`.`larger_tbl` t0 left outer join `test_db`.`smaller_tbl` t1 on ( t0.`date_string_col` = t1.`date_string_col` and t0.`bigint_col` = t1.`bigint_col` ) where t0.`date_string_col` in ('2023-05-05') and t0.`int_col_1` in (1) order by 1 asc limit 1000; {code} The query plan shows that Impala will be using the larger table ('larger_tbl') as the build side table in the hash join node. 
When there is data skew in the larger table, it is possible that only a single executor ends up building the hash table, from the one hash partition that contains data, which in turn could cause that executor node to run into memory issues. {code:java} +--+ | Explain String | +--+ | Max Per-Host Resource Reservation: Memory=110.03MB Threads=7 | | Per-Host Resource Estimates: Memory=414MB | | WARNING: The following tables are missing relevant table and/or column statistics. | | test_db.larger_tbl, test_db.smaller_tbl | | | | PLAN-ROOT SINK | | | | | 09:MERGING-EXCHANGE [UNPARTITIONED] | | | order by: t0.string_col ASC | | | limit: 1001 | | | | | 04:TOP-N [LIMIT=1001] | | | order by: t0.string_col ASC | | | row-size=12B cardinality=1.00K | | | | | 08:AGGREGATE [FINALIZE] | | | group by: t0.string_col | | | row-size=12B cardinality=2.89M | | | | | 07:EXCHANGE [HASH(t0.string_col)] | | | | | 03:AGGREGATE
[jira] [Resolved] (IMPALA-11686) test_corrupts_stats fails in exhaustive tests due to IMPALA-11666
[ https://issues.apache.org/jira/browse/IMPALA-11686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao resolved IMPALA-11686. -- Resolution: Fixed Resolve the JIRA since the fix has been merged. > test_corrupts_stats fails in exhaustive tests due to IMPALA-11666 > - > > Key: IMPALA-11686 > URL: https://issues.apache.org/jira/browse/IMPALA-11686 > Project: IMPALA > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Assignee: Fang-Yu Rao >Priority: Major > Labels: broken-build > > IMPALA-11666 changed the warning message in the query plan when there are > potentially corrupt stats. > test_corrupt_stats expects the old warning message. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-11666) Consider revising the warning message when hasCorruptTableStats_ is true for a table
[ https://issues.apache.org/jira/browse/IMPALA-11666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao resolved IMPALA-11666. -- Resolution: Fixed Resolving this JIRA since the patch has been merged. > Consider revising the warning message when hasCorruptTableStats_ is true for > a table > > > Key: IMPALA-11666 > URL: https://issues.apache.org/jira/browse/IMPALA-11666 > Project: IMPALA > Issue Type: Task > Components: Frontend >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > Currently, '{{{}hasCorruptTableStats_{}}}' of an HDFS table is set to true > when one of the following is true in > [HdfsScanNode.java|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java]. > # Its '{{{}cardinality_{}}}' is less than -1. > # The number of rows in one of its partitions is less than -1. > # The number of rows in one of its partitions is 0 but the size of the > associated files of this partition is greater than 0. > # The number of rows in the table is 0 but the size of the associated files > of this table is greater than 0. > For such a table, the {{EXPLAIN}} statement for queries involving the table > would contain the message "{{{}WARNING: The following tables have > potentially corrupt table statistics. Drop and re-compute statistics to > resolve this problem.{}}}" > The warning message may be a bit too scary for an Impala user especially if > we consider the fact that a table without corrupt statistics could indeed > have its '{{{}hasCorruptTableStats_{}}}' set to true by Impala's frontend. > Specifically, a table without corrupt statistics but having its > '{{{}hasCorruptTableStats_{}}}' set to true could be created as follows after > starting the Impala cluster. > # Execute on the command line "{{{}beeline -u > "jdbc:hive2://localhost:11050/default"{}}}" to enter beeline. 
> # Create a transactional table in beeline via "{{{}create table > test_db.test_tbl_01 (id int, name string) stored as orc tblproperties > ('transactional'='true'){}}}". > # Insert a row into the table just created in beeline via "{{{}insert into > table test_db.test_tbl_01 values (1, "Alex");{}}}". > # Delete the row just inserted in beeline via "{{{}delete from > test_db.test_tbl_01 where id = 1{}}}". > # In Impala shell, execute "{{compute stats test_db.test_tbl_01}}". > # In Impala shell, execute "{{{}explain select * from > test_db.test_tbl_01{}}}" to verify that the warning message described above > appears in the output. > The table '{{{}test_tbl_01{}}}' above has 0 rows but the associated file size > is greater than 0. > It may be better to revise the warning message to something less scary, > as shown below. > {code:java} > The number of rows in the following tables, or in one of their partitions, > is 0 or less than -1 even though the total file size is positive. > This does not necessarily imply the existence of corrupt statistics. > In the case of corrupt statistics, dropping and re-computing statistics could > resolve this problem. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
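The four conditions listed in the issue can be sketched as a small predicate. All names below are illustrative, not Impala's actual code (the real logic lives in HdfsScanNode.java, and condition 1 is checked against the planner's cardinality_ field rather than a plain row count):

```python
def has_corrupt_table_stats(table_num_rows, table_file_bytes, partitions):
    """Sketch of the four conditions. `partitions` is a list of
    (num_rows, file_bytes) pairs. A value of -1 means "stats unknown"
    and is fine; values below -1, or a 0-row count alongside non-empty
    data files, are treated as potentially corrupt."""
    if table_num_rows < -1:                       # condition 1
        return True
    for num_rows, file_bytes in partitions:
        if num_rows < -1:                         # condition 2
            return True
        if num_rows == 0 and file_bytes > 0:      # condition 3
            return True
    # condition 4: the table claims 0 rows but its files hold data
    return table_num_rows == 0 and table_file_bytes > 0
```

The transactional-table repro above hits conditions 3/4: after the delete, the table reports 0 rows while its ORC files still occupy space.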
[jira] [Commented] (IMPALA-11934) TestBatchReadingFromRemote seems to be flaky in the Ozone build
[ https://issues.apache.org/jira/browse/IMPALA-11934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691276#comment-17691276 ] Fang-Yu Rao commented on IMPALA-11934: -- Hi [~baggio000], assigned this to you since you are more familiar with the failed test. Please re-assign the JIRA as you see appropriate. Thanks! > TestBatchReadingFromRemote seems to be flaky in the Ozone build > --- > > Key: IMPALA-11934 > URL: https://issues.apache.org/jira/browse/IMPALA-11934 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Fang-Yu Rao >Assignee: Yida Wu >Priority: Major > Labels: broken-build > > We found that TestBatchReadingFromRemote failed in a run of Ozone build with > the following output. > Error Message > {code} > Value of: wait_times-- > 0 Actual: false Expected: true > {code} > Stacktrace > {code} > /data/jenkins/workspace/impala-asf-master-core-ozone/repos/Impala/be/src/runtime/tmp-file-mgr-test.cc:323 > Value of: wait_times-- > 0 > Actual: false > Expected: true > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-11934) TestBatchReadingFromRemote seems to be flaky in the Ozone build
Fang-Yu Rao created IMPALA-11934: Summary: TestBatchReadingFromRemote seems to be flaky in the Ozone build Key: IMPALA-11934 URL: https://issues.apache.org/jira/browse/IMPALA-11934 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Fang-Yu Rao Assignee: Yida Wu We found that TestBatchReadingFromRemote failed in a run of Ozone build with the following output. Error Message {code} Value of: wait_times-- > 0 Actual: false Expected: true {code} Stacktrace {code} /data/jenkins/workspace/impala-asf-master-core-ozone/repos/Impala/be/src/runtime/tmp-file-mgr-test.cc:323 Value of: wait_times-- > 0 Actual: false Expected: true {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-11932) test_partition_key_scans_with_multiple_blocks_table failed when erasure coding is turned on
[ https://issues.apache.org/jira/browse/IMPALA-11932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-11932: - Affects Version/s: Impala 4.3.0 > test_partition_key_scans_with_multiple_blocks_table failed when erasure > coding is turned on > --- > > Key: IMPALA-11932 > URL: https://issues.apache.org/jira/browse/IMPALA-11932 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 4.3.0 >Reporter: Fang-Yu Rao >Assignee: Quanlong Huang >Priority: Major > Labels: broken-build > > We found that test_partition_key_scans_with_multiple_blocks_table failed when > ERASURE_CODING is true. This test was added in IMPALA-11081 > (https://gerrit.cloudera.org/c/19471/17/tests/query_test/test_queries.py#366). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-11932) test_partition_key_scans_with_multiple_blocks_table failed when erasure coding is turned on
[ https://issues.apache.org/jira/browse/IMPALA-11932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17690924#comment-17690924 ] Fang-Yu Rao commented on IMPALA-11932: -- Hi [~stigahuang], assigned this JIRA to you since you helped review the patch that added the failed test. Please reassign the JIRA as you see appropriate. Thanks! > test_partition_key_scans_with_multiple_blocks_table failed when erasure > coding is turned on > --- > > Key: IMPALA-11932 > URL: https://issues.apache.org/jira/browse/IMPALA-11932 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 4.3.0 >Reporter: Fang-Yu Rao >Assignee: Quanlong Huang >Priority: Major > Labels: broken-build > > We found that test_partition_key_scans_with_multiple_blocks_table failed when > ERASURE_CODING is true. This test was added in IMPALA-11081 > (https://gerrit.cloudera.org/c/19471/17/tests/query_test/test_queries.py#366). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-11932) test_partition_key_scans_with_multiple_blocks_table failed when erasure coding is turned on
Fang-Yu Rao created IMPALA-11932: Summary: test_partition_key_scans_with_multiple_blocks_table failed when erasure coding is turned on Key: IMPALA-11932 URL: https://issues.apache.org/jira/browse/IMPALA-11932 Project: IMPALA Issue Type: Bug Reporter: Fang-Yu Rao Assignee: Quanlong Huang We found that test_partition_key_scans_with_multiple_blocks_table failed when ERASURE_CODING is true. This test was added in IMPALA-11081 (https://gerrit.cloudera.org/c/19471/17/tests/query_test/test_queries.py#366). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-11921) test_large_sql seems to be flaky
[ https://issues.apache.org/jira/browse/IMPALA-11921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-11921: - Description: We observed the following failure in an ASAN run. {code} /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/shell/test_shell_commandline.py:1026: in test_large_sql assert actual_time_s <= time_limit_s, ( E AssertionError: It took 21.0015001297 seconds to execute the query. Time limit is 20 seconds. E assert 21.001500129699707 <= 20 {code} We have not seen this failure for a while since IMPALA-7428. was: We observed the following failure in an ASAN run. {noformat} /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/shell/test_shell_commandline.py:1026: in test_large_sql assert actual_time_s <= time_limit_s, ( E AssertionError: It took 21.0015001297 seconds to execute the query. Time limit is 20 seconds. E assert 21.001500129699707 <= 20 {noformat} We have not seen this failure for a while since IMPALA-7428. > test_large_sql seems to be flaky > > > Key: IMPALA-11921 > URL: https://issues.apache.org/jira/browse/IMPALA-11921 > Project: IMPALA > Issue Type: Bug >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > Labels: broken-build > > We observed the following failure in an ASAN run. > {code} > /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/shell/test_shell_commandline.py:1026: > in test_large_sql assert actual_time_s <= time_limit_s, ( E > AssertionError: It took 21.0015001297 seconds to execute the query. Time > limit is 20 seconds. E assert 21.001500129699707 <= 20 > {code} > We have not seen this failure for a while since IMPALA-7428. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-11921) test_large_sql seems to be flaky
Fang-Yu Rao created IMPALA-11921: Summary: test_large_sql seems to be flaky Key: IMPALA-11921 URL: https://issues.apache.org/jira/browse/IMPALA-11921 Project: IMPALA Issue Type: Bug Reporter: Fang-Yu Rao Assignee: Fang-Yu Rao We observed the following failure in an ASAN run. {noformat} /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/shell/test_shell_commandline.py:1026: in test_large_sql assert actual_time_s <= time_limit_s, ( E AssertionError: It took 21.0015001297 seconds to execute the query. Time limit is 20 seconds. E assert 21.001500129699707 <= 20 {noformat} We have not seen this failure for a while since IMPALA-7428. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-11918) Fix test_java_udfs_from_impala after IMPALA-11745
[ https://issues.apache.org/jira/browse/IMPALA-11918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-11918: - Labels: broken-build (was: ) > Fix test_java_udfs_from_impala after IMPALA-11745 > - > > Key: IMPALA-11918 > URL: https://issues.apache.org/jira/browse/IMPALA-11918 > Project: IMPALA > Issue Type: Bug > Components: fe >Affects Versions: Impala 4.3.0 >Reporter: Peter Rozsa >Assignee: Peter Rozsa >Priority: Major > Labels: broken-build > > IMPALA-11745 changed the error message regarding failed method extraction for > Hive UDFs, and an exhaustive test case remained unchanged, causing failure in > exhaustive builds. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-11871) INSERT statement does not respect Ranger policies for HDFS
Fang-Yu Rao created IMPALA-11871: Summary: INSERT statement does not respect Ranger policies for HDFS Key: IMPALA-11871 URL: https://issues.apache.org/jira/browse/IMPALA-11871 Project: IMPALA Issue Type: Bug Components: Frontend Reporter: Fang-Yu Rao Assignee: Fang-Yu Rao In a cluster with Ranger auth (and with legacy catalog mode), even if you provide RWX to cm_hdfs -> all-path for the user impala, inserting into a table whose HDFS POSIX permissions happen to exclude impala access will result in an {noformat} "AnalysisException: Unable to INSERT into target table (default.t1) because Impala does not have WRITE access to HDFS location: hdfs://nightly-71x-vx-2.nightly-71x-vx.root.hwx.site:8020/warehouse/tablespace/external/hive/t1"{noformat} {noformat} [root@nightly-71x-vx-3 ~]# hdfs dfs -getfacl /warehouse/tablespace/external/hive/t1 file: /warehouse/tablespace/external/hive/t1 owner: hive group: supergroup user::rwx user:impala:rwx #effective:r-x group::rwx #effective:r-x mask::r-x other::--- default:user::rwx default:user:impala:rwx default:group::rwx default:mask::rwx default:other::--- {noformat} ~~ ANALYSIS Stack trace from a version of Cloudera's distribution of Impala (impalad version 3.4.0-SNAPSHOT RELEASE (build {*}db20b59a093c17ea4699117155d58fe874f7d68f{*})): {noformat} at org.apache.impala.catalog.FeFsTable$Utils.checkWriteAccess(FeFsTable.java:585) at org.apache.impala.analysis.InsertStmt.analyzeWriteAccess(InsertStmt.java:545) at org.apache.impala.analysis.InsertStmt.analyze(InsertStmt.java:391) at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:463) at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:426) at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1570) at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1536) at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1506) at 
org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:155){noformat} The exception occurs at analysis time, so I tested and succeeded in writing directly into the said directory. {noformat} [root@nightly-71x-vx-3 ~]# hdfs dfs -touchz /warehouse/tablespace/external/hive/t1/test [root@nightly-71x-vx-3 ~]# hdfs dfs -ls /warehouse/tablespace/external/hive/t1/ Found 8 items rw-rw---+ 3 hive supergroup 417 2023-01-27 17:37 /warehouse/tablespace/external/hive/t1/00_0 rw-rw---+ 3 hive supergroup 417 2023-01-27 17:44 /warehouse/tablespace/external/hive/t1/00_0_copy_1 rw-rw---+ 3 hive supergroup 417 2023-01-27 17:49 /warehouse/tablespace/external/hive/t1/00_0_copy_2 rw-rw---+ 3 hive supergroup 417 2023-01-27 17:53 /warehouse/tablespace/external/hive/t1/00_0_copy_3 rw-rw---+ 3 impala hive 355 2023-01-27 17:17 /warehouse/tablespace/external/hive/t1/4c4477c12c51ad96-3126b52d_2029811630_data.0.parq rw-rw---+ 3 impala hive 355 2023-01-27 17:39 /warehouse/tablespace/external/hive/t1/9945b25bb37d1ff2-473c1478_574471191_data.0.parq drwxrwx---+ - impala hive 0 2023-01-27 17:39 /warehouse/tablespace/external/hive/t1/_impala_insert_staging rw-rw---+ 3 impala supergroup 0 2023-01-27 18:01 /warehouse/tablespace/external/hive/t1/test{noformat} Reviewing the code[1], I traced the {{TAccessLevel}} to the catalogd. And if I add user impala to group supergroup on the catalogd host, this query will succeed past the authorization. Additionally, this query does not trip up during analysis when catalog v2 is enabled because the method {{getFirstLocationWithoutWriteAccess()}} is not implemented there yet and always returns null[2]. 
[1] [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L494-L504] [2] [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java#L295-L298] ~~ Ideally, when Ranger authorization is in place, we should: 1) Not check access level during analysis 2) Incorporate Ranger ACLs during analysis -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
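The getfacl output earlier in the report already hints at the mechanics: the named-user entry user:impala:rwx is reduced to #effective:r-x by the ACL mask entry. In POSIX.1e ACL semantics, the effective rights of a named-user or named-group entry are the entry's permissions ANDed with the mask. A minimal sketch of that rule:

```python
def effective_perms(entry_perms, mask_perms):
    """Effective rights of a named-user/group ACL entry: the bitwise AND
    of the entry's rwx string with the ACL mask (POSIX.1e semantics),
    which is what `hdfs dfs -getfacl` reports as #effective."""
    return "".join(
        c if c in entry_perms and c in mask_perms else "-" for c in "rwx"
    )
```

For the directory above, effective_perms("rwx", "r-x") reproduces the #effective:r-x shown by getfacl, which is why a permission check based only on the POSIX ACLs concludes that impala lacks WRITE access even though Ranger would allow it.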
[jira] [Updated] (IMPALA-11728) Set fallback database for functions
[ https://issues.apache.org/jira/browse/IMPALA-11728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-11728: - Fix Version/s: Impala 4.3.0 > Set fallback database for functions > --- > > Key: IMPALA-11728 > URL: https://issues.apache.org/jira/browse/IMPALA-11728 > Project: IMPALA > Issue Type: New Feature >Reporter: gaoxiaoqing >Assignee: gaoxiaoqing >Priority: Major > Fix For: Impala 4.3.0 > > > {code:java} > CREATE FUNCTION default.function_name([arg_type[, arg_type...]]) > RETURNS return_type > LOCATION 'hdfs_path_to_dot_so' > SYMBOL='symbol_name' {code} > > {noformat} > use functional; > select function_name(); > ERROR: AnalysisException: functional.function_name() unknown for database > functional.{noformat} > > A function created this way can only be used in the database it was > registered in. > Add a fallback database for functions as a query option, so the function > works in all databases without changing queries. > {noformat} > use functional; > set db_name_with_global_udf=default; > select function_name(); // It works.{noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
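The fallback proposed in IMPALA-11728 amounts to a two-step lookup: try the session's current database first, then the database named by the query option. The sketch below is illustrative only; the catalog shape and function names are hypothetical, not Impala's actual resolution code:

```python
def resolve_function(fn_name, current_db, fallback_db, catalog):
    """Resolve a function reference. `catalog` maps (db, fn) -> callable.
    A fully qualified name bypasses the fallback; an unqualified name is
    tried in the current database, then in the configured fallback."""
    if "." in fn_name:                      # fully qualified: no fallback
        db, fn = fn_name.split(".", 1)
        return catalog.get((db, fn))
    for db in (current_db, fallback_db):
        if db and (db, fn_name) in catalog:
            return catalog[(db, fn_name)]
    return None
```

With db_name_with_global_udf=default, an unqualified function_name() issued from the functional database falls through to default instead of failing analysis.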
[jira] [Resolved] (IMPALA-10986) Specific privilege should be required to execute a UDF in Impala
[ https://issues.apache.org/jira/browse/IMPALA-10986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao resolved IMPALA-10986. -- Fix Version/s: Impala 4.3.0 Resolution: Fixed Resolve this JIRA since the fix has been merged. > Specific privilege should be required to execute a UDF in Impala > > > Key: IMPALA-10986 > URL: https://issues.apache.org/jira/browse/IMPALA-10986 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 4.0.0 >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > Fix For: Impala 4.3.0 > > Attachments: ranger_policy_for_udfs_impala.png > > > We found that currently in Impala, to execute a UDF, a user only has to be > granted one of the 3 privileges in {{{}INSERT{}}}, {{{}SELECT{}}}, > {{REFRESH}} on the database (i.e., the {{VIEW_METADATA}} privilege on the > database) where the UDF was created. No additional privilege on the UDF is > required. An example of the policy added via Ranger's web UI allowing a user > to execute a UDF is also provided here. > !ranger_policy_for_udfs_impala.png! > The privilege request of {{VIEW_METADATA}} on the database is registered > within [analyzer.getDb(fnName_.getDb(), Privilege.VIEW_METADATA, > true)|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java#L557]. > This is the reason why the user has to be granted the VIEW_METADATA > privilege on the database to be able to execute the UDF. > Recall that the registration of the privilege mentioned above occurs in > [FunctionCallExpr#analyzeImpl()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java#L531] > where Impala's frontend analyzes the given function in a query. > I noticed in the same method above at > [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java#L535], > Impala is able to determine whether the current function is a UDF or not. 
> Thus it seems that to fix the problem, we need to additionally register a > corresponding privilege request for the UDF (vs. a built-in function) in > addition to the {{VIEW_METADATA}} privilege on the database. > We should thus provide a fix for the issue. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
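The direction described in the issue, requiring a privilege on the function itself on top of the database-level VIEW_METADATA privilege, can be sketched as below. Everything here is hypothetical (privilege-store shape, function names, and the choice of SELECT as the function-level privilege), not Impala's or Ranger's actual API:

```python
VIEW_METADATA = {"INSERT", "SELECT", "REFRESH"}  # any of these implies it

def can_execute_udf(user_privs, db, fn, is_builtin):
    """Built-ins stay callable by anyone; a UDF requires both a
    VIEW_METADATA-level privilege on its database and a privilege on
    the function itself. `user_privs` maps resource keys to privilege
    sets, e.g. {("db", "default"): {"SELECT"}}."""
    if is_builtin:
        return True
    has_db_priv = bool(user_privs.get(("db", db), set()) & VIEW_METADATA)
    has_fn_priv = "SELECT" in user_privs.get(("fn", db, fn), set())
    return has_db_priv and has_fn_priv
```

Under this scheme, a database-wide SELECT grant alone is no longer enough to run someone else's UDF, which is the gap the issue describes.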
[jira] [Resolved] (IMPALA-11728) Set fallback database for functions
[ https://issues.apache.org/jira/browse/IMPALA-11728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao resolved IMPALA-11728. -- Resolution: Fixed Resolve this JIRA since the fix has been merged. > Set fallback database for functions > --- > > Key: IMPALA-11728 > URL: https://issues.apache.org/jira/browse/IMPALA-11728 > Project: IMPALA > Issue Type: New Feature >Reporter: gaoxiaoqing >Assignee: gaoxiaoqing >Priority: Major > > {code:java} > CREATE FUNCTION default.function_name([arg_type[, arg_type...]]) > RETURNS return_type > LOCATION 'hdfs_path_to_dot_so' > SYMBOL='symbol_name' {code} > > {noformat} > use functional; > select function_name(); > ERROR: AnalysisException: functional.function_name() unknown for database > functional.{noformat} > > A function created this way can only be used in the database it was > registered in. > Add a fallback database for functions as a query option, so the function > works in all databases without changing queries. > {noformat} > use functional; > set db_name_with_global_udf=default; > select function_name(); // It works.{noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-8576) Pass lineage object instead of string to query hook
[ https://issues.apache.org/jira/browse/IMPALA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao reassigned IMPALA-8576: --- Assignee: Fang-Yu Rao > Pass lineage object instead of string to query hook > --- > > Key: IMPALA-8576 > URL: https://issues.apache.org/jira/browse/IMPALA-8576 > Project: IMPALA > Issue Type: Sub-task > Components: Backend, Frontend >Reporter: radford nguyen >Assignee: Fang-Yu Rao >Priority: Major > > The {{QueryEventHook}} interface currently takes a {{String}} for the > {{onQueryComplete}} hook. This string is the JSON representation of the > lineage graph written to the legacy lineage file. > It would be better to pass the serialized {{byte[]}} of the lineage thrift > object itself, so that we can decouple ourselves from any lineage file > format(s). > Additionally, hook implementations should use their own version of Thrift to > deserialize the object so that they are not tied to Impala's Thrift version. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org