[jira] [Created] (IMPALA-13028) libkudu_client.so is not stripped in the DEB/RPM packages
Quanlong Huang created IMPALA-13028: --- Summary: libkudu_client.so is not stripped in the DEB/RPM packages Key: IMPALA-13028 URL: https://issues.apache.org/jira/browse/IMPALA-13028 Project: IMPALA Issue Type: Bug Components: Infrastructure Reporter: Quanlong Huang The current DEB package is 611M on Ubuntu 18.04. Here are the top-10 largest files: {noformat} 14 MB ./opt/impala/lib/jars/hive-standalone-metastore-3.1.3000.7.2.18.0-369.jar 15 MB ./opt/impala/lib/jars/kudu-client-e742f86f6d.jar 20 MB ./opt/impala/lib/native/libstdc++.so.6.0.28 22 MB ./opt/impala/lib/jars/js-22.3.0.jar 29 MB ./opt/impala/lib/jars/iceberg-hive-runtime-1.3.1.7.2.18.0-369.jar 60 MB ./opt/impala/lib/jars/ozone-filesystem-hadoop3-1.3.0.7.2.18.0-369.jar 84 MB ./opt/impala/util/impala-profile-tool 85 MB ./opt/impala/sbin/impalad 175 MB ./opt/impala/lib/jars/impala-minimal-s3a-aws-sdk-4.4.0-SNAPSHOT.jar 188 MB ./opt/impala/lib/native/libkudu_client.so.0.1.0{noformat} It appears that we only strip binaries built by Impala, e.g. impalad and impala-profile-tool. libkudu_client.so.0.1.0 remains the same as the one in the toolchain folder. {code:bash} $ ll -th toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/release/lib/libkudu_client.so.0.1.0 -rw-r--r-- 1 quanlong quanlong 189M Oct 18 2023 toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/release/lib/libkudu_client.so.0.1.0 $ file toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/release/lib/libkudu_client.so.0.1.0 toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/release/lib/libkudu_client.so.0.1.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, with debug_info, not stripped{code} CC [~yx91490] [~boroknagyz] [~rizaon] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
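If the packaging step only strips Impala-built binaries, a post-processing pass over the package staging directory could catch unstripped toolchain libraries such as libkudu_client.so.0.1.0 as well. A minimal sketch, not the actual packaging code; it assumes `file` and `strip` are available on the build host:

```python
import os
import subprocess

def needs_strip(file_output: str) -> bool:
    """Decide from `file -b` output whether an ELF object still carries
    debug info, as libkudu_client.so.0.1.0 does in the report above."""
    return "ELF" in file_output and "not stripped" in file_output

def strip_elf_files(root: str, dry_run: bool = True):
    """Walk a staging dir and strip debug info from unstripped ELF files.
    With dry_run=True only reports which files would be stripped."""
    candidates = []
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            out = subprocess.run(["file", "-b", path],
                                 capture_output=True, text=True).stdout
            if needs_strip(out):
                if not dry_run:
                    subprocess.run(["strip", "--strip-debug", path], check=True)
                candidates.append(path)
    return candidates
```

`strip --strip-debug` keeps the dynamic symbol table intact, so the shared library remains usable while dropping the debug sections that account for most of the 188 MB.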
[jira] [Updated] (IMPALA-12918) Do not allow non-numeric values in Hive table stats during an alter table
[ https://issues.apache.org/jira/browse/IMPALA-12918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-12918: Labels: alter alter-table catalog-2024 newbie ramp-up stats validation (was: alter alter-table stats validation) > Do not allow non-numeric values in Hive table stats during an alter table > - > > Key: IMPALA-12918 > URL: https://issues.apache.org/jira/browse/IMPALA-12918 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Affects Versions: Impala 4.0.0 >Reporter: Miklos Szurap >Priority: Major > Labels: alter, alter-table, catalog-2024, newbie, ramp-up, > stats, validation > > Hive table properties are strings by nature; however, some of them have > special meaning and should have numeric values, like "totalSize", > "numRows", "rawDataSize". > Impala currently allows these to be set to non-numeric values (including > the empty string). > From certain applications (like Spark) we get quite obscure > "NumberFormatException" errors while trying to access such broken tables > (see SPARK-47444). > Impala should also validate "alter table" statements and not allow > non-numeric values in the "totalSize", "numRows", "rawDataSize" table > properties. > For example, a query which may break the table (after which it can't be read from > Spark): > {code} > [impalacoordinator:21000] default> alter table t1p set > tblproperties('numRows'='', 'STATS_GENERATED_VIA_STATS_TASK'='true'); > {code} > Note: beeline/Hive validates alter table statements for "numRows" and > "rawDataSize"; "totalSize" still needs validation there too.
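The requested check could look roughly like the following sketch. The property names come from the description; the function and where it would hook into the alter-table path are hypothetical:

```python
# Stats properties that must hold integer values (from the issue description).
NUMERIC_STATS_PROPERTIES = ("numRows", "totalSize", "rawDataSize")

def validate_stats_properties(props):
    """Return one error message per stats property whose value is not an
    integer; this rejects the empty-string value that broke the table above."""
    errors = []
    for key in NUMERIC_STATS_PROPERTIES:
        if key not in props:
            continue
        try:
            int(props[key])
        except ValueError:
            errors.append("Invalid value '%s' for table property '%s': "
                          "an integer is required" % (props[key], key))
    return errors
```

Run against the breaking query above, `validate_stats_properties({'numRows': '', 'STATS_GENERATED_VIA_STATS_TASK': 'true'})` would flag the empty `numRows` and the statement could be rejected before HMS is updated.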
[jira] [Updated] (IMPALA-12901) Add table property to inject delay in event processing
[ https://issues.apache.org/jira/browse/IMPALA-12901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-12901: Labels: catalog-2024 (was: ) > Add table property to inject delay in event processing > -- > > Key: IMPALA-12901 > URL: https://issues.apache.org/jira/browse/IMPALA-12901 > Project: IMPALA > Issue Type: Test > Components: Catalog >Reporter: Quanlong Huang >Priority: Major > Labels: catalog-2024 > > We have tests that verify behavior during event processing. We use a > global debug action like > "--debug_actions=catalogd_event_processing_delay:SLEEP@2000" to inject the > delay. > It'd be helpful to add a table property that does the same but only affects > processing events on that table, so we can control the delay more > precisely. > This was pointed out by [~csringhofer] during the review of > https://gerrit.cloudera.org/c/20986/8/tests/custom_cluster/test_web_pages.py#444
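One possible shape for such a per-table knob, sketched below. The property name is an assumption for illustration, not an existing Impala option; the event processor would read it from the table's parameters before processing each event:

```python
# Hypothetical property name; the real one would be chosen in the patch.
DELAY_PROPERTY = "impala.events.processing.delay.ms"

def event_delay_ms(table_params, default_ms=0):
    """Read a per-table event-processing delay from table properties,
    falling back to no delay on a missing or malformed value."""
    try:
        return int(table_params.get(DELAY_PROPERTY, default_ms))
    except ValueError:
        return default_ms
```

Unlike the global `--debug_actions` flag, a property like this would delay only events on the table that sets it, leaving the rest of the event stream unaffected.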
[jira] [Updated] (IMPALA-12829) Skip processing abort_txn and alloc_write_id_event events if the table is blacklisted
[ https://issues.apache.org/jira/browse/IMPALA-12829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-12829: Labels: catalog-2024 (was: ) > Skip processing abort_txn and alloc_write_id_event events if the table is > blacklisted > - > > Key: IMPALA-12829 > URL: https://issues.apache.org/jira/browse/IMPALA-12829 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Critical > Labels: catalog-2024 > > For transactional tables, the event processor does not skip abort_txn and > alloc_write_id_event events if the database/table is blacklisted. > This processing is unnecessary; skipping these events for blacklisted > transactional tables would reduce event-processor lag.
[jira] [Updated] (IMPALA-12856) IllegalStateException in processing RELOAD events due to malformed HMS Partition objects
[ https://issues.apache.org/jira/browse/IMPALA-12856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-12856: Labels: catalog-2024 (was: ) > IllegalStateException in processing RELOAD events due to malformed HMS > Partition objects > > > Key: IMPALA-12856 > URL: https://issues.apache.org/jira/browse/IMPALA-12856 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Quanlong Huang >Assignee: Sai Hemanth Gantasala >Priority: Critical > Labels: catalog-2024 > > When processing RELOAD events on partitions, catalogd fetch the Partition > objects from HMS. The returned Partition objects could be malformed which > causes an IllegalStateException and stops the event-processor. This was > observed when a partition is added and dropped in a loop. > {noformat} > E0229 15:19:27.945312 12668 MetastoreEventsProcessor.java:990] Unexpected > exception received while processing event > Java exception follows: > java.lang.IllegalStateException > at > com.google.common.base.Preconditions.checkState(Preconditions.java:496) > at > org.apache.impala.catalog.HdfsTable.getTypeCompatiblePartValues(HdfsTable.java:2598) > at > org.apache.impala.catalog.HdfsTable.reloadPartitionsFromNames(HdfsTable.java:2856) > at > org.apache.impala.service.CatalogOpExecutor.reloadPartitionsFromNamesIfExists(CatalogOpExecutor.java:4805) > at > org.apache.impala.service.CatalogOpExecutor.reloadPartitionsIfExist(CatalogOpExecutor.java:4742) > at > org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.reloadPartitions(MetastoreEvents.java:1050) > at > org.apache.impala.catalog.events.MetastoreEvents$ReloadEvent.processPartitionReload(MetastoreEvents.java:2941) > at > org.apache.impala.catalog.events.MetastoreEvents$ReloadEvent.processTableEvent(MetastoreEvents.java:2906) > at > org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.process(MetastoreEvents.java:1248) > at > 
org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:672) > at > org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1164) > at > org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:972) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:750) > E0229 15:19:27.963455 12668 MetastoreEventsProcessor.java:1251] Event id: > 8697728 > Event Type: RELOAD > Event time: 1709191166 > Database name: default > Table name: part_tbl > Event message: H4s{noformat} > The failed check is asserting the number of partition columns cached in > catalogd matches the number of partition values from the HMS object: > {code:java} > public List getTypeCompatiblePartValues(List values) { > List result = new ArrayList<>(); > List partitionColumns = getClusteringColumns(); > Preconditions.checkState(partitionColumns.size() == values.size()); // > This failed{code} > After adding some debug logs, I found the Partition obejct got from HMS had > an empty values list: > {noformat} > I0229 16:04:04.679625 25867 HdfsTable.java:2829] HMS Partition: > Partition(values:[], dbName:default, tableName:part_tbl, > createTime:1709193844, lastAccessTime:0, > sd:StorageDescriptor(cols:[FieldSchema(name:i, type:int, comment:null)], > location:hdf > s://localhost:20500/test-warehouse/part_tbl/p=1, > 
inputFormat:org.apache.hadoop.mapred.TextInputFormat, > outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, > compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null, serializ > ationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{}), > bucketCols:[], sortCols:[], parameters:{}, > skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], > skewedColValueLocationMaps:{}), storedAsSubDirectories:false), par > ameters:{}, catName:hive, writeId:0) > I0229 16:04:04.680133 25867 MetastoreEventsProcessor.java:1189] Time elapsed > in processing event batch: 17.145ms
[jira] [Updated] (IMPALA-10976) Sync db/table in catalogd to latest HMS event id for all DDLs from Impala shell
[ https://issues.apache.org/jira/browse/IMPALA-10976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-10976: Labels: catalog-2024 (was: ) > Sync db/table in catalogd to latest HMS event id for all DDLs from Impala > shell > --- > > Key: IMPALA-10976 > URL: https://issues.apache.org/jira/browse/IMPALA-10976 > Project: IMPALA > Issue Type: Bug > Components: Catalog, Frontend >Reporter: Sourabh Goyal >Assignee: Sai Hemanth Gantasala >Priority: Major > Labels: catalog-2024 > > This is a follow-up from IMPALA-10926. The idea is that when any DDL > operation is performed from Impala shell, it also syncs the db/table to its > latest event ID as per HMS. This way updates to a db/table are applied in > the same order as they appear in the Notification log in HMS, which ensures > consistency. Currently catalogD applies any updates received from Impala > shell in place. Instead it should perform an HMS operation first and then > replay all the HMS events since the last synced event. > However there are subtle differences in how Impala processes DDLs via shell > vs how it processes HMS events. These are: > * When processing an alter table event, currently catalogD does a full table > reload. This has a performance impact as table reload is time consuming. > Whereas the in-place alter table DDL operation in catalogOpExecutor (via Impala > shell) is faster since it detects when to reload the table schema or file metadata > or both. Some improvements are needed in the alter table event processing logic to > detect whether to reload the file metadata or not. --> This is addressed by > IMPALA-11534 > * A similar improvement is required in processing alter partition events. As of > now, when processing an AlterPartition HMS event, catalogd always reloads file > metadata but when doing the same from shell, it reloads metadata only when it > is required. > * Impala shell already caches hive fns in catalog db’s object. 
But catalogD > does *not* process CREATE/DROP function HMS events. > * When creating a db/table from Impala shell, if the operation fails because > the db/table already exists, then there is no reliable way in catalogd to > determine the create event id for that db/table. The create event is required so > that for any subsequent DDL operations, catalogd can process HMS events > starting from the createEventId.
[jira] [Updated] (IMPALA-12912) Show history of event processing in the /events page
[ https://issues.apache.org/jira/browse/IMPALA-12912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-12912: Labels: catalog-2024 (was: ) > Show history of event processing in the /events page > > > Key: IMPALA-12912 > URL: https://issues.apache.org/jira/browse/IMPALA-12912 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Reporter: Quanlong Huang >Assignee: Maxwell Guo >Priority: Major > Labels: catalog-2024 > > This is a follow-up task of IMPALA-12782 where we add some basic info in the > /events page. It'd be helpful to also show the history of event processing, > including the top-10 expensive events/tables, the recent 10 failure messages, > etc. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-11969) Move incremental statistics out of the partition params to reduce partition's memory footprint
[ https://issues.apache.org/jira/browse/IMPALA-11969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-11969: Labels: catalog-2024 (was: ) > Move incremental statistics out of the partition params to reduce partition's > memory footprint > -- > > Key: IMPALA-11969 > URL: https://issues.apache.org/jira/browse/IMPALA-11969 > Project: IMPALA > Issue Type: Improvement > Components: Catalog, Frontend >Affects Versions: Impala 4.2.0 >Reporter: Aman Sinha >Assignee: Sai Hemanth Gantasala >Priority: Major > Labels: catalog-2024 > > Impala stores incremental stats ([1] and [2]) within the partition params of > each partition object. The total bytes consumed by the incremental stats is > estimated as: > {noformat} > 200 bytes per column * num_columns * num_partitions bytes > {noformat} > This means that for wide tables the partition object can get bloated and for > a few thousand partitions, this adds up. It also affects Hive because the > same partition objects are fetched by Hive and can lead to memory pressure > even though Hive does not need those incremental stats. > The intent of the partition parameters in a Partition object was primarily to > keep certain properties or small objects. We should move incremental stats > out of the partition params and store them separately in HMS. There is a > PART_COL_STATS in HMS that could potentially be used (with some redesign of > the schema) or a separate table could be added. > Additional notes: > * Only the latest version of incremental stats is stored. They are stored in > partition parameters using keys like "impala_intermediate_stats_num_chunks" > and "impala_intermediate_stats_chunk0", "impala_intermediate_stats_chunk1", > "impala_intermediate_stats_chunk2", etc. The chunks are used to cap each > value size to be < 4000. [3] > * Impala compresses the incr stats of all columns into a byte array for each > partition. It'd be nice if HMS can compress them as well to save space. 
> [1] > https://github.com/apache/impala/blob/master/common/thrift/CatalogObjects.thrift#L261 > [2] > https://github.com/apache/impala/blob/master/common/thrift/CatalogObjects.thrift#L232 > [3] > https://github.com/apache/impala/blob/419aa2e30db326f02e9b4ec563ef7864e82df86e/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java#L182 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
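The chunked-key scheme described above can be sketched as follows. This is a rough model of the layout in partition parameters, not Impala's actual PartitionStatsUtil code; the key names are taken from the description:

```python
CHUNK_SIZE = 4000  # cap on each partition-parameter value, per the notes above

def chunk_incr_stats(blob: bytes) -> dict:
    """Split one partition's compressed incremental stats into HMS
    partition parameters, mirroring the impala_intermediate_stats_chunk*
    scheme described above."""
    params = {}
    chunks = [blob[i:i + CHUNK_SIZE] for i in range(0, len(blob), CHUNK_SIZE)]
    params["impala_intermediate_stats_num_chunks"] = str(len(chunks))
    for idx, chunk in enumerate(chunks):
        # latin-1 round-trips arbitrary bytes into a string value.
        params["impala_intermediate_stats_chunk%d" % idx] = chunk.decode("latin-1")
    return params
```

The sketch makes the memory problem concrete: every chunk lives inside the Partition object's parameters map, so anyone fetching the partition from HMS (including Hive, which never uses these stats) pays for the full blob.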
[jira] [Updated] (IMPALA-11531) EventProcessor Supportability & Observability
[ https://issues.apache.org/jira/browse/IMPALA-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-11531: Labels: 2023Q1 catalog-2024 (was: 2023Q1) > EventProcessor Supportability & Observability > - > > Key: IMPALA-11531 > URL: https://issues.apache.org/jira/browse/IMPALA-11531 > Project: IMPALA > Issue Type: Epic > Components: Catalog >Reporter: Quanlong Huang >Assignee: Sai Hemanth Gantasala >Priority: Critical > Labels: 2023Q1, catalog-2024 > > This Epic tracks items aim to improve the supportability and observability of > the event processor. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-12646) Add a new metric in CatalogD web UI metrics page for avg-process-time for dropped events
[ https://issues.apache.org/jira/browse/IMPALA-12646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-12646: Labels: catalog-2024 (was: ) > Add a new metric in CatalogD web UI metrics page for avg-process-time for > dropped events > > > Key: IMPALA-12646 > URL: https://issues.apache.org/jira/browse/IMPALA-12646 > Project: IMPALA > Issue Type: Task > Components: Catalog >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > Labels: catalog-2024 > > Since dropped databases/tables don't exist in the cache, currently we > don't show the process time for drop operations in the web UI metrics page. > It would be great to add a new metric such as 'avg-process-time' for all > drop database/table events so that admins can monitor this metric to see if > drop operations are slow. > Generally, a drop should be fast, but it takes metastoreDdlLock_, which can > block on HMS RPCs. So, it would be ideal to observe this operation in the web > UI.
[jira] [Updated] (IMPALA-12447) Update the validWriteId for transactional table during the truncate operation in Catalog API end point
[ https://issues.apache.org/jira/browse/IMPALA-12447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-12447: Labels: catalog-2024 (was: ) > Update the validWriteId for transactional table during the truncate operation > in Catalog API end point > -- > > Key: IMPALA-12447 > URL: https://issues.apache.org/jira/browse/IMPALA-12447 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > Labels: catalog-2024 > > In the external front-end mode, the catalog API endpoint > (MetastoreServiceHandler) handles all the query requests. For the truncate > table operation, the returned validWriteIdList does not include the updated > validWriteId needed to update catalogD's cache; as a result, the cache > holds a stale value and returns incorrect data/query results.
[jira] [Updated] (IMPALA-12607) Bump GBN to get HMS Thrift API change HIVE-27499
[ https://issues.apache.org/jira/browse/IMPALA-12607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-12607: Labels: catalog-2024 (was: ) > Bump GBN to get HMS Thrift API change HIVE-27499 > --- > > Key: IMPALA-12607 > URL: https://issues.apache.org/jira/browse/IMPALA-12607 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > Labels: catalog-2024 > > Leverage HIVE-27499, so that Impala can directly fetch the latest events > specific to the database/table from the metastore, instead of fetching all > events from the metastore and then filtering them by > DbName/TableName.
[jira] [Updated] (IMPALA-12712) INVALIDATE METADATA should set a better createEventId
[ https://issues.apache.org/jira/browse/IMPALA-12712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-12712: Labels: catalog-2024 (was: ) > INVALIDATE METADATA should set a better createEventId > - > > Key: IMPALA-12712 > URL: https://issues.apache.org/jira/browse/IMPALA-12712 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Quanlong Huang >Assignee: Sai Hemanth Gantasala >Priority: Critical > Labels: catalog-2024 > > "INVALIDATE METADATA <table>" can be used to bring up a table in Impala's > catalog cache if the table exists in HMS. For instance, when HMS event > processing is disabled, we can use it in Impala to bring up tables that are > created outside Impala. > The createEventId for such tables is always set to -1: > [https://github.com/apache/impala/blob/6ddd69c605d4c594e33fdd39a2ca888538b4b8d7/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2243-L2246] > This is problematic when event-processing is enabled. DropTable events and > RenameTable events use the createEventId to decide whether to remove the > table in the catalog cache. -1 will lead to always removing the table. Though it > might be added back shortly in follow-up CreateTable events, in the period > between them the table is missing in Impala, causing test failures like > IMPALA-12266. > A simpler way to reproduce the issue is creating a table in Hive and launching > Impala with a long event polling interval to mimic the delay on events. Note > that we start the Impala cluster after creating the table so Impala doesn't need to > process the CREATE_TABLE event. > {noformat} > hive> create table debug_tbl (i int); > bin/start-impala-cluster.py --catalogd_args=--hms_event_polling_interval_s=60 > {noformat} > Drop the table in Impala and recreate it in Hive, so it doesn't exist in the > catalog cache but exists in HMS. Run "INVALIDATE METADATA <table>" in Impala > to bring it up before the DROP_TABLE event comes. 
> {noformat} > impala> drop table debug_tbl; > hive> create table debug_tbl (i int, j int); > impala> invalidate metadata debug_tbl; > {noformat} > The table will be dropped by the DROP_TABLE event and then added back by the > CREATE_TABLE event. Shown in catalogd logs: > {noformat} > I0115 16:30:15.376713 3208 JniUtil.java:177] > 02457b6d5f174d1f:3bdeee14] Finished execDdl request: DROP_TABLE > default.debug_tbl issued by quanlong. Time spent: 417ms > I0115 16:30:23.390962 3208 CatalogServiceCatalog.java:2777] > 1840bd101f78d611:22079a5a] Invalidating table metadata: > default.debug_tbl > I0115 16:30:23.404150 3208 Table.java:234] > 1840bd101f78d611:22079a5a] createEventId_ for table: > default.debug_tbl set to: -1 > I0115 16:30:23.405138 3208 JniUtil.java:177] > 1840bd101f78d611:22079a5a] Finished resetMetadata request: INVALIDATE > TABLE default.debug_tbl issued by quanlong. Time spent: 17ms > I0115 16:30:55.108006 32760 MetastoreEvents.java:637] EventId: 8668853 > EventType: DROP_TABLE Successfully removed table default.debug_tbl > I0115 16:30:55.108459 32760 MetastoreEvents.java:637] EventId: 8668855 > EventType: CREATE_TABLE Successfully added table default.debug_tbl > {noformat} > CC [~VenuReddy], [~hemanth619] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
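The failure mode can be modeled with the createEventId comparison described above. This is a simplification for illustration, not the real check in catalogd:

```python
def removed_by_drop_event(create_event_id: int, drop_event_id: int) -> bool:
    """A DROP_TABLE event should only remove the cached table if the
    cached table predates the event. With createEventId fixed at -1,
    every DROP_TABLE event wins and the table is always removed."""
    return create_event_id < drop_event_id
```

A table brought up via INVALIDATE METADATA gets createEventId -1, so the stale DROP_TABLE event (id 8668853 in the log above) removes it; had the invalidate recorded a realistic createEventId newer than the pending drop, the comparison would fail and the table would survive.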
[jira] [Assigned] (IMPALA-12777) Fix tpcds/tpcds-q66.test
[ https://issues.apache.org/jira/browse/IMPALA-12777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Riza Suminto reassigned IMPALA-12777: - Assignee: Riza Suminto > Fix tpcds/tpcds-q66.test > > > Key: IMPALA-12777 > URL: https://issues.apache.org/jira/browse/IMPALA-12777 > Project: IMPALA > Issue Type: Test > Components: Frontend >Reporter: Riza Suminto >Assignee: Riza Suminto >Priority: Major > Labels: ramp-up > > tpcds/tpcds-q66.test appears to be mistakenly a copy of tpcds/tpcds-q61.test > with different predicate values. > [https://github.com/apache/impala/blob/00247c2/testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q66.test] > [https://github.com/apache/impala/blob/00247c2/testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q61.test] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work started] (IMPALA-13012) Completed queries write fails regularly under heavy load
[ https://issues.apache.org/jira/browse/IMPALA-13012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-13012 started by Michael Smith. -- > Completed queries write fails regularly under heavy load > > > Key: IMPALA-13012 > URL: https://issues.apache.org/jira/browse/IMPALA-13012 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.4.0 >Reporter: Michael Smith >Assignee: Michael Smith >Priority: Major > > Under heavy test load (running EE tests), Impala regularly fails to write > completed queries with errors like > {code} > W0411 19:11:07.764967 32713 workload-management.cc:435] failed to write > completed queries table="sys.impala_query_log" record_count="10001" > W0411 19:11:07.764983 32713 workload-management.cc:437] AnalysisException: > Exceeded the statement expression limit (25) > Statement has 370039 expressions. > {code} > After a few attempts, it floods logs with an error for each query that could > not be written > {code} > E0411 19:11:24.646953 32713 workload-management.cc:376] could not write > completed query table="sys.impala_query_log" > query_id="3142ceb1380b58e6:715b83d9" > {code} > This seems like poor default behavior. Options for addressing it: > # Decrease the default for {{query_log_max_queued}}. Inserts are pretty > constant at 37 expressions per entry. I'm not sure why that isn't 49, since > that's the number of columns we have; maybe some fields are frequently > omitted. I would cap {{query_log_max_queued}} to {{statement_expression_limit > / number_of_columns ~ 5100}}. > # Allow workload management to {{set statement_expression_limit}} higher > using a similar formula. This may be relatively safe as the expressions are > simple. > # Ideally we would skip expression parsing and construct TExecRequest > directly, but that's a much larger effort. 
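The cap suggested in option 1 follows from simple arithmetic. The 250,000 figure below is an assumption based on the default of the statement_expression_limit query option (the limit value in the quoted log line appears truncated):

```python
STATEMENT_EXPRESSION_LIMIT = 250_000  # assumed default of the query option
QUERY_LOG_COLUMNS = 49                # column count cited above
OBSERVED_EXPRS_PER_ENTRY = 37         # observed rate from the failed insert

def max_safe_queued(expr_limit: int, exprs_per_entry: int) -> int:
    """Largest completed-queries batch whose single INSERT statement
    stays under the expression limit."""
    return expr_limit // exprs_per_entry

# Conservative cap using the full column count, matching the ~5100 figure:
conservative_cap = max_safe_queued(STATEMENT_EXPRESSION_LIMIT, QUERY_LOG_COLUMNS)
```

By contrast, the failed batch of 10,001 entries at 37 expressions each needs roughly 370,000 expressions, well over the limit, which is consistent with the AnalysisException above.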
[jira] [Work started] (IMPALA-12997) test_query_log tests get stuck trying to write to the log
[ https://issues.apache.org/jira/browse/IMPALA-12997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-12997 started by Michael Smith. -- > test_query_log tests get stuck trying to write to the log > - > > Key: IMPALA-12997 > URL: https://issues.apache.org/jira/browse/IMPALA-12997 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Affects Versions: Impala 4.4.0 >Reporter: Michael Smith >Assignee: Michael Smith >Priority: Major > > In some test runs, most tests under test_query_log will start to fail on > various conditions like > {code} > custom_cluster/test_query_log.py:452: in > test_query_log_table_query_select_mt_dop > "impala-server.completed-queries.written", 1, 60) > common/impala_service.py:144: in wait_for_metric_value > self.__metric_timeout_assert(metric_name, expected_value, timeout) > common/impala_service.py:213: in __metric_timeout_assert > assert 0, assert_string > E AssertionError: Metric impala-server.completed-queries.written did not > reach value 1 in 60s. > E Dumping debug webpages in JSON format... > E Dumped memz JSON to > $IMPALA_HOME/logs/metric_timeout_diags_20240410_12:49:04/json/memz.json > E Dumped metrics JSON to > $IMPALA_HOME/logs/metric_timeout_diags_20240410_12:49:04/json/metrics.json > E Dumped queries JSON to > $IMPALA_HOME/logs/metric_timeout_diags_20240410_12:49:04/json/queries.json > E Dumped sessions JSON to > $IMPALA_HOME/logs/metric_timeout_diags_20240410_12:49:04/json/sessions.json > E Dumped threadz JSON to > $IMPALA_HOME/logs/metric_timeout_diags_20240410_12:49:04/json/threadz.json > E Dumped rpcz JSON to > $IMPALA_HOME/logs/metric_timeout_diags_20240410_12:49:04/json/rpcz.json > E Dumping minidumps for impalads/catalogds... 
> E Dumped minidump for Impalad PID 3680802 > E Dumped minidump for Impalad PID 3680805 > E Dumped minidump for Impalad PID 3680809 > E Dumped minidump for Catalogd PID 3680732 > {code} > or > {code} > custom_cluster/test_query_log.py:921: in test_query_log_ignored_sqls > assert len(sql_results.data) == 1, "query not found in completed queries > table" > E AssertionError: query not found in completed queries table > E assert 0 == 1 > E+ where 0 = len([]) > E+where [] = object at 0xa00cc350>.data > {code} > One symptom that seems related to this is INSERT operations into > sys.impala_query_log that start "UnregisterQuery()" but never finish (with > "Query successfully unregistered"). > We can identify cases like that with > {code} > for log in $(ag -l 'INSERT INTO sys.impala_query_log' impalad.*); do echo > $log; for qid in $(ag -o '[0-9a-f]*:[0-9a-f]*\] Analyzing query: INSERT INTO > sys.impala_query_log' $log | cut -d']' -f1); do if ! ag "Query successfully > unregistered: query_id=$qid" $log; then echo "$qid not unregistered"; fi; > done; done > {code} > A similar case may occur with creating the table too > {code} > for log in $(ag -l 'CREATE TABLE IF NOT EXISTS sys.impala_query_log' > impalad.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-*); > do QID=$(ag -o '[0-9a-f]*:[0-9a-f]*\] Analyzing query: INSERT INTO > sys.impala_query_log' $log | cut -d']' -f1); echo $log; ag "Query > successfully unregistered: query_id=$QID" $log; done > {code} > although these frequently fail because the test completes and shuts down > Impala before the CREATE TABLE query completes. 
> Tracking one of those cases led to catalogd errors that repeated for 1m27s > before the test suite restarted catalogd: > {code} > W0410 12:48:05.051760 3681790 Tasks.java:456] > 6647229faf7637d5:3ec7565b] Retrying task after failure: Waiting for > lock on table sys.impala_query_log > Java exception follows: > org.apache.iceberg.hive.MetastoreLock$WaitingForLockException: Waiting for > lock on table sys.impala_query_log > at > org.apache.iceberg.hive.MetastoreLock.lambda$acquireLock$1(MetastoreLock.java:217) > at > org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413) > at > org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219) > at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203) > at > org.apache.iceberg.hive.MetastoreLock.acquireLock(MetastoreLock.java:209) > at org.apache.iceberg.hive.MetastoreLock.lock(MetastoreLock.java:146) > at > org.apache.iceberg.hive.HiveTableOperations.doCommit(HiveTableOperations.java:194) > at > org.apache.iceberg.BaseMetastoreTableOperations.commit(BaseMetastoreTableOperations.java:135) > at > org.apache.iceberg.BaseTransaction.lambda$commitSimpleTransaction$3(BaseTransaction.java:417) > at >
[jira] [Comment Edited] (IMPALA-12997) test_query_log tests get stuck trying to write to the log
[ https://issues.apache.org/jira/browse/IMPALA-12997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839873#comment-17839873 ] Michael Smith edited comment on IMPALA-12997 at 4/22/24 10:35 PM: -- Ok, the test prior to failure - {{test_query_log_table_query_select_dedicate_coordinator}} - uses {{impalad_graceful_shutdown=False}}, which presumably then causes {{test_query_log_table_query_select_mt_dop}} to fail. was (Author: JIRAUSER288956): Ok, I see logs showing that {{INSERT INTO sys.impala_query_log}} was not completed when SIGTERM is sent in {{test_query_log_table_query_select_dedicate_coordinator}}, which then causes the next test - {{test_query_log_table_query_select_mt_dop}} - to fail. {code} I0420 18:46:51.047513 3677147 Frontend.java:2138] ed4a6e980fa2edd3:ad98a48b] Analyzing query: INSERT INTO sys.impala_query_log(cluster_id,query_id,session_id,session_type,hiveserver2_protocol_version,db_user,db_user_connection,db_name,impala_coordinator,query_status,query_state,impala_query_end_state,query_type,network_address,start_time_utc,total_time_ms,query_opts_config,resource_pool,per_host_mem_estimate,dedicated_coord_mem_estimate,per_host_fragment_instances,backends_count,admission_result,cluster_memory_admitted,executor_group,executor_groups,exec_summary,num_rows_fetched,row_materialization_rows_per_sec,row_materialization_time_ms,compressed_bytes_spilled,event_planning_finished,event_submit_for_admission,event_completed_admission,event_all_backends_started,event_rows_available,event_first_row_fetched,event_last_row_fetched,event_unregister_query,read_io_wait_total_ms,read_io_wait_mean_ms,bytes_read_cache_total,bytes_read_total,pernode_peak_mem_min,pernode_peak_mem_max,pernode_peak_mem_mean,sql,plan,tables_queried) VALUES 
('','274d6f56fe45a17a:c57fd8e7','e942432cb76ae2fc:ff28eafb824adb9f','BEESWAX','','jenkins','jenkins','default','impala-ec2-rhel88-m7g-4xlarge-ondemand-124c.vpc.cloudera.com:27000','OK','FINISHED','FINISHED','QUERY','127.0.0.1:58332',UNIX_MICROS_TO_UTC_TIMESTAMP(1713664009827089),CAST(518.503 AS DECIMAL(18,3)),'TIMEZONE=America/Los_Angeles,CLIENT_IDENTIFIER=custom_cluster/test_query_log.py::TestQueryLogTableBeeswax::()::test_query_log_table_query_select_dedicate_coordinator[protocol:beeswax|exec_option:{\'test_replan\':1\;\'batch_size\':0\;\'num_nodes\':0\;\'disable_codegen_rows_threshold\':5000\;\'disable_codegen\':False','default-pool',37879808,109068288,'impala-ec2-rhel88-m7g-4xlarge-ondemand-124c.vpc.cloudera.com:27000=1,impala-ec2-rhel88-m7g-4xlarge-ondemand-124c.vpc.cloudera.com:27002=1',2,'Admitted immediately',146948096,'default','Executor group 1:\n Verdict: Match\n - MemoryAsk: 36.12 MB (37879808)\n - MemoryMax: 8589934592.00 GB (9223372036854775807)','Operator #Hosts #Inst Avg Time Max Time #Rows Est. #Rows Peak Mem Est. 
Peak Mem Detail \n--\nF01:ROOT 1 1 0.000ns 0.000ns 4.01 MB 4.00 MB \n01:EXCHANGE1 1 0.000ns 0.000ns 3 2 16.00 KB 16.00 KB UNPARTITIONED \nF00:EXCHANGE SENDER1 1 0.000ns 0.000ns 5.34 KB 112.00 KB \n00:SCAN HDFS 1 1 60.001ms 60.001ms 3 2 99.00 KB 32.00 MB functional.tinytable',3,0,CAST(0 AS DECIMAL(18,3)),0,CAST(360.008 AS DECIMAL(18,3)),CAST(360.008 AS DECIMAL(18,3)),CAST(360.008 AS DECIMAL(18,3)),CAST(380.008 AS DECIMAL(18,3)),CAST(470.01 AS DECIMAL(18,3)),CAST(520.011 AS DECIMAL(18,3)),CAST(520.011 AS DECIMAL(18,3)),CAST(520.011 AS DECIMAL(18,3)),CAST(60.0013 AS DECIMAL(18,3)),CAST(60.0013 AS DECIMAL(18,3)),0,38,159352,4247552,2203452,'select * from functional.tinytable','\nMax Per-Host Resource Reservation: Memory=4.01MB Threads=3\nPer-Host Resource Estimates: Memory=36MB\nDedicated Coordinator Resource Estimate: Memory=104MB\nWARNING: The following tables are missing relevant table and/or column statistics.\nfunctional.tinytable\nAnalyzed query: SELECT * FROM functional.tinytable\n\nF01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1\n| Per-Host Resources: mem-estimate=4.02MB mem-reservation=4.00MB thread-reservation=1\nPLAN-ROOT SINK\n| output exprs: functional.tinytable.a, functional.tinytable.b\n| mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0\n|\n01:EXCHANGE [UNPARTITIONED]\n| mem-estimate=16.00KB mem-reservation=0B thread-reservation=0\n| tuple-ids=0 row-size=24B cardinality=2\n| in pipelines: 00(GETNEXT)\n|\nF00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1\nPer-Host Resources: mem-estimate=32.11MB mem-reservation=8.00KB
[jira] [Assigned] (IMPALA-13027) Directly create exec request for query log inserts
[ https://issues.apache.org/jira/browse/IMPALA-13027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Smith reassigned IMPALA-13027: -- Assignee: Jason Fehr > Directly create exec request for query log inserts > -- > > Key: IMPALA-13027 > URL: https://issues.apache.org/jira/browse/IMPALA-13027 > Project: IMPALA > Issue Type: Improvement >Reporter: Michael Smith >Assignee: Jason Fehr >Priority: Major > > Inserts into {{sys.impala_query_log}} can be large, as they include the query > profile and currently 49 columns for each query. Right now inserts are > executed via an SQL statement, which needs to be parsed by Impala's Frontend. > This can take a non-trivial amount of time - on the order of seconds - when > reaching the query queue limit (1000s of queries), and also require > increasing the {{statement_expression_limit}} above 250k. > We could improve this by directly constructing the TExecRequest for the > insert in InternalServer (likely via a new function API). This would also let > us avoid needing to escape strings in the SQL statement by directly > constructing the relevant type objects. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-13027) Directly create exec request for query log inserts
Michael Smith created IMPALA-13027: -- Summary: Directly create exec request for query log inserts Key: IMPALA-13027 URL: https://issues.apache.org/jira/browse/IMPALA-13027 Project: IMPALA Issue Type: Improvement Reporter: Michael Smith Inserts into {{sys.impala_query_log}} can be large, as they include the query profile and currently 49 columns for each query. Right now inserts are executed via an SQL statement, which needs to be parsed by Impala's Frontend. This can take a non-trivial amount of time - on the order of seconds - when reaching the query queue limit (1000s of queries), and also require increasing the {{statement_expression_limit}} above 250k. We could improve this by directly constructing the TExecRequest for the insert in InternalServer (likely via a new function API). This would also let us avoid needing to escape strings in the SQL statement by directly constructing the relevant type objects. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
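The escaping cost described above can be illustrated with a small sketch. This is not Impala code: `sql_escape` is a hypothetical minimal escaper, and a plain dict stands in for the Thrift TExecRequest that the ticket proposes constructing directly.

```python
# When the query-log INSERT is built as SQL text, every string column
# (including the full profile and plan) must be escaped and the whole
# statement re-parsed by the Frontend. A structured request carries the
# values verbatim, with no escaping or parsing step.

def sql_escape(value):
    # Minimal escaping for a single-quoted SQL literal (illustrative only).
    return value.replace("\\", "\\\\").replace("'", "\\'")

def insert_as_sql(columns, row):
    vals = ", ".join("'%s'" % sql_escape(v) for v in row)
    return "INSERT INTO sys.impala_query_log(%s) VALUES (%s)" % (
        ", ".join(columns), vals)

def insert_as_request(columns, row):
    # Stand-in for building a TExecRequest directly: values stay as-is.
    return {"target": "sys.impala_query_log", "row": dict(zip(columns, row))}
```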
[jira] [Commented] (IMPALA-12938) test_no_inaccessible_objects failed in JDK11 build
[ https://issues.apache.org/jira/browse/IMPALA-12938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839696#comment-17839696 ] ASF subversion and git services commented on IMPALA-12938: -- Commit b754a9494da4da56bf215276c206dba3b34cf539 in impala's branch refs/heads/branch-4.4.0 from Michael Smith [ https://gitbox.apache.org/repos/asf?p=impala.git;h=b754a9494 ] IMPALA-12938: add-opens for platform.cgroupv1 Adds '--add-opens=jdk.internal.platform.cgroupv1' for Java 11 with ehcache, covering Impala daemons and frontend tests. Fixes InaccessibleObjectException detected by test_banned_log_messages.py. Change-Id: I312ae987c17c6f06e1ffe15e943b1865feef6b82 Reviewed-on: http://gerrit.cloudera.org:8080/21334 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > test_no_inaccessible_objects failed in JDK11 build > -- > > Key: IMPALA-12938 > URL: https://issues.apache.org/jira/browse/IMPALA-12938 > Project: IMPALA > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Assignee: Michael Smith >Priority: Major > Labels: broken-build > Fix For: Impala 4.4.0 > > > h3. Error Message > {noformat} > AssertionError: > /data/jenkins/workspace/impala-asf-master-core-jdk11/repos/Impala/logs/custom_cluster_tests/impalad.impala-ec2-centos79-m6i-4xlarge-xldisk-197f.vpc.cloudera.com.jenkins.log.INFO.20240323-184351.16035 > contains 'InaccessibleObjectException' assert 0 == 1{noformat} > h3. 
Stacktrace > {noformat} > verifiers/test_banned_log_messages.py:40: in test_no_inaccessible_objects > self.assert_message_absent('InaccessibleObjectException') > verifiers/test_banned_log_messages.py:36: in assert_message_absent > assert returncode == 1, "%s contains '%s'" % (log_file_path, message) > E AssertionError: > /data/jenkins/workspace/impala-asf-master-core-jdk11/repos/Impala/logs/custom_cluster_tests/impalad.impala-ec2-centos79-m6i-4xlarge-xldisk-197f.vpc.cloudera.com.jenkins.log.INFO.20240323-184351.16035 > contains 'InaccessibleObjectException' > E assert 0 == 1{noformat} > h3. Standard Output > {noformat} > java.lang.reflect.InaccessibleObjectException: Unable to make field private > jdk.internal.platform.cgroupv1.CgroupV1MemorySubSystemController > jdk.internal.platform.cgroupv1.CgroupV1Subsystem.memory accessible: module > java.base does not "opens jdk.internal.platform.cgroupv1" to unnamed module > @1a2e2935 > java.lang.reflect.InaccessibleObjectException: Unable to make field private > jdk.internal.platform.cgroupv1.CgroupV1SubsystemController > jdk.internal.platform.cgroupv1.CgroupV1Subsystem.cpu accessible: module > java.base does not "opens jdk.internal.platform.cgroupv1" to unnamed module > @1a2e2935 > java.lang.reflect.InaccessibleObjectException: Unable to make field private > jdk.internal.platform.cgroupv1.CgroupV1SubsystemController > jdk.internal.platform.cgroupv1.CgroupV1Subsystem.cpuacct accessible: module > java.base does not "opens jdk.internal.platform.cgroupv1" to unnamed module > @1a2e2935 > java.lang.reflect.InaccessibleObjectException: Unable to make field private > jdk.internal.platform.cgroupv1.CgroupV1SubsystemController > jdk.internal.platform.cgroupv1.CgroupV1Subsystem.cpuset accessible: module > java.base does not "opens jdk.internal.platform.cgroupv1" to unnamed module > @1a2e2935 > java.lang.reflect.InaccessibleObjectException: Unable to make field private > jdk.internal.platform.cgroupv1.CgroupV1SubsystemController > 
jdk.internal.platform.cgroupv1.CgroupV1Subsystem.blkio accessible: module > java.base does not "opens jdk.internal.platform.cgroupv1" to unnamed module > @1a2e2935 > java.lang.reflect.InaccessibleObjectException: Unable to make field private > jdk.internal.platform.cgroupv1.CgroupV1SubsystemController > jdk.internal.platform.cgroupv1.CgroupV1Subsystem.pids accessible: module > java.base does not "opens jdk.internal.platform.cgroupv1" to unnamed module > @1a2e2935 > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-13004) heap-use-after-free error in ExprTest AiFunctionsTest
[ https://issues.apache.org/jira/browse/IMPALA-13004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839692#comment-17839692 ] ASF subversion and git services commented on IMPALA-13004: -- Commit a4a755d173822d3e123a871ffad6203ca98ff9f5 in impala's branch refs/heads/branch-4.4.0 from Yida Wu [ https://gitbox.apache.org/repos/asf?p=impala.git;h=a4a755d17 ] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest The issue is that the code previously used a std::string_view to hold the data which is actually returned by rapidjson::Document. However, the rapidjson::Document object gets destroyed after creating the std::string_view. This meant the std::string_view referenced memory that was no longer valid, leading to a heap-use-after-free error. This patch fixes this issue by modifying the function to return a std::string instead of a std::string_view. When the function returns a string, it creates a copy of the data from rapidjson::Document. This ensures the returned string has its own memory allocation and doesn't rely on the destroyed rapidjson::Document. Tests: Reran the asan build and passed. 
Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5 Reviewed-on: http://gerrit.cloudera.org:8080/21315 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > heap-use-after-free error in ExprTest AiFunctionsTest > - > > Key: IMPALA-13004 > URL: https://issues.apache.org/jira/browse/IMPALA-13004 > Project: IMPALA > Issue Type: Bug > Components: be >Affects Versions: Impala 4.4.0 >Reporter: Andrew Sherman >Assignee: Yida Wu >Priority: Critical > Fix For: Impala 4.4.0 > > > In an ASAN test, expr-test fails: > {code} > ==1601==ERROR: AddressSanitizer: heap-use-after-free on address > 0x63100152c826 at pc 0x0298f841 bp 0x7ffc91fff460 sp 0x7ffc91fff458 > READ of size 2 at 0x63100152c826 thread T0 > #0 0x298f840 in rapidjson::GenericValue, > rapidjson::MemoryPoolAllocator >::GetType() const > /data/jenkins/workspace/impala-cdw-master-core-asan/Impala-Toolchain/toolchain-packages-gcc10.4.0/rapidjson-1.1.0/include/rapidjson/document.h:936:62 > #1 0x298d852 in bool rapidjson::GenericValue, > rapidjson::MemoryPoolAllocator > >::Accept, > rapidjson::CrtAllocator>, rapidjson::UTF8, rapidjson::UTF8, > rapidjson::CrtAllocator, 0u> > >(rapidjson::Writer, > rapidjson::CrtAllocator>, rapidjson::UTF8, rapidjson::UTF8, > rapidjson::CrtAllocator, 0u>&) const > /data/jenkins/workspace/impala-cdw-master-core-asan/Impala-Toolchain/toolchain-packages-gcc10.4.0/rapidjson-1.1.0/include/rapidjson/document.h:1769:16 > #2 0x298d8d0 in bool rapidjson::GenericValue, > rapidjson::MemoryPoolAllocator > >::Accept, > rapidjson::CrtAllocator>, rapidjson::UTF8, rapidjson::UTF8, > rapidjson::CrtAllocator, 0u> > >(rapidjson::Writer, > rapidjson::CrtAllocator>, rapidjson::UTF8, rapidjson::UTF8, > rapidjson::CrtAllocator, 0u>&) const > /data/jenkins/workspace/impala-cdw-master-core-asan/Impala-Toolchain/toolchain-packages-gcc10.4.0/rapidjson-1.1.0/include/rapidjson/document.h:1790:21 > #3 0x298d9e8 in bool rapidjson::GenericValue, > rapidjson::MemoryPoolAllocator > >::Accept, > 
rapidjson::CrtAllocator>, rapidjson::UTF8, rapidjson::UTF8, > rapidjson::CrtAllocator, 0u> > >(rapidjson::Writer, > rapidjson::CrtAllocator>, rapidjson::UTF8, rapidjson::UTF8, > rapidjson::CrtAllocator, 0u>&) const > /data/jenkins/workspace/impala-cdw-master-core-asan/Impala-Toolchain/toolchain-packages-gcc10.4.0/rapidjson-1.1.0/include/rapidjson/document.h:1781:21 > #4 0x28a0707 in impala_udf::StringVal > impala::AiFunctions::AiGenerateTextInternal(impala_udf::FunctionContext*, > impala_udf::StringVal const&, impala_udf::StringVal const&, > impala_udf::StringVal const&, impala_udf::StringVal const&, > impala_udf::StringVal const&, bool) > /data/jenkins/workspace/impala-cdw-master-core-asan/repos/Impala/be/src/exprs/ai-functions.inline.h:140:11 > #5 0x286087e in impala::ExprTest_AiFunctionsTest_Test::TestBody() > /data/jenkins/workspace/impala-cdw-master-core-asan/repos/Impala/be/src/exprs/expr-test.cc:11254:12 > #6 0x8aeaa4c in void > testing::internal::HandleExceptionsInMethodIfSupported void>(testing::Test*, void (testing::Test::*)(), char const*) > (/data0/jenkins/workspace/impala-cdw-master-core-asan/repos/Impala/be/build/debug/service/unifiedbetests+0x8aeaa4c) > #7 0x8ae3ec4 in testing::Test::Run() > (/data0/jenkins/workspace/impala-cdw-master-core-asan/repos/Impala/be/build/debug/service/unifiedbetests+0x8ae3ec4) > #8 0x8ae4007 in testing::TestInfo::Run() > (/data0/jenkins/workspace/impala-cdw-master-core-asan/repos/Impala/be/build/debug/service/unifiedbetests+0x8ae4007) > #9 0x8ae40e4 in
[jira] [Commented] (IMPALA-13016) Fix ambiguous row_regex that check for no-existence
[ https://issues.apache.org/jira/browse/IMPALA-13016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839695#comment-17839695 ] ASF subversion and git services commented on IMPALA-13016: -- Commit af31854a265e54111e6caf870ca28753038d5bde in impala's branch refs/heads/branch-4.4.0 from Riza Suminto [ https://gitbox.apache.org/repos/asf?p=impala.git;h=af31854a2 ] IMPALA-13016: Fix ambiguous row_regex that check for no-existence There are a few row_regex patterns used in EE test files that are ambiguous on whether a pattern does not exist in all parts of the results/runtime profile or at least one row does not have that pattern. These were caught by grepping for the following pattern: $ git grep -n "row_regex: (?\!" This patch replaces them either with !row_regex or a VERIFY_IS_NOT_IN comment. Testing: - Run and pass modified tests. Change-Id: Ic81de34bf997dfaf1c199b1fe1b05346b55ff4da Reviewed-on: http://gerrit.cloudera.org:8080/21333 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Fix ambiguous row_regex that check for no-existence > --- > > Key: IMPALA-13016 > URL: https://issues.apache.org/jira/browse/IMPALA-13016 > Project: IMPALA > Issue Type: Improvement > Components: Infrastructure >Reporter: Riza Suminto >Assignee: Riza Suminto >Priority: Minor > Fix For: Impala 4.4.0 > > > There are a few row_regex patterns used in EE test files that are ambiguous on > whether a pattern does not exist in all parts of the results/runtime profile or at > least one row does not have that pattern: > {code:java} > $ git grep -n "row_regex: (?\!" 
> testdata/workloads/functional-query/queries/QueryTest/acid-clear-statsaccurate.test:34:row_regex: > (?!.*COLUMN_STATS_ACCURATE) > testdata/workloads/functional-query/queries/QueryTest/acid-truncate.test:47:row_regex: > (?!.*COLUMN_STATS_ACCURATE) > testdata/workloads/functional-query/queries/QueryTest/clear-statsaccurate.test:28:row_regex: > (?!.*COLUMN_STATS_ACCURATE) > testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-directed-mode.test:14:row_regex: > (?!.*F03:JOIN BUILD.*) {code} > They should be replaced either with !row_regex or VERIFY_IS_NOT_IN comment. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
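The ambiguity can be made concrete with a short regex demo. This is an illustration only: the row values below are made up, and a start-of-row anchor is assumed for the per-row check.

```python
import re

# A row_regex like "(?!.*COLUMN_STATS_ACCURATE)" matches any single row
# that does not contain the string. Checked row-at-a-time, it passes as
# soon as one "clean" row exists, even though another row still contains
# the pattern. That is not the same as "absent from every row".
rows = ["numRows=3", "COLUMN_STATS_ACCURATE={...}"]

per_row_hits = [re.search(r"^(?!.*COLUMN_STATS_ACCURATE)", row) for row in rows]
some_row_matches = any(per_row_hits)   # the lookahead matches row 0
absent_everywhere = not any("COLUMN_STATS_ACCURATE" in row for row in rows)
```

Here `some_row_matches` is true while `absent_everywhere` is false, which is exactly the discrepancy !row_regex / VERIFY_IS_NOT_IN removes.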
[jira] [Commented] (IMPALA-12963) Testcase test_query_log_table_lower_max_sql_plan failed in ubsan builds
[ https://issues.apache.org/jira/browse/IMPALA-12963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839683#comment-17839683 ] ASF subversion and git services commented on IMPALA-12963: -- Commit b342710f574a0b49a123f774137cf5298844fbac in impala's branch refs/heads/branch-4.4.0 from Michael Smith [ https://gitbox.apache.org/repos/asf?p=impala.git;h=b342710f5 ] IMPALA-12963: Return parent PID when children spawned Returns the original PID for a command rather than any children that may be active. This happens during graceful shutdown in UBSAN tests. Also updates 'kill' to use the version of 'get_pid' that logs details to help with debugging. Moves try block in test_query_log.py to after client2 has been initialized. Removes 'drop table' on unique_database, since test suite already handles cleanup. Change-Id: I214e79507c717340863d27f68f6ea54c169e4090 Reviewed-on: http://gerrit.cloudera.org:8080/21278 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Testcase test_query_log_table_lower_max_sql_plan failed in ubsan builds > --- > > Key: IMPALA-12963 > URL: https://issues.apache.org/jira/browse/IMPALA-12963 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.4.0 >Reporter: Yida Wu >Assignee: Michael Smith >Priority: Major > Fix For: Impala 4.4.0 > > > Testcase test_query_log_table_lower_max_sql_plan failed in ubsan builds with > following messages: > *Error Message* > {code:java} > test setup failure > {code} > *Stacktrace* > {code:java} > common/custom_cluster_test_suite.py:226: in teardown_method > impalad.wait_for_exit() > common/impala_cluster.py:471: in wait_for_exit > while self.__get_pid() is not None: > common/impala_cluster.py:414: in __get_pid > assert len(pids) < 2, "Expected single pid but found %s" % ", > ".join(map(str, pids)) > E AssertionError: Expected single pid but found 892, 31942 > {code} > *Standard Error* > {code:java} > -- 2024-03-28 04:21:44,105 INFO MainThread: 
Starting cluster with > command: > /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/bin/start-impala-cluster.py > '--state_store_args=--statestore_update_frequency_ms=50 > --statestore_priority_update_frequency_ms=50 > --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 > --log_dir=/data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests > --log_level=1 '--impalad_args=--enable_workload_mgmt > --query_log_write_interval_s=1 --cluster_id=test_max_select > --shutdown_grace_period_s=10 --shutdown_deadline_s=60 > --query_log_max_sql_length=2000 --query_log_max_plan_length=2000 ' > '--state_store_args=None ' '--catalogd_args=--enable_workload_mgmt ' > --impalad_args=--default_query_options= > 04:21:44 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es) > 04:21:44 MainThread: Starting State Store logging to > /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/statestored.INFO > 04:21:44 MainThread: Starting Catalog Service logging to > /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/catalogd.INFO > 04:21:44 MainThread: Starting Impala Daemon logging to > /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/impalad.INFO > 04:21:44 MainThread: Starting Impala Daemon logging to > /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO > 04:21:44 MainThread: Starting Impala Daemon logging to > /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO > 04:21:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) > 04:21:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) > 04:21:47 MainThread: Getting num_known_live_backends from > 
impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000 > 04:21:47 MainThread: Waiting for num_known_live_backends=3. Current value: 0 > 04:21:48 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) > 04:21:48 MainThread: Getting num_known_live_backends from > impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000 > 04:21:48 MainThread: Waiting for num_known_live_backends=3. Current value: 0 > 04:21:49 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) > 04:21:49 MainThread: Getting num_known_live_backends from > impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000 > 04:21:49 MainThread: Waiting for num_known_live_backends=3. Current value: 2 > 04:21:50 MainThread:
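The fix idea in the commit message ("return the original PID rather than any children") can be sketched as follows. This is not the actual impala_cluster.py code: process enumeration is omitted, and the (pid, ppid) pairs are assumed to come from something like `ps -o pid,ppid`.

```python
# When graceful shutdown forks helper children, a name-based pid lookup
# can return both parent and child (e.g. 892 and 31942 in the assertion
# above). Given (pid, ppid) pairs for the matching processes, keep only
# those whose parent is NOT itself a match: that is the original command.
def parent_pids(matches):
    pids = {pid for pid, _ in matches}
    return sorted(pid for pid, ppid in matches if ppid not in pids)
```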
[jira] [Commented] (IMPALA-12679) test_rows_sent_counters failed to match RPCCount
[ https://issues.apache.org/jira/browse/IMPALA-12679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839686#comment-17839686 ] ASF subversion and git services commented on IMPALA-12679: -- Commit 9190a28870c727f9c260cc245eea1c153ee11ff2 in impala's branch refs/heads/branch-4.4.0 from Kurt Deschler [ https://gitbox.apache.org/repos/asf?p=impala.git;h=9190a2887 ] IMPALA-12679: Improve test_rows_sent_counters assert This patch changes the assert for failed test test_rows_sent_counters so that the actual RPC count is displayed in the assert output. The root cause of the failure will be addressed once sufficient data is collected with the new output. Testing: Ran test_rows_sent_counters with modified expected RPC count range to simulate failure. Change-Id: Ic6b48cf4039028e749c914ee60b88f04833a0069 Reviewed-on: http://gerrit.cloudera.org:8080/21310 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > test_rows_sent_counters failed to match RPCCount > > > Key: IMPALA-12679 > URL: https://issues.apache.org/jira/browse/IMPALA-12679 > Project: IMPALA > Issue Type: Bug >Reporter: Michael Smith >Assignee: Kurt Deschler >Priority: Major > > {code} > query_test.test_fetch.TestFetch.test_rows_sent_counters[protocol: beeswax | > exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > parquet/none] > {code} > failed with > {code} > query_test/test_fetch.py:69: in test_rows_sent_counters > assert re.search("RPCCount: [5-9]", runtime_profile) > E assert None > E+ where None = ('RPCCount: [5-9]', > 'Query (id=c8476e5c065757bf:b4367698):\n DEBUG MODE WARNING: Query > profile created while running a DEBUG buil...: 0.000ns\n - > WriteIoBytes: 0\n - WriteIoOps: 0 (0)\n - > WriteIoWaitTime: 0.000ns\n') > E+where = re.search > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To 
unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
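The shape of the improved assert can be sketched like this. The helper name and the expected range are illustrative; the point is that extracting the value first makes the failure message show the observed count instead of `assert None`.

```python
import re

# Instead of asserting on re.search directly (which fails with just
# "assert None"), pull out the actual RPCCount so the assert message
# carries the observed value for later debugging.
def check_rpc_count(runtime_profile, lo=5, hi=9):
    m = re.search(r"RPCCount: (\d+)", runtime_profile)
    assert m, "RPCCount not found in profile"
    count = int(m.group(1))
    assert lo <= count <= hi, "unexpected RPCCount: %d" % count
    return count
```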
[jira] [Commented] (IMPALA-12933) Catalogd should set eventTypeSkipList when fetching specific events for a table
[ https://issues.apache.org/jira/browse/IMPALA-12933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839699#comment-17839699 ] ASF subversion and git services commented on IMPALA-12933: -- Commit 0767d656ef00a381441fdcc3ebb3f146fb0d179c in impala's branch refs/heads/branch-4.4.0 from stiga-huang [ https://gitbox.apache.org/repos/asf?p=impala.git;h=0767d656e ] IMPALA-12933: Avoid fetching unneccessary events of unwanted types There are several places where catalogd will fetch all events of a specific type on a table. E.g. in TableLoader#load(), if the table has an old createEventId, catalogd will fetch all CREATE_TABLE events after that createEventId on the table. Fetching the list of events is expensive since the filtering is done on client side, i.e. catalogd fetches all events and filter them locally based on the event type and table name. This could take hours if there are lots of events (e.g 1M) in HMS. This patch sets the eventTypeSkipList with the complement set of the wanted type. So the get_next_notification RPC can filter out some events on HMS side. To avoid bringing too much computation overhead to HMS's underlying RDBMS in evaluating predicates of EVENT_TYPE != 'xxx', rare event types (e.g. DROP_ISCHEMA) are not added in the list. A new flag, common_hms_event_types, is added to specify the common HMS event types. Once HIVE-28146 is resolved, we can set the wanted types directly in the HMS RPC and this approach can be simplified. UPDATE_TBL_COL_STAT_EVENT, UPDATE_PART_COL_STAT_EVENT are the most common unused events for Impala. They are also added to the default skip list. A new flag, default_skipped_hms_event_types, is added to configure this list. This patch also fixes an issue that events of the non-default catalog are not filtered out. In a local perf test, I generated 100K RELOAD events after creating a table in Hive. 
Then I used the table in Impala to trigger metadata loading on it, which fetches the latest CREATE_TABLE event by polling all events after the last known CREATE_TABLE event. Before this patch, fetching the events took 1s779ms. Now it takes only 395.377ms. Note that in production environments the event messages are usually larger, so the speedup could be even greater. Tests: - Added an FE test - Ran CORE tests Change-Id: Ieabe714328aa2cc605cb62b85ae8aa4bd537dbe9 Reviewed-on: http://gerrit.cloudera.org:8080/21186 Reviewed-by: Csaba Ringhofer Tested-by: Impala Public Jenkins > Catalogd should set eventTypeSkipList when fetching specific events for a > table > --- > > Key: IMPALA-12933 > URL: https://issues.apache.org/jira/browse/IMPALA-12933 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Critical > Fix For: Impala 4.4.0 > > > There are several places where catalogd will fetch all events of a specific > type on a table. E.g. in TableLoader#load(), if the table has an old > createEventId, catalogd will fetch all CREATE_TABLE events after that > createEventId on the table. > Fetching the list of events is expensive since the filtering is done on the > client side, i.e. catalogd fetches all events and filters them locally based on > the event type and table name: > [https://github.com/apache/impala/blob/14e3ed4f97292499b2e6ee8d5a756dc648d9/fe/src/main/java/org/apache/impala/catalog/TableLoader.java#L98-L102] > [https://github.com/apache/impala/blob/b7ddbcad0dd6accb559a3f391a897a8c442d1728/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L336] > This could take hours if there are lots of events (e.g. 1M) in HMS. In fact, > NotificationEventRequest can specify an eventTypeSkipList. Catalogd can do > the filtering of event type on the HMS side. On higher Hive versions that have > HIVE-27499, catalogd can also specify the table name in the request > (IMPALA-12607). 
> This Jira focuses on specifying the eventTypeSkipList when fetching events of a > particular type on a table. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
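The complement-based skip list described in the commit above can be sketched as follows. This is an editor's illustration in Python (the actual catalogd code is Java), and the set of common event types here is a hypothetical stand-in for the common_hms_event_types flag:

```python
# Hypothetical stand-in for the common_hms_event_types flag; the real list
# lives in catalogd's configuration.
COMMON_EVENT_TYPES = {
    "CREATE_TABLE", "DROP_TABLE", "ALTER_TABLE", "ADD_PARTITION",
    "DROP_PARTITION", "ALTER_PARTITION", "INSERT", "RELOAD",
    "UPDATE_TBL_COL_STAT_EVENT", "UPDATE_PART_COL_STAT_EVENT",
}

def build_skip_list(wanted_types):
    """Return the complement of the wanted types within the common types,
    so the get_next_notification RPC can filter events on the HMS side.
    Rare types are deliberately absent to keep the RDBMS predicate short."""
    return sorted(COMMON_EVENT_TYPES - set(wanted_types))

skip = build_skip_list({"CREATE_TABLE"})
# Every common type except CREATE_TABLE ends up in the skip list.
assert "CREATE_TABLE" not in skip
assert "RELOAD" in skip and "UPDATE_TBL_COL_STAT_EVENT" in skip
```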
[jira] [Commented] (IMPALA-13003) Server exits early failing to create impala_query_log with AlreadyExistsException
[ https://issues.apache.org/jira/browse/IMPALA-13003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839687#comment-17839687 ] ASF subversion and git services commented on IMPALA-13003: -- Commit 1a90388b19500289fbb8c3765aee9a8882ae3c04 in impala's branch refs/heads/branch-4.4.0 from Michael Smith [ https://gitbox.apache.org/repos/asf?p=impala.git;h=1a90388b1 ] IMPALA-13003: Handle Iceberg AlreadyExistsException When multiple coordinators attempt to create the same table concurrently with "if not exists", we still see AlreadyExistsException: Table was created concurrently: my_iceberg_tbl Iceberg throws its own version of AlreadyExistsException, but we avoid most code paths that would throw it because we first check HMS to see if the table exists before trying to create it. Updates createIcebergTable to handle Iceberg's AlreadyExistsException identically to the HMS AlreadyExistsException. Adds a test using DebugAction to simulate concurrent table creation. Change-Id: I847eea9297c9ee0d8e821fe1c87ea03d22f1d96e Reviewed-on: http://gerrit.cloudera.org:8080/21312 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Server exits early failing to create impala_query_log with > AlreadyExistsException > - > > Key: IMPALA-13003 > URL: https://issues.apache.org/jira/browse/IMPALA-13003 > Project: IMPALA > Issue Type: Bug > Components: be >Affects Versions: Impala 4.4.0 >Reporter: Andrew Sherman >Assignee: Michael Smith >Priority: Critical > Labels: iceberg, workload-management > Fix For: Impala 4.4.0 > > > At startup workload management tries to create the query log table here: > {code:java} > // The initialization code only works when run in a separate thread for > reasons unknown. 
> ABORT_IF_ERROR(SetupDbTable(internal_server_.get(), table_name)); > {code} > This code is exiting: > {code:java} > I0413 23:40:05.183876 21006 client-request-state.cc:1348] > 1d4878dbc9214c81:6dc8cc2e] ImpalaRuntimeException: Error making > 'createTable' RPC to Hive Metastore: > CAUSED BY: AlreadyExistsException: Table was created concurrently: > sys.impala_query_log > I0413 23:40:05.184055 20955 impala-server.cc:2582] Connection > 27432606d99dcdae:218860164eb206bb from client in-memory.localhost:0 to server > internal-server closed. The connection had 1 associated session(s). > I0413 23:40:05.184067 20955 impala-server.cc:1780] Closing session: > 27432606d99dcdae:218860164eb206bb > I0413 23:40:05.184083 20955 impala-server.cc:1836] Closed session: > 27432606d99dcdae:218860164eb206bb, client address: . > F0413 23:40:05.184111 20955 workload-management.cc:304] query timed out > waiting for results > . Impalad exiting. > I0413 23:40:05.184728 20883 impala-server.cc:1564] Query successfully > unregistered: query_id=1d4878dbc9214c81:6dc8cc2e > Minidump in thread [20955]completed-queries running query > :, fragment instance > : > Wrote minidump to > /data/jenkins/workspace/impala-cdw-master-core-ubsan/repos/Impala/logs/custom_cluster_tests/minidumps/impalad/402f37cc-4663-4c78-086ca295-a9e5943c.dmp > {code} > with stack > {code:java} > F0413 23:40:05.184111 20955 workload-management.cc:304] query timed out > waiting for results > . Impalad exiting. 
> *** Check failure stack trace: *** > @ 0x8e96a4d google::LogMessage::Fail() > @ 0x8e98984 google::LogMessage::SendToLog() > @ 0x8e9642c google::LogMessage::Flush() > @ 0x8e98ea9 google::LogMessageFatal::~LogMessageFatal() > @ 0x3da3a9a impala::ImpalaServer::CompletedQueriesThread() > @ 0x3a8df93 boost::_mfi::mf0<>::operator()() > @ 0x3a8de97 boost::_bi::list1<>::operator()<>() > @ 0x3a8dd77 boost::_bi::bind_t<>::operator()() > @ 0x3a8d672 > boost::detail::function::void_function_obj_invoker0<>::invoke() > @ 0x301e7d0 boost::function0<>::operator()() > @ 0x43ce415 impala::Thread::SuperviseThread() > @ 0x43e2dc7 boost::_bi::list5<>::operator()<>() > @ 0x43e29e7 boost::_bi::bind_t<>::operator()() > @ 0x43e21c5 boost::detail::thread_data<>::run() > @ 0x7984c37 thread_proxy > @ 0x7f75b6982ea5 start_thread > @ 0x7f75b36a7b0d __clone > Picked up JAVA_TOOL_OPTIONS: > -agentlib:jdwp=transport=dt_socket,address=3,server=y,suspend=n > -Dsun.java.command=impalad > Minidump in thread [20955]completed-queries running query > :, fragment instance > : > {code} > I think the key error is >
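The shape of the IMPALA-13003 fix can be sketched as follows. This is a minimal Python illustration (the real change is in Java, in createIcebergTable); the exception classes are stand-ins for the two distinct AlreadyExistsException types that HMS and Iceberg each define:

```python
# Stand-ins: HMS and Iceberg each throw their own AlreadyExistsException.
class HmsAlreadyExistsException(Exception): pass
class IcebergAlreadyExistsException(Exception): pass

def create_iceberg_table(do_create, if_not_exists):
    """Treat Iceberg's AlreadyExistsException identically to the HMS one:
    under 'if not exists', a concurrent creation is not an error."""
    try:
        return do_create()
    except (HmsAlreadyExistsException, IcebergAlreadyExistsException):
        if if_not_exists:
            return None  # table was created concurrently; treat as success
        raise

def concurrent_create():
    raise IcebergAlreadyExistsException("Table was created concurrently")

# With "if not exists", the concurrent creation is swallowed.
assert create_iceberg_table(concurrent_create, if_not_exists=True) is None
```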
[jira] [Commented] (IMPALA-13006) Some Iceberg test tables are not restricted to Parquet
[ https://issues.apache.org/jira/browse/IMPALA-13006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839685#comment-17839685 ] ASF subversion and git services commented on IMPALA-13006: -- Commit 577b18c3744a4cde5a623d68d37e52d46637a3a6 in impala's branch refs/heads/branch-4.4.0 from Noemi Pap-Takacs [ https://gitbox.apache.org/repos/asf?p=impala.git;h=577b18c37 ] IMPALA-13006: Restrict Iceberg tables to Parquet Iceberg test tables/views are restricted to the Parquet file format in functional/schema_constraints.csv. The following two were unintentionally left out: iceberg_query_metadata iceberg_view Added the constraint for these tables too. Testing: - executed data load for the functional dataset Change-Id: I2590d7a70fe6aaf1277b19e6b23015d39d2935cb Reviewed-on: http://gerrit.cloudera.org:8080/21306 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Some Iceberg test tables are not restricted to Parquet > -- > > Key: IMPALA-13006 > URL: https://issues.apache.org/jira/browse/IMPALA-13006 > Project: IMPALA > Issue Type: Bug >Reporter: Daniel Becker >Assignee: Noemi Pap-Takacs >Priority: Major > Labels: impala-iceberg > > Our Iceberg test tables/views are restricted to the Parquet file format in > functional/schema_constraints.csv except for the following two: > {code:java} > iceberg_query_metadata > iceberg_view{code} > This is not intentional, so we should add the constraint for these tables too. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-12874) Identify Catalog HA and StateStore HA from the web debug endpoint
[ https://issues.apache.org/jira/browse/IMPALA-12874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839691#comment-17839691 ] ASF subversion and git services commented on IMPALA-12874: -- Commit 6c738bc3fe9d765254b45d62c859275eaaa16a0f in impala's branch refs/heads/branch-4.4.0 from Yida Wu [ https://gitbox.apache.org/repos/asf?p=impala.git;h=6c738bc3f ] IMPALA-12874: Identify active and standby catalog and statestore in the web debug endpoint This patch adds support to display the HA status of the catalog and statestore on the root web page. The status will be presented as "Catalog Status: Active" or "Statestore Status: Standby" based on the values retrieved from the metrics catalogd-server.active-status and statestore.active-status. If the catalog or statestore is standalone, it will show active as the status, which is the same as the metric. Tests: Ran core tests. Manually tested the web page and verified the status display is correct. Also checked that when failover happens, the 'standby' status changes to 'active'. Change-Id: Ie9435ba7a9549ea56f9d080a9315aecbcc630cd2 Reviewed-on: http://gerrit.cloudera.org:8080/21294 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Identify Catalog HA and StateStore HA from the web debug endpoint > - > > Key: IMPALA-12874 > URL: https://issues.apache.org/jira/browse/IMPALA-12874 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: gaurav singh >Assignee: Yida Wu >Priority: Major > Fix For: Impala 4.4.0 > > > Identify which Catalog and Statestore instance is active from the web debug > endpoint. > For HA, we should improve the label in the catalog and statestore web > response to indicate the "Active" instance. > On the main page we could indicate "Status: Active" or "Status: Stand-by". We > could probably add the status at the top of the main page before *"Version"* > . 
The current details, as output by a curl request on port 25020: > {code:java} > Version > catalogd version 4.0.0.2024.0.18.0-61 RELEASE (build > 82901f3f83fa4c25b318ebf825a1505d89209356) > Built on Fri Mar 1 20:13:09 UTC 2024 > Build Flags: is_ndebug=true cmake_build_type=RELEASE > library_link_type=STATIC > > Process Start Time > .. > {code} > Corresponding curl request on statestored on 25010: > {code:java} > Version > statestored version 4.0.0.2024.0.18.0-61 RELEASE > (build 82901f3f83fa4c25b318ebf825a1505d89209356) > Built on Fri Mar 1 20:13:09 UTC 2024 > Build Flags: is_ndebug=true cmake_build_type=RELEASE > library_link_type=STATIC > > Process Start Time > {code} > Catalogd active status can be figured out using the catalogd metric: > "catalog-server.active-status" > For statestored active status we probably don't have a metric, so we should add a > similar metric and use that to determine the status. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
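The metric-to-label mapping described in the commit above can be sketched as follows. A minimal Python illustration assuming a plain metrics dictionary; the metric names come from the commit message, everything else (function name, standalone handling) is the editor's assumption, not the actual C++ webserver code:

```python
def ha_status_label(component, metrics):
    """component is 'catalogd' or 'statestore'. A standalone daemon has no
    meaningful HA metric and is reported as Active, matching the metric."""
    key = ("catalogd-server.active-status" if component == "catalogd"
           else "statestore.active-status")
    active = metrics.get(key, True)  # standalone: default to active
    return "%s Status: %s" % (component.capitalize(),
                              "Active" if active else "Standby")

assert ha_status_label("catalogd",
                       {"catalogd-server.active-status": True}) == "Catalogd Status: Active"
assert ha_status_label("statestore",
                       {"statestore.active-status": False}) == "Statestore Status: Standby"
```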
[jira] [Commented] (IMPALA-12980) Translate CpuAsk into admission control slot to use
[ https://issues.apache.org/jira/browse/IMPALA-12980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839693#comment-17839693 ] ASF subversion and git services commented on IMPALA-12980: -- Commit f97042384e0312cfd3943426e048581d9678891d in impala's branch refs/heads/branch-4.4.0 from Riza Suminto [ https://gitbox.apache.org/repos/asf?p=impala.git;h=f97042384 ] IMPALA-12980: Translate CpuAsk into admission control slots Impala has a concept of "admission control slots" - the amount of parallelism that should be allowed on an Impala daemon. This defaults to the number of processors per executor and can be overridden with the --admission_control_slots flag. Admission control slot accounting is described in IMPALA-8998. It computes 'slots_to_use' for each backend based on the maximum number of instances of any fragment on that backend. This can lead to slot underestimation and query overadmission. For example, assume an executor node with 48 CPU cores and configured with --admission_control_slots=48. It is assigned 4 non-blocking query fragments, each with 12 instances scheduled on this executor. The IMPALA-8998 algorithm will request the max instance count (12) in slots rather than the sum of all non-blocking fragment instances (48). With the 36 remaining slots free, the executor can still admit another fragment from a different query but will potentially have CPU contention with the one that is currently running. When COMPUTE_PROCESSING_COST is enabled, the Planner will generate a CpuAsk number that represents the CPU requirement of that query over a particular executor group set. This number is an estimation of the largest number of query fragment instances that can run in parallel without waiting, given by the blocking operator analysis. Therefore, the fragment trace that sums into that CpuAsk number can be translated into 'slots_to_use' as well, which more closely resembles the maximum parallel execution of fragment instances. 
This patch adds a new query option called SLOT_COUNT_STRATEGY to control which admission control slot accounting to use. There are two possible values: - LARGEST_FRAGMENT, which is the original algorithm from IMPALA-8998. This is still the default value for the SLOT_COUNT_STRATEGY option. - PLANNER_CPU_ASK, which will follow the fragment trace that contributes towards the CpuAsk number. This strategy will request at least as many admission control slots as the LARGEST_FRAGMENT strategy. To implement the PLANNER_CPU_ASK strategy, the Planner will mark fragments that contribute to CpuAsk as dominant fragments. It also passes the max_slot_per_executor information that it knows about the executor group set to the scheduler. An AvgAdmissionSlotsPerExecutor counter is added to describe what the Planner thinks the average 'slots_to_use' per backend will be, which follows this formula: AvgAdmissionSlotsPerExecutor = ceil(CpuAsk / num_executors) Actual 'slots_to_use' in each backend may differ from AvgAdmissionSlotsPerExecutor, depending on what is scheduled on that backend. 'slots_to_use' will be shown as an 'AdmissionSlots' counter under each executor profile node. Testing: - Update test_executors.py with AvgAdmissionSlotsPerExecutor assertion. - Pass test_tpcds_queries.py::TestTpcdsQueryWithProcessingCost. - Add EE test test_processing_cost.py. - Add FE test PlannerTest#testProcessingCostPlanAdmissionSlots. Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 Reviewed-on: http://gerrit.cloudera.org:8080/21257 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Translate CpuAsk into admission control slot to use > --- > > Key: IMPALA-12980 > URL: https://issues.apache.org/jira/browse/IMPALA-12980 > Project: IMPALA > Issue Type: Improvement > Components: Distributed Exec, Frontend >Reporter: Riza Suminto >Assignee: Riza Suminto >Priority: Major > Fix For: Impala 4.4.0 > > > Admission control slot accounting is described in IMPALA-8998. 
It computes 'slots_to_use' for each backend based on the > max number of instances of any fragment on that backend. This is simplistic, > because multiple fragments with the same instance count, say 4 > non-blocking fragments each with 12 instances, only request the max instance > (12) admission slots rather than the sum (48). > When COMPUTE_PROCESSING_COST is enabled, the Planner will generate a CpuAsk > number that represents the CPU requirement of that query over a particular > executor group set. This number is an estimation of the largest > number of query fragments that can run in parallel, given the blocking > operator analysis. Therefore, the fragment trace that sums into that CpuAsk > number can be translated into 'slots_to_use' as well, which will be a
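The two slot-accounting strategies above can be contrasted on the commit's own example (4 non-blocking fragments, 12 instances each, on one backend). An editor's Python sketch, not Impala's Java/C++ code; it assumes, as the example does, that all four fragments contribute to CpuAsk:

```python
import math

def slots_largest_fragment(instances_per_fragment):
    # LARGEST_FRAGMENT (IMPALA-8998): max instance count of any fragment
    # on the backend.
    return max(instances_per_fragment)

def slots_planner_cpu_ask(instances_per_fragment):
    # PLANNER_CPU_ASK: sum over the (dominant) fragments that contribute
    # to CpuAsk; here we assume all of them do.
    return sum(instances_per_fragment)

fragments = [12, 12, 12, 12]
assert slots_largest_fragment(fragments) == 12   # leaves 36 slots free
assert slots_planner_cpu_ask(fragments) == 48    # fills the 48-core node

# AvgAdmissionSlotsPerExecutor = ceil(CpuAsk / num_executors)
assert math.ceil(48 / 1) == 48
```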
[jira] [Commented] (IMPALA-12657) Improve ProcessingCost of ScanNode and NonGroupingAggregator
[ https://issues.apache.org/jira/browse/IMPALA-12657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839698#comment-17839698 ] ASF subversion and git services commented on IMPALA-12657: -- Commit af2d28b54d4fb6b182b27a278c2cfa6277f3 in impala's branch refs/heads/branch-4.4.0 from David Rorke [ https://gitbox.apache.org/repos/asf?p=impala.git;h=af2d28b54 ] IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator This patch improves the accuracy of the CPU ProcessingCost estimates for several of the CPU intensive operators by basing the costs on benchmark data. The general approach for a given operator was to run a set of queries that exercised the operator under various conditions (e.g. large vs small row sizes and row counts, varying NDV, different file formats, etc) and capture the CPU time spent per unit of work (the unit of work might be measured as some number of rows, some number of bytes, some number of predicates evaluated, or some combination of these). The data was then analyzed in an attempt to fit a simple model that would allow us to predict CPU consumption of a given operator based on information available at planning time. For example, the CPU ProcessingCost for a Parquet scan is estimated as: TotalCost = (0.0144 * BytesMaterialized) + (0.0281 * Rows * Predicate Count) The coefficients (0.0144 and 0.0281) are derived from benchmarking scans under a variety of conditions. Similar cost functions and coefficients were derived for all of the benchmarked operators. The coefficients for all the operators are normalized such that a single unit of cost equates to roughly 100 nanoseconds of CPU time on a r5d.4xlarge instance. So we would predict an operator with a cost of 10,000,000 would complete in roughly one second on a single core. Limitations: * Costing only addresses CPU time spent and doesn't account for any IO or other wait time. 
* Benchmarking scenarios didn't provide comprehensive coverage of the full range of data types, distributions, etc. More thorough benchmarking could improve the costing estimates further. * This initial patch only covers a subset of the operators, focusing on those that are most common and most CPU intensive. Specifically, the following operators are covered by this patch. All others continue to use the previous ProcessingCost code: AggregationNode DataStreamSink (exchange sender) ExchangeNode HashJoinNode HdfsScanNode HdfsTableSink NestedLoopJoinNode SortNode UnionNode Benchmark-based costing of the remaining operators will be covered by a future patch. Future patches will automate the collection and analysis of the benchmark data and the computation of the cost coefficients to simplify maintenance of the costing as performance changes over time. Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca Reviewed-on: http://gerrit.cloudera.org:8080/21279 Reviewed-by: Riza Suminto Tested-by: Impala Public Jenkins > Improve ProcessingCost of ScanNode and NonGroupingAggregator > > > Key: IMPALA-12657 > URL: https://issues.apache.org/jira/browse/IMPALA-12657 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 4.3.0 >Reporter: Riza Suminto >Assignee: David Rorke >Priority: Major > Fix For: Impala 4.4.0 > > Attachments: profile_1f4d7a679a3e12d5_42231157.txt > > > Several benchmark runs measuring Impala scan performance indicate some > costing improvement opportunities around ScanNode and NonGroupingAggregator. > [^profile_1f4d7a679a3e12d5_42231157.txt] shows an example of a simple > count query. > Key takeaway: > # There is a strong correlation between total materialized bytes (row-size * > cardinality) and total materialized tuple time per fragment. Row > materialization cost should be adjusted to be based on this row size instead > of an equal cost per scan range. 
> # NonGroupingAggregator should have much lower cost than GroupingAggregator. > In the example above, the cost of NonGroupingAggregator dominates the scan > fragment even though it only does simple counting instead of a hash table > operation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
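The Parquet scan cost function quoted in the commit above can be checked numerically. The coefficients are the ones stated in the commit message, where one cost unit is roughly 100 nanoseconds of CPU on an r5d.4xlarge; the input values below are arbitrary examples:

```python
def parquet_scan_cost(bytes_materialized, rows, predicate_count):
    # TotalCost = (0.0144 * BytesMaterialized) + (0.0281 * Rows * PredicateCount)
    return 0.0144 * bytes_materialized + 0.0281 * rows * predicate_count

# ~10,000,000 cost units should correspond to roughly one second on a
# single core, per the commit message.
cost = parquet_scan_cost(500_000_000, 1_000_000, 100)
assert abs(cost - 10_010_000) < 1e-6  # about one core-second of work
```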
[jira] [Commented] (IMPALA-12543) test_iceberg_self_events failed in JDK11 build
[ https://issues.apache.org/jira/browse/IMPALA-12543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839700#comment-17839700 ] ASF subversion and git services commented on IMPALA-12543: -- Commit e1bbdacc5133d36d997e0e19c52753df90376a1e in impala's branch refs/heads/branch-4.4.0 from Riza Suminto [ https://gitbox.apache.org/repos/asf?p=impala.git;h=e1bbdacc5 ] IMPALA-12543: Detect self-events before finishing DDL test_iceberg_self_events has been flaky for not having tbls_refreshed_before equal to tbls_refreshed_after in-between query executions. Further investigation reveals a concurrency bug: the db/table level lock is not taken during the db/table self-events check (IMPALA-12461 part1). The order of an ALTER TABLE operation is as follows: 1. alter table starts in CatalogOpExecutor 2. table level lock is taken 3. HMS RPC starts (CatalogOpExecutor.applyAlterTable()) 4. HMS generates the event 5. HMS RPC returns 6. table is reloaded 7. catalog version is added to inflight event list 8. table level lock is released Meanwhile the event processor thread fetches the new event after 4 and before 7. Because of IMPALA-12461 (part 1), it can also finish self-events checking before reaching 7. Before IMPALA-12461, self-events would have needed to wait for 8. Note that this issue is only relevant for table level events, as self-events checking for partition level events still takes the table lock. This patch fixes the issue by adding newCatalogVersion to the table's inflight event list before updating HMS, using the helper class InProgressTableModification. If the HMS update does not complete (i.e., an exception is thrown), the newCatalogVersion that was added is then removed. This patch also fixes a few smaller issues, including: - Avoid incrementing EVENTS_SKIPPED_METRIC if numFilteredEvents == 0 in MetastoreEventFactory.getFilteredEvents(). 
- Increment EVENTS_SKIPPED_METRIC in MetastoreTableEvent.reloadTableFromCatalog() if the table is already in the middle of reloading (revealed through flaky test_skipping_older_events). - Rephrase the misleading log message in MetastoreEventProcessor.getNextMetastoreEvents(). Testing: - Add TestEventProcessingWithImpala, run it with debug_action and sync_ddl dimensions. - Pass exhaustive tests. Change-Id: I8365c934349ad21a4d9327fc11594d2fc3445f79 Reviewed-on: http://gerrit.cloudera.org:8080/21029 Reviewed-by: Riza Suminto Tested-by: Impala Public Jenkins > test_iceberg_self_events failed in JDK11 build > -- > > Key: IMPALA-12543 > URL: https://issues.apache.org/jira/browse/IMPALA-12543 > Project: IMPALA > Issue Type: Bug >Reporter: Riza Suminto >Assignee: Riza Suminto >Priority: Major > Labels: broken-build > Fix For: Impala 4.4.0 > > Attachments: catalogd.INFO, std_err.txt > > > test_iceberg_self_events failed in the JDK11 build with the following error. > > {code:java} > Error Message > assert 0 == 1 > Stacktrace > custom_cluster/test_events_custom_configs.py:637: in test_iceberg_self_events > check_self_events("ALTER TABLE {0} ADD COLUMN j INT".format(tbl_name)) > custom_cluster/test_events_custom_configs.py:624: in check_self_events > assert tbls_refreshed_before == tbls_refreshed_after > E assert 0 == 1 {code} > This test still passed before IMPALA-11387 was merged. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
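The reordering this patch makes (register the new catalog version in the in-flight list *before* the HMS update, and roll it back if the update fails) can be sketched as a Python context manager. The real InProgressTableModification is a Java helper class; this is an editor's illustration of its rollback contract only:

```python
class Table:
    def __init__(self):
        self.inflight_versions = []

class InProgressTableModification:
    """Add the new catalog version before the HMS RPC so a racing event
    processor can recognize the self-event; remove it on failure."""
    def __init__(self, table, new_version):
        self.table, self.new_version = table, new_version

    def __enter__(self):
        self.table.inflight_versions.append(self.new_version)
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # HMS update did not complete: roll back the registration.
            self.table.inflight_versions.remove(self.new_version)
        return False  # re-raise the original exception

t = Table()
try:
    with InProgressTableModification(t, 42):
        raise RuntimeError("HMS update failed")
except RuntimeError:
    pass
assert t.inflight_versions == []  # failed update leaves no stale entry

t2 = Table()
with InProgressTableModification(t2, 7):
    pass  # HMS update succeeded
assert t2.inflight_versions == [7]
```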
[jira] [Commented] (IMPALA-12990) impala-shell broken if Iceberg delete deletes 0 rows
[ https://issues.apache.org/jira/browse/IMPALA-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839688#comment-17839688 ] ASF subversion and git services commented on IMPALA-12990: -- Commit 858794906158439de8b38a66739d62e1231bbd14 in impala's branch refs/heads/branch-4.4.0 from Csaba Ringhofer [ https://gitbox.apache.org/repos/asf?p=impala.git;h=858794906 ] IMPALA-12990: Fix impala-shell handling of unset rows_deleted The issue occurred in Python 3 when 0 rows were deleted from Iceberg. It could also happen in other DMLs with older Impala servers where TDmlResult.rows_deleted was not set. See the Jira for details of the error. Testing: Extended shell tests for Kudu DML reporting to also cover Iceberg. Change-Id: I5812b8006b9cacf34a7a0dbbc89a486d8b454438 Reviewed-on: http://gerrit.cloudera.org:8080/21284 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > impala-shell broken if Iceberg delete deletes 0 rows > > > Key: IMPALA-12990 > URL: https://issues.apache.org/jira/browse/IMPALA-12990 > Project: IMPALA > Issue Type: Bug > Components: Clients >Reporter: Csaba Ringhofer >Assignee: Csaba Ringhofer >Priority: Major > Labels: iceberg > > Happens only with Python 3 > {code} > impala-python3 shell/impala_shell.py > create table icebergupdatet (i int, s string) stored as iceberg; > alter table icebergupdatet set tblproperties("format-version"="2"); > delete from icebergupdatet where i=0; > Unknown Exception : '>' not supported between instances of 'NoneType' and > 'int' > Traceback (most recent call last): > File "shell/impala_shell.py", line 1428, in _execute_stmt > if is_dml and num_rows == 0 and num_deleted_rows > 0: > TypeError: '>' not supported between instances of 'NoneType' and 'int' > {code} > The same error should also happen when the delete removes > 0 rows, but the > Impala server has an older version that doesn't set TDmlResult.rows_deleted -- This message was sent by Atlassian Jira (v8.20.10#820010) - To 
unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
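The failing comparison from the traceback above, with one defensive way to fix it: older servers may leave TDmlResult.rows_deleted unset (None), so guard before comparing. The helper name is illustrative; the actual fix lives in impala_shell.py's _execute_stmt:

```python
def is_empty_dml_with_deletes(is_dml, num_rows, num_deleted_rows):
    # Original line raised on Python 3 when num_deleted_rows was None:
    #   if is_dml and num_rows == 0 and num_deleted_rows > 0:
    # Treat an unset rows_deleted as 0 before comparing.
    return is_dml and num_rows == 0 and (num_deleted_rows or 0) > 0

# Unset rows_deleted no longer raises TypeError; it simply counts as 0.
assert is_empty_dml_with_deletes(True, 0, None) is False
assert is_empty_dml_with_deletes(True, 0, 3) is True
assert is_empty_dml_with_deletes(False, 0, 3) is False
```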
[jira] [Commented] (IMPALA-13008) test_metadata_tables failed in Ubuntu 20 build
[ https://issues.apache.org/jira/browse/IMPALA-13008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839689#comment-17839689 ] ASF subversion and git services commented on IMPALA-13008: -- Commit ce863e8c7ffd3d5abdc2c69b06042b711a0f707f in impala's branch refs/heads/branch-4.4.0 from Daniel Becker [ https://gitbox.apache.org/repos/asf?p=impala.git;h=ce863e8c7 ] IMPALA-13008: test_metadata_tables failed in Ubuntu 20 build TestIcebergV2Table::test_metadata_tables failed in Ubuntu 20 build in a release candidate because the file sizes in some queries didn't match the expected ones. As Impala writes its version into the Parquet files it writes, the file sizes can change with the release (especially as SNAPSHOT or RELEASE is part of the full version, and their lengths differ). This change updates the failing tests to take regexes for the file sizes instead of concrete values. Change-Id: Iad8fd0d9920034e7dbe6c605bed7579fbe3b5b1f Reviewed-on: http://gerrit.cloudera.org:8080/21317 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > test_metadata_tables failed in Ubuntu 20 build > -- > > Key: IMPALA-13008 > URL: https://issues.apache.org/jira/browse/IMPALA-13008 > Project: IMPALA > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Assignee: Daniel Becker >Priority: Major > Labels: impala-iceberg > > test_metadata_tables failed in an Ubuntu 20 release test build: > * > https://jenkins.impala.io/job/parallel-all-tests-ub2004/1059/artifact/https_%5E%5Ejenkins.impala.io%5Ejob%5Eubuntu-20.04-dockerised-tests%5E1642%5E.log > * > https://jenkins.impala.io/job/parallel-all-tests-ub2004/1059/artifact/https_%5E%5Ejenkins.impala.io%5Ejob%5Eubuntu-20.04-from-scratch%5E2363%5E.log > h2. 
Error > {noformat} > E assert Comparing QueryTestResults (expected vs actual): > E > 'append',true,'{"added-data-files":"1","added-records":"1","added-files-size":"351","changed-partition-count":"1","total-records":"1","total-files-size":"351","total-data-files":"1","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"}' > != > 'append',true,'{"added-data-files":"1","added-records":"1","added-files-size":"350","changed-partition-count":"1","total-records":"1","total-files-size":"350","total-data-files":"1","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"}' > E > 'append',true,'{"added-data-files":"1","added-records":"1","added-files-size":"351","changed-partition-count":"1","total-records":"2","total-files-size":"702","total-data-files":"2","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"}' > != > 'append',true,'{"added-data-files":"1","added-records":"1","added-files-size":"350","changed-partition-count":"1","total-records":"2","total-files-size":"700","total-data-files":"2","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"}' > E > 'append',true,'{"added-data-files":"1","added-records":"1","added-files-size":"351","changed-partition-count":"1","total-records":"3","total-files-size":"1053","total-data-files":"3","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"}' > != > 'append',true,'{"added-data-files":"1","added-records":"1","added-files-size":"350","changed-partition-count":"1","total-records":"3","total-files-size":"1050","total-data-files":"3","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"}' > E > 
row_regex:'overwrite',true,'{"added-position-delete-files":"1","added-delete-files":"1","added-files-size":"[1-9][0-9]*","added-position-deletes":"1","changed-partition-count":"1","total-records":"3","total-files-size":"[1-9][0-9]*","total-data-files":"3","total-delete-files":"1","total-position-deletes":"1","total-equality-deletes":"0"}' > == > 'overwrite',true,'{"added-position-delete-files":"1","added-delete-files":"1","added-files-size":"1551","added-position-deletes":"1","changed-partition-count":"1","total-records":"3","total-files-size":"2601","total-data-files":"3","total-delete-files":"1","total-position-deletes":"1","total-equality-deletes":"0"}' > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
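The test-side fix above swaps literal file sizes for regexes, since Impala embeds its version string in the Parquet files it writes and SNAPSHOT vs RELEASE builds differ in length. A small Python illustration of why the `[1-9][0-9]*` pattern from the expected output matches both observed sizes:

```python
import re

# Pattern fragment taken from the row_regex expected output above.
pattern = r'"added-files-size":"[1-9][0-9]*"'

line_snapshot = '"added-files-size":"351"'  # size written by a SNAPSHOT build
line_release = '"added-files-size":"350"'   # size written by a RELEASE build

# One regex accepts both, so the test no longer depends on the build type.
assert re.search(pattern, line_snapshot)
assert re.search(pattern, line_release)
assert not re.search(pattern, '"added-files-size":""')
```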
[jira] [Commented] (IMPALA-12461) Avoid write lock on the table during self-event detection
[ https://issues.apache.org/jira/browse/IMPALA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839701#comment-17839701 ] ASF subversion and git services commented on IMPALA-12461: -- Commit e1bbdacc5133d36d997e0e19c52753df90376a1e in impala's branch refs/heads/branch-4.4.0 from Riza Suminto [ https://gitbox.apache.org/repos/asf?p=impala.git;h=e1bbdacc5 ] IMPALA-12543: Detect self-events before finishing DDL test_iceberg_self_events has been flaky for not having tbls_refreshed_before equal to tbls_refreshed_after in-between query executions. Further investigation reveals a concurrency bug: the db/table level lock is not taken during the db/table self-events check (IMPALA-12461 part1). The order of an ALTER TABLE operation is as follows: 1. alter table starts in CatalogOpExecutor 2. table level lock is taken 3. HMS RPC starts (CatalogOpExecutor.applyAlterTable()) 4. HMS generates the event 5. HMS RPC returns 6. table is reloaded 7. catalog version is added to inflight event list 8. table level lock is released Meanwhile the event processor thread fetches the new event after 4 and before 7. Because of IMPALA-12461 (part 1), it can also finish self-events checking before reaching 7. Before IMPALA-12461, self-events would have needed to wait for 8. Note that this issue is only relevant for table level events, as self-events checking for partition level events still takes the table lock. This patch fixes the issue by adding newCatalogVersion to the table's inflight event list before updating HMS, using the helper class InProgressTableModification. If the HMS update does not complete (i.e., an exception is thrown), the newCatalogVersion that was added is then removed. This patch also fixes a few smaller issues, including: - Avoid incrementing EVENTS_SKIPPED_METRIC if numFilteredEvents == 0 in MetastoreEventFactory.getFilteredEvents(). 
- Increment EVENTS_SKIPPED_METRIC in MetastoreTableEvent.reloadTableFromCatalog() if the table is already in the middle of reloading (revealed through the flaky test_skipping_older_events).
- Rephrase a misleading log message in MetastoreEventProcessor.getNextMetastoreEvents().

Testing:
- Add TestEventProcessingWithImpala and run it with debug_action and sync_ddl dimensions.
- Pass exhaustive tests.

Change-Id: I8365c934349ad21a4d9327fc11594d2fc3445f79 Reviewed-on: http://gerrit.cloudera.org:8080/21029 Reviewed-by: Riza Suminto Tested-by: Impala Public Jenkins

> Avoid write lock on the table during self-event detection > - > > Key: IMPALA-12461 > URL: https://issues.apache.org/jira/browse/IMPALA-12461 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Reporter: Csaba Ringhofer >Assignee: Csaba Ringhofer >Priority: Critical > > Saw some call stacks like this: > {code} > at > org.apache.impala.catalog.CatalogServiceCatalog.tryLock(CatalogServiceCatalog.java:468) > at > org.apache.impala.catalog.CatalogServiceCatalog.tryWriteLock(CatalogServiceCatalog.java:436) > at > org.apache.impala.catalog.CatalogServiceCatalog.evaluateSelfEvent(CatalogServiceCatalog.java:1008) > at > org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.isSelfEvent(MetastoreEvents.java:609) > at > org.apache.impala.catalog.events.MetastoreEvents$BatchPartitionEvent.process(MetastoreEvents.java:1942) > {code} > At this point it was already checked that the event comes from Impala based > on the service id, and now we are checking the table's self-event list. Taking the > table lock can be problematic as other DDLs may take a write lock at the same > time.
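The reordering described in this commit message can be sketched as follows. This is a hypothetical Python model, not Impala's actual Java code; the class and function names are invented for illustration, with hms_update standing in for the applyAlterTable() RPC:

```python
class Table:
    """Minimal stand-in for a catalog table with an inflight event list."""
    def __init__(self):
        self.inflight_versions = []

    def is_self_event(self, version):
        # The event processor consults this list to recognize self-events.
        return version in self.inflight_versions

def alter_table_old(table, new_version, hms_update):
    # Old order: the HMS update emits the event (step 4) before the version
    # is registered (step 7), so a fast event processor can miss it and
    # misclassify a self-event as an external one.
    hms_update()
    table.inflight_versions.append(new_version)

def alter_table_fixed(table, new_version, hms_update):
    # Fixed order: register the version before the HMS update, and roll it
    # back if the update throws (mirroring InProgressTableModification).
    table.inflight_versions.append(new_version)
    try:
        hms_update()
    except Exception:
        table.inflight_versions.remove(new_version)
        raise
```

In the fixed ordering, the event processor finds the version in the inflight list even if it fetches the event the instant HMS generates it.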
[jira] [Commented] (IMPALA-12988) Calculate an unbounded version of CpuAsk
[ https://issues.apache.org/jira/browse/IMPALA-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839697#comment-17839697 ] ASF subversion and git services commented on IMPALA-12988: -- Commit c8149d14127968eb9a4d26a623fa6cc82762216e in impala's branch refs/heads/branch-4.4.0 from Riza Suminto [ https://gitbox.apache.org/repos/asf?p=impala.git;h=c8149d141 ] IMPALA-12988: Calculate an unbounded version of CpuAsk

The Planner calculates CpuAsk through a recursive call beginning at Planner.computeBlockingAwareCores(), which is called after Planner.computeEffectiveParallelism(). It does blocking-operator analysis over the selected degree of parallelism that was decided during the computeEffectiveParallelism() traversal. That selected degree of parallelism, however, is already bounded by the min and max parallelism config, derived from the PROCESSING_COST_MIN_THREADS and MAX_FRAGMENT_INSTANCES_PER_NODE options respectively.

This patch calculates an unbounded version of CpuAsk that is not bounded by the min and max parallelism config. It is purely based on the fragment's ProcessingCost and query plan relationship constraints (for example, the number of JOIN BUILDER fragments should equal the number of destination JOIN fragments for a partitioned join). The frontend receives both bounded and unbounded CpuAsk values from TQueryExecRequest on each executor group set selection round. The unbounded CpuAsk is then scaled down once using an nth-root-based sublinear function, controlled by the total CPU count of the smallest executor group set and the bounded CpuAsk number. Another linear scaling is then applied to both the bounded and unbounded CpuAsk using the QUERY_CPU_COUNT_DIVISOR option. The frontend then compares the unbounded CpuAsk after scaling against CpuMax to avoid assigning a query to a small executor group set too soon. The last executor group set stays as the "catch-all" executor group set.
After this patch, setting COMPUTE_PROCESSING_COST=True will show the following changes in the query profile:
- The "max-parallelism" fields in the query plan will all be set to the maximum parallelism based on ProcessingCost.
- The CpuAsk counter is changed to show the unbounded CpuAsk after scaling.
- A new counter, CpuAskBounded, shows the bounded CpuAsk after scaling. If QUERY_CPU_COUNT_DIVISOR=1 and the PLANNER_CPU_ASK slot counting strategy is selected, CpuAskBounded is also the minimum total admission slots given to the query.
- A new counter, MaxParallelism, shows the unbounded CpuAsk before scaling.
- The EffectiveParallelism counter remains unchanged, showing the bounded CpuAsk before scaling.

Testing:
- Update and pass the FE tests TpcdsCpuCostPlannerTest and PlannerTest#testProcessingCost.
- Pass the EE test tests/query_test/test_tpcds_queries.py.
- Pass the custom cluster test tests/custom_cluster/test_executor_groups.py.

Change-Id: I5441e31088f90761062af35862be4ce09d116923 Reviewed-on: http://gerrit.cloudera.org:8080/21277 Reviewed-by: Kurt Deschler Reviewed-by: Abhishek Rawat Tested-by: Impala Public Jenkins

> Calculate an unbounded version of CpuAsk > > > Key: IMPALA-12988 > URL: https://issues.apache.org/jira/browse/IMPALA-12988 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Reporter: Riza Suminto >Assignee: Riza Suminto >Priority: Major > Fix For: Impala 4.4.0 > > > CpuAsk is calculated through a recursive call beginning at > Planner.computeBlockingAwareCores(), which is called after > Planner.computeEffectiveParallelism(). It does blocking-operator analysis > over the selected degree of parallelism that was decided during the > computeEffectiveParallelism() traversal. That selected degree of parallelism, > however, is already bounded by the min and max parallelism config, derived from > the PROCESSING_COST_MIN_THREADS and MAX_FRAGMENT_INSTANCES_PER_NODE options > respectively. > It is beneficial to have another version of CpuAsk that is not bounded by the min > and max parallelism config.
It should be purely based on the fragment's > ProcessingCost and query plan relationship constraints (i.e., the number of JOIN BUILDER > fragments should equal the number of JOIN fragments for a partitioned join). During > executor group set selection, the frontend should use the unbounded CpuAsk number > to avoid assigning a query to a small executor group set prematurely.
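The group set selection described in this thread can be sketched as follows. This is an illustrative Python model only: the function name and dict keys are invented, and the nth-root scaling step is omitted because its exact form is not given here — the sketch assumes the scaled CpuAsk is already computed:

```python
def select_executor_group_set(scaled_cpu_ask, group_sets):
    """Pick the first executor group set whose total CPU capacity (CpuMax)
    covers the query's scaled unbounded CpuAsk.

    group_sets is assumed to be ordered from smallest to largest; the last
    set acts as the "catch-all" and admits anything that did not fit.
    """
    for gs in group_sets[:-1]:
        if scaled_cpu_ask <= gs["cpu_max"]:
            return gs["name"]
    # No smaller set fits: fall through to the catch-all group set.
    return group_sets[-1]["name"]
```

Using the unbounded (rather than bounded) CpuAsk in this comparison is what keeps a genuinely large query from landing in a small group set just because per-node parallelism caps shrank its bounded estimate.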
[jira] [Commented] (IMPALA-8998) Admission control accounting for mt_dop
[ https://issues.apache.org/jira/browse/IMPALA-8998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839694#comment-17839694 ] ASF subversion and git services commented on IMPALA-8998: - Commit f97042384e0312cfd3943426e048581d9678891d in impala's branch refs/heads/branch-4.4.0 from Riza Suminto [ https://gitbox.apache.org/repos/asf?p=impala.git;h=f97042384 ] IMPALA-12980: Translate CpuAsk into admission control slots

Impala has a concept of "admission control slots": the amount of parallelism that should be allowed on an Impala daemon. This defaults to the number of processors per executor and can be overridden with the --admission_control_slots flag.

Admission control slot accounting is described in IMPALA-8998. It computes 'slots_to_use' for each backend based on the maximum number of instances of any fragment on that backend. This can lead to slot underestimation and query overadmission. For example, assume an executor node with 48 CPU cores, configured with --admission_control_slots=48, that is assigned 4 non-blocking query fragments, each with 12 instances scheduled on this executor. The IMPALA-8998 algorithm will request slots for the maximum instance count (12) rather than the sum of all non-blocking fragment instances (48). With the remaining 36 slots free, the executor can still admit another fragment from a different query, but it will potentially have CPU contention with the one that is currently running.

When COMPUTE_PROCESSING_COST is enabled, the Planner generates a CpuAsk number that represents the CPU requirement of that query over a particular executor group set. This number is an estimate of the largest number of query fragment instances that can run in parallel without waiting, given by the blocking-operator analysis. Therefore, the fragment trace that sums into that CpuAsk number can be translated into 'slots_to_use' as well, which more closely resembles the maximum parallel execution of fragment instances.
This patch adds a new query option called SLOT_COUNT_STRATEGY to control which admission control slot accounting to use. There are two possible values:
- LARGEST_FRAGMENT, which is the original algorithm from IMPALA-8998. This remains the default value for the SLOT_COUNT_STRATEGY option.
- PLANNER_CPU_ASK, which follows the fragment trace that contributes towards the CpuAsk number. This strategy schedules at least as many admission control slots as the LARGEST_FRAGMENT strategy.

To implement the PLANNER_CPU_ASK strategy, the Planner marks fragments that contribute to CpuAsk as dominant fragments. It also passes the max_slot_per_executor information it knows about the executor group set to the scheduler. An AvgAdmissionSlotsPerExecutor counter is added to describe what the Planner thinks the average 'slots_to_use' per backend will be, which follows this formula: AvgAdmissionSlotsPerExecutor = ceil(CpuAsk / num_executors). The actual 'slots_to_use' in each backend may differ from AvgAdmissionSlotsPerExecutor, depending on what is scheduled on that backend. 'slots_to_use' is shown as the 'AdmissionSlots' counter under each executor profile node.

Testing:
- Update test_executors.py with an AvgAdmissionSlotsPerExecutor assertion.
- Pass test_tpcds_queries.py::TestTpcdsQueryWithProcessingCost.
- Add EE test test_processing_cost.py.
- Add FE test PlannerTest#testProcessingCostPlanAdmissionSlots.

Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 Reviewed-on: http://gerrit.cloudera.org:8080/21257 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins

> Admission control accounting for mt_dop > --- > > Key: IMPALA-8998 > URL: https://issues.apache.org/jira/browse/IMPALA-8998 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Fix For: Impala 3.4.0 > > > We should account for the degree of parallelism that the query runs with on a > backend to avoid overadmitting too many parallel queries.
> We could probably simply count the effective degree of parallelism (max # > instances of a fragment on that backend) toward the number of slots in > admission control (although slots are not enabled for the default group yet - > see IMPALA-8757).
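The two accounting schemes discussed in this thread can be contrasted with a small sketch, using the 48-core example from the commit message. This is hypothetical Python, not Impala's scheduler code; the function names are invented, and the PLANNER_CPU_ASK variant is simplified to "sum the instances of the planner-marked dominant fragments":

```python
import math

def slots_largest_fragment(instances_per_fragment):
    # Original IMPALA-8998 accounting: slots = the maximum instance count
    # of any single fragment scheduled on this backend.
    return max(instances_per_fragment)

def slots_planner_cpu_ask(instances_per_fragment, dominant):
    # PLANNER_CPU_ASK-style accounting (sketch): sum the instances of the
    # fragments the planner marked as dominant, i.e. those contributing
    # to the CpuAsk number.
    return sum(n for n, dom in zip(instances_per_fragment, dominant) if dom)

def avg_admission_slots_per_executor(cpu_ask, num_executors):
    # The formula quoted in the commit message:
    # AvgAdmissionSlotsPerExecutor = ceil(CpuAsk / num_executors)
    return math.ceil(cpu_ask / num_executors)
```

With 4 non-blocking dominant fragments of 12 instances each, the first function asks for 12 slots while the second asks for 48, which is why PLANNER_CPU_ASK never under-asks relative to LARGEST_FRAGMENT.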
[jira] [Commented] (IMPALA-11495) Add glibc version and effective locale to the Web UI
[ https://issues.apache.org/jira/browse/IMPALA-11495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839684#comment-17839684 ] ASF subversion and git services commented on IMPALA-11495: -- Commit 43690811192bce9f921244efa5934851ad1e4529 in impala's branch refs/heads/branch-4.4.0 from Saurabh Katiyal [ https://gitbox.apache.org/repos/asf?p=impala.git;h=436908111 ] IMPALA-11495: Add glibc version and effective locale to the Web UI

Added a new section, "Other Info", to the Web UI root page, displaying the effective locale and glibc version.

Change-Id: Ia69c4d63df4beae29f5261691a8dcdd04b931de7 Reviewed-on: http://gerrit.cloudera.org:8080/21252 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins

> Add glibc version and effective locale to the Web UI > > > Key: IMPALA-11495 > URL: https://issues.apache.org/jira/browse/IMPALA-11495 > Project: IMPALA > Issue Type: New Feature > Components: Backend >Reporter: Quanlong Huang >Assignee: Saurabh Katiyal >Priority: Major > Labels: newbie, observability, supportability > Fix For: Impala 4.4.0 > > > When debugging utf8 mode string functions, it's essential to know the > effective Unicode version and locale. The Unicode standard version can be > deduced from the glibc version, which can be obtained with the command "ldd --version". > We need to find a programmatic way to get it. > The effective locale is already logged here: > https://github.com/apache/impala/blob/ba4cb95b6251911fa9e057cea1cb37958d339fed/be/src/common/init.cc#L406 > We just need to show it in impalad's Web UI as well.
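For reference, one programmatic way to obtain both values is sketched below. This assumes a Linux system with glibc and is not necessarily how the patch itself implements it (Impala's backend is C++, which can call gnu_get_libc_version() directly from <gnu/libc-version.h>):

```python
import ctypes
import locale

def glibc_version():
    # gnu_get_libc_version() is the programmatic counterpart of running
    # "ldd --version" on glibc systems.
    libc = ctypes.CDLL("libc.so.6")
    libc.gnu_get_libc_version.restype = ctypes.c_char_p
    return libc.gnu_get_libc_version().decode()

def effective_locale():
    # Adopt the environment's locale settings and report the result,
    # e.g. "en_US.UTF-8" or "C".
    return locale.setlocale(locale.LC_ALL, "")
```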
[jira] [Created] (IMPALA-13026) Creating openai-api-key-secret fails sporadically
Csaba Ringhofer created IMPALA-13026: Summary: Creating openai-api-key-secret fails sporadically Key: IMPALA-13026 URL: https://issues.apache.org/jira/browse/IMPALA-13026 Project: IMPALA Issue Type: Bug Components: Infrastructure Reporter: Csaba Ringhofer

Data load fails from time to time with the following error:
{code}
00:27:17.680 Error loading data. The end of the log file is:
00:27:17.680 04:15:15 /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/bin/load-data.py --workloads functional-query -e core --table_formats kudu/none/none --force --impalad localhost --hive_hs2_hostport localhost:11050 --hdfs_namenode localhost:20500
00:27:17.680 04:15:15 Executing Hadoop command: ... hadoop credential create openai-api-key-secret -value secret -provider localjceks://file/data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/testdata/jceks/test.jceks ...
00:27:17.680 java.io.IOException: Credential openai-api-key-secret already exists in localjceks://file/data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/testdata/jceks/test.jceks
00:27:17.680    at org.apache.hadoop.security.alias.AbstractJavaKeyStoreProvider.createCredentialEntry(AbstractJavaKeyStoreProvider.java:234)
00:27:17.680    at org.apache.hadoop.security.alias.CredentialShell$CreateCommand.execute(CredentialShell.java:354)
00:27:17.680    at org.apache.hadoop.tools.CommandShell.run(CommandShell.java:72)
00:27:17.680    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:81)
00:27:17.680    at org.apache.hadoop.security.alias.CredentialShell.main(CredentialShell.java:437)
00:27:17.680 04:15:15 Error executing Hadoop command, exiting
{code}
My guess is that this happens when "hadoop credential create" is called concurrently from different data loader processes.
https://github.com/apache/impala/blob/9b05a205fec397fa1e19ae467b1cc406ca43d948/bin/load-data.py#L323 Ideally this would be called in the serial phase of data load.
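One way to make the creation step tolerant of repeats is sketched below, in the spirit of bin/load-data.py. The helper is hypothetical, not code from that script, and the list-then-create pair is itself still racy between concurrent loaders — it only shrinks the window, which is why moving the call into the serial phase remains the cleaner fix:

```python
import subprocess

def create_credential_once(alias, value, provider, run=subprocess.run):
    """Create a Hadoop credential only if the alias is not already present.

    The runner is injectable so the logic can be exercised without a real
    Hadoop installation. Returns True if a create was issued.
    """
    # Check the provider's existing aliases first so a repeat invocation
    # (e.g. from a second data loader) does not fail with "already exists".
    listed = run(["hadoop", "credential", "list", "-provider", provider],
                 capture_output=True, text=True)
    if alias in listed.stdout:
        return False  # already created by an earlier (or concurrent) loader
    run(["hadoop", "credential", "create", alias, "-value", value,
         "-provider", provider], check=True)
    return True
```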
[jira] [Resolved] (IMPALA-13000) Document OPTIMIZE TABLE
[ https://issues.apache.org/jira/browse/IMPALA-13000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noemi Pap-Takacs resolved IMPALA-13000. --- Resolution: Fixed > Document OPTIMIZE TABLE > --- > > Key: IMPALA-13000 > URL: https://issues.apache.org/jira/browse/IMPALA-13000 > Project: IMPALA > Issue Type: Documentation > Components: Docs >Reporter: Noemi Pap-Takacs >Assignee: Noemi Pap-Takacs >Priority: Major > Labels: impala-iceberg > > Document OPTIMIZE TABLE syntax and behaviour.
[jira] [Resolved] (IMPALA-12989) LICENSE and NOTICE files are missing in DEB/RPM packages
[ https://issues.apache.org/jira/browse/IMPALA-12989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang resolved IMPALA-12989. - Assignee: (was: Quanlong Huang) Resolution: Fixed This was resolved in IMPALA-12362 by https://gerrit.cloudera.org/c/20263/ > LICENSE and NOTICE files are missing in DEB/RPM packages > > > Key: IMPALA-12989 > URL: https://issues.apache.org/jira/browse/IMPALA-12989 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Reporter: Quanlong Huang >Priority: Major > > In order to release the binaries, we need the LICENSE and NOTICE files added > in the DEB/RPM packages. > {quote}*COMPILED PACKAGES* > The Apache Software Foundation produces open source software. All releases > are in the form of the source materials needed to make changes to the > software being released. > As a convenience to users that might not have the appropriate tools to build > a compiled version of the source, binary/bytecode packages MAY be distributed > alongside official Apache releases. In all such cases, the binary/bytecode > package MUST have the same version number as the source release and MUST only > add binary/bytecode files that are the result of compiling that version of > the source code release and its dependencies. > *Licensing Documentation* > Each package MUST provide a {{LICENSE}} file and a {{NOTICE}} file which > account for the package's exact content. {{LICENSE}} and {{NOTICE}} MUST NOT > provide unnecessary information about materials which are not bundled in the > package, such as separately downloaded dependencies. > For source packages, {{LICENSE}} and {{NOTICE}} MUST be located at the root > of the distribution. For additional packages, they MUST be located in the > distribution format's customary location for licensing materials, such as the > {{META-INF}} directory of Java "jar" files. 
> {quote} > [https://www.apache.org/legal/release-policy.html#licensing-documentation]
[jira] [Updated] (IMPALA-12989) LICENSE and NOTICE files are missing in DEB/RPM packages
[ https://issues.apache.org/jira/browse/IMPALA-12989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-12989: Fix Version/s: Impala 4.4.0 > LICENSE and NOTICE files are missing in DEB/RPM packages > > > Key: IMPALA-12989 > URL: https://issues.apache.org/jira/browse/IMPALA-12989 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Reporter: Quanlong Huang >Priority: Major > Fix For: Impala 4.4.0 > > > In order to release the binaries, we need the LICENSE and NOTICE files added > in the DEB/RPM packages. > {quote}*COMPILED PACKAGES* > The Apache Software Foundation produces open source software. All releases > are in the form of the source materials needed to make changes to the > software being released. > As a convenience to users that might not have the appropriate tools to build > a compiled version of the source, binary/bytecode packages MAY be distributed > alongside official Apache releases. In all such cases, the binary/bytecode > package MUST have the same version number as the source release and MUST only > add binary/bytecode files that are the result of compiling that version of > the source code release and its dependencies. > *Licensing Documentation* > Each package MUST provide a {{LICENSE}} file and a {{NOTICE}} file which > account for the package's exact content. {{LICENSE}} and {{NOTICE}} MUST NOT > provide unnecessary information about materials which are not bundled in the > package, such as separately downloaded dependencies. > For source packages, {{LICENSE}} and {{NOTICE}} MUST be located at the root > of the distribution. For additional packages, they MUST be located in the > distribution format's customary location for licensing materials, such as the > {{META-INF}} directory of Java "jar" files. 
> {quote} > [https://www.apache.org/legal/release-policy.html#licensing-documentation]