[jira] [Resolved] (IMPALA-8242) Support Iceberg on S3
[ https://issues.apache.org/jira/browse/IMPALA-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy resolved IMPALA-8242.
---------------------------------------
    Fix Version/s: Impala 4.0
       Resolution: Fixed

> Support Iceberg on S3
> ---------------------
>
>                 Key: IMPALA-8242
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8242
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Frontend
>            Reporter: Quanlong Huang
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg
>             Fix For: Impala 4.0
>
> http://iceberg.incubator.apache.org/

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-8242) Support Iceberg on S3
[ https://issues.apache.org/jira/browse/IMPALA-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy reassigned IMPALA-8242:
-----------------------------------------
    Assignee: Zoltán Borók-Nagy

> Support Iceberg on S3
> ---------------------
[jira] [Updated] (IMPALA-9723) Read files created by Hive Streaming Ingestion V2
[ https://issues.apache.org/jira/browse/IMPALA-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy updated IMPALA-9723:
--------------------------------------
    Priority: Minor  (was: Major)

> Read files created by Hive Streaming Ingestion V2
> -------------------------------------------------
>
>                 Key: IMPALA-9723
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9723
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Frontend
>            Reporter: Zoltán Borók-Nagy
>            Priority: Minor
>
> Impala should be able to read files created by Hive Streaming Ingestion V2.
> Hive Streaming only writes full ACID ORC files. Such files might contain row
> stripes that Impala shouldn't read based on its validWriteIdList.
> Also, Hive Streaming might append to the end of such files. In that case it
> writes a "side file" next to the file that contains the last committed file
> end (its name is the file name + _flush_length). Impala should take that into
> consideration when it reads such files. Everything after the "flush length"
> must be ignored.
> OrcAcidUtils.getLastFlushLength(fileSystem, filePath) can be used to
> determine the committed file size.
[jira] [Commented] (IMPALA-9723) Read files created by Hive Streaming Ingestion V2
[ https://issues.apache.org/jira/browse/IMPALA-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258117#comment-17258117 ]

Zoltán Borók-Nagy commented on IMPALA-9723:
-------------------------------------------
Lowered the priority because AFAIK the current engines don't append to
existing files, but create new ones. So the problem in the description is
likely non-existent. But keeping this jira open until this behavior becomes
the standard.

> Read files created by Hive Streaming Ingestion V2
> -------------------------------------------------
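The "_flush_length" side-file mechanism described above can be sketched as follows. This is a minimal, self-contained illustration based on the ticket's description, not Hive's actual implementation (the real helper is OrcAcidUtils.getLastFlushLength): the side file is assumed to hold a sequence of 8-byte lengths, one appended per flush, and the last complete one is the committed file length; everything in the data file past it must be ignored.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

class FlushLengthReader {
  // Returns the last committed length recorded in the side-file bytes,
  // or -1 if the side file is empty. Each flush appends a new 8-byte
  // length, so the last complete long wins.
  static long lastFlushLength(byte[] sideFile) throws IOException {
    long last = -1;
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(sideFile));
    while (in.available() >= Long.BYTES) {
      last = in.readLong();
    }
    return last;
  }
}
```

A scanner would then clamp its scan range to this length instead of the data file's physical size.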
[jira] [Updated] (IMPALA-7556) Clean up ScanRange
[ https://issues.apache.org/jira/browse/IMPALA-7556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy updated IMPALA-7556:
--------------------------------------
    Labels: ramp-up  (was: )

> Clean up ScanRange
> ------------------
>
>                 Key: IMPALA-7556
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7556
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: ramp-up
>
> For IMPALA-7543 I want to add some additional functionality to scan ranges.
> However, the code of the ScanRange class is already quite messy. It handles
> different types of files, does some buffer management, and updates all kinds
> of counters.
> So, instead of complicating the code further, let's refactor the ScanRange
> class a bit.
> * Do the file operations in separate classes
> ** A new, abstract class could be introduced to provide an API for file
> operations, i.e. Open(), ReadFromPos(), Close(), etc.
> *** Keep in mind that the interface must be a good fit for IMPALA-7543, i.e.
> we need positional reads from files
> ** Operations for local files and HDFS files could be implemented in child
> classes
> * Buffer management
> ** A new BufferStore class could be created
> ** This new class would be responsible for managing the unused buffers
> *** If possible, it would also handle the client and cached buffers
> * Counters and metrics would be updated by the corresponding new classes
> ** E.g. ImpaladMetrics::IO_MGR_NUM_OPEN_FILES would be updated by the file
> handling classes
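The proposed split of file operations out of ScanRange can be sketched like this. All names here are hypothetical (Impala's backend is C++; Java is used only for illustration), and the interface mirrors the Open()/ReadFromPos()/Close() API the ticket suggests, including the positional reads needed for IMPALA-7543.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical abstract file-operations API, pulled out of ScanRange.
abstract class FileReader {
  abstract void open(String path) throws IOException;
  // Positional read: reads up to buf.length bytes starting at offset,
  // returns the number of bytes read.
  abstract int readFromPos(long offset, byte[] buf) throws IOException;
  abstract void close() throws IOException;
}

// Child class for local files; an HdfsFileReader sibling would wrap libhdfs.
class LocalFileReader extends FileReader {
  private RandomAccessFile file;

  @Override void open(String path) throws IOException {
    file = new RandomAccessFile(path, "r");
    // Per the ticket, counters like ImpaladMetrics::IO_MGR_NUM_OPEN_FILES
    // would be updated here, by the file-handling class itself.
  }

  @Override int readFromPos(long offset, byte[] buf) throws IOException {
    file.seek(offset);
    return file.read(buf);
  }

  @Override void close() throws IOException { file.close(); }
}
```

ScanRange would then hold a FileReader and delegate, leaving buffer management to a separate BufferStore-like class.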
[jira] [Assigned] (IMPALA-7556) Clean up ScanRange
[ https://issues.apache.org/jira/browse/IMPALA-7556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy reassigned IMPALA-7556:
-----------------------------------------
    Assignee: (was: Zoltán Borók-Nagy)

> Clean up ScanRange
> ------------------
[jira] [Commented] (IMPALA-7556) Clean up ScanRange
[ https://issues.apache.org/jira/browse/IMPALA-7556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258118#comment-17258118 ]

Zoltán Borók-Nagy commented on IMPALA-7556:
-------------------------------------------
The "Buffer management" and "Counters and metrics" parts are still to do.
They would be nice to have, though I'm not sure when I'll have the bandwidth
for them. So I'm unassigning this task for now and making it a "ramp-up" task.

> Clean up ScanRange
> ------------------
[jira] [Updated] (IMPALA-10223) Implement INSERT OVERWRITE for Iceberg tables
[ https://issues.apache.org/jira/browse/IMPALA-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy updated IMPALA-10223:
---------------------------------------
    Summary: Implement INSERT OVERWRITE for Iceberg tables  (was: Implement INSERT OVERWRITE for non-partitioned Iceberg tables)

> Implement INSERT OVERWRITE for Iceberg tables
> ---------------------------------------------
>
>                 Key: IMPALA-10223
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10223
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Frontend
>            Reporter: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg
>
> Add support for INSERT OVERWRITE statements for Iceberg tables.
> Use Iceberg's OverwriteFiles API for this.
[jira] [Updated] (IMPALA-10223) Implement INSERT OVERWRITE for Iceberg tables
[ https://issues.apache.org/jira/browse/IMPALA-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy updated IMPALA-10223:
---------------------------------------
    Description:
        Add support for INSERT OVERWRITE statements for Iceberg tables.
        Use Iceberg's ReplacePartitions API for this.
    was:
        Add support for INSERT OVERWRITE statements for Iceberg tables.
        Use Iceberg's OverwriteFiles API for this.

> Implement INSERT OVERWRITE for Iceberg tables
> ---------------------------------------------
[jira] [Resolved] (IMPALA-10460) Impala should write normalized paths in Iceberg manifests
[ https://issues.apache.org/jira/browse/IMPALA-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy resolved IMPALA-10460.
----------------------------------------
    Fix Version/s: Impala 4.0
       Resolution: Fixed

> Impala should write normalized paths in Iceberg manifests
> ---------------------------------------------------------
>
>                 Key: IMPALA-10460
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10460
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg
>             Fix For: Impala 4.0
>
> Currently Impala writes double slashes in the paths of datafiles for
> non-partitioned Iceberg tables, e.g.:
> {noformat}
> hdfs://localhost:20500/test-warehouse/ice_t/data//594828b035d480b7-9c9fd8eb_173316607_data.0.parq{noformat}
> Paths should be normalized so they won't cause any problems later.
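The normalization described in this ticket amounts to collapsing runs of slashes in the path while leaving the "//" of the URI scheme intact. The helper below is a hypothetical sketch of that idea, not Impala's actual fix.

```java
// Hypothetical path-normalization helper: collapse duplicate slashes in a
// data-file path, but keep the "//" that follows the scheme (hdfs:, s3a:, ...).
class PathNormalizer {
  static String normalize(String path) {
    // "(?<!:)" is a negative lookbehind that protects the scheme's "//".
    return path.replaceAll("(?<!:)/{2,}", "/");
  }
}
```

E.g. the path from the description would lose its double slash after "data" while "hdfs://" survives unchanged.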
[jira] [Commented] (IMPALA-10166) ALTER TABLE for Iceberg tables
[ https://issues.apache.org/jira/browse/IMPALA-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277026#comment-17277026 ]

Zoltán Borók-Nagy commented on IMPALA-10166:
--------------------------------------------
Hey [~skyyws], do you plan to work on the remaining ALTER TABLE statements as well?

> ALTER TABLE for Iceberg tables
> ------------------------------
>
>                 Key: IMPALA-10166
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10166
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Frontend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Sheng Wang
>            Priority: Major
>              Labels: impala-iceberg
>
> Add support for ALTER TABLE operations for Iceberg tables.
[jira] [Assigned] (IMPALA-10456) Implement TRUNCATE for Iceberg tables
[ https://issues.apache.org/jira/browse/IMPALA-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy reassigned IMPALA-10456:
------------------------------------------
    Assignee: Zoltán Borók-Nagy

> Implement TRUNCATE for Iceberg tables
> -------------------------------------
>
>                 Key: IMPALA-10456
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10456
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Frontend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg
>
> Implement TRUNCATE for Iceberg tables.
> The TRUNCATE operation should create a new snapshot for the target table that
> doesn't have any data files.
> It should work for both partitioned and unpartitioned tables.
[jira] [Created] (IMPALA-10456) Implement TRUNCATE for Iceberg tables
Zoltán Borók-Nagy created IMPALA-10456:
------------------------------------------

             Summary: Implement TRUNCATE for Iceberg tables
                 Key: IMPALA-10456
                 URL: https://issues.apache.org/jira/browse/IMPALA-10456
             Project: IMPALA
          Issue Type: Sub-task
            Reporter: Zoltán Borók-Nagy

Implement TRUNCATE for Iceberg tables.
The TRUNCATE operation should create a new snapshot for the target table that
doesn't have any data files.
It should work for both partitioned and unpartitioned tables.
[jira] [Updated] (IMPALA-10456) Implement TRUNCATE for Iceberg tables
[ https://issues.apache.org/jira/browse/IMPALA-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy updated IMPALA-10456:
---------------------------------------
    Labels: impala-iceberg  (was: )

> Implement TRUNCATE for Iceberg tables
> -------------------------------------
[jira] [Updated] (IMPALA-10456) Implement TRUNCATE for Iceberg tables
[ https://issues.apache.org/jira/browse/IMPALA-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy updated IMPALA-10456:
---------------------------------------
    Component/s: Frontend

> Implement TRUNCATE for Iceberg tables
> -------------------------------------
[jira] [Assigned] (IMPALA-10432) INSERT INTO Iceberg tables with partition transforms
[ https://issues.apache.org/jira/browse/IMPALA-10432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy reassigned IMPALA-10432:
------------------------------------------
    Assignee: Zoltán Borók-Nagy

> INSERT INTO Iceberg tables with partition transforms
> ----------------------------------------------------
>
>                 Key: IMPALA-10432
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10432
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg
>             Fix For: Impala 4.0
>
> INSERT INTO Iceberg tables that use partition transforms. Partition
> transforms are functions that calculate partition data from row data.
> There are the following partition transforms in Iceberg:
> [https://iceberg.apache.org/spec/#partition-transforms]
> * IDENTITY
> * BUCKET
> * TRUNCATE
> * YEAR
> * MONTH
> * DAY
> * HOUR
> INSERT INTO identity-partitioned Iceberg tables is already supported.
> We need to add support for the rest.
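A few of the transforms listed above are simple enough to sketch directly. The snippet below follows the definitions in the linked Iceberg spec (TRUNCATE uses a floored modulus so negative values truncate downwards; YEAR and DAY are ordinals from 1970); it is an illustrative sketch, not Impala's or Iceberg's implementation.

```java
import java.time.LocalDate;

// Sketch of Iceberg partition transforms, per the spec at
// https://iceberg.apache.org/spec/#partition-transforms
class IcebergTransforms {
  // TRUNCATE(W, v) for integers: v - (v mod W), with floored mod.
  static int truncate(int width, int v) {
    return v - Math.floorMod(v, width);
  }

  // YEAR(date): ordinal years from 1970.
  static int year(LocalDate date) {
    return date.getYear() - 1970;
  }

  // DAY(date): ordinal days from 1970.
  static long day(LocalDate date) {
    return date.toEpochDay();
  }
}
```

An INSERT into a table partitioned by, say, YEAR(ts) would compute these values per row to decide which partition each row's data file belongs to.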
[jira] [Updated] (IMPALA-10446) Add interop tests for Iceberg tables
[ https://issues.apache.org/jira/browse/IMPALA-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy updated IMPALA-10446:
---------------------------------------
    Labels: impala-iceberg  (was: )

> Add interop tests for Iceberg tables
> ------------------------------------
>
>                 Key: IMPALA-10446
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10446
>             Project: IMPALA
>          Issue Type: Test
>            Reporter: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg
>
> Add interoperability tests for Iceberg table support between Impala and Hive.
> First we'll need to add Iceberg to our test environment and configure Hive
> to use it at runtime.
> We need to test that Impala is able to read Iceberg tables written by Hive,
> and that Hive is able to read Iceberg tables written by Impala.
> * Have tests for all data types
> * For all partition transforms
> * Null values in data and/or partitioning columns
> * Files with multiple row groups (written by Hive)
> ** to test block location loading from Iceberg
> * Unsupported column types
[jira] [Commented] (IMPALA-10453) Support file/partition pruning via runtime filters on Iceberg
[ https://issues.apache.org/jira/browse/IMPALA-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17272187#comment-17272187 ]

Zoltán Borók-Nagy commented on IMPALA-10453:
--------------------------------------------
I think min/max filters will do a good service here, filtering out row groups.
If we put the partition-transformed values into the bloom filters, they'd also
be able to prune files using the associated partition data. However, we can't
do that if the partition layout has evolved over time. Or we could just prune
only the files that have the current partition layout.

> Support file/partition pruning via runtime filters on Iceberg
> -------------------------------------------------------------
>
>                 Key: IMPALA-10453
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10453
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Tim Armstrong
>            Priority: Major
>              Labels: iceberg, impala-iceberg, performance
>
> This is a placeholder to figure out what we'd need to do to support dynamic
> file-level pruning in Iceberg using runtime filters, i.e. have parity for
> partition pruning.
> * If there is a single partition value per file, then applying bloom filters
> to the row group stats would be effective at pruning files.
> * If there are partition transforms, e.g. hash-based, then I think we
> probably need to track the partition that the file is associated with and
> then have some custom logic in the parquet scanner to do partition pruning.
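The idea floated in the comment above can be sketched concretely: on the build side, insert the partition-transformed value of each matching row into a bloom filter; on the probe side, skip any data file whose recorded partition value definitely isn't in the filter. Everything below is hypothetical, a toy bloom filter (BitSet plus two multiplicative hashes) rather than Impala's runtime-filter implementation, and bucket16 stands in for Iceberg's BUCKET(16, col) transform.

```java
import java.util.BitSet;

// Toy sketch of file pruning via a bloom filter over partition-transformed
// values. Not Impala's runtime-filter code; names are made up.
class PartitionPruner {
  private static final int SIZE = 4096;
  private final BitSet bits = new BitSet(SIZE);

  // Hypothetical stand-in for Iceberg's BUCKET(16, col) partition transform.
  static int bucket16(int col) { return Math.floorMod(col, 16); }

  // Two cheap multiplicative hashes; floorMod keeps indices non-negative.
  private int[] hashes(int value) {
    int h1 = value * 0x9E3779B9;
    int h2 = value * 0x85EBCA6B + 1;
    return new int[] { Math.floorMod(h1, SIZE), Math.floorMod(h2, SIZE) };
  }

  // Build side: record the transformed value of a row that can match the join.
  void insert(int col) {
    for (int h : hashes(bucket16(col))) bits.set(h);
  }

  // Probe side: a file whose stored partition value is definitely absent from
  // the filter can be skipped entirely (false positives are possible, false
  // negatives are not).
  boolean mightMatch(int filePartitionValue) {
    for (int h : hashes(filePartitionValue)) {
      if (!bits.get(h)) return false;
    }
    return true;
  }
}
```

As the comment notes, this only works for files written with the current partition spec; files from an older layout would need their own transform applied, or be kept unconditionally.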
[jira] [Assigned] (IMPALA-10452) CREATE Iceberg tables with old PARTITIONED BY syntax
[ https://issues.apache.org/jira/browse/IMPALA-10452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy reassigned IMPALA-10452:
------------------------------------------
    Assignee: Zoltán Borók-Nagy

> CREATE Iceberg tables with old PARTITIONED BY syntax
> ----------------------------------------------------
>
>                 Key: IMPALA-10452
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10452
>             Project: IMPALA
>          Issue Type: Test
>          Components: Frontend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg
>
> It's convenient for users to create Iceberg tables with the old syntax.
> It also makes it easier to migrate existing workloads to Iceberg, because
> the SQL scripts that create the table definitions don't need to change much.
> So users should be able to write the following:
> {noformat}
> CREATE TABLE ice_t (i int)
> PARTITIONED BY (p int)
> STORED AS ICEBERG;
> {noformat}
> Which should be equivalent to this:
> {noformat}
> CREATE TABLE ice_t (i int, p int)
> PARTITION BY SPEC (p IDENTITY)
> STORED AS ICEBERG;
> {noformat}
> Please note that the old-style CREATE TABLE creates IDENTITY-partitioned
> tables. For other partition transforms users must use the new, more
> generic syntax.
> Hive also supports the PARTITIONED BY syntax, see
> [https://github.com/apache/iceberg/pull/1612]
[jira] [Updated] (IMPALA-10452) CREATE Iceberg tables with old PARTITIONED BY syntax
[ https://issues.apache.org/jira/browse/IMPALA-10452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy updated IMPALA-10452:
---------------------------------------
    Component/s: Frontend

> CREATE Iceberg tables with old PARTITIONED BY syntax
> ----------------------------------------------------
[jira] [Created] (IMPALA-10460) Impala should write normalized paths in Iceberg manifests
Zoltán Borók-Nagy created IMPALA-10460:
------------------------------------------

             Summary: Impala should write normalized paths in Iceberg manifests
                 Key: IMPALA-10460
                 URL: https://issues.apache.org/jira/browse/IMPALA-10460
             Project: IMPALA
          Issue Type: Bug
            Reporter: Zoltán Borók-Nagy

Currently Impala writes double slashes in the paths of datafiles for
non-partitioned Iceberg tables, e.g.:
{noformat}
hdfs://localhost:20500/test-warehouse/ice_t/data//594828b035d480b7-9c9fd8eb_173316607_data.0.parq{noformat}
Paths should be normalized so they won't cause any problems later.
[jira] [Assigned] (IMPALA-10460) Impala should write normalized paths in Iceberg manifests
[ https://issues.apache.org/jira/browse/IMPALA-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy reassigned IMPALA-10460:
------------------------------------------
    Assignee: Zoltán Borók-Nagy

> Impala should write normalized paths in Iceberg manifests
> ---------------------------------------------------------
[jira] [Updated] (IMPALA-10460) Impala should write normalized paths in Iceberg manifests
[ https://issues.apache.org/jira/browse/IMPALA-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy updated IMPALA-10460:
---------------------------------------
    Component/s: Backend

> Impala should write normalized paths in Iceberg manifests
> ---------------------------------------------------------
[jira] [Updated] (IMPALA-10460) Impala should write normalized paths in Iceberg manifests
[ https://issues.apache.org/jira/browse/IMPALA-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy updated IMPALA-10460:
---------------------------------------
    Labels: impala-iceberg  (was: )

> Impala should write normalized paths in Iceberg manifests
> ---------------------------------------------------------
[jira] [Commented] (IMPALA-10166) ALTER TABLE for Iceberg tables
[ https://issues.apache.org/jira/browse/IMPALA-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277108#comment-17277108 ]

Zoltán Borók-Nagy commented on IMPALA-10166:
--------------------------------------------
Right. Thanks for the info!

> ALTER TABLE for Iceberg tables
> ------------------------------
[jira] [Created] (IMPALA-10749) Add Runtime Filter Publish info to Node Lifecycle Event Timeline
Zoltán Borók-Nagy created IMPALA-10749:
------------------------------------------

             Summary: Add Runtime Filter Publish info to Node Lifecycle Event Timeline
                 Key: IMPALA-10749
                 URL: https://issues.apache.org/jira/browse/IMPALA-10749
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
            Reporter: Zoltán Borók-Nagy

Currently we only have the following info about runtime filters at the HASH
JOIN NODE in the profile:
{noformat}
Runtime filters: 1 of 1 Runtime Filter Published
{noformat}
But it would be useful to know when the runtime filters were published. We
could add this information to the Node Lifecycle Event Timeline.
E.g. in test failures like IMPALA-10747 it would be good to know when the
runtime filters were published.
[jira] [Updated] (IMPALA-10747) test_runtime_filters.py::test_row_filters failed in dockerised test
[ https://issues.apache.org/jira/browse/IMPALA-10747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy updated IMPALA-10747: --- Component/s: Backend > test_runtime_filters.py::test_row_filters failed in dockerised test > --- > > Key: IMPALA-10747 > URL: https://issues.apache.org/jira/browse/IMPALA-10747 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Zoltán Borók-Nagy >Priority: Major > Labels: broken-build > > test_runtime_filters.py::test_row_filters failed with the following stack > trace: > {noformat} > query_test/test_runtime_filters.py:341: in test_row_filters > test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)}) > common/impala_test_suite.py:734: in run_test_case > update_section=pytest.config.option.update_results) > common/test_result_verifier.py:636: in verify_runtime_profile > actual)) > E AssertionError: Did not find matches for lines in runtime profile: > E EXPECTED LINES: > E row_regex: .*Rows processed: 16.38K.* > {noformat} > The job was: > https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4320/testReport/ > It's similar to IMPALA-6004 and IMPALA-6712. Those were fixed by increasing > the runtime filter wait time. It's currently 60 seconds in regular builds and > 200 seconds in slow builds: > https://github.com/apache/impala/blob/c65d7861d9ae28f6fc592727ff699a8155dcda2c/tests/query_test/test_runtime_filters.py#L37 > The profile contains: > {noformat} > Runtime filters: Not all filters arrived (arrived: [], missing [0]), waited > for 59s361ms. Arrival delay: 1m. > {noformat} > This was the only test that failed in that build, and the whole build took 4 > hr 17 min which is normal. So other tests didn't experience slowness. > There was only a single runtime filter that was generated by 02:HASH JOIN. > {noformat} > E Operator #Hosts #Inst Avg Time Max Time #Rows Est. > #Rows Peak Mem Est. 
Peak Mem Detail > E > -- > E F03:ROOT 1 1 0.000ns 0.000ns > 4.01 MB4.00 MB > E 07:AGGREGATE 1 1 3.999ms 3.999ms 1 >1 16.00 KB 16.00 KB FINALIZE > E 06:EXCHANGE 1 1 0.000ns 0.000ns 3 >1 32.00 KB 16.00 KB UNPARTITIONED > E F02:EXCHANGE SENDER 3 3 0.000ns 0.000ns >16.00 KB 0 > E 03:AGGREGATE 3 3 0.000ns 0.000ns 3 >1 24.00 KB 16.00 KB > E 02:HASH JOIN 3 3 14s217ms 18s739ms 51.50M > 7.74M 68.06 MB 169.06 MB INNER JOIN, PARTITIONED > E |--05:EXCHANGE3 3 8s303ms 13s715ms 6.00M > 6.00M 13.90 MB 10.12 MB HASH(b.l_comment) > E | F01:EXCHANGE SENDER3 3 36s758ms 44s115ms > 209.53 KB 0 > E | 01:SCAN HDFS 3 3 13s637ms 15s775ms 6.00M > 6.00M 29.96 MB 80.00 MB tpch_parquet.lineitem b > E 04:EXCHANGE 3 3 4s874ms 7s223ms 6.00M > 6.00M 12.27 MB 10.12 MB HASH(a.l_comment) > E F00:EXCHANGE SENDER 3 3 23s495ms 31s755ms > 209.53 KB 0 > E 00:SCAN HDFS 3 3 1m4s 1m8s 6.00M > 6.00M 29.96 MB 80.00 MB tpch_parquet.lineitem a > {noformat} > The Max Time of F01:EXCHANGE SENDER was 44s115ms (non-child time). > The HASH JOIN BUILDER above the EXCHANGE SENDER had a non-child total time > 19s851ms. > The profiles of all HASH_JOIN_NODE operators (all 3 of the 3 fragment > instances) has: > {noformat} > Runtime filters: 1 of 1 Runtime Filter Published > {noformat} > So it seems like the filters were published, but 60 sec still wasn't enough. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
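The failure above comes down to how long a scan waits for a runtime filter before giving up. A minimal sketch of that wait-with-timeout behavior follows; all class and method names here are hypothetical illustrations, not Impala's actual implementation:

```python
import threading

class RuntimeFilterWaiter:
    """Hypothetical sketch: a scan waits up to wait_time_ms for a
    runtime filter. If the filter arrives too late, the scan proceeds
    without it (correct but slower), which is why a too-short wait
    makes profile lines like "Rows processed" disappear."""

    def __init__(self, wait_time_ms):
        self.wait_time_ms = wait_time_ms
        self._arrived = threading.Event()
        self.filter_payload = None

    def publish(self, payload):
        # Called when the hash join build publishes the filter.
        self.filter_payload = payload
        self._arrived.set()

    def wait_for_filter(self):
        # Returns the filter if it arrived within the deadline, else None.
        if self._arrived.wait(self.wait_time_ms / 1000.0):
            return self.filter_payload
        return None

waiter = RuntimeFilterWaiter(wait_time_ms=50)
# Filter published only after 200 ms -- past the 50 ms deadline.
timer = threading.Timer(0.2, waiter.publish, args=({"min": 1, "max": 10},))
timer.start()
print(waiter.wait_for_filter())  # deadline expires first; prints None
timer.join()
```

Raising the wait time (as IMPALA-6004 and IMPALA-6712 did) simply widens this deadline so slow builds still receive the filter in time.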
[jira] [Created] (IMPALA-10748) Remove enable_orc_scanner flag
Zoltán Borók-Nagy created IMPALA-10748: -- Summary: Remove enable_orc_scanner flag Key: IMPALA-10748 URL: https://issues.apache.org/jira/browse/IMPALA-10748 Project: IMPALA Issue Type: Improvement Components: Backend, Frontend Reporter: Zoltán Borók-Nagy We've been supporting reading ORC files for quite some time. I don't think we need the flag anymore. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10747) test_runtime_filters.py::test_row_filters failed in dockerised test
Zoltán Borók-Nagy created IMPALA-10747: -- Summary: test_runtime_filters.py::test_row_filters failed in dockerised test Key: IMPALA-10747 URL: https://issues.apache.org/jira/browse/IMPALA-10747 Project: IMPALA Issue Type: Bug Reporter: Zoltán Borók-Nagy test_runtime_filters.py::test_row_filters failed with the following stack trace: {noformat} query_test/test_runtime_filters.py:341: in test_row_filters test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)}) common/impala_test_suite.py:734: in run_test_case update_section=pytest.config.option.update_results) common/test_result_verifier.py:636: in verify_runtime_profile actual)) E AssertionError: Did not find matches for lines in runtime profile: E EXPECTED LINES: E row_regex: .*Rows processed: 16.38K.* {noformat} The job was: https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4320/testReport/ It's similar to IMPALA-6004 and IMPALA-6712. Those were fixed by increasing the runtime filter wait time. It's currently 60 seconds in regular builds and 200 seconds in slow builds: https://github.com/apache/impala/blob/c65d7861d9ae28f6fc592727ff699a8155dcda2c/tests/query_test/test_runtime_filters.py#L37 The profile contains: {noformat} Runtime filters: Not all filters arrived (arrived: [], missing [0]), waited for 59s361ms. Arrival delay: 1m. {noformat} This was the only test that failed in that build, and the whole build took 4 hr 17 min which is normal. So other tests didn't experience slowness. There was only a single runtime filter that was generated by 02:HASH JOIN. {noformat} E Operator #Hosts #Inst Avg Time Max Time #Rows Est. #Rows Peak Mem Est. 
Peak Mem Detail E -- E F03:ROOT 1 1 0.000ns 0.000ns 4.01 MB4.00 MB E 07:AGGREGATE 1 1 3.999ms 3.999ms 1 1 16.00 KB 16.00 KB FINALIZE E 06:EXCHANGE 1 1 0.000ns 0.000ns 3 1 32.00 KB 16.00 KB UNPARTITIONED E F02:EXCHANGE SENDER 3 3 0.000ns 0.000ns 16.00 KB 0 E 03:AGGREGATE 3 3 0.000ns 0.000ns 3 1 24.00 KB 16.00 KB E 02:HASH JOIN 3 3 14s217ms 18s739ms 51.50M 7.74M 68.06 MB 169.06 MB INNER JOIN, PARTITIONED E |--05:EXCHANGE3 3 8s303ms 13s715ms 6.00M 6.00M 13.90 MB 10.12 MB HASH(b.l_comment) E | F01:EXCHANGE SENDER3 3 36s758ms 44s115ms 209.53 KB 0 E | 01:SCAN HDFS 3 3 13s637ms 15s775ms 6.00M 6.00M 29.96 MB 80.00 MB tpch_parquet.lineitem b E 04:EXCHANGE 3 3 4s874ms 7s223ms 6.00M 6.00M 12.27 MB 10.12 MB HASH(a.l_comment) E F00:EXCHANGE SENDER 3 3 23s495ms 31s755ms 209.53 KB 0 E 00:SCAN HDFS 3 3 1m4s 1m8s 6.00M 6.00M 29.96 MB 80.00 MB tpch_parquet.lineitem a {noformat} The Max Time of F01:EXCHANGE SENDER was 44s115ms (non-child time). The HASH JOIN BUILDER above the EXCHANGE SENDER had a non-child total time 19s851ms. The profiles of all HASH_JOIN_NODE operators (all 3 of the 3 fragment instances) has: {noformat} Runtime filters: 1 of 1 Runtime Filter Published {noformat} So it seems like the filters were published, but 60 sec still wasn't enough. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-10485) Support Iceberg field-id based column resolution in the ORC scanner
[ https://issues.apache.org/jira/browse/IMPALA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy resolved IMPALA-10485. Fix Version/s: Impala 4.0 Resolution: Fixed > Support Iceberg field-id based column resolution in the ORC scanner > --- > > Key: IMPALA-10485 > URL: https://issues.apache.org/jira/browse/IMPALA-10485 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: impala-iceberg > Fix For: Impala 4.0 > > > Currently the ORC scanner only supports position-based column resolution. > We need to add Iceberg field-id based column resolution to support schema > evolution. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org

[jira] [Created] (IMPALA-10737) Optimize Iceberg metadata handling
Zoltán Borók-Nagy created IMPALA-10737: -- Summary: Optimize Iceberg metadata handling Key: IMPALA-10737 URL: https://issues.apache.org/jira/browse/IMPALA-10737 Project: IMPALA Issue Type: Bug Components: Frontend Reporter: Zoltán Borók-Nagy Currently we re-read Iceberg table metadata in several cases. We should rather keep it in memory and use it when possible. Also, when refreshing a table we should use Iceberg's refresh() API to avoid unnecessary re-reads of manifest files: [https://github.com/apache/iceberg/blob/282b6f9f1cae8d4fd5ff7c73de513ca91f01fddc/core/src/main/java/org/apache/iceberg/TableOperations.java#L45] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
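The caching idea above can be sketched as follows. The class and loader below are hypothetical illustrations of keeping metadata in memory keyed by its location, not Impala's actual catalog code; the real fix would go through Iceberg's TableOperations.refresh():

```python
class IcebergMetadataCache:
    """Hypothetical sketch: keep parsed table metadata in memory and
    only re-read it when the table's metadata location changes. This
    mirrors what Iceberg's refresh() API avoids re-reading."""

    def __init__(self, loader):
        self.loader = loader   # loader(location) -> metadata dict
        self._cache = {}       # table name -> (location, metadata)

    def get(self, table, current_location):
        cached = self._cache.get(table)
        if cached and cached[0] == current_location:
            return cached[1]   # metadata unchanged: serve from memory
        metadata = self.loader(current_location)
        self._cache[table] = (current_location, metadata)
        return metadata

loads = []
def loader(location):
    loads.append(location)          # count expensive manifest reads
    return {"location": location}

cache = IcebergMetadataCache(loader)
cache.get("t", "v1.metadata.json")  # first access: reads metadata
cache.get("t", "v1.metadata.json")  # unchanged: served from memory
cache.get("t", "v2.metadata.json")  # table changed: re-read once
print(len(loads))  # 2
```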
[jira] [Assigned] (IMPALA-10732) Use consistent DDL for specifying Iceberg partitions
[ https://issues.apache.org/jira/browse/IMPALA-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy reassigned IMPALA-10732: -- Assignee: Zoltán Borók-Nagy > Use consistent DDL for specifying Iceberg partitions > > > Key: IMPALA-10732 > URL: https://issues.apache.org/jira/browse/IMPALA-10732 > Project: IMPALA > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: impala-iceberg > > Currently we have a DDL syntax for defining Iceberg partitions that differs > from SparkSQL: > [https://iceberg.apache.org/spark-ddl/#partitioned-by] > > E.g. Impala is using the following syntax: > > CREATE TABLE ice_t (i int, s string, ts timestamp, d date) > *PARTITION BY SPEC (i BUCKET 5, ts MONTH, d YEAR)* > STORED AS ICEBERG; > The same in Spark is: > CREATE TABLE ice_t (i int, s string, ts timestamp, d date) > USING ICEBERG > *PARTITIONED BY (bucket(5, i), months(ts), years(d))* > > Impala's syntax is older but hasn't been released yet. Spark's syntax is > released so it cannot be changed. > > Hive is also working on DDL support for Iceberg partitions, and they are > favoring the released SparkSQL syntax. > > After discussing it on dev@impala we decided to use SparkSQL's syntax. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10732) Use consistent DDL for specifying Iceberg partitions
Zoltán Borók-Nagy created IMPALA-10732: -- Summary: Use consistent DDL for specifying Iceberg partitions Key: IMPALA-10732 URL: https://issues.apache.org/jira/browse/IMPALA-10732 Project: IMPALA Issue Type: Bug Reporter: Zoltán Borók-Nagy Currently we have a DDL syntax for defining Iceberg partitions that differs from SparkSQL: [https://iceberg.apache.org/spark-ddl/#partitioned-by] E.g. Impala is using the following syntax: CREATE TABLE ice_t (i int, s string, ts timestamp, d date) *PARTITION BY SPEC (i BUCKET 5, ts MONTH, d YEAR)* STORED AS ICEBERG; The same in Spark is: CREATE TABLE ice_t (i int, s string, ts timestamp, d date) USING ICEBERG *PARTITIONED BY (bucket(5, i), months(ts), years(d))* Impala's syntax is older but hasn't been released yet. Spark's syntax is released so it cannot be changed. Hive is also working on DDL support for Iceberg partitions, and they are favoring the released SparkSQL syntax. After discussing it on dev@impala we decided to use SparkSQL's syntax. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-10578) Big query influences other queries seriously when hardware limits are not reached
[ https://issues.apache.org/jira/browse/IMPALA-10578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy resolved IMPALA-10578. Resolution: Not A Bug > Big query influences other queries seriously when hardware limits are not reached > > > Key: IMPALA-10578 > URL: https://issues.apache.org/jira/browse/IMPALA-10578 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.4.0 > Environment: impala-3.4 > 80 machines with 96 cpu and 256GB mem > scratch-dir is on separate disk different from HDFS data dir >Reporter: wesleydeng >Priority: Major > Attachments: big_query.txt.bz2, image-2021-03-10-19-59-24-188.png, > image-2021-03-16-16-32-37-862.png, small_query_be_influenced_very_slow.txt.bz2 > > > When a big query is running (using mt_dop=8), other queries are very difficult to > start. > A small query (select distinct one field from a small table) may take about > 1 minute; normally it takes only about 1~3 seconds. > From the impalad log, I found an incomprehensible log like this: > !image-2021-03-16-16-32-37-862.png|width=836,height=189! > !image-2021-03-10-19-59-24-188.png|width=892,height=435! > --- > About the gap between "Handling call" and "Deserializing Batch", I found > another path: > --KrpcDataStreamRecvr::SenderQueue::AddBatch > EnqueueDeferredRpc(move(payload), l); // after dequeue, will call > KrpcDataStreamRecvr::SenderQueue::AddBatchWork > --- > > > When the big query is running, data spilling happens because mem_limit > was set and this big query wastes a lot of memory. > > In the attachments, I included the profiles of the big query and the small query. The > small query can normally be finished in seconds. 
The timeline of the small query > is shown below: > Query Timeline: 21m39s > - Query submitted: 48.846us (48.846us) > - Planning finished: 2.934ms (2.886ms) > - Submit for admission: 12.572ms (9.637ms) > - Completed admission: 13.622ms (1.050ms) > - Ready to start on 56 backends: 15.271ms (1.649ms) > -- All 56 execution backends (171 fragment instances) started: 18s505ms > (18s489ms)* > - Rows available: 51s770ms (33s265ms) > - First row fetched: 57s220ms (5s449ms) > - Last row fetched: 59s119ms (1s899ms) > - Released admission control resources: 1m1s (2s223ms) > - AdmissionControlTimeSinceLastUpdate: 80.000ms > - ComputeScanRangeAssignmentTimer: 439.749us > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10741) Set engine.hive.enabled=true table property for Iceberg tables
Zoltán Borók-Nagy created IMPALA-10741: -- Summary: Set engine.hive.enabled=true table property for Iceberg tables Key: IMPALA-10741 URL: https://issues.apache.org/jira/browse/IMPALA-10741 Project: IMPALA Issue Type: Bug Reporter: Zoltán Borók-Nagy Hive relies on the engine.hive.enabled=true table property being set for Iceberg tables. Without it, Hive overwrites the table metadata with a different storage handler and SerDe/Input/OutputFormat when it writes the table, making it unusable. Impala should set this table property during table creation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-7556) Clean up ScanRange
[ https://issues.apache.org/jira/browse/IMPALA-7556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy resolved IMPALA-7556. --- Fix Version/s: Impala 4.1 Resolution: Fixed > Clean up ScanRange > -- > > Key: IMPALA-7556 > URL: https://issues.apache.org/jira/browse/IMPALA-7556 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Zoltán Borók-Nagy >Assignee: Amogh Margoor >Priority: Major > Labels: ramp-up > Fix For: Impala 4.1 > > > For IMPALA-7543 I want to add some additional functionality to scan ranges. > However, the code of the ScanRange class is already quite messy. It handles > different types of files, does some buffer management, updates all kinds of > counters. > So, instead of complicating the code further, let's refactor the ScanRange > class a bit. > * Do the file operations in separate classes > ** A new, abstract class could be invented to provide an API for file > operations, i.e. Open(), ReadFromPos(), Close(), etc. > *** Keep in mind that the interface must be a good fit for IMPALA-7543, i.e. > we need positional reads from files > ** Operations for local files and HDFS files could be implemented in child > classes > * Buffer management > ** A new BufferStore class could be created > ** This new class would be responsible for managing the unused buffers > *** if possible, it would also handle the client and cached buffers as well > * Counters and metrics would be updated by the corresponding new classes > ** E.g. ImpaladMetrics::IO_MGR_NUM_OPEN_FILES would be updated by the file > handling classes -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10744) Send INSERT events even when Impala's event processing is not enabled
Zoltán Borók-Nagy created IMPALA-10744: -- Summary: Send INSERT events even when Impala's event processing is not enabled Key: IMPALA-10744 URL: https://issues.apache.org/jira/browse/IMPALA-10744 Project: IMPALA Issue Type: Bug Components: Catalog Reporter: Zoltán Borók-Nagy Generating insert events should not be conditional on whether the event processor is active. Please note that this will also require fixing a bug in the createInsertEvents() code, as an INSERT with an empty result set raises an IllegalStateException: create table ctas_empty as select * from functional.alltypes limit 0; -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10744) Send INSERT events even when Impala's event processing is not enabled
[ https://issues.apache.org/jira/browse/IMPALA-10744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy updated IMPALA-10744: --- Description: Generating insert events should not be conditional on whether the event processor is active. Related code: https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L5020-L5023 Please note that this will also require fixing a bug in the createInsertEvents() code, as an INSERT with an empty result set raises an IllegalStateException: create table ctas_empty as select * from functional.alltypes limit 0; was: Generating insert events should not be conditional to events processor being active or not. Please note that this will also need to fix a bug in the createInsertEvents() code as an INSERT with an empty result set raises an IllegalStateException: create table ctas_empty as select * from functional.alltypes limit 0; > Send INSERT events even when Impala's event processing is not enabled > > > Key: IMPALA-10744 > URL: https://issues.apache.org/jira/browse/IMPALA-10744 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Zoltán Borók-Nagy >Priority: Major > > Generating insert events should not be conditional on whether the event > processor is active. > Related code: > https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L5020-L5023 > Please note that this will also require fixing a bug in the createInsertEvents() > code, as an INSERT with an empty result set raises an IllegalStateException: > create table ctas_empty as select * from functional.alltypes limit 0; -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10754) test_overlap_min_max_filters_on_sorted_columns failed during GVO
[ https://issues.apache.org/jira/browse/IMPALA-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17366568#comment-17366568 ] Zoltán Borók-Nagy commented on IMPALA-10754: Also seen in * https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4338/testReport/ * https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4339/testReport/ * https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4344/testReport/ * [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4347/testReport/] * [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4350/testReport/] [~sql_forever] could you please take a look? > test_overlap_min_max_filters_on_sorted_columns failed during GVO > > > Key: IMPALA-10754 > URL: https://issues.apache.org/jira/browse/IMPALA-10754 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Zoltán Borók-Nagy >Assignee: Qifan Chen >Priority: Major > Labels: broken-build > > test_overlap_min_max_filters_on_sorted_columns failed in the following build: > https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4338/testReport/ > *Stack trace:* > {noformat} > query_test/test_runtime_filters.py:296: in > test_overlap_min_max_filters_on_sorted_columns > test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)}) > common/impala_test_suite.py:734: in run_test_case > update_section=pytest.config.option.update_results) > common/test_result_verifier.py:653: in verify_runtime_profile > % (function, field, expected_value, actual_value, op, actual)) > E AssertionError: Aggregation of SUM over NumRuntimeFilteredPages did not > match expected results. > E EXPECTED VALUE: > E 58 > E > E > E ACTUAL VALUE: > E 59 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-10433) Use Iceberg's fixed partition transforms
[ https://issues.apache.org/jira/browse/IMPALA-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy reassigned IMPALA-10433: -- Assignee: Zoltán Borók-Nagy > Use Iceberg's fixed partition transforms > > > Key: IMPALA-10433 > URL: https://issues.apache.org/jira/browse/IMPALA-10433 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: impala-iceberg > > Currently the Iceberg time and date partition transforms are wrong if the > data is before the epoch. > There's already an Iceberg pull request about it: > [https://github.com/apache/iceberg/pull/1981] > Because of this bug Impala doesn't prune Iceberg partitions if the predicate > refers to timestamps before the epoch. > Once the above pull request is merged we need to update our Iceberg > dependency. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-10433) Use Iceberg's fixed partition transforms
[ https://issues.apache.org/jira/browse/IMPALA-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy resolved IMPALA-10433. Fix Version/s: Impala 4.1 Resolution: Fixed > Use Iceberg's fixed partition transforms > > > Key: IMPALA-10433 > URL: https://issues.apache.org/jira/browse/IMPALA-10433 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: impala-iceberg > Fix For: Impala 4.1 > > > Currently the Iceberg time and date partition transforms are wrong if the > data is before the epoch. > There's already an Iceberg pull request about it: > [https://github.com/apache/iceberg/pull/1981] > Because of this bug Impala doesn't prune Iceberg partitions if the predicate > refers to timestamps before the epoch. > Once the above pull request is merged we need to update our Iceberg > dependency. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Comment Edited] (IMPALA-10754) test_overlap_min_max_filters_on_sorted_columns failed during GVO
[ https://issues.apache.org/jira/browse/IMPALA-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367192#comment-17367192 ] Zoltán Borók-Nagy edited comment on IMPALA-10754 at 6/22/21, 10:14 AM: --- Seen again in * https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4352/testReport/ * [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4354/testReport/] * [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4355/testReport/] was (Author: boroknagyz): Seen again in * [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4354/testReport/] * https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4355/testReport/ > test_overlap_min_max_filters_on_sorted_columns failed during GVO > > > Key: IMPALA-10754 > URL: https://issues.apache.org/jira/browse/IMPALA-10754 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Zoltán Borók-Nagy >Assignee: Qifan Chen >Priority: Major > Labels: broken-build > > test_overlap_min_max_filters_on_sorted_columns failed in the following build: > https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4338/testReport/ > *Stack trace:* > {noformat} > query_test/test_runtime_filters.py:296: in > test_overlap_min_max_filters_on_sorted_columns > test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)}) > common/impala_test_suite.py:734: in run_test_case > update_section=pytest.config.option.update_results) > common/test_result_verifier.py:653: in verify_runtime_profile > % (function, field, expected_value, actual_value, op, actual)) > E AssertionError: Aggregation of SUM over NumRuntimeFilteredPages did not > match expected results. > E EXPECTED VALUE: > E 58 > E > E > E ACTUAL VALUE: > E 59 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10754) test_overlap_min_max_filters_on_sorted_columns failed during GVO
[ https://issues.apache.org/jira/browse/IMPALA-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367192#comment-17367192 ] Zoltán Borók-Nagy commented on IMPALA-10754: Seen again in * [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4354/testReport/] * https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4355/testReport/ > test_overlap_min_max_filters_on_sorted_columns failed during GVO > > > Key: IMPALA-10754 > URL: https://issues.apache.org/jira/browse/IMPALA-10754 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Zoltán Borók-Nagy >Assignee: Qifan Chen >Priority: Major > Labels: broken-build > > test_overlap_min_max_filters_on_sorted_columns failed in the following build: > https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4338/testReport/ > *Stack trace:* > {noformat} > query_test/test_runtime_filters.py:296: in > test_overlap_min_max_filters_on_sorted_columns > test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)}) > common/impala_test_suite.py:734: in run_test_case > update_section=pytest.config.option.update_results) > common/test_result_verifier.py:653: in verify_runtime_profile > % (function, field, expected_value, actual_value, op, actual)) > E AssertionError: Aggregation of SUM over NumRuntimeFilteredPages did not > match expected results. > E EXPECTED VALUE: > E 58 > E > E > E ACTUAL VALUE: > E 59 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Comment Edited] (IMPALA-10754) test_overlap_min_max_filters_on_sorted_columns failed during GVO
[ https://issues.apache.org/jira/browse/IMPALA-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17366568#comment-17366568 ] Zoltán Borók-Nagy edited comment on IMPALA-10754 at 6/22/21, 10:39 AM: --- Also seen in * [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4338/testReport/] * [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4339/testReport/] * [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4341/testReport/] * https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4342/testReport/ * [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4344/testReport/] * [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4347/testReport/] * [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4350/testReport/] [~sql_forever] could you please take a look? was (Author: boroknagyz): Also seen in * https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4338/testReport/ * https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4339/testReport/ * https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4344/testReport/ * [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4347/testReport/] * [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4350/testReport/] [~sql_forever] could you please take a look? 
> test_overlap_min_max_filters_on_sorted_columns failed during GVO > > > Key: IMPALA-10754 > URL: https://issues.apache.org/jira/browse/IMPALA-10754 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Zoltán Borók-Nagy >Assignee: Qifan Chen >Priority: Major > Labels: broken-build > > test_overlap_min_max_filters_on_sorted_columns failed in the following build: > https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4338/testReport/ > *Stack trace:* > {noformat} > query_test/test_runtime_filters.py:296: in > test_overlap_min_max_filters_on_sorted_columns > test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)}) > common/impala_test_suite.py:734: in run_test_case > update_section=pytest.config.option.update_results) > common/test_result_verifier.py:653: in verify_runtime_profile > % (function, field, expected_value, actual_value, op, actual)) > E AssertionError: Aggregation of SUM over NumRuntimeFilteredPages did not > match expected results. > E EXPECTED VALUE: > E 58 > E > E > E ACTUAL VALUE: > E 59 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10757) ACID table locking for DML statements is faulty
Zoltán Borók-Nagy created IMPALA-10757: -- Summary: ACID table locking for DML statements is faulty Key: IMPALA-10757 URL: https://issues.apache.org/jira/browse/IMPALA-10757 Project: IMPALA Issue Type: Bug Components: Frontend Reporter: Zoltán Borók-Nagy Plain SELECT queries don't take ACID locks. They use the latest snapshot of the table that is loaded by CatalogD. However, DML statements lock all the tables they reference, not just the target table. E.g.: {noformat} INSERT INTO target_table SELECT * FROM source_table; {noformat} acquires locks for both target_table and source_table. However, after acquiring the locks Impala doesn't reload the tables. Therefore the following situation is possible: {noformat} INSERT OVERWRITE foo SELECT ...; (takes an exclusive lock for foo) {noformat} while the following statement also tries to take a SHARED_LOCK for foo: {noformat} INSERT INTO bar SELECT * FROM foo; {noformat} This means the INSERT INTO statement might wait for the completion of the INSERT OVERWRITE statement, but since it doesn't reload foo it will still use the old snapshot of foo, hence there was no benefit in waiting for the lock. Possible solutions: # Re-load tables after the lock is acquired # Only take a lock on the target table. This would be better than the current behavior, and it would also be consistent with plain SELECT queries. I think reloading should be favored as Impala should run every statement (that involves ACID tables) in a transaction and take proper locks, see IMPALA-8788. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
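The race described above can be illustrated with a small sketch. All names below are hypothetical; this simulates the snapshot-versus-lock ordering, not Impala's real frontend code:

```python
# Sketch of the IMPALA-10757 race: the reader captures its snapshot at
# planning time, *before* the SHARED_LOCK is granted, so waiting for a
# concurrent INSERT OVERWRITE to finish buys nothing.

class Table:
    def __init__(self, data):
        self.data = data

def insert_overwrite(table, new_data):
    # Concurrent writer holding the exclusive lock replaces the data.
    table.data = new_data

def insert_select_stale(target, source_snapshot):
    # Current behavior: use the snapshot taken before the lock wait.
    target.data = list(source_snapshot)

def insert_select_reload(target, source):
    # Proposed fix: re-load the source *after* acquiring the lock.
    target.data = list(source.data)

foo = Table([1, 2, 3])
bar = Table([])
snapshot = list(foo.data)          # planning-time snapshot of foo
insert_overwrite(foo, [4, 5, 6])   # INSERT OVERWRITE commits meanwhile
insert_select_stale(bar, snapshot)
print(bar.data)  # [1, 2, 3] -- stale data despite waiting for the lock
insert_select_reload(bar, foo)
print(bar.data)  # [4, 5, 6] -- reloading after locking sees the overwrite
```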
[jira] [Assigned] (IMPALA-7087) Impala is unable to read Parquet decimal columns with lower precision/scale than table metadata
[ https://issues.apache.org/jira/browse/IMPALA-7087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy reassigned IMPALA-7087: - Assignee: Zoltán Borók-Nagy > Impala is unable to read Parquet decimal columns with lower precision/scale > than table metadata > --- > > Key: IMPALA-7087 > URL: https://issues.apache.org/jira/browse/IMPALA-7087 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Tim Armstrong >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: decimal, parquet, ramp-up > Attachments: binary_decimal_precision_and_scale_widening.parquet > > > This is similar to IMPALA-2515, except it relates to a different precision/scale > in the file metadata rather than just a mismatch in the bytes used to store > the data. In a lot of cases we should be able to convert the decimal type on > the fly to the higher-precision type. > {noformat} > ERROR: File '/hdfs/path/00_0_x_2' column 'alterd_decimal' has an invalid > type length. Expecting: 11 len in file: 8 > {noformat} > It would be convenient to allow reading Parquet files where the > precision/scale in the file can be converted to the precision/scale in the > table metadata without loss of precision. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
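The on-the-fly conversion requested above is lossless for precision widening: the decimal's unscaled value is just an integer in a fixed-length byte array, and re-encoding it into a wider array cannot change its value. A sketch of that idea (an illustration, not Impala's actual scanner code):

```python
# Sketch of lossless decimal widening: the error above complains that
# the file stores the unscaled value in 8 bytes while the table schema
# expects 11. Re-encoding the same integer into 11 bytes (with sign
# extension) preserves the value exactly.
def widen_decimal_bytes(buf, target_len):
    value = int.from_bytes(buf, "big", signed=True)
    return value.to_bytes(target_len, "big", signed=True)

file_bytes = (123456789).to_bytes(8, "big", signed=True)  # 8 bytes, as in the file
table_bytes = widen_decimal_bytes(file_bytes, 11)         # 11 bytes, as the table expects
print(int.from_bytes(table_bytes, "big", signed=True))  # 123456789
```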
[jira] [Assigned] (IMPALA-8131) Impala is unable to read Parquet decimal columns with higher scale than table metadata
[ https://issues.apache.org/jira/browse/IMPALA-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy reassigned IMPALA-8131: - Assignee: Zoltán Borók-Nagy > Impala is unable to read Parquet decimal columns with higher scale than table > metadata > -- > > Key: IMPALA-8131 > URL: https://issues.apache.org/jira/browse/IMPALA-8131 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Sahil Takiar >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: decimal, parquet > > Similar to IMPALA-7087, except we should allow Impala to read Parquet data > stored with a higher scale into a table with lower scale. The SQL Standard > allows for this behavior, and several other databases do this as well. > More information on this can be found in > [this|https://issues.apache.org/jira/browse/IMPALA-7087?focusedCommentId=16688645=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16688645] > comment of IMPALA-7087 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
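Unlike the widening case, reading a higher-scale file value into a lower-scale table column necessarily rounds away fractional digits; the rounding mode is implementation-defined (HALF_UP below is only an assumption for this sketch, not necessarily what Impala chose):

```python
from decimal import Decimal, ROUND_HALF_UP

def narrow_scale(value, table_scale):
    # Quantize to the table's scale, rounding away the extra digits.
    return value.quantize(Decimal(1).scaleb(-table_scale),
                          rounding=ROUND_HALF_UP)

# File column decimal(10,4) holding 12.3456, table column decimal(10,2):
assert narrow_scale(Decimal("12.3456"), 2) == Decimal("12.35")
```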
[jira] [Created] (IMPALA-10739) Add support for ALTER TABLE tbl SET PARTITION SPEC for Iceberg tables
Zoltán Borók-Nagy created IMPALA-10739: -- Summary: Add support for ALTER TABLE tbl SET PARTITION SPEC for Iceberg tables Key: IMPALA-10739 URL: https://issues.apache.org/jira/browse/IMPALA-10739 Project: IMPALA Issue Type: New Feature Reporter: Zoltán Borók-Nagy Impala should support partition evolution for Iceberg tables, i.e. it should be able to set a new partition spec for an Iceberg table via DDL. The command should be {noformat} ALTER TABLE tbl SET PARTITION SPEC (spec) {noformat} to be aligned with Hive. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
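Iceberg's partition evolution is metadata-only: existing data files keep the spec they were written under, and only newly written files use the new spec. A toy model of what the new DDL would change (hypothetical names; real Iceberg tracks specs by spec-id in the table metadata):

```python
# Table metadata: partition specs by spec-id, plus the current default.
specs = {0: "month(ts)"}
default_spec_id = 0
data_files = [("f1.parquet", 0)]  # (file, spec-id it was written under)

def set_partition_spec(new_spec):
    """Models ALTER TABLE ... SET PARTITION SPEC: registers a new spec
    and makes it the default; no data files are rewritten."""
    global default_spec_id
    default_spec_id = max(specs) + 1
    specs[default_spec_id] = new_spec

set_partition_spec("day(ts)")
data_files.append(("f2.parquet", default_spec_id))  # new writes use new spec
assert specs == {0: "month(ts)", 1: "day(ts)"}
assert data_files == [("f1.parquet", 0), ("f2.parquet", 1)]
```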
[jira] [Resolved] (IMPALA-10187) Event processing fails on multiple events + DROP TABLE
[ https://issues.apache.org/jira/browse/IMPALA-10187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy resolved IMPALA-10187. Fix Version/s: Impala 4.1 Resolution: Fixed > Event processing fails on multiple events + DROP TABLE > -- > > Key: IMPALA-10187 > URL: https://issues.apache.org/jira/browse/IMPALA-10187 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Zoltán Borók-Nagy >Assignee: Vihang Karajgaonkar >Priority: Major > Fix For: Impala 4.1 > > > I've seen the following during interop testing: > Some DDL statements (ALTER TABLE + DROP) were executed via Hive on the same > table. > Then CatalogD's event processor tried to process the new events: > {noformat} > I0922 14:32:56.590229 13611 HdfsTable.java:709] Loaded file and block > metadata for > default.insertonly_hiveclient_impalaclient_partitioned_8ff3a1ef_b8a8_4c7a_b1c7_0b8f4c42c61e > partitions: category=cat1, category=cat2, category=cat3, and 1 others. Time > taken: 55.145ms > I0922 14:32:56.591078 13611 TableLoader.java:103] Loaded metadata for: > default.insertonly_hiveclient_impalaclient_partitioned_8ff3a1ef_b8a8_4c7a_b1c7_0b8f4c42c61e > (303ms) > I0922 14:32:58.022068 10065 MetastoreEventsProcessor.java:482] Received 41 > events. Start event id : 39948 > I0922 14:32:58.022266 10065 MetastoreEvents.java:380] EventId: 39949 > EventType: ALTER_PARTITION Creating event 39949 of type ALTER_PARTITION on > table > default.insertonly_hiveclient_impalaclient_partitioned_8ff3a1ef_b8a8_4c7a_b1c7_0b8f4c42c61e > ... 
> I0922 14:32:58.024389 10065 MetastoreEvents.java:380] EventId: 39962 > EventType: DROP_TABLE Creating event 39962 of type DROP_TABLE on table > default.insertonly_hiveclient_impalaclient_partitioned_8ff3a1ef_b8a8_4c7a_b1c7_0b8f4c42c61e > {noformat} > > Impala tried to refresh the table on the first ALTER TABLE event, but since > it's been already dropped we get a TableLoadingException (caused by > NoSuchObjectException from HMS): > > {noformat} > I0922 14:32:58.028852 10065 MetastoreEvents.java:234] Total number of events > received: 41 Total number of events filtered out: 0 > I0922 14:32:58.028962 10065 CatalogServiceCatalog.java:862] Not a self-event > since the given version is -1 and service id is > I0922 14:32:58.029369 10065 CatalogServiceCatalog.java:2142] Refreshing table > metadata: > default.insertonly_hiveclient_impalaclient_partitioned_8ff3a1ef_b8a8_4c7a_b1c7_0b8f4c42c61e > E0922 14:32:58.038627 10065 MetastoreEventsProcessor.java:527] Unexpected > exception received while processing event > Java exception follows: > org.apache.impala.catalog.events.MetastoreNotificationException: Unable to > process event 39949 of type ALTER_PARTITION. Event processing will be stopped. 
> at > org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:620) > at > org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:513) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.impala.catalog.TableLoadingException: Error loading > metadata for table: > default.insertonly_hiveclient_impalaclient_partitioned_8ff3a1ef_b8a8_4c7a_b1c7_0b8f4c42c61e > at > org.apache.impala.catalog.CatalogServiceCatalog.reloadTable(CatalogServiceCatalog.java:2160) > at > org.apache.impala.catalog.CatalogServiceCatalog.reloadTableIfExists(CatalogServiceCatalog.java:2365) > at > org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.reloadTableFromCatalog(MetastoreEvents.java:563) > at > org.apache.impala.catalog.events.MetastoreEvents$AlterPartitionEvent.process(MetastoreEvents.java:1454) > at > org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:314) > at > org.apach
[jira] [Assigned] (IMPALA-10754) test_overlap_min_max_filters_on_sorted_columns failed during GVO
[ https://issues.apache.org/jira/browse/IMPALA-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy reassigned IMPALA-10754: -- Assignee: Qifan Chen > test_overlap_min_max_filters_on_sorted_columns failed during GVO > > > Key: IMPALA-10754 > URL: https://issues.apache.org/jira/browse/IMPALA-10754 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Zoltán Borók-Nagy >Assignee: Qifan Chen >Priority: Major > Labels: broken-build > > test_overlap_min_max_filters_on_sorted_columns failed in the following build: > https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4338/testReport/ > *Stack trace:* > {noformat} > query_test/test_runtime_filters.py:296: in > test_overlap_min_max_filters_on_sorted_columns > test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)}) > common/impala_test_suite.py:734: in run_test_case > update_section=pytest.config.option.update_results) > common/test_result_verifier.py:653: in verify_runtime_profile > % (function, field, expected_value, actual_value, op, actual)) > E AssertionError: Aggregation of SUM over NumRuntimeFilteredPages did not > match expected results. > E EXPECTED VALUE: > E 58 > E > E > E ACTUAL VALUE: > E 59 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10754) test_overlap_min_max_filters_on_sorted_columns failed during GVO
Zoltán Borók-Nagy created IMPALA-10754: -- Summary: test_overlap_min_max_filters_on_sorted_columns failed during GVO Key: IMPALA-10754 URL: https://issues.apache.org/jira/browse/IMPALA-10754 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Zoltán Borók-Nagy test_overlap_min_max_filters_on_sorted_columns failed in the following build: https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4338/testReport/ *Stack trace:* {noformat} query_test/test_runtime_filters.py:296: in test_overlap_min_max_filters_on_sorted_columns test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)}) common/impala_test_suite.py:734: in run_test_case update_section=pytest.config.option.update_results) common/test_result_verifier.py:653: in verify_runtime_profile % (function, field, expected_value, actual_value, op, actual)) E AssertionError: Aggregation of SUM over NumRuntimeFilteredPages did not match expected results. E EXPECTED VALUE: E 58 E E E ACTUAL VALUE: E 59 {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10735) INSERT INTO Iceberg table fails during INSERT event generation
[ https://issues.apache.org/jira/browse/IMPALA-10735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy updated IMPALA-10735: --- Labels: impala-iceberg (was: ) > INSERT INTO Iceberg table fails during INSERT event generation > -- > > Key: IMPALA-10735 > URL: https://issues.apache.org/jira/browse/IMPALA-10735 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Zoltán Borók-Nagy >Priority: Major > Labels: impala-iceberg > > INSERT INTO Iceberg table is broken when we use INSERT events. > We get a NullPointerException for partitioned tables here: > [https://github.com/apache/impala/blob/0c89a9cf0f2b642ca214e4fa68eeea9bc32ef3af/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4953] > Repro: > > {noformat} > create table test_ice (i int, p int) partition by spec (p bucket 5) stored as > iceberg; > insert into test_ice values (1, 2); > {noformat} > > Since Hive Replication doesn't work for Iceberg tables yet it's probably > better to disable INSERT events for Iceberg tables. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10735) INSERT INTO Iceberg table fails during INSERT event generation
[ https://issues.apache.org/jira/browse/IMPALA-10735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy updated IMPALA-10735: --- Component/s: Catalog > INSERT INTO Iceberg table fails during INSERT event generation > -- > > Key: IMPALA-10735 > URL: https://issues.apache.org/jira/browse/IMPALA-10735 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Zoltán Borók-Nagy >Priority: Major > > INSERT INTO Iceberg table is broken when we use INSERT events. > We get a NullPointerException for partitioned tables here: > [https://github.com/apache/impala/blob/0c89a9cf0f2b642ca214e4fa68eeea9bc32ef3af/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4953] > Repro: > > {noformat} > create table test_ice (i int, p int) partition by spec (p bucket 5) stored as > iceberg; > insert into test_ice values (1, 2); > {noformat} > > Since Hive Replication doesn't work for Iceberg tables yet it's probably > better to disable INSERT events for Iceberg tables. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10735) INSERT INTO Iceberg table fails during INSERT event generation
[ https://issues.apache.org/jira/browse/IMPALA-10735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy updated IMPALA-10735: --- Description: INSERT INTO Iceberg table is broken when we use INSERT events. We get a NullPointerException for partitioned tables here: [https://github.com/apache/impala/blob/0c89a9cf0f2b642ca214e4fa68eeea9bc32ef3af/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4953] Repro: {noformat} create table test_ice (i int, p int) partition by spec (p bucket 5) stored as iceberg; insert into test_ice values (1, 2); {noformat} Since Hive Replication doesn't work for Iceberg tables yet it's probably better to disable INSERT events for Iceberg tables. was: INSERT INTO Iceberg table is broken when we use INSERT events. We get a NullPointerException for partitioned tables here: [https://github.com/apache/impala/blob/0c89a9cf0f2b642ca214e4fa68eeea9bc32ef3af/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4953] Repro: {noformat} create table test_ice (i int, p int) partition by spec (p bucket 5) stored as iceberg; insert into test_ice values (1, 2); {noformat} Since Hive Replication doesn't work for Iceberg tables yet it's probably better to disable INSERT events for Iceberg tables. > INSERT INTO Iceberg table fails during INSERT event generation > -- > > Key: IMPALA-10735 > URL: https://issues.apache.org/jira/browse/IMPALA-10735 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Zoltán Borók-Nagy >Priority: Major > Labels: impala-iceberg > > INSERT INTO Iceberg table is broken when we use INSERT events. 
> We get a NullPointerException for partitioned tables here: > [https://github.com/apache/impala/blob/0c89a9cf0f2b642ca214e4fa68eeea9bc32ef3af/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4953] > Repro: > {noformat} > create table test_ice (i int, p int) partition by spec (p bucket 5) stored as > iceberg; > insert into test_ice values (1, 2); > {noformat} > Since Hive Replication doesn't work for Iceberg tables yet it's probably > better to disable INSERT events for Iceberg tables. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-10735) INSERT INTO Iceberg table fails during INSERT event generation
[ https://issues.apache.org/jira/browse/IMPALA-10735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy reassigned IMPALA-10735: -- Assignee: Zoltán Borók-Nagy > INSERT INTO Iceberg table fails during INSERT event generation > -- > > Key: IMPALA-10735 > URL: https://issues.apache.org/jira/browse/IMPALA-10735 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: impala-iceberg > > INSERT INTO Iceberg table is broken when we use INSERT events. > We get a NullPointerException for partitioned tables here: > [https://github.com/apache/impala/blob/0c89a9cf0f2b642ca214e4fa68eeea9bc32ef3af/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4953] > Repro: > {noformat} > create table test_ice (i int, p int) partition by spec (p bucket 5) stored as > iceberg; > insert into test_ice values (1, 2); > {noformat} > Since Hive Replication doesn't work for Iceberg tables yet it's probably > better to disable INSERT events for Iceberg tables. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10735) INSERT INTO Iceberg table fails during INSERT event generation
Zoltán Borók-Nagy created IMPALA-10735: -- Summary: INSERT INTO Iceberg table fails during INSERT event generation Key: IMPALA-10735 URL: https://issues.apache.org/jira/browse/IMPALA-10735 Project: IMPALA Issue Type: Bug Reporter: Zoltán Borók-Nagy INSERT INTO Iceberg table is broken when we use INSERT events. We get a NullPointerException for partitioned tables here: [https://github.com/apache/impala/blob/0c89a9cf0f2b642ca214e4fa68eeea9bc32ef3af/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4953] Repro: {noformat} create table test_ice (i int, p int) partition by spec (p bucket 5) stored as iceberg; insert into test_ice values (1, 2); {noformat} Since Hive Replication doesn't work for Iceberg tables yet it's probably better to disable INSERT events for Iceberg tables. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10736) Add support for Hive Replication for Iceberg tables
Zoltán Borók-Nagy created IMPALA-10736: -- Summary: Add support for Hive Replication for Iceberg tables Key: IMPALA-10736 URL: https://issues.apache.org/jira/browse/IMPALA-10736 Project: IMPALA Issue Type: Bug Reporter: Zoltán Borók-Nagy Hive Replication currently doesn't support Iceberg tables. Once it does, we'll need to add support for it as well. Currently Iceberg stores absolute paths in its metadata files, so we'll probably need to wait for this issue to be resolved as well: https://github.com/apache/iceberg/issues/1617 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10707) TestIceberg::test_iceberg_query failed with INNER EXCEPTION
[ https://issues.apache.org/jira/browse/IMPALA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346809#comment-17346809 ] Zoltán Borók-Nagy commented on IMPALA-10707: I've looked at the logs, and the ERROR log files of catalogd and impalad are full of errors: {noformat} 1.1G catalogd.ERROR 202M impalad_node1.ERROR {noformat} They are full of the following exceptions: {noformat} E0517 06:13:33.866099 11134 TransactionKeepalive.java:137] Unexpected exception thrown Java exception follows: java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: at org.apache.impala.common.TransactionKeepalive$DaemonThread.run(TransactionKeepalive.java:114) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.NoClassDefFoundError: ... 2 more E0517 06:13:33.866303 9973 TransactionKeepalive.java:137] Unexpected exception thrown Java exception follows: java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: at org.apache.impala.common.TransactionKeepalive$DaemonThread.run(TransactionKeepalive.java:114) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.NoClassDefFoundError: ... 2 more ... 
{noformat} > TestIceberg::test_iceberg_query failed with INNER EXCEPTION > --- > > Key: IMPALA-10707 > URL: https://issues.apache.org/jira/browse/IMPALA-10707 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Aman Sinha >Priority: Major > > Few tests related to Iceberg failed in a recent run of > impala-asf-master-core-s3-data-cache: > {noformat} > TestIceberg.test_iceberg_query[protocol: beeswax | exec_option: > {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, > 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': > '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0', > 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] > E ImpalaBeeswaxException: ImpalaBeeswaxException: > EINNER EXCEPTION: > EMESSAGE: Timeout >7200s > {noformat} > fyi [~boroknagyz] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10707) TestIceberg::test_iceberg_query failed with INNER EXCEPTION
[ https://issues.apache.org/jira/browse/IMPALA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346901#comment-17346901 ] Zoltán Borók-Nagy commented on IMPALA-10707: The above error messages are symptoms of IMPALA-9900 / IMPALA-10057. > TestIceberg::test_iceberg_query failed with INNER EXCEPTION > --- > > Key: IMPALA-10707 > URL: https://issues.apache.org/jira/browse/IMPALA-10707 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Aman Sinha >Priority: Major > > Few tests related to Iceberg failed in a recent run of > impala-asf-master-core-s3-data-cache: > {noformat} > TestIceberg.test_iceberg_query[protocol: beeswax | exec_option: > {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, > 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': > '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0', > 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] > E ImpalaBeeswaxException: ImpalaBeeswaxException: > EINNER EXCEPTION: > EMESSAGE: Timeout >7200s > {noformat} > fyi [~boroknagyz] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-10707) TestIceberg::test_iceberg_query failed with INNER EXCEPTION
[ https://issues.apache.org/jira/browse/IMPALA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy resolved IMPALA-10707. Resolution: Duplicate Closing this because the Impala cluster was in bad health. The root cause was probably IMPALA-9900 / IMPALA-10057. > TestIceberg::test_iceberg_query failed with INNER EXCEPTION > --- > > Key: IMPALA-10707 > URL: https://issues.apache.org/jira/browse/IMPALA-10707 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Aman Sinha >Priority: Major > > Few tests related to Iceberg failed in a recent run of > impala-asf-master-core-s3-data-cache: > {noformat} > TestIceberg.test_iceberg_query[protocol: beeswax | exec_option: > {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, > 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': > '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0', > 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] > E ImpalaBeeswaxException: ImpalaBeeswaxException: > EINNER EXCEPTION: > EMESSAGE: Timeout >7200s > {noformat} > fyi [~boroknagyz] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10626) Add support for Iceberg's Catalogs API
[ https://issues.apache.org/jira/browse/IMPALA-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy updated IMPALA-10626: --- Description: Some engines (e.g. Spark and Hive) use different table properties for defining catalog properties. In the main hive configuration they store the following properties: * {{iceberg.catalog..type = hadoop}} * {{iceberg.catalog..warehouse = somelocation}} On the table level they have the following properties: * {{iceberg.catalog = }} * {{name = }} To load tables that use these kind of configurations we should use Iceberg's Catalogs class: [https://github.com/apache/iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/Catalogs.java] Probably we'll also want to use these properties by default in the future. was: Some engines (e.g. Spark and Hive) use different table properties for defining catalog properties. In the main hive configuration they store the following properties: * {{iceberg.catalog..type = hadoop}} * {{iceberg.catalog..warehouse = somelocation}} On the table level they have the following properties: * {{iceberg.mr.table.catalog = }} * {{iceberg.mr.table.identifier = }} To load tables that use these kind of configurations we should use Iceberg's Catalogs class: [https://github.com/apache/iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/Catalogs.java] Probably we'll also want to use these properties by default in the future. > Add support for Iceberg's Catalogs API > -- > > Key: IMPALA-10626 > URL: https://issues.apache.org/jira/browse/IMPALA-10626 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: impala-iceberg > > Some engines (e.g. Spark and Hive) use different table properties for > defining catalog properties. 
> In the main hive configuration they store the following properties: > * {{iceberg.catalog..type = hadoop}} > * {{iceberg.catalog..warehouse = somelocation}} > On the table level they have the following properties: > * {{iceberg.catalog = }} > * {{name = }} > To load tables that use these kinds of configurations we should use Iceberg's > Catalogs class: > [https://github.com/apache/iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/Catalogs.java] > Probably we'll also want to use these properties by default in the future. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-10626) Add support for Iceberg's Catalogs API
[ https://issues.apache.org/jira/browse/IMPALA-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy reassigned IMPALA-10626: -- Assignee: Zoltán Borók-Nagy > Add support for Iceberg's Catalogs API > -- > > Key: IMPALA-10626 > URL: https://issues.apache.org/jira/browse/IMPALA-10626 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: impala-iceberg > > Some engines (e.g. Spark and Hive) use different table properties for > defining catalog properties. > In the main hive configuration they store the following properties: > * {{iceberg.catalog..type = hadoop}} > * {{iceberg.catalog..warehouse = somelocation}} > On the table level they have the following properties: > * {{iceberg.mr.table.catalog = }} > * {{iceberg.mr.table.identifier = }} > To load tables that use these kind of configurations we should use Iceberg's > Catalogs class: > [https://github.com/apache/iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/Catalogs.java] > Probably we'll also want to use these properties by default in the future. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-5569) Implement UNSET TBLPROPERTIES for ALTER TABLE
[ https://issues.apache.org/jira/browse/IMPALA-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy reassigned IMPALA-5569: - Assignee: Amogh Margoor > Implement UNSET TBLPROPERTIES for ALTER TABLE > - > > Key: IMPALA-5569 > URL: https://issues.apache.org/jira/browse/IMPALA-5569 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Affects Versions: Impala 2.9.0 >Reporter: Nándor Kollár >Assignee: Amogh Margoor >Priority: Minor > > Right now, I can set table properties via Impala client, but I couldn't find > a way to unset them. I can set them to empty string, but I've to use Hive CLI > to remove the key-value par. > It would be nice to extend ALTER TABLE with UNSET clause to be able to unset > table properties. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10698) Remove branching based on MetastoreShim.getMajorVersion()
Zoltán Borók-Nagy created IMPALA-10698: -- Summary: Remove branching based on MetastoreShim.getMajorVersion() Key: IMPALA-10698 URL: https://issues.apache.org/jira/browse/IMPALA-10698 Project: IMPALA Issue Type: Improvement Components: Frontend Reporter: Zoltán Borók-Nagy IMPALA-9731 dropped support for Hive 2 and removed most code associated with it. However, we still have if statements that check MetastoreShim.getMajorVersion(). One can find them with: {noformat} git grep getMajorVersion {noformat} It would be nice to get rid of these because the MetastoreShim.getMajorVersion() == 2 branches are dead code now. Moreover, in the test code there are still some places that check for HIVE_MAJOR_VERSION == 2, e.g.: https://github.com/apache/impala/blob/c10e7c969dfcc4847a8f7708940f4aa1852dbee4/tests/metadata/test_hms_integration.py#L215 These can also be removed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10714) TestSpillingDebugActionDimensions::test_spilling_large_rows hid DCHECK
Zoltán Borók-Nagy created IMPALA-10714: -- Summary: TestSpillingDebugActionDimensions::test_spilling_large_rows hid DCHECK Key: IMPALA-10714 URL: https://issues.apache.org/jira/browse/IMPALA-10714 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Zoltán Borók-Nagy TestSpillingDebugActionDimensions::test_spilling_large_rows hit DCHECK in exhaustive build. The Impala git hash was: e11237e29 IMPALA-10197: Add KUDU_REPLICA_SELECTION query option In impalad.FATAL: {noformat} F0525 12:45:49.307780 13122 buffered-tuple-stream.cc:531] 564af337ca503984:f1209fc4] Check failed: read_iter->read_page_->attached_to_output_batch {noformat} Query 564af337ca503984:f1209fc4 was: {noformat} I0525 12:45:48.474383 17878 impala-server.cc:1324] 564af337ca503984:f1209fc4] Registered query query_id=564af337ca503984:f1209fc4 session_id=9e4875c17adf5e7a:eb72d33dc39b5288 I0525 12:45:48.474486 17878 Frontend.java:1618] 564af337ca503984:f1209fc4] Analyzing query: select group_concat(string_col), length(bigstr) from bigstrs2 group by bigstr db: test_spilling_large_rows_119f6bb1 {noformat} I couldn't reproduce the issue locally. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10714) TestSpillingDebugActionDimensions::test_spilling_large_rows hid DCHECK
[ https://issues.apache.org/jira/browse/IMPALA-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351683#comment-17351683 ] Zoltán Borók-Nagy commented on IMPALA-10714: [~stigahuang] could you please take a look? > TestSpillingDebugActionDimensions::test_spilling_large_rows hid DCHECK > -- > > Key: IMPALA-10714 > URL: https://issues.apache.org/jira/browse/IMPALA-10714 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Zoltán Borók-Nagy >Priority: Major > Labels: broken-build, flaky > > TestSpillingDebugActionDimensions::test_spilling_large_rows hit DCHECK in > exhaustive build. > The Impala git hash was: e11237e29 IMPALA-10197: Add KUDU_REPLICA_SELECTION > query option > In impalad.FATAL: > {noformat} > F0525 12:45:49.307780 13122 buffered-tuple-stream.cc:531] > 564af337ca503984:f1209fc4] Check failed: > read_iter->read_page_->attached_to_output_batch > {noformat} > Query 564af337ca503984:f1209fc4 was: > {noformat} > I0525 12:45:48.474383 17878 impala-server.cc:1324] > 564af337ca503984:f1209fc4] Registered query > query_id=564af337ca503984:f1209fc4 > session_id=9e4875c17adf5e7a:eb72d33dc39b5288 > I0525 12:45:48.474486 17878 Frontend.java:1618] > 564af337ca503984:f1209fc4] Analyzing query: select > group_concat(string_col), length(bigstr) from bigstrs2 > group by bigstr db: test_spilling_large_rows_119f6bb1 > {noformat} > I couldn't reproduce the issue locally. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10714) TestSpillingDebugActionDimensions::test_spilling_large_rows hit DCHECK
[ https://issues.apache.org/jira/browse/IMPALA-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy updated IMPALA-10714: --- Summary: TestSpillingDebugActionDimensions::test_spilling_large_rows hit DCHECK (was: TestSpillingDebugActionDimensions::test_spilling_large_rows hid DCHECK) > TestSpillingDebugActionDimensions::test_spilling_large_rows hit DCHECK > -- > > Key: IMPALA-10714 > URL: https://issues.apache.org/jira/browse/IMPALA-10714 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Zoltán Borók-Nagy >Priority: Major > Labels: broken-build, flaky > > TestSpillingDebugActionDimensions::test_spilling_large_rows hit DCHECK in > exhaustive build. > The Impala git hash was: e11237e29 IMPALA-10197: Add KUDU_REPLICA_SELECTION > query option > In impalad.FATAL: > {noformat} > F0525 12:45:49.307780 13122 buffered-tuple-stream.cc:531] > 564af337ca503984:f1209fc4] Check failed: > read_iter->read_page_->attached_to_output_batch > {noformat} > Query 564af337ca503984:f1209fc4 was: > {noformat} > I0525 12:45:48.474383 17878 impala-server.cc:1324] > 564af337ca503984:f1209fc4] Registered query > query_id=564af337ca503984:f1209fc4 > session_id=9e4875c17adf5e7a:eb72d33dc39b5288 > I0525 12:45:48.474486 17878 Frontend.java:1618] > 564af337ca503984:f1209fc4] Analyzing query: select > group_concat(string_col), length(bigstr) from bigstrs2 > group by bigstr db: test_spilling_large_rows_119f6bb1 > {noformat} > I couldn't reproduce the issue locally. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-9355) TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory limit
[ https://issues.apache.org/jira/browse/IMPALA-9355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351726#comment-17351726 ] Zoltán Borók-Nagy commented on IMPALA-9355: --- I've seen this in a recent exhaustive test run: {noformat} query_test/test_mem_usage_scaling.py:418: in test_exchange_mem_usage_scaling self.run_test_case('QueryTest/exchange-mem-scaling', vector) common/impala_test_suite.py:732: in run_test_case expected_str, query) E AssertionError: Expected exception: Memory limit exceeded E E when running: E E set mem_limit=171520k; E set num_scanner_threads=1; E select * E from tpch_parquet.lineitem l1 E join tpch_parquet.lineitem l2 on l1.l_orderkey = l2.l_orderkey and E l1.l_partkey = l2.l_partkey and l1.l_suppkey = l2.l_suppkey E and l1.l_linenumber = l2.l_linenumber E order by l1.l_orderkey desc, l1.l_partkey, l1.l_suppkey, l1.l_linenumber E limit 5{noformat} > TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory > limit > - > > Key: IMPALA-9355 > URL: https://issues.apache.org/jira/browse/IMPALA-9355 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Fang-Yu Rao >Assignee: Qifan Chen >Priority: Critical > Labels: broken-build, flaky > Fix For: Impala 4.0 > > > The EE test {{test_exchange_mem_usage_scaling}} failed because the query at > [https://github.com/apache/impala/blame/master/testdata/workloads/functional-query/queries/QueryTest/exchange-mem-scaling.test#L7-L15] > does not hit the specified memory limit (170m) at > [https://github.com/apache/impala/blame/master/testdata/workloads/functional-query/queries/QueryTest/exchange-mem-scaling.test#L7]. > We may need to further reduce the specified limit. In what follows the error > message is also given. Recall that the same issue occurred at > https://issues.apache.org/jira/browse/IMPALA-7873 but was resolved. 
> {code:java} > FAIL > query_test/test_mem_usage_scaling.py::TestExchangeMemUsage::()::test_exchange_mem_usage_scaling[protocol: > beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > parquet/none] > === FAILURES > === > TestExchangeMemUsage.test_exchange_mem_usage_scaling[protocol: beeswax | > exec_option: {'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > parquet/none] > [gw3] linux2 -- Python 2.7.12 > /home/ubuntu/Impala/bin/../infra/python/env/bin/python > query_test/test_mem_usage_scaling.py:386: in test_exchange_mem_usage_scaling > self.run_test_case('QueryTest/exchange-mem-scaling', vector) > common/impala_test_suite.py:674: in run_test_case > expected_str, query) > E AssertionError: Expected exception: Memory limit exceeded > E > E when running: > E > E set mem_limit=170m; > E set num_scanner_threads=1; > E select * > E from tpch_parquet.lineitem l1 > E join tpch_parquet.lineitem l2 on l1.l_orderkey = l2.l_orderkey and > E l1.l_partkey = l2.l_partkey and l1.l_suppkey = l2.l_suppkey > E and l1.l_linenumber = l2.l_linenumber > E order by l1.l_orderkey desc, l1.l_partkey, l1.l_suppkey, l1.l_linenumber > E limit 5 > {code} > [~tarmstr...@cloudera.com] and [~joemcdonnell] reviewed the patch at > [https://gerrit.cloudera.org/c/11965/]. Assign this JIRA to [~joemcdonnell] > for now. Please re-assign the JIRA to others as appropriate. Thanks! > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-9833) query_test.test_observability.TestQueryStates.test_error_query_state is flaky
[ https://issues.apache.org/jira/browse/IMPALA-9833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351693#comment-17351693 ] Zoltán Borók-Nagy commented on IMPALA-9833: --- I saw this again in an exhaustive release run: {noformat} query_test/test_observability.py:808: in test_error_query_state lambda: self.client.get_runtime_profile(handle)) common/impala_test_suite.py:1188: in assert_eventually count, timeout_s, error_msg_str)) E Timeout: Check failed to return True after 300 tries and 300 seconds error message: Query (id=ed4d8962122c7d00:7c61b26c): {noformat} > query_test.test_observability.TestQueryStates.test_error_query_state is flaky > - > > Key: IMPALA-9833 > URL: https://issues.apache.org/jira/browse/IMPALA-9833 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 4.0 >Reporter: Xiaomeng Zhang >Assignee: Quanlong Huang >Priority: Major > Attachments: consoleFull_impala-cdpd-master-exhaustive-release.txt, > impalad_excerpted.INFO.zip > > > [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/2521/testReport/junit/query_test.test_observability/TestQueryStates/test_error_query_state/] > It seems the test could not get query profile after retries in 30s. > {code:java} > Stacktracequery_test/test_observability.py:777: in test_error_query_state > lambda: self.client.get_runtime_profile(handle)) > common/impala_test_suite.py:1120: in assert_eventually > count, timeout_s, error_msg_str)) > E Timeout: Check failed to return True after 30 tries and 30 seconds error > message: Query (id=fe45e8bfd138acd3:c67a3796) > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
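The timeout above comes from a poll-until-true helper: the test repeatedly evaluates a check (here, fetching the runtime profile) and fails with a Timeout after N tries. A minimal Python sketch of that pattern follows; the real helper is `assert_eventually` in `common/impala_test_suite.py`, and its exact signature and the sleep interval used here are assumptions, not Impala's actual code:

```python
import time

def assert_eventually(check, tries=30, sleep_s=1.0, error_msg=None):
    """Poll `check` up to `tries` times, sleeping between attempts.

    Raises an AssertionError if the check never returns True, embedding a
    caller-supplied message (e.g. a query profile) to aid debugging.
    """
    for _ in range(tries):
        if check():
            return
        time.sleep(sleep_s)
    # error_msg may be a string or a lazily-evaluated callable.
    msg = error_msg() if callable(error_msg) else error_msg
    raise AssertionError(
        "Check failed to return True after %d tries error message: %s"
        % (tries, msg))
```

With a flaky check this either succeeds once the condition holds or fails with the "Check failed to return True after N tries" message seen in the logs.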
[jira] [Updated] (IMPALA-9833) query_test.test_observability.TestQueryStates.test_error_query_state is flaky
[ https://issues.apache.org/jira/browse/IMPALA-9833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy updated IMPALA-9833: -- Labels: broken-build (was: ) > query_test.test_observability.TestQueryStates.test_error_query_state is flaky > - > > Key: IMPALA-9833 > URL: https://issues.apache.org/jira/browse/IMPALA-9833 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 4.0 >Reporter: Xiaomeng Zhang >Assignee: Quanlong Huang >Priority: Major > Labels: broken-build > Attachments: consoleFull_impala-cdpd-master-exhaustive-release.txt, > impalad_excerpted.INFO.zip > > > [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/2521/testReport/junit/query_test.test_observability/TestQueryStates/test_error_query_state/] > It seems the test could not get query profile after retries in 30s. > {code:java} > Stacktracequery_test/test_observability.py:777: in test_error_query_state > lambda: self.client.get_runtime_profile(handle)) > common/impala_test_suite.py:1120: in assert_eventually > count, timeout_s, error_msg_str)) > E Timeout: Check failed to return True after 30 tries and 30 seconds error > message: Query (id=fe45e8bfd138acd3:c67a3796) > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-10715) test_decimal_min_max_filters failed in exhaustive run
[ https://issues.apache.org/jira/browse/IMPALA-10715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy reassigned IMPALA-10715: -- Assignee: Qifan Chen > test_decimal_min_max_filters failed in exhaustive run > - > > Key: IMPALA-10715 > URL: https://issues.apache.org/jira/browse/IMPALA-10715 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Zoltán Borók-Nagy >Assignee: Qifan Chen >Priority: Major > Labels: broken-build > > test_decimal_min_max_filters failed in exhaustive run > *Stack Trace* > {noformat} > query_test/test_runtime_filters.py:223: in test_decimal_min_max_filters > test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)}) > common/impala_test_suite.py:775: in run_test_case > update_section=pytest.config.option.update_results) > common/test_result_verifier.py:653: in verify_runtime_profile > % (function, field, expected_value, actual_value, op, actual)) > E AssertionError: Aggregation of SUM over ProbeRows did not match expected > results. > E EXPECTED VALUE: > E 102 > E > E > E ACTUAL VALUE: > E 38 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10715) test_decimal_min_max_filters failed in exhaustive run
Zoltán Borók-Nagy created IMPALA-10715: -- Summary: test_decimal_min_max_filters failed in exhaustive run Key: IMPALA-10715 URL: https://issues.apache.org/jira/browse/IMPALA-10715 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Zoltán Borók-Nagy test_decimal_min_max_filters failed in exhaustive run *Stack Trace* {noformat} query_test/test_runtime_filters.py:223: in test_decimal_min_max_filters test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)}) common/impala_test_suite.py:775: in run_test_case update_section=pytest.config.option.update_results) common/test_result_verifier.py:653: in verify_runtime_profile % (function, field, expected_value, actual_value, op, actual)) E AssertionError: Aggregation of SUM over ProbeRows did not match expected results. E EXPECTED VALUE: E 102 E E E ACTUAL VALUE: E 38 {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10104) multiple if funtion and multiple-agg cause impalad crashed
[ https://issues.apache.org/jira/browse/IMPALA-10104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354424#comment-17354424 ] Zoltán Borók-Nagy commented on IMPALA-10104: Seems like IMPALA-8969 or IMPALA-9809. Could you please try again with those fixes? > multiple if funtion and multiple-agg cause impalad crashed > --- > > Key: IMPALA-10104 > URL: https://issues.apache.org/jira/browse/IMPALA-10104 > Project: IMPALA > Issue Type: Bug > Components: Backend, Frontend >Affects Versions: Impala 3.2.0 > Environment: CDH6.3.1 > jdk 1.8.0_131 >Reporter: lxc >Priority: Major > > sql: > SELECT max(datekey) as datekey , > if(`exp` like '%a','A', > if(`exp` like '%b', 'B', > if(`exp` like '%c', 'C', > if(`exp` like '%d', 'D', 'E')))) test, > sum(cast(money AS FLOAT)) / (count(*)/ 1000) AS ecpm, > count(*) AS e_num, > count(DISTINCT aa) AS r_num, > sum(isd) AS d_num, > sum(isc) AS c_num, > count(DISTINCT bb) AS uv, > sum(isd) /count(DISTINCT aa) AS d_rate, > sum(isc) /count(DISTINCT aa) AS c_rate, > sum(cast(money AS FLOAT)) AS money > FROM tableA > WHERE datekey = '20200812' > GROUP BY test; > the coredump info.: > > minicoredump: > #0 0x7f1005f491f7 in raise () from /lib64/libc.so.6 > Missing separate debuginfos, use: debuginfo-install > cyrus-sasl-gssapi-2.1.26-23.el7.x86_64 cyrus-sasl-lib-2.1.26-23.el7.x86_64 > cyrus-sasl-plain-2.1.26-23.el7.x86_64 glibc-2.17-196.el7.x86_64 > keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-46.el7.x86_64 > libcom_err-1.42.9-10.el7.x86_64 libdb-5.3.21-20.el7.x86_64 > libselinux-2.5-11.el7.x86_64 nss-softokn-freebl-3.28.3-6.el7.x86_64 > openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 > zlib-1.2.7-17.el7.x86_64 > (gdb) bt > #0 0x7f1005f491f7 in raise () from /lib64/libc.so.6 > #1 0x7f1005f4a8e8 in abort () from /lib64/libc.so.6 > #2 0x048d4ca4 in google::DumpStackTraceAndExit() () > #3 0x048cb6fd in google::LogMessage::Fail() () > #4 0x048ccfa2 in google::LogMessage::SendToLog() () > #5 0x048cb0d7 in
google::LogMessage::Flush() () > #6 0x048ce69e in google::LogMessageFatal::~LogMessageFatal() () > #7 0x0281201b in > impala::FreePool::CheckValidAllocation(impala::FreePool::FreeListNode*, > unsigned char*) const () > #8 0x02811cbd in impala::FreePool::Free(unsigned char*) () > #9 0x028104c3 in impala_udf::FunctionContext::Free(unsigned char*) () > #10 0x024c8ee6 in > impala::AggregateFunctions::StringValSerializeOrFinalize(impala_udf::FunctionContext*, > impala_udf::StringVal const&) () > #11 0x024c410c in > impala::AggFnEvaluator::SerializeOrFinalize(impala::Tuple*, > impala::SlotDescriptor const&, impala::Tuple*, void*) () > #12 0x7f0f008d88d5 in ?? () > #13 0x1c4e8790 in ?? () > #14 0x4fcef6b724afe578 in ?? () > #15 0x in ?? () > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10578) Big Query influence other query seriously when hardware not reach limit
[ https://issues.apache.org/jira/browse/IMPALA-10578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354392#comment-17354392 ] Zoltán Borók-Nagy commented on IMPALA-10578: Seems like it was a configuration problem. Can we close this issue? > Big Query influence other query seriously when hardware not reach limit > > > Key: IMPALA-10578 > URL: https://issues.apache.org/jira/browse/IMPALA-10578 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.4.0 > Environment: impala-3.4 > 80 machines with 96 cpu and 256GB mem > scratch-dir is on separate disk different from HDFS data dir >Reporter: wesleydeng >Priority: Major > Attachments: big_query.txt.bz2, image-2021-03-10-19-59-24-188.png, > image-2021-03-16-16-32-37-862.png, small_query_be_influenced_very_slow.txt.bz2 > > > When a big query is running(use mt_dop=8), other query is very difficult to > start. > A small query (select distinct one field from a small table) may take about > 1 minutes, normallly it take only about 1~3 second. > From the impalad log, I found a incomprehensible log like this: > !image-2021-03-16-16-32-37-862.png|width=836,height=189! > !image-2021-03-10-19-59-24-188.png|width=892,height=435! > --- > About the gap between "Handling call" and "Deserializing Batch", I found > another path : > --KrpcDataStreamRecvr::SenderQueue::AddBatch > EnqueueDeferredRpc(move(payload), l); // after dequeue, will call > KrpcDataStreamRecvr::SenderQueue::AddBatchWork > --- > > > When the Big query is running, data spilled has happened because mem_limit > was set and this big query waste a lot of memory. > > In the attchment, I append the profile of big query and small query. The > small query can be finished in seconds normally. 
the timeline of small query > show as below: > Query Timeline: 21m39s > - Query submitted: 48.846us (48.846us) > - Planning finished: 2.934ms (2.886ms) > - Submit for admission: 12.572ms (9.637ms) > - Completed admission: 13.622ms (1.050ms) > - Ready to start on 56 backends: 15.271ms (1.649ms) > -- All 56 execution backends (171 fragment instances) started: 18s505ms > (18s489ms)* > - Rows available: 51s770ms (33s265ms) > - First row fetched: 57s220ms (5s449ms) > - Last row fetched: 59s119ms (1s899ms) > - Released admission control resources: 1m1s (2s223ms) > - AdmissionControlTimeSinceLastUpdate: 80.000ms > - ComputeScanRangeAssignmentTimer: 439.749us > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-6417) Extend various thread pools to track fragment instance id
[ https://issues.apache.org/jira/browse/IMPALA-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy resolved IMPALA-6417. --- Resolution: Fixed > Extend various thread pools to track fragment instance id > - > > Key: IMPALA-6417 > URL: https://issues.apache.org/jira/browse/IMPALA-6417 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > > Extend impala::ThreadPool, DiskIoMgr, kudu::ThreadPool, and so on. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-6254) Track fragment instance id for general purpose threads
[ https://issues.apache.org/jira/browse/IMPALA-6254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy resolved IMPALA-6254. --- Resolution: Fixed > Track fragment instance id for general purpose threads > -- > > Key: IMPALA-6254 > URL: https://issues.apache.org/jira/browse/IMPALA-6254 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.10.0 >Reporter: Lars Volker >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: observability > > Fragment instance threads currently have the instance name in their thread > name. > {noformat} > exec-finstance (finst:1546b532c445c5f3:5dc794e50003) > scanner-thread (finst:1546b532c445c5f3:5dc794e50003, plan-node-id:0, > thread-idx:0) > scanner-thread (finst:1546b532c445c5f3:5dc794e50003, plan-node-id:0, > thread-idx:1) > profile-report (finst:1546b532c445c5f3:5dc794e50003) > scanner-thread (finst:1546b532c445c5f3:5dc794e50003, plan-node-id:0, > thread-idx:2) > profile-report (finst:1546b532c445c5f3:5dc794e5) > exec-finstance (finst:1546b532c445c5f3:5dc794e5) > {noformat} > For thread pools that do work that can be tied to particular fragment > instances, we should look into ways to annotate them. This could require > adding some breadcrumbs to each instance-specific work item (I/O requests > etc) so that the worker threads can annotate themselves. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-10713) Use PARTITION-level locking for static partition INSERTs for ACID tables
[ https://issues.apache.org/jira/browse/IMPALA-10713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy resolved IMPALA-10713. Fix Version/s: Impala 4.0 Target Version: Impala 4.0 Resolution: Fixed > Use PARTITION-level locking for static partition INSERTs for ACID tables > > > Key: IMPALA-10713 > URL: https://issues.apache.org/jira/browse/IMPALA-10713 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: newbe, ramp-up > Fix For: Impala 4.0 > > > Currently Impala always create TABLE-level locks for INSERTs for ACID tables: > [https://github.com/apache/impala/blob/ced7b7d221cda30c65504e18082bb0af6c3cb595/fe/src/main/java/org/apache/impala/service/Frontend.java#L2153] > For static partition INSERTs we could create PARTITION-level locks instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
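The granularity decision described above can be modeled in a few lines. This is an illustrative Python sketch, not Impala's actual Java implementation in Frontend.java: when every partition key of an INSERT has a static value, a partition-level lock suffices; dynamic-partition or unpartitioned inserts fall back to a table-level lock.

```python
def lock_target(db, table, partition_spec=None):
    """Return the (level, name) an ACID INSERT should lock.

    partition_spec maps partition column -> value; a value of None marks
    a dynamic partition key whose value is only known at runtime.
    """
    if partition_spec and all(v is not None for v in partition_spec.values()):
        # Fully static spec: lock just this partition.
        part_name = "/".join("%s=%s" % (k, v) for k, v in partition_spec.items())
        return ("PARTITION", "%s.%s/%s" % (db, table, part_name))
    # Dynamic or unpartitioned inserts still need a table-level lock.
    return ("TABLE", "%s.%s" % (db, table))
```

In the real system the lock level maps onto the Hive Metastore lock request sent at INSERT time.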
[jira] [Assigned] (IMPALA-10703) PrintPath() crashes with ARRAY in ORC format
[ https://issues.apache.org/jira/browse/IMPALA-10703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy reassigned IMPALA-10703: -- Assignee: Amogh Margoor > PrintPath() crashes with ARRAY in ORC format > > > Key: IMPALA-10703 > URL: https://issues.apache.org/jira/browse/IMPALA-10703 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.4.0 >Reporter: Gabor Kaszab >Assignee: Amogh Margoor >Priority: Major > Labels: complextype, orc > > Repro steps: > - Issue only happens in debug build as apparently there is a DCHECK failing. > - You have to launch Impala with --log_level=3 option to increase the log > level. > - Then running this query crashes Impala: > {code:java} > select inner_arr.ITEM.e from functional_orc_def.complextypestbl tbl, > functional_orc_def.complextypestbl.nested_struct.c.d.ITEM inner_arr; > {code} > > Backtrace (relevant part): > {code:java} > #7 0x0280c2b4 in > impala::PrintPath[abi:cxx11](impala::TableDescriptor const&, std::vector std::allocator > const&) (tbl_desc=..., path=...) at > /home/gaborkaszab/shadow/Impala-upstream/be/src/util/debug-util.cc:237 > #8 0x02a69eeb in impala::HdfsOrcScanner::ResolveColumns > (this=0x10e79000, tuple_desc=..., > selected_nodes=0x7fe54980a7d0, pos_slots=0x7fe54980a780) > at > /home/gaborkaszab/shadow/Impala-upstream/be/src/exec/hdfs-orc-scanner.cc:452 > #9 0x02a69cf7 in impala::HdfsOrcScanner::ResolveColumns > (this=0x10e79000, tuple_desc=..., > selected_nodes=0x7fe54980a7d0, pos_slots=0x7fe54980a780) > at > /home/gaborkaszab/shadow/Impala-upstream/be/src/exec/hdfs-orc-scanner.cc:449 > #10 0x02a6a547 in impala::HdfsOrcScanner::SelectColumns > (this=0x10e79000, tuple_desc=...) 
> at > /home/gaborkaszab/shadow/Impala-upstream/be/src/exec/hdfs-orc-scanner.cc:497 > #11 0x02a67720 in impala::HdfsOrcScanner::Open (this=0x10e79000, > context=0x7fe54980b260) > at > /home/gaborkaszab/shadow/Impala-upstream/be/src/exec/hdfs-orc-scanner.cc:237 > #12 0x029f19c9 in > impala::HdfsScanNodeBase::CreateAndOpenScannerHelper (this=0xd280800, > partition=0xaac3d80, > context=0x7fe54980b260, scanner=0x7fe54980b258) > at > /home/gaborkaszab/shadow/Impala-upstream/be/src/exec/hdfs-scan-node-base.cc:874 > #13 0x02baab86 in impala::HdfsScanNode::ProcessSplit (this=0xd280800, > filter_ctxs=..., > expr_results_pool=0x7fe54980b500, scan_range=0xac59c00, > scanner_thread_reservation=0x7fe54980b428) > at > /home/gaborkaszab/shadow/Impala-upstream/be/src/exec/hdfs-scan-node.cc:480 > #14 0x02baa28a in impala::HdfsScanNode::ScannerThread > (this=0xd280800, first_thread=true, > scanner_thread_reservation=8192) at > /home/gaborkaszab/shadow/Impala-upstream/be/src/exec/hdfs-scan-node.cc:418 > #15 0x02ba95f2 in impala::HdfsScanNodeoperator()(void) > const (__closure=0x7fe54980bc28) > at > /home/gaborkaszab/shadow/Impala-upstream/be/src/exec/hdfs-scan-node.cc:339 > {code} > This DCHECK fails: > > [https://github.com/apache/impala/blob/a47700ed790c2415e52a85e40063bed53a7cb9e8/be/src/util/debug-util.cc#L237] > {code:java} > Check failed: path[i] == 1 (5 vs. 1) > {code} > There was a similar issue recently, but here a different DCHECK fails: > https://issues.apache.org/jira/browse/IMPALA-9918 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10713) Use PARTITION-level locking for static partition INSERTs for ACID tables
[jira] [Commented] (IMPALA-10656) Fire insert events before commit
[ https://issues.apache.org/jira/browse/IMPALA-10656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356388#comment-17356388 ] Zoltán Borók-Nagy commented on IMPALA-10656: Can we resolve this issue? > Fire insert events before commit > > > Key: IMPALA-10656 > URL: https://issues.apache.org/jira/browse/IMPALA-10656 > Project: IMPALA > Issue Type: Bug > Components: Backend, Frontend >Reporter: Csaba Ringhofer >Assignee: Csaba Ringhofer >Priority: Major > > Currently Impala commits an insert first, then reloads the table from HMS, > and generates the insert events based on the difference between the two > snapshots. (e.g. which file was not present in the old snapshot but are there > in the new). Hive replication expects the insert events before the commit, so > this may potentially lead to issues there, > The solution is to collect the new files during the insert in the backend, > and send the insert events based on this file set. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
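The snapshot-diff approach the issue describes (comparing file listings before and after the commit) can be sketched as follows. This is a minimal illustrative model; the real event generation lives in Impala's catalog code and the data shapes here are assumptions:

```python
def insert_event_files(before, after):
    """Compute the files an insert event should advertise.

    `before` and `after` map partition name -> set of file paths as seen
    in the two table snapshots; the result contains, per partition, only
    the files that appeared between the snapshots.
    """
    events = {}
    for part, files in after.items():
        new_files = files - before.get(part, set())
        if new_files:
            events[part] = new_files
    return events
```

Collecting the written files directly in the backend, as the issue proposes, would produce this same set without reloading the table, and before the commit rather than after it.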
[jira] [Commented] (IMPALA-10413) Impala crashing when retrying failed query
[ https://issues.apache.org/jira/browse/IMPALA-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17357214#comment-17357214 ] Zoltán Borók-Nagy commented on IMPALA-10413: Can we resolve this Jira? > Impala crashing when retrying failed query > -- > > Key: IMPALA-10413 > URL: https://issues.apache.org/jira/browse/IMPALA-10413 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.0 >Reporter: Xianqing He >Assignee: Xianqing He >Priority: Minor > > When retrying failed query, it may crash if cancel the query > The core stack below > {code:java} > #0 0x7f1b20e87387 in raise () from /lib64/libc.so.6 > #1 0x7f1b20e88a78 in abort () from /lib64/libc.so.6 > #2 0x7f1b23b754b9 in os::abort(bool) () from > /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre/lib/amd64/server/libjvm.so > #3 0x7f1b23d93db6 in VMError::report_and_die() () from > /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre/lib/amd64/server/libjvm.so > #4 0x7f1b23b7f505 in JVM_handle_linux_signal () from > /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre/lib/amd64/server/libjvm.so > #5 0x7f1b23b72678 in signalHandler(int, siginfo_t*, void*) () from > /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre/lib/amd64/server/libjvm.so > #6 > #7 push_back (__x=..., this=0x28) at > /data/Impala-ASF/toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/include/c++/7.5.0/bits/stl_vector.h:941 > #8 AddDetail (d=..., this=0x0) at > /data/Impala-ASF/be/src/util/error-util.h:114 > #9 impala::Status::AddDetail (this=this@entry=0x7f1a4b971740, msg=...) at > /data/Impala-ASF/be/src/common/status.cc:236 > #10 0x0190a4fc in impala::QueryDriver::HandleRetryFailure > (this=this@entry=0xb1b2880, status=status@entry=0x7f1a4b971740, > error_msg=error_msg@entry=0x7f1a4b971860, > request_state=request_state@entry=0x9ca7e00, retry_query_id=...) 
> at /data/Impala-ASF/be/src/runtime/query-driver.cc:351 > #11 0x0190c605 in impala::QueryDriver::RetryQueryFromThread > (this=0xb1b2880, error=..., query_driver=...) at > /data/Impala-ASF/be/src/runtime/query-driver.cc:293 > #12 0x01912459 in operator() (a2=..., a1=..., p=, > this=) at > /data/Impala-ASF/toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/mem_fn_template.hpp:280 > #13 operator() impala::Status&, std::shared_ptr >, boost::_bi::list0> > (a=, f=..., this=) > at > /data/Impala-ASF/toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:398 > #14 operator() (this=) at > /data/Impala-ASF/toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222 > #15 > boost::detail::function::void_function_obj_invoker0 boost::_mfi::mf2 std::shared_ptr >, > boost::_bi::list3, > boost::_bi::value, > boost::_bi::value > > >, void>::invoke > (function_obj_ptr=...) > at > /data/Impala-ASF/toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:159 > #16 0x015386f2 in operator() (this=0x7f1a4b971c00) at > /data/Impala-ASF/toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770 > #17 impala::Thread::SuperviseThread(std::__cxx11::basic_string std::char_traits, std::allocator > const&, > std::__cxx11::basic_string, std::allocator > > const&, boost::function, impala::ThreadDebugInfo const*, > impala::Promise*) (name=..., category=..., > functor=..., parent_thread_info=, > thread_started=0x7f1ad5750ec0) at /data/Impala-ASF/be/src/util/thread.cc:360 > #18 0x01539b6b in operator() std::__cxx11::basic_string&, const std::__cxx11::basic_string&, > boost::function, const impala::ThreadDebugInfo*, impala::Promise int>*), boost::_bi::list0> ( > a=, > f=@0xa20de78: 0x15383f0 > std::char_traits, std::allocator > const&, > std::__cxx11::basic_string, std::allocator > > const&, boost::function, impala::ThreadDebugInfo const*, > 
impala::Promise*)>, this=0xa20de80) at > /data/Impala-ASF/toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:531 > #19 operator() (this=0xa20de78) at > /data/Impala-ASF/toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222 > #20 boost::detail::thread_data (*)(std::__cxx11::basic_string, > std::allocator > const&, std::__cxx11::basic_string std::char_traits, std::all
[jira] [Commented] (IMPALA-10187) Event processing fails on multiple events + DROP TABLE
[ https://issues.apache.org/jira/browse/IMPALA-10187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17357328#comment-17357328 ] Zoltán Borók-Nagy commented on IMPALA-10187: I've uploaded a fix for review: https://gerrit.cloudera.org/#/c/17542/ > Event processing fails on multiple events + DROP TABLE > -- > > Key: IMPALA-10187 > URL: https://issues.apache.org/jira/browse/IMPALA-10187 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Zoltán Borók-Nagy >Assignee: Vihang Karajgaonkar >Priority: Major > > I've seen the following during interop testing: > Some DDL statements (ALTER TABLE + DROP) were executed via Hive on the same > table. > Then CatalogD's event processor tried to process the new events: > {noformat} > I0922 14:32:56.590229 13611 HdfsTable.java:709] Loaded file and block > metadata for > default.insertonly_hiveclient_impalaclient_partitioned_8ff3a1ef_b8a8_4c7a_b1c7_0b8f4c42c61e > partitions: category=cat1, category=cat2, category=cat3, and 1 others. Time > taken: 55.145ms > I0922 14:32:56.591078 13611 TableLoader.java:103] Loaded metadata for: > default.insertonly_hiveclient_impalaclient_partitioned_8ff3a1ef_b8a8_4c7a_b1c7_0b8f4c42c61e > (303ms) > I0922 14:32:58.022068 10065 MetastoreEventsProcessor.java:482] Received 41 > events. Start event id : 39948 > I0922 14:32:58.022266 10065 MetastoreEvents.java:380] EventId: 39949 > EventType: ALTER_PARTITION Creating event 39949 of type ALTER_PARTITION on > table > default.insertonly_hiveclient_impalaclient_partitioned_8ff3a1ef_b8a8_4c7a_b1c7_0b8f4c42c61e > ... 
> I0922 14:32:58.024389 10065 MetastoreEvents.java:380] EventId: 39962 > EventType: DROP_TABLE Creating event 39962 of type DROP_TABLE on table > default.insertonly_hiveclient_impalaclient_partitioned_8ff3a1ef_b8a8_4c7a_b1c7_0b8f4c42c61e > {noformat} > > Impala tried to refresh the table on the first ALTER TABLE event, but since > it's been already dropped we get a TableLoadingException (caused by > NoSuchObjectException from HMS): > > {noformat} > I0922 14:32:58.028852 10065 MetastoreEvents.java:234] Total number of events > received: 41 Total number of events filtered out: 0 > I0922 14:32:58.028962 10065 CatalogServiceCatalog.java:862] Not a self-event > since the given version is -1 and service id is > I0922 14:32:58.029369 10065 CatalogServiceCatalog.java:2142] Refreshing table > metadata: > default.insertonly_hiveclient_impalaclient_partitioned_8ff3a1ef_b8a8_4c7a_b1c7_0b8f4c42c61e > E0922 14:32:58.038627 10065 MetastoreEventsProcessor.java:527] Unexpected > exception received while processing event > Java exception follows: > org.apache.impala.catalog.events.MetastoreNotificationException: Unable to > process event 39949 of type ALTER_PARTITION. Event processing will be stopped. 
> at > org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:620) > at > org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:513) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.impala.catalog.TableLoadingException: Error loading > metadata for table: > default.insertonly_hiveclient_impalaclient_partitioned_8ff3a1ef_b8a8_4c7a_b1c7_0b8f4c42c61e > at > org.apache.impala.catalog.CatalogServiceCatalog.reloadTable(CatalogServiceCatalog.java:2160) > at > org.apache.impala.catalog.CatalogServiceCatalog.reloadTableIfExists(CatalogServiceCatalog.java:2365) > at > org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.reloadTableFromCatalog(MetastoreEvents.java:563) > at > org.apache.impala.catalog.events.MetastoreEvents$AlterPartitionEvent.process(MetastoreEvents.java:1454) > at > org.apache.impala.catalog.events.Metast
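The failure above is a race: the REFRESH triggered by an earlier ALTER_PARTITION event runs after Hive has already dropped the table, so the reload throws and event processing halts even though a DROP_TABLE event later in the same batch would have explained the missing table. A minimal sketch of a more tolerant processing loop (illustrative Python with hypothetical names, not Impala's actual Java code):

```python
# Sketch: an event processor that tolerates a table vanishing between
# event creation and event processing. All names are hypothetical.

class TableNotFoundError(Exception):
    """Raised when a refresh targets a table that no longer exists."""

class RefreshEvent:
    def __init__(self, table):
        self.table = table
    def process(self, catalog):
        if self.table not in catalog:
            raise TableNotFoundError(self.table)
        catalog[self.table] += 1  # pretend "refresh" bumps a version counter

class DropTableEvent:
    def __init__(self, table):
        self.table = table
    def process(self, catalog):
        catalog.pop(self.table, None)  # dropping a missing table is a no-op

def process_events(events, catalog):
    """Process events in order; a missing table is not fatal, because a
    later DROP_TABLE event in the same batch explains its absence."""
    for event in events:
        try:
            event.process(catalog)
        except TableNotFoundError:
            # Table was dropped after this event was generated; skip and
            # let the subsequent DROP_TABLE event clean up the catalog.
            continue
```

With this shape, the ALTER_PARTITION-then-DROP_TABLE batch from the log completes instead of stopping event processing.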
[jira] [Assigned] (IMPALA-10713) Use PARTITION-level locking for static partition INSERTs for ACID tables
[ https://issues.apache.org/jira/browse/IMPALA-10713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy reassigned IMPALA-10713: -- Assignee: Zoltán Borók-Nagy > Use PARTITION-level locking for static partition INSERTs for ACID tables > > > Key: IMPALA-10713 > URL: https://issues.apache.org/jira/browse/IMPALA-10713 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: newbe, ramp-up > > Currently Impala always creates TABLE-level locks for INSERTs for ACID tables: > [https://github.com/apache/impala/blob/ced7b7d221cda30c65504e18082bb0af6c3cb595/fe/src/main/java/org/apache/impala/service/Frontend.java#L2153] > For static partition INSERTs we could create PARTITION-level locks instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10713) Use PARTITION-level locking for static partition INSERTs for ACID tables
Zoltán Borók-Nagy created IMPALA-10713: -- Summary: Use PARTITION-level locking for static partition INSERTs for ACID tables Key: IMPALA-10713 URL: https://issues.apache.org/jira/browse/IMPALA-10713 Project: IMPALA Issue Type: Improvement Components: Frontend Reporter: Zoltán Borók-Nagy Currently Impala always creates TABLE-level locks for INSERTs for ACID tables: [https://github.com/apache/impala/blob/ced7b7d221cda30c65504e18082bb0af6c3cb595/fe/src/main/java/org/apache/impala/service/Frontend.java#L2153] For static partition INSERTs we could create PARTITION-level locks instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
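The improvement proposed in IMPALA-10713 hinges on lock granularity: a static partition INSERT names every partition key with a constant, so only that one partition needs to be locked. A hedged sketch of the selection logic (illustrative Python with hypothetical names, not Impala's or HMS's actual lock API):

```python
# Sketch: choose the lock target for an ACID INSERT. A static partition
# INSERT (all partition keys bound to constants) can take a
# PARTITION-level lock; anything else falls back to a TABLE-level lock.
# All names here are hypothetical.

def lock_component(db, table, static_partition_spec=None):
    """Return the lock target for an INSERT as a (level, name) pair."""
    if static_partition_spec:
        # e.g. {"year": 2021, "month": 5} -> "year=2021/month=5"
        part_name = "/".join(f"{k}={v}" for k, v in static_partition_spec.items())
        return ("PARTITION", f"{db}.{table}/{part_name}")
    # Dynamic partition INSERTs (or unpartitioned tables) lock the table.
    return ("TABLE", f"{db}.{table}")
```

A finer-grained lock lets concurrent INSERTs into different static partitions of the same ACID table proceed without serializing on each other.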
[jira] [Resolved] (IMPALA-10634) Removes outer join if it only has DISTINCT on streamed side
[ https://issues.apache.org/jira/browse/IMPALA-10634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy resolved IMPALA-10634. Resolution: Duplicate > Removes outer join if it only has DISTINCT on streamed side > --- > > Key: IMPALA-10634 > URL: https://issues.apache.org/jira/browse/IMPALA-10634 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Yuming Wang >Priority: Major > > Removes outer join if it only has DISTINCT on streamed side: > {code:sql} > CREATE TABLE t1(a int, b int); > CREATE TABLE t2(a int, b int); > SELECT DISTINCT t1.b FROM t1 LEFT JOIN t2 ON t1.a = t2.a; > {code} > We can rewrite {{SELECT DISTINCT t1.b FROM t1 LEFT JOIN t2 ON t1.a = t2.a}} to > {{SELECT DISTINCT t1.b FROM t1}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
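The rewrite in IMPALA-10634 is valid because a LEFT OUTER JOIN can only duplicate left-side rows or NULL-extend them; it never removes a left-side row, and DISTINCT collapses the duplicates. So when the query is SELECT DISTINCT and every selected column comes from the streamed (left) side, the join can be dropped. A minimal sketch of that eliminability check (illustrative Python, hypothetical names, not Impala's planner code):

```python
# Sketch: decide whether a LEFT OUTER JOIN can be eliminated. The join is
# droppable when the query applies DISTINCT and every selected column
# belongs to the streamed (left) side. Names are hypothetical.

def can_drop_left_outer_join(select_cols, distinct, left_cols):
    """select_cols: columns in the SELECT list; left_cols: columns
    available from the left (streamed) input."""
    return distinct and all(c in left_cols for c in select_cols)
```

For the example above, `can_drop_left_outer_join(["t1.b"], True, {"t1.a", "t1.b"})` holds, matching the rewrite to `SELECT DISTINCT t1.b FROM t1`.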
[jira] [Reopened] (IMPALA-9355) TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory limit
[ https://issues.apache.org/jira/browse/IMPALA-9355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy reopened IMPALA-9355: --- > TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory > limit > - > > Key: IMPALA-9355 > URL: https://issues.apache.org/jira/browse/IMPALA-9355 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Fang-Yu Rao >Assignee: Qifan Chen >Priority: Critical > Labels: broken-build, flaky > Fix For: Impala 4.0 > > > The EE test {{test_exchange_mem_usage_scaling}} failed because the query at > [https://github.com/apache/impala/blame/master/testdata/workloads/functional-query/queries/QueryTest/exchange-mem-scaling.test#L7-L15] > does not hit the specified memory limit (170m) at > [https://github.com/apache/impala/blame/master/testdata/workloads/functional-query/queries/QueryTest/exchange-mem-scaling.test#L7]. > We may need to further reduce the specified limit. In what follows the error > message is also given. Recall that the same issue occurred at > https://issues.apache.org/jira/browse/IMPALA-7873 but was resolved. 
> {code:java} > FAIL > query_test/test_mem_usage_scaling.py::TestExchangeMemUsage::()::test_exchange_mem_usage_scaling[protocol: > beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > parquet/none] > === FAILURES > === > TestExchangeMemUsage.test_exchange_mem_usage_scaling[protocol: beeswax | > exec_option: {'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > parquet/none] > [gw3] linux2 -- Python 2.7.12 > /home/ubuntu/Impala/bin/../infra/python/env/bin/python > query_test/test_mem_usage_scaling.py:386: in test_exchange_mem_usage_scaling > self.run_test_case('QueryTest/exchange-mem-scaling', vector) > common/impala_test_suite.py:674: in run_test_case > expected_str, query) > E AssertionError: Expected exception: Memory limit exceeded > E > E when running: > E > E set mem_limit=170m; > E set num_scanner_threads=1; > E select * > E from tpch_parquet.lineitem l1 > E join tpch_parquet.lineitem l2 on l1.l_orderkey = l2.l_orderkey and > E l1.l_partkey = l2.l_partkey and l1.l_suppkey = l2.l_suppkey > E and l1.l_linenumber = l2.l_linenumber > E order by l1.l_orderkey desc, l1.l_partkey, l1.l_suppkey, l1.l_linenumber > E limit 5 > {code} > [~tarmstr...@cloudera.com] and [~joemcdonnell] reviewed the patch at > [https://gerrit.cloudera.org/c/11965/]. Assign this JIRA to [~joemcdonnell] > for now. Please re-assign the JIRA to others as appropriate. Thanks! > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10744) Send INSERT events even when Impala's event processing is not enabled
[ https://issues.apache.org/jira/browse/IMPALA-10744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy updated IMPALA-10744: --- Description: Generating insert events should not be conditional on the event processor being active. Related code: https://github.com/apache/impala/blob/d99caa1f3a049fc5e20855f8e8bf846fd81f65c5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L5124 Please note that this will also need to fix a bug in the createInsertEvents() code as an INSERT with an empty result set raises an IllegalStateException: create table ctas_empty as select * from functional.alltypes limit 0; was: Generating insert events should not be conditional on the event processor being active. Related code: https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L5020-L5023 Please note that this will also need to fix a bug in the createInsertEvents() code as an INSERT with an empty result set raises an IllegalStateException: create table ctas_empty as select * from functional.alltypes limit 0; > Send INSERT events even when Impala's event processing is not enabled > > > Key: IMPALA-10744 > URL: https://issues.apache.org/jira/browse/IMPALA-10744 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Zoltán Borók-Nagy >Priority: Major > > Generating insert events should not be conditional on the event processor > being active.
> Related code: > https://github.com/apache/impala/blob/d99caa1f3a049fc5e20855f8e8bf846fd81f65c5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L5124 > Please note that this will also need to fix a bug in the createInsertEvents() > code as an INSERT with an empty result set raises an IllegalStateException: > create table ctas_empty as select * from functional.alltypes limit 0; -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
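The fix described in IMPALA-10744 has two parts: fire the HMS insert event regardless of whether Impala's own event processor is running, and tolerate an INSERT that wrote zero files (e.g. CTAS over a `limit 0` query) instead of raising. A hedged sketch of the intended control flow (illustrative Python with hypothetical names, not the actual CatalogOpExecutor code):

```python
# Sketch: generate insert events unconditionally, and treat an empty
# INSERT as a no-op rather than an error. Names are hypothetical.

def create_insert_events(new_files, fire_event):
    """new_files: files written by the INSERT; fire_event: callback that
    notifies HMS about one file and returns the created event."""
    if not new_files:
        # Empty result set (e.g. CTAS ... LIMIT 0): nothing to report,
        # but this must not raise an IllegalStateException-style error.
        return []
    # Note: no check of any "event processor active" flag here; the
    # events are for external consumers of HMS notifications too.
    return [fire_event(f) for f in new_files]
```

The key point is that the empty-input path returns cleanly, which is exactly the case the `ctas_empty` reproduction above exercises.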
[jira] [Updated] (IMPALA-10736) Add support for Hive Replication for Iceberg tables
[ https://issues.apache.org/jira/browse/IMPALA-10736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy updated IMPALA-10736: --- Labels: impala-iceberg (was: ) > Add support for Hive Replication for Iceberg tables > --- > > Key: IMPALA-10736 > URL: https://issues.apache.org/jira/browse/IMPALA-10736 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Zoltán Borók-Nagy >Priority: Major > Labels: impala-iceberg > > Hive Replication currently doesn't support Iceberg tables. > Once it does, we'll need to add support for it as well. > Currently Iceberg stores absolute paths in its metadata files, so we'll > probably need to wait for this issue to be resolved as well: > https://github.com/apache/iceberg/issues/1617 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10736) Add support for Hive Replication for Iceberg tables
[ https://issues.apache.org/jira/browse/IMPALA-10736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy updated IMPALA-10736: --- Component/s: Catalog > Add support for Hive Replication for Iceberg tables > --- > > Key: IMPALA-10736 > URL: https://issues.apache.org/jira/browse/IMPALA-10736 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Zoltán Borók-Nagy >Priority: Major > > Hive Replication currently doesn't support Iceberg tables. > Once it does, we'll need to add support for it as well. > Currently Iceberg stores absolute paths in its metadata files, so we'll > probably need to wait for this issue to be resolved as well: > https://github.com/apache/iceberg/issues/1617 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10777) Enable min/max filtering for Iceberg partitions.
[ https://issues.apache.org/jira/browse/IMPALA-10777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy updated IMPALA-10777: --- Component/s: Frontend > Enable min/max filtering for Iceberg partitions. > > > Key: IMPALA-10777 > URL: https://issues.apache.org/jira/browse/IMPALA-10777 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Reporter: Qifan Chen >Priority: Major > > The work to enable min/max filters for partition columns is underway. See > IMPALA-10738. > It would be nice to enable this filtering for Iceberg partitions as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10777) Enable min/max filtering for Iceberg partitions.
[ https://issues.apache.org/jira/browse/IMPALA-10777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy updated IMPALA-10777: --- Labels: impala-iceberg (was: ) > Enable min/max filtering for Iceberg partitions. > > > Key: IMPALA-10777 > URL: https://issues.apache.org/jira/browse/IMPALA-10777 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Reporter: Qifan Chen >Priority: Major > Labels: impala-iceberg > > The work to enable min/max filters for partition columns is underway. See > IMPALA-10738. > It would be nice to enable this filtering for Iceberg partitions as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-10658) LOAD DATA INPATH silently fails between HDFS and Azure ABFS
[ https://issues.apache.org/jira/browse/IMPALA-10658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy resolved IMPALA-10658. Fix Version/s: Impala 4.0 Resolution: Fixed > LOAD DATA INPATH silently fails between HDFS and Azure ABFS > --- > > Key: IMPALA-10658 > URL: https://issues.apache.org/jira/browse/IMPALA-10658 > Project: IMPALA > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Fix For: Impala 4.0 > > > LOAD DATA INPATH silently fails when Impala tries to move files from HDFS to > ABFS. > The problem is that in 'relocateFile()' we try to figure out if 'sourceFile' > is on the destination filesystem: > https://github.com/apache/impala/blob/6b16df9e9a4696b46b6f9c7fe2fc0aaded285623/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L246 > We use the following code to decide this: > https://github.com/apache/impala/blob/6b16df9e9a4696b46b6f9c7fe2fc0aaded285623/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L581-L591 > However, the Azure FileSystem implementation doesn't throw an exception in > 'fs.makeQualified(path);'. It just happily returns a new Path, substituting the > prefix "hdfs://" with "abfs://". > So in relocateFile() Impala thinks the 'sourceFile' and 'destFile' are on the > same filesystem so it tries to invoke 'destFs.rename()': > https://github.com/apache/impala/blob/6b16df9e9a4696b46b6f9c7fe2fc0aaded285623/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L266 > From > https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/filesystem.html#boolean_rename.28Path_src.2C_Path_d.29 > : "In terms of its implementation, it is the one with the most ambiguity > regarding when to return false versus raising an exception." > Seems like the Azure FileSystem implementation doesn't throw an exception on > failure, but returns false instead.
Unfortunately Impala doesn't check the > return value of destFs.rename() (see above), so the error remains silent. > To fix this issue we need to do two things: > * fix FileSystemUtil.isPathOnFileSystem() > * check the return value of destFs.rename() and throw an exception when it's > false -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
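The two fixes listed above can be sketched in a few lines: decide filesystem identity from the URI's scheme and authority instead of relying on `makeQualified()` throwing, and treat `rename()` returning false as an error. This is an illustrative Python sketch with hypothetical names, not FileSystemUtil's actual Java API:

```python
# Sketch of the two fixes from IMPALA-10658 (hypothetical helpers):
# 1) compare scheme + authority to decide whether a path is on a given
#    filesystem, so hdfs:// vs abfs:// paths are never conflated;
# 2) wrap rename() so that a False return value becomes an exception
#    instead of a silent failure.
from urllib.parse import urlparse

def is_path_on_filesystem(path, fs_uri):
    """True iff 'path' has the same scheme and authority as 'fs_uri'."""
    p, f = urlparse(path), urlparse(fs_uri)
    return (p.scheme, p.netloc) == (f.scheme, f.netloc)

def checked_rename(fs, src, dst):
    """rename() that turns a False return value into an error."""
    if not fs.rename(src, dst):
        raise IOError(f"Failed to rename {src} to {dst}")
```

With the scheme/authority check, an `hdfs://` source and an `abfs://` destination are correctly classified as different filesystems, so relocateFile() would take the copy-then-delete path instead of a cross-filesystem rename.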
[jira] [Created] (IMPALA-10679) Create SHA2 builtin function
Zoltán Borók-Nagy created IMPALA-10679: -- Summary: Create SHA2 builtin function Key: IMPALA-10679 URL: https://issues.apache.org/jira/browse/IMPALA-10679 Project: IMPALA Issue Type: New Feature Components: Backend Reporter: Zoltán Borók-Nagy Add support for the SHA2 family of hash functions (SHA-224, SHA-256, SHA-384, and SHA-512). Hive already supports SHA2: HIVE-10644 We should add a similar builtin function. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
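Hive's sha2 (HIVE-10644), which the proposed Impala builtin would mirror, takes the input plus a bit length selecting SHA-224/256/384/512, with 0 accepted as an alias for 256 and NULL returned for any other length. A sketch of those semantics (illustrative Python on top of hashlib, not Impala's backend implementation):

```python
# Sketch of Hive-compatible sha2(input, bit_length) semantics.
import hashlib

# Map the accepted bit lengths to hashlib digest names; 0 means SHA-256.
_DIGESTS = {0: "sha256", 224: "sha224", 256: "sha256",
            384: "sha384", 512: "sha512"}

def sha2(data: bytes, bit_length: int):
    """Return the hex digest, or None (SQL NULL) for unsupported lengths."""
    name = _DIGESTS.get(bit_length)
    if name is None:
        return None
    return hashlib.new(name, data).hexdigest()
```

For example, `sha2(b"ABC", 256)` yields the same hex string as a plain SHA-256 of the input, and `sha2(b"ABC", 123)` maps to NULL rather than an error.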
[jira] [Commented] (IMPALA-10679) Create SHA2 builtin function
[ https://issues.apache.org/jira/browse/IMPALA-10679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335506#comment-17335506 ] Zoltán Borók-Nagy commented on IMPALA-10679: I see, [~JacquesJZa]. FIPS mode only affects you if you run impala in a FIPS-enabled environment. In such environments we cannot use the forbidden hash algorithms (e.g. MD5) at all, not even internally, see e.g. IMPALA-10205 Cc [~wzhou] > Create SHA2 builtin function > > > Key: IMPALA-10679 > URL: https://issues.apache.org/jira/browse/IMPALA-10679 > Project: IMPALA > Issue Type: New Feature > Components: Backend >Reporter: Zoltán Borók-Nagy >Assignee: Amogh Margoor >Priority: Major > Labels: newbie, ramp-up > > Add support for the SHA2 family of hash functions (SHA-224, SHA-256, SHA-384, > and SHA-512). > Hive already supports SHA2: HIVE-10644 > We should add a similar builtin function. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10685) OUTER JOIN against ACID collections might be converted to INNER JOIN
Zoltán Borók-Nagy created IMPALA-10685: -- Summary: OUTER JOIN against ACID collections might be converted to INNER JOIN Key: IMPALA-10685 URL: https://issues.apache.org/jira/browse/IMPALA-10685 Project: IMPALA Issue Type: Bug Reporter: Zoltán Borók-Nagy We are rewriting "A join B" to "A join B1 join B2" for some queries that refer to collections in ACID tables. This is ok for inner join but may be incorrect for outer joins. Here is an example, the two queries produce different results: Query works well for non-ACID table: {noformat} impala> with v as ( select ('k4') as key union all values ('k1'), ('k2'), ('k3') ) select * from v left join functional_parquet.complextypestbl.int_map using (key); +-+--+---+ | key | key | value | +-+--+---+ | k1 | k1 | -1| | k1 | k1 | 1 | | k2 | k2 | 100 | | k1 | k1 | 2 | | k2 | k2 | NULL | | k1 | k1 | NULL | | k3 | k3 | NULL | | k4 | NULL | NULL | +-+--+---+ Fetched 8 row(s) in 3.35s {noformat} LEFT OUTER JOIN converted to INNER JOIN for ACID table: {noformat} impala> with v as ( select ('k4') as key union all values ('k1'), ('k2'), ('k3') ) select * from v left join functional_orc_def.complextypestbl.int_map using (key); +-+-+---+ | key | key | value | +-+-+---+ | k1 | k1 | -1| | k1 | k1 | 1 | | k2 | k2 | 100 | | k1 | k1 | 2 | | k2 | k2 | NULL | | k1 | k1 | NULL | | k3 | k3 | NULL | +-+-+---+ Fetched 7 row(s) in 0.35s {noformat} IMPALA-9494 can help to fix this. Until that we could use the techniques from IMPALA-9330. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10679) Create SHA2 builtin function
[ https://issues.apache.org/jira/browse/IMPALA-10679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335661#comment-17335661 ] Zoltán Borók-Nagy commented on IMPALA-10679: Thanks for the docs, [~wzhou]. So my understanding is that we can add an MD5 builtin function to Impala (similar to Hive's) that users can use in their SELECT queries, but only if they don't run their Impala cluster in FIPS mode. In FIPS mode Impala should raise an error for "SELECT MD5('ABC')", right? > Create SHA2 builtin function > > > Key: IMPALA-10679 > URL: https://issues.apache.org/jira/browse/IMPALA-10679 > Project: IMPALA > Issue Type: New Feature > Components: Backend >Reporter: Zoltán Borók-Nagy >Assignee: Amogh Margoor >Priority: Major > Labels: newbie, ramp-up > > Add support for the SHA2 family of hash functions (SHA-224, SHA-256, SHA-384, > and SHA-512). > Hive already supports SHA2: HIVE-10644 > We should add a similar builtin function. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10685) OUTER JOIN against ACID collections might be converted to INNER JOIN
[ https://issues.apache.org/jira/browse/IMPALA-10685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy updated IMPALA-10685: --- Description: We are rewriting "A join B" to "A join B1 join B2" for some queries that refer to collections in ACID tables. This is ok for inner join but may be incorrect for outer joins. Here is an example, the two queries produce different results: Query works well for non-ACID table: {noformat} impala> with v as ( select ('k4') as key union all values ('k1'), ('k2'), ('k3') ) select * from v left join functional_parquet.complextypestbl.int_map using (key); +-+--+---+ | key | key | value | +-+--+---+ | k1 | k1 | -1| | k1 | k1 | 1 | | k2 | k2 | 100 | | k1 | k1 | 2 | | k2 | k2 | NULL | | k1 | k1 | NULL | | k3 | k3 | NULL | | k4 | NULL | NULL | +-+--+---+ Fetched 8 row(s) in 3.35s {noformat} LEFT OUTER JOIN converted to INNER JOIN for ACID table: {noformat} impala> with v as ( select ('k4') as key union all values ('k1'), ('k2'), ('k3') ) select * from v left join functional_orc_def.complextypestbl.int_map using (key); +-+-+---+ | key | key | value | +-+-+---+ | k1 | k1 | -1| | k1 | k1 | 1 | | k2 | k2 | 100 | | k1 | k1 | 2 | | k2 | k2 | NULL | | k1 | k1 | NULL | | k3 | k3 | NULL | +-+-+---+ Fetched 7 row(s) in 0.35s {noformat} IMPALA-9494 can help to fix this. Until that we could use the techniques from IMPALA-9330. Possible workaround is to rewrite the query to: {noformat} with v as ( select ('k4') as key union all values ('k1'), ('k2'), ('k3') ) select * from v left join (select int_map.* from functional_orc_def.complextypestbl c, c.int_map) vv using (key); {noformat} was: We are rewriting "A join B" to "A join B1 join B2" for some queries that refer to collections in ACID tables. This is ok for inner join but may be incorrect for outer joins. 
Here is an example, the two queries produce different results: Query works well for non-ACID table: {noformat} impala> with v as ( select ('k4') as key union all values ('k1'), ('k2'), ('k3') ) select * from v left join functional_parquet.complextypestbl.int_map using (key); +-+--+---+ | key | key | value | +-+--+---+ | k1 | k1 | -1| | k1 | k1 | 1 | | k2 | k2 | 100 | | k1 | k1 | 2 | | k2 | k2 | NULL | | k1 | k1 | NULL | | k3 | k3 | NULL | | k4 | NULL | NULL | +-+--+---+ Fetched 8 row(s) in 3.35s {noformat} LEFT OUTER JOIN converted to INNER JOIN for ACID table: {noformat} impala> with v as ( select ('k4') as key union all values ('k1'), ('k2'), ('k3') ) select * from v left join functional_orc_def.complextypestbl.int_map using (key); +-+-+---+ | key | key | value | +-+-+---+ | k1 | k1 | -1| | k1 | k1 | 1 | | k2 | k2 | 100 | | k1 | k1 | 2 | | k2 | k2 | NULL | | k1 | k1 | NULL | | k3 | k3 | NULL | +-+-+---+ Fetched 7 row(s) in 0.35s {noformat} IMPALA-9494 can help to fix this. Until that we could use the techniques from IMPALA-9330. > OUTER JOIN against ACID collections might be converted to INNER JOIN > > > Key: IMPALA-10685 > URL: https://issues.apache.org/jira/browse/IMPALA-10685 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 4.0 >Reporter: Zoltán Borók-Nagy >Priority: Major > > We are rewriting "A join B" to "A join B1 join B2" for some queries that > refer to collections in ACID tables. This is ok for inner join but may be > incorrect for outer joins. 
Here is an example, the two queries produce > different results: > Query works well for non-ACID table: > {noformat} > impala> with v as ( > select ('k4') as key > union all > values ('k1'), ('k2'), ('k3') > ) select * from v left join functional_parquet.complextypestbl.int_map using > (key); > +-+--+---+ > | key | key | value | > +-+--+---+ > | k1 | k1 | -1| > | k1 | k1 | 1 | > | k2 | k2 | 100 | > | k1 | k1 | 2 | > | k2 | k2 | NULL | > | k1 | k1 | NULL | > | k3 | k3 | NULL | > | k4 | NULL | NULL | > +-+--+---+ > Fetched 8 row(s) in 3.35s > {noformat} > LEFT OUTER JOIN converted to INNER JOIN for ACID table: > {noformat} > impala> with v as ( > select ('k4') as key > union all > values ('k1'), ('k2'), ('k3') > ) select * from v left join functional_orc_def.complextypestbl
[jira] [Updated] (IMPALA-10685) OUTER JOIN against ACID collections might be converted to INNER JOIN
[ https://issues.apache.org/jira/browse/IMPALA-10685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy updated IMPALA-10685: --- Description: We are rewriting "A join B" to "A join B1 join B2" for some queries that refer to collections in ACID tables. This is ok for inner join but may be incorrect for outer joins. Here is an example, the two queries produce different results: Query works well for non-ACID table: {noformat} impala> with v as ( select ('k4') as key union all values ('k1'), ('k2'), ('k3') ) select * from v left join functional_parquet.complextypestbl.int_map using (key); +-+--+---+ | key | key | value | +-+--+---+ | k1 | k1 | -1| | k1 | k1 | 1 | | k2 | k2 | 100 | | k1 | k1 | 2 | | k2 | k2 | NULL | | k1 | k1 | NULL | | k3 | k3 | NULL | | k4 | NULL | NULL | +-+--+---+ Fetched 8 row(s) in 3.35s {noformat} LEFT OUTER JOIN converted to INNER JOIN for ACID table: {noformat} impala> with v as ( select ('k4') as key union all values ('k1'), ('k2'), ('k3') ) select * from v left join functional_orc_def.complextypestbl.int_map using (key); +-+-+---+ | key | key | value | +-+-+---+ | k1 | k1 | -1| | k1 | k1 | 1 | | k2 | k2 | 100 | | k1 | k1 | 2 | | k2 | k2 | NULL | | k1 | k1 | NULL | | k3 | k3 | NULL | +-+-+---+ Fetched 7 row(s) in 0.35s {noformat} IMPALA-9494 can help to fix this. Until that we could use the techniques from IMPALA-9330. Possible workaround is to rewrite the query to use an inline view: {noformat} with v as ( select ('k4') as key union all values ('k1'), ('k2'), ('k3') ) select * from v left join (select int_map.* from functional_orc_def.complextypestbl c, c.int_map) vv using (key); {noformat} was: We are rewriting "A join B" to "A join B1 join B2" for some queries that refer to collections in ACID tables. This is ok for inner join but may be incorrect for outer joins. 
Here is an example, the two queries produce different results: Query works well for non-ACID table: {noformat} impala> with v as ( select ('k4') as key union all values ('k1'), ('k2'), ('k3') ) select * from v left join functional_parquet.complextypestbl.int_map using (key); +-+--+---+ | key | key | value | +-+--+---+ | k1 | k1 | -1| | k1 | k1 | 1 | | k2 | k2 | 100 | | k1 | k1 | 2 | | k2 | k2 | NULL | | k1 | k1 | NULL | | k3 | k3 | NULL | | k4 | NULL | NULL | +-+--+---+ Fetched 8 row(s) in 3.35s {noformat} LEFT OUTER JOIN converted to INNER JOIN for ACID table: {noformat} impala> with v as ( select ('k4') as key union all values ('k1'), ('k2'), ('k3') ) select * from v left join functional_orc_def.complextypestbl.int_map using (key); +-+-+---+ | key | key | value | +-+-+---+ | k1 | k1 | -1| | k1 | k1 | 1 | | k2 | k2 | 100 | | k1 | k1 | 2 | | k2 | k2 | NULL | | k1 | k1 | NULL | | k3 | k3 | NULL | +-+-+---+ Fetched 7 row(s) in 0.35s {noformat} IMPALA-9494 can help to fix this. Until that we could use the techniques from IMPALA-9330. Possible workaround is to rewrite the query to: {noformat} with v as ( select ('k4') as key union all values ('k1'), ('k2'), ('k3') ) select * from v left join (select int_map.* from functional_orc_def.complextypestbl c, c.int_map) vv using (key); {noformat} > OUTER JOIN against ACID collections might be converted to INNER JOIN > > > Key: IMPALA-10685 > URL: https://issues.apache.org/jira/browse/IMPALA-10685 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 4.0 >Reporter: Zoltán Borók-Nagy >Priority: Major > > We are rewriting "A join B" to "A join B1 join B2" for some queries that > refer to collections in ACID tables. This is ok for inner join but may be > incorrect for outer joins. 
Here is an example, the two queries produce > different results: > Query works well for non-ACID table: > {noformat} > impala> with v as ( > select ('k4') as key > union all > values ('k1'), ('k2'), ('k3') > ) select * from v left join functional_parquet.complextypestbl.int_map using > (key); > +-+--+---+ > | key | key | value | > +-+--+---+ > | k1 | k1 | -1| > | k1 | k1 | 1 | > | k2 | k2 | 100 | > | k1 | k1 | 2 | > | k2 | k2 | NULL | > | k1 | k1 | NULL | > | k3 | k3 | NULL | > | k4 | NULL | NULL | > +-+--+---+ > Fetched
[jira] [Assigned] (IMPALA-10674) Update toolchain ORC library for better Iceberg support
[ https://issues.apache.org/jira/browse/IMPALA-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy reassigned IMPALA-10674: -- Assignee: Zoltán Borók-Nagy > Update toolchain ORC library for better Iceberg support > -- > > Key: IMPALA-10674 > URL: https://issues.apache.org/jira/browse/IMPALA-10674 > Project: IMPALA > Issue Type: Bug > Components: Backend, Infrastructure >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > > We need the following fixes/features from the ORC library: > ORC-763: Fix timestamp inconsistencies with Java > ORC-784: Support setting timezone to timestamp column > ORC-666: Support timestamp with local timezone (this corresponds to the > Iceberg TIMESTAMPTZ type) > ORC-781: Make type annotations available from C++ (this is needed for Iceberg > column resolution) > To get these we need to upgrade/patch the ORC C++ library in the toolchain. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-9967) Scan orc failed when table contains timestamp column
[ https://issues.apache.org/jira/browse/IMPALA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy reassigned IMPALA-9967: - Assignee: Zoltán Borók-Nagy > Scan orc failed when table contains timestamp column > > > Key: IMPALA-9967 > URL: https://issues.apache.org/jira/browse/IMPALA-9967 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.0 >Reporter: Sheng Wang >Assignee: Zoltán Borók-Nagy >Priority: Minor > Labels: impala-iceberg > Attachments: 00031-31-26ff2064-c8f2-467f-ab7e-1949cb30d151-0.orc, > 00031-31-334beaba-ef4b-4d13-b338-e715cdf0ef85-0.orc > > > Recently, when I test impala query orc table, I found that scanning failed > when table contains timestamp column, here is there exception: > {code:java} > I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] > Encountered parse error in tail of ORC file > hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc: > Unknown type kind > @ 0x1c9f753 impala::Status::Status() > @ 0x27aa049 impala::HdfsOrcScanner::ProcessFileTail() > @ 0x27a7fb3 impala::HdfsOrcScanner::Open() > @ 0x27365fe > impala::HdfsScanNodeBase::CreateAndOpenScannerHelper() > @ 0x28cb379 impala::HdfsScanNode::ProcessSplit() > @ 0x28caa7d impala::HdfsScanNode::ScannerThread() > @ 0x28c9de5 > _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv > @ 0x28cc19e > _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE > @ 0x205 boost::function0<>::operator()() > @ 0x2675d93 impala::Thread::SuperviseThread() > @ 0x267dd30 boost::_bi::list5<>::operator()<>() > @ 0x267dc54 boost::_bi::bind_t<>::operator()() > @ 0x267dc15 boost::detail::thread_data<>::run() > @ 0x3e3c3c1 thread_proxy > @ 0x7f32360336b9 start_thread > @ 0x7f3232bfe41c clone > I0717 08:31:47.325670 78759 
hdfs-scan-node.cc:490] > 68436a6e0883be84:53877f720002] Error preparing scanner for scan range > hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582). > Encountered parse error in tail of ORC file > hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc: > Unknown type kind > {code} > When I remove the timestamp column from the table and generate the test data, the query > succeeds. By the way, my test data is generated by Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org