[jira] [Resolved] (IMPALA-8242) Support Iceberg on S3

2021-01-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-8242.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Support Iceberg on S3
> -
>
> Key: IMPALA-8242
> URL: https://issues.apache.org/jira/browse/IMPALA-8242
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Quanlong Huang
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.0
>
>
> http://iceberg.incubator.apache.org/






[jira] [Assigned] (IMPALA-8242) Support Iceberg on S3

2021-01-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-8242:
-

Assignee: Zoltán Borók-Nagy

> Support Iceberg on S3
> -
>
> Key: IMPALA-8242
> URL: https://issues.apache.org/jira/browse/IMPALA-8242
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Quanlong Huang
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> http://iceberg.incubator.apache.org/






[jira] [Updated] (IMPALA-9723) Read files created by Hive Streaming Ingestion V2

2021-01-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-9723:
--
Priority: Minor  (was: Major)

> Read files created by Hive Streaming Ingestion V2
> -
>
> Key: IMPALA-9723
> URL: https://issues.apache.org/jira/browse/IMPALA-9723
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Priority: Minor
>
> Impala should be able to read files created by Hive Streaming Ingestion V2.
> Hive Streaming only writes full ACID ORC files. Such files might contain row 
> stripes that Impala shouldn't read based on its validWriteIdList.
> Also, Hive Streaming might append to the end of such files. In that case it 
> writes a "side file" next to the data file; the side file contains the last 
> committed file end (its name is the data file name + _flush_length). Impala 
> should take that into consideration when it reads such files. Everything 
> after the "flush length" must be ignored.
> OrcAcidUtils.getLastFlushLength(fileSystem, filePath) can be used to 
> determine the committed file size.
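For illustration, reading only the committed part of such a file could look like the sketch below. It assumes the ORC library's OrcAcidUtils.getLastFlushLength mentioned in the description and standard Hadoop FileSystem calls; the class and variable names are made up for the example and this is not Impala code.
{noformat}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.orc.impl.OrcAcidUtils;

public class FlushLengthExample {
  // Returns how many bytes of 'dataFile' are committed and safe to read.
  static long committedLength(FileSystem fs, Path dataFile) throws Exception {
    // The side file sits next to the data file: "<data file name>_flush_length".
    Path sideFile = new Path(dataFile.toString() + "_flush_length");
    if (!fs.exists(sideFile)) {
      // No side file: the whole data file is committed.
      return fs.getFileStatus(dataFile).getLen();
    }
    // The side file stores the last committed end offset of the data file;
    // everything after this offset must be ignored by the scanner.
    return OrcAcidUtils.getLastFlushLength(fs, dataFile);
  }
}
{noformat}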






[jira] [Commented] (IMPALA-9723) Read files created by Hive Streaming Ingestion V2

2021-01-04 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258117#comment-17258117
 ] 

Zoltán Borók-Nagy commented on IMPALA-9723:
---

Lowered the priority because AFAIK the current engines don't append to existing 
files, but create new ones instead. So the problem in the description likely 
doesn't arise in practice. But keeping this jira open until this behavior 
becomes the standard.

> Read files created by Hive Streaming Ingestion V2
> -
>
> Key: IMPALA-9723
> URL: https://issues.apache.org/jira/browse/IMPALA-9723
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Priority: Minor
>
> Impala should be able to read files created by Hive Streaming Ingestion V2.
> Hive Streaming only writes full ACID ORC files. Such files might contain row 
> stripes that Impala shouldn't read based on its validWriteIdList.
> Also, Hive Streaming might append to the end of such files. In that case it 
> writes a "side file" next to the data file; the side file contains the last 
> committed file end (its name is the data file name + _flush_length). Impala 
> should take that into consideration when it reads such files. Everything 
> after the "flush length" must be ignored.
> OrcAcidUtils.getLastFlushLength(fileSystem, filePath) can be used to 
> determine the committed file size.






[jira] [Updated] (IMPALA-7556) Clean up ScanRange

2021-01-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-7556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-7556:
--
Labels: ramp-up  (was: )

> Clean up ScanRange
> --
>
> Key: IMPALA-7556
> URL: https://issues.apache.org/jira/browse/IMPALA-7556
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>  Labels: ramp-up
>
> For IMPALA-7543 I want to add some additional functionality to scan ranges.
> However, the code of the ScanRange class is already quite messy. It handles 
> different types of files, does some buffer management, updates all kinds of 
> counters.
> So, instead of complicating the code further, let's refactor the ScanRange 
> class a bit.
>  * Do the file operations in separate classes
>  ** A new, abstract class could be invented to provide an API for file 
> operations, i.e. Open(), ReadFromPos(), Close(), etc.
>  *** Keep in mind that the interface must be a good fit for IMPALA-7543, i.e. 
> we need positional reads from files
>  ** Operations for local files and HDFS files could be implemented in child 
> classes
>  * Buffer management
>  ** A new BufferStore class could be created
>  ** This new class would be responsible for managing the unused buffers
>  *** if possible, it would also handle the client and cached buffers as well
>  * Counters and metrics would be updated by the corresponding new classes
>  ** E.g. ImpaladMetrics::IO_MGR_NUM_OPEN_FILES would be updated by the file 
> handling classes






[jira] [Assigned] (IMPALA-7556) Clean up ScanRange

2021-01-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-7556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-7556:
-

Assignee: (was: Zoltán Borók-Nagy)

> Clean up ScanRange
> --
>
> Key: IMPALA-7556
> URL: https://issues.apache.org/jira/browse/IMPALA-7556
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>
> For IMPALA-7543 I want to add some additional functionality to scan ranges.
> However, the code of the ScanRange class is already quite messy. It handles 
> different types of files, does some buffer management, updates all kinds of 
> counters.
> So, instead of complicating the code further, let's refactor the ScanRange 
> class a bit.
>  * Do the file operations in separate classes
>  ** A new, abstract class could be invented to provide an API for file 
> operations, i.e. Open(), ReadFromPos(), Close(), etc.
>  *** Keep in mind that the interface must be a good fit for IMPALA-7543, i.e. 
> we need positional reads from files
>  ** Operations for local files and HDFS files could be implemented in child 
> classes
>  * Buffer management
>  ** A new BufferStore class could be created
>  ** This new class would be responsible for managing the unused buffers
>  *** if possible, it would also handle the client and cached buffers as well
>  * Counters and metrics would be updated by the corresponding new classes
>  ** E.g. ImpaladMetrics::IO_MGR_NUM_OPEN_FILES would be updated by the file 
> handling classes






[jira] [Commented] (IMPALA-7556) Clean up ScanRange

2021-01-04 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-7556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258118#comment-17258118
 ] 

Zoltán Borók-Nagy commented on IMPALA-7556:
---

The "Buffer management" and "Counters and metrics" part are still to do. They 
would be nice to have, though I'm not sure when will I have the bandwidth for 
them. So I'm unassigning this task for now and making it a "ramp-up" task.

> Clean up ScanRange
> --
>
> Key: IMPALA-7556
>     URL: https://issues.apache.org/jira/browse/IMPALA-7556
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>
> For IMPALA-7543 I want to add some additional functionality to scan ranges.
> However, the code of the ScanRange class is already quite messy. It handles 
> different types of files, does some buffer management, updates all kinds of 
> counters.
> So, instead of complicating the code further, let's refactor the ScanRange 
> class a bit.
>  * Do the file operations in separate classes
>  ** A new, abstract class could be invented to provide an API for file 
> operations, i.e. Open(), ReadFromPos(), Close(), etc.
>  *** Keep in mind that the interface must be a good fit for IMPALA-7543, i.e. 
> we need positional reads from files
>  ** Operations for local files and HDFS files could be implemented in child 
> classes
>  * Buffer management
>  ** A new BufferStore class could be created
>  ** This new class would be responsible for managing the unused buffers
>  *** if possible, it would also handle the client and cached buffers as well
>  * Counters and metrics would be updated by the corresponding new classes
>  ** E.g. ImpaladMetrics::IO_MGR_NUM_OPEN_FILES would be updated by the file 
> handling classes






[jira] [Updated] (IMPALA-10223) Implement INSERT OVERWRITE for Iceberg tables

2021-02-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10223:
---
Summary: Implement INSERT OVERWRITE for Iceberg tables  (was: Implement 
INSERT OVERWRITE for non-partitioned Iceberg tables)

> Implement INSERT OVERWRITE for Iceberg tables
> -
>
> Key: IMPALA-10223
> URL: https://issues.apache.org/jira/browse/IMPALA-10223
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> Add support for INSERT OVERWRITE statements for Iceberg tables.
> Use Iceberg's OverwriteFiles API for this.






[jira] [Updated] (IMPALA-10223) Implement INSERT OVERWRITE for Iceberg tables

2021-02-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10223:
---
Description: 
Add support for INSERT OVERWRITE statements for Iceberg tables.

Use Iceberg's ReplacePartitions API for this.

  was:
Add support for INSERT OVERWRITE statements for Iceberg tables.

Use Iceberg's OverwriteFiles API for this.


> Implement INSERT OVERWRITE for Iceberg tables
> -
>
> Key: IMPALA-10223
> URL: https://issues.apache.org/jira/browse/IMPALA-10223
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> Add support for INSERT OVERWRITE statements for Iceberg tables.
> Use Iceberg's ReplacePartitions API for this.
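As a rough illustration of the API the description refers to, a commit through Iceberg's ReplacePartitions could look like the sketch below. The Table and DataFile variables are placeholders; how Impala actually wires this into its sink is not shown here.
{noformat}
import org.apache.iceberg.DataFile;
import org.apache.iceberg.ReplacePartitions;
import org.apache.iceberg.Table;

public class InsertOverwriteSketch {
  // Atomically replaces the partitions touched by the newly written files.
  static void insertOverwrite(Table table, Iterable<DataFile> newFiles) {
    ReplacePartitions overwrite = table.newReplacePartitions();
    for (DataFile file : newFiles) {
      // Every partition that receives at least one new file is dropped and
      // re-created with the new files; untouched partitions are kept.
      overwrite.addFile(file);
    }
    overwrite.commit();  // creates a new snapshot
  }
}
{noformat}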






[jira] [Resolved] (IMPALA-10460) Impala should write normalized paths in Iceberg manifests

2021-02-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-10460.

Fix Version/s: Impala 4.0
   Resolution: Fixed

> Impala should write normalized paths in Iceberg manifests
> -
>
> Key: IMPALA-10460
> URL: https://issues.apache.org/jira/browse/IMPALA-10460
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.0
>
>
> Currently Impala writes double slashes in the paths of datafiles for 
> non-partitioned Iceberg tables, e.g.:
> {noformat}
> hdfs://localhost:20500/test-warehouse/ice_t/data//594828b035d480b7-9c9fd8eb_173316607_data.0.parq{noformat}
> Paths should be normalized so they won't cause any problems later.
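A tiny, self-contained sketch of the kind of normalization meant here (collapsing duplicate slashes in the path part while leaving the scheme and authority intact). It is illustrative only and not the code Impala uses.
{noformat}
public class PathNormalization {
  // Collapses duplicate slashes after the "scheme://host[:port]" prefix.
  static String normalize(String uri) {
    int schemeEnd = uri.indexOf("://");
    if (schemeEnd < 0) return uri.replaceAll("/{2,}", "/");
    int pathStart = uri.indexOf('/', schemeEnd + 3);
    if (pathStart < 0) return uri;
    return uri.substring(0, pathStart)
        + uri.substring(pathStart).replaceAll("/{2,}", "/");
  }

  public static void main(String[] args) {
    // Prints hdfs://localhost:20500/test-warehouse/ice_t/data/file.parq
    System.out.println(normalize(
        "hdfs://localhost:20500/test-warehouse/ice_t/data//file.parq"));
  }
}
{noformat}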






[jira] [Commented] (IMPALA-10166) ALTER TABLE for Iceberg tables

2021-02-02 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277026#comment-17277026
 ] 

Zoltán Borók-Nagy commented on IMPALA-10166:


Hey [~skyyws], do you plan to work on the remaining ALTER TABLE statements as 
well?

> ALTER TABLE for Iceberg tables
> --
>
> Key: IMPALA-10166
> URL: https://issues.apache.org/jira/browse/IMPALA-10166
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Assignee: Sheng Wang
>Priority: Major
>  Labels: impala-iceberg
>
> Add support for ALTER TABLE operations for Iceberg tables.






[jira] [Assigned] (IMPALA-10456) Implement TRUNCATE for Iceberg tables

2021-01-27 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-10456:
--

Assignee: Zoltán Borók-Nagy

> Implement TRUNCATE for Iceberg tables
> -
>
> Key: IMPALA-10456
> URL: https://issues.apache.org/jira/browse/IMPALA-10456
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> Implement TRUNCATE for Iceberg tables.
> The TRUNCATE operation should create a new snapshot for the target table that 
> doesn't have any data files.
> It should work for both partitioned and unpartitioned tables.
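For illustration, one way to produce such an empty snapshot through the public Iceberg API is a delete that matches every data file, roughly as sketched below. This is only a sketch of the idea, not necessarily how Impala implements TRUNCATE.
{noformat}
import org.apache.iceberg.Table;
import org.apache.iceberg.expressions.Expressions;

public class TruncateSketch {
  // Creates a new snapshot that contains no data files at all.
  static void truncate(Table table) {
    table.newDelete()
        .deleteFromRowFilter(Expressions.alwaysTrue())  // matches every file
        .commit();  // the new snapshot has an empty file list
  }
}
{noformat}
This works the same way for partitioned and unpartitioned tables, since the filter matches all files regardless of partition.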






[jira] [Created] (IMPALA-10456) Implement TRUNCATE for Iceberg tables

2021-01-27 Thread Jira
Zoltán Borók-Nagy created IMPALA-10456:
--

 Summary: Implement TRUNCATE for Iceberg tables
 Key: IMPALA-10456
 URL: https://issues.apache.org/jira/browse/IMPALA-10456
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Zoltán Borók-Nagy


Implement TRUNCATE for Iceberg tables.

The TRUNCATE operation should create a new snapshot for the target table that 
doesn't have any data files.

It should work for both partitioned and unpartitioned tables.






[jira] [Updated] (IMPALA-10456) Implement TRUNCATE for Iceberg tables

2021-01-27 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10456:
---
Labels: impala-iceberg  (was: )

> Implement TRUNCATE for Iceberg tables
> -
>
> Key: IMPALA-10456
> URL: https://issues.apache.org/jira/browse/IMPALA-10456
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> Implement TRUNCATE for Iceberg tables.
> The TRUNCATE operation should create a new snapshot for the target table that 
> doesn't have any data files.
> It should work for both partitioned and unpartitioned tables.






[jira] [Updated] (IMPALA-10456) Implement TRUNCATE for Iceberg tables

2021-01-27 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10456:
---
Component/s: Frontend

> Implement TRUNCATE for Iceberg tables
> -
>
> Key: IMPALA-10456
> URL: https://issues.apache.org/jira/browse/IMPALA-10456
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>
> Implement TRUNCATE for Iceberg tables.
> The TRUNCATE operation should create a new snapshot for the target table that 
> doesn't have any data files.
> It should work for both partitioned and unpartitioned tables.






[jira] [Assigned] (IMPALA-10432) INSERT INTO Iceberg tables with partition transforms

2021-01-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-10432:
--

Assignee: Zoltán Borók-Nagy

> INSERT INTO Iceberg tables with partition transforms
> 
>
> Key: IMPALA-10432
> URL: https://issues.apache.org/jira/browse/IMPALA-10432
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.0
>
>
> Support INSERT INTO Iceberg tables that use partition transforms. Partition 
> transforms are functions that calculate partition data from row data.
> Iceberg defines the following partition transforms:
> [https://iceberg.apache.org/spec/#partition-transforms]
>  * IDENTITY
>  * BUCKET
>  * TRUNCATE
>  * YEAR
>  * MONTH
>  * DAY
>  * HOUR
> INSERT INTO identity-partitioned Iceberg tables is already supported.
> We need to add support for the rest.
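For reference, the listed transforms map onto Iceberg's PartitionSpec builder roughly as in the standalone sketch below. The schema and column names are made up for the example; Impala's planner builds its spec differently.
{noformat}
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.types.Types;

public class TransformSketch {
  public static void main(String[] args) {
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.required(2, "s", Types.StringType.get()),
        Types.NestedField.required(3, "ts", Types.TimestampType.withZone()),
        Types.NestedField.required(4, "d", Types.DateType.get()),
        Types.NestedField.required(5, "ev_time", Types.TimestampType.withZone()));

    PartitionSpec spec = PartitionSpec.builderFor(schema)
        .identity("id")       // IDENTITY
        .bucket("s", 16)      // BUCKET(16)
        .truncate("s", 10)    // TRUNCATE(10)
        .year("d")            // YEAR
        .month("ts")          // MONTH
        .hour("ev_time")      // HOUR; DAY is the analogous .day("col")
        .build();

    System.out.println(spec);
  }
}
{noformat}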






[jira] [Updated] (IMPALA-10446) Add interop tests for Iceberg tables

2021-01-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10446:
---
Labels: impala-iceberg  (was: )

> Add interop tests for Iceberg tables
> 
>
> Key: IMPALA-10446
> URL: https://issues.apache.org/jira/browse/IMPALA-10446
> Project: IMPALA
>  Issue Type: Test
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> Add interoperability tests for Iceberg table support between Impala and Hive.
> First, we'll need to add Iceberg to our test environment and configure Hive 
> to use it at runtime.
> We need to test that Impala is able to read Iceberg tables written by Hive.
> Also test that Hive is able to read Iceberg tables written by Impala.
>  * Have tests for all data types
>  * For all partition transforms
>  * Null values in data and/or partitioning columns
>  * Files with multiple row groups (written by Hive)
>  ** to test block location loading from Iceberg
>  * Unsupported column types
>  






[jira] [Commented] (IMPALA-10453) Support file/partition pruning via runtime filters on Iceberg

2021-01-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17272187#comment-17272187
 ] 

Zoltán Borók-Nagy commented on IMPALA-10453:


I think min/max filters will do a good job here, filtering out row groups.

I wonder whether, if we put the partition-transformed values into the bloom 
filters, they'd be able to prune files using the associated partition data. 
However, we can't do that if the partition layout has evolved over time. Or we 
could just prune only the files that have the current partition layout.

> Support file/partition pruning via runtime filters on Iceberg
> -
>
> Key: IMPALA-10453
> URL: https://issues.apache.org/jira/browse/IMPALA-10453
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: iceberg, impala-iceberg, performance
>
> This is a placeholder to figure out what we'd need to do to support dynamic 
> file-level pruning in Iceberg using runtime filters, i.e. have parity for 
> partition pruning.
> * If there is a single partition value per file, then applying bloom filters 
> to the row group stats would be effective at pruning files.
> * If there are partition transforms, e.g. hash-based, then I think we 
> probably need to track the partition that the file is associated with and 
> then have some custom logic in the parquet scanner to do partition pruning.






[jira] [Assigned] (IMPALA-10452) CREATE Iceberg tables with old PARTITIONED BY syntax

2021-01-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-10452:
--

Assignee: Zoltán Borók-Nagy

> CREATE Iceberg tables with old PARTITIONED BY syntax
> 
>
> Key: IMPALA-10452
> URL: https://issues.apache.org/jira/browse/IMPALA-10452
> Project: IMPALA
>  Issue Type: Test
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> It's convenient for users to create Iceberg tables with the old syntax.
> It's also easier to migrate existing workloads to Iceberg because the SQL 
> scripts that create the table definitions don't need to change that much.
> So users should be able to write the following:
> {noformat}
> CREATE TABLE ice_t (i int)
> PARTITIONED BY (p int)
> STORED AS ICEBERG;
> {noformat}
> Which should be equivalent to this:
> {noformat}
> CREATE TABLE ice_t (i int, p int)
> PARTITION BY SPEC (p IDENTITY)
> STORED AS ICEBERG;
> {noformat}
> Please note that the old-style CREATE TABLE creates IDENTITY-partitioned 
> tables. For other partition transforms the users must use the new, more 
> generic syntax.
> Hive also supports the PARTITIONED BY syntax, see 
> [https://github.com/apache/iceberg/pull/1612]






[jira] [Updated] (IMPALA-10452) CREATE Iceberg tables with old PARTITIONED BY syntax

2021-01-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10452:
---
Component/s: Frontend

> CREATE Iceberg tables with old PARTITIONED BY syntax
> 
>
> Key: IMPALA-10452
> URL: https://issues.apache.org/jira/browse/IMPALA-10452
> Project: IMPALA
>  Issue Type: Test
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> It's convenient for users to create Iceberg tables with the old syntax.
> It's also easier to migrate existing workloads to Iceberg because the SQL 
> scripts that create the table definitions don't need to change that much.
> So users should be able to write the following:
> {noformat}
> CREATE TABLE ice_t (i int)
> PARTITIONED BY (p int)
> STORED AS ICEBERG;
> {noformat}
> Which should be equivalent to this:
> {noformat}
> CREATE TABLE ice_t (i int, p int)
> PARTITION BY SPEC (p IDENTITY)
> STORED AS ICEBERG;
> {noformat}
> Please note that the old-style CREATE TABLE creates IDENTITY-partitioned 
> tables. For other partition transforms the users must use the new, more 
> generic syntax.
> Hive also supports the PARTITIONED BY syntax, see 
> [https://github.com/apache/iceberg/pull/1612]






[jira] [Created] (IMPALA-10460) Impala should write normalized paths in Iceberg manifests

2021-01-28 Thread Jira
Zoltán Borók-Nagy created IMPALA-10460:
--

 Summary: Impala should write normalized paths in Iceberg manifests
 Key: IMPALA-10460
 URL: https://issues.apache.org/jira/browse/IMPALA-10460
 Project: IMPALA
  Issue Type: Bug
Reporter: Zoltán Borók-Nagy


Currently Impala writes double slashes in the paths of datafiles for 
non-partitioned Iceberg tables, e.g.:
{noformat}
hdfs://localhost:20500/test-warehouse/ice_t/data//594828b035d480b7-9c9fd8eb_173316607_data.0.parq{noformat}
Paths should be normalized so they won't cause any problems later.






[jira] [Assigned] (IMPALA-10460) Impala should write normalized paths in Iceberg manifests

2021-01-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-10460:
--

Assignee: Zoltán Borók-Nagy

> Impala should write normalized paths in Iceberg manifests
> -
>
> Key: IMPALA-10460
> URL: https://issues.apache.org/jira/browse/IMPALA-10460
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> Currently Impala writes double slashes in the paths of datafiles for 
> non-partitioned Iceberg tables, e.g.:
> {noformat}
> hdfs://localhost:20500/test-warehouse/ice_t/data//594828b035d480b7-9c9fd8eb_173316607_data.0.parq{noformat}
> Paths should be normalized so they won't cause any problems later.






[jira] [Updated] (IMPALA-10460) Impala should write normalized paths in Iceberg manifests

2021-01-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10460:
---
Component/s: Backend

> Impala should write normalized paths in Iceberg manifests
> -
>
> Key: IMPALA-10460
> URL: https://issues.apache.org/jira/browse/IMPALA-10460
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>
> Currently Impala writes double slashes in the paths of datafiles for 
> non-partitioned Iceberg tables, e.g.:
> {noformat}
> hdfs://localhost:20500/test-warehouse/ice_t/data//594828b035d480b7-9c9fd8eb_173316607_data.0.parq{noformat}
> Paths should be normalized so they won't cause any problems later.






[jira] [Updated] (IMPALA-10460) Impala should write normalized paths in Iceberg manifests

2021-01-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10460:
---
Labels: impala-iceberg  (was: )

> Impala should write normalized paths in Iceberg manifests
> -
>
> Key: IMPALA-10460
> URL: https://issues.apache.org/jira/browse/IMPALA-10460
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> Currently Impala writes double slashes in the paths of datafiles for 
> non-partitioned Iceberg tables, e.g.:
> {noformat}
> hdfs://localhost:20500/test-warehouse/ice_t/data//594828b035d480b7-9c9fd8eb_173316607_data.0.parq{noformat}
> Paths should be normalized so they won't cause any problems later.






[jira] [Commented] (IMPALA-10166) ALTER TABLE for Iceberg tables

2021-02-02 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277108#comment-17277108
 ] 

Zoltán Borók-Nagy commented on IMPALA-10166:


Right. Thanks for the info!

> ALTER TABLE for Iceberg tables
> --
>
> Key: IMPALA-10166
> URL: https://issues.apache.org/jira/browse/IMPALA-10166
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Assignee: Sheng Wang
>Priority: Major
>  Labels: impala-iceberg
>
> Add support for ALTER TABLE operations for Iceberg tables.






[jira] [Created] (IMPALA-10749) Add Runtime Filter Publish info to Node Lifecycle Event Timeline

2021-06-11 Thread Jira
Zoltán Borók-Nagy created IMPALA-10749:
--

 Summary: Add Runtime Filter Publish info to Node Lifecycle Event 
Timeline
 Key: IMPALA-10749
 URL: https://issues.apache.org/jira/browse/IMPALA-10749
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Zoltán Borók-Nagy


Currently we only have the following info about runtime filters at the HASH 
JOIN NODE in the profile:

{noformat}
Runtime filters: 1 of 1 Runtime Filter Published
{noformat}

But it would be useful to know when the runtime filters were published. We 
could add this information to the Node Lifecycle Event Timeline.

E.g. in test failures like IMPALA-10747 it would be good to know when the 
runtime filters were published.






[jira] [Updated] (IMPALA-10747) test_runtime_filters.py::test_row_filters failed in dockerised test

2021-06-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10747:
---
Component/s: Backend

> test_runtime_filters.py::test_row_filters failed in dockerised test
> ---
>
> Key: IMPALA-10747
> URL: https://issues.apache.org/jira/browse/IMPALA-10747
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>  Labels: broken-build
>
> test_runtime_filters.py::test_row_filters failed with the following stack 
> trace:
> {noformat}
> query_test/test_runtime_filters.py:341: in test_row_filters
> test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)})
> common/impala_test_suite.py:734: in run_test_case
> update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:636: in verify_runtime_profile
> actual))
> E   AssertionError: Did not find matches for lines in runtime profile:
> E   EXPECTED LINES:
> E   row_regex: .*Rows processed: 16.38K.*
> {noformat}
> The job was:
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4320/testReport/
> It's similar to IMPALA-6004 and IMPALA-6712. Those were fixed by increasing 
> the runtime filter wait time. It's currently 60 seconds in regular builds and 
> 200 seconds in slow builds:
> https://github.com/apache/impala/blob/c65d7861d9ae28f6fc592727ff699a8155dcda2c/tests/query_test/test_runtime_filters.py#L37
> The profile contains:
> {noformat}
> Runtime filters: Not all filters arrived (arrived: [], missing [0]), waited 
> for 59s361ms. Arrival delay: 1m.
> {noformat}
> This was the only test that failed in that build, and the whole build took 4 
> hr 17 min which is normal. So other tests didn't experience slowness.
> There was only a single runtime filter that was generated by 02:HASH JOIN.
> {noformat}
> E   Operator #Hosts  #Inst  Avg Time  Max Time   #Rows  Est. 
> #Rows   Peak Mem  Est. Peak Mem  Detail  
> E   
> --
> E   F03:ROOT  1  1   0.000ns   0.000ns
> 4.01 MB4.00 MB  
> E   07:AGGREGATE  1  1   3.999ms   3.999ms   1
>1   16.00 KB   16.00 KB  FINALIZE
> E   06:EXCHANGE   1  1   0.000ns   0.000ns   3
>1   32.00 KB   16.00 KB  UNPARTITIONED   
> E   F02:EXCHANGE SENDER   3  3   0.000ns   0.000ns
>16.00 KB  0  
> E   03:AGGREGATE  3  3   0.000ns   0.000ns   3
>1   24.00 KB   16.00 KB  
> E   02:HASH JOIN  3  3  14s217ms  18s739ms  51.50M   
> 7.74M   68.06 MB  169.06 MB  INNER JOIN, PARTITIONED 
> E   |--05:EXCHANGE3  3   8s303ms  13s715ms   6.00M   
> 6.00M   13.90 MB   10.12 MB  HASH(b.l_comment)   
> E   |  F01:EXCHANGE SENDER3  3  36s758ms  44s115ms
>   209.53 KB  0  
> E   |  01:SCAN HDFS   3  3  13s637ms  15s775ms   6.00M   
> 6.00M   29.96 MB   80.00 MB  tpch_parquet.lineitem b 
> E   04:EXCHANGE   3  3   4s874ms   7s223ms   6.00M   
> 6.00M   12.27 MB   10.12 MB  HASH(a.l_comment)   
> E   F00:EXCHANGE SENDER   3  3  23s495ms  31s755ms
>   209.53 KB  0  
> E   00:SCAN HDFS  3  3  1m4s  1m8s   6.00M   
> 6.00M   29.96 MB   80.00 MB  tpch_parquet.lineitem a
> {noformat}
> The Max Time of F01:EXCHANGE SENDER was 44s115ms (non-child time).
> The HASH JOIN BUILDER above the EXCHANGE SENDER had a non-child total time 
> of 19s851ms.
> The profiles of all HASH_JOIN_NODE operators (all 3 of the 3 fragment 
> instances) have:
> {noformat}
> Runtime filters: 1 of 1 Runtime Filter Published
> {noformat}
> So it seems like the filters were published, but 60 sec still wasn't enough.






[jira] [Created] (IMPALA-10748) Remove enable_orc_scanner flag

2021-06-11 Thread Jira
Zoltán Borók-Nagy created IMPALA-10748:
--

 Summary: Remove enable_orc_scanner flag
 Key: IMPALA-10748
 URL: https://issues.apache.org/jira/browse/IMPALA-10748
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend, Frontend
Reporter: Zoltán Borók-Nagy


We've been supporting reading ORC files for quite some time.

I don't think we need the flag anymore.






[jira] [Created] (IMPALA-10747) test_runtime_filters.py::test_row_filters failed in dockerised test

2021-06-11 Thread Jira
Zoltán Borók-Nagy created IMPALA-10747:
--

 Summary: test_runtime_filters.py::test_row_filters failed in 
dockerised test
 Key: IMPALA-10747
 URL: https://issues.apache.org/jira/browse/IMPALA-10747
 Project: IMPALA
  Issue Type: Bug
Reporter: Zoltán Borók-Nagy


test_runtime_filters.py::test_row_filters failed with the following stack trace:

{noformat}
query_test/test_runtime_filters.py:341: in test_row_filters
test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)})
common/impala_test_suite.py:734: in run_test_case
update_section=pytest.config.option.update_results)
common/test_result_verifier.py:636: in verify_runtime_profile
actual))
E   AssertionError: Did not find matches for lines in runtime profile:
E   EXPECTED LINES:
E   row_regex: .*Rows processed: 16.38K.*
{noformat}

The job was:
https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4320/testReport/

It's similar to IMPALA-6004 and IMPALA-6712. Those were fixed by increasing the 
runtime filter wait time. It's currently 60 seconds in regular builds and 200 
seconds in slow builds:
https://github.com/apache/impala/blob/c65d7861d9ae28f6fc592727ff699a8155dcda2c/tests/query_test/test_runtime_filters.py#L37

The profile contains:

{noformat}
Runtime filters: Not all filters arrived (arrived: [], missing [0]), waited for 
59s361ms. Arrival delay: 1m.
{noformat}

This was the only test that failed in that build, and the whole build took 4 hr 
17 min which is normal. So other tests didn't experience slowness.

There was only a single runtime filter that was generated by 02:HASH JOIN.

{noformat}
E   Operator #Hosts  #Inst  Avg Time  Max Time   #Rows  Est. 
#Rows   Peak Mem  Est. Peak Mem  Detail  
E   
--
E   F03:ROOT  1  1   0.000ns   0.000ns  
  4.01 MB4.00 MB  
E   07:AGGREGATE  1  1   3.999ms   3.999ms   1  
 1   16.00 KB   16.00 KB  FINALIZE
E   06:EXCHANGE   1  1   0.000ns   0.000ns   3  
 1   32.00 KB   16.00 KB  UNPARTITIONED   
E   F02:EXCHANGE SENDER   3  3   0.000ns   0.000ns  
 16.00 KB  0  
E   03:AGGREGATE  3  3   0.000ns   0.000ns   3  
 1   24.00 KB   16.00 KB  
E   02:HASH JOIN  3  3  14s217ms  18s739ms  51.50M   
7.74M   68.06 MB  169.06 MB  INNER JOIN, PARTITIONED 
E   |--05:EXCHANGE3  3   8s303ms  13s715ms   6.00M   
6.00M   13.90 MB   10.12 MB  HASH(b.l_comment)   
E   |  F01:EXCHANGE SENDER3  3  36s758ms  44s115ms  
209.53 KB  0  
E   |  01:SCAN HDFS   3  3  13s637ms  15s775ms   6.00M   
6.00M   29.96 MB   80.00 MB  tpch_parquet.lineitem b 
E   04:EXCHANGE   3  3   4s874ms   7s223ms   6.00M   
6.00M   12.27 MB   10.12 MB  HASH(a.l_comment)   
E   F00:EXCHANGE SENDER   3  3  23s495ms  31s755ms  
209.53 KB  0  
E   00:SCAN HDFS  3  3  1m4s  1m8s   6.00M   
6.00M   29.96 MB   80.00 MB  tpch_parquet.lineitem a
{noformat}

The Max Time of F01:EXCHANGE SENDER was 44s115ms (non-child time).

The HASH JOIN BUILDER above the EXCHANGE SENDER had a non-child total time of 
19s851ms.

The profiles of all HASH_JOIN_NODE operators (all 3 of the 3 fragment 
instances) have:

{noformat}
Runtime filters: 1 of 1 Runtime Filter Published
{noformat}

So it seems like the filters were published, but 60 sec still wasn't enough.






[jira] [Resolved] (IMPALA-10485) Support Iceberg field-id based column resolution in the ORC scanner

2021-06-14 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-10485.

Fix Version/s: Impala 4.0
   Resolution: Fixed

> Support Iceberg field-id based column resolution in the ORC scanner
> ---
>
> Key: IMPALA-10485
> URL: https://issues.apache.org/jira/browse/IMPALA-10485
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.0
>
>
> Currently the ORC scanner only supports position-based column resolution.
> We need to add Iceberg field-id based column resolution to support schema 
> evolution.






[jira] [Created] (IMPALA-10737) Optimize Iceberg metadata handling

2021-06-08 Thread Jira
Zoltán Borók-Nagy created IMPALA-10737:
--

 Summary: Optimize Iceberg metadata handling
 Key: IMPALA-10737
 URL: https://issues.apache.org/jira/browse/IMPALA-10737
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Reporter: Zoltán Borók-Nagy


Currently we re-read Iceberg table metadata in several cases.

We should instead keep it in memory and reuse it when possible.

Also, when refreshing a table we should use Iceberg's refresh() API to avoid 
unnecessary re-reads of manifest files:

[https://github.com/apache/iceberg/blob/282b6f9f1cae8d4fd5ff7c73de513ca91f01fddc/core/src/main/java/org/apache/iceberg/TableOperations.java#L45]
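As a rough sketch of the proposed pattern (cache the Table handle and refresh it instead of re-reading everything), using the public Iceberg API; the cache and catalog objects here are placeholders, not Impala's actual catalog code.
{noformat}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;

public class IcebergTableCache {
  private final Catalog catalog;
  private final ConcurrentMap<TableIdentifier, Table> cache = new ConcurrentHashMap<>();

  IcebergTableCache(Catalog catalog) { this.catalog = catalog; }

  // Loads the table once; later calls reuse the in-memory metadata.
  Table getTable(TableIdentifier id) {
    return cache.computeIfAbsent(id, catalog::loadTable);
  }

  // On REFRESH, re-check the metadata pointer via Iceberg's refresh() instead
  // of re-reading the table (and its manifest files) from scratch.
  Table refreshTable(TableIdentifier id) {
    Table table = getTable(id);
    table.refresh();
    return table;
  }
}
{noformat}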

 






[jira] [Assigned] (IMPALA-10732) Use consistent DDL for specifying Iceberg partitions

2021-06-07 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-10732:
--

Assignee: Zoltán Borók-Nagy

> Use consistent DDL for specifying Iceberg partitions
> 
>
> Key: IMPALA-10732
> URL: https://issues.apache.org/jira/browse/IMPALA-10732
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> Currently we have a DDL syntax for defining Iceberg partitions that differs 
> from SparkSQL:
> [https://iceberg.apache.org/spark-ddl/#partitioned-by]
>  
> E.g. Impala is using the following syntax:
>  
> CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
> *PARTITION BY SPEC (i BUCKET 5, ts MONTH, d YEAR)*
> STORED AS ICEBERG;
> The same in Spark is:
> CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
> USING ICEBERG
> *PARTITIONED BY (bucket(5, i), months(ts), years(d))*
>  
> Impala's syntax is older but hasn't been released yet. Spark's syntax is 
> released so it cannot be changed.
>  
> Hive is also working on DDL support for Iceberg partitions, and they are 
> favoring the released SparkSQL syntax.
>  
> After discussing it on dev@impala we decided to use SparkSQL's syntax.






[jira] [Created] (IMPALA-10732) Use consistent DDL for specifying Iceberg partitions

2021-06-07 Thread Jira
Zoltán Borók-Nagy created IMPALA-10732:
--

 Summary: Use consistent DDL for specifying Iceberg partitions
 Key: IMPALA-10732
 URL: https://issues.apache.org/jira/browse/IMPALA-10732
 Project: IMPALA
  Issue Type: Bug
Reporter: Zoltán Borók-Nagy


Currently we have a DDL syntax for defining Iceberg partitions that differs 
from SparkSQL:
[https://iceberg.apache.org/spark-ddl/#partitioned-by]
 
E.g. Impala is using the following syntax:
 
CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
*PARTITION BY SPEC (i BUCKET 5, ts MONTH, d YEAR)*
STORED AS ICEBERG;

The same in Spark is:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
USING ICEBERG
*PARTITIONED BY (bucket(5, i), months(ts), years(d))*
 
Impala's syntax is older but hasn't been released yet. Spark's syntax is 
released so it cannot be changed.
 
Hive is also working on DDL support for Iceberg partitions, and they are 
favoring the released SparkSQL syntax.
 
After discussing it on dev@impala we decided to use SparkSQL's syntax.






[jira] [Resolved] (IMPALA-10578) Big Query influence other query seriously when hardware not reach limit

2021-06-10 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-10578.

Resolution: Not A Bug

> Big Query influence other query seriously when hardware not reach limit 
> 
>
> Key: IMPALA-10578
> URL: https://issues.apache.org/jira/browse/IMPALA-10578
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.4.0
> Environment: impala-3.4
> 80 machines with 96 cpu and 256GB mem
> scratch-dir is on separate disk different from HDFS data dir
>Reporter: wesleydeng
>Priority: Major
> Attachments: big_query.txt.bz2, image-2021-03-10-19-59-24-188.png, 
> image-2021-03-16-16-32-37-862.png, small_query_be_influenced_very_slow.txt.bz2
>
>
> When a big query is running (using mt_dop=8), other queries are very 
> difficult to start. 
> A small query (select distinct one field from a small table) may take about 
> 1 minute, while normally it takes only about 1~3 seconds.
>  From the impalad log, I found an incomprehensible log like this:
> !image-2021-03-16-16-32-37-862.png|width=836,height=189!
> !image-2021-03-10-19-59-24-188.png|width=892,height=435!
> ---
> About the gap between "Handling call" and "Deserializing Batch", I found 
> another path : 
> --KrpcDataStreamRecvr::SenderQueue::AddBatch
>   EnqueueDeferredRpc(move(payload), l);   // after dequeue, will call 
> KrpcDataStreamRecvr::SenderQueue::AddBatchWork
> --- 
>  
>  
> When the big query is running, data spilling has happened because mem_limit 
> was set and this big query uses a lot of memory.
>  
> In the attachment, I appended the profiles of the big query and the small 
> query. The small query normally finishes in seconds. The timeline of the 
> small query is shown below:
> Query Timeline: 21m39s
>  - Query submitted: 48.846us (48.846us)
>  - Planning finished: 2.934ms (2.886ms)
>  - Submit for admission: 12.572ms (9.637ms)
>  - Completed admission: 13.622ms (1.050ms)
>  - Ready to start on 56 backends: 15.271ms (1.649ms)
>  -- All 56 execution backends (171 fragment instances) started: 18s505ms 
> (18s489ms)*
>  - Rows available: 51s770ms (33s265ms)
>  - First row fetched: 57s220ms (5s449ms)
>  - Last row fetched: 59s119ms (1s899ms)
>  - Released admission control resources: 1m1s (2s223ms)
>  - AdmissionControlTimeSinceLastUpdate: 80.000ms
>  - ComputeScanRangeAssignmentTimer: 439.749us
>  
>  
>  
>  






[jira] [Created] (IMPALA-10741) Set engine.hive.enabled=true table property for Iceberg tables

2021-06-10 Thread Jira
Zoltán Borók-Nagy created IMPALA-10741:
--

 Summary: Set engine.hive.enabled=true table property for Iceberg 
tables
 Key: IMPALA-10741
 URL: https://issues.apache.org/jira/browse/IMPALA-10741
 Project: IMPALA
  Issue Type: Bug
Reporter: Zoltán Borók-Nagy


Hive relies on the engine.hive.enabled=true table property being set for 
Iceberg tables.
Without it, Hive overwrites the table metadata with a different storage 
handler and SerDe/InputFormat/OutputFormat when it writes the table, making 
the table unusable.

Impala should set this table property during table creation.
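Purely to illustrate the property itself, the sketch below sets it through the generic Iceberg catalog API; the variable names are assumptions and Impala's catalog would do the equivalent internally during CREATE TABLE.
{noformat}
import java.util.Collections;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;

public class HiveEnabledProperty {
  // Create the table with engine.hive.enabled=true from the start so Hive
  // keeps the Iceberg storage handler / SerDe intact when it writes.
  static Table createTable(Catalog catalog, TableIdentifier id, Schema schema) {
    return catalog.createTable(id, schema, PartitionSpec.unpartitioned(),
        Collections.singletonMap("engine.hive.enabled", "true"));
  }

  // Or set it on an existing table.
  static void enableForHive(Table table) {
    table.updateProperties().set("engine.hive.enabled", "true").commit();
  }
}
{noformat}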






[jira] [Resolved] (IMPALA-7556) Clean up ScanRange

2021-06-10 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-7556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-7556.
---
Fix Version/s: Impala 4.1
   Resolution: Fixed

> Clean up ScanRange
> --
>
> Key: IMPALA-7556
> URL: https://issues.apache.org/jira/browse/IMPALA-7556
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Assignee: Amogh Margoor
>Priority: Major
>  Labels: ramp-up
> Fix For: Impala 4.1
>
>
> For IMPALA-7543 I want to add some additional functionality to scan ranges.
> However, the code of the ScanRange class is already quite messy. It handles 
> different types of files, does some buffer management, updates all kinds of 
> counters.
> So, instead of complicating the code further, let's refactor the ScanRange 
> class a bit.
>  * Do the file operations in separate classes
>  ** A new, abstract class could be invented to provide an API for file 
> operations, i.e. Open(), ReadFromPos(), Close(), etc.
>  *** Keep in mind that the interface must be a good fit for IMPALA-7543, i.e. 
> we need positional reads from files
>  ** Operations for local files and HDFS files could be implemented in child 
> classes
>  * Buffer management
>  ** A new BufferStore class could be created
>  ** This new class would be responsible for managing the unused buffers
>  *** if possible, it would also handle the client and cached buffers as well
>  * Counters and metrics would be updated by the corresponding new classes
>  ** E.g. ImpaladMetrics::IO_MGR_NUM_OPEN_FILES would be updated by the file 
> handling classes






[jira] [Created] (IMPALA-10744) Send INSERT events even when Impala's event processing is not enabled

2021-06-10 Thread Jira
Zoltán Borók-Nagy created IMPALA-10744:
--

 Summary: Send INSERT events even when Impala's event processing is 
not enabled
 Key: IMPALA-10744
 URL: https://issues.apache.org/jira/browse/IMPALA-10744
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Reporter: Zoltán Borók-Nagy


Generating insert events should not be conditional on whether the events 
processor is active.

Please note that this will also require fixing a bug in the createInsertEvents() 
code, as an INSERT with an empty result set raises an IllegalStateException:

create table ctas_empty  as select * from functional.alltypes limit 0;







[jira] [Updated] (IMPALA-10744) Send INSERT events even when Impala's event processing is not enabled

2021-06-10 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10744:
---
Description: 
Generating insert events should not be conditional on whether the events 
processor is active.

Related code:
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L5020-L5023

Please note that this will also require fixing a bug in the createInsertEvents() 
code, as an INSERT with an empty result set raises an IllegalStateException:

create table ctas_empty  as select * from functional.alltypes limit 0;


  was:
Generating insert events should not be conditional on whether the events 
processor is active.

Please note that this will also require fixing a bug in the createInsertEvents() 
code, as an INSERT with an empty result set raises an IllegalStateException:

create table ctas_empty  as select * from functional.alltypes limit 0;



> Send INSERT events even when Impala's event processing is not enabled
> 
>
> Key: IMPALA-10744
> URL: https://issues.apache.org/jira/browse/IMPALA-10744
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>
> Generating insert events should not be conditional on whether the events 
> processor is active.
> Related code:
> https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L5020-L5023
> Please note that this will also require fixing a bug in the 
> createInsertEvents() code, as an INSERT with an empty result set raises an 
> IllegalStateException:
> create table ctas_empty  as select * from functional.alltypes limit 0;






[jira] [Commented] (IMPALA-10754) test_overlap_min_max_filters_on_sorted_columns failed during GVO

2021-06-21 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17366568#comment-17366568
 ] 

Zoltán Borók-Nagy commented on IMPALA-10754:


Also seen in
 * https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4338/testReport/
 * https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4339/testReport/
 * https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4344/testReport/
 * 
[https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4347/testReport/]
 * 
[https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4350/testReport/]

[~sql_forever] could you please take a look?

> test_overlap_min_max_filters_on_sorted_columns failed during GVO
> 
>
> Key: IMPALA-10754
> URL: https://issues.apache.org/jira/browse/IMPALA-10754
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Assignee: Qifan Chen
>Priority: Major
>  Labels: broken-build
>
> test_overlap_min_max_filters_on_sorted_columns failed in the following build:
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4338/testReport/
> *Stack trace:*
> {noformat}
> query_test/test_runtime_filters.py:296: in 
> test_overlap_min_max_filters_on_sorted_columns
> test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)})
> common/impala_test_suite.py:734: in run_test_case
> update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:653: in verify_runtime_profile
> % (function, field, expected_value, actual_value, op, actual))
> E   AssertionError: Aggregation of SUM over NumRuntimeFilteredPages did not 
> match expected results.
> E   EXPECTED VALUE:
> E   58
> E   
> E   
> E   ACTUAL VALUE:
> E   59
> {noformat}






[jira] [Assigned] (IMPALA-10433) Use Iceberg's fixed partition transforms

2021-06-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-10433:
--

Assignee: Zoltán Borók-Nagy

> Use Iceberg's fixed partition transforms
> 
>
> Key: IMPALA-10433
> URL: https://issues.apache.org/jira/browse/IMPALA-10433
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> Currently the Iceberg time and date partition transforms are wrong if the 
> data is before the epoch.
> There's already an Iceberg pull request about it: 
> [https://github.com/apache/iceberg/pull/1981]
> Because of this bug Impala doesn't prune Iceberg partitions if the predicate 
> refers to timestamps before the epoch.
> Once the above pull request is merged we need to update our Iceberg 
> dependency.






[jira] [Resolved] (IMPALA-10433) Use Iceberg's fixed partition transforms

2021-06-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-10433.

Fix Version/s: Impala 4.1
   Resolution: Fixed

> Use Iceberg's fixed partition transforms
> 
>
> Key: IMPALA-10433
> URL: https://issues.apache.org/jira/browse/IMPALA-10433
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.1
>
>
> Currently the Iceberg time and date partition transforms are wrong if the 
> data is before the epoch.
> There's already an Iceberg pull request about it: 
> [https://github.com/apache/iceberg/pull/1981]
> Because of this bug Impala doesn't prune Iceberg partitions if the predicate 
> refers to timestamps before the epoch.
> Once the above pull request is merged we need to update our Iceberg 
> dependency.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-10754) test_overlap_min_max_filters_on_sorted_columns failed during GVO

2021-06-22 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367192#comment-17367192
 ] 

Zoltán Borók-Nagy edited comment on IMPALA-10754 at 6/22/21, 10:14 AM:
---

Seen again in
 * https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4352/testReport/
 * 
[https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4354/testReport/]
 * 
[https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4355/testReport/]


was (Author: boroknagyz):
Seen again in
 * 
[https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4354/testReport/]
 * https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4355/testReport/

> test_overlap_min_max_filters_on_sorted_columns failed during GVO
> 
>
> Key: IMPALA-10754
> URL: https://issues.apache.org/jira/browse/IMPALA-10754
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Assignee: Qifan Chen
>Priority: Major
>  Labels: broken-build
>
> test_overlap_min_max_filters_on_sorted_columns failed in the following build:
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4338/testReport/
> *Stack trace:*
> {noformat}
> query_test/test_runtime_filters.py:296: in 
> test_overlap_min_max_filters_on_sorted_columns
> test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)})
> common/impala_test_suite.py:734: in run_test_case
> update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:653: in verify_runtime_profile
> % (function, field, expected_value, actual_value, op, actual))
> E   AssertionError: Aggregation of SUM over NumRuntimeFilteredPages did not 
> match expected results.
> E   EXPECTED VALUE:
> E   58
> E   
> E   
> E   ACTUAL VALUE:
> E   59
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10754) test_overlap_min_max_filters_on_sorted_columns failed during GVO

2021-06-22 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367192#comment-17367192
 ] 

Zoltán Borók-Nagy commented on IMPALA-10754:


Seen again in
 * 
[https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4354/testReport/]
 * https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4355/testReport/

> test_overlap_min_max_filters_on_sorted_columns failed during GVO
> 
>
> Key: IMPALA-10754
> URL: https://issues.apache.org/jira/browse/IMPALA-10754
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Assignee: Qifan Chen
>Priority: Major
>  Labels: broken-build
>
> test_overlap_min_max_filters_on_sorted_columns failed in the following build:
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4338/testReport/
> *Stack trace:*
> {noformat}
> query_test/test_runtime_filters.py:296: in 
> test_overlap_min_max_filters_on_sorted_columns
> test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)})
> common/impala_test_suite.py:734: in run_test_case
> update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:653: in verify_runtime_profile
> % (function, field, expected_value, actual_value, op, actual))
> E   AssertionError: Aggregation of SUM over NumRuntimeFilteredPages did not 
> match expected results.
> E   EXPECTED VALUE:
> E   58
> E   
> E   
> E   ACTUAL VALUE:
> E   59
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-10754) test_overlap_min_max_filters_on_sorted_columns failed during GVO

2021-06-22 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17366568#comment-17366568
 ] 

Zoltán Borók-Nagy edited comment on IMPALA-10754 at 6/22/21, 10:39 AM:
---

Also seen in
 * 
[https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4338/testReport/]
 * 
[https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4339/testReport/]
 * 
[https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4341/testReport/]
 * https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4342/testReport/
 * 
[https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4344/testReport/]
 * 
[https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4347/testReport/]
 * 
[https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4350/testReport/]

[~sql_forever] could you please take a look?


was (Author: boroknagyz):
Also seen in
 * https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4338/testReport/
 * https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4339/testReport/
 * https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4344/testReport/
 * 
[https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4347/testReport/]
 * 
[https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4350/testReport/]

[~sql_forever] could you please take a look?

> test_overlap_min_max_filters_on_sorted_columns failed during GVO
> 
>
> Key: IMPALA-10754
> URL: https://issues.apache.org/jira/browse/IMPALA-10754
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Assignee: Qifan Chen
>Priority: Major
>  Labels: broken-build
>
> test_overlap_min_max_filters_on_sorted_columns failed in the following build:
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4338/testReport/
> *Stack trace:*
> {noformat}
> query_test/test_runtime_filters.py:296: in 
> test_overlap_min_max_filters_on_sorted_columns
> test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)})
> common/impala_test_suite.py:734: in run_test_case
> update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:653: in verify_runtime_profile
> % (function, field, expected_value, actual_value, op, actual))
> E   AssertionError: Aggregation of SUM over NumRuntimeFilteredPages did not 
> match expected results.
> E   EXPECTED VALUE:
> E   58
> E   
> E   
> E   ACTUAL VALUE:
> E   59
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10757) ACID table locking for DML statements is faulty

2021-06-21 Thread Jira
Zoltán Borók-Nagy created IMPALA-10757:
--

 Summary: ACID table locking for DML statements is faulty
 Key: IMPALA-10757
 URL: https://issues.apache.org/jira/browse/IMPALA-10757
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Reporter: Zoltán Borók-Nagy


Plain SELECT queries don't take ACID locks. They use the latest snapshot of the 
table that is loaded by CatalogD.

However, DML statements lock all the tables they reference, not just the target 
table.

E.g.:
{noformat}
INSERT INTO target_table SELECT * FROM source_table;
{noformat}
acquires locks for both target_table and source_table. However, after acquiring 
the locks Impala doesn't reload the tables.

Therefore the following situation is possible:
{noformat}
INSERT OVERWRITE foo SELECT ...; (takes an exclusive lock for foo)
{noformat}
while the following statement also tries to take a SHARED_LOCK for foo:
{noformat}
INSERT INTO bar SELECT * FROM foo;
{noformat}
This means the INSERT INTO statement might wait for the completion of the INSERT 
OVERWRITE statement, but since it doesn't reload foo it will still use the old 
snapshot of foo, so there is no benefit in waiting for the lock.

Possible solutions:
 # Re-load tables after the lock is acquired
 # Only take a lock for the target table. This would be better than the current 
behavior, and it would also be consistent with plain SELECT queries.

I think reloading should be favored as Impala should run every statement (that 
involves ACID tables) in a transaction and take proper locks, see IMPALA-8788.
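
To make the race concrete, here is a minimal sketch of the interleaving described 
above (foo and bar as in the example; the comments describe the assumed behavior):

{noformat}
-- Session 1: acquires an EXCLUSIVE lock on foo and rewrites it.
INSERT OVERWRITE foo SELECT ...;

-- Session 2: waits for a SHARED lock on foo, but once the lock is granted it
-- still scans the pre-overwrite snapshot of foo loaded by CatalogD, because
-- foo is not reloaded after lock acquisition, so the wait gains nothing.
INSERT INTO bar SELECT * FROM foo;
{noformat}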



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-7087) Impala is unable to read Parquet decimal columns with lower precision/scale than table metadata

2021-06-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-7087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-7087:
-

Assignee: Zoltán Borók-Nagy

> Impala is unable to read Parquet decimal columns with lower precision/scale 
> than table metadata
> ---
>
> Key: IMPALA-7087
> URL: https://issues.apache.org/jira/browse/IMPALA-7087
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: decimal, parquet, ramp-up
> Attachments: binary_decimal_precision_and_scale_widening.parquet
>
>
> This is similar to IMPALA-2515, except relates to a different precision/scale 
> in the file metadata rather than just a mismatch in the bytes used to store 
> the data. In a lot of cases we should be able to convert the decimal type on 
> the fly to the higher-precision type.
> {noformat}
> ERROR: File '/hdfs/path/00_0_x_2' column 'alterd_decimal' has an invalid 
> type length. Expecting: 11 len in file: 8
> {noformat}
> It would be convenient to allow reading parquet files where the 
> precision/scale in the file can be converted to the precision/scale in the 
> table metadata without loss of precision.
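
A hedged illustration of the desired behavior; the table name, column name, and 
precisions below are made up for this example (precision 18 and 26 happen to map to 
8-byte and 11-byte Parquet storage, matching the error message above):

{noformat}
-- Data was written while the column was DECIMAL(18,2) (8 bytes per value);
-- the table metadata is then changed to DECIMAL(26,2) (11 bytes per value).
ALTER TABLE t CHANGE c c DECIMAL(26,2);

-- Today the scan fails with "invalid type length. Expecting: 11 len in file: 8".
-- With this change the 8-byte values should be widened to DECIMAL(26,2) on the fly.
SELECT c FROM t;
{noformat}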



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-8131) Impala is unable to read Parquet decimal columns with higher scale than table metadata

2021-06-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-8131:
-

Assignee: Zoltán Borók-Nagy

> Impala is unable to read Parquet decimal columns with higher scale than table 
> metadata
> --
>
> Key: IMPALA-8131
> URL: https://issues.apache.org/jira/browse/IMPALA-8131
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: decimal, parquet
>
> Similar to IMPALA-7087, except we should allow Impala to read Parquet data 
> stored with a higher scale into a table with lower scale. The SQL Standard 
> allows for this behavior, and several other databases do this as well.
> More information on this can be found in 
> [this|https://issues.apache.org/jira/browse/IMPALA-7087?focusedCommentId=16688645=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16688645]
>  comment of IMPALA-7087
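
A short hedged example of the intended conversion (names and precisions are 
hypothetical):

{noformat}
-- File schema:  d DECIMAL(10,4), e.g. the stored value 123.4567
-- Table schema: d DECIMAL(10,2)
-- Desired: the scan converts the value (to 123.46 or 123.45, depending on the
-- rounding rule chosen) instead of rejecting the file.
SELECT d FROM t;
{noformat}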



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10739) Add support for ALTER TABLE tbl SET PARTITION SPEC for Iceberg tables

2021-06-09 Thread Jira
Zoltán Borók-Nagy created IMPALA-10739:
--

 Summary: Add support for ALTER TABLE tbl SET PARTITION SPEC for 
Iceberg tables
 Key: IMPALA-10739
 URL: https://issues.apache.org/jira/browse/IMPALA-10739
 Project: IMPALA
  Issue Type: New Feature
Reporter: Zoltán Borók-Nagy


Impala should support partition evolution for Iceberg tables, i.e. it should be 
able to set a new partition spec for an Iceberg table via DDL.

The command should be

{noformat}
ALTER TABLE  SET PARTITION SPEC()
{noformat}

to be aligned with Hive.
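
A hedged usage sketch, assuming the partition-transform syntax follows the existing 
CREATE TABLE ... PARTITION BY SPEC examples (the table name and transforms are 
hypothetical):

{noformat}
CREATE TABLE ice_tbl (i INT, p INT)
PARTITION BY SPEC (p BUCKET 5)
STORED AS ICEBERG;

-- Partition evolution: new data files are written with the new spec,
-- existing data files keep the spec they were written with.
ALTER TABLE ice_tbl SET PARTITION SPEC (i BUCKET 3);
{noformat}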



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-10187) Event processing fails on multiple events + DROP TABLE

2021-06-14 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-10187.

Fix Version/s: Impala 4.1
   Resolution: Fixed

> Event processing fails on multiple events + DROP TABLE
> --
>
> Key: IMPALA-10187
> URL: https://issues.apache.org/jira/browse/IMPALA-10187
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Zoltán Borók-Nagy
>Assignee: Vihang Karajgaonkar
>Priority: Major
> Fix For: Impala 4.1
>
>
> I've seen the following during interop testing:
> Some DDL statements (ALTER TABLE + DROP) were executed via Hive on the same 
> table.
> Then CatalogD's event processor tried to process the new events:
> {noformat}
> I0922 14:32:56.590229 13611 HdfsTable.java:709] Loaded file and block 
> metadata for 
> default.insertonly_hiveclient_impalaclient_partitioned_8ff3a1ef_b8a8_4c7a_b1c7_0b8f4c42c61e
>  partitions: category=cat1, category=cat2, category=cat3, and 1 others. Time 
> taken: 55.145ms
> I0922 14:32:56.591078 13611 TableLoader.java:103] Loaded metadata for: 
> default.insertonly_hiveclient_impalaclient_partitioned_8ff3a1ef_b8a8_4c7a_b1c7_0b8f4c42c61e
>  (303ms)
> I0922 14:32:58.022068 10065 MetastoreEventsProcessor.java:482] Received 41 
> events. Start event id : 39948
> I0922 14:32:58.022266 10065 MetastoreEvents.java:380] EventId: 39949 
> EventType: ALTER_PARTITION Creating event 39949 of type ALTER_PARTITION on 
> table 
> default.insertonly_hiveclient_impalaclient_partitioned_8ff3a1ef_b8a8_4c7a_b1c7_0b8f4c42c61e
> ...
> I0922 14:32:58.024389 10065 MetastoreEvents.java:380] EventId: 39962 
> EventType: DROP_TABLE Creating event 39962 of type DROP_TABLE on table 
> default.insertonly_hiveclient_impalaclient_partitioned_8ff3a1ef_b8a8_4c7a_b1c7_0b8f4c42c61e
> {noformat}
>  
> Impala tried to refresh the table on the first ALTER TABLE event, but since 
> it has already been dropped, we get a TableLoadingException (caused by 
> NoSuchObjectException from HMS):
>  
> {noformat}
> I0922 14:32:58.028852 10065 MetastoreEvents.java:234] Total number of events 
> received: 41 Total number of events filtered out: 0
> I0922 14:32:58.028962 10065 CatalogServiceCatalog.java:862] Not a self-event 
> since the given version is -1 and service id is
> I0922 14:32:58.029369 10065 CatalogServiceCatalog.java:2142] Refreshing table 
> metadata: 
> default.insertonly_hiveclient_impalaclient_partitioned_8ff3a1ef_b8a8_4c7a_b1c7_0b8f4c42c61e
> E0922 14:32:58.038627 10065 MetastoreEventsProcessor.java:527] Unexpected 
> exception received while processing event
> Java exception follows:
> org.apache.impala.catalog.events.MetastoreNotificationException: Unable to 
> process event 39949 of type ALTER_PARTITION. Event processing will be stopped.
> at 
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:620)
> at 
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:513)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.impala.catalog.TableLoadingException: Error loading 
> metadata for table: 
> default.insertonly_hiveclient_impalaclient_partitioned_8ff3a1ef_b8a8_4c7a_b1c7_0b8f4c42c61e
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.reloadTable(CatalogServiceCatalog.java:2160)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.reloadTableIfExists(CatalogServiceCatalog.java:2365)
> at 
> org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.reloadTableFromCatalog(MetastoreEvents.java:563)
> at 
> org.apache.impala.catalog.events.MetastoreEvents$AlterPartitionEvent.process(MetastoreEvents.java:1454)
> at 
> org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:314)
> at 
> org.apach

[jira] [Assigned] (IMPALA-10754) test_overlap_min_max_filters_on_sorted_columns failed during GVO

2021-06-17 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-10754:
--

Assignee: Qifan Chen

> test_overlap_min_max_filters_on_sorted_columns failed during GVO
> 
>
> Key: IMPALA-10754
> URL: https://issues.apache.org/jira/browse/IMPALA-10754
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Assignee: Qifan Chen
>Priority: Major
>  Labels: broken-build
>
> test_overlap_min_max_filters_on_sorted_columns failed in the following build:
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4338/testReport/
> *Stack trace:*
> {noformat}
> query_test/test_runtime_filters.py:296: in 
> test_overlap_min_max_filters_on_sorted_columns
> test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)})
> common/impala_test_suite.py:734: in run_test_case
> update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:653: in verify_runtime_profile
> % (function, field, expected_value, actual_value, op, actual))
> E   AssertionError: Aggregation of SUM over NumRuntimeFilteredPages did not 
> match expected results.
> E   EXPECTED VALUE:
> E   58
> E   
> E   
> E   ACTUAL VALUE:
> E   59
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10754) test_overlap_min_max_filters_on_sorted_columns failed during GVO

2021-06-17 Thread Jira
Zoltán Borók-Nagy created IMPALA-10754:
--

 Summary: test_overlap_min_max_filters_on_sorted_columns failed 
during GVO
 Key: IMPALA-10754
 URL: https://issues.apache.org/jira/browse/IMPALA-10754
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Zoltán Borók-Nagy


test_overlap_min_max_filters_on_sorted_columns failed in the following build:
https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4338/testReport/

*Stack trace:*

{noformat}
query_test/test_runtime_filters.py:296: in 
test_overlap_min_max_filters_on_sorted_columns
test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)})
common/impala_test_suite.py:734: in run_test_case
update_section=pytest.config.option.update_results)
common/test_result_verifier.py:653: in verify_runtime_profile
% (function, field, expected_value, actual_value, op, actual))
E   AssertionError: Aggregation of SUM over NumRuntimeFilteredPages did not 
match expected results.
E   EXPECTED VALUE:
E   58
E   
E   
E   ACTUAL VALUE:
E   59
{noformat}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-10735) INSERT INTO Iceberg table fails during INSERT event generation

2021-06-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10735:
---
Labels: impala-iceberg  (was: )

> INSERT INTO Iceberg table fails during INSERT event generation
> --
>
> Key: IMPALA-10735
> URL: https://issues.apache.org/jira/browse/IMPALA-10735
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> INSERT INTO Iceberg table is broken when we use INSERT events.
> We get a NullPointerException for partitioned tables here:
> [https://github.com/apache/impala/blob/0c89a9cf0f2b642ca214e4fa68eeea9bc32ef3af/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4953]
> Repro:
>  
> {noformat}
> create table test_ice (i int, p int) partition by spec (p bucket 5) stored as 
> iceberg;
> insert into test_ice values (1, 2);
> {noformat}
>  
> Since Hive Replication doesn't work for Iceberg tables yet it's probably 
> better to disable INSERT events for Iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-10735) INSERT INTO Iceberg table fails during INSERT event generation

2021-06-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10735:
---
Component/s: Catalog

> INSERT INTO Iceberg table fails during INSERT event generation
> --
>
> Key: IMPALA-10735
> URL: https://issues.apache.org/jira/browse/IMPALA-10735
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>
> INSERT INTO Iceberg table is broken when we use INSERT events.
> We get a NullPointerException for partitioned tables here:
> [https://github.com/apache/impala/blob/0c89a9cf0f2b642ca214e4fa68eeea9bc32ef3af/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4953]
> Repro:
>  
> {noformat}
> create table test_ice (i int, p int) partition by spec (p bucket 5) stored as 
> iceberg;
> insert into test_ice values (1, 2);
> {noformat}
>  
> Since Hive Replication doesn't work for Iceberg tables yet it's probably 
> better to disable INSERT events for Iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-10735) INSERT INTO Iceberg table fails during INSERT event generation

2021-06-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10735:
---
Description: 
INSERT INTO Iceberg table is broken when we use INSERT events.

We get a NullPointerException for partitioned tables here:

[https://github.com/apache/impala/blob/0c89a9cf0f2b642ca214e4fa68eeea9bc32ef3af/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4953]

Repro:
{noformat}
create table test_ice (i int, p int) partition by spec (p bucket 5) stored as 
iceberg;
insert into test_ice values (1, 2);
{noformat}
Since Hive Replication doesn't work for Iceberg tables yet it's probably better 
to disable INSERT events for Iceberg tables.

  was:
INSERT INTO Iceberg table is broken when we use INSERT events.

We get a NullPointerException for partitioned tables here:

[https://github.com/apache/impala/blob/0c89a9cf0f2b642ca214e4fa68eeea9bc32ef3af/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4953]

Repro:

 
{noformat}
create table test_ice (i int, p int) partition by spec (p bucket 5) stored as 
iceberg;
insert into test_ice values (1, 2);
{noformat}
 

Since Hive Replication doesn't work for Iceberg tables yet it's probably better 
to disable INSERT events for Iceberg tables.


> INSERT INTO Iceberg table fails during INSERT event generation
> --
>
> Key: IMPALA-10735
> URL: https://issues.apache.org/jira/browse/IMPALA-10735
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> INSERT INTO Iceberg table is broken when we use INSERT events.
> We get a NullPointerException for partitioned tables here:
> [https://github.com/apache/impala/blob/0c89a9cf0f2b642ca214e4fa68eeea9bc32ef3af/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4953]
> Repro:
> {noformat}
> create table test_ice (i int, p int) partition by spec (p bucket 5) stored as 
> iceberg;
> insert into test_ice values (1, 2);
> {noformat}
> Since Hive Replication doesn't work for Iceberg tables yet it's probably 
> better to disable INSERT events for Iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-10735) INSERT INTO Iceberg table fails during INSERT event generation

2021-06-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-10735:
--

Assignee: Zoltán Borók-Nagy

> INSERT INTO Iceberg table fails during INSERT event generation
> --
>
> Key: IMPALA-10735
> URL: https://issues.apache.org/jira/browse/IMPALA-10735
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> INSERT INTO Iceberg table is broken when we use INSERT events.
> We get a NullPointerException for partitioned tables here:
> [https://github.com/apache/impala/blob/0c89a9cf0f2b642ca214e4fa68eeea9bc32ef3af/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4953]
> Repro:
> {noformat}
> create table test_ice (i int, p int) partition by spec (p bucket 5) stored as 
> iceberg;
> insert into test_ice values (1, 2);
> {noformat}
> Since Hive Replication doesn't work for Iceberg tables yet it's probably 
> better to disable INSERT events for Iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10735) INSERT INTO Iceberg table fails during INSERT event generation

2021-06-08 Thread Jira
Zoltán Borók-Nagy created IMPALA-10735:
--

 Summary: INSERT INTO Iceberg table fails during INSERT event 
generation
 Key: IMPALA-10735
 URL: https://issues.apache.org/jira/browse/IMPALA-10735
 Project: IMPALA
  Issue Type: Bug
Reporter: Zoltán Borók-Nagy


INSERT INTO Iceberg table is broken when we use INSERT events.

We get a NullPointerException for partitioned tables here:

[https://github.com/apache/impala/blob/0c89a9cf0f2b642ca214e4fa68eeea9bc32ef3af/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4953]

Repro:

 
{noformat}
create table test_ice (i int, p int) partition by spec (p bucket 5) stored as 
iceberg;
insert into test_ice values (1, 2);
{noformat}
 

Since Hive Replication doesn't work for Iceberg tables yet it's probably better 
to disable INSERT events for Iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10736) Add support for Hive Replication for Iceberg tables

2021-06-08 Thread Jira
Zoltán Borók-Nagy created IMPALA-10736:
--

 Summary: Add support for Hive Replication for Iceberg tables
 Key: IMPALA-10736
 URL: https://issues.apache.org/jira/browse/IMPALA-10736
 Project: IMPALA
  Issue Type: Bug
Reporter: Zoltán Borók-Nagy


Hive Replication currently doesn't support Iceberg tables.

Once it does, we'll need to add support for it as well.

Currently Iceberg stores absolute paths in its metadata files, so we'll 
probably need to wait for this issue to be resolved as well: 
https://github.com/apache/iceberg/issues/1617



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10707) TestIceberg::test_iceberg_query failed with INNER EXCEPTION

2021-05-18 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346809#comment-17346809
 ] 

Zoltán Borók-Nagy commented on IMPALA-10707:


I've looked at the logs, and the ERROR log files of catalogd and impalad are 
full of errors:
{noformat}
 1.1G catalogd.ERROR
 202M impalad_node1.ERROR
{noformat}
They are full of the following exceptions:
{noformat}
E0517 06:13:33.866099 11134 TransactionKeepalive.java:137] Unexpected exception 
thrown
Java exception follows:
java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError:
at 
org.apache.impala.common.TransactionKeepalive$DaemonThread.run(TransactionKeepalive.java:114)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError:
... 2 more
E0517 06:13:33.866303  9973 TransactionKeepalive.java:137] Unexpected exception 
thrown
Java exception follows:
java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError:
at 
org.apache.impala.common.TransactionKeepalive$DaemonThread.run(TransactionKeepalive.java:114)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError:
... 2 more
...
{noformat}

> TestIceberg::test_iceberg_query failed with INNER EXCEPTION
> ---
>
> Key: IMPALA-10707
> URL: https://issues.apache.org/jira/browse/IMPALA-10707
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Aman Sinha
>Priority: Major
>
> Few tests related to Iceberg failed in a recent run of  
> impala-asf-master-core-s3-data-cache:
> {noformat}
>  TestIceberg.test_iceberg_query[protocol: beeswax | exec_option: 
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': 
> '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0', 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EINNER EXCEPTION: 
> EMESSAGE: Timeout >7200s
> {noformat}
> fyi [~boroknagyz]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10707) TestIceberg::test_iceberg_query failed with INNER EXCEPTION

2021-05-18 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346901#comment-17346901
 ] 

Zoltán Borók-Nagy commented on IMPALA-10707:


The above error messages are symptoms of IMPALA-9900 / IMPALA-10057.

> TestIceberg::test_iceberg_query failed with INNER EXCEPTION
> ---
>
> Key: IMPALA-10707
> URL: https://issues.apache.org/jira/browse/IMPALA-10707
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Aman Sinha
>Priority: Major
>
> Few tests related to Iceberg failed in a recent run of  
> impala-asf-master-core-s3-data-cache:
> {noformat}
>  TestIceberg.test_iceberg_query[protocol: beeswax | exec_option: 
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': 
> '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0', 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EINNER EXCEPTION: 
> EMESSAGE: Timeout >7200s
> {noformat}
> fyi [~boroknagyz]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-10707) TestIceberg::test_iceberg_query failed with INNER EXCEPTION

2021-05-18 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-10707.

Resolution: Duplicate

Closing this because the Impala cluster was in bad health. The root cause was 
probably IMPALA-9900 / IMPALA-10057.

> TestIceberg::test_iceberg_query failed with INNER EXCEPTION
> ---
>
> Key: IMPALA-10707
> URL: https://issues.apache.org/jira/browse/IMPALA-10707
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Aman Sinha
>Priority: Major
>
> Few tests related to Iceberg failed in a recent run of  
> impala-asf-master-core-s3-data-cache:
> {noformat}
>  TestIceberg.test_iceberg_query[protocol: beeswax | exec_option: 
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': 
> '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0', 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EINNER EXCEPTION: 
> EMESSAGE: Timeout >7200s
> {noformat}
> fyi [~boroknagyz]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-10626) Add support for Iceberg's Catalogs API

2021-05-17 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10626:
---
Description: 
Some engines (e.g. Spark and Hive) use different table properties for defining 
catalog properties.

In the main hive configuration they store the following properties:
 * {{iceberg.catalog..type = hadoop}}
 * {{iceberg.catalog..warehouse = somelocation}}

On the table level they have the following properties:
 * {{iceberg.catalog = }}
 * {{name = }}

To load tables that use these kinds of configurations we should use Iceberg's 
Catalogs class:

[https://github.com/apache/iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/Catalogs.java]

Probably we'll also want to use these properties by default in the future.

  was:
Some engines (e.g. Spark and Hive) use different table properties for defining 
catalog properties.

In the main hive configuration they store the following properties:
 * {{iceberg.catalog..type = hadoop}}
 * {{iceberg.catalog..warehouse = somelocation}}

On the table level they have the following properties:
 * {{iceberg.mr.table.catalog = }}
 * {{iceberg.mr.table.identifier = }}

To load tables that use these kinds of configurations we should use Iceberg's 
Catalogs class:

[https://github.com/apache/iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/Catalogs.java]

Probably we'll also want to use these properties by default in the future.


> Add support for Iceberg's Catalogs API
> --
>
> Key: IMPALA-10626
> URL: https://issues.apache.org/jira/browse/IMPALA-10626
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> Some engines (e.g. Spark and Hive) use different table properties for 
> defining catalog properties.
> In the main hive configuration they store the following properties:
>  * {{iceberg.catalog..type = hadoop}}
>  * {{iceberg.catalog..warehouse = somelocation}}
> On the table level they have the following properties:
>  * {{iceberg.catalog = }}
>  * {{name = }}
> To load tables that use these kinds of configurations we should use Iceberg's 
> Catalogs class:
> [https://github.com/apache/iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/Catalogs.java]
> Probably we'll also want to use these properties by default in the future.
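
A hedged sketch of such a configuration; the catalog name hadoop_cat, the warehouse 
path, and the table name are hypothetical, and the property keys are the ones listed 
above with the catalog name filled in:

{noformat}
-- In the engine's main configuration (e.g. hive-site.xml):
--   iceberg.catalog.hadoop_cat.type = hadoop
--   iceberg.catalog.hadoop_cat.warehouse = /warehouse/iceberg
--
-- Table-level properties pointing at that catalog:
CREATE TABLE ice_tbl (i INT)
STORED AS ICEBERG
TBLPROPERTIES ('iceberg.catalog' = 'hadoop_cat',
               'name' = 'db.ice_tbl');
{noformat}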



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-10626) Add support for Iceberg's Catalogs API

2021-05-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-10626:
--

Assignee: Zoltán Borók-Nagy

> Add support for Iceberg's Catalogs API
> --
>
> Key: IMPALA-10626
> URL: https://issues.apache.org/jira/browse/IMPALA-10626
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> Some engines (e.g. Spark and Hive) use different table properties for 
> defining catalog properties.
> In the main hive configuration they store the following properties:
>  * {{iceberg.catalog..type = hadoop}}
>  * {{iceberg.catalog..warehouse = somelocation}}
> On the table level they have the following properties:
>  * {{iceberg.mr.table.catalog = }}
>  * {{iceberg.mr.table.identifier = }}
> To load tables that use these kinds of configurations we should use Iceberg's 
> Catalogs class:
> [https://github.com/apache/iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/Catalogs.java]
> Probably we'll also want to use these properties by default in the future.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-5569) Implement UNSET TBLPROPERTIES for ALTER TABLE

2021-05-20 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-5569:
-

Assignee: Amogh Margoor

> Implement UNSET TBLPROPERTIES for ALTER TABLE
> -
>
> Key: IMPALA-5569
> URL: https://issues.apache.org/jira/browse/IMPALA-5569
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 2.9.0
>Reporter: Nándor Kollár
>Assignee: Amogh Margoor
>Priority: Minor
>
> Right now, I can set table properties via the Impala client, but I couldn't find 
> a way to unset them. I can set them to an empty string, but I have to use the Hive 
> CLI to remove the key-value pair.
> It would be nice to extend ALTER TABLE with UNSET clause to be able to unset 
> table properties.
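
A hedged sketch of the proposed syntax (table and property names are hypothetical; 
the UNSET form mirrors Hive's ALTER TABLE ... UNSET TBLPROPERTIES):

{noformat}
ALTER TABLE t SET TBLPROPERTIES ('retention.days' = '30');

-- Proposed: remove the key/value pair entirely instead of
-- only being able to set it to an empty string.
ALTER TABLE t UNSET TBLPROPERTIES ('retention.days');
{noformat}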



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10698) Remove branching based on MetastoreShim.getMajorVersion()

2021-05-10 Thread Jira
Zoltán Borók-Nagy created IMPALA-10698:
--

 Summary: Remove branching based on MetastoreShim.getMajorVersion()
 Key: IMPALA-10698
 URL: https://issues.apache.org/jira/browse/IMPALA-10698
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Reporter: Zoltán Borók-Nagy


IMPALA-9731 dropped support for Hive 2 and removed most code associated with it.
However, we still have if statements that check MetastoreShim.getMajorVersion().

One can check it with:
{noformat}
git grep getMajorVersion
{noformat}

It would be nice to get rid of these because the 
MetastoreShim.getMajorVersion() == 2 branches are dead code now.

Moreover, in the test code there are still some places that check for 
HIVE_MAJOR_VERSION == 2, e.g.:
https://github.com/apache/impala/blob/c10e7c969dfcc4847a8f7708940f4aa1852dbee4/tests/metadata/test_hms_integration.py#L215

These can be also removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10714) TestSpillingDebugActionDimensions::test_spilling_large_rows hid DCHECK

2021-05-26 Thread Jira
Zoltán Borók-Nagy created IMPALA-10714:
--

 Summary: 
TestSpillingDebugActionDimensions::test_spilling_large_rows hid DCHECK
 Key: IMPALA-10714
 URL: https://issues.apache.org/jira/browse/IMPALA-10714
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Zoltán Borók-Nagy


TestSpillingDebugActionDimensions::test_spilling_large_rows hit DCHECK in 
exhaustive build.

The Impala git hash was: e11237e29 IMPALA-10197: Add KUDU_REPLICA_SELECTION 
query option

In impalad.FATAL:

{noformat}
F0525 12:45:49.307780 13122 buffered-tuple-stream.cc:531] 
564af337ca503984:f1209fc4] Check failed: 
read_iter->read_page_->attached_to_output_batch
{noformat}

Query 564af337ca503984:f1209fc4 was:

{noformat}
I0525 12:45:48.474383 17878 impala-server.cc:1324] 
564af337ca503984:f1209fc4] Registered query 
query_id=564af337ca503984:f1209fc4 
session_id=9e4875c17adf5e7a:eb72d33dc39b5288
I0525 12:45:48.474486 17878 Frontend.java:1618] 
564af337ca503984:f1209fc4] Analyzing query: select 
group_concat(string_col), length(bigstr) from bigstrs2
group by bigstr db: test_spilling_large_rows_119f6bb1
{noformat}

I couldn't reproduce the issue locally.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10714) TestSpillingDebugActionDimensions::test_spilling_large_rows hid DCHECK

2021-05-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351683#comment-17351683
 ] 

Zoltán Borók-Nagy commented on IMPALA-10714:


[~stigahuang] could you please take a look?

> TestSpillingDebugActionDimensions::test_spilling_large_rows hid DCHECK
> --
>
> Key: IMPALA-10714
> URL: https://issues.apache.org/jira/browse/IMPALA-10714
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>  Labels: broken-build, flaky
>
> TestSpillingDebugActionDimensions::test_spilling_large_rows hit DCHECK in 
> exhaustive build.
> The Impala git hash was: e11237e29 IMPALA-10197: Add KUDU_REPLICA_SELECTION 
> query option
> In impalad.FATAL:
> {noformat}
> F0525 12:45:49.307780 13122 buffered-tuple-stream.cc:531] 
> 564af337ca503984:f1209fc4] Check failed: 
> read_iter->read_page_->attached_to_output_batch
> {noformat}
> Query 564af337ca503984:f1209fc4 was:
> {noformat}
> I0525 12:45:48.474383 17878 impala-server.cc:1324] 
> 564af337ca503984:f1209fc4] Registered query 
> query_id=564af337ca503984:f1209fc4 
> session_id=9e4875c17adf5e7a:eb72d33dc39b5288
> I0525 12:45:48.474486 17878 Frontend.java:1618] 
> 564af337ca503984:f1209fc4] Analyzing query: select 
> group_concat(string_col), length(bigstr) from bigstrs2
> group by bigstr db: test_spilling_large_rows_119f6bb1
> {noformat}
> I couldn't reproduce the issue locally.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-10714) TestSpillingDebugActionDimensions::test_spilling_large_rows hit DCHECK

2021-05-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10714:
---
Summary: TestSpillingDebugActionDimensions::test_spilling_large_rows hit 
DCHECK  (was: TestSpillingDebugActionDimensions::test_spilling_large_rows hid 
DCHECK)

> TestSpillingDebugActionDimensions::test_spilling_large_rows hit DCHECK
> --
>
> Key: IMPALA-10714
> URL: https://issues.apache.org/jira/browse/IMPALA-10714
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>  Labels: broken-build, flaky
>
> TestSpillingDebugActionDimensions::test_spilling_large_rows hit DCHECK in 
> exhaustive build.
> The Impala git hash was: e11237e29 IMPALA-10197: Add KUDU_REPLICA_SELECTION 
> query option
> In impalad.FATAL:
> {noformat}
> F0525 12:45:49.307780 13122 buffered-tuple-stream.cc:531] 
> 564af337ca503984:f1209fc4] Check failed: 
> read_iter->read_page_->attached_to_output_batch
> {noformat}
> Query 564af337ca503984:f1209fc4 was:
> {noformat}
> I0525 12:45:48.474383 17878 impala-server.cc:1324] 
> 564af337ca503984:f1209fc4] Registered query 
> query_id=564af337ca503984:f1209fc4 
> session_id=9e4875c17adf5e7a:eb72d33dc39b5288
> I0525 12:45:48.474486 17878 Frontend.java:1618] 
> 564af337ca503984:f1209fc4] Analyzing query: select 
> group_concat(string_col), length(bigstr) from bigstrs2
> group by bigstr db: test_spilling_large_rows_119f6bb1
> {noformat}
> I couldn't reproduce the issue locally.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9355) TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory limit

2021-05-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-9355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351726#comment-17351726
 ] 

Zoltán Borók-Nagy commented on IMPALA-9355:
---

I've seen this in a recent exhaustive test run:
{noformat}
query_test/test_mem_usage_scaling.py:418: in test_exchange_mem_usage_scaling
 self.run_test_case('QueryTest/exchange-mem-scaling', vector)
common/impala_test_suite.py:732: in run_test_case
 expected_str, query)
E AssertionError: Expected exception: Memory limit exceeded
E 
E when running:
E 
E set mem_limit=171520k;
E set num_scanner_threads=1;
E select *
E from tpch_parquet.lineitem l1
E join tpch_parquet.lineitem l2 on l1.l_orderkey = l2.l_orderkey and
E l1.l_partkey = l2.l_partkey and l1.l_suppkey = l2.l_suppkey
E and l1.l_linenumber = l2.l_linenumber
E order by l1.l_orderkey desc, l1.l_partkey, l1.l_suppkey, l1.l_linenumber
E limit 5{noformat}

> TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory 
> limit
> -
>
> Key: IMPALA-9355
> URL: https://issues.apache.org/jira/browse/IMPALA-9355
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Fang-Yu Rao
>Assignee: Qifan Chen
>Priority: Critical
>  Labels: broken-build, flaky
> Fix For: Impala 4.0
>
>
> The EE test {{test_exchange_mem_usage_scaling}} failed because the query at 
> [https://github.com/apache/impala/blame/master/testdata/workloads/functional-query/queries/QueryTest/exchange-mem-scaling.test#L7-L15]
>  does not hit the specified memory limit (170m) at 
> [https://github.com/apache/impala/blame/master/testdata/workloads/functional-query/queries/QueryTest/exchange-mem-scaling.test#L7].
>  We may need to further reduce the specified limit. In what follows the error 
> message is also given. Recall that the same issue occurred at 
> https://issues.apache.org/jira/browse/IMPALA-7873 but was resolved.
> {code:java}
> FAIL 
> query_test/test_mem_usage_scaling.py::TestExchangeMemUsage::()::test_exchange_mem_usage_scaling[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none]
> === FAILURES 
> ===
>  TestExchangeMemUsage.test_exchange_mem_usage_scaling[protocol: beeswax | 
> exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none] 
> [gw3] linux2 -- Python 2.7.12 
> /home/ubuntu/Impala/bin/../infra/python/env/bin/python
> query_test/test_mem_usage_scaling.py:386: in test_exchange_mem_usage_scaling
> self.run_test_case('QueryTest/exchange-mem-scaling', vector)
> common/impala_test_suite.py:674: in run_test_case
> expected_str, query)
> E   AssertionError: Expected exception: Memory limit exceeded
> E   
> E   when running:
> E   
> E   set mem_limit=170m;
> E   set num_scanner_threads=1;
> E   select *
> E   from tpch_parquet.lineitem l1
> E join tpch_parquet.lineitem l2 on l1.l_orderkey = l2.l_orderkey and
> E l1.l_partkey = l2.l_partkey and l1.l_suppkey = l2.l_suppkey
> E and l1.l_linenumber = l2.l_linenumber
> E   order by l1.l_orderkey desc, l1.l_partkey, l1.l_suppkey, l1.l_linenumber
> E   limit 5
> {code}
> [~tarmstr...@cloudera.com] and [~joemcdonnell] reviewed the patch at 
> [https://gerrit.cloudera.org/c/11965/]. Assign this JIRA to [~joemcdonnell] 
> for now. Please re-assign the JIRA to others as appropriate. Thanks!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9833) query_test.test_observability.TestQueryStates.test_error_query_state is flaky

2021-05-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-9833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351693#comment-17351693
 ] 

Zoltán Borók-Nagy commented on IMPALA-9833:
---

I saw this again in an exhaustive release run:

{noformat}
query_test/test_observability.py:808: in test_error_query_state
lambda: self.client.get_runtime_profile(handle))
common/impala_test_suite.py:1188: in assert_eventually
count, timeout_s, error_msg_str))
E   Timeout: Check failed to return True after 300 tries and 300 seconds error 
message: Query (id=ed4d8962122c7d00:7c61b26c):
{noformat}


> query_test.test_observability.TestQueryStates.test_error_query_state is flaky
> -
>
> Key: IMPALA-9833
> URL: https://issues.apache.org/jira/browse/IMPALA-9833
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.0
>Reporter: Xiaomeng Zhang
>Assignee: Quanlong Huang
>Priority: Major
> Attachments: consoleFull_impala-cdpd-master-exhaustive-release.txt, 
> impalad_excerpted.INFO.zip
>
>
> [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/2521/testReport/junit/query_test.test_observability/TestQueryStates/test_error_query_state/]
> It seems the test could not get query profile after retries in 30s.
> {code:java}
> Stacktracequery_test/test_observability.py:777: in test_error_query_state
> lambda: self.client.get_runtime_profile(handle))
> common/impala_test_suite.py:1120: in assert_eventually
> count, timeout_s, error_msg_str))
> E   Timeout: Check failed to return True after 30 tries and 30 seconds error 
> message: Query (id=fe45e8bfd138acd3:c67a3796)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-9833) query_test.test_observability.TestQueryStates.test_error_query_state is flaky

2021-05-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-9833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-9833:
--
Labels: broken-build  (was: )

> query_test.test_observability.TestQueryStates.test_error_query_state is flaky
> -
>
> Key: IMPALA-9833
> URL: https://issues.apache.org/jira/browse/IMPALA-9833
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.0
>Reporter: Xiaomeng Zhang
>Assignee: Quanlong Huang
>Priority: Major
>  Labels: broken-build
> Attachments: consoleFull_impala-cdpd-master-exhaustive-release.txt, 
> impalad_excerpted.INFO.zip
>
>
> [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/2521/testReport/junit/query_test.test_observability/TestQueryStates/test_error_query_state/]
> It seems the test could not get query profile after retries in 30s.
> {code:java}
> Stacktracequery_test/test_observability.py:777: in test_error_query_state
> lambda: self.client.get_runtime_profile(handle))
> common/impala_test_suite.py:1120: in assert_eventually
> count, timeout_s, error_msg_str))
> E   Timeout: Check failed to return True after 30 tries and 30 seconds error 
> message: Query (id=fe45e8bfd138acd3:c67a3796)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-10715) test_decimal_min_max_filters failed in exhaustive run

2021-05-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-10715:
--

Assignee: Qifan Chen

> test_decimal_min_max_filters failed in exhaustive run
> -
>
> Key: IMPALA-10715
> URL: https://issues.apache.org/jira/browse/IMPALA-10715
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Assignee: Qifan Chen
>Priority: Major
>  Labels: broken-build
>
> test_decimal_min_max_filters failed in exhaustive run
> *Stack Trace*
> {noformat}
> query_test/test_runtime_filters.py:223: in test_decimal_min_max_filters
> test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)})
> common/impala_test_suite.py:775: in run_test_case
> update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:653: in verify_runtime_profile
> % (function, field, expected_value, actual_value, op, actual))
> E   AssertionError: Aggregation of SUM over ProbeRows did not match expected 
> results.
> E   EXPECTED VALUE:
> E   102
> E   
> E   
> E   ACTUAL VALUE:
> E   38
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10715) test_decimal_min_max_filters failed in exhaustive run

2021-05-26 Thread Jira
Zoltán Borók-Nagy created IMPALA-10715:
--

 Summary: test_decimal_min_max_filters failed in exhaustive run
 Key: IMPALA-10715
 URL: https://issues.apache.org/jira/browse/IMPALA-10715
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Zoltán Borók-Nagy


test_decimal_min_max_filters failed in exhaustive run
*Stack Trace*
{noformat}
query_test/test_runtime_filters.py:223: in test_decimal_min_max_filters
test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)})
common/impala_test_suite.py:775: in run_test_case
update_section=pytest.config.option.update_results)
common/test_result_verifier.py:653: in verify_runtime_profile
% (function, field, expected_value, actual_value, op, actual))
E   AssertionError: Aggregation of SUM over ProbeRows did not match expected 
results.
E   EXPECTED VALUE:
E   102
E   
E   
E   ACTUAL VALUE:
E   38
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10104) multiple if funtion and multiple-agg cause impalad crashed

2021-05-31 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-10104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354424#comment-17354424
 ] 

Zoltán Borók-Nagy commented on IMPALA-10104:


Seems like IMPALA-8969 or IMPALA-9809. Could you please try again with those 
fixes?

> multiple if funtion and multiple-agg cause  impalad crashed
> ---
>
> Key: IMPALA-10104
> URL: https://issues.apache.org/jira/browse/IMPALA-10104
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend, Frontend
>Affects Versions: Impala 3.2.0
> Environment: CDH6.3.1
> jdk 1.8.0_131
>Reporter: lxc
>Priority: Major
>
> sql:
> SELECT max(datekey) as datekey ,
>  if(`exp` like '%a','A',
>  if(`exp` like '%b', 'B',
>  if(`exp` like '%c', 'C',
>  if(`exp` like '%d', 'D', 'E')))) test,
>  sum(cast(money AS FLOAT)) / (count(*)/ 1000) AS ecpm,
>  count(*) AS e_num,
>  count(DISTINCT aa) AS r_num,
>  sum(isd) AS d_num,
>  sum(isc) AS c_num,
>  count(DISTINCT bb) AS uv,
>  sum(isd) /count(DISTINCT aa) AS d_rate,
>  sum(isc) /count(DISTINCT aa) AS c_rate,
>  sum(cast(money AS FLOAT)) AS money
>  FROM tableA
>  WHERE datekey = '20200812'
>  GROUP BY test;
> the coredump info.:
>  
> minicoredump:
> #0 0x7f1005f491f7 in raise () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install 
> cyrus-sasl-gssapi-2.1.26-23.el7.x86_64 cyrus-sasl-lib-2.1.26-23.el7.x86_64 
> cyrus-sasl-plain-2.1.26-23.el7.x86_64 glibc-2.17-196.el7.x86_64 
> keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-46.el7.x86_64 
> libcom_err-1.42.9-10.el7.x86_64 libdb-5.3.21-20.el7.x86_64 
> libselinux-2.5-11.el7.x86_64 nss-softokn-freebl-3.28.3-6.el7.x86_64 
> openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 
> zlib-1.2.7-17.el7.x86_64
> (gdb) bt
> #0 0x7f1005f491f7 in raise () from /lib64/libc.so.6
> #1 0x7f1005f4a8e8 in abort () from /lib64/libc.so.6
> #2 0x048d4ca4 in google::DumpStackTraceAndExit() ()
> #3 0x048cb6fd in google::LogMessage::Fail() ()
> #4 0x048ccfa2 in google::LogMessage::SendToLog() ()
> #5 0x048cb0d7 in google::LogMessage::Flush() ()
> #6 0x048ce69e in google::LogMessageFatal::~LogMessageFatal() ()
> #7 0x0281201b in 
> impala::FreePool::CheckValidAllocation(impala::FreePool::FreeListNode*, 
> unsigned char*) const ()
> #8 0x02811cbd in impala::FreePool::Free(unsigned char*) ()
> #9 0x028104c3 in impala_udf::FunctionContext::Free(unsigned char*) ()
> #10 0x024c8ee6 in 
> impala::AggregateFunctions::StringValSerializeOrFinalize(impala_udf::FunctionContext*,
>  impala_udf::StringVal const&) ()
> #11 0x024c410c in 
> impala::AggFnEvaluator::SerializeOrFinalize(impala::Tuple*, 
> impala::SlotDescriptor const&, impala::Tuple*, void*) ()
> #12 0x7f0f008d88d5 in ?? ()
> #13 0x1c4e8790 in ?? ()
> #14 0x4fcef6b724afe578 in ?? ()
> #15 0x in ?? ()
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10578) Big Query influence other query seriously when hardware not reach limit

2021-05-31 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-10578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354392#comment-17354392
 ] 

Zoltán Borók-Nagy commented on IMPALA-10578:


Seems like it was a configuration problem. Can we close this issue?

> Big Query influence other query seriously when hardware not reach limit 
> 
>
> Key: IMPALA-10578
> URL: https://issues.apache.org/jira/browse/IMPALA-10578
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.4.0
> Environment: impala-3.4
> 80 machines with 96 cpu and 256GB mem
> scratch-dir is on separate disk different from HDFS data dir
>Reporter: wesleydeng
>Priority: Major
> Attachments: big_query.txt.bz2, image-2021-03-10-19-59-24-188.png, 
> image-2021-03-16-16-32-37-862.png, small_query_be_influenced_very_slow.txt.bz2
>
>
> When a big query is running (with mt_dop=8), other queries are very difficult to 
> start. 
> A small query (select distinct one field from a small table) may take about 
> 1 minute; normally it takes only about 1~3 seconds.
>  From the impalad log, I found an incomprehensible log like this:
> !image-2021-03-16-16-32-37-862.png|width=836,height=189!
> !image-2021-03-10-19-59-24-188.png|width=892,height=435!
> ---
> About the gap between "Handling call" and "Deserializing Batch", I found 
> another path : 
> --KrpcDataStreamRecvr::SenderQueue::AddBatch
>   EnqueueDeferredRpc(move(payload), l);   // after dequeue, will call 
> KrpcDataStreamRecvr::SenderQueue::AddBatchWork
> --- 
>  
>  
> When the big query is running, data spilling happens because mem_limit 
> was set and this big query wastes a lot of memory.
>  
> In the attachment, I append the profiles of the big query and the small query. The 
> small query normally finishes in seconds. The timeline of the small query is 
> shown below:
> Query Timeline: 21m39s
>  - Query submitted: 48.846us (48.846us)
>  - Planning finished: 2.934ms (2.886ms)
>  - Submit for admission: 12.572ms (9.637ms)
>  - Completed admission: 13.622ms (1.050ms)
>  - Ready to start on 56 backends: 15.271ms (1.649ms)
>  -- All 56 execution backends (171 fragment instances) started: 18s505ms 
> (18s489ms)*
>  - Rows available: 51s770ms (33s265ms)
>  - First row fetched: 57s220ms (5s449ms)
>  - Last row fetched: 59s119ms (1s899ms)
>  - Released admission control resources: 1m1s (2s223ms)
>  - AdmissionControlTimeSinceLastUpdate: 80.000ms
>  - ComputeScanRangeAssignmentTimer: 439.749us
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-6417) Extend various thread pools to track fragment instance id

2021-05-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-6417.
---
Resolution: Fixed

> Extend various thread pools to track fragment instance id
> -
>
> Key: IMPALA-6417
> URL: https://issues.apache.org/jira/browse/IMPALA-6417
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>
> Extend impala::ThreadPool, DiskIoMgr, kudu::ThreadPool, and so on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-6254) Track fragment instance id for general purpose threads

2021-05-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-6254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-6254.
---
Resolution: Fixed

> Track fragment instance id for general purpose threads
> --
>
> Key: IMPALA-6254
> URL: https://issues.apache.org/jira/browse/IMPALA-6254
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.10.0
>Reporter: Lars Volker
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: observability
>
> Fragment instance threads currently have the instance name in their thread 
> name.
> {noformat}
> exec-finstance (finst:1546b532c445c5f3:5dc794e50003)
> scanner-thread (finst:1546b532c445c5f3:5dc794e50003, plan-node-id:0, 
> thread-idx:0)
> scanner-thread (finst:1546b532c445c5f3:5dc794e50003, plan-node-id:0, 
> thread-idx:1)
> profile-report (finst:1546b532c445c5f3:5dc794e50003)
> scanner-thread (finst:1546b532c445c5f3:5dc794e50003, plan-node-id:0, 
> thread-idx:2)
> profile-report (finst:1546b532c445c5f3:5dc794e5)
> exec-finstance (finst:1546b532c445c5f3:5dc794e5)
> {noformat}
> For thread pools that do work that can be tied to particular fragment 
> instances, we should look into ways to annotate them. This could require 
> adding some breadcrumbs to each instance-specific work item (I/O requests 
> etc) so that the worker threads can annotate themselves.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-10713) Use PARTITION-level locking for static partition INSERTs for ACID tables

2021-06-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-10713.

 Fix Version/s: Impala 4.0
Target Version: Impala 4.0
Resolution: Fixed

> Use PARTITION-level locking for static partition INSERTs for ACID tables
> 
>
> Key: IMPALA-10713
> URL: https://issues.apache.org/jira/browse/IMPALA-10713
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: newbe, ramp-up
> Fix For: Impala 4.0
>
>
> Currently Impala always creates TABLE-level locks for INSERTs for ACID tables:
> [https://github.com/apache/impala/blob/ced7b7d221cda30c65504e18082bb0af6c3cb595/fe/src/main/java/org/apache/impala/service/Frontend.java#L2153]
> For static partition INSERTs we could create PARTITION-level locks instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-10703) PrintPath() crashes with ARRAY in ORC format

2021-05-27 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-10703:
--

Assignee: Amogh Margoor

> PrintPath() crashes with ARRAY in ORC format
> 
>
> Key: IMPALA-10703
> URL: https://issues.apache.org/jira/browse/IMPALA-10703
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.4.0
>Reporter: Gabor Kaszab
>Assignee: Amogh Margoor
>Priority: Major
>  Labels: complextype, orc
>
> Repro steps:
>  - Issue only happens in debug build as apparently there is a DCHECK failing.
>  - You have to launch Impala with --log_level=3 option to increase the log 
> level.
>  - Then running this query crashes Impala:
> {code:java}
> select inner_arr.ITEM.e from functional_orc_def.complextypestbl tbl, 
> functional_orc_def.complextypestbl.nested_struct.c.d.ITEM inner_arr;
> {code}
>  
> Backtrace (relevant part):
> {code:java}
> #7  0x0280c2b4 in 
> impala::PrintPath[abi:cxx11](impala::TableDescriptor const&, std::vector std::allocator > const&) (tbl_desc=..., path=...) at 
> /home/gaborkaszab/shadow/Impala-upstream/be/src/util/debug-util.cc:237
> #8  0x02a69eeb in impala::HdfsOrcScanner::ResolveColumns 
> (this=0x10e79000, tuple_desc=..., 
> selected_nodes=0x7fe54980a7d0, pos_slots=0x7fe54980a780)
> at 
> /home/gaborkaszab/shadow/Impala-upstream/be/src/exec/hdfs-orc-scanner.cc:452
> #9  0x02a69cf7 in impala::HdfsOrcScanner::ResolveColumns 
> (this=0x10e79000, tuple_desc=..., 
> selected_nodes=0x7fe54980a7d0, pos_slots=0x7fe54980a780)
> at 
> /home/gaborkaszab/shadow/Impala-upstream/be/src/exec/hdfs-orc-scanner.cc:449
> #10 0x02a6a547 in impala::HdfsOrcScanner::SelectColumns 
> (this=0x10e79000, tuple_desc=...)
> at 
> /home/gaborkaszab/shadow/Impala-upstream/be/src/exec/hdfs-orc-scanner.cc:497
> #11 0x02a67720 in impala::HdfsOrcScanner::Open (this=0x10e79000, 
> context=0x7fe54980b260)
> at 
> /home/gaborkaszab/shadow/Impala-upstream/be/src/exec/hdfs-orc-scanner.cc:237
> #12 0x029f19c9 in 
> impala::HdfsScanNodeBase::CreateAndOpenScannerHelper (this=0xd280800, 
> partition=0xaac3d80, 
> context=0x7fe54980b260, scanner=0x7fe54980b258)
> at 
> /home/gaborkaszab/shadow/Impala-upstream/be/src/exec/hdfs-scan-node-base.cc:874
> #13 0x02baab86 in impala::HdfsScanNode::ProcessSplit (this=0xd280800, 
> filter_ctxs=..., 
> expr_results_pool=0x7fe54980b500, scan_range=0xac59c00, 
> scanner_thread_reservation=0x7fe54980b428)
> at 
> /home/gaborkaszab/shadow/Impala-upstream/be/src/exec/hdfs-scan-node.cc:480
> #14 0x02baa28a in impala::HdfsScanNode::ScannerThread 
> (this=0xd280800, first_thread=true, 
> scanner_thread_reservation=8192) at 
> /home/gaborkaszab/shadow/Impala-upstream/be/src/exec/hdfs-scan-node.cc:418
> #15 0x02ba95f2 in impala::HdfsScanNodeoperator()(void) 
> const (__closure=0x7fe54980bc28)
> at 
> /home/gaborkaszab/shadow/Impala-upstream/be/src/exec/hdfs-scan-node.cc:339
> {code}
> This DCHECK fails:
>  
> [https://github.com/apache/impala/blob/a47700ed790c2415e52a85e40063bed53a7cb9e8/be/src/util/debug-util.cc#L237]
> {code:java}
> Check failed: path[i] == 1 (5 vs. 1)
> {code}
> There was a similar issue recently, but here a different DCHECK fails:
> https://issues.apache.org/jira/browse/IMPALA-9918



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-10713) Use PARTITION-level locking for static partition INSERTs for ACID tables

2021-06-03 Thread Jira


[jira] [Commented] (IMPALA-10656) Fire insert events before commit

2021-06-03 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-10656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356388#comment-17356388
 ] 

Zoltán Borók-Nagy commented on IMPALA-10656:


Can we resolve this issue?

> Fire insert events before commit
> 
>
> Key: IMPALA-10656
> URL: https://issues.apache.org/jira/browse/IMPALA-10656
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend, Frontend
>Reporter: Csaba Ringhofer
>Assignee: Csaba Ringhofer
>Priority: Major
>
> Currently Impala commits an insert first, then reloads the table from HMS, 
> and generates the insert events based on the difference between the two 
> snapshots (e.g. which files were not present in the old snapshot but are there 
> in the new). Hive replication expects the insert events before the commit, so 
> this may potentially lead to issues there.
> The solution is to collect the new files during the insert in the backend, 
> and send the insert events based on this file set.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10413) Impala crashing when retrying failed query

2021-06-04 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17357214#comment-17357214
 ] 

Zoltán Borók-Nagy commented on IMPALA-10413:


Can we resolve this Jira?

> Impala crashing when retrying failed query
> --
>
> Key: IMPALA-10413
> URL: https://issues.apache.org/jira/browse/IMPALA-10413
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: Xianqing He
>Assignee: Xianqing He
>Priority: Minor
>
> When retrying a failed query, it may crash if the query is cancelled.
> The core stack is below:
> {code:java}
> #0  0x7f1b20e87387 in raise () from /lib64/libc.so.6
> #1  0x7f1b20e88a78 in abort () from /lib64/libc.so.6
> #2  0x7f1b23b754b9 in os::abort(bool) () from 
> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre/lib/amd64/server/libjvm.so
> #3  0x7f1b23d93db6 in VMError::report_and_die() () from 
> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre/lib/amd64/server/libjvm.so
> #4  0x7f1b23b7f505 in JVM_handle_linux_signal () from 
> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre/lib/amd64/server/libjvm.so
> #5  0x7f1b23b72678 in signalHandler(int, siginfo_t*, void*) () from 
> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre/lib/amd64/server/libjvm.so
> #6  
> #7  push_back (__x=..., this=0x28) at 
> /data/Impala-ASF/toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/include/c++/7.5.0/bits/stl_vector.h:941
> #8  AddDetail (d=..., this=0x0) at 
> /data/Impala-ASF/be/src/util/error-util.h:114
> #9  impala::Status::AddDetail (this=this@entry=0x7f1a4b971740, msg=...) at 
> /data/Impala-ASF/be/src/common/status.cc:236
> #10 0x0190a4fc in impala::QueryDriver::HandleRetryFailure 
> (this=this@entry=0xb1b2880, status=status@entry=0x7f1a4b971740, 
> error_msg=error_msg@entry=0x7f1a4b971860, 
> request_state=request_state@entry=0x9ca7e00, retry_query_id=...)
> at /data/Impala-ASF/be/src/runtime/query-driver.cc:351
> #11 0x0190c605 in impala::QueryDriver::RetryQueryFromThread 
> (this=0xb1b2880, error=..., query_driver=...) at 
> /data/Impala-ASF/be/src/runtime/query-driver.cc:293
> #12 0x01912459 in operator() (a2=..., a1=..., p=, 
> this=) at 
> /data/Impala-ASF/toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/mem_fn_template.hpp:280
> #13 operator() impala::Status&, std::shared_ptr >, boost::_bi::list0> 
> (a=, f=..., this=)
> at 
> /data/Impala-ASF/toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:398
> #14 operator() (this=) at 
> /data/Impala-ASF/toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222
> #15 
> boost::detail::function::void_function_obj_invoker0 boost::_mfi::mf2 std::shared_ptr >, 
> boost::_bi::list3, 
> boost::_bi::value, 
> boost::_bi::value > > >, void>::invoke 
> (function_obj_ptr=...)
> at 
> /data/Impala-ASF/toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:159
> #16 0x015386f2 in operator() (this=0x7f1a4b971c00) at 
> /data/Impala-ASF/toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770
> #17 impala::Thread::SuperviseThread(std::__cxx11::basic_string std::char_traits, std::allocator > const&, 
> std::__cxx11::basic_string, std::allocator 
> > const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*) (name=..., category=..., 
> functor=..., parent_thread_info=, 
> thread_started=0x7f1ad5750ec0) at /data/Impala-ASF/be/src/util/thread.cc:360
> #18 0x01539b6b in operator() std::__cxx11::basic_string&, const std::__cxx11::basic_string&, 
> boost::function, const impala::ThreadDebugInfo*, impala::Promise int>*), boost::_bi::list0> (
> a=, 
> f=@0xa20de78: 0x15383f0 
>  std::char_traits, std::allocator > const&, 
> std::__cxx11::basic_string, std::allocator 
> > const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*)>, this=0xa20de80) at 
> /data/Impala-ASF/toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:531
> #19 operator() (this=0xa20de78) at 
> /data/Impala-ASF/toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222
> #20 boost::detail::thread_data (*)(std::__cxx11::basic_string, 
> std::allocator > const&, std::__cxx11::basic_string std::char_traits, std::all

[jira] [Commented] (IMPALA-10187) Event processing fails on multiple events + DROP TABLE

2021-06-04 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-10187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17357328#comment-17357328
 ] 

Zoltán Borók-Nagy commented on IMPALA-10187:


I've uploaded a fix for review: https://gerrit.cloudera.org/#/c/17542/

> Event processing fails on multiple events + DROP TABLE
> --
>
> Key: IMPALA-10187
> URL: https://issues.apache.org/jira/browse/IMPALA-10187
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Zoltán Borók-Nagy
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> I've seen the following during interop testing:
> Some DDL statements (ALTER TABLE + DROP) were executed via Hive on the same 
> table.
> Then CatalogD's event processor tried to process the new events:
> {noformat}
> I0922 14:32:56.590229 13611 HdfsTable.java:709] Loaded file and block 
> metadata for 
> default.insertonly_hiveclient_impalaclient_partitioned_8ff3a1ef_b8a8_4c7a_b1c7_0b8f4c42c61e
>  partitions: category=cat1, category=cat2, category=cat3, and 1 others. Time 
> taken: 55.145ms
> I0922 14:32:56.591078 13611 TableLoader.java:103] Loaded metadata for: 
> default.insertonly_hiveclient_impalaclient_partitioned_8ff3a1ef_b8a8_4c7a_b1c7_0b8f4c42c61e
>  (303ms)
> I0922 14:32:58.022068 10065 MetastoreEventsProcessor.java:482] Received 41 
> events. Start event id : 39948
> I0922 14:32:58.022266 10065 MetastoreEvents.java:380] EventId: 39949 
> EventType: ALTER_PARTITION Creating event 39949 of type ALTER_PARTITION on 
> table 
> default.insertonly_hiveclient_impalaclient_partitioned_8ff3a1ef_b8a8_4c7a_b1c7_0b8f4c42c61e
> ...
> I0922 14:32:58.024389 10065 MetastoreEvents.java:380] EventId: 39962 
> EventType: DROP_TABLE Creating event 39962 of type DROP_TABLE on table 
> default.insertonly_hiveclient_impalaclient_partitioned_8ff3a1ef_b8a8_4c7a_b1c7_0b8f4c42c61e
> {noformat}
>  
> Impala tried to refresh the table on the first ALTER TABLE event, but since 
> it has already been dropped we get a TableLoadingException (caused by 
> NoSuchObjectException from HMS):
>  
> {noformat}
> I0922 14:32:58.028852 10065 MetastoreEvents.java:234] Total number of events 
> received: 41 Total number of events filtered out: 0
> I0922 14:32:58.028962 10065 CatalogServiceCatalog.java:862] Not a self-event 
> since the given version is -1 and service id is
> I0922 14:32:58.029369 10065 CatalogServiceCatalog.java:2142] Refreshing table 
> metadata: 
> default.insertonly_hiveclient_impalaclient_partitioned_8ff3a1ef_b8a8_4c7a_b1c7_0b8f4c42c61e
> E0922 14:32:58.038627 10065 MetastoreEventsProcessor.java:527] Unexpected 
> exception received while processing event
> Java exception follows:
> org.apache.impala.catalog.events.MetastoreNotificationException: Unable to 
> process event 39949 of type ALTER_PARTITION. Event processing will be stopped.
> at 
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:620)
> at 
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:513)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.impala.catalog.TableLoadingException: Error loading 
> metadata for table: 
> default.insertonly_hiveclient_impalaclient_partitioned_8ff3a1ef_b8a8_4c7a_b1c7_0b8f4c42c61e
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.reloadTable(CatalogServiceCatalog.java:2160)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.reloadTableIfExists(CatalogServiceCatalog.java:2365)
> at 
> org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.reloadTableFromCatalog(MetastoreEvents.java:563)
> at 
> org.apache.impala.catalog.events.MetastoreEvents$AlterPartitionEvent.process(MetastoreEvents.java:1454)
> at 
> org.apache.impala.catalog.events.Metast

[jira] [Assigned] (IMPALA-10713) Use PARTITION-level locking for static partition INSERTs for ACID tables

2021-05-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-10713:
--

Assignee: Zoltán Borók-Nagy

> Use PARTITION-level locking for static partition INSERTs for ACID tables
> 
>
> Key: IMPALA-10713
> URL: https://issues.apache.org/jira/browse/IMPALA-10713
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: newbe, ramp-up
>
> Currently Impala always creates TABLE-level locks for INSERTs for ACID tables:
> [https://github.com/apache/impala/blob/ced7b7d221cda30c65504e18082bb0af6c3cb595/fe/src/main/java/org/apache/impala/service/Frontend.java#L2153]
> For static partition INSERTs we could create PARTITION-level locks instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10713) Use PARTITION-level locking for static partition INSERTs for ACID tables

2021-05-21 Thread Jira
Zoltán Borók-Nagy created IMPALA-10713:
--

 Summary: Use PARTITION-level locking for static partition INSERTs 
for ACID tables
 Key: IMPALA-10713
 URL: https://issues.apache.org/jira/browse/IMPALA-10713
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Reporter: Zoltán Borók-Nagy


Currently Impala always creates TABLE-level locks for INSERTs for ACID tables:

[https://github.com/apache/impala/blob/ced7b7d221cda30c65504e18082bb0af6c3cb595/fe/src/main/java/org/apache/impala/service/Frontend.java#L2153]

For static partition INSERTs we could create PARTITION-level locks instead.
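
For illustration, a static partition INSERT names the partition key values in the
statement itself, so the affected partition is known during analysis, while a dynamic
partition INSERT only determines the written partitions at runtime. The sketch below is
not part of the original report; it assumes a hypothetical insert-only ACID table
"sales" and a hypothetical staging table "staging_sales".

{code:sql}
-- Hypothetical insert-only ACID table, partitioned by a single column.
CREATE TABLE sales (id INT, amount DECIMAL(10,2))
PARTITIONED BY (day STRING)
STORED AS PARQUET
TBLPROPERTIES ('transactional'='true', 'transactional_properties'='insert_only');

-- Static partition INSERT: the target partition (day='2021-05-21') is fixed in the
-- statement, so a PARTITION-level lock on that single partition would be enough.
INSERT INTO sales PARTITION (day='2021-05-21') VALUES (1, 10.00);

-- Dynamic partition INSERT: the written partitions are only known at runtime,
-- so a TABLE-level lock remains the safe choice here.
INSERT INTO sales PARTITION (day) SELECT id, amount, day FROM staging_sales;
{code}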



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-10634) Removes outer join if it only has DISTINCT on streamed side

2021-05-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-10634.

Resolution: Duplicate

> Removes outer join if it only has DISTINCT on streamed side
> ---
>
> Key: IMPALA-10634
> URL: https://issues.apache.org/jira/browse/IMPALA-10634
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Yuming Wang
>Priority: Major
>
> Removes outer join if it only has DISTINCT on streamed side:
> {code:sql}
> CREATE TABLE t1(a int, b int);
> CREATE TABLE t2(a int, b int);
> SELECT DISTINCT t1.b FROM t1 LEFT JOIN t2 ON t1.a = t2.a;
> {code}
> We can rewrite {{SELECT DISTINCT b FROM t1 LEFT JOIN t2 ON t1.a = t2.a}} to 
> {{SELECT DISTINCT t1.b FROM t1}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Reopened] (IMPALA-9355) TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory limit

2021-05-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-9355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reopened IMPALA-9355:
---

> TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory 
> limit
> -
>
> Key: IMPALA-9355
> URL: https://issues.apache.org/jira/browse/IMPALA-9355
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Fang-Yu Rao
>Assignee: Qifan Chen
>Priority: Critical
>  Labels: broken-build, flaky
> Fix For: Impala 4.0
>
>
> The EE test {{test_exchange_mem_usage_scaling}} failed because the query at 
> [https://github.com/apache/impala/blame/master/testdata/workloads/functional-query/queries/QueryTest/exchange-mem-scaling.test#L7-L15]
>  does not hit the specified memory limit (170m) at 
> [https://github.com/apache/impala/blame/master/testdata/workloads/functional-query/queries/QueryTest/exchange-mem-scaling.test#L7].
>  We may need to further reduce the specified limit. In what follows the error 
> message is also given. Recall that the same issue occurred at 
> https://issues.apache.org/jira/browse/IMPALA-7873 but was resolved.
> {code:java}
> FAIL 
> query_test/test_mem_usage_scaling.py::TestExchangeMemUsage::()::test_exchange_mem_usage_scaling[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none]
> === FAILURES 
> ===
>  TestExchangeMemUsage.test_exchange_mem_usage_scaling[protocol: beeswax | 
> exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none] 
> [gw3] linux2 -- Python 2.7.12 
> /home/ubuntu/Impala/bin/../infra/python/env/bin/python
> query_test/test_mem_usage_scaling.py:386: in test_exchange_mem_usage_scaling
> self.run_test_case('QueryTest/exchange-mem-scaling', vector)
> common/impala_test_suite.py:674: in run_test_case
> expected_str, query)
> E   AssertionError: Expected exception: Memory limit exceeded
> E   
> E   when running:
> E   
> E   set mem_limit=170m;
> E   set num_scanner_threads=1;
> E   select *
> E   from tpch_parquet.lineitem l1
> E join tpch_parquet.lineitem l2 on l1.l_orderkey = l2.l_orderkey and
> E l1.l_partkey = l2.l_partkey and l1.l_suppkey = l2.l_suppkey
> E and l1.l_linenumber = l2.l_linenumber
> E   order by l1.l_orderkey desc, l1.l_partkey, l1.l_suppkey, l1.l_linenumber
> E   limit 5
> {code}
> [~tarmstr...@cloudera.com] and [~joemcdonnell] reviewed the patch at 
> [https://gerrit.cloudera.org/c/11965/]. Assign this JIRA to [~joemcdonnell] 
> for now. Please re-assign the JIRA to others as appropriate. Thanks!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-10744) Send INSERT events even when Impala's event processing is not enabled

2021-06-24 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10744:
---
Description: 
Generating insert events should not be conditional on whether the events processor 
is active.

Related code:

https://github.com/apache/impala/blob/d99caa1f3a049fc5e20855f8e8bf846fd81f65c5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L5124

Please note that this will also need to fix a bug in the createInsertEvents() 
code as an INSERT with an empty result set raises an IllegalStateException:

create table ctas_empty as select * from functional.alltypes limit 0;

  was:
Generating insert events should not be conditional on whether the events processor 
is active.

Related code:
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L5020-L5023

Please note that this will also need to fix a bug in the createInsertEvents() 
code as an INSERT with an empty result set raises an IllegalStateException:

create table ctas_empty  as select * from functional.alltypes limit 0;



> Send INSERT events even when Impala's event processing is not enabled
> 
>
> Key: IMPALA-10744
> URL: https://issues.apache.org/jira/browse/IMPALA-10744
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>
> Generating insert events should not be conditional on whether the events processor 
> is active.
> Related code:
> https://github.com/apache/impala/blob/d99caa1f3a049fc5e20855f8e8bf846fd81f65c5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L5124
> Please note that this will also need to fix a bug in the createInsertEvents() 
> code as an INSERT with an empty result set raises an IllegalStateException:
> create table ctas_empty as select * from functional.alltypes limit 0;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-10736) Add support for Hive Replication for Iceberg tables

2021-07-07 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10736:
---
Labels: impala-iceberg  (was: )

> Add support for Hive Replication for Iceberg tables
> ---
>
> Key: IMPALA-10736
> URL: https://issues.apache.org/jira/browse/IMPALA-10736
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> Hive Replication currently doesn't support Iceberg tables.
> Once it does, we'll need to add support for it as well.
> Currently Iceberg stores absolute paths in its metadata files, so we'll 
> probably need to wait for this issue to be resolved as well: 
> https://github.com/apache/iceberg/issues/1617



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-10736) Add support for Hive Replication for Iceberg tables

2021-07-07 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10736:
---
Component/s: Catalog

> Add support for Hive Replication for Iceberg tables
> ---
>
> Key: IMPALA-10736
> URL: https://issues.apache.org/jira/browse/IMPALA-10736
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>
> Hive Replication currently doesn't support Iceberg tables.
> Once it does, we'll need to add support for it as well.
> Currently Iceberg stores absolute paths in its metadata files, so we'll 
> probably need to wait for this issue to be resolved as well: 
> https://github.com/apache/iceberg/issues/1617



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-10777) Enable min/max filtering for Iceberg partitions.

2021-07-06 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10777:
---
Component/s: Frontend

> Enable min/max filtering for Iceberg partitions.
> 
>
> Key: IMPALA-10777
> URL: https://issues.apache.org/jira/browse/IMPALA-10777
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Qifan Chen
>Priority: Major
>
> The work to enable min/max filters for partition columns is underway. See 
> IMPALA-10738. 
> It would be nice to enable the filtering for Iceberg partitions as well. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-10777) Enable min/max filtering for Iceberg partitions.

2021-07-06 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10777:
---
Labels: impala-iceberg  (was: )

> Enable min/max filtering for Iceberg partitions.
> 
>
> Key: IMPALA-10777
> URL: https://issues.apache.org/jira/browse/IMPALA-10777
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Qifan Chen
>Priority: Major
>  Labels: impala-iceberg
>
> The work to enable min/max filters for partition columns is underway. See 
> IMPALA-10738. 
> It would be nice to enable the filtering for Iceberg partitions as well. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-10658) LOAD DATA INPATH silently fails between HDFS and Azure ABFS

2021-04-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-10658.

Fix Version/s: Impala 4.0
   Resolution: Fixed

> LOAD DATA INPATH silently fails between HDFS and Azure ABFS
> ---
>
> Key: IMPALA-10658
> URL: https://issues.apache.org/jira/browse/IMPALA-10658
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
> Fix For: Impala 4.0
>
>
> LOAD DATA INPATH silently fails when Impala tries to move files from HDFS to 
> ABFS.
> The problem is that in 'relocateFile()' we try to figure out if 'sourceFile' 
> is on the destination filesystem:
> https://github.com/apache/impala/blob/6b16df9e9a4696b46b6f9c7fe2fc0aaded285623/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L246
> We use the following code to decide this:
> https://github.com/apache/impala/blob/6b16df9e9a4696b46b6f9c7fe2fc0aaded285623/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L581-L591
> However, the Azure FileSystem implementation doesn't throw an exception in 
> 'fs.makeQualified(path);'. It just happily returns a new Path, substituting the 
> prefix "hdfs://" with "abfs://".
> So in relocateFile() Impala thinks the 'sourceFile' and 'destFile' are on the 
> same filesystem so it tries to invoke 'destFs.rename()':
> https://github.com/apache/impala/blob/6b16df9e9a4696b46b6f9c7fe2fc0aaded285623/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L266
> From 
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/filesystem.html#boolean_rename.28Path_src.2C_Path_d.29
>  : "In terms of its implementation, it is the one with the most ambiguity 
> regarding when to return false versus raising an exception."
> Seems like the Azure FileSystem implementation doesn't throw an exception on 
> failure, but returns false instead. Unfortunately Impala doesn't check the 
> return value of destFs.rename() (see above), so the error remains silent.
> To fix this issue we need to do two things:
> * fix FileSystemUtil.isPathOnFileSystem()
> * check the return value of destFs.rename() and throw an exception when it's 
> false



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10679) Create SHA2 builtin function

2021-04-27 Thread Jira
Zoltán Borók-Nagy created IMPALA-10679:
--

 Summary: Create SHA2 builtin function
 Key: IMPALA-10679
 URL: https://issues.apache.org/jira/browse/IMPALA-10679
 Project: IMPALA
  Issue Type: New Feature
  Components: Backend
Reporter: Zoltán Borók-Nagy


Add support for the SHA2 family of hash functions (SHA-224, SHA-256, SHA-384, 
and SHA-512).

Hive already supports SHA2: HIVE-10644

We should add a similar builtin function.
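
For reference, Hive's sha2() takes the input value and the desired bit length (224,
256, 384 or 512, with 0 treated as 256) and returns the hex digest as a string; an
Impala builtin would presumably mirror that shape. The exact Impala signature is still
to be decided, so the snippet below is only a usage sketch.

{code:sql}
-- Hive-style usage; 'abc' is the standard FIPS 180 test vector for SHA-256.
SELECT sha2('abc', 256);
-- ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
{code}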

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10679) Create SHA2 builtin function

2021-04-29 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-10679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335506#comment-17335506
 ] 

Zoltán Borók-Nagy commented on IMPALA-10679:


I see, [~JacquesJZa]. FIPS mode only affects you if you run impala in a 
FIPS-enabled environment.

In such environments we cannot use the forbidden hash algorithms (e.g. MD5) at 
all, not even internally, see e.g. IMPALA-10205

Cc [~wzhou]

> Create SHA2 builtin function
> 
>
> Key: IMPALA-10679
> URL: https://issues.apache.org/jira/browse/IMPALA-10679
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Assignee: Amogh Margoor
>Priority: Major
>  Labels: newbie, ramp-up
>
> Add support for the SHA2 family of hash functions (SHA-224, SHA-256, SHA-384, 
> and SHA-512).
> Hive already supports SHA2: HIVE-10644
> We should add a similar builtin function.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10685) OUTER JOIN against ACID collections might be converted to INNER JOIN

2021-04-29 Thread Jira
Zoltán Borók-Nagy created IMPALA-10685:
--

 Summary: OUTER JOIN against ACID collections might be converted to 
INNER JOIN
 Key: IMPALA-10685
 URL: https://issues.apache.org/jira/browse/IMPALA-10685
 Project: IMPALA
  Issue Type: Bug
Reporter: Zoltán Borók-Nagy


We are rewriting "A join B" to "A join B1 join B2" for some queries that refer 
to collections in ACID tables. This is ok for inner join but may be incorrect 
for outer joins. Here is an example, the two queries produce different results:

Query works well for non-ACID table:
{noformat}
impala> with v as (
  select ('k4') as key
  union all
  values ('k1'), ('k2'), ('k3')
) select * from v left join functional_parquet.complextypestbl.int_map using 
(key);
+-+--+---+
| key | key  | value |
+-+--+---+
| k1  | k1   | -1|
| k1  | k1   | 1 |
| k2  | k2   | 100   |
| k1  | k1   | 2 |
| k2  | k2   | NULL  |
| k1  | k1   | NULL  |
| k3  | k3   | NULL  |
| k4  | NULL | NULL  |
+-+--+---+
Fetched 8 row(s) in 3.35s
{noformat}
LEFT OUTER JOIN converted to INNER JOIN for ACID table:
{noformat}
impala> with v as (
  select ('k4') as key
  union all
  values ('k1'), ('k2'), ('k3')
) select * from v left join functional_orc_def.complextypestbl.int_map using 
(key);
+-+-+---+
| key | key | value |
+-+-+---+
| k1  | k1  | -1|
| k1  | k1  | 1 |
| k2  | k2  | 100   |
| k1  | k1  | 2 |
| k2  | k2  | NULL  |
| k1  | k1  | NULL  |
| k3  | k3  | NULL  |
+-+-+---+
Fetched 7 row(s) in 0.35s
{noformat}
 IMPALA-9494 can help to fix this. Until then we could use the techniques from 
IMPALA-9330.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10679) Create SHA2 builtin function

2021-04-29 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-10679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335661#comment-17335661
 ] 

Zoltán Borók-Nagy commented on IMPALA-10679:


Thanks for the docs, [~wzhou]. So my understanding is that we can add an MD5 
builtin function to Impala (similar to Hive's) that users can use in their 
SELECT queries, but only if they don't run their Impala cluster in FIPS mode. 
In FIPS mode Impala should raise an error for "SELECT MD5('ABC')", right?

> Create SHA2 builtin function
> 
>
> Key: IMPALA-10679
> URL: https://issues.apache.org/jira/browse/IMPALA-10679
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Assignee: Amogh Margoor
>Priority: Major
>  Labels: newbie, ramp-up
>
> Add support for the SHA2 family of hash functions (SHA-224, SHA-256, SHA-384, 
> and SHA-512).
> Hive already supports SHA2: HIVE-10644
> We should add a similar builtin function.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-10685) OUTER JOIN against ACID collections might be converted to INNER JOIN

2021-04-30 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10685:
---
Description: 
We are rewriting "A join B" to "A join B1 join B2" for some queries that refer 
to collections in ACID tables. This is ok for inner join but may be incorrect 
for outer joins. Here is an example, the two queries produce different results:

Query works well for non-ACID table:
{noformat}
impala> with v as (
  select ('k4') as key
  union all
  values ('k1'), ('k2'), ('k3')
) select * from v left join functional_parquet.complextypestbl.int_map using 
(key);
+-+--+---+
| key | key  | value |
+-+--+---+
| k1  | k1   | -1|
| k1  | k1   | 1 |
| k2  | k2   | 100   |
| k1  | k1   | 2 |
| k2  | k2   | NULL  |
| k1  | k1   | NULL  |
| k3  | k3   | NULL  |
| k4  | NULL | NULL  |
+-+--+---+
Fetched 8 row(s) in 3.35s
{noformat}
LEFT OUTER JOIN converted to INNER JOIN for ACID table:
{noformat}
impala> with v as (
  select ('k4') as key
  union all
  values ('k1'), ('k2'), ('k3')
) select * from v left join functional_orc_def.complextypestbl.int_map using 
(key);
+-+-+---+
| key | key | value |
+-+-+---+
| k1  | k1  | -1|
| k1  | k1  | 1 |
| k2  | k2  | 100   |
| k1  | k1  | 2 |
| k2  | k2  | NULL  |
| k1  | k1  | NULL  |
| k3  | k3  | NULL  |
+-+-+---+
Fetched 7 row(s) in 0.35s
{noformat}
 IMPALA-9494 can help to fix this. Until then we could use the techniques from 
IMPALA-9330.

Possible workaround is to rewrite the query to:

{noformat}
with v as (
  select ('k4') as key
  union all
  values ('k1'), ('k2'), ('k3')
)
select * from v
left join
(select int_map.* from
 functional_orc_def.complextypestbl c, c.int_map) vv
using (key);
{noformat}


  was:
We are rewriting "A join B" to "A join B1 join B2" for some queries that refer 
to collections in ACID tables. This is ok for inner join but may be incorrect 
for outer joins. Here is an example, the two queries produce different results:

Query works well for non-ACID table:
{noformat}
impala> with v as (
  select ('k4') as key
  union all
  values ('k1'), ('k2'), ('k3')
) select * from v left join functional_parquet.complextypestbl.int_map using 
(key);
+-+--+---+
| key | key  | value |
+-+--+---+
| k1  | k1   | -1|
| k1  | k1   | 1 |
| k2  | k2   | 100   |
| k1  | k1   | 2 |
| k2  | k2   | NULL  |
| k1  | k1   | NULL  |
| k3  | k3   | NULL  |
| k4  | NULL | NULL  |
+-+--+---+
Fetched 8 row(s) in 3.35s
{noformat}
LEFT OUTER JOIN converted to INNER JOIN for ACID table:
{noformat}
impala> with v as (
  select ('k4') as key
  union all
  values ('k1'), ('k2'), ('k3')
) select * from v left join functional_orc_def.complextypestbl.int_map using 
(key);
+-+-+---+
| key | key | value |
+-+-+---+
| k1  | k1  | -1|
| k1  | k1  | 1 |
| k2  | k2  | 100   |
| k1  | k1  | 2 |
| k2  | k2  | NULL  |
| k1  | k1  | NULL  |
| k3  | k3  | NULL  |
+-+-+---+
Fetched 7 row(s) in 0.35s
{noformat}
 IMPALA-9494 can help to fix this. Until then we could use the techniques from 
IMPALA-9330.


> OUTER JOIN against ACID collections might be converted to INNER JOIN
> 
>
> Key: IMPALA-10685
> URL: https://issues.apache.org/jira/browse/IMPALA-10685
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.0
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>
> We are rewriting "A join B" to "A join B1 join B2" for some queries that 
> refer to collections in ACID tables. This is ok for inner join but may be 
> incorrect for outer joins. Here is an example, the two queries produce 
> different results:
> Query works well for non-ACID table:
> {noformat}
> impala> with v as (
>   select ('k4') as key
>   union all
>   values ('k1'), ('k2'), ('k3')
> ) select * from v left join functional_parquet.complextypestbl.int_map using 
> (key);
> +-+--+---+
> | key | key  | value |
> +-+--+---+
> | k1  | k1   | -1|
> | k1  | k1   | 1 |
> | k2  | k2   | 100   |
> | k1  | k1   | 2 |
> | k2  | k2   | NULL  |
> | k1  | k1   | NULL  |
> | k3  | k3   | NULL  |
> | k4  | NULL | NULL  |
> +-+--+---+
> Fetched 8 row(s) in 3.35s
> {noformat}
> LEFT OUTER JOIN converted to INNER JOIN for ACID table:
> {noformat}
> impala> with v as (
>   select ('k4') as key
>   union all
>   values ('k1'), ('k2'), ('k3')
> ) select * from v left join functional_orc_def.complextypestbl

[jira] [Updated] (IMPALA-10685) OUTER JOIN against ACID collections might be converted to INNER JOIN

2021-04-30 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10685:
---
Description: 
We are rewriting "A join B" to "A join B1 join B2" for some queries that refer 
to collections in ACID tables. This is ok for inner join but may be incorrect 
for outer joins. Here is an example, the two queries produce different results:

Query works well for non-ACID table:
{noformat}
impala> with v as (
  select ('k4') as key
  union all
  values ('k1'), ('k2'), ('k3')
) select * from v left join functional_parquet.complextypestbl.int_map using 
(key);
+-+--+---+
| key | key  | value |
+-+--+---+
| k1  | k1   | -1|
| k1  | k1   | 1 |
| k2  | k2   | 100   |
| k1  | k1   | 2 |
| k2  | k2   | NULL  |
| k1  | k1   | NULL  |
| k3  | k3   | NULL  |
| k4  | NULL | NULL  |
+-+--+---+
Fetched 8 row(s) in 3.35s
{noformat}
LEFT OUTER JOIN converted to INNER JOIN for ACID table:
{noformat}
impala> with v as (
  select ('k4') as key
  union all
  values ('k1'), ('k2'), ('k3')
) select * from v left join functional_orc_def.complextypestbl.int_map using 
(key);
+-+-+---+
| key | key | value |
+-+-+---+
| k1  | k1  | -1|
| k1  | k1  | 1 |
| k2  | k2  | 100   |
| k1  | k1  | 2 |
| k2  | k2  | NULL  |
| k1  | k1  | NULL  |
| k3  | k3  | NULL  |
+-+-+---+
Fetched 7 row(s) in 0.35s
{noformat}
 IMPALA-9494 can help to fix this. Until then we could use the techniques from 
IMPALA-9330.

Possible workaround is to rewrite the query to use an inline view:

{noformat}
with v as (
  select ('k4') as key
  union all
  values ('k1'), ('k2'), ('k3')
)
select * from v
left join
(select int_map.* from
 functional_orc_def.complextypestbl c, c.int_map) vv
using (key);
{noformat}


  was:
We are rewriting "A join B" to "A join B1 join B2" for some queries that refer 
to collections in ACID tables. This is ok for inner join but may be incorrect 
for outer joins. Here is an example, the two queries produce different results:

Query works well for non-ACID table:
{noformat}
impala> with v as (
  select ('k4') as key
  union all
  values ('k1'), ('k2'), ('k3')
) select * from v left join functional_parquet.complextypestbl.int_map using 
(key);
+-+--+---+
| key | key  | value |
+-+--+---+
| k1  | k1   | -1|
| k1  | k1   | 1 |
| k2  | k2   | 100   |
| k1  | k1   | 2 |
| k2  | k2   | NULL  |
| k1  | k1   | NULL  |
| k3  | k3   | NULL  |
| k4  | NULL | NULL  |
+-+--+---+
Fetched 8 row(s) in 3.35s
{noformat}
LEFT OUTER JOIN converted to INNER JOIN for ACID table:
{noformat}
impala> with v as (
  select ('k4') as key
  union all
  values ('k1'), ('k2'), ('k3')
) select * from v left join functional_orc_def.complextypestbl.int_map using 
(key);
+-+-+---+
| key | key | value |
+-+-+---+
| k1  | k1  | -1|
| k1  | k1  | 1 |
| k2  | k2  | 100   |
| k1  | k1  | 2 |
| k2  | k2  | NULL  |
| k1  | k1  | NULL  |
| k3  | k3  | NULL  |
+-+-+---+
Fetched 7 row(s) in 0.35s
{noformat}
 IMPALA-9494 can help to fix this. Until then we could use the techniques from 
IMPALA-9330.

Possible workaround is to rewrite the query to:

{noformat}
with v as (
  select ('k4') as key
  union all
  values ('k1'), ('k2'), ('k3')
)
select * from v
left join
(select int_map.* from
 functional_orc_def.complextypestbl c, c.int_map) vv
using (key);
{noformat}



> OUTER JOIN against ACID collections might be converted to INNER JOIN
> 
>
> Key: IMPALA-10685
> URL: https://issues.apache.org/jira/browse/IMPALA-10685
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.0
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>
> We are rewriting "A join B" to "A join B1 join B2" for some queries that 
> refer to collections in ACID tables. This is ok for inner join but may be 
> incorrect for outer joins. Here is an example, the two queries produce 
> different results:
> Query works well for non-ACID table:
> {noformat}
> impala> with v as (
>   select ('k4') as key
>   union all
>   values ('k1'), ('k2'), ('k3')
> ) select * from v left join functional_parquet.complextypestbl.int_map using 
> (key);
> +-+--+---+
> | key | key  | value |
> +-+--+---+
> | k1  | k1   | -1|
> | k1  | k1   | 1 |
> | k2  | k2   | 100   |
> | k1  | k1   | 2 |
> | k2  | k2   | NULL  |
> | k1  | k1   | NULL  |
> | k3  | k3   | NULL  |
> | k4  | NULL | NULL  |
> +-+--+---+
> Fetched 

[jira] [Assigned] (IMPALA-10674) Update toolchain ORC libary for better Iceberg support

2021-04-23 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-10674:
--

Assignee: Zoltán Borók-Nagy

> Update toolchain ORC libary for better Iceberg support
> --
>
> Key: IMPALA-10674
> URL: https://issues.apache.org/jira/browse/IMPALA-10674
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend, Infrastructure
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>
> We need the following fixes/features from the ORC library:
> ORC-763: Fix timestamp inconsistencies with Java
> ORC-784: Support setting timezone to timestamp column
> ORC-666: Support timestamp with local timezone (this corresponds to the 
> Iceberg TIMESTAMPTZ type)
> ORC-781: Make type annotations available from C++ (this is needed for Iceberg 
> column resolution)
> To get these we need to upgrade/patch the ORC C++ library in the toolchain.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-9967) Scan orc failed when table contains timestamp column

2021-04-23 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-9967:
-

Assignee: Zoltán Borók-Nagy

> Scan orc failed when table contains timestamp column
> 
>
> Key: IMPALA-9967
> URL: https://issues.apache.org/jira/browse/IMPALA-9967
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: Sheng Wang
>Assignee: Zoltán Borók-Nagy
>Priority: Minor
>  Labels: impala-iceberg
> Attachments: 00031-31-26ff2064-c8f2-467f-ab7e-1949cb30d151-0.orc, 
> 00031-31-334beaba-ef4b-4d13-b338-e715cdf0ef85-0.orc
>
>
> Recently, when I tested Impala querying an ORC table, I found that scanning failed 
> when the table contains a timestamp column; here is the exception: 
> {code:java}
> I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] 
> Encountered parse error in tail of ORC file 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
>  Unknown type kind
> @  0x1c9f753  impala::Status::Status()
> @  0x27aa049  impala::HdfsOrcScanner::ProcessFileTail()
> @  0x27a7fb3  impala::HdfsOrcScanner::Open()
> @  0x27365fe  
> impala::HdfsScanNodeBase::CreateAndOpenScannerHelper()
> @  0x28cb379  impala::HdfsScanNode::ProcessSplit()
> @  0x28caa7d  impala::HdfsScanNode::ScannerThread()
> @  0x28c9de5  
> _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
> @  0x28cc19e  
> _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
> @  0x205  boost::function0<>::operator()()
> @  0x2675d93  impala::Thread::SuperviseThread()
> @  0x267dd30  boost::_bi::list5<>::operator()<>()
> @  0x267dc54  boost::_bi::bind_t<>::operator()()
> @  0x267dc15  boost::detail::thread_data<>::run()
> @  0x3e3c3c1  thread_proxy
> @ 0x7f32360336b9  start_thread
> @ 0x7f3232bfe41c  clone
> I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] 
> 68436a6e0883be84:53877f720002] Error preparing scanner for scan range 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582).
>  Encountered parse error in tail of ORC file 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
>  Unknown type kind
> {code}
> When I remove the timestamp column from the table and regenerate the test data, the 
> query succeeds. By the way, my test data is generated by Spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org


