[jira] [Commented] (IMPALA-12712) INVALIDATE METADATA <table> should set a better createEventId

2024-04-17 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838449#comment-17838449
 ] 

Quanlong Huang commented on IMPALA-12712:
-

[~hemanth619] I think we should prioritize this since it keeps failing the 
tests and it's a real issue that we should fix.

> INVALIDATE METADATA <table> should set a better createEventId
> -
>
> Key: IMPALA-12712
> URL: https://issues.apache.org/jira/browse/IMPALA-12712
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Sai Hemanth Gantasala
>Priority: Critical
>
> "INVALIDATE METADATA <table>" can be used to bring a table into Impala's 
> catalog cache if the table exists in HMS. For instance, when HMS event 
> processing is disabled, we can use it in Impala to bring up tables that were 
> created outside Impala.
> The createEventId for such tables is always set to -1:
> [https://github.com/apache/impala/blob/6ddd69c605d4c594e33fdd39a2ca888538b4b8d7/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2243-L2246]
> This is problematic when event processing is enabled. DropTable events and 
> RenameTable events use the createEventId to decide whether to remove the 
> table from the catalog cache. A createEventId of -1 leads to the table always 
> being removed. Though it might be added back shortly by a follow-up 
> CreateTable event, the table is missing in Impala in the interim, causing 
> test failures like IMPALA-12266.
> A simpler way to reproduce the issue is to create a table in Hive and launch 
> Impala with a long event polling interval to mimic the delay on events. Note 
> that we start the Impala cluster after creating the table so Impala doesn't 
> need to process the CREATE_TABLE event.
> {noformat}
> hive> create table debug_tbl (i int);
> bin/start-impala-cluster.py --catalogd_args=--hms_event_polling_interval_s=60
> {noformat}
> Drop the table in Impala and recreate it in Hive, so that it doesn't exist in 
> the catalog cache but exists in HMS. Run "INVALIDATE METADATA <table>" in 
> Impala to bring it back before the DROP_TABLE event arrives.
> {noformat}
> impala> drop table debug_tbl;
> hive> create table debug_tbl (i int, j int);
> impala> invalidate metadata debug_tbl;
> {noformat}
> The table will be dropped by the DROP_TABLE event and then added back by the 
> CREATE_TABLE event, as shown in the catalogd logs:
> {noformat}
> I0115 16:30:15.376713  3208 JniUtil.java:177] 
> 02457b6d5f174d1f:3bdeee14] Finished execDdl request: DROP_TABLE 
> default.debug_tbl issued by quanlong. Time spent: 417ms
> I0115 16:30:23.390962  3208 CatalogServiceCatalog.java:2777] 
> 1840bd101f78d611:22079a5a] Invalidating table metadata: 
> default.debug_tbl
> I0115 16:30:23.404150  3208 Table.java:234] 
> 1840bd101f78d611:22079a5a] createEventId_ for table: 
> default.debug_tbl set to: -1
> I0115 16:30:23.405138  3208 JniUtil.java:177] 
> 1840bd101f78d611:22079a5a] Finished resetMetadata request: INVALIDATE 
> TABLE default.debug_tbl issued by quanlong. Time spent: 17ms
> I0115 16:30:55.108006 32760 MetastoreEvents.java:637] EventId: 8668853 
> EventType: DROP_TABLE Successfully removed table default.debug_tbl
> I0115 16:30:55.108459 32760 MetastoreEvents.java:637] EventId: 8668855 
> EventType: CREATE_TABLE Successfully added table default.debug_tbl
> {noformat}
> CC [~VenuReddy], [~hemanth619]
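One possible direction for a fix, sketched below: instead of -1, record the 
latest HMS notification id at the time the table is loaded, so that older 
DROP_TABLE/RENAME_TABLE events are ignored for it. This is only an assumption 
about the fix, shown here with the third-party hmsclient Python package:
{code:python}
from hmsclient import hmsclient  # assumption: the 'hmsclient' pip package

# Fetch the current HMS notification id. Using it as the createEventId of a
# table brought up by INVALIDATE METADATA <table> would let event processing
# skip DROP_TABLE/RENAME_TABLE events that predate the load.
client = hmsclient.HMSClient(host='localhost', port=9083)
with client as c:
    event_id = c.get_current_notificationEventId().eventId
print("candidate createEventId:", event_id)
{code}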






[jira] [Commented] (IMPALA-13013) Table not found after CONVERT TO ICEBERG in load-data

2024-04-17 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838448#comment-17838448
 ] 

Quanlong Huang commented on IMPALA-13013:
-

This might be a similar issue to IMPALA-12266, which is caused by IMPALA-12712.

> Table not found after CONVERT TO ICEBERG in load-data
> -
>
> Key: IMPALA-13013
> URL: https://issues.apache.org/jira/browse/IMPALA-13013
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.4.0
>Reporter: Andrew Sherman
>Assignee: Zoltán Borók-Nagy
>Priority: Critical
>
> load-data fails with
> {code:java}
> 17:02:44 CREATE TABLE IF NOT EXISTS 
> functional_parquet.iceberg_lineitem_sixblocks
> 17:02:44 LIKE PARQUET 
> '/test-warehouse/lineitem_sixblocks_iceberg/lineitem_sixblocks.parquet'
> 17:02:44 STORED AS PARQUET
> 17:02:44 LOCATION '/test-warehouse/lineitem_sixblocks_iceberg/'
> 17:02:44 Summary: Returned 1 rows
> 17:02:44 Success: True
> 17:02:44 Took: 0.172425031662(s)
> 17:02:44 Data:
> 17:02:44 Table has been created.
> 17:02:44 
> 17:02:44 ALTER TABLE functional_parquet.iceberg_lineitem_sixblocks CONVERT TO 
> ICEBERG
> 17:02:44 Summary: Returned 1 rows
> 17:02:44 Success: True
> 17:02:44 Took: 1.37302303314(s)
> 17:02:44 Data:
> 17:02:44 Table has been migrated.
> 17:02:44 
> 17:02:44 ERROR: ALTER TABLE functional_parquet.iceberg_lineitem_sixblocks SET 
> TBLPROPERTIES ('format-version'='2')
> 17:02:44 Traceback (most recent call last):
> 17:02:44   File 
> "/data/jenkins/workspace/impala-cdw-master-staging-core/repos/Impala/bin/load-data.py",
>  line 196, in exec_impala_query_from_file
> 17:02:44 result = impala_client.execute(query)
> 17:02:44   File 
> "/data/jenkins/workspace/impala-cdw-master-staging-core/repos/Impala/tests/beeswax/impala_beeswax.py",
>  line 191, in execute
> 17:02:44 handle = self.__execute_query(query_string.strip(), user=user)
> 17:02:44   File 
> "/data/jenkins/workspace/impala-cdw-master-staging-core/repos/Impala/tests/beeswax/impala_beeswax.py",
>  line 382, in __execute_query
> 17:02:44 handle = self.execute_query_async(query_string, user=user)
> 17:02:44   File 
> "/data/jenkins/workspace/impala-cdw-master-staging-core/repos/Impala/tests/beeswax/impala_beeswax.py",
>  line 376, in execute_query_async
> 17:02:44 handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> 17:02:44   File 
> "/data/jenkins/workspace/impala-cdw-master-staging-core/repos/Impala/tests/beeswax/impala_beeswax.py",
>  line 539, in __do_rpc
> 17:02:44 raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> 17:02:44 ImpalaBeeswaxException: ImpalaBeeswaxException:
> 17:02:44  INNER EXCEPTION: 
> 17:02:44  MESSAGE: AnalysisException: Could not resolve table reference: 
> 'functional_parquet.iceberg_lineitem_sixblocks'
> {code}
> IMPALA-12330 might help avoid this problem.
> [~boroknagyz] assigning to you for triage, please distribute as appropriate.






[jira] [Updated] (IMPALA-12712) INVALIDATE METADATA <table> should set a better createEventId

2024-04-17 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-12712:

Priority: Critical  (was: Major)

> INVALIDATE METADATA <table> should set a better createEventId
> -
>
> Key: IMPALA-12712
> URL: https://issues.apache.org/jira/browse/IMPALA-12712
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Sai Hemanth Gantasala
>Priority: Critical
>
> "INVALIDATE METADATA <table>" can be used to bring a table into Impala's 
> catalog cache if the table exists in HMS. For instance, when HMS event 
> processing is disabled, we can use it in Impala to bring up tables that were 
> created outside Impala.
> The createEventId for such tables is always set to -1:
> [https://github.com/apache/impala/blob/6ddd69c605d4c594e33fdd39a2ca888538b4b8d7/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2243-L2246]
> This is problematic when event processing is enabled. DropTable events and 
> RenameTable events use the createEventId to decide whether to remove the 
> table from the catalog cache. A createEventId of -1 leads to the table always 
> being removed. Though it might be added back shortly by a follow-up 
> CreateTable event, the table is missing in Impala in the interim, causing 
> test failures like IMPALA-12266.
> A simpler way to reproduce the issue is to create a table in Hive and launch 
> Impala with a long event polling interval to mimic the delay on events. Note 
> that we start the Impala cluster after creating the table so Impala doesn't 
> need to process the CREATE_TABLE event.
> {noformat}
> hive> create table debug_tbl (i int);
> bin/start-impala-cluster.py --catalogd_args=--hms_event_polling_interval_s=60
> {noformat}
> Drop the table in Impala and recreate it in Hive, so that it doesn't exist in 
> the catalog cache but exists in HMS. Run "INVALIDATE METADATA <table>" in 
> Impala to bring it back before the DROP_TABLE event arrives.
> {noformat}
> impala> drop table debug_tbl;
> hive> create table debug_tbl (i int, j int);
> impala> invalidate metadata debug_tbl;
> {noformat}
> The table will be dropped by the DROP_TABLE event and then added back by the 
> CREATE_TABLE event, as shown in the catalogd logs:
> {noformat}
> I0115 16:30:15.376713  3208 JniUtil.java:177] 
> 02457b6d5f174d1f:3bdeee14] Finished execDdl request: DROP_TABLE 
> default.debug_tbl issued by quanlong. Time spent: 417ms
> I0115 16:30:23.390962  3208 CatalogServiceCatalog.java:2777] 
> 1840bd101f78d611:22079a5a] Invalidating table metadata: 
> default.debug_tbl
> I0115 16:30:23.404150  3208 Table.java:234] 
> 1840bd101f78d611:22079a5a] createEventId_ for table: 
> default.debug_tbl set to: -1
> I0115 16:30:23.405138  3208 JniUtil.java:177] 
> 1840bd101f78d611:22079a5a] Finished resetMetadata request: INVALIDATE 
> TABLE default.debug_tbl issued by quanlong. Time spent: 17ms
> I0115 16:30:55.108006 32760 MetastoreEvents.java:637] EventId: 8668853 
> EventType: DROP_TABLE Successfully removed table default.debug_tbl
> I0115 16:30:55.108459 32760 MetastoreEvents.java:637] EventId: 8668855 
> EventType: CREATE_TABLE Successfully added table default.debug_tbl
> {noformat}
> CC [~VenuReddy], [~hemanth619]






[jira] [Created] (IMPALA-13013) Table not found after CONVERT TO ICEBERG in load-data

2024-04-17 Thread Andrew Sherman (Jira)
Andrew Sherman created IMPALA-13013:
---

 Summary: Table not found after CONVERT TO ICEBERG in load-data
 Key: IMPALA-13013
 URL: https://issues.apache.org/jira/browse/IMPALA-13013
 Project: IMPALA
  Issue Type: Bug
Affects Versions: Impala 4.4.0
Reporter: Andrew Sherman
Assignee: Zoltán Borók-Nagy


load-data fails with
{code:java}
17:02:44 CREATE TABLE IF NOT EXISTS 
functional_parquet.iceberg_lineitem_sixblocks
17:02:44 LIKE PARQUET 
'/test-warehouse/lineitem_sixblocks_iceberg/lineitem_sixblocks.parquet'
17:02:44 STORED AS PARQUET
17:02:44 LOCATION '/test-warehouse/lineitem_sixblocks_iceberg/'
17:02:44 Summary: Returned 1 rows
17:02:44 Success: True
17:02:44 Took: 0.172425031662(s)
17:02:44 Data:
17:02:44 Table has been created.
17:02:44 
17:02:44 ALTER TABLE functional_parquet.iceberg_lineitem_sixblocks CONVERT TO 
ICEBERG
17:02:44 Summary: Returned 1 rows
17:02:44 Success: True
17:02:44 Took: 1.37302303314(s)
17:02:44 Data:
17:02:44 Table has been migrated.
17:02:44 
17:02:44 ERROR: ALTER TABLE functional_parquet.iceberg_lineitem_sixblocks SET 
TBLPROPERTIES ('format-version'='2')
17:02:44 Traceback (most recent call last):
17:02:44   File 
"/data/jenkins/workspace/impala-cdw-master-staging-core/repos/Impala/bin/load-data.py",
 line 196, in exec_impala_query_from_file
17:02:44 result = impala_client.execute(query)
17:02:44   File 
"/data/jenkins/workspace/impala-cdw-master-staging-core/repos/Impala/tests/beeswax/impala_beeswax.py",
 line 191, in execute
17:02:44 handle = self.__execute_query(query_string.strip(), user=user)
17:02:44   File 
"/data/jenkins/workspace/impala-cdw-master-staging-core/repos/Impala/tests/beeswax/impala_beeswax.py",
 line 382, in __execute_query
17:02:44 handle = self.execute_query_async(query_string, user=user)
17:02:44   File 
"/data/jenkins/workspace/impala-cdw-master-staging-core/repos/Impala/tests/beeswax/impala_beeswax.py",
 line 376, in execute_query_async
17:02:44 handle = self.__do_rpc(lambda: self.imp_service.query(query,))
17:02:44   File 
"/data/jenkins/workspace/impala-cdw-master-staging-core/repos/Impala/tests/beeswax/impala_beeswax.py",
 line 539, in __do_rpc
17:02:44 raise ImpalaBeeswaxException(self.__build_error_message(b), b)
17:02:44 ImpalaBeeswaxException: ImpalaBeeswaxException:
17:02:44  INNER EXCEPTION: 
17:02:44  MESSAGE: AnalysisException: Could not resolve table reference: 
'functional_parquet.iceberg_lineitem_sixblocks'
{code}
IMPALA-12330 might help avoid this problem.

[~boroknagyz] assigning to you for triage, please distribute as appropriate.






[jira] [Assigned] (IMPALA-13012) Completed queries write fails regularly under heavy load

2024-04-17 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith reassigned IMPALA-13012:
--

Assignee: Michael Smith

> Completed queries write fails regularly under heavy load
> 
>
> Key: IMPALA-13012
> URL: https://issues.apache.org/jira/browse/IMPALA-13012
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>
> Under heavy test load (running EE tests), Impala regularly fails to write 
> completed queries with errors like
> {code}
> W0411 19:11:07.764967 32713 workload-management.cc:435] failed to write 
> completed queries table="sys.impala_query_log" record_count="10001"
> W0411 19:11:07.764983 32713 workload-management.cc:437] AnalysisException: 
> Exceeded the statement expression limit (250000)
> Statement has 370039 expressions.
> {code}
> After a few attempts, it floods logs with an error for each query that could 
> not be written
> {code}
> E0411 19:11:24.646953 32713 workload-management.cc:376] could not write 
> completed query table="sys.impala_query_log" 
> query_id="3142ceb1380b58e6:715b83d9"
> {code}
> This seems like poor default behavior. Options for addressing it:
> # Decrease the default for {{query_log_max_queued}}. Inserts are pretty 
> constant at 37 expressions per entry. I'm not sure why that isn't 49, since 
> that's the number of columns we have; maybe some fields are frequently 
> omitted. I would cap {{query_log_max_queued}} to {{statement_expression_limit 
> / number_of_columns ~ 5100}}.
> # Allow workload management to {{set statement_expression_limit}} higher 
> using a similar formula. This may be relatively safe as the expressions are 
> simple.
> # Ideally we would skip expression parsing and construct TExecRequest 
> directly, but that's a much larger effort.






[jira] [Created] (IMPALA-13012) Completed queries write fails regularly under heavy load

2024-04-17 Thread Michael Smith (Jira)
Michael Smith created IMPALA-13012:
--

 Summary: Completed queries write fails regularly under heavy load
 Key: IMPALA-13012
 URL: https://issues.apache.org/jira/browse/IMPALA-13012
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 4.4.0
Reporter: Michael Smith


Under heavy test load (running EE tests), Impala regularly fails to write 
completed queries with errors like
{code}
W0411 19:11:07.764967 32713 workload-management.cc:435] failed to write 
completed queries table="sys.impala_query_log" record_count="10001"
W0411 19:11:07.764983 32713 workload-management.cc:437] AnalysisException: 
Exceeded the statement expression limit (250000)
Statement has 370039 expressions.
{code}

After a few attempts, it floods logs with an error for each query that could 
not be written
{code}
E0411 19:11:24.646953 32713 workload-management.cc:376] could not write 
completed query table="sys.impala_query_log" 
query_id="3142ceb1380b58e6:715b83d9"
{code}

This seems like poor default behavior. Options for addressing it:
# Decrease the default for {{query_log_max_queued}}. Inserts are pretty 
constant at 37 expressions per entry. I'm not sure why that isn't 49, since 
that's the number of columns we have; maybe some fields are frequently omitted. 
I would cap {{query_log_max_queued}} to {{statement_expression_limit / 
number_of_columns ~ 5100}} (see the sketch after this list).
# Allow workload management to {{set statement_expression_limit}} higher using 
a similar formula. This may be relatively safe as the expressions are simple.
# Ideally we would skip expression parsing and construct TExecRequest directly, 
but that's a much larger effort.
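The arithmetic behind option 1, as a quick sketch (the 250000 limit and the 49 
columns are the values discussed above):
{code:python}
# Back-of-the-envelope check for the proposed cap on query_log_max_queued.
statement_expression_limit = 250000  # default statement_expression_limit
num_columns = 49                     # columns in sys.impala_query_log
observed_exprs_per_entry = 37        # what the inserts were observed to use

print(statement_expression_limit // num_columns)               # 5102, the ~5100 cap
print(statement_expression_limit // observed_exprs_per_entry)  # 6756 with 37/entry
{code}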






[jira] [Commented] (IMPALA-13009) Possible leak of partition updates when the table has failed DDL and recovered by INVALIDATE METADATA

2024-04-17 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838407#comment-17838407
 ] 

Quanlong Huang commented on IMPALA-13009:
-

Found an easier way to reproduce the issue. Start Impala with event processing 
disabled:
{code:bash}
bin/start-impala-cluster.py --catalogd_args=--hms_event_polling_interval_s=0 
{code}
Prepare a partitioned table with some partitions:
{code:sql}
create table if not exists my_part(i int) partitioned by (p int);
insert into my_part partition(p) values (1,1),(2,2),(3,3);{code}
Run these in one command so the following statements are executed immediately 
after each other:
{code:sql}
alter table my_part drop partition(p>0); invalidate metadata my_part; show 
partitions my_part;{code}
Restarting any impalad will show the stale partitions.

> Possible leak of partition updates when the table has failed DDL and 
> recovered by INVALIDATE METADATA
> -
>
> Key: IMPALA-13009
> URL: https://issues.apache.org/jira/browse/IMPALA-13009
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> Catalogd might not send partition deletions to the catalog topic in the 
> following scenario:
> * Partitions of a table are dropped outside Impala.
> * The table dir is also removed on HDFS.
> * ALTER TABLE RECOVER PARTITIONS fails with a FileNotFoundException on the 
> table dir.
> * A subsequent INVALIDATE METADATA on the same table succeeds in invalidating 
> the table.
> After the INVALIDATE finishes, catalogd might not send deletions of the 
> dropped partitions to the catalog topic. Then the catalog topic only has the 
> updates of those partitions, with no deletions.
> This will be detected when a coordinator restarts:
> {noformat}
> E0417 16:41:22.317298 20746 ImpaladCatalog.java:264] Error adding catalog 
> object: Received stale partition in a statestore update: 
> THdfsPartition(partitionKeyExprs:[TExpr(nodes:[TExprNode(node_type:INT_LITERAL,
>  type:TColumnType(types:[TTypeNode(type:SCALAR, 
> scalar_type:TScalarType(type:INT))]), num_children:0, is_constant:true, 
> int_literal:TIntLiteral(value:106), is_codegen_disabled:false)])], 
> location:THdfsPartitionLocation(prefix_index:0, suffix:p=106), id:138, 
> file_desc:[THdfsFileDesc(file_desc_data:18 00 00 00 00 00 00 00 00 00 0E 00 
> 1C 00 18 00 10 00 00 00 08 00 04 00 0E 00 00 00 18 00 00 00 8B 0E 2D EB 8E 01 
> 00 00 04 00 00 00 00 00 00 00 0C 00 00 00 01 00 00 00 4C 00 00 00 36 00 00 00 
> 34 34 34 37 62 35 66 34 62 30 65 64 66 64 65 31 2D 32 33 33 61 64 62 38 35 30 
> 30 30 30 30 30 30 30 5F 36 36 34 31 30 39 33 37 33 5F 64 61 74 61 2E 30 2E 74 
> 78 74 00 00 0C 00 14 00 00 00 0C 00...)], access_level:READ_WRITE, 
> stats:TTableStats(num_rows:-1), is_marked_cached:false, 
> hms_parameters:{transient_lastDdlTime=1713342582, totalSize=4, 
> numFilesErasureCoded=0, numFiles=1}, num_blocks:1, total_file_size_bytes:4, 
> has_incremental_stats:false, write_id:0, db_name:default, tbl_name:my_part, 
> partition_name:p=106, 
> hdfs_storage_descriptor:THdfsStorageDescriptor(lineDelim:10, fieldDelim:1, 
> collectionDelim:1, mapKeyDelim:1, escapeChar:0, quoteChar:1, fileFormat:TEXT, 
> blockSize:0))
> Java exception follows:
> java.lang.IllegalStateException: Received stale partition in a statestore 
> update: 
> THdfsPartition(partitionKeyExprs:[TExpr(nodes:[TExprNode(node_type:INT_LITERAL,
>  type:TColumnType(types:[TTypeNode(type:SCALAR, 
> scalar_type:TScalarType(type:INT))]), num_children:0, is_constant:true, 
> int_literal:TIntLiteral(value:106), is_codegen_disabled:false)])], 
> location:THdfsPartitionLocation(prefix_index:0, suffix:p=106), id:138, 
> file_desc:[THdfsFileDesc(file_desc_data:18 00 00 00 00 00 00 00 00 00 0E 00 
> 1C 00 18 00 10 00 00 00 08 00 04 00 0E 00 00 00 18 00 00 00 8B 0E 2D EB 8E 01 
> 00 00 04 00 00 00 00 00 00 00 0C 00 00 00 01 00 00 00 4C 00 00 00 36 00 00 00 
> 34 34 34 37 62 35 66 34 62 30 65 64 66 64 65 31 2D 32 33 33 61 64 62 38 35 30 
> 30 30 30 30 30 30 30 5F 36 36 34 31 30 39 33 37 33 5F 64 61 74 61 2E 30 2E 74 
> 78 74 00 00 0C 00 14 00 00 00 0C 00...)], access_level:READ_WRITE, 
> stats:TTableStats(num_rows:-1), is_marked_cached:false, 
> hms_parameters:{transient_lastDdlTime=1713342582, totalSize=4, 
> numFilesErasureCoded=0, numFiles=1}, num_blocks:1, total_file_size_bytes:4, 
> has_incremental_stats:false, write_id:0, db_name:default, tbl_name:my_part, 
> partition_name:p=106, 
> hdfs_storage_descriptor:THdfsStorageDescriptor(lineDelim:10, fieldDelim:1, 
> collectionDelim:1, mapKeyDelim:1, escapeChar:0, quoteChar:1, fileFormat:TEXT, 
> blockSize:0))
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:512)
>  

[jira] [Commented] (IMPALA-13009) Possible leak of partition updates when the table has failed DDL and recovered by INVALIDATE METADATA

2024-04-17 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838405#comment-17838405
 ] 

Fang-Yu Rao commented on IMPALA-13009:
--

Thanks for the detailed steps to reproduce the issue [~stigahuang]!

I have tried your latest script at 
https://issues.apache.org/jira/browse/IMPALA-13009?focusedCommentId=17838211&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17838211
and found that I could also reproduce the issue after restarting only the 
Impala daemons (via *bin/start-impala-cluster.py -r*), even though we don't 
have the command that removes the HDFS path from outside of Impala. I was 
using Apache Impala on a recent master where the tip commit is IMPALA-12996 
(Add support for DATE in Iceberg metadata tables).
{code:java}
I0417 16:06:57.716398 16131 ImpaladCatalog.java:232] Adding: 
TABLE:default.my_part version: 1723 size: 1557
I0417 16:06:57.719789 16131 ImpaladCatalog.java:232] Adding: CATALOG_SERVICE_ID 
version: 1723 size: 60
I0417 16:06:57.720358 16131 ImpaladCatalog.java:257] Adding 9 partition(s): 
HDFS_PARTITION:default.my_part:(p=1,p=2,...,p=9), versions=[1706, 1712, 1718], 
size=(avg=588, min=588, max=588, sum=5292)
E0417 16:06:57.917488 16131 ImpaladCatalog.java:264] Error adding catalog 
object: Received stale partition in a statestore update: 
THdfsPartition(partitionKeyExprs:[TExpr(nodes:[TExprNode(node_type:INT_LITERAL, 
type:TColumnType(types:[TTypeNode(type:SCALAR, 
scalar_type:TScalarType(type:INT))]), num_children:0, is_constant:true, 
int_literal:TIntLiteral(value:1), is_codegen_disabled:false)])], 
location:THdfsPartitionLocation(prefix_index:0, suffix:p=1), id:0, 
file_desc:[THdfsFileDesc(file_desc_data:18 00 00 00 00 00 00 00 00 00 0E 00 1C 
00 18 00 10 00 00 00 08 00 04 00 0E 00 00 00 18 00 00 00 A9 E7 4F EE 8E 01 00 
00 02 00 00 00 00 00 00 00 0C 00 00 00 01 00 00 00 4C 00 00 00 37 00 00 00 61 
61 34 36 34 66 61 66 35 61 31 37 36 65 39 65 2D 36 63 66 31 63 38 34 61 30 30 
30 30 30 30 30 30 5F 31 37 31 31 36 38 30 30 38 32 5F 64 61 74 61 2E 30 2E 74 
78 74 00 0C 00 14 00 00 00 0C 00...)], access_level:READ_WRITE, 
stats:TTableStats(num_rows:-1), is_marked_cached:false, 
hms_parameters:{transient_lastDdlTime=1713395198, totalSize=2, 
numFilesErasureCoded=0, numFiles=1}, num_blocks:1, total_file_size_bytes:2, 
has_incremental_stats:false, write_id:0, db_name:default, tbl_name:my_part, 
partition_name:p=1, 
hdfs_storage_descriptor:THdfsStorageDescriptor(lineDelim:10, fieldDelim:1, 
collectionDelim:1, mapKeyDelim:1, escapeChar:0, quoteChar:1, fileFormat:TEXT, 
blockSize:0))
Java exception follows:
java.lang.IllegalStateException: Received stale partition in a statestore 
update: 
THdfsPartition(partitionKeyExprs:[TExpr(nodes:[TExprNode(node_type:INT_LITERAL, 
type:TColumnType(types:[TTypeNode(type:SCALAR, 
scalar_type:TScalarType(type:INT))]), num_children:0, is_constant:true, 
int_literal:TIntLiteral(value:1), is_codegen_disabled:false)])], 
location:THdfsPartitionLocation(prefix_index:0, suffix:p=1), id:0, 
file_desc:[THdfsFileDesc(file_desc_data:18 00 00 00 00 00 00 00 00 00 0E 00 1C 
00 18 00 10 00 00 00 08 00 04 00 0E 00 00 00 18 00 00 00 A9 E7 4F EE 8E 01 00 
00 02 00 00 00 00 00 00 00 0C 00 00 00 01 00 00 00 4C 00 00 00 37 00 00 00 61 
61 34 36 34 66 61 66 35 61 31 37 36 65 39 65 2D 36 63 66 31 63 38 34 61 30 30 
30 30 30 30 30 30 5F 31 37 31 31 36 38 30 30 38 32 5F 64 61 74 61 2E 30 2E 74 
78 74 00 0C 00 14 00 00 00 0C 00...)], access_level:READ_WRITE, 
stats:TTableStats(num_rows:-1), is_marked_cached:false, 
hms_parameters:{transient_lastDdlTime=1713395198, totalSize=2, 
numFilesErasureCoded=0, numFiles=1}, num_blocks:1, total_file_size_bytes:2, 
has_incremental_stats:false, write_id:0, db_name:default, tbl_name:my_part, 
partition_name:p=1, 
hdfs_storage_descriptor:THdfsStorageDescriptor(lineDelim:10, fieldDelim:1, 
collectionDelim:1, mapKeyDelim:1, escapeChar:0, quoteChar:1, fileFormat:TEXT, 
blockSize:0))
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:512)
at 
org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:523)
at 
org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334)
at 
org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262)
at 
org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:120)
at 
org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:565)
at 
org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196)
{code}

> Possible leak of partition updates when the table has failed DDL and 
> recovered by INVALIDATE METADATA
> -
>
> Key: IMPALA-13009
> URL: 

[jira] [Resolved] (IMPALA-12689) Toolchain TPC-H and TPC-DS binaries are not built with optimizations

2024-04-17 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-12689.

Fix Version/s: Impala 4.4.0
   Resolution: Fixed

> Toolchain TPC-H and TPC-DS binaries are not built with optimizations
> 
>
> Key: IMPALA-12689
> URL: https://issues.apache.org/jira/browse/IMPALA-12689
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.4.0
>
>
> The tpc-h and tpc-ds components of the toolchain do not enable any kind of 
> compiler optimization flags. This is irrelevant to Impala's shipped binary, 
> but it does impact the performance of the data generators for TPC-H and 
> TPC-DS. Turning on -O3 seems to improve the data generation time by ~25%.
> {noformat}
> # TPC-H 
> # Unoptimized
> $ time ./dbgen -f -s 42
> TPC-H Population Generator (Version 2.17.0)
> Copyright Transaction Processing Performance Council 1994 - 2010
> real    4m46.269s
> user    4m20.982s
> sys     0m19.390s
> # -O3
> $ time ./dbgen -f -s 42
> TPC-H Population Generator (Version 2.17.0)
> Copyright Transaction Processing Performance Council 1994 - 2010
> real    3m46.379s
> user    3m23.721s
> sys     0m18.436s
> # TPC-DS ###
> # Unoptimized
> $ time ./dsdgen -force -scale 20
> DBGEN2 Population Generator (Version 2.0.0)
> Copyright Transaction Processing Performance Council (TPC) 2001 - 2015
> Warning: Selected scale factor is NOT valid for result publication
> real    9m41.441s
> user    8m3.447s
> sys     1m37.944s
> # -O3
> $ time ./dsdgen -force -scale 20
> DBGEN2 Population Generator (Version 2.0.0)
> Copyright Transaction Processing Performance Council (TPC) 2001 - 2015
> Warning: Selected scale factor is NOT valid for result publication
> real    7m25.017s
> user    5m48.487s
> sys     1m36.265s
> {noformat}
> We should modify the toolchain to add -O3 to these builds.






[jira] [Assigned] (IMPALA-12689) Toolchain TPC-H and TPC-DS binaries are not built with optimizations

2024-04-17 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-12689:
--

Assignee: Joe McDonnell

> Toolchain TPC-H and TPC-DS binaries are not built with optimizations
> 
>
> Key: IMPALA-12689
> URL: https://issues.apache.org/jira/browse/IMPALA-12689
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> The tpc-h and tpc-ds components of the toolchain do not enable any kind of 
> compiler optimization flags. This is irrelevant to Impala's shipped binary, 
> but it does impact the performance of the data generators for TPC-H and 
> TPC-DS. Turning on -O3 seems to improve the data generation time by ~25%.
> {noformat}
> # TPC-H 
> # Unoptimized
> $ time ./dbgen -f -s 42
> TPC-H Population Generator (Version 2.17.0)
> Copyright Transaction Processing Performance Council 1994 - 2010
> real    4m46.269s
> user    4m20.982s
> sys     0m19.390s
> # -O3
> $ time ./dbgen -f -s 42
> TPC-H Population Generator (Version 2.17.0)
> Copyright Transaction Processing Performance Council 1994 - 2010
> real    3m46.379s
> user    3m23.721s
> sys     0m18.436s
> # TPC-DS ###
> # Unoptimized
> $ time ./dsdgen -force -scale 20
> DBGEN2 Population Generator (Version 2.0.0)
> Copyright Transaction Processing Performance Council (TPC) 2001 - 2015
> Warning: Selected scale factor is NOT valid for result publication
> real    9m41.441s
> user    8m3.447s
> sys     1m37.944s
> # -O3
> $ time ./dsdgen -force -scale 20
> DBGEN2 Population Generator (Version 2.0.0)
> Copyright Transaction Processing Performance Council (TPC) 2001 - 2015
> Warning: Selected scale factor is NOT valid for result publication
> real    7m25.017s
> user    5m48.487s
> sys     1m36.265s
> {noformat}
> We should modify the toolchain to add -O3 to these builds.






[jira] [Commented] (IMPALA-12689) Toolchain TPC-H and TPC-DS binaries are not built with optimizations

2024-04-17 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838398#comment-17838398
 ] 

Joe McDonnell commented on IMPALA-12689:


Fixed by:
{noformat}
commit cd9260e5276d0e342b21869c51e71aea9643504c
Author: Joe McDonnell 
Date:   Thu Feb 15 18:22:15 2024 -0800

    IMPALA-12689: Change TPC-H and TPC-DS builds to respect CFLAGS
    
    The TPC-H and TPC-DS builds currently do not respect the
    CFLAGS environment variable, so they don't incorporate the
    values that we set in init-compiler.sh.
    
    This modifies the build scripts for TPC-H and TPC-DS to
    patch their makefiles to add our CFLAGS. This has the
    side effect of turning on -O3 optimization, resulting
    in faster binaries used to generate the TPC-H and
    TPC-DS datasets:
    
    TPC-H's dbgen at scale 42:
    Unoptimized: 4m46.269s
    Optimized: 3m46.379s
    
    TPC-DS's dsdgen at scale 20:
    Unoptimized: 9m41.441s
    Optimized: 7m25.017s
    
    Testing:
     - Ran a build and verified that the flags include our
       CFLAGS value
    
    Change-Id: I3f999b71c56a72c14f1beeea99a3689b82a4d45a
    Reviewed-on: http://gerrit.cloudera.org:8080/2
    Reviewed-by: Michael Smith 
    Tested-by: Joe McDonnell 
{noformat}
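As a rough standalone illustration of that patching approach (the makefile 
content below is a stand-in, not the real TPC-H makefile):
{code:python}
import os
import re

# Splice the environment CFLAGS into a vendored makefile that otherwise
# ignores it; this mirrors how the toolchain change lets the -O3 from
# init-compiler.sh reach the dbgen/dsdgen builds.
cflags = os.environ.get("CFLAGS", "-O3")
makefile = "CC = gcc\nCFLAGS =\nall: dbgen\n"  # stand-in content
patched = re.sub(r"(?m)^CFLAGS\s*=.*$", "CFLAGS = " + cflags, makefile)
print(patched)
{code}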

> Toolchain TPC-H and TPC-DS binaries are not built with optimizations
> 
>
> Key: IMPALA-12689
> URL: https://issues.apache.org/jira/browse/IMPALA-12689
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Priority: Major
>
> The tpc-h and tpc-ds components of the toolchain do not enable any kind of 
> compiler optimization flags. This is irrelevant to Impala's shipped binary, 
> but it does impact the performance of the data generators for TPC-H and 
> TPC-DS. Turning on -O3 seems to improve the data generation time by ~25%.
> {noformat}
> # TPC-H 
> # Unoptimized
> $ time ./dbgen -f -s 42
> TPC-H Population Generator (Version 2.17.0)
> Copyright Transaction Processing Performance Council 1994 - 2010
> real    4m46.269s
> user    4m20.982s
> sys     0m19.390s
> # -O3
> $ time ./dbgen -f -s 42
> TPC-H Population Generator (Version 2.17.0)
> Copyright Transaction Processing Performance Council 1994 - 2010
> real    3m46.379s
> user    3m23.721s
> sys     0m18.436s
> # TPC-DS ###
> # Unoptimized
> $ time ./dsdgen -force -scale 20
> DBGEN2 Population Generator (Version 2.0.0)
> Copyright Transaction Processing Performance Council (TPC) 2001 - 2015
> Warning: Selected scale factor is NOT valid for result publication
> real    9m41.441s
> user    8m3.447s
> sys     1m37.944s
> # -O3
> $ time ./dsdgen -force -scale 20
> DBGEN2 Population Generator (Version 2.0.0)
> Copyright Transaction Processing Performance Council (TPC) 2001 - 2015
> Warning: Selected scale factor is NOT valid for result publication
> real    7m25.017s
> user    5m48.487s
> sys     1m36.265s
> {noformat}
> We should modify the toolchain to add -O3 to these builds.






[jira] [Commented] (IMPALA-12900) Compile binutils with -O3 in the toolchain

2024-04-17 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838396#comment-17838396
 ] 

Joe McDonnell commented on IMPALA-12900:


Fixed by
{noformat}
commit ce8bc71ff7e0cfc0c39511308df8841f39fb03d2
Author: Joe McDonnell 
Date:   Tue Mar 12 21:47:12 2024 -0700

    IMPALA-12900: Build binutils with -O3
    
    The binutils build happens before we have switched over to
    using the toolchain compiler. This means that it also does
    not set CFLAGS/CXXFLAGS. The default optimization level
    for binutils is -O2. It is possible that we could get a bit
    extra speed by using -O3, so this sets CFLAGS/CXXFLAGS to use
    -O3 for binutils.
    
    Testing:
     - Toolchain builds on x86_64 and ARM
    
    Change-Id: I2e75db0759b4d3d4e6cc2ce929b1741808f1b771
    Reviewed-on: http://gerrit.cloudera.org:8080/21145
    Reviewed-by: Michael Smith 
    Reviewed-by: Laszlo Gaal 
    Tested-by: Joe McDonnell 
{noformat}

> Compile binutils with -O3 in the toolchain
> --
>
> Key: IMPALA-12900
> URL: https://issues.apache.org/jira/browse/IMPALA-12900
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> Since the toolchain builds binutils with the native compiler (as the 
> toolchain compiler hasn't been built yet), we haven't set CFLAGS yet. The 
> default CFLAGS for binutils use -O2. It's possible that we could get a bit 
> more speed by building with -O3. We should set CFLAGS/CXXFLAGS to use -O3.






[jira] [Assigned] (IMPALA-12900) Compile binutils with -O3 in the toolchain

2024-04-17 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-12900:
--

Assignee: Joe McDonnell

> Compile binutils with -O3 in the toolchain
> --
>
> Key: IMPALA-12900
> URL: https://issues.apache.org/jira/browse/IMPALA-12900
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> Since the toolchain builds binutils with the native compiler (as the 
> toolchain compiler hasn't been built yet), we haven't set CFLAGS yet. The 
> default CFLAGS for binutils use -O2. It's possible that we could get a bit 
> more speed by building with -O3. We should set CFLAGS/CXXFLAGS to use -O3.






[jira] [Resolved] (IMPALA-12900) Compile binutils with -O3 in the toolchain

2024-04-17 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-12900.

Fix Version/s: Impala 4.4.0
   Resolution: Fixed

> Compile binutils with -O3 in the toolchain
> --
>
> Key: IMPALA-12900
> URL: https://issues.apache.org/jira/browse/IMPALA-12900
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.4.0
>
>
> Since the toolchain builds binutils with the native compiler (as the 
> toolchain compiler hasn't been built yet), we haven't set CFLAGS yet. The 
> default CFLAGS for binutils use -O2. It's possible that we could get a bit 
> more speed by building with -O3. We should set CFLAGS/CXXFLAGS to use -O3.






[jira] [Commented] (IMPALA-12990) impala-shell broken if Iceberg delete deletes 0 rows

2024-04-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838381#comment-17838381
 ] 

ASF subversion and git services commented on IMPALA-12990:
--

Commit 541fc5ee9ec2d804f2ba45feb2df5bb96a013f86 in impala's branch 
refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=541fc5ee9 ]

IMPALA-12990: Fix impala-shell handling of unset rows_deleted

The issue occurred in Python 3 when 0 rows were deleted from Iceberg.
It could also happen in other DMLs with older Impala servers where
TDmlResult.rows_deleted was not set. See the Jira for details of
the error.

Testing:
Extended shell tests for Kudu DML reporting to also cover Iceberg.

Change-Id: I5812b8006b9cacf34a7a0dbbc89a486d8b454438
Reviewed-on: http://gerrit.cloudera.org:8080/21284
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> impala-shell broken if Iceberg delete deletes 0 rows
> 
>
> Key: IMPALA-12990
> URL: https://issues.apache.org/jira/browse/IMPALA-12990
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Reporter: Csaba Ringhofer
>Assignee: Csaba Ringhofer
>Priority: Major
>  Labels: iceberg
>
> Happens only with Python 3
> {code}
> impala-python3 shell/impala_shell.py
> create table icebergupdatet (i int, s string) stored as iceberg;
> alter table icebergupdatet set tblproperties("format-version"="2");
> delete from icebergupdatet where i=0;
> Unknown Exception : '>' not supported between instances of 'NoneType' and 
> 'int'
> Traceback (most recent call last):
>   File "shell/impala_shell.py", line 1428, in _execute_stmt
> if is_dml and num_rows == 0 and num_deleted_rows > 0:
> TypeError: '>' not supported between instances of 'NoneType' and 'int'
> {code}
> The same error should also happen when the delete removes > 0 rows but the 
> Impala server has an older version that doesn't set TDmlResult.rows_deleted.
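The shape of the fix, as a small sketch (variable names mirror the traceback; 
this is not the exact patch):
{code:python}
def deleted_rows(rows_deleted):
    """Treat an unset (None) TDmlResult.rows_deleted as 0."""
    return rows_deleted if rows_deleted is not None else 0

# The comparison from the traceback, now safe when rows_deleted is None:
is_dml, num_rows = True, 0
for num_deleted_rows in (None, 0, 3):
    if is_dml and num_rows == 0 and deleted_rows(num_deleted_rows) > 0:
        print("would report deleted rows:", num_deleted_rows)
{code}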






[jira] [Created] (IMPALA-13011) Need to remove awkward Authorization instantiation.

2024-04-17 Thread Steve Carlin (Jira)
Steve Carlin created IMPALA-13011:
-

 Summary: Need to remove awkward Authorization instantiation.
 Key: IMPALA-13011
 URL: https://issues.apache.org/jira/browse/IMPALA-13011
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Steve Carlin


There is a reference to the Authorization instance in CalcitePhysPlanCreator in 
order to instantiate the Analyzer object.


Authorization needs to happen earlier. This should be refactored so that it is 
not referenced in this part of the code.






[jira] [Assigned] (IMPALA-12754) Update Impala document to cover external jdbc data source

2024-04-17 Thread Manish Maheshwari (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manish Maheshwari reassigned IMPALA-12754:
--

Assignee: gaurav singh

> Update Impala document to cover external jdbc data source
> -
>
> Key: IMPALA-12754
> URL: https://issues.apache.org/jira/browse/IMPALA-12754
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Wenzhe Zhou
>Assignee: gaurav singh
>Priority: Major
>
> Impala external data sources are undocumented upstream. We need to document 
> the external data source APIs and the SQL syntax to create a JDBC data 
> source, show data sources, and create a table for an external JDBC data 
> source, including the properties to be set for JDBC and DBCP (database 
> connection pooling).
>  






[jira] [Assigned] (IMPALA-12993) Encrypt password in JDBC table properties when saving into HMS DB

2024-04-17 Thread Manish Maheshwari (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manish Maheshwari reassigned IMPALA-12993:
--

Assignee: Pranav Yogi Lodha

> Encrypt password in JDBC table properties when saving into HMS DB
> -
>
> Key: IMPALA-12993
> URL: https://issues.apache.org/jira/browse/IMPALA-12993
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Wenzhe Zhou
>Assignee: Pranav Yogi Lodha
>Priority: Major
>
> The password of the remote database is specified in the table property 
> 'dbcp.password' when creating an Impala external JDBC table. Currently all 
> table properties are saved in the HMS DB as clear text. It's more secure to 
> encrypt the password in the JDBC table properties when saving the table 
> metadata into HMS and to decrypt it when reading from HMS.
> IMPALA-12928 masks the value of the dbcp.password table property in the 
> output of the 'describe formatted' and 'show create table' commands. In the 
> Impala log file, the password, like other sensitive information within SQL 
> statement text, can be redacted by setting up regular expressions.
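For illustration only (not Impala's implementation), a symmetric-encryption 
round trip with the Python 'cryptography' package shows the intended flow:
{code:python}
from cryptography.fernet import Fernet

# Hypothetical flow: encrypt dbcp.password before the table properties are
# written to HMS, decrypt when the metadata is read back.
key = Fernet.generate_key()  # in practice, from a managed keystore
f = Fernet(key)

props = {"dbcp.username": "impala", "dbcp.password": "secret"}
props["dbcp.password"] = f.encrypt(props["dbcp.password"].encode()).decode()
# ... properties stored in HMS as ciphertext ...
assert f.decrypt(props["dbcp.password"].encode()).decode() == "secret"
{code}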






[jira] [Assigned] (IMPALA-12992) Support for Hive JDBC Storage handler tables

2024-04-17 Thread Manish Maheshwari (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manish Maheshwari reassigned IMPALA-12992:
--

Assignee: Pranav Yogi Lodha

> Support for Hive JDBC Storage handler tables
> 
>
> Key: IMPALA-12992
> URL: https://issues.apache.org/jira/browse/IMPALA-12992
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend, Frontend
>Reporter: Wenzhe Zhou
>Assignee: Pranav Yogi Lodha
>Priority: Major
>
> This is an enhancement request to support JDBC tables created by the Hive 
> JDBC storage handler. There are some differences in the configuration of the 
> JDBC driver and DBCP in table properties between tables created by Impala 
> and tables created by the Hive JDBC storage handler.






[jira] [Created] (IMPALA-13010) Exclude cheap fragments from CpuAsk calculation.

2024-04-17 Thread Riza Suminto (Jira)
Riza Suminto created IMPALA-13010:
-

 Summary: Exclude cheap fragments from CpuAsk calculation.
 Key: IMPALA-13010
 URL: https://issues.apache.org/jira/browse/IMPALA-13010
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Reporter: Riza Suminto


If a fragment has a ProcessingCost below a certain threshold, it might be 
better to exclude it from the CpuAsk/dominant fragment calculation. Right now, 
a fragment that reads very few rows can bump the core count, even though such 
a fragment would run in parallel only for a very limited time.
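A simplified sketch of the idea (the threshold and costs below are made up; 
the real CpuAsk computation in the frontend is more involved):
{code:python}
MIN_FRAGMENT_COST = 1000  # hypothetical ProcessingCost threshold

def cpu_ask(fragment_costs, cost_per_core):
    """Estimate a core count, ignoring fragments with negligible cost."""
    significant = [c for c in fragment_costs if c >= MIN_FRAGMENT_COST]
    total = sum(significant)
    return max(1, -(-total // cost_per_core))  # ceiling division

# The two cheap fragments (50 and 120) no longer bump the core count:
print(cpu_ask([50, 120, 900000, 450000], cost_per_core=250000))  # -> 6
{code}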






[jira] [Commented] (IMPALA-13003) Server exits early failing to create impala_query_log with AlreadyExistsException

2024-04-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838334#comment-17838334
 ] 

ASF subversion and git services commented on IMPALA-13003:
--

Commit bbe3303ded1e7a796e6bc41648b89074c52b1e7e in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=bbe3303de ]

IMPALA-13003: Handle Iceberg AlreadyExistsException

When multiple coordinators attempt to create the same table concurrently
with "if not exists", we still see

  AlreadyExistsException: Table was created concurrently: my_iceberg_tbl

Iceberg throws its own version of AlreadyExistsException, but we avoid
most code paths that would throw it because we first check HMS to see if
the table exists before trying to create it.

Updates createIcebergTable to handle Iceberg's AlreadyExistsException
identically to the HMS AlreadyExistsException.

Adds a test using DebugAction to simulate concurrent table creation.

Change-Id: I847eea9297c9ee0d8e821fe1c87ea03d22f1d96e
Reviewed-on: http://gerrit.cloudera.org:8080/21312
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Server exits early failing to create impala_query_log with 
> AlreadyExistsException
> -
>
> Key: IMPALA-13003
> URL: https://issues.apache.org/jira/browse/IMPALA-13003
> Project: IMPALA
>  Issue Type: Bug
>  Components: be
>Affects Versions: Impala 4.4.0
>Reporter: Andrew Sherman
>Assignee: Michael Smith
>Priority: Critical
>  Labels: iceberg, workload-management
> Fix For: Impala 4.4.0
>
>
> At startup workload management tries to create the query log table here:
> {code:java}
>   // The initialization code only works when run in a separate thread for 
> reasons unknown.
>   ABORT_IF_ERROR(SetupDbTable(internal_server_.get(), table_name));
> {code}
> This code is exiting:
> {code:java}
> I0413 23:40:05.183876 21006 client-request-state.cc:1348] 
> 1d4878dbc9214c81:6dc8cc2e] ImpalaRuntimeException: Error making 
> 'createTable' RPC to Hive Metastore:
> CAUSED BY: AlreadyExistsException: Table was created concurrently: 
> sys.impala_query_log
> I0413 23:40:05.184055 20955 impala-server.cc:2582] Connection 
> 27432606d99dcdae:218860164eb206bb from client in-memory.localhost:0 to server 
> internal-server closed. The connection had 1 associated session(s).
> I0413 23:40:05.184067 20955 impala-server.cc:1780] Closing session: 
> 27432606d99dcdae:218860164eb206bb
> I0413 23:40:05.184083 20955 impala-server.cc:1836] Closed session: 
> 27432606d99dcdae:218860164eb206bb, client address: .
> F0413 23:40:05.184111 20955 workload-management.cc:304] query timed out 
> waiting for results
> . Impalad exiting.
> I0413 23:40:05.184728 20883 impala-server.cc:1564] Query successfully 
> unregistered: query_id=1d4878dbc9214c81:6dc8cc2e
> Minidump in thread [20955]completed-queries running query 
> :, fragment instance 
> :
> Wrote minidump to 
> /data/jenkins/workspace/impala-cdw-master-core-ubsan/repos/Impala/logs/custom_cluster_tests/minidumps/impalad/402f37cc-4663-4c78-086ca295-a9e5943c.dmp
> {code}
> with stack
> {code:java}
> F0413 23:40:05.184111 20955 workload-management.cc:304] query timed out 
> waiting for results
> . Impalad exiting.
> *** Check failure stack trace: ***
> @  0x8e96a4d  google::LogMessage::Fail()
> @  0x8e98984  google::LogMessage::SendToLog()
> @  0x8e9642c  google::LogMessage::Flush()
> @  0x8e98ea9  google::LogMessageFatal::~LogMessageFatal()
> @  0x3da3a9a  impala::ImpalaServer::CompletedQueriesThread()
> @  0x3a8df93  boost::_mfi::mf0<>::operator()()
> @  0x3a8de97  boost::_bi::list1<>::operator()<>()
> @  0x3a8dd77  boost::_bi::bind_t<>::operator()()
> @  0x3a8d672  
> boost::detail::function::void_function_obj_invoker0<>::invoke()
> @  0x301e7d0  boost::function0<>::operator()()
> @  0x43ce415  impala::Thread::SuperviseThread()
> @  0x43e2dc7  boost::_bi::list5<>::operator()<>()
> @  0x43e29e7  boost::_bi::bind_t<>::operator()()
> @  0x43e21c5  boost::detail::thread_data<>::run()
> @  0x7984c37  thread_proxy
> @ 0x7f75b6982ea5  start_thread
> @ 0x7f75b36a7b0d  __clone
> Picked up JAVA_TOOL_OPTIONS: 
> -agentlib:jdwp=transport=dt_socket,address=3,server=y,suspend=n   
> -Dsun.java.command=impalad
> Minidump in thread [20955]completed-queries running query 
> :, fragment instance 
> :
> {code}
> I think the key error is 
> {code}
> 

[jira] [Commented] (IMPALA-13006) Some Iceberg test tables are not restricted to Parquet

2024-04-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838332#comment-17838332
 ] 

ASF subversion and git services commented on IMPALA-13006:
--

Commit fc07880b8a2bbb234b3f1d2fb49f2b8fdee8dbe3 in impala's branch 
refs/heads/master from Noemi Pap-Takacs
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=fc07880b8 ]

IMPALA-13006: Restrict Iceberg tables to Parquet

Iceberg test tables/views are restricted to the Parquet file format
in functional/schema_constraints.csv. The following two were
unintentionally left out:
iceberg_query_metadata
iceberg_view

Added the constraint for these tables too.

Testing:
- executed data load for the functional dataset

Change-Id: I2590d7a70fe6aaf1277b19e6b23015d39d2935cb
Reviewed-on: http://gerrit.cloudera.org:8080/21306
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Some Iceberg test tables are not restricted to Parquet
> --
>
> Key: IMPALA-13006
> URL: https://issues.apache.org/jira/browse/IMPALA-13006
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Daniel Becker
>Assignee: Noemi Pap-Takacs
>Priority: Major
>  Labels: impala-iceberg
>
> Our Iceberg test tables/views are restricted to the Parquet file format in 
> functional/schema_constraints.csv except for the following two:
> {code:java}
> iceberg_query_metadata
> iceberg_view{code}
> This is not intentional, so we should add the constraint for these tables too.






[jira] [Commented] (IMPALA-12679) test_rows_sent_counters failed to match RPCCount

2024-04-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838333#comment-17838333
 ] 

ASF subversion and git services commented on IMPALA-12679:
--

Commit 06bbbea257547774aa8e2bad8820e38d5bbf5a51 in impala's branch 
refs/heads/master from Kurt Deschler
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=06bbbea25 ]

IMPALA-12679: Improve test_rows_sent_counters assert

This patch changes the assert for failed test test_rows_sent_counters so
that the actual RPC count is displayed in the assert output. The root
cause of the failure will be addressed once sufficient data is collected
with the new output.

Testing:
  Ran test_rows_sent_counters with modified expected RPC count range to
simulate failure.

Change-Id: Ic6b48cf4039028e749c914ee60b88f04833a0069
Reviewed-on: http://gerrit.cloudera.org:8080/21310
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> test_rows_sent_counters failed to match RPCCount
> 
>
> Key: IMPALA-12679
> URL: https://issues.apache.org/jira/browse/IMPALA-12679
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Michael Smith
>Assignee: Kurt Deschler
>Priority: Major
>
> {code}
> query_test.test_fetch.TestFetch.test_rows_sent_counters[protocol: beeswax | 
> exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none]
> {code}
> failed with
> {code}
> query_test/test_fetch.py:69: in test_rows_sent_counters
> assert re.search("RPCCount: [5-9]", runtime_profile)
> E   assert None
> E+  where None = ('RPCCount: [5-9]', 
> 'Query (id=c8476e5c065757bf:b4367698):\n  DEBUG MODE WARNING: Query 
> profile created while running a DEBUG buil...: 0.000ns\n - 
> WriteIoBytes: 0\n - WriteIoOps: 0 (0)\n - 
> WriteIoWaitTime: 0.000ns\n')
> E+where  = re.search
> {code}
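The improved assert presumably looks something like the following sketch, 
which surfaces the observed count on failure (the 5-9 range comes from the 
original check):
{code:python}
import re

runtime_profile = "Query (id=...):\n  RPCCount: 7 (7)\n"  # stand-in profile text
m = re.search(r"RPCCount: (\d+)", runtime_profile)
assert m is not None, "RPCCount missing from profile"
rpc_count = int(m.group(1))
# On failure the message now includes the observed value instead of just None:
assert 5 <= rpc_count <= 9, "unexpected RPCCount: %d" % rpc_count
{code}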






[jira] [Resolved] (IMPALA-13003) Server exits early failing to create impala_query_log with AlreadyExistsException

2024-04-17 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13003.

Fix Version/s: Impala 4.4.0
   Resolution: Fixed

> Server exits early failing to create impala_query_log with 
> AlreadyExistsException
> -
>
> Key: IMPALA-13003
> URL: https://issues.apache.org/jira/browse/IMPALA-13003
> Project: IMPALA
>  Issue Type: Bug
>  Components: be
>Affects Versions: Impala 4.4.0
>Reporter: Andrew Sherman
>Assignee: Michael Smith
>Priority: Critical
>  Labels: iceberg, workload-management
> Fix For: Impala 4.4.0
>
>
> At startup workload management tries to create the query log table here:
> {code:java}
>   // The initialization code only works when run in a separate thread for 
> reasons unknown.
>   ABORT_IF_ERROR(SetupDbTable(internal_server_.get(), table_name));
> {code}
> This code is exiting:
> {code:java}
> I0413 23:40:05.183876 21006 client-request-state.cc:1348] 
> 1d4878dbc9214c81:6dc8cc2e] ImpalaRuntimeException: Error making 
> 'createTable' RPC to Hive Metastore:
> CAUSED BY: AlreadyExistsException: Table was created concurrently: 
> sys.impala_query_log
> I0413 23:40:05.184055 20955 impala-server.cc:2582] Connection 
> 27432606d99dcdae:218860164eb206bb from client in-memory.localhost:0 to server 
> internal-server closed. The connection had 1 associated session(s).
> I0413 23:40:05.184067 20955 impala-server.cc:1780] Closing session: 
> 27432606d99dcdae:218860164eb206bb
> I0413 23:40:05.184083 20955 impala-server.cc:1836] Closed session: 
> 27432606d99dcdae:218860164eb206bb, client address: .
> F0413 23:40:05.184111 20955 workload-management.cc:304] query timed out 
> waiting for results
> . Impalad exiting.
> I0413 23:40:05.184728 20883 impala-server.cc:1564] Query successfully 
> unregistered: query_id=1d4878dbc9214c81:6dc8cc2e
> Minidump in thread [20955]completed-queries running query 
> :, fragment instance 
> :
> Wrote minidump to 
> /data/jenkins/workspace/impala-cdw-master-core-ubsan/repos/Impala/logs/custom_cluster_tests/minidumps/impalad/402f37cc-4663-4c78-086ca295-a9e5943c.dmp
> {code}
> with stack
> {code:java}
> F0413 23:40:05.184111 20955 workload-management.cc:304] query timed out 
> waiting for results
> . Impalad exiting.
> *** Check failure stack trace: ***
> @  0x8e96a4d  google::LogMessage::Fail()
> @  0x8e98984  google::LogMessage::SendToLog()
> @  0x8e9642c  google::LogMessage::Flush()
> @  0x8e98ea9  google::LogMessageFatal::~LogMessageFatal()
> @  0x3da3a9a  impala::ImpalaServer::CompletedQueriesThread()
> @  0x3a8df93  boost::_mfi::mf0<>::operator()()
> @  0x3a8de97  boost::_bi::list1<>::operator()<>()
> @  0x3a8dd77  boost::_bi::bind_t<>::operator()()
> @  0x3a8d672  
> boost::detail::function::void_function_obj_invoker0<>::invoke()
> @  0x301e7d0  boost::function0<>::operator()()
> @  0x43ce415  impala::Thread::SuperviseThread()
> @  0x43e2dc7  boost::_bi::list5<>::operator()<>()
> @  0x43e29e7  boost::_bi::bind_t<>::operator()()
> @  0x43e21c5  boost::detail::thread_data<>::run()
> @  0x7984c37  thread_proxy
> @ 0x7f75b6982ea5  start_thread
> @ 0x7f75b36a7b0d  __clone
> Picked up JAVA_TOOL_OPTIONS: 
> -agentlib:jdwp=transport=dt_socket,address=3,server=y,suspend=n   
> -Dsun.java.command=impalad
> Minidump in thread [20955]completed-queries running query 
> :, fragment instance 
> :
> {code}
> I think the key error is 
> {code}
> CAUSED BY: AlreadyExistsException: Table was created concurrently: 
> sys.impala_query_log
> {code}
> which suggests that creating the table with "if not exists" is not sufficient 
> to protect against concurrent creations.
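
A common mitigation, sketched here under assumptions (hypothetical names, not
necessarily the actual fix), is to treat the concurrent-creation error as
success, since either way the table exists afterwards:
{code:python}
# Hedged sketch with hypothetical names: tolerate the CREATE TABLE race by
# treating "already exists" as success -- another daemon won the race, so
# the table is present and startup can proceed instead of aborting.
def ensure_query_log_table(execute, create_stmt):
    try:
        execute(create_stmt)  # CREATE TABLE IF NOT EXISTS sys.impala_query_log ...
    except Exception as e:
        if "AlreadyExistsException" not in str(e):
            raise  # a genuine failure: still abort startup
        # benign race: the table was created concurrently by another daemon
{code}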



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Commented] (IMPALA-13009) Possible leak of partition updates when the table has failed DDL and recovered by INVALIDATE METADATA

2024-04-17 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838211#comment-17838211
 ] 

Quanlong Huang commented on IMPALA-13009:
-

Found a way to reproduce this when event processing is disabled:
{code:bash}
bin/start-impala-cluster.py 
--catalogd_args=--hms_event_polling_interval_s=0{code}
Run this script:
{code:bash}
impala-shell.sh -B --quiet -q "create table if not exists my_part(i int) 
partitioned by (p int); insert into my_part partition(p) values 
(1,1),(2,2),(3,3)"

i=1
while true; do
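  # Remove the table directory out from under the table, so the following
  # ALTER TABLE RECOVER PARTITIONS fails with FileNotFoundException
  # (matching the scenario in the issue description):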
  hdfs dfs -rm -R -skipTrash hdfs://localhost:20500/test-warehouse/my_part
  impala-shell.sh -B --quiet --ignore_query_failure -q "alter table my_part 
drop partition(p>0); alter table my_part recover partitions; invalidate 
metadata my_part; show partitions my_part"
  i=$((i+3))
  impala-shell.sh -B --quiet -q "insert into my_part partition(p) values 
($i,$i),($((i+1)),$((i+1))),($((i+2)),$((i+2)))"
done{code}
Restarting impalad should surface the error.

> Possible leak of partition updates when the table has failed DDL and 
> recovered by INVALIDATE METADATA
> -
>
> Key: IMPALA-13009
> URL: https://issues.apache.org/jira/browse/IMPALA-13009
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> Catalogd might not send partition deletions to the catalog topic in the 
> following scenario:
> * Partitions of a table are dropped externally outside Impala.
> * Table dir is also removed on HDFS.
> * ALTER TABLE RECOVER PARTITIONS fails with FileNotFoundException on the table 
> dir.
> * A subsequent INVALIDATE METADATA on the same table succeeds in 
> invalidating the table.
> After the INVALIDATE finishes, catalogd might not send deletions of the 
> dropped partitions to the catalog topic. The catalog topic then only has the 
> updates of those partitions, with no deletions.
> This will be detected when a coordinator restarts:
> {noformat}
> E0417 16:41:22.317298 20746 ImpaladCatalog.java:264] Error adding catalog 
> object: Received stale partition in a statestore update: 
> THdfsPartition(partitionKeyExprs:[TExpr(nodes:[TExprNode(node_type:INT_LITERAL,
>  type:TColumnType(types:[TTypeNode(type:SCALAR, 
> scalar_type:TScalarType(type:INT))]), num_children:0, is_constant:true, 
> int_literal:TIntLiteral(value:106), is_codegen_disabled:false)])], 
> location:THdfsPartitionLocation(prefix_index:0, suffix:p=106), id:138, 
> file_desc:[THdfsFileDesc(file_desc_data:18 00 00 00 00 00 00 00 00 00 0E 00 
> 1C 00 18 00 10 00 00 00 08 00 04 00 0E 00 00 00 18 00 00 00 8B 0E 2D EB 8E 01 
> 00 00 04 00 00 00 00 00 00 00 0C 00 00 00 01 00 00 00 4C 00 00 00 36 00 00 00 
> 34 34 34 37 62 35 66 34 62 30 65 64 66 64 65 31 2D 32 33 33 61 64 62 38 35 30 
> 30 30 30 30 30 30 30 5F 36 36 34 31 30 39 33 37 33 5F 64 61 74 61 2E 30 2E 74 
> 78 74 00 00 0C 00 14 00 00 00 0C 00...)], access_level:READ_WRITE, 
> stats:TTableStats(num_rows:-1), is_marked_cached:false, 
> hms_parameters:{transient_lastDdlTime=1713342582, totalSize=4, 
> numFilesErasureCoded=0, numFiles=1}, num_blocks:1, total_file_size_bytes:4, 
> has_incremental_stats:false, write_id:0, db_name:default, tbl_name:my_part, 
> partition_name:p=106, 
> hdfs_storage_descriptor:THdfsStorageDescriptor(lineDelim:10, fieldDelim:1, 
> collectionDelim:1, mapKeyDelim:1, escapeChar:0, quoteChar:1, fileFormat:TEXT, 
> blockSize:0))
> Java exception follows:
> java.lang.IllegalStateException: Received stale partition in a statestore 
> update: 
> THdfsPartition(partitionKeyExprs:[TExpr(nodes:[TExprNode(node_type:INT_LITERAL,
>  type:TColumnType(types:[TTypeNode(type:SCALAR, 
> scalar_type:TScalarType(type:INT))]), num_children:0, is_constant:true, 
> int_literal:TIntLiteral(value:106), is_codegen_disabled:false)])], 
> location:THdfsPartitionLocation(prefix_index:0, suffix:p=106), id:138, 
> file_desc:[THdfsFileDesc(file_desc_data:18 00 00 00 00 00 00 00 00 00 0E 00 
> 1C 00 18 00 10 00 00 00 08 00 04 00 0E 00 00 00 18 00 00 00 8B 0E 2D EB 8E 01 
> 00 00 04 00 00 00 00 00 00 00 0C 00 00 00 01 00 00 00 4C 00 00 00 36 00 00 00 
> 34 34 34 37 62 35 66 34 62 30 65 64 66 64 65 31 2D 32 33 33 61 64 62 38 35 30 
> 30 30 30 30 30 30 30 5F 36 36 34 31 30 39 33 37 33 5F 64 61 74 61 2E 30 2E 74 
> 78 74 00 00 0C 00 14 00 00 00 0C 00...)], access_level:READ_WRITE, 
> stats:TTableStats(num_rows:-1), is_marked_cached:false, 
> hms_parameters:{transient_lastDdlTime=1713342582, totalSize=4, 
> numFilesErasureCoded=0, numFiles=1}, num_blocks:1, total_file_size_bytes:4, 
> has_incremental_stats:false, write_id:0, db_name:default, tbl_name:my_part, 
> partition_name:p=106, 
> hdfs_storage_descriptor:THdfsStorageDescriptor(lineDelim:10, 

[jira] [Commented] (IMPALA-13009) Possible leak of partition updates when the table has failed DDL and recovered by INVALIDATE METADATA

2024-04-17 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838202#comment-17838202
 ] 

Quanlong Huang commented on IMPALA-13009:
-

Note that I disabled event processing in the above runs. I found it easier 
to reproduce the issue with event processing turned on, using the script 
without the hdfs command:
{code:bash}
impala-shell.sh -B --quiet -q "create table if not exists my_part(i int) 
partitioned by (p int); insert into my_part partition(p) values 
(1,1),(2,2),(3,3)"

i=1
while true; do
  beeline -u "jdbc:hive2://localhost:11050" -e "alter table my_part drop 
partition(p>0)"
  impala-shell.sh -B --quiet --ignore_query_failure -q "alter table my_part 
recover partitions; invalidate metadata my_part; show partitions my_part"
  i=$((i+3))
  impala-shell.sh -B --quiet -q "insert into my_part partition(p) values 
($i,$i),($((i+1)),$((i+1))),($((i+2)),$((i+2)))"
done{code}
Let this loop run for several iterations and then restart the impalads using
{code:bash}
bin/start-impala-cluster.py -r{code}
Or manually kill one impalad and relaunch it using commands like
{code}
source bin/set-classpath.sh
/home/quanlong/workspace/Impala/be/build/latest/service/impalad 
-allow_tuple_caching=true 
-jni_frontend_class=org/apache/impala/service/JniFrontend 
-disconnected_session_timeout 21600 -kudu_client_rpc_timeout_ms 0 
-kudu_master_hosts localhost -mem_limit=7766732526 -logbufsecs=5 -v=1 
-max_log_files=10 -log_rotation_match_pid=true -log_filename=impalad_node2 
-log_dir=/home/quanlong/workspace/Impala/logs/cluster -beeswax_port=21002 
-hs2_port=21052 -hs2_http_port=28002 -krpc_port=27002 
-state_store_subscriber_port=23002 -webserver_port=25002 -num_abfs_io_threads=1 
-num_adls_io_threads=1 -num_cos_io_threads=1 -num_gcs_io_threads=1 
-num_obs_io_threads=1 -num_oss_io_threads=1 -num_ozone_io_threads=1 
-num_s3_io_threads=1 -num_s3_file_oper_io_threads=1 -num_sfs_io_threads=1 
-geospatial_library=HIVE_ESRI{code}
The log should show the IllegalStateException.

> Possible leak of partition updates when the table has failed DDL and 
> recovered by INVALIDATE METADATA
> -
>
> Key: IMPALA-13009
> URL: https://issues.apache.org/jira/browse/IMPALA-13009
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> Catalogd might not send partition deletions to the catalog topic in the 
> following scenario:
> * Partitions of a table are dropped externally outside Impala.
> * Table dir is also removed on HDFS.
> * ALTER TABLE RECOVER PARTITIONS fails with FileNotFoundException on the table 
> dir.
> * A subsequent INVALIDATE METADATA on the same table succeeds in 
> invalidating the table.
> After the INVALIDATE finishes, catalogd might not send deletions of the 
> dropped partitions to the catalog topic. The catalog topic then only has the 
> updates of those partitions, with no deletions.
> This will be detected when a coordinator restarts:
> {noformat}
> E0417 16:41:22.317298 20746 ImpaladCatalog.java:264] Error adding catalog 
> object: Received stale partition in a statestore update: 
> THdfsPartition(partitionKeyExprs:[TExpr(nodes:[TExprNode(node_type:INT_LITERAL,
>  type:TColumnType(types:[TTypeNode(type:SCALAR, 
> scalar_type:TScalarType(type:INT))]), num_children:0, is_constant:true, 
> int_literal:TIntLiteral(value:106), is_codegen_disabled:false)])], 
> location:THdfsPartitionLocation(prefix_index:0, suffix:p=106), id:138, 
> file_desc:[THdfsFileDesc(file_desc_data:18 00 00 00 00 00 00 00 00 00 0E 00 
> 1C 00 18 00 10 00 00 00 08 00 04 00 0E 00 00 00 18 00 00 00 8B 0E 2D EB 8E 01 
> 00 00 04 00 00 00 00 00 00 00 0C 00 00 00 01 00 00 00 4C 00 00 00 36 00 00 00 
> 34 34 34 37 62 35 66 34 62 30 65 64 66 64 65 31 2D 32 33 33 61 64 62 38 35 30 
> 30 30 30 30 30 30 30 5F 36 36 34 31 30 39 33 37 33 5F 64 61 74 61 2E 30 2E 74 
> 78 74 00 00 0C 00 14 00 00 00 0C 00...)], access_level:READ_WRITE, 
> stats:TTableStats(num_rows:-1), is_marked_cached:false, 
> hms_parameters:{transient_lastDdlTime=1713342582, totalSize=4, 
> numFilesErasureCoded=0, numFiles=1}, num_blocks:1, total_file_size_bytes:4, 
> has_incremental_stats:false, write_id:0, db_name:default, tbl_name:my_part, 
> partition_name:p=106, 
> hdfs_storage_descriptor:THdfsStorageDescriptor(lineDelim:10, fieldDelim:1, 
> collectionDelim:1, mapKeyDelim:1, escapeChar:0, quoteChar:1, fileFormat:TEXT, 
> blockSize:0))
> Java exception follows:
> java.lang.IllegalStateException: Received stale partition in a statestore 
> update: 
> THdfsPartition(partitionKeyExprs:[TExpr(nodes:[TExprNode(node_type:INT_LITERAL,
>  type:TColumnType(types:[TTypeNode(type:SCALAR, 
> scalar_type:TScalarType(type:INT))]), 

[jira] [Work started] (IMPALA-13000) Document OPTIMIZE TABLE

2024-04-17 Thread Noemi Pap-Takacs (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-13000 started by Noemi Pap-Takacs.
-
> Document OPTIMIZE TABLE
> ---
>
> Key: IMPALA-13000
> URL: https://issues.apache.org/jira/browse/IMPALA-13000
> Project: IMPALA
>  Issue Type: Documentation
>  Components: Docs
>Reporter: Noemi Pap-Takacs
>Assignee: Noemi Pap-Takacs
>Priority: Major
>  Labels: impala-iceberg
>
> Document OPTIMIZE TABLE syntax and behaviour.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Commented] (IMPALA-13009) Possible leak of partition updates when the table has failed DDL and recovered by INVALIDATE METADATA

2024-04-17 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838138#comment-17838138
 ] 

Quanlong Huang commented on IMPALA-13009:
-

Still looking for a deterministic way to reproduce this. Currently I'm using 
the following script:
{code:bash}
impala-shell.sh -B --quiet -q "create table if not exists my_part(i int) 
partitioned by (p int); insert into my_part partition(p) values 
(1,1),(2,2),(3,3)"

i=1
while true; do
  beeline -u "jdbc:hive2://localhost:11050" -e "alter table my_part drop 
partition(p>0)"
  hdfs dfs -rm -R -skipTrash hdfs://localhost:20500/test-warehouse/my_part
  impala-shell.sh -B --quiet --ignore_query_failure -q "alter table my_part 
recover partitions; invalidate metadata my_part; show partitions my_part"
  sleep 1
  i=$((i+3))
  impala-shell.sh -B --quiet -q "insert into my_part partition(p) values 
($i,$i),($((i+1)),$((i+1))),($((i+2)),$((i+2)))"
done{code}
I could only reproduce it 4 times over a whole day.

> Possible leak of partition updates when the table has failed DDL and 
> recovered by INVALIDATE METADATA
> -
>
> Key: IMPALA-13009
> URL: https://issues.apache.org/jira/browse/IMPALA-13009
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> Catalogd might not send partition deletions to the catalog topic in the 
> following scenario:
> * Partitions of a table are dropped externally outside Impala.
> * Table dir is also removed on HDFS.
> * ALTER TABLE RECOVER PARTITIONS fails with FileNotFoundException on the table 
> dir.
> * A subsequent INVALIDATE METADATA on the same table succeeds in 
> invalidating the table.
> After the INVALIDATE finishes, catalogd might not send deletions of the 
> dropped partitions to the catalog topic. The catalog topic then only has the 
> updates of those partitions, with no deletions.
> This will be detected when a coordinator restarts:
> {noformat}
> E0417 16:41:22.317298 20746 ImpaladCatalog.java:264] Error adding catalog 
> object: Received stale partition in a statestore update: 
> THdfsPartition(partitionKeyExprs:[TExpr(nodes:[TExprNode(node_type:INT_LITERAL,
>  type:TColumnType(types:[TTypeNode(type:SCALAR, 
> scalar_type:TScalarType(type:INT))]), num_children:0, is_constant:true, 
> int_literal:TIntLiteral(value:106), is_codegen_disabled:false)])], 
> location:THdfsPartitionLocation(prefix_index:0, suffix:p=106), id:138, 
> file_desc:[THdfsFileDesc(file_desc_data:18 00 00 00 00 00 00 00 00 00 0E 00 
> 1C 00 18 00 10 00 00 00 08 00 04 00 0E 00 00 00 18 00 00 00 8B 0E 2D EB 8E 01 
> 00 00 04 00 00 00 00 00 00 00 0C 00 00 00 01 00 00 00 4C 00 00 00 36 00 00 00 
> 34 34 34 37 62 35 66 34 62 30 65 64 66 64 65 31 2D 32 33 33 61 64 62 38 35 30 
> 30 30 30 30 30 30 30 5F 36 36 34 31 30 39 33 37 33 5F 64 61 74 61 2E 30 2E 74 
> 78 74 00 00 0C 00 14 00 00 00 0C 00...)], access_level:READ_WRITE, 
> stats:TTableStats(num_rows:-1), is_marked_cached:false, 
> hms_parameters:{transient_lastDdlTime=1713342582, totalSize=4, 
> numFilesErasureCoded=0, numFiles=1}, num_blocks:1, total_file_size_bytes:4, 
> has_incremental_stats:false, write_id:0, db_name:default, tbl_name:my_part, 
> partition_name:p=106, 
> hdfs_storage_descriptor:THdfsStorageDescriptor(lineDelim:10, fieldDelim:1, 
> collectionDelim:1, mapKeyDelim:1, escapeChar:0, quoteChar:1, fileFormat:TEXT, 
> blockSize:0))
> Java exception follows:
> java.lang.IllegalStateException: Received stale partition in a statestore 
> update: 
> THdfsPartition(partitionKeyExprs:[TExpr(nodes:[TExprNode(node_type:INT_LITERAL,
>  type:TColumnType(types:[TTypeNode(type:SCALAR, 
> scalar_type:TScalarType(type:INT))]), num_children:0, is_constant:true, 
> int_literal:TIntLiteral(value:106), is_codegen_disabled:false)])], 
> location:THdfsPartitionLocation(prefix_index:0, suffix:p=106), id:138, 
> file_desc:[THdfsFileDesc(file_desc_data:18 00 00 00 00 00 00 00 00 00 0E 00 
> 1C 00 18 00 10 00 00 00 08 00 04 00 0E 00 00 00 18 00 00 00 8B 0E 2D EB 8E 01 
> 00 00 04 00 00 00 00 00 00 00 0C 00 00 00 01 00 00 00 4C 00 00 00 36 00 00 00 
> 34 34 34 37 62 35 66 34 62 30 65 64 66 64 65 31 2D 32 33 33 61 64 62 38 35 30 
> 30 30 30 30 30 30 30 5F 36 36 34 31 30 39 33 37 33 5F 64 61 74 61 2E 30 2E 74 
> 78 74 00 00 0C 00 14 00 00 00 0C 00...)], access_level:READ_WRITE, 
> stats:TTableStats(num_rows:-1), is_marked_cached:false, 
> hms_parameters:{transient_lastDdlTime=1713342582, totalSize=4, 
> numFilesErasureCoded=0, numFiles=1}, num_blocks:1, total_file_size_bytes:4, 
> has_incremental_stats:false, write_id:0, db_name:default, tbl_name:my_part, 
> partition_name:p=106, 
> hdfs_storage_descriptor:THdfsStorageDescriptor(lineDelim:10, fieldDelim:1, 
> 

[jira] [Updated] (IMPALA-13009) Possible leak of partition updates when the table has failed DDL and recovered by INVALIDATE METADATA

2024-04-17 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-13009:

Description: 
Catalogd might not send partition deletions to the catalog topic in the 
following scenario:
* Partitions of a table are dropped externally outside Impala.
* Table dir is also removed on HDFS.
* ALTER TABLE RECOVER PARTITIONS fails with FileNotFoundException on the table 
dir.
* A subsequent INVALIDATE METADATA on the same table succeeds in invalidating 
the table.

After the INVALIDATE finishes, catalogd might not send deletions of the dropped 
partitions to the catalog topic. The catalog topic then only has the updates 
of those partitions, with no deletions.

This will be detected when a coordinator restarts:
{noformat}
E0417 16:41:22.317298 20746 ImpaladCatalog.java:264] Error adding catalog 
object: Received stale partition in a statestore update: 
THdfsPartition(partitionKeyExprs:[TExpr(nodes:[TExprNode(node_type:INT_LITERAL, 
type:TColumnType(types:[TTypeNode(type:SCALAR, 
scalar_type:TScalarType(type:INT))]), num_children:0, is_constant:true, 
int_literal:TIntLiteral(value:106), is_codegen_disabled:false)])], 
location:THdfsPartitionLocation(prefix_index:0, suffix:p=106), id:138, 
file_desc:[THdfsFileDesc(file_desc_data:18 00 00 00 00 00 00 00 00 00 0E 00 1C 
00 18 00 10 00 00 00 08 00 04 00 0E 00 00 00 18 00 00 00 8B 0E 2D EB 8E 01 00 
00 04 00 00 00 00 00 00 00 0C 00 00 00 01 00 00 00 4C 00 00 00 36 00 00 00 34 
34 34 37 62 35 66 34 62 30 65 64 66 64 65 31 2D 32 33 33 61 64 62 38 35 30 30 
30 30 30 30 30 30 5F 36 36 34 31 30 39 33 37 33 5F 64 61 74 61 2E 30 2E 74 78 
74 00 00 0C 00 14 00 00 00 0C 00...)], access_level:READ_WRITE, 
stats:TTableStats(num_rows:-1), is_marked_cached:false, 
hms_parameters:{transient_lastDdlTime=1713342582, totalSize=4, 
numFilesErasureCoded=0, numFiles=1}, num_blocks:1, total_file_size_bytes:4, 
has_incremental_stats:false, write_id:0, db_name:default, tbl_name:my_part, 
partition_name:p=106, 
hdfs_storage_descriptor:THdfsStorageDescriptor(lineDelim:10, fieldDelim:1, 
collectionDelim:1, mapKeyDelim:1, escapeChar:0, quoteChar:1, fileFormat:TEXT, 
blockSize:0))
Java exception follows:
java.lang.IllegalStateException: Received stale partition in a statestore 
update: 
THdfsPartition(partitionKeyExprs:[TExpr(nodes:[TExprNode(node_type:INT_LITERAL, 
type:TColumnType(types:[TTypeNode(type:SCALAR, 
scalar_type:TScalarType(type:INT))]), num_children:0, is_constant:true, 
int_literal:TIntLiteral(value:106), is_codegen_disabled:false)])], 
location:THdfsPartitionLocation(prefix_index:0, suffix:p=106), id:138, 
file_desc:[THdfsFileDesc(file_desc_data:18 00 00 00 00 00 00 00 00 00 0E 00 1C 
00 18 00 10 00 00 00 08 00 04 00 0E 00 00 00 18 00 00 00 8B 0E 2D EB 8E 01 00 
00 04 00 00 00 00 00 00 00 0C 00 00 00 01 00 00 00 4C 00 00 00 36 00 00 00 34 
34 34 37 62 35 66 34 62 30 65 64 66 64 65 31 2D 32 33 33 61 64 62 38 35 30 30 
30 30 30 30 30 30 5F 36 36 34 31 30 39 33 37 33 5F 64 61 74 61 2E 30 2E 74 78 
74 00 00 0C 00 14 00 00 00 0C 00...)], access_level:READ_WRITE, 
stats:TTableStats(num_rows:-1), is_marked_cached:false, 
hms_parameters:{transient_lastDdlTime=1713342582, totalSize=4, 
numFilesErasureCoded=0, numFiles=1}, num_blocks:1, total_file_size_bytes:4, 
has_incremental_stats:false, write_id:0, db_name:default, tbl_name:my_part, 
partition_name:p=106, 
hdfs_storage_descriptor:THdfsStorageDescriptor(lineDelim:10, fieldDelim:1, 
collectionDelim:1, mapKeyDelim:1, escapeChar:0, quoteChar:1, fileFormat:TEXT, 
blockSize:0))
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:512)
at 
org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:523)
at 
org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334)
at 
org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262)
at 
org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:120)
at 
org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:565)
at 
org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196) 
{noformat}

  was:
Catalogd might not send partition deletions to the catalog topic in the 
following scenario:
* Partitions of a table are dropped externally outside Impala.
* Table dir is also removed on HDFS.
* ALTER TABLE RECOVER PARTITIONS failed by FileNotFoundException on the table 
dir.
* A subsequent INVALIDATE METADATA on the same table succeeds to invalidated 
the table.
After the INVALIDATE finishes, catalogd might not send deletions of the dropped 
partitions to the catalog topic. Then the catalog topic only have the updates 
of those partitions, no deletions.

This will be detected when a coordinator restarts:
{noformat}
E0417 16:41:22.317298 20746 ImpaladCatalog.java:264] Error adding catalog 

[jira] [Created] (IMPALA-13009) Possible leak of partition updates when the table has failed DDL and recovered by INVALIDATE METADATA

2024-04-17 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13009:
---

 Summary: Possible leak of partition updates when the table has 
failed DDL and recovered by INVALIDATE METADATA
 Key: IMPALA-13009
 URL: https://issues.apache.org/jira/browse/IMPALA-13009
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Reporter: Quanlong Huang
Assignee: Quanlong Huang


Catalogd might not send partition deletions to the catalog topic in the 
following scenario:
* Partitions of a table are dropped externally outside Impala.
* Table dir is also removed on HDFS.
* ALTER TABLE RECOVER PARTITIONS fails with FileNotFoundException on the table 
dir.
* A subsequent INVALIDATE METADATA on the same table succeeds in invalidating 
the table.
After the INVALIDATE finishes, catalogd might not send deletions of the dropped 
partitions to the catalog topic. The catalog topic then only has the updates 
of those partitions, with no deletions.

This will be detected when a coordinator restarts:
{noformat}
E0417 16:41:22.317298 20746 ImpaladCatalog.java:264] Error adding catalog 
object: Received stale partition in a statestore update: 
THdfsPartition(partitionKeyExprs:[TExpr(nodes:[TExprNode(node_type:INT_LITERAL, 
type:TColumnType(types:[TTypeNode(type:SCALAR, 
scalar_type:TScalarType(type:INT))]), num_children:0, is_constant:true, 
int_literal:TIntLiteral(value:106), is_codegen_disabled:false)])], 
location:THdfsPartitionLocation(prefix_index:0, suffix:p=106), id:138, 
file_desc:[THdfsFileDesc(file_desc_data:18 00 00 00 00 00 00 00 00 00 0E 00 1C 
00 18 00 10 00 00 00 08 00 04 00 0E 00 00 00 18 00 00 00 8B 0E 2D EB 8E 01 00 
00 04 00 00 00 00 00 00 00 0C 00 00 00 01 00 00 00 4C 00 00 00 36 00 00 00 34 
34 34 37 62 35 66 34 62 30 65 64 66 64 65 31 2D 32 33 33 61 64 62 38 35 30 30 
30 30 30 30 30 30 5F 36 36 34 31 30 39 33 37 33 5F 64 61 74 61 2E 30 2E 74 78 
74 00 00 0C 00 14 00 00 00 0C 00...)], access_level:READ_WRITE, 
stats:TTableStats(num_rows:-1), is_marked_cached:false, 
hms_parameters:{transient_lastDdlTime=1713342582, totalSize=4, 
numFilesErasureCoded=0, numFiles=1}, num_blocks:1, total_file_size_bytes:4, 
has_incremental_stats:false, write_id:0, db_name:default, tbl_name:my_part, 
partition_name:p=106, 
hdfs_storage_descriptor:THdfsStorageDescriptor(lineDelim:10, fieldDelim:1, 
collectionDelim:1, mapKeyDelim:1, escapeChar:0, quoteChar:1, fileFormat:TEXT, 
blockSize:0))
Java exception follows:
java.lang.IllegalStateException: Received stale partition in a statestore 
update: 
THdfsPartition(partitionKeyExprs:[TExpr(nodes:[TExprNode(node_type:INT_LITERAL, 
type:TColumnType(types:[TTypeNode(type:SCALAR, 
scalar_type:TScalarType(type:INT))]), num_children:0, is_constant:true, 
int_literal:TIntLiteral(value:106), is_codegen_disabled:false)])], 
location:THdfsPartitionLocation(prefix_index:0, suffix:p=106), id:138, 
file_desc:[THdfsFileDesc(file_desc_data:18 00 00 00 00 00 00 00 00 00 0E 00 1C 
00 18 00 10 00 00 00 08 00 04 00 0E 00 00 00 18 00 00 00 8B 0E 2D EB 8E 01 00 
00 04 00 00 00 00 00 00 00 0C 00 00 00 01 00 00 00 4C 00 00 00 36 00 00 00 34 
34 34 37 62 35 66 34 62 30 65 64 66 64 65 31 2D 32 33 33 61 64 62 38 35 30 30 
30 30 30 30 30 30 5F 36 36 34 31 30 39 33 37 33 5F 64 61 74 61 2E 30 2E 74 78 
74 00 00 0C 00 14 00 00 00 0C 00...)], access_level:READ_WRITE, 
stats:TTableStats(num_rows:-1), is_marked_cached:false, 
hms_parameters:{transient_lastDdlTime=1713342582, totalSize=4, 
numFilesErasureCoded=0, numFiles=1}, num_blocks:1, total_file_size_bytes:4, 
has_incremental_stats:false, write_id:0, db_name:default, tbl_name:my_part, 
partition_name:p=106, 
hdfs_storage_descriptor:THdfsStorageDescriptor(lineDelim:10, fieldDelim:1, 
collectionDelim:1, mapKeyDelim:1, escapeChar:0, quoteChar:1, fileFormat:TEXT, 
blockSize:0))
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:512)
at 
org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:523)
at 
org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334)
at 
org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262)
at 
org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:120)
at 
org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:565)
at 
org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196) 
{noformat}
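
As a loose toy model (not Impala's actual code): the catalog topic behaves
like a key/value log where updates set keys and deletions remove them.
Skipping a deletion leaves a stale entry behind, and a restarting coordinator
trips over it in roughly the way the checkState() above does:
{code:python}
# Toy model only; names and structure are illustrative assumptions.
topic = {}  # key -> partition payload; a deletion removes the key

def publish_update(key, payload):
    topic[key] = payload

def publish_deletion(key):
    # The step this bug skips: without it, stale entries linger forever.
    topic.pop(key, None)

def replay_on_restart(current_partition_ids):
    # A restarting coordinator replays the topic; an entry whose id is not
    # part of the current table triggers the IllegalStateException above.
    for key, payload in topic.items():
        if payload["id"] not in current_partition_ids:
            raise RuntimeError(
                "Received stale partition in a statestore update: %s" % key)
{code}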



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Work started] (IMPALA-13008) test_metadata_tables failed in Ubuntu 20 build

2024-04-17 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-13008 started by Daniel Becker.
--
> test_metadata_tables failed in Ubuntu 20 build
> --
>
> Key: IMPALA-13008
> URL: https://issues.apache.org/jira/browse/IMPALA-13008
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Daniel Becker
>Priority: Major
>  Labels: impala-iceberg
>
> test_metadata_tables failed in an Ubuntu 20 release test build:
> * 
> https://jenkins.impala.io/job/parallel-all-tests-ub2004/1059/artifact/https_%5E%5Ejenkins.impala.io%5Ejob%5Eubuntu-20.04-dockerised-tests%5E1642%5E.log
> * 
> https://jenkins.impala.io/job/parallel-all-tests-ub2004/1059/artifact/https_%5E%5Ejenkins.impala.io%5Ejob%5Eubuntu-20.04-from-scratch%5E2363%5E.log
> h2. Error
> {noformat}
> E   assert Comparing QueryTestResults (expected vs actual):
> E 
> 'append',true,'{"added-data-files":"1","added-records":"1","added-files-size":"351","changed-partition-count":"1","total-records":"1","total-files-size":"351","total-data-files":"1","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"}'
>  != 
> 'append',true,'{"added-data-files":"1","added-records":"1","added-files-size":"350","changed-partition-count":"1","total-records":"1","total-files-size":"350","total-data-files":"1","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"}'
> E 
> 'append',true,'{"added-data-files":"1","added-records":"1","added-files-size":"351","changed-partition-count":"1","total-records":"2","total-files-size":"702","total-data-files":"2","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"}'
>  != 
> 'append',true,'{"added-data-files":"1","added-records":"1","added-files-size":"350","changed-partition-count":"1","total-records":"2","total-files-size":"700","total-data-files":"2","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"}'
> E 
> 'append',true,'{"added-data-files":"1","added-records":"1","added-files-size":"351","changed-partition-count":"1","total-records":"3","total-files-size":"1053","total-data-files":"3","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"}'
>  != 
> 'append',true,'{"added-data-files":"1","added-records":"1","added-files-size":"350","changed-partition-count":"1","total-records":"3","total-files-size":"1050","total-data-files":"3","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"}'
> E 
> row_regex:'overwrite',true,'{"added-position-delete-files":"1","added-delete-files":"1","added-files-size":"[1-9][0-9]*","added-position-deletes":"1","changed-partition-count":"1","total-records":"3","total-files-size":"[1-9][0-9]*","total-data-files":"3","total-delete-files":"1","total-position-deletes":"1","total-equality-deletes":"0"}'
>  == 
> 'overwrite',true,'{"added-position-delete-files":"1","added-delete-files":"1","added-files-size":"1551","added-position-deletes":"1","changed-partition-count":"1","total-records":"3","total-files-size":"2601","total-data-files":"3","total-delete-files":"1","total-position-deletes":"1","total-equality-deletes":"0"}'
> {noformat}
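
All of the mismatches are in byte-size fields (350 vs 351 and their
multiples), which can vary across builds; the natural fix is to relax those
values into row_regex form, as the overwrite line already does. A hedged
sketch of that relaxation (hypothetical helper, not the actual test change):
{code:python}
import re

# Hypothetical helper: rewrite an exact expected line so the
# non-deterministic *files-size values match any positive integer,
# mirroring the row_regex style already used for the overwrite row.
def relax_file_sizes(expected_line):
    return re.sub(r'(files-size":")\d+', r'\1[1-9][0-9]*', expected_line)

# e.g. '"added-files-size":"351"' becomes '"added-files-size":"[1-9][0-9]*"'
{code}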



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Resolved] (IMPALA-13006) Some Iceberg test tables are not restricted to Parquet

2024-04-17 Thread Noemi Pap-Takacs (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noemi Pap-Takacs resolved IMPALA-13006.
---
Resolution: Fixed

> Some Iceberg test tables are not restricted to Parquet
> --
>
> Key: IMPALA-13006
> URL: https://issues.apache.org/jira/browse/IMPALA-13006
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Daniel Becker
>Assignee: Noemi Pap-Takacs
>Priority: Major
>  Labels: impala-iceberg
>
> Our Iceberg test tables/views are restricted to the Parquet file format in 
> functional/schema_constraints.csv except for the following two:
> {code:java}
> iceberg_query_metadata
> iceberg_view{code}
> This is not intentional, so we should add the constraint for these tables too.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Work started] (IMPALA-13006) Some Iceberg test tables are not restricted to Parquet

2024-04-17 Thread Noemi Pap-Takacs (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-13006 started by Noemi Pap-Takacs.
-
> Some Iceberg test tables are not restricted to Parquet
> --
>
> Key: IMPALA-13006
> URL: https://issues.apache.org/jira/browse/IMPALA-13006
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Daniel Becker
>Assignee: Noemi Pap-Takacs
>Priority: Major
>  Labels: impala-iceberg
>
> Our Iceberg test tables/views are restricted to the Parquet file format in 
> functional/schema_constraints.csv except for the following two:
> {code:java}
> iceberg_query_metadata
> iceberg_view{code}
> This is not intentional, so we should add the constraint for these tables too.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
